* [PATCH v2 00/39] x86: Kernel IBT
@ 2022-02-24 14:51 Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 01/39] kbuild: Fix clang build Peter Zijlstra
                   ` (39 more replies)
  0 siblings, 40 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Hi,

This is an even more complete Kernel IBT implementation.

Since last time (in no specific order):

 - Reworked Xen and paravirt bits a lot (andyhpp)
 - Reworked entry annotation (jpoimboe)
 - Renamed CONFIG symbol to CONFIG_X86_KERNEL_IBT (redgecomb)
 - Pinned CR4_CET (kees)
 - Added __noendbr to CET control functions (kees)
 - kexec (redgecomb)
 - made function-graph, kprobes and bpf not explode (rostedt)
 - cleanups and split ups (jpoimboe, mbenes)
 - reworked whole module objtool (nathanchance)
 - attempted and failed at making Clang go

Specifically to clang: I made clang-13 explode by rediscovering
https://reviews.llvm.org/D111108; then I tried clang-14, but it looks
like ld.lld is still generating .plt entries out of thin air.

Also, I know the very first patch is somewhat controversial amongst the clang
people, but I really think the current state of affairs is abysmal and this
lets me at least use clang.

Patches are also available here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/wip.ibt

This series is on top of tip/master along with the linkage patches from Mark:

  https://lore.kernel.org/all/20220216162229.1076788-1-mark.rutland@arm.com/

Enjoy!



* [PATCH v2 01/39] kbuild: Fix clang build
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:11   ` Kees Cook
  2022-03-01 21:16   ` Nick Desaulniers
  2022-02-24 14:51 ` [PATCH v2 02/39] static_call: Avoid building empty .static_call_sites Peter Zijlstra
                   ` (38 subsequent siblings)
  39 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Debian (and derived) distros ship their compilers as -$ver suffixed
binaries. For gcc it is sufficient to use:

 $ make CC=gcc-12

However, clang builds (esp. clang-lto) need a whole array of tools to be
exactly right, leading to unwieldy stuff like:

 $ make CC=clang-13 LD=ld.lld-13 AR=llvm-ar-13 NM=llvm-nm-13 OBJCOPY=llvm-objcopy-13 OBJDUMP=llvm-objdump-13 READELF=llvm-readelf-13 STRIP=llvm-strip-13 LLVM=1

which is, quite frankly, totally insane and unusable. Instead make
the CC variable DTRT, enabling one such as myself to use:

 $ make CC=clang-13

This also lets one quickly test different clang versions.
Additionally, also support path based LLVM suites like:

 $ make CC=/opt/llvm/bin/clang

This changes the default to LLVM=1 when CC is clang; mixing toolchains
is still possible by explicitly adding LLVM=0.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Makefile                       |   45 +++++++++++++++++++++++++++---------
 tools/scripts/Makefile.include |   50 ++++++++++++++++++++++++++++-------------
 2 files changed, 68 insertions(+), 27 deletions(-)

--- a/Makefile
+++ b/Makefile
@@ -423,9 +423,29 @@ HOST_LFS_CFLAGS := $(shell getconf LFS_C
 HOST_LFS_LDFLAGS := $(shell getconf LFS_LDFLAGS 2>/dev/null)
 HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null)
 
-ifneq ($(LLVM),)
-HOSTCC	= clang
-HOSTCXX	= clang++
+# powerpc and s390 don't yet work with LLVM as a whole
+ifeq ($(ARCH),powerpc)
+LLVM = 0
+endif
+ifeq ($(ARCH),s390)
+LLVM = 0
+endif
+
+# otherwise, if CC=clang, default to using LLVM to enable LTO
+CC_BASE := $(shell echo $(CC) | sed 's/.*\///')
+CC_NAME := $(shell echo $(CC_BASE) | cut -b "1-5")
+ifeq ($(shell test "$(CC_NAME)" = "clang"; echo $$?),0)
+LLVM ?= 1
+LLVM_PFX := $(shell echo $(CC) | sed 's/\(.*\/\)\?.*/\1/')
+LLVM_SFX := $(shell echo $(CC_BASE) | cut -b "6-")
+endif
+
+# if not set by now, do not use LLVM
+LLVM ?= 0
+
+ifneq ($(LLVM),0)
+HOSTCC	= $(LLVM_PFX)clang$(LLVM_SFX)
+HOSTCXX	= $(LLVM_PFX)clang++$(LLVM_SFX)
 else
 HOSTCC	= gcc
 HOSTCXX	= g++
@@ -442,15 +462,15 @@ KBUILD_HOSTLDLIBS   := $(HOST_LFS_LIBS)
 
 # Make variables (CC, etc...)
 CPP		= $(CC) -E
-ifneq ($(LLVM),)
-CC		= clang
-LD		= ld.lld
-AR		= llvm-ar
-NM		= llvm-nm
-OBJCOPY		= llvm-objcopy
-OBJDUMP		= llvm-objdump
-READELF		= llvm-readelf
-STRIP		= llvm-strip
+ifneq ($(LLVM),0)
+CC		= $(LLVM_PFX)clang$(LLVM_SFX)
+LD		= $(LLVM_PFX)ld.lld$(LLVM_SFX)
+AR		= $(LLVM_PFX)llvm-ar$(LLVM_SFX)
+NM		= $(LLVM_PFX)llvm-nm$(LLVM_SFX)
+OBJCOPY		= $(LLVM_PFX)llvm-objcopy$(LLVM_SFX)
+OBJDUMP		= $(LLVM_PFX)llvm-objdump$(LLVM_SFX)
+READELF		= $(LLVM_PFX)llvm-readelf$(LLVM_SFX)
+STRIP		= $(LLVM_PFX)llvm-strip$(LLVM_SFX)
 else
 CC		= $(CROSS_COMPILE)gcc
 LD		= $(CROSS_COMPILE)ld
@@ -461,6 +481,7 @@ OBJDUMP		= $(CROSS_COMPILE)objdump
 READELF		= $(CROSS_COMPILE)readelf
 STRIP		= $(CROSS_COMPILE)strip
 endif
+
 PAHOLE		= pahole
 RESOLVE_BTFIDS	= $(objtree)/tools/bpf/resolve_btfids/resolve_btfids
 LEX		= flex
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -51,12 +51,32 @@ define allow-override
     $(eval $(1) = $(2)))
 endef
 
-ifneq ($(LLVM),)
-$(call allow-override,CC,clang)
-$(call allow-override,AR,llvm-ar)
-$(call allow-override,LD,ld.lld)
-$(call allow-override,CXX,clang++)
-$(call allow-override,STRIP,llvm-strip)
+# powerpc and s390 don't yet work with LLVM as a whole
+ifeq ($(ARCH),powerpc)
+LLVM = 0
+endif
+ifeq ($(ARCH),s390)
+LLVM = 0
+endif
+
+# otherwise, if CC=clang, default to using LLVM to enable LTO
+CC_BASE := $(shell echo $(CC) | sed 's/.*\///')
+CC_NAME := $(shell echo $(CC_BASE) | cut -b "1-5")
+ifeq ($(shell test "$(CC_NAME)" = "clang"; echo $$?),0)
+LLVM ?= 1
+LLVM_PFX := $(shell echo $(CC) | sed 's/\(.*\/\)\?.*/\1/')
+LLVM_SFX := $(shell echo $(CC_BASE) | cut -b "6-")
+endif
+
+# if not set by now, do not use LLVM
+LLVM ?= 0
+
+ifneq ($(LLVM),0)
+$(call allow-override,CC,$(LLVM_PFX)clang$(LLVM_SFX))
+$(call allow-override,AR,$(LLVM_PFX)llvm-ar$(LLVM_SFX))
+$(call allow-override,LD,$(LLVM_PFX)ld.lld$(LLVM_SFX))
+$(call allow-override,CXX,$(LLVM_PFX)clang++$(LLVM_SFX))
+$(call allow-override,STRIP,$(LLVM_PFX)llvm-strip$(LLVM_SFX))
 else
 # Allow setting various cross-compile vars or setting CROSS_COMPILE as a prefix.
 $(call allow-override,CC,$(CROSS_COMPILE)gcc)
@@ -68,10 +88,10 @@ endif
 
 CC_NO_CLANG := $(shell $(CC) -dM -E -x c /dev/null | grep -Fq "__clang__"; echo $$?)
 
-ifneq ($(LLVM),)
-HOSTAR  ?= llvm-ar
-HOSTCC  ?= clang
-HOSTLD  ?= ld.lld
+ifneq ($(LLVM),0)
+HOSTAR  ?= $(LLVM_PFX)llvm-ar$(LLVM_SFX)
+HOSTCC  ?= $(LLVM_PFX)clang$(LLVM_SFX)
+HOSTLD  ?= $(LLVM_PFX)ld.lld$(LLVM_SFX)
 else
 HOSTAR  ?= ar
 HOSTCC  ?= gcc
@@ -79,11 +99,11 @@ HOSTLD  ?= ld
 endif
 
 # Some tools require Clang, LLC and/or LLVM utils
-CLANG		?= clang
-LLC		?= llc
-LLVM_CONFIG	?= llvm-config
-LLVM_OBJCOPY	?= llvm-objcopy
-LLVM_STRIP	?= llvm-strip
+CLANG		?= $(LLVM_PFX)clang$(LLVM_SFX)
+LLC		?= $(LLVM_PFX)llc$(LLVM_SFX)
+LLVM_CONFIG	?= $(LLVM_PFX)llvm-config$(LLVM_SFX)
+LLVM_OBJCOPY	?= $(LLVM_PFX)llvm-objcopy$(LLVM_SFX)
+LLVM_STRIP	?= $(LLVM_PFX)llvm-strip$(LLVM_SFX)
 
 ifeq ($(CC_NO_CLANG), 1)
 EXTRA_WARNINGS += -Wstrict-aliasing=3


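As an aside, the sed/cut pipeline above is plain string slicing. A minimal
userspace C sketch of the same derivation (hypothetical program, for
illustration only; the build of course does this in make):

	#include <stdio.h>
	#include <string.h>

	/* Mimic what the Makefile derives from CC=/opt/llvm/bin/clang-14:
	 * CC_BASE is the basename, CC_NAME its first five characters,
	 * LLVM_PFX the directory part, LLVM_SFX whatever follows "clang".
	 */
	int main(void)
	{
		const char *cc = "/opt/llvm/bin/clang-14";
		const char *slash = strrchr(cc, '/');
		const char *base = slash ? slash + 1 : cc;	/* "clang-14" */

		if (!strncmp(base, "clang", 5)) {
			printf("LLVM_PFX = %.*s\n", (int)(base - cc), cc);
			printf("LLVM_SFX = %s\n", base + 5);	/* "-14" */
		}
		return 0;
	}

so CC, LD, AR etc. all become $(LLVM_PFX)tool$(LLVM_SFX), reconstructing
/opt/llvm/bin/ld.lld-14 style names from the single CC value.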


* [PATCH v2 02/39] static_call: Avoid building empty .static_call_sites
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 01/39] kbuild: Fix clang build Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 03/39] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
                   ` (37 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Without CONFIG_HAVE_STATIC_CALL_INLINE there's no point in creating
the .static_call_sites section and its related symbols.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/vmlinux.lds.h |    4 ++++
 1 file changed, 4 insertions(+)

--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -398,6 +398,7 @@
 	KEEP(*(__jump_table))						\
 	__stop___jump_table = .;
 
+#ifdef CONFIG_HAVE_STATIC_CALL_INLINE
 #define STATIC_CALL_DATA						\
 	. = ALIGN(8);							\
 	__start_static_call_sites = .;					\
@@ -406,6 +407,9 @@
 	__start_static_call_tramp_key = .;				\
 	KEEP(*(.static_call_tramp_key))					\
 	__stop_static_call_tramp_key = .;
+#else
+#define STATIC_CALL_DATA
+#endif
 
 /*
  * Allow architectures to handle ro_after_init data on their




* [PATCH v2 03/39] x86/module: Fix the paravirt vs alternative order
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 01/39] kbuild: Fix clang build Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 02/39] static_call: Avoid building empty .static_call_sites Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-03-01 14:37   ` Miroslav Benes
  2022-02-24 14:51 ` [PATCH v2 04/39] objtool: Add --dry-run Peter Zijlstra
                   ` (36 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
paravirt patching") there has been an ordering dependency between
patching paravirt ops and patching alternatives; the module loader
still violates this.

Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/module.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -273,6 +273,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 			retpolines = s;
 	}
 
+	/*
+	 * See alternative_instructions() for the ordering rules between the
+	 * various patching types.
+	 */
+	if (para) {
+		void *pseg = (void *)para->sh_addr;
+		apply_paravirt(pseg, pseg + para->sh_size);
+	}
 	if (retpolines) {
 		void *rseg = (void *)retpolines->sh_addr;
 		apply_retpolines(rseg, rseg + retpolines->sh_size);
@@ -290,11 +298,6 @@ int module_finalize(const Elf_Ehdr *hdr,
 					    tseg, tseg + text->sh_size);
 	}
 
-	if (para) {
-		void *pseg = (void *)para->sh_addr;
-		apply_paravirt(pseg, pseg + para->sh_size);
-	}
-
 	/* make jump label nops */
 	jump_label_apply_nops(me);
 




* [PATCH v2 04/39] objtool: Add --dry-run
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (2 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 03/39] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:27   ` Kees Cook
  2022-03-01 14:37   ` Miroslav Benes
  2022-02-24 14:51 ` [PATCH v2 05/39] x86: Base IBT bits Peter Zijlstra
                   ` (35 subsequent siblings)
  39 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Add a --dry-run argument to skip writing the modifications. This is
convenient for debugging.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/builtin-check.c           |    3 ++-
 tools/objtool/elf.c                     |    3 +++
 tools/objtool/include/objtool/builtin.h |    2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
 #include <objtool/objtool.h>
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     validate_dup, vmlinux, mcount, noinstr, backup, sls;
+     validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -46,6 +46,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
 	OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
 	OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
+	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
 	OPT_END(),
 };
 
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -1019,6 +1019,9 @@ int elf_write(struct elf *elf)
 	struct section *sec;
 	Elf_Scn *s;
 
+	if (dryrun)
+		return 0;
+
 	/* Update changed relocation sections and section headers: */
 	list_for_each_entry(sec, &elf->sections, list) {
 		if (sec->changed) {
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@
 
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-            validate_dup, vmlinux, mcount, noinstr, backup, sls;
+            validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 


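For example (hypothetical object file; assumes objtool was built in
tools/objtool/), the new flag composes with the existing check options:

 $ ./tools/objtool/objtool check --mcount --dry-run foo.o

which exercises the __mcount_loc generation without ever writing foo.o
back out.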


* [PATCH v2 05/39] x86: Base IBT bits
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (3 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 04/39] objtool: Add --dry-run Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:35   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 06/39] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
                   ` (34 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Add Kconfig, Makefile and basic instruction support for x86 IBT.

XXX: clang is not playing ball, probably lld being 'funny'; I'm having
problems with .plt entries appearing all over after linking.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/Kconfig           |   19 ++++++++++++
 arch/x86/Makefile          |    7 +++-
 arch/x86/include/asm/ibt.h |   71 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 2 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1861,6 +1861,25 @@ config X86_UMIP
 	  specific cases in protected and virtual-8086 modes. Emulated
 	  results are dummy.
 
+config CC_HAS_IBT
+	# GCC >= 9 and binutils >= 2.29
+	# Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
+	# Clang/LLVM >= 14
+	# fentry check to work around https://reviews.llvm.org/D111108
+	def_bool ((CC_IS_GCC && $(cc-option, -fcf-protection=branch -mindirect-branch-register)) || \
+		  (CC_IS_CLANG && $(cc-option, -fcf-protection=branch -mfentry))) && \
+		  $(as-instr,endbr64)
+
+config X86_KERNEL_IBT
+	prompt "Indirect Branch Tracking"
+	bool
+	depends on X86_64 && CC_HAS_IBT
+	help
+	  Build the kernel with support for Indirect Branch Tracking, a
+	  hardware supported CFI scheme. Any indirect call must land on
+	  an ENDBR instruction, as such, the compiler will litter the
+	  code with them to make this happen.
+
 config X86_INTEL_MEMORY_PROTECTION_KEYS
 	prompt "Memory Protection Keys"
 	def_bool y
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -36,7 +36,7 @@ endif
 
 # How to compile the 16-bit code.  Note we always compile for -march=i386;
 # that way we can complain to the user if the CPU is insufficient.
-REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING \
+REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS \
 		   -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \
 		   -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
 		   -mno-mmx -mno-sse $(call cc-option,-fcf-protection=none)
@@ -62,8 +62,11 @@ export BITS
 #
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
 
-# Intel CET isn't enabled in the kernel
+ifeq ($(CONFIG_X86_KERNEL_IBT),y)
+KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch)
+else
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
+endif
 
 ifeq ($(CONFIG_X86_32),y)
         BITS := 32
--- /dev/null
+++ b/arch/x86/include/asm/ibt.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_IBT_H
+#define _ASM_X86_IBT_H
+
+#include <linux/types.h>
+
+#if defined(CONFIG_X86_KERNEL_IBT) && !defined(__DISABLE_EXPORTS)
+
+#define HAS_KERNEL_IBT	1
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+#define ASM_ENDBR	"endbr64\n\t"
+#else
+#define ASM_ENDBR	"endbr32\n\t"
+#endif
+
+#define __noendbr	__attribute__((nocf_check))
+
+static inline __attribute_const__ unsigned int gen_endbr(void)
+{
+	unsigned int endbr;
+
+	/*
+	 * Generate ENDBR64 in a way that is sure to not result in
+	 * an ENDBR64 instruction as immediate.
+	 */
+	asm ( "mov $~0xfa1e0ff3, %[endbr]\n\t"
+	      "not %[endbr]\n\t"
+	       : [endbr] "=&r" (endbr) );
+
+	return endbr;
+}
+
+static inline bool is_endbr(unsigned int val)
+{
+	val &= ~0x01000000U; /* ENDBR32 -> ENDBR64 */
+	return val == gen_endbr();
+}
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_X86_64
+#define ENDBR	endbr64
+#else
+#define ENDBR	endbr32
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#else /* !IBT */
+
+#define HAS_KERNEL_IBT	0
+
+#ifndef __ASSEMBLY__
+
+#define ASM_ENDBR
+
+#define __noendbr
+
+static inline bool is_endbr(unsigned int val) { return false; }
+
+#else /* __ASSEMBLY__ */
+
+#define ENDBR
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_X86_KERNEL_IBT */
+#endif /* _ASM_X86_IBT_H */


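A small userspace C sketch (assumption: little-endian x86 host; for
illustration only) of why the constant works: 0xfa1e0ff3 is the ENDBR64
byte sequence f3 0f 1e fa read as a little-endian dword, and building it
as the complement plus a NOT keeps the literal pattern out of the
instruction stream:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		unsigned int endbr = ~0xfa1e0ff3;	/* the mov immediate */
		unsigned char bytes[4];

		endbr = ~endbr;				/* what the not undoes */
		memcpy(bytes, &endbr, 4);
		printf("%02x %02x %02x %02x\n",		/* f3 0f 1e fa */
		       bytes[0], bytes[1], bytes[2], bytes[3]);

		/* is_endbr() folds ENDBR32 (f3 0f 1e fb) onto ENDBR64 by
		 * clearing bit 24, the only bit in which they differ: */
		printf("%d\n", (0xfb1e0ff3 & ~0x01000000U) == endbr);
		return 0;
	}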


* [PATCH v2 06/39] x86/ibt: Add ANNOTATE_NOENDBR
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (4 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 05/39] x86: Base IBT bits Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:36   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
                   ` (33 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

In order to have objtool warn about code references to !ENDBR
instructions, we need an annotation to allow this for non-control-flow
instances -- consider text range checks, text patching, or return
trampolines, etc.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/objtool.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -77,6 +77,12 @@ struct unwind_hint {
 #define STACK_FRAME_NON_STANDARD_FP(func)
 #endif
 
+#define ANNOTATE_NOENDBR					\
+	"986: \n\t"						\
+	".pushsection .discard.noendbr\n\t"			\
+	_ASM_PTR " 986b\n\t"					\
+	".popsection\n\t"
+
 #else /* __ASSEMBLY__ */
 
 /*
@@ -129,6 +135,13 @@ struct unwind_hint {
 	.popsection
 .endm
 
+.macro ANNOTATE_NOENDBR
+.Lhere_\@:
+	.pushsection .discard.noendbr
+	.quad	.Lhere_\@
+	.popsection
+.endm
+
 #endif /* __ASSEMBLY__ */
 
 #else /* !CONFIG_STACK_VALIDATION */
@@ -139,12 +152,15 @@ struct unwind_hint {
 	"\n\t"
 #define STACK_FRAME_NON_STANDARD(func)
 #define STACK_FRAME_NON_STANDARD_FP(func)
+#define ANNOTATE_NOENDBR
 #else
 #define ANNOTATE_INTRA_FUNCTION_CALL
 .macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0
 .endm
 .macro STACK_FRAME_NON_STANDARD func:req
 .endm
+.macro ANNOTATE_NOENDBR
+.endm
 #endif
 
 #endif /* CONFIG_STACK_VALIDATION */


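A stripped-down usage sketch (hypothetical label; see the
early_idt_handler_common hunk in the next patch for a real use): a label
whose address is taken, but which is only ever reached by direct
jmp/fall-through, gets annotated so the IBT validation stays quiet:

	SYM_CODE_START_LOCAL(example_common)
		ANNOTATE_NOENDBR	// only direct jmps land here; the
					// address is taken for non-control-
					// flow purposes only
		nop
		RET
	SYM_CODE_END(example_common)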


* [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (5 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 06/39] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 22:37   ` Josh Poimboeuf
  2022-02-25  0:42   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 08/39] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
                   ` (32 subsequent siblings)
  39 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Kernel entry points should have ENDBR on for IBT configs.

The SYSCALL entry points are found by taking their respective
addresses in order to program them into the MSRs, while the exception
entry points are found through UNWIND_HINT_IRET_REGS.

The rule is that any UNWIND_HINT_IRET_REGS at sym+0 should have an
ENDBR, see the later objtool ibt validation patch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S        |    6 ++++++
 arch/x86/entry/entry_64_compat.S |    3 +++
 arch/x86/include/asm/idtentry.h  |   20 +++++++++++---------
 arch/x86/include/asm/segment.h   |    2 +-
 arch/x86/kernel/head_64.S        |    5 ++++-
 arch/x86/kernel/idt.c            |    5 +++--
 6 files changed, 28 insertions(+), 13 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -86,6 +86,7 @@
 
 SYM_CODE_START(entry_SYSCALL_64)
 	UNWIND_HINT_EMPTY
+	ENDBR
 
 	swapgs
 	/* tss.sp2 is scratch space. */
@@ -350,6 +351,7 @@ SYM_CODE_END(ret_from_fork)
 .macro idtentry vector asmsym cfunc has_error_code:req
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
+	ENDBR
 	ASM_CLAC
 
 	.if \has_error_code == 0
@@ -417,6 +419,7 @@ SYM_CODE_END(\asmsym)
 .macro idtentry_mce_db vector asmsym cfunc
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
+	ENDBR
 	ASM_CLAC
 
 	pushq	$-1			/* ORIG_RAX: no syscall to restart */
@@ -472,6 +475,7 @@ SYM_CODE_END(\asmsym)
 .macro idtentry_vc vector asmsym cfunc
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
+	ENDBR
 	ASM_CLAC
 
 	/*
@@ -533,6 +537,7 @@ SYM_CODE_END(\asmsym)
 .macro idtentry_df vector asmsym cfunc
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=8
+	ENDBR
 	ASM_CLAC
 
 	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
@@ -1063,6 +1068,7 @@ SYM_CODE_END(error_return)
  */
 SYM_CODE_START(asm_exc_nmi)
 	UNWIND_HINT_IRET_REGS
+	ENDBR
 
 	/*
 	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -48,6 +48,7 @@
  */
 SYM_CODE_START(entry_SYSENTER_compat)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/* Interrupts are off on entry. */
 	SWAPGS
 
@@ -198,6 +199,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
  */
 SYM_CODE_START(entry_SYSCALL_compat)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/* Interrupts are off on entry. */
 	swapgs
 
@@ -340,6 +342,7 @@ SYM_CODE_END(entry_SYSCALL_compat)
  */
 SYM_CODE_START(entry_INT80_compat)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/*
 	 * Interrupts are off on entry.
 	 */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -5,6 +5,8 @@
 /* Interrupts/Exceptions */
 #include <asm/trapnr.h>
 
+#define IDT_ALIGN	(8 * (1 + HAS_KERNEL_IBT))
+
 #ifndef __ASSEMBLY__
 #include <linux/entry-common.h>
 #include <linux/hardirq.h>
@@ -480,7 +482,7 @@ __visible noinstr void func(struct pt_re
 
 /*
  * ASM code to emit the common vector entry stubs where each stub is
- * packed into 8 bytes.
+ * packed into IDT_ALIGN bytes.
  *
  * Note, that the 'pushq imm8' is emitted via '.byte 0x6a, vector' because
  * GCC treats the local vector variable as unsigned int and would expand
@@ -492,33 +494,33 @@ __visible noinstr void func(struct pt_re
  * point is to mask off the bits above bit 7 because the push is sign
  * extending.
  */
-	.align 8
+	.align IDT_ALIGN
 SYM_CODE_START(irq_entries_start)
     vector=FIRST_EXTERNAL_VECTOR
     .rept NR_EXTERNAL_VECTORS
 	UNWIND_HINT_IRET_REGS
 0 :
+	ENDBR
 	.byte	0x6a, vector
 	jmp	asm_common_interrupt
-	nop
-	/* Ensure that the above is 8 bytes max */
-	. = 0b + 8
+	/* Ensure that the above is IDT_ALIGN bytes max */
+	.fill 0b + IDT_ALIGN - ., 1, 0x90
 	vector = vector+1
     .endr
 SYM_CODE_END(irq_entries_start)
 
 #ifdef CONFIG_X86_LOCAL_APIC
-	.align 8
+	.align IDT_ALIGN
 SYM_CODE_START(spurious_entries_start)
     vector=FIRST_SYSTEM_VECTOR
     .rept NR_SYSTEM_VECTORS
 	UNWIND_HINT_IRET_REGS
 0 :
+	ENDBR
 	.byte	0x6a, vector
 	jmp	asm_spurious_interrupt
-	nop
-	/* Ensure that the above is 8 bytes max */
-	. = 0b + 8
+	/* Ensure that the above is IDT_ALIGN bytes max */
+	.fill 0b + IDT_ALIGN - ., 1, 0x90
 	vector = vector+1
     .endr
 SYM_CODE_END(spurious_entries_start)
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -275,7 +275,7 @@ static inline void vdso_read_cpunode(uns
  * vector has no error code (two bytes), a 'push $vector_number' (two
  * bytes), and a jump to the common entry code (up to five bytes).
  */
-#define EARLY_IDT_HANDLER_SIZE 9
+#define EARLY_IDT_HANDLER_SIZE (9 + 4*HAS_KERNEL_IBT)
 
 /*
  * xen_early_idt_handler_array is for Xen pv guests: for each entry in
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -372,9 +372,11 @@ SYM_CODE_START(early_idt_handler_array)
 	.rept NUM_EXCEPTION_VECTORS
 	.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
 		UNWIND_HINT_IRET_REGS
+		ENDBR
 		pushq $0	# Dummy error code, to make stack frame uniform
 	.else
 		UNWIND_HINT_IRET_REGS offset=8
+		ENDBR
 	.endif
 	pushq $i		# 72(%rsp) Vector number
 	jmp early_idt_handler_common
@@ -382,10 +384,11 @@ SYM_CODE_START(early_idt_handler_array)
 	i = i + 1
 	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
-	UNWIND_HINT_IRET_REGS offset=16
 SYM_CODE_END(early_idt_handler_array)
 
 SYM_CODE_START_LOCAL(early_idt_handler_common)
+	UNWIND_HINT_IRET_REGS offset=16
+	ANNOTATE_NOENDBR
 	/*
 	 * The stack is the hardware frame, an error code or zero, and the
 	 * vector number.
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -10,6 +10,7 @@
 #include <asm/proto.h>
 #include <asm/desc.h>
 #include <asm/hw_irq.h>
+#include <asm/idtentry.h>
 
 #define DPL0		0x0
 #define DPL3		0x3
@@ -272,7 +273,7 @@ void __init idt_setup_apic_and_irq_gates
 	idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);
 
 	for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
-		entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
+		entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR);
 		set_intr_gate(i, entry);
 	}
 
@@ -283,7 +284,7 @@ void __init idt_setup_apic_and_irq_gates
 		 * system_vectors bitmap. Otherwise they show up in
 		 * /proc/interrupts.
 		 */
-		entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
+		entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR);
 		set_intr_gate(i, entry);
 	}
 #endif


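For reference, the resulting per-vector stub layout (sizes per the
instructions above; the ENDBR line is what IBT adds):

	endbr64				4 bytes (IBT only)
	.byte 0x6a, vector		2 bytes (push imm8)
	jmp asm_common_interrupt	5 bytes (rel32)
	.fill ..., 1, 0x90		pad to IDT_ALIGN

That is 7 bytes within the old 8-byte stride and 11 bytes with IBT,
hence IDT_ALIGN = 8 * (1 + HAS_KERNEL_IBT) = 16, and idt.c must index
the stub arrays by IDT_ALIGN rather than the previously hard-coded 8.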


* [PATCH v2 08/39] x86/linkage: Add ENDBR to SYM_FUNC_START*()
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (6 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:45   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 09/39] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
                   ` (31 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Ensure the ASM functions have ENDBR on for IBT builds; this follows
the ARM64 example. Unlike ARM64, we'll likely end up overwriting them
with poison.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/linkage.h |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_LINKAGE_H
 
 #include <linux/stringify.h>
+#include <asm/ibt.h>
 
 #undef notrace
 #define notrace __attribute__((no_instrument_function))
@@ -34,5 +35,43 @@
 
 #endif /* __ASSEMBLY__ */
 
+/*
+ * compressed and purgatory define this to disable EXPORT,
+ * hijack this same to also not emit ENDBR.
+ */
+#ifndef __DISABLE_EXPORTS
+
+/* SYM_FUNC_START -- use for global functions */
+#define SYM_FUNC_START(name)				\
+	SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)	\
+	ENDBR
+
+/* SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment */
+#define SYM_FUNC_START_NOALIGN(name)			\
+	SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE)	\
+	ENDBR
+
+/* SYM_FUNC_START_LOCAL -- use for local functions */
+#define SYM_FUNC_START_LOCAL(name)			\
+	SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)	\
+	ENDBR
+
+/* SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o alignment */
+#define SYM_FUNC_START_LOCAL_NOALIGN(name)		\
+	SYM_START(name, SYM_L_LOCAL, SYM_A_NONE)	\
+	ENDBR
+
+/* SYM_FUNC_START_WEAK -- use for weak functions */
+#define SYM_FUNC_START_WEAK(name)			\
+	SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN)	\
+	ENDBR
+
+/* SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment */
+#define SYM_FUNC_START_WEAK_NOALIGN(name)		\
+	SYM_START(name, SYM_L_WEAK, SYM_A_NONE)		\
+	ENDBR
+
+#endif /* __DISABLE_EXPORTS */
+
 #endif /* _ASM_X86_LINKAGE_H */
 


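Usage stays exactly as before; a hypothetical .S function now picks up
the ENDBR for free (sketch):

	SYM_FUNC_START(my_func)		// emits: align; my_func: endbr64
		xorl %eax, %eax
		RET
	SYM_FUNC_END(my_func)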


* [PATCH v2 09/39] x86/ibt,paravirt: Sprinkle ENDBR
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (7 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 08/39] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:47   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
                   ` (30 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S                 |    1 +
 arch/x86/include/asm/paravirt.h           |    1 +
 arch/x86/include/asm/qspinlock_paravirt.h |    3 +++
 arch/x86/kernel/kvm.c                     |    3 ++-
 arch/x86/kernel/paravirt.c                |    2 ++
 5 files changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -635,6 +635,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
 
 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
 	UNWIND_HINT_IRET_REGS
+	ENDBR // paravirt_iret
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -666,6 +666,7 @@ bool __raw_callee_save___native_vcpu_is_
 	    ".globl " PV_THUNK_NAME(func) ";"				\
 	    ".type " PV_THUNK_NAME(func) ", @function;"			\
 	    PV_THUNK_NAME(func) ":"					\
+	    ASM_ENDBR							\
 	    FRAME_BEGIN							\
 	    PV_SAVE_ALL_CALLER_REGS					\
 	    "call " #func ";"						\
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_QSPINLOCK_PARAVIRT_H
 #define __ASM_QSPINLOCK_PARAVIRT_H
 
+#include <asm/ibt.h>
+
 /*
  * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
  * registers. For i386, however, only 1 32-bit register needs to be saved
@@ -39,6 +41,7 @@ asm    (".pushsection .text;"
 	".type " PV_UNLOCK ", @function;"
 	".align 4,0x90;"
 	PV_UNLOCK ": "
+	ASM_ENDBR
 	FRAME_BEGIN
 	"push  %rdx;"
 	"mov   $0x1,%eax;"
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1024,10 +1024,11 @@ asm(
 ".global __raw_callee_save___kvm_vcpu_is_preempted;"
 ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
 "__raw_callee_save___kvm_vcpu_is_preempted:"
+ASM_ENDBR
 "movq	__per_cpu_offset(,%rdi,8), %rax;"
 "cmpb	$0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
 "setne	%al;"
-"ret;"
+ASM_RET
 ".size __raw_callee_save___kvm_vcpu_is_preempted, .-__raw_callee_save___kvm_vcpu_is_preempted;"
 ".popsection");
 
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -41,6 +41,7 @@ extern void _paravirt_nop(void);
 asm (".pushsection .entry.text, \"ax\"\n"
      ".global _paravirt_nop\n"
      "_paravirt_nop:\n\t"
+     ASM_ENDBR
      ASM_RET
      ".size _paravirt_nop, . - _paravirt_nop\n\t"
      ".type _paravirt_nop, @function\n\t"
@@ -50,6 +51,7 @@ asm (".pushsection .entry.text, \"ax\"\n
 asm (".pushsection .entry.text, \"ax\"\n"
      ".global paravirt_ret0\n"
      "paravirt_ret0:\n\t"
+     ASM_ENDBR
      "xor %" _ASM_AX ", %" _ASM_AX ";\n\t"
      ASM_RET
      ".size paravirt_ret0, . - paravirt_ret0\n\t"




* [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (8 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 09/39] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 22:41   ` Josh Poimboeuf
  2022-02-25  0:50   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
                   ` (29 subsequent siblings)
  39 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -195,6 +195,7 @@ SYM_FUNC_START(crc_pcl)
 .altmacro
 LABEL crc_ %i
 .noaltmacro
+	ENDBR
 	crc32q   -i*8(block_0), crc_init
 	crc32q   -i*8(block_1), crc1
 	crc32q   -i*8(block_2), crc2
@@ -203,6 +204,7 @@ LABEL crc_ %i
 
 .altmacro
 LABEL crc_ %i
+	ENDBR
 .noaltmacro
 	crc32q   -i*8(block_0), crc_init
 	crc32q   -i*8(block_1), crc1
@@ -237,6 +239,7 @@ LABEL crc_ %i
 	################################################################
 
 LABEL crc_ 0
+	ENDBR
 	mov     tmp, len
 	cmp     $128*24, tmp
 	jae     full_block




* [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (9 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:54   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location Peter Zijlstra
                   ` (28 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/emulate.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -189,7 +189,7 @@
 #define X16(x...) X8(x), X8(x)
 
 #define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
-#define FASTOP_SIZE 8
+#define FASTOP_SIZE (8 * (1 + HAS_KERNEL_IBT))
 
 struct opcode {
 	u64 flags;
@@ -311,7 +311,8 @@ static int fastop(struct x86_emulate_ctx
 #define __FOP_FUNC(name) \
 	".align " __stringify(FASTOP_SIZE) " \n\t" \
 	".type " name ", @function \n\t" \
-	name ":\n\t"
+	name ":\n\t" \
+	ASM_ENDBR
 
 #define FOP_FUNC(name) \
 	__FOP_FUNC(#name)
@@ -433,6 +434,7 @@ static int fastop(struct x86_emulate_ctx
 	".align 4 \n\t" \
 	".type " #op ", @function \n\t" \
 	#op ": \n\t" \
+	ASM_ENDBR \
 	#op " %al \n\t" \
 	__FOP_RET(#op)
 


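The reason FASTOP_SIZE doubles rather than growing by just the 4-byte
ENDBR: fastop() selects the operand-size variant by fixed-stride
pointer arithmetic (fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE), so
every variant must occupy exactly one aligned, power-of-two sized slot.
A hedged sketch of the resulting layout:

	fop + 0*FASTOP_SIZE:	endbr64; <op>b; ret	// byte
	fop + 1*FASTOP_SIZE:	endbr64; <op>w; ret	// word
	fop + 2*FASTOP_SIZE:	endbr64; <op>l; ret	// long
	fop + 3*FASTOP_SIZE:	endbr64; <op>q; ret	// quad

With the ENDBR in front (these entries are reached via an indirect
call), an op plus RET no longer reliably fits in 8 bytes, so the slot
grows to 16.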


* [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (10 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 15:55   ` Masami Hiramatsu
                     ` (2 more replies)
  2022-02-24 14:51 ` [PATCH v2 13/39] x86/livepatch: Validate " Peter Zijlstra
                   ` (27 subsequent siblings)
  39 siblings, 3 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Have ftrace_location() search the symbol for the __fentry__ location
when it isn't at func+0 and use this for {,un}register_ftrace_direct().

This avoids a whole bunch of assumptions about __fentry__ being at
func+0.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/trace/ftrace.c |   30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1578,7 +1578,24 @@ unsigned long ftrace_location_range(unsi
  */
 unsigned long ftrace_location(unsigned long ip)
 {
-	return ftrace_location_range(ip, ip);
+	struct dyn_ftrace *rec;
+	unsigned long offset;
+	unsigned long size;
+
+	rec = lookup_rec(ip, ip);
+	if (!rec) {
+		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
+			goto out;
+
+		if (!offset)
+			rec = lookup_rec(ip - offset, (ip - offset) + size);
+	}
+
+	if (rec)
+		return rec->ip;
+
+out:
+	return 0;
 }
 
 /**
@@ -5110,11 +5127,16 @@ int register_ftrace_direct(unsigned long
 	struct ftrace_func_entry *entry;
 	struct ftrace_hash *free_hash = NULL;
 	struct dyn_ftrace *rec;
-	int ret = -EBUSY;
+	int ret = -ENODEV;
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	/* See if there's a direct function at @ip already */
+	ret = -EBUSY;
 	if (ftrace_find_rec_direct(ip))
 		goto out_unlock;
 
@@ -5222,6 +5244,10 @@ int unregister_ftrace_direct(unsigned lo
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	entry = find_direct_entry(&ip, NULL);
 	if (!entry)
 		goto out_unlock;


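A hedged sketch of what this buys callers, assuming the IBT function
layout from this series (hypothetical function foo):

	/*
	 * foo+0: endbr64		<- the address kallsyms hands out
	 * foo+4: call __fentry__	<- the actual patchable site
	 *
	 * Before: ftrace_location(foo) returned 0, no record at foo+0.
	 * After:  ftrace_location(foo) returns foo+4, by widening the
	 *	   lookup to the whole symbol when the offset is 0.
	 */
	unsigned long ip = ftrace_location((unsigned long)foo);

register_ftrace_direct() and unregister_ftrace_direct() now run their
@ip through the same helper, so callers may pass sym+0 and still hit
the real __fentry__ site.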


* [PATCH v2 13/39] x86/livepatch: Validate __fentry__ location
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (11 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 23:02   ` Josh Poimboeuf
  2022-02-24 14:51 ` [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice Peter Zijlstra
                   ` (26 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Currently livepatch assumes __fentry__ lives at func+0, which is most
likely untrue with IBT on. Instead make it use ftrace_location() by
default, which both validates and finds the actual ip if there is one
anywhere in the same symbol.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/livepatch.h |    9 +++++++++
 kernel/livepatch/patch.c         |    2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -133,7 +133,7 @@ static void notrace klp_ftrace_handler(u
 #ifndef klp_get_ftrace_location
 static unsigned long klp_get_ftrace_location(unsigned long faddr)
 {
-	return faddr;
+	return ftrace_location(faddr);
 }
 #endif
 




* [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (12 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 13/39] x86/livepatch: Validate " Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 15:36   ` Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions Peter Zijlstra
                   ` (25 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

The return trampoline must not use an indirect branch to return; while
that preserves the RSB, it is fundamentally incompatible with IBT. So
revert commit 194ec3418486 ("function-graph/x86: Replace unbalanced ret
with jmp").

And since ftrace_stub no longer is a plain RET, don't use it to copy
from. Since RET is a trivial instruction, poke it directly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/ftrace.c    |    9 ++-------
 arch/x86/kernel/ftrace_64.S |   11 +++++++----
 kernel/trace/ftrace.c       |   10 ++++++++++
 3 files changed, 19 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -316,12 +316,12 @@ create_trampoline(struct ftrace_ops *ops
 	unsigned long offset;
 	unsigned long npages;
 	unsigned long size;
-	unsigned long retq;
 	unsigned long *ptr;
 	void *trampoline;
 	void *ip;
 	/* 48 8b 15 <offset> is movq <offset>(%rip), %rdx */
 	unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
+	unsigned const char retq[] = { RET_INSN_OPCODE, INT3_INSN_OPCODE };
 	union ftrace_op_code_union op_ptr;
 	int ret;
 
@@ -359,12 +359,7 @@ create_trampoline(struct ftrace_ops *ops
 		goto fail;
 
 	ip = trampoline + size;
-
-	/* The trampoline ends with ret(q) */
-	retq = (unsigned long)ftrace_stub;
-	ret = copy_from_kernel_nofault(ip, (void *)retq, RET_SIZE);
-	if (WARN_ON(ret < 0))
-		goto fail;
+	memcpy(ip, retq, RET_SIZE);
 
 	/* No need to test direct calls on created trampolines */
 	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -176,10 +176,10 @@ SYM_FUNC_END(ftrace_caller);
 SYM_FUNC_START(ftrace_epilogue)
 /*
  * This is weak to keep gas from relaxing the jumps.
- * It is also used to copy the RET for trampolines.
  */
 SYM_INNER_LABEL_ALIGN(ftrace_stub, SYM_L_WEAK)
 	UNWIND_HINT_FUNC
+	ENDBR
 	RET
 SYM_FUNC_END(ftrace_epilogue)
 
@@ -284,6 +284,7 @@ SYM_FUNC_START(__fentry__)
 	jnz trace
 
 SYM_INNER_LABEL(ftrace_stub, SYM_L_GLOBAL)
+	ENDBR
 	RET
 
 trace:
@@ -316,10 +317,12 @@ SYM_FUNC_START(return_to_handler)
 
 	call ftrace_return_to_handler
 
-	movq %rax, %rdi
+	movq %rax, 16(%rsp)
 	movq 8(%rsp), %rdx
 	movq (%rsp), %rax
-	addq $24, %rsp
-	JMP_NOSPEC rdi
+
+	addq $16, %rsp
+	UNWIND_HINT_FUNC
+	RET
 SYM_FUNC_END(return_to_handler)
 #endif


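The return_to_handler change deserves spelling out; a hedged sketch of
the new exit path (stack offsets per the code above):

	call ftrace_return_to_handler	// original return address in rax
	movq %rax, 16(%rsp)		// park it in the third save slot
	movq 8(%rsp), %rdx		// restore the clobbered regs
	movq (%rsp), %rax
	addq $16, %rsp			// leave only the parked slot
	RET				// pops it: a plain near return

The indirect JMP_NOSPEC %rdi kept the RSB balanced, but its target is
an arbitrary return address inside some caller, which will never carry
an ENDBR; a plain RET is the only IBT-compatible way back.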


* [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (13 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:58   ` Kees Cook
                     ` (2 more replies)
  2022-02-24 14:51 ` [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline Peter Zijlstra
                   ` (24 subsequent siblings)
  39 siblings, 3 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

With IBT on, sym+0 is no longer the __fentry__ site: the compiler puts
a 4-byte ENDBR there and the __fentry__ call follows it.

NOTE: the architecture has a special case and *does* allow placing an
INT3 breakpoint over ENDBR, in which case #BP has precedence over #CP,
and as such we don't need to disallow probing these instructions.

NOTE: irrespective of the above, there is a complication in that
direct branches to functions are rewritten to not execute ENDBR, so
any breakpoint thereon might miss lots of actual function executions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/kprobes/core.c |   11 +++++++++++
 kernel/kprobes.c               |   15 ++++++++++++---
 2 files changed, 23 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
 {
 	return 0;
 }
+
+bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+	return offset <= 4*HAS_KERNEL_IBT;
+}
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -67,10 +67,19 @@ static bool kprobes_all_disarmed;
 static DEFINE_MUTEX(kprobe_mutex);
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
 
-kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
-					unsigned int __unused)
+kprobe_opcode_t * __weak kprobe_lookup_name(const char *name, unsigned int offset)
 {
-	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
+	kprobe_opcode_t *addr = NULL;
+
+	addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
+#ifdef CONFIG_KPROBES_ON_FTRACE
+	if (addr && !offset) {
+		unsigned long faddr = ftrace_location((unsigned long)addr);
+		if (faddr)
+			addr = (kprobe_opcode_t *)faddr;
+	}
+#endif
+	return addr;
 }
 
 /*




* [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (14 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 23:37   ` Josh Poimboeuf
  2022-02-24 14:51 ` [PATCH v2 17/39] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
                   ` (23 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

With IBT enabled builds we need ENDBR instructions at indirect jump
target sites. Since we start execution of the JIT'ed code through an
indirect jump, the very first instruction needs to be an ENDBR.

Similarly, since eBPF tail-calls use indirect branches, their landing
site needs to be an ENDBR too.

The trampolines need similar adjustment.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/net/bpf_jit_comp.c |   37 +++++++++++++++++++++++++++++++------
 kernel/bpf/trampoline.c     |   20 ++++----------------
 2 files changed, 35 insertions(+), 22 deletions(-)

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -46,6 +46,12 @@ static u8 *emit_code(u8 *ptr, u32 bytes,
 #define EMIT4_off32(b1, b2, b3, b4, off) \
 	do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
 
+#ifdef CONFIG_X86_KERNEL_IBT
+#define EMIT_ENDBR()	EMIT(gen_endbr(), 4)
+#else
+#define EMIT_ENDBR()
+#endif
+
 static bool is_imm8(int value)
 {
 	return value <= 127 && value >= -128;
@@ -241,7 +247,7 @@ struct jit_context {
 /* Number of bytes emit_patch() needs to generate instructions */
 #define X86_PATCH_SIZE		5
 /* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET	11
+#define X86_TAIL_CALL_OFFSET	(11 + 4*HAS_KERNEL_IBT)
 
 static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
 {
@@ -286,16 +292,21 @@ static void emit_prologue(u8 **pprog, u3
 	/* BPF trampoline can be made to work without these nops,
 	 * but let's waste 5 bytes for now and optimize later
 	 */
-	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
+	EMIT_ENDBR();
+	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);		/*    =  5 */
 	prog += X86_PATCH_SIZE;
 	if (!ebpf_from_cbpf) {
 		if (tail_call_reachable && !is_subprog)
-			EMIT2(0x31, 0xC0); /* xor eax, eax */
+			EMIT2(0x31, 0xC0); /* xor eax, eax */	/* +2 =  7 */
 		else
 			EMIT2(0x66, 0x90); /* nop2 */
 	}
-	EMIT1(0x55);             /* push rbp */
-	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+	EMIT1(0x55);             /* push rbp */			/* +1 =  8 */
+	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */		/* +3 = 11 */
+
+	/* X86_TAIL_CALL_OFFSET is here */
+	EMIT_ENDBR();
+
 	/* sub rsp, rounded_stack_depth */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
@@ -339,9 +350,18 @@ static int __bpf_arch_text_poke(void *ip
 	u8 *prog;
 	int ret;
 
+#ifdef CONFIG_X86_KERNEL_IBT
+	if (is_endbr(*(u32 *)ip))
+		ip += 4;
+#endif
+
 	memcpy(old_insn, nop_insn, X86_PATCH_SIZE);
 	if (old_addr) {
 		prog = old_insn;
+#ifdef CONFIG_X86_KERNEL_IBT
+		if (is_endbr(*(u32 *)old_addr))
+			old_addr += 4;
+#endif
 		ret = t == BPF_MOD_CALL ?
 		      emit_call(&prog, old_addr, ip) :
 		      emit_jump(&prog, old_addr, ip);
@@ -352,6 +372,10 @@ static int __bpf_arch_text_poke(void *ip
 	memcpy(new_insn, nop_insn, X86_PATCH_SIZE);
 	if (new_addr) {
 		prog = new_insn;
+#ifdef CONFIG_X86_KERNEL_IBT
+		if (is_endbr(*(u32 *)new_addr))
+			new_addr += 4;
+#endif
 		ret = t == BPF_MOD_CALL ?
 		      emit_call(&prog, new_addr, ip) :
 		      emit_jump(&prog, new_addr, ip);
@@ -2028,10 +2052,11 @@ int arch_prepare_bpf_trampoline(struct b
 		/* skip patched call instruction and point orig_call to actual
 		 * body of the kernel function.
 		 */
-		orig_call += X86_PATCH_SIZE;
+		orig_call += X86_PATCH_SIZE + 4*HAS_KERNEL_IBT;
 
 	prog = image;
 
+	EMIT_ENDBR();
 	EMIT1(0x55);		 /* push rbp */
 	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
 	EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -117,18 +117,6 @@ static void bpf_trampoline_module_put(st
 	tr->mod = NULL;
 }
 
-static int is_ftrace_location(void *ip)
-{
-	long addr;
-
-	addr = ftrace_location((long)ip);
-	if (!addr)
-		return 0;
-	if (WARN_ON_ONCE(addr != (long)ip))
-		return -EFAULT;
-	return 1;
-}
-
 static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
 {
 	void *ip = tr->func.addr;
@@ -160,12 +148,12 @@ static int modify_fentry(struct bpf_tram
 static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
 {
 	void *ip = tr->func.addr;
+	unsigned long faddr;
 	int ret;
 
-	ret = is_ftrace_location(ip);
-	if (ret < 0)
-		return ret;
-	tr->func.ftrace_managed = ret;
+	faddr = ftrace_location((unsigned long)ip);
+	if (faddr)
+		tr->func.ftrace_managed = true;
 
 	if (bpf_trampoline_module_get(tr))
 		return -ENOENT;


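Spelled out, the prologue byte layout with IBT enabled (offsets per the
EMIT comments above; hedged sketch):

	 0: endbr64		// 4 bytes; landing pad for the indirect
				//   jump that enters the JIT'ed image
	 4: nop5		// 5 bytes; patchable call site
	 9: xor eax, eax	// 2 bytes; tail-call counter init
	11: push rbp		// 1 byte
	12: mov rbp, rsp	// 3 bytes
	15: endbr64		// landing pad for tail calls
	19: sub rsp, ...	// body continues as before

hence X86_TAIL_CALL_OFFSET = 11 + 4*HAS_KERNEL_IBT = 15: a tail call
skips the counter setup and the fentry nop, but being an indirect jump
it must still land on an ENDBR.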


* [PATCH v2 17/39] x86/ibt,ftrace: Add ENDBR to samples/ftrace
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (15 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 14:51 ` [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
                   ` (22 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 samples/ftrace/ftrace-direct-modify.c       |    5 +++++
 samples/ftrace/ftrace-direct-multi-modify.c |   10 +++++++---
 samples/ftrace/ftrace-direct-multi.c        |    5 ++++-
 samples/ftrace/ftrace-direct-too.c          |    3 +++
 samples/ftrace/ftrace-direct.c              |    3 +++
 5 files changed, 22 insertions(+), 4 deletions(-)

--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -24,20 +24,25 @@ static unsigned long my_ip = (unsigned l
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp1, @function\n"
 "	.globl		my_tramp1\n"
 "   my_tramp1:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	call my_direct_func1\n"
 "	leave\n"
 "	.size		my_tramp1, .-my_tramp1\n"
 	ASM_RET
+
 "	.type		my_tramp2, @function\n"
 "	.globl		my_tramp2\n"
 "   my_tramp2:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	call my_direct_func2\n"
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -22,11 +22,14 @@ extern void my_tramp2(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp1, @function\n"
 "	.globl		my_tramp1\n"
 "   my_tramp1:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
@@ -34,12 +37,13 @@ asm (
 "	call my_direct_func1\n"
 "	popq %rdi\n"
 "	leave\n"
-"	ret\n"
+	ASM_RET
 "	.size		my_tramp1, .-my_tramp1\n"
+
 "	.type		my_tramp2, @function\n"
-"\n"
 "	.globl		my_tramp2\n"
 "   my_tramp2:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
@@ -47,7 +51,7 @@ asm (
 "	call my_direct_func2\n"
 "	popq %rdi\n"
 "	leave\n"
-"	ret\n"
+	ASM_RET
 "	.size		my_tramp2, .-my_tramp2\n"
 "	.popsection\n"
 );
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -17,11 +17,14 @@ extern void my_tramp(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp, @function\n"
 "	.globl		my_tramp\n"
 "   my_tramp:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
@@ -29,7 +32,7 @@ asm (
 "	call my_direct_func\n"
 "	popq %rdi\n"
 "	leave\n"
-"	ret\n"
+	ASM_RET
 "	.size		my_tramp, .-my_tramp\n"
 "	.popsection\n"
 );
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -19,11 +19,14 @@ extern void my_tramp(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp, @function\n"
 "	.globl		my_tramp\n"
 "   my_tramp:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -16,11 +16,14 @@ extern void my_tramp(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp, @function\n"
 "	.globl		my_tramp\n"
 "   my_tramp:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (16 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 17/39] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-24 23:55   ` Josh Poimboeuf
                     ` (2 more replies)
  2022-02-24 14:51 ` [PATCH v2 19/39] x86: Disable IBT around firmware Peter Zijlstra
                   ` (21 subsequent siblings)
  39 siblings, 3 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

The bits required to make the hardware go. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.
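
For reference, the new behaviour can be steered from the kernel command
line via the ibt= parameter added below:

  ibt=off	# clear X86_FEATURE_IBT; ENDBR enforcement is never enabled
  ibt=warn	# report a missing ENDBR with a WARN instead of a BUG

The selftest takes an indirect jump to a label that deliberately lacks
ENDBR: when IBT works this raises #CP, the handler recognises
ibt_selftest_ip, clears %rax and resumes, and ibt_selftest() returns
true.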

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/cpu.h                  |    4 +
 arch/x86/include/asm/cpufeatures.h          |    1 
 arch/x86/include/asm/idtentry.h             |    5 ++
 arch/x86/include/asm/msr-index.h            |   20 +++++++-
 arch/x86/include/asm/traps.h                |    2 
 arch/x86/include/uapi/asm/processor-flags.h |    2 
 arch/x86/kernel/cpu/common.c                |   28 ++++++++++-
 arch/x86/kernel/idt.c                       |    4 +
 arch/x86/kernel/machine_kexec_64.c          |    2 
 arch/x86/kernel/traps.c                     |   70 ++++++++++++++++++++++++++++
 10 files changed, 136 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -7,6 +7,7 @@
 #include <linux/topology.h>
 #include <linux/nodemask.h>
 #include <linux/percpu.h>
+#include <asm/ibt.h>
 
 #ifdef CONFIG_SMP
 
@@ -72,4 +73,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x
 #else
 static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
 #endif
+
+extern __noendbr void cet_disable(void);
+
 #endif /* _ASM_X86_CPU_H */
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -387,6 +387,7 @@
 #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_IBT			(18*32+20) /* Indirect Branch Tracking */
 #define X86_FEATURE_AMX_BF16		(18*32+22) /* AMX bf16 Support */
 #define X86_FEATURE_AVX512_FP16		(18*32+23) /* AVX512 FP16 */
 #define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -617,6 +617,11 @@ DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_dou
 DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,	xenpv_exc_double_fault);
 #endif
 
+/* #CP */
+#ifdef CONFIG_X86_KERNEL_IBT
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
+#endif
+
 /* #VC */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 DECLARE_IDTENTRY_VC(X86_TRAP_VC,	exc_vmm_communication);
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -362,11 +362,29 @@
 #define MSR_ATOM_CORE_TURBO_RATIOS	0x0000066c
 #define MSR_ATOM_CORE_TURBO_VIDS	0x0000066d
 
-
 #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
 #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
 #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
 
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET			0x000006a0 /* user mode cet */
+#define MSR_IA32_S_CET			0x000006a2 /* kernel mode cet */
+#define CET_SHSTK_EN			BIT_ULL(0)
+#define CET_WRSS_EN			BIT_ULL(1)
+#define CET_ENDBR_EN			BIT_ULL(2)
+#define CET_LEG_IW_EN			BIT_ULL(3)
+#define CET_NO_TRACK_EN			BIT_ULL(4)
+#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
+#define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
+#define CET_SUPPRESS			BIT_ULL(10)
+#define CET_WAIT_ENDBR			BIT_ULL(11)
+
+#define MSR_IA32_PL0_SSP		0x000006a4 /* ring-0 shadow stack pointer */
+#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
+#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow stack pointer */
+#define MSR_IA32_PL3_SSP		0x000006a7 /* ring-3 shadow stack pointer */
+#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception shadow stack table */
+
 /* Hardware P state interface */
 #define MSR_PPERF			0x0000064e
 #define MSR_PERF_LIMIT_REASONS		0x0000064f
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -18,6 +18,8 @@ void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
 #endif
 
+extern bool ibt_selftest(void);
+
 #ifdef CONFIG_X86_F00F_BUG
 /* For handling the FOOF bug */
 void handle_invalid_op(struct pt_regs *regs);
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -59,6 +59,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
+#include <asm/traps.h>
 
 #include "cpu.h"
 
@@ -438,7 +439,8 @@ static __always_inline void setup_umip(s
 
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask =
-	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;
+	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
+	X86_CR4_FSGSBASE | X86_CR4_CET;
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
@@ -592,6 +594,29 @@ static __init int setup_disable_pku(char
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	u64 msr = CET_ENDBR_EN;
+
+	if (!HAS_KERNEL_IBT ||
+	    !cpu_feature_enabled(X86_FEATURE_IBT))
+		return;
+
+	wrmsrl(MSR_IA32_S_CET, msr);
+	cr4_set_bits(X86_CR4_CET);
+
+	if (!ibt_selftest()) {
+		pr_err("IBT selftest: Failed!\n");
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+	}
+}
+
+__noendbr void cet_disable(void)
+{
+	if (cpu_feature_enabled(X86_FEATURE_IBT))
+		wrmsrl(MSR_IA32_S_CET, 0);
+}
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1709,6 +1734,7 @@ static void identify_cpu(struct cpuinfo_
 
 	x86_init_rdrand(c);
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -104,6 +104,10 @@ static const __initconst struct idt_data
 	ISTG(X86_TRAP_MC,		asm_exc_machine_check, IST_INDEX_MCE),
 #endif
 
+#ifdef CONFIG_X86_KERNEL_IBT
+	INTG(X86_TRAP_CP,		asm_exc_control_protection),
+#endif
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 	ISTG(X86_TRAP_VC,		asm_exc_vmm_communication, IST_INDEX_VC),
 #endif
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -27,6 +27,7 @@
 #include <asm/kexec-bzimage64.h>
 #include <asm/setup.h>
 #include <asm/set_memory.h>
+#include <asm/cpu.h>
 
 #ifdef CONFIG_ACPI
 /*
@@ -310,6 +311,7 @@ void machine_kexec(struct kimage *image)
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 	hw_breakpoint_disable();
+	cet_disable();
 
 	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -210,6 +210,76 @@ DEFINE_IDTENTRY(exc_overflow)
 	do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
 }
 
+#ifdef CONFIG_X86_KERNEL_IBT
+
+static __ro_after_init bool ibt_fatal = true;
+
+extern const unsigned long ibt_selftest_ip; /* defined in asm below */
+
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
+		pr_err("Unexpected #CP\n");
+		BUG();
+	}
+
+	if (WARN_ON_ONCE(user_mode(regs) || error_code != 3))
+		return;
+
+	if (unlikely(regs->ip == ibt_selftest_ip)) {
+		regs->ax = 0;
+		return;
+	}
+
+	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+	if (!ibt_fatal) {
+		printk(KERN_DEFAULT CUT_HERE);
+		__warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL);
+		return;
+	}
+	BUG();
+}
+
+bool ibt_selftest(void)
+{
+	unsigned long ret;
+
+	asm ("1: lea 2f(%%rip), %%rax\n\t"
+	     ANNOTATE_RETPOLINE_SAFE
+	     "   jmp *%%rax\n\t"
+	     ASM_REACHABLE
+	     ANNOTATE_NOENDBR
+	     "2: nop\n\t"
+
+	     /* unsigned ibt_selftest_ip = 2b */
+	     ".pushsection .rodata,\"a\"\n\t"
+	     ".align 8\n\t"
+	     ".type ibt_selftest_ip, @object\n\t"
+	     ".size ibt_selftest_ip, 8\n\t"
+	     "ibt_selftest_ip:\n\t"
+	     ".quad 2b\n\t"
+	     ".popsection\n\t"
+
+	     : "=a" (ret) : : "memory");
+
+	return !ret;
+}
+
+static int __init ibt_setup(char *str)
+{
+	if (!strcmp(str, "off"))
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+
+	if (!strcmp(str, "warn"))
+		ibt_fatal = false;
+
+	return 1;
+}
+
+__setup("ibt=", ibt_setup);
+
+#endif /* CONFIG_X86_KERNEL_IBT */
+
 #ifdef CONFIG_X86_F00F_BUG
 void handle_invalid_op(struct pt_regs *regs)
 #else



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 19/39] x86: Disable IBT around firmware
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (17 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  1:10   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
                   ` (20 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Assume firmware isn't IBT-clean and disable it across calls.
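
The resulting pattern, as used in the EFI and APM paths below, is:

	u64 ibt;

	ibt = ibt_save();	/* clears CET_ENDBR_EN in MSR_IA32_S_CET */
	/* ... call out to firmware ... */
	ibt_restore(ibt);	/* puts the saved ENDBR_EN state back */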

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/efi.h   |    9 +++++++--
 arch/x86/include/asm/ibt.h   |    6 ++++++
 arch/x86/kernel/apm_32.c     |    7 +++++++
 arch/x86/kernel/cpu/common.c |   28 ++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -7,6 +7,7 @@
 #include <asm/tlb.h>
 #include <asm/nospec-branch.h>
 #include <asm/mmu_context.h>
+#include <asm/ibt.h>
 #include <linux/build_bug.h>
 #include <linux/kernel.h>
 #include <linux/pgtable.h>
@@ -120,8 +121,12 @@ extern asmlinkage u64 __efi_call(void *f
 	efi_enter_mm();							\
 })
 
-#define arch_efi_call_virt(p, f, args...)				\
-	efi_call((void *)p->f, args)					\
+#define arch_efi_call_virt(p, f, args...) ({				\
+	u64 ret, ibt = ibt_save();					\
+	ret = efi_call((void *)p->f, args);				\
+	ibt_restore(ibt);						\
+	ret;								\
+})
 
 #define arch_efi_call_virt_teardown()					\
 ({									\
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -37,6 +37,9 @@ static inline bool is_endbr(unsigned int
 	return val == gen_endbr();
 }
 
+extern __noendbr u64 ibt_save(void);
+extern __noendbr void ibt_restore(u64 save);
+
 #else /* __ASSEMBLY__ */
 
 #ifdef CONFIG_X86_64
@@ -57,6 +60,9 @@ static inline bool is_endbr(unsigned int
 
 static inline bool is_endbr(unsigned int val) { return false; }
 
+static inline u64 ibt_save(void) { return 0; }
+static inline void ibt_restore(u64 save) { }
+
 #else /* __ASSEMBLY__ */
 
 #define ENDBR
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -232,6 +232,7 @@
 #include <asm/paravirt.h>
 #include <asm/reboot.h>
 #include <asm/nospec-branch.h>
+#include <asm/ibt.h>
 
 #if defined(CONFIG_APM_DISPLAY_BLANK) && defined(CONFIG_VT)
 extern int (*console_blank_hook)(int);
@@ -598,6 +599,7 @@ static long __apm_bios_call(void *_call)
 	struct desc_struct	save_desc_40;
 	struct desc_struct	*gdt;
 	struct apm_bios_call	*call = _call;
+	u64			ibt;
 
 	cpu = get_cpu();
 	BUG_ON(cpu != 0);
@@ -607,11 +609,13 @@ static long __apm_bios_call(void *_call)
 
 	apm_irq_save(flags);
 	firmware_restrict_branch_speculation_start();
+	ibt = ibt_save();
 	APM_DO_SAVE_SEGS;
 	apm_bios_call_asm(call->func, call->ebx, call->ecx,
 			  &call->eax, &call->ebx, &call->ecx, &call->edx,
 			  &call->esi);
 	APM_DO_RESTORE_SEGS;
+	ibt_restore(ibt);
 	firmware_restrict_branch_speculation_end();
 	apm_irq_restore(flags);
 	gdt[0x40 / 8] = save_desc_40;
@@ -676,6 +680,7 @@ static long __apm_bios_call_simple(void
 	struct desc_struct	save_desc_40;
 	struct desc_struct	*gdt;
 	struct apm_bios_call	*call = _call;
+	u64			ibt;
 
 	cpu = get_cpu();
 	BUG_ON(cpu != 0);
@@ -685,10 +690,12 @@ static long __apm_bios_call_simple(void
 
 	apm_irq_save(flags);
 	firmware_restrict_branch_speculation_start();
+	ibt = ibt_save();
 	APM_DO_SAVE_SEGS;
 	error = apm_bios_call_simple_asm(call->func, call->ebx, call->ecx,
 					 &call->eax);
 	APM_DO_RESTORE_SEGS;
+	ibt_restore(ibt);
 	firmware_restrict_branch_speculation_end();
 	apm_irq_restore(flags);
 	gdt[0x40 / 8] = save_desc_40;
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -594,6 +594,34 @@ static __init int setup_disable_pku(char
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+#ifdef CONFIG_X86_KERNEL_IBT
+
+__noendbr u64 ibt_save(void)
+{
+	u64 msr = 0;
+
+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+		rdmsrl(MSR_IA32_S_CET, msr);
+		wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
+	}
+
+	return msr;
+}
+
+__noendbr void ibt_restore(u64 save)
+{
+	u64 msr;
+
+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+		rdmsrl(MSR_IA32_S_CET, msr);
+		msr &= ~CET_ENDBR_EN;
+		msr |= (save & CET_ENDBR_EN);
+		wrmsrl(MSR_IA32_S_CET, msr);
+	}
+}
+
+#endif
+
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 {
 	u64 msr = CET_ENDBR_EN;



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (18 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 19/39] x86: Disable IBT around firmware Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  1:11   ` Kees Cook
  2022-02-24 14:51 ` [PATCH v2 21/39] x86/ibt: Annotate text references Peter Zijlstra
                   ` (19 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Retpoline and IBT are mutually exclusive. IBT tracks indirect
branches (JMP/CALL *%reg), while retpoline avoids them by design.

Demote to LFENCE on IBT-enabled hardware.
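
For illustration, the two thunk flavours look roughly like this
(simplified from arch/x86/lib/retpoline.S):

	/* generic retpoline: no indirect branch at all */
	call	1f
0:	pause
	lfence
	jmp	0b
1:	mov	%rax, (%rsp)
	ret

	/* the LFENCE aka retpoline,amd variant: a serialized indirect branch */
	lfence
	jmp	*%rax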

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/cpu/bugs.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -937,6 +937,11 @@ static void __init spectre_v2_select_mit
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
 	retpoline_amd:
 		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+			if (HAS_KERNEL_IBT &&
+			    boot_cpu_has(X86_FEATURE_IBT)) {
+				pr_err("Spectre mitigation: LFENCE not serializing, generic retpoline not available due to IBT, switching to none\n");
+				return;
+			}
 			pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
 			goto retpoline_generic;
 		}
@@ -945,6 +950,26 @@ static void __init spectre_v2_select_mit
 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
 	} else {
 	retpoline_generic:
+		/*
+		 *  Full retpoline is incompatible with IBT, demote to LFENCE.
+		 */
+		if (HAS_KERNEL_IBT &&
+		    boot_cpu_has(X86_FEATURE_IBT)) {
+			switch (cmd) {
+			case SPECTRE_V2_CMD_FORCE:
+			case SPECTRE_V2_CMD_AUTO:
+			case SPECTRE_V2_CMD_RETPOLINE:
+				/* silent for auto select */
+				break;
+
+			default:
+				/* warn when 'demoting' an explicit selection */
+				pr_warn("Spectre mitigation: Switching to LFENCE due to IBT\n");
+				break;
+			}
+
+			goto retpoline_amd;
+		}
 		mode = SPECTRE_V2_RETPOLINE_GENERIC;
 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
 	}



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 21/39] x86/ibt: Annotate text references
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (19 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
@ 2022-02-24 14:51 ` Peter Zijlstra
  2022-02-25  0:47   ` Josh Poimboeuf
  2022-02-24 14:52 ` [PATCH v2 22/39] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
                   ` (18 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:51 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Annotate away some of the generic code references. These are cases
where we take the address of a symbol for exception handling or return
addresses (e.g. context switch).
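
ANNOTATE_NOENDBR marks the next instruction as a known non-ENDBR
location for objtool; a sketch of its shape, assuming the definition
added earlier in the series:

	.macro ANNOTATE_NOENDBR
	.Lhere_\@:
	.pushsection .discard.noendbr
	.quad	.Lhere_\@
	.popsection
	.endm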

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S            |    9 +++++++++
 arch/x86/entry/entry_64_compat.S     |    1 +
 arch/x86/kernel/alternative.c        |    4 +++-
 arch/x86/kernel/head_64.S            |    4 ++++
 arch/x86/kernel/kprobes/core.c       |    1 +
 arch/x86/kernel/relocate_kernel_64.S |    2 ++
 arch/x86/lib/error-inject.c          |    2 ++
 arch/x86/lib/retpoline.S             |    2 ++
 8 files changed, 24 insertions(+), 1 deletion(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -277,6 +277,7 @@ SYM_FUNC_END(__switch_to_asm)
 .pushsection .text, "ax"
 SYM_CODE_START(ret_from_fork)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR // copy_thread
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
@@ -563,12 +564,14 @@ SYM_CODE_END(\asmsym)
 	.align 16
 	.globl __irqentry_text_start
 __irqentry_text_start:
+	ANNOTATE_NOENDBR // unwinders
 
 #include <asm/idtentry.h>
 
 	.align 16
 	.globl __irqentry_text_end
 __irqentry_text_end:
+	ANNOTATE_NOENDBR
 
 SYM_CODE_START_LOCAL(common_interrupt_return)
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
@@ -646,6 +651,7 @@ SYM_INNER_LABEL_ALIGN(native_iret, SYM_L
 #endif
 
 SYM_INNER_LABEL(native_irq_return_iret, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR // exc_double_fault
 	/*
 	 * This may fault.  Non-paranoid faults on return to userspace are
 	 * handled by fixup_bad_iret.  These include #SS, #GP, and #NP.
@@ -740,6 +746,7 @@ SYM_FUNC_START(asm_load_gs_index)
 	FRAME_BEGIN
 	swapgs
 .Lgs_change:
+	ANNOTATE_NOENDBR // error_entry
 	movl	%edi, %gs
 2:	ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
 	swapgs
@@ -1317,6 +1324,7 @@ SYM_CODE_START(asm_exc_nmi)
 #endif
 
 repeat_nmi:
+	ANNOTATE_NOENDBR // this code
 	/*
 	 * If there was a nested NMI, the first NMI's iret will return
 	 * here. But NMIs are still enabled and we can take another
@@ -1345,6 +1353,7 @@ SYM_CODE_START(asm_exc_nmi)
 	.endr
 	subq	$(5*8), %rsp
 end_repeat_nmi:
+	ANNOTATE_NOENDBR // this code
 
 	/*
 	 * Everything below this point can be preempted by a nested NMI.
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -148,6 +148,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_af
 	popfq
 	jmp	.Lsysenter_flags_fixed
 SYM_INNER_LABEL(__end_entry_SYSENTER_compat, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR // is_sysenter_singlestep
 SYM_CODE_END(entry_SYSENTER_compat)
 
 /*
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -713,6 +713,7 @@ asm (
 "	.pushsection	.init.text, \"ax\", @progbits\n"
 "	.type		int3_magic, @function\n"
 "int3_magic:\n"
+	ANNOTATE_NOENDBR
 "	movl	$1, (%" _ASM_ARG1 ")\n"
 	ASM_RET
 "	.size		int3_magic, .-int3_magic\n"
@@ -757,7 +758,8 @@ static void __init int3_selftest(void)
 	 * then trigger the INT3, padded with NOPs to match a CALL instruction
 	 * length.
 	 */
-	asm volatile ("1: int3; nop; nop; nop; nop\n\t"
+	asm volatile (ANNOTATE_NOENDBR
+		      "1: int3; nop; nop; nop; nop\n\t"
 		      ".pushsection .init.data,\"aw\"\n\t"
 		      ".align " __ASM_SEL(4, 8) "\n\t"
 		      ".type int3_selftest_ip, @object\n\t"
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -99,6 +99,7 @@ SYM_CODE_END(startup_64)
 
 SYM_CODE_START(secondary_startup_64)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
 	 * and someone has loaded a mapped page table.
@@ -127,6 +128,7 @@ SYM_CODE_START(secondary_startup_64)
 	 */
 SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 
 	/*
 	 * Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -192,6 +194,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_
 	jmp	*%rax
 1:
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR // above
 
 	/*
 	 * We must switch to a new descriptor in kernel space for the GDT
@@ -299,6 +302,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_
 	pushq	%rax		# target address in negative space
 	lretq
 .Lafter_lret:
+	ANNOTATE_NOENDBR
 SYM_CODE_END(secondary_startup_64)
 
 #include "verify_cpu.S"
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1023,6 +1023,7 @@ asm(
 	".type __kretprobe_trampoline, @function\n"
 	"__kretprobe_trampoline:\n"
 #ifdef CONFIG_X86_64
+	ANNOTATE_NOENDBR
 	/* Push a fake return address to tell the unwinder it's a kretprobe. */
 	"	pushq $__kretprobe_trampoline\n"
 	UNWIND_HINT_FUNC
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -42,6 +42,7 @@
 	.code64
 SYM_CODE_START_NOALIGN(relocate_kernel)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	/*
 	 * %rdi indirection_page
 	 * %rsi page_list
@@ -215,6 +216,7 @@ SYM_CODE_END(identity_mapped)
 
 SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR // RET target, above
 	movq	RSP(%r8), %rsp
 	movq	CR4(%r8), %rax
 	movq	%rax, %cr4
--- a/arch/x86/lib/error-inject.c
+++ b/arch/x86/lib/error-inject.c
@@ -3,6 +3,7 @@
 #include <linux/linkage.h>
 #include <linux/error-injection.h>
 #include <linux/kprobes.h>
+#include <linux/objtool.h>
 
 asmlinkage void just_return_func(void);
 
@@ -11,6 +12,7 @@ asm(
 	".type just_return_func, @function\n"
 	".globl just_return_func\n"
 	"just_return_func:\n"
+		ANNOTATE_NOENDBR
 		ASM_RET
 	".size just_return_func, .-just_return_func\n"
 );
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -12,6 +12,8 @@
 
 	.section .text.__x86.indirect_thunk
 
+	ANNOTATE_NOENDBR // apply_retpolines
+
 .macro RETPOLINE reg
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call    .Ldo_rop_\@



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 22/39] x86/ibt,ftrace: Annotate ftrace code patching
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (20 preceding siblings ...)
  2022-02-24 14:51 ` [PATCH v2 21/39] x86/ibt: Annotate text references Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 23/39] x86/ibt,sev: Annotations Peter Zijlstra
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

These are code patching sites, not indirect targets.
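
The address of each such label is taken so ftrace can rewrite the code
behind it at runtime, roughly like arch/x86/kernel/ftrace.c does:

	unsigned long ip = (unsigned long)(&ftrace_call);
	const char *new;

	new = ftrace_call_replace(ip, (unsigned long)func);
	text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL);

Without the annotation, objtool's IBT validation would see an
address-taken code label and demand an ENDBR there, which is exactly
what a patch site must not grow.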

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/ftrace_64.S |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -145,6 +145,7 @@ SYM_FUNC_START(ftrace_caller)
 	movq %rcx, RSP(%rsp)
 
 SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	/* Load the ftrace_ops into the 3rd parameter */
 	movq function_trace_op(%rip), %rdx
 
@@ -155,6 +156,7 @@ SYM_INNER_LABEL(ftrace_caller_op_ptr, SY
 	movq $0, CS(%rsp)
 
 SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	call ftrace_stub
 
 	/* Handlers can change the RIP */
@@ -169,6 +171,7 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBA
 	 * layout here.
 	 */
 SYM_INNER_LABEL(ftrace_caller_end, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 
 	jmp ftrace_epilogue
 SYM_FUNC_END(ftrace_caller);
@@ -192,6 +195,7 @@ SYM_FUNC_START(ftrace_regs_caller)
 	/* save_mcount_regs fills in first two parameters */
 
 SYM_INNER_LABEL(ftrace_regs_caller_op_ptr, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	/* Load the ftrace_ops into the 3rd parameter */
 	movq function_trace_op(%rip), %rdx
 
@@ -221,6 +225,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_op_pt
 	leaq (%rsp), %rcx
 
 SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	call ftrace_stub
 
 	/* Copy flags back to SS, to restore them */
@@ -248,6 +253,7 @@ SYM_INNER_LABEL(ftrace_regs_call, SYM_L_
 	 */
 	testq	%rax, %rax
 SYM_INNER_LABEL(ftrace_regs_caller_jmp, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	jnz	1f
 
 	restore_mcount_regs
@@ -261,6 +267,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_jmp,
 	 * to the return.
 	 */
 SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	jmp ftrace_epilogue
 
 	/* Swap the flags with orig_rax */



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 23/39] x86/ibt,sev: Annotations
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (21 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 22/39] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 24/39] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

No IBT on AMD so far. Probably correct, who knows.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S        |    1 +
 arch/x86/entry/entry_64_compat.S |    1 +
 arch/x86/kernel/head_64.S        |    2 ++
 3 files changed, 4 insertions(+)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -95,6 +95,7 @@ SYM_CODE_START(entry_SYSCALL_64)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER_DS				/* pt_regs->ss */
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -214,6 +214,7 @@ SYM_CODE_START(entry_SYSCALL_compat)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER32_DS		/* pt_regs->ss */
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -332,6 +332,7 @@ SYM_CODE_END(start_cpu0)
  */
 SYM_CODE_START_NOALIGN(vc_boot_ghcb)
 	UNWIND_HINT_IRET_REGS offset=8
+	ENDBR
 
 	/* Build pt_regs */
 	PUSH_AND_CLEAR_REGS
@@ -439,6 +440,7 @@ SYM_CODE_END(early_idt_handler_common)
  */
 SYM_CODE_START_NOALIGN(vc_no_ghcb)
 	UNWIND_HINT_IRET_REGS offset=8
+	ENDBR
 
 	/* Build pt_regs */
 	PUSH_AND_CLEAR_REGS



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 24/39] x86/text-patching: Make text_gen_insn() IBT aware
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (22 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 23/39] x86/ibt,sev: Annotations Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-25  0:49   ` Josh Poimboeuf
  2022-02-24 14:52 ` [PATCH v2 25/39] x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch() Peter Zijlstra
                   ` (15 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Make sure we don't generate direct JMP/CALL instructions to an ENDBR
instruction (which might be poison).
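
A caller sketch (the CALL_INSN_* constants and text_poke_bp() are the
existing text-patching API; 'addr' and 'func' are placeholders):

	/* Emit a direct call at 'addr' targeting 'func'; if 'func'
	 * starts with ENDBR, text_gen_insn() now aims 4 bytes past
	 * it, since that ENDBR may later be turned into poison. */
	void *insn = text_gen_insn(CALL_INSN_OPCODE, addr, func);

	text_poke_bp(addr, insn, CALL_INSN_SIZE, NULL);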

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <linux/stddef.h>
 #include <asm/ptrace.h>
+#include <asm/ibt.h>
 
 struct paravirt_patch_site;
 #ifdef CONFIG_PARAVIRT
@@ -101,13 +102,26 @@ void *text_gen_insn(u8 opcode, const voi
 	static union text_poke_insn insn; /* per instance */
 	int size = text_opcode_size(opcode);
 
+	/*
+	 * Hide the addresses to avoid the compiler folding in constants when
+	 * referencing code, these can mess up annotations like
+	 * ANNOTATE_NOENDBR.
+	 */
+	OPTIMIZER_HIDE_VAR(addr);
+	OPTIMIZER_HIDE_VAR(dest);
+
+#ifdef CONFIG_X86_KERNEL_IBT
+	if (is_endbr(*(u32 *)dest))
+		dest += 4;
+#endif
+
 	insn.opcode = opcode;
 
 	if (size > 1) {
 		insn.disp = (long)dest - (long)(addr + size);
 		if (size == 2) {
 			/*
-			 * Ensure that for JMP9 the displacement
+			 * Ensure that for JMP8 the displacement
 			 * actually fits the signed byte.
 			 */
 			BUG_ON((insn.disp >> 31) != (insn.disp >> 7));



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 25/39] x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (23 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 24/39] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 26/39] x86/entry: Cleanup PARAVIRT Peter Zijlstra
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Now that writing direct JMP/CALL instructions is non-trivial, ensure
we only have a single copy of all the relevant magic.
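
Callers that want the instruction generated into their own buffer, as
paravirt_patch_call() below does, use the new helper directly:

	u8 buf[CALL_INSN_SIZE];

	__text_gen_insn(buf, CALL_INSN_OPCODE, addr, target, sizeof(buf));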

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |   20 ++++++++++++++------
 arch/x86/kernel/paravirt.c           |   23 +++--------------------
 2 files changed, 17 insertions(+), 26 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -97,16 +97,18 @@ union text_poke_insn {
 };
 
 static __always_inline
-void *text_gen_insn(u8 opcode, const void *addr, const void *dest)
+void __text_gen_insn(void *buf, u8 opcode, const void *addr, const void *dest, int size)
 {
-	static union text_poke_insn insn; /* per instance */
-	int size = text_opcode_size(opcode);
+	union text_poke_insn *insn = buf;
+
+	BUG_ON(size < text_opcode_size(opcode));
 
 	/*
 	 * Hide the addresses to avoid the compiler folding in constants when
 	 * referencing code, these can mess up annotations like
 	 * ANNOTATE_NOENDBR.
 	 */
+	OPTIMIZER_HIDE_VAR(insn);
 	OPTIMIZER_HIDE_VAR(addr);
 	OPTIMIZER_HIDE_VAR(dest);
 
@@ -115,19 +117,25 @@ void *text_gen_insn(u8 opcode, const voi
 		dest += 4;
 #endif
 
-	insn.opcode = opcode;
+	insn->opcode = opcode;
 
 	if (size > 1) {
-		insn.disp = (long)dest - (long)(addr + size);
+		insn->disp = (long)dest - (long)(addr + size);
 		if (size == 2) {
 			/*
 			 * Ensure that for JMP8 the displacement
 			 * actually fits the signed byte.
 			 */
-			BUG_ON((insn.disp >> 31) != (insn.disp >> 7));
+			BUG_ON((insn->disp >> 31) != (insn->disp >> 7));
 		}
 	}
+}
 
+static __always_inline
+void *text_gen_insn(u8 opcode, const void *addr, const void *dest)
+{
+	static union text_poke_insn insn; /* per instance */
+	__text_gen_insn(&insn, opcode, addr, dest, text_opcode_size(opcode));
 	return &insn.text;
 }
 
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -71,29 +71,12 @@ noinstr void paravirt_BUG(void)
 	BUG();
 }
 
-struct branch {
-	unsigned char opcode;
-	u32 delta;
-} __attribute__((packed));
-
 static unsigned paravirt_patch_call(void *insn_buff, const void *target,
 				    unsigned long addr, unsigned len)
 {
-	const int call_len = 5;
-	struct branch *b = insn_buff;
-	unsigned long delta = (unsigned long)target - (addr+call_len);
-
-	if (len < call_len) {
-		pr_warn("paravirt: Failed to patch indirect CALL at %ps\n", (void *)addr);
-		/* Kernel might not be viable if patching fails, bail out: */
-		BUG_ON(1);
-	}
-
-	b->opcode = 0xe8; /* call */
-	b->delta = delta;
-	BUILD_BUG_ON(sizeof(*b) != call_len);
-
-	return call_len;
+	__text_gen_insn(insn_buff, CALL_INSN_OPCODE,
+			(void *)addr, target, CALL_INSN_SIZE);
+	return CALL_INSN_SIZE;
 }
 
 #ifdef CONFIG_PARAVIRT_XXL



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 26/39] x86/entry: Cleanup PARAVIRT
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (24 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 25/39] x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch() Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 27/39] x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel() Peter Zijlstra
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov, Andrew Cooper

Since commit 5c8f6a2e316e ("x86/xen: Add
xenpv_restore_regs_and_return_to_usermode()") Xen will no longer reach
this code and we can do away with the paravirt
SWAPGS/INTERRUPT_RETURN.

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -619,8 +619,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_
 
 	/* Restore RDI. */
 	popq	%rdi
-	SWAPGS
-	INTERRUPT_RETURN
+	swapgs
+	jmp	native_iret
 
 
 SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 27/39] x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (25 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 26/39] x86/entry: Cleanup PARAVIRT Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 17:51   ` Andrew Cooper
  2022-02-24 14:52 ` [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR Peter Zijlstra
                   ` (12 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov, Andrew Cooper

By doing an early rewrite of 'jmp native_iret' in
restore_regs_and_return_to_kernel() we can get rid of the last
INTERRUPT_RETURN user and paravirt_iret.

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S             |   12 ++++++++----
 arch/x86/include/asm/irqflags.h       |    5 -----
 arch/x86/include/asm/paravirt_types.h |    1 -
 arch/x86/kernel/head_64.S             |    3 ++-
 arch/x86/kernel/paravirt.c            |    4 ----
 arch/x86/xen/enlighten_pv.c           |    7 ++++++-
 arch/x86/xen/xen-asm.S                |    1 +
 7 files changed, 17 insertions(+), 16 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -620,7 +620,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_
 	/* Restore RDI. */
 	popq	%rdi
 	swapgs
-	jmp	native_iret
+	jmp	.Lnative_iret
 
 
 SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
@@ -637,11 +637,15 @@ SYM_INNER_LABEL(restore_regs_and_return_
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
 	 * when returning from IPI handler.
 	 */
-	INTERRUPT_RETURN
+#ifdef CONFIG_XEN_PV
+SYM_INNER_LABEL(early_xen_iret_patch, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
+	.byte 0xe9
+	.long .Lnative_iret - (. + 4)
+#endif
 
-SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
+.Lnative_iret:
 	UNWIND_HINT_IRET_REGS
-	ENDBR // paravirt_iret
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -141,13 +141,8 @@ static __always_inline void arch_local_i
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
 #define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
-#define INTERRUPT_RETURN						\
-	ANNOTATE_RETPOLINE_SAFE;					\
-	ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",		\
-		X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 #else
 #define SWAPGS	swapgs
-#define INTERRUPT_RETURN	jmp native_iret
 #endif
 #endif
 #endif /* !__ASSEMBLY__ */
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -272,7 +272,6 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
-extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -350,7 +350,6 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
 	/* Remove Error Code */
 	addq    $8, %rsp
 
-	/* Pure iret required here - don't use INTERRUPT_RETURN */
 	iretq
 SYM_CODE_END(vc_boot_ghcb)
 #endif
@@ -434,6 +433,8 @@ SYM_CODE_END(early_idt_handler_common)
  * early_idt_handler_array can't be used because it returns via the
  * paravirtualized INTERRUPT_RETURN and pv-ops don't work that early.
  *
+ * XXX it does, fix this.
+ *
  * This handler will end up in the .init.text section and not be
  * available to boot secondary CPUs.
  */
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -134,8 +134,6 @@ void paravirt_set_sched_clock(u64 (*func
 }
 
 /* These are in entry.S */
-extern void native_iret(void);
-
 static struct resource reserve_ioports = {
 	.start = 0,
 	.end = IO_SPACE_LIMIT,
@@ -399,8 +397,6 @@ struct paravirt_patch_template pv_ops =
 
 #ifdef CONFIG_PARAVIRT_XXL
 NOKPROBE_SYMBOL(native_load_idt);
-
-void (*paravirt_iret)(void) = native_iret;
 #endif
 
 EXPORT_SYMBOL(pv_ops);
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1177,6 +1177,8 @@ static void __init xen_domu_set_legacy_f
 	x86_platform.legacy.rtc = 0;
 }
 
+extern void early_xen_iret_patch(void);
+
 /* First C function to be called on Xen boot */
 asmlinkage __visible void __init xen_start_kernel(void)
 {
@@ -1187,6 +1189,10 @@ asmlinkage __visible void __init xen_sta
 	if (!xen_start_info)
 		return;
 
+	__text_gen_insn(&early_xen_iret_patch,
+			JMP32_INSN_OPCODE, &early_xen_iret_patch, &xen_iret,
+			JMP32_INSN_SIZE);
+
 	xen_domain_type = XEN_PV_DOMAIN;
 	xen_start_flags = xen_start_info->flags;
 
@@ -1195,7 +1201,6 @@ asmlinkage __visible void __init xen_sta
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_ops.cpu = xen_cpu_ops.cpu;
-	paravirt_iret = xen_iret;
 	xen_init_irq_ops();
 
 	/*
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -189,6 +189,7 @@ hypercall_iret = hypercall_page + __HYPE
  */
 SYM_CODE_START(xen_iret)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	pushq $0
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (26 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 27/39] x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel() Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-25  0:54   ` Josh Poimboeuf
  2022-02-24 14:52 ` [PATCH v2 29/39] objtool: Rename --duplicate to --lto Peter Zijlstra
                   ` (11 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Even though Xen currently doesn't advertise IBT, prepare for when it
eventually will and sprinkle the ENDBR dust accordingly.

While most of the entry points are IRET-like, the CPL0 hypervisor can
set WAIT-FOR-ENDBR and demand ENDBR at these sites.
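
For reference, the asm side of ENDBR expands roughly as follows (per
the asm/ibt.h hunks earlier in the series), so the sprinkling is free
on !IBT builds:

	#ifdef CONFIG_X86_KERNEL_IBT
	#define ENDBR	endbr64
	#else
	#define ENDBR
	#endif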

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S      |    1 +
 arch/x86/include/asm/segment.h |    2 +-
 arch/x86/kernel/head_64.S      |    1 +
 arch/x86/xen/enlighten_pv.c    |    3 +++
 arch/x86/xen/xen-asm.S         |   10 ++++++++++
 arch/x86/xen/xen-head.S        |    8 ++++++--
 6 files changed, 22 insertions(+), 3 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -822,6 +822,7 @@ SYM_CODE_END(exc_xen_hypervisor_callback
  */
 SYM_CODE_START(xen_failsafe_callback)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	movl	%ds, %ecx
 	cmpw	%cx, 0x10(%rsp)
 	jne	1f
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -283,7 +283,7 @@ static inline void vdso_read_cpunode(uns
  * pop %rcx; pop %r11; jmp early_idt_handler_array[i]; summing up to
  * max 8 bytes.
  */
-#define XEN_EARLY_IDT_HANDLER_SIZE 8
+#define XEN_EARLY_IDT_HANDLER_SIZE (8 + 4*HAS_KERNEL_IBT)
 
 #ifndef __ASSEMBLY__
 
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -389,6 +389,7 @@ SYM_CODE_START(early_idt_handler_array)
 	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
 SYM_CODE_END(early_idt_handler_array)
+	ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
 
 SYM_CODE_START_LOCAL(early_idt_handler_common)
 	UNWIND_HINT_IRET_REGS offset=16
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -624,6 +624,9 @@ static struct trap_array_entry trap_arra
 	TRAP_ENTRY(exc_coprocessor_error,		false ),
 	TRAP_ENTRY(exc_alignment_check,			false ),
 	TRAP_ENTRY(exc_simd_coprocessor_error,		false ),
+#ifdef CONFIG_X86_KERNEL_IBT
+	TRAP_ENTRY(exc_control_protection,		false ),
+#endif
 };
 
 static bool __ref get_trap_addr(void **addr, unsigned int ist)
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -122,6 +122,7 @@ SYM_FUNC_END(xen_read_cr2_direct);
 .macro xen_pv_trap name
 SYM_CODE_START(xen_\name)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	pop %rcx
 	pop %r11
 	jmp  \name
@@ -147,6 +148,9 @@ xen_pv_trap asm_exc_page_fault
 xen_pv_trap asm_exc_spurious_interrupt_bug
 xen_pv_trap asm_exc_coprocessor_error
 xen_pv_trap asm_exc_alignment_check
+#ifdef CONFIG_X86_KERNEL_IBT
+xen_pv_trap asm_exc_control_protection
+#endif
 #ifdef CONFIG_X86_MCE
 xen_pv_trap asm_xenpv_exc_machine_check
 #endif /* CONFIG_X86_MCE */
@@ -162,6 +166,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	UNWIND_HINT_EMPTY
+	ENDBR
 	pop %rcx
 	pop %r11
 	jmp early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE
@@ -169,6 +174,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
 	.fill xen_early_idt_handler_array + i*XEN_EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
 SYM_CODE_END(xen_early_idt_handler_array)
+	ANNOTATE_NOENDBR
 	__FINIT
 
 hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
@@ -231,6 +237,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu
 /* Normal 64-bit system call target */
 SYM_CODE_START(xen_syscall_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	popq %rcx
 	popq %r11
 
@@ -250,6 +257,7 @@ SYM_CODE_END(xen_syscall_target)
 /* 32-bit compat syscall target */
 SYM_CODE_START(xen_syscall32_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	popq %rcx
 	popq %r11
 
@@ -267,6 +275,7 @@ SYM_CODE_END(xen_syscall32_target)
 /* 32-bit compat sysenter target */
 SYM_CODE_START(xen_sysenter_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/*
 	 * NB: Xen is polite and clears TF from EFLAGS for us.  This means
 	 * that we don't need to guard against single step exceptions here.
@@ -290,6 +299,7 @@ SYM_CODE_END(xen_sysenter_target)
 SYM_CODE_START(xen_syscall32_target)
 SYM_CODE_START(xen_sysenter_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	lea 16(%rsp), %rsp	/* strip %rcx, %r11 */
 	mov $-ENOSYS, %rax
 	pushq $0
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,8 +25,11 @@
 SYM_CODE_START(hypercall_page)
 	.rept (PAGE_SIZE / 32)
 		UNWIND_HINT_FUNC
-		.skip 31, 0x90
-		RET
+		ANNOTATE_NOENDBR
+		/*
+		 * Xen will write the hypercall page, and sort out ENDBR.
+		 */
+		.skip 32, 0xcc
 	.endr
 
 #define HYPERCALL(n) \
@@ -74,6 +77,7 @@ SYM_CODE_END(startup_xen)
 .pushsection .text
 SYM_CODE_START(asm_cpu_bringup_and_idle)
 	UNWIND_HINT_EMPTY
+	ENDBR
 
 	call cpu_bringup_and_idle
 SYM_CODE_END(asm_cpu_bringup_and_idle)



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 29/39] objtool: Rename --duplicate to --lto
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (27 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 30/39] Kbuild: Allow whole module objtool runs Peter Zijlstra
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

In order to prepare for LTO-like objtool runs for modules, rename the
--duplicate argument to --lto.
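
A whole-archive vmlinux run then reads:

  $ ./tools/objtool/objtool check --lto --vmlinux vmlinux.o

and, per the new check below, --lto refuses to run without --vmlinux
or --module.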

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 scripts/link-vmlinux.sh                 |    2 +-
 tools/objtool/builtin-check.c           |    4 ++--
 tools/objtool/check.c                   |    7 ++++++-
 tools/objtool/include/objtool/builtin.h |    2 +-
 4 files changed, 10 insertions(+), 5 deletions(-)

--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -115,7 +115,7 @@ objtool_link()
 			objtoolcmd="orc generate"
 		fi
 
-		objtoolopt="${objtoolopt} --duplicate"
+		objtoolopt="${objtoolopt} --lto"
 
 		if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
 			objtoolopt="${objtoolopt} --mcount"
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
 #include <objtool/objtool.h>
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
+     lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -40,7 +40,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('b', "backtrace", &backtrace, "unwind on error"),
 	OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"),
 	OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
-	OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"),
+	OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),
 	OPT_BOOLEAN('n', "noinstr", &noinstr, "noinstr validation for vmlinux.o"),
 	OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
 	OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3501,6 +3501,11 @@ int check(struct objtool_file *file)
 {
 	int ret, warnings = 0;
 
+	if (lto && !(vmlinux || module)) {
+		fprintf(stderr, "--lto requires: --vmlinux or --module\n");
+		return 1;
+	}
+
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
@@ -3521,7 +3526,7 @@ int check(struct objtool_file *file)
 	if (list_empty(&file->insn_list))
 		goto out;
 
-	if (vmlinux && !validate_dup) {
+	if (vmlinux && !lto) {
 		ret = validate_vmlinux_functions(file);
 		if (ret < 0)
 			goto out;
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@
 
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-            validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
+	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 30/39] Kbuild: Allow whole module objtool runs
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (28 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 29/39] objtool: Rename --duplicate to --lto Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 31/39] objtool: Read the NOENDBR annotation Peter Zijlstra
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov, Nathan Chancellor

Just like we have vmlinux.o objtool runs, add the ability to do
whole-module objtool runs.

XXX it doesn't print OBJTOOL like:

 OBJTOOL  sound/usb/line6/snd-usb-variax.o
  LD [M]  sound/usb/line6/snd-usb-variax.ko

when linking modules.
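
With CONFIG_X86_KERNEL_IBT_SEAL=y, the command that cmd_objtool_mod
runs right before the final .ko link is roughly (exact flags depend on
the config):

  $ objtool orc generate --module --lto --ibt --ibt-fix-direct \
	--ibt-seal --retpoline --uaccess --mcount \
	sound/usb/line6/snd-usb-variax.o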

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 scripts/Makefile.build    |   44 +-----------------------------------
 scripts/Makefile.lib      |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 scripts/Makefile.modfinal |    1 
 3 files changed, 59 insertions(+), 42 deletions(-)

--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -223,41 +223,6 @@ cmd_record_mcount = $(if $(findstring $(
 	$(sub_cmd_record_mcount))
 endif # CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 
-ifdef CONFIG_STACK_VALIDATION
-
-objtool := $(objtree)/tools/objtool/objtool
-
-objtool_args =								\
-	$(if $(CONFIG_UNWINDER_ORC),orc generate,check)			\
-	$(if $(part-of-module), --module)				\
-	$(if $(CONFIG_FRAME_POINTER),, --no-fp)				\
-	$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), --no-unreachable)\
-	$(if $(CONFIG_RETPOLINE), --retpoline)				\
-	$(if $(CONFIG_X86_SMAP), --uaccess)				\
-	$(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount)		\
-	$(if $(CONFIG_SLS), --sls)
-
-cmd_objtool = $(if $(objtool-enabled), ; $(objtool) $(objtool_args) $@)
-cmd_gen_objtooldep = $(if $(objtool-enabled), { echo ; echo '$@: $$(wildcard $(objtool))' ; } >> $(dot-target).cmd)
-
-endif # CONFIG_STACK_VALIDATION
-
-ifdef CONFIG_LTO_CLANG
-
-# Skip objtool for LLVM bitcode
-$(obj)/%.o: objtool-enabled :=
-
-else
-
-# 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
-# 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file
-# 'OBJECT_FILES_NON_STANDARD_foo.o := 'n': override directory skip for a file
-
-$(obj)/%.o: objtool-enabled = $(if $(filter-out y%, \
-	$(OBJECT_FILES_NON_STANDARD_$(basetarget).o)$(OBJECT_FILES_NON_STANDARD)n),y)
-
-endif
-
 ifdef CONFIG_TRIM_UNUSED_KSYMS
 cmd_gen_ksymdeps = \
 	$(CONFIG_SHELL) $(srctree)/scripts/gen_ksymdeps.sh $@ >> $(dot-target).cmd
@@ -292,21 +257,16 @@ ifdef CONFIG_LTO_CLANG
 # Module .o files may contain LLVM bitcode, compile them into native code
 # before ELF processing
 quiet_cmd_cc_lto_link_modules = LTO [M] $@
-cmd_cc_lto_link_modules =						\
+      cmd_cc_lto_link_modules =						\
 	$(LD) $(ld_flags) -r -o $@					\
 		$(shell [ -s $(@:.lto.o=.o.symversions) ] &&		\
 			echo -T $(@:.lto.o=.o.symversions))		\
 		--whole-archive $(filter-out FORCE,$^)			\
 		$(cmd_objtool)
-
-# objtool was skipped for LLVM bitcode, run it now that we have compiled
-# modules into native code
-$(obj)/%.lto.o: objtool-enabled = y
-$(obj)/%.lto.o: part-of-module := y
+endif
 
 $(obj)/%.lto.o: $(obj)/%.o FORCE
 	$(call if_changed,cc_lto_link_modules)
-endif
 
 cmd_mod = { \
 	echo $(if $($*-objs)$($*-y)$($*-m), $(addprefix $(obj)/, $($*-objs) $($*-y) $($*-m)), $(@:.mod=.o)); \
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -533,3 +533,59 @@ define filechk_offsets
 	 echo ""; \
 	 echo "#endif"
 endef
+
+# objtool
+# ---------------------------------------------------------------------------
+
+ifdef CONFIG_STACK_VALIDATION
+
+objtool := $(objtree)/tools/objtool/objtool
+
+objtool_args =								\
+	$(if $(CONFIG_UNWINDER_ORC),orc generate,check)			\
+	$(if $(part-of-module), --module)				\
+	$(if $(CONFIG_X86_KERNEL_IBT_SEAL), --lto --ibt --ibt-fix-direct --ibt-seal)	\
+	$(if $(CONFIG_FRAME_POINTER),, --no-fp)				\
+	$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), --no-unreachable)\
+	$(if $(CONFIG_RETPOLINE), --retpoline)				\
+	$(if $(CONFIG_X86_SMAP), --uaccess)				\
+	$(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount)		\
+	$(if $(CONFIG_SLS), --sls)
+
+cmd_objtool = $(if $(objtool-enabled), ; $(objtool) $(objtool_args) $@)
+cmd_objtool_mod = $(if $(objtool-enabled), $(objtool) $(objtool_args) $(@:.ko=.o) ; )
+cmd_gen_objtooldep = $(if $(objtool-enabled), { echo ; echo '$@: $$(wildcard $(objtool))' ; } >> $(dot-target).cmd)
+
+endif # CONFIG_STACK_VALIDATION
+
+ifdef CONFIG_LTO_CLANG
+
+# Skip objtool for LLVM bitcode
+$(obj)/%.o: objtool-enabled :=
+
+# objtool was skipped for LLVM bitcode, run it now that we have compiled
+# modules into native code
+$(obj)/%.lto.o: objtool-enabled = y
+$(obj)/%.lto.o: part-of-module := y
+
+else ifdef CONFIG_X86_KERNEL_IBT_SEAL
+
+# Skip objtool on individual files
+$(obj)/%.o: objtool-enabled :=
+
+# instead run objtool on the module as a whole, right before
+# the final link pass with the linker script.
+%.ko: objtool-enabled = y
+%.ko: part-of-module := y
+
+else
+
+# 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
+# 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file
+# 'OBJECT_FILES_NON_STANDARD_foo.o := 'n': override directory skip for a file
+
+$(obj)/%.o: objtool-enabled = $(if $(filter-out y%, \
+	$(OBJECT_FILES_NON_STANDARD_$(basetarget).o)$(OBJECT_FILES_NON_STANDARD)n),y)
+
+endif
+
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -32,6 +32,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/a
 
 quiet_cmd_ld_ko_o = LD [M]  $@
       cmd_ld_ko_o +=							\
+	$(cmd_objtool_mod)						\
 	$(LD) -r $(KBUILD_LDFLAGS)					\
 		$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE)		\
 		-T scripts/module.lds -o $@ $(filter %.o, $^);		\



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 31/39] objtool: Read the NOENDBR annotation
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (29 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 30/39] Kbuild: Allow whole module objtool runs Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 32/39] x86/ibt: Don't generate ENDBR in .discard.text Peter Zijlstra
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Read the new NOENDBR annotation. While there, attempt to not bloat
struct instruction.
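
For reference, the annotation consumed here would be emitted by an asm
macro along these lines (a sketch, not the exact macro from this
series; it records the address of the next instruction in
.discard.noendbr, which objtool then reads via the .rela section):

  .macro ANNOTATE_NOENDBR
  .Lhere_\@:
	.pushsection .discard.noendbr
	.quad	.Lhere_\@
	.popsection
  .endm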

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                 |   27 +++++++++++++++++++++++++++
 tools/objtool/include/objtool/check.h |   13 ++++++++++---
 2 files changed, 37 insertions(+), 3 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1860,6 +1860,29 @@ static int read_unwind_hints(struct objt
 	return 0;
 }
 
+static int read_noendbr_hints(struct objtool_file *file)
+{
+	struct section *sec;
+	struct instruction *insn;
+	struct reloc *reloc;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.noendbr");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(reloc, &sec->reloc_list, list) {
+		insn = find_insn(file, reloc->sym->sec, reloc->sym->offset + reloc->addend);
+		if (!insn) {
+			WARN("bad .discard.noendbr entry");
+			return -1;
+		}
+
+		insn->noendbr = 1;
+	}
+
+	return 0;
+}
+
 static int read_retpoline_hints(struct objtool_file *file)
 {
 	struct section *sec;
@@ -2097,6 +2120,10 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
+	ret = read_noendbr_hints(file);
+	if (ret)
+		return ret;
+
 	/*
 	 * Must be before add_{jump_call}_destination.
 	 */
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -45,11 +45,18 @@ struct instruction {
 	unsigned int len;
 	enum insn_type type;
 	unsigned long immediate;
-	bool dead_end, ignore, ignore_alts;
-	bool hint;
-	bool retpoline_safe;
+
+	u8 dead_end	: 1,
+	   ignore	: 1,
+	   ignore_alts	: 1,
+	   hint		: 1,
+	   retpoline_safe : 1,
+	   noendbr	: 1;
+		/* 2 bit hole */
 	s8 instr;
 	u8 visited;
+	/* u8 hole */
+
 	struct alt_group *alt_group;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 32/39] x86/ibt: Don't generate ENDBR in .discard.text
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (30 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 31/39] objtool: Read the NOENDBR annotation Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding Peter Zijlstra
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Having ENDBR in discarded sections can easily lead to relocations into
discarded sections which the linkers aren't really fond of. Objtool
also shouldn't generate them, but why tempt fate.
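
For context, __noendbr is expected to suppress ENDBR generation for
the annotated function; a sketch of the definition (assumed to live in
asm/ibt.h, hence the new include below):

  /* tell the compiler not to emit an ENDBR for this function */
  #define __noendbr	__attribute__((nocf_check))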

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/setup.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -8,6 +8,7 @@
 
 #include <linux/linkage.h>
 #include <asm/page_types.h>
+#include <asm/ibt.h>
 
 #ifdef __i386__
 
@@ -119,7 +120,7 @@ void *extend_brk(size_t size, size_t ali
  * executable.)
  */
 #define RESERVE_BRK(name,sz)						\
-	static void __section(".discard.text") __used notrace		\
+	static void __section(".discard.text") __noendbr __used notrace	\
 	__brk_reservation_fn_##name##__(void) {				\
 		asm volatile (						\
 			".pushsection .brk_reservation,\"aw\",@nobits;" \



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (31 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 32/39] x86/ibt: Don't generate ENDBR in .discard.text Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-03-03 10:53   ` Miroslav Benes
  2022-02-24 14:52 ` [PATCH v2 34/39] objtool: Validate IBT assumptions Peter Zijlstra
                   ` (6 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Decode ENDBR instructions and WARN about NOTRACK prefixes.
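
For reference, the byte patterns involved (per the SDM, stated here as
an assumption rather than taken from the patch itself):

  f3 0f 1e fa	endbr64
  f3 0f 1e fb	endbr32
  3e		NOTRACK prefix on an indirect JMP/CALL

which is why the decode below looks for a 0xf3 prefix with op2 == 0x1e
and modrm 0xfa/0xfb, and why has_notrack_prefix() scans for 0x3e.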

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/arch/x86/decode.c      |   34 +++++++++++++++++++++++++++++-----
 tools/objtool/include/objtool/arch.h |    1 +
 2 files changed, 30 insertions(+), 5 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -103,6 +103,18 @@ unsigned long arch_jump_destination(stru
 #define rm_is_mem(reg)	(mod_is_mem() && !is_RIP() && rm_is(reg))
 #define rm_is_reg(reg)	(mod_is_reg() && modrm_rm == (reg))
 
+static bool has_notrack_prefix(struct insn *insn)
+{
+	int i;
+
+	for (i = 0; i < insn->prefixes.nbytes; i++) {
+		if (insn->prefixes.bytes[i] == 0x3e)
+			return true;
+	}
+
+	return false;
+}
+
 int arch_decode_instruction(struct objtool_file *file, const struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
 			    unsigned int *len, enum insn_type *type,
@@ -112,7 +124,7 @@ int arch_decode_instruction(struct objto
 	const struct elf *elf = file->elf;
 	struct insn insn;
 	int x86_64, ret;
-	unsigned char op1, op2, op3,
+	unsigned char op1, op2, op3, prefix,
 		      rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
 		      modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
 		      sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +149,8 @@ int arch_decode_instruction(struct objto
 	if (insn.vex_prefix.nbytes)
 		return 0;
 
+	prefix = insn.prefixes.bytes[0];
+
 	op1 = insn.opcode.bytes[0];
 	op2 = insn.opcode.bytes[1];
 	op3 = insn.opcode.bytes[2];
@@ -492,6 +506,12 @@ int arch_decode_instruction(struct objto
 			/* nopl/nopw */
 			*type = INSN_NOP;
 
+		} else if (op2 == 0x1e) {
+
+			if (prefix == 0xf3 && (modrm == 0xfa || modrm == 0xfb))
+				*type = INSN_ENDBR;
+
+
 		} else if (op2 == 0x38 && op3 == 0xf8) {
 			if (insn.prefixes.nbytes == 1 &&
 			    insn.prefixes.bytes[0] == 0xf2) {
@@ -636,20 +656,24 @@ int arch_decode_instruction(struct objto
 		break;
 
 	case 0xff:
-		if (modrm_reg == 2 || modrm_reg == 3)
+		if (modrm_reg == 2 || modrm_reg == 3) {
 
 			*type = INSN_CALL_DYNAMIC;
+			if (has_notrack_prefix(&insn))
+				WARN("notrack prefix found at %s:0x%lx", sec->name, offset);
 
-		else if (modrm_reg == 4)
+		} else if (modrm_reg == 4) {
 
 			*type = INSN_JUMP_DYNAMIC;
+			if (has_notrack_prefix(&insn))
+				WARN("notrack prefix found at %s:0x%lx", sec->name, offset);
 
-		else if (modrm_reg == 5)
+		} else if (modrm_reg == 5) {
 
 			/* jmpf */
 			*type = INSN_CONTEXT_SWITCH;
 
-		else if (modrm_reg == 6) {
+		} else if (modrm_reg == 6) {
 
 			/* push from mem */
 			ADD_OP(op) {
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -27,6 +27,7 @@ enum insn_type {
 	INSN_STD,
 	INSN_CLD,
 	INSN_TRAP,
+	INSN_ENDBR,
 	INSN_OTHER,
 };
 



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (32 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-27  3:13   ` Josh Poimboeuf
  2022-02-24 14:52 ` [PATCH v2 35/39] objtool: IBT fix direct JMP/CALL Peter Zijlstra
                   ` (5 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Intel IBT requires that every indirect JMP/CALL target an ENDBR
instruction; failing that, a #CP exception is raised and we die.
Similarly, all exception entries should land on an ENDBR.
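
As a quick illustration (hypothetical symbol):

	lea	my_func(%rip), %rax
	call	*%rax			# indirect: must land on ENDBR or #CP

  my_func:
	endbr64				# valid indirect branch target
	...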

Find all code relocations and ensure they're either an ENDBR
instruction or ANNOTATE_NOENDBR. For the exceptions look for
UNWIND_HINT_IRET_REGS at sym+0 not being ENDBR.

Additionally, look for direct JMP/CALL instructions and warn if they
target an ENDBR instruction. This extra constraint comes from the
desire to poison unused ENDBR instructions.

NOTE: the changes in add_{call,jump}_destination() are there to create
a common path after insn->{jump,call}_dest is set, where both the
source and destination instructions are available.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/builtin-check.c           |    4 
 tools/objtool/check.c                   |  255 +++++++++++++++++++++++++++++---
 tools/objtool/include/objtool/builtin.h |    3 
 tools/objtool/include/objtool/objtool.h |    3 
 4 files changed, 243 insertions(+), 22 deletions(-)

--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,8 @@
 #include <objtool/objtool.h>
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
+     lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
+     ibt;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -47,6 +48,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
 	OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
 	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
+	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
 	OPT_END(),
 };
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -380,6 +380,7 @@ static int decode_instructions(struct ob
 			memset(insn, 0, sizeof(*insn));
 			INIT_LIST_HEAD(&insn->alts);
 			INIT_LIST_HEAD(&insn->stack_ops);
+			INIT_LIST_HEAD(&insn->call_node);
 
 			insn->sec = sec;
 			insn->offset = offset;
@@ -1176,6 +1177,14 @@ static int add_jump_destinations(struct
 	unsigned long dest_off;
 
 	for_each_insn(file, insn) {
+		if (insn->type == INSN_ENDBR && insn->func) {
+			if (insn->offset == insn->func->offset) {
+				file->nr_endbr++;
+			} else {
+				file->nr_endbr_int++;
+			}
+		}
+
 		if (!is_static_jump(insn))
 			continue;
 
@@ -1192,10 +1201,14 @@ static int add_jump_destinations(struct
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
 			add_call_dest(file, insn, reloc->sym, true);
-			continue;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
 		} else if (reloc->sym->sec->idx) {
 			dest_sec = reloc->sym->sec;
-			dest_off = reloc->sym->sym.st_value +
+			dest_off = reloc->sym->offset +
 				   arch_dest_reloc_offset(reloc->addend);
 		} else {
 			/* non-func asm code jumping to another file */
@@ -1205,6 +1218,10 @@ static int add_jump_destinations(struct
 		insn->jump_dest = find_insn(file, dest_sec, dest_off);
 		if (!insn->jump_dest) {
 
+			/* external symbol */
+			if (!vmlinux && insn->func)
+				continue;
+
 			/*
 			 * This is a special case where an alt instruction
 			 * jumps past the end of the section.  These are
@@ -1219,6 +1236,16 @@ static int add_jump_destinations(struct
 			return -1;
 		}
 
+		if (ibt && insn->jump_dest->type == INSN_ENDBR &&
+		    insn->jump_dest->func &&
+		    insn->jump_dest->offset == insn->jump_dest->func->offset) {
+			if (reloc) {
+				WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+			} else {
+				WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+			}
+		}
+
 		/*
 		 * Cross-function jump.
 		 */
@@ -1246,7 +1273,8 @@ static int add_jump_destinations(struct
 				insn->jump_dest->func->pfunc = insn->func;
 
 			} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
-				   insn->jump_dest->offset == insn->jump_dest->func->offset) {
+				   ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
+				    (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
 				/* internal sibling call (without reloc) */
 				add_call_dest(file, insn, insn->jump_dest->func, true);
 			}
@@ -1256,23 +1284,12 @@ static int add_jump_destinations(struct
 	return 0;
 }
 
-static struct symbol *find_call_destination(struct section *sec, unsigned long offset)
-{
-	struct symbol *call_dest;
-
-	call_dest = find_func_by_offset(sec, offset);
-	if (!call_dest)
-		call_dest = find_symbol_by_offset(sec, offset);
-
-	return call_dest;
-}
-
 /*
  * Find the destination instructions for all calls.
  */
 static int add_call_destinations(struct objtool_file *file)
 {
-	struct instruction *insn;
+	struct instruction *insn, *target = NULL;
 	unsigned long dest_off;
 	struct symbol *dest;
 	struct reloc *reloc;
@@ -1284,7 +1301,21 @@ static int add_call_destinations(struct
 		reloc = insn_reloc(file, insn);
 		if (!reloc) {
 			dest_off = arch_jump_destination(insn);
-			dest = find_call_destination(insn->sec, dest_off);
+
+			target = find_insn(file, insn->sec, dest_off);
+			if (!target) {
+				WARN_FUNC("direct call to nowhere", insn->sec, insn->offset);
+				return -1;
+			}
+			dest = target->func;
+			if (!dest)
+				dest = find_symbol_containing(insn->sec, dest_off);
+			if (!dest) {
+				WARN_FUNC("IMM can't find call dest symbol at %s+0x%lx",
+					  insn->sec, insn->offset,
+					  insn->sec->name, dest_off);
+				return -1;
+			}
 
 			add_call_dest(file, insn, dest, false);
 
@@ -1303,10 +1334,22 @@ static int add_call_destinations(struct
 			}
 
 		} else if (reloc->sym->type == STT_SECTION) {
-			dest_off = arch_dest_reloc_offset(reloc->addend);
-			dest = find_call_destination(reloc->sym->sec, dest_off);
+			struct section *dest_sec;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
+			target = find_insn(file, dest_sec, dest_off);
+			if (!target) {
+				WARN_FUNC("direct call to nowhere", insn->sec, insn->offset);
+				return -1;
+			}
+			dest = target->func;
+			if (!dest)
+				dest = find_symbol_containing(dest_sec, dest_off);
 			if (!dest) {
-				WARN_FUNC("can't find call dest symbol at %s+0x%lx",
+				WARN_FUNC("RELOC can't find call dest symbol at %s+0x%lx",
 					  insn->sec, insn->offset,
 					  reloc->sym->sec->name,
 					  dest_off);
@@ -1317,9 +1360,27 @@ static int add_call_destinations(struct
 
 		} else if (reloc->sym->retpoline_thunk) {
 			add_retpoline_call(file, insn);
+			continue;
+
+		} else {
+			struct section *dest_sec;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
+			target = find_insn(file, dest_sec, dest_off);
 
-		} else
 			add_call_dest(file, insn, reloc->sym, false);
+		}
+
+		if (ibt && target && target->type == INSN_ENDBR) {
+			if (reloc) {
+				WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+			} else {
+				WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+			}
+		}
 	}
 
 	return 0;
@@ -3053,6 +3114,8 @@ static struct instruction *next_insn_to_
 	return next_insn_same_sec(file, insn);
 }
 
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);
+
 /*
  * Follow the branch starting at the given instruction, and recursively follow
  * any other branches (jumps).  Meanwhile, track the frame pointer state at
@@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
 
 		if (insn->hint) {
 			state.cfi = *insn->cfi;
+			if (ibt) {
+				struct symbol *sym;
+
+				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
+				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
+				    insn->type != INSN_ENDBR && !insn->noendbr) {
+					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
+						  insn->sec, insn->offset,
+						  sym->name);
+				}
+			}
 		} else {
 			/* XXX track if we actually changed state.cfi */
 
@@ -3260,7 +3334,12 @@ static int validate_branch(struct objtoo
 			state.df = false;
 			break;
 
+		case INSN_NOP:
+			break;
+
 		default:
+			if (ibt)
+				validate_ibt_insn(file, insn);
 			break;
 		}
 
@@ -3506,6 +3585,130 @@ static int validate_functions(struct obj
 	return warnings;
 }
 
+static struct instruction *
+validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
+{
+	struct instruction *dest;
+	struct section *sec;
+	unsigned long off;
+
+	sec = reloc->sym->sec;
+	off = reloc->sym->offset + reloc->addend;
+
+	dest = find_insn(file, sec, off);
+	if (!dest)
+		return NULL;
+
+	if (dest->type == INSN_ENDBR)
+		return NULL;
+
+	if (reloc->sym->static_call_tramp)
+		return NULL;
+
+	return dest;
+}
+
+static void warn_noendbr(const char *msg, struct section *sec, unsigned long offset,
+			 struct instruction *target)
+{
+	WARN_FUNC("%srelocation to !ENDBR: %s+0x%lx", sec, offset, msg,
+		  target->func ? target->func->name : target->sec->name,
+		  target->func ? target->offset - target->func->offset : target->offset);
+}
+
+static void validate_ibt_target(struct objtool_file *file, struct instruction *insn,
+				struct instruction *target)
+{
+	if (target->func && target->func == insn->func) {
+		/*
+		 * Anything from->to self is either _THIS_IP_ or IRET-to-self.
+		 *
+		 * There is no sane way to annotate _THIS_IP_ since the compiler treats the
+		 * relocation as a constant and is happy to fold in offsets, skewing any
+		 * annotation we do, leading to vast amounts of false-positives.
+		 *
+		 * There's also compiler generated _THIS_IP_ through KCOV and
+		 * such which we have no hope of annotating.
+		 *
+		 * As such, blanket accept self-references without issue.
+		 */
+		return;
+	}
+
+	/*
+	 * Annotated non-control flow target.
+	 */
+	if (target->noendbr)
+		return;
+
+	warn_noendbr("", insn->sec, insn->offset, target);
+}
+
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
+{
+	struct reloc *reloc = insn_reloc(file, insn);
+	struct instruction *target;
+
+	for (;;) {
+		if (!reloc)
+			return;
+
+		target = validate_ibt_reloc(file, reloc);
+		if (target)
+			validate_ibt_target(file, insn, target);
+
+		reloc = find_reloc_by_dest_range(file->elf, insn->sec, reloc->offset + 1,
+						 (insn->offset + insn->len) - (reloc->offset + 1));
+	}
+}
+
+static int validate_ibt(struct objtool_file *file)
+{
+	struct section *sec;
+	struct reloc *reloc;
+
+	for_each_sec(file, sec) {
+		bool is_data;
+
+		/* already done in validate_branch() */
+		if (sec->sh.sh_flags & SHF_EXECINSTR)
+			continue;
+
+		if (!sec->reloc)
+			continue;
+
+		if (!strncmp(sec->name, ".orc", 4))
+			continue;
+
+		if (!strncmp(sec->name, ".discard", 8))
+			continue;
+
+		if (!strncmp(sec->name, ".debug", 6))
+			continue;
+
+		if (!strcmp(sec->name, "_error_injection_whitelist"))
+			continue;
+
+		if (!strcmp(sec->name, "_kprobe_blacklist"))
+			continue;
+
+		is_data = strstr(sec->name, ".data") || strstr(sec->name, ".rodata");
+
+		list_for_each_entry(reloc, &sec->reloc->reloc_list, list) {
+			struct instruction *target;
+
+			target = validate_ibt_reloc(file, reloc);
+			if (is_data && target && !target->noendbr) {
+				warn_noendbr("data ", reloc->sym->sec,
+					     reloc->sym->offset + reloc->addend,
+					     target);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int validate_reachable_instructions(struct objtool_file *file)
 {
 	struct instruction *insn;
@@ -3533,6 +3736,11 @@ int check(struct objtool_file *file)
 		return 1;
 	}
 
+	if (ibt && !lto) {
+		fprintf(stderr, "--ibt requires: --lto\n");
+		return 1;
+	}
+
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
@@ -3579,6 +3787,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
 
+	if (ibt) {
+		ret = validate_ibt(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 	if (!warnings) {
 		ret = validate_reachable_instructions(file);
 		if (ret < 0)
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,8 @@
 
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
+	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
+	    ibt;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -28,6 +28,9 @@ struct objtool_file {
 	struct list_head mcount_loc_list;
 	bool ignore_unreachables, c_file, hints, rodata;
 
+	unsigned int nr_endbr;
+	unsigned int nr_endbr_int;
+
 	unsigned long jl_short, jl_long;
 	unsigned long jl_nop_short, jl_nop_long;
 



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 35/39] objtool: IBT fix direct JMP/CALL
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (33 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 34/39] objtool: Validate IBT assumptions Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 36/39] objtool: Find unused ENDBR instructions Peter Zijlstra
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Optionally rewrite all direct JMP/CALL instructions that target an ENDBR.

By doing this it is guaranteed that only indirect code flow uses
ENDBR, at which point it becomes possible to poison unused ENDBR
instructions (a later patch).

Because this relies on --lto, the only direct code flow missed is that
fixed up by the module loader.
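
Concretely (hypothetical symbol), the rewrite turns a direct call that
targets a function's ENDBR into one that lands just past it:

  func:
	endbr64			# 4 bytes at func+0
	...

	call	func		# before: direct call hits the ENDBR
	call	func+4		# after --ibt-fix-direct: skips it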

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/arch/x86/decode.c         |   71 ++++++++++++++++++++++++++++++++
 tools/objtool/builtin-check.c           |    3 -
 tools/objtool/check.c                   |   45 ++++++++++++++++++--
 tools/objtool/include/objtool/arch.h    |    1 
 tools/objtool/include/objtool/builtin.h |    2 
 5 files changed, 116 insertions(+), 6 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -729,6 +729,77 @@ const char *arch_nop_insn(int len)
 	return nops[len-1];
 }
 
+const char *arch_mod_immediate(struct instruction *insn, unsigned long target)
+{
+	struct section *sec = insn->sec;
+	Elf_Data *data = sec->data;
+	unsigned char op1, op2;
+	static char bytes[16];
+	struct insn x86_insn;
+	int ret, disp;
+
+	disp = (long)(target - (insn->offset + insn->len));
+
+	if (data->d_type != ELF_T_BYTE || data->d_off) {
+		WARN("unexpected data for section: %s", sec->name);
+		return NULL;
+	}
+
+	ret = insn_decode(&x86_insn, data->d_buf + insn->offset, insn->len,
+			  INSN_MODE_64);
+	if (ret < 0) {
+		WARN("can't decode instruction at %s:0x%lx", sec->name, insn->offset);
+		return NULL;
+	}
+
+	op1 = x86_insn.opcode.bytes[0];
+	op2 = x86_insn.opcode.bytes[1];
+
+	switch (op1) {
+	case 0x0f: /* escape */
+		switch (op2) {
+		case 0x80 ... 0x8f: /* jcc.d32 */
+			if (insn->len != 6)
+				return NULL;
+			bytes[0] = op1;
+			bytes[1] = op2;
+			*(int *)&bytes[2] = disp;
+			break;
+
+		default:
+			return NULL;
+		}
+		break;
+
+	case 0x70 ... 0x7f: /* jcc.d8 */
+	case 0xeb: /* jmp.d8 */
+		if (insn->len != 2)
+			return NULL;
+
+		if (disp >> 7 != disp >> 31) {
+			WARN("displacement doesn't fit\n");
+			return NULL;
+		}
+
+		bytes[0] = op1;
+		bytes[1] = disp & 0xff;
+		break;
+
+	case 0xe8: /* call */
+	case 0xe9: /* jmp.d32 */
+		if (insn->len != 5)
+			return NULL;
+		bytes[0] = op1;
+		*(int *)&bytes[1] = disp;
+		break;
+
+	default:
+		return NULL;
+	}
+
+	return bytes;
+}
+
 #define BYTE_RET	0xC3
 
 const char *arch_ret_insn(int len)
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -21,7 +21,7 @@
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
      lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
-     ibt;
+     ibt, ibt_fix_direct;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -49,6 +49,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
 	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
 	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
+	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
 	OPT_END(),
 };
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1240,9 +1240,25 @@ static int add_jump_destinations(struct
 		    insn->jump_dest->func &&
 		    insn->jump_dest->offset == insn->jump_dest->func->offset) {
 			if (reloc) {
-				WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+				if (ibt_fix_direct) {
+					reloc->addend += 4;
+					elf_write_reloc(file->elf, reloc);
+				} else {
+					WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+				}
 			} else {
-				WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+				if (ibt_fix_direct) {
+					const char *bytes = arch_mod_immediate(insn, dest_off + 4);
+					if (bytes) {
+						elf_write_insn(file->elf, insn->sec,
+							       insn->offset, insn->len,
+							       bytes);
+					} else {
+						WARN_FUNC("Direct IMM jump to ENDBR; cannot fix", insn->sec, insn->offset);
+					}
+				} else {
+					WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+				}
 			}
 		}
 
@@ -1378,9 +1394,25 @@ static int add_call_destinations(struct
 
 		if (ibt && target && target->type == INSN_ENDBR) {
 			if (reloc) {
-				WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+				if (ibt_fix_direct) {
+					reloc->addend += 4;
+					elf_write_reloc(file->elf, reloc);
+				} else {
+					WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+				}
 			} else {
-				WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+				if (ibt_fix_direct) {
+					const char *bytes = arch_mod_immediate(insn, dest_off + 4);
+					if (bytes) {
+						elf_write_insn(file->elf, insn->sec,
+							       insn->offset, insn->len,
+							       bytes);
+					} else {
+						WARN_FUNC("Direct IMM call to ENDBR; cannot fix", insn->sec, insn->offset);
+					}
+				} else {
+					WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+				}
 			}
 		}
 	}
@@ -3740,6 +3772,11 @@ int check(struct objtool_file *file)
 		return 1;
 	}
 
+	if (ibt_fix_direct && !ibt) {
+		fprintf(stderr, "--ibt-fix-direct requires: --ibt\n");
+		return 1;
+	}
+
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -85,6 +85,7 @@ unsigned long arch_dest_reloc_offset(int
 
 const char *arch_nop_insn(int len);
 const char *arch_ret_insn(int len);
+const char *arch_mod_immediate(struct instruction *insn, unsigned long target);
 
 int arch_decode_hint_reg(u8 sp_reg, int *base);
 
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -10,7 +10,7 @@
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
 	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
-	    ibt;
+	    ibt, ibt_fix_direct;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 36/39] objtool: Find unused ENDBR instructions
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (34 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 35/39] objtool: IBT fix direct JMP/CALL Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-27  3:46   ` Josh Poimboeuf
  2022-02-24 14:52 ` [PATCH v2 37/39] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
                   ` (3 subsequent siblings)
  39 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Find all unused ENDBR instructions and stick them in a section such
that the kernel can poison them.
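
A sketch of the consumer side (see the final patch in this series for
the real thing): .ibt_endbr_sites is an array of 32-bit PC-relative
offsets, each pointing at an ENDBR that is never the target of an
indirect branch:

  extern s32 __ibt_endbr_sites[], __ibt_endbr_sites_end[];

  static void walk_endbr_sites(void)
  {
	s32 *s;

	for (s = __ibt_endbr_sites; s < __ibt_endbr_sites_end; s++) {
		void *addr = (void *)s + *s;	/* decode PC-relative entry */
		/* ... poison or audit the ENDBR at addr ... */
	}
  }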

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/vmlinux.lds.S           |    9 ++++
 tools/objtool/builtin-check.c           |    3 -
 tools/objtool/check.c                   |   72 +++++++++++++++++++++++++++++++-
 tools/objtool/include/objtool/builtin.h |    2 
 tools/objtool/include/objtool/objtool.h |    1 
 tools/objtool/objtool.c                 |    1 
 6 files changed, 85 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -285,6 +285,15 @@ SECTIONS
 	}
 #endif
 
+#ifdef CONFIG_X86_KERNEL_IBT
+	. = ALIGN(8);
+	.ibt_endbr_sites : AT(ADDR(.ibt_endbr_sites) - LOAD_OFFSET) {
+		__ibt_endbr_sites = .;
+		*(.ibt_endbr_sites)
+		__ibt_endbr_sites_end = .;
+	}
+#endif
+
 	/*
 	 * struct alt_inst entries. From the header (alternative.h):
 	 * "Alternative instructions for different CPU types or capabilities"
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -21,7 +21,7 @@
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
      lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
-     ibt, ibt_fix_direct;
+     ibt, ibt_fix_direct, ibt_seal;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -50,6 +50,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
 	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
 	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
+	OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "list superfluous ENDBR instructions"),
 	OPT_END(),
 };
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -732,6 +732,58 @@ static int create_retpoline_sites_sectio
 	return 0;
 }
 
+static int create_ibt_endbr_sites_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".ibt_endbr_sites");
+	if (sec) {
+		WARN("file already has .ibt_endbr_sites, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->endbr_list, call_node)
+		idx++;
+
+	if (stats) {
+		printf("ibt: ENDBR at function start: %d\n", file->nr_endbr);
+		printf("ibt: ENDBR inside functions:  %d\n", file->nr_endbr_int);
+		printf("ibt: superfluous ENDBR:       %d\n", idx);
+	}
+
+	if (!idx)
+		return 0;
+
+	sec = elf_create_section(file->elf, ".ibt_endbr_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .ibt_endbr_sites");
+		return -1;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->endbr_list, call_node) {
+
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .ibt_endbr_sites");
+			return -1;
+		}
+
+		idx++;
+	}
+
+	return 0;
+}
+
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1179,6 +1231,7 @@ static int add_jump_destinations(struct
 	for_each_insn(file, insn) {
 		if (insn->type == INSN_ENDBR && insn->func) {
 			if (insn->offset == insn->func->offset) {
+				list_add_tail(&insn->call_node, &file->endbr_list);
 				file->nr_endbr++;
 			} else {
 				file->nr_endbr_int++;
@@ -3633,8 +3687,12 @@ validate_ibt_reloc(struct objtool_file *
 	if (!dest)
 		return NULL;
 
-	if (dest->type == INSN_ENDBR)
+	if (dest->type == INSN_ENDBR) {
+		if (!list_empty(&dest->call_node))
+			list_del_init(&dest->call_node);
+
 		return NULL;
+	}
 
 	if (reloc->sym->static_call_tramp)
 		return NULL;
@@ -3777,6 +3835,11 @@ int check(struct objtool_file *file)
 		return 1;
 	}
 
+	if (ibt_seal && !ibt_fix_direct) {
+		fprintf(stderr, "--ibt-seal requires: --ibt-fix-direct\n");
+		return 1;
+	}
+
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
@@ -3854,6 +3917,13 @@ int check(struct objtool_file *file)
 		if (ret < 0)
 			goto out;
 		warnings += ret;
+	}
+
+	if (ibt_seal) {
+		ret = create_ibt_endbr_sites_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
 	}
 
 	if (stats) {
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -10,7 +10,7 @@
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
 	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
-	    ibt, ibt_fix_direct;
+	    ibt, ibt_fix_direct, ibt_seal;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -26,6 +26,7 @@ struct objtool_file {
 	struct list_head retpoline_call_list;
 	struct list_head static_call_list;
 	struct list_head mcount_loc_list;
+	struct list_head endbr_list;
 	bool ignore_unreachables, c_file, hints, rodata;
 
 	unsigned int nr_endbr;
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -128,6 +128,7 @@ struct objtool_file *objtool_open_read(c
 	INIT_LIST_HEAD(&file.retpoline_call_list);
 	INIT_LIST_HEAD(&file.static_call_list);
 	INIT_LIST_HEAD(&file.mcount_loc_list);
+	INIT_LIST_HEAD(&file.endbr_list);
 	file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment");
 	file.ignore_unreachables = no_unreachable;
 	file.hints = false;



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 37/39] x86/ibt: Finish --ibt-fix-direct on module loading
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (35 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 36/39] objtool: Find unused ENDBR instructions Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 38/39] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Since modules are, by construction, not fully linked objects, the
LTO-like objtool pass cannot fix up direct calls to external symbols.

Have the module loader finish the job.
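
A sketch of the fixup arithmetic, assuming the usual -4 addend on
R_X86_64_PLT32:

  /*
   *   val       = st_value + r_addend	(== target - 4)
   *   val + 4   = branch target	-> is_endbr() checked there
   *   val += 4				-> rel32 now lands past the ENDBR
   *   *(s32 *)loc = val - (u64)loc	(done by the existing code)
   */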

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/module.c |   40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -24,6 +24,7 @@
 #include <asm/page.h>
 #include <asm/setup.h>
 #include <asm/unwind.h>
+#include <asm/ibt.h>
 
 #if 0
 #define DEBUGP(fmt, ...)				\
@@ -128,6 +129,33 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 	return 0;
 }
 #else /*X86_64*/
+
+static inline void ibt_fix_direct(void *loc, u64 *val)
+{
+#ifdef CONFIG_X86_KERNEL_IBT
+	const void *addr = (void *)(4 + *val);
+	union text_poke_insn text;
+	u32 insn;
+
+	if (get_kernel_nofault(insn, addr))
+		return;
+
+	if (!is_endbr(insn))
+		return;
+
+	/* validate jmp.d32/call @ loc */
+	if (WARN_ONCE(get_kernel_nofault(text, loc-1) ||
+		      (text.opcode != CALL_INSN_OPCODE &&
+		       text.opcode != JMP32_INSN_OPCODE),
+		      "Unexpected code at: %pS\n", loc))
+		return;
+
+	DEBUGP("ibt_fix_direct: %pS\n", addr);
+
+	*val += 4;
+#endif
+}
+
 static int __apply_relocate_add(Elf64_Shdr *sechdrs,
 		   const char *strtab,
 		   unsigned int symindex,
@@ -139,6 +167,7 @@ static int __apply_relocate_add(Elf64_Sh
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
 	Elf64_Sym *sym;
 	void *loc;
+	int type;
 	u64 val;
 
 	DEBUGP("Applying relocate section %u to %u\n",
@@ -153,13 +182,14 @@ static int __apply_relocate_add(Elf64_Sh
 		sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
 			+ ELF64_R_SYM(rel[i].r_info);
 
+		type = ELF64_R_TYPE(rel[i].r_info);
+
 		DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
-		       (int)ELF64_R_TYPE(rel[i].r_info),
-		       sym->st_value, rel[i].r_addend, (u64)loc);
+		       type, sym->st_value, rel[i].r_addend, (u64)loc);
 
 		val = sym->st_value + rel[i].r_addend;
 
-		switch (ELF64_R_TYPE(rel[i].r_info)) {
+		switch (type) {
 		case R_X86_64_NONE:
 			break;
 		case R_X86_64_64:
@@ -185,6 +215,10 @@ static int __apply_relocate_add(Elf64_Sh
 		case R_X86_64_PLT32:
 			if (*(u32 *)loc != 0)
 				goto invalid_relocation;
+
+			if (type == R_X86_64_PLT32)
+				ibt_fix_direct(loc, &val);
+
 			val -= (u64)loc;
 			write(loc, &val, 4);
 #if 0



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 38/39] x86/ibt: Ensure module init/exit points have references
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (36 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 37/39] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 14:52 ` [PATCH v2 39/39] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
  2022-02-24 20:26 ` [PATCH v2 00/39] x86: Kernel IBT Josh Poimboeuf
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Since the module init/exit points have only external references, a
module LTO run will consider them 'unused' and seal them, leading to
an immediate failure on module load.
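
For illustration (hypothetical instantiation), with IBT and !CFI_CLANG
the macro below would expand to roughly:

  /* __CFI_ADDRESSABLE(init_module, __initdata) */
  const void *__cfi_jt_init_module __visible __initdata = (void *)&init_module;

keeping a visible data reference to the function so the LTO-like pass
cannot consider it unused.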

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cfi.h |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/include/linux/cfi.h
+++ b/include/linux/cfi.h
@@ -34,8 +34,17 @@ static inline void cfi_module_remove(str
 
 #else /* !CONFIG_CFI_CLANG */
 
-#define __CFI_ADDRESSABLE(fn, __attr)
+#ifdef CONFIG_X86_KERNEL_IBT
+
+#define __CFI_ADDRESSABLE(fn, __attr) \
+	const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn
+
+#endif /* CONFIG_X86_KERNEL_IBT */
 
 #endif /* CONFIG_CFI_CLANG */
 
+#ifndef __CFI_ADDRESSABLE
+#define __CFI_ADDRESSABLE(fn, __attr)
+#endif
+
 #endif /* _LINUX_CFI_H */



^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH v2 39/39] x86/alternative: Use .ibt_endbr_sites to seal indirect calls
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (37 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 38/39] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
@ 2022-02-24 14:52 ` Peter Zijlstra
  2022-02-24 20:26 ` [PATCH v2 00/39] x86: Kernel IBT Josh Poimboeuf
  39 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 14:52 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov

Objtool's --ibt-seal option generates .ibt_endbr_sites, which lists
superfluous ENDBR instructions: those in functions that are never
indirectly called.

Additionally, objtool's --ibt-fix-direct ensures direct calls never
target an ENDBR instruction.

Combined, this means these instructions should never be executed.

Poison them using a 4-byte UD1 instruction; on IBT hardware this will
raise a #CP exception because WAIT-FOR-ENDBR does not get what it
wants. On !IBT hardware it'll trigger #UD.

In either case, it will be 'impossible' to indirectly call these
functions thereafter.
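
For reference, the poison matches gen_endbr_poison() below; as bytes
in memory it is:

  0f b9 40 00		ud1 0x0(%eax),%eax

which reads back as the little-endian u32 0x0040b90f that is_endbr()
is taught to also accept.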

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/um/kernel/um_arch.c           |    4 +++
 arch/x86/Kconfig                   |   12 +++++++++
 arch/x86/include/asm/alternative.h |    1 
 arch/x86/include/asm/ibt.h         |    8 ++++++
 arch/x86/kernel/alternative.c      |   41 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/module.c           |   10 ++++++--
 arch/x86/kernel/traps.c            |   45 +++++++++++++++++++++++++++++++------
 scripts/link-vmlinux.sh            |   10 ++++++--
 8 files changed, 120 insertions(+), 11 deletions(-)

--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -424,6 +424,10 @@ void __init check_bugs(void)
 	os_check_bugs();
 }
 
+void apply_ibt_endbr(s32 *start, s32 *end)
+{
+}
+
 void apply_retpolines(s32 *start, s32 *end)
 {
 }
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1878,6 +1878,18 @@ config X86_KERNEL_IBT
 	  an ENDBR instruction, as such, the compiler will litter the
 	  code with them to make this happen.
 
+config X86_KERNEL_IBT_SEAL
+	prompt "Seal functions"
+	def_bool y
+	depends on X86_KERNEL_IBT && STACK_VALIDATION
+	help
+	  In addition to building the kernel with IBT, seal all functions that
+	  are not indirect call targets, avoiding them ever becoming one.
+
+	  This requires LTO-like objtool runs and will slow down the build. It
+	  does significantly reduce the number of ENDBR instructions in the
+	  kernel image as well as provide some validation for !IBT hardware.
+
 config X86_INTEL_MEMORY_PROTECTION_KEYS
 	prompt "Memory Protection Keys"
 	def_bool y
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -76,6 +76,7 @@ extern int alternatives_patched;
 extern void alternative_instructions(void);
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 extern void apply_retpolines(s32 *start, s32 *end);
+extern void apply_ibt_endbr(s32 *start, s32 *end);
 
 struct module;
 
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -31,8 +31,16 @@ static inline __attribute_const__ unsign
 	return endbr;
 }
 
+static inline unsigned int gen_endbr_poison(void)
+{
+	return 0x0040b90f; /* ud1 0x0(%eax), %eax */
+}
+
 static inline bool is_endbr(unsigned int val)
 {
+	if (val == gen_endbr_poison())
+		return true;
+
 	val &= ~0x01000000U; /* ENDBR32 -> ENDBR64 */
 	return val == gen_endbr();
 }
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -115,6 +115,7 @@ static void __init_or_module add_nops(vo
 }
 
 extern s32 __retpoline_sites[], __retpoline_sites_end[];
+extern s32 __ibt_endbr_sites[], __ibt_endbr_sites_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -512,6 +513,44 @@ void __init_or_module noinline apply_ret
 
 #endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
 
+#ifdef CONFIG_X86_KERNEL_IBT_SEAL
+
+/*
+ * Generated by: objtool --ibt-seal
+ */
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++) {
+		void *addr = (void *)s + *s;
+		u32 ud1 = gen_endbr_poison();
+		u32 endbr;
+
+		if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr)))
+			continue;
+
+		if (WARN_ON_ONCE(!is_endbr(endbr)))
+			continue;
+
+		DPRINTK("ENDBR at: %pS (%px)", addr, addr);
+
+		/*
+		 * When we have IBT, the lack of ENDBR will trigger #CP
+		 * When we don't have IBT, explicitly trigger #UD
+		 */
+		DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
+		DUMP_BYTES(((u8*)&ud1), 4, "%px: repl: ", addr);
+		text_poke_early(addr, &ud1, 4);
+	}
+}
+
+#else
+
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_X86_KERNEL_IBT_SEAL */
+
 #ifdef CONFIG_SMP
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
@@ -833,6 +872,8 @@ void __init alternative_instructions(voi
 	 */
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
+	apply_ibt_endbr(__ibt_endbr_sites, __ibt_endbr_sites_end);
+
 #ifdef CONFIG_SMP
 	/* Patch to UP if other cpus not imminent. */
 	if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) {
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -132,7 +132,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 
 static inline void ibt_fix_direct(void *loc, u64 *val)
 {
-#ifdef CONFIG_X86_KERNEL_IBT
+#ifdef CONFIG_X86_KERNEL_IBT_SEAL
 	const void *addr = (void *)(4 + *val);
 	union text_poke_insn text;
 	u32 insn;
@@ -287,7 +287,7 @@ int module_finalize(const Elf_Ehdr *hdr,
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
 		*para = NULL, *orc = NULL, *orc_ip = NULL,
-		*retpolines = NULL;
+		*retpolines = NULL, *ibt_endbr = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -305,6 +305,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 			orc_ip = s;
 		if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
 			retpolines = s;
+		if (!strcmp(".ibt_endbr_sites", secstrings + s->sh_name))
+			ibt_endbr = s;
 	}
 
 	/*
@@ -324,6 +326,10 @@ int module_finalize(const Elf_Ehdr *hdr,
 		void *aseg = (void *)alt->sh_addr;
 		apply_alternatives(aseg, aseg + alt->sh_size);
 	}
+	if (ibt_endbr) {
+		void *iseg = (void *)ibt_endbr->sh_addr;
+		apply_ibt_endbr(iseg, iseg + ibt_endbr->sh_size);
+	}
 	if (locks && text) {
 		void *lseg = (void *)locks->sh_addr;
 		void *tseg = (void *)text->sh_addr;
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -214,6 +214,17 @@ DEFINE_IDTENTRY(exc_overflow)
 
 static __ro_after_init bool ibt_fatal = true;
 
+static void handle_endbr(struct pt_regs *regs)
+{
+	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+	if (!ibt_fatal) {
+		printk(KERN_DEFAULT CUT_HERE);
+		__warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL);
+		return;
+	}
+	BUG();
+}
+
 extern const unsigned long ibt_selftest_ip; /* defined in asm below */
 
 DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
@@ -231,13 +242,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_control_pr
 		return;
 	}
 
-	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
-	if (!ibt_fatal) {
-		printk(KERN_DEFAULT CUT_HERE);
-		__warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL);
-		return;
-	}
-	BUG();
+	handle_endbr(regs);
 }
 
 bool ibt_selftest(void)
@@ -278,6 +283,29 @@ static int __init ibt_setup(char *str)
 
 __setup("ibt=", ibt_setup);
 
+static bool handle_ud1_endbr(struct pt_regs *regs)
+{
+	u32 ud1;
+
+	if (get_kernel_nofault(ud1, (u32 *)regs->ip))
+		return false;
+
+	if (ud1 == gen_endbr_poison()) {
+		handle_endbr(regs);
+		regs->ip += 4;
+		return true;
+	}
+
+	return false;
+}
+
+#else /* CONFIG_X86_KERNEL_IBT */
+
+static bool handle_ud1_endbr(struct pt_regs *regs)
+{
+	return false;
+}
+
 #endif /* CONFIG_X86_KERNEL_IBT */
 
 #ifdef CONFIG_X86_F00F_BUG
@@ -286,6 +314,9 @@ void handle_invalid_op(struct pt_regs *r
 static inline void handle_invalid_op(struct pt_regs *regs)
 #endif
 {
+	if (!user_mode(regs) && handle_ud1_endbr(regs))
+		return;
+
 	do_error_trap(regs, 0, "invalid opcode", X86_TRAP_UD, SIGILL,
 		      ILL_ILLOPN, error_get_trap_addr(regs));
 }
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -108,7 +108,9 @@ objtool_link()
 	local objtoolcmd;
 	local objtoolopt;
 
-	if is_enabled CONFIG_LTO_CLANG && is_enabled CONFIG_STACK_VALIDATION; then
+	if is_enabled CONFIG_STACK_VALIDATION && \
+	   ( is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_KERNEL_IBT_SEAL ); then
+
 		# Don't perform vmlinux validation unless explicitly requested,
 		# but run objtool on vmlinux.o now that we have an object file.
 		if is_enabled CONFIG_UNWINDER_ORC; then
@@ -117,6 +119,10 @@ objtool_link()
 
 		objtoolopt="${objtoolopt} --lto"
 
+		if is_enabled CONFIG_X86_KERNEL_IBT_SEAL; then
+			objtoolopt="${objtoolopt} --ibt --ibt-fix-direct --ibt-seal"
+		fi
+
 		if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
 			objtoolopt="${objtoolopt} --mcount"
 		fi
@@ -168,7 +174,7 @@ vmlinux_link()
 	# skip output file argument
 	shift
 
-	if is_enabled CONFIG_LTO_CLANG; then
+	if is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_KERNEL_IBT_SEAL; then
 		# Use vmlinux.o instead of performing the slow LTO link again.
 		objs=vmlinux.o
 		libs=



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice
  2022-02-24 14:51 ` [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice Peter Zijlstra
@ 2022-02-24 15:36   ` Peter Zijlstra
  2022-02-24 15:42     ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 15:36 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:52PM +0100, Peter Zijlstra wrote:
> @@ -316,10 +317,12 @@ SYM_FUNC_START(return_to_handler)
>  
>  	call ftrace_return_to_handler
>  
> -	movq %rax, %rdi
> +	movq %rax, 16(%rsp)
>  	movq 8(%rsp), %rdx
>  	movq (%rsp), %rax
> -	addq $24, %rsp
> -	JMP_NOSPEC rdi
> +
> +	addq $16, %rsp
> +	UNWIND_HINT_FUNC
> +	RET
>  SYM_FUNC_END(return_to_handler)
>  #endif

While talking about this with Mark, an alternative solution is something
like this, that would keep the RSB balanced and only mess up the current
return.

No idea if it makes an appreciable difference on current hardware,
therefore I went with the simpler option above.

@@ -307,7 +315,7 @@ EXPORT_SYMBOL(__fentry__)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 SYM_FUNC_START(return_to_handler)
-	subq  $24, %rsp
+	subq  $16, %rsp
 
 	/* Save the return values */
 	movq %rax, (%rsp)
@@ -319,7 +327,13 @@ SYM_FUNC_START(return_to_handler)
 	movq %rax, %rdi
 	movq 8(%rsp), %rdx
 	movq (%rsp), %rax
-	addq $24, %rsp
-	JMP_NOSPEC rdi
+
+	addq $16, %rsp
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call .Ldo_rop
+.Ldo_rop:
+	mov %rdi, (%rsp)
+	UNWIND_HINT_FUNC
+	RET
 SYM_FUNC_END(return_to_handler)
 #endif



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice
  2022-02-24 15:36   ` Peter Zijlstra
@ 2022-02-24 15:42     ` Steven Rostedt
  2022-02-24 23:09       ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-02-24 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, mhiramat, alexei.starovoitov

On Thu, 24 Feb 2022 16:36:57 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Feb 24, 2022 at 03:51:52PM +0100, Peter Zijlstra wrote:
> > @@ -316,10 +317,12 @@ SYM_FUNC_START(return_to_handler)
> >  
> >  	call ftrace_return_to_handler
> >  
> > -	movq %rax, %rdi
> > +	movq %rax, 16(%rsp)
> >  	movq 8(%rsp), %rdx
> >  	movq (%rsp), %rax
> > -	addq $24, %rsp
> > -	JMP_NOSPEC rdi
> > +
> > +	addq $16, %rsp
> > +	UNWIND_HINT_FUNC
> > +	RET
> >  SYM_FUNC_END(return_to_handler)
> >  #endif  
> 
> While talking about this with Mark, an alternative solution is something
> like this, that would keep the RSB balanced and only mess up the current
> return.
> 
> No idea if it makes an appreciable difference on current hardware,
> therefore I went with the simpler option above.
> 
> @@ -307,7 +315,7 @@ EXPORT_SYMBOL(__fentry__)
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  SYM_FUNC_START(return_to_handler)
> -	subq  $24, %rsp
> +	subq  $16, %rsp
>  
>  	/* Save the return values */
>  	movq %rax, (%rsp)
> @@ -319,7 +327,13 @@ SYM_FUNC_START(return_to_handler)
>  	movq %rax, %rdi
>  	movq 8(%rsp), %rdx
>  	movq (%rsp), %rax
> -	addq $24, %rsp
> -	JMP_NOSPEC rdi
> +
> +	addq $16, %rsp
> +	ANNOTATE_INTRA_FUNCTION_CALL
> +	call .Ldo_rop
> +.Ldo_rop:

What's the overhead of an added call (for every function call that is being
traced)?

-- Steve

> +	mov %rdi, (%rsp)
> +	UNWIND_HINT_FUNC
> +	RET
>  SYM_FUNC_END(return_to_handler)
>  #endif
> 


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 14:51 ` [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location Peter Zijlstra
@ 2022-02-24 15:55   ` Masami Hiramatsu
  2022-02-24 15:58     ` Steven Rostedt
  2022-02-25  0:55   ` Kees Cook
  2022-03-02 16:25   ` Naveen N. Rao
  2 siblings, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-24 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov

Hi Peter,

On Thu, 24 Feb 2022 15:51:50 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Have ftrace_location() search the symbol for the __fentry__ location
> when it isn't at func+0 and use this for {,un}register_ftrace_direct().
> 
> This avoids a whole bunch of assumptions about __fentry__ being at
> func+0.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/trace/ftrace.c |   30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1578,7 +1578,24 @@ unsigned long ftrace_location_range(unsi
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> -	return ftrace_location_range(ip, ip);
> +	struct dyn_ftrace *rec;
> +	unsigned long offset;
> +	unsigned long size;
> +
> +	rec = lookup_rec(ip, ip);
> +	if (!rec) {
> +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> +			goto out;
> +
> +		if (!offset)

Isn't this 'if (offset)' ?

> +			rec = lookup_rec(ip - offset, (ip - offset) + size);
> +	}
> +
> +	if (rec)
> +		return rec->ip;
> +
> +out:
> +	return 0;
>  }

Thank you,

>  
>  /**
> @@ -5110,11 +5127,16 @@ int register_ftrace_direct(unsigned long
>  	struct ftrace_func_entry *entry;
>  	struct ftrace_hash *free_hash = NULL;
>  	struct dyn_ftrace *rec;
> -	int ret = -EBUSY;
> +	int ret = -ENODEV;
>  
>  	mutex_lock(&direct_mutex);
>  
> +	ip = ftrace_location(ip);
> +	if (!ip)
> +		goto out_unlock;
> +
>  	/* See if there's a direct function at @ip already */
> +	ret = -EBUSY;
>  	if (ftrace_find_rec_direct(ip))
>  		goto out_unlock;
>  
> @@ -5222,6 +5244,10 @@ int unregister_ftrace_direct(unsigned lo
>  
>  	mutex_lock(&direct_mutex);
>  
> +	ip = ftrace_location(ip);
> +	if (!ip)
> +		goto out_unlock;
> +
>  	entry = find_direct_entry(&ip, NULL);
>  	if (!entry)
>  		goto out_unlock;
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 15:55   ` Masami Hiramatsu
@ 2022-02-24 15:58     ` Steven Rostedt
  2022-02-24 15:59       ` Steven Rostedt
                         ` (2 more replies)
  0 siblings, 3 replies; 183+ messages in thread
From: Steven Rostedt @ 2022-02-24 15:58 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Fri, 25 Feb 2022 00:55:20 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> >  unsigned long ftrace_location(unsigned long ip)
> >  {
> > -	return ftrace_location_range(ip, ip);
> > +	struct dyn_ftrace *rec;
> > +	unsigned long offset;
> > +	unsigned long size;
> > +
> > +	rec = lookup_rec(ip, ip);
> > +	if (!rec) {
> > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > +			goto out;
> > +
> > +		if (!offset)  
> 
> Isn't this 'if (offset)' ?

No, the point to only look for the fentry location if the ip passed in
points to the start of the function. IOW, +0 offset.

-- Steve


> 
> > +			rec = lookup_rec(ip - offset, (ip - offset) + size);
> > +	}
> > +
> > +	if (rec)
> > +		return rec->ip;
> > +
> > +out:
> > +	return 0;

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 15:58     ` Steven Rostedt
@ 2022-02-24 15:59       ` Steven Rostedt
  2022-02-24 16:01       ` Steven Rostedt
  2022-02-25  1:34       ` Masami Hiramatsu
  2 siblings, 0 replies; 183+ messages in thread
From: Steven Rostedt @ 2022-02-24 15:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Thu, 24 Feb 2022 10:58:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> No, the point to only look for the fentry location if the ip passed in

      "the point is to only look"

-- Steve

> points to the start of the function. IOW, +0 offset.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 15:58     ` Steven Rostedt
  2022-02-24 15:59       ` Steven Rostedt
@ 2022-02-24 16:01       ` Steven Rostedt
  2022-02-24 22:46         ` Josh Poimboeuf
  2022-02-25  1:34       ` Masami Hiramatsu
  2 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-02-24 16:01 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Thu, 24 Feb 2022 10:58:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 25 Feb 2022 00:55:20 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > >  unsigned long ftrace_location(unsigned long ip)
> > >  {
> > > -	return ftrace_location_range(ip, ip);
> > > +	struct dyn_ftrace *rec;
> > > +	unsigned long offset;
> > > +	unsigned long size;
> > > +
> > > +	rec = lookup_rec(ip, ip);
> > > +	if (!rec) {
> > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > +			goto out;
> > > +
> > > +		if (!offset)    
> > 
> > Isn't this 'if (offset)' ?  
> 
> No, the point to only look for the fentry location if the ip passed in
> points to the start of the function. IOW, +0 offset.
> 

I do agree with Masami that it is confusing. Please add a comment:

		/* Search the entire function if ip is the start of the function */
		if (!offset)
			[..]

-- Steve

> 
> 
> >   
> > > +			rec = lookup_rec(ip - offset, (ip - offset) + size);
> > > +	}
> > > +
> > > +	if (rec)
> > > +		return rec->ip;
> > > +
> > > +out:
> > > +	return 0;  


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 27/39] x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()
  2022-02-24 14:52 ` [PATCH v2 27/39] x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel() Peter Zijlstra
@ 2022-02-24 17:51   ` Andrew Cooper
  0 siblings, 0 replies; 183+ messages in thread
From: Andrew Cooper @ 2022-02-24 17:51 UTC (permalink / raw)
  To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Andrew Cooper

On 24/02/2022 14:52, Peter Zijlstra wrote:
> By doing an early rewrite of 'jmp native_iret' in
> restore_regs_and_return_to_kernel() we can get rid of the last
> INTERRUPT_RETURN user and paravirt_iret.
>
> Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

As an observation, if you move this earlier in the series, you'll reduce
the churn by not needing to take out ENDBRs which you inserted previously.

Patches 25-27 all look like they can be prerequisites, ahead of patch 5.

~Andrew

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
                   ` (38 preceding siblings ...)
  2022-02-24 14:52 ` [PATCH v2 39/39] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
@ 2022-02-24 20:26 ` Josh Poimboeuf
  2022-02-25 15:28   ` Peter Zijlstra
  39 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 20:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:38PM +0100, Peter Zijlstra wrote:
> Hi,
> 
> This is an even more complete Kernel IBT implementation.
> 
> Since last time (in no specific order):
> 
>  - Reworked Xen and paravirt bits lots (andyhpp)
>  - Reworked entry annotation (jpoimboe)
>  - Renamed CONFIG symbol to CONFIG_X86_KERNEL_IBT (redgecomb)
>  - Pinned CR4_CET (kees)
>  - Added __noendbr to CET control functions (kees)
>  - kexec (redgecomb)
>  - made function-graph, kprobes and bpf not explode (rostedt)
>  - cleanups and split ups (jpoimboe, mbenes)
>  - reworked whole module objtool (nathanchance)
>  - attempted and failed at making Clang go
> 
> Specifically to clang; I made clang-13 explode by rediscovering:
> https://reviews.llvm.org/D111108, then I tried clang-14 but it looks like
> ld.lld is still generating .plt entries out of thin air.
> 
> Also, I know the very first patch is somewhat controversial amonst the clang
> people, but I really think the current state of affairs is abysmal and this
> lets me at least use clang.
> 
> Patches are also available here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/wip.ibt

Bricked my SPR:

[   21.602888] jump_label: Fatal kernel bug, unexpected op at sched_clock_stable+0x4/0x20 [0000000074a0db20] (eb 06 b8 01 00 != eb 0a 00 00 00)) size:2 type:0
[   21.618489] ------------[ cut here ]------------
[   21.623706] kernel BUG at arch/x86/kernel/jump_label.c:73!
[   21.629903] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   21.630897] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G          I       5.17.0-rc5+ #3
[   21.630897] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0064.D15.2109031959 09/03/2021
[   21.630897] RIP: 0010:__jump_label_patch.cold.0+0x24/0x26
[   21.630897] Code: 9f e9 9d b1 5f ff 48 c7 c3 a8 44 65 a0 41 55 45 89 f1 49 89 d8 4c 89 e1 4c 89 e2 4c 89 e6 48 c7 c7 78 00 77 9f e8 e8 a8 00 00 <0f> 0b 48 89 fb 48 c7 c6 f0 03 77 9f 48 8d bf d0 00 00 00 e8 41 16
[   21.630897] RSP: 0000:ff7af25cc01cfd68 EFLAGS: 00010246
[   21.630897] RAX: 000000000000008f RBX: ffffffffa06544a8 RCX: 0000000000000001
[   21.630897] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 00000000ffffffff
[   21.630897] RBP: ff7af25cc01cfd98 R08: 0000000000000000 R09: c0000000fffeffff
[   21.630897] R10: 0000000000000001 R11: ff7af25cc01cfb88 R12: ffffffff9e520a74
[   21.630897] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff9f7293b7
[   21.630897] FS:  0000000000000000(0000) GS:ff422b3e2cc40000(0000) knlGS:0000000000000000
[   21.630897] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.630897] CR2: 0000000000000000 CR3: 0000001994810001 CR4: 0000000000f71ee0
[   21.630897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   21.630897] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   21.630897] PKRU: 55555554
[   21.630897] Call Trace:
[   21.630897]  <TASK>
[   21.630897]  arch_jump_label_transform_queue+0x37/0x80
[   21.630897]  __jump_label_update+0x74/0x130
[   21.630897]  ? sched_init+0x556/0x556
[   21.630897]  ? rdinit_setup+0x30/0x30
[   21.630897]  static_key_enable_cpuslocked+0x5b/0x90
[   21.630897]  static_key_enable+0x1a/0x20
[   21.630897]  sched_clock_init_late+0x7a/0x95
[   21.630897]  do_one_initcall+0x45/0x200
[   21.630897]  kernel_init_freeable+0x211/0x27a
[   21.630897]  ? rest_init+0xd0/0xd0
[   21.630897]  kernel_init+0x1a/0x130
[   21.630897]  ret_from_fork+0x1f/0x30
[   21.630897]  </TASK>
[   21.630897] Modules linked in:
[   21.838238] ---[ end trace 0000000000000000 ]---


ffffffff81120a70 <sched_clock_stable>:
ffffffff81120a70:       f3 0f 1e fa             endbr64
ffffffff81120a74:       eb 06                   jmp    ffffffff81120a7c <sched_clock_stable+0xc>
ffffffff81120a76:       b8 01 00 00 00          mov    $0x1,%eax
ffffffff81120a7b:       c3                      retq
ffffffff81120a7c:       f3 0f 1e fa             endbr64
ffffffff81120a80:       31 c0                   xor    %eax,%eax
ffffffff81120a82:       c3                      retq
ffffffff81120a83:       66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
ffffffff81120a8a:       00 00 00 00
ffffffff81120a8e:       66 90                   xchg   %ax,%ax


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust
  2022-02-24 14:51 ` [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
@ 2022-02-24 22:37   ` Josh Poimboeuf
  2022-02-25  0:42   ` Kees Cook
  1 sibling, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 22:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:45PM +0100, Peter Zijlstra wrote:
> Kernel entry points should be having ENDBR on for IBT configs.
> 
> The SYSCALL entry points are found through taking their respective
> address in order to program them in the MSRs, while the exception
> entry points are found through UNWIND_HINT_IRET_REGS.
> 
> The rule is that any UNWIND_HINT_IRET_REGS at sym+0 should have an
> ENDBR, see the later objtool ibt validation patch.

Could the "rule" be changed to only check global syms?  It seems
unlikely a local symbol would need ENDBR.

Then you wouldn't need this annotation:

>  SYM_CODE_START_LOCAL(early_idt_handler_common)
> +	UNWIND_HINT_IRET_REGS offset=16
> +	ANNOTATE_NOENDBR
>  	/*
>  	 * The stack is the hardware frame, an error code or zero, and the
>  	 * vector number.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries
  2022-02-24 14:51 ` [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
@ 2022-02-24 22:41   ` Josh Poimboeuf
  2022-02-25  0:50   ` Kees Cook
  1 sibling, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 22:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:48PM +0100, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/crypto/crc32c-pcl-intel-asm_64.S |    3 +++
>  1 file changed, 3 insertions(+)
> 
> --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> @@ -195,6 +195,7 @@ SYM_FUNC_START(crc_pcl)
>  .altmacro
>  LABEL crc_ %i
>  .noaltmacro
> +	ENDBR
>  	crc32q   -i*8(block_0), crc_init
>  	crc32q   -i*8(block_1), crc1
>  	crc32q   -i*8(block_2), crc2
> @@ -203,6 +204,7 @@ LABEL crc_ %i
>  
>  .altmacro
>  LABEL crc_ %i
> +	ENDBR
>  .noaltmacro

Minor inconsistency here in the placement of ENDBR.  Should probably go
below .noaltmacro in both cases.
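i.e. the second hunk becoming (a sketch of the consistent placement,
matching the first hunk):

.altmacro
LABEL crc_ %i
.noaltmacro
	ENDBR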

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 16:01       ` Steven Rostedt
@ 2022-02-24 22:46         ` Josh Poimboeuf
  2022-02-24 22:51           ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 22:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Peter Zijlstra, x86, joao, hjl.tools,
	andrew.cooper3, linux-kernel, ndesaulniers, keescook,
	samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	alexei.starovoitov

On Thu, Feb 24, 2022 at 11:01:30AM -0500, Steven Rostedt wrote:
> On Thu, 24 Feb 2022 10:58:47 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Fri, 25 Feb 2022 00:55:20 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > >  unsigned long ftrace_location(unsigned long ip)
> > > >  {
> > > > -	return ftrace_location_range(ip, ip);
> > > > +	struct dyn_ftrace *rec;
> > > > +	unsigned long offset;
> > > > +	unsigned long size;
> > > > +
> > > > +	rec = lookup_rec(ip, ip);
> > > > +	if (!rec) {
> > > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > > +			goto out;
> > > > +
> > > > +		if (!offset)    
> > > 
> > > Isn't this 'if (offset)' ?  
> > 
> > No, the point to only look for the fentry location if the ip passed in
> > points to the start of the function. IOW, +0 offset.
> > 
> 
> I do agree with Masami that it is confusing. Please add a comment:
> 
> 		/* Search the entire function if ip is the start of the function */
> 		if (!offset)
> 			[..]
> 
> -- Steve
> 
> > 
> > 
> > >   
> > > > +			rec = lookup_rec(ip - offset, (ip - offset) + size);

If 'offset' is zero then why the math here?      ^^^^^^^^^^^   ^^^^^^^^^^^
-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 22:46         ` Josh Poimboeuf
@ 2022-02-24 22:51           ` Steven Rostedt
  0 siblings, 0 replies; 183+ messages in thread
From: Steven Rostedt @ 2022-02-24 22:51 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Masami Hiramatsu, Peter Zijlstra, x86, joao, hjl.tools,
	andrew.cooper3, linux-kernel, ndesaulniers, keescook,
	samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	alexei.starovoitov

On Thu, 24 Feb 2022 14:46:31 -0800
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> > > > > +			rec = lookup_rec(ip - offset, (ip - offset) + size);  
> 
> If 'offset' is zero then why the math here?      ^^^^^^^^^^^   ^^^^^^^^^^^

Because it didn't check for offset being zero when we wrote that line. ;-)

Yes, checking for !offset makes that logic irrelevant.

Good catch.
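
For illustration, the simplified form would then be (a sketch):

	/* offset is known to be zero here; the range math drops out */
	if (!offset)
		rec = lookup_rec(ip, ip + size);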

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 13/39] x86/livepatch: Validate __fentry__ location
  2022-02-24 14:51 ` [PATCH v2 13/39] x86/livepatch: Validate " Peter Zijlstra
@ 2022-02-24 23:02   ` Josh Poimboeuf
  0 siblings, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 23:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:51PM +0100, Peter Zijlstra wrote:
> Currently livepatch assumes __fentry__ lives at func+0, which is most
> likely untrue with IBT on. Instead make it use ftrace_location() by
> default which both validates and finds the actual ip if there is any
> in the same symbol.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/livepatch.h |    9 +++++++++
>  kernel/livepatch/patch.c         |    2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -133,7 +133,7 @@ static void notrace klp_ftrace_handler(u
>  #ifndef klp_get_ftrace_location
>  static unsigned long klp_get_ftrace_location(unsigned long faddr)
>  {
> -	return faddr;
> +	return ftrace_location(faddr);

Now that ftrace is doing the dirty work, I think this means we can get
rid of klp_get_ftrace_location() altogether:

diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
index 4fe018cc207b..7b9dcd51af32 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -19,16 +19,6 @@ static inline void klp_arch_set_pc(struct ftrace_regs *fregs, unsigned long ip)
 	regs_set_return_ip(regs, ip);
 }
 
-#define klp_get_ftrace_location klp_get_ftrace_location
-static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
-{
-	/*
-	 * Live patch works only with -mprofile-kernel on PPC. In this case,
-	 * the ftrace location is always within the first 16 bytes.
-	 */
-	return ftrace_location_range(faddr, faddr + 16);
-}
-
 static inline void klp_init_thread_info(struct task_struct *p)
 {
 	/* + 1 to account for STACK_END_MAGIC */
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index fd295bbbcbc7..ed3464e68bda 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -124,19 +124,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 	ftrace_test_recursion_unlock(bit);
 }
 
-/*
- * Convert a function address into the appropriate ftrace location.
- *
- * Usually this is just the address of the function, but on some architectures
- * it's more complicated so allow them to provide a custom behaviour.
- */
-#ifndef klp_get_ftrace_location
-static unsigned long klp_get_ftrace_location(unsigned long faddr)
-{
-	return ftrace_location(faddr);
-}
-#endif
-
 static void klp_unpatch_func(struct klp_func *func)
 {
 	struct klp_ops *ops;
@@ -153,8 +140,7 @@ static void klp_unpatch_func(struct klp_func *func)
 	if (list_is_singular(&ops->func_stack)) {
 		unsigned long ftrace_loc;
 
-		ftrace_loc =
-			klp_get_ftrace_location((unsigned long)func->old_func);
+		ftrace_loc = ftrace_location((unsigned long)func->old_func);
 		if (WARN_ON(!ftrace_loc))
 			return;
 


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice
  2022-02-24 15:42     ` Steven Rostedt
@ 2022-02-24 23:09       ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-24 23:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 10:42:30AM -0500, Steven Rostedt wrote:
> On Thu, 24 Feb 2022 16:36:57 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Feb 24, 2022 at 03:51:52PM +0100, Peter Zijlstra wrote:
> > > @@ -316,10 +317,12 @@ SYM_FUNC_START(return_to_handler)
> > >  
> > >  	call ftrace_return_to_handler
> > >  
> > > -	movq %rax, %rdi
> > > +	movq %rax, 16(%rsp)
> > >  	movq 8(%rsp), %rdx
> > >  	movq (%rsp), %rax
> > > -	addq $24, %rsp
> > > -	JMP_NOSPEC rdi
> > > +
> > > +	addq $16, %rsp
> > > +	UNWIND_HINT_FUNC
> > > +	RET
> > >  SYM_FUNC_END(return_to_handler)
> > >  #endif  
> > 
> > While talking about this with Mark, an alternative solution is something
> > like this, that would keep the RSB balanced and only mess up the current
> > return.
> > 
> > No idea if it makes an appreciable difference on current hardware,
> > therefore I went with the simpler option above.
> > 
> > @@ -307,7 +315,7 @@ EXPORT_SYMBOL(__fentry__)
> >  
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >  SYM_FUNC_START(return_to_handler)
> > -	subq  $24, %rsp
> > +	subq  $16, %rsp
> >  
> >  	/* Save the return values */
> >  	movq %rax, (%rsp)
> > @@ -319,7 +327,13 @@ SYM_FUNC_START(return_to_handler)
> >  	movq %rax, %rdi
> >  	movq 8(%rsp), %rdx
> >  	movq (%rsp), %rax
> > -	addq $24, %rsp
> > -	JMP_NOSPEC rdi
> > +
> > +	addq $16, %rsp
> > +	ANNOTATE_INTRA_FUNCTION_CALL
> > +	call .Ldo_rop
> > +.Ldo_rop:
> 
> What's the overhead of an added call (for every function call that is being
> traced)?

Who knows :-) That's all u-arch magic and needs testing (on lots of
hardware). I suspect the dominating cost of all this code is the RSB
miss, not a few regular instructions.

So with this alternative we'll get one guaranteed RSB miss per
construction, but at least the RSB should be mostly good again
afterwards.

With the patch as proposed, the RSB is basically scrap because it is
left unbalanced.

The original patch changing it to an indirect call cited <2% performance
improvement IIRC, and that all was pre-speculation mess. This
alternative isn't much different from a retpoline (all that's missing is
the speculation trap).
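
For reference, piecing the hunks together, the complete alternative
tail would be something like (a sketch, not the final patch):

	movq %rax, %rdi			/* original return address */
	movq 8(%rsp), %rdx
	movq (%rsp), %rax

	addq $16, %rsp
	ANNOTATE_INTRA_FUNCTION_CALL
	call .Ldo_rop
.Ldo_rop:
	mov %rdi, (%rsp)		/* replace the pushed ip with the real return address */
	UNWIND_HINT_FUNC
	RET				/* consumes the RSB entry the call just pushed */
SYM_FUNC_END(return_to_handler)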


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline
  2022-02-24 14:51 ` [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline Peter Zijlstra
@ 2022-02-24 23:37   ` Josh Poimboeuf
  2022-02-25  0:59     ` Kees Cook
                       ` (2 more replies)
  0 siblings, 3 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 23:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:54PM +0100, Peter Zijlstra wrote:
> @@ -339,9 +350,18 @@ static int __bpf_arch_text_poke(void *ip
>  	u8 *prog;
>  	int ret;
>  
> +#ifdef CONFIG_X86_KERNEL_IBT
> +	if (is_endbr(*(u32 *)ip))
> +		ip += 4;
> +#endif
> +
>  	memcpy(old_insn, nop_insn, X86_PATCH_SIZE);
>  	if (old_addr) {
>  		prog = old_insn;
> +#ifdef CONFIG_X86_KERNEL_IBT
> +		if (is_endbr(*(u32 *)old_addr))
> +			old_addr += 4;
> +#endif
>  		ret = t == BPF_MOD_CALL ?
>  		      emit_call(&prog, old_addr, ip) :
>  		      emit_jump(&prog, old_addr, ip);
> @@ -352,6 +372,10 @@ static int __bpf_arch_text_poke(void *ip
>  	memcpy(new_insn, nop_insn, X86_PATCH_SIZE);
>  	if (new_addr) {
>  		prog = new_insn;
> +#ifdef CONFIG_X86_KERNEL_IBT
> +		if (is_endbr(*(u32 *)new_addr))
> +			new_addr += 4;
> +#endif

All the above ifdef-itis should be able to be removed since is_endbr()
returns false for !IBT.
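
i.e. the check can be unconditional (a sketch; is_endbr() is
compile-time false without CONFIG_X86_KERNEL_IBT):

	if (is_endbr(*(u32 *)ip))
		ip += 4;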

>  		ret = t == BPF_MOD_CALL ?
>  		      emit_call(&prog, new_addr, ip) :
>  		      emit_jump(&prog, new_addr, ip);
> @@ -2028,10 +2052,11 @@ int arch_prepare_bpf_trampoline(struct b
>  		/* skip patched call instruction and point orig_call to actual
>  		 * body of the kernel function.
>  		 */
> -		orig_call += X86_PATCH_SIZE;
> +		orig_call += X86_PATCH_SIZE + 4*HAS_KERNEL_IBT;

All the "4*HAS_KERNEL_IBT" everywhere is cute, but you might as well
just have IBT_ENDBR_SIZE (here and in other patches).
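
Something like (a sketch; macro name as above, placement assumed):

	#define IBT_ENDBR_SIZE	(4*HAS_KERNEL_IBT)

	orig_call += X86_PATCH_SIZE + IBT_ENDBR_SIZE;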

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-24 14:51 ` [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
@ 2022-02-24 23:55   ` Josh Poimboeuf
  2022-02-25 10:51     ` Peter Zijlstra
  2022-02-25  1:09   ` Kees Cook
  2022-02-25 19:59   ` Edgecombe, Rick P
  2 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-24 23:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:56PM +0100, Peter Zijlstra wrote:
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> +	u64 msr = CET_ENDBR_EN;
> +
> +	if (!HAS_KERNEL_IBT ||
> +	    !cpu_feature_enabled(X86_FEATURE_IBT))
> +		return;

If you add X86_FEATURE_IBT to arch/x86/include/asm/disabled-features.h,
the HAS_KERNEL_IBT check becomes redundant.
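
A sketch of that, following the existing pattern in
disabled-features.h (which DISABLED_MASK word it lands in depends on
where X86_FEATURE_IBT is defined):

	#ifdef CONFIG_X86_KERNEL_IBT
	# define DISABLE_IBT	0
	#else
	# define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
	#endif

With that, cpu_feature_enabled(X86_FEATURE_IBT) is compile-time false
for !IBT builds.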

> +bool ibt_selftest(void)
> +{
> +	unsigned long ret;
> +
> +	asm ("1: lea 2f(%%rip), %%rax\n\t"
> +	     ANNOTATE_RETPOLINE_SAFE
> +	     "   jmp *%%rax\n\t"
> +	     ASM_REACHABLE
> +	     ANNOTATE_NOENDBR
> +	     "2: nop\n\t"
> +
> +	     /* unsigned ibt_selftest_ip = 2b */
> +	     ".pushsection .rodata,\"a\"\n\t"
> +	     ".align 8\n\t"
> +	     ".type ibt_selftest_ip, @object\n\t"
> +	     ".size ibt_selftest_ip, 8\n\t"
> +	     "ibt_selftest_ip:\n\t"
> +	     ".quad 2b\n\t"
> +	     ".popsection\n\t"

It still seems silly to make this variable in asm.

Also .rodata isn't going to work for CPU hotplug.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-02-24 14:51 ` [PATCH v2 01/39] kbuild: Fix clang build Peter Zijlstra
@ 2022-02-25  0:11   ` Kees Cook
  2022-03-01 21:16   ` Nick Desaulniers
  1 sibling, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, Nathan Chancellor, llvm

nit: Subject maybe could be something more specific like:
"kbuild: Detect and apply clang version suffix to defaults"

On Thu, Feb 24, 2022 at 03:51:39PM +0100, Peter Zijlstra wrote:
> Debian (and derived) distros ship their compilers as -$ver suffixed
> binaries. For gcc it is sufficent to use:
> 
>  $ make CC=gcc-12
> 
> However, clang builds (esp. clang-lto) need a whole array of tools to be
> exactly right, leading to unweildy stuff like:

Yeah, I have had this problem with gcc versions too (when trying to
select older compilers when cross compiling).

> [...]
> which is, quite franktly, totally insane and unusable. Instead make
> the CC variable DTRT, enabling one such as myself to use:
> 
>  $ make CC=clang-13

This form is intended to mix clang and binutils, as there is a long tail
of (usually architecture-specific) issues with ld.lld and Clang's
assembler. It's only relatively recently that clang + ld.lld works
happily on x86_64. Growing a way to split that logic by architecture
might be interesting, but also confusing...

> This also lets one quickly test different clang versions.
> Additionally, also support path based LLVM suites like:
> 
>  $ make CC=/opt/llvm/bin/clang
> 
> This changes the default to LLVM=1 when CC is clang, mixing toolchains
> is still possible by explicitly adding LLVM=0.

I do like the path idea -- this is much cleaner in Clang's case than the
gcc/binutils mixture where a versioned binutils might not even exist in
a given environment. I've been fighting versioned cross compilers with
gcc and ld.bfd. It's almost worse. ;)

> [...]
> --- a/Makefile
> +++ b/Makefile
> @@ -423,9 +423,29 @@ HOST_LFS_CFLAGS := $(shell getconf LFS_C
>  HOST_LFS_LDFLAGS := $(shell getconf LFS_LDFLAGS 2>/dev/null)
>  HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null)
>  
> -ifneq ($(LLVM),)
> -HOSTCC	= clang
> -HOSTCXX	= clang++
> +# powerpc and s390 don't yet work with LLVM as a whole

Er, this, uh, doesn't really capture the matrix. ;) See
https://clangbuiltlinux.github.io/

> +ifeq ($(ARCH),powerpc)
> +LLVM = 0
> +endif
> +ifeq ($(ARCH),s390)
> +LLVM = 0
> +endif
> +
> +# otherwise, if CC=clang, default to using LLVM to enable LTO

I find this comment confusing: using LLVM=1 lets LTO be possible,
strictly speaking, it doesn't enable it.

> +CC_BASE := $(shell echo $(CC) | sed 's/.*\///')

I would expect $(shell basename $(CC)) (or similarly "dirname") for
these sorts of path manipulations, but I'll bet there's some Makefile
string syntax to do this too, to avoid $(shell) entirely...
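
For instance (a sketch using GNU make built-ins; untested, and note
$(dir) keeps the trailing slash and yields './' when there is no path
component, unlike the sed above which yields ''):

	CC_BASE  := $(notdir $(CC))
	LLVM_PFX := $(dir $(CC))
	LLVM_SFX := $(patsubst clang%,%,$(CC_BASE))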

> +CC_NAME := $(shell echo $(CC_BASE) | cut -b "1-5")

O_o  cut -d- -f1

> +ifeq ($(shell test "$(CC_NAME)" = "clang"; echo $$?),0)
> +LLVM ?= 1
> +LLVM_PFX := $(shell echo $(CC) | sed 's/\(.*\/\)\?.*/\1/')

dirname

> +LLVM_SFX := $(shell echo $(CC_BASE) | cut -b "6-")

cut -d- -f2-

> +endif

This versioned suffix logic I'm fine with, though I'd prefer we gain
this for both Clang and GCC, as I've needed it there too, specifically
with the handling of CROSS_COMPILE:

$ make CROSS_COMPILE=/path/to/abi- CC=/path/to/abi-gcc-10 LD=/path/to/abi-ld-binutilsversion

Anyway, I guess that could be a separate patch.

> +# if not set by now, do not use LLVM
> +LLVM ?= 0

I think, however, the LLVM state needs to stay default=0. The "how to
build with Clang" already details the "how" on this:
https://www.kernel.org/doc/html/latest/kbuild/llvm.html

But I do agree: extracting the version suffix would make things much
easier.

Regardless, even if LLVM=1 is made the default, I think that should be
separate from the path and suffix logic.

Also, please CC the "CLANG/LLVM BUILD SUPPORT" maintainers in the
future:
M:      Nathan Chancellor <nathan@kernel.org>
M:      Nick Desaulniers <ndesaulniers@google.com>
L:      llvm@lists.linux.dev
S:      Supported
W:      https://clangbuiltlinux.github.io/
B:      https://github.com/ClangBuiltLinux/linux/issues
C:      irc://irc.libera.chat/clangbuiltlinux
F:      Documentation/kbuild/llvm.rst
F:      include/linux/compiler-clang.h
F:      scripts/Makefile.clang
F:      scripts/clang-tools/
K:      \b(?i:clang|llvm)\b

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 04/39] objtool: Add --dry-run
  2022-02-24 14:51 ` [PATCH v2 04/39] objtool: Add --dry-run Peter Zijlstra
@ 2022-02-25  0:27   ` Kees Cook
  2022-03-01 14:37   ` Miroslav Benes
  1 sibling, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:42PM +0100, Peter Zijlstra wrote:
> Add a --dry-run argument to skip writing the modifications. This is
> convenient for debugging.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Yes please. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-24 14:51 ` [PATCH v2 05/39] x86: Base IBT bits Peter Zijlstra
@ 2022-02-25  0:35   ` Kees Cook
  2022-02-25  0:46     ` Nathan Chancellor
  2022-02-25 13:41     ` Peter Zijlstra
  0 siblings, 2 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, Nathan Chancellor, llvm

On Thu, Feb 24, 2022 at 03:51:43PM +0100, Peter Zijlstra wrote:
> Add Kconfig, Makefile and basic instruction support for x86 IBT.
> 
> XXX clang is not playing ball, probably lld being 'funny', I'm having
> problems with .plt entries appearing all over after linking.

I'll try to look into this; I know you've been chatting with Nathan
about it. Is there an open bug for it? (And any kind of reproducer
smaller than a 39 patch series we can show the linker folks?) :)

> [...]
> +config X86_KERNEL_IBT
> +	prompt "Indirect Branch Tracking"
> +	bool
> +	depends on X86_64 && CC_HAS_IBT
> +	help
> +	  Build the kernel with support for Indirect Branch Tracking, a
> +	  hardware supported CFI scheme. Any indirect call must land on

	  hardware-supported coarse-grain forward-edge Control Flow Integrity
	  protection. It enforces that all indirect calls must land on

> +	  an ENDBR instruction, as such, the compiler will litter the
> +	  code with them to make this happen.

"litter the code" -> "instrument the machine code".


> +
>  config X86_INTEL_MEMORY_PROTECTION_KEYS
>  	prompt "Memory Protection Keys"
>  	def_bool y
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -36,7 +36,7 @@ endif
>  
>  # How to compile the 16-bit code.  Note we always compile for -march=i386;
>  # that way we can complain to the user if the CPU is insufficient.
> -REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING \
> +REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS \
>  		   -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \
>  		   -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
>  		   -mno-mmx -mno-sse $(call cc-option,-fcf-protection=none)

This change seems important separately from this patch, yes? (Or at
least a specific call-out in the commit log.)

Otherwise, looks good.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 06/39] x86/ibt: Add ANNOTATE_NOENDBR
  2022-02-24 14:51 ` [PATCH v2 06/39] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
@ 2022-02-25  0:36   ` Kees Cook
  0 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:44PM +0100, Peter Zijlstra wrote:
> In order to have objtool warn about code references to !ENDBR
> instruction, we need an annotation to allow this for non-control-flow
> instances -- consider text range checks, text patching, or return
> trampolines etc.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust
  2022-02-24 14:51 ` [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
  2022-02-24 22:37   ` Josh Poimboeuf
@ 2022-02-25  0:42   ` Kees Cook
  2022-02-25  9:22     ` Andrew Cooper
  1 sibling, 1 reply; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:45PM +0100, Peter Zijlstra wrote:
> The SYSCALL entry points are found through taking their respective
> address in order to program them in the MSRs, while the exception
> entry points are found through UNWIND_HINT_IRET_REGS.

Stupid question: does CET consider exception and syscall entry points to
be indirect calls? (I would expect so, but they're ever so slightly
differently executed...)

> [...]
>  0 :
> +	ENDBR
>  	.byte	0x6a, vector
>  	jmp	asm_common_interrupt
> -	nop
> -	/* Ensure that the above is 8 bytes max */
> -	. = 0b + 8
> +	/* Ensure that the above is IDT_ALIGN bytes max */
> +	.fill 0b + IDT_ALIGN - ., 1, 0x90

IIUC, these are just padding -- let's use 0xcc instead of 0x90 as we do
in other places (e.g. vmlinux.lds.S).
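
i.e. (sketch):

	.fill 0b + IDT_ALIGN - ., 1, 0xcc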

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 08/39] x86/linkage: Add ENDBR to SYM_FUNC_START*()
  2022-02-24 14:51 ` [PATCH v2 08/39] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
@ 2022-02-25  0:45   ` Kees Cook
  0 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:46PM +0100, Peter Zijlstra wrote:
> Ensure the ASM functions have ENDBR on for IBT builds, this follows
> the ARM64 example. Unlike ARM64, we'll likely end up overwriting them
> with poison.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/linkage.h |   39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> --- a/arch/x86/include/asm/linkage.h
> +++ b/arch/x86/include/asm/linkage.h
> @@ -3,6 +3,7 @@
>  #define _ASM_X86_LINKAGE_H
>  
>  #include <linux/stringify.h>
> +#include <asm/ibt.h>
>  
>  #undef notrace
>  #define notrace __attribute__((no_instrument_function))
> @@ -34,5 +35,43 @@
>  
>  #endif /* __ASSEMBLY__ */
>  
> +/*
> + * compressed and purgatory define this to disable EXPORT,
> + * hijack this same to also not emit ENDBR.
> + */
> +#ifndef __DISABLE_EXPORTS

It's certainly cleaner to avoid increasing boot stub image size, but
there's no _harm_ in including them, yes?

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-25  0:35   ` Kees Cook
@ 2022-02-25  0:46     ` Nathan Chancellor
  2022-02-25 22:08       ` Nathan Chancellor
  2022-02-25 13:41     ` Peter Zijlstra
  1 sibling, 1 reply; 183+ messages in thread
From: Nathan Chancellor @ 2022-02-25  0:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	llvm

On Thu, Feb 24, 2022 at 04:35:51PM -0800, Kees Cook wrote:
> On Thu, Feb 24, 2022 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > Add Kconfig, Makefile and basic instruction support for x86 IBT.
> > 
> > XXX clang is not playing ball, probably lld being 'funny', I'm having
> > problems with .plt entries appearing all over after linking.
> 
> I'll try to look into this; I know you've been chatting with Nathan
> about it. Is there an open bug for it? (And any kind of reproducer
> smaller than a 39 patch series we can show the linker folks?) :)

I should be able to create a reproducer with cvise and file a bug on
GitHub around this tomorrow; I should have done it after Peter's
comments on IRC.

Cheers,
Nathan

> > [...]
> > +config X86_KERNEL_IBT
> > +	prompt "Indirect Branch Tracking"
> > +	bool
> > +	depends on X86_64 && CC_HAS_IBT
> > +	help
> > +	  Build the kernel with support for Indirect Branch Tracking, a
> > +	  hardware supported CFI scheme. Any indirect call must land on
> 
> 	  hardware-supported coarse-grain forward-edge Control Flow Integrity
> 	  protection. It enforces that all indirect calls must land on
> 
> > +	  an ENDBR instruction, as such, the compiler will litter the
> > +	  code with them to make this happen.
> 
> "litter the code" -> "instrument the machine code".
> 
> 
> > +
> >  config X86_INTEL_MEMORY_PROTECTION_KEYS
> >  	prompt "Memory Protection Keys"
> >  	def_bool y
> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -36,7 +36,7 @@ endif
> >  
> >  # How to compile the 16-bit code.  Note we always compile for -march=i386;
> >  # that way we can complain to the user if the CPU is insufficient.
> > -REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING \
> > +REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS \
> >  		   -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \
> >  		   -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
> >  		   -mno-mmx -mno-sse $(call cc-option,-fcf-protection=none)
> 
> This change seems important separately from this patch, yes? (Or at
> least a specific call-out in the commit log.)
> 
> Otherwise, looks good.
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 21/39] x86/ibt: Annotate text references
  2022-02-24 14:51 ` [PATCH v2 21/39] x86/ibt: Annotate text references Peter Zijlstra
@ 2022-02-25  0:47   ` Josh Poimboeuf
  2022-02-25 12:57     ` Peter Zijlstra
  2022-02-25 13:04     ` Peter Zijlstra
  0 siblings, 2 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25  0:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:59PM +0100, Peter Zijlstra wrote:
> @@ -563,12 +564,14 @@ SYM_CODE_END(\asmsym)
>  	.align 16
>  	.globl __irqentry_text_start
>  __irqentry_text_start:
> +	ANNOTATE_NOENDBR // unwinders

But the instruction here (first idt entry) actually does have an
endbr64...

Also I'm wondering if it would make sense to create an
'idt_entry_<vector>' symbol for each entry so objtool knows to validate
their ENDBRs.

> +++ b/arch/x86/lib/retpoline.S
> @@ -12,6 +12,8 @@
>  
>  	.section .text.__x86.indirect_thunk
>  
> +	ANNOTATE_NOENDBR // apply_retpolines

This should probably go after __x86_indirect_thunk_array?

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 09/39] x86/ibt,paravirt: Sprinkle ENDBR
  2022-02-24 14:51 ` [PATCH v2 09/39] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
@ 2022-02-25  0:47   ` Kees Cook
  0 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:47PM +0100, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/entry/entry_64.S                 |    1 +
>  arch/x86/include/asm/paravirt.h           |    1 +
>  arch/x86/include/asm/qspinlock_paravirt.h |    3 +++
>  arch/x86/kernel/kvm.c                     |    3 ++-
>  arch/x86/kernel/paravirt.c                |    2 ++
>  5 files changed, 9 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -635,6 +635,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
>  
>  SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
>  	UNWIND_HINT_IRET_REGS
> +	ENDBR // paravirt_iret

If this is also setting the stage for finer grain CFI schemes, should
these macros instead be something more generically named? Like,
INDIRECT_ENTRY, or so? I imagine that'd avoid future churn, but maybe
I'm pre-optimizing... Regardless:

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 24/39] x86/text-patching: Make text_gen_insn() IBT aware
  2022-02-24 14:52 ` [PATCH v2 24/39] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
@ 2022-02-25  0:49   ` Josh Poimboeuf
  0 siblings, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25  0:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:52:02PM +0100, Peter Zijlstra wrote:
> +	/*
> +	 * Hide the addresses to avoid the compiler folding in constants when
> +	 * referencing code, these can mess up annotations like
> +	 * ANNOTATE_NOENDBR.
> +	 */
> +	OPTIMIZER_HIDE_VAR(addr);
> +	OPTIMIZER_HIDE_VAR(dest);
> +
> +#ifdef CONFIG_X86_KERNEL_IBT
> +	if (is_endbr(*(u32 *)dest))
> +		dest += 4;
> +#endif

Another unnecessary ifdef.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries
  2022-02-24 14:51 ` [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
  2022-02-24 22:41   ` Josh Poimboeuf
@ 2022-02-25  0:50   ` Kees Cook
  2022-02-25 10:22     ` Peter Zijlstra
  1 sibling, 1 reply; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:48PM +0100, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Can you put some details in the commit log here about why these are
needed? My eyes can't find the indirect users...

> ---
>  arch/x86/crypto/crc32c-pcl-intel-asm_64.S |    3 +++
>  1 file changed, 3 insertions(+)
> 
> --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> @@ -195,6 +195,7 @@ SYM_FUNC_START(crc_pcl)
>  .altmacro
>  LABEL crc_ %i
>  .noaltmacro
> +	ENDBR
>  	crc32q   -i*8(block_0), crc_init
>  	crc32q   -i*8(block_1), crc1
>  	crc32q   -i*8(block_2), crc2
> @@ -203,6 +204,7 @@ LABEL crc_ %i

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops
  2022-02-24 14:51 ` [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
@ 2022-02-25  0:54   ` Kees Cook
  2022-02-25 10:24     ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:49PM +0100, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kvm/emulate.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -189,7 +189,7 @@
>  #define X16(x...) X8(x), X8(x)
>  
>  #define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
> -#define FASTOP_SIZE 8
> +#define FASTOP_SIZE (8 * (1 + HAS_KERNEL_IBT))

Err, is this right? FASTOP_SIZE is used both as a size and an alignment.
But the ENDBR instruction is 4 bytes? Commit log maybe needed to
describe this.
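
If I follow, the arithmetic would be: the 4-byte ENDBR no longer fits
the 8-byte slot, so both the slot size and the alignment double to 16,
something like (a sketch; stub contents illustrative):

	/* one fastop stub, with IBT */
	op_add:
		endbr64			/* 4 bytes */
		add %bl, %al		/* the actual op */
		ret
		/* padded to 16 bytes so base + op*FASTOP_SIZE keeps indexing the right stub */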

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR
  2022-02-24 14:52 ` [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR Peter Zijlstra
@ 2022-02-25  0:54   ` Josh Poimboeuf
  2022-02-25 13:16     ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25  0:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:52:06PM +0100, Peter Zijlstra wrote:
> +++ b/arch/x86/xen/xen-head.S
> @@ -25,8 +25,11 @@
>  SYM_CODE_START(hypercall_page)
>  	.rept (PAGE_SIZE / 32)
>  		UNWIND_HINT_FUNC
> -		.skip 31, 0x90
> -		RET
> +		ANNOTATE_NOENDBR
> +		/*
> +		 * Xen will write the hypercall page, and sort out ENDBR.
> +		 */
> +		.skip 32, 0xcc

I seem to remember this UNWIND_HINT_FUNC was only there to silence
warnings because of the ret.  With the ret gone, maybe the hint can be
dropped as well.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 14:51 ` [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location Peter Zijlstra
  2022-02-24 15:55   ` Masami Hiramatsu
@ 2022-02-25  0:55   ` Kees Cook
  2022-03-02 16:25   ` Naveen N. Rao
  2 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:50PM +0100, Peter Zijlstra wrote:
> Have ftrace_location() search the symbol for the __fentry__ location
> when it isn't at func+0 and use this for {,un}register_ftrace_direct().
> 
> This avoids a whole bunch of assumptions about __fentry__ being at
> func+0.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Cool. This should help with anything using __fentry__ tricks (i.e.
future CFI...), yes?

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-24 14:51 ` [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions Peter Zijlstra
@ 2022-02-25  0:58   ` Kees Cook
  2022-02-25  1:32   ` Masami Hiramatsu
  2022-02-28  6:07   ` Masami Hiramatsu
  2 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:53PM +0100, Peter Zijlstra wrote:
> With IBT on, sym+0 is no longer the __fentry__ site.
> 
> NOTE: the architecture has a special case and *does* allow placing an
> INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> and as such we don't need to disallow probing these instructions.
> 
> NOTE: irrespective of the above; there is a complication in that
> direct branches to functions are rewritten to not execute ENDBR, so
> any breakpoint thereon might miss lots of actual function executions.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/core.c |   11 +++++++++++
>  kernel/kprobes.c               |   15 ++++++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
>  {
>  	return 0;
>  }
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +	return offset <= 4*HAS_KERNEL_IBT;
> +}

Let's avoid magic (though obvious right now) literal values. Can the "4"
be changed to a new ENDBR_INSTR_SIZE macro or something?
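
Something like (a sketch; name as above, placement assumed):

	#define ENDBR_INSTR_SIZE	(4*HAS_KERNEL_IBT)

	bool arch_kprobe_on_func_entry(unsigned long offset)
	{
		return offset <= ENDBR_INSTR_SIZE;
	}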

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline
  2022-02-24 23:37   ` Josh Poimboeuf
@ 2022-02-25  0:59     ` Kees Cook
  2022-02-25 11:20     ` Peter Zijlstra
  2022-02-25 12:24     ` Peter Zijlstra
  2 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  0:59 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:37:31PM -0800, Josh Poimboeuf wrote:
> All the "4*HAS_KERNEL_IBT" everywhere is cute, but you might as well
> just have IBT_ENDBR_SIZE (here and in other patches).

Oops, I should have read ahead. Yeah, I like folding the multiplication
into the macro...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-24 14:51 ` [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
  2022-02-24 23:55   ` Josh Poimboeuf
@ 2022-02-25  1:09   ` Kees Cook
  2022-02-25 19:59   ` Edgecombe, Rick P
  2 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  1:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, linux-hardening

On Thu, Feb 24, 2022 at 03:51:56PM +0100, Peter Zijlstra wrote:
> [...]
> @@ -438,7 +439,8 @@ static __always_inline void setup_umip(s
>  
>  /* These bits should not change their value after CPU init is finished. */
>  static const unsigned long cr4_pinned_mask =
> -	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;
> +	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
> +	X86_CR4_FSGSBASE | X86_CR4_CET;

Thanks!

>  static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
>  static unsigned long cr4_pinned_bits __ro_after_init;
>  
> @@ -592,6 +594,29 @@ static __init int setup_disable_pku(char
>  __setup("nopku", setup_disable_pku);
>  #endif /* CONFIG_X86_64 */
>  
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> +	u64 msr = CET_ENDBR_EN;
> +
> +	if (!HAS_KERNEL_IBT ||
> +	    !cpu_feature_enabled(X86_FEATURE_IBT))
> +		return;
> +
> +	wrmsrl(MSR_IA32_S_CET, msr);
> +	cr4_set_bits(X86_CR4_CET);
> +
> +	if (!ibt_selftest()) {
> +		pr_err("IBT selftest: Failed!\n");
> +		setup_clear_cpu_cap(X86_FEATURE_IBT);
> +	}

For easy boot-output testing, I'd love to see something like:

	} else {
		pr_info("CET detected: Indirect Branch Tracking enabled.\n")
	}

or maybe:
		pr_info("CET detected: Indirect Branch Tracking is %s.\n",
			ibt_fatal ? "enforced" : "warning only");

> [...]
> +bool ibt_selftest(void)
> +{
> +	unsigned long ret;
> +
> +	asm ("1: lea 2f(%%rip), %%rax\n\t"
> +	     ANNOTATE_RETPOLINE_SAFE
> +	     "   jmp *%%rax\n\t"
> +	     ASM_REACHABLE
> +	     ANNOTATE_NOENDBR
> +	     "2: nop\n\t"
> +
> +	     /* unsigned ibt_selftest_ip = 2b */
> +	     ".pushsection .rodata,\"a\"\n\t"
> +	     ".align 8\n\t"
> +	     ".type ibt_selftest_ip, @object\n\t"
> +	     ".size ibt_selftest_ip, 8\n\t"
> +	     "ibt_selftest_ip:\n\t"
> +	     ".quad 2b\n\t"
> +	     ".popsection\n\t"
> +
> +	     : "=a" (ret) : : "memory");
> +
> +	return !ret;
> +}

I did something like this for LKDTM, but I realize it depends on having no
frame pointer, and is likely x86-specific too, as I think arm64's function
preamble is responsible for pushing the return address on the stack:


static volatile int lkdtm_just_count;

/* Function taking one argument, returning int. */
static noinline void *lkdtm_just_return(void)
{
	/* Do something after preamble but before label. */
	lkdtm_just_count++;

yolo:
	{
		void *right_here = &&yolo;

		OPTIMIZER_HIDE_VAR(right_here);
		return right_here;
	}
}
/*
 * This tries to call an indirect function in the middle.
 */
void lkdtm_CFI_FORWARD_ENTRY(void)
{
	/*
	 * Matches lkdtm_increment_void()'s prototype, but not
	 * lkdtm_increment_int()'s prototype.
	 */
	void * (*func)(void);

	func = lkdtm_just_return;
	pr_info("Calling actual function entry point %px ...\n", func);
	func = func();

	pr_info("Calling middle of function %px ...\n", func);
	func = func();

	pr_err("FAIL: survived non-entry point call!\n");
#ifdef CONFIG_X86
	pr_expected_config(CONFIG_X86_BTI);
#endif
}

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 19/39] x86: Disable IBT around firmware
  2022-02-24 14:51 ` [PATCH v2 19/39] x86: Disable IBT around firmware Peter Zijlstra
@ 2022-02-25  1:10   ` Kees Cook
  0 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  1:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:57PM +0100, Peter Zijlstra wrote:
> Assume firmware isn't IBT clean and disable it across calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT
  2022-02-24 14:51 ` [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
@ 2022-02-25  1:11   ` Kees Cook
  2022-02-25  2:22     ` Josh Poimboeuf
  2022-02-25 10:55     ` Peter Zijlstra
  0 siblings, 2 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-25  1:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:51:58PM +0100, Peter Zijlstra wrote:
> Retpoline and IBT are mutually exclusive. IBT relies on indirect
> branches (JMP/CALL *%reg) while retpoline avoids them by design.
> 
> Demote to LFENCE on IBT enabled hardware.

What's the expected perf impact of this?

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-24 14:51 ` [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions Peter Zijlstra
  2022-02-25  0:58   ` Kees Cook
@ 2022-02-25  1:32   ` Masami Hiramatsu
  2022-02-25 10:46     ` Peter Zijlstra
  2022-02-28  6:07   ` Masami Hiramatsu
  2 siblings, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-25  1:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov

Hi Peter,

On Thu, 24 Feb 2022 15:51:53 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> With IBT on, sym+0 is no longer the __fentry__ site.
> 
> NOTE: the architecture has a special case and *does* allow placing an
> INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> and as such we don't need to disallow probing these instructions.

Does this mean we can still put a probe on sym+0??

If so, NAK this patch, since KPROBES_ON_FTRACE is not meant to
accelerate the function entry probe; it just allows the user to
put a probe on 'call _mcount' (which can be modified by ftrace).

func:
  endbr  <- sym+0  : INT3 is used. (kp->addr = func+0)
  nop5   <- sym+4? : ftrace is used. (kp->addr = func+4?)
  ...

And anyway, in some cases (e.g. perf probe) the symbol will be a base
symbol like '_text' and @offset will be the function addr - _text addr,
so that we can put a probe on a local-scope function.

If you think we should not probe on the endbr, we should treat the
pair of endbr and nop5 (or call _mcount) instructions as a virtual
single instruction. This means kp->addr should point to sym+0, but use
ftrace to probe.

func:
  endbr  <- sym+0  : ftrace is used. (kp->addr = func+0)
  nop5   <- sym+4? : This is not able to be probed.
  ...

Thank you,

> 
> NOTE: irrespective of the above; there is a complication in that
> direct branches to functions are rewritten to not execute ENDBR, so
> any breakpoint thereon might miss lots of actual function executions.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/core.c |   11 +++++++++++
>  kernel/kprobes.c               |   15 ++++++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
>  {
>  	return 0;
>  }
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +	return offset <= 4*HAS_KERNEL_IBT;
> +}
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -67,10 +67,19 @@ static bool kprobes_all_disarmed;
>  static DEFINE_MUTEX(kprobe_mutex);
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>  
> -kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> -					unsigned int __unused)
> +kprobe_opcode_t * __weak kprobe_lookup_name(const char *name, unsigned int offset)
>  {
> -	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> +	kprobe_opcode_t *addr = NULL;
> +
> +	addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> +	if (addr && !offset) {
> +		unsigned long faddr = ftrace_location((unsigned long)addr);
> +		if (faddr)
> +			addr = (kprobe_opcode_t *)faddr;
> +	}
> +#endif
> +	return addr;
>  }
>  
>  /*
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 15:58     ` Steven Rostedt
  2022-02-24 15:59       ` Steven Rostedt
  2022-02-24 16:01       ` Steven Rostedt
@ 2022-02-25  1:34       ` Masami Hiramatsu
  2022-02-25  2:19         ` Steven Rostedt
  2 siblings, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-25  1:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Thu, 24 Feb 2022 10:58:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 25 Feb 2022 00:55:20 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > >  unsigned long ftrace_location(unsigned long ip)
> > >  {
> > > -	return ftrace_location_range(ip, ip);
> > > +	struct dyn_ftrace *rec;
> > > +	unsigned long offset;
> > > +	unsigned long size;
> > > +
> > > +	rec = lookup_rec(ip, ip);
> > > +	if (!rec) {
> > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > +			goto out;
> > > +
> > > +		if (!offset)  
> > 
> > Isn't this 'if (offset)' ?
> 
> No, the point is to only look for the fentry location if the ip passed in
> points to the start of the function. IOW, +0 offset.

OK, so this means ftrace_location() will be the same as
ftrace_location_range(sym, sym_end)?

Thank you,

> 
> -- Steve
> 
> 
> > 
> > > +			rec = lookup_rec(ip - offset, (ip - offset) + size);
> > > +	}
> > > +
> > > +	if (rec)
> > > +		return rec->ip;
> > > +
> > > +out:
> > > +	return 0;


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-25  1:34       ` Masami Hiramatsu
@ 2022-02-25  2:19         ` Steven Rostedt
  2022-02-25 10:20           ` Masami Hiramatsu
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-02-25  2:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Fri, 25 Feb 2022 10:34:49 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > > > +	if (!rec) {
> > > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > > +			goto out;
> > > > +
> > > > +		if (!offset)    
> > > 
> > > Isn't this 'if (offset)' ?  
> > 
> > No, the point is to only look for the fentry location if the ip passed in
> > points to the start of the function. IOW, +0 offset.
> 
> OK, so this means ftrace_location() will be the same as
> ftrace_location_range(sym, sym_end)?

No. It only acts like ftrace_location_range(sym, sym_end) if the passed
in argument is the ip of the function (kallsyms returns offset = 0)

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT
  2022-02-25  1:11   ` Kees Cook
@ 2022-02-25  2:22     ` Josh Poimboeuf
  2022-02-25 10:55     ` Peter Zijlstra
  1 sibling, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25  2:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 05:11:23PM -0800, Kees Cook wrote:
> On Thu, Feb 24, 2022 at 03:51:58PM +0100, Peter Zijlstra wrote:
> > Retpoline and IBT are mutually exclusive. IBT relies on indirect
> > branches (JMP/CALL *%reg) while retpoline avoids them by design.
> > 
> > Demote to LFENCE on IBT enabled hardware.
> 
> What's the expected perf impact of this?

Should make it faster...

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust
  2022-02-25  0:42   ` Kees Cook
@ 2022-02-25  9:22     ` Andrew Cooper
  0 siblings, 0 replies; 183+ messages in thread
From: Andrew Cooper @ 2022-02-25  9:22 UTC (permalink / raw)
  To: Kees Cook, Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, linux-kernel, ndesaulniers,
	samitolvanen, mark.rutland, alyssa.milburn, mbenes, rostedt,
	mhiramat, alexei.starovoitov, Andrew Cooper

On 25/02/2022 00:42, Kees Cook wrote:
> On Thu, Feb 24, 2022 at 03:51:45PM +0100, Peter Zijlstra wrote:
>> The SYSCALL entry points are found through taking their respective
>> address in order to program them in the MSRs, while the exception
>> entry points are found through UNWIND_HINT_IRET_REGS.
> Stupid question: does CET consider exception and syscall entry points to
> be indirect calls? (I would expect so, but they're ever so slightly
> differently executed...)

Yes it does.  What happens is that on ring transition, microcode forces
the WAIT-FOR-ENDBR state.

For IDT entries, this protects against a single stray write hijacking
control flow.

SYSCALL/SYSENTER in principle don't need to be, IMO.  They're rooted in
MSRs rather than RAM, and if an attacker has hijacked the system enough
to change those, then the absence of ENDBR is not going to save you.

However, from a consistency and implementation point of view, you don't
want to be special casing how a ring transition was triggered.
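
To make the contract concrete, a minimal sketch of what an IDT target
has to look like under CET-IBT (illustrative only; real stubs also save
registers, call into C, emit unwind hints, etc.):

/*
 * After a ring transition with CET-IBT enabled the CPU sits in
 * WAIT-FOR-ENDBR state, so the first instruction fetched at the
 * IDT target must be ENDBR64, otherwise #CP is raised.
 */
asm(".pushsection .text\n\t"
    ".global idt_stub_sketch\n"
    "idt_stub_sketch:\n\t"
    "endbr64\n\t"	/* satisfies WAIT-FOR-ENDBR */
    "iretq\n\t"
    ".popsection");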

~Andrew

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-25  2:19         ` Steven Rostedt
@ 2022-02-25 10:20           ` Masami Hiramatsu
  2022-02-25 13:36             ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-25 10:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Thu, 24 Feb 2022 21:19:19 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 25 Feb 2022 10:34:49 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > > > +	if (!rec) {
> > > > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > > > +			goto out;
> > > > > +
> > > > > +		if (!offset)    
> > > > 
> > > > Isn't this 'if (offset)' ?  
> > > 
> > > No, the point is to only look for the fentry location if the ip passed in
> > > points to the start of the function. IOW, +0 offset.
> > 
> > OK, so this means ftrace_location() will be the same as
> > ftrace_location_range(sym, sym_end)?
> 
> No. It only acts like ftrace_location_range(sym, sym_end) if the passed
> in argument is the ip of the function (kallsyms returns offset = 0)

Got it. So now ftrace_location() will return the ftrace address
when the ip == func or ip == mcount-call.
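
To spell out those semantics as a sketch (offsets assume the IBT
layout: ENDBR at func+0, the fentry call at func+4):

/* Sketch only; WARN_ON() used here as an executable assertion. */
static void ftrace_location_semantics(unsigned long func)
{
	WARN_ON(ftrace_location(func)     != func + 4); /* sym+0 -> fentry site */
	WARN_ON(ftrace_location(func + 4) != func + 4); /* exact mcount-call ip */
	WARN_ON(ftrace_location(func + 8) != 0);        /* other body ip: no match */
}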

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries
  2022-02-25  0:50   ` Kees Cook
@ 2022-02-25 10:22     ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 10:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 04:50:40PM -0800, Kees Cook wrote:
> On Thu, Feb 24, 2022 at 03:51:48PM +0100, Peter Zijlstra wrote:
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Can you put some details in the commit log here about why these are
> needed? My eyes can't find the indirect users...

	## branch into array
	mov	jump_table(,%rax,8), %bufp
	JMP_NOSPEC bufp

:-)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops
  2022-02-25  0:54   ` Kees Cook
@ 2022-02-25 10:24     ` Peter Zijlstra
  2022-02-25 13:09       ` David Laight
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 10:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 04:54:04PM -0800, Kees Cook wrote:
> On Thu, Feb 24, 2022 at 03:51:49PM +0100, Peter Zijlstra wrote:
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kvm/emulate.c |    6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -189,7 +189,7 @@
> >  #define X16(x...) X8(x), X8(x)
> >  
> >  #define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
> > -#define FASTOP_SIZE 8
> > +#define FASTOP_SIZE (8 * (1 + HAS_KERNEL_IBT))
> 
> Err, is this right? FASTOP_SIZE is used both as a size and an alignment.
> But the ENDBR instruction is 4 bytes? Commit log maybe needed to
> describe this.

Note how that comes out as 8*1 or 8*2, IOW 8 or 16. Does that clarify?
That is, 8+4 being 12 obviously fails as an alignment.
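
If it helps, the arithmetic as a stand-alone userspace check (nothing
kernel specific in here):

#include <assert.h>

int main(void)
{
	for (int has_ibt = 0; has_ibt <= 1; has_ibt++) {
		/* mirrors: #define FASTOP_SIZE (8 * (1 + HAS_KERNEL_IBT)) */
		int fastop_size = 8 * (1 + has_ibt);

		/* 8 and 16 are both powers of two, so valid alignments */
		assert(fastop_size == (has_ibt ? 16 : 8));
		assert((fastop_size & (fastop_size - 1)) == 0);
	}
	return 0;
}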

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-25  1:32   ` Masami Hiramatsu
@ 2022-02-25 10:46     ` Peter Zijlstra
  2022-02-25 13:42       ` Masami Hiramatsu
  2022-02-25 14:14       ` Steven Rostedt
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 10:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov

On Fri, Feb 25, 2022 at 10:32:15AM +0900, Masami Hiramatsu wrote:
> Hi Peter,
> 
> On Thu, 24 Feb 2022 15:51:53 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > With IBT on, sym+0 is no longer the __fentry__ site.
> > 
> > NOTE: the architecture has a special case and *does* allow placing an
> > INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> > and as such we don't need to disallow probing these instructions.
> 
> Does this mean we can still put a probe on sym+0??

I'm not sure... Possibly not. I'm not sure if there's an ABI that
by-passes kprobe_lookup_name(). Arguably you could give it a direct
address instead of a name and still hit the ENDBR, I think. But the ABI
surface of this thing is too big for me to easily tell.

> If so, NAK this patch, since KPROBES_ON_FTRACE is not meant to
> accelerate the function entry probe; it just allows the user to
> put a probe on 'call _mcount' (which can be modified by ftrace).
> 
> func:
>   endbr  <- sym+0  : INT3 is used. (kp->addr = func+0)
>   nop5   <- sym+4? : ftrace is used. (kp->addr = func+4?)
>   ...
> 
> And anyway, in some cases (e.g. perf probe) the symbol will be a base
> symbol like '_text' and @offset will be the function addr - _text addr,
> so that we can put a probe on a local-scope function.
> 
> If you think we should not probe on the endbr, we should treat the
> pair of endbr and nop5 (or call _mcount) instructions as a virtual
> single instruction. This means kp->addr should point to sym+0, but use
> ftrace to probe.
> 
> func:
>   endbr  <- sym+0  : ftrace is used. (kp->addr = func+0)
>   nop5   <- sym+4? : This is not able to be probed.
>   ...

Well, it's all a bit crap :/

This patch came from the kernel/trace/trace_kprobe.c selftest failing at
boot. That tries to set a kprobe on kprobe_trace_selftest_target, which
the whole kprobe machinery translates into
kprobe_trace_selftest_target+0 and then doesn't actually hit the fentry.

IOW, that selftest seems to hard-code/assume +0 matches __fentry__,
which just isn't true in general (arm64, powerpc are architectures that
come to mind) and now also might not be true on x86.

Calling the selftest broken works for me and I'll drop the patch.


Note that with these patches:

 - Not every function starts with ENDBR; the compiler is free to omit
   this instruction if it can determine the function address is never
   taken (and as such there's never an indirect call to it).

 - If there is an ENDBR, not every function entry will actually execute
   it. This first instruction is used exclusively as an indirect entry
   point. All direct calls should be to the next instruction.

 - If there was an ENDBR, it might be turned into a 4 byte UD1
   instruction to ensure any indirect call *will* fail.

Given all that, kprobe users are in a bit of a bind. Determining the
__fentry__ point basically means they *have* to first read the function
assembly to figure out where it is.

This patch takes the approach that sym+0 means __fentry__, irrespective
of where it might actually live. I *think* that's more or less
consistent with what other architectures do; specifically see
arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
what ARM64 does when it has BTI on (which is then very similar to what
we have here).

What do you think makes most sense here?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-24 23:55   ` Josh Poimboeuf
@ 2022-02-25 10:51     ` Peter Zijlstra
  2022-02-25 11:10       ` Peter Zijlstra
  2022-02-25 23:51       ` Josh Poimboeuf
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 10:51 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:55:16PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:51:56PM +0100, Peter Zijlstra wrote:
> > +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> > +{
> > +	u64 msr = CET_ENDBR_EN;
> > +
> > +	if (!HAS_KERNEL_IBT ||
> > +	    !cpu_feature_enabled(X86_FEATURE_IBT))
> > +		return;
> 
> If you add X86_FEATURE_IBT to arch/x86/include/asm/disabled-features.h,
> the HAS_KERNEL_IBT check becomes redundant.

Cute.

> > +bool ibt_selftest(void)
> > +{
> > +	unsigned long ret;
> > +
> > +	asm ("1: lea 2f(%%rip), %%rax\n\t"
> > +	     ANNOTATE_RETPOLINE_SAFE
> > +	     "   jmp *%%rax\n\t"
> > +	     ASM_REACHABLE
> > +	     ANNOTATE_NOENDBR
> > +	     "2: nop\n\t"
> > +
> > +	     /* unsigned ibt_selftest_ip = 2b */
> > +	     ".pushsection .rodata,\"a\"\n\t"
> > +	     ".align 8\n\t"
> > +	     ".type ibt_selftest_ip, @object\n\t"
> > +	     ".size ibt_selftest_ip, 8\n\t"
> > +	     "ibt_selftest_ip:\n\t"
> > +	     ".quad 2b\n\t"
> > +	     ".popsection\n\t"
> 
> It still seems silly to make this variable in asm.
> 
> Also .rodata isn't going to work for CPU hotplug.

It's just the IP; that stays invariant. I'm not sure how else to match
regs->ip to label 2 in #CP.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT
  2022-02-25  1:11   ` Kees Cook
  2022-02-25  2:22     ` Josh Poimboeuf
@ 2022-02-25 10:55     ` Peter Zijlstra
  1 sibling, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 10:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 05:11:23PM -0800, Kees Cook wrote:
> On Thu, Feb 24, 2022 at 03:51:58PM +0100, Peter Zijlstra wrote:
> > Retpoline and IBT are mutually exclusive. IBT relies on indirect
> > branches (JMP/CALL *%reg) while retpoline avoids them by design.
> > 
> > Demote to LFENCE on IBT enabled hardware.
> 
> What's the expected perf impact of this?

Well, the expected case is that this is all dead code because any part
that has IBT also has eIBRS and if we have eIBRS we never use retpolines
as per the current code.

The only case is virt, where someone could expose IBT and not eIBRS, in
which case this will, as per Josh's answer, make things faster since
LFENCE sucks less than full Retpoline.
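
Roughly, the selection being described (names below are illustrative
stand-ins; the real decision lives in the spectre_v2 mitigation code):

#include <stdbool.h>

enum mitigation { FULL_RETPOLINE, LFENCE_RETPOLINE, EIBRS_ONLY };

static enum mitigation pick_mitigation(bool has_eibrs, bool has_ibt)
{
	if (has_eibrs)
		return EIBRS_ONLY;       /* retpolines never used */
	if (has_ibt)
		return LFENCE_RETPOLINE; /* IBT wants real indirect JMP/CALLs */
	return FULL_RETPOLINE;
}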

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-25 10:51     ` Peter Zijlstra
@ 2022-02-25 11:10       ` Peter Zijlstra
  2022-02-25 23:51       ` Josh Poimboeuf
  1 sibling, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 11:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 11:51:01AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 24, 2022 at 03:55:16PM -0800, Josh Poimboeuf wrote:
> > On Thu, Feb 24, 2022 at 03:51:56PM +0100, Peter Zijlstra wrote:
> > > +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> > > +{
> > > +	u64 msr = CET_ENDBR_EN;
> > > +
> > > +	if (!HAS_KERNEL_IBT ||
> > > +	    !cpu_feature_enabled(X86_FEATURE_IBT))
> > > +		return;
> > 
> > If you add X86_FEATURE_IBT to arch/x86/include/asm/disabled-features.h,
> > the HAS_KERNEL_IBT check becomes redundant.
> 
> Cute.

On second thought, I'm not sure that's desirable. Ideally KVM would
still expose IBT if present on the hardware, even if the host kernel
doesn't use it.

Killing the feature when the host doesn't use it seems unfortunate.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline
  2022-02-24 23:37   ` Josh Poimboeuf
  2022-02-25  0:59     ` Kees Cook
@ 2022-02-25 11:20     ` Peter Zijlstra
  2022-02-25 12:24     ` Peter Zijlstra
  2 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 11:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:37:31PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:51:54PM +0100, Peter Zijlstra wrote:
> > @@ -339,9 +350,18 @@ static int __bpf_arch_text_poke(void *ip
> >  	u8 *prog;
> >  	int ret;
> >  
> > +#ifdef CONFIG_X86_KERNEL_IBT
> > +	if (is_endbr(*(u32 *)ip))
> > +		ip += 4;
> > +#endif
> > +
> >  	memcpy(old_insn, nop_insn, X86_PATCH_SIZE);
> >  	if (old_addr) {
> >  		prog = old_insn;
> > +#ifdef CONFIG_X86_KERNEL_IBT
> > +		if (is_endbr(*(u32 *)old_addr))
> > +			old_addr += 4;
> > +#endif
> >  		ret = t == BPF_MOD_CALL ?
> >  		      emit_call(&prog, old_addr, ip) :
> >  		      emit_jump(&prog, old_addr, ip);
> > @@ -352,6 +372,10 @@ static int __bpf_arch_text_poke(void *ip
> >  	memcpy(new_insn, nop_insn, X86_PATCH_SIZE);
> >  	if (new_addr) {
> >  		prog = new_insn;
> > +#ifdef CONFIG_X86_KERNEL_IBT
> > +		if (is_endbr(*(u32 *)new_addr))
> > +			new_addr += 4;
> > +#endif
> 
> All the above ifdef-itis should be able to be removed since is_endbr()
> returns false for !IBT.

So I've been pondering a skip_endbr() function for all these sites.

I just couldn't decide between making it a 'function' with a signature
like:

	ip = skip_endbr(ip);

or making it macro magic so that it reads:

	skip_endbr(ip);

Which is why I've not cleaned it up yet.  This being C(ish) I suppose
the former is less confusing, so let me go do that.
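
For reference, the two shapes side by side (both hypothetical until the
cleanup lands; skip_endbr_inplace is a made-up name):

/* 1) function form: explicit data flow at the call site */
void *skip_endbr(void *addr);
/* usage: ip = skip_endbr(ip); */

/* 2) macro form: mutates its argument in place */
#define skip_endbr_inplace(ip)	((ip) = skip_endbr(ip))
/* usage: skip_endbr_inplace(ip); */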


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline
  2022-02-24 23:37   ` Josh Poimboeuf
  2022-02-25  0:59     ` Kees Cook
  2022-02-25 11:20     ` Peter Zijlstra
@ 2022-02-25 12:24     ` Peter Zijlstra
  2022-02-25 22:46       ` Josh Poimboeuf
  2 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 12:24 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:37:31PM -0800, Josh Poimboeuf wrote:

> > @@ -2028,10 +2052,11 @@ int arch_prepare_bpf_trampoline(struct b
> >  		/* skip patched call instruction and point orig_call to actual
> >  		 * body of the kernel function.
> >  		 */
> > -		orig_call += X86_PATCH_SIZE;
> > +		orig_call += X86_PATCH_SIZE + 4*HAS_KERNEL_IBT;
> 
> All the "4*HAS_KERNEL_IBT" everywhere is cute, but you might as well
> just have IBT_ENDBR_SIZE (here and in other patches).

So there's two forms of this, only one has the 4 included:

  (x * (1 + HAS_KERNEL_IBT))
  (x + 4*HAS_KERNEL_IBT)

If I include the 4, then the first form would become something like:

  (x * (1 + !!IBT_ENDBR_SIZE))

that ok?
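
Concretely, a sketch with the hypothetical IBT_ENDBR_SIZE spelled out
(ORIG_CALL_SKIP is likewise made up):

#define IBT_ENDBR_SIZE	(4 * HAS_KERNEL_IBT)		/* 0 or 4 */

/* first form: double the slot when IBT is on */
#define FASTOP_SIZE	(8 * (1 + !!IBT_ENDBR_SIZE))
/* second form: skip the patched call plus the ENDBR */
#define ORIG_CALL_SKIP	(X86_PATCH_SIZE + IBT_ENDBR_SIZE)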

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 21/39] x86/ibt: Annotate text references
  2022-02-25  0:47   ` Josh Poimboeuf
@ 2022-02-25 12:57     ` Peter Zijlstra
  2022-02-25 13:04     ` Peter Zijlstra
  1 sibling, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 12:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 04:47:16PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:51:59PM +0100, Peter Zijlstra wrote:
> > @@ -563,12 +564,14 @@ SYM_CODE_END(\asmsym)
> >  	.align 16
> >  	.globl __irqentry_text_start
> >  __irqentry_text_start:
> > +	ANNOTATE_NOENDBR // unwinders
> 
> But the instruction here (first idt entry) actually does have an
> endbr64...
> 

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2023,6 +2023,9 @@ static int read_noendbr_hints(struct obj
 			return -1;
 		}
 
+		if (insn->type == INSN_ENDBR)
+			WARN_FUNC("ANNOTATE_NOENDBR on ENDBR", insn->sec, insn->offset);
+
 		insn->noendbr = 1;
 	}
 

vmlinux.o: warning: objtool: .entry.text+0x160: ANNOTATE_NOENDBR on ENDBR
vmlinux.o: warning: objtool: xen_pvh_init()+0x0: ANNOTATE_NOENDBR on ENDBR

right you are... /me goes fix

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 21/39] x86/ibt: Annotate text references
  2022-02-25  0:47   ` Josh Poimboeuf
  2022-02-25 12:57     ` Peter Zijlstra
@ 2022-02-25 13:04     ` Peter Zijlstra
  1 sibling, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 13:04 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 04:47:16PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:51:59PM +0100, Peter Zijlstra wrote:
> > @@ -563,12 +564,14 @@ SYM_CODE_END(\asmsym)
> >  	.align 16
> >  	.globl __irqentry_text_start
> >  __irqentry_text_start:
> > +	ANNOTATE_NOENDBR // unwinders
> 
> But the instruction here (first idt entry) actually does have an
> endbr64...
> 
> Also I'm wondering if it would make sense to create an
> 'idt_entry_<vector>' symbol for each entry so objtool knows to validate
> their ENDBRs.

I think we're good on that front since irq_entries_start and
spurious_entries_start capture the first entry and the rest is .rept off
of that.
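
i.e. the stubs come from a single .rept block, along these lines
(heavily simplified sketch; the real macros also emit unwind hints,
alignment and the proper vector count):

asm(".pushsection .text\n"
    "irq_entries_start_sketch:\n"
    ".rept 16\n\t"		/* real code uses the full vector range */
    "endbr64\n\t"
    "pushq $0\n\t"		/* vector number placeholder */
    "jmp common_stub_sketch\n\t"
    ".endr\n"
    "common_stub_sketch:\n\t"
    "iretq\n\t"
    ".popsection");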

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops
  2022-02-25 10:24     ` Peter Zijlstra
@ 2022-02-25 13:09       ` David Laight
  0 siblings, 0 replies; 183+ messages in thread
From: David Laight @ 2022-02-25 13:09 UTC (permalink / raw)
  To: 'Peter Zijlstra', Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

From: Peter Zijlstra
> Sent: 25 February 2022 10:25
> 
> On Thu, Feb 24, 2022 at 04:54:04PM -0800, Kees Cook wrote:
> > On Thu, Feb 24, 2022 at 03:51:49PM +0100, Peter Zijlstra wrote:
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/kvm/emulate.c |    6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > --- a/arch/x86/kvm/emulate.c
> > > +++ b/arch/x86/kvm/emulate.c
> > > @@ -189,7 +189,7 @@
> > >  #define X16(x...) X8(x), X8(x)
> > >
> > >  #define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
> > > -#define FASTOP_SIZE 8
> > > +#define FASTOP_SIZE (8 * (1 + HAS_KERNEL_IBT))

Defining as:
		(8 + HAS_KERNEL_IBT * 8)
would probably be easier to read when half asleep.

	David

> >
> > Err, is this right? FASTOP_SIZE is used both as a size and an alignment.
> > But the ENDBR instruction is 4 bytes? Commit log maybe needed to
> > describe this.
> 
> Note how that comes out as 8*1 or 8*2, IOW 8 or 16. Does that clarify?
> That is, 8+4 being 12 obviously fails as an alignment.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR
  2022-02-25  0:54   ` Josh Poimboeuf
@ 2022-02-25 13:16     ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 13:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 04:54:40PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:52:06PM +0100, Peter Zijlstra wrote:
> > +++ b/arch/x86/xen/xen-head.S
> > @@ -25,8 +25,11 @@
> >  SYM_CODE_START(hypercall_page)
> >  	.rept (PAGE_SIZE / 32)
> >  		UNWIND_HINT_FUNC
> > -		.skip 31, 0x90
> > -		RET
> > +		ANNOTATE_NOENDBR
> > +		/*
> > +		 * Xen will write the hypercall page, and sort out ENDBR.
> > +		 */
> > +		.skip 32, 0xcc
> 
> I seem to remember this UNWIND_HINT_FUNC was only there to silence
> warnings because of the ret.  With the ret gone, maybe the hint can be
> dropped as well.

vmlinux.o: warning: objtool: xen_hypercall_iret()+0x0: stack state mismatch: cfa1=4+8 cfa2=-1+0

and back it goes ;-)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-25 10:20           ` Masami Hiramatsu
@ 2022-02-25 13:36             ` Steven Rostedt
  2022-03-01 18:57               ` Naveen N. Rao
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-02-25 13:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Fri, 25 Feb 2022 19:20:08 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > No. It only acts like ftrace_location_range(sym, sym_end) if the passed
> > in argument is the ip of the function (kallsyms returns offset = 0)  
> 
> Got it. So now ftrace_location() will return the ftrace address
> when the ip == func or ip == mcount-call.

Exactly! :-)

Are you OK with that?

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-25  0:35   ` Kees Cook
  2022-02-25  0:46     ` Nathan Chancellor
@ 2022-02-25 13:41     ` Peter Zijlstra
  1 sibling, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 13:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, Nathan Chancellor, llvm

On Thu, Feb 24, 2022 at 04:35:51PM -0800, Kees Cook wrote:

> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -36,7 +36,7 @@ endif
> >  
> >  # How to compile the 16-bit code.  Note we always compile for -march=i386;
> >  # that way we can complain to the user if the CPU is insufficient.
> > -REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING \
> > +REALMODE_CFLAGS	:= -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS \
> >  		   -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \
> >  		   -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
> >  		   -mno-mmx -mno-sse $(call cc-option,-fcf-protection=none)
> 
> This change seems important separately from this patch, yes? (Or at
> least a specific call-out in the commit log.)

It was to not stuff endbr in realmode code. *something* complained about
it (and Joao independently mentioned it on IRC).

Could be I hit some compile fail somewhere and this was the cleanest way
to simply make it go away.

I'll go mention it somewhere.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-25 10:46     ` Peter Zijlstra
@ 2022-02-25 13:42       ` Masami Hiramatsu
  2022-02-25 15:41         ` Peter Zijlstra
  2022-02-25 14:14       ` Steven Rostedt
  1 sibling, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-25 13:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov

On Fri, 25 Feb 2022 11:46:23 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 25, 2022 at 10:32:15AM +0900, Masami Hiramatsu wrote:
> > Hi Peter,
> > 
> > On Thu, 24 Feb 2022 15:51:53 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > With IBT on, sym+0 is no longer the __fentry__ site.
> > > 
> > > NOTE: the architecture has a special case and *does* allow placing an
> > > INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> > > and as such we don't need to disallow probing these instructions.
> > 
> > Does this mean we can still put a probe on sym+0??
> 
> I'm not sure... Possibly not. I'm not sure if there's an ABI that
> by-passes kprobe_lookup_name(). Arguably you could give it a direct
> address instead of a name and still hit the ENDBR, I think. But the ABI
> surface of this thing is too big for me to easily tell.
> 
> > If so, NAK this patch, since KPROBES_ON_FTRACE is not meant to
> > accelerate the function entry probe; it just allows the user to
> > put a probe on 'call _mcount' (which can be modified by ftrace).
> > 
> > func:
> >   endbr  <- sym+0  : INT3 is used. (kp->addr = func+0)
> >   nop5   <- sym+4? : ftrace is used. (kp->addr = func+4?)
> >   ...
> > 
> > And anyway, in some cases (e.g. perf probe) the symbol will be a base
> > symbol like '_text' and @offset will be the function addr - _text addr,
> > so that we can put a probe on a local-scope function.
> > 
> > If you think we should not probe on the endbr, we should treat the
> > pair of endbr and nop5 (or call _mcount) instructions as a virtual
> > single instruction. This means kp->addr should point to sym+0, but use
> > ftrace to probe.
> > 
> > func:
> >   endbr  <- sym+0  : ftrace is used. (kp->addr = func+0)
> >   nop5   <- sym+4? : This is not able to be probed.
> >   ...
> 
> Well, it's all a bit crap :/
> 
> This patch came from the kernel/trace/trace_kprobe.c selftest failing at
> boot. That tries to set a kprobe on kprobe_trace_selftest_target, which
> the whole kprobe machinery translates into
> kprobe_trace_selftest_target+0 and then doesn't actually hit the fentry.

OK.

> 
> IOW, that selftest seems to hard-code/assume +0 matches __fentry__,
> which just isn't true in general (arm64, powerpc are architectures that
> come to mind) and now also might not be true on x86.

Yeah, right. But if we can handle this as above, maybe we can continue
to put the probe on the entry of the function.

> 
> Calling the selftest broken works for me and I'll drop the patch.
> 
> 
> Note that with these patches:
> 
>  - Not every function starts with ENDBR; the compiler is free to omit
>    this instruction if it can determine the function address is never
>    taken (and as such there's never an indirect call to it).
> 
>  - If there is an ENDBR, not every function entry will actually execute
>    it. This first instruction is used exclusively as an indirect entry
>    point. All direct calls should be to the next instruction.
> 
>  - If there was an ENDBR, it might be turned into a 4 byte UD1
>    instruction to ensure any indirect call *will* fail.

Ah, I see. So that is a booby trap for the cracker. 

> 
> Given all that, kprobe users are in a bit of a bind. Determining the
> __fentry__ point basically means they *have* to first read the function
> assembly to figure out where it is.

OK, this sounds like kp->addr should be "call fentry" if there is ENDBR.

> 
> This patch takes the approach that sym+0 means __fentry__, irrespective
> of where it might actually live. I *think* that's more or less
> consistent with what other architectures do; specifically see
> arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> what ARM64 does when it has BTI on (which is then very similar to what
> we have here).

Yeah, I know powerpc does such a thing, but I think that is not what
the user expects. I actually would like to fix that, because on powerpc
and in other non-x86 cases (without BTI/IBT), the instructions at sym+0
are actually executed.

> 
> What do you think makes most sense here?

Is there any way to distinguish the "preparing instructions" (part of
calling mcount) from this kind of trap instruction online[1]? If possible,
I would like to skip such traps but put the probe on the preparing
instructions.
It seems we are currently using the ftrace address as the end marker of
the trap instruction, but we actually need another marker to separate
the end of the ENDBR from the preparing instructions.

[1]
On x86, we have

func:
endbr
call __fentry__ <-- ftrace location

But on other arch,

func:
[BTI instruction]
push return address <--- preparing instruction(s)
call __fentry__     <-- ftrace location



Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-25 10:46     ` Peter Zijlstra
  2022-02-25 13:42       ` Masami Hiramatsu
@ 2022-02-25 14:14       ` Steven Rostedt
  2022-02-26  7:09         ` Masami Hiramatsu
  1 sibling, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-02-25 14:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, alexei.starovoitov

On Fri, 25 Feb 2022 11:46:23 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Given all that, kprobe users are in a bit of a bind. Determining the
> __fentry__ point basically means they *have* to first read the function
> assembly to figure out where it is.

Technically I think that's what kprobes has been designed for. But
realistically, I do not think anyone actually does that (outside of
academic and niche uses).

Really, when people use func+0 they just want to trace the function, and
ftrace is the fastest way to do so; if it's not *exactly* at function
entry but still includes the arguments, then it should be fine.

That said, perhaps we should add a config to know if the architecture
uses function entry or the old mcount that is after the frame setup (that
is, you cannot get to the arguments).

CONFIG_HAVE_FTRACE_FUNCTION_START ?

Because, if the arch still uses the old mcount method (where it's after the
frame setup), then a kprobe at func+0 really wants the breakpoint.
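
A rough sketch of how such a knob might get consumed (the config symbol
is only a proposal, and so is the helper name):

/* Sketch: decide whether a func+0 probe should be handed to ftrace. */
static bool entry_probe_wants_ftrace(unsigned long offset)
{
	if (offset)
		return false;	/* not an entry probe */
	/* fentry-at-entry archs: sym+0 may as well mean the ftrace site */
	return IS_ENABLED(CONFIG_HAVE_FTRACE_FUNCTION_START);
}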

-- Steve


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-24 20:26 ` [PATCH v2 00/39] x86: Kernel IBT Josh Poimboeuf
@ 2022-02-25 15:28   ` Peter Zijlstra
  2022-02-25 15:43     ` Peter Zijlstra
  2022-03-01 23:10     ` Josh Poimboeuf
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 15:28 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 12:26:02PM -0800, Josh Poimboeuf wrote:

> Bricked my SPR:
> 
> [   21.602888] jump_label: Fatal kernel bug, unexpected op at sched_clock_stable+0x4/0x20 [0000000074a0db20] (eb 06 b8 01 00 != eb 0a 00 00 00)) size:2 type:0

> ffffffff81120a70 <sched_clock_stable>:
> ffffffff81120a70:       f3 0f 1e fa             endbr64
> ffffffff81120a74:       eb 06                   jmp    ffffffff81120a7c <sched_clock_stable+0xc>
> ffffffff81120a76:       b8 01 00 00 00          mov    $0x1,%eax
> ffffffff81120a7b:       c3                      retq
> ffffffff81120a7c:       f3 0f 1e fa             endbr64
> ffffffff81120a80:       31 c0                   xor    %eax,%eax
> ffffffff81120a82:       c3                      retq
> ffffffff81120a83:       66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
> ffffffff81120a8a:       00 00 00 00
> ffffffff81120a8e:       66 90                   xchg   %ax,%ax

This is due to you having a very old (and arguably buggy) compiler :-( I
can reproduce with gcc-8.4 and gcc-9.4; my gcc-10.3 compiler no longer
generates daft code like that, nor do any later ones.

That said, I can fix objtool to also re-write jumps to in-the-middle
ENDBR like this, but then I do get a bunch of:

OBJTOOL vmlinux.o
vmlinux.o: warning: objtool: displacement doesn't fit
vmlinux.o: warning: objtool: ep_insert()+0xbc5: Direct IMM jump to ENDBR; cannot fix
vmlinux.o: warning: objtool: displacement doesn't fit
vmlinux.o: warning: objtool: configfs_depend_prep()+0x76: Direct IMM jump to ENDBR; cannot fix
vmlinux.o: warning: objtool: displacement doesn't fit
vmlinux.o: warning: objtool: request_key_and_link()+0x17b: Direct IMM jump to ENDBR; cannot fix
vmlinux.o: warning: objtool: displacement doesn't fit
vmlinux.o: warning: objtool: blk_mq_poll()+0x2e0: Direct IMM jump to ENDBR; cannot fix

The alternative is only skipping endbr at +0 I suppose, lemme go try
that with the brand spanking new skip_endbr() function.

Yep, that seems to cure things. It now boots when built with old
crappy compilers too.


--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -47,6 +47,8 @@ static inline bool is_endbr(unsigned int
 	return val == gen_endbr();
 }
 
+extern void *skip_endbr(void *);
+
 extern __noendbr u64 ibt_save(void);
 extern __noendbr void ibt_restore(u64 save);
 
@@ -71,6 +73,7 @@ extern __noendbr void ibt_restore(u64 sa
 #define __noendbr
 
 static inline bool is_endbr(unsigned int val) { return false; }
+static inline void *skip_endbr(void *addr) { return addr; }
 
 static inline u64 ibt_save(void) { return 0; }
 static inline void ibt_restore(u64 save) { }
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -112,10 +112,7 @@ void __text_gen_insn(void *buf, u8 opcod
 	OPTIMIZER_HIDE_VAR(addr);
 	OPTIMIZER_HIDE_VAR(dest);
 
-#ifdef CONFIG_X86_KERNEL_IBT
-	if (is_endbr(*(u32 *)dest))
-		dest += 4;
-#endif
+	dest = skip_endbr((void *)dest);
 
 	insn->opcode = opcode;
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -620,6 +620,19 @@ __noendbr void ibt_restore(u64 save)
 	}
 }
 
+
+void *skip_endbr(void *addr)
+{
+	unsigned long size, offset;
+
+	if (is_endbr(*(unsigned int *)addr) &&
+	    kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset) &&
+	    !offset)
+		addr += 4;
+
+	return addr;
+}
+
 #endif
 
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
@@ -636,7 +649,10 @@ static __always_inline void setup_cet(st
 	if (!ibt_selftest()) {
 		pr_err("IBT selftest: Failed!\n");
 		setup_clear_cpu_cap(X86_FEATURE_IBT);
+		return;
 	}
+
+	pr_info("CET detected: Indirect Branch Tracking enabled\n");
 }
 
 __noendbr void cet_disable(void)
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -350,18 +350,12 @@ static int __bpf_arch_text_poke(void *ip
 	u8 *prog;
 	int ret;
 
-#ifdef CONFIG_X86_KERNEL_IBT
-	if (is_endbr(*(u32 *)ip))
-		ip += 4;
-#endif
+	ip = skip_endbr(ip);
 
 	memcpy(old_insn, nop_insn, X86_PATCH_SIZE);
 	if (old_addr) {
 		prog = old_insn;
-#ifdef CONFIG_X86_KERNEL_IBT
-		if (is_endbr(*(u32 *)old_addr))
-			old_addr += 4;
-#endif
+		old_addr = skip_endbr(old_addr);
 		ret = t == BPF_MOD_CALL ?
 		      emit_call(&prog, old_addr, ip) :
 		      emit_jump(&prog, old_addr, ip);
@@ -372,10 +366,7 @@ static int __bpf_arch_text_poke(void *ip
 	memcpy(new_insn, nop_insn, X86_PATCH_SIZE);
 	if (new_addr) {
 		prog = new_insn;
-#ifdef CONFIG_X86_KERNEL_IBT
-		if (is_endbr(*(u32 *)new_addr))
-			new_addr += 4;
-#endif
+		new_addr = skip_endbr(new_addr);
 		ret = t == BPF_MOD_CALL ?
 		      emit_call(&prog, new_addr, ip) :
 		      emit_jump(&prog, new_addr, ip);

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-25 13:42       ` Masami Hiramatsu
@ 2022-02-25 15:41         ` Peter Zijlstra
  2022-02-26  2:10           ` Masami Hiramatsu
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 15:41 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov

On Fri, Feb 25, 2022 at 10:42:49PM +0900, Masami Hiramatsu wrote:

> OK, this sounds like kp->addr should be "call fentry" if there is ENDBR.
> 
> > 
> > This patch takes the approach that sym+0 means __fentry__, irrespective
> > of where it might actually live. I *think* that's more or less
> > consistent with what other architectures do; specifically see
> > arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> > what ARM64 does when it has BTI on (which is then very similar to what
> > we have here).
> 
> Yeah, I know powerpc does such a thing, but I think that is not what
> the user expects. I actually would like to fix that, because on powerpc
> and in other non-x86 cases (without BTI/IBT), the instructions at sym+0
> are actually executed.
> 
> > 
> > What do you think makes most sense here?
> 
> Is there any way to distinguish the "preparing instructions" (part of
> calling mcount) from this kind of trap instruction online[1]? If possible,
> I would like to skip such traps but put the probe on the preparing
> instructions.

None that exist, but we could easily create one. See also my email here:

  https://lkml.kernel.org/r/Yhj1oFcTl2RnghBz@hirez.programming.kicks-ass.net

That skip_endbr() function is basically what you're looking for; it just
needs a better name and a Power/ARM64 implementation to get what you
want, right?

The alternative 'hack' I've been contemplating is (ab)using
INT_MIN/INT_MAX offset for __fentry__ and __fexit__ points (that latter
is something we'll probably have to grow when CET-SHSTK or backward-edge
CFI gets to be done, because then ROP tricks as used by function-graph
and kretprobes are out the window).

That way sym+[0..size) is still a valid reference to the actual
instruction in the symbol, but sym+INT_MIN will hard map to __fentry__
while sym+INT_MAX will get us __fexit__.
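
In code, roughly (the lookup helpers below are hypothetical stand-ins):

#include <limits.h>

static unsigned long resolve_probe_ip(unsigned long sym_base, long offset)
{
	if (offset == INT_MIN)
		return lookup_fentry_ip(sym_base);	/* __fentry__ site */
	if (offset == INT_MAX)
		return lookup_fexit_ip(sym_base);	/* __fexit__ site */
	return sym_base + offset;	/* normal sym+[0..size) reference */
}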

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-25 15:28   ` Peter Zijlstra
@ 2022-02-25 15:43     ` Peter Zijlstra
  2022-02-25 17:26       ` Josh Poimboeuf
  2022-03-01 23:10     ` Josh Poimboeuf
  1 sibling, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 15:43 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 04:28:32PM +0100, Peter Zijlstra wrote:
> +void *skip_endbr(void *addr)
> +{
> +	unsigned long size, offset;
> +
> +	if (is_endbr(*(unsigned int *)addr) &&
> +	    kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset) &&
> +	    !offset)
> +		addr += 4;
> +
> +	return addr;
> +}

Damn, I just realized this makes KERNEL_IBT hard depend on KALLSYMS :-(

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-25 15:43     ` Peter Zijlstra
@ 2022-02-25 17:26       ` Josh Poimboeuf
  2022-02-25 17:32         ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 04:43:20PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 25, 2022 at 04:28:32PM +0100, Peter Zijlstra wrote:
> > +void *skip_endbr(void *addr)
> > +{
> > +	unsigned long size, offset;
> > +
> > +	if (is_endbr(*(unsigned int *)addr) &&
> > +	    kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset) &&
> > +	    !offset)
> > +		addr += 4;
> > +
> > +	return addr;
> > +}
> 
> Damn, I just realized this makes KERNEL_IBT hard depend on KALLSYMS :-(

Why should the jump label patching code even care whether there's an
ENDBR at the jump target?  It should never jump to the beginning of a
function anyway, right?  And objtool presumably doesn't patch out ENDBRs
in the middle of a function.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-25 17:26       ` Josh Poimboeuf
@ 2022-02-25 17:32         ` Steven Rostedt
  2022-02-25 19:53           ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-02-25 17:32 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, mhiramat, alexei.starovoitov

On Fri, 25 Feb 2022 09:26:44 -0800
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> > Damn, I just realized this makes KERNEL_IBT hard depend on KALLSYMS :-(  
> 
> Why should the jump label patching code even care whether there's an
> ENDBR at the jump target?  It should never jump to the beginning of a
> function anyway, right?  And objtool presumably doesn't patch out ENDBRs
> in the middle of a function.

Perhaps Peter confused jump labels with static calls?

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-25 17:32         ` Steven Rostedt
@ 2022-02-25 19:53           ` Peter Zijlstra
  2022-02-25 20:15             ` Josh Poimboeuf
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-25 19:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 12:32:38PM -0500, Steven Rostedt wrote:
> On Fri, 25 Feb 2022 09:26:44 -0800
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > > Damn, I just realized this makes KERNEL_IBT hard depend on KALLSYMS :-(  
> > 
> > Why should the jump label patching code even care whether there's an
> > ENDBR at the jump target?  It should never jump to the beginning of a
> > function anyway, right?  And objtool presumably doesn't patch out ENDBRs
> > in the middle of a function.
> 
> Perhaps Peter confused jump labels with static calls?

The code is shared between the two.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-24 14:51 ` [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
  2022-02-24 23:55   ` Josh Poimboeuf
  2022-02-25  1:09   ` Kees Cook
@ 2022-02-25 19:59   ` Edgecombe, Rick P
  2022-03-01 15:14     ` Peter Zijlstra
  2 siblings, 1 reply; 183+ messages in thread
From: Edgecombe, Rick P @ 2022-02-25 19:59 UTC (permalink / raw)
  To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
  Cc: linux-kernel, keescook, rostedt, samitolvanen, mark.rutland,
	alexei.starovoitov, Milburn, Alyssa, mhiramat, mbenes,
	ndesaulniers

On Thu, 2022-02-24 at 15:51 +0100, Peter Zijlstra wrote:
> +__noendbr void cet_disable(void)
> +{
> +       if (cpu_feature_enabled(X86_FEATURE_IBT))
> +               wrmsrl(MSR_IA32_S_CET, 0);
> +}
> +

Did this actually work? There are actually two problems with kexecing
when CET is enabled. One is leaving the enforcement enabled when the
new kernel can't handle it. The other is that CR4.CET and CR0.WP are
tied together such that if you try to disable CR0.WP while CR4.CET is
still set, it will #GP. CR0.WP gets unset during kexec/boot in the new
kernel, so it blows up if you just disable IBT with the MSR and leave
the CR4 bit set.
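
In other words, the teardown needs to be ordered like this before the
new kernel runs (a minimal sketch, not tested):

static void cet_disable_for_kexec(void)
{
	if (!cpu_feature_enabled(X86_FEATURE_IBT))
		return;
	wrmsrl(MSR_IA32_S_CET, 0);	/* stop IBT enforcement */
	cr4_clear_bits(X86_CR4_CET);	/* else the new kernel's CR0.WP clear #GPs */
}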

I was under the impression that this had been tested in the userspace
series, but apparently not, as I've just reproduced the CR0.WP issue. So
it needs to be fixed in that series too. Userspace doesn't really need
it pinned, so it should be easy.

For kernel IBT, to enumerate a few options for kexec/pinning:

1. Just remove CR4.CET from the pinning mask, and unset it normally.
2. Leave it in the pinning mask and add separate non-pin-checking 
   inlined CR4 write late in the kexec path to unset CR4.CET.
3. Remove the unsetting of CR0.WP in the boot path. This would 
   only support kexecing to new kernels (there were actually patches 
   for this from the KVM CR pinning stuff that detected whether the 
   new kernel was bootable and failed gracefully IIRC). It's 
   potentially fraught and not great to lose kexecing to old kernels.
4. Do (1) for now and then follow this series with a larger solution 
   that does (2) and also adds some level of MSR pinning.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-25 19:53           ` Peter Zijlstra
@ 2022-02-25 20:15             ` Josh Poimboeuf
  0 siblings, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25 20:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 08:53:03PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 25, 2022 at 12:32:38PM -0500, Steven Rostedt wrote:
> > On Fri, 25 Feb 2022 09:26:44 -0800
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > > Damn, I just realized this makes KERNEL_IBT hard depend on KALLSYMS :-(  
> > > 
> > > Why should the jump label patching code even care whether there's an
> > > ENDBR at the jump target?  It should never jump to the beginning of a
> > > function anyway, right?  And objtool presumably doesn't patch out ENDBRs
> > > in the middle of a function.
> > 
> > Perhaps Peter confused jump labels with static calls?
> 
> The code is shared between the two.

Then just have a text_gen_insn_dont_skip_endbr() or so... ?

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-25  0:46     ` Nathan Chancellor
@ 2022-02-25 22:08       ` Nathan Chancellor
  2022-02-26  0:29         ` Joao Moreira
  0 siblings, 1 reply; 183+ messages in thread
From: Nathan Chancellor @ 2022-02-25 22:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	llvm

On Thu, Feb 24, 2022 at 05:46:59PM -0700, Nathan Chancellor wrote:
> On Thu, Feb 24, 2022 at 04:35:51PM -0800, Kees Cook wrote:
> > On Thu, Feb 24, 2022 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > > Add Kconfig, Makefile and basic instruction support for x86 IBT.
> > > 
> > > XXX clang is not playing ball, probably lld being 'funny', I'm having
> > > problems with .plt entries appearing all over after linking.
> > 
> > I'll try to look into this; I know you've been chatting with Nathan
> > about it. Is there an open bug for it? (And any kind of reproducer
> > smaller than a 39 patch series we can show the linker folks?) :)
> 
> I should be able to create a reproducer with cvise and file a bug on
> GitHub around this tomorrow, I should have done it after Peter's
> comments on IRC.

https://github.com/ClangBuiltLinux/linux/issues/1606

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline
  2022-02-25 12:24     ` Peter Zijlstra
@ 2022-02-25 22:46       ` Josh Poimboeuf
  0 siblings, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25 22:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 01:24:58PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 24, 2022 at 03:37:31PM -0800, Josh Poimboeuf wrote:
> 
> > > @@ -2028,10 +2052,11 @@ int arch_prepare_bpf_trampoline(struct b
> > >  		/* skip patched call instruction and point orig_call to actual
> > >  		 * body of the kernel function.
> > >  		 */
> > > -		orig_call += X86_PATCH_SIZE;
> > > +		orig_call += X86_PATCH_SIZE + 4*HAS_KERNEL_IBT;
> > 
> > All the "4*HAS_KERNEL_IBT" everywhere is cute, but you might as well
> > just have IBT_ENDBR_SIZE (here and in other patches).
> 
> So there's two forms of this, only one has the 4 included:
> 
>   (x * (1 + HAS_KERNEL_IBT))
>   (x + 4*HAS_KERNEL_IBT)
> 
> If I include the 4, then the first form would become something like:
> 
>   (x * (1 + !!IBT_ENDBR_SIZE))
> 
> that ok?

I don't have a strong preference, but there's no harm in having both
IBT_ENDBR_SIZE and HAS_KERNEL_IBT if it would help readability.
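
For illustration, with both defined next to each other (a sketch; the
names and the 4-byte ENDBR size follow the convention used in the
series, the exact home for the defines is a detail):

	#ifdef CONFIG_X86_KERNEL_IBT
	#define HAS_KERNEL_IBT	1
	#define IBT_ENDBR_SIZE	4	/* sizeof(endbr64) */
	#else
	#define HAS_KERNEL_IBT	0
	#define IBT_ENDBR_SIZE	0
	#endif

	/* the two forms then read: */
	size = x * (1 + HAS_KERNEL_IBT);
	addr = x + IBT_ENDBR_SIZE;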

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-25 10:51     ` Peter Zijlstra
  2022-02-25 11:10       ` Peter Zijlstra
@ 2022-02-25 23:51       ` Josh Poimboeuf
  2022-02-26 11:55         ` Peter Zijlstra
  1 sibling, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-25 23:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 11:51:01AM +0100, Peter Zijlstra wrote:
> > > +bool ibt_selftest(void)
> > > +{
> > > +	unsigned long ret;
> > > +
> > > +	asm ("1: lea 2f(%%rip), %%rax\n\t"
> > > +	     ANNOTATE_RETPOLINE_SAFE
> > > +	     "   jmp *%%rax\n\t"
> > > +	     ASM_REACHABLE
> > > +	     ANNOTATE_NOENDBR
> > > +	     "2: nop\n\t"
> > > +
> > > +	     /* unsigned ibt_selftest_ip = 2b */
> > > +	     ".pushsection .rodata,\"a\"\n\t"
> > > +	     ".align 8\n\t"
> > > +	     ".type ibt_selftest_ip, @object\n\t"
> > > +	     ".size ibt_selftest_ip, 8\n\t"
> > > +	     "ibt_selftest_ip:\n\t"
> > > +	     ".quad 2b\n\t"
> > > +	     ".popsection\n\t"
> > 
> > It's still seems silly to make this variable in asm.
> > 
> > Also .rodata isn't going to work for CPU hotplug.
> 
> It's just the IP, that stays invariant. I'm not sure how else to match
> regs->ip to 2 in #CP.

Ah, I see what you mean now.  Still, it can just reference the code
label itself without having to allocate storage:

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4806fa0adec7..cfaa05ddd1ec 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -225,7 +225,7 @@ static void handle_endbr(struct pt_regs *regs)
 	BUG();
 }
 
-extern const unsigned long ibt_selftest_ip; /* defined in asm beow */
+void ibt_selftest_ip(void); /* code label defined in asm below */
 
 DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
 {
@@ -237,7 +237,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
 	if (WARN_ON_ONCE(user_mode(regs) || error_code != 3))
 		return;
 
-	if (unlikely(regs->ip == ibt_selftest_ip)) {
+	if (unlikely(regs->ip == (unsigned long)ibt_selftest_ip)) {
 		regs->ax = 0;
 		return;
 	}
@@ -249,22 +249,12 @@ bool ibt_selftest(void)
 {
 	unsigned long ret;
 
-	asm ("1: lea 2f(%%rip), %%rax\n\t"
+	asm ("1: lea ibt_selftest_ip(%%rip), %%rax\n\t"
 	     ANNOTATE_RETPOLINE_SAFE
 	     "   jmp *%%rax\n\t"
 	     ASM_REACHABLE
 	     ANNOTATE_NOENDBR
-	     "2: nop\n\t"
-
-	     /* unsigned ibt_selftest_ip = 2b */
-	     ".pushsection .rodata,\"a\"\n\t"
-	     ".align 8\n\t"
-	     ".type ibt_selftest_ip, @object\n\t"
-	     ".size ibt_selftest_ip, 8\n\t"
-	     "ibt_selftest_ip:\n\t"
-	     ".quad 2b\n\t"
-	     ".popsection\n\t"
-
+	     "ibt_selftest_ip: nop\n\t"
 	     : "=a" (ret) : : "memory");
 
 	return !ret;


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-25 22:08       ` Nathan Chancellor
@ 2022-02-26  0:29         ` Joao Moreira
  2022-02-26  4:58           ` Kees Cook
  0 siblings, 1 reply; 183+ messages in thread
From: Joao Moreira @ 2022-02-26  0:29 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Kees Cook, Peter Zijlstra, x86, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov, llvm

> https://github.com/ClangBuiltLinux/linux/issues/1606

Candidate fix: https://reviews.llvm.org/D120600

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-25 15:41         ` Peter Zijlstra
@ 2022-02-26  2:10           ` Masami Hiramatsu
  2022-02-26 11:48             ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-26  2:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov

On Fri, 25 Feb 2022 16:41:15 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 25, 2022 at 10:42:49PM +0900, Masami Hiramatsu wrote:
> 
> > OK, this sounds like kp->addr should be "call fentry" if there is ENDBR.
> > 
> > > 
> > > This patch takes the approach that sym+0 means __fentry__, irrespective
> > > of where it might actually live. I *think* that's more or less
> > > consistent with what other architectures do; specifically see
> > > arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> > > what ARM64 does when it has BTI on (which is then very similar to what
> > > we have here).
> > 
> > Yeah, I know powerpc does such a thing, but I think that is not what
> > the user expects. I actually would like to fix that, because in the
> > powerpc and other non-x86 cases (without BTI/IBT), the instruction at
> > sym+0 is actually executed.
> > 
> > > 
> > > What do you think makes most sense here?
> > 
> > Is there any way to distinguish the "preparing instructions" (part of
> > calling mcount) from this kind of trap instruction online[1]? If
> > possible, I would like to skip such traps, but put the probe on the
> > preparing instructions.
> 
> None that exist, but we could easily create one. See also my email here:
> 
>   https://lkml.kernel.org/r/Yhj1oFcTl2RnghBz@hirez.programming.kicks-ass.net
> 
> That skip_endbr() function is basically what you're looking for; it just
> needs a better name and a Power/ARM64 implementation to get what you
> want, right?

Great! That's what I need. I think is_endbr() is also useful :)
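
Something like this is what I imagine for x86, just to check my
understanding (a sketch only; the real helpers may well look different):

	/* endbr64 is F3 0F 1E FA; read from memory as a little-endian u32 */
	#define ENDBR_INSN_OPCODE	0xfa1e0ff3U
	#define ENDBR_INSN_SIZE		4

	static __always_inline bool is_endbr(u32 val)
	{
		return val == ENDBR_INSN_OPCODE;
	}

	static inline void *skip_endbr(void *addr)
	{
		if (is_endbr(*(u32 *)addr))
			addr += ENDBR_INSN_SIZE;
		return addr;
	}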

> The alternative 'hack' I've been contemplating is (ab)using
> INT_MIN/INT_MAX offset for __fentry__ and __fexit__ points (that latter
> is something we'll probably have to grow when CET-SHSTK or backward-edge
> CFI gets to be done, because then ROP tricks as used by function-graph
> and kretprobes are out the window).
> 
> That way sym+[0..size) is still a valid reference to the actual
> instruction in the symbol, but sym+INT_MIN will hard map to __fentry__
> while sym+INT_MAX will get us __fexit__.

Interesting, is that done by another series?
Maybe I have to check that change for kprobe jump optimization.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-26  0:29         ` Joao Moreira
@ 2022-02-26  4:58           ` Kees Cook
  2022-02-26  4:59             ` Fāng-ruì Sòng
  0 siblings, 1 reply; 183+ messages in thread
From: Kees Cook @ 2022-02-26  4:58 UTC (permalink / raw)
  To: Joao Moreira, Fangrui Song
  Cc: Nathan Chancellor, Peter Zijlstra, x86, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen,
	mark.rutland, alyssa.milburn, mbenes, rostedt, mhiramat,
	alexei.starovoitov, llvm

On Fri, Feb 25, 2022 at 04:29:49PM -0800, Joao Moreira wrote:
> > https://github.com/ClangBuiltLinux/linux/issues/1606
> 
> Candidate fix: https://reviews.llvm.org/D120600

And landed! Thanks!

Since this is a pretty small change, do you think it could be backported
to the clang-14 branch?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-26  4:58           ` Kees Cook
@ 2022-02-26  4:59             ` Fāng-ruì Sòng
  2022-02-26  5:04               ` Kees Cook
  0 siblings, 1 reply; 183+ messages in thread
From: Fāng-ruì Sòng @ 2022-02-26  4:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: Joao Moreira, Nathan Chancellor, Peter Zijlstra, x86, hjl.tools,
	jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers,
	samitolvanen, Mark.Rutland, alyssa.milburn, mbenes, rostedt,
	mhiramat, alexei.starovoitov, llvm

On Fri, Feb 25, 2022 at 8:58 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Feb 25, 2022 at 04:29:49PM -0800, Joao Moreira wrote:
> > > https://github.com/ClangBuiltLinux/linux/issues/1606
> >
> > Candidate fix: https://reviews.llvm.org/D120600
>
> And landed! Thanks!
>
> Since this is a pretty small change, do you think it could be backported
> to the clang-14 branch?
>
> --
> Kees Cook

I have pushed it to release/14.x :)
https://github.com/llvm/llvm-project/commit/f8ca5fabdb54fdf64b3dffb38ebf7d0220f415a2

The current release schedule is
https://discourse.llvm.org/t/llvm-14-0-0-release-schedule/5846


-- 
宋方睿

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 05/39] x86: Base IBT bits
  2022-02-26  4:59             ` Fāng-ruì Sòng
@ 2022-02-26  5:04               ` Kees Cook
  0 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-02-26  5:04 UTC (permalink / raw)
  To: Fāng-ruì Sòng
  Cc: Joao Moreira, Nathan Chancellor, Peter Zijlstra, x86, hjl.tools,
	jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers,
	samitolvanen, Mark.Rutland, alyssa.milburn, mbenes, rostedt,
	mhiramat, alexei.starovoitov, llvm

On Fri, Feb 25, 2022 at 08:59:57PM -0800, Fāng-ruì Sòng wrote:
> On Fri, Feb 25, 2022 at 8:58 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Feb 25, 2022 at 04:29:49PM -0800, Joao Moreira wrote:
> > > > https://github.com/ClangBuiltLinux/linux/issues/1606
> > >
> > > Candidate fix: https://reviews.llvm.org/D120600
> >
> > And landed! Thanks!
> >
> > Since this is a pretty small change, do you think it could be backported
> > to the clang-14 branch?
> >
> > --
> > Kees Cook
> 
> I have pushed it to release/14.x :)
> https://github.com/llvm/llvm-project/commit/f8ca5fabdb54fdf64b3dffb38ebf7d0220f415a2
> 
> The current release schedule is
> https://discourse.llvm.org/t/llvm-14-0-0-release-schedule/5846

Great! :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-25 14:14       ` Steven Rostedt
@ 2022-02-26  7:09         ` Masami Hiramatsu
  0 siblings, 0 replies; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-26  7:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Masami Hiramatsu, x86, joao, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, ndesaulniers, keescook,
	samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	alexei.starovoitov

On Fri, 25 Feb 2022 09:14:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 25 Feb 2022 11:46:23 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Given all that, kprobe users are in a bit of a bind. Determining the
> > __fentry__ point basically means they *have* to first read the function
> > assembly to figure out where it is.
> 
> Technically I think that's what kprobes has been designed for. But
> realistically, I do not think anyone actually does that (outside of
> academic and niche uses).

Yeah, a raw kprobe user must check the instruction boundary anyway.
And if possible, I would like to keep the kprobe (at the kprobe level) as it is.

> Really, when people use func+0 they just want to trace the function, and
> ftrace is the fastest way to do so, and if it's not *exactly* at function
> entry, but includes the arguments, then it should be fine.

Yes, that is another (sub) reason why I introduced fprobe. ;-)

OK, I understand that we should not allow probing the endbr unless the
user really wants it. Let me add a KPROBE_FLAG_RAW_ENTRY for that special
purpose. If the flag is not set (the default), the kprobe::addr will be
shifted automatically.
Anyway, this address translation must be done in check_ftrace_location
instead of kprobe_lookup_name(). Let me make another patch.
Also, the selftest and documentation must be updated for that.
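
Conceptually something like this (untested sketch; the flag name and the
exact hook point are just what I have in mind, not final code):

	/* in check_ftrace_location(), roughly: */
	if (!(p->flags & KPROBE_FLAG_RAW_ENTRY)) {
		unsigned long faddr = ftrace_location((unsigned long)p->addr);

		/* shift the probe from sym+0 (ENDBR) to the fentry site */
		if (faddr)
			p->addr = (kprobe_opcode_t *)faddr;
	}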

> That said, perhaps we should add a config to know if the architecture
> uses function entry or the old mcount that is after the frame set up (that
> is, you can not get to the arguments).
> 
> CONFIG_HAVE_FTRACE_FUNCTION_START ?

Hmm, ENDBR is not always there, and apart from x86, most of the arches
would set it to 'n'. (x86 is a special case.)

> 
> Because, if the arch still uses the old mcount method (where it's after the
> frame set up), then a kprobe at func+0 really wants the breakpoint.
> 
> -- Steve
> 

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-26  2:10           ` Masami Hiramatsu
@ 2022-02-26 11:48             ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-26 11:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov

On Sat, Feb 26, 2022 at 11:10:40AM +0900, Masami Hiramatsu wrote:

> > The alternative 'hack' I've been contemplating is (ab)using
> > INT_MIN/INT_MAX offset for __fentry__ and __fexit__ points (that latter
> > is something we'll probably have to grow when CET-SHSTK or backward-edge
> > CFI gets to be done, because then ROP tricks as used by function-graph
> > and kretprobes are out the window).
> > 
> > That way sym+[0..size) is still a valid reference to the actual
> > instruction in the symbol, but sym+INT_MIN will hard map to __fentry__
> > while sym+INT_MAX will get us __fexit__.
> 
> Interesting, is that done by another series?

Not yet, that was just a crazy idea I had ;-)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-25 23:51       ` Josh Poimboeuf
@ 2022-02-26 11:55         ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-26 11:55 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 03:51:22PM -0800, Josh Poimboeuf wrote:
> Ah, I see what you mean now.  Still, it can just reference the code
> label itself without having to allocate storage:

Ooh, cute. Yes of course... duh.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-24 14:52 ` [PATCH v2 34/39] objtool: Validate IBT assumptions Peter Zijlstra
@ 2022-02-27  3:13   ` Josh Poimboeuf
  2022-02-27 17:00     ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-27  3:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:52:12PM +0100, Peter Zijlstra wrote:
> +++ b/tools/objtool/check.c
> @@ -380,6 +380,7 @@ static int decode_instructions(struct ob
>  			memset(insn, 0, sizeof(*insn));
>  			INIT_LIST_HEAD(&insn->alts);
>  			INIT_LIST_HEAD(&insn->stack_ops);
> +			INIT_LIST_HEAD(&insn->call_node);

Is this needed?  'call_node' isn't actually a list head, otherwise this
would presumably be fixing a major bug.

>  			insn->sec = sec;
>  			insn->offset = offset;
> @@ -1176,6 +1177,14 @@ static int add_jump_destinations(struct
>  	unsigned long dest_off;
>  
>  	for_each_insn(file, insn) {
> +		if (insn->type == INSN_ENDBR && insn->func) {
> +			if (insn->offset == insn->func->offset) {
> +				file->nr_endbr++;
> +			} else {
> +				file->nr_endbr_int++;
> +			}
> +		}
> +

This doesn't have much to do with adding jump destinations.  I'm
thinking this would fit better in decode_instructions() in the
sym_for_each_insn() loop.
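
I.e. the same logic, just relocated (untested):

	sym_for_each_insn(file, func, insn) {
		if (insn->type == INSN_ENDBR) {
			if (insn->offset == insn->func->offset)
				file->nr_endbr++;
			else
				file->nr_endbr_int++;
		}
	}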

>  		if (!is_static_jump(insn))
>  			continue;
>  
> @@ -1192,10 +1201,14 @@ static int add_jump_destinations(struct
>  		} else if (insn->func) {
>  			/* internal or external sibling call (with reloc) */
>  			add_call_dest(file, insn, reloc->sym, true);
> -			continue;
> +
> +			dest_sec = reloc->sym->sec;
> +			dest_off = reloc->sym->offset +
> +				   arch_dest_reloc_offset(reloc->addend);
> +
>  		} else if (reloc->sym->sec->idx) {
>  			dest_sec = reloc->sym->sec;
> -			dest_off = reloc->sym->sym.st_value +
> +			dest_off = reloc->sym->offset +
>  				   arch_dest_reloc_offset(reloc->addend);
>  		} else {
>  			/* non-func asm code jumping to another file */
> @@ -1205,6 +1218,10 @@ static int add_jump_destinations(struct
>  		insn->jump_dest = find_insn(file, dest_sec, dest_off);
>  		if (!insn->jump_dest) {
>  
> +			/* external symbol */
> +			if (!vmlinux && insn->func)
> +				continue;
> +
>  			/*
>  			 * This is a special case where an alt instruction
>  			 * jumps past the end of the section.  These are
> @@ -1219,6 +1236,16 @@ static int add_jump_destinations(struct
>  			return -1;
>  		}
>  
> +		if (ibt && insn->jump_dest->type == INSN_ENDBR &&
> +		    insn->jump_dest->func &&
> +		    insn->jump_dest->offset == insn->jump_dest->func->offset) {
> +			if (reloc) {
> +				WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
> +			} else {
> +				WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
> +			}
> +		}
> +

I have several concerns about all the above (and corresponding changes
elsewhere), but it looks like this was moved to separate patches, for
ease of NACKing :-)

>  		/*
>  		 * Cross-function jump.
>  		 */
> @@ -1246,7 +1273,8 @@ static int add_jump_destinations(struct
>  				insn->jump_dest->func->pfunc = insn->func;
>  
>  			} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
> -				   insn->jump_dest->offset == insn->jump_dest->func->offset) {
> +				   ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
> +				    (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
>  				/* internal sibling call (without reloc) */
>  				add_call_dest(file, insn, insn->jump_dest->func, true);

How about something more precise/readable/portable:

static bool same_func(struct instruction *insn1, struct instruction *insn2)
{
	return insn1->func->pfunc == insn2->func->pfunc;
}

static bool is_first_func_insn(struct instruction *insn)
{
	return insn->offset == insn->func->offset ||
	       (insn->type == INSN_ENDBR &&
	        insn->offset == insn->func->offset + insn->len);
}

			...

  			} else if (!same_func(insn, insn->jump_dest) &&
				   is_first_func_insn(insn->jump_dest))


> +static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);

I'd rather avoid forward declares and stay with the existing convention.

> +
>  /*
>   * Follow the branch starting at the given instruction, and recursively follow
>   * any other branches (jumps).  Meanwhile, track the frame pointer state at
> @@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
>  
>  		if (insn->hint) {
>  			state.cfi = *insn->cfi;
> +			if (ibt) {
> +				struct symbol *sym;
> +
> +				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
> +				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
> +				    insn->type != INSN_ENDBR && !insn->noendbr) {
> +					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
> +						  insn->sec, insn->offset,
> +						  sym->name);
> +				}

No need to print sym->name here, WARN_FUNC() already does it?

> +			}
>  		} else {
>  			/* XXX track if we actually changed state.cfi */
>  
> @@ -3260,7 +3334,12 @@ static int validate_branch(struct objtoo
>  			state.df = false;
>  			break;
>  
> +		case INSN_NOP:
> +			break;
> +
>  		default:
> +			if (ibt)
> +				validate_ibt_insn(file, insn);

This is kind of subtle.  It would be more robust/clear to move this call
out of the switch statement and check explicitly for the exclusion of
jump/call instructions from within validate_ibt_insn().
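
Something like (a sketch; the exact set of excluded instruction types is
illustrative):

	static void validate_ibt_insn(struct objtool_file *file,
				      struct instruction *insn)
	{
		/* jmp/call/ret destinations are validated elsewhere */
		switch (insn->type) {
		case INSN_CALL:
		case INSN_CALL_DYNAMIC:
		case INSN_JUMP_CONDITIONAL:
		case INSN_JUMP_UNCONDITIONAL:
		case INSN_JUMP_DYNAMIC:
		case INSN_RETURN:
		case INSN_NOP:
			return;
		default:
			break;
		}

		/* ... then the reloc walk as before ... */
	}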

>  			break;
>  		}
>  
> @@ -3506,6 +3585,130 @@ static int validate_functions(struct obj
>  	return warnings;
>  }
>  
> +static struct instruction *
> +validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
> +{
> +	struct instruction *dest;
> +	struct section *sec;
> +	unsigned long off;
> +
> +	sec = reloc->sym->sec;
> +	off = reloc->sym->offset + reloc->addend;

This math assumes non-PC-relative.  If it's R_X86_64_PC32 or
R_X86_64_PLT32 then it needs +4 added.

There are actually a few cases of this in startup_64().  Those are
harmless, but there might conceivably be other code which isn't?
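
I.e. something like (sketch; the PC32/PLT32 operand addend is biased by
-4 because it's taken relative to the next instruction):

	off = reloc->sym->offset + reloc->addend;	/* S + A */

	if (reloc->type == R_X86_64_PC32 ||
	    reloc->type == R_X86_64_PLT32)
		off += 4;				/* undo the -4 bias */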

> +
> +	dest = find_insn(file, sec, off);
> +	if (!dest)
> +		return NULL;
> +
> +	if (dest->type == INSN_ENDBR)
> +		return NULL;
> +
> +	if (reloc->sym->static_call_tramp)
> +		return NULL;
> +
> +	return dest;
> +}
> +
> +static void warn_noendbr(const char *msg, struct section *sec, unsigned long offset,
> +			 struct instruction *target)
> +{
> +	WARN_FUNC("%srelocation to !ENDBR: %s+0x%lx", sec, offset, msg,
> +		  target->func ? target->func->name : target->sec->name,
> +		  target->func ? target->offset - target->func->offset : target->offset);
> +}
> +
> +static void validate_ibt_target(struct objtool_file *file, struct instruction *insn,
> +				struct instruction *target)
> +{
> +	if (target->func && target->func == insn->func) {

(Here and elsewhere) Instead of 'target' can we call it 'dest' for
consistency with existing code?

> +		/*
> +		 * Anything from->to self is either _THIS_IP_ or IRET-to-self.
> +		 *
> +		 * There is no sane way to annotate _THIS_IP_ since the compiler treats the
> +		 * relocation as a constant and is happy to fold in offsets, skewing any
> +		 * annotation we do, leading to vast amounts of false-positives.
> +		 *
> +		 * There's also compiler generated _THIS_IP_ through KCOV and
> +		 * such which we have no hope of annotating.
> +		 *
> +		 * As such, blanked accept self-references without issue.

"blanket"

> +		 */
> +		return;
> +	}
> +
> +	/*
> +	 * Annotated non-control flow target.
> +	 */
> +	if (target->noendbr)
> +		return;

I don't think the comment really adds anything.  What's a "non-control
flow target" anyway...

> +
> +	warn_noendbr("", insn->sec, insn->offset, target);
> +}
> +
> +static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
> +{
> +	struct reloc *reloc = insn_reloc(file, insn);
> +	struct instruction *target;
> +
> +	for (;;) {
> +		if (!reloc)
> +			return;
> +
> +		target = validate_ibt_reloc(file, reloc);
> +		if (target)
> +			validate_ibt_target(file, insn, target);
> +
> +		reloc = find_reloc_by_dest_range(file->elf, insn->sec, reloc->offset + 1,
> +						 (insn->offset + insn->len) - (reloc->offset + 1));
> +	}

I'm confused about what this loop is trying to do.  Why would an
instruction have more than one reloc?  It at least needs a comment.

Also a proper for() loop would be easier to follow:

	for (reloc = insn_reloc(file, insn);
	     reloc;
	     reloc = find_reloc_by_dest_range(file->elf, insn->sec,
					      reloc->offset + 1,
					      (insn->offset + insn->len) - (reloc->offset + 1)) {

> +}
> +
> +static int validate_ibt(struct objtool_file *file)
> +{
> +	struct section *sec;
> +	struct reloc *reloc;
> +
> +	for_each_sec(file, sec) {
> +		bool is_data;
> +
> +		/* already done in validate_branch() */
> +		if (sec->sh.sh_flags & SHF_EXECINSTR)
> +			continue;
> +
> +		if (!sec->reloc)
> +			continue;
> +
> +		if (!strncmp(sec->name, ".orc", 4))
> +			continue;
> +
> +		if (!strncmp(sec->name, ".discard", 8))
> +			continue;
> +
> +		if (!strncmp(sec->name, ".debug", 6))
> +			continue;
> +
> +		if (!strcmp(sec->name, "_error_injection_whitelist"))
> +			continue;
> +
> +		if (!strcmp(sec->name, "_kprobe_blacklist"))
> +			continue;
> +
> +		is_data = strstr(sec->name, ".data") || strstr(sec->name, ".rodata");
> +
> +		list_for_each_entry(reloc, &sec->reloc->reloc_list, list) {
> +			struct instruction *target;
> +
> +			target = validate_ibt_reloc(file, reloc);
> +			if (is_data && target && !target->noendbr) {
> +				warn_noendbr("data ", reloc->sym->sec,
> +					     reloc->sym->offset + reloc->addend,

Another case where the addend math would be wrong if it were
pc-relative.  Not sure if that's possible here or not.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 36/39] objtool: Find unused ENDBR instructions
  2022-02-24 14:52 ` [PATCH v2 36/39] objtool: Find unused ENDBR instructions Peter Zijlstra
@ 2022-02-27  3:46   ` Josh Poimboeuf
  2022-02-28 12:41     ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-27  3:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Thu, Feb 24, 2022 at 03:52:14PM +0100, Peter Zijlstra wrote:
> +#ifdef CONFIG_X86_KERNEL_IBT
> +	. = ALIGN(8);
> +	.ibt_endbr_sites : AT(ADDR(.ibt_endbr_sites) - LOAD_OFFSET) {
> +		__ibt_endbr_sites = .;
> +		*(.ibt_endbr_sites)
> +		__ibt_endbr_sites_end = .;
> +	}
> +#endif

".ibt_endbr_superfluous" maybe?  It's not *all* the endbr sites.

> +
>  	/*
>  	 * struct alt_inst entries. From the header (alternative.h):
>  	 * "Alternative instructions for different CPU types or capabilities"
> --- a/tools/objtool/builtin-check.c
> +++ b/tools/objtool/builtin-check.c
> @@ -21,7 +21,7 @@
>  
>  bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
>       lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
> -     ibt, ibt_fix_direct;
> +     ibt, ibt_fix_direct, ibt_seal;
>  
>  static const char * const check_usage[] = {
>  	"objtool check [<options>] file.o",
> @@ -50,6 +50,7 @@ const struct option check_options[] = {
>  	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
>  	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
>  	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
> +	OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "list superfluous ENDBR instructions"),

s/list/annotate/ ?

Not sure "ibt-seal" is the appropriate name since the "seal" is done at
boot time.

Do we really need a separate option anyway?  To get the full benefits of
IBT you might as well enable it...  And always enabling it helps flush
out bugs quicker.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-27  3:13   ` Josh Poimboeuf
@ 2022-02-27 17:00     ` Peter Zijlstra
  2022-02-27 22:20       ` Josh Poimboeuf
  2022-02-28  9:26       ` Peter Zijlstra
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-27 17:00 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Sat, Feb 26, 2022 at 07:13:48PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:52:12PM +0100, Peter Zijlstra wrote:
> > +++ b/tools/objtool/check.c
> > @@ -380,6 +380,7 @@ static int decode_instructions(struct ob
> >  			memset(insn, 0, sizeof(*insn));
> >  			INIT_LIST_HEAD(&insn->alts);
> >  			INIT_LIST_HEAD(&insn->stack_ops);
> > +			INIT_LIST_HEAD(&insn->call_node);
> 
> Is this needed?  'call_node' isn't actually a list head, otherwise this
> would presumably be fixing a major bug.

Somewhere there's an unconditional list_del_init() on call_node; it could
be that it moved to another patch and now it doesn't make immediate sense.
I'll move them together again.

> >  			insn->sec = sec;
> >  			insn->offset = offset;
> > @@ -1176,6 +1177,14 @@ static int add_jump_destinations(struct
> >  	unsigned long dest_off;
> >  
> >  	for_each_insn(file, insn) {
> > +		if (insn->type == INSN_ENDBR && insn->func) {
> > +			if (insn->offset == insn->func->offset) {
> > +				file->nr_endbr++;
> > +			} else {
> > +				file->nr_endbr_int++;
> > +			}
> > +		}
> > +
> 
> This doesn't have much to do with adding jump destinations.  I'm
> thinking this would fit better in decode_instructions() in the
> sym_for_each_insn() loop.

Fair enough I suppose. I'm not quite sure how it ended up where it did.

> > @@ -1219,6 +1236,16 @@ static int add_jump_destinations(struct
> >  			return -1;
> >  		}
> >  
> > +		if (ibt && insn->jump_dest->type == INSN_ENDBR &&
> > +		    insn->jump_dest->func &&
> > +		    insn->jump_dest->offset == insn->jump_dest->func->offset) {
> > +			if (reloc) {
> > +				WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
> > +			} else {
> > +				WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
> > +			}
> > +		}
> > +
> 
> I have several concerns about all the above (and corresponding changes
> elsewhere), but it looks like this was moved to separate patches, for
> ease of NACKing :-)

Right, we talked about that; I'll move the whole UD1 poisoning to the
end and use NOP4 instead, which removes the need for this.

> >  		/*
> >  		 * Cross-function jump.
> >  		 */
> > @@ -1246,7 +1273,8 @@ static int add_jump_destinations(struct
> >  				insn->jump_dest->func->pfunc = insn->func;
> >  
> >  			} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
> > -				   insn->jump_dest->offset == insn->jump_dest->func->offset) {
> > +				   ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
> > +				    (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
> >  				/* internal sibling call (without reloc) */
> >  				add_call_dest(file, insn, insn->jump_dest->func, true);
> 
> How about something more precise/readable/portable:
> 
> static bool same_func(struct instruction *insn1, struct instruction *insn2)
> {
> 	return insn1->func->pfunc == insn2->func->pfunc;
> }
> 
> static bool is_first_func_insn(struct instruction *insn)
> {
> 	return insn->offset == insn->func->offset ||
> 	       (insn->type == INSN_ENDBR &&
> 	        insn->offset == insn->func->offset + insn->len);
> }
> 
> 			...
> 
>   			} else if (!same_func(insn, insn->jump_dest) &&
> 				   is_first_func_insn(insn->jump_dest))
> 

Done.

> > +static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);
> 
> I'd rather avoid forward declares and stay with the existing convention.
> 
> > +
> >  /*
> >   * Follow the branch starting at the given instruction, and recursively follow
> >   * any other branches (jumps).  Meanwhile, track the frame pointer state at
> > @@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
> >  
> >  		if (insn->hint) {
> >  			state.cfi = *insn->cfi;
> > +			if (ibt) {
> > +				struct symbol *sym;
> > +
> > +				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
> > +				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
> > +				    insn->type != INSN_ENDBR && !insn->noendbr) {
> > +					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
> > +						  insn->sec, insn->offset,
> > +						  sym->name);
> > +				}
> 
> No need to print sym->name here, WARN_FUNC() already does it?

Almost; perhaps the change to make is to either introduce WARN_SYM or
make WARN_FUNC also print !STT_FUNC symbols ?

> > @@ -3260,7 +3334,12 @@ static int validate_branch(struct objtoo
> >  			state.df = false;
> >  			break;
> >  
> > +		case INSN_NOP:
> > +			break;
> > +
> >  		default:
> > +			if (ibt)
> > +				validate_ibt_insn(file, insn);
> 
> This is kind of subtle.  It would be more robust/clear to move this call
> out of the switch statement and check explicitly for the exclusion of
> jump/call instructions from within validate_ibt_insn().

Can do I suppose.

> >  			break;
> >  		}
> >  
> > @@ -3506,6 +3585,130 @@ static int validate_functions(struct obj
> >  	return warnings;
> >  }
> >  
> > +static struct instruction *
> > +validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
> > +{
> > +	struct instruction *dest;
> > +	struct section *sec;
> > +	unsigned long off;
> > +
> > +	sec = reloc->sym->sec;
> > +	off = reloc->sym->offset + reloc->addend;
> 
> This math assumes non-PC-relative.  If it's R_X86_64_PC32 or
> R_X86_64_PLT32 then it needs +4 added.

Right; so I actually had that PC32 thing in there for a while, but ran
into other trouble. I'll go try and figure it out.


> > +static void validate_ibt_target(struct objtool_file *file, struct instruction *insn,
> > +				struct instruction *target)
> > +{
> > +	if (target->func && target->func == insn->func) {
> 
> (Here and elsewhere) Instead of 'target' can we call it 'dest' for
> consistency with existing code?

Done.

> > +		/*
> > +		 * Anything from->to self is either _THIS_IP_ or IRET-to-self.
> > +		 *
> > +		 * There is no sane way to annotate _THIS_IP_ since the compiler treats the
> > +		 * relocation as a constant and is happy to fold in offsets, skewing any
> > +		 * annotation we do, leading to vast amounts of false-positives.
> > +		 *
> > +		 * There's also compiler generated _THIS_IP_ through KCOV and
> > +		 * such which we have no hope of annotating.
> > +		 *
> > +		 * As such, blanked accept self-references without issue.
> 
> "blanket"

Duh.

> > +static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
> > +{
> > +	struct reloc *reloc = insn_reloc(file, insn);
> > +	struct instruction *target;
> > +
> > +	for (;;) {
> > +		if (!reloc)
> > +			return;
> > +
> > +		target = validate_ibt_reloc(file, reloc);
> > +		if (target)
> > +			validate_ibt_target(file, insn, target);
> > +
> > +		reloc = find_reloc_by_dest_range(file->elf, insn->sec, reloc->offset + 1,
> > +						 (insn->offset + insn->len) - (reloc->offset + 1));
> > +	}
> 
> I'm confused about what this loop is trying to do.  Why would an
> instruction have more than one reloc?  It at least needs a comment.

Because there are some :/ 'mov' can have an immediate and a
displacement, both needing a relocation.
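
E.g. (sym_a/sym_b being arbitrary placeholder symbols):

	extern unsigned long sym_a, sym_b;

	void two_relocs(void)
	{
		/*
		 * One instruction, two relocations: R_X86_64_PC32 for the
		 * sym_b displacement, R_X86_64_32S for the $sym_a immediate.
		 */
		asm ("movq $sym_a, sym_b(%rip)");
	}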

> Also a proper for() loop would be easier to follow:
> 
> 	for (reloc = insn_reloc(file, insn);
> 	     reloc;
> 	     reloc = find_reloc_by_dest_range(file->elf, insn->sec,
> 					      reloc->offset + 1,
> 					      (insn->offset + insn->len) - (reloc->offset + 1)) {

Sure.

> > +}
> > +
> > +static int validate_ibt(struct objtool_file *file)
> > +{
> > +	struct section *sec;
> > +	struct reloc *reloc;
> > +
> > +	for_each_sec(file, sec) {
> > +		bool is_data;
> > +
> > +		/* already done in validate_branch() */
> > +		if (sec->sh.sh_flags & SHF_EXECINSTR)
> > +			continue;
> > +
> > +		if (!sec->reloc)
> > +			continue;
> > +
> > +		if (!strncmp(sec->name, ".orc", 4))
> > +			continue;
> > +
> > +		if (!strncmp(sec->name, ".discard", 8))
> > +			continue;
> > +
> > +		if (!strncmp(sec->name, ".debug", 6))
> > +			continue;
> > +
> > +		if (!strcmp(sec->name, "_error_injection_whitelist"))
> > +			continue;
> > +
> > +		if (!strcmp(sec->name, "_kprobe_blacklist"))
> > +			continue;
> > +
> > +		is_data = strstr(sec->name, ".data") || strstr(sec->name, ".rodata");
> > +
> > +		list_for_each_entry(reloc, &sec->reloc->reloc_list, list) {
> > +			struct instruction *target;
> > +
> > +			target = validate_ibt_reloc(file, reloc);
> > +			if (is_data && target && !target->noendbr) {
> > +				warn_noendbr("data ", reloc->sym->sec,
> > +					     reloc->sym->offset + reloc->addend,
> 
> Another case where the addend math would be wrong if it were
> pc-relative.  Not sure if that's possible here or not.

I'll check.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-27 17:00     ` Peter Zijlstra
@ 2022-02-27 22:20       ` Josh Poimboeuf
  2022-02-28  9:47         ` Peter Zijlstra
  2022-02-28  9:26       ` Peter Zijlstra
  1 sibling, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-27 22:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Sun, Feb 27, 2022 at 06:00:03PM +0100, Peter Zijlstra wrote:
> > > @@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
> > >  
> > >  		if (insn->hint) {
> > >  			state.cfi = *insn->cfi;
> > > +			if (ibt) {
> > > +				struct symbol *sym;
> > > +
> > > +				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
> > > +				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
> > > +				    insn->type != INSN_ENDBR && !insn->noendbr) {
> > > +					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
> > > +						  insn->sec, insn->offset,
> > > +						  sym->name);
> > > +				}
> > 
> > No need to print sym->name here, WARN_FUNC() already does it?
> 
> Almost; perhaps the change to make is to either introduce WARN_SYM or
> make WARN_FUNC also print !STT_FUNC symbols ?

In the case of no function, WARN_FUNC() falls back to printing sec+off.
Is that not good enough?

> > > +static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
> > > +{
> > > +	struct reloc *reloc = insn_reloc(file, insn);
> > > +	struct instruction *target;
> > > +
> > > +	for (;;) {
> > > +		if (!reloc)
> > > +			return;
> > > +
> > > +		target = validate_ibt_reloc(file, reloc);
> > > +		if (target)
> > > +			validate_ibt_target(file, insn, target);
> > > +
> > > +		reloc = find_reloc_by_dest_range(file->elf, insn->sec, reloc->offset + 1,
> > > +						 (insn->offset + insn->len) - (reloc->offset + 1));
> > > +	}
> > 
> > I'm confused about what this loop is trying to do.  Why would an
> > instruction have more than one reloc?  It at least needs a comment.
> 
> Because there are some :/ 'mov' can have an immediate and a
> displacement, both needing a relocation.

<boom> mind blown.  How did I not know this?

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-24 14:51 ` [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions Peter Zijlstra
  2022-02-25  0:58   ` Kees Cook
  2022-02-25  1:32   ` Masami Hiramatsu
@ 2022-02-28  6:07   ` Masami Hiramatsu
  2022-02-28 23:25     ` Peter Zijlstra
  2 siblings, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-02-28  6:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov

Hi Peter,

So, instead of this change, can you try below?
This introduces arch_adjust_kprobe_addr() and uses it in kprobe_addr()
so that it can handle the case where the user passed the probe address
in _text+OFFSET format.

From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Mon, 28 Feb 2022 15:01:48 +0900
Subject: [PATCH] x86: kprobes: Skip ENDBR instruction probing

This adjusts the kprobe probe address to skip the ENDBR and puts the
kprobe right after the ENDBR so that the kprobe doesn't disturb IBT.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/kprobes/core.c |  7 +++++++
 include/linux/kprobes.h        |  2 ++
 kernel/kprobes.c               | 11 ++++++++++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 745f42cf82dc..a90cfe50d800 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -52,6 +52,7 @@
 #include <asm/insn.h>
 #include <asm/debugreg.h>
 #include <asm/set_memory.h>
+"include <asm/ibt.h>
 
 #include "common.h"
 
@@ -301,6 +302,12 @@ static int can_probe(unsigned long paddr)
 	return (addr == paddr);
 }
 
+/* If the x86 support IBT (ENDBR) it must be skipped. */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr)
+{
+	return (kprobe_opcode_t *)skip_endbr((void *)addr);
+}
+
 /*
  * Copy an instruction with recovering modified instruction by kprobes
  * and adjust the displacement if the instruction uses the %rip-relative
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 19b884353b15..485d7832a613 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -384,6 +384,8 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
 }
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr);
+
 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
 int register_kprobes(struct kprobe **kps, int num);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 94cab8c9ce56..312f10e85c93 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1488,6 +1488,15 @@ bool within_kprobe_blacklist(unsigned long addr)
 	return false;
 }
 
+/*
+ * If the arch supports a feature like IBT which puts a trap at
+ * the entry of the symbol, the address must be adjusted in this function.
+ */
+kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr)
+{
+	return (kprobe_opcode_t *)addr;
+}
+
 /*
  * If 'symbol_name' is specified, look it up and add the 'offset'
  * to it. This way, we can specify a relative address to a symbol.
@@ -1506,7 +1515,7 @@ static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
 			return ERR_PTR(-ENOENT);
 	}
 
-	addr = (kprobe_opcode_t *)(((char *)addr) + offset);
+	addr = arch_adjust_kprobe_addr((unsigned long)addr + offset);
 	if (addr)
 		return addr;
 
-- 
2.25.1


On Thu, 24 Feb 2022 15:51:53 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> With IBT on, sym+0 is no longer the __fentry__ site.
> 
> NOTE: the architecture has a special case and *does* allow placing an
> INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> and as such we don't need to disallow probing these instructions.
> 
> NOTE: irrespective of the above; there is a complication in that
> direct branches to functions are rewritten to not execute ENDBR, so
> any breakpoint thereon might miss lots of actual function executions.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/core.c |   11 +++++++++++
>  kernel/kprobes.c               |   15 ++++++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
>  {
>  	return 0;
>  }
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +	return offset <= 4*HAS_KERNEL_IBT;
> +}
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -67,10 +67,19 @@ static bool kprobes_all_disarmed;
>  static DEFINE_MUTEX(kprobe_mutex);
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>  
> -kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> -					unsigned int __unused)
> +kprobe_opcode_t * __weak kprobe_lookup_name(const char *name, unsigned int offset)
>  {
> -	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> +	kprobe_opcode_t *addr = NULL;
> +
> +	addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> +	if (addr && !offset) {
> +		unsigned long faddr = ftrace_location((unsigned long)addr);
> +		if (faddr)
> +			addr = (kprobe_opcode_t *)faddr;
> +	}
> +#endif
> +	return addr;
>  }
>  
>  /*
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-27 17:00     ` Peter Zijlstra
  2022-02-27 22:20       ` Josh Poimboeuf
@ 2022-02-28  9:26       ` Peter Zijlstra
  2022-02-28 18:39         ` Josh Poimboeuf
  1 sibling, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-28  9:26 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Sun, Feb 27, 2022 at 06:00:03PM +0100, Peter Zijlstra wrote:
> On Sat, Feb 26, 2022 at 07:13:48PM -0800, Josh Poimboeuf wrote:
> > > +static struct instruction *
> > > +validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
> > > +{
> > > +	struct instruction *dest;
> > > +	struct section *sec;
> > > +	unsigned long off;
> > > +
> > > +	sec = reloc->sym->sec;
> > > +	off = reloc->sym->offset + reloc->addend;
> > 
> > This math assumes non-PC-relative.  If it's R_X86_64_PC32 or
> > R_X86_64_PLT32 then it needs +4 added.
> 
> Right; so I actually had that PC32 thing in there for a while, but ran
> into other trouble. I'll go try and figure it out.

Things like .rela.initcall*.init use PC32 but don't need the +4. If we
get that wrong it'll seal all the initcalls and boot doesn't get very
far at all :-)

How do you feel about something like:

	sec = reloc->sym->sec;
	off = reloc->sym->offset;

	if ((reloc->sec->base->sh.sh_flags & SHF_EXECINSTR) &&
	    (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32))
		off += arch_dest_reloc_offset(reloc->addend);
	else
		off += reloc->addend;


hmm ?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-27 22:20       ` Josh Poimboeuf
@ 2022-02-28  9:47         ` Peter Zijlstra
  2022-02-28 18:36           ` Josh Poimboeuf
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-28  9:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Sun, Feb 27, 2022 at 02:20:55PM -0800, Josh Poimboeuf wrote:
> On Sun, Feb 27, 2022 at 06:00:03PM +0100, Peter Zijlstra wrote:
> > > > @@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
> > > >  
> > > >  		if (insn->hint) {
> > > >  			state.cfi = *insn->cfi;
> > > > +			if (ibt) {
> > > > +				struct symbol *sym;
> > > > +
> > > > +				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
> > > > +				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
> > > > +				    insn->type != INSN_ENDBR && !insn->noendbr) {
> > > > +					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
> > > > +						  insn->sec, insn->offset,
> > > > +						  sym->name);
> > > > +				}
> > > 
> > > No need to print sym->name here, WARN_FUNC() already does it?
> > 
> > Almost; perhaps the change to make is to either introduce WARN_SYM or
> > make WARN_FUNC also print !STT_FUNC symbols ?
> 
> In the case of no function, WARN_FUNC() falls back to printing sec+off.
> Is that not good enough?

I got really tired of doing the manual symbol lookup... I don't suppose
it matters too much now that I've more or less completed the triage, but
it was useful.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 36/39] objtool: Find unused ENDBR instructions
  2022-02-27  3:46   ` Josh Poimboeuf
@ 2022-02-28 12:41     ` Peter Zijlstra
  2022-02-28 17:36       ` Josh Poimboeuf
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-28 12:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Sat, Feb 26, 2022 at 07:46:13PM -0800, Josh Poimboeuf wrote:
> On Thu, Feb 24, 2022 at 03:52:14PM +0100, Peter Zijlstra wrote:
> > +#ifdef CONFIG_X86_KERNEL_IBT
> > +	. = ALIGN(8);
> > +	.ibt_endbr_sites : AT(ADDR(.ibt_endbr_sites) - LOAD_OFFSET) {
> > +		__ibt_endbr_sites = .;
> > +		*(.ibt_endbr_sites)
> > +		__ibt_endbr_sites_end = .;
> > +	}
> > +#endif
> 
> ".ibt_endbr_superfluous" maybe?  It's not *all* the endbr sites.

Since I like seals, I'll make it .ibt_endbr_seal :-) Also goes well with
the option at hand.

> > +
> >  	/*
> >  	 * struct alt_inst entries. From the header (alternative.h):
> >  	 * "Alternative instructions for different CPU types or capabilities"
> > --- a/tools/objtool/builtin-check.c
> > +++ b/tools/objtool/builtin-check.c
> > @@ -21,7 +21,7 @@
> >  
> >  bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
> >       lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
> > -     ibt, ibt_fix_direct;
> > +     ibt, ibt_fix_direct, ibt_seal;
> >  
> >  static const char * const check_usage[] = {
> >  	"objtool check [<options>] file.o",
> > @@ -50,6 +50,7 @@ const struct option check_options[] = {
> >  	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
> >  	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
> >  	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
> > +	OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "list superfluous ENDBR instructions"),
> 
> s/list/annotate/ ?

Done :-)

> Not sure "ibt-seal" is the appropriate name since the "seal" is done at
> boot time.

It allows sealing; it finds the locations to seal, whatever :-)

> Do we really need a separate option anyway?  To get the full benefits of
> IBT you might as well enable it...  And always enabling it helps flush
> out bugs quicker.

Are you asking about --ibt and --ibt-seal or about the existence of
X86_KERNEL_IBT_SEAL here?

The Makefiles will only ever use --ibt and --ibt-seal together for the
reason you state. The reason they're two separate objtool arguments is
because it's strictly speaking two different things being done. Also
--ibt as such is invariant, while --ibt-seal causes modifications to the
object file (which can be discarded using the new --dry-run I suppose).

The reason X86_KERNEL_IBT_SEAL exists is because that requires objtool
while X86_KERNEL_IBT does not -- you seemed to favour not hard relying
on having objtool present.



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 36/39] objtool: Find unused ENDBR instructions
  2022-02-28 12:41     ` Peter Zijlstra
@ 2022-02-28 17:36       ` Josh Poimboeuf
  0 siblings, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 17:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Mon, Feb 28, 2022 at 01:41:13PM +0100, Peter Zijlstra wrote:
> On Sat, Feb 26, 2022 at 07:46:13PM -0800, Josh Poimboeuf wrote:
> > On Thu, Feb 24, 2022 at 03:52:14PM +0100, Peter Zijlstra wrote:
> > > +#ifdef CONFIG_X86_KERNEL_IBT
> > > +	. = ALIGN(8);
> > > +	.ibt_endbr_sites : AT(ADDR(.ibt_endbr_sites) - LOAD_OFFSET) {
> > > +		__ibt_endbr_sites = .;
> > > +		*(.ibt_endbr_sites)
> > > +		__ibt_endbr_sites_end = .;
> > > +	}
> > > +#endif
> > 
> > ".ibt_endbr_superfluous" maybe?  It's not *all* the endbr sites.
> 
> Since I like seals, I'll make it .ibt_endbr_seal :-) Also goes well with
> the option at hand.

Sounds good.

> 
> > > +
> > >  	/*
> > >  	 * struct alt_inst entries. From the header (alternative.h):
> > >  	 * "Alternative instructions for different CPU types or capabilities"
> > > --- a/tools/objtool/builtin-check.c
> > > +++ b/tools/objtool/builtin-check.c
> > > @@ -21,7 +21,7 @@
> > >  
> > >  bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
> > >       lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
> > > -     ibt, ibt_fix_direct;
> > > +     ibt, ibt_fix_direct, ibt_seal;
> > >  
> > >  static const char * const check_usage[] = {
> > >  	"objtool check [<options>] file.o",
> > > @@ -50,6 +50,7 @@ const struct option check_options[] = {
> > >  	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
> > >  	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
> > >  	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
> > > +	OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "list superfluous ENDBR instructions"),
> > 
> > s/list/annotate/ ?
> 
> Done :-)
> 
> > Not sure "ibt-seal" is the appropriate name since the "seal" is done at
> > boot time.
> 
> It allows sealing; it finds the locations to seal, whatever :-)

Fair enough :-)

> > Do we really need a separate option anyway?  To get the full benefits of
> > IBT you might as well enable it...  And always enabling it helps flush
> > out bugs quicker.
> 
> Are you asking about --ibt and --ibt-seal or about the existence of
> X86_KERNEL_IBT_SEAL here?

Both.

> The Makefiles will only ever use --ibt and --ibt-seal together for the
> reason you state. The reason they're two separate objtool arguments is
> because it's strictly speaking two different things being done. Also
> --ibt as such is invariant, while --ibt-seal causes modifications to the
> object file (which can be discarded using the new --dry-run I suppose).

Ok, but I wanted to avoid option sprawl.  I don't see a reason to
separate them.

> The reason X86_KERNEL_IBT_SEAL exists is because that requires objtool
> while X86_KERNEL_IBT does not -- you seemed to favour not hard relying
> on having objtool present.

Hm, either you misunderstood, I misspoke, or I have short term memory
loss.  Objtool is already hopelessly intertwined with x86.  I'd rather
not have the extra option.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-28  9:47         ` Peter Zijlstra
@ 2022-02-28 18:36           ` Josh Poimboeuf
  2022-02-28 20:10             ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 18:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Mon, Feb 28, 2022 at 10:47:55AM +0100, Peter Zijlstra wrote:
> On Sun, Feb 27, 2022 at 02:20:55PM -0800, Josh Poimboeuf wrote:
> > On Sun, Feb 27, 2022 at 06:00:03PM +0100, Peter Zijlstra wrote:
> > > > > @@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
> > > > >  
> > > > >  		if (insn->hint) {
> > > > >  			state.cfi = *insn->cfi;
> > > > > +			if (ibt) {
> > > > > +				struct symbol *sym;
> > > > > +
> > > > > +				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
> > > > > +				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
> > > > > +				    insn->type != INSN_ENDBR && !insn->noendbr) {
> > > > > +					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
> > > > > +						  insn->sec, insn->offset,
> > > > > +						  sym->name);
> > > > > +				}
> > > > 
> > > > No need to print sym->name here, WARN_FUNC() already does it?
> > > 
> > > Almost; perhaps the change to make is to either introduce WARN_SYM or
> > > make WARN_FUNC also print !STT_FUNC symbols ?
> > 
> > In the case of no function, WARN_FUNC() falls back to printing sec+off.
> > Is that not good enough?
> 
> I got really tired of doing the manual symbol lookup... I don't suppose
> it matters too much now that I've more or less completed the triage, but
> it was useful.

Maybe it would be reasonable to change WARN_FUNC to do that?  i.e. fall
back from func+off to sym+off to sec+off.
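
Something like this, perhaps (untested sketch, assuming the existing
find_func_containing()/find_symbol_containing() elf helpers and eliding
asprintf() error handling):

	static char *offstr(struct section *sec, unsigned long offset)
	{
		struct symbol *sym;
		char *str;

		sym = find_func_containing(sec, offset);
		if (!sym)
			sym = find_symbol_containing(sec, offset);

		/* func+off, else sym+off, else sec+off */
		if (sym)
			asprintf(&str, "%s+0x%lx", sym->name, offset - sym->offset);
		else
			asprintf(&str, "%s+0x%lx", sec->name, offset);

		return str;
	}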

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-28  9:26       ` Peter Zijlstra
@ 2022-02-28 18:39         ` Josh Poimboeuf
  0 siblings, 0 replies; 183+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 18:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Mon, Feb 28, 2022 at 10:26:07AM +0100, Peter Zijlstra wrote:
> On Sun, Feb 27, 2022 at 06:00:03PM +0100, Peter Zijlstra wrote:
> > On Sat, Feb 26, 2022 at 07:13:48PM -0800, Josh Poimboeuf wrote:
> > > > +static struct instruction *
> > > > +validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
> > > > +{
> > > > +	struct instruction *dest;
> > > > +	struct section *sec;
> > > > +	unsigned long off;
> > > > +
> > > > +	sec = reloc->sym->sec;
> > > > +	off = reloc->sym->offset + reloc->addend;
> > > 
> > > This math assumes non-PC-relative.  If it's R_X86_64_PC32 or
> > > R_X86_64_PLT32 then it needs +4 added.
> > 
> > Right; so I actually had that PC32 thing in there for a while, but ran
> > into other trouble. I'll go try and figure it out.
> 
> Things like .rela.initcall*.init use PC32 but don't need the +4. If we
> get that wrong it'll seal all the initcall and boot doesn't get very
> far at all :-)

Ah...

> How do you feel about something like:
> 
> 	sec = reloc->sym->sec;
> 	off = reloc->sym->offset;
> 
> 	if ((reloc->sec->base->sh.sh_flags & SHF_EXECINSTR) &&
> 	    (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32))
> 		off += arch_dest_reloc_offset(reloc->addend);
> 	else
> 		off += reloc->addend;
> 
> 
> hmm ?

Looks good to me.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 34/39] objtool: Validate IBT assumptions
  2022-02-28 18:36           ` Josh Poimboeuf
@ 2022-02-28 20:10             ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-28 20:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Mon, Feb 28, 2022 at 10:36:55AM -0800, Josh Poimboeuf wrote:
> On Mon, Feb 28, 2022 at 10:47:55AM +0100, Peter Zijlstra wrote:
> > On Sun, Feb 27, 2022 at 02:20:55PM -0800, Josh Poimboeuf wrote:
> > > On Sun, Feb 27, 2022 at 06:00:03PM +0100, Peter Zijlstra wrote:
> > > > > > @@ -3101,6 +3164,17 @@ static int validate_branch(struct objtoo
> > > > > >  
> > > > > >  		if (insn->hint) {
> > > > > >  			state.cfi = *insn->cfi;
> > > > > > +			if (ibt) {
> > > > > > +				struct symbol *sym;
> > > > > > +
> > > > > > +				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
> > > > > > +				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
> > > > > > +				    insn->type != INSN_ENDBR && !insn->noendbr) {
> > > > > > +					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
> > > > > > +						  insn->sec, insn->offset,
> > > > > > +						  sym->name);
> > > > > > +				}
> > > > > 
> > > > > No need to print sym->name here, WARN_FUNC() already does it?
> > > > 
> > > > Almost; perhaps the change to make is to either introduce WARN_SYM or
> > > > make WARN_FUNC also print !STT_FUNC symbols ?
> > > 
> > > In the case of no function, WARN_FUNC() falls back to printing sec+off.
> > > Is that not good enough?
> > 
> > I got really tired of doing the manual symbol lookup... I don't suppose
> > it matters too much now that I've more or less completed the triage, but
> > it was useful.
> 
> Maybe it would be reasonable to change WARN_FUNC to do that?  i.e. fall
> back from func+off to sym+off to sec+off.

I'll make it happen.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-28  6:07   ` Masami Hiramatsu
@ 2022-02-28 23:25     ` Peter Zijlstra
  2022-03-01  2:49       ` Masami Hiramatsu
  2022-03-01 17:03       ` Naveen N. Rao
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-02-28 23:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov,
	naveen.n.rao

On Mon, Feb 28, 2022 at 03:07:05PM +0900, Masami Hiramatsu wrote:
> Hi Peter,
> 
> So, instead of this change, can you try below?
> This introduce the arch_adjust_kprobe_addr() and use it in the kprobe_addr()
> so that it can handle the case that user passed the probe address in 
> _text+OFFSET format.

It works a little... at the very least it still needs
arch_kprobe_on_func_entry() allowing offset 4.

But looking at this, we've got:

kprobe_on_func_entry(addr, sym, offset)
  _kprobe_addr(addr, sym, offset)
    if (sym)
      addr = kprobe_lookup_name()
           = kallsyms_lookup_name()
    arch_adjust_kprobe_addr(addr+offset)
      skip_endbr()
        kallsyms_lookup_size_offset(addr, ...)
  kallsyms_lookup_size_offset(addr, NULL, &offset)
  arch_kprobe_on_func_entry(offset)

Which is _3_ kallsyms lookups and 3 weak/arch hooks.

Surely we can make this a little more streamlined? The below seems to
work.

I think with a little care and testing it should be possible to fold all
the magic of PowerPC's kprobe_lookup_name() into this one hook as well,
meaning we can get rid of kprobe_lookup_name() entirely.  Naveen?

This then gets us down to a 1 kallsyms call and 1 arch hook. Hmm?

---
 arch/powerpc/kernel/kprobes.c  |   34 +++++++++++++++---------
 arch/x86/kernel/kprobes/core.c |   17 ++++++++++++
 include/linux/kprobes.h        |    3 +-
 kernel/kprobes.c               |   56 ++++++++++++++++++++++++++++++++++-------
 4 files changed, 87 insertions(+), 23 deletions(-)

--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
 	return addr;
 }
 
+static bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+#ifdef PPC64_ELF_ABI_v2
+#ifdef CONFIG_KPROBES_ON_FTRACE
+	return offset <= 16;
+#else
+	return offset <= 8;
+#endif
+#else
+	return !offset;
+#endif
+}
+
+/* XXX try and fold the magic of kprobe_lookup_name() in this */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+					 bool *on_func_entry)
+{
+	*on_func_entry = arch_kprobe_on_func_entry(offset);
+	return (kprobe_opcode_t *)(addr + offset);
+}
+
 void *alloc_insn_page(void)
 {
 	void *page;
@@ -218,19 +239,6 @@ static nokprobe_inline void set_current_
 	kcb->kprobe_saved_msr = regs->msr;
 }
 
-bool arch_kprobe_on_func_entry(unsigned long offset)
-{
-#ifdef PPC64_ELF_ABI_v2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-	return offset <= 16;
-#else
-	return offset <= 8;
-#endif
-#else
-	return !offset;
-#endif
-}
-
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->link;
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -52,6 +52,7 @@
 #include <asm/insn.h>
 #include <asm/debugreg.h>
 #include <asm/set_memory.h>
+#include <asm/ibt.h>
 
 #include "common.h"
 
@@ -301,6 +302,22 @@ static int can_probe(unsigned long paddr
 	return (addr == paddr);
 }
 
+/* If the x86 CPU supports IBT, the ENDBR instruction at +0 must be skipped. */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+					 bool *on_func_entry)
+{
+	if (is_endbr(*(u32 *)addr)) {
+		*on_func_entry = !offset || offset == 4;
+		if (*on_func_entry)
+			offset = 4;
+
+	} else {
+		*on_func_entry = !offset;
+	}
+
+	return (kprobe_opcode_t *)(addr + offset);
+}
+
 /*
  * Copy an instruction with recovering modified instruction by kprobes
  * and adjust the displacement if the instruction uses the %rip-relative
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
 extern void kprobes_inc_nmissed_count(struct kprobe *p);
 extern bool arch_within_kprobe_blacklist(unsigned long addr);
 extern int arch_populate_kprobe_blacklist(void);
-extern bool arch_kprobe_on_func_entry(unsigned long offset);
 extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
 
 extern bool within_kprobe_blacklist(unsigned long addr);
@@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_
 }
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
+
 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
 int register_kprobes(struct kprobe **kps, int num);
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1489,24 +1489,63 @@ bool within_kprobe_blacklist(unsigned lo
 }
 
 /*
+ * arch_adjust_kprobe_addr - adjust the address
+ * @addr: symbol base address
+ * @offset: offset within the symbol
+ * @on_func_entry: set to true when @addr+@offset is at the function entry
+ *
+ * Typically returns @addr + @offset, except for special cases where the
+ * function might be prefixed by a CFI landing pad; in that case any offset
+ * inside the landing pad is mapped to the first 'real' instruction of the
+ * symbol.
+ *
+ * Specifically, for things like IBT/BTI, skip the resp. ENDBR/BTI.C
+ * instruction at +0.
+ */
+kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr,
+						unsigned long offset,
+						bool *on_func_entry)
+{
+	*on_func_entry = !offset;
+	return (kprobe_opcode_t *)(addr + offset);
+}
+
+/*
  * If 'symbol_name' is specified, look it up and add the 'offset'
  * to it. This way, we can specify a relative address to a symbol.
  * This returns encoded errors if it fails to look up symbol or invalid
  * combination of parameters.
  */
-static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
-			const char *symbol_name, unsigned int offset)
+static kprobe_opcode_t *
+_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
+	     unsigned long offset, bool *on_func_entry)
 {
 	if ((symbol_name && addr) || (!symbol_name && !addr))
 		goto invalid;
 
 	if (symbol_name) {
+		/*
+		 * Input: @sym + @offset
+		 * Output: @addr + @offset
+		 *
+		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
+		 *       argument into its output!
+		 */
 		addr = kprobe_lookup_name(symbol_name, offset);
 		if (!addr)
 			return ERR_PTR(-ENOENT);
+	} else {
+		/*
+		 * Input: @addr + @offset
+		 * Output: @addr' + @offset'
+		 */
+		if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
+						 NULL, &offset))
+			return ERR_PTR(-ENOENT);
+		addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
 	}
 
-	addr = (kprobe_opcode_t *)(((char *)addr) + offset);
+	addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);
 	if (addr)
 		return addr;
 
@@ -1516,7 +1555,8 @@ static kprobe_opcode_t *_kprobe_addr(kpr
 
 static kprobe_opcode_t *kprobe_addr(struct kprobe *p)
 {
-	return _kprobe_addr(p->addr, p->symbol_name, p->offset);
+	bool on_func_entry;
+	return _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
 }
 
 /*
@@ -2067,15 +2107,13 @@ bool __weak arch_kprobe_on_func_entry(un
  */
 int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
 {
-	kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
+	bool on_func_entry;
+	kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset, &on_func_entry);
 
 	if (IS_ERR(kp_addr))
 		return PTR_ERR(kp_addr);
 
-	if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
-		return -ENOENT;
-
-	if (!arch_kprobe_on_func_entry(offset))
+	if (!on_func_entry)
 		return -EINVAL;
 
 	return 0;

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-28 23:25     ` Peter Zijlstra
@ 2022-03-01  2:49       ` Masami Hiramatsu
  2022-03-01  8:28         ` Peter Zijlstra
  2022-03-01 17:03       ` Naveen N. Rao
  1 sibling, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-03-01  2:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov,
	naveen.n.rao

On Tue, 1 Mar 2022 00:25:13 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 28, 2022 at 03:07:05PM +0900, Masami Hiramatsu wrote:
> > Hi Peter,
> > 
> > So, instead of this change, can you try below?
> > This introduce the arch_adjust_kprobe_addr() and use it in the kprobe_addr()
> > so that it can handle the case that user passed the probe address in 
> > _text+OFFSET format.
> 
> It works a little... at the very least it still needs
> arch_kprobe_on_func_entry() allowing offset 4.
> 
> But looking at this, we've got:
> 
> kprobe_on_func_entry(addr, sym, offset)
>   _kprobe_addr(addr, sym, offset)
>     if (sym)
>       addr = kprobe_lookup_name()
>            = kallsyms_lookup_name()
>     arch_adjust_kprobe_addr(addr+offset)
>       skip_endbr()
>         kallsyms_lookup_size_offset(addr, ...)
>   kallsyms_lookup_size_offset(addr, NULL, &offset)
>   arch_kprobe_on_func_entry(offset)
> 
> Which is _3_ kallsyms lookups and 3 weak/arch hooks.

Yeah.

> 
> Surely we can make this a little more streamlined? The below seems to
> work.

OK, let me check.

> 
> I think with a little care and testing it should be possible to fold all
> the magic of PowerPC's kprobe_lookup_name() into this one hook as well,
> meaning we can get rid of kprobe_lookup_name() entirely.  Naveen?

Agreed. my previous patch just focused on x86, but powerpc
kprobe_lookup_name() must be updated too.

> 
> This then gets us down to a 1 kallsyms call and 1 arch hook. Hmm?
> 
> ---
>  arch/powerpc/kernel/kprobes.c  |   34 +++++++++++++++---------
>  arch/x86/kernel/kprobes/core.c |   17 ++++++++++++
>  include/linux/kprobes.h        |    3 +-
>  kernel/kprobes.c               |   56 ++++++++++++++++++++++++++++++++++-------
>  4 files changed, 87 insertions(+), 23 deletions(-)
> 
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
>  	return addr;
>  }
>  
> +static bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +#ifdef PPC64_ELF_ABI_v2
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> +	return offset <= 16;
> +#else
> +	return offset <= 8;
> +#endif
> +#else
> +	return !offset;
> +#endif
> +}
> +
> +/* XXX try and fold the magic of kprobe_lookup_name() in this */
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> +					 bool *on_func_entry)
> +{
> +	*on_func_entry = arch_kprobe_on_func_entry(offset);
> +	return (kprobe_opcode_t *)(addr + offset);
> +}
> +
>  void *alloc_insn_page(void)
>  {
>  	void *page;
> @@ -218,19 +239,6 @@ static nokprobe_inline void set_current_
>  	kcb->kprobe_saved_msr = regs->msr;
>  }
>  
> -bool arch_kprobe_on_func_entry(unsigned long offset)
> -{
> -#ifdef PPC64_ELF_ABI_v2
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> -	return offset <= 16;
> -#else
> -	return offset <= 8;
> -#endif
> -#else
> -	return !offset;
> -#endif
> -}
> -
>  void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
>  {
>  	ri->ret_addr = (kprobe_opcode_t *)regs->link;
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -52,6 +52,7 @@
>  #include <asm/insn.h>
>  #include <asm/debugreg.h>
>  #include <asm/set_memory.h>
> +#include <asm/ibt.h>
>  
>  #include "common.h"
>  
> @@ -301,6 +302,22 @@ static int can_probe(unsigned long paddr
>  	return (addr == paddr);
>  }
>  
> +/* If the x86 CPU supports IBT, the ENDBR instruction at +0 must be skipped. */
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> +					 bool *on_func_entry)
> +{
> +	if (is_endbr(*(u32 *)addr)) {
> +		*on_func_entry = !offset || offset == 4;
> +		if (*on_func_entry)
> +			offset = 4;
> +
> +	} else {
> +		*on_func_entry = !offset;
> +	}
> +
> +	return (kprobe_opcode_t *)(addr + offset);
> +}
> +
>  /*
>   * Copy an instruction with recovering modified instruction by kprobes
>   * and adjust the displacement if the instruction uses the %rip-relative
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
>  extern void kprobes_inc_nmissed_count(struct kprobe *p);
>  extern bool arch_within_kprobe_blacklist(unsigned long addr);
>  extern int arch_populate_kprobe_blacklist(void);
> -extern bool arch_kprobe_on_func_entry(unsigned long offset);
>  extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
>  
>  extern bool within_kprobe_blacklist(unsigned long addr);
> @@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_
>  }
>  
>  kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
> +
>  int register_kprobe(struct kprobe *p);
>  void unregister_kprobe(struct kprobe *p);
>  int register_kprobes(struct kprobe **kps, int num);
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1489,24 +1489,63 @@ bool within_kprobe_blacklist(unsigned lo
>  }
>  
>  /*
> + * arch_adjust_kprobe_addr - adjust the address
> + * @addr: symbol base address
> + * @offset: offset within the symbol
> + * @on_func_entry: set to true when @addr+@offset is at the function entry
> + *
> + * Typically returns @addr + @offset, except for special cases where the
> + * function might be prefixed by a CFI landing pad; in that case any offset
> + * inside the landing pad is mapped to the first 'real' instruction of the
> + * symbol.
> + *
> + * Specifically, for things like IBT/BTI, skip the resp. ENDBR/BTI.C
> + * instruction at +0.
> + */
> +kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr,
> +						unsigned long offset,
> +						bool *on_func_entry)
> +{
> +	*on_func_entry = !offset;
> +	return (kprobe_opcode_t *)(addr + offset);
> +}
> +
> +/*
>   * If 'symbol_name' is specified, look it up and add the 'offset'
>   * to it. This way, we can specify a relative address to a symbol.
>   * This returns encoded errors if it fails to look up symbol or invalid
>   * combination of parameters.
>   */
> -static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
> -			const char *symbol_name, unsigned int offset)
> +static kprobe_opcode_t *
> +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> +	     unsigned long offset, bool *on_func_entry)
>  {
>  	if ((symbol_name && addr) || (!symbol_name && !addr))
>  		goto invalid;
>  
>  	if (symbol_name) {
> +		/*
> +		 * Input: @sym + @offset
> +		 * Output: @addr + @offset
> +		 *
> +		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> +		 *       argument into its output!
> +		 */
>  		addr = kprobe_lookup_name(symbol_name, offset);

Hmm, there are 2 issues.

- the 'addr' includes the 'offset' here.
- the 'offset' is NOT limited under the symbol size.
  (e.g. symbol_name = "_text" and @offset is the offset of the target symbol from _text)

This means we need to call kallsyms_lookup_size_offset() in this case too.

>  		if (!addr)
>  			return ERR_PTR(-ENOENT);
> +	} else {
> +		/*
> +		 * Input: @addr + @offset
> +		 * Output: @addr' + @offset'
> +		 */
> +		if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> +						 NULL, &offset))
> +			return ERR_PTR(-ENOENT);
> +		addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
>  	}
>  
> -	addr = (kprobe_opcode_t *)(((char *)addr) + offset);
> +	addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);

Thus we can ensure the 'offset' here is the real offset from the target function entry.

Thank you,

>  	if (addr)
>  		return addr;
>  
> @@ -1516,7 +1555,8 @@ static kprobe_opcode_t *_kprobe_addr(kpr
>  
>  static kprobe_opcode_t *kprobe_addr(struct kprobe *p)
>  {
> -	return _kprobe_addr(p->addr, p->symbol_name, p->offset);
> +	bool on_func_entry;
> +	return _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
>  }
>  
>  /*
> @@ -2067,15 +2107,13 @@ bool __weak arch_kprobe_on_func_entry(un
>   */
>  int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
>  {
> -	kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
> +	bool on_func_entry;
> +	kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset, &on_func_entry);
>  
>  	if (IS_ERR(kp_addr))
>  		return PTR_ERR(kp_addr);
>  
> -	if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
> -		return -ENOENT;
> -
> -	if (!arch_kprobe_on_func_entry(offset))
> +	if (!on_func_entry)
>  		return -EINVAL;
>  
>  	return 0;


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01  2:49       ` Masami Hiramatsu
@ 2022-03-01  8:28         ` Peter Zijlstra
  2022-03-01 17:19           ` Naveen N. Rao
  2022-03-02  0:11           ` Masami Hiramatsu
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-01  8:28 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov,
	naveen.n.rao

On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> > +static kprobe_opcode_t *
> > +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> > +	     unsigned long offset, bool *on_func_entry)
> >  {
> >  	if ((symbol_name && addr) || (!symbol_name && !addr))
> >  		goto invalid;
> >  
> >  	if (symbol_name) {
> > +		/*
> > +		 * Input: @sym + @offset
> > +		 * Output: @addr + @offset
> > +		 *
> > +		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> > +		 *       argument into its output!
> > +		 */
> >  		addr = kprobe_lookup_name(symbol_name, offset);
> 
> Hmm, there are 2 issues.
> 
> - the 'addr' includes the 'offset' here.

AFAICT it doesn't (I even wrote that in the comment on top). There's two
implementations of kprobe_lookup_name(), the weak version doesn't even
use the offset argument, and the PowerPC implementation only checks for
!offset and doesn't fold it.

> - the 'offset' is NOT limited under the symbol size.
>   (e.g. symbol_name = "_text" and @offset is the offset of the target symbol from _text)
> 
> This means we need to call kallsyms_lookup_size_offset() in this case too.

I'm feeling we should error out in that case. Using sym+offset beyond
the limits of sym is just daft.

But if you really want/need to retain that, then yes, we need that
else branch unconditionally :/

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 03/39] x86/module: Fix the paravirt vs alternative order
  2022-02-24 14:51 ` [PATCH v2 03/39] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
@ 2022-03-01 14:37   ` Miroslav Benes
  0 siblings, 0 replies; 183+ messages in thread
From: Miroslav Benes @ 2022-03-01 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, rostedt, mhiramat, alexei.starovoitov

On Thu, 24 Feb 2022, Peter Zijlstra wrote:

> Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> paravirt patching") there is an ordering dependency between patching
> paravirt ops and patching alternatives, the module loader still
> violates this.
> 
> Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Miroslav Benes <mbenes@suse.cz>

M

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 04/39] objtool: Add --dry-run
  2022-02-24 14:51 ` [PATCH v2 04/39] objtool: Add --dry-run Peter Zijlstra
  2022-02-25  0:27   ` Kees Cook
@ 2022-03-01 14:37   ` Miroslav Benes
  1 sibling, 0 replies; 183+ messages in thread
From: Miroslav Benes @ 2022-03-01 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, rostedt, mhiramat, alexei.starovoitov

On Thu, 24 Feb 2022, Peter Zijlstra wrote:

> Add a --dry-run argument to skip writing the modifications. This is
> convenient for debugging.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Miroslav Benes <mbenes@suse.cz>

M

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-25 19:59   ` Edgecombe, Rick P
@ 2022-03-01 15:14     ` Peter Zijlstra
  2022-03-01 21:02       ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-01 15:14 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew,
	linux-kernel, keescook, rostedt, samitolvanen, mark.rutland,
	alexei.starovoitov, Milburn, Alyssa, mhiramat, mbenes,
	ndesaulniers

On Fri, Feb 25, 2022 at 07:59:15PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2022-02-24 at 15:51 +0100, Peter Zijlstra wrote:
> > +__noendbr void cet_disable(void)
> > +{
> > +       if (cpu_feature_enabled(X86_FEATURE_IBT))
> > +               wrmsrl(MSR_IA32_S_CET, 0);
> > +}
> > +
> 
> Did this actually work? 

No idea,.. I don't generally have kexec clue.

> There are actually two problems with kexecing
> when CET is enabled. One is leaving the enforcement enabled when the
> new kernel can't handle it. The other is that CR4.CET and CR0.WP are
> tied together such that if you try to disable CR0.WP while CR4.CET is
> still set, it will #GP. CR0.WP gets unset during kexec/boot in the new
> kernel, so it blows up if you just disable IBT with the MSR and leave
> the CR4 bit set.
> 
> I was under the impression that this had been tested in the userspace
> series, but apparently not as I've just produced the CR0.WP issue. So
> it needs to be fixed in that series too. Userspace doesn't really need
> it pinned, so it should be easy.

So I see CR0 frobbing in identity_mapped and CR4 frobbing right after
it. Is there a reason to first do CR0 and then CR4 or can we flip them?
Otherwise we need to do CR4 twice.

(Also, whoever wrote that function with _5_ identically named labels in
it deserves something painful. Also, wth's up with that jmp 1f; 1:)

Something like so?

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 399f075ccdc4..5b65f6ec5ee6 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -114,6 +114,14 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	/* store the start address on the stack */
 	pushq   %rdx
 
+	/*
+	 * Clear X86_CR4_CET (if it was set) such that we can clear CR0_WP
+	 * below.
+	 */
+	movq	%cr4, %rax
+	andq	$~(X86_CR4_CET), %rax
+	movq	%rax, %cr4
+
 	/*
 	 * Set cr0 to a known state:
 	 *  - Paging enabled

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-02-28 23:25     ` Peter Zijlstra
  2022-03-01  2:49       ` Masami Hiramatsu
@ 2022-03-01 17:03       ` Naveen N. Rao
  1 sibling, 0 replies; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-01 17:03 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	ndesaulniers, rostedt, samitolvanen, x86

Hi Peter,

Peter Zijlstra wrote:
> On Mon, Feb 28, 2022 at 03:07:05PM +0900, Masami Hiramatsu wrote:
>> Hi Peter,
>> 
>> So, instead of this change, can you try below?
>> This introduce the arch_adjust_kprobe_addr() and use it in the kprobe_addr()
>> so that it can handle the case that user passed the probe address in 
>> _text+OFFSET format.
> 
> It works a little... at the very least it still needs
> arch_kprobe_on_func_entry() allowing offset 4.
> 
> But looking at this, we've got:
> 
> kprobe_on_func_entry(addr, sym, offset)
>   _kprobe_addr(addr, sym, offset)
>     if (sym)
>       addr = kprobe_lookup_name()
>            = kallsyms_lookup_name()
>     arch_adjust_kprobe_addr(addr+offset)
>       skip_endbr()
>         kallsyms_lookup_size_offset(addr, ...)
>   kallsyms_lookup_size_offset(addr, NULL, &offset)
>   arch_kprobe_on_func_entry(offset)
> 
> Which is _3_ kallsyms lookups and 3 weak/arch hooks.
> 
> Surely we can make this a little more streamlined? The below seems to
> work.
> 
> I think with a little care and testing it should be possible to fold all
> the magic of PowerPC's kprobe_lookup_name() into this one hook as well,
> meaning we can get rid of kprobe_lookup_name() entirely.  Naveen?

This is timely. I've been looking at addressing a similar set of issues 
on powerpc:
http://lkml.kernel.org/r/cover.1645096227.git.naveen.n.rao@linux.vnet.ibm.com

> 
> This then gets us down to a 1 kallsyms call and 1 arch hook. Hmm?

I was going to propose making _kprobe_addr() into a weak function in 
place of kprobe_lookup_name() in response to Masami in the other thread, 
but this is looking better.

> 
> ---
>  arch/powerpc/kernel/kprobes.c  |   34 +++++++++++++++---------
>  arch/x86/kernel/kprobes/core.c |   17 ++++++++++++
>  include/linux/kprobes.h        |    3 +-
>  kernel/kprobes.c               |   56 ++++++++++++++++++++++++++++++++++-------
>  4 files changed, 87 insertions(+), 23 deletions(-)

I will take a closer look at this tomorrow and get back to you.


Thanks,
- Naveen


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01  8:28         ` Peter Zijlstra
@ 2022-03-01 17:19           ` Naveen N. Rao
  2022-03-01 19:12             ` Peter Zijlstra
  2022-03-02  0:11           ` Masami Hiramatsu
  1 sibling, 1 reply; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-01 17:19 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	ndesaulniers, rostedt, samitolvanen, x86

Peter Zijlstra wrote:
> On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> 
>> - the 'offset' is NOT limited under the symbol size.
>>   (e.g. symbol_name = "_text" and @offset is the offset of the target symbol from _text)
>> 
>> This means we need to call kallsyms_lookup_size_offset() in this case too.
> 
> I'm feeling we should error out in that case. Using sym+offset beyond
> the limits of sym is just daft.
> 
> But if you really want/need to retain that, then yes, we need that
> else branch unconditionally :/

I think we will need this. perf always specifies an offset from _text.

Also, I just noticed:

> -static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
> -			const char *symbol_name, unsigned int offset)
> +static kprobe_opcode_t *
> +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> +	     unsigned long offset, bool *on_func_entry)
>  {
>  	if ((symbol_name && addr) || (!symbol_name && !addr))
>  		goto invalid;
>  
>  	if (symbol_name) {
> +		/*
> +		 * Input: @sym + @offset
> +		 * Output: @addr + @offset
> +		 *
> +		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> +		 *       argument into its output!
> +		 */
>  		addr = kprobe_lookup_name(symbol_name, offset);
>  		if (!addr)
>  			return ERR_PTR(-ENOENT);
> +	} else {
> +		/*
> +		 * Input: @addr + @offset
> +		 * Output: @addr' + @offset'
> +		 */
> +		if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> +						 NULL, &offset))
> +			return ERR_PTR(-ENOENT);
> +		addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
>  	}

This looks wrong. I think you need to retain offset to calculate the 
proper function entry address so that you can do:
	addr = (kprobe_opcode_t *)((unsigned long)(addr + offset) - func_offset);
	offset = func_offset;

>  
> -	addr = (kprobe_opcode_t *)(((char *)addr) + offset);
> +	addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);
>  	if (addr)
>  		return addr;
>

- Naveen


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-25 13:36             ` Steven Rostedt
@ 2022-03-01 18:57               ` Naveen N. Rao
  2022-03-01 19:20                 ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-01 18:57 UTC (permalink / raw)
  To: Masami Hiramatsu, Steven Rostedt
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	ndesaulniers, Peter Zijlstra, samitolvanen, x86

Steven Rostedt wrote:
> On Fri, 25 Feb 2022 19:20:08 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
>> > No. It only acts like ftrace_location_range(sym, sym_end) if the passed
>> > in argument is the ip of the function (kallsyms returns offset = 0)  
>> 
>> Got it. So now ftrace_location() will return the ftrace address
>> when the ip == func or ip == mcount-call.

Won't this cause issues with ftrace_set_filter_ip() and others? If the 
passed-in ip points to func+0 when the actual ftrace location is at some 
offset, the ftrace location check in ftrace_match_addr() will now pass, 
resulting in adding func+0 to the hash. Should we also update 
ftrace_match_addr() to use the ip returned by ftrace_location()?


- Naveen

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01 17:19           ` Naveen N. Rao
@ 2022-03-01 19:12             ` Peter Zijlstra
  2022-03-01 20:05               ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-01 19:12 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Masami Hiramatsu, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, ndesaulniers, rostedt,
	samitolvanen, x86

On Tue, Mar 01, 2022 at 10:49:09PM +0530, Naveen N. Rao wrote:
> Peter Zijlstra wrote:
> > On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> > 
> > > - the 'offset' is NOT limited under the symbol size.
> > >   (e.g. symbol_name = "_text" and @offset is the offset of the target symbol from _text)
> > > 
> > > This means we need to call kallsyms_lookup_size_offset() in this case too.
> > 
> > I'm feeling we should error out in that case. Using sym+offset beyond
> > the limits of sym is just daft.
> > 
> > But if you really want/need to retain that, then yes, we need that
> > else branch unconditionally :/
> 
> I think we will need this. perf always specifies an offset from _text.

The _text section symbol should have an adequate size, no?

> Also, I just noticed:

> > +		if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> > +						 NULL, &offset))
> > +			return ERR_PTR(-ENOENT);
> > +		addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
> >  	}
> 
> This looks wrong. I think you need to retain offset to calculate the proper
> function entry address so that you can do:
> 	addr = (kprobe_opcode_t *)((unsigned long)(addr + offset) - func_offset);
> 	offset = func_offset;


Right you are, it needs to be:

	addr += offset;
	kallsyms_lookup_size_offset(addr, &size, &offset);
	addr -= offset;

with all the extra unreadable casts on.
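
Spelled out with the casts, roughly (sketch; the size isn't actually
needed, so NULL works there):

	addr = (kprobe_opcode_t *)((unsigned long)addr + offset);
	if (!kallsyms_lookup_size_offset((unsigned long)addr, NULL, &offset))
		return ERR_PTR(-ENOENT);
	addr = (kprobe_opcode_t *)((unsigned long)addr - offset);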

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-01 18:57               ` Naveen N. Rao
@ 2022-03-01 19:20                 ` Steven Rostedt
  2022-03-02 13:20                   ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-03-01 19:20 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Masami Hiramatsu, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, ndesaulniers, Peter Zijlstra,
	samitolvanen, x86

On Wed, 02 Mar 2022 00:27:51 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Won't this cause issues with ftrace_set_filter_ip() and others? If the 
> passed-in ip points to func+0 when the actual ftrace location is at some 
> offset, the ftrace location check in ftrace_match_addr() will now pass, 
> resulting in adding func+0 to the hash. Should we also update 
> ftrace_match_addr() to use the ip returned by ftrace_location()?
> 

Yes, ftrace_match_addr() would need to be updated, or at least
ftrace_set_filter_ip() which is the only user of ftrace_match_addr(), and is
currently only used by kprobes, live kernel patching and the direct
trampoline example code.
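
Something like this, perhaps (untested sketch; assumes
ftrace_location() now returns the found ftrace address, per the
discussion above):

	static int
	ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
	{
		struct ftrace_func_entry *entry;

		/* map func+0 (or the mcount call) to the actual patch site */
		ip = ftrace_location(ip);
		if (!ip)
			return -EINVAL;

		if (remove) {
			entry = ftrace_lookup_ip(hash, ip);
			if (!entry)
				return -ENOENT;
			free_hash_entry(hash, entry);
			return 0;
		}

		return add_hash_entry(hash, ip);
	}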

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01 19:12             ` Peter Zijlstra
@ 2022-03-01 20:05               ` Peter Zijlstra
  2022-03-02 15:59                 ` Naveen N. Rao
  2022-03-02 16:17                 ` Naveen N. Rao
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-01 20:05 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Masami Hiramatsu, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, ndesaulniers, rostedt,
	samitolvanen, x86

On Tue, Mar 01, 2022 at 08:12:45PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2022 at 10:49:09PM +0530, Naveen N. Rao wrote:
> > Peter Zijlstra wrote:

> > > But if you really want/need to retain that, then yes, we need that
> > > else branch unconditionally :/
> > 
> > I think we will need this. perf always specifies an offset from _text.
> 
> The _text section symbol should have an adequate size, no?

n/m, I should really go get some sleep it seems. Even if the size is
correct, that isn't relevant.

> > Also, I just noticed:
> 
> > > +		if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> > > +						 NULL, &offset))
> > > +			return ERR_PTR(-ENOENT);
> > > +		addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
> > >  	}
> > 
> > This looks wrong. I think you need to retain offset to calculate the proper
> > function entry address so that you can do:
> > 	addr = (kprobe_opcode_t *)((unsigned long)(addr + offset) - func_offset);
> > 	offset = func_offset;
> 
> 
> Right you are, it needs to be:
> 
> 	addr += offset;
> 	kallsyms_lookup_size_offset(addr, &size, &offset);
> 	addr -= offset;
> 
> with all the extra unreadable casts on.

How does this look?

--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
 	return addr;
 }
 
+static bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+#ifdef PPC64_ELF_ABI_v2
+#ifdef CONFIG_KPROBES_ON_FTRACE
+	return offset <= 16;
+#else
+	return offset <= 8;
+#endif
+#else
+	return !offset;
+#endif
+}
+
+/* XXX try and fold the magic of kprobe_lookup_name() in this */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+					 bool *on_func_entry)
+{
+	*on_func_entry = arch_kprobe_on_func_entry(offset);
+	return (kprobe_opcode_t *)(addr + offset);
+}
+
 void *alloc_insn_page(void)
 {
 	void *page;
@@ -218,19 +239,6 @@ static nokprobe_inline void set_current_
 	kcb->kprobe_saved_msr = regs->msr;
 }
 
-bool arch_kprobe_on_func_entry(unsigned long offset)
-{
-#ifdef PPC64_ELF_ABI_v2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-	return offset <= 16;
-#else
-	return offset <= 8;
-#endif
-#else
-	return !offset;
-#endif
-}
-
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->link;
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -52,6 +52,7 @@
 #include <asm/insn.h>
 #include <asm/debugreg.h>
 #include <asm/set_memory.h>
+#include <asm/ibt.h>
 
 #include "common.h"
 
@@ -301,6 +302,22 @@ static int can_probe(unsigned long paddr
 	return (addr == paddr);
 }
 
+/* If the x86 CPU supports IBT, the ENDBR instruction at +0 must be skipped. */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+					 bool *on_func_entry)
+{
+	if (is_endbr(*(u32 *)addr)) {
+		*on_func_entry = !offset || offset == 4;
+		if (*on_func_entry)
+			offset = 4;
+
+	} else {
+		*on_func_entry = !offset;
+	}
+
+	return (kprobe_opcode_t *)(addr + offset);
+}
+
 /*
  * Copy an instruction with recovering modified instruction by kprobes
  * and adjust the displacement if the instruction uses the %rip-relative
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
 extern void kprobes_inc_nmissed_count(struct kprobe *p);
 extern bool arch_within_kprobe_blacklist(unsigned long addr);
 extern int arch_populate_kprobe_blacklist(void);
-extern bool arch_kprobe_on_func_entry(unsigned long offset);
 extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
 
 extern bool within_kprobe_blacklist(unsigned long addr);
@@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_
 }
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
+
 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
 int register_kprobes(struct kprobe **kps, int num);
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1489,24 +1489,68 @@ bool within_kprobe_blacklist(unsigned lo
 }
 
 /*
+ * arch_adjust_kprobe_addr - adjust the address
+ * @addr: symbol base address
+ * @offset: offset within the symbol
+ * @on_func_entry: set to true when @addr+@offset is at the function entry
+ *
+ * Typically returns @addr + @offset, except for special cases where the
+ * function might be prefixed by a CFI landing pad; in that case any offset
+ * inside the landing pad is mapped to the first 'real' instruction of the
+ * symbol.
+ *
+ * Specifically, for things like IBT/BTI, skip the resp. ENDBR/BTI.C
+ * instruction at +0.
+ */
+kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr,
+						unsigned long offset,
+						bool *on_func_entry)
+{
+	*on_func_entry = !offset;
+	return (kprobe_opcode_t *)(addr + offset);
+}
+
+/*
  * If 'symbol_name' is specified, look it up and add the 'offset'
  * to it. This way, we can specify a relative address to a symbol.
  * This returns encoded errors if it fails to look up symbol or invalid
  * combination of parameters.
  */
-static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
-			const char *symbol_name, unsigned int offset)
+static kprobe_opcode_t *
+_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
+	     unsigned long offset, bool *on_func_entry)
 {
 	if ((symbol_name && addr) || (!symbol_name && !addr))
 		goto invalid;
 
 	if (symbol_name) {
+		/*
+		 * Input: @sym + @offset
+		 * Output: @addr + @offset
+		 *
+		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
+		 *       argument into its output!
+		 */
 		addr = kprobe_lookup_name(symbol_name, offset);
 		if (!addr)
 			return ERR_PTR(-ENOENT);
 	}
 
-	addr = (kprobe_opcode_t *)(((char *)addr) + offset);
+	/*
+	 * So here we have @addr + @offset, displace it into a new
+	 * @addr' + @offset' where @addr' is the symbol start address.
+	 */
+	addr = (void *)addr + offset;
+	if (!kallsyms_lookup_size_offset((unsigned long)addr, NULL, &offset))
+		return ERR_PTR(-ENOENT);
+	addr = (void *)addr - offset;
+
+	/*
+	 * Then ask the architecture to re-combine them, taking care of
+	 * magical function entry details while telling us if this was indeed
+	 * at the start of the function.
+	 */
+	addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);
 	if (addr)
 		return addr;
 
@@ -1516,7 +1560,8 @@ static kprobe_opcode_t *_kprobe_addr(kpr
 
 static kprobe_opcode_t *kprobe_addr(struct kprobe *p)
 {
-	return _kprobe_addr(p->addr, p->symbol_name, p->offset);
+	bool on_func_entry;
+	return _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
 }
 
 /*
@@ -2067,15 +2112,13 @@ bool __weak arch_kprobe_on_func_entry(un
  */
 int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
 {
-	kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
+	bool on_func_entry;
+	kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset, &on_func_entry);
 
 	if (IS_ERR(kp_addr))
 		return PTR_ERR(kp_addr);
 
-	if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
-		return -ENOENT;
-
-	if (!arch_kprobe_on_func_entry(offset))
+	if (!on_func_entry)
 		return -EINVAL;
 
 	return 0;

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-03-01 15:14     ` Peter Zijlstra
@ 2022-03-01 21:02       ` Peter Zijlstra
  2022-03-01 23:13         ` Josh Poimboeuf
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-01 21:02 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew,
	linux-kernel, keescook, rostedt, samitolvanen, mark.rutland,
	alexei.starovoitov, Milburn, Alyssa, mhiramat, mbenes,
	ndesaulniers

On Tue, Mar 01, 2022 at 04:14:42PM +0100, Peter Zijlstra wrote:

> Something like so?
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 399f075ccdc4..5b65f6ec5ee6 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -114,6 +114,14 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>  	/* store the start address on the stack */
>  	pushq   %rdx
>  
> +	/*
> +	 * Clear X86_CR4_CET (if it was set) such that we can clear CR0_WP
> +	 * below.
> +	 */
> +	movq	%cr4, %rax
> +	andq	$~(X86_CR4_CET), %rax
> +	movq	%rax, %cr4
> +
>  	/*
>  	 * Set cr0 to a known state:
>  	 *  - Paging enabled

I *think* it worked, I 'apt install kexec-tools' and copied the magic
commands Josh gave over IRC and the machine went and came back real
quick.

Lacking useful console I can't say much more.

I pushed out a version with these things on.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-02-24 14:51 ` [PATCH v2 01/39] kbuild: Fix clang build Peter Zijlstra
  2022-02-25  0:11   ` Kees Cook
@ 2022-03-01 21:16   ` Nick Desaulniers
  2022-03-02  0:47     ` Kees Cook
  2022-03-02 16:37     ` Nathan Chancellor
  1 sibling, 2 replies; 183+ messages in thread
From: Nick Desaulniers @ 2022-03-01 21:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, Masahiro Yamada,
	Linux Kbuild mailing list, llvm, Nathan Chancellor

On Thu, Feb 24, 2022 at 7:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Debian (and derived) distros ship their compilers as -$ver suffixed
> binaries. For gcc it is sufficient to use:
>
>  $ make CC=gcc-12
>
> However, clang builds (esp. clang-lto) need a whole array of tools to be
> exactly right, leading to unwieldy stuff like:
>
>  $ make CC=clang-13 LD=ld.lld-13 AR=llvm-ar-13 NM=llvm-nm-13 OBJCOPY=llvm-objcopy-13 OBJDUMP=llvm-objdump-13 READELF=llvm-readelf-13 STRIP=llvm-strip-13 LLVM=1
>
> which is, quite frankly, totally insane and unusable. Instead make
> the CC variable DTRT, enabling one such as myself to use:
>
>  $ make CC=clang-13
>
> This also lets one quickly test different clang versions.
> Additionally, also support path based LLVM suites like:
>
>  $ make CC=/opt/llvm/bin/clang
>
> This changes the default to LLVM=1 when CC is clang, mixing toolchains

No, nack, we definitely do not want CC=clang to set LLVM=1. Those are
distinctly two different things for testing JUST the compiler
(CC=clang) vs the whole toolchain suite (LLVM=1). I do not wish to
change the semantics of those, and only for LLVM.

LLVM=1 means test clang, lld, llvm-objcopy, etc..
CC=clang means test clang, bfd, GNU objcopy, etc..
https://docs.kernel.org/kbuild/llvm.html#llvm-utilities

I don't wish to see the behavior of CC=clang change based on LLVM=0 being set.

> is still possible by explicitly adding LLVM=0.

Thanks for testing with LLVM, and even multiple versions of LLVM.

I'm still sympathetic, but only up to a point. A change like this MUST
CC the kbuild AND LLVM maintainers AND respective lists though.  It
also has little to do with the rest of the series.

As per our previous discussion
https://lore.kernel.org/linux-kbuild/CAKwvOd=x9E=7WcCiieso-CDiiU-wMFcXL4W3V5j8dq7BL5QT+w@mail.gmail.com/
I'm still of the opinion that this should be solved by modifications
(permanent or one off) to one's $PATH.

To see what that would look like, let's test that out:

$ sudo apt install clang-11 lld-11

$ cd linux
$ dirname $(readlink -f $(which clang-11))
$ PATH=$(dirname $(readlink -f $(which clang-11))):$PATH make LLVM=1
-j72 -s allnoconfig all
$ llvm-readelf -p .comment vmlinux
String dump of section '.comment':
[     0] Linker: LLD 11.1.0
[    14] Debian clang version 11.1.0-4+build3


If that's too much for the command line, then add a shell function to
your shell's .rc file:

$ which make_clang
make_clang () {
  ver=$1
  shift
  if ! [[ -n $(command -v clang-$ver) ]]
  then
    echo "clang-$ver not installed"
    return 1
  fi
  PATH=$(dirname $(readlink -f $(which clang-$ver))):$PATH make CC=clang $@
}

$ make_clang 11 -j72 -s clean allnoconfig all
$ llvm-readelf -p .comment vmlinux
String dump of section '.comment':
[     0] Debian clang version 11.1.0-4+build3


Even stuffing the dirname+readlink+which in a short helper fn would let you do:

$ make CC=$(helper clang-11)

Also, Kees mentions this is an issue for testing multiple different
versions of gcc, too.  There perhaps is a way to simplify the builds
for BOTH toolchains; i.e. a yet-to-be-created shared variable denoting
the suffix for binaries?  The primary pain point seems to be Debian's
suffixing scheme; it will suffix GCC, clang, and lld, but not GNU
binutils IIUC.

>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Makefile                       |   45 +++++++++++++++++++++++++++---------
>  tools/scripts/Makefile.include |   50 ++++++++++++++++++++++++++++-------------
>  2 files changed, 68 insertions(+), 27 deletions(-)
>
> --- a/Makefile
> +++ b/Makefile
> @@ -423,9 +423,29 @@ HOST_LFS_CFLAGS := $(shell getconf LFS_C
>  HOST_LFS_LDFLAGS := $(shell getconf LFS_LDFLAGS 2>/dev/null)
>  HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null)
>
> -ifneq ($(LLVM),)
> -HOSTCC = clang
> -HOSTCXX        = clang++
> +# powerpc and s390 don't yet work with LLVM as a whole
> +ifeq ($(ARCH),powerpc)
> +LLVM = 0
> +endif
> +ifeq ($(ARCH),s390)
> +LLVM = 0
> +endif
> +
> +# otherwise, if CC=clang, default to using LLVM to enable LTO
> +CC_BASE := $(shell echo $(CC) | sed 's/.*\///')
> +CC_NAME := $(shell echo $(CC_BASE) | cut -b "1-5")
> +ifeq ($(shell test "$(CC_NAME)" = "clang"; echo $$?),0)
> +LLVM ?= 1
> +LLVM_PFX := $(shell echo $(CC) | sed 's/\(.*\/\)\?.*/\1/')

Just curious, what prefixes have you observed in the wild?

> +LLVM_SFX := $(shell echo $(CC_BASE) | cut -b "6-")
> +endif
> +
> +# if not set by now, do not use LLVM
> +LLVM ?= 0
> +
> +ifneq ($(LLVM),0)
> +HOSTCC = $(LLVM_PFX)clang$(LLVM_SFX)
> +HOSTCXX        = $(LLVM_PFX)clang++$(LLVM_SFX)
>  else
>  HOSTCC = gcc
>  HOSTCXX        = g++
> @@ -442,15 +462,15 @@ KBUILD_HOSTLDLIBS   := $(HOST_LFS_LIBS)
>
>  # Make variables (CC, etc...)
>  CPP            = $(CC) -E
> -ifneq ($(LLVM),)
> -CC             = clang
> -LD             = ld.lld
> -AR             = llvm-ar
> -NM             = llvm-nm
> -OBJCOPY                = llvm-objcopy
> -OBJDUMP                = llvm-objdump
> -READELF                = llvm-readelf
> -STRIP          = llvm-strip
> +ifneq ($(LLVM),0)
> +CC             = $(LLVM_PFX)clang$(LLVM_SFX)
> +LD             = $(LLVM_PFX)ld.lld$(LLVM_SFX)
> +AR             = $(LLVM_PFX)llvm-ar$(LLVM_SFX)
> +NM             = $(LLVM_PFX)llvm-nm$(LLVM_SFX)
> +OBJCOPY                = $(LLVM_PFX)llvm-objcopy$(LLVM_SFX)
> +OBJDUMP                = $(LLVM_PFX)llvm-objdump$(LLVM_SFX)
> +READELF                = $(LLVM_PFX)llvm-readelf$(LLVM_SFX)
> +STRIP          = $(LLVM_PFX)llvm-strip$(LLVM_SFX)
>  else
>  CC             = $(CROSS_COMPILE)gcc
>  LD             = $(CROSS_COMPILE)ld
> @@ -461,6 +481,7 @@ OBJDUMP             = $(CROSS_COMPILE)objdump
>  READELF                = $(CROSS_COMPILE)readelf
>  STRIP          = $(CROSS_COMPILE)strip
>  endif
> +
>  PAHOLE         = pahole
>  RESOLVE_BTFIDS = $(objtree)/tools/bpf/resolve_btfids/resolve_btfids
>  LEX            = flex
> --- a/tools/scripts/Makefile.include
> +++ b/tools/scripts/Makefile.include
> @@ -51,12 +51,32 @@ define allow-override
>      $(eval $(1) = $(2)))
>  endef
>
> -ifneq ($(LLVM),)
> -$(call allow-override,CC,clang)
> -$(call allow-override,AR,llvm-ar)
> -$(call allow-override,LD,ld.lld)
> -$(call allow-override,CXX,clang++)
> -$(call allow-override,STRIP,llvm-strip)
> +# powerpc and s390 don't yet work with LLVM as a whole
> +ifeq ($(ARCH),powerpc)
> +LLVM = 0
> +endif
> +ifeq ($(ARCH),s390)
> +LLVM = 0
> +endif
> +
> +# otherwise, if CC=clang, default to using LLVM to enable LTO
> +CC_BASE := $(shell echo $(CC) | sed 's/.*\///')
> +CC_NAME := $(shell echo $(CC_BASE) | cut -b "1-5")
> +ifeq ($(shell test "$(CC_NAME)" = "clang"; echo $$?),0)
> +LLVM ?= 1
> +LLVM_PFX := $(shell echo $(CC) | sed 's/\(.*\/\)\?.*/\1/')
> +LLVM_SFX := $(shell echo $(CC_BASE) | cut -b "6-")
> +endif
> +
> +# if not set by now, do not use LLVM
> +LLVM ?= 0
> +
> +ifneq ($(LLVM),0)
> +$(call allow-override,CC,$(LLVM_PFX)clang$(LLVM_SFX))
> +$(call allow-override,AR,$(LLVM_PFX)llvm-ar$(LLVM_SFX))
> +$(call allow-override,LD,$(LLVM_PFX)ld.lld$(LLVM_SFX))
> +$(call allow-override,CXX,$(LLVM_PFX)clang++$(LLVM_SFX))
> +$(call allow-override,STRIP,$(LLVM_PFX)llvm-strip$(LLVM_SFX))
>  else
>  # Allow setting various cross-compile vars or setting CROSS_COMPILE as a prefix.
>  $(call allow-override,CC,$(CROSS_COMPILE)gcc)
> @@ -68,10 +88,10 @@ endif
>
>  CC_NO_CLANG := $(shell $(CC) -dM -E -x c /dev/null | grep -Fq "__clang__"; echo $$?)
>
> -ifneq ($(LLVM),)
> -HOSTAR  ?= llvm-ar
> -HOSTCC  ?= clang
> -HOSTLD  ?= ld.lld
> +ifneq ($(LLVM),0)
> +HOSTAR  ?= $(LLVM_PFX)llvm-ar$(LLVM_SFX)
> +HOSTCC  ?= $(LLVM_PFX)clang$(LLVM_SFX)
> +HOSTLD  ?= $(LLVM_PFX)ld.lld$(LLVM_SFX)
>  else
>  HOSTAR  ?= ar
>  HOSTCC  ?= gcc
> @@ -79,11 +99,11 @@ HOSTLD  ?= ld
>  endif
>
>  # Some tools require Clang, LLC and/or LLVM utils
> -CLANG          ?= clang
> -LLC            ?= llc
> -LLVM_CONFIG    ?= llvm-config
> -LLVM_OBJCOPY   ?= llvm-objcopy
> -LLVM_STRIP     ?= llvm-strip
> +CLANG          ?= $(LLVM_PFX)clang$(LLVM_SFX)
> +LLC            ?= $(LLVM_PFX)llc$(LLVM_SFX)
> +LLVM_CONFIG    ?= $(LLVM_PFX)llvm-config$(LLVM_SFX)
> +LLVM_OBJCOPY   ?= $(LLVM_PFX)llvm-objcopy$(LLVM_SFX)
> +LLVM_STRIP     ?= $(LLVM_PFX)llvm-strip$(LLVM_SFX)
>
>  ifeq ($(CC_NO_CLANG), 1)
>  EXTRA_WARNINGS += -Wstrict-aliasing=3
>
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-02-25 15:28   ` Peter Zijlstra
  2022-02-25 15:43     ` Peter Zijlstra
@ 2022-03-01 23:10     ` Josh Poimboeuf
  2022-03-02 10:20       ` Peter Zijlstra
  1 sibling, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-03-01 23:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Fri, Feb 25, 2022 at 04:28:32PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 24, 2022 at 12:26:02PM -0800, Josh Poimboeuf wrote:
> 
> > Bricked my SPR:
> > 
> > [   21.602888] jump_label: Fatal kernel bug, unexpected op at sched_clock_stable+0x4/0x20 [0000000074a0db20] (eb 06 b8 01 00 != eb 0a 00 00 00)) size:2 type:0
> 
> > ffffffff81120a70 <sched_clock_stable>:
> > ffffffff81120a70:       f3 0f 1e fa             endbr64
> > ffffffff81120a74:       eb 06                   jmp    ffffffff81120a7c <sched_clock_stable+0xc>
> > ffffffff81120a76:       b8 01 00 00 00          mov    $0x1,%eax
> > ffffffff81120a7b:       c3                      retq
> > ffffffff81120a7c:       f3 0f 1e fa             endbr64
> > ffffffff81120a80:       31 c0                   xor    %eax,%eax
> > ffffffff81120a82:       c3                      retq
> > ffffffff81120a83:       66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
> > ffffffff81120a8a:       00 00 00 00
> > ffffffff81120a8e:       66 90                   xchg   %ax,%ax
> 
> This is due to you having a very old (and arguably buggy) compiler :-( I
> can reproduce with gcc-8.4 and gcc-9.4, my gcc-10.3 compiler no longer
> generates daft code like that, nor do any later.
> 
> That said, I can fix objtool to also re-write jumps to in-the-middle
> ENDBR like this, but then I do get a bunch of:
> 
> OBJTOOL vmlinux.o
> vmlinux.o: warning: objtool: displacement doesn't fit
> vmlinux.o: warning: objtool: ep_insert()+0xbc5: Direct IMM jump to ENDBR; cannot fix
> vmlinux.o: warning: objtool: displacement doesn't fit
> vmlinux.o: warning: objtool: configfs_depend_prep()+0x76: Direct IMM jump to ENDBR; cannot fix
> vmlinux.o: warning: objtool: displacement doesn't fit
> vmlinux.o: warning: objtool: request_key_and_link()+0x17b: Direct IMM jump to ENDBR; cannot fix
> vmlinux.o: warning: objtool: displacement doesn't fit
> vmlinux.o: warning: objtool: blk_mq_poll()+0x2e0: Direct IMM jump to ENDBR; cannot fix
> 
> The alternative is only skipping endbr at +0 I suppose, lemme go try
> that with the brand spanking new skip_endbr() function.
> 
> Yep,.. that seems to cure things. It now boots when built with old
> crappy compilers too.
> 
> 
> --- a/arch/x86/include/asm/ibt.h
> +++ b/arch/x86/include/asm/ibt.h
> @@ -47,6 +47,8 @@ static inline bool is_endbr(unsigned int
>  	return val == gen_endbr();
>  }
>  
> +extern void *skip_endbr(void *);
> +
>  extern __noendbr u64 ibt_save(void);
>  extern __noendbr void ibt_restore(u64 save);
>  
> @@ -71,6 +73,7 @@ extern __noendbr void ibt_restore(u64 sa
>  #define __noendbr
>  
>  static inline bool is_endbr(unsigned int val) { return false; }
> +static inline void *skip_endbr(void *addr) { return addr; }
>  
>  static inline u64 ibt_save(void) { return 0; }
>  static inline void ibt_restore(u64 save) { }
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -112,10 +112,7 @@ void __text_gen_insn(void *buf, u8 opcod
>  	OPTIMIZER_HIDE_VAR(addr);
>  	OPTIMIZER_HIDE_VAR(dest);
>  
> -#ifdef CONFIG_X86_KERNEL_IBT
> -	if (is_endbr(*(u32 *)dest))
> -		dest += 4;
> -#endif
> +	dest = skip_endbr((void *)dest);
>  
>  	insn->opcode = opcode;
>  
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -620,6 +620,19 @@ __noendbr void ibt_restore(u64 save)
>  	}
>  }
>  
> +
> +void *skip_endbr(void *addr)
> +{
> +	unsigned long size, offset;
> +
> +	if (is_endbr(*(unsigned int *)addr) &&
> +	    kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset) &&
> +	    !offset)
> +		addr += 4;
> +
> +	return addr;
> +}
> +
>  #endif
>  
>  static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> @@ -636,7 +649,10 @@ static __always_inline void setup_cet(st
>  	if (!ibt_selftest()) {
>  		pr_err("IBT selftest: Failed!\n");
>  		setup_clear_cpu_cap(X86_FEATURE_IBT);
> +		return;
>  	}
> +
> +	pr_info("CET detected: Indirect Branch Tracking enabled\n");

This is a little excessive on my 192 CPUs :-)

It also messes with the pr_cont()s in announce_cpu():

[    3.733446] x86: Booting SMP configuration:
[    3.734342] .... node  #0, CPUs:          #1
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.770955]    #2
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.802979]    #3
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.835459]    #4
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.866826]    #5
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.898690]    #6
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.930355]    #7
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.961493]    #8
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    3.993500]    #9
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    4.024952]   #10
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    4.056491]   #11
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    4.087493]   #12
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    4.118907]   #13
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    4.150494]   #14
[    3.534902] CET detected: Indirect Branch Tracking enabled
[    4.181425]   #15
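
A pr_info_once() would avoid both problems; a minimal sketch, assuming the
message stays in setup_cet():

	/* sketch: emit the message once, not on every CPU */
	pr_info_once("CET detected: Indirect Branch Tracking enabled\n");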


-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-03-01 21:02       ` Peter Zijlstra
@ 2022-03-01 23:13         ` Josh Poimboeuf
  2022-03-02  1:59           ` Edgecombe, Rick P
  0 siblings, 1 reply; 183+ messages in thread
From: Josh Poimboeuf @ 2022-03-01 23:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Edgecombe, Rick P, hjl.tools, x86, joao, Cooper, Andrew,
	linux-kernel, keescook, rostedt, samitolvanen, mark.rutland,
	alexei.starovoitov, Milburn, Alyssa, mhiramat, mbenes,
	ndesaulniers

On Tue, Mar 01, 2022 at 10:02:45PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2022 at 04:14:42PM +0100, Peter Zijlstra wrote:
> 
> > Something like so?
> > 
> > diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> > index 399f075ccdc4..5b65f6ec5ee6 100644
> > --- a/arch/x86/kernel/relocate_kernel_64.S
> > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > @@ -114,6 +114,14 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> >  	/* store the start address on the stack */
> >  	pushq   %rdx
> >  
> > +	/*
> > +	 * Clear X86_CR4_CET (if it was set) such that we can clear CR0_WP
> > +	 * below.
> > +	 */
> > +	movq	%cr4, %rax
> > +	andq	$~(X86_CR4_CET), %rax
> > +	movq	%rax, %cr4
> > +
> >  	/*
> >  	 * Set cr0 to a known state:
> >  	 *  - Paging enabled
> 
> I *think* it worked, I 'apt install kexec-tools' and copied the magic
> commands Josh gave over IRC and the machine went and came back real
> quick.
> 
> Lacking useful console I can't say much more.
> 
> I pushed out a version with these things on.

I just used your latest git tree, kexec into a non-IBT kernel worked for
me as well.

-- 
Josh


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01  8:28         ` Peter Zijlstra
  2022-03-01 17:19           ` Naveen N. Rao
@ 2022-03-02  0:11           ` Masami Hiramatsu
  2022-03-02 10:25             ` Peter Zijlstra
  1 sibling, 1 reply; 183+ messages in thread
From: Masami Hiramatsu @ 2022-03-02  0:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov,
	naveen.n.rao

On Tue, 1 Mar 2022 09:28:49 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> > > +static kprobe_opcode_t *
> > > +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> > > +	     unsigned long offset, bool *on_func_entry)
> > >  {
> > >  	if ((symbol_name && addr) || (!symbol_name && !addr))
> > >  		goto invalid;
> > >  
> > >  	if (symbol_name) {
> > > +		/*
> > > +		 * Input: @sym + @offset
> > > +		 * Output: @addr + @offset
> > > +		 *
> > > +		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> > > +		 *       argument into it's output!
> > > +		 */
> > >  		addr = kprobe_lookup_name(symbol_name, offset);
> > 
> > Hmm, there are 2 issues.
> > 
> > - the 'addr' includes the 'offset' here.
> 
> AFAICT it doesn't (I even wrote that in the comment on top). There are two
> implementations of kprobe_lookup_name(), the weak version doesn't even
> use the offset argument, and the PowerPC implementation only checks for
> !offset and doesn't fold it.

Oops, OK.

> 
> > - the 'offset' is NOT limited under the symbol size.
> >   (e.g. symbol_name = "_text" and @offset points the offset of target symbol from _text)
> > 
> > This means we need to call kallsyms_lookup_size_offset() in this case too.
> 
> I'm feeling we should error out in that case. Using sym+offset beyond
> the limits of sym is just daft.

No, this is required for pointing to local-scope functions which have the
same name. (And perf-probe does that.)

> 
> But if you really want/need to retain that, then yes, we need that
> else branch unconditionally :/

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-01 21:16   ` Nick Desaulniers
@ 2022-03-02  0:47     ` Kees Cook
  2022-03-02  0:53       ` Fangrui Song
  2022-03-02 16:37     ` Nathan Chancellor
  1 sibling, 1 reply; 183+ messages in thread
From: Kees Cook @ 2022-03-02  0:47 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, Masahiro Yamada,
	Linux Kbuild mailing list, llvm, Nathan Chancellor

On Tue, Mar 01, 2022 at 01:16:04PM -0800, Nick Desaulniers wrote:
> Also, Kees mentions this is an issue for testing multiple different
> versions of gcc, too.  There perhaps is a way to simplify the builds
> for BOTH toolchains; i.e. a yet-to-be-created shared variable denoting
> the suffix for binaries?  The primary pain point seems to be Debian's
> suffixing scheme; it will suffix GCC, clang, and lld, but not GNU
> binutils IIUC.

Right. Though I think auto-detection still makes sense.

If I do:

	make CC=clang-12 LLVM=1

it'd be nice if it also used ld.lld-12.

> [...]
> Just curious, what prefixes have you observed in the wild?

For me, wherever I built clang, and "/usr/bin"

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02  0:47     ` Kees Cook
@ 2022-03-02  0:53       ` Fangrui Song
  0 siblings, 0 replies; 183+ messages in thread
From: Fangrui Song @ 2022-03-02  0:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: Nick Desaulniers, Peter Zijlstra, x86, joao, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Masahiro Yamada, Linux Kbuild mailing list, llvm,
	Nathan Chancellor

On 2022-03-01, Kees Cook wrote:
>On Tue, Mar 01, 2022 at 01:16:04PM -0800, Nick Desaulniers wrote:
>> Also, Kees mentions this is an issue for testing multiple different
>> versions of gcc, too.  There perhaps is a way to simplify the builds
>> for BOTH toolchains; i.e. a yet-to-be-created shared variable denoting
>> the suffix for binaries?  The primary pain point seems to be Debian's
>> suffixing scheme; it will suffix GCC, clang, and lld, but not GNU
>> binutils IIUC.
>
>Right. Though I think auto-detection still makes sense.
>
>If I do:
>
>	make CC=clang-12 LLVM=1
>
>it'd be nice if it also used ld.lld-12.

This transformation may be a bit magical.

On Debian, /usr/bin/clang-13 is a symlink to /usr/lib/llvm-13/bin/clang .
Would it be fine for the user to simply provide the correct PATH?

>> [...]
>> Just curious, what prefixes have you observed in the wild?
>
>For me, wherever I built clang, and "/usr/bin"
>
>-- 
>Kees Cook
>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-03-01 23:13         ` Josh Poimboeuf
@ 2022-03-02  1:59           ` Edgecombe, Rick P
  2022-03-02 13:49             ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Edgecombe, Rick P @ 2022-03-02  1:59 UTC (permalink / raw)
  To: keescook, peterz, Poimboe, Josh
  Cc: linux-kernel, Cooper, Andrew, hjl.tools, rostedt, joao,
	samitolvanen, x86, mark.rutland, alexei.starovoitov, Milburn,
	Alyssa, mhiramat, mbenes, ndesaulniers

On Tue, 2022-03-01 at 15:13 -0800, Josh Poimboeuf wrote:
> On Tue, Mar 01, 2022 at 10:02:45PM +0100, Peter Zijlstra wrote:
> > On Tue, Mar 01, 2022 at 04:14:42PM +0100, Peter Zijlstra wrote:
> > 
> > > Something like so?
> > > 
> > > diff --git a/arch/x86/kernel/relocate_kernel_64.S
> > > b/arch/x86/kernel/relocate_kernel_64.S
> > > index 399f075ccdc4..5b65f6ec5ee6 100644
> > > --- a/arch/x86/kernel/relocate_kernel_64.S
> > > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > > @@ -114,6 +114,14 @@
> > > SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> > >      /* store the start address on the stack */
> > >      pushq   %rdx
> > >   
> > > +   /*
> > > +    * Clear X86_CR4_CET (if it was set) such that we can clear
> > > CR0_WP
> > > +    * below.
> > > +    */
> > > +   movq    %cr4, %rax
> > > +   andq    $~(X86_CR4_CET), %rax
> > > +   movq    %rax, %cr4
> > > +
> > >      /*
> > >       * Set cr0 to a known state:
> > >       *  - Paging enabled
> > 
> > I *think* it worked, I 'apt install kexec-tools' and copied the
> > magic
> > commands Josh gave over IRC and the machine went and came back real
> > quick.
> > 
> > Lacking useful console I can't say much more.
> > 
> > I pushed out a version with these things on.
> 
> I just used your latest git tree, kexec into a non-IBT kernel worked
> for
> me as well.

I copied this over to the userspace series and it worked where it
failed before. And looking at the code, it seems like it should work as
well. 

As for pinning strength, I don't understand this kexec asm well enough to
say for sure how much better it is than just removing the bit from the
pinning mask. I think some future hardening around preventing turning
off IBT might still be worthwhile.

Kees, I think you brought up the pinning, what do you think of this?


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 00/39] x86: Kernel IBT
  2022-03-01 23:10     ` Josh Poimboeuf
@ 2022-03-02 10:20       ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 10:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov

On Tue, Mar 01, 2022 at 03:10:22PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 25, 2022 at 04:28:32PM +0100, Peter Zijlstra wrote:
> > @@ -636,7 +649,10 @@ static __always_inline void setup_cet(st
> >  	if (!ibt_selftest()) {
> >  		pr_err("IBT selftest: Failed!\n");
> >  		setup_clear_cpu_cap(X86_FEATURE_IBT);
> > +		return;
> >  	}
> > +
> > +	pr_info("CET detected: Indirect Branch Tracking enabled\n");
> 
> This is a little excessive on my 192 CPUs :-)
> 
> It also messes with the pr_cont()s in announce_cpu():

Hehe, I just noticed the same when looking at logs trying to figure out
if kexec worked. I'll go fix.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-02  0:11           ` Masami Hiramatsu
@ 2022-03-02 10:25             ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 10:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, alexei.starovoitov,
	naveen.n.rao

On Wed, Mar 02, 2022 at 09:11:50AM +0900, Masami Hiramatsu wrote:

> > But if you really want/need to retain that, then yes, we need that
> > else branch unconditionally :/
> 
> Thank you,

That's what I ended up doing in the latest version; I realized that
irrespective of symbol size, it is required when symbols overlap, as per
the case mentioned by Naveen.

  https://lkml.kernel.org/r/20220301200547.GK11184@worktop.programming.kicks-ass.net

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-01 19:20                 ` Steven Rostedt
@ 2022-03-02 13:20                   ` Peter Zijlstra
  2022-03-02 16:01                     ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 13:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Naveen N. Rao, Masami Hiramatsu, alexei.starovoitov,
	alyssa.milburn, andrew.cooper3, hjl.tools, joao, jpoimboe,
	keescook, linux-kernel, mark.rutland, mbenes, ndesaulniers,
	samitolvanen, x86

On Tue, Mar 01, 2022 at 02:20:16PM -0500, Steven Rostedt wrote:
> On Wed, 02 Mar 2022 00:27:51 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
> > Won't this cause issues with ftrace_set_filter_ip() and others? If the 
> > passed-in ip points to func+0 when the actual ftrace location is at some 
> > offset, the ftrace location check in ftrace_match_addr() will now pass, 
> > resulting in adding func+0 to the hash. Should we also update 
> > ftrace_match_addr() to use the ip returned by ftrace_location()?
> > 
> 
> Yes, ftrace_match_addr() would need to be updated, or at least
> ftrace_set_filter_ip(), which is the only user of ftrace_match_addr(), and is
> currently only used by kprobes, live kernel patching and the direct
> trampoline example code.

Like so, or is something else needed?

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 68ecd3e35342..d1b30b5c5c23 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4980,7 +4980,8 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
 {
 	struct ftrace_func_entry *entry;
 
-	if (!ftrace_location(ip))
+	ip = ftrace_location(ip);
+	if (!ip)
 		return -EINVAL;
 
 	if (remove) {

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-03-02  1:59           ` Edgecombe, Rick P
@ 2022-03-02 13:49             ` Peter Zijlstra
  2022-03-02 18:38               ` Kees Cook
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 13:49 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: keescook, Poimboe, Josh, linux-kernel, Cooper, Andrew, hjl.tools,
	rostedt, joao, samitolvanen, x86, mark.rutland,
	alexei.starovoitov, Milburn, Alyssa, mhiramat, mbenes,
	ndesaulniers

On Wed, Mar 02, 2022 at 01:59:46AM +0000, Edgecombe, Rick P wrote:
> As for pinning strength, I'm not understanding this kexec asm enough to
> say for sure how much better it is than just removing the bit from the
> pinning mask. I think some future hardening around preventing turning
> off IBT might still be worthwhile.
> 
> Kees, I think you brought up the pinning, what do you think of this?

IIRC the whole purpose of that was to ensure that the
cr4_update_irqsoff() function itself isn't a useful gadget to manipulate
CR4 with.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01 20:05               ` Peter Zijlstra
@ 2022-03-02 15:59                 ` Naveen N. Rao
  2022-03-02 16:38                   ` Peter Zijlstra
  2022-03-02 16:17                 ` Naveen N. Rao
  1 sibling, 1 reply; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-02 15:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, rostedt, samitolvanen, x86

Peter Zijlstra wrote:
> 
> How does this look?

I gave this a quick test on powerpc and this looks good to me.

> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
>  extern void kprobes_inc_nmissed_count(struct kprobe *p);
>  extern bool arch_within_kprobe_blacklist(unsigned long addr);
>  extern int arch_populate_kprobe_blacklist(void);
> -extern bool arch_kprobe_on_func_entry(unsigned long offset);

There is a __weak definition of this function in kernel/kprobes.c which 
should also be removed.


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-02 13:20                   ` Peter Zijlstra
@ 2022-03-02 16:01                     ` Steven Rostedt
  2022-03-02 19:47                       ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-03-02 16:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Naveen N. Rao, Masami Hiramatsu, alexei.starovoitov,
	alyssa.milburn, andrew.cooper3, hjl.tools, joao, jpoimboe,
	keescook, linux-kernel, mark.rutland, mbenes, ndesaulniers,
	samitolvanen, x86

On Wed, 2 Mar 2022 14:20:23 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Like so, or is something else needed?
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 68ecd3e35342..d1b30b5c5c23 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -4980,7 +4980,8 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
>  {
>  	struct ftrace_func_entry *entry;
>  
> -	if (!ftrace_location(ip))
> +	ip = ftrace_location(ip);
> +	if (!ip)
>  		return -EINVAL;

This could possibly work. I'd have to test all this though.

I probably could just take this patch and try it out. You can remove the
"x86/ibt" from the subject, as this patch may be a requirement for that
(include that in the commit log), but it is has no changes to x86/ibt
specifically.

-- Steve


>  
>  	if (remove) {

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-01 20:05               ` Peter Zijlstra
  2022-03-02 15:59                 ` Naveen N. Rao
@ 2022-03-02 16:17                 ` Naveen N. Rao
  2022-03-02 19:32                   ` Peter Zijlstra
  2022-03-03  1:54                   ` Masami Hiramatsu
  1 sibling, 2 replies; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-02 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, rostedt, samitolvanen, x86

Peter Zijlstra wrote:
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
>  	return addr;
>  }
> 
> +static bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +#ifdef PPC64_ELF_ABI_v2
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> +	return offset <= 16;
> +#else
> +	return offset <= 8;
> +#endif
> +#else
> +	return !offset;
> +#endif
> +}
> +
> +/* XXX try and fold the magic of kprobe_lookup_name() in this */
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> +					 bool *on_func_entry)
> +{
> +	*on_func_entry = arch_kprobe_on_func_entry(offset);
> +	return (kprobe_opcode_t *)(addr + offset);
> +}
> +

With respect to kprobe_lookup_name(), one of the primary motivations there was 
the issue with function descriptors for the previous elf v1 ABI (it likely also 
affects ia64/parisc). I'm thinking it'll be simpler if we have a way to obtain
the function entry point. Something like this:

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 4176c7eca7b5aa..8c57cc5b77f9ae 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -73,6 +73,12 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 /* Lookup the address for a symbol. Returns 0 if not found. */
 unsigned long kallsyms_lookup_name(const char *name);
 
+/* Return function entry point by additionally dereferencing function descriptor */
+static inline unsigned long kallsyms_lookup_function(const char *name)
+{
+	return (unsigned long)dereference_symbol_descriptor((void *)kallsyms_lookup_name(name));
+}
+
 extern int kallsyms_lookup_size_offset(unsigned long addr,
 				  unsigned long *symbolsize,
 				  unsigned long *offset);
@@ -103,6 +109,11 @@ static inline unsigned long kallsyms_lookup_name(const char *name)
 	return 0;
 }
 
+static inline unsigned long kallsyms_lookup_function(const char *name)
+{
+	return 0;
+}
+
 static inline int kallsyms_lookup_size_offset(unsigned long addr,
 					      unsigned long *symbolsize,
 					      unsigned long *offset)


With that, we can fold some of the code from kprobe_lookup_name() into 
arch_adjust_kprobe_addr() and remove kprobe_lookup_name(). This should also 
address Masami's concerns with powerpc promoting all probes at function entry 
into a probe at the ftrace location.
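
For anyone not familiar with the descriptor indirection: on ELFv1 the symbol
for a function points at a descriptor rather than at the code. Roughly (a
sketch; the exact type and field names vary across trees):

	/* rough shape of an ELFv1 function descriptor */
	struct func_desc {
		unsigned long entry;	/* actual text entry point */
		unsigned long toc;	/* TOC base for the callee */
		unsigned long env;	/* environment pointer, unused by C */
	};

dereference_symbol_descriptor() is what folds that indirection away, which
is why kallsyms_lookup_function() above returns the real entry point.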


- Naveen


---
 arch/powerpc/kernel/kprobes.c | 70 +++--------------------------------
 include/linux/kprobes.h       |  1 -
 kernel/kprobes.c              | 19 ++--------
 kernel/trace/trace_kprobe.c   |  2 +-
 4 files changed, 9 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 7dae0b01abfbd6..46aa2b9e44c27c 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -41,70 +41,6 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
 		 addr < (unsigned long)__head_end);
 }
 
-kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
-{
-	kprobe_opcode_t *addr = NULL;
-
-#ifdef PPC64_ELF_ABI_v2
-	/* PPC64 ABIv2 needs local entry point */
-	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-	if (addr && !offset) {
-#ifdef CONFIG_KPROBES_ON_FTRACE
-		unsigned long faddr;
-		/*
-		 * Per livepatch.h, ftrace location is always within the first
-		 * 16 bytes of a function on powerpc with -mprofile-kernel.
-		 */
-		faddr = ftrace_location_range((unsigned long)addr,
-					      (unsigned long)addr + 16);
-		if (faddr)
-			addr = (kprobe_opcode_t *)faddr;
-		else
-#endif
-			addr = (kprobe_opcode_t *)ppc_function_entry(addr);
-	}
-#elif defined(PPC64_ELF_ABI_v1)
-	/*
-	 * 64bit powerpc ABIv1 uses function descriptors:
-	 * - Check for the dot variant of the symbol first.
-	 * - If that fails, try looking up the symbol provided.
-	 *
-	 * This ensures we always get to the actual symbol and not
-	 * the descriptor.
-	 *
-	 * Also handle <module:symbol> format.
-	 */
-	char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
-	bool dot_appended = false;
-	const char *c;
-	ssize_t ret = 0;
-	int len = 0;
-
-	if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
-		c++;
-		len = c - name;
-		memcpy(dot_name, name, len);
-	} else
-		c = name;
-
-	if (*c != '\0' && *c != '.') {
-		dot_name[len++] = '.';
-		dot_appended = true;
-	}
-	ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
-	if (ret > 0)
-		addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
-
-	/* Fallback to the original non-dot symbol lookup */
-	if (!addr && dot_appended)
-		addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-#else
-	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-#endif
-
-	return addr;
-}
-
 static bool arch_kprobe_on_func_entry(unsigned long offset)
 {
 #ifdef PPC64_ELF_ABI_v2
@@ -118,11 +54,15 @@ static bool arch_kprobe_on_func_entry(unsigned long offset)
 #endif
 }
 
-/* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
 					 bool *on_func_entry)
 {
 	*on_func_entry = arch_kprobe_on_func_entry(offset);
+#ifdef PPC64_ELF_ABI_v2
+	/* Promote probes on the GEP to the LEP */
+	if (!offset)
+		addr = ppc_function_entry((void *)addr);
+#endif
 	return (kprobe_opcode_t *)(addr + offset);
 }
 
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9c28f7a0ef4268..dad375056ba049 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -382,7 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
 	return this_cpu_ptr(&kprobe_ctlblk);
 }
 
-kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
 
 int register_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8be57fdc19bdc0..066fa644e9dfa3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -67,12 +67,6 @@ static bool kprobes_all_disarmed;
 static DEFINE_MUTEX(kprobe_mutex);
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
 
-kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
-					unsigned int __unused)
-{
-	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
-}
-
 /*
  * Blacklist -- list of 'struct kprobe_blacklist_entry' to store info where
  * kprobes can not probe.
@@ -1481,7 +1475,7 @@ bool within_kprobe_blacklist(unsigned long addr)
 		if (!p)
 			return false;
 		*p = '\0';
-		addr = (unsigned long)kprobe_lookup_name(symname, 0);
+		addr = kallsyms_lookup_function(symname);
 		if (addr)
 			return __within_kprobe_blacklist(addr);
 	}
@@ -1524,14 +1518,7 @@ _kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
 		goto invalid;
 
 	if (symbol_name) {
-		/*
-		 * Input: @sym + @offset
-		 * Output: @addr + @offset
-		 *
-		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
-		 *       argument into it's output!
-		 */
-		addr = kprobe_lookup_name(symbol_name, offset);
+		addr = (kprobe_opcode_t *)kallsyms_lookup_function(symbol_name);
 		if (!addr)
 			return ERR_PTR(-ENOENT);
 	}
@@ -2621,7 +2608,7 @@ static int __init init_kprobes(void)
 		/* lookup the function address from its name */
 		for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
 			kretprobe_blacklist[i].addr =
-				kprobe_lookup_name(kretprobe_blacklist[i].name, 0);
+				(void *)kallsyms_lookup_function(kretprobe_blacklist[i].name);
 			if (!kretprobe_blacklist[i].addr)
 				pr_err("Failed to lookup symbol '%s' for kretprobe blacklist. Maybe the target function is removed or renamed.\n",
 				       kretprobe_blacklist[i].name);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 508f14af4f2c7e..a8d01954051e60 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -461,7 +461,7 @@ static bool within_notrace_func(struct trace_kprobe *tk)
 		if (!p)
 			return true;
 		*p = '\0';
-		addr = (unsigned long)kprobe_lookup_name(symname, 0);
+		addr = kallsyms_lookup_function(symname);
 		if (addr)
 			return __within_notrace_func(addr);
 	}
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-02-24 14:51 ` [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location Peter Zijlstra
  2022-02-24 15:55   ` Masami Hiramatsu
  2022-02-25  0:55   ` Kees Cook
@ 2022-03-02 16:25   ` Naveen N. Rao
  2 siblings, 0 replies; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-02 16:25 UTC (permalink / raw)
  To: andrew.cooper3, hjl.tools, joao, jpoimboe, Peter Zijlstra, x86
  Cc: alexei.starovoitov, alyssa.milburn, keescook, linux-kernel,
	mark.rutland, mbenes, mhiramat, ndesaulniers, rostedt,
	samitolvanen

Peter Zijlstra wrote:
> Have ftrace_location() search the symbol for the __fentry__ location
> when it isn't at func+0 and use this for {,un}register_ftrace_direct().
> 
> This avoids a whole bunch of assumptions about __fentry__ being at
> func+0.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/trace/ftrace.c |   30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1578,7 +1578,24 @@ unsigned long ftrace_location_range(unsi
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> -	return ftrace_location_range(ip, ip);
> +	struct dyn_ftrace *rec;
> +	unsigned long offset;
> +	unsigned long size;
> +
> +	rec = lookup_rec(ip, ip);
> +	if (!rec) {
> +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> +			goto out;
> +
> +		if (!offset)
> +			rec = lookup_rec(ip - offset, (ip - offset) + size);
> +	}
> +
> +	if (rec)
> +		return rec->ip;
> +
> +out:
> +	return 0;
>  }
>  
>  /**
> @@ -5110,11 +5127,16 @@ int register_ftrace_direct(unsigned long
>  	struct ftrace_func_entry *entry;
>  	struct ftrace_hash *free_hash = NULL;
>  	struct dyn_ftrace *rec;
> -	int ret = -EBUSY;
> +	int ret = -ENODEV;
>  
>  	mutex_lock(&direct_mutex);
>  
> +	ip = ftrace_location(ip);
> +	if (!ip)
> +		goto out_unlock;
> +
>  	/* See if there's a direct function at @ip already */
> +	ret = -EBUSY;
>  	if (ftrace_find_rec_direct(ip))
>  		goto out_unlock;

I think some of the validation at this point can be removed (diff below).

>  
> @@ -5222,6 +5244,10 @@ int unregister_ftrace_direct(unsigned lo
>  
>  	mutex_lock(&direct_mutex);
>  
> +	ip = ftrace_location(ip);
> +	if (!ip)
> +		goto out_unlock;
> +
>  	entry = find_direct_entry(&ip, NULL);
>  	if (!entry)
>  		goto out_unlock;

We should also update modify_ftrace_direct(). An incremental diff below.


- Naveen


---
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 65d7553668ca3d..17ce4751a2051a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5126,7 +5126,6 @@ int register_ftrace_direct(unsigned long ip, unsigned long addr)
 	struct ftrace_direct_func *direct;
 	struct ftrace_func_entry *entry;
 	struct ftrace_hash *free_hash = NULL;
-	struct dyn_ftrace *rec;
 	int ret = -ENODEV;
 
 	mutex_lock(&direct_mutex);
@@ -5140,26 +5139,6 @@ int register_ftrace_direct(unsigned long ip, unsigned long addr)
 	if (ftrace_find_rec_direct(ip))
 		goto out_unlock;
 
-	ret = -ENODEV;
-	rec = lookup_rec(ip, ip);
-	if (!rec)
-		goto out_unlock;
-
-	/*
-	 * Check if the rec says it has a direct call but we didn't
-	 * find one earlier?
-	 */
-	if (WARN_ON(rec->flags & FTRACE_FL_DIRECT))
-		goto out_unlock;
-
-	/* Make sure the ip points to the exact record */
-	if (ip != rec->ip) {
-		ip = rec->ip;
-		/* Need to check this ip for a direct. */
-		if (ftrace_find_rec_direct(ip))
-			goto out_unlock;
-	}
-
 	ret = -ENOMEM;
 	direct = ftrace_find_direct_func(addr);
 	if (!direct) {
@@ -5380,6 +5359,10 @@ int modify_ftrace_direct(unsigned long ip,
 	mutex_lock(&direct_mutex);
 
 	mutex_lock(&ftrace_lock);
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	entry = find_direct_entry(&ip, &rec);
 	if (!entry)
 		goto out_unlock;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-01 21:16   ` Nick Desaulniers
  2022-03-02  0:47     ` Kees Cook
@ 2022-03-02 16:37     ` Nathan Chancellor
  2022-03-02 18:40       ` Kees Cook
  2022-03-02 19:18       ` Nick Desaulniers
  1 sibling, 2 replies; 183+ messages in thread
From: Nathan Chancellor @ 2022-03-02 16:37 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Masahiro Yamada, Linux Kbuild mailing list, llvm

On Tue, Mar 01, 2022 at 01:16:04PM -0800, Nick Desaulniers wrote:
> On Thu, Feb 24, 2022 at 7:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Debian (and derived) distros ship their compilers as -$ver suffixed
> > binaries. For gcc it is sufficient to use:
> >
> >  $ make CC=gcc-12
> >
> > However, clang builds (esp. clang-lto) need a whole array of tools to be
> > exactly right, leading to unwieldy stuff like:
> >
> >  $ make CC=clang-13 LD=ld.lld-13 AR=llvm-ar-13 NM=llvm-nm-13 OBJCOPY=llvm-objcopy-13 OBJDUMP=llvm-objdump-13 READELF=llvm-readelf-13 STRIP=llvm-strip-13 LLVM=1
> >
> > which is, quite frankly, totally insane and unusable. Instead make
> > the CC variable DTRT, enabling one such as myself to use:
> >
> >  $ make CC=clang-13
> >
> > This also lets one quickly test different clang versions.
> > Additionally, also support path based LLVM suites like:
> >
> >  $ make CC=/opt/llvm/bin/clang
> >
> > This changes the default to LLVM=1 when CC is clang, mixing toolchains
> 
> No, nack, we definitely do not want CC=clang to set LLVM=1. Those are
> distinctly two different things for testing JUST the compiler
> (CC=clang) vs the whole toolchain suite (LLVM=1). I do not wish to
> change the semantics of those, and only for LLVM.

I agree with this. CC is only changing the compiler, not any of the
other build utilities. CC=gcc-12 works for GCC because you are only
using a different compiler, not an entirely new toolchain (as binutils
will be the same as just CC=gcc).

> LLVM=1 means test clang, lld, llvm-objcopy, etc..
> CC=clang means test clang, bfd, GNU objcopy, etc..
> https://docs.kernel.org/kbuild/llvm.html#llvm-utilities
> 
> I don't wish to see the behavior of CC=clang change based on LLVM=0 being set.
> 
> > is still possible by explicitly adding LLVM=0.
> 
> Thanks for testing with LLVM, and even multiple versions of LLVM.
> 
> I'm still sympathetic, but only up to a point. A change like this MUST
> CC the kbuild AND LLVM maintainers AND respective lists though.  It
> also has little to do with the rest of the series.
> 
> As per our previous discussion
> https://lore.kernel.org/linux-kbuild/CAKwvOd=x9E=7WcCiieso-CDiiU-wMFcXL4W3V5j8dq7BL5QT+w@mail.gmail.com/
> > I'm still of the opinion that this should be solved by modifications
> (permanent or one off) to one's $PATH.

However, I think we could still address Peter's complaint of "there
should be an easier way for me to use the tools that are already in my
PATH" with his first iteration of this patch [1], which I feel is
totally reasonable:

$ make LLVM=-14

It is still easy to use (in fact, it is shorter than 'CC=clang-14') and
it does not change anything else about how we build with LLVM. We would
just have to add something along the lines of

"If your LLVM tools have a suffix like Debian's (clang-14, ld.lld-14,
etc.), use LLVM=<suffix>.

$ make LLVM=-14"

to Documentation/kbuild/llvm.rst.

I might change the patch not to be so clever though:

ifneq ($(LLVM),)
ifneq ($(LLVM),1)
LLVM_SFX := $(LLVM)
endif
endif

[1]: https://lore.kernel.org/r/YXqpFHeY26sEbort@hirez.programming.kicks-ass.net/

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-02 15:59                 ` Naveen N. Rao
@ 2022-03-02 16:38                   ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 16:38 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, rostedt, samitolvanen, x86

On Wed, Mar 02, 2022 at 09:29:04PM +0530, Naveen N. Rao wrote:
> Peter Zijlstra wrote:
> > 
> > How does this look?
> 
> I gave this a quick test on powerpc and this looks good to me.

Thanks!

> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
> >  extern void kprobes_inc_nmissed_count(struct kprobe *p);
> >  extern bool arch_within_kprobe_blacklist(unsigned long addr);
> >  extern int arch_populate_kprobe_blacklist(void);
> > -extern bool arch_kprobe_on_func_entry(unsigned long offset);
> 
> There is a __weak definition of this function in kernel/kprobes.c which
> should also be removed.

*poof*, gone.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-03-02 13:49             ` Peter Zijlstra
@ 2022-03-02 18:38               ` Kees Cook
  0 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-03-02 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Edgecombe, Rick P, Poimboe, Josh, linux-kernel, Cooper, Andrew,
	hjl.tools, rostedt, joao, samitolvanen, x86, mark.rutland,
	alexei.starovoitov, Milburn, Alyssa, mhiramat, mbenes,
	ndesaulniers

On Wed, Mar 02, 2022 at 02:49:08PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 02, 2022 at 01:59:46AM +0000, Edgecombe, Rick P wrote:
> > As for pinning strength, I'm not understanding this kexec asm enough to
> > say for sure how much better it is than just removing the bit from the
> > pinning mask. I think some future hardening around preventing turning
> > off IBT might still be worthwhile.
> > 
> > Kees, I think you brought up the pinning, what do you think of this?
> 
> IIRC the whole purpose of that was to ensure that the
> cr4_update_irqsoff() function itself isn't a useful gadget to manipulate
> CR4 with.

Right -- as long as the paths that touch cr4 are either respecting
pinning or are inlined at a point of no return (i.e. performing kexec),
they shouldn't end up being very helpful gadgets.

e.g. imagine the scenario of a controlled write to a function pointer
than just aims at a valid endbr function that manipulates cr: an
attacker will just call into that first to disable IBT, and then carry
on doing things as normal.

There is an argument to be made that if an attacker has that kind of
control, they could call any set of functions to do what they want,
but the point is to create an unfriendly environment to attackers so
that future defenses compose well together -- e.g. FGKASLR.
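
For reference, the pinning check boils down to something like this (a
simplified sketch of the logic in arch/x86/kernel/cpu/common.c; the names
are approximate):

	/* captured at boot, read-mostly afterwards */
	static unsigned long cr4_pinned_mask;	/* e.g. SMEP | SMAP | UMIP */
	static unsigned long cr4_pinned_bits;

	static void write_cr4_pinned(unsigned long val)
	{
		/* force pinned bits back on before touching the register */
		if ((val & cr4_pinned_mask) != cr4_pinned_bits)
			val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;

		asm volatile("mov %0,%%cr4" : : "r" (val) : "memory");
	}

so even a controlled call into the CR4 write path can't be used to clear
the pinned bits.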

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02 16:37     ` Nathan Chancellor
@ 2022-03-02 18:40       ` Kees Cook
  2022-03-02 19:18       ` Nick Desaulniers
  1 sibling, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-03-02 18:40 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Nick Desaulniers, Peter Zijlstra, x86, joao, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Masahiro Yamada, Linux Kbuild mailing list, llvm

On Wed, Mar 02, 2022 at 09:37:04AM -0700, Nathan Chancellor wrote:
> However, I think we could still address Peter's complaint of "there
> should be an easier way for me to use the tools that are already in my
> PATH" with his first iteration of this patch [1], which I feel is
> totally reasonable:
> 
> $ make LLVM=-14
> 
> It is still easy to use (in fact, it is shorter than 'CC=clang-14') and
> it does not change anything else about how we build with LLVM. We would
> just have to add something along the lines of
> 
> "If your LLVM tools have a suffix like Debian's (clang-14, ld.lld-14,
> etc.), use LLVM=<suffix>.
> 
> $ make LLVM=-14"
> 
> to Documentation/kbuild/llvm.rst.
> 
> I might change the patch not to be so clever though:
> 
> ifneq ($(LLVM),)
> ifneq ($(LLVM),1)
> LLVM_SFX := $(LLVM)
> endif
> endif

I like this idea! I think it's much easier to control than PATH (though
I see the rationale there too).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02 16:37     ` Nathan Chancellor
  2022-03-02 18:40       ` Kees Cook
@ 2022-03-02 19:18       ` Nick Desaulniers
  2022-03-02 21:15         ` Nathan Chancellor
  1 sibling, 1 reply; 183+ messages in thread
From: Nick Desaulniers @ 2022-03-02 19:18 UTC (permalink / raw)
  To: Nathan Chancellor, Peter Zijlstra, keescook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	samitolvanen, mark.rutland, alyssa.milburn, mbenes, rostedt,
	mhiramat, alexei.starovoitov, Masahiro Yamada,
	Linux Kbuild mailing list, llvm

On Wed, Mar 2, 2022 at 8:37 AM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Tue, Mar 01, 2022 at 01:16:04PM -0800, Nick Desaulniers wrote:
> > As per our previous discussion
> > https://lore.kernel.org/linux-kbuild/CAKwvOd=x9E=7WcCiieso-CDiiU-wMFcXL4W3V5j8dq7BL5QT+w@mail.gmail.com/
> > > I'm still of the opinion that this should be solved by modifications
> > (permanent or one off) to one's $PATH.
>
> However, I think we could still address Peter's complaint of "there
> should be an easier way for me to use the tools that are already in my
> PATH" with his first iteration of this patch [1], which I feel is
> totally reasonable:
>
> $ make LLVM=-14
>
> It is still easy to use (in fact, it is shorter than 'CC=clang-14') and
> it does not change anything else about how we build with LLVM. We would
> just have to add something along the lines of
>
> "If your LLVM tools have a suffix like Debian's (clang-14, ld.lld-14,
> etc.), use LLVM=<suffix>.

"If your LLVM tools have a suffix and you prefer to test an explicit
version rather than the unsuffixed executables ..."

>
> $ make LLVM=-14"
>
> to Documentation/kbuild/llvm.rst.
>
> I might change the patch not to be so clever though:
>
> ifneq ($(LLVM),)
> ifneq ($(LLVM),1)
> LLVM_SFX := $(LLVM)
> endif
> endif
>
> [1]: https://lore.kernel.org/r/YXqpFHeY26sEbort@hirez.programming.kicks-ass.net/

I'd be much more amenable to that approach.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-02 16:17                 ` Naveen N. Rao
@ 2022-03-02 19:32                   ` Peter Zijlstra
  2022-03-02 19:39                     ` Peter Zijlstra
  2022-03-03  1:54                   ` Masami Hiramatsu
  1 sibling, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 19:32 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, rostedt, samitolvanen, x86

On Wed, Mar 02, 2022 at 09:47:03PM +0530, Naveen N. Rao wrote:

> With respect to kprobe_lookup_name(), one of the primary motivations there
> was the issue with function descriptors for the previous elf v1 ABI (it
> likely also affects ia64/parisc). I'm thinking it'll be simpler if we have a
> way to obtain function entry point. Something like this:
> 
> diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
> index 4176c7eca7b5aa..8c57cc5b77f9ae 100644
> --- a/include/linux/kallsyms.h
> +++ b/include/linux/kallsyms.h
> @@ -73,6 +73,12 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
> /* Lookup the address for a symbol. Returns 0 if not found. */
> unsigned long kallsyms_lookup_name(const char *name);
> 
> +/* Return function entry point by additionally dereferencing function descriptor */
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> +	return (unsigned long)dereference_symbol_descriptor((void *)kallsyms_lookup_name(name));
> +}
> +
> extern int kallsyms_lookup_size_offset(unsigned long addr,
> 				  unsigned long *symbolsize,
> 				  unsigned long *offset);
> @@ -103,6 +109,11 @@ static inline unsigned long kallsyms_lookup_name(const char *name)
> 	return 0;
> }
> 
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> +	return 0;
> +}
> +
> static inline int kallsyms_lookup_size_offset(unsigned long addr,
> 					      unsigned long *symbolsize,
> 					      unsigned long *offset)
> 
> 
> With that, we can fold some of the code from kprobe_lookup_name() into
> arch_adjust_kprobe_addr() and remove kprobe_lookup_name(). This should also
> address Masami's concerns with powerpc promoting all probes at function
> entry into a probe at the ftrace location.

Yes, this looks entirely reasonable to me.

> ---

> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> 					 bool *on_func_entry)
> {
> 	*on_func_entry = arch_kprobe_on_func_entry(offset);
> +#ifdef PPC64_ELF_ABI_v2
> +	/* Promote probes on the GEP to the LEP */
> +	if (!offset)
> +		addr = ppc_function_entry((void *)addr);
> +#endif
> 	return (kprobe_opcode_t *)(addr + offset);
> }

I wonder if you also want to tighten up on_func_entry? Wouldn't the
above suggest something like:

kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
					 bool *on_func_entry)
{
#ifdef PPC64_ELF_ABI_V2
	unsigned long entry = ppc_function_entry((void *)addr) - addr;
	*on_func_entry = !offset || offset == entry;
	if (*on_func_entry)
		offset = entry;
#else
	*on_func_entry = !offset;
#endif
	return (void *)(addr + offset);
}

Then you can also go and delete the arch_kprobe_on_func_entry rudiment.

For the rest, Yay at cleanups!, lots of magic code gone.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-02 19:32                   ` Peter Zijlstra
@ 2022-03-02 19:39                     ` Peter Zijlstra
  2022-03-03 12:11                       ` Naveen N. Rao
  0 siblings, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 19:39 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, rostedt, samitolvanen, x86

On Wed, Mar 02, 2022 at 08:32:45PM +0100, Peter Zijlstra wrote:
> I wonder if you also want to tighten up on_func_entry? Wouldn't the
> above suggest something like:
> 
> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> 					 bool *on_func_entry)
> {
> #ifdef PPC64_ELF_ABI_V2
> 	unsigned long entry = ppc_function_entry((void *)addr) - addr;
> 	*on_func_entry = !offset || offset == entry;
> 	if (*on_func_entry)
> 		offset = entry;
> #else
> 	*on_func_entry = !offset;
> #endif
> 	return (void *)(addr + offset);
> }

One question though; the above seems to work for +0 or +8 (IIRC your
instructions are 4 bytes each and the GEP is 2 instructions).

But what do we want to happen for +4 ?
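
That is, the ELFv2 entry layout in question, roughly:

	/*
	 * func:	addis r2,r12,...	<- GEP, func+0 (TOC setup)
	 *		addi  r2,r2,...		   func+4
	 *		...body...		<- LEP, func+8 (.localentry)
	 *
	 * so +4 lands in the middle of the TOC setup sequence.
	 */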

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-02 16:01                     ` Steven Rostedt
@ 2022-03-02 19:47                       ` Steven Rostedt
  2022-03-02 20:48                         ` Steven Rostedt
  2022-03-02 20:51                         ` Peter Zijlstra
  0 siblings, 2 replies; 183+ messages in thread
From: Steven Rostedt @ 2022-03-02 19:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Naveen N. Rao, Masami Hiramatsu, alexei.starovoitov,
	alyssa.milburn, andrew.cooper3, hjl.tools, joao, jpoimboe,
	keescook, linux-kernel, mark.rutland, mbenes, ndesaulniers,
	samitolvanen, x86

On Wed, 2 Mar 2022 11:01:38 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 2 Mar 2022 14:20:23 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Like so, or is something else needed?
> > 
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 68ecd3e35342..d1b30b5c5c23 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -4980,7 +4980,8 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> >  {
> >  	struct ftrace_func_entry *entry;
> >  
> > -	if (!ftrace_location(ip))
> > +	ip = ftrace_location(ip);
> > +	if (!ip)
> >  		return -EINVAL;  
> 
> This could possibly work. I'd have to test all this though.
> 
> I probably could just take this patch and try it out. You can remove the
> "x86/ibt" from the subject, as this patch may be a requirement for that
> (include that in the commit log), but it is has no changes to x86/ibt
> specifically.
> 

Note, I just pulled this patch, and I hit this warning:

WARNING: CPU: 0 PID: 6965 at arch/x86/kernel/kprobes/core.c:205 recover_probed_instruction+0x8f/0xa0

static unsigned long
__recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
{
        struct kprobe *kp;
        unsigned long faddr;

        kp = get_kprobe((void *)addr);
        faddr = ftrace_location(addr);
        /*
         * Addresses inside the ftrace location are refused by
         * arch_check_ftrace_location(). Something went terribly wrong
         * if such an address is checked here.
         */
        if (WARN_ON(faddr && faddr != addr))  <<---- HERE
                return 0UL;
        /*
         * Use the current code if it is not modified by Kprobe
         * and it cannot be modified by ftrace.
         */
        if (!kp && !faddr)
                return addr;

I guess this patch needs kprobe updates.

I'll pull down the latest tip and test that code.

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-02 19:47                       ` Steven Rostedt
@ 2022-03-02 20:48                         ` Steven Rostedt
  2022-03-02 20:51                         ` Peter Zijlstra
  1 sibling, 0 replies; 183+ messages in thread
From: Steven Rostedt @ 2022-03-02 20:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Naveen N. Rao, Masami Hiramatsu, alexei.starovoitov,
	alyssa.milburn, andrew.cooper3, hjl.tools, joao, jpoimboe,
	keescook, linux-kernel, mark.rutland, mbenes, ndesaulniers,
	samitolvanen, x86

On Wed, 2 Mar 2022 14:47:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Note, I just pulled this patch, and I hit this warning:
> 
> WARNING: CPU: 0 PID: 6965 at arch/x86/kernel/kprobes/core.c:205 recover_probed_instruction+0x8f/0xa0

As we discussed on IRC (but I want an email record of this), it appears that
with some debugging, the ftrace_location() could return the function right
after the current function because lookup_rec() has an inclusive "end"
argument.

Testing:

	rec = lookup_rec(ip - offset, (ip - offset) + size - 1);

Appears to fix the issue.
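
To spell out the off-by-one, assuming a function occupying
[start, start + size):

	/*
	 * lookup_rec(start, start + size)	- end is inclusive, so this
	 *					  also matches next_func+0
	 * lookup_rec(start, start + size - 1)	- matches this function only
	 */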

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-02 19:47                       ` Steven Rostedt
  2022-03-02 20:48                         ` Steven Rostedt
@ 2022-03-02 20:51                         ` Peter Zijlstra
  2022-03-03  9:45                           ` Naveen N. Rao
  1 sibling, 1 reply; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 20:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Naveen N. Rao, Masami Hiramatsu, alexei.starovoitov,
	alyssa.milburn, andrew.cooper3, hjl.tools, joao, jpoimboe,
	keescook, linux-kernel, mark.rutland, mbenes, ndesaulniers,
	samitolvanen, x86

On Wed, Mar 02, 2022 at 02:47:16PM -0500, Steven Rostedt wrote:
> Note, I just pulled this patch, and I hit this warning:
> 
> WARNING: CPU: 0 PID: 6965 at arch/x86/kernel/kprobes/core.c:205 recover_probed_instruction+0x8f/0xa0
> 
> static unsigned long
> __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
> {
>         struct kprobe *kp;
>         unsigned long faddr;
> 
>         kp = get_kprobe((void *)addr);
>         faddr = ftrace_location(addr);
>         /*
>          * Addresses inside the ftrace location are refused by
>          * arch_check_ftrace_location(). Something went terribly wrong
>          * if such an address is checked here.
>          */
>         if (WARN_ON(faddr && faddr != addr))  <<---- HERE
>                 return 0UL;

Ha! So after a bunch of IRC I figured out how it is possible you hit
this with just that patch applied, how I legitimately hit it, and what to
do about it.

Your problem seems to be that we got ftrace_location() wrong in that
lookup_rec()'s end argument is inclusive, hence we need:

	lookup_rec(ip, ip + size - 1)

Now, the above thing asserts that:

	ftrace_location(x) == {0, x}

and that is genuinely false in my case, I get x+4 as additional possible
output. So I think I need the below change to go on top of all I have:


diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 7f0ce42f8ff9..4c13406e0bc4 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -198,13 +198,14 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
 
 	kp = get_kprobe((void *)addr);
 	faddr = ftrace_location(addr);
+
 	/*
-	 * Addresses inside the ftrace location are refused by
-	 * arch_check_ftrace_location(). Something went terribly wrong
-	 * if such an address is checked here.
+	 * In case addr maps to sym+0 ftrace_location() might return something
+	 * other than faddr. In that case consider it the same as !faddr.
 	 */
-	if (WARN_ON(faddr && faddr != addr))
-		return 0UL;
+	if (faddr && faddr != addr)
+		faddr = 0;
+
 	/*
 	 * Use the current code if it is not modified by Kprobe
 	 * and it cannot be modified by ftrace.

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02 19:18       ` Nick Desaulniers
@ 2022-03-02 21:15         ` Nathan Chancellor
  2022-03-02 22:07           ` Nick Desaulniers
                             ` (2 more replies)
  0 siblings, 3 replies; 183+ messages in thread
From: Nathan Chancellor @ 2022-03-02 21:15 UTC (permalink / raw)
  To: Nick Desaulniers, Peter Zijlstra
  Cc: keescook, x86, joao, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, samitolvanen, mark.rutland, alyssa.milburn, mbenes,
	rostedt, mhiramat, alexei.starovoitov, Masahiro Yamada,
	Linux Kbuild mailing list, llvm

[-- Attachment #1: Type: text/plain, Size: 1969 bytes --]

On Wed, Mar 02, 2022 at 11:18:07AM -0800, Nick Desaulniers wrote:
> On Wed, Mar 2, 2022 at 8:37 AM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > On Tue, Mar 01, 2022 at 01:16:04PM -0800, Nick Desaulniers wrote:
> > > As per our previous discussion
> > > https://lore.kernel.org/linux-kbuild/CAKwvOd=x9E=7WcCiieso-CDiiU-wMFcXL4W3V5j8dq7BL5QT+w@mail.gmail.com/
> > > I'm still of the opinion that this should be solved by modifications
> > > (permanent or one-off) to one's $PATH.
> >
> > However, I think we could still address Peter's complaint of "there
> > should be an easier way for me to use the tools that are already in my
> > PATH" with his first iteration of this patch [1], which I feel is
> > totally reasonable:
> >
> > $ make LLVM=-14
> >
> > It is still easy to use (in fact, it is shorter than 'CC=clang-14') and
> > it does not change anything else about how we build with LLVM. We would
> > just have to add something along the lines of
> >
> > "If your LLVM tools have a suffix like Debian's (clang-14, ld.lld-14,
> > etc.), use LLVM=<suffix>.
> 
> "If your LLVM tools have a suffix and you prefer to test an explicit
> version rather than the unsuffixed executables ..."

Ack, I included this.

> > $ make LLVM=-14"
> >
> > to Documentation/kbuild/llvm.rst.
> >
> > I might change the patch not to be so clever though:
> >
> > ifneq ($(LLVM),)
> > ifneq ($(LLVM),1)
> > LLVM_SFX := $(LLVM)
> > endif
> > endif
> >
> > [1]: https://lore.kernel.org/r/YXqpFHeY26sEbort@hirez.programming.kicks-ass.net/
> 
> I'd be much more amenable to that approach.

Sounds good, tentative patch attached, it passes all of my testing.
There is an instance of $(LLVM) in tools/testing/selftests/lib.mk that I
did not touch, as that will presumably have to go through the selftests
tree. I can send a separate patch for that later.

Peter, is this approach okay with you? If so, would you like to be
co-author or should I use a suggested-by tag?

Cheers,
Nathan

[-- Attachment #2: 0001-kbuild-Allow-a-suffix-with-LLVM.patch --]
[-- Type: text/plain, Size: 4675 bytes --]

From 83219caafbb7dbc2e41e3888ba5079d342aff633 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan@kernel.org>
Date: Wed, 2 Mar 2022 13:28:14 -0700
Subject: [PATCH] kbuild: Allow a suffix with $(LLVM)

The LLVM variable allows a developer to quickly switch between the GNU
and LLVM tools. However, it does not handle versioned binaries, such as
the ones shipped by Debian, as LLVM=1 just defines the build variables
with the unversioned binaries.

There was some discussion during the review of the patch that introduces
LLVM=1 around this, ultimately coming to the conclusion that developers
can just add the folder that contains the unversioned binaries to their
PATH, as Debian's versioned suffixed binaries are really just symlinks
to the unversioned binaries in /usr/lib/llvm-#/bin:

$ realpath /usr/bin/clang-14
/usr/lib/llvm-14/bin/clang

$ PATH=/usr/lib/llvm-14/bin:$PATH make ... LLVM=1

However, it is simple enough to support this scheme directly in the
Kbuild system by allowing the developer to specify the version suffix
with LLVM=, which is shorter than the above suggestion:

$ make ... LLVM=-14

It does not change the meaning of LLVM=1 (which will continue to use
unversioned binaries) and it does not add too much additional complexity
to the existing $(LLVM) code, while allowing developers to quickly test
their series with different versions of the whole LLVM suite of tools.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 Documentation/kbuild/llvm.rst  |  7 +++++++
 Makefile                       | 24 ++++++++++++++----------
 tools/scripts/Makefile.include | 20 ++++++++++++--------
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/Documentation/kbuild/llvm.rst b/Documentation/kbuild/llvm.rst
index d32616891dcf..5805a8473a36 100644
--- a/Documentation/kbuild/llvm.rst
+++ b/Documentation/kbuild/llvm.rst
@@ -60,6 +60,13 @@ They can be enabled individually. The full list of the parameters: ::
 	  OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump READELF=llvm-readelf \
 	  HOSTCC=clang HOSTCXX=clang++ HOSTAR=llvm-ar HOSTLD=ld.lld
 
+If your LLVM tools have a suffix and you prefer to test an explicit version rather
+than the unsuffixed executables, use ``LLVM=<suffix>``. For example: ::
+
+	make LLVM=-14
+
+will use ``clang-14``, ``ld.lld-14``, etc.
+
 The integrated assembler is enabled by default. You can pass ``LLVM_IAS=0`` to
 disable it.
 
diff --git a/Makefile b/Makefile
index daeb5c88b50b..89b61e693258 100644
--- a/Makefile
+++ b/Makefile
@@ -424,8 +424,12 @@ HOST_LFS_LDFLAGS := $(shell getconf LFS_LDFLAGS 2>/dev/null)
 HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null)
 
 ifneq ($(LLVM),)
-HOSTCC	= clang
-HOSTCXX	= clang++
+ifneq ($(LLVM),1)
+LLVM_SFX := $(LLVM)
+endif
+
+HOSTCC	= clang$(LLVM_SFX)
+HOSTCXX	= clang++$(LLVM_SFX)
 else
 HOSTCC	= gcc
 HOSTCXX	= g++
@@ -443,14 +447,14 @@ KBUILD_HOSTLDLIBS   := $(HOST_LFS_LIBS) $(HOSTLDLIBS)
 # Make variables (CC, etc...)
 CPP		= $(CC) -E
 ifneq ($(LLVM),)
-CC		= clang
-LD		= ld.lld
-AR		= llvm-ar
-NM		= llvm-nm
-OBJCOPY		= llvm-objcopy
-OBJDUMP		= llvm-objdump
-READELF		= llvm-readelf
-STRIP		= llvm-strip
+CC		= clang$(LLVM_SFX)
+LD		= ld.lld$(LLVM_SFX)
+AR		= llvm-ar$(LLVM_SFX)
+NM		= llvm-nm$(LLVM_SFX)
+OBJCOPY		= llvm-objcopy$(LLVM_SFX)
+OBJDUMP		= llvm-objdump$(LLVM_SFX)
+READELF		= llvm-readelf$(LLVM_SFX)
+STRIP		= llvm-strip$(LLVM_SFX)
 else
 CC		= $(CROSS_COMPILE)gcc
 LD		= $(CROSS_COMPILE)ld
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 79d102304470..ab3b2a7dcc94 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -52,11 +52,15 @@ define allow-override
 endef
 
 ifneq ($(LLVM),)
-$(call allow-override,CC,clang)
-$(call allow-override,AR,llvm-ar)
-$(call allow-override,LD,ld.lld)
-$(call allow-override,CXX,clang++)
-$(call allow-override,STRIP,llvm-strip)
+ifneq ($(LLVM),1)
+LLVM_SFX := $(LLVM)
+endif
+
+$(call allow-override,CC,clang$(LLVM_SFX))
+$(call allow-override,AR,llvm-ar$(LLVM_SFX))
+$(call allow-override,LD,ld.lld$(LLVM_SFX))
+$(call allow-override,CXX,clang++$(LLVM_SFX))
+$(call allow-override,STRIP,llvm-strip$(LLVM_SFX))
 else
 # Allow setting various cross-compile vars or setting CROSS_COMPILE as a prefix.
 $(call allow-override,CC,$(CROSS_COMPILE)gcc)
@@ -69,9 +73,9 @@ endif
 CC_NO_CLANG := $(shell $(CC) -dM -E -x c /dev/null | grep -Fq "__clang__"; echo $$?)
 
 ifneq ($(LLVM),)
-HOSTAR  ?= llvm-ar
-HOSTCC  ?= clang
-HOSTLD  ?= ld.lld
+HOSTAR  ?= llvm-ar$(LLVM_SFX)
+HOSTCC  ?= clang$(LLVM_SFX)
+HOSTLD  ?= ld.lld$(LLVM_SFX)
 else
 HOSTAR  ?= ar
 HOSTCC  ?= gcc

base-commit: fb184c4af9b9f4563e7a126219389986a71d5b5b
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02 21:15         ` Nathan Chancellor
@ 2022-03-02 22:07           ` Nick Desaulniers
  2022-03-02 23:00           ` Kees Cook
  2022-03-02 23:10           ` Peter Zijlstra
  2 siblings, 0 replies; 183+ messages in thread
From: Nick Desaulniers @ 2022-03-02 22:07 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Peter Zijlstra, keescook, x86, joao, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Masahiro Yamada, Linux Kbuild mailing list, llvm

On Wed, Mar 2, 2022 at 1:15 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Wed, Mar 02, 2022 at 11:18:07AM -0800, Nick Desaulniers wrote:
> > I'd be much more amenable to that approach.
>
> Sounds good, tentative patch attached, it passes all of my testing.
> There is an instance of $(LLVM) in tools/testing/selftests/lib.mk that I
> did not touch, as that will presumably have to go through the selftests
> tree. I can send a separate patch for that later.

LGTM!
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02 21:15         ` Nathan Chancellor
  2022-03-02 22:07           ` Nick Desaulniers
@ 2022-03-02 23:00           ` Kees Cook
  2022-03-02 23:10           ` Peter Zijlstra
  2 siblings, 0 replies; 183+ messages in thread
From: Kees Cook @ 2022-03-02 23:00 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Nick Desaulniers, Peter Zijlstra, x86, joao, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Masahiro Yamada, Linux Kbuild mailing list, llvm

On Wed, Mar 02, 2022 at 02:15:45PM -0700, Nathan Chancellor wrote:
> Sounds good, tentative patch attached, it passes all of my testing.
> There is an instance of $(LLVM) in tools/testing/selftests/lib.mk that I
> did not touch, as that will presumably have to go through the selftests
> tree. I can send a separate patch for that later.

I think it's fine to include that here, just to keep the logic together.

> Peter, is this approach okay with you? If so, would you like to be
> co-author or should I use a suggested-by tag?
> 
> Cheers,
> Nathan

> From 83219caafbb7dbc2e41e3888ba5079d342aff633 Mon Sep 17 00:00:00 2001
> From: Nathan Chancellor <nathan@kernel.org>
> Date: Wed, 2 Mar 2022 13:28:14 -0700
> Subject: [PATCH] kbuild: Allow a suffix with $(LLVM)
> 
> The LLVM variable allows a developer to quickly switch between the GNU
> and LLVM tools. However, it does not handle versioned binaries, such as
> the ones shipped by Debian, as LLVM=1 just defines the build variables
> with the unversioned binaries.
> 
> There was some discussion during the review of the patch that introduces
> LLVM=1 around this, ultimately coming to the conclusion that developers
> can just add the folder that contains the unversioned binaries to their
> PATH, as Debian's versioned suffixed binaries are really just symlinks
> to the unversioned binaries in /usr/lib/llvm-#/bin:
> 
> $ realpath /usr/bin/clang-14
> /usr/lib/llvm-14/bin/clang
> 
> $ PATH=/usr/lib/llvm-14/bin:$PATH make ... LLVM=1
> 
> However, it is simple enough to support this scheme directly in the
> Kbuild system by allowing the developer to specify the version suffix
> with LLVM=, which is shorter than the above suggestion:
> 
> $ make ... LLVM=-14
> 
> It does not change the meaning of LLVM=1 (which will continue to use
> unversioned binaries) and it does not add too much additional complexity
> to the existing $(LLVM) code, while allowing developers to quickly test
> their series with different versions of the whole LLVM suite of tools.
> 
> Signed-off-by: Nathan Chancellor <nathan@kernel.org>

I like it!

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 01/39] kbuild: Fix clang build
  2022-03-02 21:15         ` Nathan Chancellor
  2022-03-02 22:07           ` Nick Desaulniers
  2022-03-02 23:00           ` Kees Cook
@ 2022-03-02 23:10           ` Peter Zijlstra
  2 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-02 23:10 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Nick Desaulniers, keescook, x86, joao, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, samitolvanen, mark.rutland,
	alyssa.milburn, mbenes, rostedt, mhiramat, alexei.starovoitov,
	Masahiro Yamada, Linux Kbuild mailing list, llvm

On Wed, Mar 02, 2022 at 02:15:45PM -0700, Nathan Chancellor wrote:
> Peter, is this approach okay with you? If so, would you like to be
> co-author or should I use a suggested-by tag?

Yeah, this helps lots. Suggested-by seems fine. Thanks!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-02 16:17                 ` Naveen N. Rao
  2022-03-02 19:32                   ` Peter Zijlstra
@ 2022-03-03  1:54                   ` Masami Hiramatsu
  1 sibling, 0 replies; 183+ messages in thread
From: Masami Hiramatsu @ 2022-03-03  1:54 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Peter Zijlstra, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, Masami Hiramatsu,
	ndesaulniers, rostedt, samitolvanen, x86

On Wed, 02 Mar 2022 21:47:03 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Peter Zijlstra wrote:
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
> >  	return addr;
> >  }
> > 
> > +static bool arch_kprobe_on_func_entry(unsigned long offset)
> > +{
> > +#ifdef PPC64_ELF_ABI_v2
> > +#ifdef CONFIG_KPROBES_ON_FTRACE
> > +	return offset <= 16;
> > +#else
> > +	return offset <= 8;
> > +#endif
> > +#else
> > +	return !offset;
> > +#endif
> > +}
> > +
> > +/* XXX try and fold the magic of kprobe_lookup_name() in this */
> > +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> > +					 bool *on_func_entry)
> > +{
> > +	*on_func_entry = arch_kprobe_on_func_entry(offset);
> > +	return (kprobe_opcode_t *)(addr + offset);
> > +}
> > +
> 
> With respect to kprobe_lookup_name(), one of the primary motivations there was 
> the issue with function descriptors for the previous elf v1 ABI (it likely also 
> affects ia64/parisc). I'm thinking it'll be simpler if we have a way to obtain 
> the function entry point. Something like this:
> 
> diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
> index 4176c7eca7b5aa..8c57cc5b77f9ae 100644
> --- a/include/linux/kallsyms.h
> +++ b/include/linux/kallsyms.h
> @@ -73,6 +73,12 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
>  /* Lookup the address for a symbol. Returns 0 if not found. */
>  unsigned long kallsyms_lookup_name(const char *name);
>  
> +/* Return function entry point by additionally dereferencing function descriptor */
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> +	return (unsigned long)dereference_symbol_descriptor((void *)kallsyms_lookup_name(name));
> +}
> +
>  extern int kallsyms_lookup_size_offset(unsigned long addr,
>  				  unsigned long *symbolsize,
>  				  unsigned long *offset);
> @@ -103,6 +109,11 @@ static inline unsigned long kallsyms_lookup_name(const char *name)
>  	return 0;
>  }
>  
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> +	return 0;
> +}
> +
>  static inline int kallsyms_lookup_size_offset(unsigned long addr,
>  					      unsigned long *symbolsize,
>  					      unsigned long *offset)
> 
> 
> With that, we can fold some of the code from kprobe_lookup_name() into 
> arch_adjust_kprobe_addr() and remove kprobe_lookup_name(). This should also 
> address Masami's concerns with powerpc promoting all probes at function entry 
> into a probe at the ftrace location.

Good point, this looks good to me.
And "kallsyms_lookup_entry_address()" will be the preferable name.

Thank you,
> 
> 
> - Naveen
> 
> 
> ---
>  arch/powerpc/kernel/kprobes.c | 70 +++--------------------------------
>  include/linux/kprobes.h       |  1 -
>  kernel/kprobes.c              | 19 ++--------
>  kernel/trace/trace_kprobe.c   |  2 +-
>  4 files changed, 9 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 7dae0b01abfbd6..46aa2b9e44c27c 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -41,70 +41,6 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
>  		 addr < (unsigned long)__head_end);
>  }
>  
> -kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
> -{
> -	kprobe_opcode_t *addr = NULL;
> -
> -#ifdef PPC64_ELF_ABI_v2
> -	/* PPC64 ABIv2 needs local entry point */
> -	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> -	if (addr && !offset) {
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> -		unsigned long faddr;
> -		/*
> -		 * Per livepatch.h, ftrace location is always within the first
> -		 * 16 bytes of a function on powerpc with -mprofile-kernel.
> -		 */
> -		faddr = ftrace_location_range((unsigned long)addr,
> -					      (unsigned long)addr + 16);
> -		if (faddr)
> -			addr = (kprobe_opcode_t *)faddr;
> -		else
> -#endif
> -			addr = (kprobe_opcode_t *)ppc_function_entry(addr);
> -	}
> -#elif defined(PPC64_ELF_ABI_v1)
> -	/*
> -	 * 64bit powerpc ABIv1 uses function descriptors:
> -	 * - Check for the dot variant of the symbol first.
> -	 * - If that fails, try looking up the symbol provided.
> -	 *
> -	 * This ensures we always get to the actual symbol and not
> -	 * the descriptor.
> -	 *
> -	 * Also handle <module:symbol> format.
> -	 */
> -	char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
> -	bool dot_appended = false;
> -	const char *c;
> -	ssize_t ret = 0;
> -	int len = 0;
> -
> -	if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
> -		c++;
> -		len = c - name;
> -		memcpy(dot_name, name, len);
> -	} else
> -		c = name;
> -
> -	if (*c != '\0' && *c != '.') {
> -		dot_name[len++] = '.';
> -		dot_appended = true;
> -	}
> -	ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
> -	if (ret > 0)
> -		addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
> -
> -	/* Fallback to the original non-dot symbol lookup */
> -	if (!addr && dot_appended)
> -		addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> -#else
> -	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> -#endif
> -
> -	return addr;
> -}
> -
>  static bool arch_kprobe_on_func_entry(unsigned long offset)
>  {
>  #ifdef PPC64_ELF_ABI_v2
> @@ -118,11 +54,15 @@ static bool arch_kprobe_on_func_entry(unsigned long offset)
>  #endif
>  }
>  
> -/* XXX try and fold the magic of kprobe_lookup_name() in this */
>  kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
>  					 bool *on_func_entry)
>  {
>  	*on_func_entry = arch_kprobe_on_func_entry(offset);
> +#ifdef PPC64_ELF_ABI_v2
> +	/* Promote probes on the GEP to the LEP */
> +	if (!offset)
> +		addr = ppc_function_entry((void *)addr);
> +#endif
>  	return (kprobe_opcode_t *)(addr + offset);
>  }
>  
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9c28f7a0ef4268..dad375056ba049 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -382,7 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
>  	return this_cpu_ptr(&kprobe_ctlblk);
>  }
>  
> -kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
>  kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
>  
>  int register_kprobe(struct kprobe *p);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8be57fdc19bdc0..066fa644e9dfa3 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -67,12 +67,6 @@ static bool kprobes_all_disarmed;
>  static DEFINE_MUTEX(kprobe_mutex);
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>  
> -kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> -					unsigned int __unused)
> -{
> -	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> -}
> -
>  /*
>   * Blacklist -- list of 'struct kprobe_blacklist_entry' to store info where
>   * kprobes can not probe.
> @@ -1481,7 +1475,7 @@ bool within_kprobe_blacklist(unsigned long addr)
>  		if (!p)
>  			return false;
>  		*p = '\0';
> -		addr = (unsigned long)kprobe_lookup_name(symname, 0);
> +		addr = kallsyms_lookup_function(symname);
>  		if (addr)
>  			return __within_kprobe_blacklist(addr);
>  	}
> @@ -1524,14 +1518,7 @@ _kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
>  		goto invalid;
>  
>  	if (symbol_name) {
> -		/*
> -		 * Input: @sym + @offset
> -		 * Output: @addr + @offset
> -		 *
> -		 * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> -		 *       argument into it's output!
> -		 */
> -		addr = kprobe_lookup_name(symbol_name, offset);
> +		addr = (kprobe_opcode_t *)kallsyms_lookup_function(symbol_name);
>  		if (!addr)
>  			return ERR_PTR(-ENOENT);
>  	}
> @@ -2621,7 +2608,7 @@ static int __init init_kprobes(void)
>  		/* lookup the function address from its name */
>  		for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
>  			kretprobe_blacklist[i].addr =
> -				kprobe_lookup_name(kretprobe_blacklist[i].name, 0);
> +				(void *)kallsyms_lookup_function(kretprobe_blacklist[i].name);
>  			if (!kretprobe_blacklist[i].addr)
>  				pr_err("Failed to lookup symbol '%s' for kretprobe blacklist. Maybe the target function is removed or renamed.\n",
>  				       kretprobe_blacklist[i].name);
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 508f14af4f2c7e..a8d01954051e60 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -461,7 +461,7 @@ static bool within_notrace_func(struct trace_kprobe *tk)
>  		if (!p)
>  			return true;
>  		*p = '\0';
> -		addr = (unsigned long)kprobe_lookup_name(symname, 0);
> +		addr = kallsyms_lookup_function(symname);
>  		if (addr)
>  			return __within_notrace_func(addr);
>  	}
> -- 
> 2.35.1
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-02 20:51                         ` Peter Zijlstra
@ 2022-03-03  9:45                           ` Naveen N. Rao
  2022-03-03 13:04                             ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-03  9:45 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, samitolvanen, x86

Peter Zijlstra wrote:
> On Wed, Mar 02, 2022 at 02:47:16PM -0500, Steven Rostedt wrote:
>> Note, I just pulled this patch, and I hit this warning:
>> 
>> WARNING: CPU: 0 PID: 6965 at arch/x86/kernel/kprobes/core.c:205 recover_probed_instruction+0x8f/0xa0
>> 
>> static unsigned long
>> __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
>> {
>>         struct kprobe *kp;
>>         unsigned long faddr;
>> 
>>         kp = get_kprobe((void *)addr);
>>         faddr = ftrace_location(addr);
>>         /*
>>          * Addresses inside the ftrace location are refused by
>>          * arch_check_ftrace_location(). Something went terribly wrong
>>          * if such an address is checked here.
>>          */
>>         if (WARN_ON(faddr && faddr != addr))  <<---- HERE
>>                 return 0UL;
> 
> Ha! So, a bunch of IRC later, I figured out how it is possible you hit
> this with just the patch on and how I legitimately hit this and what to
> do about it.
> 
> Your problem seems to be that we got ftrace_location() wrong in that
> lookup_rec()'s end argument is inclusive, hence we need:
> 
> 	lookup_rec(ip, ip + size - 1)
> 
> Now, the above thing asserts that:
> 
> 	ftrace_location(x) == {0, x}
> 
> and that is genuinely false in my case, I get x+4 as additional possible
> output. So I think I need the below change to go on top of all I have:
> 
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 7f0ce42f8ff9..4c13406e0bc4 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -198,13 +198,14 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
> 
>  	kp = get_kprobe((void *)addr);
>  	faddr = ftrace_location(addr);
> +
>  	/*
> -	 * Addresses inside the ftrace location are refused by
> -	 * arch_check_ftrace_location(). Something went terribly wrong
> -	 * if such an address is checked here.
> +	 * In case addr maps to sym+0 ftrace_location() might return something
> +	 * other than faddr. In that case consider it the same as !faddr.
>  	 */
> -	if (WARN_ON(faddr && faddr != addr))
> -		return 0UL;
> +	if (faddr && faddr != addr)
> +		faddr = 0;
> +
>  	/*
>  	 * Use the current code if it is not modified by Kprobe
>  	 * and it cannot be modified by ftrace.

I hit this issue yesterday in kprobe generic code in check_ftrace_location(). 
In both these scenarios, we just want to check if a particular instruction is 
reserved by ftrace.  ftrace_location_range() should still work for that 
purpose, so that may be the easier fix:

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 066fa644e9dfa3..ee3cd035403ca2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
 {
 	unsigned long ftrace_addr;
 
-	ftrace_addr = ftrace_location((unsigned long)p->addr);
+	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);
 	if (ftrace_addr) {
 #ifdef CONFIG_KPROBES_ON_FTRACE
 		/* Given address is not on the instruction boundary */



- Naveen


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding
  2022-02-24 14:52 ` [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding Peter Zijlstra
@ 2022-03-03 10:53   ` Miroslav Benes
  2022-03-03 11:06     ` Andrew Cooper
  0 siblings, 1 reply; 183+ messages in thread
From: Miroslav Benes @ 2022-03-03 10:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, rostedt, mhiramat, alexei.starovoitov

Hi,

On Thu, 24 Feb 2022, Peter Zijlstra wrote:

> Decode ENDBR instructions and WARN about NOTRACK prefixes.

I guess it has already been mentioned somewhere, but could you explain 
the NOTRACK prefix here, please? If I understand it right, it disables 
IBT for the indirect branch instruction, meaning that its target does 
not have to start with ENDBR? 

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  tools/objtool/arch/x86/decode.c      |   34 +++++++++++++++++++++++++++++-----
>  tools/objtool/include/objtool/arch.h |    1 +
>  2 files changed, 30 insertions(+), 5 deletions(-)
> 
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -103,6 +103,18 @@ unsigned long arch_jump_destination(stru
>  #define rm_is_mem(reg)	(mod_is_mem() && !is_RIP() && rm_is(reg))
>  #define rm_is_reg(reg)	(mod_is_reg() && modrm_rm == (reg))
>  
> +static bool has_notrack_prefix(struct insn *insn)
> +{
> +	int i;
> +
> +	for (i = 0; i < insn->prefixes.nbytes; i++) {
> +		if (insn->prefixes.bytes[i] == 0x3e)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +

...

> @@ -636,20 +656,24 @@ int arch_decode_instruction(struct objto
>  		break;
>  
>  	case 0xff:
> -		if (modrm_reg == 2 || modrm_reg == 3)
> +		if (modrm_reg == 2 || modrm_reg == 3) {
>  
>  			*type = INSN_CALL_DYNAMIC;
> +			if (has_notrack_prefix(&insn))
> +				WARN("notrack prefix found at %s:0x%lx", sec->name, offset);
>  
> -		else if (modrm_reg == 4)
> +		} else if (modrm_reg == 4) {
>  
>  			*type = INSN_JUMP_DYNAMIC;
> +			if (has_notrack_prefix(&insn))
> +				WARN("notrack prefix found at %s:0x%lx", sec->name, offset);

And we want to warn about it here so that we can have it all in the kernel 
control?

Miroslav

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding
  2022-03-03 10:53   ` Miroslav Benes
@ 2022-03-03 11:06     ` Andrew Cooper
  2022-03-03 12:33       ` Miroslav Benes
  0 siblings, 1 reply; 183+ messages in thread
From: Andrew Cooper @ 2022-03-03 11:06 UTC (permalink / raw)
  To: Miroslav Benes, Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn, rostedt,
	mhiramat, alexei.starovoitov

On 03/03/2022 10:53, Miroslav Benes wrote:
> Hi,
>
> On Thu, 24 Feb 2022, Peter Zijlstra wrote:
>
>> Decode ENDBR instructions and WARN about NOTRACK prefixes.
> I guess it has already been mentioned somewhere, but could you explain 
> the NOTRACK prefix here, please? If I understand it right, it disables 
> IBT for the indirect branch instruction, meaning that its target does 
> not have to start with ENDBR?

CET-IBT has loads of get-out clauses.  The NOTRACK prefix is one; the
legacy code bitmap (implicit NOTRACK for whole libraries) is another.

And yes - the purpose of NOTRACK is to exempt a specific indirect branch
from checks.

GCC can emit NOTRACK'd calls in some cases when e.g. the programmer
launders a function pointer through (void *), or when
__attribute__((nocf_check)) is used explicitly.


Each of the get-out clauses has separate enable bits, as each of them
reduces security.  In this series, Linux sets MSR_S_CET.ENDBR_EN but
specifically does not set NOTRACK_EN, so NOTRACK prefixes will be
ignored and such a branch will still suffer #CP if its target lacks
ENDBR.
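
For illustration, a sketch of how such a call can come about when
building with -fcf-protection=branch; GCC spells the attribute
nocf_check, and everything below is made up, not from the series:

	/* calls through this pointer type are exempt from IBT tracking */
	typedef void (*legacy_fn_t)(void) __attribute__((nocf_check));

	static void call_legacy(legacy_fn_t fn)
	{
		fn();	/* emitted as: notrack call *%reg (or similar) */
	}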

~Andrew

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions
  2022-03-02 19:39                     ` Peter Zijlstra
@ 2022-03-03 12:11                       ` Naveen N. Rao
  0 siblings, 0 replies; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-03 12:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, rostedt, samitolvanen, x86

Peter Zijlstra wrote:
> On Wed, Mar 02, 2022 at 08:32:45PM +0100, Peter Zijlstra wrote:
>> I wonder if you also want to tighten up on_func_entry? Wouldn't the
>> above suggest something like:

Good question ;)
I noticed this yesterday, but held off on making changes so that I can 
think this through.

>> 
>> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
>> 					 bool *on_func_entry)
>> {
>> #ifdef PPC64_ELF_ABI_v2
>> 	unsigned long entry = ppc_function_entry((void *)addr) - addr;
>> 	*on_func_entry = !offset || offset == entry;
>> 	if (*on_func_entry)
>> 		offset = entry;
>> #else
>> 	*on_func_entry = !offset;
>> #endif
>> 	return (void *)(addr + offset);
>> }

This is more accurate and probably something we should do in the long 
term. The main issue I see today is userspace, specifically perf (and 
perhaps others too).

Historically, we have worked around the issue of probes on a function's 
global entry point not working by having userspace adjust the offset at 
which probes are placed. This works well if those object files have 
either the symbol table, or debuginfo capturing whether functions have a 
separate local entry point. In the absence of those, we are left 
guessing, and we chose to just offset all probes at function entry by 8 
(the GEP almost always has the same two instructions) so that perf "just 
works". This still works well for functions without a GEP since we 
expect to see the two ftrace instructions at function entry, so we are 
ok to probe after that. As an added bonus, this also allows uprobes to 
work, for the most part.
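
A sketch of that blind heuristic, as described above; the define and
helper are made up, with the constant being the assumed
two-instruction GEP size:

	#define PPC64LE_GEP_BYTES	8	/* two 4-byte instructions */

	static unsigned long blind_entry_probe(unsigned long func)
	{
		/* no symtab/debuginfo: assume a GEP and probe past it */
		return func + PPC64LE_GEP_BYTES;
	}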

On the kernel side, we only implemented logic to adjust the probe 
address if a function name was specified without an offset. This went 
for a toss once perf probe moved to using _text as the base symbol for 
kprobes, though, and we weren't handling scenarios where addr+offset 
was provided. With the changes in this series, we can now adjust the 
kprobe address properly across all those scenarios.

If we update perf to not pass an offset any more, then newer perf will 
stop working on older kernels. If we make the logic to determine 
function entry strict in the kernel, then we risk breaking existing 
userspace.

I'm not sure how best to address this.

> 
> One question though; the above seems to work for +0 or +8 (IIRC your
> instructions are 4 bytes each and the GEP is 2 instructions).
> 
> But what do we want to happen for +4 ?

We don't want to change the behavior of probes at the second instruction 
in GEP. The thinking is that it allows the rare scenario (if at all) of 
wanting to catch indirect function calls, and/or cross-module function 
calls -- especially since we now promote probes at GEP to LEP. I frankly 
know of no such scenarios so far, but in any case, if the user is 
specifying an offset, they better know what they are asking for :)

For the same reason, we should allow kretprobe at +4.


Thanks,
Naveen

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding
  2022-03-03 11:06     ` Andrew Cooper
@ 2022-03-03 12:33       ` Miroslav Benes
  2022-03-03 14:13         ` Peter Zijlstra
  0 siblings, 1 reply; 183+ messages in thread
From: Miroslav Benes @ 2022-03-03 12:33 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, rostedt, mhiramat, alexei.starovoitov

[-- Attachment #1: Type: text/plain, Size: 1253 bytes --]

On Thu, 3 Mar 2022, Andrew Cooper wrote:

> On 03/03/2022 10:53, Miroslav Benes wrote:
> > Hi,
> >
> > On Thu, 24 Feb 2022, Peter Zijlstra wrote:
> >
> >> Decode ENDBR instructions and WARN about NOTRACK prefixes.
> > I guess it has already been mentioned somewhere, but could you explain 
> > the NOTRACK prefix here, please? If I understand it right, it disables 
> > IBT for the indirect branch instruction, meaning that its target does 
> > not have to start with ENDBR?
> 
> CET-IBT has loads of get-out clauses.  The NOTRACK prefix is one; the
> legacy code bitmap (implicit NOTRACK for whole libraries) is another.
> 
> And yes - the purpose of NOTRACK is to exempt a specific indirect branch
> from checks.
> 
> GCC can emit NOTRACK'd calls in some cases when e.g. the programmer
> launders a function pointer through (void *), or when
> __attribute__((nocf_check)) is used explicitly.
> 
> 
> Each of the get-out clauses has separate enable bits, as each of them
> reduces security.  In this series, Linux sets MSR_S_CET.ENDBR_EN but
> specifically does not set NOTRACK_EN, so NOTRACK prefixes will be
> ignored and such a branch will still suffer #CP if its target lacks
> ENDBR.

Thanks for the explanation. It would be nice to include it somewhere so 
that it is not lost.

Miroslav

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-03  9:45                           ` Naveen N. Rao
@ 2022-03-03 13:04                             ` Peter Zijlstra
  2022-03-03 14:34                               ` Steven Rostedt
  2022-03-03 14:39                               ` Naveen N. Rao
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-03 13:04 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Steven Rostedt, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, Masami Hiramatsu,
	ndesaulniers, samitolvanen, x86

On Thu, Mar 03, 2022 at 03:15:14PM +0530, Naveen N. Rao wrote:

> > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > index 7f0ce42f8ff9..4c13406e0bc4 100644
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -198,13 +198,14 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
> > 
> >  	kp = get_kprobe((void *)addr);
> >  	faddr = ftrace_location(addr);
> > +
> >  	/*
> > -	 * Addresses inside the ftrace location are refused by
> > -	 * arch_check_ftrace_location(). Something went terribly wrong
> > -	 * if such an address is checked here.
> > +	 * In case addr maps to sym+0 ftrace_location() might return something
> > +	 * other than faddr. In that case consider it the same as !faddr.
> >  	 */
> > -	if (WARN_ON(faddr && faddr != addr))
> > -		return 0UL;
> > +	if (faddr && faddr != addr)
> > +		faddr = 0;
> > +
> >  	/*
> >  	 * Use the current code if it is not modified by Kprobe
> >  	 * and it cannot be modified by ftrace.
> 
> I hit this issue yesterday in kprobe generic code in
> check_ftrace_location().

What exactly were you running to trigger this? (so that I can extend my
test coverage, etc.)

> In both these scenarios, we just want to check if a
> particular instruction is reserved by ftrace.  ftrace_location_range()
> should still work for that purpose, so that may be the easier fix:
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 066fa644e9dfa3..ee3cd035403ca2 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
> {
> 	unsigned long ftrace_addr;
> 
> -	ftrace_addr = ftrace_location((unsigned long)p->addr);
> +	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);

Yes, although perhaps a new helper. I'll go ponder during lunch.
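
One possible shape for it, as a sketch (the name is made up):

	/* exact-match check only; no sym+0 -> __fentry__ mapping */
	static inline bool ftrace_location_exact(unsigned long ip)
	{
		return ftrace_location_range(ip, ip) == ip;
	}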

PS. I posted v3 but forgot to Cc you:

  https://lkml.kernel.org/r/20220303112321.422525803@infradead.org

I think the above hunk ended up in the kprobe patch, but on second
thought I should've put it in the ftrace one. I'll go amend and add
this other site you found.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding
  2022-03-03 12:33       ` Miroslav Benes
@ 2022-03-03 14:13         ` Peter Zijlstra
  0 siblings, 0 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-03 14:13 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Andrew Cooper, x86, joao, hjl.tools, jpoimboe, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, rostedt, mhiramat, alexei.starovoitov

On Thu, Mar 03, 2022 at 01:33:06PM +0100, Miroslav Benes wrote:
> On Thu, 3 Mar 2022, Andrew Cooper wrote:
> 
> > On 03/03/2022 10:53, Miroslav Benes wrote:
> > > Hi,
> > >
> > > On Thu, 24 Feb 2022, Peter Zijlstra wrote:
> > >
> > >> Decode ENDBR instructions and WARN about NOTRACK prefixes.
> > > I guess it has already been mentioned somewhere, but could you explain 
> > > the NOTRACK prefix here, please? If I understand it right, it disables 
> > > IBT for the indirect branch instruction, meaning that its target does 
> > > not have to start with ENDBR?
> > 
> > CET-IBT has loads of get-out clauses.  The NOTRACK prefix is one; the
> > legacy code bitmap (implicit NOTRACK for whole libraries) is another.
> > 
> > And yes - the purpose of NOTRACK is to exempt a specific indirect branch
> > from checks.
> > 
> > GCC can emit NOTRACK'd calls in some cases when e.g. the programmer
> > launders a function pointer through (void *), or when
> > __attribute__((nocf_check)) is used explicitly.
> > 
> > 
> > Each of the get-out clauses has separate enable bits, as each of them
> > reduces security.  In this series, Linux sets MSR_S_CET.ENDBR_EN but
> > specifically does not set NOTRACK_EN, so NOTRACK prefixes will be
> > ignored and such a branch will still suffer #CP if its target lacks
> > ENDBR.
> 
> Thanks for the explanation. It would be nice to include it somewhere so 
> that it is not lost.

I'll add something to the Changelog. Thanks!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-03 13:04                             ` Peter Zijlstra
@ 2022-03-03 14:34                               ` Steven Rostedt
  2022-03-03 15:59                                 ` Peter Zijlstra
  2022-03-03 14:39                               ` Naveen N. Rao
  1 sibling, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-03-03 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Naveen N. Rao, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, Masami Hiramatsu,
	ndesaulniers, samitolvanen, x86

On Thu, 3 Mar 2022 14:04:52 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > @@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
> > {
> > 	unsigned long ftrace_addr;
> > 
> > -	ftrace_addr = ftrace_location((unsigned long)p->addr);
> > +	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);  
> 
> Yes, although perhaps a new helper. I'll go ponder during lunch.

Are there more places to add that to make it worth creating a helper?

If not, I would just keep using the ftrace_location_range().

If there is to be a helper function, then we should not have touched
ftrace_location() in the first place, and instead created a new function
that does the offset check.

Because, thinking about this more, ftrace_location() is supposed to act just
like ftrace_location_range() and now it does not.

I'd rather keep ftrace_location() the same as ftrace_location_range() if
there's going to be another API. Maybe create a function ftrace_addr() that
does the new ftrace_location() that you have, and leave ftrace_location()
as is?

This is actually what I suggested in the beginning.

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-03 13:04                             ` Peter Zijlstra
  2022-03-03 14:34                               ` Steven Rostedt
@ 2022-03-03 14:39                               ` Naveen N. Rao
  1 sibling, 0 replies; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-03 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, Steven Rostedt, samitolvanen,
	x86

Peter Zijlstra wrote:
> On Thu, Mar 03, 2022 at 03:15:14PM +0530, Naveen N. Rao wrote:
> 
>> > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> > index 7f0ce42f8ff9..4c13406e0bc4 100644
>> > --- a/arch/x86/kernel/kprobes/core.c
>> > +++ b/arch/x86/kernel/kprobes/core.c
>> > @@ -198,13 +198,14 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
>> > 
>> >  	kp = get_kprobe((void *)addr);
>> >  	faddr = ftrace_location(addr);
>> > +
>> >  	/*
>> > -	 * Addresses inside the ftrace location are refused by
>> > -	 * arch_check_ftrace_location(). Something went terribly wrong
>> > -	 * if such an address is checked here.
>> > +	 * In case addr maps to sym+0 ftrace_location() might return something
>> > +	 * other than faddr. In that case consider it the same as !faddr.
>> >  	 */
>> > -	if (WARN_ON(faddr && faddr != addr))
>> > -		return 0UL;
>> > +	if (faddr && faddr != addr)
>> > +		faddr = 0;
>> > +
>> >  	/*
>> >  	 * Use the current code if it is not modified by Kprobe
>> >  	 * and it cannot be modified by ftrace.
>> 
>> I hit this issue yesterday in kprobe generic code in
>> check_ftrace_location().
> 
> What exactly were you running to trigger this? (so that I can extend my
> test coverage, etc.)

With the changes here, we always promote probes at +0 offset to the LEP 
at +8.  So, we get the old behavior of ftrace_location(). But, we also 
have functions that do not have a GEP. For those, we allow probing at an 
offset of +0, resulting in ftrace_location() giving us the actual ftrace 
site, which on ppc64le will be at +4.

On ppc32, this will affect all probes since the ftrace site is usually 
at an offset of +8.

I don't think x86 has this issue, since you will always have the ftrace 
site at +0 if a function does not have the endbr instruction. And if you do 
have the endbr instruction, then you adjust the probe address so it 
won't be at an offset of 0 into the function.
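
Concretely, a sketch of the comparison that now differs per arch; the
helper and the layout in the comment are illustrative only:

	static bool entry_probe_is_on_ftrace(unsigned long sym)
	{
		/*
		 * GEP-less ppc64le -mprofile-kernel layout:
		 *   sym + 0: mflr r0
		 *   sym + 4: bl   _mcount	<- the ftrace site
		 * so ftrace_location(sym) returns sym + 4 here, while
		 * on x86 without ENDBR the site is at sym + 0.
		 */
		return ftrace_location(sym) == sym;
	}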

> 
>> In both these scenarios, we just want to check if a
>> particular instruction is reserved by ftrace.  ftrace_location_range()
>> should still work for that purpose, so that may be the easier fix:
>> 
>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>> index 066fa644e9dfa3..ee3cd035403ca2 100644
>> --- a/kernel/kprobes.c
>> +++ b/kernel/kprobes.c
>> @@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
>> {
>> 	unsigned long ftrace_addr;
>> 
>> -	ftrace_addr = ftrace_location((unsigned long)p->addr);
>> +	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);
> 
> Yes, although perhaps a new helper. I'll go ponder during lunch.

Sure. We do have ftrace_text_reserved(), but that only returns a 
boolean. Masami wanted to ensure that the probe is at an instruction 
boundary, hence this check.

> 
> PS. I posted v3 but forgot to Cc you:
> 
>   https://lkml.kernel.org/r/20220303112321.422525803@infradead.org
> 
> I think the above hunk ended up in the kprobe patch, but on second
> thought I should've put it in the ftrace one. I'll go amend and add
> this other site you found.

Thanks, I'll take a look.

- Naveen

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-03 14:34                               ` Steven Rostedt
@ 2022-03-03 15:59                                 ` Peter Zijlstra
  2022-03-06  3:48                                   ` Masami Hiramatsu
  2022-03-09 11:47                                   ` Naveen N. Rao
  0 siblings, 2 replies; 183+ messages in thread
From: Peter Zijlstra @ 2022-03-03 15:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Naveen N. Rao, alexei.starovoitov, alyssa.milburn,
	andrew.cooper3, hjl.tools, joao, jpoimboe, keescook,
	linux-kernel, mark.rutland, mbenes, Masami Hiramatsu,
	ndesaulniers, samitolvanen, x86

On Thu, Mar 03, 2022 at 09:34:13AM -0500, Steven Rostedt wrote:
> On Thu, 3 Mar 2022 14:04:52 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > @@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
> > > {
> > > 	unsigned long ftrace_addr;
> > > 
> > > -	ftrace_addr = ftrace_location((unsigned long)p->addr);
> > > +	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);  
> > 
> > Yes, although perhaps a new helper. I'll go ponder during lunch.
> 
> Are there more places to add that to make it worth creating a helper?

This is what I ended up with. I've looked at all the ftrace_location()
sites there are, and it seems to work too: both the built-in boot-time
ftrace tests and the selftests run splat-less.

I should update the Changelog some though.

Naveen also mentioned register_ftrace_direct() could be further cleaned
up, but I didn't want to do too much in one go.

---

Subject: x86/ibt,ftrace: Search for __fentry__ location
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Feb 23 10:01:38 CET 2022

Have ftrace_location() search the symbol for the __fentry__ location
when it isn't at func+0 and use this for {,un}register_ftrace_direct().

This avoids a whole bunch of assumptions about __fentry__ being at
func+0.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/kprobes/core.c |   11 +---------
 kernel/bpf/trampoline.c        |   20 +++----------------
 kernel/kprobes.c               |    8 +------
 kernel/trace/ftrace.c          |   43 +++++++++++++++++++++++++++++++++--------
 4 files changed, 43 insertions(+), 39 deletions(-)

--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -193,17 +193,10 @@ static unsigned long
 __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
 {
 	struct kprobe *kp;
-	unsigned long faddr;
+	bool faddr;
 
 	kp = get_kprobe((void *)addr);
-	faddr = ftrace_location(addr);
-	/*
-	 * Addresses inside the ftrace location are refused by
-	 * arch_check_ftrace_location(). Something went terribly wrong
-	 * if such an address is checked here.
-	 */
-	if (WARN_ON(faddr && faddr != addr))
-		return 0UL;
+	faddr = ftrace_location(addr) == addr;
 	/*
 	 * Use the current code if it is not modified by Kprobe
 	 * and it cannot be modified by ftrace.
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -117,18 +117,6 @@ static void bpf_trampoline_module_put(st
 	tr->mod = NULL;
 }
 
-static int is_ftrace_location(void *ip)
-{
-	long addr;
-
-	addr = ftrace_location((long)ip);
-	if (!addr)
-		return 0;
-	if (WARN_ON_ONCE(addr != (long)ip))
-		return -EFAULT;
-	return 1;
-}
-
 static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
 {
 	void *ip = tr->func.addr;
@@ -160,12 +148,12 @@ static int modify_fentry(struct bpf_tram
 static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
 {
 	void *ip = tr->func.addr;
+	unsigned long faddr;
 	int ret;
 
-	ret = is_ftrace_location(ip);
-	if (ret < 0)
-		return ret;
-	tr->func.ftrace_managed = ret;
+	faddr = ftrace_location((unsigned long)ip);
+	if (faddr)
+		tr->func.ftrace_managed = true;
 
 	if (bpf_trampoline_module_get(tr))
 		return -ENOENT;
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1562,14 +1562,10 @@ static inline int warn_kprobe_rereg(stru
 
 static int check_ftrace_location(struct kprobe *p)
 {
-	unsigned long ftrace_addr;
+	unsigned long addr = (unsigned long)p->addr;
 
-	ftrace_addr = ftrace_location((unsigned long)p->addr);
-	if (ftrace_addr) {
+	if (ftrace_location(addr) == addr) {
 #ifdef CONFIG_KPROBES_ON_FTRACE
-		/* Given address is not on the instruction boundary */
-		if ((unsigned long)p->addr != ftrace_addr)
-			return -EILSEQ;
 		p->flags |= KPROBE_FLAG_FTRACE;
 #else	/* !CONFIG_KPROBES_ON_FTRACE */
 		return -EINVAL;
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1568,17 +1568,34 @@ unsigned long ftrace_location_range(unsi
 }
 
 /**
- * ftrace_location - return true if the ip giving is a traced location
+ * ftrace_location - return the ftrace location
  * @ip: the instruction pointer to check
  *
- * Returns rec->ip if @ip given is a pointer to a ftrace location.
- * That is, the instruction that is either a NOP or call to
- * the function tracer. It checks the ftrace internal tables to
- * determine if the address belongs or not.
+ * If @ip matches the ftrace location, return @ip.
+ * If @ip matches sym+0, return sym's ftrace location.
+ * Otherwise, return 0.
  */
 unsigned long ftrace_location(unsigned long ip)
 {
-	return ftrace_location_range(ip, ip);
+	struct dyn_ftrace *rec;
+	unsigned long offset;
+	unsigned long size;
+
+	rec = lookup_rec(ip, ip);
+	if (!rec) {
+		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
+			goto out;
+
+		/* map sym+0 to __fentry__ */
+		if (!offset)
+			rec = lookup_rec(ip, ip + size - 1);
+	}
+
+	if (rec)
+		return rec->ip;
+
+out:
+	return 0;
 }
 
 /**
@@ -4962,7 +4979,8 @@ ftrace_match_addr(struct ftrace_hash *ha
 {
 	struct ftrace_func_entry *entry;
 
-	if (!ftrace_location(ip))
+	ip = ftrace_location(ip);
+	if (!ip)
 		return -EINVAL;
 
 	if (remove) {
@@ -5110,11 +5128,16 @@ int register_ftrace_direct(unsigned long
 	struct ftrace_func_entry *entry;
 	struct ftrace_hash *free_hash = NULL;
 	struct dyn_ftrace *rec;
-	int ret = -EBUSY;
+	int ret = -ENODEV;
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	/* See if there's a direct function at @ip already */
+	ret = -EBUSY;
 	if (ftrace_find_rec_direct(ip))
 		goto out_unlock;
 
@@ -5222,6 +5245,10 @@ int unregister_ftrace_direct(unsigned lo
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	entry = find_direct_entry(&ip, NULL);
 	if (!entry)
 		goto out_unlock;
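
To make the new contract concrete, a usage sketch; this is not part of
the patch, and the helper is made up:

	static bool addr_is_fentry_site(unsigned long addr)
	{
		unsigned long faddr = ftrace_location(addr);

		/*
		 * faddr == addr: @addr is itself the __fentry__ site.
		 * faddr != 0:    @addr was sym+0; faddr is sym's site.
		 * faddr == 0:    neither an ftrace site nor sym+0.
		 */
		return faddr == addr;
	}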

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-03 15:59                                 ` Peter Zijlstra
@ 2022-03-06  3:48                                   ` Masami Hiramatsu
  2022-03-09 11:47                                   ` Naveen N. Rao
  1 sibling, 0 replies; 183+ messages in thread
From: Masami Hiramatsu @ 2022-03-06  3:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Naveen N. Rao, alexei.starovoitov,
	alyssa.milburn, andrew.cooper3, hjl.tools, joao, jpoimboe,
	keescook, linux-kernel, mark.rutland, mbenes, Masami Hiramatsu,
	ndesaulniers, samitolvanen, x86

On Thu, 3 Mar 2022 16:59:03 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Mar 03, 2022 at 09:34:13AM -0500, Steven Rostedt wrote:
> > On Thu, 3 Mar 2022 14:04:52 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > @@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
> > > > {
> > > > 	unsigned long ftrace_addr;
> > > > 
> > > > -	ftrace_addr = ftrace_location((unsigned long)p->addr);
> > > > +	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);  
> > > 
> > > Yes, although perhaps a new helper. I'll go ponder during lunch.
> > 
> > Are there more places to add that to make it worth creating a helper?
> 
> This is what I ended up with. I've looked at all the ftrace_location()
> sites there are, and it seems to work too: both the built-in boot-time
> ftrace tests and the selftests run splat-less.
> 
> I should update the Changelog some though.
> 
> Naveen also mentioned register_ftrace_direct() could be further cleaned
> up, but I didn't want to do too much in one go.
> 
> ---
> 
> Subject: x86/ibt,ftrace: Search for __fentry__ location
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Feb 23 10:01:38 CET 2022
> 
> Have ftrace_location() search the symbol for the __fentry__ location
> when it isn't at func+0 and use this for {,un}register_ftrace_direct().
> 
> This avoids a whole bunch of assumptions about __fentry__ being at
> func+0.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/core.c |   11 +---------
>  kernel/bpf/trampoline.c        |   20 +++----------------
>  kernel/kprobes.c               |    8 +------
>  kernel/trace/ftrace.c          |   43 +++++++++++++++++++++++++++++++++--------
>  4 files changed, 43 insertions(+), 39 deletions(-)
> 
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -193,17 +193,10 @@ static unsigned long
>  __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
>  {
>  	struct kprobe *kp;
> -	unsigned long faddr;
> +	bool faddr;
>  
>  	kp = get_kprobe((void *)addr);
> -	faddr = ftrace_location(addr);
> -	/*
> -	 * Addresses inside the ftrace location are refused by
> -	 * arch_check_ftrace_location(). Something went terribly wrong
> -	 * if such an address is checked here.
> -	 */
> -	if (WARN_ON(faddr && faddr != addr))
> -		return 0UL;
> +	faddr = ftrace_location(addr) == addr;

OK, this looks good to me. 

>  	/*
>  	 * Use the current code if it is not modified by Kprobe
>  	 * and it cannot be modified by ftrace.
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -117,18 +117,6 @@ static void bpf_trampoline_module_put(st
>  	tr->mod = NULL;
>  }
>  
> -static int is_ftrace_location(void *ip)
> -{
> -	long addr;
> -
> -	addr = ftrace_location((long)ip);
> -	if (!addr)
> -		return 0;
> -	if (WARN_ON_ONCE(addr != (long)ip))
> -		return -EFAULT;
> -	return 1;
> -}
> -
>  static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
>  {
>  	void *ip = tr->func.addr;
> @@ -160,12 +148,12 @@ static int modify_fentry(struct bpf_tram
>  static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
>  {
>  	void *ip = tr->func.addr;
> +	unsigned long faddr;
>  	int ret;
>  
> -	ret = is_ftrace_location(ip);
> -	if (ret < 0)
> -		return ret;
> -	tr->func.ftrace_managed = ret;
> +	faddr = ftrace_location((unsigned long)ip);
> +	if (faddr)
> +		tr->func.ftrace_managed = true;
>  
>  	if (bpf_trampoline_module_get(tr))
>  		return -ENOENT;
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1562,14 +1562,10 @@ static inline int warn_kprobe_rereg(stru
>  
>  static int check_ftrace_location(struct kprobe *p)
>  {
> -	unsigned long ftrace_addr;
> +	unsigned long addr = (unsigned long)p->addr;
>  
> -	ftrace_addr = ftrace_location((unsigned long)p->addr);
> -	if (ftrace_addr) {
> +	if (ftrace_location(addr) == addr) {
>  #ifdef CONFIG_KPROBES_ON_FTRACE
> -		/* Given address is not on the instruction boundary */
> -		if ((unsigned long)p->addr != ftrace_addr)
> -			return -EILSEQ;

OK, so this means we only use ftrace if the kprobe is placed at
sym+ftrace-offset. Thus, if there is an ENDBR at the first instruction,
kprobes will use int3, right?
I agree with this, but later I will have to add another patch to use
ftrace for kprobes at symbol+0. But anyway, that is another issue.

So this looks good to me.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Thank you!

>  		p->flags |= KPROBE_FLAG_FTRACE;
>  #else	/* !CONFIG_KPROBES_ON_FTRACE */
>  		return -EINVAL;
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1568,17 +1568,34 @@ unsigned long ftrace_location_range(unsi
>  }
>  
>  /**
> - * ftrace_location - return true if the ip giving is a traced location
> + * ftrace_location - return the ftrace location
>   * @ip: the instruction pointer to check
>   *
> - * Returns rec->ip if @ip given is a pointer to a ftrace location.
> - * That is, the instruction that is either a NOP or call to
> - * the function tracer. It checks the ftrace internal tables to
> - * determine if the address belongs or not.
> + * If @ip matches the ftrace location, return @ip.
> + * If @ip matches sym+0, return sym's ftrace location.
> + * Otherwise, return 0.
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> -	return ftrace_location_range(ip, ip);
> +	struct dyn_ftrace *rec;
> +	unsigned long offset;
> +	unsigned long size;
> +
> +	rec = lookup_rec(ip, ip);
> +	if (!rec) {
> +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> +			goto out;
> +
> +		/* map sym+0 to __fentry__ */
> +		if (!offset)
> +			rec = lookup_rec(ip, ip + size - 1);
> +	}
> +
> +	if (rec)
> +		return rec->ip;
> +
> +out:
> +	return 0;
>  }
>  
>  /**
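
A usage sketch of the new contract (some_func is a hypothetical traced
function; the +4 assumes an ENDBR-prefixed prologue as above):

	unsigned long loc = ftrace_location((unsigned long)some_func);
	/* !IBT build:  loc == sym+0  (__fentry__ at func+0)    */
	/* IBT build:   loc == sym+4  (__fentry__ after ENDBR)  */
	/* not traced:  loc == 0                                */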
> @@ -4962,7 +4979,8 @@ ftrace_match_addr(struct ftrace_hash *ha
>  {
>  	struct ftrace_func_entry *entry;
>  
> -	if (!ftrace_location(ip))
> +	ip = ftrace_location(ip);
> +	if (!ip)
>  		return -EINVAL;
>  
>  	if (remove) {
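
Since ftrace_match_addr() feeds the filter hash, a caller such as
ftrace_set_filter_ip() can now pass sym+0 and the hash still ends up
keyed on the real __fentry__ address; e.g. (my_ops and some_func are
hypothetical):

	ftrace_set_filter_ip(&my_ops, (unsigned long)some_func, 0, 0);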
> @@ -5110,11 +5128,16 @@ int register_ftrace_direct(unsigned long
>  	struct ftrace_func_entry *entry;
>  	struct ftrace_hash *free_hash = NULL;
>  	struct dyn_ftrace *rec;
> -	int ret = -EBUSY;
> +	int ret = -ENODEV;
>  
>  	mutex_lock(&direct_mutex);
>  
> +	ip = ftrace_location(ip);
> +	if (!ip)
> +		goto out_unlock;
> +
>  	/* See if there's a direct function at @ip already */
> +	ret = -EBUSY;
>  	if (ftrace_find_rec_direct(ip))
>  		goto out_unlock;
>  
> @@ -5222,6 +5245,10 @@ int unregister_ftrace_direct(unsigned lo
>  
>  	mutex_lock(&direct_mutex);
>  
> +	ip = ftrace_location(ip);
> +	if (!ip)
> +		goto out_unlock;
> +
>  	entry = find_direct_entry(&ip, NULL);
>  	if (!entry)
>  		goto out_unlock;
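
Caller-visible effect, sketched (my_func and my_tramp are
hypothetical): the ip is resolved up front, so sym+0 works even when
__fentry__ is not at +0, and an unresolvable ip now fails with -ENODEV
before the duplicate check:

	int ret = register_ftrace_direct((unsigned long)my_func,
					 (unsigned long)my_tramp);
	if (ret == -ENODEV) {
		/* my_func has no __fentry__ site at all */
	} else if (ret == -EBUSY) {
		/* a direct call is already attached at that site */
	}

unregister_ftrace_direct() gets the same resolution, keeping the two
entry points symmetric.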


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location
  2022-03-03 15:59                                 ` Peter Zijlstra
  2022-03-06  3:48                                   ` Masami Hiramatsu
@ 2022-03-09 11:47                                   ` Naveen N. Rao
  1 sibling, 0 replies; 183+ messages in thread
From: Naveen N. Rao @ 2022-03-09 11:47 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: alexei.starovoitov, alyssa.milburn, andrew.cooper3, hjl.tools,
	joao, jpoimboe, keescook, linux-kernel, mark.rutland, mbenes,
	Masami Hiramatsu, ndesaulniers, samitolvanen, x86

Peter Zijlstra wrote:
> On Thu, Mar 03, 2022 at 09:34:13AM -0500, Steven Rostedt wrote:
>> On Thu, 3 Mar 2022 14:04:52 +0100
>> Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> > > @@ -1596,7 +1596,7 @@ static int check_ftrace_location(struct kprobe *p)
>> > > {
>> > > 	unsigned long ftrace_addr;
>> > > 
>> > > -	ftrace_addr = ftrace_location((unsigned long)p->addr);
>> > > +	ftrace_addr = ftrace_location_range((unsigned long)p->addr, (unsigned long)p->addr);  
>> > 
>> > Yes, although perhaps a new helper. I'll go ponder during lunch.
>> 
>> Is there more places to add that to make it worth creating a helper?
> 
> This is what I ended up with. I've looked at all the ftrace_location()
> sites there are, and it seems to work: both the built-in boot-time
> ftrace tests and the selftests run splat-less.
> 
> I should update the Changelog somewhat, though.
> 
> Naveen also mentioned register_ftrace_direct() could be further cleaned
> up, but I didn't want to do too much in one go.

Not a problem, I can send those as cleanups atop this series.

> 
> ---
> 
> Subject: x86/ibt,ftrace: Search for __fentry__ location
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Feb 23 10:01:38 CET 2022
> 
> Have ftrace_location() search the symbol for the __fentry__ location
> when it isn't at func+0 and use this for {,un}register_ftrace_direct().
> 
> This avoids a whole bunch of assumptions about __fentry__ being at
> func+0.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

This version looks good to me.
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 183+ messages in thread

end of thread, other threads:[~2022-03-09 11:48 UTC | newest]

Thread overview: 183+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-24 14:51 [PATCH v2 00/39] x86: Kernel IBT Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 01/39] kbuild: Fix clang build Peter Zijlstra
2022-02-25  0:11   ` Kees Cook
2022-03-01 21:16   ` Nick Desaulniers
2022-03-02  0:47     ` Kees Cook
2022-03-02  0:53       ` Fangrui Song
2022-03-02 16:37     ` Nathan Chancellor
2022-03-02 18:40       ` Kees Cook
2022-03-02 19:18       ` Nick Desaulniers
2022-03-02 21:15         ` Nathan Chancellor
2022-03-02 22:07           ` Nick Desaulniers
2022-03-02 23:00           ` Kees Cook
2022-03-02 23:10           ` Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 02/39] static_call: Avoid building empty .static_call_sites Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 03/39] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
2022-03-01 14:37   ` Miroslav Benes
2022-02-24 14:51 ` [PATCH v2 04/39] objtool: Add --dry-run Peter Zijlstra
2022-02-25  0:27   ` Kees Cook
2022-03-01 14:37   ` Miroslav Benes
2022-02-24 14:51 ` [PATCH v2 05/39] x86: Base IBT bits Peter Zijlstra
2022-02-25  0:35   ` Kees Cook
2022-02-25  0:46     ` Nathan Chancellor
2022-02-25 22:08       ` Nathan Chancellor
2022-02-26  0:29         ` Joao Moreira
2022-02-26  4:58           ` Kees Cook
2022-02-26  4:59             ` Fāng-ruì Sòng
2022-02-26  5:04               ` Kees Cook
2022-02-25 13:41     ` Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 06/39] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
2022-02-25  0:36   ` Kees Cook
2022-02-24 14:51 ` [PATCH v2 07/39] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
2022-02-24 22:37   ` Josh Poimboeuf
2022-02-25  0:42   ` Kees Cook
2022-02-25  9:22     ` Andrew Cooper
2022-02-24 14:51 ` [PATCH v2 08/39] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
2022-02-25  0:45   ` Kees Cook
2022-02-24 14:51 ` [PATCH v2 09/39] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
2022-02-25  0:47   ` Kees Cook
2022-02-24 14:51 ` [PATCH v2 10/39] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
2022-02-24 22:41   ` Josh Poimboeuf
2022-02-25  0:50   ` Kees Cook
2022-02-25 10:22     ` Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 11/39] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
2022-02-25  0:54   ` Kees Cook
2022-02-25 10:24     ` Peter Zijlstra
2022-02-25 13:09       ` David Laight
2022-02-24 14:51 ` [PATCH v2 12/39] x86/ibt,ftrace: Search for __fentry__ location Peter Zijlstra
2022-02-24 15:55   ` Masami Hiramatsu
2022-02-24 15:58     ` Steven Rostedt
2022-02-24 15:59       ` Steven Rostedt
2022-02-24 16:01       ` Steven Rostedt
2022-02-24 22:46         ` Josh Poimboeuf
2022-02-24 22:51           ` Steven Rostedt
2022-02-25  1:34       ` Masami Hiramatsu
2022-02-25  2:19         ` Steven Rostedt
2022-02-25 10:20           ` Masami Hiramatsu
2022-02-25 13:36             ` Steven Rostedt
2022-03-01 18:57               ` Naveen N. Rao
2022-03-01 19:20                 ` Steven Rostedt
2022-03-02 13:20                   ` Peter Zijlstra
2022-03-02 16:01                     ` Steven Rostedt
2022-03-02 19:47                       ` Steven Rostedt
2022-03-02 20:48                         ` Steven Rostedt
2022-03-02 20:51                         ` Peter Zijlstra
2022-03-03  9:45                           ` Naveen N. Rao
2022-03-03 13:04                             ` Peter Zijlstra
2022-03-03 14:34                               ` Steven Rostedt
2022-03-03 15:59                                 ` Peter Zijlstra
2022-03-06  3:48                                   ` Masami Hiramatsu
2022-03-09 11:47                                   ` Naveen N. Rao
2022-03-03 14:39                               ` Naveen N. Rao
2022-02-25  0:55   ` Kees Cook
2022-03-02 16:25   ` Naveen N. Rao
2022-02-24 14:51 ` [PATCH v2 13/39] x86/livepatch: Validate " Peter Zijlstra
2022-02-24 23:02   ` Josh Poimboeuf
2022-02-24 14:51 ` [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice Peter Zijlstra
2022-02-24 15:36   ` Peter Zijlstra
2022-02-24 15:42     ` Steven Rostedt
2022-02-24 23:09       ` Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions Peter Zijlstra
2022-02-25  0:58   ` Kees Cook
2022-02-25  1:32   ` Masami Hiramatsu
2022-02-25 10:46     ` Peter Zijlstra
2022-02-25 13:42       ` Masami Hiramatsu
2022-02-25 15:41         ` Peter Zijlstra
2022-02-26  2:10           ` Masami Hiramatsu
2022-02-26 11:48             ` Peter Zijlstra
2022-02-25 14:14       ` Steven Rostedt
2022-02-26  7:09         ` Masami Hiramatsu
2022-02-28  6:07   ` Masami Hiramatsu
2022-02-28 23:25     ` Peter Zijlstra
2022-03-01  2:49       ` Masami Hiramatsu
2022-03-01  8:28         ` Peter Zijlstra
2022-03-01 17:19           ` Naveen N. Rao
2022-03-01 19:12             ` Peter Zijlstra
2022-03-01 20:05               ` Peter Zijlstra
2022-03-02 15:59                 ` Naveen N. Rao
2022-03-02 16:38                   ` Peter Zijlstra
2022-03-02 16:17                 ` Naveen N. Rao
2022-03-02 19:32                   ` Peter Zijlstra
2022-03-02 19:39                     ` Peter Zijlstra
2022-03-03 12:11                       ` Naveen N. Rao
2022-03-03  1:54                   ` Masami Hiramatsu
2022-03-02  0:11           ` Masami Hiramatsu
2022-03-02 10:25             ` Peter Zijlstra
2022-03-01 17:03       ` Naveen N. Rao
2022-02-24 14:51 ` [PATCH v2 16/39] x86/bpf: Add ENDBR instructions to prologue and trampoline Peter Zijlstra
2022-02-24 23:37   ` Josh Poimboeuf
2022-02-25  0:59     ` Kees Cook
2022-02-25 11:20     ` Peter Zijlstra
2022-02-25 12:24     ` Peter Zijlstra
2022-02-25 22:46       ` Josh Poimboeuf
2022-02-24 14:51 ` [PATCH v2 17/39] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 18/39] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
2022-02-24 23:55   ` Josh Poimboeuf
2022-02-25 10:51     ` Peter Zijlstra
2022-02-25 11:10       ` Peter Zijlstra
2022-02-25 23:51       ` Josh Poimboeuf
2022-02-26 11:55         ` Peter Zijlstra
2022-02-25  1:09   ` Kees Cook
2022-02-25 19:59   ` Edgecombe, Rick P
2022-03-01 15:14     ` Peter Zijlstra
2022-03-01 21:02       ` Peter Zijlstra
2022-03-01 23:13         ` Josh Poimboeuf
2022-03-02  1:59           ` Edgecombe, Rick P
2022-03-02 13:49             ` Peter Zijlstra
2022-03-02 18:38               ` Kees Cook
2022-02-24 14:51 ` [PATCH v2 19/39] x86: Disable IBT around firmware Peter Zijlstra
2022-02-25  1:10   ` Kees Cook
2022-02-24 14:51 ` [PATCH v2 20/39] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
2022-02-25  1:11   ` Kees Cook
2022-02-25  2:22     ` Josh Poimboeuf
2022-02-25 10:55     ` Peter Zijlstra
2022-02-24 14:51 ` [PATCH v2 21/39] x86/ibt: Annotate text references Peter Zijlstra
2022-02-25  0:47   ` Josh Poimboeuf
2022-02-25 12:57     ` Peter Zijlstra
2022-02-25 13:04     ` Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 22/39] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 23/39] x86/ibt,sev: Annotations Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 24/39] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
2022-02-25  0:49   ` Josh Poimboeuf
2022-02-24 14:52 ` [PATCH v2 25/39] x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch() Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 26/39] x86/entry: Cleanup PARAVIRT Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 27/39] x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel() Peter Zijlstra
2022-02-24 17:51   ` Andrew Cooper
2022-02-24 14:52 ` [PATCH v2 28/39] x86/ibt,xen: Sprinkle the ENDBR Peter Zijlstra
2022-02-25  0:54   ` Josh Poimboeuf
2022-02-25 13:16     ` Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 29/39] objtool: Rename --duplicate to --lto Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 30/39] Kbuild: Allow whole module objtool runs Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 31/39] objtool: Read the NOENDBR annotation Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 32/39] x86/ibt: Dont generate ENDBR in .discard.text Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 33/39] objtool: Add IBT/ENDBR decoding Peter Zijlstra
2022-03-03 10:53   ` Miroslav Benes
2022-03-03 11:06     ` Andrew Cooper
2022-03-03 12:33       ` Miroslav Benes
2022-03-03 14:13         ` Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 34/39] objtool: Validate IBT assumptions Peter Zijlstra
2022-02-27  3:13   ` Josh Poimboeuf
2022-02-27 17:00     ` Peter Zijlstra
2022-02-27 22:20       ` Josh Poimboeuf
2022-02-28  9:47         ` Peter Zijlstra
2022-02-28 18:36           ` Josh Poimboeuf
2022-02-28 20:10             ` Peter Zijlstra
2022-02-28  9:26       ` Peter Zijlstra
2022-02-28 18:39         ` Josh Poimboeuf
2022-02-24 14:52 ` [PATCH v2 35/39] objtool: IBT fix direct JMP/CALL Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 36/39] objtool: Find unused ENDBR instructions Peter Zijlstra
2022-02-27  3:46   ` Josh Poimboeuf
2022-02-28 12:41     ` Peter Zijlstra
2022-02-28 17:36       ` Josh Poimboeuf
2022-02-24 14:52 ` [PATCH v2 37/39] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 38/39] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
2022-02-24 14:52 ` [PATCH v2 39/39] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
2022-02-24 20:26 ` [PATCH v2 00/39] x86: Kernel IBT Josh Poimboeuf
2022-02-25 15:28   ` Peter Zijlstra
2022-02-25 15:43     ` Peter Zijlstra
2022-02-25 17:26       ` Josh Poimboeuf
2022-02-25 17:32         ` Steven Rostedt
2022-02-25 19:53           ` Peter Zijlstra
2022-02-25 20:15             ` Josh Poimboeuf
2022-03-01 23:10     ` Josh Poimboeuf
2022-03-02 10:20       ` Peter Zijlstra
