* [PATCH 00/29] x86: Kernel IBT
@ 2022-02-18 16:49 Peter Zijlstra
2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
` (29 more replies)
0 siblings, 30 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Hi,
This is an (almost!) complete Kernel IBT implementation. It's been self-hosting
for a few days now. That is, it runs on IBT enabled hardware (Tigerlake) and is
capable of building the next kernel.
It is also almost clean on allmodconfig using GCC-11.2.
The biggest TODO item at this point is Clang; I've not yet looked at that.
Patches are also available here:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/wip.ibt
This series is on top of tip/master along with the linkage patches from Mark:
https://lore.kernel.org/all/20220216162229.1076788-1-mark.rutland@arm.com/
Enjoy!
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 01/29] static_call: Avoid building empty .static_call_sites
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
` (28 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Without CONFIG_HAVE_STATIC_CALL_INLINE there's no point in creating
the .static_call_sites section and its related symbols.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/asm-generic/vmlinux.lds.h | 4 ++++
1 file changed, 4 insertions(+)
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -398,6 +398,7 @@
KEEP(*(__jump_table)) \
__stop___jump_table = .;
+#ifdef CONFIG_HAVE_STATIC_CALL_INLINE
#define STATIC_CALL_DATA \
. = ALIGN(8); \
__start_static_call_sites = .; \
@@ -406,6 +407,9 @@
__start_static_call_tramp_key = .; \
KEEP(*(.static_call_tramp_key)) \
__stop_static_call_tramp_key = .;
+#else
+#define STATIC_CALL_DATA
+#endif
/*
* Allow architectures to handle ro_after_init data on their
* [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 20:28 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 03/29] objtool: Add --dry-run Peter Zijlstra
` (27 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn, Juergen Gross
Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
paravirt patching") there has been an ordering dependency between
patching paravirt ops and patching alternatives; the module loader
still violates this ordering.
Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/module.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -272,6 +272,10 @@ int module_finalize(const Elf_Ehdr *hdr,
retpolines = s;
}
+ if (para) {
+ void *pseg = (void *)para->sh_addr;
+ apply_paravirt(pseg, pseg + para->sh_size);
+ }
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
apply_retpolines(rseg, rseg + retpolines->sh_size);
@@ -289,11 +293,6 @@ int module_finalize(const Elf_Ehdr *hdr,
tseg, tseg + text->sh_size);
}
- if (para) {
- void *pseg = (void *)para->sh_addr;
- apply_paravirt(pseg, pseg + para->sh_size);
- }
-
/* make jump label nops */
jump_label_apply_nops(me);
* [PATCH 03/29] objtool: Add --dry-run
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
` (26 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Add a --dry-run argument to skip writing the modifications. This is
convenient for debugging.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
tools/objtool/builtin-check.c | 3 ++-
tools/objtool/elf.c | 3 +++
tools/objtool/include/objtool/builtin.h | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
#include <objtool/objtool.h>
bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup, sls;
+ validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
static const char * const check_usage[] = {
"objtool check [<options>] file.o",
@@ -46,6 +46,7 @@ const struct option check_options[] = {
OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
+ OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
OPT_END(),
};
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -1019,6 +1019,9 @@ int elf_write(struct elf *elf)
struct section *sec;
Elf_Scn *s;
+ if (dryrun)
+ return 0;
+
/* Update changed relocation sections and section headers: */
list_for_each_entry(sec, &elf->sections, list) {
if (sec->changed) {
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@
extern const struct option check_options[];
extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup, sls;
+ validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
* [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (2 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 03/29] objtool: Add --dry-run Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 21:08 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
` (25 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn, Miroslav Benes
Currently livepatch assumes __fentry__ lives at func+0, which is most
likely untrue with IBT on. Override the weak klp_get_ftrace_location()
function with an arch specific version that's IBT aware.
Also make the weak fallback verify the location is an actual ftrace
location as a sanity check.
Suggested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/livepatch.h | 9 +++++++++
kernel/livepatch/patch.c | 2 +-
2 files changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/livepatch.h
+++ b/arch/x86/include/asm/livepatch.h
@@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
ftrace_instruction_pointer_set(fregs, ip);
}
+#define klp_get_ftrace_location klp_get_ftrace_location
+static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
+{
+ unsigned long addr = ftrace_location(faddr);
+ if (!addr && IS_ENABLED(CONFIG_X86_IBT))
+ addr = ftrace_location(faddr + 4);
+ return addr;
+}
+
#endif /* _ASM_X86_LIVEPATCH_H */
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -133,7 +133,7 @@ static void notrace klp_ftrace_handler(u
#ifndef klp_get_ftrace_location
static unsigned long klp_get_ftrace_location(unsigned long faddr)
{
- return faddr;
+ return ftrace_location(faddr);
}
#endif
* [PATCH 05/29] x86: Base IBT bits
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (3 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 20:49 ` Andrew Cooper
` (3 more replies)
2022-02-18 16:49 ` [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
` (24 subsequent siblings)
29 siblings, 4 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Add Kconfig, Makefile and basic instruction support for x86 IBT.
TODO: clang
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/Kconfig | 15 ++++++++++++
arch/x86/Makefile | 5 +++-
arch/x86/include/asm/ibt.h | 53 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 72 insertions(+), 1 deletion(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1861,6 +1861,21 @@ config X86_UMIP
specific cases in protected and virtual-8086 modes. Emulated
results are dummy.
+config CC_HAS_IBT
+ # GCC >= 9 and binutils >= 2.29
+ # Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
+ def_bool $(cc-option, -fcf-protection=branch -mindirect-branch-register) && $(as-instr,endbr64)
+
+config X86_IBT
+ prompt "Indirect Branch Tracking"
+ bool
+ depends on X86_64 && CC_HAS_IBT
+ help
+ Build the kernel with support for Indirect Branch Tracking, a
+ hardware supported CFI scheme. Any indirect call must land on
+ an ENDBR instruction, as such, the compiler will litter the
+ code with them to make this happen.
+
config X86_INTEL_MEMORY_PROTECTION_KEYS
prompt "Memory Protection Keys"
def_bool y
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -62,8 +62,11 @@ export BITS
#
KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
-# Intel CET isn't enabled in the kernel
+ifeq ($(CONFIG_X86_IBT),y)
+KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch)
+else
KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
+endif
ifeq ($(CONFIG_X86_32),y)
BITS := 32
--- /dev/null
+++ b/arch/x86/include/asm/ibt.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_IBT_H
+#define _ASM_X86_IBT_H
+
+#ifdef CONFIG_X86_IBT
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+#define ASM_ENDBR "endbr64\n\t"
+#else
+#define ASM_ENDBR "endbr32\n\t"
+#endif
+
+#define __noendbr __attribute__((nocf_check))
+
+/*
+ * A bit convoluted, but matches both endbr32 and endbr64 without
+ * having either as literal in the text.
+ */
+static inline bool is_endbr(const void *addr)
+{
+ unsigned int val = ~*(unsigned int *)addr;
+ val |= 0x01000000U;
+ return val == ~0xfa1e0ff3;
+}
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_X86_64
+#define ENDBR endbr64
+#else
+#define ENDBR endbr32
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#else /* !IBT */
+
+#ifndef __ASSEMBLY__
+
+#define ASM_ENDBR
+
+#define __noendbr
+
+#else /* __ASSEMBLY__ */
+
+#define ENDBR
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_X86_IBT */
+#endif /* _ASM_X86_IBT_H */
* [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (4 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
` (23 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
In order to have objtool warn about code references to !ENDBR
instructions, we need an annotation to allow this for non-control-flow
instances -- consider text range checks, text patching, or return
trampolines etc.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/linkage.h | 11 +++++++++++
include/linux/instruction_pointer.h | 5 +++++
include/linux/objtool.h | 13 +++++++++++++
3 files changed, 29 insertions(+)
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -78,6 +78,12 @@ struct unwind_hint {
#define STACK_FRAME_NON_STANDARD_FP(func)
#endif
+#define ANNOTATE_NOENDBR \
+ "986: \n\t" \
+ ".pushsection .discard.noendbr\n\t" \
+ _ASM_PTR " 986b\n\t" \
+ ".popsection\n\t"
+
#else /* __ASSEMBLY__ */
/*
@@ -130,6 +136,13 @@ struct unwind_hint {
.popsection
.endm
+.macro ANNOTATE_NOENDBR
+.Lhere_\@:
+ .pushsection .discard.noendbr
+ .quad .Lhere_\@
+ .popsection
+.endm
+
#endif /* __ASSEMBLY__ */
#else /* !CONFIG_STACK_VALIDATION */
* [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (5 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-19 0:23 ` Josh Poimboeuf
2022-02-19 0:36 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
` (22 subsequent siblings)
29 siblings, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Kernel entry points should have ENDBR for IBT configs.
The SYSCALL entry points are found by taking their respective
address in order to program them into the MSRs, while the exception
entry points are found through UNWIND_HINT_IRET_REGS.
*Except* that latter hint is also used on exit code to denote when
we're down to an IRET frame. As such, add an additional 'entry'
argument to the macro, defaulting to '1', such that objtool will
assume it's an entry point and warn accordingly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/entry/entry_64.S | 35 +++++++++++++++++++++--------------
arch/x86/entry/entry_64_compat.S | 3 +++
arch/x86/include/asm/idtentry.h | 23 +++++++++++++++--------
arch/x86/include/asm/segment.h | 5 +++++
arch/x86/include/asm/unwind_hints.h | 18 +++++++++++++-----
arch/x86/kernel/head_64.S | 14 +++++++++-----
arch/x86/kernel/idt.c | 5 +++--
arch/x86/kernel/unwind_orc.c | 3 ++-
include/linux/objtool.h | 5 +++--
tools/include/linux/objtool.h | 5 +++--
tools/objtool/check.c | 3 ++-
tools/objtool/orc_dump.c | 3 ++-
12 files changed, 81 insertions(+), 41 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -39,6 +39,7 @@
#include <asm/trapnr.h>
#include <asm/nospec-branch.h>
#include <asm/fsgsbase.h>
+#include <asm/ibt.h>
#include <linux/err.h>
#include "calling.h"
@@ -87,6 +88,7 @@
SYM_CODE_START(entry_SYSCALL_64)
UNWIND_HINT_EMPTY
+ ENDBR
swapgs
/* tss.sp2 is scratch space. */
movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
@@ -349,7 +351,8 @@ SYM_CODE_END(ret_from_fork)
*/
.macro idtentry vector asmsym cfunc has_error_code:req
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS offset=\has_error_code*8
+ UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1
+ ENDBR
ASM_CLAC
.if \has_error_code == 0
@@ -366,7 +369,7 @@ SYM_CODE_START(\asmsym)
.rept 6
pushq 5*8(%rsp)
.endr
- UNWIND_HINT_IRET_REGS offset=8
+ UNWIND_HINT_IRET_REGS offset=8 entry=0
.Lfrom_usermode_no_gap_\@:
.endif
@@ -416,7 +419,8 @@ SYM_CODE_END(\asmsym)
*/
.macro idtentry_mce_db vector asmsym cfunc
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=1
+ ENDBR
ASM_CLAC
pushq $-1 /* ORIG_RAX: no syscall to restart */
@@ -471,7 +475,8 @@ SYM_CODE_END(\asmsym)
*/
.macro idtentry_vc vector asmsym cfunc
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=1
+ ENDBR
ASM_CLAC
/*
@@ -532,7 +537,8 @@ SYM_CODE_END(\asmsym)
*/
.macro idtentry_df vector asmsym cfunc
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS offset=8
+ UNWIND_HINT_IRET_REGS offset=8 entry=1
+ ENDBR
ASM_CLAC
/* paranoid_entry returns GS information for paranoid_exit in EBX. */
@@ -629,7 +635,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
INTERRUPT_RETURN
SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=0
/*
* Are we returning to a stack segment from the LDT? Note: in
* 64-bit mode SS:RSP on the exception stack is always valid.
@@ -706,7 +712,7 @@ SYM_INNER_LABEL(native_irq_return_iret,
popq %rdi /* Restore user RDI */
movq %rax, %rsp
- UNWIND_HINT_IRET_REGS offset=8
+ UNWIND_HINT_IRET_REGS offset=8 entry=0
/*
* At this point, we cannot write to the stack any more, but we can
@@ -821,13 +827,13 @@ SYM_CODE_START(xen_failsafe_callback)
movq 8(%rsp), %r11
addq $0x30, %rsp
pushq $0 /* RIP */
- UNWIND_HINT_IRET_REGS offset=8
+ UNWIND_HINT_IRET_REGS offset=8 entry=0
jmp asm_exc_general_protection
1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
movq (%rsp), %rcx
movq 8(%rsp), %r11
addq $0x30, %rsp
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=0
pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
ENCODE_FRAME_POINTER
@@ -1062,7 +1068,8 @@ SYM_CODE_END(error_return)
* when PAGE_TABLE_ISOLATION is in use. Do not clobber.
*/
SYM_CODE_START(asm_exc_nmi)
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=1
+ ENDBR
/*
* We allow breakpoints in NMIs. If a breakpoint occurs, then
@@ -1127,13 +1134,13 @@ SYM_CODE_START(asm_exc_nmi)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
- UNWIND_HINT_IRET_REGS base=%rdx offset=8
+ UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0
pushq 5*8(%rdx) /* pt_regs->ss */
pushq 4*8(%rdx) /* pt_regs->rsp */
pushq 3*8(%rdx) /* pt_regs->flags */
pushq 2*8(%rdx) /* pt_regs->cs */
pushq 1*8(%rdx) /* pt_regs->rip */
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=0
pushq $-1 /* pt_regs->orig_ax */
PUSH_AND_CLEAR_REGS rdx=(%rdx)
ENCODE_FRAME_POINTER
@@ -1289,7 +1296,7 @@ SYM_CODE_START(asm_exc_nmi)
.rept 5
pushq 11*8(%rsp)
.endr
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=0
/* Everything up to here is safe from nested NMIs */
@@ -1305,7 +1312,7 @@ SYM_CODE_START(asm_exc_nmi)
pushq $__KERNEL_CS /* CS */
pushq $1f /* RIP */
iretq /* continues at repeat_nmi below */
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=0
1:
#endif
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,6 +49,7 @@
SYM_CODE_START(entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
/* Interrupts are off on entry. */
+ ENDBR
SWAPGS
pushq %rax
@@ -198,6 +199,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
*/
SYM_CODE_START(entry_SYSCALL_compat)
UNWIND_HINT_EMPTY
+ ENDBR
/* Interrupts are off on entry. */
swapgs
@@ -340,6 +342,7 @@ SYM_CODE_END(entry_SYSCALL_compat)
*/
SYM_CODE_START(entry_INT80_compat)
UNWIND_HINT_EMPTY
+ ENDBR
/*
* Interrupts are off on entry.
*/
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -5,6 +5,12 @@
/* Interrupts/Exceptions */
#include <asm/trapnr.h>
+#ifdef CONFIG_X86_IBT
+#define IDT_ALIGN 16
+#else
+#define IDT_ALIGN 8
+#endif
+
#ifndef __ASSEMBLY__
#include <linux/entry-common.h>
#include <linux/hardirq.h>
@@ -492,33 +498,34 @@ __visible noinstr void func(struct pt_re
* point is to mask off the bits above bit 7 because the push is sign
* extending.
*/
- .align 8
+
+ .align IDT_ALIGN
SYM_CODE_START(irq_entries_start)
vector=FIRST_EXTERNAL_VECTOR
.rept NR_EXTERNAL_VECTORS
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=1
0 :
+ ENDBR
.byte 0x6a, vector
jmp asm_common_interrupt
- nop
/* Ensure that the above is 8 bytes max */
- . = 0b + 8
+ .fill 0b + IDT_ALIGN - ., 1, 0x90
vector = vector+1
.endr
SYM_CODE_END(irq_entries_start)
#ifdef CONFIG_X86_LOCAL_APIC
- .align 8
+ .align IDT_ALIGN
SYM_CODE_START(spurious_entries_start)
vector=FIRST_SYSTEM_VECTOR
.rept NR_SYSTEM_VECTORS
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=1
0 :
+ ENDBR
.byte 0x6a, vector
jmp asm_spurious_interrupt
- nop
/* Ensure that the above is 8 bytes max */
- . = 0b + 8
+ .fill 0b + IDT_ALIGN - ., 1, 0x90
vector = vector+1
.endr
SYM_CODE_END(spurious_entries_start)
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -4,6 +4,7 @@
#include <linux/const.h>
#include <asm/alternative.h>
+#include <asm/ibt.h>
/*
* Constructor for a conventional segment GDT (or LDT) entry.
@@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns
* vector has no error code (two bytes), a 'push $vector_number' (two
* bytes), and a jump to the common entry code (up to five bytes).
*/
+#ifdef CONFIG_X86_IBT
+#define EARLY_IDT_HANDLER_SIZE 13
+#else
#define EARLY_IDT_HANDLER_SIZE 9
+#endif
/*
* xen_early_idt_handler_array is for Xen pv guests: for each entry in
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -11,7 +11,7 @@
UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
.endm
-.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
+.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
.if \base == %rsp
.if \indirect
.set sp_reg, ORC_REG_SP_INDIRECT
@@ -33,9 +33,17 @@
.set sp_offset, \offset
.if \partial
- .set type, UNWIND_HINT_TYPE_REGS_PARTIAL
+ .if \entry
+ .set type, UNWIND_HINT_TYPE_REGS_ENTRY
+ .else
+ .set type, UNWIND_HINT_TYPE_REGS_EXIT
+ .endif
.elseif \extra == 0
- .set type, UNWIND_HINT_TYPE_REGS_PARTIAL
+ .if \entry
+ .set type, UNWIND_HINT_TYPE_REGS_ENTRY
+ .else
+ .set type, UNWIND_HINT_TYPE_REGS_EXIT
+ .endif
.set sp_offset, \offset + (16*8)
.else
.set type, UNWIND_HINT_TYPE_REGS
@@ -44,8 +52,8 @@
UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type
.endm
-.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0
- UNWIND_HINT_REGS base=\base offset=\offset partial=1
+.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1
+ UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry
.endm
.macro UNWIND_HINT_FUNC
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,6 +25,7 @@
#include <asm/export.h>
#include <asm/nospec-branch.h>
#include <asm/fixmap.h>
+#include <asm/ibt.h>
/*
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -327,7 +328,8 @@ SYM_CODE_END(start_cpu0)
* when .init.text is freed.
*/
SYM_CODE_START_NOALIGN(vc_boot_ghcb)
- UNWIND_HINT_IRET_REGS offset=8
+ UNWIND_HINT_IRET_REGS offset=8 entry=1
+ ENDBR
/* Build pt_regs */
PUSH_AND_CLEAR_REGS
@@ -371,18 +373,20 @@ SYM_CODE_START(early_idt_handler_array)
i = 0
.rept NUM_EXCEPTION_VECTORS
.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=1
+ ENDBR
pushq $0 # Dummy error code, to make stack frame uniform
.else
- UNWIND_HINT_IRET_REGS offset=8
+ UNWIND_HINT_IRET_REGS offset=8 entry=1
+ ENDBR
.endif
pushq $i # 72(%rsp) Vector number
jmp early_idt_handler_common
- UNWIND_HINT_IRET_REGS
+ UNWIND_HINT_IRET_REGS entry=0
i = i + 1
.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
.endr
- UNWIND_HINT_IRET_REGS offset=16
+ UNWIND_HINT_IRET_REGS offset=16 entry=0
SYM_CODE_END(early_idt_handler_array)
SYM_CODE_START_LOCAL(early_idt_handler_common)
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -10,6 +10,7 @@
#include <asm/proto.h>
#include <asm/desc.h>
#include <asm/hw_irq.h>
+#include <asm/idtentry.h>
#define DPL0 0x0
#define DPL3 0x3
@@ -272,7 +273,7 @@ void __init idt_setup_apic_and_irq_gates
idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);
for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
- entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
+ entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR);
set_intr_gate(i, entry);
}
@@ -283,7 +284,7 @@ void __init idt_setup_apic_and_irq_gates
* system_vectors bitmap. Otherwise they show up in
* /proc/interrupts.
*/
- entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
+ entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR);
set_intr_gate(i, entry);
}
#endif
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -566,7 +566,8 @@ bool unwind_next_frame(struct unwind_sta
state->signal = true;
break;
- case UNWIND_HINT_TYPE_REGS_PARTIAL:
+ case UNWIND_HINT_TYPE_REGS_ENTRY:
+ case UNWIND_HINT_TYPE_REGS_EXIT:
if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
orc_warn_current("can't access iret registers at %pB\n",
(void *)orig_ip);
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -35,8 +35,9 @@ struct unwind_hint {
*/
#define UNWIND_HINT_TYPE_CALL 0
#define UNWIND_HINT_TYPE_REGS 1
-#define UNWIND_HINT_TYPE_REGS_PARTIAL 2
-#define UNWIND_HINT_TYPE_FUNC 3
+#define UNWIND_HINT_TYPE_REGS_ENTRY 2
+#define UNWIND_HINT_TYPE_REGS_EXIT 3
+#define UNWIND_HINT_TYPE_FUNC 4
#ifdef CONFIG_STACK_VALIDATION
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -35,8 +35,9 @@ struct unwind_hint {
*/
#define UNWIND_HINT_TYPE_CALL 0
#define UNWIND_HINT_TYPE_REGS 1
-#define UNWIND_HINT_TYPE_REGS_PARTIAL 2
-#define UNWIND_HINT_TYPE_FUNC 3
+#define UNWIND_HINT_TYPE_REGS_ENTRY 2
+#define UNWIND_HINT_TYPE_REGS_EXIT 3
+#define UNWIND_HINT_TYPE_FUNC 4
#ifdef CONFIG_STACK_VALIDATION
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2312,7 +2312,8 @@ static int update_cfi_state(struct instr
}
if (cfi->type == UNWIND_HINT_TYPE_REGS ||
- cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL)
+ cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY ||
+ cfi->type == UNWIND_HINT_TYPE_REGS_EXIT)
return update_cfi_state_regs(insn, cfi, op);
switch (op->dest.type) {
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -43,7 +43,8 @@ static const char *orc_type_name(unsigne
return "call";
case UNWIND_HINT_TYPE_REGS:
return "regs";
- case UNWIND_HINT_TYPE_REGS_PARTIAL:
+ case UNWIND_HINT_TYPE_REGS_ENTRY:
+ case UNWIND_HINT_TYPE_REGS_EXIT:
return "regs (partial)";
default:
return "?";
* [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*()
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (6 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
` (21 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Ensure the ASM functions have ENDBR on for IBT builds; this follows
the ARM64 example. Unlike ARM64, we'll likely end up overwriting them
with poison.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/linkage.h | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -3,6 +3,7 @@
#define _ASM_X86_LINKAGE_H
#include <linux/stringify.h>
+#include <asm/ibt.h>
#undef notrace
#define notrace __attribute__((no_instrument_function))
@@ -34,5 +35,43 @@
#endif /* __ASSEMBLY__ */
+/*
+ * compressed and purgatory define this to disable EXPORT,
+ * hijack this same to also not emit ENDBR.
+ */
+#ifndef __DISABLE_EXPORTS
+
+/* SYM_FUNC_START -- use for global functions */
+#define SYM_FUNC_START(name) \
+ SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) \
+ ENDBR
+
+/* SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment */
+#define SYM_FUNC_START_NOALIGN(name) \
+ SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE) \
+ ENDBR
+
+/* SYM_FUNC_START_LOCAL -- use for local functions */
+#define SYM_FUNC_START_LOCAL(name) \
+ SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN) \
+ ENDBR
+
+/* SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o alignment */
+#define SYM_FUNC_START_LOCAL_NOALIGN(name) \
+ SYM_START(name, SYM_L_LOCAL, SYM_A_NONE) \
+ ENDBR
+
+/* SYM_FUNC_START_WEAK -- use for weak functions */
+#define SYM_FUNC_START_WEAK(name) \
+ SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN) \
+ ENDBR
+
+/* SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment */
+#define SYM_FUNC_START_WEAK_NOALIGN(name) \
+ SYM_START(name, SYM_L_WEAK, SYM_A_NONE) \
+ ENDBR
+
+#endif /* __DISABLE_EXPORTS */
+
#endif /* _ASM_X86_LINKAGE_H */
* [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (7 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue Peter Zijlstra
` (20 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/entry/entry_64.S | 1 +
arch/x86/include/asm/paravirt.h | 1 +
arch/x86/include/asm/qspinlock_paravirt.h | 3 +++
arch/x86/kernel/kvm.c | 3 ++-
arch/x86/kernel/paravirt.c | 2 ++
5 files changed, 9 insertions(+), 1 deletion(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -636,6 +636,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
UNWIND_HINT_IRET_REGS entry=0
+ ENDBR // paravirt_iret
/*
* Are we returning to a stack segment from the LDT? Note: in
* 64-bit mode SS:RSP on the exception stack is always valid.
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -666,6 +666,7 @@ bool __raw_callee_save___native_vcpu_is_
".globl " PV_THUNK_NAME(func) ";" \
".type " PV_THUNK_NAME(func) ", @function;" \
PV_THUNK_NAME(func) ":" \
+ ASM_ENDBR \
FRAME_BEGIN \
PV_SAVE_ALL_CALLER_REGS \
"call " #func ";" \
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -2,6 +2,8 @@
#ifndef __ASM_QSPINLOCK_PARAVIRT_H
#define __ASM_QSPINLOCK_PARAVIRT_H
+#include <asm/ibt.h>
+
/*
* For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
* registers. For i386, however, only 1 32-bit register needs to be saved
@@ -39,6 +41,7 @@ asm (".pushsection .text;"
".type " PV_UNLOCK ", @function;"
".align 4,0x90;"
PV_UNLOCK ": "
+ ASM_ENDBR
FRAME_BEGIN
"push %rdx;"
"mov $0x1,%eax;"
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1024,10 +1024,11 @@ asm(
".global __raw_callee_save___kvm_vcpu_is_preempted;"
".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
"__raw_callee_save___kvm_vcpu_is_preempted:"
+ASM_ENDBR
"movq __per_cpu_offset(,%rdi,8), %rax;"
"cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
"setne %al;"
-"ret;"
+ASM_RET
".size __raw_callee_save___kvm_vcpu_is_preempted, .-__raw_callee_save___kvm_vcpu_is_preempted;"
".popsection");
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -41,6 +41,7 @@ extern void _paravirt_nop(void);
asm (".pushsection .entry.text, \"ax\"\n"
".global _paravirt_nop\n"
"_paravirt_nop:\n\t"
+ ASM_ENDBR
ASM_RET
".size _paravirt_nop, . - _paravirt_nop\n\t"
".type _paravirt_nop, @function\n\t"
@@ -50,6 +51,7 @@ asm (".pushsection .entry.text, \"ax\"\n
asm (".pushsection .entry.text, \"ax\"\n"
".global paravirt_ret0\n"
"paravirt_ret0:\n\t"
+ ASM_ENDBR
"xor %" _ASM_AX ", %" _ASM_AX ";\n\t"
ASM_RET
".size paravirt_ret0, . - paravirt_ret0\n\t"
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (8 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
` (19 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
With IBT enabled builds we need ENDBR instructions at indirect jump
target sites. Since we start execution of the JIT'ed code through an
indirect jump, the very first instruction needs to be an ENDBR.
Similarly, since eBPF tail-calls use indirect branches, their landing
site needs to be an ENDBR too.
Note: this shifts the trampoline patch site by 4 bytes (the size of
ENDBR) but I've not yet figured out where this is used.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/net/bpf_jit_comp.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -46,6 +46,12 @@ static u8 *emit_code(u8 *ptr, u32 bytes,
#define EMIT4_off32(b1, b2, b3, b4, off) \
do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
+#ifdef CONFIG_X86_IBT
+#define EMIT_ENDBR() EMIT4(0xf3, 0x0f, 0x1e, 0xfa)
+#else
+#define EMIT_ENDBR()
+#endif
+
static bool is_imm8(int value)
{
return value <= 127 && value >= -128;
@@ -241,7 +247,7 @@ struct jit_context {
/* Number of bytes emit_patch() needs to generate instructions */
#define X86_PATCH_SIZE 5
/* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET 11
+#define X86_TAIL_CALL_OFFSET (11 + 4*IS_ENABLED(CONFIG_X86_IBT))
static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
{
@@ -286,6 +292,7 @@ static void emit_prologue(u8 **pprog, u3
/* BPF trampoline can be made to work without these nops,
* but let's waste 5 bytes for now and optimize later
*/
+ EMIT_ENDBR();
memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
prog += X86_PATCH_SIZE;
if (!ebpf_from_cbpf) {
@@ -296,6 +303,10 @@ static void emit_prologue(u8 **pprog, u3
}
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+
+ /* X86_TAIL_CALL_OFFSET is here */
+ EMIT_ENDBR();
+
/* sub rsp, rounded_stack_depth */
if (stack_depth)
EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (9 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
` (18 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +++
1 file changed, 3 insertions(+)
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -195,6 +195,7 @@ SYM_FUNC_START(crc_pcl)
.altmacro
LABEL crc_ %i
.noaltmacro
+ ENDBR
crc32q -i*8(block_0), crc_init
crc32q -i*8(block_1), crc1
crc32q -i*8(block_2), crc2
@@ -203,6 +204,7 @@ LABEL crc_ %i
.altmacro
LABEL crc_ %i
+ ENDBR
.noaltmacro
crc32q -i*8(block_0), crc_init
crc32q -i*8(block_1), crc1
@@ -237,6 +239,7 @@ LABEL crc_ %i
################################################################
LABEL crc_ 0
+ ENDBR
mov tmp, len
cmp $128*24, tmp
jae full_block
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (10 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
` (17 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -189,7 +189,7 @@
#define X16(x...) X8(x), X8(x)
#define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
-#define FASTOP_SIZE 8
+#define FASTOP_SIZE (8 * (1 + IS_ENABLED(CONFIG_X86_IBT)))
struct opcode {
u64 flags;
@@ -311,7 +311,8 @@ static int fastop(struct x86_emulate_ctx
#define __FOP_FUNC(name) \
".align " __stringify(FASTOP_SIZE) " \n\t" \
".type " name ", @function \n\t" \
- name ":\n\t"
+ name ":\n\t" \
+ ASM_ENDBR
#define FOP_FUNC(name) \
__FOP_FUNC(#name)
@@ -433,6 +434,7 @@ static int fastop(struct x86_emulate_ctx
".align 4 \n\t" \
".type " #op ", @function \n\t" \
#op ": \n\t" \
+ ASM_ENDBR \
#op " %al \n\t" \
__FOP_RET(#op)
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (11 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
` (16 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
samples/ftrace/ftrace-direct-modify.c | 5 +++++
samples/ftrace/ftrace-direct-multi-modify.c | 10 +++++++---
samples/ftrace/ftrace-direct-multi.c | 5 ++++-
samples/ftrace/ftrace-direct-too.c | 3 +++
samples/ftrace/ftrace-direct.c | 3 +++
5 files changed, 22 insertions(+), 4 deletions(-)
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -24,20 +24,25 @@ static unsigned long my_ip = (unsigned l
#ifdef CONFIG_X86_64
+#include <asm/ibt.h>
+
asm (
" .pushsection .text, \"ax\", @progbits\n"
" .type my_tramp1, @function\n"
" .globl my_tramp1\n"
" my_tramp1:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" call my_direct_func1\n"
" leave\n"
" .size my_tramp1, .-my_tramp1\n"
ASM_RET
+
" .type my_tramp2, @function\n"
" .globl my_tramp2\n"
" my_tramp2:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" call my_direct_func2\n"
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -22,11 +22,14 @@ extern void my_tramp2(void *);
#ifdef CONFIG_X86_64
+#include <asm/ibt.h>
+
asm (
" .pushsection .text, \"ax\", @progbits\n"
" .type my_tramp1, @function\n"
" .globl my_tramp1\n"
" my_tramp1:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" pushq %rdi\n"
@@ -34,12 +37,13 @@ asm (
" call my_direct_func1\n"
" popq %rdi\n"
" leave\n"
-" ret\n"
+ ASM_RET
" .size my_tramp1, .-my_tramp1\n"
+
" .type my_tramp2, @function\n"
-"\n"
" .globl my_tramp2\n"
" my_tramp2:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" pushq %rdi\n"
@@ -47,7 +51,7 @@ asm (
" call my_direct_func2\n"
" popq %rdi\n"
" leave\n"
-" ret\n"
+ ASM_RET
" .size my_tramp2, .-my_tramp2\n"
" .popsection\n"
);
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -17,11 +17,14 @@ extern void my_tramp(void *);
#ifdef CONFIG_X86_64
+#include <asm/ibt.h>
+
asm (
" .pushsection .text, \"ax\", @progbits\n"
" .type my_tramp, @function\n"
" .globl my_tramp\n"
" my_tramp:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" pushq %rdi\n"
@@ -29,7 +32,7 @@ asm (
" call my_direct_func\n"
" popq %rdi\n"
" leave\n"
-" ret\n"
+ ASM_RET
" .size my_tramp, .-my_tramp\n"
" .popsection\n"
);
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -19,11 +19,14 @@ extern void my_tramp(void *);
#ifdef CONFIG_X86_64
+#include <asm/ibt.h>
+
asm (
" .pushsection .text, \"ax\", @progbits\n"
" .type my_tramp, @function\n"
" .globl my_tramp\n"
" my_tramp:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" pushq %rdi\n"
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -16,11 +16,14 @@ extern void my_tramp(void *);
#ifdef CONFIG_X86_64
+#include <asm/ibt.h>
+
asm (
" .pushsection .text, \"ax\", @progbits\n"
" .type my_tramp, @function\n"
" .globl my_tramp\n"
" my_tramp:"
+ ASM_ENDBR
" pushq %rbp\n"
" movq %rsp, %rbp\n"
" pushq %rdi\n"
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (12 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 19:31 ` Andrew Cooper
` (4 more replies)
2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
` (15 subsequent siblings)
29 siblings, 5 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
The bits required to make the hardware go. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST exception because we'll never hit the syscall gap.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/idtentry.h | 5 ++
arch/x86/include/asm/msr-index.h | 20 ++++++++
arch/x86/include/asm/traps.h | 2
arch/x86/include/uapi/asm/processor-flags.h | 2
arch/x86/kernel/cpu/common.c | 23 +++++++++
arch/x86/kernel/idt.c | 4 +
arch/x86/kernel/traps.c | 65 ++++++++++++++++++++++++++++
8 files changed, 121 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -387,6 +387,7 @@
#define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */
#define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
#define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_IBT (18*32+20) /* Indirect Branch Tracking */
#define X86_FEATURE_AMX_BF16 (18*32+22) /* AMX bf16 Support */
#define X86_FEATURE_AVX512_FP16 (18*32+23) /* AVX512 FP16 */
#define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -622,6 +622,11 @@ DECLARE_IDTENTRY_DF(X86_TRAP_DF, exc_dou
DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
#endif
+/* #CP */
+#ifdef CONFIG_X86_IBT
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
+#endif
+
/* #VC */
#ifdef CONFIG_AMD_MEM_ENCRYPT
DECLARE_IDTENTRY_VC(X86_TRAP_VC, exc_vmm_communication);
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -360,11 +360,29 @@
#define MSR_ATOM_CORE_TURBO_RATIOS 0x0000066c
#define MSR_ATOM_CORE_TURBO_VIDS 0x0000066d
-
#define MSR_CORE_PERF_LIMIT_REASONS 0x00000690
#define MSR_GFX_PERF_LIMIT_REASONS 0x000006B0
#define MSR_RING_PERF_LIMIT_REASONS 0x000006B1
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET 0x000006a0 /* user mode cet */
+#define MSR_IA32_S_CET 0x000006a2 /* kernel mode cet */
+#define CET_SHSTK_EN BIT_ULL(0)
+#define CET_WRSS_EN BIT_ULL(1)
+#define CET_ENDBR_EN BIT_ULL(2)
+#define CET_LEG_IW_EN BIT_ULL(3)
+#define CET_NO_TRACK_EN BIT_ULL(4)
+#define CET_SUPPRESS_DISABLE BIT_ULL(5)
+#define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
+#define CET_SUPPRESS BIT_ULL(10)
+#define CET_WAIT_ENDBR BIT_ULL(11)
+
+#define MSR_IA32_PL0_SSP 0x000006a4 /* ring-0 shadow stack pointer */
+#define MSR_IA32_PL1_SSP 0x000006a5 /* ring-1 shadow stack pointer */
+#define MSR_IA32_PL2_SSP 0x000006a6 /* ring-2 shadow stack pointer */
+#define MSR_IA32_PL3_SSP 0x000006a7 /* ring-3 shadow stack pointer */
+#define MSR_IA32_INT_SSP_TAB 0x000006a8 /* exception shadow stack table */
+
/* Hardware P state interface */
#define MSR_PPERF 0x0000064e
#define MSR_PERF_LIMIT_REASONS 0x0000064f
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -18,6 +18,8 @@ void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
#endif
+extern bool ibt_selftest(void);
+
#ifdef CONFIG_X86_F00F_BUG
/* For handling the FOOF bug */
void handle_invalid_op(struct pt_regs *regs);
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
#define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT)
#define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */
#define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement Technology */
+#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT)
/*
* x86-64 Task Priority Register, CR8
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -59,6 +59,7 @@
#include <asm/cpu_device_id.h>
#include <asm/uv/uv.h>
#include <asm/sigframe.h>
+#include <asm/traps.h>
#include "cpu.h"
@@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
__setup("nopku", setup_disable_pku);
#endif /* CONFIG_X86_64 */
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+ u64 msr;
+
+ if (!IS_ENABLED(CONFIG_X86_IBT) ||
+ !cpu_feature_enabled(X86_FEATURE_IBT))
+ return;
+
+ cr4_set_bits(X86_CR4_CET);
+
+ rdmsrl(MSR_IA32_S_CET, msr);
+ if (cpu_feature_enabled(X86_FEATURE_IBT))
+ msr |= CET_ENDBR_EN;
+ wrmsrl(MSR_IA32_S_CET, msr);
+
+ if (!ibt_selftest()) {
+ pr_err("IBT selftest: Failed!\n");
+ setup_clear_cpu_cap(X86_FEATURE_IBT);
+ }
+}
+
/*
* Some CPU features depend on higher CPUID levels, which may not always
* be available due to CPUID level capping or broken virtualization
@@ -1709,6 +1731,7 @@ static void identify_cpu(struct cpuinfo_
x86_init_rdrand(c);
setup_pku(c);
+ setup_cet(c);
/*
* Clear/Set all flags overridden by options, need do it
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -104,6 +104,10 @@ static const __initconst struct idt_data
ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE),
#endif
+#ifdef CONFIG_X86_IBT
+ INTG(X86_TRAP_CP, asm_exc_control_protection),
+#endif
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
#endif
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -210,6 +210,71 @@ DEFINE_IDTENTRY(exc_overflow)
do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
}
+#ifdef CONFIG_X86_IBT
+
+static bool ibt_fatal = true;
+
+extern unsigned long ibt_selftest_ip; /* defined in asm below */
+static volatile bool ibt_selftest_ok = false;
+
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
+ pr_err("Whaaa?!?!\n");
+ return;
+ }
+
+ if (WARN_ON_ONCE(user_mode(regs) || error_code != 3))
+ return;
+
+ if (unlikely(regs->ip == ibt_selftest_ip)) {
+ ibt_selftest_ok = true;
+ return;
+ }
+
+ pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+ BUG_ON(ibt_fatal);
+}
+
+bool ibt_selftest(void)
+{
+ ibt_selftest_ok = false;
+
+ asm (ANNOTATE_NOENDBR
+ "1: lea 2f(%%rip), %%rax\n\t"
+ ANNOTATE_RETPOLINE_SAFE
+ " jmp *%%rax\n\t"
+ "2: nop\n\t"
+
+ /* unsigned long ibt_selftest_ip = 2b */
+ ".pushsection .data,\"aw\"\n\t"
+ ".align 8\n\t"
+ ".type ibt_selftest_ip, @object\n\t"
+ ".size ibt_selftest_ip, 8\n\t"
+ "ibt_selftest_ip:\n\t"
+ ".quad 2b\n\t"
+ ".popsection\n\t"
+
+ : : : "rax", "memory");
+
+ return ibt_selftest_ok;
+}
+
+static int __init ibt_setup(char *str)
+{
+ if (!strcmp(str, "off"))
+ setup_clear_cpu_cap(X86_FEATURE_IBT);
+
+ if (!strcmp(str, "warn"))
+ ibt_fatal = false;
+
+ return 1;
+}
+
+__setup("ibt=", ibt_setup);
+
+#endif /* CONFIG_X86_IBT */
+
#ifdef CONFIG_X86_F00F_BUG
void handle_invalid_op(struct pt_regs *regs)
#else
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 15/29] x86: Disable IBT around firmware
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (13 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-21 8:27 ` Kees Cook
2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
` (14 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Assume firmware isn't IBT clean and disable IBT across firmware calls.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/efi.h | 9 +++++++--
arch/x86/include/asm/ibt.h | 10 ++++++++++
arch/x86/kernel/apm_32.c | 7 +++++++
arch/x86/kernel/cpu/common.c | 28 ++++++++++++++++++++++++++++
4 files changed, 52 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -7,6 +7,7 @@
#include <asm/tlb.h>
#include <asm/nospec-branch.h>
#include <asm/mmu_context.h>
+#include <asm/ibt.h>
#include <linux/build_bug.h>
#include <linux/kernel.h>
#include <linux/pgtable.h>
@@ -120,8 +121,12 @@ extern asmlinkage u64 __efi_call(void *f
efi_enter_mm(); \
})
-#define arch_efi_call_virt(p, f, args...) \
- efi_call((void *)p->f, args) \
+#define arch_efi_call_virt(p, f, args...) ({ \
+ u64 ret, ibt = ibt_save(); \
+ ret = efi_call((void *)p->f, args); \
+ ibt_restore(ibt); \
+ ret; \
+})
#define arch_efi_call_virt_teardown() \
({ \
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -6,6 +6,8 @@
#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
#ifdef CONFIG_X86_64
#define ASM_ENDBR "endbr64\n\t"
#else
@@ -25,6 +27,9 @@ static inline bool is_endbr(const void *
return val == ~0xfa1e0ff3;
}
+extern u64 ibt_save(void);
+extern void ibt_restore(u64 save);
+
#else /* __ASSEMBLY__ */
#ifdef CONFIG_X86_64
@@ -39,10 +44,15 @@ static inline bool is_endbr(const void *
#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
#define ASM_ENDBR
#define __noendbr
+static inline u64 ibt_save(void) { return 0; }
+static inline void ibt_restore(u64 save) { }
+
#else /* __ASSEMBLY__ */
#define ENDBR
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -232,6 +232,7 @@
#include <asm/paravirt.h>
#include <asm/reboot.h>
#include <asm/nospec-branch.h>
+#include <asm/ibt.h>
#if defined(CONFIG_APM_DISPLAY_BLANK) && defined(CONFIG_VT)
extern int (*console_blank_hook)(int);
@@ -598,6 +599,7 @@ static long __apm_bios_call(void *_call)
struct desc_struct save_desc_40;
struct desc_struct *gdt;
struct apm_bios_call *call = _call;
+ u64 ibt;
cpu = get_cpu();
BUG_ON(cpu != 0);
@@ -607,11 +609,13 @@ static long __apm_bios_call(void *_call)
apm_irq_save(flags);
firmware_restrict_branch_speculation_start();
+ ibt = ibt_save();
APM_DO_SAVE_SEGS;
apm_bios_call_asm(call->func, call->ebx, call->ecx,
&call->eax, &call->ebx, &call->ecx, &call->edx,
&call->esi);
APM_DO_RESTORE_SEGS;
+ ibt_restore(ibt);
firmware_restrict_branch_speculation_end();
apm_irq_restore(flags);
gdt[0x40 / 8] = save_desc_40;
@@ -676,6 +680,7 @@ static long __apm_bios_call_simple(void
struct desc_struct save_desc_40;
struct desc_struct *gdt;
struct apm_bios_call *call = _call;
+ u64 ibt;
cpu = get_cpu();
BUG_ON(cpu != 0);
@@ -685,10 +690,12 @@ static long __apm_bios_call_simple(void
apm_irq_save(flags);
firmware_restrict_branch_speculation_start();
+ ibt = ibt_save();
APM_DO_SAVE_SEGS;
error = apm_bios_call_simple_asm(call->func, call->ebx, call->ecx,
&call->eax);
APM_DO_RESTORE_SEGS;
+ ibt_restore(ibt);
firmware_restrict_branch_speculation_end();
apm_irq_restore(flags);
gdt[0x40 / 8] = save_desc_40;
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -592,6 +592,34 @@ static __init int setup_disable_pku(char
__setup("nopku", setup_disable_pku);
#endif /* CONFIG_X86_64 */
+#ifdef CONFIG_X86_IBT
+
+u64 ibt_save(void)
+{
+ u64 msr = 0;
+
+ if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+ rdmsrl(MSR_IA32_S_CET, msr);
+ wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
+ }
+
+ return msr;
+}
+
+void ibt_restore(u64 save)
+{
+ u64 msr;
+
+ if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+ rdmsrl(MSR_IA32_S_CET, msr);
+ msr &= ~CET_ENDBR_EN;
+ msr |= (save & CET_ENDBR_EN);
+ wrmsrl(MSR_IA32_S_CET, msr);
+ }
+}
+
+#endif
+
static __always_inline void setup_cet(struct cpuinfo_x86 *c)
{
u64 msr;
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (14 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-19 2:15 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
` (13 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Retpoline and IBT are mutually exclusive: IBT relies on indirect
branches (JMP/CALL *%reg) being present, while retpoline avoids them
by design. Demote retpoline to LFENCE on IBT enabled hardware.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/cpu/bugs.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -937,6 +937,11 @@ static void __init spectre_v2_select_mit
boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
retpoline_amd:
if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+ if (IS_ENABLED(CONFIG_X86_IBT) &&
+ boot_cpu_has(X86_FEATURE_IBT)) {
+ pr_err("Spectre mitigation: LFENCE not serializing, generic retpoline not available due to IBT, switching to none\n");
+ return;
+ }
pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
goto retpoline_generic;
}
@@ -945,6 +950,26 @@ static void __init spectre_v2_select_mit
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
} else {
retpoline_generic:
+ /*
+ * Full retpoline is incompatible with IBT, demote to LFENCE.
+ */
+ if (IS_ENABLED(CONFIG_X86_IBT) &&
+ boot_cpu_has(X86_FEATURE_IBT)) {
+ switch (cmd) {
+ case SPECTRE_V2_CMD_FORCE:
+ case SPECTRE_V2_CMD_AUTO:
+ case SPECTRE_V2_CMD_RETPOLINE:
+ /* silent for auto select */
+ break;
+
+ default:
+ /* warn when 'demoting' an explicit selection */
+ pr_warn("Spectre mitigation: Switching to LFENCE due to IBT\n");
+ break;
+ }
+
+ goto retpoline_amd;
+ }
mode = SPECTRE_V2_RETPOLINE_GENERIC;
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
}
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 17/29] x86/ibt: Annotate text references
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (15 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-19 5:22 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
` (12 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Annotate away some of the generic code references. These are cases
where we take the address of a symbol for exception handling or return
addresses (e.g. context switch).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/entry/entry_64.S | 9 +++++++++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/kernel/alternative.c | 4 +++-
arch/x86/kernel/head_64.S | 4 ++++
arch/x86/kernel/kprobes/core.c | 1 +
arch/x86/kernel/relocate_kernel_64.S | 2 ++
arch/x86/lib/error-inject.c | 1 +
arch/x86/lib/retpoline.S | 2 ++
10 files changed, 33 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -278,6 +278,7 @@ SYM_FUNC_END(__switch_to_asm)
.pushsection .text, "ax"
SYM_CODE_START(ret_from_fork)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR // copy_thread
movq %rax, %rdi
call schedule_tail /* rdi: 'prev' task parameter */
@@ -564,12 +565,16 @@ SYM_CODE_END(\asmsym)
.align 16
.globl __irqentry_text_start
__irqentry_text_start:
+ ANNOTATE_NOENDBR // unwinders
+ ud2;
#include <asm/idtentry.h>
.align 16
.globl __irqentry_text_end
__irqentry_text_end:
+ ANNOTATE_NOENDBR
+ ud2;
SYM_CODE_START_LOCAL(common_interrupt_return)
SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
@@ -647,6 +652,7 @@ SYM_INNER_LABEL_ALIGN(native_iret, SYM_L
#endif
SYM_INNER_LABEL(native_irq_return_iret, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR // exc_double_fault
/*
* This may fault. Non-paranoid faults on return to userspace are
* handled by fixup_bad_iret. These include #SS, #GP, and #NP.
@@ -741,6 +747,7 @@ SYM_FUNC_START(asm_load_gs_index)
FRAME_BEGIN
swapgs
.Lgs_change:
+ ANNOTATE_NOENDBR // error_entry
movl %edi, %gs
2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
swapgs
@@ -1318,6 +1325,7 @@ SYM_CODE_START(asm_exc_nmi)
#endif
repeat_nmi:
+ ANNOTATE_NOENDBR // this code
/*
* If there was a nested NMI, the first NMI's iret will return
* here. But NMIs are still enabled and we can take another
@@ -1346,6 +1354,7 @@ SYM_CODE_START(asm_exc_nmi)
.endr
subq $(5*8), %rsp
end_repeat_nmi:
+ ANNOTATE_NOENDBR // this code
/*
* Everything below this point can be preempted by a nested NMI.
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -148,6 +148,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_af
popfq
jmp .Lsysenter_flags_fixed
SYM_INNER_LABEL(__end_entry_SYSENTER_compat, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR // is_sysenter_singlestep
SYM_CODE_END(entry_SYSENTER_compat)
/*
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -713,6 +713,7 @@ asm (
" .pushsection .init.text, \"ax\", @progbits\n"
" .type int3_magic, @function\n"
"int3_magic:\n"
+ ANNOTATE_NOENDBR
" movl $1, (%" _ASM_ARG1 ")\n"
ASM_RET
" .size int3_magic, .-int3_magic\n"
@@ -757,7 +758,8 @@ static void __init int3_selftest(void)
* then trigger the INT3, padded with NOPs to match a CALL instruction
* length.
*/
- asm volatile ("1: int3; nop; nop; nop; nop\n\t"
+ asm volatile (ANNOTATE_NOENDBR
+ "1: int3; nop; nop; nop; nop\n\t"
".pushsection .init.data,\"aw\"\n\t"
".align " __ASM_SEL(4, 8) "\n\t"
".type int3_selftest_ip, @object\n\t"
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -100,6 +100,7 @@ SYM_CODE_END(startup_64)
SYM_CODE_START(secondary_startup_64)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
/*
* At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded a mapped page table.
@@ -128,6 +129,7 @@ SYM_CODE_START(secondary_startup_64)
*/
SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -193,6 +195,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_
jmp *%rax
1:
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR // above
/*
* We must switch to a new descriptor in kernel space for the GDT
@@ -300,6 +303,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_
pushq %rax # target address in negative space
lretq
.Lafter_lret:
+ ANNOTATE_NOENDBR
SYM_CODE_END(secondary_startup_64)
#include "verify_cpu.S"
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1023,6 +1023,7 @@ asm(
".type __kretprobe_trampoline, @function\n"
"__kretprobe_trampoline:\n"
#ifdef CONFIG_X86_64
+ ANNOTATE_NOENDBR
/* Push a fake return address to tell the unwinder it's a kretprobe. */
" pushq $__kretprobe_trampoline\n"
UNWIND_HINT_FUNC
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -42,6 +42,7 @@
.code64
SYM_CODE_START_NOALIGN(relocate_kernel)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
/*
* %rdi indirection_page
* %rsi page_list
@@ -215,6 +216,7 @@ SYM_CODE_END(identity_mapped)
SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR // RET target, above
movq RSP(%r8), %rsp
movq CR4(%r8), %rax
movq %rax, %cr4
--- a/arch/x86/lib/error-inject.c
+++ b/arch/x86/lib/error-inject.c
@@ -11,6 +11,7 @@ asm(
".type just_return_func, @function\n"
".globl just_return_func\n"
"just_return_func:\n"
+ ANNOTATE_NOENDBR
ASM_RET
".size just_return_func, .-just_return_func\n"
);
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -12,6 +12,8 @@
.section .text.__x86.indirect_thunk
+ ANNOTATE_NOENDBR // apply_retpolines
+
.macro RETPOLINE reg
ANNOTATE_INTRA_FUNCTION_CALL
call .Ldo_rop_\@
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (16 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
` (11 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Notably, the noinline is required to generate sane code; without it
GCC thinks it's awesome to fold a constant into the code reloc, which
puts it in the wrong place to match with the ANNOTATE_NOENDBR.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/ftrace.c | 2 +-
arch/x86/kernel/ftrace_64.S | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -69,7 +69,7 @@ static const char *ftrace_nop_replace(vo
return x86_nops[5];
}
-static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
+static noinline const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
}
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -145,6 +145,7 @@ SYM_FUNC_START(ftrace_caller)
movq %rcx, RSP(%rsp)
SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
/* Load the ftrace_ops into the 3rd parameter */
movq function_trace_op(%rip), %rdx
@@ -155,6 +156,7 @@ SYM_INNER_LABEL(ftrace_caller_op_ptr, SY
movq $0, CS(%rsp)
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
call ftrace_stub
/* Handlers can change the RIP */
@@ -169,6 +171,7 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBA
* layout here.
*/
SYM_INNER_LABEL(ftrace_caller_end, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
jmp ftrace_epilogue
SYM_FUNC_END(ftrace_caller);
@@ -179,6 +182,7 @@ SYM_FUNC_START(ftrace_epilogue)
* It is also used to copy the RET for trampolines.
*/
SYM_INNER_LABEL_ALIGN(ftrace_stub, SYM_L_WEAK)
+ ANNOTATE_NOENDBR
UNWIND_HINT_FUNC
RET
SYM_FUNC_END(ftrace_epilogue)
@@ -192,6 +196,7 @@ SYM_FUNC_START(ftrace_regs_caller)
/* save_mcount_regs fills in first two parameters */
SYM_INNER_LABEL(ftrace_regs_caller_op_ptr, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
/* Load the ftrace_ops into the 3rd parameter */
movq function_trace_op(%rip), %rdx
@@ -221,6 +226,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_op_pt
leaq (%rsp), %rcx
SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
call ftrace_stub
/* Copy flags back to SS, to restore them */
@@ -248,6 +254,7 @@ SYM_INNER_LABEL(ftrace_regs_call, SYM_L_
*/
testq %rax, %rax
SYM_INNER_LABEL(ftrace_regs_caller_jmp, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
jnz 1f
restore_mcount_regs
@@ -261,6 +268,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_jmp,
* to the return.
*/
SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
jmp ftrace_epilogue
/* Swap the flags with orig_rax */
@@ -284,6 +292,7 @@ SYM_FUNC_START(__fentry__)
jnz trace
SYM_INNER_LABEL(ftrace_stub, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
RET
trace:
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 19/29] x86/ibt,xen: Annotate away warnings
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (17 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 20:24 ` Andrew Cooper
2022-02-18 16:49 ` [PATCH 20/29] x86/ibt,sev: Annotations Peter Zijlstra
` (10 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
The xen_iret ENDBR is needed for pre-alternative code calling the
pv_ops using indirect calls.
The rest look like hypervisor entry points, which will be IRET-like
transfers and as such don't need ENDBR.
The hypercall page comes from the hypervisor; there might or might not
be ENDBR there, not our problem.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/entry/entry_64.S | 1 +
arch/x86/kernel/head_64.S | 1 +
arch/x86/xen/xen-asm.S | 8 ++++++++
arch/x86/xen/xen-head.S | 5 +++--
4 files changed, 13 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -818,6 +818,7 @@ SYM_CODE_END(exc_xen_hypervisor_callback
*/
SYM_CODE_START(xen_failsafe_callback)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
movl %ds, %ecx
cmpw %cx, 0x10(%rsp)
jne 1f
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -392,6 +392,7 @@ SYM_CODE_START(early_idt_handler_array)
.endr
UNWIND_HINT_IRET_REGS offset=16 entry=0
SYM_CODE_END(early_idt_handler_array)
+ ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
SYM_CODE_START_LOCAL(early_idt_handler_common)
/*
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -122,6 +122,7 @@ SYM_FUNC_END(xen_read_cr2_direct);
.macro xen_pv_trap name
SYM_CODE_START(xen_\name)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
pop %rcx
pop %r11
jmp \name
@@ -162,6 +163,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
i = 0
.rept NUM_EXCEPTION_VECTORS
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
pop %rcx
pop %r11
jmp early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE
@@ -169,6 +171,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
.fill xen_early_idt_handler_array + i*XEN_EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
.endr
SYM_CODE_END(xen_early_idt_handler_array)
+ ANNOTATE_NOENDBR
__FINIT
hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
@@ -189,6 +192,7 @@ hypercall_iret = hypercall_page + __HYPE
*/
SYM_CODE_START(xen_iret)
UNWIND_HINT_EMPTY
+ ENDBR
pushq $0
jmp hypercall_iret
SYM_CODE_END(xen_iret)
@@ -230,6 +234,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu
/* Normal 64-bit system call target */
SYM_CODE_START(xen_syscall_target)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
popq %rcx
popq %r11
@@ -249,6 +254,7 @@ SYM_CODE_END(xen_syscall_target)
/* 32-bit compat syscall target */
SYM_CODE_START(xen_syscall32_target)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
popq %rcx
popq %r11
@@ -266,6 +272,7 @@ SYM_CODE_END(xen_syscall32_target)
/* 32-bit compat sysenter target */
SYM_CODE_START(xen_sysenter_target)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
/*
* NB: Xen is polite and clears TF from EFLAGS for us. This means
* that we don't need to guard against single step exceptions here.
@@ -289,6 +296,7 @@ SYM_CODE_END(xen_sysenter_target)
SYM_CODE_START(xen_syscall32_target)
SYM_CODE_START(xen_sysenter_target)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
lea 16(%rsp), %rsp /* strip %rcx, %r11 */
mov $-ENOSYS, %rax
pushq $0
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,8 +25,8 @@
SYM_CODE_START(hypercall_page)
.rept (PAGE_SIZE / 32)
UNWIND_HINT_FUNC
- .skip 31, 0x90
- RET
+ ANNOTATE_NOENDBR
+ .skip 32, 0xcc
.endr
#define HYPERCALL(n) \
@@ -74,6 +74,7 @@ SYM_CODE_END(startup_xen)
.pushsection .text
SYM_CODE_START(asm_cpu_bringup_and_idle)
UNWIND_HINT_EMPTY
+ ANNOTATE_NOENDBR
call cpu_bringup_and_idle
SYM_CODE_END(asm_cpu_bringup_and_idle)
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 20/29] x86/ibt,sev: Annotations
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (18 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
` (9 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
No IBT on AMD so far... probably correct, who knows.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/entry/entry_64.S | 1 +
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/kernel/head_64.S | 1 +
3 files changed, 3 insertions(+)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -96,6 +96,7 @@ SYM_CODE_START(entry_SYSCALL_64)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -214,6 +214,7 @@ SYM_CODE_START(entry_SYSCALL_compat)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
/* Construct struct pt_regs on stack */
pushq $__USER32_DS /* pt_regs->ss */
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -440,6 +440,7 @@ SYM_CODE_END(early_idt_handler_common)
*/
SYM_CODE_START_NOALIGN(vc_no_ghcb)
UNWIND_HINT_IRET_REGS offset=8
+ ENDBR
/* Build pt_regs */
PUSH_AND_CLEAR_REGS
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (19 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 20/29] x86/ibt,sev: Annotations Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-26 19:42 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool Peter Zijlstra
` (8 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
In order to prepare for LTO-like objtool runs for modules, rename the
--duplicate argument to --lto.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
scripts/link-vmlinux.sh | 2 +-
tools/objtool/builtin-check.c | 4 ++--
tools/objtool/check.c | 7 ++++++-
tools/objtool/include/objtool/builtin.h | 2 +-
4 files changed, 10 insertions(+), 5 deletions(-)
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -115,7 +115,7 @@ objtool_link()
objtoolcmd="orc generate"
fi
- objtoolopt="${objtoolopt} --duplicate"
+ objtoolopt="${objtoolopt} --lto"
if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
objtoolopt="${objtoolopt} --mcount"
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
#include <objtool/objtool.h>
bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
+ lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
static const char * const check_usage[] = {
"objtool check [<options>] file.o",
@@ -40,7 +40,7 @@ const struct option check_options[] = {
OPT_BOOLEAN('b', "backtrace", &backtrace, "unwind on error"),
OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"),
OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
- OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"),
+ OPT_BOOLEAN(0, "lto", <o, "whole-archive like runs"),
OPT_BOOLEAN('n', "noinstr", &noinstr, "noinstr validation for vmlinux.o"),
OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3501,6 +3501,11 @@ int check(struct objtool_file *file)
{
int ret, warnings = 0;
+ if (lto && !(vmlinux || module)) {
+ fprintf(stderr, "--lto requires: --vmlinux or --module\n");
+ return 1;
+ }
+
arch_initial_func_cfi_state(&initial_func_cfi);
init_cfi_state(&init_cfi);
init_cfi_state(&func_cfi);
@@ -3521,7 +3526,7 @@ int check(struct objtool_file *file)
if (list_empty(&file->insn_list))
goto out;
- if (vmlinux && !validate_dup) {
+ if (vmlinux && !lto) {
ret = validate_vmlinux_functions(file);
if (ret < 0)
goto out;
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@
extern const struct option check_options[];
extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
+ lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (20 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 23/29] objtool: Read the NOENDBR annotation Peter Zijlstra
` (7 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Massage the Kbuild stuff to allow running objtool on whole modules.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Makefile | 2 ++
scripts/Makefile.build | 25 ++++++++++++++++---------
scripts/Makefile.lib | 2 +-
3 files changed, 19 insertions(+), 10 deletions(-)
--- a/Makefile
+++ b/Makefile
@@ -907,6 +907,8 @@ ifdef CONFIG_LTO
KBUILD_CFLAGS += -fno-lto $(CC_FLAGS_LTO)
KBUILD_AFLAGS += -fno-lto
export CC_FLAGS_LTO
+BUILD_LTO := y
+export BUILD_LTO
endif
ifdef CONFIG_CFI_CLANG
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -88,7 +88,7 @@ endif
targets-for-modules := $(patsubst %.o, %.mod, $(filter %.o, $(obj-m)))
-ifdef CONFIG_LTO_CLANG
+ifdef BUILD_LTO
targets-for-modules += $(patsubst %.o, %.lto.o, $(filter %.o, $(obj-m)))
endif
@@ -230,6 +230,7 @@ objtool := $(objtree)/tools/objtool/objt
objtool_args = \
$(if $(CONFIG_UNWINDER_ORC),orc generate,check) \
$(if $(part-of-module), --module) \
+ $(if $(BUILD_LTO), --lto) \
$(if $(CONFIG_FRAME_POINTER),, --no-fp) \
$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), --no-unreachable)\
$(if $(CONFIG_RETPOLINE), --retpoline) \
@@ -242,11 +243,16 @@ cmd_gen_objtooldep = $(if $(objtool-enab
endif # CONFIG_STACK_VALIDATION
-ifdef CONFIG_LTO_CLANG
+ifdef BUILD_LTO
# Skip objtool for LLVM bitcode
$(obj)/%.o: objtool-enabled :=
+# objtool was skipped for LLVM bitcode, run it now that we have compiled
+# modules into native code
+$(obj)/%.lto.o: objtool-enabled = y
+$(obj)/%.lto.o: part-of-module := y
+
else
# 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
@@ -292,21 +298,22 @@ ifdef CONFIG_LTO_CLANG
# Module .o files may contain LLVM bitcode, compile them into native code
# before ELF processing
quiet_cmd_cc_lto_link_modules = LTO [M] $@
-cmd_cc_lto_link_modules = \
+ cmd_cc_lto_link_modules = \
$(LD) $(ld_flags) -r -o $@ \
$(shell [ -s $(@:.lto.o=.o.symversions) ] && \
echo -T $(@:.lto.o=.o.symversions)) \
--whole-archive $(filter-out FORCE,$^) \
$(cmd_objtool)
-
-# objtool was skipped for LLVM bitcode, run it now that we have compiled
-# modules into native code
-$(obj)/%.lto.o: objtool-enabled = y
-$(obj)/%.lto.o: part-of-module := y
+else
+quiet_cmd_cc_lto_link_modules = LD [M] $@
+ cmd_cc_lto_link_modules = \
+ $(LD) $(ld_flags) -r -o $@ \
+ $(filter-out FORCE,$^) \
+ $(cmd_objtool)
+endif
$(obj)/%.lto.o: $(obj)/%.o FORCE
$(call if_changed,cc_lto_link_modules)
-endif
cmd_mod = { \
echo $(if $($*-objs)$($*-y)$($*-m), $(addprefix $(obj)/, $($*-objs) $($*-y) $($*-m)), $(@:.mod=.o)); \
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -230,7 +230,7 @@ dtc_cpp_flags = -Wp,-MMD,$(depfile).pre
$(addprefix -I,$(DTC_INCLUDE)) \
-undef -D__DTS__
-ifeq ($(CONFIG_LTO_CLANG),y)
+ifdef BUILD_LTO
# With CONFIG_LTO_CLANG, .o files in modules might be LLVM bitcode, so we
# need to run LTO to compile them into native code (.lto.o) before further
# processing.
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 23/29] objtool: Read the NOENDBR annotation
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (21 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
` (6 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Read the new NOENDBR annotation. While there, attempt to not bloat
struct instruction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
tools/objtool/check.c | 27 +++++++++++++++++++++++++++
tools/objtool/include/objtool/check.h | 13 ++++++++++---
2 files changed, 37 insertions(+), 3 deletions(-)
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1860,6 +1860,29 @@ static int read_unwind_hints(struct objt
return 0;
}
+static int read_noendbr_hints(struct objtool_file *file)
+{
+ struct section *sec;
+ struct instruction *insn;
+ struct reloc *reloc;
+
+ sec = find_section_by_name(file->elf, ".rela.discard.noendbr");
+ if (!sec)
+ return 0;
+
+ list_for_each_entry(reloc, &sec->reloc_list, list) {
+ insn = find_insn(file, reloc->sym->sec, reloc->sym->offset + reloc->addend);
+ if (!insn) {
+ WARN("bad .discard.noendbr entry");
+ return -1;
+ }
+
+ insn->noendbr = 1;
+ }
+
+ return 0;
+}
+
static int read_retpoline_hints(struct objtool_file *file)
{
struct section *sec;
@@ -2097,6 +2120,10 @@ static int decode_sections(struct objtoo
if (ret)
return ret;
+ ret = read_noendbr_hints(file);
+ if (ret)
+ return ret;
+
/*
* Must be before add_{jump_call}_destination.
*/
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -45,11 +45,18 @@ struct instruction {
unsigned int len;
enum insn_type type;
unsigned long immediate;
- bool dead_end, ignore, ignore_alts;
- bool hint;
- bool retpoline_safe;
+
+ u8 dead_end : 1,
+ ignore : 1,
+ ignore_alts : 1,
+ hint : 1,
+ retpoline_safe : 1,
+ noendbr : 1;
+ /* 2 bit hole */
s8 instr;
u8 visited;
+ /* u8 hole */
+
struct alt_group *alt_group;
struct symbol *call_dest;
struct instruction *jump_dest;
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (22 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 23/29] objtool: Read the NOENDBR annotation Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-24 1:18 ` Joao Moreira
2022-02-18 16:49 ` [PATCH 25/29] x86/ibt: Dont generate ENDBR in .discard.text Peter Zijlstra
` (5 subsequent siblings)
29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Make sure we don't generate direct JMP/CALL instructions to an ENDBR
instruction (which might be poison).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/text-patching.h | 6 ++++++
1 file changed, 6 insertions(+)
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -5,6 +5,7 @@
#include <linux/types.h>
#include <linux/stddef.h>
#include <asm/ptrace.h>
+#include <asm/ibt.h>
struct paravirt_patch_site;
#ifdef CONFIG_PARAVIRT
@@ -101,6 +102,11 @@ void *text_gen_insn(u8 opcode, const voi
static union text_poke_insn insn; /* per instance */
int size = text_opcode_size(opcode);
+#ifdef CONFIG_X86_IBT
+ if (is_endbr(dest))
+ dest += 4;
+#endif
+
insn.opcode = opcode;
if (size > 1) {
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 25/29] x86/ibt: Dont generate ENDBR in .discard.text
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (23 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 26/29] objtool: Add IBT validation / fixups Peter Zijlstra
` (4 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Having ENDBR in discarded sections can easily lead to relocations into
discarded sections, which the linkers aren't really fond of. Objtool
also shouldn't generate them, but why tempt fate.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/setup.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -8,6 +8,7 @@
#include <linux/linkage.h>
#include <asm/page_types.h>
+#include <asm/ibt.h>
#ifdef __i386__
@@ -119,7 +120,7 @@ void *extend_brk(size_t size, size_t ali
* executable.)
*/
#define RESERVE_BRK(name,sz) \
- static void __section(".discard.text") __used notrace \
+ static void __section(".discard.text") __noendbr __used notrace \
__brk_reservation_fn_##name##__(void) { \
asm volatile ( \
".pushsection .brk_reservation,\"aw\",@nobits;" \
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 26/29] objtool: Add IBT validation / fixups
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (24 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 25/29] x86/ibt: Dont generate ENDBR in .discard.text Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
` (3 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Objtool based IBT validation in 3 passes:
--ibt:
Report code relocs that are not JMP/CALL and don't point to ENDBR
--ibt-fix-direct:
Detect and rewrite any code/reloc from a JMP/CALL instruction
to an ENDBR instruction. This is basically a compiler bug since
neither needs the ENDBR and decoding it is a pure waste of time.
--ibt-seal:
Find superfluous ENDBR instructions. Any function that
doesn't have its address taken should not have an ENDBR
instruction. This removes about 1-in-4 ENDBR instructions.
All these flags are LTO-like and require '--lto' to run.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/vmlinux.lds.S | 9
tools/objtool/arch/x86/decode.c | 82 +++++++
tools/objtool/builtin-check.c | 6
tools/objtool/check.c | 356 ++++++++++++++++++++++++++++++--
tools/objtool/include/objtool/arch.h | 3
tools/objtool/include/objtool/builtin.h | 3
tools/objtool/include/objtool/objtool.h | 4
tools/objtool/objtool.c | 1
8 files changed, 441 insertions(+), 23 deletions(-)
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -285,6 +285,15 @@ SECTIONS
}
#endif
+#ifdef CONFIG_X86_IBT
+ . = ALIGN(8);
+ .ibt_endbr_sites : AT(ADDR(.ibt_endbr_sites) - LOAD_OFFSET) {
+ __ibt_endbr_sites = .;
+ *(.ibt_endbr_sites)
+ __ibt_endbr_sites_end = .;
+ }
+#endif
+
/*
* struct alt_inst entries. From the header (alternative.h):
* "Alternative instructions for different CPU types or capabilities"
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objto
const struct elf *elf = file->elf;
struct insn insn;
int x86_64, ret;
- unsigned char op1, op2, op3,
+ unsigned char op1, op2, op3, prefix,
rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +137,8 @@ int arch_decode_instruction(struct objto
if (insn.vex_prefix.nbytes)
return 0;
+ prefix = insn.prefixes.bytes[0];
+
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];
op3 = insn.opcode.bytes[2];
@@ -492,6 +494,12 @@ int arch_decode_instruction(struct objto
/* nopl/nopw */
*type = INSN_NOP;
+ } else if (op2 == 0x1e) {
+
+ if (prefix == 0xf3 && (modrm == 0xfa || modrm == 0xfb))
+ *type = INSN_ENDBR;
+
+
} else if (op2 == 0x38 && op3 == 0xf8) {
if (insn.prefixes.nbytes == 1 &&
insn.prefixes.bytes[0] == 0xf2) {
@@ -605,6 +613,7 @@ int arch_decode_instruction(struct objto
op->dest.type = OP_DEST_REG;
op->dest.reg = CFI_SP;
}
+ *type = INSN_IRET;
break;
}
@@ -705,6 +714,77 @@ const char *arch_nop_insn(int len)
return nops[len-1];
}
+const char *arch_mod_immediate(struct instruction *insn, unsigned long target)
+{
+ struct section *sec = insn->sec;
+ Elf_Data *data = sec->data;
+ unsigned char op1, op2;
+ static char bytes[16];
+ struct insn x86_insn;
+ int ret, disp;
+
+ disp = (long)(target - (insn->offset + insn->len));
+
+ if (data->d_type != ELF_T_BYTE || data->d_off) {
+ WARN("unexpected data for section: %s", sec->name);
+ return NULL;
+ }
+
+ ret = insn_decode(&x86_insn, data->d_buf + insn->offset, insn->len,
+ INSN_MODE_64);
+ if (ret < 0) {
+ WARN("can't decode instruction at %s:0x%lx", sec->name, insn->offset);
+ return NULL;
+ }
+
+ op1 = x86_insn.opcode.bytes[0];
+ op2 = x86_insn.opcode.bytes[1];
+
+ switch (op1) {
+ case 0x0f: /* escape */
+ switch (op2) {
+ case 0x80 ... 0x8f: /* jcc.d32 */
+ if (insn->len != 6)
+ return NULL;
+ bytes[0] = op1;
+ bytes[1] = op2;
+ *(int *)&bytes[2] = disp;
+ break;
+
+ default:
+ return NULL;
+ }
+ break;
+
+ case 0x70 ... 0x7f: /* jcc.d8 */
+ case 0xeb: /* jmp.d8 */
+ if (insn->len != 2)
+ return NULL;
+
+ if (disp >> 7 != disp >> 31) {
+ WARN("displacement doesn't fit\n");
+ return NULL;
+ }
+
+ bytes[0] = op1;
+ bytes[1] = disp & 0xff;
+ break;
+
+ case 0xe8: /* call */
+ case 0xe9: /* jmp.d32 */
+ if (insn->len != 5)
+ return NULL;
+ bytes[0] = op1;
+ *(int *)&bytes[1] = disp;
+ break;
+
+ default:
+ return NULL;
+ }
+
+ return bytes;
+}
+
#define BYTE_RET 0xC3
const char *arch_ret_insn(int len)
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,8 @@
#include <objtool/objtool.h>
bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
+ lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
+ ibt, ibt_fix_direct, ibt_seal;
static const char * const check_usage[] = {
"objtool check [<options>] file.o",
@@ -47,6 +48,9 @@ const struct option check_options[] = {
OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
+ OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
+ OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
+ OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "list superfluous ENDBR instructions"),
OPT_END(),
};
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -380,6 +380,7 @@ static int decode_instructions(struct ob
memset(insn, 0, sizeof(*insn));
INIT_LIST_HEAD(&insn->alts);
INIT_LIST_HEAD(&insn->stack_ops);
+ INIT_LIST_HEAD(&insn->call_node);
insn->sec = sec;
insn->offset = offset;
@@ -731,6 +732,58 @@ static int create_retpoline_sites_sectio
return 0;
}
+static int create_ibt_endbr_sites_sections(struct objtool_file *file)
+{
+ struct instruction *insn;
+ struct section *sec;
+ int idx;
+
+ sec = find_section_by_name(file->elf, ".ibt_endbr_sites");
+ if (sec) {
+ WARN("file already has .ibt_endbr_sites, skipping");
+ return 0;
+ }
+
+ idx = 0;
+ list_for_each_entry(insn, &file->endbr_list, call_node)
+ idx++;
+
+ if (stats) {
+ printf("ibt: ENDBR at function start: %d\n", file->nr_endbr);
+ printf("ibt: ENDBR inside functions: %d\n", file->nr_endbr_int);
+ printf("ibt: superfluous ENDBR: %d\n", idx);
+ }
+
+ if (!idx)
+ return 0;
+
+ sec = elf_create_section(file->elf, ".ibt_endbr_sites", 0,
+ sizeof(int), idx);
+ if (!sec) {
+ WARN("elf_create_section: .ibt_endbr_sites");
+ return -1;
+ }
+
+ idx = 0;
+ list_for_each_entry(insn, &file->endbr_list, call_node) {
+
+ int *site = (int *)sec->data->d_buf + idx;
+ *site = 0;
+
+ if (elf_add_reloc_to_insn(file->elf, sec,
+ idx * sizeof(int),
+ R_X86_64_PC32,
+ insn->sec, insn->offset)) {
+ WARN("elf_add_reloc_to_insn: .ibt_endbr_sites");
+ return -1;
+ }
+
+ idx++;
+ }
+
+ return 0;
+}
+
static int create_mcount_loc_sections(struct objtool_file *file)
{
struct section *sec;
@@ -1176,6 +1229,15 @@ static int add_jump_destinations(struct
unsigned long dest_off;
for_each_insn(file, insn) {
+ if (insn->type == INSN_ENDBR && insn->func) {
+ if (insn->offset == insn->func->offset) {
+ list_add_tail(&insn->call_node, &file->endbr_list);
+ file->nr_endbr++;
+ } else {
+ file->nr_endbr_int++;
+ }
+ }
+
if (!is_static_jump(insn))
continue;
@@ -1192,10 +1254,14 @@ static int add_jump_destinations(struct
} else if (insn->func) {
/* internal or external sibling call (with reloc) */
add_call_dest(file, insn, reloc->sym, true);
- continue;
+
+ dest_sec = reloc->sym->sec;
+ dest_off = reloc->sym->offset +
+ arch_dest_reloc_offset(reloc->addend);
+
} else if (reloc->sym->sec->idx) {
dest_sec = reloc->sym->sec;
- dest_off = reloc->sym->sym.st_value +
+ dest_off = reloc->sym->offset +
arch_dest_reloc_offset(reloc->addend);
} else {
/* non-func asm code jumping to another file */
@@ -1205,6 +1271,10 @@ static int add_jump_destinations(struct
insn->jump_dest = find_insn(file, dest_sec, dest_off);
if (!insn->jump_dest) {
+ /* external symbol */
+ if (!vmlinux && insn->func)
+ continue;
+
/*
* This is a special case where an alt instruction
* jumps past the end of the section. These are
@@ -1219,6 +1289,32 @@ static int add_jump_destinations(struct
return -1;
}
+ if (ibt && insn->jump_dest->type == INSN_ENDBR &&
+ insn->jump_dest->func &&
+ insn->jump_dest->offset == insn->jump_dest->func->offset) {
+ if (reloc) {
+ if (ibt_fix_direct) {
+ reloc->addend += 4;
+ elf_write_reloc(file->elf, reloc);
+ } else {
+ WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+ }
+ } else {
+ if (ibt_fix_direct) {
+ const char *bytes = arch_mod_immediate(insn, dest_off + 4);
+ if (bytes) {
+ elf_write_insn(file->elf, insn->sec,
+ insn->offset, insn->len,
+ bytes);
+ } else {
+ WARN_FUNC("Direct IMM jump to ENDBR; cannot fix", insn->sec, insn->offset);
+ }
+ } else {
+ WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+ }
+ }
+ }
+
/*
* Cross-function jump.
*/
@@ -1246,7 +1342,8 @@ static int add_jump_destinations(struct
insn->jump_dest->func->pfunc = insn->func;
} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
- insn->jump_dest->offset == insn->jump_dest->func->offset) {
+ ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
+ (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
/* internal sibling call (without reloc) */
add_call_dest(file, insn, insn->jump_dest->func, true);
}
@@ -1256,23 +1353,12 @@ static int add_jump_destinations(struct
return 0;
}
-static struct symbol *find_call_destination(struct section *sec, unsigned long offset)
-{
- struct symbol *call_dest;
-
- call_dest = find_func_by_offset(sec, offset);
- if (!call_dest)
- call_dest = find_symbol_by_offset(sec, offset);
-
- return call_dest;
-}
-
/*
* Find the destination instructions for all calls.
*/
static int add_call_destinations(struct objtool_file *file)
{
- struct instruction *insn;
+ struct instruction *insn, *target = NULL;
unsigned long dest_off;
struct symbol *dest;
struct reloc *reloc;
@@ -1284,7 +1370,21 @@ static int add_call_destinations(struct
reloc = insn_reloc(file, insn);
if (!reloc) {
dest_off = arch_jump_destination(insn);
- dest = find_call_destination(insn->sec, dest_off);
+
+ target = find_insn(file, insn->sec, dest_off);
+ if (!target) {
+ WARN_FUNC("direct call to nowhere", insn->sec, insn->offset);
+ return -1;
+ }
+ dest = target->func;
+ if (!dest)
+ dest = find_symbol_containing(insn->sec, dest_off);
+ if (!dest) {
+ WARN_FUNC("IMM can't find call dest symbol at %s+0x%lx",
+ insn->sec, insn->offset,
+ insn->sec->name, dest_off);
+ return -1;
+ }
add_call_dest(file, insn, dest, false);
@@ -1303,10 +1403,25 @@ static int add_call_destinations(struct
}
} else if (reloc->sym->type == STT_SECTION) {
- dest_off = arch_dest_reloc_offset(reloc->addend);
- dest = find_call_destination(reloc->sym->sec, dest_off);
+ struct section *dest_sec;
+
+ dest_sec = reloc->sym->sec;
+ dest_off = reloc->sym->offset +
+ arch_dest_reloc_offset(reloc->addend);
+
+ target = find_insn(file, dest_sec, dest_off);
+ if (target) {
+ dest = target->func;
+ if (!dest)
+ dest = find_symbol_containing(dest_sec, dest_off);
+ } else {
+ WARN("foo");
+ dest = find_func_by_offset(dest_sec, dest_off);
+ if (!dest)
+ dest = find_symbol_by_offset(dest_sec, dest_off);
+ }
if (!dest) {
- WARN_FUNC("can't find call dest symbol at %s+0x%lx",
+ WARN_FUNC("RELOC can't find call dest symbol at %s+0x%lx",
insn->sec, insn->offset,
reloc->sym->sec->name,
dest_off);
@@ -1317,9 +1432,43 @@ static int add_call_destinations(struct
} else if (reloc->sym->retpoline_thunk) {
add_retpoline_call(file, insn);
+ continue;
+
+ } else {
+ struct section *dest_sec;
+
+ dest_sec = reloc->sym->sec;
+ dest_off = reloc->sym->offset +
+ arch_dest_reloc_offset(reloc->addend);
+
+ target = find_insn(file, dest_sec, dest_off);
- } else
add_call_dest(file, insn, reloc->sym, false);
+ }
+
+ if (ibt && target && target->type == INSN_ENDBR) {
+ if (reloc) {
+ if (ibt_fix_direct) {
+ reloc->addend += 4;
+ elf_write_reloc(file->elf, reloc);
+ } else {
+ WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+ }
+ } else {
+ if (ibt_fix_direct) {
+ const char *bytes = arch_mod_immediate(insn, dest_off + 4);
+ if (bytes) {
+ elf_write_insn(file->elf, insn->sec,
+ insn->offset, insn->len,
+ bytes);
+ } else {
+ WARN_FUNC("Direct IMM call to ENDBR; cannot fix", insn->sec, insn->offset);
+ }
+ } else {
+ WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+ }
+ }
+ }
}
return 0;
@@ -3054,6 +3203,8 @@ static struct instruction *next_insn_to_
return next_insn_same_sec(file, insn);
}
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);
+
/*
* Follow the branch starting at the given instruction, and recursively follow
* any other branches (jumps). Meanwhile, track the frame pointer state at
@@ -3102,6 +3253,12 @@ static int validate_branch(struct objtoo
if (insn->hint) {
state.cfi = *insn->cfi;
+ if (ibt) {
+ if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY &&
+ insn->type != INSN_ENDBR) {
+ WARN_FUNC("IRET_ENTRY hint without ENDBR", insn->sec, insn->offset);
+ }
+ }
} else {
/* XXX track if we actually changed state.cfi */
@@ -3261,7 +3418,12 @@ static int validate_branch(struct objtoo
state.df = false;
break;
+ case INSN_NOP:
+ break;
+
default:
+ if (ibt)
+ validate_ibt_insn(file, insn);
break;
}
@@ -3507,6 +3669,131 @@ static int validate_functions(struct obj
return warnings;
}
+static struct instruction *
+validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
+{
+ struct instruction *dest;
+ struct section *sec;
+ unsigned long off;
+
+ sec = reloc->sym->sec;
+ off = reloc->sym->offset + reloc->addend;
+
+ dest = find_insn(file, sec, off);
+ if (!dest)
+ return NULL;
+
+ if (dest->type == INSN_ENDBR) {
+ if (!list_empty(&dest->call_node))
+ list_del_init(&dest->call_node);
+
+ return NULL;
+ }
+
+ if (reloc->sym->static_call_tramp)
+ return NULL;
+
+ return dest;
+}
+
+static void validate_ibt_target(struct objtool_file *file, struct instruction *insn,
+ struct instruction *target)
+{
+ if (target->func && target->func == insn->func) {
+ /*
+ * Anything from->to self is either _THIS_IP_ or IRET-to-self.
+ *
+ * There is no sane way to annotate _THIS_IP_ since the compiler treats the
+ * relocation as a constant and is happy to fold in offsets, skewing any
+ * annotation we do, leading to vast amounts of false-positives.
+ *
+ * There's also compiler generated _THIS_IP_ through KCOV and
+ * such which we have no hope of annotating.
+ *
+ * As such, blanket accept self-references without issue.
+ */
+ return;
+ }
+
+ /*
+ * Annotated non-control flow target.
+ */
+ if (target->noendbr)
+ return;
+
+ WARN_FUNC("relocation to !ENDBR: %s+0x%lx",
+ insn->sec, insn->offset,
+ target->func ? target->func->name : target->sec->name,
+ target->func ? target->offset - target->func->offset : target->offset);
+}
+
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
+{
+ struct reloc *reloc = insn_reloc(file, insn);
+ struct instruction *target;
+
+ for (;;) {
+ if (!reloc)
+ return;
+
+ target = validate_ibt_reloc(file, reloc);
+ if (target)
+ validate_ibt_target(file, insn, target);
+
+ reloc = find_reloc_by_dest_range(file->elf, insn->sec, reloc->offset + 1,
+ (insn->offset + insn->len) - (reloc->offset + 1));
+ }
+}
+
+static int validate_ibt(struct objtool_file *file)
+{
+ struct section *sec;
+ struct reloc *reloc;
+
+ for_each_sec(file, sec) {
+ bool is_data;
+
+ /* already done in validate_branch() */
+ if (sec->sh.sh_flags & SHF_EXECINSTR)
+ continue;
+
+ if (!sec->reloc)
+ continue;
+
+ if (!strncmp(sec->name, ".orc", 4))
+ continue;
+
+ if (!strncmp(sec->name, ".discard", 8))
+ continue;
+
+ if (!strncmp(sec->name, ".debug", 6))
+ continue;
+
+ if (!strcmp(sec->name, "_error_injection_whitelist"))
+ continue;
+
+ if (!strcmp(sec->name, "_kprobe_blacklist"))
+ continue;
+
+ is_data = strstr(sec->name, ".data") || strstr(sec->name, ".rodata");
+
+ list_for_each_entry(reloc, &sec->reloc->reloc_list, list) {
+ struct instruction *target;
+
+ target = validate_ibt_reloc(file, reloc);
+ if (is_data && target && !target->noendbr) {
+ WARN_FUNC("data relocation to !ENDBR: %s+0x%lx",
+ reloc->sym->sec,
+ reloc->sym->offset + reloc->addend,
+ target->func ? target->func->name : target->sec->name,
+ target->func ? target->offset - target->func->offset : target->offset);
+ }
+ }
+ }
+
+ return 0;
+}
+
static int validate_reachable_instructions(struct objtool_file *file)
{
struct instruction *insn;
@@ -3534,6 +3821,21 @@ int check(struct objtool_file *file)
return 1;
}
+ if (ibt && !lto) {
+ fprintf(stderr, "--ibt requires: --lto\n");
+ return 1;
+ }
+
+ if (ibt_fix_direct && !ibt) {
+ fprintf(stderr, "--ibt-fix-direct requires: --ibt\n");
+ return 1;
+ }
+
+ if (ibt_seal && !ibt_fix_direct) {
+ fprintf(stderr, "--ibt-seal requires: --ibt-fix-direct\n");
+ return 1;
+ }
+
arch_initial_func_cfi_state(&initial_func_cfi);
init_cfi_state(&init_cfi);
init_cfi_state(&func_cfi);
@@ -3580,6 +3882,13 @@ int check(struct objtool_file *file)
goto out;
warnings += ret;
+ if (ibt) {
+ ret = validate_ibt(file);
+ if (ret < 0)
+ goto out;
+ warnings += ret;
+ }
+
if (!warnings) {
ret = validate_reachable_instructions(file);
if (ret < 0)
@@ -3604,6 +3913,13 @@ int check(struct objtool_file *file)
if (ret < 0)
goto out;
warnings += ret;
+ }
+
+ if (ibt_seal) {
+ ret = create_ibt_endbr_sites_sections(file);
+ if (ret < 0)
+ goto out;
+ warnings += ret;
}
if (stats) {
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -27,6 +27,8 @@ enum insn_type {
INSN_STD,
INSN_CLD,
INSN_TRAP,
+ INSN_ENDBR,
+ INSN_IRET,
INSN_OTHER,
};
@@ -84,6 +86,7 @@ unsigned long arch_dest_reloc_offset(int
const char *arch_nop_insn(int len);
const char *arch_ret_insn(int len);
+const char *arch_mod_immediate(struct instruction *insn, unsigned long target);
int arch_decode_hint_reg(u8 sp_reg, int *base);
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,8 @@
extern const struct option check_options[];
extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
+ lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
+ ibt, ibt_fix_direct, ibt_seal;
extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -26,8 +26,12 @@ struct objtool_file {
struct list_head retpoline_call_list;
struct list_head static_call_list;
struct list_head mcount_loc_list;
+ struct list_head endbr_list;
bool ignore_unreachables, c_file, hints, rodata;
+ unsigned int nr_endbr;
+ unsigned int nr_endbr_int;
+
unsigned long jl_short, jl_long;
unsigned long jl_nop_short, jl_nop_long;
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -128,6 +128,7 @@ struct objtool_file *objtool_open_read(c
INIT_LIST_HEAD(&file.retpoline_call_list);
INIT_LIST_HEAD(&file.static_call_list);
INIT_LIST_HEAD(&file.mcount_loc_list);
+ INIT_LIST_HEAD(&file.endbr_list);
file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment");
file.ignore_unreachables = no_unreachable;
file.hints = false;
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (25 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 26/29] objtool: Add IBT validation / fixups Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 28/29] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
` (2 subsequent siblings)
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Since modules are, by construction, not fully linked objects, the
LTO-like objtool pass cannot fix up the direct calls to external
functions.
Have the module loader finish the job.
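The fix-up itself amounts to: if the 4 bytes at the relocation target are an ENDBR, advance the target past it before computing the PC-relative displacement. A hedged Python sketch of just that arithmetic (a toy byte buffer standing in for module text, not the kernel's get_kernel_nofault() path):

```python
ENDBR64 = bytes.fromhex("f30f1efa")  # the endbr64 byte sequence

def ibt_fix_direct(code: bytes, val: int) -> int:
    """If the call/jmp target at offset 'val' starts with endbr64,
    skip past it, mirroring the +4 adjustment in the patch."""
    if code[val:val + 4] == ENDBR64:
        return val + 4  # land on the instruction after ENDBR
    return val

# toy "text": endbr64 followed by a ret at offset 4
text = ENDBR64 + b"\xc3"
print(ibt_fix_direct(text, 0))  # -> 4
```

A target without an ENDBR prefix is left untouched, matching the early `return` paths in the real `ibt_fix_direct()`.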
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/module.c | 40 +++++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -24,6 +24,7 @@
#include <asm/page.h>
#include <asm/setup.h>
#include <asm/unwind.h>
+#include <asm/ibt.h>
#if 0
#define DEBUGP(fmt, ...) \
@@ -128,6 +129,33 @@ int apply_relocate(Elf32_Shdr *sechdrs,
return 0;
}
#else /*X86_64*/
+
+static inline void ibt_fix_direct(void *loc, u64 *val)
+{
+#ifdef CONFIG_X86_IBT
+ const void *addr = (void *)(4 + *val);
+ union text_poke_insn text;
+ u32 insn;
+
+ if (get_kernel_nofault(insn, addr))
+ return;
+
+ if (!is_endbr(&insn))
+ return;
+
+ /* validate jmp.d32/call @ loc */
+ if (WARN_ONCE(get_kernel_nofault(text, loc-1) ||
+ (text.opcode != CALL_INSN_OPCODE &&
+ text.opcode != JMP32_INSN_OPCODE),
+ "Unexpected code at: %pS\n", loc))
+ return;
+
+ DEBUGP("ibt_fix_direct: %pS\n", addr);
+
+ *val += 4;
+#endif
+}
+
static int __apply_relocate_add(Elf64_Shdr *sechdrs,
const char *strtab,
unsigned int symindex,
@@ -139,6 +167,7 @@ static int __apply_relocate_add(Elf64_Sh
Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
Elf64_Sym *sym;
void *loc;
+ int type;
u64 val;
DEBUGP("Applying relocate section %u to %u\n",
@@ -153,13 +182,14 @@ static int __apply_relocate_add(Elf64_Sh
sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
+ ELF64_R_SYM(rel[i].r_info);
+ type = ELF64_R_TYPE(rel[i].r_info);
+
DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
- (int)ELF64_R_TYPE(rel[i].r_info),
- sym->st_value, rel[i].r_addend, (u64)loc);
+ type, sym->st_value, rel[i].r_addend, (u64)loc);
val = sym->st_value + rel[i].r_addend;
- switch (ELF64_R_TYPE(rel[i].r_info)) {
+ switch (type) {
case R_X86_64_NONE:
break;
case R_X86_64_64:
@@ -185,6 +215,10 @@ static int __apply_relocate_add(Elf64_Sh
case R_X86_64_PLT32:
if (*(u32 *)loc != 0)
goto invalid_relocation;
+
+ if (type == R_X86_64_PLT32)
+ ibt_fix_direct(loc, &val);
+
val -= (u64)loc;
write(loc, &val, 4);
#if 0
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 28/29] x86/ibt: Ensure module init/exit points have references
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (26 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
2022-02-19 1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Since the module init/exit points only have external
references, a module LTO run will consider them 'unused' and seal
them, leading to an immediate failure on module load.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/cfi.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/include/linux/cfi.h
+++ b/include/linux/cfi.h
@@ -34,8 +34,17 @@ static inline void cfi_module_remove(str
#else /* !CONFIG_CFI_CLANG */
-#define __CFI_ADDRESSABLE(fn, __attr)
+#ifdef CONFIG_X86_IBT
+
+#define __CFI_ADDRESSABLE(fn, __attr) \
+ const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn
+
+#endif /* CONFIG_X86_IBT */
#endif /* CONFIG_CFI_CLANG */
+#ifndef __CFI_ADDRESSABLE
+#define __CFI_ADDRESSABLE(fn, __attr)
+#endif
+
#endif /* _LINUX_CFI_H */
^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (27 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 28/29] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
2022-02-19 1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
mark.rutland, alyssa.milburn
Objtool's --ibt-seal option generates .ibt_endbr_sites which lists
superfluous ENDBR instructions, that is, those instructions whose
function is never indirectly called.
Additionally, objtool's --ibt-fix-direct ensures direct calls never
target an ENDBR instruction.
Combined, this yields that these instructions should never be executed.
Poison them using a 4-byte UD1 instruction; for IBT hardware this will
raise a #CP exception due to WAIT-FOR-ENDBR not getting what it
wants. For !IBT hardware it'll trigger #UD.
In either case, it will be 'impossible' to indirectly call these
functions thereafter.
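The encodings involved can be sanity-checked outside the kernel. The Python sketch below mirrors the patched is_endbr() from this series (which accepts endbr64, endbr32, and the ud1 poison), masking to 32 bits by hand since Python integers are unbounded:

```python
import struct

ENDBR64 = b"\xf3\x0f\x1e\xfa"  # u32 (little endian) 0xfa1e0ff3
ENDBR32 = b"\xf3\x0f\x1e\xfb"  # u32 0xfb1e0ff3
UD1     = b"\x0f\xb9\x40\x00"  # ud1 0x0(%rax),%eax -> u32 0x0040b90f

def is_endbr(insn: bytes) -> bool:
    # mirrors the kernel's is_endbr() after this patch
    val = ~struct.unpack("<I", insn)[0] & 0xffffffff
    if val == ~0x0040b90f & 0xffffffff:  # poisoned (sealed) ENDBR
        return True
    val |= 0x01000000                    # fold endbr32 into endbr64
    return val == ~0xfa1e0ff3 & 0xffffffff

print([is_endbr(i) for i in (ENDBR64, ENDBR32, UD1)])  # -> [True, True, True]
```

Note how treating the ud1 poison as "still an ENDBR" keeps module relocation fix-ups working after sealing.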
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Makefile | 5 ++++
arch/um/kernel/um_arch.c | 4 +++
arch/x86/Kconfig | 12 +++++++++
arch/x86/include/asm/alternative.h | 1
arch/x86/include/asm/ibt.h | 4 ++-
arch/x86/kernel/alternative.c | 46 +++++++++++++++++++++++++++++++++++++
arch/x86/kernel/module.c | 10 ++++++--
arch/x86/kernel/traps.c | 35 ++++++++++++++++++++++++++--
scripts/Makefile.build | 3 +-
scripts/link-vmlinux.sh | 10 ++++++--
10 files changed, 122 insertions(+), 8 deletions(-)
--- a/Makefile
+++ b/Makefile
@@ -911,6 +911,11 @@ BUILD_LTO := y
export BUILD_LTO
endif
+ifdef CONFIG_X86_IBT_SEAL
+BUILD_LTO := y
+export BUILD_LTO
+endif
+
ifdef CONFIG_CFI_CLANG
CC_FLAGS_CFI := -fsanitize=cfi \
-fsanitize-cfi-cross-dso \
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -424,6 +424,10 @@ void __init check_bugs(void)
os_check_bugs();
}
+void apply_ibt_endbr(s32 *start, s32 *end)
+{
+}
+
void apply_retpolines(s32 *start, s32 *end)
{
}
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1876,6 +1876,18 @@ config X86_IBT
an ENDBR instruction, as such, the compiler will litter the
code with them to make this happen.
+config X86_IBT_SEAL
+ prompt "Seal functions"
+ def_bool y
+ depends on X86_IBT && STACK_VALIDATION
+ help
+ In addition to building the kernel with IBT, seal all functions that
+ are not indirect call targets, avoiding them ever becoming one.
+
+ This requires LTO-like objtool runs and will slow down the build. It
+ does significantly reduce the number of ENDBR instructions in the
+ kernel image as well as provide some validation for !IBT hardware.
+
config X86_INTEL_MEMORY_PROTECTION_KEYS
prompt "Memory Protection Keys"
def_bool y
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -76,6 +76,7 @@ extern int alternatives_patched;
extern void alternative_instructions(void);
extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
extern void apply_retpolines(s32 *start, s32 *end);
+extern void apply_ibt_endbr(s32 *start, s32 *end);
struct module;
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -23,8 +23,10 @@
static inline bool is_endbr(const void *addr)
{
unsigned int val = ~*(unsigned int *)addr;
+ if (val == ~0x0040b90f) /* ud1_endbr */
+ return true;
val |= 0x01000000U;
- return val == ~0xfa1e0ff3;
+ return val == ~0xfa1e0ff3; /* endbr */
}
extern u64 ibt_save(void);
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -115,6 +115,7 @@ static void __init_or_module add_nops(vo
}
extern s32 __retpoline_sites[], __retpoline_sites_end[];
+extern s32 __ibt_endbr_sites[], __ibt_endbr_sites_end[];
extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -512,6 +513,49 @@ void __init_or_module noinline apply_ret
#endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
+#ifdef CONFIG_X86_IBT_SEAL
+
+/*
+ * ud1 0x0(%rax),%eax -- a 4 byte #UD instruction for when we don't have
+ * IBT and still want to trigger fail.
+ */
+static const u8 ud1_endbr[4] = { 0x0f, 0xb9, 0x40, 0x00 };
+
+/*
+ * Generated by: objtool --ibt-seal
+ */
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 endbr;
+
+ if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr)))
+ continue;
+
+ if (WARN_ON_ONCE(!is_endbr(&endbr)))
+ continue;
+
+ DPRINTK("ENDBR at: %pS (%px)", addr, addr);
+
+ /*
+ * When we have IBT, the lack of ENDBR will trigger #CP
+ * When we don't have IBT, explicitly trigger #UD
+ */
+ DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
+ DUMP_BYTES(((u8*)ud1_endbr), 4, "%px: repl: ", addr);
+ text_poke_early(addr, ud1_endbr, 4);
+ }
+}
+
+#else
+
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_X86_IBT_SEAL */
+
#ifdef CONFIG_SMP
static void alternatives_smp_lock(const s32 *start, const s32 *end,
u8 *text, u8 *text_end)
@@ -832,6 +876,8 @@ void __init alternative_instructions(voi
*/
apply_alternatives(__alt_instructions, __alt_instructions_end);
+ apply_ibt_endbr(__ibt_endbr_sites, __ibt_endbr_sites_end);
+
#ifdef CONFIG_SMP
/* Patch to UP if other cpus not imminent. */
if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) {
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -132,7 +132,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
static inline void ibt_fix_direct(void *loc, u64 *val)
{
-#ifdef CONFIG_X86_IBT
+#ifdef CONFIG_X86_IBT_SEAL
const void *addr = (void *)(4 + *val);
union text_poke_insn text;
u32 insn;
@@ -287,7 +287,7 @@ int module_finalize(const Elf_Ehdr *hdr,
{
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
*para = NULL, *orc = NULL, *orc_ip = NULL,
- *retpolines = NULL;
+ *retpolines = NULL, *ibt_endbr = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -305,6 +305,8 @@ int module_finalize(const Elf_Ehdr *hdr,
orc_ip = s;
if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
retpolines = s;
+ if (!strcmp(".ibt_endbr_sites", secstrings + s->sh_name))
+ ibt_endbr = s;
}
if (para) {
@@ -320,6 +322,10 @@ int module_finalize(const Elf_Ehdr *hdr,
void *aseg = (void *)alt->sh_addr;
apply_alternatives(aseg, aseg + alt->sh_size);
}
+ if (ibt_endbr) {
+ void *iseg = (void *)ibt_endbr->sh_addr;
+ apply_ibt_endbr(iseg, iseg + ibt_endbr->sh_size);
+ }
if (locks && text) {
void *lseg = (void *)locks->sh_addr;
void *tseg = (void *)text->sh_addr;
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -214,6 +214,12 @@ DEFINE_IDTENTRY(exc_overflow)
static bool ibt_fatal = true;
+static void handle_endbr(struct pt_regs *regs)
+{
+ pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+ BUG_ON(ibt_fatal);
+}
+
extern unsigned long ibt_selftest_ip; /* defined in asm below */
static volatile bool ibt_selftest_ok = false;
@@ -232,8 +238,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_control_pr
return;
}
- pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
- BUG_ON(ibt_fatal);
+ handle_endbr(regs);
}
bool ibt_selftest(void)
@@ -277,6 +282,29 @@ static int __init ibt_setup(char *str)
__setup("ibt=", ibt_setup);
+static bool handle_ud1_endbr(struct pt_regs *regs)
+{
+ u32 ud1;
+
+ if (get_kernel_nofault(ud1, (u32 *)regs->ip))
+ return false;
+
+ if (ud1 == 0x0040b90f) {
+ handle_endbr(regs);
+ regs->ip += 4;
+ return true;
+ }
+
+ return false;
+}
+
+#else /* CONFIG_X86_IBT */
+
+static bool handle_ud1_endbr(struct pt_regs *regs)
+{
+ return false;
+}
+
#endif /* CONFIG_X86_IBT */
#ifdef CONFIG_X86_F00F_BUG
@@ -285,6 +313,9 @@ void handle_invalid_op(struct pt_regs *r
static inline void handle_invalid_op(struct pt_regs *regs)
#endif
{
+ if (!user_mode(regs) && handle_ud1_endbr(regs))
+ return;
+
do_error_trap(regs, 0, "invalid opcode", X86_TRAP_UD, SIGILL,
ILL_ILLOPN, error_get_trap_addr(regs));
}
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -231,6 +231,7 @@ objtool_args = \
$(if $(CONFIG_UNWINDER_ORC),orc generate,check) \
$(if $(part-of-module), --module) \
$(if $(BUILD_LTO), --lto) \
+ $(if $(CONFIG_X86_IBT_SEAL), --ibt --ibt-fix-direct --ibt-seal) \
$(if $(CONFIG_FRAME_POINTER),, --no-fp) \
$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), --no-unreachable)\
$(if $(CONFIG_RETPOLINE), --retpoline) \
@@ -305,7 +306,7 @@ quiet_cmd_cc_lto_link_modules = LTO [M]
--whole-archive $(filter-out FORCE,$^) \
$(cmd_objtool)
else
-quiet_cmd_cc_lto_link_modules = LD [M] $@
+quiet_cmd_cc_lto_link_modules = LD [M] $@
cmd_cc_lto_link_modules = \
$(LD) $(ld_flags) -r -o $@ \
$(filter-out FORCE,$^) \
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -108,7 +108,9 @@ objtool_link()
local objtoolcmd;
local objtoolopt;
- if is_enabled CONFIG_LTO_CLANG && is_enabled CONFIG_STACK_VALIDATION; then
+ if is_enabled CONFIG_STACK_VALIDATION && \
+ ( is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_IBT_SEAL ); then
+
# Don't perform vmlinux validation unless explicitly requested,
# but run objtool on vmlinux.o now that we have an object file.
if is_enabled CONFIG_UNWINDER_ORC; then
@@ -117,6 +119,10 @@ objtool_link()
objtoolopt="${objtoolopt} --lto"
+ if is_enabled CONFIG_X86_IBT_SEAL; then
+ objtoolopt="${objtoolopt} --ibt --ibt-fix-direct --ibt-seal"
+ fi
+
if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
objtoolopt="${objtoolopt} --mcount"
fi
@@ -168,7 +174,7 @@ vmlinux_link()
# skip output file argument
shift
- if is_enabled CONFIG_LTO_CLANG; then
+ if is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_IBT_SEAL; then
# Use vmlinux.o instead of performing the slow LTO link again.
objs=vmlinux.o
libs=
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
@ 2022-02-18 19:31 ` Andrew Cooper
2022-02-18 21:15 ` Peter Zijlstra
2022-02-19 1:20 ` Edgecombe, Rick P
` (3 subsequent siblings)
4 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 19:31 UTC (permalink / raw)
To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
On 18/02/2022 16:49, Peter Zijlstra wrote:
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
> __setup("nopku", setup_disable_pku);
> #endif /* CONFIG_X86_64 */
>
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> + u64 msr;
> +
> + if (!IS_ENABLED(CONFIG_X86_IBT) ||
> + !cpu_feature_enabled(X86_FEATURE_IBT))
> + return;
> +
> + cr4_set_bits(X86_CR4_CET);
> +
> + rdmsrl(MSR_IA32_S_CET, msr);
> + if (cpu_feature_enabled(X86_FEATURE_IBT))
> + msr |= CET_ENDBR_EN;
> + wrmsrl(MSR_IA32_S_CET, msr);
So something I learnt the hard way with shstk is that you really want to
disable S_CET before heading into purgatory.
I've got no idea what's going to result from UEFI finally getting CET
support. However, clearing out the other IBT settings is probably a
wise move.
In particular, if there was a stale legacy bitmap pointer, then
ibt_selftest() could take #PF ahead of #CP.
~Andrew
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
@ 2022-02-18 20:24 ` Andrew Cooper
2022-02-18 21:05 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 20:24 UTC (permalink / raw)
To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, Juergen Gross
Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Andrew Cooper
On 18/02/2022 16:49, Peter Zijlstra wrote:
> The xen_iret ENDBR is needed for pre-alternative code calling the
> pv_ops using indirect calls.
>
> The rest look like hypervisor entry points which will be IRET like
> transfers and as such don't need ENDBR.
That's up for debate. Mechanically, yes - they're IRET or SYSRET.
Logically however, they're entrypoints registered with Xen, so following
the spec, Xen ought to force WAIT-FOR-ENDBR.
Or we could argue that said entrypoints are registered in Xen.
The case for ENDBR for the IDT vectors is quite obvious: a stray
write into the IDT can modify the entrypoint, and ENDBR limits an
attacker's choices.
OTOH, the SYSCALL and SYSENTER entrypoints are latched in MSRs, and if
you've got a sufficiently large security hole that the attacker can
write these MSRs, you have already lost. I'm not aware of any extra
security you get from forcing WAIT-FOR-ENDBR in the SYSCALL/SYSENTER
flow, and suspect it was like that just for consistency.
Under Xen PV, all entrypoints are configured by explicit hypercall, not
via a shared memory structure, so better match the MSR model for
native. I could probably be argued away from having a RMW of MSR_U_CET
in the event delivery fastpath.
I'd be tempted to leave the ENDBR's in. It feels like a safer default
until we figure out how to paravirt IBT properly.
> The hypercall page comes from the hypervisor, there might or might not
> be ENDBR there, not our problem.
Xen will make sure that the hypercall page contains ENDBR's if CET-IBT
is available for the guest to use. Perhaps...
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -25,8 +25,8 @@
> SYM_CODE_START(hypercall_page)
> .rept (PAGE_SIZE / 32)
> UNWIND_HINT_FUNC
> - .skip 31, 0x90
> - RET
> + ANNOTATE_NOENDBR
> + .skip 32, 0xcc
// Xen writes the hypercall page, and will sort out ENDBR
?
Also, somewhere in this series needs:
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5004feb16783..e30f77264ee6 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -624,6 +624,7 @@ static struct trap_array_entry trap_array[] = {
TRAP_ENTRY(exc_coprocessor_error, false ),
TRAP_ENTRY(exc_alignment_check, false ),
TRAP_ENTRY(exc_simd_coprocessor_error, false ),
+ TRAP_ENTRY(exc_control_protection, false ),
};
static bool __ref get_trap_addr(void **addr, unsigned int ist)
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 444d824775f6..6f077aedd561 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -147,6 +147,7 @@ xen_pv_trap asm_exc_page_fault
xen_pv_trap asm_exc_spurious_interrupt_bug
xen_pv_trap asm_exc_coprocessor_error
xen_pv_trap asm_exc_alignment_check
+xen_pv_trap asm_exc_control_protection
#ifdef CONFIG_X86_MCE
xen_pv_trap asm_xenpv_exc_machine_check
#endif /* CONFIG_X86_MCE */
at a minimum, and possibly also:
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 444d824775f6..96db5c50a6e7 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -124,7 +124,7 @@ SYM_CODE_START(xen_\name)
UNWIND_HINT_EMPTY
pop %rcx
pop %r11
- jmp \name
+ jmp \name + 4 * IS_ENABLED(CONFIG_X86_IBT)
SYM_CODE_END(xen_\name)
_ASM_NOKPROBE(xen_\name)
.endm
(Entirely untested.)
~Andrew
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
@ 2022-02-18 20:28 ` Josh Poimboeuf
2022-02-18 21:22 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 20:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn,
Juergen Gross
On Fri, Feb 18, 2022 at 05:49:04PM +0100, Peter Zijlstra wrote:
> Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> paravirt patching") there is an ordering dependency between patching
> paravirt ops and patching alternatives, the module loader still
> violates this.
>
> Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> Cc: Juergen Gross <jgross@suse.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Probably a good idea to put the 'para' and 'alt' clauses next to each
other and add a comment that the ordering is necessary.
> ---
> arch/x86/kernel/module.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -272,6 +272,10 @@ int module_finalize(const Elf_Ehdr *hdr,
> retpolines = s;
> }
>
> + if (para) {
> + void *pseg = (void *)para->sh_addr;
> + apply_paravirt(pseg, pseg + para->sh_size);
> + }
> if (retpolines) {
> void *rseg = (void *)retpolines->sh_addr;
> apply_retpolines(rseg, rseg + retpolines->sh_size);
> @@ -289,11 +293,6 @@ int module_finalize(const Elf_Ehdr *hdr,
> tseg, tseg + text->sh_size);
> }
>
> - if (para) {
> - void *pseg = (void *)para->sh_addr;
> - apply_paravirt(pseg, pseg + para->sh_size);
> - }
> -
> /* make jump label nops */
> jump_label_apply_nops(me);
>
>
>
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
@ 2022-02-18 20:49 ` Andrew Cooper
2022-02-18 21:11 ` David Laight
2022-02-18 21:26 ` Peter Zijlstra
2022-02-18 21:14 ` Josh Poimboeuf
` (2 subsequent siblings)
3 siblings, 2 replies; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 20:49 UTC (permalink / raw)
To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Andrew Cooper
On 18/02/2022 16:49, Peter Zijlstra wrote:
> +/*
> + * A bit convoluted, but matches both endbr32 and endbr64 without
> + * having either as literal in the text.
> + */
> +static inline bool is_endbr(const void *addr)
> +{
> + unsigned int val = ~*(unsigned int *)addr;
> + val |= 0x01000000U;
> + return val == ~0xfa1e0ff3;
> +}
At this point, I feel I've earned an "I told you so". :)
Clang 13 sees straight through the trickery and generates:
is_endbr: # @is_endbr
movl $-16777217, %eax # imm = 0xFEFFFFFF
andl (%rdi), %eax
cmpl $-98693133, %eax # imm = 0xFA1E0FF3
sete %al
retq
Here's one I prepared earlier:
/*
* In some cases we need to inspect/insert endbr64 instructions.
*
* The naive way, mem{cmp,cpy}(ptr, "\xf3\x0f\x1e\xfa", 4), optimises unsafely
* by placing 0xfa1e0ff3 in an imm32 operand, and marks a legal indirect
* branch target as far as the CPU is concerned.
*
* gen_endbr64() is written deliberately to avoid the problematic operand, and
* marked __const__ as it is safe for the optimiser to hoist/merge/etc.
*/
static inline uint32_t __attribute_const__ gen_endbr64(void)
{
uint32_t res;
asm ( "mov $~0xfa1e0ff3, %[res]\n\t"
"not %[res]\n\t"
: [res] "=&r" (res) );
return res;
}
which should be robust against even the most enterprising optimiser.
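The hazard the comment describes is easy to demonstrate: a 32-bit immediate compare against 0xfa1e0ff3 embeds the endbr64 byte sequence verbatim in the instruction stream, creating an unintended indirect-branch target. A small Python sketch using the real CMP EAX, imm32 encoding (opcode 0x3d followed by the little-endian immediate):

```python
# cmp eax, 0xfa1e0ff3 -> opcode 0x3d followed by the little-endian imm32
insn = bytes([0x3d]) + (0xfa1e0ff3).to_bytes(4, "little")
print(insn.hex())                   # 3df30f1efa
print(b"\xf3\x0f\x1e\xfa" in insn)  # True: a stray ENDBR64 inside the text
```

This is why both the kernel's is_endbr() and Andrew's gen_endbr64() construct the constant via inversion rather than writing the literal.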
~Andrew
P.S. Clang IAS had better never get "clever" enough to optimise what it
finds in asm statements...
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
2022-02-18 20:24 ` Andrew Cooper
@ 2022-02-18 21:05 ` Peter Zijlstra
2022-02-18 23:07 ` Andrew Cooper
0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:05 UTC (permalink / raw)
To: Andrew Cooper
Cc: x86, joao, hjl.tools, jpoimboe, Juergen Gross, linux-kernel,
ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
On Fri, Feb 18, 2022 at 08:24:41PM +0000, Andrew Cooper wrote:
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > The xen_iret ENDBR is needed for pre-alternative code calling the
> > pv_ops using indirect calls.
> >
> > The rest look like hypervisor entry points which will be IRET like
> > transfers and as such don't need ENDBR.
>
> That's up for debate. Mechanically, yes - they're IRET or SYSERET.
>
> Logically however, they're entrypoints registered with Xen, so following
> the spec, Xen ought to force WAIT-FOR-ENDBR.
Cute..
> I'd be tempted to leave the ENDBR's in. It feels like a safer default
> until we figure out how to paravirt IBT properly.
Fair enough, done.
> at a minimum, and possibly also:
>
> diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
> index 444d824775f6..96db5c50a6e7 100644
> --- a/arch/x86/xen/xen-asm.S
> +++ b/arch/x86/xen/xen-asm.S
> @@ -124,7 +124,7 @@ SYM_CODE_START(xen_\name)
> UNWIND_HINT_EMPTY
> pop %rcx
> pop %r11
> - jmp \name
> + jmp \name + 4 * IS_ENABLED(CONFIG_X86_IBT)
> SYM_CODE_END(xen_\name)
> _ASM_NOKPROBE(xen_\name)
> .endm
objtool will do that for you, it will rewrite all direct jmp/call to
endbr.
Something like so then?
---
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -818,6 +818,7 @@ SYM_CODE_END(exc_xen_hypervisor_callback
*/
SYM_CODE_START(xen_failsafe_callback)
UNWIND_HINT_EMPTY
+ ENDBR
movl %ds, %ecx
cmpw %cx, 0x10(%rsp)
jne 1f
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -392,6 +392,7 @@ SYM_CODE_START(early_idt_handler_array)
.endr
UNWIND_HINT_IRET_REGS offset=16 entry=0
SYM_CODE_END(early_idt_handler_array)
+ ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
SYM_CODE_START_LOCAL(early_idt_handler_common)
/*
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -624,6 +624,7 @@ static struct trap_array_entry trap_arra
TRAP_ENTRY(exc_coprocessor_error, false ),
TRAP_ENTRY(exc_alignment_check, false ),
TRAP_ENTRY(exc_simd_coprocessor_error, false ),
+ TRAP_ENTRY(exc_control_protection, false ),
};
static bool __ref get_trap_addr(void **addr, unsigned int ist)
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -122,6 +122,7 @@ SYM_FUNC_END(xen_read_cr2_direct);
.macro xen_pv_trap name
SYM_CODE_START(xen_\name)
UNWIND_HINT_EMPTY
+ ENDBR
pop %rcx
pop %r11
jmp \name
@@ -147,6 +148,7 @@ xen_pv_trap asm_exc_page_fault
xen_pv_trap asm_exc_spurious_interrupt_bug
xen_pv_trap asm_exc_coprocessor_error
xen_pv_trap asm_exc_alignment_check
+xen_pv_trap asm_exc_control_protection
#ifdef CONFIG_X86_MCE
xen_pv_trap asm_xenpv_exc_machine_check
#endif /* CONFIG_X86_MCE */
@@ -162,6 +164,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
i = 0
.rept NUM_EXCEPTION_VECTORS
UNWIND_HINT_EMPTY
+ ENDBR
pop %rcx
pop %r11
jmp early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE
@@ -169,6 +172,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
.fill xen_early_idt_handler_array + i*XEN_EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
.endr
SYM_CODE_END(xen_early_idt_handler_array)
+ ANNOTATE_NOENDBR
__FINIT
hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
@@ -189,6 +193,7 @@ hypercall_iret = hypercall_page + __HYPE
*/
SYM_CODE_START(xen_iret)
UNWIND_HINT_EMPTY
+ ENDBR
pushq $0
jmp hypercall_iret
SYM_CODE_END(xen_iret)
@@ -230,6 +235,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu
/* Normal 64-bit system call target */
SYM_CODE_START(xen_syscall_target)
UNWIND_HINT_EMPTY
+ ENDBR
popq %rcx
popq %r11
@@ -249,6 +255,7 @@ SYM_CODE_END(xen_syscall_target)
/* 32-bit compat syscall target */
SYM_CODE_START(xen_syscall32_target)
UNWIND_HINT_EMPTY
+ ENDBR
popq %rcx
popq %r11
@@ -266,6 +273,7 @@ SYM_CODE_END(xen_syscall32_target)
/* 32-bit compat sysenter target */
SYM_CODE_START(xen_sysenter_target)
UNWIND_HINT_EMPTY
+ ENDBR
/*
* NB: Xen is polite and clears TF from EFLAGS for us. This means
* that we don't need to guard against single step exceptions here.
@@ -289,6 +297,7 @@ SYM_CODE_END(xen_sysenter_target)
SYM_CODE_START(xen_syscall32_target)
SYM_CODE_START(xen_sysenter_target)
UNWIND_HINT_EMPTY
+ ENDBR
lea 16(%rsp), %rsp /* strip %rcx, %r11 */
mov $-ENOSYS, %rax
pushq $0
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,8 +25,11 @@
SYM_CODE_START(hypercall_page)
.rept (PAGE_SIZE / 32)
UNWIND_HINT_FUNC
- .skip 31, 0x90
- RET
+ ANNOTATE_NOENDBR
+ /*
+ * Xen will write the hypercall page, and sort out ENDBR.
+ */
+ .skip 32, 0xcc
.endr
#define HYPERCALL(n) \
@@ -74,6 +77,7 @@ SYM_CODE_END(startup_xen)
.pushsection .text
SYM_CODE_START(asm_cpu_bringup_and_idle)
UNWIND_HINT_EMPTY
+ ENDBR
call cpu_bringup_and_idle
SYM_CODE_END(asm_cpu_bringup_and_idle)
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
@ 2022-02-18 21:08 ` Josh Poimboeuf
2022-02-23 10:09 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 21:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn,
Miroslav Benes, Steven Rostedt
On Fri, Feb 18, 2022 at 05:49:06PM +0100, Peter Zijlstra wrote:
> Currently livepatch assumes __fentry__ lives at func+0, which is most
> likely untrue with IBT on. Override the weak klp_get_ftrace_location()
> function with an arch specific version that's IBT aware.
>
> Also make the weak fallback verify the location is an actual ftrace
> location as a sanity check.
>
> Suggested-by: Miroslav Benes <mbenes@suse.cz>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/include/asm/livepatch.h | 9 +++++++++
> kernel/livepatch/patch.c | 2 +-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/include/asm/livepatch.h
> +++ b/arch/x86/include/asm/livepatch.h
> @@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
> ftrace_instruction_pointer_set(fregs, ip);
> }
>
> +#define klp_get_ftrace_location klp_get_ftrace_location
> +static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> +{
> + unsigned long addr = ftrace_location(faddr);
> + if (!addr && IS_ENABLED(CONFIG_X86_IBT))
> + addr = ftrace_location(faddr + 4);
> + return addr;
I'm kind of surprised this logic doesn't exist in ftrace itself. Is
livepatch really the only user that needs to find the fentry for a given
function?
I had to do a double take for the ftrace_location() semantics, as I
originally assumed that's what it did, based on its name and signature.
Instead it apparently functions like a bool but returns its argument on
success.
Though the function comment tells a different story:
/**
* ftrace_location - return true if the ip giving is a traced location
So it's all kinds of confusing...
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: [PATCH 05/29] x86: Base IBT bits
2022-02-18 20:49 ` Andrew Cooper
@ 2022-02-18 21:11 ` David Laight
2022-02-18 21:24 ` Andrew Cooper
2022-02-18 21:26 ` Peter Zijlstra
1 sibling, 1 reply; 94+ messages in thread
From: David Laight @ 2022-02-18 21:11 UTC (permalink / raw)
To: 'Andrew Cooper', Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
From: Andrew Cooper
> Sent: 18 February 2022 20:50
>
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > +/*
> > + * A bit convoluted, but matches both endbr32 and endbr64 without
> > + * having either as literal in the text.
> > + */
> > +static inline bool is_endbr(const void *addr)
> > +{
> > + unsigned int val = ~*(unsigned int *)addr;
> > + val |= 0x01000000U;
> > + return val == ~0xfa1e0ff3;
> > +}
>
> At this point, I feel I've earned an "I told you so". :)
>
> Clang 13 sees straight through the trickery and generates:
>
> is_endbr: # @is_endbr
> movl $-16777217, %eax # imm = 0xFEFFFFFF
> andl (%rdi), %eax
> cmpl $-98693133, %eax # imm = 0xFA1E0FF3
> sete %al
> retq
I think it is enough to add:
asm("" : "=r" (val));
somewhere in the middle.
(I think that is right for asm with input and output in the same
register.)
There might be a HIDE_FOR_OPTIMISER() define that does that.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
2022-02-18 20:49 ` Andrew Cooper
@ 2022-02-18 21:14 ` Josh Poimboeuf
2022-02-18 21:21 ` Peter Zijlstra
2022-02-18 22:12 ` Joao Moreira
2022-02-19 1:07 ` Edgecombe, Rick P
3 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 21:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:07PM +0100, Peter Zijlstra wrote:
> +#ifdef CONFIG_X86_64
> +#define ASM_ENDBR "endbr64\n\t"
> +#else
> +#define ASM_ENDBR "endbr32\n\t"
> +#endif
Is it safe to assume all supported assemblers know this instruction?
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 19:31 ` Andrew Cooper
@ 2022-02-18 21:15 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:15 UTC (permalink / raw)
To: Andrew Cooper
Cc: x86, joao, hjl.tools, jpoimboe, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 07:31:38PM +0000, Andrew Cooper wrote:
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
> > __setup("nopku", setup_disable_pku);
> > #endif /* CONFIG_X86_64 */
> >
> > +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> > +{
> > + u64 msr;
> > +
> > + if (!IS_ENABLED(CONFIG_X86_IBT) ||
> > + !cpu_feature_enabled(X86_FEATURE_IBT))
> > + return;
> > +
> > + cr4_set_bits(X86_CR4_CET);
> > +
> > + rdmsrl(MSR_IA32_S_CET, msr);
> > + if (cpu_feature_enabled(X86_FEATURE_IBT))
> > + msr |= CET_ENDBR_EN;
> > + wrmsrl(MSR_IA32_S_CET, msr);
>
> So something I learnt the hard way with shstk is that you really want to
> disable S_CET before heading into purgatory.
>
> I've got no idea what's going to result from UEFI finally getting CET
> support. However, clearing out the other IBT settings is probably a
> wise move.
>
> In particular, if there was a stale legacy bitmap pointer, then
> ibt_selftest() could take #PF ahead of #CP.
How's this then? That writes the whole state to a known value before
enabling CR4.CET to make the thing go...
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	u64 msr = CET_ENDBR_EN;
+
+	if (!IS_ENABLED(CONFIG_X86_IBT) ||
+	    !cpu_feature_enabled(X86_FEATURE_IBT))
+		return;
+
+	wrmsrl(MSR_IA32_S_CET, msr);
+	cr4_set_bits(X86_CR4_CET);
+
+	if (!ibt_selftest()) {
+		pr_err("IBT selftest: Failed!\n");
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+	}
+}
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 21:14 ` Josh Poimboeuf
@ 2022-02-18 21:21 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:21 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 01:14:51PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:07PM +0100, Peter Zijlstra wrote:
> > +#ifdef CONFIG_X86_64
> > +#define ASM_ENDBR "endbr64\n\t"
> > +#else
> > +#define ASM_ENDBR "endbr32\n\t"
> > +#endif
>
> Is it safe to assume all supported assemblers know this instruction?
I was hoping the answer was yes, given CC_HAS_IBT.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
2022-02-18 20:28 ` Josh Poimboeuf
@ 2022-02-18 21:22 ` Peter Zijlstra
2022-02-18 23:28 ` Josh Poimboeuf
0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:22 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn,
Juergen Gross
On Fri, Feb 18, 2022 at 12:28:20PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:04PM +0100, Peter Zijlstra wrote:
> > Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> > paravirt patching") there is an ordering dependency between patching
> > paravirt ops and patching alternatives, the module loader still
> > violates this.
> >
> > Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> > Cc: Juergen Gross <jgross@suse.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> Probably a good idea to put the 'para' and 'alt' clauses next to each
> other and add a comment that the ordering is necessary.
Can't, retpolines must be in between, but I'll add a comment to check
alternative.c for ordering constraints.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 21:11 ` David Laight
@ 2022-02-18 21:24 ` Andrew Cooper
2022-02-18 22:37 ` David Laight
0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 21:24 UTC (permalink / raw)
To: David Laight, Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Andrew Cooper
On 18/02/2022 21:11, David Laight wrote:
> From: Andrew Cooper
>> Sent: 18 February 2022 20:50
>>
>> On 18/02/2022 16:49, Peter Zijlstra wrote:
>>> +/*
>>> + * A bit convoluted, but matches both endbr32 and endbr64 without
>>> + * having either as literal in the text.
>>> + */
>>> +static inline bool is_endbr(const void *addr)
>>> +{
>>> + unsigned int val = ~*(unsigned int *)addr;
>>> + val |= 0x01000000U;
>>> + return val == ~0xfa1e0ff3;
>>> +}
>> At this point, I feel I've earned an "I told you so". :)
>>
>> Clang 13 sees straight through the trickery and generates:
>>
>> is_endbr: # @is_endbr
>> movl $-16777217, %eax # imm = 0xFEFFFFFF
>> andl (%rdi), %eax
>> cmpl $-98693133, %eax # imm = 0xFA1E0FF3
>> sete %al
>> retq
> I think it is enough to add:
> asm("" : "=r" (val));
> somewhere in the middle.
(First, you mean "+r" not "=r"), but no - the problem isn't val. It's
`~0xfa1e0ff3` which the compiler is free to transform in several unsafe ways.
~Andrew
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 20:49 ` Andrew Cooper
2022-02-18 21:11 ` David Laight
@ 2022-02-18 21:26 ` Peter Zijlstra
1 sibling, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:26 UTC (permalink / raw)
To: Andrew Cooper
Cc: x86, joao, hjl.tools, jpoimboe, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 08:49:45PM +0000, Andrew Cooper wrote:
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > +/*
> > + * A bit convoluted, but matches both endbr32 and endbr64 without
> > + * having either as literal in the text.
> > + */
> > +static inline bool is_endbr(const void *addr)
> > +{
> > + unsigned int val = ~*(unsigned int *)addr;
> > + val |= 0x01000000U;
> > + return val == ~0xfa1e0ff3;
> > +}
>
> At this point, I feel I've earned an "I told you so". :)
Ha! I actually have a note to double-check this. But yes, I'll stuff
that piece of asm in so I can forget about it.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
2022-02-18 20:49 ` Andrew Cooper
2022-02-18 21:14 ` Josh Poimboeuf
@ 2022-02-18 22:12 ` Joao Moreira
2022-02-19 1:07 ` Edgecombe, Rick P
3 siblings, 0 replies; 94+ messages in thread
From: Joao Moreira @ 2022-02-18 22:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
> +config CC_HAS_IBT
> + # GCC >= 9 and binutils >= 2.29
> + # Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
> + def_bool $(cc-option, -fcf-protection=branch -mindirect-branch-register) && $(as-instr,endbr64)
> +
-mindirect-branch-register breaks compiling with clang. Maybe we should
do this instead?
+ def_bool ($(cc-option, -fcf-protection=branch -mindirect-branch-register) || $(cc-option, -mretpoline-external-thunk)) && $(as-instr,endbr64)
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: [PATCH 05/29] x86: Base IBT bits
2022-02-18 21:24 ` Andrew Cooper
@ 2022-02-18 22:37 ` David Laight
0 siblings, 0 replies; 94+ messages in thread
From: David Laight @ 2022-02-18 22:37 UTC (permalink / raw)
To: 'Andrew Cooper', Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
From: Andrew Cooper
> Sent: 18 February 2022 21:24
>
> On 18/02/2022 21:11, David Laight wrote:
> > From: Andrew Cooper
> >> Sent: 18 February 2022 20:50
> >>
> >> On 18/02/2022 16:49, Peter Zijlstra wrote:
> >>> +/*
> >>> + * A bit convoluted, but matches both endbr32 and endbr64 without
> >>> + * having either as literal in the text.
> >>> + */
> >>> +static inline bool is_endbr(const void *addr)
> >>> +{
> >>> + unsigned int val = ~*(unsigned int *)addr;
> >>> + val |= 0x01000000U;
> >>> + return val == ~0xfa1e0ff3;
> >>> +}
> >> At this point, I feel I've earned an "I told you so". :)
> >>
> >> Clang 13 sees straight through the trickery and generates:
> >>
> >> is_endbr: # @is_endbr
> >> movl $-16777217, %eax # imm = 0xFEFFFFFF
> >> andl (%rdi), %eax
> >> cmpl $-98693133, %eax # imm = 0xFA1E0FF3
> >> sete %al
> >> retq
> > I think it is enough to add:
> > asm("" : "=r" (val));
> > somewhere in the middle.
>
> (First, you mean "+r" not "=r"),
I always double check....
> but no - the problem isn't val. It's
> `~0xfa1e0ff3` which the compiler is free to transform in several unsafe ways.
Actually you could do (modulo stupid errors):
val = (*(unsigned int *)addr & ~0x01000000) ^ 0xff3;
asm("" : "+r" (val));
return val ^ 0xfa1e0000;
which should be zero for endbr and non-zero otherwise.
Shame the compiler will probably never use the flags from the final xor.
Converting to bool just adds code!
(I hate bool)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
2022-02-18 21:05 ` Peter Zijlstra
@ 2022-02-18 23:07 ` Andrew Cooper
2022-02-21 14:20 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 23:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, jpoimboe, Juergen Gross, linux-kernel,
ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Juergen Gross, Andrew Cooper, Andy Lutomirski
On 18/02/2022 21:05, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 08:24:41PM +0000, Andrew Cooper wrote:
>> at a minimum, and possibly also:
>>
>> diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
>> index 444d824775f6..96db5c50a6e7 100644
>> --- a/arch/x86/xen/xen-asm.S
>> +++ b/arch/x86/xen/xen-asm.S
>> @@ -124,7 +124,7 @@ SYM_CODE_START(xen_\name)
>> UNWIND_HINT_EMPTY
>> pop %rcx
>> pop %r11
>> - jmp \name
>> + jmp \name + 4 * IS_ENABLED(CONFIG_X86_IBT)
>> SYM_CODE_END(xen_\name)
>> _ASM_NOKPROBE(xen_\name)
>> .endm
> objtool will do that for you, it will rewrite all direct jmp/call to
> endbr.
Ah - great.
> Something like so then?
Looks plausible, although Juergen would be a better person to judge.
About paravirt_iret, this is all way more complicated than it needs to be.
Currently, there are two users of INTERRUPT_RETURN.
The first, in swapgs_restore_regs_and_return_to_usermode, is never going
to execute until patching is complete, and is already behind an
alternative causing XENPV to go a different way, which means that:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..f9a021e7688a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -608,8 +608,8 @@
SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
/* Restore RDI. */
popq %rdi
- SWAPGS
- INTERRUPT_RETURN
+ swapgs
+ jmp native_iret
SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
is correct AFAICT. (Tangent; then ESPFIX64 can be simplified because
only the return-to-user path needs the LDT check, so the enter/exit user
state can be dropped.)
That leaves the single INTERRUPT_RETURN in
restore_regs_and_return_to_kernel. Xen PV is an easy environment to
start up in, so:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..a9e7846cc176 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -626,7 +626,10 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel,
SYM_L_GLOBAL)
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
* when returning from IPI handler.
*/
- INTERRUPT_RETURN
+#ifdef CONFIG_XEN_PV
+early_iret_patch:
+#endif
+ jmp native_iret
SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
UNWIND_HINT_IRET_REGS
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 6a64496edefb..31f136328c84 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -66,6 +66,10 @@ SYM_CODE_START(startup_xen)
cdq
wrmsr
+ mov $native_iret, %rax
+ sub $xen_iret, %rax
+ add %eax, 1 + early_iret_patch
+
call xen_start_kernel
SYM_CODE_END(startup_xen)
__FINIT
really should be good enough to drop INTERRUPT_RETURN and paravirt_iret
entirely.
Obviously, that's very hacky, and might better be expressed like:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..af371e4f0dda 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -626,7 +626,7 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel,
SYM_L_GLOBAL)
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
* when returning from IPI handler.
*/
- INTERRUPT_RETURN
+ EARLY_ALTERNATIVE "jmp native_iret", "jmp xen_iret", X86_FEATURE_XENPV
SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
UNWIND_HINT_IRET_REGS
or so, but my point is that the early Xen code, if it can identify this
patch point separate to the list of everything, can easily arrange for
it to be modified before HYPERCALL_set_trap_table (Xen PV's LIDT), and
then return_to_kernel is in its fully configured state (paravirt or
otherwise) before interrupts/exceptions can be taken.
~Andrew
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
2022-02-18 21:22 ` Peter Zijlstra
@ 2022-02-18 23:28 ` Josh Poimboeuf
0 siblings, 0 replies; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 23:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn,
Juergen Gross
On Fri, Feb 18, 2022 at 10:22:46PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 12:28:20PM -0800, Josh Poimboeuf wrote:
> > On Fri, Feb 18, 2022 at 05:49:04PM +0100, Peter Zijlstra wrote:
> > > Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> > > paravirt patching") there is an ordering dependency between patching
> > > paravirt ops and patching alternatives, the module loader still
> > > violates this.
> > >
> > > Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> > > Cc: Juergen Gross <jgross@suse.com>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >
> > Probably a good idea to put the 'para' and 'alt' clauses next to each
> > other and add a comment that the ordering is necessary.
>
> Can't, retpolines must be in between, but I'll add a comment to check
> alternative.c for ordering constraints.
Ah, even more justification for a comment then ;-)
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
@ 2022-02-19 0:23 ` Josh Poimboeuf
2022-02-19 23:08 ` Peter Zijlstra
2022-02-19 0:36 ` Josh Poimboeuf
1 sibling, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19 0:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:09PM +0100, Peter Zijlstra wrote:
> Kernel entry points should have ENDBR on IBT configs.
>
> The SYSCALL entry points are found through taking their respective
> address in order to program them in the MSRs, while the exception
> entry points are found through UNWIND_HINT_IRET_REGS.
>
> *Except* that latter hint is also used on exit code to denote when
> we're down to an IRET frame. As such add an additional 'entry'
> argument to the macro and have it default to '1' such that objtool
> will assume it's an entry and WARN about it.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
So we now have two unwind types which are identical, except one requires
ENDBR after it.
It's not ideal. The code has to make sure to get the annotations right
for objtool to do its job. Setting the macro's default to 'entry=1'
does help with that, but still... it's clunky.
Also, calling them "entry" and "exit" is confusing. Not all the exits
are exits. Their common attribute is really that they're not "entry".
How important is it for objtool to validate these anyway? Seems like
such bugs would be few and far between, and would be discovered in a
jiffy after bricking the system.
Another possibly better and less intrusive way of doing this would be
for objtool to realize that any UNWIND_HINT_IRET_REGS at the beginning
of a SYM_CODE_START (global non-function code symbol) needs ENDBR.
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
2022-02-19 0:23 ` Josh Poimboeuf
@ 2022-02-19 0:36 ` Josh Poimboeuf
1 sibling, 0 replies; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19 0:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:09PM +0100, Peter Zijlstra wrote:
> - .align 8
> +
> + .align IDT_ALIGN
> SYM_CODE_START(irq_entries_start)
> vector=FIRST_EXTERNAL_VECTOR
> .rept NR_EXTERNAL_VECTORS
> - UNWIND_HINT_IRET_REGS
> + UNWIND_HINT_IRET_REGS entry=1
> 0 :
> + ENDBR
> .byte 0x6a, vector
> jmp asm_common_interrupt
> - nop
> /* Ensure that the above is 8 bytes max */
"IDT_ALIGN bytes max" ?
> - . = 0b + 8
> + .fill 0b + IDT_ALIGN - ., 1, 0x90
> vector = vector+1
> .endr
> SYM_CODE_END(irq_entries_start)
>
> #ifdef CONFIG_X86_LOCAL_APIC
> - .align 8
> + .align IDT_ALIGN
> SYM_CODE_START(spurious_entries_start)
> vector=FIRST_SYSTEM_VECTOR
> .rept NR_SYSTEM_VECTORS
> - UNWIND_HINT_IRET_REGS
> + UNWIND_HINT_IRET_REGS entry=1
> 0 :
> + ENDBR
> .byte 0x6a, vector
> jmp asm_spurious_interrupt
> - nop
> /* Ensure that the above is 8 bytes max */
Ditto
> - . = 0b + 8
> + .fill 0b + IDT_ALIGN - ., 1, 0x90
> vector = vector+1
> .endr
> SYM_CODE_END(spurious_entries_start)
> --- a/arch/x86/include/asm/segment.h
> +++ b/arch/x86/include/asm/segment.h
> @@ -4,6 +4,7 @@
>
> #include <linux/const.h>
> #include <asm/alternative.h>
> +#include <asm/ibt.h>
>
> /*
> * Constructor for a conventional segment GDT (or LDT) entry.
> @@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns
> * vector has no error code (two bytes), a 'push $vector_number' (two
> * bytes), and a jump to the common entry code (up to five bytes).
> */
> +#ifdef CONFIG_X86_IBT
> +#define EARLY_IDT_HANDLER_SIZE 13
> +#else
> #define EARLY_IDT_HANDLER_SIZE 9
> +#endif
Might want to add a sentence to the comment above: with IBT enabled,
ENDBR adds another four bytes.
> /*
> * xen_early_idt_handler_array is for Xen pv guests: for each entry in
> --- a/arch/x86/include/asm/unwind_hints.h
> +++ b/arch/x86/include/asm/unwind_hints.h
> @@ -11,7 +11,7 @@
> UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
> .endm
>
> -.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
> +.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
> .if \base == %rsp
> .if \indirect
> .set sp_reg, ORC_REG_SP_INDIRECT
> @@ -33,9 +33,17 @@
> .set sp_offset, \offset
>
> .if \partial
> - .set type, UNWIND_HINT_TYPE_REGS_PARTIAL
> + .if \entry
> + .set type, UNWIND_HINT_TYPE_REGS_ENTRY
> + .else
> + .set type, UNWIND_HINT_TYPE_REGS_EXIT
> + .endif
> .elseif \extra == 0
> - .set type, UNWIND_HINT_TYPE_REGS_PARTIAL
> + .if \entry
> + .set type, UNWIND_HINT_TYPE_REGS_ENTRY
> + .else
> + .set type, UNWIND_HINT_TYPE_REGS_EXIT
> + .endif
> .set sp_offset, \offset + (16*8)
'extra' is apparently no longer needed and can be shown the door.
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 05/29] x86: Base IBT bits
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
` (2 preceding siblings ...)
2022-02-18 22:12 ` Joao Moreira
@ 2022-02-19 1:07 ` Edgecombe, Rick P
3 siblings, 0 replies; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-19 1:07 UTC (permalink / raw)
To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
Milburn, Alyssa
On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1861,6 +1861,21 @@ config X86_UMIP
> specific cases in protected and virtual-8086 modes. Emulated
> results are dummy.
>
> +config CC_HAS_IBT
> + # GCC >= 9 and binutils >= 2.29
> + # Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
> + def_bool $(cc-option, -fcf-protection=branch -mindirect-branch-register) && $(as-instr,endbr64)
> +
> +config X86_IBT
> + prompt "Indirect Branch Tracking"
> + bool
> + depends on X86_64 && CC_HAS_IBT
> + help
> + Build the kernel with support for Indirect Branch Tracking, a
> + hardware supported CFI scheme. Any indirect call must land on
> + an ENDBR instruction, as such, the compiler will litter the
> + code with them to make this happen.
> +
>
Could you call this something more specific than just X86_IBT? Like
X86_KERNEL_IBT or something? It could get confusing if we add userspace
IBT, or if someone wants IBT for KVM guests without CFI in the kernel.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
2022-02-18 19:31 ` Andrew Cooper
@ 2022-02-19 1:20 ` Edgecombe, Rick P
2022-02-19 1:21 ` Josh Poimboeuf
` (2 subsequent siblings)
4 siblings, 0 replies; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-19 1:20 UTC (permalink / raw)
To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
Milburn, Alyssa
On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> + u64 msr;
> +
> + if (!IS_ENABLED(CONFIG_X86_IBT) ||
> + !cpu_feature_enabled(X86_FEATURE_IBT))
> + return;
> +
> + cr4_set_bits(X86_CR4_CET);
> +
> + rdmsrl(MSR_IA32_S_CET, msr);
> + if (cpu_feature_enabled(X86_FEATURE_IBT))
It must be true because of the above check.
> + msr |= CET_ENDBR_EN;
> + wrmsrl(MSR_IA32_S_CET, msr);
> +
> + if (!ibt_selftest()) {
> + pr_err("IBT selftest: Failed!\n");
> + setup_clear_cpu_cap(X86_FEATURE_IBT);
> + }
> +}
> +
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
2022-02-18 19:31 ` Andrew Cooper
2022-02-19 1:20 ` Edgecombe, Rick P
@ 2022-02-19 1:21 ` Josh Poimboeuf
2022-02-19 9:24 ` Peter Zijlstra
2022-02-21 8:24 ` Kees Cook
2022-02-22 4:38 ` Edgecombe, Rick P
4 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19 1:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:16PM +0100, Peter Zijlstra wrote:
> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
> + pr_err("Whaaa?!?!\n");
> + return;
> + }
Might want to upgrade that to a proper warning :-)
> +bool ibt_selftest(void)
> +{
> + ibt_selftest_ok = false;
> +
> + asm (ANNOTATE_NOENDBR
> + "1: lea 2f(%%rip), %%rax\n\t"
> + ANNOTATE_RETPOLINE_SAFE
> + " jmp *%%rax\n\t"
> + "2: nop\n\t"
> +
> + /* unsigned ibt_selftest_ip = 2b */
> + ".pushsection .data,\"aw\"\n\t"
> + ".align 8\n\t"
> + ".type ibt_selftest_ip, @object\n\t"
> + ".size ibt_selftest_ip, 8\n\t"
> + "ibt_selftest_ip:\n\t"
> + ".quad 2b\n\t"
> + ".popsection\n\t"
> +
> + : : : "rax", "memory");
Can 'ibt_selftest_ip' just be defined in C (with __ro_after_init) and
passed as an output to the asm doing 'mov $2b, %[ibt_selftest_ip]'?
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
` (28 preceding siblings ...)
2022-02-18 16:49 ` [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
@ 2022-02-19 1:29 ` Edgecombe, Rick P
2022-02-19 9:58 ` Peter Zijlstra
2022-02-23 7:26 ` Kees Cook
29 siblings, 2 replies; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-19 1:29 UTC (permalink / raw)
To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
Milburn, Alyssa
On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> This is an (almost!) complete Kernel IBT implementation. It's been
> self-hosting
> for a few days now. That is, it runs on IBT enabled hardware
> (Tigerlake) and is
> capable of building the next kernel.
>
> It is also almost clean on allmodconfig using GCC-11.2.
>
> The biggest TODO item at this point is Clang, I've not yet looked at
> that.
Do you need to turn this off before kexec?
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
@ 2022-02-19 2:15 ` Josh Poimboeuf
2022-02-22 15:00 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19 2:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:18PM +0100, Peter Zijlstra wrote:
> Retpoline and IBT are mutually exclusive. IBT relies on indirect
> branches (JMP/CALL *%reg) while retpoline avoids them by design.
>
> Demote to LFENCE on IBT enabled hardware.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/kernel/cpu/bugs.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -937,6 +937,11 @@ static void __init spectre_v2_select_mit
> boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
> retpoline_amd:
> if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
> + if (IS_ENABLED(CONFIG_X86_IBT) &&
> + boot_cpu_has(X86_FEATURE_IBT)) {
> + pr_err("Spectre mitigation: LFENCE not serializing, generic retpoline not available due to IBT, switching to none\n");
> + return;
> + }
> pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
> goto retpoline_generic;
> }
> @@ -945,6 +950,26 @@ static void __init spectre_v2_select_mit
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> } else {
> retpoline_generic:
> + /*
> + * Full retpoline is incompatible with IBT, demote to LFENCE.
> + */
> + if (IS_ENABLED(CONFIG_X86_IBT) &&
> + boot_cpu_has(X86_FEATURE_IBT)) {
> + switch (cmd) {
> + case SPECTRE_V2_CMD_FORCE:
> + case SPECTRE_V2_CMD_AUTO:
> + case SPECTRE_V2_CMD_RETPOLINE:
> + /* silent for auto select */
> + break;
> +
> + default:
> + /* warn when 'demoting' an explicit selection */
> + pr_warn("Spectre mitigation: Switching to LFENCE due to IBT\n");
> + break;
This code is confusing, not helped by the fact that the existing code
already looks like spaghetti.
Assuming IBT systems also have eIBRS (right?), I don't think the above
SPECTRE_V2_CMD_{FORCE,AUTO} cases would be possible.
AFAICT, if execution reached the retpoline_generic label, the user
specified either RETPOLINE or RETPOLINE_GENERIC.
I'm not sure it makes sense to put RETPOLINE in the "silent" list. If
the user boots an Intel system with spectre_v2=retpoline on the cmdline,
they're probably expecting a traditional retpoline and should be warned
if that changes, especially if it's a "demotion".
In that case the switch statement isn't even needed. It can instead
just unconditionally print the warning.
Also, why "demote" retpoline to LFENCE rather than attempting to
"promote" it to eIBRS? Maybe there's a good reason but it probably at
least deserves some mention in the commit log.
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 17/29] x86/ibt: Annotate text references
2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
@ 2022-02-19 5:22 ` Josh Poimboeuf
2022-02-19 9:39 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19 5:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:19PM +0100, Peter Zijlstra wrote:
> Annotate away some of the generic code references. This is things
> where we take the address of a symbol for exception handling or return
> addresses (eg. context switch).
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The vast majority of these annotations can go away if objtool only
requires ENDBR for referenced *STT_FUNC* symbols.
Anything still needing ANNOTATE_NOENDBR after that might arguably not
belong as STT_FUNC anyway, and it might make sense to convert it to
non-function code (e.g. SYM_CODE_{START,END}).
> @@ -564,12 +565,16 @@ SYM_CODE_END(\asmsym)
> .align 16
> .globl __irqentry_text_start
> __irqentry_text_start:
> + ANNOTATE_NOENDBR // unwinders
> + ud2;
>
> #include <asm/idtentry.h>
>
> .align 16
> .globl __irqentry_text_end
> __irqentry_text_end:
> + ANNOTATE_NOENDBR
> + ud2;
Why ud2? If no ud2 then the annotation shouldn't be needed since the
first idt entry has ENDBR.
--
Josh
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-19 1:21 ` Josh Poimboeuf
@ 2022-02-19 9:24 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19 9:24 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:21:55PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:16PM +0100, Peter Zijlstra wrote:
> > +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> > +{
> > + if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
> > + pr_err("Whaaa?!?!\n");
> > + return;
> > + }
>
> Might want to upgrade that to a proper warning :-)
"Unexpected #CP\n" ?
> > +bool ibt_selftest(void)
> > +{
> > + ibt_selftest_ok = false;
> > +
> > + asm (ANNOTATE_NOENDBR
> > + "1: lea 2f(%%rip), %%rax\n\t"
> > + ANNOTATE_RETPOLINE_SAFE
> > + " jmp *%%rax\n\t"
> > + "2: nop\n\t"
> > +
> > + /* unsigned ibt_selftest_ip = 2b */
> > + ".pushsection .data,\"aw\"\n\t"
> > + ".align 8\n\t"
> > + ".type ibt_selftest_ip, @object\n\t"
> > + ".size ibt_selftest_ip, 8\n\t"
> > + "ibt_selftest_ip:\n\t"
> > + ".quad 2b\n\t"
> > + ".popsection\n\t"
> > +
> > + : : : "rax", "memory");
>
> Can 'ibt_selftest_ip' just be defined in C (with __ro_after_init) and
> passed as an output to the asm doing 'mov $2b, %[ibt_selftest_ip]'?
This seemed simpler... note that it's run on CPU bringup, so with CPU
hotplug it'll end up trying to write to read-only memory if we do what
you suggest.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 17/29] x86/ibt: Annotate text references
2022-02-19 5:22 ` Josh Poimboeuf
@ 2022-02-19 9:39 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19 9:39 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 09:22:16PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:19PM +0100, Peter Zijlstra wrote:
> > Annotate away some of the generic code references. This is things
> > where we take the address of a symbol for exception handling or return
> > addresses (eg. context switch).
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> The vast majority of these annotations can go away if objtool only
> requires ENDBR for referenced *STT_FUNC* symbols.
>
> Anything still needing ANNOTATE_NOENDBR after that might arguably not
> belong as STT_FUNC anyway, and it might make sense to convert it to
> non-function code (e.g. SYM_CODE_{START,END}).
I'd really rather prefer objtool to err on the side of caution for now.
Missing ENDBR typically bricks a box hard, normal consoles don't get
around to showing anything. My force_early_printk patches saved the day
a number of times.
Given that the only hardware I have with this on is a NUC without
serial, this is a massive pain in the arse to debug. That box has been
>< close to total destruction a number of times. I never want to do that
ever again, life's too short to have to work with a NUC.
> > @@ -564,12 +565,16 @@ SYM_CODE_END(\asmsym)
> > .align 16
> > .globl __irqentry_text_start
> > __irqentry_text_start:
> > + ANNOTATE_NOENDBR // unwinders
> > + ud2;
> >
> > #include <asm/idtentry.h>
> >
> > .align 16
> > .globl __irqentry_text_end
> > __irqentry_text_end:
> > + ANNOTATE_NOENDBR
> > + ud2;
>
> Why ud2? If no ud2 then the annotation shouldn't be needed since the
> first idt entry has ENDBR.
paranoia :-) just to make absolutely sure nobody ever tries to call
__irqentry_text_end, but yes, removed it.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-19 1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
@ 2022-02-19 9:58 ` Peter Zijlstra
2022-02-19 16:00 ` Andrew Cooper
2022-02-21 8:42 ` Kees Cook
2022-02-23 7:26 ` Kees Cook
1 sibling, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19 9:58 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew, keescook,
linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn,
Alyssa
On Sat, Feb 19, 2022 at 01:29:45AM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> > This is an (almost!) complete Kernel IBT implementation. It's been
> > self-hosting
> > for a few days now. That is, it runs on IBT enabled hardware
> > (Tigerlake) and is
> > capable of building the next kernel.
> >
> > It is also almost clean on allmodconfig using GCC-11.2.
> >
> > The biggest TODO item at this point is Clang, I've not yet looked at
> > that.
>
> Do you need to turn this off before kexec?
Probably... :-) I've never looked at that code though, so I'm not
exactly sure where to put things.
I'm assuming kexec does a hot-unplug of all but the boot-cpu which then
leaves only a single CPU with state in machine_kexec() ? Does the below
look reasonable?
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -638,6 +638,12 @@ static __always_inline void setup_cet(st
}
}
+void cet_disable(void)
+{
+ cr4_clear_bits(X86_CR4_CET);
+ wrmsrl(MSR_IA32_S_CET, 0);
+}
+
/*
* Some CPU features depend on higher CPUID levels, which may not always
* be available due to CPUID level capping or broken virtualization
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 33d41e350c79..cf26356db53e 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -72,4 +72,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
#else
static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
#endif
+
+extern void cet_disable(void);
+
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index f5da4a18070a..29a2a1732605 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -310,6 +310,7 @@ void machine_kexec(struct kimage *image)
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
hw_breakpoint_disable();
+ cet_disable();
if (image->preserve_context) {
#ifdef CONFIG_X86_IO_APIC
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-19 9:58 ` Peter Zijlstra
@ 2022-02-19 16:00 ` Andrew Cooper
2022-02-21 8:42 ` Kees Cook
1 sibling, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2022-02-19 16:00 UTC (permalink / raw)
To: Peter Zijlstra, Edgecombe, Rick P
Cc: Poimboe, Josh, hjl.tools, x86, joao, keescook, linux-kernel,
mark.rutland, samitolvanen, ndesaulniers, Milburn, Alyssa,
Andrew Cooper
On 19/02/2022 09:58, Peter Zijlstra wrote:
> On Sat, Feb 19, 2022 at 01:29:45AM +0000, Edgecombe, Rick P wrote:
>> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
>>> This is an (almost!) complete Kernel IBT implementation. It's been
>>> self-hosting
>>> for a few days now. That is, it runs on IBT enabled hardware
>>> (Tigerlake) and is
>>> capable of building the next kernel.
>>>
>>> It is also almost clean on allmodconfig using GCC-11.2.
>>>
>>> The biggest TODO item at this point is Clang, I've not yet looked at
>>> that.
>> Do you need to turn this off before kexec?
> Probably... :-) I've never looked at that code though; so I'm not
> exactly sure where to put things.
>
> I'm assuming kexec does a hot-unplug of all but the boot-cpu which then
> leaves only a single CPU with state in machine_kexec() ? Does the below
> look reasonable?
If you skip writing to S_CET on hardware that doesn't have it, probably.
~Andrew
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
2022-02-19 0:23 ` Josh Poimboeuf
@ 2022-02-19 23:08 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19 23:08 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 04:23:38PM -0800, Josh Poimboeuf wrote:
> Another possibly better and less intrusive way of doing this would be
> for objtool to realize that any UNWIND_HINT_IRET_REGS at the beginning
> of a SYM_CODE_START (global non-function code symbol) needs ENDBR.
This; I likes that. I reverted this patch from the tree (very much
including the annotations), redid the objtool check and regenerated the
missing ENDBR given the objtool output.
I think the few missing ENDBRs in this are due to using x86_64-defconfig
instead of allmodconfig. I'll try on Monday after spooling up a real
build machine :-)
---
arch/x86/entry/entry_64.S | 32 +++++++++++++++-----------------
arch/x86/entry/entry_64_compat.S | 3 +--
arch/x86/include/asm/idtentry.h | 19 +++++++------------
arch/x86/include/asm/segment.h | 7 +------
arch/x86/include/asm/unwind_hints.h | 18 +++++-------------
arch/x86/kernel/head_64.S | 13 ++++++-------
arch/x86/kernel/unwind_orc.c | 3 +--
include/linux/objtool.h | 5 ++---
tools/include/linux/objtool.h | 5 ++---
tools/objtool/check.c | 13 ++++++++-----
tools/objtool/orc_dump.c | 3 +--
11 files changed, 49 insertions(+), 72 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 77e222f2061e..d69239c638a2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -39,7 +39,6 @@
#include <asm/trapnr.h>
#include <asm/nospec-branch.h>
#include <asm/fsgsbase.h>
-#include <asm/ibt.h>
#include <linux/err.h>
#include "calling.h"
@@ -87,8 +86,8 @@
SYM_CODE_START(entry_SYSCALL_64)
UNWIND_HINT_EMPTY
-
ENDBR
+
swapgs
/* tss.sp2 is scratch space. */
movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
@@ -353,7 +352,7 @@ SYM_CODE_END(ret_from_fork)
*/
.macro idtentry vector asmsym cfunc has_error_code:req
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1
+ UNWIND_HINT_IRET_REGS offset=\has_error_code*8
ENDBR
ASM_CLAC
@@ -371,7 +370,7 @@ SYM_CODE_START(\asmsym)
.rept 6
pushq 5*8(%rsp)
.endr
- UNWIND_HINT_IRET_REGS offset=8 entry=0
+ UNWIND_HINT_IRET_REGS offset=8
.Lfrom_usermode_no_gap_\@:
.endif
@@ -421,7 +420,7 @@ SYM_CODE_END(\asmsym)
*/
.macro idtentry_mce_db vector asmsym cfunc
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS entry=1
+ UNWIND_HINT_IRET_REGS
ENDBR
ASM_CLAC
@@ -477,7 +476,7 @@ SYM_CODE_END(\asmsym)
*/
.macro idtentry_vc vector asmsym cfunc
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS entry=1
+ UNWIND_HINT_IRET_REGS
ENDBR
ASM_CLAC
@@ -539,7 +538,7 @@ SYM_CODE_END(\asmsym)
*/
.macro idtentry_df vector asmsym cfunc
SYM_CODE_START(\asmsym)
- UNWIND_HINT_IRET_REGS offset=8 entry=1
+ UNWIND_HINT_IRET_REGS offset=8
ENDBR
ASM_CLAC
@@ -641,8 +640,7 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
INTERRUPT_RETURN
SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
- UNWIND_HINT_IRET_REGS entry=0
- ENDBR // paravirt_iret
+ UNWIND_HINT_IRET_REGS
/*
* Are we returning to a stack segment from the LDT? Note: in
* 64-bit mode SS:RSP on the exception stack is always valid.
@@ -720,7 +718,7 @@ SYM_INNER_LABEL(native_irq_return_iret, SYM_L_GLOBAL)
popq %rdi /* Restore user RDI */
movq %rax, %rsp
- UNWIND_HINT_IRET_REGS offset=8 entry=0
+ UNWIND_HINT_IRET_REGS offset=8
/*
* At this point, we cannot write to the stack any more, but we can
@@ -837,13 +835,13 @@ SYM_CODE_START(xen_failsafe_callback)
movq 8(%rsp), %r11
addq $0x30, %rsp
pushq $0 /* RIP */
- UNWIND_HINT_IRET_REGS offset=8 entry=0
+ UNWIND_HINT_IRET_REGS offset=8
jmp asm_exc_general_protection
1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
movq (%rsp), %rcx
movq 8(%rsp), %r11
addq $0x30, %rsp
- UNWIND_HINT_IRET_REGS entry=0
+ UNWIND_HINT_IRET_REGS
pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
ENCODE_FRAME_POINTER
@@ -1078,7 +1076,7 @@ SYM_CODE_END(error_return)
* when PAGE_TABLE_ISOLATION is in use. Do not clobber.
*/
SYM_CODE_START(asm_exc_nmi)
- UNWIND_HINT_IRET_REGS entry=1
+ UNWIND_HINT_IRET_REGS
ENDBR
/*
@@ -1144,13 +1142,13 @@ SYM_CODE_START(asm_exc_nmi)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
- UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0
+ UNWIND_HINT_IRET_REGS base=%rdx offset=8
pushq 5*8(%rdx) /* pt_regs->ss */
pushq 4*8(%rdx) /* pt_regs->rsp */
pushq 3*8(%rdx) /* pt_regs->flags */
pushq 2*8(%rdx) /* pt_regs->cs */
pushq 1*8(%rdx) /* pt_regs->rip */
- UNWIND_HINT_IRET_REGS entry=0
+ UNWIND_HINT_IRET_REGS
pushq $-1 /* pt_regs->orig_ax */
PUSH_AND_CLEAR_REGS rdx=(%rdx)
ENCODE_FRAME_POINTER
@@ -1306,7 +1304,7 @@ SYM_CODE_START(asm_exc_nmi)
.rept 5
pushq 11*8(%rsp)
.endr
- UNWIND_HINT_IRET_REGS entry=0
+ UNWIND_HINT_IRET_REGS
/* Everything up to here is safe from nested NMIs */
@@ -1322,7 +1320,7 @@ SYM_CODE_START(asm_exc_nmi)
pushq $__KERNEL_CS /* CS */
pushq $1f /* RIP */
iretq /* continues at repeat_nmi below */
- UNWIND_HINT_IRET_REGS entry=0
+ UNWIND_HINT_IRET_REGS
1:
#endif
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 316e0fa119b4..86caf7872a25 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -48,8 +48,8 @@
*/
SYM_CODE_START(entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
- /* Interrupts are off on entry. */
ENDBR
+ /* Interrupts are off on entry. */
SWAPGS
pushq %rax
@@ -344,7 +344,6 @@ SYM_CODE_END(entry_SYSCALL_compat)
*/
SYM_CODE_START(entry_INT80_compat)
UNWIND_HINT_EMPTY
- ENDBR
/*
* Interrupts are off on entry.
*/
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 9127e1e3c439..1157ee6f98d7 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -5,11 +5,7 @@
/* Interrupts/Exceptions */
#include <asm/trapnr.h>
-#ifdef CONFIG_X86_IBT
-#define IDT_ALIGN 16
-#else
-#define IDT_ALIGN 8
-#endif
+#define IDT_ALIGN (8 * (1 + IS_ENABLED(CONFIG_X86_IBT)))
#ifndef __ASSEMBLY__
#include <linux/entry-common.h>
@@ -486,7 +482,7 @@ __visible noinstr void func(struct pt_regs *regs, \
/*
* ASM code to emit the common vector entry stubs where each stub is
- * packed into 8 bytes.
+ * packed into IDT_ALIGN bytes.
*
* Note, that the 'pushq imm8' is emitted via '.byte 0x6a, vector' because
* GCC treats the local vector variable as unsigned int and would expand
@@ -498,17 +494,16 @@ __visible noinstr void func(struct pt_regs *regs, \
* point is to mask off the bits above bit 7 because the push is sign
* extending.
*/
-
.align IDT_ALIGN
SYM_CODE_START(irq_entries_start)
vector=FIRST_EXTERNAL_VECTOR
.rept NR_EXTERNAL_VECTORS
- UNWIND_HINT_IRET_REGS entry=1
-0 :
+ UNWIND_HINT_IRET_REGS
ENDBR
+0 :
.byte 0x6a, vector
jmp asm_common_interrupt
- /* Ensure that the above is 8 bytes max */
+ /* Ensure that the above is IDT_ALIGN bytes max */
.fill 0b + IDT_ALIGN - ., 1, 0x90
vector = vector+1
.endr
@@ -519,12 +514,12 @@ SYM_CODE_END(irq_entries_start)
SYM_CODE_START(spurious_entries_start)
vector=FIRST_SYSTEM_VECTOR
.rept NR_SYSTEM_VECTORS
- UNWIND_HINT_IRET_REGS entry=1
+ UNWIND_HINT_IRET_REGS
0 :
ENDBR
.byte 0x6a, vector
jmp asm_spurious_interrupt
- /* Ensure that the above is 8 bytes max */
+ /* Ensure that the above is IDT_ALIGN bytes max */
.fill 0b + IDT_ALIGN - ., 1, 0x90
vector = vector+1
.endr
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 6a8a5bcbf14d..3a09647788bd 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -4,7 +4,6 @@
#include <linux/const.h>
#include <asm/alternative.h>
-#include <asm/ibt.h>
/*
* Constructor for a conventional segment GDT (or LDT) entry.
@@ -276,11 +275,7 @@ static inline void vdso_read_cpunode(unsigned *cpu, unsigned *node)
* vector has no error code (two bytes), a 'push $vector_number' (two
* bytes), and a jump to the common entry code (up to five bytes).
*/
-#ifdef CONFIG_X86_IBT
-#define EARLY_IDT_HANDLER_SIZE 13
-#else
-#define EARLY_IDT_HANDLER_SIZE 9
-#endif
+#define EARLY_IDT_HANDLER_SIZE (9 + 4*IS_ENABLED(CONFIG_X86_IBT))
/*
* xen_early_idt_handler_array is for Xen pv guests: for each entry in
diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
index d5b401c2f9e9..8b33674288ea 100644
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -11,7 +11,7 @@
UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
.endm
-.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
+.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
.if \base == %rsp
.if \indirect
.set sp_reg, ORC_REG_SP_INDIRECT
@@ -33,17 +33,9 @@
.set sp_offset, \offset
.if \partial
- .if \entry
- .set type, UNWIND_HINT_TYPE_REGS_ENTRY
- .else
- .set type, UNWIND_HINT_TYPE_REGS_EXIT
- .endif
+ .set type, UNWIND_HINT_TYPE_REGS_PARTIAL
.elseif \extra == 0
- .if \entry
- .set type, UNWIND_HINT_TYPE_REGS_ENTRY
- .else
- .set type, UNWIND_HINT_TYPE_REGS_EXIT
- .endif
+ .set type, UNWIND_HINT_TYPE_REGS_PARTIAL
.set sp_offset, \offset + (16*8)
.else
.set type, UNWIND_HINT_TYPE_REGS
@@ -52,8 +44,8 @@
UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type
.endm
-.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1
- UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry
+.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0
+ UNWIND_HINT_REGS base=\base offset=\offset partial=1
.endm
.macro UNWIND_HINT_FUNC
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 92e759ae9030..816bc70c9e71 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,7 +25,6 @@
#include <asm/export.h>
#include <asm/nospec-branch.h>
#include <asm/fixmap.h>
-#include <asm/ibt.h>
/*
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -332,8 +331,7 @@ SYM_CODE_END(start_cpu0)
* when .init.text is freed.
*/
SYM_CODE_START_NOALIGN(vc_boot_ghcb)
- UNWIND_HINT_IRET_REGS offset=8 entry=1
- ENDBR
+ UNWIND_HINT_IRET_REGS offset=8
/* Build pt_regs */
PUSH_AND_CLEAR_REGS
@@ -377,24 +375,25 @@ SYM_CODE_START(early_idt_handler_array)
i = 0
.rept NUM_EXCEPTION_VECTORS
.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
- UNWIND_HINT_IRET_REGS entry=1
+ UNWIND_HINT_IRET_REGS
ENDBR
pushq $0 # Dummy error code, to make stack frame uniform
.else
- UNWIND_HINT_IRET_REGS offset=8 entry=1
+ UNWIND_HINT_IRET_REGS offset=8
ENDBR
.endif
pushq $i # 72(%rsp) Vector number
jmp early_idt_handler_common
- UNWIND_HINT_IRET_REGS entry=0
+ UNWIND_HINT_IRET_REGS
i = i + 1
.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
.endr
- UNWIND_HINT_IRET_REGS offset=16 entry=0
SYM_CODE_END(early_idt_handler_array)
ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
SYM_CODE_START_LOCAL(early_idt_handler_common)
+ UNWIND_HINT_IRET_REGS offset=16
+ ANNOTATE_NOENDBR
/*
* The stack is the hardware frame, an error code or zero, and the
* vector number.
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index fbf112c5485c..2de3c8c5eba9 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -566,8 +566,7 @@ bool unwind_next_frame(struct unwind_state *state)
state->signal = true;
break;
- case UNWIND_HINT_TYPE_REGS_ENTRY:
- case UNWIND_HINT_TYPE_REGS_EXIT:
+ case UNWIND_HINT_TYPE_REGS_PARTIAL:
if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
orc_warn_current("can't access iret registers at %pB\n",
(void *)orig_ip);
diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index 5281e02c2326..fd9d90ec0e48 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -35,9 +35,8 @@ struct unwind_hint {
*/
#define UNWIND_HINT_TYPE_CALL 0
#define UNWIND_HINT_TYPE_REGS 1
-#define UNWIND_HINT_TYPE_REGS_ENTRY 2
-#define UNWIND_HINT_TYPE_REGS_EXIT 3
-#define UNWIND_HINT_TYPE_FUNC 4
+#define UNWIND_HINT_TYPE_REGS_PARTIAL 2
+#define UNWIND_HINT_TYPE_FUNC 3
#ifdef CONFIG_STACK_VALIDATION
diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
index c48d45733071..aca52db2f3f3 100644
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -35,9 +35,8 @@ struct unwind_hint {
*/
#define UNWIND_HINT_TYPE_CALL 0
#define UNWIND_HINT_TYPE_REGS 1
-#define UNWIND_HINT_TYPE_REGS_ENTRY 2
-#define UNWIND_HINT_TYPE_REGS_EXIT 3
-#define UNWIND_HINT_TYPE_FUNC 4
+#define UNWIND_HINT_TYPE_REGS_PARTIAL 2
+#define UNWIND_HINT_TYPE_FUNC 3
#ifdef CONFIG_STACK_VALIDATION
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 414c8a1dd868..5db0f66ab8fe 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2488,8 +2488,7 @@ static int update_cfi_state(struct instruction *insn,
}
if (cfi->type == UNWIND_HINT_TYPE_REGS ||
- cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY ||
- cfi->type == UNWIND_HINT_TYPE_REGS_EXIT)
+ cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL)
return update_cfi_state_regs(insn, cfi, op);
switch (op->dest.type) {
@@ -3254,9 +3253,13 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (insn->hint) {
state.cfi = *insn->cfi;
if (ibt) {
- if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY &&
- insn->type != INSN_ENDBR) {
- WARN_FUNC("IRET_ENTRY hint without ENDBR", insn->sec, insn->offset);
+ struct symbol *sym;
+ if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
+ (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
+ insn->type != INSN_ENDBR && !insn->noendbr) {
+ WARN_FUNC("IRET_REGS hint without ENDBR: %s",
+ insn->sec, insn->offset,
+ sym->name);
}
}
} else {
diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c
index 145cef3535c2..f5a8508c42d6 100644
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -43,8 +43,7 @@ static const char *orc_type_name(unsigned int type)
return "call";
case UNWIND_HINT_TYPE_REGS:
return "regs";
- case UNWIND_HINT_TYPE_REGS_ENTRY:
- case UNWIND_HINT_TYPE_REGS_EXIT:
+ case UNWIND_HINT_TYPE_REGS_PARTIAL:
return "regs (partial)";
default:
return "?";
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
` (2 preceding siblings ...)
2022-02-19 1:21 ` Josh Poimboeuf
@ 2022-02-21 8:24 ` Kees Cook
2022-02-22 4:38 ` Edgecombe, Rick P
4 siblings, 0 replies; 94+ messages in thread
From: Kees Cook @ 2022-02-21 8:24 UTC (permalink / raw)
To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, samitolvanen, mark.rutland,
alyssa.milburn
On February 18, 2022 8:49:16 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>The bits required to make the hardware go.. Of note is that, provided
>the syscall entry points are covered with ENDBR, #CP doesn't need to
>be an IST because we'll never hit the syscall gap.
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> arch/x86/include/asm/cpufeatures.h | 1
> arch/x86/include/asm/idtentry.h | 5 ++
> arch/x86/include/asm/msr-index.h | 20 ++++++++
> arch/x86/include/asm/traps.h | 2
> arch/x86/include/uapi/asm/processor-flags.h | 2
> arch/x86/kernel/cpu/common.c | 23 +++++++++
> arch/x86/kernel/idt.c | 4 +
> arch/x86/kernel/traps.c | 65 ++++++++++++++++++++++++++++
> 8 files changed, 121 insertions(+), 1 deletion(-)
>
>--- a/arch/x86/include/asm/cpufeatures.h
>+++ b/arch/x86/include/asm/cpufeatures.h
>@@ -387,6 +387,7 @@
> #define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */
> #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
> #define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
>+#define X86_FEATURE_IBT (18*32+20) /* Indirect Branch Tracking */
> #define X86_FEATURE_AMX_BF16 (18*32+22) /* AMX bf16 Support */
> #define X86_FEATURE_AVX512_FP16 (18*32+23) /* AVX512 FP16 */
> #define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */
>--- a/arch/x86/include/asm/idtentry.h
>+++ b/arch/x86/include/asm/idtentry.h
>@@ -622,6 +622,11 @@ DECLARE_IDTENTRY_DF(X86_TRAP_DF, exc_dou
> DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
> #endif
>
>+/* #CP */
>+#ifdef CONFIG_X86_IBT
>+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
>+#endif
>+
> /* #VC */
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> DECLARE_IDTENTRY_VC(X86_TRAP_VC, exc_vmm_communication);
>--- a/arch/x86/include/asm/msr-index.h
>+++ b/arch/x86/include/asm/msr-index.h
>@@ -360,11 +360,29 @@
> #define MSR_ATOM_CORE_TURBO_RATIOS 0x0000066c
> #define MSR_ATOM_CORE_TURBO_VIDS 0x0000066d
>
>-
> #define MSR_CORE_PERF_LIMIT_REASONS 0x00000690
> #define MSR_GFX_PERF_LIMIT_REASONS 0x000006B0
> #define MSR_RING_PERF_LIMIT_REASONS 0x000006B1
>
>+/* Control-flow Enforcement Technology MSRs */
>+#define MSR_IA32_U_CET 0x000006a0 /* user mode cet */
>+#define MSR_IA32_S_CET 0x000006a2 /* kernel mode cet */
>+#define CET_SHSTK_EN BIT_ULL(0)
>+#define CET_WRSS_EN BIT_ULL(1)
>+#define CET_ENDBR_EN BIT_ULL(2)
>+#define CET_LEG_IW_EN BIT_ULL(3)
>+#define CET_NO_TRACK_EN BIT_ULL(4)
>+#define CET_SUPPRESS_DISABLE BIT_ULL(5)
>+#define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
>+#define CET_SUPPRESS BIT_ULL(10)
>+#define CET_WAIT_ENDBR BIT_ULL(11)
>+
>+#define MSR_IA32_PL0_SSP 0x000006a4 /* ring-0 shadow stack pointer */
>+#define MSR_IA32_PL1_SSP 0x000006a5 /* ring-1 shadow stack pointer */
>+#define MSR_IA32_PL2_SSP 0x000006a6 /* ring-2 shadow stack pointer */
>+#define MSR_IA32_PL3_SSP 0x000006a7 /* ring-3 shadow stack pointer */
>+#define MSR_IA32_INT_SSP_TAB 0x000006a8 /* exception shadow stack table */
>+
> /* Hardware P state interface */
> #define MSR_PPERF 0x0000064e
> #define MSR_PERF_LIMIT_REASONS 0x0000064f
>--- a/arch/x86/include/asm/traps.h
>+++ b/arch/x86/include/asm/traps.h
>@@ -18,6 +18,8 @@ void __init trap_init(void);
> asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
> #endif
>
>+extern bool ibt_selftest(void);
>+
> #ifdef CONFIG_X86_F00F_BUG
> /* For handling the FOOF bug */
> void handle_invalid_op(struct pt_regs *regs);
>--- a/arch/x86/include/uapi/asm/processor-flags.h
>+++ b/arch/x86/include/uapi/asm/processor-flags.h
>@@ -130,6 +130,8 @@
> #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT)
> #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */
> #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT)
>+#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement Technology */
>+#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT)
>
> /*
> * x86-64 Task Priority Register, CR8
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -59,6 +59,7 @@
> #include <asm/cpu_device_id.h>
> #include <asm/uv/uv.h>
> #include <asm/sigframe.h>
>+#include <asm/traps.h>
>
> #include "cpu.h"
>
>@@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
> __setup("nopku", setup_disable_pku);
> #endif /* CONFIG_X86_64 */
>
>+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
>+{
>+ u64 msr;
>+
>+ if (!IS_ENABLED(CONFIG_X86_IBT) ||
>+ !cpu_feature_enabled(X86_FEATURE_IBT))
>+ return;
>+
>+ cr4_set_bits(X86_CR4_CET);
Please add X86_CR4_CET to cr4_pinned_mask too.
>+
>+ rdmsrl(MSR_IA32_S_CET, msr);
>+ if (cpu_feature_enabled(X86_FEATURE_IBT))
>+ msr |= CET_ENDBR_EN;
>+ wrmsrl(MSR_IA32_S_CET, msr);
>+
>+ if (!ibt_selftest()) {
>+ pr_err("IBT selftest: Failed!\n");
>+ setup_clear_cpu_cap(X86_FEATURE_IBT);
>+ }
>+}
>+
> /*
> * Some CPU features depend on higher CPUID levels, which may not always
> * be available due to CPUID level capping or broken virtualization
>@@ -1709,6 +1731,7 @@ static void identify_cpu(struct cpuinfo_
>
> x86_init_rdrand(c);
> setup_pku(c);
>+ setup_cet(c);
>
> /*
> * Clear/Set all flags overridden by options, need do it
>--- a/arch/x86/kernel/idt.c
>+++ b/arch/x86/kernel/idt.c
>@@ -104,6 +104,10 @@ static const __initconst struct idt_data
> ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE),
> #endif
>
>+#ifdef CONFIG_X86_IBT
>+ INTG(X86_TRAP_CP, asm_exc_control_protection),
>+#endif
>+
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
> #endif
>--- a/arch/x86/kernel/traps.c
>+++ b/arch/x86/kernel/traps.c
>@@ -210,6 +210,71 @@ DEFINE_IDTENTRY(exc_overflow)
> do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
> }
>
>+#ifdef CONFIG_X86_IBT
>+
>+static bool ibt_fatal = true;
__ro_after_init please. :)
>+
>+extern unsigned long ibt_selftest_ip; /* defined in asm below */
>+static volatile bool ibt_selftest_ok = false;
>+
>+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
>+{
>+ if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
>+ pr_err("Whaaa?!?!\n");
>+ return;
Seems like this case should fail closed and not return?
>+ }
>+
>+ if (WARN_ON_ONCE(user_mode(regs) || error_code != 3))
>+ return;
>+
>+ if (unlikely(regs->ip == ibt_selftest_ip)) {
>+ ibt_selftest_ok = true;
>+ return;
>+ }
>+
>+ pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
>+ BUG_ON(ibt_fatal);
>+}
>+
>+bool ibt_selftest(void)
>+{
>+ ibt_selftest_ok = false;
>+
>+ asm (ANNOTATE_NOENDBR
>+ "1: lea 2f(%%rip), %%rax\n\t"
>+ ANNOTATE_RETPOLINE_SAFE
>+ " jmp *%%rax\n\t"
>+ "2: nop\n\t"
>+
>+ /* unsigned long ibt_selftest_ip = 2b */
>+ ".pushsection .data,\"aw\"\n\t"
>+ ".align 8\n\t"
>+ ".type ibt_selftest_ip, @object\n\t"
>+ ".size ibt_selftest_ip, 8\n\t"
>+ "ibt_selftest_ip:\n\t"
>+ ".quad 2b\n\t"
>+ ".popsection\n\t"
>+
>+ : : : "rax", "memory");
>+
>+ return ibt_selftest_ok;
>+}
>+
>+static int __init ibt_setup(char *str)
>+{
>+ if (!strcmp(str, "off"))
>+ setup_clear_cpu_cap(X86_FEATURE_IBT);
>+
>+ if (!strcmp(str, "warn"))
>+ ibt_fatal = false;
>+
>+ return 1;
>+}
>+
>+__setup("ibt=", ibt_setup);
>+
>+#endif /* CONFIG_X86_IBT */
>+
> #ifdef CONFIG_X86_F00F_BUG
> void handle_invalid_op(struct pt_regs *regs)
> #else
>
>
--
Kees Cook
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 15/29] x86: Disable IBT around firmware
2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
@ 2022-02-21 8:27 ` Kees Cook
2022-02-21 10:06 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-21 8:27 UTC (permalink / raw)
To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3
Cc: linux-kernel, peterz, ndesaulniers, samitolvanen, mark.rutland,
alyssa.milburn
On February 18, 2022 8:49:17 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>Assume firmware isn't IBT clean and disable it across calls.
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> arch/x86/include/asm/efi.h | 9 +++++++--
> arch/x86/include/asm/ibt.h | 10 ++++++++++
> arch/x86/kernel/apm_32.c | 7 +++++++
> arch/x86/kernel/cpu/common.c | 28 ++++++++++++++++++++++++++++
> 4 files changed, 52 insertions(+), 2 deletions(-)
>
>--- a/arch/x86/include/asm/efi.h
>+++ b/arch/x86/include/asm/efi.h
>@@ -7,6 +7,7 @@
> #include <asm/tlb.h>
> #include <asm/nospec-branch.h>
> #include <asm/mmu_context.h>
>+#include <asm/ibt.h>
> #include <linux/build_bug.h>
> #include <linux/kernel.h>
> #include <linux/pgtable.h>
>@@ -120,8 +121,12 @@ extern asmlinkage u64 __efi_call(void *f
> efi_enter_mm(); \
> })
>
>-#define arch_efi_call_virt(p, f, args...) \
>- efi_call((void *)p->f, args) \
>+#define arch_efi_call_virt(p, f, args...) ({ \
>+ u64 ret, ibt = ibt_save(); \
>+ ret = efi_call((void *)p->f, args); \
>+ ibt_restore(ibt); \
>+ ret; \
>+})
>
> #define arch_efi_call_virt_teardown() \
> ({ \
>--- a/arch/x86/include/asm/ibt.h
>+++ b/arch/x86/include/asm/ibt.h
>@@ -6,6 +6,8 @@
>
> #ifndef __ASSEMBLY__
>
>+#include <linux/types.h>
>+
> #ifdef CONFIG_X86_64
> #define ASM_ENDBR "endbr64\n\t"
> #else
>@@ -25,6 +27,9 @@ static inline bool is_endbr(const void *
> return val == ~0xfa1e0ff3;
> }
>
>+extern u64 ibt_save(void);
>+extern void ibt_restore(u64 save);
>+
> #else /* __ASSEMBLY__ */
>
> #ifdef CONFIG_X86_64
>@@ -39,10 +44,15 @@ static inline bool is_endbr(const void *
>
> #ifndef __ASSEMBLY__
>
>+#include <linux/types.h>
>+
> #define ASM_ENDBR
>
> #define __noendbr
>
>+static inline u64 ibt_save(void) { return 0; }
>+static inline void ibt_restore(u64 save) { }
>+
> #else /* __ASSEMBLY__ */
>
> #define ENDBR
>--- a/arch/x86/kernel/apm_32.c
>+++ b/arch/x86/kernel/apm_32.c
>@@ -232,6 +232,7 @@
> #include <asm/paravirt.h>
> #include <asm/reboot.h>
> #include <asm/nospec-branch.h>
>+#include <asm/ibt.h>
>
> #if defined(CONFIG_APM_DISPLAY_BLANK) && defined(CONFIG_VT)
> extern int (*console_blank_hook)(int);
>@@ -598,6 +599,7 @@ static long __apm_bios_call(void *_call)
> struct desc_struct save_desc_40;
> struct desc_struct *gdt;
> struct apm_bios_call *call = _call;
>+ u64 ibt;
>
> cpu = get_cpu();
> BUG_ON(cpu != 0);
>@@ -607,11 +609,13 @@ static long __apm_bios_call(void *_call)
>
> apm_irq_save(flags);
> firmware_restrict_branch_speculation_start();
>+ ibt = ibt_save();
> APM_DO_SAVE_SEGS;
> apm_bios_call_asm(call->func, call->ebx, call->ecx,
> &call->eax, &call->ebx, &call->ecx, &call->edx,
> &call->esi);
> APM_DO_RESTORE_SEGS;
>+ ibt_restore(ibt);
> firmware_restrict_branch_speculation_end();
> apm_irq_restore(flags);
> gdt[0x40 / 8] = save_desc_40;
>@@ -676,6 +680,7 @@ static long __apm_bios_call_simple(void
> struct desc_struct save_desc_40;
> struct desc_struct *gdt;
> struct apm_bios_call *call = _call;
>+ u64 ibt;
>
> cpu = get_cpu();
> BUG_ON(cpu != 0);
>@@ -685,10 +690,12 @@ static long __apm_bios_call_simple(void
>
> apm_irq_save(flags);
> firmware_restrict_branch_speculation_start();
>+ ibt = ibt_save();
> APM_DO_SAVE_SEGS;
> error = apm_bios_call_simple_asm(call->func, call->ebx, call->ecx,
> &call->eax);
> APM_DO_RESTORE_SEGS;
>+ ibt_restore(ibt);
> firmware_restrict_branch_speculation_end();
> apm_irq_restore(flags);
> gdt[0x40 / 8] = save_desc_40;
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -592,6 +592,34 @@ static __init int setup_disable_pku(char
> __setup("nopku", setup_disable_pku);
> #endif /* CONFIG_X86_64 */
>
>+#ifdef CONFIG_X86_IBT
>+
>+u64 ibt_save(void)
>+{
>+ u64 msr = 0;
>+
>+ if (cpu_feature_enabled(X86_FEATURE_IBT)) {
>+ rdmsrl(MSR_IA32_S_CET, msr);
>+ wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
>+ }
>+
>+ return msr;
>+}
>+
>+void ibt_restore(u64 save)
Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...
>+{
>+ u64 msr;
>+
>+ if (cpu_feature_enabled(X86_FEATURE_IBT)) {
>+ rdmsrl(MSR_IA32_S_CET, msr);
>+ msr &= ~CET_ENDBR_EN;
>+ msr |= (save & CET_ENDBR_EN);
>+ wrmsrl(MSR_IA32_S_CET, msr);
>+ }
>+}
>+
>+#endif
>+
> static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> {
> u64 msr;
>
>
--
Kees Cook
^ permalink raw reply [flat|nested] 94+ messages in thread
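A side note on the is_endbr() context lines visible in the ibt.h hunk above: the comparison is written against the complemented constant ~0xfa1e0ff3 — 0xfa1e0ff3 being the little-endian u32 view of the endbr64 bytes f3 0f 1e fa — presumably so the literal encoding never lands in .text as an immediate, where it would itself be a valid ENDBR. A small Python sketch of the equivalent check (illustrative only, not kernel code):

```python
import struct

def is_endbr(code: bytes) -> bool:
    """Mirror the quoted helper: load 4 bytes little-endian, complement,
    and compare against ~0xfa1e0ff3 so that neither constant involved
    equals the raw endbr64 encoding itself."""
    (raw,) = struct.unpack("<I", code[:4])
    return (~raw & 0xFFFFFFFF) == (~0xfa1e0ff3 & 0xFFFFFFFF)

assert is_endbr(bytes([0xF3, 0x0F, 0x1E, 0xFA]))      # endbr64
assert not is_endbr(bytes([0x90, 0x90, 0x90, 0x90]))  # nops are not endbr
```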
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-19 9:58 ` Peter Zijlstra
2022-02-19 16:00 ` Andrew Cooper
@ 2022-02-21 8:42 ` Kees Cook
2022-02-21 9:24 ` Peter Zijlstra
1 sibling, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-21 8:42 UTC (permalink / raw)
To: Peter Zijlstra, Edgecombe, Rick P
Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew,
linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn,
Alyssa
On February 19, 2022 1:58:27 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>On Sat, Feb 19, 2022 at 01:29:45AM +0000, Edgecombe, Rick P wrote:
>> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
>> > This is an (almost!) complete Kernel IBT implementation. It's been
>> > self-hosting
>> > for a few days now. That is, it runs on IBT enabled hardware
>> > (Tigerlake) and is
>> > capable of building the next kernel.
>> >
>> > It is also almost clean on allmodconfig using GCC-11.2.
>> >
>> > The biggest TODO item at this point is Clang, I've not yet looked at
>> > that.
>>
>> Do you need to turn this off before kexec?
>
>Probably... :-) I've never looked at that code though; so I'm not
>exactly sure where to put things.
>
>I'm assuming kexec does a hot-unplug of all but the boot-cpu which then
>leaves only a single CPU with state in machine_kexec() ? Does the below
>look reasonable?
>
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -638,6 +638,12 @@ static __always_inline void setup_cet(st
> }
> }
>
>+void cet_disable(void)
>+{
>+ cr4_clear_bits(X86_CR4_CET);
I'd rather keep the pinning...
>+ wrmsrl(MSR_IA32_S_CET, 0);
>+}
Eh, why not just require kexec to be IBT safe? That seems a reasonable exercise if we ever expect UEFI to enforce IBT when starting the kernel on a normal boot...
-Kees
>+
> /*
> * Some CPU features depend on higher CPUID levels, which may not always
> * be available due to CPUID level capping or broken virtualization
>diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
>index 33d41e350c79..cf26356db53e 100644
>--- a/arch/x86/include/asm/cpu.h
>+++ b/arch/x86/include/asm/cpu.h
>@@ -72,4 +72,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
> #else
> static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
> #endif
>+
>+extern void cet_disable(void);
>+
> #endif /* _ASM_X86_CPU_H */
>diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>index f5da4a18070a..29a2a1732605 100644
>--- a/arch/x86/kernel/machine_kexec_64.c
>+++ b/arch/x86/kernel/machine_kexec_64.c
>@@ -310,6 +310,7 @@ void machine_kexec(struct kimage *image)
> /* Interrupts aren't acceptable while we reboot */
> local_irq_disable();
> hw_breakpoint_disable();
>+ cet_disable();
>
> if (image->preserve_context) {
> #ifdef CONFIG_X86_IO_APIC
--
Kees Cook
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-21 8:42 ` Kees Cook
@ 2022-02-21 9:24 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 9:24 UTC (permalink / raw)
To: Kees Cook
Cc: Edgecombe, Rick P, Poimboe, Josh, hjl.tools, x86, joao, Cooper,
Andrew, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
Milburn, Alyssa
On Mon, Feb 21, 2022 at 12:42:25AM -0800, Kees Cook wrote:
> >+void cet_disable(void)
> >+{
> >+ cr4_clear_bits(X86_CR4_CET);
>
> I'd rather keep the pinning...
Uff. Is that still enforced at this point?
> >+ wrmsrl(MSR_IA32_S_CET, 0);
> >+}
>
> Eh, why not just require kexec to be IBT safe? That seems a reasonable
> exercise if we ever expect UEFI to enforce IBT when starting the
> kernel on a normal boot...
Well, it makes it impossible to kexec into an 'old' kernel. That might
not be very nice.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 15/29] x86: Disable IBT around firmware
2022-02-21 8:27 ` Kees Cook
@ 2022-02-21 10:06 ` Peter Zijlstra
2022-02-21 13:22 ` Peter Zijlstra
2022-02-21 15:54 ` Kees Cook
0 siblings, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 10:06 UTC (permalink / raw)
To: Kees Cook
Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn
Could you trim replies so that I can actually find what you write?
On Mon, Feb 21, 2022 at 12:27:20AM -0800, Kees Cook wrote:
> >+#ifdef CONFIG_X86_IBT
> >+
> >+u64 ibt_save(void)
> >+{
> >+ u64 msr = 0;
> >+
> >+ if (cpu_feature_enabled(X86_FEATURE_IBT)) {
> >+ rdmsrl(MSR_IA32_S_CET, msr);
> >+ wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
> >+ }
> >+
> >+ return msr;
> >+}
> >+
> >+void ibt_restore(u64 save)
>
> Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...
Either that or mark them __noendbr. The below seems to work.
Do we have a preference?
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -48,8 +48,8 @@ static inline bool is_endbr(const void *
return val == gen_endbr64();
}
-extern u64 ibt_save(void);
-extern void ibt_restore(u64 save);
+extern __noendbr u64 ibt_save(void);
+extern __noendbr void ibt_restore(u64 save);
#else /* __ASSEMBLY__ */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -596,7 +596,7 @@ __setup("nopku", setup_disable_pku);
#ifdef CONFIG_X86_IBT
-u64 ibt_save(void)
+__noendbr u64 ibt_save(void)
{
u64 msr = 0;
@@ -608,7 +608,7 @@ u64 ibt_save(void)
return msr;
}
-void ibt_restore(u64 save)
+__noendbr void ibt_restore(u64 save)
{
u64 msr;
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 15/29] x86: Disable IBT around firmware
2022-02-21 10:06 ` Peter Zijlstra
@ 2022-02-21 13:22 ` Peter Zijlstra
2022-02-21 15:54 ` Kees Cook
1 sibling, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 13:22 UTC (permalink / raw)
To: Kees Cook
Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn
On Mon, Feb 21, 2022 at 11:06:15AM +0100, Peter Zijlstra wrote:
>
> Could you trim replies so that I can actually find what you write?
>
> On Mon, Feb 21, 2022 at 12:27:20AM -0800, Kees Cook wrote:
>
> > >+#ifdef CONFIG_X86_IBT
> > >+
> > >+u64 ibt_save(void)
> > >+{
> > >+ u64 msr = 0;
> > >+
> > >+ if (cpu_feature_enabled(X86_FEATURE_IBT)) {
> > >+ rdmsrl(MSR_IA32_S_CET, msr);
> > >+ wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
> > >+ }
> > >+
> > >+ return msr;
> > >+}
> > >+
> > >+void ibt_restore(u64 save)
> >
> > Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...
>
> Either that or mark them __noendbr. The below seems to work.
>
> Do we have a preference?
The inline thing runs into header hell..
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
2022-02-18 23:07 ` Andrew Cooper
@ 2022-02-21 14:20 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 14:20 UTC (permalink / raw)
To: Andrew Cooper
Cc: x86, joao, hjl.tools, jpoimboe, Juergen Gross, linux-kernel,
ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Andy Lutomirski
On Fri, Feb 18, 2022 at 11:07:15PM +0000, Andrew Cooper wrote:
> or so, but my point is that the early Xen code, if it can identify this
> patch point separate to the list of everything, can easily arrange for
> it to be modified before HYPERCALL_set_trap_table (Xen PV's LIDT), and
> then return_to_kernel is in its fully configured state (paravirt or
> otherwise) before interrupts/exceptions can be taken.
I ended up with the below... still a bit of a hack, and I wonder if the
asm version you did isn't saner..
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -619,8 +619,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_
/* Restore RDI. */
popq %rdi
- SWAPGS
- INTERRUPT_RETURN
+ swapgs
+ jmp .Lnative_iret
SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
@@ -637,11 +637,16 @@ SYM_INNER_LABEL(restore_regs_and_return_
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
* when returning from IPI handler.
*/
- INTERRUPT_RETURN
+#ifdef CONFIG_XEN_PV
+SYM_INNER_LABEL(early_xen_iret_patch, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
+ .byte 0xe9
+ .long .Lnative_iret - (. + 4)
+#endif
-SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
+.Lnative_iret:
UNWIND_HINT_IRET_REGS
- ENDBR // paravirt_iret
+ ANNOTATE_NOENDBR
/*
* Are we returning to a stack segment from the LDT? Note: in
* 64-bit mode SS:RSP on the exception stack is always valid.
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -141,13 +141,8 @@ static __always_inline void arch_local_i
#ifdef CONFIG_X86_64
#ifdef CONFIG_XEN_PV
#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
-#define INTERRUPT_RETURN \
- ANNOTATE_RETPOLINE_SAFE; \
- ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);", \
- X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
#else
#define SWAPGS swapgs
-#define INTERRUPT_RETURN jmp native_iret
#endif
#endif
#endif /* !__ASSEMBLY__ */
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -272,7 +272,6 @@ struct paravirt_patch_template {
extern struct pv_info pv_info;
extern struct paravirt_patch_template pv_ops;
-extern void (*paravirt_iret)(void);
#define PARAVIRT_PATCH(x) \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -350,7 +350,6 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
/* Remove Error Code */
addq $8, %rsp
- /* Pure iret required here - don't use INTERRUPT_RETURN */
iretq
SYM_CODE_END(vc_boot_ghcb)
#endif
@@ -435,6 +434,8 @@ SYM_CODE_END(early_idt_handler_common)
* early_idt_handler_array can't be used because it returns via the
* paravirtualized INTERRUPT_RETURN and pv-ops don't work that early.
*
+ * XXX it does, fix this.
+ *
* This handler will end up in the .init.text section and not be
* available to boot secondary CPUs.
*/
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -151,8 +151,6 @@ void paravirt_set_sched_clock(u64 (*func
}
/* These are in entry.S */
-extern void native_iret(void);
-
static struct resource reserve_ioports = {
.start = 0,
.end = IO_SPACE_LIMIT,
@@ -416,8 +414,6 @@ struct paravirt_patch_template pv_ops =
#ifdef CONFIG_PARAVIRT_XXL
NOKPROBE_SYMBOL(native_load_idt);
-
-void (*paravirt_iret)(void) = native_iret;
#endif
EXPORT_SYMBOL(pv_ops);
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1178,9 +1178,13 @@ static void __init xen_domu_set_legacy_f
x86_platform.legacy.rtc = 0;
}
+extern void early_xen_iret_patch(void);
+
/* First C function to be called on Xen boot */
asmlinkage __visible void __init xen_start_kernel(void)
{
+ void *early_xen_iret = &early_xen_iret_patch;
+ void *xen_iret_dest = &xen_iret;
struct physdev_set_iopl set_iopl;
unsigned long initrd_start = 0;
int rc;
@@ -1188,6 +1192,13 @@ asmlinkage __visible void __init xen_sta
if (!xen_start_info)
return;
+ OPTIMIZER_HIDE_VAR(early_xen_iret);
+ OPTIMIZER_HIDE_VAR(xen_iret_dest);
+
+ memcpy(early_xen_iret,
+ text_gen_insn(JMP32_INSN_OPCODE, early_xen_iret, xen_iret_dest),
+ JMP32_INSN_SIZE);
+
xen_domain_type = XEN_PV_DOMAIN;
xen_start_flags = xen_start_info->flags;
@@ -1196,7 +1207,6 @@ asmlinkage __visible void __init xen_sta
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_ops.cpu = xen_cpu_ops.cpu;
- paravirt_iret = xen_iret;
xen_init_irq_ops();
/*
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -193,7 +193,7 @@ hypercall_iret = hypercall_page + __HYPE
*/
SYM_CODE_START(xen_iret)
UNWIND_HINT_EMPTY
- ENDBR
+ ANNOTATE_NOENDBR
pushq $0
jmp hypercall_iret
SYM_CODE_END(xen_iret)
^ permalink raw reply [flat|nested] 94+ messages in thread
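For reference, the text_gen_insn(JMP32_INSN_OPCODE, ...) call and the hand-rolled ".byte 0xe9; .long .Lnative_iret - (. + 4)" in the message above both produce the same 5-byte rel32 jump, with the displacement measured from the end of the instruction. A hypothetical encoder sketch of that layout (not the kernel's actual text_gen_insn()):

```python
import struct

JMP32_INSN_OPCODE = 0xE9
JMP32_INSN_SIZE = 5

def gen_jmp32(addr: int, dest: int) -> bytes:
    """Encode 'jmp rel32' placed at addr, targeting dest. The rel32
    field is relative to the end of the 5-byte instruction, matching
    the '.long target - (. + 4)' idiom in the asm above."""
    rel = dest - (addr + JMP32_INSN_SIZE)
    return bytes([JMP32_INSN_OPCODE]) + struct.pack("<i", rel)

insn = gen_jmp32(0x1000, 0x2000)
assert len(insn) == JMP32_INSN_SIZE
assert insn == b"\xe9\xfb\x0f\x00\x00"  # disp = 0x2000 - 0x1005 = 0xffb
```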
* Re: [PATCH 15/29] x86: Disable IBT around firmware
2022-02-21 10:06 ` Peter Zijlstra
2022-02-21 13:22 ` Peter Zijlstra
@ 2022-02-21 15:54 ` Kees Cook
2022-02-21 16:10 ` Peter Zijlstra
1 sibling, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-21 15:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn
On February 21, 2022 2:06:15 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>
>Could you trim replies so that I can actually find what you write?
Sorry, yes; I was on my phone where the interface is awkward.
>On Mon, Feb 21, 2022 at 12:27:20AM -0800, Kees Cook wrote:
>> Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...
>
>Either that or mark them __noendbr. The below seems to work.
>
>Do we have a preference?
Ah yeah, that works for me.
A small bike shed: should __noendbr have an alias, like __never_indirect or something, so there is an arch-agnostic way to do this that actually says what it does? (yes, it's in x86-only code now, hence the bike shed...)
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 15/29] x86: Disable IBT around firmware
2022-02-21 15:54 ` Kees Cook
@ 2022-02-21 16:10 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 16:10 UTC (permalink / raw)
To: Kees Cook
Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn
On Mon, Feb 21, 2022 at 07:54:55AM -0800, Kees Cook wrote:
> A small bike shed: should __noendbr have an alias, like
> __never_indirect or something, so there is an arch-agnostic way to do
> this that actually says what it does? (yes, it's in x86-only code now,
> hence the bike shed...)
I actually asked Mark a related question last week somewhere, I think
the answer was that the annotation either wasn't working or not as
useful on ARM64.
I'm thinking it's easy enough to do a mass rename if/when we cross that
bridge though.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
` (3 preceding siblings ...)
2022-02-21 8:24 ` Kees Cook
@ 2022-02-22 4:38 ` Edgecombe, Rick P
2022-02-22 9:32 ` Peter Zijlstra
4 siblings, 1 reply; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-22 4:38 UTC (permalink / raw)
To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
Milburn, Alyssa
On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> + cr4_set_bits(X86_CR4_CET);
> +
> + rdmsrl(MSR_IA32_S_CET, msr);
> + if (cpu_feature_enabled(X86_FEATURE_IBT))
> + msr |= CET_ENDBR_EN;
> + wrmsrl(MSR_IA32_S_CET, msr);
So I guess implicit in all of this is that MSR_IA32_S_CET will not be
managed by xsaves (makes sense).
But it still might be good to add the supervisor cet xfeature number to
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, with analogous reasoning to
XFEATURE_MASK_PT.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
2022-02-22 4:38 ` Edgecombe, Rick P
@ 2022-02-22 9:32 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-22 9:32 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew, keescook,
linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn,
Alyssa
On Tue, Feb 22, 2022 at 04:38:22AM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> > + cr4_set_bits(X86_CR4_CET);
> > +
> > + rdmsrl(MSR_IA32_S_CET, msr);
> > + if (cpu_feature_enabled(X86_FEATURE_IBT))
> > + msr |= CET_ENDBR_EN;
> > + wrmsrl(MSR_IA32_S_CET, msr);
>
> So I guess implicit in all of this is that MSR_IA32_S_CET will not be
> managed by xsaves (makes sense).
>
> But it still might be good to add the supervisor cet xfeature number to
> XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, with analogous reasoning to
> XFEATURE_MASK_PT.
Yeah, no, I'm not touching that.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
2022-02-19 2:15 ` Josh Poimboeuf
@ 2022-02-22 15:00 ` Peter Zijlstra
2022-02-25 0:19 ` Josh Poimboeuf
0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-22 15:00 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 06:15:30PM -0800, Josh Poimboeuf wrote:
> This code is confusing, not helped by the fact that the existing code
> already looks like spaghetti.
I'd say that's an insult to spaghetti.
> Assuming IBT systems also have eIBRS (right?), I don't think the above
> SPECTRE_V2_CMD_{FORCE,AUTO} cases would be possible.
Virt FTW.. if I don't handle it, some idiot will create a virtual
machine that doesn't expose eIBRS but does do IBT just to spite me.
> AFAICT, if execution reached the retpoline_generic label, the user
> specified either RETPOLINE or RETPOLINE_GENERIC.
Only RETPOLINE_GENERIC;
> I'm not sure it makes sense to put RETPOLINE in the "silent" list. If
> the user boots an Intel system with spectre_v2=retpoline on the cmdline,
> they're probably expecting a traditional retpoline and should be warned
> if that changes, especially if it's a "demotion".
too friggin bad as to expectations; retpoline == auto. Not saying that
makes sense, just saying that's what it does.
> In that case the switch statement isn't even needed. It can instead
> just unconditionally print the warning.
>
>
> Also, why "demote" retpoline to LFENCE rather than attempting to
> "promote" it to eIBRS? Maybe there's a good reason but it probably at
> least deserves some mention in the commit log.
The current code will never select retpoline if eibrs is available.
The alternative is doing this in apply_retpolines(), but that might be
even more nasty.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-19 1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
2022-02-19 9:58 ` Peter Zijlstra
@ 2022-02-23 7:26 ` Kees Cook
2022-02-24 16:47 ` Mike Rapoport
1 sibling, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-23 7:26 UTC (permalink / raw)
To: Edgecombe, Rick P, Poimboe, Josh, peterz, hjl.tools, x86, joao,
Cooper, Andrew
Cc: linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn, Alyssa
On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> This is an (almost!) complete Kernel IBT implementation.
BTW, I've successfully tested this on what /proc/cpuinfo calls an "11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz" (in a Lenovo "Yoga 7 15ITL5"). Normal laptop-y things all seem happy and it correctly blows up on a new LKDTM test I'll send out tomorrow.
So, even though the series is young and has some TODOs still:
Tested-by: Kees Cook <keescook@chromium.org>
One thought: should there be a note in dmesg about it being active? The only way to see it is finding "ibt" in cpuinfo...
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 94+ messages in thread
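Until such a dmesg note exists, checking comes down to parsing /proc/cpuinfo as Kees describes. A minimal sketch (the flag name "ibt" is assumed here from the X86_FEATURE_IBT entry in patch 14; cpufeatures without an explicit string default to the lowercased name):

```python
def has_ibt(cpuinfo: str) -> bool:
    """Return True if any cpuinfo 'flags' line advertises ibt."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags") and "ibt" in line.split():
            return True
    return False

# On a live system one would pass open("/proc/cpuinfo").read().
assert has_ibt("flags\t\t: fpu vme ibt smap")
assert not has_ibt("flags\t\t: fpu vme smap")
```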
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-18 21:08 ` Josh Poimboeuf
@ 2022-02-23 10:09 ` Peter Zijlstra
2022-02-23 10:21 ` Miroslav Benes
2022-02-23 10:57 ` Peter Zijlstra
0 siblings, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 10:09 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn,
Miroslav Benes, Steven Rostedt
On Fri, Feb 18, 2022 at 01:08:31PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:06PM +0100, Peter Zijlstra wrote:
> > Currently livepatch assumes __fentry__ lives at func+0, which is most
> > likely untrue with IBT on. Override the weak klp_get_ftrace_location()
> > function with an arch specific version that's IBT aware.
> >
> > Also make the weak fallback verify the location is an actual ftrace
> > location as a sanity check.
> >
> > Suggested-by: Miroslav Benes <mbenes@suse.cz>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > arch/x86/include/asm/livepatch.h | 9 +++++++++
> > kernel/livepatch/patch.c | 2 +-
> > 2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > --- a/arch/x86/include/asm/livepatch.h
> > +++ b/arch/x86/include/asm/livepatch.h
> > @@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
> > ftrace_instruction_pointer_set(fregs, ip);
> > }
> >
> > +#define klp_get_ftrace_location klp_get_ftrace_location
> > +static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> > +{
> > + unsigned long addr = ftrace_location(faddr);
> > + if (!addr && IS_ENABLED(CONFIG_X86_IBT))
> > + addr = ftrace_location(faddr + 4);
> > + return addr;
>
> I'm kind of surprised this logic doesn't exist in ftrace itself. Is
> livepatch really the only user that needs to find the fentry for a given
> function?
>
> I had to do a double take for the ftrace_location() semantics, as I
> originally assumed that's what it did, based on its name and signature.
>
> Instead it apparently functions like a bool but returns its argument on
> success.
>
> Though the function comment tells a different story:
>
> /**
> * ftrace_location - return true if the ip giving is a traced location
>
> So it's all kinds of confusing...
Yes.. so yesterday, when making function-graph tracing not explode, I
ran into a similar issue. Steve suggested something along the lines of
.... this.
(modified from his actual suggestion to also cover this case)
Let me go try this...
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
*/
unsigned long ftrace_location(unsigned long ip)
{
- return ftrace_location_range(ip, ip);
+ struct dyn_ftrace *rec;
+ unsigned long offset;
+ unsigned long size;
+
+ rec = lookup_rec(ip, ip);
+ if (!rec) {
+ if (!kallsyms_lookup(ip, &size, &offset, NULL, NULL))
+ goto out;
+
+ rec = lookup_rec(ip - offset, (ip - offset) + size);
+ }
+
+ if (rec)
+ return rec->ip;
+
+out:
+ return 0;
}
/**
@@ -5110,11 +5126,16 @@ int register_ftrace_direct(unsigned long
struct ftrace_func_entry *entry;
struct ftrace_hash *free_hash = NULL;
struct dyn_ftrace *rec;
- int ret = -EBUSY;
+ int ret = -ENODEV;
mutex_lock(&direct_mutex);
+ ip = ftrace_location(ip);
+ if (!ip)
+ goto out_unlock;
+
/* See if there's a direct function at @ip already */
+ ret = -EBUSY;
if (ftrace_find_rec_direct(ip))
goto out_unlock;
@@ -5222,6 +5243,10 @@ int unregister_ftrace_direct(unsigned lo
mutex_lock(&direct_mutex);
+ ip = ftrace_location(ip);
+ if (!ip)
+ goto out_unlock;
+
entry = find_direct_entry(&ip, NULL);
if (!entry)
goto out_unlock;
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 10:09 ` Peter Zijlstra
@ 2022-02-23 10:21 ` Miroslav Benes
2022-02-23 10:57 ` Peter Zijlstra
1 sibling, 0 replies; 94+ messages in thread
From: Miroslav Benes @ 2022-02-23 10:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Steven Rostedt
On Wed, 23 Feb 2022, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 01:08:31PM -0800, Josh Poimboeuf wrote:
> > On Fri, Feb 18, 2022 at 05:49:06PM +0100, Peter Zijlstra wrote:
> > > Currently livepatch assumes __fentry__ lives at func+0, which is most
> > > likely untrue with IBT on. Override the weak klp_get_ftrace_location()
> > > function with an arch specific version that's IBT aware.
> > >
> > > Also make the weak fallback verify the location is an actual ftrace
> > > location as a sanity check.
> > >
> > > Suggested-by: Miroslav Benes <mbenes@suse.cz>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > arch/x86/include/asm/livepatch.h | 9 +++++++++
> > > kernel/livepatch/patch.c | 2 +-
> > > 2 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > --- a/arch/x86/include/asm/livepatch.h
> > > +++ b/arch/x86/include/asm/livepatch.h
> > > @@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
> > > ftrace_instruction_pointer_set(fregs, ip);
> > > }
> > >
> > > +#define klp_get_ftrace_location klp_get_ftrace_location
> > > +static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> > > +{
> > > + unsigned long addr = ftrace_location(faddr);
> > > + if (!addr && IS_ENABLED(CONFIG_X86_IBT))
> > > + addr = ftrace_location(faddr + 4);
> > > + return addr;
> >
> > I'm kind of surprised this logic doesn't exist in ftrace itself. Is
> > livepatch really the only user that needs to find the fentry for a given
> > function?
> >
> > I had to do a double take for the ftrace_location() semantics, as I
> > originally assumed that's what it did, based on its name and signature.
> >
> > Instead it apparently functions like a bool but returns its argument on
> > success.
> >
> > Though the function comment tells a different story:
> >
> > /**
> > * ftrace_location - return true if the ip giving is a traced location
> >
> > So it's all kinds of confusing...
>
> Yes.. so yesterday, when making function-graph tracing not explode, I
> ran into a similar issue. Steve suggested something along the lines of
> .... this.
>
> (modified from his actual suggestion to also cover this case)
>
> Let me go try this...
Yes, this looks good.
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> */
> unsigned long ftrace_location(unsigned long ip)
> {
> - return ftrace_location_range(ip, ip);
> + struct dyn_ftrace *rec;
> + unsigned long offset;
> + unsigned long size;
> +
> + rec = lookup_rec(ip, ip);
> + if (!rec) {
> + if (!kallsyms_lookup(ip, &size, &offset, NULL, NULL))
Since we do not care about a symbol name, kallsyms_lookup_size_offset()
would be better I think.
Miroslav
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 10:09 ` Peter Zijlstra
2022-02-23 10:21 ` Miroslav Benes
@ 2022-02-23 10:57 ` Peter Zijlstra
2022-02-23 12:41 ` Steven Rostedt
1 sibling, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 10:57 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn,
Miroslav Benes, Steven Rostedt
On Wed, Feb 23, 2022 at 11:09:44AM +0100, Peter Zijlstra wrote:
> Yes.. so yesterday, when making function-graph tracing not explode, I
> ran into a similar issue. Steve suggested something along the lines of
> .... this.
>
> (modified from his actual suggestion to also cover this case)
>
> Let me go try this...
This one actually works...
---
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
*/
unsigned long ftrace_location(unsigned long ip)
{
- return ftrace_location_range(ip, ip);
+ struct dyn_ftrace *rec;
+ unsigned long offset;
+ unsigned long size;
+
+ rec = lookup_rec(ip, ip);
+ if (!rec) {
+ if (!kallsyms_lookup_size_offset(ip, &size, &offset))
+ goto out;
+
+ rec = lookup_rec(ip - offset, (ip - offset) + size);
+ }
+
+ if (rec)
+ return rec->ip;
+
+out:
+ return 0;
}
/**
@@ -5110,11 +5126,16 @@ int register_ftrace_direct(unsigned long
struct ftrace_func_entry *entry;
struct ftrace_hash *free_hash = NULL;
struct dyn_ftrace *rec;
- int ret = -EBUSY;
+ int ret = -ENODEV;
mutex_lock(&direct_mutex);
+ ip = ftrace_location(ip);
+ if (!ip)
+ goto out_unlock;
+
/* See if there's a direct function at @ip already */
+ ret = -EBUSY;
if (ftrace_find_rec_direct(ip))
goto out_unlock;
@@ -5222,6 +5243,10 @@ int unregister_ftrace_direct(unsigned lo
mutex_lock(&direct_mutex);
+ ip = ftrace_location(ip);
+ if (!ip)
+ goto out_unlock;
+
entry = find_direct_entry(&ip, NULL);
if (!entry)
goto out_unlock;
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 10:57 ` Peter Zijlstra
@ 2022-02-23 12:41 ` Steven Rostedt
2022-02-23 14:05 ` Peter Zijlstra
2022-02-23 14:23 ` Steven Rostedt
0 siblings, 2 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 12:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes
On Wed, 23 Feb 2022 11:57:26 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> */
> unsigned long ftrace_location(unsigned long ip)
> {
> - return ftrace_location_range(ip, ip);
> + struct dyn_ftrace *rec;
> + unsigned long offset;
> + unsigned long size;
> +
> + rec = lookup_rec(ip, ip);
> + if (!rec) {
> + if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> + goto out;
> +
> + rec = lookup_rec(ip - offset, (ip - offset) + size);
> + }
> +
Please create a new function for this. Perhaps find_ftrace_location().
ftrace_location() is used to see if the address given is a ftrace
nop or not. This change will make it always return true.
-- Steve
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 12:41 ` Steven Rostedt
@ 2022-02-23 14:05 ` Peter Zijlstra
2022-02-23 14:16 ` Steven Rostedt
2022-02-23 14:23 ` Steven Rostedt
1 sibling, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 14:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes
On Wed, Feb 23, 2022 at 07:41:39AM -0500, Steven Rostedt wrote:
> On Wed, 23 Feb 2022 11:57:26 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> > */
> > unsigned long ftrace_location(unsigned long ip)
> > {
> > - return ftrace_location_range(ip, ip);
> > + struct dyn_ftrace *rec;
> > + unsigned long offset;
> > + unsigned long size;
> > +
> > + rec = lookup_rec(ip, ip);
> > + if (!rec) {
> > + if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > + goto out;
> > +
> > + rec = lookup_rec(ip - offset, (ip - offset) + size);
> > + }
> > +
>
> Please create a new function for this. Perhaps find_ftrace_location().
>
> ftrace_location() is used to see if the address given is a ftrace
> nop or not. This change will make it always return true.
>
# git grep ftrace_location
arch/powerpc/include/asm/livepatch.h:#define klp_get_ftrace_location klp_get_ftrace_location
arch/powerpc/include/asm/livepatch.h:static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
arch/powerpc/include/asm/livepatch.h: return ftrace_location_range(faddr, faddr + 16);
arch/powerpc/kernel/kprobes.c: faddr = ftrace_location_range((unsigned long)addr,
arch/x86/kernel/kprobes/core.c: faddr = ftrace_location(addr);
arch/x86/kernel/kprobes/core.c: * arch_check_ftrace_location(). Something went terribly wrong
include/linux/ftrace.h:unsigned long ftrace_location(unsigned long ip);
include/linux/ftrace.h:unsigned long ftrace_location_range(unsigned long start, unsigned long end);
include/linux/ftrace.h:static inline unsigned long ftrace_location(unsigned long ip)
kernel/bpf/trampoline.c:static int is_ftrace_location(void *ip)
kernel/bpf/trampoline.c: addr = ftrace_location((long)ip);
kernel/bpf/trampoline.c: ret = is_ftrace_location(ip);
kernel/kprobes.c: unsigned long faddr = ftrace_location((unsigned long)addr);
kernel/kprobes.c:static int check_ftrace_location(struct kprobe *p)
kernel/kprobes.c: ftrace_addr = ftrace_location((unsigned long)p->addr);
kernel/kprobes.c: ret = check_ftrace_location(p);
kernel/livepatch/patch.c:#ifndef klp_get_ftrace_location
kernel/livepatch/patch.c:static unsigned long klp_get_ftrace_location(unsigned long faddr)
kernel/livepatch/patch.c: return ftrace_location(faddr);
kernel/livepatch/patch.c: klp_get_ftrace_location((unsigned long)func->old_func);
kernel/livepatch/patch.c: klp_get_ftrace_location((unsigned long)func->old_func);
kernel/trace/ftrace.c: * ftrace_location_range - return the first address of a traced location
kernel/trace/ftrace.c:unsigned long ftrace_location_range(unsigned long start, unsigned long end)
kernel/trace/ftrace.c: * ftrace_location - return true if the ip giving is a traced location
kernel/trace/ftrace.c:unsigned long ftrace_location(unsigned long ip)
kernel/trace/ftrace.c: ret = ftrace_location_range((unsigned long)start,
kernel/trace/ftrace.c: if (!ftrace_location(ip))
kernel/trace/ftrace.c: ip = ftrace_location(ip);
kernel/trace/ftrace.c: ip = ftrace_location(ip);
kernel/trace/trace_kprobe.c: * Since ftrace_location_range() does inclusive range check, we need
kernel/trace/trace_kprobe.c: return !ftrace_location_range(addr, addr + size - 1);
and yet almost every caller takes the address it returns...
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 14:05 ` Peter Zijlstra
@ 2022-02-23 14:16 ` Steven Rostedt
0 siblings, 0 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 14:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes
On Wed, 23 Feb 2022 15:05:42 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Feb 23, 2022 at 07:41:39AM -0500, Steven Rostedt wrote:
> > On Wed, 23 Feb 2022 11:57:26 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> > > */
> > > unsigned long ftrace_location(unsigned long ip)
> > > {
> > > - return ftrace_location_range(ip, ip);
> > > + struct dyn_ftrace *rec;
> > > + unsigned long offset;
> > > + unsigned long size;
> > > +
> > > + rec = lookup_rec(ip, ip);
> > > + if (!rec) {
> > > + if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > + goto out;
> > > +
> > > + rec = lookup_rec(ip - offset, (ip - offset) + size);
> > > + }
> > > +
> >
> > Please create a new function for this. Perhaps find_ftrace_location().
> >
> > ftrace_location() is used to see if the address given is a ftrace
> > nop or not. This change will make it always return true.
> >
>
> # git grep ftrace_location
> arch/powerpc/include/asm/livepatch.h:#define klp_get_ftrace_location klp_get_ftrace_location
> arch/powerpc/include/asm/livepatch.h:static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> arch/powerpc/include/asm/livepatch.h: return ftrace_location_range(faddr, faddr + 16);
> arch/powerpc/kernel/kprobes.c: faddr = ftrace_location_range((unsigned long)addr,
> arch/x86/kernel/kprobes/core.c: faddr = ftrace_location(addr);
> arch/x86/kernel/kprobes/core.c: * arch_check_ftrace_location(). Something went terribly wrong
> include/linux/ftrace.h:unsigned long ftrace_location(unsigned long ip);
> include/linux/ftrace.h:unsigned long ftrace_location_range(unsigned long start, unsigned long end);
> include/linux/ftrace.h:static inline unsigned long ftrace_location(unsigned long ip)
> kernel/bpf/trampoline.c:static int is_ftrace_location(void *ip)
> kernel/bpf/trampoline.c: addr = ftrace_location((long)ip);
> kernel/bpf/trampoline.c: ret = is_ftrace_location(ip);
> kernel/kprobes.c: unsigned long faddr = ftrace_location((unsigned long)addr);
> kernel/kprobes.c:static int check_ftrace_location(struct kprobe *p)
> kernel/kprobes.c: ftrace_addr = ftrace_location((unsigned long)p->addr);
> kernel/kprobes.c: ret = check_ftrace_location(p);
> kernel/livepatch/patch.c:#ifndef klp_get_ftrace_location
> kernel/livepatch/patch.c:static unsigned long klp_get_ftrace_location(unsigned long faddr)
> kernel/livepatch/patch.c: return ftrace_location(faddr);
> kernel/livepatch/patch.c: klp_get_ftrace_location((unsigned long)func->old_func);
> kernel/livepatch/patch.c: klp_get_ftrace_location((unsigned long)func->old_func);
> kernel/trace/ftrace.c: * ftrace_location_range - return the first address of a traced location
> kernel/trace/ftrace.c:unsigned long ftrace_location_range(unsigned long start, unsigned long end)
> kernel/trace/ftrace.c: * ftrace_location - return true if the ip giving is a traced location
> kernel/trace/ftrace.c:unsigned long ftrace_location(unsigned long ip)
> kernel/trace/ftrace.c: ret = ftrace_location_range((unsigned long)start,
> kernel/trace/ftrace.c: if (!ftrace_location(ip))
> kernel/trace/ftrace.c: ip = ftrace_location(ip);
> kernel/trace/ftrace.c: ip = ftrace_location(ip);
> kernel/trace/trace_kprobe.c: * Since ftrace_location_range() does inclusive range check, we need
> kernel/trace/trace_kprobe.c: return !ftrace_location_range(addr, addr + size - 1);
>
> and yet almost every caller takes the address it returns...
And they check if the returned value is 0 or not. If it is zero, it lets
them know it isn't an ftrace location.
-- Steve
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 12:41 ` Steven Rostedt
2022-02-23 14:05 ` Peter Zijlstra
@ 2022-02-23 14:23 ` Steven Rostedt
2022-02-23 14:33 ` Steven Rostedt
2022-02-23 14:49 ` Peter Zijlstra
1 sibling, 2 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 14:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
Alexei Starovoitov
On Wed, 23 Feb 2022 07:41:39 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> > */
> > unsigned long ftrace_location(unsigned long ip)
> > {
> > - return ftrace_location_range(ip, ip);
> > + struct dyn_ftrace *rec;
> > + unsigned long offset;
> > + unsigned long size;
> > +
> > + rec = lookup_rec(ip, ip);
> > + if (!rec) {
> > + if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > + goto out;
> > +
> > + rec = lookup_rec(ip - offset, (ip - offset) + size);
> > + }
> > +
>
> Please create a new function for this. Perhaps find_ftrace_location().
>
> ftrace_location() is used to see if the address given is a ftrace
> nop or not. This change will make it always return true.
Now we could do:
return ip <= (rec->ip + MCOUNT_INSN_SIZE) ? rec->ip : 0;
Since we would want rec->ip if the pointer is before the ftrace
instruction. But we would need to audit all use cases and make sure this is
not called from any hot paths (in a callback).
This will affect kprobes and BPF as they both use ftrace_location() as well.
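[For reference, the bounds check being suggested here can be modeled in
user space roughly as follows. This is a minimal sketch, not the actual
kernel code: the struct, MCOUNT_INSN_SIZE value, and helper name are
stand-ins, and it assumes @rec was already found by a range lookup over
the function containing @ip.]

```c
#include <assert.h>

#define MCOUNT_INSN_SIZE 5 /* stand-in: size of the call/nop patch site */

/* Stand-in for the kernel's dyn_ftrace record; ip is the patch site. */
struct dyn_ftrace {
	unsigned long ip;
};

/*
 * Accept @ip when it points at or into the mcount instruction (e.g.
 * func+0 while the patch site sits at func+4 behind an ENDBR), but keep
 * rejecting addresses past it, so the function still works as an
 * "is this an ftrace patch site?" test.  Uses '<' per the follow-up
 * correction, since '<=' would also accept the byte just past the insn.
 */
static unsigned long check_rec(const struct dyn_ftrace *rec, unsigned long ip)
{
	if (!rec)
		return 0;
	return ip < rec->ip + MCOUNT_INSN_SIZE ? rec->ip : 0;
}
```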
-- Steve
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 14:23 ` Steven Rostedt
@ 2022-02-23 14:33 ` Steven Rostedt
2022-02-23 14:49 ` Peter Zijlstra
1 sibling, 0 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 14:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
Alexei Starovoitov
On Wed, 23 Feb 2022 09:23:27 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> return ip <= (rec->ip + MCOUNT_INSN_SIZE) ? rec->ip : 0;
That should be < and not <=, as I added the + MCOUNT_INSN_SIZE as an
afterthought, and that addition changes the compare.
-- Steve
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 14:23 ` Steven Rostedt
2022-02-23 14:33 ` Steven Rostedt
@ 2022-02-23 14:49 ` Peter Zijlstra
2022-02-23 15:54 ` Peter Zijlstra
1 sibling, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 14:49 UTC (permalink / raw)
To: Steven Rostedt
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
Alexei Starovoitov
On Wed, Feb 23, 2022 at 09:23:27AM -0500, Steven Rostedt wrote:
> On Wed, 23 Feb 2022 07:41:39 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> > > */
> > > unsigned long ftrace_location(unsigned long ip)
> > > {
> > > - return ftrace_location_range(ip, ip);
> > > + struct dyn_ftrace *rec;
> > > + unsigned long offset;
> > > + unsigned long size;
> > > +
> > > + rec = lookup_rec(ip, ip);
> > > + if (!rec) {
> > > + if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > + goto out;
> > > +
if (!offset)
> > > + rec = lookup_rec(ip - offset, (ip - offset) + size);
> > > + }
> > > +
> >
> > Please create a new function for this. Perhaps find_ftrace_location().
> >
> > ftrace_location() is used to see if the address given is a ftrace
> > nop or not. This change will make it always return true.
>
> Now we could do:
>
> return ip <= (rec->ip + MCOUNT_INSN_SIZE) ? rec->ip : 0;
I don't see the point of that MCOUNT_INSN_SIZE there; I've done the
above. If +0, then find the entry, wherever it may be.
> Since we would want rec->ip if the pointer is before the ftrace
> instruction. But we would need to audit all use cases and make sure this is
> not called from any hot paths (in a callback).
>
> This will affect kprobes and BPF as they both use ftrace_location() as well.
Yes, I already fixed kprobes, still trying to (re)discover how to run
the bpf-selftests, that stuff is too painful :-(
* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
2022-02-23 14:49 ` Peter Zijlstra
@ 2022-02-23 15:54 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 15:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
Alexei Starovoitov
On Wed, Feb 23, 2022 at 03:49:41PM +0100, Peter Zijlstra wrote:
> > Since we would want rec->ip if the pointer is before the ftrace
> > instruction. But we would need to audit all use cases and make sure this is
> > not called from any hot paths (in a callback).
> >
> > This will affect kprobes and BPF as they both use ftrace_location() as well.
>
> Yes, I already fixed kprobes, still trying to (re)discover how to run
> the bpf-selftests, that stuff is too painful :-(
Ok, I think I managed... I'm obviously hitting the WARN_ON_ONCE() in
is_ftrace_location(). Funnily, no dead kernel, so that's something I
suppose.
Now, I'm trying to make sense of that code, but all that !ftrace_managed
code scares me to death.
At the very least __bpf_arch_text_poke() needs a bunch of help. Let me
go prod it with something sharp to see what falls out ...
* Re: [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware
2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
@ 2022-02-24 1:18 ` Joao Moreira
2022-02-24 9:10 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Joao Moreira @ 2022-02-24 1:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
> +#ifdef CONFIG_X86_IBT
> + if (is_endbr(dest))
> + dest += 4;
> +#endif
Hi, FWIW I saw this snippet trigger a bug in the jump_label infra where
the target displacement would not fit in a JMP8 operand. The behavior
was seen because clang, for whatever reason (probably a bug?), inlined an
ENDBR function along with a function, thus the JMP8 target was
incremented. I compared the faulty kernel to one compiled with GCC and
the latter won't emit/inline the ENDBR.
The displacement I'm using in my experimentation is a few bytes more
than just 4, because I'm also adding extra instrumentation that should
be skipped when not reached indirectly. Of course this is more prone to
triggering the bug, but I don't think it is impossible to happen in the
current implementation.
For these cases perhaps we can verify whether the displacement fits the
operand and, if not, simply skip the adjustment and lose the decode
cycle, which may not be a huge problem and remains semantically correct. Seems more
sensible than padding jump tables with nops. In the meantime I'll
investigate clang's behavior and if it is really a bug, I'll work on a
patch.
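[The fit check being proposed could be sketched like this in user space.
Constants and helper names are illustrative, not the actual jump_label
code: JMP8 here is opcode + rel8 (2 bytes) and ENDBR64 is 4 bytes.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP8_SIZE  2 /* opcode + rel8 */
#define ENDBR_SIZE 4 /* ENDBR64 is 4 bytes */

/* Does @dest fit the rel8 displacement of a JMP8 placed at @insn? */
static bool rel8_fits(unsigned long insn, unsigned long dest)
{
	long disp = (long)dest - (long)(insn + JMP8_SIZE);

	return disp >= INT8_MIN && disp <= INT8_MAX;
}

/*
 * Skip the ENDBR at @dest only when the adjusted target still fits the
 * operand; otherwise keep the original target and eat the extra decode
 * cycle for the ENDBR.
 */
static unsigned long adjust_dest(unsigned long insn, unsigned long dest,
				 bool dest_is_endbr)
{
	unsigned long skipped = dest + ENDBR_SIZE;

	if (dest_is_endbr && rel8_fits(insn, skipped))
		return skipped;
	return dest;
}
```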
* Re: [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware
2022-02-24 1:18 ` Joao Moreira
@ 2022-02-24 9:10 ` Peter Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-24 9:10 UTC (permalink / raw)
To: Joao Moreira
Cc: x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
On Wed, Feb 23, 2022 at 05:18:04PM -0800, Joao Moreira wrote:
> > +#ifdef CONFIG_X86_IBT
> > + if (is_endbr(dest))
> > + dest += 4;
> > +#endif
>
> Hi, FWIW I saw this snippet trigger a bug in the jump_label infra where the
> target displacement would not fit in a JMP8 operand.
Bah, I was afraid of seeing that :/
> For these cases perhaps we can verify if the displacement fits the operand
> and, if not, simply ignore and lose the decode cycle which may not be a huge
> problem and remains semantically correct. Seems more sensible than padding
> jump tables with nops. In the meantime I'll investigate clang's behavior and
> if it is really a bug, I'll work on a patch.
Urgh, trouble is, we're going to be re-writing a bunch of ENDBR to be
UD1 0x0(%eax),%eax, and you really don't want to try and execute those.
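[For reference, the is_endbr() test from the quoted snippet can be
modeled in user space as below. ENDBR64 encodes as F3 0F 1E FA; the
sketch assumes a little-endian load and deliberately ignores the
poisoned form discussed above, which the real kernel helper also has to
recognize.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ENDBR64 = F3 0F 1E FA, i.e. 0xfa1e0ff3 read as a little-endian u32. */
#define ENDBR64_INSN 0xfa1e0ff3u

static bool is_endbr(const void *addr)
{
	uint32_t insn;

	memcpy(&insn, addr, sizeof(insn));
	return insn == ENDBR64_INSN;
}

/* Direct calls/jumps may skip the 4-byte ENDBR at the target. */
static unsigned long skip_endbr(unsigned long dest, const void *text)
{
	return is_endbr(text) ? dest + 4 : dest;
}
```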
* Re: [PATCH 00/29] x86: Kernel IBT
2022-02-23 7:26 ` Kees Cook
@ 2022-02-24 16:47 ` Mike Rapoport
0 siblings, 0 replies; 94+ messages in thread
From: Mike Rapoport @ 2022-02-24 16:47 UTC (permalink / raw)
To: Kees Cook
Cc: Edgecombe, Rick P, Poimboe, Josh, peterz, hjl.tools, x86, joao,
Cooper, Andrew, linux-kernel, mark.rutland, samitolvanen,
ndesaulniers, Milburn, Alyssa
On Tue, Feb 22, 2022 at 11:26:57PM -0800, Kees Cook wrote:
>
> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> > This is an (almost!) complete Kernel IBT implementation.
>
> BTW, I've successfully tested this on what /proc/cpuinfo calls an "11th
> Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz" (in a Lenovo "Yoga 7 15ITL5").
> Normal laptop-y things all seem happy and it correctly blows up on a new
> LKDTM test I'll send out tomorrow.
For me it boots and can build kernel on a desktop with "12th Gen Intel(R)
Core(TM) i9-12900K"
> So, even though the series is young and has some TODOs still:
>
> Tested-by: Kees Cook <keescook@chromium.org>
So, FWIW:
Tested-by: Mike Rapoport <rppt@linux.ibm.com>
> One thought: should there be a note in dmesg about it being active? The
> only way to see it is finding "ibt" in cpuinfo...
>
> -Kees
>
> --
> Kees Cook
--
Sincerely yours,
Mike.
* Re: [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
2022-02-22 15:00 ` Peter Zijlstra
@ 2022-02-25 0:19 ` Josh Poimboeuf
0 siblings, 0 replies; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-25 0:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Tue, Feb 22, 2022 at 04:00:18PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 06:15:30PM -0800, Josh Poimboeuf wrote:
>
> > This code is confusing, not helped by the fact that the existing code
> > already looks like spaghetti.
>
> I'd say that's an insult to spaghetti.
:-)
> > Assuming IBT systems also have eIBRS (right?), I don't think the above
> > SPECTRE_V2_CMD_{FORCE,AUTO} cases would be possible.
>
> Virt FTW.. if I don't handle it, some idiot will create a virtual
> machine that doesn't expose eIBRS but does do IBT just to spite me.
Ok, but in such a case, why not still do the warning, since the spectre
v2 mitigation isn't what the user might expect based on previous
behavior?
>
> > AFAICT, if execution reached the retpoline_generic label, the user
> > specified either RETPOLINE or RETPOLINE_GENERIC.
>
> Only RETPOLINE_GENERIC;
Hm?
case SPECTRE_V2_CMD_RETPOLINE:
if (IS_ENABLED(CONFIG_RETPOLINE))
goto retpoline_auto;
retpoline_auto:
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
...
} else {
retpoline_generic:
> > I'm not sure it makes sense to put RETPOLINE in the "silent" list. If
> > the user boots an Intel system with spectre_v2=retpoline on the cmdline,
> > they're probably expecting a traditional retpoline and should be warned
> > if that changes, especially if it's a "demotion".
>
> too friggin bad as to expectations; retpoline == auto. Not saying that
> makes sense, just saying that's what it does.
Not quite. Today it means "on Intel use the Intel retpoline; on AMD
use the AMD retpoline."
Intel doesn't recommend the AMD retpoline. If you change that behavior
then it should be warned about so the user can adjust their mitigation
strategy accordingly.
> > In that case the switch statement isn't even needed. It can instead
> > just unconditionally print the warning.
> >
> >
> > Also, why "demote" retpoline to LFENCE rather than attempting to
> > "promote" it to eIBRS? Maybe there's a good reason but it probably at
> > least deserves some mention in the commit log.
>
> The current code will never select retpoline if eibrs is available.
Hm? What do you think "spectre_v2=retpoline" does?
> The alternative is doing this in apply_retpolines(), but that might be
> even more nasty.
Hm? Doing what in apply_retpolines()?
--
Josh
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
@ 2022-02-26 19:42 ` Josh Poimboeuf
2022-02-26 21:48 ` Josh Poimboeuf
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-26 19:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Fri, Feb 18, 2022 at 05:49:23PM +0100, Peter Zijlstra wrote:
> In order to prepare for LTO like objtool runs for modules, rename the
> duplicate argument to lto.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> scripts/link-vmlinux.sh | 2 +-
> tools/objtool/builtin-check.c | 4 ++--
> tools/objtool/check.c | 7 ++++++-
> tools/objtool/include/objtool/builtin.h | 2 +-
> 4 files changed, 10 insertions(+), 5 deletions(-)
>
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -115,7 +115,7 @@ objtool_link()
> objtoolcmd="orc generate"
> fi
>
> - objtoolopt="${objtoolopt} --duplicate"
> + objtoolopt="${objtoolopt} --lto"
>
> if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
> objtoolopt="${objtoolopt} --mcount"
> --- a/tools/objtool/builtin-check.c
> +++ b/tools/objtool/builtin-check.c
> @@ -20,7 +20,7 @@
> #include <objtool/objtool.h>
>
> bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
> - validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
> + lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
>
> static const char * const check_usage[] = {
> "objtool check [<options>] file.o",
> @@ -40,7 +40,7 @@ const struct option check_options[] = {
> OPT_BOOLEAN('b', "backtrace", &backtrace, "unwind on error"),
> OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"),
> OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
> - OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"),
> + OPT_BOOLEAN(0, "lto", <o, "whole-archive like runs"),
"--lto" is a confusing name, since this "feature" isn't specific to LTO.
Also, it gives no indication of what it actually does.
What it does is, run objtool on vmlinux or module just like it's a
normal object, and *don't* do noinstr validation. Right?
It's weird for the noinstr-only-mode to be the default.
BTW "--duplicate" had similar problems...
So how about:
- Default to normal mode on vmlinux/module, i.e. validate and/or
generate ORC like any other object. This default is more logically
consistent and makes sense for the future once we get around to
parallelizing objtool.
- Have "--noinstr", which does noinstr validation, in addition to all
the other objtool validation/generation. So it's additive, like any
other cmdline option. (Maybe this option isn't necessarily needed for
now.)
- Have "--noinstr-only" which only does noinstr validation and nothing
else. (Alternatively, "--noinstr --dry-run")
?
--
Josh
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-26 19:42 ` Josh Poimboeuf
@ 2022-02-26 21:48 ` Josh Poimboeuf
2022-02-28 11:05 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-26 21:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Sat, Feb 26, 2022 at 11:42:13AM -0800, Josh Poimboeuf wrote:
> > + OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),
>
> "--lto" is a confusing name, since this "feature" isn't specific to LTO.
>
> Also, it gives no indication of what it actually does.
>
> What it does is, run objtool on vmlinux or module just like it's a
> normal object, and *don't* do noinstr validation. Right?
>
> It's weird for the noinstr-only-mode to be the default.
>
> BTW "--duplicate" had similar problems...
>
> So how about:
>
> - Default to normal mode on vmlinux/module, i.e. validate and/or
> generate ORC like any other object. This default is more logically
> consistent and makes sense for the future once we get around to
> parallelizing objtool.
>
> - Have "--noinstr", which does noinstr validation, in addition to all
> the other objtool validation/generation. So it's additive, like any
> other cmdline option. (Maybe this option isn't necessarily needed for
> now.)
It just dawned on me that "--noinstr" already exists. But I'm
scratching my head trying to figure out the difference between
"--noinstr" and omitting "--lto".
> - Have "--noinstr-only" which only does noinstr validation and nothing
> else. (Alternatively, "--noinstr --dry-run")
>
> ?
--
Josh
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-26 21:48 ` Josh Poimboeuf
@ 2022-02-28 11:05 ` Peter Zijlstra
2022-02-28 18:32 ` Josh Poimboeuf
0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-28 11:05 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Sat, Feb 26, 2022 at 01:48:02PM -0800, Josh Poimboeuf wrote:
> On Sat, Feb 26, 2022 at 11:42:13AM -0800, Josh Poimboeuf wrote:
> > > + OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),
> >
> > "--lto" is a confusing name, since this "feature" isn't specific to LTO.
> >
> > Also, it gives no indication of what it actually does.
> >
> > What it does is, run objtool on vmlinux or module just like it's a
> > normal object, and *don't* do noinstr validation. Right?
How about --whole-archive, much like the linker then?
The distinction is that we run objtool *only* on vmlinux and modules and
not also on the individual .o files.
There's 3 models:
A) every translation unit
(module parts get --module)
B) every translation unit + shallow vmlinux
(module parts get --module, vmlinux.o gets --vmlinux)
C) vmlinux + modules
(modules get --module, vmlinux.o gets --vmlinux
--duplicate/lto/whole-archive, pick your poison).
objtool started out with (A); then for noinstr validation I added a
shallow vmlinux pass that *only* checks .noinstr.text and .entry.text
for escapes (B). This is to not unduly add time to the slowest (single
threaded) part of the kernel build, linking vmlinux.
Then CLANG_LTO added (C), due to LTO there simply isn't asm to poke at
until the whole-archive thing. But this means that the vmlinux run needs
to do all validation, not only the shallow noinstr validation.
--duplicate was added there; a bad name, because it really doesn't do
duplicate work: it's the first and only objtool run (it's only duplicate
if you also run on each TU, but we don't do that).
Now with these patches I need whole-archive objtool passes, and instead
of making a 4th mode or extending (B), I chose to just bite the bullet
and go full LTO style (C).
Now, I figured it would be good to have a flag to indicate we're running
LTO style and --duplicate is more or less that, except for the terrible
name.
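In pseudo-code, the three models look roughly like this (a toy sketch of
the above; object names and the helper are made up, this is not objtool
code):

```python
# Toy model of the three objtool invocation models described above.
# Object names are illustrative; only the flag combinations matter.

def objtool_runs(model):
    """Return (object, flags) pairs for each objtool invocation."""
    if model == "A":                # every translation unit
        return [("foo.o", []),
                ("mod_part.o", ["--module"])]
    if model == "B":                # every TU + shallow vmlinux pass
        return [("foo.o", []),
                ("mod_part.o", ["--module"]),
                ("vmlinux.o", ["--vmlinux"])]
    if model == "C":                # whole-archive: vmlinux + modules only
        return [("vmlinux.o", ["--vmlinux", "--lto"]),
                ("mod.ko", ["--module", "--lto"])]
    raise ValueError(model)
```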
> > It's weird for the noinstr-only-mode to be the default.
> >
> > BTW "--duplicate" had similar problems...
> >
> > So how about:
> >
> > - Default to normal mode on vmlinux/module, i.e. validate and/or
> > generate ORC like any other object. This default is more logically
> > consistent and makes sense for the future once we get around to
> > parallelizing objtool.
> >
> > - Have "--noinstr", which does noinstr validation, in addition to all
> > the other objtool validation/generation. So it's additive, like any
> > other cmdline option. (Maybe this option isn't necessarily needed for
> > now.)
>
> It just dawned on me that "--noinstr" already exists. But I'm
> scratching my head trying to figure out the difference between
> "--noinstr" and omitting "--lto".
If you run: "objtool check --vmlinux --noinstr vmlinux.o", it'll only do
the shallow .noinstr.text/.entry.text checks. If OTOH you do: "objtool
check --vmlinux --noinstr --lto vmlinux.o" it'll do everything
(including noinstr).
Similarly, "--module --lto" will come to mean whole module (which is
currently not distinguishable from a regular module part run).
(barring the possible 's/--lto/--whole-archive/' rename proposed up top)
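To make the (admittedly weird) interaction concrete, here's a toy model
of which validations each flag combination ends up running, per the
description above (check names are made up, not objtool internals):

```python
# Toy model of the current flag semantics: --vmlinux without --lto means
# the shallow noinstr-escape check only; adding --lto restores the full
# run on top of noinstr validation.

def checks_run(flags):
    flags = set(flags)
    if "--vmlinux" in flags and "--lto" not in flags:
        # shallow pass: only .noinstr.text/.entry.text escape checks
        return {"noinstr"}
    checks = {"stack-validation", "orc"}    # normal per-object work
    if "--noinstr" in flags and "--lto" in flags:
        checks.add("noinstr")               # full run *plus* noinstr
    return checks
```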
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-28 11:05 ` Peter Zijlstra
@ 2022-02-28 18:32 ` Josh Poimboeuf
2022-02-28 20:09 ` Peter Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 18:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Mon, Feb 28, 2022 at 12:05:06PM +0100, Peter Zijlstra wrote:
> > It just dawned on me that "--noinstr" already exists. But I'm
> > scratching my head trying to figure out the difference between
> > "--noinstr" and omitting "--lto".
>
> If you run: "objtool check --vmlinux --noinstr vmlinux.o", it'll only do
> the shallow .noinstr.text/.entry.text checks. If OTOH you do: "objtool
> check --vmlinux --noinstr --lto vmlinux.o" it'll do everything
> (including noinstr).
I think I got all that. But what does "--vmlinux" do by itself?
> Similarly, "--module --lto" will come to mean whole module (which is
> currently not distinguishable from a regular module part run).
>
> (barring the possible 's/--lto/--whole-archive/' rename proposed up top)
Thanks for the explanations. To summarize, we have:
A) legacy mode:
translation unit: objtool check [--module]
vmlinux.o: N/A
module: N/A
B) CONFIG_VMLINUX_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y)
translation unit: objtool check [--module]
vmlinux: objtool check --vmlinux --noinstr
module: objtool check --module --noinstr
C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
translation unit: N/A
vmlinux: objtool check --vmlinux --noinstr --lto
module: objtool check --module --noinstr --lto
Right?
I think I get it, but it's mental gymnastics for me to remember how the
options interact. It still seems counterintuitive, because whatever
"objtool check" does to a translation unit, I'd expect "objtool check
--vmlinux" to do the same things.
I suppose it makes sense if I can remember that --vmlinux is a magical
option which disables all that other stuff. And it's counteracted by
--lto, which removes the magic. But that's all hard to remember and
just seems weird.
There are a variety of ways to run objtool against vmlinux. The "lto"
approach is going to be less of an exception and may end up being the
default someday. So making --vmlinux do weird stuff is going to be even
less intuitive as we go forward. Let's make the default sane and
consistent with other file types.
So how about we just get rid of the magical --vmlinux and --lto options
altogether, and make --noinstr additive, like all the other options?
A) legacy mode:
.o files: objtool check [--module]
vmlinux: N/A
module: N/A
B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
.o files: objtool check [--module]
vmlinux: objtool check --noinstr-only
module: objtool check --module --noinstr-only
C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
.o files: N/A
vmlinux: objtool check --noinstr
module: objtool check --module --noinstr
(notice I renamed VMLINUX_VALIDATION to NOINSTR_VALIDATION)
Isn't that much more logical and intuitive?
a) objtool has sane defaults, regardless of object type
b) no magic options, other than --noinstr-only, but that's
communicated in its name
c) --vmlinux is no longer needed -- fewer options to juggle
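As a toy model of the proposed option semantics (made-up check names,
not objtool internals):

```python
# Toy model of the proposed semantics: --noinstr is additive and
# --noinstr-only is the single explicitly-special mode.

def proposed_checks(flags):
    flags = set(flags)
    if "--noinstr-only" in flags:
        return {"noinstr"}                  # nothing but noinstr validation
    checks = {"stack-validation", "orc"}    # sane default for any object
    if "--noinstr" in flags:
        checks.add("noinstr")               # additive, like other options
    return checks
```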
--
Josh
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-28 18:32 ` Josh Poimboeuf
@ 2022-02-28 20:09 ` Peter Zijlstra
2022-02-28 20:18 ` Josh Poimboeuf
0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-28 20:09 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Mon, Feb 28, 2022 at 10:32:28AM -0800, Josh Poimboeuf wrote:
> Thanks for the explanations. To summarize, we have:
>
> A) legacy mode:
>
> translation unit: objtool check [--module]
> vmlinux.o: N/A
> module: N/A
>
> B) CONFIG_VMLINUX_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y)
>
> translation unit: objtool check [--module]
> vmlinux: objtool check --vmlinux --noinstr
> module: objtool check --module --noinstr
Not the module case here; noinstr never leaves the core kernel (for
now; I need a few compiler features before I can tackle the idle path
issues).
> C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
>
> translation unit: N/A
> vmlinux: objtool check --vmlinux --noinstr --lto
> module: objtool check --module --noinstr --lto
>
> Right?
More or less, with the one caveat above.
> I think I get it, but it's mental gymnastics for me to remember how the
> options interact. It still seems counterintuitive, because whatever
> "objtool check" does to a translation unit, I'd expect "objtool check
> --vmlinux" to do the same things.
I think I agree. It is a bit weird.
> So how about we just get rid of the magical --vmlinux and --lto options
> altogether, and make --noinstr additive, like all the other options?
>
> A) legacy mode:
> .o files: objtool check [--module]
> vmlinux: N/A
> module: N/A
>
> B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
> .o files: objtool check [--module]
> vmlinux: objtool check --noinstr-only
> module: objtool check --module --noinstr-only
>
> C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
> .o files: N/A
> vmlinux: objtool check --noinstr
> module: objtool check --module --noinstr
I like the --noinstr-only thing. But I think I still like a flag to
differentiate between TU/.o file and vmlinux/whole-module invocation.
Anyway, you ok with me cleaning this up later, in a separate series?
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-28 20:09 ` Peter Zijlstra
@ 2022-02-28 20:18 ` Josh Poimboeuf
2022-03-01 14:19 ` Miroslav Benes
0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 20:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
keescook, samitolvanen, mark.rutland, alyssa.milburn
On Mon, Feb 28, 2022 at 09:09:34PM +0100, Peter Zijlstra wrote:
> > So how about we just get rid of the magical --vmlinux and --lto options
> > altogether, and make --noinstr additive, like all the other options?
> >
> > A) legacy mode:
> > .o files: objtool check [--module]
> > vmlinux: N/A
> > module: N/A
> >
> > B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
> > .o files: objtool check [--module]
> > vmlinux: objtool check --noinstr-only
> > module: objtool check --module --noinstr-only
> >
> > C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
> > .o files: N/A
> > vmlinux: objtool check --noinstr
> > module: objtool check --module --noinstr
>
> I like the --noinstr-only thing. But I think I still like a flag to
> differentiate between TU/.o file and vmlinux/whole-module invocation.
I'm missing why that would still be useful.
> Anyway, you ok with me cleaning this up later, in a separate series?
Sure. It's already less than ideal today anyway, with '--vmlinux' and
'--duplicate'.
--
Josh
* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
2022-02-28 20:18 ` Josh Poimboeuf
@ 2022-03-01 14:19 ` Miroslav Benes
0 siblings, 0 replies; 94+ messages in thread
From: Miroslav Benes @ 2022-03-01 14:19 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Peter Zijlstra, x86, joao, hjl.tools, andrew.cooper3,
linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
alyssa.milburn
On Mon, 28 Feb 2022, Josh Poimboeuf wrote:
> On Mon, Feb 28, 2022 at 09:09:34PM +0100, Peter Zijlstra wrote:
> > > So how about we just get rid of the magical --vmlinux and --lto options
> > > altogether, and make --noinstr additive, like all the other options?
> > >
> > > A) legacy mode:
> > > .o files: objtool check [--module]
> > > vmlinux: N/A
> > > module: N/A
> > >
> > > B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
> > > .o files: objtool check [--module]
> > > vmlinux: objtool check --noinstr-only
> > > module: objtool check --module --noinstr-only
> > >
> > > C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
> > > .o files: N/A
> > > vmlinux: objtool check --noinstr
> > > module: objtool check --module --noinstr
> >
> > I like the --noinstr-only thing. But I think I still like a flag to
> > differentiate between TU/.o file and vmlinux/whole-module invocation.
>
> I'm missing why that would still be useful.
>
> > Anyway, you ok with me cleaning this up later, in a separate series?
>
> Sure. It's already less than ideal today anyway, with '--vmlinux' and
> '--duplicate'.
Since I always have a hard time figuring out the different passes and
options of objtool, could you add the above description (its final
version) to tools/objtool/Documentation/ as part of the cleanup series,
please?
Miroslav
Thread overview: 94+ messages
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
2022-02-18 20:28 ` Josh Poimboeuf
2022-02-18 21:22 ` Peter Zijlstra
2022-02-18 23:28 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 03/29] objtool: Add --dry-run Peter Zijlstra
2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
2022-02-18 21:08 ` Josh Poimboeuf
2022-02-23 10:09 ` Peter Zijlstra
2022-02-23 10:21 ` Miroslav Benes
2022-02-23 10:57 ` Peter Zijlstra
2022-02-23 12:41 ` Steven Rostedt
2022-02-23 14:05 ` Peter Zijlstra
2022-02-23 14:16 ` Steven Rostedt
2022-02-23 14:23 ` Steven Rostedt
2022-02-23 14:33 ` Steven Rostedt
2022-02-23 14:49 ` Peter Zijlstra
2022-02-23 15:54 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
2022-02-18 20:49 ` Andrew Cooper
2022-02-18 21:11 ` David Laight
2022-02-18 21:24 ` Andrew Cooper
2022-02-18 22:37 ` David Laight
2022-02-18 21:26 ` Peter Zijlstra
2022-02-18 21:14 ` Josh Poimboeuf
2022-02-18 21:21 ` Peter Zijlstra
2022-02-18 22:12 ` Joao Moreira
2022-02-19 1:07 ` Edgecombe, Rick P
2022-02-18 16:49 ` [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
2022-02-19 0:23 ` Josh Poimboeuf
2022-02-19 23:08 ` Peter Zijlstra
2022-02-19 0:36 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
2022-02-18 16:49 ` [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
2022-02-18 16:49 ` [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue Peter Zijlstra
2022-02-18 16:49 ` [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
2022-02-18 16:49 ` [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
2022-02-18 16:49 ` [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
2022-02-18 19:31 ` Andrew Cooper
2022-02-18 21:15 ` Peter Zijlstra
2022-02-19 1:20 ` Edgecombe, Rick P
2022-02-19 1:21 ` Josh Poimboeuf
2022-02-19 9:24 ` Peter Zijlstra
2022-02-21 8:24 ` Kees Cook
2022-02-22 4:38 ` Edgecombe, Rick P
2022-02-22 9:32 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
2022-02-21 8:27 ` Kees Cook
2022-02-21 10:06 ` Peter Zijlstra
2022-02-21 13:22 ` Peter Zijlstra
2022-02-21 15:54 ` Kees Cook
2022-02-21 16:10 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
2022-02-19 2:15 ` Josh Poimboeuf
2022-02-22 15:00 ` Peter Zijlstra
2022-02-25 0:19 ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
2022-02-19 5:22 ` Josh Poimboeuf
2022-02-19 9:39 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
2022-02-18 20:24 ` Andrew Cooper
2022-02-18 21:05 ` Peter Zijlstra
2022-02-18 23:07 ` Andrew Cooper
2022-02-21 14:20 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 20/29] x86/ibt,sev: Annotations Peter Zijlstra
2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
2022-02-26 19:42 ` Josh Poimboeuf
2022-02-26 21:48 ` Josh Poimboeuf
2022-02-28 11:05 ` Peter Zijlstra
2022-02-28 18:32 ` Josh Poimboeuf
2022-02-28 20:09 ` Peter Zijlstra
2022-02-28 20:18 ` Josh Poimboeuf
2022-03-01 14:19 ` Miroslav Benes
2022-02-18 16:49 ` [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool Peter Zijlstra
2022-02-18 16:49 ` [PATCH 23/29] objtool: Read the NOENDBR annotation Peter Zijlstra
2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
2022-02-24 1:18 ` Joao Moreira
2022-02-24 9:10 ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 25/29] x86/ibt: Dont generate ENDBR in .discard.text Peter Zijlstra
2022-02-18 16:49 ` [PATCH 26/29] objtool: Add IBT validation / fixups Peter Zijlstra
2022-02-18 16:49 ` [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
2022-02-18 16:49 ` [PATCH 28/29] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
2022-02-18 16:49 ` [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
2022-02-19 1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
2022-02-19 9:58 ` Peter Zijlstra
2022-02-19 16:00 ` Andrew Cooper
2022-02-21 8:42 ` Kees Cook
2022-02-21 9:24 ` Peter Zijlstra
2022-02-23 7:26 ` Kees Cook
2022-02-24 16:47 ` Mike Rapoport