Hi,

So I hacked this up on Friday night / Saturday morning and spent all of
today cleaning it up. It is the very bare beginnings of kernel IBT
support. Since I'm lacking any sort of actual hardware, it even lacks
fun things like code to write to the MSRs to enable the IBT tracker
etc..

However, it should have most of the ENDBR instructions in the right
place -- I hope :-)

That said; I would *really* like compiler support for this stuff to be
improved; the amount of fixups done by objtool is obscene.

The end result still boots on ancient x86-64 hardware, for whatever
that's worth (when built with the below turd included, that is).

Enjoy!

---
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 5cdd9bc5c385..1d180bbe7b28 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -142,6 +142,11 @@ objtool_link()
 		info OBJTOOL ${1}
 		tools/objtool/objtool ${objtoolcmd} ${objtoolopt} ${1}
 	fi
+
+	if [ "${CONFIG_X86_IBT}" = "y" ]; then
+		# XXX less ugleh
+		tools/objtool/objtool check --no-fp --retpoline --uaccess --vmlinux --duplicate --ibt --ibt-fix-direct --ibt-seal ${1}
+	fi
 }

 # Link of vmlinux
In order to find _THIS_IP_ code references in objtool, annotate them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/linkage.h      | 11 +++++++++++
 include/linux/instruction_pointer.h |  5 +++++
 2 files changed, 16 insertions(+)

--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -3,10 +3,21 @@
 #define _ASM_X86_LINKAGE_H

 #include <linux/stringify.h>
+#include <asm/asm.h>

 #undef notrace
 #define notrace __attribute__((no_instrument_function))

+#define _THIS_IP_						\
+({	__label__ __here;					\
+	__here:							\
+	asm_volatile_goto (					\
+		".pushsection .discard.this_ip\n\t"		\
+		_ASM_PTR " %l[__here]\n\t"			\
+		".popsection\n\t"				\
+		: : : : __here);				\
+	(unsigned long)&&__here; })
+
 #ifdef CONFIG_X86_32
 #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
 #endif /* CONFIG_X86_32 */
--- a/include/linux/instruction_pointer.h
+++ b/include/linux/instruction_pointer.h
@@ -2,7 +2,12 @@
 #ifndef _LINUX_INSTRUCTION_POINTER_H
 #define _LINUX_INSTRUCTION_POINTER_H

+#include <asm/linkage.h>
+
 #define _RET_IP_		(unsigned long)__builtin_return_address(0)
+
+#ifndef _THIS_IP_
 #define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
+#endif

 #endif /* _LINUX_INSTRUCTION_POINTER_H */
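(Aside, not part of the patch: the generic _THIS_IP_ above is built on GCC's labels-as-values extension. A minimal user-space sketch of just that trick, without the asm-goto that records the address in the .discard.this_ip section, looks like this; `get_ip` is a hypothetical helper for illustration.)

```c
#include <assert.h>

/*
 * Sketch of the generic _THIS_IP_ pattern: declare a local label,
 * place it, and take its address with the GCC &&label extension.
 * The kernel's x86 variant additionally emits the label's address
 * into a .discard.this_ip section so objtool can find it.
 */
#define THIS_IP ({ __label__ __here; __here: (unsigned long)&&__here; })

unsigned long get_ip(void)
{
	return THIS_IP;	/* address of an instruction inside get_ip() */
}
```

Since the label is a fixed instruction inside the function, repeated calls return the same nonzero address.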
Add Kconfig, Makefile and basic instruction support for x86 IBT.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/Kconfig           | 10 ++++++++++
 arch/x86/Makefile          |  5 ++++-
 arch/x86/include/asm/ibt.h | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 1 deletion(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1848,6 +1848,16 @@ config X86_UMIP
	  specific cases in protected and virtual-8086 modes.  Emulated
	  results are dummy.

+config CC_HAS_IBT
+	def_bool $(cc-option, -fcf-protection=branch)
+
+config X86_IBT
+	prompt "Indirect Branch Tracking"
+	bool
+	depends on X86_64 && CC_HAS_IBT
+	help
+	  Increase kernel text size for giggles
+
 config X86_INTEL_MEMORY_PROTECTION_KEYS
	prompt "Memory Protection Keys"
	def_bool y
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -50,8 +50,11 @@ export BITS
 #
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx

-# Intel CET isn't enabled in the kernel
+ifeq ($(CONFIG_X86_IBT),y)
+KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch)
+else
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
+endif

 ifeq ($(CONFIG_X86_32),y)
         BITS := 32
--- /dev/null
+++ b/arch/x86/include/asm/ibt.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_IBT_H
+#define _ASM_X86_IBT_H
+
+#ifdef CONFIG_X86_IBT
+
+#ifndef __ASSEMBLY__
+
+// XXX note about GAS version required
+
+#ifdef CONFIG_X86_64
+#define ASM_ENDBR	".byte 0xf3, 0x0f, 0x1e, 0xfa\n\t"
+#else
+#define ASM_ENDBR	".byte 0xf3, 0x0f, 0x1e, 0xfb\n\t"
+#endif
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_X86_64
+#define ENDBR	.byte 0xf3, 0x0f, 0x1e, 0xfa
+#else
+#define ENDBR	.byte 0xf3, 0x0f, 0x1e, 0xfb
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#else /* !IBT */
+
+#ifndef __ASSEMBLY__
+
+#define ASM_ENDBR
+
+#else /* __ASSEMBLY__ */
+
+#define ENDBR
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_X86_IBT */
+#endif /* _ASM_X86_IBT_H */
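(Aside, not part of the patch: the byte sequences above are the two ENDBR encodings, f3 0f 1e fa for endbr64 and f3 0f 1e fb for endbr32; on pre-CET parts they decode as an f3-prefixed NOP, which is what makes sprinkling them around backwards compatible. A sketch of recognizing them, with `is_endbr` a hypothetical helper name:)

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Recognize the ENDBR opcodes defined in asm/ibt.h:
 *   endbr64 = f3 0f 1e fa
 *   endbr32 = f3 0f 1e fb
 */
static bool is_endbr(const unsigned char *p)
{
	return p[0] == 0xf3 && p[1] == 0x0f && p[2] == 0x1e &&
	       (p[3] == 0xfa || p[3] == 0xfb);
}
```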
The IRET-to-Self chunks trigger forward code references without ENDBR,
fix that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S        | 2 ++
 arch/x86/include/asm/sync_core.h | 2 ++
 2 files changed, 4 insertions(+)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -39,6 +39,7 @@
 #include <asm/trapnr.h>
 #include <asm/nospec-branch.h>
 #include <asm/fsgsbase.h>
+#include <asm/ibt.h>
 #include <linux/err.h>

 #include "calling.h"
@@ -1309,6 +1310,7 @@ SYM_CODE_START(asm_exc_nmi)
	iretq			/* continues at repeat_nmi below */
	UNWIND_HINT_IRET_REGS
1:
+	ENDBR
 #endif

 repeat_nmi:
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -6,6 +6,7 @@
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
 #include <asm/special_insns.h>
+#include <asm/ibt.h>

 #ifdef CONFIG_X86_32
 static inline void iret_to_self(void)
@@ -34,6 +35,7 @@ static inline void iret_to_self(void)
		"pushq $1f\n\t"
		"iretq\n\t"
		"1:"
+		ASM_ENDBR
		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
 }
 #endif /* CONFIG_X86_32 */
Read the new _THIS_IP_ annotation. While there, attempt to not bloat
struct instruction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                 | 27 +++++++++++++++++++++++++++
 tools/objtool/include/objtool/check.h | 13 ++++++++++---
 2 files changed, 37 insertions(+), 3 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1854,6 +1854,29 @@ static int read_unwind_hints(struct objt
	return 0;
 }

+static int read_this_ip_hints(struct objtool_file *file)
+{
+	struct section *sec;
+	struct instruction *insn;
+	struct reloc *reloc;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.this_ip");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(reloc, &sec->reloc_list, list) {
+		insn = find_insn(file, reloc->sym->sec, reloc->sym->offset + reloc->addend);
+		if (!insn) {
+			WARN("bad .discard.this_ip entry");
+			return -1;
+		}
+
+		insn->this_ip = 1;
+	}
+
+	return 0;
+}
+
 static int read_retpoline_hints(struct objtool_file *file)
 {
	struct section *sec;
@@ -2066,6 +2089,10 @@ static int decode_sections(struct objtoo
	if (ret)
		return ret;

+	ret = read_this_ip_hints(file);
+	if (ret)
+		return ret;
+
	/*
	 * Must be before add_{jump_call}_destination.
	 */
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -45,11 +45,18 @@ struct instruction {
	unsigned int len;
	enum insn_type type;
	unsigned long immediate;
-	bool dead_end, ignore, ignore_alts;
-	bool hint;
-	bool retpoline_safe;
+
+	u8 dead_end	: 1,
+	   ignore	: 1,
+	   ignore_alts	: 1,
+	   hint		: 1,
+	   retpoline_safe : 1,
+	   this_ip	: 1;
+	   /* 2 bit hole */
	s8 instr;
	u8 visited;
+	/* u8 hole */
+
	struct alt_group *alt_group;
	struct symbol *call_dest;
	struct instruction *jump_dest;
Kernel entry points should have ENDBR on them for IBT configs.

The SYSCALL entry points are found through taking their respective
address in order to program them in the MSRs, while the exception entry
points are found through UNWIND_HINT_IRET_REGS.

*Except* that latter hint is also used on exit code to denote when we're
down to an IRET frame. As such, add an additional 'entry' argument to
the macro and have it default to '1', such that objtool will assume it's
an entry point and WARN when the ENDBR is missing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S           | 34 ++++++++++++++++++++--------------
 arch/x86/entry/entry_64_compat.S    |  2 ++
 arch/x86/include/asm/idtentry.h     | 23 +++++++++++++++--------
 arch/x86/include/asm/segment.h      |  5 +++++
 arch/x86/include/asm/unwind_hints.h | 18 +++++++++++++-----
 arch/x86/kernel/head_64.S           | 14 +++++++++-----
 arch/x86/kernel/idt.c               |  5 +++--
 arch/x86/kernel/unwind_orc.c        |  3 ++-
 include/linux/objtool.h             |  5 +++--
 tools/include/linux/objtool.h       |  5 +++--
 tools/objtool/check.c               |  3 ++-
 tools/objtool/orc_dump.c            |  3 ++-
 12 files changed, 79 insertions(+), 41 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -88,6 +88,7 @@
 SYM_CODE_START(entry_SYSCALL_64)
	UNWIND_HINT_EMPTY
+	ENDBR
	swapgs
	/* tss.sp2 is scratch space. */
	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
@@ -350,7 +351,8 @@ SYM_CODE_END(ret_from_fork)
  */
 .macro idtentry vector asmsym cfunc has_error_code:req
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
+	UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1
+	ENDBR
	ASM_CLAC

	.if \has_error_code == 0
@@ -367,7 +369,7 @@ SYM_CODE_START(\asmsym)
	.rept 6
	pushq	5*8(%rsp)
	.endr
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=0
 .Lfrom_usermode_no_gap_\@:
	.endif
@@ -417,7 +419,8 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_mce_db vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
+	ENDBR
	ASM_CLAC

	pushq	$-1		/* ORIG_RAX: no syscall to restart */
@@ -472,7 +475,8 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_vc vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
+	ENDBR
	ASM_CLAC

	/*
@@ -533,7 +537,8 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_df vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=1
+	ENDBR
	ASM_CLAC

	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
@@ -626,7 +631,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
	INTERRUPT_RETURN

 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
	/*
	 * Are we returning to a stack segment from the LDT? Note: in
	 * 64-bit mode SS:RSP on the exception stack is always valid.
@@ -703,7 +708,7 @@ SYM_INNER_LABEL(native_irq_return_iret,
	popq	%rdi			/* Restore user RDI */
	movq	%rax, %rsp
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=0

	/*
	 * At this point, we cannot write to the stack any more, but we can
@@ -819,13 +824,13 @@ SYM_CODE_START(xen_failsafe_callback)
	movq	8(%rsp), %r11
	addq	$0x30, %rsp
	pushq	$0				/* RIP */
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=0
	jmp	asm_exc_general_protection
1:	/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
	movq	(%rsp), %rcx
	movq	8(%rsp), %r11
	addq	$0x30, %rsp
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
	pushq	$-1 /* orig_ax = -1 => not a system call */
	PUSH_AND_CLEAR_REGS
	ENCODE_FRAME_POINTER
@@ -1065,7 +1070,8 @@ SYM_CODE_END(error_return)
  * when PAGE_TABLE_ISOLATION is in use.  Do not clobber.
  */
 SYM_CODE_START(asm_exc_nmi)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
+	ENDBR

	/*
	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
@@ -1130,13 +1136,13 @@ SYM_CODE_START(asm_exc_nmi)
	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
	movq	%rsp, %rdx
	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-	UNWIND_HINT_IRET_REGS base=%rdx offset=8
+	UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0
	pushq	5*8(%rdx)	/* pt_regs->ss */
	pushq	4*8(%rdx)	/* pt_regs->rsp */
	pushq	3*8(%rdx)	/* pt_regs->flags */
	pushq	2*8(%rdx)	/* pt_regs->cs */
	pushq	1*8(%rdx)	/* pt_regs->rip */
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
	pushq	$-1		/* pt_regs->orig_ax */
	PUSH_AND_CLEAR_REGS rdx=(%rdx)
	ENCODE_FRAME_POINTER
@@ -1292,7 +1298,7 @@ SYM_CODE_START(asm_exc_nmi)
	.rept 5
	pushq	11*8(%rsp)
	.endr
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0

	/* Everything up to here is safe from nested NMIs */
@@ -1308,7 +1314,7 @@ SYM_CODE_START(asm_exc_nmi)
	pushq	$__KERNEL_CS	/* CS */
	pushq	$1f		/* RIP */
	iretq			/* continues at repeat_nmi below */
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
1:
	ENDBR
 #endif
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,6 +49,7 @@ SYM_CODE_START(entry_SYSENTER_compat)
	UNWIND_HINT_EMPTY
	/* Interrupts are off on entry. */
+	ENDBR
	SWAPGS

	pushq	%rax
@@ -198,6 +199,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
  */
 SYM_CODE_START(entry_SYSCALL_compat)
	UNWIND_HINT_EMPTY
+	ENDBR
	/* Interrupts are off on entry. */
	swapgs
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -5,6 +5,12 @@
 /* Interrupts/Exceptions */
 #include <asm/trapnr.h>

+#ifdef CONFIG_X86_IBT
+#define IDT_ALIGN	16
+#else
+#define IDT_ALIGN	8
+#endif
+
 #ifndef __ASSEMBLY__
 #include <linux/entry-common.h>
 #include <linux/hardirq.h>
@@ -492,33 +498,34 @@ __visible noinstr void func(struct pt_re
  * point is to mask off the bits above bit 7 because the push is sign
  * extending.
  */
-	.align 8
+
+	.align IDT_ALIGN
 SYM_CODE_START(irq_entries_start)
    vector=FIRST_EXTERNAL_VECTOR
    .rept NR_EXTERNAL_VECTORS
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
0 :
+	ENDBR
	.byte	0x6a, vector
	jmp	asm_common_interrupt
-	nop	/* Ensure that the above is 8 bytes max */
-	. = 0b + 8
+	.fill 0b + IDT_ALIGN - ., 1, 0x90
	vector = vector+1
    .endr
 SYM_CODE_END(irq_entries_start)

 #ifdef CONFIG_X86_LOCAL_APIC
-	.align 8
+	.align IDT_ALIGN
 SYM_CODE_START(spurious_entries_start)
    vector=FIRST_SYSTEM_VECTOR
    .rept NR_SYSTEM_VECTORS
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
0 :
+	ENDBR
	.byte	0x6a, vector
	jmp	asm_spurious_interrupt
-	nop	/* Ensure that the above is 8 bytes max */
-	. = 0b + 8
+	.fill 0b + IDT_ALIGN - ., 1, 0x90
	vector = vector+1
    .endr
 SYM_CODE_END(spurious_entries_start)
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -4,6 +4,7 @@

 #include <linux/const.h>
 #include <asm/alternative.h>
+#include <asm/ibt.h>

 /*
  * Constructor for a conventional segment GDT (or LDT) entry.
@@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns
  * vector has no error code (two bytes), a 'push $vector_number' (two
  * bytes), and a jump to the common entry code (up to five bytes).
  */
+#ifdef CONFIG_X86_IBT
+#define EARLY_IDT_HANDLER_SIZE 13
+#else
 #define EARLY_IDT_HANDLER_SIZE 9
+#endif

 /*
  * xen_early_idt_handler_array is for Xen pv guests: for each entry in
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -11,7 +11,7 @@
	UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
 .endm

-.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
+.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
	.if \base == %rsp
		.if \indirect
			.set sp_reg, ORC_REG_SP_INDIRECT
@@ -33,9 +33,17 @@
	.set sp_offset, \offset

	.if \partial
-		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
+		.if \entry
+		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
+		.else
+		.set type, UNWIND_HINT_TYPE_REGS_EXIT
+		.endif
	.elseif \extra == 0
-		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
+		.if \entry
+		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
+		.else
+		.set type, UNWIND_HINT_TYPE_REGS_EXIT
+		.endif
		.set sp_offset, \offset + (16*8)
	.else
		.set type, UNWIND_HINT_TYPE_REGS
@@ -44,8 +52,8 @@
	UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type
 .endm

-.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0
-	UNWIND_HINT_REGS base=\base offset=\offset partial=1
+.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1
+	UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry
 .endm

 .macro UNWIND_HINT_FUNC
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,6 +25,7 @@
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
 #include <asm/fixmap.h>
+#include <asm/ibt.h>

 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -310,7 +311,8 @@ SYM_CODE_END(start_cpu0)
  * when .init.text is freed.
  */
 SYM_CODE_START_NOALIGN(vc_boot_ghcb)
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=1
+	ENDBR

	/* Build pt_regs */
	PUSH_AND_CLEAR_REGS
@@ -354,18 +356,20 @@ SYM_CODE_START(early_idt_handler_array)
	i = 0
	.rept NUM_EXCEPTION_VECTORS
	.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
-		UNWIND_HINT_IRET_REGS
+		UNWIND_HINT_IRET_REGS entry=1
+		ENDBR
		pushq $0	# Dummy error code, to make stack frame uniform
	.else
-		UNWIND_HINT_IRET_REGS offset=8
+		UNWIND_HINT_IRET_REGS offset=8 entry=1
+		ENDBR
	.endif
	pushq $i		# 72(%rsp) Vector number
	jmp early_idt_handler_common
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
	i = i + 1
	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
	.endr
-	UNWIND_HINT_IRET_REGS offset=16
+	UNWIND_HINT_IRET_REGS offset=16 entry=0
 SYM_CODE_END(early_idt_handler_array)

 SYM_CODE_START_LOCAL(early_idt_handler_common)
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -10,6 +10,7 @@
 #include <asm/proto.h>
 #include <asm/desc.h>
 #include <asm/hw_irq.h>
+#include <asm/idtentry.h>

 #define DPL0		0x0
 #define DPL3		0x3
@@ -272,7 +273,7 @@ void __init idt_setup_apic_and_irq_gates
	idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);

	for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
-		entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
+		entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR);
		set_intr_gate(i, entry);
	}

@@ -283,7 +284,7 @@ void __init idt_setup_apic_and_irq_gates
		 * system_vectors bitmap. Otherwise they show up in
		 * /proc/interrupts.
		 */
-		entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
+		entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR);
		set_intr_gate(i, entry);
	}
 #endif
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -566,7 +566,8 @@ bool unwind_next_frame(struct unwind_sta
		state->signal = true;
		break;

-	case UNWIND_HINT_TYPE_REGS_PARTIAL:
+	case UNWIND_HINT_TYPE_REGS_ENTRY:
+	case UNWIND_HINT_TYPE_REGS_EXIT:
		if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
			orc_warn_current("can't access iret registers at %pB\n",
					 (void *)orig_ip);
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -35,8 +35,9 @@ struct unwind_hint {
  */
 #define UNWIND_HINT_TYPE_CALL		0
 #define UNWIND_HINT_TYPE_REGS		1
-#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
-#define UNWIND_HINT_TYPE_FUNC		3
+#define UNWIND_HINT_TYPE_REGS_ENTRY	2
+#define UNWIND_HINT_TYPE_REGS_EXIT	3
+#define UNWIND_HINT_TYPE_FUNC		4

 #ifdef CONFIG_STACK_VALIDATION
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -35,8 +35,9 @@ struct unwind_hint {
  */
 #define UNWIND_HINT_TYPE_CALL		0
 #define UNWIND_HINT_TYPE_REGS		1
-#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
-#define UNWIND_HINT_TYPE_FUNC		3
+#define UNWIND_HINT_TYPE_REGS_ENTRY	2
+#define UNWIND_HINT_TYPE_REGS_EXIT	3
+#define UNWIND_HINT_TYPE_FUNC		4

 #ifdef CONFIG_STACK_VALIDATION
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2308,7 +2308,8 @@ static int update_cfi_state(struct instr
	}

	if (cfi->type == UNWIND_HINT_TYPE_REGS ||
-	    cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL)
+	    cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY ||
+	    cfi->type == UNWIND_HINT_TYPE_REGS_EXIT)
		return update_cfi_state_regs(insn, cfi, op);

	switch (op->dest.type) {
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -43,7 +43,8 @@ static const char *orc_type_name(unsigne
		return "call";
	case UNWIND_HINT_TYPE_REGS:
		return "regs";
-	case UNWIND_HINT_TYPE_REGS_PARTIAL:
+	case UNWIND_HINT_TYPE_REGS_ENTRY:
+	case UNWIND_HINT_TYPE_REGS_EXIT:
		return "regs (partial)";
	default:
		return "?";
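(Aside, not part of the patch: the constants above follow directly from ENDBR being 4 bytes. The existing segment.h comment accounts for an early IDT stub as a 2-byte dummy error-code push, a 2-byte 'push $vector_number' and a jump of up to 5 bytes, i.e. 9 bytes; adding ENDBR makes 13. Likewise the 2-byte push + 5-byte jmp IDT stubs no longer fit in 8 bytes with ENDBR, hence IDT_ALIGN becomes 16. A sketch of that arithmetic, with hypothetical helper names:)

```c
#include <assert.h>

/* Sizes per the existing comment in arch/x86/include/asm/segment.h */
#define ENDBR_LEN		4	/* f3 0f 1e fa */
#define PUSH_IMM8_LEN		2	/* pushq $imm8 */
#define JMP_REL32_LEN		5	/* jmp rel32 (worst case) */

/* Early IDT stub: dummy error code push + vector push + jmp */
#define EARLY_IDT_STUB_NOIBT	(PUSH_IMM8_LEN + PUSH_IMM8_LEN + JMP_REL32_LEN)
#define EARLY_IDT_STUB_IBT	(ENDBR_LEN + EARLY_IDT_STUB_NOIBT)

/* irq_entries_start stub: vector push + jmp (+ ENDBR with IBT) */
#define IRQ_STUB_IBT		(ENDBR_LEN + PUSH_IMM8_LEN + JMP_REL32_LEN)

/* Mirrors the idt.c change: gate address = base + IDT_ALIGN * index */
static unsigned long stub_addr(unsigned long base, int align, int vector, int first)
{
	return base + (unsigned long)align * (vector - first);
}
```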
Objtool based IBT validation in 3 passes:

 --ibt-fix-direct:

   Detect and rewrite any code/reloc from a JMP/CALL instruction to an
   ENDBR instruction. This is basically a compiler bug since neither
   needs the ENDBR and decoding it is a pure waste of time.

 --ibt:

   Report code relocs that are not JMP/CALL and don't point to ENDBR.

   There's a bunch of false positives, e.g. static_call_update() and
   copy_thread() and kprobes. But most of them were due to _THIS_IP_,
   which has been taken care of with the prior annotation.

 --ibt-seal:

   Find and NOP superfluous ENDBR instructions. Any function that
   doesn't have its address taken should not have an ENDBR instruction.
   This removes about 1-in-4 ENDBR instructions.

All these flags are LTO-like and require '--vmlinux --duplicate' to run.

As is, the output on x86_64-defconfig looks like:

defconfig-build/vmlinux.o: warning: objtool: apply_retpolines()+0x9b: relocation to !ENDBR: __x86_indirect_thunk_array+0x0
defconfig-build/vmlinux.o: warning: objtool: copy_thread()+0x3c: relocation to !ENDBR: ret_from_fork+0x0
defconfig-build/vmlinux.o: warning: objtool: machine_kexec_prepare()+0x189: relocation to !ENDBR: relocate_kernel+0x0
defconfig-build/vmlinux.o: warning: objtool: machine_kexec()+0x44: relocation to !ENDBR: relocate_kernel+0x0
defconfig-build/vmlinux.o: warning: objtool: __kretprobe_trampoline()+0x0: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: arch_prepare_kretprobe()+0x16: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: trampoline_handler()+0x1b: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: apply_relocate_add()+0x30: relocation to !ENDBR: __memcpy+0x0
defconfig-build/vmlinux.o: warning: objtool: unwind_next_frame()+0x53d: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: unwind_next_frame()+0x3d0: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: unwind_next_frame()+0x232: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: __unwind_start()+0x11a: relocation to !ENDBR: ret_from_fork+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x3b: relocation to !ENDBR: preempt_schedule_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x55: relocation to !ENDBR: preempt_schedule_notrace_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x1e3: relocation to !ENDBR: preempt_schedule_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x1fd: relocation to !ENDBR: preempt_schedule_notrace_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: filter_irq_stacks()+0x1f: relocation to !ENDBR: __irqentry_text_end+0x0
defconfig-build/vmlinux.o: warning: objtool: __kretprobe_find_ret_addr()+0xe: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: kretprobe_find_ret_addr()+0x13: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: __kretprobe_trampoline_handler()+0x43: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: trace_seq_print_sym()+0x8e: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: trace_seq_print_sym()+0x39: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: override_function_with_return()+0x4: relocation to !ENDBR: just_return_func+0x0
defconfig-build/vmlinux.o: warning: objtool: rpm_suspend()+0x409: relocation to !ENDBR: rpm_suspend+0x56b
defconfig-build/vmlinux.o: warning: objtool: rpm_idle()+0x192: relocation to !ENDBR: rpm_idle+0x215
defconfig-build/vmlinux.o: warning: objtool: arch_hibernation_header_save()+0x17: relocation to !ENDBR: restore_registers+0x0
defconfig-build/vmlinux.o: warning: objtool: relocate_restore_code()+0x1e: relocation to !ENDBR: core_restore_code+0xfffffffffffffffc
defconfig-build/vmlinux.o: warning: objtool: relocate_restore_code()+0x29: relocation to !ENDBR: core_restore_code+0x0
defconfig-build/vmlinux.o: warning: objtool: init_real_mode()+0xf1: relocation to !ENDBR: secondary_startup_64+0x0
defconfig-build/vmlinux.o: warning: objtool: int3_exception_notify()+0x52: relocation to !ENDBR: int3_magic+0x0
defconfig-build/vmlinux.o: warning: objtool: exc_double_fault()+0x55: relocation to !ENDBR: native_irq_return_iret+0x0
defconfig-build/vmlinux.o: warning: objtool: exc_debug()+0xa3: relocation to !ENDBR: __end_entry_SYSENTER_compat+0x0
defconfig-build/vmlinux.o: warning: objtool: .text+0x5211c: relocation to !ENDBR: .text+0x52125
defconfig-build/vmlinux.o: warning: objtool: .head.text+0x80: relocation to !ENDBR: .head.text+0x89
defconfig-build/vmlinux.o: warning: objtool: .head.text+0xf4: relocation to !ENDBR: .head.text+0x107
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x1836: relocation to !ENDBR: asm_load_gs_index+0x3
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x1937: relocation to !ENDBR: .entry.text+0x19b4
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x198a: relocation to !ENDBR: .entry.text+0x19b4
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x1945: relocation to !ENDBR: .entry.text+0x19d9
IBT superfluous ENDBR: 11799/43941

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/arch/x86/decode.c         |  18 ++-
 tools/objtool/builtin-check.c           |  10 +
 tools/objtool/check.c                   | 236 +++++++++++++++++++++++++++++---
 tools/objtool/include/objtool/arch.h    |   2
 tools/objtool/include/objtool/builtin.h |   2
 tools/objtool/include/objtool/objtool.h |   3
 tools/objtool/objtool.c                 |   1
 7 files changed, 249 insertions(+), 23 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objto
	const struct elf *elf = file->elf;
	struct insn insn;
	int x86_64, ret;
-	unsigned char op1, op2,
+	unsigned char op1, op2, prefix,
		      rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
		      modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
		      sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +137,8 @@ int arch_decode_instruction(struct objto
	if (insn.vex_prefix.nbytes)
		return 0;

+	prefix = insn.prefixes.bytes[0];
+
	op1 = insn.opcode.bytes[0];
	op2 = insn.opcode.bytes[1];

@@ -491,6 +493,11 @@ int arch_decode_instruction(struct objto
			/* nopl/nopw */
			*type = INSN_NOP;

+		} else if (op2 == 0x1e) {
+
+			if (prefix == 0xf3 && (modrm == 0xfa || modrm == 0xfb))
+				*type = INSN_ENDBR;
+
		} else if (op2 == 0xa0 || op2 == 0xa8) {

			/* push fs/gs */
@@ -691,6 +698,15 @@ const char *arch_nop_insn(int len)
	return nops[len-1];
 }

+const char *arch_call_insn(unsigned long ip, unsigned long target)
+{
+	static char bytes[5] = { 0xe8, 0, 0, 0, 0 };
+
+	*(int *)(&bytes[1]) = (long)(target - (ip + 5));
+
+	return bytes;
+}
+
 #define BYTE_RET	0xC3

 const char *arch_ret_insn(int len)
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
 #include <objtool/objtool.h>

 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     validate_dup, vmlinux, mcount, noinstr, backup;
+     validate_dup, vmlinux, mcount, noinstr, backup, ibt, ibt_fix_direct, ibt_seal;

 static const char * const check_usage[] = {
	"objtool check [<options>] file.o",
@@ -45,6 +45,9 @@ const struct option check_options[] = {
	OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
	OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
	OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
+	OPT_BOOLEAN(0, "ibt", &ibt, "foobar"),
+	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "foobar"),
+	OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "foobar"),
	OPT_END(),
 };

@@ -84,6 +87,11 @@ int cmd_check(int argc, const char **arg
	argc = cmd_parse_options(argc, argv, check_usage);
	objname = argv[0];

+	if (ibt && !(vmlinux && validate_dup)) {
+		fprintf(stderr, "--ibt requires: --vmlinux --duplicate\n");
+		exit(129);
+	}
+
	file = objtool_open_read(objname);
	if (!file)
		return 1;
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -378,6 +378,7 @@ static int decode_instructions(struct ob
			memset(insn, 0, sizeof(*insn));
			INIT_LIST_HEAD(&insn->alts);
			INIT_LIST_HEAD(&insn->stack_ops);
+			INIT_LIST_HEAD(&insn->call_node);

			insn->sec = sec;
			insn->offset = offset;
@@ -1170,6 +1171,13 @@ static int add_jump_destinations(struct
	unsigned long dest_off;

	for_each_insn(file, insn) {
+		if (insn->type == INSN_ENDBR) {
+			if (insn->func && insn->offset == insn->func->offset) {
+				list_add_tail(&insn->call_node, &file->endbr_list);
+				file->nr_endbr++;
+			}
+		}
+
		if (!is_static_jump(insn))
			continue;

@@ -1186,10 +1194,14 @@ static int add_jump_destinations(struct
		} else if (insn->func) {
			/* internal or external sibling call (with reloc) */
			add_call_dest(file, insn, reloc->sym, true);
-			continue;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
		} else if (reloc->sym->sec->idx) {
			dest_sec = reloc->sym->sec;
-			dest_off = reloc->sym->sym.st_value +
+			dest_off = reloc->sym->offset +
				   arch_dest_reloc_offset(reloc->addend);
		} else {
			/* non-func asm code jumping to another file */
@@ -1199,6 +1211,9 @@ static int add_jump_destinations(struct

		insn->jump_dest = find_insn(file, dest_sec, dest_off);
		if (!insn->jump_dest) {
+			if (insn->func)
+				continue;
+
			/*
			 * This is a special case where an alt instruction
			 * jumps past the end of the section.  These are
@@ -1213,6 +1228,19 @@ static int add_jump_destinations(struct
			return -1;
		}

+		if (ibt && insn->call_dest && insn->jump_dest->type == INSN_ENDBR) {
+			if (reloc) {
+				if (ibt_fix_direct) {
+					reloc->addend += 4;
+					elf_write_reloc(file->elf, reloc);
+				} else {
+					WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+				}
+			} else {
+				WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+			}
+		}
+
		/*
		 * Cross-function jump.
		 */
@@ -1240,7 +1268,8 @@ static int add_jump_destinations(struct
				insn->jump_dest->func->pfunc = insn->func;

			} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
-				   insn->jump_dest->offset == insn->jump_dest->func->offset) {
+				   ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
+				    (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
				/* internal sibling call (without reloc) */
				add_call_dest(file, insn, insn->jump_dest->func, true);
			}
@@ -1250,23 +1279,12 @@ static int add_jump_destinations(struct
	return 0;
 }

-static struct symbol *find_call_destination(struct section *sec, unsigned long offset)
-{
-	struct symbol *call_dest;
-
-	call_dest = find_func_by_offset(sec, offset);
-	if (!call_dest)
-		call_dest = find_symbol_by_offset(sec, offset);
-
-	return call_dest;
-}
-
 /*
  * Find the destination instructions for all calls.
  */
 static int add_call_destinations(struct objtool_file *file)
 {
-	struct instruction *insn;
+	struct instruction *insn, *target = NULL;
	unsigned long dest_off;
	struct symbol *dest;
	struct reloc *reloc;
@@ -1278,7 +1296,21 @@ static int add_call_destinations(struct
		reloc = insn_reloc(file, insn);
		if (!reloc) {
			dest_off = arch_jump_destination(insn);
-			dest = find_call_destination(insn->sec, dest_off);
+
+			target = find_insn(file, insn->sec, dest_off);
+			if (!target) {
+				WARN_FUNC("direct call to nowhere", insn->sec, insn->offset);
+				return -1;
+			}
+			dest = target->func;
+			if (!dest)
+				dest = find_symbol_containing(insn->sec, dest_off);
+			if (!dest) {
+				WARN_FUNC("IMM can't find call dest symbol at %s+0x%lx",
+					  insn->sec, insn->offset,
+					  insn->sec->name, dest_off);
+				return -1;
+			}

			add_call_dest(file, insn, dest, false);

@@ -1297,10 +1329,25 @@ static int add_call_destinations(struct
			}

		} else if (reloc->sym->type == STT_SECTION) {
-			dest_off = arch_dest_reloc_offset(reloc->addend);
-			dest = find_call_destination(reloc->sym->sec, dest_off);
+			struct section *dest_sec;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
+			target = find_insn(file, dest_sec, dest_off);
+			if (target) {
+				dest = target->func;
+				if (!dest)
+					dest = find_symbol_containing(dest_sec, dest_off);
+			} else {
+				WARN("foo");
+				dest = find_func_by_offset(dest_sec, dest_off);
+				if (!dest)
+					dest = find_symbol_by_offset(dest_sec, dest_off);
+			}
			if (!dest) {
-				WARN_FUNC("can't find call dest symbol at %s+0x%lx",
+				WARN_FUNC("RELOC can't find call dest symbol at %s+0x%lx",
					  insn->sec, insn->offset,
					  reloc->sym->sec->name,
					  dest_off);
@@ -1311,9 +1358,37 @@ static int add_call_destinations(struct
		} else if (reloc->sym->retpoline_thunk) {
			add_retpoline_call(file, insn);
+			continue;
+
+		} else {
+			struct section *dest_sec;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
+			target = find_insn(file, dest_sec, dest_off);

-		} else
			add_call_dest(file, insn, reloc->sym, false);
+		}
+
+		if (ibt && target && target->type == INSN_ENDBR) {
+			if (reloc) {
+				if (ibt_fix_direct) {
+					reloc->addend += 4;
+					elf_write_reloc(file->elf, reloc);
+				} else {
+					WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+				}
+			} else {
+				if (ibt_fix_direct) {
+					elf_write_insn(file->elf, insn->sec, insn->offset, insn->len,
+						       arch_call_insn(insn->offset, dest_off + 4));
+				} else {
+					WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+				}
+			}
+		}
	}

	return 0;
@@ -3023,6 +3098,8 @@ static struct instruction *next_insn_to_
	return next_insn_same_sec(file, insn);
 }

+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);
+
 /*
  * Follow the branch starting at the given instruction, and recursively follow
  * any other branches (jumps). Meanwhile, track the frame pointer state at
@@ -3071,6 +3148,12 @@ static int validate_branch(struct objtoo
		if (insn->hint) {
			state.cfi = *insn->cfi;
+			if (ibt) {
+				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY &&
+				    insn->type != INSN_ENDBR) {
+					WARN_FUNC("IRET_ENTRY hint without ENDBR", insn->sec, insn->offset);
+				}
+			}
		} else {
			/* XXX track if we actually changed state.cfi */
@@ -3216,7 +3299,12 @@ static int validate_branch(struct objtoo
			state.df = false;
			break;

+		case INSN_NOP:
+			break;
+
		default:
+			if (ibt)
+				validate_ibt_insn(file, insn);
			break;
		}

@@ -3466,6 +3554,111 @@ static int validate_functions(struct obj
	return warnings;
 }

+static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
+{
+	struct instruction *dest;
+	struct section *sec;
+	unsigned long off;
+
+	sec = reloc->sym->sec;
+	off = reloc->sym->offset + reloc->addend;
+
+	dest = find_insn(file, sec, off);
+	if (!dest)
+		return 0;
+
+	if (name && dest->func)
+		*name = dest->func->name;
+
+	if (dest->type == INSN_ENDBR) {
+		if (!list_empty(&dest->call_node)) {
+			list_del_init(&dest->call_node);
return 1; + } + return 0; + } + + if (reloc->sym->static_call_tramp) + return 0; + + return -1; +} + +static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn) +{ + struct reloc *reloc = insn_reloc(file, insn); + struct instruction *target; + unsigned long offset; + + if (!reloc) + return; + + if (validate_ibt_reloc(file, reloc, NULL) >= 0) + return; + + offset = reloc->sym->offset + reloc->addend; + + target = find_insn(file, reloc->sym->sec, offset); + if (target && insn->func == target->func && target->this_ip) + return; + + WARN_FUNC("relocation to !ENDBR: %s+0x%lx", + insn->sec, insn->offset, + target && target->func ? target->func->name : reloc->sym->name, + target && target->func ? target->offset - target->func->offset : reloc->addend); +} + +static int validate_ibt(struct objtool_file *file) +{ + struct instruction *insn; + struct section *sec; + struct reloc *reloc; + char *name; + int nr = 0; + + for_each_sec(file, sec) { + /* already done in validate_branch() */ + if (sec->sh.sh_flags & SHF_EXECINSTR) + continue; + + if (!sec->reloc) + continue; + + if (!strncmp(sec->name, ".orc", 4)) + continue; + + if (!strncmp(sec->name, ".discard", 8)) + continue; + + if (!strcmp(sec->name, "_error_injection_whitelist")) + continue; + + if (!strcmp(sec->name, "_kprobe_blacklist")) + continue; + + list_for_each_entry(reloc, &sec->reloc->reloc_list, list) { + if (validate_ibt_reloc(file, reloc, &name) > 0) { +// WARN("preserved (%s) ENDBR from section: %s", name, sec->name); + } + + } + } + + list_for_each_entry(insn, &file->endbr_list, call_node) { + if (ibt_seal) { + elf_write_insn(file->elf, insn->sec, + insn->offset, insn->len, + arch_nop_insn(insn->len)); + } + nr++; + } + + if (stats) + printf("IBT superfluous ENDBR: %d/%d\n", nr, file->nr_endbr); + + return 0; +} + static int validate_reachable_instructions(struct objtool_file *file) { struct instruction *insn; @@ -3534,6 +3727,9 @@ int check(struct objtool_file *file) goto out; warnings 
+= ret; + if (vmlinux && ibt) + ret = validate_ibt(file); + if (!warnings) { ret = validate_reachable_instructions(file); if (ret < 0) --- a/tools/objtool/include/objtool/arch.h +++ b/tools/objtool/include/objtool/arch.h @@ -26,6 +26,7 @@ enum insn_type { INSN_CLAC, INSN_STD, INSN_CLD, + INSN_ENDBR, INSN_OTHER, }; @@ -83,6 +84,7 @@ unsigned long arch_dest_reloc_offset(int const char *arch_nop_insn(int len); const char *arch_ret_insn(int len); +const char *arch_call_insn(unsigned long ip, unsigned long target); int arch_decode_hint_reg(u8 sp_reg, int *base); --- a/tools/objtool/include/objtool/builtin.h +++ b/tools/objtool/include/objtool/builtin.h @@ -9,7 +9,7 @@ extern const struct option check_options[]; extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, - validate_dup, vmlinux, mcount, noinstr, backup; + validate_dup, vmlinux, mcount, noinstr, backup, ibt, ibt_fix_direct, ibt_seal; extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]); --- a/tools/objtool/include/objtool/objtool.h +++ b/tools/objtool/include/objtool/objtool.h @@ -26,8 +26,11 @@ struct objtool_file { struct list_head retpoline_call_list; struct list_head static_call_list; struct list_head mcount_loc_list; + struct list_head endbr_list; bool ignore_unreachables, c_file, hints, rodata; + unsigned int nr_endbr; + unsigned long jl_short, jl_long; unsigned long jl_nop_short, jl_nop_long; --- a/tools/objtool/objtool.c +++ b/tools/objtool/objtool.c @@ -128,6 +128,7 @@ struct objtool_file *objtool_open_read(c INIT_LIST_HEAD(&file.retpoline_call_list); INIT_LIST_HEAD(&file.static_call_list); INIT_LIST_HEAD(&file.mcount_loc_list); + INIT_LIST_HEAD(&file.endbr_list); file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment"); file.ignore_unreachables = no_unreachable; file.hints = false;
On Mon, Nov 22, 2021 at 06:03:04PM +0100, Peter Zijlstra wrote:
> The IRET-to-Self chunks trigger forward code references without ENDBR,
> fix that.
Andy corrected me, IRET doesn't take ENDBR, the alternative is the below.
---
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1316,7 +1316,6 @@ SYM_CODE_START(asm_exc_nmi)
iretq /* continues at repeat_nmi below */
UNWIND_HINT_IRET_REGS entry=0
1:
- ENDBR
#endif
repeat_nmi:
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -6,7 +6,6 @@
#include <asm/processor.h>
#include <asm/cpufeature.h>
#include <asm/special_insns.h>
-#include <asm/ibt.h>
#ifdef CONFIG_X86_32
static inline void iret_to_self(void)
@@ -35,7 +34,6 @@ static inline void iret_to_self(void)
"pushq $1f\n\t"
"iretq\n\t"
"1:"
- ASM_ENDBR
: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
}
#endif /* CONFIG_X86_32 */
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -598,6 +598,7 @@ int arch_decode_instruction(struct objto
op->dest.type = OP_DEST_REG;
op->dest.reg = CFI_SP;
}
+ *type = INSN_IRET;
break;
}
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3587,7 +3587,7 @@ static int validate_ibt_reloc(struct obj
static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
{
struct reloc *reloc = insn_reloc(file, insn);
- struct instruction *target;
+ struct instruction *target, *n;
unsigned long offset;
if (!reloc)
return;
@@ -3599,8 +3599,16 @@ static void validate_ibt_insn(struct obj
offset = reloc->sym->offset + reloc->addend;
target = find_insn(file, reloc->sym->sec, offset);
- if (target && insn->func == target->func && target->this_ip)
- return;
+ if (target && insn->func == target->func) {
+ if (target->this_ip)
+ return;
+
+ for (n = insn; n->offset <= target->offset;
+ n = next_insn_same_func(file, n)) {
+ if (n->type == INSN_IRET)
+ return;
+ }
+ }
WARN_FUNC("relocation to !ENDBR: %s+0x%lx",
insn->sec, insn->offset,
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -27,6 +27,7 @@ enum insn_type {
INSN_STD,
INSN_CLD,
INSN_ENDBR,
+ INSN_IRET,
INSN_OTHER,
};
What is "IBT"?
On Mon, Nov 22, 2021 at 11:58:43PM -0800, Christoph Hellwig wrote:
> What is "IBT"?
Fair enough; it's Indirect Branch Tracking, a hardware feature that
ensures any indirect JMP/CALL can only ever land on an ENDBR
instruction. It's a form of Control Flow Integrity, albeit a weak one.
(FineIBT is a software improvement that combines this with a hash value
to further narrow the allowed branches. People are working on that, but
basics first etc..)
More practically, by stripping ENDBR instructions from functions that
are never called indirectly, we insta-kill the tried-and-true method of:
func = kallsym_lookup_name("unexported_function");
(*func)(args);
favoured by pretty much every out of tree piece-of-cra^Wmodule.
On Mon, Nov 22, 2021 at 06:03:02PM +0100, Peter Zijlstra wrote:
> In order to find _THIS_IP_ code references in objtool, annotate them.

Just to check my understanding, IIUC this is because in later patches
you'll look at text relocations to spot missing ENDBRs, and when doing
so you need to filter out _THIS_IP_ instances, since those don't need an
ENDBR. Is that right?

Just checking I haven't missed some other concern that might apply to
arm64's BTI (Branch Target Identifier), which are analogous to ENDBR.

Thanks,
Mark.

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/linkage.h      | 11 +++++++++++
>  include/linux/instruction_pointer.h |  5 +++++
>  2 files changed, 16 insertions(+)
>
> --- a/arch/x86/include/asm/linkage.h
> +++ b/arch/x86/include/asm/linkage.h
> @@ -3,10 +3,21 @@
>  #define _ASM_X86_LINKAGE_H
>
>  #include <linux/stringify.h>
> +#include <asm/asm.h>
>
>  #undef notrace
>  #define notrace __attribute__((no_instrument_function))
>
> +#define _THIS_IP_ \
> +	({ __label__ __here; \
> +	__here: \
> +	   asm_volatile_goto ( \
> +		".pushsection .discard.this_ip\n\t" \
> +		_ASM_PTR " %l[__here]\n\t" \
> +		".popsection\n\t" \
> +		: : : : __here); \
> +	   (unsigned long)&&__here; })
> +
>  #ifdef CONFIG_X86_32
>  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
>  #endif /* CONFIG_X86_32 */
> --- a/include/linux/instruction_pointer.h
> +++ b/include/linux/instruction_pointer.h
> @@ -2,7 +2,12 @@
>  #ifndef _LINUX_INSTRUCTION_POINTER_H
>  #define _LINUX_INSTRUCTION_POINTER_H
>
> +#include <asm/linkage.h>
> +
>  #define _RET_IP_	(unsigned long)__builtin_return_address(0)
> +
> +#ifndef _THIS_IP_
>  #define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
> +#endif
>
>  #endif /* _LINUX_INSTRUCTION_POINTER_H */
On Mon, Nov 22, 2021 at 06:03:06PM +0100, Peter Zijlstra wrote:
> Kernel entry points should be having ENDBR on for IBT configs.
>
> The SYSCALL entry points are found through taking their respective
> address in order to program them in the MSRs, while the exception
> entry points are found through UNWIND_HINT_IRET_REGS.
>
> *Except* that latter hint is also used on exit code to denote when
> we're down to an IRET frame. As such add an additional 'entry'
> argument to the macro and have it default to '1' such that objtool
> will assume it's an entry and WARN about it.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

On arm64 we also added BTI (analogous to ENDBR) in our SYM_FUNC_START()
so that we don't miss a necessary landing pad for any assembly functions
that can be indirectly branched to.

See commits:

  714a8d02ca4da147 ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI")
  2d21889f8b5c50f6 ("arm64: Don't insert a BTI instruction at inner labels")

... do you need something similar? Or do you never indirectly branch to
an assembly function?

Thanks,
Mark.

> ---
>  arch/x86/entry/entry_64.S           | 34 ++++++++++++++++++++--------------
>  arch/x86/entry/entry_64_compat.S    |  2 ++
>  arch/x86/include/asm/idtentry.h     | 23 +++++++++++++++--------
>  arch/x86/include/asm/segment.h      |  5 +++++
>  arch/x86/include/asm/unwind_hints.h | 18 +++++++++++++-----
>  arch/x86/kernel/head_64.S           | 14 +++++++++-----
>  arch/x86/kernel/idt.c               |  5 +++--
>  arch/x86/kernel/unwind_orc.c        |  3 ++-
>  include/linux/objtool.h             |  5 +++--
>  tools/include/linux/objtool.h       |  5 +++--
>  tools/objtool/check.c               |  3 ++-
>  tools/objtool/orc_dump.c            |  3 ++-
>  12 files changed, 79 insertions(+), 41 deletions(-)
>
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -88,6 +88,7 @@
>  SYM_CODE_START(entry_SYSCALL_64)
>  	UNWIND_HINT_EMPTY
>
> +	ENDBR
>  	swapgs
>  	/* tss.sp2 is scratch space. */
>  	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
> @@ -350,7 +351,8 @@ SYM_CODE_END(ret_from_fork)
>   */
>  .macro idtentry vector asmsym cfunc has_error_code:req
>  SYM_CODE_START(\asmsym)
> -	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
> +	UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1
> +	ENDBR
>  	ASM_CLAC
>
>  	.if \has_error_code == 0
> @@ -367,7 +369,7 @@ SYM_CODE_START(\asmsym)
>  	.rept 6
>  	pushq	5*8(%rsp)
>  	.endr
> -	UNWIND_HINT_IRET_REGS offset=8
> +	UNWIND_HINT_IRET_REGS offset=8 entry=0
>  .Lfrom_usermode_no_gap_\@:
>  	.endif
>
> @@ -417,7 +419,8 @@ SYM_CODE_END(\asmsym)
>   */
>  .macro idtentry_mce_db vector asmsym cfunc
>  SYM_CODE_START(\asmsym)
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
> +	ENDBR
>  	ASM_CLAC
>
>  	pushq	$-1		/* ORIG_RAX: no syscall to restart */
> @@ -472,7 +475,8 @@ SYM_CODE_END(\asmsym)
>   */
>  .macro idtentry_vc vector asmsym cfunc
>  SYM_CODE_START(\asmsym)
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
> +	ENDBR
>  	ASM_CLAC
>
>  	/*
> @@ -533,7 +537,8 @@ SYM_CODE_END(\asmsym)
>   */
>  .macro idtentry_df vector asmsym cfunc
>  SYM_CODE_START(\asmsym)
> -	UNWIND_HINT_IRET_REGS offset=8
> +	UNWIND_HINT_IRET_REGS offset=8 entry=1
> +	ENDBR
>  	ASM_CLAC
>
>  	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
> @@ -626,7 +631,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
>  	INTERRUPT_RETURN
>
>  SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=0
>  	/*
>  	 * Are we returning to a stack segment from the LDT? Note: in
>  	 * 64-bit mode SS:RSP on the exception stack is always valid.
> @@ -703,7 +708,7 @@ SYM_INNER_LABEL(native_irq_return_iret,
>  	popq	%rdi			/* Restore user RDI */
>
>  	movq	%rax, %rsp
> -	UNWIND_HINT_IRET_REGS offset=8
> +	UNWIND_HINT_IRET_REGS offset=8 entry=0
>
>  	/*
>  	 * At this point, we cannot write to the stack any more, but we can
> @@ -819,13 +824,13 @@ SYM_CODE_START(xen_failsafe_callback)
>  	movq	8(%rsp), %r11
>  	addq	$0x30, %rsp
>  	pushq	$0				/* RIP */
> -	UNWIND_HINT_IRET_REGS offset=8
> +	UNWIND_HINT_IRET_REGS offset=8 entry=0
>  	jmp	asm_exc_general_protection
>  1:	/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
>  	movq	(%rsp), %rcx
>  	movq	8(%rsp), %r11
>  	addq	$0x30, %rsp
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=0
>  	pushq	$-1 /* orig_ax = -1 => not a system call */
>  	PUSH_AND_CLEAR_REGS
>  	ENCODE_FRAME_POINTER
> @@ -1065,7 +1070,8 @@ SYM_CODE_END(error_return)
>   * when PAGE_TABLE_ISOLATION is in use.  Do not clobber.
>   */
>  SYM_CODE_START(asm_exc_nmi)
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
> +	ENDBR
>
>  	/*
>  	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
> @@ -1130,13 +1136,13 @@ SYM_CODE_START(asm_exc_nmi)
>  	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
>  	movq	%rsp, %rdx
>  	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
> -	UNWIND_HINT_IRET_REGS base=%rdx offset=8
> +	UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0
>  	pushq	5*8(%rdx)	/* pt_regs->ss */
>  	pushq	4*8(%rdx)	/* pt_regs->rsp */
>  	pushq	3*8(%rdx)	/* pt_regs->flags */
>  	pushq	2*8(%rdx)	/* pt_regs->cs */
>  	pushq	1*8(%rdx)	/* pt_regs->rip */
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=0
>  	pushq	$-1		/* pt_regs->orig_ax */
>  	PUSH_AND_CLEAR_REGS rdx=(%rdx)
>  	ENCODE_FRAME_POINTER
> @@ -1292,7 +1298,7 @@ SYM_CODE_START(asm_exc_nmi)
>  	.rept 5
>  	pushq	11*8(%rsp)
>  	.endr
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=0
>
>  	/* Everything up to here is safe from nested NMIs */
>
> @@ -1308,7 +1314,7 @@ SYM_CODE_START(asm_exc_nmi)
>  	pushq	$__KERNEL_CS	/* CS */
>  	pushq	$1f		/* RIP */
>  	iretq			/* continues at repeat_nmi below */
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=0
>  1:
>  	ENDBR
>  #endif
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -49,6 +49,7 @@
>  SYM_CODE_START(entry_SYSENTER_compat)
>  	UNWIND_HINT_EMPTY
>  	/* Interrupts are off on entry. */
> +	ENDBR
>  	SWAPGS
>
>  	pushq	%rax
> @@ -198,6 +199,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
>   */
>  SYM_CODE_START(entry_SYSCALL_compat)
>  	UNWIND_HINT_EMPTY
> +	ENDBR
>  	/* Interrupts are off on entry. */
>  	swapgs
>
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -5,6 +5,12 @@
>  /* Interrupts/Exceptions */
>  #include <asm/trapnr.h>
>
> +#ifdef CONFIG_X86_IBT
> +#define IDT_ALIGN	16
> +#else
> +#define IDT_ALIGN	8
> +#endif
> +
>  #ifndef __ASSEMBLY__
>  #include <linux/entry-common.h>
>  #include <linux/hardirq.h>
> @@ -492,33 +498,34 @@ __visible noinstr void func(struct pt_re
>   * point is to mask off the bits above bit 7 because the push is sign
>   * extending.
>   */
> -	.align 8
> +
> +	.align IDT_ALIGN
>  SYM_CODE_START(irq_entries_start)
>      vector=FIRST_EXTERNAL_VECTOR
>      .rept NR_EXTERNAL_VECTORS
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
>  0 :
> +	ENDBR
>  	.byte	0x6a, vector
>  	jmp	asm_common_interrupt
> -	nop
>  	/* Ensure that the above is 8 bytes max */
> -	. = 0b + 8
> +	.fill 0b + IDT_ALIGN - ., 1, 0x90
>  	vector = vector+1
>      .endr
>  SYM_CODE_END(irq_entries_start)
>
>  #ifdef CONFIG_X86_LOCAL_APIC
> -	.align 8
> +	.align IDT_ALIGN
>  SYM_CODE_START(spurious_entries_start)
>      vector=FIRST_SYSTEM_VECTOR
>      .rept NR_SYSTEM_VECTORS
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
>  0 :
> +	ENDBR
>  	.byte	0x6a, vector
>  	jmp	asm_spurious_interrupt
> -	nop
>  	/* Ensure that the above is 8 bytes max */
> -	. = 0b + 8
> +	.fill 0b + IDT_ALIGN - ., 1, 0x90
>  	vector = vector+1
>      .endr
>  SYM_CODE_END(spurious_entries_start)
> --- a/arch/x86/include/asm/segment.h
> +++ b/arch/x86/include/asm/segment.h
> @@ -4,6 +4,7 @@
>
>  #include <linux/const.h>
>  #include <asm/alternative.h>
> +#include <asm/ibt.h>
>
>  /*
>   * Constructor for a conventional segment GDT (or LDT) entry.
> @@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns
>   * vector has no error code (two bytes), a 'push $vector_number' (two
>   * bytes), and a jump to the common entry code (up to five bytes).
>   */
> +#ifdef CONFIG_X86_IBT
> +#define EARLY_IDT_HANDLER_SIZE 13
> +#else
>  #define EARLY_IDT_HANDLER_SIZE 9
> +#endif
>
>  /*
>   * xen_early_idt_handler_array is for Xen pv guests: for each entry in
> --- a/arch/x86/include/asm/unwind_hints.h
> +++ b/arch/x86/include/asm/unwind_hints.h
> @@ -11,7 +11,7 @@
>  	UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
>  .endm
>
> -.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
> +.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
>  	.if \base == %rsp
>  		.if \indirect
>  			.set sp_reg, ORC_REG_SP_INDIRECT
> @@ -33,9 +33,17 @@
>  	.set sp_offset, \offset
>
>  	.if \partial
> -		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
> +		.if \entry
> +		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
> +		.else
> +		.set type, UNWIND_HINT_TYPE_REGS_EXIT
> +		.endif
>  	.elseif \extra == 0
> -		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
> +		.if \entry
> +		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
> +		.else
> +		.set type, UNWIND_HINT_TYPE_REGS_EXIT
> +		.endif
>  		.set sp_offset, \offset + (16*8)
>  	.else
>  		.set type, UNWIND_HINT_TYPE_REGS
> @@ -44,8 +52,8 @@
>  	UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type
>  .endm
>
> -.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0
> -	UNWIND_HINT_REGS base=\base offset=\offset partial=1
> +.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1
> +	UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry
>  .endm
>
>  .macro UNWIND_HINT_FUNC
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -25,6 +25,7 @@
>  #include <asm/export.h>
>  #include <asm/nospec-branch.h>
>  #include <asm/fixmap.h>
> +#include <asm/ibt.h>
>
>  /*
>   * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
> @@ -310,7 +311,8 @@ SYM_CODE_END(start_cpu0)
>   * when .init.text is freed.
>   */
>  SYM_CODE_START_NOALIGN(vc_boot_ghcb)
> -	UNWIND_HINT_IRET_REGS offset=8
> +	UNWIND_HINT_IRET_REGS offset=8 entry=1
> +	ENDBR
>
>  	/* Build pt_regs */
>  	PUSH_AND_CLEAR_REGS
> @@ -354,18 +356,20 @@ SYM_CODE_START(early_idt_handler_array)
>  	i = 0
>  	.rept NUM_EXCEPTION_VECTORS
>  	.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
> -		UNWIND_HINT_IRET_REGS
> +		UNWIND_HINT_IRET_REGS entry=1
> +		ENDBR
>  		pushq $0	# Dummy error code, to make stack frame uniform
>  	.else
> -		UNWIND_HINT_IRET_REGS offset=8
> +		UNWIND_HINT_IRET_REGS offset=8 entry=1
> +		ENDBR
>  	.endif
>  	pushq $i		# 72(%rsp) Vector number
>  	jmp early_idt_handler_common
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=0
>  	i = i + 1
>  	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
>  	.endr
> -	UNWIND_HINT_IRET_REGS offset=16
> +	UNWIND_HINT_IRET_REGS offset=16 entry=0
>  SYM_CODE_END(early_idt_handler_array)
>
>  SYM_CODE_START_LOCAL(early_idt_handler_common)
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -10,6 +10,7 @@
>  #include <asm/proto.h>
>  #include <asm/desc.h>
>  #include <asm/hw_irq.h>
> +#include <asm/idtentry.h>
>
>  #define DPL0		0x0
>  #define DPL3		0x3
> @@ -272,7 +273,7 @@ void __init idt_setup_apic_and_irq_gates
>  	idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);
>
>  	for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
> -		entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
> +		entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR);
>  		set_intr_gate(i, entry);
>  	}
>
> @@ -283,7 +284,7 @@ void __init idt_setup_apic_and_irq_gates
>  		 * system_vectors bitmap. Otherwise they show up in
>  		 * /proc/interrupts.
>  		 */
> -		entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
> +		entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR);
>  		set_intr_gate(i, entry);
>  	}
>  #endif
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -566,7 +566,8 @@ bool unwind_next_frame(struct unwind_sta
>  		state->signal = true;
>  		break;
>
> -	case UNWIND_HINT_TYPE_REGS_PARTIAL:
> +	case UNWIND_HINT_TYPE_REGS_ENTRY:
> +	case UNWIND_HINT_TYPE_REGS_EXIT:
>  		if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
>  			orc_warn_current("can't access iret registers at %pB\n",
>  					 (void *)orig_ip);
> --- a/include/linux/objtool.h
> +++ b/include/linux/objtool.h
> @@ -35,8 +35,9 @@ struct unwind_hint {
>   */
>  #define UNWIND_HINT_TYPE_CALL		0
>  #define UNWIND_HINT_TYPE_REGS		1
> -#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
> -#define UNWIND_HINT_TYPE_FUNC		3
> +#define UNWIND_HINT_TYPE_REGS_ENTRY	2
> +#define UNWIND_HINT_TYPE_REGS_EXIT	3
> +#define UNWIND_HINT_TYPE_FUNC		4
>
>  #ifdef CONFIG_STACK_VALIDATION
>
> --- a/tools/include/linux/objtool.h
> +++ b/tools/include/linux/objtool.h
> @@ -35,8 +35,9 @@ struct unwind_hint {
>   */
>  #define UNWIND_HINT_TYPE_CALL		0
>  #define UNWIND_HINT_TYPE_REGS		1
> -#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
> -#define UNWIND_HINT_TYPE_FUNC		3
> +#define UNWIND_HINT_TYPE_REGS_ENTRY	2
> +#define UNWIND_HINT_TYPE_REGS_EXIT	3
> +#define UNWIND_HINT_TYPE_FUNC		4
>
>  #ifdef CONFIG_STACK_VALIDATION
>
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -2308,7 +2308,8 @@ static int update_cfi_state(struct instr
>  	}
>
>  	if (cfi->type == UNWIND_HINT_TYPE_REGS ||
> -	    cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL)
> +	    cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY ||
> +	    cfi->type == UNWIND_HINT_TYPE_REGS_EXIT)
>  		return update_cfi_state_regs(insn, cfi, op);
>
>  	switch (op->dest.type) {
> --- a/tools/objtool/orc_dump.c
> +++ b/tools/objtool/orc_dump.c
> @@ -43,7 +43,8 @@ static const char *orc_type_name(unsigne
>  		return "call";
>  	case UNWIND_HINT_TYPE_REGS:
>  		return "regs";
> -	case UNWIND_HINT_TYPE_REGS_PARTIAL:
> +	case UNWIND_HINT_TYPE_REGS_ENTRY:
> +	case UNWIND_HINT_TYPE_REGS_EXIT:
>  		return "regs (partial)";
>  	default:
>  		return "?";
On Tue, Nov 23, 2021 at 01:53:49PM +0000, Mark Rutland wrote:
> On Mon, Nov 22, 2021 at 06:03:02PM +0100, Peter Zijlstra wrote:
> > In order to find _THIS_IP_ code references in objtool, annotate them.
>
> Just to check my understanding, IIUC this is because in later patches
> you'll look at text relocations to spot missing ENDBRs, and when doing
> so you need to filter out _THIS_IP_ instances, since those don't need an
> ENDBR. Is that right?
>
> Just checking I haven't missed some other concern that might apply to
> arm64's BTI (Branch Target Identifier), which are analogous to ENDBR.
Correct; since _THIS_IP_ is never used for control flow (afaik, let's
hope to $deity etc..) we can ignore any relocation resulting from it
(lots!).
On Tue, Nov 23, 2021 at 02:00:53PM +0000, Mark Rutland wrote:
> On Mon, Nov 22, 2021 at 06:03:06PM +0100, Peter Zijlstra wrote:
> > Kernel entry points should be having ENDBR on for IBT configs.
> >
> > The SYSCALL entry points are found through taking their respective
> > address in order to program them in the MSRs, while the exception
> > entry points are found through UNWIND_HINT_IRET_REGS.
> >
> > *Except* that latter hint is also used on exit code to denote when
> > we're down to an IRET frame. As such add an additional 'entry'
> > argument to the macro and have it default to '1' such that objtool
> > will assume it's an entry and WARN about it.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> On arm64 we also added BTI (analogous to ENDBR) in our SYM_FUNC_START()
> so that we don't miss a necessary landing pad for any assembly functions
> that can be indirectly branched to.
>
> See commits:
>
> 714a8d02ca4da147 ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI")
> 2d21889f8b5c50f6 ("arm64: Don't insert a BTI instruction at inner labels")
>
> ... do you need something similar? Or do you never indirectly branch to
> an assembly function?
With the big caveat that I've only looked at x86_64-defconfig so far, we
don't seem to be doing that.
We also have the big benefit that our larger call/jmp immediate doesn't
give us those 'spurious' indirect branches you suffer from.

Hence also that --ibt-seal option that removes as many ENDBR
instructions as possible. Minimizing ENDBR is a feasible option for us,
whereas I don't think it is on ARM64.
On Tue, Nov 23, 2021 at 03:14:45PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 23, 2021 at 01:53:49PM +0000, Mark Rutland wrote:
> > On Mon, Nov 22, 2021 at 06:03:02PM +0100, Peter Zijlstra wrote:
> > > In order to find _THIS_IP_ code references in objtool, annotate them.
> >
> > Just to check my understanding, IIUC this is because in later patches
> > you'll look at text relocations to spot missing ENDBRs, and when doing
> > so you need to filter out _THIS_IP_ instances, since those don't need an
> > ENDBR. Is that right?
> >
> > Just checking I haven't missed some other concern that might apply to
> > arm64's BTI (Branch Target Identifier), which are analogous to ENDBR.
>
> Correct; since _THIS_IP_ is never used for control flow (afaik, let's
> hope to $deity etc..) we can ignore any relocation resulting from it
> (lots!).
This would all be good context to add to the commit message.
--
Josh
On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> +{
> +	struct instruction *dest;
> +	struct section *sec;
> +	unsigned long off;
> +
> +	sec = reloc->sym->sec;
> +	off = reloc->sym->offset + reloc->addend;
> +
> +	dest = find_insn(file, sec, off);
> +	if (!dest)
> +		return 0;
> +
> +	if (name && dest->func)
> +		*name = dest->func->name;

I think these checks can be further narrowed down by only looking for
X86_64_64 relocs.

> +	list_for_each_entry(insn, &file->endbr_list, call_node) {
> +		if (ibt_seal) {
> +			elf_write_insn(file->elf, insn->sec,
> +				       insn->offset, insn->len,
> +				       arch_nop_insn(insn->len));
> +		}

Like the retpoline rewriting, I'd much rather have objtool create
annotations which the kernel can then read and patch itself, e.g. have
'.ibt.direct_call_sites' and '.ibt.unused_endbr_insns' sections.

--
Josh
On 2021-11-22 09:03, Peter Zijlstra wrote: > Objtool based IBT validation in 3 passes: > > --ibt-fix-direct: > > Detect and rewrite any code/reloc from a JMP/CALL instruction > to an ENDBR instruction. This is basically a compiler bug since > neither needs the ENDBR and decoding it is a pure waste of time. > > --ibt: > > Report code relocs that are not JMP/CALL and don't point to ENDBR > > There's a bunch of false positives, for eg. static_call_update() > and copy_thread() and kprobes. But most of them were due to > _THIS_IP_ which has been taken care of with the prior annotation. > > --ibt-seal: > > Find and NOP superfluous ENDBR instructions. Any function that > doesn't have it's address taken should not have an ENDBR > instruction. This removes about 1-in-4 ENDBR instructions. > I did some experimentation with compiler-based implementation for two of the features described here (plus an additional one). Before going into details, just a quick highlight that the compiler has limited visibility over handwritten assembly sources thus, depending on the feature, a compiler-based approach will not cover as much as objtool. All the observations below were made when compiling the kernel with defconfig, + CLANG-related options, + LTO sometimes. Here I used kernel revision 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1 with PeterZ's IBT Beginning patches applied on top (plus changes to Kbuild), thus, IBT was not really enforced. Tests consisted mostly of Clang's synthetics tests + booting a compiled kernel. Prototypes of the features are available in: https://github.com/lvwr/llvm/tree/joao/ibt -- I fixed as many corner cases I could find while trying it out, but I believe some might still be haunting. Also, I'm not very keen to Kbuild internals nor to however the kernel patches itself during runtime, so I may have missed some details. Finally, I'm interested in general feedback about this... 
better ways of implementing, alternative approaches, new possible optimizations and everything. I should be AFK for a few days in the next weeks, but I'll be glad to discuss this in January and beyond. Happy Holidays :)

The features:

-mibt-seal:

Add ENDBRs exclusively to address-taken functions, regardless of their linkage visibility. Only makes sense together with LTO.

Observations: Drastically reduced the number of ENDBRs placed in the kernel binary (from 44383 to 33192), but still missed some that were later fixed by objtool (the number of fixes applied by objtool dropped from 11730 to 540). I did not investigate further why these superfluous ENDBRs were still left in the binary, but at this point my hypotheses revolve around (i) false-positive address-taken conclusions by the compiler, possibly due to things like exported symbols and such; (ii) assembly sources which are invisible to the compiler (although this would more likely hide address-taken functions); (iii) other binary-level transformations done by objtool.

Runtime testing: The kernel was verified to boot properly after being compiled with -mibt-seal (+ LTO).

Note: This feature was already submitted for upstreaming in the llvm-project: https://reviews.llvm.org/D116070

-mibt-fix-direct:

Direct calls to functions which are known to have ENDBR instructions are fixed to target the first instruction after the ENDBR (+4 offset). Does not strictly depend on LTO, but is meaningfully improved by it, because it depends on post-link information to successfully identify targets that will have ENDBRs (aliases and assembly functions are a big complication here, so this currently won't try to optimize calls to functions which exist in the compiler context only as declarations).

Observations: I made a quick change to objtool to collect stats on the number of fixes applied. Without -mibt-fix-direct, objtool had to fix 227552 direct calls/jmps. Without LTO, -mibt-fix-direct reduced this number to 218455. With LTO, it was reduced to 1728.

Runtime testing: The kernel was verified to boot properly when compiled with -mibt-fix-direct, both with and without LTO. I tried a more aggressive approach for non-LTO compilation, which was to optimize based on declarations (when function definitions were not visible), and noticed that in this scenario the kernel would crash very early in the boot process. I did not investigate further, but my hypothesis is that this approach mistakenly optimizes calls to assembly functions without ENDBRs, leading to random weirdness and doom.

-mibt-preceding-endbr:

Instead of placing ENDBRs after the function entry label, place them right before it. Indirect calls are fixed (using a sub instruction) to target 4 bytes before the function entry point. Address values which are reused after the indirect call are fixed back to their original value (using an add instruction). Indirect calls that use in-memory pointers are transformed to first load the function pointer from memory into a free register, then do the sub, then call through it. This does not depend on LTO.

Runtime testing: The runtime test was done using a kernel whose asm functions were not fixed regarding the position of ENDBRs. Yet, perhaps surprisingly, the kernel still boots when compiled with -mibt-preceding-endbr. I don't really know how many indirect calls to assembly functions happened, but I assume these would have crashed if IBT were being enforced, due to the misplaced ENDBRs (and it could also have crashed in these early tests due to indirect calls targeting unknown instructions 4 bytes before assembly functions, hence the surprise when I saw the kernel booting). The kernel booted fine with -mibt-preceding-endbr, both with and without LTO. CONFIG_RETPOLINE was disabled for testing this.
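To make the -mibt-fix-direct behaviour above concrete, here is a toy Python model of the call-site rewrite (not the actual LLVM pass; the symbol table, names and addresses are invented for illustration):

```python
# Toy model of the -mibt-fix-direct rewrite (plain Python, not the actual
# LLVM pass): direct calls whose target definitely starts with ENDBR64 are
# retargeted 4 bytes past it. Symbol names and addresses are invented.
ENDBR64_LEN = 4

def fix_direct_call(target_addr, target_has_endbr):
    """Return the address a direct CALL/JMP should use."""
    if target_has_endbr:
        return target_addr + ENDBR64_LEN   # skip the landing pad
    # Declarations and asm functions may lack ENDBR: leave them alone.
    # (Optimizing these anyway is what crashed the early-boot experiment.)
    return target_addr

# Hypothetical symbols: (address, definition known to carry ENDBR?).
symtab = {"c_func": (0x1000, True), "asm_func": (0x2000, False)}
fixed = {n: fix_direct_call(a, e) for n, (a, e) in symtab.items()}
assert fixed["c_func"] == 0x1004    # lands just past the ENDBR
assert fixed["asm_func"] == 0x2000  # unknown target left untouched
```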
On Mon, Nov 22, 2021 at 06:03:03PM +0100, Peter Zijlstra wrote:
> Add Kconfig, Makefile and basic instruction support for x86 IBT.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/Kconfig           | 10 ++++++++++
>  arch/x86/Makefile          |  5 ++++-
>  arch/x86/include/asm/ibt.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1848,6 +1848,16 @@ config X86_UMIP
>  	  specific cases in protected and virtual-8086 modes. Emulated
>  	  results are dummy.
>
> +config CC_HAS_IBT
> +	def_bool $(cc-option, -fcf-protection=branch)
> +
> +config X86_IBT
> +	prompt "Indirect Branch Tracking"
> +	bool
> +	depends on X86_64 && CC_HAS_IBT
> +	help
> +	  Increase kernel text size for giggles

How about:

	  For systems that support CET, enable Indirect Branch Tracking, which
	  blocks all JOP and indirect call pointer attacks that are not
	  pointing at function entry points (i.e. marked with ENDBR). This
	  also eliminates the use of all of the "misaligned" gadgets that
	  might be reachable in the middle of instructions.
> +
>  config X86_INTEL_MEMORY_PROTECTION_KEYS
>  	prompt "Memory Protection Keys"
>  	def_bool y
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -50,8 +50,11 @@ export BITS
>  #
>  KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
>
> -# Intel CET isn't enabled in the kernel
> +ifeq ($(CONFIG_X86_IBT),y)
> +KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch)
> +else
>  KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
> +endif
>
>  ifeq ($(CONFIG_X86_32),y)
>  	BITS := 32
> --- /dev/null
> +++ b/arch/x86/include/asm/ibt.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_IBT_H
> +#define _ASM_X86_IBT_H
> +
> +#ifdef CONFIG_X86_IBT
> +
> +#ifndef __ASSEMBLY__
> +
> +// XXX note about GAS version required
> +
> +#ifdef CONFIG_X86_64
> +#define ASM_ENDBR	".byte 0xf3, 0x0f, 0x1e, 0xfa\n\t"
> +#else
> +#define ASM_ENDBR	".byte 0xf3, 0x0f, 0x1e, 0xfb\n\t"
> +#endif
> +
> +#else /* __ASSEMBLY__ */
> +
> +#ifdef CONFIG_X86_64
> +#define ENDBR	.byte 0xf3, 0x0f, 0x1e, 0xfa
> +#else
> +#define ENDBR	.byte 0xf3, 0x0f, 0x1e, 0xfb
> +#endif
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#else /* !IBT */
> +
> +#ifndef __ASSEMBLY__
> +
> +#define ASM_ENDBR
> +
> +#else /* __ASSEMBLY__ */
> +
> +#define ENDBR
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* CONFIG_X86_IBT */
> +#endif /* _ASM_X86_IBT_H */
>

-- 
Kees Cook
On Mon, Nov 22, 2021 at 07:09:47PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 22, 2021 at 06:03:04PM +0100, Peter Zijlstra wrote:
> > The IRET-to-Self chunks trigger forward code references without ENDBR,
> > fix that.
>
> Andy corrected me, IRET doesn't take ENDBR, the alternative is the below.

Okay, good. I was scratching my head for a second there. :)

-Kees

>
> ---
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1316,7 +1316,6 @@ SYM_CODE_START(asm_exc_nmi)
>  	iretq			/* continues at repeat_nmi below */
>  	UNWIND_HINT_IRET_REGS entry=0
>  1:
> -	ENDBR
>  #endif
>
>  repeat_nmi:
> --- a/arch/x86/include/asm/sync_core.h
> +++ b/arch/x86/include/asm/sync_core.h
> @@ -6,7 +6,6 @@
>  #include <asm/processor.h>
>  #include <asm/cpufeature.h>
>  #include <asm/special_insns.h>
> -#include <asm/ibt.h>
>
>  #ifdef CONFIG_X86_32
>  static inline void iret_to_self(void)
> @@ -35,7 +34,6 @@ static inline void iret_to_self(void)
>  		"pushq $1f\n\t"
>  		"iretq\n\t"
>  		"1:"
> -		ASM_ENDBR
>  		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
>  }
>  #endif /* CONFIG_X86_32 */
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -598,6 +598,7 @@ int arch_decode_instruction(struct objto
>  			op->dest.type = OP_DEST_REG;
>  			op->dest.reg = CFI_SP;
>  		}
> +		*type = INSN_IRET;
>  		break;
>  	}
>
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -3587,7 +3587,7 @@ static int validate_ibt_reloc(struct obj
>  static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
>  {
>  	struct reloc *reloc = insn_reloc(file, insn);
> -	struct instruction *target;
> +	struct instruction *target, *n;
>  	unsigned long offset;
>
>  	if (!reloc)
> @@ -3599,8 +3599,16 @@ static void validate_ibt_insn(struct obj
>  	offset = reloc->sym->offset + reloc->addend;
>
>  	target = find_insn(file, reloc->sym->sec, offset);
> -	if (target && insn->func == target->func && target->this_ip)
> -		return;
> +	if (target && insn->func == target->func) {
> +		if (target->this_ip)
> +			return;
> +
> +		for (n = insn; n->offset <= target->offset;
> +		     n = next_insn_same_func(file, n)) {
> +			if (n->type == INSN_IRET)
> +				return;
> +		}
> +	}
>
>  	WARN_FUNC("relocation to !ENDBR: %s+0x%lx",
>  		  insn->sec, insn->offset,
> --- a/tools/objtool/include/objtool/arch.h
> +++ b/tools/objtool/include/objtool/arch.h
> @@ -27,6 +27,7 @@ enum insn_type {
>  	INSN_STD,
>  	INSN_CLD,
>  	INSN_ENDBR,
> +	INSN_IRET,
>  	INSN_OTHER,
>  };
>

-- 
Kees Cook
On Tue, Nov 23, 2021 at 02:00:53PM +0000, Mark Rutland wrote:
> On Mon, Nov 22, 2021 at 06:03:06PM +0100, Peter Zijlstra wrote:
> > Kernel entry points should be having ENDBR on for IBT configs.
> >
> > The SYSCALL entry points are found through taking their respective
> > address in order to program them in the MSRs, while the exception
> > entry points are found through UNWIND_HINT_IRET_REGS.
> >
> > *Except* that latter hint is also used on exit code to denote when
> > we're down to an IRET frame. As such add an additional 'entry'
> > argument to the macro and have it default to '1' such that objtool
> > will assume it's an entry and WARN about it.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> On arm64 we also added BTI (analogous to ENDBR) in our SYM_FUNC_START()
> so that we don't miss a necessary landing pad for any assembly functions
> that can be indirectly branched to.
>
> See commits:
>
>   714a8d02ca4da147 ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI")
>   2d21889f8b5c50f6 ("arm64: Don't insert a BTI instruction at inner labels")
>
> ... do you need something similar? Or do you never indirectly branch to
> an assembly function?

The crypto code was doing a bunch of this, though much of it was fixed
recently. I know Sami is still dealing with needing to annotate assembly
functions for CFI, though, so I'd expect ENDBR will be needed as well.

Since objtool strips the ones it doesn't like, can't we add them
unconditionally to SYM_FUNC_START()?

-Kees

> Thanks,
> Mark.
> > > --- > > arch/x86/entry/entry_64.S | 34 ++++++++++++++++++++-------------- > > arch/x86/entry/entry_64_compat.S | 2 ++ > > arch/x86/include/asm/idtentry.h | 23 +++++++++++++++-------- > > arch/x86/include/asm/segment.h | 5 +++++ > > arch/x86/include/asm/unwind_hints.h | 18 +++++++++++++----- > > arch/x86/kernel/head_64.S | 14 +++++++++----- > > arch/x86/kernel/idt.c | 5 +++-- > > arch/x86/kernel/unwind_orc.c | 3 ++- > > include/linux/objtool.h | 5 +++-- > > tools/include/linux/objtool.h | 5 +++-- > > tools/objtool/check.c | 3 ++- > > tools/objtool/orc_dump.c | 3 ++- > > 12 files changed, 79 insertions(+), 41 deletions(-) > > > > --- a/arch/x86/entry/entry_64.S > > +++ b/arch/x86/entry/entry_64.S > > @@ -88,6 +88,7 @@ > > SYM_CODE_START(entry_SYSCALL_64) > > UNWIND_HINT_EMPTY > > > > + ENDBR > > swapgs > > /* tss.sp2 is scratch space. */ > > movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2) > > @@ -350,7 +351,8 @@ SYM_CODE_END(ret_from_fork) > > */ > > .macro idtentry vector asmsym cfunc has_error_code:req > > SYM_CODE_START(\asmsym) > > - UNWIND_HINT_IRET_REGS offset=\has_error_code*8 > > + UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1 > > + ENDBR > > ASM_CLAC > > > > .if \has_error_code == 0 > > @@ -367,7 +369,7 @@ SYM_CODE_START(\asmsym) > > .rept 6 > > pushq 5*8(%rsp) > > .endr > > - UNWIND_HINT_IRET_REGS offset=8 > > + UNWIND_HINT_IRET_REGS offset=8 entry=0 > > .Lfrom_usermode_no_gap_\@: > > .endif > > > > @@ -417,7 +419,8 @@ SYM_CODE_END(\asmsym) > > */ > > .macro idtentry_mce_db vector asmsym cfunc > > SYM_CODE_START(\asmsym) > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=1 > > + ENDBR > > ASM_CLAC > > > > pushq $-1 /* ORIG_RAX: no syscall to restart */ > > @@ -472,7 +475,8 @@ SYM_CODE_END(\asmsym) > > */ > > .macro idtentry_vc vector asmsym cfunc > > SYM_CODE_START(\asmsym) > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=1 > > + ENDBR > > ASM_CLAC > > > > /* > > @@ -533,7 +537,8 @@ SYM_CODE_END(\asmsym) > > */ > > .macro 
idtentry_df vector asmsym cfunc > > SYM_CODE_START(\asmsym) > > - UNWIND_HINT_IRET_REGS offset=8 > > + UNWIND_HINT_IRET_REGS offset=8 entry=1 > > + ENDBR > > ASM_CLAC > > > > /* paranoid_entry returns GS information for paranoid_exit in EBX. */ > > @@ -626,7 +631,7 @@ SYM_INNER_LABEL(restore_regs_and_return_ > > INTERRUPT_RETURN > > > > SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL) > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=0 > > /* > > * Are we returning to a stack segment from the LDT? Note: in > > * 64-bit mode SS:RSP on the exception stack is always valid. > > @@ -703,7 +708,7 @@ SYM_INNER_LABEL(native_irq_return_iret, > > popq %rdi /* Restore user RDI */ > > > > movq %rax, %rsp > > - UNWIND_HINT_IRET_REGS offset=8 > > + UNWIND_HINT_IRET_REGS offset=8 entry=0 > > > > /* > > * At this point, we cannot write to the stack any more, but we can > > @@ -819,13 +824,13 @@ SYM_CODE_START(xen_failsafe_callback) > > movq 8(%rsp), %r11 > > addq $0x30, %rsp > > pushq $0 /* RIP */ > > - UNWIND_HINT_IRET_REGS offset=8 > > + UNWIND_HINT_IRET_REGS offset=8 entry=0 > > jmp asm_exc_general_protection > > 1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */ > > movq (%rsp), %rcx > > movq 8(%rsp), %r11 > > addq $0x30, %rsp > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=0 > > pushq $-1 /* orig_ax = -1 => not a system call */ > > PUSH_AND_CLEAR_REGS > > ENCODE_FRAME_POINTER > > @@ -1065,7 +1070,8 @@ SYM_CODE_END(error_return) > > * when PAGE_TABLE_ISOLATION is in use. Do not clobber. > > */ > > SYM_CODE_START(asm_exc_nmi) > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=1 > > + ENDBR > > > > /* > > * We allow breakpoints in NMIs. 
If a breakpoint occurs, then > > @@ -1130,13 +1136,13 @@ SYM_CODE_START(asm_exc_nmi) > > SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx > > movq %rsp, %rdx > > movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp > > - UNWIND_HINT_IRET_REGS base=%rdx offset=8 > > + UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0 > > pushq 5*8(%rdx) /* pt_regs->ss */ > > pushq 4*8(%rdx) /* pt_regs->rsp */ > > pushq 3*8(%rdx) /* pt_regs->flags */ > > pushq 2*8(%rdx) /* pt_regs->cs */ > > pushq 1*8(%rdx) /* pt_regs->rip */ > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=0 > > pushq $-1 /* pt_regs->orig_ax */ > > PUSH_AND_CLEAR_REGS rdx=(%rdx) > > ENCODE_FRAME_POINTER > > @@ -1292,7 +1298,7 @@ SYM_CODE_START(asm_exc_nmi) > > .rept 5 > > pushq 11*8(%rsp) > > .endr > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=0 > > > > /* Everything up to here is safe from nested NMIs */ > > > > @@ -1308,7 +1314,7 @@ SYM_CODE_START(asm_exc_nmi) > > pushq $__KERNEL_CS /* CS */ > > pushq $1f /* RIP */ > > iretq /* continues at repeat_nmi below */ > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=0 > > 1: > > ENDBR > > #endif > > --- a/arch/x86/entry/entry_64_compat.S > > +++ b/arch/x86/entry/entry_64_compat.S > > @@ -49,6 +49,7 @@ > > SYM_CODE_START(entry_SYSENTER_compat) > > UNWIND_HINT_EMPTY > > /* Interrupts are off on entry. */ > > + ENDBR > > SWAPGS > > > > pushq %rax > > @@ -198,6 +199,7 @@ SYM_CODE_END(entry_SYSENTER_compat) > > */ > > SYM_CODE_START(entry_SYSCALL_compat) > > UNWIND_HINT_EMPTY > > + ENDBR > > /* Interrupts are off on entry. 
*/ > > swapgs > > > > --- a/arch/x86/include/asm/idtentry.h > > +++ b/arch/x86/include/asm/idtentry.h > > @@ -5,6 +5,12 @@ > > /* Interrupts/Exceptions */ > > #include <asm/trapnr.h> > > > > +#ifdef CONFIG_X86_IBT > > +#define IDT_ALIGN 16 > > +#else > > +#define IDT_ALIGN 8 > > +#endif > > + > > #ifndef __ASSEMBLY__ > > #include <linux/entry-common.h> > > #include <linux/hardirq.h> > > @@ -492,33 +498,34 @@ __visible noinstr void func(struct pt_re > > * point is to mask off the bits above bit 7 because the push is sign > > * extending. > > */ > > - .align 8 > > + > > + .align IDT_ALIGN > > SYM_CODE_START(irq_entries_start) > > vector=FIRST_EXTERNAL_VECTOR > > .rept NR_EXTERNAL_VECTORS > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=1 > > 0 : > > + ENDBR > > .byte 0x6a, vector > > jmp asm_common_interrupt > > - nop > > /* Ensure that the above is 8 bytes max */ > > - . = 0b + 8 > > + .fill 0b + IDT_ALIGN - ., 1, 0x90 > > vector = vector+1 > > .endr > > SYM_CODE_END(irq_entries_start) > > > > #ifdef CONFIG_X86_LOCAL_APIC > > - .align 8 > > + .align IDT_ALIGN > > SYM_CODE_START(spurious_entries_start) > > vector=FIRST_SYSTEM_VECTOR > > .rept NR_SYSTEM_VECTORS > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=1 > > 0 : > > + ENDBR > > .byte 0x6a, vector > > jmp asm_spurious_interrupt > > - nop > > /* Ensure that the above is 8 bytes max */ > > - . = 0b + 8 > > + .fill 0b + IDT_ALIGN - ., 1, 0x90 > > vector = vector+1 > > .endr > > SYM_CODE_END(spurious_entries_start) > > --- a/arch/x86/include/asm/segment.h > > +++ b/arch/x86/include/asm/segment.h > > @@ -4,6 +4,7 @@ > > > > #include <linux/const.h> > > #include <asm/alternative.h> > > +#include <asm/ibt.h> > > > > /* > > * Constructor for a conventional segment GDT (or LDT) entry. > > @@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns > > * vector has no error code (two bytes), a 'push $vector_number' (two > > * bytes), and a jump to the common entry code (up to five bytes). 
> > */ > > +#ifdef CONFIG_X86_IBT > > +#define EARLY_IDT_HANDLER_SIZE 13 > > +#else > > #define EARLY_IDT_HANDLER_SIZE 9 > > +#endif > > > > /* > > * xen_early_idt_handler_array is for Xen pv guests: for each entry in > > --- a/arch/x86/include/asm/unwind_hints.h > > +++ b/arch/x86/include/asm/unwind_hints.h > > @@ -11,7 +11,7 @@ > > UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1 > > .endm > > > > -.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 > > +.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1 > > .if \base == %rsp > > .if \indirect > > .set sp_reg, ORC_REG_SP_INDIRECT > > @@ -33,9 +33,17 @@ > > .set sp_offset, \offset > > > > .if \partial > > - .set type, UNWIND_HINT_TYPE_REGS_PARTIAL > > + .if \entry > > + .set type, UNWIND_HINT_TYPE_REGS_ENTRY > > + .else > > + .set type, UNWIND_HINT_TYPE_REGS_EXIT > > + .endif > > .elseif \extra == 0 > > - .set type, UNWIND_HINT_TYPE_REGS_PARTIAL > > + .if \entry > > + .set type, UNWIND_HINT_TYPE_REGS_ENTRY > > + .else > > + .set type, UNWIND_HINT_TYPE_REGS_EXIT > > + .endif > > .set sp_offset, \offset + (16*8) > > .else > > .set type, UNWIND_HINT_TYPE_REGS > > @@ -44,8 +52,8 @@ > > UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type > > .endm > > > > -.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 > > - UNWIND_HINT_REGS base=\base offset=\offset partial=1 > > +.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1 > > + UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry > > .endm > > > > .macro UNWIND_HINT_FUNC > > --- a/arch/x86/kernel/head_64.S > > +++ b/arch/x86/kernel/head_64.S > > @@ -25,6 +25,7 @@ > > #include <asm/export.h> > > #include <asm/nospec-branch.h> > > #include <asm/fixmap.h> > > +#include <asm/ibt.h> > > > > /* > > * We are not able to switch in one step to the final KERNEL ADDRESS SPACE > > @@ -310,7 +311,8 @@ SYM_CODE_END(start_cpu0) > > * when .init.text is freed. 
> > */ > > SYM_CODE_START_NOALIGN(vc_boot_ghcb) > > - UNWIND_HINT_IRET_REGS offset=8 > > + UNWIND_HINT_IRET_REGS offset=8 entry=1 > > + ENDBR > > > > /* Build pt_regs */ > > PUSH_AND_CLEAR_REGS > > @@ -354,18 +356,20 @@ SYM_CODE_START(early_idt_handler_array) > > i = 0 > > .rept NUM_EXCEPTION_VECTORS > > .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0 > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=1 > > + ENDBR > > pushq $0 # Dummy error code, to make stack frame uniform > > .else > > - UNWIND_HINT_IRET_REGS offset=8 > > + UNWIND_HINT_IRET_REGS offset=8 entry=1 > > + ENDBR > > .endif > > pushq $i # 72(%rsp) Vector number > > jmp early_idt_handler_common > > - UNWIND_HINT_IRET_REGS > > + UNWIND_HINT_IRET_REGS entry=0 > > i = i + 1 > > .fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc > > .endr > > - UNWIND_HINT_IRET_REGS offset=16 > > + UNWIND_HINT_IRET_REGS offset=16 entry=0 > > SYM_CODE_END(early_idt_handler_array) > > > > SYM_CODE_START_LOCAL(early_idt_handler_common) > > --- a/arch/x86/kernel/idt.c > > +++ b/arch/x86/kernel/idt.c > > @@ -10,6 +10,7 @@ > > #include <asm/proto.h> > > #include <asm/desc.h> > > #include <asm/hw_irq.h> > > +#include <asm/idtentry.h> > > > > #define DPL0 0x0 > > #define DPL3 0x3 > > @@ -272,7 +273,7 @@ void __init idt_setup_apic_and_irq_gates > > idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true); > > > > for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) { > > - entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR); > > + entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR); > > set_intr_gate(i, entry); > > } > > > > @@ -283,7 +284,7 @@ void __init idt_setup_apic_and_irq_gates > > * system_vectors bitmap. Otherwise they show up in > > * /proc/interrupts. 
> > */ > > - entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR); > > + entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR); > > set_intr_gate(i, entry); > > } > > #endif > > --- a/arch/x86/kernel/unwind_orc.c > > +++ b/arch/x86/kernel/unwind_orc.c > > @@ -566,7 +566,8 @@ bool unwind_next_frame(struct unwind_sta > > state->signal = true; > > break; > > > > - case UNWIND_HINT_TYPE_REGS_PARTIAL: > > + case UNWIND_HINT_TYPE_REGS_ENTRY: > > + case UNWIND_HINT_TYPE_REGS_EXIT: > > if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) { > > orc_warn_current("can't access iret registers at %pB\n", > > (void *)orig_ip); > > --- a/include/linux/objtool.h > > +++ b/include/linux/objtool.h > > @@ -35,8 +35,9 @@ struct unwind_hint { > > */ > > #define UNWIND_HINT_TYPE_CALL 0 > > #define UNWIND_HINT_TYPE_REGS 1 > > -#define UNWIND_HINT_TYPE_REGS_PARTIAL 2 > > -#define UNWIND_HINT_TYPE_FUNC 3 > > +#define UNWIND_HINT_TYPE_REGS_ENTRY 2 > > +#define UNWIND_HINT_TYPE_REGS_EXIT 3 > > +#define UNWIND_HINT_TYPE_FUNC 4 > > > > #ifdef CONFIG_STACK_VALIDATION > > > > --- a/tools/include/linux/objtool.h > > +++ b/tools/include/linux/objtool.h > > @@ -35,8 +35,9 @@ struct unwind_hint { > > */ > > #define UNWIND_HINT_TYPE_CALL 0 > > #define UNWIND_HINT_TYPE_REGS 1 > > -#define UNWIND_HINT_TYPE_REGS_PARTIAL 2 > > -#define UNWIND_HINT_TYPE_FUNC 3 > > +#define UNWIND_HINT_TYPE_REGS_ENTRY 2 > > +#define UNWIND_HINT_TYPE_REGS_EXIT 3 > > +#define UNWIND_HINT_TYPE_FUNC 4 > > > > #ifdef CONFIG_STACK_VALIDATION > > > > --- a/tools/objtool/check.c > > +++ b/tools/objtool/check.c > > @@ -2308,7 +2308,8 @@ static int update_cfi_state(struct instr > > } > > > > if (cfi->type == UNWIND_HINT_TYPE_REGS || > > - cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL) > > + cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY || > > + cfi->type == UNWIND_HINT_TYPE_REGS_EXIT) > > return update_cfi_state_regs(insn, cfi, op); > > > > switch (op->dest.type) { > > --- a/tools/objtool/orc_dump.c > > 
+++ b/tools/objtool/orc_dump.c > > @@ -43,7 +43,8 @@ static const char *orc_type_name(unsigne > > return "call"; > > case UNWIND_HINT_TYPE_REGS: > > return "regs"; > > - case UNWIND_HINT_TYPE_REGS_PARTIAL: > > + case UNWIND_HINT_TYPE_REGS_ENTRY: > > + case UNWIND_HINT_TYPE_REGS_EXIT: > > return "regs (partial)"; > > default: > > return "?"; > > > > -- Kees Cook
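The size arithmetic behind the IDT_ALIGN and EARLY_IDT_HANDLER_SIZE changes in the patch above can be checked with a quick sketch (plain Python; the helper below is illustrative only, not kernel code):

```python
# Sanity-check the stub-size arithmetic (illustrative only). Each IRQ stub
# is "push $vector" (2 bytes, 0x6a imm8) plus a 5-byte jmp; ENDBR64 adds
# 4 bytes in front, pushing the stride from 8 to the next power of two, 16.
def idt_align(ibt):
    endbr = 4 if ibt else 0
    stub = endbr + 2 + 5          # [ENDBR64] + push imm8 + jmp rel32
    stride = 8
    while stride < stub:          # .align wants a power-of-two stride
        stride *= 2
    return stride

assert idt_align(ibt=False) == 8
assert idt_align(ibt=True) == 16

# Early IDT handlers: dummy error-code push (2) + vector push (2) + jmp (5),
# hence EARLY_IDT_HANDLER_SIZE 9, or 13 with a leading ENDBR64.
assert 2 + 2 + 5 == 9
assert 4 + 2 + 2 + 5 == 13

# With the wider stride, the gate address for external vector i becomes
# irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR).
```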
On Thu, Dec 23, 2021 at 06:05:50PM -0800, joao@overdrivepizza.com wrote:
> On 2021-11-22 09:03, Peter Zijlstra wrote:
> > Objtool based IBT validation in 3 passes:
> >
> > --ibt-fix-direct:
> >
> > Detect and rewrite any code/reloc from a JMP/CALL instruction
> > to an ENDBR instruction. This is basically a compiler bug since
> > neither needs the ENDBR and decoding it is a pure waste of time.
> >
> > --ibt:
> >
> > Report code relocs that are not JMP/CALL and don't point to ENDBR
> >
> > There's a bunch of false positives, e.g. static_call_update()
> > and copy_thread() and kprobes. But most of them were due to
> > _THIS_IP_ which has been taken care of with the prior annotation.
> >
> > --ibt-seal:
> >
> > Find and NOP superfluous ENDBR instructions. Any function that
> > doesn't have its address taken should not have an ENDBR
> > instruction. This removes about 1-in-4 ENDBR instructions.
> >
>
> I did some experimentation with compiler-based implementation for two of the
> features described here (plus an additional one). Before going into details,
> just a quick highlight that the compiler has limited visibility over
> handwritten assembly sources; thus, depending on the feature, a
> compiler-based approach will not cover as much as objtool. All the
> observations below were made when compiling the kernel with defconfig, +
> CLANG-related options, + LTO sometimes. Here I used kernel revision
> 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1 with PeterZ's IBT Beginning patches
> applied on top (plus changes to Kbuild), thus, IBT was not really enforced.
> Tests consisted mostly of Clang's synthetic tests + booting a compiled
> kernel.
>
> Prototypes of the features are available in:
> https://github.com/lvwr/llvm/tree/joao/ibt -- I fixed as many corner cases as
> I could find while trying it out, but I believe some might still be lurking.
> Also, I'm not very familiar with Kbuild internals, nor with how the kernel
> patches itself at runtime, so I may have missed some details.
>
> Finally, I'm interested in general feedback about this... better ways of
> implementing, alternative approaches, new possible optimizations and
> everything. I should be AFK for a few days in the next weeks, but I'll be
> glad to discuss this in January and then. Happy Holidays :)
>
> The features:
>
> -mibt-seal:
>
> Add ENDBRs exclusively to address-taken functions, regardless of their
> linkage visibility. Only makes sense together with LTO.
>
> Observations: Drastically reduced the number of ENDBRs placed in the kernel
> binary (from 44383 to 33192), but still missed some that were later fixed by
> objtool (Number of fixes by objtool reduced from 11730 to 540). I did not
> investigate further why these superfluous ENDBRs were still left in the
> binary, but at this point my hypotheses spin around (i) false-positive
> address-taken conclusions by the compiler, possibly due to things like
> exported symbols and such; (ii) assembly sources which are invisible to the
> compiler (although this would more likely hide address taken functions);
> (iii) other binary level transformations done by objtool.
>
> Runtime testing: The kernel was verified to properly boot after being
> compiled with -mibt-seal (+ LTO).
>
> Note: This feature was already submitted for upstreaming with the
> llvm-project: https://reviews.llvm.org/D116070
Ah nice; I see this has been committed now.
Given that IBT will need to work with both Clang and gcc, I suspect the
objtool approach will still end up needing to do all the verification.
(And as you say, it has limited visibility into assembly.)
-Kees
--
Kees Cook
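The verification that objtool ends up doing boils down to a simple check; as a rough illustration, here is a toy Python model of the --ibt pass quoted above (not objtool's real code; the offsets and reloc kinds are invented):

```python
# Toy model of the "--ibt" validation pass (not objtool's real code):
# every code relocation that is not a direct CALL/JMP is a potential
# indirect-branch target, so it must point at an ENDBR. Offsets invented.
def validate_ibt(relocs, endbr_offsets):
    """relocs: iterable of (kind, target_offset); return offending targets."""
    bad = []
    for kind, target in relocs:
        if kind in ("CALL", "JMP"):
            continue                  # direct transfers don't need ENDBR
        if target not in endbr_offsets:
            bad.append(target)        # would trip the IBT tracker at runtime
    return bad

endbrs = {0x1000, 0x2000}
relocs = [
    ("DATA", 0x1000),                 # function pointer -> ENDBR: fine
    ("CALL", 0x3000),                 # direct call: exempt from the check
    ("DATA", 0x2004),                 # points past the ENDBR: reported
]
assert validate_ibt(relocs, endbrs) == [0x2004]
```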
On Wed, Nov 24, 2021 at 11:30:37AM -0800, Josh Poimboeuf wrote:
> On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> > +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> > +{
> > + struct instruction *dest;
> > + struct section *sec;
> > + unsigned long off;
> > +
> > + sec = reloc->sym->sec;
> > + off = reloc->sym->offset + reloc->addend;
> > +
> > + dest = find_insn(file, sec, off);
> > + if (!dest)
> > + return 0;
> > +
> > + if (name && dest->func)
> > + *name = dest->func->name;
>
> I think these checks can be further narrowed down by only looking for
> X86_64_64 relocs.
>
> > + list_for_each_entry(insn, &file->endbr_list, call_node) {
> > + if (ibt_seal) {
> > + elf_write_insn(file->elf, insn->sec,
> > + insn->offset, insn->len,
> > + arch_nop_insn(insn->len));
> > + }
>
> Like the retpoline rewriting, I'd much rather have objtool create
> annotations which the kernel can then read and patch itself.
>
> e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
> sections.
Why have the kernel do that work at every boot when it can be known at
link time?
--
Kees Cook
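The link-time vs. boot-time question can be made concrete with a toy sketch (plain Python; both functions are invented models of the two strategies, not actual kernel or objtool code — only the ENDBR64 and 4-byte-NOP byte patterns are the real encodings):

```python
# Two ways to retire an unused ENDBR (toy model). Link time: objtool
# patches the bytes in vmlinux directly. Boot time: objtool only records
# the offset in an annotation section and the kernel patches itself during
# early boot, alternatives-style.
ENDBR64 = bytes([0xf3, 0x0f, 0x1e, 0xfa])
NOP4    = bytes([0x0f, 0x1f, 0x40, 0x00])   # nopl 0x0(%rax)

def seal_at_link_time(text, unused_endbr):
    text = bytearray(text)
    for off in unused_endbr:
        assert text[off:off+4] == ENDBR64
        text[off:off+4] = NOP4
    return bytes(text), []                   # nothing left for boot to do

def seal_at_boot_time(text, unused_endbr):
    # Ship the text unmodified, plus a worklist for the kernel to apply.
    return bytes(text), list(unused_endbr)

text = ENDBR64 + b"\xc3"                     # one tiny sealable "function"
linked, todo = seal_at_link_time(text, [0])
assert linked[:4] == NOP4 and todo == []
shipped, todo = seal_at_boot_time(text, [0])
assert shipped[:4] == ENDBR64 and todo == [0]  # patched later, at boot
```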
On Mon, Nov 22, 2021 at 06:03:01PM +0100, Peter Zijlstra wrote:
> It is the very bare beginnings of kernel IBT support. Since I'm lacking any
> sort of actual hardware it even lacks fun things like code to write to the MSRs
> to enable the IBT tracker etc..

Heh. I have hardware to test with -- recent laptops all have the support.
I haven't checked whether QEMU can emulate it, though. Bochs seems to.

> However, it should have most of the ENDBR instructions in the right place -- I
> hope :-) That said; I would *really* like compiler support for this stuff to be
> improved, the amount of fixups done by objtool is obscene.
>
> The end result still boots on ancient x86-64 hardware, for whatever that's
> worth (when built with the below turd included that is).

Should the below roughly be patch 7?

>
> Enjoy!
>
> ---
>
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 5cdd9bc5c385..1d180bbe7b28 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -142,6 +142,11 @@ objtool_link()
>  		info OBJTOOL ${1}
>  		tools/objtool/objtool ${objtoolcmd} ${objtoolopt} ${1}
>  	fi
> +
> +	if [ "${CONFIG_X86_IBT}" = "y" ]; then
> +		# XXX less ugleh
> +		tools/objtool/objtool check --no-fp --retpoline --uaccess --vmlinux --duplicate --ibt --ibt-fix-direct --ibt-seal ${1}
> +	fi
>  }
>
>  # Link of vmlinux
>

Have you had a chance to get this into shape for a v1?

-- 
Kees Cook
On Mon, Nov 22, 2021 at 9:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Hi,
>
> So I hacked this up on Friday night / Saturday morning and spent all of
> today cleaning it up.
>
> It is the very bare beginnings of kernel IBT support. Since I'm lacking any
> sort of actual hardware it even lacks fun things like code to write to the
> MSRs to enable the IBT tracker etc..
>
> However, it should have most of the ENDBR instructions in the right place --
> I hope :-) That said; I would *really* like compiler support for this stuff
> to be improved, the amount of fixups done by objtool is obscene.
>
> The end result still boots on ancient x86-64 hardware, for whatever that's
> worth (when built with the below turd included that is).

Thanks for the patches! Are there recommended command line args for qemu
emulation to test this with? Tigerlake and Alderlake should be required for
IBT support, IIRC from our IRC discussion?
https://qemu.readthedocs.io/en/latest/system/qemu-cpu-models.html#preferred-cpu-models-for-intel-x86-hosts

No hits for:

$ qemu-system-x86_64 -cpu help | grep -e tiger -e alder
$ qemu-system-x86_64 --version
QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

-- 
Thanks,
~Nick Desaulniers
>> Note: This feature was already submitted for upstreaming with the
>> llvm-project: https://reviews.llvm.org/D116070
>
> Ah nice; I see this has been committed now.

Yes, but then some front-end changes also required this fix
https://reviews.llvm.org/D118052, which is currently under review (posting
this here in case someone is trying this out).

>
> Given that IBT will need to work with both Clang and gcc, I suspect the
> objtool approach will still end up needing to do all the verification.
>
> (And as you say, it has limited visibility into assembly.)

Agreed that at this point objtool provides more coverage. Yet, besides being
an attempt to relieve objtool and improve the compiler support a bit, as
mentioned in the series cover letter, it is still nice to reduce the
left-over NOPs and fixups which end up scattered all around.

FWIW, https://reviews.llvm.org/D118438 and https://reviews.llvm.org/D118355
are also being cooked. Comments and ideas for new approaches or improvements
in the compiler support for this are very welcome :)

Tks,
Joao
On Tue, Feb 08, 2022 at 06:21:16PM -0800, Joao Moreira wrote:
> > > Note: This feature was already submitted for upstreaming with the
> > > llvm-project: https://reviews.llvm.org/D116070
> >
> > Ah nice; I see this has been committed now.
>
> Yes, but then some front-end changes also required this fix
> https://reviews.llvm.org/D118052, which is currently under review (posting
> this here in case someone is trying this out).
>
> > Given that IBT will need to work with both Clang and gcc, I suspect the
> > objtool approach will still end up needing to do all the verification.
> >
> > (And as you say, it has limited visibility into assembly.)
>
> Agreed that at this point objtool provides more coverage. Yet, besides being
> an attempt to relieve objtool and improve the compiler support a bit as
> mentioned in the series cover letter, it is still nice to reduce the
> left-over NOPs and fixups which end up scattered all around.
>
> FWIW, https://reviews.llvm.org/D118438 and https://reviews.llvm.org/D118355
> are also being cooked. Comments and ideas for new approaches or improvements
> in the compiler support for this are very welcome :)

Ah, excellent, thanks for the pointers. There's also this in the works:
https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nicely
with objtool, IBT, etc.)

-- 
Kees Cook
On Tue, Feb 08, 2022 at 03:43:27PM -0800, Kees Cook wrote:
> On Wed, Nov 24, 2021 at 11:30:37AM -0800, Josh Poimboeuf wrote:
> > On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> > > +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> > > +{
> > > + struct instruction *dest;
> > > + struct section *sec;
> > > + unsigned long off;
> > > +
> > > + sec = reloc->sym->sec;
> > > + off = reloc->sym->offset + reloc->addend;
> > > +
> > > + dest = find_insn(file, sec, off);
> > > + if (!dest)
> > > + return 0;
> > > +
> > > + if (name && dest->func)
> > > + *name = dest->func->name;
> >
> > I think these checks can be further narrowed down by only looking for
> > X86_64_64 relocs.
> >
> > > + list_for_each_entry(insn, &file->endbr_list, call_node) {
> > > + if (ibt_seal) {
> > > + elf_write_insn(file->elf, insn->sec,
> > > + insn->offset, insn->len,
> > > + arch_nop_insn(insn->len));
> > > + }
> >
> > Like the retpoline rewriting, I'd much rather have objtool create
> > annotations which the kernel can then read and patch itself.
> >
> > e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
> > sections.
>
> Why have the kernel do that work at every boot when it can be known at
> link time?
True, but:
- The kernel is already doing several other flavors of boot-time
self-patching. IMO it's better to consolidate such craziness in one
place (or a limited number of places) if we can. It seems more
robust, and limits confusion about who's patching what and when.
- Patching text at link time has pitfalls and I'd like to avoid (as much
as reasonably possible) objtool having the ability to brick the
kernel.
--
Josh
> Ah, excellent, thanks for the pointers. There's also this in the works:
> https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> to objtool, IBT, etc.)

Oh, great! Thanks for pointing it out. I guess I saw something with a
similar name before ;)
https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf

Jokes aside (and perhaps questions more targeted to Sami), from a
diagonal look it seems that this follows the good old tag approach
proposed by PaX/grsecurity, right? If this is the case, should I assume
it could also benefit from features like -mibt-seal? Also, are you
considering that perhaps we can use alternatives to flip different CFI
instrumentation, as suggested by PeterZ in another thread?

Tks,
Joao
On Tue, Feb 08, 2022 at 03:43:27PM -0800, Kees Cook wrote:
> On Wed, Nov 24, 2021 at 11:30:37AM -0800, Josh Poimboeuf wrote:
> > On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> > > +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> > > +{
> > > + struct instruction *dest;
> > > + struct section *sec;
> > > + unsigned long off;
> > > +
> > > + sec = reloc->sym->sec;
> > > + off = reloc->sym->offset + reloc->addend;
> > > +
> > > + dest = find_insn(file, sec, off);
> > > + if (!dest)
> > > + return 0;
> > > +
> > > + if (name && dest->func)
> > > + *name = dest->func->name;
> >
> > I think these checks can be further narrowed down by only looking for
> > X86_64_64 relocs.
> >
> > > + list_for_each_entry(insn, &file->endbr_list, call_node) {
> > > + if (ibt_seal) {
> > > + elf_write_insn(file->elf, insn->sec,
> > > + insn->offset, insn->len,
> > > + arch_nop_insn(insn->len));
> > > + }
> >
> > Like the retpoline rewriting, I'd much rather have objtool create
> > annotations which the kernel can then read and patch itself.
> >
> > e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
> > sections.
>
> Why have the kernel do that work at every boot when it can be known at
> link time?
I have patches that write a 4 byte #UD there instead of a nop. That
would make !IBT hardware splat as well when it hits a sealed function
(and in that case actually having those extra ENDBR generated is a
bonus).
Anyway, I have some newer patches and some hardware, except it's a NUC
and working with those things is a royal pain in the arse since they
don't have serial. I finally did get XHCI debug port working, but
there's no XDBC grub support, so now I managed to boot a dead kernel and
the thing is a brick until I can be arsed to connect a keyboard and
screen to it again :-(
KVM/qemu has no IBT support merged yet, so I can't use that either.
On Wed, Feb 09, 2022 at 12:41:41PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 08, 2022 at 03:43:27PM -0800, Kees Cook wrote:
> > Why have the kernel do that work at every boot when it can be known at
> > link time?
>
> I have patches that write a 4 byte #UD there instead of a nop. That
> would make !IBT hardware splat as well when it hits a sealed function
> (and in that case actually having those extra ENDBR generated is a
> bonus).
>
> Anyway, I have some newer patches and some hardware, except it's a NUC
> and working with those things is a royal pain in the arse since they
> don't have serial. I finally did get XHCI debug port working, but
> there's no XDBC grub support, so now I managed to boot a dead kernel and
> the thing is a brick until I can be arsed to connect a keyboard and
> screen to it again :-(
>
> KVM/qemu has no IBT support merged yet, so I can't use that either.

FWIW, those patches live here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/wip.ibt
On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
> > Ah, excellent, thanks for the pointers. There's also this in the works:
> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> > to objtool, IBT, etc.)
>
> Oh, great! Thanks for pointing it out. I guess I saw something with a
> similar name before ;) https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
>
> Jokes aside (and perhaps questions more targeted to Sami), from a diagonal
> look it seems that this follows the good old tag approach proposed by
> PaX/grsecurity, right? If this is the case, should I assume it could also
> benefit from features like -mibt-seal? Also are you considering that perhaps
> we can use alternatives to flip different CFI instrumentation as suggested
> by PeterZ in another thread?
So, let's try and recap things from IRC yesterday. There's a whole bunch
of things intertwining that make indirect branches 'interesting'. Most
of these I've not seen mentioned in Sami's KCFI proposal, which makes it
rather pointless.
I think we'll end up with something related to KCFI, but with distinct
differences:
- 32bit immediates for smaller code
- __kcfi_check_fail() is out for smaller code
- it must interact with IBT/BTI and retpolines
- we must be very careful with speculation.
Right, so because !IBT-CFI needs the check at the call site, like:
caller:
cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: call __x86_indirect_thunk_rax # 5 bytes
.align 16
.byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
func:
...
ret
While FineIBT has them at the landing site:
caller:
movl $0xdeadbeef, %r11d # 6 bytes
call __x86_indirect_thunk_rax # 5 bytes
.align 16
func:
endbr # 4 bytes
cmpl $0xdeadbeef, %r11d # 7 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: ...
ret
It seems to me that always doing the check at the call-site is simpler,
since it avoids code-bloat and patching work. That is, if we allow both
we'll effectively blow up the code by 11 + 13 bytes (11 at the call site,
13 at function entry) as opposed to 11+4 or 6+13.
Which then yields:
caller:
cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: call __x86_indirect_thunk_rax # 5 bytes
.align 16
.byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
func:
endbr # 4 bytes
...
ret
For a combined 11+8 bytes overhead :/
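The byte arithmetic above can be double-checked with a trivial sketch;
the per-instruction sizes are taken straight from the annotations in the
listings (the helper names are mine, for illustration only):

```c
#include <assert.h>

/* Instruction sizes as annotated in the listings above. */
enum {
	CMPL_MEM_IMM32 = 7,  /* cmpl $imm32, -0x4(%rax) */
	JE_D8          = 2,  /* je 1f                   */
	UD2_LEN        = 2,  /* ud2                     */
	HASH_BYTES     = 4,  /* .byte 0xef,0xbe,0xad,0xde */
	ENDBR_LEN      = 4,  /* endbr                   */
};

/* Call-site overhead: only the check; the call itself is not overhead. */
static int kcfi_callsite_overhead(void)
{
	return CMPL_MEM_IMM32 + JE_D8 + UD2_LEN;
}

/* Function-entry overhead of the combined (hash + endbr) scheme. */
static int combined_entry_overhead(void)
{
	return HASH_BYTES + ENDBR_LEN;
}
```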
Now, this setup provides:
- minimal size (please yell if there's a smaller option I missed;
s/ud2/int3/ ?)
- since the retpoline handles speculation from stuff before it, the
load-miss induced speculation is covered.
- the 'je' branch is binary, leading to either the retpoline or the
ud2, both which are a speculation stop.
- the ud2 is placed such that if the exception is non-fatal, code
execution can recover
- when IBT is present we can rewrite the thunk call to:
lfence
call *(%rax)
and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
- can disable CFI by replacing the cmpl with:
jmp 1f
(or an 11 byte nop, which is just about possible). And since we
already have all retpoline thunk callsites in a section, we can
trivially find all CFI bits that are always in front of them.
- function pointer sanity
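The thunk-call rewrite in the list above relies on the two sequences
having the same length. A quick sketch checking that; note the byte
values are my own from-memory encodings, an assumption rather than
something stated in the thread:

```c
#include <assert.h>

/*
 * On IBT hardware the 5-byte "call __x86_indirect_thunk_rax" can be
 * patched in place to "lfence; call *(%rax)", which is also exactly
 * 5 bytes, so no padding or relocation of surrounding code is needed.
 */
static const unsigned char call_thunk[] = {
	0xe8, 0x00, 0x00, 0x00, 0x00,   /* call rel32 (thunk target) */
};
static const unsigned char lfence_call[] = {
	0x0f, 0xae, 0xe8,               /* lfence        */
	0xff, 0x10,                     /* call *(%rax)  */
};
```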
Additionally, if we ensure all direct calls target func+4 and only
indirect calls hit the ENDBR -- which is optimal anyway, since it saves
on decoding ENDBR -- we can replace the ENDBR instructions of functions
that should never be indirectly called with:
ud1 0x0(%rax),%eax
which is a 4 byte #UD. This gives us the property that even on !IBT
hardware such a call will go *splat*.
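The 4-byte property can be sanity-checked against the encodings. The
byte values below are my own from-memory encodings (ud1 is 0f b9 /r; a
disp8 ModRM form gives four bytes total), so treat them as an
assumption, not thread-verified data:

```c
#include <assert.h>

/* "ud1 0x0(%rax),%eax": opcode 0f b9, ModRM 0x40 (mod=01, reg=eax,
 * rm=rax), disp8 0x00 -- exactly the size of the ENDBR it replaces. */
static const unsigned char ud1_rax_disp8[] = { 0x0f, 0xb9, 0x40, 0x00 };

/* endbr64 encoding, for comparison. */
static const unsigned char endbr64[]       = { 0xf3, 0x0f, 0x1e, 0xfa };
```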
Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
to allow distinguishing indirect functions with otherwise identical
signature; eg. cookie = hash32(blah##signature).
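The cfi_seed idea can be illustrated with a small sketch. Everything
here is hypothetical: hash32() is FNV-1a as a stand-in for whatever hash
the compiler would really use; the point is only that seeding the
signature string gives distinct cookies to otherwise-identical
signatures:

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in 32-bit hash (FNV-1a); NOT the compiler's actual hash. */
static uint32_t hash32(const char *s)
{
	uint32_t h = 2166136261u;        /* FNV offset basis */
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;          /* FNV prime */
	}
	return h;
}

/* cookie = hash32(seed ## signature), per the cfi_seed() sketch above. */
static uint32_t cfi_cookie(const char *seed, const char *signature)
{
	char buf[128];

	snprintf(buf, sizeof(buf), "%s%s", seed, signature);
	return hash32(buf);
}
```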
Did I miss anything? Got anything wrong?
On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <peterz@infradead.org> wrote:
> I think we'll end up with something related to KCFI, but with distinct
> differences:
>
> - 32bit immediates for smaller code

Sure, I don't see issues with that. Based on a quick test with
defconfig, this reduces vmlinux size by 0.30%.

> - __kcfi_check_fail() is out for smaller code

I'm fine with adding a trap mode that's used by default, but having more
helpful diagnostics when something fails is useful even in production
systems, in my experience. This change results in a vmlinux that's
another 0.92% smaller.

> Which then yields:
>
> caller:
>         cmpl    $0xdeadbeef, -0x4(%rax)        # 7 bytes
>         je      1f                              # 2 bytes
>         ud2                                     # 2 bytes
> 1:      call    __x86_indirect_thunk_rax        # 5 bytes

Note that the compiler might not emit this *exact* sequence of
instructions. For example, Clang generates this for events_sysfs_show
with the modified KCFI patch:

    2274: cmpl   $0x4d7bed9e,-0x4(%r11)
    227c: jne    22c0 <events_sysfs_show+0x6c>
    227e: call   2283 <events_sysfs_show+0x2f>
          227f: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4
    ...
    22c0: ud2

In this case the function has two indirect calls and Clang seems to
prefer to emit just one ud2.

> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
> func:
>         endbr                   # 4 bytes

Here func is no longer aligned to 16 bytes, in case that's important.

> Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
> to allow distinguishing indirect functions with otherwise identical
> signature; eg. cookie = hash32(blah##signature).

Sounds reasonable.

> Did I miss anything? Got anything wrong?

How would you like to deal with the 4-byte hashes in objtool? We either
need to annotate all function symbols in the kernel, or we need a way to
distinguish the hashes from random instructions, so we can also have
functions that don't have a type hash.

Sami
On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > I think we'll end up with something related to KCFI, but with distinct
> > differences:
> >
> > - 32bit immediates for smaller code
>
> Sure, I don't see issues with that. Based on a quick test with
> defconfig, this reduces vmlinux size by 0.30%.
>
> > - __kcfi_check_fail() is out for smaller code
>
> I'm fine with adding a trap mode that's used by default, but having
> more helpful diagnostics when something fails is useful even in
> production systems in my experience. This change results in a vmlinux
> that's another 0.92% smaller.

You can easily have the exception generate a nice warning, you can even
have it continue. You really don't need a call for that.

> > Which then yields:
> >
> > caller:
> >         cmpl    $0xdeadbeef, -0x4(%rax)        # 7 bytes
> >         je      1f                              # 2 bytes
> >         ud2                                     # 2 bytes
> > 1:      call    __x86_indirect_thunk_rax        # 5 bytes
>
> Note that the compiler might not emit this *exact* sequence of
> instructions. For example, Clang generates this for events_sysfs_show
> with the modified KCFI patch:
>
>     2274: cmpl   $0x4d7bed9e,-0x4(%r11)
>     227c: jne    22c0 <events_sysfs_show+0x6c>
>     227e: call   2283 <events_sysfs_show+0x2f>
>           227f: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4
>     ...
>     22c0: ud2
>
> In this case the function has two indirect calls and Clang seems to
> prefer to emit just one ud2.

That will not allow you to recover from the exception. UD2 is not an
unconditional fail. It should have an outgoing edge in this case too.
Heck, most of the WARN_ON() things are UD2 instructions.

Also, you really should add a CS prefix to the retpoline thunk call if
you insist on using r11 (or any of the higher regs).

> > .align 16
> > .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
> > func:
> >         endbr                   # 4 bytes
>
> Here func is no longer aligned to 16 bytes, in case that's important.

The idea was to have the hash and the endbr in the same cacheline.

> > Did I miss anything? Got anything wrong?
>
> How would you like to deal with the 4-byte hashes in objtool? We
> either need to annotate all function symbols in the kernel, or we need
> a way to distinguish the hashes from random instructions, so we can
> also have functions that don't have a type hash.

Easiest would be to create a special section with all the hash offsets
in it, I suppose. A bit like -mfentry-section=name.
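The special-section idea (recording where every hash lives, in the
spirit of -mfentry-section) can be sketched in plain C. The section name
"kcfi_hashes" is made up, and the `__start_`/`__stop_` bracketing
symbols are a GNU linker feature for sections whose names are valid C
identifiers -- assumptions for illustration, not what the compiler would
actually emit:

```c
#include <stdint.h>

/* Emit a hash value and drop a pointer to it into a dedicated section,
 * so a tool (or the kernel) can enumerate all hash locations instead of
 * pattern-matching bytes. */
#define KCFI_HASH(name, val)                                         \
	static const uint32_t name = (val);                          \
	static const uint32_t *const name##_ref                      \
	__attribute__((used, section("kcfi_hashes"))) = &name;

KCFI_HASH(hash_a, 0xdeadbeef)
KCFI_HASH(hash_b, 0x4d7bed9e)

/* GNU ld provides these for any section with a C-identifier name. */
extern const uint32_t *const __start_kcfi_hashes[];
extern const uint32_t *const __stop_kcfi_hashes[];

static long num_hashes(void)
{
	return __stop_kcfi_hashes - __start_kcfi_hashes;
}
```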
On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > I'm fine with adding a trap mode that's used by default, but having
> > more helpful diagnostics when something fails is useful even in
> > production systems in my experience. This change results in a vmlinux
> > that's another 0.92% smaller.
>
> You can easily have the exception generate a nice warning, you can even
> have it continue. You really don't need a call for that.

Sure, but wouldn't that require us to generate something like
__bug_table, so we know where the CFI specific traps are?

> > In this case the function has two indirect calls and Clang seems to
> > prefer to emit just one ud2.
>
> That will not allow you to recover from the exception. UD2 is not an
> unconditional fail. It should have an out-going edge in this case too.

Yes, CFI failures are not recoverable in that code. In fact, LLVM
assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
suppose we could just use an int3 instead. I assume that's sufficient
to stop speculation?

> Also, you really should add a CS prefix to the retpoline thunk call if
> you insist on using r11 (or any of the higher regs).

I actually didn't touch the retpoline thunk call, that's exactly the
code Clang normally generates.

> > How would you like to deal with the 4-byte hashes in objtool? We
> > either need to annotate all function symbols in the kernel, or we need
> > a way to distinguish the hashes from random instructions, so we can
> > also have functions that don't have a type hash.
>
> Easiest would be to create a special section with all the hash offsets
> in I suppose. A bit like -mfentry-section=name.

OK, I'll take a look. With 64-bit hashes I was planning to use a known
prefix, but that's not really an option with a 32-bit hash.

Sami
On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > I'm fine with adding a trap mode that's used by default, but having
> > > more helpful diagnostics when something fails is useful even in
> > > production systems in my experience. This change results in a vmlinux
> > > that's another 0.92% smaller.
> >
> > You can easily have the exception generate a nice warning, you can even
> > have it continue. You really don't need a call for that.
>
> Sure, but wouldn't that require us to generate something like
> __bug_table, so we know where the CFI specific traps are?

It also means the trap handler needs to do a bunch of instruction
decoding to find the address that was going to be jumped to, etc.

> > > In this case the function has two indirect calls and Clang seems to
> > > prefer to emit just one ud2.
> >
> > That will not allow you to recover from the exception. UD2 is not an
> > unconditional fail. It should have an out-going edge in this case too.
>
> Yes, CFI failures are not recoverable in that code. In fact, LLVM
> assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> suppose we could just use an int3 instead. I assume that's sufficient
> to stop speculation?

Peter, is there a reason you want things in the specific order of:

    cmp, je-to-call, trap, call

Isn't it more run-time efficient to have an out-of-line failure of
the form:

    cmp, jne-to-trap, call, ...code..., trap, jmp-to-call

I thought the static label stuff allowed the "default out of line"
option, as far as pessimizing certain states, etc? The former is
certainly code-size smaller, though, yes, but doesn't it waste space in
the cache line for the unlikely case, etc?

> > Also, you really should add a CS prefix to the retpoline thunk call if
> > you insist on using r11 (or any of the higher regs).
>
> I actually didn't touch the retpoline thunk call, that's exactly the
> code Clang normally generates.
>
> > > How would you like to deal with the 4-byte hashes in objtool? We
> > > either need to annotate all function symbols in the kernel, or we need
> > > a way to distinguish the hashes from random instructions, so we can
> > > also have functions that don't have a type hash.
> >
> > Easiest would be to create a special section with all the hash offsets
> > in I suppose. A bit like -mfentry-section=name.
>
> OK, I'll take a look. With 64-bit hashes I was planning to use a known
> prefix, but that's not really an option with a 32-bit hash.

32-bit hashes would have both code size and runtime benefits: fewer
instructions for the compare, and therefore a smaller set of
instructions added.

--
Kees Cook
On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > I'm fine with adding a trap mode that's used by default, but having
> > > more helpful diagnostics when something fails is useful even in
> > > production systems in my experience. This change results in a vmlinux
> > > that's another 0.92% smaller.
> >
> > You can easily have the exception generate a nice warning, you can even
> > have it continue. You really don't need a call for that.
>
> Sure, but wouldn't that require us to generate something like
> __bug_table, so we know where the CFI specific traps are?

Yes, but since you're going to emit a retpoline, and objtool already
generates a list of retpoline sites, we can abuse that. If the
instruction after the trap is a retpoline site, we have a CFI fail.

> > > In this case the function has two indirect calls and Clang seems to
> > > prefer to emit just one ud2.
> >
> > That will not allow you to recover from the exception. UD2 is not an
> > unconditional fail. It should have an out-going edge in this case too.
>
> Yes, CFI failures are not recoverable in that code. In fact, LLVM
> assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> suppose we could just use an int3 instead. I assume that's sufficient
> to stop speculation?

It is. int3 is also smaller by one byte, but the exception handling is
already fairly complicated; still, I can see if we can make it work.

> > Also, you really should add a CS prefix to the retpoline thunk call if
> > you insist on using r11 (or any of the higher regs).
>
> I actually didn't touch the retpoline thunk call, that's exactly the
> code Clang normally generates.

Yeah, and it needs help, also see:

  https://lore.kernel.org/lkml/20211119165630.276205624@infradead.org/

Specifically, clang needs to:

 - CS prefix stuff the retpoline thunk call/jmp;
 - stop doing conditional indirect tail-calls.
On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote:
> On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > > I'm fine with adding a trap mode that's used by default, but having
> > > > more helpful diagnostics when something fails is useful even in
> > > > production systems in my experience. This change results in a vmlinux
> > > > that's another 0.92% smaller.
> > >
> > > You can easily have the exception generate a nice warning, you can even
> > > have it continue. You really don't need a call for that.
> >
> > Sure, but wouldn't that require us to generate something like
> > __bug_table, so we know where the CFI specific traps are?
>
> It also means the trap handler needs to do a bunch of instruction
> decoding to find the address that was going to be jumped to, etc.

arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we
need to know that to re-write the thunk call.

> > > > In this case the function has two indirect calls and Clang seems to
> > > > prefer to emit just one ud2.
> > >
> > > That will not allow you to recover from the exception. UD2 is not an
> > > unconditional fail. It should have an out-going edge in this case too.
> >
> > Yes, CFI failures are not recoverable in that code. In fact, LLVM
> > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> > suppose we could just use an int3 instead. I assume that's sufficient
> > to stop speculation?
>
> Peter, is there a reason you want things in the specific order of:
>
>     cmp, je-to-call, trap, call
>
> Isn't it more run-time efficient to have an out-of-line failure of
> the form:
>
>     cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
>
> I thought the static label stuff allowed the "default out of line"
> option, as far as pessimizing certain states, etc? The former is
> certainly code-size smaller, though, yes, but doesn't it waste space in
> the cache line for the unlikely case, etc?

Mostly so that we can deduce the address of the trap from the retpoline
site; also, the above has a fairly high chance of using jcc.d32, which
is actually larger than jcc.d8+ud2.
On 2022-02-11 05:38, Peter Zijlstra wrote:
> On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
>> > Ah, excellent, thanks for the pointers. There's also this in the works:
>> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
>> > to objtool, IBT, etc.)
>>
>> Oh, great! Thanks for pointing it out. I guess I saw something with a
>> similar name before ;)
>> https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
>>
>> Jokes aside (and perhaps questions more targeted to Sami), from a
>> diagonal look it seems that this follows the good old tag approach
>> proposed by PaX/grsecurity, right? If this is the case, should I assume
>> it could also benefit from features like -mibt-seal? Also are you
>> considering that perhaps we can use alternatives to flip different CFI
>> instrumentation as suggested by PeterZ in another thread?
>
> So, lets try and recap things from IRC yesterday. There's a whole bunch
> of things intertwining making indirect branches 'interesting'. Most of
> which I've not seen mentioned in Sami's KCFI proposal which makes it
> rather pointless.
>
> I think we'll end up with something related to KCFI, but with distinct
> differences:
>
> - 32bit immediates for smaller code
> - __kcfi_check_fail() is out for smaller code
> - it must interact with IBT/BTI and retpolines
> - we must be very careful with speculation.
>
> Right, so because !IBT-CFI needs the check at the call site, like:
>
> caller:
>         cmpl    $0xdeadbeef, -0x4(%rax)        # 7 bytes
>         je      1f                              # 2 bytes
>         ud2                                     # 2 bytes
> 1:      call    __x86_indirect_thunk_rax        # 5 bytes
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
> func:
>         ...
>         ret
>
> While FineIBT has them at the landing site:
>
> caller:
>         movl    $0xdeadbeef, %r11d              # 6 bytes
>         call    __x86_indirect_thunk_rax        # 5 bytes
>
> .align 16
> func:
>         endbr                                   # 4 bytes
>         cmpl    $0xdeadbeef, %r11d              # 7 bytes
>         je      1f                              # 2 bytes
>         ud2                                     # 2 bytes
> 1:      ...
>         ret
>
> It seems to me that always doing the check at the call-site is simpler,
> since it avoids code-bloat and patching work. That is, if we allow both
> we'll effectively blow up the code by 11 + 13 bytes (11 at the call
> site, 13 at function entry) as opposed to 11+4 or 6+13.
>
> Which then yields:
>
> caller:
>         cmpl    $0xdeadbeef, -0x4(%rax)        # 7 bytes
>         je      1f                              # 2 bytes
>         ud2                                     # 2 bytes
> 1:      call    __x86_indirect_thunk_rax        # 5 bytes
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
> func:
>         endbr                                   # 4 bytes
>         ...
>         ret
>
> For a combined 11+8 bytes overhead :/
>
> Now, this setup provides:
>
> - minimal size (please yell if there's a smaller option I missed;
>   s/ud2/int3/ ?)
> - since the retpoline handles speculation from stuff before it, the
>   load-miss induced speculation is covered.
> - the 'je' branch is binary, leading to either the retpoline or the
>   ud2, both which are a speculation stop.
> - the ud2 is placed such that if the exception is non-fatal, code
>   execution can recover
> - when IBT is present we can rewrite the thunk call to:
>
>         lfence
>         call *(%rax)
>
>   and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
> - can disable CFI by replacing the cmpl with:
>
>         jmp 1f
>
>   (or an 11 byte nop, which is just about possible). And since we
>   already have all retpoline thunk callsites in a section, we can
>   trivially find all CFI bits that are always in front of them.
> - function pointer sanity

Agreed that it is sensible to support CFI for CPUs without CET. KCFI is
a win. Yet, I still think that we should support FineIBT, and there are
a few reasons for that (other than hopeful performance benefits).

- KCFI is more prone to the presence of unintended allowed targets.
  Since the IBT scheme always relies on the same watermark tag (the
  ENDBR instruction), it is easier to prevent these from being emitted
  by JITs/compilers (fwiw, see
  https://reviews.llvm.org/rGf385823e04f300c92ec03dbd660d621cc618a271
  and https://dl.acm.org/doi/10.1145/3337167.3337175).

- Regarding the space overhead, I can try to verify if FineIBT can be
  feasibly reshaped into:

  caller:
        movl    $0xdeadbeef, %r10               # 6 bytes
        sub     $0x5, %r11                      # 4 bytes
        call    *(%r11)                         # 5 bytes

  .align 16
        endbr                                   # 4 bytes
        call __x86_fineibt_thunk_deadbeef       # 5 bytes
  func+4:
        ...
        ret

  __x86_fineibt_thunk_deadbeef:
        xor     $0xdeadbeef, %r10
        je      1f
        ud2
  1:    retq

  This scheme would require less space and less runtime patching. We
  would need one additional byte on callees to patch both the ENDBR and
  the 5-byte call instruction, plus space to hold the thunks somewhere
  else. The thunks overhead is then diluted across the functions
  belonging to the same equivalence set, as these will use the same
  thunk. On a quick analysis over defconfig, it seems that the number of
  equivalence sets is ~25% of the number of functions (thus, space
  overhead is cut to a fourth). I guess this would only require the KCFI
  compiler support to emit an additional "0xcc" before the tag. Thunks
  can also be placed in a special section and dropped during boot to
  reduce the runtime memory footprint. What do you think, is this worth
  investigating?

  Also, in my understanding, r11 does not need to be preserved as,
  per-ABI, it is a scratch register. So there is no need to add
  $0x5,%r11 back after the call. Let me know if I'm mistaken.

> Additionally, if we ensure all direct call are +4 and only indirect
> calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR.
> We can replace those ENDBR instructions of functions that should never
> be indirectly called with:
>
>         ud1 0x0(%rax),%eax
>
> which is a 4 byte #UD. This gives us the property that even on !IBT
> hardware such a call will go *splat*.

Given that the space overhead is a big concern, isn't it better to use
the compiler support to prevent ENDBRs in the first place? Patching #UD
is kinda cool, yeah, but I don't see it providing meaningful security
guarantees over not having the ENDBRs emitted at all.

> Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
> to allow distinguishing indirect functions with otherwise identical
> signature; eg. cookie = hash32(blah##signature).
>
> Did I miss anything? Got anything wrong?

I understand that today there are not too many CET-enabled CPUs out
there, and that the benefits might be a bit blurred currently. Yet,
given that this is something that should continuously change with time,
would you object to FineIBT as a build option, i.e., something which
might not be supported by alternatives when using defconfig?
On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote:
> > On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> > > Sure, but wouldn't that require us to generate something like
> > > __bug_table, so we know where the CFI specific traps are?
> >
> > It also means the trap handler needs to do a bunch of instruction
> > decoding to find the address that was going to be jumped to, etc.
>
> arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we
> need to know that to re-write the thunk call.

Ah, okay, well that makes things easier. :)

> > > > > In this case the function has two indirect calls and Clang seems to
> > > > > prefer to emit just one ud2.
> > > >
> > > > That will not allow you to recover from the exception. UD2 is not an
> > > > unconditional fail. It should have an out-going edge in this case too.
> > >
> > > Yes, CFI failures are not recoverable in that code. In fact, LLVM
> > > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> > > suppose we could just use an int3 instead. I assume that's sufficient
> > > to stop speculation?
> >
> > Peter, is there a reason you want things in the specific order of:
> >
> >     cmp, je-to-call, trap, call
> >
> > Isn't it more run-time efficient to have an out-of-line failure of
> > the form:
> >
> >     cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> >
> > I thought the static label stuff allowed the "default out of line"
> > option, as far as pessimizing certain states, etc? The former is
> > certainly code-size smaller, though, yes, but doesn't it waste space in
> > the cache line for the unlikely case, etc?
>
> Mostly so that we can deduce the address of the trap from the retpoline
> site, also the above has a fairly high chance of using jcc.d32 which is
> actually larger than jcc.d8+ud2.

Ah, yeah, that's an interesting point.

Still, I worry about finding ways to convince Clang to emit precisely
cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't. :P

--
Kees Cook
>>
>> Mostly so that we can deduce the address of the trap from the
>> retpoline
>> site, also the above has a fairly high chance of using jcc.d32 which
>> is
>> actually larger than jcc.d8+ud2.
>
> Ah, yeah, that's an interesting point.
>
> Still, I worry about finding ways to convinces Clang to emit precisely
> cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't.
> :P
This can probably be done more easily/precisely if implemented directly
in the compiler's arch-specific backend. At least for x86 it wasn't a
hassle to emit a defined sequence of instructions in the past. The price
is that it will require a pass specific to each supported architecture,
but I guess this isn't that bad.
Perhaps this is a discussion for a different mailing list, idk... but
just pointing out that it is not a huge wall.
On 11/02/2022 13:38, Peter Zijlstra wrote:
> Now, this setup provides:
>
> - minimal size (please yell if there's a smaller option I missed;
> s/ud2/int3/ ?)
It's probably best not to overload int3.  #BP already has magic
properties, and what happens if someone wants to breakpoint one of these?

I'll make my usual `into` suggestion for 64bit builds.
ud2 is the sensible approach, with a caveat that whatever is hiding here
should have metadata.
~Andrew
On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote:
> > Peter, is there a reason you want things in the specific order of:
> >
> > cmp, je-to-call, trap, call
> >
> > Isn't it more run-time efficient to have an out-of-line failure of
> > the form:
> >
> > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> >
> > I thought the static label stuff allowed the "default out of line"
> > option, as far as pessimizing certain states, etc? The former is certainly
> > code-size smaller, though, yes, but doesn't it waste space in the cache
> > line for the unlikely case, etc?
>
> Mostly so that we can deduce the address of the trap from the retpoline
> site, also the above has a fairly high chance of using jcc.d32 which is
> actually larger than jcc.d8+ud2.
Also; and I think I mentioned this a few emails back, by having the
whole CFI thing as a single range of bytes it becomes easier to patch.
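To illustrate why the contiguous range helps: with the whole check in one fixed-size window ending at the call, a runtime patch is a single write over known offsets. A hypothetical userspace sketch follows; the function name and the NOP-out policy are invented for illustration, and real kernel patching would go through the text-poking machinery and use optimal multi-byte NOPs rather than single 0x90s.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KCFI_SITE_SIZE	16	/* cmp(7) + je(2) + ud2(2) + call(5) */
#define KCFI_CHECK_SIZE	11	/* everything before the call */

/*
 * Hypothetical patching sketch, not kernel code: because the whole
 * check is one contiguous range ending in the call, disabling it (or
 * rewriting it to some other scheme) is a single overwrite of a known
 * window.  Here we just replace the check with single-byte NOPs,
 * leaving the call in place.
 */
static void kcfi_disable_site(uint8_t *site)
{
	memset(site, 0x90, KCFI_CHECK_SIZE);	/* 0x90 == nop */
}

/* Demonstrate on a hand-assembled copy of the sequence. */
static int demo_patch_ok(void)
{
	uint8_t site[KCFI_SITE_SIZE] = {
		0x81, 0x78, 0xfc, 0xef, 0xbe, 0xad, 0xde,	/* cmpl */
		0x74, 0x02,					/* je 1f */
		0x0f, 0x0b,					/* ud2   */
		0xe8, 0x00, 0x00, 0x00, 0x00,			/* call  */
	};

	kcfi_disable_site(site);
	return site[0] == 0x90 &&
	       site[KCFI_CHECK_SIZE - 1] == 0x90 &&
	       site[KCFI_CHECK_SIZE] == 0xe8;	/* call opcode untouched */
}
```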
Hi Peter,

One issue with this call sequence is that:

On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> caller:
>       cmpl    $0xdeadbeef, -0x4(%rax) # 7 bytes

Because this instruction ends in the constant 0xdeadbeef, it may be used
as a "gadget" that would effectively allow branching to an arbitrary
address in %rax if the attacker can arrange to set ZF=1.

>       je      1f                      # 2 bytes
>       ud2                             # 2 bytes
> 1:    call    __x86_indirect_thunk_rax # 5 bytes
>
>
> .align 16
>       .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
> func:
>       endbr                           # 4 bytes
>       ...
>       ret

I think we can avoid this problem with a slight tweak to your
instruction sequence, at the cost of 2 bytes per function prologue.
First, change the call sequence like so:

        cmpl    $0xdeadbeef, -0x6(%rax) # 6 bytes
        je      1f                      # 2 bytes
        ud2                             # 2 bytes
1:      call    __x86_indirect_thunk_rax # 5 bytes

The key difference is that we've changed 0x4 to 0x6.

Then change the function prologue to this:

.align 16
        .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
        .zero 2                         # 2 bytes
func:

The end result of the above is that the constant embedded in the cmpl
instruction may only be used to reach the following ud2 instruction,
which will "harmlessly" terminate execution in the same way as if the
prologue signature did not match.

Peter
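As a sanity check that the tweak keeps legitimate calls working: modelling the caller-side cmpl as a 4-byte compare at the checked displacement, the hash / .zero 2 / func prologue above still validates at -6 while, by construction, no longer validating at -4. A small standalone C sketch, not kernel code (the endbr64 bytes stand in for the function body):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const uint8_t kcfi_hash[4] = { 0xef, 0xbe, 0xad, 0xde };

/* The caller-side "cmpl $hash, -disp(%reg)" check, modelled in C. */
static int kcfi_check(const uint8_t *target, int disp)
{
	return memcmp(target - disp, kcfi_hash, 4) == 0;
}

/* Tweaked prologue: .byte hash; .zero 2; func: */
static const uint8_t prologue6[] = {
	0xef, 0xbe, 0xad, 0xde,		/* type hash */
	0x00, 0x00,			/* .zero 2   */
	0xf3, 0x0f, 0x1e, 0xfa,		/* func: endbr64 */
};
#define FUNC6 (prologue6 + 6)		/* address of func */
```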
On 2022-03-01 19:06, Peter Collingbourne wrote:
> Hi Peter,
>
> One issue with this call sequence is that:
>
> On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
>> caller:
>> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
>
> Because this instruction ends in the constant 0xdeadbeef, it may
> be used as a "gadget" that would effectively allow branching to an
> arbitrary address in %rax if the attacker can arrange to set ZF=1.
>
>> je 1f # 2 bytes
>> ud2 # 2 bytes
>> 1: call __x86_indirect_thunk_rax # 5 bytes
>>
>>
>> .align 16
>> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
>> func:
>> endbr # 4 bytes
>> ...
>> ret
>
> I think we can avoid this problem with a slight tweak to your
> instruction sequence, at the cost of 2 bytes per function prologue.
> First, change the call sequence like so:
>
> cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes
>
> The key difference is that we've changed 0x4 to 0x6.
>
> Then change the function prologue to this:
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> .zero 2 # 2 bytes
> func:
>
> The end result of the above is that the constant embedded in the cmpl
> instruction may only be used to reach the following ud2 instruction,
> which will "harmlessly" terminate execution in the same way as if
> the prologue signature did not match.
FWIW, this makes sense IMHO. These additional pre-prologue bytes are
also what might be needed to enable the smaller version of FineIBT that
I suggested in this thread some time ago.
Hi Peter,

On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <pcc@google.com> wrote:
>
> Hi Peter,
> One issue with this call sequence is that:
>
> On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> > caller:
> >       cmpl    $0xdeadbeef, -0x4(%rax) # 7 bytes
>
> Because this instruction ends in the constant 0xdeadbeef, it may
> be used as a "gadget" that would effectively allow branching to an
> arbitrary address in %rax if the attacker can arrange to set ZF=1.

Do you mind elaborating how this instruction can be used as a gadget?
What does it look like?

The information will be useful to the summary of Sami's KCFI LLVM
patch: https://reviews.llvm.org/D119296

> >       je      1f                      # 2 bytes
> >       ud2                             # 2 bytes
> > 1:    call    __x86_indirect_thunk_rax # 5 bytes
> >
> >
> > .align 16
> >       .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
> > func:
> >       endbr                           # 4 bytes
> >       ...
> >       ret
>
> I think we can avoid this problem with a slight tweak to your
> instruction sequence, at the cost of 2 bytes per function prologue.
> First, change the call sequence like so:
>
>         cmpl    $0xdeadbeef, -0x6(%rax) # 6 bytes
>         je      1f                      # 2 bytes
>         ud2                             # 2 bytes
> 1:      call    __x86_indirect_thunk_rax # 5 bytes
>
> The key difference is that we've changed 0x4 to 0x6.
>
> Then change the function prologue to this:
>
> .align 16
>         .byte 0xef, 0xbe, 0xad, 0xde    # 4 bytes
>         .zero 2                         # 2 bytes
> func:
>
> The end result of the above is that the constant embedded in the cmpl
> instruction may only be used to reach the following ud2 instruction,
> which will "harmlessly" terminate execution in the same way as if
> the prologue signature did not match.
>
> Peter

-- 
宋方睿
On Wed, Jun 8, 2022 at 10:53 AM Fāng-ruì Sòng <maskray@google.com> wrote:
>
> Hi Peter,
>
> On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <pcc@google.com> wrote:
> >
> > Hi Peter,
> > One issue with this call sequence is that:
> >
> > On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> > > caller:
> > >     cmpl    $0xdeadbeef, -0x4(%rax) # 7 bytes
> >
> > Because this instruction ends in the constant 0xdeadbeef, it may
> > be used as a "gadget" that would effectively allow branching to an
> > arbitrary address in %rax if the attacker can arrange to set ZF=1.
>
> Do you mind elaborating how this instruction can be used as a gadget?
> What does it look like?

With the offset of -4, the je instruction here can be an indirect call
target because it's preceded by a valid type hash at the end of the
cmpl instruction. If we change the offset to -6, only the ud2
instruction is a potential call target in this sequence, which will be
less useful to an attacker.

> The information will be useful to the summary of Sami's KCFI LLVM
> patch: https://reviews.llvm.org/D119296

I'll add more information about the X86 preamble to the description.

Sami
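The -4 vs -6 argument can be checked mechanically. Modelling "valid indirect-call target" as "address preceded by the type hash at the checked displacement", a standalone C sketch over hand-assembled encodings (the call's rel32 is zeroed; illustration only, not kernel or compiler code) shows that the -4 layout makes the je a spurious target while the -6 layout leaves only the harmless ud2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t hash[4] = { 0xef, 0xbe, 0xad, 0xde };

/* Caller sequence for the original -0x4 scheme:
 * cmpl $0xdeadbeef,-0x4(%rax); je 1f; ud2; 1: call rel32 */
static const uint8_t seq_disp4[] = {
	0x81, 0x78, 0xfc, 0xef, 0xbe, 0xad, 0xde,	/* cmpl, imm at offset 3 */
	0x74, 0x02,					/* je  (offset 7)  */
	0x0f, 0x0b,					/* ud2 (offset 9)  */
	0xe8, 0x00, 0x00, 0x00, 0x00,			/* call (offset 11) */
};

/* Same sequence with the displacement changed to -0x6. */
static const uint8_t seq_disp6[] = {
	0x81, 0x78, 0xfa, 0xef, 0xbe, 0xad, 0xde,
	0x74, 0x02,
	0x0f, 0x0b,
	0xe8, 0x00, 0x00, 0x00, 0x00,
};

/* Would address 'off' inside 'seq' pass a "-disp(%rax)" hash check? */
static int passes_check(const uint8_t *seq, size_t len, size_t off, size_t disp)
{
	return off >= disp && off <= len &&
	       memcmp(seq + off - disp, hash, 4) == 0;
}
```

With disp == 4 the only in-sequence address preceded by the hash is offset 7, the je, which an attacker who controls ZF can use to reach the indirect call; with disp == 6 it is offset 9, the ud2.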