* [PATCH 0/9] x86: Rewrite the retpoline rewrite logic
@ 2021-10-13 12:22 Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
                   ` (8 more replies)
  0 siblings, 9 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Hi,

These patches rewrite the way retpolines are rewritten. Currently objtool emits
alternative entries for most retpoline calls. However, trying to extend that
approach led to trouble (ELF files are horrid).

Therefore, completely overhaul this and have objtool emit a .retpoline_sites
section that lists all compiler-generated retpoline thunk calls. Then the
kernel can do with them as it pleases.

Notably it will:

 - rewrite them to indirect instructions for !RETPOLINE
 - rewrite them to lfence; indirect; for RETPOLINE_AMD,
   where size allows (boo clang!)

Specifically, the !RETPOLINE case can now also deal with the clang-specific
conditional indirect tail call:

  Jcc __x86_indirect_thunk_\reg
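
For illustration (a hedged sketch, not lifted verbatim from the patches), a
regular 5-byte thunk call:

  e8 xx xx xx xx	call	__x86_indirect_thunk_rax

is patched for !RETPOLINE into (cf. the patch 7 debug output):

  ff d0 0f 1f 00	call	*%rax; nop3

and, where it fits, for RETPOLINE_AMD into:

  0f ae e8 ff d0	lfence; call	*%rax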

Finally, also update the x86 BPF JIT to catch up and do these same things.

All this should help improve performance by removing an indirection.

The series has been booted in kvm using 'debug-alternative spectre_v2=off' and
'debug-alternative spectre_v2=retpoline,amd' on an x86_64
defconfig+kvm_guest.config build. IOW, early days.

---
 arch/x86/include/asm/GEN-for-each-reg.h |  10 +-
 arch/x86/include/asm/alternative.h      |   1 +
 arch/x86/kernel/alternative.c           | 186 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/bugs.c              |   7 --
 arch/x86/kernel/module.c                |   9 +-
 arch/x86/kernel/vmlinux.lds.S           |  12 +++
 arch/x86/lib/retpoline.S                |  42 --------
 arch/x86/net/bpf_jit_comp.c             |  18 ++--
 tools/objtool/arch/x86/decode.c         | 126 +++++-----------------
 tools/objtool/elf.c                     |  84 ---------------
 tools/objtool/include/objtool/elf.h     |   1 -
 11 files changed, 246 insertions(+), 250 deletions(-)



* [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 13:29   ` Borislav Petkov
  2021-10-13 20:11   ` Josh Poimboeuf
  2021-10-13 12:22 ` [PATCH 2/9] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.
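
Each .retpoline_sites entry is a PC-relative s32 pointing at the call/jmp
instruction. As a minimal sketch of the consumer side (the real kernel-side
implementation follows in a later patch):

	s32 *s;

	for (s = __retpoline_sites; s < __retpoline_sites_end; s++) {
		void *insn = (void *)s + *s;	/* address of the thunk call */
		/* decode and patch the instruction as desired */
	}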

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/vmlinux.lds.S       |   12 +++
 arch/x86/lib/retpoline.S            |   42 ------------
 tools/objtool/arch/x86/decode.c     |  126 +++++++-----------------------------
 tools/objtool/elf.c                 |   84 ------------------------
 tools/objtool/include/objtool/elf.h |    1 
 5 files changed, 38 insertions(+), 227 deletions(-)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -273,6 +273,18 @@ SECTIONS
 	}
 
 	/*
+	 * List of instructions that call/jmp/jcc to retpoline thunks
+	 * __x86_indirect_thunk_*(). These instructions can be patched along
+	 * with alternatives, after which the section can be freed.
+	 */
+	. = ALIGN(8);
+	.retpoline_sites : AT(ADDR(.retpoline_sites) - LOAD_OFFSET) {
+		__retpoline_sites = .;
+		*(.retpoline_sites)
+		__retpoline_sites_end = .;
+	}
+
+	/*
 	 * struct alt_inst entries. From the header (alternative.h):
 	 * "Alternative instructions for different CPU types or capabilities"
 	 * Think locking instructions on spinlocks.
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -711,121 +711,47 @@ const char *arch_ret_insn(int len)
 	return ret[len-1];
 }
 
-/* asm/alternative.h ? */
-
-#define ALTINSTR_FLAG_INV	(1 << 15)
-#define ALT_NOT(feat)		((feat) | ALTINSTR_FLAG_INV)
-
-struct alt_instr {
-	s32 instr_offset;	/* original instruction */
-	s32 repl_offset;	/* offset to replacement instruction */
-	u16 cpuid;		/* cpuid bit set for replacement */
-	u8  instrlen;		/* length of original instruction */
-	u8  replacementlen;	/* length of new instruction */
-} __packed;
-
-static int elf_add_alternative(struct elf *elf,
-			       struct instruction *orig, struct symbol *sym,
-			       int cpuid, u8 orig_len, u8 repl_len)
+int arch_rewrite_retpolines(struct objtool_file *file)
 {
-	const int size = sizeof(struct alt_instr);
-	struct alt_instr *alt;
+	struct instruction *insn;
 	struct section *sec;
-	Elf_Scn *s;
-
-	sec = find_section_by_name(elf, ".altinstructions");
-	if (!sec) {
-		sec = elf_create_section(elf, ".altinstructions",
-					 SHF_ALLOC, 0, 0);
-
-		if (!sec) {
-			WARN_ELF("elf_create_section");
-			return -1;
-		}
-	}
-
-	s = elf_getscn(elf->elf, sec->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return -1;
-	}
+	int idx;
 
-	sec->data = elf_newdata(s);
-	if (!sec->data) {
-		WARN_ELF("elf_newdata");
-		return -1;
+	sec = find_section_by_name(file->elf, ".retpoline_sites");
+	if (sec) {
+		WARN("file already has .retpoline_sites, skipping");
+		return 0;
 	}
 
-	sec->data->d_size = size;
-	sec->data->d_align = 1;
-
-	alt = sec->data->d_buf = malloc(size);
-	if (!sec->data->d_buf) {
-		perror("malloc");
-		return -1;
-	}
-	memset(sec->data->d_buf, 0, size);
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node)
+		idx++;
 
-	if (elf_add_reloc_to_insn(elf, sec, sec->sh.sh_size,
-				  R_X86_64_PC32, orig->sec, orig->offset)) {
-		WARN("elf_create_reloc: alt_instr::instr_offset");
-		return -1;
-	}
+	if (!idx)
+		return 0;
 
-	if (elf_add_reloc(elf, sec, sec->sh.sh_size + 4,
-			  R_X86_64_PC32, sym, 0)) {
-		WARN("elf_create_reloc: alt_instr::repl_offset");
+	sec = elf_create_section(file->elf, ".retpoline_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .retpoline_sites");
 		return -1;
 	}
 
-	alt->cpuid = bswap_if_needed(cpuid);
-	alt->instrlen = orig_len;
-	alt->replacementlen = repl_len;
-
-	sec->sh.sh_size += size;
-	sec->changed = true;
-
-	return 0;
-}
-
-#define X86_FEATURE_RETPOLINE                ( 7*32+12)
-
-int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct reloc *reloc;
-	struct symbol *sym;
-	char name[32] = "";
-
+	idx = 0;
 	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
 
-		if (insn->type != INSN_JUMP_DYNAMIC &&
-		    insn->type != INSN_CALL_DYNAMIC)
-			continue;
-
-		if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
-			continue;
-
-		reloc = insn->reloc;
-
-		sprintf(name, "__x86_indirect_alt_%s_%s",
-			insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
-			reloc->sym->name + 21);
-
-		sym = find_symbol_by_name(file->elf, name);
-		if (!sym) {
-			sym = elf_create_undef_symbol(file->elf, name);
-			if (!sym) {
-				WARN("elf_create_undef_symbol");
-				return -1;
-			}
-		}
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
 
-		if (elf_add_alternative(file->elf, insn, sym,
-					ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5)) {
-			WARN("elf_add_alternative");
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .retpoline_sites");
 			return -1;
 		}
+
+		idx++;
 	}
 
 	return 0;
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -740,90 +740,6 @@ static int elf_add_string(struct elf *el
 	return len;
 }
 
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
-{
-	struct section *symtab, *symtab_shndx;
-	struct symbol *sym;
-	Elf_Data *data;
-	Elf_Scn *s;
-
-	sym = malloc(sizeof(*sym));
-	if (!sym) {
-		perror("malloc");
-		return NULL;
-	}
-	memset(sym, 0, sizeof(*sym));
-
-	sym->name = strdup(name);
-
-	sym->sym.st_name = elf_add_string(elf, NULL, sym->name);
-	if (sym->sym.st_name == -1)
-		return NULL;
-
-	sym->sym.st_info = GELF_ST_INFO(STB_GLOBAL, STT_NOTYPE);
-	// st_other 0
-	// st_shndx 0
-	// st_value 0
-	// st_size 0
-
-	symtab = find_section_by_name(elf, ".symtab");
-	if (!symtab) {
-		WARN("can't find .symtab");
-		return NULL;
-	}
-
-	s = elf_getscn(elf->elf, symtab->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return NULL;
-	}
-
-	data = elf_newdata(s);
-	if (!data) {
-		WARN_ELF("elf_newdata");
-		return NULL;
-	}
-
-	data->d_buf = &sym->sym;
-	data->d_size = sizeof(sym->sym);
-	data->d_align = 1;
-	data->d_type = ELF_T_SYM;
-
-	sym->idx = symtab->sh.sh_size / sizeof(sym->sym);
-
-	symtab->sh.sh_size += data->d_size;
-	symtab->changed = true;
-
-	symtab_shndx = find_section_by_name(elf, ".symtab_shndx");
-	if (symtab_shndx) {
-		s = elf_getscn(elf->elf, symtab_shndx->idx);
-		if (!s) {
-			WARN_ELF("elf_getscn");
-			return NULL;
-		}
-
-		data = elf_newdata(s);
-		if (!data) {
-			WARN_ELF("elf_newdata");
-			return NULL;
-		}
-
-		data->d_buf = &sym->sym.st_size; /* conveniently 0 */
-		data->d_size = sizeof(Elf32_Word);
-		data->d_align = 4;
-		data->d_type = ELF_T_WORD;
-
-		symtab_shndx->sh.sh_size += 4;
-		symtab_shndx->changed = true;
-	}
-
-	sym->sec = find_section_by_index(elf, 0);
-
-	elf_add_symbol(elf, sym);
-
-	return sym;
-}
-
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -141,7 +141,6 @@ int elf_write_insn(struct elf *elf, stru
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
 int elf_write_reloc(struct elf *elf, struct reloc *reloc);
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name);
 int elf_write(struct elf *elf);
 void elf_close(struct elf *elf);
 




* [PATCH 2/9] x86/retpoline: Remove unused replacement symbols
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 3/9] x86/asm: Fix register order Peter Zijlstra
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed; remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/lib/retpoline.S |   42 ------------------------------------------
 1 file changed, 42 deletions(-)

--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -41,36 +41,6 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 .endm
 
 /*
- * This generates .altinstr_replacement symbols for use by objtool. They,
- * however, must not actually live in .altinstr_replacement since that will be
- * discarded after init, but module alternatives will also reference these
- * symbols.
- *
- * Their names matches the "__x86_indirect_" prefix to mark them as retpolines.
- */
-.macro ALT_THUNK reg
-
-	.align 1
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	call	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_call_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_call_\reg)
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	jmp	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_jmp_\reg)
-
-.endm
-
-/*
  * Despite being an assembler file we can't just use .irp here
  * because __KSYM_DEPS__ only uses the C preprocessor and would
  * only see one instance of "__x86_indirect_thunk_\reg" rather
@@ -92,15 +62,3 @@ STACK_FRAME_NON_STANDARD(__x86_indirect_
 #undef GEN
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) ALT_THUNK reg
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_call_ ## reg)
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_jmp_ ## reg)
-#include <asm/GEN-for-each-reg.h>




* [PATCH 3/9] x86/asm: Fix register order
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 2/9] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 20:15   ` Josh Poimboeuf
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.
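
For reference, this order matches the x86 register encoding, so the thunk
index equals the ModRM register number:

  0 rax/eax   1 rcx/ecx   2 rdx/edx   3 rbx/ebx
  4 rsp/esp   5 rbp/ebp   6 rsi/esi   7 rdi/edi
  8-15 r8-r15 (64-bit only)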

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/GEN-for-each-reg.h |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/GEN-for-each-reg.h
+++ b/arch/x86/include/asm/GEN-for-each-reg.h
@@ -1,11 +1,12 @@
 #ifdef CONFIG_64BIT
 GEN(rax)
-GEN(rbx)
 GEN(rcx)
 GEN(rdx)
+GEN(rbx)
+GEN(rsp)
+GEN(rbp)
 GEN(rsi)
 GEN(rdi)
-GEN(rbp)
 GEN(r8)
 GEN(r9)
 GEN(r10)
@@ -16,10 +17,11 @@ GEN(r14)
 GEN(r15)
 #else
 GEN(eax)
-GEN(ebx)
 GEN(ecx)
 GEN(edx)
+GEN(ebx)
+GEN(esp)
+GEN(ebp)
 GEN(esi)
 GEN(edi)
-GEN(ebp)
 #endif




* [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-10-13 12:22 ` [PATCH 3/9] x86/asm: Fix register order Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 14:38   ` Andrew Cooper
                     ` (4 more replies)
  2021-10-13 12:22 ` [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
                   ` (4 subsequent siblings)
  8 siblings, 5 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off.  This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.

This is the replacement for objtool writing alternative entries to
ensure the same, and achieves feature parity with the previous
approach.

One noteworthy feature is that it relies on the thunks being in machine
(register-encoding) order to compute the register index.

Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang; a future patch will add this.
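
As an illustration (the 0x20 spacing below is hypothetical; the code derives
the actual stride from the thunk symbols at runtime): with equidistant thunks,
a call whose target is __x86_indirect_thunk_rax + 3*0x20 resolves to register
index 3, i.e. %rbx:

	reg = (target - &__x86_indirect_thunk_rax) /
	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);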

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/alternative.h |    1 
 arch/x86/kernel/alternative.c      |  136 +++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/module.c           |    9 ++
 3 files changed, 141 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -75,6 +75,7 @@ extern int alternatives_patched;
 
 extern void alternative_instructions(void);
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
+extern void apply_retpolines(s32 *start, s32 *end);
 
 struct module;
 
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -29,6 +29,7 @@
 #include <asm/io.h>
 #include <asm/fixmap.h>
 #include <asm/paravirt.h>
+#include <asm/asm-prototypes.h>
 
 int __read_mostly alternatives_patched;
 
@@ -113,6 +114,7 @@ static void __init_or_module add_nops(vo
 	}
 }
 
+extern s32 __retpoline_sites[], __retpoline_sites_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -221,7 +223,7 @@ static __always_inline int optimize_nops
  * "noinline" to cause control flow change and thus invalidate I$ and
  * cause refetch after modification.
  */
-static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
+static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
 {
 	struct insn insn;
 	int i = 0;
@@ -239,11 +241,11 @@ static void __init_or_module noinline op
 		 * optimized.
 		 */
 		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
-			i += optimize_nops_range(instr, a->instrlen, i);
+			i += optimize_nops_range(instr, len, i);
 		else
 			i += insn.length;
 
-		if (i >= a->instrlen)
+		if (i >= len)
 			return;
 	}
 }
@@ -331,10 +333,130 @@ void __init_or_module noinline apply_alt
 		text_poke_early(instr, insn_buff, insn_buff_sz);
 
 next:
-		optimize_nops(a, instr);
+		optimize_nops(instr, a->instrlen);
 	}
 }
 
+#ifdef CONFIG_X86_64
+
+/*
+ * CALL/JMP *%\reg
+ */
+static int emit_indirect(int op, int reg, u8 *bytes)
+{
+	int i = 0;
+	u8 modrm;
+
+	switch (op) {
+	case CALL_INSN_OPCODE:
+		modrm = 0x10; /* Reg = 2; CALL r/m */
+		break;
+
+	case JMP32_INSN_OPCODE:
+		modrm = 0x20; /* Reg = 4; JMP r/m */
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		return -1;
+	}
+
+	if (reg >= 8) {
+		bytes[i++] = 0x41; /* REX.B prefix */
+		reg -= 8;
+	}
+
+	modrm |= 0xc0; /* Mod = 3 */
+	modrm += reg;
+
+	bytes[i++] = 0xff; /* opcode */
+	bytes[i++] = modrm;
+
+	return i;
+}
+
+/*
+ * Rewrite the compiler generated retpoline thunk calls.
+ *
+ * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
+ * indirect instructions, avoiding the extra indirection.
+ *
+ * For example, convert:
+ *
+ *   CALL __x86_indirect_thunk_\reg
+ *
+ * into:
+ *
+ *   CALL *%\reg
+ *
+ */
+static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
+{
+	void (*target)(void);
+	int reg, i = 0;
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		return -1;
+
+	target = addr + insn->length + insn->immediate.value;
+	reg = (target - &__x86_indirect_thunk_rax) /
+	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
+
+	if (WARN_ON_ONCE(reg & ~0xf))
+		return -1;
+
+	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
+	if (i < 0)
+		return i;
+
+	for (; i < insn->length;)
+		bytes[i++] = BYTES_NOP1;
+
+	return i;
+}
+
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++) {
+		void *addr = (void *)s + *s;
+		struct insn insn;
+		int len, ret;
+		u8 bytes[16];
+		u8 op1, op2;
+
+		ret = insn_decode_kernel(&insn, addr);
+		if (WARN_ON_ONCE(ret < 0))
+			continue;
+
+		op1 = insn.opcode.bytes[0];
+		op2 = insn.opcode.bytes[1];
+
+		switch (op1) {
+		case CALL_INSN_OPCODE:
+		case JMP32_INSN_OPCODE:
+			break;
+
+		default:
+			WARN_ON_ONCE(1);
+			continue;
+		}
+
+		len = patch_retpoline(addr, &insn, bytes);
+		if (len == insn.length) {
+			optimize_nops(bytes, len);
+			text_poke_early(addr, bytes, len);
+		}
+	}
+}
+
+#else /* CONFIG_X86_32 */
+
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_X86_64 */
+
 #ifdef CONFIG_SMP
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
@@ -643,6 +765,12 @@ void __init alternative_instructions(voi
 	apply_paravirt(__parainstructions, __parainstructions_end);
 
 	/*
+	 * Rewrite the retpolines, must be done before alternatives since
+	 * those can rewrite the retpoline thunks.
+	 */
+	apply_retpolines(__retpoline_sites, __retpoline_sites_end);
+
+	/*
 	 * Then patch alternatives, such that those paravirt calls that are in
 	 * alternatives can be overwritten by their immediate fragments.
 	 */
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -251,7 +251,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    struct module *me)
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
-		*para = NULL, *orc = NULL, *orc_ip = NULL;
+		*para = NULL, *orc = NULL, *orc_ip = NULL,
+		*retpolines = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -267,8 +268,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 			orc = s;
 		if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
 			orc_ip = s;
+		if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
+			retpolines = s;
 	}
 
+	if (retpolines) {
+		void *rseg = (void *)retpolines->sh_addr;
+		apply_retpolines(rseg, rseg + retpolines->sh_size);
+	}
 	if (alt) {
 		/* patch .altinstructions */
 		void *aseg = (void *)alt->sh_addr;




* [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 20:11   ` Nick Desaulniers
  2021-10-13 12:22 ` [PATCH 6/9] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:

  Jcc __x86_indirect_thunk_\reg

For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:

  Jncc.d8	1f
  JMP		*%\reg
  NOP
1:
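
Byte-wise, an illustrative encoding of that rewrite (not lifted from the
series), taking 'je.d32 __x86_indirect_thunk_rax' as the example:

  0f 84 xx xx xx xx	je	__x86_indirect_thunk_rax

becomes:

  75 04			jne	1f
  ff e0			jmp	*%rax
  90 90			nop; nop
			1: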

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -378,7 +378,8 @@ static int emit_indirect(int op, int reg
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
 	void (*target)(void);
-	int reg, i = 0;
+	int reg, ret, i = 0;
+	u8 op, cc;
 
 	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
 		return -1;
@@ -390,9 +391,34 @@ static int patch_retpoline(void *addr, s
 	if (WARN_ON_ONCE(reg & ~0xf))
 		return -1;
 
-	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
-	if (i < 0)
-		return i;
+	op = insn->opcode.bytes[0];
+
+	/*
+	 * Convert:
+	 *
+	 *   Jcc.d32 __x86_indirect_thunk_\reg
+	 *
+	 * into:
+	 *
+	 *   Jncc.d8 1f
+	 *   jmp *%\reg
+	 *   nop
+	 * 1:
+	 */
+	if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
+		cc = insn->opcode.bytes[1] & 0xf;
+		cc ^= 1; /* invert condition */
+
+		bytes[i++] = 0x70 + cc; /* Jcc.d8 */
+		bytes[i++] = insn->length - 2;
+
+		op = JMP32_INSN_OPCODE;
+	}
+
+	ret = emit_indirect(op, reg, bytes + i);
+	if (ret < 0)
+		return ret;
+	i += ret;
 
 	for (; i < insn->length;)
 		bytes[i++] = BYTES_NOP1;
@@ -423,6 +449,10 @@ void __init_or_module noinline apply_ret
 		case JMP32_INSN_OPCODE:
 			break;
 
+		case 0x0f: /* escape */
+			if (op2 >= 0x80 && op2 <= 0x8f)
+				break;
+			fallthrough;
 		default:
 			WARN_ON_ONCE(1);
 			continue;




* [PATCH 6/9] x86/alternative: Try inline spectre_v2=retpoline,amd
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-10-13 12:22 ` [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 7/9] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Try to replace retpoline thunk calls with:

  lfence
  call    *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compiler pads (prefix-stuffs) the call, this replacement will not fit,
and will fail, for calls through the higher 8 registers.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.
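
Size-wise the problem looks like this (illustrative encodings; the original
call site is only 5 bytes):

  0f ae e8 ff d0	lfence; call *%rax	/* 5 bytes, fits        */
  0f ae e8 41 ff d3	lfence; call *%r11	/* 6 bytes, doesn't fit */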

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -389,15 +389,13 @@ static int emit_indirect(int op, int reg
  *
  *   CALL *%\reg
  *
+ * It also tries to inline spectre_v2=retpoline,amd when size permits.
  */
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
+	u8 cc, op = insn->opcode.bytes[0];
 	void (*target)(void);
 	int reg, ret, i = 0;
-	u8 op, cc;
-
-	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
-		return -1;
 
 	target = addr + insn->length + insn->immediate.value;
 	reg = (target - &__x86_indirect_thunk_rax) /
@@ -406,7 +404,22 @@ static int patch_retpoline(void *addr, s
 	if (WARN_ON_ONCE(reg & ~0xf))
 		return -1;
 
-	op = insn->opcode.bytes[0];
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		/*
+	 * Can't do anything about the Jcc case here.
+		 */
+		if (op != JMP32_INSN_OPCODE && op != CALL_INSN_OPCODE)
+			return -1;
+
+		bytes[i++] = 0x0f;
+		bytes[i++] = 0xae;
+		bytes[i++] = 0xe8; /* lfence */
+
+		goto indirect;
+	}
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		return -1;
 
 	/*
 	 * Convert:
@@ -430,6 +443,7 @@ static int patch_retpoline(void *addr, s
 		op = JMP32_INSN_OPCODE;
 	}
 
+indirect:
 	ret = emit_indirect(op, reg, bytes + i);
 	if (ret < 0)
 		return ret;




* [PATCH 7/9] x86/alternative: Add debug prints to apply_retpolines()
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-10-13 12:22 ` [PATCH 6/9] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 8/9] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
  8 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

 [ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
 [ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
 [ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
 [ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -487,9 +487,15 @@ void __init_or_module noinline apply_ret
 			continue;
 		}
 
+		DPRINTK("retpoline at: %pS (%px) len: %d to: %pS",
+			addr, addr, insn.length,
+			addr + insn.length + insn.immediate.value);
+
 		len = patch_retpoline(addr, &insn, bytes);
 		if (len == insn.length) {
 			optimize_nops(bytes, len);
+			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
+			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
 			text_poke_early(addr, bytes, len);
 		}
 	}




* [PATCH 8/9] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-10-13 12:22 ` [PATCH 7/9] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 12:22 ` [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
  8 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Currently Linux prevents usage of retpoline,amd on !AMD hardware; this
is unfriendly and gets in the way of testing. Remove this restriction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/cpu/bugs.c |    7 -------
 1 file changed, 7 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -882,13 +882,6 @@ static enum spectre_v2_mitigation_cmd __
 		return SPECTRE_V2_CMD_AUTO;
 	}
 
-	if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
-		pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
-		return SPECTRE_V2_CMD_AUTO;
-	}
-
 	spec_v2_print_cond(mitigation_options[i].option,
 			   mitigation_options[i].secure);
 	return cmd;




* [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (7 preceding siblings ...)
  2021-10-13 12:22 ` [PATCH 8/9] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-13 12:22 ` Peter Zijlstra
  2021-10-13 21:06   ` Josh Poimboeuf
  8 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 12:22 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call; this is sub-optimal and doesn't
match the regular, compiler-generated code.
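
With this change the emitted fallback jump becomes, roughly (byte sketches
matching the diff below):

  spectre_v2=off:	ff e2			jmp *%rdx
  retpoline,amd:	0f ae e8 ff e2		lfence; jmp *%rdx
  retpoline (generic):	e9 xx xx xx xx		jmp __x86_indirect_thunk_rdx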

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/net/bpf_jit_comp.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2123,14 +2123,18 @@ static int emit_fallback_jump(u8 **pprog
 	int err = 0;
 
 #ifdef CONFIG_RETPOLINE
-	/* Note that this assumes the the compiler uses external
-	 * thunks for indirect calls. Both clang and GCC use the same
-	 * naming convention for external thunks.
-	 */
-	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
-#else
-	EMIT2(0xFF, 0xE2);	/* jmp rdx */
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+		if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+			/* The AMD retpoline can be easily emitted inline. */
+			EMIT3(0x0F, 0xAE, 0xE8);	/* lfence */
+			EMIT2(0xFF, 0xE2);		/* jmp rdx */
+		} else {
+			/* Call the retpoline thunk */
+			err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
+		}
+	} else
 #endif
+	EMIT2(0xFF, 0xE2);	/* jmp rdx */
 	*pprog = prog;
 	return err;
 }




* Re: [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-13 12:22 ` [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
@ 2021-10-13 13:29   ` Borislav Petkov
  2021-10-13 20:11   ` Josh Poimboeuf
  1 sibling, 0 replies; 52+ messages in thread
From: Borislav Petkov @ 2021-10-13 13:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers

On Wed, Oct 13, 2021 at 02:22:18PM +0200, Peter Zijlstra wrote:
> Instead of writing complete alternatives, simply provide a list of all
> the retpoline thunk calls. Then the kernel is free to do with them as
> it pleases. Simpler code all-round.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/vmlinux.lds.S       |   12 +++
>  arch/x86/lib/retpoline.S            |   42 ------------
>  tools/objtool/arch/x86/decode.c     |  126 +++++++-----------------------------
>  tools/objtool/elf.c                 |   84 ------------------------
>  tools/objtool/include/objtool/elf.h |    1 
>  5 files changed, 38 insertions(+), 227 deletions(-)
> 
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -273,6 +273,18 @@ SECTIONS
>  	}

#ifdef CONFIG_RETPOLINE around it I guess...

>  	/*
> +	 * List of instructions that call/jmp/jcc to retpoline thunks
> +	 * __x86_indirect_thunk_*(). These instructions can be patched along
> +	 * with alternatives, after which the section can be freed.
> +	 */
> +	. = ALIGN(8);
> +	.retpoline_sites : AT(ADDR(.retpoline_sites) - LOAD_OFFSET) {
> +		__retpoline_sites = .;
> +		*(.retpoline_sites)
> +		__retpoline_sites_end = .;
> +	}

#endif

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
@ 2021-10-13 14:38   ` Andrew Cooper
  2021-10-13 15:12     ` Peter Zijlstra
  2021-10-13 20:39   ` Josh Poimboeuf
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 52+ messages in thread
From: Andrew Cooper @ 2021-10-13 14:38 UTC (permalink / raw)
  To: Peter Zijlstra, x86, jpoimboe
  Cc: linux-kernel, alexei.starovoitov, ndesaulniers

On 13/10/2021 13:22, Peter Zijlstra wrote:
> +/*
> + * Rewrite the compiler generated retpoline thunk calls.
> + *
> + * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
> + * indirect instructions, avoiding the extra indirection.
> + *
> + * For example, convert:
> + *
> + *   CALL __x86_indirect_thunk_\reg
> + *
> + * into:
> + *
> + *   CALL *%\reg
> + *
> + */
> +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> +{
> +	void (*target)(void);
> +	int reg, i = 0;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> +		return -1;
> +
> +	target = addr + insn->length + insn->immediate.value;
> +	reg = (target - &__x86_indirect_thunk_rax) /
> +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);

This is equal measures beautiful and terrifying.

Something around here really wants to BUG_ON(reg == 4), because
literally nothing good can come from selecting %rsp.

Also, it might be a good idea (previous patch perhaps) to have some
linker assertions to confirm that the symbols are laid out safely to do
this calculation.

~Andrew



* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 14:38   ` Andrew Cooper
@ 2021-10-13 15:12     ` Peter Zijlstra
  2021-10-13 17:11       ` Andrew Cooper
  2021-10-14 10:05       ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 15:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, jpoimboe, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 03:38:27PM +0100, Andrew Cooper wrote:
> On 13/10/2021 13:22, Peter Zijlstra wrote:
> > +/*
> > + * Rewrite the compiler generated retpoline thunk calls.
> > + *
> > + * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
> > + * indirect instructions, avoiding the extra indirection.
> > + *
> > + * For example, convert:
> > + *
> > + *   CALL __x86_indirect_thunk_\reg
> > + *
> > + * into:
> > + *
> > + *   CALL *%\reg
> > + *
> > + */
> > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > +{
> > +	void (*target)(void);
> > +	int reg, i = 0;
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > +		return -1;
> > +
> > +	target = addr + insn->length + insn->immediate.value;
> > +	reg = (target - &__x86_indirect_thunk_rax) /
> > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> 
> This is equal measures beautiful and terrifying.

Thanks! :-)

> Something around here really wants to BUG_ON(reg == 4), because
> literally nothing good can come from selecting %rsp.

Ack, I had to add rsp to get the offsets right, but indeed, if anything
ever selects that we're in trouble.

> Also, it might be a good idea (previous patch perhaps) to have some
> linker assertions to confirm that the symbols are laid out safely to do
> this calculation.

I was hoping that since all this is in .S it would be immune from crazy
things like a compiler and do as told. But I suppose crazy stuff like
LTO (or worse BOLT) can totally wreck this still (then BOLT won't care
about linker script assertions either).

I'll see if I can come up with something.
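
Something like the below, perhaps (untested sketch, assuming equidistant
thunks):

	ASSERT(__x86_indirect_thunk_rcx - __x86_indirect_thunk_rax ==
	       __x86_indirect_thunk_rdx - __x86_indirect_thunk_rcx,
	       "retpoline thunks not equidistant")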


* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 15:12     ` Peter Zijlstra
@ 2021-10-13 17:11       ` Andrew Cooper
  2021-10-14 10:05       ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Andrew Cooper @ 2021-10-13 17:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, linux-kernel, alexei.starovoitov, ndesaulniers,
	Andrew Cooper

On 13/10/2021 16:12, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 03:38:27PM +0100, Andrew Cooper wrote:
>> On 13/10/2021 13:22, Peter Zijlstra wrote:
>>> +/*
>>> + * Rewrite the compiler generated retpoline thunk calls.
>>> + *
>>> + * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
>>> + * indirect instructions, avoiding the extra indirection.
>>> + *
>>> + * For example, convert:
>>> + *
>>> + *   CALL __x86_indirect_thunk_\reg
>>> + *
>>> + * into:
>>> + *
>>> + *   CALL *%\reg
>>> + *
>>> + */
>>> +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
>>> +{
>>> +	void (*target)(void);
>>> +	int reg, i = 0;
>>> +
>>> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
>>> +		return -1;
>>> +
>>> +	target = addr + insn->length + insn->immediate.value;
>>> +	reg = (target - &__x86_indirect_thunk_rax) /
>>> +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
>> This is equal measures beautiful and terrifying.
> Thanks! :-)
>
>> Something around here really wants to BUG_ON(reg == 4), because
>> literally nothing good can come from selecting %rsp.
> Ack, I had to add rsp to get the offsets right, but indeed, if anything
> ever selects that we're in trouble.

Actually, all you need is space for the RSP thunk, not an actual RSP
thunk, and it's probably a wise move not to write one out.

You can fill it with 0xcc's, and make sure not to make it an exported
symbol.

>
>> Also, it might be a good idea (previous patch perhaps) to have some
>> linker assertions to confirm that the symbols are laid out safely to do
>> this calculation.
> I was hoping that since all this is in .S it would be immune from crazy
> things like a compiler and do as told. But I suppose carzy stuff like
> LTO (or worse BOLT) can totaly wreck this still (then BOLT won't care
> about linker script assertions either).
>
> I'll see if I can come up with something.

Another cross check could be something like:

unsigned long reg_to_thunk[] = {
    &__x86_indirect_thunk_rax,
    ...
};

because then BUG_ON(target != reg_to_thunk[reg]) will catch any errors
from layout issues.

Using 0 for rsp could then subsume the individual check.

~Andrew



* Re: [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-13 12:22 ` [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
  2021-10-13 13:29   ` Borislav Petkov
@ 2021-10-13 20:11   ` Josh Poimboeuf
  2021-10-14 15:43     ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 20:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:22:18PM +0200, Peter Zijlstra wrote:
> Instead of writing complete alternatives, simply provide a list of all
> the retpoline thunk calls. Then the kernel is free to do with them as
> it pleases. Simpler code all-round.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

+1, let the kernel patch its own damn self ;-)

> +int arch_rewrite_retpolines(struct objtool_file *file)

Guess it shouldn't be called arch_rewrite_retpolines() anymore.  And it
can be moved to check.c next to create_static_call_sections().

Also is it possible to remove the arch_is_retpoline() check in
get_alt_entry()?  I'm having trouble remembering why that was needed in
the first place.

-- 
Josh



* Re: [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-13 12:22 ` [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
@ 2021-10-13 20:11   ` Nick Desaulniers
  2021-10-13 21:08     ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Nick Desaulniers @ 2021-10-13 20:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov, llvm

On Wed, Oct 13, 2021 at 5:41 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Handle the rare cases where the compiler (clang) does an indirect
> conditional tail-call using:
>
>   Jcc __x86_indirect_thunk_\reg

`Jcc.d32 __x86_indirect_thunk_\reg` might be clearer; otherwise,
putting that in an assembler and assembling/disassembling produces the
2-byte instructions, which makes the below patch confusing. Ah, it is
stated in the comment added below.

>
> For the !RETPOLINE case this can be rewritten to fit the original (6
> byte) instruction like:
>
>   Jncc.d8       1f
>   JMP           *%\reg
>   NOP
> 1:
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/alternative.c |   38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
>
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -378,7 +378,8 @@ static int emit_indirect(int op, int reg
>  static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
>  {
>         void (*target)(void);
> -       int reg, i = 0;
> +       int reg, ret, i = 0;
> +       u8 op, cc;
>
>         if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
>                 return -1;
> @@ -390,9 +391,34 @@ static int patch_retpoline(void *addr, s
>         if (WARN_ON_ONCE(reg & ~0xf))
>                 return -1;
>
> -       i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
> -       if (i < 0)
> -               return i;
> +       op = insn->opcode.bytes[0];
> +
> +       /*
> +        * Convert:
> +        *
> +        *   Jcc.d32 __x86_indirect_thunk_\reg
> +        *
> +        * into:
> +        *
> +        *   Jncc.d8 1f
> +        *   jmp *%\reg
> +        *   nop
> +        * 1:
> +        */
> +       if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
> +               cc = insn->opcode.bytes[1] & 0xf;
> +               cc ^= 1; /* invert condition */
> +
> +               bytes[i++] = 0x70 + cc; /* Jcc.d8 */
> +               bytes[i++] = insn->length - 2;

Isn't `insn->length - 2` always 4 (in this case)? I suspect we could
avoid computing that at runtime if we just hardcoded it.

Either way, I've looked at the disassembly enough that this LGTM.
Thanks for the patch.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> +
> +               op = JMP32_INSN_OPCODE;
> +       }
> +
> +       ret = emit_indirect(op, reg, bytes + i);
> +       if (ret < 0)
> +               return ret;
> +       i += ret;
>
>         for (; i < insn->length;)
>                 bytes[i++] = BYTES_NOP1;
> @@ -423,6 +449,10 @@ void __init_or_module noinline apply_ret
>                 case JMP32_INSN_OPCODE:
>                         break;
>
> +               case 0x0f: /* escape */
> +                       if (op2 >= 0x80 && op2 <= 0x8f)
> +                               break;
> +                       fallthrough;
>                 default:
>                         WARN_ON_ONCE(1);
>                         continue;
>
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH 3/9] x86/asm: Fix register order
  2021-10-13 12:22 ` [PATCH 3/9] x86/asm: Fix register order Peter Zijlstra
@ 2021-10-13 20:15   ` Josh Poimboeuf
  0 siblings, 0 replies; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 20:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:22:20PM +0200, Peter Zijlstra wrote:
> Ensure the register order is correct; this allows for easy translation
> between register number and trampoline and vice-versa.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

It would be wise to add a comment saying the register order is critical
to the functioning of the system and shouldn't be changed.

> ---
>  arch/x86/include/asm/GEN-for-each-reg.h |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> --- a/arch/x86/include/asm/GEN-for-each-reg.h
> +++ b/arch/x86/include/asm/GEN-for-each-reg.h
> @@ -1,11 +1,12 @@
>  #ifdef CONFIG_64BIT
>  GEN(rax)
> -GEN(rbx)
>  GEN(rcx)
>  GEN(rdx)
> +GEN(rbx)
> +GEN(rsp)
> +GEN(rbp)
>  GEN(rsi)
>  GEN(rdi)
> -GEN(rbp)
>  GEN(r8)
>  GEN(r9)
>  GEN(r10)
> @@ -16,10 +17,11 @@ GEN(r14)
>  GEN(r15)
>  #else
>  GEN(eax)
> -GEN(ebx)
>  GEN(ecx)
>  GEN(edx)
> +GEN(ebx)
> +GEN(esp)
> +GEN(ebp)
>  GEN(esi)
>  GEN(edi)
> -GEN(ebp)
>  #endif

-- 
Josh



* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
  2021-10-13 14:38   ` Andrew Cooper
@ 2021-10-13 20:39   ` Josh Poimboeuf
  2021-10-13 21:20     ` Peter Zijlstra
  2021-10-13 20:52   ` Josh Poimboeuf
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 20:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> +{
> +	void (*target)(void);
> +	int reg, i = 0;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> +		return -1;

Better to do this check further up the call stack in apply_retpolines()
before looping through all the call sites?

> +
> +	target = addr + insn->length + insn->immediate.value;
> +	reg = (target - &__x86_indirect_thunk_rax) /
> +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> +
> +	if (WARN_ON_ONCE(reg & ~0xf))
> +		return -1;

It would be more robust and less magical to just have a basic lookup
table array which converts a thunk address to a reg.  Then you can just
avoid all the safety checks because it's no longer insane ;-)

-- 
Josh



* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
  2021-10-13 14:38   ` Andrew Cooper
  2021-10-13 20:39   ` Josh Poimboeuf
@ 2021-10-13 20:52   ` Josh Poimboeuf
  2021-10-13 21:00     ` Peter Zijlstra
  2021-10-19 11:37     ` Peter Zijlstra
  2021-10-13 21:11   ` Josh Poimboeuf
  2021-10-15 14:24   ` Borislav Petkov
  4 siblings, 2 replies; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
>  	/*
> +	 * Rewrite the retpolines, must be done before alternatives since
> +	 * those can rewrite the retpoline thunks.
> +	 */

Why exactly is that a problem?  This code doesn't read the thunks.

BTW, CALL_NOSPEC results in a retpoline site in .altinstr_replacement:

Relocation section [40] '.rela.retpoline_sites' for section [39] '.retpoline_sites' at offset 0x8d28 contains 1 entry:
  Offset              Type            Value               Addend Name
  000000000000000000  X86_64_PC32     000000000000000000     +10 .altinstr_replacement

Which I assume we don't want.

-- 
Josh



* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 20:52   ` Josh Poimboeuf
@ 2021-10-13 21:00     ` Peter Zijlstra
  2021-10-19 11:37     ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 21:00 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 01:52:59PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> >  	/*
> > +	 * Rewrite the retpolines, must be done before alternatives since
> > +	 * those can rewrite the retpoline thunks.
> > +	 */
> 
> Why exactly is that a problem?  This code doesn't read the thunks.

The below problem :-) I didn't include it in the series, but I'm
thinking that's where I want to go eventually.

---
Subject: x86,retpoline: Poison retpoline thunks for !X86_FEATURE_RETPOLINE
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Oct 12 10:30:56 CEST 2021

Now that objtool will out-of-line all retpoline thunk calls for
!RETPOLINE, poison them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/lib/retpoline.S |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -32,9 +32,19 @@
 
 SYM_FUNC_START(__x86_indirect_thunk_\reg)
 
+#ifdef CONFIG_STACK_VALIDATION
+/*
+ * When objtool runs, there should not be any __x86_indirect_thunk_* calls
+ * left after alternatives; ensure this by patching the thunk to UD2.
+ */
+	ALTERNATIVE_2 __stringify(RETPOLINE \reg), \
+		      __stringify(ud2), ALT_NOT(X86_FEATURE_RETPOLINE), \
+		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
+#else
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
 		      __stringify(RETPOLINE \reg), X86_FEATURE_RETPOLINE, \
 		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
+#endif
 
 SYM_FUNC_END(__x86_indirect_thunk_\reg)
 


* Re: [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-13 12:22 ` [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
@ 2021-10-13 21:06   ` Josh Poimboeuf
  2021-10-13 21:54     ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:22:26PM +0200, Peter Zijlstra wrote:
> Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
> unconditionally emits a thunk call; this is sub-optimal and doesn't
> match the regular, compiler-generated code.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/net/bpf_jit_comp.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2123,14 +2123,18 @@ static int emit_fallback_jump(u8 **pprog
>  	int err = 0;
>  
>  #ifdef CONFIG_RETPOLINE
> -	/* Note that this assumes the the compiler uses external
> -	 * thunks for indirect calls. Both clang and GCC use the same
> -	 * naming convention for external thunks.
> -	 */
> -	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
> -#else
> -	EMIT2(0xFF, 0xE2);	/* jmp rdx */
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> +		if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> +			/* The AMD retpoline can be easily emitted inline. */
> +			EMIT3(0x0F, 0xAE, 0xE8);	/* lfence */
> +			EMIT2(0xFF, 0xE2);		/* jmp rdx */
> +		} else {
> +			/* Call the retpoline thunk */
> +			err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
> +		}
> +	} else
>  #endif
> +	EMIT2(0xFF, 0xE2);	/* jmp rdx */

But the rest of the eBPF JIT just emits retpolines unconditionally,
regardless of feature; for example see RETPOLINE_RCX_BPF_JIT().  So I'm
thinking this should probably be consistent with that (or that with
this).

-- 
Josh



* Re: [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-13 20:11   ` Nick Desaulniers
@ 2021-10-13 21:08     ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 21:08 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov, llvm

On Wed, Oct 13, 2021 at 01:11:45PM -0700, Nick Desaulniers wrote:

> > +       /*
> > +        * Convert:
> > +        *
> > +        *   Jcc.d32 __x86_indirect_thunk_\reg
> > +        *
> > +        * into:
> > +        *
> > +        *   Jncc.d8 1f
> > +        *   jmp *%\reg
> > +        *   nop
> > +        * 1:
> > +        */
> > +       if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
> > +               cc = insn->opcode.bytes[1] & 0xf;
> > +               cc ^= 1; /* invert condition */
> > +
> > +               bytes[i++] = 0x70 + cc; /* Jcc.d8 */
> > +               bytes[i++] = insn->length - 2;
> 
> Isn't `insn->length - 2` always 4 (in this case)? We could avoid
> computing that at runtime I suspect if we just hardcoded it.

Yeah, but I found this to be more expressive. The purpose is getting to
the next instruction.

Also, if clang ever does instruction stuffing to hit alignment targets
without extra nops (ISTR reading about such passes) this logic still
works.

That is, I think you can prefix this with REX.W just to make a longer
instruction.

> Either way, I've looked at the disassembly enough that this LGTM.
> Thanks for the patch.
> 
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

Thanks!


* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
                     ` (2 preceding siblings ...)
  2021-10-13 20:52   ` Josh Poimboeuf
@ 2021-10-13 21:11   ` Josh Poimboeuf
  2021-10-13 21:43     ` Peter Zijlstra
  2021-10-15 14:24   ` Borislav Petkov
  4 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 21:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> +#ifdef CONFIG_X86_64
> +
> +/*
> + * CALL/JMP *%\reg
> + */
> +static int emit_indirect(int op, int reg, u8 *bytes)

X86_64 is already equivalent to STACK_VALIDATION these days, but might
as well clarify here where the retpoline_sites dependency comes from by
changing this to '#ifdef CONFIG_STACK_VALIDATION'.

-- 
Josh



* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 20:39   ` Josh Poimboeuf
@ 2021-10-13 21:20     ` Peter Zijlstra
  2021-10-13 21:49       ` Josh Poimboeuf
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 21:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 01:39:27PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > +{
> > +	void (*target)(void);
> > +	int reg, i = 0;
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > +		return -1;
> 
> Better to do this check further up the call stack in apply_retpolines()
> before looping through all the call sites?

In fact, I've pushed it further down, in order to always validate the
absence of rsp.

> > +
> > +	target = addr + insn->length + insn->immediate.value;
> > +	reg = (target - &__x86_indirect_thunk_rax) /
> > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > +
> > +	if (WARN_ON_ONCE(reg & ~0xf))
> > +		return -1;
> 
> It would be more robust and less magical to just have a basic lookup
> table array which converts a thunk address to a reg.  Then you can just
> avoid all the safety checks because it's no longer insane ;-)

Andrew suggested the reverse lookup to validate. That should give the
same robustness but lacks the linear lookup.

---
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -392,6 +392,12 @@ static int emit_indirect(int op, int reg
  */
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
+	static const void *reg_to_thunk[] = {
+#undef GEN
+#define GEN(reg) &__x86_indirect_thunk_ ## reg,
+#include <asm/GEN-for-each-reg.h>
+	};
+
 	void (*target)(void);
 	int reg, i = 0;
 
@@ -402,6 +408,8 @@ static int patch_retpoline(void *addr, s
 	if (WARN_ON_ONCE(reg & ~0xf))
 		return -1;
 
+	BUG_ON(target != reg_to_thunk[reg]);
+
 	/*
 	 * If anyone ever does: CALL/JMP *%rsp, we're in deep trouble.
 	 */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 21:11   ` Josh Poimboeuf
@ 2021-10-13 21:43     ` Peter Zijlstra
  2021-10-13 22:05       ` Josh Poimboeuf
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 21:43 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:11:18PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > +#ifdef CONFIG_X86_64
> > +
> > +/*
> > + * CALL/JMP *%\reg
> > + */
> > +static int emit_indirect(int op, int reg, u8 *bytes)
> 
> X86_64 is already equivalent to STACK_VALIDATION these days, but might
> as well clarify here where the retpoline_sites dependency comes from by
> changing this to '#ifdef CONFIG_STACK_VALIDATION'.

Yeah, I was contemplating having x86_64 unconditionally select that.
Maybe we should.

Also, I think I've proposed it before, but what about:

 s/STACK_VALIDATION/OBJTOOL/

on all that?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 21:20     ` Peter Zijlstra
@ 2021-10-13 21:49       ` Josh Poimboeuf
  2021-10-13 21:52         ` Josh Poimboeuf
  2021-10-13 22:10         ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 21:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 11:20:02PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 01:39:27PM -0700, Josh Poimboeuf wrote:
> > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > > +{
> > > +	void (*target)(void);
> > > +	int reg, i = 0;
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > > +		return -1;
> > 
> > Better to do this check further up the call stack in apply_retpolines()
> > before looping through all the call sites?
> 
> In fact, I've pushed it further down, in order to always validate the
> absence of rsp.
> 
> > > +
> > > +	target = addr + insn->length + insn->immediate.value;
> > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > > +
> > > +	if (WARN_ON_ONCE(reg & ~0xf))
> > > +		return -1;
> > 
> > It would be more robust and less magical to just have a basic lookup
> > table array which converts a thunk address to a reg.  Then you can just
> > avoid all the safety checks because it's no longer insane ;-)
> 
> Andrew suggested the reverse lookup to validate. That should give the
> same robustness but lacks the linear lookup.

So you've got a WARN_ON_ONCE, a BUG_ON, and a too-deep feature check,
all in the name of supporting this scheme.  ok...

If performance of the linear lookup were a real concern then you could
just put rax and r11 at the beginning of the array.

-- 
Josh


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 21:49       ` Josh Poimboeuf
@ 2021-10-13 21:52         ` Josh Poimboeuf
  2021-10-13 22:10         ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 21:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:49:10PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 11:20:02PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 13, 2021 at 01:39:27PM -0700, Josh Poimboeuf wrote:
> > > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > > > +{
> > > > +	void (*target)(void);
> > > > +	int reg, i = 0;
> > > > +
> > > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > > > +		return -1;
> > > 
> > > Better to do this check further up the call stack in apply_retpolines()
> > > before looping through all the call sites?
> > 
> > In fact, I've pushed it further down, in order to always validate the
> > absence of rsp.
> > 
> > > > +
> > > > +	target = addr + insn->length + insn->immediate.value;
> > > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > > > +
> > > > +	if (WARN_ON_ONCE(reg & ~0xf))
> > > > +		return -1;
> > > 
> > > It would be more robust and less magical to just have a basic lookup
> > > table array which converts a thunk address to a reg.  Then you can just
> > > avoid all the safety checks because it's no longer insane ;-)
> > 
> > Andrew suggested the reverse lookup to validate. That should give the
> > same robustness but lacks the linear lookup.
> 
> So you've got a WARN_ON_ONCE, a BUG_ON, and a too-deep feature check,
> all in the name of supporting this scheme.  ok...

Not to mention the extra silly rsp thunks...

> If performance of the linear lookup were a real concern then you could
> just put rax and r11 at the beginning of the array.

-- 
Josh


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-13 21:06   ` Josh Poimboeuf
@ 2021-10-13 21:54     ` Peter Zijlstra
  2021-10-14  9:46       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 21:54 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:06:05PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 02:22:26PM +0200, Peter Zijlstra wrote:
> > Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
> > unconditionally emits a thunk call, this is sub-optimal and doesn't
> > match the regular, compiler generated, code.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/net/bpf_jit_comp.c |   18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2123,14 +2123,18 @@ static int emit_fallback_jump(u8 **pprog
> >  	int err = 0;
> >  
> >  #ifdef CONFIG_RETPOLINE
> > -	/* Note that this assumes the the compiler uses external
> > -	 * thunks for indirect calls. Both clang and GCC use the same
> > -	 * naming convention for external thunks.
> > -	 */
> > -	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
> > -#else
> > -	EMIT2(0xFF, 0xE2);	/* jmp rdx */
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > +		if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > +			/* The AMD retpoline can be easily emitted inline. */
> > +			EMIT3(0x0F, 0xAE, 0xE8);	/* lfence */
> > +			EMIT2(0xFF, 0xE2);		/* jmp rdx */
> > +		} else {
> > +			/* Call the retpoline thunk */
> > +			err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
> > +		}
> > +	} else
> >  #endif
> > +	EMIT2(0xFF, 0xE2);	/* jmp rdx */
> 
> But the rest of eBPF JIT just emits retpolines unconditionally
> regardless of feature, for example see RETPOLINE_RCX_BPF_JIT().  So I'm
> thinking this should probably be consistent with that (or that with
> this).

Argh, I grepped for __x86_indirect_thunk, and missed they're writing
retpolines themselves. Bah.

Yes, that needs cleaning up. I'll go prod at that tomorrow.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 21:43     ` Peter Zijlstra
@ 2021-10-13 22:05       ` Josh Poimboeuf
  2021-10-13 22:14         ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 22:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 11:43:43PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 02:11:18PM -0700, Josh Poimboeuf wrote:
> > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > +#ifdef CONFIG_X86_64
> > > +
> > > +/*
> > > + * CALL/JMP *%\reg
> > > + */
> > > +static int emit_indirect(int op, int reg, u8 *bytes)
> > 
> > X86_64 is already equivalent to STACK_VALIDATION these days, but might
> > as well clarify here where the retpoline_sites dependency comes from by
> > changing this to '#ifdef CONFIG_STACK_VALIDATION'.
> 
> Yeah, I was contemplating having x86_64 unconditionally select that.
> Maybe we should.

As far as I can tell, it already does that:

        select HAVE_STACK_VALIDATION            if X86_64
        select HAVE_STATIC_CALL_INLINE          if HAVE_STACK_VALIDATION
        select STACK_VALIDATION                 if HAVE_STACK_VALIDATION && (HAVE_STATIC_CALL_INLINE || RETPOLINE)

> Also, I think I've proposed it before, but what about:
> 
>  s/STACK_VALIDATION/OBJTOOL/
> 
> on all that?

How about keeping STACK_VALIDATION, but then having it depend on
OBJTOOL, in case anybody cares to extricate the two at some point.  Some
objtool features (like this one) don't rely on the full validation.
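
Roughly like this, perhaps (a sketch of the shape only, not a tested
Kconfig change):

config OBJTOOL
	bool

config STACK_VALIDATION
	bool
	depends on OBJTOOL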

-- 
Josh


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 21:49       ` Josh Poimboeuf
  2021-10-13 21:52         ` Josh Poimboeuf
@ 2021-10-13 22:10         ` Peter Zijlstra
  2021-10-13 22:47           ` Andrew Cooper
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 22:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 02:49:07PM -0700, Josh Poimboeuf wrote:
> So you've got a WARN_ON_ONCE, a BUG_ON, and a too-deep feature check,
> all in the name of supporting this scheme.  ok...

Mostly because I'm way too lazy to type out that table. With the thunks
being in .S, I really don't see that getting messed up; they're even in
their own special section too.

> If performance of the linear lookup were a real concern then you could
> just put rax and r11 at the beginning of the array.

That would mean the table would have to be { __thunk, reg_idx }, which
is even more yuck.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 22:05       ` Josh Poimboeuf
@ 2021-10-13 22:14         ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-13 22:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 03:05:20PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 11:43:43PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 13, 2021 at 02:11:18PM -0700, Josh Poimboeuf wrote:
> > > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > > +#ifdef CONFIG_X86_64
> > > > +
> > > > +/*
> > > > + * CALL/JMP *%\reg
> > > > + */
> > > > +static int emit_indirect(int op, int reg, u8 *bytes)
> > > 
> > > X86_64 is already equivalent to STACK_VALIDATION these days, but might
> > > as well clarify here where the retpoline_sites dependency comes from by
> > > changing this to '#ifdef CONFIG_STACK_VALIDATION'.
> > 
> > Yeah, I was contemplating having x86_64 unconditionally select that.
> > Maybe we should.
> 
> As far as I can tell, it already does that:
> 
>         select HAVE_STACK_VALIDATION            if X86_64
>         select HAVE_STATIC_CALL_INLINE          if HAVE_STACK_VALIDATION
>         select STACK_VALIDATION                 if HAVE_STACK_VALIDATION && (HAVE_STATIC_CALL_INLINE || RETPOLINE)

Oh right, I thought there was still a possible hole in there, but I
guess that's pretty solid. I suppose we should just remove the && ...
from the last line.
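
For illustration, that last select would then simply read:

        select STACK_VALIDATION                 if HAVE_STACK_VALIDATION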

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 22:10         ` Peter Zijlstra
@ 2021-10-13 22:47           ` Andrew Cooper
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Cooper @ 2021-10-13 22:47 UTC (permalink / raw)
  To: Peter Zijlstra, Josh Poimboeuf
  Cc: x86, linux-kernel, alexei.starovoitov, ndesaulniers

On 13/10/2021 23:10, Peter Zijlstra wrote:
>> If performance of the linear lookup were a real concern then you could
>> just put rax and r11 at the beginning of the array.
> That would mean the table would have to be { __thunk, reg_idx }, which
> is even more yuck.

Yeah - it's nasty because it is a reverse lookup you need.

In both cases, it is %rax (GCC) or %r11 (Clang) and change for the other
regs, so you can construct a search which will hit on the first lookup
most of the time.

Either:

1) an array of { __thunk } with a hole for rsp.  Bias searching entry 0
or 11 first based on compiler, then a 16 step linear search.

or

2) an array of { __thunk, reg }, sorted by thunk address.  This has an
odd number of entries, so arrange the thunk generation to emit rax or
r11 as the 7th thunk, so it ends up in the middle when sorted.
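
As a rough sketch of option 1 (all names below are invented for
illustration, nothing here comes from the patches):

	static const void * const thunk_by_reg[16] = {
		/* &__x86_indirect_thunk_<reg>, with a NULL hole at rsp */
	};

	static int thunk_to_reg(const void *target)
	{
		/* try the common compiler choices before the full scan */
		static const int first[] = { 0, 11 };	/* %rax (GCC), %r11 (Clang) */
		int i, reg;

		for (i = 0; i < ARRAY_SIZE(first); i++)
			if (thunk_by_reg[first[i]] == target)
				return first[i];

		for (reg = 0; reg < 16; reg++)
			if (thunk_by_reg[reg] == target)
				return reg;

		return -1;	/* not a retpoline thunk */
	}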

~Andrew


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-13 21:54     ` Peter Zijlstra
@ 2021-10-14  9:46       ` Peter Zijlstra
  2021-10-14  9:48         ` Peter Zijlstra
  2021-10-20  7:34         ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-14  9:46 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 11:54:51PM +0200, Peter Zijlstra wrote:

> > But the rest of eBPF JIT just emits retpolines unconditionally
> > regardless of feature, for example see RETPOLINE_RCX_BPF_JIT().  So I'm
> > thinking this should probably be consistent with that (or that with
> > this).
> 
> Argh, I grepped for __x86_indirect_thunk, and missed they're writing
> retpolines themselves. Bah.
> 
> Yes, that needs cleaning up. I'll go prod at that tomorrow.

Bah, I've too much of a headache to make sense of that bpf jit code :-(

Alexei, emit_fallback_jump() uses emit_jump() and provides @prog as .ip
argument, but other sites do weird things like 'image + addrs[i]',
presumably because @prog points to an on-stack array instead of to the
actual text location.

Also @prog is confusingly sometimes a struct bpf_prog* and sometimes a
u8*, which makes grepping really confusing.

I've ended up with the below... which is known broken-crap for 32, but is
likely similar for 64bit as well :-( Can you please have a look?

(random remarks:
  I *really* hate Intel syntax,
  64bit seems to lack IA32_reg equivalents,
  add_2reg(), add_1reg() are really daft names for something that
  generates a modrm byte.
)
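
(For reference, the ModRM byte those helpers actually build, in sketch
form:

	/* FF /2 = call, FF /4 = jmp; mod=11 selects register-direct */
	u8 modrm = 0xC0 | (ext << 3) | (reg & 7);	/* ext: 2=call, 4=jmp */

so "jmp *%rdx" is FF E2, i.e. 0xC0 | 4 << 3 | 2.)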

---

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -303,63 +303,4 @@ static inline void mds_idle_clear_cpu_bu
 
 #endif /* __ASSEMBLY__ */
 
-/*
- * Below is used in the eBPF JIT compiler and emits the byte sequence
- * for the following assembly:
- *
- * With retpolines configured:
- *
- *    callq do_rop
- *  spec_trap:
- *    pause
- *    lfence
- *    jmp spec_trap
- *  do_rop:
- *    mov %rcx,(%rsp) for x86_64
- *    mov %edx,(%esp) for x86_32
- *    retq
- *
- * Without retpolines configured:
- *
- *    jmp *%rcx for x86_64
- *    jmp *%edx for x86_32
- */
-#ifdef CONFIG_RETPOLINE
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	17
-#  define RETPOLINE_RCX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT4(0x48, 0x89, 0x0C, 0x24); /* mov %rcx,(%rsp) */	\
-	EMIT1(0xC3);             /* retq */			\
-} while (0)
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* call do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */		\
-	EMIT1(0xC3);             /* ret */			\
-} while (0)
-# endif
-#else /* !CONFIG_RETPOLINE */
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	2
-#  define RETPOLINE_RCX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE1);       /* jmp *%rcx */
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE2)        /* jmp *%edx */
-# endif
-#endif
-
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -396,6 +396,37 @@ static int get_pop_bytes(bool *callee_re
 	return bytes;
 }
 
+#define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
+
+#ifdef CONFIG_RETPOLINE
+#define INDIRECT_SIZE (5)
+#else
+#define INDIRECT_SIZE (2)
+#endif
+
+static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+{
+	static const void *reg_thunk[] = {
+#undef GEN
+#define GEN(reg) __x86_indirect_thunk_ ## reg,
+#include <asm/GEN-for-each-reg.h>
+	};
+
+	u8 *prog = *pprog;
+
+#ifdef CONFIG_RETPOLINE
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		EMIT_LFENCE();
+		EMIT2(0xFF, 0xE0 + reg);
+	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+		emit_jump(&prog, reg_thunk[reg], ip);
+	} else
+#endif
+	EMIT2(0xFF, 0xE0 + reg);
+
+	*pprog = prog;
+}
+
 /*
  * Generate the following code:
  *
@@ -448,7 +479,7 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
+#define OFFSET1 (off1 + INDIRECT_SIZE) /* Number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 
 	/*
@@ -457,7 +488,7 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
+#define OFFSET2 (off2 + INDIRECT_SIZE)
 	EMIT2(X86_JA, OFFSET2);                   /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
@@ -471,7 +502,7 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
+#define OFFSET3 (off3 + INDIRECT_SIZE)
 	EMIT2(X86_JE, OFFSET3);                   /* je out */
 
 	*pprog = prog;
@@ -493,7 +524,7 @@ static void emit_bpf_tail_call_indirect(
 	 * rdi == ctx (1st arg)
 	 * rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
 	 */
-	RETPOLINE_RCX_BPF_JIT();
+	emit_indirect_jump(&prog, 1 /* rcx */, prog);
 
 	/* out: */
 	*pprog = prog;
@@ -1220,8 +1251,7 @@ static int do_jit(struct bpf_prog *bpf_p
 			/* speculation barrier */
 		case BPF_ST | BPF_NOSPEC:
 			if (boot_cpu_has(X86_FEATURE_XMM2))
-				/* Emit 'lfence' */
-				EMIT3(0x0F, 0xAE, 0xE8);
+				EMIT_LFENCE();
 			break;
 
 			/* ST: *(u8*)(dst_reg + off) = imm */
@@ -2117,24 +2147,6 @@ int arch_prepare_bpf_trampoline(struct b
 	return ret;
 }
 
-static int emit_fallback_jump(u8 **pprog)
-{
-	u8 *prog = *pprog;
-	int err = 0;
-
-#ifdef CONFIG_RETPOLINE
-	/* Note that this assumes the the compiler uses external
-	 * thunks for indirect calls. Both clang and GCC use the same
-	 * naming convention for external thunks.
-	 */
-	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
-#else
-	EMIT2(0xFF, 0xE2);	/* jmp rdx */
-#endif
-	*pprog = prog;
-	return err;
-}
-
 static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
 {
 	u8 *jg_reloc, *prog = *pprog;
@@ -2156,9 +2168,7 @@ static int emit_bpf_dispatcher(u8 **ppro
 		if (err)
 			return err;
 
-		err = emit_fallback_jump(&prog);	/* jmp thunk/indirect */
-		if (err)
-			return err;
+		emit_indirect_jump(&prog, 2 /* rdx */, prog);
 
 		*pprog = prog;
 		return 0;
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -1362,6 +1362,11 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * eax == ctx (1st arg)
 	 * edx == prog->bpf_func + prologue_size
 	 */
+#ifdef CONFIG_RETPOLINE
+	EMIT1_off32(0xE9, __x86_indirect_thunk_edx - (image + addrs[i] + 5));
+#else
+	EMIT2(0xFF, 0xE2);
+#endif
 	RETPOLINE_EDX_BPF_JIT();
 
 	if (jmp_label1 == -1)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-14  9:46       ` Peter Zijlstra
@ 2021-10-14  9:48         ` Peter Zijlstra
  2021-10-20  7:34         ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-14  9:48 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Thu, Oct 14, 2021 at 11:46:11AM +0200, Peter Zijlstra wrote:
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -396,6 +396,37 @@ static int get_pop_bytes(bool *callee_re
>  	return bytes;
>  }
>  
> +#define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
> +
> +#ifdef CONFIG_RETPOLINE
> +#define INDIRECT_SIZE (5)

Bah, that should be:

#define INDIRECT_SIZE	(2 + 3*cpu_feature_enabled(X86_FEATURE_RETPOLINE))
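
(That works out either way: 2 bytes for a bare "jmp *%reg", and 5 for
both the inline lfence;jmp pair (3+2) and the rel32 jump to the thunk
(1+4).)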

> +#else
> +#define INDIRECT_SIZE (2)
> +#endif
> +
> +static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
> +{
> +	static const void *reg_thunk[] = {
> +#undef GEN
> +#define GEN(reg) __x86_indirect_thunk_ ## reg,
> +#include <asm/GEN-for-each-reg.h>
> +	};
> +
> +	u8 *prog = *pprog;
> +
> +#ifdef CONFIG_RETPOLINE
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> +		EMIT_LFENCE();
> +		EMIT2(0xFF, 0xE0 + reg);
> +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> +		emit_jump(&prog, reg_thunk[reg], ip);
> +	} else
> +#endif
> +	EMIT2(0xFF, 0xE0 + reg);
> +
> +	*pprog = prog;
> +}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 15:12     ` Peter Zijlstra
  2021-10-13 17:11       ` Andrew Cooper
@ 2021-10-14 10:05       ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-14 10:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, jpoimboe, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 05:12:13PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 03:38:27PM +0100, Andrew Cooper wrote:
> > On 13/10/2021 13:22, Peter Zijlstra wrote:

> > > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > > +{
> > > +	void (*target)(void);
> > > +	int reg, i = 0;
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > > +		return -1;
> > > +
> > > +	target = addr + insn->length + insn->immediate.value;
> > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > 
> > This is equal measures beautiful and terrifying.
> 
> Thanks! :-)

Would something like this appease people? If the toolchain can mess this
up, everything is broken.

That makes the symtab look like:

(and arguably, that array symbol could be local)

...
35: 0000000000000000   512 NOTYPE  GLOBAL DEFAULT    4 __x86_indirect_thunk_array
36: 0000000000000000    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rax
37: 0000000000000020    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rcx
38: 0000000000000040    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rdx
39: 0000000000000060    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rbx
40: 0000000000000080    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rsp
41: 00000000000000a0    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rbp
42: 00000000000000c0    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rsi
43: 00000000000000e0    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_rdi
44: 0000000000000100    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r8
45: 0000000000000120    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r9
46: 0000000000000140    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r10
47: 0000000000000160    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r11
48: 0000000000000180    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r12
49: 00000000000001a0    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r13
50: 00000000000001c0    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r14
51: 00000000000001e0    17 FUNC    GLOBAL DEFAULT    4 __x86_indirect_thunk_r15


---
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -30,7 +30,7 @@
 
 	.align 32
 
-SYM_FUNC_START(__x86_indirect_thunk_\reg)
+SYM_INNER_LABEL(__x86_indirect_thunk_\reg, SYM_L_GLOBAL)
 
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
 		      __stringify(RETPOLINE \reg), X86_FEATURE_RETPOLINE, \
@@ -55,10 +55,16 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
+	.align 32
+SYM_CODE_START(__x86_indirect_thunk_array)
+
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+	.align 32
+SYM_CODE_END(__x86_indirect_thunk_array)
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
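
With that in place, the reg computation could validate against the
array bounds instead; an editorial sketch, using the 16 thunks at 0x20
stride visible in the symtab above:

	extern u8 __x86_indirect_thunk_array[];
	void *base = __x86_indirect_thunk_array;

	/* target computed from the decoded instruction, as before */
	if (target < base || target >= base + 16 * 0x20 ||
	    (target - base) % 0x20)
		return -1;	/* not a retpoline thunk */

	reg = (target - base) / 0x20;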

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-13 20:11   ` Josh Poimboeuf
@ 2021-10-14 15:43     ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-14 15:43 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 01:11:35PM -0700, Josh Poimboeuf wrote:

> Guess it shouldn't be called arch_rewrite_retpolines() anymore.  And it
> can be moved to check.c next to create_static_call_sections().
> 
> Also is it possible to remove the arch_is_retpoline() check in
> get_alt_entry()?  I'm having trouble remembering why that was needed in
> the first place.

Makes sense...

---
+++ b/tools/objtool/arch/x86/decode.c
@@ -711,52 +711,6 @@ const char *arch_ret_insn(int len)
 	return ret[len-1];
 }
 
-int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct section *sec;
-	int idx;
-
-	sec = find_section_by_name(file->elf, ".retpoline_sites");
-	if (sec) {
-		WARN("file already has .retpoline_sites, skipping");
-		return 0;
-	}
-
-	idx = 0;
-	list_for_each_entry(insn, &file->retpoline_call_list, call_node)
-		idx++;
-
-	if (!idx)
-		return 0;
-
-	sec = elf_create_section(file->elf, ".retpoline_sites", 0,
-				 sizeof(int), idx);
-	if (!sec) {
-		WARN("elf_create_section: .retpoline_sites");
-		return -1;
-	}
-
-	idx = 0;
-	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
-
-		int *site = (int *)sec->data->d_buf + idx;
-		*site = 0;
-
-		if (elf_add_reloc_to_insn(file->elf, sec,
-					  idx * sizeof(int),
-					  R_X86_64_PC32,
-					  insn->sec, insn->offset)) {
-			WARN("elf_add_reloc_to_insn: .retpoline_sites");
-			return -1;
-		}
-
-		idx++;
-	}
-
-	return 0;
-}
-
 int arch_decode_hint_reg(u8 sp_reg, int *base)
 {
 	switch (sp_reg) {
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -683,6 +683,52 @@ static int create_static_call_sections(s
 	return 0;
 }
 
+static int create_retpoline_sites_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".retpoline_sites");
+	if (sec) {
+		WARN("file already has .retpoline_sites, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node)
+		idx++;
+
+	if (!idx)
+		return 0;
+
+	sec = elf_create_section(file->elf, ".retpoline_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .retpoline_sites");
+		return -1;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
+
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .retpoline_sites");
+			return -1;
+		}
+
+		idx++;
+	}
+
+	return 0;
+}
+
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1950,11 +1996,6 @@ static void mark_rodata(struct objtool_f
 	file->rodata = found;
 }
 
-__weak int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	return 0;
-}
-
 static int decode_sections(struct objtool_file *file)
 {
 	int ret;
@@ -2027,15 +2068,6 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
-	/*
-	 * Must be after add_special_section_alts(), since this will emit
-	 * alternatives. Must be after add_{jump,call}_destination(), since
-	 * those create the call insn lists.
-	 */
-	ret = arch_rewrite_retpolines(file);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -3438,6 +3470,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
 
+	if (retpoline) {
+		ret = create_retpoline_sites_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 	if (mcount) {
 		ret = create_mcount_loc_sections(file);
 		if (ret < 0)
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -109,14 +109,6 @@ static int get_alt_entry(struct elf *elf
 			return -1;
 		}
 
-		/*
-		 * Skip retpoline .altinstr_replacement... we already rewrite the
-		 * instructions for retpolines anyway, see arch_is_retpoline()
-		 * usage in add_{call,jump}_destinations().
-		 */
-		if (arch_is_retpoline(new_reloc->sym))
-			return 1;
-
 		reloc_to_sec_off(new_reloc, &alt->new_sec, &alt->new_off);
 
 		/* _ASM_EXTABLE_EX hack */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
                     ` (3 preceding siblings ...)
  2021-10-13 21:11   ` Josh Poimboeuf
@ 2021-10-15 14:24   ` Borislav Petkov
  2021-10-15 16:56     ` Peter Zijlstra
  4 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-10-15 14:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers

On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> +{
> +	void (*target)(void);
> +	int reg, i = 0;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> +		return -1;
> +
> +	target = addr + insn->length + insn->immediate.value;
> +	reg = (target - &__x86_indirect_thunk_rax) /
> +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);

I guess you should compute those values once so that it doesn't have to
do them for each function invocation. And it does them here when I look
at the asm it generates.

> +
> +	if (WARN_ON_ONCE(reg & ~0xf))
> +		return -1;

Sanity-checking the alignment of those thunks?

> +
> +	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
> +	if (i < 0)
> +		return i;
> +
> +	for (; i < insn->length;)
> +		bytes[i++] = BYTES_NOP1;

Why not:

        nop_len = insn->length - i;
        if (nop_len) {
                memcpy(&bytes[i], x86_nops[nop_len], nop_len);
                i += nop_len;
        }

and then you save yourself the optimize_nops() call because it'll take
the right-sized NOP directly.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-15 14:24   ` Borislav Petkov
@ 2021-10-15 16:56     ` Peter Zijlstra
  2021-10-18 23:06       ` Alexander Lobakin
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-15 16:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers

On Fri, Oct 15, 2021 at 04:24:08PM +0200, Borislav Petkov wrote:
> On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > +{
> > +	void (*target)(void);
> > +	int reg, i = 0;
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > +		return -1;
> > +
> > +	target = addr + insn->length + insn->immediate.value;
> > +	reg = (target - &__x86_indirect_thunk_rax) /
> > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> 
> I guess you should compute those values once so that it doesn't have to
> do them for each function invocation. And it does them here when I look
> at the asm it generates.

Takes away the simplicity of the thing. It can't know these values at
compile time (due to external symbols etc..) although I suppose LTO
might be able to fix that.

Other than that, the above is the trivial form of reverse indexing an
array.

> > +
> > +	if (WARN_ON_ONCE(reg & ~0xf))
> > +		return -1;
> 
> Sanity-checking the alignment of those thunks?

Nah, the target address of the instruction; if that's not a retpoline
thunk (for whatever raisin) then the computation will not result in a
valid reg and we should bail.

> > +
> > +	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
> > +	if (i < 0)
> > +		return i;
> > +
> > +	for (; i < insn->length;)
> > +		bytes[i++] = BYTES_NOP1;
> 
> Why not:
> 
>         nop_len = insn->length - i;
>         if (nop_len) {
>                 memcpy(&bytes[i], x86_nops[nop_len], nop_len);
>                 i += nop_len;
>         }
> 
> and then you save yourself the optimize_nops() call because it'll take
> the right-sized NOP directly.

That's not immediately safe; if for some reason or other the original
instruction is 15 bytes long, and we generated 2 bytes, then we need 13
nop bytes, and the above will then do an out-of-bounds array access (due
to the nops array only doing 8 byte nops at max).

I wanted this code to be simple and obvious.
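
FWIW, a bounds-safe middle ground would be to pad in chunks, much like
add_nops() in alternative.c already does; sketch only:

	while (i < insn->length) {
		int nop_len = min_t(int, insn->length - i, ASM_NOP_MAX);

		memcpy(&bytes[i], x86_nops[nop_len], nop_len);
		i += nop_len;
	}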

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-15 16:56     ` Peter Zijlstra
@ 2021-10-18 23:06       ` Alexander Lobakin
  2021-10-19  0:25         ` Alexander Lobakin
  2021-10-19  9:40         ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Alexander Lobakin @ 2021-10-18 23:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Lobakin, Borislav Petkov, x86, jpoimboe,
	andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 15 Oct 2021 18:56:35 +0200

Hi,

Gave it a spin with Clang/LLVM, and

> On Fri, Oct 15, 2021 at 04:24:08PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > > +{
> > > +	void (*target)(void);
> > > +	int reg, i = 0;
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > > +		return -1;
> > > +
> > > +	target = addr + insn->length + insn->immediate.value;
> > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);

this triggers

> > I guess you should compute those values once so that it doesn't have to
> > do them for each function invocation. And it does them here when I look
> > at the asm it generates.
>
> Takes away the simplicity of the thing. It can't know these values at
> compile time (due to external symbols etc..) although I suppose LTO
> might be able to fix that.
>
> Other than that, the above is the trivial form of reverse indexing an
> array.
>
> > > +
> > > +	if (WARN_ON_ONCE(reg & ~0xf))
> > > +		return -1;

this:

WARN in patch_retpoline:408: addr pcibios_scan_specific_bus+0x196/0x200, op 0xe8, reg 0xb88ca
WARN in patch_retpoline:408: addr xen_pv_teardown_msi_irqs+0x8d/0x120, op 0xe8, reg 0xb88ca
WARN in patch_retpoline:408: addr __mptcp_sockopt_sync+0x7e/0x200, op 0xe8, reg 0xb88ca
[...]
(thousands of them, but op == 0xe8 && reg == 0xb88ca are always the same)

I know this reg calculation is about to be replaced, but anyway ;)

> > Sanity-checking the alignment of those thunks?
>
> Nah, the target address of the instruction; if that's not a retpoline
> thunk (for whatever raisin) then the computation will not result in a
> valid reg and we should bail.
>
> > > +
> > > +	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
> > > +	if (i < 0)
> > > +		return i;
> > > +
> > > +	for (; i < insn->length;)
> > > +		bytes[i++] = BYTES_NOP1;
> >
> > Why not:
> >
> >         nop_len = insn->length - i;
> >         if (nop_len) {
> >                 memcpy(&bytes[i], x86_nops[nop_len], nop_len);
> >                 i += nop_len;
> >         }
> >
> > and then you save yourself the optimize_nops() call because it'll take
> > the right-sized NOP directly.
>
> That's not immediately safe; if for some reason or other the original
> instruction is 15 bytes long, and we generated 2 bytes, then we need 13
> nop bytes, and the above will then do an out-of-bounds array access (due
> to the nops array only doing 8 byte nops at max).
>
> I wanted this code to be simple and obvious.

Thanks,
Al


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-18 23:06       ` Alexander Lobakin
@ 2021-10-19  0:25         ` Alexander Lobakin
  2021-10-19  9:47           ` Alexander Lobakin
  2021-10-19  9:40         ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Alexander Lobakin @ 2021-10-19  0:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Lobakin, Borislav Petkov, x86, jpoimboe,
	andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

From: Alexander Lobakin <alobakin@pm.me>
Date: Mon, 18 Oct 2021 23:06:35 +0000

Sorry for double posting, should've included this from the start.

> Hi,
>
> Gave it a spin with Clang/LLVM, and
>
> > On Fri, Oct 15, 2021 at 04:24:08PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > > > +{
> > > > +	void (*target)(void);
> > > > +	int reg, i = 0;
> > > > +
> > > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > > > +		return -1;
> > > > +
> > > > +	target = addr + insn->length + insn->immediate.value;
> > > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
>
> this triggers
>
> > > I guess you should compute those values once so that it doesn't have to
> > > do them for each function invocation. And it does them here when I look
> > > at the asm it generates.
> >
> > Takes away the simplicity of the thing. It can't know these values at
> > compile time (due to external symbols etc..) although I suppose LTO
> > might be able to fix that.
> >
> > Other than that, the above is the trivial form of reverse indexing an
> > array.
> >
> > > > +
> > > > +	if (WARN_ON_ONCE(reg & ~0xf))
> > > > +		return -1;
>
> this:
>
> WARN in patch_retpoline:408: addr pcibios_scan_specific_bus+0x196/0x200, op 0xe8, reg 0xb88ca
> WARN in patch_retpoline:408: addr xen_pv_teardown_msi_irqs+0x8d/0x120, op 0xe8, reg 0xb88ca
> WARN in patch_retpoline:408: addr __mptcp_sockopt_sync+0x7e/0x200, op 0xe8, reg 0xb88ca
> [...]
> (thousands of them, but op == 0xe8 && reg == 0xb88ca are always the same)

SMP alternatives: WARN in patch_retpoline:408: addr __strp_unpause+0x62/0x1b0/0xffffffff92d20a12, op 0xe8, reg 0xb88ca
SMP alternatives: insn->length: 5, insn->immediate.value: 0xffae0989
SMP alternatives: target: 0xffffffff928013a0/__x86_indirect_thunk_r11+0x0/0x20
SMP alternatives: rax: 0xffffffff9223cd50, target - rax: 0x5c4650
SMP alternatives: rcx - rax: 0x8

Imm value and addr are different each time, the rest are the same.
target is correct and even %pS works on it, but this distance
between r11 and rax thunks (0x5c4650) doesn't look right, as well as
rcx - rax being 0x8. Thunks are 0x11 sized + alignment, should be
0x20, and it is, according to vmlinux.map. Weird. Amps/&s?

> I know this reg calculation is about to be replaced, but anyway ;)
>
> > > Sanity-checking the alignment of those thunks?
> >
> > Nah, the target address of the instruction; if that's not a retpoline
> > thunk (for whatever raisin) then the computation will not result in a
> > valid reg and we should bail.
> >
> > > > +
> > > > +	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
> > > > +	if (i < 0)
> > > > +		return i;
> > > > +
> > > > +	for (; i < insn->length;)
> > > > +		bytes[i++] = BYTES_NOP1;
> > >
> > > Why not:
> > >
> > >         nop_len = insn->length - i;
> > >         if (nop_len) {
> > >                 memcpy(&bytes[i], x86_nops[nop_len], nop_len);
> > >                 i += nop_len;
> > >         }
> > >
> > > and then you save yourself the optimize_nops() call because it'll take
> > > the right-sized NOP directly.
> >
> > That's not immediately safe; if for some reason or other the original
> > instruction is 15 bytes long, and we generated 2 bytes, then we need 13
> > nop bytes, and the above will then do an out-of-bounds array access (due
> > to the nops array only doing 8 byte nops at max).
> >
> > I wanted this code to be simple and obvious.
>
> Thanks,
> Al

Thanks,
Al


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-18 23:06       ` Alexander Lobakin
  2021-10-19  0:25         ` Alexander Lobakin
@ 2021-10-19  9:40         ` Peter Zijlstra
  2021-10-19 10:02           ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-19  9:40 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Borislav Petkov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	alexei.starovoitov, ndesaulniers

On Mon, Oct 18, 2021 at 11:06:35PM +0000, Alexander Lobakin wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri, 15 Oct 2021 18:56:35 +0200
> 
> Hi,
> 
> Gave it a spin with Clang/LLVM, and

Just your normal clang build, not some fancy LTO? E.g. make CC=clang.


> > > > +	target = addr + insn->length + insn->immediate.value;
> > > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> 
> this triggers
> 

> > > > +
> > > > +	if (WARN_ON_ONCE(reg & ~0xf))
> > > > +		return -1;
> 
> this:
> 
> WARN in patch_retpoline:408: addr pcibios_scan_specific_bus+0x196/0x200, op 0xe8, reg 0xb88ca
> WARN in patch_retpoline:408: addr xen_pv_teardown_msi_irqs+0x8d/0x120, op 0xe8, reg 0xb88ca
> WARN in patch_retpoline:408: addr __mptcp_sockopt_sync+0x7e/0x200, op 0xe8, reg 0xb88ca
> [...]
> (thousands of them, but op == 0xe8 && reg == 0xb88ca are always the same)
> 
> I know this reg calculation is about to be replaced, but anyway ;)

Well, I was sorta hoping to keep that with something like:

  https://lkml.kernel.org/r/YWgA+vbWCdGLZhq5@hirez.programming.kicks-ass.net

Anyway, let me try and reproduce.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19  0:25         ` Alexander Lobakin
@ 2021-10-19  9:47           ` Alexander Lobakin
  2021-10-19 10:16             ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Alexander Lobakin @ 2021-10-19  9:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Lobakin, Borislav Petkov, x86, jpoimboe,
	andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

From: Alexander Lobakin <alobakin@pm.me>
Date: Tue, 19 Oct 2021 00:25:30 +0000

> From: Alexander Lobakin <alobakin@pm.me>
> Date: Mon, 18 Oct 2021 23:06:35 +0000
>
> Sorry for double posting, should've included this from the start.
>
> > Hi,
> >
> > Gave it a spin with Clang/LLVM, and
> >
> > > On Fri, Oct 15, 2021 at 04:24:08PM +0200, Borislav Petkov wrote:
> > > > On Wed, Oct 13, 2021 at 02:22:21PM +0200, Peter Zijlstra wrote:
> > > > > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > > > > +{
> > > > > +	void (*target)(void);
> > > > > +	int reg, i = 0;
> > > > > +
> > > > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > > > > +		return -1;
> > > > > +
> > > > > +	target = addr + insn->length + insn->immediate.value;
> > > > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> >
> > this triggers
> >
> > > > I guess you should compute those values once so that it doesn't have to
> > > > do them for each function invocation. And it does them here when I look
> > > > at the asm it generates.
> > >
> > > Takes away the simplicity of the thing. It can't know these values at
> > > compile time (due to external symbols etc..) although I suppose LTO
> > > might be able to fix that.
> > >
> > > Other than that, the above is the trivial form of reverse indexing an
> > > array.
> > >
> > > > > +
> > > > > +	if (WARN_ON_ONCE(reg & ~0xf))
> > > > > +		return -1;
> >
> > this:
> >
> > WARN in patch_retpoline:408: addr pcibios_scan_specific_bus+0x196/0x200, op 0xe8, reg 0xb88ca
> > WARN in patch_retpoline:408: addr xen_pv_teardown_msi_irqs+0x8d/0x120, op 0xe8, reg 0xb88ca
> > WARN in patch_retpoline:408: addr __mptcp_sockopt_sync+0x7e/0x200, op 0xe8, reg 0xb88ca
> > [...]
> > (thousands of them, but op == 0xe8 && reg == 0xb88ca are always the same)
>
> SMP alternatives: WARN in patch_retpoline:408: addr __strp_unpause+0x62/0x1b0/0xffffffff92d20a12, op 0xe8, reg 0xb88ca
> SMP alternatives: insn->length: 5, insn->immediate.value: 0xffae0989
> SMP alternatives: target: 0xffffffff928013a0/__x86_indirect_thunk_r11+0x0/0x20
> SMP alternatives: rax: 0xffffffff9223cd50, target - rax: 0x5c4650
> SMP alternatives: rcx - rax: 0x8
>
> Imm value and addr are different each time, the rest are the same.
> target is correct and even %pS works on it, but this distance
> between r11 and rax thunks (0x5c4650) doesn't look right, as well as
> rcx - rax being 0x8. Thunks are 0x11 sized + alignment, should be
> 0x20, and it is, according to vmlinux.map. Weird. Amps/&s?

Oh okay, it's because of ClangCFI:

SMP alternatives: You were looking for __typeid__ZTSFvvE_global_addr+0x370/0x1410 at 0xffffffffa523cd60,>
SMP alternatives: rax is __typeid__ZTSFvvE_global_addr+0x360/0x1410 at 0xffffffffa523cd50

Sorry for the confusion; it seems to be a side effect of using it on
Clang 12 while the original series supports only 13+. I'll double
check and let you know if I find something.

[ snip ]

Thanks,
Al


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19  9:40         ` Peter Zijlstra
@ 2021-10-19 10:02           ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-19 10:02 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Borislav Petkov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	alexei.starovoitov, ndesaulniers

On Tue, Oct 19, 2021 at 11:40:56AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 18, 2021 at 11:06:35PM +0000, Alexander Lobakin wrote:

> > WARN in patch_retpoline:408: addr pcibios_scan_specific_bus+0x196/0x200, op 0xe8, reg 0xb88ca
> > WARN in patch_retpoline:408: addr xen_pv_teardown_msi_irqs+0x8d/0x120, op 0xe8, reg 0xb88ca
> > WARN in patch_retpoline:408: addr __mptcp_sockopt_sync+0x7e/0x200, op 0xe8, reg 0xb88ca
> > [...]
> > (thousands of them, but op == 0xe8 && reg == 0xb88ca are always the same)
> > 
> > I know this reg calculation is about to be replaced, but anyway ;)
> 
> Well, I was sorta hoping to keep that with something like:
> 
>   https://lkml.kernel.org/r/YWgA+vbWCdGLZhq5@hirez.programming.kicks-ass.net
> 
> Anyway, let me try and reproduce.

Using: make O=defconfig CC=clang-13 -j80 -s

( the config is actually defconfig+kvm_guest.config,
  and clang-13 from debian/testing )

on https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=objtool/core

It seems to 'just' work:

spectre_v2=retpoline,amd:

gets me two very sad and lonely replacements:

 [] SMP alternatives: retpoline at: 0xffffffff82e764b4 (ffffffff82e764b4) len: 5 to: __x86_indirect_thunk_rbx+0x0/0x20
 [] SMP alternatives: ffffffff82e764b4: orig: e8 07 d3 18 ff
 [] SMP alternatives: ffffffff82e764b4: repl: 0f ae e8 ff d3

 [] SMP alternatives: retpoline at: 0xffffffff82e76f39 (ffffffff82e76f39) len: 5 to: __x86_indirect_thunk_rdi+0x0/0x20
 [] SMP alternatives: ffffffff82e76f39: orig: e8 02 c9 18 ff
 [] SMP alternatives: ffffffff82e76f39: repl: 0f ae e8 ff d7

The rest is R11 and therefore doesn't actually fit.

For spectre_v2=off everything gets replaced, and that also seems happy.

 [] SMP alternatives: retpoline at: pm_check_save_msr+0x24/0x30 (ffffffff81d2ef24) len: 5 to: __x86_indirect_thunk_r11+0x0/0x20
 [] SMP alternatives: ffffffff82603eb0: [3:5) optimized NOPs: 41 ff d3 66 90
 [] SMP alternatives: ffffffff81d2ef24: orig: e8 97 49 2d 00
 [] SMP alternatives: ffffffff81d2ef24: repl: 41 ff d3 66 90

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19  9:47           ` Alexander Lobakin
@ 2021-10-19 10:16             ` Peter Zijlstra
  2021-10-19 15:37               ` Sami Tolvanen
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-19 10:16 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Borislav Petkov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	alexei.starovoitov, ndesaulniers, samitolvanen


+ Sami

(Sami, for context:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=objtool/core

which contains the following code:

+	void (*target)(void);
+	int reg, i = 0;
+
+	target = addr + insn->length + insn->immediate.value;
+	reg = (target - &__x86_indirect_thunk_rax) /
+	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
+
+	if (WARN_ON_ONCE(reg & ~0xf))
+		return -1;

which blows up something fierce on clang-cfi)

On Tue, Oct 19, 2021 at 09:47:26AM +0000, Alexander Lobakin wrote:

> Oh okay, it's because of ClangCFI:
> 
> SMP alternatives: You were looking for __typeid__ZTSFvvE_global_addr+0x370/0x1410 at 0xffffffffa523cd60,>
> SMP alternatives: rax is __typeid__ZTSFvvE_global_addr+0x360/0x1410 at 0xffffffffa523cd50
> 
> Sorry for confusing, seems like it's a side effect of using it on
> Clang 12 while the original series supports only 13+. I'll double
> check and let know if find something.

I'm thinking CFI will totally screw this up regardless, seeing how a
function pointer is taken, and the CFI magicks will turn that into one
of those weird trampolines instead of the actual symbol.

The compiler could of course deduce that these addresses are never
called and don't escape the function, and therefore doesn't need to do
the CFI transformation on them, but I'm guessing it isn't quite that
clever.

Also doing CFI on retpoline thunks seems 'weird', they have a very
particular calling convention, explicitly very much not the standard C
one. Can't we mark them using asmlinkage or something to tell the
compiler to politely 'bugger off' or somesuch ;-)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-13 20:52   ` Josh Poimboeuf
  2021-10-13 21:00     ` Peter Zijlstra
@ 2021-10-19 11:37     ` Peter Zijlstra
  2021-10-19 16:46       ` Josh Poimboeuf
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-19 11:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 13, 2021 at 01:52:59PM -0700, Josh Poimboeuf wrote:

> BTW, CALL_NOSPEC results in a retpoline site in .altinstr_replacement:
> 
> Relocation section [40] '.rela.retpoline_sites' for section [39] '.retpoline_sites' at offset 0x8d28 contains 1 entry:
>   Offset              Type            Value               Addend Name
>   000000000000000000  X86_64_PC32     000000000000000000     +10 .altinstr_replacement
> 
> Which I assume we don't want.

(I missed this initially, and just independently rediscovered it)

In principle this problem affects static_call_list, the __sanitizer_cov_
and __fentry__ and now retpoline_sites.

Granted, it seems really unlikely to find __fentry__ or __sanitizer_cov_
references in alternatives, but it should be trivial to manually create
one.

I'm thinking we want to exclude all those when found in
.altinstr_replacement, right? It just doesn't make sense to rewrite
replacement text.

How about something like the below? (I'm not completely happy with it, but
I couldn't think of something better either).

---
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1039,14 +1039,39 @@ static void remove_insn_ops(struct instr
 	}
 }
 
+#define DEST_RETPOLINE	((void *)-1L)
+
 static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 			  struct symbol *dest, bool sibling)
 {
 	struct reloc *reloc = insn_reloc(file, insn);
 
-	insn->call_dest = dest;
-	if (!dest)
+	if (dest != DEST_RETPOLINE) {
+		insn->call_dest = dest;
+		if (!dest )
+			return;
+	}
+
+	/*
+	 * Whatever stack impact regular CALLs have, should be undone
+	 * by the RETURN of the called function.
+	 *
+	 * Annotated intra-function calls retain the stack_ops but
+	 * are converted to JUMP, see read_intra_function_calls().
+	 */
+	remove_insn_ops(insn);
+
+	/*
+	 * Whatever we do, do not rewrite replacement text.
+	 */
+	if (!strcmp(insn->sec->name, ".altinstr_replacement"))
+		return;
+
+	if (dest == DEST_RETPOLINE) {
+		list_add_tail(&insn->call_node,
+			      &file->retpoline_call_list);
 		return;
+	}
 
 	if (insn->call_dest->static_call_tramp) {
 		list_add_tail(&insn->call_node,
@@ -1091,15 +1116,6 @@ static void add_call_dest(struct objtool
 		list_add_tail(&insn->mcount_loc_node,
 			      &file->mcount_loc_list);
 	}
-
-	/*
-	 * Whatever stack impact regular CALLs have, should be undone
-	 * by the RETURN of the called function.
-	 *
-	 * Annotated intra-function calls retain the stack_ops but
-	 * are converted to JUMP, see read_intra_function_calls().
-	 */
-	remove_insn_ops(insn);
 }
 
 /*
@@ -1133,10 +1149,9 @@ static int add_jump_destinations(struct
 			else
 				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
 
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
 			insn->retpoline_safe = true;
+
+			add_call_dest(file, insn, DEST_RETPOLINE, true);
 			continue;
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
@@ -1272,12 +1287,7 @@ static int add_call_destinations(struct
 			insn->type = INSN_CALL_DYNAMIC;
 			insn->retpoline_safe = true;
 
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			remove_insn_ops(insn);
-			continue;
-
+			add_call_dest(file, insn, DEST_RETPOLINE, false);
 		} else
 			add_call_dest(file, insn, reloc->sym, false);
 	}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19 10:16             ` Peter Zijlstra
@ 2021-10-19 15:37               ` Sami Tolvanen
  2021-10-19 18:00                 ` Alexander Lobakin
  0 siblings, 1 reply; 52+ messages in thread
From: Sami Tolvanen @ 2021-10-19 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Lobakin, Borislav Petkov, X86 ML, Josh Poimboeuf,
	andrew.cooper3, LKML, alexei.starovoitov, Nick Desaulniers

On Tue, Oct 19, 2021 at 3:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> + Sami
>
> (Sami, for context:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=objtool/core
>
> which contains the following code:
>
> +       void (*target)(void);
> +       int reg, i = 0;
> +
> +       target = addr + insn->length + insn->immediate.value;
> +       reg = (target - &__x86_indirect_thunk_rax) /
> +             (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> +
> +       if (WARN_ON_ONCE(reg & ~0xf))
> +               return -1;
>
> which blows up something fierce on clang-cfi)
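(Aside: that computation leans on the thunks being emitted back to back
at one fixed stride, so dividing the two pointer differences yields the
register number. A tiny standalone sketch with made-up addresses and an
assumed stride, not the kernel's real layout:

	#define THUNK_STRIDE	32			/* assumed, not the real value */

	unsigned long rax = 0x1000;			/* &__x86_indirect_thunk_rax */
	unsigned long rcx = rax + THUNK_STRIDE;		/* &__x86_indirect_thunk_rcx */
	unsigned long target = rax + 3 * THUNK_STRIDE;	/* the rbx thunk */

	int reg = (target - rax) / (rcx - rax);		/* == 3, i.e. rbx */

Under clang CFI the address-of instead resolves to a jump-table stub,
the differences no longer follow that stride, and the WARN_ON_ONCE()
fires.)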
>
> On Tue, Oct 19, 2021 at 09:47:26AM +0000, Alexander Lobakin wrote:
>
> > Oh okay, it's because of ClangCFI:
> >
> > SMP alternatives: You were looking for __typeid__ZTSFvvE_global_addr+0x370/0x1410 at 0xffffffffa523cd60,>
> > SMP alternatives: rax is __typeid__ZTSFvvE_global_addr+0x360/0x1410 at 0xffffffffa523cd50
> >
> > Sorry for the confusion, seems like it's a side effect of using it on
> > Clang 12 while the original series supports only 13+. I'll double
> > check and let you know if I find something.
>
> I'm thinking CFI will totally screw this up regardless, seeing how a
> function pointer is taken, and the CFI magicks will turn that into one
> of those weird trampolines instead of the actual symbol.
>
> The compiler could of course deduce that these addresses are never
> called and don't escape the function, and therefore doesn't need to do
> the CFI transformation on them, but I'm guessing it isn't quite that
> clever.

Yes, it's unfortunately not that clever.

> Also doing CFI on retpoline thunks seems 'weird'; they have a very
> particular calling convention, explicitly very much not the standard C
> one. Can't we mark them using asmlinkage or something to tell the
> compiler to politely 'bugger off' or somesuch ;-)

I confirmed that using an opaque type for the thunk declaration fixes
this issue with CFI. It also makes it obvious that these are not
callable from C code.
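
For the archive, a minimal sketch of the idea (illustrative shape only,
not the actual patch):

	/*
	 * Opaque type: never completed, so the compiler can't treat the
	 * thunks as C functions, generates no CFI jump-table entries for
	 * them, and taking their address yields the real symbol again.
	 */
	struct retpoline_thunk;

	extern struct retpoline_thunk __x86_indirect_thunk_rax;
	extern struct retpoline_thunk __x86_indirect_thunk_rcx;
	/* ... one declaration per register ... */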

Sami

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19 11:37     ` Peter Zijlstra
@ 2021-10-19 16:46       ` Josh Poimboeuf
  2021-10-19 16:49         ` Josh Poimboeuf
  0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-19 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Tue, Oct 19, 2021 at 01:37:09PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 01:52:59PM -0700, Josh Poimboeuf wrote:
> 
> > BTW, CALL_NOSPEC results in a retpoline site in .altinstr_replacement:
> > 
> > Relocation section [40] '.rela.retpoline_sites' for section [39] '.retpoline_sites' at offset 0x8d28 contains 1 entry:
> >   Offset              Type            Value               Addend Name
> >   000000000000000000  X86_64_PC32     000000000000000000     +10 .altinstr_replacement
> > 
> > Which I assume we don't want.
> 
> (I missed this initially, and just independently rediscovered it)
> 
> In principle this problem affects static_call_list, the __sanitizer_cov_
> and __fentry__ annotations, and now retpoline_sites.
> 
> Granted, it seems really unlikely to find __fentry__ or __sanitizer_cov_
> references in alternatives, but it should be trivial to manually create
> one.
> 
> I'm thinking we want to exclude all those when found in
> .altinstr_replacement, right? It just doesn't make sense to rewrite
> replacement text.

Right.

(Someday, if it made sense to do so, objtool could put the annotation at
the original replaced instruction.  Then the kernel self-patching code
could run after alternatives patching and could then decide whether the
annotation is relevant or not.  But right now I can't think of any
scenario where that would be remotely sane.)

> How's something like the below? (I'm not completely happy with it, but
> I couldn't think of something better either.)

How about something like this?

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7c865a10372a..90d51c294034 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -993,18 +993,31 @@ static void remove_insn_ops(struct instruction *insn)
 	}
 }
 
-static void add_call_dest(struct objtool_file *file, struct instruction *insn,
-			  struct symbol *dest, bool sibling)
+static void annotate_call_site(struct objtool_file *file,
+			       struct instruction *insn, bool sibling)
 {
 	struct reloc *reloc = insn_reloc(file, insn);
 
-	insn->call_dest = dest;
-	if (!dest)
+	if (!insn->call_dest)
+		return;
+
+	/*
+	 * Alternative replacement code is just template code which is
+	 * sometimes copied to the original instruction.  For now, don't
+	 * annotate it.  (In the future we might consider annotating the
+	 * original instruction if/when it ever makes sense to do so.)
+	 */
+	if (!strcmp(insn->sec->name, ".altinstr_replacement"))
+		return;
+
+	if (insn->call_dest->retpoline) {
+		list_add_tail(&insn->call_node, &file->retpoline_call_list);
 		return;
+	}
 
 	if (insn->call_dest->static_call_tramp) {
-		list_add_tail(&insn->call_node,
-			      &file->static_call_list);
+		list_add_tail(&insn->call_node, &file->static_call_list);
+		return;
 	}
 
 	/*
@@ -1025,6 +1038,7 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 			               : arch_nop_insn(insn->len));
 
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
+		return;
 	}
 
 	if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
@@ -1042,9 +1056,17 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node,
-			      &file->mcount_loc_list);
+		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		return;
 	}
+}
+
+static void add_call_dest(struct objtool_file *file, struct instruction *insn,
+			  struct symbol *dest, bool sibling)
+{
+	insn->call_dest = dest;
+
+	annotate_call_site(file, insn, sibling);
 
 	/*
 	 * Whatever stack impact regular CALLs have, should be undone
@@ -1053,7 +1075,9 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 	 * Annotated intra-function calls retain the stack_ops but
 	 * are converted to JUMP, see read_intra_function_calls().
 	 */
-	remove_insn_ops(insn);
+	if (dest)
+		remove_insn_ops(insn);
+
 }
 
 /*
@@ -1077,7 +1101,7 @@ static int add_jump_destinations(struct objtool_file *file)
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
@@ -1087,9 +1111,7 @@ static int add_jump_destinations(struct objtool_file *file)
 			else
 				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
 
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
+			add_call_dest(file, insn, reloc->sym, true);
 			insn->retpoline_safe = true;
 			continue;
 		} else if (insn->func) {
@@ -1218,20 +1240,14 @@ static int add_call_destinations(struct objtool_file *file)
 
 			add_call_dest(file, insn, dest, false);
 
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline) {
 			/*
 			 * Retpoline calls are really dynamic calls in
 			 * disguise, so convert them accordingly.
 			 */
 			insn->type = INSN_CALL_DYNAMIC;
+			add_call_dest(file, insn, reloc->sym, false);
 			insn->retpoline_safe = true;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			remove_insn_ops(insn);
-			continue;
-
 		} else
 			add_call_dest(file, insn, reloc->sym, false);
 	}
@@ -1916,8 +1932,25 @@ static int read_static_call_tramps(struct objtool_file *file)
 		list_for_each_entry(func, &sec->symbol_list, list) {
 			if (func->bind == STB_GLOBAL &&
 			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
-				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
+				     strlen(STATIC_CALL_TRAMP_PREFIX_STR))) {
 				func->static_call_tramp = true;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int read_retpoline_thunks(struct objtool_file *file)
+{
+	struct section *sec;
+	struct symbol *func;
+
+	for_each_sec(file, sec) {
+		list_for_each_entry(func, &sec->symbol_list, list) {
+			if (func->bind == STB_GLOBAL && arch_is_retpoline(func)) {
+				func->retpoline = true;
+			}
 		}
 	}
 
@@ -1980,13 +2013,16 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
-	/*
-	 * Must be before add_{jump_call}_destination.
-	 */
+	/* Must be before add_{jump_call}_destination. */
 	ret = read_static_call_tramps(file);
 	if (ret)
 		return ret;
 
+	/* Must be before add_{jump_call}_destination. */
+	ret = read_retpoline_thunks(file);
+	if (ret)
+		return ret;
+
 	/*
 	 * Must be before add_special_section_alts() as that depends on
 	 * jump_dest being set.
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index 9bdc7c757bf8..b773366dfb52 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -56,6 +56,7 @@ struct symbol {
 	struct symbol *pfunc, *cfunc, *alias;
 	bool uaccess_safe;
 	bool static_call_tramp;
+	bool retpoline;
 	struct list_head pv_target;
 };
 


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19 16:46       ` Josh Poimboeuf
@ 2021-10-19 16:49         ` Josh Poimboeuf
  2021-10-20  8:25           ` Peter Zijlstra
  2021-10-20  8:30           ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Josh Poimboeuf @ 2021-10-19 16:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Tue, Oct 19, 2021 at 09:47:02AM -0700, Josh Poimboeuf wrote:
> On Tue, Oct 19, 2021 at 01:37:09PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 13, 2021 at 01:52:59PM -0700, Josh Poimboeuf wrote:
> > 
> > > BTW, CALL_NOSPEC results in a retpoline site in .altinstr_replacement:
> > > 
> > > Relocation section [40] '.rela.retpoline_sites' for section [39] '.retpoline_sites' at offset 0x8d28 contains 1 entry:
> > >   Offset              Type            Value               Addend Name
> > >   000000000000000000  X86_64_PC32     000000000000000000     +10 .altinstr_replacement
> > > 
> > > Which I assume we don't want.
> > 
> > (I missed this initially, and just independently rediscovered it)
> > 
> > In principle this problem affects static_call_list, the __sanitizer_cov_
> > and __fentry__ annotations, and now retpoline_sites.
> > 
> > Granted, it seems really unlikely to find __fentry__ or __sanitizer_cov_
> > references in alternatives, but it should be trivial to manually create
> > one.
> > 
> > I'm thinking we want to exclude all those when found in
> > .altinstr_replacement, right? It just doesn't make sense to rewrite
> > replacement text.
> 
> Right.
> 
> (Someday, if it made sense to do so, objtool could put the annotation at
> the original replaced instruction.  Then the kernel self-patching code
> could run after alternatives patching and could then decide whether the
> annotation is relevant or not.  But right now I can't think of any
> scenario where that would be remotely sane.)
> 
> > How's something like the below? (I'm not completely happy with it, but
> > I couldn't think of something better either.)
> 
> How about something like this?

Slightly improved version:

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7c865a10372a..e0892632ef4a 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -993,18 +993,28 @@ static void remove_insn_ops(struct instruction *insn)
 	}
 }
 
-static void add_call_dest(struct objtool_file *file, struct instruction *insn,
-			  struct symbol *dest, bool sibling)
+static void annotate_call_site(struct objtool_file *file,
+			       struct instruction *insn, bool sibling)
 {
 	struct reloc *reloc = insn_reloc(file, insn);
 
-	insn->call_dest = dest;
-	if (!dest)
+	/*
+	 * Alternative replacement code is just template code which is
+	 * sometimes copied to the original instruction.  For now, don't
+	 * annotate it.  (In the future we might consider annotating the
+	 * original instruction if/when it ever makes sense to do so.)
+	 */
+	if (!strcmp(insn->sec->name, ".altinstr_replacement"))
 		return;
 
+	if (insn->call_dest->retpoline) {
+		list_add_tail(&insn->call_node, &file->retpoline_call_list);
+		return;
+	}
+
 	if (insn->call_dest->static_call_tramp) {
-		list_add_tail(&insn->call_node,
-			      &file->static_call_list);
+		list_add_tail(&insn->call_node, &file->static_call_list);
+		return;
 	}
 
 	/*
@@ -1025,6 +1035,7 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 			               : arch_nop_insn(insn->len));
 
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
+		return;
 	}
 
 	if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
@@ -1042,9 +1053,19 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node,
-			      &file->mcount_loc_list);
+		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		return;
 	}
+}
+
+static void add_call_dest(struct objtool_file *file, struct instruction *insn,
+			  struct symbol *dest, bool sibling)
+{
+	insn->call_dest = dest;
+	if (!dest)
+		return;
+
+	annotate_call_site(file, insn, sibling);
 
 	/*
 	 * Whatever stack impact regular CALLs have, should be undone
@@ -1054,6 +1075,7 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 	 * are converted to JUMP, see read_intra_function_calls().
 	 */
 	remove_insn_ops(insn);
+
 }
 
 /*
@@ -1077,7 +1099,7 @@ static int add_jump_destinations(struct objtool_file *file)
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
@@ -1087,9 +1109,7 @@ static int add_jump_destinations(struct objtool_file *file)
 			else
 				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
 
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
+			add_call_dest(file, insn, reloc->sym, true);
 			insn->retpoline_safe = true;
 			continue;
 		} else if (insn->func) {
@@ -1218,20 +1238,14 @@ static int add_call_destinations(struct objtool_file *file)
 
 			add_call_dest(file, insn, dest, false);
 
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline) {
 			/*
 			 * Retpoline calls are really dynamic calls in
 			 * disguise, so convert them accordingly.
 			 */
 			insn->type = INSN_CALL_DYNAMIC;
+			add_call_dest(file, insn, reloc->sym, false);
 			insn->retpoline_safe = true;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			remove_insn_ops(insn);
-			continue;
-
 		} else
 			add_call_dest(file, insn, reloc->sym, false);
 	}
@@ -1916,8 +1930,25 @@ static int read_static_call_tramps(struct objtool_file *file)
 		list_for_each_entry(func, &sec->symbol_list, list) {
 			if (func->bind == STB_GLOBAL &&
 			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
-				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
+				     strlen(STATIC_CALL_TRAMP_PREFIX_STR))) {
 				func->static_call_tramp = true;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int read_retpoline_thunks(struct objtool_file *file)
+{
+	struct section *sec;
+	struct symbol *func;
+
+	for_each_sec(file, sec) {
+		list_for_each_entry(func, &sec->symbol_list, list) {
+			if (func->bind == STB_GLOBAL && arch_is_retpoline(func)) {
+				func->retpoline = true;
+			}
 		}
 	}
 
@@ -1980,13 +2011,16 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
-	/*
-	 * Must be before add_{jump_call}_destination.
-	 */
+	/* Must be before add_{jump_call}_destination. */
 	ret = read_static_call_tramps(file);
 	if (ret)
 		return ret;
 
+	/* Must be before add_{jump_call}_destination. */
+	ret = read_retpoline_thunks(file);
+	if (ret)
+		return ret;
+
 	/*
 	 * Must be before add_special_section_alts() as that depends on
 	 * jump_dest being set.
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index 9bdc7c757bf8..b773366dfb52 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -56,6 +56,7 @@ struct symbol {
 	struct symbol *pfunc, *cfunc, *alias;
 	bool uaccess_safe;
 	bool static_call_tramp;
+	bool retpoline;
 	struct list_head pv_target;
 };
 


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19 15:37               ` Sami Tolvanen
@ 2021-10-19 18:00                 ` Alexander Lobakin
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Lobakin @ 2021-10-19 18:00 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alexander Lobakin, Peter Zijlstra, Borislav Petkov, X86 ML,
	Josh Poimboeuf, andrew.cooper3, LKML, alexei.starovoitov,
	Nick Desaulniers

From: Sami Tolvanen <samitolvanen@google.com>
Date: Tue, 19 Oct 2021 08:37:14 -0700

> On Tue, Oct 19, 2021 at 3:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >
> > + Sami
> >
> > (Sami, for context:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=objtool/core
> >
> > which contains the following code:
> >
> > +       void (*target)(void);
> > +       int reg, i = 0;
> > +
> > +       target = addr + insn->length + insn->immediate.value;
> > +       reg = (target - &__x86_indirect_thunk_rax) /
> > +             (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > +
> > +       if (WARN_ON_ONCE(reg & ~0xf))
> > +               return -1;
> >
> > which blows up something fierce on clang-cfi)
> >
> > On Tue, Oct 19, 2021 at 09:47:26AM +0000, Alexander Lobakin wrote:
> >
> > > Oh okay, it's because of ClangCFI:
> > >
> > > SMP alternatives: You were looking for __typeid__ZTSFvvE_global_addr+0x370/0x1410 at 0xffffffffa523cd60,>
> > > SMP alternatives: rax is __typeid__ZTSFvvE_global_addr+0x360/0x1410 at 0xffffffffa523cd50
> > >
> > > Sorry for the confusion, seems like it's a side effect of using it on
> > > Clang 12 while the original series supports only 13+. I'll double
> > > check and let you know if I find something.
> >
> > I'm thinking CFI will totally screw this up regardless, seeing how a
> > function pointer is taken, and the CFI magicks will turn that into one
> > of those weird trampolines instead of the actual symbol.
> >
> > The compiler could of course deduce that these addresses are never
> > called and don't escape the function, and therefore doesn't need to do
> > the CFI transformation on them, but I'm guessing it isn't quite that
> > clever.
>
> Yes, it's unfortunately not that clever.
>
> > Also doing CFI on retpoline thunks seems 'weird'; they have a very
> > particular calling convention, explicitly very much not the standard C
> > one. Can't we mark them using asmlinkage or something to tell the
> > compiler to politely 'bugger off' or somesuch ;-)
>
> I confirmed that using an opaque type for the thunk declaration fixes
> this issue with CFI. It also makes it obvious that these are not
> callable from C code.

Oh, glad we caught this then, much thanks y'all!

> Sami

Thanks,
Al


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-14  9:46       ` Peter Zijlstra
  2021-10-14  9:48         ` Peter Zijlstra
@ 2021-10-20  7:34         ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-20  7:34 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Thu, Oct 14, 2021 at 11:46:11AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 11:54:51PM +0200, Peter Zijlstra wrote:
> 
> > > But the rest of eBPF JIT just emits retpolines unconditionally
> > > regardless of feature, for example see RETPOLINE_RCX_BPF_JIT().  So I'm
> > > thinking this should probably be consistent with that (or that with
> > > this).
> > 
> > Argh, I grepped for __x86_indirect_thunk, and missed they're writing
> > retpolines themselves. Bah.
> > 
> > Yes, that needs cleaning up. I'll go prod at that tomorrow.
> 
> Bah, I've too much of a headache to make sense of that bpf jit code :-(
> 
> Alexei, emit_fallback_jump() uses emit_jump() and provides @prog as .ip
> argument, but other sites do weird things like 'image + addrs[i]',
> presumably because @prog points to an on-stack array instead of to the
> actual text location.
> 
> Also @prog is sometimes a struct bpf_prog* and sometimes a u8*, which
> makes grepping really confusing.
> 
> I've ended up with the below... which is known broken-crap for 32bit, but
> is likely similar for 64bit as well :-( Can you please have a look?

OK, I've got a working version (thanks to Josh for finding my
brain-fart)! I'll go repost the series soon.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19 16:49         ` Josh Poimboeuf
@ 2021-10-20  8:25           ` Peter Zijlstra
  2021-10-20  8:30           ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-20  8:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Tue, Oct 19, 2021 at 09:49:13AM -0700, Josh Poimboeuf wrote:

> @@ -1916,8 +1930,25 @@ static int read_static_call_tramps(struct objtool_file *file)
>  		list_for_each_entry(func, &sec->symbol_list, list) {
>  			if (func->bind == STB_GLOBAL &&
>  			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
> -				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
> +				     strlen(STATIC_CALL_TRAMP_PREFIX_STR))) {
>  				func->static_call_tramp = true;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int read_retpoline_thunks(struct objtool_file *file)
> +{
> +	struct section *sec;
> +	struct symbol *func;
> +
> +	for_each_sec(file, sec) {
> +		list_for_each_entry(func, &sec->symbol_list, list) {
> +			if (func->bind == STB_GLOBAL && arch_is_retpoline(func)) {
> +				func->retpoline = true;
> +			}
>  		}
>  	}

I've folded these two identical loops over all symbols into a single
classify_symbols() function.
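
Roughly like so (a sketch; the version that lands in the tree may
differ in detail):

	static int classify_symbols(struct objtool_file *file)
	{
		struct section *sec;
		struct symbol *func;

		for_each_sec(file, sec) {
			list_for_each_entry(func, &sec->symbol_list, list) {
				if (func->bind != STB_GLOBAL)
					continue;

				if (!strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
					     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
					func->static_call_tramp = true;

				if (arch_is_retpoline(func))
					func->retpoline = true;
			}
		}

		return 0;
	}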


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/9] x86/alternative: Implement .retpoline_sites support
  2021-10-19 16:49         ` Josh Poimboeuf
  2021-10-20  8:25           ` Peter Zijlstra
@ 2021-10-20  8:30           ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2021-10-20  8:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Tue, Oct 19, 2021 at 09:49:13AM -0700, Josh Poimboeuf wrote:

> @@ -1087,9 +1109,7 @@ static int add_jump_destinations(struct objtool_file *file)
>  			else
>  				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
>  
> -			list_add_tail(&insn->call_node,
> -				      &file->retpoline_call_list);
> -
> +			add_call_dest(file, insn, reloc->sym, true);
>  			insn->retpoline_safe = true;
>  			continue;
>  		} else if (insn->func) {
> @@ -1218,20 +1238,14 @@ static int add_call_destinations(struct objtool_file *file)
>  
>  			add_call_dest(file, insn, dest, false);
>  
> -		} else if (arch_is_retpoline(reloc->sym)) {
> +		} else if (reloc->sym->retpoline) {
>  			/*
>  			 * Retpoline calls are really dynamic calls in
>  			 * disguise, so convert them accordingly.
>  			 */
>  			insn->type = INSN_CALL_DYNAMIC;
> +			add_call_dest(file, insn, reloc->sym, false);
>  			insn->retpoline_safe = true;
> -
> -			list_add_tail(&insn->call_node,
> -				      &file->retpoline_call_list);
> -
> -			remove_insn_ops(insn);
> -			continue;
> -
>  		} else
>  			add_call_dest(file, insn, reloc->sym, false);
>  	}

So the problem with add_call_dest() like this is that the instructions
now get ->call_dest set, which is really strange for
INSN_CALL_DYNAMIC etc. At the very least it makes call_dest_name()
misbehave.

I've added add_retpoline_call() that also does the ->type frobbing and
->retpoline_safe marking instead.
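
Sketch of that helper (the version in the tree may differ in detail):

	static void add_retpoline_call(struct objtool_file *file,
				       struct instruction *insn)
	{
		/* Retpoline calls/jumps are dynamic calls/jumps in disguise. */
		switch (insn->type) {
		case INSN_CALL:
			insn->type = INSN_CALL_DYNAMIC;
			break;
		case INSN_JUMP_UNCONDITIONAL:
			insn->type = INSN_JUMP_DYNAMIC;
			break;
		case INSN_JUMP_CONDITIONAL:
			insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
			break;
		default:
			return;
		}

		insn->retpoline_safe = true;

		/* never rewrite replacement text, see annotate_call_site() */
		if (!strcmp(insn->sec->name, ".altinstr_replacement"))
			return;

		list_add_tail(&insn->call_node, &file->retpoline_call_list);

		/* stack ops are undone by the eventual RETURN */
		remove_insn_ops(insn);
	}

Note ->call_dest is deliberately left unset, so call_dest_name() keeps
doing the right thing for the dynamic instruction types.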

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2021-10-20  8:30 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-13 12:22 [PATCH 0/9] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
2021-10-13 12:22 ` [PATCH 1/9] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
2021-10-13 13:29   ` Borislav Petkov
2021-10-13 20:11   ` Josh Poimboeuf
2021-10-14 15:43     ` Peter Zijlstra
2021-10-13 12:22 ` [PATCH 2/9] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
2021-10-13 12:22 ` [PATCH 3/9] x86/asm: Fix register order Peter Zijlstra
2021-10-13 20:15   ` Josh Poimboeuf
2021-10-13 12:22 ` [PATCH 4/9] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
2021-10-13 14:38   ` Andrew Cooper
2021-10-13 15:12     ` Peter Zijlstra
2021-10-13 17:11       ` Andrew Cooper
2021-10-14 10:05       ` Peter Zijlstra
2021-10-13 20:39   ` Josh Poimboeuf
2021-10-13 21:20     ` Peter Zijlstra
2021-10-13 21:49       ` Josh Poimboeuf
2021-10-13 21:52         ` Josh Poimboeuf
2021-10-13 22:10         ` Peter Zijlstra
2021-10-13 22:47           ` Andrew Cooper
2021-10-13 20:52   ` Josh Poimboeuf
2021-10-13 21:00     ` Peter Zijlstra
2021-10-19 11:37     ` Peter Zijlstra
2021-10-19 16:46       ` Josh Poimboeuf
2021-10-19 16:49         ` Josh Poimboeuf
2021-10-20  8:25           ` Peter Zijlstra
2021-10-20  8:30           ` Peter Zijlstra
2021-10-13 21:11   ` Josh Poimboeuf
2021-10-13 21:43     ` Peter Zijlstra
2021-10-13 22:05       ` Josh Poimboeuf
2021-10-13 22:14         ` Peter Zijlstra
2021-10-15 14:24   ` Borislav Petkov
2021-10-15 16:56     ` Peter Zijlstra
2021-10-18 23:06       ` Alexander Lobakin
2021-10-19  0:25         ` Alexander Lobakin
2021-10-19  9:47           ` Alexander Lobakin
2021-10-19 10:16             ` Peter Zijlstra
2021-10-19 15:37               ` Sami Tolvanen
2021-10-19 18:00                 ` Alexander Lobakin
2021-10-19  9:40         ` Peter Zijlstra
2021-10-19 10:02           ` Peter Zijlstra
2021-10-13 12:22 ` [PATCH 5/9] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
2021-10-13 20:11   ` Nick Desaulniers
2021-10-13 21:08     ` Peter Zijlstra
2021-10-13 12:22 ` [PATCH 6/9] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
2021-10-13 12:22 ` [PATCH 7/9] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
2021-10-13 12:22 ` [PATCH 8/9] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
2021-10-13 12:22 ` [PATCH 9/9] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
2021-10-13 21:06   ` Josh Poimboeuf
2021-10-13 21:54     ` Peter Zijlstra
2021-10-14  9:46       ` Peter Zijlstra
2021-10-14  9:48         ` Peter Zijlstra
2021-10-20  7:34         ` Peter Zijlstra
