* [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic
@ 2021-10-20 10:44 Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 01/14] objtool: Tag retpoline thunk symbols Peter Zijlstra
                   ` (13 more replies)
  0 siblings, 14 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Hi,

These patches rewrite the way retpolines are rewritten. Currently, objtool emits
alternative entries for most retpoline calls. However, trying to extend that
approach led to trouble (ELF files are horrid).

Therefore completely overhaul this and have objtool emit a .retpoline_sites
section that lists all compiler-generated retpoline thunk calls. Then the
kernel can do with them as it pleases.

Notably it will:

 - rewrite them to indirect instructions for !RETPOLINE
 - rewrite them to lfence; indirect; for RETPOLINE_AMD,
   where size allows (boo clang!)

Specifically, the !RETPOLINE case can now also deal with the clang-special
conditional-indirect-tail-call:

  Jcc __x86_indirect_thunk_\reg.

Finally, also update the x86 BPF jit to catch up to recent times and do these
same things.

All this should help improve performance by removing an indirection.
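
As a concrete sketch (the exact NOP padding is up to the patching code),
a compiler-generated call site such as:

  call __x86_indirect_thunk_rax		# e8 <rel32>	(5 bytes)

gets patched at boot into:

  call *%rax				# ff d0
  nop3					# 0f 1f 00

for !RETPOLINE, and into:

  lfence				# 0f ae e8
  call *%rax				# ff d0

for RETPOLINE_AMD (when the register encoding leaves enough room).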

Patches can also be found here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core


Changes since v1:
 - objtool: avoid self-modifying-code in .altinstr_replacement
 - objtool: shrink struct instruction
 - objtool: more cleanups
 - x86: more #ifdef
 - x86: GEN-for-each-reg cleanups/comments
 - x86: retpoline thunk array
 - x86,bpf: more complete retpoline replacement
 - build fixes

---
 arch/um/kernel/um_arch.c                |   4 +
 arch/x86/include/asm/GEN-for-each-reg.h |  13 ++-
 arch/x86/include/asm/alternative.h      |   1 +
 arch/x86/include/asm/asm-prototypes.h   |   6 +-
 arch/x86/include/asm/nospec-branch.h    |  59 ----------
 arch/x86/kernel/alternative.c           | 192 ++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/bugs.c              |   7 --
 arch/x86/kernel/module.c                |   9 +-
 arch/x86/kernel/vmlinux.lds.S           |  14 +++
 arch/x86/lib/retpoline.S                |  50 ++------
 arch/x86/net/bpf_jit_comp.c             |  71 +++++++-----
 arch/x86/net/bpf_jit_comp32.c           |  22 +++-
 tools/objtool/arch/x86/decode.c         | 120 --------------------
 tools/objtool/check.c                   | 195 ++++++++++++++++++++++----------
 tools/objtool/elf.c                     |  84 --------------
 tools/objtool/include/objtool/check.h   |   1 -
 tools/objtool/include/objtool/elf.h     |   2 +-
 tools/objtool/special.c                 |   8 --
 18 files changed, 431 insertions(+), 427 deletions(-)



* [PATCH v2 01/14] objtool: Tag retpoline thunk symbols
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 15:17   ` Josh Poimboeuf
  2021-10-20 10:44 ` [PATCH v2 02/14] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

In order to avoid calling arch_is_retpoline() (which does a strcmp)
over and over on symbols, tag them once upfront.
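
(For reference, the check being cached here is essentially a name-prefix
comparison; in objtool's x86 code it looks roughly like:

  bool arch_is_retpoline(struct symbol *sym)
  {
	return !strncmp(sym->name, "__x86_indirect_", 15);
  }

so caching the result in a bool avoids the repeated string compares.)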

XXX do we also want to do __fentry__ ?
XXX do we want an enum instead of a bunch of bools ?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c               |   17 +++++++++++------
 tools/objtool/include/objtool/elf.h |    1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1077,7 +1077,7 @@ static int add_jump_destinations(struct
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
@@ -1218,7 +1218,7 @@ static int add_call_destinations(struct
 
 			add_call_dest(file, insn, dest, false);
 
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline calls are really dynamic calls in
 			 * disguise, so convert them accordingly.
@@ -1907,17 +1907,22 @@ static int read_intra_function_calls(str
 	return 0;
 }
 
-static int read_static_call_tramps(struct objtool_file *file)
+static int classify_symbols(struct objtool_file *file)
 {
 	struct section *sec;
 	struct symbol *func;
 
 	for_each_sec(file, sec) {
 		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->bind == STB_GLOBAL &&
-			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
+			if (func->bind != STB_GLOBAL)
+				continue;
+
+			if (!strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
 				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
 				func->static_call_tramp = true;
+
+			if (arch_is_retpoline(func))
+				func->retpoline_thunk = true;
 		}
 	}
 
@@ -1983,7 +1988,7 @@ static int decode_sections(struct objtoo
 	/*
 	 * Must be before add_{jump_call}_destination.
 	 */
-	ret = read_static_call_tramps(file);
+	ret = classify_symbols(file);
 	if (ret)
 		return ret;
 
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -56,6 +56,7 @@ struct symbol {
 	struct symbol *pfunc, *cfunc, *alias;
 	bool uaccess_safe;
 	bool static_call_tramp;
+	bool retpoline_thunk;
 	struct list_head pv_target;
 };
 




* [PATCH v2 02/14] objtool: Explicitly avoid self modifying code in .altinstr_replacement
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 01/14] objtool: Tag retpoline thunk symbols Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 03/14] objtool: Shrink struct instruction Peter Zijlstra
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Assume ALTERNATIVE()s know what they're doing and do not change, or
cause to change, instructions in .altinstr_replacement sections.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c |   42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -993,18 +993,27 @@ static void remove_insn_ops(struct instr
 	}
 }
 
-static void add_call_dest(struct objtool_file *file, struct instruction *insn,
-			  struct symbol *dest, bool sibling)
+static void annotate_call_site(struct objtool_file *file,
+			       struct instruction *insn, bool sibling)
 {
 	struct reloc *reloc = insn_reloc(file, insn);
+	struct symbol *sym = insn->call_dest;
 
-	insn->call_dest = dest;
-	if (!dest)
+	if (!sym)
+		sym = reloc->sym;
+
+	/*
+	 * Alternative replacement code is just template code which is
+	 * sometimes copied to the original instruction, For now, don't
+	 * annotate it. (In the future we might consider annotating the
+	 * original instruction if/when it ever makes sense to do so.).
+	 */
+	if (!strcmp(insn->sec->name, ".altinstr_replacement"))
 		return;
 
-	if (insn->call_dest->static_call_tramp) {
-		list_add_tail(&insn->call_node,
-			      &file->static_call_list);
+	if (sym->static_call_tramp) {
+		list_add_tail(&insn->call_node, &file->static_call_list);
+		return;
 	}
 
 	/*
@@ -1013,7 +1022,7 @@ static void add_call_dest(struct objtool
 	 * text.
 	 */
 	if (insn->sec->noinstr &&
-	    !strncmp(insn->call_dest->name, "__sanitizer_cov_", 16)) {
+	    !strncmp(sym->name, "__sanitizer_cov_", 16)) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1025,9 +1034,10 @@ static void add_call_dest(struct objtool
 			               : arch_nop_insn(insn->len));
 
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
+		return;
 	}
 
-	if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
+	if (mcount && !strcmp(sym->name, "__fentry__")) {
 		if (sibling)
 			WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset);
 
@@ -1042,9 +1052,17 @@ static void add_call_dest(struct objtool
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node,
-			      &file->mcount_loc_list);
+		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		return;
 	}
+}
+
+static void add_call_dest(struct objtool_file *file, struct instruction *insn,
+			  struct symbol *dest, bool sibling)
+{
+	insn->call_dest = dest;
+	if (!dest)
+		return;
 
 	/*
 	 * Whatever stack impact regular CALLs have, should be undone
@@ -1054,6 +1072,8 @@ static void add_call_dest(struct objtool
 	 * are converted to JUMP, see read_intra_function_calls().
 	 */
 	remove_insn_ops(insn);
+
+	annotate_call_site(file, insn, sibling);
 }
 
 /*




* [PATCH v2 03/14] objtool: Shrink struct instruction
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 01/14] objtool: Tag retpoline thunk symbols Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 02/14] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 04/14] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Any one instruction can only ever call a single function; therefore
insn->mcount_loc_node is superfluous and its users can switch to insn->call_node.

This shrinks struct instruction, which is by far the most numerous
structure objtool creates.
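
(On a 64-bit host each struct list_head is two pointers, so dropping
mcount_loc_node saves 16 bytes per decoded instruction.)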

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                 |    6 +++---
 tools/objtool/include/objtool/check.h |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -701,7 +701,7 @@ static int create_mcount_loc_sections(st
 		return 0;
 
 	idx = 0;
-	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node)
+	list_for_each_entry(insn, &file->mcount_loc_list, call_node)
 		idx++;
 
 	sec = elf_create_section(file->elf, "__mcount_loc", 0, sizeof(unsigned long), idx);
@@ -709,7 +709,7 @@ static int create_mcount_loc_sections(st
 		return -1;
 
 	idx = 0;
-	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node) {
+	list_for_each_entry(insn, &file->mcount_loc_list, call_node) {
 
 		loc = (unsigned long *)sec->data->d_buf + idx;
 		memset(loc, 0, sizeof(unsigned long));
@@ -1049,7 +1049,7 @@ static void annotate_call_site(struct ob
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		list_add_tail(&insn->call_node, &file->mcount_loc_list);
 		return;
 	}
 }
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -40,7 +40,6 @@ struct instruction {
 	struct list_head list;
 	struct hlist_node hash;
 	struct list_head call_node;
-	struct list_head mcount_loc_node;
 	struct section *sec;
 	unsigned long offset;
 	unsigned int len;




* [PATCH v2 04/14] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 03/14] objtool: Shrink struct instruction Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 05/14] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.
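
Each .retpoline_sites entry is a 32-bit PC-relative offset pointing at the
call/jmp instruction; a consumer (see the alternative.c patch later in this
series) walks it along the lines of:

  s32 *s;

  for (s = __retpoline_sites; s < __retpoline_sites_end; s++) {
	void *addr = (void *)s + *s;	/* the CALL/JMP/Jcc instruction */
	/* decode and patch the instruction at addr ... */
  }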

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/vmlinux.lds.S       |   14 +++
 tools/objtool/arch/x86/decode.c     |  120 --------------------------------
 tools/objtool/check.c               |  132 +++++++++++++++++++++++++-----------
 tools/objtool/elf.c                 |   84 ----------------------
 tools/objtool/include/objtool/elf.h |    1 
 tools/objtool/special.c             |    8 --
 6 files changed, 107 insertions(+), 252 deletions(-)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -272,6 +272,20 @@ SECTIONS
 		__parainstructions_end = .;
 	}
 
+#ifdef CONFIG_RETPOLINE
+	/*
+	 * List of instructions that call/jmp/jcc to retpoline thunks
+	 * __x86_indirect_thunk_*(). These instructions can be patched along
+	 * with alternatives, after which the section can be freed.
+	 */
+	. = ALIGN(8);
+	.retpoline_sites : AT(ADDR(.retpoline_sites) - LOAD_OFFSET) {
+		__retpoline_sites = .;
+		*(.retpoline_sites)
+		__retpoline_sites_end = .;
+	}
+#endif
+
 	/*
 	 * struct alt_inst entries. From the header (alternative.h):
 	 * "Alternative instructions for different CPU types or capabilities"
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -711,126 +711,6 @@ const char *arch_ret_insn(int len)
 	return ret[len-1];
 }
 
-/* asm/alternative.h ? */
-
-#define ALTINSTR_FLAG_INV	(1 << 15)
-#define ALT_NOT(feat)		((feat) | ALTINSTR_FLAG_INV)
-
-struct alt_instr {
-	s32 instr_offset;	/* original instruction */
-	s32 repl_offset;	/* offset to replacement instruction */
-	u16 cpuid;		/* cpuid bit set for replacement */
-	u8  instrlen;		/* length of original instruction */
-	u8  replacementlen;	/* length of new instruction */
-} __packed;
-
-static int elf_add_alternative(struct elf *elf,
-			       struct instruction *orig, struct symbol *sym,
-			       int cpuid, u8 orig_len, u8 repl_len)
-{
-	const int size = sizeof(struct alt_instr);
-	struct alt_instr *alt;
-	struct section *sec;
-	Elf_Scn *s;
-
-	sec = find_section_by_name(elf, ".altinstructions");
-	if (!sec) {
-		sec = elf_create_section(elf, ".altinstructions",
-					 SHF_ALLOC, 0, 0);
-
-		if (!sec) {
-			WARN_ELF("elf_create_section");
-			return -1;
-		}
-	}
-
-	s = elf_getscn(elf->elf, sec->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return -1;
-	}
-
-	sec->data = elf_newdata(s);
-	if (!sec->data) {
-		WARN_ELF("elf_newdata");
-		return -1;
-	}
-
-	sec->data->d_size = size;
-	sec->data->d_align = 1;
-
-	alt = sec->data->d_buf = malloc(size);
-	if (!sec->data->d_buf) {
-		perror("malloc");
-		return -1;
-	}
-	memset(sec->data->d_buf, 0, size);
-
-	if (elf_add_reloc_to_insn(elf, sec, sec->sh.sh_size,
-				  R_X86_64_PC32, orig->sec, orig->offset)) {
-		WARN("elf_create_reloc: alt_instr::instr_offset");
-		return -1;
-	}
-
-	if (elf_add_reloc(elf, sec, sec->sh.sh_size + 4,
-			  R_X86_64_PC32, sym, 0)) {
-		WARN("elf_create_reloc: alt_instr::repl_offset");
-		return -1;
-	}
-
-	alt->cpuid = bswap_if_needed(cpuid);
-	alt->instrlen = orig_len;
-	alt->replacementlen = repl_len;
-
-	sec->sh.sh_size += size;
-	sec->changed = true;
-
-	return 0;
-}
-
-#define X86_FEATURE_RETPOLINE                ( 7*32+12)
-
-int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct reloc *reloc;
-	struct symbol *sym;
-	char name[32] = "";
-
-	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
-
-		if (insn->type != INSN_JUMP_DYNAMIC &&
-		    insn->type != INSN_CALL_DYNAMIC)
-			continue;
-
-		if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
-			continue;
-
-		reloc = insn->reloc;
-
-		sprintf(name, "__x86_indirect_alt_%s_%s",
-			insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
-			reloc->sym->name + 21);
-
-		sym = find_symbol_by_name(file->elf, name);
-		if (!sym) {
-			sym = elf_create_undef_symbol(file->elf, name);
-			if (!sym) {
-				WARN("elf_create_undef_symbol");
-				return -1;
-			}
-		}
-
-		if (elf_add_alternative(file->elf, insn, sym,
-					ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5)) {
-			WARN("elf_add_alternative");
-			return -1;
-		}
-	}
-
-	return 0;
-}
-
 int arch_decode_hint_reg(u8 sp_reg, int *base)
 {
 	switch (sp_reg) {
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -683,6 +683,52 @@ static int create_static_call_sections(s
 	return 0;
 }
 
+static int create_retpoline_sites_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".retpoline_sites");
+	if (sec) {
+		WARN("file already has .retpoline_sites, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node)
+		idx++;
+
+	if (!idx)
+		return 0;
+
+	sec = elf_create_section(file->elf, ".retpoline_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .retpoline_sites");
+		return -1;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
+
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .retpoline_sites");
+			return -1;
+		}
+
+		idx++;
+	}
+
+	return 0;
+}
+
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1013,6 +1059,11 @@ static void annotate_call_site(struct ob
 		return;
 	}
 
+	if (sym->retpoline_thunk) {
+		list_add_tail(&insn->call_node, &file->retpoline_call_list);
+		return;
+	}
+
 	/*
 	 * Many compilers cannot disable KCOV with a function attribute
 	 * so they need a little help, NOP out any KCOV calls from noinstr
@@ -1073,6 +1124,39 @@ static void add_call_dest(struct objtool
 	annotate_call_site(file, insn, sibling);
 }
 
+static void add_retpoline_call(struct objtool_file *file, struct instruction *insn)
+{
+	/*
+	 * Retpoline calls/jumps are really dynamic calls/jumps in disguise,
+	 * so convert them accordingly.
+	 */
+	switch (insn->type) {
+	case INSN_CALL:
+		insn->type = INSN_CALL_DYNAMIC;
+		break;
+	case INSN_JUMP_UNCONDITIONAL:
+		insn->type = INSN_JUMP_DYNAMIC;
+		break;
+	case INSN_JUMP_CONDITIONAL:
+		insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
+		break;
+	default:
+		return;
+	}
+
+	insn->retpoline_safe = true;
+
+	/*
+	 * Whatever stack impact regular CALLs have, should be undone
+	 * by the RETURN of the called function.
+	 *
+	 * Annotated intra-function calls retain the stack_ops but
+	 * are converted to JUMP, see read_intra_function_calls().
+	 */
+	remove_insn_ops(insn);
+
+	annotate_call_site(file, insn, false);
+}
 /*
  * Find the destination instructions for all jumps.
  */
@@ -1095,19 +1179,7 @@ static int add_jump_destinations(struct
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
 		} else if (reloc->sym->retpoline_thunk) {
-			/*
-			 * Retpoline jumps are really dynamic jumps in
-			 * disguise, so convert them accordingly.
-			 */
-			if (insn->type == INSN_JUMP_UNCONDITIONAL)
-				insn->type = INSN_JUMP_DYNAMIC;
-			else
-				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			insn->retpoline_safe = true;
+			add_retpoline_call(file, insn);
 			continue;
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
@@ -1236,18 +1308,7 @@ static int add_call_destinations(struct
 			add_call_dest(file, insn, dest, false);
 
 		} else if (reloc->sym->retpoline_thunk) {
-			/*
-			 * Retpoline calls are really dynamic calls in
-			 * disguise, so convert them accordingly.
-			 */
-			insn->type = INSN_CALL_DYNAMIC;
-			insn->retpoline_safe = true;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			remove_insn_ops(insn);
-			continue;
+			add_retpoline_call(file, insn);
 
 		} else
 			add_call_dest(file, insn, reloc->sym, false);
@@ -1972,11 +2033,6 @@ static void mark_rodata(struct objtool_f
 	file->rodata = found;
 }
 
-__weak int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	return 0;
-}
-
 static int decode_sections(struct objtool_file *file)
 {
 	int ret;
@@ -2049,15 +2105,6 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
-	/*
-	 * Must be after add_special_section_alts(), since this will emit
-	 * alternatives. Must be after add_{jump,call}_destination(), since
-	 * those create the call insn lists.
-	 */
-	ret = arch_rewrite_retpolines(file);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -3460,6 +3507,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
 
+	if (retpoline) {
+		ret = create_retpoline_sites_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 	if (mcount) {
 		ret = create_mcount_loc_sections(file);
 		if (ret < 0)
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -741,90 +741,6 @@ static int elf_add_string(struct elf *el
 	return len;
 }
 
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
-{
-	struct section *symtab, *symtab_shndx;
-	struct symbol *sym;
-	Elf_Data *data;
-	Elf_Scn *s;
-
-	sym = malloc(sizeof(*sym));
-	if (!sym) {
-		perror("malloc");
-		return NULL;
-	}
-	memset(sym, 0, sizeof(*sym));
-
-	sym->name = strdup(name);
-
-	sym->sym.st_name = elf_add_string(elf, NULL, sym->name);
-	if (sym->sym.st_name == -1)
-		return NULL;
-
-	sym->sym.st_info = GELF_ST_INFO(STB_GLOBAL, STT_NOTYPE);
-	// st_other 0
-	// st_shndx 0
-	// st_value 0
-	// st_size 0
-
-	symtab = find_section_by_name(elf, ".symtab");
-	if (!symtab) {
-		WARN("can't find .symtab");
-		return NULL;
-	}
-
-	s = elf_getscn(elf->elf, symtab->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return NULL;
-	}
-
-	data = elf_newdata(s);
-	if (!data) {
-		WARN_ELF("elf_newdata");
-		return NULL;
-	}
-
-	data->d_buf = &sym->sym;
-	data->d_size = sizeof(sym->sym);
-	data->d_align = 1;
-	data->d_type = ELF_T_SYM;
-
-	sym->idx = symtab->sh.sh_size / sizeof(sym->sym);
-
-	symtab->sh.sh_size += data->d_size;
-	symtab->changed = true;
-
-	symtab_shndx = find_section_by_name(elf, ".symtab_shndx");
-	if (symtab_shndx) {
-		s = elf_getscn(elf->elf, symtab_shndx->idx);
-		if (!s) {
-			WARN_ELF("elf_getscn");
-			return NULL;
-		}
-
-		data = elf_newdata(s);
-		if (!data) {
-			WARN_ELF("elf_newdata");
-			return NULL;
-		}
-
-		data->d_buf = &sym->sym.st_size; /* conveniently 0 */
-		data->d_size = sizeof(Elf32_Word);
-		data->d_align = 4;
-		data->d_type = ELF_T_WORD;
-
-		symtab_shndx->sh.sh_size += 4;
-		symtab_shndx->changed = true;
-	}
-
-	sym->sec = find_section_by_index(elf, 0);
-
-	elf_add_symbol(elf, sym);
-
-	return sym;
-}
-
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -142,7 +142,6 @@ int elf_write_insn(struct elf *elf, stru
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
 int elf_write_reloc(struct elf *elf, struct reloc *reloc);
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name);
 int elf_write(struct elf *elf);
 void elf_close(struct elf *elf);
 
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -109,14 +109,6 @@ static int get_alt_entry(struct elf *elf
 			return -1;
 		}
 
-		/*
-		 * Skip retpoline .altinstr_replacement... we already rewrite the
-		 * instructions for retpolines anyway, see arch_is_retpoline()
-		 * usage in add_{call,jump}_destinations().
-		 */
-		if (arch_is_retpoline(new_reloc->sym))
-			return 1;
-
 		reloc_to_sec_off(new_reloc, &alt->new_sec, &alt->new_off);
 
 		/* _ASM_EXTABLE_EX hack */




* [PATCH v2 05/14] x86/retpoline: Remove unused replacement symbols
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 04/14] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 06/14] x86/asm: Fix register order Peter Zijlstra
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed; remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/lib/retpoline.S |   42 ------------------------------------------
 1 file changed, 42 deletions(-)

--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -41,36 +41,6 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 .endm
 
 /*
- * This generates .altinstr_replacement symbols for use by objtool. They,
- * however, must not actually live in .altinstr_replacement since that will be
- * discarded after init, but module alternatives will also reference these
- * symbols.
- *
- * Their names matches the "__x86_indirect_" prefix to mark them as retpolines.
- */
-.macro ALT_THUNK reg
-
-	.align 1
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	call	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_call_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_call_\reg)
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	jmp	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_jmp_\reg)
-
-.endm
-
-/*
  * Despite being an assembler file we can't just use .irp here
  * because __KSYM_DEPS__ only uses the C preprocessor and would
  * only see one instance of "__x86_indirect_thunk_\reg" rather
@@ -92,15 +62,3 @@ STACK_FRAME_NON_STANDARD(__x86_indirect_
 #undef GEN
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) ALT_THUNK reg
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_call_ ## reg)
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_jmp_ ## reg)
-#include <asm/GEN-for-each-reg.h>




* [PATCH v2 06/14] x86/asm: Fix register order
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 05/14] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-22 19:27   ` David Laight
  2021-10-25 14:09   ` Borislav Petkov
  2021-10-20 10:44 ` [PATCH v2 07/14] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.
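
With the thunks generated in machine (encoding) order, later patches can
recover the register number from a thunk address with plain pointer
arithmetic, roughly:

  reg = (target - &__x86_indirect_thunk_rax) /
	(&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);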

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/GEN-for-each-reg.h |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/GEN-for-each-reg.h
+++ b/arch/x86/include/asm/GEN-for-each-reg.h
@@ -1,11 +1,15 @@
+/*
+ * These are in machine order; things rely on that.
+ */
 #ifdef CONFIG_64BIT
 GEN(rax)
-GEN(rbx)
 GEN(rcx)
 GEN(rdx)
+GEN(rbx)
+GEN(rsp)
+GEN(rbp)
 GEN(rsi)
 GEN(rdi)
-GEN(rbp)
 GEN(r8)
 GEN(r9)
 GEN(r10)
@@ -16,10 +20,11 @@ GEN(r14)
 GEN(r15)
 #else
 GEN(eax)
-GEN(ebx)
 GEN(ecx)
 GEN(edx)
+GEN(ebx)
+GEN(esp)
+GEN(ebp)
 GEN(esi)
 GEN(edi)
-GEN(ebp)
 #endif




* [PATCH v2 07/14] x86/asm: Fixup odd GEN-for-each-reg.h usage
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 06/14] x86/asm: Fix register order Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
subsequent usage to start with #undef, which is rude.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |    6 +++---
 arch/x86/lib/retpoline.S              |    4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -19,19 +19,19 @@ extern void cmpxchg8b_emu(void);
 
 #ifdef CONFIG_RETPOLINE
 
-#undef GEN
 #define GEN(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
-
 #undef GEN
+
 #define GEN(reg) \
 	extern asmlinkage void __x86_indirect_alt_call_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
-
 #undef GEN
+
 #define GEN(reg) \
 	extern asmlinkage void __x86_indirect_alt_jmp_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
+#undef GEN
 
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -55,10 +55,10 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
-#undef GEN
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
-
 #undef GEN
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
+#undef GEN




* [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 07/14] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 15:57   ` Josh Poimboeuf
  2021-10-20 10:44 ` [PATCH v2 09/14] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Stick all the retpolines in a single symbol and have the individual
thunks as inner labels; this should guarantee thunk order and layout.
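
Together with the 32-byte alignment of each thunk this effectively gives
an array layout, roughly:

  __x86_indirect_thunk_<reg> == __x86_indirect_thunk_array + 32 * <machine reg number>

(assuming every thunk fits in its 32-byte slot).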

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/lib/retpoline.S |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -30,7 +30,7 @@
 
 	.align 32
 
-SYM_FUNC_START(__x86_indirect_thunk_\reg)
+SYM_INNER_LABEL(__x86_indirect_thunk_\reg, SYM_L_GLOBAL)
 
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
 		      __stringify(RETPOLINE \reg), X86_FEATURE_RETPOLINE, \
@@ -55,10 +55,16 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
+	.align 32
+SYM_CODE_START(__x86_indirect_thunk_array)
+
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+	.align 32
+SYM_CODE_END(__x86_indirect_thunk_array)
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 #undef GEN




* [PATCH v2 09/14] x86/alternative: Implement .retpoline_sites support
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (7 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 10/14] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.

This replaces objtool writing alternative entries to achieve the same,
and reaches feature parity with the previous approach.

One noteworthy detail is that it relies on the thunks being laid out in
machine (encoding) order so the register index can be computed.

Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang; a future patch will add support for those.
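
As a worked example of the encoding done by emit_indirect() below: opcode
0xff with ModRM reg field 2 is CALL, reg field 4 is JMP, mod=3 selects a
register operand, and r8-r15 need a REX.B prefix. That gives, for instance:

  call *%rax	-> ff d0
  jmp  *%rax	-> ff e0
  call *%r11	-> 41 ff d3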

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/um/kernel/um_arch.c           |    4 +
 arch/x86/include/asm/alternative.h |    1 
 arch/x86/kernel/alternative.c      |  142 +++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/module.c           |    9 ++
 4 files changed, 151 insertions(+), 5 deletions(-)

--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -421,6 +421,10 @@ void __init check_bugs(void)
 	os_check_bugs();
 }
 
+void apply_retpolines(s32 *start, s32 *end)
+{
+}
+
 void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
 }
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -75,6 +75,7 @@ extern int alternatives_patched;
 
 extern void alternative_instructions(void);
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
+extern void apply_retpolines(s32 *start, s32 *end);
 
 struct module;
 
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -29,6 +29,7 @@
 #include <asm/io.h>
 #include <asm/fixmap.h>
 #include <asm/paravirt.h>
+#include <asm/asm-prototypes.h>
 
 int __read_mostly alternatives_patched;
 
@@ -113,6 +114,7 @@ static void __init_or_module add_nops(vo
 	}
 }
 
+extern s32 __retpoline_sites[], __retpoline_sites_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -221,7 +223,7 @@ static __always_inline int optimize_nops
  * "noinline" to cause control flow change and thus invalidate I$ and
  * cause refetch after modification.
  */
-static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
+static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
 {
 	struct insn insn;
 	int i = 0;
@@ -239,11 +241,11 @@ static void __init_or_module noinline op
 		 * optimized.
 		 */
 		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
-			i += optimize_nops_range(instr, a->instrlen, i);
+			i += optimize_nops_range(instr, len, i);
 		else
 			i += insn.length;
 
-		if (i >= a->instrlen)
+		if (i >= len)
 			return;
 	}
 }
@@ -331,10 +333,136 @@ void __init_or_module noinline apply_alt
 		text_poke_early(instr, insn_buff, insn_buff_sz);
 
 next:
-		optimize_nops(a, instr);
+		optimize_nops(instr, a->instrlen);
 	}
 }
 
+	 * sometimes copied to the original instruction. For now, don't
+	 * annotate it. (In the future we might consider annotating the
+	 * original instruction if/when it ever makes sense to do so.)
+ * CALL/JMP *%\reg
+ */
+static int emit_indirect(int op, int reg, u8 *bytes)
+{
+	int i = 0;
+	u8 modrm;
+
+	switch (op) {
+	case CALL_INSN_OPCODE:
+		modrm = 0x10; /* Reg = 2; CALL r/m */
+		break;
+
+	case JMP32_INSN_OPCODE:
+		modrm = 0x20; /* Reg = 4; JMP r/m */
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		return -1;
+	}
+
+	if (reg >= 8) {
+		bytes[i++] = 0x41; /* REX.B prefix */
+		reg -= 8;
+	}
+
+	modrm |= 0xc0; /* Mod = 3 */
+	modrm += reg;
+
+	bytes[i++] = 0xff; /* opcode */
+	bytes[i++] = modrm;
+
+	return i;
+}
+
+/*
+ * Rewrite the compiler generated retpoline thunk calls.
+ *
+ * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
+ * indirect instructions, avoiding the extra indirection.
+ *
+ * For example, convert:
+ *
+ *   CALL __x86_indirect_thunk_\reg
+ *
+ * into:
+ *
+ *   CALL *%\reg
+ *
+ */
+static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
+{
+	void (*target)(void);
+	int reg, i = 0;
+
+	target = addr + insn->length + insn->immediate.value;
+	reg = (target - &__x86_indirect_thunk_rax) /
+	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
+
+	if (WARN_ON_ONCE(reg & ~0xf))
+		return -1;
+
+	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
+	BUG_ON(reg == 4);
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		return -1;
+
+	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
+	if (i < 0)
+		return i;
+
+	for (; i < insn->length;)
+		bytes[i++] = BYTES_NOP1;
+
+	return i;
+}
+
+/*
+ * Generated by 'objtool --retpoline'.
+ */
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++) {
+		void *addr = (void *)s + *s;
+		struct insn insn;
+		int len, ret;
+		u8 bytes[16];
+		u8 op1, op2;
+
+		ret = insn_decode_kernel(&insn, addr);
+		if (WARN_ON_ONCE(ret < 0))
+			continue;
+
+		op1 = insn.opcode.bytes[0];
+		op2 = insn.opcode.bytes[1];
+
+		switch (op1) {
+		case CALL_INSN_OPCODE:
+		case JMP32_INSN_OPCODE:
+			break;
+
+		default:
+			WARN_ON_ONCE(1);
+			continue;
+		}
+
+		len = patch_retpoline(addr, &insn, bytes);
+		if (len == insn.length) {
+			optimize_nops(bytes, len);
+			text_poke_early(addr, bytes, len);
+		}
+	}
+}
+
+#else /* !RETPOLINES || !CONFIG_STACK_VALIDATION */
+
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
+
 #ifdef CONFIG_SMP
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
@@ -643,6 +771,12 @@ void __init alternative_instructions(voi
 	apply_paravirt(__parainstructions, __parainstructions_end);
 
 	/*
+	 * Rewrite the retpolines, must be done before alternatives since
+	 * those can rewrite the retpoline thunks.
+	 */
+	apply_retpolines(__retpoline_sites, __retpoline_sites_end);
+
+	/*
 	 * Then patch alternatives, such that those paravirt calls that are in
 	 * alternatives can be overwritten by their immediate fragments.
 	 */
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -251,7 +251,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    struct module *me)
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
-		*para = NULL, *orc = NULL, *orc_ip = NULL;
+		*para = NULL, *orc = NULL, *orc_ip = NULL,
+		*retpolines = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -267,8 +268,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 			orc = s;
 		if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
 			orc_ip = s;
+		if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
+			retpolines = s;
 	}
 
+	if (retpolines) {
+		void *rseg = (void *)retpolines->sh_addr;
+		apply_retpolines(rseg, rseg + retpolines->sh_size);
+	}
 	if (alt) {
 		/* patch .altinstructions */
 		void *aseg = (void *)alt->sh_addr;




* [PATCH v2 10/14] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (8 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 09/14] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 11/14] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:

  Jcc __x86_indirect_thunk_\reg

For the !RETPOLINE case this can be rewritten to fit the original
(6-byte) instruction, like so:

  Jncc.d8	1f
  JMP		*%\reg
  NOP
1:
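
For instance (a sketch, before any later NOP optimization), the 6-byte

  je	__x86_indirect_thunk_rax	# 0f 84 <rel32>

would become:

  jne	1f				# 75 04
  jmp	*%rax				# ff e0
  nop; nop				# 90 90
1: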

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,7 +393,8 @@ static int emit_indirect(int op, int reg
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
 	void (*target)(void);
-	int reg, i = 0;
+	int reg, ret, i = 0;
+	u8 op, cc;
 
 	target = addr + insn->length + insn->immediate.value;
 	reg = (target - &__x86_indirect_thunk_rax) /
@@ -408,9 +409,34 @@ static int patch_retpoline(void *addr, s
 	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
 		return -1;
 
-	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
-	if (i < 0)
-		return i;
+	op = insn->opcode.bytes[0];
+
+	/*
+	 * Convert:
+	 *
+	 *   Jcc.d32 __x86_indirect_thunk_\reg
+	 *
+	 * into:
+	 *
+	 *   Jncc.d8 1f
+	 *   jmp *%\reg
+	 *   nop
+	 * 1:
+	 */
+	if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
+		cc = insn->opcode.bytes[1] & 0xf;
+		cc ^= 1; /* invert condition */
+
+		bytes[i++] = 0x70 + cc; /* Jcc.d8 */
+		bytes[i++] = insn->length - 2;
+
+		op = JMP32_INSN_OPCODE;
+	}
+
+	ret = emit_indirect(op, reg, bytes + i);
+	if (ret < 0)
+		return ret;
+	i += ret;
 
 	for (; i < insn->length;)
 		bytes[i++] = BYTES_NOP1;
@@ -444,6 +470,10 @@ void __init_or_module noinline apply_ret
 		case JMP32_INSN_OPCODE:
 			break;
 
+		case 0x0f: /* escape */
+			if (op2 >= 0x80 && op2 <= 0x8f)
+				break;
+			fallthrough;
 		default:
 			WARN_ON_ONCE(1);
 			continue;




* [PATCH v2 11/14] x86/alternative: Try inline spectre_v2=retpoline,amd
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (9 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 10/14] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 12/14] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Try and replace retpoline thunk calls with:

  lfence
  call    *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers (which need a REX prefix). This
means that unless the compiler pads call sites that use the higher
registers, the replacement will not fit and will fail for those.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.
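
For example, assuming the usual 5-byte thunk call site (e8 <rel32>):

  lfence; call *%rax	# 0f ae e8 ff d0	(5 bytes, fits)
  lfence; call *%r11	# 0f ae e8 41 ff d3	(6 bytes, does not fit)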

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -389,12 +389,13 @@ static int emit_indirect(int op, int reg
  *
  *   CALL *%\reg
  *
+ * It also tries to inline spectre_v2=retpoline,amd when size permits.
  */
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
+	u8 cc, op = insn->opcode.bytes[0];
 	void (*target)(void);
 	int reg, ret, i = 0;
-	u8 op, cc;
 
 	target = addr + insn->length + insn->immediate.value;
 	reg = (target - &__x86_indirect_thunk_rax) /
@@ -406,11 +407,23 @@ static int patch_retpoline(void *addr, s
 	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
 	BUG_ON(reg == 4);
 
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		/*
+		 * Can't do anything about the Jcc case here.
+		 */
+		if (op != JMP32_INSN_OPCODE && op != CALL_INSN_OPCODE)
+			return -1;
+
+		bytes[i++] = 0x0f;
+		bytes[i++] = 0xae;
+		bytes[i++] = 0xe8; /* lfence */
+
+		goto indirect;
+	}
+
 	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
 		return -1;
 
-	op = insn->opcode.bytes[0];
-
 	/*
 	 * Convert:
 	 *
@@ -433,6 +446,7 @@ static int patch_retpoline(void *addr, s
 		op = JMP32_INSN_OPCODE;
 	}
 
+indirect:
 	ret = emit_indirect(op, reg, bytes + i);
 	if (ret < 0)
 		return ret;




* [PATCH v2 12/14] x86/alternative: Add debug prints to apply_retpolines()
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (10 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 11/14] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 13/14] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

 [ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
 [ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
 [ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
 [ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -487,9 +487,15 @@ void __init_or_module noinline apply_ret
 			continue;
 		}
 
+		DPRINTK("retpoline at: %pS (%px) len: %d to: %pS",
+			addr, addr, insn.length,
+			addr + insn.length + insn.immediate.value);
+
 		len = patch_retpoline(addr, &insn, bytes);
 		if (len == insn.length) {
 			optimize_nops(bytes, len);
+			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
+			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
 			text_poke_early(addr, bytes, len);
 		}
 	}




* [PATCH v2 13/14] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (11 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 12/14] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 10:44 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Currently Linux prevents usage of retpoline,amd on !AMD hardware; this
is unfriendly and gets in the way of testing. Remove this restriction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/cpu/bugs.c |    7 -------
 1 file changed, 7 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -882,13 +882,6 @@ static enum spectre_v2_mitigation_cmd __
 		return SPECTRE_V2_CMD_AUTO;
 	}
 
-	if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
-		pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
-		return SPECTRE_V2_CMD_AUTO;
-	}
-
 	spec_v2_print_cond(mitigation_options[i].option,
 			   mitigation_options[i].secure);
 	return cmd;




* [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (12 preceding siblings ...)
  2021-10-20 10:44 ` [PATCH v2 13/14] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-20 10:44 ` Peter Zijlstra
  2021-10-20 11:09   ` Peter Zijlstra
  2021-10-21  0:07   ` Alexei Starovoitov
  13 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 10:44 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers

Current BPF codegen doesn't respect the X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call; this is sub-optimal and doesn't
match the regular, compiler-generated code.
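
With this patch the JIT emits, per mode (mirroring what apply_retpolines()
does for compiler-generated code; using the tail-call's rcx as an example):

  spectre_v2=off:	jmp *%rcx
  retpoline,amd:	lfence; jmp *%rcx
  retpoline:		jmp __x86_indirect_thunk_rcx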

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/nospec-branch.h |   59 -----------------------------
 arch/x86/net/bpf_jit_comp.c          |   71 ++++++++++++++++++++---------------
 arch/x86/net/bpf_jit_comp32.c        |   22 ++++++++--
 3 files changed, 59 insertions(+), 93 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -303,63 +303,4 @@ static inline void mds_idle_clear_cpu_bu
 
 #endif /* __ASSEMBLY__ */
 
-/*
- * Below is used in the eBPF JIT compiler and emits the byte sequence
- * for the following assembly:
- *
- * With retpolines configured:
- *
- *    callq do_rop
- *  spec_trap:
- *    pause
- *    lfence
- *    jmp spec_trap
- *  do_rop:
- *    mov %rcx,(%rsp) for x86_64
- *    mov %edx,(%esp) for x86_32
- *    retq
- *
- * Without retpolines configured:
- *
- *    jmp *%rcx for x86_64
- *    jmp *%edx for x86_32
- */
-#ifdef CONFIG_RETPOLINE
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	17
-#  define RETPOLINE_RCX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT4(0x48, 0x89, 0x0C, 0x24); /* mov %rcx,(%rsp) */	\
-	EMIT1(0xC3);             /* retq */			\
-} while (0)
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* call do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */		\
-	EMIT1(0xC3);             /* ret */			\
-} while (0)
-# endif
-#else /* !CONFIG_RETPOLINE */
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	2
-#  define RETPOLINE_RCX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE1);       /* jmp *%rcx */
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE2)        /* jmp *%edx */
-# endif
-#endif
-
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -396,6 +396,37 @@ static int get_pop_bytes(bool *callee_re
 	return bytes;
 }
 
+#define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
+
+#ifdef CONFIG_RETPOLINE
+#define INDIRECT_SIZE (2 + 3*cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+#else
+#define INDIRECT_SIZE (2)
+#endif
+
+static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+{
+	u8 *prog = *pprog;
+
+#ifdef CONFIG_RETPOLINE
+	static void * const reg_thunk[] = {
+#define GEN(reg) __x86_indirect_thunk_ ## reg,
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+	};
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		EMIT_LFENCE();
+		EMIT2(0xFF, 0xE0 + reg);
+	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+		emit_jump(&prog, reg_thunk[reg], ip);
+	} else
+#endif
+	EMIT2(0xFF, 0xE0 + reg);
+
+	*pprog = prog;
+}
+
 /*
  * Generate the following code:
  *
@@ -411,10 +442,10 @@ static int get_pop_bytes(bool *callee_re
  * out:
  */
 static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
-					u32 stack_depth)
+					u32 stack_depth, u8 *ip)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
+	u8 *prog = *pprog, *start = *pprog;
 	int pop_bytes = 0;
 	int off1 = 42;
 	int off2 = 31;
@@ -448,7 +479,7 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
+#define OFFSET1 (off1 + INDIRECT_SIZE) /* Number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 
 	/*
@@ -457,7 +488,7 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
+#define OFFSET2 (off2 + INDIRECT_SIZE)
 	EMIT2(X86_JA, OFFSET2);                   /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
@@ -471,7 +502,7 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
+#define OFFSET3 (off3 + INDIRECT_SIZE)
 	EMIT2(X86_JE, OFFSET3);                   /* je out */
 
 	*pprog = prog;
@@ -493,7 +524,7 @@ static void emit_bpf_tail_call_indirect(
 	 * rdi == ctx (1st arg)
 	 * rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
 	 */
-	RETPOLINE_RCX_BPF_JIT();
+	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
 
 	/* out: */
 	*pprog = prog;
@@ -1220,8 +1251,7 @@ static int do_jit(struct bpf_prog *bpf_p
 			/* speculation barrier */
 		case BPF_ST | BPF_NOSPEC:
 			if (boot_cpu_has(X86_FEATURE_XMM2))
-				/* Emit 'lfence' */
-				EMIT3(0x0F, 0xAE, 0xE8);
+				EMIT_LFENCE();
 			break;
 
 			/* ST: *(u8*)(dst_reg + off) = imm */
@@ -1411,7 +1441,8 @@ st:			if (is_imm8(insn->off))
 			else
 				emit_bpf_tail_call_indirect(&prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth);
+							    bpf_prog->aux->stack_depth,
+							    image + addrs[i - 1]);
 			break;
 
 			/* cond jump */
@@ -2117,24 +2148,6 @@ int arch_prepare_bpf_trampoline(struct b
 	return ret;
 }
 
-static int emit_fallback_jump(u8 **pprog)
-{
-	u8 *prog = *pprog;
-	int err = 0;
-
-#ifdef CONFIG_RETPOLINE
-	/* Note that this assumes the the compiler uses external
-	 * thunks for indirect calls. Both clang and GCC use the same
-	 * naming convention for external thunks.
-	 */
-	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
-#else
-	EMIT2(0xFF, 0xE2);	/* jmp rdx */
-#endif
-	*pprog = prog;
-	return err;
-}
-
 static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
 {
 	u8 *jg_reloc, *prog = *pprog;
@@ -2156,9 +2169,7 @@ static int emit_bpf_dispatcher(u8 **ppro
 		if (err)
 			return err;
 
-		err = emit_fallback_jump(&prog);	/* jmp thunk/indirect */
-		if (err)
-			return err;
+		emit_indirect_jump(&prog, 2 /* rdx */, prog);
 
 		*pprog = prog;
 		return 0;
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -15,6 +15,7 @@
 #include <asm/cacheflush.h>
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
+#include <asm/asm-prototypes.h>
 #include <linux/bpf.h>
 
 /*
@@ -1267,6 +1268,19 @@ static void emit_epilogue(u8 **pprog, u3
 	*pprog = prog;
 }
 
+static void emit_jmp_edx(u8 **pprog, u8 *ip)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+#ifdef CONFIG_RETPOLINE
+	EMIT1_off32(0xE9, (u8 *)__x86_indirect_thunk_edx - (ip + 5));
+#else
+	EMIT2(0xFF, 0xE2);
+#endif
+	*pprog = prog;
+}
+
 /*
  * Generate the following code:
  * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
@@ -1280,9 +1294,9 @@ static void emit_epilogue(u8 **pprog, u3
  *   goto *(prog->bpf_func + prologue_size);
  * out:
  */
-static void emit_bpf_tail_call(u8 **pprog)
+static void emit_bpf_tail_call(u8 **pprog, u8 *ip)
 {
-	u8 *prog = *pprog;
+	u8 *prog = *pprog, *start = *pprog;
 	int cnt = 0;
 	const u8 *r1 = bpf2ia32[BPF_REG_1];
 	const u8 *r2 = bpf2ia32[BPF_REG_2];
@@ -1362,7 +1376,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * eax == ctx (1st arg)
 	 * edx == prog->bpf_func + prologue_size
 	 */
-	RETPOLINE_EDX_BPF_JIT();
+	emit_jmp_edx(&prog, ip + (prog - start));
 
 	if (jmp_label1 == -1)
 		jmp_label1 = cnt;
@@ -2122,7 +2136,7 @@ static int do_jit(struct bpf_prog *bpf_p
 			break;
 		}
 		case BPF_JMP | BPF_TAIL_CALL:
-			emit_bpf_tail_call(&prog);
+			emit_bpf_tail_call(&prog, image + addrs[i - 1]);
 			break;
 
 		/* cond jump */



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-20 10:44 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
@ 2021-10-20 11:09   ` Peter Zijlstra
  2021-10-20 16:56     ` Josh Poimboeuf
  2021-10-21  0:05     ` Alexei Starovoitov
  2021-10-21  0:07   ` Alexei Starovoitov
  1 sibling, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 11:09 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c

> +#ifdef CONFIG_RETPOLINE
> +#define INDIRECT_SIZE (2 + 3*cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> +#else
> +#define INDIRECT_SIZE (2)
> +#endif

> @@ -411,10 +442,10 @@ static int get_pop_bytes(bool *callee_re
>   * out:
>   */
>  static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
> -					u32 stack_depth)
> +					u32 stack_depth, u8 *ip)
>  {
>  	int tcc_off = -4 - round_up(stack_depth, 8);
> -	u8 *prog = *pprog;
> +	u8 *prog = *pprog, *start = *pprog;
>  	int pop_bytes = 0;
>  	int off1 = 42;
>  	int off2 = 31;
> @@ -448,7 +479,7 @@ static void emit_bpf_tail_call_indirect(
>  	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
>  	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
>  	      offsetof(struct bpf_array, map.max_entries));
> -#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
> +#define OFFSET1 (off1 + INDIRECT_SIZE) /* Number of bytes to jump */
>  	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
>  
>  	/*
> @@ -457,7 +488,7 @@ static void emit_bpf_tail_call_indirect(
>  	 */
>  	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
>  	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
> -#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
> +#define OFFSET2 (off2 + INDIRECT_SIZE)
>  	EMIT2(X86_JA, OFFSET2);                   /* ja out */
>  	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
>  	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
> @@ -471,7 +502,7 @@ static void emit_bpf_tail_call_indirect(
>  	 *	goto out;
>  	 */
>  	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
> -#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
> +#define OFFSET3 (off3 + INDIRECT_SIZE)
>  	EMIT2(X86_JE, OFFSET3);                   /* je out */
>  
>  	*pprog = prog;
> @@ -493,7 +524,7 @@ static void emit_bpf_tail_call_indirect(
>  	 * rdi == ctx (1st arg)
>  	 * rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
>  	 */
> -	RETPOLINE_RCX_BPF_JIT();
> +	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
>  
>  	/* out: */
>  	*pprog = prog;

Alexei; could the above not be further improved with something like the
below?

Despite several hours trying and Song helping, I can't seem to run
anything bpf, that stuff is cursed. So I've no idea if the below
actually works, but it seems reasonable.

---

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -398,12 +398,6 @@ static int get_pop_bytes(bool *callee_re
 
 #define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
 
-#ifdef CONFIG_RETPOLINE
-#define INDIRECT_SIZE (2 + 3*cpu_feature_enabled(X86_FEATURE_RETPOLINE))
-#else
-#define INDIRECT_SIZE (2)
-#endif
-
 static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
 {
 	u8 *prog = *pprog;
@@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
 	u8 *prog = *pprog, *start = *pprog;
-	int pop_bytes = 0;
-	int off1 = 42;
-	int off2 = 31;
-	int off3 = 9;
-
-	/* count the additional bytes used for popping callee regs from stack
-	 * that need to be taken into account for each of the offsets that
-	 * are used for bailing out of the tail call
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-	off2 += pop_bytes;
-	off3 += pop_bytes;
-
-	if (stack_depth) {
-		off1 += 7;
-		off2 += 7;
-		off3 += 7;
-	}
+	static int out_label = -1;
+	int offset;
 
 	/*
 	 * rdi - pointer to ctx
@@ -479,8 +456,9 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + INDIRECT_SIZE) /* Number of bytes to jump */
-	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
+
+	offset = out_label - (prog - start) + 2;
+	EMIT2(X86_JBE, offset);                  /* jbe out */
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -488,8 +466,9 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + INDIRECT_SIZE)
-	EMIT2(X86_JA, OFFSET2);                   /* ja out */
+
+	offset = out_label - (prog - start) + 2;
+	EMIT2(X86_JA, offset);                   /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
@@ -502,8 +481,9 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + INDIRECT_SIZE)
-	EMIT2(X86_JE, OFFSET3);                   /* je out */
+
+	offset = out_label - (prog - start) + 2;
+	EMIT2(X86_JE, offset);                   /* je out */
 
 	*pprog = prog;
 	pop_callee_regs(pprog, callee_regs_used);
@@ -527,6 +507,8 @@ static void emit_bpf_tail_call_indirect(
 	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
 
 	/* out: */
+	out_label = (prog - start);
+
 	*pprog = prog;
 }
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/14] objtool: Tag retpoline thunk symbols
  2021-10-20 10:44 ` [PATCH v2 01/14] objtool: Tag retpoline thunk symbols Peter Zijlstra
@ 2021-10-20 15:17   ` Josh Poimboeuf
  2021-10-26  7:55     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-20 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 12:44:43PM +0200, Peter Zijlstra wrote:
> In order to avoid calling arch_is_retpoline() (which does a strcmp)
> over and over on symbols, tag them once upfront.
> 
> XXX do we also want to do __fentry__ ?

Might as well.

> XXX do we want an enum instead of a bunch of bools ?

Maybe, or convert the bools to bit fields.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 10:44 ` [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
@ 2021-10-20 15:57   ` Josh Poimboeuf
  2021-10-20 16:46     ` Andrew Cooper
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-20 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 12:44:50PM +0200, Peter Zijlstra wrote:
> Stick all the retpolines in a single symbol and have the individual
> thunks as inner labels, this should guarantee thunk order and layout.

How so?

Just wondering what the purpose of the array is.  It doesn't seem to be
referenced anywhere.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 15:57   ` Josh Poimboeuf
@ 2021-10-20 16:46     ` Andrew Cooper
  2021-10-20 17:09       ` Josh Poimboeuf
  2021-10-20 19:34       ` Peter Zijlstra
  0 siblings, 2 replies; 45+ messages in thread
From: Andrew Cooper @ 2021-10-20 16:46 UTC (permalink / raw)
  To: Josh Poimboeuf, Peter Zijlstra
  Cc: x86, linux-kernel, alexei.starovoitov, ndesaulniers, Andrew Cooper

On 20/10/2021 16:57, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 12:44:50PM +0200, Peter Zijlstra wrote:
>> Stick all the retpolines in a single symbol and have the individual
>> thunks as inner labels, this should guarantee thunk order and layout.
> How so?
>
> Just wondering what the purpose of the array is.  It doesn't seem to be
> referenced anywhere.

The array property is what makes:

> +	reg = (target - &__x86_indirect_thunk_rax) /
> +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);

safe in the next patch.

~Andrew

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-20 11:09   ` Peter Zijlstra
@ 2021-10-20 16:56     ` Josh Poimboeuf
  2021-10-20 19:23       ` Peter Zijlstra
  2021-10-21  0:05     ` Alexei Starovoitov
  1 sibling, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-20 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> Alexei; could the above not be further improved with something like the
> below?
> 
> Despite several hours trying and Song helping, I can't seem to run
> anything bpf, that stuff is cursed. So I've no idea if the below
> actually works, but it seems reasonable.

The below fix gets it to run with test_verifier.

I do like it, it does seem less fiddly and more robust against future
changes, though it probably needs a comment for 'out_label' clarifying
it only works because this function is always called at least twice for
a given bpf_tail_call emission.

---

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 08f32c9fceaa..c9230c5bbca5 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -457,7 +457,7 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
 
-	offset = out_label - (prog - start) + 2;
+	offset = out_label - (prog - start + 2);
 	EMIT2(X86_JBE, offset);                  /* jbe out */
 
 	/*
@@ -467,7 +467,7 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
 
-	offset = out_label - (prog - start) + 2;
+	offset = out_label - (prog - start + 2);
 	EMIT2(X86_JA, offset);                   /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
@@ -482,7 +482,7 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
 
-	offset = out_label - (prog - start) + 2;
+	offset = out_label - (prog - start + 2);
 	EMIT2(X86_JE, offset);                   /* je out */
 
 	*pprog = prog;


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 16:46     ` Andrew Cooper
@ 2021-10-20 17:09       ` Josh Poimboeuf
  2021-10-20 19:22         ` Peter Zijlstra
  2021-10-20 19:34       ` Peter Zijlstra
  1 sibling, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-20 17:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Peter Zijlstra, x86, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 05:46:39PM +0100, Andrew Cooper wrote:
> On 20/10/2021 16:57, Josh Poimboeuf wrote:
> > On Wed, Oct 20, 2021 at 12:44:50PM +0200, Peter Zijlstra wrote:
> >> Stick all the retpolines in a single symbol and have the individual
> >> thunks as inner labels, this should guarantee thunk order and layout.
> > How so?
> >
> > Just wondering what the purpose of the array is.  It doesn't seem to be
> > referenced anywhere.
> 
> The array property is what makes:
> 
> > +	reg = (target - &__x86_indirect_thunk_rax) /
> > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> 
> safe in the next patch.

The thunks were already 32-byte aligned.  I don't see how slapping a few
unused symbols around them does anything.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 17:09       ` Josh Poimboeuf
@ 2021-10-20 19:22         ` Peter Zijlstra
  2021-10-20 19:43           ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 19:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andrew Cooper, x86, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 10:09:56AM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 05:46:39PM +0100, Andrew Cooper wrote:
> > On 20/10/2021 16:57, Josh Poimboeuf wrote:
> > > On Wed, Oct 20, 2021 at 12:44:50PM +0200, Peter Zijlstra wrote:
> > >> Stick all the retpolines in a single symbol and have the individual
> > >> thunks as inner labels, this should guarantee thunk order and layout.
> > > How so?
> > >
> > > Just wondering what the purpose of the array is.  It doesn't seem to be
> > > referenced anywhere.
> > 
> > The array property is what makes:
> > 
> > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > 
> > safe in the next path.
> 
> The thunks were already 32-byte aligned.  I don't see how slapping a few
> unused symbols around them does anything.

Previously there were 16 (or rather 15 without rsp) separate symbols and
a toolchain might reasonably expect it could displace them however it
liked, with disregard for the relative position.

However, now they're part of a larger symbol. Any change to their
relative position would disrupt this larger _array symbol and thus not
be sound.

This is I think the same reasoning used for data symbols. On their own
there is no guarantee about their relative position wrt one another,
but we're still able to do arrays because an array as a whole is a
single larger symbol.
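
In C terms the analogy is something like the below (purely illustrative,
not kernel code): pointer arithmetic between independent objects means
nothing, but indexing within a single object is well defined.

long a, b, c;		/* separate symbols: &c - &a says nothing useful    */

long arr[3];		/* one symbol: elements are contiguous by definition */

static int index_of(long *p)
{
	return p - arr;	/* only valid because arr is one larger object */
}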

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-20 16:56     ` Josh Poimboeuf
@ 2021-10-20 19:23       ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 19:23 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 09:56:55AM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > Alexei; could the above not be further improved with something like the
> > below?
> > 
> > Despite several hours trying and Song helping, I can't seem to run
> > anything bpf, that stuff is cursed. So I've no idea if the below
> > actually works, but it seems reasonable.
> 
> The below fix gets it to run with test_verifier.

Argh, I'm fast running out of brown paper bags at this rate :/

> 
> I do like it, it does seem less fiddly and more robust against future
> changes, though it probably needs a comment for 'out_label' clarifying
> it only works because this function is always called at least twice for
> a given bpf_tail_call emission.

Right.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 16:46     ` Andrew Cooper
  2021-10-20 17:09       ` Josh Poimboeuf
@ 2021-10-20 19:34       ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-20 19:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Josh Poimboeuf, x86, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 05:46:39PM +0100, Andrew Cooper wrote:
> On 20/10/2021 16:57, Josh Poimboeuf wrote:
> > On Wed, Oct 20, 2021 at 12:44:50PM +0200, Peter Zijlstra wrote:
> >> Stick all the retpolines in a single symbol and have the individual
> >> thunks as inner labels, this should guarantee thunk order and layout.
> > How so?
> >
> > Just wondering what the purpose of the array is.  It doesn't seem to be
> > referenced anywhere.
> 
> The array property is what makes:
> 
> > +	reg = (target - &__x86_indirect_thunk_rax) /
> > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> 
> safe in the next patch.

Arguably I should probably write that like:

	(target - &__x86_indirect_thunk_array) / __x86_indirect_thunk_size;

or something, and also generate and expose that latter symbol from .S.
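
Roughly, a sketch of what the consumer side could then look like (the 16
slots and the fixed 32 byte stride are assumptions here, just to show the
idea):

#define RETPOLINE_THUNK_STRIDE	32	/* assumed spacing between thunks */

extern unsigned char __x86_indirect_thunk_array[];	/* array base symbol */

/* Map a call/jmp target back to the register number it thunks, or -1. */
static int thunk_reg(const unsigned char *target)
{
	unsigned long off = target - __x86_indirect_thunk_array;

	/* assuming 16 slots, rax..r15 */
	if (off >= 16 * RETPOLINE_THUNK_STRIDE || off % RETPOLINE_THUNK_STRIDE)
		return -1;	/* not a thunk entry point */

	return off / RETPOLINE_THUNK_STRIDE;
}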

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array
  2021-10-20 19:22         ` Peter Zijlstra
@ 2021-10-20 19:43           ` Josh Poimboeuf
  0 siblings, 0 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-20 19:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Cooper, x86, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 09:22:29PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 20, 2021 at 10:09:56AM -0700, Josh Poimboeuf wrote:
> > On Wed, Oct 20, 2021 at 05:46:39PM +0100, Andrew Cooper wrote:
> > > On 20/10/2021 16:57, Josh Poimboeuf wrote:
> > > > On Wed, Oct 20, 2021 at 12:44:50PM +0200, Peter Zijlstra wrote:
> > > >> Stick all the retpolines in a single symbol and have the individual
> > > >> thunks as inner labels, this should guarantee thunk order and layout.
> > > > How so?
> > > >
> > > > Just wondering what the purpose of the array is.  It doesn't seem to be
> > > > referenced anywhere.
> > > 
> > > The array property is what makes:
> > > 
> > > > +	reg = (target - &__x86_indirect_thunk_rax) /
> > > > +	      (&__x86_indirect_thunk_rcx - &__x86_indirect_thunk_rax);
> > > 
> > > safe in the next patch.
> > 
> > The thunks were already 32-byte aligned.  I don't see how slapping a few
> > unused symbols around them does anything.
> 
> Previously there were 16 (or rather 15 without rsp) separate symbols and
> a toolchain might reasonably expect it could displace them however it
> liked, with disregard for the relative position.
> 
> However, now they're part of a larger symbol. Any change to their
> relative position would disrupt this larger _array symbol and thus not
> be sound.
> 
> This is I think the same reasoning used for data symbols. On their own
> there is no guarantee about their relative position wrt one another,
> but we're still able to do arrays because an array as a whole is a
> single larger symbol.

Makes sense, I think (and good fodder for the commit log).

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-20 11:09   ` Peter Zijlstra
  2021-10-20 16:56     ` Josh Poimboeuf
@ 2021-10-21  0:05     ` Alexei Starovoitov
  2021-10-21  8:47       ` Peter Zijlstra
  1 sibling, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2021-10-21  0:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers,
	daniel, bpf, andrii

On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > -	RETPOLINE_RCX_BPF_JIT();
> > +	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
> >  
> >  	/* out: */
> >  	*pprog = prog;
> 
> Alexei; could the above not be further improved with something like the
> below?

sorry for the delay. I was traveling last week
and Daniel is on PTO this week.

> Despite several hours trying and Song helping, I can't seem to run
> anything bpf, that stuff is cursed. So I've no idea if the below
> actually works, but it seems reasonable.

It's certainly delicate.

> @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
>  {
>  	int tcc_off = -4 - round_up(stack_depth, 8);
>  	u8 *prog = *pprog, *start = *pprog;
> -	int pop_bytes = 0;
> -	int off1 = 42;
> -	int off2 = 31;
> -	int off3 = 9;
> -
> -	/* count the additional bytes used for popping callee regs from stack
> -	 * that need to be taken into account for each of the offsets that
> -	 * are used for bailing out of the tail call
> -	 */
> -	pop_bytes = get_pop_bytes(callee_regs_used);
> -	off1 += pop_bytes;
> -	off2 += pop_bytes;
> -	off3 += pop_bytes;
> -
> -	if (stack_depth) {
> -		off1 += 7;
> -		off2 += 7;
> -		off3 += 7;
> -	}
> +	static int out_label = -1;

Interesting idea!
All insn emits trying to do the right thing from the start.
Here the logic assumes that there will be at least two passes over image.
I think that is correct, but we never had such assumption.
A comment is certainly must have.
The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
are really warranted though. Might be overkill.

Nice that Josh's test_verifier is passing, but it doesn't provide
a ton of coverage. test_progs has a lot more.
Once you have a git branch with all the changes I can give it a go.
Also you can rely on our BPF CI.
Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
In patchwork there will be "bpf/vmtest-bpf-next" link that
builds kernel, selftests and runs everything.
It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
clang nightly and other deps like pahole.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-20 10:44 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
  2021-10-20 11:09   ` Peter Zijlstra
@ 2021-10-21  0:07   ` Alexei Starovoitov
  2021-10-21  0:18     ` Josh Poimboeuf
  1 sibling, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2021-10-21  0:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, bpf, daniel

On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> +		EMIT_LFENCE();
> +		EMIT2(0xFF, 0xE0 + reg);
> +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> +		emit_jump(&prog, reg_thunk[reg], ip);
> +	} else

One more question.
What's the deal with AMD? I thought retpoline was effective on it as well.
lfence is an optimization or retpoline turned out to be not enough
in some cases?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:07   ` Alexei Starovoitov
@ 2021-10-21  0:18     ` Josh Poimboeuf
  2021-10-21  8:53       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-21  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, x86, andrew.cooper3, linux-kernel, ndesaulniers,
	bpf, daniel

On Wed, Oct 20, 2021 at 05:07:53PM -0700, Alexei Starovoitov wrote:
> On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > +		EMIT_LFENCE();
> > +		EMIT2(0xFF, 0xE0 + reg);
> > +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > +		emit_jump(&prog, reg_thunk[reg], ip);
> > +	} else
> 
> One more question.
> What's the deal with AMD? I thought retpoline was effective on it as well.
> lfence is an optimization or retpoline turned out to be not enough
> in some cases?

Yes, it's basically an optimization.  AMD recommends it presumably
because it's quite a bit faster than a retpoline.

According to AMD it shrinks the speculative execution window enough so
that Spectre v2 isn't a threat.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:05     ` Alexei Starovoitov
@ 2021-10-21  8:47       ` Peter Zijlstra
  2021-10-21 18:03         ` Alexei Starovoitov
  2021-10-21 23:51         ` Zvi Effron
  0 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-21  8:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers,
	daniel, bpf, andrii

On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:

> > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> >  {
> >  	int tcc_off = -4 - round_up(stack_depth, 8);
> >  	u8 *prog = *pprog, *start = *pprog;
> > -	int pop_bytes = 0;
> > -	int off1 = 42;
> > -	int off2 = 31;
> > -	int off3 = 9;
> > -
> > -	/* count the additional bytes used for popping callee regs from stack
> > -	 * that need to be taken into account for each of the offsets that
> > -	 * are used for bailing out of the tail call
> > -	 */
> > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > -	off1 += pop_bytes;
> > -	off2 += pop_bytes;
> > -	off3 += pop_bytes;
> > -
> > -	if (stack_depth) {
> > -		off1 += 7;
> > -		off2 += 7;
> > -		off3 += 7;
> > -	}
> > +	static int out_label = -1;
> 
> Interesting idea!

I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
lot more robust than the 64bit one and I couldn't figure out why the
difference.

> All insn emits trying to do the right thing from the start.
> Here the logic assumes that there will be at least two passes over image.
> I think that is correct, but we never had such assumption.

That's not exactly true; I think image is NULL on every first run, so
all insn that depend on it will be wrong to start with. Equally there's
a number of insn that seem to depend on addrs[i], that also requires at
least two passes.

> A comment is certainly must have.

I can certainly add one, although I think we'll disagree on the comment
style :-)

> The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> are really warranted though. Might be overkill.

Is there concurrency on the jit?

> Once you have a git branch with all the changes I can give it a go.

Ok, I'll go polish this thing and stick it in the tree mentioned in the
cover letter.

> Also you can rely on our BPF CI.
> Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> In patchwork there will be "bpf/vmtest-bpf-next" link that
> builds kernel, selftests and runs everything.

What's a patchwork and where do I find it?

> It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> clang nightly and other deps like pahole.

nice.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:18     ` Josh Poimboeuf
@ 2021-10-21  8:53       ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-21  8:53 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Alexei Starovoitov, x86, andrew.cooper3, linux-kernel,
	ndesaulniers, bpf, daniel

On Wed, Oct 20, 2021 at 05:18:08PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 05:07:53PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > > +		EMIT_LFENCE();
> > > +		EMIT2(0xFF, 0xE0 + reg);
> > > +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > > +		emit_jump(&prog, reg_thunk[reg], ip);
> > > +	} else
> > 
> > One more question.
> > What's the deal with AMD? I thought retpoline was effective on it as well.
> > lfence is an optimization or retpoline turned out to be not enough
> > in some cases?
> 
> Yes, it's basically an optimization.  AMD recommends it presumably
> because it's quite a bit faster than a retpoline.
> 
> According to AMD it shrinks the speculative execution window enough so
> that Spectre v2 isn't a threat.

Right, also note that we've been using alternatives to patch the thunk
to lfence;jmp for AMD pretty much forever.

Inlining it is better tho; just a shame clang seems to insist on r11,
which means we cannot fit it in the thunk call site for them :/
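
FWIW the size accounting that makes r11 awkward is simply this (the
encodings are the usual x86 ones; illustration only, not from the patch):

/* Assumed instruction sizes for rewriting a thunk call site in place: */
#define CALL_THUNK_SIZE		5	/* e8 <rel32>: call __x86_indirect_thunk_reg */
#define LFENCE_SIZE		3	/* 0f ae e8                                  */
#define JMP_REG_SIZE		2	/* ff e0+reg: jmp rax..rdi                   */
#define JMP_REG_REX_SIZE	3	/* 41 ff e0+reg: jmp r8..r15 needs REX.B     */

/* lfence; jmp reg fits the 5 byte call site only for the low registers: */
_Static_assert(LFENCE_SIZE + JMP_REG_SIZE     <= CALL_THUNK_SIZE, "rax..rdi fit");
_Static_assert(LFENCE_SIZE + JMP_REG_REX_SIZE >  CALL_THUNK_SIZE, "r8..r15, so r11, do not");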

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  8:47       ` Peter Zijlstra
@ 2021-10-21 18:03         ` Alexei Starovoitov
  2021-10-21 22:37           ` Peter Zijlstra
  2021-10-21 23:51         ` Zvi Effron
  1 sibling, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2021-10-21 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 1:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
>
> > > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> > >  {
> > >     int tcc_off = -4 - round_up(stack_depth, 8);
> > >     u8 *prog = *pprog, *start = *pprog;
> > > -   int pop_bytes = 0;
> > > -   int off1 = 42;
> > > -   int off2 = 31;
> > > -   int off3 = 9;
> > > -
> > > -   /* count the additional bytes used for popping callee regs from stack
> > > -    * that need to be taken into account for each of the offsets that
> > > -    * are used for bailing out of the tail call
> > > -    */
> > > -   pop_bytes = get_pop_bytes(callee_regs_used);
> > > -   off1 += pop_bytes;
> > > -   off2 += pop_bytes;
> > > -   off3 += pop_bytes;
> > > -
> > > -   if (stack_depth) {
> > > -           off1 += 7;
> > > -           off2 += 7;
> > > -           off3 += 7;
> > > -   }
> > > +   static int out_label = -1;
> >
> > Interesting idea!
>
> I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> lot more robust than the 64bit one and I couldn't figure out why the
> difference.

Interesting. Daniel will recognize that trick then :)

> > All insn emits trying to do the right thing from the start.
> > Here the logic assumes that there will be at least two passes over image.
> > I think that is correct, but we never had such assumption.
>
> That's not exactly true; I think image is NULL on every first run, so
> all insn that depend on it will be wrong to start with. Equally there's
> a number of insn that seem to depend on addrs[i], that also requires at
> least two passes.

Right. The image will be allocated after size converges and
addrs[] is inited with 64.
So there is certainly more than one pass.
I was saying that emit* helpers didn't have that assumption.
Looks like 32-bit JIT had.
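
To spell the assumption out with a toy model (hypothetical code, not the
actual driver loop): offsets recorded on pass N are consumed on pass N+1,
and JITing only finishes once the emitted length stops changing, which by
construction takes at least two passes.

static int out_label = -1;	/* offset of out:, recorded on the previous pass */

static int emit_jcc_to_out(int pos)
{
	/* rel8 is taken from the end of the 2 byte jcc; garbage on the very
	 * first pass, correct on every later one.
	 */
	return out_label - (pos + 2);
}

static int jit_pass(void)
{
	int len = 0;

	emit_jcc_to_out(len);	/* jbe out */
	len += 2;
	len += 40;		/* rest of the tail call sequence */
	out_label = len;	/* out: */
	return len;
}

static int jit(void)
{
	int len = 64, prev;	/* pessimistic initial estimate */

	do {
		prev = len;
		len = jit_pass();
	} while (len != prev);	/* length converged; only now allocate the image */

	return len;
}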

> > A comment is certainly must have.
>
> I can certainly add one, although I think we'll disagree on the comment
> style :-)

I think we're on the same page actually.

> > The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> > are really warranted though. Might be overkill.
>
> Is there concurrency on the jit?

The JIT of different progs can happen in parallel.

> > Once you have a git branch with all the changes I can give it a go.
>
> Ok, I'll go polish this thing and stick it in the tree mentioned in the
> cover letter.
>
> > Also you can rely on our BPF CI.
> > Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> > In patchwork there will be "bpf/vmtest-bpf-next" link that
> > builds kernel, selftests and runs everything.
>
> What's a patchwork and where do I find it?

https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173
Click on any patch, search for 'bpf/vmtest-bpf-next' and follow the
'VM_Test' link.
The summary of the test run is available without logging in to github.
To see detailed logs you need to be logged in with your github account.
It's a silly limitation they have.
They even have a button 'Sign in to view logs'. Oh well.

> > It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> > clang nightly and other deps like pahole.
>
> nice.

One more thing. There is test_bpf.ko.
Just insmod it and it will run a ton of JIT tests that we cannot do
from user space.
Please use the bpf-next tree though. A few weeks ago Johan Almbladh added
a lot more tests to it.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21 18:03         ` Alexei Starovoitov
@ 2021-10-21 22:37           ` Peter Zijlstra
  2021-10-21 23:24             ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-21 22:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:

> > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > lot more robust than the 64bit one and I couldn't figure out why the
> > difference.
> 
> Interesting. Daniel will recognize that trick then :)

> > Is there concurrency on the jit?
> 
> The JIT of different progs can happen in parallel.

In that case I don't think the patch is safe. I'll see if I can find a
variant that doesn't use static storage.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21 22:37           ` Peter Zijlstra
@ 2021-10-21 23:24             ` Alexei Starovoitov
  2021-10-21 23:38               ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2021-10-21 23:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
>
> > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > lot more robust than the 64bit one and I couldn't figure out why the
> > > difference.
> >
> > Interesting. Daniel will recognize that trick then :)
>
> > > Is there concurrency on the jit?
> >
> > The JIT of different progs can happen in parallel.
>
> In that case I don't think the patch is safe. I'll see if I can find a
> variant that doesn't use static storage.

The variable can only change from one fixed value to another fixed value.
Different threads will compute the same value. So I think it's safe
as-is. READ_ONCE/WRITE_ONCE won't hurt though.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21 23:24             ` Alexei Starovoitov
@ 2021-10-21 23:38               ` Josh Poimboeuf
  2021-10-21 23:42                 ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-10-21 23:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:24:33PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
> >
> > > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > > lot more robust than the 64bit one and I couldn't figure out why the
> > > > difference.
> > >
> > > Interesting. Daniel will recognize that trick then :)
> >
> > > > Is there concurrency on the jit?
> > >
> > > The JIT of different progs can happen in parallel.
> >
> > In that case I don't think the patch is safe. I'll see if I can find a
> > variant that doesn't use static storage.
> 
> The variable can only change from one fixed value to another fixed value.
> Different threads will compute the same value. So I think it's safe
> as-is. READ_ONCE/WRITE_ONCE won't hurt though.

But the size of the generated code differs based on the
emit_bpf_tail_call_indirect() args: 'callee_regs_used' and
'stack_depth'.  So the fixed value can change.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21 23:38               ` Josh Poimboeuf
@ 2021-10-21 23:42                 ` Alexei Starovoitov
  2021-10-22 11:31                   ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2021-10-21 23:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 4:38 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Thu, Oct 21, 2021 at 04:24:33PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
> > >
> > > > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > > > lot more robust than the 64bit one and I couldn't figure out why the
> > > > > difference.
> > > >
> > > > Interesting. Daniel will recognize that trick then :)
> > >
> > > > > Is there concurrency on the jit?
> > > >
> > > > The JIT of different progs can happen in parallel.
> > >
> > > In that case I don't think the patch is safe. I'll see if I can find a
> > > variant that doesn't use static storage.
> >
> > The variable can only change from one fixed value to another fixed value.
> > Different threads will compute the same value. So I think it's safe
> > as-is. READ_ONCE/WRITE_ONCE won't hurt though.
>
> But the size of the generated code differs based on the
> emit_bpf_tail_call_indirect() args: 'callee_regs_used' and
> 'stack_depth'.  So the fixed value can change.

Ahh. Right. It's potentially a different offset for every prog.
Let's put it into struct jit_context then.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  8:47       ` Peter Zijlstra
  2021-10-21 18:03         ` Alexei Starovoitov
@ 2021-10-21 23:51         ` Zvi Effron
  2021-10-22  8:33           ` Peter Zijlstra
  1 sibling, 1 reply; 45+ messages in thread
From: Zvi Effron @ 2021-10-21 23:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 1:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
>
> > > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> > >  {
> > >     int tcc_off = -4 - round_up(stack_depth, 8);
> > >     u8 *prog = *pprog, *start = *pprog;
> > > -   int pop_bytes = 0;
> > > -   int off1 = 42;
> > > -   int off2 = 31;
> > > -   int off3 = 9;
> > > -
> > > -   /* count the additional bytes used for popping callee regs from stack
> > > -    * that need to be taken into account for each of the offsets that
> > > -    * are used for bailing out of the tail call
> > > -    */
> > > -   pop_bytes = get_pop_bytes(callee_regs_used);
> > > -   off1 += pop_bytes;
> > > -   off2 += pop_bytes;
> > > -   off3 += pop_bytes;
> > > -
> > > -   if (stack_depth) {
> > > -           off1 += 7;
> > > -           off2 += 7;
> > > -           off3 += 7;
> > > -   }
> > > +   static int out_label = -1;
> >
> > Interesting idea!
>
> I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> lot more robust than the 64bit one and I couldn't figure out why the
> difference.
>
> > All insn emits trying to do the right thing from the start.
> > Here the logic assumes that there will be at least two passes over image.
> > I think that is correct, but we never had such assumption.
>
> That's not exactly true; I think image is NULL on every first run, so
> all insn that depend on it will be wrong to start with. Equally there's
> a number of insn that seem to depend on addrs[i], that also requires at
> least two passes.
>
> > A comment is certainly must have.
>
> I can certainly add one, although I think we'll disagree on the comment
> style :-)
>
> > The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> > are really warranted though. Might be overkill.
>
> Is there concurrency on the jit?
>
> > Once you have a git branch with all the changes I can give it a go.
>
> Ok, I'll go polish this thing and stick it in the tree mentioned in the
> cover letter.
>
> > Also you can rely on our BPF CI.
> > Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> > In patchwork there will be "bpf/vmtest-bpf-next" link that
> > builds kernel, selftests and runs everything.
>
> What's a patchwork and where do I find it?
>

Patchwork[0] tracks the status of patches from submission through to merge (and
beyond?).

[0]: https://patchwork.kernel.org/

> > It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> > clang nightly and other deps like pahole.
>
> nice.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21 23:51         ` Zvi Effron
@ 2021-10-22  8:33           ` Peter Zijlstra
  2021-10-22 21:06             ` Zvi Effron
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-22  8:33 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:51:08PM -0700, Zvi Effron wrote:

> > What's a patchwork and where do I find it?
> >
> 
> Patchwork[0] tracks the status of patches from submission through to merge (and
> beyond?).

Yeah, I sorta know that :-) I loathe the things because web-browser, but
the second part of the question was genuine: there are a *lot* of
patchwork instances around, and not everyone uses the new k.org based
one.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21 23:42                 ` Alexei Starovoitov
@ 2021-10-22 11:31                   ` Peter Zijlstra
  2021-10-22 15:22                     ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-22 11:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Josh Poimboeuf, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:

> Ahh. Right. It's potentially a different offset for every prog.
> Let's put it into struct jit_context then.

Something like this...

---
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -225,6 +225,14 @@ static void jit_fill_hole(void *area, un
 
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
+
+	/*
+	 * Program specific offsets of labels in the code; these rely on the
+	 * JIT doing at least 2 passes, recording the position on the first
+	 * pass, only to generate the correct offset on the second pass.
+	 */
+	int tail_call_direct_label;
+	int tail_call_indirect_label;
 };
 
 /* Maximum number of bytes emitted while JITing one eBPF insn */
@@ -380,22 +388,6 @@ int bpf_arch_text_poke(void *ip, enum bp
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
-static int get_pop_bytes(bool *callee_regs_used)
-{
-	int bytes = 0;
-
-	if (callee_regs_used[3])
-		bytes += 2;
-	if (callee_regs_used[2])
-		bytes += 2;
-	if (callee_regs_used[1])
-		bytes += 2;
-	if (callee_regs_used[0])
-		bytes += 1;
-
-	return bytes;
-}
-
 /*
  * Generate the following code:
  *
@@ -411,29 +403,12 @@ static int get_pop_bytes(bool *callee_re
  * out:
  */
 static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
-					u32 stack_depth)
+					u32 stack_depth, u8 *ip,
+					struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 42;
-	int off2 = 31;
-	int off3 = 9;
-
-	/* count the additional bytes used for popping callee regs from stack
-	 * that need to be taken into account for each of the offsets that
-	 * are used for bailing out of the tail call
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-	off2 += pop_bytes;
-	off3 += pop_bytes;
-
-	if (stack_depth) {
-		off1 += 7;
-		off2 += 7;
-		off3 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * rdi - pointer to ctx
@@ -448,8 +423,9 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
-	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JBE, offset);                   /* jbe out */
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -457,8 +433,9 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JA, OFFSET2);                   /* ja out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                    /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
@@ -471,12 +448,11 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JE, OFFSET3);                   /* je out */
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JE, offset);                    /* je out */
+
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
@@ -496,38 +472,18 @@ static void emit_bpf_tail_call_indirect(
 	RETPOLINE_RCX_BPF_JIT();
 
 	/* out: */
+	ctx->tail_call_indirect_label = prog - start;
 	*pprog = prog;
 }
 
 static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
-				      u8 **pprog, int addr, u8 *image,
-				      bool *callee_regs_used, u32 stack_depth)
+				      u8 **pprog, u8 *ip,
+				      bool *callee_regs_used, u32 stack_depth,
+				      struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 20;
-	int poke_off;
-
-	/* count the additional bytes used for popping callee regs to stack
-	 * that need to be taken into account for jump offset that is used for
-	 * bailing out from of the tail call when limit is reached
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-
-	/*
-	 * total bytes for:
-	 * - nop5/ jmpq $off
-	 * - pop callee regs
-	 * - sub rsp, $val if depth > 0
-	 * - pop rax
-	 */
-	poke_off = X86_PATCH_SIZE + pop_bytes + 1;
-	if (stack_depth) {
-		poke_off += 7;
-		off1 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -535,28 +491,30 @@ static void emit_bpf_tail_call_direct(st
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
-	EMIT2(X86_JA, off1);                          /* ja out */
+
+	offset = ctx->tail_call_direct_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                        /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                      /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);             /* mov dword ptr [rbp - tcc_off], eax */
 
-	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
+	poke->tailcall_bypass = ip + (prog - start);
 	poke->adj_off = X86_TAIL_CALL_OFFSET;
-	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
+	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
 	poke->bypass_addr = (u8 *)poke->tailcall_target + X86_PATCH_SIZE;
 
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	pop_callee_regs(&prog, callee_regs_used);
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
 	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
+
 	/* out: */
+	ctx->tail_call_direct_label = prog - start;
 
 	*pprog = prog;
 }
@@ -1405,13 +1363,16 @@ st:			if (is_imm8(insn->off))
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
 				emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
-							  &prog, addrs[i], image,
+							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
-							  bpf_prog->aux->stack_depth);
+							  bpf_prog->aux->stack_depth,
+							  ctx);
 			else
 				emit_bpf_tail_call_indirect(&prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth);
+							    bpf_prog->aux->stack_depth,
+							    image + addrs[i - 1],
+							    ctx);
 			break;
 
 			/* cond jump */

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-22 11:31                   ` Peter Zijlstra
@ 2021-10-22 15:22                     ` Alexei Starovoitov
  2021-10-25 13:44                       ` Maciej Fijalkowski
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2021-10-22 15:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
>
> > Ahh. Right. It's potentially a different offset for every prog.
> > Let's put it into struct jit_context then.
>
> Something like this...

Yep. Looks nice and clean to me.

> -       poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> +       poke->tailcall_bypass = ip + (prog - start);
>         poke->adj_off = X86_TAIL_CALL_OFFSET;
> -       poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> +       poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;

This part looks correct too, but this is Daniel's magic.
He'll probably take a look next week when he comes back from PTO.
I don't recall which test exercises this tailcall poking logic.
It's only used with dynamic updates to prog_array.
insmod test_bpf.ko and test_verifier won't go down this path.
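
From the libbpf side it's something along these lines (sketch only; setup
and error handling omitted):

#include <bpf/bpf.h>

/* Point slot 'key' of a BPF_MAP_TYPE_PROG_ARRAY at prog_fd; runtime
 * updates like this are what drive the tailcall poke/patch machinery.
 */
static int set_tail_call_target(int prog_array_fd, __u32 key, int prog_fd)
{
	return bpf_map_update_elem(prog_array_fd, &key, &prog_fd, BPF_ANY);
}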

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [PATCH v2 06/14] x86/asm: Fix register order
  2021-10-20 10:44 ` [PATCH v2 06/14] x86/asm: Fix register order Peter Zijlstra
@ 2021-10-22 19:27   ` David Laight
  2021-10-25 14:09   ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: David Laight @ 2021-10-22 19:27 UTC (permalink / raw)
  To: 'Peter Zijlstra', x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, alexei.starovoitov, ndesaulniers

From: Peter Zijlstra
> Sent: 20 October 2021 11:45
> 
> Ensure the register order is correct; this allows for easy translation
> between register number and trampoline and vice-versa.

There is an alternate cpp pattern that can be used to have the
same effect.
Possibly less error prone.
Stealing some text from 2 patches you'd have:

asm/GEN-for-each-reg.h would contain:

#define GEN_FOR_EACH_REG(GEN) \
  GEN(rax) \
  GEN(rcx) \
  GEN(rdx) \
  GEN(rbx) \
  GEN(rsp) \
  GEN(rbp) \
  GEN(rsi) \
  GEN(rdi) \
  GEN(r8)  \
  GEN(r9)  \
  GEN(r10)

and the user would contain:

#include <asm/GEN-for-each-reg.h>

#define GEN(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
GEN_FOR_EACH_REG(GEN)
#undef GEN

#define GEN(reg) \
 	extern asmlinkage void __x86_indirect_alt_call_ ## reg (void);
GEN_FOR_EACH_REG(GEN)
#undef GEN

It is probably possible to use ... and __VA_ARGS__.
Something like:

#define GEN_FOR_EACH_REG(GEN, ...) \
  GEN(rax, __VA_ARGS__) \
  ...

#define GEN_THUNK(reg, thunk) \
	extern asmlinkage void thunk ## reg (void);

GEN_FOR_EACH_REG(GEN_THUNK, __x86_indirect_thunk_)
GEN_FOR_EACH_REG(GEN_THUNK, __x86_indirect_alt_call_)
#undef GEN_THUNK

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-22  8:33           ` Peter Zijlstra
@ 2021-10-22 21:06             ` Zvi Effron
  0 siblings, 0 replies; 45+ messages in thread
From: Zvi Effron @ 2021-10-22 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 1:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 04:51:08PM -0700, Zvi Effron wrote:
>
> > > What's a patchwork and where do I find it?
> > >
> >
> > Patchwork[0] tracks the status of patches from submission through to merge (and
> > beyond?).
>
> Yeah, I sorta know that :-) I loathe the things because web-browser, but
> the second part of the question was genuine: there are a *lot* of
> patchwork instances around, and not everyone uses the new k.org based
> one.
>

BPF and NetDev share one: https://patchwork.kernel.org/project/netdevbpf/list/

--Zvi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-25 13:44                       ` Maciej Fijalkowski
@ 2021-10-25 12:42                         ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-25 12:42 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Alexei Starovoitov, Josh Poimboeuf, X86 ML, Andrew Cooper, LKML,
	Nick Desaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Mon, Oct 25, 2021 at 03:44:24PM +0200, Maciej Fijalkowski wrote:
> On Fri, Oct 22, 2021 at 08:22:35AM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
> > >
> > > > Ahh. Right. It's potentially a different offset for every prog.
> > > > Let's put it into struct jit_context then.
> > >
> > > Something like this...
> > 
> > Yep. Looks nice and clean to me.
> > 
> > > -       poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> > > +       poke->tailcall_bypass = ip + (prog - start);
> > >         poke->adj_off = X86_TAIL_CALL_OFFSET;
> > > -       poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> > > +       poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
> > 
> > This part looks correct too, but this is Daniel's magic.
> > He'll probably take a look next week when he comes back from PTO.
> > I don't recall which test exercises this tailcall poking logic.
> > It's only used with dynamic updates to prog_array.
> > insmod test_bpf.ko and test_verifier won't go down this path.
> 
> Please run ./test_progs -t tailcalls from tools/testing/selftests/bpf and
> make sure that all of the tests are passing in there, especially the
> tailcall_bpf2bpf* subset.

Yeah, so nothing from that selftests crud wants to work for me; also I
*really* dislike how vmtest.sh as found there tramples all over my
source dir without asking.

Note that even when eventually supplied with O=builddir (which,
confusingly, has to go in front of it), it doesn't want to work and
bails with lots of -ENOSPC warnings (I double checked, my disks are
nowhere near full). (And this is after installing some horrendous
python rst crap, because clearly running a test needs to build
documentation :/)

I've spent hours on that; I'm not sinking more time into it. If you want
me to run that crap, fix it first.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-22 15:22                     ` Alexei Starovoitov
@ 2021-10-25 13:44                       ` Maciej Fijalkowski
  2021-10-25 12:42                         ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Maciej Fijalkowski @ 2021-10-25 13:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Josh Poimboeuf, X86 ML, Andrew Cooper, LKML,
	Nick Desaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 08:22:35AM -0700, Alexei Starovoitov wrote:
> On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
> >
> > > Ahh. Right. It's potentially a different offset for every prog.
> > > Let's put it into struct jit_context then.
> >
> > Something like this...
> 
> Yep. Looks nice and clean to me.
> 
> > -       poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> > +       poke->tailcall_bypass = ip + (prog - start);
> >         poke->adj_off = X86_TAIL_CALL_OFFSET;
> > -       poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> > +       poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
> 
> This part looks correct too, but this is Daniel's magic.
> He'll probably take a look next week when he comes back from PTO.
> I don't recall which test exercises this tailcall poking logic.
> It's only used with dynamic updates to prog_array.
> insmod test_bpf.ko and test_verifier won't go down this path.

Please run ./test_progs -t tailcalls from tools/testing/selftests/bpf and
make sure that all of the tests are passing in there, especially the
tailcall_bpf2bpf* subset.

Thanks!

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 06/14] x86/asm: Fix register order
  2021-10-20 10:44 ` [PATCH v2 06/14] x86/asm: Fix register order Peter Zijlstra
  2021-10-22 19:27   ` David Laight
@ 2021-10-25 14:09   ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2021-10-25 14:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers

On Wed, Oct 20, 2021 at 12:44:48PM +0200, Peter Zijlstra wrote:
> Ensure the register order is correct; this allows for easy translation
> between register number and trampoline and vice-versa.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/GEN-for-each-reg.h |   13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> --- a/arch/x86/include/asm/GEN-for-each-reg.h
> +++ b/arch/x86/include/asm/GEN-for-each-reg.h
> @@ -1,11 +1,15 @@

<--- while you're at it:

WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#75: FILE: arch/x86/include/asm/GEN-for-each-reg.h:1:
+/*

I.e., put

/* SPDX-License-Identifier: GPL-2.0 */

at the 1st line there.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/14] objtool: Tag retpoline thunk symbols
  2021-10-20 15:17   ` Josh Poimboeuf
@ 2021-10-26  7:55     ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-26  7:55 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers

On Wed, Oct 20, 2021 at 08:17:14AM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 12:44:43PM +0200, Peter Zijlstra wrote:
> > In order to avoid calling arch_is_retpoline() (which does a strcmp)
> > over and over on symbols, tag them once upfront.
> > 
> > XXX do we also want to do __fentry__ ?
> 
> Might as well.
> 
> > XXX do we want an enum instead of a bunch of bools ?
> 
> Maybe, or convert the bools to bit fields.

Like so then...

---
Subject: objtool: Classify symbols
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Oct 20 10:05:55 CEST 2021

In order to avoid calling str*cmp() on symbol names over and over, do
all the compares once upfront and store the result.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c               |   34 ++++++++++++++++++++++------------
 tools/objtool/include/objtool/elf.h |    5 ++++-
 2 files changed, 26 insertions(+), 13 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1012,8 +1012,7 @@ static void add_call_dest(struct objtool
 	 * so they need a little help, NOP out any KCOV calls from noinstr
 	 * text.
 	 */
-	if (insn->sec->noinstr &&
-	    !strncmp(insn->call_dest->name, "__sanitizer_cov_", 16)) {
+	if (insn->sec->noinstr && insn->call_dest->kcov) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1027,7 +1026,7 @@ static void add_call_dest(struct objtool
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
 	}
 
-	if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
+	if (mcount && insn->call_dest->fentry) {
 		if (sibling)
 			WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset);
 
@@ -1077,7 +1076,7 @@ static int add_jump_destinations(struct
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
@@ -1218,7 +1217,7 @@ static int add_call_destinations(struct
 
 			add_call_dest(file, insn, dest, false);
 
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline calls are really dynamic calls in
 			 * disguise, so convert them accordingly.
@@ -1919,17 +1918,28 @@ static int read_intra_function_calls(str
 	return 0;
 }
 
-static int read_static_call_tramps(struct objtool_file *file)
+static int classify_symbols(struct objtool_file *file)
 {
 	struct section *sec;
 	struct symbol *func;
 
 	for_each_sec(file, sec) {
 		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->bind == STB_GLOBAL &&
-			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
+			if (func->bind != STB_GLOBAL)
+				continue;
+
+			if (!strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
 				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
 				func->static_call_tramp = true;
+
+			if (arch_is_retpoline(func))
+				func->retpoline_thunk = true;
+
+			if (!strcmp(func->name, "__fentry__"))
+				func->fentry = true;
+
+			if (!strncmp(func->name, "__sanitizer_cov_", 16))
+				func->kcov = true;
 		}
 	}
 
@@ -1995,7 +2005,7 @@ static int decode_sections(struct objtoo
 	/*
 	 * Must be before add_{jump_call}_destination.
 	 */
-	ret = read_static_call_tramps(file);
+	ret = classify_symbols(file);
 	if (ret)
 		return ret;
 
@@ -2053,9 +2063,9 @@ static int decode_sections(struct objtoo
 
 static bool is_fentry_call(struct instruction *insn)
 {
-	if (insn->type == INSN_CALL && insn->call_dest &&
-	    insn->call_dest->type == STT_NOTYPE &&
-	    !strcmp(insn->call_dest->name, "__fentry__"))
+	if (insn->type == INSN_CALL &&
+	    insn->call_dest &&
+	    insn->call_dest->fentry)
 		return true;
 
 	return false;
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -55,7 +55,10 @@ struct symbol {
 	unsigned int len;
 	struct symbol *pfunc, *cfunc, *alias;
 	bool uaccess_safe;
-	bool static_call_tramp;
+	u8 static_call_tramp : 1;
+	u8 retpoline_thunk   : 1;
+	u8 fentry            : 1;
+	u8 kcov              : 1;
 	struct list_head pv_target;
 };
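
Purely to illustrate the effect (toy userspace code, not the objtool
sources; the flag names mirror the patch, everything else is made up
for the example): the string compares happen once per symbol, and
every later check is a plain bit test:

#include <stdio.h>
#include <string.h>

struct symbol {
	const char *name;
	unsigned char retpoline_thunk : 1;
	unsigned char fentry          : 1;
	unsigned char kcov            : 1;
};

/* one pass over the symbols, doing the str*cmp()s exactly once */
static void classify_symbol(struct symbol *sym)
{
	if (!strncmp(sym->name, "__x86_indirect_thunk_", 21))
		sym->retpoline_thunk = 1;
	if (!strcmp(sym->name, "__fentry__"))
		sym->fentry = 1;
	if (!strncmp(sym->name, "__sanitizer_cov_", 16))
		sym->kcov = 1;
}

int main(void)
{
	struct symbol syms[] = {
		{ .name = "__fentry__" },
		{ .name = "__x86_indirect_thunk_rax" },
		{ .name = "__sanitizer_cov_trace_pc" },
		{ .name = "printk" },
	};

	for (unsigned i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		classify_symbol(&syms[i]);
		/* hot paths test the cached bits, no strcmp() per insn */
		printf("%-26s thunk=%d fentry=%d kcov=%d\n", syms[i].name,
		       syms[i].retpoline_thunk, syms[i].fentry,
		       syms[i].kcov);
	}
	return 0;
}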
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2021-10-26  7:55 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-20 10:44 [PATCH v2 00/14] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 01/14] objtool: Tag retpoline thunk symbols Peter Zijlstra
2021-10-20 15:17   ` Josh Poimboeuf
2021-10-26  7:55     ` Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 02/14] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 03/14] objtool: Shrink struct instruction Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 04/14] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 05/14] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 06/14] x86/asm: Fix register order Peter Zijlstra
2021-10-22 19:27   ` David Laight
2021-10-25 14:09   ` Borislav Petkov
2021-10-20 10:44 ` [PATCH v2 07/14] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 08/14] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
2021-10-20 15:57   ` Josh Poimboeuf
2021-10-20 16:46     ` Andrew Cooper
2021-10-20 17:09       ` Josh Poimboeuf
2021-10-20 19:22         ` Peter Zijlstra
2021-10-20 19:43           ` Josh Poimboeuf
2021-10-20 19:34       ` Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 09/14] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 10/14] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 11/14] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 12/14] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 13/14] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
2021-10-20 10:44 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
2021-10-20 11:09   ` Peter Zijlstra
2021-10-20 16:56     ` Josh Poimboeuf
2021-10-20 19:23       ` Peter Zijlstra
2021-10-21  0:05     ` Alexei Starovoitov
2021-10-21  8:47       ` Peter Zijlstra
2021-10-21 18:03         ` Alexei Starovoitov
2021-10-21 22:37           ` Peter Zijlstra
2021-10-21 23:24             ` Alexei Starovoitov
2021-10-21 23:38               ` Josh Poimboeuf
2021-10-21 23:42                 ` Alexei Starovoitov
2021-10-22 11:31                   ` Peter Zijlstra
2021-10-22 15:22                     ` Alexei Starovoitov
2021-10-25 13:44                       ` Maciej Fijalkowski
2021-10-25 12:42                         ` Peter Zijlstra
2021-10-21 23:51         ` Zvi Effron
2021-10-22  8:33           ` Peter Zijlstra
2021-10-22 21:06             ` Zvi Effron
2021-10-21  0:07   ` Alexei Starovoitov
2021-10-21  0:18     ` Josh Poimboeuf
2021-10-21  8:53       ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).