linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE
@ 2021-03-18 17:11 Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 01/14] x86: Add insn_decode_kernel() Peter Zijlstra
                   ` (13 more replies)
  0 siblings, 14 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Hi,

Respin of the !RETPOLINE optimization patches.

Boris, the first 3 should probably go into tip/x86/core, it's an ungodly tangle
since it relies on the insn decoder patches in tip/x86/core, the NOP patches in
tip/x86/cpu and the alternative patches in tip/x86/alternatives.

Just to make life easy I'd suggest merging everything in x86/core and
forgetting about the other topic branches (that's what I ended up doing locally).

The remaining 11 patches depend on the first 3 as well as on the work in
tip/objtool/core, just to make life more interesting still ;-)

All except the last 4 patches should be fairly uncontroversial (I hope...)



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 01/14] x86: Add insn_decode_kernel()
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19 10:40   ` Borislav Petkov
  2021-03-18 17:11 ` [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops() Peter Zijlstra
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Add a helper to decode kernel instructions; there's no point in
endlessly repeating those last two arguments.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/insn.h        |    2 ++
 arch/x86/kernel/alternative.c      |    2 +-
 arch/x86/kernel/cpu/mce/severity.c |    2 +-
 arch/x86/kernel/kprobes/core.c     |    4 ++--
 arch/x86/kernel/kprobes/opt.c      |    2 +-
 arch/x86/kernel/traps.c            |    2 +-
 tools/arch/x86/include/asm/insn.h  |    4 +++-
 7 files changed, 11 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -150,6 +150,8 @@ enum insn_mode {
 
 extern int insn_decode(struct insn *insn, const void *kaddr, int buf_len, enum insn_mode m);
 
+#define insn_decode_kernel(_insn, _ptr) insn_decode((_insn), (_ptr), MAX_INSN_SIZE, INSN_MODE_KERN)
+
 /* Attribute will be determined after getting ModRM (for opcode groups) */
 static inline void insn_get_attribute(struct insn *insn)
 {
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1160,7 +1160,7 @@ static void text_poke_loc_init(struct te
 	if (!emulate)
 		emulate = opcode;
 
-	ret = insn_decode(&insn, emulate, MAX_INSN_SIZE, INSN_MODE_KERN);
+	ret = insn_decode_kernel(&insn, emulate);
 
 	BUG_ON(ret < 0);
 	BUG_ON(len != insn.length);
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -225,7 +225,7 @@ static bool is_copy_from_user(struct pt_
 	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
 		return false;
 
-	ret = insn_decode(&insn, insn_buf, MAX_INSN_SIZE, INSN_MODE_KERN);
+	ret = insn_decode_kernel(&insn, insn_buf);
 	if (ret < 0)
 		return false;
 
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -279,7 +279,7 @@ static int can_probe(unsigned long paddr
 		if (!__addr)
 			return 0;
 
-		ret = insn_decode(&insn, (void *)__addr, MAX_INSN_SIZE, INSN_MODE_KERN);
+		ret = insn_decode_kernel(&insn, (void *)__addr);
 		if (ret < 0)
 			return 0;
 
@@ -316,7 +316,7 @@ int __copy_instruction(u8 *dest, u8 *src
 			MAX_INSN_SIZE))
 		return 0;
 
-	ret = insn_decode(insn, dest, MAX_INSN_SIZE, INSN_MODE_KERN);
+	ret = insn_decode_kernel(insn, dest);
 	if (ret < 0)
 		return 0;
 
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -324,7 +324,7 @@ static int can_optimize(unsigned long pa
 		if (!recovered_insn)
 			return 0;
 
-		ret = insn_decode(&insn, (void *)recovered_insn, MAX_INSN_SIZE, INSN_MODE_KERN);
+		ret = insn_decode_kernel(&insn, (void *)recovered_insn);
 		if (ret < 0)
 			return 0;
 
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -504,7 +504,7 @@ static enum kernel_gp_hint get_kernel_gp
 			MAX_INSN_SIZE))
 		return GP_NO_HINT;
 
-	ret = insn_decode(&insn, insn_buf, MAX_INSN_SIZE, INSN_MODE_KERN);
+	ret = insn_decode_kernel(&insn, insn_buf);
 	if (ret < 0)
 		return GP_NO_HINT;
 
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -150,6 +150,8 @@ enum insn_mode {
 
 extern int insn_decode(struct insn *insn, const void *kaddr, int buf_len, enum insn_mode m);
 
+#define insn_decode_kernel(_insn, _ptr) insn_decode((_insn), (_ptr), MAX_INSN_SIZE, INSN_MODE_KERN)
+
 /* Attribute will be determined after getting ModRM (for opcode groups) */
 static inline void insn_get_attribute(struct insn *insn)
 {



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops()
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 01/14] x86: Add insn_decode_kernel() Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-21 12:06   ` Borislav Petkov
  2021-03-18 17:11 ` [PATCH v2 03/14] x86/retpoline: Simplify retpolines Peter Zijlstra
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Currently optimize_nops() scans to see if the alternative starts with
NOPs. However, the emit pattern is:

  141:	\oldinstr
  142:	.skip (len-(142b-141b)), 0x90

That is, when oldinstr is short, we pad the tail with NOPs. This case
never gets optimized.

Rewrite optimize_nops() to replace any string of NOPs inside the
alternative to larger NOPs. Also run it irrespective of patching,
replacing NOPs in both the original and replaced code.

A direct consequence is that padlen becomes superfluous, so remove it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/alternative.h            |   14 ++---
 arch/x86/kernel/alternative.c                 |   61 ++++++++++++++++----------
 tools/objtool/arch/x86/include/arch/special.h |    2 
 3 files changed, 45 insertions(+), 32 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -65,7 +65,6 @@ struct alt_instr {
 	u16 cpuid;		/* cpuid bit set for replacement */
 	u8  instrlen;		/* length of original instruction */
 	u8  replacementlen;	/* length of new instruction */
-	u8  padlen;		/* length of build-time padding */
 } __packed;
 
 /*
@@ -104,7 +103,6 @@ static inline int alternatives_text_rese
 
 #define alt_end_marker		"663"
 #define alt_slen		"662b-661b"
-#define alt_pad_len		alt_end_marker"b-662b"
 #define alt_total_slen		alt_end_marker"b-661b"
 #define alt_rlen(num)		e_replacement(num)"f-"b_replacement(num)"f"
 
@@ -151,8 +149,7 @@ static inline int alternatives_text_rese
 	" .long " b_replacement(num)"f - .\n"		/* new instruction */ \
 	" .word " __stringify(feature) "\n"		/* feature bit     */ \
 	" .byte " alt_total_slen "\n"			/* source len      */ \
-	" .byte " alt_rlen(num) "\n"			/* replacement len */ \
-	" .byte " alt_pad_len "\n"			/* pad len */
+	" .byte " alt_rlen(num) "\n"			/* replacement len */
 
 #define ALTINSTR_REPLACEMENT(newinstr, num)		/* replacement */	\
 	"# ALT: replacement " #num "\n"						\
@@ -315,13 +312,12 @@ static inline int alternatives_text_rese
  * enough information for the alternatives patching code to patch an
  * instruction. See apply_alternatives().
  */
-.macro altinstruction_entry orig alt feature orig_len alt_len pad_len
+.macro altinstruction_entry orig alt feature orig_len alt_len
 	.long \orig - .
 	.long \alt - .
 	.word \feature
 	.byte \orig_len
 	.byte \alt_len
-	.byte \pad_len
 .endm
 
 /*
@@ -338,7 +334,7 @@ static inline int alternatives_text_rese
 142:
 
 	.pushsection .altinstructions,"a"
-	altinstruction_entry 140b,143f,\feature,142b-140b,144f-143f,142b-141b
+	altinstruction_entry 140b,143f,\feature,142b-140b,144f-143f
 	.popsection
 
 	.pushsection .altinstr_replacement,"ax"
@@ -375,8 +371,8 @@ static inline int alternatives_text_rese
 142:
 
 	.pushsection .altinstructions,"a"
-	altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
-	altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b
+	altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f
+	altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f
 	.popsection
 
 	.pushsection .altinstr_replacement,"ax"
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -345,19 +345,39 @@ recompute_jump(struct alt_instr *a, u8 *
 static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
 {
 	unsigned long flags;
-	int i;
+	int nops = 0, i = 0;
+	struct insn insn;
+	u8 *nop = NULL;
 
-	for (i = 0; i < a->padlen; i++) {
-		if (instr[i] != 0x90)
+	do {
+		if (insn_decode_kernel(&insn, &instr[i]))
 			return;
-	}
 
-	local_irq_save(flags);
-	add_nops(instr + (a->instrlen - a->padlen), a->padlen);
-	local_irq_restore(flags);
+		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90) {
+			if (!nop) {
+				nop = &instr[i];
+				nops = 1;
+			} else {
+				nops++;
+			}
+		}
+		i += insn.length;
+
+		if ((insn.length != 1 || i == a->instrlen) && nop) {
+
+			local_irq_save(flags);
+			add_nops(nop, nops);
+			local_irq_restore(flags);
+
+			DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
+				   instr, (int)(unsigned long)(nop-instr), nops);
+
+			nop = NULL;
+		}
 
-	DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
-		   instr, a->instrlen - a->padlen, a->padlen);
+	} while (i < a->instrlen);
+
+	WARN_ON_ONCE(nop);
 }
 
 /*
@@ -403,19 +423,15 @@ void __init_or_module noinline apply_alt
 		 * - feature not present but ALTINSTR_FLAG_INV is set to mean,
 		 *   patch if feature is *NOT* present.
 		 */
-		if (!boot_cpu_has(feature) == !(a->cpuid & ALTINSTR_FLAG_INV)) {
-			if (a->padlen > 1)
-				optimize_nops(a, instr);
-
-			continue;
-		}
+		if (!boot_cpu_has(feature) == !(a->cpuid & ALTINSTR_FLAG_INV))
+			goto next;
 
-		DPRINTK("feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d), pad: %d",
+		DPRINTK("feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d)",
 			(a->cpuid & ALTINSTR_FLAG_INV) ? "!" : "",
 			feature >> 5,
 			feature & 0x1f,
 			instr, instr, a->instrlen,
-			replacement, a->replacementlen, a->padlen);
+			replacement, a->replacementlen);
 
 		DUMP_BYTES(instr, a->instrlen, "%px: old_insn: ", instr);
 		DUMP_BYTES(replacement, a->replacementlen, "%px: rpl_insn: ", replacement);
@@ -439,14 +455,15 @@ void __init_or_module noinline apply_alt
 		if (a->replacementlen && is_jmp(replacement[0]))
 			recompute_jump(a, instr, replacement, insn_buff);
 
-		if (a->instrlen > a->replacementlen) {
-			add_nops(insn_buff + a->replacementlen,
-				 a->instrlen - a->replacementlen);
-			insn_buff_sz += a->instrlen - a->replacementlen;
-		}
+		for (; insn_buff_sz < a->instrlen; insn_buff_sz++)
+			insn_buff[insn_buff_sz] = 0x90;
+
 		DUMP_BYTES(insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
 
 		text_poke_early(instr, insn_buff, insn_buff_sz);
+
+next:
+		optimize_nops(a, instr);
 	}
 }
 
--- a/tools/objtool/arch/x86/include/arch/special.h
+++ b/tools/objtool/arch/x86/include/arch/special.h
@@ -10,7 +10,7 @@
 #define JUMP_ORIG_OFFSET	0
 #define JUMP_NEW_OFFSET		4
 
-#define ALT_ENTRY_SIZE		13
+#define ALT_ENTRY_SIZE		12
 #define ALT_ORIG_OFFSET		0
 #define ALT_NEW_OFFSET		4
 #define ALT_FEATURE_OFFSET	8



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 03/14] x86/retpoline: Simplify retpolines
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 01/14] x86: Add insn_decode_kernel() Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops() Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19 17:18   ` David Laight
  2021-03-18 17:11 ` [PATCH v2 04/14] objtool: Correctly handle retpoline thunk calls Peter Zijlstra
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Due to commit c9c324dc22aa ("objtool: Support stack layout changes
in alternatives"), it is possible to simplify the retpolines.

Currently our retpolines consist of 2 symbols,
__x86_indirect_thunk_\reg, which is the compiler target, and
__x86_retpoline_\reg, which is the actual retpoline. Both are
consecutive in code and aligned such that for any one register they
both live in the same cacheline:

  0000000000000000 <__x86_indirect_thunk_rax>:
   0:   ff e0                   jmpq   *%rax
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop

  0000000000000005 <__x86_retpoline_rax>:
   5:   e8 07 00 00 00          callq  11 <__x86_retpoline_rax+0xc>
   a:   f3 90                   pause
   c:   0f ae e8                lfence
   f:   eb f9                   jmp    a <__x86_retpoline_rax+0x5>
  11:   48 89 04 24             mov    %rax,(%rsp)
  15:   c3                      retq
  16:   66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)

The thunk is an alternative_2, where one option is a jmp to the
retpoline. This was done so that objtool didn't need to deal with
alternatives with stack ops. But that problem has been solved, so now
it is possible to fold the entire retpoline into the alternative to
simplify and consolidate unused bytes:

  0000000000000000 <__x86_indirect_thunk_rax>:
   0:   ff e0                   jmpq   *%rax
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   90                      nop
   a:   90                      nop
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
  10:   90                      nop
  11:   66 66 2e 0f 1f 84 00 00 00 00 00        data16 nopw %cs:0x0(%rax,%rax,1)
  1c:   0f 1f 40 00             nopl   0x0(%rax)

Notice that since the longest alternative sequence is now:

   0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
   5:   f3 90                   pause
   7:   0f ae e8                lfence
   a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
   c:   48 89 04 24             mov    %rax,(%rsp)
  10:   c3                      retq

17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
if we can shrink the retpoline by 1 byte we can pack it more dense)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |    7 -------
 arch/x86/include/asm/nospec-branch.h  |    6 +++---
 arch/x86/lib/retpoline.S              |   34 +++++++++++++++++-----------------
 tools/objtool/check.c                 |    3 +--
 4 files changed, 21 insertions(+), 29 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -22,15 +22,8 @@ extern void cmpxchg8b_emu(void);
 #define DECL_INDIRECT_THUNK(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 
-#define DECL_RETPOLINE(reg) \
-	extern asmlinkage void __x86_retpoline_ ## reg (void);
-
 #undef GEN
 #define GEN(reg) DECL_INDIRECT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 
-#undef GEN
-#define GEN(reg) DECL_RETPOLINE(reg)
-#include <asm/GEN-for-each-reg.h>
-
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -81,7 +81,7 @@
 .macro JMP_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
-		      __stringify(jmp __x86_retpoline_\reg), X86_FEATURE_RETPOLINE, \
+		      __stringify(jmp __x86_indirect_thunk_\reg), X86_FEATURE_RETPOLINE, \
 		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
 #else
 	jmp	*%\reg
@@ -91,7 +91,7 @@
 .macro CALL_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
-		      __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE, \
+		      __stringify(call __x86_indirect_thunk_\reg), X86_FEATURE_RETPOLINE, \
 		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
 #else
 	call	*%\reg
@@ -129,7 +129,7 @@
 	ALTERNATIVE_2(						\
 	ANNOTATE_RETPOLINE_SAFE					\
 	"call *%[thunk_target]\n",				\
-	"call __x86_retpoline_%V[thunk_target]\n",		\
+	"call __x86_indirect_thunk_%V[thunk_target]\n",		\
 	X86_FEATURE_RETPOLINE,					\
 	"lfence;\n"						\
 	ANNOTATE_RETPOLINE_SAFE					\
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -10,27 +10,31 @@
 #include <asm/unwind_hints.h>
 #include <asm/frame.h>
 
-.macro THUNK reg
-	.section .text.__x86.indirect_thunk
-
-	.align 32
-SYM_FUNC_START(__x86_indirect_thunk_\reg)
-	JMP_NOSPEC \reg
-SYM_FUNC_END(__x86_indirect_thunk_\reg)
-
-SYM_FUNC_START_NOALIGN(__x86_retpoline_\reg)
+.macro RETPOLINE reg
 	ANNOTATE_INTRA_FUNCTION_CALL
-	call	.Ldo_rop_\@
+	call    .Ldo_rop_\@
 .Lspec_trap_\@:
 	UNWIND_HINT_EMPTY
 	pause
 	lfence
-	jmp	.Lspec_trap_\@
+	jmp .Lspec_trap_\@
 .Ldo_rop_\@:
-	mov	%\reg, (%_ASM_SP)
+	mov     %\reg, (%_ASM_SP)
 	UNWIND_HINT_FUNC
 	ret
-SYM_FUNC_END(__x86_retpoline_\reg)
+.endm
+
+.macro THUNK reg
+	.section .text.__x86.indirect_thunk
+
+	.align 32
+SYM_FUNC_START(__x86_indirect_thunk_\reg)
+
+	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
+		      __stringify(RETPOLINE \reg), X86_FEATURE_RETPOLINE, \
+		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
+
+SYM_FUNC_END(__x86_indirect_thunk_\reg)
 
 .endm
 
@@ -48,7 +52,6 @@ SYM_FUNC_END(__x86_retpoline_\reg)
 
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
-#define EXPORT_RETPOLINE(reg)  __EXPORT_THUNK(__x86_retpoline_ ## reg)
 
 #undef GEN
 #define GEN(reg) THUNK reg
@@ -58,6 +61,3 @@ SYM_FUNC_END(__x86_retpoline_\reg)
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 
-#undef GEN
-#define GEN(reg) EXPORT_RETPOLINE(reg)
-#include <asm/GEN-for-each-reg.h>
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -800,8 +800,7 @@ static int add_jump_destinations(struct
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (!strncmp(reloc->sym->name, "__x86_indirect_thunk_", 21) ||
-			   !strncmp(reloc->sym->name, "__x86_retpoline_", 16)) {
+		} else if (!strncmp(reloc->sym->name, "__x86_indirect_thunk_", 21)) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 04/14] objtool: Correctly handle retpoline thunk calls
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 03/14] x86/retpoline: Simplify retpolines Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 05/14] objtool: Per arch retpoline naming Peter Zijlstra
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Just like JMP handling, convert a direct CALL to a retpoline thunk
into a retpoline safe indirect CALL.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -953,6 +953,18 @@ static int add_call_destinations(struct
 					  dest_off);
 				return -1;
 			}
+
+		} else if (!strncmp(reloc->sym->name, "__x86_indirect_thunk_", 21)) {
+			/*
+			 * Retpoline calls are really dynamic calls in
+			 * disguise, so convert them accordingly.
+			 */
+			insn->type = INSN_CALL_DYNAMIC;
+			insn->retpoline_safe = true;
+
+			remove_insn_ops(insn);
+			continue;
+
 		} else
 			insn->call_dest = reloc->sym;
 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 05/14] objtool: Per arch retpoline naming
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 04/14] objtool: Correctly handle retpoline thunk calls Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  2:38   ` Josh Poimboeuf
  2021-03-18 17:11 ` [PATCH v2 06/14] objtool: Fix static_call list generation Peter Zijlstra
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

The __x86_indirect_ naming is obviously not generic. Shorten to allow
matching some additional magic names later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/arch/x86/decode.c      |    5 +++++
 tools/objtool/check.c                |    7 ++++++-
 tools/objtool/include/objtool/arch.h |    2 ++
 3 files changed, 13 insertions(+), 1 deletion(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -692,3 +692,8 @@ int arch_decode_hint_reg(struct instruct
 
 	return 0;
 }
+
+bool arch_is_retpoline(struct symbol *sym)
+{
+	return !strncmp(sym->name, "__x86_indirect_", 15);
+}
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -850,6 +850,11 @@ static int add_ignore_alternatives(struc
 	return 0;
 }
 
+__weak bool arch_is_retpoline(struct symbol *sym)
+{
+	return false;
+}
+
 /*
  * Find the destination instructions for all jumps.
  */
@@ -872,7 +877,7 @@ static int add_jump_destinations(struct
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (!strncmp(reloc->sym->name, "__x86_indirect_thunk_", 21)) {
+		} else if (arch_is_retpoline(reloc->sym)) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -85,4 +85,6 @@ const char *arch_nop_insn(int len);
 
 int arch_decode_hint_reg(struct instruction *insn, u8 sp_reg);
 
+bool arch_is_retpoline(struct symbol *sym);
+
 #endif /* _ARCH_H */



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 06/14] objtool: Fix static_call list generation
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 05/14] objtool: Per arch retpoline naming Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-22 12:44   ` Miroslav Benes
  2021-03-18 17:11 ` [PATCH v2 07/14] objtool: Rework rebuild_reloc logic Peter Zijlstra
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Currently objtool generates tail call entries in
add_jump_destination() but waits until validate_branch() to generate
the regular call entries, move these to add_call_destination() for
consistency.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1045,6 +1045,12 @@ static int add_call_destinations(struct
 		} else
 			insn->call_dest = reloc->sym;
 
+		if (insn->call_dest && insn->call_dest->static_call_tramp) {
+			list_add_tail(&insn->static_call_node,
+				      &file->static_call_list);
+		}
+
+
 		/*
 		 * Many compilers cannot disable KCOV with a function attribute
 		 * so they need a little help, NOP out any KCOV calls from noinstr
@@ -1788,6 +1794,9 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
+	/*
+	 * Must be before add_{jump_call}_destination.
+	 */
 	ret = read_static_call_tramps(file);
 	if (ret)
 		return ret;
@@ -1800,6 +1809,10 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
+	/*
+	 * Must be before add_call_destination(); it changes INSN_CALL to
+	 * INSN_JUMP.
+	 */
 	ret = read_intra_function_calls(file);
 	if (ret)
 		return ret;
@@ -2745,11 +2758,6 @@ static int validate_branch(struct objtoo
 			if (dead_end_function(file, insn->call_dest))
 				return 0;
 
-			if (insn->type == INSN_CALL && insn->call_dest->static_call_tramp) {
-				list_add_tail(&insn->static_call_node,
-					      &file->static_call_list);
-			}
-
 			break;
 
 		case INSN_JUMP_CONDITIONAL:



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 07/14] objtool: Rework rebuild_reloc logic
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 06/14] objtool: Fix static_call list generation Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 08/14] objtool: Add elf_create_reloc() helper Peter Zijlstra
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Instead of manually calling elf_rebuild_reloc_section() on sections
we've called elf_add_reloc() on, have elf_write() DTRT.

This makes it easier to add random relocations in places without
carefully tracking when we're done and need to flush what section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c               |    6 ------
 tools/objtool/elf.c                 |   20 ++++++++++++++------
 tools/objtool/include/objtool/elf.h |    1 -
 tools/objtool/orc_gen.c             |    3 ---
 4 files changed, 14 insertions(+), 16 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -542,9 +542,6 @@ static int create_static_call_sections(s
 		idx++;
 	}
 
-	if (elf_rebuild_reloc_section(file->elf, reloc_sec))
-		return -1;
-
 	return 0;
 }
 
@@ -614,9 +611,6 @@ static int create_mcount_loc_sections(st
 		idx++;
 	}
 
-	if (elf_rebuild_reloc_section(file->elf, reloc_sec))
-		return -1;
-
 	return 0;
 }
 
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -479,6 +479,8 @@ void elf_add_reloc(struct elf *elf, stru
 
 	list_add_tail(&reloc->list, &sec->reloc_list);
 	elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
+
+	sec->changed = true;
 }
 
 static int read_rel_reloc(struct section *sec, int i, struct reloc *reloc, unsigned int *symndx)
@@ -558,7 +560,9 @@ static int read_relocs(struct elf *elf)
 				return -1;
 			}
 
-			elf_add_reloc(elf, reloc);
+			list_add_tail(&reloc->list, &sec->reloc_list);
+			elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
+
 			nr_reloc++;
 		}
 		max_reloc = max(max_reloc, nr_reloc);
@@ -873,14 +877,11 @@ static int elf_rebuild_rela_reloc_sectio
 	return 0;
 }
 
-int elf_rebuild_reloc_section(struct elf *elf, struct section *sec)
+static int elf_rebuild_reloc_section(struct elf *elf, struct section *sec)
 {
 	struct reloc *reloc;
 	int nr;
 
-	sec->changed = true;
-	elf->changed = true;
-
 	nr = 0;
 	list_for_each_entry(reloc, &sec->reloc_list, list)
 		nr++;
@@ -944,9 +945,15 @@ int elf_write(struct elf *elf)
 	struct section *sec;
 	Elf_Scn *s;
 
-	/* Update section headers for changed sections: */
+	/* Update changed relocation sections and section headers: */
 	list_for_each_entry(sec, &elf->sections, list) {
 		if (sec->changed) {
+			if (sec->reloc &&
+			    elf_rebuild_reloc_section(elf, sec->reloc)) {
+				WARN_ELF("elf_rebuild_reloc_section");
+				return -1;
+			}
+
 			s = elf_getscn(elf->elf, sec->idx);
 			if (!s) {
 				WARN_ELF("elf_getscn");
@@ -958,6 +965,7 @@ int elf_write(struct elf *elf)
 			}
 
 			sec->changed = false;
+			elf->changed = true;
 		}
 	}
 
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -142,7 +142,6 @@ struct reloc *find_reloc_by_dest_range(c
 struct symbol *find_func_containing(struct section *sec, unsigned long offset);
 void insn_to_reloc_sym_addend(struct section *sec, unsigned long offset,
 			      struct reloc *reloc);
-int elf_rebuild_reloc_section(struct elf *elf, struct section *sec);
 
 #define for_each_sec(file, sec)						\
 	list_for_each_entry(sec, &file->elf->sections, list)
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -254,8 +254,5 @@ int orc_create(struct objtool_file *file
 			return -1;
 	}
 
-	if (elf_rebuild_reloc_section(file->elf, ip_rsec))
-		return -1;
-
 	return 0;
 }



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 08/14] objtool: Add elf_create_reloc() helper
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 07/14] objtool: Rework rebuild_reloc logic Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  1:42   ` Josh Poimboeuf
  2021-03-18 17:11 ` [PATCH v2 09/14] objtool: Extract elf_strtab_concat() Peter Zijlstra
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

We have 4 instances of adding a relocation. Create a common helper
to avoid growing even more.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                 |   76 ++++++----------------------
 tools/objtool/elf.c                   |   90 +++++++++++++++++++++++-----------
 tools/objtool/include/objtool/check.h |    7 ++
 tools/objtool/include/objtool/elf.h   |   10 +++
 tools/objtool/orc_gen.c               |   57 ++++++++-------------
 5 files changed, 119 insertions(+), 121 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -433,8 +433,7 @@ static int add_dead_ends(struct objtool_
 
 static int create_static_call_sections(struct objtool_file *file)
 {
-	struct section *sec, *reloc_sec;
-	struct reloc *reloc;
+	struct section *sec;
 	struct static_call_site *site;
 	struct instruction *insn;
 	struct symbol *key_sym;
@@ -460,8 +459,7 @@ static int create_static_call_sections(s
 	if (!sec)
 		return -1;
 
-	reloc_sec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
-	if (!reloc_sec)
+	if (!elf_create_reloc_section(file->elf, sec, SHT_RELA))
 		return -1;
 
 	idx = 0;
@@ -471,26 +469,13 @@ static int create_static_call_sections(s
 		memset(site, 0, sizeof(struct static_call_site));
 
 		/* populate reloc for 'addr' */
-		reloc = malloc(sizeof(*reloc));
-
-		if (!reloc) {
-			perror("malloc");
-			return -1;
-		}
-		memset(reloc, 0, sizeof(*reloc));
-
-		insn_to_reloc_sym_addend(insn->sec, insn->offset, reloc);
-		if (!reloc->sym) {
-			WARN_FUNC("static call tramp: missing containing symbol",
-				  insn->sec, insn->offset);
+		if (!elf_create_reloc(file->elf, sec, idx * sizeof(struct static_call_site),
+				      R_X86_64_PC32,
+				      reloc_to_insn, insn, 0)) {
+			WARN_ELF("elf_create_reloc: static_call_site::addr");
 			return -1;
 		}
 
-		reloc->type = R_X86_64_PC32;
-		reloc->offset = idx * sizeof(struct static_call_site);
-		reloc->sec = reloc_sec;
-		elf_add_reloc(file->elf, reloc);
-
 		/* find key symbol */
 		key_name = strdup(insn->call_dest->name);
 		if (!key_name) {
@@ -526,18 +511,13 @@ static int create_static_call_sections(s
 		free(key_name);
 
 		/* populate reloc for 'key' */
-		reloc = malloc(sizeof(*reloc));
-		if (!reloc) {
-			perror("malloc");
+		if (!elf_create_reloc(file->elf, sec, idx * sizeof(struct static_call_site) + 4,
+				      R_X86_64_PC32,
+				      reloc_to_sym, key_sym,
+				      is_sibling_call(insn) * STATIC_CALL_SITE_TAIL)) {
+			WARN_ELF("elf_create_reloc: static_call_site::key");
 			return -1;
 		}
-		memset(reloc, 0, sizeof(*reloc));
-		reloc->sym = key_sym;
-		reloc->addend = is_sibling_call(insn) ? STATIC_CALL_SITE_TAIL : 0;
-		reloc->type = R_X86_64_PC32;
-		reloc->offset = idx * sizeof(struct static_call_site) + 4;
-		reloc->sec = reloc_sec;
-		elf_add_reloc(file->elf, reloc);
 
 		idx++;
 	}
@@ -547,8 +527,7 @@ static int create_static_call_sections(s
 
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
-	struct section *sec, *reloc_sec;
-	struct reloc *reloc;
+	struct section *sec;
 	unsigned long *loc;
 	struct instruction *insn;
 	int idx;
@@ -571,8 +550,7 @@ static int create_mcount_loc_sections(st
 	if (!sec)
 		return -1;
 
-	reloc_sec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
-	if (!reloc_sec)
+	if (!elf_create_reloc_section(file->elf, sec, SHT_RELA))
 		return -1;
 
 	idx = 0;
@@ -581,32 +559,12 @@ static int create_mcount_loc_sections(st
 		loc = (unsigned long *)sec->data->d_buf + idx;
 		memset(loc, 0, sizeof(unsigned long));
 
-		reloc = malloc(sizeof(*reloc));
-		if (!reloc) {
-			perror("malloc");
+		if (!elf_create_reloc(file->elf, sec, idx * sizeof(unsigned long),
+				      R_X86_64_64,
+				      reloc_to_insn, insn, 0)) {
+			WARN_ELF("elf_create_reloc: __mcount_loc");
 			return -1;
 		}
-		memset(reloc, 0, sizeof(*reloc));
-
-		if (insn->sec->sym) {
-			reloc->sym = insn->sec->sym;
-			reloc->addend = insn->offset;
-		} else {
-			reloc->sym = find_symbol_containing(insn->sec, insn->offset);
-
-			if (!reloc->sym) {
-				WARN("missing symbol for insn at offset 0x%lx\n",
-				     insn->offset);
-				return -1;
-			}
-
-			reloc->addend = insn->offset - reloc->sym->offset;
-		}
-
-		reloc->type = R_X86_64_64;
-		reloc->offset = idx * sizeof(unsigned long);
-		reloc->sec = reloc_sec;
-		elf_add_reloc(file->elf, reloc);
 
 		idx++;
 	}
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -211,32 +211,6 @@ struct reloc *find_reloc_by_dest(const s
 	return find_reloc_by_dest_range(elf, sec, offset, 1);
 }
 
-void insn_to_reloc_sym_addend(struct section *sec, unsigned long offset,
-			      struct reloc *reloc)
-{
-	if (sec->sym) {
-		reloc->sym = sec->sym;
-		reloc->addend = offset;
-		return;
-	}
-
-	/*
-	 * The Clang assembler strips section symbols, so we have to reference
-	 * the function symbol instead:
-	 */
-	reloc->sym = find_symbol_containing(sec, offset);
-	if (!reloc->sym) {
-		/*
-		 * Hack alert.  This happens when we need to reference the NOP
-		 * pad insn immediately after the function.
-		 */
-		reloc->sym = find_symbol_containing(sec, offset - 1);
-	}
-
-	if (reloc->sym)
-		reloc->addend = offset - reloc->sym->offset;
-}
-
 static int read_sections(struct elf *elf)
 {
 	Elf_Scn *s = NULL;
@@ -473,7 +447,7 @@ static int read_symbols(struct elf *elf)
 	return -1;
 }
 
-void elf_add_reloc(struct elf *elf, struct reloc *reloc)
+static void elf_add_reloc(struct elf *elf, struct reloc *reloc)
 {
 	struct section *sec = reloc->sec;
 
@@ -483,6 +457,68 @@ void elf_add_reloc(struct elf *elf, stru
 	sec->changed = true;
 }
 
+void reloc_to_sym(struct reloc *reloc, void *dst)
+{
+	struct symbol *sym = dst;
+	reloc->sym = sym;
+}
+
+void __reloc_to_insn(struct reloc *reloc, struct section *sec, unsigned long offset)
+{
+	if (sec->sym) {
+		reloc->sym = sec->sym;
+		reloc->addend = offset;
+		return;
+	}
+
+	/*
+	 * The Clang assembler strips section symbols, so we have to reference
+	 * the function symbol instead:
+	 */
+	reloc->sym = find_symbol_containing(sec, offset);
+	if (!reloc->sym) {
+		/*
+		 * Hack alert.  This happens when we need to reference the NOP
+		 * pad insn immediately after the function.
+		 */
+		reloc->sym = find_symbol_containing(sec, offset - 1);
+	}
+
+	if (reloc->sym)
+		reloc->addend = offset - reloc->sym->offset;
+}
+
+struct reloc *elf_create_reloc(struct elf *elf,
+			       struct section *sec, unsigned long offset,
+			       unsigned int type,
+			       void (*set_dest)(struct reloc *, void *), void *dst, int addend)
+{
+	struct reloc *reloc;
+
+	reloc = malloc(sizeof(*reloc));
+	if (!reloc) {
+		perror("malloc");
+		return NULL;
+	}
+	memset(reloc, 0, sizeof(*reloc));
+
+	reloc->sec = sec->reloc;
+	reloc->offset = offset;
+	reloc->type = type;
+
+	set_dest(reloc, dst);
+	if (!reloc->sym) {
+		WARN("failed to create reloc");
+		return NULL;
+	}
+
+	reloc->addend += addend;
+
+	elf_add_reloc(elf, reloc);
+
+	return reloc;
+}
+
 static int read_rel_reloc(struct section *sec, int i, struct reloc *reloc, unsigned int *symndx)
 {
 	if (!gelf_getrel(sec->data, i, &reloc->rel)) {
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -9,6 +9,7 @@
 #include <stdbool.h>
 #include <objtool/cfi.h>
 #include <objtool/arch.h>
+#include <objtool/elf.h>
 
 struct insn_state {
 	struct cfi_state cfi;
@@ -91,4 +92,10 @@ struct instruction *find_insn(struct obj
 			insn->sec == sec;				\
 	     insn = list_next_entry(insn, list))
 
+static inline void reloc_to_insn(struct reloc *reloc, void *dst)
+{
+	struct instruction *insn = dst;
+	__reloc_to_insn(reloc, insn->sec, insn->offset);
+}
+
 #endif /* _CHECK_H */
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -123,7 +123,15 @@ static inline u32 reloc_hash(struct relo
 struct elf *elf_open_read(const char *name, int flags);
 struct section *elf_create_section(struct elf *elf, const char *name, unsigned int sh_flags, size_t entsize, int nr);
 struct section *elf_create_reloc_section(struct elf *elf, struct section *base, int reltype);
-void elf_add_reloc(struct elf *elf, struct reloc *reloc);
+
+void reloc_to_sym(struct reloc *reloc, void *sym);
+void __reloc_to_insn(struct reloc *reloc, struct section *sec, unsigned long offset);
+
+struct reloc *elf_create_reloc(struct elf *elf,
+			       struct section *sec, unsigned long offset,
+			       unsigned int type,
+			       void (*set_dest)(struct reloc *reloc, void *), void *dst, int addend);
+
 int elf_write_insn(struct elf *elf, struct section *sec,
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -81,13 +81,25 @@ static int init_orc_entry(struct orc_ent
 	return 0;
 }
 
+struct orc_list_entry {
+	struct list_head list;
+	struct orc_entry orc;
+	struct section *insn_sec;
+	unsigned long insn_off;
+};
+
+static void reloc_to_orc(struct reloc *reloc, void *dst)
+{
+	struct orc_list_entry *entry = dst;
+	__reloc_to_insn(reloc, entry->insn_sec, entry->insn_off);
+}
+
 static int write_orc_entry(struct elf *elf, struct section *orc_sec,
-			   struct section *ip_rsec, unsigned int idx,
-			   struct section *insn_sec, unsigned long insn_off,
-			   struct orc_entry *o)
+			   struct section *ip_sec, unsigned int idx,
+			   struct orc_list_entry *entry)
 {
+	struct orc_entry *o = &entry->orc;
 	struct orc_entry *orc;
-	struct reloc *reloc;
 
 	/* populate ORC data */
 	orc = (struct orc_entry *)orc_sec->data->d_buf + idx;
@@ -96,36 +108,16 @@ static int write_orc_entry(struct elf *e
 	orc->bp_offset = bswap_if_needed(orc->bp_offset);
 
 	/* populate reloc for ip */
-	reloc = malloc(sizeof(*reloc));
-	if (!reloc) {
-		perror("malloc");
-		return -1;
-	}
-	memset(reloc, 0, sizeof(*reloc));
-
-	insn_to_reloc_sym_addend(insn_sec, insn_off, reloc);
-	if (!reloc->sym) {
-		WARN("missing symbol for insn at offset 0x%lx",
-		     insn_off);
+	if (!elf_create_reloc(elf, ip_sec, idx * sizeof(int),
+			      R_X86_64_PC32,
+			      reloc_to_orc, entry, 0)) {
+		WARN_ELF("elf_create_reloc: ORC ip");
 		return -1;
 	}
 
-	reloc->type = R_X86_64_PC32;
-	reloc->offset = idx * sizeof(int);
-	reloc->sec = ip_rsec;
-
-	elf_add_reloc(elf, reloc);
-
 	return 0;
 }
 
-struct orc_list_entry {
-	struct list_head list;
-	struct orc_entry orc;
-	struct section *insn_sec;
-	unsigned long insn_off;
-};
-
 static int orc_list_add(struct list_head *orc_list, struct orc_entry *orc,
 			struct section *sec, unsigned long offset)
 {
@@ -153,7 +145,7 @@ static unsigned long alt_group_len(struc
 
 int orc_create(struct objtool_file *file)
 {
-	struct section *sec, *ip_rsec, *orc_sec;
+	struct section *sec, *orc_sec;
 	unsigned int nr = 0, idx = 0;
 	struct orc_list_entry *entry;
 	struct list_head orc_list;
@@ -242,15 +234,12 @@ int orc_create(struct objtool_file *file
 	sec = elf_create_section(file->elf, ".orc_unwind_ip", 0, sizeof(int), nr);
 	if (!sec)
 		return -1;
-	ip_rsec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
-	if (!ip_rsec)
+	if (!elf_create_reloc_section(file->elf, sec, SHT_RELA))
 		return -1;
 
 	/* Write ORC entries to sections: */
 	list_for_each_entry(entry, &orc_list, list) {
-		if (write_orc_entry(file->elf, orc_sec, ip_rsec, idx++,
-				    entry->insn_sec, entry->insn_off,
-				    &entry->orc))
+		if (write_orc_entry(file->elf, orc_sec, sec, idx++, entry))
 			return -1;
 	}
 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 09/14] objtool: Extract elf_strtab_concat()
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (7 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 08/14] objtool: Add elf_create_reloc() helper Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  2:10   ` Josh Poimboeuf
  2021-03-18 17:11 ` [PATCH v2 10/14] objtool: Extract elf_symbol_add() Peter Zijlstra
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Create a common helper to append strings to a strtab.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/elf.c |   73 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 31 deletions(-)

--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -676,13 +676,51 @@ struct elf *elf_open_read(const char *na
 	return NULL;
 }
 
+static int elf_strtab_concat(struct elf *elf, char *name, const char *strtab_name)
+{
+	struct section *strtab = NULL;
+	Elf_Data *data;
+	Elf_Scn *s;
+	int len;
+
+	if (strtab_name)
+		strtab = find_section_by_name(elf, strtab_name);
+	if (!strtab)
+		strtab = find_section_by_name(elf, ".strtab");
+	if (!strtab) {
+		WARN("can't find %s or .strtab section", strtab_name);
+		return -1;
+	}
+
+	s = elf_getscn(elf->elf, strtab->idx);
+	if (!s) {
+		WARN_ELF("elf_getscn");
+		return -1;
+	}
+
+	data = elf_newdata(s);
+	if (!data) {
+		WARN_ELF("elf_newdata");
+		return -1;
+	}
+
+	data->d_buf = name;
+	data->d_size = strlen(name) + 1;;
+	data->d_align = 1;
+
+	len = strtab->len;
+	strtab->len += data->d_size;
+	strtab->changed = true;
+
+	return len;
+}
+
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
-	struct section *sec, *shstrtab;
+	struct section *sec;
 	size_t size = entsize * nr;
 	Elf_Scn *s;
-	Elf_Data *data;
 
 	sec = malloc(sizeof(*sec));
 	if (!sec) {
@@ -739,36 +777,9 @@ struct section *elf_create_section(struc
 	sec->sh.sh_addralign = 1;
 	sec->sh.sh_flags = SHF_ALLOC | sh_flags;
 
-
-	/* Add section name to .shstrtab (or .strtab for Clang) */
-	shstrtab = find_section_by_name(elf, ".shstrtab");
-	if (!shstrtab)
-		shstrtab = find_section_by_name(elf, ".strtab");
-	if (!shstrtab) {
-		WARN("can't find .shstrtab or .strtab section");
-		return NULL;
-	}
-
-	s = elf_getscn(elf->elf, shstrtab->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
+	sec->sh.sh_name = elf_strtab_concat(elf, sec->name, ".shstrtab");
+	if (sec->sh.sh_name == -1)
 		return NULL;
-	}
-
-	data = elf_newdata(s);
-	if (!data) {
-		WARN_ELF("elf_newdata");
-		return NULL;
-	}
-
-	data->d_buf = sec->name;
-	data->d_size = strlen(name) + 1;
-	data->d_align = 1;
-
-	sec->sh.sh_name = shstrtab->len;
-
-	shstrtab->len += strlen(name) + 1;
-	shstrtab->changed = true;
 
 	list_add_tail(&sec->list, &elf->sections);
 	elf_hash_add(elf->section_hash, &sec->hash, sec->idx);



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 10/14] objtool: Extract elf_symbol_add()
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (8 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 09/14] objtool: Extract elf_strtab_concat() Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  2:14   ` Josh Poimboeuf
  2021-03-18 17:11 ` [PATCH v2 11/14] objtool: Add elf_create_undef_symbol() Peter Zijlstra
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Create a common helper to add symbols.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/elf.c |   57 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -290,12 +290,41 @@ static int read_sections(struct elf *elf
 	return 0;
 }
 
+static bool elf_symbol_add(struct elf *elf, struct symbol *sym)
+{
+	struct list_head *entry;
+	struct rb_node *pnode;
+
+	sym->type = GELF_ST_TYPE(sym->sym.st_info);
+	sym->bind = GELF_ST_BIND(sym->sym.st_info);
+
+	sym->offset = sym->sym.st_value;
+	sym->len = sym->sym.st_size;
+
+	rb_add(&sym->node, &sym->sec->symbol_tree, symbol_to_offset);
+	pnode = rb_prev(&sym->node);
+	if (pnode)
+		entry = &rb_entry(pnode, struct symbol, node)->list;
+	else
+		entry = &sym->sec->symbol_list;
+	list_add(&sym->list, entry);
+	elf_hash_add(elf->symbol_hash, &sym->hash, sym->idx);
+	elf_hash_add(elf->symbol_name_hash, &sym->name_hash, str_hash(sym->name));
+
+	/*
+	 * Don't store empty STT_NOTYPE symbols in the rbtree.  They
+	 * can exist within a function, confusing the sorting.
+	 */
+	if (!sym->len)
+		rb_erase(&sym->node, &sym->sec->symbol_tree);
+
+	return true;
+}
+
 static int read_symbols(struct elf *elf)
 {
 	struct section *symtab, *symtab_shndx, *sec;
 	struct symbol *sym, *pfunc;
-	struct list_head *entry;
-	struct rb_node *pnode;
 	int symbols_nr, i;
 	char *coldstr;
 	Elf_Data *shndx_data = NULL;
@@ -340,9 +369,6 @@ static int read_symbols(struct elf *elf)
 			goto err;
 		}
 
-		sym->type = GELF_ST_TYPE(sym->sym.st_info);
-		sym->bind = GELF_ST_BIND(sym->sym.st_info);
-
 		if ((sym->sym.st_shndx > SHN_UNDEF &&
 		     sym->sym.st_shndx < SHN_LORESERVE) ||
 		    (shndx_data && sym->sym.st_shndx == SHN_XINDEX)) {
@@ -362,25 +388,8 @@ static int read_symbols(struct elf *elf)
 		} else
 			sym->sec = find_section_by_index(elf, 0);
 
-		sym->offset = sym->sym.st_value;
-		sym->len = sym->sym.st_size;
-
-		rb_add(&sym->node, &sym->sec->symbol_tree, symbol_to_offset);
-		pnode = rb_prev(&sym->node);
-		if (pnode)
-			entry = &rb_entry(pnode, struct symbol, node)->list;
-		else
-			entry = &sym->sec->symbol_list;
-		list_add(&sym->list, entry);
-		elf_hash_add(elf->symbol_hash, &sym->hash, sym->idx);
-		elf_hash_add(elf->symbol_name_hash, &sym->name_hash, str_hash(sym->name));
-
-		/*
-		 * Don't store empty STT_NOTYPE symbols in the rbtree.  They
-		 * can exist within a function, confusing the sorting.
-		 */
-		if (!sym->len)
-			rb_erase(&sym->node, &sym->sec->symbol_tree);
+		if (!elf_symbol_add(elf, sym))
+			goto err;
 	}
 
 	if (stats)



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 11/14] objtool: Add elf_create_undef_symbol()
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (9 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 10/14] objtool: Extract elf_symbol_add() Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  2:29   ` Josh Poimboeuf
  2021-03-18 17:11 ` [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines Peter Zijlstra
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

Allow objtool to create undefined symbols; this allows creating
relocations to symbols not currently in the symbol table.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/elf.c                 |   63 ++++++++++++++++++++++++++++++++++++
 tools/objtool/include/objtool/elf.h |    1 
 2 files changed, 64 insertions(+)

--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -724,6 +724,69 @@ static int elf_strtab_concat(struct elf
 	return len;
 }
 
+struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
+{
+	struct section *symtab;
+	struct symbol *sym;
+	Elf_Data *data;
+	Elf_Scn *s;
+
+	sym = malloc(sizeof(*sym));
+	if (!sym) {
+		perror("malloc");
+		return NULL;
+	}
+	memset(sym, 0, sizeof(*sym));
+
+	sym->name = strdup(name);
+
+	sym->sym.st_name = elf_strtab_concat(elf, sym->name, NULL);
+	if (sym->sym.st_name == -1)
+		return NULL;
+
+	sym->sym.st_info = 0x10; /* STB_GLOBAL, STT_NOTYPE */
+	// st_other 0
+	// st_shndx 0
+	// st_value 0
+	// st_size 0
+
+	symtab = find_section_by_name(elf, ".symtab");
+	if (!symtab) {
+		WARN("can't find .symtab");
+		return NULL;
+	}
+
+	s = elf_getscn(elf->elf, symtab->idx);
+	if (!s) {
+		WARN_ELF("elf_getscn");
+		return NULL;
+	}
+
+	data = elf_newdata(s);
+	if (!data) {
+		WARN_ELF("elf_newdata");
+		return NULL;
+	}
+
+	data->d_buf = &sym->sym;
+	data->d_size = sizeof(sym->sym);
+	data->d_align = 1;
+
+	sym->idx = symtab->len / sizeof(sym->sym);
+
+	symtab->len += data->d_size;
+	symtab->changed = true;
+
+	sym->sec = find_section_by_index(elf, 0);
+
+	if (!elf_symbol_add(elf, sym)) {
+		WARN("elf_symbol_add");
+		return NULL;
+	}
+
+	return sym;
+}
+
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -136,6 +136,7 @@ int elf_write_insn(struct elf *elf, stru
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
 int elf_write_reloc(struct elf *elf, struct reloc *reloc);
+struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name);
 int elf_write(struct elf *elf);
 void elf_close(struct elf *elf);
 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (10 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 11/14] objtool: Add elf_create_undef_symbol() Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  2:54   ` Josh Poimboeuf
  2021-03-18 17:11 ` [PATCH v2 13/14] objtool: Skip magical retpoline .altinstr_replacement Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls Peter Zijlstra
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

When retpolines are employed, compilers typically emit calls to
retpoline thunks. Objtool recognises these calls and marks them as
dynamic calls.

Provide infrastructure for architectures to rewrite/augment what the
compiler wrote for us.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                |   23 +++++++++++++++++++++--
 tools/objtool/include/objtool/arch.h |    3 +++
 2 files changed, 24 insertions(+), 2 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -852,6 +852,14 @@ __weak bool arch_is_retpoline(struct sym
 	return false;
 }
 
+
+__weak int arch_rewrite_retpoline(struct objtool_file *file,
+				  struct instruction *insn,
+				  struct reloc *reloc)
+{
+	return 0;
+}
+
 /*
  * Find the destination instructions for all jumps.
  */
@@ -885,6 +893,9 @@ static int add_jump_destinations(struct
 				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
 
 			insn->retpoline_safe = true;
+
+			arch_rewrite_retpoline(file, insn, reloc);
+
 			continue;
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
@@ -1036,6 +1047,8 @@ static int add_call_destinations(struct
 			insn->type = INSN_CALL_DYNAMIC;
 			insn->retpoline_safe = true;
 
+			arch_rewrite_retpoline(file, insn, reloc);
+
 			remove_insn_ops(insn);
 			continue;
 
@@ -1212,6 +1225,8 @@ static int handle_group_alt(struct objto
 		dest_off = arch_jump_destination(insn);
 		if (dest_off == special_alt->new_off + special_alt->new_len)
 			insn->jump_dest = next_insn_same_sec(file, last_orig_insn);
+		else
+			insn->jump_dest = find_insn(file, insn->sec, dest_off);
 
 		if (!insn->jump_dest) {
 			WARN_FUNC("can't find alternative jump destination",
@@ -1797,11 +1812,15 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
-	ret = add_jump_destinations(file);
+	/*
+	 * Must be before add_{jump,call}_destination; for they can add
+	 * magic alternatives.
+	 */
+	ret = add_special_section_alts(file);
 	if (ret)
 		return ret;
 
-	ret = add_special_section_alts(file);
+	ret = add_jump_destinations(file);
 	if (ret)
 		return ret;
 
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -82,6 +82,9 @@ unsigned long arch_jump_destination(stru
 unsigned long arch_dest_reloc_offset(int addend);
 
 const char *arch_nop_insn(int len);
+int arch_rewrite_retpoline(struct objtool_file *file,
+			   struct instruction *insn,
+			   struct reloc *reloc);
 
 int arch_decode_hint_reg(struct instruction *insn, u8 sp_reg);
 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 13/14] objtool: Skip magical retpoline .altinstr_replacement
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (11 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-18 17:11 ` [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls Peter Zijlstra
  13 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

When the .altinstr_replacement is a retpoline, skip the alternative.
We already special case retpolines anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/special.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -106,6 +106,14 @@ static int get_alt_entry(struct elf *elf
 			return -1;
 		}
 
+		/*
+		 * Skip retpoline .altinstr_replacement... we already rewrite the
+		 * instructions for retpolines anyway, see arch_is_retpoline()
+		 * usage in add_{call,jump}_destinations().
+		 */
+		if (arch_is_retpoline(new_reloc->sym))
+			return 1;
+
 		alt->new_sec = new_reloc->sym->sec;
 		alt->new_off = (unsigned int)new_reloc->addend;
 
@@ -154,7 +162,9 @@ int special_get_alts(struct elf *elf, st
 			memset(alt, 0, sizeof(*alt));
 
 			ret = get_alt_entry(elf, entry, sec, idx, alt);
-			if (ret)
+			if (ret > 0)
+				continue;
+			if (ret < 0)
 				return ret;
 
 			list_add_tail(&alt->list, alts);



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls
  2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
                   ` (12 preceding siblings ...)
  2021-03-18 17:11 ` [PATCH v2 13/14] objtool: Skip magical retpoline .altinstr_replacement Peter Zijlstra
@ 2021-03-18 17:11 ` Peter Zijlstra
  2021-03-19  3:29   ` Josh Poimboeuf
  13 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-18 17:11 UTC (permalink / raw)
  To: x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel, peterz

When the compiler emits: "CALL __x86_indirect_thunk_\reg" for an
indirect call, have objtool rewrite it to:

	ALTERNATIVE "call __x86_indirect_thunk_\reg",
		    "call *%reg", ALT_NOT(X86_FEATURE_RETPOLINE)

Additionally, in order to not emit endless identical
.altinst_replacement chunks, use a global symbol for them, see
__x86_indirect_alt_*.

This also avoids objtool from having to do code generation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |   12 ++-
 arch/x86/lib/retpoline.S              |   33 +++++++++
 tools/objtool/arch/x86/decode.c       |  116 ++++++++++++++++++++++++++++++++++
 3 files changed, 158 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -19,11 +19,19 @@ extern void cmpxchg8b_emu(void);
 
 #ifdef CONFIG_RETPOLINE
 
-#define DECL_INDIRECT_THUNK(reg) \
+#undef GEN
+#define GEN(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+#include <asm/GEN-for-each-reg.h>
+
+#undef GEN
+#define GEN(reg) \
+	extern asmlinkage void __x86_indirect_alt_call_ ## reg (void);
+#include <asm/GEN-for-each-reg.h>
 
 #undef GEN
-#define GEN(reg) DECL_INDIRECT_THUNK(reg)
+#define GEN(reg) \
+	extern asmlinkage void __x86_indirect_alt_jmp_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
 
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -10,6 +10,8 @@
 #include <asm/unwind_hints.h>
 #include <asm/frame.h>
 
+	.section .text.__x86.indirect_thunk
+
 .macro RETPOLINE reg
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call    .Ldo_rop_\@
@@ -25,7 +27,6 @@
 .endm
 
 .macro THUNK reg
-	.section .text.__x86.indirect_thunk
 
 	.align 32
 SYM_FUNC_START(__x86_indirect_thunk_\reg)
@@ -38,6 +39,24 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 
 .endm
 
+.macro ALT_THUNK reg
+
+	.align 1
+
+SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
+	ANNOTATE_RETPOLINE_SAFE
+1:	call	*%\reg
+2:	.skip	5-(2b-1b), 0x90
+SYM_FUNC_END(__x86_indirect_alt_call_\reg)
+
+SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
+	ANNOTATE_RETPOLINE_SAFE
+1:	jmp	*%\reg
+2:	.skip	5-(2b-1b), 0x90
+SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)
+
+.endm
+
 /*
  * Despite being an assembler file we can't just use .irp here
  * because __KSYM_DEPS__ only uses the C preprocessor and would
@@ -61,3 +80,15 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 
+#undef GEN
+#define GEN(reg) ALT_THUNK reg
+#include <asm/GEN-for-each-reg.h>
+
+#undef GEN
+#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_call_ ## reg)
+#include <asm/GEN-for-each-reg.h>
+
+#undef GEN
+#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_jmp_ ## reg)
+#include <asm/GEN-for-each-reg.h>
+
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -19,6 +19,7 @@
 #include <objtool/elf.h>
 #include <objtool/arch.h>
 #include <objtool/warn.h>
+#include <arch/elf.h>
 
 static int is_x86_64(const struct elf *elf)
 {
@@ -657,6 +658,121 @@ const char *arch_nop_insn(int len)
 	return nops[len-1];
 }
 
+/* asm/alternative.h ? */
+
+#define ALTINSTR_FLAG_INV	(1 << 15)
+#define ALT_NOT(feat)		((feat) | ALTINSTR_FLAG_INV)
+
+struct alt_instr {
+	s32 instr_offset;	/* original instruction */
+	s32 repl_offset;	/* offset to replacement instruction */
+	u16 cpuid;		/* cpuid bit set for replacement */
+	u8  instrlen;		/* length of original instruction */
+	u8  replacementlen;	/* length of new instruction */
+} __packed;
+
+static int elf_add_alternative(struct elf *elf,
+			       struct instruction *orig, struct symbol *sym,
+			       int cpuid, u8 orig_len, u8 repl_len)
+{
+	const int size = sizeof(struct alt_instr);
+	struct alt_instr *alt;
+	struct section *sec;
+	Elf_Scn *s;
+
+	sec = find_section_by_name(elf, ".altinstructions");
+	if (!sec) {
+		sec = elf_create_section(elf, ".altinstructions",
+					 SHF_WRITE, size, 0);
+
+		if (!sec) {
+			WARN_ELF("elf_create_section");
+			return -1;
+		}
+
+		if (!elf_create_reloc_section(elf, sec, SHT_RELA)) {
+			WARN_ELF("elf_create_reloc_section");
+			return -1;
+		}
+	}
+
+	s = elf_getscn(elf->elf, sec->idx);
+	if (!s) {
+		WARN_ELF("elf_getscn");
+		return -1;
+	}
+
+	sec->data = elf_newdata(s);
+	if (!sec->data) {
+		WARN_ELF("elf_newdata");
+		return -1;
+	}
+
+	sec->data->d_size = size;
+	sec->data->d_align = 1;
+
+	alt = sec->data->d_buf = malloc(size);
+	if (!sec->data->d_buf) {
+		perror("malloc");
+		return -1;
+	}
+	memset(sec->data->d_buf, 0, size);
+
+	if (!elf_create_reloc(elf, sec, sec->sh.sh_size,
+			      R_X86_64_PC32,
+			      reloc_to_insn, orig, 0)) {
+		WARN_ELF("elf_create_reloc: alt_instr::instr_offset");
+		return -1;
+	}
+
+	if (!elf_create_reloc(elf, sec, sec->sh.sh_size + 4,
+			      R_X86_64_PC32,
+			      reloc_to_sym, sym, 0)) {
+		WARN_ELF("elf_create_reloc: alt_instr::repl_offset");
+		return -1;
+	}
+
+	alt->cpuid = cpuid;
+	alt->instrlen = orig_len;
+	alt->replacementlen = repl_len;
+
+	sec->sh.sh_size += size;
+	sec->changed = true;
+
+	return 0;
+}
+
+#define X86_FEATURE_RETPOLINE                ( 7*32+12)
+
+int arch_rewrite_retpoline(struct objtool_file *file,
+			   struct instruction *insn,
+			   struct reloc *reloc)
+{
+	struct symbol *sym;
+	char name[32] = "";
+
+	if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
+		return 0;
+
+	sprintf(name, "__x86_indirect_alt_%s_%s",
+		insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
+		reloc->sym->name + 21);
+
+	sym = find_symbol_by_name(file->elf, name);
+	if (!sym) {
+		sym = elf_create_undef_symbol(file->elf, name);
+		if (!sym) {
+			WARN("elf_create_undef_symbol");
+			return -1;
+		}
+	}
+
+	elf_add_alternative(file->elf, insn, sym,
+			    ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5);
+
+	return 0;
+}
+
 int arch_decode_hint_reg(struct instruction *insn, u8 sp_reg)
 {
 	struct cfi_reg *cfa = &insn->cfi.cfa;



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] objtool: Add elf_create_reloc() helper
  2021-03-18 17:11 ` [PATCH v2 08/14] objtool: Add elf_create_reloc() helper Peter Zijlstra
@ 2021-03-19  1:42   ` Josh Poimboeuf
  2021-03-19  9:47     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  1:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:11PM +0100, Peter Zijlstra wrote:
> We have 4 instances of adding a relocation. Create a common helper
> to avoid growing even more.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

I'm not a fan of the API -- how about squashing this in?  Untested, of
course.

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 6a52d56504b1..38ffa3b2595e 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -469,12 +469,11 @@ static int create_static_call_sections(struct objtool_file *file)
 		memset(site, 0, sizeof(struct static_call_site));
 
 		/* populate reloc for 'addr' */
-		if (!elf_create_reloc(file->elf, sec, idx * sizeof(struct static_call_site),
-				      R_X86_64_PC32,
-				      reloc_to_insn, insn, 0)) {
-			WARN_ELF("elf_create_reloc: static_call_site::addr");
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(struct static_call_site),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset))
 			return -1;
-		}
 
 		/* find key symbol */
 		key_name = strdup(insn->call_dest->name);
@@ -511,13 +510,11 @@ static int create_static_call_sections(struct objtool_file *file)
 		free(key_name);
 
 		/* populate reloc for 'key' */
-		if (!elf_create_reloc(file->elf, sec, idx * sizeof(struct static_call_site) + 4,
-				      R_X86_64_PC32,
-				      reloc_to_sym, key_sym,
-				      is_sibling_call(insn) * STATIC_CALL_SITE_TAIL)) {
-			WARN_ELF("elf_create_reloc: static_call_site::key");
+		if (elf_add_reloc(file->elf, sec,
+				  idx * sizeof(struct static_call_site) + 4,
+				  R_X86_64_PC32, key_sym,
+				  is_sibling_call(insn) * STATIC_CALL_SITE_TAIL))
 			return -1;
-		}
 
 		idx++;
 	}
@@ -559,12 +556,11 @@ static int create_mcount_loc_sections(struct objtool_file *file)
 		loc = (unsigned long *)sec->data->d_buf + idx;
 		memset(loc, 0, sizeof(unsigned long));
 
-		if (!elf_create_reloc(file->elf, sec, idx * sizeof(unsigned long),
-				      R_X86_64_64,
-				      reloc_to_insn, insn, 0)) {
-			WARN_ELF("elf_create_reloc: __mcount_loc");
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(unsigned long),
+					  R_X86_64_64,
+					  insn->sec, insn->offset))
 			return -1;
-		}
 
 		idx++;
 	}
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index d2afe2454de4..b3bd97b2b034 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -447,76 +447,73 @@ static int read_symbols(struct elf *elf)
 	return -1;
 }
 
-static void elf_add_reloc(struct elf *elf, struct reloc *reloc)
+int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
+		  unsigned int type, struct symbol *sym, int addend)
 {
-	struct section *sec = reloc->sec;
+	struct reloc *reloc;
+
+	reloc = malloc(sizeof(*reloc));
+	if (!reloc) {
+		perror("malloc");
+		return -1;
+	}
+	memset(reloc, 0, sizeof(*reloc));
+
+	reloc->sec = sec->reloc;
+	reloc->offset = offset;
+	reloc->type = type;
+	reloc->sym = sym;
+	reloc->addend = addend;
 
 	list_add_tail(&reloc->list, &sec->reloc_list);
 	elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
 
-	sec->changed = true;
-}
-
-void reloc_to_sym(struct reloc *reloc, void *dst)
-{
-	struct symbol *sym = dst;
-	reloc->sym = sym;
+	return 0;
 }
 
-void __reloc_to_insn(struct reloc *reloc, struct section *sec, unsigned long offset)
+static int insn_to_sym_addend(struct section *sec, unsigned long offset,
+			      struct symbol **sym, int *addend)
 {
 	if (sec->sym) {
-		reloc->sym = sec->sym;
-		reloc->addend = offset;
-		return;
+		*sym = sec->sym;
+		*addend = offset;
+		return 0;
 	}
 
 	/*
 	 * The Clang assembler strips section symbols, so we have to reference
 	 * the function symbol instead:
 	 */
-	reloc->sym = find_symbol_containing(sec, offset);
-	if (!reloc->sym) {
+	*sym = find_symbol_containing(sec, offset);
+	if (!*sym) {
 		/*
 		 * Hack alert.  This happens when we need to reference the NOP
 		 * pad insn immediately after the function.
 		 */
-		reloc->sym = find_symbol_containing(sec, offset - 1);
+		*sym = find_symbol_containing(sec, offset - 1);
 	}
 
-	if (reloc->sym)
-		reloc->addend = offset - reloc->sym->offset;
-}
-
-struct reloc *elf_create_reloc(struct elf *elf,
-			       struct section *sec, unsigned long offset,
-			       unsigned int type,
-			       void (*set_dest)(struct reloc *, void *), void *dst, int addend)
-{
-	struct reloc *reloc;
-
-	reloc = malloc(sizeof(*reloc));
-	if (!reloc) {
-		perror("malloc");
-		return NULL;
+	if (!*sym) {
+		WARN("can't find symbol containing %s+0x%lx", sec->name, offset);
+		return -1;
 	}
-	memset(reloc, 0, sizeof(*reloc));
 
-	reloc->sec = sec->reloc;
-	reloc->offset = offset;
-	reloc->type = type;
+	*addend = offset - (*sym)->offset;
+	return 0;
+}
 
-	set_dest(reloc, dst);
-	if (!reloc->sym) {
-		WARN("failed to create reloc");
-		return NULL;
-	}
 
-	reloc->addend += addend;
+int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
+			  unsigned long offset, unsigned int type,
+			  struct section *insn_sec, unsigned long insn_off)
+{
+	struct symbol *sym;
+	int addend;
 
-	elf_add_reloc(elf, reloc);
+	if (insn_to_sym_addend(insn_sec, insn_off, &sym, &addend))
+		return -1;
 
-	return reloc;
+	return elf_add_reloc(elf, sec, offset, type, sym, addend);
 }
 
 static int read_rel_reloc(struct section *sec, int i, struct reloc *reloc, unsigned int *symndx)
diff --git a/tools/objtool/include/objtool/check.h b/tools/objtool/include/objtool/check.h
index 8f31f96c8fe1..f5be798107bc 100644
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -9,7 +9,6 @@
 #include <stdbool.h>
 #include <objtool/cfi.h>
 #include <objtool/arch.h>
-#include <objtool/elf.h>
 
 struct insn_state {
 	struct cfi_state cfi;
@@ -92,10 +91,4 @@ struct instruction *find_insn(struct objtool_file *file,
 			insn->sec == sec;				\
 	     insn = list_next_entry(insn, list))
 
-static inline void reloc_to_insn(struct reloc *reloc, void *dst)
-{
-	struct instruction *insn = dst;
-	__reloc_to_insn(reloc, insn->sec, insn->offset);
-}
-
 #endif /* _CHECK_H */
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index 8b11f3cdad98..825ad326201d 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -124,13 +124,11 @@ struct elf *elf_open_read(const char *name, int flags);
 struct section *elf_create_section(struct elf *elf, const char *name, unsigned int sh_flags, size_t entsize, int nr);
 struct section *elf_create_reloc_section(struct elf *elf, struct section *base, int reltype);
 
-void reloc_to_sym(struct reloc *reloc, void *sym);
-void __reloc_to_insn(struct reloc *reloc, struct section *sec, unsigned long offset);
-
-struct reloc *elf_create_reloc(struct elf *elf,
-			       struct section *sec, unsigned long offset,
-			       unsigned int type,
-			       void (*set_dest)(struct reloc *reloc, void *), void *dst, int addend);
+int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
+		  unsigned int type, struct symbol *sym, int addend);
+int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
+			  unsigned long offset, unsigned int type,
+			  struct section *insn_sec, unsigned long insn_off);
 
 int elf_write_insn(struct elf *elf, struct section *sec,
 		   unsigned long offset, unsigned int len,
@@ -148,8 +146,6 @@ struct reloc *find_reloc_by_dest(const struct elf *elf, struct section *sec, uns
 struct reloc *find_reloc_by_dest_range(const struct elf *elf, struct section *sec,
 				     unsigned long offset, unsigned int len);
 struct symbol *find_func_containing(struct section *sec, unsigned long offset);
-void insn_to_reloc_sym_addend(struct section *sec, unsigned long offset,
-			      struct reloc *reloc);
 
 #define for_each_sec(file, sec)						\
 	list_for_each_entry(sec, &file->elf->sections, list)
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
index dfc3cc6ef4aa..1b57be6508f4 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -81,24 +81,11 @@ static int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi)
 	return 0;
 }
 
-struct orc_list_entry {
-	struct list_head list;
-	struct orc_entry orc;
-	struct section *insn_sec;
-	unsigned long insn_off;
-};
-
-static void reloc_to_orc(struct reloc *reloc, void *dst)
-{
-	struct orc_list_entry *entry = dst;
-	__reloc_to_insn(reloc, entry->insn_sec, entry->insn_off);
-}
-
 static int write_orc_entry(struct elf *elf, struct section *orc_sec,
 			   struct section *ip_sec, unsigned int idx,
-			   struct orc_list_entry *entry)
+			   struct section *insn_sec, unsigned long insn_off,
+			   struct orc_entry *o)
 {
-	struct orc_entry *o = &entry->orc;
 	struct orc_entry *orc;
 
 	/* populate ORC data */
@@ -108,16 +95,20 @@ static int write_orc_entry(struct elf *elf, struct section *orc_sec,
 	orc->bp_offset = bswap_if_needed(orc->bp_offset);
 
 	/* populate reloc for ip */
-	if (!elf_create_reloc(elf, ip_sec, idx * sizeof(int),
-			      R_X86_64_PC32,
-			      reloc_to_orc, entry, 0)) {
-		WARN_ELF("elf_create_reloc: ORC ip");
+	if (elf_add_reloc_to_insn(elf, ip_sec, idx * sizeof(int), R_X86_64_PC32,
+				  insn_sec, insn_off))
 		return -1;
-	}
 
 	return 0;
 }
 
+struct orc_list_entry {
+	struct list_head list;
+	struct orc_entry orc;
+	struct section *insn_sec;
+	unsigned long insn_off;
+};
+
 static int orc_list_add(struct list_head *orc_list, struct orc_entry *orc,
 			struct section *sec, unsigned long offset)
 {
@@ -239,7 +230,9 @@ int orc_create(struct objtool_file *file)
 
 	/* Write ORC entries to sections: */
 	list_for_each_entry(entry, &orc_list, list) {
-		if (write_orc_entry(file->elf, orc_sec, sec, idx++, entry))
+		if (write_orc_entry(file->elf, orc_sec, sec, idx++,
+				    entry->insn_sec, entry->insn_off,
+				    &entry->orc))
 			return -1;
 	}
 


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 09/14] objtool: Extract elf_strtab_concat()
  2021-03-18 17:11 ` [PATCH v2 09/14] objtool: Extract elf_strtab_concat() Peter Zijlstra
@ 2021-03-19  2:10   ` Josh Poimboeuf
  2021-03-19  9:52     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  2:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:12PM +0100, Peter Zijlstra wrote:
> Create a common helper to append strings to a strtab.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  tools/objtool/elf.c |   73 +++++++++++++++++++++++++++++-----------------------
>  1 file changed, 42 insertions(+), 31 deletions(-)
> 
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -676,13 +676,51 @@ struct elf *elf_open_read(const char *na
>  	return NULL;
>  }
>  
> +static int elf_strtab_concat(struct elf *elf, char *name, const char *strtab_name)
> +{
> +	struct section *strtab = NULL;
> +	Elf_Data *data;
> +	Elf_Scn *s;
> +	int len;
> +
> +	if (strtab_name)
> +		strtab = find_section_by_name(elf, strtab_name);
> +	if (!strtab)
> +		strtab = find_section_by_name(elf, ".strtab");
> +	if (!strtab) {
> +		WARN("can't find %s or .strtab section", strtab_name);
> +		return -1;
> +	}

This part's a bit mysterious (and it loses the Clang comment).  Maybe we
can leave the section finding logic in elf_create_section()?

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index b85db6efb9d3..db9ad54894d8 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -676,19 +676,17 @@ struct elf *elf_open_read(const char *name, int flags)
 	return NULL;
 }
 
-static int elf_strtab_concat(struct elf *elf, char *name, const char *strtab_name)
+static int elf_add_string(struct elf *elf, struct section *strtab, char *str)
 {
 	struct section *strtab = NULL;
 	Elf_Data *data;
 	Elf_Scn *s;
 	int len;
 
-	if (strtab_name)
-		strtab = find_section_by_name(elf, strtab_name);
 	if (!strtab)
 		strtab = find_section_by_name(elf, ".strtab");
 	if (!strtab) {
-		WARN("can't find %s or .strtab section", strtab_name);
+		WARN("can't find .strtab section");
 		return -1;
 	}
 
@@ -718,7 +716,7 @@ static int elf_strtab_concat(struct elf *elf, char *name, const char *strtab_nam
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
-	struct section *sec;
+	struct section *sec, *shstrtab;
 	size_t size = entsize * nr;
 	Elf_Scn *s;
 
@@ -777,7 +775,15 @@ struct section *elf_create_section(struct elf *elf, const char *name,
 	sec->sh.sh_addralign = 1;
 	sec->sh.sh_flags = SHF_ALLOC | sh_flags;
 
-	sec->sh.sh_name = elf_strtab_concat(elf, sec->name, ".shstrtab");
+	/* Add section name to .shstrtab (or .strtab for Clang) */
+	shstrtab = find_section_by_name(elf, ".shstrtab");
+	if (!shstrtab)
+		shstrtab = find_section_by_name(elf, ".strtab");
+	if (!shstrtab) {
+		WARN("can't find .shstrtab or .strtab section");
+		return NULL;
+	}
+	sec->sh.sh_name = elf_add_string(elf, sec->name, shstrtab);
 	if (sec->sh.sh_name == -1)
 		return NULL;
 


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/14] objtool: Extract elf_symbol_add()
  2021-03-18 17:11 ` [PATCH v2 10/14] objtool: Extract elf_symbol_add() Peter Zijlstra
@ 2021-03-19  2:14   ` Josh Poimboeuf
  2021-03-19  9:54     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  2:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:13PM +0100, Peter Zijlstra wrote:
> Create a common helper to add symbols.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  tools/objtool/elf.c |   57 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 33 insertions(+), 24 deletions(-)
> 
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -290,12 +290,41 @@ static int read_sections(struct elf *elf
>  	return 0;
>  }
>  
> +static bool elf_symbol_add(struct elf *elf, struct symbol *sym)

How about "elf_add_symbol()" for consistency with my other suggestions
(elf_add_reloc() and elf_add_string()).  And return an int.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 11/14] objtool: Add elf_create_undef_symbol()
  2021-03-18 17:11 ` [PATCH v2 11/14] objtool: Add elf_create_undef_symbol() Peter Zijlstra
@ 2021-03-19  2:29   ` Josh Poimboeuf
  2021-03-19  7:56     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  2:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:14PM +0100, Peter Zijlstra wrote:
> Allow objtool to create undefined symbols; this allows creating
> relocations to symbols not currently in the symbol table.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  tools/objtool/elf.c                 |   63 ++++++++++++++++++++++++++++++++++++
>  tools/objtool/include/objtool/elf.h |    1 
>  2 files changed, 64 insertions(+)
> 
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -724,6 +724,69 @@ static int elf_strtab_concat(struct elf
>  	return len;
>  }
>  
> +struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
> +{
> +	struct section *symtab;
> +	struct symbol *sym;
> +	Elf_Data *data;
> +	Elf_Scn *s;
> +
> +	sym = malloc(sizeof(*sym));
> +	if (!sym) {
> +		perror("malloc");
> +		return NULL;
> +	}
> +	memset(sym, 0, sizeof(*sym));
> +
> +	sym->name = strdup(name);
> +
> +	sym->sym.st_name = elf_strtab_concat(elf, sym->name, NULL);
> +	if (sym->sym.st_name == -1)
> +		return NULL;
> +
> +	sym->sym.st_info = 0x10; /* STB_GLOBAL, STT_NOTYPE */

There's a generic macro for this:
	
	sym->sym.st_info = GELF_ST_INFO(STB_GLOBAL, STT_NOTYPE);

And sym->bind and sym->type should probably get set.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/14] objtool: Per arch retpoline naming
  2021-03-18 17:11 ` [PATCH v2 05/14] objtool: Per arch retpoline naming Peter Zijlstra
@ 2021-03-19  2:38   ` Josh Poimboeuf
  2021-03-19  9:07     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  2:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:08PM +0100, Peter Zijlstra wrote:
> @@ -872,7 +877,7 @@ static int add_jump_destinations(struct
>  		} else if (reloc->sym->type == STT_SECTION) {
>  			dest_sec = reloc->sym->sec;
>  			dest_off = arch_dest_reloc_offset(reloc->addend);
> -		} else if (!strncmp(reloc->sym->name, "__x86_indirect_thunk_", 21)) {
> +		} else if (arch_is_retpoline(reloc->sym)) {
>  			/*
>  			 * Retpoline jumps are really dynamic jumps in
>  			 * disguise, so convert them accordingly.

There's another one in add_call_destinations().

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines
  2021-03-18 17:11 ` [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines Peter Zijlstra
@ 2021-03-19  2:54   ` Josh Poimboeuf
  2021-03-19 11:21     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  2:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:15PM +0100, Peter Zijlstra wrote:
> @@ -1212,6 +1225,8 @@ static int handle_group_alt(struct objto
>  		dest_off = arch_jump_destination(insn);
>  		if (dest_off == special_alt->new_off + special_alt->new_len)
>  			insn->jump_dest = next_insn_same_sec(file, last_orig_insn);
> +		else
> +			insn->jump_dest = find_insn(file, insn->sec, dest_off);
>  
>  		if (!insn->jump_dest) {
>  			WARN_FUNC("can't find alternative jump destination",

So I assume this change is because of the ordering change: now this is
done before add_jump_destinations().

But doesn't that mean the alternative jump modification (changing the
dest to the end of the original insns) will get overwritten later?

Also the new hunk to be an oversimplified version of
add_jump_destinations().  I'm not quite convinced that it will always do
the right thing for this case.

> @@ -1797,11 +1812,15 @@ static int decode_sections(struct objtoo
>  	if (ret)
>  		return ret;
>  
> -	ret = add_jump_destinations(file);
> +	/*
> +	 * Must be before add_{jump,call}_destination; for they can add
> +	 * magic alternatives.
> +	 */
> +	ret = add_special_section_alts(file);

This reordering is unfortunate.  Maybe there's a better way, though I
don't have any ideas, at least until I get to the most controversial
patch.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls
  2021-03-18 17:11 ` [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls Peter Zijlstra
@ 2021-03-19  3:29   ` Josh Poimboeuf
  2021-03-19  8:06     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19  3:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:17PM +0100, Peter Zijlstra wrote:
> When the compiler emits: "CALL __x86_indirect_thunk_\reg" for an
> indirect call, have objtool rewrite it to:
> 
> 	ALTERNATIVE "call __x86_indirect_thunk_\reg",
> 		    "call *%reg", ALT_NOT(X86_FEATURE_RETPOLINE)
> 
> Additionally, in order to not emit endless identical
> .altinst_replacement chunks, use a global symbol for them, see
> __x86_indirect_alt_*.
> 
> This also avoids objtool from having to do code generation.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

This is better than I expected.  Nice workaround for not generating
code.

> +.macro ALT_THUNK reg
> +
> +	.align 1
> +
> +SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
> +	ANNOTATE_RETPOLINE_SAFE
> +1:	call	*%\reg
> +2:	.skip	5-(2b-1b), 0x90
> +SYM_FUNC_END(__x86_indirect_alt_call_\reg)
> +
> +SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
> +	ANNOTATE_RETPOLINE_SAFE
> +1:	jmp	*%\reg
> +2:	.skip	5-(2b-1b), 0x90
> +SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)

This mysterious code needs a comment.  Shouldn't it be in
.altinstr_replacement or something?

Also doesn't the alternative code already insert nops?

> +int arch_rewrite_retpoline(struct objtool_file *file,
> +			   struct instruction *insn,
> +			   struct reloc *reloc)
> +{
> +	struct symbol *sym;
> +	char name[32] = "";
> +
> +	if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
> +		return 0;
> +
> +	sprintf(name, "__x86_indirect_alt_%s_%s",
> +		insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
> +		reloc->sym->name + 21);
> +
> +	sym = find_symbol_by_name(file->elf, name);
> +	if (!sym) {
> +		sym = elf_create_undef_symbol(file->elf, name);
> +		if (!sym) {
> +			WARN("elf_create_undef_symbol");
> +			return -1;
> +		}
> +	}
> +
> +	elf_add_alternative(file->elf, insn, sym,
> +			    ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5);
> +
> +	return 0;
> +}

Need to propagate the error.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 11/14] objtool: Add elf_create_undef_symbol()
  2021-03-19  2:29   ` Josh Poimboeuf
@ 2021-03-19  7:56     ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19  7:56 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 09:29:23PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:14PM +0100, Peter Zijlstra wrote:
> > Allow objtool to create undefined symbols; this allows creating
> > relocations to symbols not currently in the symbol table.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  tools/objtool/elf.c                 |   63 ++++++++++++++++++++++++++++++++++++
> >  tools/objtool/include/objtool/elf.h |    1 
> >  2 files changed, 64 insertions(+)
> > 
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
> > @@ -724,6 +724,69 @@ static int elf_strtab_concat(struct elf
> >  	return len;
> >  }
> >  
> > +struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
> > +{
> > +	struct section *symtab;
> > +	struct symbol *sym;
> > +	Elf_Data *data;
> > +	Elf_Scn *s;
> > +
> > +	sym = malloc(sizeof(*sym));
> > +	if (!sym) {
> > +		perror("malloc");
> > +		return NULL;
> > +	}
> > +	memset(sym, 0, sizeof(*sym));
> > +
> > +	sym->name = strdup(name);
> > +
> > +	sym->sym.st_name = elf_strtab_concat(elf, sym->name, NULL);
> > +	if (sym->sym.st_name == -1)
> > +		return NULL;
> > +
> > +	sym->sym.st_info = 0x10; /* STB_GLOBAL, STT_NOTYPE */
> 
> There's a generic macro for this:
> 	
> 	sym->sym.st_info = GELF_ST_INFO(STB_GLOBAL, STT_NOTYPE);

Ah, I remember not finding that many moons ago when I wrote that ..

> And sym->bind and sym->type should probably get set.

They are, it's in that elf_add_symbol() thing.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls
  2021-03-19  3:29   ` Josh Poimboeuf
@ 2021-03-19  8:06     ` Peter Zijlstra
  2021-03-19 15:30       ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19  8:06 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 10:29:55PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:17PM +0100, Peter Zijlstra wrote:
> > When the compiler emits: "CALL __x86_indirect_thunk_\reg" for an
> > indirect call, have objtool rewrite it to:
> > 
> > 	ALTERNATIVE "call __x86_indirect_thunk_\reg",
> > 		    "call *%reg", ALT_NOT(X86_FEATURE_RETPOLINE)
> > 
> > Additionally, in order to not emit endless identical
> > .altinst_replacement chunks, use a global symbol for them, see
> > __x86_indirect_alt_*.
> > 
> > This also avoids objtool from having to do code generation.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> This is better than I expected.  Nice workaround for not generating
> code.

Thanks :-)

> > +.macro ALT_THUNK reg
> > +
> > +	.align 1
> > +
> > +SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
> > +	ANNOTATE_RETPOLINE_SAFE
> > +1:	call	*%\reg
> > +2:	.skip	5-(2b-1b), 0x90
> > +SYM_FUNC_END(__x86_indirect_alt_call_\reg)
> > +
> > +SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
> > +	ANNOTATE_RETPOLINE_SAFE
> > +1:	jmp	*%\reg
> > +2:	.skip	5-(2b-1b), 0x90
> > +SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)
> 
> This mysterious code needs a comment.  Shouldn't it be in
> .altinstr_replacement or something?

Comment, yes, I suppose so. And no, if we stick it in
.altinstr_replacement we'll throw them away with initmem and module
alternative patching (which will also refer to these symbols) will go
side-ways.

> Also doesn't the alternative code already insert nops?

Problem is that the {call,jmp} *%\reg thing is not fixed length. They're
2 or 3 bytes depending on which register is picked.

We could make them all 3 long and insert 0,1 nop I suppose.

Initially alternatives wouldn't re-optimize nops on patched things, it
would simply add nops on. And I had the above be:

1:	INSN	*%\reg
2:	.nops	5-(2b-1b)

and we'd get a single right sized nop. But the .nops directive it too
new, we support binutils that don't have it :/

Hence, it now reads:

2:	.skip	5-(2b-1b), 0x90

End result is that alternative NOP optimizer patch at the start of the
series that now also optimizes a bunch of cases that are unrelated and
were previously missed -- but crucially, it covers this case too :-)

Anyway, yes I could make it 3 long.

> > +int arch_rewrite_retpoline(struct objtool_file *file,
> > +			   struct instruction *insn,
> > +			   struct reloc *reloc)
> > +{
> > +	struct symbol *sym;
> > +	char name[32] = "";
> > +
> > +	if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
> > +		return 0;
> > +
> > +	sprintf(name, "__x86_indirect_alt_%s_%s",
> > +		insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
> > +		reloc->sym->name + 21);
> > +
> > +	sym = find_symbol_by_name(file->elf, name);
> > +	if (!sym) {
> > +		sym = elf_create_undef_symbol(file->elf, name);
> > +		if (!sym) {
> > +			WARN("elf_create_undef_symbol");
> > +			return -1;
> > +		}
> > +	}
> > +
> > +	elf_add_alternative(file->elf, insn, sym,
> > +			    ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5);
> > +
> > +	return 0;
> > +}
> 
> Need to propagate the error.

Oh, indeed so.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/14] objtool: Per arch retpoline naming
  2021-03-19  2:38   ` Josh Poimboeuf
@ 2021-03-19  9:07     ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19  9:07 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 09:38:14PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:08PM +0100, Peter Zijlstra wrote:
> > @@ -872,7 +877,7 @@ static int add_jump_destinations(struct
> >  		} else if (reloc->sym->type == STT_SECTION) {
> >  			dest_sec = reloc->sym->sec;
> >  			dest_off = arch_dest_reloc_offset(reloc->addend);
> > -		} else if (!strncmp(reloc->sym->name, "__x86_indirect_thunk_", 21)) {
> > +		} else if (arch_is_retpoline(reloc->sym)) {
> >  			/*
> >  			 * Retpoline jumps are really dynamic jumps in
> >  			 * disguise, so convert them accordingly.
> 
> There's another one in add_call_destinations().

Bah, I'm sure I converted all of them at some point, must've lost a hunk
somewhere along the way. Fixed!

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] objtool: Add elf_create_reloc() helper
  2021-03-19  1:42   ` Josh Poimboeuf
@ 2021-03-19  9:47     ` Peter Zijlstra
  2021-03-19 15:12       ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19  9:47 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 08:42:46PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:11PM +0100, Peter Zijlstra wrote:
> > We have 4 instances of adding a relocation. Create a common helper
> > to avoid growing even more.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> I'm not a fan of the API -- how about squashing this in?  Untested, of
> course.

Can do.. I do seem to have over-cooked it a bit.

> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> index d2afe2454de4..b3bd97b2b034 100644
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -447,76 +447,73 @@ static int read_symbols(struct elf *elf)
>  	return -1;
>  }
>  
> -static void elf_add_reloc(struct elf *elf, struct reloc *reloc)
> +int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
> +		  unsigned int type, struct symbol *sym, int addend)
>  {
> -	struct section *sec = reloc->sec;
> +	struct reloc *reloc;
> +
> +	reloc = malloc(sizeof(*reloc));
> +	if (!reloc) {
> +		perror("malloc");
> +		return -1;
> +	}
> +	memset(reloc, 0, sizeof(*reloc));
> +
> +	reloc->sec = sec->reloc;
> +	reloc->offset = offset;
> +	reloc->type = type;
> +	reloc->sym = sym;
> +	reloc->addend = addend;
>  
>  	list_add_tail(&reloc->list, &sec->reloc_list);
>  	elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
>  
> -	sec->changed = true;
> -}

I'm thinking we have to mark something changed though; I've added
sec->reloc->changed = true; there.

> +static int insn_to_sym_addend(struct section *sec, unsigned long offset,
> +			      struct symbol **sym, int *addend)
>  {
>  	if (sec->sym) {
> +		*sym = sec->sym;
> +		*addend = offset;
> +		return 0;
>  	}
>  
>  	/*
>  	 * The Clang assembler strips section symbols, so we have to reference
>  	 * the function symbol instead:
>  	 */
> +	*sym = find_symbol_containing(sec, offset);
> +	if (!*sym) {
>  		/*
>  		 * Hack alert.  This happens when we need to reference the NOP
>  		 * pad insn immediately after the function.
>  		 */
> +		*sym = find_symbol_containing(sec, offset - 1);
>  	}
>  
> +	if (!*sym) {
> +		WARN("can't find symbol containing %s+0x%lx", sec->name, offset);
> +		return -1;
>  	}
>  
> +	*addend = offset - (*sym)->offset;
> +	return 0;
> +}
>  
> +int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
> +			  unsigned long offset, unsigned int type,
> +			  struct section *insn_sec, unsigned long insn_off)
> +{
> +	struct symbol *sym;
> +	int addend;
>  
> +	if (insn_to_sym_addend(insn_sec, insn_off, &sym, &addend))
> +		return -1;

Might be simpler to not have that function and just do it here instead.

>  
> +	return elf_add_reloc(elf, sec, offset, type, sym, addend);
>  }



Full patch, because diff on a diff on a diff is getting ludicrous hard
to read :-)

---
 check.c               |   78 +++++++++-------------------------------------
 elf.c                 |   84 ++++++++++++++++++++++++++++++++------------------
 include/objtool/elf.h |   10 ++++-
 orc_gen.c             |   30 +++--------------
 4 files changed, 84 insertions(+), 118 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -433,8 +433,7 @@ static int add_dead_ends(struct objtool_
 
 static int create_static_call_sections(struct objtool_file *file)
 {
-	struct section *sec, *reloc_sec;
-	struct reloc *reloc;
+	struct section *sec;
 	struct static_call_site *site;
 	struct instruction *insn;
 	struct symbol *key_sym;
@@ -460,8 +459,7 @@ static int create_static_call_sections(s
 	if (!sec)
 		return -1;
 
-	reloc_sec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
-	if (!reloc_sec)
+	if (!elf_create_reloc_section(file->elf, sec, SHT_RELA))
 		return -1;
 
 	idx = 0;
@@ -471,25 +469,11 @@ static int create_static_call_sections(s
 		memset(site, 0, sizeof(struct static_call_site));
 
 		/* populate reloc for 'addr' */
-		reloc = malloc(sizeof(*reloc));
-
-		if (!reloc) {
-			perror("malloc");
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(struct static_call_site),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset))
 			return -1;
-		}
-		memset(reloc, 0, sizeof(*reloc));
-
-		insn_to_reloc_sym_addend(insn->sec, insn->offset, reloc);
-		if (!reloc->sym) {
-			WARN_FUNC("static call tramp: missing containing symbol",
-				  insn->sec, insn->offset);
-			return -1;
-		}
-
-		reloc->type = R_X86_64_PC32;
-		reloc->offset = idx * sizeof(struct static_call_site);
-		reloc->sec = reloc_sec;
-		elf_add_reloc(file->elf, reloc);
 
 		/* find key symbol */
 		key_name = strdup(insn->call_dest->name);
@@ -526,18 +510,11 @@ static int create_static_call_sections(s
 		free(key_name);
 
 		/* populate reloc for 'key' */
-		reloc = malloc(sizeof(*reloc));
-		if (!reloc) {
-			perror("malloc");
+		if (elf_add_reloc(file->elf, sec,
+				  idx * sizeof(struct static_call_site) + 4,
+				  R_X86_64_PC32, key_sym,
+				  is_sibling_call(insn) * STATIC_CALL_SITE_TAIL))
 			return -1;
-		}
-		memset(reloc, 0, sizeof(*reloc));
-		reloc->sym = key_sym;
-		reloc->addend = is_sibling_call(insn) ? STATIC_CALL_SITE_TAIL : 0;
-		reloc->type = R_X86_64_PC32;
-		reloc->offset = idx * sizeof(struct static_call_site) + 4;
-		reloc->sec = reloc_sec;
-		elf_add_reloc(file->elf, reloc);
 
 		idx++;
 	}
@@ -547,8 +524,7 @@ static int create_static_call_sections(s
 
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
-	struct section *sec, *reloc_sec;
-	struct reloc *reloc;
+	struct section *sec;
 	unsigned long *loc;
 	struct instruction *insn;
 	int idx;
@@ -571,8 +547,7 @@ static int create_mcount_loc_sections(st
 	if (!sec)
 		return -1;
 
-	reloc_sec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
-	if (!reloc_sec)
+	if (!elf_create_reloc_section(file->elf, sec, SHT_RELA))
 		return -1;
 
 	idx = 0;
@@ -581,32 +556,11 @@ static int create_mcount_loc_sections(st
 		loc = (unsigned long *)sec->data->d_buf + idx;
 		memset(loc, 0, sizeof(unsigned long));
 
-		reloc = malloc(sizeof(*reloc));
-		if (!reloc) {
-			perror("malloc");
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(unsigned long),
+					  R_X86_64_64,
+					  insn->sec, insn->offset))
 			return -1;
-		}
-		memset(reloc, 0, sizeof(*reloc));
-
-		if (insn->sec->sym) {
-			reloc->sym = insn->sec->sym;
-			reloc->addend = insn->offset;
-		} else {
-			reloc->sym = find_symbol_containing(insn->sec, insn->offset);
-
-			if (!reloc->sym) {
-				WARN("missing symbol for insn at offset 0x%lx\n",
-				     insn->offset);
-				return -1;
-			}
-
-			reloc->addend = insn->offset - reloc->sym->offset;
-		}
-
-		reloc->type = R_X86_64_64;
-		reloc->offset = idx * sizeof(unsigned long);
-		reloc->sec = reloc_sec;
-		elf_add_reloc(file->elf, reloc);
 
 		idx++;
 	}
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -211,32 +211,6 @@ struct reloc *find_reloc_by_dest(const s
 	return find_reloc_by_dest_range(elf, sec, offset, 1);
 }
 
-void insn_to_reloc_sym_addend(struct section *sec, unsigned long offset,
-			      struct reloc *reloc)
-{
-	if (sec->sym) {
-		reloc->sym = sec->sym;
-		reloc->addend = offset;
-		return;
-	}
-
-	/*
-	 * The Clang assembler strips section symbols, so we have to reference
-	 * the function symbol instead:
-	 */
-	reloc->sym = find_symbol_containing(sec, offset);
-	if (!reloc->sym) {
-		/*
-		 * Hack alert.  This happens when we need to reference the NOP
-		 * pad insn immediately after the function.
-		 */
-		reloc->sym = find_symbol_containing(sec, offset - 1);
-	}
-
-	if (reloc->sym)
-		reloc->addend = offset - reloc->sym->offset;
-}
-
 static int read_sections(struct elf *elf)
 {
 	Elf_Scn *s = NULL;
@@ -473,14 +447,66 @@ static int read_symbols(struct elf *elf)
 	return -1;
 }
 
-void elf_add_reloc(struct elf *elf, struct reloc *reloc)
+int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
+		  unsigned int type, struct symbol *sym, int addend)
 {
-	struct section *sec = reloc->sec;
+	struct reloc *reloc;
+
+	reloc = malloc(sizeof(*reloc));
+	if (!reloc) {
+		perror("malloc");
+		return -1;
+	}
+	memset(reloc, 0, sizeof(*reloc));
+
+	reloc->sec = sec->reloc;
+	reloc->offset = offset;
+	reloc->type = type;
+	reloc->sym = sym;
+	reloc->addend = addend;
 
 	list_add_tail(&reloc->list, &sec->reloc_list);
 	elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
 
-	sec->changed = true;
+	sec->reloc->changed = true;
+
+	return 0;
+}
+
+int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
+			  unsigned long offset, unsigned int type,
+			  struct section *insn_sec, unsigned long insn_off)
+{
+	struct symbol *sym;
+	int addend;
+
+	if (sec->sym) {
+		sym = sec->sym;
+		addend = offset;
+
+	} else {
+		/*
+		 * The Clang assembler strips section symbols, so we have to
+		 * reference the function symbol instead:
+		 */
+		sym = find_symbol_containing(sec, offset);
+		if (!sym) {
+			/*
+			 * Hack alert.  This happens when we need to reference
+			 * the NOP pad insn immediately after the function.
+			 */
+			sym = find_symbol_containing(sec, offset - 1);
+		}
+
+		if (!sym) {
+			WARN("can't find symbol containing %s+0x%lx", sec->name, offset);
+			return -1;
+		}
+
+		addend = offset - sym->offset;
+	}
+
+	return elf_add_reloc(elf, sec, offset, type, sym, addend);
 }
 
 static int read_rel_reloc(struct section *sec, int i, struct reloc *reloc, unsigned int *symndx)
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -123,7 +123,13 @@ static inline u32 reloc_hash(struct relo
 struct elf *elf_open_read(const char *name, int flags);
 struct section *elf_create_section(struct elf *elf, const char *name, unsigned int sh_flags, size_t entsize, int nr);
 struct section *elf_create_reloc_section(struct elf *elf, struct section *base, int reltype);
-void elf_add_reloc(struct elf *elf, struct reloc *reloc);
+
+int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
+		  unsigned int type, struct symbol *sym, int addend);
+int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
+			  unsigned long offset, unsigned int type,
+			  struct section *insn_sec, unsigned long insn_off);
+
 int elf_write_insn(struct elf *elf, struct section *sec,
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
@@ -140,8 +146,6 @@ struct reloc *find_reloc_by_dest(const s
 struct reloc *find_reloc_by_dest_range(const struct elf *elf, struct section *sec,
 				     unsigned long offset, unsigned int len);
 struct symbol *find_func_containing(struct section *sec, unsigned long offset);
-void insn_to_reloc_sym_addend(struct section *sec, unsigned long offset,
-			      struct reloc *reloc);
 
 #define for_each_sec(file, sec)						\
 	list_for_each_entry(sec, &file->elf->sections, list)
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -82,12 +82,11 @@ static int init_orc_entry(struct orc_ent
 }
 
 static int write_orc_entry(struct elf *elf, struct section *orc_sec,
-			   struct section *ip_rsec, unsigned int idx,
+			   struct section *ip_sec, unsigned int idx,
 			   struct section *insn_sec, unsigned long insn_off,
 			   struct orc_entry *o)
 {
 	struct orc_entry *orc;
-	struct reloc *reloc;
 
 	/* populate ORC data */
 	orc = (struct orc_entry *)orc_sec->data->d_buf + idx;
@@ -96,25 +95,9 @@ static int write_orc_entry(struct elf *e
 	orc->bp_offset = bswap_if_needed(orc->bp_offset);
 
 	/* populate reloc for ip */
-	reloc = malloc(sizeof(*reloc));
-	if (!reloc) {
-		perror("malloc");
+	if (elf_add_reloc_to_insn(elf, ip_sec, idx * sizeof(int), R_X86_64_PC32,
+				  insn_sec, insn_off))
 		return -1;
-	}
-	memset(reloc, 0, sizeof(*reloc));
-
-	insn_to_reloc_sym_addend(insn_sec, insn_off, reloc);
-	if (!reloc->sym) {
-		WARN("missing symbol for insn at offset 0x%lx",
-		     insn_off);
-		return -1;
-	}
-
-	reloc->type = R_X86_64_PC32;
-	reloc->offset = idx * sizeof(int);
-	reloc->sec = ip_rsec;
-
-	elf_add_reloc(elf, reloc);
 
 	return 0;
 }
@@ -153,7 +136,7 @@ static unsigned long alt_group_len(struc
 
 int orc_create(struct objtool_file *file)
 {
-	struct section *sec, *ip_rsec, *orc_sec;
+	struct section *sec, *orc_sec;
 	unsigned int nr = 0, idx = 0;
 	struct orc_list_entry *entry;
 	struct list_head orc_list;
@@ -242,13 +225,12 @@ int orc_create(struct objtool_file *file
 	sec = elf_create_section(file->elf, ".orc_unwind_ip", 0, sizeof(int), nr);
 	if (!sec)
 		return -1;
-	ip_rsec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
-	if (!ip_rsec)
+	if (!elf_create_reloc_section(file->elf, sec, SHT_RELA))
 		return -1;
 
 	/* Write ORC entries to sections: */
 	list_for_each_entry(entry, &orc_list, list) {
-		if (write_orc_entry(file->elf, orc_sec, ip_rsec, idx++,
+		if (write_orc_entry(file->elf, orc_sec, sec, idx++,
 				    entry->insn_sec, entry->insn_off,
 				    &entry->orc))
 			return -1;

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 09/14] objtool: Extract elf_strtab_concat()
  2021-03-19  2:10   ` Josh Poimboeuf
@ 2021-03-19  9:52     ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19  9:52 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 09:10:38PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:12PM +0100, Peter Zijlstra wrote:
> > Create a common helper to append strings to a strtab.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  tools/objtool/elf.c |   73 +++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 42 insertions(+), 31 deletions(-)
> > 
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
> > @@ -676,13 +676,51 @@ struct elf *elf_open_read(const char *na
> >  	return NULL;
> >  }
> >  
> > +static int elf_strtab_concat(struct elf *elf, char *name, const char *strtab_name)
> > +{
> > +	struct section *strtab = NULL;
> > +	Elf_Data *data;
> > +	Elf_Scn *s;
> > +	int len;
> > +
> > +	if (strtab_name)
> > +		strtab = find_section_by_name(elf, strtab_name);
> > +	if (!strtab)
> > +		strtab = find_section_by_name(elf, ".strtab");
> > +	if (!strtab) {
> > +		WARN("can't find %s or .strtab section", strtab_name);
> > +		return -1;
> > +	}
> 
> This part's a bit mysterious (and it loses the Clang comment).  Maybe we
> can leave the section finding logic in elf_create_section()?

Sure, made it compile too ;-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/14] objtool: Extract elf_symbol_add()
  2021-03-19  2:14   ` Josh Poimboeuf
@ 2021-03-19  9:54     ` Peter Zijlstra
  2021-03-19 15:04       ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19  9:54 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 09:14:03PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:13PM +0100, Peter Zijlstra wrote:
> > Create a common helper to add symbols.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  tools/objtool/elf.c |   57 ++++++++++++++++++++++++++++++----------------------
> >  1 file changed, 33 insertions(+), 24 deletions(-)
> > 
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
> > @@ -290,12 +290,41 @@ static int read_sections(struct elf *elf
> >  	return 0;
> >  }
> >  
> > +static bool elf_symbol_add(struct elf *elf, struct symbol *sym)
> 
> How about "elf_add_symbol()" for consistency with my other suggestions
> (elf_add_reloc() and elf_add_string()).  And return an int.

Changed the nane, how about void? This latest incarnation doesn't
actually have an error path. Still doesn't hurt to have one and not use
it I suppose...

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/14] x86: Add insn_decode_kernel()
  2021-03-18 17:11 ` [PATCH v2 01/14] x86: Add insn_decode_kernel() Peter Zijlstra
@ 2021-03-19 10:40   ` Borislav Petkov
  0 siblings, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2021-03-19 10:40 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jpoimboe, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:04PM +0100, Peter Zijlstra wrote:
> Add a helper to decode kernel instructions; there's no point in
> endlessly repeating those last two arguments.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/insn.h        |    2 ++
>  arch/x86/kernel/alternative.c      |    2 +-
>  arch/x86/kernel/cpu/mce/severity.c |    2 +-
>  arch/x86/kernel/kprobes/core.c     |    4 ++--
>  arch/x86/kernel/kprobes/opt.c      |    2 +-
>  arch/x86/kernel/traps.c            |    2 +-
>  tools/arch/x86/include/asm/insn.h  |    4 +++-
>  7 files changed, 11 insertions(+), 7 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines
  2021-03-19  2:54   ` Josh Poimboeuf
@ 2021-03-19 11:21     ` Peter Zijlstra
  2021-03-19 13:28       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19 11:21 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 09:54:40PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 18, 2021 at 06:11:15PM +0100, Peter Zijlstra wrote:
> > @@ -1212,6 +1225,8 @@ static int handle_group_alt(struct objto
> >  		dest_off = arch_jump_destination(insn);
> >  		if (dest_off == special_alt->new_off + special_alt->new_len)
> >  			insn->jump_dest = next_insn_same_sec(file, last_orig_insn);
> > +		else
> > +			insn->jump_dest = find_insn(file, insn->sec, dest_off);
> >  
> >  		if (!insn->jump_dest) {
> >  			WARN_FUNC("can't find alternative jump destination",
> 
> So I assume this change is because of the ordering change: now this is
> done before add_jump_destinations().

Correct.

> Also the new hunk to be an oversimplified version of
> add_jump_destinations().  I'm not quite convinced that it will always do
> the right thing for this case.

You're right; this is because of the reorder. At the time I thought this
was right, but looking at it now, I'm not sure. Esp. so when ARM64 comes
along and allows more relocations in alternatives.

Let me see if I can come up with something better.

> But doesn't that mean the alternative jump modification (changing the
> dest to the end of the original insns) will get overwritten later?

Good point, should be simple enough to fix by having
add_jump_destination() skip all that already have insn->jump_dest set I
suppose.

> > @@ -1797,11 +1812,15 @@ static int decode_sections(struct objtoo
> >  	if (ret)
> >  		return ret;
> >  
> > -	ret = add_jump_destinations(file);
> > +	/*
> > +	 * Must be before add_{jump,call}_destination; for they can add
> > +	 * magic alternatives.
> > +	 */
> > +	ret = add_special_section_alts(file);
> 
> This reordering is unfortunate.  Maybe there's a better way, though I
> don't have any ideas, at least until I get to the most controversial
> patch.

So the thing no longer crashes on the alternatives it writes, so we
*could* read back our own alternatives, but it does seem somewhat
unfortunate to do that. Too easy to get into cycles that way.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines
  2021-03-19 11:21     ` Peter Zijlstra
@ 2021-03-19 13:28       ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19 13:28 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 12:21:03PM +0100, Peter Zijlstra wrote:
> Let me see if I can come up with something better.

Duh, we can put the instructions on a list and process them later, just
like static_call, can even use the same insn list_head since a retpoline
cannot be a static call.

Lemme go make that happen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/14] objtool: Extract elf_symbol_add()
  2021-03-19  9:54     ` Peter Zijlstra
@ 2021-03-19 15:04       ` Josh Poimboeuf
  0 siblings, 0 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19 15:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 10:54:05AM +0100, Peter Zijlstra wrote:
> On Thu, Mar 18, 2021 at 09:14:03PM -0500, Josh Poimboeuf wrote:
> > On Thu, Mar 18, 2021 at 06:11:13PM +0100, Peter Zijlstra wrote:
> > > Create a common helper to add symbols.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  tools/objtool/elf.c |   57 ++++++++++++++++++++++++++++++----------------------
> > >  1 file changed, 33 insertions(+), 24 deletions(-)
> > > 
> > > --- a/tools/objtool/elf.c
> > > +++ b/tools/objtool/elf.c
> > > @@ -290,12 +290,41 @@ static int read_sections(struct elf *elf
> > >  	return 0;
> > >  }
> > >  
> > > +static bool elf_symbol_add(struct elf *elf, struct symbol *sym)
> > 
> > How about "elf_add_symbol()" for consistency with my other suggestions
> > (elf_add_reloc() and elf_add_string()).  And return an int.
> 
> Changed the nane, how about void? This latest incarnation doesn't
> actually have an error path. Still doesn't hurt to have one and not use
> it I suppose...

void would be my preference, just to avoid unnecessary error handling
boilerplate in the caller.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] objtool: Add elf_create_reloc() helper
  2021-03-19  9:47     ` Peter Zijlstra
@ 2021-03-19 15:12       ` Josh Poimboeuf
  2021-03-19 15:24         ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19 15:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 10:47:36AM +0100, Peter Zijlstra wrote:
> Full patch, because diff on a diff on a diff is getting ludicrous hard
> to read :-)

Appreciated :-)

> -void elf_add_reloc(struct elf *elf, struct reloc *reloc)
> +int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
> +		  unsigned int type, struct symbol *sym, int addend)
>  {
> -	struct section *sec = reloc->sec;
> +	struct reloc *reloc;
> +
> +	reloc = malloc(sizeof(*reloc));
> +	if (!reloc) {
> +		perror("malloc");
> +		return -1;
> +	}
> +	memset(reloc, 0, sizeof(*reloc));
> +
> +	reloc->sec = sec->reloc;

Maybe elf_add_reloc() could create the reloc section if it doesn't
already exist, that would help remove the multiple calls to
elf_create_reloc_section().

> +	reloc->offset = offset;
> +	reloc->type = type;
> +	reloc->sym = sym;
> +	reloc->addend = addend;
>  
>  	list_add_tail(&reloc->list, &sec->reloc_list);

This should be sec->reloc->reloc_list?

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] objtool: Add elf_create_reloc() helper
  2021-03-19 15:12       ` Josh Poimboeuf
@ 2021-03-19 15:24         ` Peter Zijlstra
  2021-03-19 15:37           ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19 15:24 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 10:12:59AM -0500, Josh Poimboeuf wrote:

> > -void elf_add_reloc(struct elf *elf, struct reloc *reloc)
> > +int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
> > +		  unsigned int type, struct symbol *sym, int addend)
> >  {
> > -	struct section *sec = reloc->sec;
> > +	struct reloc *reloc;
> > +
> > +	reloc = malloc(sizeof(*reloc));
> > +	if (!reloc) {
> > +		perror("malloc");
> > +		return -1;
> > +	}
> > +	memset(reloc, 0, sizeof(*reloc));
> > +
> > +	reloc->sec = sec->reloc;
> 
> Maybe elf_add_reloc() could create the reloc section if it doesn't
> already exist, that would help remove the multiple calls to
> elf_create_reloc_section().

I'll do that as yet another patch ;-)

> > +	reloc->offset = offset;
> > +	reloc->type = type;
> > +	reloc->sym = sym;
> > +	reloc->addend = addend;
> >  
> >  	list_add_tail(&reloc->list, &sec->reloc_list);
> 
> This should be sec->reloc->reloc_list?

Absolutely, there were a few other 'funnies' as well that I just fixed
whilst trying to make the whole new stack work again :-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls
  2021-03-19  8:06     ` Peter Zijlstra
@ 2021-03-19 15:30       ` Josh Poimboeuf
  2021-03-19 15:56         ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19 15:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 09:06:44AM +0100, Peter Zijlstra wrote:
> > Also doesn't the alternative code already insert nops?
> 
> Problem is that the {call,jmp} *%\reg thing is not fixed length. They're
> 2 or 3 bytes depending on which register is picked.

Why do they need to be fixed length?  Objtool can use sym->len as the
alternative replacement length.  Then alternatives can add nops as
needed.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/14] objtool: Add elf_create_reloc() helper
  2021-03-19 15:24         ` Peter Zijlstra
@ 2021-03-19 15:37           ` Josh Poimboeuf
  0 siblings, 0 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19 15:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 04:24:40PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 19, 2021 at 10:12:59AM -0500, Josh Poimboeuf wrote:
> 
> > > -void elf_add_reloc(struct elf *elf, struct reloc *reloc)
> > > +int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
> > > +		  unsigned int type, struct symbol *sym, int addend)
> > >  {
> > > -	struct section *sec = reloc->sec;
> > > +	struct reloc *reloc;
> > > +
> > > +	reloc = malloc(sizeof(*reloc));
> > > +	if (!reloc) {
> > > +		perror("malloc");
> > > +		return -1;
> > > +	}
> > > +	memset(reloc, 0, sizeof(*reloc));
> > > +
> > > +	reloc->sec = sec->reloc;
> > 
> > Maybe elf_add_reloc() could create the reloc section if it doesn't
> > already exist, that would help remove the multiple calls to
> > elf_create_reloc_section().
> 
> I'll do that as yet another patch ;-)

Ok!

> > > +	reloc->offset = offset;
> > > +	reloc->type = type;
> > > +	reloc->sym = sym;
> > > +	reloc->addend = addend;
> > >  
> > >  	list_add_tail(&reloc->list, &sec->reloc_list);
> > 
> > This should be sec->reloc->reloc_list?
> 
> Absolutely, there were a few other 'funnies' as well that I just fixed
> whilst trying to make the whole new stack work again :-)

I'm sure some of those were my fault, thanks for testing my crap :-)

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls
  2021-03-19 15:30       ` Josh Poimboeuf
@ 2021-03-19 15:56         ` Peter Zijlstra
  2021-03-19 22:52           ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-19 15:56 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 10:30:26AM -0500, Josh Poimboeuf wrote:
> On Fri, Mar 19, 2021 at 09:06:44AM +0100, Peter Zijlstra wrote:
> > > Also doesn't the alternative code already insert nops?
> > 
> > Problem is that the {call,jmp} *%\reg thing is not fixed length. They're
> > 2 or 3 bytes depending on which register is picked.
> 
> Why do they need to be fixed length?  Objtool can use sym->len as the
> alternative replacement length.  Then alternatives can add nops as
> needed.

UNDEF has size 0. That is, unless these symbols exist in the translation
unit (they do not) we have no clue.

Arguably I could parse the symbol name again and then we know the
register number and thus if we need REX etc.. but I figured we wanted to
avoid all that.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [PATCH v2 03/14] x86/retpoline: Simplify retpolines
  2021-03-18 17:11 ` [PATCH v2 03/14] x86/retpoline: Simplify retpolines Peter Zijlstra
@ 2021-03-19 17:18   ` David Laight
  2021-03-22  9:32     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: David Laight @ 2021-03-19 17:18 UTC (permalink / raw)
  To: 'Peter Zijlstra', x86, jpoimboe, jgross, mbenes; +Cc: linux-kernel

From: Peter Zijlstra
> Sent: 18 March 2021 17:11
> 
> Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> in alternatives"), it is possible to simplify the retpolines.
> 
...
> Notice that since the longest alternative sequence is now:
> 
>    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
>    5:   f3 90                   pause
>    7:   0f ae e8                lfence
>    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
>    c:   48 89 04 24             mov    %rax,(%rsp)
>   10:   c3                      retq
> 
> 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> if we can shrink the retpoline by 1 byte we can pack it more dense)

I'm intrigued about the lfence after the pause.
Clearly this is for very warped cpu behaviour.
To get to the pause you have to be speculating past an
unconditional call.

To get to the lfence you have to (mostly) have ignored the pause.
Which is commented:
	_mm_pause(); /* Abort speculation */
in a couple of examples in 248966-033.

I wonder what effect replacing the lfence with hlt would have?
It would certainly save 2 bytes and allow the entire retpoline
be put into a single 16byte code fetch block.

248966-033 also contains a note that the instruction(s) after
an indirect jump may get executed.
It suggests adding a pause (or illegal instruction) to stop
anything odd happening (they knew it could be horrid in June 2016.
But then go on to say the adding pause may be a performance issue.
(Presumably because if it is speculatively executed it takes ages.)

I do remember something from even longer ago about trying to never
speculate any of the trig opcodes - because at last some cpu couldn't
abort the instruction.

This may also mean that a big pile of 0x90 nops after the jmp (%eax)
is actually better than 2 or 3 'big' nops.
Of course, if you execute the nops you always want the big ones.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls
  2021-03-19 15:56         ` Peter Zijlstra
@ 2021-03-19 22:52           ` Josh Poimboeuf
  0 siblings, 0 replies; 45+ messages in thread
From: Josh Poimboeuf @ 2021-03-19 22:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 04:56:30PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 19, 2021 at 10:30:26AM -0500, Josh Poimboeuf wrote:
> > On Fri, Mar 19, 2021 at 09:06:44AM +0100, Peter Zijlstra wrote:
> > > > Also doesn't the alternative code already insert nops?
> > > 
> > > Problem is that the {call,jmp} *%\reg thing is not fixed length. They're
> > > 2 or 3 bytes depending on which register is picked.
> > 
> > Why do they need to be fixed length?  Objtool can use sym->len as the
> > alternative replacement length.  Then alternatives can add nops as
> > needed.
> 
> UNDEF has size 0. That is, unless these symbols exist in the translation
> unit (they do not) we have no clue.
> 
> Arguably I could parse the symbol name again and then we know the
> register number and thus if we need REX etc.. but I figured we wanted to
> avoid all that.

Ah, makes sense now.

-- 
Josh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops()
  2021-03-18 17:11 ` [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops() Peter Zijlstra
@ 2021-03-21 12:06   ` Borislav Petkov
  2021-03-22  8:17     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2021-03-21 12:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jpoimboe, jgross, mbenes, linux-kernel

On Thu, Mar 18, 2021 at 06:11:05PM +0100, Peter Zijlstra wrote:
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -345,19 +345,39 @@ recompute_jump(struct alt_instr *a, u8 *
>  static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
>  {
>  	unsigned long flags;
> -	int i;
> +	int nops = 0, i = 0;
> +	struct insn insn;
> +	u8 *nop = NULL;
>  
> -	for (i = 0; i < a->padlen; i++) {
> -		if (instr[i] != 0x90)
> +	do {
> +		if (insn_decode_kernel(&insn, &instr[i]))
>  			return;
> -	}
>  
> -	local_irq_save(flags);
> -	add_nops(instr + (a->instrlen - a->padlen), a->padlen);
> -	local_irq_restore(flags);
> +		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90) {
> +			if (!nop) {
> +				nop = &instr[i];
> +				nops = 1;
> +			} else {
> +				nops++;
> +			}
> +		}
> +		i += insn.length;
> +
> +		if ((insn.length != 1 || i == a->instrlen) && nop) {
> +
> +			local_irq_save(flags);
> +			add_nops(nop, nops);
> +			local_irq_restore(flags);
> +
> +			DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
> +				   instr, (int)(unsigned long)(nop-instr), nops);
> +
> +			nop = NULL;
> +		}
>  
> -	DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
> -		   instr, a->instrlen - a->padlen, a->padlen);
> +	} while (i < a->instrlen);
> +
> +	WARN_ON_ONCE(nop);
>  }

I think I've made this simpler; pasting the whole function and not the
diff because former is easier to read:

/*
 * "noinline" to cause control flow change and thus invalidate I$ and
 * cause refetch after modification.
 *
 * Jump over the non-NOP insns, the remaining bytes must be single-byte NOPs,
 * optimize them.
 */
static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
{
        unsigned long flags;
        struct insn insn;
        int i = 0, j;

        /* Skip preceding non-NOP instructions. */
        do {
                if (insn_decode_kernel(&insn, &instr[i]))
                        return;

                if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
                        break;

                i += insn.length;

        } while (i < a->instrlen);

        if (i >= a->instrlen - 1)
                return;

        /* Verify rest is NOPs - should not fire(tm) */
        for (j = i; j < a->instrlen - 1; j++) {
                if (WARN(instr[j] != 0x90, "Wrong insn byte 0x%hx at 0x%px\n",
                         instr[j], &instr[j]))
                        return;
        }

        local_irq_save(flags);
        add_nops(&instr[i], a->instrlen - i);
        local_irq_restore(flags);

        DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
                   instr, i, a->instrlen);
}

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops()
  2021-03-21 12:06   ` Borislav Petkov
@ 2021-03-22  8:17     ` Peter Zijlstra
  2021-03-22 11:07       ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-22  8:17 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, jpoimboe, jgross, mbenes, linux-kernel

On Sun, Mar 21, 2021 at 01:06:47PM +0100, Borislav Petkov wrote:

> I think I've made this simpler; pasting the whole function and not the
> diff because former is easier to read:

You've make it only replace a single stream of NOPs. Which is probably
fine, but... :-)

So mine, while a little more complicated, will replace any number of NOP
streams in any location. Saves having to worry about if we missed some
stupid corner case somewhere.

Anyway, I think I got it slightly wrong.. How's this one?

---

static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
{
	int nops = 0, i = 0;
	bool done = false;
	struct insn insn;
	u8 *nop = NULL;

	do {
		if (insn_decode_kernel(&insn, &instr[i]))
			return;

		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
			nops++;
		else if (nops)
			nop = &instr[i - nops];

		if (i + insn.length >= a->instrlen) {
			nop = &instr[i - nops];
			done = true;
		}

		if (nop) {
			unsigned long flags;

			local_irq_save(flags);
			add_nops(nop, nops);
			local_irq_restore(flags);

			DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
				   instr, (int)(unsigned long)(nop-instr), nops);

			nops = 0;
			nop = NULL;
		}

		i += insn.length;

	} while (!done);

	WARN_ON_ONCE(nop);
}

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/14] x86/retpoline: Simplify retpolines
  2021-03-19 17:18   ` David Laight
@ 2021-03-22  9:32     ` Peter Zijlstra
  2021-03-22 15:41       ` David Laight
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-03-22  9:32 UTC (permalink / raw)
  To: David Laight; +Cc: x86, jpoimboe, jgross, mbenes, linux-kernel

On Fri, Mar 19, 2021 at 05:18:14PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 18 March 2021 17:11
> > 
> > Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> > in alternatives"), it is possible to simplify the retpolines.
> > 
> ...
> > Notice that since the longest alternative sequence is now:
> > 
> >    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
> >    5:   f3 90                   pause
> >    7:   0f ae e8                lfence
> >    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
> >    c:   48 89 04 24             mov    %rax,(%rsp)
> >   10:   c3                      retq
> > 
> > 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> > if we can shrink the retpoline by 1 byte we can pack it more dense)
> 
> I'm intrigued about the lfence after the pause.
> Clearly this is for very warped cpu behaviour.
> To get to the pause you have to be speculating past an
> unconditional call.

Please read up on retpoline... That's the speculation trap. The warped
CPU behaviour is called Spectre-v2.

For others, the obvious alternative is the below; and possibly we could
then also remove the loop.

The original retpoline, as per Paul's article has: 1: pause; jmp 1b;.
That is, it lacks the LFENCE we have and would also fit 16 bytes.



---
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -15,8 +15,7 @@
 	call    .Ldo_rop_\@
 .Lspec_trap_\@:
 	UNWIND_HINT_EMPTY
-	pause
-	lfence
+	int3
 	jmp .Lspec_trap_\@
 .Ldo_rop_\@:
 	mov     %\reg, (%_ASM_SP)
@@ -27,7 +26,7 @@
 .macro THUNK reg
 	.section .text.__x86.indirect_thunk
 
-	.align 32
+	.align 16
 SYM_FUNC_START(__x86_indirect_thunk_\reg)
 
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops()
  2021-03-22  8:17     ` Peter Zijlstra
@ 2021-03-22 11:07       ` Borislav Petkov
  0 siblings, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2021-03-22 11:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jpoimboe, jgross, mbenes, linux-kernel

On Mon, Mar 22, 2021 at 09:17:57AM +0100, Peter Zijlstra wrote:
> You've make it only replace a single stream of NOPs. Which is probably
> fine, but... :-)

Yap, since I added the padding thing there should be no need to put NOPs
in the middle.

> So mine, while a little more complicated, will replace any number of NOP

A little? I had to sprinkle printks to figure out what it does :)

And what it does is, it calls the instruction decoder on every byte and
advances by the length of each insn it decoded. But that is unnecessary
because the possible opcode bytes you should get to see in any possible
location are:

[(insn).* NOP*]

i.e., 0 or more instructions which are non-NOPs followed by 0 or more
NOPs. Thus my simpler solution to scan past the non-NOPs and patch the
rest.

So I don't see the need for the complexity, frankly.

Btw, yours needs some adjusting to the DUMP_BYTES indices call:

[    0.145789] SMP alternatives: SKIP feat: 8*32+16, old: (entry_SYSCALL_64_after_hwframe+0x49/0xc5 (ffffffff81a0006d) len: 23), repl: (ffffffff89764c3f, len: 23)
[    0.146699] SMP alternatives: ffffffff81a0006d:		 old_insn: 48 c7 c0 90 90 00 00 90 90 90 cd 30 90 90 90 90 90 90 90 90 90 90 90
[    0.148121] SMP alternatives: ffffffff89764c3f: 		 rpl_insn: 48 bb ef be ad de 00 00 00 00 48 31 c0 48 b9 be ba ee ff c0 00 00 00
[    0.149405] SMP alternatives: ffffffff81a0006d:   [7:3) optimized NOPs: 48 c7 c0 90 90 00 00 0f 1f 00 cd 30 90 90 90 90 90 90 90 90 90 90 90
[    0.150700] SMP alternatives: ffffffff81a0006d: [11:11) optimized NOPs: 48 c7 c0 90 90 00 00 0f 1f 00 cd 0f 1f 84 00 00 00 00 00 0f 1f 00 90

[7:3) and [11:11) look weird.

The thing I'm testing with is:

        ALTERNATIVE "mov $0x9090, %rax; .byte 0x90,0x90,0x90; int $0x30", "mov $0xdeadbeef, %rbx; xor %rax, %rax; mov $0xc0ffeebabe, %rcx", \
                X86_FEATURE_XENPV

which is arbitrary, ofc.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 06/14] objtool: Fix static_call list generation
  2021-03-18 17:11 ` [PATCH v2 06/14] objtool: Fix static_call list generation Peter Zijlstra
@ 2021-03-22 12:44   ` Miroslav Benes
  0 siblings, 0 replies; 45+ messages in thread
From: Miroslav Benes @ 2021-03-22 12:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, jpoimboe, jgross, linux-kernel

> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -1045,6 +1045,12 @@ static int add_call_destinations(struct
>  		} else
>  			insn->call_dest = reloc->sym;
>  
> +		if (insn->call_dest && insn->call_dest->static_call_tramp) {
> +			list_add_tail(&insn->static_call_node,
> +				      &file->static_call_list);
> +		}
> +
> +

A nit, but too many new lines here.

Miroslav

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [PATCH v2 03/14] x86/retpoline: Simplify retpolines
  2021-03-22  9:32     ` Peter Zijlstra
@ 2021-03-22 15:41       ` David Laight
  0 siblings, 0 replies; 45+ messages in thread
From: David Laight @ 2021-03-22 15:41 UTC (permalink / raw)
  To: 'Peter Zijlstra'; +Cc: x86, jpoimboe, jgross, mbenes, linux-kernel

From: Peter Zijlstra
> Sent: 22 March 2021 09:33
> 
> On Fri, Mar 19, 2021 at 05:18:14PM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 18 March 2021 17:11
> > >
> > > Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> > > in alternatives"), it is possible to simplify the retpolines.
> > >
> > ...
> > > Notice that since the longest alternative sequence is now:
> > >
> > >    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
> > >    5:   f3 90                   pause
> > >    7:   0f ae e8                lfence
> > >    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
> > >    c:   48 89 04 24             mov    %rax,(%rsp)
> > >   10:   c3                      retq
> > >
> > > 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> > > if we can shrink the retpoline by 1 byte we can pack it more dense)
> >
> > I'm intrigued about the lfence after the pause.
> > Clearly this is for very warped cpu behaviour.
> > To get to the pause you have to be speculating past an
> > unconditional call.
> 
> Please read up on retpoline... That's the speculation trap. The warped
> CPU behaviour is called Spectre-v2.

There is 'warped' and 'very warped' :-)
Most of Spectre-v2 (don't search for Spectra v2 by mistake) is avoiding
the indirect branch prediction - which I knew.

I think the 'pause' is only executed is the cpu speculates through
the mov and retq; rather the speculating past the 'call' - which
some ARM cpu seem to do.

> For others, the obvious alternative is the below; and possibly we could
> then also remove the loop.

Another alternative is 'hlt' with the loop (rather than int3).

> The original retpoline, as per Paul's article has: 1: pause; jmp 1b;.
> That is, it lacks the LFENCE we have and would also fit 16 bytes.

Yes. Just 'jmps .' ought to be enough to block any side effects
of speculative execution.
Adding 'pause' is 'a good idea' for any spin loop.

There might be another lurking performance issue.
Skylake increased the execution time of pause from ~10 to ~140 clocks.
Reading between the lines I suspect this applies to speculatively
executed pause.
If that happens on a regular basis it might be noticeable.
So it may even be best to remove the pause!

As you say, the original retpoline lacked the lfence.
The only 'load' instruction in that sequence is the 'retq'.
It has to be said that given all (normal) loads are executed
in program order and all (normal) stores are also executed
in program order I've never actually seen what either
lfence or sfence actually do for you.
(mfence synchronises reads and writes - so may be useful.)
The (pre spectre) copies of the intel pdf's I have don't say
anything about lfence being any kind of a barrier against
speculative memory reads.

If the retpoline doesn't fit in 16 bytes it is almost certainly
(probably) worth putting the called label at offset 16.
This would mean that there is only one 16-byte code block
read from the call target.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2021-03-22 15:42 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 01/14] x86: Add insn_decode_kernel() Peter Zijlstra
2021-03-19 10:40   ` Borislav Petkov
2021-03-18 17:11 ` [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops() Peter Zijlstra
2021-03-21 12:06   ` Borislav Petkov
2021-03-22  8:17     ` Peter Zijlstra
2021-03-22 11:07       ` Borislav Petkov
2021-03-18 17:11 ` [PATCH v2 03/14] x86/retpoline: Simplify retpolines Peter Zijlstra
2021-03-19 17:18   ` David Laight
2021-03-22  9:32     ` Peter Zijlstra
2021-03-22 15:41       ` David Laight
2021-03-18 17:11 ` [PATCH v2 04/14] objtool: Correctly handle retpoline thunk calls Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 05/14] objtool: Per arch retpoline naming Peter Zijlstra
2021-03-19  2:38   ` Josh Poimboeuf
2021-03-19  9:07     ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 06/14] objtool: Fix static_call list generation Peter Zijlstra
2021-03-22 12:44   ` Miroslav Benes
2021-03-18 17:11 ` [PATCH v2 07/14] objtool: Rework rebuild_reloc logic Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 08/14] objtool: Add elf_create_reloc() helper Peter Zijlstra
2021-03-19  1:42   ` Josh Poimboeuf
2021-03-19  9:47     ` Peter Zijlstra
2021-03-19 15:12       ` Josh Poimboeuf
2021-03-19 15:24         ` Peter Zijlstra
2021-03-19 15:37           ` Josh Poimboeuf
2021-03-18 17:11 ` [PATCH v2 09/14] objtool: Extract elf_strtab_concat() Peter Zijlstra
2021-03-19  2:10   ` Josh Poimboeuf
2021-03-19  9:52     ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 10/14] objtool: Extract elf_symbol_add() Peter Zijlstra
2021-03-19  2:14   ` Josh Poimboeuf
2021-03-19  9:54     ` Peter Zijlstra
2021-03-19 15:04       ` Josh Poimboeuf
2021-03-18 17:11 ` [PATCH v2 11/14] objtool: Add elf_create_undef_symbol() Peter Zijlstra
2021-03-19  2:29   ` Josh Poimboeuf
2021-03-19  7:56     ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines Peter Zijlstra
2021-03-19  2:54   ` Josh Poimboeuf
2021-03-19 11:21     ` Peter Zijlstra
2021-03-19 13:28       ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 13/14] objtool: Skip magical retpoline .altinstr_replacement Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls Peter Zijlstra
2021-03-19  3:29   ` Josh Poimboeuf
2021-03-19  8:06     ` Peter Zijlstra
2021-03-19 15:30       ` Josh Poimboeuf
2021-03-19 15:56         ` Peter Zijlstra
2021-03-19 22:52           ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).