* [PATCH 0/8] objtool vs retpoline
@ 2020-04-23 12:47 Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 1/8] objtool: is_fentry_call() crashes if call has no destination Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

Hi,

Based on Alexandre's patches, here's a pile that goes on top of tip/objtool/core.

With these patches applied, objtool can completely understand retpolines
and RSB stuffing, which means it can emit valid ORC unwind information for
them, which in turn means we can now unwind through a retpoline.



* [PATCH 1/8] objtool: is_fentry_call() crashes if call has no destination
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 2/8] objtool: UNWIND_HINT_RET_OFFSET should not check registers Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

From: Alexandre Chartre <alexandre.chartre@oracle.com>

Fix is_fentry_call() so that it does not crash when a call has no
destination set (call_dest == NULL). This is needed in order to support
intra-function calls.
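
For illustration, such a call could look like this (hypothetical asm,
not from this series):

	SYM_FUNC_START(func)
		...
		call	1f	/* target is a local label: call_dest stays NULL */
	1:	...
	SYM_FUNC_END(func)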

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200414103618.12657-2-alexandre.chartre@oracle.com
---
 tools/objtool/check.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1490,7 +1490,7 @@ static int decode_sections(struct objtoo
 
 static bool is_fentry_call(struct instruction *insn)
 {
-	if (insn->type == INSN_CALL &&
+	if (insn->type == INSN_CALL && insn->call_dest &&
 	    insn->call_dest->type == STT_NOTYPE &&
 	    !strcmp(insn->call_dest->name, "__fentry__"))
 		return true;




* [PATCH 2/8] objtool: UNWIND_HINT_RET_OFFSET should not check registers
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 1/8] objtool: is_fentry_call() crashes if call has no destination Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 3/8] objtool: Rework allocating stack_ops on decode Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

From: Alexandre Chartre <alexandre.chartre@oracle.com>

UNWIND_HINT_RET_OFFSET already accounts for a modified stack size.
However, if a callee-saved register was pushed onto the stack then the
stack frame will still appear modified. So stop checking registers when
UNWIND_HINT_RET_OFFSET is used.
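
A sketch of the case this handles (hypothetical function): after

	push	%r12	/* callee-saved register */

the CFI for %r12 no longer matches initial_func_cfi.regs[], so reaching
a RET under UNWIND_HINT_RET_OFFSET would trip the register check even
though the hint already blesses the modified stack size.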

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200407073142.20659-3-alexandre.chartre@oracle.com
---
 tools/objtool/check.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1513,6 +1513,14 @@ static bool has_modified_stack_frame(str
 	if (cfi->stack_size != initial_func_cfi.cfa.offset + ret_offset)
 		return true;
 
+	/*
+	 * If there is a ret offset hint then don't check registers
+	 * because a callee-saved register might have been pushed on
+	 * the stack.
+	 */
+	if (ret_offset)
+		return false;
+
 	for (i = 0; i < CFI_NUM_REGS; i++) {
 		if (cfi->regs[i].base != initial_func_cfi.regs[i].base ||
 		    cfi->regs[i].offset != initial_func_cfi.regs[i].offset)




* [PATCH 3/8] objtool: Rework allocating stack_ops on decode
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 1/8] objtool: is_fentry_call() crashes if call has no destination Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 2/8] objtool: UNWIND_HINT_RET_OFFSET should not check registers Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 15:40   ` Alexandre Chartre
  2020-04-23 12:47 ` [PATCH 4/8] objtool: Add support for intra-function calls Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

Wrap each stack_op in a macro that allocates and adds it to the list.
This removes the need to figure out what to do with the pre-allocated
stack_op at the end.
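
For reference, ADD_OP(op) { ... } from the hunk below expands to roughly:

	if (!(op = calloc(1, sizeof(*op))))
		return -1;
	else
		for (; op; op = ({ list_add_tail(&op->list, ops_list); NULL; })) {
			/* body runs exactly once, then the op is queued */
		}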

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/arch/x86/decode.c |  257 +++++++++++++++++++++++-----------------
 1 file changed, 153 insertions(+), 104 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -77,6 +77,17 @@ unsigned long arch_jump_destination(stru
 	return insn->offset + insn->len + insn->immediate;
 }
 
+#define PUSH_OP(op) \
+({ \
+	list_add_tail(&op->list, ops_list); \
+	NULL; \
+})
+
+#define ADD_OP(op) \
+	if (!(op = calloc(1, sizeof(*op)))) \
+		return -1; \
+	else for (; op; op = PUSH_OP(op))
+
 int arch_decode_instruction(struct elf *elf, struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
 			    unsigned int *len, enum insn_type *type,
@@ -88,7 +99,7 @@ int arch_decode_instruction(struct elf *
 	unsigned char op1, op2, rex = 0, rex_b = 0, rex_r = 0, rex_w = 0,
 		      rex_x = 0, modrm = 0, modrm_mod = 0, modrm_rm = 0,
 		      modrm_reg = 0, sib = 0;
-	struct stack_op *op;
+	struct stack_op *op = NULL;
 
 	x86_64 = is_x86_64(elf);
 	if (x86_64 == -1)
@@ -129,10 +140,6 @@ int arch_decode_instruction(struct elf *
 	if (insn.sib.nbytes)
 		sib = insn.sib.bytes[0];
 
-	op = calloc(1, sizeof(*op));
-	if (!op)
-		return -1;
-
 	switch (op1) {
 
 	case 0x1:
@@ -141,10 +148,12 @@ int arch_decode_instruction(struct elf *
 
 			/* add/sub reg, %rsp */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_ADD;
-			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_ADD;
+				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
 		}
 		break;
 
@@ -152,9 +161,11 @@ int arch_decode_instruction(struct elf *
 
 		/* push reg */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_REG;
-		op->src.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
-		op->dest.type = OP_DEST_PUSH;
+		ADD_OP(op) {
+			op->src.type = OP_SRC_REG;
+			op->src.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
+			op->dest.type = OP_DEST_PUSH;
+		}
 
 		break;
 
@@ -162,9 +173,11 @@ int arch_decode_instruction(struct elf *
 
 		/* pop reg */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_POP;
-		op->dest.type = OP_DEST_REG;
-		op->dest.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
+		ADD_OP(op) {
+			op->src.type = OP_SRC_POP;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
+		}
 
 		break;
 
@@ -172,8 +185,10 @@ int arch_decode_instruction(struct elf *
 	case 0x6a:
 		/* push immediate */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_CONST;
-		op->dest.type = OP_DEST_PUSH;
+		ADD_OP(op) {
+			op->src.type = OP_SRC_CONST;
+			op->dest.type = OP_DEST_PUSH;
+		}
 		break;
 
 	case 0x70 ... 0x7f:
@@ -188,11 +203,13 @@ int arch_decode_instruction(struct elf *
 		if (modrm == 0xe4) {
 			/* and imm, %rsp */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_AND;
-			op->src.reg = CFI_SP;
-			op->src.offset = insn.immediate.value;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_AND;
+				op->src.reg = CFI_SP;
+				op->src.offset = insn.immediate.value;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
 			break;
 		}
 
@@ -205,11 +222,13 @@ int arch_decode_instruction(struct elf *
 
 		/* add/sub imm, %rsp */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_ADD;
-		op->src.reg = CFI_SP;
-		op->src.offset = insn.immediate.value * sign;
-		op->dest.type = OP_DEST_REG;
-		op->dest.reg = CFI_SP;
+		ADD_OP(op) {
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_SP;
+			op->src.offset = insn.immediate.value * sign;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_SP;
+		}
 		break;
 
 	case 0x89:
@@ -217,10 +236,12 @@ int arch_decode_instruction(struct elf *
 
 			/* mov %rsp, reg */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_REG;
-			op->src.reg = CFI_SP;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = op_to_cfi_reg[modrm_rm][rex_b];
+			ADD_OP(op) {
+				op->src.type = OP_SRC_REG;
+				op->src.reg = CFI_SP;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = op_to_cfi_reg[modrm_rm][rex_b];
+			}
 			break;
 		}
 
@@ -228,10 +249,12 @@ int arch_decode_instruction(struct elf *
 
 			/* mov reg, %rsp */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_REG;
-			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_REG;
+				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
 			break;
 		}
 
@@ -242,21 +265,25 @@ int arch_decode_instruction(struct elf *
 
 			/* mov reg, disp(%rbp) */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_REG;
-			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
-			op->dest.type = OP_DEST_REG_INDIRECT;
-			op->dest.reg = CFI_BP;
-			op->dest.offset = insn.displacement.value;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_REG;
+				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+				op->dest.type = OP_DEST_REG_INDIRECT;
+				op->dest.reg = CFI_BP;
+				op->dest.offset = insn.displacement.value;
+			}
 
 		} else if (rex_w && !rex_b && modrm_rm == 4 && sib == 0x24) {
 
 			/* mov reg, disp(%rsp) */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_REG;
-			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
-			op->dest.type = OP_DEST_REG_INDIRECT;
-			op->dest.reg = CFI_SP;
-			op->dest.offset = insn.displacement.value;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_REG;
+				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+				op->dest.type = OP_DEST_REG_INDIRECT;
+				op->dest.reg = CFI_SP;
+				op->dest.offset = insn.displacement.value;
+			}
 		}
 
 		break;
@@ -266,22 +293,26 @@ int arch_decode_instruction(struct elf *
 
 			/* mov disp(%rbp), reg */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_REG_INDIRECT;
-			op->src.reg = CFI_BP;
-			op->src.offset = insn.displacement.value;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			ADD_OP(op) {
+				op->src.type = OP_SRC_REG_INDIRECT;
+				op->src.reg = CFI_BP;
+				op->src.offset = insn.displacement.value;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			}
 
 		} else if (rex_w && !rex_b && sib == 0x24 &&
 			   modrm_mod != 3 && modrm_rm == 4) {
 
 			/* mov disp(%rsp), reg */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_REG_INDIRECT;
-			op->src.reg = CFI_SP;
-			op->src.offset = insn.displacement.value;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			ADD_OP(op) {
+				op->src.type = OP_SRC_REG_INDIRECT;
+				op->src.reg = CFI_SP;
+				op->src.offset = insn.displacement.value;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			}
 		}
 
 		break;
@@ -290,27 +321,31 @@ int arch_decode_instruction(struct elf *
 		if (sib == 0x24 && rex_w && !rex_b && !rex_x) {
 
 			*type = INSN_STACK;
-			if (!insn.displacement.value) {
-				/* lea (%rsp), reg */
-				op->src.type = OP_SRC_REG;
-			} else {
-				/* lea disp(%rsp), reg */
-				op->src.type = OP_SRC_ADD;
-				op->src.offset = insn.displacement.value;
+			ADD_OP(op) {
+				if (!insn.displacement.value) {
+					/* lea (%rsp), reg */
+					op->src.type = OP_SRC_REG;
+				} else {
+					/* lea disp(%rsp), reg */
+					op->src.type = OP_SRC_ADD;
+					op->src.offset = insn.displacement.value;
+				}
+				op->src.reg = CFI_SP;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
 			}
-			op->src.reg = CFI_SP;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
 
 		} else if (rex == 0x48 && modrm == 0x65) {
 
 			/* lea disp(%rbp), %rsp */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_ADD;
-			op->src.reg = CFI_BP;
-			op->src.offset = insn.displacement.value;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_ADD;
+				op->src.reg = CFI_BP;
+				op->src.offset = insn.displacement.value;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
 
 		} else if (rex == 0x49 && modrm == 0x62 &&
 			   insn.displacement.value == -8) {
@@ -322,11 +357,13 @@ int arch_decode_instruction(struct elf *
 			 * stack realignment.
 			 */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_ADD;
-			op->src.reg = CFI_R10;
-			op->src.offset = -8;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_ADD;
+				op->src.reg = CFI_R10;
+				op->src.offset = -8;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
 
 		} else if (rex == 0x49 && modrm == 0x65 &&
 			   insn.displacement.value == -16) {
@@ -338,11 +375,13 @@ int arch_decode_instruction(struct elf *
 			 * stack realignment.
 			 */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_ADD;
-			op->src.reg = CFI_R13;
-			op->src.offset = -16;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_ADD;
+				op->src.reg = CFI_R13;
+				op->src.offset = -16;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
 		}
 
 		break;
@@ -350,8 +389,10 @@ int arch_decode_instruction(struct elf *
 	case 0x8f:
 		/* pop to mem */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_POP;
-		op->dest.type = OP_DEST_MEM;
+		ADD_OP(op) {
+			op->src.type = OP_SRC_POP;
+			op->dest.type = OP_DEST_MEM;
+		}
 		break;
 
 	case 0x90:
@@ -361,15 +402,19 @@ int arch_decode_instruction(struct elf *
 	case 0x9c:
 		/* pushf */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_CONST;
-		op->dest.type = OP_DEST_PUSHF;
+		ADD_OP(op) {
+			op->src.type = OP_SRC_CONST;
+			op->dest.type = OP_DEST_PUSHF;
+		}
 		break;
 
 	case 0x9d:
 		/* popf */
 		*type = INSN_STACK;
-		op->src.type = OP_SRC_POPF;
-		op->dest.type = OP_DEST_MEM;
+		ADD_OP(op) {
+			op->src.type = OP_SRC_POPF;
+			op->dest.type = OP_DEST_MEM;
+		}
 		break;
 
 	case 0x0f:
@@ -405,15 +450,19 @@ int arch_decode_instruction(struct elf *
 
 			/* push fs/gs */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_CONST;
-			op->dest.type = OP_DEST_PUSH;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_CONST;
+				op->dest.type = OP_DEST_PUSH;
+			}
 
 		} else if (op2 == 0xa1 || op2 == 0xa9) {
 
 			/* pop fs/gs */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_POP;
-			op->dest.type = OP_DEST_MEM;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_POP;
+				op->dest.type = OP_DEST_MEM;
+			}
 		}
 
 		break;
@@ -427,7 +476,8 @@ int arch_decode_instruction(struct elf *
 		 * pop bp
 		 */
 		*type = INSN_STACK;
-		op->dest.type = OP_DEST_LEAVE;
+		ADD_OP(op)
+			op->dest.type = OP_DEST_LEAVE;
 
 		break;
 
@@ -449,12 +499,14 @@ int arch_decode_instruction(struct elf *
 	case 0xcf: /* iret */
 		*type = INSN_EXCEPTION_RETURN;
 
-		/* add $40, %rsp */
-		op->src.type = OP_SRC_ADD;
-		op->src.reg = CFI_SP;
-		op->src.offset = 5*8;
-		op->dest.type = OP_DEST_REG;
-		op->dest.reg = CFI_SP;
+		ADD_OP(op) {
+			/* add $40, %rsp */
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_SP;
+			op->src.offset = 5*8;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_SP;
+		}
 		break;
 
 	case 0xca: /* retf */
@@ -492,8 +544,10 @@ int arch_decode_instruction(struct elf *
 
 			/* push from mem */
 			*type = INSN_STACK;
-			op->src.type = OP_SRC_CONST;
-			op->dest.type = OP_DEST_PUSH;
+			ADD_OP(op) {
+				op->src.type = OP_SRC_CONST;
+				op->dest.type = OP_DEST_PUSH;
+			}
 		}
 
 		break;
@@ -504,11 +558,6 @@ int arch_decode_instruction(struct elf *
 
 	*immediate = insn.immediate.nbytes ? insn.immediate.value : 0;
 
-	if (*type == INSN_STACK || *type == INSN_EXCEPTION_RETURN)
-		list_add_tail(&op->list, ops_list);
-	else
-		free(op);
-
 	return 0;
 }
 




* [PATCH 4/8] objtool: Add support for intra-function calls
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
                   ` (2 preceding siblings ...)
  2020-04-23 12:47 ` [PATCH 3/8] objtool: Rework allocating stack_ops on decode Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 14:34   ` Miroslav Benes
  2020-04-23 12:47 ` [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

From: Alexandre Chartre <alexandre.chartre@oracle.com>

Change objtool to support intra-function calls. On x86, an intra-function
call is represented in objtool as a push of the return address onto the
stack, plus a jump to the destination address. That way the stack
information is correctly updated and the call flow is still accurate.
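
As a sketch, an annotated intra-function call:

	ANNOTATE_INTRA_FUNCTION_CALL
	call	1f

is treated by objtool as if it were:

	push	$retaddr	/* stack_op from the 0xe8 decode below */
	jmp	1f		/* insn->type = INSN_JUMP_UNCONDITIONAL */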

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200414103618.12657-4-alexandre.chartre@oracle.com
---
 include/linux/frame.h                            |   11 +++
 tools/objtool/Documentation/stack-validation.txt |    8 ++
 tools/objtool/arch/x86/decode.c                  |    8 ++
 tools/objtool/check.c                            |   84 ++++++++++++++++++++---
 4 files changed, 102 insertions(+), 9 deletions(-)

--- a/include/linux/frame.h
+++ b/include/linux/frame.h
@@ -15,9 +15,20 @@
 	static void __used __section(.discard.func_stack_frame_non_standard) \
 		*__func_stack_frame_non_standard_##func = func
 
+/*
+ * This macro indicates that the following intra-function call is valid.
+ * Any non-annotated intra-function call will cause objtool to issue a warning.
+ */
+#define ANNOTATE_INTRA_FUNCTION_CALL				\
+	999:							\
+	.pushsection .discard.intra_function_calls;		\
+	.long 999b;						\
+	.popsection;
+
 #else /* !CONFIG_STACK_VALIDATION */
 
 #define STACK_FRAME_NON_STANDARD(func)
+#define ANNOTATE_INTRA_FUNCTION_CALL
 
 #endif /* CONFIG_STACK_VALIDATION */
 
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -316,6 +316,14 @@ they mean, and suggestions for how to fi
       sources).
 
 
+10. file.o: warning: unsupported intra-function call
+
+   This warning means that a direct call is done to a destination which
+   is not at the beginning of a function. If this is a legit call, you
+   can remove this warning by putting the ANNOTATE_INTRA_FUNCTION_CALL
+   directive right before the call.
+
+
 If the error doesn't seem to make sense, it could be a bug in objtool.
 Feel free to ask the objtool maintainer for help.
 
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -516,6 +516,14 @@ int arch_decode_instruction(struct elf *
 
 	case 0xe8:
 		*type = INSN_CALL;
+		/*
+		 * For the impact on the stack, a CALL behaves like
+		 * a PUSH of an immediate value (the return address).
+		 */
+		ADD_OP(op) {
+			op->src.type = OP_SRC_CONST;
+			op->dest.type = OP_DEST_PUSH;
+		}
 		break;
 
 	case 0xfc:
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -690,6 +690,16 @@ static int add_jump_destinations(struct
 	return 0;
 }
 
+static void remove_insn_ops(struct instruction *insn)
+{
+	struct stack_op *op, *tmp;
+
+	list_for_each_entry_safe(op, tmp, &insn->stack_ops, list) {
+		list_del(&op->list);
+		free(op);
+	}
+}
+
 /*
  * Find the destination instructions for all calls.
  */
@@ -715,10 +725,7 @@ static int add_call_destinations(struct
 				continue;
 
 			if (!insn->call_dest) {
-				WARN_FUNC("unsupported intra-function call",
-					  insn->sec, insn->offset);
-				if (retpoline)
-					WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
+				WARN_FUNC("intra-function call", insn->sec, insn->offset);
 				return -1;
 			}
 
@@ -741,6 +748,12 @@ static int add_call_destinations(struct
 			}
 		} else
 			insn->call_dest = rela->sym;
+
+		/*
+		 * Whatever stack impact regular CALLs have, should be
+		 * undone by the RETURN of the called function.
+		 */
+		remove_insn_ops(insn);
 	}
 
 	return 0;
@@ -1416,6 +1429,57 @@ static int read_instr_hints(struct objto
 	return 0;
 }
 
+static int read_intra_function_calls(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	struct rela *rela;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.intra_function_calls");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(rela, &sec->rela_list, list) {
+		unsigned long dest_off;
+
+		if (rela->sym->type != STT_SECTION) {
+			WARN("unexpected relocation symbol type in %s",
+			     sec->name);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("bad .discard.intra_function_call entry");
+			return -1;
+		}
+
+		if (insn->type != INSN_CALL) {
+			WARN_FUNC("intra_function_call not a direct call",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+
+		/*
+		 * Treat intra-function CALLs as JMPs, but with a stack_op.
+		 * Also see how setup_call_dest() strips stack_ops from normal
+		 * CALLs.
+		 */
+		insn->type = INSN_JUMP_UNCONDITIONAL;
+
+		dest_off = insn->offset + insn->len + insn->immediate;
+		insn->jump_dest = find_insn(file, insn->sec, dest_off);
+		if (!insn->jump_dest) {
+			WARN_FUNC("can't find call dest at %s+0x%lx",
+				  insn->sec, insn->offset,
+				  insn->sec->name, dest_off);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static void mark_rodata(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1471,6 +1535,10 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
+	ret = read_intra_function_calls(file);
+	if (ret)
+		return ret;
+
 	ret = add_call_destinations(file);
 	if (ret)
 		return ret;
@@ -2245,6 +2313,9 @@ static int validate_branch(struct objtoo
 				return 0;
 		}
 
+		if (handle_insn_ops(insn, &state))
+			return 1;
+
 		switch (insn->type) {
 
 		case INSN_RETURN:
@@ -2304,9 +2375,6 @@ static int validate_branch(struct objtoo
 			break;
 
 		case INSN_EXCEPTION_RETURN:
-			if (handle_insn_ops(insn, &state))
-				return 1;
-
 			/*
 			 * This handles x86's sync_core() case, where we use an
 			 * IRET to self. All 'normal' IRET instructions are in
@@ -2326,8 +2394,6 @@ static int validate_branch(struct objtoo
 			return 0;
 
 		case INSN_STACK:
-			if (handle_insn_ops(insn, &state))
-				return 1;
 			break;
 
 		case INSN_STAC:




* [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
                   ` (3 preceding siblings ...)
  2020-04-23 12:47 ` [PATCH 4/8] objtool: Add support for intra-function calls Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 15:45   ` Alexandre Chartre
  2020-04-24 19:04   ` Josh Poimboeuf
  2020-04-23 12:47 ` [PATCH 6/8] x86: Simplify retpoline declaration Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

Change FILL_RETURN_BUFFER and vmexit_fill_RSB() so that objtool groks
them and can generate correct ORC unwind information for them.

 - Since ORC is alternative invariant (that is, all alternatives
   should have the same ORC entries), the __FILL_RETURN_BUFFER body
   cannot be part of an alternative.

   Therefore, move it out of the alternative and keep the alternative
   as a sort of jump_label around it.

 - Use the ANNOTATE_INTRA_FUNCTION_CALL annotation to white-list
   these 'funny' call instructions to nowhere.

 - Use UNWIND_HINT_EMPTY to 'fill' the speculation traps, otherwise
   objtool will consider them unreachable.

 - Move the RSP adjustment into the loop, such that the loop has a
   deterministic stack layout (see the sketch below).
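
Taken together, each loop iteration then looks roughly like this
(sketch; on 64-bit the two calls push 16 bytes):

	771:	call	772f		/* push 8 */
	773:	pause; lfence		/* speculation trap */
		jmp	773b
	772:	call	774f		/* push 8 */
	775:	pause; lfence		/* speculation trap */
		jmp	775b
	774:	add	$16, %rsp	/* undo both pushes */
		dec	reg
		jnz	771b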

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/nospec-branch.h |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -4,11 +4,13 @@
 #define _ASM_X86_NOSPEC_BRANCH_H_
 
 #include <linux/static_key.h>
+#include <linux/frame.h>
 
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
 #include <asm/msr-index.h>
+#include <asm/unwind_hints.h>
 
 /*
  * This should be used immediately before a retpoline alternative. It tells
@@ -46,21 +48,25 @@
 #define __FILL_RETURN_BUFFER(reg, nr, sp)	\
 	mov	$(nr/2), reg;			\
 771:						\
+	ANNOTATE_INTRA_FUNCTION_CALL		\
 	call	772f;				\
 773:	/* speculation trap */			\
+	UNWIND_HINT_EMPTY;			\
 	pause;					\
 	lfence;					\
 	jmp	773b;				\
 772:						\
+	ANNOTATE_INTRA_FUNCTION_CALL		\
 	call	774f;				\
 775:	/* speculation trap */			\
+	UNWIND_HINT_EMPTY;			\
 	pause;					\
 	lfence;					\
 	jmp	775b;				\
 774:						\
+	add	$(BITS_PER_LONG/8) * 2, sp;	\
 	dec	reg;				\
-	jnz	771b;				\
-	add	$(BITS_PER_LONG/8) * nr, sp;
+	jnz	771b;
 
 #ifdef __ASSEMBLY__
 
@@ -137,10 +143,8 @@
   */
 .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
 #ifdef CONFIG_RETPOLINE
-	ANNOTATE_NOSPEC_ALTERNATIVE
-	ALTERNATIVE "jmp .Lskip_rsb_\@",				\
-		__stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP))	\
-		\ftr
+	ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
+	__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)
 .Lskip_rsb_\@:
 #endif
 .endm




* [PATCH 6/8] x86: Simplify retpoline declaration
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
                   ` (4 preceding siblings ...)
  2020-04-23 12:47 ` [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-24 19:09   ` Josh Poimboeuf
  2020-04-23 12:47 ` [PATCH 7/8] x86: Change {JMP,CALL}_NOSPEC argument Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 8/8] x86/retpoline: Fix retpoline unwind Peter Zijlstra
  7 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz, arnd

Because of how KSYM works, we need one declaration per line. Seeing
how we're going to be doubling the number of retpoline symbols,
simplify the machinery in order to avoid having to copy/paste even
more.
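
The pattern (taken from the hunks below) is a GEN-for-each-reg.h header
that gets included once per use, with GEN() redefined each time:

	#undef GEN
	#define GEN(reg) DECL_INDIRECT_THUNK(reg)
	#include <asm/GEN-for-each-reg.h>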

Cc: arnd@arndb.de
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/GEN-for-each-reg.h |   25 ++++++++++++++++++++++
 arch/x86/include/asm/asm-prototypes.h   |   28 +++++++------------------
 arch/x86/lib/retpoline.S                |   35 +++++++++++++-------------------
 3 files changed, 48 insertions(+), 40 deletions(-)

--- /dev/null
+++ b/arch/x86/include/asm/GEN-for-each-reg.h
@@ -0,0 +1,25 @@
+#ifdef CONFIG_64BIT
+GEN(rax)
+GEN(rbx)
+GEN(rcx)
+GEN(rdx)
+GEN(rsi)
+GEN(rdi)
+GEN(rbp)
+GEN(r8)
+GEN(r9)
+GEN(r10)
+GEN(r11)
+GEN(r12)
+GEN(r13)
+GEN(r14)
+GEN(r15)
+#else
+GEN(eax)
+GEN(ebx)
+GEN(ecx)
+GEN(edx)
+GEN(esi)
+GEN(edi)
+GEN(ebp)
+#endif
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -17,24 +17,12 @@ extern void cmpxchg8b_emu(void);
 #endif
 
 #ifdef CONFIG_RETPOLINE
-#ifdef CONFIG_X86_32
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
-#else
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
-INDIRECT_THUNK(8)
-INDIRECT_THUNK(9)
-INDIRECT_THUNK(10)
-INDIRECT_THUNK(11)
-INDIRECT_THUNK(12)
-INDIRECT_THUNK(13)
-INDIRECT_THUNK(14)
-INDIRECT_THUNK(15)
-#endif
-INDIRECT_THUNK(ax)
-INDIRECT_THUNK(bx)
-INDIRECT_THUNK(cx)
-INDIRECT_THUNK(dx)
-INDIRECT_THUNK(si)
-INDIRECT_THUNK(di)
-INDIRECT_THUNK(bp)
+
+#define DECL_INDIRECT_THUNK(reg) \
+	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+
+#undef GEN
+#define GEN(reg) DECL_INDIRECT_THUNK(reg)
+#include <asm/GEN-for-each-reg.h>
+
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -24,25 +24,20 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
  * only see one instance of "__x86_indirect_thunk_\reg" rather
  * than one per register with the correct names. So we do it
  * the simple and nasty way...
+ *
+ * Worse, you can only have a single EXPORT_SYMBOL per line,
+ * and CPP can't insert newlines, so we have to repeat everything
+ * at least twice.
  */
-#define __EXPORT_THUNK(sym) _ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
-#define EXPORT_THUNK(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
-#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
 
-GENERATE_THUNK(_ASM_AX)
-GENERATE_THUNK(_ASM_BX)
-GENERATE_THUNK(_ASM_CX)
-GENERATE_THUNK(_ASM_DX)
-GENERATE_THUNK(_ASM_SI)
-GENERATE_THUNK(_ASM_DI)
-GENERATE_THUNK(_ASM_BP)
-#ifdef CONFIG_64BIT
-GENERATE_THUNK(r8)
-GENERATE_THUNK(r9)
-GENERATE_THUNK(r10)
-GENERATE_THUNK(r11)
-GENERATE_THUNK(r12)
-GENERATE_THUNK(r13)
-GENERATE_THUNK(r14)
-GENERATE_THUNK(r15)
-#endif
+#define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
+#define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
+
+#undef GEN
+#define GEN(reg) THUNK reg
+#include <asm/GEN-for-each-reg.h>
+
+#undef GEN
+#define GEN(reg) EXPORT_THUNK(reg)
+#include <asm/GEN-for-each-reg.h>
+




* [PATCH 7/8] x86: Change {JMP,CALL}_NOSPEC argument
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
                   ` (5 preceding siblings ...)
  2020-04-23 12:47 ` [PATCH 6/8] x86: Simplify retpoline declaration Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 12:47 ` [PATCH 8/8] x86/retpoline: Fix retpoline unwind Peter Zijlstra
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line
versions of the retpoline magic, we need to remove the '%' from the
argument, such that we can paste it onto symbol names.
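
With a bare register name both uses then become possible (sketch;
__x86_retpoline_* only arrives in the next patch):

	call	*%\reg			/* '%' added where a register is meant */
	call	__x86_retpoline_\reg	/* pasted onto a symbol name */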

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/crypto/aesni-intel_asm.S            |    4 ++--
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |    2 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |    2 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S    |   26 +++++++++++++-------------
 arch/x86/entry/entry_32.S                    |    6 +++---
 arch/x86/entry/entry_64.S                    |    2 +-
 arch/x86/include/asm/nospec-branch.h         |   16 ++++++++--------
 arch/x86/kernel/ftrace_32.S                  |    2 +-
 arch/x86/kernel/ftrace_64.S                  |    4 ++--
 arch/x86/lib/checksum_32.S                   |    4 ++--
 arch/x86/platform/efi/efi_stub_64.S          |    2 +-
 11 files changed, 35 insertions(+), 35 deletions(-)

--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2758,7 +2758,7 @@ SYM_FUNC_START(aesni_xts_crypt8)
 	pxor INC, STATE4
 	movdqu IV, 0x30(OUTP)
 
-	CALL_NOSPEC %r11
+	CALL_NOSPEC r11
 
 	movdqu 0x00(OUTP), INC
 	pxor INC, STATE1
@@ -2803,7 +2803,7 @@ SYM_FUNC_START(aesni_xts_crypt8)
 	_aesni_gf128mul_x_ble()
 	movups IV, (IVP)
 
-	CALL_NOSPEC %r11
+	CALL_NOSPEC r11
 
 	movdqu 0x40(OUTP), INC
 	pxor INC, STATE1
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1228,7 +1228,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_
 	vpxor 14 * 16(%rax), %xmm15, %xmm14;
 	vpxor 15 * 16(%rax), %xmm15, %xmm15;
 
-	CALL_NOSPEC %r9;
+	CALL_NOSPEC r9;
 
 	addq $(16 * 16), %rsp;
 
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -1339,7 +1339,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_
 	vpxor 14 * 32(%rax), %ymm15, %ymm14;
 	vpxor 15 * 32(%rax), %ymm15, %ymm15;
 
-	CALL_NOSPEC %r9;
+	CALL_NOSPEC r9;
 
 	addq $(16 * 32), %rsp;
 
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -75,7 +75,7 @@
 
 .text
 SYM_FUNC_START(crc_pcl)
-#define    bufp		%rdi
+#define    bufp		rdi
 #define    bufp_dw	%edi
 #define    bufp_w	%di
 #define    bufp_b	%dil
@@ -105,9 +105,9 @@ SYM_FUNC_START(crc_pcl)
 	## 1) ALIGN:
 	################################################################
 
-	mov     bufp, bufptmp		# rdi = *buf
-	neg     bufp
-	and     $7, bufp		# calculate the unalignment amount of
+	mov     %bufp, bufptmp		# rdi = *buf
+	neg     %bufp
+	and     $7, %bufp		# calculate the unalignment amount of
 					# the address
 	je      proc_block		# Skip if aligned
 
@@ -123,13 +123,13 @@ SYM_FUNC_START(crc_pcl)
 do_align:
 	#### Calculate CRC of unaligned bytes of the buffer (if any)
 	movq    (bufptmp), tmp		# load a quadward from the buffer
-	add     bufp, bufptmp		# align buffer pointer for quadword
+	add     %bufp, bufptmp		# align buffer pointer for quadword
 					# processing
-	sub     bufp, len		# update buffer length
+	sub     %bufp, len		# update buffer length
 align_loop:
 	crc32b  %bl, crc_init_dw 	# compute crc32 of 1-byte
 	shr     $8, tmp			# get next byte
-	dec     bufp
+	dec     %bufp
 	jne     align_loop
 
 proc_block:
@@ -169,10 +169,10 @@ SYM_FUNC_START(crc_pcl)
 	xor     crc2, crc2
 
 	## branch into array
-	lea	jump_table(%rip), bufp
-	movzxw  (bufp, %rax, 2), len
-	lea	crc_array(%rip), bufp
-	lea     (bufp, len, 1), bufp
+	lea	jump_table(%rip), %bufp
+	movzxw  (%bufp, %rax, 2), len
+	lea	crc_array(%rip), %bufp
+	lea     (%bufp, len, 1), %bufp
 	JMP_NOSPEC bufp
 
 	################################################################
@@ -218,9 +218,9 @@ LABEL crc_ %i
 	## 4) Combine three results:
 	################################################################
 
-	lea	(K_table-8)(%rip), bufp		# first entry is for idx 1
+	lea	(K_table-8)(%rip), %bufp		# first entry is for idx 1
 	shlq    $3, %rax			# rax *= 8
-	pmovzxdq (bufp,%rax), %xmm0		# 2 consts: K1:K2
+	pmovzxdq (%bufp,%rax), %xmm0		# 2 consts: K1:K2
 	leal	(%eax,%eax,2), %eax		# rax *= 3 (total *24)
 	subq    %rax, tmp			# tmp -= rax*24
 
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -816,7 +816,7 @@ SYM_CODE_START(ret_from_fork)
 
 	/* kernel thread */
 1:	movl	%edi, %eax
-	CALL_NOSPEC %ebx
+	CALL_NOSPEC ebx
 	/*
 	 * A kernel thread is allowed to return here after successfully
 	 * calling do_execve().  Exit to userspace to complete the execve()
@@ -1501,7 +1501,7 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exce
 
 	TRACE_IRQS_OFF
 	movl	%esp, %eax			# pt_regs pointer
-	CALL_NOSPEC %edi
+	CALL_NOSPEC edi
 	jmp	ret_from_exception
 SYM_CODE_END(common_exception_read_cr2)
 
@@ -1522,7 +1522,7 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exce
 
 	TRACE_IRQS_OFF
 	movl	%esp, %eax			# pt_regs pointer
-	CALL_NOSPEC %edi
+	CALL_NOSPEC edi
 	jmp	ret_from_exception
 SYM_CODE_END(common_exception)
 
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -349,7 +349,7 @@ SYM_CODE_START(ret_from_fork)
 	/* kernel thread */
 	UNWIND_HINT_EMPTY
 	movq	%r12, %rdi
-	CALL_NOSPEC %rbx
+	CALL_NOSPEC rbx
 	/*
 	 * A kernel thread is allowed to return here after successfully
 	 * calling do_execve().  Exit to userspace to complete the execve()
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -118,22 +118,22 @@
 .macro JMP_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
 	ANNOTATE_NOSPEC_ALTERNATIVE
-	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg),	\
-		__stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE,	\
-		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg),	\
+		__stringify(RETPOLINE_JMP %\reg), X86_FEATURE_RETPOLINE,\
+		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
 #else
-	jmp	*\reg
+	jmp	*%\reg
 #endif
 .endm
 
 .macro CALL_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
 	ANNOTATE_NOSPEC_ALTERNATIVE
-	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg),	\
-		__stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
-		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD
+	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg),\
+		__stringify(RETPOLINE_CALL %\reg), X86_FEATURE_RETPOLINE,\
+		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
 #else
-	call	*\reg
+	call	*%\reg
 #endif
 .endm
 
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -189,5 +189,5 @@ SYM_CODE_END(ftrace_graph_caller)
 	movl	%eax, %ecx
 	popl	%edx
 	popl	%eax
-	JMP_NOSPEC %ecx
+	JMP_NOSPEC ecx
 #endif
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -301,7 +301,7 @@ SYM_INNER_LABEL(ftrace_stub, SYM_L_GLOBA
 	 * function tracing is enabled.
 	 */
 	movq ftrace_trace_function, %r8
-	CALL_NOSPEC %r8
+	CALL_NOSPEC r8
 	restore_mcount_regs
 
 	jmp fgraph_trace
@@ -338,6 +338,6 @@ SYM_CODE_START(return_to_handler)
 	movq 8(%rsp), %rdx
 	movq (%rsp), %rax
 	addq $24, %rsp
-	JMP_NOSPEC %rdi
+	JMP_NOSPEC rdi
 SYM_CODE_END(return_to_handler)
 #endif
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -153,7 +153,7 @@ SYM_FUNC_START(csum_partial)
 	negl %ebx
 	lea 45f(%ebx,%ebx,2), %ebx
 	testl %esi, %esi
-	JMP_NOSPEC %ebx
+	JMP_NOSPEC ebx
 
 	# Handle 2-byte-aligned regions
 20:	addw (%esi), %ax
@@ -436,7 +436,7 @@ SYM_FUNC_START(csum_partial_copy_generic
 	andl $-32,%edx
 	lea 3f(%ebx,%ebx), %ebx
 	testl %esi, %esi 
-	JMP_NOSPEC %ebx
+	JMP_NOSPEC ebx
 1:	addl $64,%esi
 	addl $64,%edi 
 	SRC(movb -32(%edx),%bl)	; SRC(movb (%edx),%bl)
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -21,7 +21,7 @@ SYM_FUNC_START(__efi_call)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
-	CALL_NOSPEC %rdi
+	CALL_NOSPEC rdi
 	leave
 	ret
 SYM_FUNC_END(__efi_call)




* [PATCH 8/8] x86/retpoline: Fix retpoline unwind
  2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
                   ` (6 preceding siblings ...)
  2020-04-23 12:47 ` [PATCH 7/8] x86: Change {JMP,CALL}_NOSPEC argument Peter Zijlstra
@ 2020-04-23 12:47 ` Peter Zijlstra
  2020-04-23 12:59   ` Peter Zijlstra
  2020-04-24 19:30   ` Josh Poimboeuf
  7 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:47 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre
  Cc: linux-kernel, jthierry, tglx, x86, mbenes, peterz

Currently objtool cannot understand retpolines, and thus cannot
generate ORC unwind information for them. This means that we cannot
unwind from the middle of a retpoline.

The recent ANNOTATE_INTRA_FUNCTION_CALL and UNWIND_HINT_RET_OFFSET
support in objtool enables it to understand the basic retpoline
construct. A further problem is that the ORC unwind information is
alternative invariant; IOW. every alternative should have the same
ORC, retpolines obviously violate this. This means we need to
out-of-line them.

Since all GCC generated code already uses out-of-line retpolines, this
should not affect performance much, if anything.

This will enable objtool to generate valid ORC data for the
out-of-line copies, which means we can correctly and reliably unwind
through a retpoline.
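
For one register, the annotated out-of-line thunk (see the retpoline.S
hunk below) then becomes, roughly:

	__x86_retpoline_rax:
		ANNOTATE_INTRA_FUNCTION_CALL
		call	.Ldo_rop
	.Lspec_trap:
		UNWIND_HINT_EMPTY
		pause; lfence
		jmp	.Lspec_trap
	.Ldo_rop:
		mov	%rax, (%rsp)	/* overwrite the return address */
		UNWIND_HINT_RET_OFFSET
		ret			/* 'returns' to the indirect target */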

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |    7 +++
 arch/x86/include/asm/nospec-branch.h  |   75 ++++++++--------------------------
 arch/x86/lib/retpoline.S              |   26 ++++++++++-
 3 files changed, 49 insertions(+), 59 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -21,8 +21,15 @@ extern void cmpxchg8b_emu(void);
 #define DECL_INDIRECT_THUNK(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 
+#define DECL_RETPOLINE(reg) \
+	extern asmlinkage void __x86_retpoline_ ## reg (void);
+
 #undef GEN
 #define GEN(reg) DECL_INDIRECT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 
+#undef GEN
+#define GEN(reg) DECL_RETPOLINE(reg)
+#include <asm/GEN-for-each-reg.h>
+
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -13,15 +13,6 @@
 #include <asm/unwind_hints.h>
 
 /*
- * This should be used immediately before a retpoline alternative. It tells
- * objtool where the retpolines are so that it can make sense of the control
- * flow by just reading the original instruction(s) and ignoring the
- * alternatives.
- */
-#define ANNOTATE_NOSPEC_ALTERNATIVE \
-	ANNOTATE_IGNORE_ALTERNATIVE
-
-/*
  * Fill the CPU return stack buffer.
  *
  * Each entry in the RSB, if used for a speculative 'ret', contains an
@@ -83,44 +74,15 @@
 .endm
 
 /*
- * These are the bare retpoline primitives for indirect jmp and call.
- * Do not use these directly; they only exist to make the ALTERNATIVE
- * invocation below less ugly.
- */
-.macro RETPOLINE_JMP reg:req
-	call	.Ldo_rop_\@
-.Lspec_trap_\@:
-	pause
-	lfence
-	jmp	.Lspec_trap_\@
-.Ldo_rop_\@:
-	mov	\reg, (%_ASM_SP)
-	ret
-.endm
-
-/*
- * This is a wrapper around RETPOLINE_JMP so the called function in reg
- * returns to the instruction after the macro.
- */
-.macro RETPOLINE_CALL reg:req
-	jmp	.Ldo_call_\@
-.Ldo_retpoline_jmp_\@:
-	RETPOLINE_JMP \reg
-.Ldo_call_\@:
-	call	.Ldo_retpoline_jmp_\@
-.endm
-
-/*
  * JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
  * indirect jmp/call which may be susceptible to the Spectre variant 2
  * attack.
  */
 .macro JMP_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
-	ANNOTATE_NOSPEC_ALTERNATIVE
-	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg),	\
-		__stringify(RETPOLINE_JMP %\reg), X86_FEATURE_RETPOLINE,\
-		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
+	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
+		      __stringify(jmp __x86_retpoline_\reg), X86_FEATURE_RETPOLINE, \
+		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
 #else
 	jmp	*%\reg
 #endif
@@ -128,10 +90,16 @@
 
 .macro CALL_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
-	ANNOTATE_NOSPEC_ALTERNATIVE
-	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg),\
-		__stringify(RETPOLINE_CALL %\reg), X86_FEATURE_RETPOLINE,\
-		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
+	/*
+	 * This cannot be ALTERNATIVE_2 like with JMP_NOSPEC, because ORC
+	 * unwind data is alternative invariant and needs stack modifying
+	 * instructions to be in the same place for all alternatives.
+	 *
+	 * IOW the CALL instruction must be at the same offset for all cases.
+	 */
+	ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
+	ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
+		    __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE
 #else
 	call	*%\reg
 #endif
@@ -165,16 +133,12 @@
  * which is ensured when CONFIG_RETPOLINE is defined.
  */
 # define CALL_NOSPEC						\
-	ANNOTATE_NOSPEC_ALTERNATIVE				\
-	ALTERNATIVE_2(						\
-	ANNOTATE_RETPOLINE_SAFE					\
-	"call *%[thunk_target]\n",				\
-	"call __x86_indirect_thunk_%V[thunk_target]\n",		\
-	X86_FEATURE_RETPOLINE,					\
-	"lfence;\n"						\
-	ANNOTATE_RETPOLINE_SAFE					\
-	"call *%[thunk_target]\n",				\
-	X86_FEATURE_RETPOLINE_AMD)
+	ALTERNATIVE("", "lfence", X86_FEATURE_RETPOLINE_AMD)	\
+	ALTERNATIVE(ANNOTATE_RETPOLINE_SAFE			\
+		    "call *%[thunk_target]\n",			\
+		    "call __x86_indirect_thunk_%V[thunk_target]\n", \
+		    X86_FEATURE_RETPOLINE)
+
 # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
 
 #else /* CONFIG_X86_32 */
@@ -184,7 +148,6 @@
  * here, anyway.
  */
 # define CALL_NOSPEC						\
-	ANNOTATE_NOSPEC_ALTERNATIVE				\
 	ALTERNATIVE_2(						\
 	ANNOTATE_RETPOLINE_SAFE					\
 	"call *%[thunk_target]\n",				\
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -7,15 +7,31 @@
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
+#include <asm/unwind_hints.h>
+#include <asm/frame.h>
 
 .macro THUNK reg
 	.section .text.__x86.indirect_thunk
 
+	.align 32
 SYM_FUNC_START(__x86_indirect_thunk_\reg)
-	CFI_STARTPROC
-	JMP_NOSPEC %\reg
-	CFI_ENDPROC
+	JMP_NOSPEC \reg
 SYM_FUNC_END(__x86_indirect_thunk_\reg)
+
+SYM_FUNC_START_NOALIGN(__x86_retpoline_\reg)
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call	.Ldo_rop_\@
+.Lspec_trap_\@:
+	UNWIND_HINT_EMPTY
+	pause
+	lfence
+	jmp	.Lspec_trap_\@
+.Ldo_rop_\@:
+	mov	%\reg, (%_ASM_SP)
+	UNWIND_HINT_RET_OFFSET
+	ret
+SYM_FUNC_END(__x86_retpoline_\reg)
+
 .endm
 
 /*
@@ -32,6 +48,7 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
+#define EXPORT_RETPOLINE(reg)  __EXPORT_THUNK(__x86_retpoline_ ## reg)
 
 #undef GEN
 #define GEN(reg) THUNK reg
@@ -41,3 +58,6 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 
+#undef GEN
+#define GEN(reg) EXPORT_RETPOLINE(reg)
+#include <asm/GEN-for-each-reg.h>




* Re: [PATCH 8/8] x86/retpoline: Fix retpoline unwind
  2020-04-23 12:47 ` [PATCH 8/8] x86/retpoline: Fix retpoline unwind Peter Zijlstra
@ 2020-04-23 12:59   ` Peter Zijlstra
  2020-04-24 19:30   ` Josh Poimboeuf
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 12:59 UTC (permalink / raw)
  To: jpoimboe, alexandre.chartre; +Cc: linux-kernel, jthierry, tglx, x86, mbenes

On Thu, Apr 23, 2020 at 02:47:25PM +0200, Peter Zijlstra wrote:
> @@ -128,10 +90,16 @@
>  
>  .macro CALL_NOSPEC reg:req
>  #ifdef CONFIG_RETPOLINE
> -	ANNOTATE_NOSPEC_ALTERNATIVE
> -	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg),\
> -		__stringify(RETPOLINE_CALL %\reg), X86_FEATURE_RETPOLINE,\
> -		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
> +	/*
> +	 * This cannot be ALTERNATIVE_2 like with JMP_NOSPEC, because ORC
> +	 * unwind data is alternative invariant and needs stack modifying
> +	 * instructions to be in the same place for all alternatives.
> +	 *
> +	 * IOW the CALL instruction must be at the same offset for all cases.
> +	 */
> +	ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
> +	ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
> +		    __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE
>  #else
>  	call	*%\reg
>  #endif
> @@ -165,16 +133,12 @@
>   * which is ensured when CONFIG_RETPOLINE is defined.
>   */
>  # define CALL_NOSPEC						\
> -	ANNOTATE_NOSPEC_ALTERNATIVE				\
> -	ALTERNATIVE_2(						\
> -	ANNOTATE_RETPOLINE_SAFE					\
> -	"call *%[thunk_target]\n",				\
> -	"call __x86_indirect_thunk_%V[thunk_target]\n",		\
> -	X86_FEATURE_RETPOLINE,					\
> -	"lfence;\n"						\
> -	ANNOTATE_RETPOLINE_SAFE					\
> -	"call *%[thunk_target]\n",				\
> -	X86_FEATURE_RETPOLINE_AMD)
> +	ALTERNATIVE("", "lfence", X86_FEATURE_RETPOLINE_AMD)	\
> +	ALTERNATIVE(ANNOTATE_RETPOLINE_SAFE			\
> +		    "call *%[thunk_target]\n",			\
> +		    "call __x86_indirect_thunk_%V[thunk_target]\n", \
> +		    X86_FEATURE_RETPOLINE)
> +

Hmm, that's a bit daft; that could be a call to
__x86_retpoline_%V[thunk_target] like for the ASM version above.
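
I.e. something like (untested sketch):

	ALTERNATIVE(ANNOTATE_RETPOLINE_SAFE			\
		    "call *%[thunk_target]\n",			\
		    "call __x86_retpoline_%V[thunk_target]\n",	\
		    X86_FEATURE_RETPOLINE)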


* Re: [PATCH 4/8] objtool: Add support for intra-function calls
  2020-04-23 12:47 ` [PATCH 4/8] objtool: Add support for intra-function calls Peter Zijlstra
@ 2020-04-23 14:34   ` Miroslav Benes
  2020-04-23 15:22     ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Miroslav Benes @ 2020-04-23 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jpoimboe, alexandre.chartre, linux-kernel, jthierry, tglx, x86

>  /*
>   * Find the destination instructions for all calls.
>   */
> @@ -715,10 +725,7 @@ static int add_call_destinations(struct
>  				continue;
>  
>  			if (!insn->call_dest) {
> -				WARN_FUNC("unsupported intra-function call",
> -					  insn->sec, insn->offset);
> -				if (retpoline)
> -					WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
> +				WARN_FUNC("intra-function call", insn->sec, insn->offset);

"unsupported intra-function call"?

>  				return -1;
>  			}
>  
> @@ -741,6 +748,12 @@ static int add_call_destinations(struct
>  			}
>  		} else
>  			insn->call_dest = rela->sym;
> +
> +		/*
> +		 * Whatever stack impact regular CALLs have, should be
> +		 * undone by the RETURN of the called function.

 * Annotated intra-function CALLs are treated as JMPs with a stack_op.
 * See read_intra_function_calls().

would make it a bit clearer.

> +                */
> +               remove_insn_ops(insn);
>         }
>  
>         return 0;
> @@ -1416,6 +1429,57 @@ static int read_instr_hints(struct objto
>         return 0;
>  }
> 
> +static int read_intra_function_calls(struct objtool_file *file)
> +{
> +       struct instruction *insn;
> +       struct section *sec;
> +       struct rela *rela;
> +
> +       sec = find_section_by_name(file->elf, ".rela.discard.intra_function_calls");
> +       if (!sec)
> +               return 0;
> +
> +       list_for_each_entry(rela, &sec->rela_list, list) {
> +               unsigned long dest_off;
> +
> +               if (rela->sym->type != STT_SECTION) {
> +                       WARN("unexpected relocation symbol type in %s",
> +                            sec->name);
> +                       return -1;
> +               }
> +
> +               insn = find_insn(file, rela->sym->sec, rela->addend);
> +               if (!insn) {
> +                       WARN("bad .discard.intra_function_call entry");
> +                       return -1;
> +               }
> +
> +               if (insn->type != INSN_CALL) {
> +                       WARN_FUNC("intra_function_call not a direct call",
> +                                 insn->sec, insn->offset);
> +                       return -1;
> +               }
> +
> +               /*
> +                * Treat intra-function CALLs as JMPs, but with a stack_op.
> +                * Also see how setup_call_dest() strips stack_ops from normal
> +                * CALLs.

/*
 * Treat annotated intra-function CALLs as JMPs, but with a stack_op.
 * Also see how add_call_destinations() strips stack_ops from normal
 * CALLs.
 */

? (note added "annotated" and s/setup_call_dest/add_call_destinations/)

> +                */
> +               insn->type = INSN_JUMP_UNCONDITIONAL;

[...]

> @@ -2245,6 +2313,9 @@ static int validate_branch(struct objtoo
>  				return 0;
>  		}
>  
> +		if (handle_insn_ops(insn, &state))
> +			return 1;
> +
>  		switch (insn->type) {
>  
>  		case INSN_RETURN:
> @@ -2304,9 +2375,6 @@ static int validate_branch(struct objtoo
>  			break;
>  
>  		case INSN_EXCEPTION_RETURN:
> -			if (handle_insn_ops(insn, &state))
> -				return 1;
> -
>  			/*
>  			 * This handles x86's sync_core() case, where we use an
>  			 * IRET to self. All 'normal' IRET instructions are in
> @@ -2326,8 +2394,6 @@ static int validate_branch(struct objtoo
>  			return 0;
>  
>  		case INSN_STACK:
> -			if (handle_insn_ops(insn, &state))
> -				return 1;
>  			break;

So we could get rid of INSN_STACK now as Julien proposed, couldn't we?
Unless I am missing something, handle_insn_ops() is called unconditionally
here for all insn types and you remove stack_ops when unneeded.

We could also go ahead with Julien's proposal to remove 
INSN_EXCEPTION_RETURN hack and move it to tools/objtool/arch/x86/decode.c. 

Miroslav


* Re: [PATCH 4/8] objtool: Add support for intra-function calls
  2020-04-23 14:34   ` Miroslav Benes
@ 2020-04-23 15:22     ` Peter Zijlstra
  2020-04-24  9:37       ` Miroslav Benes
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 15:22 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jpoimboe, alexandre.chartre, linux-kernel, jthierry, tglx, x86

On Thu, Apr 23, 2020 at 04:34:21PM +0200, Miroslav Benes wrote:
> >  /*
> >   * Find the destination instructions for all calls.
> >   */
> > @@ -715,10 +725,7 @@ static int add_call_destinations(struct
> >  				continue;
> >  
> >  			if (!insn->call_dest) {
> > -				WARN_FUNC("unsupported intra-function call",
> > -					  insn->sec, insn->offset);
> > -				if (retpoline)
> > -					WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
> > +				WARN_FUNC("intra-function call", insn->sec, insn->offset);
> 
> "unsupported intra-function call"?

Well, I think the thinking was that intra-function calls are actually
supported, 'unannotated' perhaps?

> >  				return -1;
> >  			}
> >  
> > @@ -741,6 +748,12 @@ static int add_call_destinations(struct
> >  			}
> >  		} else
> >  			insn->call_dest = rela->sym;
> > +
> > +		/*
> > +		 * Whatever stack impact regular CALLs have, should be
> > +		 * undone by the RETURN of the called function.
> 
>  * Annotated intra-function CALLs are treated as JMPs with a stack_op.
>  * See read_intra_function_calls().
> 
> would make it a bit clearer.

That doesn't work for me; we want to explain why it is OK to delete
stack_ops for regular CALLs. The reason this is OK, is because they're
matched by RETURN.

> > +                */
> > +               remove_insn_ops(insn);
> >         }
> >  
> >         return 0;
> > @@ -1416,6 +1429,57 @@ static int read_instr_hints(struct objto
> >         return 0;
> >  }
> > 
> > +static int read_intra_function_calls(struct objtool_file *file)
> > +{
> > +       struct instruction *insn;
> > +       struct section *sec;
> > +       struct rela *rela;
> > +
> > +       sec = find_section_by_name(file->elf, ".rela.discard.intra_function_calls");
> > +       if (!sec)
> > +               return 0;
> > +
> > +       list_for_each_entry(rela, &sec->rela_list, list) {
> > +               unsigned long dest_off;
> > +
> > +               if (rela->sym->type != STT_SECTION) {
> > +                       WARN("unexpected relocation symbol type in %s",
> > +                            sec->name);
> > +                       return -1;
> > +               }
> > +
> > +               insn = find_insn(file, rela->sym->sec, rela->addend);
> > +               if (!insn) {
> > +                       WARN("bad .discard.intra_function_call entry");
> > +                       return -1;
> > +               }
> > +
> > +               if (insn->type != INSN_CALL) {
> > +                       WARN_FUNC("intra_function_call not a direct call",
> > +                                 insn->sec, insn->offset);
> > +                       return -1;
> > +               }
> > +
> > +               /*
> > +                * Treat intra-function CALLs as JMPs, but with a stack_op.
> > +                * Also see how setup_call_dest() strips stack_ops from normal
> > +                * CALLs.
> 
> /*
>  * Treat annotated intra-function CALLs as JMPs, but with a stack_op.
>  * Also see how add_call_destinations() strips stack_ops from normal
>  * CALLs.
>  */
> 
> ? (note added "annotated" and s/setup_call_dest/add_call_destinations/)

Unannotated intra-function calls are not allowed, so I don't see a
reason to make that distinction, but sure.

> > +                */
> > +               insn->type = INSN_JUMP_UNCONDITIONAL;
> 
> [...]
> 
> > @@ -2245,6 +2313,9 @@ static int validate_branch(struct objtoo
> >  				return 0;
> >  		}
> >  
> > +		if (handle_insn_ops(insn, &state))
> > +			return 1;
> > +
> >  		switch (insn->type) {
> >  
> >  		case INSN_RETURN:
> > @@ -2304,9 +2375,6 @@ static int validate_branch(struct objtoo
> >  			break;
> >  
> >  		case INSN_EXCEPTION_RETURN:
> > -			if (handle_insn_ops(insn, &state))
> > -				return 1;
> > -
> >  			/*
> >  			 * This handles x86's sync_core() case, where we use an
> >  			 * IRET to self. All 'normal' IRET instructions are in
> > @@ -2326,8 +2394,6 @@ static int validate_branch(struct objtoo
> >  			return 0;
> >  
> >  		case INSN_STACK:
> > -			if (handle_insn_ops(insn, &state))
> > -				return 1;
> >  			break;
> 
> So we could get rid of INSN_STACK now, as Julien proposed, couldn't we?
> If I am not missing something, handle_insn_ops() is called unconditionally
> here for all insn types and you remove stack_ops when unneeded.

Yes, INSN_STACK can now go away in favour of NOPs with stack_ops.
Separate patch though.
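
(Illustrative sketch only of what that separate patch could do in the
decoder -- assuming the instruction type simply stays at its default when
no *type is assigned; this is not the actual follow-up patch:)

	case 0x8f:
		/* pop to mem: no INSN_STACK, just the stack_op */
		ADD_OP(op) {
			op->src.type = OP_SRC_POP;
			op->dest.type = OP_DEST_MEM;
		}
		break;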

> We could also go ahead with Julien's proposal to remove the
> INSN_EXCEPTION_RETURN hack and move it to tools/objtool/arch/x86/decode.c.

I don't immediately see how; we don't have a symbol there.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] objtool: Rework allocating stack_ops on decode
  2020-04-23 12:47 ` [PATCH 3/8] objtool: Rework allocating stack_ops on decode Peter Zijlstra
@ 2020-04-23 15:40   ` Alexandre Chartre
  2020-04-23 15:54     ` Peter Zijlstra
  2020-04-23 16:16     ` Peter Zijlstra
  0 siblings, 2 replies; 24+ messages in thread
From: Alexandre Chartre @ 2020-04-23 15:40 UTC (permalink / raw)
  To: Peter Zijlstra, jpoimboe; +Cc: linux-kernel, jthierry, tglx, x86, mbenes


On 4/23/20 2:47 PM, Peter Zijlstra wrote:
> Wrap each stack_op in a macro that allocates and adds it to the list.
> This simplifies trying to figure out what to do with the pre-allocated
> stack_op at the end.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   tools/objtool/arch/x86/decode.c |  257 +++++++++++++++++++++++-----------------
>   1 file changed, 153 insertions(+), 104 deletions(-)
> 
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -77,6 +77,17 @@ unsigned long arch_jump_destination(stru
>   	return insn->offset + insn->len + insn->immediate;
>   }
>   
> +#define PUSH_OP(op) \
> +({ \
> +	list_add_tail(&op->list, ops_list); \
> +	NULL; \
> +})
> +
> +#define ADD_OP(op) \
> +	if (!(op = calloc(1, sizeof(*op)))) \
> +		return -1; \
> +	else for (; op; op = PUSH_OP(op))
> +

I would rather have a function to alloc+add an op instead of weird macros,
for example:

static struct stack_op *add_op(void)
{
        struct stack_op *op;

        op = calloc(1, sizeof(*op));
        if (!op)
                return NULL;
        list_add_tail(&op->list, ops_list);
        return op;
}

Then it requires two more lines when using it but I think the code is much
cleaner and clearer, e.g.:

                         op = add_op();
                         if (!op)
                                 return -1;
                         op->src.type = OP_SRC_ADD;
                         op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
                         op->dest.type = OP_DEST_REG;
                         op->dest.reg = CFI_SP;

alex.

>   int arch_decode_instruction(struct elf *elf, struct section *sec,
>   			    unsigned long offset, unsigned int maxlen,
>   			    unsigned int *len, enum insn_type *type,
> @@ -88,7 +99,7 @@ int arch_decode_instruction(struct elf *
>   	unsigned char op1, op2, rex = 0, rex_b = 0, rex_r = 0, rex_w = 0,
>   		      rex_x = 0, modrm = 0, modrm_mod = 0, modrm_rm = 0,
>   		      modrm_reg = 0, sib = 0;
> -	struct stack_op *op;
> +	struct stack_op *op = NULL;
>   
>   	x86_64 = is_x86_64(elf);
>   	if (x86_64 == -1)
> @@ -129,10 +140,6 @@ int arch_decode_instruction(struct elf *
>   	if (insn.sib.nbytes)
>   		sib = insn.sib.bytes[0];
>   
> -	op = calloc(1, sizeof(*op));
> -	if (!op)
> -		return -1;
> -
>   	switch (op1) {
>   
>   	case 0x1:
> @@ -141,10 +148,12 @@ int arch_decode_instruction(struct elf *
>   
>   			/* add/sub reg, %rsp */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_ADD;
> -			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_ADD;
> +				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
>   		}
>   		break;
>   
> @@ -152,9 +161,11 @@ int arch_decode_instruction(struct elf *
>   
>   		/* push reg */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_REG;
> -		op->src.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
> -		op->dest.type = OP_DEST_PUSH;
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_REG;
> +			op->src.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
> +			op->dest.type = OP_DEST_PUSH;
> +		}
>   
>   		break;
>   
> @@ -162,9 +173,11 @@ int arch_decode_instruction(struct elf *
>   
>   		/* pop reg */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_POP;
> -		op->dest.type = OP_DEST_REG;
> -		op->dest.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_POP;
> +			op->dest.type = OP_DEST_REG;
> +			op->dest.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
> +		}
>   
>   		break;
>   
> @@ -172,8 +185,10 @@ int arch_decode_instruction(struct elf *
>   	case 0x6a:
>   		/* push immediate */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_CONST;
> -		op->dest.type = OP_DEST_PUSH;
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_CONST;
> +			op->dest.type = OP_DEST_PUSH;
> +		}
>   		break;
>   
>   	case 0x70 ... 0x7f:
> @@ -188,11 +203,13 @@ int arch_decode_instruction(struct elf *
>   		if (modrm == 0xe4) {
>   			/* and imm, %rsp */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_AND;
> -			op->src.reg = CFI_SP;
> -			op->src.offset = insn.immediate.value;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_AND;
> +				op->src.reg = CFI_SP;
> +				op->src.offset = insn.immediate.value;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
>   			break;
>   		}
>   
> @@ -205,11 +222,13 @@ int arch_decode_instruction(struct elf *
>   
>   		/* add/sub imm, %rsp */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_ADD;
> -		op->src.reg = CFI_SP;
> -		op->src.offset = insn.immediate.value * sign;
> -		op->dest.type = OP_DEST_REG;
> -		op->dest.reg = CFI_SP;
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_ADD;
> +			op->src.reg = CFI_SP;
> +			op->src.offset = insn.immediate.value * sign;
> +			op->dest.type = OP_DEST_REG;
> +			op->dest.reg = CFI_SP;
> +		}
>   		break;
>   
>   	case 0x89:
> @@ -217,10 +236,12 @@ int arch_decode_instruction(struct elf *
>   
>   			/* mov %rsp, reg */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_REG;
> -			op->src.reg = CFI_SP;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = op_to_cfi_reg[modrm_rm][rex_b];
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_REG;
> +				op->src.reg = CFI_SP;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = op_to_cfi_reg[modrm_rm][rex_b];
> +			}
>   			break;
>   		}
>   
> @@ -228,10 +249,12 @@ int arch_decode_instruction(struct elf *
>   
>   			/* mov reg, %rsp */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_REG;
> -			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_REG;
> +				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
>   			break;
>   		}
>   
> @@ -242,21 +265,25 @@ int arch_decode_instruction(struct elf *
>   
>   			/* mov reg, disp(%rbp) */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_REG;
> -			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> -			op->dest.type = OP_DEST_REG_INDIRECT;
> -			op->dest.reg = CFI_BP;
> -			op->dest.offset = insn.displacement.value;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_REG;
> +				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +				op->dest.type = OP_DEST_REG_INDIRECT;
> +				op->dest.reg = CFI_BP;
> +				op->dest.offset = insn.displacement.value;
> +			}
>   
>   		} else if (rex_w && !rex_b && modrm_rm == 4 && sib == 0x24) {
>   
>   			/* mov reg, disp(%rsp) */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_REG;
> -			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> -			op->dest.type = OP_DEST_REG_INDIRECT;
> -			op->dest.reg = CFI_SP;
> -			op->dest.offset = insn.displacement.value;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_REG;
> +				op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +				op->dest.type = OP_DEST_REG_INDIRECT;
> +				op->dest.reg = CFI_SP;
> +				op->dest.offset = insn.displacement.value;
> +			}
>   		}
>   
>   		break;
> @@ -266,22 +293,26 @@ int arch_decode_instruction(struct elf *
>   
>   			/* mov disp(%rbp), reg */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_REG_INDIRECT;
> -			op->src.reg = CFI_BP;
> -			op->src.offset = insn.displacement.value;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_REG_INDIRECT;
> +				op->src.reg = CFI_BP;
> +				op->src.offset = insn.displacement.value;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +			}
>   
>   		} else if (rex_w && !rex_b && sib == 0x24 &&
>   			   modrm_mod != 3 && modrm_rm == 4) {
>   
>   			/* mov disp(%rsp), reg */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_REG_INDIRECT;
> -			op->src.reg = CFI_SP;
> -			op->src.offset = insn.displacement.value;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_REG_INDIRECT;
> +				op->src.reg = CFI_SP;
> +				op->src.offset = insn.displacement.value;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
> +			}
>   		}
>   
>   		break;
> @@ -290,27 +321,31 @@ int arch_decode_instruction(struct elf *
>   		if (sib == 0x24 && rex_w && !rex_b && !rex_x) {
>   
>   			*type = INSN_STACK;
> -			if (!insn.displacement.value) {
> -				/* lea (%rsp), reg */
> -				op->src.type = OP_SRC_REG;
> -			} else {
> -				/* lea disp(%rsp), reg */
> -				op->src.type = OP_SRC_ADD;
> -				op->src.offset = insn.displacement.value;
> +			ADD_OP(op) {
> +				if (!insn.displacement.value) {
> +					/* lea (%rsp), reg */
> +					op->src.type = OP_SRC_REG;
> +				} else {
> +					/* lea disp(%rsp), reg */
> +					op->src.type = OP_SRC_ADD;
> +					op->src.offset = insn.displacement.value;
> +				}
> +				op->src.reg = CFI_SP;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
>   			}
> -			op->src.reg = CFI_SP;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
>   
>   		} else if (rex == 0x48 && modrm == 0x65) {
>   
>   			/* lea disp(%rbp), %rsp */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_ADD;
> -			op->src.reg = CFI_BP;
> -			op->src.offset = insn.displacement.value;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_ADD;
> +				op->src.reg = CFI_BP;
> +				op->src.offset = insn.displacement.value;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
>   
>   		} else if (rex == 0x49 && modrm == 0x62 &&
>   			   insn.displacement.value == -8) {
> @@ -322,11 +357,13 @@ int arch_decode_instruction(struct elf *
>   			 * stack realignment.
>   			 */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_ADD;
> -			op->src.reg = CFI_R10;
> -			op->src.offset = -8;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_ADD;
> +				op->src.reg = CFI_R10;
> +				op->src.offset = -8;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
>   
>   		} else if (rex == 0x49 && modrm == 0x65 &&
>   			   insn.displacement.value == -16) {
> @@ -338,11 +375,13 @@ int arch_decode_instruction(struct elf *
>   			 * stack realignment.
>   			 */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_ADD;
> -			op->src.reg = CFI_R13;
> -			op->src.offset = -16;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_ADD;
> +				op->src.reg = CFI_R13;
> +				op->src.offset = -16;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
>   		}
>   
>   		break;
> @@ -350,8 +389,10 @@ int arch_decode_instruction(struct elf *
>   	case 0x8f:
>   		/* pop to mem */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_POP;
> -		op->dest.type = OP_DEST_MEM;
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_POP;
> +			op->dest.type = OP_DEST_MEM;
> +		}
>   		break;
>   
>   	case 0x90:
> @@ -361,15 +402,19 @@ int arch_decode_instruction(struct elf *
>   	case 0x9c:
>   		/* pushf */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_CONST;
> -		op->dest.type = OP_DEST_PUSHF;
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_CONST;
> +			op->dest.type = OP_DEST_PUSHF;
> +		}
>   		break;
>   
>   	case 0x9d:
>   		/* popf */
>   		*type = INSN_STACK;
> -		op->src.type = OP_SRC_POPF;
> -		op->dest.type = OP_DEST_MEM;
> +		ADD_OP(op) {
> +			op->src.type = OP_SRC_POPF;
> +			op->dest.type = OP_DEST_MEM;
> +		}
>   		break;
>   
>   	case 0x0f:
> @@ -405,15 +450,19 @@ int arch_decode_instruction(struct elf *
>   
>   			/* push fs/gs */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_CONST;
> -			op->dest.type = OP_DEST_PUSH;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_CONST;
> +				op->dest.type = OP_DEST_PUSH;
> +			}
>   
>   		} else if (op2 == 0xa1 || op2 == 0xa9) {
>   
>   			/* pop fs/gs */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_POP;
> -			op->dest.type = OP_DEST_MEM;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_POP;
> +				op->dest.type = OP_DEST_MEM;
> +			}
>   		}
>   
>   		break;
> @@ -427,7 +476,8 @@ int arch_decode_instruction(struct elf *
>   		 * pop bp
>   		 */
>   		*type = INSN_STACK;
> -		op->dest.type = OP_DEST_LEAVE;
> +		ADD_OP(op)
> +			op->dest.type = OP_DEST_LEAVE;
>   
>   		break;
>   
> @@ -449,12 +499,14 @@ int arch_decode_instruction(struct elf *
>   	case 0xcf: /* iret */
>   		*type = INSN_EXCEPTION_RETURN;
>   
> -		/* add $40, %rsp */
> -		op->src.type = OP_SRC_ADD;
> -		op->src.reg = CFI_SP;
> -		op->src.offset = 5*8;
> -		op->dest.type = OP_DEST_REG;
> -		op->dest.reg = CFI_SP;
> +		ADD_OP(op) {
> +			/* add $40, %rsp */
> +			op->src.type = OP_SRC_ADD;
> +			op->src.reg = CFI_SP;
> +			op->src.offset = 5*8;
> +			op->dest.type = OP_DEST_REG;
> +			op->dest.reg = CFI_SP;
> +		}
>   		break;
>   
>   	case 0xca: /* retf */
> @@ -492,8 +544,10 @@ int arch_decode_instruction(struct elf *
>   
>   			/* push from mem */
>   			*type = INSN_STACK;
> -			op->src.type = OP_SRC_CONST;
> -			op->dest.type = OP_DEST_PUSH;
> +			ADD_OP(op) {
> +				op->src.type = OP_SRC_CONST;
> +				op->dest.type = OP_DEST_PUSH;
> +			}
>   		}
>   
>   		break;
> @@ -504,11 +558,6 @@ int arch_decode_instruction(struct elf *
>   
>   	*immediate = insn.immediate.nbytes ? insn.immediate.value : 0;
>   
> -	if (*type == INSN_STACK || *type == INSN_EXCEPTION_RETURN)
> -		list_add_tail(&op->list, ops_list);
> -	else
> -		free(op);
> -
>   	return 0;
>   }
>   
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
  2020-04-23 12:47 ` [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool Peter Zijlstra
@ 2020-04-23 15:45   ` Alexandre Chartre
  2020-04-24 19:04   ` Josh Poimboeuf
  1 sibling, 0 replies; 24+ messages in thread
From: Alexandre Chartre @ 2020-04-23 15:45 UTC (permalink / raw)
  To: Peter Zijlstra, jpoimboe; +Cc: linux-kernel, jthierry, tglx, x86, mbenes


On 4/23/20 2:47 PM, Peter Zijlstra wrote:
> Change FILL_RETURN_BUFFER and vmexit_fill_RSB() so that objtool groks
> them and can generate correct ORC unwind information for them.
> 
>   - Since ORC is alternative invariant; that is, all alternatives
>     should have the same ORC entries, the __FILL_RETURN_BUFFER body
>     can not be part of an alternative.
> 
>     Therefore, move it out of the alternative and keep the alternative
>     as a sort of jump_label around it.
> 
>   - Use the ANNOTATE_INTRA_FUNCTION_CALL annotation to white-list
>     these 'funny' call instructions to nowhere.
> 
>   - Use UNWIND_HINT_EMPTY to 'fill' the speculation traps, otherwise
>     objtool will consider them unreachable.
> 
>   - Move the RSP adjustment into the loop, such that the loop has a
>     deterministic stack layout.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   arch/x86/include/asm/nospec-branch.h |   16 ++++++++++------
>   1 file changed, 10 insertions(+), 6 deletions(-)


Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>

alex.

> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -4,11 +4,13 @@
>   #define _ASM_X86_NOSPEC_BRANCH_H_
>   
>   #include <linux/static_key.h>
> +#include <linux/frame.h>
>   
>   #include <asm/alternative.h>
>   #include <asm/alternative-asm.h>
>   #include <asm/cpufeatures.h>
>   #include <asm/msr-index.h>
> +#include <asm/unwind_hints.h>
>   
>   /*
>    * This should be used immediately before a retpoline alternative. It tells
> @@ -46,21 +48,25 @@
>   #define __FILL_RETURN_BUFFER(reg, nr, sp)	\
>   	mov	$(nr/2), reg;			\
>   771:						\
> +	ANNOTATE_INTRA_FUNCTION_CALL		\
>   	call	772f;				\
>   773:	/* speculation trap */			\
> +	UNWIND_HINT_EMPTY;			\
>   	pause;					\
>   	lfence;					\
>   	jmp	773b;				\
>   772:						\
> +	ANNOTATE_INTRA_FUNCTION_CALL		\
>   	call	774f;				\
>   775:	/* speculation trap */			\
> +	UNWIND_HINT_EMPTY;			\
>   	pause;					\
>   	lfence;					\
>   	jmp	775b;				\
>   774:						\
> +	add	$(BITS_PER_LONG/8) * 2, sp;	\
>   	dec	reg;				\
> -	jnz	771b;				\
> -	add	$(BITS_PER_LONG/8) * nr, sp;
> +	jnz	771b;
>   
>   #ifdef __ASSEMBLY__
>   
> @@ -137,10 +143,8 @@
>     */
>   .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
>   #ifdef CONFIG_RETPOLINE
> -	ANNOTATE_NOSPEC_ALTERNATIVE
> -	ALTERNATIVE "jmp .Lskip_rsb_\@",				\
> -		__stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP))	\
> -		\ftr
> +	ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
> +	__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)
>   .Lskip_rsb_\@:
>   #endif
>   .endm
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] objtool: Rework allocating stack_ops on decode
  2020-04-23 15:40   ` Alexandre Chartre
@ 2020-04-23 15:54     ` Peter Zijlstra
  2020-04-24  7:06       ` Alexandre Chartre
  2020-04-23 16:16     ` Peter Zijlstra
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 15:54 UTC (permalink / raw)
  To: Alexandre Chartre; +Cc: jpoimboe, linux-kernel, jthierry, tglx, x86, mbenes

On Thu, Apr 23, 2020 at 05:40:38PM +0200, Alexandre Chartre wrote:

> > @@ -77,6 +77,17 @@ unsigned long arch_jump_destination(stru
> >   	return insn->offset + insn->len + insn->immediate;
> >   }
> > +#define PUSH_OP(op) \
> > +({ \
> > +	list_add_tail(&op->list, ops_list); \
> > +	NULL; \
> > +})
> > +
> > +#define ADD_OP(op) \
> > +	if (!(op = calloc(1, sizeof(*op)))) \
> > +		return -1; \
> > +	else for (; op; op = PUSH_OP(op))
> > +
> 
> I would rather have a function to alloc+add an op instead of weird macros,
> for example:
> 
> static struct stack_op *add_op(void)
> {
>         struct stack_op *op;
> 
>         op = calloc(1, sizeof(*op));
>         if (!op)
>                 return NULL;
>         list_add_tail(&op->list, ops_list);
>         return op;
> }
> 
> Then it requires two more lines when using it but I think the code is much
> cleaner and clearer, e.g.:
> 
>                         op = add_op();
>                         if (!op)
>                                 return -1;
>                         op->src.type = OP_SRC_ADD;
>                         op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
>                         op->dest.type = OP_DEST_REG;
>                         op->dest.reg = CFI_SP;

The 'problem' with this is that it doesn't NULL op again, so any later
use will do 'funny' things instead of crashing sensibly. Also, I'm
mighty lazy; I don't like endlessly repeating the same things.
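
(A sketch of the difference, with hypothetical surrounding code: with the
helper function, op still points at the stack_op after the intended
assignments, so a stray later write silently corrupts it; with the macro,
op is NULL once the block ends and the same mistake faults immediately:)

	op = add_op();
	if (!op)
		return -1;
	op->src.type = OP_SRC_POP;
	op->dest.type = OP_DEST_MEM;
	/* ... much later, by accident ... */
	op->dest.reg = CFI_SP;		/* silently rewrites the old op */

versus

	ADD_OP(op) {
		op->src.type = OP_SRC_POP;
		op->dest.type = OP_DEST_MEM;
	}
	/* ... much later, by accident ... */
	op->dest.reg = CFI_SP;		/* op is NULL: crashes sensibly */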

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] objtool: Rework allocating stack_ops on decode
  2020-04-23 15:40   ` Alexandre Chartre
  2020-04-23 15:54     ` Peter Zijlstra
@ 2020-04-23 16:16     ` Peter Zijlstra
  2020-04-24  9:43       ` Miroslav Benes
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-23 16:16 UTC (permalink / raw)
  To: Alexandre Chartre; +Cc: jpoimboe, linux-kernel, jthierry, tglx, x86, mbenes

On Thu, Apr 23, 2020 at 05:40:38PM +0200, Alexandre Chartre wrote:
> 
> On 4/23/20 2:47 PM, Peter Zijlstra wrote:
> > Wrap each stack_op in a macro that allocates and adds it to the list.
> > This simplifies trying to figure out what to do with the pre-allocated
> > stack_op at the end.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >   tools/objtool/arch/x86/decode.c |  257 +++++++++++++++++++++++-----------------
> >   1 file changed, 153 insertions(+), 104 deletions(-)
> > 
> > --- a/tools/objtool/arch/x86/decode.c
> > +++ b/tools/objtool/arch/x86/decode.c
> > @@ -77,6 +77,17 @@ unsigned long arch_jump_destination(stru
> >   	return insn->offset + insn->len + insn->immediate;
> >   }
> > +#define PUSH_OP(op) \
> > +({ \
> > +	list_add_tail(&op->list, ops_list); \
> > +	NULL; \
> > +})
> > +
> > +#define ADD_OP(op) \
> > +	if (!(op = calloc(1, sizeof(*op)))) \
> > +		return -1; \
> > +	else for (; op; op = PUSH_OP(op))
> > +
> 
> I would rather have a function to alloc+add an op instead of weird macros,
> for example:

I know it'll not make you feel (much) better, but we can write it
shorter:

+#define ADD_OP(op) \
+       if (!(op = calloc(1, sizeof(*op)))) \
+               return -1; \
+       else for (list_add_tail(&op->list, ops_list); op; op = NULL)

Also, the kernel is full of funny macros like this ;-)
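
(To spell out what a use site expands to with this shorter version --
purely illustrative:

	ADD_OP(op) {
		op->src.type = OP_SRC_POP;
		op->dest.type = OP_DEST_MEM;
	}

becomes

	if (!(op = calloc(1, sizeof(*op))))
		return -1;
	else
		for (list_add_tail(&op->list, ops_list); op; op = NULL) {
			op->src.type = OP_SRC_POP;
			op->dest.type = OP_DEST_MEM;
		}

so the braced body runs exactly once with the op already on ops_list, and
op is NULL again afterwards.)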

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] objtool: Rework allocating stack_ops on decode
  2020-04-23 15:54     ` Peter Zijlstra
@ 2020-04-24  7:06       ` Alexandre Chartre
  0 siblings, 0 replies; 24+ messages in thread
From: Alexandre Chartre @ 2020-04-24  7:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: jpoimboe, linux-kernel, jthierry, tglx, x86, mbenes


On 4/23/20 5:54 PM, Peter Zijlstra wrote:
> On Thu, Apr 23, 2020 at 05:40:38PM +0200, Alexandre Chartre wrote:
> 
>>> @@ -77,6 +77,17 @@ unsigned long arch_jump_destination(stru
>>>    	return insn->offset + insn->len + insn->immediate;
>>>    }
>>> +#define PUSH_OP(op) \
>>> +({ \
>>> +	list_add_tail(&op->list, ops_list); \
>>> +	NULL; \
>>> +})
>>> +
>>> +#define ADD_OP(op) \
>>> +	if (!(op = calloc(1, sizeof(*op)))) \
>>> +		return -1; \
>>> +	else for (; op; op = PUSH_OP(op))
>>> +
>>
>> I would rather have a function to alloc+add an op instead of weird macros,
>> for example:
>>
>> static struct stack_op *add_op(void)
>> {
>>          struct stack_op *op;
>>
>>          op = calloc(1, sizeof(*op));
>>          if (!op)
>>                  return NULL;
>>          list_add_tail(&op->list, ops_list);
>>          return op;
>> }
>>
>> Then it requires two more lines when using it but I think the code is much
>> cleaner and clearer, e.g.:
>>
>>                          op = add_op();
>>                          if (!op)
>>                                  return -1;
>>                          op->src.type = OP_SRC_ADD;
>>                          op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
>>                          op->dest.type = OP_DEST_REG;
>>                          op->dest.reg = CFI_SP;
> 
> The 'problem' with this is that it doesn't NULL op again, so any later
> use will do 'funny' things instead of crashing sensibly.

Then you can use a local variable:

                 {
                         struct stack_op *op = add_op();
                         if (!op)
                                 return -1;
                         op->src.type = OP_SRC_ADD;
                         op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
                         op->dest.type = OP_DEST_REG;
                         op->dest.reg = CFI_SP;
                 }

> Also, I'm mighty lazy; I don't like endlessly repeating the same things.

Me too; I often try to use macros to avoid repeating the same thing, spend a
lot of time trying fancy macros, and eventually realize that it is usually
not worth it.

So here, we are down to a two line differences:

    ADD_OP(op) {
            ...
    }

vs.

    {
             struct stack_op *op = add_op();
             if (!op)
                     return -1;
             ...
    }

Anyway, I leave it up to you, that's just coding preferences.

In any case:

Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>


Thanks,

alex.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] objtool: Add support for intra-function calls
  2020-04-23 15:22     ` Peter Zijlstra
@ 2020-04-24  9:37       ` Miroslav Benes
  2020-04-24 14:07         ` Miroslav Benes
  0 siblings, 1 reply; 24+ messages in thread
From: Miroslav Benes @ 2020-04-24  9:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jpoimboe, alexandre.chartre, linux-kernel, jthierry, tglx, x86

On Thu, 23 Apr 2020, Peter Zijlstra wrote:

> On Thu, Apr 23, 2020 at 04:34:21PM +0200, Miroslav Benes wrote:
> > >  /*
> > >   * Find the destination instructions for all calls.
> > >   */
> > > @@ -715,10 +725,7 @@ static int add_call_destinations(struct
> > >  				continue;
> > >  
> > >  			if (!insn->call_dest) {
> > > -				WARN_FUNC("unsupported intra-function call",
> > > -					  insn->sec, insn->offset);
> > > -				if (retpoline)
> > > -					WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
> > > +				WARN_FUNC("intra-function call", insn->sec, insn->offset);
> > 
> > "unsupported intra-function call"?
> 
> Well, I think the thinking was that intra-function calls are actually
> supported; 'unannotated', perhaps?

Ok, that would work too. Just keep it consistent with the new description in
tools/objtool/Documentation/stack-validation.txt added by the patch.
 
> > >  				return -1;
> > >  			}
> > >  
> > > @@ -741,6 +748,12 @@ static int add_call_destinations(struct
> > >  			}
> > >  		} else
> > >  			insn->call_dest = rela->sym;
> > > +
> > > +		/*
> > > +		 * Whatever stack impact regular CALLs have, should be
> > > +		 * undone by the RETURN of the called function.
> > 
> >  * Annotated intra-function CALLs are treated as JMPs with a stack_op.
> >  * See read_intra_function_calls().
> > 
> > would make it a bit clearer.
> 
> That doesn't work for me; we want to explain why it is OK to delete
> stack_ops for regular CALLs. The reason this is OK is that they're
> matched by a RETURN.

Yes. I meant to add the paragraph, not to replace it. Sorry about the
confusion. The point is also to explain what "regular" means in this
context.

> > > +                */
> > > +               remove_insn_ops(insn);
> > >         }
> > >  
> > >         return 0;
> > > @@ -1416,6 +1429,57 @@ static int read_instr_hints(struct objto
> > >         return 0;
> > >  }
> > > 
> > > +static int read_intra_function_calls(struct objtool_file *file)
> > > +{
> > > +       struct instruction *insn;
> > > +       struct section *sec;
> > > +       struct rela *rela;
> > > +
> > > +       sec = find_section_by_name(file->elf, ".rela.discard.intra_function_calls");
> > > +       if (!sec)
> > > +               return 0;
> > > +
> > > +       list_for_each_entry(rela, &sec->rela_list, list) {
> > > +               unsigned long dest_off;
> > > +
> > > +               if (rela->sym->type != STT_SECTION) {
> > > +                       WARN("unexpected relocation symbol type in %s",
> > > +                            sec->name);
> > > +                       return -1;
> > > +               }
> > > +
> > > +               insn = find_insn(file, rela->sym->sec, rela->addend);
> > > +               if (!insn) {
> > > +                       WARN("bad .discard.intra_function_call entry");
> > > +                       return -1;
> > > +               }
> > > +
> > > +               if (insn->type != INSN_CALL) {
> > > +                       WARN_FUNC("intra_function_call not a direct call",
> > > +                                 insn->sec, insn->offset);
> > > +                       return -1;
> > > +               }
> > > +
> > > +               /*
> > > +                * Treat intra-function CALLs as JMPs, but with a stack_op.
> > > +                * Also see how setup_call_dest() strips stack_ops from normal
> > > +                * CALLs.
> > 
> > /*
> >  * Treat annotated intra-function CALLs as JMPs, but with a stack_op.
> >  * Also see how add_call_destinations() strips stack_ops from normal
> >  * CALLs.
> >  */
> > 
> > ? (note added "annotated" and s/setup_call_dest/add_call_destinations/)
> 
> Unannotated intra-function calls are not allowed, so I don't see a
> reason to make that distinction, but sure.

Then it would be better to say something like

/*
 * Treat intra-function CALLs as JMPs, but with a stack_op.
 * See add_call_destinations() for reference which also strips 
 * stack_ops from normal CALLs.
 */

But in the end it is up to you for sure.

> > > +                */
> > > +               insn->type = INSN_JUMP_UNCONDITIONAL;
> > 
> > [...]
> > 
> > > @@ -2245,6 +2313,9 @@ static int validate_branch(struct objtoo
> > >  				return 0;
> > >  		}
> > >  
> > > +		if (handle_insn_ops(insn, &state))
> > > +			return 1;
> > > +
> > >  		switch (insn->type) {
> > >  
> > >  		case INSN_RETURN:
> > > @@ -2304,9 +2375,6 @@ static int validate_branch(struct objtoo
> > >  			break;
> > >  
> > >  		case INSN_EXCEPTION_RETURN:
> > > -			if (handle_insn_ops(insn, &state))
> > > -				return 1;
> > > -
> > >  			/*
> > >  			 * This handles x86's sync_core() case, where we use an
> > >  			 * IRET to self. All 'normal' IRET instructions are in
> > > @@ -2326,8 +2394,6 @@ static int validate_branch(struct objtoo
> > >  			return 0;
> > >  
> > >  		case INSN_STACK:
> > > -			if (handle_insn_ops(insn, &state))
> > > -				return 1;
> > >  			break;
> > 
> > So we could get rid of INSN_STACK now, as Julien proposed, couldn't we?
> > If I am not missing something, handle_insn_ops() is called unconditionally
> > here for all insn types and you remove stack_ops when unneeded.
> 
> Yes, INSN_STACK can now go away in favour of NOPs with stack_ops.
> Separate patch though.
> 
> > We could also go ahead with Julien's proposal to remove the
> > INSN_EXCEPTION_RETURN hack and move it to tools/objtool/arch/x86/decode.c.
> 
> I don't immediately see how; we don't have a symbol there.

You can call find_symbol_by_offset() to get it, no? All the information 
should be available.

That is, if by "symbol" you mean the symbol containing the iret.

Quoting Julien:
"And the other suggestion is my other email was that you don't even need to add
INSN_EXCEPTION_RETURN. You can keep IRET as INSN_CONTEXT_SWITCH by default and
x86 decoder lookups the symbol conaining an iret. If it's a function symbol, it
can just set the type to INSN_OTHER so that it caries on to the next
instruction after having handled the stack_op."

So something like this (it is incomplete, does not compile, and may be
completely wrong, so sorry for wasting your time in that case):

---

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 6340ea0dd527..be6520155cfd 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -100,6 +100,7 @@ int arch_decode_instruction(const struct elf *elf, const struct section *sec,
 		      rex_x = 0, modrm = 0, modrm_mod = 0, modrm_rm = 0,
 		      modrm_reg = 0, sib = 0;
 	struct stack_op *op = NULL;
+	struct symbol *sym;
 
 	x86_64 = is_x86_64(elf);
 	if (x86_64 == -1)
@@ -496,22 +497,24 @@ int arch_decode_instruction(const struct elf *elf, const struct section *sec,
 		*type = INSN_RETURN;
 		break;
 
-	case 0xcf: /* iret */
-		*type = INSN_EXCEPTION_RETURN;
-
-		ADD_OP(op) {
-			/* add $40, %rsp */
-			op->src.type = OP_SRC_ADD;
-			op->src.reg = CFI_SP;
-			op->src.offset = 5*8;
-			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_SP;
-		}
-		break;
-
 	case 0xca: /* retf */
 	case 0xcb: /* retf */
+	case 0xcf: /* iret */
 		*type = INSN_CONTEXT_SWITCH;
+
+		sym = find_symbol_by_offset(sec, offset);
+		if (sym && sym->type == STT_FUNC) {
+			*type = INSN_OTHER;
+
+			ADD_OP(op) {
+				/* add $40, %rsp */
+				op->src.type = OP_SRC_ADD;
+				op->src.reg = CFI_SP;
+				op->src.offset = 5*8;
+				op->dest.type = OP_DEST_REG;
+				op->dest.reg = CFI_SP;
+			}
+		}
 		break;
 
 	case 0xe8:
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 802dba19a161..a5eedf5e9813 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2358,17 +2358,6 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
 
 			break;
 
-		case INSN_EXCEPTION_RETURN:
-			/*
-			 * This handles x86's sync_core() case, where we use an
-			 * IRET to self. All 'normal' IRET instructions are in
-			 * STT_NOTYPE entry symbols.
-			 */
-			if (func)
-				break;
-
-			return 0;
-
 		case INSN_CONTEXT_SWITCH:
 			if (func && (!next_insn || !next_insn->hint)) {
 				WARN_FUNC("unsupported instruction in callable function",

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] objtool: Rework allocating stack_ops on decode
  2020-04-23 16:16     ` Peter Zijlstra
@ 2020-04-24  9:43       ` Miroslav Benes
  0 siblings, 0 replies; 24+ messages in thread
From: Miroslav Benes @ 2020-04-24  9:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexandre Chartre, jpoimboe, linux-kernel, jthierry, tglx, x86

On Thu, 23 Apr 2020, Peter Zijlstra wrote:

> On Thu, Apr 23, 2020 at 05:40:38PM +0200, Alexandre Chartre wrote:
> > 
> > On 4/23/20 2:47 PM, Peter Zijlstra wrote:
> > > Wrap each stack_op in a macro that allocates and adds it to the list.
> > > This simplifies trying to figure out what to do with the pre-allocated
> > > stack_op at the end.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >   tools/objtool/arch/x86/decode.c |  257 +++++++++++++++++++++++-----------------
> > >   1 file changed, 153 insertions(+), 104 deletions(-)
> > > 
> > > --- a/tools/objtool/arch/x86/decode.c
> > > +++ b/tools/objtool/arch/x86/decode.c
> > > @@ -77,6 +77,17 @@ unsigned long arch_jump_destination(stru
> > >   	return insn->offset + insn->len + insn->immediate;
> > >   }
> > > +#define PUSH_OP(op) \
> > > +({ \
> > > +	list_add_tail(&op->list, ops_list); \
> > > +	NULL; \
> > > +})
> > > +
> > > +#define ADD_OP(op) \
> > > +	if (!(op = calloc(1, sizeof(*op)))) \
> > > +		return -1; \
> > > +	else for (; op; op = PUSH_OP(op))
> > > +
> > 
> > I would rather have a function to alloc+add an op instead of weird macros,
> > for example:
> 
> I know it'll not make you feel (much) better, but we can write it
> shorter:
> 
> +#define ADD_OP(op) \
> +       if (!(op = calloc(1, sizeof(*op)))) \
> +               return -1; \
> +       else for (list_add_tail(&op->list, ops_list); op; op = NULL)

FWIW, I like this version the best.

Miroslav

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] objtool: Add support for intra-function calls
  2020-04-24  9:37       ` Miroslav Benes
@ 2020-04-24 14:07         ` Miroslav Benes
  0 siblings, 0 replies; 24+ messages in thread
From: Miroslav Benes @ 2020-04-24 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jpoimboe, alexandre.chartre, linux-kernel, jthierry, tglx, x86

> ---
> 
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 6340ea0dd527..be6520155cfd 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -100,6 +100,7 @@ int arch_decode_instruction(const struct elf *elf, const struct section *sec,
>  		      rex_x = 0, modrm = 0, modrm_mod = 0, modrm_rm = 0,
>  		      modrm_reg = 0, sib = 0;
>  	struct stack_op *op = NULL;
> +	struct symbol *sym;
>  
>  	x86_64 = is_x86_64(elf);
>  	if (x86_64 == -1)
> @@ -496,22 +497,24 @@ int arch_decode_instruction(const struct elf *elf, const struct section *sec,
>  		*type = INSN_RETURN;
>  		break;
>  
> -	case 0xcf: /* iret */
> -		*type = INSN_EXCEPTION_RETURN;
> -
> -		ADD_OP(op) {
> -			/* add $40, %rsp */
> -			op->src.type = OP_SRC_ADD;
> -			op->src.reg = CFI_SP;
> -			op->src.offset = 5*8;
> -			op->dest.type = OP_DEST_REG;
> -			op->dest.reg = CFI_SP;
> -		}
> -		break;
> -
>  	case 0xca: /* retf */
>  	case 0xcb: /* retf */
> +	case 0xcf: /* iret */
>  		*type = INSN_CONTEXT_SWITCH;
> +
> +		sym = find_symbol_by_offset(sec, offset);

sym = find_symbol_containing(sec, offset);

of course

> +		if (sym && sym->type == STT_FUNC) {
> +			*type = INSN_OTHER;
> +
> +			ADD_OP(op) {
> +				/* add $40, %rsp */
> +				op->src.type = OP_SRC_ADD;
> +				op->src.reg = CFI_SP;
> +				op->src.offset = 5*8;
> +				op->dest.type = OP_DEST_REG;
> +				op->dest.reg = CFI_SP;
> +			}
> +		}
>  		break;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
  2020-04-23 12:47 ` [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool Peter Zijlstra
  2020-04-23 15:45   ` Alexandre Chartre
@ 2020-04-24 19:04   ` Josh Poimboeuf
  1 sibling, 0 replies; 24+ messages in thread
From: Josh Poimboeuf @ 2020-04-24 19:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexandre.chartre, linux-kernel, jthierry, tglx, x86, mbenes

On Thu, Apr 23, 2020 at 02:47:22PM +0200, Peter Zijlstra wrote:
>  #define __FILL_RETURN_BUFFER(reg, nr, sp)	\
>  	mov	$(nr/2), reg;			\
>  771:						\
> +	ANNOTATE_INTRA_FUNCTION_CALL		\
>  	call	772f;				\
>  773:	/* speculation trap */			\
> +	UNWIND_HINT_EMPTY;			\
>  	pause;					\
>  	lfence;					\
>  	jmp	773b;				\
>  772:						\
> +	ANNOTATE_INTRA_FUNCTION_CALL		\
>  	call	774f;				\
>  775:	/* speculation trap */			\
> +	UNWIND_HINT_EMPTY;			\
>  	pause;					\
>  	lfence;					\
>  	jmp	775b;				\
>  774:						\
> +	add	$(BITS_PER_LONG/8) * 2, sp;	\
>  	dec	reg;				\
> -	jnz	771b;				\
> -	add	$(BITS_PER_LONG/8) * nr, sp;
> +	jnz	771b;

Looks weird having semicolons for one annotation but not the other...

>  
>  #ifdef __ASSEMBLY__
>  
> @@ -137,10 +143,8 @@
>    */
>  .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
>  #ifdef CONFIG_RETPOLINE
> -	ANNOTATE_NOSPEC_ALTERNATIVE
> -	ALTERNATIVE "jmp .Lskip_rsb_\@",				\
> -		__stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP))	\
> -		\ftr
> +	ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
> +	__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)

I almost gave my "you can't change the stack in an alternative" lecture,
then I did a double take :-)

We do still need a patch to prevent other code from doing that though.

-- 
Josh


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] x86: Simplify retpoline declaration
  2020-04-23 12:47 ` [PATCH 6/8] x86: Simplify retpoline declaration Peter Zijlstra
@ 2020-04-24 19:09   ` Josh Poimboeuf
  0 siblings, 0 replies; 24+ messages in thread
From: Josh Poimboeuf @ 2020-04-24 19:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexandre.chartre, linux-kernel, jthierry, tglx, x86, mbenes, arnd

On Thu, Apr 23, 2020 at 02:47:23PM +0200, Peter Zijlstra wrote:
> +#undef GEN
> +#define GEN(reg) EXPORT_THUNK(reg)
> +#include <asm/GEN-for-each-reg.h>
> +

Applying: x86: Simplify retpoline declaration
.git/rebase-apply/patch:112: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

-- 
Josh


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] x86/retpoline: Fix retpoline unwind
  2020-04-23 12:47 ` [PATCH 8/8] x86/retpoline: Fix retpoline unwind Peter Zijlstra
  2020-04-23 12:59   ` Peter Zijlstra
@ 2020-04-24 19:30   ` Josh Poimboeuf
  2020-04-28 18:30     ` Peter Zijlstra
  1 sibling, 1 reply; 24+ messages in thread
From: Josh Poimboeuf @ 2020-04-24 19:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: alexandre.chartre, linux-kernel, jthierry, tglx, x86, mbenes

On Thu, Apr 23, 2020 at 02:47:25PM +0200, Peter Zijlstra wrote:
>  .macro CALL_NOSPEC reg:req
>  #ifdef CONFIG_RETPOLINE
> -	ANNOTATE_NOSPEC_ALTERNATIVE
> -	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg),\
> -		__stringify(RETPOLINE_CALL %\reg), X86_FEATURE_RETPOLINE,\
> -		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
> +	/*
> +	 * This cannot be ALTERNATIVE_2 like with JMP_NOSPEC, because ORC
> +	 * unwind data is alternative invariant and needs stack modifying
> +	 * instructions to be in the same place for all alternatives.
> +	 *
> +	 * IOW the CALL instruction must be at the same offset for all cases.
> +	 */
> +	ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
> +	ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
> +		    __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE

I'm missing why ALTERNATIVE_2 wouldn't work here.  How is the call a
"stack modifying instruction"?  It's not an intra-function call so it
shouldn't affect ORC at all, right?

>  # define CALL_NOSPEC						\
> -	ANNOTATE_NOSPEC_ALTERNATIVE				\
> -	ALTERNATIVE_2(						\
> -	ANNOTATE_RETPOLINE_SAFE					\
> -	"call *%[thunk_target]\n",				\
> -	"call __x86_indirect_thunk_%V[thunk_target]\n",		\
> -	X86_FEATURE_RETPOLINE,					\
> -	"lfence;\n"						\
> -	ANNOTATE_RETPOLINE_SAFE					\
> -	"call *%[thunk_target]\n",				\
> -	X86_FEATURE_RETPOLINE_AMD)
> +	ALTERNATIVE("", "lfence", X86_FEATURE_RETPOLINE_AMD)	\
> +	ALTERNATIVE(ANNOTATE_RETPOLINE_SAFE			\
> +		    "call *%[thunk_target]\n",			\
> +		    "call __x86_indirect_thunk_%V[thunk_target]\n", \
> +		    X86_FEATURE_RETPOLINE)
> +

Same.

-- 
Josh


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] x86/retpoline: Fix retpoline unwind
  2020-04-24 19:30   ` Josh Poimboeuf
@ 2020-04-28 18:30     ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-04-28 18:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: alexandre.chartre, linux-kernel, jthierry, tglx, x86, mbenes

On Fri, Apr 24, 2020 at 02:30:28PM -0500, Josh Poimboeuf wrote:
> On Thu, Apr 23, 2020 at 02:47:25PM +0200, Peter Zijlstra wrote:
> >  .macro CALL_NOSPEC reg:req
> >  #ifdef CONFIG_RETPOLINE
> > -	ANNOTATE_NOSPEC_ALTERNATIVE
> > -	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg),\
> > -		__stringify(RETPOLINE_CALL %\reg), X86_FEATURE_RETPOLINE,\
> > -		__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
> > +	/*
> > +	 * This cannot be ALTERNATIVE_2 like with JMP_NOSPEC, because ORC
> > +	 * unwind data is alternative invariant and needs stack modifying
> > +	 * instructions to be in the same place for all alternatives.
> > +	 *
> > +	 * IOW the CALL instruction must be at the same offset for all cases.
> > +	 */
> > +	ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
> > +	ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
> > +		    __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE
> 
> I'm missing why ALTERNATIVE_2 wouldn't work here.  How is the call a
> "stack modifying instruction"?  It's not an intra-function call so it
> shouldn't affect ORC at all, right?

My bad. I find it hard not to consider CALL a stack modification. Let me
fix that.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-04-28 18:31 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-23 12:47 [PATCH 0/8] objtool vs retpoline Peter Zijlstra
2020-04-23 12:47 ` [PATCH 1/8] objtool: is_fentry_call() crashes if call has no destination Peter Zijlstra
2020-04-23 12:47 ` [PATCH 2/8] objtool: UNWIND_HINT_RET_OFFSET should not check registers Peter Zijlstra
2020-04-23 12:47 ` [PATCH 3/8] objtool: Rework allocating stack_ops on decode Peter Zijlstra
2020-04-23 15:40   ` Alexandre Chartre
2020-04-23 15:54     ` Peter Zijlstra
2020-04-24  7:06       ` Alexandre Chartre
2020-04-23 16:16     ` Peter Zijlstra
2020-04-24  9:43       ` Miroslav Benes
2020-04-23 12:47 ` [PATCH 4/8] objtool: Add support for intra-function calls Peter Zijlstra
2020-04-23 14:34   ` Miroslav Benes
2020-04-23 15:22     ` Peter Zijlstra
2020-04-24  9:37       ` Miroslav Benes
2020-04-24 14:07         ` Miroslav Benes
2020-04-23 12:47 ` [PATCH 5/8] x86/speculation: Change FILL_RETURN_BUFFER to work with objtool Peter Zijlstra
2020-04-23 15:45   ` Alexandre Chartre
2020-04-24 19:04   ` Josh Poimboeuf
2020-04-23 12:47 ` [PATCH 6/8] x86: Simplify retpoline declaration Peter Zijlstra
2020-04-24 19:09   ` Josh Poimboeuf
2020-04-23 12:47 ` [PATCH 7/8] x86: Change {JMP,CALL}_NOSPEC argument Peter Zijlstra
2020-04-23 12:47 ` [PATCH 8/8] x86/retpoline: Fix retpoline unwind Peter Zijlstra
2020-04-23 12:59   ` Peter Zijlstra
2020-04-24 19:30   ` Josh Poimboeuf
2020-04-28 18:30     ` Peter Zijlstra
