linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more)
@ 2019-10-18  7:35 Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 01/16] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
                   ` (16 more replies)
  0 siblings, 17 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Ftrace was one of the last W^X violators (and KLP it seems). These here patches
move it over to the generic text_poke() interface and thereby get rid of this
oddity.

The first 6 or so patches are more or less the same as in v3, except it has the
bugs fixed that Steve found:

 - boot time function tracing works
 - module loading with function tracing works

Then there's 10 new patches, that go all over the place, mostly inspired by
staring at code touched by the first 6. That is, there's further ftrace and
kprobes cleanups, as well fixes for various issues.

In the end, it will have removed the horrible set_all_modules_text_*()
interface and reduced the ftrace module loading to a single callback (again).

The ARM patch is compiled only, I would be much obliged if someone could test
that.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 01/16] x86/alternatives: Teach text_poke_bp() to emulate instructions
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 02/16] x86/alternatives: Update int3_emulate_push() comment Peter Zijlstra
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

In preparation for static_call and variable size jump_label support,
teach text_poke_bp() to emulate instructions, namely:

  JMP32, JMP8, CALL, NOP2, NOP_ATOMIC5, INT3

The current text_poke_bp() takes a @handler argument which is used as
a jump target when the temporary INT3 is hit by a different CPU.

When patching CALL instructions, this doesn't work because we'd miss
the PUSH of the return address. Instead, teach poke_int3_handler() to
emulate an instruction, typically the instruction we're patching in.

This fits almost all text_poke_bp() users, except
arch_unoptimize_kprobe() which restores random text, and for that site
we have to build an explicit emulate instruction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/text-patching.h |   24 ++++--
 arch/x86/kernel/alternative.c        |  132 ++++++++++++++++++++++++++---------
 arch/x86/kernel/jump_label.c         |    9 --
 arch/x86/kernel/kprobes/opt.c        |   11 ++
 4 files changed, 130 insertions(+), 46 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -26,10 +26,11 @@ static inline void apply_paravirt(struct
 #define POKE_MAX_OPCODE_SIZE	5
 
 struct text_poke_loc {
-	void *detour;
 	void *addr;
-	size_t len;
-	const char opcode[POKE_MAX_OPCODE_SIZE];
+	int len;
+	s32 rel32;
+	u8 opcode;
+	const u8 text[POKE_MAX_OPCODE_SIZE];
 };
 
 extern void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -51,8 +52,10 @@ extern void text_poke_early(void *addr,
 extern void *text_poke(void *addr, const void *opcode, size_t len);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
-extern void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
+extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
 extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries);
+extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
+			       const void *opcode, size_t len, const void *emulate);
 extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
 extern __ro_after_init unsigned long poking_addr;
@@ -63,8 +66,17 @@ static inline void int3_emulate_jmp(stru
 	regs->ip = ip;
 }
 
-#define INT3_INSN_SIZE 1
-#define CALL_INSN_SIZE 5
+#define INT3_INSN_SIZE		1
+#define INT3_INSN_OPCODE	0xCC
+
+#define CALL_INSN_SIZE		5
+#define CALL_INSN_OPCODE	0xE8
+
+#define JMP32_INSN_SIZE		5
+#define JMP32_INSN_OPCODE	0xE9
+
+#define JMP8_INSN_SIZE		2
+#define JMP8_INSN_OPCODE	0xEB
 
 static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
 {
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -956,16 +956,15 @@ NOKPROBE_SYMBOL(patch_cmp);
 int poke_int3_handler(struct pt_regs *regs)
 {
 	struct text_poke_loc *tp;
-	unsigned char int3 = 0xcc;
 	void *ip;
 
 	/*
 	 * Having observed our INT3 instruction, we now must observe
 	 * bp_patching.nr_entries.
 	 *
-	 * 	nr_entries != 0			INT3
-	 * 	WMB				RMB
-	 * 	write INT3			if (nr_entries)
+	 *	nr_entries != 0			INT3
+	 *	WMB				RMB
+	 *	write INT3			if (nr_entries)
 	 *
 	 * Idem for other elements in bp_patching.
 	 */
@@ -978,9 +977,9 @@ int poke_int3_handler(struct pt_regs *re
 		return 0;
 
 	/*
-	 * Discount the sizeof(int3). See text_poke_bp_batch().
+	 * Discount the INT3. See text_poke_bp_batch().
 	 */
-	ip = (void *) regs->ip - sizeof(int3);
+	ip = (void *) regs->ip - INT3_INSN_SIZE;
 
 	/*
 	 * Skip the binary search if there is a single member in the vector.
@@ -997,8 +996,28 @@ int poke_int3_handler(struct pt_regs *re
 			return 0;
 	}
 
-	/* set up the specified breakpoint detour */
-	regs->ip = (unsigned long) tp->detour;
+	ip += tp->len;
+
+	switch (tp->opcode) {
+	case INT3_INSN_OPCODE:
+		/*
+		 * Someone poked an explicit INT3, they'll want to handle it,
+		 * do not consume.
+		 */
+		return 0;
+
+	case CALL_INSN_OPCODE:
+		int3_emulate_call(regs, (long)ip + tp->rel32);
+		break;
+
+	case JMP32_INSN_OPCODE:
+	case JMP8_INSN_OPCODE:
+		int3_emulate_jmp(regs, (long)ip + tp->rel32);
+		break;
+
+	default:
+		BUG();
+	}
 
 	return 1;
 }
@@ -1014,7 +1033,7 @@ NOKPROBE_SYMBOL(poke_int3_handler);
  * synchronization using int3 breakpoint.
  *
  * The way it is done:
- * 	- For each entry in the vector:
+ *	- For each entry in the vector:
  *		- add a int3 trap to the address that will be patched
  *	- sync cores
  *	- For each entry in the vector:
@@ -1027,9 +1046,9 @@ NOKPROBE_SYMBOL(poke_int3_handler);
  */
 void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
 {
-	int patched_all_but_first = 0;
-	unsigned char int3 = 0xcc;
+	unsigned char int3 = INT3_INSN_OPCODE;
 	unsigned int i;
+	int do_sync;
 
 	lockdep_assert_held(&text_mutex);
 
@@ -1053,16 +1072,16 @@ void text_poke_bp_batch(struct text_poke
 	/*
 	 * Second step: update all but the first byte of the patched range.
 	 */
-	for (i = 0; i < nr_entries; i++) {
+	for (do_sync = 0, i = 0; i < nr_entries; i++) {
 		if (tp[i].len - sizeof(int3) > 0) {
 			text_poke((char *)tp[i].addr + sizeof(int3),
-				  (const char *)tp[i].opcode + sizeof(int3),
+				  (const char *)tp[i].text + sizeof(int3),
 				  tp[i].len - sizeof(int3));
-			patched_all_but_first++;
+			do_sync++;
 		}
 	}
 
-	if (patched_all_but_first) {
+	if (do_sync) {
 		/*
 		 * According to Intel, this core syncing is very likely
 		 * not necessary and we'd be safe even without it. But
@@ -1075,10 +1094,17 @@ void text_poke_bp_batch(struct text_poke
 	 * Third step: replace the first byte (int3) by the first byte of
 	 * replacing opcode.
 	 */
-	for (i = 0; i < nr_entries; i++)
-		text_poke(tp[i].addr, tp[i].opcode, sizeof(int3));
+	for (do_sync = 0, i = 0; i < nr_entries; i++) {
+		if (tp[i].text[0] == INT3_INSN_OPCODE)
+			continue;
+
+		text_poke(tp[i].addr, tp[i].text, sizeof(int3));
+		do_sync++;
+	}
+
+	if (do_sync)
+		on_each_cpu(do_sync_core, NULL, 1);
 
-	on_each_cpu(do_sync_core, NULL, 1);
 	/*
 	 * sync_core() implies an smp_mb() and orders this store against
 	 * the writing of the new instruction.
@@ -1087,6 +1113,60 @@ void text_poke_bp_batch(struct text_poke
 	bp_patching.nr_entries = 0;
 }
 
+void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
+			const void *opcode, size_t len, const void *emulate)
+{
+	struct insn insn;
+
+	if (!opcode)
+		opcode = (void *)tp->text;
+	else
+		memcpy((void *)tp->text, opcode, len);
+
+	if (!emulate)
+		emulate = opcode;
+
+	kernel_insn_init(&insn, emulate, MAX_INSN_SIZE);
+	insn_get_length(&insn);
+
+	BUG_ON(!insn_complete(&insn));
+	BUG_ON(len != insn.length);
+
+	tp->addr = addr;
+	tp->len = len;
+	tp->opcode = insn.opcode.bytes[0];
+
+	switch (tp->opcode) {
+	case INT3_INSN_OPCODE:
+		break;
+
+	case CALL_INSN_OPCODE:
+	case JMP32_INSN_OPCODE:
+	case JMP8_INSN_OPCODE:
+		tp->rel32 = insn.immediate.value;
+		break;
+
+	default: /* assume NOP */
+		switch (len) {
+		case 2: /* NOP2 -- emulate as JMP8+0 */
+			BUG_ON(memcmp(emulate, ideal_nops[len], len));
+			tp->opcode = JMP8_INSN_OPCODE;
+			tp->rel32 = 0;
+			break;
+
+		case 5: /* NOP5 -- emulate as JMP32+0 */
+			BUG_ON(memcmp(emulate, ideal_nops[NOP_ATOMIC5], len));
+			tp->opcode = JMP32_INSN_OPCODE;
+			tp->rel32 = 0;
+			break;
+
+		default: /* unknown instruction */
+			BUG();
+		}
+		break;
+	}
+}
+
 /**
  * text_poke_bp() -- update instructions on live kernel on SMP
  * @addr:	address to patch
@@ -1098,20 +1178,10 @@ void text_poke_bp_batch(struct text_poke
  * dynamically allocated memory. This function should be used when it is
  * not possible to allocate memory.
  */
-void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
 {
-	struct text_poke_loc tp = {
-		.detour = handler,
-		.addr = addr,
-		.len = len,
-	};
-
-	if (len > POKE_MAX_OPCODE_SIZE) {
-		WARN_ONCE(1, "len is larger than %d\n", POKE_MAX_OPCODE_SIZE);
-		return;
-	}
-
-	memcpy((void *)tp.opcode, opcode, len);
+	struct text_poke_loc tp;
 
+	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -89,8 +89,7 @@ static void __ref __jump_label_transform
 		return;
 	}
 
-	text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE,
-		     (void *)jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
+	text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE, NULL);
 }
 
 void arch_jump_label_transform(struct jump_entry *entry,
@@ -147,11 +146,9 @@ bool arch_jump_label_transform_queue(str
 	}
 
 	__jump_label_set_jump_code(entry, type,
-				   (union jump_code_union *) &tp->opcode, 0);
+				   (union jump_code_union *)&tp->text, 0);
 
-	tp->addr = entry_code;
-	tp->detour = entry_code + JUMP_LABEL_NOP_SIZE;
-	tp->len = JUMP_LABEL_NOP_SIZE;
+	text_poke_loc_init(tp, entry_code, NULL, JUMP_LABEL_NOP_SIZE, NULL);
 
 	tp_vec_nr++;
 
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -437,8 +437,7 @@ void arch_optimize_kprobes(struct list_h
 		insn_buff[0] = RELATIVEJUMP_OPCODE;
 		*(s32 *)(&insn_buff[1]) = rel;
 
-		text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
-			     op->optinsn.insn);
+		text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE, NULL);
 
 		list_del_init(&op->list);
 	}
@@ -448,12 +447,18 @@ void arch_optimize_kprobes(struct list_h
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
 	u8 insn_buff[RELATIVEJUMP_SIZE];
+	u8 emulate_buff[RELATIVEJUMP_SIZE];
 
 	/* Set int3 to first byte for kprobes */
 	insn_buff[0] = BREAKPOINT_INSTRUCTION;
 	memcpy(insn_buff + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
+
+	emulate_buff[0] = RELATIVEJUMP_OPCODE;
+	*(s32 *)(&emulate_buff[1]) = (s32)((long)op->optinsn.insn -
+			((long)op->kp.addr + RELATIVEJUMP_SIZE));
+
 	text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
-		     op->optinsn.insn);
+		     emulate_buff);
 }
 
 /*



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 02/16] x86/alternatives: Update int3_emulate_push() comment
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 01/16] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface Peter Zijlstra
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Update the comment now that we've merged x86_32 support.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -85,6 +85,9 @@ static inline void int3_emulate_push(str
 	 * stack where the break point happened, and the saving of
 	 * pt_regs. We can extend the original stack because of
 	 * this gap. See the idtentry macro's create_gap option.
+	 *
+	 * Similarly entry_32.S will have a gap on the stack for (any) hardware
+	 * exception and pt_regs; see FIXUP_FRAME.
 	 */
 	regs->sp -= sizeof(unsigned long);
 	*(unsigned long *)regs->sp = val;



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 01/16] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 02/16] x86/alternatives: Update int3_emulate_push() comment Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-21  8:48   ` Ingo Molnar
  2019-10-18  7:35 ` [PATCH v4 04/16] x86/alternatives: Add and use text_gen_insn() helper Peter Zijlstra
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Adding another text_poke_bp_batch() user made me realize the interface
is all sorts of wrong. The text poke vector should be internal to the
implementation.

This then results in a trivial interface:

  text_poke_queue()  - which has the 'normal' text_poke_bp() interface
  text_poke_finish() - which takes no arguments and flushes any
                       pending text_poke()s.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/text-patching.h |   15 +----
 arch/x86/kernel/alternative.c        |   64 ++++++++++++++++++++--
 arch/x86/kernel/jump_label.c         |   99 ++++++++++++-----------------------
 3 files changed, 96 insertions(+), 82 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -25,14 +25,6 @@ static inline void apply_paravirt(struct
  */
 #define POKE_MAX_OPCODE_SIZE	5
 
-struct text_poke_loc {
-	void *addr;
-	int len;
-	s32 rel32;
-	u8 opcode;
-	const u8 text[POKE_MAX_OPCODE_SIZE];
-};
-
 extern void text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
@@ -53,9 +45,10 @@ extern void *text_poke(void *addr, const
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
-extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries);
-extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
-			       const void *opcode, size_t len, const void *emulate);
+
+extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
+extern void text_poke_finish(void);
+
 extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
 extern __ro_after_init unsigned long poking_addr;
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -936,6 +936,14 @@ static void do_sync_core(void *info)
 	sync_core();
 }
 
+struct text_poke_loc {
+	void *addr;
+	int len;
+	s32 rel32;
+	u8 opcode;
+	const u8 text[POKE_MAX_OPCODE_SIZE];
+};
+
 static struct bp_patching_desc {
 	struct text_poke_loc *vec;
 	int nr_entries;
@@ -1023,6 +1031,10 @@ int poke_int3_handler(struct pt_regs *re
 }
 NOKPROBE_SYMBOL(poke_int3_handler);
 
+#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
+static struct text_poke_loc tp_vec[TP_VEC_MAX];
+static int tp_vec_nr;
+
 /**
  * text_poke_bp_batch() -- update instructions on live kernel on SMP
  * @tp:			vector of instructions to patch
@@ -1044,7 +1056,7 @@ NOKPROBE_SYMBOL(poke_int3_handler);
  *		  replacing opcode
  *	- sync cores
  */
-void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
+static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
 {
 	unsigned char int3 = INT3_INSN_OPCODE;
 	unsigned int i;
@@ -1118,11 +1130,7 @@ void text_poke_loc_init(struct text_poke
 {
 	struct insn insn;
 
-	if (!opcode)
-		opcode = (void *)tp->text;
-	else
-		memcpy((void *)tp->text, opcode, len);
-
+	memcpy((void *)tp->text, opcode, len);
 	if (!emulate)
 		emulate = opcode;
 
@@ -1167,6 +1175,50 @@ void text_poke_loc_init(struct text_poke
 	}
 }
 
+/*
+ * We hard rely on the tp_vec being ordered; ensure this is so by flushing
+ * early if needed.
+ */
+static bool tp_order_fail(void *addr)
+{
+	struct text_poke_loc *tp;
+
+	if (!tp_vec_nr)
+		return false;
+
+	if (!addr) /* force */
+		return true;
+
+	tp = &tp_vec[tp_vec_nr - 1];
+	if ((unsigned long)tp->addr > (unsigned long)addr)
+		return true;
+
+	return false;
+}
+
+static void text_poke_flush(void *addr)
+{
+	if (tp_vec_nr == TP_VEC_MAX || tp_order_fail(addr)) {
+		text_poke_bp_batch(tp_vec, tp_vec_nr);
+		tp_vec_nr = 0;
+	}
+}
+
+void text_poke_finish(void)
+{
+	text_poke_flush(NULL);
+}
+
+void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate)
+{
+	struct text_poke_loc *tp;
+
+	text_poke_flush(addr);
+
+	tp = &tp_vec[tp_vec_nr++];
+	text_poke_loc_init(tp, addr, opcode, len, emulate);
+}
+
 /**
  * text_poke_bp() -- update instructions on live kernel on SMP
  * @addr:	address to patch
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -35,18 +35,19 @@ static void bug_at(unsigned char *ip, in
 	BUG();
 }
 
-static void __jump_label_set_jump_code(struct jump_entry *entry,
-				       enum jump_label_type type,
-				       union jump_code_union *code,
-				       int init)
+static const void *
+__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type, int init)
 {
+	static union jump_code_union code; /* relies on text_mutex */
 	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 	const void *expect;
 	int line;
 
-	code->jump = 0xe9;
-	code->offset = jump_entry_target(entry) -
+	lockdep_assert_held(&text_mutex);
+
+	code.jump = JMP32_INSN_OPCODE;
+	code.offset = jump_entry_target(entry) -
 		       (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
 
 	if (init) {
@@ -54,23 +55,23 @@ static void __jump_label_set_jump_code(s
 	} else if (type == JUMP_LABEL_JMP) {
 		expect = ideal_nop; line = __LINE__;
 	} else {
-		expect = code->code; line = __LINE__;
+		expect = code.code; line = __LINE__;
 	}
 
 	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
 		bug_at((void *)jump_entry_code(entry), line);
 
 	if (type == JUMP_LABEL_NOP)
-		memcpy(code, ideal_nop, JUMP_LABEL_NOP_SIZE);
+		memcpy(&code, ideal_nop, JUMP_LABEL_NOP_SIZE);
+
+	return &code;
 }
 
-static void __ref __jump_label_transform(struct jump_entry *entry,
-					 enum jump_label_type type,
-					 int init)
+static void inline __jump_label_transform(struct jump_entry *entry,
+					  enum jump_label_type type,
+					  int init)
 {
-	union jump_code_union code;
-
-	__jump_label_set_jump_code(entry, type, &code, init);
+	const void *opcode = __jump_label_set_jump_code(entry, type, init);
 
 	/*
 	 * As long as only a single processor is running and the code is still
@@ -84,31 +85,33 @@ static void __ref __jump_label_transform
 	 * always nop being the 'currently valid' instruction
 	 */
 	if (init || system_state == SYSTEM_BOOTING) {
-		text_poke_early((void *)jump_entry_code(entry), &code,
+		text_poke_early((void *)jump_entry_code(entry), opcode,
 				JUMP_LABEL_NOP_SIZE);
 		return;
 	}
 
-	text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE, NULL);
+	text_poke_bp((void *)jump_entry_code(entry), opcode, JUMP_LABEL_NOP_SIZE, NULL);
 }
 
-void arch_jump_label_transform(struct jump_entry *entry,
-			       enum jump_label_type type)
+static void __ref jump_label_transform(struct jump_entry *entry,
+				       enum jump_label_type type,
+				       int init)
 {
 	mutex_lock(&text_mutex);
-	__jump_label_transform(entry, type, 0);
+	__jump_label_transform(entry, type, init);
 	mutex_unlock(&text_mutex);
 }
 
-#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
-static struct text_poke_loc tp_vec[TP_VEC_MAX];
-static int tp_vec_nr;
+void arch_jump_label_transform(struct jump_entry *entry,
+			       enum jump_label_type type)
+{
+	jump_label_transform(entry, type, 0);
+}
 
 bool arch_jump_label_transform_queue(struct jump_entry *entry,
 				     enum jump_label_type type)
 {
-	struct text_poke_loc *tp;
-	void *entry_code;
+	const void *opcode;
 
 	if (system_state == SYSTEM_BOOTING) {
 		/*
@@ -118,53 +121,19 @@ bool arch_jump_label_transform_queue(str
 		return true;
 	}
 
-	/*
-	 * No more space in the vector, tell upper layer to apply
-	 * the queue before continuing.
-	 */
-	if (tp_vec_nr == TP_VEC_MAX)
-		return false;
-
-	tp = &tp_vec[tp_vec_nr];
-
-	entry_code = (void *)jump_entry_code(entry);
-
-	/*
-	 * The INT3 handler will do a bsearch in the queue, so we need entries
-	 * to be sorted. We can survive an unsorted list by rejecting the entry,
-	 * forcing the generic jump_label code to apply the queue. Warning once,
-	 * to raise the attention to the case of an unsorted entry that is
-	 * better not happen, because, in the worst case we will perform in the
-	 * same way as we do without batching - with some more overhead.
-	 */
-	if (tp_vec_nr > 0) {
-		int prev = tp_vec_nr - 1;
-		struct text_poke_loc *prev_tp = &tp_vec[prev];
-
-		if (WARN_ON_ONCE(prev_tp->addr > entry_code))
-			return false;
-	}
-
-	__jump_label_set_jump_code(entry, type,
-				   (union jump_code_union *)&tp->text, 0);
-
-	text_poke_loc_init(tp, entry_code, NULL, JUMP_LABEL_NOP_SIZE, NULL);
-
-	tp_vec_nr++;
-
+	mutex_lock(&text_mutex);
+	opcode = __jump_label_set_jump_code(entry, type, 0);
+	text_poke_queue((void *)jump_entry_code(entry),
+			opcode, JUMP_LABEL_NOP_SIZE, NULL);
+	mutex_unlock(&text_mutex);
 	return true;
 }
 
 void arch_jump_label_transform_apply(void)
 {
-	if (!tp_vec_nr)
-		return;
-
 	mutex_lock(&text_mutex);
-	text_poke_bp_batch(tp_vec, tp_vec_nr);
+	text_poke_finish();
 	mutex_unlock(&text_mutex);
-
-	tp_vec_nr = 0;
 }
 
 static enum {
@@ -193,5 +162,5 @@ __init_or_module void arch_jump_label_tr
 			jlstate = JL_STATE_NO_UPDATE;
 	}
 	if (jlstate == JL_STATE_UPDATE)
-		__jump_label_transform(entry, type, 1);
+		jump_label_transform(entry, type, 1);
 }



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 04/16] x86/alternatives: Add and use text_gen_insn() helper
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (2 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 05/16] x86/ftrace: Use text_poke() Peter Zijlstra
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Provide a simple helper function to create common instruction
encodings.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/include/asm/text-patching.h |    2 +
 arch/x86/kernel/alternative.c        |   36 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/jump_label.c         |   31 ++++++++++--------------------
 arch/x86/kernel/kprobes/opt.c        |    7 ------
 4 files changed, 50 insertions(+), 26 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -49,6 +49,8 @@ extern void text_poke_bp(void *addr, con
 extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
 extern void text_poke_finish(void);
 
+extern void *text_gen_insn(u8 opcode, const void *addr, const void *dest);
+
 extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
 extern __ro_after_init unsigned long poking_addr;
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1237,3 +1237,39 @@ void text_poke_bp(void *addr, const void
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
+
+union text_poke_insn {
+	u8 text[POKE_MAX_OPCODE_SIZE];
+	struct {
+		u8 opcode;
+		s32 disp;
+	} __attribute__((packed));
+};
+
+void *text_gen_insn(u8 opcode, const void *addr, const void *dest)
+{
+	static union text_poke_insn insn; /* text_mutex */
+	int size = 0;
+
+	lockdep_assert_held(&text_mutex);
+
+	insn.opcode = opcode;
+
+#define __CASE(insn)	\
+	case insn##_INSN_OPCODE: size = insn##_INSN_SIZE; break
+
+	switch(opcode) {
+	__CASE(INT3);
+	__CASE(CALL);
+	__CASE(JMP32);
+	__CASE(JMP8);
+	}
+
+	if (size > 1) {
+		insn.disp = (long)dest - (long)(addr + size);
+		if (size == 2)
+			BUG_ON((insn.disp >> 31) != (insn.disp >> 7));
+	}
+
+	return &insn.text;
+}
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -16,15 +16,7 @@
 #include <asm/alternative.h>
 #include <asm/text-patching.h>
 
-union jump_code_union {
-	char code[JUMP_LABEL_NOP_SIZE];
-	struct {
-		char jump;
-		int offset;
-	} __attribute__((packed));
-};
-
-static void bug_at(unsigned char *ip, int line)
+static void bug_at(const void *ip, int line)
 {
 	/*
 	 * The location is not an op that we were expecting.
@@ -38,33 +30,32 @@ static void bug_at(unsigned char *ip, in
 static const void *
 __jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type, int init)
 {
-	static union jump_code_union code; /* relies on text_mutex */
 	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
-	const void *expect;
+	const void *expect, *code;
+	const void *addr, *dest;
 	int line;
 
-	lockdep_assert_held(&text_mutex);
+	addr = (void *)jump_entry_code(entry);
+	dest = (void *)jump_entry_target(entry);
 
-	code.jump = JMP32_INSN_OPCODE;
-	code.offset = jump_entry_target(entry) -
-		       (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
+	code = text_gen_insn(JMP32_INSN_OPCODE, addr, dest);
 
 	if (init) {
 		expect = default_nop; line = __LINE__;
 	} else if (type == JUMP_LABEL_JMP) {
 		expect = ideal_nop; line = __LINE__;
 	} else {
-		expect = code.code; line = __LINE__;
+		expect = code; line = __LINE__;
 	}
 
-	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
-		bug_at((void *)jump_entry_code(entry), line);
+	if (memcmp(addr, expect, JUMP_LABEL_NOP_SIZE))
+		bug_at(addr, line);
 
 	if (type == JUMP_LABEL_NOP)
-		memcpy(&code, ideal_nop, JUMP_LABEL_NOP_SIZE);
+		code = ideal_nop;
 
-	return &code;
+	return code;
 }
 
 static void inline __jump_label_transform(struct jump_entry *entry,
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -447,18 +447,13 @@ void arch_optimize_kprobes(struct list_h
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
 	u8 insn_buff[RELATIVEJUMP_SIZE];
-	u8 emulate_buff[RELATIVEJUMP_SIZE];
 
 	/* Set int3 to first byte for kprobes */
 	insn_buff[0] = BREAKPOINT_INSTRUCTION;
 	memcpy(insn_buff + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
 
-	emulate_buff[0] = RELATIVEJUMP_OPCODE;
-	*(s32 *)(&emulate_buff[1]) = (s32)((long)op->optinsn.insn -
-			((long)op->kp.addr + RELATIVEJUMP_SIZE));
-
 	text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
-		     emulate_buff);
+		     text_gen_insn(JMP32_INSN_OPCODE, op->kp.addr, op->optinsn.insn));
 }
 
 /*



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 05/16] x86/ftrace: Use text_poke()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (3 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 04/16] x86/alternatives: Add and use text_gen_insn() helper Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 06/16] x86/mm: Remove set_kernel_text_r[ow]() Peter Zijlstra
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Move ftrace over to using the generic x86 text_poke functions; this
avoids having a second/different copy of that code around.

This also avoids ftrace violating the (new) W^X rule and avoids
fragmenting the kernel text page-tables, due to no longer having to
toggle them RW.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
---
 arch/x86/include/asm/ftrace.h |    2 
 arch/x86/kernel/alternative.c |   18 -
 arch/x86/kernel/ftrace.c      |  666 +++++++-----------------------------------
 arch/x86/kernel/traps.c       |    9 
 4 files changed, 133 insertions(+), 562 deletions(-)

--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -34,8 +34,6 @@ struct dyn_arch_ftrace {
 	/* No extra data needed for x86 */
 };
 
-int ftrace_int3_handler(struct pt_regs *regs);
-
 #define FTRACE_GRAPH_TRAMP_ADDR FTRACE_GRAPH_ADDR
 
 #endif /*  CONFIG_DYNAMIC_FTRACE */
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -949,7 +949,7 @@ static struct bp_patching_desc {
 	int nr_entries;
 } bp_patching;
 
-static int patch_cmp(const void *key, const void *elt)
+static int notrace patch_cmp(const void *key, const void *elt)
 {
 	struct text_poke_loc *tp = (struct text_poke_loc *) elt;
 
@@ -961,7 +961,7 @@ static int patch_cmp(const void *key, co
 }
 NOKPROBE_SYMBOL(patch_cmp);
 
-int poke_int3_handler(struct pt_regs *regs)
+int notrace poke_int3_handler(struct pt_regs *regs)
 {
 	struct text_poke_loc *tp;
 	void *ip;
@@ -1209,10 +1209,15 @@ void text_poke_finish(void)
 	text_poke_flush(NULL);
 }
 
-void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate)
+void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate)
 {
 	struct text_poke_loc *tp;
 
+	if (unlikely(system_state == SYSTEM_BOOTING)) {
+		text_poke_early(addr, opcode, len);
+		return;
+	}
+
 	text_poke_flush(addr);
 
 	tp = &tp_vec[tp_vec_nr++];
@@ -1230,10 +1235,15 @@ void text_poke_queue(void *addr, const v
  * dynamically allocated memory. This function should be used when it is
  * not possible to allocate memory.
  */
-void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
+void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
 {
 	struct text_poke_loc tp;
 
+	if (unlikely(system_state == SYSTEM_BOOTING)) {
+		text_poke_early(addr, opcode, len);
+		return;
+	}
+
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -34,6 +34,8 @@
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
+static int ftrace_poke_late = 0;
+
 int ftrace_arch_code_modify_prepare(void)
     __acquires(&text_mutex)
 {
@@ -43,16 +45,15 @@ int ftrace_arch_code_modify_prepare(void
 	 * ftrace has it set to "read/write".
 	 */
 	mutex_lock(&text_mutex);
-	set_kernel_text_rw();
-	set_all_modules_text_rw();
+	ftrace_poke_late = 1;
 	return 0;
 }
 
 int ftrace_arch_code_modify_post_process(void)
     __releases(&text_mutex)
 {
-	set_all_modules_text_ro();
-	set_kernel_text_ro();
+	text_poke_finish();
+	ftrace_poke_late = 0;
 	mutex_unlock(&text_mutex);
 	return 0;
 }
@@ -60,67 +61,34 @@ int ftrace_arch_code_modify_post_process
 union ftrace_code_union {
 	char code[MCOUNT_INSN_SIZE];
 	struct {
-		unsigned char op;
+		char op;
 		int offset;
 	} __attribute__((packed));
 };
 
-static int ftrace_calc_offset(long ip, long addr)
-{
-	return (int)(addr - ip);
-}
-
-static unsigned char *
-ftrace_text_replace(unsigned char op, unsigned long ip, unsigned long addr)
+static const char *ftrace_text_replace(char op, unsigned long ip, unsigned long addr)
 {
 	static union ftrace_code_union calc;
 
-	calc.op		= op;
-	calc.offset	= ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
+	calc.op = op;
+	calc.offset = (int)(addr - (ip + MCOUNT_INSN_SIZE));
 
 	return calc.code;
 }
 
-static unsigned char *
-ftrace_call_replace(unsigned long ip, unsigned long addr)
+static const char *ftrace_nop_replace(void)
 {
-	return ftrace_text_replace(0xe8, ip, addr);
-}
-
-static inline int
-within(unsigned long addr, unsigned long start, unsigned long end)
-{
-	return addr >= start && addr < end;
-}
-
-static unsigned long text_ip_addr(unsigned long ip)
-{
-	/*
-	 * On x86_64, kernel text mappings are mapped read-only, so we use
-	 * the kernel identity mapping instead of the kernel text mapping
-	 * to modify the kernel text.
-	 *
-	 * For 32bit kernels, these mappings are same and we can use
-	 * kernel identity mapping to modify code.
-	 */
-	if (within(ip, (unsigned long)_text, (unsigned long)_etext))
-		ip = (unsigned long)__va(__pa_symbol(ip));
-
-	return ip;
+	return ideal_nops[NOP_ATOMIC5];
 }
 
-static const unsigned char *ftrace_nop_replace(void)
+static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 {
-	return ideal_nops[NOP_ATOMIC5];
+	return ftrace_text_replace(CALL_INSN_OPCODE, ip, addr);
 }
 
-static int
-ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
-		   unsigned const char *new_code)
+static int ftrace_verify_code(unsigned long ip, const char *old_code)
 {
-	unsigned char replaced[MCOUNT_INSN_SIZE];
-
-	ftrace_expected = old_code;
+	char cur_code[MCOUNT_INSN_SIZE];
 
 	/*
 	 * Note:
@@ -129,31 +97,41 @@ ftrace_modify_code_direct(unsigned long
 	 * Carefully read and modify the code with probe_kernel_*(), and make
 	 * sure what we read is what we expected it to be before modifying it.
 	 */
-
 	/* read the text we want to modify */
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (probe_kernel_read(cur_code, (void *)ip, MCOUNT_INSN_SIZE)) {
+		WARN_ON(1);
 		return -EFAULT;
+	}
 
 	/* Make sure it is what we expect it to be */
-	if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0)
+	if (memcmp(cur_code, old_code, MCOUNT_INSN_SIZE) != 0) {
+		WARN_ON(1);
 		return -EINVAL;
+	}
 
-	ip = text_ip_addr(ip);
-
-	/* replace the text with the new text */
-	if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE))
-		return -EPERM;
+	return 0;
+}
 
-	sync_core();
+static int
+ftrace_modify_code_direct(unsigned long ip, const char *old_code,
+			  const char *new_code)
+{
+	int ret = ftrace_verify_code(ip, old_code);
+	if (ret)
+		return ret;
 
+	/* replace the text with the new text */
+	if (ftrace_poke_late)
+		text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
+	else
+		text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
 	return 0;
 }
 
-int ftrace_make_nop(struct module *mod,
-		    struct dyn_ftrace *rec, unsigned long addr)
+int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
 {
-	unsigned const char *new, *old;
 	unsigned long ip = rec->ip;
+	const char *new, *old;
 
 	old = ftrace_call_replace(ip, addr);
 	new = ftrace_nop_replace();
@@ -167,19 +145,20 @@ int ftrace_make_nop(struct module *mod,
 	 * just modify the code directly.
 	 */
 	if (addr == MCOUNT_ADDR)
-		return ftrace_modify_code_direct(rec->ip, old, new);
+		return ftrace_modify_code_direct(ip, old, new);
 
-	ftrace_expected = NULL;
-
-	/* Normal cases use add_brk_on_nop */
+	/*
+	 * x86 overrides ftrace_replace_code -- this function will never be used
+	 * in this case.
+	 */
 	WARN_ONCE(1, "invalid use of ftrace_make_nop");
 	return -EINVAL;
 }
 
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
-	unsigned const char *new, *old;
 	unsigned long ip = rec->ip;
+	const char *new, *old;
 
 	old = ftrace_nop_replace();
 	new = ftrace_call_replace(ip, addr);
@@ -189,43 +168,6 @@ int ftrace_make_call(struct dyn_ftrace *
 }
 
 /*
- * The modifying_ftrace_code is used to tell the breakpoint
- * handler to call ftrace_int3_handler(). If it fails to
- * call this handler for a breakpoint added by ftrace, then
- * the kernel may crash.
- *
- * As atomic_writes on x86 do not need a barrier, we do not
- * need to add smp_mb()s for this to work. It is also considered
- * that we can not read the modifying_ftrace_code before
- * executing the breakpoint. That would be quite remarkable if
- * it could do that. Here's the flow that is required:
- *
- *   CPU-0                          CPU-1
- *
- * atomic_inc(mfc);
- * write int3s
- *				<trap-int3> // implicit (r)mb
- *				if (atomic_read(mfc))
- *					call ftrace_int3_handler()
- *
- * Then when we are finished:
- *
- * atomic_dec(mfc);
- *
- * If we hit a breakpoint that was not set by ftrace, it does not
- * matter if ftrace_int3_handler() is called or not. It will
- * simply be ignored. But it is crucial that a ftrace nop/caller
- * breakpoint is handled. No other user should ever place a
- * breakpoint on an ftrace nop/caller location. It must only
- * be done by this code.
- */
-atomic_t modifying_ftrace_code __read_mostly;
-
-static int
-ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
-		   unsigned const char *new_code);
-
-/*
  * Should never be called:
  *  As it is only called by __ftrace_replace_code() which is called by
  *  ftrace_replace_code() that x86 overrides, and by ftrace_update_code()
@@ -237,452 +179,84 @@ int ftrace_modify_call(struct dyn_ftrace
 				 unsigned long addr)
 {
 	WARN_ON(1);
-	ftrace_expected = NULL;
 	return -EINVAL;
 }
 
-static unsigned long ftrace_update_func;
-static unsigned long ftrace_update_func_call;
-
-static int update_ftrace_func(unsigned long ip, void *new)
-{
-	unsigned char old[MCOUNT_INSN_SIZE];
-	int ret;
-
-	memcpy(old, (void *)ip, MCOUNT_INSN_SIZE);
-
-	ftrace_update_func = ip;
-	/* Make sure the breakpoints see the ftrace_update_func update */
-	smp_wmb();
-
-	/* See comment above by declaration of modifying_ftrace_code */
-	atomic_inc(&modifying_ftrace_code);
-
-	ret = ftrace_modify_code(ip, old, new);
-
-	atomic_dec(&modifying_ftrace_code);
-
-	return ret;
-}
-
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
-	unsigned long ip = (unsigned long)(&ftrace_call);
-	unsigned char *new;
-	int ret;
-
-	ftrace_update_func_call = (unsigned long)func;
-
-	new = ftrace_call_replace(ip, (unsigned long)func);
-	ret = update_ftrace_func(ip, new);
-
-	/* Also update the regs callback function */
-	if (!ret) {
-		ip = (unsigned long)(&ftrace_regs_call);
-		new = ftrace_call_replace(ip, (unsigned long)func);
-		ret = update_ftrace_func(ip, new);
-	}
-
-	return ret;
-}
-
-static nokprobe_inline int is_ftrace_caller(unsigned long ip)
-{
-	if (ip == ftrace_update_func)
-		return 1;
-
-	return 0;
-}
-
-/*
- * A breakpoint was added to the code address we are about to
- * modify, and this is the handle that will just skip over it.
- * We are either changing a nop into a trace call, or a trace
- * call to a nop. While the change is taking place, we treat
- * it just like it was a nop.
- */
-int ftrace_int3_handler(struct pt_regs *regs)
-{
 	unsigned long ip;
+	const char *new;
 
-	if (WARN_ON_ONCE(!regs))
-		return 0;
-
-	ip = regs->ip - INT3_INSN_SIZE;
-
-	if (ftrace_location(ip)) {
-		int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);
-		return 1;
-	} else if (is_ftrace_caller(ip)) {
-		if (!ftrace_update_func_call) {
-			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
-			return 1;
-		}
-		int3_emulate_call(regs, ftrace_update_func_call);
-		return 1;
-	}
-
-	return 0;
-}
-NOKPROBE_SYMBOL(ftrace_int3_handler);
-
-static int ftrace_write(unsigned long ip, const char *val, int size)
-{
-	ip = text_ip_addr(ip);
-
-	if (probe_kernel_write((void *)ip, val, size))
-		return -EPERM;
-
-	return 0;
-}
-
-static int add_break(unsigned long ip, const char *old)
-{
-	unsigned char replaced[MCOUNT_INSN_SIZE];
-	unsigned char brk = BREAKPOINT_INSTRUCTION;
-
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
-		return -EFAULT;
-
-	ftrace_expected = old;
-
-	/* Make sure it is what we expect it to be */
-	if (memcmp(replaced, old, MCOUNT_INSN_SIZE) != 0)
-		return -EINVAL;
-
-	return ftrace_write(ip, &brk, 1);
-}
-
-static int add_brk_on_call(struct dyn_ftrace *rec, unsigned long addr)
-{
-	unsigned const char *old;
-	unsigned long ip = rec->ip;
-
-	old = ftrace_call_replace(ip, addr);
-
-	return add_break(rec->ip, old);
-}
-
-
-static int add_brk_on_nop(struct dyn_ftrace *rec)
-{
-	unsigned const char *old;
-
-	old = ftrace_nop_replace();
-
-	return add_break(rec->ip, old);
-}
-
-static int add_breakpoints(struct dyn_ftrace *rec, bool enable)
-{
-	unsigned long ftrace_addr;
-	int ret;
-
-	ftrace_addr = ftrace_get_addr_curr(rec);
-
-	ret = ftrace_test_record(rec, enable);
-
-	switch (ret) {
-	case FTRACE_UPDATE_IGNORE:
-		return 0;
-
-	case FTRACE_UPDATE_MAKE_CALL:
-		/* converting nop to call */
-		return add_brk_on_nop(rec);
-
-	case FTRACE_UPDATE_MODIFY_CALL:
-	case FTRACE_UPDATE_MAKE_NOP:
-		/* converting a call to a nop */
-		return add_brk_on_call(rec, ftrace_addr);
-	}
-	return 0;
-}
-
-/*
- * On error, we need to remove breakpoints. This needs to
- * be done caefully. If the address does not currently have a
- * breakpoint, we know we are done. Otherwise, we look at the
- * remaining 4 bytes of the instruction. If it matches a nop
- * we replace the breakpoint with the nop. Otherwise we replace
- * it with the call instruction.
- */
-static int remove_breakpoint(struct dyn_ftrace *rec)
-{
-	unsigned char ins[MCOUNT_INSN_SIZE];
-	unsigned char brk = BREAKPOINT_INSTRUCTION;
-	const unsigned char *nop;
-	unsigned long ftrace_addr;
-	unsigned long ip = rec->ip;
-
-	/* If we fail the read, just give up */
-	if (probe_kernel_read(ins, (void *)ip, MCOUNT_INSN_SIZE))
-		return -EFAULT;
-
-	/* If this does not have a breakpoint, we are done */
-	if (ins[0] != brk)
-		return 0;
-
-	nop = ftrace_nop_replace();
-
-	/*
-	 * If the last 4 bytes of the instruction do not match
-	 * a nop, then we assume that this is a call to ftrace_addr.
-	 */
-	if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) != 0) {
-		/*
-		 * For extra paranoidism, we check if the breakpoint is on
-		 * a call that would actually jump to the ftrace_addr.
-		 * If not, don't touch the breakpoint, we make just create
-		 * a disaster.
-		 */
-		ftrace_addr = ftrace_get_addr_new(rec);
-		nop = ftrace_call_replace(ip, ftrace_addr);
-
-		if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) == 0)
-			goto update;
-
-		/* Check both ftrace_addr and ftrace_old_addr */
-		ftrace_addr = ftrace_get_addr_curr(rec);
-		nop = ftrace_call_replace(ip, ftrace_addr);
-
-		ftrace_expected = nop;
-
-		if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) != 0)
-			return -EINVAL;
-	}
-
- update:
-	return ftrace_write(ip, nop, 1);
-}
-
-static int add_update_code(unsigned long ip, unsigned const char *new)
-{
-	/* skip breakpoint */
-	ip++;
-	new++;
-	return ftrace_write(ip, new, MCOUNT_INSN_SIZE - 1);
-}
-
-static int add_update_call(struct dyn_ftrace *rec, unsigned long addr)
-{
-	unsigned long ip = rec->ip;
-	unsigned const char *new;
-
-	new = ftrace_call_replace(ip, addr);
-	return add_update_code(ip, new);
-}
-
-static int add_update_nop(struct dyn_ftrace *rec)
-{
-	unsigned long ip = rec->ip;
-	unsigned const char *new;
-
-	new = ftrace_nop_replace();
-	return add_update_code(ip, new);
-}
-
-static int add_update(struct dyn_ftrace *rec, bool enable)
-{
-	unsigned long ftrace_addr;
-	int ret;
-
-	ret = ftrace_test_record(rec, enable);
-
-	ftrace_addr  = ftrace_get_addr_new(rec);
-
-	switch (ret) {
-	case FTRACE_UPDATE_IGNORE:
-		return 0;
-
-	case FTRACE_UPDATE_MODIFY_CALL:
-	case FTRACE_UPDATE_MAKE_CALL:
-		/* converting nop to call */
-		return add_update_call(rec, ftrace_addr);
-
-	case FTRACE_UPDATE_MAKE_NOP:
-		/* converting a call to a nop */
-		return add_update_nop(rec);
-	}
-
-	return 0;
-}
-
-static int finish_update_call(struct dyn_ftrace *rec, unsigned long addr)
-{
-	unsigned long ip = rec->ip;
-	unsigned const char *new;
-
-	new = ftrace_call_replace(ip, addr);
-
-	return ftrace_write(ip, new, 1);
-}
-
-static int finish_update_nop(struct dyn_ftrace *rec)
-{
-	unsigned long ip = rec->ip;
-	unsigned const char *new;
-
-	new = ftrace_nop_replace();
-
-	return ftrace_write(ip, new, 1);
-}
-
-static int finish_update(struct dyn_ftrace *rec, bool enable)
-{
-	unsigned long ftrace_addr;
-	int ret;
-
-	ret = ftrace_update_record(rec, enable);
-
-	ftrace_addr = ftrace_get_addr_new(rec);
-
-	switch (ret) {
-	case FTRACE_UPDATE_IGNORE:
-		return 0;
+	ip = (unsigned long)(&ftrace_call);
+	new = ftrace_call_replace(ip, (unsigned long)func);
+	text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL);
 
-	case FTRACE_UPDATE_MODIFY_CALL:
-	case FTRACE_UPDATE_MAKE_CALL:
-		/* converting nop to call */
-		return finish_update_call(rec, ftrace_addr);
-
-	case FTRACE_UPDATE_MAKE_NOP:
-		/* converting a call to a nop */
-		return finish_update_nop(rec);
-	}
+	ip = (unsigned long)(&ftrace_regs_call);
+	new = ftrace_call_replace(ip, (unsigned long)func);
+	text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL);
 
 	return 0;
 }
 
-static void do_sync_core(void *data)
-{
-	sync_core();
-}
-
-static void run_sync(void)
-{
-	int enable_irqs;
-
-	/* No need to sync if there's only one CPU */
-	if (num_online_cpus() == 1)
-		return;
-
-	enable_irqs = irqs_disabled();
-
-	/* We may be called with interrupts disabled (on bootup). */
-	if (enable_irqs)
-		local_irq_enable();
-	on_each_cpu(do_sync_core, NULL, 1);
-	if (enable_irqs)
-		local_irq_disable();
-}
-
 void ftrace_replace_code(int enable)
 {
 	struct ftrace_rec_iter *iter;
 	struct dyn_ftrace *rec;
-	const char *report = "adding breakpoints";
-	int count = 0;
+	const char *new, *old;
 	int ret;
 
 	for_ftrace_rec_iter(iter) {
 		rec = ftrace_rec_iter_record(iter);
 
-		ret = add_breakpoints(rec, enable);
-		if (ret)
-			goto remove_breakpoints;
-		count++;
-	}
-
-	run_sync();
-
-	report = "updating code";
-	count = 0;
-
-	for_ftrace_rec_iter(iter) {
-		rec = ftrace_rec_iter_record(iter);
+		switch (ftrace_test_record(rec, enable)) {
+		case FTRACE_UPDATE_IGNORE:
+		default:
+			continue;
+
+		case FTRACE_UPDATE_MAKE_CALL:
+			old = ftrace_nop_replace();
+			break;
+
+		case FTRACE_UPDATE_MODIFY_CALL:
+		case FTRACE_UPDATE_MAKE_NOP:
+			old = ftrace_call_replace(rec->ip, ftrace_get_addr_curr(rec));
+			break;
+		}
 
-		ret = add_update(rec, enable);
-		if (ret)
-			goto remove_breakpoints;
-		count++;
+		ret = ftrace_verify_code(rec->ip, old);
+		if (ret) {
+			ftrace_bug(ret, rec);
+			return;
+		}
 	}
 
-	run_sync();
-
-	report = "removing breakpoints";
-	count = 0;
-
 	for_ftrace_rec_iter(iter) {
 		rec = ftrace_rec_iter_record(iter);
 
-		ret = finish_update(rec, enable);
-		if (ret)
-			goto remove_breakpoints;
-		count++;
-	}
-
-	run_sync();
-
-	return;
+		switch (ftrace_test_record(rec, enable)) {
+		case FTRACE_UPDATE_IGNORE:
+		default:
+			continue;
+
+		case FTRACE_UPDATE_MAKE_CALL:
+		case FTRACE_UPDATE_MODIFY_CALL:
+			new = ftrace_call_replace(rec->ip, ftrace_get_addr_new(rec));
+			break;
+
+		case FTRACE_UPDATE_MAKE_NOP:
+			new = ftrace_nop_replace();
+			break;
+		}
 
- remove_breakpoints:
-	pr_warn("Failed on %s (%d):\n", report, count);
-	ftrace_bug(ret, rec);
-	for_ftrace_rec_iter(iter) {
-		rec = ftrace_rec_iter_record(iter);
-		/*
-		 * Breakpoints are handled only when this function is in
-		 * progress. The system could not work with them.
-		 */
-		if (remove_breakpoint(rec))
-			BUG();
+		text_poke_queue((void *)rec->ip, new, MCOUNT_INSN_SIZE, NULL);
+		ftrace_update_record(rec, enable);
 	}
-	run_sync();
-}
-
-static int
-ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
-		   unsigned const char *new_code)
-{
-	int ret;
-
-	ret = add_break(ip, old_code);
-	if (ret)
-		goto out;
-
-	run_sync();
-
-	ret = add_update_code(ip, new_code);
-	if (ret)
-		goto fail_update;
-
-	run_sync();
-
-	ret = ftrace_write(ip, new_code, 1);
-	/*
-	 * The breakpoint is handled only when this function is in progress.
-	 * The system could not work if we could not remove it.
-	 */
-	BUG_ON(ret);
- out:
-	run_sync();
-	return ret;
-
- fail_update:
-	/* Also here the system could not work with the breakpoint */
-	if (ftrace_write(ip, old_code, 1))
-		BUG();
-	goto out;
+	text_poke_finish();
 }
 
 void arch_ftrace_update_code(int command)
 {
-	/* See comment above by declaration of modifying_ftrace_code */
-	atomic_inc(&modifying_ftrace_code);
-
 	ftrace_modify_all_code(command);
-
-	atomic_dec(&modifying_ftrace_code);
 }
 
 int __init ftrace_dyn_arch_init(void)
@@ -747,6 +321,7 @@ create_trampoline(struct ftrace_ops *ops
 	unsigned long start_offset;
 	unsigned long end_offset;
 	unsigned long op_offset;
+	unsigned long call_offset;
 	unsigned long offset;
 	unsigned long npages;
 	unsigned long size;
@@ -763,10 +338,12 @@ create_trampoline(struct ftrace_ops *ops
 		start_offset = (unsigned long)ftrace_regs_caller;
 		end_offset = (unsigned long)ftrace_regs_caller_end;
 		op_offset = (unsigned long)ftrace_regs_caller_op_ptr;
+		call_offset = (unsigned long)ftrace_regs_call;
 	} else {
 		start_offset = (unsigned long)ftrace_caller;
 		end_offset = (unsigned long)ftrace_epilogue;
 		op_offset = (unsigned long)ftrace_caller_op_ptr;
+		call_offset = (unsigned long)ftrace_call;
 	}
 
 	size = end_offset - start_offset;
@@ -823,16 +400,21 @@ create_trampoline(struct ftrace_ops *ops
 	/* put in the new offset to the ftrace_ops */
 	memcpy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
 
+	/* put in the call to the function */
+	mutex_lock(&text_mutex);
+	call_offset -= start_offset;
+	memcpy(trampoline + call_offset,
+	       text_gen_insn(CALL_INSN_OPCODE,
+			     trampoline + call_offset,
+			     ftrace_ops_get_func(ops)), CALL_INSN_SIZE);
+	mutex_unlock(&text_mutex);
+
 	/* ALLOC_TRAMP flags lets us know we created it */
 	ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
 
 	set_vm_flush_reset_perms(trampoline);
 
-	/*
-	 * Module allocation needs to be completed by making the page
-	 * executable. The page is still writable, which is a security hazard,
-	 * but anyhow ftrace breaks W^X completely.
-	 */
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
@@ -859,43 +441,35 @@ static unsigned long calc_trampoline_cal
 void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 {
 	ftrace_func_t func;
-	unsigned char *new;
 	unsigned long offset;
 	unsigned long ip;
 	unsigned int size;
-	int ret, npages;
+	const char *new;
 
-	if (ops->trampoline) {
-		/*
-		 * The ftrace_ops caller may set up its own trampoline.
-		 * In such a case, this code must not modify it.
-		 */
-		if (!(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
-			return;
-		npages = PAGE_ALIGN(ops->trampoline_size) >> PAGE_SHIFT;
-		set_memory_rw(ops->trampoline, npages);
-	} else {
+	if (!ops->trampoline) {
 		ops->trampoline = create_trampoline(ops, &size);
 		if (!ops->trampoline)
 			return;
 		ops->trampoline_size = size;
-		npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+		return;
 	}
 
+	/*
+	 * The ftrace_ops caller may set up its own trampoline.
+	 * In such a case, this code must not modify it.
+	 */
+	if (!(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+		return;
+
 	offset = calc_trampoline_call_offset(ops->flags & FTRACE_OPS_FL_SAVE_REGS);
 	ip = ops->trampoline + offset;
-
 	func = ftrace_ops_get_func(ops);
 
-	ftrace_update_func_call = (unsigned long)func;
-
+	mutex_lock(&text_mutex);
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
-	ret = update_ftrace_func(ip, new);
-	set_memory_ro(ops->trampoline, npages);
-
-	/* The update should never fail */
-	WARN_ON(ret);
+	text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL);
+	mutex_unlock(&text_mutex);
 }
 
 /* Return the address of the function the trampoline calls */
@@ -981,19 +555,18 @@ void arch_ftrace_trampoline_free(struct
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern void ftrace_graph_call(void);
 
-static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
+static const char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 {
-	return ftrace_text_replace(0xe9, ip, addr);
+	return ftrace_text_replace(JMP32_INSN_OPCODE, ip, addr);
 }
 
 static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
-	unsigned char *new;
+	const char *new;
 
-	ftrace_update_func_call = 0UL;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
-
-	return update_ftrace_func(ip, new);
+	text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL); // BATCH
+	return 0;
 }
 
 int ftrace_enable_ftrace_graph_caller(void)
@@ -1019,10 +592,9 @@ int ftrace_disable_ftrace_graph_caller(v
 void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 			   unsigned long frame_pointer)
 {
+	unsigned long return_hooker = (unsigned long)&return_to_handler;
 	unsigned long old;
 	int faulted;
-	unsigned long return_hooker = (unsigned long)
-				&return_to_handler;
 
 	/*
 	 * When resuming from suspend-to-ram, this function can be indirectly
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -563,15 +563,6 @@ NOKPROBE_SYMBOL(do_general_protection);
 
 dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 {
-#ifdef CONFIG_DYNAMIC_FTRACE
-	/*
-	 * ftrace must be first, everything else may cause a recursive crash.
-	 * See note by declaration of modifying_ftrace_code in ftrace.c
-	 */
-	if (unlikely(atomic_read(&modifying_ftrace_code)) &&
-	    ftrace_int3_handler(regs))
-		return;
-#endif
 	if (poke_int3_handler(regs))
 		return;
 



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 06/16] x86/mm: Remove set_kernel_text_r[ow]()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (4 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 05/16] x86/ftrace: Use text_poke() Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 07/16] x86/alternative: Add text_opcode_size() Peter Zijlstra
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

With the last and only user of these functions gone (ftrace) remove
them as well to avoid ever growing new users.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/set_memory.h |    2 --
 arch/x86/mm/init_32.c             |   28 ----------------------------
 arch/x86/mm/init_64.c             |   36 ------------------------------------
 3 files changed, 66 deletions(-)

--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -80,8 +80,6 @@ int set_direct_map_invalid_noflush(struc
 int set_direct_map_default_noflush(struct page *page);
 
 extern int kernel_set_to_readonly;
-void set_kernel_text_rw(void);
-void set_kernel_text_ro(void);
 
 #ifdef CONFIG_X86_64
 static inline int set_mce_nospec(unsigned long pfn)
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -874,34 +874,6 @@ void arch_remove_memory(int nid, u64 sta
 
 int kernel_set_to_readonly __read_mostly;
 
-void set_kernel_text_rw(void)
-{
-	unsigned long start = PFN_ALIGN(_text);
-	unsigned long size = PFN_ALIGN(_etext) - start;
-
-	if (!kernel_set_to_readonly)
-		return;
-
-	pr_debug("Set kernel text: %lx - %lx for read write\n",
-		 start, start+size);
-
-	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
-}
-
-void set_kernel_text_ro(void)
-{
-	unsigned long start = PFN_ALIGN(_text);
-	unsigned long size = PFN_ALIGN(_etext) - start;
-
-	if (!kernel_set_to_readonly)
-		return;
-
-	pr_debug("Set kernel text: %lx - %lx for read only\n",
-		 start, start+size);
-
-	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
-}
-
 static void mark_nxdata_nx(void)
 {
 	/*
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1260,42 +1260,6 @@ void __init mem_init(void)
 
 int kernel_set_to_readonly;
 
-void set_kernel_text_rw(void)
-{
-	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end = PFN_ALIGN(__stop___ex_table);
-
-	if (!kernel_set_to_readonly)
-		return;
-
-	pr_debug("Set kernel text: %lx - %lx for read write\n",
-		 start, end);
-
-	/*
-	 * Make the kernel identity mapping for text RW. Kernel text
-	 * mapping will always be RO. Refer to the comment in
-	 * static_protections() in pageattr.c
-	 */
-	set_memory_rw(start, (end - start) >> PAGE_SHIFT);
-}
-
-void set_kernel_text_ro(void)
-{
-	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end = PFN_ALIGN(__stop___ex_table);
-
-	if (!kernel_set_to_readonly)
-		return;
-
-	pr_debug("Set kernel text: %lx - %lx for read only\n",
-		 start, end);
-
-	/*
-	 * Set the kernel identity mapping for text RO.
-	 */
-	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
-}
-
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 07/16] x86/alternative: Add text_opcode_size()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (5 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 06/16] x86/mm: Remove set_kernel_text_r[ow]() Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 08/16] x86/ftrace: Use text_gen_insn() Peter Zijlstra
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Introduce a common helper to map *_INSN_OPCODE to *_INSN_SIZE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |   43 +++++++++++++++++++++++++----------
 arch/x86/kernel/alternative.c        |   12 ---------
 2 files changed, 32 insertions(+), 23 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -49,18 +49,6 @@ extern void text_poke_bp(void *addr, con
 extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
 extern void text_poke_finish(void);
 
-extern void *text_gen_insn(u8 opcode, const void *addr, const void *dest);
-
-extern int after_bootmem;
-extern __ro_after_init struct mm_struct *poking_mm;
-extern __ro_after_init unsigned long poking_addr;
-
-#ifndef CONFIG_UML_X86
-static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
-{
-	regs->ip = ip;
-}
-
 #define INT3_INSN_SIZE		1
 #define INT3_INSN_OPCODE	0xCC
 
@@ -73,6 +61,37 @@ static inline void int3_emulate_jmp(stru
 #define JMP8_INSN_SIZE		2
 #define JMP8_INSN_OPCODE	0xEB
 
+static inline int text_opcode_size(u8 opcode)
+{
+	int size = 0;
+
+#define __CASE(insn)	\
+	case insn##_INSN_OPCODE: size = insn##_INSN_SIZE; break
+
+	switch(opcode) {
+	__CASE(INT3);
+	__CASE(CALL);
+	__CASE(JMP32);
+	__CASE(JMP8);
+	}
+
+#undef __CASE
+
+	return size;
+}
+
+extern void *text_gen_insn(u8 opcode, const void *addr, const void *dest);
+
+extern int after_bootmem;
+extern __ro_after_init struct mm_struct *poking_mm;
+extern __ro_after_init unsigned long poking_addr;
+
+#ifndef CONFIG_UML_X86
+static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
+{
+	regs->ip = ip;
+}
+
 static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
 {
 	/*
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1259,22 +1259,12 @@ union text_poke_insn {
 void *text_gen_insn(u8 opcode, const void *addr, const void *dest)
 {
 	static union text_poke_insn insn; /* text_mutex */
-	int size = 0;
+	int size = text_opcode_size(opcode);
 
 	lockdep_assert_held(&text_mutex);
 
 	insn.opcode = opcode;
 
-#define __CASE(insn)	\
-	case insn##_INSN_OPCODE: size = insn##_INSN_SIZE; break
-
-	switch(opcode) {
-	__CASE(INT3);
-	__CASE(CALL);
-	__CASE(JMP32);
-	__CASE(JMP8);
-	}
-
 	if (size > 1) {
 		insn.disp = (long)dest - (long)(addr + size);
 		if (size == 2)



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 08/16] x86/ftrace: Use text_gen_insn()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (6 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 07/16] x86/alternative: Add text_opcode_size() Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len Peter Zijlstra
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Replace the ftrace_code_union with the generic text_gen_insn() helper,
which does exactly this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |   25 ++++++++++++++++++++++++-
 arch/x86/kernel/alternative.c        |   26 --------------------------
 arch/x86/kernel/ftrace.c             |   32 +++++++-------------------------
 3 files changed, 31 insertions(+), 52 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -80,7 +80,30 @@ static inline int text_opcode_size(u8 op
 	return size;
 }
 
-extern void *text_gen_insn(u8 opcode, const void *addr, const void *dest);
+union text_poke_insn {
+	u8 text[POKE_MAX_OPCODE_SIZE];
+	struct {
+		u8 opcode;
+		s32 disp;
+	} __attribute__((packed));
+};
+
+static __always_inline
+void *text_gen_insn(u8 opcode, const void *addr, const void *dest)
+{
+	static union text_poke_insn insn; /* per instance */
+	int size = text_opcode_size(opcode);
+
+	insn.opcode = opcode;
+
+	if (size > 1) {
+		insn.disp = (long)dest - (long)(addr + size);
+		if (size == 2)
+			BUG_ON((insn.disp >> 31) != (insn.disp >> 7));
+	}
+
+	return &insn.text;
+}
 
 extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1247,29 +1247,3 @@ void __ref text_poke_bp(void *addr, cons
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
-
-union text_poke_insn {
-	u8 text[POKE_MAX_OPCODE_SIZE];
-	struct {
-		u8 opcode;
-		s32 disp;
-	} __attribute__((packed));
-};
-
-void *text_gen_insn(u8 opcode, const void *addr, const void *dest)
-{
-	static union text_poke_insn insn; /* text_mutex */
-	int size = text_opcode_size(opcode);
-
-	lockdep_assert_held(&text_mutex);
-
-	insn.opcode = opcode;
-
-	if (size > 1) {
-		insn.disp = (long)dest - (long)(addr + size);
-		if (size == 2)
-			BUG_ON((insn.disp >> 31) != (insn.disp >> 7));
-	}
-
-	return &insn.text;
-}
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -58,24 +58,6 @@ int ftrace_arch_code_modify_post_process
 	return 0;
 }
 
-union ftrace_code_union {
-	char code[MCOUNT_INSN_SIZE];
-	struct {
-		char op;
-		int offset;
-	} __attribute__((packed));
-};
-
-static const char *ftrace_text_replace(char op, unsigned long ip, unsigned long addr)
-{
-	static union ftrace_code_union calc;
-
-	calc.op = op;
-	calc.offset = (int)(addr - (ip + MCOUNT_INSN_SIZE));
-
-	return calc.code;
-}
-
 static const char *ftrace_nop_replace(void)
 {
 	return ideal_nops[NOP_ATOMIC5];
@@ -83,7 +65,7 @@ static const char *ftrace_nop_replace(vo
 
 static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 {
-	return ftrace_text_replace(CALL_INSN_OPCODE, ip, addr);
+	return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
 }
 
 static int ftrace_verify_code(unsigned long ip, const char *old_code)
@@ -475,20 +457,20 @@ void arch_ftrace_update_trampoline(struc
 /* Return the address of the function the trampoline calls */
 static void *addr_from_call(void *ptr)
 {
-	union ftrace_code_union calc;
+	union text_poke_insn call;
 	int ret;
 
-	ret = probe_kernel_read(&calc, ptr, MCOUNT_INSN_SIZE);
+	ret = probe_kernel_read(&call, ptr, CALL_INSN_SIZE);
 	if (WARN_ON_ONCE(ret < 0))
 		return NULL;
 
 	/* Make sure this is a call */
-	if (WARN_ON_ONCE(calc.op != 0xe8)) {
-		pr_warn("Expected e8, got %x\n", calc.op);
+	if (WARN_ON_ONCE(call.opcode != CALL_INSN_OPCODE)) {
+		pr_warn("Expected E8, got %x\n", call.opcode);
 		return NULL;
 	}
 
-	return ptr + MCOUNT_INSN_SIZE + calc.offset;
+	return ptr + CALL_INSN_SIZE + call.disp;
 }
 
 void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
@@ -557,7 +539,7 @@ extern void ftrace_graph_call(void);
 
 static const char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 {
-	return ftrace_text_replace(JMP32_INSN_OPCODE, ip, addr);
+	return text_gen_insn(JMP32_INSN_OPCODE, (void *)ip, (void *)addr);
 }
 
 static int ftrace_mod_jmp(unsigned long ip, void *func)



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (7 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 08/16] x86/ftrace: Use text_gen_insn() Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-21  8:58   ` Ingo Molnar
  2019-10-18  7:35 ` [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc Peter Zijlstra
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Per the BUG_ON(len != insn.length) in text_poke_loc_init(), tp->len
must indeed be the same as text_opcode_size(tp->opcode). Use this to
remove this field from the structure.

Sadly, due to 8 byte alignment, this only increases the structure
padding.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -938,7 +938,6 @@ static void do_sync_core(void *info)
 
 struct text_poke_loc {
 	void *addr;
-	int len;
 	s32 rel32;
 	u8 opcode;
 	const u8 text[POKE_MAX_OPCODE_SIZE];
@@ -965,6 +964,7 @@ int notrace poke_int3_handler(struct pt_
 {
 	struct text_poke_loc *tp;
 	void *ip;
+	int len;
 
 	/*
 	 * Having observed our INT3 instruction, we now must observe
@@ -1004,7 +1004,8 @@ int notrace poke_int3_handler(struct pt_
 			return 0;
 	}
 
-	ip += tp->len;
+	len = text_opcode_size(tp->opcode);
+	ip += len;
 
 	switch (tp->opcode) {
 	case INT3_INSN_OPCODE:
@@ -1085,10 +1086,12 @@ static void text_poke_bp_batch(struct te
 	 * Second step: update all but the first byte of the patched range.
 	 */
 	for (do_sync = 0, i = 0; i < nr_entries; i++) {
-		if (tp[i].len - sizeof(int3) > 0) {
+		int len = text_opcode_size(tp[i].opcode);
+
+		if (len - sizeof(int3) > 0) {
 			text_poke((char *)tp[i].addr + sizeof(int3),
 				  (const char *)tp[i].text + sizeof(int3),
-				  tp[i].len - sizeof(int3));
+				  len - sizeof(int3));
 			do_sync++;
 		}
 	}
@@ -1141,7 +1144,6 @@ void text_poke_loc_init(struct text_poke
 	BUG_ON(len != insn.length);
 
 	tp->addr = addr;
-	tp->len = len;
 	tp->opcode = insn.opcode.bytes[0];
 
 	switch (tp->opcode) {



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (8 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-21  9:01   ` Ingo Molnar
  2019-10-18  7:35 ` [PATCH v4 11/16] x86/kprobes: Convert to text-patching.h Peter Zijlstra
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Employ the fact that all text must be within a s32 displacement of one
another to shrink the text_poke_loc::addr field. Make it relative to
_stext.

This then shrinks struct text_poke_loc to 16 bytes, and consequently
increases TP_VEC_MAX from 170 to 256.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -937,7 +937,7 @@ static void do_sync_core(void *info)
 }
 
 struct text_poke_loc {
-	void *addr;
+	s32 rel_addr; /* addr := _stext + rel_addr */
 	s32 rel32;
 	u8 opcode;
 	const u8 text[POKE_MAX_OPCODE_SIZE];
@@ -948,13 +948,18 @@ static struct bp_patching_desc {
 	int nr_entries;
 } bp_patching;
 
+static inline void *text_poke_addr(struct text_poke_loc *tp)
+{
+	return _stext + tp->rel_addr;
+}
+
 static int notrace patch_cmp(const void *key, const void *elt)
 {
 	struct text_poke_loc *tp = (struct text_poke_loc *) elt;
 
-	if (key < tp->addr)
+	if (key < text_poke_addr(tp))
 		return -1;
-	if (key > tp->addr)
+	if (key > text_poke_addr(tp))
 		return 1;
 	return 0;
 }
@@ -1000,7 +1005,7 @@ int notrace poke_int3_handler(struct pt_
 			return 0;
 	} else {
 		tp = bp_patching.vec;
-		if (tp->addr != ip)
+		if (text_poke_addr(tp) != ip)
 			return 0;
 	}
 
@@ -1078,7 +1083,7 @@ static void text_poke_bp_batch(struct te
 	 * First step: add a int3 trap to the address that will be patched.
 	 */
 	for (i = 0; i < nr_entries; i++)
-		text_poke(tp[i].addr, &int3, sizeof(int3));
+		text_poke(text_poke_addr(&tp[i]), &int3, sizeof(int3));
 
 	on_each_cpu(do_sync_core, NULL, 1);
 
@@ -1089,7 +1094,7 @@ static void text_poke_bp_batch(struct te
 		int len = text_opcode_size(tp[i].opcode);
 
 		if (len - sizeof(int3) > 0) {
-			text_poke((char *)tp[i].addr + sizeof(int3),
+			text_poke(text_poke_addr(&tp[i]) + sizeof(int3),
 				  (const char *)tp[i].text + sizeof(int3),
 				  len - sizeof(int3));
 			do_sync++;
@@ -1113,7 +1118,7 @@ static void text_poke_bp_batch(struct te
 		if (tp[i].text[0] == INT3_INSN_OPCODE)
 			continue;
 
-		text_poke(tp[i].addr, tp[i].text, sizeof(int3));
+		text_poke(text_poke_addr(&tp[i]), tp[i].text, sizeof(int3));
 		do_sync++;
 	}
 
@@ -1143,7 +1148,7 @@ void text_poke_loc_init(struct text_poke
 	BUG_ON(!insn_complete(&insn));
 	BUG_ON(len != insn.length);
 
-	tp->addr = addr;
+	tp->rel_addr = addr - (void *)_stext;
 	tp->opcode = insn.opcode.bytes[0];
 
 	switch (tp->opcode) {
@@ -1192,7 +1197,7 @@ static bool tp_order_fail(void *addr)
 		return true;
 
 	tp = &tp_vec[tp_vec_nr - 1];
-	if ((unsigned long)tp->addr > (unsigned long)addr)
+	if ((unsigned long)text_poke_addr(tp) > (unsigned long)addr)
 		return true;
 
 	return false;



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 11/16] x86/kprobes: Convert to text-patching.h
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (9 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-21 14:57   ` Masami Hiramatsu
  2019-10-18  7:35 ` [PATCH v4 12/16] x86/kprobes: Fix ordering Peter Zijlstra
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Convert kprobes to the new text-poke naming.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/kprobes.h       |   14 +++--------
 arch/x86/include/asm/text-patching.h |    2 +
 arch/x86/kernel/kprobes/core.c       |   18 +++++++-------
 arch/x86/kernel/kprobes/opt.c        |   44 ++++++++++++++++-------------------
 4 files changed, 37 insertions(+), 41 deletions(-)

--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -11,12 +11,11 @@
 
 #include <asm-generic/kprobes.h>
 
-#define BREAKPOINT_INSTRUCTION	0xcc
-
 #ifdef CONFIG_KPROBES
 #include <linux/types.h>
 #include <linux/ptrace.h>
 #include <linux/percpu.h>
+#include <asm/text-patching.h>
 #include <asm/insn.h>
 
 #define  __ARCH_WANT_KPROBES_INSN_SLOT
@@ -25,10 +24,7 @@ struct pt_regs;
 struct kprobe;
 
 typedef u8 kprobe_opcode_t;
-#define RELATIVEJUMP_OPCODE 0xe9
-#define RELATIVEJUMP_SIZE 5
-#define RELATIVECALL_OPCODE 0xe8
-#define RELATIVE_ADDR_SIZE 4
+
 #define MAX_STACK_SIZE 64
 #define CUR_STACK_SIZE(ADDR) \
 	(current_top_of_stack() - (unsigned long)(ADDR))
@@ -43,11 +39,11 @@ extern __visible kprobe_opcode_t optprob
 extern __visible kprobe_opcode_t optprobe_template_val[];
 extern __visible kprobe_opcode_t optprobe_template_call[];
 extern __visible kprobe_opcode_t optprobe_template_end[];
-#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
+#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + DISP32_SIZE)
 #define MAX_OPTINSN_SIZE 				\
 	(((unsigned long)optprobe_template_end -	\
 	  (unsigned long)optprobe_template_entry) +	\
-	 MAX_OPTIMIZED_LENGTH + RELATIVEJUMP_SIZE)
+	 MAX_OPTIMIZED_LENGTH + JMP32_INSN_SIZE)
 
 extern const int kretprobe_blacklist_size;
 
@@ -73,7 +69,7 @@ struct arch_specific_insn {
 
 struct arch_optimized_insn {
 	/* copy of the original instructions */
-	kprobe_opcode_t copied_insn[RELATIVE_ADDR_SIZE];
+	kprobe_opcode_t copied_insn[DISP32_SIZE];
 	/* detour code buffer */
 	kprobe_opcode_t *insn;
 	/* the size of instructions copied to detour code buffer */
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -61,6 +61,8 @@ extern void text_poke_finish(void);
 #define JMP8_INSN_SIZE		2
 #define JMP8_INSN_OPCODE	0xEB
 
+#define DISP32_SIZE		4
+
 static inline int text_opcode_size(u8 opcode)
 {
 	int size = 0;
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -119,14 +119,14 @@ __synthesize_relative_insn(void *dest, v
 /* Insert a jump instruction at address 'from', which jumps to address 'to'.*/
 void synthesize_reljump(void *dest, void *from, void *to)
 {
-	__synthesize_relative_insn(dest, from, to, RELATIVEJUMP_OPCODE);
+	__synthesize_relative_insn(dest, from, to, JMP32_INSN_OPCODE);
 }
 NOKPROBE_SYMBOL(synthesize_reljump);
 
 /* Insert a call instruction at address 'from', which calls address 'to'.*/
 void synthesize_relcall(void *dest, void *from, void *to)
 {
-	__synthesize_relative_insn(dest, from, to, RELATIVECALL_OPCODE);
+	__synthesize_relative_insn(dest, from, to, CALL_INSN_OPCODE);
 }
 NOKPROBE_SYMBOL(synthesize_relcall);
 
@@ -301,7 +301,7 @@ static int can_probe(unsigned long paddr
 		 * Another debugging subsystem might insert this breakpoint.
 		 * In that case, we can't recover it.
 		 */
-		if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
+		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
 			return 0;
 		addr += insn.length;
 	}
@@ -352,7 +352,7 @@ int __copy_instruction(u8 *dest, u8 *src
 	insn_get_length(insn);
 
 	/* Another subsystem puts a breakpoint, failed to recover */
-	if (insn->opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
+	if (insn->opcode.bytes[0] == INT3_INSN_OPCODE)
 		return 0;
 
 	/* We should not singlestep on the exception masking instructions */
@@ -396,14 +396,14 @@ static int prepare_boost(kprobe_opcode_t
 	int len = insn->length;
 
 	if (can_boost(insn, p->addr) &&
-	    MAX_INSN_SIZE - len >= RELATIVEJUMP_SIZE) {
+	    MAX_INSN_SIZE - len >= JMP32_INSN_SIZE) {
 		/*
 		 * These instructions can be executed directly if it
 		 * jumps back to correct address.
 		 */
 		synthesize_reljump(buf + len, p->ainsn.insn + len,
 				   p->addr + insn->length);
-		len += RELATIVEJUMP_SIZE;
+		len += JMP32_INSN_SIZE;
 		p->ainsn.boostable = true;
 	} else {
 		p->ainsn.boostable = false;
@@ -497,7 +497,7 @@ int arch_prepare_kprobe(struct kprobe *p
 
 void arch_arm_kprobe(struct kprobe *p)
 {
-	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
+	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
 }
 
 void arch_disarm_kprobe(struct kprobe *p)
@@ -605,7 +605,7 @@ static void setup_singlestep(struct kpro
 	regs->flags |= X86_EFLAGS_TF;
 	regs->flags &= ~X86_EFLAGS_IF;
 	/* single step inline if the instruction is an int3 */
-	if (p->opcode == BREAKPOINT_INSTRUCTION)
+	if (p->opcode == INT3_INSN_OPCODE)
 		regs->ip = (unsigned long)p->addr;
 	else
 		regs->ip = (unsigned long)p->ainsn.insn;
@@ -691,7 +691,7 @@ int kprobe_int3_handler(struct pt_regs *
 				reset_current_kprobe();
 			return 1;
 		}
-	} else if (*addr != BREAKPOINT_INSTRUCTION) {
+	} else if (*addr != INT3_INSN_OPCODE) {
 		/*
 		 * The breakpoint instruction was removed right
 		 * after we hit it.  Another cpu has removed
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -38,7 +38,7 @@ unsigned long __recover_optprobed_insn(k
 	long offs;
 	int i;
 
-	for (i = 0; i < RELATIVEJUMP_SIZE; i++) {
+	for (i = 0; i < JMP32_INSN_SIZE; i++) {
 		kp = get_kprobe((void *)addr - i);
 		/* This function only handles jump-optimized kprobe */
 		if (kp && kprobe_optimized(kp)) {
@@ -62,10 +62,10 @@ unsigned long __recover_optprobed_insn(k
 
 	if (addr == (unsigned long)kp->addr) {
 		buf[0] = kp->opcode;
-		memcpy(buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
+		memcpy(buf + 1, op->optinsn.copied_insn, DISP32_SIZE);
 	} else {
 		offs = addr - (unsigned long)kp->addr - 1;
-		memcpy(buf, op->optinsn.copied_insn + offs, RELATIVE_ADDR_SIZE - offs);
+		memcpy(buf, op->optinsn.copied_insn + offs, DISP32_SIZE - offs);
 	}
 
 	return (unsigned long)buf;
@@ -141,8 +141,6 @@ STACK_FRAME_NON_STANDARD(optprobe_templa
 #define TMPL_END_IDX \
 	((long)optprobe_template_end - (long)optprobe_template_entry)
 
-#define INT3_SIZE sizeof(kprobe_opcode_t)
-
 /* Optimized kprobe call back function: called from optinsn */
 static void
 optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
@@ -162,7 +160,7 @@ optimized_callback(struct optimized_kpro
 		regs->cs |= get_kernel_rpl();
 		regs->gs = 0;
 #endif
-		regs->ip = (unsigned long)op->kp.addr + INT3_SIZE;
+		regs->ip = (unsigned long)op->kp.addr + INT3_INSN_SIZE;
 		regs->orig_ax = ~0UL;
 
 		__this_cpu_write(current_kprobe, &op->kp);
@@ -179,7 +177,7 @@ static int copy_optimized_instructions(u
 	struct insn insn;
 	int len = 0, ret;
 
-	while (len < RELATIVEJUMP_SIZE) {
+	while (len < JMP32_INSN_SIZE) {
 		ret = __copy_instruction(dest + len, src + len, real + len, &insn);
 		if (!ret || !can_boost(&insn, src + len))
 			return -EINVAL;
@@ -271,7 +269,7 @@ static int can_optimize(unsigned long pa
 		return 0;
 
 	/* Check there is enough space for a relative jump. */
-	if (size - offset < RELATIVEJUMP_SIZE)
+	if (size - offset < JMP32_INSN_SIZE)
 		return 0;
 
 	/* Decode instructions */
@@ -290,15 +288,15 @@ static int can_optimize(unsigned long pa
 		kernel_insn_init(&insn, (void *)recovered_insn, MAX_INSN_SIZE);
 		insn_get_length(&insn);
 		/* Another subsystem puts a breakpoint */
-		if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
+		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
 			return 0;
 		/* Recover address */
 		insn.kaddr = (void *)addr;
 		insn.next_byte = (void *)(addr + insn.length);
 		/* Check any instructions don't jump into target */
 		if (insn_is_indirect_jump(&insn) ||
-		    insn_jump_into_range(&insn, paddr + INT3_SIZE,
-					 RELATIVE_ADDR_SIZE))
+		    insn_jump_into_range(&insn, paddr + INT3_INSN_SIZE,
+					 DISP32_SIZE))
 			return 0;
 		addr += insn.length;
 	}
@@ -374,7 +372,7 @@ int arch_prepare_optimized_kprobe(struct
 	 * Verify if the address gap is in 2GB range, because this uses
 	 * a relative jump.
 	 */
-	rel = (long)slot - (long)op->kp.addr + RELATIVEJUMP_SIZE;
+	rel = (long)slot - (long)op->kp.addr + JMP32_INSN_SIZE;
 	if (abs(rel) > 0x7fffffff) {
 		ret = -ERANGE;
 		goto err;
@@ -401,7 +399,7 @@ int arch_prepare_optimized_kprobe(struct
 	/* Set returning jmp instruction at the tail of out-of-line buffer */
 	synthesize_reljump(buf + len, slot + len,
 			   (u8 *)op->kp.addr + op->optinsn.size);
-	len += RELATIVEJUMP_SIZE;
+	len += JMP32_INSN_SIZE;
 
 	/* We have to use text_poke() for instruction buffer because it is RO */
 	text_poke(slot, buf, len);
@@ -422,22 +420,22 @@ int arch_prepare_optimized_kprobe(struct
 void arch_optimize_kprobes(struct list_head *oplist)
 {
 	struct optimized_kprobe *op, *tmp;
-	u8 insn_buff[RELATIVEJUMP_SIZE];
+	u8 insn_buff[JMP32_INSN_SIZE];
 
 	list_for_each_entry_safe(op, tmp, oplist, list) {
 		s32 rel = (s32)((long)op->optinsn.insn -
-			((long)op->kp.addr + RELATIVEJUMP_SIZE));
+			((long)op->kp.addr + JMP32_INSN_SIZE));
 
 		WARN_ON(kprobe_disabled(&op->kp));
 
 		/* Backup instructions which will be replaced by jump address */
-		memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
-		       RELATIVE_ADDR_SIZE);
+		memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_INSN_SIZE,
+		       DISP32_SIZE);
 
-		insn_buff[0] = RELATIVEJUMP_OPCODE;
+		insn_buff[0] = JMP32_INSN_OPCODE;
 		*(s32 *)(&insn_buff[1]) = rel;
 
-		text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE, NULL);
+		text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL);
 
 		list_del_init(&op->list);
 	}
@@ -446,13 +444,13 @@ void arch_optimize_kprobes(struct list_h
 /* Replace a relative jump with a breakpoint (int3).  */
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
-	u8 insn_buff[RELATIVEJUMP_SIZE];
+	u8 insn_buff[JMP32_INSN_SIZE];
 
 	/* Set int3 to first byte for kprobes */
-	insn_buff[0] = BREAKPOINT_INSTRUCTION;
-	memcpy(insn_buff + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
+	insn_buff[0] = INT3_INSN_OPCODE;
+	memcpy(insn_buff + 1, op->optinsn.copied_insn, DISP32_SIZE);
 
-	text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
+	text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE,
 		     text_gen_insn(JMP32_INSN_OPCODE, op->kp.addr, op->optinsn.insn));
 }
 



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 12/16] x86/kprobes: Fix ordering
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (10 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 11/16] x86/kprobes: Convert to text-patching.h Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-22  1:35   ` Masami Hiramatsu
  2019-10-18  7:35 ` [PATCH v4 13/16] arm/ftrace: Use __patch_text_real() Peter Zijlstra
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu, paulmck, mathieu.desnoyers

Kprobes does something like:

register:
	arch_arm_kprobe()
	  text_poke(INT3)
          /* guarantees nothing, INT3 will become visible at some point, maybe */

        kprobe_optimizer()
	  /* guarantees the bytes after INT3 are unused */
	  syncrhonize_rcu_tasks();
	  text_poke_bp(JMP32);
	  /* implies IPI-sync, kprobe really is enabled */


unregister:
	__disarm_kprobe()
	  unoptimize_kprobe()
	    text_poke_bp(INT3 + tail);
	    /* implies IPI-sync, so tail is guaranteed visible */
          arch_disarm_kprobe()
            text_poke(old);
	    /* guarantees nothing, old will maybe become visible */

	synchronize_rcu()

        free-stuff

Now the problem is that on register, the synchronize_rcu_tasks() does
not imply sufficient to guarantee all CPUs have already observed INT3
(although in practise this is exceedingly unlikely not to have
happened) (similar to how MEMBARRIER_CMD_PRIVATE_EXPEDITED does not
imply MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).

Worse, even if it did, we'd have to do 2 synchronize calls to provide
the guarantee we're looking for, the first to ensure INT3 is visible,
the second to guarantee nobody is then still using the instruction
bytes after INT3.

Similar on unregister; the synchronize_rcu() between
__unregister_kprobe_top() and __unregister_kprobe_bottom() does not
guarantee all CPUs are free of the INT3 (and observe the old text).

Therefore, sprinkle some IPI-sync love around. This guarantees that
all CPUs agree on the text and RCU once again provides the required
guaranteed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: hpa@zytor.com
Cc: paulmck@kernel.org
Cc: mathieu.desnoyers@efficios.com
---
 arch/x86/include/asm/text-patching.h |    1 +
 arch/x86/kernel/alternative.c        |   11 ++++++++---
 arch/x86/kernel/kprobes/core.c       |    2 ++
 arch/x86/kernel/kprobes/opt.c        |   12 ++++--------
 4 files changed, 15 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -42,6 +42,7 @@ extern void text_poke_early(void *addr,
  * an inconsistent instruction while you patch.
  */
 extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void text_poke_sync(void);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -936,6 +936,11 @@ static void do_sync_core(void *info)
 	sync_core();
 }
 
+void text_poke_sync(void)
+{
+	on_each_cpu(do_sync_core, NULL, 1);
+}
+
 struct text_poke_loc {
 	s32 rel_addr; /* addr := _stext + rel_addr */
 	s32 rel32;
@@ -1085,7 +1090,7 @@ static void text_poke_bp_batch(struct te
 	for (i = 0; i < nr_entries; i++)
 		text_poke(text_poke_addr(&tp[i]), &int3, sizeof(int3));
 
-	on_each_cpu(do_sync_core, NULL, 1);
+	text_poke_sync();
 
 	/*
 	 * Second step: update all but the first byte of the patched range.
@@ -1107,7 +1112,7 @@ static void text_poke_bp_batch(struct te
 		 * not necessary and we'd be safe even without it. But
 		 * better safe than sorry (plus there's not only Intel).
 		 */
-		on_each_cpu(do_sync_core, NULL, 1);
+		text_poke_sync();
 	}
 
 	/*
@@ -1123,7 +1128,7 @@ static void text_poke_bp_batch(struct te
 	}
 
 	if (do_sync)
-		on_each_cpu(do_sync_core, NULL, 1);
+		text_poke_sync();
 
 	/*
 	 * sync_core() implies an smp_mb() and orders this store against
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -502,11 +502,13 @@ int arch_prepare_kprobe(struct kprobe *p
 void arch_arm_kprobe(struct kprobe *p)
 {
 	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
+	text_poke_sync();
 }
 
 void arch_disarm_kprobe(struct kprobe *p)
 {
 	text_poke(p->addr, &p->opcode, 1);
+	text_poke_sync();
 }
 
 void arch_remove_kprobe(struct kprobe *p)
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -444,14 +444,10 @@ void arch_optimize_kprobes(struct list_h
 /* Replace a relative jump with a breakpoint (int3).  */
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
-	u8 insn_buff[JMP32_INSN_SIZE];
-
-	/* Set int3 to first byte for kprobes */
-	insn_buff[0] = INT3_INSN_OPCODE;
-	memcpy(insn_buff + 1, op->optinsn.copied_insn, DISP32_SIZE);
-
-	text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE,
-		     text_gen_insn(JMP32_INSN_OPCODE, op->kp.addr, op->optinsn.insn));
+	arch_arm_kprobe(&op->kp);
+	text_poke(op->kp.addr + INT3_INSN_SIZE,
+		  op->optinsn.copied_insn, DISP32_SIZE);
+	text_poke_sync();
 }
 
 /*



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 13/16] arm/ftrace: Use __patch_text_real()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (11 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 12/16] x86/kprobes: Fix ordering Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-28 16:25   ` Will Deacon
  2019-10-18  7:35 ` [PATCH v4 14/16] module: Remove set_all_modules_text_*() Peter Zijlstra
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu, rabin, Mark Rutland, Will Deacon, james.morse

Instead of flipping text protection, use the patch_text infrastructure
that uses a fixmap alias where required.

This removes the last user of set_all_modules_text_*().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ard.biesheuvel@linaro.org
Cc: rabin@rab.in
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: james.morse@arm.com
---
 arch/arm/kernel/ftrace.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -22,6 +22,7 @@
 #include <asm/ftrace.h>
 #include <asm/insn.h>
 #include <asm/set_memory.h>
+#include <asm/patch.h>
 
 #ifdef CONFIG_THUMB2_KERNEL
 #define	NOP		0xf85deb04	/* pop.w {lr} */
@@ -31,13 +32,15 @@
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
+static int patch_text_remap = 0;
+
 static int __ftrace_modify_code(void *data)
 {
 	int *command = data;
 
-	set_kernel_text_rw();
+	patch_text_remap++;
 	ftrace_modify_all_code(*command);
-	set_kernel_text_ro();
+	patch_text_remap--;
 
 	return 0;
 }
@@ -59,13 +62,13 @@ static unsigned long adjust_address(stru
 
 int ftrace_arch_code_modify_prepare(void)
 {
-	set_all_modules_text_rw();
+	patch_text_remap++;
 	return 0;
 }
 
 int ftrace_arch_code_modify_post_process(void)
 {
-	set_all_modules_text_ro();
+	patch_text_remap--;
 	/* Make sure any TLB misses during machine stop are cleared. */
 	flush_tlb_all();
 	return 0;
@@ -97,10 +100,7 @@ static int ftrace_modify_code(unsigned l
 			return -EINVAL;
 	}
 
-	if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
-		return -EPERM;
-
-	flush_icache_range(pc, pc + MCOUNT_INSN_SIZE);
+	__patch_text_real((void *)pc, new, patch_text_remap);
 
 	return 0;
 }



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 14/16] module: Remove set_all_modules_text_*()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (12 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 13/16] arm/ftrace: Use __patch_text_real() Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  7:35 ` [PATCH v4 15/16] module: Move where we mark modules RO,X Peter Zijlstra
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu, Greentime Hu, Vincent Chen

Now that there are no users of set_all_modules_text_*() left, remove
it.

While it appears nds32 uses it, it does not have STRICT_MODULE_RWX and
therefore ends up with the NOP stubs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
---
 arch/nds32/kernel/ftrace.c |   12 ------------
 include/linux/module.h     |    4 ----
 kernel/module.c            |   43 -------------------------------------------
 3 files changed, 59 deletions(-)

--- a/arch/nds32/kernel/ftrace.c
+++ b/arch/nds32/kernel/ftrace.c
@@ -89,18 +89,6 @@ int __init ftrace_dyn_arch_init(void)
 	return 0;
 }
 
-int ftrace_arch_code_modify_prepare(void)
-{
-	set_all_modules_text_rw();
-	return 0;
-}
-
-int ftrace_arch_code_modify_post_process(void)
-{
-	set_all_modules_text_ro();
-	return 0;
-}
-
 static unsigned long gen_sethi_insn(unsigned long addr)
 {
 	unsigned long opcode = 0x46000000;
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -846,13 +846,9 @@ extern int module_sysfs_initialized;
 #define __MODULE_STRING(x) __stringify(x)
 
 #ifdef CONFIG_STRICT_MODULE_RWX
-extern void set_all_modules_text_rw(void);
-extern void set_all_modules_text_ro(void);
 extern void module_enable_ro(const struct module *mod, bool after_init);
 extern void module_disable_ro(const struct module *mod);
 #else
-static inline void set_all_modules_text_rw(void) { }
-static inline void set_all_modules_text_ro(void) { }
 static inline void module_enable_ro(const struct module *mod, bool after_init) { }
 static inline void module_disable_ro(const struct module *mod) { }
 #endif
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2029,49 +2029,6 @@ static void module_enable_nx(const struc
 	frob_writable_data(&mod->init_layout, set_memory_nx);
 }
 
-/* Iterate through all modules and set each module's text as RW */
-void set_all_modules_text_rw(void)
-{
-	struct module *mod;
-
-	if (!rodata_enabled)
-		return;
-
-	mutex_lock(&module_mutex);
-	list_for_each_entry_rcu(mod, &modules, list) {
-		if (mod->state == MODULE_STATE_UNFORMED)
-			continue;
-
-		frob_text(&mod->core_layout, set_memory_rw);
-		frob_text(&mod->init_layout, set_memory_rw);
-	}
-	mutex_unlock(&module_mutex);
-}
-
-/* Iterate through all modules and set each module's text as RO */
-void set_all_modules_text_ro(void)
-{
-	struct module *mod;
-
-	if (!rodata_enabled)
-		return;
-
-	mutex_lock(&module_mutex);
-	list_for_each_entry_rcu(mod, &modules, list) {
-		/*
-		 * Ignore going modules since it's possible that ro
-		 * protection has already been disabled, otherwise we'll
-		 * run into protection faults at module deallocation.
-		 */
-		if (mod->state == MODULE_STATE_UNFORMED ||
-			mod->state == MODULE_STATE_GOING)
-			continue;
-
-		frob_text(&mod->core_layout, set_memory_ro);
-		frob_text(&mod->init_layout, set_memory_ro);
-	}
-	mutex_unlock(&module_mutex);
-}
 #else /* !CONFIG_STRICT_MODULE_RWX */
 static void module_enable_nx(const struct module *mod) { }
 #endif /*  CONFIG_STRICT_MODULE_RWX */



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (13 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 14/16] module: Remove set_all_modules_text_*() Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-21 13:53   ` Josh Poimboeuf
  2019-10-22  2:21   ` Steven Rostedt
  2019-10-18  7:35 ` [PATCH v4 16/16] ftrace: Merge ftrace_module_{init,enable}() Peter Zijlstra
  2019-10-21  9:09 ` [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Ingo Molnar
  16 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Now that set_all_modules_text_*() is gone, nothing depends on the
relation between ->state = COMING and the protection state anymore.
This enables moving the protection changes later, such that the COMING
notifier callbacks can more easily modify the text.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jessica Yu <jeyu@kernel.org>
---
 kernel/module.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
 	/* This relies on module_mutex for list integrity. */
 	module_bug_finalize(info->hdr, info->sechdrs, mod);
 
-	module_enable_ro(mod, false);
-	module_enable_nx(mod);
-	module_enable_x(mod);
-
 	/* Mark state as coming so strong_try_module_get() ignores us,
 	 * but kallsyms etc. can see us. */
 	mod->state = MODULE_STATE_COMING;
@@ -3852,6 +3848,10 @@ static int load_module(struct load_info
 	if (err)
 		goto bug_cleanup;
 
+	module_enable_ro(mod, false);
+	module_enable_nx(mod);
+	module_enable_x(mod);
+
 	/* Module is ready to execute: parsing args may do that. */
 	after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
 				  -32768, 32767, mod,



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 16/16] ftrace: Merge ftrace_module_{init,enable}()
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (14 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 15/16] module: Move where we mark modules RO,X Peter Zijlstra
@ 2019-10-18  7:35 ` Peter Zijlstra
  2019-10-18  8:20   ` Peter Zijlstra
  2019-10-21  9:09 ` [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Ingo Molnar
  16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  7:35 UTC (permalink / raw)
  To: x86
  Cc: peterz, linux-kernel, rostedt, mhiramat, bristot, jbaron,
	torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jpoimboe, jeyu

Because of how some architectures used set_all_modules_text_*() there
was a dependency between the module state and its memory protection
state. This then required ftrace to be split into two functions, see
commit:

  a949ae560a51 ("ftrace/module: Hardcode ftrace_module_init() call into load_module()")

Now that set_all_modules_text_*() is dead and burried, this is no
longer relevant and we can merge the ftrace_module hooks again.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace.h |    2 
 kernel/module.c        |    3 -
 kernel/trace/ftrace.c  |  124 +++++++++++++++++++------------------------------
 3 files changed, 49 insertions(+), 80 deletions(-)

--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -579,7 +579,6 @@ static inline int ftrace_modify_call(str
 extern int ftrace_arch_read_dyn_info(char *buf, int size);
 
 extern int skip_trace(unsigned long ip);
-extern void ftrace_module_init(struct module *mod);
 extern void ftrace_module_enable(struct module *mod);
 extern void ftrace_release_mod(struct module *mod);
 
@@ -590,7 +589,6 @@ static inline int skip_trace(unsigned lo
 static inline int ftrace_force_update(void) { return 0; }
 static inline void ftrace_disable_daemon(void) { }
 static inline void ftrace_enable_daemon(void) { }
-static inline void ftrace_module_init(struct module *mod) { }
 static inline void ftrace_module_enable(struct module *mod) { }
 static inline void ftrace_release_mod(struct module *mod) { }
 static inline int ftrace_text_reserved(const void *start, const void *end)
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3836,9 +3836,6 @@ static int load_module(struct load_info
 
 	dynamic_debug_setup(mod, info->debug, info->num_debug);
 
-	/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
-	ftrace_module_init(mod);
-
 	/* Finally it's fully formed, ready to start executing. */
 	err = complete_formation(mod, info);
 	if (err)
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5564,6 +5564,8 @@ static int ftrace_cmp_ips(const void *a,
 	return 0;
 }
 
+static int referenced_filters(struct dyn_ftrace *rec);
+
 static int ftrace_process_locs(struct module *mod,
 			       unsigned long *start,
 			       unsigned long *end)
@@ -5656,6 +5658,49 @@ static int ftrace_process_locs(struct mo
 	ftrace_update_code(mod, start_pg);
 	if (!mod)
 		local_irq_restore(flags);
+
+#ifdef CONFIG_MODULES
+	if (ftrace_disabled || !mod)
+		goto out_loop;
+
+	do_for_each_ftrace_rec(pg, rec) {
+		int cnt;
+		/*
+		 * do_for_each_ftrace_rec() is a double loop.
+		 * module text shares the pg. If a record is
+		 * not part of this module, then skip this pg,
+		 * which the "break" will do.
+		 */
+		if (!within_module_core(rec->ip, mod) &&
+		    !within_module_init(rec->ip, mod))
+			break;
+
+		cnt = 0;
+
+		/*
+		 * When adding a module, we need to check if tracers are
+		 * currently enabled and if they are, and can trace this record,
+		 * we need to enable the module functions as well as update the
+		 * reference counts for those function records.
+		 */
+		if (ftrace_start_up)
+			cnt += referenced_filters(rec);
+
+		/* This clears FTRACE_FL_DISABLED */
+		rec->flags = cnt;
+
+		if (ftrace_start_up && cnt) {
+			int failed = __ftrace_replace_code(rec, 1);
+			if (failed) {
+				ftrace_bug(failed, rec);
+				goto out_loop;
+			}
+		}
+
+	} while_for_each_ftrace_rec();
+
+ out_loop:
+#endif
 	ret = 0;
  out:
 	mutex_unlock(&ftrace_lock);
@@ -5823,85 +5868,14 @@ void ftrace_release_mod(struct module *m
 
 void ftrace_module_enable(struct module *mod)
 {
-	struct dyn_ftrace *rec;
-	struct ftrace_page *pg;
-
-	mutex_lock(&ftrace_lock);
-
-	if (ftrace_disabled)
-		goto out_unlock;
-
-	/*
-	 * If the tracing is enabled, go ahead and enable the record.
-	 *
-	 * The reason not to enable the record immediately is the
-	 * inherent check of ftrace_make_nop/ftrace_make_call for
-	 * correct previous instructions.  Making first the NOP
-	 * conversion puts the module to the correct state, thus
-	 * passing the ftrace_make_call check.
-	 *
-	 * We also delay this to after the module code already set the
-	 * text to read-only, as we now need to set it back to read-write
-	 * so that we can modify the text.
-	 */
-	if (ftrace_start_up)
-		ftrace_arch_code_modify_prepare();
-
-	do_for_each_ftrace_rec(pg, rec) {
-		int cnt;
-		/*
-		 * do_for_each_ftrace_rec() is a double loop.
-		 * module text shares the pg. If a record is
-		 * not part of this module, then skip this pg,
-		 * which the "break" will do.
-		 */
-		if (!within_module_core(rec->ip, mod) &&
-		    !within_module_init(rec->ip, mod))
-			break;
-
-		cnt = 0;
-
-		/*
-		 * When adding a module, we need to check if tracers are
-		 * currently enabled and if they are, and can trace this record,
-		 * we need to enable the module functions as well as update the
-		 * reference counts for those function records.
-		 */
-		if (ftrace_start_up)
-			cnt += referenced_filters(rec);
-
-		/* This clears FTRACE_FL_DISABLED */
-		rec->flags = cnt;
-
-		if (ftrace_start_up && cnt) {
-			int failed = __ftrace_replace_code(rec, 1);
-			if (failed) {
-				ftrace_bug(failed, rec);
-				goto out_loop;
-			}
-		}
-
-	} while_for_each_ftrace_rec();
-
- out_loop:
-	if (ftrace_start_up)
-		ftrace_arch_code_modify_post_process();
-
- out_unlock:
-	mutex_unlock(&ftrace_lock);
+	if (!(ftrace_disabled || !mod->num_ftrace_callsites)) {
+		ftrace_process_locs(mod, mod->ftrace_callsites,
+				    mod->ftrace_callsites + mod->num_ftrace_callsites);
+	}
 
 	process_cached_mods(mod->name);
 }
 
-void ftrace_module_init(struct module *mod)
-{
-	if (ftrace_disabled || !mod->num_ftrace_callsites)
-		return;
-
-	ftrace_process_locs(mod, mod->ftrace_callsites,
-			    mod->ftrace_callsites + mod->num_ftrace_callsites);
-}
-
 static void save_ftrace_mod_rec(struct ftrace_mod_map *mod_map,
 				struct dyn_ftrace *rec)
 {



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 16/16] ftrace: Merge ftrace_module_{init,enable}()
  2019-10-18  7:35 ` [PATCH v4 16/16] ftrace: Merge ftrace_module_{init,enable}() Peter Zijlstra
@ 2019-10-18  8:20   ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-18  8:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Fri, Oct 18, 2019 at 09:35:41AM +0200, Peter Zijlstra wrote:
> Because of how some architectures used set_all_modules_text_*() there
> was a dependency between the module state and its memory protection
> state. This then required ftrace to be split into two functions, see
> commit:
> 
>   a949ae560a51 ("ftrace/module: Hardcode ftrace_module_init() call into load_module()")
> 
> Now that set_all_modules_text_*() is dead and burried, this is no
> longer relevant and we can merge the ftrace_module hooks again.

NOTE that by also getting rid of the ftrace_arch_code_modify_prepare() /
ftrace_arch_code_modify_post_process() callbacks in the
ftrace_module_enable() callback, both x86 and ARM will use direct poking
instead of doing the alias thing.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface
  2019-10-18  7:35 ` [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface Peter Zijlstra
@ 2019-10-21  8:48   ` Ingo Molnar
  2019-10-21  9:21     ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2019-10-21  8:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu


* Peter Zijlstra <peterz@infradead.org> wrote:

> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -35,18 +35,19 @@ static void bug_at(unsigned char *ip, in
>  	BUG();
>  }
>  
> -static void __jump_label_set_jump_code(struct jump_entry *entry,
> -				       enum jump_label_type type,
> -				       union jump_code_union *code,
> -				       int init)
> +static const void *
> +__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type, int init)
>  {
> +	static union jump_code_union code; /* relies on text_mutex */
>  	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
>  	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
>  	const void *expect;
>  	int line;
>  
> -	code->jump = 0xe9;
> -	code->offset = jump_entry_target(entry) -
> +	lockdep_assert_held(&text_mutex);
> +
> +	code.jump = JMP32_INSN_OPCODE;
> +	code.offset = jump_entry_target(entry) -
>  		       (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>  
>  	if (init) {
> @@ -54,23 +55,23 @@ static void __jump_label_set_jump_code(s
>  	} else if (type == JUMP_LABEL_JMP) {
>  		expect = ideal_nop; line = __LINE__;
>  	} else {
> -		expect = code->code; line = __LINE__;
> +		expect = code.code; line = __LINE__;

Side note: the whole 'line' logic looked weird to me and it obsfuscates 
the logic a bit, and I had to look it up to see what it's about: 
improving the debug output of text-patching crashes.

How about something like the below on top of your queue? We have %phD 
that can nicely print instructions in hex.

Totally untested though.

Thanks,

	Ingo

---
 arch/x86/kernel/jump_label.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

Index: tip/arch/x86/kernel/jump_label.c
===================================================================
--- tip.orig/arch/x86/kernel/jump_label.c
+++ tip/arch/x86/kernel/jump_label.c
@@ -16,14 +16,15 @@
 #include <asm/alternative.h>
 #include <asm/text-patching.h>
 
-static void bug_at(const void *ip, int line)
+static void bug_at(const void *ip, const void *op_expected, const void *op_unexpected)
 {
 	/*
 	 * The location is not an op that we were expecting.
 	 * Something went wrong. Crash the box, as something could be
 	 * corrupting the kernel.
 	 */
-	pr_crit("jump_label: Fatal kernel bug, unexpected op at %pS [%p] (%5ph) %d\n", ip, ip, ip, line);
+	pr_crit("jump_label: Fatal kernel bug, expected op (%*phD), unexpected op (%*phD) at %pS [%p] (%5ph\n",
+		JUMP_LABEL_NOP_SIZE, op_expected, JUMP_LABEL_NOP_SIZE, op_unexpected, ip, ip, ip);
 	BUG();
 }
 
@@ -34,23 +35,21 @@ __jump_label_set_jump_code(struct jump_e
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 	const void *expect, *code;
 	const void *addr, *dest;
-	int line;
 
 	addr = (void *)jump_entry_code(entry);
 	dest = (void *)jump_entry_target(entry);
 
 	code = text_gen_insn(JMP32_INSN_OPCODE, addr, dest);
 
-	if (init) {
-		expect = default_nop; line = __LINE__;
-	} else if (type == JUMP_LABEL_JMP) {
-		expect = ideal_nop; line = __LINE__;
-	} else {
-		expect = code; line = __LINE__;
-	}
+	if (init)
+		expect = default_nop;
+	else if (type == JUMP_LABEL_JMP)
+		expect = ideal_nop;
+	else
+		expect = code;
 
 	if (memcmp(addr, expect, JUMP_LABEL_NOP_SIZE))
-		bug_at(addr, line);
+		bug_at(addr, expect, addr);
 
 	if (type == JUMP_LABEL_NOP)
 		code = ideal_nop;

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len
  2019-10-18  7:35 ` [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len Peter Zijlstra
@ 2019-10-21  8:58   ` Ingo Molnar
  2019-10-21  9:02     ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2019-10-21  8:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu


* Peter Zijlstra <peterz@infradead.org> wrote:

>  	 * Second step: update all but the first byte of the patched range.
>  	 */
>  	for (do_sync = 0, i = 0; i < nr_entries; i++) {
> -		if (tp[i].len - sizeof(int3) > 0) {
> +		int len = text_opcode_size(tp[i].opcode);
> +
> +		if (len - sizeof(int3) > 0) {
>  			text_poke((char *)tp[i].addr + sizeof(int3),
>  				  (const char *)tp[i].text + sizeof(int3),
> -				  tp[i].len - sizeof(int3));
> +				  len - sizeof(int3));
>  			do_sync++;
>  		}

Readability side note: 'sizeof(int3)' is a really weird way to write '1' 
and I had to double check it's not measuring the size of some larger 
entity.

I think it might make sense to just break out INT3_SIZE from 
arch/x86/kernel/kprobes/opt.c into a header, rename it to INS_INT3_SIZE 
and define it to 1, because the opt.c use is pretty obfuscated as well:

  #define INT3_SIZE sizeof(kprobe_opcode_t)

Where kprobe_opcode_t is u8 on x86 (and won't ever be anything else).

?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc
  2019-10-18  7:35 ` [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc Peter Zijlstra
@ 2019-10-21  9:01   ` Ingo Molnar
  2019-10-21  9:25     ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2019-10-21  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu


* Peter Zijlstra <peterz@infradead.org> wrote:

> Employ the fact that all text must be within a s32 displacement of one
> another to shrink the text_poke_loc::addr field. Make it relative to
> _stext.
> 
> This then shrinks struct text_poke_loc to 16 bytes, and consequently
> increases TP_VEC_MAX from 170 to 256.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/alternative.c |   23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -937,7 +937,7 @@ static void do_sync_core(void *info)
>  }
>  
>  struct text_poke_loc {
> -	void *addr;
> +	s32 rel_addr; /* addr := _stext + rel_addr */
>  	s32 rel32;
>  	u8 opcode;
>  	const u8 text[POKE_MAX_OPCODE_SIZE];
> @@ -948,13 +948,18 @@ static struct bp_patching_desc {
>  	int nr_entries;
>  } bp_patching;
>  
> +static inline void *text_poke_addr(struct text_poke_loc *tp)
> +{
> +	return _stext + tp->rel_addr;
> +}

So won't this complicate the life of the big-address-space gcc model 
build patches that for purposes of module randomization are spreading the 
kernel and modules all across the 64-bit address space, where they might 
not necessarily end up within a ~2GB window?

Nothing upstream yet, but I remember such patches ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len
  2019-10-21  8:58   ` Ingo Molnar
@ 2019-10-21  9:02     ` Ingo Molnar
  0 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2019-10-21  9:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> >  	 * Second step: update all but the first byte of the patched range.
> >  	 */
> >  	for (do_sync = 0, i = 0; i < nr_entries; i++) {
> > -		if (tp[i].len - sizeof(int3) > 0) {
> > +		int len = text_opcode_size(tp[i].opcode);
> > +
> > +		if (len - sizeof(int3) > 0) {
> >  			text_poke((char *)tp[i].addr + sizeof(int3),
> >  				  (const char *)tp[i].text + sizeof(int3),
> > -				  tp[i].len - sizeof(int3));
> > +				  len - sizeof(int3));
> >  			do_sync++;
> >  		}
> 
> Readability side note: 'sizeof(int3)' is a really weird way to write '1' 
> and I had to double check it's not measuring the size of some larger 
> entity.
> 
> I think it might make sense to just break out INT3_SIZE from 
> arch/x86/kernel/kprobes/opt.c into a header, rename it to INS_INT3_SIZE 
> and define it to 1, because the opt.c use is pretty obfuscated as well:
> 
>   #define INT3_SIZE sizeof(kprobe_opcode_t)
> 
> Where kprobe_opcode_t is u8 on x86 (and won't ever be anything else).
> 
> ?

Oh, the latter is done in your patch #11 already. Nice!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more)
  2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
                   ` (15 preceding siblings ...)
  2019-10-18  7:35 ` [PATCH v4 16/16] ftrace: Merge ftrace_module_{init,enable}() Peter Zijlstra
@ 2019-10-21  9:09 ` Ingo Molnar
  2019-10-21 13:38   ` Steven Rostedt
  16 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2019-10-21  9:09 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu


* Peter Zijlstra <peterz@infradead.org> wrote:

> Ftrace was one of the last W^X violators (and KLP it seems). These here patches
> move it over to the generic text_poke() interface and thereby get rid of this
> oddity.
> 
> The first 6 or so patches are more or less the same as in v3, except it has the
> bugs fixed that Steve found:
> 
>  - boot time function tracing works
>  - module loading with function tracing works
> 
> Then there's 10 new patches, that go all over the place, mostly inspired by
> staring at code touched by the first 6. That is, there's further ftrace and
> kprobes cleanups, as well fixes for various issues.
> 
> In the end, it will have removed the horrible set_all_modules_text_*()
> interface and reduced the ftrace module loading to a single callback (again).
> 
> The ARM patch is compiled only, I would be much obliged if someone could test
> that.

Ok, this looks all around like a nice improvement to me, and fixes a few 
bugs as well.

Steve, any objections to this series? If not I'll stick it into 
tip:core/kprobes with a tentative v5.5 upstream merge target.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface
  2019-10-21  8:48   ` Ingo Molnar
@ 2019-10-21  9:21     ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-21  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Mon, Oct 21, 2019 at 10:48:02AM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > --- a/arch/x86/kernel/jump_label.c
> > +++ b/arch/x86/kernel/jump_label.c
> > @@ -35,18 +35,19 @@ static void bug_at(unsigned char *ip, in
> >  	BUG();
> >  }
> >  
> > -static void __jump_label_set_jump_code(struct jump_entry *entry,
> > -				       enum jump_label_type type,
> > -				       union jump_code_union *code,
> > -				       int init)
> > +static const void *
> > +__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type, int init)
> >  {
> > +	static union jump_code_union code; /* relies on text_mutex */
> >  	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
> >  	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
> >  	const void *expect;
> >  	int line;
> >  
> > -	code->jump = 0xe9;
> > -	code->offset = jump_entry_target(entry) -
> > +	lockdep_assert_held(&text_mutex);
> > +
> > +	code.jump = JMP32_INSN_OPCODE;
> > +	code.offset = jump_entry_target(entry) -
> >  		       (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
> >  
> >  	if (init) {
> > @@ -54,23 +55,23 @@ static void __jump_label_set_jump_code(s
> >  	} else if (type == JUMP_LABEL_JMP) {
> >  		expect = ideal_nop; line = __LINE__;
> >  	} else {
> > -		expect = code->code; line = __LINE__;
> > +		expect = code.code; line = __LINE__;
> 
> Side note: the whole 'line' logic looked weird to me and it obsfuscates 
> the logic a bit, and I had to look it up to see what it's about: 
> improving the debug output of text-patching crashes.
> 
> How about something like the below on top of your queue? We have %phD 
> that can nicely print instructions in hex.

I have a patch like that somewhere; see here:

  https://lkml.kernel.org/r/20191007090012.00469193.6@infradead.org

But yes, the __LINE__ thing is mostly about identifying which case it is
and I suppose we can infer that when we have the expected text printed
too.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc
  2019-10-21  9:01   ` Ingo Molnar
@ 2019-10-21  9:25     ` Peter Zijlstra
  2019-10-21  9:33       ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-21  9:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Mon, Oct 21, 2019 at 11:01:04AM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Employ the fact that all text must be within a s32 displacement of one
> > another to shrink the text_poke_loc::addr field. Make it relative to
> > _stext.
> > 
> > This then shrinks struct text_poke_loc to 16 bytes, and consequently
> > increases TP_VEC_MAX from 170 to 256.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/alternative.c |   23 ++++++++++++++---------
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> > 
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -937,7 +937,7 @@ static void do_sync_core(void *info)
> >  }
> >  
> >  struct text_poke_loc {
> > -	void *addr;
> > +	s32 rel_addr; /* addr := _stext + rel_addr */
> >  	s32 rel32;
> >  	u8 opcode;
> >  	const u8 text[POKE_MAX_OPCODE_SIZE];
> > @@ -948,13 +948,18 @@ static struct bp_patching_desc {
> >  	int nr_entries;
> >  } bp_patching;
> >  
> > +static inline void *text_poke_addr(struct text_poke_loc *tp)
> > +{
> > +	return _stext + tp->rel_addr;
> > +}
> 
> So won't this complicate the life of the big-address-space gcc model 
> build patches that for purposes of module randomization are spreading the 
> kernel and modules all across the 64-bit address space, where they might 
> not necessarily end up within a ~2GB window?
> 
> Nothing upstream yet, but I remember such patches ...

IIRC what they were doing was allow moving the 2G range further out into
the address space, such that absolute addresses no longer fit in u32 (as
they do now), but they keep the relative displacement in s32. Otherwise
we'll end up with PLT entries all over the place. That is, if we break
the s32 displacement, CALL/JMP.d32 will not longer be able to reach any
other code and we need intermediate trampolines to help them along,
which is pretty shit.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc
  2019-10-21  9:25     ` Peter Zijlstra
@ 2019-10-21  9:33       ` Ingo Molnar
  0 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2019-10-21  9:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Oct 21, 2019 at 11:01:04AM +0200, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > Employ the fact that all text must be within a s32 displacement of one
> > > another to shrink the text_poke_loc::addr field. Make it relative to
> > > _stext.
> > > 
> > > This then shrinks struct text_poke_loc to 16 bytes, and consequently
> > > increases TP_VEC_MAX from 170 to 256.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/kernel/alternative.c |   23 ++++++++++++++---------
> > >  1 file changed, 14 insertions(+), 9 deletions(-)
> > > 
> > > --- a/arch/x86/kernel/alternative.c
> > > +++ b/arch/x86/kernel/alternative.c
> > > @@ -937,7 +937,7 @@ static void do_sync_core(void *info)
> > >  }
> > >  
> > >  struct text_poke_loc {
> > > -	void *addr;
> > > +	s32 rel_addr; /* addr := _stext + rel_addr */
> > >  	s32 rel32;
> > >  	u8 opcode;
> > >  	const u8 text[POKE_MAX_OPCODE_SIZE];
> > > @@ -948,13 +948,18 @@ static struct bp_patching_desc {
> > >  	int nr_entries;
> > >  } bp_patching;
> > >  
> > > +static inline void *text_poke_addr(struct text_poke_loc *tp)
> > > +{
> > > +	return _stext + tp->rel_addr;
> > > +}
> > 
> > So won't this complicate the life of the big-address-space gcc model 
> > build patches that for purposes of module randomization are spreading the 
> > kernel and modules all across the 64-bit address space, where they might 
> > not necessarily end up within a ~2GB window?
> > 
> > Nothing upstream yet, but I remember such patches ...
> 
> IIRC what they were doing was allow moving the 2G range further out 
> into the address space, such that absolute addresses no longer fit in 
> u32 (as they do now), but they keep the relative displacement in s32. 
> Otherwise we'll end up with PLT entries all over the place. That is, if 
> we break the s32 displacement, CALL/JMP.d32 will not longer be able to 
> reach any other code and we need intermediate trampolines to help them 
> along, which is pretty shit.

Ok, indeed, that's fair enough.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more)
  2019-10-21  9:09 ` [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Ingo Molnar
@ 2019-10-21 13:38   ` Steven Rostedt
  0 siblings, 0 replies; 70+ messages in thread
From: Steven Rostedt @ 2019-10-21 13:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, x86, linux-kernel, mhiramat, bristot, jbaron,
	torvalds, tglx, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Mon, 21 Oct 2019 11:09:22 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> Steve, any objections to this series? If not I'll stick it into 
> tip:core/kprobes with a tentative v5.5 upstream merge target.

I'm going to be doing work this merge window that is going to be
directly conflicting with a lot of changes in this series. If one of us
can simply have a separate branch with just this series based off of
v5.4-rc3, it would work better.

I could add it to mine, or for you. But I would still need to test this
in my code first. Peter's last series failed my test suite.

-- Steve

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-18  7:35 ` [PATCH v4 15/16] module: Move where we mark modules RO,X Peter Zijlstra
@ 2019-10-21 13:53   ` Josh Poimboeuf
  2019-10-21 14:14     ` Peter Zijlstra
  2019-10-22  2:21   ` Steven Rostedt
  1 sibling, 1 reply; 70+ messages in thread
From: Josh Poimboeuf @ 2019-10-21 13:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> Now that set_all_modules_text_*() is gone, nothing depends on the
> relation between ->state = COMING and the protection state anymore.
> This enables moving the protection changes later, such that the COMING
> notifier callbacks can more easily modify the text.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Jessica Yu <jeyu@kernel.org>
> ---
>  kernel/module.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
>  	/* This relies on module_mutex for list integrity. */
>  	module_bug_finalize(info->hdr, info->sechdrs, mod);
>  
> -	module_enable_ro(mod, false);
> -	module_enable_nx(mod);
> -	module_enable_x(mod);
> -
>  	/* Mark state as coming so strong_try_module_get() ignores us,
>  	 * but kallsyms etc. can see us. */
>  	mod->state = MODULE_STATE_COMING;
> @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
>  	if (err)
>  		goto bug_cleanup;
>  
> +	module_enable_ro(mod, false);
> +	module_enable_nx(mod);
> +	module_enable_x(mod);
> +
>  	/* Module is ready to execute: parsing args may do that. */
>  	after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
>  				  -32768, 32767, mod,

[ Sorry if this was already discussed, I still have a large backlog. ]

Doesn't livepatch code also need to be modified?  We have:

prepare_coming_module()
	klp_module_coming()
		klp_init_object_loaded()
			module_disable_ro()
			...
			module_enable_ro()

which is done right before the above patch does module_enable_ro().

We could remove the disable-RO from that case, though we'd still need it
for another case (late module patching).

-- 
Josh


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-21 13:53   ` Josh Poimboeuf
@ 2019-10-21 14:14     ` Peter Zijlstra
  2019-10-21 15:34       ` Peter Zijlstra
  2019-10-23 11:48       ` Peter Zijlstra
  0 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-21 14:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> > Now that set_all_modules_text_*() is gone, nothing depends on the
> > relation between ->state = COMING and the protection state anymore.
> > This enables moving the protection changes later, such that the COMING
> > notifier callbacks can more easily modify the text.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Jessica Yu <jeyu@kernel.org>
> > ---
> >  kernel/module.c |    8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
> >  	/* This relies on module_mutex for list integrity. */
> >  	module_bug_finalize(info->hdr, info->sechdrs, mod);
> >  
> > -	module_enable_ro(mod, false);
> > -	module_enable_nx(mod);
> > -	module_enable_x(mod);
> > -
> >  	/* Mark state as coming so strong_try_module_get() ignores us,
> >  	 * but kallsyms etc. can see us. */
> >  	mod->state = MODULE_STATE_COMING;
> > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
> >  	if (err)
> >  		goto bug_cleanup;
> >  
> > +	module_enable_ro(mod, false);
> > +	module_enable_nx(mod);
> > +	module_enable_x(mod);
> > +
> >  	/* Module is ready to execute: parsing args may do that. */
> >  	after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> >  				  -32768, 32767, mod,
> 
> [ Sorry if this was already discussed, I still have a large backlog. ]
> 
> Doesn't livepatch code also need to be modified?  We have:

Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
arm64/ftrace and klp are the only two users of that function (outside of
module.c) and Mark was already writing a patch for arm64.

Means these last two patches need to wait a little until we've fixed
those.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 11/16] x86/kprobes: Convert to text-patching.h
  2019-10-18  7:35 ` [PATCH v4 11/16] x86/kprobes: Convert to text-patching.h Peter Zijlstra
@ 2019-10-21 14:57   ` Masami Hiramatsu
  0 siblings, 0 replies; 70+ messages in thread
From: Masami Hiramatsu @ 2019-10-21 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

Hi Peter,

On Fri, 18 Oct 2019 09:35:36 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Convert kprobes to the new text-poke naming.
> 

Looks good to me :)

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks,

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/kprobes.h       |   14 +++--------
>  arch/x86/include/asm/text-patching.h |    2 +
>  arch/x86/kernel/kprobes/core.c       |   18 +++++++-------
>  arch/x86/kernel/kprobes/opt.c        |   44 ++++++++++++++++-------------------
>  4 files changed, 37 insertions(+), 41 deletions(-)
> 
> --- a/arch/x86/include/asm/kprobes.h
> +++ b/arch/x86/include/asm/kprobes.h
> @@ -11,12 +11,11 @@
>  
>  #include <asm-generic/kprobes.h>
>  
> -#define BREAKPOINT_INSTRUCTION	0xcc
> -
>  #ifdef CONFIG_KPROBES
>  #include <linux/types.h>
>  #include <linux/ptrace.h>
>  #include <linux/percpu.h>
> +#include <asm/text-patching.h>
>  #include <asm/insn.h>
>  
>  #define  __ARCH_WANT_KPROBES_INSN_SLOT
> @@ -25,10 +24,7 @@ struct pt_regs;
>  struct kprobe;
>  
>  typedef u8 kprobe_opcode_t;
> -#define RELATIVEJUMP_OPCODE 0xe9
> -#define RELATIVEJUMP_SIZE 5
> -#define RELATIVECALL_OPCODE 0xe8
> -#define RELATIVE_ADDR_SIZE 4
> +
>  #define MAX_STACK_SIZE 64
>  #define CUR_STACK_SIZE(ADDR) \
>  	(current_top_of_stack() - (unsigned long)(ADDR))
> @@ -43,11 +39,11 @@ extern __visible kprobe_opcode_t optprob
>  extern __visible kprobe_opcode_t optprobe_template_val[];
>  extern __visible kprobe_opcode_t optprobe_template_call[];
>  extern __visible kprobe_opcode_t optprobe_template_end[];
> -#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
> +#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + DISP32_SIZE)
>  #define MAX_OPTINSN_SIZE 				\
>  	(((unsigned long)optprobe_template_end -	\
>  	  (unsigned long)optprobe_template_entry) +	\
> -	 MAX_OPTIMIZED_LENGTH + RELATIVEJUMP_SIZE)
> +	 MAX_OPTIMIZED_LENGTH + JMP32_INSN_SIZE)
>  
>  extern const int kretprobe_blacklist_size;
>  
> @@ -73,7 +69,7 @@ struct arch_specific_insn {
>  
>  struct arch_optimized_insn {
>  	/* copy of the original instructions */
> -	kprobe_opcode_t copied_insn[RELATIVE_ADDR_SIZE];
> +	kprobe_opcode_t copied_insn[DISP32_SIZE];
>  	/* detour code buffer */
>  	kprobe_opcode_t *insn;
>  	/* the size of instructions copied to detour code buffer */
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -61,6 +61,8 @@ extern void text_poke_finish(void);
>  #define JMP8_INSN_SIZE		2
>  #define JMP8_INSN_OPCODE	0xEB
>  
> +#define DISP32_SIZE		4
> +
>  static inline int text_opcode_size(u8 opcode)
>  {
>  	int size = 0;
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -119,14 +119,14 @@ __synthesize_relative_insn(void *dest, v
>  /* Insert a jump instruction at address 'from', which jumps to address 'to'.*/
>  void synthesize_reljump(void *dest, void *from, void *to)
>  {
> -	__synthesize_relative_insn(dest, from, to, RELATIVEJUMP_OPCODE);
> +	__synthesize_relative_insn(dest, from, to, JMP32_INSN_OPCODE);
>  }
>  NOKPROBE_SYMBOL(synthesize_reljump);
>  
>  /* Insert a call instruction at address 'from', which calls address 'to'.*/
>  void synthesize_relcall(void *dest, void *from, void *to)
>  {
> -	__synthesize_relative_insn(dest, from, to, RELATIVECALL_OPCODE);
> +	__synthesize_relative_insn(dest, from, to, CALL_INSN_OPCODE);
>  }
>  NOKPROBE_SYMBOL(synthesize_relcall);
>  
> @@ -301,7 +301,7 @@ static int can_probe(unsigned long paddr
>  		 * Another debugging subsystem might insert this breakpoint.
>  		 * In that case, we can't recover it.
>  		 */
> -		if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
> +		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
>  			return 0;
>  		addr += insn.length;
>  	}
> @@ -352,7 +352,7 @@ int __copy_instruction(u8 *dest, u8 *src
>  	insn_get_length(insn);
>  
>  	/* Another subsystem puts a breakpoint, failed to recover */
> -	if (insn->opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
> +	if (insn->opcode.bytes[0] == INT3_INSN_OPCODE)
>  		return 0;
>  
>  	/* We should not singlestep on the exception masking instructions */
> @@ -396,14 +396,14 @@ static int prepare_boost(kprobe_opcode_t
>  	int len = insn->length;
>  
>  	if (can_boost(insn, p->addr) &&
> -	    MAX_INSN_SIZE - len >= RELATIVEJUMP_SIZE) {
> +	    MAX_INSN_SIZE - len >= JMP32_INSN_SIZE) {
>  		/*
>  		 * These instructions can be executed directly if it
>  		 * jumps back to correct address.
>  		 */
>  		synthesize_reljump(buf + len, p->ainsn.insn + len,
>  				   p->addr + insn->length);
> -		len += RELATIVEJUMP_SIZE;
> +		len += JMP32_INSN_SIZE;
>  		p->ainsn.boostable = true;
>  	} else {
>  		p->ainsn.boostable = false;
> @@ -497,7 +497,7 @@ int arch_prepare_kprobe(struct kprobe *p
>  
>  void arch_arm_kprobe(struct kprobe *p)
>  {
> -	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
> +	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
>  }
>  
>  void arch_disarm_kprobe(struct kprobe *p)
> @@ -605,7 +605,7 @@ static void setup_singlestep(struct kpro
>  	regs->flags |= X86_EFLAGS_TF;
>  	regs->flags &= ~X86_EFLAGS_IF;
>  	/* single step inline if the instruction is an int3 */
> -	if (p->opcode == BREAKPOINT_INSTRUCTION)
> +	if (p->opcode == INT3_INSN_OPCODE)
>  		regs->ip = (unsigned long)p->addr;
>  	else
>  		regs->ip = (unsigned long)p->ainsn.insn;
> @@ -691,7 +691,7 @@ int kprobe_int3_handler(struct pt_regs *
>  				reset_current_kprobe();
>  			return 1;
>  		}
> -	} else if (*addr != BREAKPOINT_INSTRUCTION) {
> +	} else if (*addr != INT3_INSN_OPCODE) {
>  		/*
>  		 * The breakpoint instruction was removed right
>  		 * after we hit it.  Another cpu has removed
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -38,7 +38,7 @@ unsigned long __recover_optprobed_insn(k
>  	long offs;
>  	int i;
>  
> -	for (i = 0; i < RELATIVEJUMP_SIZE; i++) {
> +	for (i = 0; i < JMP32_INSN_SIZE; i++) {
>  		kp = get_kprobe((void *)addr - i);
>  		/* This function only handles jump-optimized kprobe */
>  		if (kp && kprobe_optimized(kp)) {
> @@ -62,10 +62,10 @@ unsigned long __recover_optprobed_insn(k
>  
>  	if (addr == (unsigned long)kp->addr) {
>  		buf[0] = kp->opcode;
> -		memcpy(buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
> +		memcpy(buf + 1, op->optinsn.copied_insn, DISP32_SIZE);
>  	} else {
>  		offs = addr - (unsigned long)kp->addr - 1;
> -		memcpy(buf, op->optinsn.copied_insn + offs, RELATIVE_ADDR_SIZE - offs);
> +		memcpy(buf, op->optinsn.copied_insn + offs, DISP32_SIZE - offs);
>  	}
>  
>  	return (unsigned long)buf;
> @@ -141,8 +141,6 @@ STACK_FRAME_NON_STANDARD(optprobe_templa
>  #define TMPL_END_IDX \
>  	((long)optprobe_template_end - (long)optprobe_template_entry)
>  
> -#define INT3_SIZE sizeof(kprobe_opcode_t)
> -
>  /* Optimized kprobe call back function: called from optinsn */
>  static void
>  optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
> @@ -162,7 +160,7 @@ optimized_callback(struct optimized_kpro
>  		regs->cs |= get_kernel_rpl();
>  		regs->gs = 0;
>  #endif
> -		regs->ip = (unsigned long)op->kp.addr + INT3_SIZE;
> +		regs->ip = (unsigned long)op->kp.addr + INT3_INSN_SIZE;
>  		regs->orig_ax = ~0UL;
>  
>  		__this_cpu_write(current_kprobe, &op->kp);
> @@ -179,7 +177,7 @@ static int copy_optimized_instructions(u
>  	struct insn insn;
>  	int len = 0, ret;
>  
> -	while (len < RELATIVEJUMP_SIZE) {
> +	while (len < JMP32_INSN_SIZE) {
>  		ret = __copy_instruction(dest + len, src + len, real + len, &insn);
>  		if (!ret || !can_boost(&insn, src + len))
>  			return -EINVAL;
> @@ -271,7 +269,7 @@ static int can_optimize(unsigned long pa
>  		return 0;
>  
>  	/* Check there is enough space for a relative jump. */
> -	if (size - offset < RELATIVEJUMP_SIZE)
> +	if (size - offset < JMP32_INSN_SIZE)
>  		return 0;
>  
>  	/* Decode instructions */
> @@ -290,15 +288,15 @@ static int can_optimize(unsigned long pa
>  		kernel_insn_init(&insn, (void *)recovered_insn, MAX_INSN_SIZE);
>  		insn_get_length(&insn);
>  		/* Another subsystem puts a breakpoint */
> -		if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
> +		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
>  			return 0;
>  		/* Recover address */
>  		insn.kaddr = (void *)addr;
>  		insn.next_byte = (void *)(addr + insn.length);
>  		/* Check any instructions don't jump into target */
>  		if (insn_is_indirect_jump(&insn) ||
> -		    insn_jump_into_range(&insn, paddr + INT3_SIZE,
> -					 RELATIVE_ADDR_SIZE))
> +		    insn_jump_into_range(&insn, paddr + INT3_INSN_SIZE,
> +					 DISP32_SIZE))
>  			return 0;
>  		addr += insn.length;
>  	}
> @@ -374,7 +372,7 @@ int arch_prepare_optimized_kprobe(struct
>  	 * Verify if the address gap is in 2GB range, because this uses
>  	 * a relative jump.
>  	 */
> -	rel = (long)slot - (long)op->kp.addr + RELATIVEJUMP_SIZE;
> +	rel = (long)slot - (long)op->kp.addr + JMP32_INSN_SIZE;
>  	if (abs(rel) > 0x7fffffff) {
>  		ret = -ERANGE;
>  		goto err;
> @@ -401,7 +399,7 @@ int arch_prepare_optimized_kprobe(struct
>  	/* Set returning jmp instruction at the tail of out-of-line buffer */
>  	synthesize_reljump(buf + len, slot + len,
>  			   (u8 *)op->kp.addr + op->optinsn.size);
> -	len += RELATIVEJUMP_SIZE;
> +	len += JMP32_INSN_SIZE;
>  
>  	/* We have to use text_poke() for instruction buffer because it is RO */
>  	text_poke(slot, buf, len);
> @@ -422,22 +420,22 @@ int arch_prepare_optimized_kprobe(struct
>  void arch_optimize_kprobes(struct list_head *oplist)
>  {
>  	struct optimized_kprobe *op, *tmp;
> -	u8 insn_buff[RELATIVEJUMP_SIZE];
> +	u8 insn_buff[JMP32_INSN_SIZE];
>  
>  	list_for_each_entry_safe(op, tmp, oplist, list) {
>  		s32 rel = (s32)((long)op->optinsn.insn -
> -			((long)op->kp.addr + RELATIVEJUMP_SIZE));
> +			((long)op->kp.addr + JMP32_INSN_SIZE));
>  
>  		WARN_ON(kprobe_disabled(&op->kp));
>  
>  		/* Backup instructions which will be replaced by jump address */
> -		memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
> -		       RELATIVE_ADDR_SIZE);
> +		memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_INSN_SIZE,
> +		       DISP32_SIZE);
>  
> -		insn_buff[0] = RELATIVEJUMP_OPCODE;
> +		insn_buff[0] = JMP32_INSN_OPCODE;
>  		*(s32 *)(&insn_buff[1]) = rel;
>  
> -		text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE, NULL);
> +		text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL);
>  
>  		list_del_init(&op->list);
>  	}
> @@ -446,13 +444,13 @@ void arch_optimize_kprobes(struct list_h
>  /* Replace a relative jump with a breakpoint (int3).  */
>  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>  {
> -	u8 insn_buff[RELATIVEJUMP_SIZE];
> +	u8 insn_buff[JMP32_INSN_SIZE];
>  
>  	/* Set int3 to first byte for kprobes */
> -	insn_buff[0] = BREAKPOINT_INSTRUCTION;
> -	memcpy(insn_buff + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
> +	insn_buff[0] = INT3_INSN_OPCODE;
> +	memcpy(insn_buff + 1, op->optinsn.copied_insn, DISP32_SIZE);
>  
> -	text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
> +	text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE,
>  		     text_gen_insn(JMP32_INSN_OPCODE, op->kp.addr, op->optinsn.insn));
>  }
>  
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-21 14:14     ` Peter Zijlstra
@ 2019-10-21 15:34       ` Peter Zijlstra
  2019-10-21 15:44         ` Peter Zijlstra
  2019-10-21 16:11         ` Peter Zijlstra
  2019-10-23 11:48       ` Peter Zijlstra
  1 sibling, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-21 15:34 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> > On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> > > Now that set_all_modules_text_*() is gone, nothing depends on the
> > > relation between ->state = COMING and the protection state anymore.
> > > This enables moving the protection changes later, such that the COMING
> > > notifier callbacks can more easily modify the text.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Cc: Jessica Yu <jeyu@kernel.org>
> > > ---
> > >  kernel/module.c |    8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
> > >  	/* This relies on module_mutex for list integrity. */
> > >  	module_bug_finalize(info->hdr, info->sechdrs, mod);
> > >  
> > > -	module_enable_ro(mod, false);
> > > -	module_enable_nx(mod);
> > > -	module_enable_x(mod);
> > > -
> > >  	/* Mark state as coming so strong_try_module_get() ignores us,
> > >  	 * but kallsyms etc. can see us. */
> > >  	mod->state = MODULE_STATE_COMING;
> > > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
> > >  	if (err)
> > >  		goto bug_cleanup;
> > >  
> > > +	module_enable_ro(mod, false);
> > > +	module_enable_nx(mod);
> > > +	module_enable_x(mod);
> > > +
> > >  	/* Module is ready to execute: parsing args may do that. */
> > >  	after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> > >  				  -32768, 32767, mod,
> > 
> > [ Sorry if this was already discussed, I still have a large backlog. ]
> > 
> > Doesn't livepatch code also need to be modified?  We have:
> 
> Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> arm64/ftrace and klp are the only two users of that function (outside of
> module.c) and Mark was already writing a patch for arm64.
> 
> Means these last two patches need to wait a little until we've fixed
> those.

So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
available on Power and x86, and Power does not have STRICT_MODULE_RWX,
the below should be sufficient.

Completely untested...

---
 arch/x86/kernel/module.c  | 40 +++++++++++++++++++++++++++++++++-------
 include/linux/livepatch.h |  7 +++++++
 kernel/livepatch/core.c   | 14 ++++++++++----
 3 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index d5c72cb877b3..76fa2c5f2d7b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 	return 0;
 }
 #else /*X86_64*/
-int apply_relocate_add(Elf64_Shdr *sechdrs,
+int __apply_relocate_add(Elf64_Shdr *sechdrs,
 		   const char *strtab,
 		   unsigned int symindex,
 		   unsigned int relsec,
-		   struct module *me)
+		   struct module *me,
+		   void *(*write)(void *addr, const void *val, size_t size))
 {
 	unsigned int i;
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		case R_X86_64_64:
 			if (*(u64 *)loc != 0)
 				goto invalid_relocation;
-			*(u64 *)loc = val;
+			write(loc, &val, 8);
 			break;
 		case R_X86_64_32:
 			if (*(u32 *)loc != 0)
 				goto invalid_relocation;
-			*(u32 *)loc = val;
+			write(loc, &val, 4);
 			if (val != *(u32 *)loc)
 				goto overflow;
 			break;
 		case R_X86_64_32S:
 			if (*(s32 *)loc != 0)
 				goto invalid_relocation;
-			*(s32 *)loc = val;
+			write(loc, &val, 4);
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
 			break;
@@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if (*(u32 *)loc != 0)
 				goto invalid_relocation;
 			val -= (u64)loc;
-			*(u32 *)loc = val;
+			write(loc, &val, 4);
 #if 0
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
@@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if (*(u64 *)loc != 0)
 				goto invalid_relocation;
 			val -= (u64)loc;
-			*(u64 *)loc = val;
+			write(loc, &val, 8);
 			break;
 		default:
 			pr_err("%s: Unknown rela relocation: %llu\n",
@@ -215,6 +216,31 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	       me->name);
 	return -ENOEXEC;
 }
+
+int apply_relocate_add(Elf64_Shdr *sechdrs,
+		   const char *strtab,
+		   unsigned int symindex,
+		   unsigned int relsec,
+		   struct module *me)
+{
+	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
+}
+
+int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
+		   const char *strtab,
+		   unsigned int symindex,
+		   unsigned int relsec,
+		   struct module *me)
+{
+	int ret;
+
+	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
+	if (!ret)
+		text_poke_sync();
+
+	return ret;
+}
+
 #endif
 
 int module_finalize(const Elf_Ehdr *hdr,
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 273400814020..5b8c10871b70 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -217,6 +217,13 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long id,
 void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor);
 void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor);
 
+
+extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
+			      const char *strtab,
+			      unsigned int symindex,
+			      unsigned int relsec,
+			      struct module *me);
+
 #else /* !CONFIG_LIVEPATCH */
 
 static inline int klp_module_coming(struct module *mod) { return 0; }
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index ab4a4606d19b..e690519aba31 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -245,6 +245,15 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
 	return 0;
 }
 
+int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
+			      const char *strtab,
+			      unsigned int symindex,
+			      unsigned int relsec,
+			      struct module *me)
+{
+	apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
+}
+
 static int klp_write_object_relocations(struct module *pmod,
 					struct klp_object *obj)
 {
@@ -285,7 +294,7 @@ static int klp_write_object_relocations(struct module *pmod,
 		if (ret)
 			break;
 
-		ret = apply_relocate_add(pmod->klp_info->sechdrs,
+		ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
 					 pmod->core_kallsyms.strtab,
 					 pmod->klp_info->symndx, i, pmod);
 		if (ret)
@@ -721,16 +730,13 @@ static int klp_init_object_loaded(struct klp_patch *patch,
 
 	mutex_lock(&text_mutex);
 
-	module_disable_ro(patch->mod);
 	ret = klp_write_object_relocations(patch->mod, obj);
 	if (ret) {
-		module_enable_ro(patch->mod, true);
 		mutex_unlock(&text_mutex);
 		return ret;
 	}
 
 	arch_klp_init_object_loaded(patch, obj);
-	module_enable_ro(patch->mod, true);
 
 	mutex_unlock(&text_mutex);
 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-21 15:34       ` Peter Zijlstra
@ 2019-10-21 15:44         ` Peter Zijlstra
  2019-10-21 16:11         ` Peter Zijlstra
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-21 15:44 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> the below should be sufficient.

And... s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, so let me
poke at that.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-21 15:34       ` Peter Zijlstra
  2019-10-21 15:44         ` Peter Zijlstra
@ 2019-10-21 16:11         ` Peter Zijlstra
  2019-10-22 11:31           ` Heiko Carstens
  1 sibling, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-21 16:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu,
	heiko.carstens

On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:

> So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> the below should be sufficient.
> 
> Completely untested...

And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even
less tested s390 bits included below.

Heiko, apologies if I completely wrecked it.

The purpose is to remove module_disable_ro()/module_enable_ro() from
livepatch/core.c such that:

 - nothing relies on where in the module loading path module text goes RX.
 - nothing ever has writable text

> ---
>  arch/x86/kernel/module.c  | 40 +++++++++++++++++++++++++++++++++-------
>  include/linux/livepatch.h |  7 +++++++
>  kernel/livepatch/core.c   | 14 ++++++++++----
>  3 files changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index d5c72cb877b3..76fa2c5f2d7b 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs,
>  	return 0;
>  }
>  #else /*X86_64*/
> -int apply_relocate_add(Elf64_Shdr *sechdrs,
> +int __apply_relocate_add(Elf64_Shdr *sechdrs,
>  		   const char *strtab,
>  		   unsigned int symindex,
>  		   unsigned int relsec,
> -		   struct module *me)
> +		   struct module *me,
> +		   void *(*write)(void *addr, const void *val, size_t size))
>  {
>  	unsigned int i;
>  	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
> @@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  		case R_X86_64_64:
>  			if (*(u64 *)loc != 0)
>  				goto invalid_relocation;
> -			*(u64 *)loc = val;
> +			write(loc, &val, 8);
>  			break;
>  		case R_X86_64_32:
>  			if (*(u32 *)loc != 0)
>  				goto invalid_relocation;
> -			*(u32 *)loc = val;
> +			write(loc, &val, 4);
>  			if (val != *(u32 *)loc)
>  				goto overflow;
>  			break;
>  		case R_X86_64_32S:
>  			if (*(s32 *)loc != 0)
>  				goto invalid_relocation;
> -			*(s32 *)loc = val;
> +			write(loc, &val, 4);
>  			if ((s64)val != *(s32 *)loc)
>  				goto overflow;
>  			break;
> @@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  			if (*(u32 *)loc != 0)
>  				goto invalid_relocation;
>  			val -= (u64)loc;
> -			*(u32 *)loc = val;
> +			write(loc, &val, 4);
>  #if 0
>  			if ((s64)val != *(s32 *)loc)
>  				goto overflow;
> @@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  			if (*(u64 *)loc != 0)
>  				goto invalid_relocation;
>  			val -= (u64)loc;
> -			*(u64 *)loc = val;
> +			write(loc, &val, 8);
>  			break;
>  		default:
>  			pr_err("%s: Unknown rela relocation: %llu\n",
> @@ -215,6 +216,31 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  	       me->name);
>  	return -ENOEXEC;
>  }
> +
> +int apply_relocate_add(Elf64_Shdr *sechdrs,
> +		   const char *strtab,
> +		   unsigned int symindex,
> +		   unsigned int relsec,
> +		   struct module *me)
> +{
> +	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
> +}
> +
> +int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
> +		   const char *strtab,
> +		   unsigned int symindex,
> +		   unsigned int relsec,
> +		   struct module *me)
> +{
> +	int ret;
> +
> +	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
> +	if (!ret)
> +		text_poke_sync();
> +
> +	return ret;
> +}
> +
>  #endif
>  
>  int module_finalize(const Elf_Ehdr *hdr,
> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index 273400814020..5b8c10871b70 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -217,6 +217,13 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long id,
>  void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor);
>  void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor);
>  
> +
> +extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
> +			      const char *strtab,
> +			      unsigned int symindex,
> +			      unsigned int relsec,
> +			      struct module *me);
> +
>  #else /* !CONFIG_LIVEPATCH */
>  
>  static inline int klp_module_coming(struct module *mod) { return 0; }
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index ab4a4606d19b..e690519aba31 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -245,6 +245,15 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
>  	return 0;
>  }
>  
> +int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
> +			      const char *strtab,
> +			      unsigned int symindex,
> +			      unsigned int relsec,
> +			      struct module *me)
> +{
> +	apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
> +}
> +
>  static int klp_write_object_relocations(struct module *pmod,
>  					struct klp_object *obj)
>  {
> @@ -285,7 +294,7 @@ static int klp_write_object_relocations(struct module *pmod,
>  		if (ret)
>  			break;
>  
> -		ret = apply_relocate_add(pmod->klp_info->sechdrs,
> +		ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
>  					 pmod->core_kallsyms.strtab,
>  					 pmod->klp_info->symndx, i, pmod);
>  		if (ret)
> @@ -721,16 +730,13 @@ static int klp_init_object_loaded(struct klp_patch *patch,
>  
>  	mutex_lock(&text_mutex);
>  
> -	module_disable_ro(patch->mod);
>  	ret = klp_write_object_relocations(patch->mod, obj);
>  	if (ret) {
> -		module_enable_ro(patch->mod, true);
>  		mutex_unlock(&text_mutex);
>  		return ret;
>  	}
>  
>  	arch_klp_init_object_loaded(patch, obj);
> -	module_enable_ro(patch->mod, true);
>  
>  	mutex_unlock(&text_mutex);
>  

diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index ba8f19bb438b..5f3443098172 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -174,7 +174,8 @@ int module_frob_arch_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
 }
 
 static int apply_rela_bits(Elf_Addr loc, Elf_Addr val,
-			   int sign, int bits, int shift)
+			   int sign, int bits, int shift.
+			   void (*write)(void *addr, const char *data, size_t len))
 {
 	unsigned long umax;
 	long min, max;
@@ -194,26 +195,29 @@ static int apply_rela_bits(Elf_Addr loc, Elf_Addr val,
 			return -ENOEXEC;
 	}
 
-	if (bits == 8)
-		*(unsigned char *) loc = val;
-	else if (bits == 12)
-		*(unsigned short *) loc = (val & 0xfff) |
+	if (bits == 8) {
+		write(loc, &val, 1);
+	} else if (bits == 12) {
+		unsigned short tmp = (val & 0xfff) |
 			(*(unsigned short *) loc & 0xf000);
-	else if (bits == 16)
-		*(unsigned short *) loc = val;
-	else if (bits == 20)
-		*(unsigned int *) loc = (val & 0xfff) << 16 |
-			(val & 0xff000) >> 4 |
-			(*(unsigned int *) loc & 0xf00000ff);
-	else if (bits == 32)
-		*(unsigned int *) loc = val;
-	else if (bits == 64)
-		*(unsigned long *) loc = val;
+		write(loc, &tmp, 2);
+	} else if (bits == 16) {
+		write(loc, &val, 2);
+	} else if (bits == 20) {
+		unsigned int tmp = (val & 0xfff) << 16 |
+			(val & 0xff000) >> 4 | (*(unsigned int *) loc & 0xf00000ff);
+		write(loc, &tmp, 4);
+	} else if (bits == 32) {
+		write(loc, &val, 4);
+	} else if (bits == 64) {
+		write(loc, &val, 8);
+	}
 	return 0;
 }
 
 static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
-		      const char *strtab, struct module *me)
+		      const char *strtab, struct module *me,
+		      void (*write)(void *addr, const void *data, size_t len))
 {
 	struct mod_arch_syminfo *info;
 	Elf_Addr loc, val;
@@ -241,17 +245,17 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
 	case R_390_64:		/* Direct 64 bit.  */
 		val += rela->r_addend;
 		if (r_type == R_390_8)
-			rc = apply_rela_bits(loc, val, 0, 8, 0);
+			rc = apply_rela_bits(loc, val, 0, 8, 0, write);
 		else if (r_type == R_390_12)
-			rc = apply_rela_bits(loc, val, 0, 12, 0);
+			rc = apply_rela_bits(loc, val, 0, 12, 0, write);
 		else if (r_type == R_390_16)
-			rc = apply_rela_bits(loc, val, 0, 16, 0);
+			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
 		else if (r_type == R_390_20)
-			rc = apply_rela_bits(loc, val, 1, 20, 0);
+			rc = apply_rela_bits(loc, val, 1, 20, 0, write);
 		else if (r_type == R_390_32)
-			rc = apply_rela_bits(loc, val, 0, 32, 0);
+			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
 		else if (r_type == R_390_64)
-			rc = apply_rela_bits(loc, val, 0, 64, 0);
+			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
 		break;
 	case R_390_PC16:	/* PC relative 16 bit.  */
 	case R_390_PC16DBL:	/* PC relative 16 bit shifted by 1.  */
@@ -260,15 +264,15 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
 	case R_390_PC64:	/* PC relative 64 bit.	*/
 		val += rela->r_addend - loc;
 		if (r_type == R_390_PC16)
-			rc = apply_rela_bits(loc, val, 1, 16, 0);
+			rc = apply_rela_bits(loc, val, 1, 16, 0, write);
 		else if (r_type == R_390_PC16DBL)
-			rc = apply_rela_bits(loc, val, 1, 16, 1);
+			rc = apply_rela_bits(loc, val, 1, 16, 1, write);
 		else if (r_type == R_390_PC32DBL)
-			rc = apply_rela_bits(loc, val, 1, 32, 1);
+			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
 		else if (r_type == R_390_PC32)
-			rc = apply_rela_bits(loc, val, 1, 32, 0);
+			rc = apply_rela_bits(loc, val, 1, 32, 0, write);
 		else if (r_type == R_390_PC64)
-			rc = apply_rela_bits(loc, val, 1, 64, 0);
+			rc = apply_rela_bits(loc, val, 1, 64, 0, write);
 		break;
 	case R_390_GOT12:	/* 12 bit GOT offset.  */
 	case R_390_GOT16:	/* 16 bit GOT offset.  */
@@ -293,23 +297,23 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
 		val = info->got_offset + rela->r_addend;
 		if (r_type == R_390_GOT12 ||
 		    r_type == R_390_GOTPLT12)
-			rc = apply_rela_bits(loc, val, 0, 12, 0);
+			rc = apply_rela_bits(loc, val, 0, 12, 0, write);
 		else if (r_type == R_390_GOT16 ||
 			 r_type == R_390_GOTPLT16)
-			rc = apply_rela_bits(loc, val, 0, 16, 0);
+			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
 		else if (r_type == R_390_GOT20 ||
 			 r_type == R_390_GOTPLT20)
-			rc = apply_rela_bits(loc, val, 1, 20, 0);
+			rc = apply_rela_bits(loc, val, 1, 20, 0, write);
 		else if (r_type == R_390_GOT32 ||
 			 r_type == R_390_GOTPLT32)
-			rc = apply_rela_bits(loc, val, 0, 32, 0);
+			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
 		else if (r_type == R_390_GOT64 ||
 			 r_type == R_390_GOTPLT64)
-			rc = apply_rela_bits(loc, val, 0, 64, 0);
+			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
 		else if (r_type == R_390_GOTENT ||
 			 r_type == R_390_GOTPLTENT) {
 			val += (Elf_Addr) me->core_layout.base - loc;
-			rc = apply_rela_bits(loc, val, 1, 32, 1);
+			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
 		}
 		break;
 	case R_390_PLT16DBL:	/* 16 bit PC rel. PLT shifted by 1.  */
@@ -357,17 +361,17 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
 			val += rela->r_addend - loc;
 		}
 		if (r_type == R_390_PLT16DBL)
-			rc = apply_rela_bits(loc, val, 1, 16, 1);
+			rc = apply_rela_bits(loc, val, 1, 16, 1, write);
 		else if (r_type == R_390_PLTOFF16)
-			rc = apply_rela_bits(loc, val, 0, 16, 0);
+			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
 		else if (r_type == R_390_PLT32DBL)
-			rc = apply_rela_bits(loc, val, 1, 32, 1);
+			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
 		else if (r_type == R_390_PLT32 ||
 			 r_type == R_390_PLTOFF32)
-			rc = apply_rela_bits(loc, val, 0, 32, 0);
+			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
 		else if (r_type == R_390_PLT64 ||
 			 r_type == R_390_PLTOFF64)
-			rc = apply_rela_bits(loc, val, 0, 64, 0);
+			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
 		break;
 	case R_390_GOTOFF16:	/* 16 bit offset to GOT.  */
 	case R_390_GOTOFF32:	/* 32 bit offset to GOT.  */
@@ -375,20 +379,20 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
 		val = val + rela->r_addend -
 			((Elf_Addr) me->core_layout.base + me->arch.got_offset);
 		if (r_type == R_390_GOTOFF16)
-			rc = apply_rela_bits(loc, val, 0, 16, 0);
+			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
 		else if (r_type == R_390_GOTOFF32)
-			rc = apply_rela_bits(loc, val, 0, 32, 0);
+			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
 		else if (r_type == R_390_GOTOFF64)
-			rc = apply_rela_bits(loc, val, 0, 64, 0);
+			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
 		break;
 	case R_390_GOTPC:	/* 32 bit PC relative offset to GOT. */
 	case R_390_GOTPCDBL:	/* 32 bit PC rel. off. to GOT shifted by 1. */
 		val = (Elf_Addr) me->core_layout.base + me->arch.got_offset +
 			rela->r_addend - loc;
 		if (r_type == R_390_GOTPC)
-			rc = apply_rela_bits(loc, val, 1, 32, 0);
+			rc = apply_rela_bits(loc, val, 1, 32, 0, write);
 		else if (r_type == R_390_GOTPCDBL)
-			rc = apply_rela_bits(loc, val, 1, 32, 1);
+			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
 		break;
 	case R_390_COPY:
 	case R_390_GLOB_DAT:	/* Create GOT entry.  */
@@ -412,9 +416,10 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
 	return 0;
 }
 
-int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
+int __apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 		       unsigned int symindex, unsigned int relsec,
-		       struct module *me)
+		       struct module *me,
+		       void (*write)(void *addr, const void *data, size_t len))
 {
 	Elf_Addr base;
 	Elf_Sym *symtab;
@@ -437,6 +442,25 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 	return 0;
 }
 
+static void memwrite(void *addr, const void *data, size_t len)
+{
+	memcpy(addr, data, len);
+}
+
+int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
+		       unsigned int symindex, unsigned int relsec,
+		       struct module *me)
+{
+	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memwrite);
+}
+
+int klp_apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
+		       unsigned int symindex, unsigned int relsec,
+		       struct module *me)
+{
+	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, s390_kernel_write);
+}
+
 int module_finalize(const Elf_Ehdr *hdr,
 		    const Elf_Shdr *sechdrs,
 		    struct module *me)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 12/16] x86/kprobes: Fix ordering
  2019-10-18  7:35 ` [PATCH v4 12/16] x86/kprobes: Fix ordering Peter Zijlstra
@ 2019-10-22  1:35   ` Masami Hiramatsu
  2019-10-22 10:31     ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Masami Hiramatsu @ 2019-10-22  1:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu,
	paulmck, mathieu.desnoyers

On Fri, 18 Oct 2019 09:35:37 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Kprobes does something like:
> 
> register:
> 	arch_arm_kprobe()
> 	  text_poke(INT3)
>           /* guarantees nothing, INT3 will become visible at some point, maybe */
> 
>         kprobe_optimizer()
> 	  /* guarantees the bytes after INT3 are unused */
> 	  syncrhonize_rcu_tasks();
> 	  text_poke_bp(JMP32);
> 	  /* implies IPI-sync, kprobe really is enabled */
> 
> 
> unregister:
> 	__disarm_kprobe()
> 	  unoptimize_kprobe()
> 	    text_poke_bp(INT3 + tail);
> 	    /* implies IPI-sync, so tail is guaranteed visible */
>           arch_disarm_kprobe()
>             text_poke(old);
> 	    /* guarantees nothing, old will maybe become visible */
> 
> 	synchronize_rcu()
> 
>         free-stuff

Note that this is only for the case of optimized kprobe.
(On some probe points we can not optimize it)

> 
> Now the problem is that on register, the synchronize_rcu_tasks() does
> not imply sufficient to guarantee all CPUs have already observed INT3
> (although in practise this is exceedingly unlikely not to have
> happened) (similar to how MEMBARRIER_CMD_PRIVATE_EXPEDITED does not
> imply MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).

OK, so the sync_core() after int3 is needed to guarantee the probe
is enabled on each core.

> 
> Worse, even if it did, we'd have to do 2 synchronize calls to provide
> the guarantee we're looking for, the first to ensure INT3 is visible,
> the second to guarantee nobody is then still using the instruction
> bytes after INT3.

I think this 2nd guarantee is done by syncrhonize_rcu() if we
put sync_core() after int3. syncrhonize_rcu() ensures that
all cores once scheduled and all interrupts have done.

> 
> Similar on unregister; the synchronize_rcu() between
> __unregister_kprobe_top() and __unregister_kprobe_bottom() does not
> guarantee all CPUs are free of the INT3 (and observe the old text).

I agree with putting sync_core() after putting/removing INT3.

> 
> Therefore, sprinkle some IPI-sync love around. This guarantees that
> all CPUs agree on the text and RCU once again provides the required
> guaranteed.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: hpa@zytor.com
> Cc: paulmck@kernel.org
> Cc: mathieu.desnoyers@efficios.com
> ---
>  arch/x86/include/asm/text-patching.h |    1 +
>  arch/x86/kernel/alternative.c        |   11 ++++++++---
>  arch/x86/kernel/kprobes/core.c       |    2 ++
>  arch/x86/kernel/kprobes/opt.c        |   12 ++++--------
>  4 files changed, 15 insertions(+), 11 deletions(-)
> 
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -42,6 +42,7 @@ extern void text_poke_early(void *addr,
>   * an inconsistent instruction while you patch.
>   */
>  extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void text_poke_sync(void);
>  extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
>  extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -936,6 +936,11 @@ static void do_sync_core(void *info)
>  	sync_core();
>  }
>  
> +void text_poke_sync(void)
> +{
> +	on_each_cpu(do_sync_core, NULL, 1);
> +}
> +
>  struct text_poke_loc {
>  	s32 rel_addr; /* addr := _stext + rel_addr */
>  	s32 rel32;
> @@ -1085,7 +1090,7 @@ static void text_poke_bp_batch(struct te
>  	for (i = 0; i < nr_entries; i++)
>  		text_poke(text_poke_addr(&tp[i]), &int3, sizeof(int3));
>  
> -	on_each_cpu(do_sync_core, NULL, 1);
> +	text_poke_sync();
>  
>  	/*
>  	 * Second step: update all but the first byte of the patched range.
> @@ -1107,7 +1112,7 @@ static void text_poke_bp_batch(struct te
>  		 * not necessary and we'd be safe even without it. But
>  		 * better safe than sorry (plus there's not only Intel).
>  		 */
> -		on_each_cpu(do_sync_core, NULL, 1);
> +		text_poke_sync();
>  	}
>  
>  	/*
> @@ -1123,7 +1128,7 @@ static void text_poke_bp_batch(struct te
>  	}
>  
>  	if (do_sync)
> -		on_each_cpu(do_sync_core, NULL, 1);
> +		text_poke_sync();
>  
>  	/*
>  	 * sync_core() implies an smp_mb() and orders this store against
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -502,11 +502,13 @@ int arch_prepare_kprobe(struct kprobe *p
>  void arch_arm_kprobe(struct kprobe *p)
>  {
>  	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
> +	text_poke_sync();
>  }
>  
>  void arch_disarm_kprobe(struct kprobe *p)
>  {
>  	text_poke(p->addr, &p->opcode, 1);
> +	text_poke_sync();
>  }

This looks good to me.

>  
>  void arch_remove_kprobe(struct kprobe *p)
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -444,14 +444,10 @@ void arch_optimize_kprobes(struct list_h
>  /* Replace a relative jump with a breakpoint (int3).  */
>  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>  {
> -	u8 insn_buff[JMP32_INSN_SIZE];
> -
> -	/* Set int3 to first byte for kprobes */
> -	insn_buff[0] = INT3_INSN_OPCODE;
> -	memcpy(insn_buff + 1, op->optinsn.copied_insn, DISP32_SIZE);
> -
> -	text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE,
> -		     text_gen_insn(JMP32_INSN_OPCODE, op->kp.addr, op->optinsn.insn));
> +	arch_arm_kprobe(&op->kp);
> +	text_poke(op->kp.addr + INT3_INSN_SIZE,
> +		  op->optinsn.copied_insn, DISP32_SIZE);
> +	text_poke_sync();
>  }

For this part, I thought it was same as what text_poke_bp() does.
But, indeed, this looks better (simpler & lighter) than using
text_poke_bp()...

So, in total, this looks good to me.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-18  7:35 ` [PATCH v4 15/16] module: Move where we mark modules RO,X Peter Zijlstra
  2019-10-21 13:53   ` Josh Poimboeuf
@ 2019-10-22  2:21   ` Steven Rostedt
  2019-10-22 20:24     ` Peter Zijlstra
  1 sibling, 1 reply; 70+ messages in thread
From: Steven Rostedt @ 2019-10-22  2:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

[-- Attachment #1: Type: text/plain, Size: 3435 bytes --]

On Fri, 18 Oct 2019 09:35:40 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Now that set_all_modules_text_*() is gone, nothing depends on the
> relation between ->state = COMING and the protection state anymore.
> This enables moving the protection changes later, such that the COMING
> notifier callbacks can more easily modify the text.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Jessica Yu <jeyu@kernel.org>
> ---

This triggered the following bug:

 BUG: unable to handle page fault for address: ffffffffa01501f1
 #PF: supervisor instruction fetch in kernel mode
 #PF: error_code(0x0011) - permissions violation
 PGD 2a16067 P4D 2a16067 PUD 2a17063 PMD c230c067 PTE 80000000c4d74063
 Oops: 0011 [#1] PREEMPT SMP KASAN PTI
 CPU: 2 PID: 638 Comm: systemd-udevd Not tainted 5.4.0-rc3-test+ #98
 ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
 ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20190816/utaddress-213)
 ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
 ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20190816/utaddress-213)
 ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
 lpc_ich: Resource conflict(s) found affecting gpio_ich
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 RIP: 0010:trace_event_define_fields_i2c_result+0x0/0x86 [i2c_core]
 Code: 27 6a 00 48 c7 c2 60 34 13 a0 45 31 c9 48 89 df 41 b8 02 00 00 00 b9 12 00 00 00 48 c7 c6 a0 33 13 a0 e8 02 ec 14 e1 5a 5b c3 <53> 48 c7 c6 20 33 13 a0 b9 08 00 00 00 41
0 6a 00 41
 RSP: 0018:ffff8880cba07950 EFLAGS: 00010246
 RAX: ffffffffa01501f1 RBX: ffffffffa013da40 RCX: ffffffff812a147c
 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffffffa013da40
 RBP: ffffffffa0142be0 R08: ffffed1017fde1ab R09: ffffed1017fde1ab
 R10: ffffed1017fde1aa R11: ffff8880bfef0d57 R12: ffff8880cc22a000
 R13: ffffffffa013da50 R14: ffffffffa0137aa8 R15: ffff8880cd372c60
 FS:  00007f062a48f940(0000) GS:ffff8880d4680000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa01501f1 CR3: 00000000cb632003 CR4: 00000000001606e0
 Call Trace:
  event_create_dir+0x358/0x7b0
  trace_module_notify+0x20b/0x240
  notifier_call_chain+0x6d/0xa0
  blocking_notifier_call_chain+0x5e/0x80
  load_module+0x39a5/0x3d80
  ? module_frob_arch_sections+0x20/0x20
  ? vfs_read+0xcc/0x1b0
  ? kernel_read+0x95/0xb0
  ? kernel_read_file+0x187/0x310
  ? find_held_lock+0xac/0xd0
  ? syscall_trace_enter+0x369/0x590
  ? __do_sys_finit_module+0x11a/0x1b0
  __do_sys_finit_module+0x11a/0x1b0
  ? __ia32_sys_init_module+0x40/0x40
  ? trace_hardirqs_on+0x2e/0x120
  ? ktime_get_coarse_real_ts64+0x6c/0xf0
  ? syscall_trace_enter+0x233/0x590
  ? do_syscall_64+0x14/0x1a0
  do_syscall_64+0x68/0x1a0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Attached config, but it seems to be triggered with modules that have
trace events defined in them.

The trace_event_define_fields_<event>() is defined in
include/trace/trace_events.h and is an init function called by the
trace_events event_create_dir() via the module notifier:
MODULE_STATE_COMING

-- Steve

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 30457 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 12/16] x86/kprobes: Fix ordering
  2019-10-22  1:35   ` Masami Hiramatsu
@ 2019-10-22 10:31     ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-22 10:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: x86, linux-kernel, rostedt, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu, paulmck,
	mathieu.desnoyers

On Tue, Oct 22, 2019 at 10:35:21AM +0900, Masami Hiramatsu wrote:
> On Fri, 18 Oct 2019 09:35:37 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Kprobes does something like:
> > 
> > register:
> > 	arch_arm_kprobe()
> > 	  text_poke(INT3)
> >           /* guarantees nothing, INT3 will become visible at some point, maybe */
> > 
> >         kprobe_optimizer()
> > 	  /* guarantees the bytes after INT3 are unused */
> > 	  syncrhonize_rcu_tasks();
> > 	  text_poke_bp(JMP32);
> > 	  /* implies IPI-sync, kprobe really is enabled */
> > 
> > 
> > unregister:
> > 	__disarm_kprobe()
> > 	  unoptimize_kprobe()
> > 	    text_poke_bp(INT3 + tail);
> > 	    /* implies IPI-sync, so tail is guaranteed visible */
> >           arch_disarm_kprobe()
> >             text_poke(old);
> > 	    /* guarantees nothing, old will maybe become visible */
> > 
> > 	synchronize_rcu()
> > 
> >         free-stuff
> 
> Note that this is only for the case of optimized kprobe.
> (On some probe points we can not optimize it)

Of course..

> > Now the problem is that on register, the synchronize_rcu_tasks() does
> > not imply sufficient to guarantee all CPUs have already observed INT3
> > (although in practise this is exceedingly unlikely not to have
> > happened) (similar to how MEMBARRIER_CMD_PRIVATE_EXPEDITED does not
> > imply MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).
> 
> OK, so the sync_core() after int3 is needed to guarantee the probe
> is enabled on each core.

Indeed, AFAIU the old instruction can stay in I$ or micro-ops caches
which are not (fully) cache coherent.

So without forcing a serializing instruction (CPUID, CR3 write, etc..)
there is no guarantee.

> > Worse, even if it did, we'd have to do 2 synchronize calls to provide
> > the guarantee we're looking for, the first to ensure INT3 is visible,
> > the second to guarantee nobody is then still using the instruction
> > bytes after INT3.
> 
> I think this 2nd guarantee is done by syncrhonize_rcu() if we
> put sync_core() after int3. syncrhonize_rcu() ensures that
> all cores once scheduled and all interrupts have done.

Right, with IPI it all works, without IPI it might be that the INT3 is
still visible when we start synchronize_rcu() and thereby violate the
RCU guarantees.

> > --- a/arch/x86/kernel/kprobes/opt.c
> > +++ b/arch/x86/kernel/kprobes/opt.c
> > @@ -444,14 +444,10 @@ void arch_optimize_kprobes(struct list_h
> >  /* Replace a relative jump with a breakpoint (int3).  */
> >  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
> >  {
> > -	u8 insn_buff[JMP32_INSN_SIZE];
> > -
> > -	/* Set int3 to first byte for kprobes */
> > -	insn_buff[0] = INT3_INSN_OPCODE;
> > -	memcpy(insn_buff + 1, op->optinsn.copied_insn, DISP32_SIZE);
> > -
> > -	text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE,
> > -		     text_gen_insn(JMP32_INSN_OPCODE, op->kp.addr, op->optinsn.insn));
> > +	arch_arm_kprobe(&op->kp);
> > +	text_poke(op->kp.addr + INT3_INSN_SIZE,
> > +		  op->optinsn.copied_insn, DISP32_SIZE);
> > +	text_poke_sync();
> >  }
> 
> For this part, I thought it was same as what text_poke_bp() does.
> But, indeed, this looks better (simpler & lighter) than using
> text_poke_bp()...

Indeed. The reason I wrote it as such is that now the
text_poke_bp(.emulate) argument is unused, which I can go remove in a
future patch.

The whole reason I went looking here is that I was going to write a
comment on how this was correct; it seems I've still to do that..

/me adds the two entries to the TODO list.

> So, in total, this looks good to me.
> 
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-21 16:11         ` Peter Zijlstra
@ 2019-10-22 11:31           ` Heiko Carstens
  2019-10-22 12:31             ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Heiko Carstens @ 2019-10-22 11:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu

On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> 
> > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> > available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> > the below should be sufficient.
> > 
> > Completely untested...
> 
> And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even
> less tested s390 bits included below.
> 
> Heiko, apologies if I completely wrecked it.
> 
> The purpose is to remove module_disable_ro()/module_enable_ro() from
> livepatch/core.c such that:
> 
>  - nothing relies on where in the module loading path module text goes RX.
>  - nothing ever has writable text

Given that Steven reported a crash, I assume I can wait until you
repost a new version of the series, which also includes s390 bits?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-22 11:31           ` Heiko Carstens
@ 2019-10-22 12:31             ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-22 12:31 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Josh Poimboeuf, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu

On Tue, Oct 22, 2019 at 01:31:16PM +0200, Heiko Carstens wrote:
> On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> > 
> > > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> > > available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> > > the below should be sufficient.
> > > 
> > > Completely untested...
> > 
> > And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even
> > less tested s390 bits included below.
> > 
> > Heiko, apologies if I completely wrecked it.
> > 
> > The purpose is to remove module_disable_ro()/module_enable_ro() from
> > livepatch/core.c such that:
> > 
> >  - nothing relies on where in the module loading path module text goes RX.
> >  - nothing ever has writable text
> 
> Given that Steven reported a crash, I assume I can wait until you
> repost a new version of the series, which also includes s390 bits?

His crash is somewhat orthogonal, but yes, I will repost once sorted.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-22  2:21   ` Steven Rostedt
@ 2019-10-22 20:24     ` Peter Zijlstra
  2019-10-22 20:40       ` Steven Rostedt
  2019-10-23 18:52       ` Steven Rostedt
  0 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-22 20:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote:
> On Fri, 18 Oct 2019 09:35:40 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Now that set_all_modules_text_*() is gone, nothing depends on the
> > relation between ->state = COMING and the protection state anymore.
> > This enables moving the protection changes later, such that the COMING
> > notifier callbacks can more easily modify the text.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Jessica Yu <jeyu@kernel.org>
> > ---
> 
> This triggered the following bug:
> 

> The trace_event_define_fields_<event>() is defined in
> include/trace/trace_events.h and is an init function called by the
> trace_events event_create_dir() via the module notifier:
> MODULE_STATE_COMING

The below seems to cure it; and seems to generate identical
events/*/format output (for my .config, with the exception of ID).

It has just one section mismatch report that I'm too tired to look at
just now.

I'm not particularly proud of the "__function__" hack, but it works :/ I
couldn't come up with anything else for [uk]probes which seem to have
dynamic fields and if we're having it then syscall_enter can also make
use of it, the syscall_metadata crud was going to be ugly otherwise.

(also, win on LOC)

---
 fs/xfs/xfs_trace.h             |   4 +-
 include/linux/trace_events.h   |  16 ++++++-
 include/trace/events/filemap.h |   2 +-
 include/trace/trace_events.h   |  64 ++++++++-----------------
 kernel/trace/trace.h           |  31 ++++++------
 kernel/trace/trace_entries.h   |  66 +++++++------------------
 kernel/trace/trace_events.c    |  20 +++++++-
 kernel/trace/trace_export.c    | 106 +++++++++++++++--------------------------
 kernel/trace/trace_kprobe.c    |  12 ++++-
 kernel/trace/trace_syscalls.c  |  48 +++++++------------
 kernel/trace/trace_uprobe.c    |   6 ++-
 11 files changed, 162 insertions(+), 213 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
-		__field(void *, leaf);
-		__field(int, pos);
+		__field(void *, leaf)
+		__field(int, pos)
 		__field(xfs_fileoff_t, startoff)
 		__field(xfs_fsblock_t, startblock)
 		__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..30782e78b91d 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,20 @@ enum trace_reg {
 
 struct trace_event_call;
 
+struct trace_event_fields {
+	const char *type;
+	union {
+		struct {
+			const char *name;
+			const int  size;
+			const int  align;
+			const int  is_signed;
+			const int  filter_type;
+		};
+		int (*define_fields)(struct trace_event_call *);
+	};
+};
+
 struct trace_event_class {
 	const char		*system;
 	void			*probe;
@@ -195,7 +209,7 @@ struct trace_event_class {
 #endif
 	int			(*reg)(struct trace_event_call *event,
 				       enum trace_reg type, void *data);
-	int			(*define_fields)(struct trace_event_call *);
+	struct trace_event_fields *fields_array;
 	struct list_head	*(*get_fields)(struct trace_event_call *);
 	struct list_head	fields;
 	int			(*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
 		TP_ARGS(file, old),
 
 		TP_STRUCT__entry(
-			__field(struct file *, file);
+			__field(struct file *, file)
 			__field(unsigned long, i_ino)
 			__field(dev_t, s_dev)
 			__field(errseq_t, old)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef __field_ext
-#define __field_ext(type, item, filter_type)				\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field_ext(_type, _item, _filter_type) {			\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	.is_signed = is_signed_type(_type), .filter_type = _filter_type },
 
 #undef __field_struct_ext
-#define __field_struct_ext(type, item, filter_type)			\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 0, filter_type);			\
-	if (ret)							\
-		return ret;
+#define __field_struct_ext(_type, _item, _filter_type) {		\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	0, .filter_type = _filter_type },
 
 #undef __field
 #define __field(type, item)	__field_ext(type, item, FILTER_OTHER)
@@ -418,25 +412,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 #define __field_struct(type, item) __field_struct_ext(type, item, FILTER_OTHER)
 
 #undef __array
-#define __array(type, item, len)					\
-	do {								\
-		char *type_str = #type"["__stringify(len)"]";		\
-		BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);			\
-		BUILD_BUG_ON(len <= 0);					\
-		ret = trace_define_field(event_call, type_str, #item,	\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
-		if (ret)						\
-			return ret;					\
-	} while (0);
+#define __array(_type, _item, _len) {					\
+	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
+	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __dynamic_array
-#define __dynamic_array(type, item, len)				       \
-	ret = trace_define_field(event_call, "__data_loc " #type "[]", #item,  \
-				 offsetof(typeof(field), __data_loc_##item),   \
-				 sizeof(field.__data_loc_##item),	       \
-				 is_signed_type(type), FILTER_OTHER);
+#define __dynamic_array(_type, _item, _len) {				\
+	.type = "__data_loc " #_type "[]", .name = #_item,		\
+	.size = 4, .align = 4,						\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, -1)
@@ -446,16 +431,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print)	\
-static int notrace __init						\
-trace_event_define_fields_##call(struct trace_event_call *event_call)	\
-{									\
-	struct trace_event_raw_##call field;				\
-	int ret;							\
-									\
-	tstruct;							\
-									\
-	return ret;							\
-}
+static struct trace_event_fields trace_event_fields_##call[] = {	\
+	tstruct								\
+	{} };
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)
@@ -613,7 +591,7 @@ static inline notrace int trace_event_get_offsets_##call(		\
  *
  * static struct trace_event_class __used event_class_<template> = {
  *	.system			= "<system>",
- *	.define_fields		= trace_event_define_fields_<call>,
+ *	.fields_array		= trace_event_fields_<call>,
  *	.fields			= LIST_HEAD_INIT(event_class_##call.fields),
  *	.raw_init		= trace_event_raw_init,
  *	.probe			= trace_event_raw_event_##call,
@@ -761,7 +739,7 @@ _TRACE_PERF_PROTO(call, PARAMS(proto));					\
 static char print_fmt_##call[] = print;					\
 static struct trace_event_class __used __refdata event_class_##call = { \
 	.system			= TRACE_SYSTEM_STRING,			\
-	.define_fields		= trace_event_define_fields_##call,	\
+	.fields_array		= trace_event_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_##call.fields),\
 	.raw_init		= trace_event_raw_init,			\
 	.probe			= trace_event_raw_event_##call,		\
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d685c61085c0..298a7cacf146 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,9 @@ enum trace_type {
 #undef __field
 #define __field(type, item)		type	item;
 
+#undef __field_fn
+#define __field_fn(type, item)		type	item;
+
 #undef __field_struct
 #define __field_struct(type, item)	__field(type, item)
 
@@ -68,26 +71,22 @@ enum trace_type {
 #define F_STRUCT(args...)		args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 	struct struct_name {						\
 		struct trace_entry	ent;				\
 		tstruct							\
 	}
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	\
-			 filter, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	regfn)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print,	\
-			    filter)					\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter) __packed
+#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print)) __packed
 
 #include "trace_entries.h"
 
@@ -1899,17 +1898,15 @@ extern void tracing_log_err(struct trace_array *tr,
 #define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print)	\
 	extern struct trace_event_call					\
 	__aligned(4) event_##call;
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter)	\
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print)	\
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print, filter) \
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print) \
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #include "trace_entries.h"
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index fc8e97328e54..3e9d81608284 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -61,15 +61,13 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
 	TRACE_FN,
 
 	F_STRUCT(
-		__field(	unsigned long,	ip		)
-		__field(	unsigned long,	parent_ip	)
+		__field_fn(	unsigned long,	ip		)
+		__field_fn(	unsigned long,	parent_ip	)
 	),
 
 	F_printk(" %ps <-- %ps",
 		 (void *)__entry->ip, (void *)__entry->parent_ip),
 
-	FILTER_TRACE_FN,
-
 	perf_ftrace_event_register
 );
 
@@ -84,9 +82,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
 		__field_desc(	int,		graph_ent,	depth		)
 	),
 
-	F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
-
-	FILTER_OTHER
+	F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth)
 );
 
 /* Function return entry */
@@ -97,18 +93,16 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
 	F_STRUCT(
 		__field_struct(	struct ftrace_graph_ret,	ret	)
 		__field_desc(	unsigned long,	ret,		func	)
+		__field_desc(	unsigned long,	ret,		overrun	)
 		__field_desc(	unsigned long long, ret,	calltime)
 		__field_desc(	unsigned long long, ret,	rettime	)
-		__field_desc(	unsigned long,	ret,		overrun	)
 		__field_desc(	int,		ret,		depth	)
 	),
 
 	F_printk("<-- %ps (%d) (start: %llx  end: %llx) over: %d",
 		 (void *)__entry->func, __entry->depth,
 		 __entry->calltime, __entry->rettime,
-		 __entry->depth),
-
-	FILTER_OTHER
+		 __entry->depth)
 );
 
 /*
@@ -137,9 +131,7 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==> %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu),
-
-	FILTER_OTHER
+		 __entry->next_cpu)
 );
 
 /*
@@ -157,9 +149,7 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==+ %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu),
-
-	FILTER_OTHER
+		 __entry->next_cpu)
 );
 
 /*
@@ -183,9 +173,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
 		 (void *)__entry->caller[0], (void *)__entry->caller[1],
 		 (void *)__entry->caller[2], (void *)__entry->caller[3],
 		 (void *)__entry->caller[4], (void *)__entry->caller[5],
-		 (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
-	FILTER_OTHER
+		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
 FTRACE_ENTRY(user_stack, userstack_entry,
@@ -203,9 +191,7 @@ FTRACE_ENTRY(user_stack, userstack_entry,
 		 (void *)__entry->caller[0], (void *)__entry->caller[1],
 		 (void *)__entry->caller[2], (void *)__entry->caller[3],
 		 (void *)__entry->caller[4], (void *)__entry->caller[5],
-		 (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
-	FILTER_OTHER
+		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
 /*
@@ -222,9 +208,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
 	),
 
 	F_printk("%ps: %s",
-		 (void *)__entry->ip, __entry->fmt),
-
-	FILTER_OTHER
+		 (void *)__entry->ip, __entry->fmt)
 );
 
 FTRACE_ENTRY_REG(print, print_entry,
@@ -239,8 +223,6 @@ FTRACE_ENTRY_REG(print, print_entry,
 	F_printk("%ps: %s",
 		 (void *)__entry->ip, __entry->buf),
 
-	FILTER_OTHER,
-
 	ftrace_event_register
 );
 
@@ -254,9 +236,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry,
 	),
 
 	F_printk("id:%04x %08x",
-		 __entry->id, (int)__entry->buf[0]),
-
-	FILTER_OTHER
+		 __entry->id, (int)__entry->buf[0])
 );
 
 FTRACE_ENTRY(bputs, bputs_entry,
@@ -269,9 +249,7 @@ FTRACE_ENTRY(bputs, bputs_entry,
 	),
 
 	F_printk("%ps: %s",
-		 (void *)__entry->ip, __entry->str),
-
-	FILTER_OTHER
+		 (void *)__entry->ip, __entry->str)
 );
 
 FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -283,16 +261,14 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
 		__field_desc(	resource_size_t, rw,	phys	)
 		__field_desc(	unsigned long,	rw,	value	)
 		__field_desc(	unsigned long,	rw,	pc	)
-		__field_desc(	int, 		rw,	map_id	)
+		__field_desc(	int,		rw,	map_id	)
 		__field_desc(	unsigned char,	rw,	opcode	)
 		__field_desc(	unsigned char,	rw,	width	)
 	),
 
 	F_printk("%lx %lx %lx %d %x %x",
 		 (unsigned long)__entry->phys, __entry->value, __entry->pc,
-		 __entry->map_id, __entry->opcode, __entry->width),
-
-	FILTER_OTHER
+		 __entry->map_id, __entry->opcode, __entry->width)
 );
 
 FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -304,15 +280,13 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
 		__field_desc(	resource_size_t, map,	phys	)
 		__field_desc(	unsigned long,	map,	virt	)
 		__field_desc(	unsigned long,	map,	len	)
-		__field_desc(	int, 		map,	map_id	)
+		__field_desc(	int,		map,	map_id	)
 		__field_desc(	unsigned char,	map,	opcode	)
 	),
 
 	F_printk("%lx %lx %lx %d %x",
 		 (unsigned long)__entry->phys, __entry->virt, __entry->len,
-		 __entry->map_id, __entry->opcode),
-
-	FILTER_OTHER
+		 __entry->map_id, __entry->opcode)
 );
 
 
@@ -334,9 +308,7 @@ FTRACE_ENTRY(branch, trace_branch,
 	F_printk("%u:%s:%s (%u)%s",
 		 __entry->line,
 		 __entry->func, __entry->file, __entry->correct,
-		 __entry->constant ? " CONSTANT" : ""),
-
-	FILTER_OTHER
+		 __entry->constant ? " CONSTANT" : "")
 );
 
 
@@ -362,7 +334,5 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
 		 __entry->duration,
 		 __entry->outer_duration,
 		 __entry->nmi_total_ts,
-		 __entry->nmi_count),
-
-	FILTER_OTHER
+		 __entry->nmi_count)
 );
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fba87d10f0c1..508b5299d843 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -24,6 +24,7 @@
 #include <linux/delay.h>
 
 #include <trace/events/sched.h>
+#include <trace/syscall.h>
 
 #include <asm/setup.h>
 
@@ -1990,7 +1991,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 	 */
 	head = trace_get_fields(call);
 	if (list_empty(head)) {
-		ret = call->class->define_fields(call);
+		struct trace_event_fields *field = call->class->fields_array;
+		unsigned int offset = sizeof(struct trace_entry);
+
+		for (; field->type; field++) {
+			if (strcmp(field->type, "__function__") == 0) {
+				ret = field->define_fields(call);
+				break;
+			}
+
+			offset = ALIGN(offset, field->align);
+			ret = trace_define_field(call, field->type, field->name,
+						 offset, field->size,
+						 field->is_signed, field->filter_type);
+			if (ret)
+				break;
+
+			offset += field->size;
+		}
 		if (ret < 0) {
 			pr_warn("Could not initialize trace point events/%s\n",
 				name);
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 45630a76ed3a..6d64c1c19fd5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -29,10 +29,8 @@ static int ftrace_event_register(struct trace_event_call *call,
  * function and thus become accesible via perf.
  */
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
-			 filter, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 /* not needed for this file */
 #undef __field_struct
@@ -41,6 +39,9 @@ static int ftrace_event_register(struct trace_event_call *call,
 #undef __field
 #define __field(type, item)				type item;
 
+#undef __field_fn
+#define __field_fn(type, item)				type item;
+
 #undef __field_desc
 #define __field_desc(type, container, item)		type item;
 
@@ -60,7 +61,7 @@ static int ftrace_event_register(struct trace_event_call *call,
 #define F_printk(fmt, args...) fmt, args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 struct ____ftrace_##name {						\
 	tstruct								\
 };									\
@@ -73,76 +74,46 @@ static void __always_unused ____ftrace_check_##name(void)		\
 }
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter)	\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print)		\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #include "trace_entries.h"
 
+#undef __field_ext
+#define __field_ext(_type, _item, _filter_type) {			\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	is_signed_type(_type), .filter_type = _filter_type },
+
 #undef __field
-#define __field(type, item)						\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)
+
+#undef __field_fn
+#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)
 
 #undef __field_desc
-#define __field_desc(type, container, item)	\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field),		\
-					  container.item),		\
-				 sizeof(field.container.item),		\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)
 
 #undef __array
-#define __array(type, item, len)					\
-	do {								\
-		char *type_str = #type"["__stringify(len)"]";		\
-		BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);			\
-		ret = trace_define_field(event_call, type_str, #item,	\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-		if (ret)						\
-			return ret;					\
-	} while (0);
+#define __array(_type, _item, _len) {					\
+	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
+	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
+	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __array_desc
-#define __array_desc(type, container, item, len)			\
-	BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);				\
-	ret = trace_define_field(event_call, #type "[" #len "]", #item,	\
-				 offsetof(typeof(field),		\
-					  container.item),		\
-				 sizeof(field.container.item),		\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)
 
 #undef __dynamic_array
-#define __dynamic_array(type, item)					\
-	ret = trace_define_field(event_call, #type "[]", #item,  \
-				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), filter_type);\
-	if (ret)							\
-		return ret;
+#define __dynamic_array(_type, _item) {					\
+	.type = #_type "[]", .name = #_item,				\
+	.size = 0, .align = __alignof__(_type),				\
+	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
-static int __init							\
-ftrace_define_fields_##name(struct trace_event_call *event_call)	\
-{									\
-	struct struct_name field;					\
-	int ret;							\
-	int filter_type = filter;					\
-									\
-	tstruct;							\
-									\
-	return ret;							\
-}
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
+static struct trace_event_fields ftrace_event_fields_##name[] = {	\
+	tstruct								\
+	{} };
 
 #include "trace_entries.h"
 
@@ -152,6 +123,9 @@ ftrace_define_fields_##name(struct trace_event_call *event_call)	\
 #undef __field
 #define __field(type, item)
 
+#undef __field_fn
+#define __field_fn(type, item)
+
 #undef __field_desc
 #define __field_desc(type, container, item)
 
@@ -168,12 +142,10 @@ ftrace_define_fields_##name(struct trace_event_call *event_call)	\
 #define F_printk(fmt, args...) __stringify(fmt) ", "  __stringify(args)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
-			 regfn)						\
-									\
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
 struct trace_event_class __refdata event_class_ftrace_##call = {	\
 	.system			= __stringify(TRACE_SYSTEM),		\
-	.define_fields		= ftrace_define_fields_##call,		\
+	.fields_array		= ftrace_event_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
 	.reg			= regfn,				\
 };									\
@@ -191,9 +163,9 @@ struct trace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter)	\
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
 	FTRACE_ENTRY_REG(call, struct_name, etype,			\
-			 PARAMS(tstruct), PARAMS(print), filter, NULL)
+			 PARAMS(tstruct), PARAMS(print), NULL)
 
 bool ftrace_event_is_function(struct trace_event_call *call)
 {
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1552a95c743b..618267647885 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1540,10 +1540,18 @@ static inline void init_trace_event_call(struct trace_kprobe *tk)
 
 	if (trace_kprobe_is_return(tk)) {
 		call->event.funcs = &kretprobe_funcs;
-		call->class->define_fields = kretprobe_event_define_fields;
+		call->class->fields_array = (struct trace_event_fields[]) {
+			{ .type = "__function__",
+			  .define_fields = kretprobe_event_define_fields },
+			{}
+		};
 	} else {
 		call->event.funcs = &kprobe_funcs;
-		call->class->define_fields = kprobe_event_define_fields;
+		call->class->fields_array = (struct trace_event_fields[]) {
+			{ .type = "__function__",
+			  .define_fields = kprobe_event_define_fields },
+			{}
+		};
 	}
 
 	call->flags = TRACE_EVENT_FL_KPROBE;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index fa8fbff736d6..766557a99385 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -198,11 +198,10 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
 
 extern char *__bad_type_size(void);
 
-#define SYSCALL_FIELD(type, field, name)				\
-	sizeof(type) != sizeof(trace.field) ?				\
-		__bad_type_size() :					\
-		#type, #name, offsetof(typeof(trace), field),		\
-		sizeof(trace.field), is_signed_type(type)
+#define SYSCALL_FIELD(_type, _name) {					\
+	.type = #_type, .name = #_name,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }
 
 static int __init
 __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -269,42 +268,22 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
-	int ret;
-	int i;
 	int offset = offsetof(typeof(trace), args);
-
-	ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
-				 FILTER_OTHER);
-	if (ret)
-		return ret;
+	int ret, i;
 
 	for (i = 0; i < meta->nb_args; i++) {
 		ret = trace_define_field(call, meta->types[i],
 					 meta->args[i], offset,
 					 sizeof(unsigned long), 0,
 					 FILTER_OTHER);
+		if (ret)
+			break;
 		offset += sizeof(unsigned long);
 	}
 
 	return ret;
 }
 
-static int __init syscall_exit_define_fields(struct trace_event_call *call)
-{
-	struct syscall_trace_exit trace;
-	int ret;
-
-	ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
-				 FILTER_OTHER);
-	if (ret)
-		return ret;
-
-	ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
-				 FILTER_OTHER);
-
-	return ret;
-}
-
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
 	struct trace_array *tr = data;
@@ -513,7 +492,12 @@ struct trace_event_functions exit_syscall_print_funcs = {
 struct trace_event_class __refdata event_class_syscall_enter = {
 	.system		= "syscalls",
 	.reg		= syscall_enter_register,
-	.define_fields	= syscall_enter_define_fields,
+	.fields_array	= (struct trace_event_fields[]) {
+		SYSCALL_FIELD(int, __syscall_nr),
+		{ .type = "__function__",
+		  .define_fields = syscall_enter_define_fields },
+		{}
+	},
 	.get_fields	= syscall_get_enter_fields,
 	.raw_init	= init_syscall_trace,
 };
@@ -521,7 +505,11 @@ struct trace_event_class __refdata event_class_syscall_enter = {
 struct trace_event_class __refdata event_class_syscall_exit = {
 	.system		= "syscalls",
 	.reg		= syscall_exit_register,
-	.define_fields	= syscall_exit_define_fields,
+	.fields_array	= (struct trace_event_fields[]){
+		SYSCALL_FIELD(int, __syscall_nr),
+		SYSCALL_FIELD(long, ret),
+		{}
+	},
 	.fields		= LIST_HEAD_INIT(event_class_syscall_exit.fields),
 	.raw_init	= init_syscall_trace,
 };
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 352073d36585..9aa866c62354 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1512,7 +1512,11 @@ static inline void init_trace_event_call(struct trace_uprobe *tu)
 	struct trace_event_call *call = trace_probe_event_call(&tu->tp);
 
 	call->event.funcs = &uprobe_funcs;
-	call->class->define_fields = uprobe_event_define_fields;
+	call->class->fields_array = (struct trace_event_fields[]) {
+		{ .type = "__function__",
+		  .define_fields = uprobe_event_define_fields },
+		{}
+	};
 
 	call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
 	call->class->reg = trace_uprobe_register;

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-22 20:24     ` Peter Zijlstra
@ 2019-10-22 20:40       ` Steven Rostedt
  2019-10-23  9:07         ` Peter Zijlstra
  2019-10-23 18:52       ` Steven Rostedt
  1 sibling, 1 reply; 70+ messages in thread
From: Steven Rostedt @ 2019-10-22 20:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Tue, 22 Oct 2019 22:24:01 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> The below seems to cure it; and seems to generate identical
> events/*/format output (for my .config, with the exception of ID).
> 
> It has just one section mismatch report that I'm too tired to look at
> just now.

Thanks, I'll try to take a look at it tomorrow.

> 
> I'm not particularly proud of the "__function__" hack, but it works :/ I

If anything, that should be defined as a macro:

#define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__"

And use that to test?

> couldn't come up with anything else for [uk]probes which seem to have
> dynamic fields and if we're having it then syscall_enter can also make
> use of it, the syscall_metadata crud was going to be ugly otherwise.
> 
> (also, win on LOC)

I'm more worried about text/data bloat. But if anything, we may be able
to deal with that later.

-- Steve

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-22 20:40       ` Steven Rostedt
@ 2019-10-23  9:07         ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-23  9:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Tue, Oct 22, 2019 at 04:40:23PM -0400, Steven Rostedt wrote:
> On Tue, 22 Oct 2019 22:24:01 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > I'm not particularly proud of the "__function__" hack, but it works :/ I
> 
> If anything, that should be defined as a macro:
> 
> #define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__"
> 
> And use that to test?

Possibly, also, we should probably start with a character that C doesn't
allow in typenames, like '$'.

That way we can have a much shorter string and still be certain it will
never conflict; "$func" comes to mind.

> > couldn't come up with anything else for [uk]probes which seem to have
> > dynamic fields and if we're having it then syscall_enter can also make
> > use of it, the syscall_metadata crud was going to be ugly otherwise.
> > 
> > (also, win on LOC)
> 
> I'm more worried about text/data bloat. But if anything, we may be able
> to deal with that later.

We use almost the exact same data the function would've used, except we
don't have the actual function. I just don't see how it can be more.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-21 14:14     ` Peter Zijlstra
  2019-10-21 15:34       ` Peter Zijlstra
@ 2019-10-23 11:48       ` Peter Zijlstra
  2019-10-23 15:16         ` Peter Zijlstra
  2019-10-23 17:00         ` Josh Poimboeuf
  1 sibling, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-23 11:48 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:

> > Doesn't livepatch code also need to be modified?  We have:
> 
> Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> arm64/ftrace and klp are the only two users of that function (outside of
> module.c) and Mark was already writing a patch for arm64.
> 
> Means these last two patches need to wait a little until we've fixed
> those.

So the other KLP issue:

<mbenes> peterz: ad klp, apply_relocate_add() and text_poke()... what
         about apply_alternatives() and apply_paravirt()? They call
         text_poke_early(), which was ok with module_disable/enable_ro(), but
	 now it is not, I suppose. See arch_klp_init_object_loaded() in
         arch/x86/kernel/livepatch.c.

<peterz> mbenes: hurm, I don't see why we would need to do
         apply_alternatives() / apply_paravirt() later, why can't we do that
	 the moment we load the module

<peterz> mbenes: that is, those things _never_ change after boot

<mbenes> peterz: as jpoimboe explained. See commit
         d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation.

Now sadly that commit missed all the useful information, luckily I could
find the patch in my LKML folder, more sad, that thread still didn't
contain the actual useful information, for that I was directed to
github:

  https://github.com/dynup/kpatch/issues/580

Now, someone is owning me a beer for having to look at github for this.

That finally explained that what happens is that the RELA was trying to
fix up the paravirt indirect call to 'local_irq_disable', which
apply_paravirt() will have overwritten with 'CLI; NOP'. This then
obviously goes *bang*.

This then raises a number of questions:

 1) why is that RELA (that obviously does not depend on any module)
    applied so late?

 2) why can't we unconditionally skip RELA's to paravirt sites?

 3) Is there ever a possible module-dependent RELA to a paravirt /
    alternative site?


Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
RELAs that depend on symbols in ${mod} (or modules in general). We can
fix up RELAs that depend on core kernel early without problems. Let them
be in the normal .rela sections and be fixed up on loading the
patch-module as per usual.

This should also deal with 2, paravirt should always have RELAs into the
core kernel.

Then for 3) we only have alternatives left, and I _think_ it unlikely to
be the case, but I'll have to have a hard look at that.

Hmmm ?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-23 11:48       ` Peter Zijlstra
@ 2019-10-23 15:16         ` Peter Zijlstra
  2019-10-23 17:15           ` Josh Poimboeuf
  2019-10-23 17:00         ` Josh Poimboeuf
  1 sibling, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-23 15:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> 
> > > Doesn't livepatch code also need to be modified?  We have:
> > 
> > Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> > arm64/ftrace and klp are the only two users of that function (outside of
> > module.c) and Mark was already writing a patch for arm64.
> > 
> > Means these last two patches need to wait a little until we've fixed
> > those.
> 
> So the other KLP issue:
> 
> <mbenes> peterz: ad klp, apply_relocate_add() and text_poke()... what
>          about apply_alternatives() and apply_paravirt()? They call
>          text_poke_early(), which was ok with module_disable/enable_ro(), but
> 	 now it is not, I suppose. See arch_klp_init_object_loaded() in
>          arch/x86/kernel/livepatch.c.
> 
> <peterz> mbenes: hurm, I don't see why we would need to do
>          apply_alternatives() / apply_paravirt() later, why can't we do that
> 	 the moment we load the module
> 
> <peterz> mbenes: that is, those things _never_ change after boot
> 
> <mbenes> peterz: as jpoimboe explained. See commit
>          d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation.
> 
> Now sadly that commit missed all the useful information, luckily I could
> find the patch in my LKML folder, more sad, that thread still didn't
> contain the actual useful information, for that I was directed to
> github:
> 
>   https://github.com/dynup/kpatch/issues/580
> 
> Now, someone is owning me a beer for having to look at github for this.
> 
> That finally explained that what happens is that the RELA was trying to
> fix up the paravirt indirect call to 'local_irq_disable', which
> apply_paravirt() will have overwritten with 'CLI; NOP'. This then
> obviously goes *bang*.
> 
> This then raises a number of questions:
> 
>  1) why is that RELA (that obviously does not depend on any module)
>     applied so late?
> 
>  2) why can't we unconditionally skip RELA's to paravirt sites?
> 
>  3) Is there ever a possible module-dependent RELA to a paravirt /
>     alternative site?
> 
> 
> Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
> RELAs that depend on symbols in ${mod} (or modules in general). We can
> fix up RELAs that depend on core kernel early without problems. Let them
> be in the normal .rela sections and be fixed up on loading the
> patch-module as per usual.
> 
> This should also deal with 2, paravirt should always have RELAs into the
> core kernel.
> 
> Then for 3) we only have alternatives left, and I _think_ it unlikely to
> be the case, but I'll have to have a hard look at that.
> 
> Hmmm ?

Something like so on top, I suppose.

---

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -131,7 +131,8 @@ static int __apply_relocate_add(Elf64_Sh
 		   unsigned int symindex,
 		   unsigned int relsec,
 		   struct module *me,
-		   void *(*write)(void *addr, const void *val, size_t size))
+		   void *(*write)(void *addr, const void *val, size_t size),
+		   bool klp)
 {
 	unsigned int i;
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
 
 		val = sym->st_value + rel[i].r_addend;
 
+		/*
+		 * .klp.rela.* sections should only contain module
+		 * related RELAs. All core-kernel RELAs should be in
+		 * normal .rela.* sections and be applied when loading
+		 * the patch module itself.
+		 */
+		WARN_ON_ONCE(klp && core_kernel_text(val));
+
 		switch (ELF64_R_TYPE(rel[i].r_info)) {
 		case R_X86_64_NONE:
 			break;
@@ -223,7 +232,7 @@ int apply_relocate_add(Elf64_Shdr *sechd
 		   unsigned int relsec,
 		   struct module *me)
 {
-	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
+	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, false);
 }
 
 int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
@@ -234,7 +243,7 @@ int klp_apply_relocate_add(Elf64_Shdr *s
 {
 	int ret;
 
-	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
+	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke, true);
 	if (!ret)
 		text_poke_sync();
 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-23 11:48       ` Peter Zijlstra
  2019-10-23 15:16         ` Peter Zijlstra
@ 2019-10-23 17:00         ` Josh Poimboeuf
  2019-10-24 13:16           ` Peter Zijlstra
  1 sibling, 1 reply; 70+ messages in thread
From: Josh Poimboeuf @ 2019-10-23 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu,
	live-patching

On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote:
> Now sadly that commit missed all the useful information, luckily I could
> find the patch in my LKML folder, more sad, that thread still didn't
> contain the actual useful information, for that I was directed to
> github:
> 
>   https://github.com/dynup/kpatch/issues/580
> 
> Now, someone is owning me a beer for having to look at github for this.

Deal.  And you probably deserve a few more for fixing our crap.

The github thing is supposed to be temporary, at least in theory we'll
eventually have all klp patch module building code in the kernel tree.

> That finally explained that what happens is that the RELA was trying to
> fix up the paravirt indirect call to 'local_irq_disable', which
> apply_paravirt() will have overwritten with 'CLI; NOP'. This then
> obviously goes *bang*.
> 
> This then raises a number of questions:
> 
>  1) why is that RELA (that obviously does not depend on any module)
>     applied so late?

Good question.  The 'pv_ops' symbol is exported by the core kernel, so I
can't see any reason why we'd need to apply that rela late.  In theory,
kpatch-build isn't supposed to convert that to a klp rela.  Maybe
something went wrong in the patch creation code.

I'm also questioning why we even need to apply the parainstructions
section late.  Maybe we can remove that apply_paravirt() call
altogether, along with .klp.arch.parainstruction sections.

I'll need to look into it...

>  2) why can't we unconditionally skip RELA's to paravirt sites?

We could, but I don't think it's needed if we fix #1.

>  3) Is there ever a possible module-dependent RELA to a paravirt /
>     alternative site?

Good question...

> Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
> RELAs that depend on symbols in ${mod} (or modules in general).

That was already the goal, but we've apparently failed at that.

> We can fix up RELAs that depend on core kernel early without problems.
> Let them be in the normal .rela sections and be fixed up on loading
> the patch-module as per usual.

If such symbols aren't exported, then they still need to be in
.klp.rela.vmlinux sections, since normal relas won't work.

> This should also deal with 2, paravirt should always have RELAs into the
> core kernel.
> 
> Then for 3) we only have alternatives left, and I _think_ it unlikely to
> be the case, but I'll have to have a hard look at that.

I'm not sure about alternatives, but maybe we can enforce such
limitations with tooling and/or kernel checks.

-- 
Josh


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-23 15:16         ` Peter Zijlstra
@ 2019-10-23 17:15           ` Josh Poimboeuf
  2019-10-24 10:59             ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Josh Poimboeuf @ 2019-10-23 17:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
>  
>  		val = sym->st_value + rel[i].r_addend;
>  
> +		/*
> +		 * .klp.rela.* sections should only contain module
> +		 * related RELAs. All core-kernel RELAs should be in
> +		 * normal .rela.* sections and be applied when loading
> +		 * the patch module itself.
> +		 */
> +		WARN_ON_ONCE(klp && core_kernel_text(val));
> +

This isn't quite true, we also use .klp.rela sections to access
unexported vmlinux symbols.

-- 
Josh


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-22 20:24     ` Peter Zijlstra
  2019-10-22 20:40       ` Steven Rostedt
@ 2019-10-23 18:52       ` Steven Rostedt
  2019-10-24 10:16         ` Peter Zijlstra
  1 sibling, 1 reply; 70+ messages in thread
From: Steven Rostedt @ 2019-10-23 18:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Tue, 22 Oct 2019 22:24:01 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote:
> > On Fri, 18 Oct 2019 09:35:40 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >   
> > > Now that set_all_modules_text_*() is gone, nothing depends on the
> > > relation between ->state = COMING and the protection state anymore.
> > > This enables moving the protection changes later, such that the COMING
> > > notifier callbacks can more easily modify the text.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Cc: Jessica Yu <jeyu@kernel.org>
> > > ---  
> > 
> > This triggered the following bug:
> >   
> 
> > The trace_event_define_fields_<event>() is defined in
> > include/trace/trace_events.h and is an init function called by the
> > trace_events event_create_dir() via the module notifier:
> > MODULE_STATE_COMING  
> 
> The below seems to cure it; and seems to generate identical
> events/*/format output (for my .config, with the exception of ID).
> 
> It has just one section mismatch report that I'm too tired to look at
> just now.
> 
> I'm not particularly proud of the "__function__" hack, but it works :/ I
> couldn't come up with anything else for [uk]probes which seem to have
> dynamic fields and if we're having it then syscall_enter can also make
> use of it, the syscall_metadata crud was going to be ugly otherwise.
> 
> (also, win on LOC)
> 
>

After applying this series and this patch I triggered this:

[ 1397.281889] BUG: kernel NULL pointer dereference, address: 0000000000000001
[ 1397.288896] #PF: supervisor read access in kernel mode
[ 1397.294062] #PF: error_code(0x0000) - not-present page
[ 1397.299192] PGD 0 P4D 0 
[ 1397.301728] Oops: 0000 [#1] PREEMPT SMP PTI
[ 1397.305908] CPU: 7 PID: 4252 Comm: ftracetest Not tainted 5.4.0-rc3-test+ #132
[ 1397.313114] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[ 1397.322056] RIP: 0010:event_create_dir+0x26a/0x520
[ 1397.326841] Code: ff ff 5a 85 c0 75 37 44 03 7b 10 48 83 c3 20 4c 8b 13 4d 85 d2 0f 84 66 fe ff ff b9 0d 00 00 00 4c 89 d6 4c 89 f7 48 8b 53 08 <f3> a6 0f 97 c1 80 d9 00 84 c9 75 a5 48 89 ef e8 b2 d4 a3 00 85 c0
[ 1397.345558] RSP: 0018:ffffc90000a63d18 EFLAGS: 00010202
[ 1397.350775] RAX: 0000000000000000 RBX: ffffc90000a63d80 RCX: 000000000000000d
[ 1397.357893] RDX: ffffffff811ccfac RSI: 0000000000000001 RDI: ffffffff8207c68a
[ 1397.365006] RBP: ffff888114b1c750 R08: 0000000000000000 R09: ffff8881147561b0
[ 1397.372119] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880b1d80528
[ 1397.379243] R13: ffff88811189aeb0 R14: ffffffff8207c68a R15: 00000000811d2d22
[ 1397.386365] FS:  00007f2567213740(0000) GS:ffff88811a9c0000(0000) knlGS:0000000000000000
[ 1397.394437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1397.400174] CR2: 0000000000000001 CR3: 00000000b1f06005 CR4: 00000000001606e0
[ 1397.407297] Call Trace:
[ 1397.409753]  trace_add_event_call+0x6c/0xb0
[ 1397.413938]  trace_probe_register_event_call+0x22/0x50
[ 1397.419071]  trace_kprobe_create+0x65c/0xa20
[ 1397.423340]  ? argv_split+0x99/0x130
[ 1397.426913]  ? __kmalloc+0x1d4/0x2c0
[ 1397.430485]  ? trace_kprobe_create+0xa20/0xa20
[ 1397.434922]  ? trace_kprobe_create+0xa20/0xa20
[ 1397.439361]  create_or_delete_trace_kprobe+0xd/0x30
[ 1397.444237]  trace_run_command+0x72/0x90
[ 1397.448158]  trace_parse_run_command+0xaf/0x131
[ 1397.452684]  vfs_write+0xa5/0x1a0
[ 1397.455996]  ksys_write+0x5c/0xd0
[ 1397.459312]  do_syscall_64+0x48/0x120
[ 1397.462971]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1397.468017] RIP: 0033:0x7f2567303ff8

By running tools/selftests/ftrace/ftracetest

Crashed here:

[33] Kprobe dynamic event - adding and removing	[PASS]
[34] Kprobe dynamic event - busy event check	[PASS]
[35] Kprobe event with comm arguments	[FAIL]
[36] Kprobe event string type argumentclient_loop:

-- Steve

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-23 18:52       ` Steven Rostedt
@ 2019-10-24 10:16         ` Peter Zijlstra
  2019-10-24 10:18           ` Peter Zijlstra
  2019-10-24 15:00           ` Steven Rostedt
  0 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 10:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Wed, Oct 23, 2019 at 02:52:45PM -0400, Steven Rostedt wrote:
> After applying this series and this patch I triggered this:

Bah, I hate C.

(also for some reason I had KPROBE_EVENTS disabled, when I enabled it it
failed on boot due to selftests)

this one seems to boot and survive your selftests thing (that takes for
bloody ever to run).

---
Subject: ftrace: Rework event_create_dir()

Rework event_create_dir() to use an array of static data instead of
function pointers where possible.

The problem is that it would call the function pointer on module load
before parse_args(), possibly even before jump_labels were initialized.

Luckily the generated functions don't use jump_labels but it still seems
fragile. It also gets in the way of changing when we make the module map
executable.

The generated function are basically calling trace_define_field() with a
bunch of static arguments. So instead of a function, capture these
arguments in a static array, avoiding the function call.

Now there are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes), in which case a static array does not
work, for these we preserve the function call. Luckily all these cases
are not related to modules and so we can retain the function call for
them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/xfs/xfs_trace.h             |   4 +-
 include/linux/trace_events.h   |  18 ++++++-
 include/trace/events/filemap.h |   2 +-
 include/trace/trace_events.h   |  64 ++++++++-----------------
 kernel/trace/trace.h           |  31 ++++++------
 kernel/trace/trace_entries.h   |  66 +++++++------------------
 kernel/trace/trace_events.c    |  20 +++++++-
 kernel/trace/trace_export.c    | 106 +++++++++++++++--------------------------
 kernel/trace/trace_kprobe.c    |  16 ++++++-
 kernel/trace/trace_syscalls.c  |  50 ++++++++-----------
 kernel/trace/trace_uprobe.c    |   9 +++-
 11 files changed, 172 insertions(+), 214 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
-		__field(void *, leaf);
-		__field(int, pos);
+		__field(void *, leaf)
+		__field(int, pos)
 		__field(xfs_fileoff_t, startoff)
 		__field(xfs_fsblock_t, startblock)
 		__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..a379255c14a9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,22 @@ enum trace_reg {
 
 struct trace_event_call;
 
+#define TRACE_FUNCTION_TYPE ((const char *)~0UL)
+
+struct trace_event_fields {
+	const char *type;
+	union {
+		struct {
+			const char *name;
+			const int  size;
+			const int  align;
+			const int  is_signed;
+			const int  filter_type;
+		};
+		int (*define_fields)(struct trace_event_call *);
+	};
+};
+
 struct trace_event_class {
 	const char		*system;
 	void			*probe;
@@ -195,7 +211,7 @@ struct trace_event_class {
 #endif
 	int			(*reg)(struct trace_event_call *event,
 				       enum trace_reg type, void *data);
-	int			(*define_fields)(struct trace_event_call *);
+	struct trace_event_fields *fields_array;
 	struct list_head	*(*get_fields)(struct trace_event_call *);
 	struct list_head	fields;
 	int			(*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
 		TP_ARGS(file, old),
 
 		TP_STRUCT__entry(
-			__field(struct file *, file);
+			__field(struct file *, file)
 			__field(unsigned long, i_ino)
 			__field(dev_t, s_dev)
 			__field(errseq_t, old)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef __field_ext
-#define __field_ext(type, item, filter_type)				\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field_ext(_type, _item, _filter_type) {			\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	.is_signed = is_signed_type(_type), .filter_type = _filter_type },
 
 #undef __field_struct_ext
-#define __field_struct_ext(type, item, filter_type)			\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 0, filter_type);			\
-	if (ret)							\
-		return ret;
+#define __field_struct_ext(_type, _item, _filter_type) {		\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	0, .filter_type = _filter_type },
 
 #undef __field
 #define __field(type, item)	__field_ext(type, item, FILTER_OTHER)
@@ -418,25 +412,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 #define __field_struct(type, item) __field_struct_ext(type, item, FILTER_OTHER)
 
 #undef __array
-#define __array(type, item, len)					\
-	do {								\
-		char *type_str = #type"["__stringify(len)"]";		\
-		BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);			\
-		BUILD_BUG_ON(len <= 0);					\
-		ret = trace_define_field(event_call, type_str, #item,	\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
-		if (ret)						\
-			return ret;					\
-	} while (0);
+#define __array(_type, _item, _len) {					\
+	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
+	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __dynamic_array
-#define __dynamic_array(type, item, len)				       \
-	ret = trace_define_field(event_call, "__data_loc " #type "[]", #item,  \
-				 offsetof(typeof(field), __data_loc_##item),   \
-				 sizeof(field.__data_loc_##item),	       \
-				 is_signed_type(type), FILTER_OTHER);
+#define __dynamic_array(_type, _item, _len) {				\
+	.type = "__data_loc " #_type "[]", .name = #_item,		\
+	.size = 4, .align = 4,						\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, -1)
@@ -446,16 +431,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print)	\
-static int notrace __init						\
-trace_event_define_fields_##call(struct trace_event_call *event_call)	\
-{									\
-	struct trace_event_raw_##call field;				\
-	int ret;							\
-									\
-	tstruct;							\
-									\
-	return ret;							\
-}
+static struct trace_event_fields trace_event_fields_##call[] = {	\
+	tstruct								\
+	{} };
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)
@@ -613,7 +591,7 @@ static inline notrace int trace_event_get_offsets_##call(		\
  *
  * static struct trace_event_class __used event_class_<template> = {
  *	.system			= "<system>",
- *	.define_fields		= trace_event_define_fields_<call>,
+ *	.fields_array		= trace_event_fields_<call>,
  *	.fields			= LIST_HEAD_INIT(event_class_##call.fields),
  *	.raw_init		= trace_event_raw_init,
  *	.probe			= trace_event_raw_event_##call,
@@ -761,7 +739,7 @@ _TRACE_PERF_PROTO(call, PARAMS(proto));					\
 static char print_fmt_##call[] = print;					\
 static struct trace_event_class __used __refdata event_class_##call = { \
 	.system			= TRACE_SYSTEM_STRING,			\
-	.define_fields		= trace_event_define_fields_##call,	\
+	.fields_array		= trace_event_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_##call.fields),\
 	.raw_init		= trace_event_raw_init,			\
 	.probe			= trace_event_raw_event_##call,		\
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d685c61085c0..298a7cacf146 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,9 @@ enum trace_type {
 #undef __field
 #define __field(type, item)		type	item;
 
+#undef __field_fn
+#define __field_fn(type, item)		type	item;
+
 #undef __field_struct
 #define __field_struct(type, item)	__field(type, item)
 
@@ -68,26 +71,22 @@ enum trace_type {
 #define F_STRUCT(args...)		args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 	struct struct_name {						\
 		struct trace_entry	ent;				\
 		tstruct							\
 	}
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	\
-			 filter, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	regfn)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print,	\
-			    filter)					\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter) __packed
+#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print)) __packed
 
 #include "trace_entries.h"
 
@@ -1899,17 +1898,15 @@ extern void tracing_log_err(struct trace_array *tr,
 #define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print)	\
 	extern struct trace_event_call					\
 	__aligned(4) event_##call;
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter)	\
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print)	\
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print, filter) \
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print) \
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #include "trace_entries.h"
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index fc8e97328e54..3e9d81608284 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -61,15 +61,13 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
 	TRACE_FN,
 
 	F_STRUCT(
-		__field(	unsigned long,	ip		)
-		__field(	unsigned long,	parent_ip	)
+		__field_fn(	unsigned long,	ip		)
+		__field_fn(	unsigned long,	parent_ip	)
 	),
 
 	F_printk(" %ps <-- %ps",
 		 (void *)__entry->ip, (void *)__entry->parent_ip),
 
-	FILTER_TRACE_FN,
-
 	perf_ftrace_event_register
 );
 
@@ -84,9 +82,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
 		__field_desc(	int,		graph_ent,	depth		)
 	),
 
-	F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
-
-	FILTER_OTHER
+	F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth)
 );
 
 /* Function return entry */
@@ -97,18 +93,16 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
 	F_STRUCT(
 		__field_struct(	struct ftrace_graph_ret,	ret	)
 		__field_desc(	unsigned long,	ret,		func	)
+		__field_desc(	unsigned long,	ret,		overrun	)
 		__field_desc(	unsigned long long, ret,	calltime)
 		__field_desc(	unsigned long long, ret,	rettime	)
-		__field_desc(	unsigned long,	ret,		overrun	)
 		__field_desc(	int,		ret,		depth	)
 	),
 
 	F_printk("<-- %ps (%d) (start: %llx  end: %llx) over: %d",
 		 (void *)__entry->func, __entry->depth,
 		 __entry->calltime, __entry->rettime,
-		 __entry->depth),
-
-	FILTER_OTHER
+		 __entry->depth)
 );
 
 /*
@@ -137,9 +131,7 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==> %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu),
-
-	FILTER_OTHER
+		 __entry->next_cpu)
 );
 
 /*
@@ -157,9 +149,7 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==+ %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu),
-
-	FILTER_OTHER
+		 __entry->next_cpu)
 );
 
 /*
@@ -183,9 +173,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
 		 (void *)__entry->caller[0], (void *)__entry->caller[1],
 		 (void *)__entry->caller[2], (void *)__entry->caller[3],
 		 (void *)__entry->caller[4], (void *)__entry->caller[5],
-		 (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
-	FILTER_OTHER
+		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
 FTRACE_ENTRY(user_stack, userstack_entry,
@@ -203,9 +191,7 @@ FTRACE_ENTRY(user_stack, userstack_entry,
 		 (void *)__entry->caller[0], (void *)__entry->caller[1],
 		 (void *)__entry->caller[2], (void *)__entry->caller[3],
 		 (void *)__entry->caller[4], (void *)__entry->caller[5],
-		 (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
-	FILTER_OTHER
+		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
 /*
@@ -222,9 +208,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
 	),
 
 	F_printk("%ps: %s",
-		 (void *)__entry->ip, __entry->fmt),
-
-	FILTER_OTHER
+		 (void *)__entry->ip, __entry->fmt)
 );
 
 FTRACE_ENTRY_REG(print, print_entry,
@@ -239,8 +223,6 @@ FTRACE_ENTRY_REG(print, print_entry,
 	F_printk("%ps: %s",
 		 (void *)__entry->ip, __entry->buf),
 
-	FILTER_OTHER,
-
 	ftrace_event_register
 );
 
@@ -254,9 +236,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry,
 	),
 
 	F_printk("id:%04x %08x",
-		 __entry->id, (int)__entry->buf[0]),
-
-	FILTER_OTHER
+		 __entry->id, (int)__entry->buf[0])
 );
 
 FTRACE_ENTRY(bputs, bputs_entry,
@@ -269,9 +249,7 @@ FTRACE_ENTRY(bputs, bputs_entry,
 	),
 
 	F_printk("%ps: %s",
-		 (void *)__entry->ip, __entry->str),
-
-	FILTER_OTHER
+		 (void *)__entry->ip, __entry->str)
 );
 
 FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -283,16 +261,14 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
 		__field_desc(	resource_size_t, rw,	phys	)
 		__field_desc(	unsigned long,	rw,	value	)
 		__field_desc(	unsigned long,	rw,	pc	)
-		__field_desc(	int, 		rw,	map_id	)
+		__field_desc(	int,		rw,	map_id	)
 		__field_desc(	unsigned char,	rw,	opcode	)
 		__field_desc(	unsigned char,	rw,	width	)
 	),
 
 	F_printk("%lx %lx %lx %d %x %x",
 		 (unsigned long)__entry->phys, __entry->value, __entry->pc,
-		 __entry->map_id, __entry->opcode, __entry->width),
-
-	FILTER_OTHER
+		 __entry->map_id, __entry->opcode, __entry->width)
 );
 
 FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -304,15 +280,13 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
 		__field_desc(	resource_size_t, map,	phys	)
 		__field_desc(	unsigned long,	map,	virt	)
 		__field_desc(	unsigned long,	map,	len	)
-		__field_desc(	int, 		map,	map_id	)
+		__field_desc(	int,		map,	map_id	)
 		__field_desc(	unsigned char,	map,	opcode	)
 	),
 
 	F_printk("%lx %lx %lx %d %x",
 		 (unsigned long)__entry->phys, __entry->virt, __entry->len,
-		 __entry->map_id, __entry->opcode),
-
-	FILTER_OTHER
+		 __entry->map_id, __entry->opcode)
 );
 
 
@@ -334,9 +308,7 @@ FTRACE_ENTRY(branch, trace_branch,
 	F_printk("%u:%s:%s (%u)%s",
 		 __entry->line,
 		 __entry->func, __entry->file, __entry->correct,
-		 __entry->constant ? " CONSTANT" : ""),
-
-	FILTER_OTHER
+		 __entry->constant ? " CONSTANT" : "")
 );
 
 
@@ -362,7 +334,5 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
 		 __entry->duration,
 		 __entry->outer_duration,
 		 __entry->nmi_total_ts,
-		 __entry->nmi_count),
-
-	FILTER_OTHER
+		 __entry->nmi_count)
 );
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fba87d10f0c1..5ab10c3dce78 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -24,6 +24,7 @@
 #include <linux/delay.h>
 
 #include <trace/events/sched.h>
+#include <trace/syscall.h>
 
 #include <asm/setup.h>
 
@@ -1990,7 +1991,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 	 */
 	head = trace_get_fields(call);
 	if (list_empty(head)) {
-		ret = call->class->define_fields(call);
+		struct trace_event_fields *field = call->class->fields_array;
+		unsigned int offset = sizeof(struct trace_entry);
+
+		for (; field->type; field++) {
+			if (field->type == TRACE_FUNCTION_TYPE) {
+				ret = field->define_fields(call);
+				break;
+			}
+
+			offset = ALIGN(offset, field->align);
+			ret = trace_define_field(call, field->type, field->name,
+						 offset, field->size,
+						 field->is_signed, field->filter_type);
+			if (ret)
+				break;
+
+			offset += field->size;
+		}
 		if (ret < 0) {
 			pr_warn("Could not initialize trace point events/%s\n",
 				name);
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 45630a76ed3a..6d64c1c19fd5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -29,10 +29,8 @@ static int ftrace_event_register(struct trace_event_call *call,
  * function and thus become accesible via perf.
  */
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
-			 filter, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 /* not needed for this file */
 #undef __field_struct
@@ -41,6 +39,9 @@ static int ftrace_event_register(struct trace_event_call *call,
 #undef __field
 #define __field(type, item)				type item;
 
+#undef __field_fn
+#define __field_fn(type, item)				type item;
+
 #undef __field_desc
 #define __field_desc(type, container, item)		type item;
 
@@ -60,7 +61,7 @@ static int ftrace_event_register(struct trace_event_call *call,
 #define F_printk(fmt, args...) fmt, args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 struct ____ftrace_##name {						\
 	tstruct								\
 };									\
@@ -73,76 +74,46 @@ static void __always_unused ____ftrace_check_##name(void)		\
 }
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter)	\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print)		\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #include "trace_entries.h"
 
+#undef __field_ext
+#define __field_ext(_type, _item, _filter_type) {			\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	is_signed_type(_type), .filter_type = _filter_type },
+
 #undef __field
-#define __field(type, item)						\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)
+
+#undef __field_fn
+#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)
 
 #undef __field_desc
-#define __field_desc(type, container, item)	\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field),		\
-					  container.item),		\
-				 sizeof(field.container.item),		\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)
 
 #undef __array
-#define __array(type, item, len)					\
-	do {								\
-		char *type_str = #type"["__stringify(len)"]";		\
-		BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);			\
-		ret = trace_define_field(event_call, type_str, #item,	\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-		if (ret)						\
-			return ret;					\
-	} while (0);
+#define __array(_type, _item, _len) {					\
+	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
+	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
+	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __array_desc
-#define __array_desc(type, container, item, len)			\
-	BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);				\
-	ret = trace_define_field(event_call, #type "[" #len "]", #item,	\
-				 offsetof(typeof(field),		\
-					  container.item),		\
-				 sizeof(field.container.item),		\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)
 
 #undef __dynamic_array
-#define __dynamic_array(type, item)					\
-	ret = trace_define_field(event_call, #type "[]", #item,  \
-				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), filter_type);\
-	if (ret)							\
-		return ret;
+#define __dynamic_array(_type, _item) {					\
+	.type = #_type "[]", .name = #_item,				\
+	.size = 0, .align = __alignof__(_type),				\
+	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
-static int __init							\
-ftrace_define_fields_##name(struct trace_event_call *event_call)	\
-{									\
-	struct struct_name field;					\
-	int ret;							\
-	int filter_type = filter;					\
-									\
-	tstruct;							\
-									\
-	return ret;							\
-}
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
+static struct trace_event_fields ftrace_event_fields_##name[] = {	\
+	tstruct								\
+	{} };
 
 #include "trace_entries.h"
 
@@ -152,6 +123,9 @@ ftrace_define_fields_##name(struct trace_event_call *event_call)	\
 #undef __field
 #define __field(type, item)
 
+#undef __field_fn
+#define __field_fn(type, item)
+
 #undef __field_desc
 #define __field_desc(type, container, item)
 
@@ -168,12 +142,10 @@ ftrace_define_fields_##name(struct trace_event_call *event_call)	\
 #define F_printk(fmt, args...) __stringify(fmt) ", "  __stringify(args)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
-			 regfn)						\
-									\
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
 struct trace_event_class __refdata event_class_ftrace_##call = {	\
 	.system			= __stringify(TRACE_SYSTEM),		\
-	.define_fields		= ftrace_define_fields_##call,		\
+	.fields_array		= ftrace_event_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
 	.reg			= regfn,				\
 };									\
@@ -191,9 +163,9 @@ struct trace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter)	\
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
 	FTRACE_ENTRY_REG(call, struct_name, etype,			\
-			 PARAMS(tstruct), PARAMS(print), filter, NULL)
+			 PARAMS(tstruct), PARAMS(print), NULL)
 
 bool ftrace_event_is_function(struct trace_event_call *call)
 {
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1552a95c743b..66e0a8ff1c01 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1534,16 +1534,28 @@ static struct trace_event_functions kprobe_funcs = {
 	.trace		= print_kprobe_event
 };
 
+static struct trace_event_fields kretprobe_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = kretprobe_event_define_fields },
+	{}
+};
+
+static struct trace_event_fields kprobe_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = kprobe_event_define_fields },
+	{}
+};
+
 static inline void init_trace_event_call(struct trace_kprobe *tk)
 {
 	struct trace_event_call *call = trace_probe_event_call(&tk->tp);
 
 	if (trace_kprobe_is_return(tk)) {
 		call->event.funcs = &kretprobe_funcs;
-		call->class->define_fields = kretprobe_event_define_fields;
+		call->class->fields_array = kretprobe_fields_array;
 	} else {
 		call->event.funcs = &kprobe_funcs;
-		call->class->define_fields = kprobe_event_define_fields;
+		call->class->fields_array = kprobe_fields_array;
 	}
 
 	call->flags = TRACE_EVENT_FL_KPROBE;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index fa8fbff736d6..53935259f701 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -198,11 +198,10 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
 
 extern char *__bad_type_size(void);
 
-#define SYSCALL_FIELD(type, field, name)				\
-	sizeof(type) != sizeof(trace.field) ?				\
-		__bad_type_size() :					\
-		#type, #name, offsetof(typeof(trace), field),		\
-		sizeof(trace.field), is_signed_type(type)
+#define SYSCALL_FIELD(_type, _name) {					\
+	.type = #_type, .name = #_name,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }
 
 static int __init
 __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -269,42 +268,22 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
-	int ret;
-	int i;
 	int offset = offsetof(typeof(trace), args);
-
-	ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
-				 FILTER_OTHER);
-	if (ret)
-		return ret;
+	int ret, i;
 
 	for (i = 0; i < meta->nb_args; i++) {
 		ret = trace_define_field(call, meta->types[i],
 					 meta->args[i], offset,
 					 sizeof(unsigned long), 0,
 					 FILTER_OTHER);
+		if (ret)
+			break;
 		offset += sizeof(unsigned long);
 	}
 
 	return ret;
 }
 
-static int __init syscall_exit_define_fields(struct trace_event_call *call)
-{
-	struct syscall_trace_exit trace;
-	int ret;
-
-	ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
-				 FILTER_OTHER);
-	if (ret)
-		return ret;
-
-	ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
-				 FILTER_OTHER);
-
-	return ret;
-}
-
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
 	struct trace_array *tr = data;
@@ -502,6 +481,13 @@ static int __init init_syscall_trace(struct trace_event_call *call)
 	return id;
 }
 
+static struct trace_event_fields __refdata syscall_enter_fields_array[] = {
+	SYSCALL_FIELD(int, __syscall_nr),
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = syscall_enter_define_fields },
+	{}
+};
+
 struct trace_event_functions enter_syscall_print_funcs = {
 	.trace		= print_syscall_enter,
 };
@@ -513,7 +499,7 @@ struct trace_event_functions exit_syscall_print_funcs = {
 struct trace_event_class __refdata event_class_syscall_enter = {
 	.system		= "syscalls",
 	.reg		= syscall_enter_register,
-	.define_fields	= syscall_enter_define_fields,
+	.fields_array	= syscall_enter_fields_array,
 	.get_fields	= syscall_get_enter_fields,
 	.raw_init	= init_syscall_trace,
 };
@@ -521,7 +507,11 @@ struct trace_event_class __refdata event_class_syscall_enter = {
 struct trace_event_class __refdata event_class_syscall_exit = {
 	.system		= "syscalls",
 	.reg		= syscall_exit_register,
-	.define_fields	= syscall_exit_define_fields,
+	.fields_array	= (struct trace_event_fields[]){
+		SYSCALL_FIELD(int, __syscall_nr),
+		SYSCALL_FIELD(long, ret),
+		{}
+	},
 	.fields		= LIST_HEAD_INIT(event_class_syscall_exit.fields),
 	.raw_init	= init_syscall_trace,
 };
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 352073d36585..476a382f1f1b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1507,12 +1507,17 @@ static struct trace_event_functions uprobe_funcs = {
 	.trace		= print_uprobe_event
 };
 
+static struct trace_event_fields uprobe_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = uprobe_event_define_fields },
+	{}
+};
+
 static inline void init_trace_event_call(struct trace_uprobe *tu)
 {
 	struct trace_event_call *call = trace_probe_event_call(&tu->tp);
-
 	call->event.funcs = &uprobe_funcs;
-	call->class->define_fields = uprobe_event_define_fields;
+	call->class->fields_array = uprobe_fields_array;
 
 	call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
 	call->class->reg = trace_uprobe_register;

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 10:16         ` Peter Zijlstra
@ 2019-10-24 10:18           ` Peter Zijlstra
  2019-10-24 15:00           ` Steven Rostedt
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 10:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Thu, Oct 24, 2019 at 12:16:09PM +0200, Peter Zijlstra wrote:
> +struct trace_event_fields {
> +	const char *type;
> +	union {
> +		struct {
> +			const char *name;
> +			const int  size;
> +			const int  align;
> +			const int  is_signed;
> +			const int  filter_type;

FWIW, I suspect we can do:

			unsigned char size;
			unsigned char align;
			unsigned char is_signed;
			unsigned char filter_type;

And save us some 8 bytes per entry (12 on 32bit).

> +		};
> +		int (*define_fields)(struct trace_event_call *);
> +	};
> +};

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-23 17:15           ` Josh Poimboeuf
@ 2019-10-24 10:59             ` Peter Zijlstra
  2019-10-24 18:31               ` Josh Poimboeuf
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 10:59 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Wed, Oct 23, 2019 at 12:15:14PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> > @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
> >  
> >  		val = sym->st_value + rel[i].r_addend;
> >  
> > +		/*
> > +		 * .klp.rela.* sections should only contain module
> > +		 * related RELAs. All core-kernel RELAs should be in
> > +		 * normal .rela.* sections and be applied when loading
> > +		 * the patch module itself.
> > +		 */
> > +		WARN_ON_ONCE(klp && core_kernel_text(val));
> > +
> 
> This isn't quite true, we also use .klp.rela sections to access
> unexported vmlinux symbols.

Yes, you said in that earlier email. That all makes it really hard to
validate this. But unless we validate it, it will stay buggy :/

Hmmm.... /me ponders

The alternative would be to apply the .klp.rela.* sections twice, once
at patch-module load time and then apply those core_kernel_text()
entries, and then once later and skip over them.

How's this?

---
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -126,12 +126,20 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 	return 0;
 }
 #else /*X86_64*/
+
+enum rela_filter {
+	rela_all = 0,
+	rela_core,
+	rela_mod,
+};
+
 static int __apply_relocate_add(Elf64_Shdr *sechdrs,
 		   const char *strtab,
 		   unsigned int symindex,
 		   unsigned int relsec,
 		   struct module *me,
-		   void *(*write)(void *addr, const void *val, size_t size))
+		   void *(*write)(void *addr, const void *val, size_t size),
+		   enum rela_filter filter)
 {
 	unsigned int i;
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -157,6 +165,11 @@ static int __apply_relocate_add(Elf64_Sh
 
 		val = sym->st_value + rel[i].r_addend;
 
+		if (filter) {
+			if ((filter == rela_core) != !!core_kernel_text(val))
+				continue;
+		}
+
 		switch (ELF64_R_TYPE(rel[i].r_info)) {
 		case R_X86_64_NONE:
 			break;
@@ -223,18 +236,21 @@ int apply_relocate_add(Elf64_Shdr *sechd
 		   unsigned int relsec,
 		   struct module *me)
 {
-	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
+	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, rela_all);
 }
 
 int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
 		   const char *strtab,
 		   unsigned int symindex,
 		   unsigned int relsec,
-		   struct module *me)
+		   struct module *me, bool late)
 {
 	int ret;
 
-	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
+	if (!late)
+		return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, rela_core);
+
+	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke, rela_mod);
 	if (!ret)
 		text_poke_sync();
 
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -222,7 +222,7 @@ extern int klp_apply_relocate_add(Elf64_
 			      const char *strtab,
 			      unsigned int symindex,
 			      unsigned int relsec,
-			      struct module *me);
+			      struct module *me, bool late);
 
 #else /* !CONFIG_LIVEPATCH */
 
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -245,15 +245,6 @@ static int klp_resolve_symbols(Elf_Shdr
 	return 0;
 }
 
-int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
-			      const char *strtab,
-			      unsigned int symindex,
-			      unsigned int relsec,
-			      struct module *me)
-{
-	return apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
-}
-
 static int klp_write_object_relocations(struct module *pmod,
 					struct klp_object *obj)
 {
@@ -296,7 +287,7 @@ static int klp_write_object_relocations(
 
 		ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
 					 pmod->core_kallsyms.strtab,
-					 pmod->klp_info->symndx, i, pmod);
+					 pmod->klp_info->symndx, i, pmod, true);
 		if (ret)
 			break;
 	}
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2328,8 +2328,13 @@ static int apply_relocations(struct modu
 			continue;
 
 		/* Livepatch relocation sections are applied by livepatch */
-		if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
+		if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH) {
+			if (info->sechdrs[i].sh_type == SHT_RELA) {
+				klp_apply_relocate_add(info->sechdrs, info->strtab,
+						       info->index.sym, i, mod, false);
+			}
 			continue;
+		}
 
 		if (info->sechdrs[i].sh_type == SHT_REL)
 			err = apply_relocate(info->sechdrs, info->strtab,

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-23 17:00         ` Josh Poimboeuf
@ 2019-10-24 13:16           ` Peter Zijlstra
  2019-10-25  6:44             ` Petr Mladek
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 13:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu,
	live-patching

On Wed, Oct 23, 2019 at 12:00:25PM -0500, Josh Poimboeuf wrote:

> > This then raises a number of questions:
> > 
> >  1) why is that RELA (that obviously does not depend on any module)
> >     applied so late?
> 
> Good question.  The 'pv_ops' symbol is exported by the core kernel, so I
> can't see any reason why we'd need to apply that rela late.  In theory,
> kpatch-build isn't supposed to convert that to a klp rela.  Maybe
> something went wrong in the patch creation code.
> 
> I'm also questioning why we even need to apply the parainstructions
> section late.  Maybe we can remove that apply_paravirt() call
> altogether, along with .klp.arch.parainstruction sections.
> 
> I'll need to look into it...

Right, that really should be able to run early. Esp. after commit

  11e86dc7f274 ("x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()")

paravirt patching is unconditional. We _never_ run with the indirect
call except very early boot, but modules should have them patched way
before their init section runs.

We rely on this for spectre-v2 and friends.

> >  3) Is there ever a possible module-dependent RELA to a paravirt /
> >     alternative site?
> 
> Good question...

> > Then for 3) we only have alternatives left, and I _think_ it unlikely to
> > be the case, but I'll have to have a hard look at that.
> 
> I'm not sure about alternatives, but maybe we can enforce such
> limitations with tooling and/or kernel checks.

Right, so on IRC you implied you might have some additional details on
how alternatives were affected; did you manage to dig that up?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 10:16         ` Peter Zijlstra
  2019-10-24 10:18           ` Peter Zijlstra
@ 2019-10-24 15:00           ` Steven Rostedt
  2019-10-24 16:43             ` Peter Zijlstra
  1 sibling, 1 reply; 70+ messages in thread
From: Steven Rostedt @ 2019-10-24 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Thu, 24 Oct 2019 12:16:09 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Oct 23, 2019 at 02:52:45PM -0400, Steven Rostedt wrote:
> > After applying this series and this patch I triggered this:  
> 
> Bah, I hate C.
> 
> (also for some reason I had KPROBE_EVENTS disabled, when I enabled it it
> failed on boot due to selftests)
> 
> this one seems to boot and survive your selftests thing (that takes for
> bloody ever to run).
> 

Care to enable CONFIG_HIST_TRIGGERS?

  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_context.o
/work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
/work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
  call->class->define_fields = synth_event_define_fields;
               ^~~~~~~~~~~~~
               get_fields
make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
make[3]: *** Waiting for unfinished jobs....

-- Steve

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 15:00           ` Steven Rostedt
@ 2019-10-24 16:43             ` Peter Zijlstra
  2019-10-24 18:17               ` Steven Rostedt
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 16:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Thu, Oct 24, 2019 at 11:00:24AM -0400, Steven Rostedt wrote:
> On Thu, 24 Oct 2019 12:16:09 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Oct 23, 2019 at 02:52:45PM -0400, Steven Rostedt wrote:
> > > After applying this series and this patch I triggered this:  
> > 
> > Bah, I hate C.
> > 
> > (also for some reason I had KPROBE_EVENTS disabled, when I enabled it it
> > failed on boot due to selftests)
> > 
> > this one seems to boot and survive your selftests thing (that takes for
> > bloody ever to run).
> > 
> 
> Care to enable CONFIG_HIST_TRIGGERS?
> 
>   CC [M]  drivers/gpu/drm/i915/gem/i915_gem_context.o
> /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
>   call->class->define_fields = synth_event_define_fields;
>                ^~~~~~~~~~~~~
>                get_fields
> make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> make[3]: *** Waiting for unfinished jobs....

allmodconfig clean

(omg, so much __field(); fail)

---
 drivers/infiniband/hw/hfi1/trace_tid.h  |   8 +--
 drivers/infiniband/hw/hfi1/trace_tx.h   |   2 +-
 drivers/lightnvm/pblk-trace.h           |   8 +--
 drivers/net/fjes/fjes_trace.h           |   2 +-
 drivers/net/wireless/ath/ath10k/trace.h |   6 +-
 fs/xfs/scrub/trace.h                    |   6 +-
 fs/xfs/xfs_trace.h                      |   4 +-
 include/linux/trace_events.h            |  18 +++++-
 include/trace/events/filemap.h          |   2 +-
 include/trace/events/rpcrdma.h          |   2 +-
 include/trace/trace_events.h            |  64 +++++++------------
 kernel/trace/trace.h                    |  31 +++++-----
 kernel/trace/trace_entries.h            |  66 ++++++--------------
 kernel/trace/trace_events.c             |  20 +++++-
 kernel/trace/trace_events_hist.c        |   8 ++-
 kernel/trace/trace_export.c             | 106 ++++++++++++--------------------
 kernel/trace/trace_kprobe.c             |  16 ++++-
 kernel/trace/trace_syscalls.c           |  50 ++++++---------
 kernel/trace/trace_uprobe.c             |   9 ++-
 net/mac80211/trace.h                    |  28 ++++-----
 net/wireless/trace.h                    |   6 +-
 21 files changed, 213 insertions(+), 249 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/trace_tid.h b/drivers/infiniband/hw/hfi1/trace_tid.h
index 343fb9894a82..985ffa9cc958 100644
--- a/drivers/infiniband/hw/hfi1/trace_tid.h
+++ b/drivers/infiniband/hw/hfi1/trace_tid.h
@@ -138,10 +138,10 @@ TRACE_EVENT(/* put_tid */
 	TP_ARGS(dd, index, type, pa, order),
 	TP_STRUCT__entry(/* entry */
 		DD_DEV_ENTRY(dd)
-		__field(unsigned long, pa);
-		__field(u32, index);
-		__field(u32, type);
-		__field(u16, order);
+		__field(unsigned long, pa)
+		__field(u32, index)
+		__field(u32, type)
+		__field(u16, order)
 	),
 	TP_fast_assign(/* assign */
 		DD_DEV_ASSIGN(dd);
diff --git a/drivers/infiniband/hw/hfi1/trace_tx.h b/drivers/infiniband/hw/hfi1/trace_tx.h
index 09eb0c9ada00..769e5e4710c6 100644
--- a/drivers/infiniband/hw/hfi1/trace_tx.h
+++ b/drivers/infiniband/hw/hfi1/trace_tx.h
@@ -588,7 +588,7 @@ TRACE_EVENT(hfi1_sdma_user_reqinfo,
 	    TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u8 subctxt, u16 *i),
 	    TP_ARGS(dd, ctxt, subctxt, i),
 	    TP_STRUCT__entry(
-		    DD_DEV_ENTRY(dd);
+		    DD_DEV_ENTRY(dd)
 		    __field(u16, ctxt)
 		    __field(u8, subctxt)
 		    __field(u8, ver_opcode)
diff --git a/drivers/lightnvm/pblk-trace.h b/drivers/lightnvm/pblk-trace.h
index 9534503b69d9..47b67c6bff7a 100644
--- a/drivers/lightnvm/pblk-trace.h
+++ b/drivers/lightnvm/pblk-trace.h
@@ -46,7 +46,7 @@ TRACE_EVENT(pblk_chunk_reset,
 	TP_STRUCT__entry(
 		__string(name, name)
 		__field(u64, ppa)
-		__field(int, state);
+		__field(int, state)
 	),
 
 	TP_fast_assign(
@@ -72,7 +72,7 @@ TRACE_EVENT(pblk_chunk_state,
 	TP_STRUCT__entry(
 		__string(name, name)
 		__field(u64, ppa)
-		__field(int, state);
+		__field(int, state)
 	),
 
 	TP_fast_assign(
@@ -98,7 +98,7 @@ TRACE_EVENT(pblk_line_state,
 	TP_STRUCT__entry(
 		__string(name, name)
 		__field(int, line)
-		__field(int, state);
+		__field(int, state)
 	),
 
 	TP_fast_assign(
@@ -121,7 +121,7 @@ TRACE_EVENT(pblk_state,
 
 	TP_STRUCT__entry(
 		__string(name, name)
-		__field(int, state);
+		__field(int, state)
 	),
 
 	TP_fast_assign(
diff --git a/drivers/net/fjes/fjes_trace.h b/drivers/net/fjes/fjes_trace.h
index c611b6a80b20..9237b69d8e21 100644
--- a/drivers/net/fjes/fjes_trace.h
+++ b/drivers/net/fjes/fjes_trace.h
@@ -28,7 +28,7 @@ TRACE_EVENT(fjes_hw_issue_request_command,
 		__field(u8, cs_busy)
 		__field(u8, cs_complete)
 		__field(int, timeout)
-		__field(int, ret);
+		__field(int, ret)
 	),
 	TP_fast_assign(
 		__entry->cr_req = cr->bits.req_code;
diff --git a/drivers/net/wireless/ath/ath10k/trace.h b/drivers/net/wireless/ath/ath10k/trace.h
index ab916459d237..842e42ec814f 100644
--- a/drivers/net/wireless/ath/ath10k/trace.h
+++ b/drivers/net/wireless/ath/ath10k/trace.h
@@ -239,7 +239,7 @@ TRACE_EVENT(ath10k_wmi_dbglog,
 	TP_STRUCT__entry(
 		__string(device, dev_name(ar->dev))
 		__string(driver, dev_driver_string(ar->dev))
-		__field(u8, hw_type);
+		__field(u8, hw_type)
 		__field(size_t, buf_len)
 		__dynamic_array(u8, buf, buf_len)
 	),
@@ -269,7 +269,7 @@ TRACE_EVENT(ath10k_htt_pktlog,
 	TP_STRUCT__entry(
 		__string(device, dev_name(ar->dev))
 		__string(driver, dev_driver_string(ar->dev))
-		__field(u8, hw_type);
+		__field(u8, hw_type)
 		__field(u16, buf_len)
 		__dynamic_array(u8, pktlog, buf_len)
 	),
@@ -435,7 +435,7 @@ TRACE_EVENT(ath10k_htt_rx_desc,
 	TP_STRUCT__entry(
 		__string(device, dev_name(ar->dev))
 		__string(driver, dev_driver_string(ar->dev))
-		__field(u8, hw_type);
+		__field(u8, hw_type)
 		__field(u16, len)
 		__dynamic_array(u8, rxdesc, len)
 	),
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3362bae28b46..096203119934 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -329,7 +329,7 @@ TRACE_EVENT(xchk_btree_op_error,
 		__field(int, level)
 		__field(xfs_agnumber_t, agno)
 		__field(xfs_agblock_t, bno)
-		__field(int, ptr);
+		__field(int, ptr)
 		__field(int, error)
 		__field(void *, ret_ip)
 	),
@@ -414,7 +414,7 @@ TRACE_EVENT(xchk_btree_error,
 		__field(int, level)
 		__field(xfs_agnumber_t, agno)
 		__field(xfs_agblock_t, bno)
-		__field(int, ptr);
+		__field(int, ptr)
 		__field(void *, ret_ip)
 	),
 	TP_fast_assign(
@@ -452,7 +452,7 @@ TRACE_EVENT(xchk_ifork_btree_error,
 		__field(int, level)
 		__field(xfs_agnumber_t, agno)
 		__field(xfs_agblock_t, bno)
-		__field(int, ptr);
+		__field(int, ptr)
 		__field(void *, ret_ip)
 	),
 	TP_fast_assign(
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
-		__field(void *, leaf);
-		__field(int, pos);
+		__field(void *, leaf)
+		__field(int, pos)
 		__field(xfs_fileoff_t, startoff)
 		__field(xfs_fsblock_t, startblock)
 		__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..a379255c14a9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,22 @@ enum trace_reg {
 
 struct trace_event_call;
 
+#define TRACE_FUNCTION_TYPE ((const char *)~0UL)
+
+struct trace_event_fields {
+	const char *type;
+	union {
+		struct {
+			const char *name;
+			const int  size;
+			const int  align;
+			const int  is_signed;
+			const int  filter_type;
+		};
+		int (*define_fields)(struct trace_event_call *);
+	};
+};
+
 struct trace_event_class {
 	const char		*system;
 	void			*probe;
@@ -195,7 +211,7 @@ struct trace_event_class {
 #endif
 	int			(*reg)(struct trace_event_call *event,
 				       enum trace_reg type, void *data);
-	int			(*define_fields)(struct trace_event_call *);
+	struct trace_event_fields *fields_array;
 	struct list_head	*(*get_fields)(struct trace_event_call *);
 	struct list_head	fields;
 	int			(*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
 		TP_ARGS(file, old),
 
 		TP_STRUCT__entry(
-			__field(struct file *, file);
+			__field(struct file *, file)
 			__field(unsigned long, i_ino)
 			__field(dev_t, s_dev)
 			__field(errseq_t, old)
diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index a13830616107..477351897051 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -1507,7 +1507,7 @@ TRACE_EVENT(svcrdma_dma_map_page,
 	TP_ARGS(rdma, page),
 
 	TP_STRUCT__entry(
-		__field(const void *, page);
+		__field(const void *, page)
 		__string(device, rdma->sc_cm_id->device->name)
 		__string(addr, rdma->sc_xprt.xpt_remotebuf)
 	),
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef __field_ext
-#define __field_ext(type, item, filter_type)				\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field_ext(_type, _item, _filter_type) {			\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	.is_signed = is_signed_type(_type), .filter_type = _filter_type },
 
 #undef __field_struct_ext
-#define __field_struct_ext(type, item, filter_type)			\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 0, filter_type);			\
-	if (ret)							\
-		return ret;
+#define __field_struct_ext(_type, _item, _filter_type) {		\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	0, .filter_type = _filter_type },
 
 #undef __field
 #define __field(type, item)	__field_ext(type, item, FILTER_OTHER)
@@ -418,25 +412,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 #define __field_struct(type, item) __field_struct_ext(type, item, FILTER_OTHER)
 
 #undef __array
-#define __array(type, item, len)					\
-	do {								\
-		char *type_str = #type"["__stringify(len)"]";		\
-		BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);			\
-		BUILD_BUG_ON(len <= 0);					\
-		ret = trace_define_field(event_call, type_str, #item,	\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
-		if (ret)						\
-			return ret;					\
-	} while (0);
+#define __array(_type, _item, _len) {					\
+	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
+	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __dynamic_array
-#define __dynamic_array(type, item, len)				       \
-	ret = trace_define_field(event_call, "__data_loc " #type "[]", #item,  \
-				 offsetof(typeof(field), __data_loc_##item),   \
-				 sizeof(field.__data_loc_##item),	       \
-				 is_signed_type(type), FILTER_OTHER);
+#define __dynamic_array(_type, _item, _len) {				\
+	.type = "__data_loc " #_type "[]", .name = #_item,		\
+	.size = 4, .align = 4,						\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, -1)
@@ -446,16 +431,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print)	\
-static int notrace __init						\
-trace_event_define_fields_##call(struct trace_event_call *event_call)	\
-{									\
-	struct trace_event_raw_##call field;				\
-	int ret;							\
-									\
-	tstruct;							\
-									\
-	return ret;							\
-}
+static struct trace_event_fields trace_event_fields_##call[] = {	\
+	tstruct								\
+	{} };
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)
@@ -613,7 +591,7 @@ static inline notrace int trace_event_get_offsets_##call(		\
  *
  * static struct trace_event_class __used event_class_<template> = {
  *	.system			= "<system>",
- *	.define_fields		= trace_event_define_fields_<call>,
+ *	.fields_array		= trace_event_fields_<call>,
  *	.fields			= LIST_HEAD_INIT(event_class_##call.fields),
  *	.raw_init		= trace_event_raw_init,
  *	.probe			= trace_event_raw_event_##call,
@@ -761,7 +739,7 @@ _TRACE_PERF_PROTO(call, PARAMS(proto));					\
 static char print_fmt_##call[] = print;					\
 static struct trace_event_class __used __refdata event_class_##call = { \
 	.system			= TRACE_SYSTEM_STRING,			\
-	.define_fields		= trace_event_define_fields_##call,	\
+	.fields_array		= trace_event_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_##call.fields),\
 	.raw_init		= trace_event_raw_init,			\
 	.probe			= trace_event_raw_event_##call,		\
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d685c61085c0..298a7cacf146 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,9 @@ enum trace_type {
 #undef __field
 #define __field(type, item)		type	item;
 
+#undef __field_fn
+#define __field_fn(type, item)		type	item;
+
 #undef __field_struct
 #define __field_struct(type, item)	__field(type, item)
 
@@ -68,26 +71,22 @@ enum trace_type {
 #define F_STRUCT(args...)		args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 	struct struct_name {						\
 		struct trace_entry	ent;				\
 		tstruct							\
 	}
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	\
-			 filter, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	regfn)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print,	\
-			    filter)					\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter) __packed
+#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print)) __packed
 
 #include "trace_entries.h"
 
@@ -1899,17 +1898,15 @@ extern void tracing_log_err(struct trace_array *tr,
 #define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print)	\
 	extern struct trace_event_call					\
 	__aligned(4) event_##call;
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter)	\
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print)	\
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print, filter) \
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print) \
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #include "trace_entries.h"
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index fc8e97328e54..3e9d81608284 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -61,15 +61,13 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
 	TRACE_FN,
 
 	F_STRUCT(
-		__field(	unsigned long,	ip		)
-		__field(	unsigned long,	parent_ip	)
+		__field_fn(	unsigned long,	ip		)
+		__field_fn(	unsigned long,	parent_ip	)
 	),
 
 	F_printk(" %ps <-- %ps",
 		 (void *)__entry->ip, (void *)__entry->parent_ip),
 
-	FILTER_TRACE_FN,
-
 	perf_ftrace_event_register
 );
 
@@ -84,9 +82,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
 		__field_desc(	int,		graph_ent,	depth		)
 	),
 
-	F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
-
-	FILTER_OTHER
+	F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth)
 );
 
 /* Function return entry */
@@ -97,18 +93,16 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
 	F_STRUCT(
 		__field_struct(	struct ftrace_graph_ret,	ret	)
 		__field_desc(	unsigned long,	ret,		func	)
+		__field_desc(	unsigned long,	ret,		overrun	)
 		__field_desc(	unsigned long long, ret,	calltime)
 		__field_desc(	unsigned long long, ret,	rettime	)
-		__field_desc(	unsigned long,	ret,		overrun	)
 		__field_desc(	int,		ret,		depth	)
 	),
 
 	F_printk("<-- %ps (%d) (start: %llx  end: %llx) over: %d",
 		 (void *)__entry->func, __entry->depth,
 		 __entry->calltime, __entry->rettime,
-		 __entry->depth),
-
-	FILTER_OTHER
+		 __entry->depth)
 );
 
 /*
@@ -137,9 +131,7 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==> %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu),
-
-	FILTER_OTHER
+		 __entry->next_cpu)
 );
 
 /*
@@ -157,9 +149,7 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==+ %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu),
-
-	FILTER_OTHER
+		 __entry->next_cpu)
 );
 
 /*
@@ -183,9 +173,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
 		 (void *)__entry->caller[0], (void *)__entry->caller[1],
 		 (void *)__entry->caller[2], (void *)__entry->caller[3],
 		 (void *)__entry->caller[4], (void *)__entry->caller[5],
-		 (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
-	FILTER_OTHER
+		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
 FTRACE_ENTRY(user_stack, userstack_entry,
@@ -203,9 +191,7 @@ FTRACE_ENTRY(user_stack, userstack_entry,
 		 (void *)__entry->caller[0], (void *)__entry->caller[1],
 		 (void *)__entry->caller[2], (void *)__entry->caller[3],
 		 (void *)__entry->caller[4], (void *)__entry->caller[5],
-		 (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
-	FILTER_OTHER
+		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
 /*
@@ -222,9 +208,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
 	),
 
 	F_printk("%ps: %s",
-		 (void *)__entry->ip, __entry->fmt),
-
-	FILTER_OTHER
+		 (void *)__entry->ip, __entry->fmt)
 );
 
 FTRACE_ENTRY_REG(print, print_entry,
@@ -239,8 +223,6 @@ FTRACE_ENTRY_REG(print, print_entry,
 	F_printk("%ps: %s",
 		 (void *)__entry->ip, __entry->buf),
 
-	FILTER_OTHER,
-
 	ftrace_event_register
 );
 
@@ -254,9 +236,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry,
 	),
 
 	F_printk("id:%04x %08x",
-		 __entry->id, (int)__entry->buf[0]),
-
-	FILTER_OTHER
+		 __entry->id, (int)__entry->buf[0])
 );
 
 FTRACE_ENTRY(bputs, bputs_entry,
@@ -269,9 +249,7 @@ FTRACE_ENTRY(bputs, bputs_entry,
 	),
 
 	F_printk("%ps: %s",
-		 (void *)__entry->ip, __entry->str),
-
-	FILTER_OTHER
+		 (void *)__entry->ip, __entry->str)
 );
 
 FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -283,16 +261,14 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
 		__field_desc(	resource_size_t, rw,	phys	)
 		__field_desc(	unsigned long,	rw,	value	)
 		__field_desc(	unsigned long,	rw,	pc	)
-		__field_desc(	int, 		rw,	map_id	)
+		__field_desc(	int,		rw,	map_id	)
 		__field_desc(	unsigned char,	rw,	opcode	)
 		__field_desc(	unsigned char,	rw,	width	)
 	),
 
 	F_printk("%lx %lx %lx %d %x %x",
 		 (unsigned long)__entry->phys, __entry->value, __entry->pc,
-		 __entry->map_id, __entry->opcode, __entry->width),
-
-	FILTER_OTHER
+		 __entry->map_id, __entry->opcode, __entry->width)
 );
 
 FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -304,15 +280,13 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
 		__field_desc(	resource_size_t, map,	phys	)
 		__field_desc(	unsigned long,	map,	virt	)
 		__field_desc(	unsigned long,	map,	len	)
-		__field_desc(	int, 		map,	map_id	)
+		__field_desc(	int,		map,	map_id	)
 		__field_desc(	unsigned char,	map,	opcode	)
 	),
 
 	F_printk("%lx %lx %lx %d %x",
 		 (unsigned long)__entry->phys, __entry->virt, __entry->len,
-		 __entry->map_id, __entry->opcode),
-
-	FILTER_OTHER
+		 __entry->map_id, __entry->opcode)
 );
 
 
@@ -334,9 +308,7 @@ FTRACE_ENTRY(branch, trace_branch,
 	F_printk("%u:%s:%s (%u)%s",
 		 __entry->line,
 		 __entry->func, __entry->file, __entry->correct,
-		 __entry->constant ? " CONSTANT" : ""),
-
-	FILTER_OTHER
+		 __entry->constant ? " CONSTANT" : "")
 );
 
 
@@ -362,7 +334,5 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
 		 __entry->duration,
 		 __entry->outer_duration,
 		 __entry->nmi_total_ts,
-		 __entry->nmi_count),
-
-	FILTER_OTHER
+		 __entry->nmi_count)
 );
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fba87d10f0c1..5ab10c3dce78 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -24,6 +24,7 @@
 #include <linux/delay.h>
 
 #include <trace/events/sched.h>
+#include <trace/syscall.h>
 
 #include <asm/setup.h>
 
@@ -1990,7 +1991,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 	 */
 	head = trace_get_fields(call);
 	if (list_empty(head)) {
-		ret = call->class->define_fields(call);
+		struct trace_event_fields *field = call->class->fields_array;
+		unsigned int offset = sizeof(struct trace_entry);
+
+		for (; field->type; field++) {
+			if (field->type == TRACE_FUNCTION_TYPE) {
+				ret = field->define_fields(call);
+				break;
+			}
+
+			offset = ALIGN(offset, field->align);
+			ret = trace_define_field(call, field->type, field->name,
+						 offset, field->size,
+						 field->is_signed, field->filter_type);
+			if (ret)
+				break;
+
+			offset += field->size;
+		}
 		if (ret < 0) {
 			pr_warn("Could not initialize trace point events/%s\n",
 				name);
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 57648c5aa679..2a4d3ceefd71 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1133,6 +1133,12 @@ static struct synth_event *find_synth_event(const char *name)
 	return NULL;
 }
 
+static struct trace_event_fields synth_event_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = synth_event_define_fields },
+	{}
+};
+
 static int register_synth_event(struct synth_event *event)
 {
 	struct trace_event_call *call = &event->call;
@@ -1154,7 +1160,7 @@ static int register_synth_event(struct synth_event *event)
 
 	INIT_LIST_HEAD(&call->class->fields);
 	call->event.funcs = &synth_event_funcs;
-	call->class->define_fields = synth_event_define_fields;
+	call->class->fields_array = synth_event_fields_array;
 
 	ret = register_trace_event(&call->event);
 	if (!ret) {
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 45630a76ed3a..6d64c1c19fd5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -29,10 +29,8 @@ static int ftrace_event_register(struct trace_event_call *call,
  * function and thus become accesible via perf.
  */
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
-			 filter, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 /* not needed for this file */
 #undef __field_struct
@@ -41,6 +39,9 @@ static int ftrace_event_register(struct trace_event_call *call,
 #undef __field
 #define __field(type, item)				type item;
 
+#undef __field_fn
+#define __field_fn(type, item)				type item;
+
 #undef __field_desc
 #define __field_desc(type, container, item)		type item;
 
@@ -60,7 +61,7 @@ static int ftrace_event_register(struct trace_event_call *call,
 #define F_printk(fmt, args...) fmt, args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 struct ____ftrace_##name {						\
 	tstruct								\
 };									\
@@ -73,76 +74,46 @@ static void __always_unused ____ftrace_check_##name(void)		\
 }
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter)	\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
-		     filter)
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print)		\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
 
 #include "trace_entries.h"
 
+#undef __field_ext
+#define __field_ext(_type, _item, _filter_type) {			\
+	.type = #_type, .name = #_item,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	is_signed_type(_type), .filter_type = _filter_type },
+
 #undef __field
-#define __field(type, item)						\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)
+
+#undef __field_fn
+#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)
 
 #undef __field_desc
-#define __field_desc(type, container, item)	\
-	ret = trace_define_field(event_call, #type, #item,		\
-				 offsetof(typeof(field),		\
-					  container.item),		\
-				 sizeof(field.container.item),		\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)
 
 #undef __array
-#define __array(type, item, len)					\
-	do {								\
-		char *type_str = #type"["__stringify(len)"]";		\
-		BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);			\
-		ret = trace_define_field(event_call, type_str, #item,	\
-				 offsetof(typeof(field), item),		\
-				 sizeof(field.item),			\
-				 is_signed_type(type), filter_type);	\
-		if (ret)						\
-			return ret;					\
-	} while (0);
+#define __array(_type, _item, _len) {					\
+	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
+	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
+	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef __array_desc
-#define __array_desc(type, container, item, len)			\
-	BUILD_BUG_ON(len > MAX_FILTER_STR_VAL);				\
-	ret = trace_define_field(event_call, #type "[" #len "]", #item,	\
-				 offsetof(typeof(field),		\
-					  container.item),		\
-				 sizeof(field.container.item),		\
-				 is_signed_type(type), filter_type);	\
-	if (ret)							\
-		return ret;
+#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)
 
 #undef __dynamic_array
-#define __dynamic_array(type, item)					\
-	ret = trace_define_field(event_call, #type "[]", #item,  \
-				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), filter_type);\
-	if (ret)							\
-		return ret;
+#define __dynamic_array(_type, _item) {					\
+	.type = #_type "[]", .name = #_item,				\
+	.size = 0, .align = __alignof__(_type),				\
+	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
-static int __init							\
-ftrace_define_fields_##name(struct trace_event_call *event_call)	\
-{									\
-	struct struct_name field;					\
-	int ret;							\
-	int filter_type = filter;					\
-									\
-	tstruct;							\
-									\
-	return ret;							\
-}
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
+static struct trace_event_fields ftrace_event_fields_##name[] = {	\
+	tstruct								\
+	{} };
 
 #include "trace_entries.h"
 
@@ -152,6 +123,9 @@ ftrace_define_fields_##name(struct trace_event_call *event_call)	\
 #undef __field
 #define __field(type, item)
 
+#undef __field_fn
+#define __field_fn(type, item)
+
 #undef __field_desc
 #define __field_desc(type, container, item)
 
@@ -168,12 +142,10 @@ ftrace_define_fields_##name(struct trace_event_call *event_call)	\
 #define F_printk(fmt, args...) __stringify(fmt) ", "  __stringify(args)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
-			 regfn)						\
-									\
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
 struct trace_event_class __refdata event_class_ftrace_##call = {	\
 	.system			= __stringify(TRACE_SYSTEM),		\
-	.define_fields		= ftrace_define_fields_##call,		\
+	.fields_array		= ftrace_event_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
 	.reg			= regfn,				\
 };									\
@@ -191,9 +163,9 @@ struct trace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter)	\
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
 	FTRACE_ENTRY_REG(call, struct_name, etype,			\
-			 PARAMS(tstruct), PARAMS(print), filter, NULL)
+			 PARAMS(tstruct), PARAMS(print), NULL)
 
 bool ftrace_event_is_function(struct trace_event_call *call)
 {
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1552a95c743b..66e0a8ff1c01 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1534,16 +1534,28 @@ static struct trace_event_functions kprobe_funcs = {
 	.trace		= print_kprobe_event
 };
 
+static struct trace_event_fields kretprobe_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = kretprobe_event_define_fields },
+	{}
+};
+
+static struct trace_event_fields kprobe_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = kprobe_event_define_fields },
+	{}
+};
+
 static inline void init_trace_event_call(struct trace_kprobe *tk)
 {
 	struct trace_event_call *call = trace_probe_event_call(&tk->tp);
 
 	if (trace_kprobe_is_return(tk)) {
 		call->event.funcs = &kretprobe_funcs;
-		call->class->define_fields = kretprobe_event_define_fields;
+		call->class->fields_array = kretprobe_fields_array;
 	} else {
 		call->event.funcs = &kprobe_funcs;
-		call->class->define_fields = kprobe_event_define_fields;
+		call->class->fields_array = kprobe_fields_array;
 	}
 
 	call->flags = TRACE_EVENT_FL_KPROBE;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index fa8fbff736d6..53935259f701 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -198,11 +198,10 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
 
 extern char *__bad_type_size(void);
 
-#define SYSCALL_FIELD(type, field, name)				\
-	sizeof(type) != sizeof(trace.field) ?				\
-		__bad_type_size() :					\
-		#type, #name, offsetof(typeof(trace), field),		\
-		sizeof(trace.field), is_signed_type(type)
+#define SYSCALL_FIELD(_type, _name) {					\
+	.type = #_type, .name = #_name,					\
+	.size = sizeof(_type), .align = __alignof__(_type),		\
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }
 
 static int __init
 __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -269,42 +268,22 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
-	int ret;
-	int i;
 	int offset = offsetof(typeof(trace), args);
-
-	ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
-				 FILTER_OTHER);
-	if (ret)
-		return ret;
+	int ret, i;
 
 	for (i = 0; i < meta->nb_args; i++) {
 		ret = trace_define_field(call, meta->types[i],
 					 meta->args[i], offset,
 					 sizeof(unsigned long), 0,
 					 FILTER_OTHER);
+		if (ret)
+			break;
 		offset += sizeof(unsigned long);
 	}
 
 	return ret;
 }
 
-static int __init syscall_exit_define_fields(struct trace_event_call *call)
-{
-	struct syscall_trace_exit trace;
-	int ret;
-
-	ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
-				 FILTER_OTHER);
-	if (ret)
-		return ret;
-
-	ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
-				 FILTER_OTHER);
-
-	return ret;
-}
-
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
 	struct trace_array *tr = data;
@@ -502,6 +481,13 @@ static int __init init_syscall_trace(struct trace_event_call *call)
 	return id;
 }
 
+static struct trace_event_fields __refdata syscall_enter_fields_array[] = {
+	SYSCALL_FIELD(int, __syscall_nr),
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = syscall_enter_define_fields },
+	{}
+};
+
 struct trace_event_functions enter_syscall_print_funcs = {
 	.trace		= print_syscall_enter,
 };
@@ -513,7 +499,7 @@ struct trace_event_functions exit_syscall_print_funcs = {
 struct trace_event_class __refdata event_class_syscall_enter = {
 	.system		= "syscalls",
 	.reg		= syscall_enter_register,
-	.define_fields	= syscall_enter_define_fields,
+	.fields_array	= syscall_enter_fields_array,
 	.get_fields	= syscall_get_enter_fields,
 	.raw_init	= init_syscall_trace,
 };
@@ -521,7 +507,11 @@ struct trace_event_class __refdata event_class_syscall_enter = {
 struct trace_event_class __refdata event_class_syscall_exit = {
 	.system		= "syscalls",
 	.reg		= syscall_exit_register,
-	.define_fields	= syscall_exit_define_fields,
+	.fields_array	= (struct trace_event_fields[]){
+		SYSCALL_FIELD(int, __syscall_nr),
+		SYSCALL_FIELD(long, ret),
+		{}
+	},
 	.fields		= LIST_HEAD_INIT(event_class_syscall_exit.fields),
 	.raw_init	= init_syscall_trace,
 };
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 352073d36585..476a382f1f1b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1507,12 +1507,17 @@ static struct trace_event_functions uprobe_funcs = {
 	.trace		= print_uprobe_event
 };
 
+static struct trace_event_fields uprobe_fields_array[] = {
+	{ .type = TRACE_FUNCTION_TYPE,
+	  .define_fields = uprobe_event_define_fields },
+	{}
+};
+
 static inline void init_trace_event_call(struct trace_uprobe *tu)
 {
 	struct trace_event_call *call = trace_probe_event_call(&tu->tp);
-
 	call->event.funcs = &uprobe_funcs;
-	call->class->define_fields = uprobe_event_define_fields;
+	call->class->fields_array = uprobe_fields_array;
 
 	call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
 	call->class->reg = trace_uprobe_register;
diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
index 4768322dc202..427f51a0a994 100644
--- a/net/mac80211/trace.h
+++ b/net/mac80211/trace.h
@@ -408,20 +408,20 @@ TRACE_EVENT(drv_bss_info_changed,
 		__field(u32, basic_rates)
 		__array(int, mcast_rate, NUM_NL80211_BANDS)
 		__field(u16, ht_operation_mode)
-		__field(s32, cqm_rssi_thold);
-		__field(s32, cqm_rssi_hyst);
-		__field(u32, channel_width);
-		__field(u32, channel_cfreq1);
+		__field(s32, cqm_rssi_thold)
+		__field(s32, cqm_rssi_hyst)
+		__field(u32, channel_width)
+		__field(u32, channel_cfreq1)
 		__dynamic_array(u32, arp_addr_list,
 				info->arp_addr_cnt > IEEE80211_BSS_ARP_ADDR_LIST_LEN ?
 					IEEE80211_BSS_ARP_ADDR_LIST_LEN :
-					info->arp_addr_cnt);
-		__field(int, arp_addr_cnt);
-		__field(bool, qos);
-		__field(bool, idle);
-		__field(bool, ps);
-		__dynamic_array(u8, ssid, info->ssid_len);
-		__field(bool, hidden_ssid);
+					info->arp_addr_cnt)
+		__field(int, arp_addr_cnt)
+		__field(bool, qos)
+		__field(bool, idle)
+		__field(bool, ps)
+		__dynamic_array(u8, ssid, info->ssid_len)
+		__field(bool, hidden_ssid)
 		__field(int, txpower)
 		__field(u8, p2p_oppps_ctwindow)
 	),
@@ -1672,8 +1672,8 @@ TRACE_EVENT(drv_start_ap,
 		VIF_ENTRY
 		__field(u8, dtimper)
 		__field(u16, bcnint)
-		__dynamic_array(u8, ssid, info->ssid_len);
-		__field(bool, hidden_ssid);
+		__dynamic_array(u8, ssid, info->ssid_len)
+		__field(bool, hidden_ssid)
 	),
 
 	TP_fast_assign(
@@ -1739,7 +1739,7 @@ TRACE_EVENT(drv_join_ibss,
 		VIF_ENTRY
 		__field(u8, dtimper)
 		__field(u16, bcnint)
-		__dynamic_array(u8, ssid, info->ssid_len);
+		__dynamic_array(u8, ssid, info->ssid_len)
 	),
 
 	TP_fast_assign(
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index d98ad2b3143b..d5eaa4928baa 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -2009,7 +2009,7 @@ TRACE_EVENT(rdev_start_nan,
 		WIPHY_ENTRY
 		WDEV_ENTRY
 		__field(u8, master_pref)
-		__field(u8, bands);
+		__field(u8, bands)
 	),
 	TP_fast_assign(
 		WIPHY_ASSIGN;
@@ -2031,8 +2031,8 @@ TRACE_EVENT(rdev_nan_change_conf,
 		WIPHY_ENTRY
 		WDEV_ENTRY
 		__field(u8, master_pref)
-		__field(u8, bands);
-		__field(u32, changes);
+		__field(u8, bands)
+		__field(u32, changes)
 	),
 	TP_fast_assign(
 		WIPHY_ASSIGN;

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 16:43             ` Peter Zijlstra
@ 2019-10-24 18:17               ` Steven Rostedt
  2019-10-24 20:24                 ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Steven Rostedt @ 2019-10-24 18:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Thu, 24 Oct 2019 18:43:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > 
> >   CC [M]  drivers/gpu/drm/i915/gem/i915_gem_context.o
> > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> >   call->class->define_fields = synth_event_define_fields;
> >                ^~~~~~~~~~~~~
> >                get_fields
> > make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> > make[3]: *** Waiting for unfinished jobs....  
> 
> allmodconfig clean
> 
> (omg, so much __field(); fail)

Well it built without warnings and passed the ftrace selftests.

I haven't ran it through the full suite, but that can wait for the v5.

-- Steve

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 10:59             ` Peter Zijlstra
@ 2019-10-24 18:31               ` Josh Poimboeuf
  2019-10-24 20:33                 ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Josh Poimboeuf @ 2019-10-24 18:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Thu, Oct 24, 2019 at 12:59:04PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 23, 2019 at 12:15:14PM -0500, Josh Poimboeuf wrote:
> > On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> > > @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
> > >  
> > >  		val = sym->st_value + rel[i].r_addend;
> > >  
> > > +		/*
> > > +		 * .klp.rela.* sections should only contain module
> > > +		 * related RELAs. All core-kernel RELAs should be in
> > > +		 * normal .rela.* sections and be applied when loading
> > > +		 * the patch module itself.
> > > +		 */
> > > +		WARN_ON_ONCE(klp && core_kernel_text(val));
> > > +
> > 
> > This isn't quite true, we also use .klp.rela sections to access
> > unexported vmlinux symbols.
> 
> Yes, you said in that earlier email. That all makes it really hard to
> validate this. But unless we validate it, it will stay buggy :/
> 
> Hmmm.... /me ponders
> 
> The alternative would be to apply the .klp.rela.* sections twice, once
> at patch-module load time and then apply those core_kernel_text()
> entries, and then once later and skip over them.
> 
> How's this?

How about something like this?  Completely untested, but if you agree
with this approach I could hack up kpatch-build to test it properly.

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index ab4a4606d19b..597bf32bc591 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -239,6 +239,17 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
 		if (ret)
 			return ret;
 
+		/*
+		 * Prevent module patches from using livepatch relas for
+		 * vmlinux symbols.  Presumably such symbols are exported and
+		 * normal relas can instead be used at patch module loading
+		 * time.
+		 */
+		if (!vmlinux && core_kernel_text(addr)) {
+			pr_err("unsupported livepatch symbol\n");
+			return -EINVAL;
+		}
+
 		sym->st_value = addr;
 	}
 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 18:17               ` Steven Rostedt
@ 2019-10-24 20:24                 ` Peter Zijlstra
  2019-10-24 20:28                   ` Steven Rostedt
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 20:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Thu, Oct 24, 2019 at 02:17:31PM -0400, Steven Rostedt wrote:
> On Thu, 24 Oct 2019 18:43:20 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > 
> > >   CC [M]  drivers/gpu/drm/i915/gem/i915_gem_context.o
> > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> > >   call->class->define_fields = synth_event_define_fields;
> > >                ^~~~~~~~~~~~~
> > >                get_fields
> > > make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> > > make[3]: *** Waiting for unfinished jobs....  
> > 
> > allmodconfig clean
> > 
> > (omg, so much __field(); fail)
> 
> Well it built without warnings and passed the ftrace selftests.
> 
> I haven't ran it through the full suite, but that can wait for the v5.

I'll push it out to git to the 0day robot can have a go at it. For v5
I'm still staring at some KLP borkage. Then again, maybe I should delay
that last bit and make that a new series.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 20:24                 ` Peter Zijlstra
@ 2019-10-24 20:28                   ` Steven Rostedt
  0 siblings, 0 replies; 70+ messages in thread
From: Steven Rostedt @ 2019-10-24 20:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, mhiramat, bristot, jbaron, torvalds, tglx,
	mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu

On Thu, 24 Oct 2019 22:24:55 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Oct 24, 2019 at 02:17:31PM -0400, Steven Rostedt wrote:
> > On Thu, 24 Oct 2019 18:43:20 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >   
> > > > 
> > > >   CC [M]  drivers/gpu/drm/i915/gem/i915_gem_context.o
> > > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> > > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> > > >   call->class->define_fields = synth_event_define_fields;
> > > >                ^~~~~~~~~~~~~
> > > >                get_fields
> > > > make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> > > > make[3]: *** Waiting for unfinished jobs....    
> > > 
> > > allmodconfig clean
> > > 
> > > (omg, so much __field(); fail)  
> > 
> > Well it built without warnings and passed the ftrace selftests.
> > 
> > I haven't ran it through the full suite, but that can wait for the v5.  
> 
> I'll push it out to git to the 0day robot can have a go at it. For v5
> I'm still staring at some KLP borkage. Then again, maybe I should delay
> that last bit and make that a new series.

Also note, that I'm about to travel to Lyon for Open Source Summit,
thus my looking at this will come pretty much to a stand still :-/

-- Steve

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 18:31               ` Josh Poimboeuf
@ 2019-10-24 20:33                 ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-24 20:33 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jeyu

On Thu, Oct 24, 2019 at 01:31:15PM -0500, Josh Poimboeuf wrote:

> How about something like this?  Completely untested, but if you agree
> with this approach I could hack up kpatch-build to test it properly.
> 
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index ab4a4606d19b..597bf32bc591 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -239,6 +239,17 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
>  		if (ret)
>  			return ret;
>  
> +		/*
> +		 * Prevent module patches from using livepatch relas for
> +		 * vmlinux symbols.  Presumably such symbols are exported and
> +		 * normal relas can instead be used at patch module loading
> +		 * time.
> +		 */
> +		if (!vmlinux && core_kernel_text(addr)) {
> +			pr_err("unsupported livepatch symbol\n");
> +			return -EINVAL;
> +		}
> +
>  		sym->st_value = addr;
>  	}

If that works, this is much simpler and therefore preferred.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-24 13:16           ` Peter Zijlstra
@ 2019-10-25  6:44             ` Petr Mladek
  2019-10-25  8:43               ` Peter Zijlstra
  2019-10-25  9:16               ` Peter Zijlstra
  0 siblings, 2 replies; 70+ messages in thread
From: Petr Mladek @ 2019-10-25  6:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching

On Thu 2019-10-24 15:16:34, Peter Zijlstra wrote:
> On Wed, Oct 23, 2019 at 12:00:25PM -0500, Josh Poimboeuf wrote:
> 
> > > This then raises a number of questions:
> > > 
> > >  1) why is that RELA (that obviously does not depend on any module)
> > >     applied so late?
> > 
> > Good question.  The 'pv_ops' symbol is exported by the core kernel, so I
> > can't see any reason why we'd need to apply that rela late.  In theory,
> > kpatch-build isn't supposed to convert that to a klp rela.  Maybe
> > something went wrong in the patch creation code.
> > 
> > I'm also questioning why we even need to apply the parainstructions
> > section late.  Maybe we can remove that apply_paravirt() call
> > altogether, along with .klp.arch.parainstruction sections.

Hmm, the original bug report against livepatching was actually about
paravirt ops, see below.


> > I'll need to look into it...
> 
> Right, that really should be able to run early. Esp. after commit
> 
>   11e86dc7f274 ("x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()")
> 
> paravirt patching is unconditional. We _never_ run with the indirect
> call except very early boot, but modules should have them patched way
> before their init section runs.
> 
> We rely on this for spectre-v2 and friends.

Livepatching has the same requirement. The module code has to be fully
livepatched before the module gets actually used. It means before
mod->init() is called and before the module is moved into
MODULE_STATE_LIVE state.


> > >  3) Is there ever a possible module-dependent RELA to a paravirt /
> > >     alternative site?
> > 
> > Good question...
> 
> > > Then for 3) we only have alternatives left, and I _think_ it unlikely to
> > > be the case, but I'll have to have a hard look at that.
> > 
> > I'm not sure about alternatives, but maybe we can enforce such
> > limitations with tooling and/or kernel checks.
> 
> Right, so on IRC you implied you might have some additional details on
> how alternatives were affected; did you manage to dig that up?

I am not sure what Josh had in mind. But the problem with livepatches,
paravort ops, and alternatives was described in the related patchset, see
https://lkml.kernel.org/r/1471481911-5003-1-git-send-email-jeyu@redhat.com

The original bug report is
https://lkml.kernel.org/r/20160329120518.GA21252@canonical.com

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-25  6:44             ` Petr Mladek
@ 2019-10-25  8:43               ` Peter Zijlstra
  2019-10-25 10:06                 ` Peter Zijlstra
  2019-10-25  9:16               ` Peter Zijlstra
  1 sibling, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-25  8:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching

On Fri, Oct 25, 2019 at 08:44:56AM +0200, Petr Mladek wrote:
> On Thu 2019-10-24 15:16:34, Peter Zijlstra wrote:
> > On Wed, Oct 23, 2019 at 12:00:25PM -0500, Josh Poimboeuf wrote:
> > 
> > > > This then raises a number of questions:
> > > > 
> > > >  1) why is that RELA (that obviously does not depend on any module)
> > > >     applied so late?
> > > 
> > > Good question.  The 'pv_ops' symbol is exported by the core kernel, so I
> > > can't see any reason why we'd need to apply that rela late.  In theory,
> > > kpatch-build isn't supposed to convert that to a klp rela.  Maybe
> > > something went wrong in the patch creation code.
> > > 
> > > I'm also questioning why we even need to apply the parainstructions
> > > section late.  Maybe we can remove that apply_paravirt() call
> > > altogether, along with .klp.arch.parainstruction sections.
> 
> Hmm, the original bug report against livepatching was actually about
> paravirt ops, see below.

Yes, I found that.

> > > I'm not sure about alternatives, but maybe we can enforce such
> > > limitations with tooling and/or kernel checks.
> > 
> > Right, so on IRC you implied you might have some additional details on
> > how alternatives were affected; did you manage to dig that up?
> 
> I am not sure what Josh had in mind. But the problem with livepatches,
> paravort ops, and alternatives was described in the related patchset, see
> https://lkml.kernel.org/r/1471481911-5003-1-git-send-email-jeyu@redhat.com

Yes, and my complaint there is that that thread is void of useful
content.

> The original bug report is
> https://lkml.kernel.org/r/20160329120518.GA21252@canonical.com

I found the github (*groan*) link in the thread above.

From all that I could only make that the paravirt stuff is just doing it
wrong (see earlier emails, core-kernel RELAs really should be applied at
the time of patch-module load, there's no excuse for them to be delayed
to the .klp.rela. section) at which point paravirt will also magically
work.

But none of that explains why apply_alternatives() is also delayed.

So I'm very tempted to just revert that patchset for doing it all
wrong.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-25  6:44             ` Petr Mladek
  2019-10-25  8:43               ` Peter Zijlstra
@ 2019-10-25  9:16               ` Peter Zijlstra
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-25  9:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching

On Fri, Oct 25, 2019 at 08:44:56AM +0200, Petr Mladek wrote:
> On Thu 2019-10-24 15:16:34, Peter Zijlstra wrote:

> > Right, that really should be able to run early. Esp. after commit
> > 
> >   11e86dc7f274 ("x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()")
> > 
> > paravirt patching is unconditional. We _never_ run with the indirect
> > call except very early boot, but modules should have them patched way
> > before their init section runs.
> > 
> > We rely on this for spectre-v2 and friends.
> 
> Livepatching has the same requirement. The module code has to be fully
> livepatched before the module gets actually used.

Right, and that is just saying that all paravirt RELAs (pv_ops) can
basically be deleted from modules.

Which avoids the reported problem in yet another way.

> It means before mod->init() is called and before the module is moved
> into MODULE_STATE_LIVE state.

Funny thing, currently ftrace is running code before all that. It runs
code before klp_module_coming(), before jump_label patching.

My other patch in this thread fixes that.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-25  8:43               ` Peter Zijlstra
@ 2019-10-25 10:06                 ` Peter Zijlstra
  2019-10-25 13:50                   ` Josh Poimboeuf
  2019-10-26  1:17                   ` Josh Poimboeuf
  0 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-25 10:06 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching, Mark Rutland

On Fri, Oct 25, 2019 at 10:43:00AM +0200, Peter Zijlstra wrote:

> But none of that explains why apply_alternatives() is also delayed.
> 
> So I'm very tempted to just revert that patchset for doing it all
> wrong.

And I've done just that. This includes Josh's validation patch, the
revert and my klp_appy_relocations_add() patches with the removal of
module_disable_ro().

Josh, can you test or give me clue on how to test? I need to run a few
errands today, but I'll try and have a poke either tonight or tomorrow.

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/rwx

After this it is waiting for Mark's argh64 patches to land:

  https://lkml.kernel.org/r/20191021163426.9408-5-mark.rutland@arm.com

And then we can go and delete module_disable_ro() entirely -- hooray!

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-25 10:06                 ` Peter Zijlstra
@ 2019-10-25 13:50                   ` Josh Poimboeuf
  2019-10-26  1:17                   ` Josh Poimboeuf
  1 sibling, 0 replies; 70+ messages in thread
From: Josh Poimboeuf @ 2019-10-25 13:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching, Mark Rutland

On Fri, Oct 25, 2019 at 12:06:12PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 25, 2019 at 10:43:00AM +0200, Peter Zijlstra wrote:
> 
> > But none of that explains why apply_alternatives() is also delayed.
> > 
> > So I'm very tempted to just revert that patchset for doing it all
> > wrong.
> 
> And I've done just that. This includes Josh's validation patch, the
> revert and my klp_appy_relocations_add() patches with the removal of
> module_disable_ro().
> 
> Josh, can you test or give me clue on how to test? I need to run a few
> errands today, but I'll try and have a poke either tonight or tomorrow.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/rwx

Thanks.  I'll work on hacking up kpatch-build to support this, and then
I'll need to run it through a lot of testing to make sure this was a
good idea.

-- 
Josh


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-25 10:06                 ` Peter Zijlstra
  2019-10-25 13:50                   ` Josh Poimboeuf
@ 2019-10-26  1:17                   ` Josh Poimboeuf
  2019-10-28 10:07                     ` Peter Zijlstra
  2019-10-28 10:43                     ` Peter Zijlstra
  1 sibling, 2 replies; 70+ messages in thread
From: Josh Poimboeuf @ 2019-10-26  1:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching, Mark Rutland

On Fri, Oct 25, 2019 at 12:06:12PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 25, 2019 at 10:43:00AM +0200, Peter Zijlstra wrote:
> 
> > But none of that explains why apply_alternatives() is also delayed.
> > 
> > So I'm very tempted to just revert that patchset for doing it all
> > wrong.
> 
> And I've done just that. This includes Josh's validation patch, the
> revert and my klp_appy_relocations_add() patches with the removal of
> module_disable_ro().
> 
> Josh, can you test or give me clue on how to test? I need to run a few
> errands today, but I'll try and have a poke either tonight or tomorrow.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/rwx

I looked at this today.  A few potential tweaks:

- The new klp_apply_relocate_add() interface isn't needed.  Instead
  apply_relocate_add() can use the module state to decide whether to
  text_poke().

- For robustness I think we need to apply vmlinux-specific klp
  relocations at the same time as normal relocations.

Rough untested changes below.  I still need to finish changing
kpatch-build and then I'll need to do a LOT of testing.

I can take over the livepatch-specific patches if you want.  Or however
you want to do it.

diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 7fc519b9b4e0..6a70213854f0 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -451,14 +451,11 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 		       unsigned int symindex, unsigned int relsec,
 		       struct module *me)
 {
-	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memwrite);
-}
+	int ret;
+	bool early = me->state != MODULE_STATE_LIVE;
 
-int klp_apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
-		       unsigned int symindex, unsigned int relsec,
-		       struct module *me)
-{
-	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, s390_kernel_write);
+	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me,
+				    early ? memwrite : s390_kernel_write);
 }
 
 int module_finalize(const Elf_Ehdr *hdr,
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 5eee618a98c5..30174798ff79 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -222,20 +222,14 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		   unsigned int symindex,
 		   unsigned int relsec,
 		   struct module *me)
-{
-	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
-}
-
-int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
-		   const char *strtab,
-		   unsigned int symindex,
-		   unsigned int relsec,
-		   struct module *me)
 {
 	int ret;
+	bool early = me->state != MODULE_STATE_LIVE;
+
+	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me,
+				   early ? memcpy : text_poke);
 
-	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
-	if (!ret)
+	if (!ret && !early)
 		text_poke_sync();
 
 	return ret;
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index cc18f945bdb2..b00170696db2 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -214,12 +214,7 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long id,
 void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor);
 void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor);
 
-
-extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
-			      const char *strtab,
-			      unsigned int symindex,
-			      unsigned int relsec,
-			      struct module *me);
+int klp_write_relocations(struct module *mod, struct klp_object *obj);
 
 #else /* !CONFIG_LIVEPATCH */
 
@@ -229,6 +224,12 @@ static inline bool klp_patch_pending(struct task_struct *task) { return false; }
 static inline void klp_update_patch_state(struct task_struct *task) {}
 static inline void klp_copy_process(struct task_struct *child) {}
 
+static inline int klp_write_relocations(struct module *mod,
+					struct klp_object *obj)
+{
+	return 0;
+}
+
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _LINUX_LIVEPATCH_H_ */
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 30395302a273..52eb91d0ee8d 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -256,27 +256,60 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
 	return 0;
 }
 
-int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
-			      const char *strtab,
-			      unsigned int symindex,
-			      unsigned int relsec,
-			      struct module *me)
-{
-	return apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
-}
-
-static int klp_write_object_relocations(struct module *pmod,
-					struct klp_object *obj)
+/*
+ * This function is called for both vmlinux-specific and module-specific klp
+ * relocation sections:
+ *
+ * 1) When the klp module itself loads, the module code calls this function
+ *    to write vmlinux-specific klp relocations.  These relocations allow the
+ *    patched code/data to reference unexported vmlinux symbols.  They're
+ *    written as early as possible to ensure that other module init code
+ *    (.e.g., jump_label_apply_nops) can access any non-exported vmlinux
+ *    symbols which might be referenced by the klp module's special sections.
+ *
+ * 2) When a to-be-patched module loads (or is already loaded when the
+ *    klp module loads), klp code calls this function to write klp relocations
+ *    which are specific to the module.  These relocations allow the patched
+ *    code/data to reference module symbols, both unexported and exported.
+ *    They also enable late module patching, which means the to-be-patched
+ *    module may be loaded *after* the klp module.
+ *
+ *    The following restrictions apply to module-specific relocation sections:
+ *
+ *    a) References to vmlinux symbols are not allowed.  Otherwise there might
+ *       be module init ordering issues, and crashes might occur in some of the
+ *       other kernel patching components like paravirt patching or jump
+ *       labels.  All references to vmlinux symbols should use either normal
+ *       relas (for exported symbols) or vmlinux-specific klp relas (for
+ *       unexported symbols).  This restriction is enforced in
+ *       klp_resolve_symbols().
+ *
+ *    b) Relocations to special sections like __jump_table and .altinstructions
+ *       aren't allowed.  In other words, there should never be a
+ *       .klp.rela.{module}.__jump_table section.  This will definitely cause
+ *       initialization ordering issues, as such special sections are processed
+ *       during the loading of the klp module itself, *not* the to-be-patched
+ *       module.  This means that e.g., it's not currently possible to patch a
+ *       module function which uses a static key jump label, if you want to
+ *       have the replacement function also use the same static key.  In this
+ *       case, a non-static interface like static_key_enabled() can be used in
+ *       the new function instead.
+ *
+ *       On the other hand, a .klp.rela.vmlinux.__jump_table section is fine,
+ *       as it can be resolved early enough during the load of the klp module,
+ *       as described above.
+ */
+int klp_write_relocations(struct module *pmod, struct klp_object *obj)
 {
 	int i, cnt, ret = 0;
 	const char *objname, *secname;
 	char sec_objname[MODULE_NAME_LEN];
 	Elf_Shdr *sec;
 
-	if (WARN_ON(!klp_is_object_loaded(obj)))
+	if (WARN_ON(obj && !klp_is_object_loaded(obj)))
 		return -EINVAL;
 
-	objname = klp_is_module(obj) ? obj->name : "vmlinux";
+	objname = obj ? obj->name : "vmlinux";
 
 	/* For each klp relocation section */
 	for (i = 1; i < pmod->klp_info->hdr.e_shnum; i++) {
@@ -305,7 +338,7 @@ static int klp_write_object_relocations(struct module *pmod,
 		if (ret)
 			break;
 
-		ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
+		ret = apply_relocate_add(pmod->klp_info->sechdrs,
 					 pmod->core_kallsyms.strtab,
 					 pmod->klp_info->symndx, i, pmod);
 		if (ret)
@@ -733,16 +766,25 @@ static int klp_init_object_loaded(struct klp_patch *patch,
 	struct klp_func *func;
 	int ret;
 
-	mutex_lock(&text_mutex);
+	if (klp_is_module(obj)) {
+
+		/*
+		 * Only write module-specific relocations here.
+		 * vmlinux-specific relocations were already written during the
+		 * loading of the klp module.
+		 */
+
+		mutex_lock(&text_mutex);
+
+		ret = klp_write_relocations(patch->mod, obj);
+		if (ret) {
+			mutex_unlock(&text_mutex);
+			return ret;
+		}
 
-	ret = klp_write_object_relocations(patch->mod, obj);
-	if (ret) {
 		mutex_unlock(&text_mutex);
-		return ret;
 	}
 
-	mutex_unlock(&text_mutex);
-
 	klp_for_each_func(obj, func) {
 		ret = klp_find_object_symbol(obj->name, func->old_name,
 					     func->old_sympos,
diff --git a/kernel/module.c b/kernel/module.c
index fe5bd382759c..ff4347385f05 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2327,11 +2327,9 @@ static int apply_relocations(struct module *mod, const struct load_info *info)
 		if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC))
 			continue;
 
-		/* Livepatch relocation sections are applied by livepatch */
 		if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
-			continue;
-
-		if (info->sechdrs[i].sh_type == SHT_REL)
+			err = klp_write_relocations(mod, NULL);
+		else if (info->sechdrs[i].sh_type == SHT_REL)
 			err = apply_relocate(info->sechdrs, info->strtab,
 					     info->index.sym, i, mod);
 		else if (info->sechdrs[i].sh_type == SHT_RELA)
@@ -3812,18 +3810,24 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	/* Set up MODINFO_ATTR fields */
 	setup_modinfo(mod, info);
 
+	if (is_livepatch_module(mod)) {
+		err = copy_module_elf(mod, info);
+		if (err < 0)
+			goto free_modinfo;
+	}
+
 	/* Fix up syms, so that st_value is a pointer to location. */
 	err = simplify_symbols(mod, info);
 	if (err < 0)
-		goto free_modinfo;
+		goto free_elf_copy;
 
 	err = apply_relocations(mod, info);
 	if (err < 0)
-		goto free_modinfo;
+		goto free_elf_copy;
 
 	err = post_relocation(mod, info);
 	if (err < 0)
-		goto free_modinfo;
+		goto free_elf_copy;
 
 	flush_module_icache(mod);
 
@@ -3866,12 +3870,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	if (err < 0)
 		goto coming_cleanup;
 
-	if (is_livepatch_module(mod)) {
-		err = copy_module_elf(mod, info);
-		if (err < 0)
-			goto sysfs_cleanup;
-	}
-
 	/* Get rid of temporary copy. */
 	free_copy(info);
 
@@ -3880,8 +3878,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
 
 	return do_init_module(mod);
 
- sysfs_cleanup:
-	mod_sysfs_teardown(mod);
  coming_cleanup:
 	mod->state = MODULE_STATE_GOING;
 	destroy_params(mod->kp, mod->num_kp);
@@ -3901,6 +3897,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	kfree(mod->args);
  free_arch_cleanup:
 	module_arch_cleanup(mod);
+ free_elf_copy:
+	free_module_elf(mod);
  free_modinfo:
 	free_modinfo(mod);
  free_unload:


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-26  1:17                   ` Josh Poimboeuf
@ 2019-10-28 10:07                     ` Peter Zijlstra
  2019-10-28 10:43                     ` Peter Zijlstra
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-28 10:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching, Mark Rutland

On Fri, Oct 25, 2019 at 08:17:41PM -0500, Josh Poimboeuf wrote:

> + *    The following restrictions apply to module-specific relocation sections:
> + *
> + *    a) References to vmlinux symbols are not allowed.  Otherwise there might
> + *       be module init ordering issues, and crashes might occur in some of the
> + *       other kernel patching components like paravirt patching or jump
> + *       labels.  All references to vmlinux symbols should use either normal
> + *       relas (for exported symbols) or vmlinux-specific klp relas (for
> + *       unexported symbols).  This restriction is enforced in
> + *       klp_resolve_symbols().

Right.

> + *    b) Relocations to special sections like __jump_table and .altinstructions
> + *       aren't allowed.  In other words, there should never be a
> + *       .klp.rela.{module}.__jump_table section.  This will definitely cause
> + *       initialization ordering issues, as such special sections are processed
> + *       during the loading of the klp module itself, *not* the to-be-patched
> + *       module.  This means that e.g., it's not currently possible to patch a
> + *       module function which uses a static key jump label, if you want to
> + *       have the replacement function also use the same static key.  In this
> + *       case, a non-static interface like static_key_enabled() can be used in
> + *       the new function instead.

Idem for .static_call_sites I suppose..

Is there any enforcement on this? I'm thinking it should be possible to
detect the presence of these sections and yell a bit.

OTOH, it should be possible to actually handle this, but let's do that
later.

> + *       On the other hand, a .klp.rela.vmlinux.__jump_table section is fine,
> + *       as it can be resolved early enough during the load of the klp module,
> + *       as described above.
> + */

> diff --git a/kernel/module.c b/kernel/module.c
> index fe5bd382759c..ff4347385f05 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2327,11 +2327,9 @@ static int apply_relocations(struct module *mod, const struct load_info *info)
>  		if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC))
>  			continue;
>  
> -		/* Livepatch relocation sections are applied by livepatch */
>  		if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
> -			continue;
> -
> -		if (info->sechdrs[i].sh_type == SHT_REL)
> +			err = klp_write_relocations(mod, NULL);
> +		else if (info->sechdrs[i].sh_type == SHT_REL)
>  			err = apply_relocate(info->sechdrs, info->strtab,
>  					     info->index.sym, i, mod);
>  		else if (info->sechdrs[i].sh_type == SHT_RELA)

Like here, we can yell and error if .klp.rela.{mod}.__jump_table
sections are encountered.


Other than that, this should work I suppose.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
  2019-10-26  1:17                   ` Josh Poimboeuf
  2019-10-28 10:07                     ` Peter Zijlstra
@ 2019-10-28 10:43                     ` Peter Zijlstra
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-28 10:43 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, x86, linux-kernel, rostedt, mhiramat, bristot,
	jbaron, torvalds, tglx, mingo, namit, hpa, luto, ard.biesheuvel,
	jeyu, live-patching, Mark Rutland

On Fri, Oct 25, 2019 at 08:17:41PM -0500, Josh Poimboeuf wrote:

> I can take over the livepatch-specific patches if you want.  Or however
> you want to do it.

Sure, feel free to take and route the livepatch patches. Then I'll wait
until those and the ARM64 patches land before I'll pick this up again.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 13/16] arm/ftrace: Use __patch_text_real()
  2019-10-18  7:35 ` [PATCH v4 13/16] arm/ftrace: Use __patch_text_real() Peter Zijlstra
@ 2019-10-28 16:25   ` Will Deacon
  2019-10-28 16:34     ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Will Deacon @ 2019-10-28 16:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu,
	rabin, Mark Rutland, james.morse

Hi Peter,

On Fri, Oct 18, 2019 at 09:35:38AM +0200, Peter Zijlstra wrote:
> Instead of flipping text protection, use the patch_text infrastructure
> that uses a fixmap alias where required.
> 
> This removes the last user of set_all_modules_text_*().
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: ard.biesheuvel@linaro.org
> Cc: rabin@rab.in
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: james.morse@arm.com
> ---
>  arch/arm/kernel/ftrace.c |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -22,6 +22,7 @@
>  #include <asm/ftrace.h>
>  #include <asm/insn.h>
>  #include <asm/set_memory.h>
> +#include <asm/patch.h>
>  
>  #ifdef CONFIG_THUMB2_KERNEL
>  #define	NOP		0xf85deb04	/* pop.w {lr} */
> @@ -31,13 +32,15 @@
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  
> +static int patch_text_remap = 0;
> +
>  static int __ftrace_modify_code(void *data)
>  {
>  	int *command = data;
>  
> -	set_kernel_text_rw();
> +	patch_text_remap++;
>  	ftrace_modify_all_code(*command);
> -	set_kernel_text_ro();
> +	patch_text_remap--;
>  
>  	return 0;
>  }
> @@ -59,13 +62,13 @@ static unsigned long adjust_address(stru
>  
>  int ftrace_arch_code_modify_prepare(void)
>  {
> -	set_all_modules_text_rw();
> +	patch_text_remap++;
>  	return 0;
>  }
>  
>  int ftrace_arch_code_modify_post_process(void)
>  {
> -	set_all_modules_text_ro();
> +	patch_text_remap--;
>  	/* Make sure any TLB misses during machine stop are cleared. */
>  	flush_tlb_all();
>  	return 0;
> @@ -97,10 +100,7 @@ static int ftrace_modify_code(unsigned l
>  			return -EINVAL;
>  	}
>  
> -	if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
> -		return -EPERM;
> -
> -	flush_icache_range(pc, pc + MCOUNT_INSN_SIZE);
> +	__patch_text_real((void *)pc, new, patch_text_remap);

Why can't you just pass 'true' for patch_text_remap? AFAICT, the only
time you want to pass false is during early boot when the text is
assumedly still writable without the fixmap.

Will

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 13/16] arm/ftrace: Use __patch_text_real()
  2019-10-28 16:25   ` Will Deacon
@ 2019-10-28 16:34     ` Peter Zijlstra
  2019-10-28 16:35       ` Peter Zijlstra
  2019-10-28 16:47       ` Will Deacon
  0 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-28 16:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu,
	rabin, Mark Rutland, james.morse

On Mon, Oct 28, 2019 at 04:25:26PM +0000, Will Deacon wrote:
> Hi Peter,
> 
> On Fri, Oct 18, 2019 at 09:35:38AM +0200, Peter Zijlstra wrote:
> > Instead of flipping text protection, use the patch_text infrastructure
> > that uses a fixmap alias where required.
> > 
> > This removes the last user of set_all_modules_text_*().
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: ard.biesheuvel@linaro.org
> > Cc: rabin@rab.in
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: james.morse@arm.com
> > ---
> >  arch/arm/kernel/ftrace.c |   16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > --- a/arch/arm/kernel/ftrace.c
> > +++ b/arch/arm/kernel/ftrace.c
> > @@ -22,6 +22,7 @@
> >  #include <asm/ftrace.h>
> >  #include <asm/insn.h>
> >  #include <asm/set_memory.h>
> > +#include <asm/patch.h>
> >  
> >  #ifdef CONFIG_THUMB2_KERNEL
> >  #define	NOP		0xf85deb04	/* pop.w {lr} */
> > @@ -31,13 +32,15 @@
> >  
> >  #ifdef CONFIG_DYNAMIC_FTRACE
> >  
> > +static int patch_text_remap = 0;
> > +
> >  static int __ftrace_modify_code(void *data)
> >  {
> >  	int *command = data;
> >  
> > -	set_kernel_text_rw();
> > +	patch_text_remap++;
> >  	ftrace_modify_all_code(*command);
> > -	set_kernel_text_ro();
> > +	patch_text_remap--;
> >  
> >  	return 0;
> >  }
> > @@ -59,13 +62,13 @@ static unsigned long adjust_address(stru
> >  
> >  int ftrace_arch_code_modify_prepare(void)
> >  {
> > -	set_all_modules_text_rw();
> > +	patch_text_remap++;
> >  	return 0;
> >  }
> >  
> >  int ftrace_arch_code_modify_post_process(void)
> >  {
> > -	set_all_modules_text_ro();
> > +	patch_text_remap--;
> >  	/* Make sure any TLB misses during machine stop are cleared. */
> >  	flush_tlb_all();
> >  	return 0;
> > @@ -97,10 +100,7 @@ static int ftrace_modify_code(unsigned l
> >  			return -EINVAL;
> >  	}
> >  
> > -	if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
> > -		return -EPERM;
> > -
> > -	flush_icache_range(pc, pc + MCOUNT_INSN_SIZE);
> > +	__patch_text_real((void *)pc, new, patch_text_remap);
> 
> Why can't you just pass 'true' for patch_text_remap? AFAICT, the only
> time you want to pass false is during early boot when the text is
> assumedly still writable without the fixmap.

Ah, it will also become true for module loading once we rework where we
flip the module text RO,X. See this patch:

  https://lkml.kernel.org/r/20191018074634.858645375@infradead.org

But for that to land, there's still a few other issues to fix (KLP).

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 13/16] arm/ftrace: Use __patch_text_real()
  2019-10-28 16:34     ` Peter Zijlstra
@ 2019-10-28 16:35       ` Peter Zijlstra
  2019-10-28 16:47       ` Will Deacon
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-28 16:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu,
	rabin, Mark Rutland, james.morse

On Mon, Oct 28, 2019 at 05:34:21PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 28, 2019 at 04:25:26PM +0000, Will Deacon wrote:
> > > @@ -97,10 +100,7 @@ static int ftrace_modify_code(unsigned l
> > >  			return -EINVAL;
> > >  	}
> > >  
> > > -	if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
> > > -		return -EPERM;
> > > -
> > > -	flush_icache_range(pc, pc + MCOUNT_INSN_SIZE);
> > > +	__patch_text_real((void *)pc, new, patch_text_remap);
> > 
> > Why can't you just pass 'true' for patch_text_remap? AFAICT, the only
> > time you want to pass false is during early boot when the text is
> > assumedly still writable without the fixmap.
> 
> Ah, it will also become true for module loading once we rework where we

'false'. That is module loading will again be able to poke without
alias map.

> flip the module text RO,X. See this patch:
> 
>   https://lkml.kernel.org/r/20191018074634.858645375@infradead.org
> 
> But for that to land, there's still a few other issues to fix (KLP).

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 13/16] arm/ftrace: Use __patch_text_real()
  2019-10-28 16:34     ` Peter Zijlstra
  2019-10-28 16:35       ` Peter Zijlstra
@ 2019-10-28 16:47       ` Will Deacon
  2019-10-28 16:55         ` Peter Zijlstra
  1 sibling, 1 reply; 70+ messages in thread
From: Will Deacon @ 2019-10-28 16:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu,
	rabin, Mark Rutland, james.morse

On Mon, Oct 28, 2019 at 05:34:21PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 28, 2019 at 04:25:26PM +0000, Will Deacon wrote:
> > On Fri, Oct 18, 2019 at 09:35:38AM +0200, Peter Zijlstra wrote:
> > > @@ -97,10 +100,7 @@ static int ftrace_modify_code(unsigned l
> > >  			return -EINVAL;
> > >  	}
> > >  
> > > -	if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
> > > -		return -EPERM;
> > > -
> > > -	flush_icache_range(pc, pc + MCOUNT_INSN_SIZE);
> > > +	__patch_text_real((void *)pc, new, patch_text_remap);
> > 
> > Why can't you just pass 'true' for patch_text_remap? AFAICT, the only
> > time you want to pass false is during early boot when the text is
> > assumedly still writable without the fixmap.
> 
> Ah, it will also become true for module loading once we rework where we
> flip the module text RO,X. See this patch:
> 
>   https://lkml.kernel.org/r/20191018074634.858645375@infradead.org
> 
> But for that to land, there's still a few other issues to fix (KLP).

Passing 'true' would still work though, right? Just feels a bit error
prone having to maintain the state of patch_test_remap() and remember
that 'ftrace_lock' is holding the concurrency together.

Will

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 13/16] arm/ftrace: Use __patch_text_real()
  2019-10-28 16:47       ` Will Deacon
@ 2019-10-28 16:55         ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-10-28 16:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: x86, linux-kernel, rostedt, mhiramat, bristot, jbaron, torvalds,
	tglx, mingo, namit, hpa, luto, ard.biesheuvel, jpoimboe, jeyu,
	rabin, Mark Rutland, james.morse

On Mon, Oct 28, 2019 at 04:47:59PM +0000, Will Deacon wrote:
> On Mon, Oct 28, 2019 at 05:34:21PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 28, 2019 at 04:25:26PM +0000, Will Deacon wrote:
> > > On Fri, Oct 18, 2019 at 09:35:38AM +0200, Peter Zijlstra wrote:
> > > > @@ -97,10 +100,7 @@ static int ftrace_modify_code(unsigned l
> > > >  			return -EINVAL;
> > > >  	}
> > > >  
> > > > -	if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
> > > > -		return -EPERM;
> > > > -
> > > > -	flush_icache_range(pc, pc + MCOUNT_INSN_SIZE);
> > > > +	__patch_text_real((void *)pc, new, patch_text_remap);
> > > 
> > > Why can't you just pass 'true' for patch_text_remap? AFAICT, the only
> > > time you want to pass false is during early boot when the text is
> > > assumedly still writable without the fixmap.
> > 
> > Ah, it will also become true for module loading once we rework where we
> > flip the module text RO,X. See this patch:
> > 
> >   https://lkml.kernel.org/r/20191018074634.858645375@infradead.org
> > 
> > But for that to land, there's still a few other issues to fix (KLP).
> 
> Passing 'true' would still work though, right? Just feels a bit error
> prone having to maintain the state of patch_test_remap() and remember
> that 'ftrace_lock' is holding the concurrency together.

It should, provided your fixmap stuff is working when we do the early
stuff I suppose. Module loading will be a wee bit slower too, but I'm
not the person to care about that.

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2019-10-28 16:55 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-18  7:35 [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 01/16] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 02/16] x86/alternatives: Update int3_emulate_push() comment Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 03/16] x86/alternatives,jump_label: Provide better text_poke() batching interface Peter Zijlstra
2019-10-21  8:48   ` Ingo Molnar
2019-10-21  9:21     ` Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 04/16] x86/alternatives: Add and use text_gen_insn() helper Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 05/16] x86/ftrace: Use text_poke() Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 06/16] x86/mm: Remove set_kernel_text_r[ow]() Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 07/16] x86/alternative: Add text_opcode_size() Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 08/16] x86/ftrace: Use text_gen_insn() Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 09/16] x86/alternative: Remove text_poke_loc::len Peter Zijlstra
2019-10-21  8:58   ` Ingo Molnar
2019-10-21  9:02     ` Ingo Molnar
2019-10-18  7:35 ` [PATCH v4 10/16] x86/alternative: Shrink text_poke_loc Peter Zijlstra
2019-10-21  9:01   ` Ingo Molnar
2019-10-21  9:25     ` Peter Zijlstra
2019-10-21  9:33       ` Ingo Molnar
2019-10-18  7:35 ` [PATCH v4 11/16] x86/kprobes: Convert to text-patching.h Peter Zijlstra
2019-10-21 14:57   ` Masami Hiramatsu
2019-10-18  7:35 ` [PATCH v4 12/16] x86/kprobes: Fix ordering Peter Zijlstra
2019-10-22  1:35   ` Masami Hiramatsu
2019-10-22 10:31     ` Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 13/16] arm/ftrace: Use __patch_text_real() Peter Zijlstra
2019-10-28 16:25   ` Will Deacon
2019-10-28 16:34     ` Peter Zijlstra
2019-10-28 16:35       ` Peter Zijlstra
2019-10-28 16:47       ` Will Deacon
2019-10-28 16:55         ` Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 14/16] module: Remove set_all_modules_text_*() Peter Zijlstra
2019-10-18  7:35 ` [PATCH v4 15/16] module: Move where we mark modules RO,X Peter Zijlstra
2019-10-21 13:53   ` Josh Poimboeuf
2019-10-21 14:14     ` Peter Zijlstra
2019-10-21 15:34       ` Peter Zijlstra
2019-10-21 15:44         ` Peter Zijlstra
2019-10-21 16:11         ` Peter Zijlstra
2019-10-22 11:31           ` Heiko Carstens
2019-10-22 12:31             ` Peter Zijlstra
2019-10-23 11:48       ` Peter Zijlstra
2019-10-23 15:16         ` Peter Zijlstra
2019-10-23 17:15           ` Josh Poimboeuf
2019-10-24 10:59             ` Peter Zijlstra
2019-10-24 18:31               ` Josh Poimboeuf
2019-10-24 20:33                 ` Peter Zijlstra
2019-10-23 17:00         ` Josh Poimboeuf
2019-10-24 13:16           ` Peter Zijlstra
2019-10-25  6:44             ` Petr Mladek
2019-10-25  8:43               ` Peter Zijlstra
2019-10-25 10:06                 ` Peter Zijlstra
2019-10-25 13:50                   ` Josh Poimboeuf
2019-10-26  1:17                   ` Josh Poimboeuf
2019-10-28 10:07                     ` Peter Zijlstra
2019-10-28 10:43                     ` Peter Zijlstra
2019-10-25  9:16               ` Peter Zijlstra
2019-10-22  2:21   ` Steven Rostedt
2019-10-22 20:24     ` Peter Zijlstra
2019-10-22 20:40       ` Steven Rostedt
2019-10-23  9:07         ` Peter Zijlstra
2019-10-23 18:52       ` Steven Rostedt
2019-10-24 10:16         ` Peter Zijlstra
2019-10-24 10:18           ` Peter Zijlstra
2019-10-24 15:00           ` Steven Rostedt
2019-10-24 16:43             ` Peter Zijlstra
2019-10-24 18:17               ` Steven Rostedt
2019-10-24 20:24                 ` Peter Zijlstra
2019-10-24 20:28                   ` Steven Rostedt
2019-10-18  7:35 ` [PATCH v4 16/16] ftrace: Merge ftrace_module_{init,enable}() Peter Zijlstra
2019-10-18  8:20   ` Peter Zijlstra
2019-10-21  9:09 ` [PATCH v4 00/16] Rewrite x86/ftrace to use text_poke (and more) Ingo Molnar
2019-10-21 13:38   ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).