* [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
@ 2022-09-13  9:42 Andy Chiu
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
                   ` (5 more replies)
  0 siblings, 6 replies; 43+ messages in thread
From: Andy Chiu @ 2022-09-13  9:42 UTC (permalink / raw)
  To: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb
  Cc: greentime.hu, zong.li, andy.chiu, guoren, kernel, linux-riscv

This patchset removes the dependency of dynamic ftrace on
stop_machine(), and makes it compatible with kernel preemption.
Originally, we ran into stack corruptions, or execution of partially
updated instructions, when starting or stopping ftrace on a fully
preemptible kernel configuration. The reason is that the kernel
periodically calls rcu_momentary_dyntick_idle() on cores waiting for the
core that runs code-patching in ftrace. Though
rcu_momentary_dyntick_idle() itself is marked notrace, it calls a number
of traceable functions if the kernel is configured as preemptible. For
example, these are some functions that happen to have a symbol and have
not been marked notrace on a RISC-V preemptible kernel compiled with
GCC-11:
 - __rcu_report_exp_rnp()
 - rcu_report_exp_cpu_mult()
 - rcu_preempt_deferred_qs()
 - rcu_preempt_need_deferred_qs()
 - rcu_preempt_deferred_qs_irqrestore()

Thus, it is not ideal to rely on stop_machine() and hand-marked
"notrace"s to perform runtime code patching. To remove these
dependencies, updates of code must appear atomic to the running cores.
This is not obvious for RISC-V, since it usually takes a pair of AUIPC +
JALR to perform a long jump, and the pair cannot be modified and
executed concurrently once preemption comes into play. As such, this
patchset proposes a way to make it possible. It embeds a naturally
aligned, full-width target address into each ftrace prologue and jumps
indirectly through it. In this way, we can store and load the address
atomically, so the code-patching core can run simultaneously with the
rest of the running cores.
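
To illustrate the idea in C (a minimal sketch with hypothetical helper
names, not code from this series): the target slot is pointer-sized and
naturally aligned, so a single store on the patching side is observed
either fully old or fully new by the read side.

#include <linux/compiler.h>

/* 'tramp' points at the naturally aligned, XLEN-wide slot embedded in
 * an ftrace prologue. An aligned, pointer-sized store is single-copy
 * atomic, so a preempted reader never observes a half-updated address. */
static void set_ftrace_target(unsigned long *tramp, unsigned long target)
{
	WRITE_ONCE(*tramp, target);
}

/* Read side, matching the prologue's REG_L: returns either the old or
 * the new target, never a mix of the two. */
static unsigned long get_ftrace_target(unsigned long *tramp)
{
	return READ_ONCE(*tramp);
}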

After applying the patchset, we compiled a preemptible kernel with all
tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
machine. The kernel booted up successfully, passing all ftrace test
suites. Besides, we ran a script that randomly picks a tracer every 0~5
seconds (sketched below). The kernel has sustained over 20K rounds of
the test. In contrast, a preemptible kernel without this patchset would
panic within a few rounds on the same machine.
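
The stress script was roughly equivalent to the following C sketch (the
tracer list and the tracefs mount point are assumptions; current_tracer
is the standard tracefs interface):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Repeatedly make a randomly chosen tracer current, sleeping 0~5
 * seconds in between, to exercise ftrace enable/disable under
 * preemption. */
int main(void)
{
	static const char *tracers[] = {
		"function", "function_graph", "irqsoff", "wakeup", "nop",
	};
	FILE *f;

	srand(time(NULL));
	for (;;) {
		f = fopen("/sys/kernel/tracing/current_tracer", "w");
		if (!f)
			return 1;
		fputs(tracers[rand() % 5], f);
		fclose(f);
		sleep(rand() % 6);
	}
	return 0;
}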

However, we did run into errors when using the hwlat or irqsoff tracers
together with the cpu-online stressor from stress-ng on a preemptible
kernel. We believe the reason may be that per-cpu workers of the tracers
are being queued into an unbounded workqueue when a CPU gets offlined;
patches for this will go through the tracing tree.

Additionally, we found patching of tracepoints unsafe, since the
instructions being patched are not naturally aligned. This may result in
two half-word stores during code patching, which breaks atomicity. A
guard against such sites is sketched below.
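
For illustration (a hypothetical check, not part of this series): a
4-byte instruction that is only 2-byte aligned may be written back as
two half-word stores, so a safe patcher would refuse such a site up
front.

#include <linux/kernel.h>

/* Reject patch sites that cannot be updated with one naturally aligned
 * store; a concurrent hart could otherwise fetch a torn instruction. */
static int check_patch_site(unsigned long insn_addr)
{
	if (!IS_ALIGNED(insn_addr, 4))
		return -EINVAL;
	return 0;
}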

Changes in v2:
 - Enforce alignment on all functions with a compiler workaround.
 - Support 64-bit addressing for ftrace targets if xlen == 64
 - Initialize ftrace target addresses to avoid calling a bad address in a
   hypothesized case.
 - Use LGPTR instead of SZPTR, since .align is log-scaled, for
   mcount-dyn.S
 - Require the nop instruction of all jump_labels to align naturally on
   4 bytes.

Andy Chiu (5):
  riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
    size
  riscv: export patch_insn_write
  riscv: ftrace: use indirect jump to work with kernel preemption
  riscv: ftrace: do not use stop_machine to update code
  riscv: align arch_static_branch function

 arch/riscv/Makefile                 |   2 +-
 arch/riscv/include/asm/ftrace.h     |  24 ----
 arch/riscv/include/asm/jump_label.h |   2 +
 arch/riscv/include/asm/patch.h      |   1 +
 arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
 arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
 arch/riscv/kernel/patch.c           |   4 +-
 7 files changed, 188 insertions(+), 93 deletions(-)

-- 
2.36.0



* [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size
  2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
@ 2022-09-13  9:42 ` Andy Chiu
  2022-09-15 13:53   ` Guo Ren
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 2/5] riscv: export patch_insn_write Andy Chiu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2022-09-13  9:42 UTC (permalink / raw)
  To: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb
  Cc: greentime.hu, zong.li, andy.chiu, guoren, kernel, linux-riscv

We are introducing a new ftrace mechanism in order to phase out
stop_machine() and enable kernel preemption. The new mechanism requires
each ftrace patchable function entry to be 24 bytes and aligned to a
4-byte boundary.

Before applying this patch, the size of the kernel code, with 122465
ftrace entries, was 12.46 MB. Under the same configuration, the size
increased to 12.99 MB after applying this patch set.

However, we found that -falign-functions alone was not strong enough to
align all functions as required. In fact, cold functions are not
aligned after turning on optimizations. We consider this a bug in GCC
and turn off guess-branch-probability as a workaround to align all
functions.

GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
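
For reference, a minimal sketch of the kind of function that loses the
requested alignment (an assumed reproducer in the spirit of that bug
report, not taken from it):

/* With -falign-functions=4, GCC may still place functions it guesses
 * to be cold without the requested entry alignment; turning off
 * guess-branch-probability sidesteps that heuristic. */
__attribute__((cold)) static void rarely_called(void)
{
}

void caller(int err)
{
	if (err)
		rarely_called();	/* path guessed cold by GCC */
}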

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 3fa8ef336822..fd8069f59a59 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -11,7 +11,7 @@ LDFLAGS_vmlinux :=
 ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
 	LDFLAGS_vmlinux := --no-relax
 	KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=8
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4 -fno-guess-branch-probability
 endif
 
 ifeq ($(CONFIG_CMODEL_MEDLOW),y)
-- 
2.36.0



* [PATCH RFC v2 riscv/for-next 2/5] riscv: export patch_insn_write
  2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
@ 2022-09-13  9:42 ` Andy Chiu
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Andy Chiu @ 2022-09-13  9:42 UTC (permalink / raw)
  To: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb
  Cc: greentime.hu, zong.li, andy.chiu, guoren, kernel, linux-riscv,
	Palmer Dabbelt

Export patch_insn_write() so that we may patch code without issuing
fence.i every time.
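
For reference, the intended use in the next patch looks roughly like the
sketch below ('tramp' being the 8-byte aligned target slot in an ftrace
prologue); the caller issues any required fences separately instead of
paying a fence.i per write.

#include <asm/patch.h>

/* Publish a new ftrace jump target without a per-write fence.i. */
static void update_target(unsigned long *tramp, unsigned long addr)
{
	patch_insn_write(tramp, &addr, sizeof(unsigned long));
}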

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/riscv/include/asm/patch.h | 1 +
 arch/riscv/kernel/patch.c      | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
index 9a7d7346001e..327e99114d67 100644
--- a/arch/riscv/include/asm/patch.h
+++ b/arch/riscv/include/asm/patch.h
@@ -6,6 +6,7 @@
 #ifndef _ASM_RISCV_PATCH_H
 #define _ASM_RISCV_PATCH_H
 
+int patch_insn_write(void *addr, const void *insn, size_t len);
 int patch_text_nosync(void *addr, const void *insns, size_t len);
 int patch_text(void *addr, u32 insn);
 
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 765004b60513..6f7757ce50dc 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -49,7 +49,7 @@ static void patch_unmap(int fixmap)
 }
 NOKPROBE_SYMBOL(patch_unmap);
 
-static int patch_insn_write(void *addr, const void *insn, size_t len)
+int patch_insn_write(void *addr, const void *insn, size_t len)
 {
 	void *waddr = addr;
 	bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
@@ -78,7 +78,7 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
 }
 NOKPROBE_SYMBOL(patch_insn_write);
 #else
-static int patch_insn_write(void *addr, const void *insn, size_t len)
+int patch_insn_write(void *addr, const void *insn, size_t len)
 {
 	return copy_to_kernel_nofault(addr, insn, len);
 }
-- 
2.36.0



* [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption
  2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 2/5] riscv: export patch_insn_write Andy Chiu
@ 2022-09-13  9:42 ` Andy Chiu
  2022-09-14 13:45   ` Guo Ren
  2024-02-20 14:17   ` Evgenii Shatokhin
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 4/5] riscv: ftrace: do not use stop_machine to update code Andy Chiu
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 43+ messages in thread
From: Andy Chiu @ 2022-09-13  9:42 UTC (permalink / raw)
  To: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb
  Cc: greentime.hu, zong.li, andy.chiu, guoren, kernel, linux-riscv

On RISC-V, we must use an AUIPC + JALR pair to encode an immediate,
forming a jump that targets an address over 4K away. This may cause
errors if we want to enable kernel preemption and remove the dependency
of code patching on stop_machine(). For example, a task could be
switched out on the auipc; if we then changed the ftrace function before
it was switched back, it would jump to an address whose updated 11:0
bits are mixed with the previous XLEN:12 part.

p: patched area performed by dynamic ftrace
ftrace_prologue:
p|	REG_S	ra, -SZREG(sp)
p|	auipc	ra, 0x? ------------> preempted
					...
				change ftrace function
					...
p|	jalr	-?(ra) <------------- switched back
p|	REG_L	ra, -SZREG(sp)
func:
	xxx
	ret

To prevent this condition, we propose a way to load and store target
addresses atomically. We store an 8-byte aligned, full-width absolute
address in each ftrace prologue and use a jump at the front to decide
whether to take the ftrace detour. To reduce the footprint of ftrace
prologues, we clobber t0 and move the (re-)storing of ra into
ftrace_{regs_}caller. This is similar to ARM64, which also clobbers x9
in each prologue.

Also, we initialize the target at startup to take care of the case
where REG_L happens before the ftrace target has been updated.

.align 2  # if it happens to be 8B-aligned
ftrace_prologue:
p|	{j	func} | {auipc	t0}
	j	ftrace_cont
p|	.dword	0x? <=== storing the address to a 8B aligned space can be
			 considered atomic to read sides using REG_L
ftrace_cont:
	REG_L	t0, 8(t0) <=== read side
	jalr	t0, t0
func:
	xxx
	ret

.align 2  # if it is 4B but not 8B-aligned
ftrace_prologue:
p|	{j	func} | {auipc	t0}
	REG_L	t0, 0xc(t0) <=== read side
	j	ftrace_cont
p|	.dword	0x? <=== the target address
ftrace_cont:
	jalr	t0, t0
func:
	xxx
	ret

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/ftrace.h |  24 -----
 arch/riscv/kernel/ftrace.c      | 173 ++++++++++++++++++++++----------
 arch/riscv/kernel/mcount-dyn.S  |  69 ++++++++++---
 3 files changed, 176 insertions(+), 90 deletions(-)

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 04dad3380041..eaa611e491fc 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -47,30 +47,6 @@ struct dyn_arch_ftrace {
  */
 
 #define MCOUNT_ADDR		((unsigned long)MCOUNT_NAME)
-#define JALR_SIGN_MASK		(0x00000800)
-#define JALR_OFFSET_MASK	(0x00000fff)
-#define AUIPC_OFFSET_MASK	(0xfffff000)
-#define AUIPC_PAD		(0x00001000)
-#define JALR_SHIFT		20
-#define JALR_BASIC		(0x000080e7)
-#define AUIPC_BASIC		(0x00000097)
-#define NOP4			(0x00000013)
-
-#define make_call(caller, callee, call)					\
-do {									\
-	call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -	\
-				(unsigned long)caller));		\
-	call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -	\
-			       (unsigned long)caller));			\
-} while (0)
-
-#define to_jalr_insn(offset)						\
-	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
-
-#define to_auipc_insn(offset)						\
-	((offset & JALR_SIGN_MASK) ?					\
-	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :	\
-	((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
 
 /*
  * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 2086f6585773..84b9e280dd1f 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -23,31 +23,29 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
 }
 
 static int ftrace_check_current_call(unsigned long hook_pos,
-				     unsigned int *expected)
+				     unsigned long expected_addr)
 {
-	unsigned int replaced[2];
-	unsigned int nops[2] = {NOP4, NOP4};
+	unsigned long replaced;
 
-	/* we expect nops at the hook position */
-	if (!expected)
-		expected = nops;
+	/* we expect ftrace_stub at the hook position */
+	if (!expected_addr)
+		expected_addr = (unsigned long) ftrace_stub;
 
 	/*
 	 * Read the text we want to modify;
 	 * return must be -EFAULT on read error
 	 */
-	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
-			MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(&replaced, (void *)hook_pos,
+			(sizeof(unsigned long))))
 		return -EFAULT;
 
 	/*
 	 * Make sure it is what we expect it to be;
 	 * return must be -EINVAL on failed comparison
 	 */
-	if (memcmp(expected, replaced, sizeof(replaced))) {
-		pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
-		       (void *)hook_pos, expected[0], expected[1], replaced[0],
-		       replaced[1]);
+	if (expected_addr != replaced) {
+		pr_err("%p: expected (%016lx) but got (%016lx)\n",
+		       (void *)hook_pos, expected_addr, replaced);
 		return -EINVAL;
 	}
 
@@ -57,55 +55,96 @@ static int ftrace_check_current_call(unsigned long hook_pos,
 static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
 				bool enable)
 {
-	unsigned int call[2];
-	unsigned int nops[2] = {NOP4, NOP4};
+	unsigned long call = target;
+	unsigned long nops = (unsigned long)ftrace_stub;
 
-	make_call(hook_pos, target, call);
-
-	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
+	/* Replace the target address at once. Return -EPERM on write error. */
 	if (patch_text_nosync
-	    ((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+	    ((void *)hook_pos, enable ? &call : &nops, sizeof(unsigned long)))
 		return -EPERM;
 
 	return 0;
 }
 
 /*
- * Put 5 instructions with 16 bytes at the front of function within
- * patchable function entry nops' area.
- *
- * 0: REG_S  ra, -SZREG(sp)
- * 1: auipc  ra, 0x?
- * 2: jalr   -?(ra)
- * 3: REG_L  ra, -SZREG(sp)
+ * Place 4 instructions and a destination address in the patchable function
+ * entry.
  *
  * So the opcodes is:
- * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
- * 1: 0x???????? -> auipc
- * 2: 0x???????? -> jalr
- * 3: 0xff813083 (ld)/0xffc12083 (lw)
+ * INSN_SKIPALL  : J     PC + 0x18 (when disabled, jump to the function)
+ * INSN_AUIPC    : AUIPC T0, 0 (when enabled, load address of trampoline)
+ * INSN_LOAD(off): REG_L T0, off(T0) (load address stored in the tramp)
+ * INSN_SKIPTRAMP: J     PC + 0x10 (skip the tramp since it is data, not insns)
+ * INSN_JALR     : JALR  T0, T0 (jump to the destination)
+ *
+ * At runtime, we want to patch the jump target atomically in order to work with
+ * kernel preemption. If we patched with a pair of AUIPC + JALR, a task could be
+ * preempted after loading the upper bits with AUIPC; things would then go wrong
+ * if we updated the jump target before the task was switched back.
+ *
+ * We also want to align all patchable function entries to 4-byte boundaries, and
+ * the jump target to an 8-byte aligned address, so that each of them can be
+ * naturally updated and observed by the patching and running cores.
+ *
+ * To make sure target addresses are 8-byte aligned, we have to consider
+ * following scenarios:
+ *
+ * First if the starting address of the patchable entry is aligned to an 8-byte
+ * boundary:
+ * | ADDR   | COMPILED | DISABLED         | ENABLED                |
+ * +--------+----------+------------------+------------------------+
+ * | 0x00   | NOP      | J     FUNC       | AUIPC T0, 0            |
+ * | 0x04   | NOP      | J     0x10                                |
+ * | 0x08   | NOP      | 8-byte aligned target address (low)       |
+ * | 0x0C   | NOP      |                               (high)      |
+ * | 0x10   | NOP      | REG_L T0, 8(T0)                           |
+ * | 0x14   | NOP      | JALR  T0, T0                              |
+ * | FUNC   | X                                                    |
+ *
+ * If not, then it starts at a 4- but not 8-byte aligned address. In such cases,
+ * we re-arrange the code and the trampoline in order to naturally align it.
+ * | ADDR   | COMPILED | DISABLED         | ENABLED                |
+ * +--------+----------+------------------+------------------------+
+ * | 0x04   | NOP      | J     FUNC       | AUIPC T0, 0            |
+ * | 0x08   | NOP      | REG_L T0, 0xC(T0)                         |
+ * | 0x0C   | NOP      | J     0x18                                |
+ * | 0x10   | NOP      | 8-byte aligned target address (low)       |
+ * | 0x14   | NOP      |                               (high)      |
+ * | 0x18   | NOP      | JALR  T0, T0                              |
+ * | FUNC   | X                                                    |
  */
+
 #if __riscv_xlen == 64
-#define INSN0	0xfe113c23
-#define INSN3	0xff813083
-#elif __riscv_xlen == 32
-#define INSN0	0xfe112e23
-#define INSN3	0xffc12083
+#define INSN_LD_T0_OFF(off) ((0x2b283) | ((off) << 20))
+#elif __riscv_xlen == 32
+#define INSN_LD_T0_OFF(off) ((0x2a283) | ((off) << 20))
 #endif
 
-#define FUNC_ENTRY_SIZE	16
-#define FUNC_ENTRY_JMP	4
+#define INSN_SKIPALL	0x0180006f
+#define INSN_AUIPC	0x00000297
+#define INSN_LOAD(off)	INSN_LD_T0_OFF(off)
+#define INSN_SKIPTRAMP	0x00c0006f
+#define INSN_JALR	0x000282e7
+#define INSN_SIZE	4
 
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
-	unsigned int call[4] = {INSN0, 0, 0, INSN3};
-	unsigned long target = addr;
-	unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
-
-	call[1] = to_auipc_insn((unsigned int)(target - caller));
-	call[2] = to_jalr_insn((unsigned int)(target - caller));
+	unsigned int call[1] = {INSN_AUIPC};
+	void *tramp;
+	unsigned long patch_addr = rec->ip;
+
+	if (IS_ALIGNED(patch_addr, 8)) {
+		tramp = (void *) (patch_addr + 0x8);
+	} else if (IS_ALIGNED(patch_addr, 4)) {
+		tramp = (void *) (patch_addr + 0xc);
+	} else {
+		pr_warn("cannot patch: function must be 4-Byte or 8-Byte aligned\n");
+		return -EINVAL;
+	}
+	WARN_ON(!IS_ALIGNED((unsigned long)tramp, 8));
+	patch_insn_write(tramp, &addr, sizeof(unsigned long));
 
-	if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
+	if (patch_text_nosync((void *)patch_addr, &call, INSN_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -114,14 +153,49 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
 		    unsigned long addr)
 {
-	unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
+	unsigned int nops[1] = {INSN_SKIPALL};
+	unsigned long patch_addr = rec->ip;
 
-	if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
+	if (patch_text_nosync((void *)patch_addr, nops, INSN_SIZE))
 		return -EPERM;
 
 	return 0;
 }
 
+extern void ftrace_no_caller(void);
+static void ftrace_make_default_tramp(unsigned int *tramp)
+{
+	*((unsigned long *)tramp) = (unsigned long) &ftrace_no_caller;
+}
+
+int __ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec,
+		    unsigned long addr)
+{
+	unsigned int nops[6];
+	unsigned int *tramp;
+	unsigned long patch_addr = rec->ip;
+
+	nops[0] = INSN_SKIPALL;
+	if (IS_ALIGNED(patch_addr, 8)) {
+		nops[1] = INSN_SKIPTRAMP;
+		nops[4] = INSN_LOAD(0x8);
+		tramp = &nops[2];
+	} else if (IS_ALIGNED(patch_addr, 4)) {
+		nops[1] = INSN_LOAD(0xc);
+		nops[2] = INSN_SKIPTRAMP;
+		tramp = &nops[3];
+	} else {
+		pr_warn("start address must be 4-Byte aligned\n");
+		return -EINVAL;
+	}
+	ftrace_make_default_tramp(tramp);
+	nops[5] = INSN_JALR;
+
+	if (patch_text_nosync((void *)patch_addr, nops, sizeof(nops)))
+		return -EPERM;
+
+	return 0;
+}
 
 /*
  * This is called early on, and isn't wrapped by
@@ -135,7 +209,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	int out;
 
 	ftrace_arch_code_modify_prepare();
-	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+	out = __ftrace_init_nop(mod, rec, MCOUNT_ADDR);
 	ftrace_arch_code_modify_post_process();
 
 	return out;
@@ -158,17 +232,14 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		       unsigned long addr)
 {
-	unsigned int call[2];
-	unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
 	int ret;
 
-	make_call(caller, old_addr, call);
-	ret = ftrace_check_current_call(caller, call);
+	ret = ftrace_check_current_call(rec->ip, old_addr);
 
 	if (ret)
 		return ret;
 
-	return __ftrace_modify_call(caller, addr, true);
+	return __ftrace_modify_call(rec->ip, addr, true);
 }
 #endif
 
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index d171eca623b6..f8ee63e4314b 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -13,7 +13,7 @@
 
 	.text
 
-#define FENTRY_RA_OFFSET	12
+#define FENTRY_RA_OFFSET	24
 #define ABI_SIZE_ON_STACK	72
 #define ABI_A0			0
 #define ABI_A1			8
@@ -25,7 +25,12 @@
 #define ABI_A7			56
 #define ABI_RA			64
 
+# t0 points to return of ftrace
+# ra points to the return address of traced function
+
 	.macro SAVE_ABI
+	REG_S	ra, -SZREG(sp)
+	mv	ra, t0
 	addi	sp, sp, -SZREG
 	addi	sp, sp, -ABI_SIZE_ON_STACK
 
@@ -53,10 +58,14 @@
 
 	addi	sp, sp, ABI_SIZE_ON_STACK
 	addi	sp, sp, SZREG
+	mv	t0, ra
+	REG_L	ra, -SZREG(sp)
 	.endm
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 	.macro SAVE_ALL
+	REG_S	ra, -SZREG(sp)
+	mv	ra, t0
 	addi	sp, sp, -SZREG
 	addi	sp, sp, -PT_SIZE_ON_STACK
 
@@ -138,9 +147,18 @@
 
 	addi	sp, sp, PT_SIZE_ON_STACK
 	addi	sp, sp, SZREG
+	mv	t0, ra # t0 is equal to ra here
+	REG_L	ra, -SZREG(sp)
 	.endm
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 
+# perform a full fence before re-running the ftrace entry if we run into this
+ENTRY(ftrace_no_caller)
+	fence	rw, rw
+	fence.i
+	jr	-FENTRY_RA_OFFSET(t0)
+ENDPROC(ftrace_no_caller)
+
 ENTRY(ftrace_caller)
 	SAVE_ABI
 
@@ -150,9 +168,9 @@ ENTRY(ftrace_caller)
 	REG_L	a1, ABI_SIZE_ON_STACK(sp)
 	mv	a3, sp
 
-ftrace_call:
-	.global ftrace_call
-	call	ftrace_stub
+ftrace_call_site:
+	REG_L	ra, ftrace_call
+	jalr	0(ra)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	addi	a0, sp, ABI_SIZE_ON_STACK
@@ -161,12 +179,12 @@ ftrace_call:
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	mv	a2, s0
 #endif
-ftrace_graph_call:
-	.global ftrace_graph_call
-	call	ftrace_stub
+ftrace_graph_call_site:
+	REG_L	ra, ftrace_graph_call
+	jalr	0(ra)
 #endif
 	RESTORE_ABI
-	ret
+	jr	t0
 ENDPROC(ftrace_caller)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
@@ -179,9 +197,9 @@ ENTRY(ftrace_regs_caller)
 	REG_L	a1, PT_SIZE_ON_STACK(sp)
 	mv	a3, sp
 
-ftrace_regs_call:
-	.global ftrace_regs_call
-	call	ftrace_stub
+ftrace_regs_call_site:
+	REG_L	ra, ftrace_regs_call
+	jalr	0(ra)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	addi	a0, sp, PT_RA
@@ -190,12 +208,33 @@ ftrace_regs_call:
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	mv	a2, s0
 #endif
-ftrace_graph_regs_call:
-	.global ftrace_graph_regs_call
-	call	ftrace_stub
+ftrace_graph_regs_call_site:
+	REG_L	ra, ftrace_graph_regs_call
+	jalr	0(ra)
 #endif
 
 	RESTORE_ALL
-	ret
+	jr	t0
 ENDPROC(ftrace_regs_caller)
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+.align RISCV_LGPTR
+ftrace_call:
+	.global ftrace_call
+	RISCV_PTR ftrace_stub
+
+.align RISCV_LGPTR
+ftrace_graph_call:
+	.global ftrace_graph_call
+	RISCV_PTR ftrace_stub
+
+.align RISCV_LGPTR
+ftrace_regs_call:
+	.global ftrace_regs_call
+	RISCV_PTR ftrace_stub
+
+.align RISCV_LGPTR
+ftrace_graph_regs_call:
+	.global ftrace_graph_regs_call
+	RISCV_PTR ftrace_stub
+
-- 
2.36.0



* [PATCH RFC v2 riscv/for-next 4/5] riscv: ftrace: do not use stop_machine to update code
  2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
                   ` (2 preceding siblings ...)
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
@ 2022-09-13  9:42 ` Andy Chiu
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
  2024-02-13 19:42 ` [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Evgenii Shatokhin
  5 siblings, 0 replies; 43+ messages in thread
From: Andy Chiu @ 2022-09-13  9:42 UTC (permalink / raw)
  To: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb
  Cc: greentime.hu, zong.li, andy.chiu, guoren, kernel, linux-riscv

Now it is safe to remove the dependency on stop_machine() for patching
code in ftrace.
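
For context, the generic weak hook being overridden here falls back to
stop_machine(); it looks roughly like this (paraphrased from
kernel/trace/ftrace.c, so treat the body as an approximation):

void __weak arch_ftrace_update_code(int command)
{
	ftrace_run_stop_machine(command);
}

With the arch override below, ftrace calls ftrace_modify_all_code()
directly, with no stop_machine() round trip.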

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/riscv/kernel/ftrace.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 84b9e280dd1f..53db2ff83751 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -12,6 +12,12 @@
 #include <asm/patch.h>
 
 #ifdef CONFIG_DYNAMIC_FTRACE
+
+void arch_ftrace_update_code(int command)
+{
+	ftrace_modify_all_code(command);
+}
+
 void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
 {
 	mutex_lock(&text_mutex);
-- 
2.36.0



* [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
                   ` (3 preceding siblings ...)
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 4/5] riscv: ftrace: do not use stop_machine to update code Andy Chiu
@ 2022-09-13  9:42 ` Andy Chiu
  2022-09-14 14:06   ` Guo Ren
  2022-09-14 14:24   ` Jessica Clarke
  2024-02-13 19:42 ` [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Evgenii Shatokhin
  5 siblings, 2 replies; 43+ messages in thread
From: Andy Chiu @ 2022-09-13  9:42 UTC (permalink / raw)
  To: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb
  Cc: greentime.hu, zong.li, andy.chiu, guoren, kernel, linux-riscv

Runtime code patching must be done at a naturally aligned address, or we
may end up executing a partially updated instruction.
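
For background, the sites aligned here are the 4-byte nop/jump emitted
for each static-key user, e.g. (a generic illustration of the jump-label
API; the key and function names are made up):

#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(my_feature);

/* Each use of the key emits one patchable nop/jump site through
 * arch_static_branch(); the .align added by this patch guarantees the
 * 4-byte site can be rewritten with a single naturally aligned store. */
void my_hotpath(void)
{
	if (static_branch_unlikely(&my_feature)) {
		/* rarely enabled slow path */
	}
}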

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/jump_label.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
index 38af2ec7b9bf..729991e8f782 100644
--- a/arch/riscv/include/asm/jump_label.h
+++ b/arch/riscv/include/asm/jump_label.h
@@ -18,6 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
 					       bool branch)
 {
 	asm_volatile_goto(
+		"	.align		2			\n\t"
 		"	.option push				\n\t"
 		"	.option norelax				\n\t"
 		"	.option norvc				\n\t"
@@ -39,6 +40,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
 						    bool branch)
 {
 	asm_volatile_goto(
+		"	.align		2			\n\t"
 		"	.option push				\n\t"
 		"	.option norelax				\n\t"
 		"	.option norvc				\n\t"
-- 
2.36.0



* Re: [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
@ 2022-09-14 13:45   ` Guo Ren
  2022-09-15 13:30     ` Guo Ren
  2022-09-17  1:04     ` Andy Chiu
  2024-02-20 14:17   ` Evgenii Shatokhin
  1 sibling, 2 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-14 13:45 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv

I really appreciate you finding the bug, great job.

On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> On RISC-V, we must use an AUIPC + JALR pair to encode an immediate,
> forming a jump that targets an address over 4K away. This may cause
> errors if we want to enable kernel preemption and remove the dependency
> of code patching on stop_machine(). For example, a task could be
> switched out on the auipc; if we then changed the ftrace function before
> it was switched back, it would jump to an address whose updated 11:0
> bits are mixed with the previous XLEN:12 part.
>
> p: patched area performed by dynamic ftrace
> ftrace_prologue:
> p|      REG_S   ra, -SZREG(sp)
> p|      auipc   ra, 0x? ------------> preempted
>                                         ...
>                                 change ftrace function
>                                         ...
> p|      jalr    -?(ra) <------------- switched back

When patching auipc + jalr -> nop, it is safe, right? Because when the
task is switched back, the jalr has become a nop.
When patching nop -> auipc + jalr, it is buggy, right? Because when the
task is switched back, the nop has become a jalr, and ra's value is not
the expected one.

Some machines with instruction fusion won't be affected, because they
would merge auipc + jalr into one macro-op.
QEMU shouldn't be broken, because auipc + jalr is always in the same TCG
block, so there is no chance of an interruption between them.

But anyway, we need to fix it.

> p|      REG_L   ra, -SZREG(sp)
> func:
>         xxx
>         ret
>
> To prevent this condition, we propose a way to load and store target
> addresses atomically. We store an 8-byte aligned, full-width absolute
> address in each ftrace prologue and use a jump at the front to decide
> whether to take the ftrace detour. To reduce the footprint of ftrace
> prologues, we clobber t0 and move the (re-)storing of ra into
> ftrace_{regs_}caller. This is similar to ARM64, which also clobbers x9
> in each prologue.
>
> Also, we initialize the target at startup to take care of the case
> where REG_L happens before the ftrace target has been updated.
>
> .align 2  # if it happens to be 8B-aligned
> ftrace_prologue:
> p|      {j      func} | {auipc  t0}
>         j       ftrace_cont
> p|      .dword  0x? <=== storing the address to a 8B aligned space can be
>                          considered atomic to read sides using REG_L
> ftrace_cont:
>         REG_L   t0, 8(t0) <=== read side
>         jalr    t0, t0
> func:
>         xxx
>         ret
>
> .align 2  # if it is 4B but not 8B-aligned
> ftrace_prologue:
> p|      {j      func} | {auipc  t0}
>         REG_L   t0, 0xc(t0) <=== read side
>         j       ftrace_cont
> p|      .dword  0x? <=== the target address
> ftrace_cont:
>         jalr    t0, t0
> func:
>         xxx
>         ret
>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/include/asm/ftrace.h |  24 -----
>  arch/riscv/kernel/ftrace.c      | 173 ++++++++++++++++++++++----------
>  arch/riscv/kernel/mcount-dyn.S  |  69 ++++++++++---
>  3 files changed, 176 insertions(+), 90 deletions(-)
>
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index 04dad3380041..eaa611e491fc 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -47,30 +47,6 @@ struct dyn_arch_ftrace {
>   */
>
>  #define MCOUNT_ADDR            ((unsigned long)MCOUNT_NAME)
> -#define JALR_SIGN_MASK         (0x00000800)
> -#define JALR_OFFSET_MASK       (0x00000fff)
> -#define AUIPC_OFFSET_MASK      (0xfffff000)
> -#define AUIPC_PAD              (0x00001000)
> -#define JALR_SHIFT             20
> -#define JALR_BASIC             (0x000080e7)
> -#define AUIPC_BASIC            (0x00000097)
> -#define NOP4                   (0x00000013)
> -
> -#define make_call(caller, callee, call)                                        \
> -do {                                                                   \
> -       call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -  \
> -                               (unsigned long)caller));                \
> -       call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -   \
> -                              (unsigned long)caller));                 \
> -} while (0)
> -
> -#define to_jalr_insn(offset)                                           \
> -       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
> -
> -#define to_auipc_insn(offset)                                          \
> -       ((offset & JALR_SIGN_MASK) ?                                    \
> -       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :    \
> -       ((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
>
>  /*
>   * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 2086f6585773..84b9e280dd1f 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -23,31 +23,29 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
>  }
>
>  static int ftrace_check_current_call(unsigned long hook_pos,
> -                                    unsigned int *expected)
> +                                    unsigned long expected_addr)
>  {
> -       unsigned int replaced[2];
> -       unsigned int nops[2] = {NOP4, NOP4};
> +       unsigned long replaced;
>
> -       /* we expect nops at the hook position */
> -       if (!expected)
> -               expected = nops;
> +       /* we expect ftrace_stub at the hook position */
> +       if (!expected_addr)
> +               expected_addr = (unsigned long) ftrace_stub;
>
>         /*
>          * Read the text we want to modify;
>          * return must be -EFAULT on read error
>          */
> -       if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> -                       MCOUNT_INSN_SIZE))
> +       if (copy_from_kernel_nofault(&replaced, (void *)hook_pos,
> +                       (sizeof(unsigned long))))
>                 return -EFAULT;
>
>         /*
>          * Make sure it is what we expect it to be;
>          * return must be -EINVAL on failed comparison
>          */
> -       if (memcmp(expected, replaced, sizeof(replaced))) {
> -               pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
> -                      (void *)hook_pos, expected[0], expected[1], replaced[0],
> -                      replaced[1]);
> +       if (expected_addr != replaced) {
> +               pr_err("%p: expected (%016lx) but got (%016lx)\n",
> +                      (void *)hook_pos, expected_addr, replaced);
>                 return -EINVAL;
>         }
>
> @@ -57,55 +55,96 @@ static int ftrace_check_current_call(unsigned long hook_pos,
>  static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
>                                 bool enable)
>  {
> -       unsigned int call[2];
> -       unsigned int nops[2] = {NOP4, NOP4};
> +       unsigned long call = target;
> +       unsigned long nops = (unsigned long)ftrace_stub;
>
> -       make_call(hook_pos, target, call);
> -
> -       /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> +       /* Replace the target address at once. Return -EPERM on write error. */
>         if (patch_text_nosync
> -           ((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> +           ((void *)hook_pos, enable ? &call : &nops, sizeof(unsigned long)))
>                 return -EPERM;
>
>         return 0;
>  }
>
>  /*
> - * Put 5 instructions with 16 bytes at the front of function within
> - * patchable function entry nops' area.
> - *
> - * 0: REG_S  ra, -SZREG(sp)
> - * 1: auipc  ra, 0x?
> - * 2: jalr   -?(ra)
> - * 3: REG_L  ra, -SZREG(sp)
> + * Place 4 instructions and a destination address in the patchable function
> + * entry.
>   *
>   * So the opcodes is:
> - * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
> - * 1: 0x???????? -> auipc
> - * 2: 0x???????? -> jalr
> - * 3: 0xff813083 (ld)/0xffc12083 (lw)
> + * INSN_SKIPALL  : J     PC + 0x18 (when disabled, jump to the function)
> + * INSN_AUIPC    : AUIPC T0, 0 (when enabled, load address of trampoline)
> + * INSN_LOAD(off): REG_L T0, off(T0) (load address stored in the tramp)
> + * INSN_SKIPTRAMP: J     PC + 0x10 (skip the tramp since it is data, not insns)
> + * INSN_JALR     : JALR  T0, T0 (jump to the destination)
> + *
> + * At runtime, we want to patch the jump target atomically in order to work with
> + * kernel preemption. If we patched with a pair of AUIPC + JALR, a task could be
> + * preempted after loading the upper bits with AUIPC; things would then go wrong
> + * if we updated the jump target before the task was switched back.
> + *
> + * We also want to align all patchable function entries to 4-byte boundaries, and
> + * the jump target to an 8-byte aligned address, so that each of them can be
> + * naturally updated and observed by the patching and running cores.
> + *
> + * To make sure target addresses are 8-byte aligned, we have to consider
> + * following scenarios:
> + *
> + * First if the starting address of the patchable entry is aligned to an 8-byte
> + * boundary:
> + * | ADDR   | COMPILED | DISABLED         | ENABLED                |
> + * +--------+----------+------------------+------------------------+
> + * | 0x00   | NOP      | J     FUNC       | AUIPC T0, 0            |
If we only add a J FUNC to the current code, can we solve the problem?
    DISABLED | ENABLED
0: J FUNC      | REG_S  ra, -SZREG(sp)
1: auipc  ra, 0x?
2: jalr   -?(ra)
3: REG_L  ra, -SZREG(sp)
FUNC:


> + * | 0x04   | NOP      | J     0x10                                |
> + * | 0x08   | NOP      | 8-byte aligned target address (low)       |
> + * | 0x0C   | NOP      |                               (high)      |
> + * | 0x10   | NOP      | REG_L T0, 8(T0)                           |
> + * | 0x14   | NOP      | JALR  T0, T0                              |
> + * | FUNC   | X                                                    |
> + *
> + * If not, then it starts at a 4- but not 8-byte aligned address. In such cases,
> + * we re-arrange the code and the trampoline in order to naturally align it.
> + * | ADDR   | COMPILED | DISABLED         | ENABLED                |
> + * +--------+----------+------------------+------------------------+
> + * | 0x04   | NOP      | J     FUNC       | AUIPC T0, 0            |
> + * | 0x08   | NOP      | REG_L T0, 0xC(T0)                         |
> + * | 0x0C   | NOP      | J     0x18                                |
> + * | 0x10   | NOP      | 8-byte aligned target address (low)       |
> + * | 0x14   | NOP      |                               (high)      |
> + * | 0x18   | NOP      | JALR  T0, T0                              |
> + * | FUNC   | X                                                    |
>   */
> +
>  #if __riscv_xlen == 64
> -#define INSN0  0xfe113c23
> -#define INSN3  0xff813083
> -#elif __riscv_xlen == 32
> -#define INSN0  0xfe112e23
> -#define INSN3  0xffc12083
> +#define INSN_LD_T0_OFF(off) ((0x2b283) | ((off) << 20))
> +# elif __riscv_xlen == 32
> +#define INSN_LD_T0_OFF(off) ((0x2a283) | ((off) << 20))
>  #endif
>
> -#define FUNC_ENTRY_SIZE        16
> -#define FUNC_ENTRY_JMP 4
> +#define INSN_SKIPALL   0x0180006f
> +#define INSN_AUIPC     0x00000297
> +#define INSN_LOAD(off) INSN_LD_T0_OFF(off)
> +#define INSN_SKIPTRAMP 0x00c0006f
> +#define INSN_JALR      0x000282e7
> +#define INSN_SIZE      4
>
>  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>  {
> -       unsigned int call[4] = {INSN0, 0, 0, INSN3};
> -       unsigned long target = addr;
> -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> -
> -       call[1] = to_auipc_insn((unsigned int)(target - caller));
> -       call[2] = to_jalr_insn((unsigned int)(target - caller));
> +       unsigned int call[1] = {INSN_AUIPC};
> +       void *tramp;
> +       unsigned long patch_addr = rec->ip;
> +
> +       if (IS_ALIGNED(patch_addr, 8)) {
> +               tramp = (void *) (patch_addr + 0x8);
> +       } else if (IS_ALIGNED(patch_addr, 4)) {
> +               tramp = (void *) (patch_addr + 0xc);
> +       } else {
> +               pr_warn("cannot patch: function must be 4-Byte or 8-Byte aligned\n");
> +               return -EINVAL;
> +       }
> +       WARN_ON(!IS_ALIGNED((unsigned long)tramp, 8));
> +       patch_insn_write(tramp, &addr, sizeof(unsigned long));
>
> -       if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
> +       if (patch_text_nosync((void *)patch_addr, &call, INSN_SIZE))
>                 return -EPERM;
>
>         return 0;
> @@ -114,14 +153,49 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>                     unsigned long addr)
>  {
> -       unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
> +       unsigned int nops[1] = {INSN_SKIPALL};
> +       unsigned long patch_addr = rec->ip;
>
> -       if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
> +       if (patch_text_nosync((void *)patch_addr, nops, INSN_SIZE))
>                 return -EPERM;
>
>         return 0;
>  }
>
> +extern void ftrace_no_caller(void);
> +static void ftrace_make_default_tramp(unsigned int *tramp)
> +{
> +       *((unsigned long *)tramp) = (unsigned long) &ftrace_no_caller;
> +}
> +
> +int __ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec,
> +                   unsigned long addr)
> +{
> +       unsigned int nops[6];
> +       unsigned int *tramp;
> +       unsigned long patch_addr = rec->ip;
> +
> +       nops[0] = INSN_SKIPALL;
> +       if (IS_ALIGNED(patch_addr, 8)) {
> +               nops[1] = INSN_SKIPTRAMP;
> +               nops[4] = INSN_LOAD(0x8);
> +               tramp = &nops[2];
> +       } else if (IS_ALIGNED(patch_addr, 4)) {
> +               nops[1] = INSN_LOAD(0xc);
> +               nops[2] = INSN_SKIPTRAMP;
> +               tramp = &nops[3];
> +       } else {
> +               pr_warn("start address must be 4-Byte aligned\n");
> +               return -EINVAL;
> +       }
> +       ftrace_make_default_tramp(tramp);
> +       nops[5] = INSN_JALR;
> +
> +       if (patch_text_nosync((void *)patch_addr, nops, sizeof(nops)))
> +               return -EPERM;
> +
> +       return 0;
> +}
>
>  /*
>   * This is called early on, and isn't wrapped by
> @@ -135,7 +209,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>         int out;
>
>         ftrace_arch_code_modify_prepare();
> -       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> +       out = __ftrace_init_nop(mod, rec, MCOUNT_ADDR);
>         ftrace_arch_code_modify_post_process();
>
>         return out;
> @@ -158,17 +232,14 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>                        unsigned long addr)
>  {
> -       unsigned int call[2];
> -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
>         int ret;
>
> -       make_call(caller, old_addr, call);
> -       ret = ftrace_check_current_call(caller, call);
> +       ret = ftrace_check_current_call(rec->ip, old_addr);
>
>         if (ret)
>                 return ret;
>
> -       return __ftrace_modify_call(caller, addr, true);
> +       return __ftrace_modify_call(rec->ip, addr, true);
>  }
>  #endif
>
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index d171eca623b6..f8ee63e4314b 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -13,7 +13,7 @@
>
>         .text
>
> -#define FENTRY_RA_OFFSET       12
> +#define FENTRY_RA_OFFSET       24
>  #define ABI_SIZE_ON_STACK      72
>  #define ABI_A0                 0
>  #define ABI_A1                 8
> @@ -25,7 +25,12 @@
>  #define ABI_A7                 56
>  #define ABI_RA                 64
>
> +# t0 points to return of ftrace
> +# ra points to the return address of traced function
> +
>         .macro SAVE_ABI
> +       REG_S   ra, -SZREG(sp)
> +       mv      ra, t0
>         addi    sp, sp, -SZREG
>         addi    sp, sp, -ABI_SIZE_ON_STACK
>
> @@ -53,10 +58,14 @@
>
>         addi    sp, sp, ABI_SIZE_ON_STACK
>         addi    sp, sp, SZREG
> +       mv      t0, ra
> +       REG_L   ra, -SZREG(sp)
>         .endm
>
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>         .macro SAVE_ALL
> +       REG_S   ra, -SZREG(sp)
> +       mv      ra, t0
>         addi    sp, sp, -SZREG
>         addi    sp, sp, -PT_SIZE_ON_STACK
>
> @@ -138,9 +147,18 @@
>
>         addi    sp, sp, PT_SIZE_ON_STACK
>         addi    sp, sp, SZREG
> +       mv      t0, ra # t0 is equal to ra here
> +       REG_L   ra, -SZREG(sp)
>         .endm
>  #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
>
> +# perform a full fence before re-running the ftrace entry if we run into this
> +ENTRY(ftrace_no_caller)
> +       fence   rw, rw
> +       fence.i
> +       jr      -FENTRY_RA_OFFSET(t0)
> +ENDPROC(ftrace_no_caller)
> +
>  ENTRY(ftrace_caller)
>         SAVE_ABI
>
> @@ -150,9 +168,9 @@ ENTRY(ftrace_caller)
>         REG_L   a1, ABI_SIZE_ON_STACK(sp)
>         mv      a3, sp
>
> -ftrace_call:
> -       .global ftrace_call
> -       call    ftrace_stub
> +ftrace_call_site:
> +       REG_L   ra, ftrace_call
> +       jalr    0(ra)
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         addi    a0, sp, ABI_SIZE_ON_STACK
> @@ -161,12 +179,12 @@ ftrace_call:
>  #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
>         mv      a2, s0
>  #endif
> -ftrace_graph_call:
> -       .global ftrace_graph_call
> -       call    ftrace_stub
> +ftrace_graph_call_site:
> +       REG_L   ra, ftrace_graph_call
> +       jalr    0(ra)
>  #endif
>         RESTORE_ABI
> -       ret
> +       jr      t0
>  ENDPROC(ftrace_caller)
>
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> @@ -179,9 +197,9 @@ ENTRY(ftrace_regs_caller)
>         REG_L   a1, PT_SIZE_ON_STACK(sp)
>         mv      a3, sp
>
> -ftrace_regs_call:
> -       .global ftrace_regs_call
> -       call    ftrace_stub
> +ftrace_regs_call_site:
> +       REG_L   ra, ftrace_regs_call
> +       jalr    0(ra)
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         addi    a0, sp, PT_RA
> @@ -190,12 +208,33 @@ ftrace_regs_call:
>  #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
>         mv      a2, s0
>  #endif
> -ftrace_graph_regs_call:
> -       .global ftrace_graph_regs_call
> -       call    ftrace_stub
> +ftrace_graph_regs_call_site:
> +       REG_L   ra, ftrace_graph_regs_call
> +       jalr    0(ra)
>  #endif
>
>         RESTORE_ALL
> -       ret
> +       jr      t0
>  ENDPROC(ftrace_regs_caller)
>  #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +
> +.align RISCV_LGPTR
> +ftrace_call:
> +       .global ftrace_call
> +       RISCV_PTR ftrace_stub
> +
> +.align RISCV_LGPTR
> +ftrace_graph_call:
> +       .global ftrace_graph_call
> +       RISCV_PTR ftrace_stub
> +
> +.align RISCV_LGPTR
> +ftrace_regs_call:
> +       .global ftrace_regs_call
> +       RISCV_PTR ftrace_stub
> +
> +.align RISCV_LGPTR
> +ftrace_graph_regs_call:
> +       .global ftrace_graph_regs_call
> +       RISCV_PTR ftrace_stub
> +
> --
> 2.36.0
>


--
Best Regards
 Guo Ren


* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
@ 2022-09-14 14:06   ` Guo Ren
  2022-09-16 23:54     ` Andy Chiu
  2022-09-14 14:24   ` Jessica Clarke
  1 sibling, 1 reply; 43+ messages in thread
From: Guo Ren @ 2022-09-14 14:06 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv

Is this patch related to this series?

On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> Runtime code patching must be done at a naturally aligned address, or we
> may end up executing a partially updated instruction.
If it's true, we can't use static branches at all. Have you
encountered a problem?

If you are right, arm64 ... csky all need the patch.


>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/include/asm/jump_label.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> index 38af2ec7b9bf..729991e8f782 100644
> --- a/arch/riscv/include/asm/jump_label.h
> +++ b/arch/riscv/include/asm/jump_label.h
> @@ -18,6 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
>                                                bool branch)
>  {
>         asm_volatile_goto(
> +               "       .align          2                       \n\t"
>                 "       .option push                            \n\t"
>                 "       .option norelax                         \n\t"
>                 "       .option norvc                           \n\t"
> @@ -39,6 +40,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
>                                                     bool branch)
>  {
>         asm_volatile_goto(
> +               "       .align          2                       \n\t"
>                 "       .option push                            \n\t"
>                 "       .option norelax                         \n\t"
>                 "       .option norvc                           \n\t"
> --
> 2.36.0
>


-- 
Best Regards
 Guo Ren


* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
  2022-09-14 14:06   ` Guo Ren
@ 2022-09-14 14:24   ` Jessica Clarke
  2022-09-15  1:47     ` Guo Ren
  1 sibling, 1 reply; 43+ messages in thread
From: Jessica Clarke @ 2022-09-14 14:24 UTC (permalink / raw)
  To: Andy Chiu
  Cc: Palmer Dabbelt, Paul Walmsley, Albert Ou, rostedt, Ingo Molnar,
	Peter Zijlstra, jpoimboe, jbaron, ardb, greentime.hu, zong.li,
	guoren, kernel, linux-riscv

On 13 Sept 2022, at 10:42, Andy Chiu <andy.chiu@sifive.com> wrote:
> 
> Runtime code patching must be done at a naturally aligned address, or we
> may end up executing a partially updated instruction.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> ---
> arch/riscv/include/asm/jump_label.h | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> index 38af2ec7b9bf..729991e8f782 100644
> --- a/arch/riscv/include/asm/jump_label.h
> +++ b/arch/riscv/include/asm/jump_label.h
> @@ -18,6 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
> 					       bool branch)
> {
> 	asm_volatile_goto(
> +		"	.align		2			\n\t"

.align is a horrible directive whose meaning changes between
architectures and requires careful thought, especially when the
argument is a power of 2. Better to use .balign 4, or .p2align 2 if you
really want to for some reason.
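
For instance (a generic illustration of GAS directive semantics, not
code from this series; on RISC-V, .align follows power-of-two
semantics):

/* Both asm bodies request 4-byte alignment on RISC-V, but only the
 * .balign form means the same thing on every GAS target. */
static inline void align_example(void)
{
	asm volatile(
		".align 2\n\t"	/* power-of-two on RISC-V: 2^2 = 4 bytes */
		"nop\n\t");
	asm volatile(
		".balign 4\n\t"	/* always a byte count: 4 bytes everywhere */
		"nop\n\t");
}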

Jess

> 		"	.option push				\n\t"
> 		"	.option norelax				\n\t"
> 		"	.option norvc				\n\t"
> @@ -39,6 +40,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> 						    bool branch)
> {
> 	asm_volatile_goto(
> +		"	.align		2			\n\t"
> 		"	.option push				\n\t"
> 		"	.option norelax				\n\t"
> 		"	.option norvc				\n\t"
> -- 
> 2.36.0
> 
> 



* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-14 14:24   ` Jessica Clarke
@ 2022-09-15  1:47     ` Guo Ren
  2022-09-15  2:34       ` Jessica Clarke
  0 siblings, 1 reply; 43+ messages in thread
From: Guo Ren @ 2022-09-15  1:47 UTC (permalink / raw)
  To: Jessica Clarke
  Cc: Andy Chiu, Palmer Dabbelt, Paul Walmsley, Albert Ou, rostedt,
	Ingo Molnar, Peter Zijlstra, jpoimboe, jbaron, ardb,
	greentime.hu, zong.li, kernel, linux-riscv

On Wed, Sep 14, 2022 at 10:24 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>
> On 13 Sept 2022, at 10:42, Andy Chiu <andy.chiu@sifive.com> wrote:
> >
> > Runtime code patching must be done at a naturally aligned address, or we
> > may end up executing a partially updated instruction.
> >
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> > arch/riscv/include/asm/jump_label.h | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> > index 38af2ec7b9bf..729991e8f782 100644
> > --- a/arch/riscv/include/asm/jump_label.h
> > +++ b/arch/riscv/include/asm/jump_label.h
> > @@ -18,6 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
> >                                              bool branch)
> > {
> >       asm_volatile_goto(
> > +             "       .align          2                       \n\t"
>
> .align is a horrible directive whose meaning changes between
> architectures and requires careful thought, especially when the
> argument is a power of 2. Better to use .balign 4, or .p2align 2 if you
> really want to for some reason.
Do we really need to align here? Shouldn't it be done naturally by the compiler?

>
> Jess
>
> >               "       .option push                            \n\t"
> >               "       .option norelax                         \n\t"
> >               "       .option norvc                           \n\t"
> > @@ -39,6 +40,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> >                                                   bool branch)
> > {
> >       asm_volatile_goto(
> > +             "       .align          2                       \n\t"
> >               "       .option push                            \n\t"
> >               "       .option norelax                         \n\t"
> >               "       .option norvc                           \n\t"
> > --
> > 2.36.0
> >
> >

-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-15  1:47     ` Guo Ren
@ 2022-09-15  2:34       ` Jessica Clarke
  0 siblings, 0 replies; 43+ messages in thread
From: Jessica Clarke @ 2022-09-15  2:34 UTC (permalink / raw)
  To: Guo Ren
  Cc: Andy Chiu, Palmer Dabbelt, Paul Walmsley, Albert Ou, rostedt,
	Ingo Molnar, Peter Zijlstra, jpoimboe, jbaron, Ard Biesheuvel,
	Greentime Hu, Zong Li, kernel, linux-riscv

On 15 Sept 2022, at 02:47, Guo Ren <guoren@kernel.org> wrote:
> 
> On Wed, Sep 14, 2022 at 10:24 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>> 
>> On 13 Sept 2022, at 10:42, Andy Chiu <andy.chiu@sifive.com> wrote:
>>> 
>>> runtime code patching must be done at a naturally aligned address, or we
>>> may execute on a partial instruction.
>>> 
>>> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
>>> Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
>>> ---
>>> arch/riscv/include/asm/jump_label.h | 2 ++
>>> 1 file changed, 2 insertions(+)
>>> 
>>> diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
>>> index 38af2ec7b9bf..729991e8f782 100644
>>> --- a/arch/riscv/include/asm/jump_label.h
>>> +++ b/arch/riscv/include/asm/jump_label.h
>>> @@ -18,6 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
>>> bool branch)
>>> {
>>> asm_volatile_goto(
>>> + " .align 2 \n\t"
>> 
>> .align is a horrible directive whose meaning changes between
>> architectures and requires careful thought, especially when the
>> argument is a power of 2. Better to use .balign 4, or .p2align 2 if you
>> really want to for some reason.
> Do we really need to align here? Should it be naturally done by the compiler?

Compilers don’t add alignment to inline assembly beyond what’s needed.
And assemblers won’t align beyond what the architecture requires here
either, as far as I know, even under .option norvc, since normally you
still have an RVC-capable processor; you just don’t want compressed
instructions at that specific point.
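
As an illustration (a sketch only, not a tested change), the patch
could use

	"	.balign		4			\n\t"

in place of the .align 2 line, with the rest of the sequence unchanged.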

Jess

>> 
>> Jess
>> 
>>> " .option push \n\t"
>>> " .option norelax \n\t"
>>> " .option norvc \n\t"
>>> @@ -39,6 +40,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
>>> bool branch)
>>> {
>>> asm_volatile_goto(
>>> + " .align 2 \n\t"
>>> " .option push \n\t"
>>> " .option norelax \n\t"
>>> " .option norvc \n\t"
>>> --
>>> 2.36.0
>>> 
>>> 
> 
> 
> -- 
> Best Regards
> Guo Ren


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption
  2022-09-14 13:45   ` Guo Ren
@ 2022-09-15 13:30     ` Guo Ren
  2022-09-17  1:04     ` Andy Chiu
  1 sibling, 0 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-15 13:30 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv

On Wed, Sep 14, 2022 at 9:45 PM Guo Ren <guoren@kernel.org> wrote:
>
> I really appreciate you finding the bug, great job.
>
> On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> >
> > In RISC-V, we must use an AUIPC + JALR pair to encode an immediate,
> > forming a jump that jumps to an address over 4K away. This may cause
> > errors if we want to enable kernel preemption and remove the dependency
> > of code patching on stop_machine(). For example, if a task was switched
> > out on the auipc, and we changed the ftrace function before it was
> > switched back, then it would jump to an address whose updated 11:0 bits
> > mix with the previous XLEN:12 part.
> >
> > p: patched area performed by dynamic ftrace
> > ftrace_prologue:
> > p|      REG_S   ra, -SZREG(sp)
> > p|      auipc   ra, 0x? ------------> preempted
> >                                         ...
> >                                 change ftrace function
> >                                         ...
> > p|      jalr    -?(ra) <------------- switched back
>
> Patching auipc + jalr -> nop is safe, right? Because when the task is
> switched back, the jalr has already become a nop.
> Patching nop -> auipc + jalr is buggy, right? Because when the task is
> switched back, the nop has become a jalr, and ra's value is not the
> expected one.
>
> Some machines with instruction fusion won't be affected, because they
> would merge auipc + jalr into one macro-op.
> Qemu shouldn't be broken, because auipc + jalr is always in the same
> tcg block, so no chance for interruption between them.
>
> But anyway, we need to fix it.
>
> > p|      REG_L   ra, -SZREG(sp)
> > func:
> >         xxx
> >         ret
> >
> > To prevent such a condition, we proposed a way to load or store target
> > addresses atomically. We store an 8-byte aligned full-width absolute
> > address into each ftrace prologue and use a jump at the front to decide
> > whether we should take the ftrace detour. To reduce the footprint of
> > ftrace prologues, we clobber t0 and move the ra (re-)storing into
> > ftrace_{regs_}caller. This is similar to ARM64, which also clobbers x9 at
> > each prologue.
> >
> > Also, we initialize the target at startup to take care of a case where
> > REG_L happened before the update of the ftrace target.
> >
> > .align 2  # if it happen to be 8B-aligned
> > ftrace_prologue:
> > p|      {j      func} | {auipc  t0}
> >         j       ftrace_cont
> > p|      .dword  0x? <=== storing the address to a 8B aligned space can be
> >                          considered atomic to read sides using REG_L
> > ftrace_cont:
> >         REG_L   t0, 8(t0) <=== read side
> >         jalr    t0, t0
> > func:
> >         xxx
> >         ret
> >
> > .align 2  # if it is 4B but not 8B-aligned
> > ftrace_prologue:
> > p|      {j      func} | {auipc  t0}
> >         REG_L   t0, 0xc(t0) <=== read side
> >         j       ftrace_cont
> > p|      .dword  0x? <=== the target address
> > ftrace_cont:
> >         jalr    t0, t0
> > func:
> >         xxx
> >         ret
> >
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> >  arch/riscv/include/asm/ftrace.h |  24 -----
> >  arch/riscv/kernel/ftrace.c      | 173 ++++++++++++++++++++++----------
> >  arch/riscv/kernel/mcount-dyn.S  |  69 ++++++++++---
> >  3 files changed, 176 insertions(+), 90 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> > index 04dad3380041..eaa611e491fc 100644
> > --- a/arch/riscv/include/asm/ftrace.h
> > +++ b/arch/riscv/include/asm/ftrace.h
> > @@ -47,30 +47,6 @@ struct dyn_arch_ftrace {
> >   */
> >
> >  #define MCOUNT_ADDR            ((unsigned long)MCOUNT_NAME)
> > -#define JALR_SIGN_MASK         (0x00000800)
> > -#define JALR_OFFSET_MASK       (0x00000fff)
> > -#define AUIPC_OFFSET_MASK      (0xfffff000)
> > -#define AUIPC_PAD              (0x00001000)
> > -#define JALR_SHIFT             20
> > -#define JALR_BASIC             (0x000080e7)
> > -#define AUIPC_BASIC            (0x00000097)
> > -#define NOP4                   (0x00000013)
> > -
> > -#define make_call(caller, callee, call)                                        \
> > -do {                                                                   \
> > -       call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -  \
> > -                               (unsigned long)caller));                \
> > -       call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -   \
> > -                              (unsigned long)caller));                 \
> > -} while (0)
> > -
> > -#define to_jalr_insn(offset)                                           \
> > -       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
> > -
> > -#define to_auipc_insn(offset)                                          \
> > -       ((offset & JALR_SIGN_MASK) ?                                    \
> > -       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :    \
> > -       ((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
> >
> >  /*
> >   * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > index 2086f6585773..84b9e280dd1f 100644
> > --- a/arch/riscv/kernel/ftrace.c
> > +++ b/arch/riscv/kernel/ftrace.c
> > @@ -23,31 +23,29 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
> >  }
> >
> >  static int ftrace_check_current_call(unsigned long hook_pos,
> > -                                    unsigned int *expected)
> > +                                    unsigned long expected_addr)
> >  {
> > -       unsigned int replaced[2];
> > -       unsigned int nops[2] = {NOP4, NOP4};
> > +       unsigned long replaced;
> >
> > -       /* we expect nops at the hook position */
> > -       if (!expected)
> > -               expected = nops;
> > +       /* we expect ftrace_stub at the hook position */
> > +       if (!expected_addr)
> > +               expected_addr = (unsigned long) ftrace_stub;
> >
> >         /*
> >          * Read the text we want to modify;
> >          * return must be -EFAULT on read error
> >          */
> > -       if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> > -                       MCOUNT_INSN_SIZE))
> > +       if (copy_from_kernel_nofault(&replaced, (void *)hook_pos,
> > +                       (sizeof(unsigned long))))
> >                 return -EFAULT;
> >
> >         /*
> >          * Make sure it is what we expect it to be;
> >          * return must be -EINVAL on failed comparison
> >          */
> > -       if (memcmp(expected, replaced, sizeof(replaced))) {
> > -               pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
> > -                      (void *)hook_pos, expected[0], expected[1], replaced[0],
> > -                      replaced[1]);
> > +       if (expected_addr != replaced) {
> > +               pr_err("%p: expected (%016lx) but got (%016lx)\n",
> > +                      (void *)hook_pos, expected_addr, replaced);
> >                 return -EINVAL;
> >         }
> >
> > @@ -57,55 +55,96 @@ static int ftrace_check_current_call(unsigned long hook_pos,
> >  static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> >                                 bool enable)
> >  {
> > -       unsigned int call[2];
> > -       unsigned int nops[2] = {NOP4, NOP4};
> > +       unsigned long call = target;
> > +       unsigned long nops = (unsigned long)ftrace_stub;
> >
> > -       make_call(hook_pos, target, call);
> > -
> > -       /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> > +       /* Replace the target address at once. Return -EPERM on write error. */
> >         if (patch_text_nosync
> > -           ((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> > +           ((void *)hook_pos, enable ? &call : &nops, sizeof(unsigned long)))
> >                 return -EPERM;
> >
> >         return 0;
> >  }
> >
> >  /*
> > - * Put 5 instructions with 16 bytes at the front of function within
> > - * patchable function entry nops' area.
> > - *
> > - * 0: REG_S  ra, -SZREG(sp)
> > - * 1: auipc  ra, 0x?
> > - * 2: jalr   -?(ra)
> > - * 3: REG_L  ra, -SZREG(sp)
> > + * Place 4 instructions and a destination address in the patchable function
> > + * entry.
> >   *
> >   * So the opcodes is:
> > - * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
> > - * 1: 0x???????? -> auipc
> > - * 2: 0x???????? -> jalr
> > - * 3: 0xff813083 (ld)/0xffc12083 (lw)
> > + * INSN_SKIPALL  : J     PC + 0x18 (when disabled, jump to the function)
> > + * INSN_AUIPC    : AUIPC T0, 0 (when enabled, load address of trampoline)
> > + * INSN_LOAD(off): REG_L T0, off(T0) (load address stored in the tramp)
> > + * INSN_SKIPTRAMP: J     PC + 0x10 (skip tramp since they are not instructions)
> > + * INSN_JALR     : JALR  T0, T0 (jump to the destination)
> > + *
> > + * At runtime, we want to patch the jump target atomically in order to work with
> > + * kernel preemption. If we patched with a pair of AUIPC + JALR and a task was
> > + * preempted after loading the upper bits with AUIPC, then things would mess up
> > + * if we updated the jump target before the task was switched back.
> > + *
> > + * We also want to align all patchable function entries to 4-byte boundaries,
> > + * and the jump target to an 8-byte aligned address, so that each of them can
> > + * be naturally updated and observed by patching and running cores.
> > + *
> > + * To make sure target addresses are 8-byte aligned, we have to consider
> > + * following scenarios:
> > + *
> > + * First if the starting address of the patchable entry is aligned to an 8-byte
> > + * boundary:
> > + * | ADDR   | COMPILED | DISABLED         | ENABLED                |
> > + * +--------+----------+------------------+------------------------+
> > + * | 0x00   | NOP      | J     FUNC       | AUIPC T0, 0            |
> If we only add a J FUNC to the current code, can we solve the problem?
>     DISABLED | ENABLED
> 0: J FUNC      | REG_S  ra, -SZREG(sp)
> 1: auipc  ra, 0x?
> 2: jalr   -?(ra)
> 3: REG_L  ra, -SZREG(sp)
> FUNC:
Ah..., it can't. The auipc + jalr pair is broken; we need to detour
directly from A to B in ftrace_modify_call.

>
>
> > + * | 0x04   | NOP      | J     0x10                                |
> > + * | 0x08   | NOP      | 8-byte aligned target address (low)       |
> > + * | 0x0C   | NOP      |                               (high)      |
> > + * | 0x10   | NOP      | REG_L T0, 8(T0)                           |
> > + * | 0x14   | NOP      | JALR  T0, T0                              |
> > + * | FUNC   | X                                                    |
> > + *
> > + * If not, then it starts at a 4- but not 8-byte aligned address. In such cases,
> > + * we re-arrange the code and the trampoline in order to naturally align it.
> > + * | ADDR   | COMPILED | DISABLED         | ENABLED                |
> > + * +--------+----------+------------------+------------------------+
> > + * | 0x04   | NOP      | J     FUNC       | AUIPC T0, 0            |
> > + * | 0x08   | NOP      | REG_L T0, 0xC(T0)                         |
> > + * | 0x0C   | NOP      | J     0x18                                |
> > + * | 0x10   | NOP      | 8-byte aligned target address (low)       |
> > + * | 0x14   | NOP      |                               (high)      |
> > + * | 0x18   | NOP      | JALR  T0, T0                              |
> > + * | FUNC   | X                                                    |
> >   */
> > +
> >  #if __riscv_xlen == 64
> > -#define INSN0  0xfe113c23
> > -#define INSN3  0xff813083
> > -#elif __riscv_xlen == 32
> > -#define INSN0  0xfe112e23
> > -#define INSN3  0xffc12083
> > +#define INSN_LD_T0_OFF(off) ((0x2b283) | ((off) << 20))
> > +# elif __riscv_xlen == 32
> > +#define INSN_LD_T0_OFF(off) ((0x2a283) | ((off) << 20))
> >  #endif
> >
> > -#define FUNC_ENTRY_SIZE        16
> > -#define FUNC_ENTRY_JMP 4
> > +#define INSN_SKIPALL   0x0180006f
> > +#define INSN_AUIPC     0x00000297
> > +#define INSN_LOAD(off) INSN_LD_T0_OFF(off)
> > +#define INSN_SKIPTRAMP 0x00c0006f
> > +#define INSN_JALR      0x000282e7
> > +#define INSN_SIZE      4
> >
> >  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >  {
> > -       unsigned int call[4] = {INSN0, 0, 0, INSN3};
> > -       unsigned long target = addr;
> > -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> > -
> > -       call[1] = to_auipc_insn((unsigned int)(target - caller));
> > -       call[2] = to_jalr_insn((unsigned int)(target - caller));
> > +       unsigned int call[1] = {INSN_AUIPC};
> > +       void *tramp;
> > +       unsigned long patch_addr = rec->ip;
> > +
> > +       if (IS_ALIGNED(patch_addr, 8)) {
> > +               tramp = (void *) (patch_addr + 0x8);
> > +       } else if (IS_ALIGNED(patch_addr, 4)) {
> > +               tramp = (void *) (patch_addr + 0xc);
> > +       } else {
> > +               pr_warn("cannot patch: function must be 4-Byte or 8-Byte aligned\n");
> > +               return -EINVAL;
> > +       }
> > +       WARN_ON(!IS_ALIGNED((unsigned long)tramp, 8));
> > +       patch_insn_write(tramp, &addr, sizeof(unsigned long));
> >
> > -       if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
> > +       if (patch_text_nosync((void *)patch_addr, &call, INSN_SIZE))
> >                 return -EPERM;
> >
> >         return 0;
> > @@ -114,14 +153,49 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> >                     unsigned long addr)
> >  {
> > -       unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
> > +       unsigned int nops[1] = {INSN_SKIPALL};
> > +       unsigned long patch_addr = rec->ip;
> >
> > -       if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
> > +       if (patch_text_nosync((void *)patch_addr, nops, INSN_SIZE))
> >                 return -EPERM;
> >
> >         return 0;
> >  }
> >
> > +extern void ftrace_no_caller(void);
> > +static void ftrace_make_default_tramp(unsigned int *tramp)
> > +{
> > +       *((unsigned long *)tramp) = (unsigned long) &ftrace_no_caller;
> > +}
> > +
> > +int __ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec,
> > +                   unsigned long addr)
> > +{
> > +       unsigned int nops[6];
> > +       unsigned int *tramp;
> > +       unsigned long patch_addr = rec->ip;
> > +
> > +       nops[0] = INSN_SKIPALL;
> > +       if (IS_ALIGNED(patch_addr, 8)) {
> > +               nops[1] = INSN_SKIPTRAMP;
> > +               nops[4] = INSN_LOAD(0x8);
> > +               tramp = &nops[2];
> > +       } else if (IS_ALIGNED(patch_addr, 4)) {
> > +               nops[1] = INSN_LOAD(0xc);
> > +               nops[2] = INSN_SKIPTRAMP;
> > +               tramp = &nops[3];
> > +       } else {
> > +               pr_warn("start address must be 4-Byte aligned\n");
> > +               return -EINVAL;
> > +       }
> > +       ftrace_make_default_tramp(tramp);
> > +       nops[5] = INSN_JALR;
> > +
> > +       if (patch_text_nosync((void *)patch_addr, nops, sizeof(nops)))
> > +               return -EPERM;
> > +
> > +       return 0;
> > +}
> >
> >  /*
> >   * This is called early on, and isn't wrapped by
> > @@ -135,7 +209,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >         int out;
> >
> >         ftrace_arch_code_modify_prepare();
> > -       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > +       out = __ftrace_init_nop(mod, rec, MCOUNT_ADDR);
> >         ftrace_arch_code_modify_post_process();
> >
> >         return out;
> > @@ -158,17 +232,14 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
> >  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >                        unsigned long addr)
> >  {
> > -       unsigned int call[2];
> > -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> >         int ret;
> >
> > -       make_call(caller, old_addr, call);
> > -       ret = ftrace_check_current_call(caller, call);
> > +       ret = ftrace_check_current_call(rec->ip, old_addr);
> >
> >         if (ret)
> >                 return ret;
> >
> > -       return __ftrace_modify_call(caller, addr, true);
> > +       return __ftrace_modify_call(rec->ip, addr, true);
> >  }
> >  #endif
> >
> > diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> > index d171eca623b6..f8ee63e4314b 100644
> > --- a/arch/riscv/kernel/mcount-dyn.S
> > +++ b/arch/riscv/kernel/mcount-dyn.S
> > @@ -13,7 +13,7 @@
> >
> >         .text
> >
> > -#define FENTRY_RA_OFFSET       12
> > +#define FENTRY_RA_OFFSET       24
> >  #define ABI_SIZE_ON_STACK      72
> >  #define ABI_A0                 0
> >  #define ABI_A1                 8
> > @@ -25,7 +25,12 @@
> >  #define ABI_A7                 56
> >  #define ABI_RA                 64
> >
> > +# t0 points to return of ftrace
> > +# ra points to the return address of traced function
> > +
> >         .macro SAVE_ABI
> > +       REG_S   ra, -SZREG(sp)
> > +       mv      ra, t0
> >         addi    sp, sp, -SZREG
> >         addi    sp, sp, -ABI_SIZE_ON_STACK
> >
> > @@ -53,10 +58,14 @@
> >
> >         addi    sp, sp, ABI_SIZE_ON_STACK
> >         addi    sp, sp, SZREG
> > +       mv      t0, ra
> > +       REG_L   ra, -SZREG(sp)
> >         .endm
> >
> >  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >         .macro SAVE_ALL
> > +       REG_S   ra, -SZREG(sp)
> > +       mv      ra, t0
> >         addi    sp, sp, -SZREG
> >         addi    sp, sp, -PT_SIZE_ON_STACK
> >
> > @@ -138,9 +147,18 @@
> >
> >         addi    sp, sp, PT_SIZE_ON_STACK
> >         addi    sp, sp, SZREG
> > +       mv      t0, ra # t0 is equal to ra here
> > +       REG_L   ra, -SZREG(sp)
> >         .endm
> >  #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> >
> > +# perform a full fence before re-running the ftrace entry if we run into this
> > +ENTRY(ftrace_no_caller)
> > +       fence   rw, rw
> > +       fence.i
> > +       jr      -FENTRY_RA_OFFSET(t0)
> > +ENDPROC(ftrace_no_caller)
> > +
> >  ENTRY(ftrace_caller)
> >         SAVE_ABI
> >
> > @@ -150,9 +168,9 @@ ENTRY(ftrace_caller)
> >         REG_L   a1, ABI_SIZE_ON_STACK(sp)
> >         mv      a3, sp
> >
> > -ftrace_call:
> > -       .global ftrace_call
> > -       call    ftrace_stub
> > +ftrace_call_site:
> > +       REG_L   ra, ftrace_call
> > +       jalr    0(ra)
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         addi    a0, sp, ABI_SIZE_ON_STACK
> > @@ -161,12 +179,12 @@ ftrace_call:
> >  #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> >         mv      a2, s0
> >  #endif
> > -ftrace_graph_call:
> > -       .global ftrace_graph_call
> > -       call    ftrace_stub
> > +ftrace_graph_call_site:
> > +       REG_L   ra, ftrace_graph_call
> > +       jalr    0(ra)
> >  #endif
> >         RESTORE_ABI
> > -       ret
> > +       jr      t0
> >  ENDPROC(ftrace_caller)
> >
> >  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > @@ -179,9 +197,9 @@ ENTRY(ftrace_regs_caller)
> >         REG_L   a1, PT_SIZE_ON_STACK(sp)
> >         mv      a3, sp
> >
> > -ftrace_regs_call:
> > -       .global ftrace_regs_call
> > -       call    ftrace_stub
> > +ftrace_regs_call_site:
> > +       REG_L   ra, ftrace_regs_call
> > +       jalr    0(ra)
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         addi    a0, sp, PT_RA
> > @@ -190,12 +208,33 @@ ftrace_regs_call:
> >  #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> >         mv      a2, s0
> >  #endif
> > -ftrace_graph_regs_call:
> > -       .global ftrace_graph_regs_call
> > -       call    ftrace_stub
> > +ftrace_graph_regs_call_site:
> > +       REG_L   ra, ftrace_graph_regs_call
> > +       jalr    0(ra)
> >  #endif
> >
> >         RESTORE_ALL
> > -       ret
> > +       jr      t0
> >  ENDPROC(ftrace_regs_caller)
> >  #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > +
> > +.align RISCV_LGPTR
> > +ftrace_call:
> > +       .global ftrace_call
> > +       RISCV_PTR ftrace_stub
> > +
> > +.align RISCV_LGPTR
> > +ftrace_graph_call:
> > +       .global ftrace_graph_call
> > +       RISCV_PTR ftrace_stub
> > +
> > +.align RISCV_LGPTR
> > +ftrace_regs_call:
> > +       .global ftrace_regs_call
> > +       RISCV_PTR ftrace_stub
> > +
> > +.align RISCV_LGPTR
> > +ftrace_graph_regs_call:
> > +       .global ftrace_graph_regs_call
> > +       RISCV_PTR ftrace_stub
> > +
> > --
> > 2.36.0
> >
>
>
> --
> Best Regards
>  Guo Ren



--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
@ 2022-09-15 13:53   ` Guo Ren
  2022-09-17  1:15     ` Andy Chiu
  0 siblings, 1 reply; 43+ messages in thread
From: Guo Ren @ 2022-09-15 13:53 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv

On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> We are introducing a new ftrace mechanism in order to phase out
> stop_machine() and enable kernel preemption. The new mechanism requires
> ftrace patchable function entries to be 24 bytes and aligned to 4 Byte
> boundaries.
>
> Before applying this patch, the size of the kernel code, with 122465
> ftrace entries, was 12.46 MB. Under the same configuration, the size
> has increased to 12.99 MB after applying this patch set.
>
> However, we found that -falign-functions alone was not strong enough to
> make all functions align as required. In fact, cold functions are not
> aligned after turning on optimizations. We consider this a bug in GCC
> and turn off guess-branch-probability as a workaround to align all
> functions.
Disabling the PGO static optimization would reduce code performance. I
think we need to fix that problem in GCC, since PGO is a default-enabled
optimization.

>
> GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index 3fa8ef336822..fd8069f59a59 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -11,7 +11,7 @@ LDFLAGS_vmlinux :=
>  ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>         LDFLAGS_vmlinux := --no-relax
>         KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=8
> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12  -falign-functions=4 -fno-guess-branch-probability
Another fixup should be covered, eg:

+ifeq ($(CONFIG_RISCV_ISA_C),y)
+       CC_FLAGS_FTRACE := -fpatchable-function-entry=12
+else
+       CC_FLAGS_FTRACE := -fpatchable-function-entry=6
+endif
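
(Presumably because with RVC the compiler pads with 2-byte c.nop
instructions, so 12 entries come to 24 bytes, while without C the same
24 bytes takes 6 full-size 4-byte nops.)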

>  endif
>
>  ifeq ($(CONFIG_CMODEL_MEDLOW),y)
> --
> 2.36.0
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-14 14:06   ` Guo Ren
@ 2022-09-16 23:54     ` Andy Chiu
  2022-09-17  0:22       ` Guo Ren
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2022-09-16 23:54 UTC (permalink / raw)
  To: Guo Ren
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv

Hi Guo,

Sorry for sending it again; I forgot to send in plain text in the last mail.

On Wed, Sep 14, 2022 at 3:06 PM Guo Ren <guoren@kernel.org> wrote:
>
> Is this patch related to this series?
>

This is related to dynamic code patching but not the mechanism of
"function tracer" itself. You are right, I should submit another patch
for that.

> On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> >
> > runtime code patching must be done at a naturally aligned address, or we
> > may execute on a partial instruction.
> If it's true, we can't use static branches at all. Have you
> encountered a problem?
>
> If you are right, arm64 ... csky all need the patch.
>
In fact, we have run into problems that traced back to static jump
functions during the test. We switched tracers randomly every 1~5
seconds on a dual-core QEMU setup and found the kernel stuck at a
static branch where it jumps to itself. The reason is that the static
branch was 2-byte but not 4-byte aligned. In that case, the kernel
patches the instruction, either J or NOP, with 2 half-word stores if
the machine does not have efficient unaligned accesses. Thus, there
exist moments where one half of the NOP is mixed with the other half
of the J while transitioning the branch. In our particular case, on a
little-endian machine, the upper half of the NOP was mixed with the
lower half of the J when enabling the branch, resulting in a jump that
jumped to itself. Going the other way, disabling the branch would
produce a HINT instruction, but that might not be observable.
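
To make the failure mode concrete, here is a minimal C sketch of the
torn word (the jal encoding below is only an example; any J whose bits
15:12 are zero tears the same way, assuming the low halfword of the new
instruction lands first):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t nop = 0x00000013;	/* addi x0, x0, 0 */
	uint32_t jmp = 0x0080006f;	/* jal  zero, +8  */

	/* Little-endian, two half-word stores: the low half of the new J
	 * has landed while the high half still holds the old NOP bits. */
	uint32_t torn = (nop & 0xffff0000) | (jmp & 0x0000ffff);

	/* prints 0x0000006f, i.e. "jal zero, 0": a jump to itself */
	printf("torn word = 0x%08x\n", torn);
	return 0;
}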

ARM64 does not have this problem since all instructions must be 4-byte aligned.

Regards,
Andy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-16 23:54     ` Andy Chiu
@ 2022-09-17  0:22       ` Guo Ren
  2022-09-17 18:17         ` [PATCH] riscv: jump_label: Optimize size with RISCV_ISA_C guoren
  2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
  0 siblings, 2 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-17  0:22 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv

On Sat, Sep 17, 2022 at 7:54 AM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> Hi Guo,
>
> Sorry for sending it again; I forgot to send in plain text in the last mail.
>
> On Wed, Sep 14, 2022 at 3:06 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > Is this patch related to this series?
> >
>
> This is related to dynamic code patching but not the mechanism of
> "function tracer" itself. You are right, I should submit another patch
> for that.
>
> > On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> > >
> > > runtime code patching must be done at a naturally aligned address, or we
> > > may execute on a partial instruction.
> > If it's true, we can't use static branches at all. Have you
> > encountered a problem?
> >
> > If you are right, arm64 ... csky all need the patch.
> >
> In fact, we have run into problems that traced back to static jump
> functions during the test. We switched tracers randomly every 1~5
> seconds on a dual-core QEMU setup and found the kernel stuck at a
> static branch where it jumps to itself. The reason is that the static
> branch was 2-byte but not 4-byte aligned. In that case, the kernel
> patches the instruction, either J or NOP, with 2 half-word stores if
> the machine does not have efficient unaligned accesses. Thus, there
> exist moments where one half of the NOP is mixed with the other half
> of the J while transitioning the branch. In our particular case, on a
> little-endian machine, the upper half of the NOP was mixed with the
> lower half of the J when enabling the branch, resulting in a jump that
> jumped to itself. Going the other way, disabling the branch would
> produce a HINT instruction, but that might not be observable.
How about limiting the static branch to 16-bit instructions? (nop16 - br16)

>
> ARM64 does not have this problem since all instructions must be 4-byte aligned.
But csky does (16+32). Thx.
>
> Regards,
> Andy



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption
  2022-09-14 13:45   ` Guo Ren
  2022-09-15 13:30     ` Guo Ren
@ 2022-09-17  1:04     ` Andy Chiu
  2022-09-17 10:56       ` Guo Ren
  1 sibling, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2022-09-17  1:04 UTC (permalink / raw)
  To: Guo Ren
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv,
	Jim Shu, Ruinland Tsai

Hi Guo,

On Wed, Sep 14, 2022 at 2:45 PM Guo Ren <guoren@kernel.org> wrote:
>
> I really appreciate you finding the bug, great job.

Thanks, :)

>
> On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:

Consider the case where this happens on a preemptive kernel using
stop_machine(), with all of stop_machine()'s sub-functions marked as
notrace.

> > p: patched area performed by dynamic ftrace
> > ftrace_prologue:
> > p|      REG_S   ra, -SZREG(sp)
> > p|      auipc   ra, 0x? ------------> preempted
> >                                         ...
> >                                 change ftrace function
> >                                         ...
> > p|      jalr    -?(ra) <------------- switched back
>
> When auipc + jalr -> nop, is safe, right? Because when switched back,
> jalr -> nop.
> When nop -> auipc + jalr, is buggy, right? Because when switched back,
> nop -> jalr, the ra's value is not expected.
>
> Some machines with instruction fusion won't be affected, because they
> would merge auipc + jalr into one macro-op.

This might not be safe either, if the auipc and jalr happened to sit
on different cache lines, and there were a cache hit for the line
holding the auipc but a miss for the jalr after switching back. I am
not really sure whether this is possible in practice.

> Qemu shouldn't be broken, because auipc + jalr is always in the same
> tcg block, so no chance for interruption between them.

In fact, qemu is broken. I had not thought of that before I got your
reply. But I believe there is a size limit for each tcg block, so the
auipc and jalr may end up in separate tcg blocks.

Regards,
Andy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size
  2022-09-15 13:53   ` Guo Ren
@ 2022-09-17  1:15     ` Andy Chiu
  0 siblings, 0 replies; 43+ messages in thread
From: Andy Chiu @ 2022-09-17  1:15 UTC (permalink / raw)
  To: Guo Ren
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv,
	Kito Cheng

On Thu, Sep 15, 2022 at 2:54 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> >
> > However, we found that -falign-functions alone was not strong enough to
> > make all functions align as required. In fact, cold functions are not
> > aligned after turning on optimizations. We consider this a bug in GCC
> > and turn off guess-branch-probability as a workaround to align all
> > functions.
> Disable pgo static optimization would reduce the code performance. I
> think we need to fix that problem in GCC, and pgo is a default-enabled
> optimization.
I would like to see this fixed in GCC as well. The other way is to
reserve extra bytes (say, 2 bytes) and pick an aligned address in
software.
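
Roughly something like this sketch (untested; ALIGN() is the generic
kernel helper, rec->ip as in the existing ftrace code, and the extra
padding itself would still have to come from the compiler flags):

	unsigned long entry = rec->ip;
	/* start patching at the first 4-byte boundary inside the
	 * over-sized, padded patchable area */
	unsigned long patch_addr = ALIGN(entry, 4);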
> >
> > GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
> >
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> >  arch/riscv/Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > index 3fa8ef336822..fd8069f59a59 100644
> > --- a/arch/riscv/Makefile
> > +++ b/arch/riscv/Makefile
> > @@ -11,7 +11,7 @@ LDFLAGS_vmlinux :=
> >  ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> >         LDFLAGS_vmlinux := --no-relax
> >         KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> > -       CC_FLAGS_FTRACE := -fpatchable-function-entry=8
> > +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12  -falign-functions=4 -fno-guess-branch-probability
> Another fixup should be covered, eg:
>
> +ifeq ($(CONFIG_RISCV_ISA_C),y)
> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12
> +else
> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6
> +endif
>
Thanks for the reminder. Yes, we have to do that.

Regards,
Andy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption
  2022-09-17  1:04     ` Andy Chiu
@ 2022-09-17 10:56       ` Guo Ren
  0 siblings, 0 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-17 10:56 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, kernel, linux-riscv,
	Jim Shu, Ruinland Tsai

On Sat, Sep 17, 2022 at 9:04 AM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> Hi Guo,
>
> On Wed, Sep 14, 2022 at 2:45 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > I really appreciate you finding the bug, great job.
>
> Thanks, :)
>
> >
> > On Tue, Sep 13, 2022 at 5:44 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> Consider the case where this happens on a preemptive kernel using
> stop_machine(), with all of stop_machine()'s sub-functions marked as
> notrace.
>
> > > p: patched area performed by dynamic ftrace
> > > ftrace_prologue:
> > > p|      REG_S   ra, -SZREG(sp)
> > > p|      auipc   ra, 0x? ------------> preempted
> > >                                         ...
> > >                                 change ftrace function
> > >                                         ...
> > > p|      jalr    -?(ra) <------------- switched back
> >
> > Patching auipc + jalr -> nop is safe, right? Because when the task is
> > switched back, the jalr has already become a nop.
> > Patching nop -> auipc + jalr is buggy, right? Because when the task is
> > switched back, the nop has become a jalr, and ra's value is not the
> > expected one.
> >
> > Some machines with instruction fusion won't be affected, because they
> > would merge auipc + jalr into one macro-op.
>
> This might not be safe either, if the auipc and jalr happened to sit
> on different cache lines, and there were a cache hit for the line
> holding the auipc but a miss for the jalr after switching back. I am
> not really sure whether this is possible in practice.
That would be a micro-arch bug; hardware should guarantee it. The IFU
either emits the macro-op only after getting both "auipc + <next insn>",
or emits them separately.

>
> > Qemu shouldn't be broken, because auipc + jalr is always in the same
> > tcg block, so no chance for interruption between them.
>
> In fact, qemu is broken. I had not thought of that before I got your
> reply. But I believe there is a size limit for each tcg block, so the
> auipc and jalr may end up in separate tcg blocks.
Yes, you are right. They could be located in different TCG blocks.

>
> Regards,
> Andy



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] riscv: jump_label: Optimize size with RISCV_ISA_C
  2022-09-17  0:22       ` Guo Ren
@ 2022-09-17 18:17         ` guoren
  2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
  1 sibling, 0 replies; 43+ messages in thread
From: guoren @ 2022-09-17 18:17 UTC (permalink / raw)
  To: guoren
  Cc: andy.chiu, aou, ardb, greentime.hu, jbaron, jpoimboe, kernel,
	linux-riscv, mingo, palmer, paul.walmsley, peterz, rostedt,
	zong.li, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Reduce size of static branch.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/jump_label.h | 17 ++++++++++-----
 arch/riscv/kernel/jump_label.c      | 32 +++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
index 38af2ec7b9bf..78f747dfa8a2 100644
--- a/arch/riscv/include/asm/jump_label.h
+++ b/arch/riscv/include/asm/jump_label.h
@@ -12,17 +12,21 @@
 #include <linux/types.h>
 #include <asm/asm.h>
 
+#ifdef CONFIG_RISCV_ISA_C
+#define JUMP_LABEL_NOP_SIZE 2
+#else
 #define JUMP_LABEL_NOP_SIZE 4
+#endif
 
 static __always_inline bool arch_static_branch(struct static_key *key,
 					       bool branch)
 {
 	asm_volatile_goto(
-		"	.option push				\n\t"
-		"	.option norelax				\n\t"
-		"	.option norvc				\n\t"
+#ifdef CONFIG_RISCV_ISA_C
+		"1:	c.nop					\n\t"
+#else
 		"1:	nop					\n\t"
-		"	.option pop				\n\t"
+#endif
 		"	.pushsection	__jump_table, \"aw\"	\n\t"
 		"	.align		" RISCV_LGPTR "		\n\t"
 		"	.long		1b - ., %l[label] - .	\n\t"
@@ -39,11 +43,14 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
 						    bool branch)
 {
 	asm_volatile_goto(
+#ifdef CONFIG_RISCV_ISA_C
+		"1:	c.j		%l[label]		\n\t"
+#else
 		"	.option push				\n\t"
 		"	.option norelax				\n\t"
-		"	.option norvc				\n\t"
 		"1:	jal		zero, %l[label]		\n\t"
 		"	.option pop				\n\t"
+#endif
 		"	.pushsection	__jump_table, \"aw\"	\n\t"
 		"	.align		" RISCV_LGPTR "		\n\t"
 		"	.long		1b - ., %l[label] - .	\n\t"
diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
index e6694759dbd0..64a4e5df093d 100644
--- a/arch/riscv/kernel/jump_label.c
+++ b/arch/riscv/kernel/jump_label.c
@@ -11,21 +11,52 @@
 #include <asm/bug.h>
 #include <asm/patch.h>
 
+#ifdef CONFIG_RISCV_ISA_C
+#define RISCV_INSN_C_NOP 0x0001U
+#define RISCV_INSN_C_JAL 0xa001U
+#else
 #define RISCV_INSN_NOP 0x00000013U
 #define RISCV_INSN_JAL 0x0000006fU
+#endif
 
 void arch_jump_label_transform(struct jump_entry *entry,
 			       enum jump_label_type type)
 {
 	void *addr = (void *)jump_entry_code(entry);
+#ifdef CONFIG_RISCV_ISA_C
+	u16 insn;
+#else
 	u32 insn;
+#endif
 
 	if (type == JUMP_LABEL_JMP) {
 		long offset = jump_entry_target(entry) - jump_entry_code(entry);
+#ifdef CONFIG_RISCV_ISA_C
+		if (WARN_ON(offset & 1 || offset < -2048 || offset >= 2048))
+			return;
 
+		/*
+		 * 001 | imm[11|4|9:8|10|6|7|3:1|5] 01 - C.JAL
+		 */
+		insn = RISCV_INSN_C_JAL |
+			(((u16)offset & GENMASK(5, 5)) >> (5 - 2)) |
+			(((u16)offset & GENMASK(3, 1)) << (3 - 1)) |
+			(((u16)offset & GENMASK(7, 7)) >> (7 - 6)) |
+			(((u16)offset & GENMASK(6, 6)) << (7 - 6)) |
+			(((u16)offset & GENMASK(10, 10)) >> (10 - 8)) |
+			(((u16)offset & GENMASK(9, 8)) << (9 - 8)) |
+			(((u16)offset & GENMASK(4, 4)) << (11 - 4)) |
+			(((u16)offset & GENMASK(11, 11)) << (12 - 11));
+	} else {
+		insn = RISCV_INSN_C_NOP;
+	}
+#else
 		if (WARN_ON(offset & 1 || offset < -524288 || offset >= 524288))
 			return;
 
+		/*
+		 * imm[20|10:1|11|19:12] | rd | 1101111 - JAL
+		 */
 		insn = RISCV_INSN_JAL |
 			(((u32)offset & GENMASK(19, 12)) << (12 - 12)) |
 			(((u32)offset & GENMASK(11, 11)) << (20 - 11)) |
@@ -34,6 +65,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	} else {
 		insn = RISCV_INSN_NOP;
 	}
+#endif
 
 	mutex_lock(&text_mutex);
 	patch_text_nosync(addr, &insn, sizeof(insn));
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-17  0:22       ` Guo Ren
  2022-09-17 18:17         ` [PATCH] riscv: jump_label: Optimize size with RISCV_ISA_C guoren
@ 2022-09-17 18:38         ` guoren
  2022-09-17 23:49           ` Guo Ren
                             ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: guoren @ 2022-09-17 18:38 UTC (permalink / raw)
  To: guoren
  Cc: andy.chiu, aou, ardb, greentime.hu, jbaron, jpoimboe, kernel,
	linux-riscv, mingo, palmer, paul.walmsley, peterz, rostedt,
	zong.li, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Reduce size of static branch.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/jump_label.h | 17 ++++++++++-----
 arch/riscv/kernel/jump_label.c      | 32 +++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
index 38af2ec7b9bf..78f747dfa8a2 100644
--- a/arch/riscv/include/asm/jump_label.h
+++ b/arch/riscv/include/asm/jump_label.h
@@ -12,17 +12,21 @@
 #include <linux/types.h>
 #include <asm/asm.h>
 
+#ifdef CONFIG_RISCV_ISA_C
+#define JUMP_LABEL_NOP_SIZE 2
+#else
 #define JUMP_LABEL_NOP_SIZE 4
+#endif
 
 static __always_inline bool arch_static_branch(struct static_key *key,
 					       bool branch)
 {
 	asm_volatile_goto(
-		"	.option push				\n\t"
-		"	.option norelax				\n\t"
-		"	.option norvc				\n\t"
+#ifdef CONFIG_RISCV_ISA_C
+		"1:	c.nop					\n\t"
+#else
 		"1:	nop					\n\t"
-		"	.option pop				\n\t"
+#endif
 		"	.pushsection	__jump_table, \"aw\"	\n\t"
 		"	.align		" RISCV_LGPTR "		\n\t"
 		"	.long		1b - ., %l[label] - .	\n\t"
@@ -39,11 +43,14 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
 						    bool branch)
 {
 	asm_volatile_goto(
+#ifdef CONFIG_RISCV_ISA_C
+		"1:	c.j		%l[label]		\n\t"
+#else
 		"	.option push				\n\t"
 		"	.option norelax				\n\t"
-		"	.option norvc				\n\t"
 		"1:	jal		zero, %l[label]		\n\t"
 		"	.option pop				\n\t"
+#endif
 		"	.pushsection	__jump_table, \"aw\"	\n\t"
 		"	.align		" RISCV_LGPTR "		\n\t"
 		"	.long		1b - ., %l[label] - .	\n\t"
diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
index e6694759dbd0..64a4e5df093d 100644
--- a/arch/riscv/kernel/jump_label.c
+++ b/arch/riscv/kernel/jump_label.c
@@ -11,21 +11,52 @@
 #include <asm/bug.h>
 #include <asm/patch.h>
 
+#ifdef CONFIG_RISCV_ISA_C
+#define RISCV_INSN_C_NOP 0x0001U
+#define RISCV_INSN_C_JAL 0xa001U
+#else
 #define RISCV_INSN_NOP 0x00000013U
 #define RISCV_INSN_JAL 0x0000006fU
+#endif
 
 void arch_jump_label_transform(struct jump_entry *entry,
 			       enum jump_label_type type)
 {
 	void *addr = (void *)jump_entry_code(entry);
+#ifdef CONFIG_RISCV_ISA_C
+	u16 insn;
+#else
 	u32 insn;
+#endif
 
 	if (type == JUMP_LABEL_JMP) {
 		long offset = jump_entry_target(entry) - jump_entry_code(entry);
+#ifdef CONFIG_RISCV_ISA_C
+		if (WARN_ON(offset & 1 || offset < -2048 || offset >= 2048))
+			return;
 
+		/*
+		 * 001 | imm[11|4|9:8|10|6|7|3:1|5] 01 - C.JAL
+		 */
+		insn = RISCV_INSN_C_JAL |
+			(((u16)offset & GENMASK(5, 5)) >> (5 - 2)) |
+			(((u16)offset & GENMASK(3, 1)) << (3 - 1)) |
+			(((u16)offset & GENMASK(7, 7)) >> (7 - 6)) |
+			(((u16)offset & GENMASK(6, 6)) << (7 - 6)) |
+			(((u16)offset & GENMASK(10, 10)) >> (10 - 8)) |
+			(((u16)offset & GENMASK(9, 8)) << (9 - 8)) |
+			(((u16)offset & GENMASK(4, 4)) << (11 - 4)) |
+			(((u16)offset & GENMASK(11, 11)) << (12 - 11));
+	} else {
+		insn = RISCV_INSN_C_NOP;
+	}
+#else
 		if (WARN_ON(offset & 1 || offset < -524288 || offset >= 524288))
 			return;
 
+		/*
+		 * imm[20|10:1|11|19:12] | rd | 1101111 - JAL
+		 */
 		insn = RISCV_INSN_JAL |
 			(((u32)offset & GENMASK(19, 12)) << (12 - 12)) |
 			(((u32)offset & GENMASK(11, 11)) << (20 - 11)) |
@@ -34,6 +65,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	} else {
 		insn = RISCV_INSN_NOP;
 	}
+#endif
 
 	mutex_lock(&text_mutex);
 	patch_text_nosync(addr, &insn, sizeof(insn));
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
@ 2022-09-17 23:49           ` Guo Ren
  2022-09-17 23:59           ` Guo Ren
  2022-09-18  0:12           ` Jessica Clarke
  2 siblings, 0 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-17 23:49 UTC (permalink / raw)
  To: guoren
  Cc: andy.chiu, aou, ardb, greentime.hu, jbaron, jpoimboe, kernel,
	linux-riscv, mingo, palmer, paul.walmsley, peterz, rostedt,
	zong.li, Guo Ren

On Sun, Sep 18, 2022 at 2:39 AM <guoren@kernel.org> wrote:
>
> From: Guo Ren <guoren@linux.alibaba.com>
>
> Reduce size of static branch.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  arch/riscv/include/asm/jump_label.h | 17 ++++++++++-----
>  arch/riscv/kernel/jump_label.c      | 32 +++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> index 38af2ec7b9bf..78f747dfa8a2 100644
> --- a/arch/riscv/include/asm/jump_label.h
> +++ b/arch/riscv/include/asm/jump_label.h
> @@ -12,17 +12,21 @@
>  #include <linux/types.h>
>  #include <asm/asm.h>
>
> +#ifdef CONFIG_RISCV_ISA_C
> +#define JUMP_LABEL_NOP_SIZE 2
> +#else
>  #define JUMP_LABEL_NOP_SIZE 4
> +#endif
>
>  static __always_inline bool arch_static_branch(struct static_key *key,
>                                                bool branch)
>  {
>         asm_volatile_goto(
> -               "       .option push                            \n\t"
> -               "       .option norelax                         \n\t"
> -               "       .option norvc                           \n\t"
> +#ifdef CONFIG_RISCV_ISA_C
> +               "1:     c.nop                                   \n\t"
> +#else
>                 "1:     nop                                     \n\t"
> -               "       .option pop                             \n\t"
> +#endif
>                 "       .pushsection    __jump_table, \"aw\"    \n\t"
>                 "       .align          " RISCV_LGPTR "         \n\t"
>                 "       .long           1b - ., %l[label] - .   \n\t"
> @@ -39,11 +43,14 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
>                                                     bool branch)
>  {
>         asm_volatile_goto(
> +#ifdef CONFIG_RISCV_ISA_C
> +               "1:     c.j             %l[label]               \n\t"
> +#else
>                 "       .option push                            \n\t"
>                 "       .option norelax                         \n\t"
> -               "       .option norvc                           \n\t"
>                 "1:     jal             zero, %l[label]         \n\t"
>                 "       .option pop                             \n\t"
> +#endif
>                 "       .pushsection    __jump_table, \"aw\"    \n\t"
>                 "       .align          " RISCV_LGPTR "         \n\t"
>                 "       .long           1b - ., %l[label] - .   \n\t"
> diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
> index e6694759dbd0..64a4e5df093d 100644
> --- a/arch/riscv/kernel/jump_label.c
> +++ b/arch/riscv/kernel/jump_label.c
> @@ -11,21 +11,52 @@
>  #include <asm/bug.h>
>  #include <asm/patch.h>
>
> +#ifdef CONFIG_RISCV_ISA_C
> +#define RISCV_INSN_C_NOP 0x0001U
> +#define RISCV_INSN_C_JAL 0xa001U
> +#else
>  #define RISCV_INSN_NOP 0x00000013U
>  #define RISCV_INSN_JAL 0x0000006fU
> +#endif
>
>  void arch_jump_label_transform(struct jump_entry *entry,
>                                enum jump_label_type type)
>  {
>         void *addr = (void *)jump_entry_code(entry);
> +#ifdef CONFIG_RISCV_ISA_C
> +       u16 insn;
> +#else
>         u32 insn;
> +#endif
>
>         if (type == JUMP_LABEL_JMP) {
>                 long offset = jump_entry_target(entry) - jump_entry_code(entry);
> +#ifdef CONFIG_RISCV_ISA_C
> +               if (WARN_ON(offset & 1 || offset < -2048 || offset >= 2048))
> +                       return;
>
> +               /*
> +                * 001 | imm[11|4|9:8|10|6|7|3:1|5] 01 - C.JAL
> +                */
> +               insn = RISCV_INSN_C_JAL |
> +                       (((u16)offset & GENMASK(5, 5)) >> (5 - 2)) |
> +                       (((u16)offset & GENMASK(3, 1)) << (3 - 1)) |
> +                       (((u16)offset & GENMASK(7, 7)) >> (7 - 6)) |
> +                       (((u16)offset & GENMASK(6, 6)) << (7 - 6)) |
> +                       (((u16)offset & GENMASK(10, 10)) >> (10 - 8)) |
> +                       (((u16)offset & GENMASK(9, 8)) << (9 - 8)) |
> +                       (((u16)offset & GENMASK(4, 4)) << (11 - 4)) |
> +                       (((u16)offset & GENMASK(11, 11)) << (12 - 11));
> +       } else {
> +               insn = RISCV_INSN_C_NOP;
> +       }
> +#else
>                 if (WARN_ON(offset & 1 || offset < -524288 || offset >= 524288))
To unify the two paths, it should also be:
if (WARN_ON(offset & 1 || offset < -2048 || offset >= 2048))



>                         return;
>
> +               /*
> +                * imm[20|10:1|11|19:12] | rd | 1101111 - JAL
> +                */
>                 insn = RISCV_INSN_JAL |
>                         (((u32)offset & GENMASK(19, 12)) << (12 - 12)) |
>                         (((u32)offset & GENMASK(11, 11)) << (20 - 11)) |
> @@ -34,6 +65,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
>         } else {
>                 insn = RISCV_INSN_NOP;
>         }
> +#endif
>
>         mutex_lock(&text_mutex);
>         patch_text_nosync(addr, &insn, sizeof(insn));
> --
> 2.36.1
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
  2022-09-17 23:49           ` Guo Ren
@ 2022-09-17 23:59           ` Guo Ren
  2022-09-18  0:12           ` Jessica Clarke
  2 siblings, 0 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-17 23:59 UTC (permalink / raw)
  To: guoren
  Cc: andy.chiu, aou, ardb, greentime.hu, jbaron, jpoimboe, kernel,
	linux-riscv, mingo, palmer, paul.walmsley, peterz, rostedt,
	zong.li, Guo Ren

On Sun, Sep 18, 2022 at 2:39 AM <guoren@kernel.org> wrote:
>
> From: Guo Ren <guoren@linux.alibaba.com>
>
> Reduce size of static branch.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  arch/riscv/include/asm/jump_label.h | 17 ++++++++++-----
>  arch/riscv/kernel/jump_label.c      | 32 +++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> index 38af2ec7b9bf..78f747dfa8a2 100644
> --- a/arch/riscv/include/asm/jump_label.h
> +++ b/arch/riscv/include/asm/jump_label.h
> @@ -12,17 +12,21 @@
>  #include <linux/types.h>
>  #include <asm/asm.h>
>
> +#ifdef CONFIG_RISCV_ISA_C
> +#define JUMP_LABEL_NOP_SIZE 2
> +#else
>  #define JUMP_LABEL_NOP_SIZE 4
> +#endif
>
>  static __always_inline bool arch_static_branch(struct static_key *key,
>                                                bool branch)
>  {
>         asm_volatile_goto(
> -               "       .option push                            \n\t"
> -               "       .option norelax                         \n\t"
> -               "       .option norvc                           \n\t"
> +#ifdef CONFIG_RISCV_ISA_C
> +               "1:     c.nop                                   \n\t"
> +#else
>                 "1:     nop                                     \n\t"
> -               "       .option pop                             \n\t"
> +#endif
>                 "       .pushsection    __jump_table, \"aw\"    \n\t"
>                 "       .align          " RISCV_LGPTR "         \n\t"
>                 "       .long           1b - ., %l[label] - .   \n\t"
We should check the range at compile time here. Run-time checking is
the last and most painful way to catch such errors.

> @@ -39,11 +43,14 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
>                                                     bool branch)
>  {
>         asm_volatile_goto(
> +#ifdef CONFIG_RISCV_ISA_C
> +               "1:     c.j             %l[label]               \n\t"
> +#else
>                 "       .option push                            \n\t"
>                 "       .option norelax                         \n\t"
> -               "       .option norvc                           \n\t"
>                 "1:     jal             zero, %l[label]         \n\t"
>                 "       .option pop                             \n\t"
> +#endif
>                 "       .pushsection    __jump_table, \"aw\"    \n\t"
>                 "       .align          " RISCV_LGPTR "         \n\t"
>                 "       .long           1b - ., %l[label] - .   \n\t"
> diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
> index e6694759dbd0..64a4e5df093d 100644
> --- a/arch/riscv/kernel/jump_label.c
> +++ b/arch/riscv/kernel/jump_label.c
> @@ -11,21 +11,52 @@
>  #include <asm/bug.h>
>  #include <asm/patch.h>
>
> +#ifdef CONFIG_RISCV_ISA_C
> +#define RISCV_INSN_C_NOP 0x0001U
> +#define RISCV_INSN_C_JAL 0xa001U
> +#else
>  #define RISCV_INSN_NOP 0x00000013U
>  #define RISCV_INSN_JAL 0x0000006fU
> +#endif
>
>  void arch_jump_label_transform(struct jump_entry *entry,
>                                enum jump_label_type type)
>  {
>         void *addr = (void *)jump_entry_code(entry);
> +#ifdef CONFIG_RISCV_ISA_C
> +       u16 insn;
> +#else
>         u32 insn;
> +#endif
>
>         if (type == JUMP_LABEL_JMP) {
>                 long offset = jump_entry_target(entry) - jump_entry_code(entry);
> +#ifdef CONFIG_RISCV_ISA_C
> +               if (WARN_ON(offset & 1 || offset < -2048 || offset >= 2048))
> +                       return;
>
> +               /*
> +                * 001 | imm[11|4|9:8|10|6|7|3:1|5] 01 - C.JAL
> +                */
> +               insn = RISCV_INSN_C_JAL |
> +                       (((u16)offset & GENMASK(5, 5)) >> (5 - 2)) |
> +                       (((u16)offset & GENMASK(3, 1)) << (3 - 1)) |
> +                       (((u16)offset & GENMASK(7, 7)) >> (7 - 6)) |
> +                       (((u16)offset & GENMASK(6, 6)) << (7 - 6)) |
> +                       (((u16)offset & GENMASK(10, 10)) >> (10 - 8)) |
> +                       (((u16)offset & GENMASK(9, 8)) << (9 - 8)) |
> +                       (((u16)offset & GENMASK(4, 4)) << (11 - 4)) |
> +                       (((u16)offset & GENMASK(11, 11)) << (12 - 11));
> +       } else {
> +               insn = RISCV_INSN_C_NOP;
> +       }
> +#else
>                 if (WARN_ON(offset & 1 || offset < -524288 || offset >= 524288))
For consistency, it should also be:
                   if (WARN_ON(offset & 1 || offset < -2048 || offset >= 2048))

After some tests, 2048 is enough for the current RISC-V Linux kernel.

>                         return;
>
> +               /*
> +                * imm[20|10:1|11|19:12] | rd | 1101111 - JAL
> +                */
>                 insn = RISCV_INSN_JAL |
>                         (((u32)offset & GENMASK(19, 12)) << (12 - 12)) |
>                         (((u32)offset & GENMASK(11, 11)) << (20 - 11)) |
> @@ -34,6 +65,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
>         } else {
>                 insn = RISCV_INSN_NOP;
>         }
> +#endif
>
>         mutex_lock(&text_mutex);
>         patch_text_nosync(addr, &insn, sizeof(insn));
> --
> 2.36.1
>


-- 
Best Regards
 Guo Ren


* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
  2022-09-17 23:49           ` Guo Ren
  2022-09-17 23:59           ` Guo Ren
@ 2022-09-18  0:12           ` Jessica Clarke
  2022-09-18  0:46             ` Guo Ren
  2 siblings, 1 reply; 43+ messages in thread
From: Jessica Clarke @ 2022-09-18  0:12 UTC (permalink / raw)
  To: guoren
  Cc: Andy Chiu, Albert Ou, Ard Biesheuvel, Greentime Hu, jbaron,
	jpoimboe, kernel, linux-riscv, Ingo Molnar, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, rostedt, Zong Li, Guo Ren

On 17 Sept 2022, at 19:38, guoren@kernel.org wrote:
> 
> From: Guo Ren <guoren@linux.alibaba.com>
> 
> Reduce size of static branch.
> 
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
> arch/riscv/include/asm/jump_label.h | 17 ++++++++++-----
> arch/riscv/kernel/jump_label.c      | 32 +++++++++++++++++++++++++++++
> 2 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> index 38af2ec7b9bf..78f747dfa8a2 100644
> --- a/arch/riscv/include/asm/jump_label.h
> +++ b/arch/riscv/include/asm/jump_label.h
> @@ -12,17 +12,21 @@
> #include <linux/types.h>
> #include <asm/asm.h>
> 
> +#ifdef CONFIG_RISCV_ISA_C
> +#define JUMP_LABEL_NOP_SIZE 2
> +#else
> #define JUMP_LABEL_NOP_SIZE 4
> +#endif
> 
> static __always_inline bool arch_static_branch(struct static_key *key,
> 					       bool branch)
> {
> 	asm_volatile_goto(
> -		"	.option push				\n\t"
> -		"	.option norelax				\n\t"
> -		"	.option norvc				\n\t"
> +#ifdef CONFIG_RISCV_ISA_C
> +		"1:	c.nop					\n\t"
> +#else
> 		"1:	nop					\n\t"
> -		"	.option pop				\n\t"
> +#endif
> 		"	.pushsection	__jump_table, \"aw\"	\n\t"
> 		"	.align		" RISCV_LGPTR "		\n\t"
> 		"	.long		1b - ., %l[label] - .	\n\t"
> @@ -39,11 +43,14 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> 						    bool branch)
> {
> 	asm_volatile_goto(
> +#ifdef CONFIG_RISCV_ISA_C
> +		"1:	c.j		%l[label]		\n\t"
> +#else
> 		"	.option push				\n\t"
> 		"	.option norelax				\n\t"
> -		"	.option norvc				\n\t"
> 		"1:	jal		zero, %l[label]		\n\t"
> 		"	.option pop				\n\t"
> +#endif
> 		"	.pushsection	__jump_table, \"aw\"	\n\t"
> 		"	.align		" RISCV_LGPTR "		\n\t"
> 		"	.long		1b - ., %l[label] - .	\n\t"
> diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
> index e6694759dbd0..64a4e5df093d 100644
> --- a/arch/riscv/kernel/jump_label.c
> +++ b/arch/riscv/kernel/jump_label.c
> @@ -11,21 +11,52 @@
> #include <asm/bug.h>
> #include <asm/patch.h>
> 
> +#ifdef CONFIG_RISCV_ISA_C
> +#define RISCV_INSN_C_NOP 0x0001U
> +#define RISCV_INSN_C_JAL 0xa001U

This is C.J (i.e. JAL X0) not C.JAL (i.e. JAL RA, which is RV32-only
and not what you want since it clobbers RA).

Jess

> [...]

* Re: [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function
  2022-09-18  0:12           ` Jessica Clarke
@ 2022-09-18  0:46             ` Guo Ren
  0 siblings, 0 replies; 43+ messages in thread
From: Guo Ren @ 2022-09-18  0:46 UTC (permalink / raw)
  To: Jessica Clarke
  Cc: Andy Chiu, Albert Ou, Ard Biesheuvel, Greentime Hu, jbaron,
	jpoimboe, kernel, linux-riscv, Ingo Molnar, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, rostedt, Zong Li, Guo Ren

On Sun, Sep 18, 2022 at 8:12 AM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>
> On 17 Sept 2022, at 19:38, guoren@kernel.org wrote:
> >
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > Reduce size of static branch.
> >
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > ---
> > arch/riscv/include/asm/jump_label.h | 17 ++++++++++-----
> > arch/riscv/kernel/jump_label.c      | 32 +++++++++++++++++++++++++++++
> > 2 files changed, 44 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
> > index 38af2ec7b9bf..78f747dfa8a2 100644
> > --- a/arch/riscv/include/asm/jump_label.h
> > +++ b/arch/riscv/include/asm/jump_label.h
> > @@ -12,17 +12,21 @@
> > #include <linux/types.h>
> > #include <asm/asm.h>
> >
> > +#ifdef CONFIG_RISCV_ISA_C
> > +#define JUMP_LABEL_NOP_SIZE 2
> > +#else
> > #define JUMP_LABEL_NOP_SIZE 4
> > +#endif
> >
> > static __always_inline bool arch_static_branch(struct static_key *key,
> >                                              bool branch)
> > {
> >       asm_volatile_goto(
> > -             "       .option push                            \n\t"
> > -             "       .option norelax                         \n\t"
> > -             "       .option norvc                           \n\t"
> > +#ifdef CONFIG_RISCV_ISA_C
> > +             "1:     c.nop                                   \n\t"
> > +#else
> >               "1:     nop                                     \n\t"
> > -             "       .option pop                             \n\t"
> > +#endif
> >               "       .pushsection    __jump_table, \"aw\"    \n\t"
> >               "       .align          " RISCV_LGPTR "         \n\t"
> >               "       .long           1b - ., %l[label] - .   \n\t"
> > @@ -39,11 +43,14 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> >                                                   bool branch)
> > {
> >       asm_volatile_goto(
> > +#ifdef CONFIG_RISCV_ISA_C
> > +             "1:     c.j             %l[label]               \n\t"
> > +#else
> >               "       .option push                            \n\t"
> >               "       .option norelax                         \n\t"
> > -             "       .option norvc                           \n\t"
> >               "1:     jal             zero, %l[label]         \n\t"
> >               "       .option pop                             \n\t"
> > +#endif
> >               "       .pushsection    __jump_table, \"aw\"    \n\t"
> >               "       .align          " RISCV_LGPTR "         \n\t"
> >               "       .long           1b - ., %l[label] - .   \n\t"
> > diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
> > index e6694759dbd0..64a4e5df093d 100644
> > --- a/arch/riscv/kernel/jump_label.c
> > +++ b/arch/riscv/kernel/jump_label.c
> > @@ -11,21 +11,52 @@
> > #include <asm/bug.h>
> > #include <asm/patch.h>
> >
> > +#ifdef CONFIG_RISCV_ISA_C
> > +#define RISCV_INSN_C_NOP 0x0001U
> > +#define RISCV_INSN_C_JAL 0xa001U
>
> This is C.J (i.e. JAL X0) not C.JAL (i.e. JAL RA, which is RV32-only
> and not what you want since it clobbers RA).

Sorry for the macro naming bug. I know I'm using c.j.


101 imm[11|4|9:8|10|6|7|3:1|5] 01 C.J

001 imm[11|4|9:8|10|6|7|3:1|5] 01 C.JAL (RV32)

Here is the correction:
-#define RISCV_INSN_C_JAL 0xa001U
+#define RISCV_INSN_C_J   0xa001U
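
For reference, here is a minimal user-space sketch that round-trips the
imm[11|4|9:8|10|6|7|3:1|5] shuffling quoted above (cj_encode()/cj_decode()
are made-up names; this is only a sanity check of the encoding, not kernel
code):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define RISCV_INSN_C_J 0xa001U		/* funct3 = 101, op = 01 */

/* Pack a signed, even, 12-bit offset into insn bits [12:2];
 * same masks and shifts as the patch above. */
static uint16_t cj_encode(int32_t offset)
{
	uint16_t imm = (uint16_t)offset;

	return RISCV_INSN_C_J |
		((imm & 0x020) >> 3) |	/* imm[5]   -> bit 2     */
		((imm & 0x00e) << 2) |	/* imm[3:1] -> bits 5:3  */
		((imm & 0x080) >> 1) |	/* imm[7]   -> bit 6     */
		((imm & 0x040) << 1) |	/* imm[6]   -> bit 7     */
		((imm & 0x400) >> 2) |	/* imm[10]  -> bit 8     */
		((imm & 0x300) << 1) |	/* imm[9:8] -> bits 10:9 */
		((imm & 0x010) << 7) |	/* imm[4]   -> bit 11    */
		((imm & 0x800) << 1);	/* imm[11]  -> bit 12    */
}

/* Inverse mapping, used only to check the encoder. */
static int32_t cj_decode(uint16_t insn)
{
	uint32_t imm =
		((insn << 3) & 0x020) |
		((insn >> 2) & 0x00e) |
		((insn << 1) & 0x080) |
		((insn >> 1) & 0x040) |
		((insn << 2) & 0x400) |
		((insn >> 1) & 0x300) |
		((insn >> 7) & 0x010) |
		((insn >> 1) & 0x800);

	/* sign-extend the 12-bit immediate */
	return (int32_t)imm - (int32_t)((imm & 0x800) << 1);
}

int main(void)
{
	int32_t off;

	/* c.j reaches even offsets in [-2048, 2046] */
	for (off = -2048; off <= 2046; off += 2)
		assert(cj_decode(cj_encode(off)) == off);

	printf("c.j immediate round-trip OK, e.g. offset -4 -> 0x%04x\n",
	       cj_encode(-4));
	return 0;
}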



>
> Jess
>
> [...]


-- 
Best Regards
 Guo Ren


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
                   ` (4 preceding siblings ...)
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
@ 2024-02-13 19:42 ` Evgenii Shatokhin
  2024-02-21  5:27   ` Andy Chiu
  5 siblings, 1 reply; 43+ messages in thread
From: Evgenii Shatokhin @ 2024-02-13 19:42 UTC (permalink / raw)
  To: Andy Chiu, palmer, paul.walmsley, aou, rostedt, mingo, peterz,
	jpoimboe, jbaron, ardb
  Cc: greentime.hu, zong.li, guoren, Jessica Clarke, kernel,
	linux-riscv, linux

Hi,

On 13.09.2022 12:42, Andy Chiu wrote:
> This patch removes dependency of dynamic ftrace from calling
> stop_machine(), and makes it compatiable with kernel preemption.
> Originally, we ran into stack corruptions, or execution of partially
> updated instructions when starting or stopping ftrace on a fully
> preemptible kernel configuration. The reason is that kernel periodically
> calls rcu_momentary_dyntick_idle() on cores waiting for the code-patching
> core running in ftrace. Though rcu_momentary_dyntick_idle() itself is
> marked as notrace, it would call a bunch of tracable functions if we
> configured the kernel as preemptible. For example, these are some functions
> that happened to have a symbol and have not been marked as notrace on a
> RISC-V preemptible kernel compiled with GCC-11:
>   - __rcu_report_exp_rnp()
>   - rcu_report_exp_cpu_mult()
>   - rcu_preempt_deferred_qs()
>   - rcu_preempt_need_deferred_qs()
>   - rcu_preempt_deferred_qs_irqrestore()
> 
> Thus, this make it not ideal for us to rely on stop_machine() and
> handly marked "notrace"s to perform runtime code patching. To remove
> such dependency, we must make updates of code seemed atomic on running
> cores. This might not be obvious for RISC-V since it usaually uses a pair
> of AUIPC + JALR to perform a long jump, which cannot be modified and
> executed concurrently if we consider preemptions. As such, this patch
> proposed a way to make it possible. It embeds a 32-bit rel-address data
> into instructions of each ftrace prologue and jumps indirectly. In this
> way, we could store and load the address atomically so that the code
> patching core could run simutaneously with the rest of running cores.
> 
> After applying the patchset, we compiled a preemptible kernel with all
> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
> machine. The kernel could boot up successfully, passing all ftrace
> testsuits. Besides, we ran a script that randomly pick a tracer on every
> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
> contrast, a preemptible kernel without our patch would panic in few
> rounds on the same machine.
> 
> Though we ran into errors when using hwlat or irqsoff tracers together
> with cpu-online stressor from stress-ng on a preemptible kernel. We
> believe the reason may be that  percpu workers of the tracers are being
> queued into unbounded workqueue when cpu get offlined and patches will go
> through tracing tree.
> 
> Additionally, we found patching of tracepoints unsafe since the
> instructions being patched are not naturally aligned. This may result in
> 2 half-word stores, which breaks atomicity, during the code patching.
> 
> changes in patch v2:
>   - Enforce alignments on all functions with a compiler workaround.
>   - Support 64bit addressing for ftrace targets if xlen == 64
>   - Initialize ftrace target addresses to avoid calling bad address in a
>     hypothesized case.
>   - Use LGPTR instead of SZPTR since .align is log-scaled for
>     mcount-dyn.S
>   - Require the nop instruction of all jump_labels aligns naturally on
>     4B.
> 
> Andy Chiu (5):
>    riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
>      size
>    riscv: export patch_insn_write
>    riscv: ftrace: use indirect jump to work with kernel preemption
>    riscv: ftrace: do not use stop_machine to update code
>    riscv: align arch_static_branch function
> 
>   arch/riscv/Makefile                 |   2 +-
>   arch/riscv/include/asm/ftrace.h     |  24 ----
>   arch/riscv/include/asm/jump_label.h |   2 +
>   arch/riscv/include/asm/patch.h      |   1 +
>   arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
>   arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
>   arch/riscv/kernel/patch.c           |   4 +-
>   7 files changed, 188 insertions(+), 93 deletions(-)
> 

First of all, thank you for working on making dynamic Ftrace robust in 
preemptible kernels on RISC-V.
It is an important use case but, for now, dynamic Ftrace and related 
tracers cannot be safely used with such kernels.

Are there any updates on this series?
It needs a rebase, of course, but it looks doable.

If I understand the discussion correctly, the only blocker was that 
using "-falign-functions" was not enough to properly align cold 
functions and "-fno-guess-branch-probability" would likely have a 
performance cost.

It seems GCC developers have recently provided a workaround for that 
(https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326, 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).

"-fmin-function-alignment" should help, but I do not know which GCC 
versions have that patch already. In the meantime, one could probably 
check whether the compiler supports "-fmin-function-alignment" and use 
it if it does.
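
For instance, a minimal sketch of such a check (my own illustration, not 
tested; it assumes the kernel's cc-option helper, and CC_FLAGS_FNALIGN is 
just a made-up variable name):
------------------
# Prefer the new flag if the compiler supports it; otherwise fall back
# to the weaker -falign-functions.
CC_FLAGS_FNALIGN := $(call cc-option,-fmin-function-alignment=4,-falign-functions=4)

ifeq ($(CONFIG_RISCV_ISA_C),y)
	CC_FLAGS_FTRACE := -fpatchable-function-entry=12 $(CC_FLAGS_FNALIGN)
else
	CC_FLAGS_FTRACE := -fpatchable-function-entry=6 $(CC_FLAGS_FNALIGN)
endif
------------------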

Thoughts?

Regards,
Evgenii


* Re: [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption
  2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
  2022-09-14 13:45   ` Guo Ren
@ 2024-02-20 14:17   ` Evgenii Shatokhin
  1 sibling, 0 replies; 43+ messages in thread
From: Evgenii Shatokhin @ 2024-02-20 14:17 UTC (permalink / raw)
  To: Andy Chiu
  Cc: greentime.hu, zong.li, guoren, kernel, linux-riscv, palmer,
	paul.walmsley, aou, rostedt, mingo, jpoimboe, jbaron, peterz,
	ardb, linux

Hi,

On 13.09.2022 12:42, Andy Chiu wrote:
> In RISCV, we must use an AUIPC + JALR pair to encode an immediate,
> forming a jump that jumps to an address over 4K. This may cause errors
> if we want to enable kernel preemption and remove dependency from
> patching code with stop_machine(). For example, if a task was switched
> out on auipc. And, if we changed the ftrace function before it was
> switched back, then it would jump to an address that has updated 11:0
> bits mixing with previous XLEN:12 part.
> 
> p: patched area performed by dynamic ftrace
> ftrace_prologue:
> p|	REG_S	ra, -SZREG(sp)
> p|	auipc	ra, 0x? ------------> preempted
> 					...
> 				change ftrace function
> 					...
> p|	jalr	-?(ra) <------------- switched back
> p|	REG_L	ra, -SZREG(sp)
> func:
> 	xxx
> 	ret
> 
> To prevent such condition, we proposed a way to load or store target
> addresses atomically. We store a 8 bytes aligned full-width absolute
> address into each ftrace prologue and use a jump at front to decide
> whether we should take ftrace detour. To reduce footprint of ftrace
> prologues, we clobber t0, and we move ra (re-)storing into
> ftrace_{regs_}caller. This is similar to ARM64, which also clobbers x9 at
> each prologue.
> 
> Also, we initialize the target at startup to take care of a case where
> REG_L happened before the update of the ftrace target.
> 
> .align 2  # if it happen to be 8B-aligned
> ftrace_prologue:
> p|	{j	func} | {auipc	t0}
> 	j	ftrace_cont
> p|	.dword	0x? <=== storing the address to a 8B aligned space can be
> 			 considered atomic to read sides using REG_L
> ftrace_cont:
> 	REG_L	t0, 8(t0) <=== read side
> 	jalr	t0, t0
> func:
> 	xxx
> 	ret
> 
> .align 2  # if it is 4B but not 8B-aligned
> ftrace_prologue:
> p|	{j	func} | {auipc	t0}
> 	REG_L	t0, 0xc(t0) <=== read side
> 	j	ftrace_cont
> p|	.dword	0x? <=== the target address
> ftrace_cont:
> 	jalr	t0, t0
> func:
> 	xxx
> 	ret
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>   arch/riscv/include/asm/ftrace.h |  24 -----
>   arch/riscv/kernel/ftrace.c      | 173 ++++++++++++++++++++++----------
>   arch/riscv/kernel/mcount-dyn.S  |  69 ++++++++++---
>   3 files changed, 176 insertions(+), 90 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index 04dad3380041..eaa611e491fc 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -47,30 +47,6 @@ struct dyn_arch_ftrace {
>    */
>   
>   #define MCOUNT_ADDR		((unsigned long)MCOUNT_NAME)
> -#define JALR_SIGN_MASK		(0x00000800)
> -#define JALR_OFFSET_MASK	(0x00000fff)
> -#define AUIPC_OFFSET_MASK	(0xfffff000)
> -#define AUIPC_PAD		(0x00001000)
> -#define JALR_SHIFT		20
> -#define JALR_BASIC		(0x000080e7)
> -#define AUIPC_BASIC		(0x00000097)
> -#define NOP4			(0x00000013)
> -
> -#define make_call(caller, callee, call)					\
> -do {									\
> -	call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -	\
> -				(unsigned long)caller));		\
> -	call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -	\
> -			       (unsigned long)caller));			\
> -} while (0)
> -
> -#define to_jalr_insn(offset)						\
> -	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
> -
> -#define to_auipc_insn(offset)						\
> -	((offset & JALR_SIGN_MASK) ?					\
> -	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :	\
> -	((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
>   
>   /*
>    * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 2086f6585773..84b9e280dd1f 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -23,31 +23,29 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
>   }
>   
>   static int ftrace_check_current_call(unsigned long hook_pos,
> -				     unsigned int *expected)
> +				     unsigned long expected_addr)
>   {
> -	unsigned int replaced[2];
> -	unsigned int nops[2] = {NOP4, NOP4};
> +	unsigned long replaced;
>   
> -	/* we expect nops at the hook position */
> -	if (!expected)
> -		expected = nops;
> +	/* we expect ftrace_stub at the hook position */
> +	if (!expected_addr)
> +		expected_addr = (unsigned long) ftrace_stub;
>   
>   	/*
>   	 * Read the text we want to modify;
>   	 * return must be -EFAULT on read error
>   	 */
> -	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> -			MCOUNT_INSN_SIZE))
> +	if (copy_from_kernel_nofault(&replaced, (void *)hook_pos,
> +			(sizeof(unsigned long))))
>   		return -EFAULT;
>   
>   	/*
>   	 * Make sure it is what we expect it to be;
>   	 * return must be -EINVAL on failed comparison
>   	 */
> -	if (memcmp(expected, replaced, sizeof(replaced))) {
> -		pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
> -		       (void *)hook_pos, expected[0], expected[1], replaced[0],
> -		       replaced[1]);
> +	if (expected_addr != replaced) {
> +		pr_err("%p: expected (%016lx) but got (%016lx)\n",
> +		       (void *)hook_pos, expected_addr, replaced);
>   		return -EINVAL;
>   	}
>   
> @@ -57,55 +55,96 @@ static int ftrace_check_current_call(unsigned long hook_pos,
>   static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
>   				bool enable)
>   {
> -	unsigned int call[2];
> -	unsigned int nops[2] = {NOP4, NOP4};
> +	unsigned long call = target;
> +	unsigned long nops = (unsigned long)ftrace_stub;
>   
> -	make_call(hook_pos, target, call);
> -
> -	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> +	/* Replace the target address at once. Return -EPERM on write error. */
>   	if (patch_text_nosync
> -	    ((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> +	    ((void *)hook_pos, enable ? &call : &nops, sizeof(unsigned long)))
>   		return -EPERM;
>   
>   	return 0;
>   }
>   
>   /*
> - * Put 5 instructions with 16 bytes at the front of function within
> - * patchable function entry nops' area.
> - *
> - * 0: REG_S  ra, -SZREG(sp)
> - * 1: auipc  ra, 0x?
> - * 2: jalr   -?(ra)
> - * 3: REG_L  ra, -SZREG(sp)
> + * Place 4 instructions and a destination address in the patchable function
> + * entry.
>    *
>    * So the opcodes is:
> - * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
> - * 1: 0x???????? -> auipc
> - * 2: 0x???????? -> jalr
> - * 3: 0xff813083 (ld)/0xffc12083 (lw)
> + * INSN_SKIPALL  : J     PC + 0x18 (when disabled, jump to the function)
> + * INSN_AUIPC    : AUIPC T0, 0 (when enabled, load address of trampoline)
> + * INSN_LOAD(off): REG_L T0, off(T0) (load address stored in the tramp)
> + * INSN_SKIPTRAMP: J     PC + 0x10 (skip tramp since they are not instructions)
> + * INSN_JALR     : JALR  T0, T0 (jump to the destination)
> + *
> + * At runtime, we want to patch the jump target atomically in order to work with
> + * kernel preemption. If we patched with a pair of AUIPC + JALR and a task was
> + * preempted after loading upper bits with AUIPC. Then things would mess up if
> + * we updated the jump target before the task were switched back.
> + *
> + * We also want to align all patchable function entries to 4-byte boundaries and,
> + * the jump target to a 8 Bytes aligned address so that each of them could be
> + * natually updated and observed by patching and running cores.
> + *
> + * To make sure target addresses are 8-byte aligned, we have to consider
> + * following scenarios:
> + *
> + * First if the starting address of the patchable entry is aligned to an 8-byte
> + * boundary:
> + * | ADDR   | COMPILED | DISABLED         | ENABLED                |
> + * +--------+----------+------------------+------------------------+
> + * | 0x00   | NOP      | J     FUNC       | AUIPC T0, 0            |
> + * | 0x04   | NOP      | J     0x10                                |
> + * | 0x08   | NOP      | 8-byte aligned target address (low)       |
> + * | 0x0C   | NOP      |                               (high)      |
> + * | 0x10   | NOP      | REG_L T0, 8(T0)                           |
> + * | 0x14   | NOP      | JALR  T0, T0                              |
> + * | FUNC   | X                                                    |
> + *
> + * If not, then it starts at a 4- but not 8-byte aligned address. In such cases,
> + * We re-arrange code and the trampoline in order to natually align it.
> + * | ADDR   | COMPILED | DISABLED         | ENABLED                |
> + * +--------+----------+------------------+------------------------+
> + * | 0x04   | NOP      | J     FUNC       | AUIPC T0, 0            |
> + * | 0x08   | NOP      | REG_L T0, 0xC(T0)                         |
> + * | 0x0C   | NOP      | J     0x18                                |
> + * | 0x10   | NOP      | 8-byte aligned target address (low)       |
> + * | 0x14   | NOP      |                               (high)      |
> + * | 0x18   | NOP      | JALR  T0, T0                              |
> + * | FUNC   | X                                                    |
>    */
> +
>   #if __riscv_xlen == 64
> -#define INSN0	0xfe113c23
> -#define INSN3	0xff813083
> -#elif __riscv_xlen == 32
> -#define INSN0	0xfe112e23
> -#define INSN3	0xffc12083
> +#define INSN_LD_T0_OFF(off) ((0x2b283) | ((off) << 20))
> +# elif __riscv_xlen == 32
> +#define INSN_LD_T0_OFF(off) ((0x2a283) | ((off) << 20))
>   #endif
>   
> -#define FUNC_ENTRY_SIZE	16
> -#define FUNC_ENTRY_JMP	4
> +#define INSN_SKIPALL	0x0180006f
> +#define INSN_AUIPC	0x00000297
> +#define INSN_LOAD(off)	INSN_LD_T0_OFF(off)
> +#define INSN_SKIPTRAMP	0x00c0006f
> +#define INSN_JALR	0x000282e7
> +#define INSN_SIZE	4
>   
>   int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>   {
> -	unsigned int call[4] = {INSN0, 0, 0, INSN3};
> -	unsigned long target = addr;
> -	unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> -
> -	call[1] = to_auipc_insn((unsigned int)(target - caller));
> -	call[2] = to_jalr_insn((unsigned int)(target - caller));
> +	unsigned int call[1] = {INSN_AUIPC};
> +	void *tramp;
> +	unsigned long patch_addr = rec->ip;
> +
> +	if (IS_ALIGNED(patch_addr, 8)) {
> +		tramp = (void *) (patch_addr + 0x8);
> +	} else if (IS_ALIGNED(patch_addr, 4)) {
> +		tramp = (void *) (patch_addr + 0xc);
> +	} else {
> +		pr_warn("cannot patch: function must be 4-Byte or 8-Byte aligned\n");
> +		return -EINVAL;
> +	}
> +	WARN_ON(!IS_ALIGNED((unsigned long)tramp, 8));
> +	patch_insn_write(tramp, &addr, sizeof(unsigned long));
>   
> -	if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
> +	if (patch_text_nosync((void *)patch_addr, &call, INSN_SIZE))
>   		return -EPERM;
>   
>   	return 0;
> @@ -114,14 +153,49 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>   int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>   		    unsigned long addr)
>   {
> -	unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
> +	unsigned int nops[1] = {INSN_SKIPALL};
> +	unsigned long patch_addr = rec->ip;
>   
> -	if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
> +	if (patch_text_nosync((void *)patch_addr, nops, INSN_SIZE))
>   		return -EPERM;
>   
>   	return 0;
>   }
>   
> +extern void ftrace_no_caller(void);
> +static void ftrace_make_default_tramp(unsigned int *tramp)
> +{
> +	*((unsigned long *)tramp) = (unsigned long) &ftrace_no_caller;
> +}
> +
> +int __ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec,
> +		    unsigned long addr)
> +{
> +	unsigned int nops[6];
> +	unsigned int *tramp;
> +	unsigned long patch_addr = rec->ip;
> +
> +	nops[0] = INSN_SKIPALL;
> +	if (IS_ALIGNED(patch_addr, 8)) {
> +		nops[1] = INSN_SKIPTRAMP;
> +		nops[4] = INSN_LOAD(0x8);
> +		tramp = &nops[2];
> +	} else if (IS_ALIGNED(patch_addr, 4)) {
> +		nops[1] = INSN_LOAD(0xc);
> +		nops[2] = INSN_SKIPTRAMP;
> +		tramp = &nops[3];
> +	} else {
> +		pr_warn("start address must be 4-Byte aligned\n");
> +		return -EINVAL;
> +	}
> +	ftrace_make_default_tramp(tramp);
> +	nops[5] = INSN_JALR;
> +
> +	if (patch_text_nosync((void *)patch_addr, nops, sizeof(nops)))
> +		return -EPERM;
> +
> +	return 0;
> +}
>   
>   /*
>    * This is called early on, and isn't wrapped by
> @@ -135,7 +209,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>   	int out;
>   
>   	ftrace_arch_code_modify_prepare();
> -	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> +	out = __ftrace_init_nop(mod, rec, MCOUNT_ADDR);
>   	ftrace_arch_code_modify_post_process();
>   
>   	return out;
> @@ -158,17 +232,14 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>   int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>   		       unsigned long addr)
>   {
> -	unsigned int call[2];
> -	unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
>   	int ret;
>   
> -	make_call(caller, old_addr, call);
> -	ret = ftrace_check_current_call(caller, call);
> +	ret = ftrace_check_current_call(rec->ip, old_addr);

I suppose the location where the handler's address is stored should be 
used here, rather than the start address of the function: rec->ip + 8 
or rec->ip + 12, depending on the alignment of the function.

Without this fix, the boottime function graph test fails as follows:

[   39.360356] Testing tracer function_graph:
[   44.648661] (____ptrval____): expected (ffffffff8000d838) but got (00c0006f00000297)
[   44.649805] ------------[ ftrace bug ]------------
[   44.650102] ftrace failed to modify
[   44.650135] [<ffffffff800fdba0>] trace_selftest_dynamic_test_func+0x0/0x28
[   44.650966]  actual:   97:02:00:00:6f:00:c0:00
[   44.651684] Updating ftrace call site to call a different ftrace function
[   44.652366] ftrace record flags: e1180002
[   44.652798]  (2) R
                 expected tramp: ffffffff8000d73c

The byte sequence 97:02:00:00:6f:00:c0:00 is the pair of instructions 
"auipc t0,0x0; j 0x10" rather than the address of the Ftrace trampoline.

>   
>   	if (ret)
>   		return ret;
>   
> -	return __ftrace_modify_call(caller, addr, true);
> +	return __ftrace_modify_call(rec->ip, addr, true);

Same here.

I am currently testing a rebased version of the patchset with kernel 
6.8.0-rc4.

With the following fix the issue goes away and all boottime tests of the 
tracers pass in QEMU:
-----------------------

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index ad9eebfad0d5..cc89b0927622 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -142,20 +142,30 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
  #define INSN_JALR	0x000282e7
  #define INSN_SIZE	4

-int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+static inline unsigned long ftrace_tramp_addr(struct dyn_ftrace *rec)
  {
-	unsigned int call[1] = {INSN_AUIPC};
-	void *tramp;
  	unsigned long patch_addr = rec->ip;

  	if (IS_ALIGNED(patch_addr, 8)) {
-		tramp = (void *) (patch_addr + 0x8);
+		return patch_addr + 0x8;
  	} else if (IS_ALIGNED(patch_addr, 4)) {
-		tramp = (void *) (patch_addr + 0xc);
+		return patch_addr + 0xc;
  	} else {
-		pr_warn("cannot patch: function must be 4-Byte or 8-Byte aligned\n");
-		return -EINVAL;
+		pr_warn("cannot patch: start of the function must be 4-byte aligned\n");
+		return 0;
  	}
+}
+
+int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+	unsigned int call[1] = {INSN_AUIPC};
+	void *tramp;
+	unsigned long patch_addr = rec->ip;
+
+	tramp = (void *)ftrace_tramp_addr(rec);
+	if (!tramp)
+		return -EINVAL;
+
  	WARN_ON(!IS_ALIGNED((unsigned long)tramp, 8));
  	patch_insn_write(tramp, &addr, sizeof(unsigned long));

@@ -248,13 +257,18 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
  		       unsigned long addr)
  {
  	int ret;
+	unsigned long tramp;
+
+	tramp = ftrace_tramp_addr(rec);
+	if (!tramp)
+		return -EINVAL;

-	ret = ftrace_check_current_call(rec->ip, old_addr);
+	ret = ftrace_check_current_call(tramp, old_addr);

  	if (ret)
  		return ret;

-	return __ftrace_modify_call(rec->ip, addr, true);
+	return __ftrace_modify_call(tramp, addr, true);
  }
  #endif

-----------------------

>   }
>   #endif
>   
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index d171eca623b6..f8ee63e4314b 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -13,7 +13,7 @@
>   
>   	.text
>   
> -#define FENTRY_RA_OFFSET	12
> +#define FENTRY_RA_OFFSET	24
>   #define ABI_SIZE_ON_STACK	72
>   #define ABI_A0			0
>   #define ABI_A1			8
> @@ -25,7 +25,12 @@
>   #define ABI_A7			56
>   #define ABI_RA			64
>   
> +# t0 points to return of ftrace
> +# ra points to the return address of traced function
> +
>   	.macro SAVE_ABI
> +	REG_S	ra, -SZREG(sp)
> +	mv	ra, t0
>   	addi	sp, sp, -SZREG
>   	addi	sp, sp, -ABI_SIZE_ON_STACK
>   
> @@ -53,10 +58,14 @@
>   
>   	addi	sp, sp, ABI_SIZE_ON_STACK
>   	addi	sp, sp, SZREG
> +	mv	t0, ra
> +	REG_L	ra, -SZREG(sp)
>   	.endm
>   
>   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>   	.macro SAVE_ALL
> +	REG_S	ra, -SZREG(sp)
> +	mv	ra, t0
>   	addi	sp, sp, -SZREG
>   	addi	sp, sp, -PT_SIZE_ON_STACK
>   
> @@ -138,9 +147,18 @@
>   
>   	addi	sp, sp, PT_SIZE_ON_STACK
>   	addi	sp, sp, SZREG
> +	mv	t0, ra # t0 is equal to ra here
> +	REG_L	ra, -SZREG(sp)
>   	.endm
>   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
>   
> +# perform a full fence before re-running the ftrae entry if we run into this
> +ENTRY(ftrace_no_caller)
> +	fence	rw, rw
> +	fence.i
> +	jr	-FENTRY_RA_OFFSET(t0)
> +ENDPROC(ftrace_no_caller)
> +
>   ENTRY(ftrace_caller)
>   	SAVE_ABI
>   
> @@ -150,9 +168,9 @@ ENTRY(ftrace_caller)
>   	REG_L	a1, ABI_SIZE_ON_STACK(sp)
>   	mv	a3, sp
>   
> -ftrace_call:
> -	.global ftrace_call
> -	call	ftrace_stub
> +ftrace_call_site:
> +	REG_L	ra, ftrace_call
> +	jalr	0(ra)
>   
>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>   	addi	a0, sp, ABI_SIZE_ON_STACK
> @@ -161,12 +179,12 @@ ftrace_call:
>   #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
>   	mv	a2, s0
>   #endif
> -ftrace_graph_call:
> -	.global ftrace_graph_call
> -	call	ftrace_stub
> +ftrace_graph_call_site:
> +	REG_L	ra, ftrace_graph_call
> +	jalr	0(ra)
>   #endif
>   	RESTORE_ABI
> -	ret
> +	jr	t0
>   ENDPROC(ftrace_caller)
>   
>   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> @@ -179,9 +197,9 @@ ENTRY(ftrace_regs_caller)
>   	REG_L	a1, PT_SIZE_ON_STACK(sp)
>   	mv	a3, sp
>   
> -ftrace_regs_call:
> -	.global ftrace_regs_call
> -	call	ftrace_stub
> +ftrace_regs_call_site:
> +	REG_L	ra, ftrace_regs_call
> +	jalr	0(ra)
>   
>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>   	addi	a0, sp, PT_RA
> @@ -190,12 +208,33 @@ ftrace_regs_call:
>   #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
>   	mv	a2, s0
>   #endif
> -ftrace_graph_regs_call:
> -	.global ftrace_graph_regs_call
> -	call	ftrace_stub
> +ftrace_graph_regs_call_site:
> +	REG_L	ra, ftrace_graph_regs_call
> +	jalr	0(ra)
>   #endif
>   
>   	RESTORE_ALL
> -	ret
> +	jr	t0
>   ENDPROC(ftrace_regs_caller)
>   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +
> +.align RISCV_LGPTR
> +ftrace_call:
> +	.global ftrace_call
> +	RISCV_PTR ftrace_stub
> +
> +.align RISCV_LGPTR
> +ftrace_graph_call:
> +	.global ftrace_graph_call
> +	RISCV_PTR ftrace_stub
> +
> +.align RISCV_LGPTR
> +ftrace_regs_call:
> +	.global ftrace_regs_call
> +	RISCV_PTR ftrace_stub
> +
> +.align RISCV_LGPTR
> +ftrace_graph_regs_call:
> +	.global ftrace_graph_regs_call
> +	RISCV_PTR ftrace_stub
> +

Regards,
Evgenii


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-02-13 19:42 ` [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Evgenii Shatokhin
@ 2024-02-21  5:27   ` Andy Chiu
  2024-02-21 16:55     ` Evgenii Shatokhin
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2024-02-21  5:27 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>
> Hi,
>
> On 13.09.2022 12:42, Andy Chiu wrote:
> > [...]
>
> First of all, thank you for working on making dynamic Ftrace robust in
> preemptible kernels on RISC-V.
> It is an important use case but, for now, dynamic Ftrace and related
> tracers cannot be safely used with such kernels.
>
> Are there any updates on this series?
> It needs a rebase, of course, but it looks doable.
>
> If I understand the discussion correctly, the only blocker was that
> using "-falign-functions" was not enough to properly align cold
> functions and "-fno-guess-branch-probability" would likely have a
> performance cost.
>
> It seems GCC developers have recently provided a workaround for that
> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
>
> "-fmin-function-alignment" should help, but I do not know which GCC
> versions have that patch already. In the meantime, one could probably
> check whether the compiler supports "-fmin-function-alignment" and use
> it if it does.
>
> Thoughts?

Hi Evgenii,

Thanks for the update. Indeed, it is essential for this patch that the
toolchain provide forced alignment. We can test the flag in the
Makefile to sort out whether the toolchain supports it. Meanwhile, I
had figured out a way to make this work on any 2-byte-aligned address,
but I have not implemented it yet; basically, it would require more
patching space so we can do the alignment in software. I would opt for
the dedicated toolchain flag if the toolchain supports it.

Let me take some time to look and get back to you soon.

>
> Regards,
> Evgenii

Regards,
Andy


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-02-21  5:27   ` Andy Chiu
@ 2024-02-21 16:55     ` Evgenii Shatokhin
  2024-03-06 20:57       ` Alexandre Ghiti
  2024-03-18 15:31       ` Andy Chiu
  0 siblings, 2 replies; 43+ messages in thread
From: Evgenii Shatokhin @ 2024-02-21 16:55 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

On 21.02.2024 08:27, Andy Chiu wrote:
> 
> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>>
>> Hi,
>>
>> On 13.09.2022 12:42, Andy Chiu wrote:
>>> [...]
>>
>> First of all, thank you for working on making dynamic Ftrace robust in
>> preemptible kernels on RISC-V.
>> It is an important use case but, for now, dynamic Ftrace and related
>> tracers cannot be safely used with such kernels.
>>
>> Are there any updates on this series?
>> It needs a rebase, of course, but it looks doable.
>>
>> If I understand the discussion correctly, the only blocker was that
>> using "-falign-functions" was not enough to properly align cold
>> functions and "-fno-guess-branch-probability" would likely have a
>> performance cost.
>>
>> It seems GCC developers have recently provided a workaround for that
>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
>>
>> "-fmin-function-alignment" should help, but I do not know which GCC
>> versions have that patch already. In the meantime, one could probably
>> check whether the compiler supports "-fmin-function-alignment" and use
>> it if it does.
>>
>> Thoughts?
> 
> Hi Evgenii,
> 
> Thanks for the update. Indeed, it is essential for this patch that the
> toolchain provide forced alignment. We can test the flag in the
> Makefile to sort out whether the toolchain supports it. Meanwhile, I
> had figured out a way to make this work on any 2-byte-aligned address,
> but I have not implemented it yet; basically, it would require more
> patching space so we can do the alignment in software. I would opt for
> the dedicated toolchain flag if the toolchain supports it.
> 
> Let me take some time to look and get back to you soon.

Thank you! Looking forward to it.

In case it helps, here is what I have checked so far.

1.
I added the patch 
https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326 
to the current revision of GCC 13.2.0 from the RISC-V toolchain.

Rebased your patchset on top of Linux 6.8-rc4 (mostly context changes, 
SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).

Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling 
preemption").

Switched from -falign-functions=4 to -fmin-function-alignment=4:
------------------
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index b33b787c8b07..dcd0adeebaae 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
  	LDFLAGS_vmlinux += --no-relax
  	KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
  ifeq ($(CONFIG_RISCV_ISA_C),y)
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -fmin-function-alignment=4
  else
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
  endif
  endif

------------------

As far as I can see from objdump, the functions that were not aligned 
at a 4-byte boundary with -falign-functions=4 are now aligned correctly 
with -fmin-function-alignment=4.

2.
I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".

The boottime tests for Ftrace passed, except the tests for 
function_graph. I described the failure and the possible fix here:
https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/

3.
There were also boottime warnings about "RCU not on for: 
arch_cpu_idle+0x0/0x2c". These are probably not related to your 
patchset, but rather to the fact that Ftrace is enabled in a 
preemptible kernel where RCU does different things.

As a workaround, I disabled tracing of arch_cpu_idle() for now:
------------------
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 92922dbd5b5c..6abeecbfc51d 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);

  extern asmlinkage void ret_from_fork(void);

-void arch_cpu_idle(void)
+void noinstr arch_cpu_idle(void)
  {
  	cpu_do_idle();
  }

------------------

4.
Stress-testing revealed an issue though, which I do not understand yet.

Probably similar to what you did earlier, I ran a script that switched 
the current tracer to "function", "function_graph", "nop", "blk" each 
1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
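
Roughly, the switching loop looked like this (assuming tracefs is 
mounted at /sys/kernel/tracing):
------------------
#!/bin/bash
# Cycle through the tracers, staying on each one for 1-5 seconds.
cd /sys/kernel/tracing || exit 1
while true; do
    for t in function function_graph nop blk; do
        echo "$t" > current_tracer
        sleep $(( (RANDOM % 5) + 1 ))
    done
done
------------------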

The kernel usually crashed within a few minutes, in seemingly random 
locations, but often in one of two ways:

(a) Invalid instruction, because the address of the ftrace_caller 
function was somehow written into the body of the traced function rather 
than just into the Ftrace prologue.

In the following example, the crash happened at 0xffffffff800d3398. 
"b0 d7" is actually not part of the code here, but rather the two lower 
bytes of 0xffffffff8000d7b0, the address of ftrace_caller() in this 
kernel (the value is stored little-endian, so it shows up in the dump as 
the byte sequence b0 d7 00 80 ff ff ff ff).

(gdb) disas /r 0xffffffff800d3382,+0x20
Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
...
   0xffffffff800d3394 <clockevents_program_event+144>:  ba 87  mv      a5,a4
   0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf  j       0xffffffff800d3366 <clockevents_program_event+98>
   0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7  sw      a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller()
   0xffffffff800d339a <clockevents_program_event+150>:  00 80  .2byte  0x8000
   0xffffffff800d339c <clockevents_program_event+152>:  ff ff  .2byte  0xffff
   0xffffffff800d339e <clockevents_program_event+154>:  ff ff  .2byte  0xffff
   0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf  j       0xffffffff800d3394 <clockevents_program_event+144>

The backtrace usually contains one or more occurrences of 
return_to_handler() in this case.

[  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
[  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
[  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
[  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
[  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
[  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
[  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
[  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
[  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
----------------------

(b) Jump to an invalid location, e.g. to the middle of a valid 4-byte 
instruction. In such cases, %ra usually points right after the last 
instruction of return_to_handler(), "jalr a2", so the jump was likely 
made from there.

The problem is reproducible, although I have not found what causes it yet.

Any help is appreciated, of course.

> 
>>
>> Regards,
>> Evgenii
> 
> Regards,
> Andy



* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-02-21 16:55     ` Evgenii Shatokhin
@ 2024-03-06 20:57       ` Alexandre Ghiti
  2024-03-07  8:35         ` Evgenii Shatokhin
  2024-03-07 12:27         ` Andy Chiu
  2024-03-18 15:31       ` Andy Chiu
  1 sibling, 2 replies; 43+ messages in thread
From: Alexandre Ghiti @ 2024-03-06 20:57 UTC (permalink / raw)
  To: Evgenii Shatokhin, Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

Hi Evgenii,

On 21/02/2024 17:55, Evgenii Shatokhin wrote:
> [...]
>
> 3.
> There were also boottime warnings about "RCU not on for:
> arch_cpu_idle+0x0/0x2c". These are probably not related to your
> patchset, but rather to the fact that Ftrace is enabled in a
> preemptible kernel where RCU does different things.
>
> As a workaround, I disabled tracing of arch_cpu_idle() for now:
> ------------------
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 92922dbd5b5c..6abeecbfc51d 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
>
>  extern asmlinkage void ret_from_fork(void);
>
> -void arch_cpu_idle(void)
> +void noinstr arch_cpu_idle(void)
>  {
>      cpu_do_idle();
>  }


I came up with the same fix for this, based on a similar fix for s390. I 
have a patch ready and will send it soon since, to me, it is a fix, not 
a workaround.

Thanks,

Alex



* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-06 20:57       ` Alexandre Ghiti
@ 2024-03-07  8:35         ` Evgenii Shatokhin
  2024-03-07 12:27         ` Andy Chiu
  1 sibling, 0 replies; 43+ messages in thread
From: Evgenii Shatokhin @ 2024-03-07  8:35 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andy Chiu, palmer, paul.walmsley, aou, rostedt, mingo, peterz,
	jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux

Hi Alexandre,

On 06.03.2024 23:57, Alexandre Ghiti wrote:
> Hi Evgenii,
>
> [...]
>
> I came up with the same fix for this, based on a similar fix for s390. I
> have a patch ready and will send it soon since, to me, it is a fix, not
> a workaround.
> 
> Thanks,
> 
> Alex

Great! Thank you. That is very good news.

By the way, have you tried switching dynamic tracers like "function", 
"function_graph", etc. while the system is under pressure, on a kernel 
with this patchset?

I am using 'stress-ng --hrtimers 1', and memory corruption still happens 
within a few minutes each time. I described the issue earlier.

It seems as if the address of ftrace_caller is sometimes written to the 
wrong location when enabling the "function" or "function_graph" tracer. 
Perhaps a barrier is missing somewhere.


Regards,
Evgenii



* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-06 20:57       ` Alexandre Ghiti
  2024-03-07  8:35         ` Evgenii Shatokhin
@ 2024-03-07 12:27         ` Andy Chiu
  2024-03-07 13:21           ` Alexandre Ghiti
  1 sibling, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2024-03-07 12:27 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux

Hi Alex,

On Thu, Mar 7, 2024 at 4:57 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
> Hi Evgenii,
>
> [...]
>
> I came up with the same fix for this, based on a similar fix for s390. I
> have a patch ready and will send it soon since, to me, it is a fix, not
> a workaround.

Just making sure we aren't duplicating work. Are you also working on
getting rid of stop_machine() while patching ftrace entries, or only on
a patch to fix the issue in arch_cpu_idle()? I was just about to
restart my patchset for the first purpose. In case I missed anything,
could you point me to the patchset if it's already on the ML?


Thanks,
Andy


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-07 12:27         ` Andy Chiu
@ 2024-03-07 13:21           ` Alexandre Ghiti
  2024-03-07 15:57             ` Samuel Holland
  0 siblings, 1 reply; 43+ messages in thread
From: Alexandre Ghiti @ 2024-03-07 13:21 UTC (permalink / raw)
  To: Andy Chiu
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux

Hi Andy,

On 07/03/2024 13:27, Andy Chiu wrote:
> Hi Alex,
>
> On Thu, Mar 7, 2024 at 4:57 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> [...]
>>
>> I came up with the same fix for this, based on a similar fix for s390. I
>> have a patch ready and will send it soon since, to me, it is a fix, not
>> a workaround.
> Just making sure we aren't duplicating work. Are you also working on
> getting rid of stop_machine() while patching ftrace entries, or only on
> a patch to fix the issue in arch_cpu_idle()? I was just about to
> restart my patchset for the first purpose. In case I missed anything,
> could you point me to the patchset if it's already on the ML?


I'm currently trying to fix ftrace because I noticed that the ftrace 
kselftests triggered a lot of panics and warnings. For now, I have only 
fixed this one ^.

But TBH, I have started thinking about the issue your patch is trying to 
deal with. IIUC, you're trying to avoid traps (or silent errors) that 
could happen because of concurrent accesses while an auipc/jalr pair is 
being patched.

I'm wondering if, instead, we could actually handle the potential traps: 
before storing the auipc + jalr pair, we could use a well-identified 
trapping instruction that could be recognized in the trap handler as a 
legitimate trap. For example:


auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
jalr        XXXX        XXXX       jalr       jalr


If a core traps on a XXXX instruction, we know this address is being 
patched, so we can simply return; by the time the core retries, the 
patching will probably be over. We could also identify a half-patched 
instruction word (I mean, one with only XX).

But please let me know if that's completely stupid or I did not 
understand the problem. Since my patchset to support Svvptc, I have been 
wondering whether it is not more performant to actually take very 
unlikely traps instead of trying to avoid them.
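
To make that concrete, here is a minimal sketch of the handler side. 
Everything below is an assumption rather than code from the series: the 
choice of ebreak as the XXXX placeholder, the helper name, and where it 
would hook into the trap path.
------------------
#include <linux/types.h>
#include <asm/ptrace.h>

/* ebreak could serve as the well-identified trapping instruction. */
#define PATCH_PLACEHOLDER_INSN	0x00100073U	/* ebreak */

/*
 * Hypothetical check, called from the trap handler: if the word at epc
 * is the placeholder (or still has the placeholder's lower half, for a
 * half-patched word), the site is being patched right now, so the
 * handler can just return to the same pc and retry.
 */
static bool patch_in_progress_trap(struct pt_regs *regs)
{
	u32 insn = *(u32 *)regs->epc;

	return insn == PATCH_PLACEHOLDER_INSN ||
	       (u16)insn == (u16)PATCH_PLACEHOLDER_INSN;
}
------------------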

Thanks,

Alex


>> Thanks,
>>
>> Alex
>>
>>
>>> ------------------
>>>
>>> 4.
>>> Stress-testing revealed an issue though, which I do not understand yet.
>>>
>>> Probably similar to what you did earlier, I ran a script that switched
>>> the current tracer to "function", "function_graph", "nop", "blk" each
>>> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
>>>
>>> The kernel usually crashed within a few minutes, in seemingly random
>>> locations, but often in one of two ways:
>>>
>>> (a) Invalid instruction, because the address of ftrace_caller function
>>> was somehow written to the body of the traced function rather than
>>> just to the Ftrace prologue.
>>>
>>> In the following example, the crash happened at 0xffffffff800d3398.
>>> "b0 d7" is actually not part of the code here, but rather the lower
>>> bytes of 0xffffffff8000d7b0, the address of ftrace_caller() in this
>>> kernel.
>>>
>>> (gdb) disas /r 0xffffffff800d3382,+0x20
>>> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
>>> ...
>>>     0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv a5,a4
>>>     0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j
>>> 0xffffffff800d3366 <clockevents_program_event+98>
>>>     0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw
>>> a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
>>>     0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte
>>> 0x8000
>>>     0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte
>>> 0xffff
>>>     0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte
>>> 0xffff
>>>     0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j
>>> 0xffffffff800d3394 <clockevents_program_event+144
>>>
>>> The backtrace usually contains one or more occurrences of
>>> return_to_handler() in this case.
>>>
>>> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
>>> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
>>> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
>>> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
>>> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
>>> ----------------------
>>>
>>> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
>>> instruction. %ra usually points right after the last instruction,
>>> "jalr   a2", in return_to_handler() in such cases, so the jump was
>>> likely made from there.
>>>
>>> The problem is reproducible, although I have not found what causes it
>>> yet.
>>>
>>> Any help is appreciated, of course.
>>>
>>>>> Regards,
>>>>> Evgenii
>>>> Regards,
>>>> Andy
>>>
> Thanks,
> Andy
>


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-07 13:21           ` Alexandre Ghiti
@ 2024-03-07 15:57             ` Samuel Holland
  2024-03-11 14:24               ` Andy Chiu
  0 siblings, 1 reply; 43+ messages in thread
From: Samuel Holland @ 2024-03-07 15:57 UTC (permalink / raw)
  To: Alexandre Ghiti, Andy Chiu
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux

Hi Alex,

On 2024-03-07 7:21 AM, Alexandre Ghiti wrote:
> But TBH, I have started thinking about the issue your patch is trying to deal
> with. IIUC you're trying to avoid traps (or silent errors) that could happen
> because of concurrent accesses when patching is happening on a pair auipc/jarl.
> 
> I'm wondering if instead, we could not actually handle the potential traps:
> before storing the auipc + jalr pair, we could use a well-identified trapping
> instruction that could be recognized in the trap handler as a legitimate trap.
> For example:
> 
> 
> auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
> jalr        XXXX        XXXX       jalr       jalr
> 
> 
> If a core traps on a XXXX instruction, we know this address is being patched, so
> we can return and probably the patching will be over. We could also identify
> half patched word instruction (I mean with only XX).

Unfortunately this does not work without some fence.i in the middle. The
processor is free to fetch any instruction that has been written to a location
since the last fence.i instruction. So it would be perfectly valid to fetch the
old auipc and new jalr or vice versa and not trap. This would happen if, for
example, the two instructions were in different cache lines, and only one of the
cache lines got evicted and refilled.

But sending an IPI to run the fence.i probably negates the performance benefit.
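
For reference, the remote flush boils down to something like this (a
simplified sketch of the icache flush path, not the exact kernel code):

------------------
#include <linux/smp.h>

/* Each hart must execute fence.i itself before it is guaranteed to
 * fetch the newly written instructions, hence one IPI per remote
 * core, executed synchronously. */
static void ipi_local_fence_i(void *unused)
{
	asm volatile("fence.i" ::: "memory");
}

static void flush_icache_all_sketch(void)
{
	/* run the local fence.i on every online CPU and wait */
	on_each_cpu(ipi_local_fence_i, NULL, 1);
}
------------------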

Maybe there is some creative way to overcome this.

> But please let me know if that's completely stupid and I did not understand the
> problem, since my patchset to support svvptc, I am wondering if it is not more
> performant to actually take very unlikely traps instead of trying to avoid them.

I agree in general it is a good idea to optimize the hot path like this.

Regards,
Samuel



* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-07 15:57             ` Samuel Holland
@ 2024-03-11 14:24               ` Andy Chiu
  2024-03-19 14:50                 ` Alexandre Ghiti
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2024-03-11 14:24 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux, Samuel Holland

On Thu, Mar 7, 2024 at 11:57 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Alex,
>
> On 2024-03-07 7:21 AM, Alexandre Ghiti wrote:
> > But TBH, I have started thinking about the issue your patch is trying to deal
> > with. IIUC you're trying to avoid traps (or silent errors) that could happen
> > because of concurrent accesses when patching is happening on a pair auipc/jarl.
> >
> > I'm wondering if instead, we could not actually handle the potential traps:
> > before storing the auipc + jalr pair, we could use a well-identified trapping
> > instruction that could be recognized in the trap handler as a legitimate trap.
> > For example:
> >
> >
> > auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
> > jalr        XXXX        XXXX       jalr       jalr
> >
> >
> > If a core traps on a XXXX instruction, we know this address is being patched, so
> > we can return and probably the patching will be over. We could also identify
> > half patched word instruction (I mean with only XX).
>
> Unfortunately this does not work without some fence.i in the middle. The
> processor is free to fetch any instruction that has been written to a location
> since the last fence.i instruction. So it would be perfectly valid to fetch the
> old auipc and new jalr or vice versa and not trap. This would happen if, for
> example, the two instructions were in different cache lines, and only one of the
> cache lines got evicted and refilled.
>
> But sending an IPI to run the fence.i probably negates the performance benefit.

Maybe, like x86, we can hook ftrace_replace_code() and batch-send the
IPIs to prevent storms of remote fences. The solution Alex proposed
saves code size at function entries, but we would have to send out
remote fences at each "-->" transition, which is 4 sets of remote
IPIs. On the other hand, this series increases the per-function patch
size to 24 bytes, but it decreases the number of remote fence sets
to 1.
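
Roughly like this (a sketch only, borrowing the x86 approach of
overriding ftrace_replace_code(); __ftrace_replace_code() here stands
in for whatever per-site patcher we end up with, it is not an API the
arch can call today):

------------------
#include <linux/ftrace.h>
#include <asm/cacheflush.h>

/* Patch every call site first, then pay for a single set of remote
 * fence.i IPIs instead of one set per site or per transition. */
void ftrace_replace_code(int enable)
{
	struct ftrace_rec_iter *iter;

	for_ftrace_rec_iter(iter) {
		struct dyn_ftrace *rec = ftrace_rec_iter_record(iter);

		__ftrace_replace_code(rec, enable); /* no per-site fence */
	}

	flush_icache_all();	/* one batched set of remote fence.i */
}
------------------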

The performance hit could be observable for the auipc + jalr case,
because all remote cores would be executing the XXXX instructions and
taking a trap at each function entry while code patching is in
progress.

Besides, this series would give us a chance not to send any remote
fences at all if we only changed the destination of ftrace (e.g. to a
custom ftrace trampoline). As that would be a regular store for the
writer and a regular load for the readers, only a fence w,w is
needed. However, I am not very certain how common this particular use
case would be. I'd need some time to investigate it.
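
The writer side would then be as simple as this sketch (assuming
'slot' is the naturally aligned target-address word embedded in the
prologue and the trampoline body was fully written beforehand):

------------------
#include <linux/compiler.h>
#include <asm/barrier.h>

/* The aligned word store is atomic, so readers see either the old or
 * the new destination, never a mix, and no fence.i is needed because
 * no instruction bytes change. */
static void ftrace_set_call_dest(unsigned long *slot, unsigned long tramp)
{
	smp_wmb();		  /* fence w,w: order trampoline writes */
	WRITE_ONCE(*slot, tramp); /* publish the new destination */
}
------------------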

>
> Maybe there is some creative way to overcome this.
>
> > But please let me know if that's completely stupid and I did not understand the
> > problem, since my patchset to support svvptc, I am wondering if it is not more
> > performant to actually take very unlikely traps instead of trying to avoid them.
>
> I agree in general it is a good idea to optimize the hot path like this.
>
> Regards,
> Samuel
>

Regards,
Andy


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-02-21 16:55     ` Evgenii Shatokhin
  2024-03-06 20:57       ` Alexandre Ghiti
@ 2024-03-18 15:31       ` Andy Chiu
  2024-03-19 15:32         ` Evgenii Shatokhin
  2024-03-19 17:37         ` Alexandre Ghiti
  1 sibling, 2 replies; 43+ messages in thread
From: Andy Chiu @ 2024-03-18 15:31 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

Hi Evgenii,

Thanks for your help!

I just rebased onto 6.8-rc1 and passed the stress-ng + ftrace/nop
testing. I will add some random tracers to the test and some
optimizations before sending this out again. Here are a few things
that are needed:

On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
<e.shatokhin@yadro.com> wrote:
>
> On 21.02.2024 08:27, Andy Chiu wrote:
> > On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
> >>
> >> Hi,
> >>
> >> On 13.09.2022 12:42, Andy Chiu wrote:
> >>> This patch removes dependency of dynamic ftrace from calling
> >>> stop_machine(), and makes it compatiable with kernel preemption.
> >>> Originally, we ran into stack corruptions, or execution of partially
> >>> updated instructions when starting or stopping ftrace on a fully
> >>> preemptible kernel configuration. The reason is that kernel periodically
> >>> calls rcu_momentary_dyntick_idle() on cores waiting for the code-patching
> >>> core running in ftrace. Though rcu_momentary_dyntick_idle() itself is
> >>> marked as notrace, it would call a bunch of tracable functions if we
> >>> configured the kernel as preemptible. For example, these are some functions
> >>> that happened to have a symbol and have not been marked as notrace on a
> >>> RISC-V preemptible kernel compiled with GCC-11:
> >>>    - __rcu_report_exp_rnp()
> >>>    - rcu_report_exp_cpu_mult()
> >>>    - rcu_preempt_deferred_qs()
> >>>    - rcu_preempt_need_deferred_qs()
> >>>    - rcu_preempt_deferred_qs_irqrestore()
> >>>
> >>> Thus, this make it not ideal for us to rely on stop_machine() and
> >>> handly marked "notrace"s to perform runtime code patching. To remove
> >>> such dependency, we must make updates of code seemed atomic on running
> >>> cores. This might not be obvious for RISC-V since it usaually uses a pair
> >>> of AUIPC + JALR to perform a long jump, which cannot be modified and
> >>> executed concurrently if we consider preemptions. As such, this patch
> >>> proposed a way to make it possible. It embeds a 32-bit rel-address data
> >>> into instructions of each ftrace prologue and jumps indirectly. In this
> >>> way, we could store and load the address atomically so that the code
> >>> patching core could run simutaneously with the rest of running cores.
> >>>
> >>> After applying the patchset, we compiled a preemptible kernel with all
> >>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
> >>> machine. The kernel could boot up successfully, passing all ftrace
> >>> testsuits. Besides, we ran a script that randomly pick a tracer on every
> >>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
> >>> contrast, a preemptible kernel without our patch would panic in few
> >>> rounds on the same machine.
> >>>
> >>> Though we ran into errors when using hwlat or irqsoff tracers together
> >>> with cpu-online stressor from stress-ng on a preemptible kernel. We
> >>> believe the reason may be that  percpu workers of the tracers are being
> >>> queued into unbounded workqueue when cpu get offlined and patches will go
> >>> through tracing tree.
> >>>
> >>> Additionally, we found patching of tracepoints unsafe since the
> >>> instructions being patched are not naturally aligned. This may result in
> >>> 2 half-word stores, which breaks atomicity, during the code patching.
> >>>
> >>> changes in patch v2:
> >>>    - Enforce alignments on all functions with a compiler workaround.
> >>>    - Support 64bit addressing for ftrace targets if xlen == 64
> >>>    - Initialize ftrace target addresses to avoid calling bad address in a
> >>>      hypothesized case.
> >>>    - Use LGPTR instead of SZPTR since .align is log-scaled for
> >>>      mcount-dyn.S
> >>>    - Require the nop instruction of all jump_labels aligns naturally on
> >>>      4B.
> >>>
> >>> Andy Chiu (5):
> >>>     riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
> >>>       size
> >>>     riscv: export patch_insn_write
> >>>     riscv: ftrace: use indirect jump to work with kernel preemption
> >>>     riscv: ftrace: do not use stop_machine to update code
> >>>     riscv: align arch_static_branch function
> >>>
> >>>    arch/riscv/Makefile                 |   2 +-
> >>>    arch/riscv/include/asm/ftrace.h     |  24 ----
> >>>    arch/riscv/include/asm/jump_label.h |   2 +
> >>>    arch/riscv/include/asm/patch.h      |   1 +
> >>>    arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
> >>>    arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
> >>>    arch/riscv/kernel/patch.c           |   4 +-
> >>>    7 files changed, 188 insertions(+), 93 deletions(-)
> >>>
> >>
> >> First of all, thank you for working on making dynamic Ftrace robust in
> >> preemptible kernels on RISC-V.
> >> It is an important use case but, for now, dynamic Ftrace and related
> >> tracers cannot be safely used with such kernels.
> >>
> >> Are there any updates on this series?
> >> It needs a rebase, of course, but it looks doable.
> >>
> >> If I understand the discussion correctly, the only blocker was that
> >> using "-falign-functions" was not enough to properly align cold
> >> functions and "-fno-guess-branch-probability" would likely have a
> >> performance cost.
> >>
> >> It seems, GCC developers have recently provided a workaround for that
> >> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
> >>
> >> "-fmin-function-alignment" should help but, I do not know, which GCC
> >> versions have got that patch already. In the meantime, one could
> >> probably check if "-fmin-function-alignment" is supported by the
> >> compiler and use it, if it is.
> >>
> >> Thoughts?
> >
> > Hi Evgenii,
> >
> > Thanks for the update. Indeed, it is essential to this patch for
> > toolchain to provide forced alignment. We can test this flag in the
> > Makefile to sort out if toolchain supports it or not. Meanwhile, I had
> > figured out a way for this to work on any 2-B align addresses but
> > hadn't implemented it out yet. Basically it would require more
> > patching space for us to do software alignment. I would opt for a
> > special toolchain flag if the toolchain just supports it.
> >
> > Let me take some time to look and get back to you soon.
>
> Thank you! Looking forward to it.
>
> In case it helps, here is what I have checked so far.
>
> 1.
> I added the patch
> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
> to the current revision of GCC 13.2.0 from RISC-V toolchain.
>
> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
>
> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
> preemption").
>
> Switched from -falign-functions=4 to -fmin-function-alignment=4:
> ------------------
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index b33b787c8b07..dcd0adeebaae 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>         LDFLAGS_vmlinux += --no-relax
>         KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>   ifeq ($(CONFIG_RISCV_ISA_C),y)
> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12
> -fmin-function-alignment=4
>   else
> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
>   endif
>   endif
>
> ------------------
>
> As far as I can see from objdump, the functions that were not aligned at
> 4-byte boundary with -falign-functions=4, are now aligned correctly with
> -fmin-function-alignment=4.
>
> 2.
> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
>
> The boottime tests for Ftrace had passed, except the tests for
> function_graph. I described the failure and the possible fix here:
> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/

Indeed, this is needed. I am not sure why the ftrace boot-time tests
passed for me back then. Thank you for solving it!

>
> 3.
> There were also boottime warnings about "RCU not on for:
> arch_cpu_idle+0x0/0x2c". These are probably not related to your
> patchset, but rather to the fact that Ftrace is enabled in a preemptble
> kernel where RCU does different things.
>
> As a workaround, I disabled tracing of arch_cpu_idle() for now:
> ------------------
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 92922dbd5b5c..6abeecbfc51d 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
>
>   extern asmlinkage void ret_from_fork(void);
>
> -void arch_cpu_idle(void)
> +void noinstr arch_cpu_idle(void)
>   {
>         cpu_do_idle();
>   }
>
> ------------------
>
> 4.
> Stress-testing revealed an issue though, which I do not understand yet.
>
> Probably similar to what you did earlier, I ran a script that switched
> the current tracer to "function", "function_graph", "nop", "blk" each
> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
>
> The kernel usually crashed within a few minutes, in seemingly random
> locations, but often in one of two ways:
>
> (a) Invalid instruction, because the address of ftrace_caller function
> was somehow written to the body of the traced function rather than just
> to the Ftrace prologue.

The reason for this is probably that one of your ftrace_*_call sites
is not 8-byte aligned.

>
> In the following example, the crash happened at 0xffffffff800d3398. "b0
> d7" is actually not part of the code here, but rather the lower bytes of
> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.

It seems like there is a bug in patch_insn_write(). I think we should
at least disable migration during patch_map() and patch_unmap(). I'd
need some time to dig into patch_map(). But since __set_fixmap() only
flushes the local TLB, I'd assume it is not safe to be context-switched
out and migrated while holding the fixmap mapping. Adding
preempt_disable() and preempt_enable() around the call to
__patch_insn_write() solves the issue.
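
That is, something along these lines (a sketch of the fix, not the
final patch):

------------------
#include <linux/preempt.h>

/* Keep the task on this CPU while the fixmap mapping set up by
 * patch_map() is live: __set_fixmap() only flushes the local TLB, so
 * being migrated mid-write could leave us storing through a mapping
 * that was never established on the new CPU. */
int patch_insn_write(void *addr, const void *insn, size_t len)
{
	int ret;

	preempt_disable();
	ret = __patch_insn_write(addr, insn, len);
	preempt_enable();

	return ret;
}
------------------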

>
> (gdb) disas /r 0xffffffff800d3382,+0x20
> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
> ...
>     0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv
> a5,a4
>     0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j
> 0xffffffff800d3366 <clockevents_program_event+98>
>     0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw
> a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
>     0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte
> 0x8000
>     0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte
> 0xffff
>     0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte
> 0xffff
>     0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j
> 0xffffffff800d3394 <clockevents_program_event+144
>
> The backtrace usually contains one or more occurrences of
> return_to_handler() in this case.
>
> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
> ----------------------
>
> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
> instruction. %ra usually points right after the last instruction, "jalr
>    a2", in return_to_handler() in such cases, so the jump was likely
> made from there.

I haven't done fgraph tests yet. I will try it out and see.

>
> The problem is reproducible, although I have not found what causes it yet.
>
> Any help is appreciated, of course.
>
> >
> >>
> >> Regards,
> >> Evgenii
> >
> > Regards,
> > Andy
>

Also, here is another side note:

It seems like the ftrace save/restore routine should save more
registers, as clang's fastcc may use t2 when the number of arguments
exceeds what the ABI defines for passing arguments through registers.

Cheers,
Andy


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-11 14:24               ` Andy Chiu
@ 2024-03-19 14:50                 ` Alexandre Ghiti
  2024-03-19 14:58                   ` Conor Dooley
  2024-03-20 16:37                   ` Andy Chiu
  0 siblings, 2 replies; 43+ messages in thread
From: Alexandre Ghiti @ 2024-03-19 14:50 UTC (permalink / raw)
  To: Andy Chiu, Björn Töpel
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux, Samuel Holland

On 11/03/2024 15:24, Andy Chiu wrote:
> On Thu, Mar 7, 2024 at 11:57 PM Samuel Holland
> <samuel.holland@sifive.com> wrote:
>> Hi Alex,
>>
>> On 2024-03-07 7:21 AM, Alexandre Ghiti wrote:
>>> But TBH, I have started thinking about the issue your patch is trying to deal
>>> with. IIUC you're trying to avoid traps (or silent errors) that could happen
>>> because of concurrent accesses when patching is happening on a pair auipc/jarl.
>>>
>>> I'm wondering if instead, we could not actually handle the potential traps:
>>> before storing the auipc + jalr pair, we could use a well-identified trapping
>>> instruction that could be recognized in the trap handler as a legitimate trap.
>>> For example:
>>>
>>>
>>> auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
>>> jalr        XXXX        XXXX       jalr       jalr
>>>
>>>
>>> If a core traps on a XXXX instruction, we know this address is being patched, so
>>> we can return and probably the patching will be over. We could also identify
>>> half patched word instruction (I mean with only XX).
>> Unfortunately this does not work without some fence.i in the middle. The
>> processor is free to fetch any instruction that has been written to a location
>> since the last fence.i instruction. So it would be perfectly valid to fetch the
>> old auipc and new jalr or vice versa and not trap. This would happen if, for
>> example, the two instructions were in different cache lines, and only one of the
>> cache lines got evicted and refilled.
>>
>> But sending an IPI to run the fence.i probably negates the performance benefit.
> Maybe something like x86, we can hook ftrace_replace_code() out and
> batch send IPIs to prevent storms of remote fences. The solution Alex
> proposed can save the code size for function entries. But we have to
> send out remote fences at each "-->" transition, which is 4 sets of
> remote IPIs. On the other hand, this series increases the per-function
> patch size to 24 bytes. However, it decreases the number of remote
> fences to 1 set.
>
> The performance hit could be observable for the auipc + jalr case,
> because all remote cores will be executing on XXXX instructions and
> take a trap at each function entry during code patching.
>
> Besides, this series would give us a chance not to send any remote
> fences if we were to change only the destination of ftrace (e.g. to a
> custom ftrace trampoline). As it would be a regular store for the
> writer and regular load for readers, only fence w,w is needed.
> However, I am not very certain on how often would be for this
> particular use case. I'd need some time to investigate it.
>
>> Maybe there is some creative way to overcome this.
>>
>>> But please let me know if that's completely stupid and I did not understand the
>>> problem, since my patchset to support svvptc, I am wondering if it is not more
>>> performant to actually take very unlikely traps instead of trying to avoid them.
>> I agree in general it is a good idea to optimize the hot path like this.
>>
>> Regards,
>> Samuel
>>
> Regards,
> Andy
>


So indeed my solution was way too naive, and Björn and I have been 
discussing this lately. He worked a lot on it and came up with the 
solution he proposed here:
https://lore.kernel.org/linux-riscv/87zfv0onre.fsf@all.your.base.are.belong.to.us/

The thing is, ftrace seems to be quite broken: the ftrace kselftests 
raise a lot of issues, which I have started to debug but which are not 
that easy. So we are wondering whether *someone* should work on 
Björn's solution (or another one, open to discussion) for 6.10. @Andy 
WDYT? Do you have free cycles? Björn could work on that too (and I'll 
help if needed).

Let me know what you think!

Alex




* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-19 14:50                 ` Alexandre Ghiti
@ 2024-03-19 14:58                   ` Conor Dooley
  2024-03-20 16:37                   ` Andy Chiu
  1 sibling, 0 replies; 43+ messages in thread
From: Conor Dooley @ 2024-03-19 14:58 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andy Chiu, Björn Töpel, Evgenii Shatokhin, palmer,
	paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe, jbaron,
	ardb, greentime.hu, zong.li, guoren, Jessica Clarke, kernel,
	linux-riscv, linux, Samuel Holland



On Tue, Mar 19, 2024 at 03:50:01PM +0100, Alexandre Ghiti wrote:

> The thing is ftrace seems to be quite broken as the ftrace kselftests raise
> a lot of issues which I have started to debug but are not that easy, so we
> are wondering if *someone* should not work on Bjorn's solution (or another,
> open to discussions) for 6.10. @Andy WDYT? Do you have free cycles? Björn
> could work on that too (and I'll help if needed).

If patching is broken, I wouldn't be too worried about targeting 6.10;
just do it right and get Palmer to take it on fixes when everyone is
happy with it.



* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-18 15:31       ` Andy Chiu
@ 2024-03-19 15:32         ` Evgenii Shatokhin
  2024-03-20 16:38           ` Andy Chiu
  2024-03-19 17:37         ` Alexandre Ghiti
  1 sibling, 1 reply; 43+ messages in thread
From: Evgenii Shatokhin @ 2024-03-19 15:32 UTC (permalink / raw)
  To: Andy Chiu
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

Hi,

On 18.03.2024 18:31, Andy Chiu wrote:
> Hi Evgenii,
> 
> Thanks for your help!

You are welcome!

> 
> I just rebased upon 6.8-rc1 and passed the stress-ng + ftrace/nop
> testing. I will add some random tracers to test and some optimization
> before sending out again. Here are a few things needed:
> 
> On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
> <e.shatokhin@yadro.com> wrote:
>>
>> On 21.02.2024 08:27, Andy Chiu wrote:
>>>
>>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 13.09.2022 12:42, Andy Chiu wrote:
>>>>> This patch removes dependency of dynamic ftrace from calling
>>>>> stop_machine(), and makes it compatiable with kernel preemption.
>>>>> Originally, we ran into stack corruptions, or execution of partially
>>>>> updated instructions when starting or stopping ftrace on a fully
>>>>> preemptible kernel configuration. The reason is that kernel periodically
>>>>> calls rcu_momentary_dyntick_idle() on cores waiting for the code-patching
>>>>> core running in ftrace. Though rcu_momentary_dyntick_idle() itself is
>>>>> marked as notrace, it would call a bunch of tracable functions if we
>>>>> configured the kernel as preemptible. For example, these are some functions
>>>>> that happened to have a symbol and have not been marked as notrace on a
>>>>> RISC-V preemptible kernel compiled with GCC-11:
>>>>>     - __rcu_report_exp_rnp()
>>>>>     - rcu_report_exp_cpu_mult()
>>>>>     - rcu_preempt_deferred_qs()
>>>>>     - rcu_preempt_need_deferred_qs()
>>>>>     - rcu_preempt_deferred_qs_irqrestore()
>>>>>
>>>>> Thus, this make it not ideal for us to rely on stop_machine() and
>>>>> handly marked "notrace"s to perform runtime code patching. To remove
>>>>> such dependency, we must make updates of code seemed atomic on running
>>>>> cores. This might not be obvious for RISC-V since it usaually uses a pair
>>>>> of AUIPC + JALR to perform a long jump, which cannot be modified and
>>>>> executed concurrently if we consider preemptions. As such, this patch
>>>>> proposed a way to make it possible. It embeds a 32-bit rel-address data
>>>>> into instructions of each ftrace prologue and jumps indirectly. In this
>>>>> way, we could store and load the address atomically so that the code
>>>>> patching core could run simutaneously with the rest of running cores.
>>>>>
>>>>> After applying the patchset, we compiled a preemptible kernel with all
>>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
>>>>> machine. The kernel could boot up successfully, passing all ftrace
>>>>> testsuits. Besides, we ran a script that randomly pick a tracer on every
>>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
>>>>> contrast, a preemptible kernel without our patch would panic in few
>>>>> rounds on the same machine.
>>>>>
>>>>> Though we ran into errors when using hwlat or irqsoff tracers together
>>>>> with cpu-online stressor from stress-ng on a preemptible kernel. We
>>>>> believe the reason may be that  percpu workers of the tracers are being
>>>>> queued into unbounded workqueue when cpu get offlined and patches will go
>>>>> through tracing tree.
>>>>>
>>>>> Additionally, we found patching of tracepoints unsafe since the
>>>>> instructions being patched are not naturally aligned. This may result in
>>>>> 2 half-word stores, which breaks atomicity, during the code patching.
>>>>>
>>>>> changes in patch v2:
>>>>>     - Enforce alignments on all functions with a compiler workaround.
>>>>>     - Support 64bit addressing for ftrace targets if xlen == 64
>>>>>     - Initialize ftrace target addresses to avoid calling bad address in a
>>>>>       hypothesized case.
>>>>>     - Use LGPTR instead of SZPTR since .align is log-scaled for
>>>>>       mcount-dyn.S
>>>>>     - Require the nop instruction of all jump_labels aligns naturally on
>>>>>       4B.
>>>>>
>>>>> Andy Chiu (5):
>>>>>      riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
>>>>>        size
>>>>>      riscv: export patch_insn_write
>>>>>      riscv: ftrace: use indirect jump to work with kernel preemption
>>>>>      riscv: ftrace: do not use stop_machine to update code
>>>>>      riscv: align arch_static_branch function
>>>>>
>>>>>     arch/riscv/Makefile                 |   2 +-
>>>>>     arch/riscv/include/asm/ftrace.h     |  24 ----
>>>>>     arch/riscv/include/asm/jump_label.h |   2 +
>>>>>     arch/riscv/include/asm/patch.h      |   1 +
>>>>>     arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
>>>>>     arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
>>>>>     arch/riscv/kernel/patch.c           |   4 +-
>>>>>     7 files changed, 188 insertions(+), 93 deletions(-)
>>>>>
>>>>
>>>> First of all, thank you for working on making dynamic Ftrace robust in
>>>> preemptible kernels on RISC-V.
>>>> It is an important use case but, for now, dynamic Ftrace and related
>>>> tracers cannot be safely used with such kernels.
>>>>
>>>> Are there any updates on this series?
>>>> It needs a rebase, of course, but it looks doable.
>>>>
>>>> If I understand the discussion correctly, the only blocker was that
>>>> using "-falign-functions" was not enough to properly align cold
>>>> functions and "-fno-guess-branch-probability" would likely have a
>>>> performance cost.
>>>>
>>>> It seems, GCC developers have recently provided a workaround for that
>>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
>>>>
>>>> "-fmin-function-alignment" should help but, I do not know, which GCC
>>>> versions have got that patch already. In the meantime, one could
>>>> probably check if "-fmin-function-alignment" is supported by the
>>>> compiler and use it, if it is.
>>>>
>>>> Thoughts?
>>>
>>> Hi Evgenii,
>>>
>>> Thanks for the update. Indeed, it is essential to this patch for
>>> toolchain to provide forced alignment. We can test this flag in the
>>> Makefile to sort out if toolchain supports it or not. Meanwhile, I had
>>> figured out a way for this to work on any 2-B align addresses but
>>> hadn't implemented it out yet. Basically it would require more
>>> patching space for us to do software alignment. I would opt for a
>>> special toolchain flag if the toolchain just supports it.
>>>
>>> Let me take some time to look and get back to you soon.
>>
>> Thank you! Looking forward to it.
>>
>> In case it helps, here is what I have checked so far.
>>
>> 1.
>> I added the patch
>> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
>> to the current revision of GCC 13.2.0 from RISC-V toolchain.
>>
>> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
>> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
>>
>> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
>> preemption").
>>
>> Switched from -falign-functions=4 to -fmin-function-alignment=4:
>> ------------------
>> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
>> index b33b787c8b07..dcd0adeebaae 100644
>> --- a/arch/riscv/Makefile
>> +++ b/arch/riscv/Makefile
>> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>>          LDFLAGS_vmlinux += --no-relax
>>          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>>    ifeq ($(CONFIG_RISCV_ISA_C),y)
>> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
>> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12
>> -fmin-function-alignment=4
>>    else
>> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
>> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
>>    endif
>>    endif
>>
>> ------------------
>>
>> As far as I can see from objdump, the functions that were not aligned at
>> 4-byte boundary with -falign-functions=4, are now aligned correctly with
>> -fmin-function-alignment=4.
>>
>> 2.
>> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
>>
>> The boottime tests for Ftrace had passed, except the tests for
>> function_graph. I described the failure and the possible fix here:
>> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/
> 
> Indeed, this is needed. I am not sure why I got ftrace boot-time tests
> passed back then. Thank you for solving it!
> 
>>
>> 3.
>> There were also boottime warnings about "RCU not on for:
>> arch_cpu_idle+0x0/0x2c". These are probably not related to your
>> patchset, but rather to the fact that Ftrace is enabled in a preemptble
>> kernel where RCU does different things.
>>
>> As a workaround, I disabled tracing of arch_cpu_idle() for now:
>> ------------------
>> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
>> index 92922dbd5b5c..6abeecbfc51d 100644
>> --- a/arch/riscv/kernel/process.c
>> +++ b/arch/riscv/kernel/process.c
>> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
>>
>>    extern asmlinkage void ret_from_fork(void);
>>
>> -void arch_cpu_idle(void)
>> +void noinstr arch_cpu_idle(void)
>>    {
>>          cpu_do_idle();
>>    }
>>
>> ------------------
>>
>> 4.
>> Stress-testing revealed an issue though, which I do not understand yet.
>>
>> Probably similar to what you did earlier, I ran a script that switched
>> the current tracer to "function", "function_graph", "nop", "blk" each
>> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
>>
>> The kernel usually crashed within a few minutes, in seemingly random
>> locations, but often in one of two ways:
>>
>> (a) Invalid instruction, because the address of ftrace_caller function
>> was somehow written to the body of the traced function rather than just
>> to the Ftrace prologue.
> 
> The reason for this is probably that any one of your ftrace_*_call is
> not 8-B aligned.

I thought all the locations where the address of the ftrace_caller 
function is written were 8-byte aligned, as long as the compiler 
guarantees that the start addresses of all functions are 4-byte 
aligned. Your patchset provides 2 kinds of function prologues exactly 
for that purpose. Am I missing something?

> 
>>
>> In the following example, the crash happened at 0xffffffff800d3398. "b0
>> d7" is actually not part of the code here, but rather the lower bytes of
>> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.
> 
> It seems like there is a bug in patch_insn_write(). I think we should
> at least disable migration during patch_map() and patch_unmap(). I'd
> need some time to dig into patch_map(). But since __set_fixmap() only
> flush local tlb, I'd assume it is not safe to context switch out and
> migrate while holding the fix-map mapping. Adding preempt_disable()
> and preempt_enable() before calling __patch_insn_write() solves the
> issue.
> 

Interesting.
Thanks for pointing that out! I never thought that the task could 
migrate to a different CPU while patch_insn_write() is running. If it 
could, that would certainly cause such issues. And probably the issues 
with "function_graph" too, if some data, rather than code, was 
corrupted that way.

>>
>> (gdb) disas /r 0xffffffff800d3382,+0x20
>> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
>> ...
>>      0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv
>> a5,a4
>>      0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j
>> 0xffffffff800d3366 <clockevents_program_event+98>
>>      0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw
>> a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
>>      0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte
>> 0x8000
>>      0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte
>> 0xffff
>>      0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte
>> 0xffff
>>      0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j
>> 0xffffffff800d3394 <clockevents_program_event+144
>>
>> The backtrace usually contains one or more occurrences of
>> return_to_handler() in this case.
>>
>> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
>> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
>> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
>> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
>> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
>> ----------------------
>>
>> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
>> instruction. %ra usually points right after the last instruction, "jalr
>>     a2", in return_to_handler() in such cases, so the jump was likely
>> made from there.
> 
> I haven't done fgraph tests yet. I will try out and see.
> 
>>
>> The problem is reproducible, although I have not found what causes it yet.
>>
>> Any help is appreciated, of course.
>>
>>>
>>>>
>>>> Regards,
>>>> Evgenii
>>>
>>> Regards,
>>> Andy
>>
> 
> Also, here is another side note,
> 
> It seems like the ftrace save/restore routine should save more
> registers as clang's fastcc may use t2 when the number of arguments
> exceeds what ABI defines for passing arg through registers.

Yes, I reported that issue to the LLVM maintainers in 
https://github.com/llvm/llvm-project/issues/83111. It seems static 
functions with 9+ arguments use t2, t3, etc. for the 9th and 10th 
arguments when compiled with clang.
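
For illustration (this mirrors the reproducer from the LLVM issue; the 
exact register assignment is clang's internal fastcc behavior, not 
something the psABI documents):

------------------
/* A static (internal-linkage) function with more than 8 integer
 * arguments: a1..a8 go in a0-a7 as usual, but clang's fastcc may
 * then pass a9 and a10 in t2 and t3 instead of on the stack, which
 * an ftrace trampoline that only saves the ABI argument registers
 * would clobber. */
static int many_args(int a1, int a2, int a3, int a4, int a5,
		     int a6, int a7, int a8, int a9, int a10)
{
	return a9 + a10;
}
------------------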

Clang seems to leave t0 and t1 alone, but I do not know yet whether 
that is just a coincidence. I haven't found the exact rules for the 
fastcc calling convention on RISC-V so far.

A compiler option to disable fastcc for Linux kernel builds would be 
great. But it seems the discussion with the LLVM maintainers will go 
nowhere without benchmarks showing whether that optimization has any 
significant effect. I plan to find and run proper benchmarks when I 
have time, but not just yet.

> 
> Cheers,
> Andy

Regards,
Evgenii




* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-18 15:31       ` Andy Chiu
  2024-03-19 15:32         ` Evgenii Shatokhin
@ 2024-03-19 17:37         ` Alexandre Ghiti
  2024-03-20 16:36           ` Andy Chiu
  1 sibling, 1 reply; 43+ messages in thread
From: Alexandre Ghiti @ 2024-03-19 17:37 UTC (permalink / raw)
  To: Andy Chiu, Evgenii Shatokhin
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

Hi Andy,

On 18/03/2024 16:31, Andy Chiu wrote:
> Hi Evgenii,
>
> Thanks for your help!
>
> I just rebased upon 6.8-rc1 and passed the stress-ng + ftrace/nop
> testing. I will add some random tracers to test and some optimization
> before sending out again. Here are a few things needed:
>
> On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
> <e.shatokhin@yadro.com> wrote:
>> On 21.02.2024 08:27, Andy Chiu wrote:
>>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>>>> Hi,
>>>>
>>>> On 13.09.2022 12:42, Andy Chiu wrote:
>>>>> This patch removes dependency of dynamic ftrace from calling
>>>>> stop_machine(), and makes it compatiable with kernel preemption.
>>>>> Originally, we ran into stack corruptions, or execution of partially
>>>>> updated instructions when starting or stopping ftrace on a fully
>>>>> preemptible kernel configuration. The reason is that kernel periodically
>>>>> calls rcu_momentary_dyntick_idle() on cores waiting for the code-patching
>>>>> core running in ftrace. Though rcu_momentary_dyntick_idle() itself is
>>>>> marked as notrace, it would call a bunch of tracable functions if we
>>>>> configured the kernel as preemptible. For example, these are some functions
>>>>> that happened to have a symbol and have not been marked as notrace on a
>>>>> RISC-V preemptible kernel compiled with GCC-11:
>>>>>     - __rcu_report_exp_rnp()
>>>>>     - rcu_report_exp_cpu_mult()
>>>>>     - rcu_preempt_deferred_qs()
>>>>>     - rcu_preempt_need_deferred_qs()
>>>>>     - rcu_preempt_deferred_qs_irqrestore()
>>>>>
>>>>> Thus, this make it not ideal for us to rely on stop_machine() and
>>>>> handly marked "notrace"s to perform runtime code patching. To remove
>>>>> such dependency, we must make updates of code seemed atomic on running
>>>>> cores. This might not be obvious for RISC-V since it usaually uses a pair
>>>>> of AUIPC + JALR to perform a long jump, which cannot be modified and
>>>>> executed concurrently if we consider preemptions. As such, this patch
>>>>> proposed a way to make it possible. It embeds a 32-bit rel-address data
>>>>> into instructions of each ftrace prologue and jumps indirectly. In this
>>>>> way, we could store and load the address atomically so that the code
>>>>> patching core could run simutaneously with the rest of running cores.
>>>>>
>>>>> After applying the patchset, we compiled a preemptible kernel with all
>>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
>>>>> machine. The kernel could boot up successfully, passing all ftrace
>>>>> testsuits. Besides, we ran a script that randomly pick a tracer on every
>>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
>>>>> contrast, a preemptible kernel without our patch would panic in few
>>>>> rounds on the same machine.
>>>>>
>>>>> Though we ran into errors when using hwlat or irqsoff tracers together
>>>>> with cpu-online stressor from stress-ng on a preemptible kernel. We
>>>>> believe the reason may be that  percpu workers of the tracers are being
>>>>> queued into unbounded workqueue when cpu get offlined and patches will go
>>>>> through tracing tree.
>>>>>
>>>>> Additionally, we found patching of tracepoints unsafe since the
>>>>> instructions being patched are not naturally aligned. This may result in
>>>>> 2 half-word stores, which breaks atomicity, during the code patching.
>>>>>
>>>>> changes in patch v2:
>>>>>     - Enforce alignments on all functions with a compiler workaround.
>>>>>     - Support 64bit addressing for ftrace targets if xlen == 64
>>>>>     - Initialize ftrace target addresses to avoid calling bad address in a
>>>>>       hypothesized case.
>>>>>     - Use LGPTR instead of SZPTR since .align is log-scaled for
>>>>>       mcount-dyn.S
>>>>>     - Require the nop instruction of all jump_labels aligns naturally on
>>>>>       4B.
>>>>>
>>>>> Andy Chiu (5):
>>>>>      riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
>>>>>        size
>>>>>      riscv: export patch_insn_write
>>>>>      riscv: ftrace: use indirect jump to work with kernel preemption
>>>>>      riscv: ftrace: do not use stop_machine to update code
>>>>>      riscv: align arch_static_branch function
>>>>>
>>>>>     arch/riscv/Makefile                 |   2 +-
>>>>>     arch/riscv/include/asm/ftrace.h     |  24 ----
>>>>>     arch/riscv/include/asm/jump_label.h |   2 +
>>>>>     arch/riscv/include/asm/patch.h      |   1 +
>>>>>     arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
>>>>>     arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
>>>>>     arch/riscv/kernel/patch.c           |   4 +-
>>>>>     7 files changed, 188 insertions(+), 93 deletions(-)
>>>>>
>>>> First of all, thank you for working on making dynamic Ftrace robust in
>>>> preemptible kernels on RISC-V.
>>>> It is an important use case but, for now, dynamic Ftrace and related
>>>> tracers cannot be safely used with such kernels.
>>>>
>>>> Are there any updates on this series?
>>>> It needs a rebase, of course, but it looks doable.
>>>>
>>>> If I understand the discussion correctly, the only blocker was that
>>>> using "-falign-functions" was not enough to properly align cold
>>>> functions and "-fno-guess-branch-probability" would likely have a
>>>> performance cost.
>>>>
>>>> It seems, GCC developers have recently provided a workaround for that
>>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
>>>>
>>>> "-fmin-function-alignment" should help but, I do not know, which GCC
>>>> versions have got that patch already. In the meantime, one could
>>>> probably check if "-fmin-function-alignment" is supported by the
>>>> compiler and use it, if it is.
>>>>
>>>> Thoughts?
>>> Hi Evgenii,
>>>
>>> Thanks for the update. Indeed, it is essential to this patch for
>>> toolchain to provide forced alignment. We can test this flag in the
>>> Makefile to sort out if toolchain supports it or not. Meanwhile, I had
>>> figured out a way for this to work on any 2-B align addresses but
>>> hadn't implemented it out yet. Basically it would require more
>>> patching space for us to do software alignment. I would opt for a
>>> special toolchain flag if the toolchain just supports it.
>>>
>>> Let me take some time to look and get back to you soon.
>> Thank you! Looking forward to it.
>>
>> In case it helps, here is what I have checked so far.
>>
>> 1.
>> I added the patch
>> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
>> to the current revision of GCC 13.2.0 from RISC-V toolchain.
>>
>> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
>> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
>>
>> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
>> preemption").
>>
>> Switched from -falign-functions=4 to -fmin-function-alignment=4:
>> ------------------
>> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
>> index b33b787c8b07..dcd0adeebaae 100644
>> --- a/arch/riscv/Makefile
>> +++ b/arch/riscv/Makefile
>> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>>          LDFLAGS_vmlinux += --no-relax
>>          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>>    ifeq ($(CONFIG_RISCV_ISA_C),y)
>> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
>> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12
>> -fmin-function-alignment=4
>>    else
>> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
>> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
>>    endif
>>    endif
>>
>> ------------------
>>
>> As far as I can see from objdump, the functions that were not aligned at
>> 4-byte boundary with -falign-functions=4, are now aligned correctly with
>> -fmin-function-alignment=4.
>>
>> 2.
>> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
>>
>> The boottime tests for Ftrace had passed, except the tests for
>> function_graph. I described the failure and the possible fix here:
>> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/
> Indeed, this is needed. I am not sure why I got ftrace boot-time tests
> passed back then. Thank you for solving it!
>
>> 3.
>> There were also boottime warnings about "RCU not on for:
>> arch_cpu_idle+0x0/0x2c". These are probably not related to your
>> patchset, but rather to the fact that Ftrace is enabled in a preemptble
>> kernel where RCU does different things.
>>
>> As a workaround, I disabled tracing of arch_cpu_idle() for now:
>> ------------------
>> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
>> index 92922dbd5b5c..6abeecbfc51d 100644
>> --- a/arch/riscv/kernel/process.c
>> +++ b/arch/riscv/kernel/process.c
>> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
>>
>>    extern asmlinkage void ret_from_fork(void);
>>
>> -void arch_cpu_idle(void)
>> +void noinstr arch_cpu_idle(void)
>>    {
>>          cpu_do_idle();
>>    }
>>
>> ------------------
>>
>> 4.
>> Stress-testing revealed an issue though, which I do not understand yet.
>>
>> Probably similar to what you did earlier, I ran a script that switched
>> the current tracer to "function", "function_graph", "nop", "blk" each
>> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
>>
>> The kernel usually crashed within a few minutes, in seemingly random
>> locations, but often in one of two ways:
>>
>> (a) Invalid instruction, because the address of ftrace_caller function
>> was somehow written to the body of the traced function rather than just
>> to the Ftrace prologue.
> The reason for this is probably that any one of your ftrace_*_call is
> not 8-B aligned.
>
>> In the following example, the crash happened at 0xffffffff800d3398. "b0
>> d7" is actually not part of the code here, but rather the lower bytes of
>> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.
> It seems like there is a bug in patch_insn_write(). I think we should
> at least disable migration during patch_map() and patch_unmap(). I'd
> need some time to dig into patch_map(). But since __set_fixmap() only
> flush local tlb, I'd assume it is not safe to context switch out and
> migrate while holding the fix-map mapping. Adding preempt_disable()
> and preempt_enable() before calling __patch_insn_write() solves the
> issue.


Yes, Andrea already mentioned this; I came up with the same idea of 
preempt_disable(), but then I noticed that arm64 actually disables 
IRQs: any idea why? 
https://lore.kernel.org/linux-riscv/CAHVXubj7ChgpvN4F_QO0oASaT5WC2VS0Q-bEqhnmF8z8QV=yDQ@mail.gmail.com/


>> (gdb) disas /r 0xffffffff800d3382,+0x20
>> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
>> ...
>>      0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv
>> a5,a4
>>      0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j
>> 0xffffffff800d3366 <clockevents_program_event+98>
>>      0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw
>> a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
>>      0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte
>> 0x8000
>>      0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte
>> 0xffff
>>      0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte
>> 0xffff
>>      0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j
>> 0xffffffff800d3394 <clockevents_program_event+144
>>
>> The backtrace usually contains one or more occurrences of
>> return_to_handler() in this case.
>>
>> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
>> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
>> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
>> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
>> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
>> ----------------------
>>
>> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
>> instruction. %ra usually points right after the last instruction, "jalr
>>     a2", in return_to_handler() in such cases, so the jump was likely
>> made from there.
> I haven't done fgraph tests yet. I will try out and see.
>
>> The problem is reproducible, although I have not found what causes it yet.
>>
>> Any help is appreciated, of course.
>>
>>>> Regards,
>>>> Evgenii
>>> Regards,
>>> Andy
> Also, here is another side note,
>
> It seems like the ftrace save/restore routine should save more
> registers as clang's fastcc may use t2 when the number of arguments
> exceeds what ABI defines for passing arg through registers.
>
> Cheers,
> Andy
>


* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-19 17:37         ` Alexandre Ghiti
@ 2024-03-20 16:36           ` Andy Chiu
  2024-03-21 11:02             ` Alexandre Ghiti
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Chiu @ 2024-03-20 16:36 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux

On Wed, Mar 20, 2024 at 1:37 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Andy,
>
> On 18/03/2024 16:31, Andy Chiu wrote:
> > Hi Evgenii,
> >
> > Thanks for your help!
> >
> > I just rebased upon 6.8-rc1 and passed the stress-ng + ftrace/nop
> > testing. I will add some random tracers to test and some optimization
> > before sending out again. Here are a few things needed:
> >
> > On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
> > <e.shatokhin@yadro.com> wrote:
> >> On 21.02.2024 08:27, Andy Chiu wrote:
> >>> «Attention! This message is from an external sender!»
> >>>
> >>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
> >>>> Hi,
> >>>>
> >>>> On 13.09.2022 12:42, Andy Chiu wrote:
> >>>>> This patch removes the dependency of dynamic ftrace on calling
> >>>>> stop_machine(), and makes it compatible with kernel preemption.
> >>>>> Originally, we ran into stack corruptions, or execution of partially
> >>>>> updated instructions, when starting or stopping ftrace on a fully
> >>>>> preemptible kernel configuration. The reason is that the kernel
> >>>>> periodically calls rcu_momentary_dyntick_idle() on cores waiting for the
> >>>>> code-patching core running in ftrace. Though rcu_momentary_dyntick_idle()
> >>>>> itself is marked as notrace, it calls a bunch of traceable functions if
> >>>>> we configure the kernel as preemptible. For example, these are some
> >>>>> functions that happen to have a symbol and have not been marked as
> >>>>> notrace on a RISC-V preemptible kernel compiled with GCC-11:
> >>>>>     - __rcu_report_exp_rnp()
> >>>>>     - rcu_report_exp_cpu_mult()
> >>>>>     - rcu_preempt_deferred_qs()
> >>>>>     - rcu_preempt_need_deferred_qs()
> >>>>>     - rcu_preempt_deferred_qs_irqrestore()
> >>>>>
> >>>>> Thus, it is not ideal for us to rely on stop_machine() and hand-marked
> >>>>> "notrace"s to perform runtime code patching. To remove such a
> >>>>> dependency, we must make code updates appear atomic to running cores.
> >>>>> This is not obvious for RISC-V, since it usually uses an AUIPC + JALR
> >>>>> pair to perform a long jump, which cannot be modified and executed
> >>>>> concurrently if we consider preemption. As such, this patch proposes a
> >>>>> way to make it possible. It embeds a 32-bit relative-address word into
> >>>>> the instructions of each ftrace prologue and jumps indirectly. In this
> >>>>> way, we can store and load the address atomically, so that the
> >>>>> code-patching core can run simultaneously with the rest of the running
> >>>>> cores.
> >>>>>
> >>>>> After applying the patchset, we compiled a preemptible kernel with all
> >>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
> >>>>> machine. The kernel booted up successfully, passing all ftrace test
> >>>>> suites. Besides, we ran a script that randomly picks a tracer every
> >>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
> >>>>> contrast, a preemptible kernel without our patch would panic within a
> >>>>> few rounds on the same machine.
> >>>>>
> >>>>> However, we ran into errors when using the hwlat or irqsoff tracers
> >>>>> together with the cpu-online stressor from stress-ng on a preemptible
> >>>>> kernel. We believe the reason may be that per-cpu workers of the
> >>>>> tracers are being queued into an unbound workqueue when a cpu gets
> >>>>> offlined; fixes for this will go through the tracing tree.
> >>>>>
> >>>>> Additionally, we found patching of tracepoints unsafe, since the
> >>>>> instructions being patched are not naturally aligned. This may result
> >>>>> in two half-word stores during the code patching, which breaks
> >>>>> atomicity.
> >>>>>
> >>>>> changes in patch v2:
> >>>>>     - Enforce alignments on all functions with a compiler workaround.
> >>>>>     - Support 64bit addressing for ftrace targets if xlen == 64
> >>>>>     - Initialize ftrace target addresses to avoid calling bad address in a
> >>>>>       hypothesized case.
> >>>>>     - Use LGPTR instead of SZPTR since .align is log-scaled for
> >>>>>       mcount-dyn.S
> >>>>>     - Require the nop instruction of all jump_labels aligns naturally on
> >>>>>       4B.
> >>>>>
> >>>>> Andy Chiu (5):
> >>>>>      riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
> >>>>>        size
> >>>>>      riscv: export patch_insn_write
> >>>>>      riscv: ftrace: use indirect jump to work with kernel preemption
> >>>>>      riscv: ftrace: do not use stop_machine to update code
> >>>>>      riscv: align arch_static_branch function
> >>>>>
> >>>>>     arch/riscv/Makefile                 |   2 +-
> >>>>>     arch/riscv/include/asm/ftrace.h     |  24 ----
> >>>>>     arch/riscv/include/asm/jump_label.h |   2 +
> >>>>>     arch/riscv/include/asm/patch.h      |   1 +
> >>>>>     arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
> >>>>>     arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
> >>>>>     arch/riscv/kernel/patch.c           |   4 +-
> >>>>>     7 files changed, 188 insertions(+), 93 deletions(-)
> >>>>>
> >>>> First of all, thank you for working on making dynamic Ftrace robust in
> >>>> preemptible kernels on RISC-V.
> >>>> It is an important use case but, for now, dynamic Ftrace and related
> >>>> tracers cannot be safely used with such kernels.
> >>>>
> >>>> Are there any updates on this series?
> >>>> It needs a rebase, of course, but it looks doable.
> >>>>
> >>>> If I understand the discussion correctly, the only blocker was that
> >>>> using "-falign-functions" was not enough to properly align cold
> >>>> functions and "-fno-guess-branch-probability" would likely have a
> >>>> performance cost.
> >>>>
> >>>> It seems, GCC developers have recently provided a workaround for that
> >>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
> >>>>
> >>>> "-fmin-function-alignment" should help but, I do not know, which GCC
> >>>> versions have got that patch already. In the meantime, one could
> >>>> probably check if "-fmin-function-alignment" is supported by the
> >>>> compiler and use it, if it is.
> >>>>
> >>>> Thoughts?
> >>> Hi Evgenii,
> >>>
> >>> Thanks for the update. Indeed, it is essential for this patch that the
> >>> toolchain provide forced alignment. We can test this flag in the
> >>> Makefile to sort out whether the toolchain supports it or not. Meanwhile,
> >>> I had figured out a way for this to work on any 2-byte-aligned address
> >>> but hadn't implemented it yet. Basically, it would require more
> >>> patching space for us to do software alignment. I would opt for the
> >>> special toolchain flag if the toolchain supports it.
> >>>
> >>> Let me take some time to look and get back to you soon.
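To illustrate what I mean by testing the flag in the Makefile, here is a
sketch only, assuming kbuild's cc-option helper; the variable name
min-align and the final flag set are illustrative, not the actual patch:
------------------
# Prefer -fmin-function-alignment=4 when the compiler supports it,
# otherwise fall back to the weaker -falign-functions=4.
min-align := $(call cc-option,-fmin-function-alignment=4,-falign-functions=4)
ifeq ($(CONFIG_RISCV_ISA_C),y)
  CC_FLAGS_FTRACE := -fpatchable-function-entry=12 $(min-align)
else
  CC_FLAGS_FTRACE := -fpatchable-function-entry=6 $(min-align)
endif
------------------
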
> >> Thank you! Looking forward to it.
> >>
> >> In case it helps, here is what I have checked so far.
> >>
> >> 1.
> >> I added the patch
> >> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
> >> to the current revision of GCC 13.2.0 from RISC-V toolchain.
> >>
> >> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
> >> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
> >>
> >> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
> >> preemption").
> >>
> >> Switched from -falign-functions=4 to -fmin-function-alignment=4:
> >> ------------------
> >> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> >> index b33b787c8b07..dcd0adeebaae 100644
> >> --- a/arch/riscv/Makefile
> >> +++ b/arch/riscv/Makefile
> >> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> >>          LDFLAGS_vmlinux += --no-relax
> >>          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> >>    ifeq ($(CONFIG_RISCV_ISA_C),y)
> >> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
> >> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -fmin-function-alignment=4
> >>    else
> >> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
> >> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
> >>    endif
> >>    endif
> >>
> >> ------------------
> >>
> >> As far as I can see from objdump, the functions that were not aligned at
> >> 4-byte boundary with -falign-functions=4, are now aligned correctly with
> >> -fmin-function-alignment=4.
> >>
> >> 2.
> >> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
> >>
> >> The boottime tests for Ftrace had passed, except the tests for
> >> function_graph. I described the failure and the possible fix here:
> >> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/
> > Indeed, this is needed. I am not sure why the ftrace boot-time tests
> > passed back then. Thank you for solving it!
> >
> >> 3.
> >> There were also boottime warnings about "RCU not on for:
> >> arch_cpu_idle+0x0/0x2c". These are probably not related to your
> >> patchset, but rather to the fact that Ftrace is enabled in a preemptible
> >> kernel where RCU does different things.
> >>
> >> As a workaround, I disabled tracing of arch_cpu_idle() for now:
> >> ------------------
> >> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> >> index 92922dbd5b5c..6abeecbfc51d 100644
> >> --- a/arch/riscv/kernel/process.c
> >> +++ b/arch/riscv/kernel/process.c
> >> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
> >>
> >>    extern asmlinkage void ret_from_fork(void);
> >>
> >> -void arch_cpu_idle(void)
> >> +void noinstr arch_cpu_idle(void)
> >>    {
> >>          cpu_do_idle();
> >>    }
> >>
> >> ------------------
> >>
> >> 4.
> >> Stress-testing revealed an issue though, which I do not understand yet.
> >>
> >> Probably similar to what you did earlier, I ran a script that switched
> >> the current tracer to "function", "function_graph", "nop", "blk" each
> >> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
> >>
> >> The kernel usually crashed within a few minutes, in seemingly random
> >> locations, but often in one of two ways:
> >>
> >> (a) Invalid instruction, because the address of the ftrace_caller
> >> function was somehow written into the body of the traced function rather
> >> than just into the Ftrace prologue.
> > The reason for this is probably that one of your ftrace_*_call sites
> > is not 8-byte aligned.
> >
> >> In the following example, the crash happened at 0xffffffff800d3398. "b0
> >> d7" is actually not part of the code here, but rather the lower bytes of
> >> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.
> > It seems like there is a bug in patch_insn_write(). I think we should
> > at least disable migration during patch_map() and patch_unmap(). I'd
> > need some time to dig into patch_map(). But since __set_fixmap() only
> > flushes the local TLB, I'd assume it is not safe to be context-switched
> > out and migrated while holding the fix-map mapping. Adding
> > preempt_disable() and preempt_enable() around the call to
> > __patch_insn_write() solves the issue.
>
>
> Yes, Andrea already mentioned this. I came up with the same idea of
> preempt_disable(), but then I noticed arm64 actually disables IRQs: any
> idea why?
> https://lore.kernel.org/linux-riscv/CAHVXubj7ChgpvN4F_QO0oASaT5WC2VS0Q-bEqhnmF8z8QV=yDQ@mail.gmail.com/

Hi, I took a quick look, and it seems to me that this is a software
design choice. ARM uses a spinlock to protect text, and we use a
mutex. If they have a requirement to do patching while IRQs are off
(maybe in an IPI handler), then the only viable option would be to use
raw_spin_lock_irqsave. I think preempt_disable should be enough for us
if we use text_mutex to protect patching. Or am I missing something?
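
For the record, a minimal sketch of what I have in mind, reusing the
helper names from arch/riscv/kernel/patch.c (patch_insn_write() /
__patch_insn_write()); the real signature and error handling may differ:
------------------
static int patch_insn_write(void *addr, const void *insn, size_t len)
{
	int ret;

	/* Writers are already serialized by text_mutex. */
	lockdep_assert_held(&text_mutex);

	/*
	 * __set_fixmap() only flushes the local TLB, so the task must
	 * not migrate to another CPU while the fixmap window between
	 * patch_map() and patch_unmap() is live.
	 */
	preempt_disable();
	ret = __patch_insn_write(addr, insn, len);
	preempt_enable();

	return ret;
}
------------------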




>
>
> >> (gdb) disas /r 0xffffffff800d3382,+0x20
> >> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
> >> ...
> >>      0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv      a5,a4
> >>      0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j       0xffffffff800d3366 <clockevents_program_event+98>
> >>      0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw      a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
> >>      0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte  0x8000
> >>      0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte  0xffff
> >>      0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte  0xffff
> >>      0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j       0xffffffff800d3394 <clockevents_program_event+144>
> >>
> >> The backtrace usually contains one or more occurrences of
> >> return_to_handler() in this case.
> >>
> >> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
> >> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
> >> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
> >> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
> >> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
> >> ----------------------
> >>
> >> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
> >> instruction. %ra usually points right after the last instruction,
> >> "jalr a2", in return_to_handler() in such cases, so the jump was likely
> >> made from there.
> > I haven't done fgraph tests yet. I will try them out and see.
> >
> >> The problem is reproducible, although I have not found what causes it yet.
> >>
> >> Any help is appreciated, of course.
> >>
> >>>> Regards,
> >>>> Evgenii
> >>> Regards,
> >>> Andy
> > Also, here is another side note,
> >
> > It seems like the ftrace save/restore routine should save more
> > registers, as clang's fastcc may use t2 when the number of arguments
> > exceeds what the ABI defines for passing arguments through registers.
> >
> > Cheers,
> > Andy
> >

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-19 14:50                 ` Alexandre Ghiti
  2024-03-19 14:58                   ` Conor Dooley
@ 2024-03-20 16:37                   ` Andy Chiu
  1 sibling, 0 replies; 43+ messages in thread
From: Andy Chiu @ 2024-03-20 16:37 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Björn Töpel, Evgenii Shatokhin, palmer, paul.walmsley,
	aou, rostedt, mingo, peterz, jpoimboe, jbaron, ardb,
	greentime.hu, zong.li, guoren, Jessica Clarke, kernel,
	linux-riscv, linux, Samuel Holland

On Tue, Mar 19, 2024 at 10:50 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> On 11/03/2024 15:24, Andy Chiu wrote:
> > On Thu, Mar 7, 2024 at 11:57 PM Samuel Holland
> > <samuel.holland@sifive.com> wrote:
> >> Hi Alex,
> >>
> >> On 2024-03-07 7:21 AM, Alexandre Ghiti wrote:
> >>> But TBH, I have started thinking about the issue your patch is trying to deal
> >>> with. IIUC you're trying to avoid traps (or silent errors) that could happen
> >>> because of concurrent accesses when patching is happening on an auipc/jalr pair.
> >>>
> >>> I'm wondering if instead, we could not actually handle the potential traps:
> >>> before storing the auipc + jalr pair, we could use a well-identified trapping
> >>> instruction that could be recognized in the trap handler as a legitimate trap.
> >>> For example:
> >>>
> >>>
> >>> auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
> >>> jalr        XXXX        XXXX       jalr       jalr
> >>>
> >>>
> >>> If a core traps on an XXXX instruction, we know this address is being patched, so
> >>> we can return, and the patching will probably be over. We could also identify
> >>> half-patched word instructions (I mean with only XX).
> >> Unfortunately this does not work without some fence.i in the middle. The
> >> processor is free to fetch any instruction that has been written to a location
> >> since the last fence.i instruction. So it would be perfectly valid to fetch the
> >> old auipc and new jalr or vice versa and not trap. This would happen if, for
> >> example, the two instructions were in different cache lines, and only one of the
> >> cache lines got evicted and refilled.
> >>
> >> But sending an IPI to run the fence.i probably negates the performance benefit.
> > Maybe, as on x86, we can hook ftrace_replace_code() and batch-send
> > IPIs to prevent storms of remote fences. The solution Alex
> > proposed can save the code size for function entries. But we have to
> > send out remote fences at each "-->" transition, which is 4 sets of
> > remote IPIs. On the other hand, this series increases the per-function
> > patch size to 24 bytes. However, it decreases the number of remote
> > fences to 1 set.
> >
> > The performance hit could be observable for the auipc + jalr case,
> > because all remote cores will be executing on XXXX instructions and
> > take a trap at each function entry during code patching.
> >
> > Besides, this series would give us a chance not to send any remote
> > fences if we were to change only the destination of ftrace (e.g. to a
> > custom ftrace trampoline). As it would be a regular store for the
> > writer and regular load for readers, only fence w,w is needed.
> > However, I am not very certain how often this particular use case
> > would come up. I'd need some time to investigate it.
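
Roughly, the batching could look like this (a sketch only: it borrows
the generic for_ftrace_rec_iter() walker the way x86 uses it, and
__ftrace_replace_code() stands in for whatever per-site store we end
up with):
------------------
void ftrace_replace_code(int enable)
{
	struct ftrace_rec_iter *iter;

	/* Patch every call site with plain stores first... */
	for_ftrace_rec_iter(iter)
		__ftrace_replace_code(ftrace_rec_iter_record(iter), enable);

	/* ...then pay for one global fence.i, instead of one per site. */
	flush_icache_all();
}
------------------
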
> >
> >> Maybe there is some creative way to overcome this.
> >>
> >>> But please let me know if that's completely stupid and I did not understand the
> >>> problem, since my patchset to support svvptc, I am wondering if it is not more
> >>> performant to actually take very unlikely traps instead of trying to avoid them.
> >> I agree in general it is a good idea to optimize the hot path like this.
> >>
> >> Regards,
> >> Samuel
> >>
> > Regards,
> > Andy
> >
>
>
> So indeed my solution was way too naive and we've been discussing that
> with Björn lately. He worked a lot on that and came up with the solution
> he proposed here
> https://lore.kernel.org/linux-riscv/87zfv0onre.fsf@all.your.base.are.belong.to.us/
>
> The thing is, ftrace seems to be quite broken: the ftrace kselftests
> raise a lot of issues, which I have started to debug, but they are not
> that easy. So we are wondering if *someone* should work on Björn's
> solution (or another, open to discussion) for 6.10. @Andy WDYT? Do you
> have free cycles? Björn could work on that too (and I'll help if needed).

Do you mean the FTRACE_STARTUP_TEST, or something else? I am also
happy to help with text patching issues. It would be great if we could
define the remaining work and share it. Currently I am focusing on
getting dynamic ftrace to work with preemption and getting rid of
stop_machine() while patching code. I am going to spin a revision of
this patch series in a few days if possible. There are quite a few
things that need to be discussed, and I'd like to join any conversation!

>
> Let me know what you think!
>
> Alex
>
>

Cheers,
Andy

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-19 15:32         ` Evgenii Shatokhin
@ 2024-03-20 16:38           ` Andy Chiu
  0 siblings, 0 replies; 43+ messages in thread
From: Andy Chiu @ 2024-03-20 16:38 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: palmer, paul.walmsley, aou, rostedt, mingo, peterz, jpoimboe,
	jbaron, ardb, greentime.hu, zong.li, guoren, Jessica Clarke,
	kernel, linux-riscv, linux

On Tue, Mar 19, 2024 at 11:32 PM Evgenii Shatokhin
<e.shatokhin@yadro.com> wrote:
>
> Hi,
>
> On 18.03.2024 18:31, Andy Chiu wrote:
> > Hi Evgenii,
> >
> > Thanks for your help!
>
> You are welcome!
>
> >
> > I just rebased upon 6.8-rc1 and passed the stress-ng + ftrace/nop
> > testing. I will add some random-tracer tests and some optimizations
> > before sending it out again. Here are a few things needed:
> >
> > On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
> > <e.shatokhin@yadro.com> wrote:
> >>
> >> On 21.02.2024 08:27, Andy Chiu wrote:
> >>>
> >>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 13.09.2022 12:42, Andy Chiu wrote:
> >>>>> This patch removes the dependency of dynamic ftrace on calling
> >>>>> stop_machine(), and makes it compatible with kernel preemption.
> >>>>> Originally, we ran into stack corruptions, or execution of partially
> >>>>> updated instructions, when starting or stopping ftrace on a fully
> >>>>> preemptible kernel configuration. The reason is that the kernel
> >>>>> periodically calls rcu_momentary_dyntick_idle() on cores waiting for the
> >>>>> code-patching core running in ftrace. Though rcu_momentary_dyntick_idle()
> >>>>> itself is marked as notrace, it calls a bunch of traceable functions if
> >>>>> we configure the kernel as preemptible. For example, these are some
> >>>>> functions that happen to have a symbol and have not been marked as
> >>>>> notrace on a RISC-V preemptible kernel compiled with GCC-11:
> >>>>>     - __rcu_report_exp_rnp()
> >>>>>     - rcu_report_exp_cpu_mult()
> >>>>>     - rcu_preempt_deferred_qs()
> >>>>>     - rcu_preempt_need_deferred_qs()
> >>>>>     - rcu_preempt_deferred_qs_irqrestore()
> >>>>>
> >>>>> Thus, it is not ideal for us to rely on stop_machine() and hand-marked
> >>>>> "notrace"s to perform runtime code patching. To remove such a
> >>>>> dependency, we must make code updates appear atomic to running cores.
> >>>>> This is not obvious for RISC-V, since it usually uses an AUIPC + JALR
> >>>>> pair to perform a long jump, which cannot be modified and executed
> >>>>> concurrently if we consider preemption. As such, this patch proposes a
> >>>>> way to make it possible. It embeds a 32-bit relative-address word into
> >>>>> the instructions of each ftrace prologue and jumps indirectly. In this
> >>>>> way, we can store and load the address atomically, so that the
> >>>>> code-patching core can run simultaneously with the rest of the running
> >>>>> cores.
> >>>>>
> >>>>> After applying the patchset, we compiled a preemptible kernel with all
> >>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
> >>>>> machine. The kernel booted up successfully, passing all ftrace test
> >>>>> suites. Besides, we ran a script that randomly picks a tracer every
> >>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
> >>>>> contrast, a preemptible kernel without our patch would panic within a
> >>>>> few rounds on the same machine.
> >>>>>
> >>>>> However, we ran into errors when using the hwlat or irqsoff tracers
> >>>>> together with the cpu-online stressor from stress-ng on a preemptible
> >>>>> kernel. We believe the reason may be that per-cpu workers of the
> >>>>> tracers are being queued into an unbound workqueue when a cpu gets
> >>>>> offlined; fixes for this will go through the tracing tree.
> >>>>>
> >>>>> Additionally, we found patching of tracepoints unsafe, since the
> >>>>> instructions being patched are not naturally aligned. This may result
> >>>>> in two half-word stores during the code patching, which breaks
> >>>>> atomicity.
> >>>>>
> >>>>> changes in patch v2:
> >>>>>     - Enforce alignments on all functions with a compiler workaround.
> >>>>>     - Support 64bit addressing for ftrace targets if xlen == 64
> >>>>>     - Initialize ftrace target addresses to avoid calling bad address in a
> >>>>>       hypothesized case.
> >>>>>     - Use LGPTR instead of SZPTR since .align is log-scaled for
> >>>>>       mcount-dyn.S
> >>>>>     - Require the nop instruction of all jump_labels aligns naturally on
> >>>>>       4B.
> >>>>>
> >>>>> Andy Chiu (5):
> >>>>>      riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
> >>>>>        size
> >>>>>      riscv: export patch_insn_write
> >>>>>      riscv: ftrace: use indirect jump to work with kernel preemption
> >>>>>      riscv: ftrace: do not use stop_machine to update code
> >>>>>      riscv: align arch_static_branch function
> >>>>>
> >>>>>     arch/riscv/Makefile                 |   2 +-
> >>>>>     arch/riscv/include/asm/ftrace.h     |  24 ----
> >>>>>     arch/riscv/include/asm/jump_label.h |   2 +
> >>>>>     arch/riscv/include/asm/patch.h      |   1 +
> >>>>>     arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
> >>>>>     arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
> >>>>>     arch/riscv/kernel/patch.c           |   4 +-
> >>>>>     7 files changed, 188 insertions(+), 93 deletions(-)
> >>>>>
> >>>>
> >>>> First of all, thank you for working on making dynamic Ftrace robust in
> >>>> preemptible kernels on RISC-V.
> >>>> It is an important use case but, for now, dynamic Ftrace and related
> >>>> tracers cannot be safely used with such kernels.
> >>>>
> >>>> Are there any updates on this series?
> >>>> It needs a rebase, of course, but it looks doable.
> >>>>
> >>>> If I understand the discussion correctly, the only blocker was that
> >>>> using "-falign-functions" was not enough to properly align cold
> >>>> functions and "-fno-guess-branch-probability" would likely have a
> >>>> performance cost.
> >>>>
> >>>> It seems, GCC developers have recently provided a workaround for that
> >>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
> >>>>
> >>>> "-fmin-function-alignment" should help but, I do not know, which GCC
> >>>> versions have got that patch already. In the meantime, one could
> >>>> probably check if "-fmin-function-alignment" is supported by the
> >>>> compiler and use it, if it is.
> >>>>
> >>>> Thoughts?
> >>>
> >>> Hi Evgenii,
> >>>
> >>> Thanks for the update. Indeed, it is essential for this patch that the
> >>> toolchain provide forced alignment. We can test this flag in the
> >>> Makefile to sort out whether the toolchain supports it or not. Meanwhile,
> >>> I had figured out a way for this to work on any 2-byte-aligned address
> >>> but hadn't implemented it yet. Basically, it would require more
> >>> patching space for us to do software alignment. I would opt for the
> >>> special toolchain flag if the toolchain supports it.
> >>>
> >>> Let me take some time to look and get back to you soon.
> >>
> >> Thank you! Looking forward to it.
> >>
> >> In case it helps, here is what I have checked so far.
> >>
> >> 1.
> >> I added the patch
> >> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
> >> to the current revision of GCC 13.2.0 from RISC-V toolchain.
> >>
> >> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
> >> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
> >>
> >> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
> >> preemption").
> >>
> >> Switched from -falign-functions=4 to -fmin-function-alignment=4:
> >> ------------------
> >> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> >> index b33b787c8b07..dcd0adeebaae 100644
> >> --- a/arch/riscv/Makefile
> >> +++ b/arch/riscv/Makefile
> >> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> >>          LDFLAGS_vmlinux += --no-relax
> >>          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> >>    ifeq ($(CONFIG_RISCV_ISA_C),y)
> >> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
> >> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -fmin-function-alignment=4
> >>    else
> >> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
> >> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
> >>    endif
> >>    endif
> >>
> >> ------------------
> >>
> >> As far as I can see from objdump, the functions that were not aligned at
> >> 4-byte boundary with -falign-functions=4, are now aligned correctly with
> >> -fmin-function-alignment=4.
> >>
> >> 2.
> >> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
> >>
> >> The boottime tests for Ftrace had passed, except the tests for
> >> function_graph. I described the failure and the possible fix here:
> >> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/
> >
> > Indeed, this is needed. I am not sure why the ftrace boot-time tests
> > passed back then. Thank you for solving it!
> >
> >>
> >> 3.
> >> There were also boottime warnings about "RCU not on for:
> >> arch_cpu_idle+0x0/0x2c". These are probably not related to your
> >> patchset, but rather to the fact that Ftrace is enabled in a preemptible
> >> kernel where RCU does different things.
> >>
> >> As a workaround, I disabled tracing of arch_cpu_idle() for now:
> >> ------------------
> >> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> >> index 92922dbd5b5c..6abeecbfc51d 100644
> >> --- a/arch/riscv/kernel/process.c
> >> +++ b/arch/riscv/kernel/process.c
> >> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
> >>
> >>    extern asmlinkage void ret_from_fork(void);
> >>
> >> -void arch_cpu_idle(void)
> >> +void noinstr arch_cpu_idle(void)
> >>    {
> >>          cpu_do_idle();
> >>    }
> >>
> >> ------------------
> >>
> >> 4.
> >> Stress-testing revealed an issue though, which I do not understand yet.
> >>
> >> Probably similar to what you did earlier, I ran a script that switched
> >> the current tracer to "function", "function_graph", "nop", "blk" each
> >> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
> >>
> >> The kernel usually crashed within a few minutes, in seemingly random
> >> locations, but often in one of two ways:
> >>
> >> (a) Invalid instruction, because the address of the ftrace_caller
> >> function was somehow written into the body of the traced function rather
> >> than just into the Ftrace prologue.
> >
> > The reason for this is probably that one of your ftrace_*_call sites
> > is not 8-byte aligned.
>
> I thought all locations where the address of a ftrace_caller function
> is written are 8-byte aligned, given that the compiler guarantees that
> the start addresses of all functions are 4-byte aligned. Your patchset
> provides 2 kinds of function prologues exactly for that purpose. Am I
> missing something?

Yes, that's true, and it is the first step of ftrace, i.e. jumping
into a ftrace trampoline. The second step of ftrace is to jump to the
actual ftrace handler function. We have to use an 8-byte-aligned .text
address to store the pointer to the handler, so that it can be
atomically patched, or loaded, in dynamic ftrace.
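
Conceptually, the 8-byte-aligned slot behaves like this (a
userspace-style sketch with C11 atomics, not the kernel code; the slot
and helper names are made up for illustration):
------------------
#include <stdatomic.h>
#include <stdint.h>

/* One 8-byte-aligned slot per function prologue holds the address of
 * the ftrace handler; an aligned 8-byte access is atomic on RV64. */
_Alignas(8) static _Atomic uint64_t ftrace_target;

/* The patching core publishes a new handler with one aligned store:
 * other cores observe either the old or the new address, never half. */
static void patcher_set_target(uint64_t handler)
{
	atomic_store_explicit(&ftrace_target, handler, memory_order_release);
}

/* What the patched prologue effectively does before jumping indirectly. */
static uint64_t prologue_load_target(void)
{
	return atomic_load_explicit(&ftrace_target, memory_order_acquire);
}
------------------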

>
> >
> >>
> >> In the following example, the crash happened at 0xffffffff800d3398. "b0
> >> d7" is actually not part of the code here, but rather the lower bytes of
> >> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.
> >
> > It seems like there is a bug in patch_insn_write(). I think we should
> > at least disable migration during patch_map() and patch_unmap(). I'd
> > need some time to dig into patch_map(). But since __set_fixmap() only
> > flushes the local TLB, I'd assume it is not safe to be context-switched
> > out and migrated while holding the fix-map mapping. Adding
> > preempt_disable() and preempt_enable() around the call to
> > __patch_insn_write() solves the issue.
> >
>
> Interesting.
> Thanks for pointing that out! I never thought that the task could migrate
> to a different CPU while patch_insn_write() is running. If it could,
> that would cause such issues, sure. And probably the issues with
> "function_graph" too, if some data were corrupted that way rather than code.

I found another issue with function_graph and preemptible Vector,
though it is not directly related to function_graph. Currently we don't
support calling schedule() within kernel_vector_{begin,end}. However,
this could be inevitable with ftrace + preemption. For example, a
preemptible vectorized uaccess could call into return_to_handler, and
then into schedule(), when returning from kernel_vector_begin(). This
can cause the following Vector operation to fail with an illegal
instruction, because VS was turned off during the context switch.

        kernel_vector_begin();
        //=> return_to_handler
        //==> ... schedule()
        remain = __asm_vector_usercopy(dst, src, n);
        kernel_vector_end();

Here is what we could do if we were to support calling schedule() while
in an active preempt_v.

 static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
 {
        asm volatile (
@@ -243,6 +248,11 @@ static inline void __switch_to_vector(struct task_struct *prev,
        struct pt_regs *regs;

        if (riscv_preempt_v_started(prev)) {
+               if (riscv_v_is_on()) {
+                       WARN_ON(prev->thread.riscv_v_flags & RISCV_V_CTX_DEPTH_MASK);
+                       riscv_v_disable();
+                       prev->thread.riscv_v_flags |= RISCV_PREEMPT_V_IN_SCHEDULE;
+               }
                if (riscv_preempt_v_dirty(prev)) {
                        __riscv_v_vstate_save(&prev->thread.kernel_vstate,
                                              prev->thread.kernel_vstate.datap);
@@ -253,10 +263,16 @@ static inline void __switch_to_vector(struct task_struct *prev,
                riscv_v_vstate_save(&prev->thread.vstate, regs);
        }

-       if (riscv_preempt_v_started(next))
-               riscv_preempt_v_set_restore(next);
-       else
+       if (riscv_preempt_v_started(next)) {
+               if (next->thread.riscv_v_flags & RISCV_PREEMPT_V_IN_SCHEDULE) {
+                       next->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_IN_SCHEDULE;
+                       riscv_v_enable();
+               } else {
+                       riscv_preempt_v_set_restore(next);
+               }
+       } else {
                riscv_v_vstate_set_restore(next, task_pt_regs(next));
+       }

 }

>
> >>
> >> (gdb) disas /r 0xffffffff800d3382,+0x20
> >> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
> >> ...
> >>      0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv      a5,a4
> >>      0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j       0xffffffff800d3366 <clockevents_program_event+98>
> >>      0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw      a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
> >>      0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte  0x8000
> >>      0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte  0xffff
> >>      0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte  0xffff
> >>      0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j       0xffffffff800d3394 <clockevents_program_event+144>
> >>
> >> The backtrace usually contains one or more occurrences of
> >> return_to_handler() in this case.
> >>
> >> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
> >> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
> >> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
> >> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
> >> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
> >> ----------------------
> >>
> >> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
> >> instruction. %ra usually points right after the last instruction,
> >> "jalr a2", in return_to_handler() in such cases, so the jump was likely
> >> made from there.
> >
> > I haven't done fgraph tests yet. I will try them out and see.

With the above being fixed, I can pass several hundred rounds (and
counting) of the random tracer + stress-ng --hrtimers test.


> >
> >>
> >> The problem is reproducible, although I have not found what causes it yet.
> >>
> >> Any help is appreciated, of course.
> >>
> >>>
> >>>>
> >>>> Regards,
> >>>> Evgenii
> >>>
> >>> Regards,
> >>> Andy
> >>
> >
> > Also, here is another side note,
> >
> > It seems like the ftrace save/restore routine should save more
> > registers, as clang's fastcc may use t2 when the number of arguments
> > exceeds what the ABI defines for passing arguments through registers.
>
> Yes, I reported that issue to LLVM maintainers in
> https://github.com/llvm/llvm-project/issues/83111. It seems static
> functions with 9+ arguments use t2, t3, etc. for the 9th and 10th
> arguments when compiled with clang.
>
> Clang seems to leave t0 and t1 alone, but I do not know yet if it is
> just a coincidence. I haven't found the exact rules for the fastcc
> calling convention on RISC-V so far.
>
> A compiler option to disable fastcc for the Linux kernel builds would be
> great. But, it seems, the discussion with LLVM maintainers will go
> nowhere without benchmarks to show whether that optimization has any
> significant effect. I plan to find and run proper benchmarks when I have
> time, but not just yet.
>
> >
> > Cheers,
> > Andy
>
> Regards,
> Evgenii
>
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
  2024-03-20 16:36           ` Andy Chiu
@ 2024-03-21 11:02             ` Alexandre Ghiti
  0 siblings, 0 replies; 43+ messages in thread
From: Alexandre Ghiti @ 2024-03-21 11:02 UTC (permalink / raw)
  To: Andy Chiu
  Cc: Evgenii Shatokhin, palmer, paul.walmsley, aou, rostedt, mingo,
	peterz, jpoimboe, jbaron, ardb, greentime.hu, zong.li, guoren,
	Jessica Clarke, kernel, linux-riscv, linux

On 20/03/2024 17:36, Andy Chiu wrote:
> On Wed, Mar 20, 2024 at 1:37 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> Hi Andy,
>>
>> On 18/03/2024 16:31, Andy Chiu wrote:
>>> Hi Evgenii,
>>>
>>> Thanks for your help!
>>>
>>> I just rebased upon 6.8-rc1 and passed the stress-ng + ftrace/nop
>>> testing. I will add some random-tracer tests and some optimizations
>>> before sending it out again. Here are a few things needed:
>>>
>>> On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
>>> <e.shatokhin@yadro.com> wrote:
>>>> On 21.02.2024 08:27, Andy Chiu wrote:
>>>>> «Attention! This message is from an external sender!»
>>>>>
>>>>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 13.09.2022 12:42, Andy Chiu wrote:
>>>>>>> This patch removes the dependency of dynamic ftrace on calling
>>>>>>> stop_machine(), and makes it compatible with kernel preemption.
>>>>>>> Originally, we ran into stack corruptions, or execution of partially
>>>>>>> updated instructions, when starting or stopping ftrace on a fully
>>>>>>> preemptible kernel configuration. The reason is that the kernel
>>>>>>> periodically calls rcu_momentary_dyntick_idle() on cores waiting for the
>>>>>>> code-patching core running in ftrace. Though rcu_momentary_dyntick_idle()
>>>>>>> itself is marked as notrace, it calls a bunch of traceable functions if
>>>>>>> we configure the kernel as preemptible. For example, these are some
>>>>>>> functions that happen to have a symbol and have not been marked as
>>>>>>> notrace on a RISC-V preemptible kernel compiled with GCC-11:
>>>>>>>      - __rcu_report_exp_rnp()
>>>>>>>      - rcu_report_exp_cpu_mult()
>>>>>>>      - rcu_preempt_deferred_qs()
>>>>>>>      - rcu_preempt_need_deferred_qs()
>>>>>>>      - rcu_preempt_deferred_qs_irqrestore()
>>>>>>>
>>>>>>> Thus, it is not ideal for us to rely on stop_machine() and hand-marked
>>>>>>> "notrace"s to perform runtime code patching. To remove such a
>>>>>>> dependency, we must make code updates appear atomic to running cores.
>>>>>>> This is not obvious for RISC-V, since it usually uses an AUIPC + JALR
>>>>>>> pair to perform a long jump, which cannot be modified and executed
>>>>>>> concurrently if we consider preemption. As such, this patch proposes a
>>>>>>> way to make it possible. It embeds a 32-bit relative-address word into
>>>>>>> the instructions of each ftrace prologue and jumps indirectly. In this
>>>>>>> way, we can store and load the address atomically, so that the
>>>>>>> code-patching core can run simultaneously with the rest of the running
>>>>>>> cores.
>>>>>>>
>>>>>>> After applying the patchset, we compiled a preemptible kernel with all
>>>>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
>>>>>>> machine. The kernel booted up successfully, passing all ftrace test
>>>>>>> suites. Besides, we ran a script that randomly picks a tracer every
>>>>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
>>>>>>> contrast, a preemptible kernel without our patch would panic within a
>>>>>>> few rounds on the same machine.
>>>>>>>
>>>>>>> However, we ran into errors when using the hwlat or irqsoff tracers
>>>>>>> together with the cpu-online stressor from stress-ng on a preemptible
>>>>>>> kernel. We believe the reason may be that per-cpu workers of the
>>>>>>> tracers are being queued into an unbound workqueue when a cpu gets
>>>>>>> offlined; fixes for this will go through the tracing tree.
>>>>>>>
>>>>>>> Additionally, we found patching of tracepoints unsafe, since the
>>>>>>> instructions being patched are not naturally aligned. This may result
>>>>>>> in two half-word stores during the code patching, which breaks
>>>>>>> atomicity.
>>>>>>>
>>>>>>> changes in patch v2:
>>>>>>>      - Enforce alignments on all functions with a compiler workaround.
>>>>>>>      - Support 64bit addressing for ftrace targets if xlen == 64
>>>>>>>      - Initialize ftrace target addresses to avoid calling bad address in a
>>>>>>>        hypothesized case.
>>>>>>>      - Use LGPTR instead of SZPTR since .align is log-scaled for
>>>>>>>        mcount-dyn.S
>>>>>>>      - Require the nop instruction of all jump_labels aligns naturally on
>>>>>>>        4B.
>>>>>>>
>>>>>>> Andy Chiu (5):
>>>>>>>       riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
>>>>>>>         size
>>>>>>>       riscv: export patch_insn_write
>>>>>>>       riscv: ftrace: use indirect jump to work with kernel preemption
>>>>>>>       riscv: ftrace: do not use stop_machine to update code
>>>>>>>       riscv: align arch_static_branch function
>>>>>>>
>>>>>>>      arch/riscv/Makefile                 |   2 +-
>>>>>>>      arch/riscv/include/asm/ftrace.h     |  24 ----
>>>>>>>      arch/riscv/include/asm/jump_label.h |   2 +
>>>>>>>      arch/riscv/include/asm/patch.h      |   1 +
>>>>>>>      arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
>>>>>>>      arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
>>>>>>>      arch/riscv/kernel/patch.c           |   4 +-
>>>>>>>      7 files changed, 188 insertions(+), 93 deletions(-)
>>>>>>>
>>>>>> First of all, thank you for working on making dynamic Ftrace robust in
>>>>>> preemptible kernels on RISC-V.
>>>>>> It is an important use case but, for now, dynamic Ftrace and related
>>>>>> tracers cannot be safely used with such kernels.
>>>>>>
>>>>>> Are there any updates on this series?
>>>>>> It needs a rebase, of course, but it looks doable.
>>>>>>
>>>>>> If I understand the discussion correctly, the only blocker was that
>>>>>> using "-falign-functions" was not enough to properly align cold
>>>>>> functions and "-fno-guess-branch-probability" would likely have a
>>>>>> performance cost.
>>>>>>
>>>>>> It seems, GCC developers have recently provided a workaround for that
>>>>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
>>>>>>
>>>>>> "-fmin-function-alignment" should help but, I do not know, which GCC
>>>>>> versions have got that patch already. In the meantime, one could
>>>>>> probably check if "-fmin-function-alignment" is supported by the
>>>>>> compiler and use it, if it is.
>>>>>>
>>>>>> Thoughts?
>>>>> Hi Evgenii,
>>>>>
>>>>> Thanks for the update. Indeed, it is essential for this patch that the
>>>>> toolchain provide forced alignment. We can test this flag in the
>>>>> Makefile to sort out whether the toolchain supports it or not. Meanwhile,
>>>>> I had figured out a way for this to work on any 2-byte-aligned address
>>>>> but hadn't implemented it yet. Basically, it would require more
>>>>> patching space for us to do software alignment. I would opt for the
>>>>> special toolchain flag if the toolchain supports it.
>>>>>
>>>>> Let me take some time to look and get back to you soon.
>>>> Thank you! Looking forward to it.
>>>>
>>>> In case it helps, here is what I have checked so far.
>>>>
>>>> 1.
>>>> I added the patch
>>>> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
>>>> to the current revision of GCC 13.2.0 from RISC-V toolchain.
>>>>
>>>> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
>>>> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
>>>>
>>>> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
>>>> preemption").
>>>>
>>>> Switched from -falign-functions=4 to -fmin-function-alignment=4:
>>>> ------------------
>>>> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
>>>> index b33b787c8b07..dcd0adeebaae 100644
>>>> --- a/arch/riscv/Makefile
>>>> +++ b/arch/riscv/Makefile
>>>> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>>>>           LDFLAGS_vmlinux += --no-relax
>>>>           KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>>>>     ifeq ($(CONFIG_RISCV_ISA_C),y)
>>>> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
>>>> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -fmin-function-alignment=4
>>>>     else
>>>> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
>>>> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
>>>>     endif
>>>>     endif
>>>>
>>>> ------------------
>>>>
>>>> As far as I can see from objdump, the functions that were not aligned at
>>>> 4-byte boundary with -falign-functions=4, are now aligned correctly with
>>>> -fmin-function-alignment=4.
>>>>
>>>> 2.
>>>> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
>>>>
>>>> The boottime tests for Ftrace had passed, except the tests for
>>>> function_graph. I described the failure and the possible fix here:
>>>> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/
>>> Indeed, this is needed. I am not sure why the ftrace boot-time tests
>>> passed back then. Thank you for solving it!
>>>
>>>> 3.
>>>> There were also boottime warnings about "RCU not on for:
>>>> arch_cpu_idle+0x0/0x2c". These are probably not related to your
>>>> patchset, but rather to the fact that Ftrace is enabled in a preemptible
>>>> kernel where RCU does different things.
>>>>
>>>> As a workaround, I disabled tracing of arch_cpu_idle() for now:
>>>> ------------------
>>>> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
>>>> index 92922dbd5b5c..6abeecbfc51d 100644
>>>> --- a/arch/riscv/kernel/process.c
>>>> +++ b/arch/riscv/kernel/process.c
>>>> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
>>>>
>>>>     extern asmlinkage void ret_from_fork(void);
>>>>
>>>> -void arch_cpu_idle(void)
>>>> +void noinstr arch_cpu_idle(void)
>>>>     {
>>>>           cpu_do_idle();
>>>>     }
>>>>
>>>> ------------------
>>>>
>>>> 4.
>>>> Stress-testing revealed an issue though, which I do not understand yet.
>>>>
>>>> Probably similar to what you did earlier, I ran a script that switched
>>>> the current tracer to "function", "function_graph", "nop", "blk" each
>>>> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
>>>>
>>>> The kernel usually crashed within a few minutes, in seemingly random
>>>> locations, but often in one of two ways:
>>>>
>>>> (a) Invalid instruction, because the address of the ftrace_caller
>>>> function was somehow written into the body of the traced function rather
>>>> than just into the Ftrace prologue.
>>> The reason for this is probably that one of your ftrace_*_call sites
>>> is not 8-byte aligned.
>>>
>>>> In the following example, the crash happened at 0xffffffff800d3398. "b0
>>>> d7" is actually not part of the code here, but rather the lower bytes of
>>>> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.
>>> It seems like there is a bug in patch_insn_write(). I think we should
>>> at least disable migration during patch_map() and patch_unmap(). I'd
>>> need some time to dig into patch_map(). But since __set_fixmap() only
>>> flushes the local TLB, I'd assume it is not safe to be context-switched
>>> out and migrated while holding the fix-map mapping. Adding
>>> preempt_disable() and preempt_enable() around the call to
>>> __patch_insn_write() solves the issue.
>>
>> Yes, Andrea already mentioned this. I came up with the same idea of
>> preempt_disable(), but then I noticed arm64 actually disables IRQs: any
>> idea why?
>> https://lore.kernel.org/linux-riscv/CAHVXubj7ChgpvN4F_QO0oASaT5WC2VS0Q-bEqhnmF8z8QV=yDQ@mail.gmail.com/
> Hi, I took a quick look, and it seems to me that this is a software
> design choice. ARM uses a spinlock to protect text, and we use a
> mutex. If they have a requirement to do patching while IRQs are off
> (maybe in an IPI handler), then the only viable option would be to use
> raw_spin_lock_irqsave. I think preempt_disable should be enough for us
> if we use text_mutex to protect patching. Or am I missing something?


I agree with you, I convinced myself that it should be enough :)

Do you intend to send this patch? Or should I? I have another small fix 
for ftrace, so I don't mind sending this one. Up to you, we just need to 
make sure it lands in 6.9 :)

Thanks


>
>
>
>
>>
>>>> (gdb) disas /r 0xffffffff800d3382,+0x20
>>>> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
>>>> ...
>>>>       0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv      a5,a4
>>>>       0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j       0xffffffff800d3366 <clockevents_program_event+98>
>>>>       0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw      a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
>>>>       0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte  0x8000
>>>>       0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte  0xffff
>>>>       0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte  0xffff
>>>>       0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j       0xffffffff800d3394 <clockevents_program_event+144>
>>>>
>>>> The backtrace usually contains one or more occurrences of
>>>> return_to_handler() in this case.
>>>>
>>>> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
>>>> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>>> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
>>>> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>>> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
>>>> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>>> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
>>>> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
>>>> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
>>>> ----------------------
>>>>
>>>> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
>>>> instruction. %ra usually points right after the last instruction,
>>>> "jalr a2", in return_to_handler() in such cases, so the jump was likely
>>>> made from there.
>>> I haven't done fgraph tests yet. I will try them out and see.
>>>
>>>> The problem is reproducible, although I have not found what causes it yet.
>>>>
>>>> Any help is appreciated, of course.
>>>>
>>>>>> Regards,
>>>>>> Evgenii
>>>>> Regards,
>>>>> Andy
>>> Also, here is another side note,
>>>
>>> It seems like the ftrace save/restore routine should save more
>>> registers, as clang's fastcc may use t2 when the number of arguments
>>> exceeds what the ABI defines for passing arguments through registers.
>>>
>>> Cheers,
>>> Andy
>>>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2024-03-21 11:02 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
2022-09-15 13:53   ` Guo Ren
2022-09-17  1:15     ` Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 2/5] riscv: export patch_insn_write Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
2022-09-14 13:45   ` Guo Ren
2022-09-15 13:30     ` Guo Ren
2022-09-17  1:04     ` Andy Chiu
2022-09-17 10:56       ` Guo Ren
2024-02-20 14:17   ` Evgenii Shatokhin
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 4/5] riscv: ftrace: do not use stop_machine to update code Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
2022-09-14 14:06   ` Guo Ren
2022-09-16 23:54     ` Andy Chiu
2022-09-17  0:22       ` Guo Ren
2022-09-17 18:17         ` [PATCH] riscv: jump_label: Optimize size with RISCV_ISA_C guoren
2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
2022-09-17 23:49           ` Guo Ren
2022-09-17 23:59           ` Guo Ren
2022-09-18  0:12           ` Jessica Clarke
2022-09-18  0:46             ` Guo Ren
2022-09-14 14:24   ` Jessica Clarke
2022-09-15  1:47     ` Guo Ren
2022-09-15  2:34       ` Jessica Clarke
2024-02-13 19:42 ` [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Evgenii Shatokhin
2024-02-21  5:27   ` Andy Chiu
2024-02-21 16:55     ` Evgenii Shatokhin
2024-03-06 20:57       ` Alexandre Ghiti
2024-03-07  8:35         ` Evgenii Shatokhin
2024-03-07 12:27         ` Andy Chiu
2024-03-07 13:21           ` Alexandre Ghiti
2024-03-07 15:57             ` Samuel Holland
2024-03-11 14:24               ` Andy Chiu
2024-03-19 14:50                 ` Alexandre Ghiti
2024-03-19 14:58                   ` Conor Dooley
2024-03-20 16:37                   ` Andy Chiu
2024-03-18 15:31       ` Andy Chiu
2024-03-19 15:32         ` Evgenii Shatokhin
2024-03-20 16:38           ` Andy Chiu
2024-03-19 17:37         ` Alexandre Ghiti
2024-03-20 16:36           ` Andy Chiu
2024-03-21 11:02             ` Alexandre Ghiti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.