linux-kernel.vger.kernel.org archive mirror
* [PATCH -next V6 0/7] riscv: Optimize function trace
@ 2023-01-07 13:35 guoren
  2023-01-07 13:35 ` [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption guoren
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

The previous ftrace detour implementation, afc76b8b8011 ("riscv: Using
PATCHABLE_FUNCTION_ENTRY instead of MCOUNT"), contains the following
problems.

 - The most severe bug is a preemption panic, found by Andy [1].
   Let's disable preemption for ftrace first, and Andy can continue
   the ftrace preemption work.
 - The "-fpatchable-function-entry=" CFLAG wastes code size when
   !RISCV_ISA_C.
 - The ftrace detour implementation wastes code size.
 - When livepatching, the trampoline (ftrace_regs_caller) would not
   return to <func_prolog+12> but would rather jump to the new function.
   So, "REG_L ra, -SZREG(sp)" would not run and the original return
   address would not be restored. The kernel is likely to hang or crash
   as a result. (Found by Evgenii Shatokhin [4])

Patches 1-3 fix the problems above.

Patches 4-7 are features based on the reduced detour code patch; we
include them in the series for testing and maintenance.

You can directly try it with:
https://github.com/guoren83/linux/tree/ftrace_fixup_v6

Make function graph use ftrace directly [2] (patch 4, 5)
======================================================== 

In the RISC-V architecture, when the ftrace_graph tracer is enabled on
some functions, function tracing on other functions suffers extra
graph tracing work. In essence, graph_ops isn't limited by its
func_hash because of the global ftrace_graph_[regs]_call label. That
should be corrected.

The inspiration is commit 0c0593b45c9b ("x86/ftrace: Make function
graph use ftrace directly"), which uses the graph_ops::func callback
to install the return_hooker, so that the function is only called
against its own func_hash.

This series of patches makes function graph use ftrace directly for
riscv.

If FTRACE_WITH_REGS isn't defined, ftrace_caller keeps the
ftrace_graph_call site so that it can be replaced with a call to
prepare_ftrace_return by the enable/disable helpers.

When FTRACE_WITH_REGS is defined, ftrace_caller is adjusted to save
the necessary regs against the pt_regs layout, so it can reasonably
call the graph_ops::func callback, ftrace_graph_func. The
ftrace_graph_[regs]_call sites and their enable/disable helpers are no
longer needed.

Test log:

The tests generated by CONFIG_FTRACE_STARTUP_TEST have passed on a
local qemu-system-riscv64 virt machine. The following is the log
during startup.

```
Nov 15 03:07:13 stage4 kernel: Testing tracer function: PASSED
Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace: PASSED
Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace ops #1: 
Nov 15 03:07:13 stage4 kernel: (1 0 1 0 0)  
Nov 15 03:07:13 stage4 kernel: (1 1 2 0 0)  
Nov 15 03:07:13 stage4 kernel: (2 1 3 0 365) 
Nov 15 03:07:13 stage4 kernel: (2 2 4 0 399) 
Nov 15 03:07:13 stage4 kernel: (3 2 4 0 146071) 
Nov 15 03:07:13 stage4 kernel: (3 3 5 0 146105) PASSED
Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace ops #2: 
Nov 15 03:07:13 stage4 kernel: (1 0 1 589 0)  
Nov 15 03:07:13 stage4 kernel: (1 1 2 635 0)  
Nov 15 03:07:13 stage4 kernel: (2 1 3 1 2)  
Nov 15 03:07:13 stage4 kernel: (2 2 4 125 126) 
Nov 15 03:07:13 stage4 kernel: (3 2 4 146001 146078) 
Nov 15 03:07:13 stage4 kernel: (3 3 5 146035 146112) PASSED
Nov 15 03:07:13 stage4 kernel: Testing ftrace recursion: PASSED
Nov 15 03:07:13 stage4 kernel: Testing ftrace recursion safe: PASSED
Nov 15 03:07:13 stage4 kernel: Testing ftrace regs: PASSED
Nov 15 03:07:13 stage4 kernel: Testing tracer nop: PASSED
Nov 15 03:07:13 stage4 kernel: Testing tracer irqsoff: PASSED
Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup:
Nov 15 03:07:13 stage4 kernel: sched: DL replenish lagged too much
Nov 15 03:07:13 stage4 kernel: PASSED
Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup_rt: PASSED
Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup_dl: PASSED
Nov 15 03:07:13 stage4 kernel: Testing tracer function_graph: PASSED
```

Add WITH_DIRECT_CALLS support [3] (patch 6, 7)
==============================================

This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.
SAMPLE_FTRACE_DIRECT and SAMPLE_FTRACE_DIRECT_MULTI are also included
here as the samples for testing DIRECT_CALLS related interface.

First, DYNAMIC_FTRACE_WITH_DIRECT_CALLS is selected to provide the
register_ftrace_direct[_multi] interfaces, allowing users to register
a custom trampoline (direct_caller) as the mcount for one or more
target functions. modify_ftrace_direct[_multi] are also provided for
modifying the direct_caller.

At the same time, the samples in ./samples/ftrace/ can be built as
kernel modules for testing these interfaces, with SAMPLE_FTRACE_DIRECT
and SAMPLE_FTRACE_DIRECT_MULTI selected.

Second, to make the direct_caller and the other ftrace hooks
(e.g. the function/fgraph tracers, k[ret]probes) co-exist, a temporary
register is nominated to store the address of direct_caller in
ftrace_regs_caller. After direct_ops->func sets the direct_caller
address and RESTORE_REGS runs in ftrace_regs_caller, direct_caller is
jumped to by the `jr` instruction.

The following tests have passed on my local qemu-riscv64 virt
machine.

1. tests with CONFIG_FTRACE_STARTUP_TEST
2. tests of samples/ftrace/ftrace*.ko
3. manual tests with any combination of the following hooks
  - function/function_graph tracer 
  - ftrace*.ko
  - kprobe/kretprobe

For your reference, here is the log when the function tracer, a
kretprobe and ftrace-direct-too.ko hook the handle_mm_fault function
simultaneously.

```
[root@stage4 tracing]# echo handle_mm_fault > set_ftrace_filter
[root@stage4 tracing]# echo 'r:myr handle_mm_fault' > kprobe_events
[root@stage4 tracing]# echo function > current_tracer 
[root@stage4 tracing]# echo 1 > events/kprobes/myr/enable 
[root@stage4 tracing]# insmod /root/ftrace-direct-too.ko 
[root@stage4 tracing]# 
[root@stage4 tracing]# cat trace | tail
             cat-388     [000] ...1.   583.051438: myr:
(do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
             cat-388     [000] ...2.   583.057930: handle_mm_fault
<-do_page_fault
             cat-388     [000] .....   583.057990: my_direct_func:
handle mm fault vma=000000002d9fe19c address=ffffffae9b7000 flags=215
             cat-388     [000] ...1.   583.058284: myr:
(do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
            tail-389     [001] ...2.   583.059062: handle_mm_fault
<-do_page_fault
            tail-389     [001] .....   583.059104: my_direct_func:
handle mm fault vma=0000000017f3c48e address=aaaaaabebf3000 flags=215
            tail-389     [001] ...1.   583.059325: myr:
(do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
            tail-389     [001] ...2.   583.060371: handle_mm_fault
<-do_page_fault
            tail-389     [001] .....   583.060410: my_direct_func:
handle mm fault vma=0000000017f3c48e address=aaaaaabebf1000 flags=255
            tail-389     [001] ...1.   583.060996: myr:
(do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
```
Note 1:

checkpatch.pl outputs some warnings on this series, like this:

```
WARNING: Prefer using '"%s...", __func__' to using 'my_direct_func2',
this function's name, in a string
111: FILE: samples/ftrace/ftrace-direct-multi-modify.c:48:
+"       call my_direct_func2\n"
```

The reason is that checkpatch depends on the patch context to provide
the function name. In the warning above, my_direct_func2 is declared
some lines away from the changed hunk, so its declaration doesn't
appear in the patch context, and the warning is emitted.

You may notice that the location of the `my_ip` variable changes in
the 2nd patch. I did that to reduce the warnings to some extent, but
killing all of them would make the patch less readable, so I stopped
there.

[1] https://lpc.events/event/16/contributions/1171/
[2] https://lore.kernel.org/lkml/20221120084230.910152-1-suagrfillet@gmail.com/
[3] https://lore.kernel.org/linux-riscv/20221123142025.1504030-1-suagrfillet@gmail.com/ 
[4] https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/

Changes in v6:
 - Replace 8 with MCOUNT_INSN_SIZE
 - Replace "REG_L a1, PT_RA(sp)" with "mv a1, ra"
 - Add Evgenii Shatokhin comment

Changes in v5:
https://lore.kernel.org/linux-riscv/20221208091244.203407-1-guoren@kernel.org/
 - Sort Kconfig entries in alphabetical order.

Changes in v4:
https://lore.kernel.org/linux-riscv/20221129033230.255947-1-guoren@kernel.org/
 - Include [3] for maintenance. [Song Shuai]

Changes in V3:
https://lore.kernel.org/linux-riscv/20221123153950.2911981-1-guoren@kernel.org/
 - Include [2] for maintenance. [Song Shuai]

Changes in V2:
https://lore.kernel.org/linux-riscv/20220921034910.3142465-1-guoren@kernel.org/
 - Add Signed-off for preemption fixup.

Changes in V1:
https://lore.kernel.org/linux-riscv/20220916103817.9490-1-guoren@kernel.org/

Andy Chiu (1):
  riscv: ftrace: Fixup panic by disabling preemption

Guo Ren (2):
  riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
  riscv: ftrace: Reduce the detour code size to half

Song Shuai (4):
  riscv: ftrace: Add ftrace_graph_func
  riscv: ftrace: Make ftrace_caller call ftrace_graph_func
  riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
  samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]

 arch/riscv/Kconfig                          |   3 +-
 arch/riscv/Makefile                         |   6 +-
 arch/riscv/include/asm/ftrace.h             |  71 ++++++--
 arch/riscv/kernel/ftrace.c                  |  91 ++++------
 arch/riscv/kernel/mcount-dyn.S              | 181 +++++++++++++-------
 samples/ftrace/ftrace-direct-modify.c       |  33 ++++
 samples/ftrace/ftrace-direct-multi-modify.c |  37 ++++
 samples/ftrace/ftrace-direct-multi.c        |  22 +++
 samples/ftrace/ftrace-direct-too.c          |  26 +++
 samples/ftrace/ftrace-direct.c              |  22 +++
 10 files changed, 355 insertions(+), 137 deletions(-)

-- 
2.36.1



* [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-09 17:19   ` Mark Rutland
  2023-01-07 13:35 ` [PATCH -next V6 2/7] riscv: ftrace: Remove wasted nops for !RISCV_ISA_C guoren
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel

From: Andy Chiu <andy.chiu@sifive.com>

On RISC-V, an AUIPC + JALR pair is used to encode an immediate,
forming a jump that can reach an address more than 4K away. This may
cause errors if we want to enable kernel preemption and remove the
dependency of code patching on stop_machine(). For example, if a task
is switched out right after executing the auipc, and the ftrace
function is changed before the task is switched back, the jalr then
jumps to an address whose updated bits [11:0] are mixed with the
previous bits [XLEN-1:12].

p: patched area performed by dynamic ftrace
ftrace_prologue:
p|      REG_S   ra, -SZREG(sp)
p|      auipc   ra, 0x? ------------> preempted
					...
				change ftrace function
					...
p|      jalr    -?(ra) <------------- switched back
p|      REG_L   ra, -SZREG(sp)
func:
	xxx
	ret

Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..ee0d39b26794 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -138,7 +138,7 @@ config RISCV
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
+	select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
 
 config ARCH_MMAP_RND_BITS_MIN
 	default 18 if 64BIT
-- 
2.36.1



* [PATCH -next V6 2/7] riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
  2023-01-07 13:35 ` [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-07 13:35 ` [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half guoren
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

When CONFIG_RISCV_ISA_C=n, -fpatchable-function-entry=8 generates more
nops than we expect, because the compiler treats each nop as the
4-byte opcode 0x00000013 instead of the 2-byte compressed 0x0001.

Dump of assembler code for function dw_pcie_free_msi:
   0xffffffff806fce94 <+0>:     sd      ra,-8(sp)
   0xffffffff806fce98 <+4>:     auipc   ra,0xff90f
   0xffffffff806fce9c <+8>:     jalr    -684(ra) # 0xffffffff8000bbec
<ftrace_caller>
   0xffffffff806fcea0 <+12>:    ld      ra,-8(sp)
   0xffffffff806fcea4 <+16>:    nop /* wasted */
   0xffffffff806fcea8 <+20>:    nop /* wasted */
   0xffffffff806fceac <+24>:    nop /* wasted */
   0xffffffff806fceb0 <+28>:    nop /* wasted */
   0xffffffff806fceb4 <+0>:     addi    sp,sp,-48
   0xffffffff806fceb8 <+4>:     sd      s0,32(sp)
   0xffffffff806fcebc <+8>:     sd      s1,24(sp)
   0xffffffff806fcec0 <+12>:    sd      s2,16(sp)
   0xffffffff806fcec4 <+16>:    sd      s3,8(sp)
   0xffffffff806fcec8 <+20>:    sd      ra,40(sp)
   0xffffffff806fcecc <+24>:    addi    s0,sp,48

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 12d91b0a73d8..ea5a91da6897 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -11,7 +11,11 @@ LDFLAGS_vmlinux :=
 ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
 	LDFLAGS_vmlinux := --no-relax
 	KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifeq ($(CONFIG_RISCV_ISA_C),y)
 	CC_FLAGS_FTRACE := -fpatchable-function-entry=8
+else
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=4
+endif
 endif
 
 ifeq ($(CONFIG_CMODEL_MEDLOW),y)
-- 
2.36.1



* [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
  2023-01-07 13:35 ` [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption guoren
  2023-01-07 13:35 ` [PATCH -next V6 2/7] riscv: ftrace: Remove wasted nops for !RISCV_ISA_C guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-10 17:13   ` Evgenii Shatokhin
  2023-01-07 13:35 ` [PATCH -next V6 4/7] riscv: ftrace: Add ftrace_graph_func guoren
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Use a temporary register to reduce the size of the detour code from 16
bytes to 8 bytes. The previous implementation comes from commit
afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of
MCOUNT").

Before the patch:
<func_prolog>:
 0: REG_S  ra, -SZREG(sp)
 4: auipc  ra, ?
 8: jalr   ?(ra)
12: REG_L  ra, -SZREG(sp)
 (func_body)

After the patch:
<func_prolog>:
 0: auipc  t0, ?
 4: jalr   t0, ?(t0)
 (func_body)

This patch not only reduces the size of the detour code, but also
fixes an important issue:

An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
actually change the instruction pointer, e.g. to "replace" the given
kernel function with a new one, which is needed for livepatching, etc.

In this case, the trampoline (ftrace_regs_caller) would not return to
<func_prolog+12> but would rather jump to the new function. So, "REG_L
ra, -SZREG(sp)" would not run and the original return address would not
be restored. The kernel is likely to hang or crash as a result.

This can be easily demonstrated if one tries to "replace", say,
cmdline_proc_show() with a new function with the same signature using
instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
callback.

Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
Co-developed-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
---
 arch/riscv/Makefile             |  4 +-
 arch/riscv/include/asm/ftrace.h | 50 +++++++++++++++++++------
 arch/riscv/kernel/ftrace.c      | 65 ++++++++++-----------------------
 arch/riscv/kernel/mcount-dyn.S  | 43 +++++++++-------------
 4 files changed, 76 insertions(+), 86 deletions(-)

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index ea5a91da6897..3c9aaf67ed79 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -12,9 +12,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
 	LDFLAGS_vmlinux := --no-relax
 	KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifeq ($(CONFIG_RISCV_ISA_C),y)
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=8
-else
 	CC_FLAGS_FTRACE := -fpatchable-function-entry=4
+else
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=2
 endif
 endif
 
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 04dad3380041..9e73922e1e2e 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -42,6 +42,14 @@ struct dyn_arch_ftrace {
  * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
  *          return address (original pc + 4)
  *
+ *<ftrace enable>:
+ * 0: auipc  t0/ra, 0x?
+ * 4: jalr   t0/ra, ?(t0/ra)
+ *
+ *<ftrace disable>:
+ * 0: nop
+ * 4: nop
+ *
  * Dynamic ftrace generates probes to call sites, so we must deal with
  * both auipc and jalr at the same time.
  */
@@ -52,25 +60,43 @@ struct dyn_arch_ftrace {
 #define AUIPC_OFFSET_MASK	(0xfffff000)
 #define AUIPC_PAD		(0x00001000)
 #define JALR_SHIFT		20
-#define JALR_BASIC		(0x000080e7)
-#define AUIPC_BASIC		(0x00000097)
+#define JALR_RA			(0x000080e7)
+#define AUIPC_RA		(0x00000097)
+#define JALR_T0			(0x000282e7)
+#define AUIPC_T0		(0x00000297)
 #define NOP4			(0x00000013)
 
-#define make_call(caller, callee, call)					\
+#define to_jalr_t0(offset)						\
+	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
+
+#define to_auipc_t0(offset)						\
+	((offset & JALR_SIGN_MASK) ?					\
+	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_T0) :	\
+	((offset & AUIPC_OFFSET_MASK) | AUIPC_T0))
+
+#define make_call_t0(caller, callee, call)				\
 do {									\
-	call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -	\
-				(unsigned long)caller));		\
-	call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -	\
-			       (unsigned long)caller));			\
+	unsigned int offset =						\
+		(unsigned long) callee - (unsigned long) caller;	\
+	call[0] = to_auipc_t0(offset);					\
+	call[1] = to_jalr_t0(offset);					\
 } while (0)
 
-#define to_jalr_insn(offset)						\
-	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
+#define to_jalr_ra(offset)						\
+	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
 
-#define to_auipc_insn(offset)						\
+#define to_auipc_ra(offset)						\
 	((offset & JALR_SIGN_MASK) ?					\
-	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :	\
-	((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
+	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :	\
+	((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
+
+#define make_call_ra(caller, callee, call)				\
+do {									\
+	unsigned int offset =						\
+		(unsigned long) callee - (unsigned long) caller;	\
+	call[0] = to_auipc_ra(offset);					\
+	call[1] = to_jalr_ra(offset);					\
+} while (0)
 
 /*
  * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 2086f6585773..5bff37af4770 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -55,12 +55,15 @@ static int ftrace_check_current_call(unsigned long hook_pos,
 }
 
 static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
-				bool enable)
+				bool enable, bool ra)
 {
 	unsigned int call[2];
 	unsigned int nops[2] = {NOP4, NOP4};
 
-	make_call(hook_pos, target, call);
+	if (ra)
+		make_call_ra(hook_pos, target, call);
+	else
+		make_call_t0(hook_pos, target, call);
 
 	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
 	if (patch_text_nosync
@@ -70,42 +73,13 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
 	return 0;
 }
 
-/*
- * Put 5 instructions with 16 bytes at the front of function within
- * patchable function entry nops' area.
- *
- * 0: REG_S  ra, -SZREG(sp)
- * 1: auipc  ra, 0x?
- * 2: jalr   -?(ra)
- * 3: REG_L  ra, -SZREG(sp)
- *
- * So the opcodes is:
- * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
- * 1: 0x???????? -> auipc
- * 2: 0x???????? -> jalr
- * 3: 0xff813083 (ld)/0xffc12083 (lw)
- */
-#if __riscv_xlen == 64
-#define INSN0	0xfe113c23
-#define INSN3	0xff813083
-#elif __riscv_xlen == 32
-#define INSN0	0xfe112e23
-#define INSN3	0xffc12083
-#endif
-
-#define FUNC_ENTRY_SIZE	16
-#define FUNC_ENTRY_JMP	4
-
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
-	unsigned int call[4] = {INSN0, 0, 0, INSN3};
-	unsigned long target = addr;
-	unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
+	unsigned int call[2];
 
-	call[1] = to_auipc_insn((unsigned int)(target - caller));
-	call[2] = to_jalr_insn((unsigned int)(target - caller));
+	make_call_t0(rec->ip, addr, call);
 
-	if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
+	if (patch_text_nosync((void *)rec->ip, call, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -114,15 +88,14 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
 		    unsigned long addr)
 {
-	unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
+	unsigned int nops[2] = {NOP4, NOP4};
 
-	if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
+	if (patch_text_nosync((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
 }
 
-
 /*
  * This is called early on, and isn't wrapped by
  * ftrace_arch_code_modify_{prepare,post_process}() and therefor doesn't hold
@@ -144,10 +117,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
 	int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
-				       (unsigned long)func, true);
+				       (unsigned long)func, true, true);
 	if (!ret) {
 		ret = __ftrace_modify_call((unsigned long)&ftrace_regs_call,
-					   (unsigned long)func, true);
+					   (unsigned long)func, true, true);
 	}
 
 	return ret;
@@ -159,16 +132,16 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		       unsigned long addr)
 {
 	unsigned int call[2];
-	unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
+	unsigned long caller = rec->ip;
 	int ret;
 
-	make_call(caller, old_addr, call);
+	make_call_t0(caller, old_addr, call);
 	ret = ftrace_check_current_call(caller, call);
 
 	if (ret)
 		return ret;
 
-	return __ftrace_modify_call(caller, addr, true);
+	return __ftrace_modify_call(caller, addr, true, false);
 }
 #endif
 
@@ -203,12 +176,12 @@ int ftrace_enable_ftrace_graph_caller(void)
 	int ret;
 
 	ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, true);
+				    (unsigned long)&prepare_ftrace_return, true, true);
 	if (ret)
 		return ret;
 
 	return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
-				    (unsigned long)&prepare_ftrace_return, true);
+				    (unsigned long)&prepare_ftrace_return, true, true);
 }
 
 int ftrace_disable_ftrace_graph_caller(void)
@@ -216,12 +189,12 @@ int ftrace_disable_ftrace_graph_caller(void)
 	int ret;
 
 	ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, false);
+				    (unsigned long)&prepare_ftrace_return, false, true);
 	if (ret)
 		return ret;
 
 	return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
-				    (unsigned long)&prepare_ftrace_return, false);
+				    (unsigned long)&prepare_ftrace_return, false, true);
 }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index d171eca623b6..b75332ced757 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -13,8 +13,8 @@
 
 	.text
 
-#define FENTRY_RA_OFFSET	12
-#define ABI_SIZE_ON_STACK	72
+#define FENTRY_RA_OFFSET	8
+#define ABI_SIZE_ON_STACK	80
 #define ABI_A0			0
 #define ABI_A1			8
 #define ABI_A2			16
@@ -23,10 +23,10 @@
 #define ABI_A5			40
 #define ABI_A6			48
 #define ABI_A7			56
-#define ABI_RA			64
+#define ABI_T0			64
+#define ABI_RA			72
 
 	.macro SAVE_ABI
-	addi	sp, sp, -SZREG
 	addi	sp, sp, -ABI_SIZE_ON_STACK
 
 	REG_S	a0, ABI_A0(sp)
@@ -37,6 +37,7 @@
 	REG_S	a5, ABI_A5(sp)
 	REG_S	a6, ABI_A6(sp)
 	REG_S	a7, ABI_A7(sp)
+	REG_S	t0, ABI_T0(sp)
 	REG_S	ra, ABI_RA(sp)
 	.endm
 
@@ -49,24 +50,18 @@
 	REG_L	a5, ABI_A5(sp)
 	REG_L	a6, ABI_A6(sp)
 	REG_L	a7, ABI_A7(sp)
+	REG_L	t0, ABI_T0(sp)
 	REG_L	ra, ABI_RA(sp)
 
 	addi	sp, sp, ABI_SIZE_ON_STACK
-	addi	sp, sp, SZREG
 	.endm
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 	.macro SAVE_ALL
-	addi	sp, sp, -SZREG
 	addi	sp, sp, -PT_SIZE_ON_STACK
 
-	REG_S x1,  PT_EPC(sp)
-	addi	sp, sp, PT_SIZE_ON_STACK
-	REG_L x1,  (sp)
-	addi	sp, sp, -PT_SIZE_ON_STACK
+	REG_S t0,  PT_EPC(sp)
 	REG_S x1,  PT_RA(sp)
-	REG_L x1,  PT_EPC(sp)
-
 	REG_S x2,  PT_SP(sp)
 	REG_S x3,  PT_GP(sp)
 	REG_S x4,  PT_TP(sp)
@@ -100,11 +95,8 @@
 	.endm
 
 	.macro RESTORE_ALL
+	REG_L t0,  PT_EPC(sp)
 	REG_L x1,  PT_RA(sp)
-	addi	sp, sp, PT_SIZE_ON_STACK
-	REG_S x1,  (sp)
-	addi	sp, sp, -PT_SIZE_ON_STACK
-	REG_L x1,  PT_EPC(sp)
 	REG_L x2,  PT_SP(sp)
 	REG_L x3,  PT_GP(sp)
 	REG_L x4,  PT_TP(sp)
@@ -137,17 +129,16 @@
 	REG_L x31, PT_T6(sp)
 
 	addi	sp, sp, PT_SIZE_ON_STACK
-	addi	sp, sp, SZREG
 	.endm
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 
 ENTRY(ftrace_caller)
 	SAVE_ABI
 
-	addi	a0, ra, -FENTRY_RA_OFFSET
+	addi	a0, t0, -FENTRY_RA_OFFSET
 	la	a1, function_trace_op
 	REG_L	a2, 0(a1)
-	REG_L	a1, ABI_SIZE_ON_STACK(sp)
+	mv	a1, ra
 	mv	a3, sp
 
 ftrace_call:
@@ -155,8 +146,8 @@ ftrace_call:
 	call	ftrace_stub
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	addi	a0, sp, ABI_SIZE_ON_STACK
-	REG_L	a1, ABI_RA(sp)
+	addi	a0, sp, ABI_RA
+	REG_L	a1, ABI_T0(sp)
 	addi	a1, a1, -FENTRY_RA_OFFSET
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	mv	a2, s0
@@ -166,17 +157,17 @@ ftrace_graph_call:
 	call	ftrace_stub
 #endif
 	RESTORE_ABI
-	ret
+	jr t0
 ENDPROC(ftrace_caller)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 ENTRY(ftrace_regs_caller)
 	SAVE_ALL
 
-	addi	a0, ra, -FENTRY_RA_OFFSET
+	addi	a0, t0, -FENTRY_RA_OFFSET
 	la	a1, function_trace_op
 	REG_L	a2, 0(a1)
-	REG_L	a1, PT_SIZE_ON_STACK(sp)
+	mv	a1, ra
 	mv	a3, sp
 
 ftrace_regs_call:
@@ -185,7 +176,7 @@ ftrace_regs_call:
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	addi	a0, sp, PT_RA
-	REG_L	a1, PT_EPC(sp)
+	REG_L	a1, PT_T0(sp)
 	addi	a1, a1, -FENTRY_RA_OFFSET
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	mv	a2, s0
@@ -196,6 +187,6 @@ ftrace_graph_regs_call:
 #endif
 
 	RESTORE_ALL
-	ret
+	jr t0
 ENDPROC(ftrace_regs_caller)
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
-- 
2.36.1



* [PATCH -next V6 4/7] riscv: ftrace: Add ftrace_graph_func
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
                   ` (2 preceding siblings ...)
  2023-01-07 13:35 ` [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-07 13:35 ` [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func guoren
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel

From: Song Shuai <suagrfillet@gmail.com>

This patch implements ftrace_graph_func as the function graph tracing
function when FTRACE_WITH_REGS is defined.

ftrace_graph_func gets the parent IP and the frame pointer from fregs
and calls prepare_ftrace_return for function graph tracing.

If FTRACE_WITH_REGS isn't defined, the enable/disable helpers of
ftrace_graph_[regs]_call are revised to serve only ftrace_graph_call
in the !FTRACE_WITH_REGS version of ftrace_caller.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/ftrace.h | 13 +++++++++++--
 arch/riscv/kernel/ftrace.c      | 30 +++++++++++++-----------------
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 9e73922e1e2e..84f856a3286e 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -107,8 +107,17 @@ do {									\
 struct dyn_ftrace;
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
 #define ftrace_init_nop ftrace_init_nop
-#endif
 
-#endif
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+struct ftrace_ops;
+struct ftrace_regs;
+void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
+		       struct ftrace_ops *op, struct ftrace_regs *fregs);
+#define ftrace_graph_func ftrace_graph_func
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_DYNAMIC_FTRACE */
 
 #endif /* _ASM_RISCV_FTRACE_H */
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 5bff37af4770..95e14d8161a4 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -169,32 +169,28 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 }
 
 #ifdef CONFIG_DYNAMIC_FTRACE
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
+		       struct ftrace_ops *op, struct ftrace_regs *fregs)
+{
+	struct pt_regs *regs = arch_ftrace_get_regs(fregs);
+	unsigned long *parent = (unsigned long *)&regs->ra;
+
+	prepare_ftrace_return(parent, ip, frame_pointer(regs));
+}
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 extern void ftrace_graph_call(void);
-extern void ftrace_graph_regs_call(void);
 int ftrace_enable_ftrace_graph_caller(void)
 {
-	int ret;
-
-	ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, true, true);
-	if (ret)
-		return ret;
-
-	return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
+	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
 				    (unsigned long)&prepare_ftrace_return, true, true);
 }
 
 int ftrace_disable_ftrace_graph_caller(void)
 {
-	int ret;
-
-	ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, false, true);
-	if (ret)
-		return ret;
-
-	return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
+	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
 				    (unsigned long)&prepare_ftrace_return, false, true);
 }
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 #endif /* CONFIG_DYNAMIC_FTRACE */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
                   ` (3 preceding siblings ...)
  2023-01-07 13:35 ` [PATCH -next V6 4/7] riscv: ftrace: Add ftrace_graph_func guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-10 17:16   ` Evgenii Shatokhin
  2023-01-07 13:35 ` [PATCH -next V6 6/7] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support guoren
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel

From: Song Shuai <suagrfillet@gmail.com>

In order to make the function graph use ftrace directly, ftrace_caller
should be adjusted to save the necessary regs against the pt_regs layout
so it can call ftrace_graph_func properly.

SAVE_ALL currently saves all the regs according to the pt_regs struct.
Replace SAVE_ALL with SAVE_ABI_REGS, which takes an extra option to
allow saving only the ABI-related regs that ftrace_caller needs.

ftrace_caller and ftrace_regs_caller save their regs with the respective
option of SAVE_ABI_REGS, then call the tracing function, in particular
graph_ops's ftrace_graph_func. So the ftrace_graph_[regs]_call labels
aren't needed anymore when FTRACE_WITH_REGS is defined.

As the previous patch described, ftrace_caller keeps its
ftrace_graph_call if FTRACE_WITH_REGS isn't defined.

For convenience, the original argument setup for the tracing function in
ftrace_[regs]_caller is factored out as PREPARE_ARGS.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/kernel/mcount-dyn.S | 142 ++++++++++++++++++++++++---------
 1 file changed, 104 insertions(+), 38 deletions(-)

diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index b75332ced757..d7d4d51b4bd7 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -57,19 +57,52 @@
 	.endm
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-	.macro SAVE_ALL
+
+/**
+* SAVE_ABI_REGS - save regs against the pt_regs struct
+*
+* @all: whether to save all the regs
+*
+* If all is set, all the regs will be saved, otherwise only the ABI
+* related regs (a0-a7, epc, ra and optionally s0) will be saved.
+*
+* After the stack is established,
+*
+* 0(sp) stores the PC of the traced function which can be accessed
+* by &(fregs)->regs->epc in the tracing function. Note that the real
+* function entry address should be computed with -FENTRY_RA_OFFSET.
+*
+* 8(sp) stores the function return address (i.e. parent IP) that
+* can be accessed by &(fregs)->regs->ra in the tracing function.
+*
+* The other regs are saved at their respective locations and accessed
+* via the respective pt_regs members.
+*
+* Here is the stack layout for your reference.
+*
+* PT_SIZE_ON_STACK  ->  +++++++++
+*                       + ..... +
+*                       + t3-t6 +
+*                       + s2-s11+
+*                       + a0-a7 + --++++-> ftrace_caller saved
+*                       + s1    +   +
+*                       + s0    + --+
+*                       + t0-t2 +   +
+*                       + tp    +   +
+*                       + gp    +   +
+*                       + sp    +   +
+*                       + ra    + --+ // parent IP
+*               sp  ->  + epc   + --+ // PC
+*                       +++++++++
+**/
+	.macro SAVE_ABI_REGS, all=0
 	addi	sp, sp, -PT_SIZE_ON_STACK
 
 	REG_S t0,  PT_EPC(sp)
 	REG_S x1,  PT_RA(sp)
-	REG_S x2,  PT_SP(sp)
-	REG_S x3,  PT_GP(sp)
-	REG_S x4,  PT_TP(sp)
-	REG_S x5,  PT_T0(sp)
-	REG_S x6,  PT_T1(sp)
-	REG_S x7,  PT_T2(sp)
-	REG_S x8,  PT_S0(sp)
-	REG_S x9,  PT_S1(sp)
+
+	// always save the ABI regs
+
 	REG_S x10, PT_A0(sp)
 	REG_S x11, PT_A1(sp)
 	REG_S x12, PT_A2(sp)
@@ -78,6 +111,18 @@
 	REG_S x15, PT_A5(sp)
 	REG_S x16, PT_A6(sp)
 	REG_S x17, PT_A7(sp)
+
+	// save the leftover regs
+
+	.if \all == 1
+	REG_S x2,  PT_SP(sp)
+	REG_S x3,  PT_GP(sp)
+	REG_S x4,  PT_TP(sp)
+	REG_S x5,  PT_T0(sp)
+	REG_S x6,  PT_T1(sp)
+	REG_S x7,  PT_T2(sp)
+	REG_S x8,  PT_S0(sp)
+	REG_S x9,  PT_S1(sp)
 	REG_S x18, PT_S2(sp)
 	REG_S x19, PT_S3(sp)
 	REG_S x20, PT_S4(sp)
@@ -92,19 +137,19 @@
 	REG_S x29, PT_T4(sp)
 	REG_S x30, PT_T5(sp)
 	REG_S x31, PT_T6(sp)
+
+	// save s0 if FP_TEST defined
+
+	.else
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+	REG_S x8,  PT_S0(sp)
+#endif
+	.endif
 	.endm
 
-	.macro RESTORE_ALL
+	.macro RESTORE_ABI_REGS, all=0
 	REG_L t0,  PT_EPC(sp)
 	REG_L x1,  PT_RA(sp)
-	REG_L x2,  PT_SP(sp)
-	REG_L x3,  PT_GP(sp)
-	REG_L x4,  PT_TP(sp)
-	REG_L x5,  PT_T0(sp)
-	REG_L x6,  PT_T1(sp)
-	REG_L x7,  PT_T2(sp)
-	REG_L x8,  PT_S0(sp)
-	REG_L x9,  PT_S1(sp)
 	REG_L x10, PT_A0(sp)
 	REG_L x11, PT_A1(sp)
 	REG_L x12, PT_A2(sp)
@@ -113,6 +158,16 @@
 	REG_L x15, PT_A5(sp)
 	REG_L x16, PT_A6(sp)
 	REG_L x17, PT_A7(sp)
+
+	.if \all == 1
+	REG_L x2,  PT_SP(sp)
+	REG_L x3,  PT_GP(sp)
+	REG_L x4,  PT_TP(sp)
+	REG_L x5,  PT_T0(sp)
+	REG_L x6,  PT_T1(sp)
+	REG_L x7,  PT_T2(sp)
+	REG_L x8,  PT_S0(sp)
+	REG_L x9,  PT_S1(sp)
 	REG_L x18, PT_S2(sp)
 	REG_L x19, PT_S3(sp)
 	REG_L x20, PT_S4(sp)
@@ -128,10 +183,25 @@
 	REG_L x30, PT_T5(sp)
 	REG_L x31, PT_T6(sp)
 
+	.else
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+	REG_L x8,  PT_S0(sp)
+#endif
+	.endif
 	addi	sp, sp, PT_SIZE_ON_STACK
 	.endm
+
+	.macro PREPARE_ARGS
+	addi	a0, t0, -FENTRY_RA_OFFSET	// ip
+	la	a1, function_trace_op
+	REG_L	a2, 0(a1)			// op
+	mv	a1, ra				// parent_ip
+	mv	a3, sp				// fregs
+	.endm
+
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 ENTRY(ftrace_caller)
 	SAVE_ABI
 
@@ -160,33 +230,29 @@ ftrace_graph_call:
 	jr t0
 ENDPROC(ftrace_caller)
 
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 ENTRY(ftrace_regs_caller)
-	SAVE_ALL
-
-	addi	a0, t0, -FENTRY_RA_OFFSET
-	la	a1, function_trace_op
-	REG_L	a2, 0(a1)
-	mv	a1, ra
-	mv	a3, sp
+	SAVE_ABI_REGS 1
+	PREPARE_ARGS
 
 ftrace_regs_call:
 	.global ftrace_regs_call
 	call	ftrace_stub
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	addi	a0, sp, PT_RA
-	REG_L	a1, PT_T0(sp)
-	addi	a1, a1, -FENTRY_RA_OFFSET
-#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	mv	a2, s0
-#endif
-ftrace_graph_regs_call:
-	.global ftrace_graph_regs_call
-	call	ftrace_stub
-#endif
 
-	RESTORE_ALL
+	RESTORE_ABI_REGS 1
 	jr t0
 ENDPROC(ftrace_regs_caller)
+
+ENTRY(ftrace_caller)
+	SAVE_ABI_REGS 0
+	PREPARE_ARGS
+
+ftrace_call:
+	.global ftrace_call
+	call	ftrace_stub
+
+	RESTORE_ABI_REGS 0
+	jr t0
+ENDPROC(ftrace_caller)
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -next V6 6/7] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
                   ` (4 preceding siblings ...)
  2023-01-07 13:35 ` [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-07 13:35 ` [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
  2023-01-11 10:11 ` [PATCH -next V6 0/7] riscv: Optimize function trace Song Shuai
  7 siblings, 0 replies; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel

From: Song Shuai <suagrfillet@gmail.com>

This patch adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.

Select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide the
register_ftrace_direct[_multi] interfaces, which allow users to
register a custom trampoline (direct_caller) as the mcount for one or
more target functions. modify_ftrace_direct[_multi] are also provided
for modifying direct_caller.

To make direct_caller and the other ftrace hooks (e.g. function/fgraph
tracer, k[ret]probes) co-exist, a temporary register (t1) is nominated
to store the address of direct_caller in ftrace_regs_caller. After
direct_ops->func sets the direct_caller address and RESTORE_ABI_REGS
runs in ftrace_regs_caller, a `jr` instruction jumps to direct_caller.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig              | 1 +
 arch/riscv/include/asm/ftrace.h | 8 ++++++++
 arch/riscv/kernel/mcount-dyn.S  | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ee0d39b26794..307a9f413edd 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -135,6 +135,7 @@ config RISCV
 	select UACCESS_MEMCPY if !MMU
 	select ZONE_DMA32 if 64BIT
 	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && $(cc-option,-fpatchable-function-entry=8)
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 84f856a3286e..84904c1e4369 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -114,6 +114,14 @@ struct ftrace_regs;
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs);
 #define ftrace_graph_func ftrace_graph_func
+
+static inline void
+__arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigned long addr)
+{
+		regs->t1 = addr;
+}
+#define arch_ftrace_set_direct_caller(fregs, addr) \
+	__arch_ftrace_set_direct_caller(&(fregs)->regs, addr)
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index d7d4d51b4bd7..552f622ef2b6 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -232,6 +232,7 @@ ENDPROC(ftrace_caller)
 
 #else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 ENTRY(ftrace_regs_caller)
+	move	t1, zero
 	SAVE_ABI_REGS 1
 	PREPARE_ARGS
 
@@ -241,7 +242,10 @@ ftrace_regs_call:
 
 
 	RESTORE_ABI_REGS 1
+	bnez	t1,.Ldirect
 	jr t0
+.Ldirect:
+	jr t1
 ENDPROC(ftrace_regs_caller)
 
 ENTRY(ftrace_caller)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
                   ` (5 preceding siblings ...)
  2023-01-07 13:35 ` [PATCH -next V6 6/7] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support guoren
@ 2023-01-07 13:35 ` guoren
  2023-01-10 13:08   ` Evgenii Shatokhin
                     ` (2 more replies)
  2023-01-11 10:11 ` [PATCH -next V6 0/7] riscv: Optimize function trace Song Shuai
  7 siblings, 3 replies; 28+ messages in thread
From: guoren @ 2023-01-07 13:35 UTC (permalink / raw)
  To: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, guoren
  Cc: linux-riscv, linux-kernel

From: Song Shuai <suagrfillet@gmail.com>

Select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
for ARCH_RV64I in arch/riscv/Kconfig, and add riscv asm code for
the ftrace-direct*.c files in samples/ftrace/.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
 samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
 samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
 samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
 samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
 5 files changed, 140 insertions(+)

diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index de5a0f67f320..be7bf472c3c7 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -23,6 +23,39 @@ extern void my_tramp2(void *);
 
 static unsigned long my_ip = (unsigned long)schedule;
 
+#ifdef CONFIG_RISCV
+
+asm ("	.pushsection    .text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:\n"
+"	addi sp,sp,-16\n"
+"	sd   t0,0(sp)\n"
+"	sd   ra,8(sp)\n"
+"	call my_direct_func1\n"
+"	ld   t0,0(sp)\n"
+"	ld   ra,8(sp)\n"
+"	addi sp,sp,16\n"
+"	jr t0\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:\n"
+"	addi sp,sp,-16\n"
+"	sd   t0,0(sp)\n"
+"	sd   ra,8(sp)\n"
+"	call my_direct_func2\n"
+"	ld   t0,0(sp)\n"
+"	ld   ra,8(sp)\n"
+"	addi sp,sp,16\n"
+"	jr t0\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_RISCV */
+
 #ifdef CONFIG_X86_64
 
 #include <asm/ibt.h>
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index d52370cad0b6..10884bf418f7 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
 extern void my_tramp1(void *);
 extern void my_tramp2(void *);
 
+#ifdef CONFIG_RISCV
+
+asm ("	.pushsection    .text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:\n"
+"       addi sp,sp,-24\n"
+"       sd   a0,0(sp)\n"
+"       sd   t0,8(sp)\n"
+"       sd   ra,16(sp)\n"
+"       call my_direct_func1\n"
+"       ld   a0,0(sp)\n"
+"       ld   t0,8(sp)\n"
+"       ld   ra,16(sp)\n"
+"       addi sp,sp,24\n"
+"	jr t0\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:\n"
+"       addi sp,sp,-24\n"
+"       sd   a0,0(sp)\n"
+"       sd   t0,8(sp)\n"
+"       sd   ra,16(sp)\n"
+"       call my_direct_func2\n"
+"       ld   a0,0(sp)\n"
+"       ld   t0,8(sp)\n"
+"       ld   ra,16(sp)\n"
+"       addi sp,sp,24\n"
+"	jr t0\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_RISCV */
+
 #ifdef CONFIG_X86_64
 
 #include <asm/ibt.h>
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index ec1088922517..a35bf43bf6d7 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
 
 extern void my_tramp(void *);
 
+#ifdef CONFIG_RISCV
+
+asm ("       .pushsection    .text, \"ax\", @progbits\n"
+"       .type           my_tramp, @function\n"
+"       .globl          my_tramp\n"
+"   my_tramp:\n"
+"       addi sp,sp,-24\n"
+"       sd   a0,0(sp)\n"
+"       sd   t0,8(sp)\n"
+"       sd   ra,16(sp)\n"
+"       call my_direct_func\n"
+"       ld   a0,0(sp)\n"
+"       ld   t0,8(sp)\n"
+"       ld   ra,16(sp)\n"
+"       addi sp,sp,24\n"
+"       jr t0\n"
+"       .size           my_tramp, .-my_tramp\n"
+"       .popsection\n"
+);
+
+#endif /* CONFIG_RISCV */
+
 #ifdef CONFIG_X86_64
 
 #include <asm/ibt.h>
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index e13fb59a2b47..3b62e33c2e6d 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
 
 extern void my_tramp(void *);
 
+#ifdef CONFIG_RISCV
+
+asm ("       .pushsection    .text, \"ax\", @progbits\n"
+"       .type           my_tramp, @function\n"
+"       .globl          my_tramp\n"
+"   my_tramp:\n"
+"       addi sp,sp,-40\n"
+"       sd   a0,0(sp)\n"
+"       sd   a1,8(sp)\n"
+"       sd   a2,16(sp)\n"
+"       sd   t0,24(sp)\n"
+"       sd   ra,32(sp)\n"
+"       call my_direct_func\n"
+"       ld   a0,0(sp)\n"
+"       ld   a1,8(sp)\n"
+"       ld   a2,16(sp)\n"
+"       ld   t0,24(sp)\n"
+"       ld   ra,32(sp)\n"
+"       addi sp,sp,40\n"
+"       jr t0\n"
+"       .size           my_tramp, .-my_tramp\n"
+"       .popsection\n"
+);
+
+#endif /* CONFIG_RISCV */
+
 #ifdef CONFIG_X86_64
 
 #include <asm/ibt.h>
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index 1f769d0db20f..2cfe5a7d2d70 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
 
 extern void my_tramp(void *);
 
+#ifdef CONFIG_RISCV
+
+asm ("       .pushsection    .text, \"ax\", @progbits\n"
+"       .type           my_tramp, @function\n"
+"       .globl          my_tramp\n"
+"   my_tramp:\n"
+"       addi sp,sp,-24\n"
+"       sd   a0,0(sp)\n"
+"       sd   t0,8(sp)\n"
+"       sd   ra,16(sp)\n"
+"       call my_direct_func\n"
+"       ld   a0,0(sp)\n"
+"       ld   t0,8(sp)\n"
+"       ld   ra,16(sp)\n"
+"       addi sp,sp,24\n"
+"       jr t0\n"
+"       .size           my_tramp, .-my_tramp\n"
+"       .popsection\n"
+);
+
+#endif /* CONFIG_RISCV */
+
 #ifdef CONFIG_X86_64
 
 #include <asm/ibt.h>
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-07 13:35 ` [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption guoren
@ 2023-01-09 17:19   ` Mark Rutland
  2023-01-11 13:22     ` Guo Ren
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2023-01-09 17:19 UTC (permalink / raw)
  To: guoren
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Sat, Jan 07, 2023 at 08:35:43AM -0500, guoren@kernel.org wrote:
> From: Andy Chiu <andy.chiu@sifive.com>
> 
> In RISCV, we must use an AUIPC + JALR pair to encode an immediate,
> forming a jump that jumps to an address over 4K. This may cause errors
> if we want to enable kernel preemption and remove dependency from
> patching code with stop_machine(). For example, if a task was switched
> out on auipc. And, if we changed the ftrace function before it was
> switched back, then it would jump to an address that has updated 11:0
> bits mixing with previous XLEN:12 part.
> 
> p: patched area performed by dynamic ftrace
> ftrace_prologue:
> p|      REG_S   ra, -SZREG(sp)
> p|      auipc   ra, 0x? ------------> preempted
> 					...
> 				change ftrace function
> 					...
> p|      jalr    -?(ra) <------------- switched back
> p|      REG_L   ra, -SZREG(sp)
> func:
> 	xxx
> 	ret

What happens on SMP but not !PREEMPTION; can't a CPU be in the middle of this
while you're patching the sequence?

Do you have any guarantee as to the atomicity and ordering of instruction
fetches?

Thanks,
Mark.

> 
> Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  arch/riscv/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..ee0d39b26794 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -138,7 +138,7 @@ config RISCV
>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
>  	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
>  	select HAVE_FUNCTION_GRAPH_TRACER
> -	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
> +	select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
>  
>  config ARCH_MMAP_RND_BITS_MIN
>  	default 18 if 64BIT
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-07 13:35 ` [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
@ 2023-01-10 13:08   ` Evgenii Shatokhin
  2023-01-10 13:50   ` Evgenii Shatokhin
  2023-01-11  9:50   ` Song Shuai
  2 siblings, 0 replies; 28+ messages in thread
From: Evgenii Shatokhin @ 2023-01-10 13:08 UTC (permalink / raw)
  To: guoren, suagrfillet
  Cc: linux-riscv, linux-kernel, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	andy.chiu

On 07.01.2023 16:35, guoren@kernel.org wrote:
> From: Song Shuai <suagrfillet@gmail.com>
> 
> select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> for ARCH_RV64I in arch/riscv/Kconfig. And add riscv asm code for
> the ftrace-direct*.c files in samples/ftrace/.

This patch does not actually change arch/riscv/Kconfig. Some part is 
missing, perhaps?

> 
> Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> Tested-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>   samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
>   samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
>   samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
>   samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
>   samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
>   5 files changed, 140 insertions(+)
> 
> diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> index de5a0f67f320..be7bf472c3c7 100644
> --- a/samples/ftrace/ftrace-direct-modify.c
> +++ b/samples/ftrace/ftrace-direct-modify.c
> @@ -23,6 +23,39 @@ extern void my_tramp2(void *);
> 
>   static unsigned long my_ip = (unsigned long)schedule;
> 
> +#ifdef CONFIG_RISCV
> +
> +asm (" .pushsection    .text, \"ax\", @progbits\n"
> +"      .type           my_tramp1, @function\n"
> +"      .globl          my_tramp1\n"
> +"   my_tramp1:\n"
> +"      addi sp,sp,-16\n"
> +"      sd   t0,0(sp)\n"
> +"      sd   ra,8(sp)\n"
> +"      call my_direct_func1\n"
> +"      ld   t0,0(sp)\n"
> +"      ld   ra,8(sp)\n"
> +"      addi sp,sp,16\n"
> +"      jr t0\n"
> +"      .size           my_tramp1, .-my_tramp1\n"
> +
> +"      .type           my_tramp2, @function\n"
> +"      .globl          my_tramp2\n"
> +"   my_tramp2:\n"
> +"      addi sp,sp,-16\n"
> +"      sd   t0,0(sp)\n"
> +"      sd   ra,8(sp)\n"
> +"      call my_direct_func2\n"
> +"      ld   t0,0(sp)\n"
> +"      ld   ra,8(sp)\n"
> +"      addi sp,sp,16\n"
> +"      jr t0\n"
> +"      .size           my_tramp2, .-my_tramp2\n"
> +"      .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> index d52370cad0b6..10884bf418f7 100644
> --- a/samples/ftrace/ftrace-direct-multi-modify.c
> +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> @@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
>   extern void my_tramp1(void *);
>   extern void my_tramp2(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm (" .pushsection    .text, \"ax\", @progbits\n"
> +"      .type           my_tramp1, @function\n"
> +"      .globl          my_tramp1\n"
> +"   my_tramp1:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func1\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"      jr t0\n"
> +"      .size           my_tramp1, .-my_tramp1\n"
> +
> +"      .type           my_tramp2, @function\n"
> +"      .globl          my_tramp2\n"
> +"   my_tramp2:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func2\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"      jr t0\n"
> +"      .size           my_tramp2, .-my_tramp2\n"
> +"      .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> index ec1088922517..a35bf43bf6d7 100644
> --- a/samples/ftrace/ftrace-direct-multi.c
> +++ b/samples/ftrace/ftrace-direct-multi.c
> @@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
> 
>   extern void my_tramp(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index e13fb59a2b47..3b62e33c2e6d 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
> 
>   extern void my_tramp(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-40\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   a1,8(sp)\n"
> +"       sd   a2,16(sp)\n"
> +"       sd   t0,24(sp)\n"
> +"       sd   ra,32(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   a1,8(sp)\n"
> +"       ld   a2,16(sp)\n"
> +"       ld   t0,24(sp)\n"
> +"       ld   ra,32(sp)\n"
> +"       addi sp,sp,40\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> index 1f769d0db20f..2cfe5a7d2d70 100644
> --- a/samples/ftrace/ftrace-direct.c
> +++ b/samples/ftrace/ftrace-direct.c
> @@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
> 
>   extern void my_tramp(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> --
> 2.36.1
> 
> 

Regards,
Evgenii



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-07 13:35 ` [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
  2023-01-10 13:08   ` Evgenii Shatokhin
@ 2023-01-10 13:50   ` Evgenii Shatokhin
  2023-01-11  9:50   ` Song Shuai
  2 siblings, 0 replies; 28+ messages in thread
From: Evgenii Shatokhin @ 2023-01-10 13:50 UTC (permalink / raw)
  To: guoren
  Cc: linux-riscv, linux-kernel, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	suagrfillet, andy.chiu

On 07.01.2023 16:35, guoren@kernel.org wrote:
> From: Song Shuai <suagrfillet@gmail.com>
> 
> select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> for ARCH_RV64I in arch/riscv/Kconfig. And add riscv asm code for
> the ftrace-direct*.c files in samples/ftrace/.

The samples fail to build for an RV64GC system because asm/nospec-branch.h 
is not available for RISC-V. As of 6.2-rc3, it seems the header is only 
present on x86 and s390.

Maybe place it under #ifdef, so that it is only used where available? 
nospec functionality is not yet implemented for RISC-V and is well out of 
the scope of this patch series.

> 
> Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> Tested-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>   samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
>   samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
>   samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
>   samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
>   samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
>   5 files changed, 140 insertions(+)
> 
> diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> index de5a0f67f320..be7bf472c3c7 100644
> --- a/samples/ftrace/ftrace-direct-modify.c
> +++ b/samples/ftrace/ftrace-direct-modify.c
> @@ -23,6 +23,39 @@ extern void my_tramp2(void *);
> 
>   static unsigned long my_ip = (unsigned long)schedule;
> 
> +#ifdef CONFIG_RISCV
> +
> +asm (" .pushsection    .text, \"ax\", @progbits\n"
> +"      .type           my_tramp1, @function\n"
> +"      .globl          my_tramp1\n"
> +"   my_tramp1:\n"
> +"      addi sp,sp,-16\n"
> +"      sd   t0,0(sp)\n"
> +"      sd   ra,8(sp)\n"
> +"      call my_direct_func1\n"
> +"      ld   t0,0(sp)\n"
> +"      ld   ra,8(sp)\n"
> +"      addi sp,sp,16\n"
> +"      jr t0\n"
> +"      .size           my_tramp1, .-my_tramp1\n"
> +
> +"      .type           my_tramp2, @function\n"
> +"      .globl          my_tramp2\n"
> +"   my_tramp2:\n"
> +"      addi sp,sp,-16\n"
> +"      sd   t0,0(sp)\n"
> +"      sd   ra,8(sp)\n"
> +"      call my_direct_func2\n"
> +"      ld   t0,0(sp)\n"
> +"      ld   ra,8(sp)\n"
> +"      addi sp,sp,16\n"
> +"      jr t0\n"
> +"      .size           my_tramp2, .-my_tramp2\n"
> +"      .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> index d52370cad0b6..10884bf418f7 100644
> --- a/samples/ftrace/ftrace-direct-multi-modify.c
> +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> @@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
>   extern void my_tramp1(void *);
>   extern void my_tramp2(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm (" .pushsection    .text, \"ax\", @progbits\n"
> +"      .type           my_tramp1, @function\n"
> +"      .globl          my_tramp1\n"
> +"   my_tramp1:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func1\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"      jr t0\n"
> +"      .size           my_tramp1, .-my_tramp1\n"
> +
> +"      .type           my_tramp2, @function\n"
> +"      .globl          my_tramp2\n"
> +"   my_tramp2:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func2\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"      jr t0\n"
> +"      .size           my_tramp2, .-my_tramp2\n"
> +"      .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> index ec1088922517..a35bf43bf6d7 100644
> --- a/samples/ftrace/ftrace-direct-multi.c
> +++ b/samples/ftrace/ftrace-direct-multi.c
> @@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
> 
>   extern void my_tramp(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index e13fb59a2b47..3b62e33c2e6d 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
> 
>   extern void my_tramp(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-40\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   a1,8(sp)\n"
> +"       sd   a2,16(sp)\n"
> +"       sd   t0,24(sp)\n"
> +"       sd   ra,32(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   a1,8(sp)\n"
> +"       ld   a2,16(sp)\n"
> +"       ld   t0,24(sp)\n"
> +"       ld   ra,32(sp)\n"
> +"       addi sp,sp,40\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> index 1f769d0db20f..2cfe5a7d2d70 100644
> --- a/samples/ftrace/ftrace-direct.c
> +++ b/samples/ftrace/ftrace-direct.c
> @@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
> 
>   extern void my_tramp(void *);
> 
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>   #ifdef CONFIG_X86_64
> 
>   #include <asm/ibt.h>
> --
> 2.36.1
> 
> 

Regards,
Evgenii


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half
  2023-01-07 13:35 ` [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half guoren
@ 2023-01-10 17:13   ` Evgenii Shatokhin
  2023-01-11  9:58     ` Guo Ren
  0 siblings, 1 reply; 28+ messages in thread
From: Evgenii Shatokhin @ 2023-01-10 17:13 UTC (permalink / raw)
  To: guoren
  Cc: linux-riscv, linux-kernel, Guo Ren, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	suagrfillet, andy.chiu, linux

On 07.01.2023 16:35, guoren@kernel.org wrote:
> 
> From: Guo Ren <guoren@linux.alibaba.com>
> 
> Use a temporary register to reduce the size of detour code from 16 bytes to
> 8 bytes. The previous implementation is from 'commit afc76b8b8011 ("riscv:
> Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'.
> 
> Before the patch:
> <func_prolog>:
>   0: REG_S  ra, -SZREG(sp)
>   4: auipc  ra, ?
>   8: jalr   ?(ra)
> 12: REG_L  ra, -SZREG(sp)
>   (func_body)
> 
> After the patch:
> <func_prolog>:
>   0: auipc  t0, ?
>   4: jalr   t0, ?(t0)
>   (func_body)
> 
> This patch not only reduces the size of the detour code, but also fixes
> an important issue:
> 
> An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
> actually change the instruction pointer, e.g. to "replace" the given
> kernel function with a new one, which is needed for livepatching, etc.
> 
> In this case, the trampoline (ftrace_regs_caller) would not return to
> <func_prolog+12> but would rather jump to the new function. So, "REG_L
> ra, -SZREG(sp)" would not run and the original return address would not
> be restored. The kernel is likely to hang or crash as a result.
> 
> This can be easily demonstrated if one tries to "replace", say,
> cmdline_proc_show() with a new function with the same signature using
> instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
> callback.
> 
> Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
> Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
> Co-developed-by: Song Shuai <suagrfillet@gmail.com>
> Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
> ---
>   arch/riscv/Makefile             |  4 +-
>   arch/riscv/include/asm/ftrace.h | 50 +++++++++++++++++++------
>   arch/riscv/kernel/ftrace.c      | 65 ++++++++++-----------------------
>   arch/riscv/kernel/mcount-dyn.S  | 43 +++++++++-------------
>   4 files changed, 76 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index ea5a91da6897..3c9aaf67ed79 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -12,9 +12,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>          LDFLAGS_vmlinux := --no-relax
>          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>   ifeq ($(CONFIG_RISCV_ISA_C),y)
> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=8
> -else
>          CC_FLAGS_FTRACE := -fpatchable-function-entry=4
> +else
> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=2
>   endif
>   endif
> 
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index 04dad3380041..9e73922e1e2e 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -42,6 +42,14 @@ struct dyn_arch_ftrace {
>    * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
>    *          return address (original pc + 4)
>    *
> + *<ftrace enable>:
> + * 0: auipc  t0/ra, 0x?
> + * 4: jalr   t0/ra, ?(t0/ra)
> + *
> + *<ftrace disable>:
> + * 0: nop
> + * 4: nop
> + *
>    * Dynamic ftrace generates probes to call sites, so we must deal with
>    * both auipc and jalr at the same time.
>    */
> @@ -52,25 +60,43 @@ struct dyn_arch_ftrace {
>   #define AUIPC_OFFSET_MASK      (0xfffff000)
>   #define AUIPC_PAD              (0x00001000)
>   #define JALR_SHIFT             20
> -#define JALR_BASIC             (0x000080e7)
> -#define AUIPC_BASIC            (0x00000097)
> +#define JALR_RA                        (0x000080e7)
> +#define AUIPC_RA               (0x00000097)
> +#define JALR_T0                        (0x000282e7)
> +#define AUIPC_T0               (0x00000297)
>   #define NOP4                   (0x00000013)
> 
> -#define make_call(caller, callee, call)                                        \
> +#define to_jalr_t0(offset)                                             \
> +       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> +
> +#define to_auipc_t0(offset)                                            \
> +       ((offset & JALR_SIGN_MASK) ?                                    \
> +       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_T0) :       \
> +       ((offset & AUIPC_OFFSET_MASK) | AUIPC_T0))
> +
> +#define make_call_t0(caller, callee, call)                             \
>   do {                                                                   \
> -       call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -  \
> -                               (unsigned long)caller));                \
> -       call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -   \
> -                              (unsigned long)caller));                 \
> +       unsigned int offset =                                           \
> +               (unsigned long) callee - (unsigned long) caller;        \
> +       call[0] = to_auipc_t0(offset);                                  \
> +       call[1] = to_jalr_t0(offset);                                   \
>   } while (0)
> 
> -#define to_jalr_insn(offset)                                           \
> -       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
> +#define to_jalr_ra(offset)                                             \
> +       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
> 
> -#define to_auipc_insn(offset)                                          \
> +#define to_auipc_ra(offset)                                            \
>          ((offset & JALR_SIGN_MASK) ?                                    \
> -       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :    \
> -       ((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
> +       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :       \
> +       ((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
> +
> +#define make_call_ra(caller, callee, call)                             \
> +do {                                                                   \
> +       unsigned int offset =                                           \
> +               (unsigned long) callee - (unsigned long) caller;        \
> +       call[0] = to_auipc_ra(offset);                                  \
> +       call[1] = to_jalr_ra(offset);                                   \
> +} while (0)
> 
>   /*
>    * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 2086f6585773..5bff37af4770 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -55,12 +55,15 @@ static int ftrace_check_current_call(unsigned long hook_pos,
>   }
> 
>   static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> -                               bool enable)
> +                               bool enable, bool ra)
>   {
>          unsigned int call[2];
>          unsigned int nops[2] = {NOP4, NOP4};
> 
> -       make_call(hook_pos, target, call);
> +       if (ra)
> +               make_call_ra(hook_pos, target, call);
> +       else
> +               make_call_t0(hook_pos, target, call);
> 
>          /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
>          if (patch_text_nosync
> @@ -70,42 +73,13 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
>          return 0;
>   }
> 
> -/*
> - * Put 5 instructions with 16 bytes at the front of function within
> - * patchable function entry nops' area.
> - *
> - * 0: REG_S  ra, -SZREG(sp)
> - * 1: auipc  ra, 0x?
> - * 2: jalr   -?(ra)
> - * 3: REG_L  ra, -SZREG(sp)
> - *
> - * So the opcodes is:
> - * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
> - * 1: 0x???????? -> auipc
> - * 2: 0x???????? -> jalr
> - * 3: 0xff813083 (ld)/0xffc12083 (lw)
> - */
> -#if __riscv_xlen == 64
> -#define INSN0  0xfe113c23
> -#define INSN3  0xff813083
> -#elif __riscv_xlen == 32
> -#define INSN0  0xfe112e23
> -#define INSN3  0xffc12083
> -#endif
> -
> -#define FUNC_ENTRY_SIZE        16
> -#define FUNC_ENTRY_JMP 4
> -
>   int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>   {
> -       unsigned int call[4] = {INSN0, 0, 0, INSN3};
> -       unsigned long target = addr;
> -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> +       unsigned int call[2];
> 
> -       call[1] = to_auipc_insn((unsigned int)(target - caller));
> -       call[2] = to_jalr_insn((unsigned int)(target - caller));
> +       make_call_t0(rec->ip, addr, call);
> 
> -       if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
> +       if (patch_text_nosync((void *)rec->ip, call, MCOUNT_INSN_SIZE))
>                  return -EPERM;
> 
>          return 0;
> @@ -114,15 +88,14 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>   int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>                      unsigned long addr)
>   {
> -       unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
> +       unsigned int nops[2] = {NOP4, NOP4};
> 
> -       if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
> +       if (patch_text_nosync((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
>                  return -EPERM;
> 
>          return 0;
>   }
> 
> -
>   /*
>    * This is called early on, and isn't wrapped by
>    * ftrace_arch_code_modify_{prepare,post_process}() and therefor doesn't hold
> @@ -144,10 +117,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>   int ftrace_update_ftrace_func(ftrace_func_t func)
>   {
>          int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> -                                      (unsigned long)func, true);
> +                                      (unsigned long)func, true, true);
>          if (!ret) {
>                  ret = __ftrace_modify_call((unsigned long)&ftrace_regs_call,
> -                                          (unsigned long)func, true);
> +                                          (unsigned long)func, true, true);
>          }
> 
>          return ret;
> @@ -159,16 +132,16 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>                         unsigned long addr)
>   {
>          unsigned int call[2];
> -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> +       unsigned long caller = rec->ip;
>          int ret;
> 
> -       make_call(caller, old_addr, call);
> +       make_call_t0(caller, old_addr, call);
>          ret = ftrace_check_current_call(caller, call);
> 
>          if (ret)
>                  return ret;
> 
> -       return __ftrace_modify_call(caller, addr, true);
> +       return __ftrace_modify_call(caller, addr, true, false);
>   }
>   #endif
> 
> @@ -203,12 +176,12 @@ int ftrace_enable_ftrace_graph_caller(void)
>          int ret;
> 
>          ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> -                                   (unsigned long)&prepare_ftrace_return, true);
> +                                   (unsigned long)&prepare_ftrace_return, true, true);
>          if (ret)
>                  return ret;
> 
>          return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
> -                                   (unsigned long)&prepare_ftrace_return, true);
> +                                   (unsigned long)&prepare_ftrace_return, true, true);
>   }
> 
>   int ftrace_disable_ftrace_graph_caller(void)
> @@ -216,12 +189,12 @@ int ftrace_disable_ftrace_graph_caller(void)
>          int ret;
> 
>          ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> -                                   (unsigned long)&prepare_ftrace_return, false);
> +                                   (unsigned long)&prepare_ftrace_return, false, true);
>          if (ret)
>                  return ret;
> 
>          return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
> -                                   (unsigned long)&prepare_ftrace_return, false);
> +                                   (unsigned long)&prepare_ftrace_return, false, true);
>   }
>   #endif /* CONFIG_DYNAMIC_FTRACE */
>   #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index d171eca623b6..b75332ced757 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -13,8 +13,8 @@
> 
>          .text
> 
> -#define FENTRY_RA_OFFSET       12
> -#define ABI_SIZE_ON_STACK      72
> +#define FENTRY_RA_OFFSET       8
> +#define ABI_SIZE_ON_STACK      80
>   #define ABI_A0                 0
>   #define ABI_A1                 8
>   #define ABI_A2                 16
> @@ -23,10 +23,10 @@
>   #define ABI_A5                 40
>   #define ABI_A6                 48
>   #define ABI_A7                 56
> -#define ABI_RA                 64
> +#define ABI_T0                 64
> +#define ABI_RA                 72
> 
>          .macro SAVE_ABI
> -       addi    sp, sp, -SZREG
>          addi    sp, sp, -ABI_SIZE_ON_STACK
> 
>          REG_S   a0, ABI_A0(sp)
> @@ -37,6 +37,7 @@
>          REG_S   a5, ABI_A5(sp)
>          REG_S   a6, ABI_A6(sp)
>          REG_S   a7, ABI_A7(sp)
> +       REG_S   t0, ABI_T0(sp)
>          REG_S   ra, ABI_RA(sp)
>          .endm
> 
> @@ -49,24 +50,18 @@
>          REG_L   a5, ABI_A5(sp)
>          REG_L   a6, ABI_A6(sp)
>          REG_L   a7, ABI_A7(sp)
> +       REG_L   t0, ABI_T0(sp)
>          REG_L   ra, ABI_RA(sp)
> 
>          addi    sp, sp, ABI_SIZE_ON_STACK
> -       addi    sp, sp, SZREG
>          .endm
> 
>   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>          .macro SAVE_ALL
> -       addi    sp, sp, -SZREG
>          addi    sp, sp, -PT_SIZE_ON_STACK
> 
> -       REG_S x1,  PT_EPC(sp)
> -       addi    sp, sp, PT_SIZE_ON_STACK
> -       REG_L x1,  (sp)
> -       addi    sp, sp, -PT_SIZE_ON_STACK
> +       REG_S t0,  PT_EPC(sp)
>          REG_S x1,  PT_RA(sp)
> -       REG_L x1,  PT_EPC(sp)
> -
>          REG_S x2,  PT_SP(sp)
>          REG_S x3,  PT_GP(sp)
>          REG_S x4,  PT_TP(sp)
> @@ -100,11 +95,8 @@
>          .endm
> 
>          .macro RESTORE_ALL
> +       REG_L t0,  PT_EPC(sp)
>          REG_L x1,  PT_RA(sp)
> -       addi    sp, sp, PT_SIZE_ON_STACK
> -       REG_S x1,  (sp)
> -       addi    sp, sp, -PT_SIZE_ON_STACK
> -       REG_L x1,  PT_EPC(sp)
>          REG_L x2,  PT_SP(sp)
>          REG_L x3,  PT_GP(sp)
>          REG_L x4,  PT_TP(sp)

I did not notice it the last time, but RESTORE_ALL should skip "REG_L 
x5,  PT_T0(sp)". Otherwise, the value of t0/x5 loaded from PT_EPC(sp) 
will be overwritten.

With the "ipmodify" use case I mentioned earlier, the Ftrace handler 
might have changed the value in PT_EPC(sp) by that time. Reading t0 
again from PT_T0(sp) here breaks it.

The same holds for "[PATCH -next V6 5/7] riscv: ftrace: Make 
ftrace_caller call ftrace_graph_func".

> @@ -137,17 +129,16 @@
>          REG_L x31, PT_T6(sp)
> 
>          addi    sp, sp, PT_SIZE_ON_STACK
> -       addi    sp, sp, SZREG
>          .endm
>   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> 
>   ENTRY(ftrace_caller)
>          SAVE_ABI
> 
> -       addi    a0, ra, -FENTRY_RA_OFFSET
> +       addi    a0, t0, -FENTRY_RA_OFFSET
>          la      a1, function_trace_op
>          REG_L   a2, 0(a1)
> -       REG_L   a1, ABI_SIZE_ON_STACK(sp)
> +       mv      a1, ra
>          mv      a3, sp
> 
>   ftrace_call:
> @@ -155,8 +146,8 @@ ftrace_call:
>          call    ftrace_stub
> 
>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -       addi    a0, sp, ABI_SIZE_ON_STACK
> -       REG_L   a1, ABI_RA(sp)
> +       addi    a0, sp, ABI_RA
> +       REG_L   a1, ABI_T0(sp)
>          addi    a1, a1, -FENTRY_RA_OFFSET
>   #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
>          mv      a2, s0
> @@ -166,17 +157,17 @@ ftrace_graph_call:
>          call    ftrace_stub
>   #endif
>          RESTORE_ABI
> -       ret
> +       jr t0
>   ENDPROC(ftrace_caller)
> 
>   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>   ENTRY(ftrace_regs_caller)
>          SAVE_ALL
> 
> -       addi    a0, ra, -FENTRY_RA_OFFSET
> +       addi    a0, t0, -FENTRY_RA_OFFSET
>          la      a1, function_trace_op
>          REG_L   a2, 0(a1)
> -       REG_L   a1, PT_SIZE_ON_STACK(sp)
> +       mv      a1, ra
>          mv      a3, sp
> 
>   ftrace_regs_call:
> @@ -185,7 +176,7 @@ ftrace_regs_call:
> 
>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>          addi    a0, sp, PT_RA
> -       REG_L   a1, PT_EPC(sp)
> +       REG_L   a1, PT_T0(sp)
>          addi    a1, a1, -FENTRY_RA_OFFSET
>   #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
>          mv      a2, s0
> @@ -196,6 +187,6 @@ ftrace_graph_regs_call:
>   #endif
> 
>          RESTORE_ALL
> -       ret
> +       jr t0
>   ENDPROC(ftrace_regs_caller)
>   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> --
> 2.36.1
> 
> 

Regards,
Evgenii


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func
  2023-01-07 13:35 ` [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func guoren
@ 2023-01-10 17:16   ` Evgenii Shatokhin
  2023-01-11  8:23     ` Guo Ren
  0 siblings, 1 reply; 28+ messages in thread
From: Evgenii Shatokhin @ 2023-01-10 17:16 UTC (permalink / raw)
  To: guoren
  Cc: linux-riscv, linux-kernel, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	suagrfillet, andy.chiu, linux

On 07.01.2023 16:35, guoren@kernel.org wrote:
> From: Song Shuai <suagrfillet@gmail.com>
> 
> In order to make the function graph use ftrace directly, ftrace_caller
> should be adjusted to save the necessary regs against the pt_regs layout
> so it can call ftrace_graph_func reasonably.
> 
> SAVE_ALL now saves all the regs according to the pt_regs struct. This
> patch supersedes SAVE_ALL with SAVE_ABI_REGS, which has an extra option
> to allow saving only the necessary ABI-related regs for ftrace_caller.
> 
> ftrace_caller and ftrace_regs_caller save their regs with the respective
> option of SAVE_ABI_REGS, then call the tracing function, especially
> graph_ops's ftrace_graph_func. So the ftrace_graph_[regs]_call labels
> aren't needed anymore if FTRACE_WITH_REGS is defined.
> 
> As the previous patch described, ftrace_caller keeps its
> ftrace_graph_call if FTRACE_WITH_REGS isn't defined.
> 
> For convenience, the original argument setup for the tracing function in
> ftrace_[regs]_caller is separated as PREPARE_ARGS.
> 
> Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> Tested-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>   arch/riscv/kernel/mcount-dyn.S | 142 ++++++++++++++++++++++++---------
>   1 file changed, 104 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index b75332ced757..d7d4d51b4bd7 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -57,19 +57,52 @@
>          .endm
> 
>   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> -       .macro SAVE_ALL
> +
> +/**
> +* SAVE_ABI_REGS - save regs against the pt_regs struct
> +*
> +* @all: tell if saving all the regs
> +*
> +* If all is set, all the regs will be saved, otherwise only ABI
> +* related regs (a0-a7,epc,ra and optional s0) will be saved.
> +*
> +* After the stack is established,
> +*
> +* 0(sp) stores the PC of the traced function which can be accessed
> +* by &(fregs)->regs->epc in tracing function. Note that the real
> +* function entry address should be computed with -FENTRY_RA_OFFSET.
> +*
> +* 8(sp) stores the function return address (i.e. parent IP) that
> +* can be accessed by &(fregs)->regs->ra in tracing function.
> +*
> +* The other regs are saved at the respective location and accessed
> +* by the respective pt_regs member.
> +*
> +* Here is the layout of stack for your reference.
> +*
> +* PT_SIZE_ON_STACK  ->  +++++++++
> +*                       + ..... +
> +*                       + t3-t6 +
> +*                       + s2-s11+
> +*                       + a0-a7 + --++++-> ftrace_caller saved
> +*                       + s1    +   +
> +*                       + s0    + --+
> +*                       + t0-t2 +   +
> +*                       + tp    +   +
> +*                       + gp    +   +
> +*                       + sp    +   +
> +*                       + ra    + --+ // parent IP
> +*               sp  ->  + epc   + --+ // PC
> +*                       +++++++++
> +**/
> +       .macro SAVE_ABI_REGS, all=0
>          addi    sp, sp, -PT_SIZE_ON_STACK
> 
>          REG_S t0,  PT_EPC(sp)
>          REG_S x1,  PT_RA(sp)
> -       REG_S x2,  PT_SP(sp)
> -       REG_S x3,  PT_GP(sp)
> -       REG_S x4,  PT_TP(sp)
> -       REG_S x5,  PT_T0(sp)
> -       REG_S x6,  PT_T1(sp)
> -       REG_S x7,  PT_T2(sp)
> -       REG_S x8,  PT_S0(sp)
> -       REG_S x9,  PT_S1(sp)
> +
> +       // always save the ABI regs
> +
>          REG_S x10, PT_A0(sp)
>          REG_S x11, PT_A1(sp)
>          REG_S x12, PT_A2(sp)
> @@ -78,6 +111,18 @@
>          REG_S x15, PT_A5(sp)
>          REG_S x16, PT_A6(sp)
>          REG_S x17, PT_A7(sp)
> +
> +       // save the leftover regs
> +
> +       .if \all == 1
> +       REG_S x2,  PT_SP(sp)
> +       REG_S x3,  PT_GP(sp)
> +       REG_S x4,  PT_TP(sp)
> +       REG_S x5,  PT_T0(sp)
> +       REG_S x6,  PT_T1(sp)
> +       REG_S x7,  PT_T2(sp)
> +       REG_S x8,  PT_S0(sp)
> +       REG_S x9,  PT_S1(sp)
>          REG_S x18, PT_S2(sp)
>          REG_S x19, PT_S3(sp)
>          REG_S x20, PT_S4(sp)
> @@ -92,19 +137,19 @@
>          REG_S x29, PT_T4(sp)
>          REG_S x30, PT_T5(sp)
>          REG_S x31, PT_T6(sp)
> +
> +       // save s0 if FP_TEST defined
> +
> +       .else
> +#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> +       REG_S x8,  PT_S0(sp)
> +#endif
> +       .endif
>          .endm
> 
> -       .macro RESTORE_ALL
> +       .macro RESTORE_ABI_REGS, all=0
>          REG_L t0,  PT_EPC(sp)
>          REG_L x1,  PT_RA(sp)
> -       REG_L x2,  PT_SP(sp)
> -       REG_L x3,  PT_GP(sp)
> -       REG_L x4,  PT_TP(sp)
> -       REG_L x5,  PT_T0(sp)
> -       REG_L x6,  PT_T1(sp)
> -       REG_L x7,  PT_T2(sp)
> -       REG_L x8,  PT_S0(sp)
> -       REG_L x9,  PT_S1(sp)
>          REG_L x10, PT_A0(sp)
>          REG_L x11, PT_A1(sp)
>          REG_L x12, PT_A2(sp)
> @@ -113,6 +158,16 @@
>          REG_L x15, PT_A5(sp)
>          REG_L x16, PT_A6(sp)
>          REG_L x17, PT_A7(sp)
> +
> +       .if \all == 1
> +       REG_L x2,  PT_SP(sp)
> +       REG_L x3,  PT_GP(sp)
> +       REG_L x4,  PT_TP(sp)
> +       REG_L x5,  PT_T0(sp)

Same as for patch #3, please skip "REG_L x5,  PT_T0(sp)" here. The 
correct value of t0/x5 has already been read from PT_EPC(sp) at this point.

> +       REG_L x6,  PT_T1(sp)
> +       REG_L x7,  PT_T2(sp)
> +       REG_L x8,  PT_S0(sp)
> +       REG_L x9,  PT_S1(sp)
>          REG_L x18, PT_S2(sp)
>          REG_L x19, PT_S3(sp)
>          REG_L x20, PT_S4(sp)
> @@ -128,10 +183,25 @@
>          REG_L x30, PT_T5(sp)
>          REG_L x31, PT_T6(sp)
> 
> +       .else
> +#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> +       REG_L x8,  PT_S0(sp)
> +#endif
> +       .endif
>          addi    sp, sp, PT_SIZE_ON_STACK
>          .endm
> +
> +       .macro PREPARE_ARGS
> +       addi    a0, t0, -FENTRY_RA_OFFSET       // ip
> +       la      a1, function_trace_op
> +       REG_L   a2, 0(a1)                       // op
> +       mv      a1, ra                          // parent_ip
> +       mv      a3, sp                          // fregs
> +       .endm
> +
>   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> 
> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>   ENTRY(ftrace_caller)
>          SAVE_ABI
> 
> @@ -160,33 +230,29 @@ ftrace_graph_call:
>          jr t0
>   ENDPROC(ftrace_caller)
> 
> -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
>   ENTRY(ftrace_regs_caller)
> -       SAVE_ALL
> -
> -       addi    a0, t0, -FENTRY_RA_OFFSET
> -       la      a1, function_trace_op
> -       REG_L   a2, 0(a1)
> -       mv      a1, ra
> -       mv      a3, sp
> +       SAVE_ABI_REGS 1
> +       PREPARE_ARGS
> 
>   ftrace_regs_call:
>          .global ftrace_regs_call
>          call    ftrace_stub
> 
> -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -       addi    a0, sp, PT_RA
> -       REG_L   a1, PT_T0(sp)
> -       addi    a1, a1, -FENTRY_RA_OFFSET
> -#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> -       mv      a2, s0
> -#endif
> -ftrace_graph_regs_call:
> -       .global ftrace_graph_regs_call
> -       call    ftrace_stub
> -#endif
> 
> -       RESTORE_ALL
> +       RESTORE_ABI_REGS 1
>          jr t0
>   ENDPROC(ftrace_regs_caller)
> +
> +ENTRY(ftrace_caller)
> +       SAVE_ABI_REGS 0
> +       PREPARE_ARGS
> +
> +ftrace_call:
> +       .global ftrace_call
> +       call    ftrace_stub
> +
> +       RESTORE_ABI_REGS 0
> +       jr t0
> +ENDPROC(ftrace_caller)
>   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> --
> 2.36.1
> 
> 
Regards,
Evgenii



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func
  2023-01-10 17:16   ` Evgenii Shatokhin
@ 2023-01-11  8:23     ` Guo Ren
  2023-01-11  8:41       ` Guo Ren
  0 siblings, 1 reply; 28+ messages in thread
From: Guo Ren @ 2023-01-11  8:23 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: linux-riscv, linux-kernel, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	suagrfillet, andy.chiu, linux

On Wed, Jan 11, 2023 at 1:16 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>
> On 07.01.2023 16:35, guoren@kernel.org wrote:
> > From: Song Shuai <suagrfillet@gmail.com>
> >
> > In order to make the function graph use ftrace directly, ftrace_caller
> > should be adjusted to save the necessary regs against the pt_regs layout
> > so it can call ftrace_graph_func reasonably.
> >
> > SAVE_ALL now saves all the regs according to the pt_regs struct. This
> > patch supersedes SAVE_ALL with SAVE_ABI_REGS, which has an extra option
> > to allow saving only the necessary ABI-related regs for ftrace_caller.
> >
> > ftrace_caller and ftrace_regs_caller save their regs with the respective
> > option of SAVE_ABI_REGS, then call the tracing function, especially
> > graph_ops's ftrace_graph_func. So the ftrace_graph_[regs]_call labels
> > aren't needed anymore if FTRACE_WITH_REGS is defined.
> >
> > As the previous patch described, ftrace_caller keeps its
> > ftrace_graph_call if FTRACE_WITH_REGS isn't defined.
> >
> > For convenience, the original argument setup for the tracing function in
> > ftrace_[regs]_caller is separated as PREPARE_ARGS.
> >
> > Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> > Tested-by: Guo Ren <guoren@kernel.org>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > ---
> >   arch/riscv/kernel/mcount-dyn.S | 142 ++++++++++++++++++++++++---------
> >   1 file changed, 104 insertions(+), 38 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> > index b75332ced757..d7d4d51b4bd7 100644
> > --- a/arch/riscv/kernel/mcount-dyn.S
> > +++ b/arch/riscv/kernel/mcount-dyn.S
> > @@ -57,19 +57,52 @@
> >          .endm
> >
> >   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > -       .macro SAVE_ALL
> > +
> > +/**
> > +* SAVE_ABI_REGS - save regs against the pt_regs struct
> > +*
> > +* @all: tell if saving all the regs
> > +*
> > +* If all is set, all the regs will be saved, otherwise only ABI
> > +* related regs (a0-a7,epc,ra and optional s0) will be saved.
> > +*
> > +* After the stack is established,
> > +*
> > +* 0(sp) stores the PC of the traced function, which can be accessed
> > +* by &(fregs)->regs->epc in the tracing function. Note that the real
> > +* function entry address should be computed with -FENTRY_RA_OFFSET.
> > +*
> > +* 8(sp) stores the function return address (i.e. parent IP) that
> > +* can be accessed by &(fregs)->regs->ra in the tracing function.
> > +*
> > +* The other regs are saved at their respective locations and accessed
> > +* via the corresponding pt_regs members.
> > +*
> > +* Here is the layout of stack for your reference.
> > +*
> > +* PT_SIZE_ON_STACK  ->  +++++++++
> > +*                       + ..... +
> > +*                       + t3-t6 +
> > +*                       + s2-s11+
> > +*                       + a0-a7 + --++++-> ftrace_caller saved
> > +*                       + s1    +   +
> > +*                       + s0    + --+
> > +*                       + t0-t2 +   +
> > +*                       + tp    +   +
> > +*                       + gp    +   +
> > +*                       + sp    +   +
> > +*                       + ra    + --+ // parent IP
> > +*               sp  ->  + epc   + --+ // PC
> > +*                       +++++++++
> > +**/
> > +       .macro SAVE_ABI_REGS, all=0
> >          addi    sp, sp, -PT_SIZE_ON_STACK
> >
> >          REG_S t0,  PT_EPC(sp)
> >          REG_S x1,  PT_RA(sp)
> > -       REG_S x2,  PT_SP(sp)
> > -       REG_S x3,  PT_GP(sp)
> > -       REG_S x4,  PT_TP(sp)
> > -       REG_S x5,  PT_T0(sp)
> > -       REG_S x6,  PT_T1(sp)
> > -       REG_S x7,  PT_T2(sp)
> > -       REG_S x8,  PT_S0(sp)
> > -       REG_S x9,  PT_S1(sp)
> > +
> > +       // always save the ABI regs
> > +
> >          REG_S x10, PT_A0(sp)
> >          REG_S x11, PT_A1(sp)
> >          REG_S x12, PT_A2(sp)
> > @@ -78,6 +111,18 @@
> >          REG_S x15, PT_A5(sp)
> >          REG_S x16, PT_A6(sp)
> >          REG_S x17, PT_A7(sp)
> > +
> > +       // save the leftover regs
> > +
> > +       .if \all == 1
> > +       REG_S x2,  PT_SP(sp)
> > +       REG_S x3,  PT_GP(sp)
> > +       REG_S x4,  PT_TP(sp)
> > +       REG_S x5,  PT_T0(sp)
> > +       REG_S x6,  PT_T1(sp)
> > +       REG_S x7,  PT_T2(sp)
> > +       REG_S x8,  PT_S0(sp)
> > +       REG_S x9,  PT_S1(sp)
> >          REG_S x18, PT_S2(sp)
> >          REG_S x19, PT_S3(sp)
> >          REG_S x20, PT_S4(sp)
> > @@ -92,19 +137,19 @@
> >          REG_S x29, PT_T4(sp)
> >          REG_S x30, PT_T5(sp)
> >          REG_S x31, PT_T6(sp)
> > +
> > +       // save s0 if FP_TEST defined
> > +
> > +       .else
> > +#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > +       REG_S x8,  PT_S0(sp)
> > +#endif
> > +       .endif
> >          .endm
> >
> > -       .macro RESTORE_ALL
> > +       .macro RESTORE_ABI_REGS, all=0
> >          REG_L t0,  PT_EPC(sp)
> >          REG_L x1,  PT_RA(sp)
> > -       REG_L x2,  PT_SP(sp)
> > -       REG_L x3,  PT_GP(sp)
> > -       REG_L x4,  PT_TP(sp)
> > -       REG_L x5,  PT_T0(sp)
> > -       REG_L x6,  PT_T1(sp)
> > -       REG_L x7,  PT_T2(sp)
> > -       REG_L x8,  PT_S0(sp)
> > -       REG_L x9,  PT_S1(sp)
> >          REG_L x10, PT_A0(sp)
> >          REG_L x11, PT_A1(sp)
> >          REG_L x12, PT_A2(sp)
> > @@ -113,6 +158,16 @@
> >          REG_L x15, PT_A5(sp)
> >          REG_L x16, PT_A6(sp)
> >          REG_L x17, PT_A7(sp)
> > +
> > +       .if \all == 1
> > +       REG_L x2,  PT_SP(sp)
> > +       REG_L x3,  PT_GP(sp)
> > +       REG_L x4,  PT_TP(sp)
> > +       REG_L x5,  PT_T0(sp)
>
> Same as for the patch #3, please skip "REG_L x5,  PT_T0(sp)" here. The
> correct value of t0/x5 has already been read from PT_EPC(sp) at this point.
Oh, I don't want to do that here. It's a common macro. Because it's a
contiguous load within the cache line, I don't think it would cause a
performance penalty.


>
> > +       REG_L x6,  PT_T1(sp)
> > +       REG_L x7,  PT_T2(sp)
> > +       REG_L x8,  PT_S0(sp)
> > +       REG_L x9,  PT_S1(sp)
> >          REG_L x18, PT_S2(sp)
> >          REG_L x19, PT_S3(sp)
> >          REG_L x20, PT_S4(sp)
> > @@ -128,10 +183,25 @@
> >          REG_L x30, PT_T5(sp)
> >          REG_L x31, PT_T6(sp)
> >
> > +       .else
> > +#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > +       REG_L x8,  PT_S0(sp)
> > +#endif
> > +       .endif
> >          addi    sp, sp, PT_SIZE_ON_STACK
> >          .endm
> > +
> > +       .macro PREPARE_ARGS
> > +       addi    a0, t0, -FENTRY_RA_OFFSET       // ip
> > +       la      a1, function_trace_op
> > +       REG_L   a2, 0(a1)                       // op
> > +       mv      a1, ra                          // parent_ip
> > +       mv      a3, sp                          // fregs
> > +       .endm
> > +
> >   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> >
> > +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >   ENTRY(ftrace_caller)
> >          SAVE_ABI
> >
> > @@ -160,33 +230,29 @@ ftrace_graph_call:
> >          jr t0
> >   ENDPROC(ftrace_caller)
> >
> > -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> >   ENTRY(ftrace_regs_caller)
> > -       SAVE_ALL
> > -
> > -       addi    a0, t0, -FENTRY_RA_OFFSET
> > -       la      a1, function_trace_op
> > -       REG_L   a2, 0(a1)
> > -       mv      a1, ra
> > -       mv      a3, sp
> > +       SAVE_ABI_REGS 1
> > +       PREPARE_ARGS
> >
> >   ftrace_regs_call:
> >          .global ftrace_regs_call
> >          call    ftrace_stub
> >
> > -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > -       addi    a0, sp, PT_RA
> > -       REG_L   a1, PT_T0(sp)
> > -       addi    a1, a1, -FENTRY_RA_OFFSET
> > -#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > -       mv      a2, s0
> > -#endif
> > -ftrace_graph_regs_call:
> > -       .global ftrace_graph_regs_call
> > -       call    ftrace_stub
> > -#endif
> >
> > -       RESTORE_ALL
> > +       RESTORE_ABI_REGS 1
> >          jr t0
> >   ENDPROC(ftrace_regs_caller)
> > +
> > +ENTRY(ftrace_caller)
> > +       SAVE_ABI_REGS 0
> > +       PREPARE_ARGS
> > +
> > +ftrace_call:
> > +       .global ftrace_call
> > +       call    ftrace_stub
> > +
> > +       RESTORE_ABI_REGS 0
> > +       jr t0
> > +ENDPROC(ftrace_caller)
> >   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > --
> > 2.36.1
> >
> >
> Regards,
> Evgenii
>
>


--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func
  2023-01-11  8:23     ` Guo Ren
@ 2023-01-11  8:41       ` Guo Ren
  0 siblings, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-11  8:41 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: linux-riscv, linux-kernel, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	suagrfillet, andy.chiu, linux

On Wed, Jan 11, 2023 at 4:23 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Jan 11, 2023 at 1:16 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
> >
> > On 07.01.2023 16:35, guoren@kernel.org wrote:
> > > From: Song Shuai <suagrfillet@gmail.com>
> > >
> > > In order to make the function graph use ftrace directly, ftrace_caller
> > > should be adjusted to save the necessary regs against the pt_regs layout
> > > so it can call ftrace_graph_func properly.
> > >
> > > SAVE_ALL now saves all the regs according to the pt_regs struct. This
> > > patch supersedes SAVE_ALL with SAVE_ABI_REGS, which has an extra option
> > > to allow saving only the necessary ABI-related regs for ftrace_caller.
> > >
> > > ftrace_caller and ftrace_regs_caller save their regs with the respective
> > > option of SAVE_ABI_REGS, then call the tracing function, especially
> > > graph_ops's ftrace_graph_func. So the ftrace_graph_[regs]_call labels
> > > aren't needed anymore if FTRACE_WITH_REGS is defined.
> > >
> > > As the previous patch described, the ftrace_caller retains its
> > > ftrace_graph_call if FTRACE_WITH_REGS isn't defined.
> > >
> > > For convenience, the original argument setup for the tracing function in
> > > ftrace_[regs]_caller is separated as PREPARE_ARGS.
> > >
> > > Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> > > Tested-by: Guo Ren <guoren@kernel.org>
> > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > ---
> > >   arch/riscv/kernel/mcount-dyn.S | 142 ++++++++++++++++++++++++---------
> > >   1 file changed, 104 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> > > index b75332ced757..d7d4d51b4bd7 100644
> > > --- a/arch/riscv/kernel/mcount-dyn.S
> > > +++ b/arch/riscv/kernel/mcount-dyn.S
> > > @@ -57,19 +57,52 @@
> > >          .endm
> > >
> > >   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > -       .macro SAVE_ALL
> > > +
> > > +/**
> > > +* SAVE_ABI_REGS - save regs against the pt_regs struct
> > > +*
> > > +* @all: tell if saving all the regs
> > > +*
> > > +* If all is set, all the regs will be saved, otherwise only ABI
> > > +* related regs (a0-a7,epc,ra and optional s0) will be saved.
> > > +*
> > > +* After the stack is established,
> > > +*
> > > +* 0(sp) stores the PC of the traced function, which can be accessed
> > > +* by &(fregs)->regs->epc in the tracing function. Note that the real
> > > +* function entry address should be computed with -FENTRY_RA_OFFSET.
> > > +*
> > > +* 8(sp) stores the function return address (i.e. parent IP) that
> > > +* can be accessed by &(fregs)->regs->ra in the tracing function.
> > > +*
> > > +* The other regs are saved at their respective locations and accessed
> > > +* via the corresponding pt_regs members.
> > > +*
> > > +* Here is the layout of stack for your reference.
> > > +*
> > > +* PT_SIZE_ON_STACK  ->  +++++++++
> > > +*                       + ..... +
> > > +*                       + t3-t6 +
> > > +*                       + s2-s11+
> > > +*                       + a0-a7 + --++++-> ftrace_caller saved
> > > +*                       + s1    +   +
> > > +*                       + s0    + --+
> > > +*                       + t0-t2 +   +
> > > +*                       + tp    +   +
> > > +*                       + gp    +   +
> > > +*                       + sp    +   +
> > > +*                       + ra    + --+ // parent IP
> > > +*               sp  ->  + epc   + --+ // PC
> > > +*                       +++++++++
> > > +**/
> > > +       .macro SAVE_ABI_REGS, all=0
> > >          addi    sp, sp, -PT_SIZE_ON_STACK
> > >
> > >          REG_S t0,  PT_EPC(sp)
> > >          REG_S x1,  PT_RA(sp)
> > > -       REG_S x2,  PT_SP(sp)
> > > -       REG_S x3,  PT_GP(sp)
> > > -       REG_S x4,  PT_TP(sp)
> > > -       REG_S x5,  PT_T0(sp)
> > > -       REG_S x6,  PT_T1(sp)
> > > -       REG_S x7,  PT_T2(sp)
> > > -       REG_S x8,  PT_S0(sp)
> > > -       REG_S x9,  PT_S1(sp)
> > > +
> > > +       // always save the ABI regs
> > > +
> > >          REG_S x10, PT_A0(sp)
> > >          REG_S x11, PT_A1(sp)
> > >          REG_S x12, PT_A2(sp)
> > > @@ -78,6 +111,18 @@
> > >          REG_S x15, PT_A5(sp)
> > >          REG_S x16, PT_A6(sp)
> > >          REG_S x17, PT_A7(sp)
> > > +
> > > +       // save the leftover regs
> > > +
> > > +       .if \all == 1
> > > +       REG_S x2,  PT_SP(sp)
> > > +       REG_S x3,  PT_GP(sp)
> > > +       REG_S x4,  PT_TP(sp)
> > > +       REG_S x5,  PT_T0(sp)
> > > +       REG_S x6,  PT_T1(sp)
> > > +       REG_S x7,  PT_T2(sp)
> > > +       REG_S x8,  PT_S0(sp)
> > > +       REG_S x9,  PT_S1(sp)
> > >          REG_S x18, PT_S2(sp)
> > >          REG_S x19, PT_S3(sp)
> > >          REG_S x20, PT_S4(sp)
> > > @@ -92,19 +137,19 @@
> > >          REG_S x29, PT_T4(sp)
> > >          REG_S x30, PT_T5(sp)
> > >          REG_S x31, PT_T6(sp)
> > > +
> > > +       // save s0 if FP_TEST defined
> > > +
> > > +       .else
> > > +#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > > +       REG_S x8,  PT_S0(sp)
> > > +#endif
> > > +       .endif
> > >          .endm
> > >
> > > -       .macro RESTORE_ALL
> > > +       .macro RESTORE_ABI_REGS, all=0
> > >          REG_L t0,  PT_EPC(sp)
> > >          REG_L x1,  PT_RA(sp)
> > > -       REG_L x2,  PT_SP(sp)
> > > -       REG_L x3,  PT_GP(sp)
> > > -       REG_L x4,  PT_TP(sp)
> > > -       REG_L x5,  PT_T0(sp)
> > > -       REG_L x6,  PT_T1(sp)
> > > -       REG_L x7,  PT_T2(sp)
> > > -       REG_L x8,  PT_S0(sp)
> > > -       REG_L x9,  PT_S1(sp)
> > >          REG_L x10, PT_A0(sp)
> > >          REG_L x11, PT_A1(sp)
> > >          REG_L x12, PT_A2(sp)
> > > @@ -113,6 +158,16 @@
> > >          REG_L x15, PT_A5(sp)
> > >          REG_L x16, PT_A6(sp)
> > >          REG_L x17, PT_A7(sp)
> > > +
> > > +       .if \all == 1
> > > +       REG_L x2,  PT_SP(sp)
> > > +       REG_L x3,  PT_GP(sp)
> > > +       REG_L x4,  PT_TP(sp)
> > > +       REG_L x5,  PT_T0(sp)
> >
> > Same as for the patch #3, please skip "REG_L x5,  PT_T0(sp)" here. The
> > correct value of t0/x5 has already been read from PT_EPC(sp) at this point.
> Oh, I don't want to do that here. It's a common macro. Because it's a
> continuous load within the cacheline, I don't think it would cause a
> performance gap.
I misunderstood here; you're correct. The "REG_L x5,  PT_T0(sp)"
should be skipped.

>
>
> >
> > > +       REG_L x6,  PT_T1(sp)
> > > +       REG_L x7,  PT_T2(sp)
> > > +       REG_L x8,  PT_S0(sp)
> > > +       REG_L x9,  PT_S1(sp)
> > >          REG_L x18, PT_S2(sp)
> > >          REG_L x19, PT_S3(sp)
> > >          REG_L x20, PT_S4(sp)
> > > @@ -128,10 +183,25 @@
> > >          REG_L x30, PT_T5(sp)
> > >          REG_L x31, PT_T6(sp)
> > >
> > > +       .else
> > > +#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > > +       REG_L x8,  PT_S0(sp)
> > > +#endif
> > > +       .endif
> > >          addi    sp, sp, PT_SIZE_ON_STACK
> > >          .endm
> > > +
> > > +       .macro PREPARE_ARGS
> > > +       addi    a0, t0, -FENTRY_RA_OFFSET       // ip
> > > +       la      a1, function_trace_op
> > > +       REG_L   a2, 0(a1)                       // op
> > > +       mv      a1, ra                          // parent_ip
> > > +       mv      a3, sp                          // fregs
> > > +       .endm
> > > +
> > >   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > >
> > > +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > >   ENTRY(ftrace_caller)
> > >          SAVE_ABI
> > >
> > > @@ -160,33 +230,29 @@ ftrace_graph_call:
> > >          jr t0
> > >   ENDPROC(ftrace_caller)
> > >
> > > -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > >   ENTRY(ftrace_regs_caller)
> > > -       SAVE_ALL
> > > -
> > > -       addi    a0, t0, -FENTRY_RA_OFFSET
> > > -       la      a1, function_trace_op
> > > -       REG_L   a2, 0(a1)
> > > -       mv      a1, ra
> > > -       mv      a3, sp
> > > +       SAVE_ABI_REGS 1
> > > +       PREPARE_ARGS
> > >
> > >   ftrace_regs_call:
> > >          .global ftrace_regs_call
> > >          call    ftrace_stub
> > >
> > > -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > -       addi    a0, sp, PT_RA
> > > -       REG_L   a1, PT_T0(sp)
> > > -       addi    a1, a1, -FENTRY_RA_OFFSET
> > > -#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > > -       mv      a2, s0
> > > -#endif
> > > -ftrace_graph_regs_call:
> > > -       .global ftrace_graph_regs_call
> > > -       call    ftrace_stub
> > > -#endif
> > >
> > > -       RESTORE_ALL
> > > +       RESTORE_ABI_REGS 1
> > >          jr t0
> > >   ENDPROC(ftrace_regs_caller)
> > > +
> > > +ENTRY(ftrace_caller)
> > > +       SAVE_ABI_REGS 0
> > > +       PREPARE_ARGS
> > > +
> > > +ftrace_call:
> > > +       .global ftrace_call
> > > +       call    ftrace_stub
> > > +
> > > +       RESTORE_ABI_REGS 0
> > > +       jr t0
> > > +ENDPROC(ftrace_caller)
> > >   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > > --
> > > 2.36.1
> > >
> > >
> > Regards,
> > Evgenii
> >
> >
>
>
> --
> Best Regards
>  Guo Ren



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-07 13:35 ` [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
  2023-01-10 13:08   ` Evgenii Shatokhin
  2023-01-10 13:50   ` Evgenii Shatokhin
@ 2023-01-11  9:50   ` Song Shuai
  2023-01-11  9:59     ` Guo Ren
  2023-01-13 10:48     ` Song Shuai
  2 siblings, 2 replies; 28+ messages in thread
From: Song Shuai @ 2023-01-11  9:50 UTC (permalink / raw)
  To: guoren, e.shatokhin
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, andy.chiu, linux-riscv,
	linux-kernel

<guoren@kernel.org> wrote on Sat, Jan 7, 2023 at 13:36:
>
> From: Song Shuai <suagrfillet@gmail.com>
>
> select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> for ARCH_RV64I in arch/riscv/Kconfig. And add riscv asm code for
> the ftrace-direct*.c files in samples/ftrace/.
>
> Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> Tested-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
Hi, Guo && Evgenii:

As Evgenii commented, this patch has two problems to fix:

1. The modification of Kconfig is missing,
so we should add it back.

2. The build error caused by including the nospec-branch.h file.
This include is exclusive to the x86 architecture; moving it under the
x86 #ifdef seems the better way to fix this error.

I fixed them in my GitHub repo, but I can't find a reasonable target
branch in your repo to open a PR against. So I am pasting the link to my
branch here; you can pick the first two commits into the next series.

- samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
- samples: ftrace: Include the nospec-branch.h only for x86

Link: https://github.com/sugarfillet/linux/commits/song_ftrace_fix_up_v6
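The second fix could look like the fragment below (a sketch; the exact include list in the samples/ftrace/*.c files may differ, and `asm/nospec-branch.h` is an x86-only header per the build error described above):

```c
/* Keep the x86-only headers strictly under the x86 guard, so that
 * non-x86 builds (e.g. riscv) of the sample do not try to pull them in. */
#ifdef CONFIG_X86_64

#include <asm/ibt.h>
#include <asm/nospec-branch.h>	/* x86-specific; unguarded it breaks riscv */

#endif /* CONFIG_X86_64 */
```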

>  samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
>  samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
>  samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
>  samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
>  samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
>  5 files changed, 140 insertions(+)
>
> diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> index de5a0f67f320..be7bf472c3c7 100644
> --- a/samples/ftrace/ftrace-direct-modify.c
> +++ b/samples/ftrace/ftrace-direct-modify.c
> @@ -23,6 +23,39 @@ extern void my_tramp2(void *);
>
>  static unsigned long my_ip = (unsigned long)schedule;
>
> +#ifdef CONFIG_RISCV
> +
> +asm (" .pushsection    .text, \"ax\", @progbits\n"
> +"      .type           my_tramp1, @function\n"
> +"      .globl          my_tramp1\n"
> +"   my_tramp1:\n"
> +"      addi sp,sp,-16\n"
> +"      sd   t0,0(sp)\n"
> +"      sd   ra,8(sp)\n"
> +"      call my_direct_func1\n"
> +"      ld   t0,0(sp)\n"
> +"      ld   ra,8(sp)\n"
> +"      addi sp,sp,16\n"
> +"      jr t0\n"
> +"      .size           my_tramp1, .-my_tramp1\n"
> +
> +"      .type           my_tramp2, @function\n"
> +"      .globl          my_tramp2\n"
> +"   my_tramp2:\n"
> +"      addi sp,sp,-16\n"
> +"      sd   t0,0(sp)\n"
> +"      sd   ra,8(sp)\n"
> +"      call my_direct_func2\n"
> +"      ld   t0,0(sp)\n"
> +"      ld   ra,8(sp)\n"
> +"      addi sp,sp,16\n"
> +"      jr t0\n"
> +"      .size           my_tramp2, .-my_tramp2\n"
> +"      .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>  #ifdef CONFIG_X86_64
>
>  #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> index d52370cad0b6..10884bf418f7 100644
> --- a/samples/ftrace/ftrace-direct-multi-modify.c
> +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> @@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
>  extern void my_tramp1(void *);
>  extern void my_tramp2(void *);
>
> +#ifdef CONFIG_RISCV
> +
> +asm (" .pushsection    .text, \"ax\", @progbits\n"
> +"      .type           my_tramp1, @function\n"
> +"      .globl          my_tramp1\n"
> +"   my_tramp1:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func1\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"      jr t0\n"
> +"      .size           my_tramp1, .-my_tramp1\n"
> +
> +"      .type           my_tramp2, @function\n"
> +"      .globl          my_tramp2\n"
> +"   my_tramp2:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func2\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"      jr t0\n"
> +"      .size           my_tramp2, .-my_tramp2\n"
> +"      .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>  #ifdef CONFIG_X86_64
>
>  #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> index ec1088922517..a35bf43bf6d7 100644
> --- a/samples/ftrace/ftrace-direct-multi.c
> +++ b/samples/ftrace/ftrace-direct-multi.c
> @@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
>
>  extern void my_tramp(void *);
>
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>  #ifdef CONFIG_X86_64
>
>  #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index e13fb59a2b47..3b62e33c2e6d 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
>
>  extern void my_tramp(void *);
>
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-40\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   a1,8(sp)\n"
> +"       sd   a2,16(sp)\n"
> +"       sd   t0,24(sp)\n"
> +"       sd   ra,32(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   a1,8(sp)\n"
> +"       ld   a2,16(sp)\n"
> +"       ld   t0,24(sp)\n"
> +"       ld   ra,32(sp)\n"
> +"       addi sp,sp,40\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>  #ifdef CONFIG_X86_64
>
>  #include <asm/ibt.h>
> diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> index 1f769d0db20f..2cfe5a7d2d70 100644
> --- a/samples/ftrace/ftrace-direct.c
> +++ b/samples/ftrace/ftrace-direct.c
> @@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
>
>  extern void my_tramp(void *);
>
> +#ifdef CONFIG_RISCV
> +
> +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> +"       .type           my_tramp, @function\n"
> +"       .globl          my_tramp\n"
> +"   my_tramp:\n"
> +"       addi sp,sp,-24\n"
> +"       sd   a0,0(sp)\n"
> +"       sd   t0,8(sp)\n"
> +"       sd   ra,16(sp)\n"
> +"       call my_direct_func\n"
> +"       ld   a0,0(sp)\n"
> +"       ld   t0,8(sp)\n"
> +"       ld   ra,16(sp)\n"
> +"       addi sp,sp,24\n"
> +"       jr t0\n"
> +"       .size           my_tramp, .-my_tramp\n"
> +"       .popsection\n"
> +);
> +
> +#endif /* CONFIG_RISCV */
> +
>  #ifdef CONFIG_X86_64
>
>  #include <asm/ibt.h>
> --
> 2.36.1
>


-- 
Thanks,
Song

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half
  2023-01-10 17:13   ` Evgenii Shatokhin
@ 2023-01-11  9:58     ` Guo Ren
  0 siblings, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-11  9:58 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: linux-riscv, linux-kernel, Guo Ren, anup, paul.walmsley, palmer,
	conor.dooley, heiko, rostedt, mhiramat, jolsa, bp, jpoimboe,
	suagrfillet, andy.chiu, linux

On Wed, Jan 11, 2023 at 1:13 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>
> On 07.01.2023 16:35, guoren@kernel.org wrote:
> >
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > Use a temporary register to reduce the size of the detour code from 16
> > bytes to 8 bytes. The previous implementation is from commit afc76b8b8011
> > ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT").
> >
> > Before the patch:
> > <func_prolog>:
> >   0: REG_S  ra, -SZREG(sp)
> >   4: auipc  ra, ?
> >   8: jalr   ?(ra)
> > 12: REG_L  ra, -SZREG(sp)
> >   (func_body)
> >
> > After the patch:
> > <func_prolog>:
> >   0: auipc  t0, ?
> >   4: jalr   t0, ?(t0)
> >   (func_body)
> >
> > This patch not only reduces the size of the detour code but also fixes
> > an important issue:
> >
> > An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
> > actually change the instruction pointer, e.g. to "replace" the given
> > kernel function with a new one, which is needed for livepatching, etc.
> >
> > In this case, the trampoline (ftrace_regs_caller) would not return to
> > <func_prolog+12> but would rather jump to the new function. So, "REG_L
> > ra, -SZREG(sp)" would not run and the original return address would not
> > be restored. The kernel is likely to hang or crash as a result.
> >
> > This can be easily demonstrated if one tries to "replace", say,
> > cmdline_proc_show() with a new function with the same signature using
> > instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
> > callback.
> >
> > Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
> > Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
> > Co-developed-by: Song Shuai <suagrfillet@gmail.com>
> > Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
> > ---
> >   arch/riscv/Makefile             |  4 +-
> >   arch/riscv/include/asm/ftrace.h | 50 +++++++++++++++++++------
> >   arch/riscv/kernel/ftrace.c      | 65 ++++++++++-----------------------
> >   arch/riscv/kernel/mcount-dyn.S  | 43 +++++++++-------------
> >   4 files changed, 76 insertions(+), 86 deletions(-)
> >
> > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > index ea5a91da6897..3c9aaf67ed79 100644
> > --- a/arch/riscv/Makefile
> > +++ b/arch/riscv/Makefile
> > @@ -12,9 +12,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> >          LDFLAGS_vmlinux := --no-relax
> >          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> >   ifeq ($(CONFIG_RISCV_ISA_C),y)
> > -       CC_FLAGS_FTRACE := -fpatchable-function-entry=8
> > -else
> >          CC_FLAGS_FTRACE := -fpatchable-function-entry=4
> > +else
> > +       CC_FLAGS_FTRACE := -fpatchable-function-entry=2
> >   endif
> >   endif
> >
> > diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> > index 04dad3380041..9e73922e1e2e 100644
> > --- a/arch/riscv/include/asm/ftrace.h
> > +++ b/arch/riscv/include/asm/ftrace.h
> > @@ -42,6 +42,14 @@ struct dyn_arch_ftrace {
> >    * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
> >    *          return address (original pc + 4)
> >    *
> > + *<ftrace enable>:
> > + * 0: auipc  t0/ra, 0x?
> > + * 4: jalr   t0/ra, ?(t0/ra)
> > + *
> > + *<ftrace disable>:
> > + * 0: nop
> > + * 4: nop
> > + *
> >    * Dynamic ftrace generates probes to call sites, so we must deal with
> >    * both auipc and jalr at the same time.
> >    */
> > @@ -52,25 +60,43 @@ struct dyn_arch_ftrace {
> >   #define AUIPC_OFFSET_MASK      (0xfffff000)
> >   #define AUIPC_PAD              (0x00001000)
> >   #define JALR_SHIFT             20
> > -#define JALR_BASIC             (0x000080e7)
> > -#define AUIPC_BASIC            (0x00000097)
> > +#define JALR_RA                        (0x000080e7)
> > +#define AUIPC_RA               (0x00000097)
> > +#define JALR_T0                        (0x000282e7)
> > +#define AUIPC_T0               (0x00000297)
> >   #define NOP4                   (0x00000013)
> >
> > -#define make_call(caller, callee, call)                                        \
> > +#define to_jalr_t0(offset)                                             \
> > +       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> > +
> > +#define to_auipc_t0(offset)                                            \
> > +       ((offset & JALR_SIGN_MASK) ?                                    \
> > +       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_T0) :       \
> > +       ((offset & AUIPC_OFFSET_MASK) | AUIPC_T0))
> > +
> > +#define make_call_t0(caller, callee, call)                             \
> >   do {                                                                   \
> > -       call[0] = to_auipc_insn((unsigned int)((unsigned long)callee -  \
> > -                               (unsigned long)caller));                \
> > -       call[1] = to_jalr_insn((unsigned int)((unsigned long)callee -   \
> > -                              (unsigned long)caller));                 \
> > +       unsigned int offset =                                           \
> > +               (unsigned long) callee - (unsigned long) caller;        \
> > +       call[0] = to_auipc_t0(offset);                                  \
> > +       call[1] = to_jalr_t0(offset);                                   \
> >   } while (0)
> >
> > -#define to_jalr_insn(offset)                                           \
> > -       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_BASIC)
> > +#define to_jalr_ra(offset)                                             \
> > +       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
> >
> > -#define to_auipc_insn(offset)                                          \
> > +#define to_auipc_ra(offset)                                            \
> >          ((offset & JALR_SIGN_MASK) ?                                    \
> > -       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_BASIC) :    \
> > -       ((offset & AUIPC_OFFSET_MASK) | AUIPC_BASIC))
> > +       (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :       \
> > +       ((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
> > +
> > +#define make_call_ra(caller, callee, call)                             \
> > +do {                                                                   \
> > +       unsigned int offset =                                           \
> > +               (unsigned long) callee - (unsigned long) caller;        \
> > +       call[0] = to_auipc_ra(offset);                                  \
> > +       call[1] = to_jalr_ra(offset);                                   \
> > +} while (0)
> >
> >   /*
> >    * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > index 2086f6585773..5bff37af4770 100644
> > --- a/arch/riscv/kernel/ftrace.c
> > +++ b/arch/riscv/kernel/ftrace.c
> > @@ -55,12 +55,15 @@ static int ftrace_check_current_call(unsigned long hook_pos,
> >   }
> >
> >   static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> > -                               bool enable)
> > +                               bool enable, bool ra)
> >   {
> >          unsigned int call[2];
> >          unsigned int nops[2] = {NOP4, NOP4};
> >
> > -       make_call(hook_pos, target, call);
> > +       if (ra)
> > +               make_call_ra(hook_pos, target, call);
> > +       else
> > +               make_call_t0(hook_pos, target, call);
> >
> >          /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> >          if (patch_text_nosync
> > @@ -70,42 +73,13 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> >          return 0;
> >   }
> >
> > -/*
> > - * Put 5 instructions with 16 bytes at the front of function within
> > - * patchable function entry nops' area.
> > - *
> > - * 0: REG_S  ra, -SZREG(sp)
> > - * 1: auipc  ra, 0x?
> > - * 2: jalr   -?(ra)
> > - * 3: REG_L  ra, -SZREG(sp)
> > - *
> > - * So the opcodes is:
> > - * 0: 0xfe113c23 (sd)/0xfe112e23 (sw)
> > - * 1: 0x???????? -> auipc
> > - * 2: 0x???????? -> jalr
> > - * 3: 0xff813083 (ld)/0xffc12083 (lw)
> > - */
> > -#if __riscv_xlen == 64
> > -#define INSN0  0xfe113c23
> > -#define INSN3  0xff813083
> > -#elif __riscv_xlen == 32
> > -#define INSN0  0xfe112e23
> > -#define INSN3  0xffc12083
> > -#endif
> > -
> > -#define FUNC_ENTRY_SIZE        16
> > -#define FUNC_ENTRY_JMP 4
> > -
> >   int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >   {
> > -       unsigned int call[4] = {INSN0, 0, 0, INSN3};
> > -       unsigned long target = addr;
> > -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> > +       unsigned int call[2];
> >
> > -       call[1] = to_auipc_insn((unsigned int)(target - caller));
> > -       call[2] = to_jalr_insn((unsigned int)(target - caller));
> > +       make_call_t0(rec->ip, addr, call);
> >
> > -       if (patch_text_nosync((void *)rec->ip, call, FUNC_ENTRY_SIZE))
> > +       if (patch_text_nosync((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> >                  return -EPERM;
> >
> >          return 0;
> > @@ -114,15 +88,14 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >   int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> >                      unsigned long addr)
> >   {
> > -       unsigned int nops[4] = {NOP4, NOP4, NOP4, NOP4};
> > +       unsigned int nops[2] = {NOP4, NOP4};
> >
> > -       if (patch_text_nosync((void *)rec->ip, nops, FUNC_ENTRY_SIZE))
> > +       if (patch_text_nosync((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> >                  return -EPERM;
> >
> >          return 0;
> >   }
> >
> > -
> >   /*
> >    * This is called early on, and isn't wrapped by
> >    * ftrace_arch_code_modify_{prepare,post_process}() and therefor doesn't hold
> > @@ -144,10 +117,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >   int ftrace_update_ftrace_func(ftrace_func_t func)
> >   {
> >          int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> > -                                      (unsigned long)func, true);
> > +                                      (unsigned long)func, true, true);
> >          if (!ret) {
> >                  ret = __ftrace_modify_call((unsigned long)&ftrace_regs_call,
> > -                                          (unsigned long)func, true);
> > +                                          (unsigned long)func, true, true);
> >          }
> >
> >          return ret;
> > @@ -159,16 +132,16 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >                         unsigned long addr)
> >   {
> >          unsigned int call[2];
> > -       unsigned long caller = rec->ip + FUNC_ENTRY_JMP;
> > +       unsigned long caller = rec->ip;
> >          int ret;
> >
> > -       make_call(caller, old_addr, call);
> > +       make_call_t0(caller, old_addr, call);
> >          ret = ftrace_check_current_call(caller, call);
> >
> >          if (ret)
> >                  return ret;
> >
> > -       return __ftrace_modify_call(caller, addr, true);
> > +       return __ftrace_modify_call(caller, addr, true, false);
> >   }
> >   #endif
> >
> > @@ -203,12 +176,12 @@ int ftrace_enable_ftrace_graph_caller(void)
> >          int ret;
> >
> >          ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> > -                                   (unsigned long)&prepare_ftrace_return, true);
> > +                                   (unsigned long)&prepare_ftrace_return, true, true);
> >          if (ret)
> >                  return ret;
> >
> >          return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
> > -                                   (unsigned long)&prepare_ftrace_return, true);
> > +                                   (unsigned long)&prepare_ftrace_return, true, true);
> >   }
> >
> >   int ftrace_disable_ftrace_graph_caller(void)
> > @@ -216,12 +189,12 @@ int ftrace_disable_ftrace_graph_caller(void)
> >          int ret;
> >
> >          ret = __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> > -                                   (unsigned long)&prepare_ftrace_return, false);
> > +                                   (unsigned long)&prepare_ftrace_return, false, true);
> >          if (ret)
> >                  return ret;
> >
> >          return __ftrace_modify_call((unsigned long)&ftrace_graph_regs_call,
> > -                                   (unsigned long)&prepare_ftrace_return, false);
> > +                                   (unsigned long)&prepare_ftrace_return, false, true);
> >   }
> >   #endif /* CONFIG_DYNAMIC_FTRACE */
> >   #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> > diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> > index d171eca623b6..b75332ced757 100644
> > --- a/arch/riscv/kernel/mcount-dyn.S
> > +++ b/arch/riscv/kernel/mcount-dyn.S
> > @@ -13,8 +13,8 @@
> >
> >          .text
> >
> > -#define FENTRY_RA_OFFSET       12
> > -#define ABI_SIZE_ON_STACK      72
> > +#define FENTRY_RA_OFFSET       8
> > +#define ABI_SIZE_ON_STACK      80
> >   #define ABI_A0                 0
> >   #define ABI_A1                 8
> >   #define ABI_A2                 16
> > @@ -23,10 +23,10 @@
> >   #define ABI_A5                 40
> >   #define ABI_A6                 48
> >   #define ABI_A7                 56
> > -#define ABI_RA                 64
> > +#define ABI_T0                 64
> > +#define ABI_RA                 72
> >
> >          .macro SAVE_ABI
> > -       addi    sp, sp, -SZREG
> >          addi    sp, sp, -ABI_SIZE_ON_STACK
> >
> >          REG_S   a0, ABI_A0(sp)
> > @@ -37,6 +37,7 @@
> >          REG_S   a5, ABI_A5(sp)
> >          REG_S   a6, ABI_A6(sp)
> >          REG_S   a7, ABI_A7(sp)
> > +       REG_S   t0, ABI_T0(sp)
> >          REG_S   ra, ABI_RA(sp)
> >          .endm
> >
> > @@ -49,24 +50,18 @@
> >          REG_L   a5, ABI_A5(sp)
> >          REG_L   a6, ABI_A6(sp)
> >          REG_L   a7, ABI_A7(sp)
> > +       REG_L   t0, ABI_T0(sp)
> >          REG_L   ra, ABI_RA(sp)
> >
> >          addi    sp, sp, ABI_SIZE_ON_STACK
> > -       addi    sp, sp, SZREG
> >          .endm
> >
> >   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >          .macro SAVE_ALL
> > -       addi    sp, sp, -SZREG
> >          addi    sp, sp, -PT_SIZE_ON_STACK
> >
> > -       REG_S x1,  PT_EPC(sp)
> > -       addi    sp, sp, PT_SIZE_ON_STACK
> > -       REG_L x1,  (sp)
> > -       addi    sp, sp, -PT_SIZE_ON_STACK
> > +       REG_S t0,  PT_EPC(sp)
> >          REG_S x1,  PT_RA(sp)
> > -       REG_L x1,  PT_EPC(sp)
> > -
> >          REG_S x2,  PT_SP(sp)
> >          REG_S x3,  PT_GP(sp)
> >          REG_S x4,  PT_TP(sp)
> > @@ -100,11 +95,8 @@
> >          .endm
> >
> >          .macro RESTORE_ALL
> > +       REG_L t0,  PT_EPC(sp)
> >          REG_L x1,  PT_RA(sp)
> > -       addi    sp, sp, PT_SIZE_ON_STACK
> > -       REG_S x1,  (sp)
> > -       addi    sp, sp, -PT_SIZE_ON_STACK
> > -       REG_L x1,  PT_EPC(sp)
> >          REG_L x2,  PT_SP(sp)
> >          REG_L x3,  PT_GP(sp)
> >          REG_L x4,  PT_TP(sp)
>
> I did not notice it the last time, but RESTORE_ALL should skip "REG_L
> x5,  PT_T0(sp)". Otherwise, the value of t0/x5 loaded from PT_EPC(sp)
> will be overwritten.
>
> With the "ipmodify" use case I mentioned earlier, the Ftrace handler
> might have changed the value in PT_EPC(sp) by that time. Reading t0
> again from PT_T0(sp) here breaks it.


Correct. In kprobe_ftrace_handler, I found:

                unsigned long orig_ip = instruction_pointer(regs);

                instruction_pointer_set(regs, ip);

So, I must skip "REG_L x5,  PT_T0(sp)" to prevent the value from being
overwritten.
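For reference, here is a sketch of how RESTORE_ALL could look with that load skipped (macro and PT_* offset names as used in arch/riscv/kernel/mcount-dyn.S and asm/ptrace.h; abbreviated, not the final patch):

```asm
	.macro RESTORE_ALL
	/* t0 carries the return target, possibly updated in PT_EPC by an
	 * ipmodify handler such as kprobe_ftrace_handler */
	REG_L	t0,  PT_EPC(sp)
	REG_L	x1,  PT_RA(sp)
	REG_L	x2,  PT_SP(sp)
	REG_L	x3,  PT_GP(sp)
	REG_L	x4,  PT_TP(sp)
	/* no REG_L x5, PT_T0(sp): it would clobber t0 loaded above */
	REG_L	x6,  PT_T1(sp)
	/* ... x7 through x31 restored as before ... */
	addi	sp, sp, PT_SIZE_ON_STACK
	.endm
```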


>
> The same holds for "[PATCH -next V6 5/7] riscv: ftrace: Make
> ftrace_caller call ftrace_graph_func".
>
> > @@ -137,17 +129,16 @@
> >          REG_L x31, PT_T6(sp)
> >
> >          addi    sp, sp, PT_SIZE_ON_STACK
> > -       addi    sp, sp, SZREG
> >          .endm
> >   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> >
> >   ENTRY(ftrace_caller)
> >          SAVE_ABI
> >
> > -       addi    a0, ra, -FENTRY_RA_OFFSET
> > +       addi    a0, t0, -FENTRY_RA_OFFSET
> >          la      a1, function_trace_op
> >          REG_L   a2, 0(a1)
> > -       REG_L   a1, ABI_SIZE_ON_STACK(sp)
> > +       mv      a1, ra
> >          mv      a3, sp
> >
> >   ftrace_call:
> > @@ -155,8 +146,8 @@ ftrace_call:
> >          call    ftrace_stub
> >
> >   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > -       addi    a0, sp, ABI_SIZE_ON_STACK
> > -       REG_L   a1, ABI_RA(sp)
> > +       addi    a0, sp, ABI_RA
> > +       REG_L   a1, ABI_T0(sp)
> >          addi    a1, a1, -FENTRY_RA_OFFSET
> >   #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> >          mv      a2, s0
> > @@ -166,17 +157,17 @@ ftrace_graph_call:
> >          call    ftrace_stub
> >   #endif
> >          RESTORE_ABI
> > -       ret
> > +       jr t0
> >   ENDPROC(ftrace_caller)
> >
> >   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >   ENTRY(ftrace_regs_caller)
> >          SAVE_ALL
> >
> > -       addi    a0, ra, -FENTRY_RA_OFFSET
> > +       addi    a0, t0, -FENTRY_RA_OFFSET
> >          la      a1, function_trace_op
> >          REG_L   a2, 0(a1)
> > -       REG_L   a1, PT_SIZE_ON_STACK(sp)
> > +       mv      a1, ra
> >          mv      a3, sp
> >
> >   ftrace_regs_call:
> > @@ -185,7 +176,7 @@ ftrace_regs_call:
> >
> >   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >          addi    a0, sp, PT_RA
> > -       REG_L   a1, PT_EPC(sp)
> > +       REG_L   a1, PT_T0(sp)
> >          addi    a1, a1, -FENTRY_RA_OFFSET
> >   #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> >          mv      a2, s0
> > @@ -196,6 +187,6 @@ ftrace_graph_regs_call:
> >   #endif
> >
> >          RESTORE_ALL
> > -       ret
> > +       jr t0
> >   ENDPROC(ftrace_regs_caller)
> >   #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > --
> > 2.36.1
> >
> >
>
> Regards,
> Evgenii
>


--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-11  9:50   ` Song Shuai
@ 2023-01-11  9:59     ` Guo Ren
  2023-01-13 10:48     ` Song Shuai
  1 sibling, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-11  9:59 UTC (permalink / raw)
  To: Song Shuai
  Cc: e.shatokhin, anup, paul.walmsley, palmer, conor.dooley, heiko,
	rostedt, mhiramat, jolsa, bp, jpoimboe, andy.chiu, linux-riscv,
	linux-kernel

On Wed, Jan 11, 2023 at 5:51 PM Song Shuai <suagrfillet@gmail.com> wrote:
>
> <guoren@kernel.org> 于2023年1月7日周六 13:36写道:
> >
> > From: Song Shuai <suagrfillet@gmail.com>
> >
> > select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> > for ARCH_RV64I in arch/riscv/Kconfig. And add riscv asm code for
> > the ftrace-direct*.c files in samples/ftrace/.
> >
> > Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> > Tested-by: Guo Ren <guoren@kernel.org>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > ---
> Hi Guo and Evgenii,
>
> As Evgenii commented, this patch has two problems to fix:
>
> 1. The Kconfig modification is missing, so we should add it back.
>
> 2. A build error results from including the nospec-branch.h file.
> This include is exclusive to the x86 architecture, so moving it under
> the x86 #ifdef seems the better way to fix the error.
>
> I fixed them in my GitHub repo, but I can't find a reasonable target
> branch in your repo to open a PR against.
> So I'm pasting the link to my branch here; you can pick the first two
> commits into the next series.
>
> - samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
> - samples: ftrace: Include the nospec-branch.h only for x86
Great, thx.

>
> Link: https://github.com/sugarfillet/linux/commits/song_ftrace_fix_up_v6
>
> >  samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
> >  samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
> >  samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
> >  samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
> >  samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
> >  5 files changed, 140 insertions(+)
> >
> > diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> > index de5a0f67f320..be7bf472c3c7 100644
> > --- a/samples/ftrace/ftrace-direct-modify.c
> > +++ b/samples/ftrace/ftrace-direct-modify.c
> > @@ -23,6 +23,39 @@ extern void my_tramp2(void *);
> >
> >  static unsigned long my_ip = (unsigned long)schedule;
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm (" .pushsection    .text, \"ax\", @progbits\n"
> > +"      .type           my_tramp1, @function\n"
> > +"      .globl          my_tramp1\n"
> > +"   my_tramp1:\n"
> > +"      addi sp,sp,-16\n"
> > +"      sd   t0,0(sp)\n"
> > +"      sd   ra,8(sp)\n"
> > +"      call my_direct_func1\n"
> > +"      ld   t0,0(sp)\n"
> > +"      ld   ra,8(sp)\n"
> > +"      addi sp,sp,16\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp1, .-my_tramp1\n"
> > +
> > +"      .type           my_tramp2, @function\n"
> > +"      .globl          my_tramp2\n"
> > +"   my_tramp2:\n"
> > +"      addi sp,sp,-16\n"
> > +"      sd   t0,0(sp)\n"
> > +"      sd   ra,8(sp)\n"
> > +"      call my_direct_func2\n"
> > +"      ld   t0,0(sp)\n"
> > +"      ld   ra,8(sp)\n"
> > +"      addi sp,sp,16\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp2, .-my_tramp2\n"
> > +"      .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> > index d52370cad0b6..10884bf418f7 100644
> > --- a/samples/ftrace/ftrace-direct-multi-modify.c
> > +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> > @@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
> >  extern void my_tramp1(void *);
> >  extern void my_tramp2(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm (" .pushsection    .text, \"ax\", @progbits\n"
> > +"      .type           my_tramp1, @function\n"
> > +"      .globl          my_tramp1\n"
> > +"   my_tramp1:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func1\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp1, .-my_tramp1\n"
> > +
> > +"      .type           my_tramp2, @function\n"
> > +"      .globl          my_tramp2\n"
> > +"   my_tramp2:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func2\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp2, .-my_tramp2\n"
> > +"      .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> > index ec1088922517..a35bf43bf6d7 100644
> > --- a/samples/ftrace/ftrace-direct-multi.c
> > +++ b/samples/ftrace/ftrace-direct-multi.c
> > @@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
> >
> >  extern void my_tramp(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > +"       .type           my_tramp, @function\n"
> > +"       .globl          my_tramp\n"
> > +"   my_tramp:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"       jr t0\n"
> > +"       .size           my_tramp, .-my_tramp\n"
> > +"       .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> > index e13fb59a2b47..3b62e33c2e6d 100644
> > --- a/samples/ftrace/ftrace-direct-too.c
> > +++ b/samples/ftrace/ftrace-direct-too.c
> > @@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
> >
> >  extern void my_tramp(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > +"       .type           my_tramp, @function\n"
> > +"       .globl          my_tramp\n"
> > +"   my_tramp:\n"
> > +"       addi sp,sp,-40\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   a1,8(sp)\n"
> > +"       sd   a2,16(sp)\n"
> > +"       sd   t0,24(sp)\n"
> > +"       sd   ra,32(sp)\n"
> > +"       call my_direct_func\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   a1,8(sp)\n"
> > +"       ld   a2,16(sp)\n"
> > +"       ld   t0,24(sp)\n"
> > +"       ld   ra,32(sp)\n"
> > +"       addi sp,sp,40\n"
> > +"       jr t0\n"
> > +"       .size           my_tramp, .-my_tramp\n"
> > +"       .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> > index 1f769d0db20f..2cfe5a7d2d70 100644
> > --- a/samples/ftrace/ftrace-direct.c
> > +++ b/samples/ftrace/ftrace-direct.c
> > @@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
> >
> >  extern void my_tramp(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > +"       .type           my_tramp, @function\n"
> > +"       .globl          my_tramp\n"
> > +"   my_tramp:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"       jr t0\n"
> > +"       .size           my_tramp, .-my_tramp\n"
> > +"       .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > --
> > 2.36.1
> >
>
>
> --
> Thanks,
> Song



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 0/7] riscv: Optimize function trace
  2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
                   ` (6 preceding siblings ...)
  2023-01-07 13:35 ` [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
@ 2023-01-11 10:11 ` Song Shuai
  2023-01-11 14:03   ` Guo Ren
  7 siblings, 1 reply; 28+ messages in thread
From: Song Shuai @ 2023-01-11 10:11 UTC (permalink / raw)
  To: guoren
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, andy.chiu, e.shatokhin,
	linux-riscv, linux-kernel, Guo Ren

<guoren@kernel.org> 于2023年1月7日周六 13:36写道:
>
> From: Guo Ren <guoren@linux.alibaba.com>
>
> The previous ftrace detour implementation, afc76b8b8011 ("riscv: Using
> PATCHABLE_FUNCTION_ENTRY instead of MCOUNT"), contains four problems.
>
>  - The most severe bug is the preemption panic found by Andy [1].
>    Let's disable preemption for ftrace first, so Andy can continue
>    the ftrace preemption work.
>  - The "-fpatchable-function-entry=" CFLAG wastes code size when
>    !RISCV_ISA_C.
>  - The ftrace detour implementation wastes code size.
>  - When livepatching, the trampoline (ftrace_regs_caller) would not
>    return to <func_prolog+12> but would rather jump to the new function.
>    So "REG_L ra, -SZREG(sp)" would not run and the original return
>    address would not be restored. The kernel is likely to hang or crash
>    as a result. (Found by Evgenii Shatokhin [4])
>
> Patches 1-3 fix the problems above.
>
> Patches 4-7 are features based on the reduced detour code patch;
> we include them in the series for testing and maintenance.
>
> You can directly try it with:
> https://github.com/guoren83/linux/tree/ftrace_fixup_v6
>
This link is broken, maybe you forget to push the branch to remote.

And the patch ("riscv : select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY")
was folded in the PR ( https://github.com/guoren83/linux/pull/1 )
I guess you might omit it. Hope its debut in the next version.

> Make function graph use ftrace directly [2] (patch 4, 5)
> ========================================================
>
> In the RISC-V architecture, when we enable the ftrace_graph tracer on
> some functions, function tracing on other functions suffers extra
> graph tracing work. In essence, graph_ops isn't limited by its func_hash
> because of the global ftrace_graph_[regs]_call label. That should be
> corrected.
>
> What inspired me is commit 0c0593b45c9b ("x86/ftrace: Make function
> graph use ftrace directly"), which uses the graph_ops::func function to
> install return_hooker and makes the function be traced against its own
> func_hash.
>
> This series of patches makes function graph use ftrace directly for
> riscv.
>
> If FTRACE_WITH_REGS isn't defined, ftrace_caller keeps ftrace_graph_call
> so that it can be replaced with a call to prepare_ftrace_return by
> the enable/disable helpers.
>
> When FTRACE_WITH_REGS is defined, ftrace_caller is adjusted to save the
> necessary regs against the pt_regs layout, so it can reasonably call
> the graph_ops::func function - ftrace_graph_func. In that case,
> ftrace_graph_[regs]_call and its enable/disable helpers aren't needed.
>
> Test log:
>
> The tests generated by CONFIG_FTRACE_STARTUP_TEST have passed in the
> local
> qemu-system-riscv64 virt machine. The following is the log during
> startup.
>
> ```
> Nov 15 03:07:13 stage4 kernel: Testing tracer function: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace ops #1:
> Nov 15 03:07:13 stage4 kernel: (1 0 1 0 0)
> Nov 15 03:07:13 stage4 kernel: (1 1 2 0 0)
> Nov 15 03:07:13 stage4 kernel: (2 1 3 0 365)
> Nov 15 03:07:13 stage4 kernel: (2 2 4 0 399)
> Nov 15 03:07:13 stage4 kernel: (3 2 4 0 146071)
> Nov 15 03:07:13 stage4 kernel: (3 3 5 0 146105) PASSED
> Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace ops #2:
> Nov 15 03:07:13 stage4 kernel: (1 0 1 589 0)
> Nov 15 03:07:13 stage4 kernel: (1 1 2 635 0)
> Nov 15 03:07:13 stage4 kernel: (2 1 3 1 2)
> Nov 15 03:07:13 stage4 kernel: (2 2 4 125 126)
> Nov 15 03:07:13 stage4 kernel: (3 2 4 146001 146078)
> Nov 15 03:07:13 stage4 kernel: (3 3 5 146035 146112) PASSED
> Nov 15 03:07:13 stage4 kernel: Testing ftrace recursion: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing ftrace recursion safe: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing ftrace regs: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing tracer nop: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing tracer irqsoff: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup:
> Nov 15 03:07:13 stage4 kernel: sched: DL replenish lagged too much
> Nov 15 03:07:13 stage4 kernel: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup_rt: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup_dl: PASSED
> Nov 15 03:07:13 stage4 kernel: Testing tracer function_graph: PASSED
> ```
>
> Add WITH_DIRECT_CALLS support [3] (patch 6, 7)
> ==============================================
>
> This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.
> SAMPLE_FTRACE_DIRECT and SAMPLE_FTRACE_DIRECT_MULTI are also included
> here as the samples for testing DIRECT_CALLS related interface.
>
> First, select DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide the
> register_ftrace_direct[_multi] interfaces, allowing users to register
> a custom trampoline (direct_caller) as the mcount for one or more
> target functions. modify_ftrace_direct[_multi] are also provided for
> modifying direct_caller.
>
> At the same time, the samples in ./samples/ftrace/ can be built as
> kernel modules for testing these interfaces, with SAMPLE_FTRACE_DIRECT
> and SAMPLE_FTRACE_DIRECT_MULTI selected.
>
> Second, to make direct_caller co-exist with the other ftrace hooks
> (e.g. the function/fgraph tracers, k[ret]probes), a temporary register
> is nominated to store the address of direct_caller in
> ftrace_regs_caller.
> After direct_ops->func sets the direct_caller address and RESTORE_REGS
> runs in ftrace_regs_caller, direct_caller is jumped to by the `jr`
> instruction.
>
> The following tests have been passed in my local qemu-riscv64 virt
> machine.
>
> 1. tests with CONFIG_FTRACE_STARTUP_TEST
> 2. tests of samples/ftrace/ftrace*.ko
> 3. manual tests with any combination of the following hooks
>   - function/function_graph tracer
>   - ftrace*.ko
>   - kprobe/kretprobe
>
> For your reference, here is the log when function tracer, kretprobe and
> ftrace-direct-too.ko co-hooks the handle_mm_fault function.
>
> ```
> [root@stage4 tracing]# echo handle_mm_fault > set_ftrace_filter
> [root@stage4 tracing]# echo 'r:myr handle_mm_fault' > kprobe_events
> [root@stage4 tracing]# echo function > current_tracer
> [root@stage4 tracing]# echo 1 > events/kprobes/myr/enable
> [root@stage4 tracing]# insmod /root/ftrace-direct-too.ko
> [root@stage4 tracing]#
> [root@stage4 tracing]# cat trace | tail
>              cat-388     [000] ...1.   583.051438: myr:
> (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
>              cat-388     [000] ...2.   583.057930: handle_mm_fault
> <-do_page_fault
>              cat-388     [000] .....   583.057990: my_direct_func:
> handle mm fault vma=000000002d9fe19c address=ffffffae9b7000 flags=215
>              cat-388     [000] ...1.   583.058284: myr:
> (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
>             tail-389     [001] ...2.   583.059062: handle_mm_fault
> <-do_page_fault
>             tail-389     [001] .....   583.059104: my_direct_func:
> handle mm fault vma=0000000017f3c48e address=aaaaaabebf3000 flags=215
>             tail-389     [001] ...1.   583.059325: myr:
> (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
>             tail-389     [001] ...2.   583.060371: handle_mm_fault
> <-do_page_fault
>             tail-389     [001] .....   583.060410: my_direct_func:
> handle mm fault vma=0000000017f3c48e address=aaaaaabebf1000 flags=255
>             tail-389     [001] ...1.   583.060996: myr:
> (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
> ```
> Note1:
>
> The checkpatch.pl will output some warnings on this series, like this
>
> ```
> WARNING: Prefer using '"%s...", __func__' to using 'my_direct_func2',
> this function's name, in a string
> 111: FILE: samples/ftrace/ftrace-direct-multi-modify.c:48:
> +"       call my_direct_func2\n"
> ```
>
> The reason is that checkpatch depends on the patch context providing
> the function name. In the above warning, my_direct_func2 is some lines
> away from the changed hunk, so its declaration doesn't appear in the
> patch context, and the warning is emitted.
>
> You may notice that the location of the `my_ip` variable changes in the
> 2nd patch. I did that to reduce the warnings to some extent. But killing
> all the warnings would make the patch less readable, so I stopped here.
>
> [1] https://lpc.events/event/16/contributions/1171/
> [2] https://lore.kernel.org/lkml/20221120084230.910152-1-suagrfillet@gmail.com/
> [3] https://lore.kernel.org/linux-riscv/20221123142025.1504030-1-suagrfillet@gmail.com/
> [4] https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
>
> Changes in v6:
>  - Replace 8 with MCOUNT_INSN_SIZE
>  - Replace "REG_L a1, PT_RA(sp)" with "mv a1, ra"
>  - Add Evgenii Shatokhin comment
>
> Changes in v5:
> https://lore.kernel.org/linux-riscv/20221208091244.203407-1-guoren@kernel.org/
>  - Sort Kconfig entries in alphabetical order.
>
> Changes in v4:
> https://lore.kernel.org/linux-riscv/20221129033230.255947-1-guoren@kernel.org/
>  - Include [3] for maintenance. [Song Shuai]
>
> Changes in V3:
> https://lore.kernel.org/linux-riscv/20221123153950.2911981-1-guoren@kernel.org/
>  - Include [2] for maintenance. [Song Shuai]
>
> Changes in V2:
> https://lore.kernel.org/linux-riscv/20220921034910.3142465-1-guoren@kernel.org/
>  - Add Signed-off for preemption fixup.
>
> Changes in V1:
> https://lore.kernel.org/linux-riscv/20220916103817.9490-1-guoren@kernel.org/
>
> Andy Chiu (1):
>   riscv: ftrace: Fixup panic by disabling preemption
>
> Guo Ren (2):
>   riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
>   riscv: ftrace: Reduce the detour code size to half
>
> Song Shuai (4):
>   riscv: ftrace: Add ftrace_graph_func
>   riscv: ftrace: Make ftrace_caller call ftrace_graph_func
>   riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
>   samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
>
>  arch/riscv/Kconfig                          |   3 +-
>  arch/riscv/Makefile                         |   6 +-
>  arch/riscv/include/asm/ftrace.h             |  71 ++++++--
>  arch/riscv/kernel/ftrace.c                  |  91 ++++------
>  arch/riscv/kernel/mcount-dyn.S              | 181 +++++++++++++-------
>  samples/ftrace/ftrace-direct-modify.c       |  33 ++++
>  samples/ftrace/ftrace-direct-multi-modify.c |  37 ++++
>  samples/ftrace/ftrace-direct-multi.c        |  22 +++
>  samples/ftrace/ftrace-direct-too.c          |  26 +++
>  samples/ftrace/ftrace-direct.c              |  22 +++
>  10 files changed, 355 insertions(+), 137 deletions(-)
>
> --
> 2.36.1
>


--
Thanks,
Song

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-09 17:19   ` Mark Rutland
@ 2023-01-11 13:22     ` Guo Ren
  2023-01-12 12:05       ` Mark Rutland
  0 siblings, 1 reply; 28+ messages in thread
From: Guo Ren @ 2023-01-11 13:22 UTC (permalink / raw)
  To: Mark Rutland
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Tue, Jan 10, 2023 at 1:20 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Sat, Jan 07, 2023 at 08:35:43AM -0500, guoren@kernel.org wrote:
> > From: Andy Chiu <andy.chiu@sifive.com>
> >
> > In RISCV, we must use an AUIPC + JALR pair to encode an immediate,
> > forming a jump that jumps to an address over 4K. This may cause errors
> > if we want to enable kernel preemption and remove dependency from
> > patching code with stop_machine(). For example, if a task was switched
> > out on auipc. And, if we changed the ftrace function before it was
> > switched back, then it would jump to an address that has updated 11:0
> > bits mixing with previous XLEN:12 part.
> >
> > p: patched area performed by dynamic ftrace
> > ftrace_prologue:
> > p|      REG_S   ra, -SZREG(sp)
> > p|      auipc   ra, 0x? ------------> preempted
> >                                       ...
> >                               change ftrace function
> >                                       ...
> > p|      jalr    -?(ra) <------------- switched back
> > p|      REG_L   ra, -SZREG(sp)
> > func:
> >       xxx
> >       ret
>
> What happens on SMP but not !PREEMPTION; can't a CPU be in the middle of this
> while you're patching the sequence?
Yes. With PREEMPTION, a timer interrupt between the auipc and the jalr
may cause a context switch. riscv uses stop_machine() for patch_text(),
so we may modify the auipc part while the task is switched out, but the
task will execute only the (already updated) jalr part when it returns.
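To make the failure mode concrete, here is a small illustration in plain
Python (hypothetical offsets, not kernel code) of how the stale auipc
upper part combines with the freshly patched jalr lower part:

```python
def split(offset):
    """Split a pc-relative offset into auipc/jalr parts.

    jalr sign-extends its 12-bit immediate, so when bit 11 of the
    offset is set, auipc must carry an extra +0x1000 to compensate.
    """
    lo = offset & 0xfff
    if lo >= 0x800:
        lo -= 0x1000
    hi = (offset - lo) & 0xffffffff
    return hi, lo

old_hi, old_lo = split(0x12345678)  # detour offset before patching
new_hi, new_lo = split(0x0abcd123)  # detour offset after patching

# The preempted task already executed auipc with old_hi; after the
# switch back it executes the rewritten jalr, which adds new_lo.
mixed = (old_hi + new_lo) & 0xffffffff

# The result is neither the old target nor the new one.
print(hex(mixed))  # 0x12345123
```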

>
> Do you have any guarantee as to the atomicity and ordering of instruction
> fetches?
Not yet. If the region is short, we could use a nop + jalr pair instead.
With only one jalr instruction, the entry update is atomic.

There are already several proposed solutions:
1. Make stop_machine guarantee that all CPUs are out of the preemption point.
2. Expand -fpatchable-function-entry from 4 to 24, and make the detour
code atomic.
3. We want to propose a solution that protects the auipc + jalr pair with
a hardware mask_irq.
For more details, see:
https://www.youtube.com/watch?v=4JkkkXuEvCw


>
> Thanks,
> Mark.
>
> >
> > Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > ---
> >  arch/riscv/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..ee0d39b26794 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -138,7 +138,7 @@ config RISCV
> >       select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
> >       select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
> >       select HAVE_FUNCTION_GRAPH_TRACER
> > -     select HAVE_FUNCTION_TRACER if !XIP_KERNEL
> > +     select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
> >
> >  config ARCH_MMAP_RND_BITS_MIN
> >       default 18 if 64BIT
> > --
> > 2.36.1
> >



--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 0/7] riscv: Optimize function trace
  2023-01-11 10:11 ` [PATCH -next V6 0/7] riscv: Optimize function trace Song Shuai
@ 2023-01-11 14:03   ` Guo Ren
  0 siblings, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-11 14:03 UTC (permalink / raw)
  To: Song Shuai
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, andy.chiu, e.shatokhin,
	linux-riscv, linux-kernel, Guo Ren

On Wed, Jan 11, 2023 at 6:11 PM Song Shuai <suagrfillet@gmail.com> wrote:
>
> <guoren@kernel.org> wrote on Sat, Jan 7, 2023 at 13:36:
> >
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > The previous ftrace detour implementation afc76b8b8011 ("riscv: Using
> > PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains the following problems.
> >
> >  - The most horrible bug is the preemption panic found by Andy [1].
> >    Let's disable preemption for ftrace first, and Andy could continue
> >    the ftrace preemption work.
> >  - The "-fpatchable-function-entry=" CFLAG wasted code size when
> >    !RISCV_ISA_C.
> >  - The ftrace detour implementation wasted code size.
> >  - When livepatching, the trampoline (ftrace_regs_caller) would not
> >    return to <func_prolog+12> but would rather jump to the new function.
> >    So, "REG_L ra, -SZREG(sp)" would not run and the original return
> >    address would not be restored. The kernel is likely to hang or crash
> >    as a result. (Found by Evgenii Shatokhin [4])
> >
> > Patches 1,2,3 fixup above problems.
> >
> > Patches 4,5,6,7 are the features based on reduced detour code
> > patch, we include them in the series for test and maintenance.
> >
> > You can directly try it with:
> > https://github.com/guoren83/linux/tree/ftrace_fixup_v6
> >
> This link is broken, maybe you forgot to push the branch to the remote.
Oh, yes. I've corrected it.

>
> And the patch ("riscv : select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY")
> was folded in the PR ( https://github.com/guoren83/linux/pull/1 )
> I guess you might have omitted it. Hope it debuts in the next version.

Yes, telling the Makefile not to run recordmcount is necessary.


>
> > Make function graph use ftrace directly [2] (patch 4, 5)
> > ========================================================
> >
> > In RISC-V architecture, when we enable the ftrace_graph tracer on some
> > functions, the function tracings on other functions will suffer extra
> > graph tracing work. In essence, graph_ops isn't limited by its func_hash
> > due to the global ftrace_graph_[regs]_call label. That should be
> > corrected.
> >
> > What inspires me is the commit 0c0593b45c9b ("x86/ftrace: Make function
> > graph use ftrace directly") that uses graph_ops::func function to
> > install return_hooker and makes the function called against its
> > func_hash.
> >
> > This series of patches makes function graph use ftrace directly for
> > riscv.
> >
> > If FTRACE_WITH_REGS isn't defined, ftrace_caller keeps ftrace_graph_call
> > so that it can be replaced with the calling of prepare_ftrace_return by
> > the enable/disable helper.
> >
> > As for defining FTRACE_WITH_REGS, ftrace_caller is adjusted to save the
> > necessary regs against the pt_regs layout, so it can reasonably call the
> > graph_ops::func function - ftrace_graph_func. And
> > ftrace_graph_[regs]_call
> > and its enable/disable helper aren't needed.
> >
> > Test log:
> >
> > The tests generated by CONFIG_FTRACE_STARTUP_TEST have passed in the
> > local
> > qemu-system-riscv64 virt machine. The following is the log during
> > startup.
> >
> > ```
> > Nov 15 03:07:13 stage4 kernel: Testing tracer function: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace ops #1:
> > Nov 15 03:07:13 stage4 kernel: (1 0 1 0 0)
> > Nov 15 03:07:13 stage4 kernel: (1 1 2 0 0)
> > Nov 15 03:07:13 stage4 kernel: (2 1 3 0 365)
> > Nov 15 03:07:13 stage4 kernel: (2 2 4 0 399)
> > Nov 15 03:07:13 stage4 kernel: (3 2 4 0 146071)
> > Nov 15 03:07:13 stage4 kernel: (3 3 5 0 146105) PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing dynamic ftrace ops #2:
> > Nov 15 03:07:13 stage4 kernel: (1 0 1 589 0)
> > Nov 15 03:07:13 stage4 kernel: (1 1 2 635 0)
> > Nov 15 03:07:13 stage4 kernel: (2 1 3 1 2)
> > Nov 15 03:07:13 stage4 kernel: (2 2 4 125 126)
> > Nov 15 03:07:13 stage4 kernel: (3 2 4 146001 146078)
> > Nov 15 03:07:13 stage4 kernel: (3 3 5 146035 146112) PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing ftrace recursion: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing ftrace recursion safe: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing ftrace regs: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing tracer nop: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing tracer irqsoff: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup:
> > Nov 15 03:07:13 stage4 kernel: sched: DL replenish lagged too much
> > Nov 15 03:07:13 stage4 kernel: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup_rt: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing tracer wakeup_dl: PASSED
> > Nov 15 03:07:13 stage4 kernel: Testing tracer function_graph: PASSED
> > ```
> >
> > Add WITH_DIRECT_CALLS support [3] (patch 6, 7)
> > ==============================================
> >
> > This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.
> > SAMPLE_FTRACE_DIRECT and SAMPLE_FTRACE_DIRECT_MULTI are also included
> > here as the samples for testing DIRECT_CALLS related interface.
> >
> > First, select DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide the
> > register_ftrace_direct[_multi] interfaces, allowing the user to register
> > a custom trampoline (direct_caller) as the mcount for one or
> > more target functions. modify_ftrace_direct[_multi] are also
> > provided for modifying the direct_caller.
> >
> > At the same time, the samples in ./samples/ftrace/ can be built
> > as kernel modules for testing these interfaces with SAMPLE_FTRACE_DIRECT
> > and SAMPLE_FTRACE_DIRECT_MULTI selected.
> >
> > Second, to make the direct_caller and the other ftrace hooks
> > (e.g. the function/fgraph tracer, k[ret]probes) co-exist, a temporary
> > register
> > is nominated to store the address of direct_caller in
> > ftrace_regs_caller.
> > After the address of direct_caller is set by direct_ops->func and
> > the RESTORE_REGS in ftrace_regs_caller, direct_caller is jumped to
> > by the `jr` instruction.
> >
> > The following tests have been passed in my local qemu-riscv64 virt
> > machine.
> >
> > 1. tests with CONFIG_FTRACE_STARTUP_TEST
> > 2. tests of samples/ftrace/ftrace*.ko
> > 3. manual tests with any combination of the following hooks
> >   - function/function_graph tracer
> >   - ftrace*.ko
> >   - kprobe/kretprobe
> >
> > For your reference, here is the log when function tracer, kretprobe and
> > ftrace-direct-too.ko co-hooks the handle_mm_fault function.
> >
> > ```
> > [root@stage4 tracing]# echo handle_mm_fault > set_ftrace_filter
> > [root@stage4 tracing]# echo 'r:myr handle_mm_fault' > kprobe_events
> > [root@stage4 tracing]# echo function > current_tracer
> > [root@stage4 tracing]# echo 1 > events/kprobes/myr/enable
> > [root@stage4 tracing]# insmod /root/ftrace-direct-too.ko
> > [root@stage4 tracing]#
> > [root@stage4 tracing]# cat trace | tail
> >              cat-388     [000] ...1.   583.051438: myr:
> > (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
> >              cat-388     [000] ...2.   583.057930: handle_mm_fault
> > <-do_page_fault
> >              cat-388     [000] .....   583.057990: my_direct_func:
> > handle mm fault vma=000000002d9fe19c address=ffffffae9b7000 flags=215
> >              cat-388     [000] ...1.   583.058284: myr:
> > (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
> >             tail-389     [001] ...2.   583.059062: handle_mm_fault
> > <-do_page_fault
> >             tail-389     [001] .....   583.059104: my_direct_func:
> > handle mm fault vma=0000000017f3c48e address=aaaaaabebf3000 flags=215
> >             tail-389     [001] ...1.   583.059325: myr:
> > (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
> >             tail-389     [001] ...2.   583.060371: handle_mm_fault
> > <-do_page_fault
> >             tail-389     [001] .....   583.060410: my_direct_func:
> > handle mm fault vma=0000000017f3c48e address=aaaaaabebf1000 flags=255
> >             tail-389     [001] ...1.   583.060996: myr:
> > (do_page_fault+0x16c/0x5f2 <- handle_mm_fault)
> > ```
> > Note1:
> >
> > The checkpatch.pl will output some warnings on this series, like this
> >
> > ```
> > WARNING: Prefer using '"%s...", __func__' to using 'my_direct_func2',
> > this function's name, in a string
> > 111: FILE: samples/ftrace/ftrace-direct-multi-modify.c:48:
> > +"       call my_direct_func2\n"
> > ```
> >
> > The reason is that checkpatch depends on patch context providing the
> > function name. In the above warning, my_direct_func2 has some codeline
> > distance with the changed trunk, so its declaration doesn't come into
> > the patch, and then the warning jumps out.
> >
> > You may notice the location of the `my_ip` variable changes in the 2nd
> > patch. I did that to reduce the warnings to some extent. But killing
> > all the warnings would make the patch less readable, so I stopped here.
> >
> > [1] https://lpc.events/event/16/contributions/1171/
> > [2] https://lore.kernel.org/lkml/20221120084230.910152-1-suagrfillet@gmail.com/
> > [3] https://lore.kernel.org/linux-riscv/20221123142025.1504030-1-suagrfillet@gmail.com/
> > [4] https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
> >
> > Changes in v6:
> >  - Replace 8 with MCOUNT_INSN_SIZE
> >  - Replace "REG_L a1, PT_RA(sp)" with "mv a1, ra"
> >  - Add Evgenii Shatokhin comment
> >
> > Changes in v5:
> > https://lore.kernel.org/linux-riscv/20221208091244.203407-1-guoren@kernel.org/
> >  - Sort Kconfig entries in alphabetical order.
> >
> > Changes in v4:
> > https://lore.kernel.org/linux-riscv/20221129033230.255947-1-guoren@kernel.org/
> >  - Include [3] for maintenance. [Song Shuai]
> >
> > Changes in V3:
> > https://lore.kernel.org/linux-riscv/20221123153950.2911981-1-guoren@kernel.org/
> >  - Include [2] for maintenance. [Song Shuai]
> >
> > Changes in V2:
> > https://lore.kernel.org/linux-riscv/20220921034910.3142465-1-guoren@kernel.org/
> >  - Add Signed-off for preemption fixup.
> >
> > Changes in V1:
> > https://lore.kernel.org/linux-riscv/20220916103817.9490-1-guoren@kernel.org/
> >
> > Andy Chiu (1):
> >   riscv: ftrace: Fixup panic by disabling preemption
> >
> > Guo Ren (2):
> >   riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
> >   riscv: ftrace: Reduce the detour code size to half
> >
> > Song Shuai (4):
> >   riscv: ftrace: Add ftrace_graph_func
> >   riscv: ftrace: Make ftrace_caller call ftrace_graph_func
> >   riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
> >   samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
> >
> >  arch/riscv/Kconfig                          |   3 +-
> >  arch/riscv/Makefile                         |   6 +-
> >  arch/riscv/include/asm/ftrace.h             |  71 ++++++--
> >  arch/riscv/kernel/ftrace.c                  |  91 ++++------
> >  arch/riscv/kernel/mcount-dyn.S              | 181 +++++++++++++-------
> >  samples/ftrace/ftrace-direct-modify.c       |  33 ++++
> >  samples/ftrace/ftrace-direct-multi-modify.c |  37 ++++
> >  samples/ftrace/ftrace-direct-multi.c        |  22 +++
> >  samples/ftrace/ftrace-direct-too.c          |  26 +++
> >  samples/ftrace/ftrace-direct.c              |  22 +++
> >  10 files changed, 355 insertions(+), 137 deletions(-)
> >
> > --
> > 2.36.1
> >
>
>
> --
> Thanks,
> Song



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-11 13:22     ` Guo Ren
@ 2023-01-12 12:05       ` Mark Rutland
  2023-01-28 10:00         ` Guo Ren
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2023-01-12 12:05 UTC (permalink / raw)
  To: Guo Ren
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Wed, Jan 11, 2023 at 09:22:09PM +0800, Guo Ren wrote:
> On Tue, Jan 10, 2023 at 1:20 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Sat, Jan 07, 2023 at 08:35:43AM -0500, guoren@kernel.org wrote:
> > > From: Andy Chiu <andy.chiu@sifive.com>
> > >
> > > In RISC-V, we must use an AUIPC + JALR pair to encode an immediate
> > > that forms a jump to an address more than 4K away. This may cause errors
> > > if we want to enable kernel preemption and remove the patching code's
> > > dependency on stop_machine(). For example, a task could be switched
> > > out on the auipc. If we changed the ftrace function before it was
> > > switched back, it would jump to an address whose updated 11:0
> > > bits are mixed with the previous XLEN:12 part.
> > >
> > > p: patched area performed by dynamic ftrace
> > > ftrace_prologue:
> > > p|      REG_S   ra, -SZREG(sp)
> > > p|      auipc   ra, 0x? ------------> preempted
> > >                                       ...
> > >                               change ftrace function
> > >                                       ...
> > > p|      jalr    -?(ra) <------------- switched back
> > > p|      REG_L   ra, -SZREG(sp)
> > > func:
> > >       xxx
> > >       ret
> >
> > What happens on SMP but !PREEMPTION; can't a CPU be in the middle of this
> > while you're patching the sequence?
> Yes, with PREEMPTION a timer interrupt between auipc & jalr may cause a
> context switch. And riscv uses stop_machine for patch_text, so we may
> modify the auipc part while the preempted task, once switched back,
> executes only the (already updated) jalr part.

Please re-read my question; "!PREEMPTION" means "NOT PREEMPTION".

Ignore preemption entirely and assume two CPUs X and Y are running code
concurrently. Assume CPU X is in the ftrace prologue, and CPU Y is patching
that prologue while CPU X is executing it.

Is that prevented somehow? If not, what happens in that case?

At the very least you can have exactly the same case as on a preemptible kernel
(and in a VM, the hypervisor can preempt the guest at arbitrary times),
because CPU X could end up executing a mixture of the old and new instructions.

More generally, if you don't have strong rules about concurrent modification
and execution of instructions, it may not be safe to modify an instruction as
it is being executed (e.g. if the CPU's instruction fetches aren't atomic).

> > Do you have any guarantee as to the atomicity and ordering of instruction
> > fetches?
> Not yet. If the region is short, we could use a nop + jalr pair instead.

Ok, so as above I do not understand how this is safe. Maybe I am missing
something, but if you don't have a guarantee as to ordering I don't see how you
can safely patch this even if you have atomicity of each instruction update.

Note that if you don't have atomicity of instruction fetches you *cannot*
safely concurrently modify and execute instructions.

> With only one jalr instruction, the entry update is atomic.

I'll have to take your word for that.

As above, I don't think this sequence is safe regardless.

> There are already several proposed solutions:
> 1. Make stop_machine guarantee that all CPUs are out of the preemption point.
> 2. Expand -fpatchable-function-entry from 4 to 24, and make the detour
> code atomic.
> 3. We want to propose a solution that protects the auipc + jalr pair with
> a hardware mask_irq.
> For more details, see:
> https://www.youtube.com/watch?v=4JkkkXuEvCw

Ignoring things which require HW changes, you could consider doing something
like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:

  https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/

... which would replace the address generation with a load, which can be
atomic, and would give you a number of other benefits (e.g. avoiding branch
range limitations, performance benefits as in the cover letter).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-11  9:50   ` Song Shuai
  2023-01-11  9:59     ` Guo Ren
@ 2023-01-13 10:48     ` Song Shuai
  2023-01-17  3:20       ` Guo Ren
  1 sibling, 1 reply; 28+ messages in thread
From: Song Shuai @ 2023-01-13 10:48 UTC (permalink / raw)
  To: guoren
  Cc: anup, e.shatokhin, paul.walmsley, palmer, conor.dooley, heiko,
	rostedt, mhiramat, jolsa, bp, jpoimboe, andy.chiu, linux-riscv,
	linux-kernel

Hi, Guo:

Song Shuai <suagrfillet@gmail.com> wrote on Wed, Jan 11, 2023 at 09:50:
>
> <guoren@kernel.org> wrote on Sat, Jan 7, 2023 at 13:36:
> >
> > From: Song Shuai <suagrfillet@gmail.com>
> >
> > select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> > for ARCH_RV64I in arch/riscv/Kconfig. And add riscv asm code for
> > the ftrace-direct*.c files in samples/ftrace/.
> >
> > Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> > Tested-by: Guo Ren <guoren@kernel.org>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > ---
> Hi, Guo && Evgenii:
>
> As Evgenii comments, this patch has two problems to fix:
>
> 1. the modification of Kconfig is missing
> So we should add it back
>
> 2. the build error resulting from the inclusion of the nospec-branch.h file
> This include is exclusive to the x86 architecture; moving it under the
> x86 #ifdef seems a better way to fix this error
>
> I fixed them in my GitHub repo, but I can't find a reasonable target
> branch in your repo to PR.
> So I paste the link of my branch here, you can pick the first 2
> commits in the next series.
>
> - samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
> - samples: ftrace: Include the nospec-branch.h only for x86
The patch for the 2nd problem mentioned above seems to have been omitted
from the V7 series.
I've pasted the raw patch link here; hope you can add it in the next version.

https://github.com/sugarfillet/linux/commit/9539a80dc6e7d1137ec7a96ebef2ab912a694bd7.patch
>
> Link: https://github.com/sugarfillet/linux/commits/song_ftrace_fix_up_v6
>
> >  samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
> >  samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
> >  samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
> >  samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
> >  samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
> >  5 files changed, 140 insertions(+)
> >
> > diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> > index de5a0f67f320..be7bf472c3c7 100644
> > --- a/samples/ftrace/ftrace-direct-modify.c
> > +++ b/samples/ftrace/ftrace-direct-modify.c
> > @@ -23,6 +23,39 @@ extern void my_tramp2(void *);
> >
> >  static unsigned long my_ip = (unsigned long)schedule;
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm (" .pushsection    .text, \"ax\", @progbits\n"
> > +"      .type           my_tramp1, @function\n"
> > +"      .globl          my_tramp1\n"
> > +"   my_tramp1:\n"
> > +"      addi sp,sp,-16\n"
> > +"      sd   t0,0(sp)\n"
> > +"      sd   ra,8(sp)\n"
> > +"      call my_direct_func1\n"
> > +"      ld   t0,0(sp)\n"
> > +"      ld   ra,8(sp)\n"
> > +"      addi sp,sp,16\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp1, .-my_tramp1\n"
> > +
> > +"      .type           my_tramp2, @function\n"
> > +"      .globl          my_tramp2\n"
> > +"   my_tramp2:\n"
> > +"      addi sp,sp,-16\n"
> > +"      sd   t0,0(sp)\n"
> > +"      sd   ra,8(sp)\n"
> > +"      call my_direct_func2\n"
> > +"      ld   t0,0(sp)\n"
> > +"      ld   ra,8(sp)\n"
> > +"      addi sp,sp,16\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp2, .-my_tramp2\n"
> > +"      .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> > index d52370cad0b6..10884bf418f7 100644
> > --- a/samples/ftrace/ftrace-direct-multi-modify.c
> > +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> > @@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
> >  extern void my_tramp1(void *);
> >  extern void my_tramp2(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm (" .pushsection    .text, \"ax\", @progbits\n"
> > +"      .type           my_tramp1, @function\n"
> > +"      .globl          my_tramp1\n"
> > +"   my_tramp1:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func1\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp1, .-my_tramp1\n"
> > +
> > +"      .type           my_tramp2, @function\n"
> > +"      .globl          my_tramp2\n"
> > +"   my_tramp2:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func2\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"      jr t0\n"
> > +"      .size           my_tramp2, .-my_tramp2\n"
> > +"      .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> > index ec1088922517..a35bf43bf6d7 100644
> > --- a/samples/ftrace/ftrace-direct-multi.c
> > +++ b/samples/ftrace/ftrace-direct-multi.c
> > @@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
> >
> >  extern void my_tramp(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > +"       .type           my_tramp, @function\n"
> > +"       .globl          my_tramp\n"
> > +"   my_tramp:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"       jr t0\n"
> > +"       .size           my_tramp, .-my_tramp\n"
> > +"       .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> > index e13fb59a2b47..3b62e33c2e6d 100644
> > --- a/samples/ftrace/ftrace-direct-too.c
> > +++ b/samples/ftrace/ftrace-direct-too.c
> > @@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
> >
> >  extern void my_tramp(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > +"       .type           my_tramp, @function\n"
> > +"       .globl          my_tramp\n"
> > +"   my_tramp:\n"
> > +"       addi sp,sp,-40\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   a1,8(sp)\n"
> > +"       sd   a2,16(sp)\n"
> > +"       sd   t0,24(sp)\n"
> > +"       sd   ra,32(sp)\n"
> > +"       call my_direct_func\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   a1,8(sp)\n"
> > +"       ld   a2,16(sp)\n"
> > +"       ld   t0,24(sp)\n"
> > +"       ld   ra,32(sp)\n"
> > +"       addi sp,sp,40\n"
> > +"       jr t0\n"
> > +"       .size           my_tramp, .-my_tramp\n"
> > +"       .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> > index 1f769d0db20f..2cfe5a7d2d70 100644
> > --- a/samples/ftrace/ftrace-direct.c
> > +++ b/samples/ftrace/ftrace-direct.c
> > @@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
> >
> >  extern void my_tramp(void *);
> >
> > +#ifdef CONFIG_RISCV
> > +
> > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > +"       .type           my_tramp, @function\n"
> > +"       .globl          my_tramp\n"
> > +"   my_tramp:\n"
> > +"       addi sp,sp,-24\n"
> > +"       sd   a0,0(sp)\n"
> > +"       sd   t0,8(sp)\n"
> > +"       sd   ra,16(sp)\n"
> > +"       call my_direct_func\n"
> > +"       ld   a0,0(sp)\n"
> > +"       ld   t0,8(sp)\n"
> > +"       ld   ra,16(sp)\n"
> > +"       addi sp,sp,24\n"
> > +"       jr t0\n"
> > +"       .size           my_tramp, .-my_tramp\n"
> > +"       .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_RISCV */
> > +
> >  #ifdef CONFIG_X86_64
> >
> >  #include <asm/ibt.h>
> > --
> > 2.36.1
> >
>
>
> --
> Thanks,
> Song



-- 
Thanks,
Song

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
  2023-01-13 10:48     ` Song Shuai
@ 2023-01-17  3:20       ` Guo Ren
  0 siblings, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-17  3:20 UTC (permalink / raw)
  To: Song Shuai
  Cc: anup, e.shatokhin, paul.walmsley, palmer, conor.dooley, heiko,
	rostedt, mhiramat, jolsa, bp, jpoimboe, andy.chiu, linux-riscv,
	linux-kernel

On Fri, Jan 13, 2023 at 6:48 PM Song Shuai <suagrfillet@gmail.com> wrote:
>
> Hi, Guo:
>
> Song Shuai <suagrfillet@gmail.com> wrote on Wed, Jan 11, 2023 at 09:50:
> >
> > <guoren@kernel.org> wrote on Sat, Jan 7, 2023 at 13:36:
> > >
> > > From: Song Shuai <suagrfillet@gmail.com>
> > >
> > > select HAVE_SAMPLE_FTRACE_DIRECT and HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> > > for ARCH_RV64I in arch/riscv/Kconfig. And add riscv asm code for
> > > the ftrace-direct*.c files in samples/ftrace/.
> > >
> > > Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> > > Tested-by: Guo Ren <guoren@kernel.org>
> > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > ---
> > Hi,Guo && Evgenii:
> >
> > As Evgenii comments, this patch has two problems to fix:
> >
> > 1. the modification of Kconfig is missing
> > So we should add it back
> >
> > 2. the build error resulting from the inclusion of the nospec-branch.h file
> > This include is exclusive to the x86 architecture; moving it under the
> > x86 #ifdef seems a better way to fix this error
> >
> > I fixed them in my GitHub repo, but I can't find a reasonable target
> > branch in your repo to PR.
> > So I paste the link of my branch here, you can pick the first 2
> > commits in the next series.
> >
> > - samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI]
> > - samples: ftrace: Include the nospec-branch.h only for x86
> The patch for the 2nd problem mentioned above seems to have been omitted
> from the V7 series.
> I've pasted the raw patch link here; hope you can add it in the next version.
>
> https://github.com/sugarfillet/linux/commit/9539a80dc6e7d1137ec7a96ebef2ab912a694bd7.patch
Why do we need an x86 patch in a riscv series? Without the patch,
what's the compile error on riscv?

> >
> > Link: https://github.com/sugarfillet/linux/commits/song_ftrace_fix_up_v6
> >
> > >  samples/ftrace/ftrace-direct-modify.c       | 33 ++++++++++++++++++
> > >  samples/ftrace/ftrace-direct-multi-modify.c | 37 +++++++++++++++++++++
> > >  samples/ftrace/ftrace-direct-multi.c        | 22 ++++++++++++
> > >  samples/ftrace/ftrace-direct-too.c          | 26 +++++++++++++++
> > >  samples/ftrace/ftrace-direct.c              | 22 ++++++++++++
> > >  5 files changed, 140 insertions(+)
> > >
> > > diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> > > index de5a0f67f320..be7bf472c3c7 100644
> > > --- a/samples/ftrace/ftrace-direct-modify.c
> > > +++ b/samples/ftrace/ftrace-direct-modify.c
> > > @@ -23,6 +23,39 @@ extern void my_tramp2(void *);
> > >
> > >  static unsigned long my_ip = (unsigned long)schedule;
> > >
> > > +#ifdef CONFIG_RISCV
> > > +
> > > +asm (" .pushsection    .text, \"ax\", @progbits\n"
> > > +"      .type           my_tramp1, @function\n"
> > > +"      .globl          my_tramp1\n"
> > > +"   my_tramp1:\n"
> > > +"      addi sp,sp,-16\n"
> > > +"      sd   t0,0(sp)\n"
> > > +"      sd   ra,8(sp)\n"
> > > +"      call my_direct_func1\n"
> > > +"      ld   t0,0(sp)\n"
> > > +"      ld   ra,8(sp)\n"
> > > +"      addi sp,sp,16\n"
> > > +"      jr t0\n"
> > > +"      .size           my_tramp1, .-my_tramp1\n"
> > > +
> > > +"      .type           my_tramp2, @function\n"
> > > +"      .globl          my_tramp2\n"
> > > +"   my_tramp2:\n"
> > > +"      addi sp,sp,-16\n"
> > > +"      sd   t0,0(sp)\n"
> > > +"      sd   ra,8(sp)\n"
> > > +"      call my_direct_func2\n"
> > > +"      ld   t0,0(sp)\n"
> > > +"      ld   ra,8(sp)\n"
> > > +"      addi sp,sp,16\n"
> > > +"      jr t0\n"
> > > +"      .size           my_tramp2, .-my_tramp2\n"
> > > +"      .popsection\n"
> > > +);
> > > +
> > > +#endif /* CONFIG_RISCV */
> > > +
> > >  #ifdef CONFIG_X86_64
> > >
> > >  #include <asm/ibt.h>
> > > diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> > > index d52370cad0b6..10884bf418f7 100644
> > > --- a/samples/ftrace/ftrace-direct-multi-modify.c
> > > +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> > > @@ -21,6 +21,43 @@ void my_direct_func2(unsigned long ip)
> > >  extern void my_tramp1(void *);
> > >  extern void my_tramp2(void *);
> > >
> > > +#ifdef CONFIG_RISCV
> > > +
> > > +asm (" .pushsection    .text, \"ax\", @progbits\n"
> > > +"      .type           my_tramp1, @function\n"
> > > +"      .globl          my_tramp1\n"
> > > +"   my_tramp1:\n"
> > > +"       addi sp,sp,-24\n"
> > > +"       sd   a0,0(sp)\n"
> > > +"       sd   t0,8(sp)\n"
> > > +"       sd   ra,16(sp)\n"
> > > +"       call my_direct_func1\n"
> > > +"       ld   a0,0(sp)\n"
> > > +"       ld   t0,8(sp)\n"
> > > +"       ld   ra,16(sp)\n"
> > > +"       addi sp,sp,24\n"
> > > +"      jr t0\n"
> > > +"      .size           my_tramp1, .-my_tramp1\n"
> > > +
> > > +"      .type           my_tramp2, @function\n"
> > > +"      .globl          my_tramp2\n"
> > > +"   my_tramp2:\n"
> > > +"       addi sp,sp,-24\n"
> > > +"       sd   a0,0(sp)\n"
> > > +"       sd   t0,8(sp)\n"
> > > +"       sd   ra,16(sp)\n"
> > > +"       call my_direct_func2\n"
> > > +"       ld   a0,0(sp)\n"
> > > +"       ld   t0,8(sp)\n"
> > > +"       ld   ra,16(sp)\n"
> > > +"       addi sp,sp,24\n"
> > > +"      jr t0\n"
> > > +"      .size           my_tramp2, .-my_tramp2\n"
> > > +"      .popsection\n"
> > > +);
> > > +
> > > +#endif /* CONFIG_RISCV */
> > > +
> > >  #ifdef CONFIG_X86_64
> > >
> > >  #include <asm/ibt.h>
> > > diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> > > index ec1088922517..a35bf43bf6d7 100644
> > > --- a/samples/ftrace/ftrace-direct-multi.c
> > > +++ b/samples/ftrace/ftrace-direct-multi.c
> > > @@ -16,6 +16,28 @@ void my_direct_func(unsigned long ip)
> > >
> > >  extern void my_tramp(void *);
> > >
> > > +#ifdef CONFIG_RISCV
> > > +
> > > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > > +"       .type           my_tramp, @function\n"
> > > +"       .globl          my_tramp\n"
> > > +"   my_tramp:\n"
> > > +"       addi sp,sp,-24\n"
> > > +"       sd   a0,0(sp)\n"
> > > +"       sd   t0,8(sp)\n"
> > > +"       sd   ra,16(sp)\n"
> > > +"       call my_direct_func\n"
> > > +"       ld   a0,0(sp)\n"
> > > +"       ld   t0,8(sp)\n"
> > > +"       ld   ra,16(sp)\n"
> > > +"       addi sp,sp,24\n"
> > > +"       jr t0\n"
> > > +"       .size           my_tramp, .-my_tramp\n"
> > > +"       .popsection\n"
> > > +);
> > > +
> > > +#endif /* CONFIG_RISCV */
> > > +
> > >  #ifdef CONFIG_X86_64
> > >
> > >  #include <asm/ibt.h>
> > > diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> > > index e13fb59a2b47..3b62e33c2e6d 100644
> > > --- a/samples/ftrace/ftrace-direct-too.c
> > > +++ b/samples/ftrace/ftrace-direct-too.c
> > > @@ -18,6 +18,32 @@ void my_direct_func(struct vm_area_struct *vma,
> > >
> > >  extern void my_tramp(void *);
> > >
> > > +#ifdef CONFIG_RISCV
> > > +
> > > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > > +"       .type           my_tramp, @function\n"
> > > +"       .globl          my_tramp\n"
> > > +"   my_tramp:\n"
> > > +"       addi sp,sp,-40\n"
> > > +"       sd   a0,0(sp)\n"
> > > +"       sd   a1,8(sp)\n"
> > > +"       sd   a2,16(sp)\n"
> > > +"       sd   t0,24(sp)\n"
> > > +"       sd   ra,32(sp)\n"
> > > +"       call my_direct_func\n"
> > > +"       ld   a0,0(sp)\n"
> > > +"       ld   a1,8(sp)\n"
> > > +"       ld   a2,16(sp)\n"
> > > +"       ld   t0,24(sp)\n"
> > > +"       ld   ra,32(sp)\n"
> > > +"       addi sp,sp,40\n"
> > > +"       jr t0\n"
> > > +"       .size           my_tramp, .-my_tramp\n"
> > > +"       .popsection\n"
> > > +);
> > > +
> > > +#endif /* CONFIG_RISCV */
> > > +
> > >  #ifdef CONFIG_X86_64
> > >
> > >  #include <asm/ibt.h>
> > > diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> > > index 1f769d0db20f..2cfe5a7d2d70 100644
> > > --- a/samples/ftrace/ftrace-direct.c
> > > +++ b/samples/ftrace/ftrace-direct.c
> > > @@ -15,6 +15,28 @@ void my_direct_func(struct task_struct *p)
> > >
> > >  extern void my_tramp(void *);
> > >
> > > +#ifdef CONFIG_RISCV
> > > +
> > > +asm ("       .pushsection    .text, \"ax\", @progbits\n"
> > > +"       .type           my_tramp, @function\n"
> > > +"       .globl          my_tramp\n"
> > > +"   my_tramp:\n"
> > > +"       addi sp,sp,-24\n"
> > > +"       sd   a0,0(sp)\n"
> > > +"       sd   t0,8(sp)\n"
> > > +"       sd   ra,16(sp)\n"
> > > +"       call my_direct_func\n"
> > > +"       ld   a0,0(sp)\n"
> > > +"       ld   t0,8(sp)\n"
> > > +"       ld   ra,16(sp)\n"
> > > +"       addi sp,sp,24\n"
> > > +"       jr t0\n"
> > > +"       .size           my_tramp, .-my_tramp\n"
> > > +"       .popsection\n"
> > > +);
> > > +
> > > +#endif /* CONFIG_RISCV */
> > > +
> > >  #ifdef CONFIG_X86_64
> > >
> > >  #include <asm/ibt.h>
> > > --
> > > 2.36.1
> > >
> >
> >
> > --
> > Thanks,
> > Song
>
>
>
> --
> Thanks,
> Song



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-12 12:05       ` Mark Rutland
@ 2023-01-28 10:00         ` Guo Ren
  2023-01-29  5:36           ` Guo Ren
  2023-01-30 11:17           ` Mark Rutland
  0 siblings, 2 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-28 10:00 UTC (permalink / raw)
  To: Mark Rutland
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Thu, Jan 12, 2023 at 8:05 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Jan 11, 2023 at 09:22:09PM +0800, Guo Ren wrote:
> > On Tue, Jan 10, 2023 at 1:20 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Sat, Jan 07, 2023 at 08:35:43AM -0500, guoren@kernel.org wrote:
> > > > From: Andy Chiu <andy.chiu@sifive.com>
> > > >
> > > > In RISC-V, we must use an AUIPC + JALR pair to encode an immediate,
> > > > forming a jump to an address more than 4K away. This may cause errors
> > > > if we enable kernel preemption and remove the dependency of code
> > > > patching on stop_machine(). For example, suppose a task was switched
> > > > out on the auipc, and the ftrace function was changed before it was
> > > > switched back; it would then jump to an address whose updated 11:0
> > > > bits mix with the previous XLEN:12 part.
> > > >
> > > > p: patched area performed by dynamic ftrace
> > > > ftrace_prologue:
> > > > p|      REG_S   ra, -SZREG(sp)
> > > > p|      auipc   ra, 0x? ------------> preempted
> > > >                                       ...
> > > >                               change ftrace function
> > > >                                       ...
> > > > p|      jalr    -?(ra) <------------- switched back
> > > > p|      REG_L   ra, -SZREG(sp)
> > > > func:
> > > >       xxx
> > > >       ret
> > >
> > > What happens on SMP but not !PREEMPTION; can't a CPU be in the middle of this
> > > while you're patching the sequence?
> > Yes, with PREEMPTION, a timer interrupt between auipc & jalr may cause
> > a context switch. And riscv uses stop_machine() for patch_text. Then, we
> > may modify the auipc part, but the switched-back task would only execute
> > the jalr part.
>
> Please re-read my question; "!PREEMPTION" means "NOT PREEMPTION".
>
> Ignore preemption entirely and assume two CPUs X and Y are running code
> concurrently. Assume CPU X is in the ftrace prologue, and CPU Y is patching
> that prologue while CPU X is executing it.
>
> Is that prevented somehow? If not, what happens in that case?
>
> At the very least you can have exactly the same case as on a preemptible kernel
> (and in a VM, the hypervisor can preempt the guest at arbitrary times),
> because CPU X could end up executing a mixture of the old and new instructions.
>
> More generally, if you don't have strong rules about concurrent modification
> and execution of instructions, it may not be safe to modify an instruction as
> it is being executed (e.g. if the CPU's instruction fetches aren't atomic).
>
> > > Do you have any guarantee as to the atomicity and ordering of instruction
> > > fetches?
> > Not yet. If the region is short, we could use a nop + jalr pair instead.
>
> Ok, so as above I do not understand how this is safe. Maybe I am missing
> something, but if you don't have a guarantee as to ordering I don't see how you
> can safely patch this even if you have atomicity of each instruction update.
>
> Note that if you don't have atomicity of instruction fetches you *cannot*
> safely concurrently modify and execute instructions.
>
> > Only one jalr instruction makes the entry atomic.
>
> I'll have to take your word for that.
>
> As above, I don't think this sequence is safe regardless.
>
> > There are already several proposed solutions:
> > 1. Make stop_machine() guarantee all CPUs are out of the preemption point.
> > 2. Expand -fpatchable-function-entry from 4 to 24, and make the detour
> > code atomic.
> > 3. We want to propose a hardware solution that masks IRQs around auipc.
> > For more details, see:
> > https://www.youtube.com/watch?v=4JkkkXuEvCw
>
> Ignoring things which require HW changes, you could consider doing something
> like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:
>
>   https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/
The idea of DYNAMIC_FTRACE_WITH_CALL_OPS (using a data load/store +
indirect jump instead of auipc+jalr) is similar to Andy's solution
(see the youtube link, last page of the slides). But the key problem is
that it also expands the function prologue. 64BIT is already
expensive, and we can't afford more of it. I would instead seek a
new atomic auipc+jalr ISA extension to solve this problem.

DYNAMIC_FTRACE_WITH_CALL_OPS would speed up ftrace_(regs)_caller
(mostly for kernel debugging), but it won't help
DYNAMIC_FTRACE_WITH_DIRECT_CALLS. So I do not care much about the
ftrace_(regs)_caller performance gain.

>
> ... which would replace the address generation with a load, which can be
> atomic, and would give you a number of other benefits (e.g. avoiding branch
> range limitations, performance benefits as in the cover letter).
>
> Thanks,
> Mark.



-- 
Best Regards
 Guo Ren


* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-28 10:00         ` Guo Ren
@ 2023-01-29  5:36           ` Guo Ren
  2023-01-30 11:17           ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-01-29  5:36 UTC (permalink / raw)
  To: Mark Rutland
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Sat, Jan 28, 2023 at 6:00 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Jan 12, 2023 at 8:05 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Wed, Jan 11, 2023 at 09:22:09PM +0800, Guo Ren wrote:
> > > On Tue, Jan 10, 2023 at 1:20 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Sat, Jan 07, 2023 at 08:35:43AM -0500, guoren@kernel.org wrote:
> > > > > From: Andy Chiu <andy.chiu@sifive.com>
> > > > >
> > > > > In RISC-V, we must use an AUIPC + JALR pair to encode an immediate,
> > > > > forming a jump to an address more than 4K away. This may cause errors
> > > > > if we enable kernel preemption and remove the dependency of code
> > > > > patching on stop_machine(). For example, suppose a task was switched
> > > > > out on the auipc, and the ftrace function was changed before it was
> > > > > switched back; it would then jump to an address whose updated 11:0
> > > > > bits mix with the previous XLEN:12 part.
> > > > >
> > > > > p: patched area performed by dynamic ftrace
> > > > > ftrace_prologue:
> > > > > p|      REG_S   ra, -SZREG(sp)
> > > > > p|      auipc   ra, 0x? ------------> preempted
> > > > >                                       ...
> > > > >                               change ftrace function
> > > > >                                       ...
> > > > > p|      jalr    -?(ra) <------------- switched back
> > > > > p|      REG_L   ra, -SZREG(sp)
> > > > > func:
> > > > >       xxx
> > > > >       ret
> > > >
> > > > What happens on SMP but not !PREEMPTION; can't a CPU be in the middle of this
> > > > while you're patching the sequence?
> > > Yes, with PREEMPTION, a timer interrupt between auipc & jalr may cause
> > > a context switch. And riscv uses stop_machine() for patch_text. Then, we
> > > may modify the auipc part, but the switched-back task would only execute
> > > the jalr part.
> >
> > Please re-read my question; "!PREEMPTION" means "NOT PREEMPTION".
> >
> > Ignore preemption entirely and assume two CPUs X and Y are running code
> > concurrently. Assume CPU X is in the ftrace prologue, and CPU Y is patching
> > that prologue while CPU X is executing it.
> >
> > Is that prevented somehow? If not, what happens in that case?
> >
> > At the very least you can have exactly the same case as on a preemptible kernel
> > (and in a VM, the hypervisor can preempt the guest at arbitrary times),
> > because CPU X could end up executing a mixture of the old and new instructions.
> >
> > More generally, if you don't have strong rules about concurrent modification
> > and execution of instructions, it may not be safe to modify an instruction as
> > it is being executed (e.g. if the CPU's instruction fetches aren't atomic).
> >
> > > > Do you have any guarantee as to the atomicity and ordering of instruction
> > > > fetches?
> > > Not yet. If the region is short, we could use a nop + jalr pair instead.
> >
> > Ok, so as above I do not understand how this is safe. Maybe I am missing
> > something, but if you don't have a guarantee as to ordering I don't see how you
> > can safely patch this even if you have atomicity of each instruction update.
> >
> > Note that if you don't have atomicity of instruction fetches you *cannot*
> > safely concurrently modify and execute instructions.
> >
> > > Only one jalr instruction makes the entry atomic.
> >
> > I'll have to take your word for that.
> >
> > As above, I don't think this sequence is safe regardless.
> >
> > > There are already several proposed solutions:
> > > 1. Make stop_machine() guarantee all CPUs are out of the preemption point.
> > > 2. Expand -fpatchable-function-entry from 4 to 24, and make the detour
> > > code atomic.
> > > 3. We want to propose a hardware solution that masks IRQs around auipc.
> > > For more details, see:
> > > https://www.youtube.com/watch?v=4JkkkXuEvCw
> >
> > Ignoring things which require HW changes, you could consider doing something
> > like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:
> >
> >   https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/
> The idea of DYNAMIC_FTRACE_WITH_CALL_OPS (using a data load/store +
> indirect jump instead of auipc+jalr) is similar to Andy's solution
> (see the youtube link, last page of the slides). But the key problem is
> that it also expands the function prologue. 64BIT is already
> expensive, and we can't afford more of it. I would instead seek a
> new atomic auipc+jalr ISA extension to solve this problem.
The atomicity here means:
 - the auipc + jalr pair won't be interrupted
 - the auipc + jalr pair should be 64-bit aligned, so that one sd
instruction could update both words atomically.


>
> DYNAMIC_FTRACE_WITH_CALL_OPS  would speed up ftrace_(regs)_caller
> (Mostly for kernel debug), but it won't help
> DYNAMIC_FTRACE_WITH_DIRECT_CALLS. So I do not so care about the
> ftrace_(regs)_caller performance gain.
>
> >
> > ... which would replace the address generation with a load, which can be
> > atomic, and would give you a number of other benefits (e.g. avoiding branch
> > range limitations, performance benefits as in the cover letter).
> >
> > Thanks,
> > Mark.
>
>
>
> --
> Best Regards
>  Guo Ren



-- 
Best Regards
 Guo Ren


* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-28 10:00         ` Guo Ren
  2023-01-29  5:36           ` Guo Ren
@ 2023-01-30 11:17           ` Mark Rutland
  2023-02-07  2:31             ` Guo Ren
  1 sibling, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2023-01-30 11:17 UTC (permalink / raw)
  To: Guo Ren
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Sat, Jan 28, 2023 at 06:00:20PM +0800, Guo Ren wrote:
> On Thu, Jan 12, 2023 at 8:05 PM Mark Rutland <mark.rutland@arm.com> wrote:

> > Ignoring things which require HW changes, you could consider doing something
> > like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:
> >
> >   https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/
> The idea of DYNAMIC_FTRACE_WITH_CALL_OPS (Using data load/store +
> indirect jump instead of auipc+jalr) is similar to Andy's solution
> (See youtube link, last page of ppt).

Sure; I was present in that room and I spoke with Andy at the time.

The solutions are similar, but the important detail with
DYNAMIC_FTRACE_WITH_CALL_OPS is that the load and indirect branch is moved into
a common trampoline so that each call-site can be smaller. The ops pointer is
placed *before* the function entry point and doesn't need to be skipped with a
direct branch (which Andy's approach could also do if he aligned functions
similarly).

> But the key problem is you also expand the size of the prologue of the
> function. 64BIT is already expensive, and we can't afford more of it. I would
> change to seek a new atomic auipc+jalr ISA extension to solve this problem.

Sure, and that's nice for *new* hardware, but I'm talking about a solution
which works on *current* hardware.

> DYNAMIC_FTRACE_WITH_CALL_OPS  would speed up ftrace_(regs)_caller (Mostly for
> kernel debug), but it won't help DYNAMIC_FTRACE_WITH_DIRECT_CALLS. So I do
> not so care about the ftrace_(regs)_caller performance gain.

Actually, the plan is that it *will* help DYNAMIC_FTRACE_WITH_DIRECT_CALLS; we
just didn't make all the necessary changes in one go.

Florent Revest is looking at implementing that by placing the direct call
pointer into the ops, so the common trampoline can load that directly.

He has an older draft available at:

  https://github.com/FlorentRevest/linux/commits/indirect-direct-calls-3

... and since then, having spoken to Steven, we came up with a plan to make all
direct calls require an ops (which is the case for DIRECT_CALLS_MULTI), and
place a trampoline pointer in the ops.

That way, the common trampoline can do something like (in arm64 asm):

| 	LDR	<tmp>, [<ops>, #OPS_TRAMP_PTR]
| 	CBNZ	<tmp>, __call_tramp_directly
| 
| 	// ... regular regs trampoline logic here 
| 
| __call_tramp_directly:
| 
| 	// ... shuffle registers here
| 	
| 	BR	<tmp>

... and I believe the same should work for riscv.

Thanks,
Mark.


* Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption
  2023-01-30 11:17           ` Mark Rutland
@ 2023-02-07  2:31             ` Guo Ren
  0 siblings, 0 replies; 28+ messages in thread
From: Guo Ren @ 2023-02-07  2:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: anup, paul.walmsley, palmer, conor.dooley, heiko, rostedt,
	mhiramat, jolsa, bp, jpoimboe, suagrfillet, andy.chiu,
	e.shatokhin, linux-riscv, linux-kernel

On Mon, Jan 30, 2023 at 7:17 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Sat, Jan 28, 2023 at 06:00:20PM +0800, Guo Ren wrote:
> > On Thu, Jan 12, 2023 at 8:05 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> > > Ignoring things which require HW changes, you could consider doing something
> > > like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:
> > >
> > >   https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/
> > The idea of DYNAMIC_FTRACE_WITH_CALL_OPS (Using data load/store +
> > indirect jump instead of auipc+jalr) is similar to Andy's solution
> > (See youtube link, last page of ppt).
>
> Sure; I was present in that room and I spoke with Andy at the time.
>
> The solutions are similar, but the important detail with
> DYNAMIC_FTRACE_WITH_CALL_OPS is that the load and indirect branch is moved into
> a common trampoline so that each call-site can be smaller. The ops pointer is
> placed *before* the function entry point and doesn't need to be skipped with a
> direct branch (which Andy's approach could also do if he aligned functions
> similarly).
>
> > But the key problem is you also expand the size of the prologue of the
> > function. 64BIT is already expensive, and we can't afford more of it. I would
> > change to seek a new atomic auipc+jalr ISA extension to solve this problem.
>
> Sure, and that's nice for *new* hardware, but I'm talking about a solution
> which works on *current* hardware.
>
> > DYNAMIC_FTRACE_WITH_CALL_OPS  would speed up ftrace_(regs)_caller (Mostly for
> > kernel debug), but it won't help DYNAMIC_FTRACE_WITH_DIRECT_CALLS. So I do
> > not so care about the ftrace_(regs)_caller performance gain.
>
> Actually, the plan is that it *will* help DYNAMIC_FTRACE_WITH_DIRECT_CALLS; we
> just didn't make all the necessary changes in one go.
>
> Florent Revest is looking at implementing that by placing the direct call
> pointer into the ops, so the common trampoline can load that directly.
>
> He has an older draft available at:
>
>   https://github.com/FlorentRevest/linux/commits/indirect-direct-calls-3
Thx for sharing :)

>
> ... and since then, having spoken to Steven, we came up with a plan to make all
> direct calls require an ops (which is the case for DIRECT_CALLS_MULTI), and
> place a trampoline pointer in the ops.
>
> That way, the common trampoline can do something like (in arm64 asm):
>
> |       LDR     <tmp>, [<ops>, #OPS_TRAMP_PTR]
> |       CBNZ    <tmp>, __call_tramp_directly
> |
> |       // ... regular regs trampoline logic here
> |
> | __call_tramp_directly:
> |
> |       // ... shuffle registers here
> |
> |       BR      <tmp>
>
> ... and I believe the same should work for riscv.
I agree; I will try that next.

>
> Thanks,
> Mark.



-- 
Best Regards
 Guo Ren


end of thread, other threads:[~2023-02-07  2:31 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-07 13:35 [PATCH -next V6 0/7] riscv: Optimize function trace guoren
2023-01-07 13:35 ` [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption guoren
2023-01-09 17:19   ` Mark Rutland
2023-01-11 13:22     ` Guo Ren
2023-01-12 12:05       ` Mark Rutland
2023-01-28 10:00         ` Guo Ren
2023-01-29  5:36           ` Guo Ren
2023-01-30 11:17           ` Mark Rutland
2023-02-07  2:31             ` Guo Ren
2023-01-07 13:35 ` [PATCH -next V6 2/7] riscv: ftrace: Remove wasted nops for !RISCV_ISA_C guoren
2023-01-07 13:35 ` [PATCH -next V6 3/7] riscv: ftrace: Reduce the detour code size to half guoren
2023-01-10 17:13   ` Evgenii Shatokhin
2023-01-11  9:58     ` Guo Ren
2023-01-07 13:35 ` [PATCH -next V6 4/7] riscv: ftrace: Add ftrace_graph_func guoren
2023-01-07 13:35 ` [PATCH -next V6 5/7] riscv: ftrace: Make ftrace_caller call ftrace_graph_func guoren
2023-01-10 17:16   ` Evgenii Shatokhin
2023-01-11  8:23     ` Guo Ren
2023-01-11  8:41       ` Guo Ren
2023-01-07 13:35 ` [PATCH -next V6 6/7] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support guoren
2023-01-07 13:35 ` [PATCH -next V6 7/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
2023-01-10 13:08   ` Evgenii Shatokhin
2023-01-10 13:50   ` Evgenii Shatokhin
2023-01-11  9:50   ` Song Shuai
2023-01-11  9:59     ` Guo Ren
2023-01-13 10:48     ` Song Shuai
2023-01-17  3:20       ` Guo Ren
2023-01-11 10:11 ` [PATCH -next V6 0/7] riscv: Optimize function trace Song Shuai
2023-01-11 14:03   ` Guo Ren
