* [RFC PATCH 0/9] powerpc: ftrace updates
@ 2023-12-08 16:30 Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h Naveen N Rao
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

Early RFC.

This series attempts to address a couple of issues with the existing
support for ftrace on powerpc, with a view towards improving performance
when ftrace is not enabled. See patch 6 for more details.

Patches 7 and 8 implement support for ftrace direct calls, through 
adding support for DYNAMIC_FTRACE_WITH_CALL_OPS.

The first 5 patches are minor cleanups and updates, and can go in 
separately.

This series depends on Benjamin Gray's series adding support for 
patch_ulong().

I have lightly tested this patch set and it appears to be working well. As
described in patch 6, the context_switch microbenchmark shows an improvement
of ~6% with this series when ftrace is disabled. Performance when ftrace is
enabled drops due to how DYNAMIC_FTRACE_WITH_CALL_OPS works, and due to the
support for direct calls. Some of that can hopefully be improved, if this
approach is otherwise ok.

- Naveen



Naveen N Rao (8):
  powerpc/ftrace: Fix indentation in ftrace.h
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/ftrace: Remove nops after the call to ftrace_stub
  powerpc/kprobes: Use ftrace to determine if a probe is at function
    entry
  powerpc/ftrace: Update and move function profile instructions
    out-of-line
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc

Sathvika Vasireddy (1):
  powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B

 arch/powerpc/Kconfig                        |   6 +
 arch/powerpc/Makefile                       |   6 +-
 arch/powerpc/include/asm/code-patching.h    |  15 +-
 arch/powerpc/include/asm/ftrace.h           |  35 ++-
 arch/powerpc/include/asm/linkage.h          |   3 -
 arch/powerpc/kernel/asm-offsets.c           |   7 +
 arch/powerpc/kernel/kprobes.c               |  69 +++++-
 arch/powerpc/kernel/trace/ftrace.c          | 231 ++++++++++++++++----
 arch/powerpc/kernel/trace/ftrace_entry.S    | 182 +++++++++++----
 samples/ftrace/ftrace-direct-modify.c       |  94 +++++++-
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +++++++++-
 samples/ftrace/ftrace-direct-multi.c        |  64 +++++-
 samples/ftrace/ftrace-direct-too.c          |  72 +++++-
 samples/ftrace/ftrace-direct.c              |  61 +++++-
 14 files changed, 845 insertions(+), 110 deletions(-)


base-commit: 9a15ae60f2c9707433b01e55815cd9142be102b2
prerequisite-patch-id: 38d3e705bf2e27cfa5e3ba369a6ded84ba6615c2
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: f2328625ae2193c3c8e336b154b62030940cece8
-- 
2.43.0



* [RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

Replace seven spaces with a tab character to fix an indentation issue
reported by the kernel test robot.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311221731.alUwTDIm-lkp@intel.com/
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/include/asm/ftrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 9e5a39b6a311..1ebd2ca97f12 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -25,7 +25,7 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
 		addr += MCOUNT_INSN_SIZE;
 
-       return addr;
+	return addr;
 }
 
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
-- 
2.43.0



* [RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub Naveen N Rao
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.
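
For reference, the resulting 32-bit entry sequence after ftrace init with
this change would roughly be (a sketch, per the description above):
	mflr	r0
	nop		# was: stw r0,4(r1)
	nop		# was: bl _mcount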

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform that store within the ftrace handler instead, similar to
64-bit powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..2956196c98ff 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -229,13 +229,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
 		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
 		if (!ret)
-			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+						 ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
 		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
 		ret = ftrace_read_inst(ip - 4, &old);
 		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
 			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+						  ppc_inst(PPC_RAW_NOP()));
 		}
 	} else {
 		return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 40677416d7b2..17d1ed3d0b40 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
+	/* Save the original return address in A's stack frame */
+	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
 	SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-	/* Save the original return address in A's stack frame */
-	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
 	/* Ok to continue? */
 	lbz	r3, PACA_FTRACE_ENABLED(r13)
 	cmpdi	r3, 0
-- 
2.43.0



* [RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B Naveen N Rao
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

ftrace_stub is within the same compilation unit (CU), so there is no need
for a subsequent nop instruction after the call to it.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace_entry.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 17d1ed3d0b40..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -162,7 +162,6 @@ _GLOBAL(ftrace_regs_caller)
 .globl ftrace_regs_call
 ftrace_regs_call:
 	bl	ftrace_stub
-	nop
 	ftrace_regs_exit 1
 
 _GLOBAL(ftrace_caller)
@@ -171,7 +170,6 @@ _GLOBAL(ftrace_caller)
 .globl ftrace_call
 ftrace_call:
 	bl	ftrace_stub
-	nop
 	ftrace_regs_exit 0
 
 _GLOBAL(ftrace_stub)
-- 
2.43.0



* [RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (2 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

From: Sathvika Vasireddy <sv@linux.ibm.com>

Commit d49a0626216b95 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT")
introduced a generic function-alignment infrastructure. Move to using
FUNCTION_ALIGNMENT_4B on powerpc, to use the same alignment as that of
the existing _GLOBAL macro.

Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
---
 arch/powerpc/Kconfig               | 1 +
 arch/powerpc/include/asm/linkage.h | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..318e5c1b7454 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -189,6 +189,7 @@ config PPC
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	select FUNCTION_ALIGNMENT_4B
 	select GENERIC_ATOMIC64			if PPC32
 	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
 	select GENERIC_CMOS_UPDATE
diff --git a/arch/powerpc/include/asm/linkage.h b/arch/powerpc/include/asm/linkage.h
index b88d1d2cf304..b71b9582e754 100644
--- a/arch/powerpc/include/asm/linkage.h
+++ b/arch/powerpc/include/asm/linkage.h
@@ -4,9 +4,6 @@
 
 #include <asm/types.h>
 
-#define __ALIGN		.align 2
-#define __ALIGN_STR	".align 2"
-
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 #define cond_syscall(x) \
 	asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n"		\
-- 
2.43.0



* [RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (3 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-11  5:39   ` Masami Hiramatsu
  2023-12-08 16:30 ` [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line Naveen N Rao
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and treat all
instructions up to that offset as being at function entry. For example,
if the ftrace location within a function is at an offset of 16 bytes,
probes at offsets 0 through 16 are considered to be at function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit ELFv2.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/kprobes.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b20ee72e873a..42665dfab59e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-	return offset <= 16;
-#else
-	return offset <= 8;
-#endif
-#else
+	unsigned long ip = ftrace_location(addr);
+
+	if (ip)
+		return offset <= (ip - addr);
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+		return offset <= 8;
 	return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
 					 bool *on_func_entry)
 {
-	*on_func_entry = arch_kprobe_on_func_entry(offset);
+	*on_func_entry = arch_kprobe_on_func_entry(addr, offset);
 	return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.43.0



* [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (4 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-21 10:46   ` Christophe Leroy
  2023-12-08 16:30 ` [RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Naveen N Rao
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

The function profile sequence on powerpc includes two instructions at the
beginning of each function:

	mflr	r0
	bl	ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

There are two issues with this:
1. The 'mflr r0' instruction at the beginning of each function remains
   even though ftrace is not being used.
2. When ftrace is activated, we return from ftrace_caller() with a bctr
   instruction to preserve r0 and LR, resulting in the link stack
   becoming unbalanced.

To address (1), we have previously tried to nop out the 'mflr r0'
instruction when nop'ing out the call to ftrace_caller(), and to restore
it when enabling ftrace. However, that required additional synchronization,
slowing down ftrace activation. It also left an additional nop instruction
at the beginning of each function, which was not desirable on 32-bit
powerpc.

Instead of that, move the function profile sequence out-of-line leaving
a single nop at function entry. On ftrace activation, the nop is changed
to an unconditional branch to the out-of-line sequence that in turn
calls ftrace_caller(). This removes the need for complex synchronization
during ftrace activation and simplifies the code. More importantly, this
improves performance of the kernel when ftrace is not in use.

To address (2), change the ftrace trampoline to return with a 'blr'
instruction with the original return address in r0 intact. Then, an
additional 'mtlr r0' instruction in the function profile sequence can
move the correct return address back to LR.

With the above two changes, the function profile sequence now looks like
the following:

 [func:		# GEP -- 64-bit powerpc, optional
	addis	r2,r12,imm1
	addi	r2,r2,imm2]
  tramp:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0
	b	func
	nop
	[nop]	# 64-bit powerpc only
  func:		# LEP
	nop

On 32-bit powerpc, the ftrace mcount trampoline is now completely
outside the function. This is also the case on 64-bit powerpc for
functions that do not need a GEP. However, for functions that need a
GEP, the additional instructions are inserted between the GEP and the
LEP. Since we can only have a fixed number of instructions between the
GEP and the LEP, we choose to emit 6 instructions. Four of those are
used for the function profile sequence and two instruction slots are
reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
32-bit powerpc, we emit one additional nop for this purpose, resulting
in a total of 5 nops before function entry.
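
For reference, before ftrace init the compiler-emitted layout would roughly
be (a sketch assuming -fpatchable-function-entry=7,6 on 64-bit powerpc, as
per the Makefile change below; 32-bit powerpc has 5 nops before the LEP
instead):

 [func:		# GEP -- 64-bit powerpc, optional
	addis	r2,r12,imm1
	addi	r2,r2,imm2]
	nop		# 6 nops, patched into the out-of-line
	nop		# profile sequence during ftrace init
	nop
	nop
	nop
	nop
  func:		# LEP
	nop		# ftrace location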

To enable ftrace, the nop at function entry is changed to an
unconditional branch to 'tramp'. The call to ftrace_caller() may be
updated to ftrace_regs_caller() depending on the registered ftrace ops.
On 64-bit powerpc, we additionally change the instruction at 'tramp' to
'mflr r0' from an unconditional branch back to func+4. This is so that
functions entered through the GEP can skip the function profile sequence
unless ftrace is enabled.
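
In other words, the two states differ roughly as follows (a sketch for
64-bit powerpc, per the description above):

  ftrace disabled (after init):
  tramp:
	b	func+4		# GEP entries skip the profile sequence
	...
  func:		# LEP
	nop			# ftrace location

  ftrace enabled:
  tramp:
	mflr	r0
	...
  func:		# LEP
	b	tramp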

With the context_switch microbenchmark on a P9 machine, there is a
performance improvement of ~6% with this patch applied, going from 650k
context switches to 690k context switches without ftrace enabled. With
ftrace enabled, the performance was similar at 86k context switches.

The downside of this approach is the increase in vmlinux size,
especially on 32-bit powerpc. We now emit 3 additional instructions for
each function (excluding the one or two instructions for supporting
DYNAMIC_FTRACE_WITH_CALL_OPS). On 64-bit powerpc, though, this is
unavoidable with the current implementation of -fpatchable-function-entry,
since we are forced to emit 6 instructions between the GEP and the LEP
even if we were to only support DYNAMIC_FTRACE_WITH_CALL_OPS.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Makefile                    |   6 +-
 arch/powerpc/include/asm/code-patching.h |  15 ++-
 arch/powerpc/include/asm/ftrace.h        |  18 ++-
 arch/powerpc/kernel/kprobes.c            |  51 +++++++-
 arch/powerpc/kernel/trace/ftrace.c       | 149 ++++++++++++++++++-----
 arch/powerpc/kernel/trace/ftrace_entry.S |  54 ++++++--
 6 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f19dbaa1d541..91ef34be8eb9 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -145,7 +145,11 @@ CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option,-mno-readonly-in-sdata)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+ifdef CONFIG_PPC32
+CC_FLAGS_FTRACE := -fpatchable-function-entry=6,5
+else
+CC_FLAGS_FTRACE := -fpatchable-function-entry=7,6
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 84f6ccd7de7a..9a54bb9e0dde 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -185,10 +185,21 @@ static inline unsigned long ppc_function_entry(void *func)
 	 */
 	if ((((*insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
 	     ((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
-	    ((*(insn+1) & OP_RT_RA_MASK) == ADDI_R2_R2))
+	    ((*(insn+1) & OP_RT_RA_MASK) == ADDI_R2_R2)) {
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+		/*
+		 * Heuristic: look for the 'mtlr r0' instruction assuming ftrace init is done.
+		 * If it is not found, look for two consecutive nops after the GEP.
+		 * Longer term, we really should be parsing the symbol table to determine LEP.
+		 */
+		if ((*(insn+4) == PPC_RAW_MTLR(_R0)) ||
+		    ((*(insn+2) == PPC_RAW_NOP() && *(insn+3) == PPC_RAW_NOP())))
+			return (unsigned long)(insn + 8);
+#endif
 		return (unsigned long)(insn + 2);
-	else
+	} else {
 		return (unsigned long)func;
+	}
 #elif defined(CONFIG_PPC64_ELF_ABI_V1)
 	/*
 	 * On PPC64 ABIv1 the function pointer actually points to the
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 1ebd2ca97f12..d9b99781bea3 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -11,10 +11,20 @@
 #define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
 
 /* Ignore unused weak functions which will have larger offsets */
-#if defined(CONFIG_MPROFILE_KERNEL) || defined(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
-#define FTRACE_MCOUNT_MAX_OFFSET	16
+#if defined(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
+#if defined(CONFIG_PPC64)
+#define FTRACE_MCOUNT_MAX_OFFSET	(MCOUNT_INSN_SIZE * 8)
+#define FTRACE_MCOUNT_TRAMP_OFFSET	(MCOUNT_INSN_SIZE * 6)
+#else
+#define FTRACE_MCOUNT_MAX_OFFSET	0
+#define FTRACE_MCOUNT_TRAMP_OFFSET	(MCOUNT_INSN_SIZE * 5)
+#endif /* CONFIG_PPC64 */
+#elif defined(CONFIG_MPROFILE_KERNEL)
+#define FTRACE_MCOUNT_MAX_OFFSET	(MCOUNT_INSN_SIZE * 4)
+#define FTRACE_MCOUNT_TRAMP_OFFSET	0
 #elif defined(CONFIG_PPC32)
-#define FTRACE_MCOUNT_MAX_OFFSET	8
+#define FTRACE_MCOUNT_MAX_OFFSET	(MCOUNT_INSN_SIZE * 2)
+#define FTRACE_MCOUNT_TRAMP_OFFSET	0
 #endif
 
 #ifndef __ASSEMBLY__
@@ -23,7 +33,7 @@ extern void _mcount(void);
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
-		addr += MCOUNT_INSN_SIZE;
+		addr += FTRACE_MCOUNT_TRAMP_OFFSET;
 
 	return addr;
 }
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 42665dfab59e..21557cf8544d 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -33,12 +33,61 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
 
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+/*
+ * Reject probes on the ftrace mcount trampoline instruction sequence.  There
+ * are two scenarios to handle:
+ * 1. Functions that can be traced and have a GEP.
+ * 2. Functions without a GEP, wherein the ftrace mcount trampoline ends up
+ * being part of the previous function
+ */
+static inline bool addr_within_ftrace_mcount_trampoline(unsigned long addr)
+{
+	unsigned long ip, size, offset;
+
+	if (!kallsyms_lookup_size_offset(addr, &size, &offset))
+		return false;
+
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		ip = ftrace_location(addr - offset);
+
+		/* If the function is traceable and has GEP... */
+		if (ip && ip != (addr - offset))
+			/* ... reject probes on 6 instructions after the GEP entry sequence */
+			if (offset >= 2 * MCOUNT_INSN_SIZE && offset < 8 * MCOUNT_INSN_SIZE)
+				return true;
+
+		/* If the next function is traceable and does not have GEP... */
+		ip = addr + offset;
+		if (ftrace_location(ip) == ip)
+			/* ... reject probes on 6 instructions at the end of the function */
+			if (offset >= (size - 6 * MCOUNT_INSN_SIZE))
+				return true;
+	} else {
+		/* If the next function is traceable ... */
+		ip = addr + offset;
+		if (ftrace_location(ip) == ip)
+			/* ... reject probes on the last 5 instructions of the function */
+			if (offset >= (size - 5 * MCOUNT_INSN_SIZE))
+				return true;
+	}
+
+	return false;
+}
+#else
+static inline bool addr_within_ftrace_mcount_trampoline(unsigned long addr)
+{
+	return false;
+}
+#endif
+
 bool arch_within_kprobe_blacklist(unsigned long addr)
 {
 	return  (addr >= (unsigned long)__kprobes_text_start &&
 		 addr < (unsigned long)__kprobes_text_end) ||
 		(addr >= (unsigned long)_stext &&
-		 addr < (unsigned long)__head_end);
+		 addr < (unsigned long)__head_end) ||
+		addr_within_ftrace_mcount_trampoline(addr);
 }
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 2956196c98ff..d3b4949142a8 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -70,7 +70,7 @@ static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_
 {
 	int ret = ftrace_validate_inst(ip, old);
 
-	if (!ret)
+	if (!ret && !ppc_inst_equal(old, new))
 		ret = patch_instruction((u32 *)ip, new);
 
 	return ret;
@@ -96,7 +96,8 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
 
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
 {
-	unsigned long ip = rec->ip;
+	unsigned long ip = rec->ip - (FTRACE_MCOUNT_TRAMP_OFFSET ?
+				      FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE : 0);
 	unsigned long stub;
 
 	if (is_offset_in_branch_range(addr - ip)) {
@@ -135,18 +136,38 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	ppc_inst_t old, new;
-	int ret;
+	unsigned long ip;
+	int ret = 0;
 
-	/* This can only ever be called during module load */
 	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
 		return -EINVAL;
 
-	old = ppc_inst(PPC_RAW_NOP());
-	ret = ftrace_get_call_inst(rec, addr, &new);
+	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
+	else
+		old = ppc_inst(PPC_RAW_NOP());
+	ret |= ftrace_get_call_inst(rec, addr, &new);
 	if (ret)
 		return ret;
 
-	return ftrace_modify_code(rec->ip, old, new);
+	ip = rec->ip - (FTRACE_MCOUNT_TRAMP_OFFSET ?
+			FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE : 0);
+
+	/* This can only ever be called during module load */
+	ret = ftrace_modify_code(ip, old, new);
+
+	if (ret || !IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+		return ret;
+
+	ip = rec->ip;
+	ret = ftrace_modify_code(ip, ppc_inst(PPC_RAW_NOP()),
+				 ppc_inst(PPC_RAW_BRANCH(-FTRACE_MCOUNT_TRAMP_OFFSET)));
+	if (IS_ENABLED(CONFIG_PPC64) && !ret)
+		ret = ftrace_modify_code(ip - FTRACE_MCOUNT_TRAMP_OFFSET,
+			  ppc_inst(PPC_RAW_BRANCH(FTRACE_MCOUNT_TRAMP_OFFSET + MCOUNT_INSN_SIZE)),
+			  ppc_inst(PPC_RAW_MFLR(_R0)));
+
+	return ret;
 }
 
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
@@ -170,7 +191,8 @@ void ftrace_replace_code(int enable)
 
 	for_ftrace_rec_iter(iter) {
 		rec = ftrace_rec_iter_record(iter);
-		ip = rec->ip;
+		ip = rec->ip - (FTRACE_MCOUNT_TRAMP_OFFSET ?
+				FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE : 0);
 
 		if (rec->flags & FTRACE_FL_DISABLED && !(rec->flags & FTRACE_FL_ENABLED))
 			continue;
@@ -179,6 +201,12 @@ void ftrace_replace_code(int enable)
 		new_addr = ftrace_get_addr_new(rec);
 		update = ftrace_update_record(rec, enable);
 
+		if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
+			ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
+			if (ret)
+				goto out;
+		}
+
 		switch (update) {
 		case FTRACE_UPDATE_IGNORE:
 		default:
@@ -205,6 +233,35 @@ void ftrace_replace_code(int enable)
 			ret = ftrace_modify_code(ip, old, new);
 		if (ret)
 			goto out;
+
+		if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
+		    (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
+			/* Update the actual ftrace location */
+			call_inst = ppc_inst(PPC_RAW_BRANCH(-FTRACE_MCOUNT_TRAMP_OFFSET));
+			nop_inst = ppc_inst(PPC_RAW_NOP());
+			ip = rec->ip;
+
+			if (update == FTRACE_UPDATE_MAKE_NOP)
+				ret = ftrace_modify_code(ip, call_inst, nop_inst);
+			else
+				ret = ftrace_modify_code(ip, nop_inst, call_inst);
+
+			/* Switch unconditional branch after GEP to/from 'mflr r0' */
+			if (IS_ENABLED(CONFIG_PPC64) && !ret) {
+				call_inst = ppc_inst(PPC_RAW_BRANCH(FTRACE_MCOUNT_TRAMP_OFFSET + MCOUNT_INSN_SIZE));
+				old = ppc_inst(PPC_RAW_MFLR(_R0));
+
+				if (update == FTRACE_UPDATE_MAKE_NOP)
+					ret = ftrace_modify_code(ip - FTRACE_MCOUNT_TRAMP_OFFSET,
+								 old, call_inst);
+				else
+					ret = ftrace_modify_code(ip - FTRACE_MCOUNT_TRAMP_OFFSET,
+								 call_inst, old);
+			}
+
+			if (ret)
+				goto out;
+		}
 	}
 
 out:
@@ -217,15 +274,62 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 {
 	unsigned long addr, ip = rec->ip;
 	ppc_inst_t old, new;
-	int ret = 0;
+	int i, ret = 0;
+	u32 ftrace_mcount_tramp_insns[] = {
+#ifdef CONFIG_PPC64
+		PPC_RAW_BRANCH(FTRACE_MCOUNT_TRAMP_OFFSET + MCOUNT_INSN_SIZE),
+#else
+		PPC_RAW_MFLR(_R0),
+#endif
+		PPC_RAW_BL(0), /* bl ftrace_caller */
+		PPC_RAW_MTLR(_R0), /* also see update ppc_function_entry() */
+		PPC_RAW_BRANCH(FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE * 2)
+	};
+
+	if (!core_kernel_text(ip)) {
+		if (!mod) {
+			pr_err("0x%lx: No module provided for non-kernel address\n", ip);
+			return -EFAULT;
+		}
+		rec->arch.mod = mod;
+	}
+
+	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
+		ip -= FTRACE_MCOUNT_TRAMP_OFFSET;
+
+		/* First instruction from the sequence */
+		old = ppc_inst(PPC_RAW_NOP());
+		ret = ftrace_modify_code(ip, old, ppc_inst(ftrace_mcount_tramp_insns[0]));
+		ip += MCOUNT_INSN_SIZE;
+
+		/* Default the mcount trampoline to go to ftrace_caller */
+		ret |= ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &new);
+		ret |= ftrace_modify_code(ip, old, new);
+		ip += MCOUNT_INSN_SIZE;
+
+		/* Rest of the instructions from the sequence */
+		for (i = 2; i < 4; i++, ip += MCOUNT_INSN_SIZE)
+			ret |= ftrace_modify_code(ip, old, ppc_inst(ftrace_mcount_tramp_insns[i]));
+
+		if (IS_ENABLED(CONFIG_PPC64)) {
+			/* two more nops */
+			ret |= ftrace_validate_inst(ip, old);
+			ip += MCOUNT_INSN_SIZE;
+			ret |= ftrace_validate_inst(ip, old);
+			ip += MCOUNT_INSN_SIZE;
+		} else {
+			ret |= ftrace_validate_inst(ip, old);
+			ip += MCOUNT_INSN_SIZE;
+		}
+
+		/* nop at ftrace location */
+		ret |= ftrace_validate_inst(ip, old);
+
+		return ret;
+	}
 
 	/* Verify instructions surrounding the ftrace location */
-	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
-		/* Expect nops */
-		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
-		if (!ret)
-			ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
-	} else if (IS_ENABLED(CONFIG_PPC32)) {
+	if (IS_ENABLED(CONFIG_PPC32)) {
 		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
 		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
 		if (!ret)
@@ -246,23 +350,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	if (ret)
 		return ret;
 
-	if (!core_kernel_text(ip)) {
-		if (!mod) {
-			pr_err("0x%lx: No module provided for non-kernel address\n", ip);
-			return -EFAULT;
-		}
-		rec->arch.mod = mod;
-	}
-
 	/* Nop-out the ftrace location */
 	new = ppc_inst(PPC_RAW_NOP());
 	addr = MCOUNT_ADDR;
-	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
-		/* we instead patch-in the 'mflr r0' */
-		old = ppc_inst(PPC_RAW_NOP());
-		new = ppc_inst(PPC_RAW_MFLR(_R0));
-		ret = ftrace_modify_code(ip - 4, old, new);
-	} else if (is_offset_in_branch_range(addr - ip)) {
+	if (is_offset_in_branch_range(addr - ip)) {
 		/* Within range */
 		old = ftrace_create_branch_inst(ip, addr, 1);
 		ret = ftrace_modify_code(ip, old, new);
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 244a1c7bb1e8..537c14b12904 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -78,6 +78,14 @@
 
 	/* Get the _mcount() call site out of LR */
 	mflr	r7
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	/*
+	 * This points after the bl at 'mtlr r0', but this sequence could be
+	 * outside the function. Move this to point just after the ftrace
+	 * location inside the function for proper unwind.
+	 */
+	addi	r7, r7, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+#endif
 	/* Save it as pt_regs->nip */
 	PPC_STL	r7, _NIP(r1)
 	/* Also save it in B's stackframe header for proper unwind */
@@ -121,12 +129,18 @@
 .macro	ftrace_regs_exit allregs
 	/* Load ctr with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
-	mtctr	r3
 
 #ifdef CONFIG_LIVEPATCH_64
 	cmpd	r14, r3		/* has NIP been altered? */
 #endif
 
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	subi	r3, r3, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+	mtlr	r3
+#else
+	mtctr	r3
+#endif
+
 	/* Restore gprs */
 	.if \allregs == 1
 	REST_GPRS(2, 31, r1)
@@ -139,7 +153,9 @@
 
 	/* Restore possibly modified LR */
 	PPC_LL	r0, _LINK(r1)
+#ifndef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	mtlr	r0
+#endif
 
 #ifdef CONFIG_PPC64
 	/* Restore callee's TOC */
@@ -153,7 +169,12 @@
         /* Based on the cmpd above, if the NIP was altered handle livepatch */
 	bne-	livepatch_handler
 #endif
-	bctr			/* jump after _mcount site */
+	/* jump after _mcount site */
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	blr
+#else
+	bctr
+#endif
 .endm
 
 _GLOBAL(ftrace_regs_caller)
@@ -177,6 +198,11 @@ _GLOBAL(ftrace_stub)
 
 #ifdef CONFIG_PPC64
 ftrace_no_trace:
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	REST_GPR(3, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	blr
+#else
 	mflr	r3
 	mtctr	r3
 	REST_GPR(3, r1)
@@ -184,6 +210,7 @@ ftrace_no_trace:
 	mtlr	r0
 	bctr
 #endif
+#endif
 
 #ifdef CONFIG_LIVEPATCH_64
 	/*
@@ -196,9 +223,9 @@ ftrace_no_trace:
 	 *
 	 * On entry:
 	 *  - we have no stack frame and can not allocate one
-	 *  - LR points back to the original caller (in A)
-	 *  - CTR holds the new NIP in C
-	 *  - r0, r11 & r12 are free
+	 *  - LR/r0 points back to the original caller (in A)
+	 *  - CTR/LR holds the new NIP in C
+	 *  - r11 & r12 are free
 	 */
 livepatch_handler:
 	ld	r12, PACA_THREAD_INFO(r13)
@@ -208,19 +235,26 @@ livepatch_handler:
 	addi	r11, r11, 24
 	std	r11, TI_livepatch_sp(r12)
 
-	/* Save toc & real LR on livepatch stack */
-	std	r2,  -24(r11)
-	mflr	r12
-	std	r12, -16(r11)
-
 	/* Store stack end marker */
 	lis     r12, STACK_END_MAGIC@h
 	ori     r12, r12, STACK_END_MAGIC@l
 	std	r12, -8(r11)
 
+	/* Save toc & real LR on livepatch stack */
+	std	r2,  -24(r11)
+#ifndef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	mflr	r12
+	std	r12, -16(r11)
 	/* Put ctr in r12 for global entry and branch there */
 	mfctr	r12
 	bctrl
+#else
+	std	r0, -16(r11)
+	mflr	r12
+	addi	r12, r12, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+	mtlr	r12
+	blrl
+#endif
 
 	/*
 	 * Now we are returning from the patched function to the original
-- 
2.43.0



* [RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (5 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Naveen N Rao
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the associated ftrace_ops for a call site by loading it from a
fixed offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we utilize the space between the existing function
profile sequence and the function entry. During ftrace activation, we
update this location with the associated ftrace_ops pointer. Then, on
ftrace entry, we load from this location and call into
ftrace_ops->func().
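
With the out-of-line sequence from patch 6, the layout then roughly becomes
(a sketch for 64-bit powerpc with ftrace enabled; the ftrace_ops pointer
occupies the two reserved nop slots just before function entry):

  tramp:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0
	b	func+4		# back into the function, past the patched nop
	.quad	<ftrace_ops>	# patched via patch_ulong(), 8-byte aligned
  func:		# LEP
	b	tramp		# nop when ftrace is disabled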

For 64-bit powerpc, we also select FUNCTION_ALIGNMENT_8B so that the
ftrace_ops pointer is double word aligned and can be updated atomically.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                     |  2 +
 arch/powerpc/kernel/asm-offsets.c        |  4 ++
 arch/powerpc/kernel/trace/ftrace.c       | 58 ++++++++++++++++++++++++
 arch/powerpc/kernel/trace/ftrace_entry.S | 39 +++++++++++-----
 4 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 318e5c1b7454..c8ecc9dcc914 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -190,6 +190,7 @@ config PPC
 	select EDAC_SUPPORT
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	select FUNCTION_ALIGNMENT_4B
+	select FUNCTION_ALIGNMENT_8B		if PPC64 && DYNAMIC_FTRACE_WITH_CALL_OPS
 	select GENERIC_ATOMIC64			if PPC32
 	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
 	select GENERIC_CMOS_UPDATE
@@ -233,6 +234,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9f14d95b8b32..8b8a39b57a9f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,5 +676,9 @@ int main(void)
 	DEFINE(BPT_SIZE, BPT_SIZE);
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
 	return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d3b4949142a8..af84eabf7912 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -124,6 +124,41 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 	return 0;
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *powerpc_rec_get_ops(struct dyn_ftrace *rec)
+{
+	const struct ftrace_ops *ops = NULL;
+
+	if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+		ops = ftrace_find_unique_ops(rec);
+		WARN_ON_ONCE(!ops);
+	}
+
+	if (!ops)
+		ops = &ftrace_list_ops;
+
+	return ops;
+}
+
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, const struct ftrace_ops *ops)
+{
+	return patch_ulong((void *)(rec->ip - sizeof(unsigned long)), (unsigned long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, powerpc_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
@@ -159,6 +194,10 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	if (ret || !IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
 		return ret;
 
+	ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
+
 	ip = rec->ip;
 	ret = ftrace_modify_code(ip, ppc_inst(PPC_RAW_NOP()),
 				 ppc_inst(PPC_RAW_BRANCH(-FTRACE_MCOUNT_TRAMP_OFFSET)));
@@ -214,16 +253,19 @@ void ftrace_replace_code(int enable)
 		case FTRACE_UPDATE_MODIFY_CALL:
 			ret = ftrace_get_call_inst(rec, new_addr, &new_call_inst);
 			ret |= ftrace_get_call_inst(rec, addr, &call_inst);
+			ret |= ftrace_rec_update_ops(rec);
 			old = call_inst;
 			new = new_call_inst;
 			break;
 		case FTRACE_UPDATE_MAKE_NOP:
 			ret = ftrace_get_call_inst(rec, addr, &call_inst);
+			ret |= ftrace_rec_set_nop_ops(rec);
 			old = call_inst;
 			new = nop_inst;
 			break;
 		case FTRACE_UPDATE_MAKE_CALL:
 			ret = ftrace_get_call_inst(rec, new_addr, &call_inst);
+			ret |= ftrace_rec_update_ops(rec);
 			old = nop_inst;
 			new = call_inst;
 			break;
@@ -312,6 +354,12 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 			ret |= ftrace_modify_code(ip, old, ppc_inst(ftrace_mcount_tramp_insns[i]));
 
 		if (IS_ENABLED(CONFIG_PPC64)) {
+			if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS) &&
+			    !IS_ALIGNED(ip, sizeof(unsigned long))) {
+				pr_err("0x%lx: Mis-aligned ftrace_ops patch site\n", ip);
+				return -EINVAL;
+			}
+
 			/* two more nops */
 			ret |= ftrace_validate_inst(ip, old);
 			ip += MCOUNT_INSN_SIZE;
@@ -325,6 +373,9 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 		/* nop at ftrace location */
 		ret |= ftrace_validate_inst(ip, old);
 
+		if (!ret)
+			ret = ftrace_rec_set_nop_ops(rec);
+
 		return ret;
 	}
 
@@ -383,6 +434,13 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	ppc_inst_t old, new;
 	int ret;
 
+	/*
+	 * When using CALL_OPS, the function to call is associated with the
+	 * call site, and we don't have a global function pointer to update.
+	 */
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+		return 0;
+
 	old = ppc_inst_read((u32 *)&ftrace_call);
 	new = ftrace_create_branch_inst(ip, ppc_function_entry(func), 1);
 	ret = ftrace_modify_code(ip, old, new);
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 537c14b12904..4d1220c2e32f 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -97,11 +97,6 @@
 	/* Save callee's TOC in the ABI compliant location */
 	std	r2, STK_GOT(r1)
 	LOAD_PACA_TOC()		/* get kernel TOC in r2 */
-	LOAD_REG_ADDR(r3, function_trace_op)
-	ld	r5,0(r3)
-#else
-	lis	r3,function_trace_op@ha
-	lwz	r5,function_trace_op@l(r3)
 #endif
 
 #ifdef CONFIG_LIVEPATCH_64
@@ -177,20 +172,40 @@
 #endif
 .endm
 
-_GLOBAL(ftrace_regs_caller)
-	ftrace_regs_entry 1
-	/* ftrace_call(r3, r4, r5, r6) */
+.macro ftrace_regs_func allregs
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	PPC_LL	r5, -SZL(r3)
+	PPC_LL	r12, FTRACE_OPS_FUNC(r5)
+	mtctr	r12
+	bctrl
+#else
+#ifdef CONFIG_PPC64
+	LOAD_REG_ADDR(r5, function_trace_op)
+	ld	r5, 0(r5)
+#else
+	lis	r5, function_trace_op@ha
+	lwz	r5, function_trace_op@l(r5)
+#endif
+	.if \allregs == 1
 .globl ftrace_regs_call
 ftrace_regs_call:
+	.else
+.globl ftrace_call
+ftrace_call:
+	.endif
+	/* ftrace_call(r3, r4, r5, r6) */
 	bl	ftrace_stub
+#endif
+.endm
+
+_GLOBAL(ftrace_regs_caller)
+	ftrace_regs_entry 1
+	ftrace_regs_func 1
 	ftrace_regs_exit 1
 
 _GLOBAL(ftrace_caller)
 	ftrace_regs_entry 0
-	/* ftrace_call(r3, r4, r5, r6) */
-.globl ftrace_call
-ftrace_call:
-	bl	ftrace_stub
+	ftrace_regs_func 0
 	ftrace_regs_exit 0
 
 _GLOBAL(ftrace_stub)
-- 
2.43.0



* [RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (6 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-08 16:30 ` [RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc Naveen N Rao
  2023-12-21 10:38 ` (subset) [RFC PATCH 0/9] powerpc: ftrace updates Michael Ellerman
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since we have a limited relative branch range, we support ftrace direct
calls on top of the support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, the ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads the direct_call field from the
associated ftrace_ops structure and branches to it, if it is not NULL.
For this, it is sufficient to ensure that the ftrace trampoline is
reachable from all traceable functions.
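
In pseudocode, the stub's fast path roughly does the following (a sketch,
not the exact assembly from this patch):

	/* ftrace_ops pointer stored just before the ftrace location */
	ops = *(unsigned long *)(ftrace_location - sizeof(unsigned long));
	if (ops->direct_call)
		goto *ops->direct_call;	/* branch to the direct trampoline */
	/* otherwise fall through into the regular ftrace_caller path */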

When multiple ftrace_ops are associated with a call site, we utilize a
callback to set pt_regs->orig_gpr3, which can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                     |  1 +
 arch/powerpc/include/asm/ftrace.h        | 15 ++++
 arch/powerpc/kernel/asm-offsets.c        |  3 +
 arch/powerpc/kernel/trace/ftrace_entry.S | 99 ++++++++++++++++++------
 4 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c8ecc9dcc914..4fe04fdca33a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index d9b99781bea3..986c4fffb9ec 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -93,6 +93,21 @@ struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
+{
+	struct pt_regs *regs = &fregs->regs;
+	regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #endif
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8b8a39b57a9f..85da10726d98 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,6 +678,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
 	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
 	return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 4d1220c2e32f..ab60395fc34b 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,14 +33,57 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
-	/* Save the original return address in A's stack frame */
-	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
 	/* Create our stack frame + pt_regs */
 	PPC_STLU	r1,-SWITCH_FRAME_SIZE(r1)
 
+	.if \allregs == 1
+	SAVE_GPRS(11, 12, r1)
+	.endif
+
+	/* Get the _mcount() call site out of LR */
+	mflr	r11
+
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	/*
+	 * This points after the bl at 'mtlr r0', but this sequence could be
+	 * outside the function. Move this to point just after the ftrace
+	 * location inside the function for proper unwind.
+	 */
+	addi	r11, r11, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	/* Load the ftrace_op */
+	PPC_LL	r12, -SZL-MCOUNT_INSN_SIZE(r11)
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Load direct_call from the ftrace_op */
+	PPC_LL	r12, FTRACE_OPS_DIRECT_CALL(r12)
+	PPC_LCMPI r12, 0
+	beq	1f
+	mtctr	r12
+	.if \allregs == 1
+	REST_GPRS(11, 12, r1)
+	.endif
+	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
+	bctr
+1:
+#endif
+#endif
+#endif
+
+	/* Save the previous LR in pt_regs->link */
+	PPC_STL	r0, _LINK(r1)
+	/* Also save it in A's stack frame */
+	PPC_STL	r0, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE+LRSAVE(r1)
+
+	/* Save our return address as pt_regs->nip */
+	PPC_STL	r11, _NIP(r1)
+	/* Also save it in B's stackframe header for proper unwind */
+	PPC_STL	r11, SWITCH_FRAME_SIZE+LRSAVE(r1)
+
 	/* Save all gprs to pt_regs */
 	SAVE_GPR(0, r1)
 	SAVE_GPRS(3, 10, r1)
@@ -54,7 +97,7 @@
 
 	.if \allregs == 1
 	SAVE_GPR(2, r1)
-	SAVE_GPRS(11, 31, r1)
+	SAVE_GPRS(13, 31, r1)
 	.else
 #ifdef CONFIG_LIVEPATCH_64
 	SAVE_GPR(14, r1)
@@ -65,6 +108,13 @@
 	addi	r8, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
 	PPC_STL	r8, GPR1(r1)
 
+#ifdef CONFIG_LIVEPATCH_64
+	mr	r14, r11	/* remember old NIP */
+#endif
+
+	/* Calculate ip from nip-4 into r3 for call below */
+	subi	r3, r11, MCOUNT_INSN_SIZE
+
 	.if \allregs == 1
 	/* Load special regs for save below */
 	mfmsr   r8
@@ -76,22 +126,11 @@
 	li	r8, 0
 	.endif
 
-	/* Get the _mcount() call site out of LR */
-	mflr	r7
-#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
-	/*
-	 * This points after the bl at 'mtlr r0', but this sequence could be
-	 * outside the function. Move this to point just after the ftrace
-	 * location inside the function for proper unwind.
-	 */
-	addi	r7, r7, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Clear orig_gpr3 to later detect ftrace_direct call */
+	li	r7, 0
+	PPC_STL	r7, ORIG_GPR3(r1)
 #endif
-	/* Save it as pt_regs->nip */
-	PPC_STL	r7, _NIP(r1)
-	/* Also save it in B's stackframe header for proper unwind */
-	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
-	/* Save the read LR in pt_regs->link */
-	PPC_STL	r0, _LINK(r1)
 
 #ifdef CONFIG_PPC64
 	/* Save callee's TOC in the ABI compliant location */
@@ -99,13 +138,6 @@
 	LOAD_PACA_TOC()		/* get kernel TOC in r2 */
 #endif
 
-#ifdef CONFIG_LIVEPATCH_64
-	mr	r14, r7		/* remember old NIP */
-#endif
-
-	/* Calculate ip from nip-4 into r3 for call below */
-	subi    r3, r7, MCOUNT_INSN_SIZE
-
 	/* Put the original return address in r4 as parent_ip */
 	mr	r4, r0
 
@@ -122,6 +154,13 @@
 .endm
 
 .macro	ftrace_regs_exit allregs
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/* Check orig_gpr3 to detect ftrace_direct call */
+	PPC_LL	r7, ORIG_GPR3(r1)
+	PPC_LCMPI cr1, r7, 0
+	mtctr	r7
+#endif
+
 	/* Load ctr with the possibly modified NIP */
 	PPC_LL	r3, _NIP(r1)
 
@@ -164,8 +203,12 @@
         /* Based on the cmpd above, if the NIP was altered handle livepatch */
 	bne-	livepatch_handler
 #endif
+
 	/* jump after _mcount site */
 #ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	bnectr	cr1
+#endif
 	blr
 #else
 	bctr
@@ -227,6 +270,12 @@ ftrace_no_trace:
 #endif
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_FUNC_START(ftrace_stub_direct_tramp)
+	blr
+SYM_FUNC_END(ftrace_stub_direct_tramp)
+#endif
+
 #ifdef CONFIG_LIVEPATCH_64
 	/*
 	 * This function runs in the mcount context, between two functions. As
-- 
2.43.0



* [RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (7 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Naveen N Rao
@ 2023-12-08 16:30 ` Naveen N Rao
  2023-12-21 10:38 ` (subset) [RFC PATCH 0/9] powerpc: ftrace updates Michael Ellerman
  9 siblings, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-08 16:30 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

Add powerpc 32-bit and 64-bit samples for ftrace direct. These serve to
show the instruction sequences that ftrace direct call trampolines should
use to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/Kconfig                        |   2 +
 samples/ftrace/ftrace-direct-modify.c       |  94 ++++++++++++++++-
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +++++++++++++++++++-
 samples/ftrace/ftrace-direct-multi.c        |  64 +++++++++++-
 samples/ftrace/ftrace-direct-too.c          |  72 ++++++++++++-
 samples/ftrace/ftrace-direct.c              |  61 ++++++++++-
 6 files changed, 398 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4fe04fdca33a..28de3a5f3e98 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -274,6 +274,8 @@ config PPC
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
+	select HAVE_SAMPLE_FTRACE_DIRECT	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_SETUP_PER_CPU_AREA		if PPC64
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index e2a6a69352df..bd985035b937 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -164,6 +164,98 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	bl		my_direct_func1\n"
+"	lwz		0, 20(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 32\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	bl		my_direct_func2\n"
+"	lwz		0, 20(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 32\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	bl		my_direct_func1\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 48(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 64\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	bl		my_direct_func2\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 48(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 64\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
 static struct ftrace_ops direct;
 
 static unsigned long my_tramp = (unsigned long)my_tramp1;
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index 2e349834d63c..478e879a23af 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -184,6 +184,114 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -24(1)\n"
+"	stw		3, 16(1)\n"
+"	mr		3, 0\n"
+"	addi		3, 3, 16\n"
+"	bl		my_direct_func1\n"
+"	lwz		3, 16(1)\n"
+"	lwz		0, 28(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 40\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -24(1)\n"
+"	stw		3, 16(1)\n"
+"	mr		3, 0\n"
+"	addi		3, 3, 16\n"
+"	bl		my_direct_func2\n"
+"	lwz		3, 16(1)\n"
+"	lwz		0, 28(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 40\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -48(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	std		3, 32(1)\n"
+"	mr		3, 0\n"
+"	addi		3, 3, 20\n"
+"	bl		my_direct_func1\n"
+"	ld		3, 32(1)\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 64(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 80\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -48(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	std		3, 32(1)\n"
+"	mr		3, 0\n"
+"	addi		3, 3, 20\n"
+"	bl		my_direct_func2\n"
+"	ld		3, 32(1)\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 64(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 80\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
 static unsigned long my_tramp = (unsigned long)my_tramp1;
 static unsigned long tramps[2] = {
 	(unsigned long)my_tramp1,
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index 9243dbfe4d0c..558f4ad8d84a 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -4,7 +4,7 @@
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
 #include <linux/sched/stat.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -116,6 +116,68 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -24(1)\n"
+"	stw		3, 16(1)\n"
+"	mr		3, 0\n"
+"	addi		3, 3, 16\n"
+"	bl		my_direct_func\n"
+"	lwz		3, 16(1)\n"
+"	lwz		0, 28(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 40\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -48(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	std		3, 32(1)\n"
+"	mr		3, 0\n"
+"	addi		3, 3, 20\n"
+"	bl		my_direct_func\n"
+"	ld		3, 32(1)\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 64(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 80\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_multi_init(void)
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index e39c3563ae4e..2a35b5d88304 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -3,7 +3,7 @@
 
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -125,6 +125,76 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -32(1)\n"
+"	stw		3, 16(1)\n"
+"	stw		4, 20(1)\n"
+"	stw		5, 24(1)\n"
+"	stw		6, 28(1)\n"
+"	bl		my_direct_func\n"
+"	lwz		6, 28(1)\n"
+"	lwz		5, 24(1)\n"
+"	lwz		4, 20(1)\n"
+"	lwz		3, 16(1)\n"
+"	lwz		0, 36(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 48\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -64(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	std		3, 32(1)\n"
+"	std		4, 40(1)\n"
+"	std		5, 48(1)\n"
+"	std		6, 56(1)\n"
+"	bl		my_direct_func\n"
+"	ld		6, 56(1)\n"
+"	ld		5, 48(1)\n"
+"	ld		4, 40(1)\n"
+"	ld		3, 32(1)\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 80(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 96\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_init(void)
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index 32c477da1e9a..5585ffb6dd41 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -3,7 +3,7 @@
 
 #include <linux/sched.h> /* for wake_up_process() */
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif
 
@@ -110,6 +110,65 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -24(1)\n"
+"	stw		3, 16(1)\n"
+"	bl		my_direct_func\n"
+"	lwz		3, 16(1)\n"
+"	lwz		0, 28(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 40\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp, @function\n"
+"	.globl		my_tramp\n"
+"   my_tramp:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -48(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"   1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	std		3, 32(1)\n"
+"	bl		my_direct_func\n"
+"	ld		3, 32(1)\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 64(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 80\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"   2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp, .-my_tramp\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
+
 static struct ftrace_ops direct;
 
 static int __init ftrace_direct_init(void)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  2023-12-08 16:30 ` [RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
@ 2023-12-11  5:39   ` Masami Hiramatsu
  0 siblings, 0 replies; 16+ messages in thread
From: Masami Hiramatsu @ 2023-12-11  5:39 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: Mark Rutland, Florent Revest, linux-kernel, Aneesh Kumar K.V,
	Steven Rostedt, Nicholas Piggin, linuxppc-dev, Masami Hiramatsu

On Fri,  8 Dec 2023 22:00:44 +0530
Naveen N Rao <naveen@kernel.org> wrote:

> Rather than hard-coding the offset into a function to be used to
> determine if a kprobe is at function entry, use ftrace_location() to
> determine the ftrace location within the function and categorize all
> instructions till that offset to be function entry.
> 
> For functions that cannot be traced, we fall back to using a fixed
> offset of 8 (two instructions) to categorize a probe as being at
> function entry for 64-bit elfv2.
> 

OK, so this covers both the KPROBES_ON_FTRACE=y/n cases, and works
whether or not the function is traced by ftrace.

Looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/powerpc/kernel/kprobes.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index b20ee72e873a..42665dfab59e 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
>  	return addr;
>  }
>  
> -static bool arch_kprobe_on_func_entry(unsigned long offset)
> +static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
>  {
> -#ifdef CONFIG_PPC64_ELF_ABI_V2
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> -	return offset <= 16;
> -#else
> -	return offset <= 8;
> -#endif
> -#else
> +	unsigned long ip = ftrace_location(addr);
> +
> +	if (ip)
> +		return offset <= (ip - addr);
> +	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
> +		return offset <= 8;
>  	return !offset;
> -#endif
>  }
>  
>  /* XXX try and fold the magic of kprobe_lookup_name() in this */
>  kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
>  					 bool *on_func_entry)
>  {
> -	*on_func_entry = arch_kprobe_on_func_entry(offset);
> +	*on_func_entry = arch_kprobe_on_func_entry(addr, offset);
>  	return (kprobe_opcode_t *)(addr + offset);
>  }
>  
> -- 
> 2.43.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: (subset) [RFC PATCH 0/9] powerpc: ftrace updates
  2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
                   ` (8 preceding siblings ...)
  2023-12-08 16:30 ` [RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc Naveen N Rao
@ 2023-12-21 10:38 ` Michael Ellerman
  9 siblings, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2023-12-21 10:38 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, Naveen N Rao
  Cc: Mark Rutland, Florent Revest, Nicholas Piggin, Steven Rostedt,
	Aneesh Kumar K.V, Masami Hiramatsu

On Fri, 08 Dec 2023 22:00:39 +0530, Naveen N Rao wrote:
> Early RFC.
> 
> This series attempts to address couple of issues with the existing
> support for ftrace on powerpc, with a view towards improving performance
> when ftrace is not enabled. See patch 6 for more details.
> 
> Patches 7 and 8 implement support for ftrace direct calls, through
> adding support for DYNAMIC_FTRACE_WITH_CALL_OPS.
> 
> [...]

Patches 1, 3 and 4 applied to powerpc/next.

[1/9] powerpc/ftrace: Fix indentation in ftrace.h
      https://git.kernel.org/powerpc/c/2ec36570c3581285d15de672eaed10ce7e9218cd
[3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub
      https://git.kernel.org/powerpc/c/ae24db43b3b427eb290b58d55179c32f0a7539d1
[4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B
      https://git.kernel.org/powerpc/c/b20f98e8b3deb50247603f0242ee2d1e38726635

cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line
  2023-12-08 16:30 ` [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line Naveen N Rao
@ 2023-12-21 10:46   ` Christophe Leroy
  2023-12-21 14:25     ` Steven Rostedt
  2023-12-22 15:01     ` Naveen N Rao
  0 siblings, 2 replies; 16+ messages in thread
From: Christophe Leroy @ 2023-12-21 10:46 UTC (permalink / raw)
  To: Naveen N Rao, linuxppc-dev, linux-kernel
  Cc: Mark Rutland, Florent Revest, Steven Rostedt, Aneesh Kumar K.V,
	Nicholas Piggin, Masami Hiramatsu



On 08/12/2023 at 17:30, Naveen N Rao wrote:
> Function profile sequence on powerpc includes two instructions at the
> beginning of each function:
> 
> 	mflr	r0
> 	bl	ftrace_caller
> 
> The call to ftrace_caller() gets nop'ed out during kernel boot and is
> patched in when ftrace is enabled.
> 
> There are two issues with this:
> 1. The 'mflr r0' instruction at the beginning of each function remains
>     even though ftrace is not being used.
> 2. When ftrace is activated, we return from ftrace_caller() with a bctr
>     instruction to preserve r0 and LR, resulting in the link stack
>     becoming unbalanced.
> 
> To address (1), we have tried to nop'out the 'mflr r0' instruction when
> nop'ing out the call to ftrace_caller() and restoring it when enabling
> ftrace. But, that required additional synchronization slowing down
> ftrace activation. It also left an additional nop instruction at the
> beginning of each function and that wasn't desirable on 32-bit powerpc.
> 
> Instead of that, move the function profile sequence out-of-line leaving
> a single nop at function entry. On ftrace activation, the nop is changed
> to an unconditional branch to the out-of-line sequence that in turn
> calls ftrace_caller(). This removes the need for complex synchronization
> during ftrace activation and simplifies the code. More importantly, this
> improves performance of the kernel when ftrace is not in use.
> 
> To address (2), change the ftrace trampoline to return with a 'blr'
> instruction with the original return address in r0 intact. Then, an
> additional 'mtlr r0' instruction in the function profile sequence can
> move the correct return address back to LR.
> 
> With the above two changes, the function profile sequence now looks like
> the following:
> 
>   [func:		# GEP -- 64-bit powerpc, optional
> 	addis	r2,r12,imm1
> 	addi	r2,r2,imm2]
>    tramp:
> 	mflr	r0
> 	bl	ftrace_caller
> 	mtlr	r0
> 	b	func
> 	nop
> 	[nop]	# 64-bit powerpc only
>    func:		# LEP
> 	nop
> 
> On 32-bit powerpc, the ftrace mcount trampoline is now completely
> outside the function. This is also the case on 64-bit powerpc for
> functions that do not need a GEP. However, for functions that need a
> GEP, the additional instructions are inserted between the GEP and the
> LEP. Since we can only have a fixed number of instructions between GEP
> and LEP, we choose to emit 6 instructions. Four of those instructions
> are used for the function profile sequence and two instruction slots are
> reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
> 32-bit powerpc, we emit one additional nop for this purpose resulting in
> a total of 5 nops before function entry.
> 
> To enable ftrace, the nop at function entry is changed to an
> unconditional branch to 'tramp'. The call to ftrace_caller() may be
> updated to ftrace_regs_caller() depending on the registered ftrace ops.
> On 64-bit powerpc, we additionally change the instruction at 'tramp' to
> 'mflr r0' from an unconditional branch back to func+4. This is so that
> functions entered through the GEP can skip the function profile sequence
> unless ftrace is enabled.
> 
> With the context_switch microbenchmark on a P9 machine, there is a
> performance improvement of ~6% with this patch applied, going from 650k
> context switches to 690k context switches without ftrace enabled. With
> ftrace enabled, the performance was similar at 86k context switches.

Wondering how significant that context_switch microbenchmark is.

I ran it on both mpc885 and mpc8321 and I'm a bit puzzled by some of the 
results:
# ./context_switch --no-fp
Using threads with yield on cpus 0/0 touching FP:no altivec:no vector:no 
vdso:no

On 885, I get the following results before and after your patch.

CONFIG_FTRACE not selected : 44,9k
CONFIG_FTRACE selected, before : 32,8k
CONFIG_FTRACE selected, after : 33,6k

All this is with CONFIG_INIT_STACK_ALL_ZERO which is the default. But 
when I select CONFIG_INIT_STACK_NONE, the CONFIG_FTRACE not selected 
result is only 34,4.

On 8321:

CONFIG_FTRACE not selected : 100,3k
CONFIG_FTRACE selected, before : 72,5k
CONFIG_FTRACE selected, after : 116k

So the results look odd to me.

> 
> The downside of this approach is the increase in vmlinux size,
> especially on 32-bit powerpc. We now emit 3 additional instructions for
> each function (excluding the one or two instructions for supporting
> DYNAMIC_FTRACE_WITH_CALL_OPS). On 64-bit powerpc with the current
> implementation of -fpatchable-function-entry though, this is not
> avoidable since we are forced to emit 6 instructions between the GEP and
> the LEP even if we are to only support DYNAMIC_FTRACE_WITH_CALL_OPS.

The increase is almost 5% on the few 32-bit defconfigs I have tested. 
That's a lot.

On 32-bit powerpc, only the e500 has a link stack that could end up 
being unbalanced. Could we keep the bctr and avoid the mtlr and the jump 
in the trampoline?

On 8xx a NOP is one cycle, a taken branch is 2 cycles, but the second 
cycle is a bubble that most of the time gets filled by following 
operations. On the 83xx, branches and NOPs are supposed to be seamless.

So is that out-of-line trampoline really worth it? Maybe keeping the 
ftrace instructions at the beginning and just replacing the mflr by a 
jump when ftrace is off would help reduce the ftrace insns by one more 
instruction.
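
Something along these lines, purely as a rough sketch of what I mean
(not code from this series):

	func:
		b	1f	/* 'mflr r0' when ftrace is enabled */
		nop		/* 'bl ftrace_caller' when ftrace is enabled */
	1:
		/* function body */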

> 
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---

> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 84f6ccd7de7a..9a54bb9e0dde 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -185,10 +185,21 @@ static inline unsigned long ppc_function_entry(void *func)
>   	 */
>   	if ((((*insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
>   	     ((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
> -	    ((*(insn+1) & OP_RT_RA_MASK) == ADDI_R2_R2))
> +	    ((*(insn+1) & OP_RT_RA_MASK) == ADDI_R2_R2)) {
> +#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY

Can you replace by IS_ENABLED() ?

> +		/*
> +		 * Heuristic: look for the 'mtlr r0' instruction assuming ftrace init is done.
> +		 * If it is not found, look for two consecutive nops after the GEP.
> +		 * Longer term, we really should be parsing the symbol table to determine LEP.
> +		 */
> +		if ((*(insn+4) == PPC_RAW_MTLR(_R0)) ||
> +		    ((*(insn+2) == PPC_RAW_NOP() && *(insn+3) == PPC_RAW_NOP())))
> +			return (unsigned long)(insn + 8);
> +#endif
>   		return (unsigned long)(insn + 2);
> -	else
> +	} else {
>   		return (unsigned long)func;
> +	}
>   #elif defined(CONFIG_PPC64_ELF_ABI_V1)
>   	/*
>   	 * On PPC64 ABIv1 the function pointer actually points to the

> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 2956196c98ff..d3b4949142a8 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c

> @@ -217,15 +274,62 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>   {
>   	unsigned long addr, ip = rec->ip;
>   	ppc_inst_t old, new;
> -	int ret = 0;
> +	int i, ret = 0;
> +	u32 ftrace_mcount_tramp_insns[] = {
> +#ifdef CONFIG_PPC64
> +		PPC_RAW_BRANCH(FTRACE_MCOUNT_TRAMP_OFFSET + MCOUNT_INSN_SIZE),
> +#else
> +		PPC_RAW_MFLR(_R0),
> +#endif

Can it be based on IS_ENABLED(CONFIG_PPC64) instead ?

> +		PPC_RAW_BL(0), /* bl ftrace_caller */
> +		PPC_RAW_MTLR(_R0), /* also see update ppc_function_entry() */
> +		PPC_RAW_BRANCH(FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE * 2)
> +	};
> +

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line
  2023-12-21 10:46   ` Christophe Leroy
@ 2023-12-21 14:25     ` Steven Rostedt
  2023-12-21 15:20       ` Christophe Leroy
  2023-12-22 15:01     ` Naveen N Rao
  1 sibling, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2023-12-21 14:25 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Mark Rutland, Florent Revest, Naveen N Rao, Nicholas Piggin,
	linux-kernel, Aneesh Kumar K.V, Masami Hiramatsu, linuxppc-dev

On Thu, 21 Dec 2023 10:46:08 +0000
Christophe Leroy <christophe.leroy@csgroup.eu> wrote:

> > To enable ftrace, the nop at function entry is changed to an
> > unconditional branch to 'tramp'. The call to ftrace_caller() may be
> > updated to ftrace_regs_caller() depending on the registered ftrace ops.
> > On 64-bit powerpc, we additionally change the instruction at 'tramp' to
> > 'mflr r0' from an unconditional branch back to func+4. This is so that
> > functions entered through the GEP can skip the function profile sequence
> > unless ftrace is enabled.
> > 
> > With the context_switch microbenchmark on a P9 machine, there is a
> > performance improvement of ~6% with this patch applied, going from 650k
> > context switches to 690k context switches without ftrace enabled. With
> > ftrace enabled, the performance was similar at 86k context switches.  
> 
> Wondering how significant that context_switch microbenchmark is.
> 
> I ran it on both mpc885 and mpc8321 and I'm a bit puzzled by some of the 
> results:
> # ./context_switch --no-fp
> Using threads with yield on cpus 0/0 touching FP:no altivec:no vector:no 
> vdso:no
> 
> On 885, I get the following results before and after your patch.
> 
> CONFIG_FTRACE not selected : 44,9k
> CONFIG_FTRACE selected, before : 32,8k
> CONFIG_FTRACE selected, after : 33,6k
> 
> All this is with CONFIG_INIT_STACK_ALL_ZERO which is the default. But 
> when I select CONFIG_INIT_STACK_NONE, the CONFIG_FTRACE not selected 
> result is only 34,4.
> 
> On 8321:
> 
> CONFIG_FTRACE not selected : 100,3k
> CONFIG_FTRACE selected, before : 72,5k
> CONFIG_FTRACE selected, after : 116k
> 
> So the results look odd to me.


BTW, CONFIG_FTRACE just enables the tracing system (I would like to change
that to CONFIG_TRACING, but not sure if I can without breaking .configs all
over the place).

The nops for ftrace is enabled with CONFIG_FUNCTION_TRACER.

-- Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line
  2023-12-21 14:25     ` Steven Rostedt
@ 2023-12-21 15:20       ` Christophe Leroy
  0 siblings, 0 replies; 16+ messages in thread
From: Christophe Leroy @ 2023-12-21 15:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mark Rutland, Florent Revest, Naveen N Rao, Nicholas Piggin,
	linux-kernel, Aneesh Kumar K.V, Masami Hiramatsu, linuxppc-dev



On 21/12/2023 at 15:25, Steven Rostedt wrote:
> On Thu, 21 Dec 2023 10:46:08 +0000
> Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
>>> To enable ftrace, the nop at function entry is changed to an
>>> unconditional branch to 'tramp'. The call to ftrace_caller() may be
>>> updated to ftrace_regs_caller() depending on the registered ftrace ops.
>>> On 64-bit powerpc, we additionally change the instruction at 'tramp' to
>>> 'mflr r0' from an unconditional branch back to func+4. This is so that
>>> functions entered through the GEP can skip the function profile sequence
>>> unless ftrace is enabled.
>>>
>>> With the context_switch microbenchmark on a P9 machine, there is a
>>> performance improvement of ~6% with this patch applied, going from 650k
>>> context switches to 690k context switches without ftrace enabled. With
>>> ftrace enabled, the performance was similar at 86k context switches.
>>
>> Wondering how significant that context_switch microbenchmark is.
>>
>> I ran it on both mpc885 and mpc8321 and I'm a bit puzzled by some of the
>> results:
>> # ./context_switch --no-fp
>> Using threads with yield on cpus 0/0 touching FP:no altivec:no vector:no
>> vdso:no
>>
>> On 885, I get the following results before and after your patch.
>>
>> CONFIG_FTRACE not selected : 44,9k
>> CONFIG_FTRACE selected, before : 32,8k
>> CONFIG_FTRACE selected, after : 33,6k
>>
>> All this is with CONFIG_INIT_STACK_ALL_ZERO which is the default. But
>> when I select CONFIG_INIT_STACK_NONE, the CONFIG_FTRACE not selected
>> result is only 34,4.
>>
>> On 8321:
>>
>> CONFIG_FTRACE not selected : 100,3k
>> CONFIG_FTRACE selected, before : 72,5k
>> CONFIG_FTRACE selected, after : 116k
>>
>> So the results look odd to me.
> 
> 
> BTW, CONFIG_FTRACE just enables the tracing system (I would like to change
> that to CONFIG_TRACING, but not sure if I can without breaking .configs all
> over the place).
> 
> The nops for ftrace is enabled with CONFIG_FUNCTION_TRACER.

Yes I selected both CONFIG_FTRACE and CONFIG_FUNCTION_TRACER

Christophe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line
  2023-12-21 10:46   ` Christophe Leroy
  2023-12-21 14:25     ` Steven Rostedt
@ 2023-12-22 15:01     ` Naveen N Rao
  1 sibling, 0 replies; 16+ messages in thread
From: Naveen N Rao @ 2023-12-22 15:01 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Mark Rutland, Florent Revest, linux-kernel, Steven Rostedt,
	Aneesh Kumar K.V, Nicholas Piggin, linuxppc-dev,
	Masami Hiramatsu

On Thu, Dec 21, 2023 at 10:46:08AM +0000, Christophe Leroy wrote:
> 
> 
On 08/12/2023 at 17:30, Naveen N Rao wrote:
> > Function profile sequence on powerpc includes two instructions at the
> > beginning of each function:
> > 
> > 	mflr	r0
> > 	bl	ftrace_caller
> > 
> > The call to ftrace_caller() gets nop'ed out during kernel boot and is
> > patched in when ftrace is enabled.
> > 
> > There are two issues with this:
> > 1. The 'mflr r0' instruction at the beginning of each function remains
> >     even though ftrace is not being used.
> > 2. When ftrace is activated, we return from ftrace_caller() with a bctr
> >     instruction to preserve r0 and LR, resulting in the link stack
> >     becoming unbalanced.
> > 
> > To address (1), we have tried to nop'out the 'mflr r0' instruction when
> > nop'ing out the call to ftrace_caller() and restoring it when enabling
> > ftrace. But, that required additional synchronization slowing down
> > ftrace activation. It also left an additional nop instruction at the
> > beginning of each function and that wasn't desirable on 32-bit powerpc.
> > 
> > Instead of that, move the function profile sequence out-of-line leaving
> > a single nop at function entry. On ftrace activation, the nop is changed
> > to an unconditional branch to the out-of-line sequence that in turn
> > calls ftrace_caller(). This removes the need for complex synchronization
> > during ftrace activation and simplifies the code. More importantly, this
> > improves performance of the kernel when ftrace is not in use.
> > 
> > To address (2), change the ftrace trampoline to return with a 'blr'
> > instruction with the original return address in r0 intact. Then, an
> > additional 'mtlr r0' instruction in the function profile sequence can
> > move the correct return address back to LR.
> > 
> > With the above two changes, the function profile sequence now looks like
> > the following:
> > 
> >   [func:		# GEP -- 64-bit powerpc, optional
> > 	addis	r2,r12,imm1
> > 	addi	r2,r2,imm2]
> >    tramp:
> > 	mflr	r0
> > 	bl	ftrace_caller
> > 	mtlr	r0
> > 	b	func
> > 	nop
> > 	[nop]	# 64-bit powerpc only
> >    func:		# LEP
> > 	nop
> > 
> > On 32-bit powerpc, the ftrace mcount trampoline is now completely
> > outside the function. This is also the case on 64-bit powerpc for
> > functions that do not need a GEP. However, for functions that need a
> > GEP, the additional instructions are inserted between the GEP and the
> > LEP. Since we can only have a fixed number of instructions between GEP
> > and LEP, we choose to emit 6 instructions. Four of those instructions
> > are used for the function profile sequence and two instruction slots are
> > reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
> > 32-bit powerpc, we emit one additional nop for this purpose resulting in
> > a total of 5 nops before function entry.
> > 
> > To enable ftrace, the nop at function entry is changed to an
> > unconditional branch to 'tramp'. The call to ftrace_caller() may be
> > updated to ftrace_regs_caller() depending on the registered ftrace ops.
> > On 64-bit powerpc, we additionally change the instruction at 'tramp' to
> > 'mflr r0' from an unconditional branch back to func+4. This is so that
> > functions entered through the GEP can skip the function profile sequence
> > unless ftrace is enabled.
> > 
> > With the context_switch microbenchmark on a P9 machine, there is a
> > performance improvement of ~6% with this patch applied, going from 650k
> > context switches to 690k context switches without ftrace enabled. With
> > ftrace enabled, the performance was similar at 86k context switches.
> 
> Wondering how significant that context_switch microbenchmark is.
> 
> I ran it on both mpc885 and mpc8321 and I'm a bit puzzled by some of the 
> results:
> # ./context_switch --no-fp
> Using threads with yield on cpus 0/0 touching FP:no altivec:no vector:no 
> vdso:no
> 
> On 885, I get the following results before and after your patch.
> 
> CONFIG_FTRACE not selected : 44,9k
> CONFIG_FTRACE selected, before : 32,8k
> CONFIG_FTRACE selected, after : 33,6k
> 
> All this is with CONFIG_INIT_STACK_ALL_ZERO which is the default. But 
> when I select CONFIG_INIT_STACK_NONE, the CONFIG_FTRACE not selected 
> result is only 34,4.
> 
> On 8321:
> 
> CONFIG_FTRACE not selected : 100,3k
> CONFIG_FTRACE selected, before : 72,5k
> CONFIG_FTRACE selected, after : 116k
> 
> So the results look odd to me.

That's indeed odd, though it looks to be showing good improvement. Are 
those numbers with/without the function tracer enabled?

Do you see more reasonable numbers if you use a different 
FUNCTION_ALIGNMENT?

> 
> > 
> > The downside of this approach is the increase in vmlinux size,
> > especially on 32-bit powerpc. We now emit 3 additional instructions for
> > each function (excluding the one or two instructions for supporting
> > DYNAMIC_FTRACE_WITH_CALL_OPS). On 64-bit powerpc with the current
> > implementation of -fpatchable-function-entry though, this is not
> > avoidable since we are forced to emit 6 instructions between the GEP and
> > the LEP even if we are to only support DYNAMIC_FTRACE_WITH_CALL_OPS.
> 
> The increase is almost 5% on the few 32 bits defconfig I have tested.  
> That's a lot.

Indeed. Note that one of those nops is for DYN_FTRACE_WITH_CALL_OPS, 
which we will need regardless.

Moving the ftrace mcount sequence out of line will alone need 2 
additional nops. 'mtlr r0' for balancing the link stack costs us one 
more nop.
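
To make that concrete, the 32-bit entry layout described in the commit
message above ends up looking like this (sketch):

	tramp:
		mflr	r0
		bl	ftrace_caller
		mtlr	r0
		b	func
		nop		/* reserved for DYNAMIC_FTRACE_WITH_CALL_OPS */
	func:
		nop		/* patched to 'b tramp' when ftrace is enabled */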

> 
> On 32 bits powerpc, only the e500 has a link stack that could end up 
> being unbalanced. Could we keep the bctr and avoid the mtlr and the jump 
> in the trampoline ?
> 
> On 8xx a NOP is one cycle, a taken branch is 2 cycles, but the second 
> cycle is a bubble that most of the time gets filled by following 
> operations. On the 83xx, branches and NOPs are supposed to be seamless.
> 
>> So is that out-of-line trampoline really worth it? Maybe keeping the 
>> ftrace instructions at the beginning and just replacing the mflr by a 
>> jump when ftrace is off would help reduce the ftrace insns by one more 
>> instruction.

I was looking forward to your feedback w.r.t 32-bit powerpc since I 
couldn't test that.

We can certainly retain the existing behavior for ppc32, though it might 
make it harder to share the code base with ppc64.

The primary downside with the 'mflr r0' at function entry is that 
patching it out (or replacing it with a branch) will need additional
synchronization.

I'm out on vacation till end of the year. I plan on doing more tests 
once I am back to understand if the performance benefit is worth the 
increase in vmlinux size.


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread

Thread overview: 16+ messages
2023-12-08 16:30 [RFC PATCH 0/9] powerpc: ftrace updates Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry Naveen N Rao
2023-12-11  5:39   ` Masami Hiramatsu
2023-12-08 16:30 ` [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line Naveen N Rao
2023-12-21 10:46   ` Christophe Leroy
2023-12-21 14:25     ` Steven Rostedt
2023-12-21 15:20       ` Christophe Leroy
2023-12-22 15:01     ` Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS Naveen N Rao
2023-12-08 16:30 ` [RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc Naveen N Rao
2023-12-21 10:38 ` (subset) [RFC PATCH 0/9] powerpc: ftrace updates Michael Ellerman
