* [PATCH v3 00/13] ARM: ftrace fixes and cleanups
@ 2022-02-03  8:21 Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 01/13] ARM: ftrace: ensure that ADR takes the Thumb bit into account Ard Biesheuvel
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

This series addresses a number of issues in the ARM support code for
ftrace, mostly related to Thumb2 but affecting other configurations as
well.

Changes since v2:
- simplify kprobes patch and avoid global FPREG_ macros
- update cacheflush code to avoid R7 and R11 entirely, instead of
  preserving/restoring them
- add new patch to disable ftrace in Broadcom Kona SMC code
- add acks from various folks (thanks!)

Changes since v1:
- add a couple of patches to enable ftrace in Thumb2 mode when building
  with Clang, which was the one remaining unsupported configuration
- fix up some minor code issues caught by the bots
- add some acks

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Cristian Marussi <cristian.marussi@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>

Ard Biesheuvel (13):
  ARM: ftrace: ensure that ADR takes the Thumb bit into account
  ARM: ftrace: use ADD not POP to counter PUSH at entry
  ARM: ftrace: use trampolines to keep .init.text in branching range
  ARM: ftrace: avoid redundant loads or clobbering IP
  ARM: ftrace: avoid unnecessary literal loads
  ARM: ftrace: enable HAVE_FUNCTION_GRAPH_FP_TEST
  ARM: unwind: track location of LR value in stack frame
  ARM: ftrace: enable the graph tracer with the EABI unwinder
  ARM: kprobes: treat R7 as the frame pointer register in Thumb2 builds
  drivers/firmware/scmi: disable ftrace for Clang Thumb2 builds
  ARM: cacheflush: avoid clobbering the frame pointer
  ARM: mach-bcm: disable ftrace in SMC invocation routines
  Revert "ARM: 9144/1: forbid ftrace with clang and thumb2_kernel"

 arch/arm/Kconfig                         |   4 +-
 arch/arm/Kconfig.debug                   |   2 +-
 arch/arm/include/asm/cacheflush.h        |  12 +-
 arch/arm/include/asm/ftrace.h            |  20 +--
 arch/arm/include/asm/stacktrace.h        |   3 +
 arch/arm/kernel/Makefile                 |   6 +-
 arch/arm/kernel/entry-ftrace.S           | 128 +++++++++++---------
 arch/arm/kernel/ftrace.c                 |  62 ++++++++--
 arch/arm/kernel/unwind.c                 |   7 +-
 arch/arm/mach-bcm/Makefile               |   1 +
 arch/arm/mach-exynos/mcpm-exynos.c       |   6 +-
 arch/arm/mm/cache-v7.S                   |  40 +++---
 arch/arm/probes/kprobes/actions-common.c |   8 +-
 arch/arm/probes/kprobes/actions-thumb.c  |  16 ++-
 drivers/firmware/arm_scmi/Makefile       |   7 ++
 15 files changed, 189 insertions(+), 133 deletions(-)

-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v3 01/13] ARM: ftrace: ensure that ADR takes the Thumb bit into account
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 02/13] ARM: ftrace: use ADD not POP to counter PUSH at entry Ard Biesheuvel
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Using ADR to take the address of 'ftrace_stub' via a local label
produces an address that has the Thumb bit cleared, which means the
subsequent comparison is guaranteed to fail. Instead, use the badr
macro, which forces the Thumb bit to be set.

Fixes: a3ba87a61499 ("ARM: 6316/1: ftrace: add Thumb-2 support")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
---
 arch/arm/kernel/entry-ftrace.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index a74289ebc803..f4886fb6e9ba 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -40,7 +40,7 @@
 	mcount_enter
 	ldr	r0, =ftrace_trace_function
 	ldr	r2, [r0]
-	adr	r0, .Lftrace_stub
+	badr	r0, .Lftrace_stub
 	cmp	r0, r2
 	bne	1f
 
-- 
2.30.2




* [PATCH v3 02/13] ARM: ftrace: use ADD not POP to counter PUSH at entry
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 01/13] ARM: ftrace: ensure that ADR takes the Thumb bit into account Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 03/13] ARM: ftrace: use trampolines to keep .init.text in branching range Ard Biesheuvel
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

The compiler-emitted hook used for ftrace consists of a PUSH {LR} to
preserve the link register, followed by a branch-and-link (BL) to
__gnu_mcount_nc. Dynamic ftrace patches away the latter to turn the
combined sequence into a NOP, using a POP {LR} instruction.

This is not necessary: the link register is not clobbered in this case,
so simply adding #4 to the stack pointer is sufficient, and doing so
avoids a memory access that may take a few cycles to resolve, depending
on the microarchitecture.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/kernel/entry-ftrace.S |  2 +-
 arch/arm/kernel/ftrace.c       | 15 +++++++++++++--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index f4886fb6e9ba..dca12a09322a 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -27,7 +27,7 @@
  * allows it to be clobbered in subroutines and doesn't use it to hold
  * parameters.)
  *
- * When using dynamic ftrace, we patch out the mcount call by a "pop {lr}"
+ * When using dynamic ftrace, we patch out the mcount call by a "add sp, #4"
  * instead of the __gnu_mcount_nc call (see arch/arm/kernel/ftrace.c).
  */
 
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index a006585e1c09..811cadf7eefc 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -24,10 +24,21 @@
 #include <asm/set_memory.h>
 #include <asm/patch.h>
 
+/*
+ * The compiler emitted profiling hook consists of
+ *
+ *   PUSH    {LR}
+ *   BL	     __gnu_mcount_nc
+ *
+ * To turn this combined sequence into a NOP, we need to restore the value of
+ * SP before the PUSH. Let's use an ADD rather than a POP into LR, as LR is not
+ * modified anyway, and reloading LR from memory is highly likely to be less
+ * efficient.
+ */
 #ifdef CONFIG_THUMB2_KERNEL
-#define	NOP		0xf85deb04	/* pop.w {lr} */
+#define	NOP		0xf10d0d04	/* add.w sp, sp, #4 */
 #else
-#define	NOP		0xe8bd4000	/* pop {lr} */
+#define	NOP		0xe28dd004	/* add   sp, sp, #4 */
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-- 
2.30.2




* [PATCH v3 03/13] ARM: ftrace: use trampolines to keep .init.text in branching range
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 01/13] ARM: ftrace: ensure that ADR takes the Thumb bit into account Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 02/13] ARM: ftrace: use ADD not POP to counter PUSH at entry Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 04/13] ARM: ftrace: avoid redundant loads or clobbering IP Ard Biesheuvel
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Kernel images that are large in comparison to the range of a direct
branch may fail to work as expected with ftrace, as patching a direct
branch to one of the core ftrace routines may not be possible from the
.init.text section if it is emitted too far away from the normal .text
section.

This is more likely to affect Thumb2 builds, given that the Thumb2 BL
range is only +/- 16 MiB (as opposed to +/- 32 MiB for ARM), but it may
occur in either ISA.

To work around this, add a couple of trampolines to .init.text and
swap these in when the ftrace patching code is operating on callers in
.init.text.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/kernel/entry-ftrace.S | 16 ++++++++++++++
 arch/arm/kernel/ftrace.c       | 23 +++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index dca12a09322a..237d435e29aa 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -270,3 +270,19 @@ ENTRY(ftrace_stub)
 .Lftrace_stub:
 	ret	lr
 ENDPROC(ftrace_stub)
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+	__INIT
+
+	.macro	init_tramp, dst:req
+ENTRY(\dst\()_from_init)
+	ldr	pc, =\dst
+ENDPROC(\dst\()_from_init)
+	.endm
+
+	init_tramp	ftrace_caller
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+	init_tramp	ftrace_regs_caller
+#endif
+#endif
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 811cadf7eefc..74d3913f5590 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -62,9 +62,20 @@ static unsigned long ftrace_nop_replace(struct dyn_ftrace *rec)
 	return NOP;
 }
 
-static unsigned long adjust_address(struct dyn_ftrace *rec, unsigned long addr)
+void ftrace_caller_from_init(void);
+void ftrace_regs_caller_from_init(void);
+
+static unsigned long __ref adjust_address(struct dyn_ftrace *rec,
+					  unsigned long addr)
 {
-	return addr;
+	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE) ||
+	    system_state >= SYSTEM_FREEING_INITMEM ||
+	    likely(!is_kernel_inittext(rec->ip)))
+		return addr;
+	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) ||
+	    addr == (unsigned long)&ftrace_caller)
+		return (unsigned long)&ftrace_caller_from_init;
+	return (unsigned long)&ftrace_regs_caller_from_init;
 }
 
 int ftrace_arch_code_modify_prepare(void)
@@ -200,7 +211,13 @@ int ftrace_make_nop(struct module *mod,
 #endif
 
 	new = ftrace_nop_replace(rec);
-	ret = ftrace_modify_code(ip, old, new, true);
+	/*
+	 * Locations in .init.text may call __gnu_mcount_nc via a linker
+	 * emitted veneer if they are too far away from its implementation, and
+	 * so validation may fail spuriously in such cases. Let's work around
+	 * this by omitting those from validation.
+	 */
+	ret = ftrace_modify_code(ip, old, new, !is_kernel_inittext(ip));
 
 	return ret;
 }
-- 
2.30.2




* [PATCH v3 04/13] ARM: ftrace: avoid redundant loads or clobbering IP
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-02-03  8:21 ` [PATCH v3 03/13] ARM: ftrace: use trampolines to keep .init.text in branching range Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 05/13] ARM: ftrace: avoid unnecessary literal loads Ard Biesheuvel
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Tweak the ftrace return paths to avoid redundant loads of SP, as well as
unnecessary clobbering of IP.

This also fixes the inconsistency of using MOV to perform a function
return, which is sub-optimal on recent micro-architectures but, more
importantly, does not perform an interworking return, unlike the
compiler-generated function returns in Thumb2 builds.

Let's fix this by popping PC from the stack like most ordinary code
does.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/kernel/entry-ftrace.S | 51 +++++++++-----------
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index 237d435e29aa..67548c38a567 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -22,10 +22,7 @@
  * mcount can be thought of as a function called in the middle of a subroutine
  * call.  As such, it needs to be transparent for both the caller and the
  * callee: the original lr needs to be restored when leaving mcount, and no
- * registers should be clobbered.  (In the __gnu_mcount_nc implementation, we
- * clobber the ip register.  This is OK because the ARM calling convention
- * allows it to be clobbered in subroutines and doesn't use it to hold
- * parameters.)
+ * registers should be clobbered.
  *
  * When using dynamic ftrace, we patch out the mcount call by a "add sp, #4"
  * instead of the __gnu_mcount_nc call (see arch/arm/kernel/ftrace.c).
@@ -70,26 +67,25 @@
 
 .macro __ftrace_regs_caller
 
-	sub	sp, sp, #8	@ space for PC and CPSR OLD_R0,
+	str	lr, [sp, #-8]!	@ store LR as PC and make space for CPSR/OLD_R0,
 				@ OLD_R0 will overwrite previous LR
 
-	add 	ip, sp, #12	@ move in IP the value of SP as it was
-				@ before the push {lr} of the mcount mechanism
+	ldr	lr, [sp, #8]    @ get previous LR
 
-	str     lr, [sp, #0]    @ store LR instead of PC
+	str	r0, [sp, #8]	@ write r0 as OLD_R0 over previous LR
 
-	ldr     lr, [sp, #8]    @ get previous LR
+	str	lr, [sp, #-4]!	@ store previous LR as LR
 
-	str	r0, [sp, #8]	@ write r0 as OLD_R0 over previous LR
+	add 	lr, sp, #16	@ move in LR the value of SP as it was
+				@ before the push {lr} of the mcount mechanism
 
-	stmdb   sp!, {ip, lr}
-	stmdb   sp!, {r0-r11, lr}
+	push	{r0-r11, ip, lr}
 
 	@ stack content at this point:
 	@ 0  4          48   52       56            60   64    68       72
-	@ R0 | R1 | ... | LR | SP + 4 | previous LR | LR | PSR | OLD_R0 |
+	@ R0 | R1 | ... | IP | SP + 4 | previous LR | LR | PSR | OLD_R0 |
 
-	mov r3, sp				@ struct pt_regs*
+	mov	r3, sp				@ struct pt_regs*
 
 	ldr r2, =function_trace_op
 	ldr r2, [r2]				@ pointer to the current
@@ -112,11 +108,9 @@ ftrace_graph_regs_call:
 #endif
 
 	@ pop saved regs
-	ldmia   sp!, {r0-r12}			@ restore r0 through r12
-	ldr	ip, [sp, #8]			@ restore PC
-	ldr	lr, [sp, #4]			@ restore LR
-	ldr	sp, [sp, #0]			@ restore SP
-	mov	pc, ip				@ return
+	pop	{r0-r11, ip, lr}		@ restore r0 through r12
+	ldr	lr, [sp], #4			@ restore LR
+	ldr	pc, [sp], #12
 .endm
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -132,11 +126,9 @@ ftrace_graph_regs_call:
 	bl	prepare_ftrace_return
 
 	@ pop registers saved in ftrace_regs_caller
-	ldmia   sp!, {r0-r12}			@ restore r0 through r12
-	ldr	ip, [sp, #8]			@ restore PC
-	ldr	lr, [sp, #4]			@ restore LR
-	ldr	sp, [sp, #0]			@ restore SP
-	mov	pc, ip				@ return
+	pop	{r0-r11, ip, lr}		@ restore r0 through r12
+	ldr	lr, [sp], #4			@ restore LR
+	ldr	pc, [sp], #12
 
 .endm
 #endif
@@ -202,16 +194,17 @@ ftrace_graph_call\suffix:
 .endm
 
 .macro mcount_exit
-	ldmia	sp!, {r0-r3, ip, lr}
-	ret	ip
+	ldmia	sp!, {r0-r3}
+	ldr	lr, [sp, #4]
+	ldr	pc, [sp], #8
 .endm
 
 ENTRY(__gnu_mcount_nc)
 UNWIND(.fnstart)
 #ifdef CONFIG_DYNAMIC_FTRACE
-	mov	ip, lr
-	ldmia	sp!, {lr}
-	ret	ip
+	push	{lr}
+	ldr	lr, [sp, #4]
+	ldr	pc, [sp], #8
 #else
 	__mcount
 #endif
-- 
2.30.2




* [PATCH v3 05/13] ARM: ftrace: avoid unnecessary literal loads
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2022-02-03  8:21 ` [PATCH v3 04/13] ARM: ftrace: avoid redundant loads or clobbering IP Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 06/13] ARM: ftrace: enable HAVE_FUNCTION_GRAPH_FP_TEST Ard Biesheuvel
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Avoid explicit literal loads; instead, use accessor macros that generate
the optimal sequence for the architecture revision being targeted.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/kernel/entry-ftrace.S | 27 ++++++++------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index 67548c38a567..99720064a4c5 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -35,23 +35,20 @@
 
 .macro __mcount suffix
 	mcount_enter
-	ldr	r0, =ftrace_trace_function
-	ldr	r2, [r0]
+	ldr_va	r2, ftrace_trace_function
 	badr	r0, .Lftrace_stub
 	cmp	r0, r2
 	bne	1f
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	ldr     r1, =ftrace_graph_return
-	ldr     r2, [r1]
-	cmp     r0, r2
-	bne     ftrace_graph_caller\suffix
-
-	ldr     r1, =ftrace_graph_entry
-	ldr     r2, [r1]
-	ldr     r0, =ftrace_graph_entry_stub
-	cmp     r0, r2
-	bne     ftrace_graph_caller\suffix
+	ldr_va	r2, ftrace_graph_return
+	cmp	r0, r2
+	bne	ftrace_graph_caller\suffix
+
+	ldr_va	r2, ftrace_graph_entry
+	mov_l	r0, ftrace_graph_entry_stub
+	cmp	r0, r2
+	bne	ftrace_graph_caller\suffix
 #endif
 
 	mcount_exit
@@ -87,8 +84,7 @@
 
 	mov	r3, sp				@ struct pt_regs*
 
-	ldr r2, =function_trace_op
-	ldr r2, [r2]				@ pointer to the current
+	ldr_va	r2, function_trace_op		@ pointer to the current
 						@ function tracing op
 
 	ldr	r1, [sp, #S_LR]			@ lr of instrumented func
@@ -141,8 +137,7 @@ ftrace_graph_regs_call:
 	mcount_adjust_addr	r0, lr		@ instrumented function
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-	ldr r2, =function_trace_op
-	ldr r2, [r2]				@ pointer to the current
+	ldr_va	r2, function_trace_op		@ pointer to the current
 						@ function tracing op
 	mov r3, #0				@ regs is NULL
 #endif
-- 
2.30.2




* [PATCH v3 06/13] ARM: ftrace: enable HAVE_FUNCTION_GRAPH_FP_TEST
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2022-02-03  8:21 ` [PATCH v3 05/13] ARM: ftrace: avoid unnecessary literal loads Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  8:21 ` [PATCH v3 07/13] ARM: unwind: track location of LR value in stack frame Ard Biesheuvel
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Fix the frame pointer handling in the function graph tracer entry and
exit code so that HAVE_FUNCTION_GRAPH_FP_TEST can be enabled. Instead of
using FP directly (which will have different values between the entry
and exit pieces of the function graph tracer), use the value of SP at
entry and exit, as the entry value can be derived from the frame
pointer.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/include/asm/ftrace.h  | 2 ++
 arch/arm/kernel/entry-ftrace.S | 2 +-
 arch/arm/kernel/ftrace.c       | 5 +++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/ftrace.h b/arch/arm/include/asm/ftrace.h
index a4dbac07e4ef..b4f5fab6b04e 100644
--- a/arch/arm/include/asm/ftrace.h
+++ b/arch/arm/include/asm/ftrace.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_ARM_FTRACE
 #define _ASM_ARM_FTRACE
 
+#define HAVE_FUNCTION_GRAPH_FP_TEST
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 #define ARCH_SUPPORTS_FTRACE_OPS 1
 #endif
diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index 99720064a4c5..bbfa0954c385 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -247,7 +247,7 @@ ENDPROC(ftrace_graph_regs_caller)
 	.globl return_to_handler
 return_to_handler:
 	stmdb	sp!, {r0-r3}
-	mov	r0, fp			@ frame pointer
+	add	r0, sp, #16		@ sp at exit of instrumented routine
 	bl	ftrace_return_to_handler
 	mov	lr, r0			@ r0 has real ret addr
 	ldmia	sp!, {r0-r3}
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 74d3913f5590..ea2396900c7d 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -233,6 +233,11 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	if (unlikely(atomic_read(&current->tracing_graph_pause)))
 		return;
 
+	if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
+		/* FP points one word below parent's top of stack */
+		frame_pointer += 4;
+	}
+
 	old = *parent;
 	*parent = return_hooker;
 
-- 
2.30.2




* [PATCH v3 07/13] ARM: unwind: track location of LR value in stack frame
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2022-02-03  8:21 ` [PATCH v3 06/13] ARM: ftrace: enable HAVE_FUNCTION_GRAPH_FP_TEST Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-07 18:14   ` Nick Desaulniers
  2022-02-03  8:21 ` [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder Ard Biesheuvel
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

The ftrace graph tracer needs to override the return address of an
instrumented function, in order to install a hook that gets invoked when
the function returns again.

Currently, this is only supported when building for ARM using GCC with
frame pointers, as in that case the function is guaranteed to reload LR
from [FP, #-4], and we can simply pass that address to the ftrace code.

In order to support this for configurations that rely on the EABI
unwinder, such as Thumb2 builds, make the unwinder keep track of the
address from which LR was unwound, permitting ftrace to make use of this
in a subsequent patch.

Drop the call to kernel_text_address(), which is problematic in terms
of ftrace recursion, given that it may be instrumented itself. The call
is redundant anyway, as no unwind directives will be found unless the PC
points to memory that is known to contain executable code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/stacktrace.h | 3 +++
 arch/arm/kernel/Makefile          | 1 +
 arch/arm/kernel/unwind.c          | 7 ++++---
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
index d87d60532b86..e56503fd9447 100644
--- a/arch/arm/include/asm/stacktrace.h
+++ b/arch/arm/include/asm/stacktrace.h
@@ -14,6 +14,9 @@ struct stackframe {
 	unsigned long sp;
 	unsigned long lr;
 	unsigned long pc;
+
+	/* address of the LR value on the stack */
+	unsigned long *lr_addr;
 #ifdef CONFIG_KRETPROBES
 	struct llist_node *kr_cur;
 	struct task_struct *tsk;
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index ae295a3bcfef..56511856ff9d 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -10,6 +10,7 @@ ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_insn.o = -pg
 CFLAGS_REMOVE_patch.o = -pg
+CFLAGS_REMOVE_unwind.o = -pg
 endif
 
 CFLAGS_REMOVE_return_address.o = -pg
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index c5ea328c428d..b4e468a7674b 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -55,6 +55,7 @@ struct unwind_ctrl_block {
 	const unsigned long *insn;	/* pointer to the current instructions word */
 	unsigned long sp_low;		/* lowest value of sp allowed */
 	unsigned long sp_high;		/* highest value of sp allowed */
+	unsigned long *lr_addr;		/* address of LR value on the stack */
 	/*
 	 * 1 : check for stack overflow for each register pop.
 	 * 0 : save overhead if there is plenty of stack remaining.
@@ -239,6 +240,8 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
 	 * from being tracked by KASAN.
 	 */
 	ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+	if (reg == 14)
+		ctrl->lr_addr = *vsp;
 	(*vsp)++;
 	return URC_OK;
 }
@@ -395,9 +398,6 @@ int unwind_frame(struct stackframe *frame)
 	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
 		 frame->pc, frame->lr, frame->sp);
 
-	if (!kernel_text_address(frame->pc))
-		return -URC_FAILURE;
-
 	idx = unwind_find_idx(frame->pc);
 	if (!idx) {
 		pr_warn("unwind: Index not found %08lx\n", frame->pc);
@@ -476,6 +476,7 @@ int unwind_frame(struct stackframe *frame)
 	frame->lr = ctrl.vrs[LR];
 	frame->pc = ctrl.vrs[PC];
 	frame->sp_low = ctrl.sp_low;
+	frame->lr_addr = ctrl.lr_addr;
 
 	return URC_OK;
 }
-- 
2.30.2




* [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2022-02-03  8:21 ` [PATCH v3 07/13] ARM: unwind: track location of LR value in stack frame Ard Biesheuvel
@ 2022-02-03  8:21 ` Ard Biesheuvel
  2022-02-03  9:16   ` Arnd Bergmann
  2022-02-03  8:22 ` [PATCH v3 09/13] ARM: kprobes: treat R7 as the frame pointer register in Thumb2 builds Ard Biesheuvel
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:21 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Enable the function graph tracer in combination with the EABI unwinder,
so that Thumb2 builds or Clang ARM builds can make use of it.

This involves using the unwinder to locate the return address of an
instrumented function on the stack, so that it can be overridden and
made to refer to the ftrace handling routines that need to be called at
function return.

Given that, for these builds, the value of the link register is not
guaranteed to be stored on the stack, fall back to the stack slot that
the ftrace exit code will use to restore LR in the instrumented
function's execution context.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/Kconfig               |  2 +-
 arch/arm/Kconfig.debug         |  2 +-
 arch/arm/include/asm/ftrace.h  | 18 -------------
 arch/arm/kernel/Makefile       |  5 +---
 arch/arm/kernel/entry-ftrace.S | 28 ++++++++++++++------
 arch/arm/kernel/ftrace.c       | 19 +++++++++++--
 6 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8eff55222874..3c6216f194f2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -92,7 +92,7 @@ config ARM
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP if ARM_LPAE
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
-	select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
+	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !(THUMB2_KERNEL && CC_IS_CLANG)
 	select HAVE_FUTEX_CMPXCHG if FUTEX
 	select HAVE_GCC_PLUGINS
diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index 07055a503022..cc7523f44be4 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -65,7 +65,7 @@ config UNWINDER_FRAME_POINTER
 
 config UNWINDER_ARM
 	bool "ARM EABI stack unwinder"
-	depends on AEABI && !FUNCTION_GRAPH_TRACER
+	depends on AEABI
 	# https://github.com/ClangBuiltLinux/linux/issues/732
 	depends on !LD_IS_LLD || LLD_VERSION >= 110000
 	select ARM_UNWIND
diff --git a/arch/arm/include/asm/ftrace.h b/arch/arm/include/asm/ftrace.h
index b4f5fab6b04e..5358aad67831 100644
--- a/arch/arm/include/asm/ftrace.h
+++ b/arch/arm/include/asm/ftrace.h
@@ -35,26 +35,8 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 
 #ifndef __ASSEMBLY__
 
-#if defined(CONFIG_FRAME_POINTER) && !defined(CONFIG_ARM_UNWIND)
-/*
- * return_address uses walk_stackframe to do it's work.  If both
- * CONFIG_FRAME_POINTER=y and CONFIG_ARM_UNWIND=y walk_stackframe uses unwind
- * information.  For this to work in the function tracer many functions would
- * have to be marked with __notrace.  So for now just depend on
- * !CONFIG_ARM_UNWIND.
- */
-
 void *return_address(unsigned int);
 
-#else
-
-static inline void *return_address(unsigned int level)
-{
-	return NULL;
-}
-
-#endif
-
 #define ftrace_return_address(n) return_address(n)
 
 #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 56511856ff9d..5cebb8d5a1d6 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -25,10 +25,7 @@ obj-y		:= elf.o entry-common.o irq.o opcodes.o \
 KASAN_SANITIZE_stacktrace.o := n
 KASAN_SANITIZE_traps.o := n
 
-ifneq ($(CONFIG_ARM_UNWIND),y)
-obj-$(CONFIG_FRAME_POINTER)	+= return_address.o
-endif
-
+obj-y				+= return_address.o
 obj-$(CONFIG_ATAGS)		+= atags_parse.o
 obj-$(CONFIG_ATAGS_PROC)	+= atags_proc.o
 obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index bbfa0954c385..3e7bcaca5e07 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -100,7 +100,8 @@ ftrace_regs_call:
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	.globl ftrace_graph_regs_call
 ftrace_graph_regs_call:
-	mov	r0, r0
+ARM(	mov	r0, r0	)
+THUMB(	nop.w		)
 #endif
 
 	@ pop saved regs
@@ -112,13 +113,18 @@ ftrace_graph_regs_call:
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 .macro __ftrace_graph_regs_caller
 
-	sub     r0, fp, #4              @ lr of instrumented routine (parent)
+#ifdef CONFIG_UNWINDER_FRAME_POINTER
+	sub	r0, fp, #4		@ lr of instrumented routine (parent)
+#else
+	add	r0, sp, #S_LR
+#endif
 
 	@ called from __ftrace_regs_caller
-	ldr     r1, [sp, #S_PC]		@ instrumented routine (func)
+	ldr	r1, [sp, #S_PC]		@ instrumented routine (func)
 	mcount_adjust_addr	r1, r1
 
-	mov	r2, fp			@ frame pointer
+	mov	r2, fpreg		@ frame pointer
+	add	r3, sp, #PT_REGS_SIZE
 	bl	prepare_ftrace_return
 
 	@ pop registers saved in ftrace_regs_caller
@@ -149,14 +155,19 @@ ftrace_call\suffix:
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	.globl ftrace_graph_call\suffix
 ftrace_graph_call\suffix:
-	mov	r0, r0
+ARM(	mov	r0, r0	)
+THUMB(	nop.w		)
 #endif
 
 	mcount_exit
 .endm
 
 .macro __ftrace_graph_caller
+#ifdef CONFIG_UNWINDER_FRAME_POINTER
 	sub	r0, fp, #4		@ &lr of instrumented routine (&parent)
+#else
+	add	r0, sp, #20
+#endif
 #ifdef CONFIG_DYNAMIC_FTRACE
 	@ called from __ftrace_caller, saved in mcount_enter
 	ldr	r1, [sp, #16]		@ instrumented routine (func)
@@ -165,7 +176,8 @@ ftrace_graph_call\suffix:
 	@ called from __mcount, untouched in lr
 	mcount_adjust_addr	r1, lr	@ instrumented routine (func)
 #endif
-	mov	r2, fp			@ frame pointer
+	mov	r2, fpreg		@ frame pointer
+	add	r3, sp, #24
 	bl	prepare_ftrace_return
 	mcount_exit
 .endm
@@ -244,14 +256,14 @@ ENDPROC(ftrace_graph_regs_caller)
 .purgem mcount_exit
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	.globl return_to_handler
-return_to_handler:
+ENTRY(return_to_handler)
 	stmdb	sp!, {r0-r3}
 	add	r0, sp, #16		@ sp at exit of instrumented routine
 	bl	ftrace_return_to_handler
 	mov	lr, r0			@ r0 has real ret addr
 	ldmia	sp!, {r0-r3}
 	ret	lr
+ENDPROC(return_to_handler)
 #endif
 
 ENTRY(ftrace_stub)
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index ea2396900c7d..83cc068586bc 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -22,6 +22,7 @@
 #include <asm/ftrace.h>
 #include <asm/insn.h>
 #include <asm/set_memory.h>
+#include <asm/stacktrace.h>
 #include <asm/patch.h>
 
 /*
@@ -224,8 +225,10 @@ int ftrace_make_nop(struct module *mod,
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+asmlinkage
 void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
-			   unsigned long frame_pointer)
+			   unsigned long frame_pointer,
+			   unsigned long stack_pointer)
 {
 	unsigned long return_hooker = (unsigned long) &return_to_handler;
 	unsigned long old;
@@ -236,6 +239,18 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
 		/* FP points one word below parent's top of stack */
 		frame_pointer += 4;
+	} else {
+		struct stackframe frame = {
+			.fp = frame_pointer,
+			.sp = stack_pointer,
+			.lr = self_addr,
+			.pc = self_addr,
+		};
+		if (unwind_frame(&frame) < 0)
+			return;
+		if (frame.lr != self_addr)
+			parent = frame.lr_addr;
+		frame_pointer = frame.sp;
 	}
 
 	old = *parent;
@@ -258,7 +273,7 @@ static int __ftrace_modify_caller(unsigned long *callsite,
 	unsigned long caller_fn = (unsigned long) func;
 	unsigned long pc = (unsigned long) callsite;
 	unsigned long branch = arm_gen_branch(pc, caller_fn);
-	unsigned long nop = 0xe1a00000;	/* mov r0, r0 */
+	unsigned long nop = arm_gen_nop();
 	unsigned long old = enable ? nop : branch;
 	unsigned long new = enable ? branch : nop;
 
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 09/13] ARM: kprobes: treat R7 as the frame pointer register in Thumb2 builds
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2022-02-03  8:21 ` [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder Ard Biesheuvel
@ 2022-02-03  8:22 ` Ard Biesheuvel
  2022-02-03  8:22 ` [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang " Ard Biesheuvel
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:22 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Thumb2 code uses R7 as the frame pointer rather than R11, because the
opcodes to access it are generally shorter.

This means that there are cases where we cannot simply add it to the
clobber list of an asm() block, but need to preserve/restore it
explicitly, or the compiler may complain (e.g., Clang builds with
ftrace enabled).

Since R11 is not special in that case, clobber it instead, and use it to
preserve/restore the value of R7.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arm/probes/kprobes/actions-common.c |  8 +++++---
 arch/arm/probes/kprobes/actions-thumb.c  | 16 ++++++++++++----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/kprobes/actions-common.c b/arch/arm/probes/kprobes/actions-common.c
index 836aebe596cd..79171344dbeb 100644
--- a/arch/arm/probes/kprobes/actions-common.c
+++ b/arch/arm/probes/kprobes/actions-common.c
@@ -84,7 +84,8 @@ emulate_generic_r0_12_noflags(probes_opcode_t insn,
 	register void *rfn asm("lr") = asi->insn_fn;
 
 	__asm__ __volatile__ (
-		"stmdb	sp!, {%[regs], r11}	\n\t"
+ARM(		"stmdb	sp!, {%[regs], r11}	\n\t"	)
+THUMB(		"stmdb	sp!, {%[regs], r7}	\n\t"	)
 		"ldmia	%[regs], {r0-r12}	\n\t"
 #if __LINUX_ARM_ARCH__ >= 6
 		"blx	%[fn]			\n\t"
@@ -96,10 +97,11 @@ emulate_generic_r0_12_noflags(probes_opcode_t insn,
 #endif
 		"ldr	lr, [sp], #4		\n\t" /* lr = regs */
 		"stmia	lr, {r0-r12}		\n\t"
-		"ldr	r11, [sp], #4		\n\t"
+ARM(		"ldr	r11, [sp], #4		\n\t"	)
+THUMB(		"ldr	r7, [sp], #4		\n\t"	)
 		: [regs] "=r" (rregs), [fn] "=r" (rfn)
 		: "0" (rregs), "1" (rfn)
-		: "r0", "r2", "r3", "r4", "r5", "r6", "r7",
+		: "r0", "r2", "r3", "r4", "r5", "r6", ARM("r7") THUMB("r11"),
 		  "r8", "r9", "r10", "r12", "memory", "cc"
 		);
 }
diff --git a/arch/arm/probes/kprobes/actions-thumb.c b/arch/arm/probes/kprobes/actions-thumb.c
index 7884fcb81c26..51624fc263fc 100644
--- a/arch/arm/probes/kprobes/actions-thumb.c
+++ b/arch/arm/probes/kprobes/actions-thumb.c
@@ -447,14 +447,16 @@ t16_emulate_loregs(probes_opcode_t insn,
 
 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[oldcpsr]	\n\t"
+		"mov	r11, r7			\n\t"
 		"ldmia	%[regs], {r0-r7}	\n\t"
 		"blx	%[fn]			\n\t"
 		"stmia	%[regs], {r0-r7}	\n\t"
+		"mov	r7, r11			\n\t"
 		"mrs	%[newcpsr], cpsr	\n\t"
 		: [newcpsr] "=r" (newcpsr)
 		: [oldcpsr] "r" (oldcpsr), [regs] "r" (regs),
 		  [fn] "r" (asi->insn_fn)
-		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r11",
 		  "lr", "memory", "cc"
 		);
 
@@ -524,14 +526,16 @@ t16_emulate_push(probes_opcode_t insn,
 		struct arch_probes_insn *asi, struct pt_regs *regs)
 {
 	__asm__ __volatile__ (
+		"mov	r11, r7			\n\t"
 		"ldr	r9, [%[regs], #13*4]	\n\t"
 		"ldr	r8, [%[regs], #14*4]	\n\t"
 		"ldmia	%[regs], {r0-r7}	\n\t"
 		"blx	%[fn]			\n\t"
 		"str	r9, [%[regs], #13*4]	\n\t"
+		"mov	r7, r11			\n\t"
 		:
 		: [regs] "r" (regs), [fn] "r" (asi->insn_fn)
-		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
+		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r8", "r9", "r11",
 		  "lr", "memory", "cc"
 		);
 }
@@ -558,14 +562,16 @@ t16_emulate_pop_nopc(probes_opcode_t insn,
 		struct arch_probes_insn *asi, struct pt_regs *regs)
 {
 	__asm__ __volatile__ (
+		"mov	r11, r7			\n\t"
 		"ldr	r9, [%[regs], #13*4]	\n\t"
 		"ldmia	%[regs], {r0-r7}	\n\t"
 		"blx	%[fn]			\n\t"
 		"stmia	%[regs], {r0-r7}	\n\t"
 		"str	r9, [%[regs], #13*4]	\n\t"
+		"mov	r7, r11			\n\t"
 		:
 		: [regs] "r" (regs), [fn] "r" (asi->insn_fn)
-		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r9",
+		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r9", "r11",
 		  "lr", "memory", "cc"
 		);
 }
@@ -577,14 +583,16 @@ t16_emulate_pop_pc(probes_opcode_t insn,
 	register unsigned long pc asm("r8");
 
 	__asm__ __volatile__ (
+		"mov	r11, r7			\n\t"
 		"ldr	r9, [%[regs], #13*4]	\n\t"
 		"ldmia	%[regs], {r0-r7}	\n\t"
 		"blx	%[fn]			\n\t"
 		"stmia	%[regs], {r0-r7}	\n\t"
 		"str	r9, [%[regs], #13*4]	\n\t"
+		"mov	r7, r11			\n\t"
 		: "=r" (pc)
 		: [regs] "r" (regs), [fn] "r" (asi->insn_fn)
-		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r9",
+		: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r9", "r11",
 		  "lr", "memory", "cc"
 		);
 
-- 
2.30.2



* [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang Thumb2 builds
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2022-02-03  8:22 ` [PATCH v3 09/13] ARM: kprobes: treat R7 as the frame pointer register in Thumb2 builds Ard Biesheuvel
@ 2022-02-03  8:22 ` Ard Biesheuvel
  2022-02-07 18:28   ` Sudeep Holla
  2022-02-11  9:56   ` Sudeep Holla
  2022-02-03  8:22 ` [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer Ard Biesheuvel
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:22 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

The SMC calling convention designates R0-R7 as input registers in
AArch32 mode, and this conflicts with the compiler's use of R7 as a
frame pointer when building in Thumb2 mode. Generally, we don't enable
the frame pointer, and GCC happily enables the -pg profiling hooks
without one. However, Clang refuses, and errors out with the message
below:

drivers/firmware/arm_scmi/smc.c:152:2: error: write to reserved register 'R7'
        arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res);
        ^
include/linux/arm-smccc.h:550:4: note: expanded from macro 'arm_smccc_1_1_invoke'
                        arm_smccc_1_1_smc(__VA_ARGS__);                 \
                        ^
Let's just disable ftrace for the compilation unit when building this
configuration.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/firmware/arm_scmi/Makefile | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/firmware/arm_scmi/Makefile b/drivers/firmware/arm_scmi/Makefile
index 1dcf123d64ab..52b9078bfe96 100644
--- a/drivers/firmware/arm_scmi/Makefile
+++ b/drivers/firmware/arm_scmi/Makefile
@@ -11,3 +11,10 @@ scmi-module-objs := $(scmi-bus-y) $(scmi-driver-y) $(scmi-protocols-y) \
 		    $(scmi-transport-y)
 obj-$(CONFIG_ARM_SCMI_PROTOCOL) += scmi-module.o
 obj-$(CONFIG_ARM_SCMI_POWER_DOMAIN) += scmi_pm_domain.o
+
+ifeq ($(CONFIG_THUMB2_KERNEL)$(CONFIG_CC_IS_CLANG),yy)
+# The use of R7 in the SMCCC conflicts with the compiler's use of R7 as a frame
+# pointer in Thumb2 mode, which is forcibly enabled by Clang when profiling
+# hooks are inserted via the -pg switch.
+CFLAGS_REMOVE_smc.o += $(CC_FLAGS_FTRACE)
+endif
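
(The ifeq line above uses a common Make idiom: each CONFIG_ symbol expands to "y" when set and to the empty string when unset, so the concatenation equals "yy" only when both are enabled. A small Python model of that predicate — a sketch of the condition, not Makefile code:

```python
def smc_ftrace_removed(config: dict) -> bool:
    """Model of: ifeq ($(CONFIG_THUMB2_KERNEL)$(CONFIG_CC_IS_CLANG),yy)

    Each symbol expands to "y" when set and to "" when unset, so the
    concatenated string is "yy" only when both are enabled.
    """
    return (config.get("CONFIG_THUMB2_KERNEL", "") +
            config.get("CONFIG_CC_IS_CLANG", "")) == "yy"
```

So CC_FLAGS_FTRACE is stripped from smc.o only for Clang Thumb2 builds, and all other configurations keep ftrace instrumentation.)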
-- 
2.30.2



* [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2022-02-03  8:22 ` [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang " Ard Biesheuvel
@ 2022-02-03  8:22 ` Ard Biesheuvel
  2022-02-07 19:12   ` Nick Desaulniers
  2022-02-03  8:22 ` [PATCH v3 12/13] ARM: mach-bcm: disable ftrace in SMC invocation routines Ard Biesheuvel
  2022-02-03  8:22 ` [PATCH v3 13/13] Revert "ARM: 9144/1: forbid ftrace with clang and thumb2_kernel" Ard Biesheuvel
  12 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:22 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

Thumb2 uses R7 rather than R11 as the frame pointer, and even though we
rarely use a frame pointer to begin with when building in Thumb2 mode,
there are cases where the compiler requires it (e.g., Clang when
inserting profiling hooks via -pg).

However, preserving and restoring the frame pointer is risky, as any
unhandled exceptions raised in the meantime will produce a bogus
backtrace, and it would be better not to touch the frame pointer at all.
This is the case even when CONFIG_FRAME_POINTER is not set, as the
unwind directive used by the unwinder may also use R7 or R11 as the
unwind anchor, even if the frame pointer is not managed strictly
according to the frame pointer ABI.

So let's tweak the cacheflush asm code not to clobber R7 or R11 at all,
so that we can drop R7 from the clobber lists of the inline asm blocks
that call these routines, and remove the code that preserves/restores
R11.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/cacheflush.h  | 12 ++----
 arch/arm/mach-exynos/mcpm-exynos.c |  6 +--
 arch/arm/mm/cache-v7.S             | 40 +++++++++-----------
 3 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index e68fb879e4f9..d27782331556 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -446,15 +446,10 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
  *   however some exceptions may exist.  Caveat emptor.
  *
  * - The clobber list is dictated by the call to v7_flush_dcache_*.
- *   fp is preserved to the stack explicitly prior disabling the cache
- *   since adding it to the clobber list is incompatible with having
- *   CONFIG_FRAME_POINTER=y.  ip is saved as well if ever r12-clobbering
- *   trampoline are inserted by the linker and to keep sp 64-bit aligned.
  */
 #define v7_exit_coherency_flush(level) \
 	asm volatile( \
 	".arch	armv7-a \n\t" \
-	"stmfd	sp!, {fp, ip} \n\t" \
 	"mrc	p15, 0, r0, c1, c0, 0	@ get SCTLR \n\t" \
 	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
 	"mcr	p15, 0, r0, c1, c0, 0	@ set SCTLR \n\t" \
@@ -464,10 +459,9 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
 	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
 	"mcr	p15, 0, r0, c1, c0, 1	@ set ACTLR \n\t" \
 	"isb	\n\t" \
-	"dsb	\n\t" \
-	"ldmfd	sp!, {fp, ip}" \
-	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
-	      "r9","r10","lr","memory" )
+	"dsb" \
+	: : : "r0","r1","r2","r3","r4","r5","r6", \
+	      "r9","r10","ip","lr","memory" )
 
 void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
 			     void *kaddr, unsigned long len);
diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
index cd861c57d5ad..fd0dbeb93357 100644
--- a/arch/arm/mach-exynos/mcpm-exynos.c
+++ b/arch/arm/mach-exynos/mcpm-exynos.c
@@ -35,7 +35,6 @@ static bool secure_firmware __ro_after_init;
  */
 #define exynos_v7_exit_coherency_flush(level) \
 	asm volatile( \
-	"stmfd	sp!, {fp, ip}\n\t"\
 	"mrc	p15, 0, r0, c1, c0, 0	@ get SCTLR\n\t" \
 	"bic	r0, r0, #"__stringify(CR_C)"\n\t" \
 	"mcr	p15, 0, r0, c1, c0, 0	@ set SCTLR\n\t" \
@@ -50,11 +49,10 @@ static bool secure_firmware __ro_after_init;
 	"mcr	p15, 0, r0, c1, c0, 1	@ set ACTLR\n\t" \
 	"isb\n\t" \
 	"dsb\n\t" \
-	"ldmfd	sp!, {fp, ip}" \
 	: \
 	: "Ir" (pmu_base_addr + S5P_INFORM0) \
-	: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", \
-	  "r9", "r10", "lr", "memory")
+	: "r0", "r1", "r2", "r3", "r4", "r5", "r6", \
+	  "r9", "r10", "ip", "lr", "memory")
 
 static int exynos_cpu_powerup(unsigned int cpu, unsigned int cluster)
 {
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 830bbfb26ca5..b9f55f0bc278 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -90,7 +90,7 @@ ENDPROC(v7_flush_icache_all)
  *
  *     Flush the D-cache up to the Level of Unification Inner Shareable
  *
- *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
+ *     Corrupted registers: r0-r6, r9-r10
  */
 
 ENTRY(v7_flush_dcache_louis)
@@ -117,7 +117,7 @@ ENDPROC(v7_flush_dcache_louis)
  *
  *	Flush the whole D-cache.
  *
- *	Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
+ *	Corrupted registers: r0-r6, r9-r10
  *
  *	- mm    - mm_struct describing address space
  */
@@ -149,22 +149,22 @@ flush_levels:
 	movw	r4, #0x3ff
 	ands	r4, r4, r1, lsr #3		@ find maximum number on the way size
 	clz	r5, r4				@ find bit position of way size increment
-	movw	r7, #0x7fff
-	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
+	movw	r6, #0x7fff
+	and	r1, r6, r1, lsr #13		@ extract max number of the index size
+	mov	r6, #1
+	movne	r4, r4, lsl r5			@ # of ways shifted into bits [31:...]
+	movne	r6, r6, lsl r5			@ 1 shifted left by same amount
 loop1:
-	mov	r9, r7				@ create working copy of max index
+	mov	r9, r1				@ create working copy of max index
 loop2:
- ARM(	orr	r11, r10, r4, lsl r5	)	@ factor way and cache number into r11
- THUMB(	lsl	r6, r4, r5		)
- THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
- ARM(	orr	r11, r11, r9, lsl r2	)	@ factor index number into r11
- THUMB(	lsl	r6, r9, r2		)
- THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
-	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
+	mov	r5, r9, lsl r2			@ factor way into r5
+	orr	r5, r5, r4			@ factor index number into r5
+	orr	r5, r5, r10			@ factor cache number into r5
+	mcr	p15, 0, r5, c7, c14, 2		@ clean & invalidate by set/way
 	subs	r9, r9, #1			@ decrement the index
 	bge	loop2
-	subs	r4, r4, #1			@ decrement the way
-	bge	loop1
+	subs	r4, r4, r6			@ decrement the way
+	bcs	loop1
 skip:
 	add	r10, r10, #2			@ increment cache number
 	cmp	r3, r10
@@ -192,14 +192,12 @@ ENDPROC(v7_flush_dcache_all)
  *
  */
 ENTRY(v7_flush_kern_cache_all)
- ARM(	stmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
- THUMB(	stmfd	sp!, {r4-r7, r9-r11, lr}	)
+	stmfd	sp!, {r4-r6, r9-r10, lr}
 	bl	v7_flush_dcache_all
 	mov	r0, #0
 	ALT_SMP(mcr	p15, 0, r0, c7, c1, 0)	@ invalidate I-cache inner shareable
 	ALT_UP(mcr	p15, 0, r0, c7, c5, 0)	@ I+BTB cache invalidate
- ARM(	ldmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
- THUMB(	ldmfd	sp!, {r4-r7, r9-r11, lr}	)
+	ldmfd	sp!, {r4-r6, r9-r10, lr}
 	ret	lr
 ENDPROC(v7_flush_kern_cache_all)
 
@@ -210,14 +208,12 @@ ENDPROC(v7_flush_kern_cache_all)
  *     Invalidate the I-cache to the point of unification.
  */
 ENTRY(v7_flush_kern_cache_louis)
- ARM(	stmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
- THUMB(	stmfd	sp!, {r4-r7, r9-r11, lr}	)
+	stmfd	sp!, {r4-r6, r9-r10, lr}
 	bl	v7_flush_dcache_louis
 	mov	r0, #0
 	ALT_SMP(mcr	p15, 0, r0, c7, c1, 0)	@ invalidate I-cache inner shareable
 	ALT_UP(mcr	p15, 0, r0, c7, c5, 0)	@ I+BTB cache invalidate
- ARM(	ldmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
- THUMB(	ldmfd	sp!, {r4-r7, r9-r11, lr}	)
+	ldmfd	sp!, {r4-r6, r9-r10, lr}
 	ret	lr
 ENDPROC(v7_flush_kern_cache_louis)
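
(The reworked flush_levels loop above frees up R7 and R11 by pre-shifting
the maximum way number into the top bits of R4 and decrementing it by
1 << way_shift until the subtraction borrows, instead of counting ways
from max down to 0 and shifting on every inner iteration. A hedged Python
model of the set/way words the new loop feeds to the MCR instruction —
register roles follow the comments in the patch, and the CCSIDR field
decoding and the skip path for levels without a D-cache are elided:

```python
def clz(x: int, width: int = 32) -> int:
    """Count leading zeros, as the CLZ instruction does."""
    return width - x.bit_length()

def setway_values(max_way: int, max_index: int, set_shift: int, level: int):
    """Enumerate the words written by 'mcr p15, 0, r5, c7, c14, 2'.

    max_way/max_index model the fields extracted from CCSIDR, set_shift
    models the line-size shift held in r2, and level*2 models r10.
    """
    values = []
    if max_way:
        way_shift = clz(max_way)    # clz r5, r4
        way = max_way << way_shift  # movne r4, r4, lsl r5
        step = 1 << way_shift       # movne r6, r6, lsl r5
    else:
        way, step = 0, 1            # single way: outer loop runs once
    while way >= 0:                 # subs/bcs: stop when the subtraction borrows
        for index in range(max_index, -1, -1):  # loop2: r9 counts down
            values.append((index << set_shift) | way | (level << 1))
        way -= step
    return values
```

For a 4-way cache this yields way fields 0xC0000000, 0x80000000,
0x40000000, 0 — the same set/way operations as the old loop, without
needing a scratch register pair for the per-iteration shifts.)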
 
-- 
2.30.2



* [PATCH v3 12/13] ARM: mach-bcm: disable ftrace in SMC invocation routines
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2022-02-03  8:22 ` [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer Ard Biesheuvel
@ 2022-02-03  8:22 ` Ard Biesheuvel
  2022-02-03 20:39   ` Nick Desaulniers
  2022-02-03  8:22 ` [PATCH v3 13/13] Revert "ARM: 9144/1: forbid ftrace with clang and thumb2_kernel" Ard Biesheuvel
  12 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:22 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

The SMC calling convention uses R7 as an argument register, which
conflicts with its use as a frame pointer when building in Thumb2 mode.
Given that Clang with ftrace does not permit frame pointers to be
disabled, let's omit this compilation unit from ftrace instrumentation.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/mach-bcm/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-bcm/Makefile b/arch/arm/mach-bcm/Makefile
index 7baa8c9427d5..b2394ddb0558 100644
--- a/arch/arm/mach-bcm/Makefile
+++ b/arch/arm/mach-bcm/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_ARCH_BCM_MOBILE_L2_CACHE) += kona_l2_cache.o
 
 # Support for secure monitor traps
 obj-$(CONFIG_ARCH_BCM_MOBILE_SMC) += bcm_kona_smc.o
+CFLAGS_REMOVE_bcm_kona_smc.o += $(CC_FLAGS_FTRACE)
 
 # BCM2835
 ifeq ($(CONFIG_ARCH_BCM2835),y)
-- 
2.30.2



* [PATCH v3 13/13] Revert "ARM: 9144/1: forbid ftrace with clang and thumb2_kernel"
  2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2022-02-03  8:22 ` [PATCH v3 12/13] ARM: mach-bcm: disable ftrace in SMC invocation routines Ard Biesheuvel
@ 2022-02-03  8:22 ` Ard Biesheuvel
  12 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  8:22 UTC (permalink / raw)
  To: linux
  Cc: linux-arm-kernel, Ard Biesheuvel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Nick Desaulniers,
	Arnd Bergmann, Linus Walleij, Masami Hiramatsu

This reverts commit ecb108e3e3f7c692082b7c6fce41779c3835854a.

Clang + Thumb2 with ftrace is now supported.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 3c6216f194f2..4ad7080bc17c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -93,7 +93,7 @@ config ARM
 	select HAVE_FAST_GUP if ARM_LPAE
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !(THUMB2_KERNEL && CC_IS_CLANG)
+	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_FUTEX_CMPXCHG if FUTEX
 	select HAVE_GCC_PLUGINS
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
-- 
2.30.2



* Re: [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder
  2022-02-03  8:21 ` [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder Ard Biesheuvel
@ 2022-02-03  9:16   ` Arnd Bergmann
  2022-02-03  9:41     ` Ard Biesheuvel
  2022-02-03 16:09     ` Nathan Chancellor
  0 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2022-02-03  9:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King - ARM Linux, Linux ARM, Steven Rostedt,
	Sudeep Holla, Cristian Marussi, Nathan Chancellor,
	Nick Desaulniers, Arnd Bergmann, Linus Walleij, Masami Hiramatsu

On Thu, Feb 3, 2022 at 9:21 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Enable the function graph tracer in combination with the EABI unwinder,
> so that Thumb2 builds or Clang ARM builds can make use of it.
>
> This involves using the unwinder to locate the return address of an
> instrumented function on the stack, so that it can be overridden and
> made to refer to the ftrace handling routines that need to be called at
> function return.
>
> Given that for these builds, it is not guaranteed that the value of the
> link register is stored on the stack, fall back to the stack slot that
> will be used by the ftrace exit code to restore LR in the instrumented
> function's execution context.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

> diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> index 07055a503022..cc7523f44be4 100644
> --- a/arch/arm/Kconfig.debug
> +++ b/arch/arm/Kconfig.debug
> @@ -65,7 +65,7 @@ config UNWINDER_FRAME_POINTER
>
>  config UNWINDER_ARM
>         bool "ARM EABI stack unwinder"
> -       depends on AEABI && !FUNCTION_GRAPH_TRACER
> +       depends on AEABI
>         # https://github.com/ClangBuiltLinux/linux/issues/732
>         depends on !LD_IS_LLD || LLD_VERSION >= 110000
>         select ARM_UNWIND

What are the remaining tradeoffs between the two unwinders
on EABI kernels? Since we require llvm-11 as the minimum
toolchain now, we can probably drop the LLD_VERSION
check here as well, so UNWINDER_ARM should finally work
for any EABI kernel.

According to the help text, the main downside is a small size
increase that I've measured at under 1% in multi_v7_defconfig,
however that seems to be far outweighed by allowing thumb2
kernels that reduce the size by 15% in the same config.

Are there any other reasons for keeping the frame pointer
unwinder as a normal option for ARMv7?

        Arnd


* Re: [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder
  2022-02-03  9:16   ` Arnd Bergmann
@ 2022-02-03  9:41     ` Ard Biesheuvel
  2022-02-03 16:09     ` Nathan Chancellor
  1 sibling, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03  9:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King - ARM Linux, Linux ARM, Steven Rostedt,
	Sudeep Holla, Cristian Marussi, Nathan Chancellor,
	Nick Desaulniers, Linus Walleij, Masami Hiramatsu

On Thu, 3 Feb 2022 at 10:17, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Feb 3, 2022 at 9:21 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Enable the function graph tracer in combination with the EABI unwinder,
> > so that Thumb2 builds or Clang ARM builds can make use of it.
> >
> > This involves using the unwinder to locate the return address of an
> > instrumented function on the stack, so that it can be overridden and
> > made to refer to the ftrace handling routines that need to be called at
> > function return.
> >
> > Given that for these builds, it is not guaranteed that the value of the
> > link register is stored on the stack, fall back to the stack slot that
> > will be used by the ftrace exit code to restore LR in the instrumented
> > function's execution context.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>
> > diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> > index 07055a503022..cc7523f44be4 100644
> > --- a/arch/arm/Kconfig.debug
> > +++ b/arch/arm/Kconfig.debug
> > @@ -65,7 +65,7 @@ config UNWINDER_FRAME_POINTER
> >
> >  config UNWINDER_ARM
> >         bool "ARM EABI stack unwinder"
> > -       depends on AEABI && !FUNCTION_GRAPH_TRACER
> > +       depends on AEABI
> >         # https://github.com/ClangBuiltLinux/linux/issues/732
> >         depends on !LD_IS_LLD || LLD_VERSION >= 110000
> >         select ARM_UNWIND
>
> What are the remaining tradeoffs between the two unwinders
> on EABI kernels?

The frame pointer based stack tracer dumps the values of the registers
at each level, by decoding the instructions that push them onto the
stack in the prologue. And the frame pointer unwinder is generally a
bit more robust.

> Since we require llvm-11 as the minimum
> toolchain now, we can probably drop the LLD_VERSION
> check here as well, so UNWINDER_ARM should finally work
> for any EABI kernel.
>

Indeed.

> According to the help text, the main downside is a small size
> increase that I've measured at under 1% in multi_v7_defconfig,
> however that seems to be far outweighed by allowing thumb2
> kernels that reduce the size by 15% in the same config.
>

Yeah the EABI format unwind info is a bit more compact than DWARF, so
the size is manageable. I saw a ~30% reduction in .text size, so 15%
for the core kernel is probably accurate, but it is likely a bit
better for modules (modulo the rounding to page size, of course).

> Are there any other reasons for keeping the frame pointer
> unwinder as a normal option for ARMv7?
>

Just the points mentioned above


* Re: [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder
  2022-02-03  9:16   ` Arnd Bergmann
  2022-02-03  9:41     ` Ard Biesheuvel
@ 2022-02-03 16:09     ` Nathan Chancellor
  2022-02-03 16:11       ` Ard Biesheuvel
  1 sibling, 1 reply; 27+ messages in thread
From: Nathan Chancellor @ 2022-02-03 16:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ard Biesheuvel, Russell King - ARM Linux, Linux ARM,
	Steven Rostedt, Sudeep Holla, Cristian Marussi, Nick Desaulniers,
	Linus Walleij, Masami Hiramatsu

On Thu, Feb 03, 2022 at 10:16:57AM +0100, Arnd Bergmann wrote:
> On Thu, Feb 3, 2022 at 9:21 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Enable the function graph tracer in combination with the EABI unwinder,
> > so that Thumb2 builds or Clang ARM builds can make use of it.
> >
> > This involves using the unwinder to locate the return address of an
> > instrumented function on the stack, so that it can be overridden and
> > made to refer to the ftrace handling routines that need to be called at
> > function return.
> >
> > Given that for these builds, it is not guaranteed that the value of the
> > link register is stored on the stack, fall back to the stack slot that
> > will be used by the ftrace exit code to restore LR in the instrumented
> > function's execution context.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> > diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> > index 07055a503022..cc7523f44be4 100644
> > --- a/arch/arm/Kconfig.debug
> > +++ b/arch/arm/Kconfig.debug
> > @@ -65,7 +65,7 @@ config UNWINDER_FRAME_POINTER
> >
> >  config UNWINDER_ARM
> >         bool "ARM EABI stack unwinder"
> > -       depends on AEABI && !FUNCTION_GRAPH_TRACER
> > +       depends on AEABI
> >         # https://github.com/ClangBuiltLinux/linux/issues/732
> >         depends on !LD_IS_LLD || LLD_VERSION >= 110000
> >         select ARM_UNWIND
> 
> What are the remaining tradeoffs between the two unwinders
> on EABI kernels? Since we require llvm-11 as the minimum
> toolchain now, we can probably drop the LLD_VERSION
> check here as well, so UNWINDER_ARM should finally work
> for any EABI kernel.

For what it's worth, that check was eliminated in the same series as
bumping the minimum version: e1ab4182ca11 ("Revert "ARM: 9070/1: Make
UNWINDER_ARM depend on ld.bfd or ld.lld 11.0.0+"")

Cheers,
Nathan


* Re: [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder
  2022-02-03 16:09     ` Nathan Chancellor
@ 2022-02-03 16:11       ` Ard Biesheuvel
  0 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-03 16:11 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Arnd Bergmann, Russell King - ARM Linux, Linux ARM,
	Steven Rostedt, Sudeep Holla, Cristian Marussi, Nick Desaulniers,
	Linus Walleij, Masami Hiramatsu

On Thu, 3 Feb 2022 at 17:09, Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Thu, Feb 03, 2022 at 10:16:57AM +0100, Arnd Bergmann wrote:
> > On Thu, Feb 3, 2022 at 9:21 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Enable the function graph tracer in combination with the EABI unwinder,
> > > so that Thumb2 builds or Clang ARM builds can make use of it.
> > >
> > > This involves using the unwinder to locate the return address of an
> > > instrumented function on the stack, so that it can be overridden and
> > > made to refer to the ftrace handling routines that need to be called at
> > > function return.
> > >
> > > Given that for these builds, it is not guaranteed that the value of the
> > > link register is stored on the stack, fall back to the stack slot that
> > > will be used by the ftrace exit code to restore LR in the instrumented
> > > function's execution context.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> >
> > > diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> > > index 07055a503022..cc7523f44be4 100644
> > > --- a/arch/arm/Kconfig.debug
> > > +++ b/arch/arm/Kconfig.debug
> > > @@ -65,7 +65,7 @@ config UNWINDER_FRAME_POINTER
> > >
> > >  config UNWINDER_ARM
> > >         bool "ARM EABI stack unwinder"
> > > -       depends on AEABI && !FUNCTION_GRAPH_TRACER
> > > +       depends on AEABI
> > >         # https://github.com/ClangBuiltLinux/linux/issues/732
> > >         depends on !LD_IS_LLD || LLD_VERSION >= 110000
> > >         select ARM_UNWIND
> >
> > What are the remaining tradeoffs between the two unwinders
> > on EABI kernels? Since we require llvm-11 as the minimum
> > toolchain now, we can probably drop the LLD_VERSION
> > check here as well, so UNWINDER_ARM should finally work
> > for any EABI kernel.
>
> For what it's worth, that check was eliminated in the same series as
> bumping the minimum version: e1ab4182ca11 ("Revert "ARM: 9070/1: Make
> UNWINDER_ARM depend on ld.bfd or ld.lld 11.0.0+"")
>

OK. This series is currently based on Russell's tree, which does not
include v5.17-rc1 yet.


* Re: [PATCH v3 12/13] ARM: mach-bcm: disable ftrace in SMC invocation routines
  2022-02-03  8:22 ` [PATCH v3 12/13] ARM: mach-bcm: disable ftrace in SMC invocation routines Ard Biesheuvel
@ 2022-02-03 20:39   ` Nick Desaulniers
  0 siblings, 0 replies; 27+ messages in thread
From: Nick Desaulniers @ 2022-02-03 20:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux, linux-arm-kernel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Thu, Feb 3, 2022 at 12:22 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> The SMC calling convention uses R7 as an argument register, which
> conflicts with its use as a frame pointer when building in Thumb2 mode.
> Given that Clang with ftrace does not permit frame pointers to be
> enabled, let's omit this compilation unit from ftrace instrumentation.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Acked-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/arm/mach-bcm/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm/mach-bcm/Makefile b/arch/arm/mach-bcm/Makefile
> index 7baa8c9427d5..b2394ddb0558 100644
> --- a/arch/arm/mach-bcm/Makefile
> +++ b/arch/arm/mach-bcm/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_ARCH_BCM_MOBILE_L2_CACHE) += kona_l2_cache.o
>
>  # Support for secure monitor traps
>  obj-$(CONFIG_ARCH_BCM_MOBILE_SMC) += bcm_kona_smc.o
> +CFLAGS_REMOVE_bcm_kona_smc.o += $(CC_FLAGS_FTRACE)
>
>  # BCM2835
>  ifeq ($(CONFIG_ARCH_BCM2835),y)
> --
> 2.30.2
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v3 07/13] ARM: unwind: track location of LR value in stack frame
  2022-02-03  8:21 ` [PATCH v3 07/13] ARM: unwind: track location of LR value in stack frame Ard Biesheuvel
@ 2022-02-07 18:14   ` Nick Desaulniers
  0 siblings, 0 replies; 27+ messages in thread
From: Nick Desaulniers @ 2022-02-07 18:14 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux, linux-arm-kernel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Thu, Feb 3, 2022 at 12:22 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> The ftrace graph tracer needs to override the return address of an
> instrumented function, in order to install a hook that gets invoked when
> the function returns again.
>
> Currently, we only support this when building for ARM using GCC with
> frame pointers, as in this case, it is guaranteed that the function will
> reload LR from [FP, #-4] in all cases, and we can simply pass that
> address to the ftrace code.
>
> In order to support this for configurations that rely on the EABI
> unwinder, such as Thumb2 builds, make the unwinder keep track of the
> address from which LR was unwound, permitting ftrace to make use of this
> in a subsequent patch.
>
> Drop the call to is_kernel_text_address(), which is problematic in terms
> of ftrace recursion, given that it may be instrumented itself. The call
> is redundant anyway, as no unwind directives will be found unless the PC
> points to memory that is known to contain executable code.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


Looks like patch 8/13 consumes lr_addr.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/arm/include/asm/stacktrace.h | 3 +++
>  arch/arm/kernel/Makefile          | 1 +
>  arch/arm/kernel/unwind.c          | 7 ++++---
>  3 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
> index d87d60532b86..e56503fd9447 100644
> --- a/arch/arm/include/asm/stacktrace.h
> +++ b/arch/arm/include/asm/stacktrace.h
> @@ -14,6 +14,9 @@ struct stackframe {
>         unsigned long sp;
>         unsigned long lr;
>         unsigned long pc;
> +
> +       /* address of the LR value on the stack */
> +       unsigned long *lr_addr;
>  #ifdef CONFIG_KRETPROBES
>         struct llist_node *kr_cur;
>         struct task_struct *tsk;
> diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
> index ae295a3bcfef..56511856ff9d 100644
> --- a/arch/arm/kernel/Makefile
> +++ b/arch/arm/kernel/Makefile
> @@ -10,6 +10,7 @@ ifdef CONFIG_FUNCTION_TRACER
>  CFLAGS_REMOVE_ftrace.o = -pg
>  CFLAGS_REMOVE_insn.o = -pg
>  CFLAGS_REMOVE_patch.o = -pg
> +CFLAGS_REMOVE_unwind.o = -pg
>  endif
>
>  CFLAGS_REMOVE_return_address.o = -pg
> diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
> index c5ea328c428d..b4e468a7674b 100644
> --- a/arch/arm/kernel/unwind.c
> +++ b/arch/arm/kernel/unwind.c
> @@ -55,6 +55,7 @@ struct unwind_ctrl_block {
>         const unsigned long *insn;      /* pointer to the current instructions word */
>         unsigned long sp_low;           /* lowest value of sp allowed */
>         unsigned long sp_high;          /* highest value of sp allowed */
> +       unsigned long *lr_addr;         /* address of LR value on the stack */
>         /*
>          * 1 : check for stack overflow for each register pop.
>          * 0 : save overhead if there is plenty of stack remaining.
> @@ -239,6 +240,8 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
>          * from being tracked by KASAN.
>          */
>         ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> +       if (reg == 14)
> +               ctrl->lr_addr = *vsp;
>         (*vsp)++;
>         return URC_OK;
>  }
> @@ -395,9 +398,6 @@ int unwind_frame(struct stackframe *frame)
>         pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
>                  frame->pc, frame->lr, frame->sp);
>
> -       if (!kernel_text_address(frame->pc))
> -               return -URC_FAILURE;
> -
>         idx = unwind_find_idx(frame->pc);
>         if (!idx) {
>                 pr_warn("unwind: Index not found %08lx\n", frame->pc);
> @@ -476,6 +476,7 @@ int unwind_frame(struct stackframe *frame)
>         frame->lr = ctrl.vrs[LR];
>         frame->pc = ctrl.vrs[PC];
>         frame->sp_low = ctrl.sp_low;
> +       frame->lr_addr = ctrl.lr_addr;
>
>         return URC_OK;
>  }
> --
> 2.30.2
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang Thumb2 builds
  2022-02-03  8:22 ` [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang " Ard Biesheuvel
@ 2022-02-07 18:28   ` Sudeep Holla
  2022-02-08 21:18     ` Ard Biesheuvel
  2022-02-11  9:56   ` Sudeep Holla
  1 sibling, 1 reply; 27+ messages in thread
From: Sudeep Holla @ 2022-02-07 18:28 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux, linux-arm-kernel, Steven Rostedt, Cristian Marussi,
	Sudeep Holla, Nathan Chancellor, Nick Desaulniers, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Thu, Feb 03, 2022 at 09:22:01AM +0100, Ard Biesheuvel wrote:
> The SMC calling convention designates R0-R7 as input registers in
> AArch32 mode, and this conflicts with the compiler's use of R7 as a
> frame pointer when building in Thumb2 mode. Generally, we don't enable
> the frame pointer, and GCC happily enables the -pg profiling hooks
> without them. However, Clang refuses, and errors out with the message
> below:
> 
> drivers/firmware/arm_scmi/smc.c:152:2: error: write to reserved register 'R7'
>         arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res);
>         ^
> include/linux/arm-smccc.h:550:4: note: expanded from macro 'arm_smccc_1_1_invoke'
>                         arm_smccc_1_1_smc(__VA_ARGS__);                 \
>                         ^
> Let's just disable ftrace for the compilation unit when building this
> configuration.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Acked-by: Sudeep Holla <sudeep.holla@arm.com>

I know I already acked this, but I wanted to check whether you want to
take it as part of the series, or whether you want me to apply this
one myself. I don't see any dependency on the other patches, but I
thought I would check with you on your plans.

I am happy to take this patch on its own if you prefer that. Let me know.

-- 
Regards,
Sudeep


* Re: [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer
  2022-02-03  8:22 ` [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer Ard Biesheuvel
@ 2022-02-07 19:12   ` Nick Desaulniers
  2022-02-08 10:08     ` Ard Biesheuvel
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Desaulniers @ 2022-02-07 19:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux, linux-arm-kernel, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Thu, Feb 3, 2022 at 12:22 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Thumb2 uses R7 rather than R11 as the frame pointer, and even if we
> rarely use a frame pointer to begin with when building in Thumb2 mode,
> there are cases where it is required by the compiler (Clang when
> inserting profiling hooks via -pg)
>
> However, preserving and restoring the frame pointer is risky, as any
> unhandled exceptions raised in the mean time will produce a bogus
> backtrace, and it would be better not to touch the frame pointer at all.
> This is the case even when CONFIG_FRAME_POINTER is not set, as the
> unwind directive used by the unwinder may also use R7 or R11 as the
> unwind anchor, even if the frame pointer is not managed strictly
> according to the frame pointer ABI.
>
> So let's tweak the cacheflush asm code not to clobber R7 or R11 at all,
> so that we can drop R7 from the clobber lists of the inline asm blocks
> that call these routines, and remove the code that preserves/restores
> R11.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm/include/asm/cacheflush.h  | 12 ++----
>  arch/arm/mach-exynos/mcpm-exynos.c |  6 +--
>  arch/arm/mm/cache-v7.S             | 40 +++++++++-----------
>  3 files changed, 23 insertions(+), 35 deletions(-)
>
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index e68fb879e4f9..d27782331556 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -446,15 +446,10 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
>   *   however some exceptions may exist.  Caveat emptor.
>   *
>   * - The clobber list is dictated by the call to v7_flush_dcache_*.
> - *   fp is preserved to the stack explicitly prior disabling the cache
> - *   since adding it to the clobber list is incompatible with having
> - *   CONFIG_FRAME_POINTER=y.  ip is saved as well if ever r12-clobbering
> - *   trampoline are inserted by the linker and to keep sp 64-bit aligned.
>   */
>  #define v7_exit_coherency_flush(level) \
>         asm volatile( \
>         ".arch  armv7-a \n\t" \
> -       "stmfd  sp!, {fp, ip} \n\t" \
>         "mrc    p15, 0, r0, c1, c0, 0   @ get SCTLR \n\t" \
>         "bic    r0, r0, #"__stringify(CR_C)" \n\t" \
>         "mcr    p15, 0, r0, c1, c0, 0   @ set SCTLR \n\t" \
> @@ -464,10 +459,9 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
>         "bic    r0, r0, #(1 << 6)       @ disable local coherency \n\t" \
>         "mcr    p15, 0, r0, c1, c0, 1   @ set ACTLR \n\t" \
>         "isb    \n\t" \
> -       "dsb    \n\t" \
> -       "ldmfd  sp!, {fp, ip}" \
> -       : : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> -             "r9","r10","lr","memory" )
> +       "dsb" \
> +       : : : "r0","r1","r2","r3","r4","r5","r6", \
> +             "r9","r10","ip","lr","memory" )
>
>  void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
>                              void *kaddr, unsigned long len);
> diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
> index cd861c57d5ad..fd0dbeb93357 100644
> --- a/arch/arm/mach-exynos/mcpm-exynos.c
> +++ b/arch/arm/mach-exynos/mcpm-exynos.c
> @@ -35,7 +35,6 @@ static bool secure_firmware __ro_after_init;
>   */
>  #define exynos_v7_exit_coherency_flush(level) \
>         asm volatile( \
> -       "stmfd  sp!, {fp, ip}\n\t"\
>         "mrc    p15, 0, r0, c1, c0, 0   @ get SCTLR\n\t" \
>         "bic    r0, r0, #"__stringify(CR_C)"\n\t" \
>         "mcr    p15, 0, r0, c1, c0, 0   @ set SCTLR\n\t" \
> @@ -50,11 +49,10 @@ static bool secure_firmware __ro_after_init;
>         "mcr    p15, 0, r0, c1, c0, 1   @ set ACTLR\n\t" \
>         "isb\n\t" \
>         "dsb\n\t" \
> -       "ldmfd  sp!, {fp, ip}" \
>         : \
>         : "Ir" (pmu_base_addr + S5P_INFORM0) \
> -       : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", \
> -         "r9", "r10", "lr", "memory")
> +       : "r0", "r1", "r2", "r3", "r4", "r5", "r6", \
> +         "r9", "r10", "ip", "lr", "memory")
>
>  static int exynos_cpu_powerup(unsigned int cpu, unsigned int cluster)
>  {
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index 830bbfb26ca5..b9f55f0bc278 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -90,7 +90,7 @@ ENDPROC(v7_flush_icache_all)
>   *
>   *     Flush the D-cache up to the Level of Unification Inner Shareable
>   *
> - *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
> + *     Corrupted registers: r0-r6, r9-r10
>   */
>
>  ENTRY(v7_flush_dcache_louis)
> @@ -117,7 +117,7 @@ ENDPROC(v7_flush_dcache_louis)
>   *
>   *     Flush the whole D-cache.
>   *
> - *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
> + *     Corrupted registers: r0-r6, r9-r10
>   *
>   *     - mm    - mm_struct describing address space
>   */
> @@ -149,22 +149,22 @@ flush_levels:
>         movw    r4, #0x3ff
>         ands    r4, r4, r1, lsr #3              @ find maximum number on the way size
>         clz     r5, r4                          @ find bit position of way size increment
> -       movw    r7, #0x7fff
> -       ands    r7, r7, r1, lsr #13             @ extract max number of the index size
> +       movw    r6, #0x7fff
> +       and     r1, r6, r1, lsr #13             @ extract max number of the index size
> +       mov     r6, #1
> +       movne   r4, r4, lsl r5                  @ # of ways shifted into bits [31:...]
> +       movne   r6, r6, lsl r5                  @ 1 shifted left by same amount

This group of changes was definitely trickier to review... can you
provide more color on why the conditional `mov`s and the change in the
loop back-edge are needed? (I appreciate you explaining some on IRC;
it might be helpful to other reviewers to have more commentary on the
list.) The rest LGTM.

>  loop1:
> -       mov     r9, r7                          @ create working copy of max index
> +       mov     r9, r1                          @ create working copy of max index
>  loop2:
> - ARM(  orr     r11, r10, r4, lsl r5    )       @ factor way and cache number into r11
> - THUMB(        lsl     r6, r4, r5              )
> - THUMB(        orr     r11, r10, r6            )       @ factor way and cache number into r11
> - ARM(  orr     r11, r11, r9, lsl r2    )       @ factor index number into r11
> - THUMB(        lsl     r6, r9, r2              )
> - THUMB(        orr     r11, r11, r6            )       @ factor index number into r11
> -       mcr     p15, 0, r11, c7, c14, 2         @ clean & invalidate by set/way
> +       mov     r5, r9, lsl r2                  @ factor way into r5
> +       orr     r5, r5, r4                      @ factor index number into r5
> +       orr     r5, r5, r10                     @ factor cache number into r5
> +       mcr     p15, 0, r5, c7, c14, 2          @ clean & invalidate by set/way
>         subs    r9, r9, #1                      @ decrement the index
>         bge     loop2
> -       subs    r4, r4, #1                      @ decrement the way
> -       bge     loop1
> +       subs    r4, r4, r6                      @ decrement the way
> +       bcs     loop1
>  skip:
>         add     r10, r10, #2                    @ increment cache number
>         cmp     r3, r10
> @@ -192,14 +192,12 @@ ENDPROC(v7_flush_dcache_all)
>   *
>   */
>  ENTRY(v7_flush_kern_cache_all)
> - ARM(  stmfd   sp!, {r4-r5, r7, r9-r11, lr}    )
> - THUMB(        stmfd   sp!, {r4-r7, r9-r11, lr}        )
> +       stmfd   sp!, {r4-r6, r9-r10, lr}
>         bl      v7_flush_dcache_all
>         mov     r0, #0
>         ALT_SMP(mcr     p15, 0, r0, c7, c1, 0)  @ invalidate I-cache inner shareable
>         ALT_UP(mcr      p15, 0, r0, c7, c5, 0)  @ I+BTB cache invalidate
> - ARM(  ldmfd   sp!, {r4-r5, r7, r9-r11, lr}    )
> - THUMB(        ldmfd   sp!, {r4-r7, r9-r11, lr}        )
> +       ldmfd   sp!, {r4-r6, r9-r10, lr}
>         ret     lr
>  ENDPROC(v7_flush_kern_cache_all)
>
> @@ -210,14 +208,12 @@ ENDPROC(v7_flush_kern_cache_all)
>   *     Invalidate the I-cache to the point of unification.
>   */
>  ENTRY(v7_flush_kern_cache_louis)
> - ARM(  stmfd   sp!, {r4-r5, r7, r9-r11, lr}    )
> - THUMB(        stmfd   sp!, {r4-r7, r9-r11, lr}        )
> +       stmfd   sp!, {r4-r6, r9-r10, lr}
>         bl      v7_flush_dcache_louis
>         mov     r0, #0
>         ALT_SMP(mcr     p15, 0, r0, c7, c1, 0)  @ invalidate I-cache inner shareable
>         ALT_UP(mcr      p15, 0, r0, c7, c5, 0)  @ I+BTB cache invalidate
> - ARM(  ldmfd   sp!, {r4-r5, r7, r9-r11, lr}    )
> - THUMB(        ldmfd   sp!, {r4-r7, r9-r11, lr}        )
> +       ldmfd   sp!, {r4-r6, r9-r10, lr}
>         ret     lr
>  ENDPROC(v7_flush_kern_cache_louis)
>
> --
> 2.30.2
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer
  2022-02-07 19:12   ` Nick Desaulniers
@ 2022-02-08 10:08     ` Ard Biesheuvel
  2022-02-08 22:34       ` Nick Desaulniers
  0 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-08 10:08 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Russell King, Linux ARM, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Mon, 7 Feb 2022 at 20:12, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Thu, Feb 3, 2022 at 12:22 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Thumb2 uses R7 rather than R11 as the frame pointer, and even if we
> > rarely use a frame pointer to begin with when building in Thumb2 mode,
> > there are cases where it is required by the compiler (Clang when
> > inserting profiling hooks via -pg)
> >
> > However, preserving and restoring the frame pointer is risky, as any
> > unhandled exceptions raised in the mean time will produce a bogus
> > backtrace, and it would be better not to touch the frame pointer at all.
> > This is the case even when CONFIG_FRAME_POINTER is not set, as the
> > unwind directive used by the unwinder may also use R7 or R11 as the
> > unwind anchor, even if the frame pointer is not managed strictly
> > according to the frame pointer ABI.
> >
> > So let's tweak the cacheflush asm code not to clobber R7 or R11 at all,
> > so that we can drop R7 from the clobber lists of the inline asm blocks
> > that call these routines, and remove the code that preserves/restores
> > R11.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm/include/asm/cacheflush.h  | 12 ++----
> >  arch/arm/mach-exynos/mcpm-exynos.c |  6 +--
> >  arch/arm/mm/cache-v7.S             | 40 +++++++++-----------
> >  3 files changed, 23 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > index e68fb879e4f9..d27782331556 100644
> > --- a/arch/arm/include/asm/cacheflush.h
> > +++ b/arch/arm/include/asm/cacheflush.h
> > @@ -446,15 +446,10 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> >   *   however some exceptions may exist.  Caveat emptor.
> >   *
> >   * - The clobber list is dictated by the call to v7_flush_dcache_*.
> > - *   fp is preserved to the stack explicitly prior disabling the cache
> > - *   since adding it to the clobber list is incompatible with having
> > - *   CONFIG_FRAME_POINTER=y.  ip is saved as well if ever r12-clobbering
> > - *   trampoline are inserted by the linker and to keep sp 64-bit aligned.
> >   */
> >  #define v7_exit_coherency_flush(level) \
> >         asm volatile( \
> >         ".arch  armv7-a \n\t" \
> > -       "stmfd  sp!, {fp, ip} \n\t" \
> >         "mrc    p15, 0, r0, c1, c0, 0   @ get SCTLR \n\t" \
> >         "bic    r0, r0, #"__stringify(CR_C)" \n\t" \
> >         "mcr    p15, 0, r0, c1, c0, 0   @ set SCTLR \n\t" \
> > @@ -464,10 +459,9 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> >         "bic    r0, r0, #(1 << 6)       @ disable local coherency \n\t" \
> >         "mcr    p15, 0, r0, c1, c0, 1   @ set ACTLR \n\t" \
> >         "isb    \n\t" \
> > -       "dsb    \n\t" \
> > -       "ldmfd  sp!, {fp, ip}" \
> > -       : : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> > -             "r9","r10","lr","memory" )
> > +       "dsb" \
> > +       : : : "r0","r1","r2","r3","r4","r5","r6", \
> > +             "r9","r10","ip","lr","memory" )
> >
> >  void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
> >                              void *kaddr, unsigned long len);
> > diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
> > index cd861c57d5ad..fd0dbeb93357 100644
> > --- a/arch/arm/mach-exynos/mcpm-exynos.c
> > +++ b/arch/arm/mach-exynos/mcpm-exynos.c
> > @@ -35,7 +35,6 @@ static bool secure_firmware __ro_after_init;
> >   */
> >  #define exynos_v7_exit_coherency_flush(level) \
> >         asm volatile( \
> > -       "stmfd  sp!, {fp, ip}\n\t"\
> >         "mrc    p15, 0, r0, c1, c0, 0   @ get SCTLR\n\t" \
> >         "bic    r0, r0, #"__stringify(CR_C)"\n\t" \
> >         "mcr    p15, 0, r0, c1, c0, 0   @ set SCTLR\n\t" \
> > @@ -50,11 +49,10 @@ static bool secure_firmware __ro_after_init;
> >         "mcr    p15, 0, r0, c1, c0, 1   @ set ACTLR\n\t" \
> >         "isb\n\t" \
> >         "dsb\n\t" \
> > -       "ldmfd  sp!, {fp, ip}" \
> >         : \
> >         : "Ir" (pmu_base_addr + S5P_INFORM0) \
> > -       : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", \
> > -         "r9", "r10", "lr", "memory")
> > +       : "r0", "r1", "r2", "r3", "r4", "r5", "r6", \
> > +         "r9", "r10", "ip", "lr", "memory")
> >
> >  static int exynos_cpu_powerup(unsigned int cpu, unsigned int cluster)
> >  {
> > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > index 830bbfb26ca5..b9f55f0bc278 100644
> > --- a/arch/arm/mm/cache-v7.S
> > +++ b/arch/arm/mm/cache-v7.S
> > @@ -90,7 +90,7 @@ ENDPROC(v7_flush_icache_all)
> >   *
> >   *     Flush the D-cache up to the Level of Unification Inner Shareable
> >   *
> > - *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
> > + *     Corrupted registers: r0-r6, r9-r10
> >   */
> >
> >  ENTRY(v7_flush_dcache_louis)
> > @@ -117,7 +117,7 @@ ENDPROC(v7_flush_dcache_louis)
> >   *
> >   *     Flush the whole D-cache.
> >   *
> > - *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
> > + *     Corrupted registers: r0-r6, r9-r10
> >   *
> >   *     - mm    - mm_struct describing address space
> >   */
> > @@ -149,22 +149,22 @@ flush_levels:
> >         movw    r4, #0x3ff
> >         ands    r4, r4, r1, lsr #3              @ find maximum number on the way size
> >         clz     r5, r4                          @ find bit position of way size increment
> > -       movw    r7, #0x7fff
> > -       ands    r7, r7, r1, lsr #13             @ extract max number of the index size
> > +       movw    r6, #0x7fff
> > +       and     r1, r6, r1, lsr #13             @ extract max number of the index size
> > +       mov     r6, #1
> > +       movne   r4, r4, lsl r5                  @ # of ways shifted into bits [31:...]
> > +       movne   r6, r6, lsl r5                  @ 1 shifted left by same amount
>
> This group of changes was definitely trickier to review...can you
> provide more color as to why the conditional `mov`s and the change in
> the loop back-edge? (I appreciate you explaining some on IRC; it might
> be helpful to other reviewers to have more commentary on list) The
> rest LGTM.
>

The primary goal is to use fewer registers, so that neither R7 nor R11 is clobbered.

The first step is to reuse R1 in the loop: its value is not used after
the number of sets and ways has been derived from it, which allows us
to avoid using R7.

The second step is to rewrite the inner loop so that we use R6 on ARM
as well as Thumb2 as a temporary.

Currently, R6 is used on Thumb2 only, because Thumb2 does not support
a register-specified shift applied to the second operand. So ARM has

orr r11, r10, r4, lsl r5
orr r11, r11, r9, lsl r2
mcr p15, 0, r11, c7, c14, 2

whereas Thumb2 has

lsl r6, r4, r5
orr r11, r10, r6
lsl r6, r9, r2
orr r11, r11, r6
mcr p15, 0, r11, c7, c14, 2

Note that the value of R4 << R5 only changes in the outer loop, so if
we precompute it, we can turn this into the following for both ARM and
Thumb2:

mov r6, r9, lsl r2
orr r6, r6, r4  <-- already shifted left by r5
orr r6, r6, r10
mcr p15, 0, r6, c7, c14, 2

which no longer needs R11 as a temporary.

Since R4 now carries the left-shifted value, we need to subtract 1
shifted left by the same amount after every iteration of the outer
loop, and terminate once the subtraction triggers a borrow (!carry on
ARM), so we continue as long as the carry remains set. Note that the
existing BGE instruction relies on signed arithmetic, which is
unsuitable as bit 31 is no longer a sign bit.

To handle a corner case where the cache geometry is reported as one
cache and one way (793acf870ea3), we use MOVNE to skip the left shift
at the start when R4 == 0x0, and to force the outer loop decrement to
be 0x1, as otherwise the loop would never terminate.


* Re: [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang Thumb2 builds
  2022-02-07 18:28   ` Sudeep Holla
@ 2022-02-08 21:18     ` Ard Biesheuvel
  0 siblings, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2022-02-08 21:18 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Russell King, Linux ARM, Steven Rostedt, Cristian Marussi,
	Nathan Chancellor, Nick Desaulniers, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Mon, 7 Feb 2022 at 19:29, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Thu, Feb 03, 2022 at 09:22:01AM +0100, Ard Biesheuvel wrote:
> > The SMC calling convention designates R0-R7 as input registers in
> > AArch32 mode, and this conflicts with the compiler's use of R7 as a
> > frame pointer when building in Thumb2 mode. Generally, we don't enable
> > the frame pointer, and GCC happily enables the -pg profiling hooks
> > without them. However, Clang refuses, and errors out with the message
> > below:
> >
> > drivers/firmware/arm_scmi/smc.c:152:2: error: write to reserved register 'R7'
> >         arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res);
> >         ^
> > include/linux/arm-smccc.h:550:4: note: expanded from macro 'arm_smccc_1_1_invoke'
> >                         arm_smccc_1_1_smc(__VA_ARGS__);                 \
> >                         ^
> > Let's just disable ftrace for the compilation unit when building this
> > configuration.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > Acked-by: Sudeep Holla <sudeep.holla@arm.com>
>
> I know I already acked, but wanted to check if you want to take this as series
> or you want me to apply this one myself. I don't see any dependency on other
> patches but thought I will check with you and your plans.
>
> I am happy to take this patch alone if you prefer that. Let me know.
>

Yes, please.


* Re: [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer
  2022-02-08 10:08     ` Ard Biesheuvel
@ 2022-02-08 22:34       ` Nick Desaulniers
  2022-02-08 22:39         ` Nick Desaulniers
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Desaulniers @ 2022-02-08 22:34 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King, Linux ARM, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Tue, Feb 8, 2022 at 2:08 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 7 Feb 2022 at 20:12, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Thu, Feb 3, 2022 at 12:22 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > @@ -149,22 +149,22 @@ flush_levels:
> > >         movw    r4, #0x3ff
> > >         ands    r4, r4, r1, lsr #3              @ find maximum number on the way size
> > >         clz     r5, r4                          @ find bit position of way size increment
> > > -       movw    r7, #0x7fff
> > > -       ands    r7, r7, r1, lsr #13             @ extract max number of the index size
> > > +       movw    r6, #0x7fff
> > > +       and     r1, r6, r1, lsr #13             @ extract max number of the index size
> > > +       mov     r6, #1
> > > +       movne   r4, r4, lsl r5                  @ # of ways shifted into bits [31:...]
> > > +       movne   r6, r6, lsl r5                  @ 1 shifted left by same amount
> >
> > This group of changes was definitely trickier to review...can you
> > provide more color as to why the conditional `mov`s and the change in
> > the loop back-edge? (I appreciate you explaining some on IRC; it might
> > be helpful to other reviewers to have more commentary on list) The
> > rest LGTM.
> >
>
> So the primary goal is to use fewer registers, to avoid clobbering R7 and R11.
>
> So the first step is to reuse R1 in the loop, as its value is not used
> after deriving the number of sets and ways from it, allowing us to
> avoid using R7.

Yep. LGTM.

>
> The second step is to rewrite the inner loop so that we use R6 on ARM
> as well as Thumb2 as a temporary.

Yep. LGTM.

>
> Currently, R6 is used on Thumb2 only because it does not support
> variable inline shifts. So ARM has
>
> orr r11, r10, r4, lsl r5
> orr r11, r11, r9, lsl r2
> mcr p15, 0, r11, c7, c14, 2
>
> whereas Thumb2 has
>
> lsl r6, r4, r5
> orr r11, r10, r6
> lsl r6, r9, r2
> orr r11, r11, r6
> mcr p15, 0, r11, c7, c14, 2
>
> Note that the value of R4 << R5 only changes in the outer loop, so if
> we can precompute it, we can turn this into the following for both ARM
> and Thumb
>
> mov r6, r9, lsl r2
> orr r6, r6, r4  <-- already shifted left by r5
> orr r6, r6, r10
> mcr p15, 0, r6, c7, c14, 2
>
> which no longer needs R11 as a temporary.

Yep. LGTM.

>
> Since R4 now carries the left-shifted value, we need to subtract 1
> shifted left by the same amount after every iteration of the outer
> loop, and terminate once the subtraction triggers a borrow (!carry on
> ARM), so we continue as long as the carry remains set. Note that the
> existing BGE instruction relies on signed arithmetic, which is
> unsuitable as bit 31 is no longer a sign bit.

Ah, perfect. This was the part I was struggling most with.  I guess
before we had something almost like (for loop1):

int r4 = ...
do {
  r11 = r10 | (r4 << r5);
  ...
} while (--r4 >= 0);   /* subs + bge: signed, exits at -1 */

Now it's

unsigned r4 = ...
r4 = r4 << r5
unsigned r6 = 1 << r5
do {
  ...
} while (r4 >= r6 && (r4 -= r6, 1));   /* subs + bcs: carry == no borrow */

>
> To handle a corner case where the cache geometry is reported as one
> cache and one way (793acf870ea3), we use MOVNE to skip the left-shift
> at the start when R4 == 0x0, and to force the outer loop decrement to
> be 0x1, as otherwise, the loop would never terminate.

Right, so when r4 == 0x0 and r6 == 0x1, r4 = 0x0 - 0x1 == 0xFFFFFFFF. That
will set the N bit of the PSR, but because we only branch back if C is
set to 1 (which it would be if r4 >= r6), we exit loop1.
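That exit condition can be checked with a tiny C model of SUBS; the
helper name is hypothetical, for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model SUBS r4, r4, r6: store the wrapped result and return the
 * resulting carry flag, which on ARM is set iff the unsigned
 * subtraction does not borrow. BCS loops back while this is true.
 * (Hypothetical helper, not from the kernel source.) */
static bool subs_carry(uint32_t r4, uint32_t r6, uint32_t *result)
{
    *result = r4 - r6;     /* wraps modulo 2^32, like the CPU */
    return r4 >= r6;       /* C flag: no borrow occurred */
}
```

With r4 == 0x0 and r6 == 0x1 this produces 0xFFFFFFFF with carry clear,
so the BCS falls through and loop1 exits.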

(I found this fun little ARM simulator:
http://www.csbio.unc.edu/mcmillan/miniARM.html; it definitely helped
to observe the PSR through the loop!)

Thanks again for taking the time to explain it; I appreciate it!

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer
  2022-02-08 22:34       ` Nick Desaulniers
@ 2022-02-08 22:39         ` Nick Desaulniers
  0 siblings, 0 replies; 27+ messages in thread
From: Nick Desaulniers @ 2022-02-08 22:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King, Linux ARM, Steven Rostedt, Sudeep Holla,
	Cristian Marussi, Nathan Chancellor, Arnd Bergmann,
	Linus Walleij, Masami Hiramatsu

On Tue, Feb 8, 2022 at 2:34 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> Thanks again for taking the time to explain it; I appreciate it!
>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

Also, I thought this patch was giving me deja-vu:
https://github.com/ClangBuiltLinux/linux/issues/701
https://github.com/ClangBuiltLinux/linux/issues/701#issuecomment-591323758
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang Thumb2 builds
  2022-02-03  8:22 ` [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang " Ard Biesheuvel
  2022-02-07 18:28   ` Sudeep Holla
@ 2022-02-11  9:56   ` Sudeep Holla
  1 sibling, 0 replies; 27+ messages in thread
From: Sudeep Holla @ 2022-02-11  9:56 UTC (permalink / raw)
  To: linux, Ard Biesheuvel
  Cc: Sudeep Holla, Steven Rostedt, Cristian Marussi, Arnd Bergmann,
	Nick Desaulniers, Linus Walleij, Nathan Chancellor,
	Masami Hiramatsu, linux-arm-kernel

On Thu, Feb 03, 2022 at 09:22:01AM +0100, Ard Biesheuvel wrote:
> The SMC calling convention designates R0-R7 as input registers in
> AArch32 mode, and this conflicts with the compiler's use of R7 as a
> frame pointer when building in Thumb2 mode. Generally, we don't enable
> the frame pointer, and GCC happily enables the -pg profiling hooks
> without them. However, Clang refuses, and errors out with the message
> below:
>
> drivers/firmware/arm_scmi/smc.c:152:2: error: write to reserved register 'R7'
>         arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res);
>         ^
> include/linux/arm-smccc.h:550:4: note: expanded from macro 'arm_smccc_1_1_invoke'
>                         arm_smccc_1_1_smc(__VA_ARGS__);                 \
>                         ^
> Let's just disable ftrace for the compilation unit when building this
> configuration.
>
> [...]

Applied to sudeep.holla/linux (for-next/scmi), thanks!

[10/13] drivers/firmware/scmi: disable ftrace for Clang Thumb2 builds
        https://git.kernel.org/sudeep.holla/linux/c/cdf157faaa

--
Regards,
Sudeep



end of thread, other threads:[~2022-02-11  9:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-03  8:21 [PATCH v3 00/13] ARM: ftrace fixes and cleanups Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 01/13] ARM: ftrace: ensure that ADR takes the Thumb bit into account Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 02/13] ARM: ftrace: use ADD not POP to counter PUSH at entry Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 03/13] ARM: ftrace: use trampolines to keep .init.text in branching range Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 04/13] ARM: ftrace: avoid redundant loads or clobbering IP Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 05/13] ARM: ftrace: avoid unnecessary literal loads Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 06/13] ARM: ftrace: enable HAVE_FUNCTION_GRAPH_FP_TEST Ard Biesheuvel
2022-02-03  8:21 ` [PATCH v3 07/13] ARM: unwind: track location of LR value in stack frame Ard Biesheuvel
2022-02-07 18:14   ` Nick Desaulniers
2022-02-03  8:21 ` [PATCH v3 08/13] ARM: ftrace: enable the graph tracer with the EABI unwinder Ard Biesheuvel
2022-02-03  9:16   ` Arnd Bergmann
2022-02-03  9:41     ` Ard Biesheuvel
2022-02-03 16:09     ` Nathan Chancellor
2022-02-03 16:11       ` Ard Biesheuvel
2022-02-03  8:22 ` [PATCH v3 09/13] ARM: kprobes: treat R7 as the frame pointer register in Thumb2 builds Ard Biesheuvel
2022-02-03  8:22 ` [PATCH v3 10/13] drivers/firmware/scmi: disable ftrace for Clang " Ard Biesheuvel
2022-02-07 18:28   ` Sudeep Holla
2022-02-08 21:18     ` Ard Biesheuvel
2022-02-11  9:56   ` Sudeep Holla
2022-02-03  8:22 ` [PATCH v3 11/13] ARM: cacheflush: avoid clobbering the frame pointer Ard Biesheuvel
2022-02-07 19:12   ` Nick Desaulniers
2022-02-08 10:08     ` Ard Biesheuvel
2022-02-08 22:34       ` Nick Desaulniers
2022-02-08 22:39         ` Nick Desaulniers
2022-02-03  8:22 ` [PATCH v3 12/13] ARM: mach-bcm: disable ftrace in SMC invocation routines Ard Biesheuvel
2022-02-03 20:39   ` Nick Desaulniers
2022-02-03  8:22 ` [PATCH v3 13/13] Revert "ARM: 9144/1: forbid ftrace with clang and thumb2_kernel" Ard Biesheuvel
