* [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs
@ 2023-03-16  8:20 Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 1/4] ARM: vfp: Pass thread_info pointer to vfp_support_entry Ard Biesheuvel
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  8:20 UTC (permalink / raw)
  To: linux-arm-kernel, linux
  Cc: Ard Biesheuvel, Frederic Weisbecker, Guenter Roeck,
	Peter Zijlstra, Linus Walleij, Arnd Bergmann

As it turns out, enabling or disabling softirqs from asm code is
trickier than it looks. Simply bumping the associated bit in
preempt_count does the trick for uninstrumented kernels, but with
lockdep or preempt debug enabled, we really need to call the C
versions, as fully replicating their behavior in asm is intractable.

So let's rework the existing code a little bit so we can interpose a
small C helper, vfp_entry(), that disables softirqs before calling the
existing vfpstate handling code in asm. Re-enabling softirqs from asm
code is more straightforward, as we can simply perform a tail call to
a C routine that is guaranteed to be callable as a function.

However, since calling these APIs from asm code is still not ideal,
let's reimplement the whole thing in C (patch #4).


Changes since v2:
- add Rbs and Tbs from Linus and Guenter (thanks!)
- correct reference to local_bh_enable() vs __local_bh_enable_ip() in
  commit log of patch #3
- add patch that reworks the asm VFP exception handling entirely so the
  bulk of it is implemented in C. This could be taken into v6.4, while
  the preceding patches are fixes for v6.3-rc

Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/ZBBYCSZUJOWBg1s8@localhost.localdomain/

Ard Biesheuvel (4):
  ARM: vfp: Pass thread_info pointer to vfp_support_entry
  ARM: vfp: Pass successful return address via register R3
  ARM: vfp: Fix broken softirq handling with instrumentation enabled
  ARM: vfp: Reimplement VFP exception entry in C code

 arch/arm/include/asm/assembler.h |  13 --
 arch/arm/kernel/entry-armv.S     |  12 ++
 arch/arm/vfp/Makefile            |   2 +-
 arch/arm/vfp/entry.S             |  39 ----
 arch/arm/vfp/vfp.h               |   1 +
 arch/arm/vfp/vfphw.S             | 198 ++------------------
 arch/arm/vfp/vfpmodule.c         | 125 +++++++++++-
 7 files changed, 142 insertions(+), 248 deletions(-)
 delete mode 100644 arch/arm/vfp/entry.S

-- 
2.39.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/4] ARM: vfp: Pass thread_info pointer to vfp_support_entry
  2023-03-16  8:20 [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Ard Biesheuvel
@ 2023-03-16  8:20 ` Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 2/4] ARM: vfp: Pass successful return address via register R3 Ard Biesheuvel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  8:20 UTC (permalink / raw)
  To: linux-arm-kernel, linux
  Cc: Ard Biesheuvel, Frederic Weisbecker, Guenter Roeck,
	Peter Zijlstra, Linus Walleij, Arnd Bergmann

Instead of dereferencing thread_info in do_vfp, pass the thread_info
pointer to vfp_support_entry via R1. That way, we only use a single
caller-saved register, which makes it easier to convert do_vfp to C code
in a subsequent patch.

Note that, unlike the CPU number, which can change due to preemption,
passing the thread_info pointer can safely be done with preemption
enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
---
 arch/arm/vfp/entry.S |  5 +----
 arch/arm/vfp/vfphw.S | 10 +++++++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/vfp/entry.S b/arch/arm/vfp/entry.S
index 9a89264cdcc0b46e..cfedc2a3dbd68f1c 100644
--- a/arch/arm/vfp/entry.S
+++ b/arch/arm/vfp/entry.S
@@ -22,15 +22,12 @@
 @  IRQs enabled.
 @
 ENTRY(do_vfp)
-	local_bh_disable r10, r4
+	mov	r1, r10
  	ldr	r4, .LCvfp
-	ldr	r11, [r10, #TI_CPU]	@ CPU number
-	add	r10, r10, #TI_VFPSTATE	@ r10 = workspace
 	ldr	pc, [r4]		@ call VFP entry point
 ENDPROC(do_vfp)
 
 ENTRY(vfp_null_entry)
-	local_bh_enable_ti r10, r4
 	ret	lr
 ENDPROC(vfp_null_entry)
 
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 26c4f61ecfa39638..6d056d810e4868c2 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -6,9 +6,9 @@
  *  Written by Deep Blue Solutions Limited.
  *
  * This code is called from the kernel's undefined instruction trap.
+ * r1 holds the thread_info pointer
  * r9 holds the return address for successful handling.
  * lr holds the return address for unrecognised instructions.
- * r10 points at the start of the private FP workspace in the thread structure
  * sp points to a struct pt_regs (as defined in include/asm/proc/ptrace.h)
  */
 #include <linux/init.h>
@@ -69,13 +69,17 @@
 @ VFP hardware support entry point.
 @
 @  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
+@  r1  = thread_info pointer
 @  r2  = PC value to resume execution after successful emulation
 @  r9  = normal "successful" return address
-@  r10 = vfp_state union
-@  r11 = CPU number
 @  lr  = unrecognised instruction return address
 @  IRQs enabled.
 ENTRY(vfp_support_entry)
+	local_bh_disable r1, r4
+
+	ldr	r11, [r1, #TI_CPU]	@ CPU number
+	add	r10, r1, #TI_VFPSTATE	@ r10 = workspace
+
 	DBGSTR3	"instr %08x pc %08x state %p", r0, r2, r10
 
 	.fpu	vfpv2
-- 
2.39.2




* [PATCH v3 2/4] ARM: vfp: Pass successful return address via register R3
  2023-03-16  8:20 [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 1/4] ARM: vfp: Pass thread_info pointer to vfp_support_entry Ard Biesheuvel
@ 2023-03-16  8:20 ` Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 3/4] ARM: vfp: Fix broken softirq handling with instrumentation enabled Ard Biesheuvel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  8:20 UTC (permalink / raw)
  To: linux-arm-kernel, linux
  Cc: Ard Biesheuvel, Frederic Weisbecker, Guenter Roeck,
	Peter Zijlstra, Linus Walleij, Arnd Bergmann

In preparation for reimplementing the do_vfp()->vfp_support_entry()
handover in C code, switch to using R3 to pass the 'success' return
address rather than R9, as the latter cannot be used for parameter
passing.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
---
 arch/arm/vfp/entry.S |  1 +
 arch/arm/vfp/vfphw.S | 14 +++++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/vfp/entry.S b/arch/arm/vfp/entry.S
index cfedc2a3dbd68f1c..6dabb47617781a5f 100644
--- a/arch/arm/vfp/entry.S
+++ b/arch/arm/vfp/entry.S
@@ -23,6 +23,7 @@
 @
 ENTRY(do_vfp)
 	mov	r1, r10
+	mov	r3, r9
  	ldr	r4, .LCvfp
 	ldr	pc, [r4]		@ call VFP entry point
 ENDPROC(do_vfp)
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 6d056d810e4868c2..60acd42e05786e95 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -7,7 +7,7 @@
  *
  * This code is called from the kernel's undefined instruction trap.
  * r1 holds the thread_info pointer
- * r9 holds the return address for successful handling.
+ * r3 holds the return address for successful handling.
  * lr holds the return address for unrecognised instructions.
  * sp points to a struct pt_regs (as defined in include/asm/proc/ptrace.h)
  */
@@ -71,7 +71,7 @@
 @  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
 @  r1  = thread_info pointer
 @  r2  = PC value to resume execution after successful emulation
-@  r9  = normal "successful" return address
+@  r3  = normal "successful" return address
 @  lr  = unrecognised instruction return address
 @  IRQs enabled.
 ENTRY(vfp_support_entry)
@@ -89,9 +89,9 @@ ENTRY(vfp_support_entry)
 	bne	look_for_VFP_exceptions	@ VFP is already enabled
 
 	DBGSTR1 "enable %x", r10
-	ldr	r3, vfp_current_hw_state_address
+	ldr	r9, vfp_current_hw_state_address
 	orr	r1, r1, #FPEXC_EN	@ user FPEXC has the enable bit set
-	ldr	r4, [r3, r11, lsl #2]	@ vfp_current_hw_state pointer
+	ldr	r4, [r9, r11, lsl #2]	@ vfp_current_hw_state pointer
 	bic	r5, r1, #FPEXC_EX	@ make sure exceptions are disabled
 	cmp	r4, r10			@ this thread owns the hw context?
 #ifndef CONFIG_SMP
@@ -150,7 +150,7 @@ vfp_reload_hw:
 #endif
 
 	DBGSTR1	"load state %p", r10
-	str	r10, [r3, r11, lsl #2]	@ update the vfp_current_hw_state pointer
+	str	r10, [r9, r11, lsl #2]	@ update the vfp_current_hw_state pointer
 					@ Load the saved state back into the VFP
 	VFPFLDMIA r10, r5		@ reload the working registers while
 					@ FPEXC is in a safe state
@@ -180,7 +180,7 @@ vfp_hw_state_valid:
 					@ always subtract 4 from the following
 					@ instruction address.
 	local_bh_enable_ti r10, r4
-	ret	r9			@ we think we have handled things
+	ret	r3			@ we think we have handled things
 
 
 look_for_VFP_exceptions:
@@ -210,7 +210,7 @@ skip:
 process_exception:
 	DBGSTR	"bounce"
 	mov	r2, sp			@ nothing stacked - regdump is at TOS
-	mov	lr, r9			@ setup for a return to the user code.
+	mov	lr, r3			@ setup for a return to the user code.
 
 	@ Now call the C code to package up the bounce to the support code
 	@   r0 holds the trigger instruction
-- 
2.39.2




* [PATCH v3 3/4] ARM: vfp: Fix broken softirq handling with instrumentation enabled
  2023-03-16  8:20 [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 1/4] ARM: vfp: Pass thread_info pointer to vfp_support_entry Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 2/4] ARM: vfp: Pass successful return address via register R3 Ard Biesheuvel
@ 2023-03-16  8:20 ` Ard Biesheuvel
  2023-03-16  8:20 ` [PATCH v3 4/4] ARM: vfp: Reimplement VFP exception entry in C code Ard Biesheuvel
  2023-03-17  0:23 ` [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Guenter Roeck
  4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  8:20 UTC (permalink / raw)
  To: linux-arm-kernel, linux
  Cc: Ard Biesheuvel, Frederic Weisbecker, Guenter Roeck,
	Peter Zijlstra, Linus Walleij, Arnd Bergmann

Commit 62b95a7b44d1 ("ARM: 9282/1: vfp: Manipulate task VFP state with
softirqs disabled") replaced the en/disable preemption calls inside the
VFP state handling code with en/disabling of soft IRQs, which is
necessary to allow kernel use of the VFP/SIMD unit when handling a soft
IRQ.

Unfortunately, when lockdep is enabled (or other instrumentation that
enables TRACE_IRQFLAGS), the disable path implemented in asm fails to
perform the lockdep and RCU related bookkeeping, resulting in spurious
warnings and other badness.

So let's rework the VFP entry code a little bit so we can make the
local_bh_disable() call from C, with whatever instrumentation happens
to have been configured. Calling local_bh_enable() can be done from
asm, as it is a simple wrapper around __local_bh_enable_ip(), which is
always a callable function.

Link: https://lore.kernel.org/all/ZBBYCSZUJOWBg1s8@localhost.localdomain/
Fixes: 62b95a7b44d1 ("ARM: 9282/1: vfp: Manipulate task VFP state with softirqs disabled")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
---
 arch/arm/include/asm/assembler.h | 13 ----------
 arch/arm/vfp/entry.S             | 11 +-------
 arch/arm/vfp/vfphw.S             | 12 ++++-----
 arch/arm/vfp/vfpmodule.c         | 27 ++++++++++++++++----
 4 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 06b48ce23e1ca245..505a306e0271a9c4 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -244,19 +244,6 @@ THUMB(	fpreg	.req	r7	)
 	.endm
 #endif
 
-	.macro	local_bh_disable, ti, tmp
-	ldr	\tmp, [\ti, #TI_PREEMPT]
-	add	\tmp, \tmp, #SOFTIRQ_DISABLE_OFFSET
-	str	\tmp, [\ti, #TI_PREEMPT]
-	.endm
-
-	.macro	local_bh_enable_ti, ti, tmp
-	get_thread_info \ti
-	ldr	\tmp, [\ti, #TI_PREEMPT]
-	sub	\tmp, \tmp, #SOFTIRQ_DISABLE_OFFSET
-	str	\tmp, [\ti, #TI_PREEMPT]
-	.endm
-
 #define USERL(l, x...)				\
 9999:	x;					\
 	.pushsection __ex_table,"a";		\
diff --git a/arch/arm/vfp/entry.S b/arch/arm/vfp/entry.S
index 6dabb47617781a5f..7483ef8bccda394c 100644
--- a/arch/arm/vfp/entry.S
+++ b/arch/arm/vfp/entry.S
@@ -24,14 +24,5 @@
 ENTRY(do_vfp)
 	mov	r1, r10
 	mov	r3, r9
- 	ldr	r4, .LCvfp
-	ldr	pc, [r4]		@ call VFP entry point
+	b	vfp_entry
 ENDPROC(do_vfp)
-
-ENTRY(vfp_null_entry)
-	ret	lr
-ENDPROC(vfp_null_entry)
-
-	.align	2
-.LCvfp:
-	.word	vfp_vector
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 60acd42e05786e95..4d8478264d82b3d2 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -75,8 +75,6 @@
 @  lr  = unrecognised instruction return address
 @  IRQs enabled.
 ENTRY(vfp_support_entry)
-	local_bh_disable r1, r4
-
 	ldr	r11, [r1, #TI_CPU]	@ CPU number
 	add	r10, r1, #TI_VFPSTATE	@ r10 = workspace
 
@@ -179,9 +177,12 @@ vfp_hw_state_valid:
 					@ else it's one 32-bit instruction, so
 					@ always subtract 4 from the following
 					@ instruction address.
-	local_bh_enable_ti r10, r4
-	ret	r3			@ we think we have handled things
 
+	mov	lr, r3			@ we think we have handled things
+local_bh_enable_and_ret:
+	adr	r0, .
+	mov	r1, #SOFTIRQ_DISABLE_OFFSET
+	b	__local_bh_enable_ip	@ tail call
 
 look_for_VFP_exceptions:
 	@ Check for synchronous or asynchronous exception
@@ -204,8 +205,7 @@ skip:
 	@ not recognised by VFP
 
 	DBGSTR	"not VFP"
-	local_bh_enable_ti r10, r4
-	ret	lr
+	b	local_bh_enable_and_ret
 
 process_exception:
 	DBGSTR	"bounce"
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 86fb0be41ae14b6c..3f88599c285a5d4b 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -32,10 +32,9 @@
 /*
  * Our undef handlers (in entry.S)
  */
-asmlinkage void vfp_support_entry(void);
-asmlinkage void vfp_null_entry(void);
+asmlinkage void vfp_support_entry(u32, void *, u32, u32);
 
-asmlinkage void (*vfp_vector)(void) = vfp_null_entry;
+static bool have_vfp __ro_after_init;
 
 /*
  * Dual-use variable.
@@ -799,7 +798,6 @@ static int __init vfp_init(void)
 	vfpsid = fmrx(FPSID);
 	barrier();
 	unregister_undef_hook(&vfp_detect_hook);
-	vfp_vector = vfp_null_entry;
 
 	pr_info("VFP support v0.3: ");
 	if (VFP_arch) {
@@ -884,7 +882,7 @@ static int __init vfp_init(void)
 				  "arm/vfp:starting", vfp_starting_cpu,
 				  vfp_dying_cpu);
 
-	vfp_vector = vfp_support_entry;
+	have_vfp = true;
 
 	thread_register_notifier(&vfp_notifier_block);
 	vfp_pm_init();
@@ -906,3 +904,22 @@ static int __init vfp_init(void)
 }
 
 core_initcall(vfp_init);
+
+/*
+ * Entered with:
+ *
+ *  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
+ *  r1  = thread_info pointer
+ *  r2  = PC value to resume execution after successful emulation
+ *  r3  = normal "successful" return address
+ *  lr  = unrecognised instruction return address
+ */
+asmlinkage void vfp_entry(u32 trigger, struct thread_info *ti, u32 resume_pc,
+			  u32 resume_return_address)
+{
+	if (unlikely(!have_vfp))
+		return;
+
+	local_bh_disable();
+	vfp_support_entry(trigger, ti, resume_pc, resume_return_address);
+}
-- 
2.39.2




* [PATCH v3 4/4] ARM: vfp: Reimplement VFP exception entry in C code
  2023-03-16  8:20 [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2023-03-16  8:20 ` [PATCH v3 3/4] ARM: vfp: Fix broken softirq handling with instrumentation enabled Ard Biesheuvel
@ 2023-03-16  8:20 ` Ard Biesheuvel
  2023-03-16  8:59   ` Linus Walleij
  2023-03-17  0:23 ` [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Guenter Roeck
  4 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  8:20 UTC (permalink / raw)
  To: linux-arm-kernel, linux
  Cc: Ard Biesheuvel, Frederic Weisbecker, Guenter Roeck,
	Peter Zijlstra, Linus Walleij, Arnd Bergmann

En/disabling softirqs from asm code turned out to be trickier than
expected, so vfp_support_entry now returns by tail calling
__local_bh_enable_ip() and passing the same arguments that a C call to
local_bh_enable() would pass. However, this is slightly hacky, as we
don't want to carry our own implementation of local_bh_enable().

So let's bite the bullet, and get rid of the asm logic in
vfp_support_entry that reasons about whether or not to save and/or
reload the VFP state, and about whether or not an FP exception is
pending, and only keep the VFP loading logic as a function that is
callable from C.

Replicate the removed logic in vfp_entry(), and use the exact same
reasoning as in the asm code. To emphasize the correspondence, retain
some of the asm comments in the C version as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/kernel/entry-armv.S |  12 ++
 arch/arm/vfp/Makefile        |   2 +-
 arch/arm/vfp/entry.S         |  28 ---
 arch/arm/vfp/vfp.h           |   1 +
 arch/arm/vfp/vfphw.S         | 204 ++------------------
 arch/arm/vfp/vfpmodule.c     | 118 +++++++++--
 6 files changed, 131 insertions(+), 234 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index c39303e5c23470e6..759404e72c63287c 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -662,6 +662,18 @@ call_fpe:
 	.word	0x00000000			@ opcode
 #endif
 
+@ VFP entry point.
+@
+@  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
+@  r9  = normal "successful" return address
+@  lr  = unrecognised instruction return address
+do_vfp:
+	mov	r1, sp				@ pass struct pt_regs via R1
+	bl	vfp_entry			@ dispatch the VFP exception
+	cmp	r0, #0				@ handled successfully?
+	reteq	r9				@ then use R9 as return address
+	ret	lr
+
 do_fpe:
 	add	r10, r10, #TI_FPSTATE		@ r10 = workspace
 	ldr_va	pc, fp_enter, tmp=r4		@ Call FP module USR entry point
diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 749901a72d6dc6c4..dfd64bc2b2fbdd06 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,4 +8,4 @@
 # ccflags-y := -DDEBUG
 # asflags-y := -DDEBUG
 
-obj-y		+= vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
+obj-y		+= vfpmodule.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/entry.S b/arch/arm/vfp/entry.S
deleted file mode 100644
index 7483ef8bccda394c..0000000000000000
--- a/arch/arm/vfp/entry.S
+++ /dev/null
@@ -1,28 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- *  linux/arch/arm/vfp/entry.S
- *
- *  Copyright (C) 2004 ARM Limited.
- *  Written by Deep Blue Solutions Limited.
- */
-#include <linux/init.h>
-#include <linux/linkage.h>
-#include <asm/thread_info.h>
-#include <asm/vfpmacros.h>
-#include <asm/assembler.h>
-#include <asm/asm-offsets.h>
-
-@ VFP entry point.
-@
-@  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
-@  r2  = PC value to resume execution after successful emulation
-@  r9  = normal "successful" return address
-@  r10 = this threads thread_info structure
-@  lr  = unrecognised instruction return address
-@  IRQs enabled.
-@
-ENTRY(do_vfp)
-	mov	r1, r10
-	mov	r3, r9
-	b	vfp_entry
-ENDPROC(do_vfp)
diff --git a/arch/arm/vfp/vfp.h b/arch/arm/vfp/vfp.h
index 5cd6d5053271760e..e43a630f8a164f9d 100644
--- a/arch/arm/vfp/vfp.h
+++ b/arch/arm/vfp/vfp.h
@@ -375,3 +375,4 @@ struct op {
 };
 
 asmlinkage void vfp_save_state(void *location, u32 fpexc);
+asmlinkage u32 vfp_load_state(const void *location);
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 4d8478264d82b3d2..e82a1577afb83666 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -4,12 +4,6 @@
  *
  *  Copyright (C) 2004 ARM Limited.
  *  Written by Deep Blue Solutions Limited.
- *
- * This code is called from the kernel's undefined instruction trap.
- * r1 holds the thread_info pointer
- * r3 holds the return address for successful handling.
- * lr holds the return address for unrecognised instructions.
- * sp points to a struct pt_regs (as defined in include/asm/proc/ptrace.h)
  */
 #include <linux/init.h>
 #include <linux/linkage.h>
@@ -19,20 +13,6 @@
 #include <asm/assembler.h>
 #include <asm/asm-offsets.h>
 
-	.macro	DBGSTR, str
-#ifdef DEBUG
-	stmfd	sp!, {r0-r3, ip, lr}
-	ldr	r0, =1f
-	bl	_printk
-	ldmfd	sp!, {r0-r3, ip, lr}
-
-	.pushsection .rodata, "a"
-1:	.ascii	KERN_DEBUG "VFP: \str\n"
-	.byte	0
-	.previous
-#endif
-	.endm
-
 	.macro  DBGSTR1, str, arg
 #ifdef DEBUG
 	stmfd	sp!, {r0-r3, ip, lr}
@@ -48,179 +28,27 @@
 #endif
 	.endm
 
-	.macro  DBGSTR3, str, arg1, arg2, arg3
-#ifdef DEBUG
-	stmfd	sp!, {r0-r3, ip, lr}
-	mov	r3, \arg3
-	mov	r2, \arg2
-	mov	r1, \arg1
-	ldr	r0, =1f
-	bl	_printk
-	ldmfd	sp!, {r0-r3, ip, lr}
-
-	.pushsection .rodata, "a"
-1:	.ascii	KERN_DEBUG "VFP: \str\n"
-	.byte	0
-	.previous
-#endif
-	.endm
-
-
-@ VFP hardware support entry point.
-@
-@  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
-@  r1  = thread_info pointer
-@  r2  = PC value to resume execution after successful emulation
-@  r3  = normal "successful" return address
-@  lr  = unrecognised instruction return address
-@  IRQs enabled.
-ENTRY(vfp_support_entry)
-	ldr	r11, [r1, #TI_CPU]	@ CPU number
-	add	r10, r1, #TI_VFPSTATE	@ r10 = workspace
-
-	DBGSTR3	"instr %08x pc %08x state %p", r0, r2, r10
-
-	.fpu	vfpv2
-	VFPFMRX	r1, FPEXC		@ Is the VFP enabled?
-	DBGSTR1	"fpexc %08x", r1
-	tst	r1, #FPEXC_EN
-	bne	look_for_VFP_exceptions	@ VFP is already enabled
-
-	DBGSTR1 "enable %x", r10
-	ldr	r9, vfp_current_hw_state_address
-	orr	r1, r1, #FPEXC_EN	@ user FPEXC has the enable bit set
-	ldr	r4, [r9, r11, lsl #2]	@ vfp_current_hw_state pointer
-	bic	r5, r1, #FPEXC_EX	@ make sure exceptions are disabled
-	cmp	r4, r10			@ this thread owns the hw context?
-#ifndef CONFIG_SMP
-	@ For UP, checking that this thread owns the hw context is
-	@ sufficient to determine that the hardware state is valid.
-	beq	vfp_hw_state_valid
-
-	@ On UP, we lazily save the VFP context.  As a different
-	@ thread wants ownership of the VFP hardware, save the old
-	@ state if there was a previous (valid) owner.
-
-	VFPFMXR	FPEXC, r5		@ enable VFP, disable any pending
-					@ exceptions, so we can get at the
-					@ rest of it
-
-	DBGSTR1	"save old state %p", r4
-	cmp	r4, #0			@ if the vfp_current_hw_state is NULL
-	beq	vfp_reload_hw		@ then the hw state needs reloading
-	VFPFSTMIA r4, r5		@ save the working registers
-	VFPFMRX	r5, FPSCR		@ current status
-#ifndef CONFIG_CPU_FEROCEON
-	tst	r1, #FPEXC_EX		@ is there additional state to save?
-	beq	1f
-	VFPFMRX	r6, FPINST		@ FPINST (only if FPEXC.EX is set)
-	tst	r1, #FPEXC_FP2V		@ is there an FPINST2 to read?
-	beq	1f
-	VFPFMRX	r8, FPINST2		@ FPINST2 if needed (and present)
-1:
-#endif
-	stmia	r4, {r1, r5, r6, r8}	@ save FPEXC, FPSCR, FPINST, FPINST2
-vfp_reload_hw:
-
-#else
-	@ For SMP, if this thread does not own the hw context, then we
-	@ need to reload it.  No need to save the old state as on SMP,
-	@ we always save the state when we switch away from a thread.
-	bne	vfp_reload_hw
-
-	@ This thread has ownership of the current hardware context.
-	@ However, it may have been migrated to another CPU, in which
-	@ case the saved state is newer than the hardware context.
-	@ Check this by looking at the CPU number which the state was
-	@ last loaded onto.
-	ldr	ip, [r10, #VFP_CPU]
-	teq	ip, r11
-	beq	vfp_hw_state_valid
-
-vfp_reload_hw:
-	@ We're loading this threads state into the VFP hardware. Update
-	@ the CPU number which contains the most up to date VFP context.
-	str	r11, [r10, #VFP_CPU]
-
-	VFPFMXR	FPEXC, r5		@ enable VFP, disable any pending
-					@ exceptions, so we can get at the
-					@ rest of it
-#endif
-
-	DBGSTR1	"load state %p", r10
-	str	r10, [r9, r11, lsl #2]	@ update the vfp_current_hw_state pointer
+ENTRY(vfp_load_state)
+	@ Load the current VFP state
+	@ r0 - load location
+	@ returns FPEXC
+	DBGSTR1	"load VFP state %p", r0
 					@ Load the saved state back into the VFP
-	VFPFLDMIA r10, r5		@ reload the working registers while
+	VFPFLDMIA r0, r1		@ reload the working registers while
 					@ FPEXC is in a safe state
-	ldmia	r10, {r1, r5, r6, r8}	@ load FPEXC, FPSCR, FPINST, FPINST2
+	ldmia	r0, {r0-r3}		@ load FPEXC, FPSCR, FPINST, FPINST2
 #ifndef CONFIG_CPU_FEROCEON
-	tst	r1, #FPEXC_EX		@ is there additional state to restore?
+	tst	r0, #FPEXC_EX		@ is there additional state to restore?
 	beq	1f
-	VFPFMXR	FPINST, r6		@ restore FPINST (only if FPEXC.EX is set)
-	tst	r1, #FPEXC_FP2V		@ is there an FPINST2 to write?
+	VFPFMXR	FPINST, r2		@ restore FPINST (only if FPEXC.EX is set)
+	tst	r0, #FPEXC_FP2V		@ is there an FPINST2 to write?
 	beq	1f
-	VFPFMXR	FPINST2, r8		@ FPINST2 if needed (and present)
+	VFPFMXR	FPINST2, r3		@ FPINST2 if needed (and present)
 1:
 #endif
-	VFPFMXR	FPSCR, r5		@ restore status
-
-@ The context stored in the VFP hardware is up to date with this thread
-vfp_hw_state_valid:
-	tst	r1, #FPEXC_EX
-	bne	process_exception	@ might as well handle the pending
-					@ exception before retrying branch
-					@ out before setting an FPEXC that
-					@ stops us reading stuff
-	VFPFMXR	FPEXC, r1		@ Restore FPEXC last
-	sub	r2, r2, #4		@ Retry current instruction - if Thumb
-	str	r2, [sp, #S_PC]		@ mode it's two 16-bit instructions,
-					@ else it's one 32-bit instruction, so
-					@ always subtract 4 from the following
-					@ instruction address.
-
-	mov	lr, r3			@ we think we have handled things
-local_bh_enable_and_ret:
-	adr	r0, .
-	mov	r1, #SOFTIRQ_DISABLE_OFFSET
-	b	__local_bh_enable_ip	@ tail call
-
-look_for_VFP_exceptions:
-	@ Check for synchronous or asynchronous exception
-	tst	r1, #FPEXC_EX | FPEXC_DEX
-	bne	process_exception
-	@ On some implementations of the VFP subarch 1, setting FPSCR.IXE
-	@ causes all the CDP instructions to be bounced synchronously without
-	@ setting the FPEXC.EX bit
-	VFPFMRX	r5, FPSCR
-	tst	r5, #FPSCR_IXE
-	bne	process_exception
-
-	tst	r5, #FPSCR_LENGTH_MASK
-	beq	skip
-	orr	r1, r1, #FPEXC_DEX
-	b	process_exception
-skip:
-
-	@ Fall into hand on to next handler - appropriate coproc instr
-	@ not recognised by VFP
-
-	DBGSTR	"not VFP"
-	b	local_bh_enable_and_ret
-
-process_exception:
-	DBGSTR	"bounce"
-	mov	r2, sp			@ nothing stacked - regdump is at TOS
-	mov	lr, r3			@ setup for a return to the user code.
-
-	@ Now call the C code to package up the bounce to the support code
-	@   r0 holds the trigger instruction
-	@   r1 holds the FPEXC value
-	@   r2 pointer to register dump
-	b	VFP_bounce		@ we have handled this - the support
-					@ code will raise an exception if
-					@ required. If not, the user code will
-					@ retry the faulted instruction
-ENDPROC(vfp_support_entry)
+	VFPFMXR	FPSCR, r1		@ restore status
+	ret	lr
+ENDPROC(vfp_load_state)
 
 ENTRY(vfp_save_state)
 	@ Save the current VFP state
@@ -240,10 +68,6 @@ ENTRY(vfp_save_state)
 	ret	lr
 ENDPROC(vfp_save_state)
 
-	.align
-vfp_current_hw_state_address:
-	.word	vfp_current_hw_state
-
 	.macro	tbl_branch, base, tmp, shift
 #ifdef CONFIG_THUMB2_KERNEL
 	adr	\tmp, 1f
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 3f88599c285a5d4b..d8e490f8fb536861 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -319,7 +319,7 @@ static u32 vfp_emulate_instruction(u32 inst, u32 fpscr, struct pt_regs *regs)
 /*
  * Package up a bounce condition.
  */
-void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
+static void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
 {
 	u32 fpscr, orig_fpscr, fpsid, exceptions;
 
@@ -370,7 +370,7 @@ void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
 		 * on VFP subarch 1.
 		 */
 		 vfp_raise_exceptions(VFP_EXCEPTION_ERROR, trigger, fpscr, regs);
-		goto exit;
+		return;
 	}
 
 	/*
@@ -401,7 +401,7 @@ void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
 	 * the FPEXC.FP2V bit is valid only if FPEXC.EX is 1.
 	 */
 	if ((fpexc & (FPEXC_EX | FPEXC_FP2V)) != (FPEXC_EX | FPEXC_FP2V))
-		goto exit;
+		return;
 
 	/*
 	 * The barrier() here prevents fpinst2 being read
@@ -414,8 +414,6 @@ void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
 	exceptions = vfp_emulate_instruction(trigger, orig_fpscr, regs);
 	if (exceptions)
 		vfp_raise_exceptions(exceptions, trigger, orig_fpscr, regs);
- exit:
-	local_bh_enable();
 }
 
 static void vfp_enable(void *unused)
@@ -906,20 +904,110 @@ static int __init vfp_init(void)
 core_initcall(vfp_init);
 
 /*
- * Entered with:
+ * vfp_entry - Handle VFP exception from user mode
  *
- *  r0  = instruction opcode (32-bit ARM or two 16-bit Thumb)
- *  r1  = thread_info pointer
- *  r2  = PC value to resume execution after successful emulation
- *  r3  = normal "successful" return address
- *  lr  = unrecognised instruction return address
+ * @trigger:	The opcode of the instruction that triggered the exception
+ * @regs:	pt_regs structure holding the register state at exception entry
+ *
+ * Returns 0 if the exception was handled, or an error code otherwise.
  */
-asmlinkage void vfp_entry(u32 trigger, struct thread_info *ti, u32 resume_pc,
-			  u32 resume_return_address)
+asmlinkage int vfp_entry(u32 trigger, struct pt_regs *regs)
 {
+	struct thread_info *ti = current_thread_info();
+	u32 fpexc;
+
 	if (unlikely(!have_vfp))
-		return;
+		return -ENODEV;
 
 	local_bh_disable();
-	vfp_support_entry(opcode, ti, resume_pc, resume_return_address);
+	fpexc = fmrx(FPEXC);
+
+	/*
+	 * If the VFP unit was not enabled yet, we have to check whether the
+	 * VFP state in the CPU's registers is the most recent VFP state
+	 * associated with the process. On UP systems, we don't save the VFP
+	 * state eagerly on a context switch, so we may need to save the
+	 * VFP state to memory first, as it may belong to another process.
+	 */
+	if (!(fpexc & FPEXC_EN)) {
+		/*
+		 * Enable the VFP unit but mask the FP exception flag for the
+		 * time being, so we can access all the registers.
+		 */
+		fpexc |= FPEXC_EN;
+		fmxr(FPEXC, fpexc & ~FPEXC_EX);
+
+		/*
+		 * Check whether or not the VFP state in the CPU's registers is
+		 * the most recent VFP state associated with this task. On SMP,
+		 * migration may result in multiple CPUs holding VFP states
+		 * that belong to the same task, but only the most recent one
+		 * is valid.
+		 */
+		if (!vfp_state_in_hw(ti->cpu, ti)) {
+			if (!IS_ENABLED(CONFIG_SMP) &&
+			    vfp_current_hw_state[ti->cpu] != NULL) {
+				/*
+				 * This CPU is currently holding the most
+				 * recent VFP state associated with another
+				 * task, and we must save that to memory first.
+				 */
+				vfp_save_state(vfp_current_hw_state[ti->cpu],
+					       fpexc);
+			}
+
+			/*
+			 * We can now proceed with loading the task's VFP state
+			 * from memory into the CPU registers.
+			 */
+			fpexc = vfp_load_state(&ti->vfpstate);
+			vfp_current_hw_state[ti->cpu] = &ti->vfpstate;
+#ifdef CONFIG_SMP
+			/*
+			 * Record that this CPU is now the one holding the most
+			 * recent VFP state of the task.
+			 */
+			ti->vfpstate.hard.cpu = ti->cpu;
+#endif
+		}
+
+		if (fpexc & FPEXC_EX)
+			/*
+			 * A FP exception is pending: handle it before
+			 * retrying the instruction, and before setting an
+			 * FPEXC value that stops us reading the FP state.
+			 */
+			goto bounce;
+
+		/*
+		 * No FP exception is pending: just enable the VFP and
+		 * replay the instruction that trapped.
+		 */
+		fmxr(FPEXC, fpexc);
+		regs->ARM_pc -= 4;
+	} else {
+		/* Check for synchronous or asynchronous exceptions */
+		if (!(fpexc & (FPEXC_EX | FPEXC_DEX))) {
+			u32 fpscr = fmrx(FPSCR);
+
+			/*
+			 * On some implementations of the VFP subarch 1,
+			 * setting FPSCR.IXE causes all the CDP instructions to
+			 * be bounced synchronously without setting the
+			 * FPEXC.EX bit
+			 */
+			if (!(fpscr & FPSCR_IXE)) {
+				if (!(fpscr & FPSCR_LENGTH_MASK)) {
+					pr_debug("not VFP\n");
+					local_bh_enable();
+					return -ENOEXEC;
+				}
+				fpexc |= FPEXC_DEX;
+			}
+		}
+bounce:		VFP_bounce(trigger, fpexc, regs);
+	}
+
+	local_bh_enable();
+	return 0;
 }
-- 
2.39.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v3 4/4] ARM: vfp: Reimplement VFP exception entry in C code
  2023-03-16  8:20 ` [PATCH v3 4/4] ARM: vfp: Reimplement VFP exception entry in C code Ard Biesheuvel
@ 2023-03-16  8:59   ` Linus Walleij
  2023-03-18 16:20     ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Walleij @ 2023-03-16  8:59 UTC (permalink / raw)
  To: Ard Biesheuvel, Catalin Marinas
  Cc: linux-arm-kernel, linux, Frederic Weisbecker, Guenter Roeck,
	Peter Zijlstra, Arnd Bergmann

On Thu, Mar 16, 2023 at 9:20 AM Ard Biesheuvel <ardb@kernel.org> wrote:

> En/disabling softirqs from asm code turned out to be trickier than
> expected, so vfp_support_entry now returns by tail calling
> __local_bh_enable_ip() and passing the same arguments that a C call to
> local_bh_enable() would pass. However, this is slightly hacky, as we
> don't want to carry our own implementation of local_bh_enable().
>
> So let's bite the bullet, and get rid of the asm logic in
> vfp_support_entry that reasons about whether or not to save and/or
> reload the VFP state, and about whether or not an FP exception is
> pending, and only keep the VFP loading logic as a function that is
> callable from C.
>
> Replicate the removed logic in vfp_entry(), and use the exact same
> reasoning as in the asm code. To emphasize the correspondence, retain
> some of the asm comments in the C version as well.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

I tried to model what the assembly is doing in my head to review
if the new code in C does the same thing, but after following some of
the semantics (which look correct) I realized it will mostly be down to
testing anyway.

It's definitely better to have it like this so FWIW:
Acked-by: Linus Walleij <linus.walleij@linaro.org>

Apparently a bunch of this code was written by Catalin for VFPv3 support,
he may or may not have the memory and time to go back and look at it, but
let's page him in anyway :)

Yours,
Linus Walleij



* Re: [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs
  2023-03-16  8:20 [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2023-03-16  8:20 ` [PATCH v3 4/4] ARM: vfp: Reimplement VFP exception entry in C code Ard Biesheuvel
@ 2023-03-17  0:23 ` Guenter Roeck
  2023-03-17  0:24   ` Guenter Roeck
  4 siblings, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2023-03-17  0:23 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel, linux
  Cc: Frederic Weisbecker, Peter Zijlstra, Linus Walleij, Arnd Bergmann

On 3/16/23 01:20, Ard Biesheuvel wrote:
> As it turns out, enabling or disabling softirqs from asm code is more
> tricky than it looks. Simply bumping the associated bit in preempt_count
> does the trick for uninstrumented kernels, but with lockdep or preempt
> debug enabled, we really need to call the C versions, as replicating
> their behavior in asm fully is intractable.
> 
> So let's rework the existing code a little bit so we can interpose a
> little C helper 'vfp_entry()' that disables softirqs before calling the
> existing vfpstate handling code in asm. Re-enabling softirqs from asm
> code is more straightforward, as we can simply perform a tail call via
> a C routine that is guaranteed to be callable as a function.
> 
> However, since calling these APIs from asm code is still not ideal,
> let's reimplement the whole thing in C (patch #4)
> 
> 
> Changes since v2:
> - add Rbs and Tbs from Linus and Guenter (thanks!)
> - correct reference to local_bh_enable() vs __local_bh_enable_ip() in
>    commit log of patch #3
> - add patch that reworks the asm VFP exception handling entirely so the
>    bulk of it is implemented in C. This could be taken into v6.4, while
>    the  preceding patches are fixes for v6.3-rc
> 
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Link: https://lore.kernel.org/all/ZBBYCSZUJOWBg1s8@localhost.localdomain/
> 
> Ard Biesheuvel (4):
>    ARM: vfp: Pass thread_info pointer to vfp_support_entry
>    ARM: vfp: Pass successful return address via register R3
>    ARM: vfp: Fix broken softirq handling with instrumentation enabled
>    ARM: vfp: Reimplement VFP exception entry in C code
> 
>   arch/arm/include/asm/assembler.h |  13 --
>   arch/arm/kernel/entry-armv.S     |  12 ++
>   arch/arm/vfp/Makefile            |   2 +-
>   arch/arm/vfp/entry.S             |  39 ----
>   arch/arm/vfp/vfp.h               |   1 +
>   arch/arm/vfp/vfphw.S             | 198 ++------------------
>   arch/arm/vfp/vfpmodule.c         | 125 +++++++++++-
>   7 files changed, 142 insertions(+), 248 deletions(-)
>   delete mode 100644 arch/arm/vfp/entry.S
> 

After applying this series, imx_v4_v5_defconfig and aspeed_g4_defconfig
(and maybe others) fail to build:

arm-linux-gnueabi-ld: arch/arm/kernel/entry-armv.o: in function `do_vfp':
(.entry.text+0x5a4): undefined reference to `vfp_entry'
make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
make: *** [Makefile:1250: vmlinux] Error 2

Guenter




* Re: [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs
  2023-03-17  0:23 ` [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs Guenter Roeck
@ 2023-03-17  0:24   ` Guenter Roeck
  2023-03-17  8:55     ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2023-03-17  0:24 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel, linux
  Cc: Frederic Weisbecker, Peter Zijlstra, Linus Walleij, Arnd Bergmann

On 3/16/23 17:23, Guenter Roeck wrote:
> [...]
>
> After applying this series, imx_v4_v5_defconfig and aspeed_g4_defconfig
> (and maybe others) fail to build:
> 
> arm-linux-gnueabi-ld: arch/arm/kernel/entry-armv.o: in function `do_vfp':
> (.entry.text+0x5a4): undefined reference to `vfp_entry'
> make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
> make: *** [Makefile:1250: vmlinux] Error 2
> 

... and reverting the last patch of the series fixes the problem.

Guenter




* Re: [PATCH v3 0/4] ARM: vfp: Switch to C API to en/disable softirqs
  2023-03-17  0:24   ` Guenter Roeck
@ 2023-03-17  8:55     ` Ard Biesheuvel
  0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-17  8:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-arm-kernel, linux, Frederic Weisbecker, Peter Zijlstra,
	Linus Walleij, Arnd Bergmann

On Fri, 17 Mar 2023 at 01:24, Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 3/16/23 17:23, Guenter Roeck wrote:
> > [...]
> >
> > After applying this series, imx_v4_v5_defconfig and aspeed_g4_defconfig
> > (and maybe others) fail to build:
> >
> > arm-linux-gnueabi-ld: arch/arm/kernel/entry-armv.o: in function `do_vfp':
> > (.entry.text+0x5a4): undefined reference to `vfp_entry'
> > make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
> > make: *** [Makefile:1250: vmlinux] Error 2
> >
>
> ... and reverting the last patch of the series fixes the problem.
>

Thanks Guenter,

It was a missing #ifdef CONFIG_VFP - I'll fix that up before I drop it
into the patch system.



* Re: [PATCH v3 4/4] ARM: vfp: Reimplement VFP exception entry in C code
  2023-03-16  8:59   ` Linus Walleij
@ 2023-03-18 16:20     ` Ard Biesheuvel
  0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-03-18 16:20 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Catalin Marinas, linux-arm-kernel, linux, Frederic Weisbecker,
	Guenter Roeck, Peter Zijlstra, Arnd Bergmann

On Thu, 16 Mar 2023 at 09:59, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Thu, Mar 16, 2023 at 9:20 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > En/disabling softirqs from asm code turned out to be trickier than
> > expected, so vfp_support_entry now returns by tail calling
> > __local_bh_enable_ip() and passing the same arguments that a C call to
> > local_bh_enable() would pass. However, this is slightly hacky, as we
> > don't want to carry our own implementation of local_bh_enable().
> >
> > So let's bite the bullet, and get rid of the asm logic in
> > vfp_support_entry that reasons about whether or not to save and/or
> > reload the VFP state, and about whether or not an FP exception is
> > pending, and only keep the VFP loading logic as a function that is
> > callable from C.
> >
> > Replicate the removed logic in vfp_entry(), and use the exact same
> > reasoning as in the asm code. To emphasize the correspondence, retain
> > some of the asm comments in the C version as well.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> I tried to model what the assembly is doing in my head to review
> if the new code in C does the same thing, but after following some of
> the semantics (which look correct) I realized it will mostly be down to
> testing anyway.
>

Indeed. The only hardware I have that actually generates VFP
exceptions is Raspberry Pi 1, so I gave it a spin there.

After adding a perf software event like below, and running John
Hauser's testfloat tool [0], I get the exact same mix of passes and
failures, i.e.,

$ perf_4.9 stat -e emulation-faults ./testfloat -all2 >before 2>&1
$ diff -u before after
--- before 2023-03-18 16:51:17.842814747 +0100
+++ after 2023-03-18 16:49:11.783973769 +0100
@@ -399,5 +399,5 @@

            259,277      emulation-faults:u

-       5.810963659 seconds time elapsed
+       5.748333166 seconds time elapsed

So as far as ARM 1176 is concerned, I think we are ok. However, it
appears all generated emulation exceptions are synchronous ones, so
the async code path is not testable this way.


[0] http://www.jhauser.us/arithmetic/TestFloat.html

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index b56fad06cef0dfad..f46eb681f2f95448 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/user.h>
 #include <linux/export.h>
+#include <linux/perf_event.h>

 #include <asm/cp15.h>
 #include <asm/cputype.h>
@@ -313,6 +314,7 @@ static u32 vfp_emulate_instruction(u32 inst, u32 fpscr, struct pt_regs *regs)
                 * emulate it.
                 */
        }
+       perf_sw_event(PERF_COUNT_SW_EMULATION_FAULTS, 1, regs, regs->ARM_pc);
        return exceptions & ~VFP_NAN_FLAG;
 }



> It's definitely better to have it like this so FWIW:
> Acked-by: Linus Walleij <linus.walleij@linaro.org>
>

Thanks

> Apparently a bunch of this code was written by Catalin for VFPv3 support,
> he may or may not have the memory and time to go back and look at it, but
> let's page him in anyway :)
>
> Yours,
> Linus Walleij


