[PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible
@ 2020-04-19 12:35 ` Stefan Agner
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
	ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
	linux-kernel, clang-built-linux, Stefan Agner

To build the kernel with Clang's integrated assembler the VFP code needs
to make use of the unified assembler language (UAL) VFP mnemonics.

At first I tried to get rid of the co-processor instructions to access
the floating point unit along with the macros completely. However, due
to missing FPINST/FPINST2 argument support in older binutils versions we
have to keep them around. Once we drop support for binutils 2.24 and
older, the move to UAL VFP mnemonics will be straight forward with this
changes applied.

Tested using Clang with integrated assembler as well as external
(binutils assembler), various gcc/binutils version down to 4.7/2.23.
Disassembled and compared the object files in arch/arm/vfp/ to make
sure this changes leads to the same code. Besides different inlining
behavior I was not able to spot a difference.

In v2 the check for FPINST argument support is now made in Kconfig.

--
Stefan

Stefan Agner (3):
  ARM: use .fpu assembler directives instead of assembler arguments
  ARM: use VFP assembler mnemonics in register load/store macros
  ARM: use VFP assembler mnemonics if available

 arch/arm/Kconfig                 |  2 ++
 arch/arm/Kconfig.assembler       |  6 ++++++
 arch/arm/include/asm/vfp.h       |  2 ++
 arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
 arch/arm/vfp/Makefile            |  2 --
 arch/arm/vfp/vfphw.S             | 31 ++++++++++++++++++++-----------
 arch/arm/vfp/vfpinstr.h          | 23 +++++++++++++++++++----
 7 files changed, 71 insertions(+), 26 deletions(-)
 create mode 100644 arch/arm/Kconfig.assembler

-- 
2.25.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible
@ 2020-04-19 12:35 ` Stefan Agner
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
	robin.murphy, linux-arm-kernel

To build the kernel with Clang's integrated assembler the VFP code needs
to make use of the unified assembler language (UAL) VFP mnemonics.

At first I tried to get rid of the co-processor instructions to access
the floating point unit along with the macros completely. However, due
to missing FPINST/FPINST2 argument support in older binutils versions we
have to keep them around. Once we drop support for binutils 2.24 and
older, the move to UAL VFP mnemonics will be straight forward with this
changes applied.

Tested using Clang with integrated assembler as well as external
(binutils assembler), various gcc/binutils version down to 4.7/2.23.
Disassembled and compared the object files in arch/arm/vfp/ to make
sure this changes leads to the same code. Besides different inlining
behavior I was not able to spot a difference.

In v2 the check for FPINST argument support is now made in Kconfig.

--
Stefan

Stefan Agner (3):
  ARM: use .fpu assembler directives instead of assembler arguments
  ARM: use VFP assembler mnemonics in register load/store macros
  ARM: use VFP assembler mnemonics if available

 arch/arm/Kconfig                 |  2 ++
 arch/arm/Kconfig.assembler       |  6 ++++++
 arch/arm/include/asm/vfp.h       |  2 ++
 arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
 arch/arm/vfp/Makefile            |  2 --
 arch/arm/vfp/vfphw.S             | 31 ++++++++++++++++++++-----------
 arch/arm/vfp/vfpinstr.h          | 23 +++++++++++++++++++----
 7 files changed, 71 insertions(+), 26 deletions(-)
 create mode 100644 arch/arm/Kconfig.assembler

-- 
2.25.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] ARM: use .fpu assembler directives instead of assembler arguments
  2020-04-19 12:35 ` Stefan Agner
@ 2020-04-19 12:35   ` Stefan Agner
  -1 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
	ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
	linux-kernel, clang-built-linux, Stefan Agner

Explicit FPU selection has been introduced in commit 1a6be26d5b1a
("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
to make use of assembler mnemonics for VFP instructions.

However, clang currently does not support passing assembler flags
like this and errors out with:
clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'

Make use of the .fpu assembler directives to select the floating point
hardware selectively. Also use the new unified assembler language
mnemonics. This allows to build these procedures with Clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/762
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
Changes in v2:
- Add link in commit message

 arch/arm/vfp/Makefile |  2 --
 arch/arm/vfp/vfphw.S  | 30 +++++++++++++++++++-----------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 9975b63ac3b0..749901a72d6d 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,6 +8,4 @@
 # ccflags-y := -DDEBUG
 # asflags-y := -DDEBUG
 
-KBUILD_AFLAGS	:=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
-
 obj-y		+= vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index b2e560290860..e214007a20a2 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -258,11 +258,13 @@ vfp_current_hw_state_address:
 
 ENTRY(vfp_get_float)
 	tbl_branch r0, r3, #3
-	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mrc	p10, 0, r0, c\dr, c0, 0	@ fmrs	r0, s0
+	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+1:	vmov	r0, s\dr
 	ret	lr
 	.org	1b + 8
-1:	mrc	p10, 0, r0, c\dr, c0, 4	@ fmrs	r0, s1
+	.endr
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	r0, s\dr
 	ret	lr
 	.org	1b + 8
 	.endr
@@ -271,10 +273,12 @@ ENDPROC(vfp_get_float)
 ENTRY(vfp_put_float)
 	tbl_branch r1, r3, #3
 	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mcr	p10, 0, r0, c\dr, c0, 0	@ fmsr	r0, s0
+1:	vmov	s\dr, r0
 	ret	lr
 	.org	1b + 8
-1:	mcr	p10, 0, r0, c\dr, c0, 4	@ fmsr	r0, s1
+	.endr
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	s\dr, r0
 	ret	lr
 	.org	1b + 8
 	.endr
@@ -282,15 +286,17 @@ ENDPROC(vfp_put_float)
 
 ENTRY(vfp_get_double)
 	tbl_branch r0, r3, #3
+	.fpu	vfpv2
 	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	fmrrd	r0, r1, d\dr
+1:	vmov	r0, r1, d\dr
 	ret	lr
 	.org	1b + 8
 	.endr
 #ifdef CONFIG_VFPv3
 	@ d16 - d31 registers
-	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mrrc	p11, 3, r0, r1, c\dr	@ fmrrd	r0, r1, d\dr
+	.fpu	vfpv3
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	r0, r1, d\dr
 	ret	lr
 	.org	1b + 8
 	.endr
@@ -304,15 +310,17 @@ ENDPROC(vfp_get_double)
 
 ENTRY(vfp_put_double)
 	tbl_branch r2, r3, #3
+	.fpu	vfpv2
 	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	fmdrr	d\dr, r0, r1
+1:	vmov	d\dr, r0, r1
 	ret	lr
 	.org	1b + 8
 	.endr
 #ifdef CONFIG_VFPv3
+	.fpu	vfpv3
 	@ d16 - d31 registers
-	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mcrr	p11, 3, r0, r1, c\dr	@ fmdrr	r0, r1, d\dr
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	d\dr, r0, r1
 	ret	lr
 	.org	1b + 8
 	.endr
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] ARM: use .fpu assembler directives instead of assembler arguments
@ 2020-04-19 12:35   ` Stefan Agner
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
	robin.murphy, linux-arm-kernel

Explicit FPU selection has been introduced in commit 1a6be26d5b1a
("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
to make use of assembler mnemonics for VFP instructions.

However, clang currently does not support passing assembler flags
like this and errors out with:
clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'

Make use of the .fpu assembler directives to select the floating point
hardware selectively. Also use the new unified assembler language
mnemonics. This allows to build these procedures with Clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/762
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
Changes in v2:
- Add link in commit message

 arch/arm/vfp/Makefile |  2 --
 arch/arm/vfp/vfphw.S  | 30 +++++++++++++++++++-----------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 9975b63ac3b0..749901a72d6d 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,6 +8,4 @@
 # ccflags-y := -DDEBUG
 # asflags-y := -DDEBUG
 
-KBUILD_AFLAGS	:=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
-
 obj-y		+= vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index b2e560290860..e214007a20a2 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -258,11 +258,13 @@ vfp_current_hw_state_address:
 
 ENTRY(vfp_get_float)
 	tbl_branch r0, r3, #3
-	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mrc	p10, 0, r0, c\dr, c0, 0	@ fmrs	r0, s0
+	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+1:	vmov	r0, s\dr
 	ret	lr
 	.org	1b + 8
-1:	mrc	p10, 0, r0, c\dr, c0, 4	@ fmrs	r0, s1
+	.endr
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	r0, s\dr
 	ret	lr
 	.org	1b + 8
 	.endr
@@ -271,10 +273,12 @@ ENDPROC(vfp_get_float)
 ENTRY(vfp_put_float)
 	tbl_branch r1, r3, #3
 	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mcr	p10, 0, r0, c\dr, c0, 0	@ fmsr	r0, s0
+1:	vmov	s\dr, r0
 	ret	lr
 	.org	1b + 8
-1:	mcr	p10, 0, r0, c\dr, c0, 4	@ fmsr	r0, s1
+	.endr
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	s\dr, r0
 	ret	lr
 	.org	1b + 8
 	.endr
@@ -282,15 +286,17 @@ ENDPROC(vfp_put_float)
 
 ENTRY(vfp_get_double)
 	tbl_branch r0, r3, #3
+	.fpu	vfpv2
 	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	fmrrd	r0, r1, d\dr
+1:	vmov	r0, r1, d\dr
 	ret	lr
 	.org	1b + 8
 	.endr
 #ifdef CONFIG_VFPv3
 	@ d16 - d31 registers
-	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mrrc	p11, 3, r0, r1, c\dr	@ fmrrd	r0, r1, d\dr
+	.fpu	vfpv3
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	r0, r1, d\dr
 	ret	lr
 	.org	1b + 8
 	.endr
@@ -304,15 +310,17 @@ ENDPROC(vfp_get_double)
 
 ENTRY(vfp_put_double)
 	tbl_branch r2, r3, #3
+	.fpu	vfpv2
 	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	fmdrr	d\dr, r0, r1
+1:	vmov	d\dr, r0, r1
 	ret	lr
 	.org	1b + 8
 	.endr
 #ifdef CONFIG_VFPv3
+	.fpu	vfpv3
 	@ d16 - d31 registers
-	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1:	mcrr	p11, 3, r0, r1, c\dr	@ fmdrr	r0, r1, d\dr
+	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1:	vmov	d\dr, r0, r1
 	ret	lr
 	.org	1b + 8
 	.endr
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/3] ARM: use VFP assembler mnemonics in register load/store macros
  2020-04-19 12:35 ` Stefan Agner
@ 2020-04-19 12:35   ` Stefan Agner
  -1 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
	ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
	linux-kernel, clang-built-linux, Stefan Agner

Clang's integrated assembler does not allow to access the VFP registers
through the coprocessor load/store instructions:
<instantiation>:4:6: error: invalid operand for instruction
 LDC p11, cr0, [r10],#32*4 @ FLDMIAD r10!, {d0-d15}
     ^

Replace the coprocessor load/store instructions with explicit assembler
mnemonics to accessing the floating point coprocessor registers. Use
assembler directives to select the appropriate FPU version.

This allows to build these macros with GNU assembler as well as with
Clang's built-in assembler.

Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
Changes in v2:
- Add link in commit message

 arch/arm/include/asm/vfpmacros.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 628c336e8e3b..947ee5395e1f 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -19,23 +19,25 @@
 
 	@ read all the working registers back into the VFP
 	.macro	VFPFLDMIA, base, tmp
+	.fpu	vfpv2
 #if __LINUX_ARM_ARCH__ < 6
-	LDC	p11, cr0, [\base],#33*4		    @ FLDMIAX \base!, {d0-d15}
+	fldmiax	\base!, {d0-d15}
 #else
-	LDC	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d0-d15}
+	vldmia	\base!, {d0-d15}
 #endif
 #ifdef CONFIG_VFPv3
+	.fpu	vfpv3
 #if __LINUX_ARM_ARCH__ <= 6
 	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
 	ldr	\tmp, [\tmp, #0]
 	tst	\tmp, #HWCAP_VFPD32
-	ldclne	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
+	vldmiane \base!, {d16-d31}
 	addeq	\base, \base, #32*4		    @ step over unused register space
 #else
 	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
 	and	\tmp, \tmp, #MVFR0_A_SIMD_MASK	    @ A_SIMD field
 	cmp	\tmp, #2			    @ 32 x 64bit registers?
-	ldcleq	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
+	vldmiaeq \base!, {d16-d31}
 	addne	\base, \base, #32*4		    @ step over unused register space
 #endif
 #endif
@@ -44,22 +46,23 @@
 	@ write all the working registers out of the VFP
 	.macro	VFPFSTMIA, base, tmp
 #if __LINUX_ARM_ARCH__ < 6
-	STC	p11, cr0, [\base],#33*4		    @ FSTMIAX \base!, {d0-d15}
+	fstmiax	\base!, {d0-d15}
 #else
-	STC	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d0-d15}
+	vstmia	\base!, {d0-d15}
 #endif
 #ifdef CONFIG_VFPv3
+	.fpu	vfpv3
 #if __LINUX_ARM_ARCH__ <= 6
 	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
 	ldr	\tmp, [\tmp, #0]
 	tst	\tmp, #HWCAP_VFPD32
-	stclne	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
+	vstmiane \base!, {d16-d31}
 	addeq	\base, \base, #32*4		    @ step over unused register space
 #else
 	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
 	and	\tmp, \tmp, #MVFR0_A_SIMD_MASK	    @ A_SIMD field
 	cmp	\tmp, #2			    @ 32 x 64bit registers?
-	stcleq	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
+	vstmiaeq \base!, {d16-d31}
 	addne	\base, \base, #32*4		    @ step over unused register space
 #endif
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/3] ARM: use VFP assembler mnemonics in register load/store macros
@ 2020-04-19 12:35   ` Stefan Agner
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
	robin.murphy, linux-arm-kernel

Clang's integrated assembler does not allow to access the VFP registers
through the coprocessor load/store instructions:
<instantiation>:4:6: error: invalid operand for instruction
 LDC p11, cr0, [r10],#32*4 @ FLDMIAD r10!, {d0-d15}
     ^

Replace the coprocessor load/store instructions with explicit assembler
mnemonics to accessing the floating point coprocessor registers. Use
assembler directives to select the appropriate FPU version.

This allows to build these macros with GNU assembler as well as with
Clang's built-in assembler.

Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
Changes in v2:
- Add link in commit message

 arch/arm/include/asm/vfpmacros.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 628c336e8e3b..947ee5395e1f 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -19,23 +19,25 @@
 
 	@ read all the working registers back into the VFP
 	.macro	VFPFLDMIA, base, tmp
+	.fpu	vfpv2
 #if __LINUX_ARM_ARCH__ < 6
-	LDC	p11, cr0, [\base],#33*4		    @ FLDMIAX \base!, {d0-d15}
+	fldmiax	\base!, {d0-d15}
 #else
-	LDC	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d0-d15}
+	vldmia	\base!, {d0-d15}
 #endif
 #ifdef CONFIG_VFPv3
+	.fpu	vfpv3
 #if __LINUX_ARM_ARCH__ <= 6
 	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
 	ldr	\tmp, [\tmp, #0]
 	tst	\tmp, #HWCAP_VFPD32
-	ldclne	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
+	vldmiane \base!, {d16-d31}
 	addeq	\base, \base, #32*4		    @ step over unused register space
 #else
 	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
 	and	\tmp, \tmp, #MVFR0_A_SIMD_MASK	    @ A_SIMD field
 	cmp	\tmp, #2			    @ 32 x 64bit registers?
-	ldcleq	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
+	vldmiaeq \base!, {d16-d31}
 	addne	\base, \base, #32*4		    @ step over unused register space
 #endif
 #endif
@@ -44,22 +46,23 @@
 	@ write all the working registers out of the VFP
 	.macro	VFPFSTMIA, base, tmp
 #if __LINUX_ARM_ARCH__ < 6
-	STC	p11, cr0, [\base],#33*4		    @ FSTMIAX \base!, {d0-d15}
+	fstmiax	\base!, {d0-d15}
 #else
-	STC	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d0-d15}
+	vstmia	\base!, {d0-d15}
 #endif
 #ifdef CONFIG_VFPv3
+	.fpu	vfpv3
 #if __LINUX_ARM_ARCH__ <= 6
 	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
 	ldr	\tmp, [\tmp, #0]
 	tst	\tmp, #HWCAP_VFPD32
-	stclne	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
+	vstmiane \base!, {d16-d31}
 	addeq	\base, \base, #32*4		    @ step over unused register space
 #else
 	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
 	and	\tmp, \tmp, #MVFR0_A_SIMD_MASK	    @ A_SIMD field
 	cmp	\tmp, #2			    @ 32 x 64bit registers?
-	stcleq	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
+	vstmiaeq \base!, {d16-d31}
 	addne	\base, \base, #32*4		    @ step over unused register space
 #endif
 #endif
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/3] ARM: use VFP assembler mnemonics if available
  2020-04-19 12:35 ` Stefan Agner
@ 2020-04-19 12:35   ` Stefan Agner
  -1 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
	ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
	linux-kernel, clang-built-linux, Stefan Agner

Clang's integrated assembler does not allow to to use the mcr
instruction to access floating point co-processor registers:
arch/arm/vfp/vfpmodule.c:342:2: error: invalid operand for instruction
        fmxr(FPEXC, fpexc & ~(FPEXC_EX|FPEXC_DEX|FPEXC_FP2V|FPEXC_VV|FPEXC_TRAP_MASK));
        ^
arch/arm/vfp/vfpinstr.h:79:6: note: expanded from macro 'fmxr'
        asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr   " #_vfp_ ", %0" \
            ^
<inline asm>:1:6: note: instantiated into assembly here
        mcr p10, 7, r0, cr8, cr0, 0 @ fmxr      FPEXC, r0
            ^

Ideally we would replace this code with the unified assembler language
mnemonics vmrs/vmsr on call sites along with .fpu assembler directives.
The GNU assembler supports the .fpu directive at least since 2.17 (when
documentation has been added). Since Linux requires binutils 2.21 it is
safe to use .fpu directive. However, binutils does not allow to use
FPINST or FPINST2 as an argument to vmrs/vmsr instructions up to
binutils 2.24 (see binutils commit 16d02dc907c5):
arch/arm/vfp/vfphw.S: Assembler messages:
arch/arm/vfp/vfphw.S:162: Error: operand 0 must be FPSID or FPSCR pr FPEXC -- `vmsr FPINST,r6'
arch/arm/vfp/vfphw.S:165: Error: operand 0 must be FPSID or FPSCR pr FPEXC -- `vmsr FPINST2,r8'
arch/arm/vfp/vfphw.S:235: Error: operand 1 must be a VFP extension System Register -- `vmrs r3,FPINST'
arch/arm/vfp/vfphw.S:238: Error: operand 1 must be a VFP extension System Register -- `vmrs r12,FPINST2'

Use as-instr in Kconfig to check if FPINST/FPINST2 can be used. If they
can be used make use of .fpu directives and UAL VFP mnemonics for
register access.

This allows to build vfpmodule.c with Clang and its integrated assembler.

Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
Changes in v2:
- Check assembler capabilities in Kconfig instead of Makefile

 arch/arm/Kconfig                 |  2 ++
 arch/arm/Kconfig.assembler       |  6 ++++++
 arch/arm/include/asm/vfp.h       |  2 ++
 arch/arm/include/asm/vfpmacros.h | 12 +++++++++++-
 arch/arm/vfp/vfphw.S             |  1 +
 arch/arm/vfp/vfpinstr.h          | 23 +++++++++++++++++++----
 6 files changed, 41 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/Kconfig.assembler

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 66a04f6f4775..49564cc32688 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2090,3 +2090,5 @@ source "drivers/firmware/Kconfig"
 if CRYPTO
 source "arch/arm/crypto/Kconfig"
 endif
+
+source "arch/arm/Kconfig.assembler"
diff --git a/arch/arm/Kconfig.assembler b/arch/arm/Kconfig.assembler
new file mode 100644
index 000000000000..5cb31aae1188
--- /dev/null
+++ b/arch/arm/Kconfig.assembler
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config AS_VFP_VMRS_FPINST
+	def_bool $(as-instr,.fpu vfpv2\nvmrs r0$(comma)FPINST)
+	help
+	  Supported by binutils >= 2.24 and LLVM integrated assembler.
diff --git a/arch/arm/include/asm/vfp.h b/arch/arm/include/asm/vfp.h
index 7157d2a30a49..19928bfb4f9c 100644
--- a/arch/arm/include/asm/vfp.h
+++ b/arch/arm/include/asm/vfp.h
@@ -9,6 +9,7 @@
 #ifndef __ASM_VFP_H
 #define __ASM_VFP_H
 
+#ifndef CONFIG_AS_VFP_VMRS_FPINST
 #define FPSID			cr0
 #define FPSCR			cr1
 #define MVFR1			cr6
@@ -16,6 +17,7 @@
 #define FPEXC			cr8
 #define FPINST			cr9
 #define FPINST2			cr10
+#endif
 
 /* FPSID bits */
 #define FPSID_IMPLEMENTER_BIT	(24)
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 947ee5395e1f..ba0d4cb5377e 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -8,7 +8,16 @@
 
 #include <asm/vfp.h>
 
-@ Macros to allow building with old toolkits (with no VFP support)
+#ifdef CONFIG_AS_VFP_VMRS_FPINST
+	.macro	VFPFMRX, rd, sysreg, cond
+	vmrs\cond	\rd, \sysreg
+	.endm
+
+	.macro	VFPFMXR, sysreg, rd, cond
+	vmsr\cond	\sysreg, \rd
+	.endm
+#else
+	@ Macros to allow building with old toolkits (with no VFP support)
 	.macro	VFPFMRX, rd, sysreg, cond
 	MRC\cond	p10, 7, \rd, \sysreg, cr0, 0	@ FMRX	\rd, \sysreg
 	.endm
@@ -16,6 +25,7 @@
 	.macro	VFPFMXR, sysreg, rd, cond
 	MCR\cond	p10, 7, \rd, \sysreg, cr0, 0	@ FMXR	\sysreg, \rd
 	.endm
+#endif
 
 	@ read all the working registers back into the VFP
 	.macro	VFPFLDMIA, base, tmp
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index e214007a20a2..90e5659827c7 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -78,6 +78,7 @@
 ENTRY(vfp_support_entry)
 	DBGSTR3	"instr %08x pc %08x state %p", r0, r2, r10
 
+	.fpu	vfpv2
 	ldr	r3, [sp, #S_PSR]	@ Neither lazy restore nor FP exceptions
 	and	r3, r3, #MODE_MASK	@ are supported in kernel mode
 	teq	r3, #USR_MODE
diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
index 38dc154e39ff..3c7938fd40aa 100644
--- a/arch/arm/vfp/vfpinstr.h
+++ b/arch/arm/vfp/vfpinstr.h
@@ -62,10 +62,23 @@
 #define FPSCR_C (1 << 29)
 #define FPSCR_V	(1 << 28)
 
-/*
- * Since we aren't building with -mfpu=vfp, we need to code
- * these instructions using their MRC/MCR equivalents.
- */
+#ifdef CONFIG_AS_VFP_VMRS_FPINST
+
+#define fmrx(_vfp_) ({			\
+	u32 __v;			\
+	asm(".fpu	vfpv2\n"	\
+	    "vmrs	%0, " #_vfp_	\
+	    : "=r" (__v) : : "cc");	\
+	__v;				\
+ })
+
+#define fmxr(_vfp_,_var_)		\
+	asm(".fpu	vfpv2\n"	\
+	    "vmsr	" #_vfp_ ", %0"	\
+	   : : "r" (_var_) : "cc")
+
+#else
+
 #define vfpreg(_vfp_) #_vfp_
 
 #define fmrx(_vfp_) ({			\
@@ -79,6 +92,8 @@
 	asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr	" #_vfp_ ", %0"	\
 	   : : "r" (_var_) : "cc")
 
+#endif
+
 u32 vfp_single_cpdo(u32 inst, u32 fpscr);
 u32 vfp_single_cprt(u32 inst, u32 fpscr, struct pt_regs *regs);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/3] ARM: use VFP assembler mnemonics if available
@ 2020-04-19 12:35   ` Stefan Agner
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 12:35 UTC (permalink / raw)
  To: linux
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
	robin.murphy, linux-arm-kernel

Clang's integrated assembler does not allow to to use the mcr
instruction to access floating point co-processor registers:
arch/arm/vfp/vfpmodule.c:342:2: error: invalid operand for instruction
        fmxr(FPEXC, fpexc & ~(FPEXC_EX|FPEXC_DEX|FPEXC_FP2V|FPEXC_VV|FPEXC_TRAP_MASK));
        ^
arch/arm/vfp/vfpinstr.h:79:6: note: expanded from macro 'fmxr'
        asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr   " #_vfp_ ", %0" \
            ^
<inline asm>:1:6: note: instantiated into assembly here
        mcr p10, 7, r0, cr8, cr0, 0 @ fmxr      FPEXC, r0
            ^

Ideally we would replace this code with the unified assembler language
mnemonics vmrs/vmsr on call sites along with .fpu assembler directives.
The GNU assembler supports the .fpu directive at least since 2.17 (when
documentation has been added). Since Linux requires binutils 2.21 it is
safe to use .fpu directive. However, binutils does not allow to use
FPINST or FPINST2 as an argument to vmrs/vmsr instructions up to
binutils 2.24 (see binutils commit 16d02dc907c5):
arch/arm/vfp/vfphw.S: Assembler messages:
arch/arm/vfp/vfphw.S:162: Error: operand 0 must be FPSID or FPSCR pr FPEXC -- `vmsr FPINST,r6'
arch/arm/vfp/vfphw.S:165: Error: operand 0 must be FPSID or FPSCR pr FPEXC -- `vmsr FPINST2,r8'
arch/arm/vfp/vfphw.S:235: Error: operand 1 must be a VFP extension System Register -- `vmrs r3,FPINST'
arch/arm/vfp/vfphw.S:238: Error: operand 1 must be a VFP extension System Register -- `vmrs r12,FPINST2'

Use as-instr in Kconfig to check if FPINST/FPINST2 can be used. If they
can be used make use of .fpu directives and UAL VFP mnemonics for
register access.

This allows to build vfpmodule.c with Clang and its integrated assembler.

Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
Changes in v2:
- Check assembler capabilities in Kconfig instead of Makefile

 arch/arm/Kconfig                 |  2 ++
 arch/arm/Kconfig.assembler       |  6 ++++++
 arch/arm/include/asm/vfp.h       |  2 ++
 arch/arm/include/asm/vfpmacros.h | 12 +++++++++++-
 arch/arm/vfp/vfphw.S             |  1 +
 arch/arm/vfp/vfpinstr.h          | 23 +++++++++++++++++++----
 6 files changed, 41 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/Kconfig.assembler

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 66a04f6f4775..49564cc32688 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2090,3 +2090,5 @@ source "drivers/firmware/Kconfig"
 if CRYPTO
 source "arch/arm/crypto/Kconfig"
 endif
+
+source "arch/arm/Kconfig.assembler"
diff --git a/arch/arm/Kconfig.assembler b/arch/arm/Kconfig.assembler
new file mode 100644
index 000000000000..5cb31aae1188
--- /dev/null
+++ b/arch/arm/Kconfig.assembler
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config AS_VFP_VMRS_FPINST
+	def_bool $(as-instr,.fpu vfpv2\nvmrs r0$(comma)FPINST)
+	help
+	  Supported by binutils >= 2.24 and LLVM integrated assembler.
diff --git a/arch/arm/include/asm/vfp.h b/arch/arm/include/asm/vfp.h
index 7157d2a30a49..19928bfb4f9c 100644
--- a/arch/arm/include/asm/vfp.h
+++ b/arch/arm/include/asm/vfp.h
@@ -9,6 +9,7 @@
 #ifndef __ASM_VFP_H
 #define __ASM_VFP_H
 
+#ifndef CONFIG_AS_VFP_VMRS_FPINST
 #define FPSID			cr0
 #define FPSCR			cr1
 #define MVFR1			cr6
@@ -16,6 +17,7 @@
 #define FPEXC			cr8
 #define FPINST			cr9
 #define FPINST2			cr10
+#endif
 
 /* FPSID bits */
 #define FPSID_IMPLEMENTER_BIT	(24)
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 947ee5395e1f..ba0d4cb5377e 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -8,7 +8,16 @@
 
 #include <asm/vfp.h>
 
-@ Macros to allow building with old toolkits (with no VFP support)
+#ifdef CONFIG_AS_VFP_VMRS_FPINST
+	.macro	VFPFMRX, rd, sysreg, cond
+	vmrs\cond	\rd, \sysreg
+	.endm
+
+	.macro	VFPFMXR, sysreg, rd, cond
+	vmsr\cond	\sysreg, \rd
+	.endm
+#else
+	@ Macros to allow building with old toolkits (with no VFP support)
 	.macro	VFPFMRX, rd, sysreg, cond
 	MRC\cond	p10, 7, \rd, \sysreg, cr0, 0	@ FMRX	\rd, \sysreg
 	.endm
@@ -16,6 +25,7 @@
 	.macro	VFPFMXR, sysreg, rd, cond
 	MCR\cond	p10, 7, \rd, \sysreg, cr0, 0	@ FMXR	\sysreg, \rd
 	.endm
+#endif
 
 	@ read all the working registers back into the VFP
 	.macro	VFPFLDMIA, base, tmp
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index e214007a20a2..90e5659827c7 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -78,6 +78,7 @@
 ENTRY(vfp_support_entry)
 	DBGSTR3	"instr %08x pc %08x state %p", r0, r2, r10
 
+	.fpu	vfpv2
 	ldr	r3, [sp, #S_PSR]	@ Neither lazy restore nor FP exceptions
 	and	r3, r3, #MODE_MASK	@ are supported in kernel mode
 	teq	r3, #USR_MODE
diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
index 38dc154e39ff..3c7938fd40aa 100644
--- a/arch/arm/vfp/vfpinstr.h
+++ b/arch/arm/vfp/vfpinstr.h
@@ -62,10 +62,23 @@
 #define FPSCR_C (1 << 29)
 #define FPSCR_V	(1 << 28)
 
-/*
- * Since we aren't building with -mfpu=vfp, we need to code
- * these instructions using their MRC/MCR equivalents.
- */
+#ifdef CONFIG_AS_VFP_VMRS_FPINST
+
+#define fmrx(_vfp_) ({			\
+	u32 __v;			\
+	asm(".fpu	vfpv2\n"	\
+	    "vmrs	%0, " #_vfp_	\
+	    : "=r" (__v) : : "cc");	\
+	__v;				\
+ })
+
+#define fmxr(_vfp_,_var_)		\
+	asm(".fpu	vfpv2\n"	\
+	    "vmsr	" #_vfp_ ", %0"	\
+	   : : "r" (_var_) : "cc")
+
+#else
+
 #define vfpreg(_vfp_) #_vfp_
 
 #define fmrx(_vfp_) ({			\
@@ -79,6 +92,8 @@
 	asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr	" #_vfp_ ", %0"	\
 	   : : "r" (_var_) : "cc")
 
+#endif
+
 u32 vfp_single_cpdo(u32 inst, u32 fpscr);
 u32 vfp_single_cprt(u32 inst, u32 fpscr, struct pt_regs *regs);
 
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] ARM: use .fpu assembler directives instead of assembler arguments
  2020-04-19 12:35   ` Stefan Agner
@ 2020-04-19 14:04     ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2020-04-19 14:04 UTC (permalink / raw)
  To: Stefan Agner
  Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
	ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
	linux-kernel, clang-built-linux

On Sun, Apr 19, 2020 at 02:35:49PM +0200, Stefan Agner wrote:
> Explicit FPU selection has been introduced in commit 1a6be26d5b1a
> ("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
> to make use of assembler mnemonics for VFP instructions.
> 
> However, clang currently does not support passing assembler flags
> like this and errors out with:
> clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'
> 
> Make use of the .fpu assembler directives to select the floating point
> hardware selectively. Also use the new unified assembler language
> mnemonics. This allows to build these procedures with Clang.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/762
> Signed-off-by: Stefan Agner <stefan@agner.ch>
> ---
> Changes in v2:
> - Add link in commit message
> 
>  arch/arm/vfp/Makefile |  2 --
>  arch/arm/vfp/vfphw.S  | 30 +++++++++++++++++++-----------
>  2 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
> index 9975b63ac3b0..749901a72d6d 100644
> --- a/arch/arm/vfp/Makefile
> +++ b/arch/arm/vfp/Makefile
> @@ -8,6 +8,4 @@
>  # ccflags-y := -DDEBUG
>  # asflags-y := -DDEBUG
>  
> -KBUILD_AFLAGS	:=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
> -
>  obj-y		+= vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
> diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
> index b2e560290860..e214007a20a2 100644
> --- a/arch/arm/vfp/vfphw.S
> +++ b/arch/arm/vfp/vfphw.S
> @@ -258,11 +258,13 @@ vfp_current_hw_state_address:
>  
>  ENTRY(vfp_get_float)
>  	tbl_branch r0, r3, #3
> -	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mrc	p10, 0, r0, c\dr, c0, 0	@ fmrs	r0, s0
> +	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,

Apart from the extraneous comma above, this looks fine, thanks.

> +1:	vmov	r0, s\dr
>  	ret	lr
>  	.org	1b + 8
> -1:	mrc	p10, 0, r0, c\dr, c0, 4	@ fmrs	r0, s1
> +	.endr
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	r0, s\dr
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> @@ -271,10 +273,12 @@ ENDPROC(vfp_get_float)
>  ENTRY(vfp_put_float)
>  	tbl_branch r1, r3, #3
>  	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mcr	p10, 0, r0, c\dr, c0, 0	@ fmsr	r0, s0
> +1:	vmov	s\dr, r0
>  	ret	lr
>  	.org	1b + 8
> -1:	mcr	p10, 0, r0, c\dr, c0, 4	@ fmsr	r0, s1
> +	.endr
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	s\dr, r0
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> @@ -282,15 +286,17 @@ ENDPROC(vfp_put_float)
>  
>  ENTRY(vfp_get_double)
>  	tbl_branch r0, r3, #3
> +	.fpu	vfpv2
>  	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	fmrrd	r0, r1, d\dr
> +1:	vmov	r0, r1, d\dr
>  	ret	lr
>  	.org	1b + 8
>  	.endr
>  #ifdef CONFIG_VFPv3
>  	@ d16 - d31 registers
> -	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mrrc	p11, 3, r0, r1, c\dr	@ fmrrd	r0, r1, d\dr
> +	.fpu	vfpv3
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	r0, r1, d\dr
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> @@ -304,15 +310,17 @@ ENDPROC(vfp_get_double)
>  
>  ENTRY(vfp_put_double)
>  	tbl_branch r2, r3, #3
> +	.fpu	vfpv2
>  	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	fmdrr	d\dr, r0, r1
> +1:	vmov	d\dr, r0, r1
>  	ret	lr
>  	.org	1b + 8
>  	.endr
>  #ifdef CONFIG_VFPv3
> +	.fpu	vfpv3
>  	@ d16 - d31 registers
> -	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mcrr	p11, 3, r0, r1, c\dr	@ fmdrr	r0, r1, d\dr
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	d\dr, r0, r1
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> -- 
> 2.25.1
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] ARM: use .fpu assembler directives instead of assembler arguments
@ 2020-04-19 14:04     ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2020-04-19 14:04 UTC (permalink / raw)
  To: Stefan Agner
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, jiancai, yamada.masahiro, manojgupta, robin.murphy,
	linux-arm-kernel

On Sun, Apr 19, 2020 at 02:35:49PM +0200, Stefan Agner wrote:
> Explicit FPU selection has been introduced in commit 1a6be26d5b1a
> ("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
> to make use of assembler mnemonics for VFP instructions.
> 
> However, clang currently does not support passing assembler flags
> like this and errors out with:
> clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'
> 
> Make use of the .fpu assembler directives to select the floating point
> hardware selectively. Also use the new unified assembler language
> mnemonics. This allows to build these procedures with Clang.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/762
> Signed-off-by: Stefan Agner <stefan@agner.ch>
> ---
> Changes in v2:
> - Add link in commit message
> 
>  arch/arm/vfp/Makefile |  2 --
>  arch/arm/vfp/vfphw.S  | 30 +++++++++++++++++++-----------
>  2 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
> index 9975b63ac3b0..749901a72d6d 100644
> --- a/arch/arm/vfp/Makefile
> +++ b/arch/arm/vfp/Makefile
> @@ -8,6 +8,4 @@
>  # ccflags-y := -DDEBUG
>  # asflags-y := -DDEBUG
>  
> -KBUILD_AFLAGS	:=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
> -
>  obj-y		+= vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
> diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
> index b2e560290860..e214007a20a2 100644
> --- a/arch/arm/vfp/vfphw.S
> +++ b/arch/arm/vfp/vfphw.S
> @@ -258,11 +258,13 @@ vfp_current_hw_state_address:
>  
>  ENTRY(vfp_get_float)
>  	tbl_branch r0, r3, #3
> -	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mrc	p10, 0, r0, c\dr, c0, 0	@ fmrs	r0, s0
> +	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,

Apart from the extraneous comma above, this looks fine, thanks.

> +1:	vmov	r0, s\dr
>  	ret	lr
>  	.org	1b + 8
> -1:	mrc	p10, 0, r0, c\dr, c0, 4	@ fmrs	r0, s1
> +	.endr
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	r0, s\dr
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> @@ -271,10 +273,12 @@ ENDPROC(vfp_get_float)
>  ENTRY(vfp_put_float)
>  	tbl_branch r1, r3, #3
>  	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mcr	p10, 0, r0, c\dr, c0, 0	@ fmsr	r0, s0
> +1:	vmov	s\dr, r0
>  	ret	lr
>  	.org	1b + 8
> -1:	mcr	p10, 0, r0, c\dr, c0, 4	@ fmsr	r0, s1
> +	.endr
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	s\dr, r0
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> @@ -282,15 +286,17 @@ ENDPROC(vfp_put_float)
>  
>  ENTRY(vfp_get_double)
>  	tbl_branch r0, r3, #3
> +	.fpu	vfpv2
>  	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	fmrrd	r0, r1, d\dr
> +1:	vmov	r0, r1, d\dr
>  	ret	lr
>  	.org	1b + 8
>  	.endr
>  #ifdef CONFIG_VFPv3
>  	@ d16 - d31 registers
> -	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mrrc	p11, 3, r0, r1, c\dr	@ fmrrd	r0, r1, d\dr
> +	.fpu	vfpv3
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	r0, r1, d\dr
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> @@ -304,15 +310,17 @@ ENDPROC(vfp_get_double)
>  
>  ENTRY(vfp_put_double)
>  	tbl_branch r2, r3, #3
> +	.fpu	vfpv2
>  	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	fmdrr	d\dr, r0, r1
> +1:	vmov	d\dr, r0, r1
>  	ret	lr
>  	.org	1b + 8
>  	.endr
>  #ifdef CONFIG_VFPv3
> +	.fpu	vfpv3
>  	@ d16 - d31 registers
> -	.irp	dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> -1:	mcrr	p11, 3, r0, r1, c\dr	@ fmdrr	r0, r1, d\dr
> +	.irp	dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> +1:	vmov	d\dr, r0, r1
>  	ret	lr
>  	.org	1b + 8
>  	.endr
> -- 
> 2.25.1
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible
  2020-04-19 12:35 ` Stefan Agner
@ 2020-04-19 14:12   ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2020-04-19 14:12 UTC (permalink / raw)
  To: Stefan Agner
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, jiancai, yamada.masahiro, manojgupta, robin.murphy,
	linux-arm-kernel

On Sun, Apr 19, 2020 at 02:35:48PM +0200, Stefan Agner wrote:
> To build the kernel with Clang's integrated assembler the VFP code needs
> to make use of the unified assembler language (UAL) VFP mnemonics.
> 
> At first I tried to get rid of the co-processor instructions to access
> the floating point unit along with the macros completely. However, due
> to missing FPINST/FPINST2 argument support in older binutils versions we
> have to keep them around. Once we drop support for binutils 2.24 and
> older, the move to UAL VFP mnemonics will be straight forward with this
> changes applied.
> 
> Tested using Clang with integrated assembler as well as external
> (binutils assembler), various gcc/binutils version down to 4.7/2.23.
> Disassembled and compared the object files in arch/arm/vfp/ to make
> sure this changes leads to the same code. Besides different inlining
> behavior I was not able to spot a difference.
> 
> In v2 the check for FPINST argument support is now made in Kconfig.

Given what I said in the other thread, Clang really _should_ allow
the MCR/MRC et.al. instructions to access the VFP registers.  There
is no reason to specifically block them.

As we have seen with FPA, having that ability when iWMMXT comes along
is very useful.  In any case:

1. The ARM ARM (DDI0406) states that "These instructions are MRC and MCR
instructions for coprocessors 10 and 11." in section A7.8.

2. The ARM ARM (DDI0406) describes the MRC and MCR instructions as
being able to access _any_ co-processor.

So, Clang deciding that it's going to block access to coprocessor 10
and 11 because some version of the architecture _also_ defines these
as VFP instructions is really not on, and Clang needs to be fixed
irrespective of these patches - and I want to know that *is* going to
get fixed before I take these patches into the kernel.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible
@ 2020-04-19 14:12   ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2020-04-19 14:12 UTC (permalink / raw)
  To: Stefan Agner
  Cc: arnd, ard.biesheuvel, yamada.masahiro, ndesaulniers,
	linux-kernel, jiancai, clang-built-linux, manojgupta,
	robin.murphy, linux-arm-kernel

On Sun, Apr 19, 2020 at 02:35:48PM +0200, Stefan Agner wrote:
> To build the kernel with Clang's integrated assembler the VFP code needs
> to make use of the unified assembler language (UAL) VFP mnemonics.
> 
> At first I tried to get rid of the co-processor instructions to access
> the floating point unit along with the macros completely. However, due
> to missing FPINST/FPINST2 argument support in older binutils versions we
> have to keep them around. Once we drop support for binutils 2.24 and
> older, the move to UAL VFP mnemonics will be straight forward with this
> changes applied.
> 
> Tested using Clang with integrated assembler as well as external
> (binutils assembler), various gcc/binutils version down to 4.7/2.23.
> Disassembled and compared the object files in arch/arm/vfp/ to make
> sure this changes leads to the same code. Besides different inlining
> behavior I was not able to spot a difference.
> 
> In v2 the check for FPINST argument support is now made in Kconfig.

Given what I said in the other thread, Clang really _should_ allow
the MCR/MRC et.al. instructions to access the VFP registers.  There
is no reason to specifically block them.

As we have seen with FPA, having that ability when iWMMXT comes along
is very useful.  In any case:

1. The ARM ARM (DDI0406) states that "These instructions are MRC and MCR
instructions for coprocessors 10 and 11." in section A7.8.

2. The ARM ARM (DDI0406) describes the MRC and MCR instructions as
being able to access _any_ co-processor.

So, Clang deciding that it's going to block access to coprocessor 10
and 11 because some version of the architecture _also_ defines these
as VFP instructions is really not on, and Clang needs to be fixed
irrespective of these patches - and I want to know that *is* going to
get fixed before I take these patches into the kernel.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible
  2020-04-19 14:12   ` Russell King - ARM Linux admin
@ 2020-04-19 21:20     ` Stefan Agner
  -1 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 21:20 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
	linux-kernel, jiancai, yamada.masahiro, manojgupta, robin.murphy,
	linux-arm-kernel

On 2020-04-19 16:12, Russell King - ARM Linux admin wrote:
> On Sun, Apr 19, 2020 at 02:35:48PM +0200, Stefan Agner wrote:
>> To build the kernel with Clang's integrated assembler the VFP code needs
>> to make use of the unified assembler language (UAL) VFP mnemonics.
>>
>> At first I tried to get rid of the co-processor instructions to access
>> the floating point unit along with the macros completely. However, due
>> to missing FPINST/FPINST2 argument support in older binutils versions we
>> have to keep them around. Once we drop support for binutils 2.24 and
>> older, the move to UAL VFP mnemonics will be straight forward with this
>> changes applied.
>>
>> Tested using Clang with integrated assembler as well as external
>> (binutils assembler), various gcc/binutils version down to 4.7/2.23.
>> Disassembled and compared the object files in arch/arm/vfp/ to make
>> sure this changes leads to the same code. Besides different inlining
>> behavior I was not able to spot a difference.
>>
>> In v2 the check for FPINST argument support is now made in Kconfig.
> 
> Given what I said in the other thread, Clang really _should_ allow
> the MCR/MRC et.al. instructions to access the VFP registers.  There
> is no reason to specifically block them.

I agree, and I am working on changing this.

There have been discussions about co-processor register access a while
back in the LLVM/Clang community [1]. Peter Smith pointed this out in
the ClangBuiltLinux issue tracker [2], which also has some more context.
I did submit a patch [3] to convert use of cp10/cp11 in ARMv7 contexts
to a warning. However it got stale, I'll have to revisit.

There is actually another case where this issue blocks Clang's
integrated assembler: In arch/arm/kernel/perf_event_v7.c, function
venum_read_pmresr mcr/mrc is used to access the performance monitor
registers for Qualcomm's Krait/Scorpion PMU, and in this case there is
no mnemonic available.

> 
> As we have seen with FPA, having that ability when iWMMXT comes along
> is very useful.  In any case:
> 
> 1. The ARM ARM (DDI0406) states that "These instructions are MRC and MCR
> instructions for coprocessors 10 and 11." in section A7.8.
> 
> 2. The ARM ARM (DDI0406) describes the MRC and MCR instructions as
> being able to access _any_ co-processor.

These are good arguments I can use in case my patch stirs up a
discussion, thanks for the hints!

> 
> So, Clang deciding that it's going to block access to coprocessor 10
> and 11 because some version of the architecture _also_ defines these
> as VFP instructions is really not on, and Clang needs to be fixed
> irrespective of these patches - and I want to know that *is* going to
> get fixed before I take these patches into the kernel.

I'll try. We'll see.

[1] https://bugs.llvm.org/show_bug.cgi?id=20025
[2] https://github.com/ClangBuiltLinux/linux/issues/306
[3] https://reviews.llvm.org/D59733

--
Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible
@ 2020-04-19 21:20     ` Stefan Agner
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Agner @ 2020-04-19 21:20 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: arnd, ard.biesheuvel, yamada.masahiro, ndesaulniers,
	linux-kernel, jiancai, clang-built-linux, manojgupta,
	robin.murphy, linux-arm-kernel

On 2020-04-19 16:12, Russell King - ARM Linux admin wrote:
> On Sun, Apr 19, 2020 at 02:35:48PM +0200, Stefan Agner wrote:
>> To build the kernel with Clang's integrated assembler the VFP code needs
>> to make use of the unified assembler language (UAL) VFP mnemonics.
>>
>> At first I tried to get rid of the co-processor instructions to access
>> the floating point unit along with the macros completely. However, due
>> to missing FPINST/FPINST2 argument support in older binutils versions we
>> have to keep them around. Once we drop support for binutils 2.24 and
>> older, the move to UAL VFP mnemonics will be straight forward with this
>> changes applied.
>>
>> Tested using Clang with integrated assembler as well as external
>> (binutils assembler), various gcc/binutils version down to 4.7/2.23.
>> Disassembled and compared the object files in arch/arm/vfp/ to make
>> sure this changes leads to the same code. Besides different inlining
>> behavior I was not able to spot a difference.
>>
>> In v2 the check for FPINST argument support is now made in Kconfig.
> 
> Given what I said in the other thread, Clang really _should_ allow
> the MCR/MRC et.al. instructions to access the VFP registers.  There
> is no reason to specifically block them.

I agree, and I am working on changing this.

There have been discussions about co-processor register access a while
back in the LLVM/Clang community [1]. Peter Smith pointed this out in
the ClangBuiltLinux issue tracker [2], which also has some more context.
I did submit a patch [3] to convert use of cp10/cp11 in ARMv7 contexts
to a warning. However it got stale, I'll have to revisit.

There is actually another case where this issue blocks Clang's
integrated assembler: In arch/arm/kernel/perf_event_v7.c, function
venum_read_pmresr mcr/mrc is used to access the performance monitor
registers for Qualcomm's Krait/Scorpion PMU, and in this case there is
no mnemonic available.

> 
> As we have seen with FPA, having that ability when iWMMXT comes along
> is very useful.  In any case:
> 
> 1. The ARM ARM (DDI0406) states that "These instructions are MRC and MCR
> instructions for coprocessors 10 and 11." in section A7.8.
> 
> 2. The ARM ARM (DDI0406) describes the MRC and MCR instructions as
> being able to access _any_ co-processor.

These are good arguments I can use in case my patch stirs up a
discussion, thanks for the hints!

> 
> So, Clang deciding that it's going to block access to coprocessor 10
> and 11 because some version of the architecture _also_ defines these
> as VFP instructions is really not on, and Clang needs to be fixed
> irrespective of these patches - and I want to know that *is* going to
> get fixed before I take these patches into the kernel.

I'll try. We'll see.

[1] https://bugs.llvm.org/show_bug.cgi?id=20025
[2] https://github.com/ClangBuiltLinux/linux/issues/306
[3] https://reviews.llvm.org/D59733

--
Stefan

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-19 21:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-19 12:35 [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible Stefan Agner
2020-04-19 12:35 ` Stefan Agner
2020-04-19 12:35 ` [PATCH v2 1/3] ARM: use .fpu assembler directives instead of assembler arguments Stefan Agner
2020-04-19 12:35   ` Stefan Agner
2020-04-19 14:04   ` Russell King - ARM Linux admin
2020-04-19 14:04     ` Russell King - ARM Linux admin
2020-04-19 12:35 ` [PATCH v2 2/3] ARM: use VFP assembler mnemonics in register load/store macros Stefan Agner
2020-04-19 12:35   ` Stefan Agner
2020-04-19 12:35 ` [PATCH v2 3/3] ARM: use VFP assembler mnemonics if available Stefan Agner
2020-04-19 12:35   ` Stefan Agner
2020-04-19 14:12 ` [PATCH v2 0/3] ARM: make use of UAL VFP mnemonics when possible Russell King - ARM Linux admin
2020-04-19 14:12   ` Russell King - ARM Linux admin
2020-04-19 21:20   ` Stefan Agner
2020-04-19 21:20     ` Stefan Agner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.