* [PATCH 0/3] ARM: make use of UAL VFP mnemonics when possible
@ 2020-03-10 22:01 ` Stefan Agner
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
linux-kernel, clang-built-linux, Stefan Agner
To build the kernel with Clang's integrated assembler the VFP code needs
to make use of the unified assembler language (UAL) VFP mnemonics.
At first I tried to get rid of the co-processor instructions to access
the floating point unit along with the macros completely. However, due
to missing FPINST/FPINST2 argument support in older binutils versions we
have to keep them around. Once we drop support for binutils 2.24 and
older, the move to UAL VFP mnemonics will be straight forward with this
changes applied.
Tested using Clang with integrated assembler as well as external
(binutils assembler), various gcc/binutils version down to 4.7/2.23.
Disassembled and compared the object files in arch/arm/vfp/ to make
sure this changes leads to the same code. Besides different inlining
behavior I was not able to spot a difference.
This replaces (and extends) my earlier patch "ARM: use assembly mnemonics
for VFP register access"
http://lore.kernel.org/r/8bb16ac4b15a7e28a8e819ef9aae20bfc3f75fbc.1582266841.git.stefan@agner.ch
--
Stefan
Stefan Agner (3):
ARM: use .fpu assembler directives instead of assembler arguments
ARM: use VFP assembler mnemonics in register load/store macros
ARM: use VFP assembler mnemonics if available
arch/arm/include/asm/vfp.h | 2 ++
arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
arch/arm/vfp/Makefile | 5 ++++-
arch/arm/vfp/vfphw.S | 31 ++++++++++++++++++++-----------
arch/arm/vfp/vfpinstr.h | 23 +++++++++++++++++++----
5 files changed, 67 insertions(+), 25 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 0/3] ARM: make use of UAL VFP mnemonics when possible
@ 2020-03-10 22:01 ` Stefan Agner
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
robin.murphy, linux-arm-kernel
To build the kernel with Clang's integrated assembler the VFP code needs
to make use of the unified assembler language (UAL) VFP mnemonics.
At first I tried to get rid of the co-processor instructions to access
the floating point unit along with the macros completely. However, due
to missing FPINST/FPINST2 argument support in older binutils versions we
have to keep them around. Once we drop support for binutils 2.24 and
older, the move to UAL VFP mnemonics will be straight forward with this
changes applied.
Tested using Clang with integrated assembler as well as external
(binutils assembler), various gcc/binutils version down to 4.7/2.23.
Disassembled and compared the object files in arch/arm/vfp/ to make
sure this changes leads to the same code. Besides different inlining
behavior I was not able to spot a difference.
This replaces (and extends) my earlier patch "ARM: use assembly mnemonics
for VFP register access"
http://lore.kernel.org/r/8bb16ac4b15a7e28a8e819ef9aae20bfc3f75fbc.1582266841.git.stefan@agner.ch
--
Stefan
Stefan Agner (3):
ARM: use .fpu assembler directives instead of assembler arguments
ARM: use VFP assembler mnemonics in register load/store macros
ARM: use VFP assembler mnemonics if available
arch/arm/include/asm/vfp.h | 2 ++
arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
arch/arm/vfp/Makefile | 5 ++++-
arch/arm/vfp/vfphw.S | 31 ++++++++++++++++++++-----------
arch/arm/vfp/vfpinstr.h | 23 +++++++++++++++++++----
5 files changed, 67 insertions(+), 25 deletions(-)
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] ARM: use .fpu assembler directives instead of assembler arguments
2020-03-10 22:01 ` Stefan Agner
@ 2020-03-10 22:01 ` Stefan Agner
-1 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
linux-kernel, clang-built-linux, Stefan Agner
Explicit FPU selection has been introduced in commit 1a6be26d5b1a
("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
to make use of assembler mnemonics for VFP instructions.
However, clang currently does not support passing assembler flags
like this and errors out with:
clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'
Make use of the .fpu assembler directives to select the floating point
hardware selectively. Also use the new unified assembler language
mnemonics. This allows to build these procedures with Clang.
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
arch/arm/vfp/Makefile | 2 --
arch/arm/vfp/vfphw.S | 30 +++++++++++++++++++-----------
2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 9975b63ac3b0..749901a72d6d 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,6 +8,4 @@
# ccflags-y := -DDEBUG
# asflags-y := -DDEBUG
-KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
-
obj-y += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index b2e560290860..e214007a20a2 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -258,11 +258,13 @@ vfp_current_hw_state_address:
ENTRY(vfp_get_float)
tbl_branch r0, r3, #3
- .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mrc p10, 0, r0, c\dr, c0, 0 @ fmrs r0, s0
+ .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+1: vmov r0, s\dr
ret lr
.org 1b + 8
-1: mrc p10, 0, r0, c\dr, c0, 4 @ fmrs r0, s1
+ .endr
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov r0, s\dr
ret lr
.org 1b + 8
.endr
@@ -271,10 +273,12 @@ ENDPROC(vfp_get_float)
ENTRY(vfp_put_float)
tbl_branch r1, r3, #3
.irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mcr p10, 0, r0, c\dr, c0, 0 @ fmsr r0, s0
+1: vmov s\dr, r0
ret lr
.org 1b + 8
-1: mcr p10, 0, r0, c\dr, c0, 4 @ fmsr r0, s1
+ .endr
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov s\dr, r0
ret lr
.org 1b + 8
.endr
@@ -282,15 +286,17 @@ ENDPROC(vfp_put_float)
ENTRY(vfp_get_double)
tbl_branch r0, r3, #3
+ .fpu vfpv2
.irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: fmrrd r0, r1, d\dr
+1: vmov r0, r1, d\dr
ret lr
.org 1b + 8
.endr
#ifdef CONFIG_VFPv3
@ d16 - d31 registers
- .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mrrc p11, 3, r0, r1, c\dr @ fmrrd r0, r1, d\dr
+ .fpu vfpv3
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov r0, r1, d\dr
ret lr
.org 1b + 8
.endr
@@ -304,15 +310,17 @@ ENDPROC(vfp_get_double)
ENTRY(vfp_put_double)
tbl_branch r2, r3, #3
+ .fpu vfpv2
.irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: fmdrr d\dr, r0, r1
+1: vmov d\dr, r0, r1
ret lr
.org 1b + 8
.endr
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
@ d16 - d31 registers
- .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mcrr p11, 3, r0, r1, c\dr @ fmdrr r0, r1, d\dr
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov d\dr, r0, r1
ret lr
.org 1b + 8
.endr
--
2.25.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/3] ARM: use .fpu assembler directives instead of assembler arguments
@ 2020-03-10 22:01 ` Stefan Agner
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
robin.murphy, linux-arm-kernel
Explicit FPU selection has been introduced in commit 1a6be26d5b1a
("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
to make use of assembler mnemonics for VFP instructions.
However, clang currently does not support passing assembler flags
like this and errors out with:
clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'
Make use of the .fpu assembler directives to select the floating point
hardware selectively. Also use the new unified assembler language
mnemonics. This allows to build these procedures with Clang.
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
arch/arm/vfp/Makefile | 2 --
arch/arm/vfp/vfphw.S | 30 +++++++++++++++++++-----------
2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 9975b63ac3b0..749901a72d6d 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,6 +8,4 @@
# ccflags-y := -DDEBUG
# asflags-y := -DDEBUG
-KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
-
obj-y += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index b2e560290860..e214007a20a2 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -258,11 +258,13 @@ vfp_current_hw_state_address:
ENTRY(vfp_get_float)
tbl_branch r0, r3, #3
- .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mrc p10, 0, r0, c\dr, c0, 0 @ fmrs r0, s0
+ .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+1: vmov r0, s\dr
ret lr
.org 1b + 8
-1: mrc p10, 0, r0, c\dr, c0, 4 @ fmrs r0, s1
+ .endr
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov r0, s\dr
ret lr
.org 1b + 8
.endr
@@ -271,10 +273,12 @@ ENDPROC(vfp_get_float)
ENTRY(vfp_put_float)
tbl_branch r1, r3, #3
.irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mcr p10, 0, r0, c\dr, c0, 0 @ fmsr r0, s0
+1: vmov s\dr, r0
ret lr
.org 1b + 8
-1: mcr p10, 0, r0, c\dr, c0, 4 @ fmsr r0, s1
+ .endr
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov s\dr, r0
ret lr
.org 1b + 8
.endr
@@ -282,15 +286,17 @@ ENDPROC(vfp_put_float)
ENTRY(vfp_get_double)
tbl_branch r0, r3, #3
+ .fpu vfpv2
.irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: fmrrd r0, r1, d\dr
+1: vmov r0, r1, d\dr
ret lr
.org 1b + 8
.endr
#ifdef CONFIG_VFPv3
@ d16 - d31 registers
- .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mrrc p11, 3, r0, r1, c\dr @ fmrrd r0, r1, d\dr
+ .fpu vfpv3
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov r0, r1, d\dr
ret lr
.org 1b + 8
.endr
@@ -304,15 +310,17 @@ ENDPROC(vfp_get_double)
ENTRY(vfp_put_double)
tbl_branch r2, r3, #3
+ .fpu vfpv2
.irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: fmdrr d\dr, r0, r1
+1: vmov d\dr, r0, r1
ret lr
.org 1b + 8
.endr
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
@ d16 - d31 registers
- .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
-1: mcrr p11, 3, r0, r1, c\dr @ fmdrr r0, r1, d\dr
+ .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+1: vmov d\dr, r0, r1
ret lr
.org 1b + 8
.endr
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] ARM: use VFP assembler mnemonics in register load/store macros
2020-03-10 22:01 ` Stefan Agner
@ 2020-03-10 22:01 ` Stefan Agner
-1 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
linux-kernel, clang-built-linux, Stefan Agner
Clang's integrated assembler does not allow to access the VFP registers
through the coprocessor load/store instructions:
<instantiation>:4:6: error: invalid operand for instruction
LDC p11, cr0, [r10],#32*4 @ FLDMIAD r10!, {d0-d15}
^
Replace the coprocessor load/store instructions with explicit assembler
mnemonics to accessing the floating point coprocessor registers. Use
assembler directives to select the appropriate FPU version.
This allows to build these macros with GNU assembler as well as with
Clang's built-in assembler.
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
arch/arm/include/asm/vfpmacros.h | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 628c336e8e3b..947ee5395e1f 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -19,23 +19,25 @@
@ read all the working registers back into the VFP
.macro VFPFLDMIA, base, tmp
+ .fpu vfpv2
#if __LINUX_ARM_ARCH__ < 6
- LDC p11, cr0, [\base],#33*4 @ FLDMIAX \base!, {d0-d15}
+ fldmiax \base!, {d0-d15}
#else
- LDC p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d0-d15}
+ vldmia \base!, {d0-d15}
#endif
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
#if __LINUX_ARM_ARCH__ <= 6
ldr \tmp, =elf_hwcap @ may not have MVFR regs
ldr \tmp, [\tmp, #0]
tst \tmp, #HWCAP_VFPD32
- ldclne p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+ vldmiane \base!, {d16-d31}
addeq \base, \base, #32*4 @ step over unused register space
#else
VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field
cmp \tmp, #2 @ 32 x 64bit registers?
- ldcleq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+ vldmiaeq \base!, {d16-d31}
addne \base, \base, #32*4 @ step over unused register space
#endif
#endif
@@ -44,22 +46,23 @@
@ write all the working registers out of the VFP
.macro VFPFSTMIA, base, tmp
#if __LINUX_ARM_ARCH__ < 6
- STC p11, cr0, [\base],#33*4 @ FSTMIAX \base!, {d0-d15}
+ fstmiax \base!, {d0-d15}
#else
- STC p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d0-d15}
+ vstmia \base!, {d0-d15}
#endif
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
#if __LINUX_ARM_ARCH__ <= 6
ldr \tmp, =elf_hwcap @ may not have MVFR regs
ldr \tmp, [\tmp, #0]
tst \tmp, #HWCAP_VFPD32
- stclne p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+ vstmiane \base!, {d16-d31}
addeq \base, \base, #32*4 @ step over unused register space
#else
VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field
cmp \tmp, #2 @ 32 x 64bit registers?
- stcleq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+ vstmiaeq \base!, {d16-d31}
addne \base, \base, #32*4 @ step over unused register space
#endif
#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] ARM: use VFP assembler mnemonics in register load/store macros
@ 2020-03-10 22:01 ` Stefan Agner
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
robin.murphy, linux-arm-kernel
Clang's integrated assembler does not allow to access the VFP registers
through the coprocessor load/store instructions:
<instantiation>:4:6: error: invalid operand for instruction
LDC p11, cr0, [r10],#32*4 @ FLDMIAD r10!, {d0-d15}
^
Replace the coprocessor load/store instructions with explicit assembler
mnemonics to accessing the floating point coprocessor registers. Use
assembler directives to select the appropriate FPU version.
This allows to build these macros with GNU assembler as well as with
Clang's built-in assembler.
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
arch/arm/include/asm/vfpmacros.h | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 628c336e8e3b..947ee5395e1f 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -19,23 +19,25 @@
@ read all the working registers back into the VFP
.macro VFPFLDMIA, base, tmp
+ .fpu vfpv2
#if __LINUX_ARM_ARCH__ < 6
- LDC p11, cr0, [\base],#33*4 @ FLDMIAX \base!, {d0-d15}
+ fldmiax \base!, {d0-d15}
#else
- LDC p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d0-d15}
+ vldmia \base!, {d0-d15}
#endif
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
#if __LINUX_ARM_ARCH__ <= 6
ldr \tmp, =elf_hwcap @ may not have MVFR regs
ldr \tmp, [\tmp, #0]
tst \tmp, #HWCAP_VFPD32
- ldclne p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+ vldmiane \base!, {d16-d31}
addeq \base, \base, #32*4 @ step over unused register space
#else
VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field
cmp \tmp, #2 @ 32 x 64bit registers?
- ldcleq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+ vldmiaeq \base!, {d16-d31}
addne \base, \base, #32*4 @ step over unused register space
#endif
#endif
@@ -44,22 +46,23 @@
@ write all the working registers out of the VFP
.macro VFPFSTMIA, base, tmp
#if __LINUX_ARM_ARCH__ < 6
- STC p11, cr0, [\base],#33*4 @ FSTMIAX \base!, {d0-d15}
+ fstmiax \base!, {d0-d15}
#else
- STC p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d0-d15}
+ vstmia \base!, {d0-d15}
#endif
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
#if __LINUX_ARM_ARCH__ <= 6
ldr \tmp, =elf_hwcap @ may not have MVFR regs
ldr \tmp, [\tmp, #0]
tst \tmp, #HWCAP_VFPD32
- stclne p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+ vstmiane \base!, {d16-d31}
addeq \base, \base, #32*4 @ step over unused register space
#else
VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field
cmp \tmp, #2 @ 32 x 64bit registers?
- stcleq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+ vstmiaeq \base!, {d16-d31}
addne \base, \base, #32*4 @ step over unused register space
#endif
#endif
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] ARM: use VFP assembler mnemonics if available
2020-03-10 22:01 ` Stefan Agner
@ 2020-03-10 22:01 ` Stefan Agner
-1 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: arnd, ard.biesheuvel, robin.murphy, yamada.masahiro,
ndesaulniers, manojgupta, jiancai, linux-arm-kernel,
linux-kernel, clang-built-linux, Stefan Agner
Clang's integrated assembler does not allow to to use the mcr
instruction to access floating point co-processor registers:
arch/arm/vfp/vfpmodule.c:342:2: error: invalid operand for instruction
fmxr(FPEXC, fpexc & ~(FPEXC_EX|FPEXC_DEX|FPEXC_FP2V|FPEXC_VV|FPEXC_TRAP_MASK));
^
arch/arm/vfp/vfpinstr.h:79:6: note: expanded from macro 'fmxr'
asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0" \
^
<inline asm>:1:6: note: instantiated into assembly here
mcr p10, 7, r0, cr8, cr0, 0 @ fmxr FPEXC, r0
^
Ideally we would replace this code with the unified assembler language
mnemonics vmrs/vmsr on call sites along with .fpu assembler directives.
The GNU assembler supports the .fpu directive at least since 2.17 (when
documentation has been added). Since Linux requires binutils 2.21 it is
safe to use .fpu directive. However, binutils does not allow to use
FPINST or FPINST2 as an argument to vmrs/vmsr instructions up to
binutils 2.24 (see binutils commit 16d02dc907c5).
Use as-instr to check if FPINST/FPINST2 can be used. If they can be used
make use of .fpu directives and UAL VFP mnemonics for register access.
This allows to build vfpmodule.c with Clang and its integrated assembler.
Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
arch/arm/include/asm/vfp.h | 2 ++
arch/arm/include/asm/vfpmacros.h | 12 +++++++++++-
arch/arm/vfp/Makefile | 5 +++++
arch/arm/vfp/vfphw.S | 1 +
arch/arm/vfp/vfpinstr.h | 23 +++++++++++++++++++----
5 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/arch/arm/include/asm/vfp.h b/arch/arm/include/asm/vfp.h
index 7157d2a30a49..a73c29ff4d1f 100644
--- a/arch/arm/include/asm/vfp.h
+++ b/arch/arm/include/asm/vfp.h
@@ -9,6 +9,7 @@
#ifndef __ASM_VFP_H
#define __ASM_VFP_H
+#ifndef HAVE_VMRS_FPINST
#define FPSID cr0
#define FPSCR cr1
#define MVFR1 cr6
@@ -16,6 +17,7 @@
#define FPEXC cr8
#define FPINST cr9
#define FPINST2 cr10
+#endif
/* FPSID bits */
#define FPSID_IMPLEMENTER_BIT (24)
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 947ee5395e1f..eb8d3738f227 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -8,7 +8,16 @@
#include <asm/vfp.h>
-@ Macros to allow building with old toolkits (with no VFP support)
+#ifdef HAVE_VMRS_FPINST
+ .macro VFPFMRX, rd, sysreg, cond
+ vmrs\cond \rd, \sysreg
+ .endm
+
+ .macro VFPFMXR, sysreg, rd, cond
+ vmsr\cond \sysreg, \rd
+ .endm
+#else
+ @ Macros to allow building with old toolkits (with no VFP support)
.macro VFPFMRX, rd, sysreg, cond
MRC\cond p10, 7, \rd, \sysreg, cr0, 0 @ FMRX \rd, \sysreg
.endm
@@ -16,6 +25,7 @@
.macro VFPFMXR, sysreg, rd, cond
MCR\cond p10, 7, \rd, \sysreg, cr0, 0 @ FMXR \sysreg, \rd
.endm
+#endif
@ read all the working registers back into the VFP
.macro VFPFLDMIA, base, tmp
diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 749901a72d6d..f145c99fba6b 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,4 +8,9 @@
# ccflags-y := -DDEBUG
# asflags-y := -DDEBUG
+vmrs_fpinst := $(call as-instr,.fpu vfpv2\nvmrs r0$(comma)FPINST,-DHAVE_VMRS_FPINST=1)
+
+KBUILD_CFLAGS += $(vmrs_fpinst)
+KBUILD_AFLAGS += $(vmrs_fpinst)
+
obj-y += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index e214007a20a2..90e5659827c7 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -78,6 +78,7 @@
ENTRY(vfp_support_entry)
DBGSTR3 "instr %08x pc %08x state %p", r0, r2, r10
+ .fpu vfpv2
ldr r3, [sp, #S_PSR] @ Neither lazy restore nor FP exceptions
and r3, r3, #MODE_MASK @ are supported in kernel mode
teq r3, #USR_MODE
diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
index 38dc154e39ff..0db3825c4b4f 100644
--- a/arch/arm/vfp/vfpinstr.h
+++ b/arch/arm/vfp/vfpinstr.h
@@ -62,10 +62,23 @@
#define FPSCR_C (1 << 29)
#define FPSCR_V (1 << 28)
-/*
- * Since we aren't building with -mfpu=vfp, we need to code
- * these instructions using their MRC/MCR equivalents.
- */
+#ifdef HAVE_VMRS_FPINST
+
+#define fmrx(_vfp_) ({ \
+ u32 __v; \
+ asm(".fpu vfpv2\n" \
+ "vmrs %0, " #_vfp_ \
+ : "=r" (__v) : : "cc"); \
+ __v; \
+ })
+
+#define fmxr(_vfp_,_var_) \
+ asm(".fpu vfpv2\n" \
+ "vmsr " #_vfp_ ", %0" \
+ : : "r" (_var_) : "cc")
+
+#else
+
#define vfpreg(_vfp_) #_vfp_
#define fmrx(_vfp_) ({ \
@@ -79,6 +92,8 @@
asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0" \
: : "r" (_var_) : "cc")
+#endif
+
u32 vfp_single_cpdo(u32 inst, u32 fpscr);
u32 vfp_single_cprt(u32 inst, u32 fpscr, struct pt_regs *regs);
--
2.25.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] ARM: use VFP assembler mnemonics if available
@ 2020-03-10 22:01 ` Stefan Agner
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Agner @ 2020-03-10 22:01 UTC (permalink / raw)
To: linux
Cc: clang-built-linux, arnd, ard.biesheuvel, ndesaulniers,
linux-kernel, Stefan Agner, jiancai, yamada.masahiro, manojgupta,
robin.murphy, linux-arm-kernel
Clang's integrated assembler does not allow to to use the mcr
instruction to access floating point co-processor registers:
arch/arm/vfp/vfpmodule.c:342:2: error: invalid operand for instruction
fmxr(FPEXC, fpexc & ~(FPEXC_EX|FPEXC_DEX|FPEXC_FP2V|FPEXC_VV|FPEXC_TRAP_MASK));
^
arch/arm/vfp/vfpinstr.h:79:6: note: expanded from macro 'fmxr'
asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0" \
^
<inline asm>:1:6: note: instantiated into assembly here
mcr p10, 7, r0, cr8, cr0, 0 @ fmxr FPEXC, r0
^
Ideally we would replace this code with the unified assembler language
mnemonics vmrs/vmsr on call sites along with .fpu assembler directives.
The GNU assembler supports the .fpu directive at least since 2.17 (when
documentation has been added). Since Linux requires binutils 2.21 it is
safe to use .fpu directive. However, binutils does not allow to use
FPINST or FPINST2 as an argument to vmrs/vmsr instructions up to
binutils 2.24 (see binutils commit 16d02dc907c5).
Use as-instr to check if FPINST/FPINST2 can be used. If they can be used
make use of .fpu directives and UAL VFP mnemonics for register access.
This allows to build vfpmodule.c with Clang and its integrated assembler.
Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
arch/arm/include/asm/vfp.h | 2 ++
arch/arm/include/asm/vfpmacros.h | 12 +++++++++++-
arch/arm/vfp/Makefile | 5 +++++
arch/arm/vfp/vfphw.S | 1 +
arch/arm/vfp/vfpinstr.h | 23 +++++++++++++++++++----
5 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/arch/arm/include/asm/vfp.h b/arch/arm/include/asm/vfp.h
index 7157d2a30a49..a73c29ff4d1f 100644
--- a/arch/arm/include/asm/vfp.h
+++ b/arch/arm/include/asm/vfp.h
@@ -9,6 +9,7 @@
#ifndef __ASM_VFP_H
#define __ASM_VFP_H
+#ifndef HAVE_VMRS_FPINST
#define FPSID cr0
#define FPSCR cr1
#define MVFR1 cr6
@@ -16,6 +17,7 @@
#define FPEXC cr8
#define FPINST cr9
#define FPINST2 cr10
+#endif
/* FPSID bits */
#define FPSID_IMPLEMENTER_BIT (24)
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 947ee5395e1f..eb8d3738f227 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -8,7 +8,16 @@
#include <asm/vfp.h>
-@ Macros to allow building with old toolkits (with no VFP support)
+#ifdef HAVE_VMRS_FPINST
+ .macro VFPFMRX, rd, sysreg, cond
+ vmrs\cond \rd, \sysreg
+ .endm
+
+ .macro VFPFMXR, sysreg, rd, cond
+ vmsr\cond \sysreg, \rd
+ .endm
+#else
+ @ Macros to allow building with old toolkits (with no VFP support)
.macro VFPFMRX, rd, sysreg, cond
MRC\cond p10, 7, \rd, \sysreg, cr0, 0 @ FMRX \rd, \sysreg
.endm
@@ -16,6 +25,7 @@
.macro VFPFMXR, sysreg, rd, cond
MCR\cond p10, 7, \rd, \sysreg, cr0, 0 @ FMXR \sysreg, \rd
.endm
+#endif
@ read all the working registers back into the VFP
.macro VFPFLDMIA, base, tmp
diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index 749901a72d6d..f145c99fba6b 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,4 +8,9 @@
# ccflags-y := -DDEBUG
# asflags-y := -DDEBUG
+vmrs_fpinst := $(call as-instr,.fpu vfpv2\nvmrs r0$(comma)FPINST,-DHAVE_VMRS_FPINST=1)
+
+KBUILD_CFLAGS += $(vmrs_fpinst)
+KBUILD_AFLAGS += $(vmrs_fpinst)
+
obj-y += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index e214007a20a2..90e5659827c7 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -78,6 +78,7 @@
ENTRY(vfp_support_entry)
DBGSTR3 "instr %08x pc %08x state %p", r0, r2, r10
+ .fpu vfpv2
ldr r3, [sp, #S_PSR] @ Neither lazy restore nor FP exceptions
and r3, r3, #MODE_MASK @ are supported in kernel mode
teq r3, #USR_MODE
diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
index 38dc154e39ff..0db3825c4b4f 100644
--- a/arch/arm/vfp/vfpinstr.h
+++ b/arch/arm/vfp/vfpinstr.h
@@ -62,10 +62,23 @@
#define FPSCR_C (1 << 29)
#define FPSCR_V (1 << 28)
-/*
- * Since we aren't building with -mfpu=vfp, we need to code
- * these instructions using their MRC/MCR equivalents.
- */
+#ifdef HAVE_VMRS_FPINST
+
+#define fmrx(_vfp_) ({ \
+ u32 __v; \
+ asm(".fpu vfpv2\n" \
+ "vmrs %0, " #_vfp_ \
+ : "=r" (__v) : : "cc"); \
+ __v; \
+ })
+
+#define fmxr(_vfp_,_var_) \
+ asm(".fpu vfpv2\n" \
+ "vmsr " #_vfp_ ", %0" \
+ : : "r" (_var_) : "cc")
+
+#else
+
#define vfpreg(_vfp_) #_vfp_
#define fmrx(_vfp_) ({ \
@@ -79,6 +92,8 @@
asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0" \
: : "r" (_var_) : "cc")
+#endif
+
u32 vfp_single_cpdo(u32 inst, u32 fpscr);
u32 vfp_single_cprt(u32 inst, u32 fpscr, struct pt_regs *regs);
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-03-10 22:02 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-10 22:01 [PATCH 0/3] ARM: make use of UAL VFP mnemonics when possible Stefan Agner
2020-03-10 22:01 ` Stefan Agner
2020-03-10 22:01 ` [PATCH 1/3] ARM: use .fpu assembler directives instead of assembler arguments Stefan Agner
2020-03-10 22:01 ` Stefan Agner
2020-03-10 22:01 ` [PATCH 2/3] ARM: use VFP assembler mnemonics in register load/store macros Stefan Agner
2020-03-10 22:01 ` Stefan Agner
2020-03-10 22:01 ` [PATCH 3/3] ARM: use VFP assembler mnemonics if available Stefan Agner
2020-03-10 22:01 ` Stefan Agner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.