Re: [PATCH] ARM: use assembly mnemonics for VFP register access

From: Robin Murphy <robin.murphy@arm.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Nick Desaulniers <ndesaulniers@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Stefan Agner <stefan@agner.ch>, Jian Cai <jiancai@google.com>,
	clang-built-linux <clang-built-linux@googlegroups.com>,
	Manoj Gupta <manojgupta@google.com>,
	Russell King <linux@armlinux.org.uk>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] ARM: use assembly mnemonics for VFP register access
Date: Tue, 25 Feb 2020 19:45:14 +0000	[thread overview]
Message-ID: <d7047fca-7efb-94bc-51aa-c33934df0721@arm.com> (raw)
In-Reply-To: <CAKv+Gu881ZSwvuACmsbBnpfdeJpNYsEQxLSoepJBbZ=O6D6Rcg@mail.gmail.com>

On 2020-02-25 7:33 pm, Ard Biesheuvel wrote:
> On Tue, 25 Feb 2020 at 20:10, Nick Desaulniers <ndesaulniers@google.com> wrote:
>>
>> On Mon, Feb 24, 2020 at 9:22 PM Stefan Agner <stefan@agner.ch> wrote:
>>>
>>> Clang's integrated assembler does not allow to to use the mcr
>>> instruction to access floating point co-processor registers:
>>> arch/arm/vfp/vfpmodule.c:342:2: error: invalid operand for instruction
>>>          fmxr(FPEXC, fpexc & ~(FPEXC_EX|FPEXC_DEX|FPEXC_FP2V|FPEXC_VV|FPEXC_TRAP_MASK));
>>>          ^
>>> arch/arm/vfp/vfpinstr.h:79:6: note: expanded from macro 'fmxr'
>>>          asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr   " #_vfp_ ", %0" \
>>>              ^
>>> <inline asm>:1:6: note: instantiated into assembly here
>>>          mcr p10, 7, r0, cr8, cr0, 0 @ fmxr      FPEXC, r0
>>>              ^
>>>
>>> The GNU assembler supports the .fpu directive at least since 2.17 (when
>>> documentation has been added). Since Linux requires binutils 2.21 it is
>>> safe to use .fpu directive. Use the .fpu directive and mnemonics for VFP
>>> register access.
>>>
>>> This allows to build vfpmodule.c with Clang and its integrated assembler.
>>>
>>> Link: https://github.com/ClangBuiltLinux/linux/issues/905
>>> Signed-off-by: Stefan Agner <stefan@agner.ch>
>>> ---
>>>   arch/arm/vfp/vfpinstr.h | 12 ++++--------
>>>   1 file changed, 4 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
>>> index 38dc154e39ff..799ccf065406 100644
>>> --- a/arch/arm/vfp/vfpinstr.h
>>> +++ b/arch/arm/vfp/vfpinstr.h
>>> @@ -62,21 +62,17 @@
>>>   #define FPSCR_C (1 << 29)
>>>   #define FPSCR_V        (1 << 28)
>>>
>>> -/*
>>> - * Since we aren't building with -mfpu=vfp, we need to code
>>> - * these instructions using their MRC/MCR equivalents.
>>> - */
>>> -#define vfpreg(_vfp_) #_vfp_
>>> -
>>>   #define fmrx(_vfp_) ({                 \
>>>          u32 __v;                        \
>>> -       asm("mrc p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmrx   %0, " #_vfp_    \
>>> +       asm(".fpu       vfpv2\n"        \
>>> +           "vmrs       %0, " #_vfp_    \
>>>              : "=r" (__v) : : "cc");     \
>>>          __v;                            \
>>>    })
>>>
>>>   #define fmxr(_vfp_,_var_)              \
>>> -       asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr   " #_vfp_ ", %0" \
>>> +       asm(".fpu       vfpv2\n"        \
>>> +           "vmsr       " #_vfp_ ", %0" \
>>>             : : "r" (_var_) : "cc")
>>>
>>>   u32 vfp_single_cpdo(u32 inst, u32 fpscr);
>>> --
>>
>> Hi Stefan,
>> Thanks for the patch.  Reading through:
>> - FMRX, FMXR, and FMSTAT:
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/Bcfbdihi.html
>> - VMRS and VMSR:
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204h/Bcfbdihi.html
>>
>> Should a macro called `fmrx` that had a comment about `fmrx` be using
>> `vmrs` in place of `fmrx`?
>>
>> It looks like Clang treats them the same, but GCC keeps them separate:
>> https://godbolt.org/z/YKmSAs
>> Ah, this is only when streaming to assembly. Looks like they have the
>> same encoding, and produce the same disassembly. (Godbolt emits
>> assembly by default, and has the option to compile, then disassemble).
>> If I take my case from godbolt above:
>>
>> ➜  /tmp arm-linux-gnueabihf-gcc -O2 -c x.c
>> ➜  /tmp llvm-objdump -dr x.o
>>
>> x.o: file format elf32-arm-little
>>
>>
>> Disassembly of section .text:
>>
>> 00000000 bar:
>>         0: f1 ee 10 0a                  vmrs r0, fpscr
>>         4: 70 47                        bx lr
>>         6: 00 bf                        nop
>>
>> 00000008 baz:
>>         8: f1 ee 10 0a                  vmrs r0, fpscr
>>         c: 70 47                        bx lr
>>         e: 00 bf                        nop
>>
>> So indeed a similar encoding exists for the two different assembler
>> instructions.
> 
> Does that hold for ARM (A32) instructions as well?

It should do - they're all the same thing underneath. The UAL syntax 
just renamed all the legacy VFP mnemonics from Fxxx to Vxxx form, apart 
from a couple of things that were already deprecated. GAS still accepts 
both regardless of ".syntax unified", and as a result GCC never saw a 
reason to stop emitting the old mnemonics.

Robin.