All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
@ 2015-09-03  9:47 Yousong Zhou
  2015-09-04 18:52 ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Yousong Zhou @ 2015-09-03  9:47 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Chen Jie, Yousong Zhou, linux-mips, linux-kernel

The nomips16 has to be added both as function attribute and assembler
directive.

When only function attribute is specified, the compiler will inline the
function with -Os optimization.  The generated assembly code cannot be
correctly assembled because ISA mode switch has to be done through jump
instruction.

When only ".set nomips16" directive is used, the generated assembly code
will use MIPS32 code for the inline assembly template and MIPS16 for the
function return.  The compiled binary is invalid:

    00403100 <__arch_swab16>:
      403100:   7c0410a0    wsbh    v0,a0
      403104:   e820ea31    swc2    $0,-5583(at)

while correct code should be:

    00402650 <__arch_swab16>:
      402650:   7c0410a0    wsbh    v0,a0
      402654:   03e00008    jr  ra
      402658:   3042ffff    andi    v0,v0,0xffff

Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com>
---
 arch/mips/include/uapi/asm/swab.h |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/uapi/asm/swab.h b/arch/mips/include/uapi/asm/swab.h
index 8f2d184..c4ddc4f 100644
--- a/arch/mips/include/uapi/asm/swab.h
+++ b/arch/mips/include/uapi/asm/swab.h
@@ -16,11 +16,13 @@
 #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 2)) ||		\
     defined(_MIPS_ARCH_LOONGSON3A)
 
-static inline __attribute_const__ __u16 __arch_swab16(__u16 x)
+static inline __attribute__((nomips16)) __attribute_const__
+		__u16 __arch_swab16(__u16 x)
 {
 	__asm__(
 	"	.set	push			\n"
 	"	.set	arch=mips32r2		\n"
+	"	.set	nomips16		\n"
 	"	wsbh	%0, %1			\n"
 	"	.set	pop			\n"
 	: "=r" (x)
@@ -30,11 +32,13 @@ static inline __attribute_const__ __u16 __arch_swab16(__u16 x)
 }
 #define __arch_swab16 __arch_swab16
 
-static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
+static inline __attribute__((nomips16)) __attribute_const__
+		__u32 __arch_swab32(__u32 x)
 {
 	__asm__(
 	"	.set	push			\n"
 	"	.set	arch=mips32r2		\n"
+	"	.set	nomips16		\n"
 	"	wsbh	%0, %1			\n"
 	"	rotr	%0, %0, 16		\n"
 	"	.set	pop			\n"
@@ -50,11 +54,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
  * 64-bit kernel on r2 CPUs.
  */
 #ifdef __mips64
-static inline __attribute_const__ __u64 __arch_swab64(__u64 x)
+static inline __attribute__((nomips16)) __attribute_const__
+		__u64 __arch_swab64(__u64 x)
 {
 	__asm__(
 	"	.set	push			\n"
 	"	.set	arch=mips64r2		\n"
+	"	.set	nomips16		\n"
 	"	dsbh	%0, %1			\n"
 	"	dshd	%0, %0			\n"
 	"	.set	pop			\n"
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  2015-09-03  9:47 [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16 Yousong Zhou
@ 2015-09-04 18:52 ` Maciej W. Rozycki
  2015-09-05  3:29   ` Yousong Zhou
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2015-09-04 18:52 UTC (permalink / raw)
  To: Yousong Zhou; +Cc: Ralf Baechle, Chen Jie, linux-mips, linux-kernel

On Thu, 3 Sep 2015, Yousong Zhou wrote:

> The nomips16 has to be added both as function attribute and assembler
> directive.
> 
> When only function attribute is specified, the compiler will inline the
> function with -Os optimization.  The generated assembly code cannot be
> correctly assembled because ISA mode switch has to be done through jump
> instruction.

 This can't be true.  The compiler does not intepret the contents of an 
inline asm and therefore cannot decide whether to inline a function 
containing one or not based on the lone presence or the absence of an 
assembly directive within.

> When only ".set nomips16" directive is used, the generated assembly code
> will use MIPS32 code for the inline assembly template and MIPS16 for the
> function return.  The compiled binary is invalid:
> 
>     00403100 <__arch_swab16>:
>       403100:   7c0410a0    wsbh    v0,a0
>       403104:   e820ea31    swc2    $0,-5583(at)
> 
> while correct code should be:
> 
>     00402650 <__arch_swab16>:
>       402650:   7c0410a0    wsbh    v0,a0
>       402654:   03e00008    jr  ra
>       402658:   3042ffff    andi    v0,v0,0xffff

 It looks to me you're papering something over here -- when you use a 
`.set nomips16' directive the assembler will happily switch your 
instruction set in the middle of an instruction stream.  Consequently if 
this function is (for whatever reason; it should not really) inlined in 
MIPS16 code, then you'll get a MIPS32 instruction in the middle, which 
will obviously be interpreted differently in the MIPS16 execution mode and 
is therefore surely a recipe for disaster.

 How did you test your change and made the conclusion quoted with your 
patch description?

> Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com>
> ---
>  arch/mips/include/uapi/asm/swab.h |   12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/mips/include/uapi/asm/swab.h b/arch/mips/include/uapi/asm/swab.h
> index 8f2d184..c4ddc4f 100644
> --- a/arch/mips/include/uapi/asm/swab.h
> +++ b/arch/mips/include/uapi/asm/swab.h
> @@ -16,11 +16,13 @@
>  #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 2)) ||		\
>      defined(_MIPS_ARCH_LOONGSON3A)
>  
> -static inline __attribute_const__ __u16 __arch_swab16(__u16 x)
> +static inline __attribute__((nomips16)) __attribute_const__
> +		__u16 __arch_swab16(__u16 x)
>  {
>  	__asm__(
>  	"	.set	push			\n"
>  	"	.set	arch=mips32r2		\n"
> +	"	.set	nomips16		\n"
>  	"	wsbh	%0, %1			\n"
>  	"	.set	pop			\n"
>  	: "=r" (x)

 So setting aside the correctness issues discussed above, for MIPS16 code 
this has to be put out of line by the compiler, with all the usual 
overhead of a function call, causing a performance loss rather than a gain 
expected here.  Especially as switching the ISA mode requires draining the 
whole pipeline so that subsequent instructions are interpreted in the new 
execution mode.  This is expensive in performance terms.

 I'm fairly sure simply disabling these asms (#ifndef __mips16) and 
letting the compiler generate the right mask and shift operations from 
plain C code will be cheaper in performance terms and possibly cheaper in 
code size as well, especially in otherwise leaf functions where an extra 
function call will according to the ABI clobber temporaries.  So I suggest 
going in that direction instead.

 So this is a NAK really.

  Maciej

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  2015-09-04 18:52 ` Maciej W. Rozycki
@ 2015-09-05  3:29   ` Yousong Zhou
  2015-09-05  3:34     ` Yousong Zhou
  2015-09-05 14:25     ` Maciej W. Rozycki
  0 siblings, 2 replies; 7+ messages in thread
From: Yousong Zhou @ 2015-09-05  3:29 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ralf Baechle, Chen Jie, linux-mips, linux-kernel, Florian Fainelli

On 5 September 2015 at 02:52, Maciej W. Rozycki <macro@linux-mips.org> wrote:
> On Thu, 3 Sep 2015, Yousong Zhou wrote:
>
>> The nomips16 has to be added both as function attribute and assembler
>> directive.
>>
>> When only function attribute is specified, the compiler will inline the
>> function with -Os optimization.  The generated assembly code cannot be
>> correctly assembled because ISA mode switch has to be done through jump
>> instruction.
>
>  This can't be true.  The compiler does not intepret the contents of an
> inline asm and therefore cannot decide whether to inline a function
> containing one or not based on the lone presence or the absence of an
> assembly directive within.
>

Most of the time I trust my compiler and never meddle with the
toolchain.  Anyway I made a patch because it really did not work for
me.   No big deal.  It's not the end of world.  It started with a
comment from OpenWrt packages feeds [1].  Actually this "unrecognized
opcode" problem have occurred within OpenWrt quite a few times before.

 [1] https://github.com/openwrt/packages/commit/1e29676a8ac74f797f8ca799364681cec575ae6f#commitcomment-12901931

>> When only ".set nomips16" directive is used, the generated assembly code
>> will use MIPS32 code for the inline assembly template and MIPS16 for the
>> function return.  The compiled binary is invalid:
>>
>>     00403100 <__arch_swab16>:
>>       403100:   7c0410a0    wsbh    v0,a0
>>       403104:   e820ea31    swc2    $0,-5583(at)
>>
>> while correct code should be:
>>
>>     00402650 <__arch_swab16>:
>>       402650:   7c0410a0    wsbh    v0,a0
>>       402654:   03e00008    jr  ra
>>       402658:   3042ffff    andi    v0,v0,0xffff
>
>  It looks to me you're papering something over here -- when you use a
> `.set nomips16' directive the assembler will happily switch your
> instruction set in the middle of an instruction stream.  Consequently if
> this function is (for whatever reason; it should not really) inlined in
> MIPS16 code, then you'll get a MIPS32 instruction in the middle, which
> will obviously be interpreted differently in the MIPS16 execution mode and
> is therefore surely a recipe for disaster.

If by "papering" you mean "made up", then whatever.  Yeah, it's
disaster, an "invalid instruction" abort.

>
>  How did you test your change and made the conclusion quoted with your
> patch description?
>

Compile the following program with a MIPS 24kc big endian variant compiler with
flag "-mips32r2 -mips16 -Os".

    #include <stdio.h>
    #include <stdint.h>

    uint16_t __attribute__((noinline)) f(uint16_t v)
    {
        v = __cpu_to_le16(v);
        return v;
    }

    int main()
    {
        printf("%x\n", f(0xbeef));

        return 0;
    }

When only ".set nomips16" was specified in __arch_swab16(), this was output
from objdump.

    242 004003e0 <f>:
    243   4003e0:       7c0410a0        wsbh    v0,a0
    244   4003e4:       e820ea31        swc2    $0,-5583(at)
    245   4003e8:       65006500        0x65006500
    246   4003ec:       65006500        0x65006500

__arch_swab16() was indeed inlined.  That swc2 instruction can be got from
assembler with the following code (it's from the "-S" result of GCC).

    .set    mips16
    .set    noreorder
    .set    nomacro
    j       $31
    zeh     $2

When only nomips16 function attribute was specified, this time GCC failed with
unrecognized opcode

    /tmp/ccaGCouL.s: Assembler messages:
    /tmp/ccaGCouL.s:21: Error: unrecognized opcode `wsbh $2,$4'

The generated assembly was something in the following form.  Looks like the
assembler did not automatically switch to MIPS32 mode when ".set arch=mip32r2"

        .set mips16

        .ent f
        .type f, @function
    f
        ...
        .set push
        .set arch=mips32r2
        wsbh $2,$4
        .pop
        j $31
        zeh $2
        .end f
        ...

The patch was run tested on QEMU Malta and an router with Atheros
AR9331 SoC.  I didn't test __arch_swab64() though.  I have done many
other trial-and-error tests while preparing this patch.  It was a mess
when I was sure I should expect some sensible behaviour from the
compiler while it actually just did not behave that way.

>> Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com>
>> ---
>>  arch/mips/include/uapi/asm/swab.h |   12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/mips/include/uapi/asm/swab.h b/arch/mips/include/uapi/asm/swab.h
>> index 8f2d184..c4ddc4f 100644
>> --- a/arch/mips/include/uapi/asm/swab.h
>> +++ b/arch/mips/include/uapi/asm/swab.h
>> @@ -16,11 +16,13 @@
>>  #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 2)) ||            \
>>      defined(_MIPS_ARCH_LOONGSON3A)
>>
>> -static inline __attribute_const__ __u16 __arch_swab16(__u16 x)
>> +static inline __attribute__((nomips16)) __attribute_const__
>> +             __u16 __arch_swab16(__u16 x)
>>  {
>>       __asm__(
>>       "       .set    push                    \n"
>>       "       .set    arch=mips32r2           \n"
>> +     "       .set    nomips16                \n"
>>       "       wsbh    %0, %1                  \n"
>>       "       .set    pop                     \n"
>>       : "=r" (x)
>
>  So setting aside the correctness issues discussed above, for MIPS16 code
> this has to be put out of line by the compiler, with all the usual
> overhead of a function call, causing a performance loss rather than a gain
> expected here.  Especially as switching the ISA mode requires draining the
> whole pipeline so that subsequent instructions are interpreted in the new
> execution mode.  This is expensive in performance terms.
>
>  I'm fairly sure simply disabling these asms (#ifndef __mips16) and
> letting the compiler generate the right mask and shift operations from
> plain C code will be cheaper in performance terms and possibly cheaper in
> code size as well, especially in otherwise leaf functions where an extra
> function call will according to the ABI clobber temporaries.  So I suggest
> going in that direction instead.

I agree.  Then you will provide the fix right?  I am just curious
where that __mips16 should be placed or is it from compiler and
assembler?

>
>  So this is a NAK really.

okay.

Cheers

                yousong

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  2015-09-05  3:29   ` Yousong Zhou
@ 2015-09-05  3:34     ` Yousong Zhou
  2015-09-05 14:25     ` Maciej W. Rozycki
  1 sibling, 0 replies; 7+ messages in thread
From: Yousong Zhou @ 2015-09-05  3:34 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ralf Baechle, Chen Jie, linux-mips, linux-kernel, Florian Fainelli

On 5 September 2015 at 11:29, Yousong Zhou <yszhou4tech@gmail.com> wrote:
>     #include <stdio.h>
>     #include <stdint.h>
>

#include <asm/byteorder.h> was missed out at the top.

                yousong

>     uint16_t __attribute__((noinline)) f(uint16_t v)
>     {
>         v = __cpu_to_le16(v);
>         return v;
>     }
>
>     int main()
>     {
>         printf("%x\n", f(0xbeef));
>
>         return 0;
>     }
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  2015-09-05  3:29   ` Yousong Zhou
  2015-09-05  3:34     ` Yousong Zhou
@ 2015-09-05 14:25     ` Maciej W. Rozycki
  2015-09-06  2:42       ` Yousong Zhou
  1 sibling, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2015-09-05 14:25 UTC (permalink / raw)
  To: Yousong Zhou
  Cc: Ralf Baechle, Chen Jie, linux-mips, linux-kernel, Florian Fainelli

On Sat, 5 Sep 2015, Yousong Zhou wrote:

> >  This can't be true.  The compiler does not intepret the contents of an
> > inline asm and therefore cannot decide whether to inline a function
> > containing one or not based on the lone presence or the absence of an
> > assembly directive within.
> >
> 
> Most of the time I trust my compiler and never meddle with the
> toolchain.  Anyway I made a patch because it really did not work for
> me.   No big deal.  It's not the end of world.  It started with a
> comment from OpenWrt packages feeds [1].  Actually this "unrecognized
> opcode" problem have occurred within OpenWrt quite a few times before.
> 
>  [1] https://github.com/openwrt/packages/commit/1e29676a8ac74f797f8ca799364681cec575ae6f#commitcomment-12901931

 The bug certainly was there, it's just your analysis and consequently the 
fix that are wrong in the general case for some reason, maybe a buggy 
compiler.

> >  It looks to me you're papering something over here -- when you use a
> > `.set nomips16' directive the assembler will happily switch your
> > instruction set in the middle of an instruction stream.  Consequently if
> > this function is (for whatever reason; it should not really) inlined in
> > MIPS16 code, then you'll get a MIPS32 instruction in the middle, which
> > will obviously be interpreted differently in the MIPS16 execution mode and
> > is therefore surely a recipe for disaster.
> 
> If by "papering" you mean "made up", then whatever.  Yeah, it's
> disaster, an "invalid instruction" abort.

 By "papering over" I mean forcing source code to compile successfully at 
the risk of producing incorrect binary code.

> >  How did you test your change and made the conclusion quoted with your
> > patch description?
> >
> 
> Compile the following program with a MIPS 24kc big endian variant compiler with
> flag "-mips32r2 -mips16 -Os".
> 
>     #include <stdio.h>
>     #include <stdint.h>
> 
>     uint16_t __attribute__((noinline)) f(uint16_t v)
>     {
>         v = __cpu_to_le16(v);
>         return v;
>     }
> 
>     int main()
>     {
>         printf("%x\n", f(0xbeef));
> 
>         return 0;
>     }
> 
> When only ".set nomips16" was specified in __arch_swab16(), this was output
> from objdump.
> 
>     242 004003e0 <f>:
>     243   4003e0:       7c0410a0        wsbh    v0,a0
>     244   4003e4:       e820ea31        swc2    $0,-5583(at)
>     245   4003e8:       65006500        0x65006500
>     246   4003ec:       65006500        0x65006500

 Quite obviously.

 For the record: the first instruction has been assembled in the regular 
MIPS mode and that propagated to symbol annotation.  Consequently `f' is 
seen by `objdump' as a regular MIPS function and disassembles it all as 
such.  You can put a global label immediately after the WSBH instruction 
in your source code to have the rest of the function disassembled 
correctly (of course this won't make this code work at the run time).

> __arch_swab16() was indeed inlined.  That swc2 instruction can be got from
> assembler with the following code (it's from the "-S" result of GCC).
> 
>     .set    mips16
>     .set    noreorder
>     .set    nomacro
>     j       $31
>     zeh     $2
> 
> When only nomips16 function attribute was specified, this time GCC failed with
> unrecognized opcode
> 
>     /tmp/ccaGCouL.s: Assembler messages:
>     /tmp/ccaGCouL.s:21: Error: unrecognized opcode `wsbh $2,$4'
> 
> The generated assembly was something in the following form.  Looks like the
> assembler did not automatically switch to MIPS32 mode when ".set arch=mip32r2"

 There's no switch to regular MIPS mode implied with `.set arch=mip32r2', 
the directive merely switches the ISA level, affecting both the regular 
MIPS and the MIPS16 mode (the MIPS32r2 ISA level adds extra MIPS16 
instructions too, e.g. ZEH is a new addition).

>         .set mips16
> 
>         .ent f
>         .type f, @function
>     f
>         ...
>         .set push
>         .set arch=mips32r2
>         wsbh $2,$4
>         .pop
>         j $31
>         zeh $2
>         .end f
>         ...

 That's exactly the papered-over buggy code scenario I've been referring 
to above.  This is clearly MIPS16 code: ZEH is a MIPS16 instruction only, 
there's no such regular MIPS counterpart.  And it also obviously fails to 
assemble because on the contrary there's no MIPS16 WSBH instruction.

 Now if you stick `.set nomips16' just above WSBH, then this code will 
happily assemble, because this single instruction only (`.set pop' reverts 
any previous `.set' directives; I'm assuming you wrote above by hand and 
`.pop' is a typo) will assemble in the regular MIPS instruction mode.  But 
if this code is ever reached, then the processor will still execute the 
machine code produced by the assembler from the WSBH instruction in the 
MIPS16 mode.

 For example the encoding of:

	wsbh	$2, $4

is (as you've shown in a dump above) 0x7c0410a0, which in the MIPS16 mode 
yields (in the big-endian mode):

00000000 <f>:
   0:	7c04      	sd	s0,32(a0)
   2:	10a0      	b	144 <f+0x144>

-- so this won't do what you might expect, you'll get a Reserved 
Instruction exception on the SD instruction, which is not supported in the 
32-bit mode, and consequently SIGILL.

> The patch was run tested on QEMU Malta and an router with Atheros
> AR9331 SoC.  I didn't test __arch_swab64() though.  I have done many
> other trial-and-error tests while preparing this patch.  It was a mess
> when I was sure I should expect some sensible behaviour from the
> compiler while it actually just did not behave that way.

 I've compiled your example provided and as I stated in the original mail 
`__arch_swab16' is always produced as a separate function, whether `.set 
nomips16' is present in the inline assembly placed there or not.  This is 
with (unreleased) GCC 6.0.0.

 However if you happen to have a buggy compiler that fails to emit 
`__arch_swab16' as a separate function despite the `nomips16' attribute, 
then it's better if the resulting generated assembly code fails to 
assemble rather than if it goes astray at the run time.

> >  So setting aside the correctness issues discussed above, for MIPS16 code
> > this has to be put out of line by the compiler, with all the usual
> > overhead of a function call, causing a performance loss rather than a gain
> > expected here.  Especially as switching the ISA mode requires draining the
> > whole pipeline so that subsequent instructions are interpreted in the new
> > execution mode.  This is expensive in performance terms.
> >
> >  I'm fairly sure simply disabling these asms (#ifndef __mips16) and
> > letting the compiler generate the right mask and shift operations from
> > plain C code will be cheaper in performance terms and possibly cheaper in
> > code size as well, especially in otherwise leaf functions where an extra
> > function call will according to the ABI clobber temporaries.  So I suggest
> > going in that direction instead.
> 
> I agree.  Then you will provide the fix right?  I am just curious
> where that __mips16 should be placed or is it from compiler and
> assembler?

 No, it's your bug after all.  I think the last paragraph I wrote quoted 
above combined with the source code in question make it clear what to do.

 I have also checked what the difference in generated MIPS16 code is 
between a call to `__arch_swab16':

0000000c <f>:
   c:	64c4      	save	32,ra
   e:	1800 0000 	jal	0 <__arch_swab16>
			e: R_MIPS16_26	__arch_swab16
  12:	ec31      	zeh	a0
  14:	6444      	restore	32,ra
  16:	e8a0      	jrc	ra

and equivalent generic code:

0000000c <f>:
   c:	ec31      	zeh	a0
   e:	3280      	sll	v0,a0,8
  10:	3482      	srl	a0,8
  12:	ea8d      	or	v0,a0
  14:	e820      	jr	ra
  16:	ea31      	zeh	v0

so the win is I think clear.

 Finally the MIPS64 `__arch_swab64' case does not matter as we have no 
MIPS16 support for 64-bit code in Linux, the toolchain will simply refuse 
to build it (only bare metal is supported).

  Maciej

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  2015-09-05 14:25     ` Maciej W. Rozycki
@ 2015-09-06  2:42       ` Yousong Zhou
  2015-09-06 23:23         ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Yousong Zhou @ 2015-09-06  2:42 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ralf Baechle, Chen Jie, linux-mips, linux-kernel, Florian Fainelli

Hi, Maciej, first of all, thank you for your time on this,
appreciate it.

Comments inline

On 5 September 2015 at 22:25, Maciej W. Rozycki <macro@linux-mips.org> wrote:
> On Sat, 5 Sep 2015, Yousong Zhou wrote:
>
>> >  This can't be true.  The compiler does not intepret the contents of an
>> > inline asm and therefore cannot decide whether to inline a function
>> > containing one or not based on the lone presence or the absence of an
>> > assembly directive within.
>> >
>>
>> Most of the time I trust my compiler and never meddle with the
>> toolchain.  Anyway I made a patch because it really did not work for
>> me.   No big deal.  It's not the end of world.  It started with a
>> comment from OpenWrt packages feeds [1].  Actually this "unrecognized
>> opcode" problem have occurred within OpenWrt quite a few times before.
>>
>>  [1] https://github.com/openwrt/packages/commit/1e29676a8ac74f797f8ca799364681cec575ae6f#commitcomment-12901931
>
>  The bug certainly was there, it's just your analysis and consequently the
> fix that are wrong in the general case for some reason, maybe a buggy
> compiler.
>

This is the compiler "--version",

    mips-openwrt-linux-gcc (OpenWrt/Linaro GCC 4.8-2014.04 r46763) 4.8.3

>> >  It looks to me you're papering something over here -- when you use a
>> > `.set nomips16' directive the assembler will happily switch your
>> > instruction set in the middle of an instruction stream.  Consequently if
>> > this function is (for whatever reason; it should not really) inlined in
>> > MIPS16 code, then you'll get a MIPS32 instruction in the middle, which
>> > will obviously be interpreted differently in the MIPS16 execution mode and
>> > is therefore surely a recipe for disaster.
>>
>> If by "papering" you mean "made up", then whatever.  Yeah, it's
>> disaster, an "invalid instruction" abort.
>
>  By "papering over" I mean forcing source code to compile successfully at
> the risk of producing incorrect binary code.

Learned a new phrase/idiom :)

>
>> >  How did you test your change and made the conclusion quoted with your
>> > patch description?
>> >
>>
>> Compile the following program with a MIPS 24kc big endian variant compiler with
>> flag "-mips32r2 -mips16 -Os".
>>
>>     #include <stdio.h>
>>     #include <stdint.h>
>>
>>     uint16_t __attribute__((noinline)) f(uint16_t v)
>>     {
>>         v = __cpu_to_le16(v);
>>         return v;
>>     }
>>
>>     int main()
>>     {
>>         printf("%x\n", f(0xbeef));
>>
>>         return 0;
>>     }
>>
>> When only ".set nomips16" was specified in __arch_swab16(), this was output
>> from objdump.
>>
>>     242 004003e0 <f>:
>>     243   4003e0:       7c0410a0        wsbh    v0,a0
>>     244   4003e4:       e820ea31        swc2    $0,-5583(at)
>>     245   4003e8:       65006500        0x65006500
>>     246   4003ec:       65006500        0x65006500
>
>  Quite obviously.
>
>  For the record: the first instruction has been assembled in the regular
> MIPS mode and that propagated to symbol annotation.  Consequently `f' is
> seen by `objdump' as a regular MIPS function and disassembles it all as
> such.  You can put a global label immediately after the WSBH instruction
> in your source code to have the rest of the function disassembled
> correctly (of course this won't make this code work at the run time).

Thanks for the trick.  Objdump really disassembled it
correctly.

>
>> __arch_swab16() was indeed inlined.  That swc2 instruction can be got from
>> assembler with the following code (it's from the "-S" result of GCC).
>>
>>     .set    mips16
>>     .set    noreorder
>>     .set    nomacro
>>     j       $31
>>     zeh     $2
>>
>> When only nomips16 function attribute was specified, this time GCC failed with
>> unrecognized opcode
>>
>>     /tmp/ccaGCouL.s: Assembler messages:
>>     /tmp/ccaGCouL.s:21: Error: unrecognized opcode `wsbh $2,$4'
>>
>> The generated assembly was something in the following form.  Looks like the
>> assembler did not automatically switch to MIPS32 mode when ".set arch=mip32r2"
>
>  There's no switch to regular MIPS mode implied with `.set arch=mip32r2',
> the directive merely switches the ISA level, affecting both the regular
> MIPS and the MIPS16 mode (the MIPS32r2 ISA level adds extra MIPS16
> instructions too, e.g. ZEH is a new addition).

Agreed

>
>>         .set mips16
>>
>>         .ent f
>>         .type f, @function
>>     f
>>         ...
>>         .set push
>>         .set arch=mips32r2
>>         wsbh $2,$4
>>         .pop
>>         j $31
>>         zeh $2
>>         .end f
>>         ...
>
>  That's exactly the papered-over buggy code scenario I've been referring
> to above.  This is clearly MIPS16 code: ZEH is a MIPS16 instruction only,
> there's no such regular MIPS counterpart.  And it also obviously fails to
> assemble because on the contrary there's no MIPS16 WSBH instruction.
>

Yes, it won't assemble.

>  Now if you stick `.set nomips16' just above WSBH, then this code will
> happily assemble, because this single instruction only (`.set pop' reverts
> any previous `.set' directives; I'm assuming you wrote above by hand and
> `.pop' is a typo) will assemble in the regular MIPS instruction mode.  But
> if this code is ever reached, then the processor will still execute the
> machine code produced by the assembler from the WSBH instruction in the
> MIPS16 mode.

Yes, I hand-copied it from the output of "gcc -S" just to
show the form/pattern (the original output is too long for
this conversation).  No, that `.pop' is not a typo (I just
did a double-check).

If I stick `.set nomips16` just above WSBH, that's just what
the original patch tries to do, papering over the fact that
it did not compile/assemble without it.

>
>  For example the encoding of:
>
>         wsbh    $2, $4
>
> is (as you've shown in a dump above) 0x7c0410a0, which in the MIPS16 mode
> yields (in the big-endian mode):
>
> 00000000 <f>:
>    0:   7c04            sd      s0,32(a0)
>    2:   10a0            b       144 <f+0x144>
>
> -- so this won't do what you might expect, you'll get a Reserved
> Instruction exception on the SD instruction, which is not supported in the
> 32-bit mode, and consequently SIGILL.

Agreed.

>
>> The patch was run tested on QEMU Malta and an router with Atheros
>> AR9331 SoC.  I didn't test __arch_swab64() though.  I have done many
>> other trial-and-error tests while preparing this patch.  It was a mess
>> when I was sure I should expect some sensible behaviour from the
>> compiler while it actually just did not behave that way.
>
>  I've compiled your example provided and as I stated in the original mail
> `__arch_swab16' is always produced as a separate function, whether `.set
> nomips16' is present in the inline assembly placed there or not.  This is
> with (unreleased) GCC 6.0.0.
>
>  However if you happen to have a buggy compiler that fails to emit
> `__arch_swab16' as a separate function despite the `nomips16' attribute,
> then it's better if the resulting generated assembly code fails to
> assemble rather than if it goes astray at the run time.

Now, my compiler refused to emit `__arch_swab16' as a
separate function even with the `nomips16' function
attribute.  This is the kind of "mess" I just meant above, again.
I expect that it should emit a separate function and call it
with a jump observing that the caller `f' is in MIPS16 mode
yet the `__arch_swab16' is noMIPS16.  sigh~~

>
>> >  So setting aside the correctness issues discussed above, for MIPS16 code
>> > this has to be put out of line by the compiler, with all the usual
>> > overhead of a function call, causing a performance loss rather than a gain
>> > expected here.  Especially as switching the ISA mode requires draining the
>> > whole pipeline so that subsequent instructions are interpreted in the new
>> > execution mode.  This is expensive in performance terms.
>> >
>> >  I'm fairly sure simply disabling these asms (#ifndef __mips16) and
>> > letting the compiler generate the right mask and shift operations from
>> > plain C code will be cheaper in performance terms and possibly cheaper in
>> > code size as well, especially in otherwise leaf functions where an extra
>> > function call will according to the ABI clobber temporaries.  So I suggest
>> > going in that direction instead.
>>
>> I agree.  Then you will provide the fix right?  I am just curious
>> where that __mips16 should be placed or is it from compiler and
>> assembler?
>
>  No, it's your bug after all.  I think the last paragraph I wrote quoted
> above combined with the source code in question make it clear what to do.

Okay, I will try.  Most of the time when textbooks read
clearly/obviously/apparently, things go astray ;)

>
>  I have also checked what the difference in generated MIPS16 code is
> between a call to `__arch_swab16':
>
> 0000000c <f>:
>    c:   64c4            save    32,ra
>    e:   1800 0000       jal     0 <__arch_swab16>
>                         e: R_MIPS16_26  __arch_swab16
>   12:   ec31            zeh     a0
>   14:   6444            restore 32,ra
>   16:   e8a0            jrc     ra
>
> and equivalent generic code:
>
> 0000000c <f>:
>    c:   ec31            zeh     a0
>    e:   3280            sll     v0,a0,8
>   10:   3482            srl     a0,8
>   12:   ea8d            or      v0,a0
>   14:   e820            jr      ra
>   16:   ea31            zeh     v0
>
> so the win is I think clear.
>
>  Finally the MIPS64 `__arch_swab64' case does not matter as we have no
> MIPS16 support for 64-bit code in Linux, the toolchain will simply refuse
> to build it (only bare metal is supported).

Thanks for the information.  Never played with MIPS64
before.

Regards

                yousong

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  2015-09-06  2:42       ` Yousong Zhou
@ 2015-09-06 23:23         ` Maciej W. Rozycki
  0 siblings, 0 replies; 7+ messages in thread
From: Maciej W. Rozycki @ 2015-09-06 23:23 UTC (permalink / raw)
  To: Yousong Zhou
  Cc: Ralf Baechle, Chen Jie, linux-mips, linux-kernel, Florian Fainelli

On Sun, 6 Sep 2015, Yousong Zhou wrote:

> Hi, Maciej, first of all, thank you for your time on this,
> appreciate it.

 You're welcome!

> >  The bug certainly was there, it's just your analysis and consequently the
> > fix that are wrong in the general case for some reason, maybe a buggy
> > compiler.
> >
> 
> This is the compiler "--version",
> 
>     mips-openwrt-linux-gcc (OpenWrt/Linaro GCC 4.8-2014.04 r46763) 4.8.3

 A-ha!  I've checked GCC's history and the symptom you're seeing was PR 
target/55777:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55777

and the bug has only been fixed pretty recently (r199328), already after 
GCC 4.8 has branched.  From your troubles I infer the fix has not been 
backported to 4.8 and therefore is only available from GCC 4.9 up; of 
course either you or OpenWrt can still backport it and apply locally.

 Fortunately as I noted you need not dive into that hole as the 
reasonable choice here is to avoid the asm for MIPS16 code altogether, 
regardless of the GCC bug.

> >  Now if you stick `.set nomips16' just above WSBH, then this code will
> > happily assemble, because this single instruction only (`.set pop' reverts
> > any previous `.set' directives; I'm assuming you wrote above by hand and
> > `.pop' is a typo) will assemble in the regular MIPS instruction mode.  But
> > if this code is ever reached, then the processor will still execute the
> > machine code produced by the assembler from the WSBH instruction in the
> > MIPS16 mode.
> 
> Yes, I hand-copied it from the output of "gcc -S" just to
> show the form/pattern (the original output is too long for
> this conversation).  No, that `.pop' is not a typo (I just
> did a double-check).

 Well, please check again then as there's no such pseudo-op:

$ cat pop.s
	.pop
$ mips-mti-linux-gnu-as -o pop.o pop.s
pop.s: Assembler messages:
pop.s:1: Error: unknown pseudo-op: `.pop'
$ 

> >  No, it's your bug after all.  I think the last paragraph I wrote quoted
> > above combined with the source code in question make it clear what to do.
> 
> Okay, I will try.  Most of the time when textbooks read
> clearly/obviously/apparently, things go astray ;)

 Well, if you really find yourself stuck with it, then come back for 
more hints, however please do try figuring it out yourself first as 
it'll be a good exercise.

  Maciej

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-09-06 23:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-03  9:47 [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16 Yousong Zhou
2015-09-04 18:52 ` Maciej W. Rozycki
2015-09-05  3:29   ` Yousong Zhou
2015-09-05  3:34     ` Yousong Zhou
2015-09-05 14:25     ` Maciej W. Rozycki
2015-09-06  2:42       ` Yousong Zhou
2015-09-06 23:23         ` Maciej W. Rozycki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.