All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/32: Remove one insn in __bswapdi2
@ 2016-08-05 11:28 Christophe Leroy
  2016-08-10  8:56 ` Gabriel Paubert
  0 siblings, 1 reply; 6+ messages in thread
From: Christophe Leroy @ 2016-08-05 11:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/misc_32.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index e025230..e18055c 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
 	rlwimi  r9,r4,24,0,7
 	rlwimi  r10,r3,24,0,7
 	rlwimi  r9,r4,24,16,23
-	rlwimi  r10,r3,24,16,23
+	rlwimi  r4,r3,24,16,23
 	mr      r3,r9
-	mr      r4,r10
 	blr
 
 #ifdef CONFIG_SMP
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
  2016-08-05 11:28 [PATCH] powerpc/32: Remove one insn in __bswapdi2 Christophe Leroy
@ 2016-08-10  8:56 ` Gabriel Paubert
  2016-08-10 10:18   ` Christophe Leroy
  0 siblings, 1 reply; 6+ messages in thread
From: Gabriel Paubert @ 2016-08-10  8:56 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Scott Wood, linuxppc-dev, linux-kernel

On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/kernel/misc_32.S | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> index e025230..e18055c 100644
> --- a/arch/powerpc/kernel/misc_32.S
> +++ b/arch/powerpc/kernel/misc_32.S
> @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
>  	rlwimi  r9,r4,24,0,7
>  	rlwimi  r10,r3,24,0,7
>  	rlwimi  r9,r4,24,16,23
> -	rlwimi  r10,r3,24,16,23
> +	rlwimi  r4,r3,24,16,23
>  	mr      r3,r9
> -	mr      r4,r10
>  	blr
>  

Hmmm, are you sure that it works? rlwimi is a bit special since the
first operand is both an input and an output of the instruction.


In other words, did you test the code?

    Gabriel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
  2016-08-10  8:56 ` Gabriel Paubert
@ 2016-08-10 10:18   ` Christophe Leroy
  2016-08-11 21:34     ` Gabriel Paubert
  0 siblings, 1 reply; 6+ messages in thread
From: Christophe Leroy @ 2016-08-10 10:18 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Scott Wood, linuxppc-dev, linux-kernel



Le 10/08/2016 à 10:56, Gabriel Paubert a écrit :
> On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>  arch/powerpc/kernel/misc_32.S | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
>> index e025230..e18055c 100644
>> --- a/arch/powerpc/kernel/misc_32.S
>> +++ b/arch/powerpc/kernel/misc_32.S
>> @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
>>  	rlwimi  r9,r4,24,0,7
>>  	rlwimi  r10,r3,24,0,7
>>  	rlwimi  r9,r4,24,16,23
>> -	rlwimi  r10,r3,24,16,23
>> +	rlwimi  r4,r3,24,16,23
>>  	mr      r3,r9
>> -	mr      r4,r10
>>  	blr
>>
>
> Hmmm, are you sure that it works? rlwimi is a bit special since the
> first operand is both an input and an output of the instruction.
>
>

Oops, you are right ...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
  2016-08-10 10:18   ` Christophe Leroy
@ 2016-08-11 21:34     ` Gabriel Paubert
  2016-08-11 22:11       ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Gabriel Paubert @ 2016-08-11 21:34 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Scott Wood, linuxppc-dev, linux-kernel

On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote:
> 
> 
> Le 10/08/2016 à 10:56, Gabriel Paubert a écrit :
> >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> >>Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >>---
> >> arch/powerpc/kernel/misc_32.S | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> >>index e025230..e18055c 100644
> >>--- a/arch/powerpc/kernel/misc_32.S
> >>+++ b/arch/powerpc/kernel/misc_32.S
> >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
> >> 	rlwimi  r9,r4,24,0,7
> >> 	rlwimi  r10,r3,24,0,7
> >> 	rlwimi  r9,r4,24,16,23
> >>-	rlwimi  r10,r3,24,16,23
> >>+	rlwimi  r4,r3,24,16,23
> >> 	mr      r3,r9
> >>-	mr      r4,r10
> >> 	blr
> >>
> >
> >Hmmm, are you sure that it works? rlwimi is a bit special since the
> >first operand is both an input and an output of the instruction.
> >
> >
> 
> Oops, you are right ...

I just found this: 

http://hardwarebug.org/2010/01/14/beware-the-builtins/

the bswapdi2 suggested sequence only needs a single mr instruction, the 
other one is absorbed in a rotlwi.

The scheduling looks poor, but it seems impossible to interleave the
operations between the two halves without adding another instructions,
and the routine is 8 instructions long, which happens to be exactly a
cache line on most 32 bit processors.

On the other hand gcc did at the time a very poor job (quite an
understatement) at bswapdi when compiling for 64 bit processors 
(see the example).

But what do modern compilers generate for bswapdi these days? Do they
still call the library or not?

After all, bswapdi on 32 bit processors only takes 6 instructions if the
input and output registers don't overlap.

    Gabriel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
  2016-08-11 21:34     ` Gabriel Paubert
@ 2016-08-11 22:11       ` Segher Boessenkool
  2016-08-12 22:49         ` Gabriel Paubert
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2016-08-11 22:11 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Christophe Leroy, linux-kernel, Scott Wood, Paul Mackerras, linuxppc-dev

On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> On the other hand gcc did at the time a very poor job (quite an
> understatement) at bswapdi when compiling for 64 bit processors 
> (see the example).
> 
> But what do modern compilers generate for bswapdi these days? Do they
> still call the library or not?

Nope.

> After all, bswapdi on 32 bit processors only takes 6 instructions if the
> input and output registers don't overlap.

For this testcase:
===
typedef unsigned long long u64;
u64 bs(u64 x) { return __builtin_bswap64(x); }
===

we get with -m32:
===
bs:
	mr 9,3
	rotlwi 3,4,24
	rlwimi 3,4,8,8,15
	rlwimi 3,4,8,24,31
	rotlwi 4,9,24
	rlwimi 4,9,8,8,15
	rlwimi 4,9,8,24,31
	blr
===

and with -m64:
===
.L.bs:
	srdi 10,3,32
	mr 9,3
	rotlwi 3,3,24
	rotlwi 8,10,24
	rlwimi 3,9,8,8,15
	rlwimi 8,10,8,8,15
	rlwimi 3,9,8,24,31
	rlwimi 8,10,8,24,31
	sldi 3,3,32
	or 3,3,8
	blr
===

Neither as tight as possible, but neither horrible either.


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
  2016-08-11 22:11       ` Segher Boessenkool
@ 2016-08-12 22:49         ` Gabriel Paubert
  0 siblings, 0 replies; 6+ messages in thread
From: Gabriel Paubert @ 2016-08-12 22:49 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christophe Leroy, linux-kernel, Scott Wood, Paul Mackerras, linuxppc-dev

On Thu, Aug 11, 2016 at 05:11:19PM -0500, Segher Boessenkool wrote:
> On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> > On the other hand gcc did at the time a very poor job (quite an
> > understatement) at bswapdi when compiling for 64 bit processors 
> > (see the example).
> > 
> > But what do modern compilers generate for bswapdi these days? Do they
> > still call the library or not?
> 
> Nope.

Great, could then these functions be removed from misc_32.S, or are
compilers that use libcalls still supported for kernel builds?

> 
> > After all, bswapdi on 32 bit processors only takes 6 instructions if the
> > input and output registers don't overlap.
> 
> For this testcase:
> ===
> typedef unsigned long long u64;
> u64 bs(u64 x) { return __builtin_bswap64(x); }
> ===
> 
> we get with -m32:
> ===
> bs:
> 	mr 9,3
> 	rotlwi 3,4,24
> 	rlwimi 3,4,8,8,15
> 	rlwimi 3,4,8,24,31
> 	rotlwi 4,9,24
> 	rlwimi 4,9,8,8,15
> 	rlwimi 4,9,8,24,31
> 	blr

In this case the compiler is constrained by the fact that the input and
ouput registers are the same. When inlined with other things it can
probably perform better scheduling and interleaving of operations.


> ===
> 
> and with -m64:
> ===
> .L.bs:
> 	srdi 10,3,32
> 	mr 9,3
> 	rotlwi 3,3,24
> 	rotlwi 8,10,24
> 	rlwimi 3,9,8,8,15
> 	rlwimi 8,10,8,8,15
> 	rlwimi 3,9,8,24,31
> 	rlwimi 8,10,8,24,31
> 	sldi 3,3,32
> 	or 3,3,8
> 	blr
> ===
> 

As demonstrated here where the two halves of the 64 bit quantity
are byte swapped in an interleaved fashion. Not perfect (I think
that with proper ordering the last 2 instructions could be replaced
by a rldimi), but reasonable.

> Neither as tight as possible, but neither horrible either.
> 

Indeed.

    Gabriel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-08-12 22:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-05 11:28 [PATCH] powerpc/32: Remove one insn in __bswapdi2 Christophe Leroy
2016-08-10  8:56 ` Gabriel Paubert
2016-08-10 10:18   ` Christophe Leroy
2016-08-11 21:34     ` Gabriel Paubert
2016-08-11 22:11       ` Segher Boessenkool
2016-08-12 22:49         ` Gabriel Paubert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.