* [PATCH] powerpc/32: Remove one insn in __bswapdi2
@ 2016-08-05 11:28 Christophe Leroy
2016-08-10 8:56 ` Gabriel Paubert
0 siblings, 1 reply; 6+ messages in thread
From: Christophe Leroy @ 2016-08-05 11:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
Cc: linux-kernel, linuxppc-dev
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/kernel/misc_32.S | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index e025230..e18055c 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
rlwimi r9,r4,24,0,7
rlwimi r10,r3,24,0,7
rlwimi r9,r4,24,16,23
- rlwimi r10,r3,24,16,23
+ rlwimi r4,r3,24,16,23
mr r3,r9
- mr r4,r10
blr
#ifdef CONFIG_SMP
--
2.1.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
2016-08-05 11:28 [PATCH] powerpc/32: Remove one insn in __bswapdi2 Christophe Leroy
@ 2016-08-10 8:56 ` Gabriel Paubert
2016-08-10 10:18 ` Christophe Leroy
0 siblings, 1 reply; 6+ messages in thread
From: Gabriel Paubert @ 2016-08-10 8:56 UTC (permalink / raw)
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Scott Wood, linuxppc-dev, linux-kernel
On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> arch/powerpc/kernel/misc_32.S | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> index e025230..e18055c 100644
> --- a/arch/powerpc/kernel/misc_32.S
> +++ b/arch/powerpc/kernel/misc_32.S
> @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
> rlwimi r9,r4,24,0,7
> rlwimi r10,r3,24,0,7
> rlwimi r9,r4,24,16,23
> - rlwimi r10,r3,24,16,23
> + rlwimi r4,r3,24,16,23
> mr r3,r9
> - mr r4,r10
> blr
>
Hmmm, are you sure that it works? rlwimi is a bit special since the
first operand is both an input and an output of the instruction.
In other words, did you test the code?
Gabriel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
2016-08-10 8:56 ` Gabriel Paubert
@ 2016-08-10 10:18 ` Christophe Leroy
2016-08-11 21:34 ` Gabriel Paubert
0 siblings, 1 reply; 6+ messages in thread
From: Christophe Leroy @ 2016-08-10 10:18 UTC (permalink / raw)
To: Gabriel Paubert
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Scott Wood, linuxppc-dev, linux-kernel
Le 10/08/2016 à 10:56, Gabriel Paubert a écrit :
> On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> arch/powerpc/kernel/misc_32.S | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
>> index e025230..e18055c 100644
>> --- a/arch/powerpc/kernel/misc_32.S
>> +++ b/arch/powerpc/kernel/misc_32.S
>> @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
>> rlwimi r9,r4,24,0,7
>> rlwimi r10,r3,24,0,7
>> rlwimi r9,r4,24,16,23
>> - rlwimi r10,r3,24,16,23
>> + rlwimi r4,r3,24,16,23
>> mr r3,r9
>> - mr r4,r10
>> blr
>>
>
> Hmmm, are you sure that it works? rlwimi is a bit special since the
> first operand is both an input and an output of the instruction.
>
>
Oops, you are right ...
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
2016-08-10 10:18 ` Christophe Leroy
@ 2016-08-11 21:34 ` Gabriel Paubert
2016-08-11 22:11 ` Segher Boessenkool
0 siblings, 1 reply; 6+ messages in thread
From: Gabriel Paubert @ 2016-08-11 21:34 UTC (permalink / raw)
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Scott Wood, linuxppc-dev, linux-kernel
On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote:
>
>
> Le 10/08/2016 à 10:56, Gabriel Paubert a écrit :
> >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> >>Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >>---
> >> arch/powerpc/kernel/misc_32.S | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> >>index e025230..e18055c 100644
> >>--- a/arch/powerpc/kernel/misc_32.S
> >>+++ b/arch/powerpc/kernel/misc_32.S
> >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
> >> rlwimi r9,r4,24,0,7
> >> rlwimi r10,r3,24,0,7
> >> rlwimi r9,r4,24,16,23
> >>- rlwimi r10,r3,24,16,23
> >>+ rlwimi r4,r3,24,16,23
> >> mr r3,r9
> >>- mr r4,r10
> >> blr
> >>
> >
> >Hmmm, are you sure that it works? rlwimi is a bit special since the
> >first operand is both an input and an output of the instruction.
> >
> >
>
> Oops, you are right ...
I just found this:
http://hardwarebug.org/2010/01/14/beware-the-builtins/
the bswapdi2 suggested sequence only needs a single mr instruction, the
other one is absorbed in a rotlwi.
The scheduling looks poor, but it seems impossible to interleave the
operations between the two halves without adding another instructions,
and the routine is 8 instructions long, which happens to be exactly a
cache line on most 32 bit processors.
On the other hand gcc did at the time a very poor job (quite an
understatement) at bswapdi when compiling for 64 bit processors
(see the example).
But what do modern compilers generate for bswapdi these days? Do they
still call the library or not?
After all, bswapdi on 32 bit processors only takes 6 instructions if the
input and output registers don't overlap.
Gabriel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
2016-08-11 21:34 ` Gabriel Paubert
@ 2016-08-11 22:11 ` Segher Boessenkool
2016-08-12 22:49 ` Gabriel Paubert
0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2016-08-11 22:11 UTC (permalink / raw)
To: Gabriel Paubert
Cc: Christophe Leroy, linux-kernel, Scott Wood, Paul Mackerras, linuxppc-dev
On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> On the other hand gcc did at the time a very poor job (quite an
> understatement) at bswapdi when compiling for 64 bit processors
> (see the example).
>
> But what do modern compilers generate for bswapdi these days? Do they
> still call the library or not?
Nope.
> After all, bswapdi on 32 bit processors only takes 6 instructions if the
> input and output registers don't overlap.
For this testcase:
===
typedef unsigned long long u64;
u64 bs(u64 x) { return __builtin_bswap64(x); }
===
we get with -m32:
===
bs:
mr 9,3
rotlwi 3,4,24
rlwimi 3,4,8,8,15
rlwimi 3,4,8,24,31
rotlwi 4,9,24
rlwimi 4,9,8,8,15
rlwimi 4,9,8,24,31
blr
===
and with -m64:
===
.L.bs:
srdi 10,3,32
mr 9,3
rotlwi 3,3,24
rotlwi 8,10,24
rlwimi 3,9,8,8,15
rlwimi 8,10,8,8,15
rlwimi 3,9,8,24,31
rlwimi 8,10,8,24,31
sldi 3,3,32
or 3,3,8
blr
===
Neither as tight as possible, but neither horrible either.
Segher
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
2016-08-11 22:11 ` Segher Boessenkool
@ 2016-08-12 22:49 ` Gabriel Paubert
0 siblings, 0 replies; 6+ messages in thread
From: Gabriel Paubert @ 2016-08-12 22:49 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Christophe Leroy, linux-kernel, Scott Wood, Paul Mackerras, linuxppc-dev
On Thu, Aug 11, 2016 at 05:11:19PM -0500, Segher Boessenkool wrote:
> On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> > On the other hand gcc did at the time a very poor job (quite an
> > understatement) at bswapdi when compiling for 64 bit processors
> > (see the example).
> >
> > But what do modern compilers generate for bswapdi these days? Do they
> > still call the library or not?
>
> Nope.
Great, could then these functions be removed from misc_32.S, or are
compilers that use libcalls still supported for kernel builds?
>
> > After all, bswapdi on 32 bit processors only takes 6 instructions if the
> > input and output registers don't overlap.
>
> For this testcase:
> ===
> typedef unsigned long long u64;
> u64 bs(u64 x) { return __builtin_bswap64(x); }
> ===
>
> we get with -m32:
> ===
> bs:
> mr 9,3
> rotlwi 3,4,24
> rlwimi 3,4,8,8,15
> rlwimi 3,4,8,24,31
> rotlwi 4,9,24
> rlwimi 4,9,8,8,15
> rlwimi 4,9,8,24,31
> blr
In this case the compiler is constrained by the fact that the input and
ouput registers are the same. When inlined with other things it can
probably perform better scheduling and interleaving of operations.
> ===
>
> and with -m64:
> ===
> .L.bs:
> srdi 10,3,32
> mr 9,3
> rotlwi 3,3,24
> rotlwi 8,10,24
> rlwimi 3,9,8,8,15
> rlwimi 8,10,8,8,15
> rlwimi 3,9,8,24,31
> rlwimi 8,10,8,24,31
> sldi 3,3,32
> or 3,3,8
> blr
> ===
>
As demonstrated here where the two halves of the 64 bit quantity
are byte swapped in an interleaved fashion. Not perfect (I think
that with proper ordering the last 2 instructions could be replaced
by a rldimi), but reasonable.
> Neither as tight as possible, but neither horrible either.
>
Indeed.
Gabriel
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-08-12 22:50 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-05 11:28 [PATCH] powerpc/32: Remove one insn in __bswapdi2 Christophe Leroy
2016-08-10 8:56 ` Gabriel Paubert
2016-08-10 10:18 ` Christophe Leroy
2016-08-11 21:34 ` Gabriel Paubert
2016-08-11 22:11 ` Segher Boessenkool
2016-08-12 22:49 ` Gabriel Paubert
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.