* [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
@ 2016-08-04  6:53 Anton Blanchard
  2016-08-04  7:49 ` Christophe Leroy
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Anton Blanchard @ 2016-08-04  6:53 UTC
  To: benh, paulus, mpe, agraf; +Cc: linuxppc-dev

From: Anton Blanchard <anton@samba.org>

Align the hot loops in our assembly implementation of memset()
and backwards_memcpy().

backwards_memcpy() is called from tcp_v4_rcv(), so we might
want to optimise this a little more.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/lib/mem_64.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
index 43435c6..eda7a96 100644
--- a/arch/powerpc/lib/mem_64.S
+++ b/arch/powerpc/lib/mem_64.S
@@ -37,6 +37,7 @@ _GLOBAL(memset)
 	clrldi	r5,r5,58
 	mtctr	r0
 	beq	5f
+	.balign 16
 4:	std	r4,0(r6)
 	std	r4,8(r6)
 	std	r4,16(r6)
@@ -90,6 +91,7 @@ _GLOBAL(backwards_memcpy)
 	andi.	r0,r6,3
 	mtctr	r7
 	bne	5f
+	.balign 16
 1:	lwz	r7,-4(r4)
 	lwzu	r8,-8(r4)
 	stw	r7,-4(r6)
-- 
2.7.4
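
As a rough illustration of what the two `.balign 16` directives cost in code size, the sketch below computes how much padding an assembler has to emit to move a label up to the next 16-byte boundary. The function names and example addresses are hypothetical, and the 4-byte instruction size is an assumption that holds for the fixed-width PowerPC instructions used in this file.

```python
def balign_padding(address: int, alignment: int = 16) -> int:
    """Bytes of padding needed to reach the next multiple of `alignment`."""
    return (-address) % alignment


def padding_nops(address: int, alignment: int = 16, insn_size: int = 4) -> int:
    """Number of nop instructions that padding corresponds to.

    Assumes fixed-width instructions of `insn_size` bytes.
    """
    return balign_padding(address, alignment) // insn_size


if __name__ == "__main__":
    # Hypothetical label addresses, chosen only to show each possible case.
    for addr in (0x1000, 0x1004, 0x1008, 0x100C):
        print(hex(addr), balign_padding(addr), padding_nops(addr))
```

Because instructions are already 4-byte aligned, the padding in front of each loop is between zero and three nops.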


* Re: [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-08-04  6:53 [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy() Anton Blanchard
@ 2016-08-04  7:49 ` Christophe Leroy
  2016-08-04 10:36   ` Anton Blanchard
  2016-08-05 11:00 ` Nicholas Piggin
  2016-10-05  2:36 ` Michael Ellerman
  2 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2016-08-04  7:49 UTC
  To: Anton Blanchard, benh, paulus, mpe, agraf; +Cc: linuxppc-dev


On 04/08/2016 at 08:53, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
>
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
>
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Shouldn't this patch be titled powerpc/64, as powerpc32 has a different
memset()?

Christophe



* Re: [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-08-04  7:49 ` Christophe Leroy
@ 2016-08-04 10:36   ` Anton Blanchard
  0 siblings, 0 replies; 8+ messages in thread
From: Anton Blanchard @ 2016-08-04 10:36 UTC
  To: Christophe Leroy; +Cc: benh, paulus, mpe, agraf, linuxppc-dev

Hi Christophe,

> > Align the hot loops in our assembly implementation of memset()
> > and backwards_memcpy().
> >
> > backwards_memcpy() is called from tcp_v4_rcv(), so we might
> > want to optimise this a little more.
> >
> > Signed-off-by: Anton Blanchard <anton@samba.org>  
> 
> Shouldn't this patch be titled powerpc/64, as powerpc32 has a
> different memset() ?

Yeah, good point. Michael, can you make this change if you choose to
merge it?

Anton



* Re: [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-08-04  6:53 [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy() Anton Blanchard
  2016-08-04  7:49 ` Christophe Leroy
@ 2016-08-05 11:00 ` Nicholas Piggin
  2016-08-05 11:54   ` Anton Blanchard
  2016-09-25 11:36   ` Anton Blanchard
  2016-10-05  2:36 ` Michael Ellerman
  2 siblings, 2 replies; 8+ messages in thread
From: Nicholas Piggin @ 2016-08-05 11:00 UTC
  To: Anton Blanchard; +Cc: benh, paulus, mpe, agraf, linuxppc-dev

On Thu,  4 Aug 2016 16:53:22 +1000
Anton Blanchard <anton@ozlabs.org> wrote:

> From: Anton Blanchard <anton@samba.org>
> 
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
> 
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/lib/mem_64.S | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
> index 43435c6..eda7a96 100644
> --- a/arch/powerpc/lib/mem_64.S
> +++ b/arch/powerpc/lib/mem_64.S
> @@ -37,6 +37,7 @@ _GLOBAL(memset)
>  	clrldi	r5,r5,58
>  	mtctr	r0
>  	beq	5f
> +	.balign 16
>  4:	std	r4,0(r6)
>  	std	r4,8(r6)
>  	std	r4,16(r6)

Hmm. If we execute this loop once, we'll only fetch additional nops. Twice, and
we make up for them by not fetching unused instructions. More than twice and we
may start winning.

For large sizes it probably helps, but I'd like to see what sizes memset sees.
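
One rough way to reason about the trade-off described above is to count how many aligned fetch groups a loop body touches on each pass. The sketch below is a toy model only: it assumes instructions are fetched in aligned 16-byte groups, which is an illustrative assumption and not a statement about any particular POWER core, and `LOOP_BYTES` is a hypothetical loop length rather than a value read from mem_64.S.

```python
LOOP_BYTES = 40  # hypothetical loop body: ten 4-byte instructions


def fetch_groups(start: int, length: int, group: int = 16) -> int:
    """Count the aligned `group`-byte blocks overlapped by [start, start + length)."""
    first = start // group
    last = (start + length - 1) // group
    return last - first + 1


if __name__ == "__main__":
    # Offsets of the loop label within a 16-byte group; 0 is the aligned case.
    for offset in (0, 4, 8, 12):
        print(offset, fetch_groups(offset, LOOP_BYTES))
```

In this model a loop that starts 12 bytes into a group touches four groups per pass instead of three, so the saving from alignment accumulates with every iteration while the nops are only paid for once on entry.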


* Re: [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-08-05 11:00 ` Nicholas Piggin
@ 2016-08-05 11:54   ` Anton Blanchard
  2016-09-25 11:36   ` Anton Blanchard
  1 sibling, 0 replies; 8+ messages in thread
From: Anton Blanchard @ 2016-08-05 11:54 UTC
  To: Nicholas Piggin; +Cc: benh, paulus, mpe, agraf, linuxppc-dev

Hi Nick,

> Hmm. If we execute this loop once, we'll only fetch additional nops.
> Twice, and we make up for them by not fetching unused instructions.
> More than twice and we may start winning.
> 
> For large sizes it probably helps, but I'd like to see what sizes
> memset sees.

I found this in a trace of nginx web serving. Looking back at it,
get_empty_filp() zeros a struct file, and we go through the loop 4
times. We might want to look more generally at what lengths memset() is
called with though.

Anton


* Re: [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-08-05 11:00 ` Nicholas Piggin
  2016-08-05 11:54   ` Anton Blanchard
@ 2016-09-25 11:36   ` Anton Blanchard
  2016-09-27 19:03     ` Nicholas Piggin
  1 sibling, 1 reply; 8+ messages in thread
From: Anton Blanchard @ 2016-09-25 11:36 UTC
  To: Nicholas Piggin; +Cc: benh, paulus, mpe, agraf, linuxppc-dev

Hi Nick,

> Hmm. If we execute this loop once, we'll only fetch additional nops.
> Twice, and we make up for them by not fetching unused instructions.
> More than twice and we may start winning.
> 
> For large sizes it probably helps, but I'd like to see what sizes
> memset sees.

I noticed this in an nginx web serving test. There are some 1 and 2
iteration calls, but quite a few larger ones - get_empty_filp() goes for
4 iterations and sk_prot_alloc() for 26 iterations.

Anton
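
To put the iteration counts into bytes, the sketch below converts between loop passes and memset() length. It assumes each pass of the hot loop stores 64 bytes, which is consistent with the `clrldi r5,r5,58` in the patch context keeping only the low six bits of the length as the remainder. It ignores any bytes handled by the alignment prologue, so the sizes are approximate, and the helper names are hypothetical.

```python
BYTES_PER_ITERATION = 64  # assumption: one loop pass stores 64 bytes


def loop_iterations(length: int) -> int:
    """Full passes of the hot loop for a memset() of `length` bytes."""
    return length // BYTES_PER_ITERATION


def tail_bytes(length: int) -> int:
    """Bytes left over for the code after the loop (the low six bits)."""
    return length % BYTES_PER_ITERATION


def bytes_covered(iterations: int) -> int:
    """Bytes stored by the given number of loop passes."""
    return iterations * BYTES_PER_ITERATION


if __name__ == "__main__":
    for name, passes in (("get_empty_filp", 4), ("sk_prot_alloc", 26)):
        print(name, passes, bytes_covered(passes))
```

Under that assumption the 4-iteration call clears roughly 256 bytes and the 26-iteration call roughly 1664 bytes.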


* Re: [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-09-25 11:36   ` Anton Blanchard
@ 2016-09-27 19:03     ` Nicholas Piggin
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2016-09-27 19:03 UTC
  To: Anton Blanchard; +Cc: benh, paulus, mpe, agraf, linuxppc-dev

On Sun, 25 Sep 2016 21:36:59 +1000
Anton Blanchard <anton@samba.org> wrote:

> Hi Nick,
> 
> > Hmm. If we execute this loop once, we'll only fetch additional nops.
> > Twice, and we make up for them by not fetching unused instructions.
> > More than twice and we may start winning.
> > 
> > For large sizes it probably helps, but I'd like to see what sizes
> > memset sees.  
> 
> I noticed this in an nginx web serving test. There are some 1 and 2
> iteration calls, but quite a few larger ones - get_empty_filp() goes for
> 4 iterations and sk_prot_alloc() for 26 iterations.

Hi Anton,

I didn't have anything against the patch as such, I just wondered if
it's likely to be an overall win.

Thanks,
Nick


* Re: powerpc: Align hot loops of memset() and backwards_memcpy()
  2016-08-04  6:53 [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy() Anton Blanchard
  2016-08-04  7:49 ` Christophe Leroy
  2016-08-05 11:00 ` Nicholas Piggin
@ 2016-10-05  2:36 ` Michael Ellerman
  2 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2016-10-05  2:36 UTC
  To: Anton Blanchard, benh, paulus, agraf; +Cc: linuxppc-dev

On Thu, 2016-08-04 at 06:53:22 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
> 
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/12ab11a2c09b30c1938c8e82e53908

cheers



Thread overview: 8 messages
2016-08-04  6:53 [PATCH] powerpc: Align hot loops of memset() and backwards_memcpy() Anton Blanchard
2016-08-04  7:49 ` Christophe Leroy
2016-08-04 10:36   ` Anton Blanchard
2016-08-05 11:00 ` Nicholas Piggin
2016-08-05 11:54   ` Anton Blanchard
2016-09-25 11:36   ` Anton Blanchard
2016-09-27 19:03     ` Nicholas Piggin
2016-10-05  2:36 ` Michael Ellerman
