* [PATCH] powerpc/32: Optimise __csum_partial()
@ 2018-05-24 11:22 Christophe Leroy
  2018-05-24 19:58 ` Segher Boessenkool
  2018-06-04 14:11 ` Michael Ellerman
  0 siblings, 2 replies; 3+ messages in thread
From: Christophe Leroy @ 2018-05-24 11:22 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, segher
  Cc: linux-kernel, linuxppc-dev

Improve __csum_partial by interleaving loads and adds.

On an 8xx, it brings neither improvement nor degradation.
On an 83xx, it brings a 25% improvement.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/lib/checksum_32.S | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index d2238ea82209..aa224069f93a 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -47,16 +47,25 @@ _GLOBAL(__csum_partial)
 	bdnz	2b
 21:	srwi.	r6,r4,4		/* # blocks of 4 words to do */
 	beq	3f
+	lwz	r0,4(r3)
 	mtctr	r6
-22:	lwz	r0,4(r3)
 	lwz	r6,8(r3)
+	adde	r5,r5,r0
 	lwz	r7,12(r3)
+	adde	r5,r5,r6
 	lwzu	r8,16(r3)
+	adde	r5,r5,r7
+	bdz	23f
+22:	lwz	r0,4(r3)
+	adde	r5,r5,r8
+	lwz	r6,8(r3)
 	adde	r5,r5,r0
+	lwz	r7,12(r3)
 	adde	r5,r5,r6
+	lwzu	r8,16(r3)
 	adde	r5,r5,r7
-	adde	r5,r5,r8
 	bdnz	22b
+23:	adde	r5,r5,r8
 3:	andi.	r0,r4,2
 	beq+	4f
 	lhz	r0,4(r3)
-- 
2.13.3
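
Editorial note: the reordering in the hunk above can be modelled in C to make the change in loop shape easier to follow. This is only a sketch, not the kernel code. The names adde(), sum_blocks_plain() and sum_blocks_interleaved() are hypothetical and exist only for this illustration. The helper adde() imitates the add-with-carry behaviour of the PowerPC instruction of the same name. A C compiler is free to reschedule these statements, so the sketch shows the order of operations and the equivalence of the two shapes, and says nothing about speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum = a + b + carry_in, and the carry is updated,
 * in the manner of the PowerPC adde instruction. */
static uint32_t adde(uint32_t a, uint32_t b, unsigned *carry)
{
	uint64_t t = (uint64_t)a + b + *carry;

	*carry = (unsigned)(t >> 32);
	return (uint32_t)t;
}

/* Shape before the patch: four loads, then four adds, per 4-word block. */
static uint32_t sum_blocks_plain(const uint32_t *p, size_t nblocks,
				 uint32_t sum, unsigned *carry)
{
	while (nblocks--) {
		uint32_t w0 = p[0], w1 = p[1], w2 = p[2], w3 = p[3];

		p += 4;
		sum = adde(sum, w0, carry);
		sum = adde(sum, w1, carry);
		sum = adde(sum, w2, carry);
		sum = adde(sum, w3, carry);
	}
	return sum;
}

/* Shape after the patch: each add is placed between two loads, and the
 * add of the last word of a block is deferred to the next iteration
 * (or to the epilogue at label 23 in the assembly). */
static uint32_t sum_blocks_interleaved(const uint32_t *p, size_t nblocks,
				       uint32_t sum, unsigned *carry)
{
	uint32_t w0, w1, w2, w3;

	if (nblocks == 0)		/* beq 3f */
		return sum;

	/* Prologue: first block, last add still pending. */
	w0 = p[0];
	w1 = p[1];
	sum = adde(sum, w0, carry);
	w2 = p[2];
	sum = adde(sum, w1, carry);
	w3 = p[3];
	p += 4;				/* lwzu updates the pointer */
	sum = adde(sum, w2, carry);

	while (--nblocks) {		/* bdz 23f, then bdnz 22b */
		w0 = p[0];
		sum = adde(sum, w3, carry);	/* pending add from previous block */
		w1 = p[1];
		sum = adde(sum, w0, carry);
		w2 = p[2];
		sum = adde(sum, w1, carry);
		w3 = p[3];
		p += 4;
		sum = adde(sum, w2, carry);
	}

	/* Epilogue (label 23): add the last word that was loaded. */
	return adde(sum, w3, carry);
}
```

Both functions perform the same adds in the same order, so the carry chain is unchanged. Only the position of the loads relative to the adds differs.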


* Re: [PATCH] powerpc/32: Optimise __csum_partial()
From: Segher Boessenkool @ 2018-05-24 19:58 UTC
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Thu, May 24, 2018 at 11:22:27AM +0000, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
> 
> On an 8xx, it brings neither improvement nor degradation.
> On an 83xx, it brings a 25% improvement.

Thanks!  Looks fine to me.

> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

> ---
>  arch/powerpc/lib/checksum_32.S | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index d2238ea82209..aa224069f93a 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -47,16 +47,25 @@ _GLOBAL(__csum_partial)
>  	bdnz	2b
>  21:	srwi.	r6,r4,4		/* # blocks of 4 words to do */
>  	beq	3f
> +	lwz	r0,4(r3)
>  	mtctr	r6
> -22:	lwz	r0,4(r3)
>  	lwz	r6,8(r3)
> +	adde	r5,r5,r0
>  	lwz	r7,12(r3)
> +	adde	r5,r5,r6
>  	lwzu	r8,16(r3)
> +	adde	r5,r5,r7
> +	bdz	23f
> +22:	lwz	r0,4(r3)
> +	adde	r5,r5,r8
> +	lwz	r6,8(r3)
>  	adde	r5,r5,r0
> +	lwz	r7,12(r3)
>  	adde	r5,r5,r6
> +	lwzu	r8,16(r3)
>  	adde	r5,r5,r7
> -	adde	r5,r5,r8
>  	bdnz	22b
> +23:	adde	r5,r5,r8
>  3:	andi.	r0,r4,2
>  	beq+	4f
>  	lhz	r0,4(r3)
> -- 
> 2.13.3


* Re: powerpc/32: Optimise __csum_partial()
From: Michael Ellerman @ 2018-06-04 14:11 UTC
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, segher
  Cc: linuxppc-dev, linux-kernel

On Thu, 2018-05-24 at 11:22:27 UTC, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
> 
> On an 8xx, it brings neither improvement nor degradation.
> On an 83xx, it brings a 25% improvement.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/373e098e1e788d7b89ec0f31765a6c

cheers
