linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] lib/raid6: arm: optimize away a mask operation in NEON recovery routine
@ 2019-02-26 11:36 Ard Biesheuvel
  2019-02-28 17:48 ` Catalin Marinas
  0 siblings, 1 reply; 2+ messages in thread
From: Ard Biesheuvel @ 2019-02-26 11:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, will.deacon, jeremy.linton, Ard Biesheuvel

The NEON recovery code was modeled after the x86 SIMD code, and for
some reason, that code uses a 16 bit wide signed shift and a mask to
perform what amounts to a 8 bit unsigned shift. So fold the ops
together.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 lib/raid6/recov_neon_inner.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/raid6/recov_neon_inner.c b/lib/raid6/recov_neon_inner.c
index 8cd20c9f834a..f80e8cead9cf 100644
--- a/lib/raid6/recov_neon_inner.c
+++ b/lib/raid6/recov_neon_inner.c
@@ -60,14 +60,14 @@ void __raid6_2data_recov_neon(int bytes, uint8_t *p, uint8_t *q, uint8_t *dp,
 		px = veorq_u8(vld1q_u8(p), vld1q_u8(dp));
 		vx = veorq_u8(vld1q_u8(q), vld1q_u8(dq));
 
-		vy = (uint8x16_t)vshrq_n_s16((int16x8_t)vx, 4);
+		vy = vshrq_n_u8(vx, 4);
 		vx = vqtbl1q_u8(qm0, vandq_u8(vx, x0f));
-		vy = vqtbl1q_u8(qm1, vandq_u8(vy, x0f));
+		vy = vqtbl1q_u8(qm1, vy);
 		qx = veorq_u8(vx, vy);
 
-		vy = (uint8x16_t)vshrq_n_s16((int16x8_t)px, 4);
+		vy = vshrq_n_u8(px, 4);
 		vx = vqtbl1q_u8(pm0, vandq_u8(px, x0f));
-		vy = vqtbl1q_u8(pm1, vandq_u8(vy, x0f));
+		vy = vqtbl1q_u8(pm1, vy);
 		vx = veorq_u8(vx, vy);
 		db = veorq_u8(vx, qx);
 
@@ -100,9 +100,9 @@ void __raid6_datap_recov_neon(int bytes, uint8_t *p, uint8_t *q, uint8_t *dq,
 
 		vx = veorq_u8(vld1q_u8(q), vld1q_u8(dq));
 
-		vy = (uint8x16_t)vshrq_n_s16((int16x8_t)vx, 4);
+		vy = vshrq_n_u8(vx, 4);
 		vx = vqtbl1q_u8(qm0, vandq_u8(vx, x0f));
-		vy = vqtbl1q_u8(qm1, vandq_u8(vy, x0f));
+		vy = vqtbl1q_u8(qm1, vy);
 		vx = veorq_u8(vx, vy);
 		vy = veorq_u8(vx, vld1q_u8(p));
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] lib/raid6: arm: optimize away a mask operation in NEON recovery routine
  2019-02-26 11:36 [PATCH] lib/raid6: arm: optimize away a mask operation in NEON recovery routine Ard Biesheuvel
@ 2019-02-28 17:48 ` Catalin Marinas
  0 siblings, 0 replies; 2+ messages in thread
From: Catalin Marinas @ 2019-02-28 17:48 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: will.deacon, jeremy.linton, linux-arm-kernel

On Tue, Feb 26, 2019 at 12:36:18PM +0100, Ard Biesheuvel wrote:
> The NEON recovery code was modeled after the x86 SIMD code, and for
> some reason, that code uses a 16 bit wide signed shift and a mask to
> perform what amounts to a 8 bit unsigned shift. So fold the ops
> together.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  lib/raid6/recov_neon_inner.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)

Queued for 5.1. Thanks.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-02-28 17:48 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-26 11:36 [PATCH] lib/raid6: arm: optimize away a mask operation in NEON recovery routine Ard Biesheuvel
2019-02-28 17:48 ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).