* [PATCH] arm64:crc:accelerated-crc32-by-64bytes
@ 2018-11-19 7:29 Rui Sun
From: Rui Sun @ 2018-11-19 7:29 UTC
To: catalin.marinas
Cc: will.deacon, ard.biesheuvel, linux-arm-kernel, linux-kernel, Rui Sun
Add a 64-byte loop to accelerate the CRC32 calculation.
Signed-off-by: Rui Sun <sunrui26@huawei.com>
---
arch/arm64/lib/crc32.S | 54 ++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 50 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S
index 5bc1e85..2b37009 100644
--- a/arch/arm64/lib/crc32.S
+++ b/arch/arm64/lib/crc32.S
@@ -15,15 +15,61 @@
.cpu generic+crc
.macro __crc32, c
-0: subs x2, x2, #16
- b.mi 8f
+
+64: cmp x2, #64
+ b.lt 32f
+
+ adds x11, x1, #16
+ adds x12, x1, #32
+ adds x13, x1, #48
+
+0 : subs x2, x2, #64
+ b.mi 32f
+
+ ldp x3, x4, [x1], #64
+ ldp x5, x6, [x11], #64
+ ldp x7, x8, [x12], #64
+ ldp x9, x10,[x13], #64
+
+ CPU_BE( rev x3, x3 )
+ CPU_BE( rev x4, x4 )
+ CPU_BE( rev x5, x5 )
+ CPU_BE( rev x6, x6 )
+ CPU_BE( rev x7, x7 )
+ CPU_BE( rev x8, x8 )
+ CPU_BE( rev x9, x9 )
+ CPU_BE( rev x10,x10 )
+
+ crc32\c\()x w0, w0, x3
+ crc32\c\()x w0, w0, x4
+ crc32\c\()x w0, w0, x5
+ crc32\c\()x w0, w0, x6
+ crc32\c\()x w0, w0, x7
+ crc32\c\()x w0, w0, x8
+ crc32\c\()x w0, w0, x9
+ crc32\c\()x w0, w0, x10
+
+ b.ne 0b
+ ret
+
+32: tbz x2, #5, 16f
+ ldp x3, x4, [x1], #16
+ ldp x5, x6, [x1], #16
+CPU_BE( rev x3, x3 )
+CPU_BE( rev x4, x4 )
+CPU_BE( rev x5, x5 )
+CPU_BE( rev x6, x6 )
+ crc32\c\()x w0, w0, x3
+ crc32\c\()x w0, w0, x4
+ crc32\c\()x w0, w0, x5
+ crc32\c\()x w0, w0, x6
+
+16: tbz x2, #4, 8f
ldp x3, x4, [x1], #16
CPU_BE( rev x3, x3 )
CPU_BE( rev x4, x4 )
crc32\c\()x w0, w0, x3
crc32\c\()x w0, w0, x4
- b.ne 0b
- ret
8: tbz x2, #3, 4f
ldr x3, [x1], #8
--
1.8.3.1
* Re: [PATCH] arm64:crc:accelerated-crc32-by-64bytes
From: Ard Biesheuvel @ 2018-11-19 16:11 UTC
To: sunrui26
Cc: Catalin Marinas, Will Deacon, linux-arm-kernel,
Linux Kernel Mailing List
On Sun, 18 Nov 2018 at 23:30, Rui Sun <sunrui26@huawei.com> wrote:
>
> Add a 64-byte loop to accelerate the CRC32 calculation.
>
Can you share some performance numbers, please?
Also, we don't need separate 64-byte, 32-byte and 16-byte code paths: just
make the 8-byte path a loop as well, and drop the 32-byte and 16-byte ones.
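For illustration only, the control flow suggested above can be modelled in C
roughly as follows. This is a sketch, not the kernel code: crc32_byte(),
crc32_step8() and crc32_model() are hypothetical names, and the bitwise
routine stands in for the crc32x instruction on a little-endian system
(reflected polynomial 0xEDB88320, no pre- or post-inversion).

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise model of one CRC32 byte step (reflected polynomial 0xEDB88320). */
static uint32_t crc32_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Hypothetical stand-in for crc32x: fold 8 bytes, lowest address first. */
static uint32_t crc32_step8(uint32_t crc, const uint8_t *p)
{
    for (int i = 0; i < 8; i++)
        crc = crc32_byte(crc, p[i]);
    return crc;
}

uint32_t crc32_model(uint32_t crc, const uint8_t *p, size_t len)
{
    /* Main loop: 64 bytes (eight 8-byte steps) per iteration. */
    while (len >= 64) {
        for (int i = 0; i < 8; i++)
            crc = crc32_step8(crc, p + 8 * i);
        p += 64;
        len -= 64;
    }

    /* One 8-byte loop replaces the separate 32-byte and 16-byte paths. */
    while (len >= 8) {
        crc = crc32_step8(crc, p);
        p += 8;
        len -= 8;
    }

    /* Remaining 0..7 bytes. */
    while (len--)
        crc = crc32_byte(crc, *p++);

    return crc;
}
```

With only two loops, at most seven 8-byte iterations run after the 64-byte
loop, so the extra 32-byte and 16-byte blocks save little while adding code.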
> Signed-off-by: Rui Sun <sunrui26@huawei.com>
> ---
> arch/arm64/lib/crc32.S | 54 ++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S
> index 5bc1e85..2b37009 100644
> --- a/arch/arm64/lib/crc32.S
> +++ b/arch/arm64/lib/crc32.S
> @@ -15,15 +15,61 @@
> .cpu generic+crc
>
> .macro __crc32, c
> -0: subs x2, x2, #16
> - b.mi 8f
> +
> +64: cmp x2, #64
> + b.lt 32f
> +
> + adds x11, x1, #16
> + adds x12, x1, #32
> + adds x13, x1, #48
> +
> +0 : subs x2, x2, #64
> + b.mi 32f
> +
> + ldp x3, x4, [x1], #64
> + ldp x5, x6, [x11], #64
> + ldp x7, x8, [x12], #64
> + ldp x9, x10,[x13], #64
> +
Can we do this instead, and get rid of the temporary registers (x11-x13)?
ldp x3, x4, [x1], #64
ldp x5, x6, [x1, #-48]
ldp x7, x8, [x1, #-32]
ldp x9, x10,[x1, #-16]
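As a rough C model of why the two load sequences are equivalent (a sketch
with hypothetical helper names, covering a single loop iteration): after the
first post-indexed load the pointer has already advanced by 64, so offsets
-48, -32 and -16 address the same bytes as the original +16, +32 and +48.

```c
#include <stdint.h>
#include <string.h>

/* Models one ldp: load 16 bytes as two 64-bit words. */
static void load_pair(const uint8_t *p, uint64_t *out)
{
    memcpy(out, p, 16);
}

/* As posted: x1 plus three extra pointers (x11, x12, x13). */
static const uint8_t *load64_four_ptrs(const uint8_t *p, uint64_t w[8])
{
    const uint8_t *p11 = p + 16, *p12 = p + 32, *p13 = p + 48;

    load_pair(p,   &w[0]);
    load_pair(p11, &w[2]);
    load_pair(p12, &w[4]);
    load_pair(p13, &w[6]);
    return p + 64;          /* every pointer is advanced by 64 */
}

/* As suggested: one pointer, post-increment, then negative offsets. */
static const uint8_t *load64_one_ptr(const uint8_t *p, uint64_t w[8])
{
    load_pair(p, &w[0]);
    p += 64;                /* models the post-index writeback */
    load_pair(p - 48, &w[2]);
    load_pair(p - 32, &w[4]);
    load_pair(p - 16, &w[6]);
    return p;
}
```

Both helpers fill w[] with the same eight words and return the same advanced
pointer; the second needs no registers beyond x1.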
> + CPU_BE( rev x3, x3 )
> + CPU_BE( rev x4, x4 )
> + CPU_BE( rev x5, x5 )
> + CPU_BE( rev x6, x6 )
> + CPU_BE( rev x7, x7 )
> + CPU_BE( rev x8, x8 )
> + CPU_BE( rev x9, x9 )
> + CPU_BE( rev x10,x10 )
> +
> + crc32\c\()x w0, w0, x3
> + crc32\c\()x w0, w0, x4
> + crc32\c\()x w0, w0, x5
> + crc32\c\()x w0, w0, x6
> + crc32\c\()x w0, w0, x7
> + crc32\c\()x w0, w0, x8
> + crc32\c\()x w0, w0, x9
> + crc32\c\()x w0, w0, x10
> +
> + b.ne 0b
> + ret
> +
> +32: tbz x2, #5, 16f
> + ldp x3, x4, [x1], #16
> + ldp x5, x6, [x1], #16
> +CPU_BE( rev x3, x3 )
> +CPU_BE( rev x4, x4 )
> +CPU_BE( rev x5, x5 )
> +CPU_BE( rev x6, x6 )
> + crc32\c\()x w0, w0, x3
> + crc32\c\()x w0, w0, x4
> + crc32\c\()x w0, w0, x5
> + crc32\c\()x w0, w0, x6
> +
> +16: tbz x2, #4, 8f
> ldp x3, x4, [x1], #16
> CPU_BE( rev x3, x3 )
> CPU_BE( rev x4, x4 )
> crc32\c\()x w0, w0, x3
> crc32\c\()x w0, w0, x4
> - b.ne 0b
> - ret
>
> 8: tbz x2, #3, 4f
> ldr x3, [x1], #8
> --
> 1.8.3.1
>