* [PATCH] md/raid6: delta syndrome for ARM NEON
@ 2015-06-24 16:43 Ard Biesheuvel
2015-06-25 6:32 ` AW: " Markus Stockhausen
2015-06-29 1:32 ` NeilBrown
0 siblings, 2 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2015-06-24 16:43 UTC (permalink / raw)
To: linux-arm-kernel
This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.
Relative performance on a Cortex-A57 based system:
raid6: int64x1 gen() 905 MB/s
raid6: int64x1 xor() 881 MB/s
raid6: int64x2 gen() 1343 MB/s
raid6: int64x2 xor() 1286 MB/s
raid6: int64x4 gen() 1896 MB/s
raid6: int64x4 xor() 1321 MB/s
raid6: int64x8 gen() 1773 MB/s
raid6: int64x8 xor() 1165 MB/s
raid6: neonx1 gen() 1834 MB/s
raid6: neonx1 xor() 1278 MB/s
raid6: neonx2 gen() 2528 MB/s
raid6: neonx2 xor() 1942 MB/s
raid6: neonx4 gen() 2888 MB/s
raid6: neonx4 xor() 2334 MB/s
raid6: neonx8 gen() 2957 MB/s
raid6: neonx8 xor() 2232 MB/s
raid6: using algorithm neonx8 gen() 2957 MB/s
raid6: .... xor() 2232 MB/s, rmw enabled
Cc: Markus Stockhausen <stockhausen@collogia.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
lib/raid6/neon.c | 13 ++++++++++++-
lib/raid6/neon.uc | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/lib/raid6/neon.c b/lib/raid6/neon.c
index d9ad6ee284f4..7076ef1ba3dd 100644
--- a/lib/raid6/neon.c
+++ b/lib/raid6/neon.c
@@ -40,9 +40,20 @@
 				(unsigned long)bytes, ptrs); \
 		kernel_neon_end(); \
 	} \
+	static void raid6_neon ## _n ## _xor_syndrome(int disks, \
+					int start, int stop, \
+					size_t bytes, void **ptrs) \
+	{ \
+		void raid6_neon ## _n ## _xor_syndrome_real(int, \
+				int, int, unsigned long, void**); \
+		kernel_neon_begin(); \
+		raid6_neon ## _n ## _xor_syndrome_real(disks, \
+			start, stop, (unsigned long)bytes, ptrs); \
+		kernel_neon_end(); \
+	} \
 	struct raid6_calls const raid6_neonx ## _n = { \
 		raid6_neon ## _n ## _gen_syndrome, \
-		NULL, /* XOR not yet implemented */ \
+		raid6_neon ## _n ## _xor_syndrome, \
 		raid6_have_neon, \
 		"neonx" #_n, \
 		0 \
diff --git a/lib/raid6/neon.uc b/lib/raid6/neon.uc
index 1b9ed793342d..4fa51b761dd0 100644
--- a/lib/raid6/neon.uc
+++ b/lib/raid6/neon.uc
@@ -3,6 +3,7 @@
* neon.uc - RAID-6 syndrome calculation using ARM NEON instructions
*
* Copyright (C) 2012 Rob Herring
+ * Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* Based on altivec.uc:
* Copyright 2002-2004 H. Peter Anvin - All Rights Reserved
@@ -78,3 +79,48 @@ void raid6_neon$#_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs)
 		vst1q_u8(&q[d+NSIZE*$$], wq$$);
 	}
 }
+
+void raid6_neon$#_xor_syndrome_real(int disks, int start, int stop,
+				    unsigned long bytes, void **ptrs)
+{
+	uint8_t **dptr = (uint8_t **)ptrs;
+	uint8_t *p, *q;
+	int d, z, z0;
+
+	register unative_t wd$$, wq$$, wp$$, w1$$, w2$$;
+	const unative_t x1d = NBYTES(0x1d);
+
+	z0 = stop;		/* P/Q right side optimization */
+	p = dptr[disks-2];	/* XOR parity */
+	q = dptr[disks-1];	/* RS syndrome */
+
+	for ( d = 0 ; d < bytes ; d += NSIZE*$# ) {
+		wq$$ = vld1q_u8(&dptr[z0][d+$$*NSIZE]);
+		wp$$ = veorq_u8(vld1q_u8(&p[d+$$*NSIZE]), wq$$);
+
+		/* P/Q data pages */
+		for ( z = z0-1 ; z >= start ; z-- ) {
+			wd$$ = vld1q_u8(&dptr[z][d+$$*NSIZE]);
+			wp$$ = veorq_u8(wp$$, wd$$);
+			w2$$ = MASK(wq$$);
+			w1$$ = SHLBYTE(wq$$);
+
+			w2$$ = vandq_u8(w2$$, x1d);
+			w1$$ = veorq_u8(w1$$, w2$$);
+			wq$$ = veorq_u8(w1$$, wd$$);
+		}
+		/* P/Q left side optimization */
+		for ( z = start-1 ; z >= 0 ; z-- ) {
+			w2$$ = MASK(wq$$);
+			w1$$ = SHLBYTE(wq$$);
+
+			w2$$ = vandq_u8(w2$$, x1d);
+			wq$$ = veorq_u8(w1$$, w2$$);
+		}
+		w1$$ = vld1q_u8(&q[d+NSIZE*$$]);
+		wq$$ = veorq_u8(wq$$, w1$$);
+
+		vst1q_u8(&p[d+NSIZE*$$], wp$$);
+		vst1q_u8(&q[d+NSIZE*$$], wq$$);
+	}
+}
--
1.9.1
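
For readers following the loop structure, here is a byte-at-a-time C
sketch of the same update. It is not part of the patch: gf_mul2() and
xor_syndrome_ref() are hypothetical names, and the scalar mask-and-shift
is an assumed stand-in for what MASK() and SHLBYTE() do per vector lane.
It assumes 0 <= start <= stop < disks-2 and the usual RAID-6 field
polynomial 0x11d.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply one byte by 2 in GF(2^8), reducing by 0x1d on overflow. */
static uint8_t gf_mul2(uint8_t v)
{
	uint8_t mask = (v & 0x80) ? 0xff : 0x00;	/* assumed scalar MASK() */

	return (uint8_t)((v << 1) ^ (mask & 0x1d));	/* shift, then reduce */
}

/*
 * Fold the pages ptrs[start..stop] into the existing P (ptrs[disks-2])
 * and Q (ptrs[disks-1]), one byte at a time.
 */
static void xor_syndrome_ref(int disks, int start, int stop,
			     size_t bytes, uint8_t **ptrs)
{
	uint8_t *p = ptrs[disks - 2];
	uint8_t *q = ptrs[disks - 1];

	assert(start >= 0 && start <= stop && stop < disks - 2);

	for (size_t d = 0; d < bytes; d++) {
		/* Right side: pages above 'stop' contribute nothing. */
		uint8_t wq = ptrs[stop][d];
		uint8_t wp = p[d] ^ wq;

		/* Pages start..stop-1: accumulate P and Q (Horner form). */
		for (int z = stop - 1; z >= start; z--) {
			uint8_t wd = ptrs[z][d];

			wp ^= wd;
			wq = gf_mul2(wq) ^ wd;
		}
		/* Left side: only keep multiplying Q by 2. */
		for (int z = start - 1; z >= 0; z--)
			wq = gf_mul2(wq);

		p[d] = wp;
		q[d] ^= wq;
	}
}
```

With two one-byte data pages holding 0x01 and 0x80 and zeroed P/Q, one
call leaves P = 0x81 and Q = 0x1c. A second call with the same pages
restores both to zero, since the update is a pure XOR.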
* AW: [PATCH] md/raid6: delta syndrome for ARM NEON
2015-06-24 16:43 [PATCH] md/raid6: delta syndrome for ARM NEON Ard Biesheuvel
@ 2015-06-25 6:32 ` Markus Stockhausen
2015-06-25 8:30 ` Ard Biesheuvel
2015-06-29 1:32 ` NeilBrown
1 sibling, 1 reply; 6+ messages in thread
From: Markus Stockhausen @ 2015-06-25 6:32 UTC (permalink / raw)
To: linux-arm-kernel
> From: Ard Biesheuvel [ard.biesheuvel at linaro.org]
> Sent: Wednesday, 24 June 2015 18:43
> To: linux-arm-kernel at lists.infradead.org; hpa at zytor.com
> Cc: Ard Biesheuvel; Markus Stockhausen; Neil Brown
> Subject: [PATCH] md/raid6: delta syndrome for ARM NEON
>
> This implements XOR syndrome calculation using NEON intrinsics.
> As before, the module can be built for ARM and arm64 from the
> same source.
> Relative performance on a Cortex-A57 based system:
>
> raid6: int64x1 gen() 905 MB/s
> raid6: int64x1 xor() 881 MB/s
> raid6: int64x2 gen() 1343 MB/s
> raid6: int64x2 xor() 1286 MB/s
> raid6: int64x4 gen() 1896 MB/s
> raid6: int64x4 xor() 1321 MB/s
> raid6: int64x8 gen() 1773 MB/s
> raid6: int64x8 xor() 1165 MB/s
> raid6: neonx1 gen() 1834 MB/s
> raid6: neonx1 xor() 1278 MB/s
> raid6: neonx2 gen() 2528 MB/s
> raid6: neonx2 xor() 1942 MB/s
> raid6: neonx4 gen() 2888 MB/s
> raid6: neonx4 xor() 2334 MB/s
> raid6: neonx8 gen() 2957 MB/s
> raid6: neonx8 xor() 2232 MB/s
> raid6: using algorithm neonx8 gen() 2957 MB/s
> raid6: .... xor() 2232 MB/s, rmw enabled
>
Nice to see that the placeholders get filled.
Did you have a chance to do some real tests?
Markus
>
> Cc: Markus Stockhausen <stockhausen@collogia.de>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> ...
* [PATCH] md/raid6: delta syndrome for ARM NEON
2015-06-25 6:32 ` AW: " Markus Stockhausen
@ 2015-06-25 8:30 ` Ard Biesheuvel
2015-06-25 8:50 ` Ard Biesheuvel
2015-06-27 19:54 ` AW: " Markus Stockhausen
0 siblings, 2 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2015-06-25 8:30 UTC (permalink / raw)
To: linux-arm-kernel
On 25 June 2015 at 08:32, Markus Stockhausen <stockhausen@collogia.de> wrote:
>> From: Ard Biesheuvel [ard.biesheuvel at linaro.org]
>> Sent: Wednesday, 24 June 2015 18:43
>> To: linux-arm-kernel at lists.infradead.org; hpa at zytor.com
>> Cc: Ard Biesheuvel; Markus Stockhausen; Neil Brown
>> Subject: [PATCH] md/raid6: delta syndrome for ARM NEON
>>
>> This implements XOR syndrome calculation using NEON intrinsics.
>> As before, the module can be built for ARM and arm64 from the
>> same source.
>
>> Relative performance on a Cortex-A57 based system:
>>
>> raid6: int64x1 gen() 905 MB/s
>> raid6: int64x1 xor() 881 MB/s
>> raid6: int64x2 gen() 1343 MB/s
>> raid6: int64x2 xor() 1286 MB/s
>> raid6: int64x4 gen() 1896 MB/s
>> raid6: int64x4 xor() 1321 MB/s
>> raid6: int64x8 gen() 1773 MB/s
>> raid6: int64x8 xor() 1165 MB/s
>> raid6: neonx1 gen() 1834 MB/s
>> raid6: neonx1 xor() 1278 MB/s
>> raid6: neonx2 gen() 2528 MB/s
>> raid6: neonx2 xor() 1942 MB/s
>> raid6: neonx4 gen() 2888 MB/s
>> raid6: neonx4 xor() 2334 MB/s
>> raid6: neonx8 gen() 2957 MB/s
>> raid6: neonx8 xor() 2232 MB/s
>> raid6: using algorithm neonx8 gen() 2957 MB/s
>> raid6: .... xor() 2232 MB/s, rmw enabled
>>
>
> Nice to see that the placeholders get filled.
> Did you have a chance to do some real tests?
>
Hello Markus,
I haven't done any real world testing yet. Can you recommend any test
tools or test suites in particular?
Thanks,
Ard.
* [PATCH] md/raid6: delta syndrome for ARM NEON
2015-06-25 8:30 ` Ard Biesheuvel
@ 2015-06-25 8:50 ` Ard Biesheuvel
2015-06-27 19:54 ` AW: " Markus Stockhausen
1 sibling, 0 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2015-06-25 8:50 UTC (permalink / raw)
To: linux-arm-kernel
On 25 June 2015 at 10:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 25 June 2015 at 08:32, Markus Stockhausen <stockhausen@collogia.de> wrote:
>>> From: Ard Biesheuvel [ard.biesheuvel at linaro.org]
>>> Sent: Wednesday, 24 June 2015 18:43
>>> To: linux-arm-kernel at lists.infradead.org; hpa at zytor.com
>>> Cc: Ard Biesheuvel; Markus Stockhausen; Neil Brown
>>> Subject: [PATCH] md/raid6: delta syndrome for ARM NEON
>>>
>>> This implements XOR syndrome calculation using NEON intrinsics.
>>> As before, the module can be built for ARM and arm64 from the
>>> same source.
>>
>>> Relative performance on a Cortex-A57 based system:
>>>
>>> raid6: int64x1 gen() 905 MB/s
>>> raid6: int64x1 xor() 881 MB/s
>>> raid6: int64x2 gen() 1343 MB/s
>>> raid6: int64x2 xor() 1286 MB/s
>>> raid6: int64x4 gen() 1896 MB/s
>>> raid6: int64x4 xor() 1321 MB/s
>>> raid6: int64x8 gen() 1773 MB/s
>>> raid6: int64x8 xor() 1165 MB/s
>>> raid6: neonx1 gen() 1834 MB/s
>>> raid6: neonx1 xor() 1278 MB/s
>>> raid6: neonx2 gen() 2528 MB/s
>>> raid6: neonx2 xor() 1942 MB/s
>>> raid6: neonx4 gen() 2888 MB/s
>>> raid6: neonx4 xor() 2334 MB/s
>>> raid6: neonx8 gen() 2957 MB/s
>>> raid6: neonx8 xor() 2232 MB/s
>>> raid6: using algorithm neonx8 gen() 2957 MB/s
>>> raid6: .... xor() 2232 MB/s, rmw enabled
>>>
>>
>> Nice to see that the placeholders get filled.
>> Did you have a chance to do some real tests?
>>
>
> Hello Markus,
>
> I haven't done any real world testing yet. Can you recommend any test
> tools or test suites in particular?
>
I am assuming you are asking about benchmarks, right? The code passes
the raid6test tests, so the correctness part should be covered
(although more testing is always better, of course).
--
Ard.
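
As an illustration of the kind of property such a correctness test can
exercise (this is not the raid6test code), the sketch below checks that
a delta update of P/Q gives the same result as regenerating them from
scratch, for every start/stop range. The geometry (NDATA, LEN), the
helper names and the pseudo-data are all assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NDATA 4		/* hypothetical: 4 data pages plus P and Q */
#define LEN   16	/* hypothetical page length in bytes */

/* Multiply by 2 in GF(2^8) with reduction constant 0x1d. */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Full recomputation: P = XOR of all pages, Q = sum of 2^z * D_z. */
static void gen_ref(uint8_t data[NDATA][LEN], uint8_t *p, uint8_t *q)
{
	for (size_t d = 0; d < LEN; d++) {
		uint8_t wp = 0, wq = 0;

		for (int z = NDATA - 1; z >= 0; z--) {
			wp ^= data[z][d];
			wq = gf_mul2(wq) ^ data[z][d];
		}
		p[d] = wp;
		q[d] = wq;
	}
}

/* Delta update over pages start..stop, structured like the patch's loop. */
static void xor_ref(uint8_t delta[NDATA][LEN], int start, int stop,
		    uint8_t *p, uint8_t *q)
{
	for (size_t d = 0; d < LEN; d++) {
		uint8_t wq = delta[stop][d];
		uint8_t wp = p[d] ^ wq;

		for (int z = stop - 1; z >= start; z--) {
			wp ^= delta[z][d];
			wq = gf_mul2(wq) ^ delta[z][d];
		}
		for (int z = start - 1; z >= 0; z--)
			wq = gf_mul2(wq);
		p[d] = wp;
		q[d] ^= wq;
	}
}

/* Returns 1 if every start/stop range matches a full regeneration. */
static int check_all_ranges(void)
{
	uint8_t before[NDATA][LEN], after[NDATA][LEN], delta[NDATA][LEN];
	uint8_t p[LEN], q[LEN], p_full[LEN], q_full[LEN];

	for (int start = 0; start < NDATA; start++) {
		for (int stop = start; stop < NDATA; stop++) {
			/* Deterministic pseudo-data; only start..stop change. */
			for (int z = 0; z < NDATA; z++) {
				for (int d = 0; d < LEN; d++) {
					before[z][d] = (uint8_t)(31 * z + 7 * d + 1);
					after[z][d] = before[z][d];
					if (z >= start && z <= stop)
						after[z][d] ^= (uint8_t)(0xa5 + z + d);
					delta[z][d] = before[z][d] ^ after[z][d];
				}
			}
			gen_ref(before, p, q);			/* parity before the write */
			xor_ref(delta, start, stop, p, q);	/* read-modify-write path */
			gen_ref(after, p_full, q_full);		/* full regeneration */
			if (memcmp(p, p_full, LEN) || memcmp(q, q_full, LEN))
				return 0;
		}
	}
	return 1;
}
```

The check relies only on P and Q being linear over XOR, so feeding the
old-XOR-new delta pages through the update must land on the same parity
as recomputing from the new data.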
* AW: [PATCH] md/raid6: delta syndrome for ARM NEON
2015-06-25 8:30 ` Ard Biesheuvel
2015-06-25 8:50 ` Ard Biesheuvel
@ 2015-06-27 19:54 ` Markus Stockhausen
1 sibling, 0 replies; 6+ messages in thread
From: Markus Stockhausen @ 2015-06-27 19:54 UTC (permalink / raw)
To: linux-arm-kernel
> From: Ard Biesheuvel [ard.biesheuvel at linaro.org]
> Sent: Thursday, 25 June 2015 10:30
> To: Markus Stockhausen
> Cc: linux-arm-kernel at lists.infradead.org; hpa at zytor.com; Neil Brown
> Subject: Re: [PATCH] md/raid6: delta syndrome for ARM NEON
>
> On 25 June 2015 at 08:32, Markus Stockhausen <stockhausen@collogia.de> wrote:
> >> From: Ard Biesheuvel [ard.biesheuvel at linaro.org]
> >> Sent: Wednesday, 24 June 2015 18:43
> >> To: linux-arm-kernel at lists.infradead.org; hpa at zytor.com
> >> Cc: Ard Biesheuvel; Markus Stockhausen; Neil Brown
> >> Subject: [PATCH] md/raid6: delta syndrome for ARM NEON
> >>
> >> This implements XOR syndrome calculation using NEON intrinsics.
> >> As before, the module can be built for ARM and arm64 from the
> >> same source.
> >
> >> Relative performance on a Cortex-A57 based system:
> >>
> >> raid6: int64x1 gen() 905 MB/s
> >> raid6: int64x1 xor() 881 MB/s
> >> raid6: int64x2 gen() 1343 MB/s
> >> raid6: int64x2 xor() 1286 MB/s
> >> raid6: int64x4 gen() 1896 MB/s
> >> raid6: int64x4 xor() 1321 MB/s
> >> raid6: int64x8 gen() 1773 MB/s
> >> raid6: int64x8 xor() 1165 MB/s
> >> raid6: neonx1 gen() 1834 MB/s
> >> raid6: neonx1 xor() 1278 MB/s
> >> raid6: neonx2 gen() 2528 MB/s
> >> raid6: neonx2 xor() 1942 MB/s
> >> raid6: neonx4 gen() 2888 MB/s
> >> raid6: neonx4 xor() 2334 MB/s
> >> raid6: neonx8 gen() 2957 MB/s
> >> raid6: neonx8 xor() 2232 MB/s
> >> raid6: using algorithm neonx8 gen() 2957 MB/s
> >> raid6: .... xor() 2232 MB/s, rmw enabled
> >>
> >
> > Nice to see that the placeholders get filled.
> > Did you have a chance to do some real tests?
> >
>
> Hello Markus,
>
> I haven't done any real world testing yet. Can you recommend any test
> tools or test suites in particular?
Not really. I was just curious whether someone with a larger setup than
mine had had a chance to do some testing, especially since I only had
discontinued hardware available for the benchmarks.
Markus
* [PATCH] md/raid6: delta syndrome for ARM NEON
2015-06-24 16:43 [PATCH] md/raid6: delta syndrome for ARM NEON Ard Biesheuvel
2015-06-25 6:32 ` AW: " Markus Stockhausen
@ 2015-06-29 1:32 ` NeilBrown
1 sibling, 0 replies; 6+ messages in thread
From: NeilBrown @ 2015-06-29 1:32 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 24 Jun 2015 18:43:33 +0200 Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> This implements XOR syndrome calculation using NEON intrinsics.
> As before, the module can be built for ARM and arm64 from the
> same source.
>
> Relative performance on a Cortex-A57 based system:
>
> raid6: int64x1 gen() 905 MB/s
> raid6: int64x1 xor() 881 MB/s
> raid6: int64x2 gen() 1343 MB/s
> raid6: int64x2 xor() 1286 MB/s
> raid6: int64x4 gen() 1896 MB/s
> raid6: int64x4 xor() 1321 MB/s
> raid6: int64x8 gen() 1773 MB/s
> raid6: int64x8 xor() 1165 MB/s
> raid6: neonx1 gen() 1834 MB/s
> raid6: neonx1 xor() 1278 MB/s
> raid6: neonx2 gen() 2528 MB/s
> raid6: neonx2 xor() 1942 MB/s
> raid6: neonx4 gen() 2888 MB/s
> raid6: neonx4 xor() 2334 MB/s
> raid6: neonx8 gen() 2957 MB/s
> raid6: neonx8 xor() 2232 MB/s
> raid6: using algorithm neonx8 gen() 2957 MB/s
> raid6: .... xor() 2232 MB/s, rmw enabled
>
> Cc: Markus Stockhausen <stockhausen@collogia.de>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> lib/raid6/neon.c | 13 ++++++++++++-
> lib/raid6/neon.uc | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/lib/raid6/neon.c b/lib/raid6/neon.c
> index d9ad6ee284f4..7076ef1ba3dd 100644
> --- a/lib/raid6/neon.c
> +++ b/lib/raid6/neon.c
> @@ -40,9 +40,20 @@
>  				(unsigned long)bytes, ptrs); \
>  		kernel_neon_end(); \
>  	} \
> +	static void raid6_neon ## _n ## _xor_syndrome(int disks, \
> +					int start, int stop, \
> +					size_t bytes, void **ptrs) \
> +	{ \
> +		void raid6_neon ## _n ## _xor_syndrome_real(int, \
> +				int, int, unsigned long, void**); \
> +		kernel_neon_begin(); \
> +		raid6_neon ## _n ## _xor_syndrome_real(disks, \
> +			start, stop, (unsigned long)bytes, ptrs); \
> +		kernel_neon_end(); \
> +	} \
>  	struct raid6_calls const raid6_neonx ## _n = { \
>  		raid6_neon ## _n ## _gen_syndrome, \
> -		NULL, /* XOR not yet implemented */ \
> +		raid6_neon ## _n ## _xor_syndrome, \
>  		raid6_have_neon, \
>  		"neonx" #_n, \
>  		0 \
> diff --git a/lib/raid6/neon.uc b/lib/raid6/neon.uc
> index 1b9ed793342d..4fa51b761dd0 100644
> --- a/lib/raid6/neon.uc
> +++ b/lib/raid6/neon.uc
> @@ -3,6 +3,7 @@
> * neon.uc - RAID-6 syndrome calculation using ARM NEON instructions
> *
> * Copyright (C) 2012 Rob Herring
> + * Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
> *
> * Based on altivec.uc:
> * Copyright 2002-2004 H. Peter Anvin - All Rights Reserved
> @@ -78,3 +79,48 @@ void raid6_neon$#_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs)
>  		vst1q_u8(&q[d+NSIZE*$$], wq$$);
>  	}
>  }
> +
> +void raid6_neon$#_xor_syndrome_real(int disks, int start, int stop,
> +				    unsigned long bytes, void **ptrs)
> +{
> +	uint8_t **dptr = (uint8_t **)ptrs;
> +	uint8_t *p, *q;
> +	int d, z, z0;
> +
> +	register unative_t wd$$, wq$$, wp$$, w1$$, w2$$;
> +	const unative_t x1d = NBYTES(0x1d);
> +
> +	z0 = stop;		/* P/Q right side optimization */
> +	p = dptr[disks-2];	/* XOR parity */
> +	q = dptr[disks-1];	/* RS syndrome */
> +
> +	for ( d = 0 ; d < bytes ; d += NSIZE*$# ) {
> +		wq$$ = vld1q_u8(&dptr[z0][d+$$*NSIZE]);
> +		wp$$ = veorq_u8(vld1q_u8(&p[d+$$*NSIZE]), wq$$);
> +
> +		/* P/Q data pages */
> +		for ( z = z0-1 ; z >= start ; z-- ) {
> +			wd$$ = vld1q_u8(&dptr[z][d+$$*NSIZE]);
> +			wp$$ = veorq_u8(wp$$, wd$$);
> +			w2$$ = MASK(wq$$);
> +			w1$$ = SHLBYTE(wq$$);
> +
> +			w2$$ = vandq_u8(w2$$, x1d);
> +			w1$$ = veorq_u8(w1$$, w2$$);
> +			wq$$ = veorq_u8(w1$$, wd$$);
> +		}
> +		/* P/Q left side optimization */
> +		for ( z = start-1 ; z >= 0 ; z-- ) {
> +			w2$$ = MASK(wq$$);
> +			w1$$ = SHLBYTE(wq$$);
> +
> +			w2$$ = vandq_u8(w2$$, x1d);
> +			wq$$ = veorq_u8(w1$$, w2$$);
> +		}
> +		w1$$ = vld1q_u8(&q[d+NSIZE*$$]);
> +		wq$$ = veorq_u8(wq$$, w1$$);
> +
> +		vst1q_u8(&p[d+NSIZE*$$], wp$$);
> +		vst1q_u8(&q[d+NSIZE*$$], wq$$);
> +	}
> +}
Looks good, thanks.
I've queued this for the next merge window (4.3).
NeilBrown
Thread overview: 6+ messages
2015-06-24 16:43 [PATCH] md/raid6: delta syndrome for ARM NEON Ard Biesheuvel
2015-06-25 6:32 ` AW: " Markus Stockhausen
2015-06-25 8:30 ` Ard Biesheuvel
2015-06-25 8:50 ` Ard Biesheuvel
2015-06-27 19:54 ` AW: " Markus Stockhausen
2015-06-29 1:32 ` NeilBrown