* [PATCH] Skip benchmarking of non-best xor_syndrome functions
@ 2021-12-29 22:34 Dirk Müller
2021-12-30 11:33 ` Paul Menzel
2022-01-01 23:49 ` Song Liu
0 siblings, 2 replies; 7+ messages in thread
From: Dirk Müller @ 2021-12-29 22:34 UTC (permalink / raw)
To: linux-raid; +Cc: Dirk Müller
In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
a xor_syndrome() has been added for optimized syndrome calculation and
being added also to to the raid6_choose_gen() function. However, the
result of the xor_syndrome() benchmarking was intentionally discarded
and did not influence the choice.
We can optimize raid6_choose_gen() without changing its behavior by
only benchmarking the chosen xor_syndrome() variant and printing it
output, rather than benchmarking also the ones that are discarded and
never influence the outcome.
For x86_64, this removes 8 out of 18 benchmark loops which each take
16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
kernel).
Signed-off-by: Dirk Müller <dmueller@suse.de>
---
lib/raid6/algos.c | 76 +++++++++++++++++++++++------------------------
1 file changed, 37 insertions(+), 39 deletions(-)
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 6d5e5000fdd7..889033b7fc0d 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -145,12 +145,12 @@ static inline const struct raid6_recov_calls *raid6_choose_recov(void)
static inline const struct raid6_calls *raid6_choose_gen(
void *(*const dptrs)[RAID6_TEST_DISKS], const int disks)
{
- unsigned long perf, bestgenperf, bestxorperf, j0, j1;
+ unsigned long perf, bestgenperf, j0, j1;
int start = (disks>>1)-1, stop = disks-3; /* work on the second half of the disks */
const struct raid6_calls *const *algo;
const struct raid6_calls *best;
- for (bestgenperf = 0, bestxorperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
+ for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
if (!best || (*algo)->prefer >= best->prefer) {
if ((*algo)->valid && !(*algo)->valid())
continue;
@@ -180,50 +180,48 @@ static inline const struct raid6_calls *raid6_choose_gen(
pr_info("raid6: %-8s gen() %5ld MB/s\n", (*algo)->name,
(perf * HZ * (disks-2)) >>
(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2));
+ }
+ }
- if (!(*algo)->xor_syndrome)
- continue;
+ if (best == NULL) {
+ pr_err("raid6: Yikes! No algorithm found!\n");
+ goto out;
+ }
- perf = 0;
+ raid6_call = *best;
- preempt_disable();
- j0 = jiffies;
- while ((j1 = jiffies) == j0)
- cpu_relax();
- while (time_before(jiffies,
- j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
- (*algo)->xor_syndrome(disks, start, stop,
- PAGE_SIZE, *dptrs);
- perf++;
- }
- preempt_enable();
-
- if (best == *algo)
- bestxorperf = perf;
+ if (!IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
+ pr_info("raid6: skipped pq benchmark and selected %s\n",
+ best->name);
+ goto out;
+ }
- pr_info("raid6: %-8s xor() %5ld MB/s\n", (*algo)->name,
- (perf * HZ * (disks-2)) >>
- (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
+ pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
+ best->name,
+ (bestgenperf * HZ * (disks-2)) >>
+ (20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
+
+ if (best->xor_syndrome) {
+ perf = 0;
+
+ preempt_disable();
+ j0 = jiffies;
+ while ((j1 = jiffies) == j0)
+ cpu_relax();
+ while (time_before(jiffies,
+ j1 + (1 << RAID6_TIME_JIFFIES_LG2))) {
+ best->xor_syndrome(disks, start, stop,
+ PAGE_SIZE, *dptrs);
+ perf++;
}
- }
+ preempt_enable();
- if (best) {
- if (IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
- pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
- best->name,
- (bestgenperf * HZ * (disks-2)) >>
- (20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
- if (best->xor_syndrome)
- pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
- (bestxorperf * HZ * (disks-2)) >>
- (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
- } else
- pr_info("raid6: skip pq benchmark and using algorithm %s\n",
- best->name);
- raid6_call = *best;
- } else
- pr_err("raid6: Yikes! No algorithm found!\n");
+ pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
+ (perf * HZ * (disks-2)) >>
+ (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
+ }
+out:
return best;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
2021-12-29 22:34 [PATCH] Skip benchmarking of non-best xor_syndrome functions Dirk Müller
@ 2021-12-30 11:33 ` Paul Menzel
2021-12-31 8:35 ` Dirk Müller
2022-01-01 23:49 ` Song Liu
1 sibling, 1 reply; 7+ messages in thread
From: Paul Menzel @ 2021-12-30 11:33 UTC (permalink / raw)
To: Dirk Müller; +Cc: linux-raid
Dear Dirk,
Am 29.12.21 um 23:34 schrieb Dirk Müller:
> In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> a xor_syndrome() has been added for optimized syndrome calculation and
was added
> being added also to to the raid6_choose_gen() function. However, the > result of the xor_syndrome() benchmarking was intentionally discarded
s/to to/to/
> and did not influence the choice.
>
> We can optimize raid6_choose_gen() without changing its behavior by
> only benchmarking the chosen xor_syndrome() variant and printing it
it*s*
> output, rather than benchmarking also the ones that are discarded and
> never influence the outcome.
>
> For x86_64, this removes 8 out of 18 benchmark loops which each take
> 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> kernel).
On what system?
The new message below is logged?
raid6: skipped pq benchmark and selected …
I am booting my non-RAID systems with `cryptomgr.notests` to avoid this
boot time penalty.
> Signed-off-by: Dirk Müller <dmueller@suse.de>
> ---
> lib/raid6/algos.c | 76 +++++++++++++++++++++++------------------------
> 1 file changed, 37 insertions(+), 39 deletions(-)
>
> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
> index 6d5e5000fdd7..889033b7fc0d 100644
> --- a/lib/raid6/algos.c
> +++ b/lib/raid6/algos.c
> @@ -145,12 +145,12 @@ static inline const struct raid6_recov_calls *raid6_choose_recov(void)
> static inline const struct raid6_calls *raid6_choose_gen(
> void *(*const dptrs)[RAID6_TEST_DISKS], const int disks)
> {
> - unsigned long perf, bestgenperf, bestxorperf, j0, j1;
> + unsigned long perf, bestgenperf, j0, j1;
> int start = (disks>>1)-1, stop = disks-3; /* work on the second half of the disks */
> const struct raid6_calls *const *algo;
> const struct raid6_calls *best;
>
> - for (bestgenperf = 0, bestxorperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
> + for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
> if (!best || (*algo)->prefer >= best->prefer) {
> if ((*algo)->valid && !(*algo)->valid())
> continue;
> @@ -180,50 +180,48 @@ static inline const struct raid6_calls *raid6_choose_gen(
> pr_info("raid6: %-8s gen() %5ld MB/s\n", (*algo)->name,
> (perf * HZ * (disks-2)) >>
> (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2));
> + }
In the diff, I do not see the corresponding removed }.
> + }
>
> - if (!(*algo)->xor_syndrome)
> - continue;
> + if (best == NULL) {
> + pr_err("raid6: Yikes! No algorithm found!\n");
> + goto out;
> + }
>
> - perf = 0;
> + raid6_call = *best;
>
> - preempt_disable();
> - j0 = jiffies;
> - while ((j1 = jiffies) == j0)
> - cpu_relax();
> - while (time_before(jiffies,
> - j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
> - (*algo)->xor_syndrome(disks, start, stop,
> - PAGE_SIZE, *dptrs);
> - perf++;
> - }
> - preempt_enable();
> -
> - if (best == *algo)
> - bestxorperf = perf;
> + if (!IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
> + pr_info("raid6: skipped pq benchmark and selected %s\n",
> + best->name);
> + goto out;
> + }
>
> - pr_info("raid6: %-8s xor() %5ld MB/s\n", (*algo)->name,
> - (perf * HZ * (disks-2)) >>
> - (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
> + pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
> + best->name,
> + (bestgenperf * HZ * (disks-2)) >>
> + (20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
> +
> + if (best->xor_syndrome) {
> + perf = 0;
> +
> + preempt_disable();
> + j0 = jiffies;
> + while ((j1 = jiffies) == j0)
> + cpu_relax();
> + while (time_before(jiffies,
> + j1 + (1 << RAID6_TIME_JIFFIES_LG2))) {
> + best->xor_syndrome(disks, start, stop,
> + PAGE_SIZE, *dptrs);
> + perf++;
> }
> - }
> + preempt_enable();
>
> - if (best) {
> - if (IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
> - pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
> - best->name,
> - (bestgenperf * HZ * (disks-2)) >>
> - (20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
> - if (best->xor_syndrome)
> - pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
> - (bestxorperf * HZ * (disks-2)) >>
> - (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
> - } else
> - pr_info("raid6: skip pq benchmark and using algorithm %s\n",
> - best->name);
> - raid6_call = *best;
> - } else
> - pr_err("raid6: Yikes! No algorithm found!\n");
> + pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
> + (perf * HZ * (disks-2)) >>
> + (20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
> + }
>
> +out:
> return best;
> }
>
Kind regards,
Paul
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
2021-12-30 11:33 ` Paul Menzel
@ 2021-12-31 8:35 ` Dirk Müller
2021-12-31 8:53 ` Paul Menzel
0 siblings, 1 reply; 7+ messages in thread
From: Dirk Müller @ 2021-12-31 8:35 UTC (permalink / raw)
To: Paul Menzel; +Cc: linux-raid
Am 2021-12-30 12:33, schrieb Paul Menzel:
Hi Paul,
Thank you for the wording improvements to the commit message,
incorporated for the next
patch version.
>> For x86_64, this removes 8 out of 18 benchmark loops which each take
>> 16 jiffies, so up to 160 jiffies saved on module load (640ms on a
>> 250HZ
>> kernel)
> On what system?
on a x86_64 system with avx512 capabilities. before this patch it was
doing 3x avx512, 3x avx2 and 3x sse2 xor() benchmark runs (so 9 total,
plus 9 gen() runs as well, leading to the 18 above). with this patch
applied the 9 xor() runs become just 1, saving 8. exact timing depends
on the CONFIG_HZ setting in use, as the benchmark timescale is in
jiffies (which is a problem on its own, but that is for another patch).
> The new message below is logged?
>
> raid6: skipped pq benchmark and selected …
its the same message like before, just worded slightly differently. I
can undo the wording change if requested.
> I am booting my non-RAID systems with `cryptomgr.notests` to avoid
> this boot time penalty.
the benchmark option is recommended to be turned on, and I'm trying to
reduce the cost of that. turning it off avoids the cost altogether, but
I'm not able to judge (yet?) whether that's a better thing to do.
Thanks
Dirk
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
2021-12-31 8:35 ` Dirk Müller
@ 2021-12-31 8:53 ` Paul Menzel
0 siblings, 0 replies; 7+ messages in thread
From: Paul Menzel @ 2021-12-31 8:53 UTC (permalink / raw)
To: Dirk Müller; +Cc: linux-raid
Dear Dirk,
Thank you for your reply.
Am 31.12.21 um 09:35 schrieb Dirk Müller:
> Am 2021-12-30 12:33, schrieb Paul
[…]
>> The new message below is logged?
>>
>> raid6: skipped pq benchmark and selected …
>
> its the same message like before, just worded slightly differently. I
> can undo the wording change if requested.
No, it can stay. I always like to have these things mentioned in the
commit message though.
[…]
Kind regards,
Paul
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
2021-12-29 22:34 [PATCH] Skip benchmarking of non-best xor_syndrome functions Dirk Müller
2021-12-30 11:33 ` Paul Menzel
@ 2022-01-01 23:49 ` Song Liu
2022-01-03 13:49 ` Dirk Müller
1 sibling, 1 reply; 7+ messages in thread
From: Song Liu @ 2022-01-01 23:49 UTC (permalink / raw)
To: Dirk Müller; +Cc: linux-raid
On Wed, Dec 29, 2021 at 2:34 PM Dirk Müller <dmueller@suse.de> wrote:
>
> In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> a xor_syndrome() has been added for optimized syndrome calculation and
> being added also to to the raid6_choose_gen() function. However, the
> result of the xor_syndrome() benchmarking was intentionally discarded
> and did not influence the choice.
>
> We can optimize raid6_choose_gen() without changing its behavior by
> only benchmarking the chosen xor_syndrome() variant and printing it
> output, rather than benchmarking also the ones that are discarded and
> never influence the outcome.
>
> For x86_64, this removes 8 out of 18 benchmark loops which each take
> 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> kernel).
How critical is 160 (or 128?) jiffies saving here? I think some users use these
information from dmesg, which shows an overview of all these algorithms:
[ 21.629694] raid6: avx2x4 gen() 18081 MB/s
[ 21.799692] raid6: avx2x4 xor() 8568 MB/s
[ 21.969692] raid6: avx2x2 gen() 15060 MB/s
[ 22.139792] raid6: avx2x2 xor() 9665 MB/s
[ 22.309718] raid6: avx2x1 gen() 9223 MB/s
[ 22.479691] raid6: avx2x1 xor() 6809 MB/s
[ 22.649691] raid6: sse2x4 gen() 9647 MB/s
[ 22.819691] raid6: sse2x4 xor() 6229 MB/s
[ 22.989690] raid6: sse2x2 gen() 7276 MB/s
[ 23.159690] raid6: sse2x2 xor() 5400 MB/s
[ 23.329761] raid6: sse2x1 gen() 4734 MB/s
[ 23.499693] raid6: sse2x1 xor() 3565 MB/s
[ 23.500225] raid6: using algorithm avx2x4 gen() 18081 MB/s
[ 23.500886] raid6: .... xor() 8568 MB/s, rmw enabled
If it is critical for some use cases, maybe we can gate the change with a
CONFIG?
Thanks,
Song
[...]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
2022-01-01 23:49 ` Song Liu
@ 2022-01-03 13:49 ` Dirk Müller
2022-01-04 17:40 ` Song Liu
0 siblings, 1 reply; 7+ messages in thread
From: Dirk Müller @ 2022-01-03 13:49 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid
On Sonntag, 2. Januar 2022 00:49:17 CET Song Liu wrote:
> > For x86_64, this removes 8 out of 18 benchmark loops which each take
> > 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> > kernel).
> How critical is 160 (or 128?) jiffies saving here?
For my usecase, this initialization codepath is just over 50% of the total
runtime of my initrd, so it is significant. this is small initrd for virtual
environments, using however the standard distribution kernel image (which has
benchmarking enabled per Kconfig recommendation).
> If it is critical for some use cases, maybe we can gate the change with a
> CONFIG?
I can't see how it is critical for usecases, because of
* the outcome of the benchmarking is discarded
* there is no configuration or commandline option to manually select a different
xor variant if you as a linux user chose to select a different one based on
manual review of the xor benchmarking results
* benchmarking is recommended to be enabled by default. if you disable it, it
will simply pick the first that works from the static list (which is basically
ordered by hardware features reverse, or by most likely best performance).
I'm happy to have that behind yet another config option if that makes the patch
upstreamable, just let me know and I'll send a new variant.
Greetings,
Dirk
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
2022-01-03 13:49 ` Dirk Müller
@ 2022-01-04 17:40 ` Song Liu
0 siblings, 0 replies; 7+ messages in thread
From: Song Liu @ 2022-01-04 17:40 UTC (permalink / raw)
To: Dirk Müller; +Cc: linux-raid
On Mon, Jan 3, 2022 at 5:49 AM Dirk Müller <dmueller@suse.de> wrote:
>
> On Sonntag, 2. Januar 2022 00:49:17 CET Song Liu wrote:
>
> > > For x86_64, this removes 8 out of 18 benchmark loops which each take
> > > 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> > > kernel).
> > How critical is 160 (or 128?) jiffies saving here?
>
> For my usecase, this initialization codepath is just over 50% of the total
> runtime of my initrd, so it is significant. this is small initrd for virtual
> environments, using however the standard distribution kernel image (which has
> benchmarking enabled per Kconfig recommendation).
So this kernel has to use the default config? If this is the case, it
doesn't make sense
to add another config.
Thanks,
Song
[...]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-01-04 17:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-29 22:34 [PATCH] Skip benchmarking of non-best xor_syndrome functions Dirk Müller
2021-12-30 11:33 ` Paul Menzel
2021-12-31 8:35 ` Dirk Müller
2021-12-31 8:53 ` Paul Menzel
2022-01-01 23:49 ` Song Liu
2022-01-03 13:49 ` Dirk Müller
2022-01-04 17:40 ` Song Liu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.