All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Skip benchmarking of non-best xor_syndrome functions
@ 2021-12-29 22:34 Dirk Müller
  2021-12-30 11:33 ` Paul Menzel
  2022-01-01 23:49 ` Song Liu
  0 siblings, 2 replies; 7+ messages in thread
From: Dirk Müller @ 2021-12-29 22:34 UTC (permalink / raw)
  To: linux-raid; +Cc: Dirk Müller

In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
a xor_syndrome() has been added for optimized syndrome calculation and
being added also to to the raid6_choose_gen() function. However, the
result of the xor_syndrome() benchmarking was intentionally discarded
and did not influence the choice.

We can optimize raid6_choose_gen() without changing its behavior by
only benchmarking the chosen xor_syndrome() variant and printing it
output, rather than benchmarking also the ones that are discarded and
never influence the outcome.

For x86_64, this removes 8 out of 18 benchmark loops which each take
16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
kernel).

Signed-off-by: Dirk Müller <dmueller@suse.de>
---
 lib/raid6/algos.c | 76 +++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 6d5e5000fdd7..889033b7fc0d 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -145,12 +145,12 @@ static inline const struct raid6_recov_calls *raid6_choose_recov(void)
 static inline const struct raid6_calls *raid6_choose_gen(
 	void *(*const dptrs)[RAID6_TEST_DISKS], const int disks)
 {
-	unsigned long perf, bestgenperf, bestxorperf, j0, j1;
+	unsigned long perf, bestgenperf, j0, j1;
 	int start = (disks>>1)-1, stop = disks-3;	/* work on the second half of the disks */
 	const struct raid6_calls *const *algo;
 	const struct raid6_calls *best;
 
-	for (bestgenperf = 0, bestxorperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
+	for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
 		if (!best || (*algo)->prefer >= best->prefer) {
 			if ((*algo)->valid && !(*algo)->valid())
 				continue;
@@ -180,50 +180,48 @@ static inline const struct raid6_calls *raid6_choose_gen(
 			pr_info("raid6: %-8s gen() %5ld MB/s\n", (*algo)->name,
 				(perf * HZ * (disks-2)) >>
 				(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2));
+		}
+	}
 
-			if (!(*algo)->xor_syndrome)
-				continue;
+	if (best == NULL) {
+		pr_err("raid6: Yikes! No algorithm found!\n");
+		goto out;
+	}
 
-			perf = 0;
+	raid6_call = *best;
 
-			preempt_disable();
-			j0 = jiffies;
-			while ((j1 = jiffies) == j0)
-				cpu_relax();
-			while (time_before(jiffies,
-					    j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
-				(*algo)->xor_syndrome(disks, start, stop,
-						      PAGE_SIZE, *dptrs);
-				perf++;
-			}
-			preempt_enable();
-
-			if (best == *algo)
-				bestxorperf = perf;
+	if (!IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
+		pr_info("raid6: skipped pq benchmark and selected %s\n",
+			best->name);
+		goto out;
+	}
 
-			pr_info("raid6: %-8s xor() %5ld MB/s\n", (*algo)->name,
-				(perf * HZ * (disks-2)) >>
-				(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
+	pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
+		best->name,
+		(bestgenperf * HZ * (disks-2)) >>
+		(20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
+
+	if (best->xor_syndrome) {
+		perf = 0;
+
+		preempt_disable();
+		j0 = jiffies;
+		while ((j1 = jiffies) == j0)
+			cpu_relax();
+		while (time_before(jiffies,
+				   j1 + (1 << RAID6_TIME_JIFFIES_LG2))) {
+			best->xor_syndrome(disks, start, stop,
+					   PAGE_SIZE, *dptrs);
+			perf++;
 		}
-	}
+		preempt_enable();
 
-	if (best) {
-		if (IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
-			pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
-				best->name,
-				(bestgenperf * HZ * (disks-2)) >>
-				(20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
-			if (best->xor_syndrome)
-				pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
-					(bestxorperf * HZ * (disks-2)) >>
-					(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
-		} else
-			pr_info("raid6: skip pq benchmark and using algorithm %s\n",
-				best->name);
-		raid6_call = *best;
-	} else
-		pr_err("raid6: Yikes!  No algorithm found!\n");
+		pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
+			(perf * HZ * (disks-2)) >>
+			(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
+	}
 
+out:
 	return best;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
  2021-12-29 22:34 [PATCH] Skip benchmarking of non-best xor_syndrome functions Dirk Müller
@ 2021-12-30 11:33 ` Paul Menzel
  2021-12-31  8:35   ` Dirk Müller
  2022-01-01 23:49 ` Song Liu
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Menzel @ 2021-12-30 11:33 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

Dear Dirk,


Am 29.12.21 um 23:34 schrieb Dirk Müller:
> In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> a xor_syndrome() has been added for optimized syndrome calculation and

was added

> being added also to to the raid6_choose_gen() function. However, the > result of the xor_syndrome() benchmarking was intentionally discarded

s/to to/to/

> and did not influence the choice.
> 
> We can optimize raid6_choose_gen() without changing its behavior by
> only benchmarking the chosen xor_syndrome() variant and printing it

it*s*

> output, rather than benchmarking also the ones that are discarded and
> never influence the outcome.
> 
> For x86_64, this removes 8 out of 18 benchmark loops which each take
> 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> kernel).

On what system?

The new message below is logged?

     raid6: skipped pq benchmark and selected …

I am booting my non-RAID systems with `cryptomgr.notests` to avoid this 
boot time penalty.

> Signed-off-by: Dirk Müller <dmueller@suse.de>
> ---
>   lib/raid6/algos.c | 76 +++++++++++++++++++++++------------------------
>   1 file changed, 37 insertions(+), 39 deletions(-)
> 
> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
> index 6d5e5000fdd7..889033b7fc0d 100644
> --- a/lib/raid6/algos.c
> +++ b/lib/raid6/algos.c
> @@ -145,12 +145,12 @@ static inline const struct raid6_recov_calls *raid6_choose_recov(void)
>   static inline const struct raid6_calls *raid6_choose_gen(
>   	void *(*const dptrs)[RAID6_TEST_DISKS], const int disks)
>   {
> -	unsigned long perf, bestgenperf, bestxorperf, j0, j1;
> +	unsigned long perf, bestgenperf, j0, j1;
>   	int start = (disks>>1)-1, stop = disks-3;	/* work on the second half of the disks */
>   	const struct raid6_calls *const *algo;
>   	const struct raid6_calls *best;
>   
> -	for (bestgenperf = 0, bestxorperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
> +	for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
>   		if (!best || (*algo)->prefer >= best->prefer) {
>   			if ((*algo)->valid && !(*algo)->valid())
>   				continue;
> @@ -180,50 +180,48 @@ static inline const struct raid6_calls *raid6_choose_gen(
>   			pr_info("raid6: %-8s gen() %5ld MB/s\n", (*algo)->name,
>   				(perf * HZ * (disks-2)) >>
>   				(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2));
> +		}

In the diff, I do not see the corresponding removed }.

> +	}
>   
> -			if (!(*algo)->xor_syndrome)
> -				continue;
> +	if (best == NULL) {
> +		pr_err("raid6: Yikes! No algorithm found!\n");
> +		goto out;
> +	}
>   
> -			perf = 0;
> +	raid6_call = *best;
>   
> -			preempt_disable();
> -			j0 = jiffies;
> -			while ((j1 = jiffies) == j0)
> -				cpu_relax();
> -			while (time_before(jiffies,
> -					    j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
> -				(*algo)->xor_syndrome(disks, start, stop,
> -						      PAGE_SIZE, *dptrs);
> -				perf++;
> -			}
> -			preempt_enable();
> -
> -			if (best == *algo)
> -				bestxorperf = perf;
> +	if (!IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
> +		pr_info("raid6: skipped pq benchmark and selected %s\n",
> +			best->name);
> +		goto out;
> +	}
>   
> -			pr_info("raid6: %-8s xor() %5ld MB/s\n", (*algo)->name,
> -				(perf * HZ * (disks-2)) >>
> -				(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
> +	pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
> +		best->name,
> +		(bestgenperf * HZ * (disks-2)) >>
> +		(20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
> +
> +	if (best->xor_syndrome) {
> +		perf = 0;
> +
> +		preempt_disable();
> +		j0 = jiffies;
> +		while ((j1 = jiffies) == j0)
> +			cpu_relax();
> +		while (time_before(jiffies,
> +				   j1 + (1 << RAID6_TIME_JIFFIES_LG2))) {
> +			best->xor_syndrome(disks, start, stop,
> +					   PAGE_SIZE, *dptrs);
> +			perf++;
>   		}
> -	}
> +		preempt_enable();
>   
> -	if (best) {
> -		if (IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
> -			pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
> -				best->name,
> -				(bestgenperf * HZ * (disks-2)) >>
> -				(20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
> -			if (best->xor_syndrome)
> -				pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
> -					(bestxorperf * HZ * (disks-2)) >>
> -					(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
> -		} else
> -			pr_info("raid6: skip pq benchmark and using algorithm %s\n",
> -				best->name);
> -		raid6_call = *best;
> -	} else
> -		pr_err("raid6: Yikes!  No algorithm found!\n");
> +		pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
> +			(perf * HZ * (disks-2)) >>
> +			(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
> +	}
>   
> +out:
>   	return best;
>   }
>   


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
  2021-12-30 11:33 ` Paul Menzel
@ 2021-12-31  8:35   ` Dirk Müller
  2021-12-31  8:53     ` Paul Menzel
  0 siblings, 1 reply; 7+ messages in thread
From: Dirk Müller @ 2021-12-31  8:35 UTC (permalink / raw)
  To: Paul Menzel; +Cc: linux-raid

Am 2021-12-30 12:33, schrieb Paul Menzel:

Hi Paul,

Thank you for the wording improvements to the commit message, 
incorporated for the next
patch version.

>> For x86_64, this removes 8 out of 18 benchmark loops which each take
>> 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 
>> 250HZ
>> kernel)
> On what system?

on a x86_64 system with avx512 capabilities. before this patch it was 
doing 3x avx512, 3x avx2 and 3x sse2 xor() benchmark runs (so 9 total, 
plus 9 gen() runs as well, leading to the 18 above). with this patch 
applied the 9 xor() runs become just 1, saving 8. exact timing depends 
on the CONFIG_HZ setting in use, as the benchmark timescale is in 
jiffies (which is a problem on its own, but that is for another patch).

> The new message below is logged?
> 
>     raid6: skipped pq benchmark and selected …

its the same message like before, just worded slightly differently. I 
can undo the wording change if requested.

> I am booting my non-RAID systems with `cryptomgr.notests` to avoid
> this boot time penalty.

the benchmark option is recommended to be turned on, and I'm trying to 
reduce the cost of that. turning it off avoids the cost altogether, but 
I'm not able to judge (yet?) whether that's a better thing to do.


Thanks
Dirk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
  2021-12-31  8:35   ` Dirk Müller
@ 2021-12-31  8:53     ` Paul Menzel
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Menzel @ 2021-12-31  8:53 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

Dear Dirk,


Thank you for your reply.


Am 31.12.21 um 09:35 schrieb Dirk Müller:
> Am 2021-12-30 12:33, schrieb Paul 

[…]

>> The new message below is logged?
>>
>>     raid6: skipped pq benchmark and selected …
> 
> its the same message like before, just worded slightly differently. I 
> can undo the wording change if requested.

No, it can stay. I always like to have these things mentioned in the 
commit message though.

[…]


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
  2021-12-29 22:34 [PATCH] Skip benchmarking of non-best xor_syndrome functions Dirk Müller
  2021-12-30 11:33 ` Paul Menzel
@ 2022-01-01 23:49 ` Song Liu
  2022-01-03 13:49   ` Dirk Müller
  1 sibling, 1 reply; 7+ messages in thread
From: Song Liu @ 2022-01-01 23:49 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

On Wed, Dec 29, 2021 at 2:34 PM Dirk Müller <dmueller@suse.de> wrote:
>
> In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> a xor_syndrome() has been added for optimized syndrome calculation and
> being added also to to the raid6_choose_gen() function. However, the
> result of the xor_syndrome() benchmarking was intentionally discarded
> and did not influence the choice.
>
> We can optimize raid6_choose_gen() without changing its behavior by
> only benchmarking the chosen xor_syndrome() variant and printing it
> output, rather than benchmarking also the ones that are discarded and
> never influence the outcome.
>
> For x86_64, this removes 8 out of 18 benchmark loops which each take
> 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> kernel).

How critical is 160 (or 128?) jiffies saving here? I think some users use these
information from dmesg, which shows an overview of all these algorithms:

[   21.629694] raid6: avx2x4   gen() 18081 MB/s
[   21.799692] raid6: avx2x4   xor()  8568 MB/s
[   21.969692] raid6: avx2x2   gen() 15060 MB/s
[   22.139792] raid6: avx2x2   xor()  9665 MB/s
[   22.309718] raid6: avx2x1   gen()  9223 MB/s
[   22.479691] raid6: avx2x1   xor()  6809 MB/s
[   22.649691] raid6: sse2x4   gen()  9647 MB/s
[   22.819691] raid6: sse2x4   xor()  6229 MB/s
[   22.989690] raid6: sse2x2   gen()  7276 MB/s
[   23.159690] raid6: sse2x2   xor()  5400 MB/s
[   23.329761] raid6: sse2x1   gen()  4734 MB/s
[   23.499693] raid6: sse2x1   xor()  3565 MB/s
[   23.500225] raid6: using algorithm avx2x4 gen() 18081 MB/s
[   23.500886] raid6: .... xor() 8568 MB/s, rmw enabled

If it is critical for some use cases, maybe we can gate the change with a
CONFIG?

Thanks,
Song



[...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
  2022-01-01 23:49 ` Song Liu
@ 2022-01-03 13:49   ` Dirk Müller
  2022-01-04 17:40     ` Song Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Dirk Müller @ 2022-01-03 13:49 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On Sonntag, 2. Januar 2022 00:49:17 CET Song Liu wrote:

> > For x86_64, this removes 8 out of 18 benchmark loops which each take
> > 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> > kernel).
> How critical is 160 (or 128?) jiffies saving here? 

For my usecase, this initialization codepath is just over 50% of the total 
runtime of my initrd, so it is significant. this is small initrd for virtual 
environments, using however the standard distribution kernel image (which has 
benchmarking enabled per Kconfig recommendation).

> If it is critical for some use cases, maybe we can gate the change with a
> CONFIG?

I can't see how it is critical for usecases, because of 

* the outcome of the benchmarking is discarded
* there is no configuration or commandline option to manually select a different 
xor variant if you as a linux user chose to select a different one based on 
manual review of the xor benchmarking results
* benchmarking is recommended to be enabled by default. if you disable it, it 
will simply pick the first that works from the static list (which is basically 
ordered by hardware features reverse, or by most likely best performance). 

I'm happy to have that behind yet another config option if that makes the patch 
upstreamable, just let me know and I'll send a new variant. 

Greetings,
Dirk




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Skip benchmarking of non-best xor_syndrome functions
  2022-01-03 13:49   ` Dirk Müller
@ 2022-01-04 17:40     ` Song Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Song Liu @ 2022-01-04 17:40 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

On Mon, Jan 3, 2022 at 5:49 AM Dirk Müller <dmueller@suse.de> wrote:
>
> On Sonntag, 2. Januar 2022 00:49:17 CET Song Liu wrote:
>
> > > For x86_64, this removes 8 out of 18 benchmark loops which each take
> > > 16 jiffies, so up to 160 jiffies saved on module load (640ms on a 250HZ
> > > kernel).
> > How critical is 160 (or 128?) jiffies saving here?
>
> For my usecase, this initialization codepath is just over 50% of the total
> runtime of my initrd, so it is significant. this is small initrd for virtual
> environments, using however the standard distribution kernel image (which has
> benchmarking enabled per Kconfig recommendation).

So this kernel has to use the default config? If this is the case, it
doesn't make sense
to add another config.

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-04 17:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-29 22:34 [PATCH] Skip benchmarking of non-best xor_syndrome functions Dirk Müller
2021-12-30 11:33 ` Paul Menzel
2021-12-31  8:35   ` Dirk Müller
2021-12-31  8:53     ` Paul Menzel
2022-01-01 23:49 ` Song Liu
2022-01-03 13:49   ` Dirk Müller
2022-01-04 17:40     ` Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.