* [PATCH] Use strict priority ranking for pq gen() benchmarking
@ 2021-12-29 22:36 Dirk Müller
  2021-12-30 13:46 ` Paul Menzel
  2022-01-02  0:03 ` Song Liu
  0 siblings, 2 replies; 8+ messages in thread
From: Dirk Müller @ 2021-12-29 22:36 UTC (permalink / raw)
  To: linux-raid; +Cc: Dirk Müller

On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
and 3 variants of SSE2 are benchmarked on initialization, taking
between 144 and 153 jiffies. Over a hardware pool of various generations
of Intel CPUs I could not find a single case where SSE2 won over
AVX2 or AVX512. There are cases where AVX2 wins over AVX512.

By giving AVXx variants higher priority over SSE, we can generally
skip 3 benchmarks which speeds this up by 33% - 50%, depending on
whether AVX512 is available.

Signed-off-by: Dirk Müller <dmueller@suse.de>
---
 include/linux/raid/pq.h | 2 +-
 lib/raid6/algos.c       | 2 +-
 lib/raid6/avx2.c        | 6 +++---
 lib/raid6/avx512.c      | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 154e954b711d..d6e5a1feb947 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -81,7 +81,7 @@ struct raid6_calls {
 	void (*xor_syndrome)(int, int, int, size_t, void **);
 	int  (*valid)(void);	/* Returns 1 if this routine set is usable */
 	const char *name;	/* Name of this routine set */
-	int prefer;		/* Has special performance attribute */
+	int priority;		/* Relative priority ranking if non-zero */
 };
 
 /* Selected algorithm */
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 889033b7fc0d..d1e8ff837a32 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
 	const struct raid6_calls *best;
 
 	for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
-		if (!best || (*algo)->prefer >= best->prefer) {
+		if (!best || (*algo)->priority >= best->priority) {
 			if ((*algo)->valid && !(*algo)->valid())
 				continue;
 
diff --git a/lib/raid6/avx2.c b/lib/raid6/avx2.c
index f299476e1d76..31be496b8c81 100644
--- a/lib/raid6/avx2.c
+++ b/lib/raid6/avx2.c
@@ -132,7 +132,7 @@ const struct raid6_calls raid6_avx2x1 = {
 	raid6_avx21_xor_syndrome,
 	raid6_have_avx2,
 	"avx2x1",
-	1			/* Has cache hints */
+	.priority = 2
 };
 
 /*
@@ -262,7 +262,7 @@ const struct raid6_calls raid6_avx2x2 = {
 	raid6_avx22_xor_syndrome,
 	raid6_have_avx2,
 	"avx2x2",
-	1			/* Has cache hints */
+	.priority = 2
 };
 
 #ifdef CONFIG_X86_64
@@ -465,6 +465,6 @@ const struct raid6_calls raid6_avx2x4 = {
 	raid6_avx24_xor_syndrome,
 	raid6_have_avx2,
 	"avx2x4",
-	1			/* Has cache hints */
+	.priority = 2
 };
 #endif
diff --git a/lib/raid6/avx512.c b/lib/raid6/avx512.c
index bb684d144ee2..63ae197c3294 100644
--- a/lib/raid6/avx512.c
+++ b/lib/raid6/avx512.c
@@ -162,7 +162,7 @@ const struct raid6_calls raid6_avx512x1 = {
 	raid6_avx5121_xor_syndrome,
 	raid6_have_avx512,
 	"avx512x1",
-	1                       /* Has cache hints */
+	.priority = 2
 };
 
 /*
@@ -319,7 +319,7 @@ const struct raid6_calls raid6_avx512x2 = {
 	raid6_avx5122_xor_syndrome,
 	raid6_have_avx512,
 	"avx512x2",
-	1                       /* Has cache hints */
+	.priority = 2
 };
 
 #ifdef CONFIG_X86_64
@@ -557,7 +557,7 @@ const struct raid6_calls raid6_avx512x4 = {
 	raid6_avx5124_xor_syndrome,
 	raid6_have_avx512,
 	"avx512x4",
-	1                       /* Has cache hints */
+	.priority = 2
 };
 #endif
 
-- 
2.34.1



* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2021-12-29 22:36 [PATCH] Use strict priority ranking for pq gen() benchmarking Dirk Müller
@ 2021-12-30 13:46 ` Paul Menzel
  2021-12-31  8:52   ` Dirk Müller
  2022-01-02  0:03 ` Song Liu
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Menzel @ 2021-12-30 13:46 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

Dear Dirk,


On 29.12.21 at 23:36, Dirk Müller wrote:
> On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
> and 3 variants of SSE2 are benchmarked on initialization, taking
> between 144-153 jiffies. Over a hardware pool of various generations
> of intel cpus I could not find a single case where SSE2 won over
> AVX2 or AVX512. There are cases where AVX2 wins over AVX512.

Can the AVX2 wins over AVX512 be explained, or do they point to some
implementation problem? By the way, Borislav did not give much credit to
the benchmark results [1].

> By giving AVXx variants higher priority over SSE, we can generally
> skip 3 benchmarks which speeds this up by 33% - 50%, depending on
> whether AVX512 is available.

Please give concrete timing numbers for one system you tested this on.

> Signed-off-by: Dirk Müller <dmueller@suse.de>
> ---
>   include/linux/raid/pq.h | 2 +-
>   lib/raid6/algos.c       | 2 +-
>   lib/raid6/avx2.c        | 6 +++---
>   lib/raid6/avx512.c      | 6 +++---
>   4 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
> index 154e954b711d..d6e5a1feb947 100644
> --- a/include/linux/raid/pq.h
> +++ b/include/linux/raid/pq.h
> @@ -81,7 +81,7 @@ struct raid6_calls {
>   	void (*xor_syndrome)(int, int, int, size_t, void **);
>   	int  (*valid)(void);	/* Returns 1 if this routine set is usable */
>   	const char *name;	/* Name of this routine set */
> -	int prefer;		/* Has special performance attribute */
> +	int priority;		/* Relative priority ranking if non-zero */
>   };
>   
>   /* Selected algorithm */
> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
> index 889033b7fc0d..d1e8ff837a32 100644
> --- a/lib/raid6/algos.c
> +++ b/lib/raid6/algos.c
> @@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>   	const struct raid6_calls *best;
>   
>   	for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
> -		if (!best || (*algo)->prefer >= best->prefer) {
> +		if (!best || (*algo)->priority >= best->priority) {
>   			if ((*algo)->valid && !(*algo)->valid())
>   				continue;
>   
> diff --git a/lib/raid6/avx2.c b/lib/raid6/avx2.c
> index f299476e1d76..31be496b8c81 100644
> --- a/lib/raid6/avx2.c
> +++ b/lib/raid6/avx2.c
> @@ -132,7 +132,7 @@ const struct raid6_calls raid6_avx2x1 = {
>   	raid6_avx21_xor_syndrome,
>   	raid6_have_avx2,
>   	"avx2x1",
> -	1			/* Has cache hints */
> +	.priority = 2
>   };
>   
>   /*
> @@ -262,7 +262,7 @@ const struct raid6_calls raid6_avx2x2 = {
>   	raid6_avx22_xor_syndrome,
>   	raid6_have_avx2,
>   	"avx2x2",
> -	1			/* Has cache hints */
> +	.priority = 2
>   };
>   
>   #ifdef CONFIG_X86_64
> @@ -465,6 +465,6 @@ const struct raid6_calls raid6_avx2x4 = {
>   	raid6_avx24_xor_syndrome,
>   	raid6_have_avx2,
>   	"avx2x4",
> -	1			/* Has cache hints */
> +	.priority = 2
>   };
>   #endif
> diff --git a/lib/raid6/avx512.c b/lib/raid6/avx512.c
> index bb684d144ee2..63ae197c3294 100644
> --- a/lib/raid6/avx512.c
> +++ b/lib/raid6/avx512.c
> @@ -162,7 +162,7 @@ const struct raid6_calls raid6_avx512x1 = {
>   	raid6_avx5121_xor_syndrome,
>   	raid6_have_avx512,
>   	"avx512x1",
> -	1                       /* Has cache hints */
> +	.priority = 2
>   };
>   
>   /*
> @@ -319,7 +319,7 @@ const struct raid6_calls raid6_avx512x2 = {
>   	raid6_avx5122_xor_syndrome,
>   	raid6_have_avx512,
>   	"avx512x2",
> -	1                       /* Has cache hints */
> +	.priority = 2
>   };
>   
>   #ifdef CONFIG_X86_64
> @@ -557,7 +557,7 @@ const struct raid6_calls raid6_avx512x4 = {
>   	raid6_avx5124_xor_syndrome,
>   	raid6_have_avx512,
>   	"avx512x4",
> -	1                       /* Has cache hints */
> +	.priority = 2
>   };
>   #endif
>   

Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>


Kind regards,

Paul


[1]: https://lore.kernel.org/all/20210406124126.GM17806@zn.tnic/


* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2021-12-30 13:46 ` Paul Menzel
@ 2021-12-31  8:52   ` Dirk Müller
  2021-12-31  8:57     ` Paul Menzel
  0 siblings, 1 reply; 8+ messages in thread
From: Dirk Müller @ 2021-12-31  8:52 UTC (permalink / raw)
  To: Paul Menzel; +Cc: linux-raid

On 2021-12-30 14:46, Paul Menzel wrote:

Hi Paul,

> Can the AVX2 wins over AVX512 be explained, or does it point to some
> implementation problem?

I've not yet analyzed this deeply enough to have a defensible explanation
ready, sorry. My patch does not change the situation with regard to AVX512
vs AVX2 (both are ranked equal, same as before). The only change is that
SSE2 is ranked lower than AVX2, so CPU generations that have AVX2 will stop
benchmarking at AVX2 rather than also including the SSE2 benchmark runs.

The current benchmark routine is likely too naive when you look at the
last 20+ years of CPU design improvements (prefetching, out-of-order
execution, turbo modes, efficiency cores, AVX512 turbo licensing and many
other aspects). That is not my current focus; my current focus is on
lowering the cost of the benchmark.

> By the way, Borislav did not give much credit to the benchmarks results 
> [1].

I have seen that as well, there are two remarks on this (both not 
invalidating what Borislav wrote):

* the comment was about xor(), this patch is about gen()
* the benchmark logic does a relative ranking of approaches, so the 
absolute number fluctuation doesn't matter if they still rank the same.

>> By giving AVXx variants higher priority over SSE, we can generally
>> skip 3 benchmarks which speeds this up by 33% - 50%, depending on
>> whether AVX512 is available.
> Please give concrete timing numbers for one system you tested this on.

I have given an explanation of how this patch affects the number of
benchmarks that are run. How long they take depends on other factors.
This is the list of benchmarks configured (the raid6_algos[] array in
lib/raid6/algos.c):


   #if defined(__x86_64__) && !defined(__arch_um__)
   #ifdef CONFIG_AS_AVX512
           &raid6_avx512x4,
           &raid6_avx512x2,
           &raid6_avx512x1,
   #endif
           &raid6_avx2x4,
           &raid6_avx2x2,
           &raid6_avx2x1,
           &raid6_sse2x4,
           &raid6_sse2x2,
           &raid6_sse2x1,
   #endif

Without this patch, all 9 are executed. With this patch, the last 3
(sse2x*) are skipped, which saves 3 out of 9 benchmarks when AVX512 is
enabled, or 3 out of 6 when it is not: a reduction of 33% or 50%
respectively, as written above.
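
As a rough illustration of that counting, here is a minimal user-space
sketch of the skip logic in raid6_choose_gen(). It is not the kernel code:
struct algo_model, count_benchmarked() and the perf scores are hypothetical
stand-ins made up for this example, and a CPU without AVX512 is modelled
with a flag rather than with CONFIG_AS_AVX512.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct raid6_calls; only what the loop reads. */
struct algo_model {
	const char *name;
	int old_prio;		/* "prefer" before the patch */
	int new_prio;		/* "priority" after the patch */
	int needs_avx512;	/* modelled as unusable when AVX512 is absent */
	unsigned long perf;	/* made-up score, not a measurement */
};

/* Same order as the x86_64 part of raid6_algos[] quoted above. */
static const struct algo_model x86_64_algos[] = {
	{ "avx512x4", 1, 2, 1, 30 },
	{ "avx512x2", 1, 2, 1, 31 },
	{ "avx512x1", 1, 2, 1, 25 },
	{ "avx2x4",   1, 2, 0, 32 },
	{ "avx2x2",   1, 2, 0, 28 },
	{ "avx2x1",   1, 2, 0, 22 },
	{ "sse2x4",   1, 1, 0, 20 },
	{ "sse2x2",   1, 1, 0, 18 },
	{ "sse2x1",   1, 1, 0, 12 },
};

/* Returns how many entries the selection loop would benchmark. */
static int count_benchmarked(int patched, int have_avx512)
{
	const size_t n = sizeof(x86_64_algos) / sizeof(x86_64_algos[0]);
	const struct algo_model *best = NULL;
	unsigned long bestperf = 0;
	int best_prio = 0, runs = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct algo_model *a = &x86_64_algos[i];
		int prio = patched ? a->new_prio : a->old_prio;

		if (best && prio < best_prio)
			continue;	/* ranked below the current best: skipped */
		if (a->needs_avx512 && !have_avx512)
			continue;	/* stands in for a failing ->valid() */

		runs++;			/* the real loop times gen_syndrome() here */
		if (a->perf > bestperf) {
			bestperf = a->perf;
			best = a;
			best_prio = prio;
		}
	}
	return runs;
}
```

With these made-up scores, count_benchmarked(0, 1) gives 9 and
count_benchmarked(1, 1) gives 6; without AVX512 the counts are 6 and 3.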

I'm open to any suggestion of a wording change that makes this clearer.


Thanks,
Dirk


* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2021-12-31  8:52   ` Dirk Müller
@ 2021-12-31  8:57     ` Paul Menzel
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Menzel @ 2021-12-31  8:57 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid


Dear Dirk,


Thank you for the detailed reply.


On 31.12.21 at 09:52, Dirk Müller wrote:
> On 2021-12-30 14:46, Paul Menzel wrote:

>> Can the AVX2 wins over AVX512 be explained, or does it point to
>> some implementation problem?
> 
> I've not yet analyzed this deep enough to have a defendable
> explanation ready, sorry. My patch is not changing the situation in
> regards to AVX512 vs AVX2 (both are ranked equal, same like before). 
> The only change I do is that SSE2 is ranked lower than AVX2, so cpu 
> generations that have AVX2 will stop benchmarking at AVX2 rather than
> also including SSE2 benchmark runs.
> 
> The current benchmark routine is likely too naive when you look at
> the last 20+ years of cpu design improvements (prefetching,
> Out-of-Order Execution, Turbo modes, Energy-Cores, AVX512 licensing
> turbo and many other aspects). This is not in my current focus, my
> current focus is on lowering the tax of the benchmark.

Thank you. Sorry for hijacking this thread with the question.

>> By the way, Borislav did not give much credit to the benchmarks 
>> results [1].
> 
> I have seen that as well, there are two remarks on this (both not 
> invalidating what Borislav wrote):
> 
> * the comment was about xor(), this patch is about gen()
> * the benchmark logic does a relative ranking of approaches, so the 
> absolute number fluctuation doesn't matter if they still rank the same.

Indeed.

>>> By giving AVXx variants higher priority over SSE, we can generally
>>> skip 3 benchmarks which speeds this up by 33% - 50%, depending on
>>> whether AVX512 is available.
>> Please give concrete timing numbers for one system you tested this on.
> 
> I have given an explanation of how this patch affects number of 
> benchmarks that are run. how long they take depends on other factors. 
> this is the list of benchmarks configured (lib/raid6/algos.c the
> raid6_algos[] array):
> 
> 
>    #if defined(__x86_64__) && !defined(__arch_um__)
>    #ifdef CONFIG_AS_AVX512
>            &raid6_avx512x4,
>            &raid6_avx512x2,
>            &raid6_avx512x1,
>    #endif
>            &raid6_avx2x4,
>            &raid6_avx2x2,
>            &raid6_avx2x1,
>            &raid6_sse2x4,
>            &raid6_sse2x2,
>            &raid6_sse2x1,
>    #endif
> 
> without this patch, all 9 are executed. with this patch, the last 3 
> (sse2x*) are skipped, leading to a 3 out of 6 or 3 out of 9 (depending 
> on whether or not AVX512 is enabled) improvement, or 33%-50% as written 
> above.
> 
> I'm open to any suggestion of a wording change that makes this clearer.

As with the other patch, an additional statement like the one below would
help me.

With a 250HZ kernel, on Intel Xeon(?) … according to `initcall_debug` 
the former load time is X ms, and now only Y ms.


Kind regards,

Paul


* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2021-12-29 22:36 [PATCH] Use strict priority ranking for pq gen() benchmarking Dirk Müller
  2021-12-30 13:46 ` Paul Menzel
@ 2022-01-02  0:03 ` Song Liu
  2022-01-03 16:28   ` Dirk Müller
  1 sibling, 1 reply; 8+ messages in thread
From: Song Liu @ 2022-01-02  0:03 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

On Wed, Dec 29, 2021 at 2:36 PM Dirk Müller <dmueller@suse.de> wrote:
>
> On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
> and 3 variants of SSE2 are benchmarked on initialization, taking
> between 144-153 jiffies. Over a hardware pool of various generations
> of intel cpus I could not find a single case where SSE2 won over
> AVX2 or AVX512. There are cases where AVX2 wins over AVX512.
>
> By giving AVXx variants higher priority over SSE, we can generally
> skip 3 benchmarks which speeds this up by 33% - 50%, depending on
> whether AVX512 is available.
>
> Signed-off-by: Dirk Müller <dmueller@suse.de>
> ---
>  include/linux/raid/pq.h | 2 +-
>  lib/raid6/algos.c       | 2 +-
>  lib/raid6/avx2.c        | 6 +++---
>  lib/raid6/avx512.c      | 6 +++---
>  4 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
> index 154e954b711d..d6e5a1feb947 100644
> --- a/include/linux/raid/pq.h
> +++ b/include/linux/raid/pq.h
> @@ -81,7 +81,7 @@ struct raid6_calls {
>         void (*xor_syndrome)(int, int, int, size_t, void **);
>         int  (*valid)(void);    /* Returns 1 if this routine set is usable */
>         const char *name;       /* Name of this routine set */
> -       int prefer;             /* Has special performance attribute */
> +       int priority;           /* Relative priority ranking if non-zero */

We need more explanation/documentation about 0 vs. 1 vs. 2 priority.

>  };
>
>  /* Selected algorithm */
> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
> index 889033b7fc0d..d1e8ff837a32 100644
> --- a/lib/raid6/algos.c
> +++ b/lib/raid6/algos.c
> @@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>         const struct raid6_calls *best;
>
>         for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
> -               if (!best || (*algo)->prefer >= best->prefer) {
> +               if (!best || (*algo)->priority >= best->priority) {
>                         if ((*algo)->valid && !(*algo)->valid())

If the module load time is really critical, maybe we can run all
->valid() calls first and
find the highest valid priority. Then, we only run the benchmark for
these algorithms.

Does this make sense?

Thanks,
Song

>                                 continue;
>
> diff --git a/lib/raid6/avx2.c b/lib/raid6/avx2.c
> index f299476e1d76..31be496b8c81 100644
> --- a/lib/raid6/avx2.c
> +++ b/lib/raid6/avx2.c
> @@ -132,7 +132,7 @@ const struct raid6_calls raid6_avx2x1 = {
>         raid6_avx21_xor_syndrome,
>         raid6_have_avx2,
>         "avx2x1",
> -       1                       /* Has cache hints */
> +       .priority = 2
>  };
>
>  /*
> @@ -262,7 +262,7 @@ const struct raid6_calls raid6_avx2x2 = {
>         raid6_avx22_xor_syndrome,
>         raid6_have_avx2,
>         "avx2x2",
> -       1                       /* Has cache hints */
> +       .priority = 2
>  };
>
>  #ifdef CONFIG_X86_64
> @@ -465,6 +465,6 @@ const struct raid6_calls raid6_avx2x4 = {
>         raid6_avx24_xor_syndrome,
>         raid6_have_avx2,
>         "avx2x4",
> -       1                       /* Has cache hints */
> +       .priority = 2
>  };
>  #endif
> diff --git a/lib/raid6/avx512.c b/lib/raid6/avx512.c
> index bb684d144ee2..63ae197c3294 100644
> --- a/lib/raid6/avx512.c
> +++ b/lib/raid6/avx512.c
> @@ -162,7 +162,7 @@ const struct raid6_calls raid6_avx512x1 = {
>         raid6_avx5121_xor_syndrome,
>         raid6_have_avx512,
>         "avx512x1",
> -       1                       /* Has cache hints */
> +       .priority = 2
>  };
>
>  /*
> @@ -319,7 +319,7 @@ const struct raid6_calls raid6_avx512x2 = {
>         raid6_avx5122_xor_syndrome,
>         raid6_have_avx512,
>         "avx512x2",
> -       1                       /* Has cache hints */
> +       .priority = 2
>  };
>
>  #ifdef CONFIG_X86_64
> @@ -557,7 +557,7 @@ const struct raid6_calls raid6_avx512x4 = {
>         raid6_avx5124_xor_syndrome,
>         raid6_have_avx512,
>         "avx512x4",
> -       1                       /* Has cache hints */
> +       .priority = 2
>  };
>  #endif
>
> --
> 2.34.1
>


* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2022-01-02  0:03 ` Song Liu
@ 2022-01-03 16:28   ` Dirk Müller
  2022-01-04 17:28     ` Song Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Dirk Müller @ 2022-01-03 16:28 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On Sunday, 2 January 2022 01:03:44 CET Song Liu wrote:

> We need  more explanation/documentation about 0 vs. 1 vs. 2 priority.

In the commit message? In the code? This is basically a copy&paste of the same
concept and code from a few lines below the diff, struct raid6_recov_calls,
which works the same way and currently has no documentation at all.

Want me to add it to both then?

> >                         if ((*algo)->valid && !(*algo)->valid())
> 
> If the module load time is really critical, maybe we can run all
> ->valid() calls first and
> find the highest valid priority. Then, we only run the benchmark for
> these algorithms.

That's exactly what the code always did. Previously, all x86_64-specific
implementations (be it SSE1/SSE2/AVX2/AVX512) had the same priority level
1, above the default priority level 0 of the implemented-in-C int*.c routines.
With this change, we have one more level preferring AVX* over the rest, so
that we skip testing SSE1/SSE2 (similarly to how the integer implementations
have always been skipped before).
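
To make the three levels concrete, a small sketch follows. The enum names
and struct calls_sketch are hypothetical (the patch uses the bare integers
0, 1 and 2, and the real struct raid6_calls also carries the function
pointers). It only shows the ranking, and that a `.priority = 2` designated
initializer may follow positional ones, as the patch does.

```c
#include <stddef.h>

/* Hypothetical names for the levels; the patch itself uses plain integers. */
enum prio_sketch {
	PRIO_GENERIC = 0,	/* implemented-in-C int*.c routines (default) */
	PRIO_SIMD    = 1,	/* SIMD routines other than AVX*, e.g. SSE2 */
	PRIO_AVX     = 2,	/* AVX2 and AVX512 routines after this patch */
};

/* Cut-down stand-in for struct raid6_calls (function pointers omitted). */
struct calls_sketch {
	const char *name;
	int priority;
};

/* Illustrative entries; an omitted priority is zero-initialized. */
static const struct calls_sketch sketch_intx1  = { .name = "int64x1" };
static const struct calls_sketch sketch_sse2x1 = { "sse2x1", 1 };
static const struct calls_sketch sketch_avx2x1 = { "avx2x1", .priority = 2 };

/*
 * Mirrors the loop's test: a candidate is considered only if it is not
 * ranked below the current best (or if there is no best yet).
 */
static int is_considered(const struct calls_sketch *cand,
			 const struct calls_sketch *best)
{
	return !best || cand->priority >= best->priority;
}
```

Once an AVX entry is the current best, is_considered() rejects the SSE2
and integer entries, which is the skipping described above.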

> Does this make sense?

The valid() call is not probing anything by itself. The loop just iterates
over a small array of functions and stops executing benchmarks for those
that have lower priority ranks.

So there aren't really many cycles to win by changing the execution order
here. I would assume it will actually slow things down, as we have to store
the valid() result for the 2nd iteration.
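
For comparison, a rough sketch of what such a two-pass variant could look
like. Everything here is hypothetical (struct algo_model, count_two_pass(),
the always/never helpers and the sample table); it is not proposed kernel
code and only shows the extra per-entry storage for the valid() results.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct raid6_calls. */
struct algo_model {
	const char *name;
	int priority;
	int (*valid)(void);	/* NULL means "always usable" */
};

static int always(void) { return 1; }
static int never(void)  { return 0; }

static const struct algo_model sample[] = {
	{ "avx512x4", 2, never  },	/* pretend the CPU lacks AVX512 */
	{ "avx2x4",   2, always },
	{ "avx2x2",   2, always },
	{ "sse2x4",   1, always },
	{ "int64x1",  0, NULL   },
};

#define MAX_ALGOS 32	/* arbitrary bound for this sketch */

/*
 * Returns how many entries a two-pass selection would benchmark,
 * or -1 if the table does not fit the scratch buffer.
 */
static int count_two_pass(const struct algo_model *algos, size_t n)
{
	unsigned char usable[MAX_ALGOS];	/* cached valid() results */
	int top = -1, runs = 0;
	size_t i;

	if (n > MAX_ALGOS)
		return -1;

	/* Pass 1: call valid() once per entry and find the top usable rank. */
	for (i = 0; i < n; i++) {
		usable[i] = !algos[i].valid || algos[i].valid();
		if (usable[i] && algos[i].priority > top)
			top = algos[i].priority;
	}

	/* Pass 2: only usable entries of the top rank would be benchmarked. */
	for (i = 0; i < n; i++)
		if (usable[i] && algos[i].priority == top)
			runs++;

	return runs;
}
```

For the sample table this selects the two avx2 entries, the same set the
existing single loop arrives at when higher-ranked entries are listed first.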

Greetings,
Dirk





* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2022-01-03 16:28   ` Dirk Müller
@ 2022-01-04 17:28     ` Song Liu
  2022-01-05 16:39       ` Dirk Müller
  0 siblings, 1 reply; 8+ messages in thread
From: Song Liu @ 2022-01-04 17:28 UTC (permalink / raw)
  To: Dirk Müller; +Cc: linux-raid

On Mon, Jan 3, 2022 at 8:28 AM Dirk Müller <dmueller@suse.de> wrote:
>
> On Sunday, 2 January 2022 01:03:44 CET Song Liu wrote:
>
> > We need  more explanation/documentation about 0 vs. 1 vs. 2 priority.
>
> In the commit message? in the code? this is basically a copy&paste of the same
> concept and code from a few lines below the diff, struct raid6_recov_calls
> which works the same way and currently has no documentation at all.
>
> want me to add to both then?

I guess we only need something like:

  .priority = 2   /* avx is always faster than sse */

>
> > >                         if ((*algo)->valid && !(*algo)->valid())
> >
> > If the module load time is really critical, maybe we can run all
> > ->valid() calls first and
> > find the highest valid priority. Then, we only run the benchmark for
> > these algorithms.
>
> thats exactly what the code always did. previously all x86_64 specific
> implementations (be it SSE1/SSE2/AVX2/AVX512) all had the same priority level
> 1, over the default priority level 0 for the implemented-in-C int*.c routines.
> with this change, we have one more level preferring AVX* over the rest, so
> that we skip testing SSE1/SSE2 (similarly to how the integer implementations
> have always been skipped before).
>
> > Does this make sense?
>
> the valid call is not probing anything by itself. it just iterates over a
> small array of functions and stops executing benchmarks for those that have
> lower priority ranks.
>
> so there isn't really a lot of cycles to win by changing the execution order
> here. I would assume it will actually slow things down as we have to store the
> valid() result for the 2nd iteration.

Let's keep this part as-is then.

Thanks,
Song


* Re: [PATCH] Use strict priority ranking for pq gen() benchmarking
  2022-01-04 17:28     ` Song Liu
@ 2022-01-05 16:39       ` Dirk Müller
  0 siblings, 0 replies; 8+ messages in thread
From: Dirk Müller @ 2022-01-05 16:39 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On Tuesday, 4 January 2022 18:28:39 CET Song Liu wrote:

> > want me to add to both then?
> I guess we only need something like:
>   .priority = 2   /* avx is always faster than sse */

Ah okay, makes total sense. Added to v2.

> Let's keep this part as-is then.

Thank you!

Greetings,
Dirk



