linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
@ 2020-12-11 11:29 Tony W Wang-oc
  2020-12-11 13:00 ` Peter Zijlstra
  2020-12-11 17:43 ` Eric Biggers
  0 siblings, 2 replies; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-11 11:29 UTC (permalink / raw)
  To: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto, linux-kernel
  Cc: TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen, SilviaZhao

The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, When
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic. This case expect to use crc32c-generic driver for
some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
CPUs support from crc32c-intel.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/crypto/crc32c-intel_glue.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..6dafdae 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
 	if (!x86_match_cpu(crc32c_cpu_id))
 		return -ENODEV;
+
+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) {
+		if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
+			return -ENODEV;
+	}
+
 #ifdef CONFIG_X86_64
 	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
 		alg.update = crc32c_pcl_intel_update;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-11 11:29 [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs Tony W Wang-oc
@ 2020-12-11 13:00 ` Peter Zijlstra
  2020-12-12  4:32   ` Tony W Wang-oc
  2020-12-11 17:43 ` Eric Biggers
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2020-12-11 13:00 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao, thomas.lendacky

On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic. This case expect to use crc32c-generic driver for
> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> CPUs support from crc32c-intel.
> 
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
>  arch/x86/crypto/crc32c-intel_glue.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
> index feccb52..6dafdae 100644
> --- a/arch/x86/crypto/crc32c-intel_glue.c
> +++ b/arch/x86/crypto/crc32c-intel_glue.c
> @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
>  
>  static int __init crc32c_intel_mod_init(void)
>  {
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
>  	if (!x86_match_cpu(crc32c_cpu_id))
>  		return -ENODEV;
> +
> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) {
> +		if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
> +			return -ENODEV;
> +	}

Egads, why can't you use that x86_match_cpu() above, and also this
really wants a comment on why you're excluding these chips. Also, since
(IIRC) ZHAOXIN is basically AND, shouldn't they also be listed?

That is; write it like:

	m = x86_match_cpu(crc32_cpu_id);
	if (!m || !m->data)
		return -ENODEV;

That way you can have positive and negative matches in the array
(obviously the existing FEATURE test would need data=1 and be last).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-11 11:29 [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs Tony W Wang-oc
  2020-12-11 13:00 ` Peter Zijlstra
@ 2020-12-11 17:43 ` Eric Biggers
  2020-12-12  9:36   ` Ard Biesheuvel
  2020-12-14  2:28   ` Tony W Wang-oc
  1 sibling, 2 replies; 22+ messages in thread
From: Eric Biggers @ 2020-12-11 17:43 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao

On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic. This case expect to use crc32c-generic driver for
> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> CPUs support from crc32c-intel.
> 
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>

Does this mean that the performance of the crc32c instruction on those CPUs is
actually slower than a regular C implementation?  That's very weird.

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-11 13:00 ` Peter Zijlstra
@ 2020-12-12  4:32   ` Tony W Wang-oc
  0 siblings, 0 replies; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-12  4:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao, thomas.lendacky

On 11/12/2020 21:00, Peter Zijlstra wrote:
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic. This case expect to use crc32c-generic driver for
>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>> CPUs support from crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> ---
>>  arch/x86/crypto/crc32c-intel_glue.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
>> index feccb52..6dafdae 100644
>> --- a/arch/x86/crypto/crc32c-intel_glue.c
>> +++ b/arch/x86/crypto/crc32c-intel_glue.c
>> @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
>>  
>>  static int __init crc32c_intel_mod_init(void)
>>  {
>> +	struct cpuinfo_x86 *c = &boot_cpu_data;
>> +
>>  	if (!x86_match_cpu(crc32c_cpu_id))
>>  		return -ENODEV;
>> +
>> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) {
>> +		if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
>> +			return -ENODEV;
>> +	}
> 
> Egads, why can't you use that x86_match_cpu() above, and also this
> really wants a comment on why you're excluding these chips. 

When doing lmbench3 Create and Delete file test on partitions with ext4
enabling metadata checksum, found using crc32c-generic driver could get
about 20% performance gain than using the driver crc32c-intel on these
chips.

Also, since
> (IIRC) ZHAOXIN is basically AND, shouldn't they also be listed?
> 
> That is; write it like:
> 
> 	m = x86_match_cpu(crc32_cpu_id);
> 	if (!m || !m->data)
> 		return -ENODEV;
> 
> That way you can have positive and negative matches in the array
> (obviously the existing FEATURE test would need data=1 and be last).
> .
> 

Lot thanks for you suggestion, will list these chips in crc32c_cpu_id
and use x86_match_cpu:

 static const struct x86_cpu_id crc32c_cpu_id[] = {
+       X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
+       X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b,
X86_FEATURE_XMM4_2, 1),
+       X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b,
X86_FEATURE_XMM4_2, 1),
+       X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
+       X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b,
X86_FEATURE_XMM4_2, 1),
+       X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b,
X86_FEATURE_XMM4_2, 1),
        X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
        {}
 };
@@ -228,8 +234,10 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);

 static int __init crc32c_intel_mod_init(void)
 {
-       if (!x86_match_cpu(crc32c_cpu_id))
+       const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+       if (!m || m->driver_data)
                return -ENODEV;


sincerely
TonyWWangoc

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-11 17:43 ` Eric Biggers
@ 2020-12-12  9:36   ` Ard Biesheuvel
  2020-12-12 10:54     ` Ard Biesheuvel
  2020-12-14  2:28   ` Tony W Wang-oc
  1 sibling, 1 reply; 22+ messages in thread
From: Ard Biesheuvel @ 2020-12-12  9:36 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Tony W Wang-oc, Herbert Xu, David S. Miller, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin,
	Linux Crypto Mailing List, Linux Kernel Mailing List, TimGuo-oc,
	CooperYan, QiyuanWang, HerryYang, CobeChen, SilviaZhao

On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> > On platforms with Zhaoxin CPUs supporting this X86 feature, When
> > crc32c-intel and crc32c-generic are both registered, system will
> > use crc32c-intel because its .cra_priority is greater than
> > crc32c-generic. This case expect to use crc32c-generic driver for
> > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> > CPUs support from crc32c-intel.
> >
> > Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>
> Does this mean that the performance of the crc32c instruction on those CPUs is
> actually slower than a regular C implementation?  That's very weird.
>

This driver does not use CRC instructions, but carryless
multiplication and aggregation. So I suppose the pclmulqdq instruction
triggers some pathological performance limitation here.

That means the crct10dif driver probably needs the same treatment.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-12  9:36   ` Ard Biesheuvel
@ 2020-12-12 10:54     ` Ard Biesheuvel
  2020-12-14  2:29       ` Tony W Wang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: Ard Biesheuvel @ 2020-12-12 10:54 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Tony W Wang-oc, Herbert Xu, David S. Miller, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin,
	Linux Crypto Mailing List, Linux Kernel Mailing List, TimGuo-oc,
	CooperYan, QiyuanWang, HerryYang, CobeChen, SilviaZhao

On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> > > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> > > On platforms with Zhaoxin CPUs supporting this X86 feature, When
> > > crc32c-intel and crc32c-generic are both registered, system will
> > > use crc32c-intel because its .cra_priority is greater than
> > > crc32c-generic. This case expect to use crc32c-generic driver for
> > > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> > > CPUs support from crc32c-intel.
> > >
> > > Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> >
> > Does this mean that the performance of the crc32c instruction on those CPUs is
> > actually slower than a regular C implementation?  That's very weird.
> >
>
> This driver does not use CRC instructions, but carryless
> multiplication and aggregation. So I suppose the pclmulqdq instruction
> triggers some pathological performance limitation here.
>

Just noticed it uses both crc instructions and pclmulqdq instructions.
Sorry for the noise.

> That means the crct10dif driver probably needs the same treatment.

Tony, can you confirm that the problem is in the CRC instructions and
not in the PCLMULQDQ code path that supersedes it when available?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-11 17:43 ` Eric Biggers
  2020-12-12  9:36   ` Ard Biesheuvel
@ 2020-12-14  2:28   ` Tony W Wang-oc
  2020-12-14 20:41     ` Eric Biggers
  1 sibling, 1 reply; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-14  2:28 UTC (permalink / raw)
  To: Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao

On 12/12/2020 01:43, Eric Biggers wrote:
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic. This case expect to use crc32c-generic driver for
>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>> CPUs support from crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> 
> Does this mean that the performance of the crc32c instruction on those CPUs is
> actually slower than a regular C implementation?  That's very weird.
> 

From the lmbench3 Create and Delete file test on those chips, I think yes.

sincerely
Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-12 10:54     ` Ard Biesheuvel
@ 2020-12-14  2:29       ` Tony W Wang-oc
  0 siblings, 0 replies; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-14  2:29 UTC (permalink / raw)
  To: Ard Biesheuvel, Eric Biggers
  Cc: Herbert Xu, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin,
	Linux Crypto Mailing List, Linux Kernel Mailing List, TimGuo-oc,
	CooperYan, QiyuanWang, HerryYang, CobeChen, SilviaZhao



On 12/12/2020 18:54, Ard Biesheuvel wrote:
> On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@kernel.org> wrote:
>>>
>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>>>> crc32c-intel and crc32c-generic are both registered, system will
>>>> use crc32c-intel because its .cra_priority is greater than
>>>> crc32c-generic. This case expect to use crc32c-generic driver for
>>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>>>> CPUs support from crc32c-intel.
>>>>
>>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>>
>>> Does this mean that the performance of the crc32c instruction on those CPUs is
>>> actually slower than a regular C implementation?  That's very weird.
>>>
>>
>> This driver does not use CRC instructions, but carryless
>> multiplication and aggregation. So I suppose the pclmulqdq instruction
>> triggers some pathological performance limitation here.
>>
> 
> Just noticed it uses both crc instructions and pclmulqdq instructions.
> Sorry for the noise.
> 
>> That means the crct10dif driver probably needs the same treatment.
> 
> Tony, can you confirm that the problem is in the CRC instructions and
> not in the PCLMULQDQ code path that supersedes it when available?

CRC instructions.

sincerely
Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-14  2:28   ` Tony W Wang-oc
@ 2020-12-14 20:41     ` Eric Biggers
  2020-12-15  2:15       ` Tony W Wang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Biggers @ 2020-12-14 20:41 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao

On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> On 12/12/2020 01:43, Eric Biggers wrote:
> > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> >> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> >> crc32c-intel and crc32c-generic are both registered, system will
> >> use crc32c-intel because its .cra_priority is greater than
> >> crc32c-generic. This case expect to use crc32c-generic driver for
> >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> >> CPUs support from crc32c-intel.
> >>
> >> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> > 
> > Does this mean that the performance of the crc32c instruction on those CPUs is
> > actually slower than a regular C implementation?  That's very weird.
> > 
> 
> From the lmbench3 Create and Delete file test on those chips, I think yes.
> 

Did you try measuring the performance of the hashing itself, and not some
higher-level filesystem operations?

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-14 20:41     ` Eric Biggers
@ 2020-12-15  2:15       ` Tony W Wang-oc
  2020-12-15 17:56         ` Eric Biggers
  0 siblings, 1 reply; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-15  2:15 UTC (permalink / raw)
  To: Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao


On 15/12/2020 04:41, Eric Biggers wrote:
> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>> On 12/12/2020 01:43, Eric Biggers wrote:
>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>>>> crc32c-intel and crc32c-generic are both registered, system will
>>>> use crc32c-intel because its .cra_priority is greater than
>>>> crc32c-generic. This case expect to use crc32c-generic driver for
>>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>>>> CPUs support from crc32c-intel.
>>>>
>>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>>
>>> Does this mean that the performance of the crc32c instruction on those CPUs is
>>> actually slower than a regular C implementation?  That's very weird.
>>>
>>
>> From the lmbench3 Create and Delete file test on those chips, I think yes.
>>
> 
> Did you try measuring the performance of the hashing itself, and not some
> higher-level filesystem operations?
> 

Yes. Was testing on these Zhaoxin CPUs, the result is that with the same
input value the generic C implementation takes fewer time than the
crc32c instruction implementation.

Sincerely
Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-15  2:15       ` Tony W Wang-oc
@ 2020-12-15 17:56         ` Eric Biggers
  2020-12-21  2:46           ` tonywwang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Biggers @ 2020-12-15 17:56 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao

On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
> 
> On 15/12/2020 04:41, Eric Biggers wrote:
> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> >> On 12/12/2020 01:43, Eric Biggers wrote:
> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> >>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> >>>> crc32c-intel and crc32c-generic are both registered, system will
> >>>> use crc32c-intel because its .cra_priority is greater than
> >>>> crc32c-generic. This case expect to use crc32c-generic driver for
> >>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> >>>> CPUs support from crc32c-intel.
> >>>>
> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> >>>
> >>> Does this mean that the performance of the crc32c instruction on those CPUs is
> >>> actually slower than a regular C implementation?  That's very weird.
> >>>
> >>
> >> From the lmbench3 Create and Delete file test on those chips, I think yes.
> >>
> > 
> > Did you try measuring the performance of the hashing itself, and not some
> > higher-level filesystem operations?
> > 
> 
> Yes. Was testing on these Zhaoxin CPUs, the result is that with the same
> input value the generic C implementation takes fewer time than the
> crc32c instruction implementation.
> 

And that is really "working as intended"?  Why do these CPUs even declare that
they support the crc32c instruction, when it is so slow?

Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, etc.) that
these CPUs similarly declare support for but they are uselessly slow?

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-15 17:56         ` Eric Biggers
@ 2020-12-21  2:46           ` tonywwang-oc
  2020-12-21 19:27             ` hpa
  0 siblings, 1 reply; 22+ messages in thread
From: tonywwang-oc @ 2020-12-21  2:46 UTC (permalink / raw)
  To: Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao

On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers <ebiggers@kernel.org> wrote:
>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>> 
>> On 15/12/2020 04:41, Eric Biggers wrote:
>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>> >>>> The driver crc32c-intel match CPUs supporting
>X86_FEATURE_XMM4_2.
>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>> >>>> crc32c-intel and crc32c-generic are both registered, system will
>> >>>> use crc32c-intel because its .cra_priority is greater than
>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>for
>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>Zhaoxin
>> >>>> CPUs support from crc32c-intel.
>> >>>>
>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> >>>
>> >>> Does this mean that the performance of the crc32c instruction on
>those CPUs is
>> >>> actually slower than a regular C implementation?  That's very
>weird.
>> >>>
>> >>
>> >> From the lmbench3 Create and Delete file test on those chips, I
>think yes.
>> >>
>> > 
>> > Did you try measuring the performance of the hashing itself, and
>not some
>> > higher-level filesystem operations?
>> > 
>> 
>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>same
>> input value the generic C implementation takes fewer time than the
>> crc32c instruction implementation.
>> 
>
>And that is really "working as intended"?

These CPU's crc32c instruction is not working as intended.

  Why do these CPUs even
>declare that
>they support the crc32c instruction, when it is so slow?
>

The presence of crc32c and some other instructions supports are enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok except the crc32c instruction.

>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>etc.) that
>these CPUs similarly declare support for but they are uselessly slow?

No.

Sincerely
Tonyw

>
>- Eric



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-21  2:46           ` tonywwang-oc
@ 2020-12-21 19:27             ` hpa
  2020-12-22  3:01               ` tonywwang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: hpa @ 2020-12-21 19:27 UTC (permalink / raw)
  To: tonywwang-oc, Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, linux-crypto, linux-kernel,
	TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen,
	SilviaZhao

On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote:
>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
><ebiggers@kernel.org> wrote:
>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>> 
>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>> >>>> The driver crc32c-intel match CPUs supporting
>>X86_FEATURE_XMM4_2.
>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>When
>>> >>>> crc32c-intel and crc32c-generic are both registered, system
>will
>>> >>>> use crc32c-intel because its .cra_priority is greater than
>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>>for
>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>>Zhaoxin
>>> >>>> CPUs support from crc32c-intel.
>>> >>>>
>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>> >>>
>>> >>> Does this mean that the performance of the crc32c instruction on
>>those CPUs is
>>> >>> actually slower than a regular C implementation?  That's very
>>weird.
>>> >>>
>>> >>
>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>think yes.
>>> >>
>>> > 
>>> > Did you try measuring the performance of the hashing itself, and
>>not some
>>> > higher-level filesystem operations?
>>> > 
>>> 
>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>same
>>> input value the generic C implementation takes fewer time than the
>>> crc32c instruction implementation.
>>> 
>>
>>And that is really "working as intended"?
>
>These CPU's crc32c instruction is not working as intended.
>
>  Why do these CPUs even
>>declare that
>>they support the crc32c instruction, when it is so slow?
>>
>
>The presence of crc32c and some other instructions supports are
>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>except the crc32c instruction.
>
>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>etc.) that
>>these CPUs similarly declare support for but they are uselessly slow?
>
>No.
>
>Sincerely
>Tonyw
>
>>
>>- Eric

Then the right thing to do is to disable the CPUID bit in the vendor-specific startup code.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-21 19:27             ` hpa
@ 2020-12-22  3:01               ` tonywwang-oc
  2020-12-22  4:54                 ` hpa
  0 siblings, 1 reply; 22+ messages in thread
From: tonywwang-oc @ 2020-12-22  3:01 UTC (permalink / raw)
  To: hpa, Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, linux-crypto, linux-kernel,
	TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen,
	SilviaZhao

On December 22, 2020 3:27:33 AM GMT+08:00, hpa@zytor.com wrote:
>On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote:
>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>><ebiggers@kernel.org> wrote:
>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>> 
>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> >>>> The driver crc32c-intel match CPUs supporting
>>>X86_FEATURE_XMM4_2.
>>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>>When
>>>> >>>> crc32c-intel and crc32c-generic are both registered, system
>>will
>>>> >>>> use crc32c-intel because its .cra_priority is greater than
>>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>>>for
>>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>>>Zhaoxin
>>>> >>>> CPUs support from crc32c-intel.
>>>> >>>>
>>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>>> >>>
>>>> >>> Does this mean that the performance of the crc32c instruction
>on
>>>those CPUs is
>>>> >>> actually slower than a regular C implementation?  That's very
>>>weird.
>>>> >>>
>>>> >>
>>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>>think yes.
>>>> >>
>>>> > 
>>>> > Did you try measuring the performance of the hashing itself, and
>>>not some
>>>> > higher-level filesystem operations?
>>>> > 
>>>> 
>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>>same
>>>> input value the generic C implementation takes fewer time than the
>>>> crc32c instruction implementation.
>>>> 
>>>
>>>And that is really "working as intended"?
>>
>>These CPU's crc32c instruction is not working as intended.
>>
>>  Why do these CPUs even
>>>declare that
>>>they support the crc32c instruction, when it is so slow?
>>>
>>
>>The presence of crc32c and some other instructions supports are
>>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>>except the crc32c instruction.
>>
>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>>etc.) that
>>>these CPUs similarly declare support for but they are uselessly slow?
>>
>>No.
>>
>>Sincerely
>>Tonyw
>>
>>>
>>>- Eric
>
>Then the right thing to do is to disable the CPUID bit in the
>vendor-specific startup code.

This way makes these CPUs do not support all instruction sets enumerated
by CPUID.01:ECX[SSE4.2].
While only crc32c instruction is slow, just expect the crc32c-intel driver do not
match these CPUs.

Sincerely
Tonyw


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-22  3:01               ` tonywwang-oc
@ 2020-12-22  4:54                 ` hpa
  2021-01-07  6:23                   ` Tony W Wang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: hpa @ 2020-12-22  4:54 UTC (permalink / raw)
  To: tonywwang-oc, Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, linux-crypto, linux-kernel,
	TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen,
	SilviaZhao

On December 21, 2020 7:01:39 PM PST, tonywwang-oc@zhaoxin.com wrote:
>On December 22, 2020 3:27:33 AM GMT+08:00, hpa@zytor.com wrote:
>>On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote:
>>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>><ebiggers@kernel.org> wrote:
>>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>> 
>>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>wrote:
>>>>> >>>> The driver crc32c-intel match CPUs supporting
>>>>X86_FEATURE_XMM4_2.
>>>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>When
>>>>> >>>> crc32c-intel and crc32c-generic are both registered, system
>>>will
>>>>> >>>> use crc32c-intel because its .cra_priority is greater than
>>>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>>>>for
>>>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>>>>Zhaoxin
>>>>> >>>> CPUs support from crc32c-intel.
>>>>> >>>>
>>>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>>>> >>>
>>>>> >>> Does this mean that the performance of the crc32c instruction
>>on
>>>>those CPUs is
>>>>> >>> actually slower than a regular C implementation?  That's very
>>>>weird.
>>>>> >>>
>>>>> >>
>>>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>>>think yes.
>>>>> >>
>>>>> > 
>>>>> > Did you try measuring the performance of the hashing itself, and
>>>>not some
>>>>> > higher-level filesystem operations?
>>>>> > 
>>>>> 
>>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>the
>>>>same
>>>>> input value the generic C implementation takes fewer time than the
>>>>> crc32c instruction implementation.
>>>>> 
>>>>
>>>>And that is really "working as intended"?
>>>
>>>These CPU's crc32c instruction is not working as intended.
>>>
>>>  Why do these CPUs even
>>>>declare that
>>>>they support the crc32c instruction, when it is so slow?
>>>>
>>>
>>>The presence of crc32c and some other instructions supports are
>>>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>>>except the crc32c instruction.
>>>
>>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>AVX,
>>>>etc.) that
>>>>these CPUs similarly declare support for but they are uselessly
>slow?
>>>
>>>No.
>>>
>>>Sincerely
>>>Tonyw
>>>
>>>>
>>>>- Eric
>>
>>Then the right thing to do is to disable the CPUID bit in the
>>vendor-specific startup code.
>
>This way makes these CPUs do not support all instruction sets
>enumerated
>by CPUID.01:ECX[SSE4.2].
>While only crc32c instruction is slow, just expect the crc32c-intel
>driver do not
>match these CPUs.
>
>Sincerely
>Tonyw

Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-22  4:54                 ` hpa
@ 2021-01-07  6:23                   ` Tony W Wang-oc
  0 siblings, 0 replies; 22+ messages in thread
From: Tony W Wang-oc @ 2021-01-07  6:23 UTC (permalink / raw)
  To: hpa, Eric Biggers
  Cc: herbert, davem, tglx, mingo, bp, x86, linux-crypto, linux-kernel,
	TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen,
	SilviaZhao


On 22/12/2020 12:54, hpa@zytor.com wrote:
> On December 21, 2020 7:01:39 PM PST, tonywwang-oc@zhaoxin.com wrote:
>> On December 22, 2020 3:27:33 AM GMT+08:00, hpa@zytor.com wrote:
>>> On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote:
>>>> On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>>> <ebiggers@kernel.org> wrote:
>>>>> On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>>>
>>>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>>>>> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>>>>>> On 12/12/2020 01:43, Eric Biggers wrote:
>>>>>>>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>> wrote:
>>>>>>>>>> The driver crc32c-intel match CPUs supporting
>>>>> X86_FEATURE_XMM4_2.
>>>>>>>>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>> When
>>>>>>>>>> crc32c-intel and crc32c-generic are both registered, system
>>>> will
>>>>>>>>>> use crc32c-intel because its .cra_priority is greater than
>>>>>>>>>> crc32c-generic. This case expect to use crc32c-generic driver
>>>>> for
>>>>>>>>>> some Zhaoxin CPUs to get performance gain, So remove these
>>>>> Zhaoxin
>>>>>>>>>> CPUs support from crc32c-intel.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>>>>>>>>
>>>>>>>>> Does this mean that the performance of the crc32c instruction
>>> on
>>>>> those CPUs is
>>>>>>>>> actually slower than a regular C implementation?  That's very
>>>>> weird.
>>>>>>>>>
>>>>>>>>
>>>>>>>> From the lmbench3 Create and Delete file test on those chips, I
>>>>> think yes.
>>>>>>>>
>>>>>>>
>>>>>>> Did you try measuring the performance of the hashing itself, and
>>>>> not some
>>>>>>> higher-level filesystem operations?
>>>>>>>
>>>>>>
>>>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>> the
>>>>> same
>>>>>> input value the generic C implementation takes fewer time than the
>>>>>> crc32c instruction implementation.
>>>>>>
>>>>>
>>>>> And that is really "working as intended"?
>>>>
>>>> These CPU's crc32c instruction is not working as intended.
>>>>
>>>>  Why do these CPUs even
>>>>> declare that
>>>>> they support the crc32c instruction, when it is so slow?
>>>>>
>>>>
>>>> The presence of crc32c and some other instructions supports are
>>>> enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>>>> except the crc32c instruction.
>>>>
>>>>> Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>> AVX,
>>>>> etc.) that
>>>>> these CPUs similarly declare support for but they are uselessly
>> slow?
>>>>
>>>> No.
>>>>
>>>> Sincerely
>>>> Tonyw
>>>>
>>>>>
>>>>> - Eric
>>>
>>> Then the right thing to do is to disable the CPUID bit in the
>>> vendor-specific startup code.
>>
>> This way makes these CPUs do not support all instruction sets
>> enumerated
>> by CPUID.01:ECX[SSE4.2].
>> While only crc32c instruction is slow, just expect the crc32c-intel
>> driver do not
>> match these CPUs.
>>
>> Sincerely
>> Tonyw
> 
> Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over.
> 

Thanks for your suggestion. Have send new patch set.

Sincerely
Tonyw

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2021-01-02 21:12 ` Herbert Xu
@ 2021-01-07  6:27   ` Tony W Wang-oc
  0 siblings, 0 replies; 22+ messages in thread
From: Tony W Wang-oc @ 2021-01-07  6:27 UTC (permalink / raw)
  To: Herbert Xu
  Cc: davem, tglx, mingo, bp, x86, hpa, linux-crypto, linux-kernel,
	TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen,
	SilviaZhao

On 03/01/2021 05:12, Herbert Xu wrote:
> On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, when
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic.
>>
>> When doing lmbench3 Create and Delete file test on partitions with
>> ext4 enabling metadata checksum, found using crc32c-generic driver
>> could get about 20% performance gain than using the driver crc32c-intel
>> on some Zhaoxin CPUs.
>>
>> This case expect to use crc32c-generic driver for these Zhaoxin CPUs
>> to get performance gain, so remove these Zhaoxin CPUs support from
>> crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> ---
>>  arch/x86/crypto/crc32c-intel_glue.c | 21 +++++++++++++++++++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> This does not seem to address the latest comment from hpa.
> 

Yes, please ignore this patch. Have send new patch set per Hpa's suggestion.

Sincerely
Tonyw

> Thanks,
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-15 10:28 Tony W Wang-oc
@ 2021-01-02 21:12 ` Herbert Xu
  2021-01-07  6:27   ` Tony W Wang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2021-01-02 21:12 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: davem, tglx, mingo, bp, x86, hpa, linux-crypto, linux-kernel,
	TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen,
	SilviaZhao

On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, when
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic.
> 
> When doing lmbench3 Create and Delete file test on partitions with
> ext4 enabling metadata checksum, found using crc32c-generic driver
> could get about 20% performance gain than using the driver crc32c-intel
> on some Zhaoxin CPUs.
> 
> This case expect to use crc32c-generic driver for these Zhaoxin CPUs
> to get performance gain, so remove these Zhaoxin CPUs support from
> crc32c-intel.
> 
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
>  arch/x86/crypto/crc32c-intel_glue.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)

This does not seem to address the latest comment from hpa.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
@ 2020-12-15 10:28 Tony W Wang-oc
  2021-01-02 21:12 ` Herbert Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-15 10:28 UTC (permalink / raw)
  To: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto, linux-kernel
  Cc: TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen, SilviaZhao

The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, when
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic.

When doing lmbench3 Create and Delete file test on partitions with
ext4 enabling metadata checksum, found using crc32c-generic driver
could get about 20% performance gain than using the driver crc32c-intel
on some Zhaoxin CPUs.

This case expect to use crc32c-generic driver for these Zhaoxin CPUs
to get performance gain, so remove these Zhaoxin CPUs support from
crc32c-intel.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/crypto/crc32c-intel_glue.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..5171091 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,14 +215,31 @@ static struct shash_alg alg = {
 };
 
 static const struct x86_cpu_id crc32c_cpu_id[] = {
-	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
+	/*
+	 * Negative entries; exclude these chips from using this driver.
+	 * They match the positive rule below, but their CRC32 instruction
+	 * implementation is so slow, it doesn't merit use.
+	 */
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
+	/*
+	 * Positive entry; SSE-4.2 instructions include special purpose CRC32
+	 * instructions.
+	 */
+	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
-	if (!x86_match_cpu(crc32c_cpu_id))
+	const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+	if (!m || !m->driver_data)
 		return -ENODEV;
 #ifdef CONFIG_X86_64
 	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-15  8:58 ` Peter Zijlstra
@ 2020-12-15 10:02   ` Tony W Wang-oc
  0 siblings, 0 replies; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-15 10:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao


On 15/12/2020 16:58, Peter Zijlstra wrote:
> On Mon, Dec 14, 2020 at 11:59:52AM +0800, Tony W Wang-oc wrote:
> 
> Didn't I mention something about a comment?
> 
Really sorry for this.

>>  static const struct x86_cpu_id crc32c_cpu_id[] = {
>> +	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
>> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1),
>> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1),
>> +	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
>> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1),
>> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1),
>>  	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
>>  	{}
> 
> Also, the above is weird in that is has the negative entries marked
> positive, and 1/NULL are inconsistent.
> 
> Something like so then?
> That's better!

> ---
> 
> diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
> index feccb5254c7e..f6e6669a5102 100644
> --- a/arch/x86/crypto/crc32c-intel_glue.c
> +++ b/arch/x86/crypto/crc32c-intel_glue.c
> @@ -215,14 +215,31 @@ static struct shash_alg alg = {
>  };
>  
>  static const struct x86_cpu_id crc32c_cpu_id[] = {
> -	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
> +	/*
> +	 * Negative entries; exclude these chips from using this driver.
> +	 * They match the positive rule below, but their CRC32 instruction
> +	 * implementation is so slow, it doesn't merrit use.
Will fix the typo merrit -> merit and resend the patch.

Sincerely
Tony

> +	 */
> +	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
> +	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
> +	/*
> +	 * Positive entry; SSE-4.2 instructions include special purpose CRC32
> +	 * instructions.
> +	 */
> +	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
>  	{}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
>  
>  static int __init crc32c_intel_mod_init(void)
>  {
> -	if (!x86_match_cpu(crc32c_cpu_id))
> +	const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
> +
> +	if (!m || !m->driver_data)
>  		return -ENODEV;
>  #ifdef CONFIG_X86_64
>  	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
> .
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
  2020-12-14  3:59 Tony W Wang-oc
@ 2020-12-15  8:58 ` Peter Zijlstra
  2020-12-15 10:02   ` Tony W Wang-oc
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2020-12-15  8:58 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto,
	linux-kernel, TimGuo-oc, CooperYan, QiyuanWang, HerryYang,
	CobeChen, SilviaZhao

On Mon, Dec 14, 2020 at 11:59:52AM +0800, Tony W Wang-oc wrote:

Didn't I mention something about a comment?

>  static const struct x86_cpu_id crc32c_cpu_id[] = {
> +	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1),
> +	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1),
> +	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1),
>  	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
>  	{}

Also, the above is weird in that is has the negative entries marked
positive, and 1/NULL are inconsistent.

Something like so then?

---

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..f6e6669a5102 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,14 +215,31 @@ static struct shash_alg alg = {
 };
 
 static const struct x86_cpu_id crc32c_cpu_id[] = {
-	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
+	/*
+	 * Negative entries; exclude these chips from using this driver.
+	 * They match the positive rule below, but their CRC32 instruction
+	 * implementation is so slow, it doesn't merrit use.
+	 */
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
+	/*
+	 * Positive entry; SSE-4.2 instructions include special purpose CRC32
+	 * instructions.
+	 */
+	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
-	if (!x86_match_cpu(crc32c_cpu_id))
+	const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+	if (!m || !m->driver_data)
 		return -ENODEV;
 #ifdef CONFIG_X86_64
 	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
@ 2020-12-14  3:59 Tony W Wang-oc
  2020-12-15  8:58 ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Tony W Wang-oc @ 2020-12-14  3:59 UTC (permalink / raw)
  To: herbert, davem, tglx, mingo, bp, x86, hpa, linux-crypto, linux-kernel
  Cc: TimGuo-oc, CooperYan, QiyuanWang, HerryYang, CobeChen, SilviaZhao

The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, when
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic.

When doing lmbench3 Create and Delete file test on partitions with
ext4 enabling metadata checksum, found using crc32c-generic driver
could get about 20% performance gain than using the driver crc32c-intel
on some Zhaoxin CPUs.

This case expect to use crc32c-generic driver for these Zhaoxin CPUs
to get performance gain, so remove these Zhaoxin CPUs support from
crc32c-intel.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/crypto/crc32c-intel_glue.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..2cbbdde 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,6 +215,12 @@ static struct shash_alg alg = {
 };
 
 static const struct x86_cpu_id crc32c_cpu_id[] = {
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1),
+	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1),
+	X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1),
 	X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
 	{}
 };
@@ -222,7 +228,9 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
-	if (!x86_match_cpu(crc32c_cpu_id))
+	const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+	if (!m || m->driver_data)
 		return -ENODEV;
 #ifdef CONFIG_X86_64
 	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-01-07  6:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-11 11:29 [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs Tony W Wang-oc
2020-12-11 13:00 ` Peter Zijlstra
2020-12-12  4:32   ` Tony W Wang-oc
2020-12-11 17:43 ` Eric Biggers
2020-12-12  9:36   ` Ard Biesheuvel
2020-12-12 10:54     ` Ard Biesheuvel
2020-12-14  2:29       ` Tony W Wang-oc
2020-12-14  2:28   ` Tony W Wang-oc
2020-12-14 20:41     ` Eric Biggers
2020-12-15  2:15       ` Tony W Wang-oc
2020-12-15 17:56         ` Eric Biggers
2020-12-21  2:46           ` tonywwang-oc
2020-12-21 19:27             ` hpa
2020-12-22  3:01               ` tonywwang-oc
2020-12-22  4:54                 ` hpa
2021-01-07  6:23                   ` Tony W Wang-oc
2020-12-14  3:59 Tony W Wang-oc
2020-12-15  8:58 ` Peter Zijlstra
2020-12-15 10:02   ` Tony W Wang-oc
2020-12-15 10:28 Tony W Wang-oc
2021-01-02 21:12 ` Herbert Xu
2021-01-07  6:27   ` Tony W Wang-oc

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).