linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Add Zhaoxin hardware engine driver support for SHA
@ 2024-01-23  2:28 Tony W Wang-oc
  2024-01-23  2:28 ` [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Tony W Wang-oc @ 2024-01-23  2:28 UTC (permalink / raw)
  To: herbert, davem, linux-crypto, linux-kernel, tglx, mingo, bp,
	dave.hansen, x86, hpa, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, TonyWWang-oc, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU
instructions, including SHA1, SHA256, SHA384 and SHA512, which conform
to the Secure Hash Algorithms specified by FIPS 180-3.

With the help of implementation of SHA in hardware instead of software,
can develop applications with higher performance, more security and more
flexibility.

Below table gives a summary of test using the driver tcrypt with different
crypt algorithm drivers on Zhaoxin KH-40000 platform:
---------------------------------------------------------------------------
tcrypt     driver   16*    64      256     1024    2048    4096    8192
---------------------------------------------------------------------------
           zhaoxin** 442.80 1309.21 3257.53 5221.56 5813.45 6136.39 6264.50***
403:SHA1   generic** 341.44 813.27  1458.98 1818.03 1896.60 1940.71 1939.06
           ratio    1.30   1.61    2.23    2.87    3.07    3.16    3.23
---------------------------------------------------------------------------
           zhaoxin  451.70 1313.65 2958.71 4658.55 5109.16 5359.08 5459.13
404:SHA256 generic  202.62 463.55  845.01  1070.50 1117.51 1144.79 1155.68
           ratio    2.23   2.83    3.50    4.35    4.57    4.68    4.72
---------------------------------------------------------------------------
           zhaoxin  350.90 1406.42 3166.16 5736.39 6627.77 7182.01 7429.18
405:SHA384 generic  161.76 654.88  979.06  1350.56 1423.08 1496.57 1513.12
           ratio    2.17   2.15    3.23    4.25    4.66    4.80    4.91
---------------------------------------------------------------------------
           zhaoxin  334.49 1394.71 3159.93 5728.86 6625.33 7169.23 7407.80
406:SHA512 generic  161.80 653.84  979.42  1351.41 1444.14 1495.35 1518.43
           ratio    2.07   2.13    3.23    4.24    4.59    4.79    4.88
---------------------------------------------------------------------------
*: The length of each data block to be processed by one complete SHA
   sequence, namely one INIT, multi UPDATEs and one FINAL.
**: Crypt algorithm driver used by tcrypt, "zhaoxin" represents zhaoxin-sha
   while "generic" represents the generic software SHA driver.
***: The speed of each crypt algorithm driver processing different length
   of data blocks, unit is Mb/s.

The ratio in the table implies the performance of SHA implemented by
zhaoxin-sha driver is much higher than the ones implemented by the generic
software driver of sha1/sha256/sha384/sha512.

In order to support Zhaoxin-sha driver, make padlock-sha driver matches
the CENTAUR CPUs with Family == 6 and add two Zhaoxin Hash Engine
cpufeatures.

---
v2:
- Make Zhaoxin SHA depends on X86 && !UML
- Update MAINTAINERS for Zhaoxin SHA

Tony W Wang-oc (3):
  crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
  crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512

 MAINTAINERS                              |   6 +
 arch/x86/include/asm/cpufeatures.h       |   4 +-
 drivers/crypto/Kconfig                   |  16 +
 drivers/crypto/Makefile                  |   1 +
 drivers/crypto/padlock-sha.c             |   2 +-
 drivers/crypto/zhaoxin-sha.c             | 500 +++++++++++++++++++++++
 drivers/crypto/zhaoxin-sha.h             |  17 +
 tools/arch/x86/include/asm/cpufeatures.h |   4 +-
 8 files changed, 547 insertions(+), 3 deletions(-)
 create mode 100644 drivers/crypto/zhaoxin-sha.c
 create mode 100644 drivers/crypto/zhaoxin-sha.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  2024-01-23  2:28 [PATCH v2 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
@ 2024-01-23  2:28 ` Tony W Wang-oc
  2024-01-23 16:33   ` Dave Hansen
  2024-01-23  2:28 ` [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
  2024-01-23  2:28 ` [PATCH v2 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
  2 siblings, 1 reply; 13+ messages in thread
From: Tony W Wang-oc @ 2024-01-23  2:28 UTC (permalink / raw)
  To: herbert, davem, linux-crypto, linux-kernel, tglx, mingo, bp,
	dave.hansen, x86, hpa, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, TonyWWang-oc, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

Updates the supporting qualification for packlock-sha driver, making
it support CPUs whose vendor ID is Centaur and Famliy is 6.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 drivers/crypto/padlock-sha.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/padlock-sha.c b/drivers/crypto/padlock-sha.c
index 6865c7f1fc1a..2e82c5e77f7a 100644
--- a/drivers/crypto/padlock-sha.c
+++ b/drivers/crypto/padlock-sha.c
@@ -491,7 +491,7 @@ static struct shash_alg sha256_alg_nano = {
 };
 
 static const struct x86_cpu_id padlock_sha_ids[] = {
-	X86_MATCH_FEATURE(X86_FEATURE_PHE, NULL),
+	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 6, X86_FEATURE_PHE, NULL),
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, padlock_sha_ids);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
  2024-01-23  2:28 [PATCH v2 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
  2024-01-23  2:28 ` [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
@ 2024-01-23  2:28 ` Tony W Wang-oc
  2024-01-23  9:44   ` Borislav Petkov
  2024-01-23  2:28 ` [PATCH v2 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
  2 siblings, 1 reply; 13+ messages in thread
From: Tony W Wang-oc @ 2024-01-23  2:28 UTC (permalink / raw)
  To: herbert, davem, linux-crypto, linux-kernel, tglx, mingo, bp,
	dave.hansen, x86, hpa, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, TonyWWang-oc, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its
instrucions.
Add two CPU feature flags indicated by CPUID.(EAX=C0000001,ECX=0):EDX
bit 25/26 which will be used by Zhaoxin SHA driver.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/include/asm/cpufeatures.h       | 4 +++-
 tools/arch/x86/include/asm/cpufeatures.h | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 29cb275a219d..28b0e62dbdf5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -145,7 +145,7 @@
 #define X86_FEATURE_RDRAND		( 4*32+30) /* RDRAND instruction */
 #define X86_FEATURE_HYPERVISOR		( 4*32+31) /* Running on a hypervisor */
 
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
+/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */
 #define X86_FEATURE_XSTORE		( 5*32+ 2) /* "rng" RNG present (xstore) */
 #define X86_FEATURE_XSTORE_EN		( 5*32+ 3) /* "rng_en" RNG enabled */
 #define X86_FEATURE_XCRYPT		( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
@@ -156,6 +156,8 @@
 #define X86_FEATURE_PHE_EN		( 5*32+11) /* PHE enabled */
 #define X86_FEATURE_PMM			( 5*32+12) /* PadLock Montgomery Multiplier */
 #define X86_FEATURE_PMM_EN		( 5*32+13) /* PMM enabled */
+#define X86_FEATURE_PHE2		( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
+#define X86_FEATURE_PHE2_EN		( 5*32+26) /* "phe2_en" PHE2 enabled */
 
 /* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */
 #define X86_FEATURE_LAHF_LM		( 6*32+ 0) /* LAHF/SAHF in long mode */
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index f4542d2718f4..21caba9d070b 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -145,7 +145,7 @@
 #define X86_FEATURE_RDRAND		( 4*32+30) /* RDRAND instruction */
 #define X86_FEATURE_HYPERVISOR		( 4*32+31) /* Running on a hypervisor */
 
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
+/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */
 #define X86_FEATURE_XSTORE		( 5*32+ 2) /* "rng" RNG present (xstore) */
 #define X86_FEATURE_XSTORE_EN		( 5*32+ 3) /* "rng_en" RNG enabled */
 #define X86_FEATURE_XCRYPT		( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
@@ -156,6 +156,8 @@
 #define X86_FEATURE_PHE_EN		( 5*32+11) /* PHE enabled */
 #define X86_FEATURE_PMM			( 5*32+12) /* PadLock Montgomery Multiplier */
 #define X86_FEATURE_PMM_EN		( 5*32+13) /* PMM enabled */
+#define X86_FEATURE_PHE2		( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
+#define X86_FEATURE_PHE2_EN		( 5*32+26) /* "phe2_en" PHE2 enabled */
 
 /* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */
 #define X86_FEATURE_LAHF_LM		( 6*32+ 0) /* LAHF/SAHF in long mode */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
  2024-01-23  2:28 [PATCH v2 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
  2024-01-23  2:28 ` [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
  2024-01-23  2:28 ` [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
@ 2024-01-23  2:28 ` Tony W Wang-oc
  2 siblings, 0 replies; 13+ messages in thread
From: Tony W Wang-oc @ 2024-01-23  2:28 UTC (permalink / raw)
  To: herbert, davem, linux-crypto, linux-kernel, tglx, mingo, bp,
	dave.hansen, x86, hpa, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, TonyWWang-oc, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU
instructions, including SHA1, SHA256, SHA384 and SHA512, which conform
to the Secure Hash Algorithms specified by FIPS 180-3.

With the help of implementation of SHA in hardware instead of software,
can develop applications with higher performance, more security and more
flexibility.

Below table gives a summary of test using the driver tcrypt with different
crypt algorithm drivers on Zhaoxin KH-40000 platform:
---------------------------------------------------------------------------
tcrypt     driver   16*    64      256     1024    2048    4096    8192
---------------------------------------------------------------------------
           zhaoxin** 442.80 1309.21 3257.53 5221.56 5813.45 6136.39 6264.50***
403:SHA1   generic** 341.44 813.27  1458.98 1818.03 1896.60 1940.71 1939.06
           ratio    1.30   1.61    2.23    2.87    3.07    3.16    3.23
---------------------------------------------------------------------------
           zhaoxin  451.70 1313.65 2958.71 4658.55 5109.16 5359.08 5459.13
404:SHA256 generic  202.62 463.55  845.01  1070.50 1117.51 1144.79 1155.68
           ratio    2.23   2.83    3.50    4.35    4.57    4.68    4.72
---------------------------------------------------------------------------
           zhaoxin  350.90 1406.42 3166.16 5736.39 6627.77 7182.01 7429.18
405:SHA384 generic  161.76 654.88  979.06  1350.56 1423.08 1496.57 1513.12
           ratio    2.17   2.15    3.23    4.25    4.66    4.80    4.91
---------------------------------------------------------------------------
           zhaoxin  334.49 1394.71 3159.93 5728.86 6625.33 7169.23 7407.80
406:SHA512 generic  161.80 653.84  979.42  1351.41 1444.14 1495.35 1518.43
           ratio    2.07   2.13    3.23    4.24    4.59    4.79    4.88
---------------------------------------------------------------------------
*: The length of each data block to be processed by one complete SHA
   sequence, namely one INIT, multi UPDATEs and one FINAL.
**: Crypt algorithm driver used by tcrypt, "zhaoxin" represents zhaoxin-sha
   while "generic" represents the generic software SHA driver.
***: The speed of each crypt algorithm driver processing different length
   of data blocks, unit is Mb/s.

The ratio in the table implies the performance of SHA implemented by
zhaoxin-sha driver is much higher than the ones implemented by the generic
software driver of sha1/sha256/sha384/sha512.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 MAINTAINERS                  |   6 +
 drivers/crypto/Kconfig       |  16 ++
 drivers/crypto/Makefile      |   1 +
 drivers/crypto/zhaoxin-sha.c | 500 +++++++++++++++++++++++++++++++++++
 drivers/crypto/zhaoxin-sha.h |  17 ++
 5 files changed, 540 insertions(+)
 create mode 100644 drivers/crypto/zhaoxin-sha.c
 create mode 100644 drivers/crypto/zhaoxin-sha.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ddc5e1049921..7d2bb64ea196 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24329,6 +24329,12 @@ L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	arch/x86/kernel/cpu/zhaoxin.c
 
+ZHAOXIN SHA SUPPORT
+M:	<TonyWWang-oc@zhaoxin.com>
+M:	<GeorgeXue@zhaoxin.com>
+S:	Maintained
+F:	drivers/crypto/zhaoxin-sha.c
+
 ZONEFS FILESYSTEM
 M:	Damien Le Moal <dlemoal@kernel.org>
 M:	Naohiro Aota <naohiro.aota@wdc.com>
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 0991f026cb07..97716b90e180 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -799,4 +799,20 @@ config CRYPTO_DEV_SA2UL
 source "drivers/crypto/aspeed/Kconfig"
 source "drivers/crypto/starfive/Kconfig"
 
+config CRYPTO_DEV_ZHAOXIN_SHA
+	tristate "Support for Zhaoxin SHA1/SHA256/SHA384/SHA512 algorithms"
+	depends on X86 && !UML
+	select CRYPTO_HASH
+	select CRYPTO_SHA1
+	select CRYPTO_SHA256
+	select CRYPTO_SHA384
+	select CRYPTO_SHA512
+	help
+	  Use Zhaoxin HW engine for SHA1/SHA256/SHA384/SHA512 algorithms.
+
+	  Available in ZX-C+ and newer processors.
+
+	  If unsure say M. The compiled module will be
+	  called zhaoxin-sha.
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index d859d6a5f3a4..b77c02d6dab7 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -51,3 +51,4 @@ obj-y += hisilicon/
 obj-$(CONFIG_CRYPTO_DEV_AMLOGIC_GXL) += amlogic/
 obj-y += intel/
 obj-y += starfive/
+obj-$(CONFIG_CRYPTO_DEV_ZHAOXIN_SHA) += zhaoxin-sha.o
diff --git a/drivers/crypto/zhaoxin-sha.c b/drivers/crypto/zhaoxin-sha.c
new file mode 100644
index 000000000000..17242239edf2
--- /dev/null
+++ b/drivers/crypto/zhaoxin-sha.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Cryptographic API.
+ *
+ * Support for Zhaoxin hardware crypto engine.
+ *
+ * Copyright (c) 2023  George Xue <georgexue@zhaoxin.com>
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha1.h>
+#include <crypto/sha2.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/scatterlist.h>
+#include <asm/cpu_device_id.h>
+#include "zhaoxin-sha.h"
+
+static inline void zhaoxin_output_block(uint32_t *src, uint32_t *dst, size_t count)
+{
+	while (count--)
+		*dst++ = swab32(*src++);
+}
+
+static int zhaoxin_sha1_init(struct shash_desc *desc)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha1_update(struct shash_desc *desc, const u8 *data, unsigned int len)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial, done;
+	const u8 *src;
+	u8 buf[SHA1_BLOCK_SIZE * 2];
+	u8 *dst = &buf[0];
+
+	partial = sctx->count & (SHA1_BLOCK_SIZE - 1);
+	sctx->count += len;
+	done = 0;
+	src = data;
+	memcpy(dst, sctx->state, SHA1_DIGEST_SIZE);
+
+	if ((partial + len) >= SHA1_BLOCK_SIZE) {
+
+		/* Append the bytes in state's buffer to a block to handle */
+		if (partial) {
+			done = -partial;
+			memcpy(sctx->buffer + partial, data, done + SHA1_BLOCK_SIZE);
+			src = sctx->buffer;
+
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xc8"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L), "c"(1UL));
+
+			done += SHA1_BLOCK_SIZE;
+			src = data + done;
+		}
+
+		/* Process the left bytes from the input data */
+		if (len - done >= SHA1_BLOCK_SIZE) {
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xc8"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L),
+			"c"((unsigned long)((len - done) / SHA1_BLOCK_SIZE)));
+
+			done += ((len - done) - (len - done) % SHA1_BLOCK_SIZE);
+			src = data + done;
+		}
+		partial = 0;
+	}
+	memcpy(sctx->state, dst, SHA1_DIGEST_SIZE);
+	memcpy(sctx->buffer + partial, src, len - done);
+
+	return 0;
+}
+
+static int zhaoxin_sha1_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha1_state *state = shash_desc_ctx(desc);
+	unsigned int partial, padlen;
+	__be64 bits;
+	static const u8 padding[SHA1_BLOCK_SIZE] = {SHA_PADDING_BYTE, };
+	const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+
+	bits = cpu_to_be64(state->count << 3);
+
+	/* Padding */
+	partial = state->count & (SHA1_BLOCK_SIZE - 1);
+	padlen = (partial < bit_offset) ? (bit_offset - partial) :
+		((SHA1_BLOCK_SIZE + bit_offset) - partial);
+	zhaoxin_sha1_update(desc, padding, padlen);
+
+	/* Append length field bytes */
+	zhaoxin_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+
+	/* Swap to output */
+	zhaoxin_output_block(state->state, (uint32_t *)out, SHA1_DIGEST_SIZE/sizeof(uint32_t));
+
+	return 0;
+}
+
+static int zhaoxin_sha256_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha256_state){
+		.state = { SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+				SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7},
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha256_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial, done;
+	const u8 *src;
+	u8 buf[SHA256_BLOCK_SIZE*2];
+	u8 *dst = &buf[0];
+
+	partial = sctx->count & (SHA256_BLOCK_SIZE - 1);
+	sctx->count += len;
+	done = 0;
+	src = data;
+	memcpy(dst, sctx->state, SHA256_DIGEST_SIZE);
+
+	if ((partial + len) >= SHA256_BLOCK_SIZE) {
+
+		/* Append the bytes in state's buffer to a block to handle */
+		if (partial) {
+			done = -partial;
+			memcpy(sctx->buf + partial, data, done + SHA256_BLOCK_SIZE);
+			src = sctx->buf;
+
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xd0"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L), "c"(1UL));
+
+			done += SHA256_BLOCK_SIZE;
+			src = data + done;
+		}
+
+		/* Process the left bytes from input data*/
+		if (len - done >= SHA256_BLOCK_SIZE) {
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xd0"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L),
+			"c"((unsigned long)((len - done) / SHA256_BLOCK_SIZE)));
+
+			done += ((len - done) - (len - done) % SHA256_BLOCK_SIZE);
+			src = data + done;
+		}
+		partial = 0;
+	}
+	memcpy(sctx->state, dst, SHA256_DIGEST_SIZE);
+	memcpy(sctx->buf + partial, src, len - done);
+
+	return 0;
+}
+
+static int zhaoxin_sha256_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *state = shash_desc_ctx(desc);
+	unsigned int partial, padlen;
+	__be64 bits;
+	static const u8 padding[SHA256_BLOCK_SIZE] = {SHA_PADDING_BYTE, };
+	const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+
+	bits = cpu_to_be64(state->count << 3);
+
+	/* Padding */
+	partial = state->count & (SHA256_BLOCK_SIZE - 1);
+	padlen = (partial < bit_offset) ? (bit_offset - partial) :
+		((SHA256_BLOCK_SIZE + bit_offset) - partial);
+	zhaoxin_sha256_update(desc, padding, padlen);
+
+	/* Append length field bytes */
+	zhaoxin_sha256_update(desc, (const u8 *)&bits, sizeof(bits));
+
+	/* Swap to output */
+	zhaoxin_output_block(state->state, (uint32_t *)out, SHA256_DIGEST_SIZE/sizeof(uint32_t));
+
+	return 0;
+}
+
+static inline void zhaoxin_output_block_512(uint64_t *src,
+			uint64_t *dst, size_t count)
+{
+	while (count--)
+		*dst++ = swab64(*src++);
+}
+
+static int zhaoxin_sha384_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha512_state){
+		.state = { SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
+				SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7},
+		.count = {0, 0},
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha512_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha512_state){
+		.state = { SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3,
+				SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7},
+		.count = {0, 0},
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha512_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial, done;
+	const u8 *src;
+	u8 buf[SHA512_BLOCK_SIZE];
+	u8 *dst = &buf[0];
+
+	partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+	sctx->count[0] += len;
+	if (sctx->count[0] < len)
+		sctx->count[1]++;
+
+	done = 0;
+	src = data;
+	memcpy(dst, sctx->state, SHA512_DIGEST_SIZE);
+
+	if ((partial + len) >= SHA512_BLOCK_SIZE) {
+		/* Append the bytes in state's buffer to a block to handle */
+		if (partial) {
+
+			done = -partial;
+			memcpy(sctx->buf + partial, data, done + SHA512_BLOCK_SIZE);
+
+			src = sctx->buf;
+
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xe0"
+			: "+S"(src), "+D"(dst)
+			: "c"(1UL));
+
+			done += SHA512_BLOCK_SIZE;
+			src = data + done;
+		}
+
+		/* Process the left bytes from input data*/
+		if (len - done >= SHA512_BLOCK_SIZE) {
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xe0"
+			: "+S"(src), "+D"(dst)
+			: "c"((unsigned long)((len - done) / SHA512_BLOCK_SIZE)));
+
+			done += ((len - done) - (len - done) % SHA512_BLOCK_SIZE);
+			src = data + done;
+		}
+		partial = 0;
+	}
+
+	memcpy(sctx->state, dst, SHA512_DIGEST_SIZE);
+	memcpy(sctx->buf + partial, src, len - done);
+
+	return 0;
+}
+
+static int zhaoxin_sha512_final(struct shash_desc *desc, u8 *out)
+{
+	const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+	struct sha512_state *state = shash_desc_ctx(desc);
+	unsigned int partial = state->count[0] % SHA512_BLOCK_SIZE, padlen;
+	__be64 bits2[2];
+
+	// Both SHA384 and SHA512 may be supported.
+	int dgst_size = crypto_shash_digestsize(desc->tfm);
+
+	static u8 padding[SHA512_BLOCK_SIZE];
+
+	memset(padding, 0, SHA512_BLOCK_SIZE);
+	padding[0] = SHA_PADDING_BYTE;
+
+	// Convert byte count in little endian to bit count in big endian.
+	bits2[0] = cpu_to_be64(state->count[1] << 3 | state->count[0] >> 61);
+	bits2[1] = cpu_to_be64(state->count[0] << 3);
+
+	padlen = (partial < bit_offset) ? (bit_offset - partial) :
+		((SHA512_BLOCK_SIZE + bit_offset) - partial);
+
+	zhaoxin_sha512_update(desc, padding, padlen);
+
+	/* Append length field bytes */
+	zhaoxin_sha512_update(desc, (const u8 *)bits2, sizeof(__be64[2]));
+
+	/* Swap to output */
+	zhaoxin_output_block_512(state->state, (uint64_t *)out, dgst_size/sizeof(uint64_t));
+
+	return 0;
+}
+
+static int zhaoxin_sha_export(struct shash_desc *desc,
+				void *out)
+{
+	int statesize = crypto_shash_statesize(desc->tfm);
+	void *sctx = shash_desc_ctx(desc);
+
+	memcpy(out, sctx, statesize);
+	return 0;
+}
+
+static int zhaoxin_sha_import(struct shash_desc *desc,
+				const void *in)
+{
+	int statesize = crypto_shash_statesize(desc->tfm);
+	void *sctx = shash_desc_ctx(desc);
+
+	memcpy(sctx, in, statesize);
+	return 0;
+}
+
+static struct shash_alg sha1_alg = {
+	.digestsize	=	SHA1_DIGEST_SIZE,
+	.init		=	zhaoxin_sha1_init,
+	.update		=	zhaoxin_sha1_update,
+	.final		=	zhaoxin_sha1_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha1_state),
+	.statesize	=	sizeof(struct sha1_state),
+	.base		=	{
+		.cra_name		=	"sha1",
+		.cra_driver_name	=	"sha1-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA1_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static struct shash_alg sha256_alg = {
+	.digestsize	=	SHA256_DIGEST_SIZE,
+	.init		=	zhaoxin_sha256_init,
+	.update		=	zhaoxin_sha256_update,
+	.final		=	zhaoxin_sha256_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha256_state),
+	.statesize	=	sizeof(struct sha256_state),
+	.base		=	{
+		.cra_name		=	"sha256",
+		.cra_driver_name	=	"sha256-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA256_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static struct shash_alg sha384_alg = {
+	.digestsize	=	SHA384_DIGEST_SIZE,
+	.init		=	zhaoxin_sha384_init,
+	.update		=	zhaoxin_sha512_update,
+	.final		=	zhaoxin_sha512_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha512_state),
+	.statesize	=	sizeof(struct sha512_state),
+	.base		=	{
+		.cra_name		=	"sha384",
+		.cra_driver_name	=	"sha384-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA384_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static struct shash_alg sha512_alg = {
+	.digestsize	=	SHA512_DIGEST_SIZE,
+	.init		=	zhaoxin_sha512_init,
+	.update		=	zhaoxin_sha512_update,
+	.final		=	zhaoxin_sha512_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha512_state),
+	.statesize	=	sizeof(struct sha512_state),
+	.base		=	{
+		.cra_name		=	"sha512",
+		.cra_driver_name	=	"sha512-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA512_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+
+static const struct x86_cpu_id zhaoxin_sha_ids[] = {
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 6, X86_FEATURE_PHE, NULL),
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 7, X86_FEATURE_PHE, NULL),
+	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 7, X86_FEATURE_PHE, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, zhaoxin_sha_ids);
+
+static int __init zhaoxin_sha_init(void)
+{
+	int rc = -ENODEV;
+
+	struct shash_alg *sha1;
+	struct shash_alg *sha256;
+	struct shash_alg *sha384;
+	struct shash_alg *sha512;
+
+	if (!x86_match_cpu(zhaoxin_sha_ids) || !boot_cpu_has(X86_FEATURE_PHE_EN))
+		return -ENODEV;
+
+	sha1 = &sha1_alg;
+	sha256 = &sha256_alg;
+
+	rc = crypto_register_shash(sha1);
+	if (rc)
+		goto out;
+
+	rc = crypto_register_shash(sha256);
+	if (rc)
+		goto out_unreg1;
+
+	if (boot_cpu_has(X86_FEATURE_PHE2_EN)) {
+
+		sha384 = &sha384_alg;
+		sha512 = &sha512_alg;
+
+		rc = crypto_register_shash(sha384);
+		if (rc)
+			goto out_unreg2;
+
+		rc = crypto_register_shash(sha512);
+		if (rc)
+			goto out_unreg3;
+
+		pr_notice("Using Zhaoxin Hardware Engine for SHA1/SHA256/SHA384/SHA512 algorithms.\n");
+	} else
+		pr_notice("Using Zhaoxin Hardware Engine for SHA1/SHA256 algorithms.\n");
+
+
+	return 0;
+
+out_unreg3:
+	if (boot_cpu_has(X86_FEATURE_PHE2_EN))
+		crypto_unregister_shash(sha384);
+
+out_unreg2:
+	crypto_unregister_shash(sha256);
+out_unreg1:
+	crypto_unregister_shash(sha1);
+
+out:
+	pr_err("Zhaoxin Hardware Engine for SHA1/SHA256/SHA384/SHA512 initialization failed.\n");
+	return rc;
+}
+
+static void __exit zhaoxin_sha_fini(void)
+{
+	crypto_unregister_shash(&sha1_alg);
+	crypto_unregister_shash(&sha256_alg);
+
+	if (boot_cpu_has(X86_FEATURE_PHE2_EN)) {
+		crypto_unregister_shash(&sha384_alg);
+		crypto_unregister_shash(&sha512_alg);
+	}
+
+}
+
+module_init(zhaoxin_sha_init);
+module_exit(zhaoxin_sha_fini);
+
+MODULE_DESCRIPTION("Zhaoxin Hardware SHA1/SHA256/SHA384/SHA512 algorithms support.");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("George Xue");
+
+MODULE_ALIAS_CRYPTO("sha1-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha256-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha384-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha512-zhaoxin");
+
diff --git a/drivers/crypto/zhaoxin-sha.h b/drivers/crypto/zhaoxin-sha.h
new file mode 100644
index 000000000000..699659018d19
--- /dev/null
+++ b/drivers/crypto/zhaoxin-sha.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Zhaoxin Sha
+ *
+ * Copyright (c) 2023 George Xue<georgexue@zhaoxin.com>
+ */
+
+#ifndef _ZHAOXIN_SHA_H
+#define _ZHAOXIN_SHA_H
+
+#define ZHAOXIN_SHA_CRA_PRIORITY	300
+#define ZHAOXIN_SHA_COMPOSITE_PRIORITY 400
+
+#define SHA_PADDING_BYTE    0x80
+
+#endif	/* _ZHAOXIN_SHA_H */
+
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
  2024-01-23  2:28 ` [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
@ 2024-01-23  9:44   ` Borislav Petkov
  2024-01-23 15:42     ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2024-01-23  9:44 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: herbert, davem, linux-crypto, linux-kernel, tglx, mingo,
	dave.hansen, x86, hpa, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, acme, aik, namhyung, CobeChen,
	TimGuo, LeoLiu-oc, GeorgeXue

On Tue, Jan 23, 2024 at 10:28:51AM +0800, Tony W Wang-oc wrote:
> Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its
> instrucions.
> Add two CPU feature flags indicated by CPUID.(EAX=C0000001,ECX=0):EDX
> bit 25/26 which will be used by Zhaoxin SHA driver.
> 
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
>  arch/x86/include/asm/cpufeatures.h       | 4 +++-
>  tools/arch/x86/include/asm/cpufeatures.h | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 29cb275a219d..28b0e62dbdf5 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -145,7 +145,7 @@
>  #define X86_FEATURE_RDRAND		( 4*32+30) /* RDRAND instruction */
>  #define X86_FEATURE_HYPERVISOR		( 4*32+31) /* Running on a hypervisor */
>  
> -/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
> +/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */

Does that mean that all those companies agree on the contents of this
CPUID leaf?

>  #define X86_FEATURE_XSTORE		( 5*32+ 2) /* "rng" RNG present (xstore) */
>  #define X86_FEATURE_XSTORE_EN		( 5*32+ 3) /* "rng_en" RNG enabled */
>  #define X86_FEATURE_XCRYPT		( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
> @@ -156,6 +156,8 @@
>  #define X86_FEATURE_PHE_EN		( 5*32+11) /* PHE enabled */
>  #define X86_FEATURE_PMM			( 5*32+12) /* PadLock Montgomery Multiplier */
>  #define X86_FEATURE_PMM_EN		( 5*32+13) /* PMM enabled */
> +#define X86_FEATURE_PHE2		( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
> +#define X86_FEATURE_PHE2_EN		( 5*32+26) /* "phe2_en" PHE2 enabled */
						      ^^^^^^^^^

From: Documentation/arch/x86/cpuinfo.rst

"a: Feature flags can be derived from the contents of CPUID leaves.
------------------------------------------------------------------
These feature definitions are organized mirroring the layout of CPUID
leaves and grouped in words with offsets as mapped in enum cpuid_leafs
in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
If a feature is defined with a X86_FEATURE_<name> definition in
cpufeatures.h, and if it is detected at run time, the flags will be
displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
comes from X86_FEATURE_AVX2 in cpufeatures.h."

Is your grep broken?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
  2024-01-23  9:44   ` Borislav Petkov
@ 2024-01-23 15:42     ` H. Peter Anvin
  2024-01-23 16:00       ` Borislav Petkov
  2024-01-31  8:59       ` Tony W Wang-oc
  0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2024-01-23 15:42 UTC (permalink / raw)
  To: Borislav Petkov, Tony W Wang-oc
  Cc: herbert, davem, linux-crypto, linux-kernel, tglx, mingo,
	dave.hansen, x86, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, acme, aik, namhyung, CobeChen,
	TimGuo, LeoLiu-oc, GeorgeXue

On January 23, 2024 1:44:27 AM PST, Borislav Petkov <bp@alien8.de> wrote:
>On Tue, Jan 23, 2024 at 10:28:51AM +0800, Tony W Wang-oc wrote:
>> Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its
>> instrucions.
>> Add two CPU feature flags indicated by CPUID.(EAX=C0000001,ECX=0):EDX
>> bit 25/26 which will be used by Zhaoxin SHA driver.
>> 
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> ---
>>  arch/x86/include/asm/cpufeatures.h       | 4 +++-
>>  tools/arch/x86/include/asm/cpufeatures.h | 4 +++-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 29cb275a219d..28b0e62dbdf5 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -145,7 +145,7 @@
>>  #define X86_FEATURE_RDRAND		( 4*32+30) /* RDRAND instruction */
>>  #define X86_FEATURE_HYPERVISOR		( 4*32+31) /* Running on a hypervisor */
>>  
>> -/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
>> +/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */
>
>Does that mean that all those companies agree on the contents of this
>CPUID leaf?
>
>>  #define X86_FEATURE_XSTORE		( 5*32+ 2) /* "rng" RNG present (xstore) */
>>  #define X86_FEATURE_XSTORE_EN		( 5*32+ 3) /* "rng_en" RNG enabled */
>>  #define X86_FEATURE_XCRYPT		( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
>> @@ -156,6 +156,8 @@
>>  #define X86_FEATURE_PHE_EN		( 5*32+11) /* PHE enabled */
>>  #define X86_FEATURE_PMM			( 5*32+12) /* PadLock Montgomery Multiplier */
>>  #define X86_FEATURE_PMM_EN		( 5*32+13) /* PMM enabled */
>> +#define X86_FEATURE_PHE2		( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
>> +#define X86_FEATURE_PHE2_EN		( 5*32+26) /* "phe2_en" PHE2 enabled */
>						      ^^^^^^^^^
>
>From: Documentation/arch/x86/cpuinfo.rst
>
>"a: Feature flags can be derived from the contents of CPUID leaves.
>------------------------------------------------------------------
>These feature definitions are organized mirroring the layout of CPUID
>leaves and grouped in words with offsets as mapped in enum cpuid_leafs
>in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
>If a feature is defined with a X86_FEATURE_<name> definition in
>cpufeatures.h, and if it is detected at run time, the flags will be
>displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
>comes from X86_FEATURE_AVX2 in cpufeatures.h."
>
>Is your grep broken?
>

Well, Centaur bought Cyrix, and then VIA bought Centaur. I think Zhaoxin is a joint venture between VIA and the City of Shanghai, or something like that?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
  2024-01-23 15:42     ` H. Peter Anvin
@ 2024-01-23 16:00       ` Borislav Petkov
  2024-01-31  8:59       ` Tony W Wang-oc
  1 sibling, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2024-01-23 16:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tony W Wang-oc, herbert, davem, linux-crypto, linux-kernel, tglx,
	mingo, dave.hansen, x86, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, acme, aik, namhyung, CobeChen,
	TimGuo, LeoLiu-oc, GeorgeXue

On Tue, Jan 23, 2024 at 07:42:00AM -0800, H. Peter Anvin wrote:
> Well, Centaur bought Cyrix, and then VIA bought Centaur.

I suspected something like that.

> I think Zhaoxin is a joint venture between VIA and the City of
> Shanghai, or something like that?

Aha.

Btw, lemme know if your reply bounces too. I got

<TonyWWang-oc@zhaoxin.com>: host mx2.zhaoxin.com[203.110.167.99] said: 550
    Sender IP reverse lookup rejected (in reply to RCPT TO command)

earlier.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  2024-01-23  2:28 ` [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
@ 2024-01-23 16:33   ` Dave Hansen
  2024-01-31  9:45     ` Tony W Wang-oc
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2024-01-23 16:33 UTC (permalink / raw)
  To: Tony W Wang-oc, herbert, davem, linux-crypto, linux-kernel, tglx,
	mingo, bp, dave.hansen, x86, hpa, seanjc, kim.phillips,
	kirill.shutemov, jmattson, babu.moger, kai.huang, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

On 1/22/24 18:28, Tony W Wang-oc wrote:
> Updates the supporting qualification for packlock-sha driver, making
> it support CPUs whose vendor ID is Centaur and Famliy is 6.

This changelog isn't telling us very much.  *Why* is this a good change?

> diff --git a/drivers/crypto/padlock-sha.c b/drivers/crypto/padlock-sha.c
> index 6865c7f1fc1a..2e82c5e77f7a 100644
> --- a/drivers/crypto/padlock-sha.c
> +++ b/drivers/crypto/padlock-sha.c
> @@ -491,7 +491,7 @@ static struct shash_alg sha256_alg_nano = {
>  };
>  
>  static const struct x86_cpu_id padlock_sha_ids[] = {
> -	X86_MATCH_FEATURE(X86_FEATURE_PHE, NULL),
> +	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 6, X86_FEATURE_PHE, NULL),
>  	{}
>  };

Logically, this is saying that there are non-CENTAUR or non-family-6
CPUs that set X86_FEATURE_PHE, but don't support X86_FEATURE_PHE.  Is
that the case?

The one Intel use of X86_MATCH_VENDOR_FAM_FEATURE() also looks a bit
suspect, btw.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
  2024-01-23 15:42     ` H. Peter Anvin
  2024-01-23 16:00       ` Borislav Petkov
@ 2024-01-31  8:59       ` Tony W Wang-oc
  1 sibling, 0 replies; 13+ messages in thread
From: Tony W Wang-oc @ 2024-01-31  8:59 UTC (permalink / raw)
  To: H. Peter Anvin, Borislav Petkov
  Cc: herbert, davem, linux-crypto, linux-kernel, tglx, mingo,
	dave.hansen, x86, seanjc, kim.phillips, kirill.shutemov,
	jmattson, babu.moger, kai.huang, acme, aik, namhyung, CobeChen,
	TimGuo, LeoLiu-oc, GeorgeXue


On 2024/1/23 23:42, H. Peter Anvin wrote:
>
> [这封邮件来自外部发件人 谨防风险]
>
> On January 23, 2024 1:44:27 AM PST, Borislav Petkov <bp@alien8.de> wrote:
>> On Tue, Jan 23, 2024 at 10:28:51AM +0800, Tony W Wang-oc wrote:
>>> Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its
>>> instrucions.
>>> Add two CPU feature flags indicated by CPUID.(EAX=C0000001,ECX=0):EDX
>>> bit 25/26 which will be used by Zhaoxin SHA driver.
>>>
>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>>> ---
>>>   arch/x86/include/asm/cpufeatures.h       | 4 +++-
>>>   tools/arch/x86/include/asm/cpufeatures.h | 4 +++-
>>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>>> index 29cb275a219d..28b0e62dbdf5 100644
>>> --- a/arch/x86/include/asm/cpufeatures.h
>>> +++ b/arch/x86/include/asm/cpufeatures.h
>>> @@ -145,7 +145,7 @@
>>>   #define X86_FEATURE_RDRAND          ( 4*32+30) /* RDRAND instruction */
>>>   #define X86_FEATURE_HYPERVISOR              ( 4*32+31) /* Running on a hypervisor */
>>>
>>> -/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
>>> +/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */
>> Does that mean that all those companies agree on the contents of this
>> CPUID leaf?
>>
>>>   #define X86_FEATURE_XSTORE          ( 5*32+ 2) /* "rng" RNG present (xstore) */
>>>   #define X86_FEATURE_XSTORE_EN               ( 5*32+ 3) /* "rng_en" RNG enabled */
>>>   #define X86_FEATURE_XCRYPT          ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
>>> @@ -156,6 +156,8 @@
>>>   #define X86_FEATURE_PHE_EN          ( 5*32+11) /* PHE enabled */
>>>   #define X86_FEATURE_PMM                     ( 5*32+12) /* PadLock Montgomery Multiplier */
>>>   #define X86_FEATURE_PMM_EN          ( 5*32+13) /* PMM enabled */
>>> +#define X86_FEATURE_PHE2            ( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
>>> +#define X86_FEATURE_PHE2_EN         ( 5*32+26) /* "phe2_en" PHE2 enabled */
>>                                                      ^^^^^^^^^
>>
>> From: Documentation/arch/x86/cpuinfo.rst
>>
>> "a: Feature flags can be derived from the contents of CPUID leaves.
>> ------------------------------------------------------------------
>> These feature definitions are organized mirroring the layout of CPUID
>> leaves and grouped in words with offsets as mapped in enum cpuid_leafs
>> in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
>> If a feature is defined with a X86_FEATURE_<name> definition in
>> cpufeatures.h, and if it is detected at run time, the flags will be
>> displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
>> comes from X86_FEATURE_AVX2 in cpufeatures.h."
>>
>> Is your grep broken?
>>
> Well, Centaur bought Cyrix, and then VIA bought Centaur. I think Zhaoxin is a joint venture between VIA and the City of Shanghai, or something like that?

Yes, Zhaoxin is a joint venture including VIA and Shanghai Alliance 
Investment Ltd.

VIA has not designed new CPU products for a long time, nor maintained 
the previous products.

Zhaoxin is currently designing and releasing new CPU products, and VIA 
understands and agrees that Zhaoxin uses the contents of this CPUID leaf.

Sorry for late!



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  2024-01-23 16:33   ` Dave Hansen
@ 2024-01-31  9:45     ` Tony W Wang-oc
  2024-01-31 15:33       ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Tony W Wang-oc @ 2024-01-31  9:45 UTC (permalink / raw)
  To: Dave Hansen, herbert, davem, linux-crypto, linux-kernel, tglx,
	mingo, bp, dave.hansen, x86, hpa, seanjc, kim.phillips,
	kirill.shutemov, jmattson, babu.moger, kai.huang, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue


On 2024/1/24 00:33, Dave Hansen wrote:
>
> [这封邮件来自外部发件人 谨防风险]
>
> On 1/22/24 18:28, Tony W Wang-oc wrote:
>> Updates the supporting qualification for packlock-sha driver, making
>> it support CPUs whose vendor ID is Centaur and Famliy is 6.
> This changelog isn't telling us very much.  *Why* is this a good change?
>
>> diff --git a/drivers/crypto/padlock-sha.c b/drivers/crypto/padlock-sha.c
>> index 6865c7f1fc1a..2e82c5e77f7a 100644
>> --- a/drivers/crypto/padlock-sha.c
>> +++ b/drivers/crypto/padlock-sha.c
>> @@ -491,7 +491,7 @@ static struct shash_alg sha256_alg_nano = {
>>   };
>>
>>   static const struct x86_cpu_id padlock_sha_ids[] = {
>> -     X86_MATCH_FEATURE(X86_FEATURE_PHE, NULL),
>> +     X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 6, X86_FEATURE_PHE, NULL),
>>        {}
>>   };
> Logically, this is saying that there are non-CENTAUR or non-family-6
> CPUs that set X86_FEATURE_PHE, but don't support X86_FEATURE_PHE.  Is
> that the case?

Not exactly.

Zhaoxin CPU supports X86_FEATURE_PHE and X86_FEATURE_PHE2.

We expect the Zhaoxin CPU to use the zhaoxin_sha driver introduced in 
the third patch of this patch set.

Without this patch Zhaoxin CPU will also match the padlock-sha driver too.


> The one Intel use of X86_MATCH_VENDOR_FAM_FEATURE() also looks a bit
> suspect, btw.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  2024-01-31  9:45     ` Tony W Wang-oc
@ 2024-01-31 15:33       ` Dave Hansen
  2024-02-01  2:37         ` Tony W Wang-oc
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2024-01-31 15:33 UTC (permalink / raw)
  To: Tony W Wang-oc, herbert, davem, linux-crypto, linux-kernel, tglx,
	mingo, bp, dave.hansen, x86, hpa, seanjc, kim.phillips,
	kirill.shutemov, jmattson, babu.moger, kai.huang, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

On 1/31/24 01:45, Tony W Wang-oc wrote:
>>>   static const struct x86_cpu_id padlock_sha_ids[] = {
>>> -     X86_MATCH_FEATURE(X86_FEATURE_PHE, NULL),
>>> +     X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 6, X86_FEATURE_PHE, NULL),
>>>        {}
>>>   };
>> Logically, this is saying that there are non-CENTAUR or non-family-6
>> CPUs that set X86_FEATURE_PHE, but don't support X86_FEATURE_PHE.  Is
>> that the case?
> 
> Not exactly.
> 
> Zhaoxin CPU supports X86_FEATURE_PHE and X86_FEATURE_PHE2.
> 
> We expect the Zhaoxin CPU to use the zhaoxin_sha driver introduced in
> the third patch of this patch set.
> 
> Without this patch Zhaoxin CPU will also match the padlock-sha driver too.

I honestly have no idea what this is saying.

Could you try again, please?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  2024-01-31 15:33       ` Dave Hansen
@ 2024-02-01  2:37         ` Tony W Wang-oc
  2024-02-01 16:42           ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Tony W Wang-oc @ 2024-02-01  2:37 UTC (permalink / raw)
  To: Dave Hansen, herbert, davem, linux-crypto, linux-kernel, tglx,
	mingo, bp, dave.hansen, x86, hpa, seanjc, kim.phillips,
	kirill.shutemov, jmattson, babu.moger, kai.huang, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue


On 2024/1/31 23:33, Dave Hansen wrote:
>
> [这封邮件来自外部发件人 谨防风险]
>
> On 1/31/24 01:45, Tony W Wang-oc wrote:
>>>>    static const struct x86_cpu_id padlock_sha_ids[] = {
>>>> -     X86_MATCH_FEATURE(X86_FEATURE_PHE, NULL),
>>>> +     X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 6, X86_FEATURE_PHE, NULL),
>>>>         {}
>>>>    };
>>> Logically, this is saying that there are non-CENTAUR or non-family-6
>>> CPUs that set X86_FEATURE_PHE, but don't support X86_FEATURE_PHE.  Is
>>> that the case?
>> Not exactly.
>>
>> Zhaoxin CPU supports X86_FEATURE_PHE and X86_FEATURE_PHE2.
>>
>> We expect the Zhaoxin CPU to use the zhaoxin_sha driver introduced in
>> the third patch of this patch set.
>>
>> Without this patch Zhaoxin CPU will also match the padlock-sha driver too.
> I honestly have no idea what this is saying.
>
> Could you try again, please?


Sorry. It should be said that there are non-CENTAUR or non-family-6 CPUs 
that set X86_FEATURE_PHE,

and also set the new X86_FEATURE_PHE2.  For these CPUs, we expect to use 
a new driver that supports

both X86_FEATURE_PHE and X86_FEATURE_PHE2.

So we make the driver padlock-sha to matches CENTAUR Family-6 CPU 
explicitly.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
  2024-02-01  2:37         ` Tony W Wang-oc
@ 2024-02-01 16:42           ` Dave Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2024-02-01 16:42 UTC (permalink / raw)
  To: Tony W Wang-oc, herbert, davem, linux-crypto, linux-kernel, tglx,
	mingo, bp, dave.hansen, x86, hpa, seanjc, kim.phillips,
	kirill.shutemov, jmattson, babu.moger, kai.huang, acme, aik,
	namhyung
  Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue

On 1/31/24 18:37, Tony W Wang-oc wrote:
> Sorry. It should be said that there are non-CENTAUR or non-family-6 CPUs
> that set X86_FEATURE_PHE,
> 
> and also set the new X86_FEATURE_PHE2.  For these CPUs, we expect to use
> a new driver that supports
> 
> both X86_FEATURE_PHE and X86_FEATURE_PHE2.
> 
> So we make the driver padlock-sha to matches CENTAUR Family-6 CPU
> explicitly.

Could you please take a look at how this is done for the existing crypto
algorithms?  This doesn't seem horribly new.  We have AVX-512-based
algorithms that somehow work on systems that also have AVX and AVX2
support.  Yet, there are no other vendor or family matches in the
x86_cpu_id arrays for them.  Why?

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-02-01 16:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-23  2:28 [PATCH v2 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
2024-01-23  2:28 ` [PATCH v2 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
2024-01-23 16:33   ` Dave Hansen
2024-01-31  9:45     ` Tony W Wang-oc
2024-01-31 15:33       ` Dave Hansen
2024-02-01  2:37         ` Tony W Wang-oc
2024-02-01 16:42           ` Dave Hansen
2024-01-23  2:28 ` [PATCH v2 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
2024-01-23  9:44   ` Borislav Petkov
2024-01-23 15:42     ` H. Peter Anvin
2024-01-23 16:00       ` Borislav Petkov
2024-01-31  8:59       ` Tony W Wang-oc
2024-01-23  2:28 ` [PATCH v2 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).