* [PATCH v4 00/12] RISC-V: support some cryptography accelerations
@ 2023-07-11 15:37 ` Heiko Stuebner
  0 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This series provides cryptographic implementations using the vector
crypto extensions.

v13 of the vector patchset dropped the patches for in-kernel usage of
vector instructions, so I have carried the ones from v12 over into this
series for now.

My basic goal was not to re-invent cryptographic code, so the heavy
lifting is done by the perl-asm scripts also used in openssl. The perl
code used herein stems from code targeted at openssl [0] and is unmodified
from there, to limit the needed review effort.

With a matching qemu (patches for the vector-crypto extensions are
circulating), the in-kernel crypto selftests (including the extended ones)
are all passing so far.
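(The extended selftests meant here are the extra run-time tests enabled by
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS.)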


changes in v4:
- split off from scalar crypto patches but base on top of them
- adapt to pending openssl code [0] using the now frozen vector crypto
  extensions - with all its changes
  [0] https://github.com/openssl/openssl/pull/20149

changes in v3:
- rebase on top of 6.3-rc2
- rebase on top of vector-v14 patchset
- add the missing Co-developed-by mentions to credit
  the people who wrote the actual openssl crypto code

changes in v2:
- rebased on 6.2 + the zbb series, so already applied
  changes are not included anymore
- refresh code picked from openssl as that side matures
- more algorithms (SHA512, AES, SM3, SM4)

Greentime Hu (2):
  riscv: Add support for kernel mode vector
  riscv: Add vector extension XOR implementation

Heiko Stuebner (10):
  RISC-V: add helper function to read the vector VLEN
  RISC-V: add vector crypto extension detection
  RISC-V: crypto: update perl include with helpers for vector (crypto)
    instructions
  RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
  RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
  RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
  RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation
  RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  RISC-V: crypto: add Zvksed accelerated SM4 encryption implementation
  RISC-V: crypto: add Zvksh accelerated SM3 hash implementation

 arch/riscv/crypto/Kconfig                     |  68 ++-
 arch/riscv/crypto/Makefile                    |  44 +-
 arch/riscv/crypto/aes-riscv-glue.c            | 168 ++++++
 arch/riscv/crypto/aes-riscv64-zvkned.pl       | 530 ++++++++++++++++++
 arch/riscv/crypto/ghash-riscv64-glue.c        | 245 ++++++++
 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl  | 380 +++++++++++++
 arch/riscv/crypto/ghash-riscv64-zvkg.pl       | 168 ++++++
 arch/riscv/crypto/riscv.pm                    | 433 +++++++++++++-
 arch/riscv/crypto/sha256-riscv64-glue.c       | 115 ++++
 .../crypto/sha256-riscv64-zvbb-zvknha.pl      | 314 +++++++++++
 arch/riscv/crypto/sha512-riscv64-glue.c       | 106 ++++
 .../crypto/sha512-riscv64-zvbb-zvknhb.pl      | 377 +++++++++++++
 arch/riscv/crypto/sm3-riscv64-glue.c          | 112 ++++
 arch/riscv/crypto/sm3-riscv64-zvksh.pl        | 225 ++++++++
 arch/riscv/crypto/sm4-riscv64-glue.c          | 162 ++++++
 arch/riscv/crypto/sm4-riscv64-zvksed.pl       | 300 ++++++++++
 arch/riscv/include/asm/hwcap.h                |   9 +
 arch/riscv/include/asm/vector.h               |  28 +
 arch/riscv/include/asm/xor.h                  |  82 +++
 arch/riscv/kernel/Makefile                    |   1 +
 arch/riscv/kernel/cpu.c                       |   8 +
 arch/riscv/kernel/cpufeature.c                |  50 ++
 arch/riscv/kernel/kernel_mode_vector.c        | 132 +++++
 arch/riscv/lib/Makefile                       |   1 +
 arch/riscv/lib/xor.S                          |  81 +++
 25 files changed, 4136 insertions(+), 3 deletions(-)
 create mode 100644 arch/riscv/crypto/aes-riscv-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
 create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
 create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
 create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
 create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
 create mode 100644 arch/riscv/lib/xor.S

-- 
2.39.2



* [PATCH v4 01/12] riscv: Add support for kernel mode vector
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Greentime Hu, Vincent Chen, Heiko Stuebner

From: Greentime Hu <greentime.hu@sifive.com>

Add kernel_rvv_begin() and kernel_rvv_end() function declarations
and the corresponding definitions in kernel_mode_vector.c.

These are needed to wrap uses of vector instructions in kernel mode.
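
A minimal usage sketch (illustrative only; do_vector_work() is just a
placeholder and <asm/vector.h> is assumed to be included). Callers run from
process context, check has_vector() first and bracket their vector code
with the two calls:

	static void do_vector_work(void)
	{
		if (!has_vector())
			return;		/* fall back to a scalar path */

		kernel_rvv_begin();	/* claim the CPU vector context, save user vstate */
		/* ... vector instructions may be used here ... */
		kernel_rvv_end();	/* zero vregs, restore user vstate, release */
	}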

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/vector.h        |  17 ++++
 arch/riscv/kernel/Makefile             |   1 +
 arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++
 3 files changed, 150 insertions(+)
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 3d78930cab51..ac2c23045eec 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -196,6 +196,23 @@ static inline void __switch_to_vector(struct task_struct *prev,
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
 bool riscv_v_vstate_ctrl_user_allowed(void);
 
+static inline void riscv_v_flush_cpu_state(void)
+{
+	asm volatile (
+		".option push\n\t"
+		".option arch, +v\n\t"
+		"vsetvli	t0, x0, e8, m8, ta, ma\n\t"
+		"vmv.v.i	v0, 0\n\t"
+		"vmv.v.i	v8, 0\n\t"
+		"vmv.v.i	v16, 0\n\t"
+		"vmv.v.i	v24, 0\n\t"
+		".option pop\n\t"
+		: : : "t0");
+}
+
+void kernel_rvv_begin(void);
+void kernel_rvv_end(void);
+
 #else /* ! CONFIG_RISCV_ISA_V  */
 
 struct pt_regs;
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 506cc4a9a45a..3f4435746af7 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 obj-$(CONFIG_RISCV_M_MODE)	+= traps_misaligned.o
 obj-$(CONFIG_FPU)		+= fpu.o
 obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
+obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
 obj-$(CONFIG_SMP)		+= smpboot.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= cpu_ops.o
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
new file mode 100644
index 000000000000..2d704190c054
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <catalin.marinas@arm.com>
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+
+DECLARE_PER_CPU(bool, vector_context_busy);
+DEFINE_PER_CPU(bool, vector_context_busy);
+
+/*
+ * may_use_vector - whether it is allowable at this time to issue vector
+ *                instructions or access the vector register file
+ *
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static __must_check inline bool may_use_vector(void)
+{
+	/*
+	 * vector_context_busy is only set while preemption is disabled,
+	 * and is clear whenever preemption is enabled. Since
+	 * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
+	 * cannot change under our feet -- if it's set we cannot be
+	 * migrated, and if it's clear we cannot be migrated to a CPU
+	 * where it is set.
+	 */
+	return !in_irq() && !irqs_disabled() && !in_nmi() &&
+	       !this_cpu_read(vector_context_busy);
+}
+
+/*
+ * Claim ownership of the CPU vector context for use by the calling context.
+ *
+ * The caller may freely manipulate the vector context metadata until
+ * put_cpu_vector_context() is called.
+ */
+static void get_cpu_vector_context(void)
+{
+	bool busy;
+
+	preempt_disable();
+	busy = __this_cpu_xchg(vector_context_busy, true);
+
+	WARN_ON(busy);
+}
+
+/*
+ * Release the CPU vector context.
+ *
+ * Must be called from a context in which get_cpu_vector_context() was
+ * previously called, with no call to put_cpu_vector_context() in the
+ * meantime.
+ */
+static void put_cpu_vector_context(void)
+{
+	bool busy = __this_cpu_xchg(vector_context_busy, false);
+
+	WARN_ON(!busy);
+	preempt_enable();
+}
+
+/*
+ * kernel_rvv_begin(): obtain the CPU vector registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_vector() returns true.
+ * Task context in the vector registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_rvv_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the vector registers until kernel_rvv_end() is
+ * called.
+ */
+void kernel_rvv_begin(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	WARN_ON(!may_use_vector());
+
+	/* Acquire kernel mode vector */
+	get_cpu_vector_context();
+
+	/* Save vector state, if any */
+	riscv_v_vstate_save(current, task_pt_regs(current));
+
+	/* Enable vector */
+	riscv_v_enable();
+
+	/* Invalidate vector regs */
+	riscv_v_flush_cpu_state();
+}
+EXPORT_SYMBOL_GPL(kernel_rvv_begin);
+
+/*
+ * kernel_rvv_end(): give the CPU vector registers back to the current task
+ *
+ * Must be called from a context in which kernel_rvv_begin() was previously
+ * called, with no call to kernel_rvv_end() in the meantime.
+ *
+ * The caller must not use the vector registers after this function is called,
+ * unless kernel_rvv_begin() is called again in the meantime.
+ */
+void kernel_rvv_end(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	/* Invalidate vector regs */
+	riscv_v_flush_cpu_state();
+
+	/* Restore vector state, if any */
+	riscv_v_vstate_restore(current, task_pt_regs(current));
+
+	/* disable vector */
+	riscv_v_disable();
+
+	/* release kernel mode vector */
+	put_cpu_vector_context();
+}
+EXPORT_SYMBOL_GPL(kernel_rvv_end);
-- 
2.39.2



* [PATCH v4 02/12] riscv: Add vector extension XOR implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Greentime Hu, Han-Kuan Chen, Heiko Stuebner

From: Greentime Hu <greentime.hu@sifive.com>

Add support for vector-optimized XOR; the implementation has been tested
in qemu.
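
Each assembly routine below strip-mines the buffers in chunks of vl bytes
as returned by vsetvli. For reference, xor_regs_2_() computes the same
result as this scalar sketch (illustrative only, not part of the patch):

	static void xor_2_scalar(unsigned long bytes, unsigned long *__restrict p1,
				 const unsigned long *__restrict p2)
	{
		unsigned long i;

		/* the xor templates are called with multiples of the word size */
		for (i = 0; i < bytes / sizeof(unsigned long); i++)
			p1[i] ^= p2[i];
	}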

Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/xor.h | 82 ++++++++++++++++++++++++++++++++++++
 arch/riscv/lib/Makefile      |  1 +
 arch/riscv/lib/xor.S         | 81 +++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/lib/xor.S

diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
new file mode 100644
index 000000000000..74867c7fd955
--- /dev/null
+++ b/arch/riscv/include/asm/xor.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+
+#include <linux/hardirq.h>
+#include <asm-generic/xor.h>
+#ifdef CONFIG_RISCV_ISA_V
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+
+void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2);
+void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3);
+void xor_regs_4_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4);
+void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4,
+		 const unsigned long *__restrict p5);
+
+static void xor_rvv_2(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2)
+{
+	kernel_rvv_begin();
+	xor_regs_2_(bytes, p1, p2);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_3(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3)
+{
+	kernel_rvv_begin();
+	xor_regs_3_(bytes, p1, p2, p3);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_4(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3,
+		      const unsigned long *__restrict p4)
+{
+	kernel_rvv_begin();
+	xor_regs_4_(bytes, p1, p2, p3, p4);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_5(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3,
+		      const unsigned long *__restrict p4,
+		      const unsigned long *__restrict p5)
+{
+	kernel_rvv_begin();
+	xor_regs_5_(bytes, p1, p2, p3, p4, p5);
+	kernel_rvv_end();
+}
+
+static struct xor_block_template xor_block_rvv = {
+	.name = "rvv",
+	.do_2 = xor_rvv_2,
+	.do_3 = xor_rvv_3,
+	.do_4 = xor_rvv_4,
+	.do_5 = xor_rvv_5
+};
+
+#undef XOR_TRY_TEMPLATES
+#define XOR_TRY_TEMPLATES           \
+	do {        \
+		xor_speed(&xor_block_8regs);    \
+		xor_speed(&xor_block_32regs);    \
+		if (has_vector()) { \
+			xor_speed(&xor_block_rvv);\
+		} \
+	} while (0)
+#endif
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..3164112680f1 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -11,3 +11,4 @@ lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
new file mode 100644
index 000000000000..3bc059e18171
--- /dev/null
+++ b/arch/riscv/lib/xor.S
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/linkage.h>
+#include <asm-generic/export.h>
+#include <asm/asm.h>
+
+ENTRY(xor_regs_2_)
+	vsetvli a3, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a3
+	vxor.vv v16, v0, v8
+	add a2, a2, a3
+	vse8.v v16, (a1)
+	add a1, a1, a3
+	bnez a0, xor_regs_2_
+	ret
+END(xor_regs_2_)
+EXPORT_SYMBOL(xor_regs_2_)
+
+ENTRY(xor_regs_3_)
+	vsetvli a4, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a4
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a4
+	vxor.vv v16, v0, v16
+	add a3, a3, a4
+	vse8.v v16, (a1)
+	add a1, a1, a4
+	bnez a0, xor_regs_3_
+	ret
+END(xor_regs_3_)
+EXPORT_SYMBOL(xor_regs_3_)
+
+ENTRY(xor_regs_4_)
+	vsetvli a5, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a5
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a5
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a5
+	vxor.vv v16, v0, v24
+	add a4, a4, a5
+	vse8.v v16, (a1)
+	add a1, a1, a5
+	bnez a0, xor_regs_4_
+	ret
+END(xor_regs_4_)
+EXPORT_SYMBOL(xor_regs_4_)
+
+ENTRY(xor_regs_5_)
+	vsetvli a6, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a6
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a6
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a6
+	vxor.vv v0, v0, v24
+	vle8.v v8, (a5)
+	add a4, a4, a6
+	vxor.vv v16, v0, v8
+	add a5, a5, a6
+	vse8.v v16, (a1)
+	add a1, a1, a6
+	bnez a0, xor_regs_5_
+	ret
+END(xor_regs_5_)
+EXPORT_SYMBOL(xor_regs_5_)
-- 
2.39.2



* [PATCH v4 03/12] RISC-V: add helper function to read the vector VLEN
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

VLEN describes the length of each vector register and some instructions
need specific minimal VLENs to work correctly.

The vector code already includes a variable riscv_v_vsize that holds the
size of "32 vector registers of vlenb length" and that gets filled during
boot. vlenb is the value contained in the CSR_VLENB register and
represents VLEN / 8, i.e. the length of a vector register in bytes.

So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
users when they need to check the available VLEN.
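
For example, with VLEN = 128 bits, vlenb is 16 bytes, riscv_v_vsize gets
set to 32 * 16 = 512 and riscv_vector_vlen() returns 512 / 32 * 8 = 128.
A user could then gate a VLEN-dependent implementation like this (sketch
only, the called function is a placeholder):

	if (riscv_vector_vlen() >= 128)
		register_vlen128_algorithm();	/* placeholder */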

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/vector.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index ac2c23045eec..88cf76a2316d 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -232,4 +232,15 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 
 #endif /* CONFIG_RISCV_ISA_V */
 
+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_v_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+	return riscv_v_vsize / 32 * 8;
+}
+
 #endif /* ! __ASM_RISCV_VECTOR_H */
-- 
2.39.2



* [PATCH v4 04/12] RISC-V: add vector crypto extension detection
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add detection for some extensions of the vector-crypto specification:
- Zvbb: Vector Bit-manipulation used in Cryptography
- Zvbc: Vector Carryless Multiplication
- Zvkg: Vector GCM/GMAC
- Zvkned: NIST Suite: Vector AES Block Cipher
- Zvknha and Zvknhb: NIST Suite: Vector SHA-2 Secure Hash
- Zvksed: ShangMi Suite: SM4 Block Cipher
- Zvksh: ShangMi Suite: SM3 Secure Hash
- Zvkt: Vector Data-Independent Execution Latency

As their use is very specific and will likely be limited to special places,
we expect current code to just pre-encode those instructions, so right now
we don't introduce toolchain requirements.
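
Users can then check for the individual extensions with the existing
riscv_isa_extension_available() helper. A sketch of how later glue code
might gate its registration (register_zvkg_ghash() is a placeholder):

	/* only register the Zvkg-based GHASH if the extension is present */
	if (riscv_isa_extension_available(NULL, ZVKG))
		register_zvkg_ghash();	/* placeholder */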

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/hwcap.h |  9 ++++++
 arch/riscv/kernel/cpu.c        |  8 ++++++
 arch/riscv/kernel/cpufeature.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index b80ca6e77088..0f5172fa87b0 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -64,6 +64,15 @@
 #define RISCV_ISA_EXT_ZKSED		51
 #define RISCV_ISA_EXT_ZKSH		52
 #define RISCV_ISA_EXT_ZKT		53
+#define RISCV_ISA_EXT_ZVBB		54
+#define RISCV_ISA_EXT_ZVBC		55
+#define RISCV_ISA_EXT_ZVKG		56
+#define RISCV_ISA_EXT_ZVKNED		57
+#define RISCV_ISA_EXT_ZVKNHA		58
+#define RISCV_ISA_EXT_ZVKNHB		59
+#define RISCV_ISA_EXT_ZVKSED		60
+#define RISCV_ISA_EXT_ZVKSH		61
+#define RISCV_ISA_EXT_ZVKT		62
 
 #define RISCV_ISA_EXT_MAX		64
 #define RISCV_ISA_EXT_NAME_LEN_MAX	32
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 10524322a4c0..925241e25db2 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -227,6 +227,14 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zksed, RISCV_ISA_EXT_ZKSED),
 	__RISCV_ISA_EXT_DATA(zksh, RISCV_ISA_EXT_ZKSH),
 	__RISCV_ISA_EXT_DATA(zkt, RISCV_ISA_EXT_ZKT),
+	__RISCV_ISA_EXT_DATA(zvbb, RISCV_ISA_EXT_ZVBB),
+	__RISCV_ISA_EXT_DATA(zvbc, RISCV_ISA_EXT_ZVBC),
+	__RISCV_ISA_EXT_DATA(zvkg, RISCV_ISA_EXT_ZVKG),
+	__RISCV_ISA_EXT_DATA(zvkned, RISCV_ISA_EXT_ZVKNED),
+	__RISCV_ISA_EXT_DATA(zvknha, RISCV_ISA_EXT_ZVKNHA),
+	__RISCV_ISA_EXT_DATA(zvknhb, RISCV_ISA_EXT_ZVKNHB),
+	__RISCV_ISA_EXT_DATA(zvksed, RISCV_ISA_EXT_ZVKSED),
+	__RISCV_ISA_EXT_DATA(zvksh, RISCV_ISA_EXT_ZVKSH),
 	__RISCV_ISA_EXT_DATA(smaia, RISCV_ISA_EXT_SMAIA),
 	__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 9a872a2007a5..13556fd16bf6 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -343,6 +343,56 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("zksh", RISCV_ISA_EXT_ZKSH);
 				SET_ISA_EXT_MAP("zkr", RISCV_ISA_EXT_ZKR);
 				SET_ISA_EXT_MAP("zkt", RISCV_ISA_EXT_ZKT);
+				SET_ISA_EXT_MAP("zvbb", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvbc", RISCV_ISA_EXT_ZVBC);
+				SET_ISA_EXT_MAP("zvkg", RISCV_ISA_EXT_ZVKG);
+				SET_ISA_EXT_MAP("zvkned", RISCV_ISA_EXT_ZVKNED);
+				SET_ISA_EXT_MAP("zvknha", RISCV_ISA_EXT_ZVKNHA);
+				SET_ISA_EXT_MAP("zvknhb", RISCV_ISA_EXT_ZVKNHB);
+				SET_ISA_EXT_MAP("zvksed", RISCV_ISA_EXT_ZVKSED);
+				SET_ISA_EXT_MAP("zvksh", RISCV_ISA_EXT_ZVKSH);
+				SET_ISA_EXT_MAP("zvkt", RISCV_ISA_EXT_ZVKT);
+
+				/* NIST Algorithm Suite */
+				SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVKNED);
+				SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVKNHB);
+				SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVKT);
+
+				/* NIST Algorithm Suite with carryless multiply */
+				SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVKNED);
+				SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVKNHB);
+				SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVKT);
+				SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVBC);
+
+				/* NIST Algorithm Suite with GCM */
+				SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKNED);
+				SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKNHB);
+				SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKT);
+				SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKG);
+
+				/*  ShangMi Algorithm Suite */
+				SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVKSED);
+				SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVKSH);
+				SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVKT);
+
+				/* ShangMi Algorithm Suite with carryless multiply */
+				SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVKSED);
+				SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVKSH);
+				SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVKT);
+				SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVBC);
+
+				/* ShangMi Algorithm Suite with GCM */
+				SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKSED);
+				SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKSH);
+				SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVBB);
+				SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKT);
+				SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKG);
+
 			}
 #undef SET_ISA_EXT_MAP
 		}
-- 
2.39.2



* [PATCH v4 05/12] RISC-V: crypto: update perl include with helpers for vector (crypto) instructions
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

The openssl perl-asm scripts use a number of helpers for handling vector
instructions and instructions from the vector-crypto extensions.

Therefore port these helpers over from openssl.
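
Most helpers work the same way: they take register names, look them up and
OR the register numbers into a fixed instruction template that is emitted
as a raw .word. As an illustration (not part of the patch), the vadd_vv
helper corresponds to the following C-style encoding, with the template
value taken from the perl code:

	/* vadd.vv vd, vs2, vs1  --  0b0000001_00000_00000_000_00000_1010111 */
	#define VADD_VV(vd, vs2, vs1) \
		(0x02000057u | ((vs2) << 20) | ((vs1) << 15) | ((vd) << 7))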

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/riscv.pm | 433 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 432 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
index a44edc68d1a6..89e4cb7cfd8c 100644
--- a/arch/riscv/crypto/riscv.pm
+++ b/arch/riscv/crypto/riscv.pm
@@ -79,6 +79,29 @@ sub read_reg {
     return $1;
 }
 
+my @vregs = map("v$_",(0..31));
+my %vreglookup;
+@vreglookup{@vregs} = @vregs;
+
+sub read_vreg {
+    my $vreg = lc shift;
+    if (!exists($vreglookup{$vreg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown vector register ".$vreg."\n".$trace);
+    }
+    if (!($vreg =~ /^v([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process vector register ".$vreg."\n".$trace);
+    }
+    return $1;
+}
+
 # Helper functions
 
 sub brev8_rv64i {
@@ -258,4 +281,412 @@ sub rev8 {
     return ".word ".($template | ($rs << 15) | ($rd << 7));
 }
 
+# Vector instructions
+
+sub vadd_vv {
+    # vadd.vv vd, vs2, vs1
+    my $template = 0b0000001_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vid_v {
+    # vid.v vd
+    my $template = 0b0101001_00000_10001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    return ".word ".($template | ($vd << 7));
+}
+
+sub vle32_v {
+    # vle32.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_110_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle64_v {
+    # vle64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse32_v {
+    # vlse32.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse64_v {
+    # vlse64.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_111_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmerge_vim {
+    # vmerge.vim vd, vs2, imm, v0
+    my $template = 0b0101110_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7));
+}
+
+sub vmerge_vvm {
+    # vmerge.vvm vd vs2 vs1
+    my $template = 0b0101110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 <<   15) | ($vd << 7))
+}
+
+sub vmseq_vi {
+    # vmseq vd vs1, imm
+    my $template = 0b0110001_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs1 << 20) | ($imm <<   15) | ($vd << 7))
+}
+
+sub vmv_v_i {
+    # vmv.v.i vd, imm
+    my $template = 0b0101111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($imm << 15) | ($vd << 7));
+}
+
+sub vmv_v_v {
+    # vmv.v.v vd, vs1
+    my $template = 0b0101111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vor_vv_v0t {
+    # vor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010100_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vse32_v {
+    # vse32.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_110_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vse64_v {
+    # vse64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsetivli__x0_2_e64_m1_ta_ma {
+    # vsetivli x0, 2, e64, m1, ta, ma
+    return ".word 0xcd817057";
+}
+
+sub vsetivli__x0_4_e32_m1_ta_ma {
+    # vsetivli x0, 4, e32, m1, ta, ma
+    return ".word 0xcd027057";
+}
+
+sub vsetivli__x0_4_e64_m1_ta_ma {
+    # vsetivli x0,4,e64,m1,ta,ma
+    return ".word 0xcd827057";
+}
+
+sub vsetivli__x0_8_e32_m1_ta_ma {
+    return ".word 0xcd047057";
+}
+
+sub vslidedown_vi {
+    # vslidedown.vi vd, vs2, uimm
+    my $template = 0b0011111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi_v0t {
+    # vslideup.vi vd, vs2, uimm, v0.t
+    my $template = 0b0011100_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi {
+    # vslideup.vi vd, vs2, uimm
+    my $template = 0b0011101_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsll_vi {
+    # vsll.vi vd, vs2, uimm, vm
+    my $template = 0b1001011_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsrl_vx {
+    # vsrl.vx vd, vs2, rs1
+    my $template = 0b1010001_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsse32_v {
+    # vsse32.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0100111;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsse64_v {
+    # vsse64.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_111_00000_0100111;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vxor_vv_v0t {
+    # vxor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vxor_vv {
+    # vxor.vv vd, vs2, vs1
+    my $template = 0b0010111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
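+# Illustrative example: vxor_vv("v1", "v2", "v3") ORs vs2=2 into bit 20,
+# vs1=3 into bit 15 and vd=1 into bit 7 of the template above, emitting the
+# 32-bit word 0x2e2180d7, which is the encoding of "vxor.vv v1, v2, v3"
+# (vm=1, unmasked).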
+
+# Vector crypto instructions
+
+## Zvbb instructions
+
+sub vrev8_v {
+    # vrev8.v vd, vs2
+    my $template = 0b0100101_00000_01001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvbc instructions
+
+sub vclmulh_vx {
+    # vclmulh.vx vd, vs2, rs1
+    my $template = 0b0011011_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx_v0t {
+    # vclmul.vx vd, vs2, rs1, v0.t
+    my $template = 0b0011000_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx {
+    # vclmul.vx vd, vs2, rs1
+    my $template = 0b0011001_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+## Zvkg instructions
+
+sub vghsh_vv {
+    # vghsh.vv vd, vs2, vs1
+    my $template = 0b1011001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vgmul_vv {
+    # vgmul.vv vd, vs2
+    my $template = 0b1010001_00000_10001_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkned instructions
+
+sub vaesdf_vs {
+    # vaesdf.vs vd, vs2
+    my $template = 0b101001_1_00000_00001_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vs {
+    # vaesdm.vs vd, vs2
+    my $template = 0b101001_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vs {
+    # vaesef.vs vd, vs2
+    my $template = 0b101001_1_00000_00011_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vs {
+    # vaesem.vs vd, vs2
+    my $template = 0b101001_1_00000_00010_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf1_vi {
+    # vaeskf1.vi vd, vs2, uimm
+    my $template = 0b100010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaeskf2_vi {
+    # vaeskf2.vi vd, vs2, uimm
+    my $template = 0b101010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaesz_vs {
+    # vaesz.vs vd, vs2
+    my $template = 0b101001_1_00000_00111_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvknha and Zvknhb instructions
+
+sub vsha2ms_vv {
+    # vsha2ms.vv vd, vs2, vs1
+    my $template = 0b1011011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2ch_vv {
+    # vsha2ch.vv vd, vs2, vs1
+    my $template = 0b1011101_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vsha2cl_vv {
+    # vsha2cl.vv vd, vs2, vs1
+    my $template = 0b1011111_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+## Zvksed instructions
+
+sub vsm4k_vi {
+    # vsm4k.vi vd, vs2, uimm
+    my $template = 0b1000011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsm4r_vs {
+    # vsm4r.vs vd, vs2
+    my $template = 0b1010011_00000_10000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvksh instructions
+
+sub vsm3c_vi {
+    # vsm3c.vi vd, vs2, uimm
+    my $template = 0b1010111_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7));
+}
+
+sub vsm3me_vv {
+    # vsm3me.vv vd, vs2, vs1
+    my $template = 0b1000001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7));
+}
+
 1;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 06/12] RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add a GCM GHASH implementation using the Zvbb+Zvbc vector crypto
extensions. It may get registered alongside the Zbc-based variant, with a
higher priority so that the crypto subsystem selects the more performant
variant; the algorithm itself still goes through the crypto selftests
that run during registration.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig                    |   3 +-
 arch/riscv/crypto/Makefile                   |   8 +-
 arch/riscv/crypto/ghash-riscv64-glue.c       | 150 ++++++++
 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl | 380 +++++++++++++++++++
 4 files changed, 539 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index cd2237923e68..41b8fdfe1d92 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -4,7 +4,7 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
 config CRYPTO_GHASH_RISCV64
 	tristate "Hash functions: GHASH"
-	depends on 64BIT && RISCV_ISA_ZBC
+	depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V)
 	select CRYPTO_HASH
 	select CRYPTO_LIB_GF128MUL
 	help
@@ -14,5 +14,6 @@ config CRYPTO_GHASH_RISCV64
 	  - Zbc extension
 	  - Zbc + Zbb extensions
 	  - Zbc + Zbkb extensions
+	  - Zvbb + Zvbc vector crypto extensions
 
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 0a158919e9da..81190941ba78 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -8,6 +8,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o
 ifdef CONFIG_RISCV_ISA_ZBC
 ghash-riscv64-y += ghash-riscv64-zbc.o
 endif
+ifdef CONFIG_RISCV_ISA_V
+ghash-riscv64-y += ghash-riscv64-zvbb-zvbc.o
+endif
 
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
@@ -15,4 +18,7 @@ quiet_cmd_perlasm = PERLASM $@
 $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 	$(call cmd,perlasm)
 
-clean-files += ghash-riscv64-zbc.S
+$(obj)/ghash-riscv64-zvbb-zvbc.S: $(src)/ghash-riscv64-zvbb-zvbc.pl
+	$(call cmd,perlasm)
+
+clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvbb-zvbc.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
index 695bed6c54cb..2bfd1934d55b 100644
--- a/arch/riscv/crypto/ghash-riscv64-glue.c
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -11,6 +11,7 @@
 #include <linux/crypto.h>
 #include <linux/module.h>
 #include <asm/simd.h>
+#include <asm/vector.h>
 #include <crypto/ghash.h>
 #include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
@@ -21,6 +22,9 @@ struct riscv64_ghash_ctx {
 
 	/* key used by vector asm */
 	u128 htable[16];
+
+	/* key used by software fallback */
+	be128 key;
 };
 
 struct riscv64_ghash_desc_ctx {
@@ -38,6 +42,142 @@ static int riscv64_ghash_init(struct shash_desc *desc)
 	return 0;
 }
 
+#ifdef CONFIG_RISCV_ISA_V
+
+void gcm_init_rv64i_zvbb_zvbc(u128 Htable[16], const u64 Xi[2]);
+
+void gcm_ghash_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16],
+			       const u8 *inp, size_t len);
+
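+/*
+ * Keep the raw key for the gf128mul based software fallback and hand the
+ * two key halves, converted with cpu_to_be64, to the vector init routine
+ * that precomputes the Htable.
+ */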
+static int riscv64_zvk_ghash_setkey_zvbb_zvbc(struct crypto_shash *tfm,
+					      const u8 *key,
+					      unsigned int keylen)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm));
+	const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]),
+			   cpu_to_be64(((const u64 *)key)[1]) };
+
+	if (keylen != GHASH_BLOCK_SIZE)
+		return -EINVAL;
+
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
+	kernel_rvv_begin();
+	gcm_init_rv64i_zvbb_zvbc(ctx->htable, k);
+	kernel_rvv_end();
+
+	ctx->ghash_func = gcm_ghash_rv64i_zvbb_zvbc;
+
+	return 0;
+}
+
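+/*
+ * Hash one block from the descriptor buffer: use the vector routine when
+ * vector state is usable in the current context, otherwise fall back to
+ * the generic GF(2^128) multiplication with the stored key.
+ */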
+static inline void __ghash_block(struct riscv64_ghash_ctx *ctx,
+				 struct riscv64_ghash_desc_ctx *dctx)
+{
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				dctx->buffer, GHASH_DIGEST_SIZE);
+		kernel_rvv_end();
+	} else {
+		crypto_xor((u8 *)dctx->shash, dctx->buffer, GHASH_BLOCK_SIZE);
+		gf128mul_lle((be128 *)dctx->shash, &ctx->key);
+	}
+}
+
+static inline void __ghash_blocks(struct riscv64_ghash_ctx *ctx,
+				  struct riscv64_ghash_desc_ctx *dctx,
+				  const u8 *src, unsigned int srclen)
+{
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				src, srclen);
+		kernel_rvv_end();
+	} else {
+		while (srclen >= GHASH_BLOCK_SIZE) {
+			crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE);
+			gf128mul_lle((be128 *)dctx->shash, &ctx->key);
+			srclen -= GHASH_BLOCK_SIZE;
+			src += GHASH_BLOCK_SIZE;
+		}
+	}
+}
+
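+/*
+ * Partial block handling: finish a previously buffered partial block
+ * first, hash all complete blocks in one call and stash any trailing
+ * bytes for the next update or final.
+ */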
+static int riscv64_zvk_ghash_update(struct shash_desc *desc,
+			   const u8 *src, unsigned int srclen)
+{
+	unsigned int len;
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	if (dctx->bytes) {
+		if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) {
+			memcpy(dctx->buffer + dctx->bytes, src,
+				srclen);
+			dctx->bytes += srclen;
+			return 0;
+		}
+		memcpy(dctx->buffer + dctx->bytes, src,
+			GHASH_DIGEST_SIZE - dctx->bytes);
+
+		__ghash_block(ctx, dctx);
+
+		src += GHASH_DIGEST_SIZE - dctx->bytes;
+		srclen -= GHASH_DIGEST_SIZE - dctx->bytes;
+		dctx->bytes = 0;
+	}
+	len = srclen & ~(GHASH_DIGEST_SIZE - 1);
+
+	if (len) {
+		__ghash_blocks(ctx, dctx, src, len);
+		src += len;
+		srclen -= len;
+	}
+
+	if (srclen) {
+		memcpy(dctx->buffer, src, srclen);
+		dctx->bytes = srclen;
+	}
+	return 0;
+}
+
+static int riscv64_zvk_ghash_final(struct shash_desc *desc, u8 *out)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+	int i;
+
+	if (dctx->bytes) {
+		for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++)
+			dctx->buffer[i] = 0;
+		__ghash_block(ctx, dctx);
+		dctx->bytes = 0;
+	}
+
+	memcpy(out, dctx->shash, GHASH_DIGEST_SIZE);
+	return 0;
+}
+
+struct shash_alg riscv64_zvbb_zvbc_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvbb_zvbc,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvbb_zvbc_ghash",
+		 .cra_priority = 300,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+#endif /* CONFIG_RISCV_ISA_V */
+
 #ifdef CONFIG_RISCV_ISA_ZBC
 
 void gcm_init_rv64i_zbc(u128 Htable[16], const u64 Xi[2]);
@@ -269,6 +409,16 @@ static int __init riscv64_ghash_mod_init(void)
 	}
 #endif
 
+#ifdef CONFIG_RISCV_ISA_V
+	if (riscv_isa_extension_available(NULL, ZVBB) && 
+	    riscv_isa_extension_available(NULL, ZVBC) &&
+	    riscv_vector_vlen() >= 128) {
+		ret = riscv64_ghash_register(&riscv64_zvbb_zvbc_ghash_alg);
+		if (ret < 0)
+			return ret;
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl b/arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
new file mode 100644
index 000000000000..2b7475324c83
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
@@ -0,0 +1,380 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector Carryless Multiplication ('Zvbc')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void gcm_init_rv64i_zvbb_zvbc(u128 Htable[16], const u64 H[2]);
+#
+# input:	H: 128-bit H - secret parameter E(K, 0^128)
+# output:	Htable: Preprocessed key data for gcm_gmult_rv64i_zvbb_zvbc and
+#                       gcm_ghash_rv64i_zvbb_zvbc
+{
+my ($Htable,$H,$TMP0,$TMP1,$TMP2) = ("a0","a1","t0","t1","t2");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvbb_zvbc
+.type gcm_init_rv64i_zvbb_zvbc,\@function
+gcm_init_rv64i_zvbb_zvbc:
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $H, $H, 8
+    li $TMP0, -8
+    li $TMP1, 63
+    la $TMP2, Lpolymod
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v  $V1, $H, $TMP0]}    # vlse64.v v1, (a1), t0
+    @{[vle64_v $V2, $TMP2]}          # vle64.v v2, (t2)
+
+    # Shift one left and get the carry bits.
+    @{[vsrl_vx $V3, $V1, $TMP1]}     # vsrl.vx v3, v1, t1
+    @{[vsll_vi $V1, $V1, 1]}         # vsll.vi v1, v1, 1
+
+    # Use the fact that the polynomial degree is no more than 128,
+    # i.e. only the LSB of the upper half could be set.
+    # Thanks to this we don't need to do the full reduction here.
+    # Instead simply subtract the reduction polynomial.
+    # This idea was taken from x86 ghash implementation in OpenSSL.
+    @{[vslideup_vi $V4, $V3, 1]}     # vslideup.vi v4, v3, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    @{[vor_vv_v0t $V1, $V1, $V4]}    # vor.vv v1, v1, v4, v0.t
+
+    # Need to set the mask to 3, if the carry bit is set.
+    @{[vmv_v_v $V0, $V3]}            # vmv.v.v v0, v3
+    @{[vmv_v_i $V3, 0]}              # vmv.v.i v3, 0
+    @{[vmerge_vim $V3, $V3, 3]}      # vmerge.vim v3, v3, 3, v0
+    @{[vmv_v_v $V0, $V3]}            # vmv.v.v v0, v3
+
+    @{[vxor_vv_v0t $V1, $V1, $V2]}   # vxor.vv v1, v1, v2, v0.t
+
+    @{[vse64_v $V1, $Htable]}        # vse64.v v1, (a0)
+    ret
+.size gcm_init_rv64i_zvbb_zvbc,.-gcm_init_rv64i_zvbb_zvbc
+___
+}
+
+################################################################################
+# void gcm_gmult_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16]);
+#
+# input:	Xi: current hash value
+#		Htable: preprocessed H
+# output:	Xi: next hash value Xi = (Xi * H mod f)
+{
+my ($Xi,$Htable,$TMP0,$TMP1,$TMP2,$TMP3,$TMP4) = ("a0","a1","t0","t1","t2","t3","t4");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6");
+
+$code .= <<___;
+.text
+.p2align 3
+.globl gcm_gmult_rv64i_zvbb_zvbc
+.type gcm_gmult_rv64i_zvbb_zvbc,\@function
+gcm_gmult_rv64i_zvbb_zvbc:
+    ld $TMP0, ($Htable)
+    ld $TMP1, 8($Htable)
+    li $TMP2, 63
+    la $TMP3, Lpolymod
+    ld $TMP3, 8($TMP3)
+
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $Xi, $Xi, 8
+    li $TMP4, -8
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v $V5, $Xi, $TMP4]}    # vlse64.v v5, (a0), t4
+    @{[vrev8_v $V5, $V5]}            # vrev8.v v5, v5
+
+    # Multiplication
+
+    # Do two 64x64 multiplications in one go to save some time
+    # and simplify things.
+
+    # A = a1a0 (t1, t0)
+    # B = b1b0 (v5)
+    # C = c1c0 (256 bit)
+    # c1 = a1b1 + (a0b1)h + (a1b0)h
+    # c0 = a0b0 + (a0b1)l + (a1b0)l
+
+    # v1 = (a0b1)l,(a0b0)l
+    @{[vclmul_vx $V1, $V5, $TMP0]}   # vclmul.vx v1, v5, t0
+    # v3 = (a0b1)h,(a0b0)h
+    @{[vclmulh_vx $V3, $V5, $TMP0]}  # vclmulh.vx v3, v5, t0
+
+    # v4 = (a1b1)l,(a1b0)l
+    @{[vclmul_vx $V4, $V5, $TMP1]}   # vclmul.vx v4, v5, t1
+    # v2 = (a1b1)h,(a1b0)h
+    @{[vclmulh_vx $V2, $V5, $TMP1]}   # vclmulh.vx v2, v5, t1
+
+    # Is there a better way to do this?
+    # Would need to swap the order of elements within a vector register.
+    @{[vslideup_vi $V5, $V3, 1]}     # vslideup.vi v5, v3, 1
+    @{[vslideup_vi $V6, $V4, 1]}     # vslideup.vi v6, v4, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+    @{[vslidedown_vi $V4, $V4, 1]}   # vslidedown.vi v4, v4, 1
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    # v2 += (a0b1)h
+    @{[vxor_vv_v0t $V2, $V2, $V3]}   # vxor.vv v2, v2, v3, v0.t
+    # v2 += (a1b1)l
+    @{[vxor_vv_v0t $V2, $V2, $V4]}   # vxor.vv v2, v2, v4, v0.t
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    # v1 += (a0b0)h,0
+    @{[vxor_vv_v0t $V1, $V1, $V5]}   # vxor.vv v1, v1, v5, v0.t
+    # v1 += (a1b0)l,0
+    @{[vxor_vv_v0t $V1, $V1, $V6]}   # vxor.vv v1, v1, v6, v0.t
+
+    # Now the 256bit product should be stored in (v2,v1)
+    # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l
+    # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l
+
+    # Reduction
+    # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0]
+    # This is a slight variation of the Gueron's Montgomery reduction.
+    # The difference being the order of some operations has been changed,
+    # to make a better use of vclmul(h) instructions.
+
+    # First step:
+    # c1 += (c0 * P)l
+    # vmv.v.i v0, 2
+    @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t
+    @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # Second step:
+    # D = d1,d0 is final result
+    # We want:
+    # m1 = c1 + (c1 * P)h
+    # m0 = (c1 * P)l + (c0 * P)h + c0
+    # d1 = c3 + m1
+    # d0 = c2 + m0
+
+    #v3 = (c1 * P)l, 0
+    @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t
+    #v4 = (c1 * P)h, (c0 * P)h
+    @{[vclmulh_vx $V4, $V1, $TMP3]}   # vclmulh.vx v4, v1, t3
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vxor_vv $V1, $V1, $V4]}       # vxor.vv v1, v1, v4
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # XOR in the upper upper part of the product
+    @{[vxor_vv $V2, $V2, $V1]}       # vxor.vv v2, v2, v1
+
+    @{[vrev8_v $V2, $V2]}            # vrev8.v v2, v2
+    @{[vsse64_v $V2, $Xi, $TMP4]}    # vsse64.v v2, (a0), t4
+    ret
+.size gcm_gmult_rv64i_zvbb_zvbc,.-gcm_gmult_rv64i_zvbb_zvbc
+___
+}
+
+################################################################################
+# void gcm_ghash_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16],
+#                                const u8 *inp, size_t len);
+#
+# input:	Xi: current hash value
+#		Htable: preprocessed H
+#		inp: pointer to input data
+#		len: length of input data in bytes (multiple of block size)
+# output:	Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$Htable,$inp,$len,$TMP0,$TMP1,$TMP2,$TMP3,$M8,$TMP5,$TMP6) = ("a0","a1","a2","a3","t0","t1","t2","t3","t4","t5","t6");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6,$Vinp) = ("v0","v1","v2","v3","v4","v5","v6","v7");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvbb_zvbc
+.type gcm_ghash_rv64i_zvbb_zvbc,\@function
+gcm_ghash_rv64i_zvbb_zvbc:
+    ld $TMP0, ($Htable)
+    ld $TMP1, 8($Htable)
+    li $TMP2, 63
+    la $TMP3, Lpolymod
+    ld $TMP3, 8($TMP3)
+
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $Xi, $Xi, 8
+    add $inp, $inp, 8
+    li $M8, -8
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v $V5, $Xi, $M8]}      # vlse64.v v5, (a0), t4
+
+Lstep:
+    # Read input data
+    @{[vlse64_v $Vinp, $inp, $M8]}   # vlse64.v v7, (a2), t4
+    add $inp, $inp, 16
+    add $len, $len, -16
+    # XOR them into Xi
+    @{[vxor_vv $V5, $V5, $Vinp]}     # vxor.vv v5, v5, v7
+
+    @{[vrev8_v $V5, $V5]}            # vrev8.v v5, v5
+
+    # Multiplication
+
+    # Do two 64x64 multiplications in one go to save some time
+    # and simplify things.
+
+    # A = a1a0 (t1, t0)
+    # B = b1b0 (v5)
+    # C = c1c0 (256 bit)
+    # c1 = a1b1 + (a0b1)h + (a1b0)h
+    # c0 = a0b0 + (a0b1)l + (a1b0)l
+
+    # v1 = (a0b1)l,(a0b0)l
+    @{[vclmul_vx $V1, $V5, $TMP0]}   # vclmul.vx v1, v5, t0
+    # v3 = (a0b1)h,(a0b0)h
+    @{[vclmulh_vx $V3, $V5, $TMP0]}  # vclmulh.vx v3, v5, t0
+
+    # v4 = (a1b1)l,(a1b0)l
+    @{[vclmul_vx $V4, $V5, $TMP1]}   # vclmul.vx v4, v5, t1
+    # v2 = (a1b1)h,(a1b0)h
+    @{[vclmulh_vx $V2, $V5, $TMP1]}   # vclmulh.vx v2, v5, t1
+
+    # Is there a better way to do this?
+    # Would need to swap the order of elements within a vector register.
+    @{[vslideup_vi $V5, $V3, 1]}     # vslideup.vi v5, v3, 1
+    @{[vslideup_vi $V6, $V4, 1]}     # vslideup.vi v6, v4, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+    @{[vslidedown_vi $V4, $V4, 1]}   # vslidedown.vi v4, v4, 1
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    # v2 += (a0b1)h
+    @{[vxor_vv_v0t $V2, $V2, $V3]}   # vxor.vv v2, v2, v3, v0.t
+    # v2 += (a1b1)l
+    @{[vxor_vv_v0t $V2, $V2, $V4]}   # vxor.vv v2, v2, v4, v0.t
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    # v1 += (a0b0)h,0
+    @{[vxor_vv_v0t $V1, $V1, $V5]}   # vxor.vv v1, v1, v5, v0.t
+    # v1 += (a1b0)l,0
+    @{[vxor_vv_v0t $V1, $V1, $V6]}   # vxor.vv v1, v1, v6, v0.t
+
+    # Now the 256bit product should be stored in (v2,v1)
+    # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l
+    # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l
+
+    # Reduction
+    # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0]
+    # This is a slight variation of the Gueron's Montgomery reduction.
+    # The difference being the order of some operations has been changed,
+    # to make a better use of vclmul(h) instructions.
+
+    # First step:
+    # c1 += (c0 * P)l
+    # vmv.v.i v0, 2
+    @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t
+    @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # Second step:
+    # D = d1,d0 is final result
+    # We want:
+    # m1 = c1 + (c1 * P)h
+    # m0 = (c1 * P)l + (c0 * P)h + c0
+    # d1 = c3 + m1
+    # d0 = c2 + m0
+
+    #v3 = (c1 * P)l, 0
+    @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t
+    #v4 = (c1 * P)h, (c0 * P)h
+    @{[vclmulh_vx $V4, $V1, $TMP3]}   # vclmulh.vx v4, v1, t3
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vxor_vv $V1, $V1, $V4]}       # vxor.vv v1, v1, v4
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # XOR in the upper upper part of the product
+    @{[vxor_vv $V2, $V2, $V1]}       # vxor.vv v2, v2, v1
+
+    @{[vrev8_v $V5, $V2]}            # vrev8.v v5, v2
+
+    bnez $len, Lstep
+
+    @{[vsse64_v $V5, $Xi, $M8]}      # vsse64.v v5, (a0), t4
+    ret
+.size gcm_ghash_rv64i_zvbb_zvbc,.-gcm_ghash_rv64i_zvbb_zvbc
+___
+}
+
+$code .= <<___;
+.p2align 4
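+# GHASH reduction constant: the GCM polynomial x^128 + x^7 + x^2 + x + 1,
+# in the layout consumed by the reduction steps above.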
+Lpolymod:
+        .dword 0x0000000000000001
+        .dword 0xc200000000000000
+.size Lpolymod,.-Lpolymod
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 06/12] RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
@ 2023-07-11 15:37   ` Heiko Stuebner
  0 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add a GCM GHASH implementation using the Zvbb+Zvbc vector crypto
extensions. It may get registered alongside the Zbc-based variant, with a
higher priority so that the crypto subsystem selects the more performant
variant; the algorithm itself still goes through the crypto selftests
that run during registration.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig                    |   3 +-
 arch/riscv/crypto/Makefile                   |   8 +-
 arch/riscv/crypto/ghash-riscv64-glue.c       | 150 ++++++++
 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl | 380 +++++++++++++++++++
 4 files changed, 539 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index cd2237923e68..41b8fdfe1d92 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -4,7 +4,7 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
 config CRYPTO_GHASH_RISCV64
 	tristate "Hash functions: GHASH"
-	depends on 64BIT && RISCV_ISA_ZBC
+	depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V)
 	select CRYPTO_HASH
 	select CRYPTO_LIB_GF128MUL
 	help
@@ -14,5 +14,6 @@ config CRYPTO_GHASH_RISCV64
 	  - Zbc extension
 	  - Zbc + Zbb extensions
 	  - Zbc + Zbkb extensions
+	  - Zvbb + Zvbc vector crypto extensions
 
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 0a158919e9da..81190941ba78 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -8,6 +8,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o
 ifdef CONFIG_RISCV_ISA_ZBC
 ghash-riscv64-y += ghash-riscv64-zbc.o
 endif
+ifdef CONFIG_RISCV_ISA_V
+ghash-riscv64-y += ghash-riscv64-zvbb-zvbc.o
+endif
 
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
@@ -15,4 +18,7 @@ quiet_cmd_perlasm = PERLASM $@
 $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 	$(call cmd,perlasm)
 
-clean-files += ghash-riscv64-zbc.S
+$(obj)/ghash-riscv64-zvbb-zvbc.S: $(src)/ghash-riscv64-zvbb-zvbc.pl
+	$(call cmd,perlasm)
+
+clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvbb-zvbc.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
index 695bed6c54cb..2bfd1934d55b 100644
--- a/arch/riscv/crypto/ghash-riscv64-glue.c
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -11,6 +11,7 @@
 #include <linux/crypto.h>
 #include <linux/module.h>
 #include <asm/simd.h>
+#include <asm/vector.h>
 #include <crypto/ghash.h>
 #include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
@@ -21,6 +22,9 @@ struct riscv64_ghash_ctx {
 
 	/* key used by vector asm */
 	u128 htable[16];
+
+	/* key used by software fallback */
+	be128 key;
 };
 
 struct riscv64_ghash_desc_ctx {
@@ -38,6 +42,142 @@ static int riscv64_ghash_init(struct shash_desc *desc)
 	return 0;
 }
 
+#ifdef CONFIG_RISCV_ISA_V
+
+void gcm_init_rv64i_zvbb_zvbc(u128 Htable[16], const u64 Xi[2]);
+
+void gcm_ghash_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16],
+			       const u8 *inp, size_t len);
+
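+/*
+ * Keep the raw key for the gf128mul based software fallback and hand the
+ * two key halves, converted with cpu_to_be64, to the vector init routine
+ * that precomputes the Htable.
+ */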
+static int riscv64_zvk_ghash_setkey_zvbb_zvbc(struct crypto_shash *tfm,
+					      const u8 *key,
+					      unsigned int keylen)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm));
+	const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]),
+			   cpu_to_be64(((const u64 *)key)[1]) };
+
+	if (keylen != GHASH_BLOCK_SIZE)
+		return -EINVAL;
+
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
+	kernel_rvv_begin();
+	gcm_init_rv64i_zvbb_zvbc(ctx->htable, k);
+	kernel_rvv_end();
+
+	ctx->ghash_func = gcm_ghash_rv64i_zvbb_zvbc;
+
+	return 0;
+}
+
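+/*
+ * Hash one block from the descriptor buffer: use the vector routine when
+ * vector state is usable in the current context, otherwise fall back to
+ * the generic GF(2^128) multiplication with the stored key.
+ */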
+static inline void __ghash_block(struct riscv64_ghash_ctx *ctx,
+				 struct riscv64_ghash_desc_ctx *dctx)
+{
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				dctx->buffer, GHASH_DIGEST_SIZE);
+		kernel_rvv_end();
+	} else {
+		crypto_xor((u8 *)dctx->shash, dctx->buffer, GHASH_BLOCK_SIZE);
+		gf128mul_lle((be128 *)dctx->shash, &ctx->key);
+	}
+}
+
+static inline void __ghash_blocks(struct riscv64_ghash_ctx *ctx,
+				  struct riscv64_ghash_desc_ctx *dctx,
+				  const u8 *src, unsigned int srclen)
+{
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				src, srclen);
+		kernel_rvv_end();
+	} else {
+		while (srclen >= GHASH_BLOCK_SIZE) {
+			crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE);
+			gf128mul_lle((be128 *)dctx->shash, &ctx->key);
+			srclen -= GHASH_BLOCK_SIZE;
+			src += GHASH_BLOCK_SIZE;
+		}
+	}
+}
+
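+/*
+ * Partial block handling: finish a previously buffered partial block
+ * first, hash all complete blocks in one call and stash any trailing
+ * bytes for the next update or final.
+ */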
+static int riscv64_zvk_ghash_update(struct shash_desc *desc,
+			   const u8 *src, unsigned int srclen)
+{
+	unsigned int len;
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	if (dctx->bytes) {
+		if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) {
+			memcpy(dctx->buffer + dctx->bytes, src,
+				srclen);
+			dctx->bytes += srclen;
+			return 0;
+		}
+		memcpy(dctx->buffer + dctx->bytes, src,
+			GHASH_DIGEST_SIZE - dctx->bytes);
+
+		__ghash_block(ctx, dctx);
+
+		src += GHASH_DIGEST_SIZE - dctx->bytes;
+		srclen -= GHASH_DIGEST_SIZE - dctx->bytes;
+		dctx->bytes = 0;
+	}
+	len = srclen & ~(GHASH_DIGEST_SIZE - 1);
+
+	if (len) {
+		__ghash_blocks(ctx, dctx, src, len);
+		src += len;
+		srclen -= len;
+	}
+
+	if (srclen) {
+		memcpy(dctx->buffer, src, srclen);
+		dctx->bytes = srclen;
+	}
+	return 0;
+}
+
+static int riscv64_zvk_ghash_final(struct shash_desc *desc, u8 *out)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+	int i;
+
+	if (dctx->bytes) {
+		for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++)
+			dctx->buffer[i] = 0;
+		__ghash_block(ctx, dctx);
+		dctx->bytes = 0;
+	}
+
+	memcpy(out, dctx->shash, GHASH_DIGEST_SIZE);
+	return 0;
+}
+
+struct shash_alg riscv64_zvbb_zvbc_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvbb_zvbc,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvbb_zvbc_ghash",
+		 .cra_priority = 300,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+#endif /* CONFIG_RISCV_ISA_V */
+
 #ifdef CONFIG_RISCV_ISA_ZBC
 
 void gcm_init_rv64i_zbc(u128 Htable[16], const u64 Xi[2]);
@@ -269,6 +409,16 @@ static int __init riscv64_ghash_mod_init(void)
 	}
 #endif
 
+#ifdef CONFIG_RISCV_ISA_V
+	if (riscv_isa_extension_available(NULL, ZVBB) && 
+	    riscv_isa_extension_available(NULL, ZVBC) &&
+	    riscv_vector_vlen() >= 128) {
+		ret = riscv64_ghash_register(&riscv64_zvbb_zvbc_ghash_alg);
+		if (ret < 0)
+			return ret;
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl b/arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
new file mode 100644
index 000000000000..2b7475324c83
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
@@ -0,0 +1,380 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector Carryless Multiplication ('Zvbc')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void gcm_init_rv64i_zvbb_zvbc(u128 Htable[16], const u64 H[2]);
+#
+# input:	H: 128-bit H - secret parameter E(K, 0^128)
+# output:	Htable: Preprocessed key data for gcm_gmult_rv64i_zvbb_zvbc and
+#                       gcm_ghash_rv64i_zvbb_zvbc
+{
+my ($Htable,$H,$TMP0,$TMP1,$TMP2) = ("a0","a1","t0","t1","t2");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvbb_zvbc
+.type gcm_init_rv64i_zvbb_zvbc,\@function
+gcm_init_rv64i_zvbb_zvbc:
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $H, $H, 8
+    li $TMP0, -8
+    li $TMP1, 63
+    la $TMP2, Lpolymod
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v  $V1, $H, $TMP0]}    # vlse64.v v1, (a1), t0
+    @{[vle64_v $V2, $TMP2]}          # vle64.v v2, (t2)
+
+    # Shift one left and get the carry bits.
+    @{[vsrl_vx $V3, $V1, $TMP1]}     # vsrl.vx v3, v1, t1
+    @{[vsll_vi $V1, $V1, 1]}         # vsll.vi v1, v1, 1
+
+    # Use the fact that the polynomial degree is no more than 128,
+    # i.e. only the LSB of the upper half could be set.
+    # Thanks to this we don't need to do the full reduction here.
+    # Instead simply subtract the reduction polynomial.
+    # This idea was taken from x86 ghash implementation in OpenSSL.
+    @{[vslideup_vi $V4, $V3, 1]}     # vslideup.vi v4, v3, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    @{[vor_vv_v0t $V1, $V1, $V4]}    # vor.vv v1, v1, v4, v0.t
+
+    # Need to set the mask to 3, if the carry bit is set.
+    @{[vmv_v_v $V0, $V3]}            # vmv.v.v v0, v3
+    @{[vmv_v_i $V3, 0]}              # vmv.v.i v3, 0
+    @{[vmerge_vim $V3, $V3, 3]}      # vmerge.vim v3, v3, 3, v0
+    @{[vmv_v_v $V0, $V3]}            # vmv.v.v v0, v3
+
+    @{[vxor_vv_v0t $V1, $V1, $V2]}   # vxor.vv v1, v1, v2, v0.t
+
+    @{[vse64_v $V1, $Htable]}        # vse64.v v1, (a0)
+    ret
+.size gcm_init_rv64i_zvbb_zvbc,.-gcm_init_rv64i_zvbb_zvbc
+___
+}
+
+################################################################################
+# void gcm_gmult_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16]);
+#
+# input:	Xi: current hash value
+#		Htable: preprocessed H
+# output:	Xi: next hash value Xi = (Xi * H mod f)
+{
+my ($Xi,$Htable,$TMP0,$TMP1,$TMP2,$TMP3,$TMP4) = ("a0","a1","t0","t1","t2","t3","t4");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6");
+
+$code .= <<___;
+.text
+.p2align 3
+.globl gcm_gmult_rv64i_zvbb_zvbc
+.type gcm_gmult_rv64i_zvbb_zvbc,\@function
+gcm_gmult_rv64i_zvbb_zvbc:
+    ld $TMP0, ($Htable)
+    ld $TMP1, 8($Htable)
+    li $TMP2, 63
+    la $TMP3, Lpolymod
+    ld $TMP3, 8($TMP3)
+
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $Xi, $Xi, 8
+    li $TMP4, -8
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v $V5, $Xi, $TMP4]}    # vlse64.v v5, (a0), t4
+    @{[vrev8_v $V5, $V5]}            # vrev8.v v5, v5
+
+    # Multiplication
+
+    # Do two 64x64 multiplications in one go to save some time
+    # and simplify things.
+
+    # A = a1a0 (t1, t0)
+    # B = b1b0 (v5)
+    # C = c1c0 (256 bit)
+    # c1 = a1b1 + (a0b1)h + (a1b0)h
+    # c0 = a0b0 + (a0b1)l + (a1b0)l
+
+    # v1 = (a0b1)l,(a0b0)l
+    @{[vclmul_vx $V1, $V5, $TMP0]}   # vclmul.vx v1, v5, t0
+    # v3 = (a0b1)h,(a0b0)h
+    @{[vclmulh_vx $V3, $V5, $TMP0]}  # vclmulh.vx v3, v5, t0
+
+    # v4 = (a1b1)l,(a1b0)l
+    @{[vclmul_vx $V4, $V5, $TMP1]}   # vclmul.vx v4, v5, t1
+    # v2 = (a1b1)h,(a1b0)h
+    @{[vclmulh_vx $V2, $V5, $TMP1]}   # vclmulh.vx v2, v5, t1
+
+    # Is there a better way to do this?
+    # Would need to swap the order of elements within a vector register.
+    @{[vslideup_vi $V5, $V3, 1]}     # vslideup.vi v5, v3, 1
+    @{[vslideup_vi $V6, $V4, 1]}     # vslideup.vi v6, v4, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+    @{[vslidedown_vi $V4, $V4, 1]}   # vslidedown.vi v4, v4, 1
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    # v2 += (a0b1)h
+    @{[vxor_vv_v0t $V2, $V2, $V3]}   # vxor.vv v2, v2, v3, v0.t
+    # v2 += (a1b1)l
+    @{[vxor_vv_v0t $V2, $V2, $V4]}   # vxor.vv v2, v2, v4, v0.t
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    # v1 += (a0b0)h,0
+    @{[vxor_vv_v0t $V1, $V1, $V5]}   # vxor.vv v1, v1, v5, v0.t
+    # v1 += (a1b0)l,0
+    @{[vxor_vv_v0t $V1, $V1, $V6]}   # vxor.vv v1, v1, v6, v0.t
+
+    # Now the 256bit product should be stored in (v2,v1)
+    # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l
+    # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l
+
+    # Reduction
+    # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0]
+    # This is a slight variation of Gueron's Montgomery reduction.
+    # The difference is that the order of some operations has been
+    # changed to make better use of the vclmul(h) instructions.
+
+    # First step:
+    # c1 += (c0 * P)l
+    # vmv.v.i v0, 2
+    @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t
+    @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # Second step:
+    # D = d1,d0 is final result
+    # We want:
+    # m1 = c1 + (c1 * P)h
+    # m0 = (c1 * P)l + (c0 * P)h + c0
+    # d1 = c3 + m1
+    # d0 = c2 + m0
+
+    #v3 = (c1 * P)l, 0
+    @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t
+    #v4 = (c1 * P)h, (c0 * P)h
+    @{[vclmulh_vx $V4, $V1, $TMP3]}   # vclmulh.vx v4, v1, t3
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vxor_vv $V1, $V1, $V4]}       # vxor.vv v1, v1, v4
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # XOR in the upper part of the product
+    @{[vxor_vv $V2, $V2, $V1]}       # vxor.vv v2, v2, v1
+
+    @{[vrev8_v $V2, $V2]}            # vrev8.v v2, v2
+    @{[vsse64_v $V2, $Xi, $TMP4]}    # vsse64.v v2, (a0), t4
+    ret
+.size gcm_gmult_rv64i_zvbb_zvbc,.-gcm_gmult_rv64i_zvbb_zvbc
+___
+}
+
+################################################################################
+# void gcm_ghash_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16],
+#                                const u8 *inp, size_t len);
+#
+# input:	Xi: current hash value
+#		Htable: preprocessed H
+#		inp: pointer to input data
+#		len: length of input data in bytes (multiple of block size)
+# output:	Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$Htable,$inp,$len,$TMP0,$TMP1,$TMP2,$TMP3,$M8,$TMP5,$TMP6) = ("a0","a1","a2","a3","t0","t1","t2","t3","t4","t5","t6");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6,$Vinp) = ("v0","v1","v2","v3","v4","v5","v6","v7");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvbb_zvbc
+.type gcm_ghash_rv64i_zvbb_zvbc,\@function
+gcm_ghash_rv64i_zvbb_zvbc:
+    ld $TMP0, ($Htable)
+    ld $TMP1, 8($Htable)
+    li $TMP2, 63
+    la $TMP3, Lpolymod
+    ld $TMP3, 8($TMP3)
+
+    # Load/store data in reverse order.
+    # This is needed as part of the endianness swap.
+    add $Xi, $Xi, 8
+    add $inp, $inp, 8
+    li $M8, -8
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v $V5, $Xi, $M8]}      # vlse64.v v5, (a0), t4
+
+Lstep:
+    # Read input data
+    @{[vlse64_v $Vinp, $inp, $M8]}   # vlse64.v v7, (a2), t4
+    add $inp, $inp, 16
+    add $len, $len, -16
+    # XOR them into Xi
+    @{[vxor_vv $V5, $V5, $Vinp]}     # vxor.vv v5, v5, v7
+
+    @{[vrev8_v $V5, $V5]}            # vrev8.v v5, v5
+
+    # Multiplication
+
+    # Do two 64x64 multiplications in one go to save some time
+    # and simplify things.
+
+    # A = a1a0 (t1, t0)
+    # B = b1b0 (v5)
+    # C = c1c0 (256 bit)
+    # c1 = a1b1 + (a0b1)h + (a1b0)h
+    # c0 = a0b0 + (a0b1)l + (a1b0)l
+
+    # v1 = (a0b1)l,(a0b0)l
+    @{[vclmul_vx $V1, $V5, $TMP0]}   # vclmul.vx v1, v5, t0
+    # v3 = (a0b1)h,(a0b0)h
+    @{[vclmulh_vx $V3, $V5, $TMP0]}  # vclmulh.vx v3, v5, t0
+
+    # v4 = (a1b1)l,(a1b0)l
+    @{[vclmul_vx $V4, $V5, $TMP1]}   # vclmul.vx v4, v5, t1
+    # v2 = (a1b1)h,(a1b0)h
+    @{[vclmulh_vx $V2, $V5, $TMP1]}   # vclmulh.vx v2, v5, t1
+
+    # Is there a better way to do this?
+    # Would need to swap the order of elements within a vector register.
+    @{[vslideup_vi $V5, $V3, 1]}     # vslideup.vi v5, v3, 1
+    @{[vslideup_vi $V6, $V4, 1]}     # vslideup.vi v6, v4, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+    @{[vslidedown_vi $V4, $V4, 1]}   # vslidedown.vi v4, v4, 1
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    # v2 += (a0b1)h
+    @{[vxor_vv_v0t $V2, $V2, $V3]}   # vxor.vv v2, v2, v3, v0.t
+    # v2 += (a1b1)l
+    @{[vxor_vv_v0t $V2, $V2, $V4]}   # vxor.vv v2, v2, v4, v0.t
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    # v1 += (a0b0)h,0
+    @{[vxor_vv_v0t $V1, $V1, $V5]}   # vxor.vv v1, v1, v5, v0.t
+    # v1 += (a1b0)l,0
+    @{[vxor_vv_v0t $V1, $V1, $V6]}   # vxor.vv v1, v1, v6, v0.t
+
+    # Now the 256bit product should be stored in (v2,v1)
+    # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l
+    # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l
+
+    # Reduction
+    # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0]
+    # This is a slight variation of Gueron's Montgomery reduction.
+    # The difference is that the order of some operations has been
+    # changed to make better use of the vclmul(h) instructions.
+
+    # First step:
+    # c1 += (c0 * P)l
+    # vmv.v.i v0, 2
+    @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t
+    @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # Second step:
+    # D = d1,d0 is final result
+    # We want:
+    # m1 = c1 + (c1 * P)h
+    # m0 = (c1 * P)l + (c0 * P)h + c0
+    # d1 = c3 + m1
+    # d0 = c2 + m0
+
+    #v3 = (c1 * P)l, 0
+    @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t
+    #v4 = (c1 * P)h, (c0 * P)h
+    @{[vclmulh_vx $V4, $V1, $TMP3]}   # vclmulh.vx v4, v1, t3
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vxor_vv $V1, $V1, $V4]}       # vxor.vv v1, v1, v4
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # XOR in the upper part of the product
+    @{[vxor_vv $V2, $V2, $V1]}       # vxor.vv v2, v2, v1
+
+    @{[vrev8_v $V5, $V2]}            # vrev8.v v5, v2
+
+    bnez $len, Lstep
+
+    @{[vsse64_v $V5, $Xi, $M8]}    # vsse64.v v5, (a0), t4
+    ret
+.size gcm_ghash_rv64i_zvbb_zvbc,.-gcm_ghash_rv64i_zvbb_zvbc
+___
+}
+
+$code .= <<___;
+.p2align 4
+Lpolymod:
+        .dword 0x0000000000000001
+        .dword 0xc200000000000000
+.size Lpolymod,.-Lpolymod
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
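
For reference, the four vclmul/vclmulh partial products above are combined
exactly as in a plain 128x128 carry-less multiplication split into 64-bit
halves. A minimal C sketch of that combine step follows (illustration only,
not part of the patch; clmul64() and clmul128() are hypothetical helpers,
and the bit-serial loop favours clarity over speed):

#include <stdint.h>

/* 64x64 -> 128 bit carry-less multiply, bit-serial for clarity. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t h = 0, l = 0;
	int i;

	for (i = 0; i < 64; i++) {
		if (b & (1ULL << i)) {
			l ^= a << i;
			h ^= i ? a >> (64 - i) : 0;
		}
	}
	*hi = h;
	*lo = l;
}

/* A = a1:a0, B = b1:b0, C = A*B = c[3]:c[2]:c[1]:c[0] over GF(2)[x]. */
static void clmul128(const uint64_t a[2], const uint64_t b[2], uint64_t c[4])
{
	uint64_t ll_h, ll_l, lh_h, lh_l, hl_h, hl_l, hh_h, hh_l;

	clmul64(a[0], b[0], &ll_h, &ll_l);	/* a0*b0 */
	clmul64(a[0], b[1], &lh_h, &lh_l);	/* a0*b1 */
	clmul64(a[1], b[0], &hl_h, &hl_l);	/* a1*b0 */
	clmul64(a[1], b[1], &hh_h, &hh_l);	/* a1*b1 */

	c[0] = ll_l;			/* (a0b0)l                   */
	c[1] = ll_h ^ lh_l ^ hl_l;	/* (a0b0)h + (a0b1)l + (a1b0)l */
	c[2] = hh_l ^ lh_h ^ hl_h;	/* (a1b1)l + (a0b1)h + (a1b0)h */
	c[3] = hh_h;			/* (a1b1)h                   */
}
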
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

When the Zvkg vector crypto extension is available, another optimized
GCM GHASH variant is possible, so add it as another implementation.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig               |   1 +
 arch/riscv/crypto/Makefile              |   7 +-
 arch/riscv/crypto/ghash-riscv64-glue.c  |  95 ++++++++++++++
 arch/riscv/crypto/ghash-riscv64-zvkg.pl | 168 ++++++++++++++++++++++++
 4 files changed, 269 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 41b8fdfe1d92..a1493b556993 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -15,5 +15,6 @@ config CRYPTO_GHASH_RISCV64
 	  - Zbc + Zbb extensions
 	  - Zbc + Zbkb extensions
 	  - Zvbb vector crypto extension
+	  - Zvkg vector crypto extension
 
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 81190941ba78..496e784984cc 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,7 +9,7 @@ ifdef CONFIG_RISCV_ISA_ZBC
 ghash-riscv64-y += ghash-riscv64-zbc.o
 endif
 ifdef CONFIG_RISCV_ISA_V
-ghash-riscv64-y += ghash-riscv64-zvbb-zvbc.o
+ghash-riscv64-y += ghash-riscv64-zvbb-zvbc.o ghash-riscv64-zvkg.o
 endif
 
 quiet_cmd_perlasm = PERLASM $@
@@ -21,4 +21,7 @@ $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 $(obj)/ghash-riscv64-zvbb-zvbc.S: $(src)/ghash-riscv64-zvbb-zvbc.pl
 	$(call cmd,perlasm)
 
-clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S
+$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
+	$(call cmd,perlasm)
+
+clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
index 2bfd1934d55b..a196e35a0751 100644
--- a/arch/riscv/crypto/ghash-riscv64-glue.c
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -45,9 +45,13 @@ static int riscv64_ghash_init(struct shash_desc *desc)
 #ifdef CONFIG_RISCV_ISA_V
 
 void gcm_init_rv64i_zvbb_zvbc(u128 Htable[16], const u64 Xi[2]);
+void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 Xi[2]);
+void gcm_init_rv64i_zvkg_zvbb(u128 Htable[16], const u64 Xi[2]);
 
 void gcm_ghash_rv64i_zvbb_zvbc(u64 Xi[2], const u128 Htable[16],
 			       const u8 *inp, size_t len);
+void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16],
+			  const u8 *inp, size_t len);
 
 static int riscv64_zvk_ghash_setkey_zvbb_zvbc(struct crypto_shash *tfm,
 					      const u8 *key,
@@ -70,6 +74,48 @@ static int riscv64_zvk_ghash_setkey_zvbb_zvbc(struct crypto_shash *tfm,
 	return 0;
 }
 
+static int riscv64_zvk_ghash_setkey_zvkg(struct crypto_shash *tfm,
+					   const u8 *key,
+					   unsigned int keylen)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm));
+	const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]),
+			   cpu_to_be64(((const u64 *)key)[1]) };
+
+	if (keylen != GHASH_BLOCK_SIZE)
+		return -EINVAL;
+
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
+	kernel_rvv_begin();
+	gcm_init_rv64i_zvkg(ctx->htable, k);
+	kernel_rvv_end();
+
+	ctx->ghash_func = gcm_ghash_rv64i_zvkg;
+
+	return 0;
+}
+
+static int riscv64_zvk_ghash_setkey_zvkg_zvbb(struct crypto_shash *tfm,
+					   const u8 *key,
+					   unsigned int keylen)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm));
+	const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]),
+			   cpu_to_be64(((const u64 *)key)[1]) };
+
+	if (keylen != GHASH_BLOCK_SIZE)
+		return -EINVAL;
+
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
+	kernel_rvv_begin();
+	gcm_init_rv64i_zvkg_zvbb(ctx->htable, k);
+	kernel_rvv_end();
+
+	ctx->ghash_func = gcm_ghash_rv64i_zvkg;
+
+	return 0;
+}
+
 static inline void __ghash_block(struct riscv64_ghash_ctx *ctx,
 				 struct riscv64_ghash_desc_ctx *dctx)
 {
@@ -176,6 +222,42 @@ struct shash_alg riscv64_zvbb_zvbc_ghash_alg = {
 	},
 };
 
+struct shash_alg riscv64_zvkg_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvkg,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvkg_ghash",
+		 .cra_priority = 301,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+struct shash_alg riscv64_zvkg_zvbb_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvkg_zvbb,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvkg_zvbb_ghash",
+		 .cra_priority = 303,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
 #endif /* CONFIG_RISCV_ISA_V */
 
 #ifdef CONFIG_RISCV_ISA_ZBC
@@ -417,6 +499,19 @@ static int __init riscv64_ghash_mod_init(void)
 		if (ret < 0)
 			return ret;
 	}
+
+	if (riscv_isa_extension_available(NULL, ZVKG) &&
+	    riscv_vector_vlen() >= 128) {
+		ret = riscv64_ghash_register(&riscv64_zvkg_ghash_alg);
+		if (ret < 0)
+			return ret;
+
+		if (riscv_isa_extension_available(NULL, ZVBB)) {
+			ret = riscv64_ghash_register(&riscv64_zvkg_zvbb_ghash_alg);
+			if (ret < 0)
+				return ret;
+		}
+	}
 #endif
 
 	return 0;
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
new file mode 100644
index 000000000000..d613218a286e
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
@@ -0,0 +1,168 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V vector crypto GHASH extension ('Zvkg')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 H[2]);
+# void gcm_init_rv64i_zvkg_zvbb(u128 Htable[16], const u64 H[2]);
+#
+# input: H: 128-bit H - secret parameter E(K, 0^128)
+# output: Htable: Copy of secret parameter (in normalized byte order)
+#
+# All callers of this function reverse the byte order unconditionally
+# on little-endian machines, so we need to reverse it back here.
+{
+my ($Htable,$H,$VAL0,$VAL1,$TMP0) = ("a0","a1","a2","a3","t0");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvkg
+.type gcm_init_rv64i_zvkg,\@function
+gcm_init_rv64i_zvkg:
+    ld      $VAL0, 0($H)
+    ld      $VAL1, 8($H)
+    @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]}
+    @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]}
+    ret
+.size gcm_init_rv64i_zvkg,.-gcm_init_rv64i_zvkg
+___
+}
+
+{
+my ($Htable,$H,$V0) = ("a0","a1","v0");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvkg_zvbb
+.type gcm_init_rv64i_zvkg_zvbb,\@function
+gcm_init_rv64i_zvkg_zvbb:
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+    @{[vle64_v $V0, $H]}             # vle64.v v0, (a1)
+    @{[vrev8_v $V0, $V0]}            # vrev8.v v0, v0
+    @{[vse64_v $V0, $Htable]}        # vse64.v v0, (a0)
+    ret
+.size gcm_init_rv64i_zvkg_zvbb,.-gcm_init_rv64i_zvkg_zvbb
+___
+}
+
+################################################################################
+# void gcm_gmult_rv64i_zvkg(u64 Xi[2], const u128 Htable[16]);
+#
+# input: Xi: current hash value
+#        Htable: copy of H
+# output: Xi: next hash value Xi
+{
+my ($Xi,$Htable) = ("a0","a1");
+my ($VD,$VS2) = ("v1","v2");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_gmult_rv64i_zvkg
+.type gcm_gmult_rv64i_zvkg,\@function
+gcm_gmult_rv64i_zvkg:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+    @{[vle32_v $VS2, $Htable]}
+    @{[vle32_v $VD, $Xi]}
+    @{[vgmul_vv $VD, $VS2]}
+    @{[vse32_v $VD, $Xi]}
+    ret
+.size gcm_gmult_rv64i_zvkg,.-gcm_gmult_rv64i_zvkg
+___
+}
+
+################################################################################
+# void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16],
+#                           const u8 *inp, size_t len);
+#
+# input: Xi: current hash value
+#        Htable: copy of H
+#        inp: pointer to input data
+#        len: length of input data in bytes (multiple of block size)
+# output: Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$Htable,$inp,$len) = ("a0","a1","a2","a3");
+my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvkg
+.type gcm_ghash_rv64i_zvkg,\@function
+gcm_ghash_rv64i_zvkg:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+    @{[vle32_v $vH, $Htable]}
+    @{[vle32_v $vXi, $Xi]}
+
+Lstep:
+    @{[vle32_v $vinp, $inp]}
+    add $inp, $inp, 16
+    add $len, $len, -16
+    @{[vghsh_vv $vXi, $vH, $vinp]}
+    bnez $len, Lstep
+
+    @{[vse32_v $vXi, $Xi]}
+    ret
+
+.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
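
The Zvkg routine above reduces the whole GHASH update to one vghsh.vv per
16-byte block. In plain C, the per-block recurrence it implements looks like
the sketch below (illustration only, not part of the patch; gf128_mul() is a
hypothetical stand-in for the GF(2^128) multiply that vghsh.vv performs):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical GF(2^128) multiply by H, assumed to be provided elsewhere. */
void gf128_mul(uint8_t Xi[16], const uint8_t H[16]);

/* Xi = (Xi ^ block) * H for every 16-byte block of the input. */
static void ghash_blocks(uint8_t Xi[16], const uint8_t H[16],
			 const uint8_t *inp, size_t len)
{
	int i;

	while (len >= 16) {
		for (i = 0; i < 16; i++)
			Xi[i] ^= inp[i];	/* fold the block into Xi */
		gf128_mul(Xi, H);		/* Xi = Xi * H mod the GHASH polynomial */
		inp += 16;
		len -= 16;
	}
}
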
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 08/12] RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner, Charalampos Mitrodimas

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an accelerated SHA256 algorithm using either the Zvknha
or Zvknhb vector crypto extensions. The spec says that

    Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.

so the relevant accelerating instructions are included in both.

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig                     |  11 +
 arch/riscv/crypto/Makefile                    |   7 +
 arch/riscv/crypto/sha256-riscv64-glue.c       | 115 +++++++
 .../crypto/sha256-riscv64-zvbb-zvknha.pl      | 314 ++++++++++++++++++
 4 files changed, 447 insertions(+)
 create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index a1493b556993..860919d230aa 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -17,4 +17,15 @@ config CRYPTO_GHASH_RISCV64
 	  - Zvbb vector crypto extension
 	  - Zvkg vector crypto extension
 
+config CRYPTO_SHA256_RISCV64
+	tristate "Hash functions: SHA-256"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_LIB_SHA256
+	help
+	  SHA-256 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64 using
+	  - Zvknha or Zvknhb vector crypto extensions
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 496e784984cc..cae2f255ceae 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -12,6 +12,9 @@ ifdef CONFIG_RISCV_ISA_V
 ghash-riscv64-y += ghash-riscv64-zvbb-zvbc.o ghash-riscv64-zvkg.o
 endif
 
+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvbb-zvknha.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -24,4 +27,8 @@ $(obj)/ghash-riscv64-zvbb-zvbc.S: $(src)/ghash-riscv64-zvbb-zvbc.pl
 $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
+	$(call cmd,perlasm)
+
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
+clean-files += sha256-riscv64-zvbb-zvknha.S
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 000000000000..1c9c88029f60
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISCV64
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sha256_base.h>
+
+asmlinkage void sha256_block_data_order_zvbb_zvknha(u32 *digest, const void *data,
+					unsigned int num_blks);
+
+static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src,
+				      int blocks)
+{
+	sha256_block_data_order_zvbb_zvknha(sst->state, src, blocks);
+}
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sha256_base_do_update(desc, data, len,
+					    __sha256_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else {
+		sha256_update(shash_desc_ctx(desc), data, len);
+		return 0;
+	}
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	if (!crypto_simd_usable()) {
+		sha256_update(shash_desc_ctx(desc), data, len);
+		sha256_final(shash_desc_ctx(desc), out);
+		return 0;
+	}
+
+	kernel_rvv_begin();
+	if (len)
+		sha256_base_do_update(desc, data, len,
+				      __sha256_block_data_order);
+
+	sha256_base_do_finalize(desc, __sha256_block_data_order);
+	kernel_rvv_end();
+
+	return sha256_base_finish(desc, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha256_alg = {
+	.digestsize		= SHA256_DIGEST_SIZE,
+	.init			= sha256_base_init,
+	.update			= riscv64_sha256_update,
+	.final			= riscv64_sha256_final,
+	.finup			= riscv64_sha256_finup,
+	.descsize		= sizeof(struct sha256_state),
+	.base.cra_name		= "sha256",
+	.base.cra_driver_name	= "sha256-riscv64-zvknha",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SHA256_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sha256_mod_init(void)
+{
+	/*
+	 * From the spec:
+	 * Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.
+	 */
+	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	     riscv_isa_extension_available(NULL, ZVBB) &&
+	     riscv_vector_vlen() >= 128)
+
+		return crypto_register_shash(&sha256_alg);
+
+	return 0;
+}
+
+static void __exit sha256_mod_fini(void)
+{
+	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	     riscv_isa_extension_available(NULL, ZVBB) &&
+	     riscv_vector_vlen() >= 128)
+		crypto_unregister_shash(&sha256_alg);
+}
+
+module_init(sha256_mod_init);
+module_exit(sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha256");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl b/arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
new file mode 100644
index 000000000000..ab3ac4e373d9
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
@@ -0,0 +1,314 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector SHA-2 Secure Hash ('Zvknha')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17");
+my ($V26, $V27) = ("v26", "v27");
+
+my $K256 = "K256";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3");
+
+################################################################################
+# void sha256_block_data_order_zvbb_zvknha(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha256_block_data_order_zvbb_zvknha
+.type   sha256_block_data_order_zvbb_zvknha,\@function
+sha256_block_data_order_zvbb_zvknha:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # We achieve this by reading with a negative stride followed by
+    # element sliding.
+    li $STRIDE, -4
+    addi $H, $H, 12
+    @{[vlse32_v $V16, $H, $STRIDE]} # {d,c,b,a}
+    addi $H, $H, 16
+    @{[vlse32_v $V17, $H, $STRIDE]} # {h,g,f,e}
+    # Keep H advanced by 12
+    addi $H, $H, -16
+
+    @{[vmv_v_v $V27, $V16]} # {d,c,b,a}
+    @{[vslidedown_vi $V26, $V16, 2]} # {b,a,0,0}
+    @{[vslidedown_vi $V16, $V17, 2]} # {f,e,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a}
+    @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c}
+
+    # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+
+L_round_loop:
+    la $KT, $K256 # Load round constants K256
+
+    # Load the 512-bits of the message block in v10-v13 and perform
+    # an endian swap on each 4 bytes element.
+    @{[vle32_v $V10, $INP]}
+    @{[vrev8_v $V10, $V10]}
+    add $INP, $INP, 16
+    @{[vle32_v $V11, $INP]}
+    @{[vrev8_v $V11, $V11]}
+    add $INP, $INP, 16
+    @{[vle32_v $V12, $INP]}
+    @{[vrev8_v $V12, $V12]}
+    add $INP, $INP, 16
+    @{[vle32_v $V13, $INP]}
+    @{[vrev8_v $V13, $V13]}
+    add $INP, $INP, 16
+
+    # Decrement length by 1
+    add $LEN, $LEN, -1
+
+    # Set v0 up for the vmerge that replaces the first word (idx==0)
+    @{[vid_v $V0]}
+    @{[vmseq_vi $V0, $V0, 0x0]}    # v0.mask[i] = (i == 0 ? 1 : 0)
+
+    # Quad-round 0 (+0, Wt from oldest to newest in v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[19:16]
+
+    # Quad-round 1 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[23:20]
+
+    # Quad-round 2 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[27:24]
+
+    # Quad-round 3 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[31:28]
+
+    # Quad-round 4 (+0, v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[35:32]
+
+    # Quad-round 5 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[39:36]
+
+    # Quad-round 6 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[43:40]
+
+    # Quad-round 7 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[47:44]
+
+    # Quad-round 8 (+0, v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[51:48]
+
+    # Quad-round 9 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[55:52]
+
+    # Quad-round 10 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[59:56]
+
+    # Quad-round 11 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[63:60]
+
+    # Quad-round 12 (+0, v10->v11->v12->v13)
+    # Note that we stop generating new message schedule words (Wt, v10-13)
+    # as we already generated all the words we end up consuming (i.e., W[63:60]).
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 13 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 14 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 15 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    # No kt increment needed.
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V16, $V26, $V16]}
+    @{[vadd_vv $V17, $V27, $V17]}
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+    bnez $LEN, L_round_loop
+
+    # v26 = v16 = {f,e,b,a}
+    # v27 = v17 = {h,g,d,c}
+    # Let's do the opposite transformation to the one on entry.
+
+    @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e}
+
+    @{[vslidedown_vi $V16, $V27, 2]} # {d,c,0,0}
+    @{[vslidedown_vi $V26, $V26, 2]} # {b,a,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a}
+
+    # H is already advanced by 12
+    @{[vsse32_v $V16, $H, $STRIDE]} # {a,b,c,d}
+    addi $H, $H, 16
+    @{[vsse32_v $V17, $H, $STRIDE]} # {e,f,g,h}
+
+    ret
+.size sha256_block_data_order_zvbb_zvknha,.-sha256_block_data_order_zvbb_zvknha
+
+.p2align 2
+.type $K256,\@object
+$K256:
+    .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+    .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+    .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+    .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+    .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+    .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+    .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+    .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+    .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+    .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+    .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+    .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+    .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+    .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+    .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+    .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size $K256,.-$K256
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
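
As a side note on the quad-rounds above: vsha2ms.vv generates four
message-schedule words at a time (W[19:16], W[23:20], ...), following the
standard SHA-256 recurrence. A plain C sketch of that schedule is shown
below (illustration only, not part of the patch; the function names are
made up for this sketch):

#include <stdint.h>

static inline uint32_t ror32(uint32_t x, int n)
{
	return (x >> n) | (x << (32 - n));
}

/* FIPS 180 small sigma_0: ROTR7 ^ ROTR18 ^ SHR3 */
static inline uint32_t sigma0(uint32_t x)
{
	return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3);
}

/* FIPS 180 small sigma_1: ROTR17 ^ ROTR19 ^ SHR10 */
static inline uint32_t sigma1(uint32_t x)
{
	return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10);
}

/* Expand W[0..15] (the byte-swapped message block) to the full W[0..63]. */
static void sha256_schedule(uint32_t W[64])
{
	int t;

	for (t = 16; t < 64; t++)
		W[t] = sigma1(W[t - 2]) + W[t - 7] + sigma0(W[t - 15]) + W[t - 16];
}
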
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 08/12] RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
@ 2023-07-11 15:37   ` Heiko Stuebner
  0 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner, Charalampos Mitrodimas

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an accelerated SHA256 algorithm using either the Zvknha
or Zvknhb vector crypto extensions. The spec says that

    Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.

so the relevant accelerating instructions are included in both.

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig                     |  11 +
 arch/riscv/crypto/Makefile                    |   7 +
 arch/riscv/crypto/sha256-riscv64-glue.c       | 115 +++++++
 .../crypto/sha256-riscv64-zvbb-zvknha.pl      | 314 ++++++++++++++++++
 4 files changed, 447 insertions(+)
 create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index a1493b556993..860919d230aa 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -17,4 +17,15 @@ config CRYPTO_GHASH_RISCV64
 	  - Zvbb vector crypto extension
 	  - Zvkg vector crypto extension
 
+config CRYPTO_SHA256_RISCV64
+	tristate "Hash functions: SHA-256"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_LIB_SHA256
+	help
+	  SHA-256 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64 using
+	  - Zvknha or Zvknhb vector crypto extensions
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 496e784984cc..cae2f255ceae 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -12,6 +12,9 @@ ifdef CONFIG_RISCV_ISA_V
 ghash-riscv64-y += ghash-riscv64-zvbb-zvbc.o ghash-riscv64-zvkg.o
 endif
 
+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvbb-zvknha.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -24,4 +27,8 @@ $(obj)/ghash-riscv64-zvbb-zvbc.S: $(src)/ghash-riscv64-zvbb-zvbc.pl
 $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
+	$(call cmd,perlasm)
+
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
+clean-files += sha256-riscv64-zvbb-zvknha.S
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 000000000000..1c9c88029f60
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISCV64
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sha256_base.h>
+
+asmlinkage void sha256_block_data_order_zvbb_zvknha(u32 *digest, const void *data,
+					unsigned int num_blks);
+
+static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src,
+				      int blocks)
+{
+	sha256_block_data_order_zvbb_zvknha(sst->state, src, blocks);
+}
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sha256_base_do_update(desc, data, len,
+					    __sha256_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else {
+		sha256_update(shash_desc_ctx(desc), data, len);
+		return 0;
+	}
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	if (!crypto_simd_usable()) {
+		sha256_update(shash_desc_ctx(desc), data, len);
+		sha256_final(shash_desc_ctx(desc), out);
+		return 0;
+	}
+
+	kernel_rvv_begin();
+	if (len)
+		sha256_base_do_update(desc, data, len,
+				      __sha256_block_data_order);
+
+	sha256_base_do_finalize(desc, __sha256_block_data_order);
+	kernel_rvv_end();
+
+	return sha256_base_finish(desc, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha256_alg = {
+	.digestsize		= SHA256_DIGEST_SIZE,
+	.init			= sha256_base_init,
+	.update			= riscv64_sha256_update,
+	.final			= riscv64_sha256_final,
+	.finup			= riscv64_sha256_finup,
+	.descsize		= sizeof(struct sha256_state),
+	.base.cra_name		= "sha256",
+	.base.cra_driver_name	= "sha256-riscv64-zvknha",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SHA256_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sha256_mod_init(void)
+{
+	/*
+	 * From the spec:
+	 * Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.
+	 */
+	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	     riscv_isa_extension_available(NULL, ZVBB) &&
+	     riscv_vector_vlen() >= 128)
+
+		return crypto_register_shash(&sha256_alg);
+
+	return 0;
+}
+
+static void __exit sha256_mod_fini(void)
+{
+	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	     riscv_isa_extension_available(NULL, ZVBB) &&
+	     riscv_vector_vlen() >= 128)
+		crypto_unregister_shash(&sha256_alg);
+}
+
+module_init(sha256_mod_init);
+module_exit(sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha256");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl b/arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
new file mode 100644
index 000000000000..ab3ac4e373d9
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
@@ -0,0 +1,314 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector SHA-2 Secure Hash ('Zvknha')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17");
+my ($V26, $V27) = ("v26", "v27");
+
+my $K256 = "K256";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3");
+
+################################################################################
+# void sha256_block_data_order_zvbb_zvknha(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha256_block_data_order_zvbb_zvknha
+.type   sha256_block_data_order_zvbb_zvknha,\@function
+sha256_block_data_order_zvbb_zvknha:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # We achieve this by reading with a negative stride followed by
+    # element sliding.
+    li $STRIDE, -4
+    addi $H, $H, 12
+    @{[vlse32_v $V16, $H, $STRIDE]} # {d,c,b,a}
+    addi $H, $H, 16
+    @{[vlse32_v $V17, $H, $STRIDE]} # {h,g,f,e}
+    # Keep H advanced by 12
+    addi $H, $H, -16
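+    # (Keeping H at offset 12 lets the epilogue reuse the same
+    #  negative-stride accesses when storing the updated state back.)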
+
+    @{[vmv_v_v $V27, $V16]} # {d,c,b,a}
+    @{[vslidedown_vi $V26, $V16, 2]} # {b,a,0,0}
+    @{[vslidedown_vi $V16, $V17, 2]} # {f,e,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a}
+    @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c}
+
+    # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+
+L_round_loop:
+    la $KT, $K256 # Load round constants K256
+
+    # Load the 512 bits of the message block into v10-v13 and perform
+    # an endian swap on each 4-byte element.
+    @{[vle32_v $V10, $INP]}
+    @{[vrev8_v $V10, $V10]}
+    add $INP, $INP, 16
+    @{[vle32_v $V11, $INP]}
+    @{[vrev8_v $V11, $V11]}
+    add $INP, $INP, 16
+    @{[vle32_v $V12, $INP]}
+    @{[vrev8_v $V12, $V12]}
+    add $INP, $INP, 16
+    @{[vle32_v $V13, $INP]}
+    @{[vrev8_v $V13, $V13]}
+    add $INP, $INP, 16
+
+    # Decrement length by 1
+    add $LEN, $LEN, -1
+
+    # Set v0 up for the vmerge that replaces the first word (idx==0)
+    @{[vid_v $V0]}
+    @{[vmseq_vi $V0, $V0, 0x0]}    # v0.mask[i] = (i == 0 ? 1 : 0)
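+    # With this mask each vmerge below combines element 0 of one
+    # message-schedule vector with elements 1..3 of another to form
+    # the vsha2ms input.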
+
+    # Quad-round 0 (+0, Wt from oldest to newest in v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[19:16]
+
+    # Quad-round 1 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[23:20]
+
+    # Quad-round 2 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[27:24]
+
+    # Quad-round 3 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[31:28]
+
+    # Quad-round 4 (+0, v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[35:32]
+
+    # Quad-round 5 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[39:36]
+
+    # Quad-round 6 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[43:40]
+
+    # Quad-round 7 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[47:44]
+
+    # Quad-round 8 (+0, v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[51:48]
+
+    # Quad-round 9 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[55:52]
+
+    # Quad-round 10 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[59:56]
+
+    # Quad-round 11 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[63:60]
+
+    # Quad-round 12 (+0, v10->v11->v12->v13)
+    # Note that we stop generating new message schedule words (Wt, v10-13)
+    # as we already generated all the words we end up consuming (i.e., W[63:60]).
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 13 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 14 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 15 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    # No kt increment needed.
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V16, $V26, $V16]}
+    @{[vadd_vv $V17, $V27, $V17]}
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+    bnez $LEN, L_round_loop
+
+    # v26 = v16 = {f,e,b,a}
+    # v27 = v17 = {h,g,d,c}
+    # Undo the transformation that was applied on entry.
+
+    @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e}
+
+    @{[vslidedown_vi $V16, $V27, 2]} # {d,c,0,0}
+    @{[vslidedown_vi $V26, $V26, 2]} # {b,a,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a}
+
+    # H is already advanced by 12
+    @{[vsse32_v $V16, $H, $STRIDE]} # {a,b,c,d}
+    addi $H, $H, 16
+    @{[vsse32_v $V17, $H, $STRIDE]} # {e,f,g,h}
+
+    ret
+.size sha256_block_data_order_zvbb_zvknha,.-sha256_block_data_order_zvbb_zvknha
+
+.p2align 2
+.type $K256,\@object
+$K256:
+    .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+    .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+    .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+    .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+    .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+    .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+    .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+    .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+    .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+    .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+    .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+    .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+    .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+    .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+    .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+    .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size $K256,.-$K256
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 09/12] RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner, Charalampos Mitrodimas

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an accelerated SHA512 algorithm using the Zvknhb vector
crypto extension.

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
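A usage illustration (not part of the patch): a minimal in-kernel caller
of the resulting hash could look like the sketch below. The helper name
sha512_demo is made up for this example; only generic shash API calls
from <crypto/hash.h> are used, and "sha512" resolves to the registered
implementation with the highest priority, i.e. sha512-riscv64-zvknhb
when the required extensions are present.

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/types.h>

/* Illustrative only: one-shot SHA-512 over a buffer via the shash API. */
static int sha512_demo(const u8 *data, unsigned int len, u8 *out)
{
	struct crypto_shash *tfm;
	int ret;

	tfm = crypto_alloc_shash("sha512", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	{
		/* SHASH_DESC_ON_STACK only reserves space; tfm must be set. */
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		ret = crypto_shash_digest(desc, data, len, out);
	}

	crypto_free_shash(tfm);
	return ret;
}
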
 arch/riscv/crypto/Kconfig                     |  11 +
 arch/riscv/crypto/Makefile                    |   8 +-
 arch/riscv/crypto/sha512-riscv64-glue.c       | 106 +++++
 .../crypto/sha512-riscv64-zvbb-zvknhb.pl      | 377 ++++++++++++++++++
 4 files changed, 501 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 860919d230aa..e564f861d95e 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -28,4 +28,15 @@ config CRYPTO_SHA256_RISCV64
 	  Architecture: riscv64 using
 	  - Zvknha or Zvknhb vector crypto extensions
 
+config CRYPTO_SHA512_RISCV64
+	tristate "Hash functions: SHA-512"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SHA512
+	help
+	  SHA-512 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64 using
+	  - Zvknhb vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index cae2f255ceae..b12c925172db 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -15,6 +15,9 @@ endif
 obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
 sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvbb-zvknha.o
 
+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvbb-zvknhb.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -30,5 +33,8 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 $(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha512-riscv64-zvbb-zvknhb.S: $(src)/sha512-riscv64-zvbb-zvknhb.pl
+	$(call cmd,perlasm)
+
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
-clean-files += sha256-riscv64-zvknha.S
+clean-files += sha256-riscv64-zvbb-zvknha.S sha512-riscv64-zvbb-zvknhb.S
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 000000000000..92ea1542c22a
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISCV64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sha512_base.h>
+
+asmlinkage void sha512_block_data_order_zvbb_zvknhb(u64 *digest, const void *data,
+					unsigned int num_blks);
+
+static void __sha512_block_data_order(struct sha512_state *sst, u8 const *src,
+				      int blocks)
+{
+	sha512_block_data_order_zvbb_zvknhb(sst->state, src, blocks);
+}
+
+static int sha512_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sha512_base_do_update(desc, data, len,
+					    __sha512_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else {
+		return crypto_sha512_update(desc, data, len);
+	}
+}
+
+static int sha512_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	if (!crypto_simd_usable())
+		return crypto_sha512_finup(desc, data, len, out);
+
+	kernel_rvv_begin();
+	if (len)
+		sha512_base_do_update(desc, data, len,
+				      __sha512_block_data_order);
+
+	sha512_base_do_finalize(desc, __sha512_block_data_order);
+	kernel_rvv_end();
+
+	return sha512_base_finish(desc, out);
+}
+
+static int sha512_final(struct shash_desc *desc, u8 *out)
+{
+	return sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha512_alg = {
+	.digestsize		= SHA512_DIGEST_SIZE,
+	.init			= sha512_base_init,
+	.update			= sha512_update,
+	.final			= sha512_final,
+	.finup			= sha512_finup,
+	.descsize		= sizeof(struct sha512_state),
+	.base.cra_name		= "sha512",
+	.base.cra_driver_name	= "sha512-riscv64-zvknhb",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SHA512_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sha512_mod_init(void)
+{
+	/*
+	 * sha512 needs a VLEN of at least 256 bits to work correctly: the
+	 * code keeps four 64-bit words per vector register (e64, m1).
+	 */
+	if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+	    riscv_isa_extension_available(NULL, ZVBB) &&
+	    riscv_vector_vlen() >= 256)
+		return crypto_register_shash(&sha512_alg);
+
+	return 0;
+}
+
+static void __exit sha512_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+	    riscv_isa_extension_available(NULL, ZVBB) &&
+	    riscv_vector_vlen() >= 256)
+		crypto_unregister_shash(&sha512_alg);
+}
+
+module_init(sha512_mod_init);
+module_exit(sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha512");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
new file mode 100644
index 000000000000..4bd09443dcdd
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
@@ -0,0 +1,377 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 256
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector SHA-2 Secure Hash ('Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17");
+my ($V26, $V27) = ("v26", "v27");
+
+my $K512 = "K512";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3");
+
+################################################################################
+# void sha512_block_data_order_zvbb_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha512_block_data_order_zvbb_zvknhb
+.type sha512_block_data_order_zvbb_zvknhb,\@function
+sha512_block_data_order_zvbb_zvknhb:
+    @{[vsetivli__x0_4_e64_m1_ta_ma]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # We achieve this by reading with a negative stride followed by
+    # element sliding.
+    li $STRIDE, -8
+    addi $H, $H, 24
+    @{[vlse64_v $V16, $H, $STRIDE]} # {d,c,b,a}
+    addi $H, $H, 32
+    @{[vlse64_v $V17, $H, $STRIDE]} # {h,g,f,e}
+    # Keep H advanced by 24
+    addi $H, $H, -32
+
+    @{[vmv_v_v $V27, $V16]} # {d,c,b,a}
+    @{[vslidedown_vi $V26, $V16, 2]} # {b,a,X,X}
+    @{[vslidedown_vi $V16, $V17, 2]} # {f,e,X,X}
+    @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a}
+    @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c}
+
+    # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+
+L_round_loop:
+    la $KT, $K512 # Load round constants K512
+
+    # Load the 1024 bits of the message block into v10-v13 and perform
+    # an endian swap on each 8-byte element.
+    @{[vle64_v $V10, $INP]}
+    @{[vrev8_v $V10, $V10]}
+    add $INP, $INP, 32
+    @{[vle64_v $V11, $INP]}
+    @{[vrev8_v $V11, $V11]}
+    add $INP, $INP, 32
+    @{[vle64_v $V12, $INP]}
+    @{[vrev8_v $V12, $V12]}
+    add $INP, $INP, 32
+    @{[vle64_v $V13, $INP]}
+    @{[vrev8_v $V13, $V13]}
+    add $INP, $INP, 32
+
+    # Decrement length by 1
+    add $LEN, $LEN, -1
+
+    # Set v0 up for the vmerge that replaces the first word (idx==0)
+    @{[vid_v $V0]}
+    @{[vmseq_vi $V0, $V0, 0x0]} # v0.mask[i] = (i == 0 ? 1 : 0)
+
+    # Quad-round 0 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 1 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 2 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 3 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 4 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 5 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 6 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 7 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 8 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 9 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 10 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 11 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 12 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 13 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 14 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 15 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 16 (+0, v10->v11->v12->v13)
+    # Note that we stop generating new message schedule words (Wt, v10-13)
+    # as we already generated all the words we end up consuming (i.e., W[79:76]).
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+
+    # Quad-round 17 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+
+    # Quad-round 18 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+
+    # Quad-round 19 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    # No kt increment needed.
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V16, $V26, $V16]}
+    @{[vadd_vv $V17, $V27, $V17]}
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+    bnez $LEN, L_round_loop
+
+    # v26 = v16 = {f,e,b,a}
+    # v27 = v17 = {h,g,d,c}
+    # Undo the transformation that was applied on entry.
+
+    @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e}
+
+    @{[vslidedown_vi $V16, $V27, 2]} # {d,c,X,X}
+    @{[vslidedown_vi $V26, $V26, 2]} # {b,a,X,X}
+    @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a}
+
+    # H is already advanced by 24
+    @{[vsse64_v $V16, $H, $STRIDE]} # {a,b,c,d}
+    addi $H, $H, 32
+    @{[vsse64_v $V17, $H, $STRIDE]} # {e,f,g,h}
+
+    ret
+.size sha512_block_data_order_zvbb_zvknhb,.-sha512_block_data_order_zvbb_zvknhb
+
+.p2align 3
+.type $K512,\@object
+$K512:
+    .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+    .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+    .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+    .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+    .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+    .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+    .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+    .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+    .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+    .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+    .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+    .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+    .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+    .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+    .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+    .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+    .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+    .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+    .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+    .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+    .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+    .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+    .dword 0xd192e819d6ef5218, 0xd69906245565a910
+    .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+    .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+    .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+    .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+    .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+    .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+    .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+    .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+    .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+    .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+    .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+    .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+    .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+    .dword 0x28db77f523047d84, 0x32caab7b40c72493
+    .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+    .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+    .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size $K512,.-$K512
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an AES implementation using the Zvkned vector crypto instructions.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
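A usage illustration (not part of the patch): a single AES block
encryption through the crypto API could look like the sketch below. The
helper name aes_demo_encrypt is made up for this example; it only uses
cipher API calls that also appear in the glue code, and "aes" resolves
to riscv-aes-zvkned (priority 300) when Zvkned is available, falling
back to the generic implementation otherwise.

#include <crypto/aes.h>
#include <crypto/internal/cipher.h>
#include <linux/err.h>

/* Illustrative only: encrypt one 16-byte block with AES-128. */
static int aes_demo_encrypt(const u8 key[AES_KEYSIZE_128],
			    const u8 in[AES_BLOCK_SIZE],
			    u8 out[AES_BLOCK_SIZE])
{
	struct crypto_cipher *tfm;
	int ret;

	tfm = crypto_alloc_cipher("aes", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	ret = crypto_cipher_setkey(tfm, key, AES_KEYSIZE_128);
	if (!ret)
		crypto_cipher_encrypt_one(tfm, out, in);

	crypto_free_cipher(tfm);
	return ret;
}
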
 arch/riscv/crypto/Kconfig               |  14 +
 arch/riscv/crypto/Makefile              |   7 +
 arch/riscv/crypto/aes-riscv-glue.c      | 168 ++++++++
 arch/riscv/crypto/aes-riscv64-zvkned.pl | 530 ++++++++++++++++++++++++
 4 files changed, 719 insertions(+)
 create mode 100644 arch/riscv/crypto/aes-riscv-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index e564f861d95e..8579ce43546d 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,6 +2,20 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
+config CRYPTO_AES_RISCV
+	tristate "Ciphers: AES (RISCV)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_AES
+	help
+	  Block ciphers: AES cipher algorithms (FIPS-197)
+	  Block cipher: AES cipher algorithm (FIPS-197)
+
+	  Architecture: riscv64 using
+	  - Zvkned vector crypto extension
 config CRYPTO_GHASH_RISCV64
 	tristate "Hash functions: GHASH"
 	depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V)
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b12c925172db..38ee741a9777 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -3,6 +3,9 @@
 # linux/arch/riscv/crypto/Makefile
 #
 
+obj-$(CONFIG_CRYPTO_AES_RISCV) += aes-riscv.o
+aes-riscv-y := aes-riscv-glue.o aes-riscv64-zvkned.o
+
 obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
 ghash-riscv64-y := ghash-riscv64-glue.o
 ifdef CONFIG_RISCV_ISA_ZBC
@@ -21,6 +24,9 @@ sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvbb-zvknhb.o
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+	$(call cmd,perlasm)
+
 $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 	$(call cmd,perlasm)
 
@@ -36,5 +42,6 @@ $(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
 $(obj)/sha512-riscv64-zvbb-zvknhb.S: $(src)/sha512-riscv64-zvbb-zvknhb.pl
 	$(call cmd,perlasm)
 
+clean-files += aes-riscv64-zvkned.S
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
diff --git a/arch/riscv/crypto/aes-riscv-glue.c b/arch/riscv/crypto/aes-riscv-glue.c
new file mode 100644
index 000000000000..85e1187aee22
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv-glue.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv port of the OpenSSL AES implementation for RISCV
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+
+struct aes_key {
+	u8 key[AES_MAX_KEYLENGTH];
+	int rounds;
+};
+
+/* variant using the zvkned vector crypto extension */
+void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key);
+void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key);
+int rv64i_zvkned_set_encrypt_key(const u8 *userKey, const int bits,
+				struct aes_key *key);
+int rv64i_zvkned_set_decrypt_key(const u8 *userKey, const int bits,
+				struct aes_key *key);
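+/*
+ * Like their OpenSSL counterparts, the key-setup routines above return 1
+ * on success, hence the "ret != 1" checks in the setkey path.
+ */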
+
+struct riscv_aes_ctx {
+	struct crypto_cipher *fallback;
+	struct aes_key enc_key;
+	struct aes_key dec_key;
+	unsigned int keylen;
+};
+
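+/*
+ * A generic fallback cipher is kept around for key sizes the vector code
+ * does not handle (AES-192) and for contexts where the vector unit is
+ * not usable.
+ */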
+static int riscv64_aes_init_zvkned(struct crypto_tfm *tfm)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	const char *alg = crypto_tfm_alg_name(tfm);
+	struct crypto_cipher *fallback;
+
+	fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK);
+	if (IS_ERR(fallback)) {
+		pr_err("Failed to allocate transformation for '%s': %ld\n",
+		       alg, PTR_ERR(fallback));
+		return PTR_ERR(fallback);
+	}
+
+	crypto_cipher_set_flags(fallback,
+				crypto_cipher_get_flags((struct crypto_cipher *)tfm));
+	ctx->fallback = fallback;
+
+	return 0;
+}
+
+static void riscv_aes_exit(struct crypto_tfm *tfm)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (ctx->fallback) {
+		crypto_free_cipher(ctx->fallback);
+		ctx->fallback = NULL;
+	}
+}
+
+static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key,
+			 unsigned int keylen)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	int ret;
+
+	ctx->keylen = keylen;
+
+	if (keylen == 16 || keylen == 32) {
+		kernel_rvv_begin();
+		ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+		if (ret != 1) {
+			kernel_rvv_end();
+			return -EINVAL;
+		}
+
+		ret = rv64i_zvkned_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
+		kernel_rvv_end();
+		if (ret != 1)
+			return -EINVAL;
+	}
+
+	ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
+
+	return ret ? -EINVAL : 0;
+}
+
+static void riscv64_aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable() && (ctx->keylen == 16 || ctx->keylen == 32)) {
+		kernel_rvv_begin();
+		rv64i_zvkned_encrypt(src, dst, &ctx->enc_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_encrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+static void riscv64_aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable() && (ctx->keylen == 16 || ctx->keylen == 32)) {
+		kernel_rvv_begin();
+		rv64i_zvkned_decrypt(src, dst, &ctx->dec_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_decrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+static struct crypto_alg riscv64_aes_zvkned_alg = {
+	.cra_name = "aes",
+	.cra_driver_name = "riscv-aes-zvkned",
+	.cra_module = THIS_MODULE,
+	.cra_priority = 300,
+	.cra_type = NULL,
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK,
+	.cra_alignmask = 0,
+	.cra_blocksize = AES_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct riscv_aes_ctx),
+	.cra_init = riscv64_aes_init_zvkned,
+	.cra_exit = riscv_aes_exit,
+	.cra_cipher = {
+		.cia_min_keysize = AES_MIN_KEY_SIZE,
+		.cia_max_keysize = AES_MAX_KEY_SIZE,
+		.cia_setkey = riscv64_aes_setkey_zvkned,
+		.cia_encrypt = riscv64_aes_encrypt_zvkned,
+		.cia_decrypt = riscv64_aes_decrypt_zvkned,
+	},
+};
+
+static int __init riscv_aes_mod_init(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNED) &&
+	    riscv_vector_vlen() >= 128)
+		return crypto_register_alg(&riscv64_aes_zvkned_alg);
+
+	return 0;
+}
+
+static void __exit riscv_aes_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNED) &&
+	    riscv_vector_vlen() >= 128)
+		crypto_unregister_alg(&riscv64_aes_zvkned_alg);
+}
+
+module_init(riscv_aes_mod_init);
+module_exit(riscv_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
new file mode 100644
index 000000000000..d26eeb8932bd
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -0,0 +1,530 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V vector crypto AES extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int bits,
+#                                  AES_KEY *key)
+# int rv64i_zvkned_set_decrypt_key(const unsigned char *userKey, const int bits,
+#                                  AES_KEY *key)
+{
+my ($UKEY,$BITS,$KEYP) = ("a0", "a1", "a2");
+my ($T0,$T1,$T4) = ("t1", "t2", "t4");
+my ($v0,  $v1,  $v2,  $v3,  $v4,  $v5,  $v6,
+          $v7,  $v8,  $v9,  $v10, $v11, $v12,
+          $v13, $v14, $v15, $v16, $v17, $v18,
+          $v19, $v20, $v21, $v22, $v23, $v24,
+) = map("v$_",(0..24));
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_set_encrypt_key
+.type rv64i_zvkned_set_encrypt_key,\@function
+rv64i_zvkned_set_encrypt_key:
+    beqz $UKEY, L_fail_m1
+    beqz $KEYP, L_fail_m1
+
+    # Get proper routine for key size
+    li $T0, 256
+    beq $BITS, $T0, L_set_key_256
+    li $T0, 128
+    beq $BITS, $T0, L_set_key_128
+
+    j L_fail_m2
+
+.size rv64i_zvkned_set_encrypt_key,.-rv64i_zvkned_set_encrypt_key
+___
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_set_decrypt_key
+.type rv64i_zvkned_set_decrypt_key,\@function
+rv64i_zvkned_set_decrypt_key:
+    beqz $UKEY, L_fail_m1
+    beqz $KEYP, L_fail_m1
+
+    # Get proper routine for key size
+    li $T0, 256
+    beq $BITS, $T0, L_set_key_256
+    li $T0, 128
+    beq $BITS, $T0, L_set_key_128
+
+    j L_fail_m2
+
+.size rv64i_zvkned_set_decrypt_key,.-rv64i_zvkned_set_decrypt_key
+___
+
+$code .= <<___;
+.p2align 3
+L_set_key_128:
+    # Store the number of rounds
+    li $T1, 10
+    sw $T1, 240($KEYP)
+
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the key
+    @{[vle32_v $v10, ($UKEY)]}
+
+    # Generate the round keys rk2-rk11 into registers v11-v20.
+    @{[vaeskf1_vi $v11, $v10, 1]}   # v11 <- rk2  (w[ 4, 7])
+    @{[vaeskf1_vi $v12, $v11, 2]}   # v12 <- rk3  (w[ 8,11])
+    @{[vaeskf1_vi $v13, $v12, 3]}   # v13 <- rk4  (w[12,15])
+    @{[vaeskf1_vi $v14, $v13, 4]}   # v14 <- rk5  (w[16,19])
+    @{[vaeskf1_vi $v15, $v14, 5]}   # v15 <- rk6  (w[20,23])
+    @{[vaeskf1_vi $v16, $v15, 6]}   # v16 <- rk7  (w[24,27])
+    @{[vaeskf1_vi $v17, $v16, 7]}   # v17 <- rk8  (w[28,31])
+    @{[vaeskf1_vi $v18, $v17, 8]}   # v18 <- rk9  (w[32,35])
+    @{[vaeskf1_vi $v19, $v18, 9]}   # v19 <- rk10 (w[36,39])
+    @{[vaeskf1_vi $v20, $v19, 10]}  # v20 <- rk11 (w[40,43])
+
+    # Store the round keys
+    @{[vse32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v20, ($KEYP)]}
+
+    li a0, 1
+    ret
+.size L_set_key_128,.-L_set_key_128
+___
+
+$code .= <<___;
+.p2align 3
+L_set_key_256:
+    # Store the number of rounds
+    li $T1, 14
+    sw $T1, 240($KEYP)
+
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the key
+    @{[vle32_v $v10, ($UKEY)]}
+    addi $UKEY, $UKEY, 16
+    @{[vle32_v $v11, ($UKEY)]}
+
+    @{[vmv_v_v $v12, $v10]}
+    @{[vaeskf2_vi $v12, $v11, 2]}
+    @{[vmv_v_v $v13, $v11]}
+    @{[vaeskf2_vi $v13, $v12, 3]}
+    @{[vmv_v_v $v14, $v12]}
+    @{[vaeskf2_vi $v14, $v13, 4]}
+    @{[vmv_v_v $v15, $v13]}
+    @{[vaeskf2_vi $v15, $v14, 5]}
+    @{[vmv_v_v $v16, $v14]}
+    @{[vaeskf2_vi $v16, $v15, 6]}
+    @{[vmv_v_v $v17, $v15]}
+    @{[vaeskf2_vi $v17, $v16, 7]}
+    @{[vmv_v_v $v18, $v16]}
+    @{[vaeskf2_vi $v18, $v17, 8]}
+    @{[vmv_v_v $v19, $v17]}
+    @{[vaeskf2_vi $v19, $v18, 9]}
+    @{[vmv_v_v $v20, $v18]}
+    @{[vaeskf2_vi $v20, $v19, 10]}
+    @{[vmv_v_v $v21, $v19]}
+    @{[vaeskf2_vi $v21, $v20, 11]}
+    @{[vmv_v_v $v22, $v20]}
+    @{[vaeskf2_vi $v22, $v21, 12]}
+    @{[vmv_v_v $v23, $v21]}
+    @{[vaeskf2_vi $v23, $v22, 13]}
+    @{[vmv_v_v $v24, $v22]}
+    @{[vaeskf2_vi $v24, $v23, 14]}
+
+    @{[vse32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v20, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v21, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v22, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v23, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v24, ($KEYP)]}
+
+    li a0, 1
+    ret
+.size L_set_key_256,.-L_set_key_256
+___
+}
+
+################################################################################
+# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+{
+my ($INP,$OUTP,$KEYP) = ("a0", "a1", "a2");
+my ($T0,$T1, $rounds, $T6) = ("a3", "a4", "t5", "t6");
+my ($v0,  $v1,  $v2,  $v3,  $v4,  $v5,  $v6,
+          $v7,  $v8,  $v9,  $v10, $v11, $v12,
+          $v13, $v14, $v15, $v16, $v17, $v18,
+          $v19, $v20, $v21, $v22, $v23, $v24,
+) = map("v$_",(0..24));
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_encrypt
+.type rv64i_zvkned_encrypt,\@function
+rv64i_zvkned_encrypt:
+    # Load number of rounds
+    lwu     $rounds, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T6, 14
+    beq $rounds, $T6, L_enc_256
+    li $T6, 10
+    beq $rounds, $T6, L_enc_128
+
+    j L_fail_m2
+.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_128:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v10]}    # with round key w[ 0, 3]
+    @{[vaesem_vs $v1, $v11]}   # with round key w[ 4, 7]
+    @{[vaesem_vs $v1, $v12]}   # with round key w[ 8,11]
+    @{[vaesem_vs $v1, $v13]}   # with round key w[12,15]
+    @{[vaesem_vs $v1, $v14]}   # with round key w[16,19]
+    @{[vaesem_vs $v1, $v15]}   # with round key w[20,23]
+    @{[vaesem_vs $v1, $v16]}   # with round key w[24,27]
+    @{[vaesem_vs $v1, $v17]}   # with round key w[28,31]
+    @{[vaesem_vs $v1, $v18]}   # with round key w[32,35]
+    @{[vaesem_vs $v1, $v19]}   # with round key w[36,39]
+    @{[vaesef_vs $v1, $v20]}   # with round key w[40,43]
+
+    @{[vse32_v $v1, ($OUTP)]}
+
+    ret
+.size L_enc_128,.-L_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_256:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v21, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v22, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v23, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v24, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v10]}     # with round key w[ 0, 3]
+    @{[vaesem_vs $v1, $v11]}
+    @{[vaesem_vs $v1, $v12]}
+    @{[vaesem_vs $v1, $v13]}
+    @{[vaesem_vs $v1, $v14]}
+    @{[vaesem_vs $v1, $v15]}
+    @{[vaesem_vs $v1, $v16]}
+    @{[vaesem_vs $v1, $v17]}
+    @{[vaesem_vs $v1, $v18]}
+    @{[vaesem_vs $v1, $v19]}
+    @{[vaesem_vs $v1, $v20]}
+    @{[vaesem_vs $v1, $v21]}
+    @{[vaesem_vs $v1, $v22]}
+    @{[vaesem_vs $v1, $v23]}
+    @{[vaesef_vs $v1, $v24]}
+
+    @{[vse32_v $v1, ($OUTP)]}
+    ret
+.size L_enc_256,.-L_enc_256
+___
+}
+
+################################################################################
+# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+{
+my ($INP,$OUTP,$KEYP) = ("a0", "a1", "a2");
+my ($T0,$T1, $rounds, $T6) = ("a3", "a4", "t5", "t6");
+my ($v0,  $v1,  $v2,  $v3,  $v4,  $v5,  $v6,
+          $v7,  $v8,  $v9,  $v10, $v11, $v12,
+          $v13, $v14, $v15, $v16, $v17, $v18,
+          $v19, $v20, $v21, $v22, $v23, $v24,
+) = map("v$_",(0..24));
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_decrypt
+.type rv64i_zvkned_decrypt,\@function
+rv64i_zvkned_decrypt:
+    # Load number of rounds
+    lwu     $rounds, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T6, 14
+    beq $rounds, $T6, L_dec_256
+    li $T6, 10
+    beq $rounds, $T6, L_dec_128
+
+    j L_fail_m2
+.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_128:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v20]}    # with round key w[40,43]
+    @{[vaesdm_vs $v1, $v19]}   # with round key w[36,39]
+    @{[vaesdm_vs $v1, $v18]}   # with round key w[32,35]
+    @{[vaesdm_vs $v1, $v17]}   # with round key w[28,31]
+    @{[vaesdm_vs $v1, $v16]}   # with round key w[24,27]
+    @{[vaesdm_vs $v1, $v15]}   # with round key w[20,23]
+    @{[vaesdm_vs $v1, $v14]}   # with round key w[16,19]
+    @{[vaesdm_vs $v1, $v13]}   # with round key w[12,15]
+    @{[vaesdm_vs $v1, $v12]}   # with round key w[ 8,11]
+    @{[vaesdm_vs $v1, $v11]}   # with round key w[ 4, 7]
+    @{[vaesdf_vs $v1, $v10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $v1, ($OUTP)]}
+
+    ret
+.size L_dec_128,.-L_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_256:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v21, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v22, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v23, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v24, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v24]}    # with round key w[56,59]
+    @{[vaesdm_vs $v1, $v23]}   # with round key w[52,55]
+    @{[vaesdm_vs $v1, $v22]}   # with round key w[48,51]
+    @{[vaesdm_vs $v1, $v21]}   # with round key w[44,47]
+    @{[vaesdm_vs $v1, $v20]}   # with round key w[40,43]
+    @{[vaesdm_vs $v1, $v19]}   # with round key w[36,39]
+    @{[vaesdm_vs $v1, $v18]}   # with round key w[32,35]
+    @{[vaesdm_vs $v1, $v17]}   # with round key w[28,31]
+    @{[vaesdm_vs $v1, $v16]}   # with round key w[24,27]
+    @{[vaesdm_vs $v1, $v15]}   # with round key w[20,23]
+    @{[vaesdm_vs $v1, $v14]}   # with round key w[16,19]
+    @{[vaesdm_vs $v1, $v13]}   # with round key w[12,15]
+    @{[vaesdm_vs $v1, $v12]}   # with round key w[ 8,11]
+    @{[vaesdm_vs $v1, $v11]}   # with round key w[ 4, 7]
+    @{[vaesdf_vs $v1, $v10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $v1, ($OUTP)]}
+
+    ret
+.size L_dec_256,.-L_dec_256
+___
+}
+
+$code .= <<___;
+L_fail_m1:
+    li a0, -1
+    ret
+.size L_fail_m1,.-L_fail_m1
+
+L_fail_m2:
+    li a0, -2
+    ret
+.size L_fail_m2,.-L_fail_m2
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
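
The assembly above stores the AES round count at byte offset 240 of the key
structure ("sw $T1, 240($KEYP)").  That only lines up with struct aes_key in
aes-riscv-glue.c because AES_MAX_KEYLENGTH in <crypto/aes.h> is 15 * 16 = 240
bytes.  A minimal sketch of a compile-time check that could sit next to the
struct definition to make this shared assumption explicit (the helper name is
made up for illustration; it relies on <linux/build_bug.h> and
<linux/stddef.h>):

/* Hypothetical addition to aes-riscv-glue.c: document the layout contract
 * with aes-riscv64-zvkned.pl, which writes the round count at offset 240.
 */
static inline void riscv_aes_check_key_layout(void)
{
	BUILD_BUG_ON(AES_MAX_KEYLENGTH != 240);
	BUILD_BUG_ON(offsetof(struct aes_key, rounds) != AES_MAX_KEYLENGTH);
}
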
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 11/12] RISC-V: crypto: add Zvksed accelerated SM4 encryption implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add support for the SM4 symmetric cipher implemented using the
instructions provided by the Zvksed vector crypto extension.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
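As a quick illustration of what this enables, a minimal sketch of driving the
resulting "sm4" single-block cipher from kernel code.  The function name and
the all-zero test data are placeholders; the single-block cipher API lives in
<crypto/internal/cipher.h> and modules calling into it need
MODULE_IMPORT_NS(CRYPTO_INTERNAL):

#include <linux/err.h>
#include <linux/types.h>
#include <crypto/sm4.h>
#include <crypto/internal/cipher.h>

static int sm4_single_block_demo(void)
{
	u8 key[SM4_KEY_SIZE] = { 0 };
	u8 in[SM4_BLOCK_SIZE] = { 0 };
	u8 out[SM4_BLOCK_SIZE];
	struct crypto_cipher *tfm;
	int ret;

	/* resolves to riscv-sm4-zvksed when registered, sm4-generic otherwise */
	tfm = crypto_alloc_cipher("sm4", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	ret = crypto_cipher_setkey(tfm, key, sizeof(key));
	if (!ret)
		crypto_cipher_encrypt_one(tfm, out, in);

	crypto_free_cipher(tfm);
	return ret;
}
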
 arch/riscv/crypto/Kconfig               |  17 ++
 arch/riscv/crypto/Makefile              |   7 +
 arch/riscv/crypto/sm4-riscv64-glue.c    | 162 +++++++++++++
 arch/riscv/crypto/sm4-riscv64-zvksed.pl | 300 ++++++++++++++++++++++++
 4 files changed, 486 insertions(+)
 create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 8579ce43546d..d1e22482f7c4 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -53,4 +53,21 @@ config CRYPTO_SHA512_RISCV64
 	  Architecture: riscv64
 	  - Zvknhb vector crypto extension
 
+config CRYPTO_SM4_RISCV64
+	tristate "Ciphers: SM4 (ShangMi 4)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_ALGAPI
+	select CRYPTO_SM4
+	select CRYPTO_SM4_GENERIC
+	help
+	  SM4 cipher algorithms (OSCCA GB/T 32907-2016,
+	  ISO/IEC 18033-3:2010/Amd 1:2021)
+
+	  SM4 (GB/T 32907-2016) is a cryptographic standard issued by the
+	  Organization of State Commercial Administration of China (OSCCA)
+	  as an authorized cryptographic algorithm for use within China.
+
+	  Architecture: riscv64
+	  - Zvksed vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 38ee741a9777..1a9f31b185de 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -21,6 +21,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvbb-zvknha.o
 obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
 sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvbb-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
+sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -42,6 +45,10 @@ $(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
 $(obj)/sha512-riscv64-zvbb-zvknhb.S: $(src)/sha512-riscv64-zvbb-zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
+clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c
new file mode 100644
index 000000000000..b4030690f696
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-glue.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv64 port of the OpenSSL SM4 implementation for RISC-V
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/sm4.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+
+struct sm4_key {
+	u32 rkey[SM4_RKEY_WORDS];
+};
+
+void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const struct sm4_key *key);
+void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const struct sm4_key *key);
+int rv64i_zvksed_sm4_set_encrypt_key(const u8 *userKey, struct sm4_key *key);
+int rv64i_zvksed_sm4_set_decrypt_key(const u8 *userKey, struct sm4_key *key);
+
+struct riscv_sm4_ctx {
+	struct crypto_cipher *fallback;
+	struct sm4_key enc_key;
+	struct sm4_key dec_key;
+	unsigned int keylen;
+};
+
+static int riscv64_sm4_init_zvksed(struct crypto_tfm *tfm)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+	const char *alg = crypto_tfm_alg_name(tfm);
+	struct crypto_cipher *fallback;
+
+	fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK);
+	if (IS_ERR(fallback)) {
+		pr_err("Failed to allocate fallback for '%s': %ld\n",
+		       alg, PTR_ERR(fallback));
+		return PTR_ERR(fallback);
+	}
+
+	crypto_cipher_set_flags(fallback,
+				crypto_cipher_get_flags((struct
+							 crypto_cipher *)
+							tfm));
+	ctx->fallback = fallback;
+
+	return 0;
+}
+
+static void riscv64_sm4_exit_zvksed(struct crypto_tfm *tfm)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (ctx->fallback) {
+		crypto_free_cipher(ctx->fallback);
+		ctx->fallback = NULL;
+	}
+}
+
+static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key,
+				     unsigned int keylen)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+	int ret;
+
+	ctx->keylen = keylen;
+
+	kernel_rvv_begin();
+	ret = rv64i_zvksed_sm4_set_encrypt_key(key, &ctx->enc_key);
+	if (ret != 1) {
+		kernel_rvv_end();
+		return -EINVAL;
+	}
+
+	ret = rv64i_zvksed_sm4_set_decrypt_key(key, &ctx->dec_key);
+	kernel_rvv_end();
+	if (ret != 1)
+		return -EINVAL;
+
+	ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
+
+	return ret ? -EINVAL : 0;
+}
+
+static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		rv64i_zvksed_sm4_encrypt(src, dst, &ctx->enc_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_encrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		rv64i_zvksed_sm4_decrypt(src, dst, &ctx->dec_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_decrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+struct crypto_alg riscv64_sm4_zvksed_alg = {
+	.cra_name = "sm4",
+	.cra_driver_name = "riscv-sm4-zvksed",
+	.cra_module = THIS_MODULE,
+	.cra_priority = 300,
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK,
+	.cra_blocksize = SM4_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct riscv_sm4_ctx),
+	.cra_init = riscv64_sm4_init_zvksed,
+	.cra_exit = riscv64_sm4_exit_zvksed,
+	.cra_cipher = {
+		.cia_min_keysize = SM4_KEY_SIZE,
+		.cia_max_keysize = SM4_KEY_SIZE,
+		.cia_setkey = riscv64_sm4_setkey_zvksed,
+		.cia_encrypt = riscv64_sm4_encrypt_zvksed,
+		.cia_decrypt = riscv64_sm4_decrypt_zvksed,
+	},
+};
+
+static int __init riscv64_sm4_mod_init(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKSED) &&
+	    riscv_isa_extension_available(NULL, ZVBB) &&
+	    riscv_vector_vlen() >= 128)
+		return crypto_register_alg(&riscv64_sm4_zvksed_alg);
+
+	return 0;
+}
+
+static void __exit riscv64_sm4_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKSED) &&
+	    riscv_isa_extension_available(NULL, ZVBB) &&
+	    riscv_vector_vlen() >= 128)
+		crypto_unregister_alg(&riscv64_sm4_zvksed_alg);
+}
+
+module_init(riscv64_sm4_mod_init);
+module_exit(riscv64_sm4_mod_fini);
+
+MODULE_DESCRIPTION("SM4 (accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm4");
diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
new file mode 100644
index 000000000000..fa97a58afeec
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
@@ -0,0 +1,300 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector ShangMi Suite: SM4 Block Cipher ('Zvksed')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+####
+# int rv64i_zvksed_sm4_set_encrypt_key(const unsigned char *userKey,
+#                                      SM4_KEY *key);
+#
+{
+my ($ukey,$keys,$fk)=("a0","a1","t0");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_encrypt_key
+.type rv64i_zvksed_sm4_set_encrypt_key,\@function
+rv64i_zvksed_sm4_set_encrypt_key:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the user key
+    @{[vle32_v $vukey, $ukey]}
+    @{[vrev8_v $vukey, $vukey]}
+
+    # Load the FK.
+    la $fk, FK
+    @{[vle32_v $vfk, $fk]}
+
+    # Generate round keys.
+    @{[vxor_vv $vukey, $vukey, $vfk]}
+    @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+    @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+    @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+    @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+    @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+    @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+    @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+    @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+    # Store round keys
+    @{[vse32_v $vk0, $keys]} # rk[0:3]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk1, $keys]} # rk[4:7]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk2, $keys]} # rk[8:11]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk3, $keys]} # rk[12:15]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk4, $keys]} # rk[16:19]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk5, $keys]} # rk[20:23]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk6, $keys]} # rk[24:27]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk7, $keys]} # rk[28:31]
+
+    li a0, 1
+    ret
+.size rv64i_zvksed_sm4_set_encrypt_key,.-rv64i_zvksed_sm4_set_encrypt_key
+___
+}
+
+####
+# int rv64i_zvksed_sm4_set_decrypt_key(const unsigned char *userKey,
+#                                      SM4_KEY *key);
+#
+{
+my ($ukey,$keys,$fk,$stride)=("a0","a1","t0","t1");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_decrypt_key
+.type rv64i_zvksed_sm4_set_decrypt_key,\@function
+rv64i_zvksed_sm4_set_decrypt_key:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the user key
+    @{[vle32_v $vukey, $ukey]}
+    @{[vrev8_v $vukey, $vukey]}
+
+    # Load the FK.
+    la $fk, FK
+    @{[vle32_v $vfk, $fk]}
+
+    # Generate round keys.
+    @{[vxor_vv $vukey, $vukey, $vfk]}
+    @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+    @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+    @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+    @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+    @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+    @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+    @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+    @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+    # Store round keys in reverse order
+    addi $keys, $keys, 12
+    li $stride, -4
+    @{[vsse32_v $vk7, $keys, $stride]} # rk[31:28]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk6, $keys, $stride]} # rk[27:24]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk5, $keys, $stride]} # rk[23:20]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk4, $keys, $stride]} # rk[19:16]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk3, $keys, $stride]} # rk[15:12]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk2, $keys, $stride]} # rk[11:8]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk1, $keys, $stride]} # rk[7:4]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk0, $keys, $stride]} # rk[3:0]
+
+    li a0, 1
+    ret
+.size rv64i_zvksed_sm4_set_decrypt_key,.-rv64i_zvksed_sm4_set_decrypt_key
+___
+}
+
+####
+# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out,
+#                               const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_encrypt
+.type rv64i_zvksed_sm4_encrypt,\@function
+rv64i_zvksed_sm4_encrypt:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Order of elements was adjusted in set_encrypt_key()
+    @{[vle32_v $vk0, $keys]} # rk[0:3]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk1, $keys]} # rk[4:7]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk2, $keys]} # rk[8:11]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[12:15]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk4, $keys]} # rk[16:19]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk5, $keys]} # rk[20:23]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk6, $keys]} # rk[24:27]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk7, $keys]} # rk[28:31]
+
+    # Load input data
+    @{[vle32_v $vdata, $in]}
+    @{[vrev8_v $vdata, $vdata]}
+
+    # Encrypt with all keys
+    @{[vsm4r_vs $vdata, $vk0]}
+    @{[vsm4r_vs $vdata, $vk1]}
+    @{[vsm4r_vs $vdata, $vk2]}
+    @{[vsm4r_vs $vdata, $vk3]}
+    @{[vsm4r_vs $vdata, $vk4]}
+    @{[vsm4r_vs $vdata, $vk5]}
+    @{[vsm4r_vs $vdata, $vk6]}
+    @{[vsm4r_vs $vdata, $vk7]}
+
+    # Save the ciphertext (in reverse element order)
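+    # (vrev8 restores big-endian byte order within each word and the -4
+    # stride store starting at out+12 writes the four words back reversed,
+    # which is SM4's final reverse transform R.)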
+    @{[vrev8_v $vdata, $vdata]}
+    li $stride, -4
+    addi $out, $out, 12
+    @{[vsse32_v $vdata, $out, $stride]}
+
+    ret
+.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt
+___
+}
+
+####
+# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out,
+#                               const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_decrypt
+.type rv64i_zvksed_sm4_decrypt,\@function
+rv64i_zvksed_sm4_decrypt:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Order of elements was adjusted in set_decrypt_key()
+    @{[vle32_v $vk7, $keys]} # rk[31:28]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk6, $keys]} # rk[27:24]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk5, $keys]} # rk[23:20]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk4, $keys]} # rk[19:16]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[15:12]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk2, $keys]} # rk[11:8]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk1, $keys]} # rk[7:4]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk0, $keys]} # rk[3:0]
+
+    # Load input data
+    @{[vle32_v $vdata, $in]}
+    @{[vrev8_v $vdata, $vdata]}
+
+    # Decrypt with all keys
+    @{[vsm4r_vs $vdata, $vk7]}
+    @{[vsm4r_vs $vdata, $vk6]}
+    @{[vsm4r_vs $vdata, $vk5]}
+    @{[vsm4r_vs $vdata, $vk4]}
+    @{[vsm4r_vs $vdata, $vk3]}
+    @{[vsm4r_vs $vdata, $vk2]}
+    @{[vsm4r_vs $vdata, $vk1]}
+    @{[vsm4r_vs $vdata, $vk0]}
+
+    # Save the plaintext (in reverse element order)
+    @{[vrev8_v $vdata, $vdata]}
+    li $stride, -4
+    addi $out, $out, 12
+    @{[vsse32_v $vdata, $out, $stride]}
+
+    ret
+.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt
+___
+}
+
+$code .= <<___;
+# Family Key (little-endian 32-bit chunks)
+.p2align 3
+FK:
+    .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
+.size FK,.-FK
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.39.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 12/12] RISC-V: crypto: add Zvksh accelerated SM3 hash implementation
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-11 15:37   ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-11 15:37 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: aou, heiko, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, ebiggers,
	Heiko Stuebner, Charalampos Mitrodimas

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add support for the SM3 hash function implemented using the special
instructions provided by the Zvksh vector crypto instructions.
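
Nothing has to select this driver explicitly: it registers the generic "sm3"
shash at priority 150, so existing users of "sm3" pick it up automatically
when the Zvksh extension is present. A minimal sketch of such a user (generic
kernel crypto API; "data"/"len" are placeholders and error handling is
trimmed):

	struct crypto_shash *tfm = crypto_alloc_shash("sm3", 0, 0);
	u8 digest[SM3_DIGEST_SIZE];

	if (!IS_ERR(tfm)) {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		/* dispatches to sm3-riscv64-zvksh when it is registered */
		crypto_shash_digest(desc, data, len, digest);
		crypto_free_shash(tfm);
	}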

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig              |  11 ++
 arch/riscv/crypto/Makefile             |   8 +-
 arch/riscv/crypto/sm3-riscv64-glue.c   | 112 ++++++++++++
 arch/riscv/crypto/sm3-riscv64-zvksh.pl | 225 +++++++++++++++++++++++++
 4 files changed, 355 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index d1e22482f7c4..0c4e575ba8d6 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -53,6 +53,17 @@ config CRYPTO_SHA512_RISCV64
 	  Architecture: riscv64
 	  - Zvknhb vector crypto extension
 
+config CRYPTO_SM3_RISCV64
+	tristate "Hash functions: SM3 (ShangMi 3)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SM3
+	help
+	  SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
+
+	  Architecture: riscv64
+	  - Zvksh vector crypto extension
+
 config CRYPTO_SM4_RISCV64
 	tristate "Ciphers: SM4 (ShangMi 4)"
 	depends on 64BIT && RISCV_ISA_V
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 1a9f31b185de..f7faba6c12c9 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -21,6 +21,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvbb-zvknha.o
 obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
 sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvbb-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
+sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o
+
 obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
 sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
 
@@ -45,10 +48,13 @@ $(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
 $(obj)/sha512-riscv64-zvbb-zvknhb.S: $(src)/sha512-riscv64-zvbb-zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl
+	$(call cmd,perlasm)
+
 $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
 	$(call cmd,perlasm)
 
 clean-files += aes-riscv64-zvkned.S
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
-clean-files += sm4-riscv64-zvksed.S
+clean-files += sm3-riscv64-zvksh.S sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c
new file mode 100644
index 000000000000..75b2816e2f2b
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-glue.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SM3 implementation for RISCV64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sm3_base.h>
+
+asmlinkage void ossl_hwsm3_block_data_order_zvksh(u32 *digest, const void *o,
+						  unsigned int num);
+
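+/*
+ * Thin adapter: sm3_base_do_update()/sm3_base_do_finalize() pass an int block
+ * count, while the generated assembly takes an unsigned int, so just forward.
+ */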
+static void __sm3_block_data_order(struct sm3_state *sst, u8 const *src,
+				      int blocks)
+{
+	ossl_hwsm3_block_data_order_zvksh(sst->state, src, blocks);
+}
+
+static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sm3_base_do_update(desc, data, len,
+					    __sm3_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else {
+		sm3_update(shash_desc_ctx(desc), data, len);
+		return 0;
+	}
+}
+
+static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+
+	if (!crypto_simd_usable()) {
+		struct sm3_state *sctx = shash_desc_ctx(desc);
+
+		if (len)
+			sm3_update(sctx, data, len);
+		sm3_final(sctx, out);
+		return 0;
+	}
+
+	kernel_rvv_begin();
+	if (len)
+		sm3_base_do_update(desc, data, len,
+				   __sm3_block_data_order);
+
+	sm3_base_do_finalize(desc, __sm3_block_data_order);
+	kernel_rvv_end();
+
+	return sm3_base_finish(desc, out);
+}
+
+static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sm3_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sm3_alg = {
+	.digestsize		= SM3_DIGEST_SIZE,
+	.init			= sm3_base_init,
+	.update			= riscv64_sm3_update,
+	.final			= riscv64_sm3_final,
+	.finup			= riscv64_sm3_finup,
+	.descsize		= sizeof(struct sm3_state),
+	.base.cra_name		= "sm3",
+	.base.cra_driver_name	= "sm3-riscv64-zvksh",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SM3_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sm3_mod_init(void)
+{
+	/* SM3 needs VLEN >= 256 (eight 32-bit words per vector register) */
+	if (riscv_isa_extension_available(NULL, ZVKSH) &&
+	    riscv_isa_extension_available(NULL, ZVBB) &&
+	    riscv_vector_vlen() >= 256)
+		return crypto_register_shash(&sm3_alg);
+
+	return 0;
+}
+
+static void __exit sm3_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKSH) &&
+	    riscv_isa_extension_available(NULL, ZVBB) &&
+	    riscv_vector_vlen() >= 256)
+		crypto_unregister_shash(&sm3_alg);
+}
+
+module_init(sm3_mod_init);
+module_exit(sm3_mod_fini);
+
+MODULE_DESCRIPTION("SM3 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm3");
diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
new file mode 100644
index 000000000000..3c21df31793b
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
@@ -0,0 +1,225 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 256
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - ShangMi Suite: SM3 Secure Hash ('Zvksh')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num);
+{
+my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2");
+my ($V0, $V1, $V2, $V3, $V4) = ("v0", "v1", "v2", "v3", "v4");
+
+$code .= <<___;
+.text
+.p2align 3
+.globl ossl_hwsm3_block_data_order_zvksh
+.type ossl_hwsm3_block_data_order_zvksh,\@function
+ossl_hwsm3_block_data_order_zvksh:
+    @{[vsetivli__x0_8_e32_m1_ta_ma]}
+
+    # Load initial state of hash context (c->A-H).
+    @{[vle32_v $V0, $CTX]}
+    @{[vrev8_v $V0, $V0]}
+
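+    # Register use in the loop below: v0 is the working state (A..H), v1 the
+    # state saved at the top of each block, v3/v4 hold the current sixteen
+    # message words (refreshed by vsm3me), and v2 is scratch for the slides.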
+L_sm3_loop:
+    # Copy the previous state to v1.
+    # It will be XOR'ed with the current state at the end of the round.
+    @{[vmv_v_v $V1, $V0]}
+
+    # Load the 64B block in 2x32B chunks.
+    @{[vle32_v $V3, $INPUT]} # v3 := {w7, ..., w0}
+    add $INPUT, $INPUT, 32
+
+    @{[vle32_v $V4, $INPUT]} # v4 := {w15, ..., w8}
+    add $INPUT, $INPUT, 32
+
+    add $NUM, $NUM, -1
+
+    # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input
+    # 2 elements down so we process elements w2, w3, w6, w7
+    # This will be repeated for each odd round.
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {X, X, w7, ..., w2}
+
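+    # Each vsm3c.vi performs two compression rounds; the immediates 0..31
+    # below cover all 64 rounds of the block.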
+    @{[vsm3c_vi $V0, $V3, 0]}
+    @{[vsm3c_vi $V0, $V2, 1]}
+
+    # Prepare a vector with {w11, ..., w4}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w7, ..., w4}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w11, w10, w9, w8, w7, w6, w5, w4}
+
+    @{[vsm3c_vi $V0, $V2, 2]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w11, w10, w9, w8, w7, w6}
+    @{[vsm3c_vi $V0, $V2, 3]}
+
+    @{[vsm3c_vi $V0, $V4, 4]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {X, X, w15, w14, w13, w12, w11, w10}
+    @{[vsm3c_vi $V0, $V2, 5]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}   # v3 := {w23, w22, w21, w20, w19, w18, w17, w16}
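+    # (vsm3me.vv derives the next eight message words from the previous
+    # sixteen held in vs1/vs2.)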
+
+    # Prepare a register with {w19, w18, w17, w16, w15, w14, w13, w12}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w15, w14, w13, w12}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w19, w18, w17, w16, w15, w14, w13, w12}
+
+    @{[vsm3c_vi $V0, $V2, 6]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w19, w18, w17, w16, w15, w14}
+    @{[vsm3c_vi $V0, $V2, 7]}
+
+    @{[vsm3c_vi $V0, $V3, 8]}
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {X, X, w23, w22, w21, w20, w19, w18}
+    @{[vsm3c_vi $V0, $V2, 9]}
+
+    @{[vsm3me_vv $V4, $V3, $V4]}   # v4 := {w31, w30, w29, w28, w27, w26, w25, w24}
+
+    # Prepare a register with {w27, w26, w25, w24, w23, w22, w21, w20}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w23, w22, w21, w20}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w27, w26, w25, w24, w23, w22, w21, w20}
+
+    @{[vsm3c_vi $V0, $V2, 10]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w27, w26, w25, w24, w23, w22}
+    @{[vsm3c_vi $V0, $V2, 11]}
+
+    @{[vsm3c_vi $V0, $V4, 12]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {X, X, w31, w30, w29, w28, w27, w26}
+    @{[vsm3c_vi $V0, $V2, 13]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}   # v3 := {w39, w38, w37, w36, w35, w34, w33, w32}
+
+    # Prepare a register with {w35, w34, w33, w32, w31, w30, w29, w28}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w31, w30, w29, w28}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w35, w34, w33, w32, w31, w30, w29, w28}
+
+    @{[vsm3c_vi $V0, $V2, 14]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w35, w34, w33, w32, w31, w30}
+    @{[vsm3c_vi $V0, $V2, 15]}
+
+    @{[vsm3c_vi $V0, $V3, 16]}
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {X, X, w39, w38, w37, w36, w35, w34}
+    @{[vsm3c_vi $V0, $V2, 17]}
+
+    @{[vsm3me_vv $V4, $V3, $V4]}   # v4 := {w47, w46, w45, w44, w43, w42, w41, w40}
+
+    # Prepare a register with {w43, w42, w41, w40, w39, w38, w37, w36}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w39, w38, w37, w36}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w43, w42, w41, w40, w39, w38, w37, w36}
+
+    @{[vsm3c_vi $V0, $V2, 18]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w43, w42, w41, w40, w39, w38}
+    @{[vsm3c_vi $V0, $V2, 19]}
+
+    @{[vsm3c_vi $V0, $V4, 20]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {X, X, w47, w46, w45, w44, w43, w42}
+    @{[vsm3c_vi $V0, $V2, 21]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}   # v3 := {w55, w54, w53, w52, w51, w50, w49, w48}
+
+    # Prepare a register with {w51, w50, w49, w48, w47, w46, w45, w44}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w47, w46, w45, w44}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w51, w50, w49, w48, w47, w46, w45, w44}
+
+    @{[vsm3c_vi $V0, $V2, 22]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w51, w50, w49, w48, w47, w46}
+    @{[vsm3c_vi $V0, $V2, 23]}
+
+    @{[vsm3c_vi $V0, $V3, 24]}
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {X, X, w55, w54, w53, w52, w51, w50}
+    @{[vsm3c_vi $V0, $V2, 25]}
+
+    @{[vsm3me_vv $V4, $V3, $V4]}   # v4 := {w63, w62, w61, w60, w59, w58, w57, w56}
+
+    # Prepare a register with {w59, w58, w57, w56, w55, w54, w53, w52}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w55, w54, w53, w52}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w59, w58, w57, w56, w55, w54, w53, w52}
+
+    @{[vsm3c_vi $V0, $V2, 26]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w59, w58, w57, w56, w55, w54}
+    @{[vsm3c_vi $V0, $V2, 27]}
+
+    @{[vsm3c_vi $V0, $V4, 28]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {X, X, w63, w62, w61, w60, w59, w58}
+    @{[vsm3c_vi $V0, $V2, 29]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}   # v3 := {w71, w70, w69, w68, w67, w66, w65, w64}
+
+    # Prepare a register with {w67, w66, w65, w64, w63, w62, w61, w60}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, X, X, w63, w62, w61, w60}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w67, w66, w65, w64, w63, w62, w61, w60}
+
+    @{[vsm3c_vi $V0, $V2, 30]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {X, X, w67, w66, w65, w64, w63, w62}
+    @{[vsm3c_vi $V0, $V2, 31]}
+
+    # XOR in the previous state.
+    @{[vxor_vv $V0, $V0, $V1]}
+
+    bnez $NUM, L_sm3_loop     # Check if there are any more blocks to process
+L_sm3_end:
+    @{[vrev8_v $V0, $V0]}
+    @{[vse32_v $V0, $CTX]}
+    ret
+
+.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.39.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 01/12] riscv: Add support for kernel mode vector
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-11 17:11     ` Rémi Denis-Courmont
  -1 siblings, 0 replies; 100+ messages in thread
From: Rémi Denis-Courmont @ 2023-07-11 17:11 UTC (permalink / raw)
  To: linux-riscv; +Cc: linux-kernel

	Hi,

On Tuesday, 11 July 2023 at 18:37:32 EEST, Heiko Stuebner wrote:
> From: Greentime Hu <greentime.hu@sifive.com>
> 
> Add kernel_rvv_begin() and kernel_rvv_end() function declarations
> and corresponding definitions in kernel_mode_vector.c
> 
> These are needed to wrap uses of vector in kernel mode.
> 
> Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/vector.h        |  17 ++++
>  arch/riscv/kernel/Makefile             |   1 +
>  arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++
>  3 files changed, 150 insertions(+)
>  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> 
> diff --git a/arch/riscv/include/asm/vector.h
> b/arch/riscv/include/asm/vector.h index 3d78930cab51..ac2c23045eec 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -196,6 +196,23 @@ static inline void __switch_to_vector(struct
> task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>  bool riscv_v_vstate_ctrl_user_allowed(void);
> 
> +static inline void riscv_v_flush_cpu_state(void)
> +{
> +	asm volatile (
> +		".option push\n\t"
> +		".option arch, +v\n\t"
> +		"vsetvli	t0, x0, e8, m8, ta, ma\n\t"
> +		"vmv.v.i	v0, 0\n\t"
> +		"vmv.v.i	v8, 0\n\t"
> +		"vmv.v.i	v16, 0\n\t"
> +		"vmv.v.i	v24, 0\n\t"
> +		".option pop\n\t"
> +		: : : "t0");

Why bother with zeroing out the vectors before kernel use? That sounds like it 
will only hide bugs in kernel code - implicitly assuming that everything is 
initially zero. Ditto initialising the vector configuration; if you really want 
to have a fixed initial value rather than "leak" whatever the user set, better use 
an invalid configuration (vill=1), IMO.
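
For illustration, a rough sketch of that alternative (my reading of the V spec
is that writing a vtype value with reserved bits set makes the hardware set
vill and clear vl without touching the register file; the function name below
is made up):

	static inline void riscv_v_invalidate_cpu_state(void)
	{
		asm volatile (
			".option push\n\t"
			".option arch, +v\n\t"
			"vsetvl	t0, x0, %0\n\t"
			".option pop\n\t"
			: : "r" (-1UL) : "t0");
	}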

> +}
> +
> +void kernel_rvv_begin(void);
> +void kernel_rvv_end(void);
> +
>  #else /* ! CONFIG_RISCV_ISA_V  */
> 
>  struct pt_regs;
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 506cc4a9a45a..3f4435746af7 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -61,6 +61,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
>  obj-$(CONFIG_RISCV_M_MODE)	+= traps_misaligned.o
>  obj-$(CONFIG_FPU)		+= fpu.o
>  obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
> +obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
>  obj-$(CONFIG_SMP)		+= smpboot.o
>  obj-$(CONFIG_SMP)		+= smp.o
>  obj-$(CONFIG_SMP)		+= cpu_ops.o
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c
> b/arch/riscv/kernel/kernel_mode_vector.c new file mode 100644
> index 000000000000..2d704190c054
> --- /dev/null
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + * Author: Catalin Marinas <catalin.marinas@arm.com>
> + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> + * Copyright (C) 2021 SiFive
> + */
> +#include <linux/compiler.h>
> +#include <linux/irqflags.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/types.h>
> +
> +#include <asm/vector.h>
> +#include <asm/switch_to.h>
> +
> +DECLARE_PER_CPU(bool, vector_context_busy);
> +DEFINE_PER_CPU(bool, vector_context_busy);
> +
> +/*
> + * may_use_vector - whether it is allowable at this time to issue vector
> + *                instructions or access the vector register file
> + *
> + * Callers must not assume that the result remains true beyond the next
> + * preempt_enable() or return from softirq context.
> + */
> +static __must_check inline bool may_use_vector(void)
> +{
> +	/*
> +	 * vector_context_busy is only set while preemption is disabled,
> +	 * and is clear whenever preemption is enabled. Since
> +	 * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
> +	 * cannot change under our feet -- if it's set we cannot be
> +	 * migrated, and if it's clear we cannot be migrated to a CPU
> +	 * where it is set.
> +	 */
> +	return !in_irq() && !irqs_disabled() && !in_nmi() &&
> +	       !this_cpu_read(vector_context_busy);
> +}
> +
> +/*
> + * Claim ownership of the CPU vector context for use by the calling
> context. + *
> + * The caller may freely manipulate the vector context metadata until
> + * put_cpu_vector_context() is called.
> + */
> +static void get_cpu_vector_context(void)
> +{
> +	bool busy;
> +
> +	preempt_disable();
> +	busy = __this_cpu_xchg(vector_context_busy, true);
> +
> +	WARN_ON(busy);
> +}
> +
> +/*
> + * Release the CPU vector context.
> + *
> + * Must be called from a context in which get_cpu_vector_context() was
> + * previously called, with no call to put_cpu_vector_context() in the
> + * meantime.
> + */
> +static void put_cpu_vector_context(void)
> +{
> +	bool busy = __this_cpu_xchg(vector_context_busy, false);
> +
> +	WARN_ON(!busy);
> +	preempt_enable();
> +}
> +
> +/*
> + * kernel_rvv_begin(): obtain the CPU vector registers for use by the calling
> + * context
> + *
> + * Must not be called unless may_use_vector() returns true.
> + * Task context in the vector registers is saved back to memory as necessary.
> + *
> + * A matching call to kernel_rvv_end() must be made before returning from the
> + * calling context.
> + *
> + * The caller may freely use the vector registers until kernel_rvv_end() is
> + * called.
> + */
> +void kernel_rvv_begin(void)
> +{
> +	if (WARN_ON(!has_vector()))
> +		return;
> +
> +	WARN_ON(!may_use_vector());
> +
> +	/* Acquire kernel mode vector */
> +	get_cpu_vector_context();
> +
> +	/* Save vector state, if any */
> +	riscv_v_vstate_save(current, task_pt_regs(current));
> +
> +	/* Enable vector */
> +	riscv_v_enable();
> +
> +	/* Invalidate vector regs */
> +	riscv_v_flush_cpu_state();
> +}
> +EXPORT_SYMBOL_GPL(kernel_rvv_begin);
> +
> +/*
> + * kernel_rvv_end(): give the CPU vector registers back to the current task
> + *
> + * Must be called from a context in which kernel_rvv_begin() was previously
> + * called, with no call to kernel_rvv_end() in the meantime.
> + *
> + * The caller must not use the vector registers after this function is called,
> + * unless kernel_rvv_begin() is called again in the meantime.
> + */
> +void kernel_rvv_end(void)
> +{
> +	if (WARN_ON(!has_vector()))
> +		return;
> +
> +	/* Invalidate vector regs */
> +	riscv_v_flush_cpu_state();
> +
> +	/* Restore vector state, if any */
> +	riscv_v_vstate_restore(current, task_pt_regs(current));

I thought that the kernel was already nuking user vectors on every system 
call, since the RVV spec says so.

Are you trying to use vectors from interrupts? Otherwise, isn't this flush & 
restore superfluous?

> +
> +	/* disable vector */
> +	riscv_v_disable();
> +
> +	/* release kernel mode vector */
> +	put_cpu_vector_context();
> +}
> +EXPORT_SYMBOL_GPL(kernel_rvv_end);
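
To make the intended calling convention concrete, a hedged usage sketch
follows; the wrapper is hypothetical and xor_regs_2_() only shows up later in
the series:

/* hypothetical caller -- not part of this patch */
static void xor_two_blocks(unsigned long bytes,
			   unsigned long *p1, unsigned long *p2)
{
	kernel_rvv_begin();
	xor_regs_2_(bytes, p1, p2);	/* vector routine from patch 02/12 */
	kernel_rvv_end();
}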


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 02/12] riscv: Add vector extension XOR implementation
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-11 17:33     ` Rémi Denis-Courmont
  -1 siblings, 0 replies; 100+ messages in thread
From: Rémi Denis-Courmont @ 2023-07-11 17:33 UTC (permalink / raw)
  To: linux-riscv, linux-kernel

Le tiistaina 11. heinäkuuta 2023, 18.37.33 EEST Heiko Stuebner a écrit :
> diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
> new file mode 100644
> index 000000000000..3bc059e18171
> --- /dev/null
> +++ b/arch/riscv/lib/xor.S
> @@ -0,0 +1,81 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2021 SiFive
> + */
> +#include <linux/linkage.h>
> +#include <asm-generic/export.h>
> +#include <asm/asm.h>
> +
> +ENTRY(xor_regs_2_)
> +	vsetvli a3, a0, e8, m8, ta, ma

AFAICT, so far, Linux only uses `vsetvli` to save/restore/flush vectors, and
that's, of course, with LMUL=8, so that's not really telling us much.
This function could be the first actual vector optimisation in the kernel
if/when it gets merged.

Should the same group multiplier be used for "actual" vector loops throughout
the kernel? I've seen conflicting advice and opinions here. Should kernel code
always use the maximum possible LMUL, depending on the register pressure of the
loop? Or will that just increase latency with no bandwidth gains compared to,
say, LMUL=1 or LMUL=2?
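
For comparison, a hedged sketch of the quoted xor_regs_2_ loop rewritten with
LMUL=1 (label and register numbering are illustrative only):

xor_regs_2_m1:
	vsetvli	a3, a0, e8, m1, ta, ma
	vle8.v	v0, (a1)
	vle8.v	v1, (a2)
	sub	a0, a0, a3
	vxor.vv	v2, v0, v1
	add	a2, a2, a3
	vse8.v	v2, (a1)
	add	a1, a1, a3
	bnez	a0, xor_regs_2_m1
	ret

Whether the m8 variant buys real bandwidth over something like this, or just
occupies the whole register file per iteration, is exactly the open question.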

> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a3
> +	vxor.vv v16, v0, v8
> +	add a2, a2, a3
> +	vse8.v v16, (a1)
> +	add a1, a1, a3
> +	bnez a0, xor_regs_2_
> +	ret
> +END(xor_regs_2_)
> +EXPORT_SYMBOL(xor_regs_2_)
> +
> +ENTRY(xor_regs_3_)
> +	vsetvli a4, a0, e8, m8, ta, ma
> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a4
> +	vxor.vv v0, v0, v8
> +	vle8.v v16, (a3)
> +	add a2, a2, a4
> +	vxor.vv v16, v0, v16
> +	add a3, a3, a4
> +	vse8.v v16, (a1)
> +	add a1, a1, a4
> +	bnez a0, xor_regs_3_
> +	ret
> +END(xor_regs_3_)
> +EXPORT_SYMBOL(xor_regs_3_)
> +
> +ENTRY(xor_regs_4_)
> +	vsetvli a5, a0, e8, m8, ta, ma
> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a5
> +	vxor.vv v0, v0, v8
> +	vle8.v v16, (a3)
> +	add a2, a2, a5
> +	vxor.vv v0, v0, v16
> +	vle8.v v24, (a4)
> +	add a3, a3, a5
> +	vxor.vv v16, v0, v24
> +	add a4, a4, a5
> +	vse8.v v16, (a1)
> +	add a1, a1, a5
> +	bnez a0, xor_regs_4_
> +	ret
> +END(xor_regs_4_)
> +EXPORT_SYMBOL(xor_regs_4_)
> +
> +ENTRY(xor_regs_5_)
> +	vsetvli a6, a0, e8, m8, ta, ma
> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a6
> +	vxor.vv v0, v0, v8
> +	vle8.v v16, (a3)
> +	add a2, a2, a6
> +	vxor.vv v0, v0, v16
> +	vle8.v v24, (a4)
> +	add a3, a3, a6
> +	vxor.vv v0, v0, v24
> +	vle8.v v8, (a5)
> +	add a4, a4, a6
> +	vxor.vv v16, v0, v8
> +	add a5, a5, a6
> +	vse8.v v16, (a1)
> +	add a1, a1, a6
> +	bnez a0, xor_regs_5_
> +	ret
> +END(xor_regs_5_)
> +EXPORT_SYMBOL(xor_regs_5_)


-- 
レミ・デニ-クールモン
http://www.remlab.net/





^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 05/12] RISC-V: crypto: update perl include with helpers for vector (crypto) instructions
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-11 18:04     ` Rémi Denis-Courmont
  -1 siblings, 0 replies; 100+ messages in thread
From: Rémi Denis-Courmont @ 2023-07-11 18:04 UTC (permalink / raw)
  To: linux-riscv; +Cc: linux-kernel

Le tiistaina 11. heinäkuuta 2023, 18.37.36 EEST Heiko Stuebner a écrit :
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> The openSSL scripts use a number of helpers for handling vector
> instructions and instructions from the vector-crypto-extensions.

Uh but the kernel RVV code requires an assembler that supports the `.option 
arch` directive and the V extension anyway.

Is there a need to wrap vector load/store and ALU instructions from the Vector 
spec? This trick should only be necessary for the Zvk*-specific stuff, AFAICT.

(Also FWIW, this can be done directly with .macro inside a header file, without 
involving Perl.)
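
As an illustration of that route, a hedged sketch of hand-encoding one Zvk*
instruction with .macro; the funct fields below are placeholders, not a
checked encoding, and the real values would have to come from the
vector-crypto spec:

/* e.g. in a header shared by the .S files; operands are register numbers */
.macro	vghsh_vv vd, vs2, vs1
	.word	(0x2c << 26) | (1 << 25) | (\vs2 << 20) | (\vs1 << 15) | (0x2 << 12) | (\vd << 7) | 0x57
.endm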

-- 
Rémi Denis-Courmont
http://www.remlab.net/





^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 03/12] RISC-V: add helper function to read the vector VLEN
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-11 18:06     ` Rémi Denis-Courmont
  -1 siblings, 0 replies; 100+ messages in thread
From: Rémi Denis-Courmont @ 2023-07-11 18:06 UTC (permalink / raw)
  To: linux-riscv; +Cc: linux-kernel

Le tiistaina 11. heinäkuuta 2023, 18.37.34 EEST Heiko Stuebner a écrit :
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> VLEN describes the length of each vector register and some instructions
> need specific minimal VLENs to work correctly.
> 
> The vector code already includes a variable riscv_vsize that contains the
> value of "32 vector registers with vlenb length" that gets filled during
> boot. vlenb is the value contained in the CSR_VLENB register and
> the value represents "VLEN / 8".
> 
> So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
> users when they need to check the available VLEN.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/vector.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index ac2c23045eec..88cf76a2316d 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -232,4 +232,15 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
> 
>  #endif /* CONFIG_RISCV_ISA_V */
> 
> +/*
> + * Return the implementation's vlen value.
> + *
> + * riscv_vsize contains the value of "32 vector registers with vlenb length"
> + * so rebuild the vlen value in bits from it.
> + */
> +static inline int riscv_vector_vlen(void)
> +{
> +	return riscv_v_vsize / 32 * 8;
> +}

KVM already has a bespoke conversion to bytes (rather than bits). Factor code?
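
A hedged sketch of what such factoring could look like -- the helper names are
assumptions (for example, with vlenb = 16 bytes, riscv_v_vsize is 512 and the
result is VLEN = 128 bits):

/* bytes per vector register, usable by KVM as well */
static inline unsigned long riscv_vector_vlenb(void)
{
	return riscv_v_vsize / 32;
}

/* bits per vector register */
static inline unsigned long riscv_vector_vlen(void)
{
	return riscv_vector_vlenb() * 8;
}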

> +
>  #endif /* ! __ASM_RISCV_VECTOR_H */


-- 
レミ・デニ-クールモン
http://www.remlab.net/





^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 04/12] RISC-V: add vector crypto extension detection
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-12 10:40     ` Anup Patel
  -1 siblings, 0 replies; 100+ messages in thread
From: Anup Patel @ 2023-07-12 10:40 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	ebiggers, Heiko Stuebner

On Tue, Jul 11, 2023 at 9:09 PM Heiko Stuebner <heiko@sntech.de> wrote:
>
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>
> Add detection for some extensions of the vector-crypto specification:
> - Zvkb: Vector Bit-manipulation used in Cryptography
> - Zvkg: Vector GCM/GMAC
> - Zvknha and Zvknhb: NIST Algorithm Suite
> - Zvkns: AES-128, AES-256 Single Round Suite
> - Zvksed: ShangMi Algorithm Suite
> - Zvksh: ShangMi Algorithm Suite

Any plan to allow user space to detect these extensions via HWPROBE?

Regards,
Anup
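
For context, a hedged sketch of what such a user-space query could look like.
The hwprobe syscall and RISCV_HWPROBE_KEY_IMA_EXT_0 already exist; the Zvkned
bit below is a made-up placeholder, since this series defines no hwprobe bits
for the vector-crypto extensions:

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/hwprobe.h>

/* placeholder only -- not a real hwprobe bit */
#define HWPROBE_EXT_ZVKNED_PLACEHOLDER	(1ULL << 60)

int main(void)
{
	struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

	/* one pair, all online CPUs (empty cpuset), no flags */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0))
		return 1;

	printf("Zvkned: %s\n",
	       (pair.value & HWPROBE_EXT_ZVKNED_PLACEHOLDER) ? "yes" : "no");
	return 0;
}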

>
> As their use is very specific and will likely be limited to special places
> we expect current code to just pre-encode those instructions, so right now
> we don't introduce toolchain requirements.
>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/hwcap.h |  9 ++++++
>  arch/riscv/kernel/cpu.c        |  8 ++++++
>  arch/riscv/kernel/cpufeature.c | 50 ++++++++++++++++++++++++++++++++++
>  3 files changed, 67 insertions(+)
>
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index b80ca6e77088..0f5172fa87b0 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -64,6 +64,15 @@
>  #define RISCV_ISA_EXT_ZKSED            51
>  #define RISCV_ISA_EXT_ZKSH             52
>  #define RISCV_ISA_EXT_ZKT              53
> +#define RISCV_ISA_EXT_ZVBB             54
> +#define RISCV_ISA_EXT_ZVBC             55
> +#define RISCV_ISA_EXT_ZVKG             56
> +#define RISCV_ISA_EXT_ZVKNED           57
> +#define RISCV_ISA_EXT_ZVKNHA           58
> +#define RISCV_ISA_EXT_ZVKNHB           59
> +#define RISCV_ISA_EXT_ZVKSED           60
> +#define RISCV_ISA_EXT_ZVKSH            61
> +#define RISCV_ISA_EXT_ZVKT             62
>
>  #define RISCV_ISA_EXT_MAX              64
>  #define RISCV_ISA_EXT_NAME_LEN_MAX     32
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 10524322a4c0..925241e25db2 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -227,6 +227,14 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>         __RISCV_ISA_EXT_DATA(zksed, RISCV_ISA_EXT_ZKSED),
>         __RISCV_ISA_EXT_DATA(zksh, RISCV_ISA_EXT_ZKSH),
>         __RISCV_ISA_EXT_DATA(zkt, RISCV_ISA_EXT_ZKT),
> +       __RISCV_ISA_EXT_DATA(zvbb, RISCV_ISA_EXT_ZVBB),
> +       __RISCV_ISA_EXT_DATA(zvbc, RISCV_ISA_EXT_ZVBC),
> +       __RISCV_ISA_EXT_DATA(zvkg, RISCV_ISA_EXT_ZVKG),
> +       __RISCV_ISA_EXT_DATA(zvkned, RISCV_ISA_EXT_ZVKNED),
> +       __RISCV_ISA_EXT_DATA(zvknha, RISCV_ISA_EXT_ZVKNHA),
> +       __RISCV_ISA_EXT_DATA(zvknhb, RISCV_ISA_EXT_ZVKNHB),
> +       __RISCV_ISA_EXT_DATA(zvksed, RISCV_ISA_EXT_ZVKSED),
> +       __RISCV_ISA_EXT_DATA(zvksh, RISCV_ISA_EXT_ZVKSH),
>         __RISCV_ISA_EXT_DATA(smaia, RISCV_ISA_EXT_SMAIA),
>         __RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
>         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 9a872a2007a5..13556fd16bf6 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -343,6 +343,56 @@ void __init riscv_fill_hwcap(void)
>                                 SET_ISA_EXT_MAP("zksh", RISCV_ISA_EXT_ZKSH);
>                                 SET_ISA_EXT_MAP("zkr", RISCV_ISA_EXT_ZKR);
>                                 SET_ISA_EXT_MAP("zkt", RISCV_ISA_EXT_ZKT);
> +                               SET_ISA_EXT_MAP("zvbb", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvbc", RISCV_ISA_EXT_ZVBC);
> +                               SET_ISA_EXT_MAP("zvkg", RISCV_ISA_EXT_ZVKG);
> +                               SET_ISA_EXT_MAP("zvkned", RISCV_ISA_EXT_ZVKNED);
> +                               SET_ISA_EXT_MAP("zvknha", RISCV_ISA_EXT_ZVKNHA);
> +                               SET_ISA_EXT_MAP("zvknhb", RISCV_ISA_EXT_ZVKNHB);
> +                               SET_ISA_EXT_MAP("zvksed", RISCV_ISA_EXT_ZVKSED);
> +                               SET_ISA_EXT_MAP("zvksh", RISCV_ISA_EXT_ZVKSH);
> +                               SET_ISA_EXT_MAP("zvkt", RISCV_ISA_EXT_ZVKT);
> +
> +                               /* NIST Algorithm Suite */
> +                               SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVKNED);
> +                               SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVKNHB);
> +                               SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvkn", RISCV_ISA_EXT_ZVKT);
> +
> +                               /* NIST Algorithm Suite with carryless multiply */
> +                               SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVKNED);
> +                               SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVKNHB);
> +                               SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVKT);
> +                               SET_ISA_EXT_MAP("zvknc", RISCV_ISA_EXT_ZVBC);
> +
> +                               /* NIST Algorithm Suite with GCM */
> +                               SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKNED);
> +                               SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKNHB);
> +                               SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKT);
> +                               SET_ISA_EXT_MAP("zvkng", RISCV_ISA_EXT_ZVKG);
> +
> +                               /*  ShangMi Algorithm Suite */
> +                               SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVKSED);
> +                               SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVKSH);
> +                               SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvks", RISCV_ISA_EXT_ZVKT);
> +
> +                               /* ShangMi Algorithm Suite with carryless multiply */
> +                               SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVKSED);
> +                               SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVKSH);
> +                               SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVKT);
> +                               SET_ISA_EXT_MAP("zvksc", RISCV_ISA_EXT_ZVBC);
> +
> +                               /* ShangMi Algorithm Suite with GCM */
> +                               SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKSED);
> +                               SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKSH);
> +                               SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVBB);
> +                               SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKT);
> +                               SET_ISA_EXT_MAP("zvksg", RISCV_ISA_EXT_ZVKG);
> +
>                         }
>  #undef SET_ISA_EXT_MAP
>                 }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-13  7:40   ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-07-13  7:40 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> This series provides cryptographic implementations using the vector
> crypto extensions.
> 
> v13 of the vector patchset dropped the patches for in-kernel usage of
> vector instructions, I picked the ones from v12 over into this series
> for now.
> 
> My basic goal was to not re-invent cryptographic code, so the heavy
> lifting is done by those perl-asm scripts used in openssl and the perl
> code used here-in stems from code that is targetted at openssl [0] and is
> unmodified from there to limit needed review effort.
> 
> With a matching qemu (there are patches for vector-crypto flying around)
> the in-kernel crypto-selftests (also the extended ones) are very happy
> so far.

What does this patchset apply to?  I tried torvalds/master, linux-next/master,
riscv/for-next, and cryptodev/master.  Nothing worked.  When sending a
patch(set), please always use the '--base' option to 'git format-patch', or
explicitly mention what it applies to, or provide a link to a git repo.
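
For example, an illustrative invocation (the patch count and version number
are placeholders):

    git format-patch --base=auto --cover-letter -v5 -12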

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 01/12] riscv: Add support for kernel mode vector
  2023-07-11 17:11     ` Rémi Denis-Courmont
@ 2023-07-13 17:19       ` Andy Chiu
  -1 siblings, 0 replies; 100+ messages in thread
From: Andy Chiu @ 2023-07-13 17:19 UTC (permalink / raw)
  To: Rémi Denis-Courmont; +Cc: linux-riscv, linux-kernel

Hi, Heiko

On Wed, Jul 12, 2023 at 1:15 AM Rémi Denis-Courmont <remi@remlab.net> wrote:
>
>         Hi,
>
> Le tiistaina 11. heinäkuuta 2023, 18.37.32 EEST Heiko Stuebner a écrit :
> > From: Greentime Hu <greentime.hu@sifive.com>
> >
> > Add kernel_rvv_begin() and kernel_rvv_end() function declarations
> > and corresponding definitions in kernel_mode_vector.c
> >
> > These are needed to wrap uses of vector in kernel mode.
> >
> > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/include/asm/vector.h        |  17 ++++
> >  arch/riscv/kernel/Makefile             |   1 +
> >  arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++
> >  3 files changed, 150 insertions(+)
> >  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> >
> > diff --git a/arch/riscv/include/asm/vector.h
> > b/arch/riscv/include/asm/vector.h index 3d78930cab51..ac2c23045eec 100644
> > --- a/arch/riscv/include/asm/vector.h
> > +++ b/arch/riscv/include/asm/vector.h
> > @@ -196,6 +196,23 @@ static inline void __switch_to_vector(struct
> > task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
> >  bool riscv_v_vstate_ctrl_user_allowed(void);
> >
> > +static inline void riscv_v_flush_cpu_state(void)
> > +{
> > +     asm volatile (
> > +             ".option push\n\t"
> > +             ".option arch, +v\n\t"
> > +             "vsetvli        t0, x0, e8, m8, ta, ma\n\t"
> > +             "vmv.v.i        v0, 0\n\t"
> > +             "vmv.v.i        v8, 0\n\t"
> > +             "vmv.v.i        v16, 0\n\t"
> > +             "vmv.v.i        v24, 0\n\t"
> > +             ".option pop\n\t"
> > +             : : : "t0");
>
> Why bother with zeroing out the vectors before kernel use? That sounds like it
> will only hide bugs in kernel code - implicitly assuming that everything is
> initially zero. Ditto initialising the vector configuration; if you really want
> to have a fixed initial value rather than "leak" whatever user set, better use
> an invalid configuration (vill=1), IMO.

Yes, I agree that we don't have to zero out (or invalidate) the v registers
before any kernel use. And we only have to restore the user's v registers
once, right before returning to user space. I am actually going to send out
a series (for kernel-mode vector) with these improvements in a few days.
Would it be OK for you to drop the first two patches and rebase the crypto
vector series on top of mine at the next respin? Or is there a better way
you can think of for us to cooperate on this?
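
A hedged sketch of that "restore once, on the way back to user space" idea,
reusing the helpers from the patch above; the thread flag name is an
assumption, not something from a posted patch:

void kernel_rvv_end(void)
{
	if (WARN_ON(!has_vector()))
		return;

	/* disable vector and release the per-CPU context as before ... */
	riscv_v_disable();
	put_cpu_vector_context();

	/*
	 * ... but do not reload the user's vector state here; just mark the
	 * task and let the return-to-user path do a single restore.
	 */
	set_tsk_thread_flag(current, TIF_RISCV_V_DEFER_RESTORE);
}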

>
> > +}
> > +
> > +void kernel_rvv_begin(void);
> > +void kernel_rvv_end(void);
> > +
> >  #else /* ! CONFIG_RISCV_ISA_V  */
> >
> >  struct pt_regs;
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 506cc4a9a45a..3f4435746af7 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -61,6 +61,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> >  obj-$(CONFIG_RISCV_M_MODE)   += traps_misaligned.o
> >  obj-$(CONFIG_FPU)            += fpu.o
> >  obj-$(CONFIG_RISCV_ISA_V)    += vector.o
> > +obj-$(CONFIG_RISCV_ISA_V)    += kernel_mode_vector.o
> >  obj-$(CONFIG_SMP)            += smpboot.o
> >  obj-$(CONFIG_SMP)            += smp.o
> >  obj-$(CONFIG_SMP)            += cpu_ops.o
> > diff --git a/arch/riscv/kernel/kernel_mode_vector.c
> > b/arch/riscv/kernel/kernel_mode_vector.c new file mode 100644
> > index 000000000000..2d704190c054
> > --- /dev/null
> > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > @@ -0,0 +1,132 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2012 ARM Ltd.
> > + * Author: Catalin Marinas <catalin.marinas@arm.com>
> > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > + * Copyright (C) 2021 SiFive
> > + */
> > +#include <linux/compiler.h>
> > +#include <linux/irqflags.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/types.h>
> > +
> > +#include <asm/vector.h>
> > +#include <asm/switch_to.h>
> > +
> > +DECLARE_PER_CPU(bool, vector_context_busy);
> > +DEFINE_PER_CPU(bool, vector_context_busy);
> > +
> > +/*
> > + * may_use_vector - whether it is allowable at this time to issue vector
> > + *                instructions or access the vector register file
> > + *
> > + * Callers must not assume that the result remains true beyond the next
> > + * preempt_enable() or return from softirq context.
> > + */
> > +static __must_check inline bool may_use_vector(void)
> > +{
> > +     /*
> > +      * vector_context_busy is only set while preemption is disabled,
> > +      * and is clear whenever preemption is enabled. Since
> > +      * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
> > +      * cannot change under our feet -- if it's set we cannot be
> > +      * migrated, and if it's clear we cannot be migrated to a CPU
> > +      * where it is set.
> > +      */
> > +     return !in_irq() && !irqs_disabled() && !in_nmi() &&
> > +            !this_cpu_read(vector_context_busy);
> > +}
> > +
> > +/*
> > + * Claim ownership of the CPU vector context for use by the calling
> > context. + *
> > + * The caller may freely manipulate the vector context metadata until
> > + * put_cpu_vector_context() is called.
> > + */
> > +static void get_cpu_vector_context(void)
> > +{
> > +     bool busy;
> > +
> > +     preempt_disable();
> > +     busy = __this_cpu_xchg(vector_context_busy, true);
> > +
> > +     WARN_ON(busy);
> > +}
> > +
> > +/*
> > + * Release the CPU vector context.
> > + *
> > + * Must be called from a context in which get_cpu_vector_context() was
> > + * previously called, with no call to put_cpu_vector_context() in the
> > + * meantime.
> > + */
> > +static void put_cpu_vector_context(void)
> > +{
> > +     bool busy = __this_cpu_xchg(vector_context_busy, false);
> > +
> > +     WARN_ON(!busy);
> > +     preempt_enable();
> > +}
> > +
> > +/*
> > + * kernel_rvv_begin(): obtain the CPU vector registers for use by the
> > calling + * context
> > + *
> > + * Must not be called unless may_use_vector() returns true.
> > + * Task context in the vector registers is saved back to memory as
> > necessary. + *
> > + * A matching call to kernel_rvv_end() must be made before returning from
> > the + * calling context.
> > + *
> > + * The caller may freely use the vector registers until kernel_rvv_end() is
> > + * called.
> > + */
> > +void kernel_rvv_begin(void)
> > +{
> > +     if (WARN_ON(!has_vector()))
> > +             return;
> > +
> > +     WARN_ON(!may_use_vector());
> > +
> > +     /* Acquire kernel mode vector */
> > +     get_cpu_vector_context();
> > +
> > +     /* Save vector state, if any */
> > +     riscv_v_vstate_save(current, task_pt_regs(current));
> > +
> > +     /* Enable vector */
> > +     riscv_v_enable();
> > +
> > +     /* Invalidate vector regs */
> > +     riscv_v_flush_cpu_state();
> > +}
> > +EXPORT_SYMBOL_GPL(kernel_rvv_begin);
> > +
> > +/*
> > + * kernel_rvv_end(): give the CPU vector registers back to the current task
> > + *
> > + * Must be called from a context in which kernel_rvv_begin() was previously
> > + * called, with no call to kernel_rvv_end() in the meantime.
> > + *
> > + * The caller must not use the vector registers after this function is
> > called, + * unless kernel_rvv_begin() is called again in the meantime.
> > + */
> > +void kernel_rvv_end(void)
> > +{
> > +     if (WARN_ON(!has_vector()))
> > +             return;
> > +
> > +     /* Invalidate vector regs */
> > +     riscv_v_flush_cpu_state();
> > +
> > +     /* Restore vector state, if any */
> > +     riscv_v_vstate_restore(current, task_pt_regs(current));
>
> I thought that the kernel was already nuking user vectors on every system
> call, since the RVV spec says so.
>
> Are you trying to use vectors from interrupts? Otherwise, isn't this flush &
> restore superfluous?
>
> > +
> > +     /* disable vector */
> > +     riscv_v_disable();
> > +
> > +     /* release kernel mode vector */
> > +     put_cpu_vector_context();
> > +}
> > +EXPORT_SYMBOL_GPL(kernel_rvv_end);
>
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
>
>
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Thanks.

Andy

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 01/12] riscv: Add support for kernel mode vector
@ 2023-07-13 17:19       ` Andy Chiu
  0 siblings, 0 replies; 100+ messages in thread
From: Andy Chiu @ 2023-07-13 17:19 UTC (permalink / raw)
  To: Rémi Denis-Courmont; +Cc: linux-riscv, linux-kernel

Hi, Heiko

On Wed, Jul 12, 2023 at 1:15 AM Rémi Denis-Courmont <remi@remlab.net> wrote:
>
>         Hi,
>
> Le tiistaina 11. heinäkuuta 2023, 18.37.32 EEST Heiko Stuebner a écrit :
> > From: Greentime Hu <greentime.hu@sifive.com>
> >
> > Add kernel_rvv_begin() and kernel_rvv_end() function declarations
> > and corresponding definitions in kernel_mode_vector.c
> >
> > These are needed to wrap uses of vector in kernel mode.
> >
> > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/include/asm/vector.h        |  17 ++++
> >  arch/riscv/kernel/Makefile             |   1 +
> >  arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++
> >  3 files changed, 150 insertions(+)
> >  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> >
> > diff --git a/arch/riscv/include/asm/vector.h
> > b/arch/riscv/include/asm/vector.h index 3d78930cab51..ac2c23045eec 100644
> > --- a/arch/riscv/include/asm/vector.h
> > +++ b/arch/riscv/include/asm/vector.h
> > @@ -196,6 +196,23 @@ static inline void __switch_to_vector(struct
> > task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
> >  bool riscv_v_vstate_ctrl_user_allowed(void);
> >
> > +static inline void riscv_v_flush_cpu_state(void)
> > +{
> > +     asm volatile (
> > +             ".option push\n\t"
> > +             ".option arch, +v\n\t"
> > +             "vsetvli        t0, x0, e8, m8, ta, ma\n\t"
> > +             "vmv.v.i        v0, 0\n\t"
> > +             "vmv.v.i        v8, 0\n\t"
> > +             "vmv.v.i        v16, 0\n\t"
> > +             "vmv.v.i        v24, 0\n\t"
> > +             ".option pop\n\t"
> > +             : : : "t0");
>
> Why bother with zeroing out the vectors before kernel use? That sounds like it
> will only hide bugs in kernel code - implicitly assuming that everything is
> initially zero. Ditto initialising the vector configuration; if you really want
> to have a fixed initial value rather than "leak" whatever user set, better use
> an invalid configuration (vill=1), IMO.

Yes, I agree that we don't have to zero out (or invalid) v registers
before any kernel uses. And we only have to restore user's v registers
once, before really returning back to the user space. Actually I am
going to send out the series (for kernel-mode vector) having these
improvements in a few days. Does it seem ok to you to drop the first
two patches and rebase on top of mine at the next respin for the
crypto vector? Or is there any good way that you can think of to let
us cooperate on this?

>
> > +}
> > +
> > +void kernel_rvv_begin(void);
> > +void kernel_rvv_end(void);
> > +
> >  #else /* ! CONFIG_RISCV_ISA_V  */
> >
> >  struct pt_regs;
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 506cc4a9a45a..3f4435746af7 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -61,6 +61,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> >  obj-$(CONFIG_RISCV_M_MODE)   += traps_misaligned.o
> >  obj-$(CONFIG_FPU)            += fpu.o
> >  obj-$(CONFIG_RISCV_ISA_V)    += vector.o
> > +obj-$(CONFIG_RISCV_ISA_V)    += kernel_mode_vector.o
> >  obj-$(CONFIG_SMP)            += smpboot.o
> >  obj-$(CONFIG_SMP)            += smp.o
> >  obj-$(CONFIG_SMP)            += cpu_ops.o
> > diff --git a/arch/riscv/kernel/kernel_mode_vector.c
> > b/arch/riscv/kernel/kernel_mode_vector.c new file mode 100644
> > index 000000000000..2d704190c054
> > --- /dev/null
> > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > @@ -0,0 +1,132 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2012 ARM Ltd.
> > + * Author: Catalin Marinas <catalin.marinas@arm.com>
> > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > + * Copyright (C) 2021 SiFive
> > + */
> > +#include <linux/compiler.h>
> > +#include <linux/irqflags.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/types.h>
> > +
> > +#include <asm/vector.h>
> > +#include <asm/switch_to.h>
> > +
> > +DECLARE_PER_CPU(bool, vector_context_busy);
> > +DEFINE_PER_CPU(bool, vector_context_busy);
> > +
> > +/*
> > + * may_use_vector - whether it is allowable at this time to issue vector
> > + *                instructions or access the vector register file
> > + *
> > + * Callers must not assume that the result remains true beyond the next
> > + * preempt_enable() or return from softirq context.
> > + */
> > +static __must_check inline bool may_use_vector(void)
> > +{
> > +     /*
> > +      * vector_context_busy is only set while preemption is disabled,
> > +      * and is clear whenever preemption is enabled. Since
> > +      * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
> > +      * cannot change under our feet -- if it's set we cannot be
> > +      * migrated, and if it's clear we cannot be migrated to a CPU
> > +      * where it is set.
> > +      */
> > +     return !in_irq() && !irqs_disabled() && !in_nmi() &&
> > +            !this_cpu_read(vector_context_busy);
> > +}
> > +
> > +/*
> > + * Claim ownership of the CPU vector context for use by the calling
> > context. + *
> > + * The caller may freely manipulate the vector context metadata until
> > + * put_cpu_vector_context() is called.
> > + */
> > +static void get_cpu_vector_context(void)
> > +{
> > +     bool busy;
> > +
> > +     preempt_disable();
> > +     busy = __this_cpu_xchg(vector_context_busy, true);
> > +
> > +     WARN_ON(busy);
> > +}
> > +
> > +/*
> > + * Release the CPU vector context.
> > + *
> > + * Must be called from a context in which get_cpu_vector_context() was
> > + * previously called, with no call to put_cpu_vector_context() in the
> > + * meantime.
> > + */
> > +static void put_cpu_vector_context(void)
> > +{
> > +     bool busy = __this_cpu_xchg(vector_context_busy, false);
> > +
> > +     WARN_ON(!busy);
> > +     preempt_enable();
> > +}
> > +
> > +/*
> > + * kernel_rvv_begin(): obtain the CPU vector registers for use by the
> > calling + * context
> > + *
> > + * Must not be called unless may_use_vector() returns true.
> > + * Task context in the vector registers is saved back to memory as
> > necessary. + *
> > + * A matching call to kernel_rvv_end() must be made before returning from
> > the + * calling context.
> > + *
> > + * The caller may freely use the vector registers until kernel_rvv_end() is
> > + * called.
> > + */
> > +void kernel_rvv_begin(void)
> > +{
> > +     if (WARN_ON(!has_vector()))
> > +             return;
> > +
> > +     WARN_ON(!may_use_vector());
> > +
> > +     /* Acquire kernel mode vector */
> > +     get_cpu_vector_context();
> > +
> > +     /* Save vector state, if any */
> > +     riscv_v_vstate_save(current, task_pt_regs(current));
> > +
> > +     /* Enable vector */
> > +     riscv_v_enable();
> > +
> > +     /* Invalidate vector regs */
> > +     riscv_v_flush_cpu_state();
> > +}
> > +EXPORT_SYMBOL_GPL(kernel_rvv_begin);
> > +
> > +/*
> > + * kernel_rvv_end(): give the CPU vector registers back to the current task
> > + *
> > + * Must be called from a context in which kernel_rvv_begin() was previously
> > + * called, with no call to kernel_rvv_end() in the meantime.
> > + *
> > + * The caller must not use the vector registers after this function is
> > + * called, unless kernel_rvv_begin() is called again in the meantime.
> > + */
> > +void kernel_rvv_end(void)
> > +{
> > +     if (WARN_ON(!has_vector()))
> > +             return;
> > +
> > +     /* Invalidate vector regs */
> > +     riscv_v_flush_cpu_state();
> > +
> > +     /* Restore vector state, if any */
> > +     riscv_v_vstate_restore(current, task_pt_regs(current));
>
> I thought that the kernel was already nuking user vectors on every system
> call, since the RVV spec says so.
>
> Are you trying to use vectors from interrupts? Otherwise, isn't this flush &
> restore superfluous?
>
> > +
> > +     /* disable vector */
> > +     riscv_v_disable();
> > +
> > +     /* release kernel mode vector */
> > +     put_cpu_vector_context();
> > +}
> > +EXPORT_SYMBOL_GPL(kernel_rvv_end);
>
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
>
>
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Thanks.

Andy

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-07-13  7:40   ` Eric Biggers
@ 2023-07-14  6:27     ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-07-14  6:27 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

On Thu, Jul 13, 2023 at 12:40:42AM -0700, Eric Biggers wrote:
> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > This series provides cryptographic implementations using the vector
> > crypto extensions.
> > 
> > v13 of the vector patchset dropped the patches for in-kernel usage of
> > vector instructions, I picked the ones from v12 over into this series
> > for now.
> > 
> > My basic goal was to not re-invent cryptographic code, so the heavy
> > lifting is done by those perl-asm scripts used in openssl and the perl
> > code used here-in stems from code that is targetted at openssl [0] and is
> > unmodified from there to limit needed review effort.
> > 
> > With a matching qemu (there are patches for vector-crypto flying around)
> > the in-kernel crypto-selftests (also the extended ones) are very happy
> > so far.
> 
> Where does this patchset apply to?  I tried torvalds/master, linux-next/master,
> riscv/for-next, and cryptodev/master.  Nothing worked.  When sending a
> patch(set), please always use the '--base' option to 'git format-patch', or
> explicitly mention where it applies to, or provide a link to a git repo.
> 

Hi Heiko, any update on this?  I would like to review, and maybe test, this
patchset but there's no way for me to do so.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-07-14  6:27     ` Eric Biggers
@ 2023-07-14  7:02       ` Heiko Stuebner
  -1 siblings, 0 replies; 100+ messages in thread
From: Heiko Stuebner @ 2023-07-14  7:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

Hi Eric,

Am Freitag, 14. Juli 2023, 08:27:08 CEST schrieb Eric Biggers:
> On Thu, Jul 13, 2023 at 12:40:42AM -0700, Eric Biggers wrote:
> > On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > 
> > > This series provides cryptographic implementations using the vector
> > > crypto extensions.
> > > 
> > > v13 of the vector patchset dropped the patches for in-kernel usage of
> > > vector instructions, I picked the ones from v12 over into this series
> > > for now.
> > > 
> > > My basic goal was to not re-invent cryptographic code, so the heavy
> > > lifting is done by those perl-asm scripts used in openssl and the perl
> > > code used here-in stems from code that is targetted at openssl [0] and is
> > > unmodified from there to limit needed review effort.
> > > 
> > > With a matching qemu (there are patches for vector-crypto flying around)
> > > the in-kernel crypto-selftests (also the extended ones) are very happy
> > > so far.
> > 
> > Where does this patchset apply to?  I tried torvalds/master, linux-next/master,
> > riscv/for-next, and cryptodev/master.  Nothing worked.  When sending a
> > patch(set), please always use the '--base' option to 'git format-patch', or
> > explicitly mention where it applies to, or provide a link to a git repo.
> > 
> 
> Hi Heiko, any update on this?  I would like to review, and maybe test, this
> patchset but there's no way for me to do so.

sorry about that. As you said, this should've been mentioned in the
cover-letter.

This patchset goes on top of the v6 scalar one [0] which in turn
goes on top of the arch-random patchset [1] and that in turn sits
on top of 6.5-rc1 for me.


Heiko


[0] https://lore.kernel.org/r/20230709154243.1582671-1-heiko@sntech.de
[1] https://lore.kernel.org/r/20230709115549.2666557-1-sameo@rivosinc.com



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 04/12] RISC-V: add vector crypto extension detection
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-18 14:55     ` Conor Dooley
  -1 siblings, 0 replies; 100+ messages in thread
From: Conor Dooley @ 2023-07-18 14:55 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	ebiggers, Heiko Stuebner

Hey Heiko,

On Tue, Jul 11, 2023 at 05:37:35PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add detection for some extensions of the vector-crypto specification:
> - Zvkb: Vector Bit-manipulation used in Cryptography
> - Zvkg: Vector GCM/GMAC
> - Zvknha and Zvknhb: NIST Algorithm Suite
> - Zvkns: AES-128, AES-256 Single Round Suite
> - Zvksed: ShangMi Algorithm Suite
> - Zvksh: ShangMi Algorithm Suite
> 
> As their use is very specific and will likely be limited to special places
> we expect current code to just pre-encode those instructions, so right now
> we don't introduce toolchain requirements.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/hwcap.h |  9 ++++++
>  arch/riscv/kernel/cpu.c        |  8 ++++++
>  arch/riscv/kernel/cpufeature.c | 50 ++++++++++++++++++++++++++++++++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index b80ca6e77088..0f5172fa87b0 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -64,6 +64,15 @@
>  #define RISCV_ISA_EXT_ZKSED		51
>  #define RISCV_ISA_EXT_ZKSH		52
>  #define RISCV_ISA_EXT_ZKT		53
> +#define RISCV_ISA_EXT_ZVBB		54
> +#define RISCV_ISA_EXT_ZVBC		55
> +#define RISCV_ISA_EXT_ZVKG		56
> +#define RISCV_ISA_EXT_ZVKNED		57
> +#define RISCV_ISA_EXT_ZVKNHA		58
> +#define RISCV_ISA_EXT_ZVKNHB		59
> +#define RISCV_ISA_EXT_ZVKSED		60
> +#define RISCV_ISA_EXT_ZVKSH		61
> +#define RISCV_ISA_EXT_ZVKT		62
>  
>  #define RISCV_ISA_EXT_MAX		64
>  #define RISCV_ISA_EXT_NAME_LEN_MAX	32
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 10524322a4c0..925241e25db2 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -227,6 +227,14 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>  	__RISCV_ISA_EXT_DATA(zksed, RISCV_ISA_EXT_ZKSED),
>  	__RISCV_ISA_EXT_DATA(zksh, RISCV_ISA_EXT_ZKSH),
>  	__RISCV_ISA_EXT_DATA(zkt, RISCV_ISA_EXT_ZKT),
> +	__RISCV_ISA_EXT_DATA(zvbb, RISCV_ISA_EXT_ZVBB),
> +	__RISCV_ISA_EXT_DATA(zvbc, RISCV_ISA_EXT_ZVBC),
> +	__RISCV_ISA_EXT_DATA(zvkg, RISCV_ISA_EXT_ZVKG),
> +	__RISCV_ISA_EXT_DATA(zvkned, RISCV_ISA_EXT_ZVKNED),
> +	__RISCV_ISA_EXT_DATA(zvknha, RISCV_ISA_EXT_ZVKNHA),
> +	__RISCV_ISA_EXT_DATA(zvknhb, RISCV_ISA_EXT_ZVKNHB),
> +	__RISCV_ISA_EXT_DATA(zvksed, RISCV_ISA_EXT_ZVKSED),
> +	__RISCV_ISA_EXT_DATA(zvksh, RISCV_ISA_EXT_ZVKSH),
>  	__RISCV_ISA_EXT_DATA(smaia, RISCV_ISA_EXT_SMAIA),
>  	__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 9a872a2007a5..13556fd16bf6 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -343,6 +343,56 @@ void __init riscv_fill_hwcap(void)

All of these need to be documented in dt-bindings.

At least one of these lists will go away iff Palmer merges my rework of
this stuff & hopefully we'll get one of the ways to avoid repeating the
SET_ISA_EXT_MAP stuff ad nauseam.

Cheers,
Conor.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 08/12] RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-21  4:42     ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-07-21  4:42 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner, Charalampos Mitrodimas

On Tue, Jul 11, 2023 at 05:37:39PM +0200, Heiko Stuebner wrote:
> diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
> new file mode 100644
> index 000000000000..1c9c88029f60
> --- /dev/null
> +++ b/arch/riscv/crypto/sha256-riscv64-glue.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISCV64
> + *
> + * Copyright (C) 2022 VRULL GmbH
> + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <asm/simd.h>
> +#include <asm/vector.h>
> +#include <crypto/internal/hash.h>
> +#include <crypto/internal/simd.h>
> +#include <crypto/sha2.h>
> +#include <crypto/sha256_base.h>
> +
> +asmlinkage void sha256_block_data_order_zvbb_zvknha(u32 *digest, const void *data,
> +					unsigned int num_blks);
> +
> +static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src,
> +				      int blocks)
> +{
> +	sha256_block_data_order_zvbb_zvknha(sst->state, src, blocks);
> +}

Having a double-underscored function wrap around a non-underscored one like this
isn't conventional for Linux kernel code.  IIRC some of the other crypto code
happens to do this, but it really is supposed to be the other way around.

I think you should just declare the assembly function to take a 'struct
sha256_state', with a comment mentioning that only the 'u32 state[8]' at the
beginning is actually used.  That's what arch/x86/crypto/sha256_ssse3_glue.c
does, for example.  Then, __sha256_block_data_order() would be unneeded.
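
Something along these lines is what I have in mind (untested sketch; the
function name is the one from your patch, only the prototype changes):

/*
 * Note: the assembly only touches the digest, i.e. the 'u32 state[8]'
 * at the beginning of struct sha256_state.
 */
asmlinkage void sha256_block_data_order_zvbb_zvknha(struct sha256_state *sst,
						    const u8 *src, int blocks);

That prototype matches sha256_block_fn, so it can be passed directly to
sha256_base_do_update() and sha256_base_do_finalize().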

> +static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
> +			 unsigned int len)
> +{
> +	if (crypto_simd_usable()) {

crypto_simd_usable() uses may_use_simd() which isn't wired up for RISC-V, so it
gets the default implementation of '!in_interrupt()'.  RISC-V does have
may_use_vector() which looks like the right thing.  I think RISC-V needs a header
arch/riscv/include/asm/simd.h which defines may_use_simd() as a wrapper around
may_use_vector().
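
i.e. something like the following (sketch; it assumes may_use_vector() ends
up declared in a header such as asm/vector.h instead of being static in
kernel_mode_vector.c):

/* SPDX-License-Identifier: GPL-2.0-only */
/* arch/riscv/include/asm/simd.h */
#ifndef __ASM_SIMD_H
#define __ASM_SIMD_H

#include <linux/types.h>
#include <asm/vector.h>

static __must_check inline bool may_use_simd(void)
{
	return may_use_vector();
}

#endif /* __ASM_SIMD_H */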

> +		int ret;
> +
> +		kernel_rvv_begin();
> +		ret = sha256_base_do_update(desc, data, len,
> +					    __sha256_block_data_order);
> +		kernel_rvv_end();
> +		return ret;
> +	} else {
> +		sha256_update(shash_desc_ctx(desc), data, len);
> +		return 0;
> +	}
> +}
> +
> +static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
> +			unsigned int len, u8 *out)
> +{
> +	if (!crypto_simd_usable()) {
> +		sha256_update(shash_desc_ctx(desc), data, len);
> +		sha256_final(shash_desc_ctx(desc), out);
> +		return 0;
> +	}

Keep things consistent please.  riscv64_sha256_update() could use
!crypto_simd_usable() and an early return too.
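
That is, roughly (same code as above, just restructured):

static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
				 unsigned int len)
{
	int ret;

	if (!crypto_simd_usable()) {
		sha256_update(shash_desc_ctx(desc), data, len);
		return 0;
	}

	kernel_rvv_begin();
	ret = sha256_base_do_update(desc, data, len,
				    __sha256_block_data_order);
	kernel_rvv_end();
	return ret;
}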

> +static int __init sha256_mod_init(void)

riscv64_sha256_mod_init()

> +{
> +	/*
> +	 * From the spec:
> +	 * Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.
> +	 */
> +	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
> +	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
> +	     riscv_isa_extension_available(NULL, ZVBB) &&
> +	     riscv_vector_vlen() >= 128)
> +
> +		return crypto_register_shash(&sha256_alg);
> +
> +	return 0;
> +}
> +
> +static void __exit sha256_mod_fini(void)

riscv64_sha256_mod_exit()

> +{
> +	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
> +	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
> +	     riscv_isa_extension_available(NULL, ZVBB) &&
> +	     riscv_vector_vlen() >= 128)
> +		crypto_unregister_shash(&sha256_alg);
> +}

If the needed CPU features aren't present, return -ENODEV from the module_init
function instead of 0.  Then, the module_exit function can unconditionally
unregister the algorithm.
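
I.e., something like this (sketch, using the riscv64_* names suggested above):

static int __init riscv64_sha256_mod_init(void)
{
	/* Zvknhb supports SHA-256 and SHA-512; Zvknha supports only SHA-256. */
	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
	    riscv_isa_extension_available(NULL, ZVBB) &&
	    riscv_vector_vlen() >= 128)
		return crypto_register_shash(&sha256_alg);

	return -ENODEV;
}

static void __exit riscv64_sha256_mod_exit(void)
{
	crypto_unregister_shash(&sha256_alg);
}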

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-07-21  5:12   ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-07-21  5:12 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

Hi Heiko,

On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> This series provides cryptographic implementations using the vector
> crypto extensions.
> 
> v13 of the vector patchset dropped the patches for in-kernel usage of
> vector instructions, I picked the ones from v12 over into this series
> for now.
> 
> My basic goal was to not re-invent cryptographic code, so the heavy
> lifting is done by those perl-asm scripts used in openssl and the perl
> code used here-in stems from code that is targetted at openssl [0] and is
> unmodified from there to limit needed review effort.
> 
> With a matching qemu (there are patches for vector-crypto flying around)
> the in-kernel crypto-selftests (also the extended ones) are very happy
> so far.
> 
> 
> changes in v4:
> - split off from scalar crypto patches but base on top of them
> - adapt to pending openssl code [0] using the now frozen vector crypto
>   extensions - with all its changes
>   [0] https://github.com/openssl/openssl/pull/20149
> 
> changes in v3:
> - rebase on top of 6.3-rc2
> - rebase on top of vector-v14 patchset
> - add the missing Co-developed-by mentions to showcase
>   the people that did the actual openSSL crypto code
> 
> changes in v2:
> - rebased on 6.2 + zbb series, so don't include already
>   applied changes anymore
> - refresh code picked from openssl as that side matures
> - more algorithms (SHA512, AES, SM3, SM4)
> 
> Greentime Hu (2):
>   riscv: Add support for kernel mode vector
>   riscv: Add vector extension XOR implementation
> 
> Heiko Stuebner (10):
>   RISC-V: add helper function to read the vector VLEN
>   RISC-V: add vector crypto extension detection
>   RISC-V: crypto: update perl include with helpers for vector (crypto)
>     instructions
>   RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
>   RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
>   RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
>   RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation
>   RISC-V: crypto: add Zvkned accelerated AES encryption implementation
>   RISC-V: crypto: add Zvksed accelerated SM4 encryption implementation
>   RISC-V: crypto: add Zvksh accelerated SM3 hash implementation
> 
>  arch/riscv/crypto/Kconfig                     |  68 ++-
>  arch/riscv/crypto/Makefile                    |  44 +-
>  arch/riscv/crypto/aes-riscv-glue.c            | 168 ++++++
>  arch/riscv/crypto/aes-riscv64-zvkned.pl       | 530 ++++++++++++++++++
>  arch/riscv/crypto/ghash-riscv64-glue.c        | 245 ++++++++
>  arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl  | 380 +++++++++++++
>  arch/riscv/crypto/ghash-riscv64-zvkg.pl       | 168 ++++++
>  arch/riscv/crypto/riscv.pm                    | 433 +++++++++++++-
>  arch/riscv/crypto/sha256-riscv64-glue.c       | 115 ++++
>  .../crypto/sha256-riscv64-zvbb-zvknha.pl      | 314 +++++++++++
>  arch/riscv/crypto/sha512-riscv64-glue.c       | 106 ++++
>  .../crypto/sha512-riscv64-zvbb-zvknhb.pl      | 377 +++++++++++++
>  arch/riscv/crypto/sm3-riscv64-glue.c          | 112 ++++
>  arch/riscv/crypto/sm3-riscv64-zvksh.pl        | 225 ++++++++
>  arch/riscv/crypto/sm4-riscv64-glue.c          | 162 ++++++
>  arch/riscv/crypto/sm4-riscv64-zvksed.pl       | 300 ++++++++++
>  arch/riscv/include/asm/hwcap.h                |   9 +
>  arch/riscv/include/asm/vector.h               |  28 +
>  arch/riscv/include/asm/xor.h                  |  82 +++
>  arch/riscv/kernel/Makefile                    |   1 +
>  arch/riscv/kernel/cpu.c                       |   8 +
>  arch/riscv/kernel/cpufeature.c                |  50 ++
>  arch/riscv/kernel/kernel_mode_vector.c        | 132 +++++
>  arch/riscv/lib/Makefile                       |   1 +
>  arch/riscv/lib/xor.S                          |  81 +++
>  25 files changed, 4136 insertions(+), 3 deletions(-)
>  create mode 100644 arch/riscv/crypto/aes-riscv-glue.c
>  create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
>  create mode 100644 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
>  create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
>  create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
>  create mode 100644 arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
>  create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
>  create mode 100644 arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
>  create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
>  create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
>  create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
>  create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl
>  create mode 100644 arch/riscv/include/asm/xor.h
>  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
>  create mode 100644 arch/riscv/lib/xor.S
> 

Thanks for working on this patchset!  I'm glad to see that you and others are
working on this and the code in OpenSSL.  And thanks for running all the kernel
crypto self-tests and verifying that they pass.

I'm still a bit worried about there being two competing sets of crypto
extensions for RISC-V: scalar and vector.

However the vector crypto extensions are moving forwards (they were recently
frozen), from what I've heard are being implemented in CPUs, and based on this
patchset implementations of most algorithms are ready already.

So I'm wondering: do you still think that it's valuable to continue with your
other patchset that adds GHASH acceleration using the scalar extensions (which
this patchset is still based on)?  

I'm wondering if we should be 100% focused on the vector extensions for now to
avoid fragmentation of effort.

It's just not super clear to me what is driving the scalar crypto support right
now.  Maybe embedded systems?  Maybe it was just a misstep, perhaps due to being
started before the CPU even had a vector unit?  I don't know.  If you do indeed
have a strong reason for it, then you can go ahead -- I just wanted to make sure
we don't end up doing twice as much work unnecessarily.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-21  5:40     ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-07-21  5:40 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

On Tue, Jul 11, 2023 at 05:37:41PM +0200, Heiko Stuebner wrote:
> +config CRYPTO_AES_RISCV
> +	tristate "Ciphers: AES (RISCV)"
> +	depends on 64BIT && RISCV_ISA_V
> +	select CRYPTO_AES
> +	help
> +	  Block ciphers: AES cipher algorithms (FIPS-197)
> +	  Length-preserving ciphers: AES with ECB, CBC, CTR, CTS,
> +	    XCTR, and XTS modes
> +	  AEAD cipher: AES with CBC, ESSIV, and SHA-256
> +	    for fscrypt and dm-crypt
> +
> +	  Architecture: riscv using one of
> +	  - Zvkns

I'm looking forward to having direct support for these AES modes, especially the
modes needed for storage encryption: XTS, and CBC or CTS!  None of these AES
modes is actually implemented in this patch yet, though, so they can't be
claimed in the kconfig help text yet.  This patch is just a starting point, as
it just adds support for the bare AES block cipher ("aes" in the crypto API).

(BTW, I'm much more interested in, say, AES-XTS support than SM4 support, which
this patchset does include.  SM4 is a "national pride cipher" which is somewhat
of a niche thing.  I suppose there are already people pushing it for RISC-V
though, as they are everywhere else, so that's to be expected...)

> diff --git a/arch/riscv/crypto/aes-riscv-glue.c b/arch/riscv/crypto/aes-riscv-glue.c
> new file mode 100644
> index 000000000000..85e1187aee22
> --- /dev/null
> +++ b/arch/riscv/crypto/aes-riscv-glue.c
> @@ -0,0 +1,168 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Linux/riscv port of the OpenSSL AES implementation for RISCV
> + *
> + * Copyright (C) 2023 VRULL GmbH
> + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
> + */
> +
> +#include <linux/crypto.h>
> +#include <linux/delay.h>
> +#include <linux/err.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <asm/simd.h>
> +#include <asm/vector.h>
> +#include <crypto/aes.h>
> +#include <crypto/internal/cipher.h>
> +#include <crypto/internal/simd.h>
> +
> +struct aes_key {
> +	u8 key[AES_MAX_KEYLENGTH];
> +	int rounds;
> +};
> +
> +/* variant using the zvkned vector crypto extension */
> +void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key);
> +void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key);
> +int rv64i_zvkned_set_encrypt_key(const u8 *userKey, const int bits,
> +				struct aes_key *key);
> +int rv64i_zvkned_set_decrypt_key(const u8 *userKey, const int bits,
> +				struct aes_key *key);
> +
> +struct riscv_aes_ctx {
> +	struct crypto_cipher *fallback;
> +	struct aes_key enc_key;
> +	struct aes_key dec_key;
> +	unsigned int keylen;
> +};

Can it just use 'struct crypto_aes_ctx'?  That's what most of the other AES
implementations use.

> +static int riscv64_aes_init_zvkned(struct crypto_tfm *tfm)
> +{
> +	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
> +	const char *alg = crypto_tfm_alg_name(tfm);
> +	struct crypto_cipher *fallback;
> +
> +	fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK);
> +	if (IS_ERR(fallback)) {
> +		pr_err("Failed to allocate transformation for '%s': %ld\n",
> +		       alg, PTR_ERR(fallback));
> +		return PTR_ERR(fallback);
> +	}
> +
> +	crypto_cipher_set_flags(fallback,
> +				crypto_cipher_get_flags((struct
> +							 crypto_cipher *)
> +							tfm));
> +	ctx->fallback = fallback;
> +
> +	return 0;
> +}
> +
> +static void riscv_aes_exit(struct crypto_tfm *tfm)
> +{
> +	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	if (ctx->fallback) {
> +		crypto_free_cipher(ctx->fallback);
> +		ctx->fallback = NULL;
> +	}
> +}
> +
> +static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key,
> +			 unsigned int keylen)
> +{
> +	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
> +	int ret;
> +
> +	ctx->keylen = keylen;
> +
> +	if (keylen == 16 || keylen == 32) {
> +		kernel_rvv_begin();
> +		ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
> +		if (ret != 1) {
> +			kernel_rvv_end();
> +			return -EINVAL;
> +		}
> +
> +		ret = rv64i_zvkned_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
> +		kernel_rvv_end();
> +		if (ret != 1)
> +			return -EINVAL;
> +	}
> +
> +	ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
> +
> +	return ret ? -EINVAL : 0;
> +}

It's a bit annoying that RISC-V doesn't support AES-192, though also not
particularly surprising, seeing as AES-192 is almost never used.  (Intel's Key
Locker, for example, is another recent CPU feature that doesn't support
AES-192.)  IMO the issue here is really with the kernel crypto API -- it should
treat AES-128, AES-192, and AES-256 as separate algorithms so that
implementations aren't forced to support all three key sizes...

Anyway, for now, as you noticed you do need a fallback to handle AES-192 to make
the kernel crypto API happy.

But, the fallback doesn't have to be a 'crypto_cipher' as you've implemented.
You could just use the AES library.  See what arch/arm64/crypto/aes-ce-glue.c
does, for example.  Have you considered that?  It would be simpler than the
crypto_cipher based approach.
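
A rough sketch of what I mean (untested; it assumes the context struct gains a
struct crypto_aes_ctx for the library fallback, the Kconfig entry selects
CRYPTO_LIB_AES, and setkey calls aes_expandkey(&ctx->fallback, key, keylen)
instead of crypto_cipher_setkey()):

struct riscv_aes_ctx {
	struct aes_key enc_key;
	struct aes_key dec_key;
	struct crypto_aes_ctx fallback;	/* AES library key, used for AES-192 */
	unsigned int keylen;
};

static void riscv64_aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst,
				       const u8 *src)
{
	const struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);

	if (ctx->keylen == AES_KEYSIZE_192 || !crypto_simd_usable()) {
		aes_encrypt(&ctx->fallback, dst, src);
		return;
	}

	kernel_rvv_begin();
	rv64i_zvkned_encrypt(src, dst, &ctx->enc_key);
	kernel_rvv_end();
}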

> +
> +static void riscv64_aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
> +{
> +	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);

Always use 'const' for the tfm_ctx in encrypt and decrypt functions, please, as
it must never be modified there.

> +struct crypto_alg riscv64_aes_zvkned_alg = {

static

> +	.cra_type = NULL,

Omit that line

> +	.cra_alignmask = 0,

Omit that line

> +MODULE_DESCRIPTION("AES (accelerated)");

Maybe "RISC-V accelerated"?

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 04/12] RISC-V: add vector crypto extension detection
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-07-21  5:48     ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-07-21  5:48 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

On Tue, Jul 11, 2023 at 05:37:35PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add detection for some extensions of the vector-crypto specification:
> - Zvkb: Vector Bit-manipulation used in Cryptography
> - Zvkg: Vector GCM/GMAC
> - Zvknha and Zvknhb: NIST Algorithm Suite
> - Zvkns: AES-128, AES-256 Single Round Suite
> - Zvksed: ShangMi Algorithm Suite
> - Zvksh: ShangMi Algorithm Suite
> 
> As their use is very specific and will likely be limited to special places
> we expect current code to just pre-encode those instructions, so right now
> we don't introduce toolchain requirements.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/hwcap.h |  9 ++++++
>  arch/riscv/kernel/cpu.c        |  8 ++++++
>  arch/riscv/kernel/cpufeature.c | 50 ++++++++++++++++++++++++++++++++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index b80ca6e77088..0f5172fa87b0 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -64,6 +64,15 @@
>  #define RISCV_ISA_EXT_ZKSED		51
>  #define RISCV_ISA_EXT_ZKSH		52
>  #define RISCV_ISA_EXT_ZKT		53
> +#define RISCV_ISA_EXT_ZVBB		54
> +#define RISCV_ISA_EXT_ZVBC		55
> +#define RISCV_ISA_EXT_ZVKG		56
> +#define RISCV_ISA_EXT_ZVKNED		57
> +#define RISCV_ISA_EXT_ZVKNHA		58
> +#define RISCV_ISA_EXT_ZVKNHB		59
> +#define RISCV_ISA_EXT_ZVKSED		60
> +#define RISCV_ISA_EXT_ZVKSH		61
> +#define RISCV_ISA_EXT_ZVKT		62

It would be helpful if each RISCV_ISA_EXT_* definition had a comment that spells
out what it stands for, similar to what arch/x86/include/asm/cpufeatures.h does.
I know they can all be looked up, and they're sort of mnemonic, but it would be
helpful.
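
E.g. something like this (descriptions taken from the extension names in the
vector-crypto spec / your patch description, so double-check the exact wording):

#define RISCV_ISA_EXT_ZVBB		54 /* Vector Basic Bit-manipulation */
#define RISCV_ISA_EXT_ZVBC		55 /* Vector Carryless Multiplication */
#define RISCV_ISA_EXT_ZVKG		56 /* Vector GCM/GMAC */
#define RISCV_ISA_EXT_ZVKNED		57 /* NIST Suite: Vector AES */
#define RISCV_ISA_EXT_ZVKNHA		58 /* NIST Suite: Vector SHA-2 (SHA-256) */
#define RISCV_ISA_EXT_ZVKNHB		59 /* NIST Suite: Vector SHA-2 (SHA-256/512) */
#define RISCV_ISA_EXT_ZVKSED		60 /* ShangMi Suite: SM4 */
#define RISCV_ISA_EXT_ZVKSH		61 /* ShangMi Suite: SM3 */
#define RISCV_ISA_EXT_ZVKT		62 /* Vector Data-Independent Execution Latency */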

- Eric

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-07-21  5:40     ` Eric Biggers
@ 2023-07-21 11:39       ` Ard Biesheuvel
  -1 siblings, 0 replies; 100+ messages in thread
From: Ard Biesheuvel @ 2023-07-21 11:39 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Heiko Stuebner, palmer, paul.walmsley, aou, herbert, davem,
	conor.dooley, linux-riscv, linux-kernel, linux-crypto,
	christoph.muellner, Heiko Stuebner

On Fri, 21 Jul 2023 at 07:40, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Jul 11, 2023 at 05:37:41PM +0200, Heiko Stuebner wrote:
...
> > +static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key,
> > +                      unsigned int keylen)
> > +{
> > +     struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
> > +     int ret;
> > +
> > +     ctx->keylen = keylen;
> > +
> > +     if (keylen == 16 || keylen == 32) {
> > +             kernel_rvv_begin();
> > +             ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
> > +             if (ret != 1) {
> > +                     kernel_rvv_end();
> > +                     return -EINVAL;
> > +             }
> > +
> > +             ret = rv64i_zvkned_set_decrypt_key(key, keylen * 8, &ctx->dec_key);

The asm suggests that the encryption and decryption key schedules are
the same, and that the decryption algorithm does not implement the
Equivalent Inverse Cipher, but simply iterates over the key schedule
in reverse order. This makes much more sense for instruction-based
AES, so it doesn't surprise me, but it does mean you can just drop this
part and pass enc_key everywhere.

> > +             kernel_rvv_end();
> > +             if (ret != 1)
> > +                     return -EINVAL;
> > +     }
> > +
> > +     ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
> > +
> > +     return ret ? -EINVAL : 0;
> > +}
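
[Editorial sketch, not part of the patch: following Ard's point above, the setkey
could drop the decrypt-key expansion entirely, reusing the names from the quoted
hunk and keeping the helper's "1 on success" convention:]

static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key,
				     unsigned int keylen)
{
	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);

	ctx->keylen = keylen;

	if (keylen == 16 || keylen == 32) {
		int ret;

		kernel_rvv_begin();
		/* One key schedule, used for both encryption and decryption. */
		ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
		kernel_rvv_end();
		if (ret != 1)
			return -EINVAL;
	}

	return crypto_cipher_setkey(ctx->fallback, key, keylen) ? -EINVAL : 0;
}
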
>
> It's a bit annoying that RISC-V doesn't support AES-192, though also not
> particularly surprising, seeing as AES-192 is almost never used.  (Intel's Key
> Locker, for example, is another recent CPU feature that doesn't support
> AES-192.)  IMO the issue here is really with the kernel crypto API -- it should
> treat AES-128, AES-192, and AES-256 as separate algorithms so that
> implementations aren't forced to support all three key sizes...
>

Why is this a fundamental limitation? AES-192 uses the same AES block
size and round structure; the only differences are the number of rounds
and how the round keys are calculated.

Creating the key schedule should never be performance critical, so if
the lack of AES-192 support is due to a limitation in the key-schedule
generation instructions, I'd suggest avoiding those if possible and
just using the generic library code to derive the key schedule. If that
works, I'm pretty sure AES-192 support is just a matter of
implementing a 12-round variant modeled after the existing 10/14-round
ones.
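
[A rough sketch of that fallback, assuming the kernel's generic aes_expandkey()
helper; the enc_key.rd_key/rounds layout is an assumption about an AES_KEY-style
context in the patch, and the round-key word order may still need fixing up for
whatever the Zvkned routines expect:]

#include <crypto/aes.h>
#include <linux/string.h>

static int riscv64_aes_expandkey_192(struct riscv_aes_ctx *ctx,
				     const u8 *key, unsigned int keylen)
{
	struct crypto_aes_ctx gen;
	int err;

	/* Generic C key schedule; not performance critical. */
	err = aes_expandkey(&gen, key, keylen);
	if (err)
		return err;

	/* AES-192: 12 rounds, i.e. 13 round keys of 16 bytes each. */
	memcpy(ctx->enc_key.rd_key, gen.key_enc, 13 * AES_BLOCK_SIZE);
	ctx->enc_key.rounds = 12;

	memzero_explicit(&gen, sizeof(gen));
	return 0;
}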

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-07-21 11:39       ` Ard Biesheuvel
@ 2023-07-21 14:23         ` Ard Biesheuvel
  -1 siblings, 0 replies; 100+ messages in thread
From: Ard Biesheuvel @ 2023-07-21 14:23 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Heiko Stuebner, palmer, paul.walmsley, aou, herbert, davem,
	conor.dooley, linux-riscv, linux-kernel, linux-crypto,
	christoph.muellner, Heiko Stuebner

On Fri, 21 Jul 2023 at 13:39, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 21 Jul 2023 at 07:40, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Tue, Jul 11, 2023 at 05:37:41PM +0200, Heiko Stuebner wrote:
> ...
> > > +static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key,
> > > +                      unsigned int keylen)
> > > +{
> > > +     struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
> > > +     int ret;
> > > +
> > > +     ctx->keylen = keylen;
> > > +
> > > +     if (keylen == 16 || keylen == 32) {
> > > +             kernel_rvv_begin();
> > > +             ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
> > > +             if (ret != 1) {
> > > +                     kernel_rvv_end();
> > > +                     return -EINVAL;
> > > +             }
> > > +
> > > +             ret = rv64i_zvkned_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
>
> The asm suggests that the encryption and decryption key schedules are
> the same, and the decryption algorithm does not implement the
> Equivalent Inverse Cipher, but simply iterates over they key schedule
> in reverse order. This makes much more sense for instruction based
> AES, so it doesn't surprise me but it does mean you can just drop this
> part, and pass enc_key everywhere.
>
> > > +             kernel_rvv_end();
> > > +             if (ret != 1)
> > > +                     return -EINVAL;
> > > +     }
> > > +
> > > +     ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
> > > +
> > > +     return ret ? -EINVAL : 0;
> > > +}
> >
> > It's a bit annoying that RISC-V doesn't support AES-192, though also not
> > particularly surprising, seeing as AES-192 is almost never used.  (Intel's Key
> > Locker, for example, is another recent CPU feature that doesn't support
> > AES-192.)  IMO the issue here is really with the kernel crypto API -- it should
> > treat AES-128, AES-192, and AES-256 as separate algorithms so that
> > implementations aren't forced to support all three key sizes...
> >
>
> Why is this a fundamental limitation? AES-192 uses the same AES block
> size and round structure, the only difference is the number of rounds
> and how the round keys are calculated.
>
> Creating the key schedule should never be performance critical, so if
> the lack of AES-192 support is due to a limitation in the key schedule
> generation instructions, I'd suggest to avoid those if possible and
> just use the generic library code to derive the key schedule. If that
> works, I'm pretty sure AES-192 support is just a matter of
> implementing a 12-round variant modeled after the existing 10/14 round
> ones.

This seems to work:
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=riscv-crypto

Feel free to incorporate/squash any of those changes into your series.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 06/12] RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
  2023-07-11 15:37   ` Heiko Stuebner
@ 2023-08-10  9:57     ` Andy Chiu
  -1 siblings, 0 replies; 100+ messages in thread
From: Andy Chiu @ 2023-08-10  9:57 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	ebiggers, Heiko Stuebner

On Tue, Jul 11, 2023 at 05:37:37PM +0200, Heiko Stuebner wrote:
Hi Heiko,

> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add a gcm hash implementation using the Zvbb+Zvbc crypto extensions.
> It gets possibly registered alongside the Zbc-based variant, with a higher
> priority so that the crypto subsystem will be able to select the most
> performant variant, but the algorithm itself will still be part of the
> crypto selftests that run during registration.
> 

All of the newly added crypto algorithms pass on my side, except for
this one. I was testing with a QEMU build and toolchain that support
the frozen spec.

It seems to fail on a small 16-byte input. Here are the key, plaintext,
expected digest and the (mismatched) result.

(gdb) x/2gx vec->key
0xffffffff8163cc38:     0x03db81ed4dbfa6df      0x61f030f895ffcaff
(gdb) x/2gx vec->plaintext
0xffffffff8163cc50:     0xc04a60a5562a2b95      0xb6405ba056662bb3
(gdb) x/2gx vec->digest
0xffffffff8163cc68:     0xb65bc5d20aeb53da      0x60dafec32c80c44f
(gdb) x/2gx result
0xff20000000943bb8:     0x0000000000000000      0xb18de0d5e7abcf10

And here is the boot log; do you have any idea what is going wrong?
[    5.007043] alg: shash: riscv64_zvbb_zvbc_ghash test failed (wrong result) on test vector 0, cfg="init+update+final aligned buffer"
[    5.008164] alg: self-tests for ghash using riscv64_zvbb_zvbc_ghash failed (rc=-22)
[    5.008450] ------------[ cut here ]------------
[    5.009226] alg: self-tests for ghash using riscv64_zvbb_zvbc_ghash failed (rc=-22)
[    5.010678] WARNING: CPU: 1 PID: 87 at crypto/testmgr.c:5867 alg_test+0x3e2/0x41e
[    5.011792] Modules linked in:
[    5.013314] CPU: 1 PID: 87 Comm: cryptomgr_test Not tainted 6.2.2-02529-g4b0fb43edd0f-dirty #37
[    5.014037] Hardware name: riscv-virtio,qemu (DT)
[    5.014582] epc : alg_test+0x3e2/0x41e
[    5.014938]  ra : alg_test+0x3e2/0x41e
[    5.015256] epc : ffffffff80677744 ra : ffffffff80677744 sp : ff2000000095bd70
[    5.015718]  gp : ffffffff81c896b8 tp : ff6000000464d280 t0 : ffffffff81a2c970
[    5.016171]  t1 : ffffffffffffffff t2 : 2d2d2d2d2d2d2d2d s0 : ff2000000095be80
[    5.016616]  s1 : ffffffffffffffea a0 : 0000000000000047 a1 : ffffffff81a97c70
[    5.017078]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
[    5.017582]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[    5.018079]  s2 : 000000000000000e s3 : ff60000002adf200 s4 : ff60000002adf280
[    5.018576]  s5 : 0000000000000171 s6 : 00000000000000b8 s7 : 0000000000000088
[    5.019083]  s8 : ffffffffffffffff s9 : 00000000000000b8 s10: 0000000000002e00
[    5.019584]  s11: ffffffff8127fd78 t3 : ffffffff81ca0f17 t4 : ffffffff81ca0f17
[    5.020074]  t5 : ffffffff81ca0f18 t6 : ff2000000095bb88
[    5.020455] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
[    5.021234] [<ffffffff80677744>] alg_test+0x3e2/0x41e
[    5.021906] [<ffffffff8067490e>] cryptomgr_test+0x28/0x4a
[    5.022306] [<ffffffff80055ba0>] kthread+0xe0/0xf6
[    5.022710] [<ffffffff80003edc>] ret_from_exception+0x0/0x16
[    5.023755] ---[ end trace 0000000000000000 ]---

Thanks,
Andy

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-07-21  5:40     ` Eric Biggers
@ 2023-09-11 13:06       ` Jerry Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: Jerry Shih @ 2023-09-11 13:06 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Heiko Stuebner, palmer, paul.walmsley, aou, herbert, davem,
	conor.dooley, linux-riscv, linux-kernel, linux-crypto,
	christoph.muellner, Heiko Stuebner

On Jul 21, 2023, at 13:40, Eric Biggers <ebiggers@kernel.org> wrote:

> I'm looking forward to having direct support for these AES modes, especially the
> modes needed for storage encryption: XTS, and CBC or CTS!  None of these AES
> modes is actually implemented in this patch yet, though, so they can't be
> claimed in the kconfig help text yet.  This patch is just a starting point, as
> it just adds support for the bare AES block cipher ("aes" in the crypto API).
> 
> (BTW, I'm much more interested in, say, AES-XTS support than SM4 support, which
> this patchset does include.  SM4 is a "national pride cipher" which is somewhat
> of a niche thing.  I suppose there are already people pushing it for RISC-V
> though, as they are everywhere else, so that's to be expected...)
> 

We have further optimizations for the RISC-V platform in an OpenSSL PR [1]. It
will include AES in CBC, CTR, and XTS modes. Compared to the generic AES
implementation, the specialized AES-XTS one shows about a 3X performance
improvement in the OpenSSL benchmark tool. If OpenSSL accepts that PR, we will
create the corresponding patches for the Linux kernel.

[1]
https://github.com/openssl/openssl/pull/21923

-Jerry

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-09-11 13:06       ` Jerry Shih
@ 2023-09-12  7:04         ` Ard Biesheuvel
  -1 siblings, 0 replies; 100+ messages in thread
From: Ard Biesheuvel @ 2023-09-12  7:04 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Eric Biggers, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Tue, 12 Sept 2023 at 00:50, Jerry Shih <jerry.shih@sifive.com> wrote:
>
> On Jul 21, 2023, at 13:40, Eric Biggers <ebiggers@kernel.org> wrote:
>
> > I'm looking forward to having direct support for these AES modes, especially the
> > modes needed for storage encryption: XTS, and CBC or CTS!  None of these AES
> > modes is actually implemented in this patch yet, though, so they can't be
> > claimed in the kconfig help text yet.  This patch is just a starting point, as
> > it just adds support for the bare AES block cipher ("aes" in the crypto API).
> >
> > (BTW, I'm much more interested in, say, AES-XTS support than SM4 support, which
> > this patchset does include.  SM4 is a "national pride cipher" which is somewhat
> > of a niche thing.  I suppose there are already people pushing it for RISC-V
> > though, as they are everywhere else, so that's to be expected...)
> >
>
> We have further optimization for RISC-V platform in OpenSSL PR[1]. It will include
> AES with CBC, CTR, and XTS mode. Comparing to the generic AES implementation,
> the specialized AES-XTS one have about 3X performance improvement using
> OpenSSL benchmark tool. If OpenSSL accepts that PR, we will create the
> corresponding patch for Linux kernel.
>
> [1]
> https://github.com/openssl/openssl/pull/21923
>

This pull request doesn't appear to contain any XTS code at all, only CBC.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-09-12  7:04         ` Ard Biesheuvel
@ 2023-09-12  7:15           ` Jerry Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: Jerry Shih @ 2023-09-12  7:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Eric Biggers, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Sep 12, 2023, at 15:04, Ard Biesheuvel <ardb@kernel.org> wrote:

>> We have further optimization for RISC-V platform in OpenSSL PR[1]. It will include
>> AES with CBC, CTR, and XTS mode. Comparing to the generic AES implementation,
>> the specialized AES-XTS one have about 3X performance improvement using
>> OpenSSL benchmark tool. If OpenSSL accepts that PR, we will create the
>> corresponding patch for Linux kernel.
>> 
>> [1]
>> https://github.com/openssl/openssl/pull/21923
>> 
> 
> This pull request doesn't appear to contain any XTS code at all, only CBC.

We have some licensing issues to sort out for upstreaming. We will append the
specialized AES modes soon.

-Jerry

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-07-11 15:37 ` Heiko Stuebner
@ 2023-09-14  0:11   ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-09-14  0:11 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, aou, herbert, davem, conor.dooley,
	linux-riscv, linux-kernel, linux-crypto, christoph.muellner,
	Heiko Stuebner

On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> This series provides cryptographic implementations using the vector
> crypto extensions.
> 
> v13 of the vector patchset dropped the patches for in-kernel usage of
> vector instructions, I picked the ones from v12 over into this series
> for now.
> 
> My basic goal was to not re-invent cryptographic code, so the heavy
> lifting is done by those perl-asm scripts used in openssl and the perl
> code used here-in stems from code that is targetted at openssl [0] and is
> unmodified from there to limit needed review effort.
> 
> With a matching qemu (there are patches for vector-crypto flying around)
> the in-kernel crypto-selftests (also the extended ones) are very happy
> so far.

Hi Heiko!  Are you still working on this patchset?  And which of its
prerequisites still haven't been merged upstream?

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-09-14  0:11   ` Eric Biggers
@ 2023-09-14  1:10     ` Charlie Jenkins
  -1 siblings, 0 replies; 100+ messages in thread
From: Charlie Jenkins @ 2023-09-14  1:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Heiko Stuebner, palmer, paul.walmsley, aou, herbert, davem,
	conor.dooley, linux-riscv, linux-kernel, linux-crypto,
	christoph.muellner, Heiko Stuebner

On Wed, Sep 13, 2023 at 05:11:44PM -0700, Eric Biggers wrote:
> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > This series provides cryptographic implementations using the vector
> > crypto extensions.
> > 
> > v13 of the vector patchset dropped the patches for in-kernel usage of
> > vector instructions, I picked the ones from v12 over into this series
> > for now.
> > 
> > My basic goal was to not re-invent cryptographic code, so the heavy
> > lifting is done by those perl-asm scripts used in openssl and the perl
> > code used here-in stems from code that is targetted at openssl [0] and is
> > unmodified from there to limit needed review effort.
> > 
> > With a matching qemu (there are patches for vector-crypto flying around)
> > the in-kernel crypto-selftests (also the extended ones) are very happy
> > so far.
> 
> Hi Heiko!  Are you still working on this patchset?  And which of its
> prerequisites still haven't been merged upstream?
> 
> - Eric
> 
It is my understanding that Heiko is taking a break from development, so I
don't think he will be working on this soon.

- Charlie

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation
  2023-09-12  7:15           ` Jerry Shih
@ 2023-09-15  1:28             ` He-Jie Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: He-Jie Shih @ 2023-09-15  1:28 UTC (permalink / raw)
  To: CAMj1kXEGnZC6nge42WeBML9Vx6K6Lezt8Cc1faP+3gN=TzFgvA
  Cc: Ard Biesheuvel, Eric Biggers, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, Heiko Stuebner

On Sep 12, 2023, at 15:15, Jerry Shih <jerry.shih@sifive.com> wrote:

>> This pull request doesn't appear to contain any XTS code at all, only CBC.
> 
> We have some license issues for upstream. We will append the specialized
> AES modes soon.

We have the XTS and other specialized AES modes in the OpenSSL PR [1] now.
The specialized implementations all perform better than the generic
implementation on our FPGA.
We will try to get those implementations into the kernel as well.

-Jerry

[1]
https://github.com/openssl/openssl/pull/21923


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-09-14  1:10     ` Charlie Jenkins
@ 2023-09-15  1:48       ` He-Jie Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: He-Jie Shih @ 2023-09-15  1:48 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Eric Biggers, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Sep 14, 2023, at 09:10, Charlie Jenkins <charlie@rivosinc.com> wrote:

> On Wed, Sep 13, 2023 at 05:11:44PM -0700, Eric Biggers wrote:
>> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
>> 
>> Hi Heiko!  Are you still working on this patchset?  And which of its
>> prerequisites still haven't been merged upstream?
>> 
>> - Eric
> It is my understanding that Heiko is taking a break from development, I
> don't think he will be working on this soon.

We would like to take over these RISC-V vector crypto implementations.
Our proposed implementations are under review in an OpenSSL PR,
and I will check the glue parts between the Linux kernel and OpenSSL.

-Jerry

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-09-15  1:48       ` He-Jie Shih
@ 2023-09-15  3:21         ` Jerry Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: Jerry Shih @ 2023-09-15  3:21 UTC (permalink / raw)
  To: He-Jie Shih
  Cc: Charlie Jenkins, Eric Biggers, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, Heiko Stuebner

On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@gmail.com> wrote:

> On Sep 14, 2023, at 09:10, Charlie Jenkins <charlie@rivosinc.com> wrote:
> 
>> On Wed, Sep 13, 2023 at 05:11:44PM -0700, Eric Biggers wrote:
>>> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
>>> 
>>> Hi Heiko!  Are you still working on this patchset?  And which of its
>>> prerequisites still haven't been merged upstream?
>>> 
>>> - Eric
>> It is my understanding that Heiko is taking a break from development, I
>> don't think he will be working on this soon.
> 
> We would like to take over these RISC-V vector crypto implementations.
> Our proposed implementations is under reviewing in OpenSSL pr.
> And I will check the gluing parts between Linux kernel and OpenSSL.

The OpenSSL PR is at [1].
And we are from SiFive.

-Jerry

[1]
https://github.com/openssl/openssl/pull/21923

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-09-15  3:21         ` Jerry Shih
@ 2023-10-06 19:47           ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-10-06 19:47 UTC (permalink / raw)
  To: Jerry Shih
  Cc: He-Jie Shih, Charlie Jenkins, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, Heiko Stuebner

On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@gmail.com> wrote:
> 
> > On Sep 14, 2023, at 09:10, Charlie Jenkins <charlie@rivosinc.com> wrote:
> > 
> >> On Wed, Sep 13, 2023 at 05:11:44PM -0700, Eric Biggers wrote:
> >>> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> >>> 
> >>> Hi Heiko!  Are you still working on this patchset?  And which of its
> >>> prerequisites still haven't been merged upstream?
> >>> 
> >>> - Eric
> >> It is my understanding that Heiko is taking a break from development, I
> >> don't think he will be working on this soon.
> > 
> > We would like to take over these RISC-V vector crypto implementations.
> > Our proposed implementations is under reviewing in OpenSSL pr.
> > And I will check the gluing parts between Linux kernel and OpenSSL.
> 
> The OpenSSL PR is at [1].
> And we are from SiFive.
> 
> -Jerry
> 
> [1]
> https://github.com/openssl/openssl/pull/21923

Hi Jerry, I'm wondering if you have an update on this?  Do you need any help?

I'm also wondering about riscv.pm and the choice of generating the crypto
instructions from .words instead of using the assembler.  It makes it
significantly harder to review the code, IMO.  Can we depend on assembler
support for these instructions, or is that just not ready yet?

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-10-06 19:47           ` Eric Biggers
@ 2023-10-06 21:01             ` He-Jie Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: He-Jie Shih @ 2023-10-06 21:01 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Oct 7, 2023, at 03:47, Eric Biggers <ebiggers@kernel.org> wrote:
> On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
>> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@gmail.com> wrote:
>> The OpenSSL PR is at [1].
>> And we are from SiFive.
>> 
>> -Jerry
>> 
>> [1]
>> https://github.com/openssl/openssl/pull/21923
> 
> Hi Jerry, I'm wondering if you have an update on this?  Do you need any help?

We have specialized aes-cbc/ecb/ctr patches locally and they pass the `testmgr`
test cases. But the test patterns in `testmgr` are quite simple; I don't think they
cover the corner cases (e.g. aes-xts with a tail element).
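
[Worth noting: if I remember correctly, the extra boot-time fuzz tests do run
random lengths and scatterlist splits against the generic implementations, which
should catch at least some of those corner cases. They are enabled with:]

CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y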

For aes-xts, I'm trying to adapt the implementation from OpenSSL. The design
philosophy differs between OpenSSL and Linux. In the Linux crypto layer the data
will be split into scatterlists, so I need to preserve the AES-XTS IV across the
per-scatterlist-entry calls. And I'm thinking about how to handle the tail data
in a simple way.
By the way, the `xts(aes)` implementations for arm and x86 use
`cra_blocksize = AES_BLOCK_SIZE`. I don't know why we need to handle the tail
element at all; I think we will hit an `EINVAL` error in `skcipher_walk_next()`
if the data size is not a multiple of the block size.
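
[A minimal sketch of that IV handling, assuming a hypothetical riscv_aes_xts_ctx
and a hypothetical rv64i_zvkned_xts_crypt() helper that processes whole blocks
and updates the 16-byte tweak in place; ciphertext stealing is left out, so the
sketch assumes block-multiple input:]

#include <crypto/aes.h>
#include <crypto/internal/skcipher.h>

static int riscv64_aes_xts_crypt(struct skcipher_request *req, bool enc)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct riscv_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	int err;

	err = skcipher_walk_virt(&walk, req, false);
	while (walk.nbytes) {
		/* Process only whole blocks in this step ... */
		unsigned int nbytes = walk.nbytes & ~(AES_BLOCK_SIZE - 1);

		kernel_rvv_begin();
		rv64i_zvkned_xts_crypt(walk.src.virt.addr, walk.dst.virt.addr,
				       nbytes, &ctx->key, walk.iv, enc);
		kernel_rvv_end();

		/* ... and let the walk carry the updated IV (tweak) over to
		 * the next scatterlist entry. */
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}
	return err;
}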

Overall, we will have
1) aes cipher
2) aes with cbc/ecb/ctr/xts mode
3) sha256/512 for `vlen>=128` platform
4) sm3 for `vlen>=128` platform
5) sm4
6) ghash
7) `chacha20` stream cipher

The vector crypto PR in OpenSSL is still under review; we are still porting the
perl files over to Linux.

The most complicated mode, `gcm(aes)`, is next on our list.

> I'm also wondering about riscv.pm and the choice of generating the crypto
> instructions from .words instead of using the assembler.  It makes it
> significantly harder to review the code, IMO.  Can we depend on assembler
> support for these instructions, or is that just not ready yet?

I asked the same question before [1]. The reason is that OpenSSL may be built
with a very old compiler, so the assembler might not know the standard rvv 1.0 [2]
and the other vector crypto [3] instructions. That's why we use opcodes for all
vector instructions. IMO, I would prefer to use opcodes only for the vector crypto
instructions, since gcc-12 and clang-14 already support rvv 1.0. Actually, when
reviewing the OpenSSL PR I just read the perl files rather than the generated
opcodes, and it's not hard to read the perl code.


Thanks,
- Jerry

[1]
https://github.com/openssl/openssl/pull/20149#discussion_r1244655440
[2]
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
[3]
https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-10-06 21:01             ` He-Jie Shih
@ 2023-10-06 23:33               ` Ard Biesheuvel
  -1 siblings, 0 replies; 100+ messages in thread
From: Ard Biesheuvel @ 2023-10-06 23:33 UTC (permalink / raw)
  To: 20231006194741.GA68531
  Cc: Eric Biggers, Charlie Jenkins, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, Heiko Stuebner

On Fri, 6 Oct 2023 at 23:01, He-Jie Shih <bignose1007@gmail.com> wrote:
>
> On Oct 7, 2023, at 03:47, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> >> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@gmail.com> wrote:
> >> The OpenSSL PR is at [1].
> >> And we are from SiFive.
> >>
> >> -Jerry
> >>
> >> [1]
> >> https://github.com/openssl/openssl/pull/21923
> >
> > Hi Jerry, I'm wondering if you have an update on this?  Do you need any help?
>
> We have specialized aes-cbc/ecb/ctr patch locally and pass the `testmgr` test
> cases. But the test patterns in `testmgr` are quite simple, I think it doesn't test the
> corner case(e.g. aes-xts with tail element).
>

There should be test cases for that.

> For aes-xts, I'm trying to update the implementation from OpenSSL. The design
> philosophy is different between OpenSSL and linux. In linux crypto, the data will
> be split into `scatterlist`. I need to preserve the aes-xts's iv for each scatterlist
> entry call.

Yes, this applies to all block ciphers that take an IV.

> And I'm thinking about how to handle the tail data in a simple way.

The RISC-V vector ISA is quite advanced, so there may be a better
trick using predicates etc., but otherwise I suppose you could reuse
the same trick that other asm implementations use, which is to use
unaligned loads and stores for the final blocks, and to use a vector
permute with a permute table to shift the bytes in the registers. But
this is not performance critical, given that existing in-kernel users
use sector- or page-size inputs only.

> By the way, the `xts(aes)` implementations for arm and x86 use
> `cra_blocksize = AES_BLOCK_SIZE`. I don't know why we need to handle the tail
> element. I think we will hit an `EINVAL` error in `skcipher_walk_next()` if
> the data size is not a multiple of the block size.
>

No, both XTS and CBC-CTS permit inputs that are not a multiple of the
block size, and will use some form of ciphertext stealing for the
final tail. There is a generic CTS template that wraps CBC, but
combining them in the same way (e.g., using a vector permute) will
speed things up substantially. *However*, I'm not sure how relevant
CBC-CTS is in the kernel: only fscrypt uses it IIRC, and it actually
prefers something else, so for new systems perhaps you shouldn't
bother.

> Overall, we will have
> 1) aes cipher
> 2) aes with cbc/ecb/ctr/xts mode
> 3) sha256/512 for `vlen>=128` platform
> 4) sm3 for `vlen>=128` platform
> 5) sm4
> 6) ghash
> 7) `chacha20` stream cipher
>
> The vector crypto pr in OpenSSL is under reviewing, we are still updating the
> perl file into linux.
>
> The most complicated `gcm(aes)` mode will be in our next plan.
>
> > I'm also wondering about riscv.pm and the choice of generating the crypto
> > instructions from .words instead of using the assembler.  It makes it
> > significantly harder to review the code, IMO.  Can we depend on assembler
> > support for these instructions, or is that just not ready yet?
>
> I have asked the same question before[1]. The reason is that Openssl could use
> very old compiler for compiling. Thus, the assembler might not know the standard
> rvv 1.0[2] and other vector crypto[3] instructions. That's why we use opcode for all
> vector instructions. IMO, I would prefer to use opcode for `vector crypto` only. The
> gcc-12 and clang-14 are already supporting rvv 1.0. Actually, I just read the `perl`
> file instead of the actually generated opcode for OpenSSL pr reviewing. And it's
> not hard to read the perl code.
>

I understand the desire to reuse code, and OpenSSL already relies on
so-called perlasm for this, but I think this is not a great choice,
and I actually think this was a mistake for RISC-V. OpenSSL relies on
perlasm for things like emitting different function pro-/epilogues
depending on the calling convention (SysV versus MS on x86_64, for
instance), but RISC-V does not have that much variety, and already
supports the insn_r / insn_i pseudo instructions to emit arbitrary
opcodes while still supporting named registers as usual. (Maybe my
experience does not quite extrapolate to the vector ISA, but I managed
to implement scalar AES [0] using the insn_r and insn_i pseudo
instructions, which are generally provided by the assembler but Linux
has fallback CPP macros for them as well, and this results in much
more maintainable code IMO.)
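
A rough illustration of that approach (not code from [0]; the opcode/funct
constants below are placeholders that would have to be taken from the spec):

    /*
     * Illustrative only: emit an R-type instruction via ".insn r" with
     * named register operands.  The opcode/funct3/funct7 constants here
     * are placeholders and must be checked against the relevant
     * specification before real use.
     */
    static inline unsigned long insn_r_example(unsigned long rs1,
                                               unsigned long rs2)
    {
            unsigned long rd;

            asm(".insn r 0x33, 0x0, 0x19, %0, %1, %2"
                : "=r" (rd)
                : "r" (rs1), "r" (rs2));
            return rd;
    }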

We are using some of the OpenSSL perlasm in the kernel already (and
some of it was introduced by me), but I don't think we should blindly
reuse all of the RISC-V code if some of it can straightforwardly be
written as normal .S files.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=riscv-scalar-aes

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-10-06 21:01             ` He-Jie Shih
@ 2023-10-07 21:30               ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-10-07 21:30 UTC (permalink / raw)
  To: He-Jie Shih
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Sat, Oct 07, 2023 at 05:01:45AM +0800, He-Jie Shih wrote:
> Reply-To: 20231006194741.GA68531@google.com
> X-Mailer: Apple Mail (2.3445.9.7)

Can you please fix your email client configuration?  Your emails have a bogus
Reply-To header, which makes replies not be sent to you by default.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-10-06 23:33               ` Ard Biesheuvel
@ 2023-10-07 22:16                 ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-10-07 22:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: He-Jie Shih, Charlie Jenkins, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, christoph.muellner, Heiko Stuebner

On Sat, Oct 07, 2023 at 01:33:48AM +0200, Ard Biesheuvel wrote:
> On Fri, 6 Oct 2023 at 23:01, He-Jie Shih <bignose1007@gmail.com> wrote:
> >
> > On Oct 7, 2023, at 03:47, Eric Biggers <ebiggers@kernel.org> wrote:
> > > On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> > >> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@gmail.com> wrote:
> > >> The OpenSSL PR is at [1].
> > >> And we are from SiFive.
> > >>
> > >> -Jerry
> > >>
> > >> [1]
> > >> https://github.com/openssl/openssl/pull/21923
> > >
> > > Hi Jerry, I'm wondering if you have an update on this?  Do you need any help?
> >
> > We have specialized aes-cbc/ecb/ctr patch locally and pass the `testmgr` test
> > cases. But the test patterns in `testmgr` are quite simple, I think it doesn't test the
> > corner case(e.g. aes-xts with tail element).
> >
> 
> There should be test cases for that.

Yes, non-block-aligned AES-XTS test vectors should be added to crypto/testmgr.h.
Though, that case should already be covered by the randomized tests enabled by
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y, which I very strongly recommend enabling
during development.
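
For reference, roughly this .config fragment enables them
(CRYPTO_MANAGER_EXTRA_TESTS depends on DEBUG_KERNEL and on the self-tests not
being disabled):

    # .config fragment for enabling the extra (fuzz) crypto self-tests
    CONFIG_DEBUG_KERNEL=y
    # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
    CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y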

> 
> > For aes-xts, I'm trying to update the implementation from OpenSSL. The design
> > philosophy is different between OpenSSL and linux. In linux crypto, the data will
> > be split into `scatterlist`. I need to preserve the aes-xts's iv for each scatterlist
> > entry call.
> 
> Yes, this applies to all block ciphers that take an IV.
> 
> > And I'm thinking about how to handle the tail data in a simple way.
> 
> The RISC-V vector ISA is quite advanced, so there may be a better
> trick using predicates etc but otherwise, I suppose you could reuse
> the same trick that other asm implementations use, which is to use
> unaligned loads and stores for the final blocks, and to use a vector
> permute with a permute table to shift the bytes in the registers. But
> this is not performance critical, given that existing in-kernel users
> use sector or page size inputs only.
> 
> > By the way, the `xts(aes)` implementation for arm and x86 are using
> > `cra_blocksize= AES_BLOCK_SIZE`. I don't know why we need to handle the tail
> > element. I think we will hit `EINVAL` error in `skcipher_walk_next()` if the data size
> > it not be a multiple of block size.
> >
> 
> No, both XTS and CBC-CTS permit inputs that are not a multiple of the
> block size, and will use some form of ciphertext stealing for the
> final tail. There is a generic CTS template that wraps CBC but
> combining them in the same way (e.g., using vector permute) will speed
> up things substantially. *However*, I'm not sure how relevant CBC-CTS
> is in the kernel, given that only fscrypt uses it IIRC but actually
> prefers something else so for new systems perhaps you shouldn't
> bother.
> 
> > Overall, we will have
> > 1) aes cipher
> > 2) aes with cbc/ecb/ctr/xts mode
> > 3) sha256/512 for `vlen>=128` platform
> > 4) sm3 for `vlen>=128` platform
> > 5) sm4
> > 6) ghash
> > 7) `chacha20` stream cipher
> >
> > The vector crypto pr in OpenSSL is under reviewing, we are still updating the
> > perl file into linux.
> >
> > The most complicated `gcm(aes)` mode will be in our next plan.
> >
> > > I'm also wondering about riscv.pm and the choice of generating the crypto
> > > instructions from .words instead of using the assembler.  It makes it
> > > significantly harder to review the code, IMO.  Can we depend on assembler
> > > support for these instructions, or is that just not ready yet?
> >
> > I have asked the same question before[1]. The reason is that Openssl could use
> > very old compiler for compiling. Thus, the assembler might not know the standard
> > rvv 1.0[2] and other vector crypto[3] instructions. That's why we use opcode for all
> > vector instructions. IMO, I would prefer to use opcode for `vector crypto` only. The
> > gcc-12 and clang-14 are already supporting rvv 1.0. Actually, I just read the `perl`
> > file instead of the actually generated opcode for OpenSSL pr reviewing. And it's
> > not hard to read the perl code.
> >
> 
> I understand the desire to reuse code, and OpenSSL already relies on
> so-called perlasm for this, but I think this is not a great choice,
> and I actually think this was a mistake for RISC-V. OpenSSL relies on
> perlasm for things like emittting different function pro-/epilogues
> depending on the calling convention (SysV versus MS on x86_64, for
> instance), but RISC-V does not have that much variety, and already
> supports the insn_r / insn_i pseudo instructions to emit arbitrary
> opcodes while still supporting named registers as usual. [Maybe my
> experience does not quite extrapolate to the vector ISA, but I managed
> to implement scalar AES [0] using the insn_r and insn_i pseudo
> instructions (which are generally provided by the assembler but Linux
> has fallback CPP macros for them as well), and this results on much
> more maintainable code IMO.[
> 
> We are using some of the OpenSSL perlasm in the kernel already (and
> some of it was introduced by me) but I don't think we should blindly
> reuse all of the RISC-V code if  some of it can straight-forwardly be
> written as normal .S files.
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=riscv-scalar-aes

I'm not a huge fan of perlasm either.  Normal .S files can be much easier to
understand, and they do still support basic features like macros.  (Of course,
this only works if the .S file is the real source code.  If the real source code
is the perlasm, dumping it to a .S file doesn't make it more readable.)

Ultimately, it needs to be decided on an algorithm-by-algorithm basis whether it
makes sense to use the .pl file directly from OpenSSL or write a normal .S file.
Sharing code can save time, but it can also waste time if/when things don't
match up and need to be changed for the kernel anyway.  If you look at the other
architectures, sharing the OpenSSL .pl file is most common for Poly1305 and
SHA-2.  It's rarer for AES modes.

In any case, regardless of .pl or .S, it would be nice to rely on the assembler
for the mapping from readable instruction to 32-bit words.  Yes, I understand
that the algorithm code reads mostly the same either way, but it introduces
nonstandard notation (e.g. due to having to avoid the period character) and a
possibility for error.  It's not the 1940s anymore; we should be able to have an
assembler.  Why not make OpenSSL and Linux only enable this code when the
assembler supports it?  Note that Linux already does this for many of the x86
extensions, so there is precedent for this; see arch/x86/Kconfig.assembler.
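
For illustration, such a gate could look roughly like this; the symbol name and
the probed extension below are made up for the sketch, not taken from any
posted patch:

    # Sketch of an assembler-capability gate in the style of
    # arch/x86/Kconfig.assembler (illustrative names, not an actual patch)
    config AS_HAS_VECTOR_CRYPTO
            def_bool $(as-instr, .option arch$(comma) +v$(comma) +zvkned)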

- Eric 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-10-06 19:47           ` Eric Biggers
@ 2023-10-31  2:17             ` Jerry Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: Jerry Shih @ 2023-10-31  2:17 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Oct 7, 2023, at 03:47, Eric Biggers <ebiggers@kernel.org> wrote:
> On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
>> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@gmail.com> wrote:
>> 
>> The OpenSSL PR is at [1].
>> And we are from SiFive.
>> 
>> -Jerry
>> 
>> [1]
>> https://github.com/openssl/openssl/pull/21923
> 
> Hi Jerry, I'm wondering if you have an update on this?  Do you need any help?

The RISC-V vector crypto OpenSSL PR[1] has been merged.
And we have also sent out the vector-crypto patch set based on Heiko's and
OpenSSL's work.
Here is the link:
https://lore.kernel.org/all/20231025183644.8735-1-jerry.shih@sifive.com/

[1]
https://github.com/openssl/openssl/pull/21923

> I'm also wondering about riscv.pm and the choice of generating the crypto
> instructions from .words instead of using the assembler.  It makes it
> significantly harder to review the code, IMO.  Can we depend on assembler
> support for these instructions, or is that just not ready yet?
> 
> - Eric

There is no public assembler that supports the vector-crypto asm mnemonics,
so we still have to use raw `opcode`s for the vector-crypto instructions. But
we might use asm mnemonics for the standard rvv parts.
In order to reuse the code in OpenSSL as much as possible, we still use
`riscv.pm` for all standard rvv and vector-crypto instructions. If the asm
mnemonics are still the better approach, I will `rewrite` all standard rvv
parts with asm mnemonics in the next patch.

-Jerry



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-10-31  2:17             ` Jerry Shih
@ 2023-11-02  4:03               ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-11-02  4:03 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

Hi Jerry,

(Just so you know, you still need to fix your email configuration.  Your emails
have a bogus Reply-To header, which makes replies not be sent to you by default.
I had to manually set the "To:" address when replying.)

On Tue, Oct 31, 2023 at 10:17:11AM +0800, Jerry Shih wrote:
> 
> The RISC-V vector crypto OpenSSL pr[1] is merged.
> And we also sent the vector-crypto patch based on Heiko's and OpenSSL
> works.
> Here is the link:
> https://lore.kernel.org/all/20231025183644.8735-1-jerry.shih@sifive.com/
> 
> [1]
> https://github.com/openssl/openssl/pull/21923

Awesome, thanks!

> 
> > I'm also wondering about riscv.pm and the choice of generating the crypto
> > instructions from .words instead of using the assembler.  It makes it
> > significantly harder to review the code, IMO.  Can we depend on assembler
> > support for these instructions, or is that just not ready yet?
> > 
> > - Eric
> 
> There is no public assembler supports the vector-crypto asm mnemonics.
> We should still use `opcode` for vector-crypto instructions. But we might
> use asm for standard rvv parts.
> In order to reuse the codes in OpenSSL as much as possible,  we still use
> the `riscv.pm` for all standard rvv and vector-crypto instructions. If the asm
> mnemonic is still a better approach,  I will `rewrite` all standard rvv parts
> with asm mnemonics in next patch.

Tip-of-tree gcc + binutils seems to support them.  Building some of the sample
code from the riscv-crypto repository:

    $ riscv64-linux-gnu-as --version
    GNU assembler (GNU Binutils) 2.41.50.20231021
    $ riscv64-linux-gnu-gcc --version
    riscv64-linux-gnu-gcc (GCC) 14.0.0 20231021 (experimental)
    $ riscv64-linux-gnu-gcc -march=rv64ivzvkned -c riscv-crypto/doc/vector/code-samples/zvkned.s

And tip-of-tree clang supports them experimentally:

    $ clang --version
    clang version 18.0.0 (https://github.com/llvm/llvm-project 30416f39be326b403e19f23da387009736483119)
    $ clang -menable-experimental-extensions -target riscv64-linux-gnu -march=rv64ivzvkned1 -c riscv-crypto/doc/vector/code-samples/zvkned.s

It would be nice to use a real assembler, so that people won't have to worry
about potential mistakes or inconsistencies in the perl-based "assembler".  Also
keep in mind that if we allow people to compile this code without the real
assembler support from the beginning, it might end up staying that way for quite
a while in order to avoid breaking the build for people.

Ultimately it's up to you though; I think that you and others who have been
working on RISC-V crypto can make the best decision about what to do here.  I
also don't want this patchset to be delayed waiting for other projects, so maybe
that indeed means the perl-based "assembler" needs to be used for now.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-11-02  4:03               ` Eric Biggers
@ 2023-11-21 23:51                 ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-11-21 23:51 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
> > 
> > There is no public assembler supports the vector-crypto asm mnemonics.
> > We should still use `opcode` for vector-crypto instructions. But we might
> > use asm for standard rvv parts.
> > In order to reuse the codes in OpenSSL as much as possible,  we still use
> > the `riscv.pm` for all standard rvv and vector-crypto instructions. If the asm
> > mnemonic is still a better approach,  I will `rewrite` all standard rvv parts
> > with asm mnemonics in next patch.
> 
> Tip-of-tree gcc + binutils seems to support them.  Building some of the sample
> code from the riscv-crypto repository:
> 
>     $ riscv64-linux-gnu-as --version
>     GNU assembler (GNU Binutils) 2.41.50.20231021
>     $ riscv64-linux-gnu-gcc --version
>     riscv64-linux-gnu-gcc (GCC) 14.0.0 20231021 (experimental)
>     $ riscv64-linux-gnu-gcc -march=rv64ivzvkned -c riscv-crypto/doc/vector/code-samples/zvkned.s
> 
> And tip-of-tree clang supports them experimentally:
> 
>     $ clang --version
>     clang version 18.0.0 (https://github.com/llvm/llvm-project 30416f39be326b403e19f23da387009736483119)
>     $ clang -menable-experimental-extensions -target riscv64-linux-gnu -march=rv64ivzvkned1 -c riscv-crypto/doc/vector/code-samples/zvkned.s
> 
> It would be nice to use a real assembler, so that people won't have to worry
> about potential mistakes or inconsistencies in the perl-based "assembler".  Also
> keep in mind that if we allow people to compile this code without the real
> assembler support from the beginning, it might end up staying that way for quite
> a while in order to avoid breaking the build for people.
> 
> Ultimately it's up to you though; I think that you and others who have been
> working on RISC-V crypto can make the best decision about what to do here.  I
> also don't want this patchset to be delayed waiting for other projects, so maybe
> that indeed means the perl-based "assembler" needs to be used for now.
> 

Just wanted to bump up this discussion again.  In binutils, the vector crypto
v1.0.0 support was released 4 months ago in 2.41.  See the NEWS file at
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch

    * The RISC-V port now supports the following new standard extensions:
      - Zicond (conditional zero instructions)
      - Zfa (additional floating-point instructions)
      - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
        Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)

That's every extension listed in the vector crypto v1.0.0 specification
(https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).

LLVM still has the vector crypto extensions marked as "experimental" extensions.
However, there is an open pull request to mark them non-experimental:
https://github.com/llvm/llvm-project/pull/69000

Could we just go ahead and remove riscv.pm, develop with binutils for now, and
prioritize getting the LLVM support finished?

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-11-21 23:51                 ` Eric Biggers
@ 2023-11-22  7:58                   ` Jerry Shih
  -1 siblings, 0 replies; 100+ messages in thread
From: Jerry Shih @ 2023-11-22  7:58 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Nov 22, 2023, at 07:51, Eric Biggers <ebiggers@kernel.org> wrote:
> On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
>> 
>> It would be nice to use a real assembler, so that people won't have to worry
>> about potential mistakes or inconsistencies in the perl-based "assembler".  Also
>> keep in mind that if we allow people to compile this code without the real
>> assembler support from the beginning, it might end up staying that way for quite
>> a while in order to avoid breaking the build for people.
>> 
>> Ultimately it's up to you though; I think that you and others who have been
>> working on RISC-V crypto can make the best decision about what to do here.  I
>> also don't want this patchset to be delayed waiting for other projects, so maybe
>> that indeed means the perl-based "assembler" needs to be used for now.
> 
> Just wanted to bump up this discussion again.  In binutils, the vector crypto
> v1.0.0 support was released 4 months ago in 2.41.  See the NEWS file at
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch
> 
>    * The RISC-V port now supports the following new standard extensions:
>      - Zicond (conditional zero instructions)
>      - Zfa (additional floating-point instructions)
>      - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
>        Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)
> 
> That's every extension listed in the vector crypto v1.0.0 specification
> (https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).

It doesn't cover the whole v1.0 spec.
`Zvkb` is missing in binutils. It's a subset of `Zvbb`, so we need some extra
work if a user only wants to enable `Zvkb`.
https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkb.adoc
Some crypto algorithms already check for `Zvkb` instead of `Zvbb`.

> LLVM still has the vector crypto extensions marked as "experimental" extensions.
> However, there is an open pull request to mark them non-experimental:
> https://github.com/llvm/llvm-project/pull/69000
> 
> Could we just go ahead and remove riscv.pm, develop with binutils for now, and
> prioritize getting the LLVM support finished?

Yes, we could.
But we need to handle the toolchain checks for these extensions, like:
https://github.com/torvalds/linux/commit/b6fcdb191e36f82336f9b5e126d51c02e7323480
I could do that, but I think I need some time to test the builds. And it will
introduce more dependency patches which are not related to the actual crypto
algorithms and the glue code in the kernel. I will send a separate patch for
the toolchain part after the v2 patch.
I am working on the v2 patch with your new review comments; v2 will still use
`perlasm`.
And we could move this discussion to this thread:
https://lore.kernel.org/all/20231025183644.8735-1-jerry.shih@sifive.com/

-Jerry

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-11-22  7:58                   ` Jerry Shih
@ 2023-11-22 23:42                     ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-11-22 23:42 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Wed, Nov 22, 2023 at 03:58:17PM +0800, Jerry Shih wrote:
> On Nov 22, 2023, at 07:51, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
> >> 
> >> It would be nice to use a real assembler, so that people won't have to worry
> >> about potential mistakes or inconsistencies in the perl-based "assembler".  Also
> >> keep in mind that if we allow people to compile this code without the real
> >> assembler support from the beginning, it might end up staying that way for quite
> >> a while in order to avoid breaking the build for people.
> >> 
> >> Ultimately it's up to you though; I think that you and others who have been
> >> working on RISC-V crypto can make the best decision about what to do here.  I
> >> also don't want this patchset to be delayed waiting for other projects, so maybe
> >> that indeed means the perl-based "assembler" needs to be used for now.
> > 
> > Just wanted to bump up this discussion again.  In binutils, the vector crypto
> > v1.0.0 support was released 4 months ago in 2.41.  See the NEWS file at
> > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch
> > 
> >    * The RISC-V port now supports the following new standard extensions:
> >      - Zicond (conditional zero instructions)
> >      - Zfa (additional floating-point instructions)
> >      - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
> >        Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)
> > 
> > That's every extension listed in the vector crypto v1.0.0 specification
> > (https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).
> 
> It doesn't fit all v1.0 spec.
> The `Zvkb` is missed in binutils. It's the subset of `Zvbb`. We needs some extra
> works if user just try to use `Zvkb`.
> https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkb.adoc
> Some crypto algorithms are already checking for `Zvkb` instead of `Zvbb`.

Yeah, that's unfortunate that Zvkb got missed in binutils.  However, since all
Zvkb instructions are part of Zvbb, which is supported, assembling Zvkb
instructions should still work --- right?

> > LLVM still has the vector crypto extensions marked as "experimental" extensions.
> > However, there is an open pull request to mark them non-experimental:
> > https://github.com/llvm/llvm-project/pull/69000
> > 
> > Could we just go ahead and remove riscv.pm, develop with binutils for now, and
> > prioritize getting the LLVM support finished?
> 
> Yes, we could.
> But we need to handle the extensions checking for toolchains like:
> https://github.com/torvalds/linux/commit/b6fcdb191e36f82336f9b5e126d51c02e7323480
> I could do that, but I think I need some times to test the builds. And it will introduce
> more dependency patch which is not related to actual crypto algorithms and the
> gluing code in kernel. I will send another patch for toolchain part after the v2 patch.
> And I am working for v2 patch with your new review comments. The v2 will still
> use `perlasm`.

Note that perlasm (.pl) vs assembly (.S), and ".inst" vs real assembler
instructions, are actually separate concerns.  We could use real assembler
instructions while still using perlasm.  Or we could use assembly while still
using macros that generate the instructions as .inst.

My preference is indeed both: assembly (.S) with real assembler instructions.
That would keep things more straightforward.

We do not necessarily need to do both before merging the code, though.  It will
be beneficial to get this code merged sooner rather than later, so that other
people can work on improving it.

I would prioritize the change to use real assembler instructions.  I do think
it's worth getting that change in from the beginning, so that the toolchain
prerequisites are properly in place from the start and people can account for
them and prioritize the toolchain work as needed.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
@ 2023-11-22 23:42                     ` Eric Biggers
  0 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-11-22 23:42 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Charlie Jenkins, Heiko Stuebner, palmer, paul.walmsley, aou,
	herbert, davem, conor.dooley, linux-riscv, linux-kernel,
	linux-crypto, christoph.muellner, Heiko Stuebner

On Wed, Nov 22, 2023 at 03:58:17PM +0800, Jerry Shih wrote:
> On Nov 22, 2023, at 07:51, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
> >> 
> >> It would be nice to use a real assembler, so that people won't have to worry
> >> about potential mistakes or inconsistencies in the perl-based "assembler".  Also
> >> keep in mind that if we allow people to compile this code without the real
> >> assembler support from the beginning, it might end up staying that way for quite
> >> a while in order to avoid breaking the build for people.
> >> 
> >> Ultimately it's up to you though; I think that you and others who have been
> >> working on RISC-V crypto can make the best decision about what to do here.  I
> >> also don't want this patchset to be delayed waiting for other projects, so maybe
> >> that indeed means the perl-based "assembler" needs to be used for now.
> > 
> > Just wanted to bump up this discussion again.  In binutils, the vector crypto
> > v1.0.0 support was released 4 months ago in 2.41.  See the NEWS file at
> > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch
> > 
> >    * The RISC-V port now supports the following new standard extensions:
> >      - Zicond (conditional zero instructions)
> >      - Zfa (additional floating-point instructions)
> >      - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
> >        Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)
> > 
> > That's every extension listed in the vector crypto v1.0.0 specification
> > (https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).
> 
> It doesn't cover the full v1.0 spec.
> The `Zvkb` extension is missing from binutils. It's a subset of `Zvbb`. We need some extra
> work if a user just tries to use `Zvkb`.
> https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkb.adoc
> Some crypto algorithms already check for `Zvkb` instead of `Zvbb`.

Yeah, that's unfortunate that Zvkb got missed in binutils.  However, since all
Zvkb instructions are part of Zvbb, which is supported, assembling Zvkb
instructions should still work --- right?
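
A quick way to check would be a tiny probe file along these lines (the file
name, the toolchain prefix and the exact -march string are only illustrative;
this assumes a binutils 2.41 assembler and a base -march that already includes
the vector extension):

	# zvkb-probe.S: all of the mnemonics below belong to Zvkb, which is a
	# subset of Zvbb, so an assembler that knows Zvbb should accept them
	# even though it has no dedicated "zvkb" arch string yet.
	# Assemble e.g. with:  riscv64-linux-gnu-as -march=rv64gcv_zvbb zvkb-probe.S
	.text
	vandn.vv	v1, v2, v3	# bitwise and-not
	vbrev8.v	v4, v5		# reverse the bits within each byte
	vror.vi		v6, v7, 8	# rotate right by an immediate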

> > LLVM still has the vector crypto extensions marked as "experimental" extensions.
> > However, there is an open pull request to mark them non-experimental:
> > https://github.com/llvm/llvm-project/pull/69000
> > 
> > Could we just go ahead and remove riscv.pm, develop with binutils for now, and
> > prioritize getting the LLVM support finished?
> 
> Yes, we could.
> But we need to handle the extension checks for toolchains, like:
> https://github.com/torvalds/linux/commit/b6fcdb191e36f82336f9b5e126d51c02e7323480
> I could do that, but I think I need some time to test the builds. And it will introduce
> more dependency patches which are not related to the actual crypto algorithms and the
> glue code in the kernel. I will send another patch for the toolchain part after the v2 patch.
> And I am working on the v2 patch with your new review comments. The v2 will still
> use `perlasm`.

Note that perlasm (.pl) vs assembly (.S), and ".inst" vs real assembler
instructions, are actually separate concerns.  We could use real assembler
instructions while still using perlasm.  Or we could use assembly while still
using macros that generate the instructions as .inst.

My preference is indeed both: assembly (.S) with real assembler instructions.
That would keep things more straightforward.
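
To make the ".inst" vs real-instruction distinction concrete, here is a minimal
sketch in plain GNU assembler syntax (`nop` just stands in for a vector-crypto
mnemonic such as vghsh.vv; the raw word below is simply the well-known encoding
of nop):

	.text
	# Raw-encoding route, roughly what the perlasm helpers do today: any
	# assembler accepts it, but nothing checks the operands and the
	# disassembly is opaque.
	.word	0x00000013		# encoding of "addi x0, x0, 0", i.e. nop

	# Real-mnemonic route: the assembler has to know the instruction
	# (binutils >= 2.41 for the Zvk* mnemonics), but it then does the
	# encoding and the operand checking for us.
	nop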

We do not necessarily need to do both before merging the code, though.  It will
be beneficial to get this code merged sooner rather than later, so that other
people can work on improving it.

I would prioritize the change to use real assembler instructions.  I do think
it's worth trying to get that change in from the beginning, so that the
toolchain prerequisites are in place early on and people can properly account
for them and prioritize the toolchain work as needed.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-11-22 23:42                     ` Eric Biggers
@ 2023-11-23  0:36                       ` Christoph Müllner
  -1 siblings, 0 replies; 100+ messages in thread
From: Christoph Müllner @ 2023-11-23  0:36 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jerry Shih, Charlie Jenkins, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, Heiko Stuebner

On Thu, Nov 23, 2023 at 12:43 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Wed, Nov 22, 2023 at 03:58:17PM +0800, Jerry Shih wrote:
> > On Nov 22, 2023, at 07:51, Eric Biggers <ebiggers@kernel.org> wrote:
> > > On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
> > >>
> > >> It would be nice to use a real assembler, so that people won't have to worry
> > >> about potential mistakes or inconsistencies in the perl-based "assembler".  Also
> > >> keep in mind that if we allow people to compile this code without the real
> > >> assembler support from the beginning, it might end up staying that way for quite
> > >> a while in order to avoid breaking the build for people.
> > >>
> > >> Ultimately it's up to you though; I think that you and others who have been
> > >> working on RISC-V crypto can make the best decision about what to do here.  I
> > >> also don't want this patchset to be delayed waiting for other projects, so maybe
> > >> that indeed means the perl-based "assembler" needs to be used for now.
> > >
> > > Just wanted to bump up this discussion again.  In binutils, the vector crypto
> > > v1.0.0 support was released 4 months ago in 2.41.  See the NEWS file at
> > > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch
> > >
> > >    * The RISC-V port now supports the following new standard extensions:
> > >      - Zicond (conditional zero instructions)
> > >      - Zfa (additional floating-point instructions)
> > >      - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
> > >        Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)
> > >
> > > That's every extension listed in the vector crypto v1.0.0 specification
> > > (https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).
> >
> > It doesn't cover the full v1.0 spec.
> > The `Zvkb` extension is missing from binutils. It's a subset of `Zvbb`. We need some extra
> > work if a user just tries to use `Zvkb`.
> > https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkb.adoc
> > Some crypto algorithms already check for `Zvkb` instead of `Zvbb`.
>
> Yeah, that's unfortunate that Zvkb got missed in binutils.  However, since all
> Zvkb instructions are part of Zvbb, which is supported, assembling Zvkb
> instructions should still work --- right?

Not forgotten, but the Zvkb extension did not exist when the patchset
was merged.
RISC-V extension support is typically merged when specifications are "frozen".
This means a high bar for changes, but they are possible until the
spec is ratified.
Often nothing is changed until ratification, but here Zvkb has been
(re-)introduced.

I was not aware of this until I read this email, so I just wrote a
patch that fills the gap:
  https://sourceware.org/pipermail/binutils/2023-November/130762.html

Thanks for reporting!

BR
Christoph

>
> > > LLVM still has the vector crypto extensions marked as "experimental" extensions.
> > > However, there is an open pull request to mark them non-experimental:
> > > https://github.com/llvm/llvm-project/pull/69000
> > >
> > > Could we just go ahead and remove riscv.pm, develop with binutils for now, and
> > > prioritize getting the LLVM support finished?
> >
> > Yes, we could.
> > But we need to handle the extension checks for toolchains, like:
> > https://github.com/torvalds/linux/commit/b6fcdb191e36f82336f9b5e126d51c02e7323480
> > I could do that, but I think I need some time to test the builds. And it will introduce
> > more dependency patches which are not related to the actual crypto algorithms and the
> > glue code in the kernel. I will send another patch for the toolchain part after the v2 patch.
> > And I am working on the v2 patch with your new review comments. The v2 will still
> > use `perlasm`.
>
> Note that perlasm (.pl) vs assembly (.S), and ".inst" vs real assembler
> instructions, are actually separate concerns.  We could use real assembler
> instructions while still using perlasm.  Or we could use assembly while still
> using macros that generate the instructions as .inst.
>
> My preference is indeed both: assembly (.S) with real assembler instructions.
> That would keep things more straightforward.
>
> We do not necessarily need to do both before merging the code, though.  It will
> be beneficial to get this code merged sooner rather than later, so that other
> people can work on improving it.
>
> I would prioritize the change to use real assembler instructions.  I do think
> it's worth trying to get that change in from the beginning, so that the
> toolchain prerequisites are in place early on and people can properly account
> for them and prioritize the toolchain work as needed.
>
> - Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations
  2023-11-23  0:36                       ` Christoph Müllner
@ 2023-11-28 20:19                         ` Eric Biggers
  -1 siblings, 0 replies; 100+ messages in thread
From: Eric Biggers @ 2023-11-28 20:19 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Jerry Shih, Charlie Jenkins, Heiko Stuebner, palmer,
	paul.walmsley, aou, herbert, davem, conor.dooley, linux-riscv,
	linux-kernel, linux-crypto, Heiko Stuebner

On Thu, Nov 23, 2023 at 01:36:34AM +0100, Christoph Müllner wrote:
> On Thu, Nov 23, 2023 at 12:43 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Wed, Nov 22, 2023 at 03:58:17PM +0800, Jerry Shih wrote:
> > > On Nov 22, 2023, at 07:51, Eric Biggers <ebiggers@kernel.org> wrote:
> > > > On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
> > > >>
> > > >> It would be nice to use a real assembler, so that people won't have to worry
> > > >> about potential mistakes or inconsistencies in the perl-based "assembler".  Also
> > > >> keep in mind that if we allow people to compile this code without the real
> > > >> assembler support from the beginning, it might end up staying that way for quite
> > > >> a while in order to avoid breaking the build for people.
> > > >>
> > > >> Ultimately it's up to you though; I think that you and others who have been
> > > >> working on RISC-V crypto can make the best decision about what to do here.  I
> > > >> also don't want this patchset to be delayed waiting for other projects, so maybe
> > > >> that indeed means the perl-based "assembler" needs to be used for now.
> > > >
> > > > Just wanted to bump up this discussion again.  In binutils, the vector crypto
> > > > v1.0.0 support was released 4 months ago in 2.41.  See the NEWS file at
> > > > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch
> > > >
> > > >    * The RISC-V port now supports the following new standard extensions:
> > > >      - Zicond (conditional zero instructions)
> > > >      - Zfa (additional floating-point instructions)
> > > >      - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
> > > >        Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)
> > > >
> > > > That's every extension listed in the vector crypto v1.0.0 specification
> > > > (https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).
> > >
> > > It doesn't cover the full v1.0 spec.
> > > The `Zvkb` extension is missing from binutils. It's a subset of `Zvbb`. We need some extra
> > > work if a user just tries to use `Zvkb`.
> > > https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkb.adoc
> > > Some crypto algorithms already check for `Zvkb` instead of `Zvbb`.
> >
> > Yeah, that's unfortunate that Zvkb got missed in binutils.  However, since all
> > Zvkb instructions are part of Zvbb, which is supported, assembling Zvkb
> > instructions should still work --- right?
> 
> Not forgotten, but the Zvkb extension did not exist when the patchset
> was merged.
> RISC-V extension support is typically merged when specifications are "frozen".
> This means a high bar for changes, but they are possible until the
> spec is ratified.
> Often nothing is changed until ratification, but here Zvkb has been
> (re-)introduced.
> 
> I was not aware of this until I read this email, so I just wrote a
> patch that fills the gap:
>   https://sourceware.org/pipermail/binutils/2023-November/130762.html
> 

Thanks Christoph!  That binutils patch looks good to me.

- Eric

^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2023-11-28 20:19 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-11 15:37 [PATCH v4 00/12] RISC-V: support some cryptography accelerations Heiko Stuebner
2023-07-11 15:37 ` Heiko Stuebner
2023-07-11 15:37 ` [PATCH v4 01/12] riscv: Add support for kernel mode vector Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 17:11   ` Rémi Denis-Courmont
2023-07-11 17:11     ` Rémi Denis-Courmont
2023-07-13 17:19     ` Andy Chiu
2023-07-13 17:19       ` Andy Chiu
2023-07-11 15:37 ` [PATCH v4 02/12] riscv: Add vector extension XOR implementation Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 17:33   ` Rémi Denis-Courmont
2023-07-11 17:33     ` Rémi Denis-Courmont
2023-07-11 15:37 ` [PATCH v4 03/12] RISC-V: add helper function to read the vector VLEN Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 18:06   ` Rémi Denis-Courmont
2023-07-11 18:06     ` Rémi Denis-Courmont
2023-07-11 15:37 ` [PATCH v4 04/12] RISC-V: add vector crypto extension detection Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-12 10:40   ` Anup Patel
2023-07-12 10:40     ` Anup Patel
2023-07-18 14:55   ` Conor Dooley
2023-07-18 14:55     ` Conor Dooley
2023-07-21  5:48   ` Eric Biggers
2023-07-21  5:48     ` Eric Biggers
2023-07-11 15:37 ` [PATCH v4 05/12] RISC-V: crypto: update perl include with helpers for vector (crypto) instructions Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 18:04   ` Rémi Denis-Courmont
2023-07-11 18:04     ` Rémi Denis-Courmont
2023-07-11 15:37 ` [PATCH v4 06/12] RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-08-10  9:57   ` Andy Chiu
2023-08-10  9:57     ` Andy Chiu
2023-07-11 15:37 ` [PATCH v4 07/12] RISC-V: crypto: add Zvkg " Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 15:37 ` [PATCH v4 08/12] RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-21  4:42   ` Eric Biggers
2023-07-21  4:42     ` Eric Biggers
2023-07-11 15:37 ` [PATCH v4 09/12] RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 15:37 ` [PATCH v4 10/12] RISC-V: crypto: add Zvkned accelerated AES encryption implementation Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-21  5:40   ` Eric Biggers
2023-07-21  5:40     ` Eric Biggers
2023-07-21 11:39     ` Ard Biesheuvel
2023-07-21 11:39       ` Ard Biesheuvel
2023-07-21 14:23       ` Ard Biesheuvel
2023-07-21 14:23         ` Ard Biesheuvel
2023-09-11 13:06     ` Jerry Shih
2023-09-11 13:06       ` Jerry Shih
2023-09-12  7:04       ` Ard Biesheuvel
2023-09-12  7:04         ` Ard Biesheuvel
2023-09-12  7:15         ` Jerry Shih
2023-09-12  7:15           ` Jerry Shih
2023-09-15  1:28           ` He-Jie Shih
2023-09-15  1:28             ` He-Jie Shih
2023-07-11 15:37 ` [PATCH v4 11/12] RISC-V: crypto: add Zvksed accelerated SM4 " Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-11 15:37 ` [PATCH v4 12/12] RISC-V: crypto: add Zvksh accelerated SM3 hash implementation Heiko Stuebner
2023-07-11 15:37   ` Heiko Stuebner
2023-07-13  7:40 ` [PATCH v4 00/12] RISC-V: support some cryptography accelerations Eric Biggers
2023-07-13  7:40   ` Eric Biggers
2023-07-14  6:27   ` Eric Biggers
2023-07-14  6:27     ` Eric Biggers
2023-07-14  7:02     ` Heiko Stuebner
2023-07-14  7:02       ` Heiko Stuebner
2023-07-21  5:12 ` Eric Biggers
2023-07-21  5:12   ` Eric Biggers
2023-09-14  0:11 ` Eric Biggers
2023-09-14  0:11   ` Eric Biggers
2023-09-14  1:10   ` Charlie Jenkins
2023-09-14  1:10     ` Charlie Jenkins
2023-09-15  1:48     ` He-Jie Shih
2023-09-15  1:48       ` He-Jie Shih
2023-09-15  3:21       ` Jerry Shih
2023-09-15  3:21         ` Jerry Shih
2023-10-06 19:47         ` Eric Biggers
2023-10-06 19:47           ` Eric Biggers
2023-10-06 21:01           ` He-Jie Shih
2023-10-06 21:01             ` He-Jie Shih
2023-10-06 23:33             ` Ard Biesheuvel
2023-10-06 23:33               ` Ard Biesheuvel
2023-10-07 22:16               ` Eric Biggers
2023-10-07 22:16                 ` Eric Biggers
2023-10-07 21:30             ` Eric Biggers
2023-10-07 21:30               ` Eric Biggers
2023-10-31  2:17           ` Jerry Shih
2023-10-31  2:17             ` Jerry Shih
2023-11-02  4:03             ` Eric Biggers
2023-11-02  4:03               ` Eric Biggers
2023-11-21 23:51               ` Eric Biggers
2023-11-21 23:51                 ` Eric Biggers
2023-11-22  7:58                 ` Jerry Shih
2023-11-22  7:58                   ` Jerry Shih
2023-11-22 23:42                   ` Eric Biggers
2023-11-22 23:42                     ` Eric Biggers
2023-11-23  0:36                     ` Christoph Müllner
2023-11-23  0:36                       ` Christoph Müllner
2023-11-28 20:19                       ` Eric Biggers
2023-11-28 20:19                         ` Eric Biggers
