* [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions
@ 2023-10-25 18:36 Jerry Shih
  2023-10-25 18:36 ` [PATCH 01/12] RISC-V: add helper function to read the vector VLEN Jerry Shih
                   ` (11 more replies)
  0 siblings, 12 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

This series provides cryptographic implementations using the vector crypto
extensions[1] including:
1. AES cipher
2. AES with CBC/CTR/ECB/XTS block modes
3. ChaCha20 stream cipher
4. GHASH for GCM
5. SHA-224/256 and SHA-384/512 hash
6. SM3 hash
7. SM4 cipher

This patch set is based on Heiko Stuebner's work at:
Link: https://lore.kernel.org/all/20230711153743.1970625-1-heiko@sntech.de/

The implementations reuse the perl-asm scripts from OpenSSL [2] with some
changes to adapt them to the kernel crypto framework.
The perl-asm scripts emit raw opcodes (`.word` directives) into the generated
`.S` files instead of assembler mnemonics, because the assembler does not
support the vector crypto extensions yet.
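
To illustrate the approach, here is a minimal C sketch (not part of this
series) of the encoding scheme the perl helpers implement at build time,
using vadd.vv as an example: the helpers in riscv.pm build the instruction
word from a fixed template plus the register fields and emit it as a
".word" directive.

#include <stdint.h>
#include <stdio.h>

/*
 * vadd.vv vd, vs2, vs1, vm is encoded as
 * funct6=000000 | vm | vs2 | vs1 | funct3=000 (OPIVV) | vd | opcode=1010111.
 */
static uint32_t encode_vadd_vv(uint32_t vd, uint32_t vs2, uint32_t vs1,
			       uint32_t vm)
{
	const uint32_t template = 0x00000057; /* all register fields zero */

	return template | (vm << 25) | (vs2 << 20) | (vs1 << 15) | (vd << 7);
}

int main(void)
{
	/* Prints ".word 0x022180d7", i.e. vadd.vv v1, v2, v3 (unmasked). */
	printf(".word 0x%08x\n", encode_vadd_vv(1, 2, 3, 1));
	return 0;
}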

All algorithms pass the kernel run-time crypto self-tests, including the
extra tests, under vector-crypto-enabled qemu.
Link: https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg08822.html

This series depends on:
1. kernel 6.6-rc7
Link: https://github.com/torvalds/linux/commit/05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
2. kernel-mode vector support
Link: https://lore.kernel.org/all/20230721112855.1006-1-andy.chiu@sifive.com/
3. vector crypto extensions detection
Link: https://lore.kernel.org/lkml/20231017131456.2053396-1-cleger@rivosinc.com/
4. the fix for the error message:
    alg: skcipher: skipping comparison tests for xts-aes-aesni because
    xts(ecb(aes-generic)) is unavailable
Link: https://lore.kernel.org/linux-crypto/20231009023116.266210-1-ebiggers@kernel.org/

Here is a branch on GitHub with all the dependent patches applied:
Link: https://github.com/JerryShih/linux/tree/dev/jerrys/vector-crypto-upstream

[1]
Link: https://github.com/riscv/riscv-crypto/blob/56ed7952d13eb5bdff92e2b522404668952f416d/doc/vector/riscv-crypto-spec-vector.adoc
[2]
Link: https://github.com/openssl/openssl/pull/21923

Heiko Stuebner (2):
  RISC-V: add helper function to read the vector VLEN
  RISC-V: hook new crypto subdir into build-system

Jerry Shih (10):
  RISC-V: crypto: add OpenSSL perl module for vector instructions
  RISC-V: crypto: add Zvkned accelerated AES implementation
  crypto: scatterwalk - Add scatterwalk_next() to get the next
    scatterlist in scatter_walk
  RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
  RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations
  RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
  RISC-V: crypto: add Zvksed accelerated SM4 implementation
  RISC-V: crypto: add Zvksh accelerated SM3 implementation
  RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

 arch/riscv/Kbuild                             |    1 +
 arch/riscv/crypto/Kconfig                     |  122 ++
 arch/riscv/crypto/Makefile                    |   68 +
 .../crypto/aes-riscv64-block-mode-glue.c      |  486 +++++++
 arch/riscv/crypto/aes-riscv64-glue.c          |  163 +++
 arch/riscv/crypto/aes-riscv64-glue.h          |   28 +
 .../crypto/aes-riscv64-zvbb-zvkg-zvkned.pl    |  944 +++++++++++++
 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl  |  416 ++++++
 arch/riscv/crypto/aes-riscv64-zvkned.pl       | 1198 +++++++++++++++++
 arch/riscv/crypto/chacha-riscv64-glue.c       |  120 ++
 arch/riscv/crypto/chacha-riscv64-zvkb.pl      |  322 +++++
 arch/riscv/crypto/ghash-riscv64-glue.c        |  191 +++
 arch/riscv/crypto/ghash-riscv64-zvkg.pl       |  100 ++
 arch/riscv/crypto/riscv.pm                    |  828 ++++++++++++
 arch/riscv/crypto/sha256-riscv64-glue.c       |  140 ++
 .../sha256-riscv64-zvkb-zvknha_or_zvknhb.pl   |  318 +++++
 arch/riscv/crypto/sha512-riscv64-glue.c       |  133 ++
 .../crypto/sha512-riscv64-zvkb-zvknhb.pl      |  266 ++++
 arch/riscv/crypto/sm3-riscv64-glue.c          |  121 ++
 arch/riscv/crypto/sm3-riscv64-zvksh.pl        |  230 ++++
 arch/riscv/crypto/sm4-riscv64-glue.c          |  120 ++
 arch/riscv/crypto/sm4-riscv64-zvksed.pl       |  268 ++++
 arch/riscv/include/asm/vector.h               |   11 +
 crypto/Kconfig                                |    3 +
 include/crypto/scatterwalk.h                  |    9 +-
 25 files changed, 6604 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/crypto/Kconfig
 create mode 100644 arch/riscv/crypto/Makefile
 create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
 create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl
 create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
 create mode 100644 arch/riscv/crypto/riscv.pm
 create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
 create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl
 create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
 create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

--
2.28.0



* [PATCH 01/12] RISC-V: add helper function to read the vector VLEN
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-10-25 18:36 ` [PATCH 02/12] RISC-V: hook new crypto subdir into build-system Jerry Shih
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

VLEN describes the length in bits of each vector register, and some
instructions need a specific minimum VLEN to work correctly.

The vector code already includes a variable riscv_v_vsize, filled during
boot, that holds the size of "32 vector registers of vlenb bytes". vlenb
is the value contained in the CSR_VLENB register, i.e. "VLEN / 8".

So add riscv_vector_vlen() to return the actual VLEN value (in bits) for
in-kernel users that need to check the available VLEN.
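
As a worked example: with VLEN = 128, vlenb is 16 bytes, riscv_v_vsize is
32 * 16 = 512, and riscv_vector_vlen() returns 512 / 32 * 8 = 128. A minimal,
purely illustrative sketch of how an in-kernel user might gate itself on the
available VLEN (not part of this patch):

#include <asm/vector.h>
#include <linux/errno.h>
#include <linux/module.h>

static int __init example_init(void)
{
	/* e.g. vector crypto code that operates on 128-bit blocks */
	if (riscv_vector_vlen() < 128)
		return -ENODEV;

	return 0;
}
module_init(example_init);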

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/include/asm/vector.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 9fb2dea66abd..1fd3e5510b64 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -244,4 +244,15 @@ void kernel_vector_allow_preemption(void);
 #define kernel_vector_allow_preemption()	do {} while (0)
 #endif
 
+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_v_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+	return riscv_v_vsize / 32 * 8;
+}
+
 #endif /* ! __ASM_RISCV_VECTOR_H */
-- 
2.28.0



* [PATCH 02/12] RISC-V: hook new crypto subdir into build-system
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
  2023-10-25 18:36 ` [PATCH 01/12] RISC-V: add helper function to read the vector VLEN Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-10-25 18:36 ` [PATCH 03/12] RISC-V: crypto: add OpenSSL perl module for vector instructions Jerry Shih
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Create a crypto subdirectory for the accelerated cryptography routines that
are about to be added, and hook it into the riscv Kbuild and the main crypto
Kconfig.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/Kbuild          | 1 +
 arch/riscv/crypto/Kconfig  | 5 +++++
 arch/riscv/crypto/Makefile | 4 ++++
 crypto/Kconfig             | 3 +++
 4 files changed, 13 insertions(+)
 create mode 100644 arch/riscv/crypto/Kconfig
 create mode 100644 arch/riscv/crypto/Makefile

diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f88..2c585f7a0b6e 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -2,6 +2,7 @@
 
 obj-y += kernel/ mm/ net/
 obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
+obj-$(CONFIG_CRYPTO) += crypto/
 obj-y += errata/
 obj-$(CONFIG_KVM) += kvm/
 
diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
new file mode 100644
index 000000000000..10d60edc0110
--- /dev/null
+++ b/arch/riscv/crypto/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
+
+endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
new file mode 100644
index 000000000000..b3b6332c9f6d
--- /dev/null
+++ b/arch/riscv/crypto/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# linux/arch/riscv/crypto/Makefile
+#
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 650b1b3620d8..c7b23d2c58e4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1436,6 +1436,9 @@ endif
 if PPC
 source "arch/powerpc/crypto/Kconfig"
 endif
+if RISCV
+source "arch/riscv/crypto/Kconfig"
+endif
 if S390
 source "arch/s390/crypto/Kconfig"
 endif
-- 
2.28.0



* [PATCH 03/12] RISC-V: crypto: add OpenSSL perl module for vector instructions
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
  2023-10-25 18:36 ` [PATCH 01/12] RISC-V: add helper function to read the vector VLEN Jerry Shih
  2023-10-25 18:36 ` [PATCH 02/12] RISC-V: hook new crypto subdir into build-system Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-10-25 18:36 ` [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation Jerry Shih
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

OpenSSL has some RISC-V vector cryptography implementations which can be
reused for the kernel. These implementations use a number of perl helpers
for emitting vector and vector-crypto-extension instructions.
This patch takes these perl helpers from OpenSSL (openssl/openssl#21923).
The helpers for the unused scalar crypto instructions in the original perl
module are dropped.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Co-developed-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/riscv.pm | 828 +++++++++++++++++++++++++++++++++++++
 1 file changed, 828 insertions(+)
 create mode 100644 arch/riscv/crypto/riscv.pm

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
new file mode 100644
index 000000000000..e188f7476e3e
--- /dev/null
+++ b/arch/riscv/crypto/riscv.pm
@@ -0,0 +1,828 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+use strict;
+use warnings;
+
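+# The subroutines below are intended to be interpolated from perlasm code
+# templates (e.g. "@{[vle32_v $V1, ($INP)]}") and return raw ".word"
+# directives, so that the generated .S files assemble even when the
+# assembler lacks vector / vector-crypto support.
+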
+# Set $have_stacktrace to 1 if we have Devel::StackTrace
+my $have_stacktrace = 0;
+if (eval {require Devel::StackTrace;1;}) {
+    $have_stacktrace = 1;
+}
+
+my @regs = map("x$_",(0..31));
+# Mapping from the RISC-V psABI register mnemonics to the register numbers.
+my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1',
+    map("a$_",(0..7)),
+    map("s$_",(2..11)),
+    map("t$_",(3..6))
+);
+
+my %reglookup;
+@reglookup{@regs} = @regs;
+@reglookup{@regaliases} = @regs;
+
+# Takes a register name, possibly an alias, and converts it to a register index
+# from 0 to 31
+sub read_reg {
+    my $reg = lc shift;
+    if (!exists($reglookup{$reg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown register ".$reg."\n".$trace);
+    }
+    my $regstr = $reglookup{$reg};
+    if (!($regstr =~ /^x([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process register ".$reg."\n".$trace);
+    }
+    return $1;
+}
+
+# Read the SEW setting (e8, e16, e32 or e64) and convert it to the vsew encoding.
+sub read_sew {
+    my $sew_setting = shift;
+
+    if ($sew_setting eq "e8") {
+        return 0;
+    } elsif ($sew_setting eq "e16") {
+        return 1;
+    } elsif ($sew_setting eq "e32") {
+        return 2;
+    } elsif ($sew_setting eq "e64") {
+        return 3;
+    } else {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unsupported SEW setting:".$sew_setting."\n".$trace);
+    }
+}
+
+# Read the LMUL settings and convert to vlmul encoding.
+sub read_lmul {
+    my $lmul_setting = shift;
+
+    if ($lmul_setting eq "mf8") {
+        return 5;
+    } elsif ($lmul_setting eq "mf4") {
+        return 6;
+    } elsif ($lmul_setting eq "mf2") {
+        return 7;
+    } elsif ($lmul_setting eq "m1") {
+        return 0;
+    } elsif ($lmul_setting eq "m2") {
+        return 1;
+    } elsif ($lmul_setting eq "m4") {
+        return 2;
+    } elsif ($lmul_setting eq "m8") {
+        return 3;
+    } else {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unsupported LMUL setting:".$lmul_setting."\n".$trace);
+    }
+}
+
+# Read the tail policy settings and convert to vta encoding.
+sub read_tail_policy {
+    my $tail_setting = shift;
+
+    if ($tail_setting eq "ta") {
+        return 1;
+    } elsif ($tail_setting eq "tu") {
+        return 0;
+    } else {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unsupported tail policy setting:".$tail_setting."\n".$trace);
+    }
+}
+
+# Read the mask policy settings and convert to vma encoding.
+sub read_mask_policy {
+    my $mask_setting = shift;
+
+    if ($mask_setting eq "ma") {
+        return 1;
+    } elsif ($mask_setting eq "mu") {
+        return 0;
+    } else {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unsupported mask policy setting:".$mask_setting."\n".$trace);
+    }
+}
+
+my @vregs = map("v$_",(0..31));
+my %vreglookup;
+@vreglookup{@vregs} = @vregs;
+
+sub read_vreg {
+    my $vreg = lc shift;
+    if (!exists($vreglookup{$vreg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown vector register ".$vreg."\n".$trace);
+    }
+    if (!($vreg =~ /^v([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process vector register ".$vreg."\n".$trace);
+    }
+    return $1;
+}
+
+# Read the vm settings and convert to mask encoding.
+sub read_mask_vreg {
+    my $vreg = shift;
+    # The default value is unmasked.
+    my $mask_bit = 1;
+
+    if (defined($vreg)) {
+        my $reg_id = read_vreg $vreg;
+        if ($reg_id == 0) {
+            $mask_bit = 0;
+        } else {
+            my $trace = "";
+            if ($have_stacktrace) {
+                $trace = Devel::StackTrace->new->as_string;
+            }
+            die("The ".$vreg." is not the mask register v0.\n".$trace);
+        }
+    }
+    return $mask_bit;
+}
+
+# Vector instructions
+
+sub vadd_vv {
+    # vadd.vv vd, vs2, vs1, vm
+    my $template = 0b000000_0_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vadd_vx {
+    # vadd.vx vd, vs2, rs1, vm
+    my $template = 0b000000_0_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsub_vv {
+    # vsub.vv vd, vs2, vs1, vm
+    my $template = 0b000010_0_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vsub_vx {
+    # vsub.vx vd, vs2, rs1, vm
+    my $template = 0b000010_0_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vid_v {
+    # vid.v vd
+    my $template = 0b0101001_00000_10001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    return ".word ".($template | ($vd << 7));
+}
+
+sub viota_m {
+    # viota.m vd, vs2, vm
+    my $template = 0b010100_0_00000_10000_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vle8_v {
+    # vle8.v vd, (rs1), vm
+    my $template = 0b000000_0_00000_00000_000_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle32_v {
+    # vle32.v vd, (rs1), vm
+    my $template = 0b000000_0_00000_00000_110_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle64_v {
+    # vle64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse32_v {
+    # vlse32.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlsseg_nf_e32_v {
+    # vlsseg<nf>e32.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0000111;
+    my $nf = shift;
+    $nf -= 1;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse64_v {
+    # vlse64.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_111_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vluxei8_v {
+    # vluxei8.v vd, (rs1), vs2, vm
+    my $template = 0b000001_0_00000_00000_000_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vs2 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmerge_vim {
+    # vmerge.vim vd, vs2, imm, v0
+    my $template = 0b0101110_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7));
+}
+
+sub vmerge_vvm {
+    # vmerge.vvm vd vs2 vs1
+    my $template = 0b0101110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7))
+}
+
+sub vmseq_vi {
+    # vmseq.vi vd vs1, imm
+    my $template = 0b0110001_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7))
+}
+
+sub vmsgtu_vx {
+    # vmsgtu.vx vd vs2, rs1, vm
+    my $template = 0b011110_0_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7))
+}
+
+sub vmv_v_i {
+    # vmv.v.i vd, imm
+    my $template = 0b0101111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($imm << 15) | ($vd << 7));
+}
+
+sub vmv_v_x {
+    # vmv.v.x vd, rs1
+    my $template = 0b0101111_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmv_v_v {
+    # vmv.v.v vd, vs1
+    my $template = 0b0101111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vor_vv_v0t {
+    # vor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010100_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vse8_v {
+    # vse8.v vd, (rs1), vm
+    my $template = 0b000000_0_00000_00000_000_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vse32_v {
+    # vse32.v vd, (rs1), vm
+    my $template = 0b000000_0_00000_00000_110_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vssseg_nf_e32_v {
+    # vssseg<nf>e32.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0100111;
+    my $nf = shift;
+    $nf -= 1;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsuxei8_v {
+   # vsuxei8.v vs3, (rs1), vs2, vm
+   my $template = 0b000001_0_00000_00000_000_00000_0100111;
+   my $vs3 = read_vreg shift;
+   my $rs1 = read_reg shift;
+   my $vs2 = read_vreg shift;
+   my $vm = read_mask_vreg shift;
+   return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vse64_v {
+    # vse64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsetivli__x0_2_e64_m1_tu_mu {
+    # vsetivli x0, 2, e64, m1, tu, mu
+    return ".word 0xc1817057";
+}
+
+sub vsetivli__x0_4_e32_m1_tu_mu {
+    # vsetivli x0, 4, e32, m1, tu, mu
+    return ".word 0xc1027057";
+}
+
+sub vsetivli__x0_4_e64_m1_tu_mu {
+    # vsetivli x0, 4, e64, m1, tu, mu
+    return ".word 0xc1827057";
+}
+
+sub vsetivli__x0_8_e32_m1_tu_mu {
+    # vsetivli x0, 8, e32, m1, tu, mu
+    return ".word 0xc1047057";
+}
+
+sub vsetvli {
+    # vsetvli rd, rs1, vtypei
+    my $template = 0b0_00000000000_00000_111_00000_1010111;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $sew = read_sew shift;
+    my $lmul = read_lmul shift;
+    my $tail_policy = read_tail_policy shift;
+    my $mask_policy = read_mask_policy shift;
+    my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul;
+
+    return ".word ".($template | ($vtypei << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub vsetivli {
+    # vsetivli rd, uimm, vtypei
+    my $template = 0b11_0000000000_00000_111_00000_1010111;
+    my $rd = read_reg shift;
+    my $uimm = shift;
+    my $sew = read_sew shift;
+    my $lmul = read_lmul shift;
+    my $tail_policy = read_tail_policy shift;
+    my $mask_policy = read_mask_policy shift;
+    my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul;
+
+    return ".word ".($template | ($vtypei << 20) | ($uimm << 15) | ($rd << 7));
+}
+
+sub vslidedown_vi {
+    # vslidedown.vi vd, vs2, uimm
+    my $template = 0b0011111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslidedown_vx {
+    # vslidedown.vx vd, vs2, rs1
+    my $template = 0b0011111_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vslideup_vi_v0t {
+    # vslideup.vi vd, vs2, uimm, v0.t
+    my $template = 0b0011100_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi {
+    # vslideup.vi vd, vs2, uimm
+    my $template = 0b0011101_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsll_vi {
+    # vsll.vi vd, vs2, uimm, vm
+    my $template = 0b1001011_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsrl_vx {
+    # vsrl.vx vd, vs2, rs1
+    my $template = 0b1010001_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsse32_v {
+    # vsse32.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0100111;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsse64_v {
+    # vsse64.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_111_00000_0100111;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vxor_vv_v0t {
+    # vxor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vxor_vv {
+    # vxor.vv vd, vs2, vs1
+    my $template = 0b0010111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vzext_vf2 {
+    # vzext.vf2 vd, vs2, vm
+    my $template = 0b010010_0_00000_00110_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+# Vector crypto instructions
+
+## Zvbb and Zvkb instructions
+##
+## vandn (also in zvkb)
+## vbrev
+## vbrev8 (also in zvkb)
+## vrev8 (also in zvkb)
+## vclz
+## vctz
+## vcpop
+## vrol (also in zvkb)
+## vror (also in zvkb)
+## vwsll
+
+sub vbrev8_v {
+    # vbrev8.v vd, vs2, vm
+    my $template = 0b010010_0_00000_01000_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vrev8_v {
+    # vrev8.v vd, vs2, vm
+    my $template = 0b010010_0_00000_01001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vror_vi {
+    # vror.vi vd, vs2, uimm
+    my $template = 0b01010_0_1_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    my $uimm_i5 = $uimm >> 5;
+    my $uimm_i4_0 = $uimm & 0b11111;
+
+    return ".word ".($template | ($uimm_i5 << 26) | ($vs2 << 20) | ($uimm_i4_0 << 15) | ($vd << 7));
+}
+
+sub vwsll_vv {
+    # vwsll.vv vd, vs2, vs1, vm
+    my $template = 0b110101_0_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $vm = read_mask_vreg shift;
+    return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+## Zvbc instructions
+
+sub vclmulh_vx {
+    # vclmulh.vx vd, vs2, rs1
+    my $template = 0b0011011_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx_v0t {
+    # vclmul.vx vd, vs2, rs1, v0.t
+    my $template = 0b0011000_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx {
+    # vclmul.vx vd, vs2, rs1
+    my $template = 0b0011001_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+## Zvkg instructions
+
+sub vghsh_vv {
+    # vghsh.vv vd, vs2, vs1
+    my $template = 0b1011001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vgmul_vv {
+    # vgmul.vv vd, vs2
+    my $template = 0b1010001_00000_10001_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkned instructions
+
+sub vaesdf_vs {
+    # vaesdf.vs vd, vs2
+    my $template = 0b101001_1_00000_00001_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vs {
+    # vaesdm.vs vd, vs2
+    my $template = 0b101001_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vs {
+    # vaesef.vs vd, vs2
+    my $template = 0b101001_1_00000_00011_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vs {
+    # vaesem.vs vd, vs2
+    my $template = 0b101001_1_00000_00010_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf1_vi {
+    # vaeskf1.vi vd, vs2, uimm
+    my $template = 0b100010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    my $uimm = shift;
+    return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf2_vi {
+    # vaeskf2.vi vd, vs2, uimm
+    my $template = 0b101010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaesz_vs {
+    # vaesz.vs vd, vs2
+    my $template = 0b101001_1_00000_00111_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvknha and Zvknhb instructions
+
+sub vsha2ms_vv {
+    # vsha2ms.vv vd, vs2, vs1
+    my $template = 0b1011011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2ch_vv {
+    # vsha2ch.vv vd, vs2, vs1
+    my $template = 0b101110_10000_00000_001_00000_01110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2cl_vv {
+    # vsha2cl.vv vd, vs2, vs1
+    my $template = 0b101111_10000_00000_001_00000_01110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+## Zvksed instructions
+
+sub vsm4k_vi {
+    # vsm4k.vi vd, vs2, uimm
+    my $template = 0b1000011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsm4r_vs {
+    # vsm4r.vs vd, vs2
+    my $template = 0b1010011_00000_10000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## zvksh instructions
+
+sub vsm3c_vi {
+    # vsm3c.vi vd, vs2, uimm
+    my $template = 0b1010111_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7));
+}
+
+sub vsm3me_vv {
+    # vsm3me.vv vd, vs2, vs1
+    my $template = 0b1000001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7));
+}
+
+1;
-- 
2.28.0



* [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (2 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 03/12] RISC-V: crypto: add OpenSSL perl module for vector instructions Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-11-02  4:51   ` Eric Biggers
  2023-10-25 18:36 ` [PATCH 05/12] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk Jerry Shih
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add an AES cipher implementation using the Zvkned vector crypto extension,
ported from OpenSSL (openssl/openssl#21923).
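
As a usage sketch (illustrative only, not part of this patch), a single AES
block can be encrypted through the generic cipher API once this driver is
registered; "aes-riscv64-zvkned" is picked automatically when it has the
highest priority for "aes". Since the bare single-block cipher interface is
internal to the crypto subsystem, a real module would also need to import the
CRYPTO_INTERNAL namespace.

#include <crypto/aes.h>
#include <crypto/internal/cipher.h>
#include <linux/err.h>

static int example_encrypt_one_block(const u8 *key, unsigned int keylen,
				     u8 dst[AES_BLOCK_SIZE],
				     const u8 src[AES_BLOCK_SIZE])
{
	struct crypto_cipher *tfm;
	int err;

	tfm = crypto_alloc_cipher("aes", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_cipher_setkey(tfm, key, keylen);
	if (!err)
		crypto_cipher_encrypt_one(tfm, dst, src);

	crypto_free_cipher(tfm);
	return err;
}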

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Co-developed-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig               |  12 +
 arch/riscv/crypto/Makefile              |  11 +
 arch/riscv/crypto/aes-riscv64-glue.c    | 163 +++++++++
 arch/riscv/crypto/aes-riscv64-glue.h    |  28 ++
 arch/riscv/crypto/aes-riscv64-zvkned.pl | 451 ++++++++++++++++++++++++
 5 files changed, 665 insertions(+)
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110..500938317e71 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,4 +2,16 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
+config CRYPTO_AES_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Ciphers: AES"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_AES
+	select CRYPTO_ALGAPI
+	help
+	  Block ciphers: AES cipher algorithms (FIPS-197)
+
+	  Architecture: riscv64 using:
+	  - Zvkned vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d..90ca91d8df26 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,3 +2,14 @@
 #
 # linux/arch/riscv/crypto/Makefile
 #
+
+obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
+aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
+
+quiet_cmd_perlasm = PERLASM $@
+      cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+	$(call cmd,perlasm)
+
+clean-files += aes-riscv64-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c
new file mode 100644
index 000000000000..5c4f1018d3aa
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES implementation for RISC-V
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <jerry.shih@sifive.com>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+/*
+ * aes cipher using zvkned vector crypto extension
+ *
+ * All zvkned-based functions use the expanded encryption key for both
+ * encryption and decryption.
+ */
+void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key);
+void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key);
+
+static inline int aes_round_num(unsigned int keylen)
+{
+	switch (keylen) {
+	case AES_KEYSIZE_128:
+		return 10;
+	case AES_KEYSIZE_192:
+		return 12;
+	case AES_KEYSIZE_256:
+		return 14;
+	default:
+		return 0;
+	}
+}
+
+int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
+		       unsigned int keylen)
+{
+	/*
+	 * The RISC-V AES vector crypto key expansion doesn't support AES-192.
+	 * We just use the generic software key expansion here to simplify the
+	 * key expansion flow.
+	 */
+	u32 aes_rounds;
+	u32 key_length;
+	int ret;
+
+	ret = aes_expandkey(&ctx->fallback_ctx, key, keylen);
+	if (ret < 0)
+		return -EINVAL;
+
+	/*
+	 * Copy the key from `crypto_aes_ctx` to `aes_key` for zvkned-based AES
+	 * implementations.
+	 */
+	aes_rounds = aes_round_num(keylen);
+	ctx->key.rounds = aes_rounds;
+	key_length = AES_BLOCK_SIZE * (aes_rounds + 1);
+	memcpy(ctx->key.key, ctx->fallback_ctx.key_enc, key_length);
+
+	return 0;
+}
+
+void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+				const u8 *src)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		rv64i_zvkned_encrypt(src, dst, &ctx->key);
+		kernel_vector_end();
+	} else {
+		aes_encrypt(&ctx->fallback_ctx, dst, src);
+	}
+}
+
+void riscv64_aes_decrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+				const u8 *src)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		rv64i_zvkned_decrypt(src, dst, &ctx->key);
+		kernel_vector_end();
+	} else {
+		aes_decrypt(&ctx->fallback_ctx, dst, src);
+	}
+}
+
+static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
+		      unsigned int keylen)
+{
+	struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return riscv64_aes_setkey(ctx, key, keylen);
+}
+
+static void aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	const struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	riscv64_aes_encrypt_zvkned(ctx, dst, src);
+}
+
+static void aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	const struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	riscv64_aes_decrypt_zvkned(ctx, dst, src);
+}
+
+static struct crypto_alg riscv64_aes_alg_zvkned = {
+	.cra_name = "aes",
+	.cra_driver_name = "aes-riscv64-zvkned",
+	.cra_module = THIS_MODULE,
+	.cra_priority = 300,
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize = AES_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+	.cra_cipher = {
+		.cia_min_keysize = AES_MIN_KEY_SIZE,
+		.cia_max_keysize = AES_MAX_KEY_SIZE,
+		.cia_setkey = aes_setkey,
+		.cia_encrypt = aes_encrypt_zvkned,
+		.cia_decrypt = aes_decrypt_zvkned,
+	},
+};
+
+static inline bool check_aes_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKNED) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_aes_mod_init(void)
+{
+	if (check_aes_ext())
+		return crypto_register_alg(&riscv64_aes_alg_zvkned);
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_aes_mod_fini(void)
+{
+	if (check_aes_ext())
+		crypto_unregister_alg(&riscv64_aes_alg_zvkned);
+}
+
+module_init(riscv64_aes_mod_init);
+module_exit(riscv64_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-riscv64-glue.h
new file mode 100644
index 000000000000..7f1f675aca0d
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef AES_RISCV64_GLUE_H
+#define AES_RISCV64_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+struct aes_key {
+	u32 key[AES_MAX_KEYLENGTH_U32];
+	u32 rounds;
+};
+
+struct riscv64_aes_ctx {
+	struct aes_key key;
+	struct crypto_aes_ctx fallback_ctx;
+};
+
+int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
+		       unsigned int keylen);
+
+void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+				const u8 *src);
+
+void riscv64_aes_decrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+				const u8 *src);
+
+#endif /* AES_RISCV64_GLUE_H */
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
new file mode 100644
index 000000000000..c0ecde77bf56
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -0,0 +1,451 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+{
+################################################################################
+# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+my ($INP, $OUTP, $KEYP) = ("a0", "a1", "a2");
+my ($T0, $T1, $ROUNDS, $T6) = ("a3", "a4", "t5", "t6");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_encrypt
+.type rv64i_zvkned_encrypt,\@function
+rv64i_zvkned_encrypt:
+    # Load number of rounds
+    lwu $ROUNDS, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T6, 14
+    beq $ROUNDS, $T6, L_enc_256
+    li $T6, 10
+    beq $ROUNDS, $T6, L_enc_128
+    li $T6, 12
+    beq $ROUNDS, $T6, L_enc_192
+
+    j L_fail_m2
+.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_128:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, ($INP)]}
+
+    @{[vle32_v $V10, ($KEYP)]}
+    @{[vaesz_vs $V1, $V10]}    # with round key w[ 0, 3]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, ($KEYP)]}
+    @{[vaesem_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, ($KEYP)]}
+    @{[vaesem_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, ($KEYP)]}
+    @{[vaesem_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, ($KEYP)]}
+    @{[vaesem_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, ($KEYP)]}
+    @{[vaesem_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V16, ($KEYP)]}
+    @{[vaesem_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V17, ($KEYP)]}
+    @{[vaesem_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V18, ($KEYP)]}
+    @{[vaesem_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V19, ($KEYP)]}
+    @{[vaesem_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V20, ($KEYP)]}
+    @{[vaesef_vs $V1, $V20]}   # with round key w[40,43]
+
+    @{[vse32_v $V1, ($OUTP)]}
+
+    ret
+.size L_enc_128,.-L_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_192:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, ($INP)]}
+
+    @{[vle32_v $V10, ($KEYP)]}
+    @{[vaesz_vs $V1, $V10]}     # with round key w[ 0, 3]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, ($KEYP)]}
+    @{[vaesem_vs $V1, $V11]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, ($KEYP)]}
+    @{[vaesem_vs $V1, $V12]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, ($KEYP)]}
+    @{[vaesem_vs $V1, $V13]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, ($KEYP)]}
+    @{[vaesem_vs $V1, $V14]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, ($KEYP)]}
+    @{[vaesem_vs $V1, $V15]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V16, ($KEYP)]}
+    @{[vaesem_vs $V1, $V16]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V17, ($KEYP)]}
+    @{[vaesem_vs $V1, $V17]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V18, ($KEYP)]}
+    @{[vaesem_vs $V1, $V18]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V19, ($KEYP)]}
+    @{[vaesem_vs $V1, $V19]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V20, ($KEYP)]}
+    @{[vaesem_vs $V1, $V20]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V21, ($KEYP)]}
+    @{[vaesem_vs $V1, $V21]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V22, ($KEYP)]}
+    @{[vaesef_vs $V1, $V22]}
+
+    @{[vse32_v $V1, ($OUTP)]}
+    ret
+.size L_enc_192,.-L_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_256:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, ($INP)]}
+
+    @{[vle32_v $V10, ($KEYP)]}
+    @{[vaesz_vs $V1, $V10]}     # with round key w[ 0, 3]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, ($KEYP)]}
+    @{[vaesem_vs $V1, $V11]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, ($KEYP)]}
+    @{[vaesem_vs $V1, $V12]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, ($KEYP)]}
+    @{[vaesem_vs $V1, $V13]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, ($KEYP)]}
+    @{[vaesem_vs $V1, $V14]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, ($KEYP)]}
+    @{[vaesem_vs $V1, $V15]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V16, ($KEYP)]}
+    @{[vaesem_vs $V1, $V16]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V17, ($KEYP)]}
+    @{[vaesem_vs $V1, $V17]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V18, ($KEYP)]}
+    @{[vaesem_vs $V1, $V18]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V19, ($KEYP)]}
+    @{[vaesem_vs $V1, $V19]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V20, ($KEYP)]}
+    @{[vaesem_vs $V1, $V20]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V21, ($KEYP)]}
+    @{[vaesem_vs $V1, $V21]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V22, ($KEYP)]}
+    @{[vaesem_vs $V1, $V22]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V23, ($KEYP)]}
+    @{[vaesem_vs $V1, $V23]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V24, ($KEYP)]}
+    @{[vaesef_vs $V1, $V24]}
+
+    @{[vse32_v $V1, ($OUTP)]}
+    ret
+.size L_enc_256,.-L_enc_256
+___
+
+################################################################################
+# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_decrypt
+.type rv64i_zvkned_decrypt,\@function
+rv64i_zvkned_decrypt:
+    # Load number of rounds
+    lwu $ROUNDS, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T6, 14
+    beq $ROUNDS, $T6, L_dec_256
+    li $T6, 10
+    beq $ROUNDS, $T6, L_dec_128
+    li $T6, 12
+    beq $ROUNDS, $T6, L_dec_192
+
+    j L_fail_m2
+.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_128:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, ($INP)]}
+
+    addi $KEYP, $KEYP, 160
+    @{[vle32_v $V20, ($KEYP)]}
+    @{[vaesz_vs $V1, $V20]}    # with round key w[40,43]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V19, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V18, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V17, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V16, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V15, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V14, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V13, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V12, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V11, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V10, ($KEYP)]}
+    @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $V1, ($OUTP)]}
+
+    ret
+.size L_dec_128,.-L_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_192:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, ($INP)]}
+
+    addi $KEYP, $KEYP, 192
+    @{[vle32_v $V22, ($KEYP)]}
+    @{[vaesz_vs $V1, $V22]}    # with round key w[48,51]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V21, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V21]}   # with round key w[44,47]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V20, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V20]}    # with round key w[40,43]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V19, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V18, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V17, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V16, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V15, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V14, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V13, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V12, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V11, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V10, ($KEYP)]}
+    @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $V1, ($OUTP)]}
+
+    ret
+.size L_dec_192,.-L_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_256:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, ($INP)]}
+
+    addi $KEYP, $KEYP, 224
+    @{[vle32_v $V24, ($KEYP)]}
+    @{[vaesz_vs $V1, $V24]}    # with round key w[56,59]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V23, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V23]}   # with round key w[52,55]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V22, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V22]}    # with round key w[48,51]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V21, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V21]}   # with round key w[44,47]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V20, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V20]}    # with round key w[40,43]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V19, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V18, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V17, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V16, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V15, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V14, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V13, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V12, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V11, ($KEYP)]}
+    @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V10, ($KEYP)]}
+    @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $V1, ($OUTP)]}
+
+    ret
+.size L_dec_256,.-L_dec_256
+___
+}
+
+$code .= <<___;
+L_fail_m1:
+    li a0, -1
+    ret
+.size L_fail_m1,.-L_fail_m1
+
+L_fail_m2:
+    li a0, -2
+    ret
+.size L_fail_m2,.-L_fail_m2
+
+L_end:
+    ret
+.size L_end,.-L_end
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 05/12] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (3 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-10-25 18:36 ` [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Jerry Shih
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

In some situations, we might split the `skcipher_request` into several
segments. When we want to move to the next segment, we currently use
`scatterwalk_ffwd()`, which finds the corresponding `scatterlist` by
iterating again from the head of the `scatterlist`.

This helper function simply gathers the information already tracked in the
`scatter_walk` and moves to the next `scatterlist` directly.
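
For illustration only (not part of this patch), a minimal sketch of how a
caller might use the new helper when splitting a request, mirroring the
AES-XTS glue code later in this series; `walk`, `req` and `tail_bytes` are
assumed to be set up by the caller:

	struct scatterlist sg_src[2];
	struct scatterlist *src;

	/* Continue from the walk's current position instead of re-walking
	 * the original scatterlist from its head.
	 */
	src = scatterwalk_next(sg_src, &walk.in);
	skcipher_request_set_crypt(req, src, src, tail_bytes, req->iv);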

Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 include/crypto/scatterwalk.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 32fc4473175b..b1a90afe695d 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -98,7 +98,12 @@ void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
 			      unsigned int start, unsigned int nbytes, int out);
 
 struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2],
-				     struct scatterlist *src,
-				     unsigned int len);
+				     struct scatterlist *src, unsigned int len);
+
+static inline struct scatterlist *scatterwalk_next(struct scatterlist dst[2],
+						   struct scatter_walk *src)
+{
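+	/*
+	 * src->offset - src->sg->offset is the number of bytes already
+	 * consumed within the current scatterlist entry.
+	 */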
+	return scatterwalk_ffwd(dst, src->sg, src->offset - src->sg->offset);
+}
 
 #endif  /* _CRYPTO_SCATTERWALK_H */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (4 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 05/12] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-11-02  5:16   ` Eric Biggers
  2023-11-09  8:05   ` Eric Biggers
  2023-10-25 18:36 ` [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Jerry Shih
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Port the vector-crypto accelerated CBC, CTR, ECB and XTS block modes for
the AES cipher from OpenSSL (openssl/openssl#21923).
In addition, support the XTS-AES-192 mode, which does not exist in OpenSSL.
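
As a usage illustration (not part of this patch), these modes are reached
through the regular skcipher API; the driver name below is the one registered
by this patch, while `key` and the error handling are placeholders:

	struct crypto_skcipher *tfm;
	int err;

	/* Picks the highest-priority "xts(aes)" implementation; on a CPU
	 * with Zvbb/Zvkg/Zvkned this should resolve to
	 * "xts-aes-riscv64-zvbb-zvkg-zvkned" (priority 300).
	 */
	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, 2 * AES_KEYSIZE_128);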

Co-developed-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig                     |  21 +
 arch/riscv/crypto/Makefile                    |  11 +
 .../crypto/aes-riscv64-block-mode-glue.c      | 486 +++++++++
 .../crypto/aes-riscv64-zvbb-zvkg-zvkned.pl    | 944 ++++++++++++++++++
 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl  | 416 ++++++++
 arch/riscv/crypto/aes-riscv64-zvkned.pl       | 927 +++++++++++++++--
 6 files changed, 2715 insertions(+), 90 deletions(-)
 create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 500938317e71..dfa9d0146d26 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -14,4 +14,25 @@ config CRYPTO_AES_RISCV64
 	  Architecture: riscv64 using:
 	  - Zvkned vector crypto extension
 
+config CRYPTO_AES_BLOCK_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_AES_RISCV64
+	select CRYPTO_SKCIPHER
+	help
+	  Length-preserving ciphers: AES cipher algorithms (FIPS-197)
+	  with block cipher modes:
+	  - ECB (Electronic Codebook) mode (NIST SP 800-38A)
+	  - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
+	  - CTR (Counter) mode (NIST SP 800-38A)
+	  - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
+	    Stealing) mode (NIST SP 800-38E and IEEE 1619)
+
+	  Architecture: riscv64 using:
+	  - Zvbb vector extension (XTS)
+	  - Zvkb vector crypto extension (CTR/XTS)
+	  - Zvkg vector crypto extension (XTS)
+	  - Zvkned vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 90ca91d8df26..42a4e8ec79cf 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -6,10 +6,21 @@
 obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
 aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
 
+obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
+aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
 $(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
 	$(call cmd,perlasm)
 
+$(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl
+	$(call cmd,perlasm)
+
+$(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
+clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
+clean-files += aes-riscv64-zvkb-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-block-mode-glue.c b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
new file mode 100644
index 000000000000..c33e902eff5e
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES block mode implementations for RISC-V
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <jerry.shih@sifive.com>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/ctr.h>
+#include <crypto/xts.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/math.h>
+#include <linux/minmax.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+#define AES_BLOCK_VALID_SIZE_MASK (~(AES_BLOCK_SIZE - 1))
+#define AES_BLOCK_REMAINING_SIZE_MASK (AES_BLOCK_SIZE - 1)
+
+struct riscv64_aes_xts_ctx {
+	struct riscv64_aes_ctx ctx1;
+	struct riscv64_aes_ctx ctx2;
+};
+
+/* aes cbc block mode using zvkned vector crypto extension */
+void rv64i_zvkned_cbc_encrypt(const u8 *in, u8 *out, size_t length,
+			      const struct aes_key *key, u8 *ivec);
+void rv64i_zvkned_cbc_decrypt(const u8 *in, u8 *out, size_t length,
+			      const struct aes_key *key, u8 *ivec);
+/* aes ecb block mode using zvkned vector crypto extension */
+void rv64i_zvkned_ecb_encrypt(const u8 *in, u8 *out, size_t length,
+			      const struct aes_key *key);
+void rv64i_zvkned_ecb_decrypt(const u8 *in, u8 *out, size_t length,
+			      const struct aes_key *key);
+
+/* aes ctr block mode using zvkb and zvkned vector crypto extension */
+/* This function operates on a 32-bit counter. The caller handles overflow. */
+void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const u8 *in, u8 *out,
+					    size_t length,
+					    const struct aes_key *key,
+					    u8 *ivec);
+
+/* aes xts block mode using zvbb, zvkg and zvkned vector crypto extension */
+void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const u8 *in, u8 *out,
+					    size_t length,
+					    const struct aes_key *key, u8 *iv,
+					    int update_iv);
+void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const u8 *in, u8 *out,
+					    size_t length,
+					    const struct aes_key *key, u8 *iv,
+					    int update_iv);
+
+typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
+			     const struct aes_key *key, u8 *iv, int update_iv);
+
+/* ecb */
+static int aes_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+		      unsigned int key_len)
+{
+	struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	return riscv64_aes_setkey(ctx, in_key, key_len);
+}
+
+static int ecb_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	unsigned int nbytes;
+	int err;
+
+	/* If there is an error here, `nbytes` will be zero. */
+	err = skcipher_walk_virt(&walk, req, false);
+	while ((nbytes = walk.nbytes)) {
+		kernel_vector_begin();
+		rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
+					 &ctx->key);
+		kernel_vector_end();
+		err = skcipher_walk_done(
+			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+	}
+
+	return err;
+}
+
+static int ecb_decrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	unsigned int nbytes;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	while ((nbytes = walk.nbytes)) {
+		kernel_vector_begin();
+		rv64i_zvkned_ecb_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
+					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
+					 &ctx->key);
+		kernel_vector_end();
+		err = skcipher_walk_done(
+			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+	}
+
+	return err;
+}
+
+/* cbc */
+static int cbc_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	unsigned int nbytes;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	while ((nbytes = walk.nbytes)) {
+		kernel_vector_begin();
+		rv64i_zvkned_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
+					 &ctx->key, walk.iv);
+		kernel_vector_end();
+		err = skcipher_walk_done(
+			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+	}
+
+	return err;
+}
+
+static int cbc_decrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	unsigned int nbytes;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	while ((nbytes = walk.nbytes)) {
+		kernel_vector_begin();
+		rv64i_zvkned_cbc_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
+					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
+					 &ctx->key, walk.iv);
+		kernel_vector_end();
+		err = skcipher_walk_done(
+			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+	}
+
+	return err;
+}
+
+/* ctr */
+static int ctr_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	unsigned int ctr32;
+	unsigned int nbytes;
+	unsigned int blocks;
+	unsigned int current_blocks;
+	unsigned int current_length;
+	int err;
+
+	/* the ctr iv uses big endian */
+	ctr32 = get_unaligned_be32(req->iv + 12);
+	err = skcipher_walk_virt(&walk, req, false);
+	while ((nbytes = walk.nbytes)) {
+		if (nbytes != walk.total) {
+			nbytes &= AES_BLOCK_VALID_SIZE_MASK;
+			blocks = nbytes / AES_BLOCK_SIZE;
+		} else {
+			/* This is the last walk. We should handle the tail data. */
+			blocks = (nbytes + (AES_BLOCK_SIZE - 1)) /
+				 AES_BLOCK_SIZE;
+		}
+		ctr32 += blocks;
+
+		kernel_vector_begin();
+		/*
+		 * The `if` block below detects 32-bit counter overflow, which is then
+		 * handled by limiting the number of blocks to the exact overflow point.
+		 */
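+		/*
+		 * For example, if the low 32-bit counter started at 0xfffffffe
+		 * and blocks == 3, ctr32 has wrapped to 1 at this point: the
+		 * first call handles blocks - ctr32 == 2 blocks (counters
+		 * 0xfffffffe and 0xffffffff), crypto_inc() carries into the
+		 * upper 96 bits of the IV, and the second call handles the
+		 * remaining block starting from counter 0.
+		 */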
+		if (ctr32 >= blocks) {
+			rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+				walk.src.virt.addr, walk.dst.virt.addr, nbytes,
+				&ctx->key, req->iv);
+		} else {
+			/* use 2 ctr32 function calls for overflow case */
+			current_blocks = blocks - ctr32;
+			current_length =
+				min(nbytes, current_blocks * AES_BLOCK_SIZE);
+			rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+				walk.src.virt.addr, walk.dst.virt.addr,
+				current_length, &ctx->key, req->iv);
+			crypto_inc(req->iv, 12);
+
+			if (ctr32) {
+				rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+					walk.src.virt.addr +
+						current_blocks * AES_BLOCK_SIZE,
+					walk.dst.virt.addr +
+						current_blocks * AES_BLOCK_SIZE,
+					nbytes - current_length, &ctx->key,
+					req->iv);
+			}
+		}
+		kernel_vector_end();
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+	}
+
+	return err;
+}
+
+/* xts */
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+		      unsigned int key_len)
+{
+	struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	unsigned int xts_single_key_len = key_len / 2;
+	int ret;
+
+	ret = xts_verify_key(tfm, in_key, key_len);
+	if (ret)
+		return ret;
+	ret = riscv64_aes_setkey(&ctx->ctx1, in_key, xts_single_key_len);
+	if (ret)
+		return ret;
+	return riscv64_aes_setkey(&ctx->ctx2, in_key + xts_single_key_len,
+				  xts_single_key_len);
+}
+
+static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_request sub_req;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct scatterlist *src, *dst;
+	struct skcipher_walk walk;
+	unsigned int walk_size = crypto_skcipher_walksize(tfm);
+	unsigned int tail_bytes;
+	unsigned int head_bytes;
+	unsigned int nbytes;
+	unsigned int update_iv = 1;
+	int err;
+
+	/* The xts input size must be at least AES_BLOCK_SIZE. */
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	/*
+	 * The tail size is kept no larger than walk_size so that the walk for
+	 * the tail elements is guaranteed to cover at least AES_BLOCK_SIZE bytes.
+	 */
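+	/*
+	 * For example, with walk_size == AES_BLOCK_SIZE * 8 == 128 and
+	 * req->cryptlen == 130: tail_bytes == 128 + 2 - 16 == 114 and
+	 * head_bytes == 16, so the head stays block-aligned and the tail still
+	 * contains full blocks ahead of the 2 trailing bytes.
+	 */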
+	if (req->cryptlen <= walk_size) {
+		tail_bytes = req->cryptlen;
+		head_bytes = 0;
+	} else {
+		if (req->cryptlen & AES_BLOCK_REMAINING_SIZE_MASK) {
+			tail_bytes = req->cryptlen &
+				     AES_BLOCK_REMAINING_SIZE_MASK;
+			tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
+			head_bytes = req->cryptlen - tail_bytes;
+		} else {
+			tail_bytes = 0;
+			head_bytes = req->cryptlen;
+		}
+	}
+
+	riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
+
+	if (head_bytes && tail_bytes) {
+		skcipher_request_set_tfm(&sub_req, tfm);
+		skcipher_request_set_callback(
+			&sub_req, skcipher_request_flags(req), NULL, NULL);
+		skcipher_request_set_crypt(&sub_req, req->src, req->dst,
+					   head_bytes, req->iv);
+		req = &sub_req;
+	}
+
+	if (head_bytes) {
+		err = skcipher_walk_virt(&walk, req, false);
+		while ((nbytes = walk.nbytes)) {
+			if (nbytes == walk.total)
+				update_iv = (tail_bytes > 0);
+
+			nbytes &= AES_BLOCK_VALID_SIZE_MASK;
+			kernel_vector_begin();
+			func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
+			     &ctx->ctx1.key, req->iv, update_iv);
+			kernel_vector_end();
+
+			err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+		}
+		if (err || !tail_bytes)
+			return err;
+
+		dst = src = scatterwalk_next(sg_src, &walk.in);
+		if (req->dst != req->src)
+			dst = scatterwalk_next(sg_dst, &walk.out);
+		skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
+	}
+
+	/* tail */
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+	if (walk.nbytes != tail_bytes)
+		return -EINVAL;
+	kernel_vector_begin();
+	func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes,
+	     &ctx->ctx1.key, req->iv, 0);
+	kernel_vector_end();
+
+	return skcipher_walk_done(&walk, 0);
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt);
+}
+
+static struct skcipher_alg riscv64_aes_alg_zvkned[] = { {
+	.base = {
+		.cra_name	= "ecb(aes)",
+		.cra_driver_name = "ecb-aes-riscv64-zvkned",
+		.cra_priority = 300,
+		.cra_blocksize = AES_BLOCK_SIZE,
+		.cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+		.cra_module = THIS_MODULE,
+	},
+	.min_keysize = AES_MIN_KEY_SIZE,
+	.max_keysize = AES_MAX_KEY_SIZE,
+	.walksize = AES_BLOCK_SIZE * 8,
+	.setkey = aes_setkey,
+	.encrypt = ecb_encrypt,
+	.decrypt = ecb_decrypt,
+}, {
+	.base = {
+		.cra_name = "cbc(aes)",
+		.cra_driver_name = "cbc-aes-riscv64-zvkned",
+		.cra_priority = 300,
+		.cra_blocksize = AES_BLOCK_SIZE,
+		.cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+		.cra_module = THIS_MODULE,
+	},
+	.min_keysize = AES_MIN_KEY_SIZE,
+	.max_keysize = AES_MAX_KEY_SIZE,
+	.ivsize = AES_BLOCK_SIZE,
+	.walksize = AES_BLOCK_SIZE * 8,
+	.setkey = aes_setkey,
+	.encrypt = cbc_encrypt,
+	.decrypt = cbc_decrypt,
+} };
+
+static struct skcipher_alg riscv64_aes_alg_zvkb_zvkned[] = { {
+	.base = {
+		.cra_name = "ctr(aes)",
+		.cra_driver_name = "ctr-aes-riscv64-zvkb-zvkned",
+		.cra_priority = 300,
+		.cra_blocksize = 1,
+		.cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+		.cra_module = THIS_MODULE,
+	},
+	.min_keysize = AES_MIN_KEY_SIZE,
+	.max_keysize = AES_MAX_KEY_SIZE,
+	.ivsize = AES_BLOCK_SIZE,
+	.chunksize = AES_BLOCK_SIZE,
+	.walksize = AES_BLOCK_SIZE * 8,
+	.setkey = aes_setkey,
+	.encrypt = ctr_encrypt,
+	.decrypt = ctr_encrypt,
+} };
+
+static struct skcipher_alg riscv64_aes_alg_zvbb_zvkg_zvkned[] = { {
+	.base = {
+		.cra_name = "xts(aes)",
+		.cra_driver_name = "xts-aes-riscv64-zvbb-zvkg-zvkned",
+		.cra_priority = 300,
+		.cra_blocksize = AES_BLOCK_SIZE,
+		.cra_ctxsize = sizeof(struct riscv64_aes_xts_ctx),
+		.cra_module = THIS_MODULE,
+	},
+	.min_keysize = AES_MIN_KEY_SIZE * 2,
+	.max_keysize = AES_MAX_KEY_SIZE * 2,
+	.ivsize = AES_BLOCK_SIZE,
+	.chunksize = AES_BLOCK_SIZE,
+	.walksize = AES_BLOCK_SIZE * 8,
+	.setkey = xts_setkey,
+	.encrypt = xts_encrypt,
+	.decrypt = xts_decrypt,
+} };
+
+static int __init riscv64_aes_block_mod_init(void)
+{
+	int ret = -ENODEV;
+
+	if (riscv_isa_extension_available(NULL, ZVKNED) &&
+	    riscv_vector_vlen() >= 128) {
+		ret = crypto_register_skciphers(
+			riscv64_aes_alg_zvkned,
+			ARRAY_SIZE(riscv64_aes_alg_zvkned));
+		if (ret)
+			return ret;
+
+		if (riscv_isa_extension_available(NULL, ZVBB)) {
+			ret = crypto_register_skciphers(
+				riscv64_aes_alg_zvkb_zvkned,
+				ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned));
+			if (ret)
+				goto unregister_zvkned;
+
+			if (riscv_isa_extension_available(NULL, ZVKG)) {
+				ret = crypto_register_skciphers(
+					riscv64_aes_alg_zvbb_zvkg_zvkned,
+					ARRAY_SIZE(
+						riscv64_aes_alg_zvbb_zvkg_zvkned));
+				if (ret)
+					goto unregister_zvkb_zvkned;
+			}
+		}
+	}
+
+	return ret;
+
+unregister_zvkb_zvkned:
+	crypto_unregister_skciphers(riscv64_aes_alg_zvkb_zvkned,
+				    ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned));
+unregister_zvkned:
+	crypto_unregister_skciphers(riscv64_aes_alg_zvkned,
+				    ARRAY_SIZE(riscv64_aes_alg_zvkned));
+
+	return ret;
+}
+
+static void __exit riscv64_aes_block_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNED) &&
+	    riscv_vector_vlen() >= 128) {
+		crypto_unregister_skciphers(riscv64_aes_alg_zvkned,
+					    ARRAY_SIZE(riscv64_aes_alg_zvkned));
+
+		if (riscv_isa_extension_available(NULL, ZVBB)) {
+			crypto_unregister_skciphers(
+				riscv64_aes_alg_zvkb_zvkned,
+				ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned));
+
+			if (riscv_isa_extension_available(NULL, ZVKG)) {
+				crypto_unregister_skciphers(
+					riscv64_aes_alg_zvbb_zvkg_zvkned,
+					ARRAY_SIZE(
+						riscv64_aes_alg_zvbb_zvkg_zvkned));
+			}
+		}
+	}
+}
+
+module_init(riscv64_aes_block_mod_init);
+module_exit(riscv64_aes_block_mod_fini);
+
+MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <jerry.shih@sifive.com>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("cbc(aes)");
+MODULE_ALIAS_CRYPTO("ctr(aes)");
+MODULE_ALIAS_CRYPTO("ecb(aes)");
+MODULE_ALIAS_CRYPTO("xts(aes)");
diff --git a/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
new file mode 100644
index 000000000000..0daec9c38574
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
@@ -0,0 +1,944 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Bit-manipulation extension ('Zvbb')
+# - RISC-V Vector GCM/GMAC extension ('Zvkg')
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+# - RISC-V Zicclsm (main memory supports misaligned loads/stores)
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+{
+################################################################################
+# void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const unsigned char *in,
+#                                             unsigned char *out, size_t length,
+#                                             const AES_KEY *key,
+#                                             unsigned char iv[16],
+#                                             int update_iv)
+my ($INPUT, $OUTPUT, $LENGTH, $KEY, $IV, $UPDATE_IV) = ("a0", "a1", "a2", "a3", "a4", "a5");
+my ($TAIL_LENGTH) = ("a6");
+my ($VL) = ("a7");
+my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3");
+my ($STORE_LEN32) = ("t4");
+my ($LEN32) = ("t5");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+# load iv to v28
+sub load_xts_iv0 {
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V28, $IV]}
+___
+
+    return $code;
+}
+
+# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
+sub init_first_round {
+    my $code=<<___;
+    # load input
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vle32_v $V24, $INPUT]}
+
+    li $T0, 5
+    # We could simplify the initialization steps if we have `block<=1`.
+    blt $LEN32, $T0, 1f
+
+    # Note: We use `vgmul` for GF(2^128) multiplication. The `vgmul` uses a
+    # different order of coefficients, so we should use `vbrev8` to reverse
+    # the data when we use `vgmul`.
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vbrev8_v $V0, $V28]}
+    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vmv_v_i $V16, 0]}
+    # v16: [r-IV0, r-IV0, ...]
+    @{[vaesz_vs $V16, $V0]}
+
+    # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
+    slli $T0, $LEN32, 2
+    @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
+    # v2: [`1`, `1`, `1`, `1`, ...]
+    @{[vmv_v_i $V2, 1]}
+    # v3: [`0`, `1`, `2`, `3`, ...]
+    @{[vid_v $V3]}
+    @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
+    # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
+    @{[vzext_vf2 $V4, $V2]}
+    # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
+    @{[vzext_vf2 $V6, $V3]}
+    slli $T0, $LEN32, 1
+    @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
+    # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
+    @{[vwsll_vv $V8, $V4, $V6]}
+
+    # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
+    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vbrev8_v $V8, $V8]}
+    @{[vgmul_vv $V16, $V8]}
+
+    # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
+    # Reverse the bits order back.
+    @{[vbrev8_v $V28, $V16]}
+
+    # Prepare the x^n multiplier in v20. The `n` is the aes-xts block number
+    # in a LMUL=4 register group.
+    #   n = ((VLEN*LMUL)/(32*4)) = ((VLEN*4)/(32*4))
+    #     = (VLEN/32)
+    # We could use vsetvli with `e32, m1` to compute the `n` number.
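+    # For example, with VLEN=256 an LMUL=4 register group holds 1024/128 = 8
+    # xts blocks, so n = 256/32 = 8 and the multiplier is x^8.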
+    @{[vsetvli $T0, "zero", "e32", "m1", "ta", "ma"]}
+    li $T1, 1
+    sll $T0, $T1, $T0
+    @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V0, 0]}
+    @{[vsetivli "zero", 1, "e64", "m1", "tu", "ma"]}
+    @{[vmv_v_x $V0, $T0]}
+    @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]}
+    @{[vbrev8_v $V0, $V0]}
+    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vmv_v_i $V20, 0]}
+    @{[vaesz_vs $V20, $V0]}
+
+    j 2f
+1:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vbrev8_v $V16, $V28]}
+2:
+___
+
+    return $code;
+}
+
+# prepare xts enc last block's input(v24) and iv(v28)
+sub handle_xts_enc_last_block {
+    my $code=<<___;
+    bnez $TAIL_LENGTH, 2f
+
+    beqz $UPDATE_IV, 1f
+    ## Store next IV
+    addi $VL, $VL, -4
+    @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+    # multiplier
+    @{[vslidedown_vx $V16, $V16, $VL]}
+
+    # setup `x` multiplier with byte-reversed order
+    # 0b00000010 => 0b01000000 (0x40)
+    li $T0, 0x40
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V28, 0]}
+    @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+    @{[vmv_v_x $V28, $T0]}
+
+    # IV * `x`
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vgmul_vv $V16, $V28]}
+    # Reverse the IV's bits order back to big-endian
+    @{[vbrev8_v $V28, $V16]}
+
+    @{[vse32_v $V28, $IV]}
+1:
+
+    ret
+2:
+    # slidedown second to last block
+    addi $VL, $VL, -4
+    @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+    # ciphertext
+    @{[vslidedown_vx $V24, $V24, $VL]}
+    # multiplier
+    @{[vslidedown_vx $V16, $V16, $VL]}
+
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_v $V25, $V24]}
+
+    # load last block into v24
+    # note: We should load the last block before storing the second to last
+    #       block to support the in-place operation.
+    @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+    @{[vle8_v $V24, $INPUT]}
+
+    # setup `x` multiplier with byte-reversed order
+    # 0b00000010 => 0b01000000 (0x40)
+    li $T0, 0x40
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V28, 0]}
+    @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+    @{[vmv_v_x $V28, $T0]}
+
+    # compute IV for last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vgmul_vv $V16, $V28]}
+    @{[vbrev8_v $V28, $V16]}
+
+    # store the last block's ciphertext (partial tail)
+    @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "ta", "ma"]}
+    @{[vse8_v $V25, $OUTPUT]}
+___
+
+    return $code;
+}
+
+# prepare xts dec second to last block's input (v24) and iv (v29), and the
+# last block's iv (v28)
+sub handle_xts_dec_last_block {
+    my $code=<<___;
+    bnez $TAIL_LENGTH, 2f
+
+    beqz $UPDATE_IV, 1f
+    ## Store next IV
+    # setup `x` multiplier with byte-reversed order
+    # 0b00000010 => 0b01000000 (0x40)
+    li $T0, 0x40
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V28, 0]}
+    @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+    @{[vmv_v_x $V28, $T0]}
+
+    beqz $LENGTH, 3f
+    addi $VL, $VL, -4
+    @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+    # multiplier
+    @{[vslidedown_vx $V16, $V16, $VL]}
+
+3:
+    # IV * `x`
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vgmul_vv $V16, $V28]}
+    # Reverse the IV's bits order back to big-endian
+    @{[vbrev8_v $V28, $V16]}
+
+    @{[vse32_v $V28, $IV]}
+1:
+
+    ret
+2:
+    # load second to last block's ciphertext
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V24, $INPUT]}
+    addi $INPUT, $INPUT, 16
+
+    # setup `x` multiplier with byte-reversed order
+    # 0b00000010 => 0b01000000 (0x40)
+    li $T0, 0x40
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V20, 0]}
+    @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+    @{[vmv_v_x $V20, $T0]}
+
+    beqz $LENGTH, 1f
+    # slidedown third to last block
+    addi $VL, $VL, -4
+    @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+    # multiplier
+    @{[vslidedown_vx $V16, $V16, $VL]}
+
+    # compute IV for last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vgmul_vv $V16, $V20]}
+    @{[vbrev8_v $V28, $V16]}
+
+    # compute IV for second to last block
+    @{[vgmul_vv $V16, $V20]}
+    @{[vbrev8_v $V29, $V16]}
+    j 2f
+1:
+    # compute IV for second to last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vgmul_vv $V16, $V20]}
+    @{[vbrev8_v $V29, $V16]}
+2:
+___
+
+    return $code;
+}
+
+# Load all 11 round keys to v1-v11 registers.
+sub aes_128_load_key {
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V2, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V3, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V4, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V5, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V6, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V7, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V8, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V9, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V10, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V11, $KEY]}
+___
+
+    return $code;
+}
+
+# Load all 13 round keys to v1-v13 registers.
+sub aes_192_load_key {
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V2, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V3, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V4, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V5, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V6, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V7, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V8, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V9, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V10, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V11, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V12, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V13, $KEY]}
+___
+
+    return $code;
+}
+
+# Load all 15 round keys to v1-v15 registers.
+sub aes_256_load_key {
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V2, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V3, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V4, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V5, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V6, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V7, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V8, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V9, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V10, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V11, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V12, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V13, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V14, $KEY]}
+    addi $KEY, $KEY, 16
+    @{[vle32_v $V15, $KEY]}
+___
+
+    return $code;
+}
+
+# aes-128 enc with round keys v1-v11
+sub aes_128_enc {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V1]}
+    @{[vaesem_vs $V24, $V2]}
+    @{[vaesem_vs $V24, $V3]}
+    @{[vaesem_vs $V24, $V4]}
+    @{[vaesem_vs $V24, $V5]}
+    @{[vaesem_vs $V24, $V6]}
+    @{[vaesem_vs $V24, $V7]}
+    @{[vaesem_vs $V24, $V8]}
+    @{[vaesem_vs $V24, $V9]}
+    @{[vaesem_vs $V24, $V10]}
+    @{[vaesef_vs $V24, $V11]}
+___
+
+    return $code;
+}
+
+# aes-128 dec with round keys v1-v11
+sub aes_128_dec {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V11]}
+    @{[vaesdm_vs $V24, $V10]}
+    @{[vaesdm_vs $V24, $V9]}
+    @{[vaesdm_vs $V24, $V8]}
+    @{[vaesdm_vs $V24, $V7]}
+    @{[vaesdm_vs $V24, $V6]}
+    @{[vaesdm_vs $V24, $V5]}
+    @{[vaesdm_vs $V24, $V4]}
+    @{[vaesdm_vs $V24, $V3]}
+    @{[vaesdm_vs $V24, $V2]}
+    @{[vaesdf_vs $V24, $V1]}
+___
+
+    return $code;
+}
+
+# aes-192 enc with round keys v1-v13
+sub aes_192_enc {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V1]}
+    @{[vaesem_vs $V24, $V2]}
+    @{[vaesem_vs $V24, $V3]}
+    @{[vaesem_vs $V24, $V4]}
+    @{[vaesem_vs $V24, $V5]}
+    @{[vaesem_vs $V24, $V6]}
+    @{[vaesem_vs $V24, $V7]}
+    @{[vaesem_vs $V24, $V8]}
+    @{[vaesem_vs $V24, $V9]}
+    @{[vaesem_vs $V24, $V10]}
+    @{[vaesem_vs $V24, $V11]}
+    @{[vaesem_vs $V24, $V12]}
+    @{[vaesef_vs $V24, $V13]}
+___
+
+    return $code;
+}
+
+# aes-192 dec with round keys v1-v13
+sub aes_192_dec {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V13]}
+    @{[vaesdm_vs $V24, $V12]}
+    @{[vaesdm_vs $V24, $V11]}
+    @{[vaesdm_vs $V24, $V10]}
+    @{[vaesdm_vs $V24, $V9]}
+    @{[vaesdm_vs $V24, $V8]}
+    @{[vaesdm_vs $V24, $V7]}
+    @{[vaesdm_vs $V24, $V6]}
+    @{[vaesdm_vs $V24, $V5]}
+    @{[vaesdm_vs $V24, $V4]}
+    @{[vaesdm_vs $V24, $V3]}
+    @{[vaesdm_vs $V24, $V2]}
+    @{[vaesdf_vs $V24, $V1]}
+___
+
+    return $code;
+}
+
+# aes-256 enc with round keys v1-v15
+sub aes_256_enc {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V1]}
+    @{[vaesem_vs $V24, $V2]}
+    @{[vaesem_vs $V24, $V3]}
+    @{[vaesem_vs $V24, $V4]}
+    @{[vaesem_vs $V24, $V5]}
+    @{[vaesem_vs $V24, $V6]}
+    @{[vaesem_vs $V24, $V7]}
+    @{[vaesem_vs $V24, $V8]}
+    @{[vaesem_vs $V24, $V9]}
+    @{[vaesem_vs $V24, $V10]}
+    @{[vaesem_vs $V24, $V11]}
+    @{[vaesem_vs $V24, $V12]}
+    @{[vaesem_vs $V24, $V13]}
+    @{[vaesem_vs $V24, $V14]}
+    @{[vaesef_vs $V24, $V15]}
+___
+
+    return $code;
+}
+
+# aes-256 dec with round keys v1-v15
+sub aes_256_dec {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V15]}
+    @{[vaesdm_vs $V24, $V14]}
+    @{[vaesdm_vs $V24, $V13]}
+    @{[vaesdm_vs $V24, $V12]}
+    @{[vaesdm_vs $V24, $V11]}
+    @{[vaesdm_vs $V24, $V10]}
+    @{[vaesdm_vs $V24, $V9]}
+    @{[vaesdm_vs $V24, $V8]}
+    @{[vaesdm_vs $V24, $V7]}
+    @{[vaesdm_vs $V24, $V6]}
+    @{[vaesdm_vs $V24, $V5]}
+    @{[vaesdm_vs $V24, $V4]}
+    @{[vaesdm_vs $V24, $V3]}
+    @{[vaesdm_vs $V24, $V2]}
+    @{[vaesdf_vs $V24, $V1]}
+___
+
+    return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt
+.type rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,\@function
+rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt:
+    @{[load_xts_iv0]}
+
+    # aes block size is 16
+    andi $TAIL_LENGTH, $LENGTH, 15
+    mv $STORE_LEN32, $LENGTH
+    beqz $TAIL_LENGTH, 1f
+    sub $LENGTH, $LENGTH, $TAIL_LENGTH
+    addi $STORE_LEN32, $LENGTH, -16
+1:
+    # Convert `LENGTH` into a number of 32-bit (e32) elements.
+    srli $LEN32, $LENGTH, 2
+    srli $STORE_LEN32, $STORE_LEN32, 2
+
+    # Load number of rounds
+    lwu $T0, 240($KEY)
+    li $T1, 14
+    li $T2, 12
+    li $T3, 10
+    beq $T0, $T1, aes_xts_enc_256
+    beq $T0, $T2, aes_xts_enc_192
+    beq $T0, $T3, aes_xts_enc_128
+.size rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_128:
+    @{[init_first_round]}
+    @{[aes_128_load_key]}
+
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    j 1f
+
+.Lenc_blocks_128:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    # load plaintext into v24
+    @{[vle32_v $V24, $INPUT]}
+    # update iv
+    @{[vgmul_vv $V16, $V20]}
+    # reverse the iv's bits order back
+    @{[vbrev8_v $V28, $V16]}
+1:
+    @{[vxor_vv $V24, $V24, $V28]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+    add $INPUT, $INPUT, $T0
+    @{[aes_128_enc]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store ciphertext
+    @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vse32_v $V24, $OUTPUT]}
+    add $OUTPUT, $OUTPUT, $T0
+    sub $STORE_LEN32, $STORE_LEN32, $VL
+
+    bnez $LEN32, .Lenc_blocks_128
+
+    @{[handle_xts_enc_last_block]}
+
+    # xts last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V28]}
+    @{[aes_128_enc]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store last block ciphertext
+    addi $OUTPUT, $OUTPUT, -16
+    @{[vse32_v $V24, $OUTPUT]}
+
+    ret
+.size aes_xts_enc_128,.-aes_xts_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_192:
+    @{[init_first_round]}
+    @{[aes_192_load_key]}
+
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    j 1f
+
+.Lenc_blocks_192:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    # load plaintext into v24
+    @{[vle32_v $V24, $INPUT]}
+    # update iv
+    @{[vgmul_vv $V16, $V20]}
+    # reverse the iv's bits order back
+    @{[vbrev8_v $V28, $V16]}
+1:
+    @{[vxor_vv $V24, $V24, $V28]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+    add $INPUT, $INPUT, $T0
+    @{[aes_192_enc]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store ciphertext
+    @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vse32_v $V24, $OUTPUT]}
+    add $OUTPUT, $OUTPUT, $T0
+    sub $STORE_LEN32, $STORE_LEN32, $VL
+
+    bnez $LEN32, .Lenc_blocks_192
+
+    @{[handle_xts_enc_last_block]}
+
+    # xts last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V28]}
+    @{[aes_192_enc]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store last block ciphertext
+    addi $OUTPUT, $OUTPUT, -16
+    @{[vse32_v $V24, $OUTPUT]}
+
+    ret
+.size aes_xts_enc_192,.-aes_xts_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_256:
+    @{[init_first_round]}
+    @{[aes_256_load_key]}
+
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    j 1f
+
+.Lenc_blocks_256:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    # load plaintext into v24
+    @{[vle32_v $V24, $INPUT]}
+    # update iv
+    @{[vgmul_vv $V16, $V20]}
+    # reverse the iv's bits order back
+    @{[vbrev8_v $V28, $V16]}
+1:
+    @{[vxor_vv $V24, $V24, $V28]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+    add $INPUT, $INPUT, $T0
+    @{[aes_256_enc]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store ciphertext
+    @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vse32_v $V24, $OUTPUT]}
+    add $OUTPUT, $OUTPUT, $T0
+    sub $STORE_LEN32, $STORE_LEN32, $VL
+
+    bnez $LEN32, .Lenc_blocks_256
+
+    @{[handle_xts_enc_last_block]}
+
+    # xts last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V28]}
+    @{[aes_256_enc]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store last block ciphertext
+    addi $OUTPUT, $OUTPUT, -16
+    @{[vse32_v $V24, $OUTPUT]}
+
+    ret
+.size aes_xts_enc_256,.-aes_xts_enc_256
+___
+
+################################################################################
+# void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const unsigned char *in,
+#                                             unsigned char *out, size_t length,
+#                                             const AES_KEY *key,
+#                                             unsigned char iv[16],
+#                                             int update_iv)
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt
+.type rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,\@function
+rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt:
+    @{[load_xts_iv0]}
+
+    # aes block size is 16
+    andi $TAIL_LENGTH, $LENGTH, 15
+    beqz $TAIL_LENGTH, 1f
+    sub $LENGTH, $LENGTH, $TAIL_LENGTH
+    addi $LENGTH, $LENGTH, -16
+1:
+    # Convert `LENGTH` into a number of 32-bit (e32) elements.
+    srli $LEN32, $LENGTH, 2
+
+    # Load number of rounds
+    lwu $T0, 240($KEY)
+    li $T1, 14
+    li $T2, 12
+    li $T3, 10
+    beq $T0, $T1, aes_xts_dec_256
+    beq $T0, $T2, aes_xts_dec_192
+    beq $T0, $T3, aes_xts_dec_128
+.size rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_128:
+    @{[init_first_round]}
+    @{[aes_128_load_key]}
+
+    beqz $LEN32, 2f
+
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    j 1f
+
+.Ldec_blocks_128:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    # load ciphertext into v24
+    @{[vle32_v $V24, $INPUT]}
+    # update iv
+    @{[vgmul_vv $V16, $V20]}
+    # reverse the iv's bits order back
+    @{[vbrev8_v $V28, $V16]}
+1:
+    @{[vxor_vv $V24, $V24, $V28]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+    add $INPUT, $INPUT, $T0
+    @{[aes_128_dec]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store plaintext
+    @{[vse32_v $V24, $OUTPUT]}
+    add $OUTPUT, $OUTPUT, $T0
+
+    bnez $LEN32, .Ldec_blocks_128
+
+2:
+    @{[handle_xts_dec_last_block]}
+
+    ## xts second to last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V29]}
+    @{[aes_128_dec]}
+    @{[vxor_vv $V24, $V24, $V29]}
+    @{[vmv_v_v $V25, $V24]}
+
+    # load last block ciphertext
+    @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+    @{[vle8_v $V24, $INPUT]}
+
+    # store the last block's plaintext (partial tail)
+    addi $T0, $OUTPUT, 16
+    @{[vse8_v $V25, $T0]}
+
+    ## xts last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V28]}
+    @{[aes_128_dec]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store second to last block plaintext
+    @{[vse32_v $V24, $OUTPUT]}
+
+    ret
+.size aes_xts_dec_128,.-aes_xts_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_192:
+    @{[init_first_round]}
+    @{[aes_192_load_key]}
+
+    beqz $LEN32, 2f
+
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    j 1f
+
+.Ldec_blocks_192:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    # load ciphertext into v24
+    @{[vle32_v $V24, $INPUT]}
+    # update iv
+    @{[vgmul_vv $V16, $V20]}
+    # reverse the iv's bits order back
+    @{[vbrev8_v $V28, $V16]}
+1:
+    @{[vxor_vv $V24, $V24, $V28]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+    add $INPUT, $INPUT, $T0
+    @{[aes_192_dec]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store plaintext
+    @{[vse32_v $V24, $OUTPUT]}
+    add $OUTPUT, $OUTPUT, $T0
+
+    bnez $LEN32, .Ldec_blocks_192
+
+2:
+    @{[handle_xts_dec_last_block]}
+
+    ## xts second to last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V29]}
+    @{[aes_192_dec]}
+    @{[vxor_vv $V24, $V24, $V29]}
+    @{[vmv_v_v $V25, $V24]}
+
+    # load last block ciphertext
+    @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+    @{[vle8_v $V24, $INPUT]}
+
+    # store the last block's plaintext (partial tail)
+    addi $T0, $OUTPUT, 16
+    @{[vse8_v $V25, $T0]}
+
+    ## xts last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V28]}
+    @{[aes_192_dec]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store second to last block plaintext
+    @{[vse32_v $V24, $OUTPUT]}
+
+    ret
+.size aes_xts_dec_192,.-aes_xts_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_256:
+    @{[init_first_round]}
+    @{[aes_256_load_key]}
+
+    beqz $LEN32, 2f
+
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    j 1f
+
+.Ldec_blocks_256:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    # load ciphertext into v24
+    @{[vle32_v $V24, $INPUT]}
+    # update iv
+    @{[vgmul_vv $V16, $V20]}
+    # reverse the iv's bits order back
+    @{[vbrev8_v $V28, $V16]}
+1:
+    @{[vxor_vv $V24, $V24, $V28]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+    add $INPUT, $INPUT, $T0
+    @{[aes_256_dec]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store plaintext
+    @{[vse32_v $V24, $OUTPUT]}
+    add $OUTPUT, $OUTPUT, $T0
+
+    bnez $LEN32, .Ldec_blocks_256
+
+2:
+    @{[handle_xts_dec_last_block]}
+
+    ## xts second to last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V29]}
+    @{[aes_256_dec]}
+    @{[vxor_vv $V24, $V24, $V29]}
+    @{[vmv_v_v $V25, $V24]}
+
+    # load last block ciphertext
+    @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+    @{[vle8_v $V24, $INPUT]}
+
+    # store the last block's plaintext (partial tail)
+    addi $T0, $OUTPUT, 16
+    @{[vse8_v $V25, $T0]}
+
+    ## xts last block
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V28]}
+    @{[aes_256_dec]}
+    @{[vxor_vv $V24, $V24, $V28]}
+
+    # store second to last block plaintext
+    @{[vse32_v $V24, $OUTPUT]}
+
+    ret
+.size aes_xts_dec_256,.-aes_xts_dec_256
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl
new file mode 100644
index 000000000000..bc659da44c53
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl
@@ -0,0 +1,416 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+# - RISC-V Zicclsm (main memory supports misaligned loads/stores)
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const unsigned char *in,
+#                                             unsigned char *out, size_t length,
+#                                             const void *key,
+#                                             unsigned char ivec[16]);
+{
+my ($INP, $OUTP, $LEN, $KEYP, $IVP) = ("a0", "a1", "a2", "a3", "a4");
+my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3");
+my ($VL) = ("t4");
+my ($LEN32) = ("t5");
+my ($CTR) = ("t6");
+my ($MASK) = ("v0");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+# Prepare the AES ctr input data into v16.
+sub init_aes_ctr_input {
+    my $code=<<___;
+    # Setup mask into v0
+    # The mask pattern for 4*N-th elements
+    # mask v0: [000100010001....]
+    # Note:
+    #   We could setup the mask just for the maximum element length instead of
+    #   the VLMAX.
+    li $T0, 0b10001000
+    @{[vsetvli $T2, "zero", "e8", "m1", "ta", "ma"]}
+    @{[vmv_v_x $MASK, $T0]}
+    # Load IV.
+    # v31:[IV0, IV1, IV2, big-endian count]
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V31, $IVP]}
+    # Convert the big-endian counter into little-endian.
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+    @{[vrev8_v $V31, $V31, $MASK]}
+    # Splat the IV to v16
+    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+    @{[vmv_v_i $V16, 0]}
+    @{[vaesz_vs $V16, $V31]}
+    # Prepare the ctr pattern into v20
+    # v20: [x, x, x, 0, x, x, x, 1, x, x, x, 2, ...]
+    @{[viota_m $V20, $MASK, $MASK]}
+    # v16:[IV0, IV1, IV2, count+0, IV0, IV1, IV2, count+1, ...]
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+    @{[vadd_vv $V16, $V16, $V20, $MASK]}
+___
+
+    return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkb_zvkned_ctr32_encrypt_blocks
+.type rv64i_zvkb_zvkned_ctr32_encrypt_blocks,\@function
+rv64i_zvkb_zvkned_ctr32_encrypt_blocks:
+    # The aes block size is 16 bytes.
+    # We try to get the minimum aes block number including the tail data.
+    addi $T0, $LEN, 15
+    # the minimum block number
+    srli $T0, $T0, 4
+    # Convert the block number into a number of 32-bit (e32) elements.
+    slli $LEN32, $T0, 2
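+    # e.g. LEN=33 bytes: (33+15)>>4 = 3 blocks, so LEN32 = 3*4 = 12 e32 elements.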
+
+    # Load number of rounds
+    lwu $T0, 240($KEYP)
+    li $T1, 14
+    li $T2, 12
+    li $T3, 10
+
+    beq $T0, $T1, ctr32_encrypt_blocks_256
+    beq $T0, $T2, ctr32_encrypt_blocks_192
+    beq $T0, $T3, ctr32_encrypt_blocks_128
+
+    ret
+.size rv64i_zvkb_zvkned_ctr32_encrypt_blocks,.-rv64i_zvkb_zvkned_ctr32_encrypt_blocks
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_128:
+    # Load all 11 round keys to v1-v11 registers.
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V2, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V3, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V4, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V5, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V6, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V7, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V8, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V9, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+
+    @{[init_aes_ctr_input]}
+
+    ##### AES body
+    j 2f
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+    # Increase ctr in v16.
+    @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+    # Prepare the AES ctr input into v24.
+    # The ctr data uses big-endian form.
+    @{[vmv_v_v $V24, $V16]}
+    @{[vrev8_v $V24, $V24, $MASK]}
+    srli $CTR, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    # Load plaintext in bytes into v20.
+    @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+    @{[vle8_v $V20, $INP]}
+    sub $LEN, $LEN, $T0
+    add $INP, $INP, $T0
+
+    @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+    @{[vaesz_vs $V24, $V1]}
+    @{[vaesem_vs $V24, $V2]}
+    @{[vaesem_vs $V24, $V3]}
+    @{[vaesem_vs $V24, $V4]}
+    @{[vaesem_vs $V24, $V5]}
+    @{[vaesem_vs $V24, $V6]}
+    @{[vaesem_vs $V24, $V7]}
+    @{[vaesem_vs $V24, $V8]}
+    @{[vaesem_vs $V24, $V9]}
+    @{[vaesem_vs $V24, $V10]}
+    @{[vaesef_vs $V24, $V11]}
+
+    # ciphertext
+    @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V20]}
+
+    # Store the ciphertext.
+    @{[vse8_v $V24, $OUTP]}
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN, 1b
+
+    ## store ctr iv
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+    # Increase ctr in v16.
+    @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+    # Convert ctr data back to big-endian.
+    @{[vrev8_v $V16, $V16, $MASK]}
+    @{[vse32_v $V16, $IVP]}
+
+    ret
+.size ctr32_encrypt_blocks_128,.-ctr32_encrypt_blocks_128
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_192:
+    # Load all 13 round keys to v1-v13 registers.
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V2, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V3, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V4, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V5, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V6, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V7, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V8, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V9, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+
+    @{[init_aes_ctr_input]}
+
+    ##### AES body
+    j 2f
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+    # Increase ctr in v16.
+    @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+    # Prepare the AES ctr input into v24.
+    # The ctr data uses big-endian form.
+    @{[vmv_v_v $V24, $V16]}
+    @{[vrev8_v $V24, $V24, $MASK]}
+    srli $CTR, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    # Load plaintext in bytes into v20.
+    @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+    @{[vle8_v $V20, $INP]}
+    sub $LEN, $LEN, $T0
+    add $INP, $INP, $T0
+
+    @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+    @{[vaesz_vs $V24, $V1]}
+    @{[vaesem_vs $V24, $V2]}
+    @{[vaesem_vs $V24, $V3]}
+    @{[vaesem_vs $V24, $V4]}
+    @{[vaesem_vs $V24, $V5]}
+    @{[vaesem_vs $V24, $V6]}
+    @{[vaesem_vs $V24, $V7]}
+    @{[vaesem_vs $V24, $V8]}
+    @{[vaesem_vs $V24, $V9]}
+    @{[vaesem_vs $V24, $V10]}
+    @{[vaesem_vs $V24, $V11]}
+    @{[vaesem_vs $V24, $V12]}
+    @{[vaesef_vs $V24, $V13]}
+
+    # ciphertext
+    @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V20]}
+
+    # Store the ciphertext.
+    @{[vse8_v $V24, $OUTP]}
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN, 1b
+
+    ## store ctr iv
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+    # Increase ctr in v16.
+    @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+    # Convert ctr data back to big-endian.
+    @{[vrev8_v $V16, $V16, $MASK]}
+    @{[vse32_v $V16, $IVP]}
+
+    ret
+.size ctr32_encrypt_blocks_192,.-ctr32_encrypt_blocks_192
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_256:
+    # Load all 15 round keys to v1-v15 registers.
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V2, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V3, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V4, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V5, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V6, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V7, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V8, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V9, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, $KEYP]}
+
+    @{[init_aes_ctr_input]}
+
+    ##### AES body
+    j 2f
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+    # Increase ctr in v16.
+    @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+    # Prepare the AES ctr input into v24.
+    # The ctr data uses big-endian form.
+    @{[vmv_v_v $V24, $V16]}
+    @{[vrev8_v $V24, $V24, $MASK]}
+    srli $CTR, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    # Load plaintext in bytes into v20.
+    @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+    @{[vle8_v $V20, $INP]}
+    sub $LEN, $LEN, $T0
+    add $INP, $INP, $T0
+
+    @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+    @{[vaesz_vs $V24, $V1]}
+    @{[vaesem_vs $V24, $V2]}
+    @{[vaesem_vs $V24, $V3]}
+    @{[vaesem_vs $V24, $V4]}
+    @{[vaesem_vs $V24, $V5]}
+    @{[vaesem_vs $V24, $V6]}
+    @{[vaesem_vs $V24, $V7]}
+    @{[vaesem_vs $V24, $V8]}
+    @{[vaesem_vs $V24, $V9]}
+    @{[vaesem_vs $V24, $V10]}
+    @{[vaesem_vs $V24, $V11]}
+    @{[vaesem_vs $V24, $V12]}
+    @{[vaesem_vs $V24, $V13]}
+    @{[vaesem_vs $V24, $V14]}
+    @{[vaesef_vs $V24, $V15]}
+
+    # ciphertext
+    @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+    @{[vxor_vv $V24, $V24, $V20]}
+
+    # Store the ciphertext.
+    @{[vse8_v $V24, $OUTP]}
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN, 1b
+
+    ## store ctr iv
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+    # Increase ctr in v16.
+    @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+    # Convert ctr data back to big-endian.
+    @{[vrev8_v $V16, $V16, $MASK]}
+    @{[vse32_v $V16, $IVP]}
+
+    ret
+.size ctr32_encrypt_blocks_256,.-ctr32_encrypt_blocks_256
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
index c0ecde77bf56..4689f878463a 100644
--- a/arch/riscv/crypto/aes-riscv64-zvkned.pl
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -66,6 +66,753 @@ my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
     $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
 ) = map("v$_",(0..31));
 
+# Load all 11 round keys to v1-v11 registers.
+sub aes_128_load_key {
+    my $KEYP = shift;
+
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V2, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V3, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V4, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V5, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V6, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V7, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V8, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V9, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+___
+
+    return $code;
+}
+
+# Load all 13 round keys to v1-v13 registers.
+sub aes_192_load_key {
+    my $KEYP = shift;
+
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V2, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V3, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V4, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V5, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V6, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V7, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V8, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V9, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+___
+
+    return $code;
+}
+
+# Load all 15 round keys to v1-v15 registers.
+sub aes_256_load_key {
+    my $KEYP = shift;
+
+    my $code=<<___;
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $V1, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V2, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V3, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V4, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V5, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V6, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V7, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V8, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V9, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, $KEYP]}
+___
+
+    return $code;
+}
+
+# aes-128 encryption with round keys v1-v11
+sub aes_128_encrypt {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V1]}     # with round key w[ 0, 3]
+    @{[vaesem_vs $V24, $V2]}    # with round key w[ 4, 7]
+    @{[vaesem_vs $V24, $V3]}    # with round key w[ 8,11]
+    @{[vaesem_vs $V24, $V4]}    # with round key w[12,15]
+    @{[vaesem_vs $V24, $V5]}    # with round key w[16,19]
+    @{[vaesem_vs $V24, $V6]}    # with round key w[20,23]
+    @{[vaesem_vs $V24, $V7]}    # with round key w[24,27]
+    @{[vaesem_vs $V24, $V8]}    # with round key w[28,31]
+    @{[vaesem_vs $V24, $V9]}    # with round key w[32,35]
+    @{[vaesem_vs $V24, $V10]}   # with round key w[36,39]
+    @{[vaesef_vs $V24, $V11]}   # with round key w[40,43]
+___
+
+    return $code;
+}
+
+# aes-128 decryption with round keys v1-v11
+sub aes_128_decrypt {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V11]}   # with round key w[40,43]
+    @{[vaesdm_vs $V24, $V10]}  # with round key w[36,39]
+    @{[vaesdm_vs $V24, $V9]}   # with round key w[32,35]
+    @{[vaesdm_vs $V24, $V8]}   # with round key w[28,31]
+    @{[vaesdm_vs $V24, $V7]}   # with round key w[24,27]
+    @{[vaesdm_vs $V24, $V6]}   # with round key w[20,23]
+    @{[vaesdm_vs $V24, $V5]}   # with round key w[16,19]
+    @{[vaesdm_vs $V24, $V4]}   # with round key w[12,15]
+    @{[vaesdm_vs $V24, $V3]}   # with round key w[ 8,11]
+    @{[vaesdm_vs $V24, $V2]}   # with round key w[ 4, 7]
+    @{[vaesdf_vs $V24, $V1]}   # with round key w[ 0, 3]
+___
+
+    return $code;
+}
+
+# aes-192 encryption with round keys v1-v13
+sub aes_192_encrypt {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V1]}     # with round key w[ 0, 3]
+    @{[vaesem_vs $V24, $V2]}    # with round key w[ 4, 7]
+    @{[vaesem_vs $V24, $V3]}    # with round key w[ 8,11]
+    @{[vaesem_vs $V24, $V4]}    # with round key w[12,15]
+    @{[vaesem_vs $V24, $V5]}    # with round key w[16,19]
+    @{[vaesem_vs $V24, $V6]}    # with round key w[20,23]
+    @{[vaesem_vs $V24, $V7]}    # with round key w[24,27]
+    @{[vaesem_vs $V24, $V8]}    # with round key w[28,31]
+    @{[vaesem_vs $V24, $V9]}    # with round key w[32,35]
+    @{[vaesem_vs $V24, $V10]}   # with round key w[36,39]
+    @{[vaesem_vs $V24, $V11]}   # with round key w[40,43]
+    @{[vaesem_vs $V24, $V12]}   # with round key w[44,47]
+    @{[vaesef_vs $V24, $V13]}   # with round key w[48,51]
+___
+
+    return $code;
+}
+
+# aes-192 decryption with round keys v1-v13
+sub aes_192_decrypt {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V13]}    # with round key w[48,51]
+    @{[vaesdm_vs $V24, $V12]}   # with round key w[44,47]
+    @{[vaesdm_vs $V24, $V11]}   # with round key w[40,43]
+    @{[vaesdm_vs $V24, $V10]}   # with round key w[36,39]
+    @{[vaesdm_vs $V24, $V9]}    # with round key w[32,35]
+    @{[vaesdm_vs $V24, $V8]}    # with round key w[28,31]
+    @{[vaesdm_vs $V24, $V7]}    # with round key w[24,27]
+    @{[vaesdm_vs $V24, $V6]}    # with round key w[20,23]
+    @{[vaesdm_vs $V24, $V5]}    # with round key w[16,19]
+    @{[vaesdm_vs $V24, $V4]}    # with round key w[12,15]
+    @{[vaesdm_vs $V24, $V3]}    # with round key w[ 8,11]
+    @{[vaesdm_vs $V24, $V2]}    # with round key w[ 4, 7]
+    @{[vaesdf_vs $V24, $V1]}    # with round key w[ 0, 3]
+___
+
+    return $code;
+}
+
+# aes-256 encryption with round keys v1-v15
+sub aes_256_encrypt {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V1]}     # with round key w[ 0, 3]
+    @{[vaesem_vs $V24, $V2]}    # with round key w[ 4, 7]
+    @{[vaesem_vs $V24, $V3]}    # with round key w[ 8,11]
+    @{[vaesem_vs $V24, $V4]}    # with round key w[12,15]
+    @{[vaesem_vs $V24, $V5]}    # with round key w[16,19]
+    @{[vaesem_vs $V24, $V6]}    # with round key w[20,23]
+    @{[vaesem_vs $V24, $V7]}    # with round key w[24,27]
+    @{[vaesem_vs $V24, $V8]}    # with round key w[28,31]
+    @{[vaesem_vs $V24, $V9]}    # with round key w[32,35]
+    @{[vaesem_vs $V24, $V10]}   # with round key w[36,39]
+    @{[vaesem_vs $V24, $V11]}   # with round key w[40,43]
+    @{[vaesem_vs $V24, $V12]}   # with round key w[44,47]
+    @{[vaesem_vs $V24, $V13]}   # with round key w[48,51]
+    @{[vaesem_vs $V24, $V14]}   # with round key w[52,55]
+    @{[vaesef_vs $V24, $V15]}   # with round key w[56,59]
+___
+
+    return $code;
+}
+
+# aes-256 decryption with round keys v1-v15
+sub aes_256_decrypt {
+    my $code=<<___;
+    @{[vaesz_vs $V24, $V15]}    # with round key w[56,59]
+    @{[vaesdm_vs $V24, $V14]}   # with round key w[52,55]
+    @{[vaesdm_vs $V24, $V13]}   # with round key w[48,51]
+    @{[vaesdm_vs $V24, $V12]}   # with round key w[44,47]
+    @{[vaesdm_vs $V24, $V11]}   # with round key w[40,43]
+    @{[vaesdm_vs $V24, $V10]}   # with round key w[36,39]
+    @{[vaesdm_vs $V24, $V9]}    # with round key w[32,35]
+    @{[vaesdm_vs $V24, $V8]}    # with round key w[28,31]
+    @{[vaesdm_vs $V24, $V7]}    # with round key w[24,27]
+    @{[vaesdm_vs $V24, $V6]}    # with round key w[20,23]
+    @{[vaesdm_vs $V24, $V5]}    # with round key w[16,19]
+    @{[vaesdm_vs $V24, $V4]}    # with round key w[12,15]
+    @{[vaesdm_vs $V24, $V3]}    # with round key w[ 8,11]
+    @{[vaesdm_vs $V24, $V2]}    # with round key w[ 4, 7]
+    @{[vaesdf_vs $V24, $V1]}    # with round key w[ 0, 3]
+___
+
+    return $code;
+}
+
+{
+###############################################################################
+# void rv64i_zvkned_cbc_encrypt(const unsigned char *in, unsigned char *out,
+#                               size_t length, const AES_KEY *key,
+#                               unsigned char *ivec, const int enc);
+my ($INP, $OUTP, $LEN, $KEYP, $IVP, $ENC) = ("a0", "a1", "a2", "a3", "a4", "a5");
+my ($T0, $T1, $ROUNDS) = ("t0", "t1", "t2");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_cbc_encrypt
+.type rv64i_zvkned_cbc_encrypt,\@function
+rv64i_zvkned_cbc_encrypt:
+    # check whether the length is a multiple of 16 and >= 16
+    li $T1, 16
+    blt $LEN, $T1, L_end
+    andi $T1, $LEN, 15
+    bnez $T1, L_end
+
+    # Load number of rounds
+    lwu $ROUNDS, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T0, 10
+    beq $ROUNDS, $T0, L_cbc_enc_128
+
+    li $T0, 12
+    beq $ROUNDS, $T0, L_cbc_enc_192
+
+    li $T0, 14
+    beq $ROUNDS, $T0, L_cbc_enc_256
+
+    ret
+.size rv64i_zvkned_cbc_encrypt,.-rv64i_zvkned_cbc_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_128:
+    # Load all 11 round keys to v1-v11 registers.
+    @{[aes_128_load_key $KEYP]}
+
+    # Load IV.
+    @{[vle32_v $V16, $IVP]}
+
+    @{[vle32_v $V24, $INP]}
+    @{[vxor_vv $V24, $V24, $V16]}
+    j 2f
+
+1:
+    @{[vle32_v $V17, $INP]}
+    @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+    # AES body
+    @{[aes_128_encrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    addi $INP, $INP, 16
+    addi $OUTP, $OUTP, 16
+    addi $LEN, $LEN, -16
+
+    bnez $LEN, 1b
+
+    @{[vse32_v $V24, $IVP]}
+
+    ret
+.size L_cbc_enc_128,.-L_cbc_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_192:
+    # Load all 13 round keys to v1-v13 registers.
+    @{[aes_192_load_key $KEYP]}
+
+    # Load IV.
+    @{[vle32_v $V16, $IVP]}
+
+    @{[vle32_v $V24, $INP]}
+    @{[vxor_vv $V24, $V24, $V16]}
+    j 2f
+
+1:
+    @{[vle32_v $V17, $INP]}
+    @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+    # AES body
+    @{[aes_192_encrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    addi $INP, $INP, 16
+    addi $OUTP, $OUTP, 16
+    addi $LEN, $LEN, -16
+
+    bnez $LEN, 1b
+
+    @{[vse32_v $V24, $IVP]}
+
+    ret
+.size L_cbc_enc_192,.-L_cbc_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_256:
+    # Load all 15 round keys to v1-v15 registers.
+    @{[aes_256_load_key $KEYP]}
+
+    # Load IV.
+    @{[vle32_v $V16, $IVP]}
+
+    @{[vle32_v $V24, $INP]}
+    @{[vxor_vv $V24, $V24, $V16]}
+    j 2f
+
+1:
+    @{[vle32_v $V17, $INP]}
+    @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+    # AES body
+    @{[aes_256_encrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    addi $INP, $INP, 16
+    addi $OUTP, $OUTP, 16
+    addi $LEN, $LEN, -16
+
+    bnez $LEN, 1b
+
+    @{[vse32_v $V24, $IVP]}
+
+    ret
+.size L_cbc_enc_256,.-L_cbc_enc_256
+___
+
+###############################################################################
+# void rv64i_zvkned_cbc_decrypt(const unsigned char *in, unsigned char *out,
+#                               size_t length, const AES_KEY *key,
+#                               unsigned char *ivec, const int enc);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_cbc_decrypt
+.type rv64i_zvkned_cbc_decrypt,\@function
+rv64i_zvkned_cbc_decrypt:
+    # check whether the length is a multiple of 16 and >= 16
+    li $T1, 16
+    blt $LEN, $T1, L_end
+    andi $T1, $LEN, 15
+    bnez $T1, L_end
+
+    # Load number of rounds
+    lwu $ROUNDS, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T0, 10
+    beq $ROUNDS, $T0, L_cbc_dec_128
+
+    li $T0, 12
+    beq $ROUNDS, $T0, L_cbc_dec_192
+
+    li $T0, 14
+    beq $ROUNDS, $T0, L_cbc_dec_256
+
+    ret
+.size rv64i_zvkned_cbc_decrypt,.-rv64i_zvkned_cbc_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_128:
+    # Load all 11 round keys to v1-v11 registers.
+    @{[aes_128_load_key $KEYP]}
+
+    # Load IV.
+    @{[vle32_v $V16, $IVP]}
+
+    @{[vle32_v $V24, $INP]}
+    @{[vmv_v_v $V17, $V24]}
+    j 2f
+
+1:
+    @{[vle32_v $V24, $INP]}
+    @{[vmv_v_v $V17, $V24]}
+    addi $OUTP, $OUTP, 16
+
+2:
+    # AES body
+    @{[aes_128_decrypt]}
+
+    @{[vxor_vv $V24, $V24, $V16]}
+    @{[vse32_v $V24, $OUTP]}
+    @{[vmv_v_v $V16, $V17]}
+
+    addi $LEN, $LEN, -16
+    addi $INP, $INP, 16
+
+    bnez $LEN, 1b
+
+    @{[vse32_v $V16, $IVP]}
+
+    ret
+.size L_cbc_dec_128,.-L_cbc_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_192:
+    # Load all 13 round keys to v1-v13 registers.
+    @{[aes_192_load_key $KEYP]}
+
+    # Load IV.
+    @{[vle32_v $V16, $IVP]}
+
+    @{[vle32_v $V24, $INP]}
+    @{[vmv_v_v $V17, $V24]}
+    j 2f
+
+1:
+    @{[vle32_v $V24, $INP]}
+    @{[vmv_v_v $V17, $V24]}
+    addi $OUTP, $OUTP, 16
+
+2:
+    # AES body
+    @{[aes_192_decrypt]}
+
+    @{[vxor_vv $V24, $V24, $V16]}
+    @{[vse32_v $V24, $OUTP]}
+    @{[vmv_v_v $V16, $V17]}
+
+    addi $LEN, $LEN, -16
+    addi $INP, $INP, 16
+
+    bnez $LEN, 1b
+
+    @{[vse32_v $V16, $IVP]}
+
+    ret
+.size L_cbc_dec_192,.-L_cbc_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_256:
+    # Load all 15 round keys to v1-v15 registers.
+    @{[aes_256_load_key $KEYP]}
+
+    # Load IV.
+    @{[vle32_v $V16, $IVP]}
+
+    @{[vle32_v $V24, $INP]}
+    @{[vmv_v_v $V17, $V24]}
+    j 2f
+
+1:
+    @{[vle32_v $V24, $INP]}
+    @{[vmv_v_v $V17, $V24]}
+    addi $OUTP, $OUTP, 16
+
+2:
+    # AES body
+    @{[aes_256_decrypt]}
+
+    @{[vxor_vv $V24, $V24, $V16]}
+    @{[vse32_v $V24, $OUTP]}
+    @{[vmv_v_v $V16, $V17]}
+
+    addi $LEN, $LEN, -16
+    addi $INP, $INP, 16
+
+    bnez $LEN, 1b
+
+    @{[vse32_v $V16, $IVP]}
+
+    ret
+.size L_cbc_dec_256,.-L_cbc_dec_256
+___
+}
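
As a reference for the chaining above, CBC decryption computes
P_i = D(C_i) xor C_{i-1}, which is why the loop keeps a copy of each ciphertext
block (v17) to use as the next chaining value (v16). A scalar sketch under the
same interface follows; aes_dec_block() is a placeholder single-block
decryptor, not a function from this patch.

    #include <stddef.h>
    #include <string.h>

    /* Placeholder: one-block AES decryption with an expanded key. */
    void aes_dec_block(unsigned char out[16], const unsigned char in[16],
    		   const void *key);

    void cbc_decrypt_ref(const unsigned char *in, unsigned char *out,
    		     size_t len, const void *key, unsigned char ivec[16])
    {
    	unsigned char prev[16], saved[16];

    	memcpy(prev, ivec, 16);			/* chaining value, like v16  */
    	for (; len >= 16; len -= 16, in += 16, out += 16) {
    		memcpy(saved, in, 16);		/* keep ciphertext, like v17 */
    		aes_dec_block(out, in, key);
    		for (int i = 0; i < 16; i++)
    			out[i] ^= prev[i];	/* P_i = D(C_i) ^ C_{i-1}    */
    		memcpy(prev, saved, 16);
    	}
    	memcpy(ivec, prev, 16);			/* write back the next IV    */
    }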
+
+{
+###############################################################################
+# void rv64i_zvkned_ecb_encrypt(const unsigned char *in, unsigned char *out,
+#                               size_t length, const AES_KEY *key,
+#                               const int enc);
+my ($INP, $OUTP, $LEN, $KEYP, $ENC) = ("a0", "a1", "a2", "a3", "a4");
+my ($REMAIN_LEN) = ("a5");
+my ($VL) = ("a6");
+my ($T0, $T1, $ROUNDS) = ("t0", "t1", "t2");
+my ($LEN32) = ("t3");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_ecb_encrypt
+.type rv64i_zvkned_ecb_encrypt,\@function
+rv64i_zvkned_ecb_encrypt:
+    # Convert LEN from bytes into the number of e32 elements.
+    srli $LEN32, $LEN, 2
+
+    # Load number of rounds
+    lwu $ROUNDS, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T0, 10
+    beq $ROUNDS, $T0, L_ecb_enc_128
+
+    li $T0, 12
+    beq $ROUNDS, $T0, L_ecb_enc_192
+
+    li $T0, 14
+    beq $ROUNDS, $T0, L_ecb_enc_256
+
+    ret
+.size rv64i_zvkned_ecb_encrypt,.-rv64i_zvkned_ecb_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_128:
+    # Load all 11 round keys to v1-v11 registers.
+    @{[aes_128_load_key $KEYP]}
+
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    @{[vle32_v $V24, $INP]}
+
+    # AES body
+    @{[aes_128_encrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    add $INP, $INP, $T0
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN32, 1b
+
+    ret
+.size L_ecb_enc_128,.-L_ecb_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_192:
+    # Load all 13 round keys to v1-v13 registers.
+    @{[aes_192_load_key $KEYP]}
+
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    @{[vle32_v $V24, $INP]}
+
+    # AES body
+    @{[aes_192_encrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    add $INP, $INP, $T0
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN32, 1b
+
+    ret
+.size L_ecb_enc_192,.-L_ecb_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_256:
+    # Load all 15 round keys to v1-v15 registers.
+    @{[aes_256_load_key $KEYP]}
+
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    @{[vle32_v $V24, $INP]}
+
+    # AES body
+    @{[aes_256_encrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    add $INP, $INP, $T0
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN32, 1b
+
+    ret
+.size L_ecb_enc_256,.-L_ecb_enc_256
+___
+
+###############################################################################
+# void rv64i_zvkned_ecb_decrypt(const unsigned char *in, unsigned char *out,
+#                               size_t length, const AES_KEY *key,
+#                               const int enc);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_ecb_decrypt
+.type rv64i_zvkned_ecb_decrypt,\@function
+rv64i_zvkned_ecb_decrypt:
+    # Convert LEN from bytes into the number of e32 elements.
+    srli $LEN32, $LEN, 2
+
+    # Load number of rounds
+    lwu $ROUNDS, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T0, 10
+    beq $ROUNDS, $T0, L_ecb_dec_128
+
+    li $T0, 12
+    beq $ROUNDS, $T0, L_ecb_dec_192
+
+    li $T0, 14
+    beq $ROUNDS, $T0, L_ecb_dec_256
+
+    ret
+.size rv64i_zvkned_ecb_decrypt,.-rv64i_zvkned_ecb_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_128:
+    # Load all 11 round keys to v1-v11 registers.
+    @{[aes_128_load_key $KEYP]}
+
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    @{[vle32_v $V24, $INP]}
+
+    # AES body
+    @{[aes_128_decrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    add $INP, $INP, $T0
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN32, 1b
+
+    ret
+.size L_ecb_dec_128,.-L_ecb_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_192:
+    # Load all 13 round keys to v1-v13 registers.
+    @{[aes_192_load_key $KEYP]}
+
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    @{[vle32_v $V24, $INP]}
+
+    # AES body
+    @{[aes_192_decrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    add $INP, $INP, $T0
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN32, 1b
+
+    ret
+.size L_ecb_dec_192,.-L_ecb_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_256:
+    # Load all 15 round keys to v1-v15 registers.
+    @{[aes_256_load_key $KEYP]}
+
+1:
+    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+    slli $T0, $VL, 2
+    sub $LEN32, $LEN32, $VL
+
+    @{[vle32_v $V24, $INP]}
+
+    # AES body
+    @{[aes_256_decrypt]}
+
+    @{[vse32_v $V24, $OUTP]}
+
+    add $INP, $INP, $T0
+    add $OUTP, $OUTP, $T0
+
+    bnez $LEN32, 1b
+
+    ret
+.size L_ecb_dec_256,.-L_ecb_dec_256
+___
+}
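
The ECB loops above follow the usual vector strip-mining pattern: each vsetvli
grants a chunk of at most VLMAX e32 elements, the chunk is processed whole, and
the pointers advance by VL*4 bytes. A scalar sketch of that control flow;
vec_aes_blocks() and max_vl are placeholders standing in for the vector AES
body and for what vsetvli would return.

    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder for the vector AES body (handles n_e32/4 blocks at once). */
    void vec_aes_blocks(uint8_t *out, const uint8_t *in, size_t n_e32,
    		    const void *key);

    void ecb_strip_mine(const uint8_t *in, uint8_t *out, size_t len,
    		    const void *key, size_t max_vl)
    {
    	size_t len32 = len / 4;			/* e32 elements, like $LEN32 */

    	while (len32) {
    		size_t vl = len32 < max_vl ? len32 : max_vl;	/* vsetvli   */
    		size_t nbytes = vl * 4;				/* slli VL,2 */

    		vec_aes_blocks(out, in, vl, key);
    		in += nbytes;
    		out += nbytes;
    		len32 -= vl;
    	}
    }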
+
 {
 ################################################################################
 # void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
@@ -98,42 +845,42 @@ $code .= <<___;
 L_enc_128:
     @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
 
-    @{[vle32_v $V1, ($INP)]}
+    @{[vle32_v $V1, $INP]}
 
-    @{[vle32_v $V10, ($KEYP)]}
+    @{[vle32_v $V10, $KEYP]}
     @{[vaesz_vs $V1, $V10]}    # with round key w[ 0, 3]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V11, ($KEYP)]}
+    @{[vle32_v $V11, $KEYP]}
     @{[vaesem_vs $V1, $V11]}   # with round key w[ 4, 7]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V12, ($KEYP)]}
+    @{[vle32_v $V12, $KEYP]}
     @{[vaesem_vs $V1, $V12]}   # with round key w[ 8,11]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V13, ($KEYP)]}
+    @{[vle32_v $V13, $KEYP]}
     @{[vaesem_vs $V1, $V13]}   # with round key w[12,15]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V14, ($KEYP)]}
+    @{[vle32_v $V14, $KEYP]}
     @{[vaesem_vs $V1, $V14]}   # with round key w[16,19]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V15, ($KEYP)]}
+    @{[vle32_v $V15, $KEYP]}
     @{[vaesem_vs $V1, $V15]}   # with round key w[20,23]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V16, ($KEYP)]}
+    @{[vle32_v $V16, $KEYP]}
     @{[vaesem_vs $V1, $V16]}   # with round key w[24,27]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V17, ($KEYP)]}
+    @{[vle32_v $V17, $KEYP]}
     @{[vaesem_vs $V1, $V17]}   # with round key w[28,31]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V18, ($KEYP)]}
+    @{[vle32_v $V18, $KEYP]}
     @{[vaesem_vs $V1, $V18]}   # with round key w[32,35]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V19, ($KEYP)]}
+    @{[vle32_v $V19, $KEYP]}
     @{[vaesem_vs $V1, $V19]}   # with round key w[36,39]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V20, ($KEYP)]}
+    @{[vle32_v $V20, $KEYP]}
     @{[vaesef_vs $V1, $V20]}   # with round key w[40,43]
 
-    @{[vse32_v $V1, ($OUTP)]}
+    @{[vse32_v $V1, $OUTP]}
 
     ret
 .size L_enc_128,.-L_enc_128
@@ -144,48 +891,48 @@ $code .= <<___;
 L_enc_192:
     @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
 
-    @{[vle32_v $V1, ($INP)]}
+    @{[vle32_v $V1, $INP]}
 
-    @{[vle32_v $V10, ($KEYP)]}
+    @{[vle32_v $V10, $KEYP]}
     @{[vaesz_vs $V1, $V10]}     # with round key w[ 0, 3]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V11, ($KEYP)]}
+    @{[vle32_v $V11, $KEYP]}
     @{[vaesem_vs $V1, $V11]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V12, ($KEYP)]}
+    @{[vle32_v $V12, $KEYP]}
     @{[vaesem_vs $V1, $V12]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V13, ($KEYP)]}
+    @{[vle32_v $V13, $KEYP]}
     @{[vaesem_vs $V1, $V13]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V14, ($KEYP)]}
+    @{[vle32_v $V14, $KEYP]}
     @{[vaesem_vs $V1, $V14]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V15, ($KEYP)]}
+    @{[vle32_v $V15, $KEYP]}
     @{[vaesem_vs $V1, $V15]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V16, ($KEYP)]}
+    @{[vle32_v $V16, $KEYP]}
     @{[vaesem_vs $V1, $V16]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V17, ($KEYP)]}
+    @{[vle32_v $V17, $KEYP]}
     @{[vaesem_vs $V1, $V17]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V18, ($KEYP)]}
+    @{[vle32_v $V18, $KEYP]}
     @{[vaesem_vs $V1, $V18]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V19, ($KEYP)]}
+    @{[vle32_v $V19, $KEYP]}
     @{[vaesem_vs $V1, $V19]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V20, ($KEYP)]}
+    @{[vle32_v $V20, $KEYP]}
     @{[vaesem_vs $V1, $V20]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V21, ($KEYP)]}
+    @{[vle32_v $V21, $KEYP]}
     @{[vaesem_vs $V1, $V21]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V22, ($KEYP)]}
+    @{[vle32_v $V22, $KEYP]}
     @{[vaesef_vs $V1, $V22]}
 
-    @{[vse32_v $V1, ($OUTP)]}
+    @{[vse32_v $V1, $OUTP]}
     ret
 .size L_enc_192,.-L_enc_192
 ___
@@ -195,54 +942,54 @@ $code .= <<___;
 L_enc_256:
     @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
 
-    @{[vle32_v $V1, ($INP)]}
+    @{[vle32_v $V1, $INP]}
 
-    @{[vle32_v $V10, ($KEYP)]}
+    @{[vle32_v $V10, $KEYP]}
     @{[vaesz_vs $V1, $V10]}     # with round key w[ 0, 3]
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V11, ($KEYP)]}
+    @{[vle32_v $V11, $KEYP]}
     @{[vaesem_vs $V1, $V11]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V12, ($KEYP)]}
+    @{[vle32_v $V12, $KEYP]}
     @{[vaesem_vs $V1, $V12]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V13, ($KEYP)]}
+    @{[vle32_v $V13, $KEYP]}
     @{[vaesem_vs $V1, $V13]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V14, ($KEYP)]}
+    @{[vle32_v $V14, $KEYP]}
     @{[vaesem_vs $V1, $V14]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V15, ($KEYP)]}
+    @{[vle32_v $V15, $KEYP]}
     @{[vaesem_vs $V1, $V15]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V16, ($KEYP)]}
+    @{[vle32_v $V16, $KEYP]}
     @{[vaesem_vs $V1, $V16]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V17, ($KEYP)]}
+    @{[vle32_v $V17, $KEYP]}
     @{[vaesem_vs $V1, $V17]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V18, ($KEYP)]}
+    @{[vle32_v $V18, $KEYP]}
     @{[vaesem_vs $V1, $V18]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V19, ($KEYP)]}
+    @{[vle32_v $V19, $KEYP]}
     @{[vaesem_vs $V1, $V19]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V20, ($KEYP)]}
+    @{[vle32_v $V20, $KEYP]}
     @{[vaesem_vs $V1, $V20]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V21, ($KEYP)]}
+    @{[vle32_v $V21, $KEYP]}
     @{[vaesem_vs $V1, $V21]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V22, ($KEYP)]}
+    @{[vle32_v $V22, $KEYP]}
     @{[vaesem_vs $V1, $V22]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V23, ($KEYP)]}
+    @{[vle32_v $V23, $KEYP]}
     @{[vaesem_vs $V1, $V23]}
     addi $KEYP, $KEYP, 16
-    @{[vle32_v $V24, ($KEYP)]}
+    @{[vle32_v $V24, $KEYP]}
     @{[vaesef_vs $V1, $V24]}
 
-    @{[vse32_v $V1, ($OUTP)]}
+    @{[vse32_v $V1, $OUTP]}
     ret
 .size L_enc_256,.-L_enc_256
 ___
@@ -275,43 +1022,43 @@ $code .= <<___;
 L_dec_128:
     @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
 
-    @{[vle32_v $V1, ($INP)]}
+    @{[vle32_v $V1, $INP]}
 
     addi $KEYP, $KEYP, 160
-    @{[vle32_v $V20, ($KEYP)]}
+    @{[vle32_v $V20, $KEYP]}
     @{[vaesz_vs $V1, $V20]}    # with round key w[40,43]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V19, ($KEYP)]}
+    @{[vle32_v $V19, $KEYP]}
     @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V18, ($KEYP)]}
+    @{[vle32_v $V18, $KEYP]}
     @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V17, ($KEYP)]}
+    @{[vle32_v $V17, $KEYP]}
     @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V16, ($KEYP)]}
+    @{[vle32_v $V16, $KEYP]}
     @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V15, ($KEYP)]}
+    @{[vle32_v $V15, $KEYP]}
     @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V14, ($KEYP)]}
+    @{[vle32_v $V14, $KEYP]}
     @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V13, ($KEYP)]}
+    @{[vle32_v $V13, $KEYP]}
     @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V12, ($KEYP)]}
+    @{[vle32_v $V12, $KEYP]}
     @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V11, ($KEYP)]}
+    @{[vle32_v $V11, $KEYP]}
     @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V10, ($KEYP)]}
+    @{[vle32_v $V10, $KEYP]}
     @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
 
-    @{[vse32_v $V1, ($OUTP)]}
+    @{[vse32_v $V1, $OUTP]}
 
     ret
 .size L_dec_128,.-L_dec_128
@@ -322,49 +1069,49 @@ $code .= <<___;
 L_dec_192:
     @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
 
-    @{[vle32_v $V1, ($INP)]}
+    @{[vle32_v $V1, $INP]}
 
     addi $KEYP, $KEYP, 192
-    @{[vle32_v $V22, ($KEYP)]}
+    @{[vle32_v $V22, $KEYP]}
     @{[vaesz_vs $V1, $V22]}    # with round key w[48,51]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V21, ($KEYP)]}
+    @{[vle32_v $V21, $KEYP]}
     @{[vaesdm_vs $V1, $V21]}   # with round key w[44,47]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V20, ($KEYP)]}
+    @{[vle32_v $V20, $KEYP]}
     @{[vaesdm_vs $V1, $V20]}    # with round key w[40,43]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V19, ($KEYP)]}
+    @{[vle32_v $V19, $KEYP]}
     @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V18, ($KEYP)]}
+    @{[vle32_v $V18, $KEYP]}
     @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V17, ($KEYP)]}
+    @{[vle32_v $V17, $KEYP]}
     @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V16, ($KEYP)]}
+    @{[vle32_v $V16, $KEYP]}
     @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V15, ($KEYP)]}
+    @{[vle32_v $V15, $KEYP]}
     @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V14, ($KEYP)]}
+    @{[vle32_v $V14, $KEYP]}
     @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V13, ($KEYP)]}
+    @{[vle32_v $V13, $KEYP]}
     @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V12, ($KEYP)]}
+    @{[vle32_v $V12, $KEYP]}
     @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V11, ($KEYP)]}
+    @{[vle32_v $V11, $KEYP]}
     @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V10, ($KEYP)]}
+    @{[vle32_v $V10, $KEYP]}
     @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
 
-    @{[vse32_v $V1, ($OUTP)]}
+    @{[vse32_v $V1, $OUTP]}
 
     ret
 .size L_dec_192,.-L_dec_192
@@ -375,55 +1122,55 @@ $code .= <<___;
 L_dec_256:
     @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
 
-    @{[vle32_v $V1, ($INP)]}
+    @{[vle32_v $V1, $INP]}
 
     addi $KEYP, $KEYP, 224
-    @{[vle32_v $V24, ($KEYP)]}
+    @{[vle32_v $V24, $KEYP]}
     @{[vaesz_vs $V1, $V24]}    # with round key w[56,59]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V23, ($KEYP)]}
+    @{[vle32_v $V23, $KEYP]}
     @{[vaesdm_vs $V1, $V23]}   # with round key w[52,55]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V22, ($KEYP)]}
+    @{[vle32_v $V22, $KEYP]}
     @{[vaesdm_vs $V1, $V22]}    # with round key w[48,51]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V21, ($KEYP)]}
+    @{[vle32_v $V21, $KEYP]}
     @{[vaesdm_vs $V1, $V21]}   # with round key w[44,47]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V20, ($KEYP)]}
+    @{[vle32_v $V20, $KEYP]}
     @{[vaesdm_vs $V1, $V20]}    # with round key w[40,43]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V19, ($KEYP)]}
+    @{[vle32_v $V19, $KEYP]}
     @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V18, ($KEYP)]}
+    @{[vle32_v $V18, $KEYP]}
     @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V17, ($KEYP)]}
+    @{[vle32_v $V17, $KEYP]}
     @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V16, ($KEYP)]}
+    @{[vle32_v $V16, $KEYP]}
     @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V15, ($KEYP)]}
+    @{[vle32_v $V15, $KEYP]}
     @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V14, ($KEYP)]}
+    @{[vle32_v $V14, $KEYP]}
     @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V13, ($KEYP)]}
+    @{[vle32_v $V13, $KEYP]}
     @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V12, ($KEYP)]}
+    @{[vle32_v $V12, $KEYP]}
     @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V11, ($KEYP)]}
+    @{[vle32_v $V11, $KEYP]}
     @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
     addi $KEYP, $KEYP, -16
-    @{[vle32_v $V10, ($KEYP)]}
+    @{[vle32_v $V10, $KEYP]}
     @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
 
-    @{[vse32_v $V1, ($OUTP)]}
+    @{[vse32_v $V1, $OUTP]}
 
     ret
 .size L_dec_256,.-L_dec_256
-- 
2.28.0



* [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (5 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-11-22  1:42   ` Eric Biggers
  2023-10-25 18:36 ` [PATCH 08/12] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations Jerry Shih
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add a GCM GHASH implementation using the Zvkg extension, based on the code
from OpenSSL (openssl/openssl#21923).

The perlasm here differs from the original OpenSSL implementation. OpenSSL
assumes that H is stored in little-endian form, so it has to convert H to
big-endian for the Zvkg instructions. In the kernel, we already have H in
big-endian form, so no endian conversion is needed.
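
Since the kernel already holds H as a be128, both the Zvkg path and the generic
fallback can consume it directly. For reference, each GHASH block update
computes Xi = (Xi xor C_i) * H in GF(2^128); the minimal sketch below mirrors
the gf128mul_lle() fallback used in the glue code in this patch and is not
itself part of the patch.

    #include <crypto/algapi.h>
    #include <crypto/b128ops.h>
    #include <crypto/gf128mul.h>
    #include <linux/types.h>

    static void ghash_block_ref(be128 *Xi, const be128 *H, const u8 block[16])
    {
    	crypto_xor((u8 *)Xi, block, 16);	/* Xi ^= C_i                 */
    	gf128mul_lle(Xi, H);			/* Xi  = Xi * H in GF(2^128) */
    }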

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig               |  14 ++
 arch/riscv/crypto/Makefile              |   7 +
 arch/riscv/crypto/ghash-riscv64-glue.c  | 191 ++++++++++++++++++++++++
 arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 +++++++++++++
 4 files changed, 312 insertions(+)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index dfa9d0146d26..00be7177eb1e 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -35,4 +35,18 @@ config CRYPTO_AES_BLOCK_RISCV64
 	  - Zvkg vector crypto extension (XTS)
 	  - Zvkned vector crypto extension
 
+config CRYPTO_GHASH_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Hash functions: GHASH"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_GCM
+	select CRYPTO_GHASH
+	select CRYPTO_HASH
+	select CRYPTO_LIB_GF128MUL
+	help
+	  GCM GHASH function (NIST SP 800-38D)
+
+	  Architecture: riscv64 using:
+	  - Zvkg vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 42a4e8ec79cf..532316cc1758 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
 obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
 aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o
 
+obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
+ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -21,6 +24,10 @@ $(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl
 $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
 	$(call cmd,perlasm)
 
+$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
 clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
 clean-files += aes-riscv64-zvkb-zvkned.S
+clean-files += ghash-riscv64-zvkg.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
new file mode 100644
index 000000000000..d5b7f0e4f612
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * RISC-V optimized GHASH routines
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <jerry.shih@sifive.com>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/ghash.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* ghash using zvkg vector crypto extension */
+void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len);
+
+struct riscv64_ghash_context {
+	be128 key;
+};
+
+struct riscv64_ghash_desc_ctx {
+	be128 shash;
+	u8 buffer[GHASH_BLOCK_SIZE];
+	u32 bytes;
+};
+
+typedef void (*ghash_func)(be128 *Xi, const be128 *H, const u8 *inp,
+			   size_t len);
+
+static inline void ghash_blocks(const struct riscv64_ghash_context *ctx,
+				struct riscv64_ghash_desc_ctx *dctx,
+				const u8 *src, size_t srclen, ghash_func func)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		func(&dctx->shash, &ctx->key, src, srclen);
+		kernel_vector_end();
+	} else {
+		while (srclen >= GHASH_BLOCK_SIZE) {
+			crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE);
+			gf128mul_lle(&dctx->shash, &ctx->key);
+			srclen -= GHASH_BLOCK_SIZE;
+			src += GHASH_BLOCK_SIZE;
+		}
+	}
+}
+
+static int ghash_update(struct shash_desc *desc, const u8 *src, size_t srclen,
+			ghash_func func)
+{
+	size_t len;
+	const struct riscv64_ghash_context *ctx =
+		crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	if (dctx->bytes) {
+		if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) {
+			memcpy(dctx->buffer + dctx->bytes, src, srclen);
+			dctx->bytes += srclen;
+			return 0;
+		}
+		memcpy(dctx->buffer + dctx->bytes, src,
+		       GHASH_BLOCK_SIZE - dctx->bytes);
+
+		ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func);
+
+		src += GHASH_BLOCK_SIZE - dctx->bytes;
+		srclen -= GHASH_BLOCK_SIZE - dctx->bytes;
+		dctx->bytes = 0;
+	}
+	len = srclen & ~(GHASH_BLOCK_SIZE - 1);
+
+	if (len) {
+		ghash_blocks(ctx, dctx, src, len, func);
+		src += len;
+		srclen -= len;
+	}
+
+	if (srclen) {
+		memcpy(dctx->buffer, src, srclen);
+		dctx->bytes = srclen;
+	}
+
+	return 0;
+}
+
+static int ghash_final(struct shash_desc *desc, u8 *out, ghash_func func)
+{
+	const struct riscv64_ghash_context *ctx =
+		crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+	int i;
+
+	if (dctx->bytes) {
+		for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
+			dctx->buffer[i] = 0;
+
+		ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func);
+		dctx->bytes = 0;
+	}
+
+	memcpy(out, &dctx->shash, GHASH_DIGEST_SIZE);
+
+	return 0;
+}
+
+static int ghash_init(struct shash_desc *desc)
+{
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	*dctx = (struct riscv64_ghash_desc_ctx){};
+
+	return 0;
+}
+
+static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src,
+			     unsigned int srclen)
+{
+	return ghash_update(desc, src, srclen, gcm_ghash_rv64i_zvkg);
+}
+
+static int ghash_final_zvkg(struct shash_desc *desc, u8 *out)
+{
+	return ghash_final(desc, out, gcm_ghash_rv64i_zvkg);
+}
+
+static int ghash_setkey(struct crypto_shash *tfm, const u8 *key,
+			unsigned int keylen)
+{
+	struct riscv64_ghash_context *ctx =
+		crypto_tfm_ctx(crypto_shash_tfm(tfm));
+
+	if (keylen != GHASH_BLOCK_SIZE)
+		return -EINVAL;
+
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
+
+	return 0;
+}
+
+static struct shash_alg riscv64_ghash_alg_zvkg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = ghash_init,
+	.update = ghash_update_zvkg,
+	.final = ghash_final_zvkg,
+	.setkey = ghash_setkey,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx),
+	.base = {
+		.cra_name = "ghash",
+		.cra_driver_name = "ghash-riscv64-zvkg",
+		.cra_priority = 303,
+		.cra_blocksize = GHASH_BLOCK_SIZE,
+		.cra_ctxsize = sizeof(struct riscv64_ghash_context),
+		.cra_module = THIS_MODULE,
+	},
+};
+
+static inline bool check_ghash_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKG) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_ghash_mod_init(void)
+{
+	if (check_ghash_ext())
+		return crypto_register_shash(&riscv64_ghash_alg_zvkg);
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_ghash_mod_fini(void)
+{
+	if (check_ghash_ext())
+		crypto_unregister_shash(&riscv64_ghash_alg_zvkg);
+}
+
+module_init(riscv64_ghash_mod_init);
+module_exit(riscv64_ghash_mod_fini);
+
+MODULE_DESCRIPTION("GCM GHASH (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("ghash");
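
For context, a caller reaches this driver through the generic shash API. The
sketch below is a hypothetical user, not part of the patch: the API calls are
the standard kernel crypto shash interface, while the function name, key size
handling, and trimmed error handling are illustrative assumptions.

    #include <crypto/hash.h>
    #include <linux/err.h>
    #include <linux/types.h>

    static int ghash_demo(const u8 key[16], const u8 *data, unsigned int len,
    		      u8 digest[16])
    {
    	struct crypto_shash *tfm;
    	int ret;

    	tfm = crypto_alloc_shash("ghash", 0, 0);
    	if (IS_ERR(tfm))
    		return PTR_ERR(tfm);

    	ret = crypto_shash_setkey(tfm, key, 16);
    	if (!ret) {
    		SHASH_DESC_ON_STACK(desc, tfm);

    		desc->tfm = tfm;
    		ret = crypto_shash_digest(desc, data, len, digest);
    	}

    	crypto_free_shash(tfm);
    	return ret;
    }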
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
new file mode 100644
index 000000000000..4beea4ac9cbe
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
@@ -0,0 +1,100 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector GCM/GMAC extension ('Zvkg')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+###############################################################################
+# void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len)
+#
+# input: Xi: current hash value
+#        H: hash key
+#        inp: pointer to input data
+#        len: length of input data in bytes (multiple of block size)
+# output: Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$H,$inp,$len) = ("a0","a1","a2","a3");
+my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvkg
+.type gcm_ghash_rv64i_zvkg,\@function
+gcm_ghash_rv64i_zvkg:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vle32_v $vH, $H]}
+    @{[vle32_v $vXi, $Xi]}
+
+Lstep:
+    @{[vle32_v $vinp, $inp]}
+    add $inp, $inp, 16
+    add $len, $len, -16
+    @{[vghsh_vv $vXi, $vH, $vinp]}
+    bnez $len, Lstep
+
+    @{[vse32_v $vXi, $Xi]}
+    ret
+
+.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0



* [PATCH 08/12] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (6 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-10-25 18:36 ` [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations Jerry Shih
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add SHA-224 and SHA-256 implementations using the Zvknha or Zvknhb vector
crypto extensions, based on the code from OpenSSL (openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Co-developed-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig                     |  13 +
 arch/riscv/crypto/Makefile                    |   7 +
 arch/riscv/crypto/sha256-riscv64-glue.c       | 140 ++++++++
 .../sha256-riscv64-zvkb-zvknha_or_zvknhb.pl   | 318 ++++++++++++++++++
 4 files changed, 478 insertions(+)
 create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 00be7177eb1e..dbc86592063a 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -49,4 +49,17 @@ config CRYPTO_GHASH_RISCV64
 	  Architecture: riscv64 using:
 	  - Zvkg vector crypto extension
 
+config CRYPTO_SHA256_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Hash functions: SHA-224 and SHA-256"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SHA256
+	help
+	  SHA-256 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64 using:
+	  - Zvkb vector crypto extension
+	  - Zvknha or Zvknhb vector crypto extensions
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 532316cc1758..49577fb72cca 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -12,6 +12,9 @@ aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkne
 obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
 ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
 
+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -27,7 +30,11 @@ $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
 $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
 clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
 clean-files += aes-riscv64-zvkb-zvkned.S
 clean-files += ghash-riscv64-zvkg.S
+clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 000000000000..35acfb6e4b2d
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISC-V 64
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha256_base.h>
+
+/*
+ * sha256 using the zvkb and zvknha/b vector crypto extensions
+ *
+ * This asm function takes the first 256 bits at the `struct sha256_state`
+ * pointer as the sha256 internal state.
+ */
+void sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest,
+						   const u8 *data,
+						   int num_blks);
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+				 unsigned int len)
+{
+	int ret = 0;
+
+	/*
+	 * Make sure struct sha256_state begins directly with the SHA256
+	 * 256-bit internal state, as this is what the asm function expect.
+	 * 256-bit internal state, as this is what the asm function expects.
+	BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		ret = sha256_base_do_update(
+			desc, data, len,
+			sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+		kernel_vector_end();
+	} else {
+		ret = crypto_sha256_update(desc, data, len);
+	}
+
+	return ret;
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+				unsigned int len, u8 *out)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		if (len)
+			sha256_base_do_update(
+				desc, data, len,
+				sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+		sha256_base_do_finalize(
+			desc, sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+		kernel_vector_end();
+
+		return sha256_base_finish(desc, out);
+	}
+
+	return crypto_sha256_finup(desc, data, len, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha256_alg[] = {
+	{
+		.digestsize = SHA256_DIGEST_SIZE,
+		.init = sha256_base_init,
+		.update = riscv64_sha256_update,
+		.final = riscv64_sha256_final,
+		.finup = riscv64_sha256_finup,
+		.descsize = sizeof(struct sha256_state),
+		.base.cra_name = "sha256",
+		.base.cra_driver_name = "sha256-riscv64-zvkb-zvknha_or_zvknhb",
+		.base.cra_priority = 150,
+		.base.cra_blocksize = SHA256_BLOCK_SIZE,
+		.base.cra_module = THIS_MODULE,
+	},
+	{
+		.digestsize = SHA224_DIGEST_SIZE,
+		.init = sha224_base_init,
+		.update = riscv64_sha256_update,
+		.final = riscv64_sha256_final,
+		.finup = riscv64_sha256_finup,
+		.descsize = sizeof(struct sha256_state),
+		.base.cra_name = "sha224",
+		.base.cra_driver_name = "sha224-riscv64-zvkb-zvknha_or_zvknhb",
+		.base.cra_priority = 150,
+		.base.cra_blocksize = SHA224_BLOCK_SIZE,
+		.base.cra_module = THIS_MODULE,
+	}
+};
+
+static inline bool check_sha256_ext(void)
+{
+	/*
+	 * From the spec:
+	 * The Zvknhb ext supports both SHA-256 and SHA-512 and Zvknha only
+	 * supports SHA-256.
+	 */
+	return (riscv_isa_extension_available(NULL, ZVKNHA) ||
+		riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	       riscv_isa_extension_available(NULL, ZVKB) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sha256_mod_init(void)
+{
+	if (check_sha256_ext())
+		return crypto_register_shashes(sha256_alg,
+					       ARRAY_SIZE(sha256_alg));
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_sha256_mod_fini(void)
+{
+	if (check_sha256_ext())
+		crypto_unregister_shashes(sha256_alg, ARRAY_SIZE(sha256_alg));
+}
+
+module_init(riscv64_sha256_mod_init);
+module_exit(riscv64_sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha224");
+MODULE_ALIAS_CRYPTO("sha256");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl b/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
new file mode 100644
index 000000000000..51b2d9d4f8f1
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
@@ -0,0 +1,318 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+my $K256 = "K256";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4");
+
+sub sha_256_load_constant {
+    my $code=<<___;
+    la $KT, $K256 # Load round constants K256
+    @{[vle32_v $V10, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V11, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V12, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V13, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V14, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V16, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V17, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V18, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V19, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V20, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V21, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V22, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V23, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V24, $KT]}
+    addi $KT, $KT, 16
+    @{[vle32_v $V25, $KT]}
+___
+
+    return $code;
+}
+
+################################################################################
+# void sha256_block_data_order_zvkb_zvknha_or_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha256_block_data_order_zvkb_zvknha_or_zvknhb
+.type   sha256_block_data_order_zvkb_zvknha_or_zvknhb,\@function
+sha256_block_data_order_zvkb_zvknha_or_zvknhb:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[sha_256_load_constant]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # The dst vtype is e32m1 and the index vtype is e8mf4.
+    # We use index-load with the following index pattern at v26.
+    #   i8 index:
+    #     20, 16, 4, 0
+    # Instead of setting the i8 index, we could use a single 32bit
+    # little-endian value to cover the 4xi8 index.
+    #   i32 value:
+    #     0x 00 04 10 14
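+    # Worked example: the 32-bit state words a/b/e/f live at byte offsets
+    # 0/4/16/20 of the state, so gathering with indices {20, 16, 4, 0} from H
+    # yields {f,e,b,a}, and the same indices applied at H+8 yield {h,g,d,c}.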
+    li $INDEX_PATTERN, 0x00041014
+    @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_x $V26, $INDEX_PATTERN]}
+
+    addi $H2, $H, 8
+
+    # Use index-load to get {f,e,b,a},{h,g,d,c}
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+    @{[vluxei8_v $V6, $H, $V26]}
+    @{[vluxei8_v $V7, $H2, $V26]}
+
+    # Set up the v0 mask for the vmerge that replaces the first word (idx==0)
+    # in key-scheduling.
+    # The AVL is 4 in SHA, so a single e8 element (an 8-bit mask) is enough.
+    @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V0, 0x01]}
+
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+L_round_loop:
+    # Decrement length by 1
+    add $LEN, $LEN, -1
+
+    # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V30, $V6]}
+    @{[vmv_v_v $V31, $V7]}
+
+    # Load the 512-bits of the message block in v1-v4 and perform
+    # an endian swap on each 4 bytes element.
+    @{[vle32_v $V1, $INP]}
+    @{[vrev8_v $V1, $V1]}
+    add $INP, $INP, 16
+    @{[vle32_v $V2, $INP]}
+    @{[vrev8_v $V2, $V2]}
+    add $INP, $INP, 16
+    @{[vle32_v $V3, $INP]}
+    @{[vrev8_v $V3, $V3]}
+    add $INP, $INP, 16
+    @{[vle32_v $V4, $INP]}
+    @{[vrev8_v $V4, $V4]}
+    add $INP, $INP, 16
+
+    # Quad-round 0 (+0, Wt from oldest to newest in v1->v2->v3->v4)
+    @{[vadd_vv $V5, $V10, $V1]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+    @{[vsha2ms_vv $V1, $V5, $V4]}  # Generate W[19:16]
+
+    # Quad-round 1 (+1, v2->v3->v4->v1)
+    @{[vadd_vv $V5, $V11, $V2]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+    @{[vsha2ms_vv $V2, $V5, $V1]}  # Generate W[23:20]
+
+    # Quad-round 2 (+2, v3->v4->v1->v2)
+    @{[vadd_vv $V5, $V12, $V3]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+    @{[vsha2ms_vv $V3, $V5, $V2]}  # Generate W[27:24]
+
+    # Quad-round 3 (+3, v4->v1->v2->v3)
+    @{[vadd_vv $V5, $V13, $V4]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+    @{[vsha2ms_vv $V4, $V5, $V3]}  # Generate W[31:28]
+
+    # Quad-round 4 (+0, v1->v2->v3->v4)
+    @{[vadd_vv $V5, $V14, $V1]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+    @{[vsha2ms_vv $V1, $V5, $V4]}  # Generate W[35:32]
+
+    # Quad-round 5 (+1, v2->v3->v4->v1)
+    @{[vadd_vv $V5, $V15, $V2]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+    @{[vsha2ms_vv $V2, $V5, $V1]}  # Generate W[39:36]
+
+    # Quad-round 6 (+2, v3->v4->v1->v2)
+    @{[vadd_vv $V5, $V16, $V3]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+    @{[vsha2ms_vv $V3, $V5, $V2]}  # Generate W[43:40]
+
+    # Quad-round 7 (+3, v4->v1->v2->v3)
+    @{[vadd_vv $V5, $V17, $V4]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+    @{[vsha2ms_vv $V4, $V5, $V3]}  # Generate W[47:44]
+
+    # Quad-round 8 (+0, v1->v2->v3->v4)
+    @{[vadd_vv $V5, $V18, $V1]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+    @{[vsha2ms_vv $V1, $V5, $V4]}  # Generate W[51:48]
+
+    # Quad-round 9 (+1, v2->v3->v4->v1)
+    @{[vadd_vv $V5, $V19, $V2]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+    @{[vsha2ms_vv $V2, $V5, $V1]}  # Generate W[55:52]
+
+    # Quad-round 10 (+2, v3->v4->v1->v2)
+    @{[vadd_vv $V5, $V20, $V3]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+    @{[vsha2ms_vv $V3, $V5, $V2]}  # Generate W[59:56]
+
+    # Quad-round 11 (+3, v4->v1->v2->v3)
+    @{[vadd_vv $V5, $V21, $V4]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+    @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+    @{[vsha2ms_vv $V4, $V5, $V3]}  # Generate W[63:60]
+
+    # Quad-round 12 (+0, v1->v2->v3->v4)
+    # Note that we stop generating new message schedule words (Wt, v1-v4)
+    # as we already generated all the words we end up consuming (i.e., W[63:60]).
+    @{[vadd_vv $V5, $V22, $V1]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+
+    # Quad-round 13 (+1, v2->v3->v4->v1)
+    @{[vadd_vv $V5, $V23, $V2]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+
+    # Quad-round 14 (+2, v3->v4->v1->v2)
+    @{[vadd_vv $V5, $V24, $V3]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+
+    # Quad-round 15 (+3, v4->v1->v2->v3)
+    @{[vadd_vv $V5, $V25, $V4]}
+    @{[vsha2cl_vv $V7, $V6, $V5]}
+    @{[vsha2ch_vv $V6, $V7, $V5]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V6, $V30, $V6]}
+    @{[vadd_vv $V7, $V31, $V7]}
+    bnez $LEN, L_round_loop
+
+    # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}.
+    @{[vsuxei8_v $V6, $H, $V26]}
+    @{[vsuxei8_v $V7, $H2, $V26]}
+
+    ret
+.size sha256_block_data_order_zvkb_zvknha_or_zvknhb,.-sha256_block_data_order_zvkb_zvknha_or_zvknhb
+
+.p2align 2
+.type $K256,\@object
+$K256:
+    .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+    .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+    .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+    .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+    .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+    .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+    .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+    .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+    .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+    .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+    .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+    .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+    .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+    .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+    .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+    .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size $K256,.-$K256
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (7 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 08/12] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-11-22  1:32   ` Eric Biggers
  2023-10-25 18:36 ` [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation Jerry Shih
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add SHA384 and SHA512 implementations using the Zvknhb vector crypto
extension from OpenSSL (openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Co-developed-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig                     |  13 +
 arch/riscv/crypto/Makefile                    |   7 +
 arch/riscv/crypto/sha512-riscv64-glue.c       | 133 +++++++++
 .../crypto/sha512-riscv64-zvkb-zvknhb.pl      | 266 ++++++++++++++++++
 4 files changed, 419 insertions(+)
 create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index dbc86592063a..a5a34777d9c3 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -62,4 +62,17 @@ config CRYPTO_SHA256_RISCV64
 	  - Zvkb vector crypto extension
 	  - Zvknha or Zvknhb vector crypto extensions
 
+config CRYPTO_SHA512_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Hash functions: SHA-384 and SHA-512"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SHA512
+	help
+	  SHA-512 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64 using:
+	  - Zvkb vector crypto extension
+	  - Zvknhb vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 49577fb72cca..ebb4e861ebdc 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -15,6 +15,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
 obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
 sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -33,8 +36,12 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
 clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
 clean-files += aes-riscv64-zvkb-zvkned.S
 clean-files += ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
+clean-files += sha512-riscv64-zvkb-zvknhb.S
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 000000000000..31202b7b8c9b
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha512_base.h>
+
+/*
+ * sha512 using zvkb and zvknhb vector crypto extension
+ *
+ * This asm function just takes the first 512 bits as the SHA-512 state from
+ * the pointer to `struct sha512_state`.
+ */
+void sha512_block_data_order_zvkb_zvknhb(struct sha512_state *digest,
+					 const u8 *data, int num_blks);
+
+static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
+				 unsigned int len)
+{
+	int ret = 0;
+
+	/*
+	 * Make sure struct sha512_state begins directly with the SHA512
+	 * 512-bit internal state, as this is what the asm function expects.
+	 */
+	BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		ret = sha512_base_do_update(
+			desc, data, len, sha512_block_data_order_zvkb_zvknhb);
+		kernel_vector_end();
+	} else {
+		ret = crypto_sha512_update(desc, data, len);
+	}
+
+	return ret;
+}
+
+static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data,
+				unsigned int len, u8 *out)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		if (len)
+			sha512_base_do_update(
+				desc, data, len,
+				sha512_block_data_order_zvkb_zvknhb);
+		sha512_base_do_finalize(desc,
+					sha512_block_data_order_zvkb_zvknhb);
+		kernel_vector_end();
+
+		return sha512_base_finish(desc, out);
+	}
+
+	return crypto_sha512_finup(desc, data, len, out);
+}
+
+static int riscv64_sha512_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha512_alg[] = {
+	{
+		.digestsize = SHA512_DIGEST_SIZE,
+		.init = sha512_base_init,
+		.update = riscv64_sha512_update,
+		.final = riscv64_sha512_final,
+		.finup = riscv64_sha512_finup,
+		.descsize = sizeof(struct sha512_state),
+		.base.cra_name = "sha512",
+		.base.cra_driver_name = "sha512-riscv64-zvkb-zvknhb",
+		.base.cra_priority = 150,
+		.base.cra_blocksize = SHA512_BLOCK_SIZE,
+		.base.cra_module = THIS_MODULE,
+	},
+	{
+		.digestsize = SHA384_DIGEST_SIZE,
+		.init = sha384_base_init,
+		.update = riscv64_sha512_update,
+		.final = riscv64_sha512_final,
+		.finup = riscv64_sha512_finup,
+		.descsize = sizeof(struct sha512_state),
+		.base.cra_name = "sha384",
+		.base.cra_driver_name = "sha384-riscv64-zvkb-zvknhb",
+		.base.cra_priority = 150,
+		.base.cra_blocksize = SHA384_BLOCK_SIZE,
+		.base.cra_module = THIS_MODULE,
+	}
+};
+
+static inline bool check_sha512_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKNHB) &&
+	       riscv_isa_extension_available(NULL, ZVKB) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sha512_mod_init(void)
+{
+	if (check_sha512_ext())
+		return crypto_register_shashes(sha512_alg,
+					       ARRAY_SIZE(sha512_alg));
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_sha512_mod_fini(void)
+{
+	if (check_sha512_ext())
+		crypto_unregister_shashes(sha512_alg, ARRAY_SIZE(sha512_alg));
+}
+
+module_init(riscv64_sha512_mod_init);
+module_exit(riscv64_sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha384");
+MODULE_ALIAS_CRYPTO("sha512");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl
new file mode 100644
index 000000000000..4be448266a59
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl
@@ -0,0 +1,266 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+my $K512 = "K512";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4");
+
+################################################################################
+# void sha512_block_data_order_zvkb_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha512_block_data_order_zvkb_zvknhb
+.type sha512_block_data_order_zvkb_zvknhb,\@function
+sha512_block_data_order_zvkb_zvknhb:
+    @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # The dst vtype is e64m2 and the index vtype is e8mf4.
+    # We use index-load with the following index pattern at v1.
+    #   i8 index:
+    #     40, 32, 8, 0
+    # Instead of setting the i8 index, we could use a single 32bit
+    # little-endian value to cover the 4xi8 index.
+    #   i32 value:
+    #     0x 00 08 20 28
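+    # Worked example: the 64-bit state words a/b/e/f live at byte offsets
+    # 0/8/32/40 of the state, so gathering with indices {40, 32, 8, 0} from H
+    # yields {f,e,b,a}, and the same indices applied at H+16 yield {h,g,d,c}.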
+    li $INDEX_PATTERN, 0x00082028
+    @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]}
+    @{[vmv_v_x $V1, $INDEX_PATTERN]}
+
+    addi $H2, $H, 16
+
+    # Use index-load to get {f,e,b,a},{h,g,d,c}
+    @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+    @{[vluxei8_v $V22, $H, $V1]}
+    @{[vluxei8_v $V24, $H2, $V1]}
+
+    # Set up the v0 mask for the vmerge that replaces the first word (idx==0)
+    # in key-scheduling.
+    # The AVL is 4 in SHA, so a single e8 element (an 8-bit mask) is enough.
+    @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]}
+    @{[vmv_v_i $V0, 0x01]}
+
+    @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+
+L_round_loop:
+    # Load round constants K512
+    la $KT, $K512
+
+    # Decrement length by 1
+    addi $LEN, $LEN, -1
+
+    # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V26, $V22]}
+    @{[vmv_v_v $V28, $V24]}
+
+    # Load the 1024-bits of the message block in v10-v16 and perform the endian
+    # swap.
+    @{[vle64_v $V10, $INP]}
+    @{[vrev8_v $V10, $V10]}
+    addi $INP, $INP, 32
+    @{[vle64_v $V12, $INP]}
+    @{[vrev8_v $V12, $V12]}
+    addi $INP, $INP, 32
+    @{[vle64_v $V14, $INP]}
+    @{[vrev8_v $V14, $V14]}
+    addi $INP, $INP, 32
+    @{[vle64_v $V16, $INP]}
+    @{[vrev8_v $V16, $V16]}
+    addi $INP, $INP, 32
+
+    .rept 4
+    # Quad-round 0 (+0, v10->v12->v14->v16)
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V10]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+    @{[vmerge_vvm $V18, $V14, $V12, $V0]}
+    @{[vsha2ms_vv $V10, $V18, $V16]}
+
+    # Quad-round 1 (+1, v12->v14->v16->v10)
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V12]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+    @{[vmerge_vvm $V18, $V16, $V14, $V0]}
+    @{[vsha2ms_vv $V12, $V18, $V10]}
+
+    # Quad-round 2 (+2, v14->v16->v10->v12)
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V14]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+    @{[vmerge_vvm $V18, $V10, $V16, $V0]}
+    @{[vsha2ms_vv $V14, $V18, $V12]}
+
+    # Quad-round 3 (+3, v16->v10->v12->v14)
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V16]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+    @{[vmerge_vvm $V18, $V12, $V10, $V0]}
+    @{[vsha2ms_vv $V16, $V18, $V14]}
+    .endr
+
+    # Quad-round 16 (+0, v10->v12->v14->v16)
+    # Note that we stop generating new message schedule words (Wt, v10-16)
+    # as we already generated all the words we end up consuming (i.e., W[79:76]).
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V10]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+
+    # Quad-round 17 (+1, v12->v14->v16->v10)
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V12]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+
+    # Quad-round 18 (+2, v14->v16->v10->v12)
+    @{[vle64_v $V20, $KT]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V18, $V20, $V14]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+
+    # Quad-round 19 (+3, v16->v10->v12->v14)
+    @{[vle64_v $V20, $KT]}
+    # No KT increment needed (last constant load).
+    @{[vadd_vv $V18, $V20, $V16]}
+    @{[vsha2cl_vv $V24, $V22, $V18]}
+    @{[vsha2ch_vv $V22, $V24, $V18]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V22, $V26, $V22]}
+    @{[vadd_vv $V24, $V28, $V24]}
+    bnez $LEN, L_round_loop
+
+    # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}.
+    @{[vsuxei8_v $V22, $H, $V1]}
+    @{[vsuxei8_v $V24, $H2, $V1]}
+
+    ret
+.size sha512_block_data_order_zvkb_zvknhb,.-sha512_block_data_order_zvkb_zvknhb
+
+.p2align 3
+.type $K512,\@object
+$K512:
+    .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+    .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+    .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+    .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+    .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+    .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+    .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+    .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+    .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+    .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+    .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+    .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+    .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+    .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+    .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+    .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+    .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+    .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+    .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+    .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+    .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+    .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+    .dword 0xd192e819d6ef5218, 0xd69906245565a910
+    .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+    .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+    .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+    .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+    .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+    .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+    .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+    .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+    .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+    .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+    .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+    .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+    .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+    .dword 0x28db77f523047d84, 0x32caab7b40c72493
+    .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+    .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+    .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size $K512,.-$K512
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (8 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-11-02  5:58   ` Eric Biggers
  2023-10-25 18:36 ` [PATCH 11/12] RISC-V: crypto: add Zvksh accelerated SM3 implementation Jerry Shih
  2023-10-25 18:36 ` [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Jerry Shih
  11 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add SM4 implementation using Zvksed vector crypto extension from OpenSSL
(openssl/openssl#21923).

The perlasm here differs from the original OpenSSL implementation. OpenSSL
provides separate set_encrypt_key and set_decrypt_key functions for SM4.
In the kernel, these are merged into a single set_key function so that the
key expansion instructions run only once, as sketched below.
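
As a rough illustration (a plain C sketch using a hypothetical
sm4_expand_round_keys() helper, not part of this patch), the merged flow
amounts to:

  static int sm4_set_both_keys(const u8 *user_key, unsigned int key_len,
                               u32 *enc_key, u32 *dec_key)
  {
          u32 rk[32];
          int i;

          if (key_len != SM4_KEY_SIZE)
                  return -EINVAL;

          /* Expand the 32 round keys once. */
          sm4_expand_round_keys(user_key, rk);

          for (i = 0; i < 32; i++) {
                  enc_key[i] = rk[i];       /* natural order for encryption */
                  dec_key[i] = rk[31 - i];  /* reversed order for decryption */
          }

          return 0;
  }

The Zvksed code below does the same in one pass: vsm4k_vi expands the round
key groups, which are stored forward for enc_key and with negative-stride
stores for the reversed dec_key layout.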

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig               |  19 ++
 arch/riscv/crypto/Makefile              |   7 +
 arch/riscv/crypto/sm4-riscv64-glue.c    | 120 +++++++++++
 arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++++++++++++++++++++++
 4 files changed, 414 insertions(+)
 create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index a5a34777d9c3..ff004afa2874 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -75,4 +75,23 @@ config CRYPTO_SHA512_RISCV64
 	  - Zvkb vector crypto extension
 	  - Zvknhb vector crypto extension
 
+config CRYPTO_SM4_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Ciphers: SM4 (ShangMi 4)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_ALGAPI
+	select CRYPTO_SM4
+	select CRYPTO_SM4_GENERIC
+	help
+	  SM4 cipher algorithms (OSCCA GB/T 32907-2016,
+	  ISO/IEC 18033-3:2010/Amd 1:2021)
+
+	  SM4 (GBT.32907-2016) is a cryptographic standard issued by the
+	  Organization of State Commercial Administration of China (OSCCA)
+	  as an authorized cryptographic algorithm for use within China.
+
+	  Architecture: riscv64 using:
+	  - Zvkb vector crypto extension
+	  - Zvksed vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index ebb4e861ebdc..ffad095e531f 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
 obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
 sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
+sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -39,9 +42,13 @@ $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha
 $(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
 clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
 clean-files += aes-riscv64-zvkb-zvkned.S
 clean-files += ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
 clean-files += sha512-riscv64-zvkb-zvknhb.S
+clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c
new file mode 100644
index 000000000000..2d9983a4b1ee
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-glue.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv64 port of the OpenSSL SM4 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <jerry.shih@sifive.com>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/sm4.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* sm4 using zvksed vector crypto extension */
+void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const u32 *key);
+void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const u32 *key);
+int rv64i_zvksed_sm4_set_key(const u8 *userKey, unsigned int key_len,
+			     u32 *enc_key, u32 *dec_key);
+
+static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key,
+				     unsigned int key_len)
+{
+	struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+	int ret = 0;
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		if (rv64i_zvksed_sm4_set_key(key, key_len, ctx->rkey_enc,
+					     ctx->rkey_dec))
+			ret = -EINVAL;
+		kernel_vector_end();
+	} else {
+		ret = sm4_expandkey(ctx, key, key_len);
+	}
+
+	return ret;
+}
+
+static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst,
+				       const u8 *src)
+{
+	const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		rv64i_zvksed_sm4_encrypt(src, dst, ctx->rkey_enc);
+		kernel_vector_end();
+	} else {
+		sm4_crypt_block(ctx->rkey_enc, dst, src);
+	}
+}
+
+static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst,
+				       const u8 *src)
+{
+	const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		rv64i_zvksed_sm4_decrypt(src, dst, ctx->rkey_dec);
+		kernel_vector_end();
+	} else {
+		sm4_crypt_block(ctx->rkey_dec, dst, src);
+	}
+}
+
+struct crypto_alg riscv64_sm4_zvksed_alg = {
+	.cra_name = "sm4",
+	.cra_driver_name = "sm4-riscv64-zvkb-zvksed",
+	.cra_module = THIS_MODULE,
+	.cra_priority = 300,
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize = SM4_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct sm4_ctx),
+	.cra_cipher = {
+		.cia_min_keysize = SM4_KEY_SIZE,
+		.cia_max_keysize = SM4_KEY_SIZE,
+		.cia_setkey = riscv64_sm4_setkey_zvksed,
+		.cia_encrypt = riscv64_sm4_encrypt_zvksed,
+		.cia_decrypt = riscv64_sm4_decrypt_zvksed,
+	},
+};
+
+static inline bool check_sm4_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKSED) &&
+	       riscv_isa_extension_available(NULL, ZVKB) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sm4_mod_init(void)
+{
+	if (check_sm4_ext())
+		return crypto_register_alg(&riscv64_sm4_zvksed_alg);
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_sm4_mod_fini(void)
+{
+	if (check_sm4_ext())
+		crypto_unregister_alg(&riscv64_sm4_zvksed_alg);
+}
+
+module_init(riscv64_sm4_mod_init);
+module_exit(riscv64_sm4_mod_fini);
+
+MODULE_DESCRIPTION("SM4 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm4");
diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
new file mode 100644
index 000000000000..a764668c8398
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
@@ -0,0 +1,268 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SM4 Block Cipher extension ('Zvksed')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+####
+# int rv64i_zvksed_sm4_set_key(const u8 *userKey, unsigned int key_len,
+#			                         u32 *enc_key, u32 *dec_key);
+#
+{
+my ($ukey,$key_len,$enc_key,$dec_key)=("a0","a1","a2","a3");
+my ($fk,$stride)=("a4","a5");
+my ($t0,$t1)=("t0","t1");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_key
+.type rv64i_zvksed_sm4_set_key,\@function
+rv64i_zvksed_sm4_set_key:
+    li $t0, 16
+    beq $t0, $key_len, 1f
+    li a0, 1
+    ret
+1:
+
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    # Load the user key
+    @{[vle32_v $vukey, $ukey]}
+    @{[vrev8_v $vukey, $vukey]}
+
+    # Load the FK.
+    la $fk, FK
+    @{[vle32_v $vfk, $fk]}
+
+    # Generate round keys.
+    @{[vxor_vv $vukey, $vukey, $vfk]}
+    @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+    @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+    @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+    @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+    @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+    @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+    @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+    @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+    # Store enc round keys
+    @{[vse32_v $vk0, $enc_key]} # rk[0:3]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk1, $enc_key]} # rk[4:7]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk2, $enc_key]} # rk[8:11]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk3, $enc_key]} # rk[12:15]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk4, $enc_key]} # rk[16:19]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk5, $enc_key]} # rk[20:23]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk6, $enc_key]} # rk[24:27]
+    addi $enc_key, $enc_key, 16
+    @{[vse32_v $vk7, $enc_key]} # rk[28:31]
+
+    # Store dec round keys in reverse order
+    addi $dec_key, $dec_key, 12
+    li $stride, -4
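+    # The -4 byte stride combined with the starting offset of 12 stores each
+    # 4-word group with its words reversed, so dec_key ends up holding
+    # rk[31], rk[30], ..., rk[0].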
+    @{[vsse32_v $vk7, $dec_key, $stride]} # rk[31:28]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk6, $dec_key, $stride]} # rk[27:24]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk5, $dec_key, $stride]} # rk[23:20]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk4, $dec_key, $stride]} # rk[19:16]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk3, $dec_key, $stride]} # rk[15:12]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk2, $dec_key, $stride]} # rk[11:8]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk1, $dec_key, $stride]} # rk[7:4]
+    addi $dec_key, $dec_key, 16
+    @{[vsse32_v $vk0, $dec_key, $stride]} # rk[3:0]
+
+    li a0, 0
+    ret
+.size rv64i_zvksed_sm4_set_key,.-rv64i_zvksed_sm4_set_key
+___
+}
+
+####
+# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out,
+#                               const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_encrypt
+.type rv64i_zvksed_sm4_encrypt,\@function
+rv64i_zvksed_sm4_encrypt:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    # Load input data
+    @{[vle32_v $vdata, $in]}
+    @{[vrev8_v $vdata, $vdata]}
+
+    # Order of elements was adjusted in sm4_set_key()
+    # Encrypt with all keys
+    @{[vle32_v $vk0, $keys]} # rk[0:3]
+    @{[vsm4r_vs $vdata, $vk0]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk1, $keys]} # rk[4:7]
+    @{[vsm4r_vs $vdata, $vk1]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk2, $keys]} # rk[8:11]
+    @{[vsm4r_vs $vdata, $vk2]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[12:15]
+    @{[vsm4r_vs $vdata, $vk3]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk4, $keys]} # rk[16:19]
+    @{[vsm4r_vs $vdata, $vk4]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk5, $keys]} # rk[20:23]
+    @{[vsm4r_vs $vdata, $vk5]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk6, $keys]} # rk[24:27]
+    @{[vsm4r_vs $vdata, $vk6]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk7, $keys]} # rk[28:31]
+    @{[vsm4r_vs $vdata, $vk7]}
+
+    # Save the ciphertext (in reverse element order)
+    @{[vrev8_v $vdata, $vdata]}
+    li $stride, -4
+    addi $out, $out, 12
+    @{[vsse32_v $vdata, $out, $stride]}
+
+    ret
+.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt
+___
+}
+
+####
+# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out,
+#                               const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_decrypt
+.type rv64i_zvksed_sm4_decrypt,\@function
+rv64i_zvksed_sm4_decrypt:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    # Load input data
+    @{[vle32_v $vdata, $in]}
+    @{[vrev8_v $vdata, $vdata]}
+
+    # Order of key elements was adjusted in sm4_set_key()
+    # Decrypt with all keys
+    @{[vle32_v $vk7, $keys]} # rk[31:28]
+    @{[vsm4r_vs $vdata, $vk7]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk6, $keys]} # rk[27:24]
+    @{[vsm4r_vs $vdata, $vk6]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk5, $keys]} # rk[23:20]
+    @{[vsm4r_vs $vdata, $vk5]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk4, $keys]} # rk[19:16]
+    @{[vsm4r_vs $vdata, $vk4]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[15:12]
+    @{[vsm4r_vs $vdata, $vk3]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk2, $keys]} # rk[11:8]
+    @{[vsm4r_vs $vdata, $vk2]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk1, $keys]} # rk[7:4]
+    @{[vsm4r_vs $vdata, $vk1]}
+    addi $keys, $keys, 16
+    @{[vle32_v $vk0, $keys]} # rk[3:0]
+    @{[vsm4r_vs $vdata, $vk0]}
+
+    # Save the plaintext (in reverse element order)
+    @{[vrev8_v $vdata, $vdata]}
+    li $stride, -4
+    addi $out, $out, 12
+    @{[vsse32_v $vdata, $out, $stride]}
+
+    ret
+.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt
+___
+}
+
+$code .= <<___;
+# Family Key (little-endian 32-bit chunks)
+.p2align 3
+FK:
+    .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
+.size FK,.-FK
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/12] RISC-V: crypto: add Zvksh accelerated SM3 implementation
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (9 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-10-25 18:36 ` [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Jerry Shih
  11 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add SM3 implementation using Zvksh vector crypto extension from OpenSSL
(openssl/openssl#21923).

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig              |  13 ++
 arch/riscv/crypto/Makefile             |   7 +
 arch/riscv/crypto/sm3-riscv64-glue.c   | 121 +++++++++++++
 arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 +++++++++++++++++++++++++
 4 files changed, 371 insertions(+)
 create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index ff004afa2874..2797b37394bb 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -75,6 +75,19 @@ config CRYPTO_SHA512_RISCV64
 	  - Zvkb vector crypto extension
 	  - Zvknhb vector crypto extension
 
+config CRYPTO_SM3_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Hash functions: SM3 (ShangMi 3)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SM3
+	help
+	  SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
+
+	  Architecture: riscv64 using:
+	  - Zvkb vector crypto extension
+	  - Zvksh vector crypto extension
+
 config CRYPTO_SM4_RISCV64
 	default y if RISCV_ISA_V
 	tristate "Ciphers: SM4 (ShangMi 4)"
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index ffad095e531f..b772417703fd 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
 obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
 sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
+sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o
+
 obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
 sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
 
@@ -42,6 +45,9 @@ $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha
 $(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl
+	$(call cmd,perlasm)
+
 $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
 	$(call cmd,perlasm)
 
@@ -51,4 +57,5 @@ clean-files += aes-riscv64-zvkb-zvkned.S
 clean-files += ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
 clean-files += sha512-riscv64-zvkb-zvknhb.S
+clean-files += sm3-riscv64-zvksh.S
 clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c
new file mode 100644
index 000000000000..32d37b602e54
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-glue.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SM3 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sm3_base.h>
+
+/*
+ * sm3 using zvksh vector crypto extension
+ *
+ * This asm function just takes the first 256 bits as the SM3 state from
+ * the pointer to `struct sm3_state`.
+ */
+void ossl_hwsm3_block_data_order_zvksh(struct sm3_state *digest, u8 const *o,
+				       int num);
+
+static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int len)
+{
+	int ret = 0;
+
+	/*
+	 * Make sure struct sm3_state begins directly with the SM3 256-bit
+	 * internal state, as this is what the asm function expects.
+	 */
+	BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		ret = sm3_base_do_update(desc, data, len,
+					 ossl_hwsm3_block_data_order_zvksh);
+		kernel_vector_end();
+	} else {
+		sm3_update(shash_desc_ctx(desc), data, len);
+	}
+
+	return ret;
+}
+
+static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
+			     unsigned int len, u8 *out)
+{
+	struct sm3_state *ctx;
+
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		if (len)
+			sm3_base_do_update(desc, data, len,
+					   ossl_hwsm3_block_data_order_zvksh);
+		sm3_base_do_finalize(desc, ossl_hwsm3_block_data_order_zvksh);
+		kernel_vector_end();
+
+		return sm3_base_finish(desc, out);
+	}
+
+	ctx = shash_desc_ctx(desc);
+	if (len)
+		sm3_update(ctx, data, len);
+	sm3_final(ctx, out);
+
+	return 0;
+}
+
+static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sm3_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sm3_alg = {
+	.digestsize = SM3_DIGEST_SIZE,
+	.init = sm3_base_init,
+	.update = riscv64_sm3_update,
+	.final = riscv64_sm3_final,
+	.finup = riscv64_sm3_finup,
+	.descsize = sizeof(struct sm3_state),
+	.base.cra_name = "sm3",
+	.base.cra_driver_name = "sm3-riscv64-zvkb-zvksh",
+	.base.cra_priority = 150,
+	.base.cra_blocksize = SM3_BLOCK_SIZE,
+	.base.cra_module = THIS_MODULE,
+};
+
+static inline bool check_sm3_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKSH) &&
+	       riscv_isa_extension_available(NULL, ZVKB) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sm3_mod_init(void)
+{
+	if (check_sm3_ext())
+		return crypto_register_shash(&sm3_alg);
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_sm3_mod_fini(void)
+{
+	if (check_sm3_ext())
+		crypto_unregister_shash(&sm3_alg);
+}
+
+module_init(riscv64_sm3_mod_init);
+module_exit(riscv64_sm3_mod_fini);
+
+MODULE_DESCRIPTION("SM3 (RISC-V accelerated)");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm3");
diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
new file mode 100644
index 000000000000..942d78d982e9
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
@@ -0,0 +1,230 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SM3 Secure Hash extension ('Zvksh')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num);
+{
+my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+$code .= <<___;
+.text
+.p2align 3
+.globl ossl_hwsm3_block_data_order_zvksh
+.type ossl_hwsm3_block_data_order_zvksh,\@function
+ossl_hwsm3_block_data_order_zvksh:
+    @{[vsetivli "zero", 8, "e32", "m2", "ta", "ma"]}
+
+    # Load initial state of hash context (c->A-H).
+    @{[vle32_v $V0, $CTX]}
+    @{[vrev8_v $V0, $V0]}
+
+L_sm3_loop:
+    # Copy the previous state to v2.
+    # It will be XOR'ed with the current state at the end of the round.
+    @{[vmv_v_v $V2, $V0]}
+
+    # Load the 64B block in 2x32B chunks.
+    @{[vle32_v $V6, $INPUT]} # v6 := {w7, ..., w0}
+    addi $INPUT, $INPUT, 32
+
+    @{[vle32_v $V8, $INPUT]} # v8 := {w15, ..., w8}
+    addi $INPUT, $INPUT, 32
+
+    addi $NUM, $NUM, -1
+
+    # As vsm3c consumes only w0, w1, w4, w5, we need to slide the input
+    # 2 elements down so that we process elements w2, w3, w6, w7.
+    # This will be repeated for each odd round.
+    @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w7, ..., w2}
+
+    @{[vsm3c_vi $V0, $V6, 0]}
+    @{[vsm3c_vi $V0, $V4, 1]}
+
+    # Prepare a vector with {w11, ..., w4}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w7, ..., w4}
+    @{[vslideup_vi $V4, $V8, 4]}   # v4 := {w11, w10, w9, w8, w7, w6, w5, w4}
+
+    @{[vsm3c_vi $V0, $V4, 2]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w11, w10, w9, w8, w7, w6}
+    @{[vsm3c_vi $V0, $V4, 3]}
+
+    @{[vsm3c_vi $V0, $V8, 4]}
+    @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w15, w14, w13, w12, w11, w10}
+    @{[vsm3c_vi $V0, $V4, 5]}
+
+    @{[vsm3me_vv $V6, $V8, $V6]}   # v6 := {w23, w22, w21, w20, w19, w18, w17, w16}
+
+    # Prepare a register with {w19, w18, w17, w16, w15, w14, w13, w12}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w15, w14, w13, w12}
+    @{[vslideup_vi $V4, $V6, 4]}   # v4 := {w19, w18, w17, w16, w15, w14, w13, w12}
+
+    @{[vsm3c_vi $V0, $V4, 6]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w19, w18, w17, w16, w15, w14}
+    @{[vsm3c_vi $V0, $V4, 7]}
+
+    @{[vsm3c_vi $V0, $V6, 8]}
+    @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w23, w22, w21, w20, w19, w18}
+    @{[vsm3c_vi $V0, $V4, 9]}
+
+    @{[vsm3me_vv $V8, $V6, $V8]}   # v8 := {w31, w30, w29, w28, w27, w26, w25, w24}
+
+    # Prepare a register with {w27, w26, w25, w24, w23, w22, w21, w20}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w23, w22, w21, w20}
+    @{[vslideup_vi $V4, $V8, 4]}   # v4 := {w27, w26, w25, w24, w23, w22, w21, w20}
+
+    @{[vsm3c_vi $V0, $V4, 10]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w27, w26, w25, w24, w23, w22}
+    @{[vsm3c_vi $V0, $V4, 11]}
+
+    @{[vsm3c_vi $V0, $V8, 12]}
+    @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w31, w30, w29, w28, w27, w26}
+    @{[vsm3c_vi $V0, $V4, 13]}
+
+    @{[vsm3me_vv $V6, $V8, $V6]}   # v6 := {w39, w38, w37, w36, w35, w34, w33, w32}
+
+    # Prepare a register with {w35, w34, w33, w32, w31, w30, w29, w28}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w31, w30, w29, w28}
+    @{[vslideup_vi $V4, $V6, 4]}   # v4 := {w35, w34, w33, w32, w31, w30, w29, w28}
+
+    @{[vsm3c_vi $V0, $V4, 14]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w35, w34, w33, w32, w31, w30}
+    @{[vsm3c_vi $V0, $V4, 15]}
+
+    @{[vsm3c_vi $V0, $V6, 16]}
+    @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w39, w38, w37, w36, w35, w34}
+    @{[vsm3c_vi $V0, $V4, 17]}
+
+    @{[vsm3me_vv $V8, $V6, $V8]}   # v8 := {w47, w46, w45, w44, w43, w42, w41, w40}
+
+    # Prepare a register with {w43, w42, w41, w40, w39, w38, w37, w36}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w39, w38, w37, w36}
+    @{[vslideup_vi $V4, $V8, 4]}   # v4 := {w43, w42, w41, w40, w39, w38, w37, w36}
+
+    @{[vsm3c_vi $V0, $V4, 18]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w43, w42, w41, w40, w39, w38}
+    @{[vsm3c_vi $V0, $V4, 19]}
+
+    @{[vsm3c_vi $V0, $V8, 20]}
+    @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w47, w46, w45, w44, w43, w42}
+    @{[vsm3c_vi $V0, $V4, 21]}
+
+    @{[vsm3me_vv $V6, $V8, $V6]}   # v6 := {w55, w54, w53, w52, w51, w50, w49, w48}
+
+    # Prepare a register with {w51, w50, w49, w48, w47, w46, w45, w44}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w47, w46, w45, w44}
+    @{[vslideup_vi $V4, $V6, 4]}   # v4 := {w51, w50, w49, w48, w47, w46, w45, w44}
+
+    @{[vsm3c_vi $V0, $V4, 22]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w51, w50, w49, w48, w47, w46}
+    @{[vsm3c_vi $V0, $V4, 23]}
+
+    @{[vsm3c_vi $V0, $V6, 24]}
+    @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w55, w54, w53, w52, w51, w50}
+    @{[vsm3c_vi $V0, $V4, 25]}
+
+    @{[vsm3me_vv $V8, $V6, $V8]}   # v8 := {w63, w62, w61, w60, w59, w58, w57, w56}
+
+    # Prepare a register with {w59, w58, w57, w56, w55, w54, w53, w52}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w55, w54, w53, w52}
+    @{[vslideup_vi $V4, $V8, 4]}   # v4 := {w59, w58, w57, w56, w55, w54, w53, w52}
+
+    @{[vsm3c_vi $V0, $V4, 26]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w59, w58, w57, w56, w55, w54}
+    @{[vsm3c_vi $V0, $V4, 27]}
+
+    @{[vsm3c_vi $V0, $V8, 28]}
+    @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w63, w62, w61, w60, w59, w58}
+    @{[vsm3c_vi $V0, $V4, 29]}
+
+    @{[vsm3me_vv $V6, $V8, $V6]}   # v6 := {w71, w70, w69, w68, w67, w66, w65, w64}
+
+    # Prepare a register with {w67, w66, w65, w64, w63, w62, w61, w60}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w63, w62, w61, w60}
+    @{[vslideup_vi $V4, $V6, 4]}   # v4 := {w67, w66, w65, w64, w63, w62, w61, w60}
+
+    @{[vsm3c_vi $V0, $V4, 30]}
+    @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w67, w66, w65, w64, w63, w62}
+    @{[vsm3c_vi $V0, $V4, 31]}
+
+    # XOR in the previous state.
+    @{[vxor_vv $V0, $V0, $V2]}
+
+    bnez $NUM, L_sm3_loop     # Check if there are any more blocks to process
+L_sm3_end:
+    @{[vrev8_v $V0, $V0]}
+    @{[vse32_v $V0, $CTX]}
+    ret
+
+.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
                   ` (10 preceding siblings ...)
  2023-10-25 18:36 ` [PATCH 11/12] RISC-V: crypto: add Zvksh accelerated SM3 implementation Jerry Shih
@ 2023-10-25 18:36 ` Jerry Shih
  2023-11-02  5:43   ` Eric Biggers
  2023-11-22  1:29   ` Eric Biggers
  11 siblings, 2 replies; 50+ messages in thread
From: Jerry Shih @ 2023-10-25 18:36 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, herbert, davem
  Cc: andy.chiu, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	ebiggers, ardb, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

Add a ChaCha20 vector implementation from OpenSSL (openssl/openssl#21923).

Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
 arch/riscv/crypto/Kconfig                |  12 +
 arch/riscv/crypto/Makefile               |   7 +
 arch/riscv/crypto/chacha-riscv64-glue.c  | 120 +++++++++
 arch/riscv/crypto/chacha-riscv64-zvkb.pl | 322 +++++++++++++++++++++++
 4 files changed, 461 insertions(+)
 create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 2797b37394bb..41ce453afafa 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -35,6 +35,18 @@ config CRYPTO_AES_BLOCK_RISCV64
 	  - Zvkg vector crypto extension (XTS)
 	  - Zvkned vector crypto extension
 
+config CRYPTO_CHACHA20_RISCV64
+	default y if RISCV_ISA_V
+	tristate "Ciphers: ChaCha20"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_SKCIPHER
+	select CRYPTO_LIB_CHACHA_GENERIC
+	help
+	  Length-preserving ciphers: ChaCha20 stream cipher algorithm
+
+	  Architecture: riscv64 using:
+	  - Zvkb vector crypto extension
+
 config CRYPTO_GHASH_RISCV64
 	default y if RISCV_ISA_V
 	tristate "Hash functions: GHASH"
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b772417703fd..80b0ebc956a3 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
 obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
 aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o
 
+obj-$(CONFIG_CRYPTO_CHACHA20_RISCV64) += chacha-riscv64.o
+chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o
+
 obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
 ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
 
@@ -36,6 +39,9 @@ $(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl
 $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
 	$(call cmd,perlasm)
 
+$(obj)/chacha-riscv64-zvkb.S: $(src)/chacha-riscv64-zvkb.pl
+	$(call cmd,perlasm)
+
 $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 	$(call cmd,perlasm)
 
@@ -54,6 +60,7 @@ $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
 clean-files += aes-riscv64-zvkned.S
 clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
 clean-files += aes-riscv64-zvkb-zvkned.S
+clean-files += chacha-riscv64-zvkb.S
 clean-files += ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
 clean-files += sha512-riscv64-zvkb-zvknhb.S
diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
new file mode 100644
index 000000000000..72011949f705
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-glue.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL ChaCha20 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <jerry.shih@sifive.com>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/chacha.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#define CHACHA_BLOCK_VALID_SIZE_MASK (~(CHACHA_BLOCK_SIZE - 1))
+#define CHACHA_BLOCK_REMAINING_SIZE_MASK (CHACHA_BLOCK_SIZE - 1)
+#define CHACHA_KEY_OFFSET 4
+#define CHACHA_IV_OFFSET 12
+
+/* chacha20 using zvkb vector crypto extension */
+void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len, const u32 *key,
+			 const u32 *counter);
+
+static int chacha20_encrypt(struct skcipher_request *req)
+{
+	u32 state[CHACHA_STATE_WORDS];
+	u8 block_buffer[CHACHA_BLOCK_SIZE];
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	unsigned int nbytes;
+	unsigned int tail_bytes;
+	int err;
+
+	chacha_init_generic(state, ctx->key, req->iv);
+
+	err = skcipher_walk_virt(&walk, req, false);
+	while (walk.nbytes) {
+		nbytes = walk.nbytes & CHACHA_BLOCK_VALID_SIZE_MASK;
+		tail_bytes = walk.nbytes & CHACHA_BLOCK_REMAINING_SIZE_MASK;
+		kernel_vector_begin();
+		if (nbytes) {
+			ChaCha20_ctr32_zvkb(walk.dst.virt.addr,
+					    walk.src.virt.addr, nbytes,
+					    state + CHACHA_KEY_OFFSET,
+					    state + CHACHA_IV_OFFSET);
+			state[CHACHA_IV_OFFSET] += nbytes / CHACHA_BLOCK_SIZE;
+		}
+		if (walk.nbytes == walk.total && tail_bytes > 0) {
+			memcpy(block_buffer, walk.src.virt.addr + nbytes,
+			       tail_bytes);
+			ChaCha20_ctr32_zvkb(block_buffer, block_buffer,
+					    CHACHA_BLOCK_SIZE,
+					    state + CHACHA_KEY_OFFSET,
+					    state + CHACHA_IV_OFFSET);
+			memcpy(walk.dst.virt.addr + nbytes, block_buffer,
+			       tail_bytes);
+			tail_bytes = 0;
+		}
+		kernel_vector_end();
+
+		err = skcipher_walk_done(&walk, tail_bytes);
+	}
+
+	return err;
+}
+
+static struct skcipher_alg riscv64_chacha_alg_zvkb[] = { {
+	.base = {
+		.cra_name = "chacha20",
+		.cra_driver_name = "chacha20-riscv64-zvkb",
+		.cra_priority = 300,
+		.cra_blocksize = 1,
+		.cra_ctxsize = sizeof(struct chacha_ctx),
+		.cra_module = THIS_MODULE,
+	},
+	.min_keysize = CHACHA_KEY_SIZE,
+	.max_keysize = CHACHA_KEY_SIZE,
+	.ivsize = CHACHA_IV_SIZE,
+	.chunksize = CHACHA_BLOCK_SIZE,
+	.walksize = CHACHA_BLOCK_SIZE * 4,
+	.setkey = chacha20_setkey,
+	.encrypt = chacha20_encrypt,
+	.decrypt = chacha20_encrypt,
+} };
+
+static inline bool check_chacha20_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKB) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_chacha_mod_init(void)
+{
+	if (check_chacha20_ext())
+		return crypto_register_skciphers(
+			riscv64_chacha_alg_zvkb,
+			ARRAY_SIZE(riscv64_chacha_alg_zvkb));
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_chacha_mod_fini(void)
+{
+	if (check_chacha20_ext())
+		crypto_unregister_skciphers(
+			riscv64_chacha_alg_zvkb,
+			ARRAY_SIZE(riscv64_chacha_alg_zvkb));
+}
+
+module_init(riscv64_chacha_mod_init);
+module_exit(riscv64_chacha_mod_fini);
+
+MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <jerry.shih@sifive.com>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("chacha20");
diff --git a/arch/riscv/crypto/chacha-riscv64-zvkb.pl b/arch/riscv/crypto/chacha-riscv64-zvkb.pl
new file mode 100644
index 000000000000..9caf7b247804
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-zvkb.pl
@@ -0,0 +1,322 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023-2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Zicclsm (main memory supports misaligned loads/stores)
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output  = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop   : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.|          ? shift : undef;
+
+$output and open STDOUT, ">$output";
+
+my $code = <<___;
+.text
+___
+
+# void ChaCha20_ctr32_zvkb(unsigned char *out, const unsigned char *inp,
+#                          size_t len, const unsigned int key[8],
+#                          const unsigned int counter[4]);
+################################################################################
+my ( $OUTPUT, $INPUT, $LEN, $KEY, $COUNTER ) = ( "a0", "a1", "a2", "a3", "a4" );
+my ( $T0 ) = ( "t0" );
+my ( $CONST_DATA0, $CONST_DATA1, $CONST_DATA2, $CONST_DATA3 ) =
+  ( "a5", "a6", "a7", "t1" );
+my ( $KEY0, $KEY1, $KEY2,$KEY3, $KEY4, $KEY5, $KEY6, $KEY7,
+     $COUNTER0, $COUNTER1, $NONCE0, $NONCE1
+) = ( "s0", "s1", "s2", "s3", "s4", "s5", "s6",
+    "s7", "s8", "s9", "s10", "s11" );
+my ( $VL, $STRIDE, $CHACHA_LOOP_COUNT ) = ( "t2", "t3", "t4" );
+my (
+    $V0,  $V1,  $V2,  $V3,  $V4,  $V5,  $V6,  $V7,  $V8,  $V9,  $V10,
+    $V11, $V12, $V13, $V14, $V15, $V16, $V17, $V18, $V19, $V20, $V21,
+    $V22, $V23, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map( "v$_", ( 0 .. 31 ) );
+
+sub chacha_quad_round_group {
+    my (
+        $A0, $B0, $C0, $D0, $A1, $B1, $C1, $D1,
+        $A2, $B2, $C2, $D2, $A3, $B3, $C3, $D3
+    ) = @_;
+
+    my $code = <<___;
+    # a += b; d ^= a; d <<<= 16;
+    @{[vadd_vv $A0, $A0, $B0]}
+    @{[vadd_vv $A1, $A1, $B1]}
+    @{[vadd_vv $A2, $A2, $B2]}
+    @{[vadd_vv $A3, $A3, $B3]}
+    @{[vxor_vv $D0, $D0, $A0]}
+    @{[vxor_vv $D1, $D1, $A1]}
+    @{[vxor_vv $D2, $D2, $A2]}
+    @{[vxor_vv $D3, $D3, $A3]}
+    @{[vror_vi $D0, $D0, 32 - 16]}
+    @{[vror_vi $D1, $D1, 32 - 16]}
+    @{[vror_vi $D2, $D2, 32 - 16]}
+    @{[vror_vi $D3, $D3, 32 - 16]}
+    # c += d; b ^= c; b <<<= 12;
+    @{[vadd_vv $C0, $C0, $D0]}
+    @{[vadd_vv $C1, $C1, $D1]}
+    @{[vadd_vv $C2, $C2, $D2]}
+    @{[vadd_vv $C3, $C3, $D3]}
+    @{[vxor_vv $B0, $B0, $C0]}
+    @{[vxor_vv $B1, $B1, $C1]}
+    @{[vxor_vv $B2, $B2, $C2]}
+    @{[vxor_vv $B3, $B3, $C3]}
+    @{[vror_vi $B0, $B0, 32 - 12]}
+    @{[vror_vi $B1, $B1, 32 - 12]}
+    @{[vror_vi $B2, $B2, 32 - 12]}
+    @{[vror_vi $B3, $B3, 32 - 12]}
+    # a += b; d ^= a; d <<<= 8;
+    @{[vadd_vv $A0, $A0, $B0]}
+    @{[vadd_vv $A1, $A1, $B1]}
+    @{[vadd_vv $A2, $A2, $B2]}
+    @{[vadd_vv $A3, $A3, $B3]}
+    @{[vxor_vv $D0, $D0, $A0]}
+    @{[vxor_vv $D1, $D1, $A1]}
+    @{[vxor_vv $D2, $D2, $A2]}
+    @{[vxor_vv $D3, $D3, $A3]}
+    @{[vror_vi $D0, $D0, 32 - 8]}
+    @{[vror_vi $D1, $D1, 32 - 8]}
+    @{[vror_vi $D2, $D2, 32 - 8]}
+    @{[vror_vi $D3, $D3, 32 - 8]}
+    # c += d; b ^= c; b <<<= 7;
+    @{[vadd_vv $C0, $C0, $D0]}
+    @{[vadd_vv $C1, $C1, $D1]}
+    @{[vadd_vv $C2, $C2, $D2]}
+    @{[vadd_vv $C3, $C3, $D3]}
+    @{[vxor_vv $B0, $B0, $C0]}
+    @{[vxor_vv $B1, $B1, $C1]}
+    @{[vxor_vv $B2, $B2, $C2]}
+    @{[vxor_vv $B3, $B3, $C3]}
+    @{[vror_vi $B0, $B0, 32 - 7]}
+    @{[vror_vi $B1, $B1, 32 - 7]}
+    @{[vror_vi $B2, $B2, 32 - 7]}
+    @{[vror_vi $B3, $B3, 32 - 7]}
+___
+
+    return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl ChaCha20_ctr32_zvkb
+.type ChaCha20_ctr32_zvkb,\@function
+ChaCha20_ctr32_zvkb:
+    srli $LEN, $LEN, 6
+    beqz $LEN, .Lend
+
+    addi sp, sp, -96
+    sd s0, 0(sp)
+    sd s1, 8(sp)
+    sd s2, 16(sp)
+    sd s3, 24(sp)
+    sd s4, 32(sp)
+    sd s5, 40(sp)
+    sd s6, 48(sp)
+    sd s7, 56(sp)
+    sd s8, 64(sp)
+    sd s9, 72(sp)
+    sd s10, 80(sp)
+    sd s11, 88(sp)
+
+    li $STRIDE, 64
+
+    #### chacha block data
+    # "expa" little endian
+    li $CONST_DATA0, 0x61707865
+    # "nd 3" little endian
+    li $CONST_DATA1, 0x3320646e
+    # "2-by" little endian
+    li $CONST_DATA2, 0x79622d32
+    # "te k" little endian
+    li $CONST_DATA3, 0x6b206574
+
+    lw $KEY0, 0($KEY)
+    lw $KEY1, 4($KEY)
+    lw $KEY2, 8($KEY)
+    lw $KEY3, 12($KEY)
+    lw $KEY4, 16($KEY)
+    lw $KEY5, 20($KEY)
+    lw $KEY6, 24($KEY)
+    lw $KEY7, 28($KEY)
+
+    lw $COUNTER0, 0($COUNTER)
+    lw $COUNTER1, 4($COUNTER)
+    lw $NONCE0, 8($COUNTER)
+    lw $NONCE1, 12($COUNTER)
+
+.Lblock_loop:
+    @{[vsetvli $VL, $LEN, "e32", "m1", "ta", "ma"]}
+
+    # init chacha const states
+    @{[vmv_v_x $V0, $CONST_DATA0]}
+    @{[vmv_v_x $V1, $CONST_DATA1]}
+    @{[vmv_v_x $V2, $CONST_DATA2]}
+    @{[vmv_v_x $V3, $CONST_DATA3]}
+
+    # init chacha key states
+    @{[vmv_v_x $V4, $KEY0]}
+    @{[vmv_v_x $V5, $KEY1]}
+    @{[vmv_v_x $V6, $KEY2]}
+    @{[vmv_v_x $V7, $KEY3]}
+    @{[vmv_v_x $V8, $KEY4]}
+    @{[vmv_v_x $V9, $KEY5]}
+    @{[vmv_v_x $V10, $KEY6]}
+    @{[vmv_v_x $V11, $KEY7]}
+
+    # init chacha counter states
+    @{[vid_v $V12]}
+    @{[vadd_vx $V12, $V12, $COUNTER0]}
+    @{[vmv_v_x $V13, $COUNTER1]}
+
+    # init chacha nonce states
+    @{[vmv_v_x $V14, $NONCE0]}
+    @{[vmv_v_x $V15, $NONCE1]}
+
+    # load the top-half of input data
+    @{[vlsseg_nf_e32_v 8, $V16, $INPUT, $STRIDE]}
+
+    li $CHACHA_LOOP_COUNT, 10
+.Lround_loop:
+    addi $CHACHA_LOOP_COUNT, $CHACHA_LOOP_COUNT, -1
+    @{[chacha_quad_round_group
+      $V0, $V4, $V8, $V12,
+      $V1, $V5, $V9, $V13,
+      $V2, $V6, $V10, $V14,
+      $V3, $V7, $V11, $V15]}
+    @{[chacha_quad_round_group
+      $V0, $V5, $V10, $V15,
+      $V1, $V6, $V11, $V12,
+      $V2, $V7, $V8, $V13,
+      $V3, $V4, $V9, $V14]}
+    bnez $CHACHA_LOOP_COUNT, .Lround_loop
+
+    # load the bottom-half of input data
+    addi $T0, $INPUT, 32
+    @{[vlsseg_nf_e32_v 8, $V24, $T0, $STRIDE]}
+
+    # add chacha top-half initial block states
+    @{[vadd_vx $V0, $V0, $CONST_DATA0]}
+    @{[vadd_vx $V1, $V1, $CONST_DATA1]}
+    @{[vadd_vx $V2, $V2, $CONST_DATA2]}
+    @{[vadd_vx $V3, $V3, $CONST_DATA3]}
+    @{[vadd_vx $V4, $V4, $KEY0]}
+    @{[vadd_vx $V5, $V5, $KEY1]}
+    @{[vadd_vx $V6, $V6, $KEY2]}
+    @{[vadd_vx $V7, $V7, $KEY3]}
+    # xor with the top-half input
+    @{[vxor_vv $V16, $V16, $V0]}
+    @{[vxor_vv $V17, $V17, $V1]}
+    @{[vxor_vv $V18, $V18, $V2]}
+    @{[vxor_vv $V19, $V19, $V3]}
+    @{[vxor_vv $V20, $V20, $V4]}
+    @{[vxor_vv $V21, $V21, $V5]}
+    @{[vxor_vv $V22, $V22, $V6]}
+    @{[vxor_vv $V23, $V23, $V7]}
+
+    # save the top-half of output
+    @{[vssseg_nf_e32_v 8, $V16, $OUTPUT, $STRIDE]}
+
+    # add chacha bottom-half initial block states
+    @{[vadd_vx $V8, $V8, $KEY4]}
+    @{[vadd_vx $V9, $V9, $KEY5]}
+    @{[vadd_vx $V10, $V10, $KEY6]}
+    @{[vadd_vx $V11, $V11, $KEY7]}
+    @{[vid_v $V0]}
+    @{[vadd_vx $V12, $V12, $COUNTER0]}
+    @{[vadd_vx $V13, $V13, $COUNTER1]}
+    @{[vadd_vx $V14, $V14, $NONCE0]}
+    @{[vadd_vx $V15, $V15, $NONCE1]}
+    @{[vadd_vv $V12, $V12, $V0]}
+    # xor with the bottom-half input
+    @{[vxor_vv $V24, $V24, $V8]}
+    @{[vxor_vv $V25, $V25, $V9]}
+    @{[vxor_vv $V26, $V26, $V10]}
+    @{[vxor_vv $V27, $V27, $V11]}
+    @{[vxor_vv $V29, $V29, $V13]}
+    @{[vxor_vv $V28, $V28, $V12]}
+    @{[vxor_vv $V30, $V30, $V14]}
+    @{[vxor_vv $V31, $V31, $V15]}
+
+    # save the bottom-half of output
+    addi $T0, $OUTPUT, 32
+    @{[vssseg_nf_e32_v 8, $V24, $T0, $STRIDE]}
+
+    # update counter
+    add $COUNTER0, $COUNTER0, $VL
+    sub $LEN, $LEN, $VL
+    # advance the input/output offset by `4 * 16 * VL = 64 * VL` bytes
+    slli $T0, $VL, 6
+    add $INPUT, $INPUT, $T0
+    add $OUTPUT, $OUTPUT, $T0
+    bnez $LEN, .Lblock_loop
+
+    ld s0, 0(sp)
+    ld s1, 8(sp)
+    ld s2, 16(sp)
+    ld s3, 24(sp)
+    ld s4, 32(sp)
+    ld s5, 40(sp)
+    ld s6, 48(sp)
+    ld s7, 56(sp)
+    ld s8, 64(sp)
+    ld s9, 72(sp)
+    ld s10, 80(sp)
+    ld s11, 88(sp)
+    addi sp, sp, 96
+
+.Lend:
+    ret
+.size ChaCha20_ctr32_zvkb,.-ChaCha20_ctr32_zvkb
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation
  2023-10-25 18:36 ` [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation Jerry Shih
@ 2023-11-02  4:51   ` Eric Biggers
  2023-11-20  2:53     ` Jerry Shih
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-02  4:51 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:36AM +0800, Jerry Shih wrote:
> diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
> index 10d60edc0110..500938317e71 100644
> --- a/arch/riscv/crypto/Kconfig
> +++ b/arch/riscv/crypto/Kconfig
> @@ -2,4 +2,16 @@
>  
>  menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
>  
> +config CRYPTO_AES_RISCV64
> +	default y if RISCV_ISA_V
> +	tristate "Ciphers: AES"
> +	depends on 64BIT && RISCV_ISA_V
> +	select CRYPTO_AES
> +	select CRYPTO_ALGAPI
> +	help
> +	  Block ciphers: AES cipher algorithms (FIPS-197)
> +
> +	  Architecture: riscv64 using:
> +	  - Zvkned vector crypto extension

kconfig options should default to off.

I.e., remove the line "default y if RISCV_ISA_V"

> + *
> + * All zvkned-based functions use encryption expending keys for both encryption
> + * and decryption.
> + */

The above comment is a bit confusing.  It's describing the 'key' field of struct
aes_key; maybe there should be a comment there instead:

    struct aes_key {
            u32 key[AES_MAX_KEYLENGTH_U32]; /* round keys in encryption order */
            u32 rounds;
    };

> +int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
> +		       unsigned int keylen)
> +{
> +	/*
> +	 * The RISC-V AES vector crypto key expending doesn't support AES-192.
> +	 * We just use the generic software key expending here to simplify the key
> +	 * expending flow.
> +	 */

expending => expanding

> +	u32 aes_rounds;
> +	u32 key_length;
> +	int ret;
> +
> +	ret = aes_expandkey(&ctx->fallback_ctx, key, keylen);
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	/*
> +	 * Copy the key from `crypto_aes_ctx` to `aes_key` for zvkned-based AES
> +	 * implementations.
> +	 */
> +	aes_rounds = aes_round_num(keylen);
> +	ctx->key.rounds = aes_rounds;
> +	key_length = AES_BLOCK_SIZE * (aes_rounds + 1);
> +	memcpy(ctx->key.key, ctx->fallback_ctx.key_enc, key_length);
> +
> +	return 0;
> +}

Ideally this would use the same crypto_aes_ctx for both the fallback and the
assembly code.  I suppose we don't want to diverge from the OpenSSL code (unless
it gets rewritten), though.  So I guess this is fine for now.

> void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
>                                const u8 *src)

These functions can be called from a different module (aes-block-riscv64), so
they need EXPORT_SYMBOL_GPL.
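
For example (just a sketch of the mechanics, reusing the prototype quoted
above; the function body itself stays as in the patch):

    void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
                                    const u8 *src)
    {
            /* existing zvkned encrypt body, unchanged */
    }
    EXPORT_SYMBOL_GPL(riscv64_aes_encrypt_zvkned);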

> +static inline bool check_aes_ext(void)
> +{
> +	return riscv_isa_extension_available(NULL, ZVKNED) &&
> +	       riscv_vector_vlen() >= 128;
> +}
> +
> +static int __init riscv64_aes_mod_init(void)
> +{
> +	if (check_aes_ext())
> +		return crypto_register_alg(&riscv64_aes_alg_zvkned);
> +
> +	return -ENODEV;
> +}
> +
> +static void __exit riscv64_aes_mod_fini(void)
> +{
> +	if (check_aes_ext())
> +		crypto_unregister_alg(&riscv64_aes_alg_zvkned);
> +}
> +
> +module_init(riscv64_aes_mod_init);
> +module_exit(riscv64_aes_mod_fini);

module_exit can only run if module_init succeeded.  So, in cases like this it's
not necessary to check for CPU features before unregistering the algorithm.
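
In other words, the exit path can unregister unconditionally -- a sketch based
on the quoted code:

    static void __exit riscv64_aes_mod_fini(void)
    {
            crypto_unregister_alg(&riscv64_aes_alg_zvkned);
    }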

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-10-25 18:36 ` [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Jerry Shih
@ 2023-11-02  5:16   ` Eric Biggers
  2023-11-07  8:53     ` Jerry Shih
                       ` (2 more replies)
  2023-11-09  8:05   ` Eric Biggers
  1 sibling, 3 replies; 50+ messages in thread
From: Eric Biggers @ 2023-11-02  5:16 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> +config CRYPTO_AES_BLOCK_RISCV64
> +	default y if RISCV_ISA_V
> +	tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
> +	depends on 64BIT && RISCV_ISA_V
> +	select CRYPTO_AES_RISCV64
> +	select CRYPTO_SKCIPHER
> +	help
> +	  Length-preserving ciphers: AES cipher algorithms (FIPS-197)
> +	  with block cipher modes:
> +	  - ECB (Electronic Codebook) mode (NIST SP 800-38A)
> +	  - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
> +	  - CTR (Counter) mode (NIST SP 800-38A)
> +	  - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
> +	    Stealing) mode (NIST SP 800-38E and IEEE 1619)
> +
> +	  Architecture: riscv64 using:
> +	  - Zvbb vector extension (XTS)
> +	  - Zvkb vector crypto extension (CTR/XTS)
> +	  - Zvkg vector crypto extension (XTS)
> +	  - Zvkned vector crypto extension

Maybe list Zvkned first since it's the most important one in this context.

> +#define AES_BLOCK_VALID_SIZE_MASK (~(AES_BLOCK_SIZE - 1))
> +#define AES_BLOCK_REMAINING_SIZE_MASK (AES_BLOCK_SIZE - 1)

I think it would be easier to read if these values were just used directly.
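
For example, a sketch of what the loop body in ecb_encrypt() would look like
with the masks open-coded:

    kernel_vector_begin();
    rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
                             nbytes & ~(AES_BLOCK_SIZE - 1), &ctx->key);
    kernel_vector_end();
    err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));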

> +static int ecb_encrypt(struct skcipher_request *req)
> +{
> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> +	struct skcipher_walk walk;
> +	unsigned int nbytes;
> +	int err;
> +
> +	/* If we have error here, the `nbytes` will be zero. */
> +	err = skcipher_walk_virt(&walk, req, false);
> +	while ((nbytes = walk.nbytes)) {
> +		kernel_vector_begin();
> +		rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> +					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
> +					 &ctx->key);
> +		kernel_vector_end();
> +		err = skcipher_walk_done(
> +			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> +	}
> +
> +	return err;
> +}

There's no fallback for !crypto_simd_usable() here.  I really like it this way.
However, for it to work (for skciphers and aeads), RISC-V needs to allow the
vector registers to be used in softirq context.  Is that already the case?

> +/* ctr */
> +static int ctr_encrypt(struct skcipher_request *req)
> +{
> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> +	struct skcipher_walk walk;
> +	unsigned int ctr32;
> +	unsigned int nbytes;
> +	unsigned int blocks;
> +	unsigned int current_blocks;
> +	unsigned int current_length;
> +	int err;
> +
> +	/* the ctr iv uses big endian */
> +	ctr32 = get_unaligned_be32(req->iv + 12);
> +	err = skcipher_walk_virt(&walk, req, false);
> +	while ((nbytes = walk.nbytes)) {
> +		if (nbytes != walk.total) {
> +			nbytes &= AES_BLOCK_VALID_SIZE_MASK;
> +			blocks = nbytes / AES_BLOCK_SIZE;
> +		} else {
> +			/* This is the last walk. We should handle the tail data. */
> +			blocks = (nbytes + (AES_BLOCK_SIZE - 1)) /
> +				 AES_BLOCK_SIZE;

'(nbytes + (AES_BLOCK_SIZE - 1)) / AES_BLOCK_SIZE' can be replaced with
'DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE)'

> +static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
> +{
> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +	const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
> +	struct skcipher_request sub_req;
> +	struct scatterlist sg_src[2], sg_dst[2];
> +	struct scatterlist *src, *dst;
> +	struct skcipher_walk walk;
> +	unsigned int walk_size = crypto_skcipher_walksize(tfm);
> +	unsigned int tail_bytes;
> +	unsigned int head_bytes;
> +	unsigned int nbytes;
> +	unsigned int update_iv = 1;
> +	int err;
> +
> +	/* xts input size should be bigger than AES_BLOCK_SIZE */
> +	if (req->cryptlen < AES_BLOCK_SIZE)
> +		return -EINVAL;
> +
> +	/*
> +	 * The tail size should be small than walk_size. Thus, we could make sure the
> +	 * walk size for tail elements could be bigger than AES_BLOCK_SIZE.
> +	 */
> +	if (req->cryptlen <= walk_size) {
> +		tail_bytes = req->cryptlen;
> +		head_bytes = 0;
> +	} else {
> +		if (req->cryptlen & AES_BLOCK_REMAINING_SIZE_MASK) {
> +			tail_bytes = req->cryptlen &
> +				     AES_BLOCK_REMAINING_SIZE_MASK;
> +			tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
> +			head_bytes = req->cryptlen - tail_bytes;
> +		} else {
> +			tail_bytes = 0;
> +			head_bytes = req->cryptlen;
> +		}
> +	}
> +
> +	riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
> +
> +	if (head_bytes && tail_bytes) {
> +		skcipher_request_set_tfm(&sub_req, tfm);
> +		skcipher_request_set_callback(
> +			&sub_req, skcipher_request_flags(req), NULL, NULL);
> +		skcipher_request_set_crypt(&sub_req, req->src, req->dst,
> +					   head_bytes, req->iv);
> +		req = &sub_req;
> +	}
> +
> +	if (head_bytes) {
> +		err = skcipher_walk_virt(&walk, req, false);
> +		while ((nbytes = walk.nbytes)) {
> +			if (nbytes == walk.total)
> +				update_iv = (tail_bytes > 0);
> +
> +			nbytes &= AES_BLOCK_VALID_SIZE_MASK;
> +			kernel_vector_begin();
> +			func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
> +			     &ctx->ctx1.key, req->iv, update_iv);
> +			kernel_vector_end();
> +
> +			err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> +		}
> +		if (err || !tail_bytes)
> +			return err;
> +
> +		dst = src = scatterwalk_next(sg_src, &walk.in);
> +		if (req->dst != req->src)
> +			dst = scatterwalk_next(sg_dst, &walk.out);
> +		skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
> +	}
> +
> +	/* tail */
> +	err = skcipher_walk_virt(&walk, req, false);
> +	if (err)
> +		return err;
> +	if (walk.nbytes != tail_bytes)
> +		return -EINVAL;
> +	kernel_vector_begin();
> +	func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes,
> +	     &ctx->ctx1.key, req->iv, 0);
> +	kernel_vector_end();
> +
> +	return skcipher_walk_done(&walk, 0);
> +}

This function looks a bit weird.  I see it's also the only caller of the
scatterwalk_next() function that you're adding.  I haven't looked at this super
closely, but I expect that there's a cleaner way of handling the "tail" than
this -- maybe use scatterwalk_map_and_copy() to copy from/to a stack buffer?
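
For example, roughly (an untested sketch; the stack buffer would need to be
sized for the largest tail this driver can produce, and the XTS
ciphertext-stealing ordering still needs care):

    u8 tail_buf[AES_BLOCK_SIZE * 2];

    /* gather the tail from the source scatterlist */
    scatterwalk_map_and_copy(tail_buf, req->src, head_bytes, tail_bytes, 0);
    kernel_vector_begin();
    func(tail_buf, tail_buf, tail_bytes, &ctx->ctx1.key, req->iv, 0);
    kernel_vector_end();
    /* scatter the result back to the destination scatterlist */
    scatterwalk_map_and_copy(tail_buf, req->dst, head_bytes, tail_bytes, 1);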

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-10-25 18:36 ` [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Jerry Shih
@ 2023-11-02  5:43   ` Eric Biggers
  2023-11-20  2:55     ` Jerry Shih
  2023-11-22  1:29   ` Eric Biggers
  1 sibling, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-02  5:43 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:44AM +0800, Jerry Shih wrote:
> +static struct skcipher_alg riscv64_chacha_alg_zvkb[] = { {
> +	.base = {
> +		.cra_name = "chacha20",
> +		.cra_driver_name = "chacha20-riscv64-zvkb",
> +		.cra_priority = 300,
> +		.cra_blocksize = 1,
> +		.cra_ctxsize = sizeof(struct chacha_ctx),
> +		.cra_module = THIS_MODULE,
> +	},
> +	.min_keysize = CHACHA_KEY_SIZE,
> +	.max_keysize = CHACHA_KEY_SIZE,
> +	.ivsize = CHACHA_IV_SIZE,
> +	.chunksize = CHACHA_BLOCK_SIZE,
> +	.walksize = CHACHA_BLOCK_SIZE * 4,
> +	.setkey = chacha20_setkey,
> +	.encrypt = chacha20_encrypt,
> +	.decrypt = chacha20_encrypt,
> +} };
> +
> +static inline bool check_chacha20_ext(void)
> +{
> +	return riscv_isa_extension_available(NULL, ZVKB) &&
> +	       riscv_vector_vlen() >= 128;
> +}
> +
> +static int __init riscv64_chacha_mod_init(void)
> +{
> +	if (check_chacha20_ext())
> +		return crypto_register_skciphers(
> +			riscv64_chacha_alg_zvkb,
> +			ARRAY_SIZE(riscv64_chacha_alg_zvkb));
> +
> +	return -ENODEV;
> +}
> +
> +static void __exit riscv64_chacha_mod_fini(void)
> +{
> +	if (check_chacha20_ext())
> +		crypto_unregister_skciphers(
> +			riscv64_chacha_alg_zvkb,
> +			ARRAY_SIZE(riscv64_chacha_alg_zvkb));
> +}

When there's just one algorithm being registered/unregistered,
crypto_register_skcipher() and crypto_unregister_skcipher() can be used.
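
For example, a sketch assuming riscv64_chacha_alg_zvkb is changed from a
one-element array to a single struct skcipher_alg (and, per the earlier
comment on the AES patch, the exit path drops the redundant extension check):

    static int __init riscv64_chacha_mod_init(void)
    {
            if (check_chacha20_ext())
                    return crypto_register_skcipher(&riscv64_chacha_alg_zvkb);

            return -ENODEV;
    }

    static void __exit riscv64_chacha_mod_fini(void)
    {
            crypto_unregister_skcipher(&riscv64_chacha_alg_zvkb);
    }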

> +# - RV64I
> +# - RISC-V Vector ('V') with VLEN >= 128
> +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
> +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores)

How is the presence of the Zicclsm extension guaranteed?

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation
  2023-10-25 18:36 ` [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation Jerry Shih
@ 2023-11-02  5:58   ` Eric Biggers
  2023-11-20  2:55     ` Jerry Shih
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-02  5:58 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:42AM +0800, Jerry Shih wrote:
> +struct crypto_alg riscv64_sm4_zvksed_alg = {
> +	.cra_name = "sm4",
> +	.cra_driver_name = "sm4-riscv64-zvkb-zvksed",
> +	.cra_module = THIS_MODULE,
> +	.cra_priority = 300,
> +	.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
> +	.cra_blocksize = SM4_BLOCK_SIZE,
> +	.cra_ctxsize = sizeof(struct sm4_ctx),
> +	.cra_cipher = {
> +		.cia_min_keysize = SM4_KEY_SIZE,
> +		.cia_max_keysize = SM4_KEY_SIZE,
> +		.cia_setkey = riscv64_sm4_setkey_zvksed,
> +		.cia_encrypt = riscv64_sm4_encrypt_zvksed,
> +		.cia_decrypt = riscv64_sm4_decrypt_zvksed,
> +	},
> +};

This should be 'static'.

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-02  5:16   ` Eric Biggers
@ 2023-11-07  8:53     ` Jerry Shih
  2023-11-09  7:16       ` Eric Biggers
  2023-11-20  2:47     ` Jerry Shih
  2023-11-22  1:14     ` Eric Biggers
  2 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-11-07  8:53 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
>> +static int ecb_encrypt(struct skcipher_request *req)
>> +{
>> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>> +	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
>> +	struct skcipher_walk walk;
>> +	unsigned int nbytes;
>> +	int err;
>> +
>> +	/* If we have error here, the `nbytes` will be zero. */
>> +	err = skcipher_walk_virt(&walk, req, false);
>> +	while ((nbytes = walk.nbytes)) {
>> +		kernel_vector_begin();
>> +		rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
>> +					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
>> +					 &ctx->key);
>> +		kernel_vector_end();
>> +		err = skcipher_walk_done(
>> +			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
>> +	}
>> +
>> +	return err;
>> +}
> 
> There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> vector registers to be used in softirq context.  Is that already the case?

The kernel-mode-vector could be enabled in softirq, but we don't have nesting
vector contexts. Will we have the case that kernel needs to jump to softirq for
encryptions during the regular crypto function? If yes, we need to have fallbacks
for all algorithms.

-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-07  8:53     ` Jerry Shih
@ 2023-11-09  7:16       ` Eric Biggers
  2023-11-10  3:58         ` Jerry Shih
  2023-11-10  4:58         ` Andy Chiu
  0 siblings, 2 replies; 50+ messages in thread
From: Eric Biggers @ 2023-11-09  7:16 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> >> +static int ecb_encrypt(struct skcipher_request *req)
> >> +{
> >> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> >> +	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> >> +	struct skcipher_walk walk;
> >> +	unsigned int nbytes;
> >> +	int err;
> >> +
> >> +	/* If we have error here, the `nbytes` will be zero. */
> >> +	err = skcipher_walk_virt(&walk, req, false);
> >> +	while ((nbytes = walk.nbytes)) {
> >> +		kernel_vector_begin();
> >> +		rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> >> +					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
> >> +					 &ctx->key);
> >> +		kernel_vector_end();
> >> +		err = skcipher_walk_done(
> >> +			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> >> +	}
> >> +
> >> +	return err;
> >> +}
> > 
> > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > vector registers to be used in softirq context.  Is that already the case?
> 
> The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> vector contexts. Will we have the case that kernel needs to jump to softirq for
> encryptions during the regular crypto function? If yes, we need to have fallbacks
> for all algorithms.

Are you asking what happens if a softirq is taken while the CPU is between
kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
prevented by making kernel_vector_begin() and kernel_vector_end() disable and
re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
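
Roughly, the shape would be something like this (a sketch of the idea only;
the save/restore internals are placeholders for whatever the kernel-mode
vector series already does):

    void kernel_vector_begin(void)
    {
            BUG_ON(!may_use_simd());
            local_bh_disable();
            /* ... save the user vector state and enable V, as today ... */
    }

    void kernel_vector_end(void)
    {
            /* ... disable V and mark the vector state for restore ... */
            local_bh_enable();
    }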

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-10-25 18:36 ` [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Jerry Shih
  2023-11-02  5:16   ` Eric Biggers
@ 2023-11-09  8:05   ` Eric Biggers
  2023-11-10  4:06     ` Jerry Shih
  1 sibling, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-09  8:05 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
> +sub init_first_round {
> +    my $code=<<___;
> +    # load input
> +    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vle32_v $V24, $INPUT]}
> +
> +    li $T0, 5
> +    # We could simplify the initialization steps if we have `block<=1`.
> +    blt $LEN32, $T0, 1f
> +
> +    # Note: We use `vgmul` for GF(2^128) multiplication. The `vgmul` uses
> +    # different order of coefficients. We should use`vbrev8` to reverse the
> +    # data when we use `vgmul`.
> +    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
> +    @{[vbrev8_v $V0, $V28]}
> +    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vmv_v_i $V16, 0]}
> +    # v16: [r-IV0, r-IV0, ...]
> +    @{[vaesz_vs $V16, $V0]}
> +
> +    # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
> +    slli $T0, $LEN32, 2
> +    @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
> +    # v2: [`1`, `1`, `1`, `1`, ...]
> +    @{[vmv_v_i $V2, 1]}
> +    # v3: [`0`, `1`, `2`, `3`, ...]
> +    @{[vid_v $V3]}
> +    @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
> +    # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
> +    @{[vzext_vf2 $V4, $V2]}
> +    # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
> +    @{[vzext_vf2 $V6, $V3]}
> +    slli $T0, $LEN32, 1
> +    @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
> +    # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
> +    @{[vwsll_vv $V8, $V4, $V6]}
> +
> +    # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
> +    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vbrev8_v $V8, $V8]}
> +    @{[vgmul_vv $V16, $V8]}
> +
> +    # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
> +    # Reverse the bits order back.
> +    @{[vbrev8_v $V28, $V16]}

This code assumes that '1 << i' fits in 64 bits, for 0 <= i < vl.

I think that works out to an implicit assumption that VLEN <= 2048.  I.e.,
AES-XTS encryption/decryption would produce the wrong result on RISC-V
implementations with VLEN > 2048.

Perhaps it should be explicitly checked that VLEN <= 2048?
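
For example, a sketch that just extends the extension check in the glue code
(the function name is illustrative, and the existing extension checks from the
patch are elided here):

    static inline bool check_aes_xts_ext(void)
    {
            /* extension checks as in the patch, plus an explicit VLEN cap */
            return riscv_isa_extension_available(NULL, ZVKNED) &&
                   riscv_vector_vlen() >= 128 &&
                   riscv_vector_vlen() <= 2048;
    }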

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-09  7:16       ` Eric Biggers
@ 2023-11-10  3:58         ` Jerry Shih
  2023-11-10  4:34           ` Eric Biggers
  2023-11-10  4:58         ` Andy Chiu
  1 sibling, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-11-10  3:58 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 9, 2023, at 15:16, Eric Biggers <ebiggers@kernel.org> wrote:
> On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
>> On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
>>> On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
>>>> +static int ecb_encrypt(struct skcipher_request *req)
>>>> +{
>>> 
>>> There's no fallback for !crypto_simd_usable() here.  I really like it this way.
>>> However, for it to work (for skciphers and aeads), RISC-V needs to allow the
>>> vector registers to be used in softirq context.  Is that already the case?
>> 
>> The kernel-mode-vector could be enabled in softirq, but we don't have nesting
>> vector contexts. Will we have the case that kernel needs to jump to softirq for
>> encryptions during the regular crypto function? If yes, we need to have fallbacks
>> for all algorithms.
> 
> Are you asking what happens if a softirq is taken while the CPU is between
> kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> 
> - Eric

The current kernel-mode-vector implementation only calls `preempt_disable()` during
the vector context. So, we will hit the nesting vector context issue if a softirq
also uses kernel-mode vector.
https://lore.kernel.org/all/20230721112855.1006-1-andy.chiu@sifive.com/

Maybe we could start with the `simd_register_aeads_compat()` wrappers, as the x86
platform does, as a simpler initial approach.

-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-09  8:05   ` Eric Biggers
@ 2023-11-10  4:06     ` Jerry Shih
  2023-11-20  2:36       ` Jerry Shih
  0 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-11-10  4:06 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 9, 2023, at 16:05, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
>> +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
>> +sub init_first_round {
>> ....
>> +    # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
>> +    slli $T0, $LEN32, 2
>> +    @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
>> +    # v2: [`1`, `1`, `1`, `1`, ...]
>> +    @{[vmv_v_i $V2, 1]}
>> +    # v3: [`0`, `1`, `2`, `3`, ...]
>> +    @{[vid_v $V3]}
>> +    @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
>> +    # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
>> +    @{[vzext_vf2 $V4, $V2]}
>> +    # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
>> +    @{[vzext_vf2 $V6, $V3]}
>> +    slli $T0, $LEN32, 1
>> +    @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
>> +    # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
>> +    @{[vwsll_vv $V8, $V4, $V6]}
> 
> This code assumes that '1 << i' fits in 64 bits, for 0 <= i < vl.
> 
> I think that works out to an implicit assumption that VLEN <= 2048.  I.e.,
> AES-XTS encryption/decryption would produce the wrong result on RISC-V
> implementations with VLEN > 2048.
> 
> Perhaps it should be explicitly checked that VLEN <= 2048?
> 
> - Eric

Yes, we could just have a simple check like:

  riscv_vector_vlen() >= 128 && riscv_vector_vlen() <= 2048

We could also truncate the VL internally for the VLEN > 2048 case.
Let me think more about these two approaches.

-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-10  3:58         ` Jerry Shih
@ 2023-11-10  4:34           ` Eric Biggers
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Biggers @ 2023-11-10  4:34 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Fri, Nov 10, 2023 at 11:58:02AM +0800, Jerry Shih wrote:
> On Nov 9, 2023, at 15:16, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> >> On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> >>> On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> >>>> +static int ecb_encrypt(struct skcipher_request *req)
> >>>> +{
> >>> 
> >>> There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> >>> However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> >>> vector registers to be used in softirq context.  Is that already the case?
> >> 
> >> The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> >> vector contexts. Will we have the case that kernel needs to jump to softirq for
> >> encryptions during the regular crypto function? If yes, we need to have fallbacks
> >> for all algorithms.
> > 
> > Are you asking what happens if a softirq is taken while the CPU is between
> > kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> > prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> > arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> > 
> > - Eric
> 
> The current kernel-mode-vector implementation only calls `preempt_disable()` during
> the vector context. So, we will hit the nesting vector context issue if a softirq
> also uses kernel-mode vector.
> https://lore.kernel.org/all/20230721112855.1006-1-andy.chiu@sifive.com/
> 
> Maybe we could start with the `simd_register_aeads_compat()` wrappers, as the x86
> platform does, as a simpler initial approach.

The "crypto SIMD helpers" (simd_register_*() in crypto/simd.c) are quite complex
and have some disadvantages such as marking the resulting algorithms as
"asynchronous".  I expect that a much better approach would be to align RISC-V
with arm64 by disabling softirqs while the kernel vector unit is in use.

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-09  7:16       ` Eric Biggers
  2023-11-10  3:58         ` Jerry Shih
@ 2023-11-10  4:58         ` Andy Chiu
  2023-11-10  5:44           ` Eric Biggers
  1 sibling, 1 reply; 50+ messages in thread
From: Andy Chiu @ 2023-11-10  4:58 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jerry Shih, Paul Walmsley, palmer, Albert Ou, herbert, davem,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

Hi Eric,

On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> > >> +static int ecb_encrypt(struct skcipher_request *req)
> > >> +{
> > >> +  struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> > >> +  const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> > >> +  struct skcipher_walk walk;
> > >> +  unsigned int nbytes;
> > >> +  int err;
> > >> +
> > >> +  /* If we have error here, the `nbytes` will be zero. */
> > >> +  err = skcipher_walk_virt(&walk, req, false);
> > >> +  while ((nbytes = walk.nbytes)) {
> > >> +          kernel_vector_begin();
> > >> +          rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> > >> +                                   nbytes & AES_BLOCK_VALID_SIZE_MASK,
> > >> +                                   &ctx->key);
> > >> +          kernel_vector_end();
> > >> +          err = skcipher_walk_done(
> > >> +                  &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> > >> +  }
> > >> +
> > >> +  return err;
> > >> +}
> > >
> > > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > > vector registers to be used in softirq context.  Is that already the case?
> >
> > The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> > vector contexts. Will we have the case that kernel needs to jump to softirq for
> > encryptions during the regular crypto function? If yes, we need to have fallbacks
> > for all algorithms.
>
> Are you asking what happens if a softirq is taken while the CPU is between
> kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.

Yes, if making Vector available to softirq context is a must, then it
is reasonable to call local_bh_disable() in kernel_vector_begin().
However, softirq would not be the only user of Vector, and disabling
it may cause extra latency. Meanwhile, simply disabling bh in
kernel_vector_begin() will conflict with the patch[1] that takes an
approach to running preemptible Vector. Though it is not yet clear
whether we should run Vector without turning off preemption, I have
tested running preemptible Vector and observed some latency
improvements without sacrificing throughput. We will have a discussion
on LPC2023[2] and it'd be great if you could join or continue to
discuss it here.

Approaches such as nesting could be used if running Vector in softirq
is required. Since nesting requires extra save/restore, I think we
should run some tests to get more performance (latency/throughput)
figures and let the results decide the final direction. For example, we
could run Vector both nesting with preempt-V and non-nesting without
preempt-V, and compare the following performance characteristics:
 - System-wide latency impact
 - Latency and throughput of softirq-Vector itself

>
> - Eric

 - [1] https://lore.kernel.org/all/20231019154552.23351-6-andy.chiu@sifive.com/
 - [2] https://lpc.events/event/17/contributions/1474/

Regards,
Andy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-10  4:58         ` Andy Chiu
@ 2023-11-10  5:44           ` Eric Biggers
  2023-11-11 11:08             ` Ard Biesheuvel
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-10  5:44 UTC (permalink / raw)
  To: Andy Chiu
  Cc: Jerry Shih, Paul Walmsley, palmer, Albert Ou, herbert, davem,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Fri, Nov 10, 2023 at 12:58:12PM +0800, Andy Chiu wrote:
> Hi Eric,
> 
> On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> > > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> > > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> > > >> +static int ecb_encrypt(struct skcipher_request *req)
> > > >> +{
> > > >> +  struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> > > >> +  const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> > > >> +  struct skcipher_walk walk;
> > > >> +  unsigned int nbytes;
> > > >> +  int err;
> > > >> +
> > > >> +  /* If we have error here, the `nbytes` will be zero. */
> > > >> +  err = skcipher_walk_virt(&walk, req, false);
> > > >> +  while ((nbytes = walk.nbytes)) {
> > > >> +          kernel_vector_begin();
> > > >> +          rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> > > >> +                                   nbytes & AES_BLOCK_VALID_SIZE_MASK,
> > > >> +                                   &ctx->key);
> > > >> +          kernel_vector_end();
> > > >> +          err = skcipher_walk_done(
> > > >> +                  &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> > > >> +  }
> > > >> +
> > > >> +  return err;
> > > >> +}
> > > >
> > > > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > > > vector registers to be used in softirq context.  Is that already the case?
> > >
> > > The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> > > vector contexts. Will we have the case that kernel needs to jump to softirq for
> > > encryptions during the regular crypto function? If yes, we need to have fallbacks
> > > for all algorithms.
> >
> > Are you asking what happens if a softirq is taken while the CPU is between
> > kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> > prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> > arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> 
> Yes, if making Vector available to softirq context is a must, then it
> is reasonable to call local_bh_disable() in kernel_vector_begin().
> However, softirq would not be the only user for Vector and disabling
> it may cause extra latencies. Meanwhile, simply disabling bh in
> kernel_vector_begin() will conflict with the patch[1] that takes an
> approach to run Preemptible Vector. Though it is not clear yet on
> whether we should run Vector without turning off preemption, I have
> tested running preemptible Vector and observed some latency
> improvements without sacrificing throughput. We will have a discussion
> on LPC2023[2] and it'd be great if you could join or continue to
> discuss it here.
> 
> Approaches can be done such as nesting, if running Vector in softirq
> is required. Since it requires extra save/restore on nesting, I think
> we should run some tests to get more performance (latency/throughput)
> figure let the result decide the final direction. For example, we
> could run Vector in either nesting with preempt-V and  non-nesting
> without preempt-V and compare the following performance characteristics:
>  - System-wide latency impact
>  - Latency and throughput of softirq-Vector itself

The skcipher and aead APIs do indeed need to work in softirq context.

It's possible to use a fallback, either by falling back to scalar instructions
or by punting the encryption/decryption operation to a workqueue using
crypto/simd.c.  However, both approaches have some significant disadvantages.
It was nice that the need for them on arm64 was eliminated by commit
13150149aa6ded.  Note that it's possible to yield the vector unit occasionally,
to keep preemption and softirqs from being disabled for too long.
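
For reference, the fallback shape (if one were used) would look roughly
like the below inside the walk loop; aes_encrypt_blocks_scalar() is a
hypothetical scalar helper, not something that exists in this series:

	while ((nbytes = walk.nbytes) != 0) {
		unsigned int chunk = nbytes & ~(AES_BLOCK_SIZE - 1);

		if (crypto_simd_usable()) {
			kernel_vector_begin();
			rv64i_zvkned_ecb_encrypt(walk.src.virt.addr,
						 walk.dst.virt.addr,
						 chunk, &ctx->key);
			kernel_vector_end();
		} else {
			/* hypothetical scalar fallback */
			aes_encrypt_blocks_scalar(ctx, walk.dst.virt.addr,
						  walk.src.virt.addr, chunk);
		}
		err = skcipher_walk_done(&walk, nbytes - chunk);
	}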

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-10  5:44           ` Eric Biggers
@ 2023-11-11 11:08             ` Ard Biesheuvel
  2023-11-11 17:52               ` Eric Biggers
  0 siblings, 1 reply; 50+ messages in thread
From: Ard Biesheuvel @ 2023-11-11 11:08 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Andy Chiu, Jerry Shih, Paul Walmsley, palmer, Albert Ou, herbert,
	davem, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Fri, 10 Nov 2023 at 15:44, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Nov 10, 2023 at 12:58:12PM +0800, Andy Chiu wrote:
> > Hi Eric,
> >
> > On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> > > > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> > > > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> > > > >> +static int ecb_encrypt(struct skcipher_request *req)
> > > > >> +{
> > > > >> +  struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> > > > >> +  const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> > > > >> +  struct skcipher_walk walk;
> > > > >> +  unsigned int nbytes;
> > > > >> +  int err;
> > > > >> +
> > > > >> +  /* If we have error here, the `nbytes` will be zero. */
> > > > >> +  err = skcipher_walk_virt(&walk, req, false);
> > > > >> +  while ((nbytes = walk.nbytes)) {
> > > > >> +          kernel_vector_begin();
> > > > >> +          rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> > > > >> +                                   nbytes & AES_BLOCK_VALID_SIZE_MASK,
> > > > >> +                                   &ctx->key);
> > > > >> +          kernel_vector_end();
> > > > >> +          err = skcipher_walk_done(
> > > > >> +                  &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> > > > >> +  }
> > > > >> +
> > > > >> +  return err;
> > > > >> +}
> > > > >
> > > > > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > > > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > > > > vector registers to be used in softirq context.  Is that already the case?
> > > >
> > > > The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> > > > vector contexts. Will we have the case that kernel needs to jump to softirq for
> > > > encryptions during the regular crypto function? If yes, we need to have fallbacks
> > > > for all algorithms.
> > >
> > > Are you asking what happens if a softirq is taken while the CPU is between
> > > kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> > > prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> > > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> > > arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> >
> > Yes, if making Vector available to softirq context is a must, then it
> > is reasonable to call local_bh_disable() in kernel_vector_begin().
> > However, softirq would not be the only user for Vector and disabling
> > it may cause extra latencies. Meanwhile, simply disabling bh in
> > kernel_vector_begin() will conflict with the patch[1] that takes an
> > approach to run Preemptible Vector. Though it is not clear yet on
> > whether we should run Vector without turning off preemption, I have
> > tested running preemptible Vector and observed some latency
> > improvements without sacrificing throughput. We will have a discussion
> > on LPC2023[2] and it'd be great if you could join or continue to
> > discuss it here.
> >
> > Approaches can be done such as nesting, if running Vector in softirq
> > is required. Since it requires extra save/restore on nesting, I think
> > we should run some tests to get more performance (latency/throughput)
> > figure let the result decide the final direction. For example, we
> > could run Vector in either nesting with preempt-V and  non-nesting
> > without preempt-V and compare the following performance characteristics:
> >  - System-wide latency impact
> >  - Latency and throughput of softirq-Vector itself
>
> The skcipher and aead APIs do indeed need to work in softirq context.
>
> It's possible to use a fallback, either by falling back to scalar instructions
> or by punting the encryption/decryption operation to a workqueue using
> crypto/simd.c.  However, both approaches have some significant disadvantages.
> It was nice that the need for them on arm64 was eliminated by commit
> 13150149aa6ded.  Note that it's possible to yield the vector unit occasionally,
> to keep preemption and softirqs from being disabled for too long.
>

It is also quite feasible to start out with an implementation of
kernel_vector_begin() that preserves all vector registers eagerly in a
special per-CPU allocation if the call is made in softirq context (and
BUG when called in hardirq/NMI context). This was my initial approach
on arm64 too.

Assuming that RISC-V systems with vector units are not flooding the
market just yet, this gives you some time to study the issue without
the need to implement non-vector fallback crypto algorithms
everywhere.
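
Roughly, something like this (sketch only; the size constant and the
riscv_v_*() helpers are placeholders, not a real API):

struct riscv_v_kernel_state {
	u8 regs[RISCV_V_STATE_SIZE_MAX];	/* placeholder constant */
};

static DEFINE_PER_CPU(struct riscv_v_kernel_state, softirq_vstate);

void kernel_vector_begin(void)
{
	BUG_ON(in_hardirq() || in_nmi());

	if (in_serving_softirq()) {
		/* eagerly spill the live vector state (placeholder helper) */
		riscv_v_save_to(this_cpu_ptr(&softirq_vstate));
	} else {
		riscv_v_save_user_state(current);	/* placeholder helper */
	}
	riscv_v_enable();				/* placeholder helper */
}

void kernel_vector_end(void)
{
	if (in_serving_softirq())
		riscv_v_restore_from(this_cpu_ptr(&softirq_vstate));	/* placeholder */
	riscv_v_disable();					/* placeholder */
}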

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-11 11:08             ` Ard Biesheuvel
@ 2023-11-11 17:52               ` Eric Biggers
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Biggers @ 2023-11-11 17:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andy Chiu, Jerry Shih, Paul Walmsley, palmer, Albert Ou, herbert,
	davem, greentime.hu, conor.dooley, guoren, bjorn, heiko,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Sat, Nov 11, 2023 at 09:08:31PM +1000, Ard Biesheuvel wrote:
> On Fri, 10 Nov 2023 at 15:44, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Fri, Nov 10, 2023 at 12:58:12PM +0800, Andy Chiu wrote:
> > > Hi Eric,
> > >
> > > On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > > >
> > > > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> > > > > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> > > > > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> > > > > >> +static int ecb_encrypt(struct skcipher_request *req)
> > > > > >> +{
> > > > > >> +  struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> > > > > >> +  const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> > > > > >> +  struct skcipher_walk walk;
> > > > > >> +  unsigned int nbytes;
> > > > > >> +  int err;
> > > > > >> +
> > > > > >> +  /* If we have error here, the `nbytes` will be zero. */
> > > > > >> +  err = skcipher_walk_virt(&walk, req, false);
> > > > > >> +  while ((nbytes = walk.nbytes)) {
> > > > > >> +          kernel_vector_begin();
> > > > > >> +          rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> > > > > >> +                                   nbytes & AES_BLOCK_VALID_SIZE_MASK,
> > > > > >> +                                   &ctx->key);
> > > > > >> +          kernel_vector_end();
> > > > > >> +          err = skcipher_walk_done(
> > > > > >> +                  &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> > > > > >> +  }
> > > > > >> +
> > > > > >> +  return err;
> > > > > >> +}
> > > > > >
> > > > > > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > > > > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > > > > > vector registers to be used in softirq context.  Is that already the case?
> > > > >
> > > > > The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> > > > > vector contexts. Will we have the case that kernel needs to jump to softirq for
> > > > > encryptions during the regular crypto function? If yes, we need to have fallbacks
> > > > > for all algorithms.
> > > >
> > > > Are you asking what happens if a softirq is taken while the CPU is between
> > > > kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> > > > prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> > > > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> > > > arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> > >
> > > Yes, if making Vector available to softirq context is a must, then it
> > > is reasonable to call local_bh_disable() in kernel_vector_begin().
> > > However, softirq would not be the only user for Vector and disabling
> > > it may cause extra latencies. Meanwhile, simply disabling bh in
> > > kernel_vector_begin() will conflict with the patch[1] that takes an
> > > approach to run Preemptible Vector. Though it is not clear yet on
> > > whether we should run Vector without turning off preemption, I have
> > > tested running preemptible Vector and observed some latency
> > > improvements without sacrificing throughput. We will have a discussion
> > > on LPC2023[2] and it'd be great if you could join or continue to
> > > discuss it here.
> > >
> > > Approaches can be done such as nesting, if running Vector in softirq
> > > is required. Since it requires extra save/restore on nesting, I think
> > > we should run some tests to get more performance (latency/throughput)
> > > figure let the result decide the final direction. For example, we
> > > could run Vector in either nesting with preempt-V and  non-nesting
> > > without preempt-V and compare the following performance characteristics:
> > >  - System-wide latency impact
> > >  - Latency and throughput of softirq-Vector itself
> >
> > The skcipher and aead APIs do indeed need to work in softirq context.
> >
> > It's possible to use a fallback, either by falling back to scalar instructions
> > or by punting the encryption/decryption operation to a workqueue using
> > crypto/simd.c.  However, both approaches have some significant disadvantages.
> > It was nice that the need for them on arm64 was eliminated by commit
> > 13150149aa6ded.  Note that it's possible to yield the vector unit occasionally,
> > to keep preemption and softirqs from being disabled for too long.
> >
> 
> It is also quite feasible to start out with an implementation of
> kernel_vector_begin() that preserves all vector registers eagerly in a
> special per-CPU allocation if the call is made in softirq context (and
> BUG when called in hardirq/NMI context). This was my initial approach
> on arm64 too.
> 
> Assuming that RISC-V systems with vector units are not flooding the
> market just yet, this gives you some time to study the issue without
> the need to implement non-vector fallback crypto algorithms
> everywhere.

Yes, that solution would be fine too.

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-10  4:06     ` Jerry Shih
@ 2023-11-20  2:36       ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-20  2:36 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 10, 2023, at 12:06, Jerry Shih <jerry.shih@sifive.com> wrote:
> On Nov 9, 2023, at 16:05, Eric Biggers <ebiggers@kernel.org> wrote:
>> On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
>>> +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
>>> +sub init_first_round {
>>> ....
>>> +    # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
>>> +    slli $T0, $LEN32, 2
>>> +    @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
>>> +    # v2: [`1`, `1`, `1`, `1`, ...]
>>> +    @{[vmv_v_i $V2, 1]}
>>> +    # v3: [`0`, `1`, `2`, `3`, ...]
>>> +    @{[vid_v $V3]}
>>> +    @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
>>> +    # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
>>> +    @{[vzext_vf2 $V4, $V2]}
>>> +    # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
>>> +    @{[vzext_vf2 $V6, $V3]}
>>> +    slli $T0, $LEN32, 1
>>> +    @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
>>> +    # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
>>> +    @{[vwsll_vv $V8, $V4, $V6]}
>> 
>> This code assumes that '1 << i' fits in 64 bits, for 0 <= i < vl.
>> 
>> I think that works out to an implicit assumption that VLEN <= 2048.  I.e.,
>> AES-XTS encryption/decryption would produce the wrong result on RISC-V
>> implementations with VLEN > 2048.
>> 
>> Perhaps it should be explicitly checked that VLEN <= 2048?
>> 
>> - Eric
> 
> Yes, we could just have the simple checking like:
> 
>  riscv_vector_vlen() >= 128 || riscv_vector_vlen() <=2048
> 
> We could also truncate the VL inside for VLEN>2048 case.
> Let me think more about these two approaches. 
> 
> -Jerry

I went with the simplest solution and set up the check for vlen:
	riscv_vector_vlen() >= 128 && riscv_vector_vlen() <= 2048
This means we will not enable accelerated aes-xts for `vlen > 2048`.
I would like to make a `todo` task to fix that in the future.
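
Something like this (the extension checks are the same ones used elsewhere
in this series; the exact v2 code may differ):

static inline bool check_aes_xts_ext(void)
{
	return riscv_isa_extension_available(NULL, ZVKNED) &&
	       riscv_isa_extension_available(NULL, ZVBB) &&
	       riscv_isa_extension_available(NULL, ZVKG) &&
	       riscv_vector_vlen() >= 128 &&
	       riscv_vector_vlen() <= 2048;
}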

-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-02  5:16   ` Eric Biggers
  2023-11-07  8:53     ` Jerry Shih
@ 2023-11-20  2:47     ` Jerry Shih
  2023-11-20 19:28       ` Eric Biggers
  2023-11-22  1:14     ` Eric Biggers
  2 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-11-20  2:47 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
>> +config CRYPTO_AES_BLOCK_RISCV64
>> +	default y if RISCV_ISA_V
>> +	tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
>> +	depends on 64BIT && RISCV_ISA_V
>> +	select CRYPTO_AES_RISCV64
>> +	select CRYPTO_SKCIPHER
>> +	help
>> +	  Length-preserving ciphers: AES cipher algorithms (FIPS-197)
>> +	  with block cipher modes:
>> +	  - ECB (Electronic Codebook) mode (NIST SP 800-38A)
>> +	  - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
>> +	  - CTR (Counter) mode (NIST SP 800-38A)
>> +	  - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
>> +	    Stealing) mode (NIST SP 800-38E and IEEE 1619)
>> +
>> +	  Architecture: riscv64 using:
>> +	  - Zvbb vector extension (XTS)
>> +	  - Zvkb vector crypto extension (CTR/XTS)
>> +	  - Zvkg vector crypto extension (XTS)
>> +	  - Zvkned vector crypto extension
> 
> Maybe list Zvkned first since it's the most important one in this context.

Fixed.


>> +#define AES_BLOCK_VALID_SIZE_MASK (~(AES_BLOCK_SIZE - 1))
>> +#define AES_BLOCK_REMAINING_SIZE_MASK (AES_BLOCK_SIZE - 1)
> 
> I think it would be easier to read if these values were just used directly.

Fixed.
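
The ecb loop now uses AES_BLOCK_SIZE directly, roughly like this (sketch of
the change; the exact v2 code may differ):

	while ((nbytes = walk.nbytes)) {
		kernel_vector_begin();
		rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
					 nbytes & ~(AES_BLOCK_SIZE - 1),
					 &ctx->key);
		kernel_vector_end();
		err = skcipher_walk_done(&walk,
					 nbytes & (AES_BLOCK_SIZE - 1));
	}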

>> +static int ecb_encrypt(struct skcipher_request *req)
>> +{
>> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>> +	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
>> +	struct skcipher_walk walk;
>> +	unsigned int nbytes;
>> +	int err;
>> +
>> +	/* If we have error here, the `nbytes` will be zero. */
>> +	err = skcipher_walk_virt(&walk, req, false);
>> +	while ((nbytes = walk.nbytes)) {
>> +		kernel_vector_begin();
>> +		rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
>> +					 nbytes & AES_BLOCK_VALID_SIZE_MASK,
>> +					 &ctx->key);
>> +		kernel_vector_end();
>> +		err = skcipher_walk_done(
>> +			&walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
>> +	}
>> +
>> +	return err;
>> +}
> 
> There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> vector registers to be used in softirq context.  Is that already the case?

I switched to the simd skcipher interface. More details will be in the v2 patch set.

>> +/* ctr */
>> +static int ctr_encrypt(struct skcipher_request *req)
>> +{
>> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>> +	const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
>> +	struct skcipher_walk walk;
>> +	unsigned int ctr32;
>> +	unsigned int nbytes;
>> +	unsigned int blocks;
>> +	unsigned int current_blocks;
>> +	unsigned int current_length;
>> +	int err;
>> +
>> +	/* the ctr iv uses big endian */
>> +	ctr32 = get_unaligned_be32(req->iv + 12);
>> +	err = skcipher_walk_virt(&walk, req, false);
>> +	while ((nbytes = walk.nbytes)) {
>> +		if (nbytes != walk.total) {
>> +			nbytes &= AES_BLOCK_VALID_SIZE_MASK;
>> +			blocks = nbytes / AES_BLOCK_SIZE;
>> +		} else {
>> +			/* This is the last walk. We should handle the tail data. */
>> +			blocks = (nbytes + (AES_BLOCK_SIZE - 1)) /
>> +				 AES_BLOCK_SIZE;
> 
> '(nbytes + (AES_BLOCK_SIZE - 1)) / AES_BLOCK_SIZE' can be replaced with
> 'DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE)'

Fixed.

>> +static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
>> +{
>> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>> +	const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
>> +	struct skcipher_request sub_req;
>> +	struct scatterlist sg_src[2], sg_dst[2];
>> +	struct scatterlist *src, *dst;
>> +	struct skcipher_walk walk;
>> +	unsigned int walk_size = crypto_skcipher_walksize(tfm);
>> +	unsigned int tail_bytes;
>> +	unsigned int head_bytes;
>> +	unsigned int nbytes;
>> +	unsigned int update_iv = 1;
>> +	int err;
>> +
>> +	/* xts input size should be bigger than AES_BLOCK_SIZE */
>> +	if (req->cryptlen < AES_BLOCK_SIZE)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * The tail size should be small than walk_size. Thus, we could make sure the
>> +	 * walk size for tail elements could be bigger than AES_BLOCK_SIZE.
>> +	 */
>> +	if (req->cryptlen <= walk_size) {
>> +		tail_bytes = req->cryptlen;
>> +		head_bytes = 0;
>> +	} else {
>> +		if (req->cryptlen & AES_BLOCK_REMAINING_SIZE_MASK) {
>> +			tail_bytes = req->cryptlen &
>> +				     AES_BLOCK_REMAINING_SIZE_MASK;
>> +			tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
>> +			head_bytes = req->cryptlen - tail_bytes;
>> +		} else {
>> +			tail_bytes = 0;
>> +			head_bytes = req->cryptlen;
>> +		}
>> +	}
>> +
>> +	riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
>> +
>> +	if (head_bytes && tail_bytes) {
>> +		skcipher_request_set_tfm(&sub_req, tfm);
>> +		skcipher_request_set_callback(
>> +			&sub_req, skcipher_request_flags(req), NULL, NULL);
>> +		skcipher_request_set_crypt(&sub_req, req->src, req->dst,
>> +					   head_bytes, req->iv);
>> +		req = &sub_req;
>> +	}
>> +
>> +	if (head_bytes) {
>> +		err = skcipher_walk_virt(&walk, req, false);
>> +		while ((nbytes = walk.nbytes)) {
>> +			if (nbytes == walk.total)
>> +				update_iv = (tail_bytes > 0);
>> +
>> +			nbytes &= AES_BLOCK_VALID_SIZE_MASK;
>> +			kernel_vector_begin();
>> +			func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
>> +			     &ctx->ctx1.key, req->iv, update_iv);
>> +			kernel_vector_end();
>> +
>> +			err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
>> +		}
>> +		if (err || !tail_bytes)
>> +			return err;
>> +
>> +		dst = src = scatterwalk_next(sg_src, &walk.in);
>> +		if (req->dst != req->src)
>> +			dst = scatterwalk_next(sg_dst, &walk.out);
>> +		skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
>> +	}
>> +
>> +	/* tail */
>> +	err = skcipher_walk_virt(&walk, req, false);
>> +	if (err)
>> +		return err;
>> +	if (walk.nbytes != tail_bytes)
>> +		return -EINVAL;
>> +	kernel_vector_begin();
>> +	func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes,
>> +	     &ctx->ctx1.key, req->iv, 0);
>> +	kernel_vector_end();
>> +
>> +	return skcipher_walk_done(&walk, 0);
>> +}
> 
> This function looks a bit weird.  I see it's also the only caller of the
> scatterwalk_next() function that you're adding.  I haven't looked at this super
> closely, but I expect that there's a cleaner way of handling the "tail" than
> this -- maybe use scatterwalk_map_and_copy() to copy from/to a stack buffer?
> 
> - Eric

I put more comments in the v2 patch set. Hope it will be clearer.
Even if we used `scatterwalk_map_and_copy()`, it would still use
`scatterwalk_ffwd()` inside. The `scatterwalk_next()` helper just
moves to the next scatterlist from the previous walk instead of
iterating from the head.

-Jerry


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation
  2023-11-02  4:51   ` Eric Biggers
@ 2023-11-20  2:53     ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-20  2:53 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 2, 2023, at 12:51, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:36AM +0800, Jerry Shih wrote:
>> diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
>> index 10d60edc0110..500938317e71 100644
>> --- a/arch/riscv/crypto/Kconfig
>> +++ b/arch/riscv/crypto/Kconfig
>> @@ -2,4 +2,16 @@
>> 
>> menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
>> 
>> +config CRYPTO_AES_RISCV64
>> +	default y if RISCV_ISA_V
>> +	tristate "Ciphers: AES"
>> +	depends on 64BIT && RISCV_ISA_V
>> +	select CRYPTO_AES
>> +	select CRYPTO_ALGAPI
>> +	help
>> +	  Block ciphers: AES cipher algorithms (FIPS-197)
>> +
>> +	  Architecture: riscv64 using:
>> +	  - Zvkned vector crypto extension
> 
> kconfig options should default to off.
> 
> I.e., remove the line "default y if RISCV_ISA_V"

Fixed.

>> + *
>> + * All zvkned-based functions use encryption expending keys for both encryption
>> + * and decryption.
>> + */
> 
> The above comment is a bit confusing.  It's describing the 'key' field of struct
> aes_key; maybe there should be a comment there instead:
> 
>    struct aes_key {
>            u32 key[AES_MAX_KEYLENGTH_U32]; /* round keys in encryption order */
>            u32 rounds;
>    };

I have updated the asm implementation to use `crypto_aes_ctx` struct.

>> +int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
>> +		       unsigned int keylen)
>> +{
>> +	/*
>> +	 * The RISC-V AES vector crypto key expending doesn't support AES-192.
>> +	 * We just use the generic software key expending here to simplify the key
>> +	 * expending flow.
>> +	 */
> 
> expending => expanding

Thx.
Fixed.

>> +	u32 aes_rounds;
>> +	u32 key_length;
>> +	int ret;
>> +
>> +	ret = aes_expandkey(&ctx->fallback_ctx, key, keylen);
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * Copy the key from `crypto_aes_ctx` to `aes_key` for zvkned-based AES
>> +	 * implementations.
>> +	 */
>> +	aes_rounds = aes_round_num(keylen);
>> +	ctx->key.rounds = aes_rounds;
>> +	key_length = AES_BLOCK_SIZE * (aes_rounds + 1);
>> +	memcpy(ctx->key.key, ctx->fallback_ctx.key_enc, key_length);
>> +
>> +	return 0;
>> +}
> 
> Ideally this would use the same crypto_aes_ctx for both the fallback and the
> assembly code.  I suppose we don't want to diverge from the OpenSSL code (unless
> it gets rewritten), though.  So I guess this is fine for now.

I have updated the asm implementation to use `crypto_aes_ctx` struct.
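
With `crypto_aes_ctx` used directly by the asm, the setkey path reduces to
roughly this (sketch; the exact v2 code may differ):

static int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
			      unsigned int keylen)
{
	/* aes_expandkey() validates keylen and fills key_enc/key_dec. */
	return aes_expandkey(ctx, key, keylen);
}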

>> void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
>>                               const u8 *src)
> 
> These functions can be called from a different module (aes-block-riscv64), so
> they need EXPORT_SYMBOL_GPL.

Fixed.

>> +static inline bool check_aes_ext(void)
>> +{
>> +	return riscv_isa_extension_available(NULL, ZVKNED) &&
>> +	       riscv_vector_vlen() >= 128;
>> +}
>> +
>> +static int __init riscv64_aes_mod_init(void)
>> +{
>> +	if (check_aes_ext())
>> +		return crypto_register_alg(&riscv64_aes_alg_zvkned);
>> +
>> +	return -ENODEV;
>> +}
>> +
>> +static void __exit riscv64_aes_mod_fini(void)
>> +{
>> +	if (check_aes_ext())
>> +		crypto_unregister_alg(&riscv64_aes_alg_zvkned);
>> +}
>> +
>> +module_init(riscv64_aes_mod_init);
>> +module_exit(riscv64_aes_mod_fini);
> 
> module_exit can only run if module_init succeeded.  So, in cases like this it's
> not necessary to check for CPU features before unregistering the algorithm.
> 
> - Eric

Fixed.
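
So the exit path is now simply (sketch):

static void __exit riscv64_aes_mod_fini(void)
{
	/* module_exit only runs if module_init succeeded, no re-check needed. */
	crypto_unregister_alg(&riscv64_aes_alg_zvkned);
}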

-Jerry


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-02  5:43   ` Eric Biggers
@ 2023-11-20  2:55     ` Jerry Shih
  2023-11-20 19:18       ` Eric Biggers
  0 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-11-20  2:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 2, 2023, at 13:43, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:44AM +0800, Jerry Shih wrote:
>> +static struct skcipher_alg riscv64_chacha_alg_zvkb[] = { {
>> +	.base = {
>> +		.cra_name = "chacha20",
>> +		.cra_driver_name = "chacha20-riscv64-zvkb",
>> +		.cra_priority = 300,
>> +		.cra_blocksize = 1,
>> +		.cra_ctxsize = sizeof(struct chacha_ctx),
>> +		.cra_module = THIS_MODULE,
>> +	},
>> +	.min_keysize = CHACHA_KEY_SIZE,
>> +	.max_keysize = CHACHA_KEY_SIZE,
>> +	.ivsize = CHACHA_IV_SIZE,
>> +	.chunksize = CHACHA_BLOCK_SIZE,
>> +	.walksize = CHACHA_BLOCK_SIZE * 4,
>> +	.setkey = chacha20_setkey,
>> +	.encrypt = chacha20_encrypt,
>> +	.decrypt = chacha20_encrypt,
>> +} };
>> +
>> +static inline bool check_chacha20_ext(void)
>> +{
>> +	return riscv_isa_extension_available(NULL, ZVKB) &&
>> +	       riscv_vector_vlen() >= 128;
>> +}
>> +
>> +static int __init riscv64_chacha_mod_init(void)
>> +{
>> +	if (check_chacha20_ext())
>> +		return crypto_register_skciphers(
>> +			riscv64_chacha_alg_zvkb,
>> +			ARRAY_SIZE(riscv64_chacha_alg_zvkb));
>> +
>> +	return -ENODEV;
>> +}
>> +
>> +static void __exit riscv64_chacha_mod_fini(void)
>> +{
>> +	if (check_chacha20_ext())
>> +		crypto_unregister_skciphers(
>> +			riscv64_chacha_alg_zvkb,
>> +			ARRAY_SIZE(riscv64_chacha_alg_zvkb));
>> +}
> 
> When there's just one algorithm being registered/unregistered,
> crypto_register_skcipher() and crypto_unregister_skcipher() can be used.

Fixed.
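
With a single algorithm it becomes roughly this (sketch, assuming the alg is
no longer declared as a one-element array):

static int __init riscv64_chacha_mod_init(void)
{
	if (check_chacha20_ext())
		return crypto_register_skcipher(&riscv64_chacha_alg_zvkb);

	return -ENODEV;
}

static void __exit riscv64_chacha_mod_fini(void)
{
	crypto_unregister_skcipher(&riscv64_chacha_alg_zvkb);
}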

>> +# - RV64I
>> +# - RISC-V Vector ('V') with VLEN >= 128
>> +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
>> +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores)
> 
> How is the presence of the Zicclsm extension guaranteed?
> 
> - Eric

I have added the extension parser for `Zicclsm` in the v2 patch set.


-Jerry


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation
  2023-11-02  5:58   ` Eric Biggers
@ 2023-11-20  2:55     ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-20  2:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 2, 2023, at 13:58, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:42AM +0800, Jerry Shih wrote:
>> +struct crypto_alg riscv64_sm4_zvksed_alg = {
>> +	.cra_name = "sm4",
>> +	.cra_driver_name = "sm4-riscv64-zvkb-zvksed",
>> +	.cra_module = THIS_MODULE,
>> +	.cra_priority = 300,
>> +	.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
>> +	.cra_blocksize = SM4_BLOCK_SIZE,
>> +	.cra_ctxsize = sizeof(struct sm4_ctx),
>> +	.cra_cipher = {
>> +		.cia_min_keysize = SM4_KEY_SIZE,
>> +		.cia_max_keysize = SM4_KEY_SIZE,
>> +		.cia_setkey = riscv64_sm4_setkey_zvksed,
>> +		.cia_encrypt = riscv64_sm4_encrypt_zvksed,
>> +		.cia_decrypt = riscv64_sm4_decrypt_zvksed,
>> +	},
>> +};
> 
> This should be 'static'.
> 
> - Eric

Thx.
Fixed.


-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-20  2:55     ` Jerry Shih
@ 2023-11-20 19:18       ` Eric Biggers
  2023-11-21 10:55         ` Jerry Shih
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-20 19:18 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

Hi Jerry!

On Mon, Nov 20, 2023 at 10:55:15AM +0800, Jerry Shih wrote:
> >> +# - RV64I
> >> +# - RISC-V Vector ('V') with VLEN >= 128
> >> +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
> >> +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores)
> > 
> > How is the presence of the Zicclsm extension guaranteed?
> > 
> > - Eric
> 
> I have the addition extension parser for `Zicclsm` in the v2 patch set.

First, I can see your updated patchset at branch
"dev/jerrys/vector-crypto-upstream-v2" of https://github.com/JerryShih/linux,
but I haven't seen it on the mailing list yet.  Are you planning to send it out?

Second, with your updated patchset, I'm not seeing any of the RISC-V optimized
algorithms be registered when I boot the kernel in QEMU.  This is caused by the
new check 'riscv_isa_extension_available(NULL, ZICCLSM)' not passing.  Is
checking for "Zicclsm" the correct way to determine whether unaligned memory
accesses are supported?

I'm using 'qemu-system-riscv64 -cpu max -machine virt', with the very latest
QEMU commit (af9264da80073435), so it should have all the CPU features.

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-20  2:47     ` Jerry Shih
@ 2023-11-20 19:28       ` Eric Biggers
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Biggers @ 2023-11-20 19:28 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Mon, Nov 20, 2023 at 10:47:29AM +0800, Jerry Shih wrote:
> > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > vector registers to be used in softirq context.  Is that already the case?
> 
> I turn to use simd skcipher interface. More details will be in the v2 patch set.

Thanks.  Later, I suspect that we'll want to make the vector unit usable in
softirq context directly.  But for now I suppose the SIMD helper is tolerable.
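
For reference, the crypto/simd.c registration pattern looks roughly like
this (sketch based on how other arch glue code uses it; the
riscv64_aes_algs_zvkned array name is just an assumption about the v2
naming):

static struct simd_skcipher_alg *
	riscv64_aes_simd_algs[ARRAY_SIZE(riscv64_aes_algs_zvkned)];

static int __init riscv64_aes_block_mod_init(void)
{
	if (check_aes_ext())
		return simd_register_skciphers_compat(
			riscv64_aes_algs_zvkned,
			ARRAY_SIZE(riscv64_aes_algs_zvkned),
			riscv64_aes_simd_algs);

	return -ENODEV;
}

static void __exit riscv64_aes_block_mod_fini(void)
{
	simd_unregister_skciphers(riscv64_aes_algs_zvkned,
				  ARRAY_SIZE(riscv64_aes_algs_zvkned),
				  riscv64_aes_simd_algs);
}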

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-20 19:18       ` Eric Biggers
@ 2023-11-21 10:55         ` Jerry Shih
  2023-11-21 13:14           ` Conor Dooley
  0 siblings, 1 reply; 50+ messages in thread
From: Jerry Shih @ 2023-11-21 10:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 21, 2023, at 03:18, Eric Biggers <ebiggers@kernel.org> wrote:
> First, I can see your updated patchset at branch
> "dev/jerrys/vector-crypto-upstream-v2" of https://github.com/JerryShih/linux,
> but I haven't seen it on the mailing list yet.  Are you planning to send it out?

I will send it out soon.

> Second, with your updated patchset, I'm not seeing any of the RISC-V optimized
> algorithms be registered when I boot the kernel in QEMU.  This is caused by the
> new check 'riscv_isa_extension_available(NULL, ZICCLSM)' not passing.  Is
> checking for "Zicclsm" the correct way to determine whether unaligned memory
> accesses are supported?
> 
> I'm using 'qemu-system-riscv64 -cpu max -machine virt', with the very latest
> QEMU commit (af9264da80073435), so it should have all the CPU features.
> 
> - Eric

Sorry, I was just using my `internal` qemu with vector-crypto and rva22 patches.

The public qemu hasn't supported the rva22 profiles yet. Here is the qemu
patch[1] for that, along with the discussion of why qemu doesn't export
these `named extensions` (e.g. Zicclsm).
I tried to add Zicclsm to the DT in the v2 patch set. Maybe we will have more
discussion about the rva22 profiles in the kernel DT.

[1]
LINK: https://lore.kernel.org/all/d1d6f2dc-55b2-4dce-a48a-4afbbf6df526@ventanamicro.com/#t

I don't know whether it's good practice to check for unaligned access
support using `Zicclsm`.

Here is another related cpu feature for unaligned access:
RISCV_HWPROBE_MISALIGNED_*
But it looks like it always gets initialized with `RISCV_HWPROBE_MISALIGNED_SLOW`[2].
That implies the Linux kernel always supports unaligned access, but we have
actual HW which doesn't support unaligned access in the vector unit.

[2]
LINK: https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/arch/riscv/kernel/cpufeature.c#L575

I will still use the `Zicclsm` check at this stage for reviewing, and I will create
a qemu branch with Zicclsm enabled for testing.

-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-21 10:55         ` Jerry Shih
@ 2023-11-21 13:14           ` Conor Dooley
  2023-11-21 23:37             ` Eric Biggers
  2023-11-22 17:37             ` Jerry Shih
  0 siblings, 2 replies; 50+ messages in thread
From: Conor Dooley @ 2023-11-21 13:14 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Eric Biggers, Paul Walmsley, palmer, Albert Ou, herbert, davem,
	andy.chiu, greentime.hu, guoren, bjorn, heiko, ardb, phoebe.chen,
	hongrong.hsu, linux-riscv, linux-kernel, linux-crypto

[-- Attachment #1: Type: text/plain, Size: 2749 bytes --]

On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
> On Nov 21, 2023, at 03:18, Eric Biggers <ebiggers@kernel.org> wrote:
> > First, I can see your updated patchset at branch
> > "dev/jerrys/vector-crypto-upstream-v2" of https://github.com/JerryShih/linux,
> > but I haven't seen it on the mailing list yet.  Are you planning to send it out?
> 
> I will send it out soon.
> 
> > Second, with your updated patchset, I'm not seeing any of the RISC-V optimized
> > algorithms be registered when I boot the kernel in QEMU.  This is caused by the
> > new check 'riscv_isa_extension_available(NULL, ZICCLSM)' not passing.  Is
> > checking for "Zicclsm" the correct way to determine whether unaligned memory
> > accesses are supported?
> > 
> > I'm using 'qemu-system-riscv64 -cpu max -machine virt', with the very latest
> > QEMU commit (af9264da80073435), so it should have all the CPU features.
> > 
> > - Eric
> 
> Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
> 
> The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
> that. But here is the discussion why the qemu doesn't export these
> `named extensions`(e.g. Zicclsm).
> I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
> about the rva22 profiles in kernel DT.

Please do, that'll be fun! Please take some time to read what the
profiles spec actually defines Zicclsm for before you send those patches
though. I think you might come to find you have misunderstood what it
means - certainly I did the first time I saw it!

> [1]
> LINK: https://lore.kernel.org/all/d1d6f2dc-55b2-4dce-a48a-4afbbf6df526@ventanamicro.com/#t
> 
> I don't know whether it's a good practice to check unaligned access using
> `Zicclsm`. 
> 
> Here is another related cpu feature for unaligned access:
> RISCV_HWPROBE_MISALIGNED_*
> But it looks like it always be initialized with `RISCV_HWPROBE_MISALIGNED_SLOW`[2].
> It implies that linux kernel always supports unaligned access. But we have the
> actual HW which doesn't support unaligned access for vector unit.

https://docs.kernel.org/arch/riscv/uabi.html#misaligned-accesses

Misaligned accesses are part of the user ABI & the hwprobe stuff for
that allows userspace to figure out whether they're fast (likely
implemented in hardware), slow (likely emulated in firmware) or emulated
in the kernel.

Cheers,
Conor.

> 
> [2]
> LINK: https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/arch/riscv/kernel/cpufeature.c#L575
> 
> I will still use `Zicclsm` checking in this stage for reviewing. And I will create qemu
> branch with Zicclsm enabled feature for testing.
> 
> -Jerry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-21 13:14           ` Conor Dooley
@ 2023-11-21 23:37             ` Eric Biggers
  2023-11-22  0:39               ` Conor Dooley
  2023-11-22 17:37             ` Jerry Shih
  1 sibling, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-21 23:37 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Jerry Shih, Paul Walmsley, palmer, Albert Ou, herbert, davem,
	andy.chiu, greentime.hu, guoren, bjorn, heiko, ardb, phoebe.chen,
	hongrong.hsu, linux-riscv, linux-kernel, linux-crypto

On Tue, Nov 21, 2023 at 01:14:47PM +0000, Conor Dooley wrote:
> On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
> > On Nov 21, 2023, at 03:18, Eric Biggers <ebiggers@kernel.org> wrote:
> > > First, I can see your updated patchset at branch
> > > "dev/jerrys/vector-crypto-upstream-v2" of https://github.com/JerryShih/linux,
> > > but I haven't seen it on the mailing list yet.  Are you planning to send it out?
> > 
> > I will send it out soon.
> > 
> > > Second, with your updated patchset, I'm not seeing any of the RISC-V optimized
> > > algorithms be registered when I boot the kernel in QEMU.  This is caused by the
> > > new check 'riscv_isa_extension_available(NULL, ZICCLSM)' not passing.  Is
> > > checking for "Zicclsm" the correct way to determine whether unaligned memory
> > > accesses are supported?
> > > 
> > > I'm using 'qemu-system-riscv64 -cpu max -machine virt', with the very latest
> > > QEMU commit (af9264da80073435), so it should have all the CPU features.
> > > 
> > > - Eric
> > 
> > Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
> > 
> > The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
> > that. But here is the discussion why the qemu doesn't export these
> > `named extensions`(e.g. Zicclsm).
> > I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
> > about the rva22 profiles in kernel DT.
> 
> Please do, that'll be fun! Please take some time to read what the
> profiles spec actually defines Zicclsm for before you send those patches
> though. I think you might come to find you have misunderstood what it
> means - certainly I did the first time I saw it!
> 
> > [1]
> > LINK: https://lore.kernel.org/all/d1d6f2dc-55b2-4dce-a48a-4afbbf6df526@ventanamicro.com/#t
> > 
> > I don't know whether it's a good practice to check unaligned access using
> > `Zicclsm`. 
> > 
> > Here is another related cpu feature for unaligned access:
> > RISCV_HWPROBE_MISALIGNED_*
> > But it looks like it always be initialized with `RISCV_HWPROBE_MISALIGNED_SLOW`[2].
> > It implies that linux kernel always supports unaligned access. But we have the
> > actual HW which doesn't support unaligned access for vector unit.
> 
> https://docs.kernel.org/arch/riscv/uabi.html#misaligned-accesses
> 
> Misaligned accesses are part of the user ABI & the hwprobe stuff for
> that allows userspace to figure out whether they're fast (likely
> implemented in hardware), slow (likely emulated in firmware) or emulated
> in the kernel.
> 
> Cheers,
> Conor.
> 
> > 
> > [2]
> > LINK: https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/arch/riscv/kernel/cpufeature.c#L575
> > 
> > I will still use `Zicclsm` checking in this stage for reviewing. And I will create qemu
> > branch with Zicclsm enabled feature for testing.
> > 

According to https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc,
Zicclsm means that "main memory supports misaligned loads/stores", but they
"might execute extremely slowly."

In general, the vector crypto routines that Jerry is adding assume that
misaligned vector loads/stores are supported *and* are fast.  I think the kernel
mustn't register those algorithms if that isn't the case.  Zicclsm sounds like
the wrong thing to check.  Maybe RISCV_HWPROBE_MISALIGNED_FAST is the right
thing to check?
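
Something along these lines, perhaps (rough sketch; it assumes the per-CPU
misaligned_access_speed variable from arch/riscv/kernel/cpufeature.c is
reachable from the crypto glue code):

static bool all_cpus_have_fast_misaligned_access(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		if (per_cpu(misaligned_access_speed, cpu) !=
		    RISCV_HWPROBE_MISALIGNED_FAST)
			return false;
	}
	return true;
}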

BTW, something else I was wondering about is endianness.  Most of the vector
crypto routines also assume little endian byte order, but I don't see that being
explicitly checked for anywhere.  Should it be?

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-21 23:37             ` Eric Biggers
@ 2023-11-22  0:39               ` Conor Dooley
  0 siblings, 0 replies; 50+ messages in thread
From: Conor Dooley @ 2023-11-22  0:39 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Conor Dooley, Jerry Shih, Paul Walmsley, palmer, Albert Ou,
	herbert, davem, andy.chiu, greentime.hu, guoren, bjorn, heiko,
	ardb, phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

[-- Attachment #1: Type: text/plain, Size: 4826 bytes --]

On Tue, Nov 21, 2023 at 03:37:43PM -0800, Eric Biggers wrote:
> On Tue, Nov 21, 2023 at 01:14:47PM +0000, Conor Dooley wrote:
> > On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
> > > On Nov 21, 2023, at 03:18, Eric Biggers <ebiggers@kernel.org> wrote:
> > > > First, I can see your updated patchset at branch
> > > > "dev/jerrys/vector-crypto-upstream-v2" of https://github.com/JerryShih/linux,
> > > > but I haven't seen it on the mailing list yet.  Are you planning to send it out?
> > > 
> > > I will send it out soon.
> > > 
> > > > Second, with your updated patchset, I'm not seeing any of the RISC-V optimized
> > > > algorithms be registered when I boot the kernel in QEMU.  This is caused by the
> > > > new check 'riscv_isa_extension_available(NULL, ZICCLSM)' not passing.  Is
> > > > checking for "Zicclsm" the correct way to determine whether unaligned memory
> > > > accesses are supported?
> > > > 
> > > > I'm using 'qemu-system-riscv64 -cpu max -machine virt', with the very latest
> > > > QEMU commit (af9264da80073435), so it should have all the CPU features.
> > > > 
> > > > - Eric
> > > 
> > > Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
> > > 
> > > The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
> > > that. But here is the discussion why the qemu doesn't export these
> > > `named extensions`(e.g. Zicclsm).
> > > I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
> > > about the rva22 profiles in kernel DT.
> > 
> > Please do, that'll be fun! Please take some time to read what the
> > profiles spec actually defines Zicclsm for before you send those patches
> > though. I think you might come to find you have misunderstood what it
> > means - certainly I did the first time I saw it!
> > 
> > > [1]
> > > LINK: https://lore.kernel.org/all/d1d6f2dc-55b2-4dce-a48a-4afbbf6df526@ventanamicro.com/#t
> > > 
> > > I don't know whether it's a good practice to check unaligned access using
> > > `Zicclsm`. 
> > > 
> > > Here is another related cpu feature for unaligned access:
> > > RISCV_HWPROBE_MISALIGNED_*
> > > But it looks like it always be initialized with `RISCV_HWPROBE_MISALIGNED_SLOW`[2].
> > > It implies that linux kernel always supports unaligned access. But we have the
> > > actual HW which doesn't support unaligned access for vector unit.
> > 
> > https://docs.kernel.org/arch/riscv/uabi.html#misaligned-accesses
> > 
> > Misaligned accesses are part of the user ABI & the hwprobe stuff for
> > that allows userspace to figure out whether they're fast (likely
> > implemented in hardware), slow (likely emulated in firmware) or emulated
> > in the kernel.
> >
> > > [2]
> > > LINK: https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/arch/riscv/kernel/cpufeature.c#L575
> > > 
> > > I will still use `Zicclsm` checking in this stage for reviewing. And I will create qemu
> > > branch with Zicclsm enabled feature for testing.
> > > 
> 
> According to https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc,
> Zicclsm means that "main memory supports misaligned loads/stores", but they
> "might execute extremely slowly."

Check the section it is defined in - it is only defined for the RVA22U64
profile which describes "features available to user-mode execution
environments". It otherwise has no meaning, so it is not suitable for
detecting anything from within the kernel. For other operating systems
it might actually mean something, but for Linux the uABI on RISC-V
unconditionally provides what Zicclsm is intended to convey:
https://www.kernel.org/doc/html/next/riscv/uabi.html#misaligned-accesses
We could (_perhaps_) set it in /proc/cpuinfo in riscv,isa there - but a
conversation would have to be had about what these non-extension
"features" actually are & whether it makes sense to put them there.

> In general, the vector crypto routines that Jerry is adding assume that
> misaligned vector loads/stores are supported *and* are fast.  I think the kernel
> mustn't register those algorithms if that isn't the case.  Zicclsm sounds like
> the wrong thing to check.  Maybe RISCV_HWPROBE_MISALIGNED_FAST is the right
> thing to check?

It actually means something, so it is certainly better ;)
I think checking it makes sense as a good surrogate for actually knowing
whether or not the hardware supports misaligned access.

> BTW, something else I was wondering about is endianness.  Most of the vector
> crypto routines also assume little endian byte order, but I don't see that being
> explicitly checked for anywhere.  Should it be?

The RISC-V kernel only supports LE at the moment. I hope that doesn't
change tbh.

Cheers,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-02  5:16   ` Eric Biggers
  2023-11-07  8:53     ` Jerry Shih
  2023-11-20  2:47     ` Jerry Shih
@ 2023-11-22  1:14     ` Eric Biggers
  2023-11-27  2:52       ` Jerry Shih
  2 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-22  1:14 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Wed, Nov 01, 2023 at 10:16:39PM -0700, Eric Biggers wrote:
> > +	  Architecture: riscv64 using:
> > +	  - Zvbb vector extension (XTS)
> > +	  - Zvkb vector crypto extension (CTR/XTS)
> > +	  - Zvkg vector crypto extension (XTS)
> > +	  - Zvkned vector crypto extension
> 
> Maybe list Zvkned first since it's the most important one in this context.

BTW, I'd like to extend this request to the implementation names
(.cra_driver_name) and the names of the files as well.  I.e., instead of:

    aes-riscv64-zvkned
    aes-riscv64-zvkb-zvkned
    aes-riscv64-zvbb-zvkg-zvkned
    sha256-riscv64-zvkb-zvknha_or_zvknhb
    sha512-riscv64-zvkb-zvknhb

... we'd have:

    aes-riscv64-zvkned
    aes-riscv64-zvkned-zvkb
    aes-riscv64-zvkned-zvbb-zvkg
    sha256-riscv64-zvknha_or_zvknhb-zvkb
    sha512-riscv64-zvknhb-zvkb

and similarly for the cra_driver_name fields.

I think that's much more logical.  Do you agree?

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-10-25 18:36 ` [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Jerry Shih
  2023-11-02  5:43   ` Eric Biggers
@ 2023-11-22  1:29   ` Eric Biggers
  2023-11-27  2:14     ` Jerry Shih
  1 sibling, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-22  1:29 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:44AM +0800, Jerry Shih wrote:
> diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
> new file mode 100644
> index 000000000000..72011949f705
> --- /dev/null
> +++ b/arch/riscv/crypto/chacha-riscv64-glue.c
> @@ -0,0 +1,120 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Port of the OpenSSL ChaCha20 implementation for RISC-V 64
> + *
> + * Copyright (C) 2023 SiFive, Inc.
> + * Author: Jerry Shih <jerry.shih@sifive.com>
> + */
> +
> +#include <asm/simd.h>
> +#include <asm/vector.h>
> +#include <crypto/internal/chacha.h>
> +#include <crypto/internal/simd.h>
> +#include <crypto/internal/skcipher.h>
> +#include <linux/crypto.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +
> +#define CHACHA_BLOCK_VALID_SIZE_MASK (~(CHACHA_BLOCK_SIZE - 1))
> +#define CHACHA_BLOCK_REMAINING_SIZE_MASK (CHACHA_BLOCK_SIZE - 1)
> +#define CHACHA_KEY_OFFSET 4
> +#define CHACHA_IV_OFFSET 12
> +
> +/* chacha20 using zvkb vector crypto extension */
> +void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len, const u32 *key,
> +			 const u32 *counter);
> +
> +static int chacha20_encrypt(struct skcipher_request *req)
> +{
> +	u32 state[CHACHA_STATE_WORDS];

This function doesn't need to create the whole state matrix on the stack, since
the underlying assembly function takes as input the key and counter, not the
state matrix.  I recommend something like the following:

diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
index df185d0663fcc..216b4cd9d1e01 100644
--- a/arch/riscv/crypto/chacha-riscv64-glue.c
+++ b/arch/riscv/crypto/chacha-riscv64-glue.c
@@ -16,45 +16,42 @@
 #include <linux/module.h>
 #include <linux/types.h>
 
-#define CHACHA_KEY_OFFSET 4
-#define CHACHA_IV_OFFSET 12
-
 /* chacha20 using zvkb vector crypto extension */
 asmlinkage void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len,
 				    const u32 *key, const u32 *counter);
 
 static int chacha20_encrypt(struct skcipher_request *req)
 {
-	u32 state[CHACHA_STATE_WORDS];
 	u8 block_buffer[CHACHA_BLOCK_SIZE];
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
 	struct skcipher_walk walk;
 	unsigned int nbytes;
 	unsigned int tail_bytes;
+	u32 iv[4];
 	int err;
 
-	chacha_init_generic(state, ctx->key, req->iv);
+	iv[0] = get_unaligned_le32(req->iv);
+	iv[1] = get_unaligned_le32(req->iv + 4);
+	iv[2] = get_unaligned_le32(req->iv + 8);
+	iv[3] = get_unaligned_le32(req->iv + 12);
 
 	err = skcipher_walk_virt(&walk, req, false);
 	while (walk.nbytes) {
-		nbytes = walk.nbytes & (~(CHACHA_BLOCK_SIZE - 1));
+		nbytes = walk.nbytes & ~(CHACHA_BLOCK_SIZE - 1);
 		tail_bytes = walk.nbytes & (CHACHA_BLOCK_SIZE - 1);
 		kernel_vector_begin();
 		if (nbytes) {
 			ChaCha20_ctr32_zvkb(walk.dst.virt.addr,
 					    walk.src.virt.addr, nbytes,
-					    state + CHACHA_KEY_OFFSET,
-					    state + CHACHA_IV_OFFSET);
-			state[CHACHA_IV_OFFSET] += nbytes / CHACHA_BLOCK_SIZE;
+					    ctx->key, iv);
+			iv[0] += nbytes / CHACHA_BLOCK_SIZE;
 		}
 		if (walk.nbytes == walk.total && tail_bytes > 0) {
 			memcpy(block_buffer, walk.src.virt.addr + nbytes,
 			       tail_bytes);
 			ChaCha20_ctr32_zvkb(block_buffer, block_buffer,
-					    CHACHA_BLOCK_SIZE,
-					    state + CHACHA_KEY_OFFSET,
-					    state + CHACHA_IV_OFFSET);
+					    CHACHA_BLOCK_SIZE, ctx->key, iv);
 			memcpy(walk.dst.virt.addr + nbytes, block_buffer,
 			       tail_bytes);
 			tail_bytes = 0;

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
  2023-10-25 18:36 ` [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations Jerry Shih
@ 2023-11-22  1:32   ` Eric Biggers
  2023-11-27  2:50     ` Jerry Shih
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-22  1:32 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:41AM +0800, Jerry Shih wrote:
> +static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
> +				 unsigned int len)
> +{
> +	int ret = 0;
> +
> +	/*
> +	 * Make sure struct sha256_state begins directly with the SHA256
> +	 * 256-bit internal state, as this is what the asm function expect.
> +	 */
> +	BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);

There's a copy-paste error here; all the 256 above should be 512.
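
I.e., the corrected comment would read something like:

        /*
         * Make sure struct sha512_state begins directly with the SHA-512
         * 512-bit internal state, as this is what the asm function expects.
         */
        BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);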

> +static struct shash_alg sha512_alg[] = {
> +	{
> +		.digestsize = SHA512_DIGEST_SIZE,
> +		.init = sha512_base_init,
> +		.update = riscv64_sha512_update,
> +		.final = riscv64_sha512_final,
> +		.finup = riscv64_sha512_finup,
> +		.descsize = sizeof(struct sha512_state),
> +		.base.cra_name = "sha512",
> +		.base.cra_driver_name = "sha512-riscv64-zvkb-zvknhb",
> +		.base.cra_priority = 150,
> +		.base.cra_blocksize = SHA512_BLOCK_SIZE,
> +		.base.cra_module = THIS_MODULE,
> +	},
> +	{
> +		.digestsize = SHA384_DIGEST_SIZE,
> +		.init = sha384_base_init,
> +		.update = riscv64_sha512_update,
> +		.final = riscv64_sha512_final,
> +		.finup = riscv64_sha512_finup,
> +		.descsize = sizeof(struct sha512_state),
> +		.base.cra_name = "sha384",
> +		.base.cra_driver_name = "sha384-riscv64-zvkb-zvknhb",
> +		.base.cra_priority = 150,
> +		.base.cra_blocksize = SHA384_BLOCK_SIZE,
> +		.base.cra_module = THIS_MODULE,
> +	}
> +};

*_algs instead of *_alg when there's more than one, please.
I.e., sha512_alg => sha512_algs here.

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
  2023-10-25 18:36 ` [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Jerry Shih
@ 2023-11-22  1:42   ` Eric Biggers
  2023-11-27  2:49     ` Jerry Shih
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Biggers @ 2023-11-22  1:42 UTC (permalink / raw)
  To: Jerry Shih
  Cc: paul.walmsley, palmer, aou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Thu, Oct 26, 2023 at 02:36:39AM +0800, Jerry Shih wrote:
> +struct riscv64_ghash_context {
> +	be128 key;
> +};
> +
> +struct riscv64_ghash_desc_ctx {
> +	be128 shash;
> +	u8 buffer[GHASH_BLOCK_SIZE];
> +	u32 bytes;
> +};

I recommend calling the first struct 'riscv64_ghash_tfm_ctx', and making the
pointers to it be named 'tctx'.  That would more clearly distinguish it from the
desc_ctx / dctx.
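
I.e., something like (sketch of the suggested naming only):

        struct riscv64_ghash_tfm_ctx {
                be128 key;
        };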

> +
> +typedef void (*ghash_func)(be128 *Xi, const be128 *H, const u8 *inp,
> +			   size_t len);
> +
> +static inline void ghash_blocks(const struct riscv64_ghash_context *ctx,
> +				struct riscv64_ghash_desc_ctx *dctx,
> +				const u8 *src, size_t srclen, ghash_func func)
> +	if (crypto_simd_usable()) {
> +		kernel_vector_begin();
> +		func(&dctx->shash, &ctx->key, src, srclen);
> +		kernel_vector_end();

The indirection to ghash_func is unnecessary, since the only value is
gcm_ghash_rv64i_zvkg.

This also means that ghash_update() should be folded into ghash_update_zvkg(),
and ghash_final() into ghash_final_zvkg().

> +	} else {
> +		while (srclen >= GHASH_BLOCK_SIZE) {
> +			crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE);
> +			gf128mul_lle(&dctx->shash, &ctx->key);
> +			srclen -= GHASH_BLOCK_SIZE;
> +			src += GHASH_BLOCK_SIZE;
> +		}
> +	}

The assembly code uses the equivalent of the following do-while loop instead:

        do {
                srclen -= GHASH_BLOCK_SIZE;
        } while (srclen);

I.e., it assumes the length here is nonzero and a multiple of 16, which it is.

To avoid confusion, I recommend making the C code use the same do-while loop.
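
Combined with dropping the ghash_func indirection above, ghash_blocks() could end up
looking roughly like this (just a sketch, using the tfm_ctx naming suggested earlier):

        static void ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx,
                                 struct riscv64_ghash_desc_ctx *dctx,
                                 const u8 *src, size_t srclen)
        {
                if (crypto_simd_usable()) {
                        kernel_vector_begin();
                        /* srclen is a nonzero multiple of GHASH_BLOCK_SIZE */
                        gcm_ghash_rv64i_zvkg(&dctx->shash, &tctx->key, src, srclen);
                        kernel_vector_end();
                } else {
                        do {
                                crypto_xor((u8 *)&dctx->shash, src,
                                           GHASH_BLOCK_SIZE);
                                gf128mul_lle(&dctx->shash, &tctx->key);
                                src += GHASH_BLOCK_SIZE;
                                srclen -= GHASH_BLOCK_SIZE;
                        } while (srclen);
                }
        }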


>        const struct riscv64_ghash_context *ctx =
>               crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));

crypto_tfm_ctx(crypto_shash_tfm(tfm)) should be crypto_shash_ctx(tfm)
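
I.e. (sketch):

        const struct riscv64_ghash_context *ctx = crypto_shash_ctx(desc->tfm);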

> +static int ghash_final(struct shash_desc *desc, u8 *out, ghash_func func)
> +{
> +	const struct riscv64_ghash_context *ctx =
> +		crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
> +	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
> +	int i;
> +
> +	if (dctx->bytes) {
> +		for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
> +			dctx->buffer[i] = 0;
> +
> +		ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func);
> +		dctx->bytes = 0;
> +	}
> +

Setting dctx->bytes above is unnecessary.
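
So the tail handling could shrink to something like (sketch, also assuming the func
parameter is gone as suggested above):

        if (dctx->bytes) {
                for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
                        dctx->buffer[i] = 0;

                ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE);
        }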

> +static int ghash_init(struct shash_desc *desc)
> +{
> +	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
> +
> +	*dctx = (struct riscv64_ghash_desc_ctx){};
> +
> +	return 0;
> +}
> +
> +static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src,
> +			     unsigned int srclen)
> +{
> +	return ghash_update(desc, src, srclen, gcm_ghash_rv64i_zvkg);
> +}
> +
> +static int ghash_final_zvkg(struct shash_desc *desc, u8 *out)
> +{
> +	return ghash_final(desc, out, gcm_ghash_rv64i_zvkg);
> +}
> +
> +static int ghash_setkey(struct crypto_shash *tfm, const u8 *key,
> +			unsigned int keylen)
> +{
> +	struct riscv64_ghash_context *ctx =
> +		crypto_tfm_ctx(crypto_shash_tfm(tfm));
> +
> +	if (keylen != GHASH_BLOCK_SIZE)
> +		return -EINVAL;
> +
> +	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
> +
> +	return 0;
> +}
> +
> +static struct shash_alg riscv64_ghash_alg_zvkg = {
> +	.digestsize = GHASH_DIGEST_SIZE,
> +	.init = ghash_init,
> +	.update = ghash_update_zvkg,
> +	.final = ghash_final_zvkg,
> +	.setkey = ghash_setkey,

IMO it's helpful to order the shash functions as follows, both in their
definitions and their fields in struct shash_alg:

    setkey
    init
    update
    final

That matches the order in which they're called.
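
E.g. (sketch, other fields unchanged):

        static struct shash_alg riscv64_ghash_alg_zvkg = {
                .setkey = ghash_setkey,
                .init = ghash_init,
                .update = ghash_update_zvkg,
                .final = ghash_final_zvkg,
                .digestsize = GHASH_DIGEST_SIZE,
                ...
        };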

- Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-21 13:14           ` Conor Dooley
  2023-11-21 23:37             ` Eric Biggers
@ 2023-11-22 17:37             ` Jerry Shih
  2023-11-22 18:05               ` Palmer Dabbelt
  2023-11-22 18:20               ` Conor Dooley
  1 sibling, 2 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-22 17:37 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Eric Biggers, Paul Walmsley, palmer, Albert Ou, herbert, davem,
	andy.chiu, greentime.hu, guoren, bjorn, heiko, ardb, phoebe.chen,
	hongrong.hsu, linux-riscv, linux-kernel, linux-crypto

On Nov 21, 2023, at 21:14, Conor Dooley <conor.dooley@microchip.com> wrote:
> On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
>> Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
>> 
>> The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
>> that. But here is the discussion why the qemu doesn't export these
>> `named extensions`(e.g. Zicclsm).
>> I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
>> about the rva22 profiles in kernel DT.
> 
> Please do, that'll be fun! Please take some time to read what the
> profiles spec actually defines Zicclsm fore before you send those patches
> though. I think you might come to find you have misunderstood what it
> means - certainly I did the first time I saw it!

From the rva22 profile:
  This requires misaligned support for all regular load and store instructions (including
  scalar and ``vector``)

The spec includes the explicit `vector` keyword.
So, I still think we could use Zicclsm checking for these vector-crypto implementations.

My proposed patch is just a simple one which only updates the DT document and
the ISA string parser for Zicclsm. If it's still not recommended to use the Zicclsm
check, I will switch to `RISCV_HWPROBE_MISALIGNED_*` instead.

>> [1]
>> LINK: https://lore.kernel.org/all/d1d6f2dc-55b2-4dce-a48a-4afbbf6df526@ventanamicro.com/#t
>> 
>> I don't know whether it's a good practice to check unaligned access using
>> `Zicclsm`. 
>> 
>> Here is another related cpu feature for unaligned access:
>> RISCV_HWPROBE_MISALIGNED_*
>> But it looks like it always be initialized with `RISCV_HWPROBE_MISALIGNED_SLOW`[2].
>> It implies that linux kernel always supports unaligned access. But we have the
>> actual HW which doesn't support unaligned access for vector unit.
> 
> https://docs.kernel.org/arch/riscv/uabi.html#misaligned-accesses
> 
> Misaligned accesses are part of the user ABI & the hwprobe stuff for
> that allows userspace to figure out whether they're fast (likely
> implemented in hardware), slow (likely emulated in firmware) or emulated
> in the kernel.

The HWPROBE_MISALIGNED_* checking function is at:
https://github.com/torvalds/linux/blob/c2d5304e6c648ebcf653bace7e51e0e6742e46c8/arch/riscv/kernel/cpufeature.c#L564-L647
The tests are all scalar; there is no `vector` test inside. So, I'm not sure whether
HWPROBE_MISALIGNED_* is related to the vector unit at all.

The goal in this crypto patch is to check whether the `vector` unit supports
unaligned access.

I haven't seen an emulation path for unaligned vector accesses in either OpenSBI
or the kernel. Are unaligned vector accesses included in the user ABI?

Thanks,
Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-22 17:37             ` Jerry Shih
@ 2023-11-22 18:05               ` Palmer Dabbelt
  2023-11-22 18:20               ` Conor Dooley
  1 sibling, 0 replies; 50+ messages in thread
From: Palmer Dabbelt @ 2023-11-22 18:05 UTC (permalink / raw)
  To: jerry.shih
  Cc: Conor Dooley, ebiggers, Paul Walmsley, aou, herbert, davem,
	andy.chiu, greentime.hu, guoren, Bjorn Topel, heiko,
	Ard Biesheuvel, phoebe.chen, hongrong.hsu, linux-riscv,
	linux-kernel, linux-crypto

On Wed, 22 Nov 2023 09:37:33 PST (-0800), jerry.shih@sifive.com wrote:
> On Nov 21, 2023, at 21:14, Conor Dooley <conor.dooley@microchip.com> wrote:
>> On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
>>> Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
>>> 
>>> The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
>>> that. But here is the discussion why the qemu doesn't export these
>>> `named extensions`(e.g. Zicclsm).
>>> I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
>>> about the rva22 profiles in kernel DT.
>> 
>> Please do, that'll be fun! Please take some time to read what the
>> profiles spec actually defines Zicclsm fore before you send those patches
>> though. I think you might come to find you have misunderstood what it
>> means - certainly I did the first time I saw it!
>
> From the rva22 profile:
>   This requires misaligned support for all regular load and store instructions (including
>   scalar and ``vector``)
>
> The spec includes the explicit `vector` keyword.
> So, I still think we could use Zicclsm checking for these vector-crypto implementations.
>
> My proposed patch is just a simple patch which only update the DT document and
> update the isa string parser for Zicclsm. If it's still not recommend to use Zicclsm
> checking, I will turn to use `RISCV_HWPROBE_MISALIGNED_*` instead.

IMO that's the way to go: even if these are required to be supported by 
Zicclsm, we still need to deal with the performance implications.

>>> [1]
>>> LINK: https://lore.kernel.org/all/d1d6f2dc-55b2-4dce-a48a-4afbbf6df526@ventanamicro.com/#t
>>> 
>>> I don't know whether it's a good practice to check unaligned access using
>>> `Zicclsm`. 
>>> 
>>> Here is another related cpu feature for unaligned access:
>>> RISCV_HWPROBE_MISALIGNED_*
>>> But it looks like it always be initialized with `RISCV_HWPROBE_MISALIGNED_SLOW`[2].
>>> It implies that linux kernel always supports unaligned access. But we have the
>>> actual HW which doesn't support unaligned access for vector unit.
>> 
>> https://docs.kernel.org/arch/riscv/uabi.html#misaligned-accesses
>> 
>> Misaligned accesses are part of the user ABI & the hwprobe stuff for
>> that allows userspace to figure out whether they're fast (likely
>> implemented in hardware), slow (likely emulated in firmware) or emulated
>> in the kernel.
>
> The HWPROBE_MISALIGNED_* checking function is at:
> https://github.com/torvalds/linux/blob/c2d5304e6c648ebcf653bace7e51e0e6742e46c8/arch/riscv/kernel/cpufeature.c#L564-L647
> The tests are all scalar. No `vector` test inside. So, I'm not sure the
> HWPROBE_MISALIGNED_* is related to vector unit or not.
>
> The goal is to check whether `vector` support unaligned access or not
> in this crypto patch.
>
> I haven't seen the emulated path for unaligned-vector-access in OpenSBI
> and kernel. Is the unaligned-vector-access included in user ABI?

I guess it's kind of a grey area, but I'd argue that it is: we merged 
support for V when the only implementation (i.e., QEMU) supported 
misaligned accesses, so we're stuck with that being the de facto 
behavior.  As part of adding support for the K230 we'll need to then add 
the kernel-mode vector misaligned access handlers, but that doesn't seem 
so hard.

So I'd say we should update the hwprobe docs to say that key only 
reflects scalar accesses (or maybe even just integer accesses?  that's 
all we're testing for) -- essentially just make the documentation match 
the implementation, as that'll keep ABI compatibility.  Then we can add 
a new key for vector misaligned access performance.

>
> Thanks,
> Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-22 17:37             ` Jerry Shih
  2023-11-22 18:05               ` Palmer Dabbelt
@ 2023-11-22 18:20               ` Conor Dooley
  2023-11-22 19:05                 ` Jerry Shih
  1 sibling, 1 reply; 50+ messages in thread
From: Conor Dooley @ 2023-11-22 18:20 UTC (permalink / raw)
  To: Jerry Shih
  Cc: Conor Dooley, Eric Biggers, Paul Walmsley, palmer, Albert Ou,
	herbert, davem, andy.chiu, greentime.hu, guoren, bjorn, heiko,
	ardb, phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto


On Thu, Nov 23, 2023 at 01:37:33AM +0800, Jerry Shih wrote:
> On Nov 21, 2023, at 21:14, Conor Dooley <conor.dooley@microchip.com> wrote:
> > On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
> >> Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
> >> 
> >> The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
> >> that. But here is the discussion why the qemu doesn't export these
> >> `named extensions`(e.g. Zicclsm).
> >> I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
> >> about the rva22 profiles in kernel DT.
> > 
> > Please do, that'll be fun! Please take some time to read what the
> > profiles spec actually defines Zicclsm fore before you send those patches
> > though. I think you might come to find you have misunderstood what it
> > means - certainly I did the first time I saw it!
> 
> From the rva22 profile:

"rva22" is not a profile. As I pointed out to Eric, this is defined in
the RVA22U64 profile (and the RVA20U64 one, but that is effectively a
moot point). The profile descriptions for these only specify "the ISA
features available to user-mode execution environments", so it is not
suitable for use in any other context.

>   This requires misaligned support for all regular load and store instructions (including
>   scalar and ``vector``)
> 
> The spec includes the explicit `vector` keyword.
> So, I still think we could use Zicclsm checking for these vector-crypto implementations.

In userspace, if Zicclsm was exported somewhere, that would be a valid
argument. Even for userspace, the hwprobe flags probably provide more
information though, since the firmware emulation is insanely slow.

> My proposed patch is just a simple patch which only update the DT document and
> update the isa string parser for Zicclsm.

Zicclsm has no meaning outside of user mode, so it's not suitable for
use in that context. Other "features" defined in the profiles spec might
be suitable for inclusion, but that'll be a case-by-case decision.

> If it's still not recommend to use Zicclsm
> checking, I will turn to use `RISCV_HWPROBE_MISALIGNED_*` instead.

Palmer has commented on the rest, so no need for me :)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-22 18:20               ` Conor Dooley
@ 2023-11-22 19:05                 ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-22 19:05 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Conor Dooley, Eric Biggers, Paul Walmsley, palmer, Albert Ou,
	herbert, davem, andy.chiu, greentime.hu, guoren, bjorn, heiko,
	ardb, phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 23, 2023, at 02:20, Conor Dooley <conor@kernel.org> wrote:
> On Thu, Nov 23, 2023 at 01:37:33AM +0800, Jerry Shih wrote:
>> On Nov 21, 2023, at 21:14, Conor Dooley <conor.dooley@microchip.com> wrote:
>>> On Tue, Nov 21, 2023 at 06:55:07PM +0800, Jerry Shih wrote:
>>>> Sorry, I just use my `internal` qemu with vector-crypto and rva22 patches.
>>>> 
>>>> The public qemu haven't supported rva22 profiles. Here is the qemu patch[1] for
>>>> that. But here is the discussion why the qemu doesn't export these
>>>> `named extensions`(e.g. Zicclsm).
>>>> I try to add Zicclsm in DT in the v2 patch set. Maybe we will have more discussion
>>>> about the rva22 profiles in kernel DT.
>>> 
>>> Please do, that'll be fun! Please take some time to read what the
>>> profiles spec actually defines Zicclsm fore before you send those patches
>>> though. I think you might come to find you have misunderstood what it
>>> means - certainly I did the first time I saw it!
>> 
>> From the rva22 profile:
> 
> "rva22" is not a profile. As I pointed out to Eric, this is defined in
> the RVA22U64 profile (and the RVA20U64 one, but that is effectively a
> moot point). The profile descriptions for these only specify "the ISA
> features available to user-mode execution environments", so it is not
> suitable for use in any other context.

I missed that important part: it's for user space.
Thx.

>>  This requires misaligned support for all regular load and store instructions (including
>>  scalar and ``vector``)
>> 
>> The spec includes the explicit `vector` keyword.
>> So, I still think we could use Zicclsm checking for these vector-crypto implementations.
> 
> In userspace, if Zicclsm was exported somewhere, that would be a valid
> argument. Even for userspace, the hwprobe flags probably provide more
> information though, since the firmware emulation is insanely slow.

I agree. It would be more useful to have a flag like `VECTOR_MISALIGNED_FAST`
instead.

>> My proposed patch is just a simple patch which only update the DT document and
>> update the isa string parser for Zicclsm.
> 
> Zicclsm has no meaning outside of user mode, so it's not suitable for
> use in that context. Other "features" defined in the profiles spec might
> be suitable for inclusion, but it'll be a case-by-case basis.

I will skip the Zicclsm part in my v2 patch.

>> If it's still not recommend to use Zicclsm
>> checking, I will turn to use `RISCV_HWPROBE_MISALIGNED_*` instead.
> 
> Palmer has commented on the rest, so no need for me :)

All crypto algorithms will assume that the vector unit supports misaligned access in
the next v2 patch set.
The algorithms will also not check `RISCV_HWPROBE_MISALIGNED_*`, since it only
covers scalar accesses.
Once we have a vector-specific misaligned-access performance flag, we can come back
and use it here.

-Jerry


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation
  2023-11-22  1:29   ` Eric Biggers
@ 2023-11-27  2:14     ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-27  2:14 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 22, 2023, at 09:29, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:44AM +0800, Jerry Shih wrote:
>> diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
>> new file mode 100644
>> index 000000000000..72011949f705
>> --- /dev/null
>> +++ b/arch/riscv/crypto/chacha-riscv64-glue.c
>> @@ -0,0 +1,120 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Port of the OpenSSL ChaCha20 implementation for RISC-V 64
>> + *
>> + * Copyright (C) 2023 SiFive, Inc.
>> + * Author: Jerry Shih <jerry.shih@sifive.com>
>> + */
>> +
>> +#include <asm/simd.h>
>> +#include <asm/vector.h>
>> +#include <crypto/internal/chacha.h>
>> +#include <crypto/internal/simd.h>
>> +#include <crypto/internal/skcipher.h>
>> +#include <linux/crypto.h>
>> +#include <linux/module.h>
>> +#include <linux/types.h>
>> +
>> +#define CHACHA_BLOCK_VALID_SIZE_MASK (~(CHACHA_BLOCK_SIZE - 1))
>> +#define CHACHA_BLOCK_REMAINING_SIZE_MASK (CHACHA_BLOCK_SIZE - 1)
>> +#define CHACHA_KEY_OFFSET 4
>> +#define CHACHA_IV_OFFSET 12
>> +
>> +/* chacha20 using zvkb vector crypto extension */
>> +void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len, const u32 *key,
>> +			 const u32 *counter);
>> +
>> +static int chacha20_encrypt(struct skcipher_request *req)
>> +{
>> +	u32 state[CHACHA_STATE_WORDS];
> 
> This function doesn't need to create the whole state matrix on the stack, since
> the underlying assembly function takes as input the key and counter, not the
> state matrix.  I recommend something like the following:
> 
> diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
> index df185d0663fcc..216b4cd9d1e01 100644
> --- a/arch/riscv/crypto/chacha-riscv64-glue.c
> +++ b/arch/riscv/crypto/chacha-riscv64-glue.c
> @@ -16,45 +16,42 @@
> #include <linux/module.h>
> #include <linux/types.h>
> 
> -#define CHACHA_KEY_OFFSET 4
> -#define CHACHA_IV_OFFSET 12
> -
> /* chacha20 using zvkb vector crypto extension */
> asmlinkage void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len,
> 				    const u32 *key, const u32 *counter);
> 
> static int chacha20_encrypt(struct skcipher_request *req)
> {
> -	u32 state[CHACHA_STATE_WORDS];
> 	u8 block_buffer[CHACHA_BLOCK_SIZE];
> 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> 	const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
> 	struct skcipher_walk walk;
> 	unsigned int nbytes;
> 	unsigned int tail_bytes;
> +	u32 iv[4];
> 	int err;
> 
> -	chacha_init_generic(state, ctx->key, req->iv);
> +	iv[0] = get_unaligned_le32(req->iv);
> +	iv[1] = get_unaligned_le32(req->iv + 4);
> +	iv[2] = get_unaligned_le32(req->iv + 8);
> +	iv[3] = get_unaligned_le32(req->iv + 12);
> 
> 	err = skcipher_walk_virt(&walk, req, false);
> 	while (walk.nbytes) {
> -		nbytes = walk.nbytes & (~(CHACHA_BLOCK_SIZE - 1));
> +		nbytes = walk.nbytes & ~(CHACHA_BLOCK_SIZE - 1);
> 		tail_bytes = walk.nbytes & (CHACHA_BLOCK_SIZE - 1);
> 		kernel_vector_begin();
> 		if (nbytes) {
> 			ChaCha20_ctr32_zvkb(walk.dst.virt.addr,
> 					    walk.src.virt.addr, nbytes,
> -					    state + CHACHA_KEY_OFFSET,
> -					    state + CHACHA_IV_OFFSET);
> -			state[CHACHA_IV_OFFSET] += nbytes / CHACHA_BLOCK_SIZE;
> +					    ctx->key, iv);
> +			iv[0] += nbytes / CHACHA_BLOCK_SIZE;
> 		}
> 		if (walk.nbytes == walk.total && tail_bytes > 0) {
> 			memcpy(block_buffer, walk.src.virt.addr + nbytes,
> 			       tail_bytes);
> 			ChaCha20_ctr32_zvkb(block_buffer, block_buffer,
> -					    CHACHA_BLOCK_SIZE,
> -					    state + CHACHA_KEY_OFFSET,
> -					    state + CHACHA_IV_OFFSET);
> +					    CHACHA_BLOCK_SIZE, ctx->key, iv);
> 			memcpy(walk.dst.virt.addr + nbytes, block_buffer,
> 			       tail_bytes);
> 			tail_bytes = 0;

Fixed.
We now use only the IV instead of the full ChaCha state matrix.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
  2023-11-22  1:42   ` Eric Biggers
@ 2023-11-27  2:49     ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-27  2:49 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 22, 2023, at 09:42, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:39AM +0800, Jerry Shih wrote:
>> +struct riscv64_ghash_context {
>> +	be128 key;
>> +};
>> +
>> +struct riscv64_ghash_desc_ctx {
>> +	be128 shash;
>> +	u8 buffer[GHASH_BLOCK_SIZE];
>> +	u32 bytes;
>> +};
> 
> I recommend calling the first struct 'riscv64_ghash_tfm_ctx', and making the
> pointers to it be named 'tctx'.  That would more clearly distinguish it from the
> desc_ctx / dctx.

Fixed.

>> +
>> +typedef void (*ghash_func)(be128 *Xi, const be128 *H, const u8 *inp,
>> +			   size_t len);
>> +
>> +static inline void ghash_blocks(const struct riscv64_ghash_context *ctx,
>> +				struct riscv64_ghash_desc_ctx *dctx,
>> +				const u8 *src, size_t srclen, ghash_func func)
>> +	if (crypto_simd_usable()) {
>> +		kernel_vector_begin();
>> +		func(&dctx->shash, &ctx->key, src, srclen);
>> +		kernel_vector_end();
> 
> The indirection to ghash_func is unnecessary, since the only value is
> gcm_ghash_rv64i_zvkg.
> 
> This also means that ghash_update() should be folded into ghash_update_zvkg(),
> and ghash_final() into ghash_final_zvkg().

Fixed. `gcm_ghash_rv64i_zvkg()` is now called directly from `ghash_update_zvkg()` and
`ghash_final_zvkg()`.

>> +	} else {
>> +		while (srclen >= GHASH_BLOCK_SIZE) {
>> +			crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE);
>> +			gf128mul_lle(&dctx->shash, &ctx->key);
>> +			srclen -= GHASH_BLOCK_SIZE;
>> +			src += GHASH_BLOCK_SIZE;
>> +		}
>> +	}
> 
> The assembly code uses the equivalent of the following do-while loop instead:
> 
>        do {
>                srclen -= GHASH_BLOCK_SIZE;
>        } while (srclen);
> 
> I.e., it assumes the length here is nonzero and a multiple of 16, which it is.
> 
> To avoid confusion, I recommend making the C code use the same do-while loop.

Fixed.

>>       const struct riscv64_ghash_context *ctx =
>>              crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
> 
> crypto_tfm_ctx(crypto_shash_tfm(tfm)) should be crypto_shash_ctx(tfm)

Fixed.
But the original code does the same thing.

>> +static int ghash_final(struct shash_desc *desc, u8 *out, ghash_func func)
>> +{
>> +	const struct riscv64_ghash_context *ctx =
>> +		crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
>> +	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
>> +	int i;
>> +
>> +	if (dctx->bytes) {
>> +		for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
>> +			dctx->buffer[i] = 0;
>> +
>> +		ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func);
>> +		dctx->bytes = 0;
>> +	}
>> +
> 
> Setting dctx->bytes above is unnecessary.

Fixed.

>> +static int ghash_init(struct shash_desc *desc)
>> +{
>> +	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
>> +
>> +	*dctx = (struct riscv64_ghash_desc_ctx){};
>> +
>> +	return 0;
>> +}
>> +
>> +static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src,
>> +			     unsigned int srclen)
>> +{
>> +	return ghash_update(desc, src, srclen, gcm_ghash_rv64i_zvkg);
>> +}
>> +
>> +static int ghash_final_zvkg(struct shash_desc *desc, u8 *out)
>> +{
>> +	return ghash_final(desc, out, gcm_ghash_rv64i_zvkg);
>> +}
>> +
>> +static int ghash_setkey(struct crypto_shash *tfm, const u8 *key,
>> +			unsigned int keylen)
>> +{
>> +	struct riscv64_ghash_context *ctx =
>> +		crypto_tfm_ctx(crypto_shash_tfm(tfm));
>> +
>> +	if (keylen != GHASH_BLOCK_SIZE)
>> +		return -EINVAL;
>> +
>> +	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct shash_alg riscv64_ghash_alg_zvkg = {
>> +	.digestsize = GHASH_DIGEST_SIZE,
>> +	.init = ghash_init,
>> +	.update = ghash_update_zvkg,
>> +	.final = ghash_final_zvkg,
>> +	.setkey = ghash_setkey,
> 
> IMO it's helpful to order the shash functions as follows, both in their
> definitions and their fields in struct shash_alg:
> 
>    setkey
>    init
>    update
>    final
> 
> That matches the order in which they're called.

I have a different opinion. I order the initializers in the same order the struct members
are declared. That helps us check whether any function/member has been missed.

> - Eric


-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
  2023-11-22  1:32   ` Eric Biggers
@ 2023-11-27  2:50     ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-27  2:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 22, 2023, at 09:32, Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, Oct 26, 2023 at 02:36:41AM +0800, Jerry Shih wrote:
>> +static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
>> +				 unsigned int len)
>> +{
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * Make sure struct sha256_state begins directly with the SHA256
>> +	 * 256-bit internal state, as this is what the asm function expect.
>> +	 */
>> +	BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
> 
> There's a copy-paste error here; all the 256 above should be 512.

Fixed.

>> +static struct shash_alg sha512_alg[] = {
>> +	{
>> +		.digestsize = SHA512_DIGEST_SIZE,
>> +		.init = sha512_base_init,
>> +		.update = riscv64_sha512_update,
>> +		.final = riscv64_sha512_final,
>> +		.finup = riscv64_sha512_finup,
>> +		.descsize = sizeof(struct sha512_state),
>> +		.base.cra_name = "sha512",
>> +		.base.cra_driver_name = "sha512-riscv64-zvkb-zvknhb",
>> +		.base.cra_priority = 150,
>> +		.base.cra_blocksize = SHA512_BLOCK_SIZE,
>> +		.base.cra_module = THIS_MODULE,
>> +	},
>> +	{
>> +		.digestsize = SHA384_DIGEST_SIZE,
>> +		.init = sha384_base_init,
>> +		.update = riscv64_sha512_update,
>> +		.final = riscv64_sha512_final,
>> +		.finup = riscv64_sha512_finup,
>> +		.descsize = sizeof(struct sha512_state),
>> +		.base.cra_name = "sha384",
>> +		.base.cra_driver_name = "sha384-riscv64-zvkb-zvknhb",
>> +		.base.cra_priority = 150,
>> +		.base.cra_blocksize = SHA384_BLOCK_SIZE,
>> +		.base.cra_module = THIS_MODULE,
>> +	}
>> +};
> 
> *_algs instead of *_alg when there's more than one, please.
> I.e., sha512_alg => sha512_algs here.

Fixed.

> - Eric


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
  2023-11-22  1:14     ` Eric Biggers
@ 2023-11-27  2:52       ` Jerry Shih
  0 siblings, 0 replies; 50+ messages in thread
From: Jerry Shih @ 2023-11-27  2:52 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Paul Walmsley, palmer, Albert Ou, herbert, davem, andy.chiu,
	greentime.hu, conor.dooley, guoren, bjorn, heiko, ardb,
	phoebe.chen, hongrong.hsu, linux-riscv, linux-kernel,
	linux-crypto

On Nov 22, 2023, at 09:14, Eric Biggers <ebiggers@kernel.org> wrote:
> On Wed, Nov 01, 2023 at 10:16:39PM -0700, Eric Biggers wrote:
>>> +	  Architecture: riscv64 using:
>>> +	  - Zvbb vector extension (XTS)
>>> +	  - Zvkb vector crypto extension (CTR/XTS)
>>> +	  - Zvkg vector crypto extension (XTS)
>>> +	  - Zvkned vector crypto extension
>> 
>> Maybe list Zvkned first since it's the most important one in this context.
> 
> BTW, I'd like to extend this request to the implementation names
> (.cra_driver_name) and the names of the files as well.  I.e., instead of:
> 
>    aes-riscv64-zvkned
>    aes-riscv64-zvkb-zvkned
>    aes-riscv64-zvbb-zvkg-zvkned
>    sha256-riscv64-zvkb-zvknha_or_zvknhb
>    sha512-riscv64-zvkb-zvknhb
> 
> ... we'd have:
> 
>    aes-riscv64-zvkned
>    aes-riscv64-zvkned-zvkb
>    aes-riscv64-zvkned-zvbb-zvkg
>    sha256-riscv64-zvknha_or_zvknhb-zvkb
>    sha512-riscv64-zvknhb-zvkb
> 
> and similarly for the cra_driver_name fields.
> 
> I think that's much more logical.  Do you agree?
> 
> - Eric

Fixed.

We have the names like:
aes-riscv64-zvkned
aes-riscv64-zvkned-zvkb
aes-riscv64-zvkned-zvbb-zvkg
sha256-riscv64-zvknha_or_zvknhb-zvkb
sha512-riscv64-zvknhb-zvkb

-Jerry

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2023-11-27  2:52 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-25 18:36 [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions Jerry Shih
2023-10-25 18:36 ` [PATCH 01/12] RISC-V: add helper function to read the vector VLEN Jerry Shih
2023-10-25 18:36 ` [PATCH 02/12] RISC-V: hook new crypto subdir into build-system Jerry Shih
2023-10-25 18:36 ` [PATCH 03/12] RISC-V: crypto: add OpenSSL perl module for vector instructions Jerry Shih
2023-10-25 18:36 ` [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation Jerry Shih
2023-11-02  4:51   ` Eric Biggers
2023-11-20  2:53     ` Jerry Shih
2023-10-25 18:36 ` [PATCH 05/12] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk Jerry Shih
2023-10-25 18:36 ` [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Jerry Shih
2023-11-02  5:16   ` Eric Biggers
2023-11-07  8:53     ` Jerry Shih
2023-11-09  7:16       ` Eric Biggers
2023-11-10  3:58         ` Jerry Shih
2023-11-10  4:34           ` Eric Biggers
2023-11-10  4:58         ` Andy Chiu
2023-11-10  5:44           ` Eric Biggers
2023-11-11 11:08             ` Ard Biesheuvel
2023-11-11 17:52               ` Eric Biggers
2023-11-20  2:47     ` Jerry Shih
2023-11-20 19:28       ` Eric Biggers
2023-11-22  1:14     ` Eric Biggers
2023-11-27  2:52       ` Jerry Shih
2023-11-09  8:05   ` Eric Biggers
2023-11-10  4:06     ` Jerry Shih
2023-11-20  2:36       ` Jerry Shih
2023-10-25 18:36 ` [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Jerry Shih
2023-11-22  1:42   ` Eric Biggers
2023-11-27  2:49     ` Jerry Shih
2023-10-25 18:36 ` [PATCH 08/12] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations Jerry Shih
2023-10-25 18:36 ` [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations Jerry Shih
2023-11-22  1:32   ` Eric Biggers
2023-11-27  2:50     ` Jerry Shih
2023-10-25 18:36 ` [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation Jerry Shih
2023-11-02  5:58   ` Eric Biggers
2023-11-20  2:55     ` Jerry Shih
2023-10-25 18:36 ` [PATCH 11/12] RISC-V: crypto: add Zvksh accelerated SM3 implementation Jerry Shih
2023-10-25 18:36 ` [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Jerry Shih
2023-11-02  5:43   ` Eric Biggers
2023-11-20  2:55     ` Jerry Shih
2023-11-20 19:18       ` Eric Biggers
2023-11-21 10:55         ` Jerry Shih
2023-11-21 13:14           ` Conor Dooley
2023-11-21 23:37             ` Eric Biggers
2023-11-22  0:39               ` Conor Dooley
2023-11-22 17:37             ` Jerry Shih
2023-11-22 18:05               ` Palmer Dabbelt
2023-11-22 18:20               ` Conor Dooley
2023-11-22 19:05                 ` Jerry Shih
2023-11-22  1:29   ` Eric Biggers
2023-11-27  2:14     ` Jerry Shih

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).