* [PATCH 0/3] crypto: LEA block cipher implementation
@ 2023-04-28 11:00 Dongsoo Lee
  2023-04-28 11:00 ` [PATCH 1/3] " Dongsoo Lee
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Dongsoo Lee @ 2023-04-28 11:00 UTC (permalink / raw)
  To: linux-crypto
  Cc: Herbert Xu, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	David S. Miller, Dongsoo Lee, Dongsoo Lee


This submission contains a generic C implementation of the LEA block
cipher and an optimized implementation of the ECB, CBC, CTR, and XTS
modes of operation for the x86_64 environment.

The LEA algorithm is a symmetric key cipher that processes data blocks
of 128 bits and has three different key lengths, each with a different
number of rounds:

- LEA-128: 128-bit key, 24 rounds,
- LEA-192: 192-bit key, 28 rounds, and
- LEA-256: 256-bit key, 32 rounds.

The round function of LEA consists of 32-bit ARX (modular Addition,
bitwise Rotation, and bitwise XOR) operations. See [1, 2] for details.
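
For illustration, one encryption round updates three of the four 32-bit
state words X0..X3 using six round keys; the snippet below mirrors the
first round of lea_encrypt() in the generic implementation of this
series (rol32()/ror32() are the kernel's 32-bit rotate helpers, and
rk[] stands for the expanded round keys):

	X3 = ror32((X2 ^ rk[4]) + (X3 ^ rk[5]), 3);
	X2 = ror32((X1 ^ rk[2]) + (X2 ^ rk[3]), 5);
	X1 = rol32((X0 ^ rk[0]) + (X1 ^ rk[1]), 9);

The word left untouched changes from round to round, which is why the
generic implementation below is fully unrolled.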

LEA is a Korean national standard block cipher, described in KS X 3246,
and is also included in the international standard ISO/IEC 29192-2:2019
(Information security - Lightweight cryptography - Part 2: Block
ciphers).

It is one of the approved block ciphers for the current Korean
Cryptographic Module Validation Program (KCMVP).

The Korean e-government framework contains various cryptographic
applications, and a KCMVP-validated cryptographic module should be used
according to government requirements. The ARIA block cipher, which is
already included in the Linux kernel, has been widely used as a
symmetric key cipher for this purpose. However, the adoption of LEA is
increasing rapidly for new applications.

By adding LEA to the Linux kernel, dedicated device drivers that
require LEA encryption can be supported without an additional crypto
implementation. An immediately applicable use case is disk encryption
using cryptsetup.

The submitted implementation includes a generic C implementation that
uses 32-bit ARX operations, and an optimized implementation for the
x86_64 environment.

A reference implementation matching the submitted generic C
implementation is distributed by the Korea Internet & Security Agency
(KISA) and can be found at [3].

For the x86_64 environment, we use SSE2/MOVBE/AVX2 instructions. Since
LEA uses four 32-bit unsigned integers for a 128-bit block, the SSE2 and
AVX2 implementations encrypt four and eight blocks at a time,
respectively.
Our submission provides an optimized 4/8-block implementation of the
ECB, CBC decryption, CTR, and XTS modes of operation on x86_64 CPUs
supporting AVX2. The MOVBE instruction is used to optimize CTR mode.
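
To illustrate the idea behind the 4-way SSE2 path (the submitted code
is hand-written assembly in lea_avx2_x86_64-asm.S, not intrinsics; the
helper below is only our sketch and its name is not taken from the
patch), each 32-bit state word of four independent blocks can be packed
into one 128-bit register so that a single ARX step advances all four
blocks at once:

	#include <emmintrin.h>	/* SSE2 intrinsics */

	/*
	 * One ARX step applied to four blocks in parallel, equivalent
	 * to X1 = rol32((X0 ^ rk0) + (X1 ^ rk1), 9) done lane-wise.
	 * rk0/rk1 hold one pair of round keys broadcast to all lanes.
	 */
	static inline __m128i lea_step4_rol9(__m128i x0, __m128i x1,
					     __m128i rk0, __m128i rk1)
	{
		__m128i t = _mm_add_epi32(_mm_xor_si128(x0, rk0),
					  _mm_xor_si128(x1, rk1));

		/* rotate each 32-bit lane left by 9 */
		return _mm_or_si128(_mm_slli_epi32(t, 9),
				    _mm_srli_epi32(t, 23));
	}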

The implementation has been tested with the tcrypt.ko kernel module and
has passed the self-tests using test vectors for the KCMVP [4]. The
patches were also tested with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS
enabled.

- [1] https://en.wikipedia.org/wiki/LEA_(cipher)
- [2] https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
- [3] https://seed.kisa.or.kr/kisa/Board/20/detailView.do
- [4] https://seed.kisa.or.kr/kisa/kcmvp/EgovVerification.do

Dongsoo Lee (3):
      crypto: LEA block cipher implementation
      crypto: add LEA testmgr tests
      crypto: LEA block cipher AVX2 optimization

 arch/x86/crypto/Kconfig               |   22 +
 arch/x86/crypto/Makefile              |    3 +
 arch/x86/crypto/lea_avx2_glue.c       | 1112 +++++++++++++++++++++++++
 arch/x86/crypto/lea_avx2_x86_64-asm.S |  778 ++++++++++++++++++
 crypto/Kconfig                        |   12 +
 crypto/Makefile                       |    1 +
 crypto/lea_generic.c                  |  915 +++++++++++++++++++++
 crypto/tcrypt.c                       |   73 ++
 crypto/testmgr.c                      |   32 +
 crypto/testmgr.h                      | 1211 ++++++++++++++++++++++++++++
 include/crypto/lea.h                  |   39 +
 11 files changed, 4198 insertions(+)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] crypto: LEA block cipher implementation
  2023-04-28 11:00 [PATCH 0/3] crypto: LEA block cipher implementation Dongsoo Lee
@ 2023-04-28 11:00 ` Dongsoo Lee
  2023-04-28 23:29   ` Eric Biggers
  2023-04-28 11:00 ` [PATCH 2/3] crypto: add LEA testmgr tests Dongsoo Lee
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Dongsoo Lee @ 2023-04-28 11:00 UTC (permalink / raw)
  To: linux-crypto
  Cc: Herbert Xu, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	David S. Miller, Dongsoo Lee, Dongsoo Lee

LEA is a Korean national standard block cipher, described in KS X 3246,
and is also included in the international standard ISO/IEC 29192-2:2019
(Information security - Lightweight cryptography - Part 2: Block
ciphers).

The LEA algorithm is a symmetric key cipher that processes data blocks
of 128 bits and has three different key lengths, each with a different
number of rounds:

- LEA-128: 128-bit key, 24 rounds,
- LEA-192: 192-bit key, 28 rounds, and
- LEA-256: 256-bit key, 32 rounds.

The round function of LEA consists of 32-bit ARX (modular Addition,
bitwise Rotation, and bitwise XOR) operations.
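
The round count follows directly from the key length in bytes, as
encoded by the LEA_ROUND_CNT() helper added in include/crypto/lea.h:

	/* rounds = (key_len >> 1) + 16, with key_len in bytes */
	LEA_ROUND_CNT(16) == 24		/* LEA-128 */
	LEA_ROUND_CNT(24) == 28		/* LEA-192 */
	LEA_ROUND_CNT(32) == 32		/* LEA-256 */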

A reference implementation matching this generic C implementation is
distributed by the Korea Internet & Security Agency (KISA):

- https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
- https://seed.kisa.or.kr/kisa/Board/20/detailView.do

Signed-off-by: Dongsoo Lee <letrhee@nsr.re.kr>
---
 crypto/Kconfig       |  12 +
 crypto/Makefile      |   1 +
 crypto/lea_generic.c | 915 +++++++++++++++++++++++++++++++++++++++++++
 include/crypto/lea.h |  39 ++
 4 files changed, 967 insertions(+)
 create mode 100644 crypto/lea_generic.c
 create mode 100644 include/crypto/lea.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 9c86f7045157..5c56f6083cbd 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -485,6 +485,18 @@ config CRYPTO_KHAZAD
 	  See https://web.archive.org/web/20171011071731/http://www.larc.usp.br/~pbarreto/KhazadPage.html
 	  for further information.
 
+config CRYPTO_LEA
+	tristate "LEA"
+	select CRYPTO_ALGAPI
+	help
+	  LEA cipher algorithm (KS X 3246, ISO/IEC 29192-2:2019)
+
+	  LEA is one of the standard cryptographic algorithms of the
+	  Republic of Korea. Its 128-bit block is made of four 32-bit words.
+
+	  See:
+	  https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
+
 config CRYPTO_SEED
 	tristate "SEED"
 	depends on CRYPTO_USER_API_ENABLE_OBSOLETE
diff --git a/crypto/Makefile b/crypto/Makefile
index d0126c915834..bf52af4dfdf2 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -149,6 +149,7 @@ obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_SEED) += seed.o
 obj-$(CONFIG_CRYPTO_ARIA) += aria_generic.o
+obj-$(CONFIG_CRYPTO_LEA) += lea_generic.o
 obj-$(CONFIG_CRYPTO_CHACHA20) += chacha_generic.o
 obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
 obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
diff --git a/crypto/lea_generic.c b/crypto/lea_generic.c
new file mode 100644
index 000000000000..919c23e7bcc5
--- /dev/null
+++ b/crypto/lea_generic.c
@@ -0,0 +1,915 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Cryptographic API.
+ *
+ * LEA Cipher Algorithm
+ *
+ * LEA is a 128-bit block cipher developed by South Korea in 2013.
+ *
+ * LEA is the national standard of Republic of Korea (KS X 3246) and included in
+ * the ISO/IEC 29192-2:2019 standard (Information security - Lightweight
+ * cryptography - Part 2: Block ciphers).
+ *
+ * Copyright (c) 2023 National Security Research.
+ * Author: Dongsoo Lee <letrhee@nsr.re.kr>
+ */
+
+
+#include <asm/unaligned.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <crypto/algapi.h>
+#include <crypto/lea.h>
+
+static const u32 lea_delta[8][36] ____cacheline_aligned = {
+	{
+		0xc3efe9db, 0x87dfd3b7, 0x0fbfa76f, 0x1f7f4ede, 0x3efe9dbc, 0x7dfd3b78,
+		0xfbfa76f0, 0xf7f4ede1, 0xefe9dbc3, 0xdfd3b787, 0xbfa76f0f, 0x7f4ede1f,
+		0xfe9dbc3e, 0xfd3b787d, 0xfa76f0fb, 0xf4ede1f7, 0xe9dbc3ef, 0xd3b787df,
+		0xa76f0fbf, 0x4ede1f7f, 0x9dbc3efe, 0x3b787dfd, 0x76f0fbfa, 0xede1f7f4,
+		0xdbc3efe9, 0xb787dfd3, 0x6f0fbfa7, 0xde1f7f4e, 0xbc3efe9d, 0x787dfd3b,
+		0xf0fbfa76, 0xe1f7f4ed, 0xc3efe9db, 0x87dfd3b7, 0x0fbfa76f, 0x1f7f4ede
+	},
+	{
+		0x44626b02, 0x88c4d604, 0x1189ac09, 0x23135812, 0x4626b024, 0x8c4d6048,
+		0x189ac091, 0x31358122, 0x626b0244, 0xc4d60488, 0x89ac0911, 0x13581223,
+		0x26b02446, 0x4d60488c, 0x9ac09118, 0x35812231, 0x6b024462, 0xd60488c4,
+		0xac091189, 0x58122313, 0xb0244626, 0x60488c4d, 0xc091189a, 0x81223135,
+		0x0244626b, 0x0488c4d6, 0x091189ac, 0x12231358, 0x244626b0, 0x488c4d60,
+		0x91189ac0, 0x22313581, 0x44626b02, 0x88c4d604, 0x1189ac09, 0x23135812
+	},
+	{
+		0x79e27c8a, 0xf3c4f914, 0xe789f229, 0xcf13e453, 0x9e27c8a7, 0x3c4f914f,
+		0x789f229e, 0xf13e453c, 0xe27c8a79, 0xc4f914f3, 0x89f229e7, 0x13e453cf,
+		0x27c8a79e, 0x4f914f3c, 0x9f229e78, 0x3e453cf1, 0x7c8a79e2, 0xf914f3c4,
+		0xf229e789, 0xe453cf13, 0xc8a79e27, 0x914f3c4f, 0x229e789f, 0x453cf13e,
+		0x8a79e27c, 0x14f3c4f9, 0x29e789f2, 0x53cf13e4, 0xa79e27c8, 0x4f3c4f91,
+		0x9e789f22, 0x3cf13e45, 0x79e27c8a, 0xf3c4f914, 0xe789f229, 0xcf13e453
+	},
+	{
+		0x78df30ec, 0xf1be61d8, 0xe37cc3b1, 0xc6f98763, 0x8df30ec7, 0x1be61d8f,
+		0x37cc3b1e, 0x6f98763c, 0xdf30ec78, 0xbe61d8f1, 0x7cc3b1e3, 0xf98763c6,
+		0xf30ec78d, 0xe61d8f1b, 0xcc3b1e37, 0x98763c6f, 0x30ec78df, 0x61d8f1be,
+		0xc3b1e37c, 0x8763c6f9, 0x0ec78df3, 0x1d8f1be6, 0x3b1e37cc, 0x763c6f98,
+		0xec78df30, 0xd8f1be61, 0xb1e37cc3, 0x63c6f987, 0xc78df30e, 0x8f1be61d,
+		0x1e37cc3b, 0x3c6f9876, 0x78df30ec, 0xf1be61d8, 0xe37cc3b1, 0xc6f98763
+	},
+	{
+		0x715ea49e, 0xe2bd493c, 0xc57a9279, 0x8af524f3, 0x15ea49e7, 0x2bd493ce,
+		0x57a9279c, 0xaf524f38, 0x5ea49e71, 0xbd493ce2, 0x7a9279c5, 0xf524f38a,
+		0xea49e715, 0xd493ce2b, 0xa9279c57, 0x524f38af, 0xa49e715e, 0x493ce2bd,
+		0x9279c57a, 0x24f38af5, 0x49e715ea, 0x93ce2bd4, 0x279c57a9, 0x4f38af52,
+		0x9e715ea4, 0x3ce2bd49, 0x79c57a92, 0xf38af524, 0xe715ea49, 0xce2bd493,
+		0x9c57a927, 0x38af524f, 0x715ea49e, 0xe2bd493c, 0xc57a9279, 0x8af524f3
+	},
+	{
+		0xc785da0a, 0x8f0bb415, 0x1e17682b, 0x3c2ed056, 0x785da0ac, 0xf0bb4158,
+		0xe17682b1, 0xc2ed0563, 0x85da0ac7, 0x0bb4158f, 0x17682b1e, 0x2ed0563c,
+		0x5da0ac78, 0xbb4158f0, 0x7682b1e1, 0xed0563c2, 0xda0ac785, 0xb4158f0b,
+		0x682b1e17, 0xd0563c2e, 0xa0ac785d, 0x4158f0bb, 0x82b1e176, 0x0563c2ed,
+		0x0ac785da, 0x158f0bb4, 0x2b1e1768, 0x563c2ed0, 0xac785da0, 0x58f0bb41,
+		0xb1e17682, 0x63c2ed05, 0xc785da0a, 0x8f0bb415, 0x1e17682b, 0x3c2ed056
+	},
+	{
+		0xe04ef22a, 0xc09de455, 0x813bc8ab, 0x02779157, 0x04ef22ae, 0x09de455c,
+		0x13bc8ab8, 0x27791570, 0x4ef22ae0, 0x9de455c0, 0x3bc8ab81, 0x77915702,
+		0xef22ae04, 0xde455c09, 0xbc8ab813, 0x79157027, 0xf22ae04e, 0xe455c09d,
+		0xc8ab813b, 0x91570277, 0x22ae04ef, 0x455c09de, 0x8ab813bc, 0x15702779,
+		0x2ae04ef2, 0x55c09de4, 0xab813bc8, 0x57027791, 0xae04ef22, 0x5c09de45,
+		0xb813bc8a, 0x70277915, 0xe04ef22a, 0xc09de455, 0x813bc8ab, 0x02779157
+	},
+	{
+		0xe5c40957, 0xcb8812af, 0x9710255f, 0x2e204abf, 0x5c40957e, 0xb8812afc,
+		0x710255f9, 0xe204abf2, 0xc40957e5, 0x8812afcb, 0x10255f97, 0x204abf2e,
+		0x40957e5c, 0x812afcb8, 0x0255f971, 0x04abf2e2, 0x0957e5c4, 0x12afcb88,
+		0x255f9710, 0x4abf2e20, 0x957e5c40, 0x2afcb881, 0x55f97102, 0xabf2e204,
+		0x57e5c409, 0xafcb8812, 0x5f971025, 0xbf2e204a, 0x7e5c4095, 0xfcb8812a,
+		0xf9710255, 0xf2e204ab, 0xe5c40957, 0xcb8812af, 0x9710255f, 0x2e204abf
+	}
+};
+
+void lea_encrypt(const void *ctx, u8 *out, const u8 *in)
+{
+	const struct crypto_lea_ctx *key = ctx;
+	u32 X0 = get_unaligned_le32(&in[0]);
+	u32 X1 = get_unaligned_le32(&in[4]);
+	u32 X2 = get_unaligned_le32(&in[8]);
+	u32 X3 = get_unaligned_le32(&in[12]);
+
+	X3 = ror32((X2 ^ key->rk[4]) + (X3 ^ key->rk[5]), 3);
+	X2 = ror32((X1 ^ key->rk[2]) + (X2 ^ key->rk[3]), 5);
+	X1 = rol32((X0 ^ key->rk[0]) + (X1 ^ key->rk[1]), 9);
+	X0 = ror32((X3 ^ key->rk[10]) + (X0 ^ key->rk[11]), 3);
+	X3 = ror32((X2 ^ key->rk[8]) + (X3 ^ key->rk[9]), 5);
+	X2 = rol32((X1 ^ key->rk[6]) + (X2 ^ key->rk[7]), 9);
+	X1 = ror32((X0 ^ key->rk[16]) + (X1 ^ key->rk[17]), 3);
+	X0 = ror32((X3 ^ key->rk[14]) + (X0 ^ key->rk[15]), 5);
+	X3 = rol32((X2 ^ key->rk[12]) + (X3 ^ key->rk[13]), 9);
+	X2 = ror32((X1 ^ key->rk[22]) + (X2 ^ key->rk[23]), 3);
+	X1 = ror32((X0 ^ key->rk[20]) + (X1 ^ key->rk[21]), 5);
+	X0 = rol32((X3 ^ key->rk[18]) + (X0 ^ key->rk[19]), 9);
+
+	X3 = ror32((X2 ^ key->rk[28]) + (X3 ^ key->rk[29]), 3);
+	X2 = ror32((X1 ^ key->rk[26]) + (X2 ^ key->rk[27]), 5);
+	X1 = rol32((X0 ^ key->rk[24]) + (X1 ^ key->rk[25]), 9);
+	X0 = ror32((X3 ^ key->rk[34]) + (X0 ^ key->rk[35]), 3);
+	X3 = ror32((X2 ^ key->rk[32]) + (X3 ^ key->rk[33]), 5);
+	X2 = rol32((X1 ^ key->rk[30]) + (X2 ^ key->rk[31]), 9);
+	X1 = ror32((X0 ^ key->rk[40]) + (X1 ^ key->rk[41]), 3);
+	X0 = ror32((X3 ^ key->rk[38]) + (X0 ^ key->rk[39]), 5);
+	X3 = rol32((X2 ^ key->rk[36]) + (X3 ^ key->rk[37]), 9);
+	X2 = ror32((X1 ^ key->rk[46]) + (X2 ^ key->rk[47]), 3);
+	X1 = ror32((X0 ^ key->rk[44]) + (X1 ^ key->rk[45]), 5);
+	X0 = rol32((X3 ^ key->rk[42]) + (X0 ^ key->rk[43]), 9);
+
+	X3 = ror32((X2 ^ key->rk[52]) + (X3 ^ key->rk[53]), 3);
+	X2 = ror32((X1 ^ key->rk[50]) + (X2 ^ key->rk[51]), 5);
+	X1 = rol32((X0 ^ key->rk[48]) + (X1 ^ key->rk[49]), 9);
+	X0 = ror32((X3 ^ key->rk[58]) + (X0 ^ key->rk[59]), 3);
+	X3 = ror32((X2 ^ key->rk[56]) + (X3 ^ key->rk[57]), 5);
+	X2 = rol32((X1 ^ key->rk[54]) + (X2 ^ key->rk[55]), 9);
+	X1 = ror32((X0 ^ key->rk[64]) + (X1 ^ key->rk[65]), 3);
+	X0 = ror32((X3 ^ key->rk[62]) + (X0 ^ key->rk[63]), 5);
+	X3 = rol32((X2 ^ key->rk[60]) + (X3 ^ key->rk[61]), 9);
+	X2 = ror32((X1 ^ key->rk[70]) + (X2 ^ key->rk[71]), 3);
+	X1 = ror32((X0 ^ key->rk[68]) + (X1 ^ key->rk[69]), 5);
+	X0 = rol32((X3 ^ key->rk[66]) + (X0 ^ key->rk[67]), 9);
+
+	X3 = ror32((X2 ^ key->rk[76]) + (X3 ^ key->rk[77]), 3);
+	X2 = ror32((X1 ^ key->rk[74]) + (X2 ^ key->rk[75]), 5);
+	X1 = rol32((X0 ^ key->rk[72]) + (X1 ^ key->rk[73]), 9);
+	X0 = ror32((X3 ^ key->rk[82]) + (X0 ^ key->rk[83]), 3);
+	X3 = ror32((X2 ^ key->rk[80]) + (X3 ^ key->rk[81]), 5);
+	X2 = rol32((X1 ^ key->rk[78]) + (X2 ^ key->rk[79]), 9);
+	X1 = ror32((X0 ^ key->rk[88]) + (X1 ^ key->rk[89]), 3);
+	X0 = ror32((X3 ^ key->rk[86]) + (X0 ^ key->rk[87]), 5);
+	X3 = rol32((X2 ^ key->rk[84]) + (X3 ^ key->rk[85]), 9);
+	X2 = ror32((X1 ^ key->rk[94]) + (X2 ^ key->rk[95]), 3);
+	X1 = ror32((X0 ^ key->rk[92]) + (X1 ^ key->rk[93]), 5);
+	X0 = rol32((X3 ^ key->rk[90]) + (X0 ^ key->rk[91]), 9);
+
+	X3 = ror32((X2 ^ key->rk[100]) + (X3 ^ key->rk[101]), 3);
+	X2 = ror32((X1 ^ key->rk[98]) + (X2 ^ key->rk[99]), 5);
+	X1 = rol32((X0 ^ key->rk[96]) + (X1 ^ key->rk[97]), 9);
+	X0 = ror32((X3 ^ key->rk[106]) + (X0 ^ key->rk[107]), 3);
+	X3 = ror32((X2 ^ key->rk[104]) + (X3 ^ key->rk[105]), 5);
+	X2 = rol32((X1 ^ key->rk[102]) + (X2 ^ key->rk[103]), 9);
+	X1 = ror32((X0 ^ key->rk[112]) + (X1 ^ key->rk[113]), 3);
+	X0 = ror32((X3 ^ key->rk[110]) + (X0 ^ key->rk[111]), 5);
+	X3 = rol32((X2 ^ key->rk[108]) + (X3 ^ key->rk[109]), 9);
+	X2 = ror32((X1 ^ key->rk[118]) + (X2 ^ key->rk[119]), 3);
+	X1 = ror32((X0 ^ key->rk[116]) + (X1 ^ key->rk[117]), 5);
+	X0 = rol32((X3 ^ key->rk[114]) + (X0 ^ key->rk[115]), 9);
+
+	X3 = ror32((X2 ^ key->rk[124]) + (X3 ^ key->rk[125]), 3);
+	X2 = ror32((X1 ^ key->rk[122]) + (X2 ^ key->rk[123]), 5);
+	X1 = rol32((X0 ^ key->rk[120]) + (X1 ^ key->rk[121]), 9);
+	X0 = ror32((X3 ^ key->rk[130]) + (X0 ^ key->rk[131]), 3);
+	X3 = ror32((X2 ^ key->rk[128]) + (X3 ^ key->rk[129]), 5);
+	X2 = rol32((X1 ^ key->rk[126]) + (X2 ^ key->rk[127]), 9);
+	X1 = ror32((X0 ^ key->rk[136]) + (X1 ^ key->rk[137]), 3);
+	X0 = ror32((X3 ^ key->rk[134]) + (X0 ^ key->rk[135]), 5);
+	X3 = rol32((X2 ^ key->rk[132]) + (X3 ^ key->rk[133]), 9);
+	X2 = ror32((X1 ^ key->rk[142]) + (X2 ^ key->rk[143]), 3);
+	X1 = ror32((X0 ^ key->rk[140]) + (X1 ^ key->rk[141]), 5);
+	X0 = rol32((X3 ^ key->rk[138]) + (X0 ^ key->rk[139]), 9);
+
+	if (key->round > 24) {
+		X3 = ror32((X2 ^ key->rk[148]) + (X3 ^ key->rk[149]), 3);
+		X2 = ror32((X1 ^ key->rk[146]) + (X2 ^ key->rk[147]), 5);
+		X1 = rol32((X0 ^ key->rk[144]) + (X1 ^ key->rk[145]), 9);
+		X0 = ror32((X3 ^ key->rk[154]) + (X0 ^ key->rk[155]), 3);
+		X3 = ror32((X2 ^ key->rk[152]) + (X3 ^ key->rk[153]), 5);
+		X2 = rol32((X1 ^ key->rk[150]) + (X2 ^ key->rk[151]), 9);
+		X1 = ror32((X0 ^ key->rk[160]) + (X1 ^ key->rk[161]), 3);
+		X0 = ror32((X3 ^ key->rk[158]) + (X0 ^ key->rk[159]), 5);
+		X3 = rol32((X2 ^ key->rk[156]) + (X3 ^ key->rk[157]), 9);
+		X2 = ror32((X1 ^ key->rk[166]) + (X2 ^ key->rk[167]), 3);
+		X1 = ror32((X0 ^ key->rk[164]) + (X1 ^ key->rk[165]), 5);
+		X0 = rol32((X3 ^ key->rk[162]) + (X0 ^ key->rk[163]), 9);
+	}
+
+	if (key->round > 28) {
+		X3 = ror32((X2 ^ key->rk[172]) + (X3 ^ key->rk[173]), 3);
+		X2 = ror32((X1 ^ key->rk[170]) + (X2 ^ key->rk[171]), 5);
+		X1 = rol32((X0 ^ key->rk[168]) + (X1 ^ key->rk[169]), 9);
+		X0 = ror32((X3 ^ key->rk[178]) + (X0 ^ key->rk[179]), 3);
+		X3 = ror32((X2 ^ key->rk[176]) + (X3 ^ key->rk[177]), 5);
+		X2 = rol32((X1 ^ key->rk[174]) + (X2 ^ key->rk[175]), 9);
+		X1 = ror32((X0 ^ key->rk[184]) + (X1 ^ key->rk[185]), 3);
+		X0 = ror32((X3 ^ key->rk[182]) + (X0 ^ key->rk[183]), 5);
+		X3 = rol32((X2 ^ key->rk[180]) + (X3 ^ key->rk[181]), 9);
+		X2 = ror32((X1 ^ key->rk[190]) + (X2 ^ key->rk[191]), 3);
+		X1 = ror32((X0 ^ key->rk[188]) + (X1 ^ key->rk[189]), 5);
+		X0 = rol32((X3 ^ key->rk[186]) + (X0 ^ key->rk[187]), 9);
+	}
+
+	put_unaligned_le32(X0, &out[0]);
+	put_unaligned_le32(X1, &out[4]);
+	put_unaligned_le32(X2, &out[8]);
+	put_unaligned_le32(X3, &out[12]);
+}
+EXPORT_SYMBOL_GPL(lea_encrypt);
+
+void lea_decrypt(const void *ctx, u8 *out, const u8 *in)
+{
+	const struct crypto_lea_ctx *key = ctx;
+
+	u32 X0 = get_unaligned_le32(&in[0]);
+	u32 X1 = get_unaligned_le32(&in[4]);
+	u32 X2 = get_unaligned_le32(&in[8]);
+	u32 X3 = get_unaligned_le32(&in[12]);
+
+	if (key->round > 28) {
+		X0 = (ror32(X0, 9) - (X3 ^ key->rk[186])) ^ key->rk[187];
+		X1 = (rol32(X1, 5) - (X0 ^ key->rk[188])) ^ key->rk[189];
+		X2 = (rol32(X2, 3) - (X1 ^ key->rk[190])) ^ key->rk[191];
+		X3 = (ror32(X3, 9) - (X2 ^ key->rk[180])) ^ key->rk[181];
+		X0 = (rol32(X0, 5) - (X3 ^ key->rk[182])) ^ key->rk[183];
+		X1 = (rol32(X1, 3) - (X0 ^ key->rk[184])) ^ key->rk[185];
+		X2 = (ror32(X2, 9) - (X1 ^ key->rk[174])) ^ key->rk[175];
+		X3 = (rol32(X3, 5) - (X2 ^ key->rk[176])) ^ key->rk[177];
+		X0 = (rol32(X0, 3) - (X3 ^ key->rk[178])) ^ key->rk[179];
+		X1 = (ror32(X1, 9) - (X0 ^ key->rk[168])) ^ key->rk[169];
+		X2 = (rol32(X2, 5) - (X1 ^ key->rk[170])) ^ key->rk[171];
+		X3 = (rol32(X3, 3) - (X2 ^ key->rk[172])) ^ key->rk[173];
+	}
+
+	if (key->round > 24) {
+		X0 = (ror32(X0, 9) - (X3 ^ key->rk[162])) ^ key->rk[163];
+		X1 = (rol32(X1, 5) - (X0 ^ key->rk[164])) ^ key->rk[165];
+		X2 = (rol32(X2, 3) - (X1 ^ key->rk[166])) ^ key->rk[167];
+		X3 = (ror32(X3, 9) - (X2 ^ key->rk[156])) ^ key->rk[157];
+		X0 = (rol32(X0, 5) - (X3 ^ key->rk[158])) ^ key->rk[159];
+		X1 = (rol32(X1, 3) - (X0 ^ key->rk[160])) ^ key->rk[161];
+		X2 = (ror32(X2, 9) - (X1 ^ key->rk[150])) ^ key->rk[151];
+		X3 = (rol32(X3, 5) - (X2 ^ key->rk[152])) ^ key->rk[153];
+		X0 = (rol32(X0, 3) - (X3 ^ key->rk[154])) ^ key->rk[155];
+		X1 = (ror32(X1, 9) - (X0 ^ key->rk[144])) ^ key->rk[145];
+		X2 = (rol32(X2, 5) - (X1 ^ key->rk[146])) ^ key->rk[147];
+		X3 = (rol32(X3, 3) - (X2 ^ key->rk[148])) ^ key->rk[149];
+	}
+
+	X0 = (ror32(X0, 9) - (X3 ^ key->rk[138])) ^ key->rk[139];
+	X1 = (rol32(X1, 5) - (X0 ^ key->rk[140])) ^ key->rk[141];
+	X2 = (rol32(X2, 3) - (X1 ^ key->rk[142])) ^ key->rk[143];
+	X3 = (ror32(X3, 9) - (X2 ^ key->rk[132])) ^ key->rk[133];
+	X0 = (rol32(X0, 5) - (X3 ^ key->rk[134])) ^ key->rk[135];
+	X1 = (rol32(X1, 3) - (X0 ^ key->rk[136])) ^ key->rk[137];
+	X2 = (ror32(X2, 9) - (X1 ^ key->rk[126])) ^ key->rk[127];
+	X3 = (rol32(X3, 5) - (X2 ^ key->rk[128])) ^ key->rk[129];
+	X0 = (rol32(X0, 3) - (X3 ^ key->rk[130])) ^ key->rk[131];
+	X1 = (ror32(X1, 9) - (X0 ^ key->rk[120])) ^ key->rk[121];
+	X2 = (rol32(X2, 5) - (X1 ^ key->rk[122])) ^ key->rk[123];
+	X3 = (rol32(X3, 3) - (X2 ^ key->rk[124])) ^ key->rk[125];
+
+	X0 = (ror32(X0, 9) - (X3 ^ key->rk[114])) ^ key->rk[115];
+	X1 = (rol32(X1, 5) - (X0 ^ key->rk[116])) ^ key->rk[117];
+	X2 = (rol32(X2, 3) - (X1 ^ key->rk[118])) ^ key->rk[119];
+	X3 = (ror32(X3, 9) - (X2 ^ key->rk[108])) ^ key->rk[109];
+	X0 = (rol32(X0, 5) - (X3 ^ key->rk[110])) ^ key->rk[111];
+	X1 = (rol32(X1, 3) - (X0 ^ key->rk[112])) ^ key->rk[113];
+	X2 = (ror32(X2, 9) - (X1 ^ key->rk[102])) ^ key->rk[103];
+	X3 = (rol32(X3, 5) - (X2 ^ key->rk[104])) ^ key->rk[105];
+	X0 = (rol32(X0, 3) - (X3 ^ key->rk[106])) ^ key->rk[107];
+	X1 = (ror32(X1, 9) - (X0 ^ key->rk[96])) ^ key->rk[97];
+	X2 = (rol32(X2, 5) - (X1 ^ key->rk[98])) ^ key->rk[99];
+	X3 = (rol32(X3, 3) - (X2 ^ key->rk[100])) ^ key->rk[101];
+
+	X0 = (ror32(X0, 9) - (X3 ^ key->rk[90])) ^ key->rk[91];
+	X1 = (rol32(X1, 5) - (X0 ^ key->rk[92])) ^ key->rk[93];
+	X2 = (rol32(X2, 3) - (X1 ^ key->rk[94])) ^ key->rk[95];
+	X3 = (ror32(X3, 9) - (X2 ^ key->rk[84])) ^ key->rk[85];
+	X0 = (rol32(X0, 5) - (X3 ^ key->rk[86])) ^ key->rk[87];
+	X1 = (rol32(X1, 3) - (X0 ^ key->rk[88])) ^ key->rk[89];
+	X2 = (ror32(X2, 9) - (X1 ^ key->rk[78])) ^ key->rk[79];
+	X3 = (rol32(X3, 5) - (X2 ^ key->rk[80])) ^ key->rk[81];
+	X0 = (rol32(X0, 3) - (X3 ^ key->rk[82])) ^ key->rk[83];
+	X1 = (ror32(X1, 9) - (X0 ^ key->rk[72])) ^ key->rk[73];
+	X2 = (rol32(X2, 5) - (X1 ^ key->rk[74])) ^ key->rk[75];
+	X3 = (rol32(X3, 3) - (X2 ^ key->rk[76])) ^ key->rk[77];
+
+	X0 = (ror32(X0, 9) - (X3 ^ key->rk[66])) ^ key->rk[67];
+	X1 = (rol32(X1, 5) - (X0 ^ key->rk[68])) ^ key->rk[69];
+	X2 = (rol32(X2, 3) - (X1 ^ key->rk[70])) ^ key->rk[71];
+	X3 = (ror32(X3, 9) - (X2 ^ key->rk[60])) ^ key->rk[61];
+	X0 = (rol32(X0, 5) - (X3 ^ key->rk[62])) ^ key->rk[63];
+	X1 = (rol32(X1, 3) - (X0 ^ key->rk[64])) ^ key->rk[65];
+	X2 = (ror32(X2, 9) - (X1 ^ key->rk[54])) ^ key->rk[55];
+	X3 = (rol32(X3, 5) - (X2 ^ key->rk[56])) ^ key->rk[57];
+	X0 = (rol32(X0, 3) - (X3 ^ key->rk[58])) ^ key->rk[59];
+	X1 = (ror32(X1, 9) - (X0 ^ key->rk[48])) ^ key->rk[49];
+	X2 = (rol32(X2, 5) - (X1 ^ key->rk[50])) ^ key->rk[51];
+	X3 = (rol32(X3, 3) - (X2 ^ key->rk[52])) ^ key->rk[53];
+
+	X0 = (ror32(X0, 9) - (X3 ^ key->rk[42])) ^ key->rk[43];
+	X1 = (rol32(X1, 5) - (X0 ^ key->rk[44])) ^ key->rk[45];
+	X2 = (rol32(X2, 3) - (X1 ^ key->rk[46])) ^ key->rk[47];
+	X3 = (ror32(X3, 9) - (X2 ^ key->rk[36])) ^ key->rk[37];
+	X0 = (rol32(X0, 5) - (X3 ^ key->rk[38])) ^ key->rk[39];
+	X1 = (rol32(X1, 3) - (X0 ^ key->rk[40])) ^ key->rk[41];
+	X2 = (ror32(X2, 9) - (X1 ^ key->rk[30])) ^ key->rk[31];
+	X3 = (rol32(X3, 5) - (X2 ^ key->rk[32])) ^ key->rk[33];
+	X0 = (rol32(X0, 3) - (X3 ^ key->rk[34])) ^ key->rk[35];
+	X1 = (ror32(X1, 9) - (X0 ^ key->rk[24])) ^ key->rk[25];
+	X2 = (rol32(X2, 5) - (X1 ^ key->rk[26])) ^ key->rk[27];
+	X3 = (rol32(X3, 3) - (X2 ^ key->rk[28])) ^ key->rk[29];
+
+	X0 = (ror32(X0, 9) - (X3 ^ key->rk[18])) ^ key->rk[19];
+	X1 = (rol32(X1, 5) - (X0 ^ key->rk[20])) ^ key->rk[21];
+	X2 = (rol32(X2, 3) - (X1 ^ key->rk[22])) ^ key->rk[23];
+	X3 = (ror32(X3, 9) - (X2 ^ key->rk[12])) ^ key->rk[13];
+	X0 = (rol32(X0, 5) - (X3 ^ key->rk[14])) ^ key->rk[15];
+	X1 = (rol32(X1, 3) - (X0 ^ key->rk[16])) ^ key->rk[17];
+	X2 = (ror32(X2, 9) - (X1 ^ key->rk[6])) ^ key->rk[7];
+	X3 = (rol32(X3, 5) - (X2 ^ key->rk[8])) ^ key->rk[9];
+	X0 = (rol32(X0, 3) - (X3 ^ key->rk[10])) ^ key->rk[11];
+	X1 = (ror32(X1, 9) - (X0 ^ key->rk[0])) ^ key->rk[1];
+	X2 = (rol32(X2, 5) - (X1 ^ key->rk[2])) ^ key->rk[3];
+	X3 = (rol32(X3, 3) - (X2 ^ key->rk[4])) ^ key->rk[5];
+
+	put_unaligned_le32(X0, &out[0]);
+	put_unaligned_le32(X1, &out[4]);
+	put_unaligned_le32(X2, &out[8]);
+	put_unaligned_le32(X3, &out[12]);
+}
+EXPORT_SYMBOL_GPL(lea_decrypt);
+
+int lea_set_key(struct crypto_lea_ctx *key, const u8 *in_key,
+						u32 key_len)
+{
+	const u32 *_mk = (const u32 *)in_key;
+
+	switch (key_len) {
+	case 16:
+		key->rk[0] = rol32(get_unaligned_le32(&_mk[0]) + lea_delta[0][0], 1);
+		key->rk[6] = rol32(key->rk[0] + lea_delta[1][1], 1);
+		key->rk[12] = rol32(key->rk[6] + lea_delta[2][2], 1);
+		key->rk[18] = rol32(key->rk[12] + lea_delta[3][3], 1);
+		key->rk[24] = rol32(key->rk[18] + lea_delta[0][4], 1);
+		key->rk[30] = rol32(key->rk[24] + lea_delta[1][5], 1);
+		key->rk[36] = rol32(key->rk[30] + lea_delta[2][6], 1);
+		key->rk[42] = rol32(key->rk[36] + lea_delta[3][7], 1);
+		key->rk[48] = rol32(key->rk[42] + lea_delta[0][8], 1);
+		key->rk[54] = rol32(key->rk[48] + lea_delta[1][9], 1);
+		key->rk[60] = rol32(key->rk[54] + lea_delta[2][10], 1);
+		key->rk[66] = rol32(key->rk[60] + lea_delta[3][11], 1);
+		key->rk[72] = rol32(key->rk[66] + lea_delta[0][12], 1);
+		key->rk[78] = rol32(key->rk[72] + lea_delta[1][13], 1);
+		key->rk[84] = rol32(key->rk[78] + lea_delta[2][14], 1);
+		key->rk[90] = rol32(key->rk[84] + lea_delta[3][15], 1);
+		key->rk[96] = rol32(key->rk[90] + lea_delta[0][16], 1);
+		key->rk[102] = rol32(key->rk[96] + lea_delta[1][17], 1);
+		key->rk[108] = rol32(key->rk[102] + lea_delta[2][18], 1);
+		key->rk[114] = rol32(key->rk[108] + lea_delta[3][19], 1);
+		key->rk[120] = rol32(key->rk[114] + lea_delta[0][20], 1);
+		key->rk[126] = rol32(key->rk[120] + lea_delta[1][21], 1);
+		key->rk[132] = rol32(key->rk[126] + lea_delta[2][22], 1);
+		key->rk[138] = rol32(key->rk[132] + lea_delta[3][23], 1);
+
+		key->rk[1] = key->rk[3] = key->rk[5] =
+				rol32(get_unaligned_le32(&_mk[1]) + lea_delta[0][1], 3);
+		key->rk[7] = key->rk[9] = key->rk[11] =
+				rol32(key->rk[1] + lea_delta[1][2], 3);
+		key->rk[13] = key->rk[15] = key->rk[17] =
+				rol32(key->rk[7] + lea_delta[2][3], 3);
+		key->rk[19] = key->rk[21] = key->rk[23] =
+				rol32(key->rk[13] + lea_delta[3][4], 3);
+		key->rk[25] = key->rk[27] = key->rk[29] =
+				rol32(key->rk[19] + lea_delta[0][5], 3);
+		key->rk[31] = key->rk[33] = key->rk[35] =
+				rol32(key->rk[25] + lea_delta[1][6], 3);
+		key->rk[37] = key->rk[39] = key->rk[41] =
+				rol32(key->rk[31] + lea_delta[2][7], 3);
+		key->rk[43] = key->rk[45] = key->rk[47] =
+				rol32(key->rk[37] + lea_delta[3][8], 3);
+		key->rk[49] = key->rk[51] = key->rk[53] =
+				rol32(key->rk[43] + lea_delta[0][9], 3);
+		key->rk[55] = key->rk[57] = key->rk[59] =
+				rol32(key->rk[49] + lea_delta[1][10], 3);
+		key->rk[61] = key->rk[63] = key->rk[65] =
+				rol32(key->rk[55] + lea_delta[2][11], 3);
+		key->rk[67] = key->rk[69] = key->rk[71] =
+				rol32(key->rk[61] + lea_delta[3][12], 3);
+		key->rk[73] = key->rk[75] = key->rk[77] =
+				rol32(key->rk[67] + lea_delta[0][13], 3);
+		key->rk[79] = key->rk[81] = key->rk[83] =
+				rol32(key->rk[73] + lea_delta[1][14], 3);
+		key->rk[85] = key->rk[87] = key->rk[89] =
+				rol32(key->rk[79] + lea_delta[2][15], 3);
+		key->rk[91] = key->rk[93] = key->rk[95] =
+				rol32(key->rk[85] + lea_delta[3][16], 3);
+		key->rk[97] = key->rk[99] = key->rk[101] =
+				rol32(key->rk[91] + lea_delta[0][17], 3);
+		key->rk[103] = key->rk[105] = key->rk[107] =
+				rol32(key->rk[97] + lea_delta[1][18], 3);
+		key->rk[109] = key->rk[111] = key->rk[113] =
+				rol32(key->rk[103] + lea_delta[2][19], 3);
+		key->rk[115] = key->rk[117] = key->rk[119] =
+				rol32(key->rk[109] + lea_delta[3][20], 3);
+		key->rk[121] = key->rk[123] = key->rk[125] =
+				rol32(key->rk[115] + lea_delta[0][21], 3);
+		key->rk[127] = key->rk[129] = key->rk[131] =
+				rol32(key->rk[121] + lea_delta[1][22], 3);
+		key->rk[133] = key->rk[135] = key->rk[137] =
+				rol32(key->rk[127] + lea_delta[2][23], 3);
+		key->rk[139] = key->rk[141] = key->rk[143] =
+				rol32(key->rk[133] + lea_delta[3][24], 3);
+
+		key->rk[2] = rol32(get_unaligned_le32(&_mk[2]) + lea_delta[0][2], 6);
+		key->rk[8] = rol32(key->rk[2] + lea_delta[1][3], 6);
+		key->rk[14] = rol32(key->rk[8] + lea_delta[2][4], 6);
+		key->rk[20] = rol32(key->rk[14] + lea_delta[3][5], 6);
+		key->rk[26] = rol32(key->rk[20] + lea_delta[0][6], 6);
+		key->rk[32] = rol32(key->rk[26] + lea_delta[1][7], 6);
+		key->rk[38] = rol32(key->rk[32] + lea_delta[2][8], 6);
+		key->rk[44] = rol32(key->rk[38] + lea_delta[3][9], 6);
+		key->rk[50] = rol32(key->rk[44] + lea_delta[0][10], 6);
+		key->rk[56] = rol32(key->rk[50] + lea_delta[1][11], 6);
+		key->rk[62] = rol32(key->rk[56] + lea_delta[2][12], 6);
+		key->rk[68] = rol32(key->rk[62] + lea_delta[3][13], 6);
+		key->rk[74] = rol32(key->rk[68] + lea_delta[0][14], 6);
+		key->rk[80] = rol32(key->rk[74] + lea_delta[1][15], 6);
+		key->rk[86] = rol32(key->rk[80] + lea_delta[2][16], 6);
+		key->rk[92] = rol32(key->rk[86] + lea_delta[3][17], 6);
+		key->rk[98] = rol32(key->rk[92] + lea_delta[0][18], 6);
+		key->rk[104] = rol32(key->rk[98] + lea_delta[1][19], 6);
+		key->rk[110] = rol32(key->rk[104] + lea_delta[2][20], 6);
+		key->rk[116] = rol32(key->rk[110] + lea_delta[3][21], 6);
+		key->rk[122] = rol32(key->rk[116] + lea_delta[0][22], 6);
+		key->rk[128] = rol32(key->rk[122] + lea_delta[1][23], 6);
+		key->rk[134] = rol32(key->rk[128] + lea_delta[2][24], 6);
+		key->rk[140] = rol32(key->rk[134] + lea_delta[3][25], 6);
+
+		key->rk[4] = rol32(get_unaligned_le32(&_mk[3]) + lea_delta[0][3], 11);
+		key->rk[10] = rol32(key->rk[4] + lea_delta[1][4], 11);
+		key->rk[16] = rol32(key->rk[10] + lea_delta[2][5], 11);
+		key->rk[22] = rol32(key->rk[16] + lea_delta[3][6], 11);
+		key->rk[28] = rol32(key->rk[22] + lea_delta[0][7], 11);
+		key->rk[34] = rol32(key->rk[28] + lea_delta[1][8], 11);
+		key->rk[40] = rol32(key->rk[34] + lea_delta[2][9], 11);
+		key->rk[46] = rol32(key->rk[40] + lea_delta[3][10], 11);
+		key->rk[52] = rol32(key->rk[46] + lea_delta[0][11], 11);
+		key->rk[58] = rol32(key->rk[52] + lea_delta[1][12], 11);
+		key->rk[64] = rol32(key->rk[58] + lea_delta[2][13], 11);
+		key->rk[70] = rol32(key->rk[64] + lea_delta[3][14], 11);
+		key->rk[76] = rol32(key->rk[70] + lea_delta[0][15], 11);
+		key->rk[82] = rol32(key->rk[76] + lea_delta[1][16], 11);
+		key->rk[88] = rol32(key->rk[82] + lea_delta[2][17], 11);
+		key->rk[94] = rol32(key->rk[88] + lea_delta[3][18], 11);
+		key->rk[100] = rol32(key->rk[94] + lea_delta[0][19], 11);
+		key->rk[106] = rol32(key->rk[100] + lea_delta[1][20], 11);
+		key->rk[112] = rol32(key->rk[106] + lea_delta[2][21], 11);
+		key->rk[118] = rol32(key->rk[112] + lea_delta[3][22], 11);
+		key->rk[124] = rol32(key->rk[118] + lea_delta[0][23], 11);
+		key->rk[130] = rol32(key->rk[124] + lea_delta[1][24], 11);
+		key->rk[136] = rol32(key->rk[130] + lea_delta[2][25], 11);
+		key->rk[142] = rol32(key->rk[136] + lea_delta[3][26], 11);
+		break;
+
+	case 24:
+		key->rk[0] = rol32(get_unaligned_le32(&_mk[0]) + lea_delta[0][0], 1);
+		key->rk[6] = rol32(key->rk[0] + lea_delta[1][1], 1);
+		key->rk[12] = rol32(key->rk[6] + lea_delta[2][2], 1);
+		key->rk[18] = rol32(key->rk[12] + lea_delta[3][3], 1);
+		key->rk[24] = rol32(key->rk[18] + lea_delta[4][4], 1);
+		key->rk[30] = rol32(key->rk[24] + lea_delta[5][5], 1);
+		key->rk[36] = rol32(key->rk[30] + lea_delta[0][6], 1);
+		key->rk[42] = rol32(key->rk[36] + lea_delta[1][7], 1);
+		key->rk[48] = rol32(key->rk[42] + lea_delta[2][8], 1);
+		key->rk[54] = rol32(key->rk[48] + lea_delta[3][9], 1);
+		key->rk[60] = rol32(key->rk[54] + lea_delta[4][10], 1);
+		key->rk[66] = rol32(key->rk[60] + lea_delta[5][11], 1);
+		key->rk[72] = rol32(key->rk[66] + lea_delta[0][12], 1);
+		key->rk[78] = rol32(key->rk[72] + lea_delta[1][13], 1);
+		key->rk[84] = rol32(key->rk[78] + lea_delta[2][14], 1);
+		key->rk[90] = rol32(key->rk[84] + lea_delta[3][15], 1);
+		key->rk[96] = rol32(key->rk[90] + lea_delta[4][16], 1);
+		key->rk[102] = rol32(key->rk[96] + lea_delta[5][17], 1);
+		key->rk[108] = rol32(key->rk[102] + lea_delta[0][18], 1);
+		key->rk[114] = rol32(key->rk[108] + lea_delta[1][19], 1);
+		key->rk[120] = rol32(key->rk[114] + lea_delta[2][20], 1);
+		key->rk[126] = rol32(key->rk[120] + lea_delta[3][21], 1);
+		key->rk[132] = rol32(key->rk[126] + lea_delta[4][22], 1);
+		key->rk[138] = rol32(key->rk[132] + lea_delta[5][23], 1);
+		key->rk[144] = rol32(key->rk[138] + lea_delta[0][24], 1);
+		key->rk[150] = rol32(key->rk[144] + lea_delta[1][25], 1);
+		key->rk[156] = rol32(key->rk[150] + lea_delta[2][26], 1);
+		key->rk[162] = rol32(key->rk[156] + lea_delta[3][27], 1);
+
+		key->rk[1] = rol32(get_unaligned_le32(&_mk[1]) + lea_delta[0][1], 3);
+		key->rk[7] = rol32(key->rk[1] + lea_delta[1][2], 3);
+		key->rk[13] = rol32(key->rk[7] + lea_delta[2][3], 3);
+		key->rk[19] = rol32(key->rk[13] + lea_delta[3][4], 3);
+		key->rk[25] = rol32(key->rk[19] + lea_delta[4][5], 3);
+		key->rk[31] = rol32(key->rk[25] + lea_delta[5][6], 3);
+		key->rk[37] = rol32(key->rk[31] + lea_delta[0][7], 3);
+		key->rk[43] = rol32(key->rk[37] + lea_delta[1][8], 3);
+		key->rk[49] = rol32(key->rk[43] + lea_delta[2][9], 3);
+		key->rk[55] = rol32(key->rk[49] + lea_delta[3][10], 3);
+		key->rk[61] = rol32(key->rk[55] + lea_delta[4][11], 3);
+		key->rk[67] = rol32(key->rk[61] + lea_delta[5][12], 3);
+		key->rk[73] = rol32(key->rk[67] + lea_delta[0][13], 3);
+		key->rk[79] = rol32(key->rk[73] + lea_delta[1][14], 3);
+		key->rk[85] = rol32(key->rk[79] + lea_delta[2][15], 3);
+		key->rk[91] = rol32(key->rk[85] + lea_delta[3][16], 3);
+		key->rk[97] = rol32(key->rk[91] + lea_delta[4][17], 3);
+		key->rk[103] = rol32(key->rk[97] + lea_delta[5][18], 3);
+		key->rk[109] = rol32(key->rk[103] + lea_delta[0][19], 3);
+		key->rk[115] = rol32(key->rk[109] + lea_delta[1][20], 3);
+		key->rk[121] = rol32(key->rk[115] + lea_delta[2][21], 3);
+		key->rk[127] = rol32(key->rk[121] + lea_delta[3][22], 3);
+		key->rk[133] = rol32(key->rk[127] + lea_delta[4][23], 3);
+		key->rk[139] = rol32(key->rk[133] + lea_delta[5][24], 3);
+		key->rk[145] = rol32(key->rk[139] + lea_delta[0][25], 3);
+		key->rk[151] = rol32(key->rk[145] + lea_delta[1][26], 3);
+		key->rk[157] = rol32(key->rk[151] + lea_delta[2][27], 3);
+		key->rk[163] = rol32(key->rk[157] + lea_delta[3][28], 3);
+
+		key->rk[2] = rol32(get_unaligned_le32(&_mk[2]) + lea_delta[0][2], 6);
+		key->rk[8] = rol32(key->rk[2] + lea_delta[1][3], 6);
+		key->rk[14] = rol32(key->rk[8] + lea_delta[2][4], 6);
+		key->rk[20] = rol32(key->rk[14] + lea_delta[3][5], 6);
+		key->rk[26] = rol32(key->rk[20] + lea_delta[4][6], 6);
+		key->rk[32] = rol32(key->rk[26] + lea_delta[5][7], 6);
+		key->rk[38] = rol32(key->rk[32] + lea_delta[0][8], 6);
+		key->rk[44] = rol32(key->rk[38] + lea_delta[1][9], 6);
+		key->rk[50] = rol32(key->rk[44] + lea_delta[2][10], 6);
+		key->rk[56] = rol32(key->rk[50] + lea_delta[3][11], 6);
+		key->rk[62] = rol32(key->rk[56] + lea_delta[4][12], 6);
+		key->rk[68] = rol32(key->rk[62] + lea_delta[5][13], 6);
+		key->rk[74] = rol32(key->rk[68] + lea_delta[0][14], 6);
+		key->rk[80] = rol32(key->rk[74] + lea_delta[1][15], 6);
+		key->rk[86] = rol32(key->rk[80] + lea_delta[2][16], 6);
+		key->rk[92] = rol32(key->rk[86] + lea_delta[3][17], 6);
+		key->rk[98] = rol32(key->rk[92] + lea_delta[4][18], 6);
+		key->rk[104] = rol32(key->rk[98] + lea_delta[5][19], 6);
+		key->rk[110] = rol32(key->rk[104] + lea_delta[0][20], 6);
+		key->rk[116] = rol32(key->rk[110] + lea_delta[1][21], 6);
+		key->rk[122] = rol32(key->rk[116] + lea_delta[2][22], 6);
+		key->rk[128] = rol32(key->rk[122] + lea_delta[3][23], 6);
+		key->rk[134] = rol32(key->rk[128] + lea_delta[4][24], 6);
+		key->rk[140] = rol32(key->rk[134] + lea_delta[5][25], 6);
+		key->rk[146] = rol32(key->rk[140] + lea_delta[0][26], 6);
+		key->rk[152] = rol32(key->rk[146] + lea_delta[1][27], 6);
+		key->rk[158] = rol32(key->rk[152] + lea_delta[2][28], 6);
+		key->rk[164] = rol32(key->rk[158] + lea_delta[3][29], 6);
+
+		key->rk[3] = rol32(get_unaligned_le32(&_mk[3]) + lea_delta[0][3], 11);
+		key->rk[9] = rol32(key->rk[3] + lea_delta[1][4], 11);
+		key->rk[15] = rol32(key->rk[9] + lea_delta[2][5], 11);
+		key->rk[21] = rol32(key->rk[15] + lea_delta[3][6], 11);
+		key->rk[27] = rol32(key->rk[21] + lea_delta[4][7], 11);
+		key->rk[33] = rol32(key->rk[27] + lea_delta[5][8], 11);
+		key->rk[39] = rol32(key->rk[33] + lea_delta[0][9], 11);
+		key->rk[45] = rol32(key->rk[39] + lea_delta[1][10], 11);
+		key->rk[51] = rol32(key->rk[45] + lea_delta[2][11], 11);
+		key->rk[57] = rol32(key->rk[51] + lea_delta[3][12], 11);
+		key->rk[63] = rol32(key->rk[57] + lea_delta[4][13], 11);
+		key->rk[69] = rol32(key->rk[63] + lea_delta[5][14], 11);
+		key->rk[75] = rol32(key->rk[69] + lea_delta[0][15], 11);
+		key->rk[81] = rol32(key->rk[75] + lea_delta[1][16], 11);
+		key->rk[87] = rol32(key->rk[81] + lea_delta[2][17], 11);
+		key->rk[93] = rol32(key->rk[87] + lea_delta[3][18], 11);
+		key->rk[99] = rol32(key->rk[93] + lea_delta[4][19], 11);
+		key->rk[105] = rol32(key->rk[99] + lea_delta[5][20], 11);
+		key->rk[111] = rol32(key->rk[105] + lea_delta[0][21], 11);
+		key->rk[117] = rol32(key->rk[111] + lea_delta[1][22], 11);
+		key->rk[123] = rol32(key->rk[117] + lea_delta[2][23], 11);
+		key->rk[129] = rol32(key->rk[123] + lea_delta[3][24], 11);
+		key->rk[135] = rol32(key->rk[129] + lea_delta[4][25], 11);
+		key->rk[141] = rol32(key->rk[135] + lea_delta[5][26], 11);
+		key->rk[147] = rol32(key->rk[141] + lea_delta[0][27], 11);
+		key->rk[153] = rol32(key->rk[147] + lea_delta[1][28], 11);
+		key->rk[159] = rol32(key->rk[153] + lea_delta[2][29], 11);
+		key->rk[165] = rol32(key->rk[159] + lea_delta[3][30], 11);
+
+		key->rk[4] = rol32(get_unaligned_le32(&_mk[4]) + lea_delta[0][4], 13);
+		key->rk[10] = rol32(key->rk[4] + lea_delta[1][5], 13);
+		key->rk[16] = rol32(key->rk[10] + lea_delta[2][6], 13);
+		key->rk[22] = rol32(key->rk[16] + lea_delta[3][7], 13);
+		key->rk[28] = rol32(key->rk[22] + lea_delta[4][8], 13);
+		key->rk[34] = rol32(key->rk[28] + lea_delta[5][9], 13);
+		key->rk[40] = rol32(key->rk[34] + lea_delta[0][10], 13);
+		key->rk[46] = rol32(key->rk[40] + lea_delta[1][11], 13);
+		key->rk[52] = rol32(key->rk[46] + lea_delta[2][12], 13);
+		key->rk[58] = rol32(key->rk[52] + lea_delta[3][13], 13);
+		key->rk[64] = rol32(key->rk[58] + lea_delta[4][14], 13);
+		key->rk[70] = rol32(key->rk[64] + lea_delta[5][15], 13);
+		key->rk[76] = rol32(key->rk[70] + lea_delta[0][16], 13);
+		key->rk[82] = rol32(key->rk[76] + lea_delta[1][17], 13);
+		key->rk[88] = rol32(key->rk[82] + lea_delta[2][18], 13);
+		key->rk[94] = rol32(key->rk[88] + lea_delta[3][19], 13);
+		key->rk[100] = rol32(key->rk[94] + lea_delta[4][20], 13);
+		key->rk[106] = rol32(key->rk[100] + lea_delta[5][21], 13);
+		key->rk[112] = rol32(key->rk[106] + lea_delta[0][22], 13);
+		key->rk[118] = rol32(key->rk[112] + lea_delta[1][23], 13);
+		key->rk[124] = rol32(key->rk[118] + lea_delta[2][24], 13);
+		key->rk[130] = rol32(key->rk[124] + lea_delta[3][25], 13);
+		key->rk[136] = rol32(key->rk[130] + lea_delta[4][26], 13);
+		key->rk[142] = rol32(key->rk[136] + lea_delta[5][27], 13);
+		key->rk[148] = rol32(key->rk[142] + lea_delta[0][28], 13);
+		key->rk[154] = rol32(key->rk[148] + lea_delta[1][29], 13);
+		key->rk[160] = rol32(key->rk[154] + lea_delta[2][30], 13);
+		key->rk[166] = rol32(key->rk[160] + lea_delta[3][31], 13);
+
+		key->rk[5] = rol32(get_unaligned_le32(&_mk[5]) + lea_delta[0][5], 17);
+		key->rk[11] = rol32(key->rk[5] + lea_delta[1][6], 17);
+		key->rk[17] = rol32(key->rk[11] + lea_delta[2][7], 17);
+		key->rk[23] = rol32(key->rk[17] + lea_delta[3][8], 17);
+		key->rk[29] = rol32(key->rk[23] + lea_delta[4][9], 17);
+		key->rk[35] = rol32(key->rk[29] + lea_delta[5][10], 17);
+		key->rk[41] = rol32(key->rk[35] + lea_delta[0][11], 17);
+		key->rk[47] = rol32(key->rk[41] + lea_delta[1][12], 17);
+		key->rk[53] = rol32(key->rk[47] + lea_delta[2][13], 17);
+		key->rk[59] = rol32(key->rk[53] + lea_delta[3][14], 17);
+		key->rk[65] = rol32(key->rk[59] + lea_delta[4][15], 17);
+		key->rk[71] = rol32(key->rk[65] + lea_delta[5][16], 17);
+		key->rk[77] = rol32(key->rk[71] + lea_delta[0][17], 17);
+		key->rk[83] = rol32(key->rk[77] + lea_delta[1][18], 17);
+		key->rk[89] = rol32(key->rk[83] + lea_delta[2][19], 17);
+		key->rk[95] = rol32(key->rk[89] + lea_delta[3][20], 17);
+		key->rk[101] = rol32(key->rk[95] + lea_delta[4][21], 17);
+		key->rk[107] = rol32(key->rk[101] + lea_delta[5][22], 17);
+		key->rk[113] = rol32(key->rk[107] + lea_delta[0][23], 17);
+		key->rk[119] = rol32(key->rk[113] + lea_delta[1][24], 17);
+		key->rk[125] = rol32(key->rk[119] + lea_delta[2][25], 17);
+		key->rk[131] = rol32(key->rk[125] + lea_delta[3][26], 17);
+		key->rk[137] = rol32(key->rk[131] + lea_delta[4][27], 17);
+		key->rk[143] = rol32(key->rk[137] + lea_delta[5][28], 17);
+		key->rk[149] = rol32(key->rk[143] + lea_delta[0][29], 17);
+		key->rk[155] = rol32(key->rk[149] + lea_delta[1][30], 17);
+		key->rk[161] = rol32(key->rk[155] + lea_delta[2][31], 17);
+		key->rk[167] = rol32(key->rk[161] + lea_delta[3][0], 17);
+		break;
+
+	case 32:
+		key->rk[0] = rol32(get_unaligned_le32(&_mk[0]) + lea_delta[0][0], 1);
+		key->rk[8] = rol32(key->rk[0] + lea_delta[1][3], 6);
+		key->rk[16] = rol32(key->rk[8] + lea_delta[2][6], 13);
+		key->rk[24] = rol32(key->rk[16] + lea_delta[4][4], 1);
+		key->rk[32] = rol32(key->rk[24] + lea_delta[5][7], 6);
+		key->rk[40] = rol32(key->rk[32] + lea_delta[6][10], 13);
+		key->rk[48] = rol32(key->rk[40] + lea_delta[0][8], 1);
+		key->rk[56] = rol32(key->rk[48] + lea_delta[1][11], 6);
+		key->rk[64] = rol32(key->rk[56] + lea_delta[2][14], 13);
+		key->rk[72] = rol32(key->rk[64] + lea_delta[4][12], 1);
+		key->rk[80] = rol32(key->rk[72] + lea_delta[5][15], 6);
+		key->rk[88] = rol32(key->rk[80] + lea_delta[6][18], 13);
+		key->rk[96] = rol32(key->rk[88] + lea_delta[0][16], 1);
+		key->rk[104] = rol32(key->rk[96] + lea_delta[1][19], 6);
+		key->rk[112] = rol32(key->rk[104] + lea_delta[2][22], 13);
+		key->rk[120] = rol32(key->rk[112] + lea_delta[4][20], 1);
+		key->rk[128] = rol32(key->rk[120] + lea_delta[5][23], 6);
+		key->rk[136] = rol32(key->rk[128] + lea_delta[6][26], 13);
+		key->rk[144] = rol32(key->rk[136] + lea_delta[0][24], 1);
+		key->rk[152] = rol32(key->rk[144] + lea_delta[1][27], 6);
+		key->rk[160] = rol32(key->rk[152] + lea_delta[2][30], 13);
+		key->rk[168] = rol32(key->rk[160] + lea_delta[4][28], 1);
+		key->rk[176] = rol32(key->rk[168] + lea_delta[5][31], 6);
+		key->rk[184] = rol32(key->rk[176] + lea_delta[6][2], 13);
+
+		key->rk[1] = rol32(get_unaligned_le32(&_mk[1]) + lea_delta[0][1], 3);
+		key->rk[9] = rol32(key->rk[1] + lea_delta[1][4], 11);
+		key->rk[17] = rol32(key->rk[9] + lea_delta[2][7], 17);
+		key->rk[25] = rol32(key->rk[17] + lea_delta[4][5], 3);
+		key->rk[33] = rol32(key->rk[25] + lea_delta[5][8], 11);
+		key->rk[41] = rol32(key->rk[33] + lea_delta[6][11], 17);
+		key->rk[49] = rol32(key->rk[41] + lea_delta[0][9], 3);
+		key->rk[57] = rol32(key->rk[49] + lea_delta[1][12], 11);
+		key->rk[65] = rol32(key->rk[57] + lea_delta[2][15], 17);
+		key->rk[73] = rol32(key->rk[65] + lea_delta[4][13], 3);
+		key->rk[81] = rol32(key->rk[73] + lea_delta[5][16], 11);
+		key->rk[89] = rol32(key->rk[81] + lea_delta[6][19], 17);
+		key->rk[97] = rol32(key->rk[89] + lea_delta[0][17], 3);
+		key->rk[105] = rol32(key->rk[97] + lea_delta[1][20], 11);
+		key->rk[113] = rol32(key->rk[105] + lea_delta[2][23], 17);
+		key->rk[121] = rol32(key->rk[113] + lea_delta[4][21], 3);
+		key->rk[129] = rol32(key->rk[121] + lea_delta[5][24], 11);
+		key->rk[137] = rol32(key->rk[129] + lea_delta[6][27], 17);
+		key->rk[145] = rol32(key->rk[137] + lea_delta[0][25], 3);
+		key->rk[153] = rol32(key->rk[145] + lea_delta[1][28], 11);
+		key->rk[161] = rol32(key->rk[153] + lea_delta[2][31], 17);
+		key->rk[169] = rol32(key->rk[161] + lea_delta[4][29], 3);
+		key->rk[177] = rol32(key->rk[169] + lea_delta[5][0], 11);
+		key->rk[185] = rol32(key->rk[177] + lea_delta[6][3], 17);
+
+		key->rk[2] = rol32(get_unaligned_le32(&_mk[2]) + lea_delta[0][2], 6);
+		key->rk[10] = rol32(key->rk[2] + lea_delta[1][5], 13);
+		key->rk[18] = rol32(key->rk[10] + lea_delta[3][3], 1);
+		key->rk[26] = rol32(key->rk[18] + lea_delta[4][6], 6);
+		key->rk[34] = rol32(key->rk[26] + lea_delta[5][9], 13);
+		key->rk[42] = rol32(key->rk[34] + lea_delta[7][7], 1);
+		key->rk[50] = rol32(key->rk[42] + lea_delta[0][10], 6);
+		key->rk[58] = rol32(key->rk[50] + lea_delta[1][13], 13);
+		key->rk[66] = rol32(key->rk[58] + lea_delta[3][11], 1);
+		key->rk[74] = rol32(key->rk[66] + lea_delta[4][14], 6);
+		key->rk[82] = rol32(key->rk[74] + lea_delta[5][17], 13);
+		key->rk[90] = rol32(key->rk[82] + lea_delta[7][15], 1);
+		key->rk[98] = rol32(key->rk[90] + lea_delta[0][18], 6);
+		key->rk[106] = rol32(key->rk[98] + lea_delta[1][21], 13);
+		key->rk[114] = rol32(key->rk[106] + lea_delta[3][19], 1);
+		key->rk[122] = rol32(key->rk[114] + lea_delta[4][22], 6);
+		key->rk[130] = rol32(key->rk[122] + lea_delta[5][25], 13);
+		key->rk[138] = rol32(key->rk[130] + lea_delta[7][23], 1);
+		key->rk[146] = rol32(key->rk[138] + lea_delta[0][26], 6);
+		key->rk[154] = rol32(key->rk[146] + lea_delta[1][29], 13);
+		key->rk[162] = rol32(key->rk[154] + lea_delta[3][27], 1);
+		key->rk[170] = rol32(key->rk[162] + lea_delta[4][30], 6);
+		key->rk[178] = rol32(key->rk[170] + lea_delta[5][1], 13);
+		key->rk[186] = rol32(key->rk[178] + lea_delta[7][31], 1);
+
+		key->rk[3] = rol32(get_unaligned_le32(&_mk[3]) + lea_delta[0][3], 11);
+		key->rk[11] = rol32(key->rk[3] + lea_delta[1][6], 17);
+		key->rk[19] = rol32(key->rk[11] + lea_delta[3][4], 3);
+		key->rk[27] = rol32(key->rk[19] + lea_delta[4][7], 11);
+		key->rk[35] = rol32(key->rk[27] + lea_delta[5][10], 17);
+		key->rk[43] = rol32(key->rk[35] + lea_delta[7][8], 3);
+		key->rk[51] = rol32(key->rk[43] + lea_delta[0][11], 11);
+		key->rk[59] = rol32(key->rk[51] + lea_delta[1][14], 17);
+		key->rk[67] = rol32(key->rk[59] + lea_delta[3][12], 3);
+		key->rk[75] = rol32(key->rk[67] + lea_delta[4][15], 11);
+		key->rk[83] = rol32(key->rk[75] + lea_delta[5][18], 17);
+		key->rk[91] = rol32(key->rk[83] + lea_delta[7][16], 3);
+		key->rk[99] = rol32(key->rk[91] + lea_delta[0][19], 11);
+		key->rk[107] = rol32(key->rk[99] + lea_delta[1][22], 17);
+		key->rk[115] = rol32(key->rk[107] + lea_delta[3][20], 3);
+		key->rk[123] = rol32(key->rk[115] + lea_delta[4][23], 11);
+		key->rk[131] = rol32(key->rk[123] + lea_delta[5][26], 17);
+		key->rk[139] = rol32(key->rk[131] + lea_delta[7][24], 3);
+		key->rk[147] = rol32(key->rk[139] + lea_delta[0][27], 11);
+		key->rk[155] = rol32(key->rk[147] + lea_delta[1][30], 17);
+		key->rk[163] = rol32(key->rk[155] + lea_delta[3][28], 3);
+		key->rk[171] = rol32(key->rk[163] + lea_delta[4][31], 11);
+		key->rk[179] = rol32(key->rk[171] + lea_delta[5][2], 17);
+		key->rk[187] = rol32(key->rk[179] + lea_delta[7][0], 3);
+
+		key->rk[4] = rol32(get_unaligned_le32(&_mk[4]) + lea_delta[0][4], 13);
+		key->rk[12] = rol32(key->rk[4] + lea_delta[2][2], 1);
+		key->rk[20] = rol32(key->rk[12] + lea_delta[3][5], 6);
+		key->rk[28] = rol32(key->rk[20] + lea_delta[4][8], 13);
+		key->rk[36] = rol32(key->rk[28] + lea_delta[6][6], 1);
+		key->rk[44] = rol32(key->rk[36] + lea_delta[7][9], 6);
+		key->rk[52] = rol32(key->rk[44] + lea_delta[0][12], 13);
+		key->rk[60] = rol32(key->rk[52] + lea_delta[2][10], 1);
+		key->rk[68] = rol32(key->rk[60] + lea_delta[3][13], 6);
+		key->rk[76] = rol32(key->rk[68] + lea_delta[4][16], 13);
+		key->rk[84] = rol32(key->rk[76] + lea_delta[6][14], 1);
+		key->rk[92] = rol32(key->rk[84] + lea_delta[7][17], 6);
+		key->rk[100] = rol32(key->rk[92] + lea_delta[0][20], 13);
+		key->rk[108] = rol32(key->rk[100] + lea_delta[2][18], 1);
+		key->rk[116] = rol32(key->rk[108] + lea_delta[3][21], 6);
+		key->rk[124] = rol32(key->rk[116] + lea_delta[4][24], 13);
+		key->rk[132] = rol32(key->rk[124] + lea_delta[6][22], 1);
+		key->rk[140] = rol32(key->rk[132] + lea_delta[7][25], 6);
+		key->rk[148] = rol32(key->rk[140] + lea_delta[0][28], 13);
+		key->rk[156] = rol32(key->rk[148] + lea_delta[2][26], 1);
+		key->rk[164] = rol32(key->rk[156] + lea_delta[3][29], 6);
+		key->rk[172] = rol32(key->rk[164] + lea_delta[4][0], 13);
+		key->rk[180] = rol32(key->rk[172] + lea_delta[6][30], 1);
+		key->rk[188] = rol32(key->rk[180] + lea_delta[7][1], 6);
+
+		key->rk[5] = rol32(get_unaligned_le32(&_mk[5]) + lea_delta[0][5], 17);
+		key->rk[13] = rol32(key->rk[5] + lea_delta[2][3], 3);
+		key->rk[21] = rol32(key->rk[13] + lea_delta[3][6], 11);
+		key->rk[29] = rol32(key->rk[21] + lea_delta[4][9], 17);
+		key->rk[37] = rol32(key->rk[29] + lea_delta[6][7], 3);
+		key->rk[45] = rol32(key->rk[37] + lea_delta[7][10], 11);
+		key->rk[53] = rol32(key->rk[45] + lea_delta[0][13], 17);
+		key->rk[61] = rol32(key->rk[53] + lea_delta[2][11], 3);
+		key->rk[69] = rol32(key->rk[61] + lea_delta[3][14], 11);
+		key->rk[77] = rol32(key->rk[69] + lea_delta[4][17], 17);
+		key->rk[85] = rol32(key->rk[77] + lea_delta[6][15], 3);
+		key->rk[93] = rol32(key->rk[85] + lea_delta[7][18], 11);
+		key->rk[101] = rol32(key->rk[93] + lea_delta[0][21], 17);
+		key->rk[109] = rol32(key->rk[101] + lea_delta[2][19], 3);
+		key->rk[117] = rol32(key->rk[109] + lea_delta[3][22], 11);
+		key->rk[125] = rol32(key->rk[117] + lea_delta[4][25], 17);
+		key->rk[133] = rol32(key->rk[125] + lea_delta[6][23], 3);
+		key->rk[141] = rol32(key->rk[133] + lea_delta[7][26], 11);
+		key->rk[149] = rol32(key->rk[141] + lea_delta[0][29], 17);
+		key->rk[157] = rol32(key->rk[149] + lea_delta[2][27], 3);
+		key->rk[165] = rol32(key->rk[157] + lea_delta[3][30], 11);
+		key->rk[173] = rol32(key->rk[165] + lea_delta[4][1], 17);
+		key->rk[181] = rol32(key->rk[173] + lea_delta[6][31], 3);
+		key->rk[189] = rol32(key->rk[181] + lea_delta[7][2], 11);
+
+		key->rk[6] = rol32(get_unaligned_le32(&_mk[6]) + lea_delta[1][1], 1);
+		key->rk[14] = rol32(key->rk[6] + lea_delta[2][4], 6);
+		key->rk[22] = rol32(key->rk[14] + lea_delta[3][7], 13);
+		key->rk[30] = rol32(key->rk[22] + lea_delta[5][5], 1);
+		key->rk[38] = rol32(key->rk[30] + lea_delta[6][8], 6);
+		key->rk[46] = rol32(key->rk[38] + lea_delta[7][11], 13);
+		key->rk[54] = rol32(key->rk[46] + lea_delta[1][9], 1);
+		key->rk[62] = rol32(key->rk[54] + lea_delta[2][12], 6);
+		key->rk[70] = rol32(key->rk[62] + lea_delta[3][15], 13);
+		key->rk[78] = rol32(key->rk[70] + lea_delta[5][13], 1);
+		key->rk[86] = rol32(key->rk[78] + lea_delta[6][16], 6);
+		key->rk[94] = rol32(key->rk[86] + lea_delta[7][19], 13);
+		key->rk[102] = rol32(key->rk[94] + lea_delta[1][17], 1);
+		key->rk[110] = rol32(key->rk[102] + lea_delta[2][20], 6);
+		key->rk[118] = rol32(key->rk[110] + lea_delta[3][23], 13);
+		key->rk[126] = rol32(key->rk[118] + lea_delta[5][21], 1);
+		key->rk[134] = rol32(key->rk[126] + lea_delta[6][24], 6);
+		key->rk[142] = rol32(key->rk[134] + lea_delta[7][27], 13);
+		key->rk[150] = rol32(key->rk[142] + lea_delta[1][25], 1);
+		key->rk[158] = rol32(key->rk[150] + lea_delta[2][28], 6);
+		key->rk[166] = rol32(key->rk[158] + lea_delta[3][31], 13);
+		key->rk[174] = rol32(key->rk[166] + lea_delta[5][29], 1);
+		key->rk[182] = rol32(key->rk[174] + lea_delta[6][0], 6);
+		key->rk[190] = rol32(key->rk[182] + lea_delta[7][3], 13);
+
+		key->rk[7] = rol32(get_unaligned_le32(&_mk[7]) + lea_delta[1][2], 3);
+		key->rk[15] = rol32(key->rk[7] + lea_delta[2][5], 11);
+		key->rk[23] = rol32(key->rk[15] + lea_delta[3][8], 17);
+		key->rk[31] = rol32(key->rk[23] + lea_delta[5][6], 3);
+		key->rk[39] = rol32(key->rk[31] + lea_delta[6][9], 11);
+		key->rk[47] = rol32(key->rk[39] + lea_delta[7][12], 17);
+		key->rk[55] = rol32(key->rk[47] + lea_delta[1][10], 3);
+		key->rk[63] = rol32(key->rk[55] + lea_delta[2][13], 11);
+		key->rk[71] = rol32(key->rk[63] + lea_delta[3][16], 17);
+		key->rk[79] = rol32(key->rk[71] + lea_delta[5][14], 3);
+		key->rk[87] = rol32(key->rk[79] + lea_delta[6][17], 11);
+		key->rk[95] = rol32(key->rk[87] + lea_delta[7][20], 17);
+		key->rk[103] = rol32(key->rk[95] + lea_delta[1][18], 3);
+		key->rk[111] = rol32(key->rk[103] + lea_delta[2][21], 11);
+		key->rk[119] = rol32(key->rk[111] + lea_delta[3][24], 17);
+		key->rk[127] = rol32(key->rk[119] + lea_delta[5][22], 3);
+		key->rk[135] = rol32(key->rk[127] + lea_delta[6][25], 11);
+		key->rk[143] = rol32(key->rk[135] + lea_delta[7][28], 17);
+		key->rk[151] = rol32(key->rk[143] + lea_delta[1][26], 3);
+		key->rk[159] = rol32(key->rk[151] + lea_delta[2][29], 11);
+		key->rk[167] = rol32(key->rk[159] + lea_delta[3][0], 17);
+		key->rk[175] = rol32(key->rk[167] + lea_delta[5][30], 3);
+		key->rk[183] = rol32(key->rk[175] + lea_delta[6][1], 11);
+		key->rk[191] = rol32(key->rk[183] + lea_delta[7][4], 17);
+
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	key->round = LEA_ROUND_CNT(key_len);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(lea_set_key);
+
+static int crypto_lea_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+											u32 key_len)
+{
+	return lea_set_key(crypto_tfm_ctx(tfm), in_key, key_len);
+}
+
+static void crypto_lea_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct crypto_lea_ctx *key = crypto_tfm_ctx(tfm);
+
+	lea_encrypt(key, out, in);
+}
+
+static void crypto_lea_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct crypto_lea_ctx *key = crypto_tfm_ctx(tfm);
+
+	lea_decrypt(key, out, in);
+}
+
+static struct crypto_alg lea_alg = {
+	.cra_name        = "lea",
+	.cra_driver_name = "lea-generic",
+	.cra_priority    = 100,
+	.cra_flags       = CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize   = LEA_BLOCK_SIZE,
+	.cra_ctxsize     = sizeof(struct crypto_lea_ctx),
+	.cra_module      = THIS_MODULE,
+	.cra_u           = {
+		.cipher = {
+			.cia_min_keysize = LEA_MIN_KEY_SIZE,
+			.cia_max_keysize = LEA_MAX_KEY_SIZE,
+			.cia_setkey = crypto_lea_set_key,
+			.cia_encrypt = crypto_lea_encrypt,
+			.cia_decrypt = crypto_lea_decrypt
+		}
+	}
+};
+
+static int crypto_lea_init(void)
+{
+	return crypto_register_alg(&lea_alg);
+}
+
+static void crypto_lea_exit(void)
+{
+	crypto_unregister_alg(&lea_alg);
+}
+
+module_init(crypto_lea_init);
+module_exit(crypto_lea_exit);
+
+MODULE_DESCRIPTION("LEA Cipher Algorithm");
+MODULE_AUTHOR("Dongsoo Lee <letrhee@nsr.re.kr>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("lea");
+MODULE_ALIAS_CRYPTO("lea-generic");
diff --git a/include/crypto/lea.h b/include/crypto/lea.h
new file mode 100644
index 000000000000..1668c7ed5a6e
--- /dev/null
+++ b/include/crypto/lea.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Cryptographic API.
+ *
+ * LEA Cipher Algorithm
+ *
+ * LEA is a 128-bit block cipher developed by South Korea in 2013.
+ *
+ * LEA is the national standard of Republic of Korea (KS X 3246) and included in
+ * the ISO/IEC 29192-2:2019 standard (Information security - Lightweight
+ * cryptography - Part 2: Block ciphers).
+ *
+ * Copyright (c) 2023 National Security Research.
+ * Author: Dongsoo Lee <letrhee@nsr.re.kr>
+ */
+
+#ifndef _CRYPTO_LEA_H
+#define _CRYPTO_LEA_H
+
+#include <linux/types.h>
+
+#define LEA_MIN_KEY_SIZE 16
+#define LEA_MAX_KEY_SIZE 32
+#define LEA_BLOCK_SIZE 16
+#define LEA_ROUND_CNT(key_len) (((key_len) >> 1) + 16)
+
+#define LEA_MAX_KEYLENGTH_U32 (LEA_ROUND_CNT(LEA_MAX_KEY_SIZE) * 6)
+#define LEA_MAX_KEYLENGTH (LEA_MAX_KEYLENGTH_U32 * sizeof(u32))
+
+struct crypto_lea_ctx {
+	u32 rk[LEA_MAX_KEYLENGTH_U32];
+	u32 round;
+};
+
+int lea_set_key(struct crypto_lea_ctx *ctx, const u8 *in_key, u32 key_len);
+void lea_encrypt(const void *ctx, u8 *out, const u8 *in);
+void lea_decrypt(const void *ctx, u8 *out, const u8 *in);
+
+#endif
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] crypto: add LEA testmgr tests
  2023-04-28 11:00 [PATCH 0/3] crypto: LEA block cipher implementation Dongsoo Lee
  2023-04-28 11:00 ` [PATCH 1/3] " Dongsoo Lee
@ 2023-04-28 11:00 ` Dongsoo Lee
  2023-04-28 11:00 ` [PATCH 3/3] crypto: LEA block cipher AVX2 optimization Dongsoo Lee
  2023-04-28 23:19 ` [PATCH 0/3] crypto: LEA block cipher implementation Eric Biggers
  3 siblings, 0 replies; 10+ messages in thread
From: Dongsoo Lee @ 2023-04-28 11:00 UTC (permalink / raw)
  To: linux-crypto
  Cc: Herbert Xu, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	David S. Miller, Dongsoo Lee, Dongsoo Lee

The test vectors were taken from the KCMVP:

- https://seed.kisa.or.kr/kisa/kcmvp/EgovVerification.do

Signed-off-by: Dongsoo Lee <letrhee@nsr.re.kr>
---
 crypto/tcrypt.c  |   73 +++
 crypto/testmgr.c |   32 ++
 crypto/testmgr.h | 1211 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1316 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 202ca1a3105d..bf6ea5821051 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1708,6 +1708,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 		ret = min(ret, tcrypt_test("cts(cbc(sm4))"));
 		break;
 
+	case 60:
+		ret = min(ret, tcrypt_test("gcm(lea)"));
+		break;
+
 	case 100:
 		ret = min(ret, tcrypt_test("hmac(md5)"));
 		break;
@@ -1855,6 +1859,12 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 		ret = min(ret, tcrypt_test("cfb(aria)"));
 		ret = min(ret, tcrypt_test("ctr(aria)"));
 		break;
+	case 193:
+		ret = min(ret, tcrypt_test("ecb(lea)"));
+		ret = min(ret, tcrypt_test("cbc(lea)"));
+		ret = min(ret, tcrypt_test("ctr(lea)"));
+		ret = min(ret, tcrypt_test("xts(lea)"));
+		break;
 	case 200:
 		test_cipher_speed("ecb(aes)", ENCRYPT, sec, NULL, 0,
 				speed_template_16_24_32);
@@ -2222,6 +2232,39 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 				   speed_template_16, num_mb);
 		break;
 
+	case 230:
+		test_cipher_speed("ecb(lea)", ENCRYPT, sec, NULL, 0,
+				  speed_template_16_24_32);
+		test_cipher_speed("ecb(lea)", DECRYPT, sec, NULL, 0,
+				  speed_template_16_24_32);
+		test_cipher_speed("cbc(lea)", ENCRYPT, sec, NULL, 0,
+				  speed_template_16_24_32);
+		test_cipher_speed("cbc(lea)", DECRYPT, sec, NULL, 0,
+				  speed_template_16_24_32);
+		test_cipher_speed("ctr(lea)", ENCRYPT, sec, NULL, 0,
+				  speed_template_16_24_32);
+		test_cipher_speed("ctr(lea)", DECRYPT, sec, NULL, 0,
+				  speed_template_16_24_32);
+		test_cipher_speed("xts(lea)", ENCRYPT, sec, NULL, 0,
+				  speed_template_32_48_64);
+		test_cipher_speed("xts(lea)", DECRYPT, sec, NULL, 0,
+				  speed_template_32_48_64);
+		break;
+
+	case 231:
+		test_aead_speed("gcm(lea)", ENCRYPT, sec,
+				NULL, 0, 16, 8, speed_template_16_24_32);
+		test_aead_speed("gcm(lea)", DECRYPT, sec,
+				NULL, 0, 16, 8, speed_template_16_24_32);
+		break;
+
+	case 232:
+		test_mb_aead_speed("gcm(lea)", ENCRYPT, sec, NULL, 0, 16, 8,
+				   speed_template_16, num_mb);
+		test_mb_aead_speed("gcm(lea)", DECRYPT, sec, NULL, 0, 16, 8,
+				   speed_template_16, num_mb);
+		break;
+
 	case 300:
 		if (alg) {
 			test_hash_speed(alg, sec, generic_hash_speed_template);
@@ -2657,6 +2700,21 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 				   speed_template_16_24_32);
 		break;
 
+	case 520:
+		test_acipher_speed("ecb(lea)", ENCRYPT, sec, NULL, 0,
+				   speed_template_16_24_32);
+		test_acipher_speed("ecb(lea)", DECRYPT, sec, NULL, 0,
+				   speed_template_16_24_32);
+		test_acipher_speed("ctr(lea)", ENCRYPT, sec, NULL, 0,
+				   speed_template_16_24_32);
+		test_acipher_speed("ctr(lea)", DECRYPT, sec, NULL, 0,
+				   speed_template_16_24_32);
+		test_acipher_speed("xts(lea)", ENCRYPT, sec, NULL, 0,
+				   speed_template_32_48_64);
+		test_acipher_speed("xts(lea)", DECRYPT, sec, NULL, 0,
+				   speed_template_32_48_64);
+		break;
+
 	case 600:
 		test_mb_skcipher_speed("ecb(aes)", ENCRYPT, sec, NULL, 0,
 				       speed_template_16_24_32, num_mb);
@@ -2880,6 +2938,21 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 				       speed_template_16_32, num_mb);
 		break;
 
+	case 611:
+		test_mb_skcipher_speed("ecb(lea)", ENCRYPT, sec, NULL, 0,
+				       speed_template_16_32, num_mb);
+		test_mb_skcipher_speed("ecb(lea)", DECRYPT, sec, NULL, 0,
+				       speed_template_16_32, num_mb);
+		test_mb_skcipher_speed("ctr(lea)", ENCRYPT, sec, NULL, 0,
+				       speed_template_16_32, num_mb);
+		test_mb_skcipher_speed("ctr(lea)", DECRYPT, sec, NULL, 0,
+				       speed_template_16_32, num_mb);
+		test_mb_skcipher_speed("xts(lea)", ENCRYPT, sec, NULL, 0,
+				       speed_template_32_64, num_mb);
+		test_mb_skcipher_speed("xts(lea)", DECRYPT, sec, NULL, 0,
+				       speed_template_32_64, num_mb);
+		break;
+
 	}
 
 	return ret;
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 216878c8bc3d..7b8a53c2da2a 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -4539,6 +4539,12 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.cipher = __VECS(des3_ede_cbc_tv_template)
 		},
+	}, {
+		.alg = "cbc(lea)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = __VECS(lea_cbc_tv_template)
+		},
 	}, {
 		/* Same as cbc(aes) except the key is stored in
 		 * hardware secure memory which we reference by index
@@ -4742,6 +4748,12 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.cipher = __VECS(des3_ede_ctr_tv_template)
 		}
+	}, {
+		.alg = "ctr(lea)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = __VECS(lea_ctr_tv_template)
+		}
 	}, {
 		/* Same as ctr(aes) except the key is stored in
 		 * hardware secure memory which we reference by index
@@ -5029,6 +5041,12 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.cipher = __VECS(khazad_tv_template)
 		}
+	}, {
+		.alg = "ecb(lea)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = __VECS(lea_tv_template)
+		}
 	}, {
 		/* Same as ecb(aes) except the key is stored in
 		 * hardware secure memory which we reference by index
@@ -5199,6 +5217,13 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.aead = __VECS(aria_gcm_tv_template)
 		}
+	}, {
+		.alg = "gcm(lea)",
+		.generic_driver = "gcm_base(ctr(lea-generic),ghash-generic)",
+		.test = alg_test_aead,
+		.suite = {
+			.aead = __VECS(lea_gcm_tv_template)
+		}
 	}, {
 		.alg = "gcm(sm4)",
 		.generic_driver = "gcm_base(ctr(sm4-generic),ghash-generic)",
@@ -5720,6 +5745,13 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.cipher = __VECS(cast6_xts_tv_template)
 		}
+	}, {
+		.alg = "xts(lea)",
+		.generic_driver = "xts(ecb(lea-generic))",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = __VECS(lea_xts_tv_template)
+		}
 	}, {
 		/* Same as xts(aes) except the key is stored in
 		 * hardware secure memory which we reference by index
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 5ca7a412508f..9a97d852030c 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -30444,6 +30444,1217 @@ static const struct aead_testvec aria_gcm_tv_template[] = {
 	}
 };
 
+static const struct cipher_testvec lea_tv_template[] = {
+	{
+		.key	= "\x07\xab\x63\x05\xb0\x25\xd8\x3f"
+			  "\x79\xad\xda\xa6\x3a\xc8\xad\x00",
+		.klen	= 16,
+		.ptext	= "\xf2\x8a\xe3\x25\x6a\xad\x23\xb4"
+			  "\x15\xe0\x28\x06\x3b\x61\x0c\x60",
+		.ctext	= "\x64\xd9\x08\xfc\xb7\xeb\xfe\xf9"
+			  "\x0f\xd6\x70\x10\x6d\xe7\xc7\xc5",
+		.len	= 16,
+	}, {
+		.key	= "\x42\xaf\x3b\xcd\x6c\xbe\xaa\xef"
+			  "\xf1\xa7\xc2\x6e\x61\xcd\x2b\xde",
+		.klen	= 16,
+		.ptext	= "\x51\x83\xbe\x45\xfd\x20\x47\xce"
+			  "\x31\x51\x89\xc2\x69\xb4\x83\xb3"
+			  "\x37\xa2\xf2\xfb\xe5\x4c\x17\x65"
+			  "\x5b\x09\xba\x29\x44\xee\x6f\x1e"
+			  "\x6d\xa0\x18\x2b\x6d\x66\xab\xfe"
+			  "\x8b\x82\x36\x01\xdc\xc2\x20\x8a"
+			  "\xac\x52\xb1\x53\x1f\xd4\xd4\x29"
+			  "\x18\xb2\x1c\xe8\x5a\xb3\x06\xa6"
+			  "\xee\xcd\x7e\x2f\xc4\x3b\xa4\xb2"
+			  "\x9d\xcf\xcf\xb9\x27\x88\xd2\x5e",
+		.ctext	= "\xf3\xb6\xbf\x4a\xfb\xa7\x10\x3e"
+			  "\x32\xb2\xac\x2e\x7b\x46\xff\x91"
+			  "\xe8\x72\xbc\xbb\x93\xcf\x52\xe2"
+			  "\x94\xed\x55\x39\x87\x1c\x48\x93"
+			  "\xd1\x4c\x54\x08\x86\x46\xe2\xfd"
+			  "\x0b\x7c\x62\xd5\x83\xf3\xaf\x67"
+			  "\x18\xb0\xba\x83\xc7\xa2\x9e\x2f"
+			  "\x96\x2d\xf0\x60\x62\x12\x1c\x52"
+			  "\x1b\xb9\xe7\x6d\x70\x35\x07\x07"
+			  "\x19\xed\xfb\x40\x9c\x5b\x83\xc2",
+		.len	= 80,
+	}, {
+		.key	= "\x9b\x6f\x9f\xba\x56\xe9\x6a\xea"
+			  "\x53\x8b\xf8\x27\x2a\x9f\x39\x2d",
+		.klen	= 16,
+		.ptext	= "\xf6\xde\xcf\xab\xfd\x89\xce\xf4"
+			  "\x93\xb5\xc0\xf7\x3b\xe7\xed\x71"
+			  "\x10\xe0\xd9\x61\x63\xba\x0d\xbd"
+			  "\xa6\x34\x1a\x63\x88\x4b\xdc\x52"
+			  "\x62\x0a\xfc\x1a\xd2\xa2\xb8\x91"
+			  "\xa5\xbd\xe7\xc8\xfb\x10\x37\x3d"
+			  "\xa5\x2f\xba\x52\xd2\xa6\xa1\xfe"
+			  "\xeb\x45\x47\xc3\xbb\xbb\x71\xe0"
+			  "\xd3\x67\xd4\xc7\x2d\x6a\xd7\xd1"
+			  "\x0f\x01\x9b\x31\x32\x12\x38\x27"
+			  "\x24\x04\x4a\x76\xeb\xd4\xad\x17"
+			  "\xeb\x65\x84\x2f\x0a\x18\x80\x3f"
+			  "\x11\x9d\x5f\x9a\x55\x09\xb2\x1d"
+			  "\x98\x28\xe4\x1a\x2a\x14\x78\x95"
+			  "\x53\x06\x92\xb3\xf6\x6d\xb9\x6f"
+			  "\x6e\x3d\xdb\x8f\xbc\x8a\x91\xd6"
+			  "\xe4\x55\xa5\x7c\x94\xa6\xd2\xdb"
+			  "\x07\xdb\xca\x6b\x29\x3f\x7e\xf0"
+			  "\xfc\xde\x99\xf2\x3a\x98\x4d\x6e"
+			  "\x3c\x75\x53\xcb\x1a\x38\x2d\x0f",
+		.ctext	= "\x98\xd8\x5d\x7d\x0d\x13\x6a\x80"
+			  "\xce\x74\x86\x44\x69\xd7\x7a\x03"
+			  "\xef\x56\xec\x9b\x24\xa7\x11\x9d"
+			  "\xe0\x95\x08\xa0\x4d\x6f\x43\x7e"
+			  "\x67\x0b\x54\xb3\x6e\x2c\xbd\xe5"
+			  "\x1c\xdb\xd0\x1e\x2c\xea\x53\x33"
+			  "\x2c\x2a\x14\x87\x9f\xf7\x7e\x02"
+			  "\x00\x0a\x00\xf1\x59\xfb\x18\x65"
+			  "\xe7\xdb\xed\x54\x33\x57\x91\x7d"
+			  "\x78\x3f\x18\xb0\x6f\xd8\xef\xa6"
+			  "\x68\x6d\x2e\x36\x2b\xce\xde\x94"
+			  "\xbb\x76\x87\xec\xfd\x75\x01\xb7"
+			  "\x9f\x91\x27\x40\x84\x06\x83\x72"
+			  "\x24\x66\x44\x0d\x24\x0e\xf0\x35"
+			  "\x56\x04\xbf\xcf\xbc\x30\xf1\x6f"
+			  "\x03\xd0\x05\x43\x58\x2a\x52\x71"
+			  "\x85\x26\x07\x93\x55\x16\x4e\x6b"
+			  "\x8c\xec\x36\xe3\x46\xb9\x09\x2d"
+			  "\x97\x06\xc4\x89\x46\xc4\x97\x62"
+			  "\x9c\x9c\x90\x55\xd9\xd8\x97\x77",
+		.len	= 160,
+	}, {
+		.key	= "\x14\x37\xaf\x53\x30\x69\xbd\x75"
+			  "\x25\xc1\x56\x0c\x78\xba\xd2\xa1"
+			  "\xe5\x34\x67\x1c\x00\x7e\xf2\x7c",
+		.klen	= 24,
+		.ptext	= "\x1c\xb4\xf4\xcb\x6c\x4b\xdb\x51"
+			  "\x68\xea\x84\x09\x72\x7b\xfd\x51",
+		.ctext	= "\x69\x72\x5c\x6d\xf9\x12\xf8\xb7"
+			  "\x0e\xb5\x11\xe6\x66\x3c\x58\x70",
+		.len	= 16,
+	}, {
+		.key	= "\x5e\xdc\x34\x69\x04\xb2\x96\xcf"
+			  "\x6b\xf3\xb4\x18\xe9\xab\x35\xdb"
+			  "\x0a\x47\xa1\x11\x33\xa9\x24\xca",
+		.klen	= 24,
+		.ptext	= "\x85\x7c\x8f\x1f\x04\xc5\xa0\x68"
+			  "\xf9\xbb\x83\xaf\x95\xd9\x98\x64"
+			  "\xd6\x31\x77\x51\xaf\x03\x32\xd1"
+			  "\x63\x8e\xda\x3d\x32\x26\x44\xa8"
+			  "\x37\x87\x0c\xcc\x91\x69\xdb\x43"
+			  "\xc1\x55\xe6\xfb\x53\xb6\xb7\xe4"
+			  "\xc1\x33\x30\xeb\x94\x3c\xcd\x2c"
+			  "\xcc\xe3\x29\x63\x82\xee\xc4\xa4"
+			  "\xcc\x2a\x03\x4d\xe1\x02\x78\x38"
+			  "\x7d\x4f\x64\x35\x87\x72\x7a\xb7",
+		.ctext	= "\x72\x22\x3a\x93\x94\x2f\x73\x59"
+			  "\xfe\x5e\x51\x6a\x05\xc8\xe8\x41"
+			  "\xc5\x9b\xb7\x47\x14\x80\x9b\x13"
+			  "\xa9\x75\x7b\x82\x93\xf9\xb0\xb4"
+			  "\x20\xd1\xc5\xa4\xf4\x40\xf3\x65"
+			  "\xd0\x8f\x94\x25\xe3\x47\xb5\xdd"
+			  "\x23\xa9\xed\x05\xf2\xce\x16\x18"
+			  "\xcc\xb0\x9e\x71\x2c\x59\xb9\x7b"
+			  "\x76\x74\x51\x7f\xc8\x75\xae\x9f"
+			  "\x6f\x18\x8b\xfa\x5a\x42\xba\xc9",
+		.len	= 80,
+	}, {
+		.key	= "\x51\x4b\x8b\xf1\x41\xf5\x60\x41"
+			  "\x24\x13\xed\x1e\x40\xe3\x4e\xc2"
+			  "\x3a\x89\xe9\x90\x36\xa4\xac\x4a",
+		.klen	= 24,
+		.ptext	= "\x3e\x25\x96\x84\xe8\x61\x79\x59"
+			  "\x33\x65\xfe\x5c\xb3\x89\xe9\xd1"
+			  "\xee\x48\x9e\x1e\x05\x4e\xe4\x7c"
+			  "\x97\xd3\xea\xf2\xe2\x28\x88\x84"
+			  "\x2b\x8f\xc6\xa8\x60\x50\xa2\xf9"
+			  "\xfd\x09\x0e\x2f\x2c\x46\x39\x4f"
+			  "\x30\x51\x0f\x1f\x03\x4c\x03\xdd"
+			  "\x3e\x7c\x0c\x30\x3a\xe8\xed\x5f"
+			  "\x75\x23\xba\xc1\x37\x66\x98\x75"
+			  "\x75\xe1\xc4\x52\xf5\x53\xd7\x21"
+			  "\xb3\xd9\x48\x0a\x84\x03\x32\x4d"
+			  "\xf9\x2d\x57\x33\x86\x0d\x66\x43"
+			  "\xe3\x88\x79\xb8\xb3\xca\xe2\x33"
+			  "\x64\x95\x27\xae\x56\xd9\x4b\xb1"
+			  "\x3f\x86\x4f\xc8\xce\x9e\xf9\x34"
+			  "\x8e\x8e\xd4\xe1\x0e\xbe\x78\x98"
+			  "\x3f\x67\x0b\x76\x1d\xa5\x08\x9d"
+			  "\x91\xcd\x3f\x29\x96\x00\x1e\x66"
+			  "\x9c\x00\x2e\x40\x29\x43\xe0\xfa"
+			  "\xc6\x46\x8a\x23\x19\x24\xad\xc6",
+		.ctext	= "\x62\x39\x86\x7f\x34\xd5\x7b\x91"
+			  "\x72\x94\x10\xf9\x37\x97\xc6\x9e"
+			  "\x45\x52\x6f\x13\x40\x5e\xc2\x22"
+			  "\xed\xfa\xe6\x82\xb6\xc2\xd7\x5b"
+			  "\x33\x24\x30\xd3\x0b\xc2\x47\x97"
+			  "\x35\xec\xcd\x3b\xd9\x85\x65\x7e"
+			  "\xc9\x65\xeb\x93\x39\x4b\xd8\x8c"
+			  "\xdc\xe7\xa7\x6b\xe8\x12\x55\xab"
+			  "\x34\x18\xd5\x70\x82\x77\x01\x29"
+			  "\xc3\x48\x2a\x2b\x1e\x51\xf1\x4e"
+			  "\x2c\x69\xa2\x4e\x64\x05\x94\x44"
+			  "\x87\xb0\x85\x54\xd7\x5a\x35\x04"
+			  "\x3d\x71\x3b\xad\x56\x43\xf6\xc4"
+			  "\xfc\x1c\x5c\xf2\x2b\x3c\x72\x47"
+			  "\x9d\xd0\x60\xab\x92\xb4\xda\x51"
+			  "\xb7\x6d\xca\x85\x57\x69\x14\x36"
+			  "\x08\xa9\x2a\xe8\xde\xd6\x84\xa8"
+			  "\xa6\xd0\x93\x76\x5f\x41\x49\xcf"
+			  "\x1a\x37\x53\xb8\x49\x36\x8e\x99"
+			  "\xd0\x66\xd2\xf7\x11\xc2\x7f\x75",
+		.len	= 160,
+	}, {
+		.key	= "\x4f\x67\x79\xe2\xbd\x1e\x93\x19"
+			  "\xc6\x30\x15\xac\xff\xef\xd7\xa7"
+			  "\x91\xf0\xed\x59\xdf\x1b\x70\x07"
+			  "\x69\xfe\x82\xe2\xf0\x66\x8c\x35",
+		.klen	= 32,
+		.ptext	= "\xdc\x31\xca\xe3\xda\x5e\x0a\x11"
+			  "\xc9\x66\xb0\x20\xd7\xcf\xfe\xde",
+		.ctext	= "\xed\xa2\x04\x20\x98\xf6\x67\xe8"
+			  "\x57\xa0\x2d\xb8\xca\xa7\xdf\xf2",
+		.len	= 16,
+	}, {
+		.key	= "\x90\x98\x09\xcb\x38\x09\xbc\xdd"
+			  "\xb9\x9a\x08\x3d\x12\x61\x7b\xca"
+			  "\xf7\x53\x06\x45\x73\x5a\xbc\x04"
+			  "\xd2\xa8\xd7\xea\xbe\x4a\xfc\x96",
+		.klen	= 32,
+		.ptext	= "\xa8\x00\xc0\xdb\x6a\x4c\x6a\x70"
+			  "\x2a\xc9\xfa\xe9\x81\xbe\x6b\xe6"
+			  "\xdc\xf3\x36\x8b\x23\xc3\x17\x30"
+			  "\x99\x73\x13\x59\x04\xc2\xba\xe8"
+			  "\x0d\xc1\xaa\x91\xe9\xe5\x54\x8f"
+			  "\x39\x5b\x03\x95\x2f\x9b\x1a\x08"
+			  "\xf3\x40\x9c\x6b\x45\x17\xf2\x1b"
+			  "\x63\x76\xe9\x3c\x2d\xcf\xfb\xf3"
+			  "\x87\x84\xcf\xd5\xff\xfd\x03\xa0"
+			  "\xb0\xf9\x28\x29\x65\x21\x0e\x96",
+		.ctext	= "\x2a\x50\xfa\x90\xed\x00\xeb\xfa"
+			  "\x11\x88\xcc\x91\x13\xdd\x43\x37"
+			  "\xb3\x80\xd5\xf8\xc1\x58\x2c\x80"
+			  "\x77\xec\x67\x28\xec\x31\x8a\xb4"
+			  "\x5d\xe5\xef\xd1\xd0\xa6\x2e\x4e"
+			  "\x87\x03\x52\x83\x2b\xec\x22\x3d"
+			  "\x8d\x5d\xcd\x39\x72\x09\xc8\x24"
+			  "\xe4\xa9\x57\xf6\x5d\x78\x5b\xa5"
+			  "\xd7\xf9\xa4\xcc\x5d\x0b\x35\x35"
+			  "\x28\xdb\xcc\xa6\x35\x48\x66\x8a",
+		.len	= 80,
+	}, {
+		.key	= "\xde\x49\x23\xf2\x61\xac\x74\xcf"
+			  "\x97\xe4\x81\xce\x67\x4a\x0b\x3c"
+			  "\x3e\xa9\x82\x55\xb1\x50\xcb\xff"
+			  "\x64\x66\x41\xb9\x2a\x7e\xfa\xce",
+		.klen	= 32,
+		.ptext	= "\x6d\x6b\x4b\xce\xd1\x56\x8e\x3e"
+			  "\x14\x0e\x22\x8f\x39\x9e\xb4\x4d"
+			  "\xe5\x25\xbd\x99\x09\xe2\x4c\xd9"
+			  "\xc1\x8f\x06\xae\x7c\xf0\x6b\x27"
+			  "\x5e\xab\x5b\x34\xe2\x5a\xd8\x5d"
+			  "\xc4\xdf\x0d\xb3\x1e\xf7\x8f\x07"
+			  "\xd1\x13\xe4\x5b\x26\x63\x42\x96"
+			  "\xb5\x33\x98\x7c\x86\x7a\xd3\xdc"
+			  "\x77\xb0\x5a\x0b\xdd\xe1\xda\x92"
+			  "\x6e\x00\x49\x24\x5f\x7d\x25\xd3"
+			  "\xc9\x19\xfd\x83\x51\xfa\x33\x9e"
+			  "\x08\xfa\x00\x09\x90\x45\xb8\x57"
+			  "\x81\x23\x50\x3d\x0a\x12\x1d\x46"
+			  "\xdc\x18\xde\xc8\x43\x57\xfd\x17"
+			  "\x96\xe2\x12\xf8\xd2\xcf\xa9\x59"
+			  "\x82\x8e\x45\x3f\xe2\x79\xa5\xff"
+			  "\x43\xab\x45\xb1\xb1\x16\x28\xe2"
+			  "\xd4\xd0\xd5\x89\x14\xae\xa0\x3c"
+			  "\x00\x14\x2a\xa4\xf1\x0b\x2b\x2f"
+			  "\xea\x94\x6f\x04\xc3\x3d\x1f\x3c",
+		.ctext	= "\xb7\x3a\x00\x64\xa4\x29\xeb\xe6"
+			  "\xa7\xcf\x35\xd7\xad\xb9\x4f\x24"
+			  "\xa2\xa0\xff\x7a\x1d\x83\x55\x22"
+			  "\x45\x3a\x67\xeb\x8f\xb4\xfe\xd6"
+			  "\x3d\xa5\x1d\x96\x34\xff\x4c\x70"
+			  "\xa7\x64\xdf\x3f\x6f\x37\x63\xe0"
+			  "\xd3\x84\x56\x30\x77\x42\x19\xa8"
+			  "\x19\xc2\x6e\xad\xfd\x3b\x93\x19"
+			  "\x99\x35\xa9\x5b\xd4\xa9\x51\xd4"
+			  "\x46\x77\x23\xe1\x2f\xba\x1c\xa4"
+			  "\xe8\xb1\x35\xfa\x1f\xb9\xed\x9f"
+			  "\xaa\x7f\xdc\x79\xd2\x85\x7f\x78"
+			  "\xac\x8d\x8c\x39\xc1\x1d\x33\xd0"
+			  "\xae\x58\xb6\xe5\xe0\xef\x78\x19"
+			  "\x5c\x0c\x82\x14\xab\x7d\x3a\x82"
+			  "\xb9\x1f\x9a\x7b\xbe\x89\xd6\xa0"
+			  "\x79\x6e\x9d\xeb\xc6\x9a\xee\x88"
+			  "\x11\x01\x1b\x9d\x48\xee\xcd\x8d"
+			  "\xb7\xbf\x71\x56\x6e\xa6\xd8\xa0"
+			  "\x85\x8e\x59\x64\x32\xe1\x80\x3d",
+		.len	= 160,
+	},
+};
+
+static const struct cipher_testvec lea_cbc_tv_template[] = {
+	{
+		.key	= "\x87\xf1\x42\x4f\x1a\x14\x83\xcc"
+			  "\x1f\xd0\x35\x4e\x18\xa9\x94\xab",
+		.klen	= 16,
+		.iv	= "\xcf\x58\x4e\x6e\xf6\xd6\x42\x88"
+			  "\x0a\xb7\x87\x42\x7d\xb9\xb0\x76",
+		.ptext	= "\x13\x9d\x4e\xff\x8d\x35\xb7\x6e"
+			  "\x85\xbf\x06\xfe\x99\x71\x63\xcb",
+		.ctext	= "\x49\xb9\xf3\x22\x6d\xa5\x4b\x4a"
+			  "\x0d\x38\x5a\x9c\x48\x70\x52\x4b",
+		.len	= 16,
+	}, {
+		.key	= "\x73\x01\x97\xc9\x42\xd9\x7f\xf9"
+			  "\x38\xa8\x3f\x77\xc4\x34\x4e\x6d",
+		.klen	= 16,
+		.iv	= "\xb6\x17\xb2\x59\xed\xcd\xc6\xbb"
+			  "\x2f\x0c\x3a\x10\x58\x53\x5b\x04",
+		.ptext	= "\xb7\xc6\x95\xe4\xb5\x39\x36\x52"
+			  "\xb7\x8b\x74\x3c\x46\x35\xb2\x0f"
+			  "\x6e\x22\xff\x27\x63\xc2\xe0\x8b"
+			  "\x6b\x5a\x4f\xd7\xf7\x9e\x03\x79"
+			  "\x13\x81\xf2\x20\x01\x4c\x15\x72"
+			  "\x21\xed\x6b\xfe\x15\x92\x40\x71"
+			  "\x21\x77\xaf\x0c\xd8\xfc\x66\x55"
+			  "\xf5\xfb\xa9\x0d\x87\x58\x9a\x63"
+			  "\x51\xda\xb7\x67\x70\x39\xa4\xc1"
+			  "\x3e\x78\x2b\xa3\x77\x74\x81\xfc",
+		.ctext	= "\x7c\x96\xf9\x67\x5b\xe0\x38\x54"
+			  "\x70\x0d\xea\xe5\x10\x06\xf4\xfc"
+			  "\xfc\x3a\xda\x33\xba\xe2\x0d\x4f"
+			  "\xf6\x13\xfa\x6b\xa8\x74\xb1\x75"
+			  "\xb7\xde\x71\xdc\xf8\x7a\x18\x26"
+			  "\x7b\x57\x74\x10\xf0\xe8\xb9\xdf"
+			  "\x1e\x05\x37\xa5\x60\xe5\xd1\xef"
+			  "\xfe\xc1\x10\x22\xce\x60\x23\xb4"
+			  "\x98\x5c\x9d\x8d\xa2\x07\x33\x70"
+			  "\x7c\xe7\x6a\x42\x35\x82\xaf\x23",
+		.len	= 80,
+	}, {
+		.key	= "\xb2\x10\x06\xa2\x47\x18\xd6\xbf"
+			  "\x8a\xc5\xad\xdb\x90\xe5\xf4\x4d",
+		.klen	= 16,
+		.iv	= "\xa5\xa6\xf3\xce\xee\xaa\x93\x2d"
+			  "\x4c\x59\x68\x45\x82\x7b\xee\x2d",
+		.ptext	= "\x9b\x06\x13\xae\x86\x34\xf6\xfa"
+			  "\x04\xd9\xef\x9a\xc4\xf4\xcf\xa9"
+			  "\xcb\x84\x69\x40\x1a\x9d\x51\x31"
+			  "\x8b\xba\xe3\xf8\xfd\x55\x87\xee"
+			  "\xb0\xb5\x34\xc0\xf2\x08\x33\x20"
+			  "\xfc\xb1\x26\xba\x17\xe3\x48\x6a"
+			  "\x03\x6f\xf6\xac\x98\xda\x6f\x54"
+			  "\xae\xb3\xd8\x7f\x3b\x23\x83\xc9"
+			  "\xbb\xc6\x70\xc0\xd5\xb9\x14\x99"
+			  "\x3b\xf5\x5a\x22\xd2\xdb\xe8\xf8"
+			  "\x13\x0f\xa3\xfa\xb1\x8a\x75\xfd"
+			  "\x7b\xeb\x4e\xc2\x85\x0e\x68\x25"
+			  "\x82\xe0\xd0\x96\x75\x72\x22\xcd"
+			  "\x89\x4c\x93\xba\x3c\x03\x35\xbb"
+			  "\xc3\x0e\x77\x12\xaa\xd5\xeb\x96"
+			  "\xbc\x0b\x4d\xa8\x22\x3e\xc0\x69"
+			  "\xcf\xac\x5a\x2b\x1b\x59\xe3\x25"
+			  "\xad\x5e\xda\x6a\x9f\x84\xb9\x1c"
+			  "\xdd\x11\x7b\xdc\xce\xe2\x5a\x86"
+			  "\x37\xba\xdd\x1b\x5c\xda\x12\xff",
+		.ctext	= "\xb2\x25\x29\xec\xc4\x7d\x73\xca"
+			  "\x8c\xf2\x05\xbe\x8e\x88\x94\x77"
+			  "\xd0\x2f\xb6\x5c\x99\x23\x64\x2f"
+			  "\x67\x4f\xaf\x76\x69\x82\x6c\x97"
+			  "\x8f\xb4\x8a\xc7\xdd\x1b\xbe\x01"
+			  "\x35\x07\xdf\xb9\x0f\x0d\x6b\xab"
+			  "\x59\x8f\xdd\x34\xc6\x93\xb1\x66"
+			  "\x13\xf2\xb4\x78\xc0\x1d\xff\xc4"
+			  "\xb7\x0b\x44\x85\xbb\x93\x43\x0e"
+			  "\x40\xe6\xbc\x0e\xbb\xf3\x53\xce"
+			  "\xe5\x1b\x92\xd6\xb4\xa0\x10\xf0"
+			  "\x4b\x1f\xbe\x7c\x2f\x4f\x6f\x24"
+			  "\x69\xa2\xe4\x4b\xad\x79\x68\xf7"
+			  "\xf9\x23\xb8\x31\x6c\x21\xfd\xf8"
+			  "\x47\xe5\x34\x0e\x10\x95\x20\x9b"
+			  "\xfa\xa9\x1e\xa7\x0a\x5a\xc6\x3a"
+			  "\x39\x39\xf9\x92\xed\xe2\x4e\x8d"
+			  "\xba\x21\x24\x50\x88\x80\x89\x8a"
+			  "\xd3\x20\x87\x0f\x74\x7d\x5c\xe6"
+			  "\xc7\x75\xe5\xcf\xf7\xc4\x2d\xca",
+		.len	= 160,
+	}, {
+		.key	= "\x68\xd2\x18\x65\x0e\x96\xe1\x07"
+			  "\x71\xd4\x36\x1a\x41\x85\xfc\x81"
+			  "\x27\xc3\xb5\x41\x64\xda\x4a\x35",
+		.klen	= 24,
+		.iv	= "\xb5\xa1\x07\x03\x79\x0b\xe7\x4e"
+			  "\x15\xf9\x12\x2d\x98\x52\xa4\xdc",
+		.ptext	= "\x9b\x56\xb0\xb2\x6c\x2f\x85\x53"
+			  "\x6b\xc9\x2f\x27\xb3\xe4\x41\x0b",
+		.ctext	= "\x72\x86\x6a\xa8\xe3\xf1\xa4\x44"
+			  "\x96\x18\xc8\xcf\x62\x3d\x9b\xbe",
+		.len	= 16,
+	}, {
+		.key	= "\xc2\xe6\x6b\xb9\x2b\xf6\xa3\x1f"
+			  "\x12\x35\x44\x5e\x2f\x92\x57\xed"
+			  "\x6c\x59\xc3\xa5\x8f\x4c\x13\x76",
+		.klen	= 24,
+		.iv	= "\x1a\xf6\x79\x59\x6f\x3c\x13\x85"
+			  "\x38\x35\x6e\xe6\x06\x3c\x49\xcb",
+		.ptext	= "\x38\x43\x9b\xdf\x1f\x6a\xd7\x5a"
+			  "\x60\xd0\x6e\x78\x99\xa8\x95\x2b"
+			  "\x47\x90\x4a\x0c\xe7\x1f\x91\x98"
+			  "\x5b\xbd\x04\x99\x90\xb8\x8a\xe2"
+			  "\x5e\x94\x67\x3f\xaf\xa2\x75\xac"
+			  "\xe4\xd4\xb0\xc5\x74\xcf\xf8\x7e"
+			  "\xd6\x42\x13\x14\xa2\x76\xf2\x44"
+			  "\xf3\x27\x35\xba\x0f\x93\xf1\xcc"
+			  "\x4a\xd0\xb0\x68\x27\x62\xb9\x4b"
+			  "\xc1\x0d\x92\x74\x69\xe8\xc4\xd9",
+		.ctext	= "\x96\xbe\x15\xc3\xb8\xd1\x47\x3b"
+			  "\x4a\x3c\xb8\xf5\x25\x83\xb1\xad"
+			  "\x80\x4f\xe4\x6d\xc1\x43\xfd\x26"
+			  "\xc3\x8c\x4b\x01\x9c\x10\xd6\x0f"
+			  "\x68\x15\x82\x50\x95\x32\xe5\x86"
+			  "\xcc\x23\x71\x8b\x7b\xd7\x50\x45"
+			  "\xd5\x77\xf8\xe7\x78\xca\x4b\xf0"
+			  "\x27\x8e\xb2\x5a\xb7\xcd\x67\x08"
+			  "\x00\xc5\xec\x88\x32\xfe\x91\xb8"
+			  "\x4e\x56\xab\x58\xde\xe8\x49\xa8",
+		.len	= 80,
+	}, {
+		.key	= "\x60\x4f\xeb\x8b\x42\x88\xe6\xee"
+			  "\x61\x96\xba\xb9\x66\x91\xed\xed"
+			  "\xa4\x8c\x1d\x41\x43\x23\x41\x5b",
+		.klen	= 24,
+		.iv	= "\x9d\x53\x31\x46\xe8\x8f\x69\x21"
+			  "\x16\x0f\x09\x14\xf9\x6c\x21\x89",
+		.ptext	= "\xab\x6a\x2c\x98\x2d\x14\xda\xc2"
+			  "\x4e\x0f\x13\xe3\xce\x28\x38\x62"
+			  "\xc4\x2f\xac\xab\x3d\x08\x93\xdf"
+			  "\x26\xff\xd9\xc9\x6c\x5c\x76\x15"
+			  "\x61\x37\xf1\xbc\x62\x8e\x23\xc3"
+			  "\xb7\x95\x3e\x25\xba\x4d\x0e\x0e"
+			  "\x3b\x58\x7e\x49\x24\x0c\x5d\xfc"
+			  "\x59\xc6\x62\x93\xe2\x81\x6e\xfa"
+			  "\x4c\xa7\x12\x0f\x4c\x26\x51\x57"
+			  "\xa6\xc7\xa7\xef\x4d\xbc\x4a\xc6"
+			  "\xcc\x77\xaf\x0a\xe4\xc3\x50\xe0"
+			  "\x77\x0d\xad\x58\xa5\x02\x90\xa0"
+			  "\x34\x60\x96\x78\x35\x05\xeb\xe5"
+			  "\xe4\x4d\x55\x2a\xd1\x9a\x74\xf4"
+			  "\x3d\x34\x48\xd5\xc7\x54\xf3\xf3"
+			  "\x48\x7b\xc0\x02\xfb\x08\x65\x6f"
+			  "\xe1\x0a\x85\xde\x63\x53\x79\xd7"
+			  "\x3a\xce\x50\xbc\x8c\x12\x14\xff"
+			  "\x57\x36\x4f\x91\x13\xe7\xce\x9e"
+			  "\x93\xb9\xa5\x77\x2d\xbb\x74\xd0",
+		.ctext	= "\x55\x6b\xda\xdc\x75\x31\xee\xe8"
+			  "\x88\xf6\xde\x47\x8f\xb3\x74\x0f"
+			  "\xa2\xbd\x15\x22\x08\x76\x74\xf2"
+			  "\xc6\xe1\x64\xdc\x6f\xb6\x08\x7c"
+			  "\x41\x6b\xcc\x7c\x25\x29\x54\x78"
+			  "\x25\x9d\x4e\xbb\xec\xfd\x42\xd3"
+			  "\x2b\x97\x23\x9e\x45\x91\x02\x68"
+			  "\x0a\x19\x79\x82\xab\x3e\xd6\xd7"
+			  "\x32\xd2\xbc\x8a\x2e\x37\x35\x58"
+			  "\xb4\xc5\xe1\xc9\x12\x30\xb7\x76"
+			  "\xcb\x1f\x02\x60\x78\xbc\xa9\x10"
+			  "\x4c\xf2\x19\xbc\x96\x06\x5e\xef"
+			  "\x44\xda\x86\xa4\xa3\xaa\x99\xf2"
+			  "\xec\xb9\xa6\x09\xd8\x5c\x6f\x4f"
+			  "\x19\x07\xb7\x1d\x49\xdf\x55\x2b"
+			  "\xd1\x43\x43\xb2\xc6\x79\x75\x19"
+			  "\x6a\x25\xd8\xa2\xaf\xdc\x96\xd3"
+			  "\x78\x9e\xeb\x38\x3f\x4d\x5c\xce"
+			  "\x42\x02\x7a\xdb\xcd\xc3\x42\xa3"
+			  "\x41\xc0\x19\x45\xc0\xb3\x89\x95",
+		.len	= 160,
+	}, {
+		.key	= "\x1a\x4e\xe8\x2b\x1f\x37\x84\x94"
+			  "\x6d\xf2\xa1\x8f\xc7\x49\xb3\x4f"
+			  "\xe2\x26\xcf\x28\x11\xa6\x6a\x47"
+			  "\x22\x6e\x64\xa1\x82\x42\x45\x29",
+		.klen	= 32,
+		.iv	= "\xa8\xd4\xc6\x46\xb1\xd9\x93\x84"
+			  "\x48\x62\x4f\x8a\xc9\x6a\xd8\x4c",
+		.ptext	= "\xa6\xab\xcd\x81\x09\xb7\x4e\x58"
+			  "\xbb\x43\x03\x66\x44\xc6\x60\xe3",
+		.ctext	= "\x91\xee\x72\xe8\xe2\x6f\xa4\x23"
+			  "\x49\x77\xe4\x64\xca\x48\x72\xca",
+		.len	= 16,
+	}, {
+		.key	= "\x50\x81\xcf\xf8\x35\x84\xf4\x3b"
+			  "\x8b\x60\x07\x4f\xb2\x05\x08\xbb"
+			  "\x60\x63\xf9\x0b\x44\x7c\xa0\x80"
+			  "\xe9\xbd\x88\x06\xde\x8e\x49\x66",
+		.klen	= 32,
+		.iv	= "\x14\x28\x09\xbd\x87\xa6\x43\x2d"
+			  "\x20\x5f\xc7\xd2\xda\x74\x02\xf8",
+		.ptext	= "\x25\xa5\x80\x8b\x88\x69\xaf\xce"
+			  "\x89\x3d\xe6\x50\xd1\x3c\xa5\x1d"
+			  "\x8c\xf0\x1f\x31\x0f\x68\xf5\x32"
+			  "\xbd\x8a\x45\x5e\x2b\xab\xe3\xc2"
+			  "\x82\x5d\xe6\xac\x25\x88\x67\x64"
+			  "\x94\xbd\x85\x17\x91\xc6\xac\x14"
+			  "\x81\x82\x18\x3b\x14\xf0\x94\xb1"
+			  "\x28\x89\x88\xd9\xeb\xd3\x32\x80"
+			  "\x40\x33\x34\x58\x65\x02\x4f\xa8"
+			  "\xd2\xe4\x6e\x41\x64\x55\xe6\xb4",
+		.ctext	= "\xee\x57\xd3\x98\x7e\x62\xcf\x04"
+			  "\xbb\x11\x21\x91\x20\xb4\xa3\x92"
+			  "\x16\x86\xaf\xa1\x86\x9b\x8a\x4c"
+			  "\x43\x7f\xaf\xcc\x87\x99\x6a\x04"
+			  "\xc0\x06\xb0\xc0\x4d\xe4\x98\xb2"
+			  "\x4b\x24\x34\x87\x3d\x70\xdb\x57"
+			  "\xe3\x71\x8c\x09\x16\x9e\x56\xd0"
+			  "\x9a\xc4\xb7\x25\x40\xcc\xc3\xed"
+			  "\x6d\x23\x11\x29\x39\x8a\x71\x75"
+			  "\x0c\x8f\x0c\xe4\xe4\x2b\x93\x59",
+		.len	= 80,
+	}, {
+		.key	= "\x26\x7e\x63\x9d\x25\x19\x08\x8a"
+			  "\x05\xbd\x8a\xf4\x31\x3c\x47\x55"
+			  "\x88\x06\xb9\xcb\x03\x42\x40\xc8"
+			  "\x98\x1d\x21\x0b\x5e\x62\xce\xcf",
+		.klen	= 32,
+		.iv	= "\xf1\x4c\x68\x42\x18\x98\x82\x38"
+			  "\xa5\xdd\x28\x21\x9d\x20\x1f\x38",
+		.ptext	= "\x99\xa3\x6f\xfe\x6c\xff\x1f\xe7"
+			  "\x06\x72\x40\x53\x99\x7a\x2d\xbf"
+			  "\xfa\xa3\x10\x3d\x49\x9d\xa8\x21"
+			  "\xd4\x91\x4a\xfe\x39\xb5\x26\xd1"
+			  "\xcb\x1f\xcc\x7b\x37\xd7\xef\x75"
+			  "\x68\x2f\x68\xbf\xa7\x57\x7d\x19"
+			  "\x07\x2c\x64\x76\x00\x51\x03\xae"
+			  "\x5a\x81\xfa\x73\x4c\x23\xe3\x86"
+			  "\xe6\x1f\xd8\x2a\xac\xf1\x36\xda"
+			  "\x84\xfc\xa1\x37\xd2\x20\x49\x44"
+			  "\xe1\x8e\x6b\xd5\x85\xdb\x1a\x45"
+			  "\xfe\x54\x3f\x68\x20\x92\xdf\xc0"
+			  "\xb1\x4e\x9c\xf4\x13\x76\x7f\x7d"
+			  "\x22\x7f\xf4\xa3\x60\xfe\x16\xa8"
+			  "\x50\x72\x2d\x43\x1f\x64\x75\x50"
+			  "\x89\xb3\x22\xc5\xfb\x29\xa0\xe8"
+			  "\xf5\x51\x1f\xbf\xb3\x8d\x4f\xc8"
+			  "\x0c\x63\x68\xeb\x9a\x18\x6e\xad"
+			  "\x1b\x80\xb3\xa6\x17\x14\x9d\x35"
+			  "\xc4\x45\xa9\x72\x26\x10\xb0\x64",
+		.ctext	= "\xb5\x35\x2d\x1b\x32\x1d\x11\x00"
+			  "\x7a\x50\xaa\x50\x0b\x7d\x7d\xd4"
+			  "\x3c\x59\x89\xbf\x12\xe7\x20\x9d"
+			  "\x96\xe4\xe3\x04\xc7\x2a\x53\x44"
+			  "\xe4\x39\x1e\xd4\x25\x89\x2c\x6a"
+			  "\xd4\x05\xda\x1d\x0a\xce\xcc\x67"
+			  "\x7b\x80\x76\xf3\x28\x0c\xb7\x85"
+			  "\xb1\x18\x07\x7b\x78\xbe\x2d\xec"
+			  "\xbe\xf6\x77\x22\x74\x22\xc1\x88"
+			  "\x00\xef\x25\xaf\x03\xcd\x69\x3c"
+			  "\xc1\x31\x17\xab\x92\x5c\xf7\xc3"
+			  "\x90\x0b\xfa\xdf\xf7\xdf\x0a\x6e"
+			  "\x1e\x82\x39\x16\x35\x3b\xa6\x2b"
+			  "\x96\x8d\x9d\xd3\xaa\x56\xae\x7a"
+			  "\xba\x4b\xcb\x46\x8e\xaf\x37\x04"
+			  "\xcc\x06\x21\x72\x52\x0e\x94\x6f"
+			  "\x9b\x6c\x0c\x18\x01\x97\x6d\x31"
+			  "\x85\xb6\xbd\xfd\x50\x4d\x99\x2b"
+			  "\x74\x23\x57\x80\x15\x3f\x69\xa5"
+			  "\xf3\x2c\xcf\xf1\x1e\xc7\xe0\x04",
+		.len	= 160,
+	},
+};
+
+static const struct cipher_testvec lea_ctr_tv_template[] = {
+	{
+		.key	= "\x7a\xd3\x6a\x75\xd5\x5f\x30\x22"
+			  "\x09\x4e\x06\xf7\xc8\x97\xd8\xbb",
+		.klen	= 16,
+		.iv	= "\x0c\x5f\x04\xe8\xb5\x12\x19\x5e"
+			  "\x74\xb3\xde\x57\xe9\x70\x97\x9e",
+		.ptext	= "\x08\x7a\x83\xfc\xc1\x13\xa9\xf3"
+			  "\xe0\xe9\xd5\xaf\x32\xa2\xdd\x3a",
+		.ctext	= "\x2b\x73\x49\x7c\x4f\xc9\xef\x38"
+			  "\xbe\x7a\x0b\xcb\x1a\xab\x87\xa4",
+		.len	= 16,
+	}, {
+		.key	= "\x74\xba\x38\x82\x43\x53\x9e\xfa"
+			  "\x20\x2d\xfa\x64\xa9\x81\x74\xd9",
+		.klen	= 16,
+		.iv	= "\xe0\x56\xc2\xc6\xd2\x99\xef\x9c"
+			  "\x77\x6f\x5b\xc9\xda\xca\x04\xe8",
+		.ptext	= "\x79\x3b\x03\x34\xef\x07\x5a\x43"
+			  "\xd0\x7c\xec\xf1\xd5\x85\xcd\x9a"
+			  "\x39\x7d\xbc\x8c\x62\x41\xee\xbb"
+			  "\xc4\x89\x0e\xb7\x03\x78\x81\xdc"
+			  "\x57\x71\xee\xc8\x35\x2d\xfe\x13"
+			  "\x2c\x0a\x60\x3a\x0d\xa6\x11\xdb"
+			  "\x4e\xad\xda\x28\xb0\xef\x1a\x96"
+			  "\x20\xb6\xc5\xd5\xdb\x56\xad\x05"
+			  "\xd6\x05\x00\x27\x5d\xed\x12\xd1"
+			  "\xfa\x80\x5d\x26\x98\x0c\xc7\x06",
+		.ctext	= "\xaf\x18\x50\x91\xa0\xa4\xf1\xe2"
+			  "\x5b\xc2\xfc\xb0\x5c\xb6\xdd\x1b"
+			  "\x46\xcb\x01\xd5\x8f\x90\x55\xc6"
+			  "\x1b\x9a\xb5\x49\xd4\x6d\x1c\x55"
+			  "\x9a\xdc\x51\x36\xe0\x6e\xb6\xcc"
+			  "\xd9\xf7\xc8\x5a\x2d\x6d\x3b\x5b"
+			  "\x22\x18\x08\x1c\xfa\x76\x75\x98"
+			  "\x60\x36\x8b\x52\x3a\xd9\xf2\x26"
+			  "\xa3\xa7\x72\x55\x3b\x67\x35\xac"
+			  "\xa4\x75\x6e\x9d\xa2\x0f\x91\xa5",
+		.len	= 80,
+	}, {
+		.key	= "\xfc\xec\x3e\x94\x9e\x90\xf8\xb5"
+			  "\x93\xe6\x97\x38\x23\x29\x36\x65",
+		.klen	= 16,
+		.iv	= "\xc9\xf8\xca\xe3\xd9\x64\xf0\x73"
+			  "\x65\x48\xe9\xdf\x62\xd9\xe2\x2c",
+		.ptext	= "\x07\x7d\x79\x17\x76\xe1\x7e\xc0"
+			  "\x9e\x45\xf6\xa0\x60\x1b\x66\xc0"
+			  "\xf0\xd1\x4e\x2d\x7f\xeb\xf3\xa7"
+			  "\x17\x54\x61\x99\xc6\xf6\xb1\x4e"
+			  "\xfe\x88\x88\x61\x3c\xa7\xe0\x75"
+			  "\xe8\x29\x0b\x27\x7c\xae\xf4\x41"
+			  "\xe9\x77\xa9\x30\x37\x7c\x16\xb9"
+			  "\x6b\xb8\x13\xe7\xad\xc8\xa2\x48"
+			  "\xaa\xb4\x71\x59\x38\x0d\xa7\x3e"
+			  "\x38\x38\xdd\xb6\xc1\x09\x69\x4f"
+			  "\x7b\x94\xe3\xd6\x48\x3f\xe2\x12"
+			  "\x2a\x1c\x07\xb2\x61\x76\x3d\x83"
+			  "\xd3\xaa\x3e\xe6\xb1\x38\x5a\x82"
+			  "\x58\x1a\x74\x36\x75\x55\x4d\x51"
+			  "\x6d\xcd\x05\x06\xfc\x5d\xde\x1a"
+			  "\x1c\x27\x44\xe0\x28\x29\x0a\x67"
+			  "\x41\x12\xf7\xf2\xf1\x53\x81\xa8"
+			  "\x0e\x78\xd8\x8d\xe1\xb9\x26\xb1"
+			  "\x88\xcc\x15\xa8\x99\xfe\x93\x39"
+			  "\x08\x82\xd2\x5a\x4b\x09\x92\x5d",
+		.ctext	= "\xf8\x67\x10\x0f\x73\x13\x15\x94"
+			  "\xf5\x7f\x40\x3f\x5d\x60\x1a\x2f"
+			  "\x79\xce\xc0\x86\x27\x96\x0d\xfd"
+			  "\x83\x01\x05\xf8\x13\x47\xe9\x9e"
+			  "\x9d\xe2\x14\x90\x75\xed\xd0\x92"
+			  "\x6c\xc8\x74\x6e\x2b\xbd\xaf\xb8"
+			  "\x7f\x60\x52\x75\x39\xcc\x24\xa7"
+			  "\x15\xec\x79\x2f\x67\x5a\xce\xc4"
+			  "\x13\x0a\x3f\x38\x4a\xe3\x99\x14"
+			  "\xc8\x4e\x14\xbe\xd7\x16\x17\xc1"
+			  "\xc9\xf4\xa8\x4a\x19\x04\x90\x48"
+			  "\x81\x6d\x3c\x84\xce\x17\xdd\x27"
+			  "\xe5\x1c\x0e\xd0\x51\x95\xea\x6f"
+			  "\xb5\xc6\x28\x18\x0b\xe9\xe2\x5d"
+			  "\xa8\x35\xde\x16\x7a\x4b\x26\x59"
+			  "\x57\x38\xc8\xde\xa6\x9a\x0a\x63"
+			  "\xcf\x92\x2f\x49\xb3\x68\xb3\x25"
+			  "\xa4\x16\x61\xaf\xb4\xfd\x9e\xb3"
+			  "\xf0\xb6\x7b\x53\xd1\x86\xca\x6a"
+			  "\x1e\xf5\x92\x5d\x22\x0d\x0f\x70",
+		.len	= 160,
+	}, {
+		.key	= "\xbb\x93\xa2\x64\x3e\x84\xa4\x1a"
+			  "\x23\xfa\x12\xa5\x4d\x5e\x7e\xd6"
+			  "\x94\x39\x1e\xa3\x68\x49\x87\xd8",
+		.klen	= 24,
+		.iv	= "\xb7\xd5\xb9\x09\x11\x3d\x5c\xcb"
+			  "\x0b\xd5\x49\x24\xe1\xf3\x4c\x3f",
+		.ptext	= "\x5f\x47\x28\x64\x01\x6b\xdc\x28"
+			  "\x59\xbb\x25\xe1\xb1\x67\x44\x5d",
+		.ctext	= "\xc6\x35\x7a\xbd\x1d\x38\x24\xf2"
+			  "\xc7\x2e\xd6\xef\x4b\x76\xd8\x97",
+		.len	= 16,
+	}, {
+		.key	= "\x25\x7a\x7c\x23\x19\xa7\x1d\x0d"
+			  "\x33\x0e\x06\x34\x5a\x0e\xf0\xfd"
+			  "\xa8\x63\x72\x33\x12\x3f\xc7\xb4",
+		.klen	= 24,
+		.iv	= "\x4c\x9c\xd2\x6a\xe7\xd1\x5f\x7d"
+			  "\xbd\x64\xac\xc7\x8e\x20\x28\x89",
+		.ptext	= "\xeb\x67\x7a\x5c\x53\xc9\xc5\x6a"
+			  "\x9d\xd5\x2b\xdd\x95\x2e\x90\x98"
+			  "\xea\xe2\xa0\x25\x48\xf8\x13\xef"
+			  "\xc1\x48\x2f\xb2\x71\x90\x8f\x2f"
+			  "\x62\xc3\x24\x24\xad\xa4\x79\x7b"
+			  "\xe2\x94\x3b\xc2\xaa\xa8\xf8\xdb"
+			  "\xab\xff\x27\xf5\xac\x53\x69\xbb"
+			  "\xfa\xcd\x0e\xca\x0a\x1e\xdb\x69"
+			  "\x5f\xcb\x0a\x74\xae\xc8\x93\x9a"
+			  "\x41\x49\xaa\xc9\x99\xd5\x89\xe5",
+		.ctext	= "\xf7\xc2\xde\x82\xdb\x28\xf7\xb7"
+			  "\xe6\x25\x8b\xb5\x31\xb9\x22\x15"
+			  "\x69\xe6\xdb\x58\x97\x29\x02\x50"
+			  "\xc2\xf4\x73\x80\x9d\x43\x49\xcd"
+			  "\x48\xbe\x5c\x54\x7f\x5f\x60\xff"
+			  "\xfd\x42\xbe\x92\xb0\x91\xbc\x96"
+			  "\x3f\x0d\x57\x58\x39\x7d\x3c\x33"
+			  "\xca\x5d\x32\x83\x4e\xc1\x7f\x47"
+			  "\x35\x12\x5c\x32\xac\xfc\xe6\x45"
+			  "\xb6\xdc\xb7\x16\x87\x4f\x19\x00",
+		.len	= 80,
+	}, {
+		.key	= "\x84\x1e\xca\x09\x74\xee\xc0\x3a"
+			  "\xe8\xbd\x0f\x57\xb8\x16\xeb\x4f"
+			  "\x69\x79\xa3\xca\x51\xf2\xde\x60",
+		.klen	= 24,
+		.iv	= "\xfc\xf0\x24\x08\xcf\x55\xa1\xd3"
+			  "\xeb\xca\x26\xda\x55\x55\x71\x74",
+		.ptext	= "\x53\x2d\xae\xad\x19\xcd\x3e\xf4"
+			  "\xa4\x47\xb6\x14\xe7\xdb\x2b\x66"
+			  "\x25\xc8\xad\x44\x9e\x62\x11\xc0"
+			  "\x6d\x65\xf4\x96\xb1\x89\xfc\x60"
+			  "\xeb\x56\x61\x09\xa7\x3a\xac\x84"
+			  "\x5f\xd9\xbf\xbe\x9c\xa4\x16\xd1"
+			  "\x5e\xad\x4c\x7a\xbe\xb9\xe1\xcd"
+			  "\xd2\x97\x3a\x27\xd1\xb1\xe9\x65"
+			  "\x77\xe1\x2f\x53\xab\x86\xbf\x67"
+			  "\x60\xd6\xc5\xb0\xb9\x76\x27\x09"
+			  "\x70\x48\x0b\x92\x78\x84\x99\x61"
+			  "\xe1\x0a\x02\x74\xfd\xf6\xc1\xea"
+			  "\xc1\x75\x21\x73\x6d\xd8\xff\x06"
+			  "\x70\xe7\xd1\xd2\x85\x78\xe7\x76"
+			  "\x23\x40\xf1\x74\x14\xe8\xc2\xe3"
+			  "\x63\x63\x53\x65\x7c\x80\x0b\x59"
+			  "\x8f\xbb\x3d\x52\x35\x59\xf3\xc7"
+			  "\x56\xb4\xea\x0c\x4a\xd3\xdd\x80"
+			  "\x3e\x3d\x06\x09\xda\x0f\xe3\xbd"
+			  "\x21\x4d\x36\xe2\x98\x76\x4f\x19",
+		.ctext	= "\x3e\x23\xf2\x14\x9f\x53\xe8\x64"
+			  "\xd3\x4e\x6a\xbd\xa7\xad\xf9\xa3"
+			  "\x80\x5f\x27\x75\x2e\xee\xcc\xda"
+			  "\x72\x07\x41\x99\x1d\x37\x34\x3b"
+			  "\x00\xfd\x35\x03\x06\xf3\xba\xd8"
+			  "\xa8\xc0\x31\x0c\x7f\x96\x1f\xcf"
+			  "\x46\x96\x4e\x38\x93\x90\xd0\xfc"
+			  "\xca\x59\x1f\xe0\x5d\xc4\x9b\x48"
+			  "\x8d\xd2\xb4\x29\x18\xfd\xad\x89"
+			  "\x3a\xcf\x2f\xa2\x29\x59\xc6\xc5"
+			  "\x91\x0c\xb7\xe5\x7a\x1e\xc7\xc1"
+			  "\x07\x88\x90\xa1\xb3\xa3\x94\x41"
+			  "\x56\x7e\x03\x6d\x3b\x90\x0a\x83"
+			  "\xed\x40\xb4\xd7\x83\x61\xcd\xb5"
+			  "\xf2\xb7\x83\xbc\x1a\x0a\x41\x6d"
+			  "\xab\xca\xdb\xd8\xde\xd4\x4a\x76"
+			  "\xf7\x3a\xe2\x35\x76\x3b\x6e\x8c"
+			  "\xed\xc2\x37\xb4\x32\x9f\x71\x62"
+			  "\x4e\x55\xdc\x42\xae\xc5\xb3\x80"
+			  "\xd8\x04\x20\xf2\x85\x94\xe6\xb3",
+		.len	= 160,
+	}, {
+		.key	= "\xaa\x5b\x8d\xd6\x4b\x30\x23\x13"
+			  "\xdc\xe4\x18\x46\x4e\xae\x92\x90"
+			  "\x8b\xe9\x53\x37\x11\x21\x84\x56"
+			  "\xe0\x6e\xb1\xd3\x97\x00\x16\x92",
+		.klen	= 32,
+		.iv	= "\xda\xfc\x19\xe8\xf6\x87\x17\x53"
+			  "\xc8\x1f\x63\x68\xdb\x32\x8c\x0c",
+		.ptext	= "\xd0\xe9\xdf\xe7\x03\x45\x2d\x16"
+			  "\x6b\x6e\xcf\x20\xc2\x48\xe6\x2c",
+		.ctext	= "\xfc\x9a\x78\xba\x8f\x08\xae\xa8"
+			  "\x2f\x9a\x37\xe5\xbd\x2c\x04\xd8",
+		.len	= 16,
+	}, {
+		.key	= "\x11\xfc\x29\x85\xb9\x74\xb0\x65"
+			  "\xf9\x50\x82\xf8\x62\xf0\x52\xb7"
+			  "\xd9\xb4\xd2\x1c\x3c\x0e\x76\x5a"
+			  "\x49\xdb\x7a\x4b\xbb\xf3\x26\xaa",
+		.klen	= 32,
+		.iv	= "\xb5\xfe\x51\x82\x64\x8a\x24\xe6"
+			  "\xe1\x5b\x20\xe3\x54\x02\x62\xb3",
+		.ptext	= "\x5f\xb2\x26\x33\xba\x4e\x8b\x98"
+			  "\x1a\xc6\x96\x5d\x58\xa4\x78\x7f"
+			  "\xcf\xe2\x14\xed\x06\xff\xbc\x3a"
+			  "\x8f\x52\x3b\x96\x2e\x9d\x19\xfc"
+			  "\x3e\xe5\x1a\xad\x51\x81\x08\xdc"
+			  "\x17\x72\xb2\xab\x81\xf2\x35\x56"
+			  "\x25\x4f\x7a\xae\xe5\xfa\x00\xca"
+			  "\xcb\xdb\xdc\xf9\x38\xe8\xfe\xfa"
+			  "\x3e\xf6\xb5\x70\x4a\xcf\x76\x90"
+			  "\x06\x84\xd9\x1d\x7d\x05\xe4\x96",
+		.ctext	= "\xa0\x03\x29\xcc\xfd\x82\xbd\x62"
+			  "\x39\x1c\xc9\xe0\xc8\x69\x46\x45"
+			  "\x31\xc8\x1e\x6b\x5f\x37\x97\xa2"
+			  "\xcb\x93\x19\x4a\x02\x42\x09\x2a"
+			  "\x85\x5c\x78\x43\xb5\xe1\x1b\x69"
+			  "\x67\x08\x79\xa3\xd5\x2d\xcb\xd5"
+			  "\x30\x3e\x9b\xf2\x1b\xa7\x0b\x72"
+			  "\x5f\xe5\xf8\xd8\x40\x45\xab\x8e"
+			  "\x8e\x14\xf6\x0a\x85\xc1\x41\x3c"
+			  "\x88\x56\xf0\x7d\x4d\xfd\x7e\x0e",
+		.len	= 80,
+	}, {
+		.key	= "\xeb\xe8\xee\x96\x66\xd0\x6d\xb7"
+			  "\x69\xcd\xa8\xb9\x8f\x1e\xab\x04"
+			  "\xe7\xa6\xa4\xa8\x99\xfb\x9f\x05"
+			  "\xcd\xbb\x95\xcb\xc8\x1f\xa5\x26",
+		.klen	= 32,
+		.iv	= "\x58\xd2\xa1\x32\x73\x03\xcc\xb5"
+			  "\x1b\xb9\xe2\x0d\x84\x66\x59\x67",
+		.ptext	= "\x79\xc0\xe7\x32\xfc\xcc\x44\xd4"
+			  "\x2d\x3b\x31\x9b\x6d\xfa\xb9\xf6"
+			  "\xc2\x05\xb7\xe5\x7d\x7c\x98\xae"
+			  "\x1b\xf8\x62\xd2\x6a\x1f\xf5\x3f"
+			  "\xed\x76\x92\xc7\x80\x77\x99\xd1"
+			  "\x3f\xe4\x97\x4e\xa5\x5a\x7f\xef"
+			  "\xf1\x29\x38\x95\xce\x63\x58\x0a"
+			  "\x32\x33\x30\xee\x87\x70\x08\xf4"
+			  "\x09\x72\xab\x4e\x6f\x25\x27\x65"
+			  "\xcd\x5b\xce\xce\xb9\x67\x80\x79"
+			  "\xad\xe7\x2d\x2c\xac\xe1\x95\x30"
+			  "\x28\x12\x52\x4b\x24\x82\x19\xee"
+			  "\x96\x5c\x3d\xae\x0f\xfd\x74\xf8"
+			  "\x9d\x4b\xde\x01\xf1\x48\x43\xfd"
+			  "\xbd\xe7\x9d\x91\x60\x1e\xd6\x8a"
+			  "\xc5\x3c\xd2\xcf\x88\x7d\xb0\x94"
+			  "\x5b\xdb\x4d\xd1\xa9\x28\x0a\xf3"
+			  "\x79\x5a\xd0\xd1\x94\x26\x51\xe1"
+			  "\xea\xd0\x90\xac\x32\x41\xa3\x7f"
+			  "\xd1\x5a\xb7\x64\xfd\x88\x56\x50",
+		.ctext	= "\xca\xdd\x51\xe5\xbf\x4a\x97\x8f"
+			  "\x79\x7a\x1c\x0a\x63\x0b\x2f\xc4"
+			  "\x67\x40\x0d\x77\x44\x30\x3c\x87"
+			  "\x3d\xbe\x2b\x52\xb1\xe3\x13\x7c"
+			  "\xd3\x6b\xa5\x23\x2a\x5e\xd3\x32"
+			  "\xb0\x2f\x20\xad\x25\x76\xba\x76"
+			  "\x2e\xc1\x66\x18\xec\x4e\xc8\x1a"
+			  "\x33\x4b\x20\x1a\x0a\x24\x41\x38"
+			  "\x5c\xb9\xa9\x33\x5e\x91\x4f\xcd"
+			  "\x1e\x00\x0b\x8c\x61\x04\x07\x7f"
+			  "\x57\x4c\x21\xc0\x61\x82\x57\x1d"
+			  "\x69\x34\xa4\x7b\x93\xf2\x7a\x86"
+			  "\xd2\x0b\x0b\x7b\xa6\xac\xbb\x7b"
+			  "\x0d\x56\x24\x31\x0a\x82\x81\x58"
+			  "\xc1\xf3\x36\xca\x04\xa0\xfa\x01"
+			  "\xa6\x45\x1f\x0e\x87\x69\x33\xe5"
+			  "\x4c\xdc\x32\x89\x4a\xb2\xd3\x9b"
+			  "\x23\x2c\x30\x16\x38\xab\xe0\xbf"
+			  "\x50\xce\x33\x34\x45\x88\xd0\xa7"
+			  "\x31\xbf\x31\xdb\x42\x7f\xe2\x76",
+		.len	= 160,
+	},
+};
+
+static const struct cipher_testvec lea_xts_tv_template[] = {
+	{
+		.key	= "\x13\x1d\xbb\xbf\xf9\x7d\xcc\x8c"
+			  "\x82\x99\x52\x1d\xaf\x04\x1a\x0a"
+			  "\x75\x36\x73\x96\xc5\x4f\x9e\xac"
+			  "\x8a\xf0\xef\x06\x49\xc8\x7c\x0a",
+		.klen	= 32,
+		.iv	= "\x03\xb2\x44\xdf\x7b\xa4\x34\xd1"
+			  "\x19\xa6\x30\x9d\x91\xc5\x65\x3b",
+		.ptext	= "\x31\xb7\x63\x5b\x36\x2f\x93\x86"
+			  "\xcc\xe7\x56\xf3\x3a\xed\x64\xd1",
+		.ctext	= "\x36\x53\x37\xbd\x47\x42\x5c\xe7"
+			  "\xf9\xc4\x0a\xfc\x38\x70\xdb\x93",
+		.len	= 16,
+	}, {
+		.key	= "\xf3\x9c\x37\xe3\x80\x12\xff\xd7"
+			  "\x7b\x09\xd5\xd6\x9a\x0b\xf1\x37"
+			  "\x43\xe7\xef\x84\x91\xa9\xeb\x08"
+			  "\x06\xf0\x99\x7c\xc4\x8b\xbc\xa9",
+		.klen	= 32,
+		.iv	= "\x23\x66\x4c\xe3\x08\xfa\xdc\x21"
+			  "\x18\x0e\xac\xd0\xbc\x20\x20\xdd",
+		.ptext	= "\x51\x27\x06\x5b\x8e\xaf\x6b\xf4"
+			  "\x73\x89\x16\x60\x6a\x6a\xfa\x80"
+			  "\x7a\x26\x99\xce\x18\xb2\x96\x25"
+			  "\xf1\xec\x37\xb4\x1d\x6b\x2b\xfe"
+			  "\x81\xeb\xef\x12\x2c\xe5\x10\x6a"
+			  "\xe5\x03\x00\x65\x34\xe0\x1e\x2a"
+			  "\x6d\x0c\xb8\x4b\xa5\x74\x23\x02"
+			  "\xe7\x48\xd3\x0e\xc9\xeb\xbf\x49"
+			  "\x64\xd9\x92\xcf\x29\x43\xb7\x33"
+			  "\x11\x4c\x9b\x76\x94\xaa\x17\x8c"
+			  "\x9d\xa9\x13\x05\x83\x10\xce\xb5"
+			  "\x48\xa8\x02\xae\x93\x7c\x61\xba"
+			  "\x68\xf8\xf2\x5f\xcd\x7c\xfd\xb6"
+			  "\x06\x28\x1e\x52\x02\x25\x7f\x7a"
+			  "\x84\x31\x62\x2a\xbb\x5a\x3c\x25"
+			  "\x1e\x8f\x46\x32\x52\x8d\x94\x7d"
+			  "\x35\x4e\xfd\x01\xa4\xc7\xd1\x8a"
+			  "\x12\xf9\x05\xfd\x31\xac\xfa\xd3"
+			  "\x18\x71\x3a\x3b\xe2\xfa\xac\xec"
+			  "\x04\x94\x29\x07\x77\x17\x0a\x30"
+			  "\x0d\xd7\x6c\x99\x64\xb6\x48\xe1"
+			  "\x32\x1f\xe7\x76\xb4\x93\x39\x6f",
+		.ctext	= "\xe2\x08\x85\x96\xd5\xcd\x2b\xd0"
+			  "\xb0\xff\xa4\x54\x78\x04\xcf\x5a"
+			  "\x59\x56\xf6\xd8\x8a\x9a\x04\x98"
+			  "\x72\xa3\xe1\x68\x84\xee\x4a\xa1"
+			  "\x0e\x39\xc0\x77\x4f\x69\x1d\x8b"
+			  "\x0f\xcb\x1d\x98\xd3\xa0\xc2\x81"
+			  "\x7d\x7f\x51\xbf\x6e\x1b\xd1\x73"
+			  "\xd5\x68\x72\x72\x1c\x21\x78\x37"
+			  "\x59\x11\x30\x59\x46\x9c\xd3\x0e"
+			  "\x2f\x66\x56\x5c\x4b\x43\xd7\xa3"
+			  "\x85\xce\x32\xc1\x36\xdf\x7b\x3a"
+			  "\x24\x80\xd5\x51\x3a\x84\x71\x8f"
+			  "\x49\x6c\x05\xc5\x06\xa5\x13\xaa"
+			  "\x8c\x32\xe2\x61\xd8\xae\x26\x23"
+			  "\x2f\x32\x94\x92\x5f\x37\xd9\x05"
+			  "\x32\xb6\x34\x29\x3e\xae\xd7\xfa"
+			  "\xa7\x4b\xd6\x7a\x71\x00\xc7\xf0"
+			  "\x91\x17\x18\xf8\x0f\xa7\x41\x86"
+			  "\xb3\x0f\xa2\xd0\xd9\x3c\xf3\x2b"
+			  "\x0e\x0b\xd8\x7f\xdc\x51\x1f\xf8"
+			  "\xbe\x42\x41\x3d\x53\xdb\x1e\x6f"
+			  "\x91\x7a\x4d\x56\x70\x5a\xd9\x19",
+		.len	= 176,
+	}, {
+		.key	= "\x39\xa1\x40\xca\x04\x1f\xab\x0d"
+			  "\x30\x9e\x6d\x2b\xf3\x52\x06\x87"
+			  "\x9f\x5b\xd8\xdf\xac\xf6\xcd\x48"
+			  "\x7b\x6d\xfd\x78\x06\xa5\x2d\x85",
+		.klen	= 32,
+		.iv	= "\x14\x6c\xdf\xce\x8a\xa1\x78\x42"
+			  "\xbe\xad\xb0\xc9\xcc\x45\x8b\x1c",
+		.ptext	= "\x9d\xea\xc3\xbd\xa6\x57\x82\x4d"
+			  "\x02\x6e\x38\x09\x2e\x92\xd4\x93"
+			  "\xe2\x70\xc9\x52\xe3\x64\x3c\x17"
+			  "\xa8\x33\x92\x07\x53\x1f\x23\xc2"
+			  "\x94\x8a\x22\xe6\x22\xd6\x31\xee"
+			  "\xce\x9f\xbb\xa1\xb5\xdf\x99\x26"
+			  "\xae\x23\x7f\x77\xd8\xa6\xec\xcd"
+			  "\x91\xa6\x08\x24\x88\x7f\xf2\xee"
+			  "\x30\x27\xff\x4b\x4d\x06\xd4\x6c"
+			  "\x97\x85\x2e\x87\x5f\x7f\xcc\xda"
+			  "\x7c\x74\x7e\xaa\xf7\x53\x20\xbe"
+			  "\xf6\x51\xe4\xeb\x24\xde\x1d\xa6"
+			  "\x9b\x4d\xca\xdc\xdd\x0e\xeb\x2b"
+			  "\x9b\x07\xfd\xa3\x6d\xa9\x9a\xb5"
+			  "\x0b\xe2\xf9\x72\x69\x90\xec\xf7"
+			  "\x7b\x17\xdc\x8d\x4f\xf3\xaf\xed"
+			  "\xf6\x6a\xdc\x19\x39\x82\xe2\x84"
+			  "\x7b\x4c\x5f\x7e\x3e\x55\x8b\x11"
+			  "\xdc\xe7\x11\x5a\x52\x02\xe4\xd7"
+			  "\xf7\x90\xd7\xdf\x94\xf1\xe4\xd5"
+			  "\xe4\x49\xe8\x19\x33\x22\x66\x19"
+			  "\xc6\xf5\xdc\xad\x7c\xf0\xf3\xea"
+			  "\xe2\xa4\xa2\x57\x53\x28\x28\xb5"
+			  "\x32\x6b\xfc\xa2\x86\xee\x8e\x0a"
+			  "\x25\x76\x20\x94\xff\x50\x73\x5d"
+			  "\x2c\xb4\x66\xd2\x59\x95\xa0\x37"
+			  "\xc4\x96\x47",
+		.ctext	= "\xc0\x48\x1b\xcf\x4a\xbd\x7b\xb2"
+			  "\x18\xe8\x2a\x31\xaf\x7f\x7e\x3f"
+			  "\x7f\x79\xc7\x03\x4b\x24\xc8\xfb"
+			  "\xaa\x8b\x6b\x4d\x51\x80\x95\x60"
+			  "\xb2\x9c\x3b\x80\xf3\x23\x93\xd3"
+			  "\xef\x55\xc3\x9b\xae\xa0\x13\xe0"
+			  "\x36\x6f\x4e\xc8\x06\x99\x12\x81"
+			  "\xf2\x70\x28\x42\x8f\x00\x79\xb2"
+			  "\xb9\x7d\xfe\x3a\x6a\x45\xea\x1d"
+			  "\x83\x8e\xbc\x07\xf3\xaf\x73\xb9"
+			  "\xbd\x6c\x40\x59\x43\xc2\x54\x2a"
+			  "\xb2\x9e\x06\x52\x7f\x35\xf9\xdf"
+			  "\x7e\xa0\xf9\x27\x2d\x0d\xb7\x6a"
+			  "\x5e\x17\xf5\xf3\x26\xc1\xd0\x0c"
+			  "\x1b\x57\xbe\xf3\xf0\xa0\xe4\x36"
+			  "\x7b\x5b\x0f\xc1\x47\xac\x96\xa1"
+			  "\xd9\x01\xac\xf3\x2a\xa2\xc2\x6e"
+			  "\x82\x83\x00\xff\x5d\x57\x98\xac"
+			  "\x8b\xaa\x05\xcd\xe9\x08\x90\xd6"
+			  "\x21\x84\xd1\x33\xd0\x2b\xc4\xa7"
+			  "\xe9\x59\x4f\x2f\xb4\x19\x97\x7c"
+			  "\xe4\x2d\xe9\x02\x7b\xb3\x58\xf6"
+			  "\xab\x5a\x33\xfa\x53\xc7\x61\xc7"
+			  "\x71\xc6\x0f\xdc\x3e\x18\x6c\xe8"
+			  "\xb8\xd2\x21\x15\x1e\x82\x20\x69"
+			  "\xf2\x92\x7f\xa4\x64\xb9\xf4\xa5"
+			  "\x61\x3b\xb9",
+		.len	= 211,
+	}, {
+		.key	= "\xae\xf5\x94\x42\xea\x02\xeb\x8f"
+			  "\x41\x74\x00\x8c\x55\x12\x72\x5f"
+			  "\x0d\x4e\x9d\x3a\x90\xb7\x73\x0c"
+			  "\xc8\x93\x59\x07\xe8\x95\x8c\x86"
+			  "\x99\x76\xeb\x5c\xd7\xc7\xf0\x2f"
+			  "\xac\x5e\xa0\x75\xd2\xbf\xa7\xb6",
+		.klen	= 48,
+		.iv	= "\x78\x38\x47\xb2\x56\x55\x3d\x82"
+			  "\x93\x7e\x34\xd7\xc2\xe6\x0c\x66",
+		.ptext	= "\xd4\x7b\x83\x78\x74\xba\xd9\x5b"
+			  "\x27\x61\x31\x74\xa4\x00\x03\x59"
+			  "\x61\xc9\x23\x2e\xcb\x3d\xaf\xf5"
+			  "\x3d\xa5\x2a\x02\x7d\x12\x11\x6e"
+			  "\xec\x59\xfd\x95\x93\x59\x5e\x68"
+			  "\x9e\x9d\x10\x74\x96\x9a\xac\x51"
+			  "\x4b\xd3\x91\xaf\xbe\x33\x78\x3a"
+			  "\x77\x61\xd8\x24\xa8\xfd\xbf\x2e"
+			  "\xd8\x45\xee\x53\x2e\x91\x22\x0e"
+			  "\x43\xe6\xb7\x2a\x1c\xb6\x1a\xd4"
+			  "\x74\x46\xfd\x70\xcf\x42\x5e\x4f"
+			  "\x4e\xd8\x4e\x91\x75\x2e\x6d\x02"
+			  "\x7a\xf2\xdb\x69\x43",
+		.ctext	= "\x48\xda\x19\x0e\x4c\xa5\x9d\xc4"
+			  "\xa5\x34\x37\x81\xde\x1b\x8c\x61"
+			  "\x5c\x70\x92\xf6\x66\x28\x88\xe4"
+			  "\xa2\x36\xc9\x66\xcf\x85\x45\x56"
+			  "\x2d\xbc\x44\x19\xe9\x75\xec\x61"
+			  "\xbb\x1a\x11\xdf\x3c\x2b\xa4\x49"
+			  "\x80\xdd\x3b\x6e\xd3\xd4\x29\xd2"
+			  "\x01\x11\xf8\x2f\x83\x96\x60\xef"
+			  "\x9d\x33\xc5\xde\x5e\x48\x10\xaf"
+			  "\x02\x47\xda\x91\x88\x2a\x9f\x44"
+			  "\x31\x68\x73\x1b\x12\xc0\x91\xc4"
+			  "\xc1\xdd\xf3\x43\xba\x05\x66\xb6"
+			  "\x04\x4e\xea\xea\x1f",
+		.len	= 101,
+	}, {
+		.key	= "\x3f\xa4\x4e\x46\x47\x13\x19\xbe"
+			  "\x8b\x5b\xea\xcb\x8f\x0f\x55\x19"
+			  "\xaf\xea\x38\x15\x9a\x9f\xa1\xda"
+			  "\xb1\x24\xb9\x45\xfb\x1e\xa7\x50"
+			  "\xff\x25\x21\x65\x17\x34\xab\xec"
+			  "\x72\x65\xc2\x07\x7c\xbe\x6f\x65"
+			  "\x51\x57\x9e\xd2\x88\x43\xbc\x9e"
+			  "\x44\x9b\x54\x4a\x3d\x4a\x8c\x40",
+		.klen	= 64,
+		.iv	= "\x71\x60\xda\x95\x7b\x60\x1d\x7e"
+			  "\x96\x0c\xca\xe9\x47\x58\x1b\x54",
+		.ptext	= "\x10\x1b\x67\x8f\x11\xf6\xf9\xcd"
+			  "\x1d\x72\xa7\x1a\x55\x82\xb4\xef"
+			  "\x16\x53\x05\x4a\xa7\xa8\x02\x82"
+			  "\x07\x33\x6a\x63\x45\x55\xac\x51"
+			  "\xa3\x44\xbd\x6c\x9b\x56\xb3\xef"
+			  "\xab\x45\x6b\x0a\x18\xf0\xe8\x35"
+			  "\x3d\x19\xb9\xd2\x7e\x46\x37\x04"
+			  "\x2e\x3b\x3c\x0d\xd8\xcf\x25\x4a"
+			  "\xd7\x63\xeb\x74\xa9\x5a\x95\x4c"
+			  "\x9f\xfb\xe3\x5f\x9e\x41\x14\x03"
+			  "\x48\x8b\xde\x0c\xe6\x70\xd0\x22"
+			  "\x07\xd5\x7f\x88\x8b\xcc\x5a\x12"
+			  "\x9d\xfb\xa6\x84\x97\x3e\xad\x44"
+			  "\x3e\xfa\x3c\xd0\x99\xb0\x0c\x6b"
+			  "\x32\x57\x73\x4a\xfb\xc7\x8d\x01"
+			  "\xe7\xdd\x7c\x7e\x53\x80\xe3\xbb"
+			  "\xdc\x39\x73\x4a\x6f\x11\x3e\xa1"
+			  "\x33\xfa\xb9\x5a\x63\xc7\xdd\xe7"
+			  "\x9d\x00\x89\x6c\x8b\x2c\xc6\x0c"
+			  "\x51\xa4\x29\x80\xae\x97\x67\x7f"
+			  "\xc0\x30\x8c\x5c\x00\xb3\xc9\xe7"
+			  "\x90\xf5\x26\xb7\x55\xad\x5b\x5e"
+			  "\xaf\xf7\x6a\xc8\x22\xc0\x08\x9f"
+			  "\x09\xd0\x8c\x77\x5a\xad\x7c\x2c"
+			  "\xc2\xd7\x3c\x76\xc9\x08\xbd\x83"
+			  "\x09\xf2\xcc\x65\x7a\x84\xf2\x49"
+			  "\x04\x69\xd2\x1c\x72\x01\xec\xa8"
+			  "\xf8\x58\x2a\x65\x4a\x12\x3d\xfe"
+			  "\x82\x4f\x02\x97\xb6\x9e\x54\x8c"
+			  "\x79\x43\x23\x6c\xc4\x67\x33\xce"
+			  "\x37\x4e\xfe\x0f\x66\xa7\x16\x1c"
+			  "\xba\xbf\x75\x2c\x74\x30\xcd\x9c"
+			  "\x34\x04\x5f\x44\xac\x06\x0a\x9f"
+			  "\xe3\x68\x92\x4f\x20\x89\x35\x82"
+			  "\x2e\xe9\xdc\xbf\x79\xc3\xb8\x9b"
+			  "\x18\xe2\xaa\xed\xa4\x6b\xd3\xe7"
+			  "\xb7\xfb\x8a\x10\x7a\x23\x1d\x5b"
+			  "\x89\xa3\xe9\x26\x0e\x31\x3a\x4d"
+			  "\x99\xee\x14\x1b\x4c\x90\xf5\xf3"
+			  "\x70\xeb\x78\x9d\x6a\x20\xb9\x60"
+			  "\x3e\x24\x42\xd0\x62\x93\x94\x4e"
+			  "\xbb\x21\xce\x0e\xcc\x4c\xd7\x04",
+		.ctext	= "\xf2\x90\x24\x8d\xba\x6f\x31\x5c"
+			  "\x3e\x5a\x2d\xf1\x72\xe0\x99\x17"
+			  "\xf9\x9e\xf9\x3e\x6c\x8e\x43\xd9"
+			  "\x41\xbe\x74\x94\x4d\xf9\x73\x7d"
+			  "\xe0\xa6\x62\xd1\x9e\x27\x80\x7d"
+			  "\x40\x4c\x92\x50\xe9\x4e\x6b\x67"
+			  "\xa7\x48\x8c\xd5\xcf\x4b\x2b\xe8"
+			  "\x8c\xd5\x90\x7e\x52\x83\x36\xd6"
+			  "\x20\xf5\x78\x31\xeb\x65\x55\xc7"
+			  "\x49\x9c\x7a\xe3\xa8\xad\xe3\x6a"
+			  "\xc2\x3d\xbc\x45\x2f\x8f\x6a\xc1"
+			  "\x61\x9c\xbb\xf9\xe7\x1d\x06\x94"
+			  "\x49\x36\x77\x95\x52\xfa\x3a\x2c"
+			  "\x92\xf3\x77\x38\xbe\xf2\x54\xe9"
+			  "\x5d\x1c\x9e\xc8\x5a\x29\x24\x1f"
+			  "\x3c\xbc\x71\x5e\x73\xdb\xf6\x22"
+			  "\x27\x6d\xe7\x18\x82\xb1\x51\x1c"
+			  "\xdb\x50\x58\xd3\xf5\xf2\xb1\x7f"
+			  "\x67\x71\x67\x01\xe0\x23\x04\xfc"
+			  "\x91\x81\x04\x75\x55\x7b\x01\xc8"
+			  "\x21\x57\x60\x61\x38\x2c\x42\x9a"
+			  "\x9e\xd3\xd7\x16\x2c\xe6\x7e\xe6"
+			  "\xdc\x3c\xbe\x31\x77\x0d\xc4\xfe"
+			  "\xa3\x69\x05\xdf\x70\xe8\x44\x48"
+			  "\x69\x40\x56\x64\x0c\x1f\x72\x89"
+			  "\x15\xb8\xbd\x10\x2a\x75\xb8\x1b"
+			  "\x42\xcc\x75\x50\xc7\xe6\xcf\x13"
+			  "\x2e\xda\x18\x36\x6f\x41\xd7\x14"
+			  "\x2d\xb6\x6d\xce\xe3\x38\x9a\xd0"
+			  "\x14\x94\x4c\x93\xd3\x11\xcc\x59"
+			  "\x6e\x2c\xb1\xf5\xa0\x6c\xec\x9b"
+			  "\xcc\x5c\x26\xbe\x5f\x90\x9a\xb1"
+			  "\x97\xea\x33\x1e\x6c\x91\x57\x7d"
+			  "\xd7\xf8\x4f\x93\x62\xec\xb6\x18"
+			  "\x65\xe3\xe2\xfe\xd7\xb0\xf1\xc1"
+			  "\xea\xa1\x98\xe9\x0a\xd8\x05\x79"
+			  "\x7b\xb5\x85\xd0\x5b\x71\xbc\x77"
+			  "\xd2\xb5\x8f\xb9\xd8\xdf\x50\xc1"
+			  "\xe7\x1d\xe6\x73\x11\xf5\x99\x0d"
+			  "\x91\x18\x92\xef\xe2\x33\x97\x03"
+			  "\x65\xbd\xf4\xe4\xab\x55\x71\x7c"
+			  "\xa2\xb6\xce\x1d\x48\x3d\x65\xa7",
+		.len	= 336,
+	},
+};
+
+static const struct aead_testvec lea_gcm_tv_template[] = {
+	{
+		.key	= "\xa4\x94\x52\x9d\x9c\xac\x44\x59"
+			  "\xf0\x57\x8c\xdf\x7f\x87\xa8\xc9",
+		.klen	= 16,
+		.iv	= "\x4b\xc3\x50\xf9\x7f\x1d\xa1\x2c"
+			  "\xb1\x64\x7b\xd2",
+		.assoc	= "",
+		.alen	= 0,
+		.ptext	= "\x64\x9a\x28\x1e\xd1\xa8\x3e\x59",
+		.plen	= 8,
+		.ctext	= "\xe8\xea\xa3\x5e\xb6\x2e\x25\xcb"
+			  "\x9d\xfe\x1e\xd1\xdc\x53\x3c\x11"
+			  "\x4f\x06\x50\x8b\x18\x9c\xc6\x52",
+		.clen	= 24,
+	}, {
+		.key	= "\x07\x0c\x3c\x1f\x8d\xad\x00\x1e"
+			  "\xee\xb3\xb7\xe2\x28\xb4\xed\xd5",
+		.klen	= 16,
+		.iv	= "\xcf\x80\x82\x6c\x54\x57\x07\xfb"
+			  "\x87\x5a\x6a\xcd",
+		.assoc	= "\x5b\x40\xd6\x74\xe9\x4a\xd5\x5e"
+			  "\xb8\x79\xb8\xa9\x3c\xfe\x38\x38"
+			  "\x9c\xf2\x5d\x07\xb9\x47\x9f\xbb"
+			  "\x6b\xff\x4c\x7e\x0d\x9b\x29\x09"
+			  "\x3d\xd7\x5c\x02",
+		.alen	= 36,
+		.ptext	= "\xdd\x94\x89\x89\x5d\x16\x3c\x0e"
+			  "\x3d\x6f\x87\x65\xcd\x3b\xec\x1c"
+			  "\x38\x8e\x7c\x0c\xc0\x2b\x41\x2e"
+			  "\x4b\xf7\xda\xb0\x1f\xad\x65\x48"
+			  "\xea\xd2\xa2\xc9\x05\xec\x54\xf4"
+			  "\xf9\xef\xeb\x90\x43\xf8\x61\xbd"
+			  "\x54\x3d\x62\x85\xdc\x44\xaf\xb4"
+			  "\x48\x54\xc4\xe9\x89\x2a\xb9\xee"
+			  "\x18\xec\x66\x45\x37\x63\xca\x03"
+			  "\x79\x64\xae\xe2\x84\x8f\x85\x91",
+		.plen	= 80,
+		.ctext	= "\xb6\x34\x2e\x35\x28\xa0\x34\x30"
+			  "\xf3\x98\x25\x37\xc8\xb6\xa1\x84"
+			  "\xe9\x79\x9e\x80\xc0\x87\x5b\xa4"
+			  "\x9a\x0c\x93\x00\x08\x3f\x51\x25"
+			  "\x6d\x73\x9d\x34\xa2\x63\x3e\x5b"
+			  "\x47\x53\x94\xf8\x1c\x78\x64\x6d"
+			  "\x3a\x96\xdd\x11\xef\x23\x5b\xd4"
+			  "\x75\x8f\x6c\x6f\x97\xea\x0b\x89"
+			  "\xe9\x8b\xfb\x8a\x99\x66\x4e\x33"
+			  "\x17\x0a\x63\xc4\xfe\x5c\xa3\xf8"
+			  "\x87\xaf\x9d\x1b\xd0\x20\x8c\x0d"
+			  "\x42\xcb\x77\x88\xdd\x3f\xe2\xdb",
+		.clen	= 96,
+	}, {
+		.key	= "\xa8\x70\xc1\x07\xf7\x8c\x92\x65"
+			  "\xa8\x57\xd6\xe6\x7a\x23\xe9\x8a"
+			  "\x3d\x14\xad\xb5\x91\xd4\x75\x85",
+		.klen	= 24,
+		.iv	= "\xf0\x89\x21\x63\xef\x04\x8a\xd8"
+			  "\xc0\x3b\x20\xa2",
+		.assoc	= "\xfc\xfa\xd1\x08\x9f\xd5\x2d\x6a"
+			  "\x55\x61\xc8\x1c",
+		.alen	= 12,
+		.ptext	= "\xf4\xa4\xe0\x75\x49\xc9\x40\x22"
+			  "\x17\x18\x64\xc0\x5d\x26\xde\xab"
+			  "\xd8\x49\xf9\x10\xc9\x4f\x9b\x4a"
+			  "\xf8\x70\x70\x6b\xf9\x80\x44\x18",
+		.plen	= 32,
+		.ctext	= "\xeb\x0a\xd2\x9b\xbd\xf1\xfe\x5c"
+			  "\xb5\x7e\x82\xfe\xef\x98\xcd\x20"
+			  "\xb8\x26\x46\x1f\xa7\xc4\xb1\xba"
+			  "\x04\x27\xbc\xe8\x28\x8b\xe2\x9c"
+			  "\x68\x49\x11\x0a\x5b\x8d\x2e\x55"
+			  "\xb3\x73\xf9\x78\x4b\xd4\x34\x5f",
+		.clen	= 48,
+	}, {
+		.key	= "\x3b\xe7\x4c\x0c\x71\x08\xe0\xae"
+			  "\xb8\xe9\x57\x41\x54\x52\xa2\x03"
+			  "\x5d\x8a\x45\x7d\x07\x83\xb7\x59",
+		.klen	= 24,
+		.iv	= "\x27\x51\x07\x73\xf2\xe0\xc5\x33"
+			  "\x07\xe7\x20\x19",
+		.assoc	= "\xb0\x18\x4c\x99\x64\x9a\x27\x2a"
+			  "\x91\xb8\x1b\x9a\x99\xdb\x46\xa4"
+			  "\x1a\xb5\xd8\xc4\x73\xc0\xbd\x4a"
+			  "\x84\xe7\x7d\xae\xb5\x82\x60\x23",
+		.alen	= 32,
+		.ptext	= "\x39\x88\xd5\x6e\x94\x00\x14\xf9"
+			  "\x5a\xb9\x03\x23\x3a\x3b\x56\xdb"
+			  "\x3c\xfd\xfb\x6d\x47\xd9\xb5\x9b"
+			  "\xe6\xbc\x07\xf0\x4b\xa2\x53\x51"
+			  "\x95\xc2\x43\xd5\x4e\x05\x68\xd7"
+			  "\x38\xbd\x21\x49\x49\x94\xbf\x4a"
+			  "\xf4\xc2\xe6\xfb\xaa\x84\x36\x8f"
+			  "\xa1\xc9\x2b\xa2\xd4\x2e\x42\xcc"
+			  "\x4b\x2c\x5e\x75\x9c\x90\x69\xeb",
+		.plen	= 72,
+		.ctext	= "\x84\xe1\x22\x8e\x1d\xd6\x26\xe0"
+			  "\xfc\xbb\x5e\x50\x43\x66\x4e\xb1"
+			  "\x2c\xa2\xb4\x8d\x2a\x57\x52\x1e"
+			  "\xe1\x90\x25\x0b\x12\x1d\x8f\xcb"
+			  "\x81\xae\xdc\x06\xc6\xa8\x4b\xd7"
+			  "\xa5\xbf\xbb\x84\xa9\x9b\x49\xa5"
+			  "\xcd\x8e\xec\x3b\x89\xce\x99\x86"
+			  "\x1f\xed\xfc\x08\x17\xd9\xe5\x9c"
+			  "\x8a\x29\x0b\x7f\x32\x6c\x9a\x99"
+			  "\x53\x5e\xcd\xe5\x6e\x60\xf3\x3e"
+			  "\x3a\x50\x5b\x39\x0b\x06\xf4\x0b",
+		.clen	= 88,
+	}, {
+		.key	= "\xad\x4a\x74\x23\x04\x47\xbc\xd4"
+			  "\x92\xf2\xf8\xa8\xc5\x94\xa0\x43"
+			  "\x79\x27\x16\x90\xbf\x0c\x8a\x13"
+			  "\xdd\xfc\x1b\x7b\x96\x41\x3e\x77",
+		.klen	= 32,
+		.iv	= "\xab\x26\x64\xcb\xa1\xac\xd7\xa3"
+			  "\xc5\x7e\xe5\x27",
+		.assoc	= "\x6e\x27\x41\x4f",
+		.alen	= 4,
+		.ptext	= "\x82\x83\xa6\xf9\x3b\x73\xbd\x39"
+			  "\x2b\xd5\x41\xf0\x7e\xb4\x61\xa0",
+		.plen	= 16,
+		.ctext	= "\x62\xb3\xc9\x62\x84\xee\x7c\x7c"
+			  "\xf3\x85\x42\x76\x47\xe4\xf2\xd1"
+			  "\xe8\x2f\x67\x8a\x38\xcc\x02\x1a"
+			  "\x03\xc8\x3f\xb7\x94\xaf\x01\xb0",
+		.clen	= 32,
+	}, {
+		.key	= "\x77\xaa\xa2\x33\x82\x3e\x00\x08"
+			  "\x76\x4f\x49\xfa\x78\xf8\x7a\x21"
+			  "\x18\x1f\x33\xae\x8e\xa8\x17\xc3"
+			  "\x43\xe8\x76\x88\x94\x5d\x2a\x7b",
+		.klen	= 32,
+		.iv	= "\xd2\x9c\xbe\x07\x8d\x8a\xd6\x59"
+			  "\x12\xcf\xca\x6f",
+		.assoc	= "\x32\x88\x95\x71\x45\x3c\xee\x45"
+			  "\x6f\x12\xb4\x5e\x22\x41\x8f\xd4"
+			  "\xe4\xc7\xd5\xba\x53\x5e\xaa\xac",
+		.alen	= 24,
+		.ptext	= "\x66\xac\x6c\xa7\xf5\xba\x4e\x1d"
+			  "\x7c\xa7\x42\x49\x1c\x9e\x1d\xc1"
+			  "\xe2\x05\xf5\x4a\x4c\xf7\xce\xef"
+			  "\x09\xf5\x76\x55\x01\xd8\xae\x49"
+			  "\x95\x0a\x8a\x9b\x28\xf6\x1b\x2f"
+			  "\xde\xbd\x4b\x51\xa3\x2b\x07\x49"
+			  "\x70\xe9\xa4\x2f\xc9\xf4\x7b\x01",
+		.plen	= 56,
+		.ctext	= "\x1e\x98\x0b\xc3\xd9\x70\xec\x90"
+			  "\x04\x17\x7f\x5e\xe0\xe9\xba\xca"
+			  "\x2f\x49\x28\x36\x71\x08\x69\xe5"
+			  "\x91\xa2\x0c\x0f\xa4\x12\xff\xae"
+			  "\xd9\x5f\x98\x50\xcf\x93\xb4\xfb"
+			  "\x9f\x43\x1a\xd8\x55\x5f\x4b\x3a"
+			  "\xe7\xc8\x1e\xae\x61\x29\x81\x1f"
+			  "\xe3\xee\x8a\x8e\x04\xee\x49\x4b"
+			  "\x2b\x54\xd7\xdc\xea\xcd\xba\xd6",
+		.clen	= 72,
+	},
+};
+
 static const struct cipher_testvec chacha20_tv_template[] = {
 	{ /* RFC7539 A.2. Test Vector #1 */
 		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] crypto: LEA block cipher AVX2 optimization
  2023-04-28 11:00 [PATCH 0/3] crypto: LEA block cipher implementation Dongsoo Lee
  2023-04-28 11:00 ` [PATCH 1/3] " Dongsoo Lee
  2023-04-28 11:00 ` [PATCH 2/3] crypto: add LEA testmgr tests Dongsoo Lee
@ 2023-04-28 11:00 ` Dongsoo Lee
  2023-04-28 15:54   ` Dave Hansen
  2023-04-28 23:19 ` [PATCH 0/3] crypto: LEA block cipher implementation Eric Biggers
  3 siblings, 1 reply; 10+ messages in thread
From: Dongsoo Lee @ 2023-04-28 11:00 UTC (permalink / raw)
  To: linux-crypto
  Cc: Herbert Xu, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	David S. Miller, Dongsoo Lee, Dongsoo Lee

For the x86_64 environment, we use the SSE2/MOVBE/AVX2 instruction sets.
Since LEA uses four 32-bit unsigned integers for a 128-bit block, the SSE2
and AVX2 implementations encrypt four and eight blocks at a time,
respectively.

Our submission provides an optimized implementation of the ECB, CBC
decryption, CTR, and XTS cipher operation modes, processing 4/8 blocks at
a time on x86_64 CPUs supporting AVX2. The MOVBE instruction is used to
optimize the CTR mode.
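
As a rough illustration, the per-mode glue falls back from the widest
routine to the generic single-block code within each walk. The actual
code uses the ECB_WALK_START/ECB_BLOCK helpers from ecb_cbc_helpers.h;
the function below is only a simplified sketch and its name is
illustrative:

static void ecb_enc_chunk(const void *ctx, u8 *dst, const u8 *src,
			  unsigned int nbytes)
{
	/* as many 8-block chunks as possible with AVX2 */
	while (nbytes >= 8 * LEA_BLOCK_SIZE) {
		lea_avx2_ecb_enc_8way(ctx, dst, src);
		src += 8 * LEA_BLOCK_SIZE;
		dst += 8 * LEA_BLOCK_SIZE;
		nbytes -= 8 * LEA_BLOCK_SIZE;
	}
	/* then 4-block chunks with SSE2 */
	while (nbytes >= 4 * LEA_BLOCK_SIZE) {
		lea_avx2_ecb_enc_4way(ctx, dst, src);
		src += 4 * LEA_BLOCK_SIZE;
		dst += 4 * LEA_BLOCK_SIZE;
		nbytes -= 4 * LEA_BLOCK_SIZE;
	}
	/* remaining blocks with the generic implementation */
	while (nbytes >= LEA_BLOCK_SIZE) {
		lea_encrypt(ctx, dst, src);
		src += LEA_BLOCK_SIZE;
		dst += LEA_BLOCK_SIZE;
		nbytes -= LEA_BLOCK_SIZE;
	}
}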

Signed-off-by: Dongsoo Lee <letrhee@nsr.re.kr>
---
 arch/x86/crypto/Kconfig               |   22 +
 arch/x86/crypto/Makefile              |    3 +
 arch/x86/crypto/lea_avx2_glue.c       | 1112 +++++++++++++++++++++++++
 arch/x86/crypto/lea_avx2_x86_64-asm.S |  778 +++++++++++++++++
 4 files changed, 1915 insertions(+)
 create mode 100644 arch/x86/crypto/lea_avx2_glue.c
 create mode 100644 arch/x86/crypto/lea_avx2_x86_64-asm.S

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..bc2620d9401a 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -342,6 +342,28 @@ config CRYPTO_ARIA_GFNI_AVX512_X86_64
 
 	  Processes 64 blocks in parallel.
 
+config CRYPTO_LEA_AVX2
+	tristate "Ciphers: LEA with modes: ECB, CBC, CTR, XTS (SSE2/MOVBE/AVX2)"
+	select CRYPTO_LEA
+	imply CRYPTO_XTS
+	imply CRYPTO_CTR
+	help
+	  LEA cipher algorithm (KS X 3246, ISO/IEC 29192-2:2019)
+
+	  LEA is one of the standard cryptographic algorithms of
+	  the Republic of Korea. It operates on 128-bit blocks, handled
+	  as four 32-bit words.
+
+	  See:
+	  https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
+
+	  Architecture: x86_64 using:
+	  - SSE2 (Streaming SIMD Extensions 2)
+	  - MOVBE (Move Data After Swapping Bytes)
+	  - AVX2 (Advanced Vector Extensions)
+
+	  Processes 4 (SSE2) or 8 (AVX2) blocks in parallel.
+	  In CTR mode, the MOVBE instruction is used for improved performance.
+
 config CRYPTO_CHACHA20_X86_64
 	tristate "Ciphers: ChaCha20, XChaCha20, XChaCha12 (SSSE3/AVX2/AVX-512VL)"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..de23293b88df 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -109,6 +109,9 @@ aria-aesni-avx2-x86_64-y := aria-aesni-avx2-asm_64.o aria_aesni_avx2_glue.o
 obj-$(CONFIG_CRYPTO_ARIA_GFNI_AVX512_X86_64) += aria-gfni-avx512-x86_64.o
 aria-gfni-avx512-x86_64-y := aria-gfni-avx512-asm_64.o aria_gfni_avx512_glue.o
 
+obj-$(CONFIG_CRYPTO_LEA_AVX2) += lea-avx2-x86_64.o
+lea-avx2-x86_64-y := lea_avx2_x86_64-asm.o lea_avx2_glue.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $< > $@
 $(obj)/%.S: $(src)/%.pl FORCE
diff --git a/arch/x86/crypto/lea_avx2_glue.c b/arch/x86/crypto/lea_avx2_glue.c
new file mode 100644
index 000000000000..532958d3caa5
--- /dev/null
+++ b/arch/x86/crypto/lea_avx2_glue.c
@@ -0,0 +1,1112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Glue Code for the SSE2/MOVBE/AVX2 assembler instructions for the LEA Cipher
+ *
+ * Copyright (c) 2023 National Security Research.
+ * Author: Dongsoo Lee <letrhee@nsr.re.kr>
+ */
+
+#include <asm/simd.h>
+#include <asm/unaligned.h>
+#include <crypto/algapi.h>
+#include <crypto/ctr.h>
+#include <crypto/internal/simd.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/skcipher.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include <crypto/lea.h>
+#include <crypto/xts.h>
+#include "ecb_cbc_helpers.h"
+
+#define SIMD_KEY_ALIGN 16
+#define SIMD_ALIGN_ATTR __aligned(SIMD_KEY_ALIGN)
+
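+/*
+ * XTS needs two independent LEA key schedules, one for the data and one
+ * for the tweak; keep both 16-byte aligned for the SIMD routines.
+ */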
+struct lea_xts_ctx {
+	u8 raw_crypt_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR;
+	u8 raw_tweak_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR;
+};
+
+#define LEA_AVX2_PARALLEL_BLOCKS 8
+#define LEA_SSE2_PARALLEL_BLOCKS 4
+
+asmlinkage void lea_avx2_ecb_enc_8way(const void *ctx, u8 *dst, const u8 *src);
+asmlinkage void lea_avx2_ecb_dec_8way(const void *ctx, u8 *dst, const u8 *src);
+asmlinkage void lea_avx2_ecb_enc_4way(const void *ctx, u8 *dst, const u8 *src);
+asmlinkage void lea_avx2_ecb_dec_4way(const void *ctx, u8 *dst, const u8 *src);
+
+asmlinkage void lea_avx2_cbc_dec_8way(const void *ctx, u8 *dst, const u8 *src);
+asmlinkage void lea_avx2_cbc_dec_4way(const void *ctx, u8 *dst, const u8 *src);
+
+asmlinkage void lea_avx2_ctr_enc_8way(const void *ctx, u8 *dst, const u8 *src,
+				u8 *ctr, u8 *buffer);
+asmlinkage void lea_avx2_ctr_enc_4way(const void *ctx, u8 *dst, const u8 *src,
+				u8 *ctr);
+
+asmlinkage void lea_avx2_xts_enc_8way(const void *ctx, u8 *dst, const u8 *src,
+				u8 *tweak);
+asmlinkage void lea_avx2_xts_dec_8way(const void *ctx, u8 *dst, const u8 *src,
+				u8 *tweak);
+asmlinkage void lea_avx2_xts_enc_4way(const void *ctx, u8 *dst, const u8 *src,
+				u8 *tweak);
+asmlinkage void lea_avx2_xts_dec_4way(const void *ctx, u8 *dst, const u8 *src,
+				u8 *tweak);
+asmlinkage void lea_avx2_xts_next_tweak_sse2(u8 *tweak_out, const u8 *tweak_in);
+
+static int ecb_encrypt_8way(struct skcipher_request *req)
+{
+	ECB_WALK_START(req, LEA_BLOCK_SIZE, LEA_SSE2_PARALLEL_BLOCKS);
+	ECB_BLOCK(LEA_AVX2_PARALLEL_BLOCKS, lea_avx2_ecb_enc_8way);
+	ECB_BLOCK(LEA_SSE2_PARALLEL_BLOCKS, lea_avx2_ecb_enc_4way);
+	ECB_BLOCK(1, lea_encrypt);
+	ECB_WALK_END();
+}
+
+static int ecb_decrypt_8way(struct skcipher_request *req)
+{
+	ECB_WALK_START(req, LEA_BLOCK_SIZE, LEA_SSE2_PARALLEL_BLOCKS);
+	ECB_BLOCK(LEA_AVX2_PARALLEL_BLOCKS, lea_avx2_ecb_dec_8way);
+	ECB_BLOCK(LEA_SSE2_PARALLEL_BLOCKS, lea_avx2_ecb_dec_4way);
+	ECB_BLOCK(1, lea_decrypt);
+	ECB_WALK_END();
+}
+
+static int ecb_encrypt_4way(struct skcipher_request *req)
+{
+	ECB_WALK_START(req, LEA_BLOCK_SIZE, LEA_SSE2_PARALLEL_BLOCKS);
+	ECB_BLOCK(LEA_SSE2_PARALLEL_BLOCKS, lea_avx2_ecb_enc_4way);
+	ECB_BLOCK(1, lea_encrypt);
+	ECB_WALK_END();
+}
+
+static int ecb_decrypt_4way(struct skcipher_request *req)
+{
+	ECB_WALK_START(req, LEA_BLOCK_SIZE, LEA_SSE2_PARALLEL_BLOCKS);
+	ECB_BLOCK(LEA_SSE2_PARALLEL_BLOCKS, lea_avx2_ecb_dec_4way);
+	ECB_BLOCK(1, lea_decrypt);
+	ECB_WALK_END();
+}
+
+static int cbc_encrypt(struct skcipher_request *req)
+{
+	CBC_WALK_START(req, LEA_BLOCK_SIZE, -1);
+	CBC_ENC_BLOCK(lea_encrypt);
+	CBC_WALK_END();
+}
+
+static int cbc_decrypt_8way(struct skcipher_request *req)
+{
+	CBC_WALK_START(req, LEA_BLOCK_SIZE, LEA_SSE2_PARALLEL_BLOCKS);
+	CBC_DEC_BLOCK(LEA_AVX2_PARALLEL_BLOCKS, lea_avx2_cbc_dec_8way);
+	CBC_DEC_BLOCK(LEA_SSE2_PARALLEL_BLOCKS, lea_avx2_cbc_dec_4way);
+	CBC_DEC_BLOCK(1, lea_decrypt);
+	CBC_WALK_END();
+}
+
+static int cbc_decrypt_4way(struct skcipher_request *req)
+{
+	CBC_WALK_START(req, LEA_BLOCK_SIZE, LEA_SSE2_PARALLEL_BLOCKS);
+	CBC_DEC_BLOCK(LEA_SSE2_PARALLEL_BLOCKS, lea_avx2_cbc_dec_4way);
+	CBC_DEC_BLOCK(1, lea_decrypt);
+	CBC_WALK_END();
+}
+
+struct _lea_u128 {
+	u64 v0, v1;
+};
+
+static inline void xor_1blk(u8 *out, const u8 *in1, const u8 *in2)
+{
+	const struct _lea_u128 *_in1 = (const struct _lea_u128 *)in1;
+	const struct _lea_u128 *_in2 = (const struct _lea_u128 *)in2;
+	struct _lea_u128 *_out = (struct _lea_u128 *)out;
+
+	_out->v0 = _in1->v0 ^ _in2->v0;
+	_out->v1 = _in1->v1 ^ _in2->v1;
+}
+
+static inline void xts_next_tweak(u8 *out, const u8 *in)
+{
+	const u64 *_in = (const u64 *)in;
+	u64 *_out = (u64 *)out;
+	u64 v0 = _in[0];
+	u64 v1 = _in[1];
+	u64 carry = (u64)(((s64)v1) >> 63);
+
+	v1 = (v1 << 1) ^ (v0 >> 63);
+	v0 = (v0 << 1) ^ ((u64)carry & 0x87);
+
+	_out[0] = v0;
+	_out[1] = v1;
+}
+
+static int xts_encrypt_8way(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
+	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+
+	int ret;
+	u32 nblocks;
+	u32 tail = req->cryptlen % LEA_BLOCK_SIZE;
+	u32 edge_tail = 0;
+
+	if (req->cryptlen < LEA_BLOCK_SIZE)
+		return -EINVAL;
+
+	ret = skcipher_walk_virt(&walk, req, false);
+	if (ret)
+		return ret;
+
+	if (unlikely(tail != 0 && walk.nbytes < walk.total)) {
+		u32 req_len = req->cryptlen - LEA_BLOCK_SIZE - tail;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(
+			&subreq, skcipher_request_flags(req), NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst, req_len,
+					req->iv);
+		req = &subreq;
+		ret = skcipher_walk_virt(&walk, req, false);
+		if (ret)
+			return ret;
+		edge_tail = tail;
+		tail = 0;
+	}
+
+	lea_encrypt(ctx->raw_tweak_ctx, walk.iv, walk.iv);
+
+	while ((nblocks = walk.nbytes / LEA_BLOCK_SIZE) > 0) {
+		u32 nbytes = walk.nbytes;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+		bool is_tail = tail != 0 &&
+				(nblocks + 1) * LEA_BLOCK_SIZE > walk.total;
+
+		if (unlikely(is_tail))
+			nblocks -= 1;
+
+		kernel_fpu_begin();
+
+		for (; nblocks >= LEA_AVX2_PARALLEL_BLOCKS;
+			nblocks -= LEA_AVX2_PARALLEL_BLOCKS) {
+			lea_avx2_xts_enc_8way(ctx->raw_crypt_ctx, dst, src, walk.iv);
+			src += LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		for (; nblocks >= LEA_SSE2_PARALLEL_BLOCKS;
+			nblocks -= LEA_SSE2_PARALLEL_BLOCKS) {
+			lea_avx2_xts_enc_4way(ctx->raw_crypt_ctx, dst, src, walk.iv);
+			src += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		for (; nblocks > 0; nblocks -= 1) {
+			u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+			xor_1blk(buffer, walk.iv, src);
+			lea_encrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, walk.iv, buffer);
+			xts_next_tweak(walk.iv, walk.iv);
+
+			src += LEA_BLOCK_SIZE;
+			dst += LEA_BLOCK_SIZE;
+			nbytes -= LEA_BLOCK_SIZE;
+		}
+
+		if (unlikely(is_tail)) {
+			u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+			xor_1blk(buffer, walk.iv, src);
+			lea_encrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(buffer, walk.iv, buffer);
+
+			memcpy(dst, buffer, LEA_BLOCK_SIZE);
+			memcpy(buffer, src + LEA_BLOCK_SIZE, tail);
+			memcpy(dst + LEA_BLOCK_SIZE, dst, tail);
+
+			xts_next_tweak(walk.iv, walk.iv);
+
+			xor_1blk(buffer, walk.iv, buffer);
+			lea_encrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, walk.iv, buffer);
+
+			nbytes -= LEA_BLOCK_SIZE + tail;
+
+			kernel_fpu_end();
+			return skcipher_walk_done(&walk, nbytes);
+		}
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, nbytes);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(edge_tail != 0)) {
+		u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+		struct scatterlist sg_src[2];
+		struct scatterlist sg_dst[2];
+		struct scatterlist *scatter_src;
+		struct scatterlist *scatter_dst;
+		const u8 *src;
+		u8 *dst;
+
+		scatter_src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->src == req->dst) {
+			scatter_dst = scatter_src;
+		} else {
+			scatter_dst = scatterwalk_ffwd(sg_dst, req->dst,
+							req->cryptlen);
+		}
+
+		skcipher_request_set_crypt(req, scatter_src, scatter_dst,
+					LEA_BLOCK_SIZE + edge_tail, req->iv);
+
+		ret = skcipher_walk_virt(&walk, req, false);
+
+		src = walk.src.virt.addr;
+		dst = walk.dst.virt.addr;
+
+		kernel_fpu_begin();
+
+		xor_1blk(buffer, walk.iv, src);
+		lea_encrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(buffer, walk.iv, buffer);
+
+		memcpy(dst, buffer, LEA_BLOCK_SIZE);
+		memcpy(buffer, src + LEA_BLOCK_SIZE, edge_tail);
+		memcpy(dst + LEA_BLOCK_SIZE, dst, edge_tail);
+
+		xts_next_tweak(walk.iv, walk.iv);
+
+		xor_1blk(buffer, walk.iv, buffer);
+		lea_encrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(dst, walk.iv, buffer);
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, 0);
+	}
+
+	return ret;
+}
+
+static int xts_decrypt_8way(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
+	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+
+	u8 __aligned(16) ntweak[16] = { 0, };
+	u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+	int ret;
+	u32 nblocks;
+	u32 tail = req->cryptlen % LEA_BLOCK_SIZE;
+	u32 edge_tail = 0;
+
+	if (req->cryptlen < LEA_BLOCK_SIZE)
+		return -EINVAL;
+
+	ret = skcipher_walk_virt(&walk, req, false);
+
+	if (ret)
+		return ret;
+
+	if (unlikely(tail != 0 && walk.nbytes < walk.total)) {
+		u32 req_len = req->cryptlen - LEA_BLOCK_SIZE - tail;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(
+			&subreq, skcipher_request_flags(req), NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst, req_len,
+					req->iv);
+		req = &subreq;
+		ret = skcipher_walk_virt(&walk, req, false);
+		if (ret)
+			return ret;
+
+		edge_tail = tail;
+		tail = 0;
+	}
+
+	lea_encrypt(ctx->raw_tweak_ctx, walk.iv, walk.iv);
+
+	while ((nblocks = walk.nbytes / LEA_BLOCK_SIZE) > 0) {
+		u32 nbytes = walk.nbytes;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+		bool is_tail = tail != 0 &&
+				(nblocks + 1) * LEA_BLOCK_SIZE > walk.total;
+
+		if (unlikely(is_tail))
+			nblocks -= 1;
+
+		kernel_fpu_begin();
+
+		for (; nblocks >= LEA_AVX2_PARALLEL_BLOCKS;
+			nblocks -= LEA_AVX2_PARALLEL_BLOCKS) {
+			lea_avx2_xts_dec_8way(ctx->raw_crypt_ctx, dst, src, walk.iv);
+			src += LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		for (; nblocks >= LEA_SSE2_PARALLEL_BLOCKS;
+			nblocks -= LEA_SSE2_PARALLEL_BLOCKS) {
+			lea_avx2_xts_dec_4way(ctx->raw_crypt_ctx, dst, src, walk.iv);
+			src += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		for (; nblocks > 0; nblocks -= 1) {
+			xor_1blk(buffer, walk.iv, src);
+			lea_decrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, walk.iv, buffer);
+			xts_next_tweak(walk.iv, walk.iv);
+
+			src += LEA_BLOCK_SIZE;
+			dst += LEA_BLOCK_SIZE;
+			nbytes -= LEA_BLOCK_SIZE;
+		}
+
+		if (unlikely(is_tail)) {
+			memcpy(ntweak, walk.iv, LEA_BLOCK_SIZE);
+			xts_next_tweak(walk.iv, ntweak);
+
+			xor_1blk(buffer, walk.iv, src);
+			lea_decrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(buffer, walk.iv, buffer);
+
+			memcpy(dst, buffer, LEA_BLOCK_SIZE);
+
+			memcpy(buffer, src + 16, tail);
+			memcpy(dst + 16, dst, tail);
+
+			xor_1blk(buffer, ntweak, buffer);
+			lea_decrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, ntweak, buffer);
+
+			nbytes -= LEA_BLOCK_SIZE + tail;
+
+			kernel_fpu_end();
+			return skcipher_walk_done(&walk, nbytes);
+		}
+
+		kernel_fpu_end();
+
+		ret = skcipher_walk_done(&walk, nbytes);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(edge_tail != 0)) {
+		struct scatterlist sg_src[2];
+		struct scatterlist sg_dst[2];
+		struct scatterlist *scatter_src;
+		struct scatterlist *scatter_dst;
+		const u8 *src;
+		u8 *dst;
+
+		scatter_src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->src == req->dst) {
+			scatter_dst = scatter_src;
+		} else {
+			scatter_dst = scatterwalk_ffwd(sg_dst, req->dst,
+							req->cryptlen);
+		}
+
+		skcipher_request_set_crypt(req, scatter_src, scatter_dst,
+					LEA_BLOCK_SIZE + edge_tail, req->iv);
+
+		ret = skcipher_walk_virt(&walk, req, false);
+
+		src = walk.src.virt.addr;
+		dst = walk.dst.virt.addr;
+
+		kernel_fpu_begin();
+
+		memcpy(ntweak, walk.iv, LEA_BLOCK_SIZE);
+		xts_next_tweak(walk.iv, ntweak);
+
+		xor_1blk(buffer, walk.iv, src);
+		lea_decrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(buffer, walk.iv, buffer);
+
+		memcpy(dst, buffer, LEA_BLOCK_SIZE);
+
+		memcpy(buffer, src + 16, edge_tail);
+		memcpy(dst + 16, dst, edge_tail);
+
+		xor_1blk(buffer, ntweak, buffer);
+		lea_decrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(dst, ntweak, buffer);
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, 0);
+	}
+
+	return ret;
+}
+
+static int ctr_encrypt_4way(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_lea_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+
+	u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+	int ret;
+
+	ret = skcipher_walk_virt(&walk, req, false);
+	if (ret)
+		return ret;
+
+	while (walk.nbytes > 0) {
+		u32 nbytes = walk.nbytes;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+
+		kernel_fpu_begin();
+
+		while (nbytes >= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE) {
+			lea_avx2_ctr_enc_4way(ctx, dst, src, walk.iv);
+			src += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		while (nbytes >= LEA_BLOCK_SIZE) {
+			lea_encrypt(ctx, buffer, walk.iv);
+			xor_1blk(dst, buffer, src);
+			crypto_inc(walk.iv, LEA_BLOCK_SIZE);
+
+			src += LEA_BLOCK_SIZE;
+			dst += LEA_BLOCK_SIZE;
+			nbytes -= LEA_BLOCK_SIZE;
+		}
+
+		if (unlikely(walk.nbytes == walk.total && nbytes != 0)) {
+			lea_encrypt(ctx, buffer, walk.iv);
+			crypto_xor_cpy(dst, src, buffer, nbytes);
+			crypto_inc(walk.iv, LEA_BLOCK_SIZE);
+
+			nbytes = 0;
+		}
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, nbytes);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+static int ctr_encrypt_8way(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_lea_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+
+	u8 __aligned(32) buffer[LEA_BLOCK_SIZE * LEA_AVX2_PARALLEL_BLOCKS];
+
+	int ret;
+
+	ret = skcipher_walk_virt(&walk, req, false);
+	if (ret)
+		return ret;
+
+	while (walk.nbytes > 0) {
+		u32 nbytes = walk.nbytes;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+
+		kernel_fpu_begin();
+
+		while (nbytes >= LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE) {
+			lea_avx2_ctr_enc_8way(ctx, dst, src, walk.iv, buffer);
+			src += LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		while (nbytes >= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE) {
+			lea_avx2_ctr_enc_4way(ctx, dst, src, walk.iv);
+			src += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		while (nbytes >= LEA_BLOCK_SIZE) {
+			lea_encrypt(ctx, buffer, walk.iv);
+			xor_1blk(dst, buffer, src);
+			crypto_inc(walk.iv, LEA_BLOCK_SIZE);
+
+			src += LEA_BLOCK_SIZE;
+			dst += LEA_BLOCK_SIZE;
+			nbytes -= LEA_BLOCK_SIZE;
+		}
+
+		if (unlikely(walk.nbytes == walk.total && nbytes != 0)) {
+			lea_encrypt(ctx, buffer, walk.iv);
+			crypto_xor_cpy(dst, src, buffer, nbytes);
+			crypto_inc(walk.iv, LEA_BLOCK_SIZE);
+
+			nbytes = 0;
+		}
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, nbytes);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+static int xts_encrypt_4way(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
+	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+
+	u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+	int ret;
+	u32 nblocks;
+	u32 tail = req->cryptlen % LEA_BLOCK_SIZE;
+	u32 edge_tail = 0;
+
+	if (req->cryptlen < LEA_BLOCK_SIZE)
+		return -EINVAL;
+
+	ret = skcipher_walk_virt(&walk, req, false);
+	if (ret)
+		return ret;
+
+	if (unlikely(tail != 0 && walk.nbytes < walk.total)) {
+		u32 req_len = req->cryptlen - LEA_BLOCK_SIZE - tail;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(
+			&subreq, skcipher_request_flags(req), NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst, req_len,
+					req->iv);
+		req = &subreq;
+		ret = skcipher_walk_virt(&walk, req, false);
+		if (ret)
+			return ret;
+
+		edge_tail = tail;
+		tail = 0;
+	}
+
+	lea_encrypt(ctx->raw_tweak_ctx, walk.iv, walk.iv);
+
+	while ((nblocks = walk.nbytes / LEA_BLOCK_SIZE) > 0) {
+		u32 nbytes = walk.nbytes;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+		bool is_tail = tail != 0 &&
+				(nblocks + 1) * LEA_BLOCK_SIZE > walk.total;
+
+		if (unlikely(is_tail))
+			nblocks -= 1;
+
+		kernel_fpu_begin();
+
+		for (; nblocks >= LEA_SSE2_PARALLEL_BLOCKS;
+			nblocks -= LEA_SSE2_PARALLEL_BLOCKS) {
+			lea_avx2_xts_enc_4way(ctx->raw_crypt_ctx, dst, src, walk.iv);
+			src += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		for (; nblocks > 0; nblocks -= 1) {
+
+			xor_1blk(buffer, walk.iv, src);
+			lea_encrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, walk.iv, buffer);
+			xts_next_tweak(walk.iv, walk.iv);
+
+			src += LEA_BLOCK_SIZE;
+			dst += LEA_BLOCK_SIZE;
+			nbytes -= LEA_BLOCK_SIZE;
+		}
+
+		if (unlikely(is_tail)) {
+			xor_1blk(buffer, walk.iv, src);
+			lea_encrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(buffer, walk.iv, buffer);
+
+			memcpy(dst, buffer, LEA_BLOCK_SIZE);
+			memcpy(buffer, src + LEA_BLOCK_SIZE, tail);
+			memcpy(dst + LEA_BLOCK_SIZE, dst, tail);
+
+			xts_next_tweak(walk.iv, walk.iv);
+
+			xor_1blk(buffer, walk.iv, buffer);
+			lea_encrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, walk.iv, buffer);
+
+			nbytes -= LEA_BLOCK_SIZE + tail;
+
+			kernel_fpu_end();
+			return skcipher_walk_done(&walk, nbytes);
+		}
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, nbytes);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(edge_tail != 0)) {
+		struct scatterlist sg_src[2];
+		struct scatterlist sg_dst[2];
+		struct scatterlist *scatter_src;
+		struct scatterlist *scatter_dst;
+		const u8 *src;
+		u8 *dst;
+
+		scatter_src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->src == req->dst) {
+			scatter_dst = scatter_src;
+		} else {
+			scatter_dst = scatterwalk_ffwd(sg_dst, req->dst,
+								req->cryptlen);
+		}
+
+		skcipher_request_set_crypt(req, scatter_src, scatter_dst,
+					LEA_BLOCK_SIZE + edge_tail, req->iv);
+
+		ret = skcipher_walk_virt(&walk, req, false);
+
+		src = walk.src.virt.addr;
+		dst = walk.dst.virt.addr;
+
+		kernel_fpu_begin();
+
+		xor_1blk(buffer, walk.iv, src);
+		lea_encrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(buffer, walk.iv, buffer);
+
+		memcpy(dst, buffer, LEA_BLOCK_SIZE);
+		memcpy(buffer, src + LEA_BLOCK_SIZE, edge_tail);
+		memcpy(dst + LEA_BLOCK_SIZE, dst, edge_tail);
+
+		xts_next_tweak(walk.iv, walk.iv);
+
+		xor_1blk(buffer, walk.iv, buffer);
+		lea_encrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(dst, walk.iv, buffer);
+
+		kernel_fpu_end();
+
+		ret = skcipher_walk_done(&walk, 0);
+	}
+
+	return ret;
+}
+
+static int xts_decrypt_4way(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
+	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+
+	int ret;
+	u32 nblocks;
+	u32 tail = req->cryptlen % LEA_BLOCK_SIZE;
+	u32 edge_tail = 0;
+
+	if (req->cryptlen < LEA_BLOCK_SIZE)
+		return -EINVAL;
+
+	ret = skcipher_walk_virt(&walk, req, false);
+	if (ret)
+		return ret;
+
+	if (unlikely(tail != 0 && walk.nbytes < walk.total)) {
+		u32 req_len = req->cryptlen - LEA_BLOCK_SIZE - tail;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(
+			&subreq, skcipher_request_flags(req), NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst, req_len,
+					req->iv);
+		req = &subreq;
+		ret = skcipher_walk_virt(&walk, req, false);
+		if (ret)
+			return ret;
+
+		edge_tail = tail;
+		tail = 0;
+	}
+
+	lea_encrypt(ctx->raw_tweak_ctx, walk.iv, walk.iv);
+
+	while ((nblocks = walk.nbytes / LEA_BLOCK_SIZE) > 0) {
+		u32 nbytes = walk.nbytes;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+		bool is_tail = tail != 0 &&
+			(nblocks + 1) * LEA_BLOCK_SIZE > walk.total;
+
+		if (unlikely(is_tail))
+			nblocks -= 1;
+
+		kernel_fpu_begin();
+
+		for (; nblocks >= LEA_SSE2_PARALLEL_BLOCKS;
+			nblocks -= LEA_SSE2_PARALLEL_BLOCKS) {
+			lea_avx2_xts_dec_4way(ctx->raw_crypt_ctx, dst, src, walk.iv);
+			src += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			dst += LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+			nbytes -= LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE;
+		}
+
+		for (; nblocks > 0; nblocks -= 1) {
+			u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+			xor_1blk(buffer, walk.iv, src);
+			lea_decrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, walk.iv, buffer);
+			xts_next_tweak(walk.iv, walk.iv);
+
+			src += LEA_BLOCK_SIZE;
+			dst += LEA_BLOCK_SIZE;
+			nbytes -= LEA_BLOCK_SIZE;
+		}
+
+		if (unlikely(is_tail)) {
+			u8 __aligned(16) ntweak[16] = {
+				0,
+			};
+			u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+
+			memcpy(ntweak, walk.iv, LEA_BLOCK_SIZE);
+			xts_next_tweak(walk.iv, ntweak);
+
+			xor_1blk(buffer, walk.iv, src);
+			lea_decrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(buffer, walk.iv, buffer);
+
+			memcpy(dst, buffer, LEA_BLOCK_SIZE);
+
+			memcpy(buffer, src + 16, tail);
+			memcpy(dst + 16, dst, tail);
+
+			xor_1blk(buffer, ntweak, buffer);
+			lea_decrypt(ctx->raw_crypt_ctx, buffer,
+						buffer);
+			xor_1blk(dst, ntweak, buffer);
+
+			nbytes -= LEA_BLOCK_SIZE + tail;
+
+			kernel_fpu_end();
+			return skcipher_walk_done(&walk, nbytes);
+		}
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, nbytes);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(edge_tail != 0)) {
+		u8 __aligned(16) ntweak[16] = {
+			0,
+		};
+		u8 __aligned(16) buffer[LEA_BLOCK_SIZE];
+		struct scatterlist sg_src[2];
+		struct scatterlist sg_dst[2];
+		struct scatterlist *scatter_src;
+		struct scatterlist *scatter_dst;
+		const u8 *src;
+		u8 *dst;
+
+		scatter_src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->src == req->dst) {
+			scatter_dst = scatter_src;
+		} else {
+			scatter_dst = scatterwalk_ffwd(sg_dst, req->dst,
+							req->cryptlen);
+		}
+
+		skcipher_request_set_crypt(req, scatter_src, scatter_dst,
+					LEA_BLOCK_SIZE + edge_tail, req->iv);
+
+		ret = skcipher_walk_virt(&walk, req, false);
+
+		src = walk.src.virt.addr;
+		dst = walk.dst.virt.addr;
+
+		kernel_fpu_begin();
+
+		memcpy(ntweak, walk.iv, LEA_BLOCK_SIZE);
+		xts_next_tweak(walk.iv, ntweak);
+
+		xor_1blk(buffer, walk.iv, src);
+		lea_decrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(buffer, walk.iv, buffer);
+
+		memcpy(dst, buffer, LEA_BLOCK_SIZE);
+
+		memcpy(buffer, src + 16, edge_tail);
+		memcpy(dst + 16, dst, edge_tail);
+
+		xor_1blk(buffer, ntweak, buffer);
+		lea_decrypt(ctx->raw_crypt_ctx, buffer, buffer);
+		xor_1blk(dst, ntweak, buffer);
+
+		kernel_fpu_end();
+		ret = skcipher_walk_done(&walk, 0);
+	}
+
+	return ret;
+}
+
+static int xts_lea_set_key(struct crypto_skcipher *tfm, const u8 *key,
+				u32 keylen)
+{
+	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
+	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
+
+	struct crypto_lea_ctx *crypt_key =
+		(struct crypto_lea_ctx *)(ctx->raw_crypt_ctx);
+	struct crypto_lea_ctx *tweak_key =
+		(struct crypto_lea_ctx *)(ctx->raw_tweak_ctx);
+
+	int result;
+
+	result = xts_verify_key(tfm, key, keylen);
+	if (result)
+		return result;
+
+	result = lea_set_key(crypt_key, key, keylen / 2);
+
+	if (result)
+		return result;
+
+	return lea_set_key(tweak_key, key + (keylen / 2), keylen / 2);
+}
+
+static int _lea_set_key(struct crypto_skcipher *tfm, const u8 *key, u32 keylen)
+{
+	return lea_set_key(crypto_skcipher_ctx(tfm), key, keylen);
+}
+
+static struct skcipher_alg lea_simd_avx2_algs[] = {
+	{
+		.base.cra_name = "__ecb(lea)",
+		.base.cra_driver_name = "__ecb-lea-sse2",
+		.base.cra_priority = 300 - 1,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = LEA_BLOCK_SIZE,
+		.base.cra_ctxsize = sizeof(struct crypto_lea_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE,
+		.max_keysize = LEA_MAX_KEY_SIZE,
+		.walksize = LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.setkey = _lea_set_key,
+		.encrypt = ecb_encrypt_4way,
+		.decrypt = ecb_decrypt_4way,
+	},
+	{
+		.base.cra_name = "__cbc(lea)",
+		.base.cra_driver_name = "__cbc-lea-sse2",
+		.base.cra_priority = 300 - 1,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = LEA_BLOCK_SIZE,
+		.base.cra_ctxsize = sizeof(struct crypto_lea_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE,
+		.max_keysize = LEA_MAX_KEY_SIZE,
+		.walksize = LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.ivsize = LEA_BLOCK_SIZE,
+		.setkey = _lea_set_key,
+		.encrypt = cbc_encrypt,
+		.decrypt = cbc_decrypt_4way,
+	},
+	{
+		.base.cra_name = "__xts(lea)",
+		.base.cra_driver_name = "__xts-lea-sse2",
+		.base.cra_priority = 300 - 1,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = LEA_BLOCK_SIZE,
+		.base.cra_ctxsize = sizeof(struct lea_xts_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE * 2,
+		.max_keysize = LEA_MAX_KEY_SIZE * 2,
+		.walksize = LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.ivsize = LEA_BLOCK_SIZE,
+		.setkey = xts_lea_set_key,
+		.encrypt = xts_encrypt_4way,
+		.decrypt = xts_decrypt_4way,
+	},
+	{
+		.base.cra_name = "__ctr(lea)",
+		.base.cra_driver_name = "__ctr-lea-sse2",
+		.base.cra_priority = 300 - 1,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = 1,
+		.base.cra_ctxsize = sizeof(struct crypto_lea_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE,
+		.max_keysize = LEA_MAX_KEY_SIZE,
+		.chunksize = LEA_BLOCK_SIZE,
+		.walksize = LEA_SSE2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.ivsize = LEA_BLOCK_SIZE,
+		.setkey = _lea_set_key,
+		.encrypt = ctr_encrypt_4way,
+		.decrypt = ctr_encrypt_4way,
+	},
+	{
+		.base.cra_name = "__ecb(lea)",
+		.base.cra_driver_name = "__ecb-lea-avx2",
+		.base.cra_priority = 300,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = LEA_BLOCK_SIZE,
+		.base.cra_ctxsize = sizeof(struct crypto_lea_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE,
+		.max_keysize = LEA_MAX_KEY_SIZE,
+		.walksize = LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.setkey = _lea_set_key,
+		.encrypt = ecb_encrypt_8way,
+		.decrypt = ecb_decrypt_8way,
+	},
+	{
+		.base.cra_name = "__ctr(lea)",
+		.base.cra_driver_name = "__ctr-lea-avx2",
+		.base.cra_priority = 300,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = 1,
+		.base.cra_ctxsize = sizeof(struct crypto_lea_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE,
+		.max_keysize = LEA_MAX_KEY_SIZE,
+		.chunksize = LEA_BLOCK_SIZE,
+		.walksize = LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.ivsize = LEA_BLOCK_SIZE,
+		.setkey = _lea_set_key,
+		.encrypt = ctr_encrypt_8way,
+		.decrypt = ctr_encrypt_8way,
+	},
+	{
+		.base.cra_name = "__cbc(lea)",
+		.base.cra_driver_name = "__cbc-lea-avx2",
+		.base.cra_priority = 300,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = LEA_BLOCK_SIZE,
+		.base.cra_ctxsize = sizeof(struct crypto_lea_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE,
+		.max_keysize = LEA_MAX_KEY_SIZE,
+		.walksize = LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.ivsize = LEA_BLOCK_SIZE,
+		.setkey = _lea_set_key,
+		.encrypt = cbc_encrypt,
+		.decrypt = cbc_decrypt_8way,
+	},
+	{
+		.base.cra_name = "__xts(lea)",
+		.base.cra_driver_name = "__xts-lea-avx2",
+		.base.cra_priority = 300,
+		.base.cra_flags = CRYPTO_ALG_INTERNAL,
+		.base.cra_blocksize = LEA_BLOCK_SIZE,
+		.base.cra_ctxsize = sizeof(struct lea_xts_ctx),
+		.base.cra_module = THIS_MODULE,
+		.min_keysize = LEA_MIN_KEY_SIZE * 2,
+		.max_keysize = LEA_MAX_KEY_SIZE * 2,
+		.walksize = LEA_AVX2_PARALLEL_BLOCKS * LEA_BLOCK_SIZE,
+		.ivsize = LEA_BLOCK_SIZE,
+		.setkey = xts_lea_set_key,
+		.encrypt = xts_encrypt_8way,
+		.decrypt = xts_decrypt_8way,
+	},
+};
+
+static struct simd_skcipher_alg *lea_simd_algs[ARRAY_SIZE(lea_simd_avx2_algs)];
+
+static int __init crypto_lea_avx2_init(void)
+{
+	const char *feature_name;
+
+	if (!boot_cpu_has(X86_FEATURE_XMM2)) {
+		pr_info("SSE2 instructions are not detected.\n");
+		return -ENODEV;
+	}
+
+	if (!boot_cpu_has(X86_FEATURE_MOVBE)) {
+		pr_info("MOVBE instructions are not detected.\n");
+		return -ENODEV;
+	}
+
+	if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_AVX)) {
+		pr_info("AVX2 instructions are not detected.\n");
+		return -ENODEV;
+	}
+
+	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+				&feature_name)) {
+		pr_info("CPU feature '%s' is not supported.\n", feature_name);
+		return -ENODEV;
+	}
+
+	return simd_register_skciphers_compat(
+		lea_simd_avx2_algs, ARRAY_SIZE(lea_simd_algs), lea_simd_algs);
+}
+
+static void __exit crypto_lea_avx2_exit(void)
+{
+	simd_unregister_skciphers(lea_simd_avx2_algs, ARRAY_SIZE(lea_simd_algs),
+				lea_simd_algs);
+}
+
+module_init(crypto_lea_avx2_init);
+module_exit(crypto_lea_avx2_exit);
+
+MODULE_DESCRIPTION("LEA Cipher Algorithm, AVX2, SSE2 SIMD, MOVBE");
+MODULE_AUTHOR("Dongsoo Lee <letrhee@nsr.re.kr>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("lea");
+MODULE_ALIAS_CRYPTO("lea-avx2");
diff --git a/arch/x86/crypto/lea_avx2_x86_64-asm.S b/arch/x86/crypto/lea_avx2_x86_64-asm.S
new file mode 100644
index 000000000000..06ad30a2ab63
--- /dev/null
+++ b/arch/x86/crypto/lea_avx2_x86_64-asm.S
@@ -0,0 +1,778 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LEA Cipher 8-way(AVX2), 4-way(SSE2) parallel algorithm.
+ * In CTR mode, the MOVBE instruction is utilized for improved performance.
+ *
+ * Copyright (c) 2023 National Security Research.
+ * Author: Dongsoo Lee <letrhee@nsr.re.kr>
+ */
+
+#include <linux/linkage.h>
+#include <asm/frame.h>
+
+.file "lea_avx2_x86_64-asm.S"
+
+.section .text
+
+#define LEA_MAX_KEYLENGTH (32 * 6 * 4)
+
+#define ADD_CTR1_R(low, high) \
+	add $1, low; \
+	adc $0, high;
+
+#define PROC_NEXT_CTR(addr, blk_offset, low, high) \
+	ADD_CTR1_R(low, high); \
+	movbe high, (blk_offset * 16)(addr); \
+	movbe low, (blk_offset * 16 + 8)(addr);
+
+#define XTS_TW_X0 %xmm8
+#define XTS_TW_X1 %xmm9
+#define XTS_TW_I2 %xmm0
+#define XTS_TW_O2 %xmm10
+#define XTS_TW_X3 %xmm11
+#define XTS_TW_X4 %xmm12
+#define XTS_TW_X5 %xmm13
+#define XTS_TW_I6 %xmm1
+#define XTS_TW_O6 %xmm14
+#define XTS_TW_X7 %xmm15
+#define XTS_TW_X8 %xmm2
+#define XTS_MASK  %xmm7
+
+#define XTS_TW_Y0 %ymm12
+#define XTS_TW_Y1 %ymm13
+#define XTS_TW_Y2 %ymm14
+#define XTS_TW_Y3 %ymm15
+
+#define CTR_64_low %rax
+#define CTR_64_high %r9
+
+
+#define XMM(n) %xmm ##  n
+#define YMM(n) %ymm ##  n
+
+#define XAR_AVX2(v0, v1, cur, pre, tmp, rk1, rk2) \
+	vpbroadcastd rk2, tmp; \
+	vpxor        tmp, cur, cur; \
+	vpbroadcastd rk1, tmp; \
+	vpxor        pre, tmp, tmp; \
+	vpaddd       cur, tmp, tmp; \
+	vpsrld       v0, tmp, cur; \
+	vpslld       v1, tmp, tmp; \
+	vpxor        tmp, cur, cur;
+
+
+#define XSR_AVX2(v0, v1, cur, pre, tmp, rk1, rk2) \
+	vpsrld       v0, cur, tmp; \
+	vpslld       v1, cur, cur; \
+	vpxor        tmp, cur, cur; \
+	vpbroadcastd rk1, tmp; \
+	vpxor        pre, tmp, tmp; \
+	vpsubd       tmp, cur, cur; \
+	vpbroadcastd rk2, tmp; \
+	vpxor        tmp, cur, cur;
+
+#define XAR3_AVX2(cur, pre, tmp, rk1, rk2) \
+	XAR_AVX2($3, $29, cur, pre, tmp, rk1, rk2)
+
+#define XAR5_AVX2(cur, pre, tmp, rk1, rk2) \
+	XAR_AVX2($5, $27, cur, pre, tmp, rk1, rk2)
+
+#define XAR9_AVX2(cur, pre, tmp, rk1, rk2) \
+	XAR_AVX2($23, $9, cur, pre, tmp, rk1, rk2)
+
+
+#define XSR9_AVX2(cur, pre, tmp, rk1, rk2) \
+	XSR_AVX2($9, $23, cur, pre, tmp, rk1, rk2)
+
+#define XSR5_AVX2(cur, pre, tmp, rk1, rk2) \
+	XSR_AVX2($27, $5, cur, pre, tmp, rk1, rk2)
+
+#define XSR3_AVX2(cur, pre, tmp, rk1, rk2) \
+	XSR_AVX2($29, $3, cur, pre, tmp, rk1, rk2)
+
+#define LOAD_AND_JOIN8_YMM(i, ti, j, mem) \
+	vmovd (j + 0 * 16)(mem), XMM(ti); \
+	vpinsrd $0x1, (j + 1 * 16)(mem), XMM(ti), XMM(ti); \
+	vpinsrd $0x2, (j + 2 * 16)(mem), XMM(ti), XMM(ti); \
+	vpinsrd $0x3, (j + 3 * 16)(mem), XMM(ti), XMM(ti); \
+	vmovd (j + 4 * 16)(mem), XMM(i); \
+	vpinsrd $0x1, (j + 5 * 16)(mem), XMM(i), XMM(i); \
+	vpinsrd $0x2, (j + 6 * 16)(mem), XMM(i), XMM(i); \
+	vpinsrd $0x3, (j + 7 * 16)(mem), XMM(i), XMM(i); \
+	vinserti128 $0x1, XMM(ti), YMM(i), YMM(i); \
+
+#define LOAD_AND_JOIN_BLOCK8(i0, i1, i2, i3, ti0, mem) \
+	LOAD_AND_JOIN8_YMM(i0, ti0, 0, mem);\
+	LOAD_AND_JOIN8_YMM(i1, ti0, 4, mem);\
+	LOAD_AND_JOIN8_YMM(i2, ti0, 8, mem);\
+	LOAD_AND_JOIN8_YMM(i3, ti0, 12, mem);
+
+#define SPLIT_AND_STORE8_YMM(i, j, mem) \
+	vmovd XMM(i), (j + 4 * 16)(mem);\
+	vpextrd $0x1, XMM(i), (j + 5 * 16)(mem);\
+	vpextrd $0x2, XMM(i), (j + 6 * 16)(mem);\
+	vpextrd $0x3, XMM(i), (j + 7 * 16)(mem);\
+	vextracti128 $0x1, YMM(i), XMM(i);\
+	vmovd XMM(i), (j + 0 * 16)(mem);\
+	vpextrd $0x1, XMM(i), (j + 1 * 16)(mem);\
+	vpextrd $0x2, XMM(i), (j + 2 * 16)(mem);\
+	vpextrd $0x3, XMM(i), (j + 3 * 16)(mem);
+
+#define SPLIT_AND_STORE_BLOCK8(i0, i1, i2, i3, mem) \
+	SPLIT_AND_STORE8_YMM(i0, 0, mem);\
+	SPLIT_AND_STORE8_YMM(i1, 4, mem);\
+	SPLIT_AND_STORE8_YMM(i2, 8, mem);\
+	SPLIT_AND_STORE8_YMM(i3, 12, mem);
+
+
+#define LOAD_BLOCK4(x0, x1, x2, x3, mem) \
+	movdqu 0 * 16(mem), x0; \
+	movdqu 1 * 16(mem), x1; \
+	movdqu 2 * 16(mem), x2; \
+	movdqu 3 * 16(mem), x3;
+
+#define SPLIT_BLOCK4(x0, x1, out_x2, x3, tmp, in_x2) \
+	movdqa x0, out_x2; \
+	movdqa in_x2, tmp; \
+	punpckldq x1, x0; \
+	punpckhdq x1, out_x2; \
+	punpckldq x3, tmp; \
+	punpckhdq x3, in_x2; \
+	\
+	movdqa x0, x1; \
+	movdqa out_x2, x3; \
+	punpcklqdq tmp, x0; \
+	punpckhqdq tmp, x1; \
+	punpcklqdq in_x2, out_x2; \
+	punpckhqdq in_x2, x3;
+
+#define XOR_BLOCK3(x0, x1, x2, tmp0, tmp1, tmp2, mem) \
+	movdqu 0 * 16(mem), tmp0; \
+	movdqu 1 * 16(mem), tmp1; \
+	movdqu 2 * 16(mem), tmp2; \
+	pxor tmp0, x0;            \
+	pxor tmp1, x1;            \
+	pxor tmp2, x2;
+
+#define STORE_BLOCK4(x0, x1, x2, x3, mem) \
+	movdqu x0, 0 * 16(mem); \
+	movdqu x1, 1 * 16(mem); \
+	movdqu x2, 2 * 16(mem); \
+	movdqu x3, 3 * 16(mem);
+
+#define LEA_1ROUND_ENC(i0, i1, i2, i3, tmp, rk, rnd_num) \
+	XAR3_AVX2(i3, i2, tmp, (((rnd_num) * 6 + 4) * 4)(rk), (((rnd_num) * 6 + 5) * 4)(rk)); \
+	XAR5_AVX2(i2, i1, tmp, (((rnd_num) * 6 + 2) * 4)(rk), (((rnd_num) * 6 + 3) * 4)(rk)); \
+	XAR9_AVX2(i1, i0, tmp, (((rnd_num) * 6 + 0) * 4)(rk), (((rnd_num) * 6 + 1) * 4)(rk));
+
+#define LEA_4ROUND_ENC(i0, i1, i2, i3, tmp, rk, rnd_num) \
+	LEA_1ROUND_ENC(i0, i1, i2, i3, tmp, rk, rnd_num + 0); \
+	LEA_1ROUND_ENC(i1, i2, i3, i0, tmp, rk, rnd_num + 1); \
+	LEA_1ROUND_ENC(i2, i3, i0, i1, tmp, rk, rnd_num + 2); \
+	LEA_1ROUND_ENC(i3, i0, i1, i2, tmp, rk, rnd_num + 3);
+
+#define LEA_1ROUND_DEC(i0, i1, i2, i3, tmp, rk, rnd_num) \
+	XSR9_AVX2(i0, i3, tmp, (((rnd_num) * 6 + 0) * 4)(rk), (((rnd_num) * 6 + 1) * 4)(rk)); \
+	XSR5_AVX2(i1, i0, tmp, (((rnd_num) * 6 + 2) * 4)(rk), (((rnd_num) * 6 + 3) * 4)(rk)); \
+	XSR3_AVX2(i2, i1, tmp, (((rnd_num) * 6 + 4) * 4)(rk), (((rnd_num) * 6 + 5) * 4)(rk));
+
+#define LEA_4ROUND_DEC(i0, i1, i2, i3, tmp, rk, rnd_num) \
+	LEA_1ROUND_DEC(i0, i1, i2, i3, tmp, rk, rnd_num + 3); \
+	LEA_1ROUND_DEC(i3, i0, i1, i2, tmp, rk, rnd_num + 2); \
+	LEA_1ROUND_DEC(i2, i3, i0, i1, tmp, rk, rnd_num + 1); \
+	LEA_1ROUND_DEC(i1, i2, i3, i0, tmp, rk, rnd_num + 0);
+
+#define CBC_LOAD_SHUFFLE_MASK(mask) \
+	vmovdqa .Lcbc_shuffle_mask(%rip), mask;
+
+#define XTS_LOAD_TWEAK_MASK(mask) \
+	vmovdqa .Lxts_tweak_mask(%rip), mask;
+
+#define XTS_NEXT_TWEAK_1BLOCK(out0, in0, tmp0, mask) \
+	pshufd $0x13, in0, tmp0; \
+	psrad $31, tmp0; \
+	pand mask, tmp0; \
+	vpsllq $1, in0, out0; \
+	pxor tmp0, out0;
+
+#define JOIN_BLOCK4(x0, x1, out_x2, x3, tmp, in_x2) \
+	vpunpckhdq x1, x0, out_x2; \
+	vpunpckldq x1, x0, x0; \
+	vpunpckldq x3, in_x2, tmp; \
+	vpunpckhdq x3, in_x2, in_x2; \
+	\
+	vpunpckhqdq tmp, x0, x1; \
+	vpunpcklqdq tmp, x0, x0; \
+	vpunpckhqdq in_x2, out_x2, x3; \
+	vpunpcklqdq in_x2, out_x2, out_x2;
+
+
+.align 8
+SYM_FUNC_START_LOCAL(__lea_avx2_enc_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%xmm0..%xmm3: 4 plaintext blocks
+	 * output:
+	 *	%xmm0..%xmm3: 4 encrypted blocks
+	 */
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 0);
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 4);
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 8);
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 12);
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 16);
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 20);
+
+	cmpl $24, LEA_MAX_KEYLENGTH(%rdi);
+	je .Lenc4_done;
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 24);
+
+	cmpl $28, LEA_MAX_KEYLENGTH(%rdi);
+	je .Lenc4_done;
+	LEA_4ROUND_ENC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 28);
+
+.Lenc4_done:
+	RET;
+SYM_FUNC_END(__lea_avx2_enc_4way)
+
+.align 8
+SYM_FUNC_START_LOCAL(__lea_avx2_dec_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%xmm0..%xmm3: 4 encrypted blocks
+	 * output:
+	 *	%xmm0..%xmm3: 4 plaintext blocks
+	 */
+	cmpl $28, LEA_MAX_KEYLENGTH(%rdi);
+	jbe .Ldec4_24;
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 28);
+
+.Ldec4_24:
+	cmpl $24, LEA_MAX_KEYLENGTH(%rdi);
+	jbe .Ldec4_20;
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 24);
+
+.Ldec4_20:
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 20);
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 16);
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 12);
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 8);
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 4);
+	LEA_4ROUND_DEC(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %rdi, 0);
+
+	RET;
+SYM_FUNC_END(__lea_avx2_dec_4way)
+
+
+.align 8
+SYM_FUNC_START_LOCAL(__lea_avx2_enc_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%ymm0..%ymm3: 8 plaintext blocks
+	 * output:
+	 *	%ymm0..%ymm3: 8 encrypted blocks
+	 */
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 0);
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 4);
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 8);
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 12);
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 16);
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 20);
+
+	cmpl $24, LEA_MAX_KEYLENGTH(%rdi);
+	je .Lenc8_done;
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 24);
+
+	cmpl $28, LEA_MAX_KEYLENGTH(%rdi);
+	je .Lenc8_done;
+	LEA_4ROUND_ENC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 28);
+
+.Lenc8_done:
+	RET;
+SYM_FUNC_END(__lea_avx2_enc_8way)
+
+.align 8
+SYM_FUNC_START_LOCAL(__lea_avx2_dec_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%ymm0..%ymm3: 8 encrypted blocks
+	 * output:
+	 *	%ymm0..%ymm3: 8 plaintext blocks
+	 */
+	cmpl $28, LEA_MAX_KEYLENGTH(%rdi);
+	jbe .Lenc8_24;
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 28);
+
+.Lenc8_24:
+	cmpl $24, LEA_MAX_KEYLENGTH(%rdi);
+	jbe .Lenc8_20;
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 24);
+
+.Lenc8_20:
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 20);
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 16);
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 12);
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 8);
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 4);
+	LEA_4ROUND_DEC(%ymm0, %ymm1, %ymm2, %ymm3, %ymm4, %rdi, 0);
+
+	RET;
+SYM_FUNC_END(__lea_avx2_dec_8way)
+
+SYM_FUNC_START(lea_avx2_ecb_enc_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (4 blocks)
+	 *	%rdx: src (4 blocks)
+	 */
+	FRAME_BEGIN
+
+	LOAD_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rdx);
+	JOIN_BLOCK4(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5);
+
+	call __lea_avx2_enc_4way
+
+	SPLIT_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %xmm4, %xmm2);
+	STORE_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rsi);
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_ecb_enc_4way)
+
+SYM_FUNC_START(lea_avx2_ecb_dec_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (4 blocks)
+	 *	%rdx: src (4 blocks)
+	 */
+	FRAME_BEGIN
+
+	LOAD_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rdx);
+	JOIN_BLOCK4(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5);
+
+	call __lea_avx2_dec_4way
+
+	SPLIT_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %xmm4, %xmm2);
+	STORE_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rsi);
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_ecb_dec_4way)
+
+SYM_FUNC_START(lea_avx2_cbc_dec_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (4 blocks)
+	 *	%rdx: src (4 blocks)
+	 */
+	FRAME_BEGIN
+
+	LOAD_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rdx);
+	JOIN_BLOCK4(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5);
+
+	call __lea_avx2_dec_4way
+
+	SPLIT_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %xmm4, %xmm2);
+	XOR_BLOCK3(%xmm1, %xmm5, %xmm3, %xmm4, %xmm6, %xmm7, %rdx);
+	STORE_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rsi);
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_cbc_dec_4way)
+
+SYM_FUNC_START(lea_avx2_xts_enc_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (4 blocks)
+	 *	%rdx: src (4 blocks)
+	 *	%rcx: tweak
+	 */
+	FRAME_BEGIN
+
+	LOAD_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rdx);
+	movdqu (%rcx), XTS_TW_X0;
+	XTS_LOAD_TWEAK_MASK(XTS_MASK);
+	pxor XTS_TW_X0, %xmm0;
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X1, XTS_TW_X0, %xmm4, XTS_MASK);
+	pxor XTS_TW_X1, %xmm1;
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_O2, XTS_TW_X1, %xmm4, XTS_MASK);
+	pxor XTS_TW_O2, %xmm5;
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X3, XTS_TW_O2, %xmm4, XTS_MASK);
+	pxor XTS_TW_X3, %xmm3;
+
+
+	JOIN_BLOCK4(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5);
+
+	call __lea_avx2_enc_4way
+
+	SPLIT_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %xmm4, %xmm2);
+
+	pxor XTS_TW_X0, %xmm0;
+	pxor XTS_TW_X1, %xmm1;
+	pxor XTS_TW_O2, %xmm5;
+	pxor XTS_TW_X3, %xmm3;
+
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X0, XTS_TW_X3, %xmm4, XTS_MASK);
+	movdqu XTS_TW_X0, (%rcx);
+	STORE_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rsi);
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_xts_enc_4way)
+
+SYM_FUNC_START(lea_avx2_xts_dec_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (4 blocks)
+	 *	%rdx: src (4 blocks)
+	 *	%rcx: tweak
+	 */
+	FRAME_BEGIN
+
+	LOAD_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rdx);
+	movdqu (%rcx), XTS_TW_X0;
+	XTS_LOAD_TWEAK_MASK(XTS_MASK);
+	pxor XTS_TW_X0, %xmm0;
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X1, XTS_TW_X0, %xmm4, XTS_MASK);
+	pxor XTS_TW_X1, %xmm1;
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_O2, XTS_TW_X1, %xmm4, XTS_MASK);
+	pxor XTS_TW_O2, %xmm5;
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X3, XTS_TW_O2, %xmm4, XTS_MASK);
+	pxor XTS_TW_X3, %xmm3;
+
+	JOIN_BLOCK4(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5);
+
+	call __lea_avx2_dec_4way
+
+	SPLIT_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %xmm4, %xmm2);
+
+	pxor XTS_TW_X0, %xmm0;
+	pxor XTS_TW_X1, %xmm1;
+	pxor XTS_TW_O2, %xmm5;
+	pxor XTS_TW_X3, %xmm3;
+
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X0, XTS_TW_X3, %xmm4, XTS_MASK);
+	movdqu XTS_TW_X0, (%rcx);
+	STORE_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rsi);
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_xts_dec_4way)
+
+SYM_FUNC_START(lea_avx2_xts_next_tweak_sse2)
+	/* input:
+	 *	%rdi: tweak_out
+	 *	%rsi: tweak_in
+	 */
+	FRAME_BEGIN
+
+	movdqu (%rsi), XTS_TW_X0;
+	XTS_LOAD_TWEAK_MASK(XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X0, XTS_TW_X0, %xmm5, XTS_MASK);
+	movdqu XTS_TW_X0, (%rdi);
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_xts_next_tweak_sse2)
+
+SYM_FUNC_START(lea_avx2_ctr_enc_4way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (4 blocks)
+	 *	%rdx: src (4 blocks)
+	 *	%rcx: ctr
+	 * changed:
+	 *  CTR_64_high(%r9)
+	 *  CTR_64_low(%rax)
+	 */
+	FRAME_BEGIN
+
+	push CTR_64_high;
+
+	movbe (%rcx), CTR_64_high;
+	movbe 8(%rcx), CTR_64_low;
+
+	movdqu (%rcx), %xmm0;
+	PROC_NEXT_CTR(%rcx, 0, CTR_64_low, CTR_64_high);
+	movdqu (%rcx), %xmm1;
+	PROC_NEXT_CTR(%rcx, 0, CTR_64_low, CTR_64_high);
+	movdqu (%rcx), %xmm5;
+	PROC_NEXT_CTR(%rcx, 0, CTR_64_low, CTR_64_high);
+	movdqu (%rcx), %xmm3;
+	PROC_NEXT_CTR(%rcx, 0, CTR_64_low, CTR_64_high);
+
+	JOIN_BLOCK4(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5);
+
+	call __lea_avx2_enc_4way;
+
+	SPLIT_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %xmm4, %xmm2);
+	LOAD_BLOCK4(%xmm6, %xmm7, %xmm8, %xmm9, %rdx);
+
+	pxor %xmm6, %xmm0;
+	pxor %xmm7, %xmm1;
+	pxor %xmm8, %xmm5;
+	pxor %xmm9, %xmm3;
+
+	STORE_BLOCK4(%xmm0, %xmm1, %xmm5, %xmm3, %rsi);
+
+	pop CTR_64_high;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_ctr_enc_4way)
+
+SYM_FUNC_START(lea_avx2_ecb_enc_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (8 blocks)
+	 *	%rdx: src (8 blocks)
+	 */
+	FRAME_BEGIN
+
+	vzeroupper;
+
+	LOAD_AND_JOIN_BLOCK8(0, 1, 2, 3, 4, %rdx);
+
+	call __lea_avx2_enc_8way;
+
+	SPLIT_AND_STORE_BLOCK8(0, 1, 2, 3, %rsi);
+
+	vzeroupper;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_ecb_enc_8way)
+
+SYM_FUNC_START(lea_avx2_ecb_dec_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (8 blocks)
+	 *	%rdx: src (8 blocks)
+	 */
+	FRAME_BEGIN
+
+	vzeroupper;
+
+	LOAD_AND_JOIN_BLOCK8(0, 1, 2, 3, 4, %rdx);
+
+	call __lea_avx2_dec_8way
+
+	SPLIT_AND_STORE_BLOCK8(0, 1, 2, 3, %rsi);
+
+	vzeroupper;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_ecb_dec_8way)
+
+SYM_FUNC_START(lea_avx2_cbc_dec_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (8 blocks)
+	 *	%rdx: src (8 blocks)
+	 */
+	FRAME_BEGIN
+
+	vzeroupper;
+
+	LOAD_AND_JOIN_BLOCK8(0, 1, 2, 3, 4, %rdx);
+
+	CBC_LOAD_SHUFFLE_MASK(%ymm5);
+	vpxor %ymm4, %ymm4, %ymm4;
+
+	vpermd %ymm0, %ymm5, %ymm6;
+	vpermd %ymm1, %ymm5, %ymm7;
+	vpermd %ymm2, %ymm5, %ymm8;
+	vpermd %ymm3, %ymm5, %ymm9;
+
+	vpblendd $0x10, %ymm4, %ymm6, %ymm6;
+	vpblendd $0x10, %ymm4, %ymm7, %ymm7;
+	vpblendd $0x10, %ymm4, %ymm8, %ymm8;
+	vpblendd $0x10, %ymm4, %ymm9, %ymm9;
+
+	call __lea_avx2_dec_8way
+
+	vpxor  %ymm6, %ymm0, %ymm0;
+	vpxor  %ymm7, %ymm1, %ymm1;
+	vpxor  %ymm8, %ymm2, %ymm2;
+	vpxor  %ymm9, %ymm3, %ymm3;
+
+	SPLIT_AND_STORE_BLOCK8(0, 1, 2, 3, %rsi);
+
+	vzeroupper;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_cbc_dec_8way)
+
+SYM_FUNC_START(lea_avx2_xts_enc_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (8 blocks)
+	 *	%rdx: src (8 blocks)
+	 *	%rcx: tweak
+	 */
+	FRAME_BEGIN
+
+	vzeroupper;
+
+	movdqu (%rcx), XTS_TW_X0;
+	XTS_LOAD_TWEAK_MASK(XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X1, XTS_TW_X0, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_I2, XTS_TW_X1, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X3, XTS_TW_I2, XMM(5), XTS_MASK);
+
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X4, XTS_TW_X3, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X5, XTS_TW_X4, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_I6, XTS_TW_X5, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X7, XTS_TW_I6, XMM(5), XTS_MASK);
+
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X8, XTS_TW_X7, XMM(5), XTS_MASK);
+	movdqu XTS_TW_X8, (%rcx);
+
+	JOIN_BLOCK4(XTS_TW_X0, XTS_TW_X1, XTS_TW_O2, XTS_TW_X3, XMM(5), XTS_TW_I2);
+	JOIN_BLOCK4(XTS_TW_X4, XTS_TW_X5, XTS_TW_O6, XTS_TW_X7, XMM(5), XTS_TW_I6);
+
+	vinserti128 $0x1, XTS_TW_X0, XTS_TW_Y0, XTS_TW_Y0;
+	vinserti128 $0x1, XTS_TW_X1, XTS_TW_Y1, XTS_TW_Y1;
+	vinserti128 $0x1, XTS_TW_O2, XTS_TW_Y2, XTS_TW_Y2;
+	vinserti128 $0x1, XTS_TW_X3, XTS_TW_Y3, XTS_TW_Y3;
+
+	LOAD_AND_JOIN_BLOCK8(0, 1, 2, 3, 4, %rdx);
+
+	vpxor XTS_TW_Y0, %ymm0, %ymm0;
+	vpxor XTS_TW_Y1, %ymm1, %ymm1;
+	vpxor XTS_TW_Y2, %ymm2, %ymm2;
+	vpxor XTS_TW_Y3, %ymm3, %ymm3;
+
+	call __lea_avx2_enc_8way
+
+	vpxor XTS_TW_Y0, %ymm0, %ymm0;
+	vpxor XTS_TW_Y1, %ymm1, %ymm1;
+	vpxor XTS_TW_Y2, %ymm2, %ymm2;
+	vpxor XTS_TW_Y3, %ymm3, %ymm3;
+
+	SPLIT_AND_STORE_BLOCK8(0, 1, 2, 3, %rsi);
+
+	vzeroupper;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_xts_enc_8way)
+
+SYM_FUNC_START(lea_avx2_xts_dec_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (8 blocks)
+	 *	%rdx: src (8 blocks)
+	 *	%rcx: tweak
+	 */
+	FRAME_BEGIN
+
+	vzeroupper;
+
+	movdqu (%rcx), XTS_TW_X0;
+	XTS_LOAD_TWEAK_MASK(XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X1, XTS_TW_X0, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_I2, XTS_TW_X1, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X3, XTS_TW_I2, XMM(5), XTS_MASK);
+
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X4, XTS_TW_X3, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X5, XTS_TW_X4, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_I6, XTS_TW_X5, XMM(5), XTS_MASK);
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X7, XTS_TW_I6, XMM(5), XTS_MASK);
+
+	XTS_NEXT_TWEAK_1BLOCK(XTS_TW_X8, XTS_TW_X7, XMM(5), XTS_MASK);
+	movdqu XTS_TW_X8, (%rcx);
+
+	JOIN_BLOCK4(XTS_TW_X0, XTS_TW_X1, XTS_TW_O2, XTS_TW_X3, XMM(5), XTS_TW_I2);
+	JOIN_BLOCK4(XTS_TW_X4, XTS_TW_X5, XTS_TW_O6, XTS_TW_X7, XMM(5), XTS_TW_I6);
+
+	vinserti128 $0x1, XTS_TW_X0, XTS_TW_Y0, XTS_TW_Y0;
+	vinserti128 $0x1, XTS_TW_X1, XTS_TW_Y1, XTS_TW_Y1;
+	vinserti128 $0x1, XTS_TW_O2, XTS_TW_Y2, XTS_TW_Y2;
+	vinserti128 $0x1, XTS_TW_X3, XTS_TW_Y3, XTS_TW_Y3;
+
+	LOAD_AND_JOIN_BLOCK8(0, 1, 2, 3, 4, %rdx);
+
+	vpxor XTS_TW_Y0, %ymm0, %ymm0;
+	vpxor XTS_TW_Y1, %ymm1, %ymm1;
+	vpxor XTS_TW_Y2, %ymm2, %ymm2;
+	vpxor XTS_TW_Y3, %ymm3, %ymm3;
+
+	call __lea_avx2_dec_8way
+
+	vpxor XTS_TW_Y0, %ymm0, %ymm0;
+	vpxor XTS_TW_Y1, %ymm1, %ymm1;
+	vpxor XTS_TW_Y2, %ymm2, %ymm2;
+	vpxor XTS_TW_Y3, %ymm3, %ymm3;
+
+	SPLIT_AND_STORE_BLOCK8(0, 1, 2, 3, %rsi);
+
+	vzeroupper;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_xts_dec_8way)
+
+
+SYM_FUNC_START(lea_avx2_ctr_enc_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (8 blocks)
+	 *	%rdx: src (8 blocks)
+	 *	%rcx: ctr
+	 *  %r8 : buffer (8 blocks)
+	 * changed:
+	 *  CTR_64_high(%r9)
+	 *  CTR_64_low(%rax)
+	 */
+	FRAME_BEGIN
+
+	push CTR_64_high;
+
+	vzeroupper;
+	movbe (%rcx), CTR_64_high;
+	movbe 8(%rcx), CTR_64_low;
+	movbe CTR_64_high, (%r8);
+	movbe CTR_64_low, 8(%r8);
+
+	PROC_NEXT_CTR(%r8, 1, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%r8, 2, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%r8, 3, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%r8, 4, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%r8, 5, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%r8, 6, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%r8, 7, CTR_64_low, CTR_64_high);
+	PROC_NEXT_CTR(%rcx, 0, CTR_64_low, CTR_64_high);
+
+	LOAD_AND_JOIN_BLOCK8(0, 1, 2, 3, 4, %r8);
+	LOAD_AND_JOIN_BLOCK8(5, 6, 7, 8, 4, %rdx);
+
+	call __lea_avx2_enc_8way;
+
+	vpxor %ymm5, %ymm0, %ymm0;
+	vpxor %ymm6, %ymm1, %ymm1;
+	vpxor %ymm7, %ymm2, %ymm2;
+	vpxor %ymm8, %ymm3, %ymm3;
+
+	SPLIT_AND_STORE_BLOCK8(0, 1, 2, 3, %rsi);
+
+	vzeroupper;
+
+	pop CTR_64_high;
+
+	FRAME_END
+	RET;
+SYM_FUNC_END(lea_avx2_ctr_enc_8way)
+
+
+.section	.rodata.cst32.cbc_shuffle_mask, "aM", @progbits, 32
+.align 32
+.Lcbc_shuffle_mask:
+	.octa 0x00000002000000010000000000000007
+	.octa 0x00000006000000050000000400000003
+
+.section	.rodata.cst16.xts_tweak_mask, "aM", @progbits, 16
+.align 16
+.Lxts_tweak_mask:
+	.octa 0x00000000000000010000000000000087
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] crypto: LEA block cipher AVX2 optimization
  2023-04-28 11:00 ` [PATCH 3/3] crypto: LEA block cipher AVX2 optimization Dongsoo Lee
@ 2023-04-28 15:54   ` Dave Hansen
  2023-05-16  4:29     ` Dongsoo Lee
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2023-04-28 15:54 UTC (permalink / raw)
  To: Dongsoo Lee, linux-crypto
  Cc: Herbert Xu, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	David S. Miller, Dongsoo Lee

> +config CRYPTO_LEA_AVX2
> +	tristate "Ciphers: LEA with modes: ECB, CBC, CTR, XTS (SSE2/MOVBE/AVX2)"
> +	select CRYPTO_LEA
> +	imply CRYPTO_XTS
> +	imply CRYPTO_CTR
> +	help
> +	  LEA cipher algorithm (KS X 3246, ISO/IEC 29192-2:2019)
> +
> +	  LEA is one of the standard cryptographic algorithms of
> +	  the Republic of Korea. Its block consists of four 32-bit words.

The "four 32-bit words" thing is probably not a detail end users care
about enough to see in Kconfig text.

> +	  See:
> +	  https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
> +
> +	  Architecture: x86_64 using:
> +	  - SSE2 (Streaming SIMD Extensions 2)
> +	  - MOVBE (Move Data After Swapping Bytes)
> +	  - AVX2 (Advanced Vector Extensions)

What about i386?  If this is truly 64-bit-only for some reason, it's not
reflected anywhere that I can see, like having a:

	depends on X86_64

I'm also a _bit_ confused why this has one config option called "_AVX2"
but that also includes the SSE2 implementation.

> +	  Processes 4(SSE2), 8(AVX2) blocks in parallel.
> +	  In CTR mode, the MOVBE instruction is utilized for improved performance.
> +
>  config CRYPTO_CHACHA20_X86_64
>  	tristate "Ciphers: ChaCha20, XChaCha20, XChaCha12 (SSSE3/AVX2/AVX-512VL)"
>  	depends on X86 && 64BIT
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index 9aa46093c91b..de23293b88df 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -109,6 +109,9 @@ aria-aesni-avx2-x86_64-y := aria-aesni-avx2-asm_64.o aria_aesni_avx2_glue.o
>  obj-$(CONFIG_CRYPTO_ARIA_GFNI_AVX512_X86_64) += aria-gfni-avx512-x86_64.o
>  aria-gfni-avx512-x86_64-y := aria-gfni-avx512-asm_64.o aria_gfni_avx512_glue.o
>  
> +obj-$(CONFIG_CRYPTO_LEA_AVX2) += lea-avx2-x86_64.o
> +lea-avx2-x86_64-y := lea_avx2_x86_64-asm.o lea_avx2_glue.o
> +
>  quiet_cmd_perlasm = PERLASM $@
>        cmd_perlasm = $(PERL) $< > $@
>  $(obj)/%.S: $(src)/%.pl FORCE
> diff --git a/arch/x86/crypto/lea_avx2_glue.c b/arch/x86/crypto/lea_avx2_glue.c
> new file mode 100644
> index 000000000000..532958d3caa5
> --- /dev/null
> +++ b/arch/x86/crypto/lea_avx2_glue.c
> @@ -0,0 +1,1112 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Glue Code for the SSE2/MOVBE/AVX2 assembler instructions for the LEA Cipher
> + *
> + * Copyright (c) 2023 National Security Research.
> + * Author: Dongsoo Lee <letrhee@nsr.re.kr>
> + */
> +
> +#include <asm/simd.h>
> +#include <asm/unaligned.h>
> +#include <crypto/algapi.h>
> +#include <crypto/ctr.h>
> +#include <crypto/internal/simd.h>
> +#include <crypto/scatterwalk.h>
> +#include <crypto/skcipher.h>
> +#include <crypto/internal/skcipher.h>
> +#include <linux/err.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +
> +#include <crypto/lea.h>
> +#include <crypto/xts.h>
> +#include "ecb_cbc_helpers.h"
> +
> +#define SIMD_KEY_ALIGN 16
> +#define SIMD_ALIGN_ATTR __aligned(SIMD_KEY_ALIGN)
> +
> +struct lea_xts_ctx {
> +	u8 raw_crypt_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR;
> +	u8 raw_tweak_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR;
> +};

The typing here is a bit goofy.  What's wrong with:

struct lea_xts_ctx {
	struct crypto_lea_ctx crypt_ctx SIMD_ALIGN_ATTR;
	struct crypto_lea_ctx lea_ctx   SIMD_ALIGN_ATTR;
};

?  You end up with the same sized structure but you don't have to cast
it as much.
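
For example, the setkey path then becomes cast-free -- a rough sketch,
assuming the two fields are named crypt_ctx/tweak_ctx and lea_set_key()
takes a struct crypto_lea_ctx * as in the patch:

	static int xts_lea_set_key(struct crypto_skcipher *tfm, const u8 *key,
				   u32 keylen)
	{
		struct lea_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
		int err;

		err = xts_verify_key(tfm, key, keylen);
		if (err)
			return err;

		/* first half keys the data cipher, second half the tweak */
		err = lea_set_key(&ctx->crypt_ctx, key, keylen / 2);
		if (err)
			return err;

		return lea_set_key(&ctx->tweak_ctx, key + keylen / 2, keylen / 2);
	}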

> +struct _lea_u128 {
> +	u64 v0, v1;
> +};
> +
> +static inline void xor_1blk(u8 *out, const u8 *in1, const u8 *in2)
> +{
> +	const struct _lea_u128 *_in1 = (const struct _lea_u128 *)in1;
> +	const struct _lea_u128 *_in2 = (const struct _lea_u128 *)in2;
> +	struct _lea_u128 *_out = (struct _lea_u128 *)out;
> +
> +	_out->v0 = _in1->v0 ^ _in2->v0;
> +	_out->v1 = _in1->v1 ^ _in2->v1;
> +}
> +
> +static inline void xts_next_tweak(u8 *out, const u8 *in)
> +{
> +	const u64 *_in = (const u64 *)in;
> +	u64 *_out = (u64 *)out;
> +	u64 v0 = _in[0];
> +	u64 v1 = _in[1];
> +	u64 carry = (u64)(((s64)v1) >> 63);
> +
> +	v1 = (v1 << 1) ^ (v0 >> 63);
> +	v0 = (v0 << 1) ^ ((u64)carry & 0x87);
> +
> +	_out[0] = v0;
> +	_out[1] = v1;
> +}

I don't really care either way, but it's interesting that in two
adjacent functions this deals with two adjacent 64-bit values.  In one
it defines a structure with two u64's and in the next it treats it as an
array.
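
Purely for illustration, the tweak update could reuse the same two-u64
struct (same semantics as the array-based version in the patch, just a
consistent view of the 128-bit value):

	static inline void xts_next_tweak(u8 *out, const u8 *in)
	{
		const struct _lea_u128 *_in = (const struct _lea_u128 *)in;
		struct _lea_u128 *_out = (struct _lea_u128 *)out;
		u64 v0 = _in->v0;
		u64 v1 = _in->v1;
		u64 carry = (u64)(((s64)v1) >> 63);

		/* multiply by x in GF(2^128), reduction constant 0x87 */
		_out->v1 = (v1 << 1) ^ (v0 >> 63);
		_out->v0 = (v0 << 1) ^ (carry & 0x87);
	}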

> +static int xts_encrypt_8way(struct skcipher_request *req)
> +{
...

It's kinda a shame that there isn't more code shared here between, for
instance the 4way and 8way functions.  But I guess this crypto code
tends to be merged and then very rarely fixed up after.

> +static int xts_lea_set_key(struct crypto_skcipher *tfm, const u8 *key,
> +				u32 keylen)
> +{
> +	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
> +	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
> +
> +	struct crypto_lea_ctx *crypt_key =
> +		(struct crypto_lea_ctx *)(ctx->raw_crypt_ctx);
> +	struct crypto_lea_ctx *tweak_key =
> +		(struct crypto_lea_ctx *)(ctx->raw_tweak_ctx);

These were those goofy casts that can go away if the typing is a bit
more careful
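
For illustration, with typed members along the lines suggested above (here
named crypt_ctx/tweak_ctx purely as placeholder field names), the setkey
context fetch could reduce to a sketch like the following -- a shape
suggestion, not a drop-in replacement:

	/* inside xts_lea_set_key(), with the typed struct members: */
	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
	struct crypto_lea_ctx *crypt_key = &ctx->crypt_ctx;
	struct crypto_lea_ctx *tweak_key = &ctx->tweak_ctx;
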

...
> +static struct simd_skcipher_alg *lea_simd_algs[ARRAY_SIZE(lea_simd_avx2_algs)];
> +
> +static int __init crypto_lea_avx2_init(void)
> +{
> +	const char *feature_name;
> +
> +	if (!boot_cpu_has(X86_FEATURE_XMM2)) {
> +		pr_info("SSE2 instructions are not detected.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!boot_cpu_has(X86_FEATURE_MOVBE)) {
> +		pr_info("MOVBE instructions are not detected.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_AVX)) {
> +		pr_info("AVX2 instructions are not detected.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
> +				&feature_name)) {
> +		pr_info("CPU feature '%s' is not supported.\n", feature_name);
> +		return -ENODEV;
> +	}

This looks suspect.

It requires that *ALL* of XMM2, MOVBE, AVX, AVX2 and XSAVE be supported for
*ANY* of these to be used.  In other cipher code that I've seen, it
separates out the AVX/YMM acceleration from the pure SSE2/XMM
acceleration functions so that CPUs with only SSE2 can still benefit.

Either this is wrong, or there is something subtle going on that I'm
missing.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] crypto: LEA block cipher implementation
  2023-04-28 11:00 [PATCH 0/3] crypto: LEA block cipher implementation Dongsoo Lee
                   ` (2 preceding siblings ...)
  2023-04-28 11:00 ` [PATCH 3/3] crypto: LEA block cipher AVX2 optimization Dongsoo Lee
@ 2023-04-28 23:19 ` Eric Biggers
  2023-05-16  4:27   ` Dongsoo Lee
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Biggers @ 2023-04-28 23:19 UTC (permalink / raw)
  To: Dongsoo Lee
  Cc: linux-crypto, Herbert Xu, David S. Miller, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	linux-kernel, David S. Miller, Dongsoo Lee

Hi Dongsoo,

On Fri, Apr 28, 2023 at 08:00:55PM +0900, Dongsoo Lee wrote:
> The Korean e-government framework contains various cryptographic
> applications, and KCMVP-validated cryptographic module should be used
> according to the government requirements. The ARIA block cipher, which
> is already included in Linux kernel, has been widely used as a symmetric
> key cipher. However, the adoption of LEA increase rapidly for new
> applications.
> 
> By adding LEA to the Linux kernel, Dedicated device drivers that require
> LEA encryption can be provided without additional crypto implementation.
> An example of an immediately applicable use case is disk encryption
> using cryptsetup.
> 
> The submitted implementation includes a generic C implementation that
> uses 32-bit ARX operations, and an optimized implementation for the
> x86_64 environment.

Can you elaborate further on the use case for this cipher?  Your description
above is very vague.  What is the actual use case when so many other ciphers
already exist, including much better studied ones?  Are people being required to
use this cipher, and if so under what situations?  There is also already another
"national pride" block cipher from Korea (ARIA); do we really need another one?

BTW, in 2018, I investigated LEA and various other ciphers as options for
storage encryption on ARM processors without the crypto extensions.  We ended up
not selecting LEA for several different reasons (e.g. see
https://lore.kernel.org/r/20180507232000.GA194688@google.com), and we later
created Adiantum for the use case.  But, it sounds like "storage encryption on
processors without crypto instructions" isn't the use case you have in mind at
all anyway, seeing as the only assembly code you're providing is for x86_64.
What sort of use case do you actually have in mind?  Is this perhaps a PhD
thesis type of thing that won't actually be used in a real world application?

IIRC, one of the issues with LEA was that the LEA paper doesn't provide test
vectors, so I couldn't be certain that I had actually implemented the algorithm
correctly.  It sounds like there are now test vectors available.  How confident
are you that they actually match the original algorithm?

> The implementation has been tested with kernel module tcrypt.ko and has
> passed the selftest using test vectors for KCMVP[4]. The path also test
> with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS enabled.

There is a KASAN out-of-bounds error in lea_set_key() when running the
self-tests.

- Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] crypto: LEA block cipher implementation
  2023-04-28 11:00 ` [PATCH 1/3] " Dongsoo Lee
@ 2023-04-28 23:29   ` Eric Biggers
  2023-04-29  2:20     ` Letrhee
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Biggers @ 2023-04-28 23:29 UTC (permalink / raw)
  To: Dongsoo Lee
  Cc: linux-crypto, Herbert Xu, David S. Miller, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	linux-kernel, David S. Miller, Dongsoo Lee

Hi Dongsoo,

On Fri, Apr 28, 2023 at 08:00:56PM +0900, Dongsoo Lee wrote:
> The LEA is a Korean national standard block cipher, described in
> "KS X 3246" and is also included in the international standard, "ISO/IEC
> 29192-2:2019 standard (Information security - Lightweight cryptography
> - Part 2: Block ciphers)".
> 
> The LEA algorithm is a symmetric key cipher that processes data blocks
> of 128-bits and has three different key lengths, each with a different
> number of rounds:
> 
> - LEA-128: 128-bit key, 24 rounds,
> - LEA-192: 192-bit key, 28 rounds, and
> - LEA-256: 256-bit key, 32 rounds.
> 
> The round function of LEA consists of 32-bit ARX(modular Addition,
> bitwise Rotation, and bitwise XOR) operations.
> 
> The implementation same as submitted generic C implementation is
> distributed through the Korea Internet & Security Agency (KISA).
> 
> - https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
> - https://seed.kisa.or.kr/kisa/Board/20/detailView.do
> 
> Signed-off-by: Dongsoo Lee <letrhee@nsr.re.kr>
> ---
>  crypto/Kconfig       |  12 +
>  crypto/Makefile      |   1 +
>  crypto/lea_generic.c | 915 +++++++++++++++++++++++++++++++++++++++++++
>  include/crypto/lea.h |  39 ++
>  4 files changed, 967 insertions(+)
>  create mode 100644 crypto/lea_generic.c
>  create mode 100644 include/crypto/lea.h

This implementation is very ugly.  There's no need to unroll all the rounds in
the source code as you're doing.  It also makes it very difficult to check the
implementation against the original paper.
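
For reference, the round function can be written once as a small helper and
called in a loop instead of being spelled out per round.  A rough sketch
(the rotation constants and the six-key-words-per-round layout follow the
LEA references cited in the cover letter; the names are illustrative only):

/* rol32()/ror32() come from <linux/bitops.h> */
static void lea_round(u32 x[4], const u32 *rk)
{
	u32 x0 = x[0];

	x[0] = rol32((x[0] ^ rk[0]) + (x[1] ^ rk[1]), 9);
	x[1] = ror32((x[1] ^ rk[2]) + (x[2] ^ rk[3]), 5);
	x[2] = ror32((x[2] ^ rk[4]) + (x[3] ^ rk[5]), 3);
	x[3] = x0;
}

/*
 * Encryption of one block then becomes:
 *
 *	for (i = 0; i < nrounds; i++)
 *		lea_round(block, &rk[6 * i]);
 */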

I happened to write an LEA implementation several years ago, and IMO it's much
cleaner than this one.  It's less than half the lines of code, despite having a
lot more comments.  I also implemented (and documented) some optimizations, some
of which were recommended in the original LEA paper, IIRC.  Maybe you'd like to
take a look at my implementation for some ideas, or even just use it outright?
You can get it from here:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/commit/?h=old/wip-lea&id=1d1cbba14380f8a1abc76baf939b9e51de047fb6

- Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] crypto: LEA block cipher implementation
  2023-04-28 23:29   ` Eric Biggers
@ 2023-04-29  2:20     ` Letrhee
  0 siblings, 0 replies; 10+ messages in thread
From: Letrhee @ 2023-04-29  2:20 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Dongsoo Lee, linux-crypto, Herbert Xu, David S. Miller,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, David S. Miller

Thank you for taking the time to review the implementation. I
appreciate your feedback and will respond to other reviews soon.

As you mentioned, I agree that using loop unrolling in generic C code
can be unnecessary and make it difficult to verify against the
original paper. Additionally, I acknowledge that the submitted code
may lack sufficient comments.

I'll follow your suggestion and re-implement the code.

Thank you again for your review.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/3] crypto: LEA block cipher implementation
  2023-04-28 23:19 ` [PATCH 0/3] crypto: LEA block cipher implementation Eric Biggers
@ 2023-05-16  4:27   ` Dongsoo Lee
  0 siblings, 0 replies; 10+ messages in thread
From: Dongsoo Lee @ 2023-05-16  4:27 UTC (permalink / raw)
  To: 'Eric Biggers'
  Cc: linux-crypto, 'Herbert Xu', 'David S. Miller',
	'Thomas Gleixner', 'Ingo Molnar',
	'Borislav Petkov', 'Dave Hansen',
	x86, 'H. Peter Anvin',
	linux-kernel, 'Dongsoo Lee'

Thank you for your kind review and sorry for taking a bit of time to
respond.

We expect that the first application of the patch would be disk encryption
on the Gooroom platform ('Gooroom' is a Korean word, meaning 'cloud') [1].
Currently, the Gooroom platform uses AES-XTS for disk encryption. The main
reason for submitting this patch is to make disk encryption with LEA (e.g.
LEA-XTS) available on that platform.

The Gooroom platform is a government-driven, Debian-based Linux distribution
in South Korea. Many Korean crypto companies want to bundle Linux into their
products and sell them, and they build their own variants by modifying the
original Gooroom platform for their services. (Of course, the Gooroom
platform is not mandatory, and companies wishing to use Linux are free to
choose an appropriate distribution.) Many of these companies want to use
LEA, because LEA is one of the block ciphers of the KCMVP, a validation
program for commercial crypto software to be delivered to the Korean
government.

The Linux Crypto API already has another Korean block cipher, ARIA, which is
also one of the block ciphers of the KCMVP. However, LEA is more widely used
than ARIA in industry nowadays, because LEA is part of the ISO/IEC
lightweight cryptography standard [2] and performs well on low-end devices
that support 32-bit operations. So we think the two ciphers complement each
other.
LEA also performs slightly better than ARIA in both the generic C and AVX2
implementations. While there is no AVX512 implementation of LEA yet, the
techniques used in the AVX2 implementation should also apply to AVX512. In
fact, with 512-bit registers and rotation instructions, LEA is expected to
show even better performance in AVX512 than in AVX2.

Performance comparisons of the two ciphers on a Ryzen R9 5950X using the
tcrypt module are shown below. Please note that this CPU supports neither
GFNI nor AVX512, so the ARIA results may understate what the current Linux
kernel can achieve on CPUs with those extensions. The LEA measurements were
taken with the version that we are currently working on.

- 256-bit key, 4096 bytes
  - aes-aesni
    - ecb enc   1,637 cycles
    - ecb dec   1,608 cycles
    - ctr enc   1,649 cycles
  - aria-generic
    - ecb enc 235,293 cycles
    - ecb dec 237,949 cycles
    - ctr enc 240,754 cycles
  - lea-generic
    - ecb enc  31,945 cycles
    - ecb dec  50,511 cycles
    - ctr enc  33,942 cycles
  - aria-avx2
    - ecb enc  9,807 cycles
    - ecb dec 10,203 cycles
    - ctr enc 10,038 cycles
  - lea-avx2
    - ecb enc  5,784 cycles
    - ecb dec  7,423 cycles
    - ctr enc  6,136 cycles

In general, hardware-accelerated AES is obviously the best performer.
However, there are environments where hardware-accelerated AES is not
available, as well as situations where AES is not preferred for various
reasons. In those cases, LEA could be an alternative block cipher.

Apart from this, we have also implemented LEA for lightweight environments
such as 8-bit AVR and 16-bit MSP [3]. Only the AVX2 assembly implementation
was submitted because, as mentioned earlier, the main goal was x86_64. If
LEA were included in the Linux kernel, we could later supplement the
submission with lightweight implementations to provide efficient encryption
on low-performance devices.

Although the designers of LEA did not provide test vectors in their paper
[5], the ISO/IEC standard [2] and the KS standard [4] do. Furthermore, the
Block Cipher LEA Specification ("블록암호 LEA 규격서", written in Korean)
document on the LEA introduction page [6] and the Wikipedia article on LEA
[7] show the same test vectors as in the standards.
The test vectors for the ECB, CBC, CTR, and GCM modes included in the
testmgr module are taken from the KCMVP Cryptographic Algorithm Verification
Criteria V3.0 ("KCMVP 검증대상 암호알고리즘 검증기준 V3.0", written in
Korean) [8]. The test vectors for the XTS mode were generated by us and
cross-checked with both Crypto++ [9] and testmgr on Linux.

[1] https://github.com/gooroom https://www.gooroom.kr/
[2] ISO/IEC 29192-2:2019, Information security - Lightweight cryptography -
Part 2: Block ciphers.
[3] https://github.com/cryptolu/FELICS/tree/master/block_ciphers/source/ciphers/LEA_128_128_v01/source
[4] KS X 3246, 128-bit block cipher LEA.
[5] Hong, Deukjo, et al. "LEA: A 128-bit block cipher for fast encryption
on common processors.", WISA 2013.
[6] https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
[7] https://en.wikipedia.org/wiki/LEA_(cipher)
[8] https://seed.kisa.or.kr/kisa/kcmvp/EgovVerification.do
[9] https://www.cryptopp.com/

+) We applied the optimization technique introduced in your other review to
our decryption code. So, could you please let us know how we should state
that fact clearly?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 3/3] crypto: LEA block cipher AVX2 optimization
  2023-04-28 15:54   ` Dave Hansen
@ 2023-05-16  4:29     ` Dongsoo Lee
  0 siblings, 0 replies; 10+ messages in thread
From: Dongsoo Lee @ 2023-05-16  4:29 UTC (permalink / raw)
  To: 'Dave Hansen', linux-crypto
  Cc: 'Herbert Xu', 'David S. Miller',
	'Thomas Gleixner', 'Ingo Molnar',
	'Borislav Petkov', 'Dave Hansen',
	x86, 'H. Peter Anvin',
	linux-kernel, 'Dongsoo Lee'

Thanks for the review and sorry it took me a while to respond.

>> +config CRYPTO_LEA_AVX2
>> +	tristate "Ciphers: LEA with modes: ECB, CBC, CTR, XTS
>> (SSE2/MOVBE/AVX2)"
>> +	select CRYPTO_LEA
>> +	imply CRYPTO_XTS
>> +	imply CRYPTO_CTR
>> +	help
>> +	  LEA cipher algorithm (KS X 3246, ISO/IEC 29192-2:2019)
>> +
>> +	  LEA is one of the standard cryptographic alorithms of
>> +	  the Republic of Korea. It consists of four 32bit word.
> The "four 32bit word" thing is probably not a detail end users care about enough to see in Kconfig text.

I am in the process of re-implementing the LEA cipher and will try to improve the description.

>> +	  See:
>> +	  https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
>> +
>> +	  Architecture: x86_64 using:
>> +	  - SSE2 (Streaming SIMD Extensions 2)
>> +	  - MOVBE (Move Data After Swapping Bytes)
>> +	  - AVX2 (Advanced Vector Extensions)
>
>What about i386?  If this is truly 64-bit-only for some reason, it's not reflected anywhere that I can see, like having a:
>
>	depends on X86_64
>
>I'm also a _bit_ confused why this has one config option called "_AVX2"
>but that also includes the SSE2 implementation.

As you mentioned, the SIMD optimizations used in this implementation are also applicable on i386. The initial support target was for x86_64 environments where AVX2 instructions are available, so we ended up with an awkward support target, which may need to be changed.
Internally, there is an implementation for i386, and I'll include it in the submission.

The LEA 4-way SIMD implementation can be done purely with SSE2 instructions, but it can also be done a bit faster using `vpunpckldq` (AVX), `vpbroadcast` (AVX2), `vpxor` (AVX), etc. In the submitted implementation, 4-way encryption is therefore not possible without AVX2. If an SSE2 implementation were added in the future, it would be possible to include both a 4-way implementation with SSE2 and a 4-way implementation with AVX2.

>> +struct lea_xts_ctx {
>> +	u8 raw_crypt_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR;
>> +	u8 raw_tweak_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR; };
>
>The typing here is a bit goofy.  What's wrong with:
>
>struct lea_xts_ctx {
>	struct crypto_lea_ctx crypt_ctx SIMD_ALIGN_ATTR;
>	struct crypto_lea_ctx lea_ctx   SIMD_ALIGN_ATTR;
>};
>
>?  You end up with the same sized structure but you don't have to cast it as much.

This is a mistake I made by just bringing in other code without much thought. I'll fix it.

>> +struct _lea_u128 {
>> +	u64 v0, v1;
>> +};
>> +
>> +static inline void xor_1blk(u8 *out, const u8 *in1, const u8 *in2) {
>> +	const struct _lea_u128 *_in1 = (const struct _lea_u128 *)in1;
>> +	const struct _lea_u128 *_in2 = (const struct _lea_u128 *)in2;
>> +	struct _lea_u128 *_out = (struct _lea_u128 *)out;
>> +
>> +	_out->v0 = _in1->v0 ^ _in2->v0;
>> +	_out->v1 = _in1->v1 ^ _in2->v1;
>> +}
>> +
>> +static inline void xts_next_tweak(u8 *out, const u8 *in) {
>> +	const u64 *_in = (const u64 *)in;
>> +	u64 *_out = (u64 *)out;
>> +	u64 v0 = _in[0];
>> +	u64 v1 = _in[1];
>> +	u64 carry = (u64)(((s64)v1) >> 63);
>> +
>> +	v1 = (v1 << 1) ^ (v0 >> 63);
>> +	v0 = (v0 << 1) ^ ((u64)carry & 0x87);
>> +
>> +	_out[0] = v0;
>> +	_out[1] = v1;
>> +}
>
>I don't really care either way, but it's interesting that in two adjacent functions this deals with two adjacent 64-bit values.  In one it defines a structure with two u64's and in the next it treats it as an array.

I'll unify them to use one style.
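
For example, xts_next_tweak() could reuse the same struct _lea_u128 view as xor_1blk(). A sketch of that variant, with unchanged behavior (0x87 is the usual GF(2^128) reduction constant used by XTS):

static inline void xts_next_tweak(u8 *out, const u8 *in)
{
	const struct _lea_u128 *_in = (const struct _lea_u128 *)in;
	struct _lea_u128 *_out = (struct _lea_u128 *)out;
	u64 carry = (u64)(((s64)_in->v1) >> 63);

	/* multiply the tweak by x in GF(2^128), folding the carry with 0x87 */
	_out->v1 = (_in->v1 << 1) ^ (_in->v0 >> 63);
	_out->v0 = (_in->v0 << 1) ^ (carry & 0x87);
}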


>> +static int xts_encrypt_8way(struct skcipher_request *req) {
>...
>
>It's kinda a shame that there isn't more code shared here between, for instance the 4way and 8way functions.  But I guess this crypto code tends to be merged and then very rarely fixed up after.

This is an implementation mistake: as I mentioned, the code I submitted requires AVX2 even for the 4-way implementation, so this is unnecessary duplication. I will add a proper SSE2 4-way implementation and share code with the 8-way path.


>> +static int xts_lea_set_key(struct crypto_skcipher *tfm, const u8 *key,
>> +				u32 keylen)
>> +{
>> +	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
>> +	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
>> +
>> +	struct crypto_lea_ctx *crypt_key =
>> +		(struct crypto_lea_ctx *)(ctx->raw_crypt_ctx);
>> +	struct crypto_lea_ctx *tweak_key =
>> +		(struct crypto_lea_ctx *)(ctx->raw_tweak_ctx);
>
>These were those goofy casts that can go away if the typing is a bit more careful

I'll fix it by redefining `struct lea_xts_ctx`.


>> +static struct simd_skcipher_alg
>> *lea_simd_algs[ARRAY_SIZE(lea_simd_avx2_algs)];
>> +
>> +static int __init crypto_lea_avx2_init(void) {
>> +	const char *feature_name;
>> +
>> +	if (!boot_cpu_has(X86_FEATURE_XMM2)) {
>> +		pr_info("SSE2 instructions are not detected.\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	if (!boot_cpu_has(X86_FEATURE_MOVBE)) {
>> +		pr_info("MOVBE instructions are not detected.\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	if (!boot_cpu_has(X86_FEATURE_AVX2) || 
>> +!boot_cpu_has(X86_FEATURE_AVX))
>> {
>> +		pr_info("AVX2 instructions are not detected.\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
>> +				&feature_name)) {
>> +		pr_info("CPU feature '%s' is not supported.\n", feature_name);
>> +		return -ENODEV;
>> +	}
>
>This looks suspect.
>
>It requires that *ALL* of XMM2, MOVBE, AVX, AVX2 and XSAVE be supported for
>*ANY* of these to be used.  In other cipher code that I've seen, it separates out the AVX/YMM acceleration from the pure SSE2/XMM acceleration functions so that CPUs with only SSE2 can still benefit.
>
>Either this is wrong, or there is something subtle going on that I'm missing.

This is a mistake, as the initial support was implemented for x86_64 environments with AVX2 instructions.

Since we already have separate implementations for i386 and x86_64, and independent SSE2, SSE2+MOVBE, and AVX2 code paths, I will change the next version to support these environments separately.
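
As a rough outline of the separated checks (register_lea_sse2_algs(), register_lea_avx2_algs(), and unregister_lea_sse2_algs() below are placeholder helpers, not functions from the current patch):

static int __init crypto_lea_x86_init(void)
{
	const char *feature_name;
	int ret;

	if (!boot_cpu_has(X86_FEATURE_XMM2)) {
		pr_info("SSE2 instructions are not detected.\n");
		return -ENODEV;
	}

	/* The SSE2-only 4-way code is usable on any x86_64 CPU from here. */
	ret = register_lea_sse2_algs();
	if (ret)
		return ret;

	/*
	 * The 8-way code additionally needs AVX/AVX2 and the YMM xstate;
	 * MOVBE would only gate the CTR fast path.
	 */
	if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2) &&
	    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
			      &feature_name)) {
		ret = register_lea_avx2_algs();
		if (ret)
			unregister_lea_sse2_algs();
	}

	return ret;
}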


Thank you.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-05-16  4:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-28 11:00 [PATCH 0/3] crypto: LEA block cipher implementation Dongsoo Lee
2023-04-28 11:00 ` [PATCH 1/3] " Dongsoo Lee
2023-04-28 23:29   ` Eric Biggers
2023-04-29  2:20     ` Letrhee
2023-04-28 11:00 ` [PATCH 2/3] crypto: add LEA testmgr tests Dongsoo Lee
2023-04-28 11:00 ` [PATCH 3/3] crypto: LEA block cipher AVX2 optimization Dongsoo Lee
2023-04-28 15:54   ` Dave Hansen
2023-05-16  4:29     ` Dongsoo Lee
2023-04-28 23:19 ` [PATCH 0/3] crypto: LEA block cipher implementation Eric Biggers
2023-05-16  4:27   ` Dongsoo Lee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).