linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64
@ 2019-06-24  7:38 Ard Biesheuvel
  2019-06-24  7:38 ` [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path Ard Biesheuvel
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

Now that aegis128 has been announced as one of the winners of the CAESAR
competition, it's time to provide some better support for it on arm64 (and
32-bit ARM*).

This time, instead of cloning the generic driver twice and rewriting half
of it in arm64 and ARM assembly, add hooks for an accelerated SIMD path to
the generic driver, and populate it with a C version using NEON intrinsics
that can be built for both ARM and arm64. This yields a speedup of ~11x over
the generic code, i.e., a performance of 2.2 cycles per byte on Cortex-A53.

Patches #1 .. #3 are some fixes/improvements for the generic code. Patch #4
adds the plumbing for using a SIMD accelerated implementation. Patch #5
adds the ARM and arm64 code, and patch #6 adds a speed test.

Note that neither aegis128l nor aegis256 was selected, nor were any of the
morus contestants, so we should probably consider dropping those drivers
again.

* 32-bit ARM cores today rarely implement the special AES instructions that
  the implementation in this series relies on, but this may change in the
  future, and the NEON intrinsics code can be compiled for both ISAs.

Cc: Eric Biggers <ebiggers@google.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steve Capper <steve.capper@arm.com>

Ard Biesheuvel (6):
  crypto: aegis128 - use unaligned helper in unaligned decrypt path
  crypto: aegis - drop empty TFM init/exit routines
  crypto: aegis - avoid prerotated AES tables
  crypto: aegis128 - add support for SIMD acceleration
  crypto: aegis128 - provide a SIMD implementation based on NEON
    intrinsics
  crypto: tcrypt - add a speed test for AEGIS128

 crypto/Kconfig               |   5 +
 crypto/Makefile              |  12 ++
 crypto/aegis.h               |  28 ++--
 crypto/aegis128-neon-inner.c | 142 ++++++++++++++++++++
 crypto/aegis128-neon.c       |  43 ++++++
 crypto/aegis128.c            |  55 +++++---
 crypto/aegis128l.c           |  11 --
 crypto/aegis256.c            |  11 --
 crypto/tcrypt.c              |   7 +
 9 files changed, 261 insertions(+), 53 deletions(-)
 create mode 100644 crypto/aegis128-neon-inner.c
 create mode 100644 crypto/aegis128-neon.c

-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
@ 2019-06-24  7:38 ` Ard Biesheuvel
  2019-06-24  7:59   ` Ondrej Mosnacek
  2019-06-24  7:38 ` [PATCH 2/6] crypto: aegis - drop empty TFM init/exit routines Ard Biesheuvel
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

Use crypto_aegis128_update_u() not crypto_aegis128_update_a() in the
decrypt path that is taken when the source or destination pointers
are not aligned.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/aegis128.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/aegis128.c b/crypto/aegis128.c
index d78f77fc5dd1..125e11246990 100644
--- a/crypto/aegis128.c
+++ b/crypto/aegis128.c
@@ -208,7 +208,7 @@ static void crypto_aegis128_decrypt_chunk(struct aegis_state *state, u8 *dst,
 			crypto_aegis_block_xor(&tmp, &state->blocks[1]);
 			crypto_xor(tmp.bytes, src, AEGIS_BLOCK_SIZE);
 
-			crypto_aegis128_update_a(state, &tmp);
+			crypto_aegis128_update_u(state, &tmp);
 
 			memcpy(dst, tmp.bytes, AEGIS_BLOCK_SIZE);
 
-- 
2.20.1



* [PATCH 2/6] crypto: aegis - drop empty TFM init/exit routines
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
  2019-06-24  7:38 ` [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path Ard Biesheuvel
@ 2019-06-24  7:38 ` Ard Biesheuvel
  2019-06-24  8:03   ` Ondrej Mosnacek
  2019-06-24  7:38 ` [PATCH 3/6] crypto: aegis - avoid prerotated AES tables Ard Biesheuvel
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

TFM init/exit routines are optional, so no need to provide empty ones.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/aegis128.c  | 11 -----------
 crypto/aegis128l.c | 11 -----------
 crypto/aegis256.c  | 11 -----------
 3 files changed, 33 deletions(-)

diff --git a/crypto/aegis128.c b/crypto/aegis128.c
index 125e11246990..4f8f1cdef129 100644
--- a/crypto/aegis128.c
+++ b/crypto/aegis128.c
@@ -403,22 +403,11 @@ static int crypto_aegis128_decrypt(struct aead_request *req)
 	return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
 }
 
-static int crypto_aegis128_init_tfm(struct crypto_aead *tfm)
-{
-	return 0;
-}
-
-static void crypto_aegis128_exit_tfm(struct crypto_aead *tfm)
-{
-}
-
 static struct aead_alg crypto_aegis128_alg = {
 	.setkey = crypto_aegis128_setkey,
 	.setauthsize = crypto_aegis128_setauthsize,
 	.encrypt = crypto_aegis128_encrypt,
 	.decrypt = crypto_aegis128_decrypt,
-	.init = crypto_aegis128_init_tfm,
-	.exit = crypto_aegis128_exit_tfm,
 
 	.ivsize = AEGIS128_NONCE_SIZE,
 	.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
diff --git a/crypto/aegis128l.c b/crypto/aegis128l.c
index 9bca3d619a22..ef5bc2297a2c 100644
--- a/crypto/aegis128l.c
+++ b/crypto/aegis128l.c
@@ -467,22 +467,11 @@ static int crypto_aegis128l_decrypt(struct aead_request *req)
 	return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
 }
 
-static int crypto_aegis128l_init_tfm(struct crypto_aead *tfm)
-{
-	return 0;
-}
-
-static void crypto_aegis128l_exit_tfm(struct crypto_aead *tfm)
-{
-}
-
 static struct aead_alg crypto_aegis128l_alg = {
 	.setkey = crypto_aegis128l_setkey,
 	.setauthsize = crypto_aegis128l_setauthsize,
 	.encrypt = crypto_aegis128l_encrypt,
 	.decrypt = crypto_aegis128l_decrypt,
-	.init = crypto_aegis128l_init_tfm,
-	.exit = crypto_aegis128l_exit_tfm,
 
 	.ivsize = AEGIS128L_NONCE_SIZE,
 	.maxauthsize = AEGIS128L_MAX_AUTH_SIZE,
diff --git a/crypto/aegis256.c b/crypto/aegis256.c
index b47fd39595ad..b824ef4d1248 100644
--- a/crypto/aegis256.c
+++ b/crypto/aegis256.c
@@ -418,22 +418,11 @@ static int crypto_aegis256_decrypt(struct aead_request *req)
 	return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
 }
 
-static int crypto_aegis256_init_tfm(struct crypto_aead *tfm)
-{
-	return 0;
-}
-
-static void crypto_aegis256_exit_tfm(struct crypto_aead *tfm)
-{
-}
-
 static struct aead_alg crypto_aegis256_alg = {
 	.setkey = crypto_aegis256_setkey,
 	.setauthsize = crypto_aegis256_setauthsize,
 	.encrypt = crypto_aegis256_encrypt,
 	.decrypt = crypto_aegis256_decrypt,
-	.init = crypto_aegis256_init_tfm,
-	.exit = crypto_aegis256_exit_tfm,
 
 	.ivsize = AEGIS256_NONCE_SIZE,
 	.maxauthsize = AEGIS256_MAX_AUTH_SIZE,
-- 
2.20.1



* [PATCH 3/6] crypto: aegis - avoid prerotated AES tables
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
  2019-06-24  7:38 ` [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path Ard Biesheuvel
  2019-06-24  7:38 ` [PATCH 2/6] crypto: aegis - drop empty TFM init/exit routines Ard Biesheuvel
@ 2019-06-24  7:38 ` Ard Biesheuvel
  2019-06-24  8:13   ` Ondrej Mosnacek
  2019-06-24  7:38 ` [PATCH 4/6] crypto: aegis128 - add support for SIMD acceleration Ard Biesheuvel
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

The generic AES code provides four sets of lookup tables, where each
set consists of four tables containing the same 32-bit values, but
rotated by 0, 8, 16 and 24 bits, respectively. This makes sense for
CISC architectures such as x86 which support memory operands, but
for other architectures, the rotates are quite cheap, and using all
four tables needlessly thrashes the D-cache, and actually hurts rather
than helps performance.

Since x86 already has its own implementation of AEGIS based on AES-NI
instructions, let's tune the generic implementation towards the other
architectures instead: avoid the prerotated tables and perform the
rotations inline. On ARM Cortex-A53, this results in a ~8% speedup.
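
The table identity this relies on can be checked in isolation: each
prerotated forward table is the previous one rotated left by 8 bits, so
t1[x] == rol32(t0[x], 8), and so on for t2/t3. Below is a minimal
userspace sketch (not kernel code) that rebuilds two ft_tab-style tables
independently from the AES column definitions and exhibits the rotation
relation; the entry layout is chosen to match the conventional value
crypto_ft_tab[0][0] == 0xa56363c6:

```c
#include <stdint.h>

/* GF(2^8) multiplication with the AES reduction polynomial 0x11b */
static uint8_t gmul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		a = (a << 1) ^ ((a & 0x80) ? 0x1b : 0);
		b >>= 1;
	}
	return r;
}

/* AES S-box: multiplicative inverse (brute force, fine for a demo)
 * followed by the affine transform */
static uint8_t sbox(uint8_t x)
{
	uint8_t b = 0, r;
	int i;

	if (x)
		for (b = 1; gmul(x, b) != 1; b++)
			;
	r = b;
	for (i = 0; i < 4; i++) {
		b = (uint8_t)((b << 1) | (b >> 7));
		r ^= b;
	}
	return r ^ 0x63;
}

static uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* ft_tab[0]-style entry: column (02.S, 01.S, 01.S, 03.S), low byte first */
static uint32_t ft0(uint8_t x)
{
	uint8_t s = sbox(x);

	return (uint32_t)gmul(3, s) << 24 | (uint32_t)s << 16 |
	       (uint32_t)s << 8 | gmul(2, s);
}

/* ft_tab[1]-style entry, built independently from its own column order */
static uint32_t ft1(uint8_t x)
{
	uint8_t s = sbox(x);

	return (uint32_t)s << 24 | (uint32_t)s << 16 |
	       (uint32_t)gmul(2, s) << 8 | gmul(3, s);
}
```

Every ft1(x) equals rol32(ft0(x), 8), which is why the single table plus
inline rotations computes the same round values while touching a quarter
of the D-cache footprint.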

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/aegis.h | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/crypto/aegis.h b/crypto/aegis.h
index 41a3090cda8e..3308066ddde0 100644
--- a/crypto/aegis.h
+++ b/crypto/aegis.h
@@ -10,6 +10,7 @@
 #define _CRYPTO_AEGIS_H
 
 #include <crypto/aes.h>
+#include <linux/bitops.h>
 #include <linux/types.h>
 
 #define AEGIS_BLOCK_SIZE 16
@@ -53,16 +54,13 @@ static void crypto_aegis_aesenc(union aegis_block *dst,
 				const union aegis_block *key)
 {
 	const u8  *s  = src->bytes;
-	const u32 *t0 = crypto_ft_tab[0];
-	const u32 *t1 = crypto_ft_tab[1];
-	const u32 *t2 = crypto_ft_tab[2];
-	const u32 *t3 = crypto_ft_tab[3];
+	const u32 *t = crypto_ft_tab[0];
 	u32 d0, d1, d2, d3;
 
-	d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]];
-	d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]];
-	d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]];
-	d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]];
+	d0 = t[s[ 0]] ^ rol32(t[s[ 5]], 8) ^ rol32(t[s[10]], 16) ^ rol32(t[s[15]], 24);
+	d1 = t[s[ 4]] ^ rol32(t[s[ 9]], 8) ^ rol32(t[s[14]], 16) ^ rol32(t[s[ 3]], 24);
+	d2 = t[s[ 8]] ^ rol32(t[s[13]], 8) ^ rol32(t[s[ 2]], 16) ^ rol32(t[s[ 7]], 24);
+	d3 = t[s[12]] ^ rol32(t[s[ 1]], 8) ^ rol32(t[s[ 6]], 16) ^ rol32(t[s[11]], 24);
 
 	dst->words32[0] = cpu_to_le32(d0) ^ key->words32[0];
 	dst->words32[1] = cpu_to_le32(d1) ^ key->words32[1];
-- 
2.20.1



* [PATCH 4/6] crypto: aegis128 - add support for SIMD acceleration
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2019-06-24  7:38 ` [PATCH 3/6] crypto: aegis - avoid prerotated AES tables Ard Biesheuvel
@ 2019-06-24  7:38 ` Ard Biesheuvel
  2019-06-24  7:38 ` [PATCH 5/6] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Ard Biesheuvel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

Add some plumbing to allow the AEGIS128 code to be built with SIMD
routines for acceleration.
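
The dispatch pattern the plumbing introduces can be sketched in plain C:
a CPU-feature flag resolved once at init time, plus a per-call context
check. The names below are illustrative stand-ins for the kernel's
have_simd / crypto_simd_usable() and the SIMD hooks, not the kernel code
itself:

```c
#include <stdbool.h>

/* Stand-ins for the kernel-side state (illustrative only) */
static bool have_simd;   /* probed once, e.g. at module init time */
static bool simd_usable; /* models crypto_simd_usable(): may be false in
			  * contexts where FP/SIMD registers are off-limits */

static int calls_generic, calls_simd;

static void update_generic(void) { calls_generic++; }
static void update_simd(void)    { calls_simd++; }

/* The shape of the hook added to crypto_aegis128_update_a/_u(): take the
 * SIMD path only when the CPU has the instructions AND the current
 * context permits kernel-mode SIMD; otherwise fall back to scalar code. */
static void update(void)
{
	if (have_simd && simd_usable) {
		update_simd();
		return;
	}
	update_generic();
}
```

Because have_simd is fixed after init, the per-call cost on CPUs without
the feature is a single predictable branch.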

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/aegis.h    | 14 +++----
 crypto/aegis128.c | 42 ++++++++++++++++++--
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/crypto/aegis.h b/crypto/aegis.h
index 3308066ddde0..6cb65a497ba2 100644
--- a/crypto/aegis.h
+++ b/crypto/aegis.h
@@ -35,23 +35,23 @@ static const union aegis_block crypto_aegis_const[2] = {
 	} },
 };
 
-static void crypto_aegis_block_xor(union aegis_block *dst,
-				   const union aegis_block *src)
+static inline void crypto_aegis_block_xor(union aegis_block *dst,
+					  const union aegis_block *src)
 {
 	dst->words64[0] ^= src->words64[0];
 	dst->words64[1] ^= src->words64[1];
 }
 
-static void crypto_aegis_block_and(union aegis_block *dst,
-				   const union aegis_block *src)
+static inline void crypto_aegis_block_and(union aegis_block *dst,
+					  const union aegis_block *src)
 {
 	dst->words64[0] &= src->words64[0];
 	dst->words64[1] &= src->words64[1];
 }
 
-static void crypto_aegis_aesenc(union aegis_block *dst,
-				const union aegis_block *src,
-				const union aegis_block *key)
+static inline void crypto_aegis_aesenc(union aegis_block *dst,
+				       const union aegis_block *src,
+				       const union aegis_block *key)
 {
 	const u8  *s  = src->bytes;
 	const u32 *t = crypto_ft_tab[0];
diff --git a/crypto/aegis128.c b/crypto/aegis128.c
index 4f8f1cdef129..1bbd3e49c890 100644
--- a/crypto/aegis128.c
+++ b/crypto/aegis128.c
@@ -8,6 +8,7 @@
 
 #include <crypto/algapi.h>
 #include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/scatterwalk.h>
 #include <linux/err.h>
@@ -15,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/scatterlist.h>
+#include <asm/simd.h>
 
 #include "aegis.h"
 
@@ -40,6 +42,15 @@ struct aegis128_ops {
 			    const u8 *src, unsigned int size);
 };
 
+static bool have_simd;
+
+bool crypto_aegis128_have_simd(void);
+void crypto_aegis128_update_simd(struct aegis_state *state, const void *msg);
+void crypto_aegis128_encrypt_chunk_simd(struct aegis_state *state, u8 *dst,
+					const u8 *src, unsigned int size);
+void crypto_aegis128_decrypt_chunk_simd(struct aegis_state *state, u8 *dst,
+					const u8 *src, unsigned int size);
+
 static void crypto_aegis128_update(struct aegis_state *state)
 {
 	union aegis_block tmp;
@@ -55,12 +66,22 @@ static void crypto_aegis128_update(struct aegis_state *state)
 static void crypto_aegis128_update_a(struct aegis_state *state,
 				     const union aegis_block *msg)
 {
+	if (have_simd && crypto_simd_usable()) {
+		crypto_aegis128_update_simd(state, msg);
+		return;
+	}
+
 	crypto_aegis128_update(state);
 	crypto_aegis_block_xor(&state->blocks[0], msg);
 }
 
 static void crypto_aegis128_update_u(struct aegis_state *state, const void *msg)
 {
+	if (have_simd && crypto_simd_usable()) {
+		crypto_aegis128_update_simd(state, msg);
+		return;
+	}
+
 	crypto_aegis128_update(state);
 	crypto_xor(state->blocks[0].bytes, msg, AEGIS_BLOCK_SIZE);
 }
@@ -365,7 +386,7 @@ static void crypto_aegis128_crypt(struct aead_request *req,
 
 static int crypto_aegis128_encrypt(struct aead_request *req)
 {
-	static const struct aegis128_ops ops = {
+	const struct aegis128_ops *ops = &(struct aegis128_ops){
 		.skcipher_walk_init = skcipher_walk_aead_encrypt,
 		.crypt_chunk = crypto_aegis128_encrypt_chunk,
 	};
@@ -375,7 +396,12 @@ static int crypto_aegis128_encrypt(struct aead_request *req)
 	unsigned int authsize = crypto_aead_authsize(tfm);
 	unsigned int cryptlen = req->cryptlen;
 
-	crypto_aegis128_crypt(req, &tag, cryptlen, &ops);
+	if (have_simd && crypto_simd_usable())
+		ops = &(struct aegis128_ops){
+			.skcipher_walk_init = skcipher_walk_aead_encrypt,
+			.crypt_chunk = crypto_aegis128_encrypt_chunk_simd };
+
+	crypto_aegis128_crypt(req, &tag, cryptlen, ops);
 
 	scatterwalk_map_and_copy(tag.bytes, req->dst, req->assoclen + cryptlen,
 				 authsize, 1);
@@ -384,7 +410,7 @@ static int crypto_aegis128_encrypt(struct aead_request *req)
 
 static int crypto_aegis128_decrypt(struct aead_request *req)
 {
-	static const struct aegis128_ops ops = {
+	const struct aegis128_ops *ops = &(struct aegis128_ops){
 		.skcipher_walk_init = skcipher_walk_aead_decrypt,
 		.crypt_chunk = crypto_aegis128_decrypt_chunk,
 	};
@@ -398,7 +424,12 @@ static int crypto_aegis128_decrypt(struct aead_request *req)
 	scatterwalk_map_and_copy(tag.bytes, req->src, req->assoclen + cryptlen,
 				 authsize, 0);
 
-	crypto_aegis128_crypt(req, &tag, cryptlen, &ops);
+	if (have_simd && crypto_simd_usable())
+		ops = &(struct aegis128_ops){
+			.skcipher_walk_init = skcipher_walk_aead_decrypt,
+			.crypt_chunk = crypto_aegis128_decrypt_chunk_simd };
+
+	crypto_aegis128_crypt(req, &tag, cryptlen, ops);
 
 	return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
 }
@@ -429,6 +460,9 @@ static struct aead_alg crypto_aegis128_alg = {
 
 static int __init crypto_aegis128_module_init(void)
 {
+	if (IS_ENABLED(CONFIG_CRYPTO_AEGIS128_SIMD))
+		have_simd = crypto_aegis128_have_simd();
+
 	return crypto_register_aead(&crypto_aegis128_alg);
 }
 
-- 
2.20.1



* [PATCH 5/6] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2019-06-24  7:38 ` [PATCH 4/6] crypto: aegis128 - add support for SIMD acceleration Ard Biesheuvel
@ 2019-06-24  7:38 ` Ard Biesheuvel
  2019-06-24 14:37   ` Ard Biesheuvel
  2019-06-24  7:38 ` [PATCH 6/6] crypto: tcrypt - add a speed test for AEGIS128 Ard Biesheuvel
  2019-06-24 16:56 ` [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Eric Biggers
  6 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

Provide an accelerated implementation of aegis128 by wiring up the
SIMD hooks in the generic driver to an implementation based on NEON
intrinsics, which can be compiled to both ARM and arm64 code.

This achieves a performance of 2.2 cycles per byte on Cortex-A53, an
improvement of ~11x over the generic code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/Kconfig               |   5 +
 crypto/Makefile              |  12 ++
 crypto/aegis128-neon-inner.c | 142 ++++++++++++++++++++
 crypto/aegis128-neon.c       |  43 ++++++
 4 files changed, 202 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 3d056e7da65f..c4b96f2e1344 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -311,6 +311,11 @@ config CRYPTO_AEGIS128
 	help
 	 Support for the AEGIS-128 dedicated AEAD algorithm.
 
+config CRYPTO_AEGIS128_SIMD
+	bool "Support SIMD acceleration for AEGIS-128"
+	depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON)
+	default y
+
 config CRYPTO_AEGIS128L
 	tristate "AEGIS-128L AEAD algorithm"
 	select CRYPTO_AEAD
diff --git a/crypto/Makefile b/crypto/Makefile
index 266a4cdbb9e2..f4a55cfb7f17 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -92,6 +92,18 @@ obj-$(CONFIG_CRYPTO_GCM) += gcm.o
 obj-$(CONFIG_CRYPTO_CCM) += ccm.o
 obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
 obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
+aegis128-y := aegis128.o
+
+ifeq ($(ARCH),arm)
+CFLAGS_aegis128-neon-inner.o += -ffreestanding -march=armv7-a -mfloat-abi=softfp -mfpu=crypto-neon-fp-armv8
+aegis128-$(CONFIG_CRYPTO_AEGIS128_SIMD) += aegis128-neon.o aegis128-neon-inner.o
+endif
+ifeq ($(ARCH),arm64)
+CFLAGS_aegis128-neon-inner.o += -ffreestanding -mcpu=generic+crypto
+CFLAGS_REMOVE_aegis128-neon-inner.o += -mgeneral-regs-only
+aegis128-$(CONFIG_CRYPTO_AEGIS128_SIMD) += aegis128-neon.o aegis128-neon-inner.o
+endif
+
 obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
 obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
 obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
diff --git a/crypto/aegis128-neon-inner.c b/crypto/aegis128-neon-inner.c
new file mode 100644
index 000000000000..c6d90390ac38
--- /dev/null
+++ b/crypto/aegis128-neon-inner.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2019 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ */
+
+#ifdef CONFIG_ARM64
+#include <asm/neon-intrinsics.h>
+#else
+#include <arm_neon.h>
+#endif
+
+#define AEGIS_BLOCK_SIZE	16
+
+#include <stddef.h>
+
+void *memcpy(void *dest, const void *src, size_t n);
+void *memset(void *s, int c, size_t n);
+
+struct aegis128_state {
+	uint8x16_t v[5];
+};
+
+static struct aegis128_state aegis128_update_neon(struct aegis128_state st,
+						  uint8x16_t m)
+{
+	uint8x16_t z = {};
+	uint8x16_t t;
+
+	t        = vaesmcq_u8(vaeseq_u8(st.v[3], z));
+	st.v[3] ^= vaesmcq_u8(vaeseq_u8(st.v[2], z));
+	st.v[2] ^= vaesmcq_u8(vaeseq_u8(st.v[1], z));
+	st.v[1] ^= vaesmcq_u8(vaeseq_u8(st.v[0], z));
+	st.v[0] ^= vaesmcq_u8(vaeseq_u8(st.v[4], z)) ^ m;
+	st.v[4] ^= t;
+
+	return st;
+}
+
+void crypto_aegis128_update_neon(void *state, const void *msg)
+{
+	struct aegis128_state st = { {
+		vld1q_u8(state),
+		vld1q_u8(state + 16),
+		vld1q_u8(state + 32),
+		vld1q_u8(state + 48),
+		vld1q_u8(state + 64)
+	} };
+
+	st = aegis128_update_neon(st, vld1q_u8(msg));
+
+	vst1q_u8(state, st.v[0]);
+	vst1q_u8(state + 16, st.v[1]);
+	vst1q_u8(state + 32, st.v[2]);
+	vst1q_u8(state + 48, st.v[3]);
+	vst1q_u8(state + 64, st.v[4]);
+}
+
+void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
+					unsigned int size)
+{
+	struct aegis128_state st = { {
+		vld1q_u8(state),
+		vld1q_u8(state + 16),
+		vld1q_u8(state + 32),
+		vld1q_u8(state + 48),
+		vld1q_u8(state + 64)
+	} };
+	uint8x16_t tmp;
+
+	while (size >= AEGIS_BLOCK_SIZE) {
+		uint8x16_t s = vld1q_u8(src);
+
+		tmp = s ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
+		st = aegis128_update_neon(st, s);
+		vst1q_u8(dst, tmp);
+
+		size -= AEGIS_BLOCK_SIZE;
+		src += AEGIS_BLOCK_SIZE;
+		dst += AEGIS_BLOCK_SIZE;
+	}
+
+	if (size > 0) {
+		uint8_t buf[AEGIS_BLOCK_SIZE] = {};
+		uint8x16_t msg;
+
+		memcpy(buf, src, size);
+		msg = vld1q_u8(buf);
+		tmp = msg ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
+		st = aegis128_update_neon(st, msg);
+		vst1q_u8(buf, tmp);
+		memcpy(dst, buf, size);
+	}
+
+	vst1q_u8(state, st.v[0]);
+	vst1q_u8(state + 16, st.v[1]);
+	vst1q_u8(state + 32, st.v[2]);
+	vst1q_u8(state + 48, st.v[3]);
+	vst1q_u8(state + 64, st.v[4]);
+}
+
+void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src,
+					unsigned int size)
+{
+	struct aegis128_state st = { {
+		vld1q_u8(state),
+		vld1q_u8(state + 16),
+		vld1q_u8(state + 32),
+		vld1q_u8(state + 48),
+		vld1q_u8(state + 64)
+	} };
+	uint8x16_t tmp;
+
+	while (size >= AEGIS_BLOCK_SIZE) {
+		tmp = vld1q_u8(src) ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
+		st = aegis128_update_neon(st, tmp);
+		vst1q_u8(dst, tmp);
+
+		size -= AEGIS_BLOCK_SIZE;
+		src += AEGIS_BLOCK_SIZE;
+		dst += AEGIS_BLOCK_SIZE;
+	}
+
+	if (size > 0) {
+		uint8_t buf[AEGIS_BLOCK_SIZE] = {};
+		uint8x16_t msg;
+
+		memcpy(buf, src, size);
+		msg = vld1q_u8(buf) ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
+		vst1q_u8(buf, msg);
+		memcpy(dst, buf, size);
+
+		memset(buf + size, 0, AEGIS_BLOCK_SIZE - size);
+		msg = vld1q_u8(buf);
+		st = aegis128_update_neon(st, msg);
+	}
+
+	vst1q_u8(state, st.v[0]);
+	vst1q_u8(state + 16, st.v[1]);
+	vst1q_u8(state + 32, st.v[2]);
+	vst1q_u8(state + 48, st.v[3]);
+	vst1q_u8(state + 64, st.v[4]);
+}
diff --git a/crypto/aegis128-neon.c b/crypto/aegis128-neon.c
new file mode 100644
index 000000000000..c1c0a1686f67
--- /dev/null
+++ b/crypto/aegis128-neon.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2019 Linaro Ltd <ard.biesheuvel@linaro.org>
+ */
+
+#include <asm/cpufeature.h>
+#include <asm/neon.h>
+
+#include "aegis.h"
+
+void crypto_aegis128_update_neon(void *state, const void *msg);
+void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
+					unsigned int size);
+void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src,
+					unsigned int size);
+
+bool crypto_aegis128_have_simd(void)
+{
+	return cpu_have_feature(cpu_feature(AES));
+}
+
+void crypto_aegis128_update_simd(union aegis_block *state, const void *msg)
+{
+	kernel_neon_begin();
+	crypto_aegis128_update_neon(state, msg);
+	kernel_neon_end();
+}
+
+void crypto_aegis128_encrypt_chunk_simd(union aegis_block *state, u8 *dst,
+					const u8 *src, unsigned int size)
+{
+	kernel_neon_begin();
+	crypto_aegis128_encrypt_chunk_neon(state, dst, src, size);
+	kernel_neon_end();
+}
+
+void crypto_aegis128_decrypt_chunk_simd(union aegis_block *state, u8 *dst,
+					const u8 *src, unsigned int size)
+{
+	kernel_neon_begin();
+	crypto_aegis128_decrypt_chunk_neon(state, dst, src, size);
+	kernel_neon_end();
+}
-- 
2.20.1



* [PATCH 6/6] crypto: tcrypt - add a speed test for AEGIS128
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2019-06-24  7:38 ` [PATCH 5/6] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Ard Biesheuvel
@ 2019-06-24  7:38 ` Ard Biesheuvel
  2019-06-24 16:56 ` [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Eric Biggers
  6 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  7:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Steve Capper,
	Ondrej Mosnacek, linux-arm-kernel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/tcrypt.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index ad78ab5b93cb..c578ccd92c57 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -2327,6 +2327,13 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 				  0, speed_template_32);
 		break;
 
+	case 221:
+		test_aead_speed("aegis128", ENCRYPT, sec,
+				NULL, 0, 16, 8, speed_template_16);
+		test_aead_speed("aegis128", DECRYPT, sec,
+				NULL, 0, 16, 8, speed_template_16);
+		break;
+
 	case 300:
 		if (alg) {
 			test_hash_speed(alg, sec, generic_hash_speed_template);
-- 
2.20.1



* Re: [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path
  2019-06-24  7:38 ` [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path Ard Biesheuvel
@ 2019-06-24  7:59   ` Ondrej Mosnacek
  2019-06-24  8:01     ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Ondrej Mosnacek @ 2019-06-24  7:59 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Herbert Xu, Steve Capper, linux-crypto, linux-arm-kernel, Eric Biggers

Hi Ard,

On Mon, Jun 24, 2019 at 9:38 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> Use crypto_aegis128_update_u() not crypto_aegis128_update_a() in the
> decrypt path that is taken when the source or destination pointers
> are not aligned.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  crypto/aegis128.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/aegis128.c b/crypto/aegis128.c
> index d78f77fc5dd1..125e11246990 100644
> --- a/crypto/aegis128.c
> +++ b/crypto/aegis128.c
> @@ -208,7 +208,7 @@ static void crypto_aegis128_decrypt_chunk(struct aegis_state *state, u8 *dst,
>                         crypto_aegis_block_xor(&tmp, &state->blocks[1]);
>                         crypto_xor(tmp.bytes, src, AEGIS_BLOCK_SIZE);
>
> -                       crypto_aegis128_update_a(state, &tmp);
> +                       crypto_aegis128_update_u(state, &tmp);

The "tmp" variable used here is declared directly on the stack as
'union aegis_block' and thus should be aligned to alignof(__le64),
which allows the use of crypto_aegis128_update_a() ->
crypto_aegis_block_xor(). It is also passed directly to
crypto_aegis_block_xor() a few lines above. Or am I missing something?
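
[The alignment argument can be demonstrated with a simplified stand-in
for union aegis_block; the real union uses __le64/__le32 members, and the
layout here is illustrative only. A union is aligned to its most strictly
aligned member, so a stack-allocated block is always 64-bit aligned:]

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's union aegis_block */
union block {
	uint64_t words64[2];
	uint32_t words32[4];
	uint8_t  bytes[16];
};

/* Because words64[] is a member, _Alignof(union block) >= _Alignof(uint64_t)
 * on typical ABIs, so a local `union block tmp;` already satisfies the
 * alignment that the _a ("aligned") helpers assume. */
static int tmp_is_u64_aligned(void)
{
	union block tmp;

	return (uintptr_t)&tmp % _Alignof(uint64_t) == 0;
}
```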


>
>                         memcpy(dst, tmp.bytes, AEGIS_BLOCK_SIZE);
>
> --
> 2.20.1
>

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


* Re: [PATCH 1/6] crypto: aegis128 - use unaligned helper in unaligned decrypt path
  2019-06-24  7:59   ` Ondrej Mosnacek
@ 2019-06-24  8:01     ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24  8:01 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Herbert Xu, Steve Capper,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	linux-arm-kernel, Eric Biggers

On Mon, 24 Jun 2019 at 09:59, Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> Hi Ard,
>
> On Mon, Jun 24, 2019 at 9:38 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > Use crypto_aegis128_update_u() not crypto_aegis128_update_a() in the
> > decrypt path that is taken when the source or destination pointers
> > are not aligned.
> >
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > ---
> >  crypto/aegis128.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/crypto/aegis128.c b/crypto/aegis128.c
> > index d78f77fc5dd1..125e11246990 100644
> > --- a/crypto/aegis128.c
> > +++ b/crypto/aegis128.c
> > @@ -208,7 +208,7 @@ static void crypto_aegis128_decrypt_chunk(struct aegis_state *state, u8 *dst,
> >                         crypto_aegis_block_xor(&tmp, &state->blocks[1]);
> >                         crypto_xor(tmp.bytes, src, AEGIS_BLOCK_SIZE);
> >
> > -                       crypto_aegis128_update_a(state, &tmp);
> > +                       crypto_aegis128_update_u(state, &tmp);
>
> The "tmp" variable used here is declared directly on the stack as
> 'union aegis_block' and thus should be aligned to alignof(__le64),
> which allows the use of crypto_aegis128_update_a() ->
> crypto_aegis_block_xor(). It is also passed directly to
> crypto_aegis_block_xor() a few lines above. Or am I missing something?
>

Ah yes, you are absolutely right. Apologies for the noise. I just
noticed the asymmetry with the encrypt path, but I should have looked
more carefully.

Please disregard this patch.


* Re: [PATCH 2/6] crypto: aegis - drop empty TFM init/exit routines
  2019-06-24  7:38 ` [PATCH 2/6] crypto: aegis - drop empty TFM init/exit routines Ard Biesheuvel
@ 2019-06-24  8:03   ` Ondrej Mosnacek
  0 siblings, 0 replies; 15+ messages in thread
From: Ondrej Mosnacek @ 2019-06-24  8:03 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Herbert Xu, Steve Capper, linux-crypto, linux-arm-kernel, Eric Biggers

On Mon, Jun 24, 2019 at 9:38 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> TFM init/exit routines are optional, so no need to provide empty ones.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>

> ---
>  crypto/aegis128.c  | 11 -----------
>  crypto/aegis128l.c | 11 -----------
>  crypto/aegis256.c  | 11 -----------
>  3 files changed, 33 deletions(-)
>
> diff --git a/crypto/aegis128.c b/crypto/aegis128.c
> index 125e11246990..4f8f1cdef129 100644
> --- a/crypto/aegis128.c
> +++ b/crypto/aegis128.c
> @@ -403,22 +403,11 @@ static int crypto_aegis128_decrypt(struct aead_request *req)
>         return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
>  }
>
> -static int crypto_aegis128_init_tfm(struct crypto_aead *tfm)
> -{
> -       return 0;
> -}
> -
> -static void crypto_aegis128_exit_tfm(struct crypto_aead *tfm)
> -{
> -}
> -
>  static struct aead_alg crypto_aegis128_alg = {
>         .setkey = crypto_aegis128_setkey,
>         .setauthsize = crypto_aegis128_setauthsize,
>         .encrypt = crypto_aegis128_encrypt,
>         .decrypt = crypto_aegis128_decrypt,
> -       .init = crypto_aegis128_init_tfm,
> -       .exit = crypto_aegis128_exit_tfm,
>
>         .ivsize = AEGIS128_NONCE_SIZE,
>         .maxauthsize = AEGIS128_MAX_AUTH_SIZE,
> diff --git a/crypto/aegis128l.c b/crypto/aegis128l.c
> index 9bca3d619a22..ef5bc2297a2c 100644
> --- a/crypto/aegis128l.c
> +++ b/crypto/aegis128l.c
> @@ -467,22 +467,11 @@ static int crypto_aegis128l_decrypt(struct aead_request *req)
>         return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
>  }
>
> -static int crypto_aegis128l_init_tfm(struct crypto_aead *tfm)
> -{
> -       return 0;
> -}
> -
> -static void crypto_aegis128l_exit_tfm(struct crypto_aead *tfm)
> -{
> -}
> -
>  static struct aead_alg crypto_aegis128l_alg = {
>         .setkey = crypto_aegis128l_setkey,
>         .setauthsize = crypto_aegis128l_setauthsize,
>         .encrypt = crypto_aegis128l_encrypt,
>         .decrypt = crypto_aegis128l_decrypt,
> -       .init = crypto_aegis128l_init_tfm,
> -       .exit = crypto_aegis128l_exit_tfm,
>
>         .ivsize = AEGIS128L_NONCE_SIZE,
>         .maxauthsize = AEGIS128L_MAX_AUTH_SIZE,
> diff --git a/crypto/aegis256.c b/crypto/aegis256.c
> index b47fd39595ad..b824ef4d1248 100644
> --- a/crypto/aegis256.c
> +++ b/crypto/aegis256.c
> @@ -418,22 +418,11 @@ static int crypto_aegis256_decrypt(struct aead_request *req)
>         return crypto_memneq(tag.bytes, zeros, authsize) ? -EBADMSG : 0;
>  }
>
> -static int crypto_aegis256_init_tfm(struct crypto_aead *tfm)
> -{
> -       return 0;
> -}
> -
> -static void crypto_aegis256_exit_tfm(struct crypto_aead *tfm)
> -{
> -}
> -
>  static struct aead_alg crypto_aegis256_alg = {
>         .setkey = crypto_aegis256_setkey,
>         .setauthsize = crypto_aegis256_setauthsize,
>         .encrypt = crypto_aegis256_encrypt,
>         .decrypt = crypto_aegis256_decrypt,
> -       .init = crypto_aegis256_init_tfm,
> -       .exit = crypto_aegis256_exit_tfm,
>
>         .ivsize = AEGIS256_NONCE_SIZE,
>         .maxauthsize = AEGIS256_MAX_AUTH_SIZE,
> --
> 2.20.1
>


-- 
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


* Re: [PATCH 3/6] crypto: aegis - avoid prerotated AES tables
  2019-06-24  7:38 ` [PATCH 3/6] crypto: aegis - avoid prerotated AES tables Ard Biesheuvel
@ 2019-06-24  8:13   ` Ondrej Mosnacek
  0 siblings, 0 replies; 15+ messages in thread
From: Ondrej Mosnacek @ 2019-06-24  8:13 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Herbert Xu, Steve Capper, linux-crypto, linux-arm-kernel, Eric Biggers

On Mon, Jun 24, 2019 at 9:38 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> The generic AES code provides four sets of lookup tables, where each
> set consists of four tables containing the same 32-bit values, but
> rotated by 0, 8, 16 and 24 bits, respectively. This makes sense for
> CISC architectures such as x86 which support memory operands, but
> for other architectures, the rotates are quite cheap, and using all
> four tables needlessly thrashes the D-cache, and actually hurts rather
> than helps performance.
>
> Since x86 already has its own implementation of AEGIS based on AES-NI
> instructions, let's tweak the generic implementation towards other
> architectures, and avoid the prerotated tables, and perform the
> rotations inline. On ARM Cortex-A53, this results in a ~8% speedup.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I'm not an expert on low-level performance, but the rationale sounds reasonable.

Acked-by: Ondrej Mosnacek <omosnace@redhat.com>

> ---
>  crypto/aegis.h | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/crypto/aegis.h b/crypto/aegis.h
> index 41a3090cda8e..3308066ddde0 100644
> --- a/crypto/aegis.h
> +++ b/crypto/aegis.h
> @@ -10,6 +10,7 @@
>  #define _CRYPTO_AEGIS_H
>
>  #include <crypto/aes.h>
> +#include <linux/bitops.h>
>  #include <linux/types.h>
>
>  #define AEGIS_BLOCK_SIZE 16
> @@ -53,16 +54,13 @@ static void crypto_aegis_aesenc(union aegis_block *dst,
>                                 const union aegis_block *key)
>  {
>         const u8  *s  = src->bytes;
> -       const u32 *t0 = crypto_ft_tab[0];
> -       const u32 *t1 = crypto_ft_tab[1];
> -       const u32 *t2 = crypto_ft_tab[2];
> -       const u32 *t3 = crypto_ft_tab[3];
> +       const u32 *t = crypto_ft_tab[0];
>         u32 d0, d1, d2, d3;
>
> -       d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]];
> -       d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]];
> -       d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]];
> -       d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]];
> +       d0 = t[s[ 0]] ^ rol32(t[s[ 5]], 8) ^ rol32(t[s[10]], 16) ^ rol32(t[s[15]], 24);
> +       d1 = t[s[ 4]] ^ rol32(t[s[ 9]], 8) ^ rol32(t[s[14]], 16) ^ rol32(t[s[ 3]], 24);
> +       d2 = t[s[ 8]] ^ rol32(t[s[13]], 8) ^ rol32(t[s[ 2]], 16) ^ rol32(t[s[ 7]], 24);
> +       d3 = t[s[12]] ^ rol32(t[s[ 1]], 8) ^ rol32(t[s[ 6]], 16) ^ rol32(t[s[11]], 24);
>
>         dst->words32[0] = cpu_to_le32(d0) ^ key->words32[0];
>         dst->words32[1] = cpu_to_le32(d1) ^ key->words32[1];
> --
> 2.20.1
>


-- 
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


* Re: [PATCH 5/6] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics
  2019-06-24  7:38 ` [PATCH 5/6] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Ard Biesheuvel
@ 2019-06-24 14:37   ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-24 14:37 UTC (permalink / raw)
  To: open list:HARDWARE RANDOM NUMBER GENERATOR CORE
  Cc: Herbert Xu, Steve Capper, Ondrej Mosnacek, linux-arm-kernel,
	Eric Biggers

On Mon, 24 Jun 2019 at 09:38, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> Provide an accelerated implementation of aegis128 by wiring up the
> SIMD hooks in the generic driver to an implementation based on NEON
> intrinsics, which can be compiled to both ARM and arm64 code.
>
> This results in a performance of 2.2 cycles per byte on Cortex-A53,
> which is a performance increase of ~11x compared to the generic
> code.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  crypto/Kconfig               |   5 +
>  crypto/Makefile              |  12 ++
>  crypto/aegis128-neon-inner.c | 142 ++++++++++++++++++++
>  crypto/aegis128-neon.c       |  43 ++++++
>  4 files changed, 202 insertions(+)
>
...
> diff --git a/crypto/Makefile b/crypto/Makefile
> index 266a4cdbb9e2..f4a55cfb7f17 100644
> --- a/crypto/Makefile
> +++ b/crypto/Makefile
> @@ -92,6 +92,18 @@ obj-$(CONFIG_CRYPTO_GCM) += gcm.o
>  obj-$(CONFIG_CRYPTO_CCM) += ccm.o
>  obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
>  obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
> +aegis128-y := aegis128.o
> +

This doesn't actually work when building a module. I'll have to rename
the .c file so that the module built from the combined objects can
retain its name.


* Re: [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64
  2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2019-06-24  7:38 ` [PATCH 6/6] crypto: tcrypt - add a speed test for AEGIS128 Ard Biesheuvel
@ 2019-06-24 16:56 ` Eric Biggers
  2019-06-25 14:07   ` Ondrej Mosnacek
  6 siblings, 1 reply; 15+ messages in thread
From: Eric Biggers @ 2019-06-24 16:56 UTC (permalink / raw)
  To: Ard Biesheuvel, Ondrej Mosnacek
  Cc: Milan Broz, Steve Capper, linux-crypto, linux-arm-kernel, Herbert Xu

On Mon, Jun 24, 2019 at 09:38:12AM +0200, Ard Biesheuvel wrote:
> Now that aegis128 has been announced as one of the winners of the CAESAR
> competition, it's time to provide some better support for it on arm64 (and
> 32-bit ARM *)
> 
> This time, instead of cloning the generic driver twice and rewriting half
> of it in arm64 and ARM assembly, add hooks for an accelerated SIMD path to
> the generic driver, and populate it with a C version using NEON intrinsics
> that can be built for both ARM and arm64. This results in a speedup of ~11x,
> resulting in a performance of 2.2 cycles per byte on Cortex-A53.
> 
> Patches #1 .. #3 are some fixes/improvements for the generic code. Patch #4
> adds the plumbing for using a SIMD accelerated implementation. Patch #5
> adds the ARM and arm64 code, and patch #6 adds a speed test.
> 
> > Note that aegis128l and aegis256 were not selected, nor were any of the
> > MORUS contestants, and so we should probably consider dropping those drivers
> again.
> 

I'll also note that a few months ago there were attacks published on all
versions of full MORUS, with only 2^76 data and time complexity
(https://eprint.iacr.org/2019/172.pdf).  So MORUS is cryptographically broken,
and isn't really something that people should be using.  Ondrej, are people
actually using MORUS in the kernel?  I understand that you added it for your
Master's Thesis with the intent that it would be used with dm-integrity and
dm-crypt, but it's not clear that people are actually doing that.

In any case we could consider dropping the assembly implementations, though.

- Eric


* Re: [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64
  2019-06-24 16:56 ` [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Eric Biggers
@ 2019-06-25 14:07   ` Ondrej Mosnacek
  2019-06-25 14:57     ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Ondrej Mosnacek @ 2019-06-25 14:07 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, Steve Capper, Ard Biesheuvel, linux-crypto,
	Milan Broz, linux-arm-kernel

On Mon, Jun 24, 2019 at 6:57 PM Eric Biggers <ebiggers@kernel.org> wrote:
> On Mon, Jun 24, 2019 at 09:38:12AM +0200, Ard Biesheuvel wrote:
> > Now that aegis128 has been announced as one of the winners of the CAESAR
> > competition, it's time to provide some better support for it on arm64 (and
> > 32-bit ARM *)
> >
> > This time, instead of cloning the generic driver twice and rewriting half
> > of it in arm64 and ARM assembly, add hooks for an accelerated SIMD path to
> > the generic driver, and populate it with a C version using NEON intrinsics
> > that can be built for both ARM and arm64. This results in a speedup of ~11x,
> > resulting in a performance of 2.2 cycles per byte on Cortex-A53.
> >
> > Patches #1 .. #3 are some fixes/improvements for the generic code. Patch #4
> > adds the plumbing for using a SIMD accelerated implementation. Patch #5
> > adds the ARM and arm64 code, and patch #6 adds a speed test.
> >
> > Note that aegis128l and aegis256 were not selected, nor were any of the
> > MORUS contestants, and so we should probably consider dropping those drivers
> > again.
> >
>
> I'll also note that a few months ago there were attacks published on all
> versions of full MORUS, with only 2^76 data and time complexity
> (https://eprint.iacr.org/2019/172.pdf).  So MORUS is cryptographically broken,
> and isn't really something that people should be using.  Ondrej, are people
> actually using MORUS in the kernel?  I understand that you added it for your
> Master's Thesis with the intent that it would be used with dm-integrity and
> dm-crypt, but it's not clear that people are actually doing that.

AFAIK, the only (potential) users are dm-crypt/dm-integrity and
af_alg. I don't expect many (if any) users, but who knows...
I don't have any problem with MORUS being removed from the crypto API.
It seems to be rather heavily broken...

>
> In any case we could consider dropping the assembly implementations, though.
>
> - Eric

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


* Re: [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64
  2019-06-25 14:07   ` Ondrej Mosnacek
@ 2019-06-25 14:57     ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-25 14:57 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Herbert Xu, Steve Capper, Eric Biggers,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Milan Broz,
	linux-arm-kernel

On Tue, 25 Jun 2019 at 16:07, Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> On Mon, Jun 24, 2019 at 6:57 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > On Mon, Jun 24, 2019 at 09:38:12AM +0200, Ard Biesheuvel wrote:
> > > Now that aegis128 has been announced as one of the winners of the CAESAR
> > > competition, it's time to provide some better support for it on arm64 (and
> > > 32-bit ARM *)
> > >
> > > This time, instead of cloning the generic driver twice and rewriting half
> > > of it in arm64 and ARM assembly, add hooks for an accelerated SIMD path to
> > > the generic driver, and populate it with a C version using NEON intrinsics
> > > that can be built for both ARM and arm64. This results in a speedup of ~11x,
> > > resulting in a performance of 2.2 cycles per byte on Cortex-A53.
> > >
> > > Patches #1 .. #3 are some fixes/improvements for the generic code. Patch #4
> > > adds the plumbing for using a SIMD accelerated implementation. Patch #5
> > > adds the ARM and arm64 code, and patch #6 adds a speed test.
> > >
> > > Note that aegis128l and aegis256 were not selected, nor were any of the
> > > MORUS contestants, and so we should probably consider dropping those drivers
> > > again.
> > >
> >
> > I'll also note that a few months ago there were attacks published on all
> > versions of full MORUS, with only 2^76 data and time complexity
> > (https://eprint.iacr.org/2019/172.pdf).  So MORUS is cryptographically broken,
> > and isn't really something that people should be using.  Ondrej, are people
> > actually using MORUS in the kernel?  I understand that you added it for your
> > Master's Thesis with the intent that it would be used with dm-integrity and
> > dm-crypt, but it's not clear that people are actually doing that.
>
> AFAIK, the only (potential) users are dm-crypt/dm-integrity and
> af_alg. I don't expect many (if any) users using it, but who knows...
> I don't have any problem with MORUS being removed from crypto API. It
> seems to be broken rather heavily...
>

OK, patch sent.


end of thread, other threads:[~2019-06-25 14:57 UTC | newest]

Thread overview: 15+ messages
2019-06-24  7:38 [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Ard Biesheuvel
2019-06-24  7:38 ` [PATCH 1/6] crypto: aegis128 - use unaliged helper in unaligned decrypt path Ard Biesheuvel
2019-06-24  7:59   ` Ondrej Mosnacek
2019-06-24  8:01     ` Ard Biesheuvel
2019-06-24  7:38 ` [PATCH 2/6] crypto: aegis - drop empty TFM init/exit routines Ard Biesheuvel
2019-06-24  8:03   ` Ondrej Mosnacek
2019-06-24  7:38 ` [PATCH 3/6] crypto: aegis - avoid prerotated AES tables Ard Biesheuvel
2019-06-24  8:13   ` Ondrej Mosnacek
2019-06-24  7:38 ` [PATCH 4/6] crypto: aegis128 - add support for SIMD acceleration Ard Biesheuvel
2019-06-24  7:38 ` [PATCH 5/6] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Ard Biesheuvel
2019-06-24 14:37   ` Ard Biesheuvel
2019-06-24  7:38 ` [PATCH 6/6] crypto: tcrypt - add a speed test for AEGIS128 Ard Biesheuvel
2019-06-24 16:56 ` [PATCH 0/6] crypto: aegis128 - add NEON intrinsics version for ARM/arm64 Eric Biggers
2019-06-25 14:07   ` Ondrej Mosnacek
2019-06-25 14:57     ` Ard Biesheuvel
