* [PATCH 00/12] arm64: crypto: prepare for new kernel mode NEON policy
@ 2017-06-10 16:22 ` Ard Biesheuvel
  0 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

TL;DR: preparatory work for expected changes in arm64's handling of kernel
       mode SIMD

@Herbert: The arm64 maintainers may want to take this through the arm64 tree,
          and if not, we need their acks on patch #1. Thanks.

Currently, arm64 allows kernel mode NEON (KMN) in process, softirq or hardirq
context. In the process case, we preserve/restore the NEON context lazily,
but in the softirq/hardirq cases, we eagerly stash a slice of the NEON
register file, and immediately restore it when kernel_neon_end() is called.

Given the above, arm64 actually does not use the generic may_use_simd() API
at all*, which was added to allow async wrappers of synchronous SIMD routines
to be implemented in a generic manner. (On x86, kernel mode SIMD may be used
in process context or while serving an interrupt taken from user space. On ARM,
SIMD may only be used in process context.)
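
(For reference, the generic definition of may_use_simd() -- the one arm64
currently picks up from asm-generic but does not rely on -- amounts to roughly
the sketch below; quoted from memory, so treat the exact wording as
approximate:)

	/* include/asm-generic/simd.h, approximately */
	#include <linux/hardirq.h>

	/*
	 * may_use_simd - whether it is allowable at this time to issue SIMD
	 *                instructions or access the SIMD register file
	 *
	 * Most architectures do not preserve the SIMD register file when
	 * taking an interrupt, so !in_interrupt() is a reasonable default.
	 */
	static __must_check inline bool may_use_simd(void)
	{
		return !in_interrupt();
	}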

When adding support for the SVE architecture extension, which shares part of
the NEON register file with the SIMD and crypto extensions, the eager preserve/
restore in interrupt context becomes a problem: it would either have to
preserve and restore the entire SVE state (which may be up to 8 KB in size), or
it must not be allowed to interrupt the lazy preserve, which has to deal with
the large SVE state anyway. Otherwise, such an interruption would corrupt the
NEON state that the lazy preserve observes after the interruption.

Given that
a) KMN is never actually used in hardirq context,
b) KMN is only used in softirq context by mac80211 code running on behalf of
   WiFi devices that don't perform the crypto in hardware,
c) KMN in softirq context is statistically unlikely to interrupt the kernel
   while it is doing kernel mode NEON in process context,

the unconditional eager preserve/restore typically executes when no KMN in
process context is in progress anyway, and we can simplify things substantially
by disallowing nested KMN altogether: disallow KMN in hardirq context, and
allow KMN in softirq context only if no KMN in process context is already in
progress.
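
(Purely as an illustration of where this is heading -- not part of this series,
and the per-CPU flag name below is made up -- under such a policy,
may_use_simd() could end up looking something like this:)

	/* hypothetical sketch; 'kernel_neon_busy' is an illustrative name
	 * for a per-CPU bool that kernel_neon_begin()/end() would maintain */
	static __must_check inline bool may_use_simd(void)
	{
		return !in_irq() && !this_cpu_read(kernel_neon_busy);
	}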

The no-nesting rule leaves only the outer SVE-aware lazy preserve/restore,
which needs to execute with bottom halves disabled, but other than that, no
intrusive changes should be needed to deal with the SVE payloads.

Given that the no-nesting rule implies that SIMD is no longer guaranteed to be
usable in every context, the KMN users need to be made aware of this. This
series updates the current KMN users in the arm64 tree to take may_use_simd()
into account. Since, at this time, SIMD is still allowed in any context, an
implementation of may_use_simd() is added that simply returns true (#1). It
will be updated in the future when the no-nesting modifications are made.
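
(To illustrate, the pattern the updated glue code follows throughout this
series is essentially the following -- a minimal sketch, not lifted verbatim
from any single patch:)

	if (may_use_simd()) {
		kernel_neon_begin();
		/* NEON / Crypto Extensions implementation */
		kernel_neon_end();
	} else {
		/* scalar or generic C fallback */
	}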

* may_use_simd() is only used as a hint in the SHA256 NEON code, since on some
  microarchitectures the NEON code is only marginally faster than the scalar
  code, and the eager preserve and restore could actually make it slower.

Ard Biesheuvel (12):
  arm64: neon: replace generic definition of may_use_simd()
  crypto: arm64/ghash-ce - add non-SIMD scalar fallback
  crypto: arm64/crct10dif - add non-SIMD generic fallback
  crypto: arm64/crc32 - add non-SIMD scalar fallback
  crypto: arm64/sha1-ce - add non-SIMD generic fallback
  crypto: arm64/sha2-ce - add non-SIMD scalar fallback
  crypto: arm64/aes-ce-cipher - match round key endianness with generic
    code
  crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback
  crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback
  crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR
  crypto: arm64/chacha20 - take may_use_simd() into account
  crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR

 arch/arm64/crypto/Kconfig              |  22 ++-
 arch/arm64/crypto/aes-ce-ccm-core.S    |  30 ++--
 arch/arm64/crypto/aes-ce-ccm-glue.c    | 152 +++++++++++++++-----
 arch/arm64/crypto/aes-ce-cipher.c      |  55 ++++---
 arch/arm64/crypto/aes-ce.S             |  12 +-
 arch/arm64/crypto/aes-ctr-fallback.h   |  55 +++++++
 arch/arm64/crypto/aes-glue.c           |  17 ++-
 arch/arm64/crypto/aes-neonbs-glue.c    |  48 ++++++-
 arch/arm64/crypto/chacha20-neon-glue.c |   5 +-
 arch/arm64/crypto/crc32-ce-glue.c      |  11 +-
 arch/arm64/crypto/crct10dif-ce-glue.c  |  13 +-
 arch/arm64/crypto/ghash-ce-glue.c      |  49 +++++--
 arch/arm64/crypto/sha1-ce-glue.c       |  18 ++-
 arch/arm64/crypto/sha2-ce-glue.c       |  30 +++-
 arch/arm64/crypto/sha256-glue.c        |   1 +
 arch/arm64/include/asm/Kbuild          |   1 -
 arch/arm64/include/asm/simd.h          |  24 ++++
 17 files changed, 420 insertions(+), 123 deletions(-)
 create mode 100644 arch/arm64/crypto/aes-ctr-fallback.h
 create mode 100644 arch/arm64/include/asm/simd.h

-- 
2.7.4


* [PATCH 01/12] arm64: neon: replace generic definition of may_use_simd()
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

In preparation for modifying the logic that decides whether kernel mode
NEON is allowable, which is required for SVE support, introduce an
implementation of may_use_simd() that reflects the current reality, i.e.,
that SIMD is allowed in any context.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/Kbuild |  1 -
 arch/arm64/include/asm/simd.h | 24 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index a7a97a608033..3c469b557ee8 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -31,7 +31,6 @@ generic-y += sembuf.h
 generic-y += serial.h
 generic-y += set_memory.h
 generic-y += shmbuf.h
-generic-y += simd.h
 generic-y += sizes.h
 generic-y += socket.h
 generic-y += sockios.h
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
new file mode 100644
index 000000000000..f8aa7b3a0140
--- /dev/null
+++ b/arch/arm64/include/asm/simd.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef __ASM_SIMD_H
+#define __ASM_SIMD_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/*
+ * may_use_simd - whether it is allowable at this time to issue SIMD
+ *                instructions or access the SIMD register file
+ */
+static __must_check inline bool may_use_simd(void)
+{
+	return true;
+}
+
+#endif
-- 
2.7.4


* [PATCH 02/12] crypto: arm64/ghash-ce - add non-SIMD scalar fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig         |  3 +-
 arch/arm64/crypto/ghash-ce-glue.c | 49 ++++++++++++++++----
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index d92293747d63..7d75a363e317 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -28,8 +28,9 @@ config CRYPTO_SHA2_ARM64_CE
 
 config CRYPTO_GHASH_ARM64_CE
 	tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
+	select CRYPTO_GF128MUL
 
 config CRYPTO_CRCT10DIF_ARM64_CE
 	tristate "CRCT10DIF digest algorithm using PMULL instructions"
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 833ec1e3f3e9..3e1a778b181a 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -1,7 +1,7 @@
 /*
  * Accelerated GHASH implementation with ARMv8 PMULL instructions.
  *
- * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2014 - 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
@@ -9,7 +9,9 @@
  */
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <asm/unaligned.h>
+#include <crypto/gf128mul.h>
 #include <crypto/internal/hash.h>
 #include <linux/cpufeature.h>
 #include <linux/crypto.h>
@@ -25,6 +27,7 @@ MODULE_LICENSE("GPL v2");
 struct ghash_key {
 	u64 a;
 	u64 b;
+	be128 k;
 };
 
 struct ghash_desc_ctx {
@@ -44,6 +47,36 @@ static int ghash_init(struct shash_desc *desc)
 	return 0;
 }
 
+static void ghash_do_update(int blocks, u64 dg[], const char *src,
+			    struct ghash_key *key, const char *head)
+{
+	if (may_use_simd()) {
+		kernel_neon_begin();
+		pmull_ghash_update(blocks, dg, src, key, head);
+		kernel_neon_end();
+	} else {
+		be128 dst = { cpu_to_be64(dg[1]), cpu_to_be64(dg[0]) };
+
+		do {
+			const u8 *in = src;
+
+			if (head) {
+				in = head;
+				blocks++;
+				head = NULL;
+			} else {
+				src += GHASH_BLOCK_SIZE;
+			}
+
+			crypto_xor((u8 *)&dst, in, GHASH_BLOCK_SIZE);
+			gf128mul_lle(&dst, &key->k);
+		} while (--blocks);
+
+		dg[0] = be64_to_cpu(dst.b);
+		dg[1] = be64_to_cpu(dst.a);
+	}
+}
+
 static int ghash_update(struct shash_desc *desc, const u8 *src,
 			unsigned int len)
 {
@@ -67,10 +100,9 @@ static int ghash_update(struct shash_desc *desc, const u8 *src,
 		blocks = len / GHASH_BLOCK_SIZE;
 		len %= GHASH_BLOCK_SIZE;
 
-		kernel_neon_begin_partial(8);
-		pmull_ghash_update(blocks, ctx->digest, src, key,
-				   partial ? ctx->buf : NULL);
-		kernel_neon_end();
+		ghash_do_update(blocks, ctx->digest, src, key,
+				partial ? ctx->buf : NULL);
+
 		src += blocks * GHASH_BLOCK_SIZE;
 		partial = 0;
 	}
@@ -89,9 +121,7 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
 
 		memset(ctx->buf + partial, 0, GHASH_BLOCK_SIZE - partial);
 
-		kernel_neon_begin_partial(8);
-		pmull_ghash_update(1, ctx->digest, ctx->buf, key, NULL);
-		kernel_neon_end();
+		ghash_do_update(1, ctx->digest, ctx->buf, key, NULL);
 	}
 	put_unaligned_be64(ctx->digest[1], dst);
 	put_unaligned_be64(ctx->digest[0], dst + 8);
@@ -111,6 +141,9 @@ static int ghash_setkey(struct crypto_shash *tfm,
 		return -EINVAL;
 	}
 
+	/* needed for the fallback */
+	memcpy(&key->k, inkey, GHASH_BLOCK_SIZE);
+
 	/* perform multiplication by 'x' in GF(2^128) */
 	b = get_unaligned_be64(inkey);
 	a = get_unaligned_be64(inkey + 8);
-- 
2.7.4


* [PATCH 03/12] crypto: arm64/crct10dif - add non-SIMD generic fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/crct10dif-ce-glue.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
index 60cb590c2590..96f0cae4a022 100644
--- a/arch/arm64/crypto/crct10dif-ce-glue.c
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -1,7 +1,7 @@
 /*
  * Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
  *
- * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -18,6 +18,7 @@
 #include <crypto/internal/hash.h>
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 
 #define CRC_T10DIF_PMULL_CHUNK_SIZE	16U
 
@@ -48,9 +49,13 @@ static int crct10dif_update(struct shash_desc *desc, const u8 *data,
 	}
 
 	if (length > 0) {
-		kernel_neon_begin_partial(14);
-		*crc = crc_t10dif_pmull(*crc, data, length);
-		kernel_neon_end();
+		if (may_use_simd()) {
+			kernel_neon_begin();
+			*crc = crc_t10dif_pmull(*crc, data, length);
+			kernel_neon_end();
+		} else {
+			*crc = crc_t10dif_generic(*crc, data, length);
+		}
 	}
 
 	return 0;
-- 
2.7.4


* [PATCH 04/12] crypto: arm64/crc32 - add non-SIMD scalar fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/crc32-ce-glue.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/crypto/crc32-ce-glue.c b/arch/arm64/crypto/crc32-ce-glue.c
index eccb1ae90064..624f4137918c 100644
--- a/arch/arm64/crypto/crc32-ce-glue.c
+++ b/arch/arm64/crypto/crc32-ce-glue.c
@@ -1,7 +1,7 @@
 /*
  * Accelerated CRC32(C) using arm64 NEON and Crypto Extensions instructions
  *
- * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -19,6 +19,7 @@
 
 #include <asm/hwcap.h>
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <asm/unaligned.h>
 
 #define PMULL_MIN_LEN		64L	/* minimum size of buffer
@@ -105,10 +106,10 @@ static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
 		length -= l;
 	}
 
-	if (length >= PMULL_MIN_LEN) {
+	if (length >= PMULL_MIN_LEN && may_use_simd()) {
 		l = round_down(length, SCALE_F);
 
-		kernel_neon_begin_partial(10);
+		kernel_neon_begin();
 		*crc = crc32_pmull_le(data, l, *crc);
 		kernel_neon_end();
 
@@ -137,10 +138,10 @@ static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
 		length -= l;
 	}
 
-	if (length >= PMULL_MIN_LEN) {
+	if (length >= PMULL_MIN_LEN && may_use_simd()) {
 		l = round_down(length, SCALE_F);
 
-		kernel_neon_begin_partial(10);
+		kernel_neon_begin();
 		*crc = crc32c_pmull_le(data, l, *crc);
 		kernel_neon_end();
 
-- 
2.7.4


* [PATCH 05/12] crypto: arm64/sha1-ce - add non-SIMD generic fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig        |  3 ++-
 arch/arm64/crypto/sha1-ce-glue.c | 18 ++++++++++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 7d75a363e317..5d5953545dad 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -18,8 +18,9 @@ config CRYPTO_SHA512_ARM64
 
 config CRYPTO_SHA1_ARM64_CE
 	tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
+	select CRYPTO_SHA1
 
 config CRYPTO_SHA2_ARM64_CE
 	tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index aefda9868627..058cbe299dd6 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -1,7 +1,7 @@
 /*
  * sha1-ce-glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
  *
- * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2014 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -9,6 +9,7 @@
  */
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <asm/unaligned.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
@@ -37,8 +38,11 @@ static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
 {
 	struct sha1_ce_state *sctx = shash_desc_ctx(desc);
 
+	if (!may_use_simd())
+		return crypto_sha1_update(desc, data, len);
+
 	sctx->finalize = 0;
-	kernel_neon_begin_partial(16);
+	kernel_neon_begin();
 	sha1_base_do_update(desc, data, len,
 			    (sha1_block_fn *)sha1_ce_transform);
 	kernel_neon_end();
@@ -57,13 +61,16 @@ static int sha1_ce_finup(struct shash_desc *desc, const u8 *data,
 	ASM_EXPORT(sha1_ce_offsetof_finalize,
 		   offsetof(struct sha1_ce_state, finalize));
 
+	if (!may_use_simd())
+		return crypto_sha1_finup(desc, data, len, out);
+
 	/*
 	 * Allow the asm code to perform the finalization if there is no
 	 * partial data and the input is a round multiple of the block size.
 	 */
 	sctx->finalize = finalize;
 
-	kernel_neon_begin_partial(16);
+	kernel_neon_begin();
 	sha1_base_do_update(desc, data, len,
 			    (sha1_block_fn *)sha1_ce_transform);
 	if (!finalize)
@@ -76,8 +83,11 @@ static int sha1_ce_final(struct shash_desc *desc, u8 *out)
 {
 	struct sha1_ce_state *sctx = shash_desc_ctx(desc);
 
+	if (!may_use_simd())
+		return crypto_sha1_finup(desc, NULL, 0, out);
+
 	sctx->finalize = 0;
-	kernel_neon_begin_partial(16);
+	kernel_neon_begin();
 	sha1_base_do_finalize(desc, (sha1_block_fn *)sha1_ce_transform);
 	kernel_neon_end();
 	return sha1_base_finish(desc, out);
-- 
2.7.4


* [PATCH 06/12] crypto: arm64/sha2-ce - add non-SIMD scalar fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig        |  3 +-
 arch/arm64/crypto/sha2-ce-glue.c | 30 +++++++++++++++++---
 arch/arm64/crypto/sha256-glue.c  |  1 +
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 5d5953545dad..8cd145f9c1ff 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -24,8 +24,9 @@ config CRYPTO_SHA1_ARM64_CE
 
 config CRYPTO_SHA2_ARM64_CE
 	tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
+	select CRYPTO_SHA256_ARM64
 
 config CRYPTO_GHASH_ARM64_CE
 	tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index 7cd587564a41..eb71543568b6 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -1,7 +1,7 @@
 /*
  * sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
  *
- * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2014 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -9,6 +9,7 @@
  */
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <asm/unaligned.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
@@ -32,13 +33,19 @@ struct sha256_ce_state {
 asmlinkage void sha2_ce_transform(struct sha256_ce_state *sst, u8 const *src,
 				  int blocks);
 
+asmlinkage void sha256_block_data_order(u32 *digest, u8 const *src, int blocks);
+
 static int sha256_ce_update(struct shash_desc *desc, const u8 *data,
 			    unsigned int len)
 {
 	struct sha256_ce_state *sctx = shash_desc_ctx(desc);
 
+	if (!may_use_simd())
+		return sha256_base_do_update(desc, data, len,
+				(sha256_block_fn *)sha256_block_data_order);
+
 	sctx->finalize = 0;
-	kernel_neon_begin_partial(28);
+	kernel_neon_begin();
 	sha256_base_do_update(desc, data, len,
 			      (sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
@@ -57,13 +64,22 @@ static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
 	ASM_EXPORT(sha256_ce_offsetof_finalize,
 		   offsetof(struct sha256_ce_state, finalize));
 
+	if (!may_use_simd()) {
+		if (len)
+			sha256_base_do_update(desc, data, len,
+				(sha256_block_fn *)sha256_block_data_order);
+		sha256_base_do_finalize(desc,
+				(sha256_block_fn *)sha256_block_data_order);
+		return sha256_base_finish(desc, out);
+	}
+
 	/*
 	 * Allow the asm code to perform the finalization if there is no
 	 * partial data and the input is a round multiple of the block size.
 	 */
 	sctx->finalize = finalize;
 
-	kernel_neon_begin_partial(28);
+	kernel_neon_begin();
 	sha256_base_do_update(desc, data, len,
 			      (sha256_block_fn *)sha2_ce_transform);
 	if (!finalize)
@@ -77,8 +93,14 @@ static int sha256_ce_final(struct shash_desc *desc, u8 *out)
 {
 	struct sha256_ce_state *sctx = shash_desc_ctx(desc);
 
+	if (!may_use_simd()) {
+		sha256_base_do_finalize(desc,
+				(sha256_block_fn *)sha256_block_data_order);
+		return sha256_base_finish(desc, out);
+	}
+
 	sctx->finalize = 0;
-	kernel_neon_begin_partial(28);
+	kernel_neon_begin();
 	sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
 	return sha256_base_finish(desc, out);
diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
index a2226f841960..b064d925fe2a 100644
--- a/arch/arm64/crypto/sha256-glue.c
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -29,6 +29,7 @@ MODULE_ALIAS_CRYPTO("sha256");
 
 asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
 					unsigned int num_blks);
+EXPORT_SYMBOL(sha256_block_data_order);
 
 asmlinkage void sha256_block_neon(u32 *digest, const void *data,
 				  unsigned int num_blks);
-- 
2.7.4


* [PATCH 07/12] crypto: arm64/aes-ce-cipher - match round key endianness with generic code
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

In order to be able to reuse the generic AES code as a fallback for
situations where the NEON may not be used, update the key handling
to match the byte order of the generic code: it stores round keys
as sequences of 32-bit quantities rather than streams of bytes, and
so our code needs to be updated to reflect that.
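
(For illustration only -- a sketch of the before/after of the key layout
change, matching the ce_aes_expandkey() hunk below; the loop bound is written
out here instead of the 'kwords' variable used in the patch:)

	/* before: the user key is copied into key_enc as a raw byte stream */
	memcpy(ctx->key_enc, in_key, key_len);

	/* after: the user key is stored as 32-bit words, which is what the
	 * generic fallback code expects */
	for (i = 0; i < key_len / sizeof(u32); i++)
		ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32));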

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 30 ++++++++---------
 arch/arm64/crypto/aes-ce-cipher.c   | 35 +++++++++-----------
 arch/arm64/crypto/aes-ce.S          | 12 +++----
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 3363560c79b7..e3a375c4cb83 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -1,7 +1,7 @@
 /*
  * aesce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
  *
- * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -32,7 +32,7 @@ ENTRY(ce_aes_ccm_auth_data)
 	beq	8f				/* out of input? */
 	cbnz	w8, 0b
 	eor	v0.16b, v0.16b, v1.16b
-1:	ld1	{v3.16b}, [x4]			/* load first round key */
+1:	ld1	{v3.4s}, [x4]			/* load first round key */
 	prfm	pldl1strm, [x1]
 	cmp	w5, #12				/* which key size? */
 	add	x6, x4, #16
@@ -42,17 +42,17 @@ ENTRY(ce_aes_ccm_auth_data)
 	mov	v5.16b, v3.16b
 	b	4f
 2:	mov	v4.16b, v3.16b
-	ld1	{v5.16b}, [x6], #16		/* load 2nd round key */
+	ld1	{v5.4s}, [x6], #16		/* load 2nd round key */
 3:	aese	v0.16b, v4.16b
 	aesmc	v0.16b, v0.16b
-4:	ld1	{v3.16b}, [x6], #16		/* load next round key */
+4:	ld1	{v3.4s}, [x6], #16		/* load next round key */
 	aese	v0.16b, v5.16b
 	aesmc	v0.16b, v0.16b
-5:	ld1	{v4.16b}, [x6], #16		/* load next round key */
+5:	ld1	{v4.4s}, [x6], #16		/* load next round key */
 	subs	w7, w7, #3
 	aese	v0.16b, v3.16b
 	aesmc	v0.16b, v0.16b
-	ld1	{v5.16b}, [x6], #16		/* load next round key */
+	ld1	{v5.4s}, [x6], #16		/* load next round key */
 	bpl	3b
 	aese	v0.16b, v4.16b
 	subs	w2, w2, #16			/* last data? */
@@ -90,7 +90,7 @@ ENDPROC(ce_aes_ccm_auth_data)
 	 * 			 u32 rounds);
 	 */
 ENTRY(ce_aes_ccm_final)
-	ld1	{v3.16b}, [x2], #16		/* load first round key */
+	ld1	{v3.4s}, [x2], #16		/* load first round key */
 	ld1	{v0.16b}, [x0]			/* load mac */
 	cmp	w3, #12				/* which key size? */
 	sub	w3, w3, #2			/* modified # of rounds */
@@ -100,17 +100,17 @@ ENTRY(ce_aes_ccm_final)
 	mov	v5.16b, v3.16b
 	b	2f
 0:	mov	v4.16b, v3.16b
-1:	ld1	{v5.16b}, [x2], #16		/* load next round key */
+1:	ld1	{v5.4s}, [x2], #16		/* load next round key */
 	aese	v0.16b, v4.16b
 	aesmc	v0.16b, v0.16b
 	aese	v1.16b, v4.16b
 	aesmc	v1.16b, v1.16b
-2:	ld1	{v3.16b}, [x2], #16		/* load next round key */
+2:	ld1	{v3.4s}, [x2], #16		/* load next round key */
 	aese	v0.16b, v5.16b
 	aesmc	v0.16b, v0.16b
 	aese	v1.16b, v5.16b
 	aesmc	v1.16b, v1.16b
-3:	ld1	{v4.16b}, [x2], #16		/* load next round key */
+3:	ld1	{v4.4s}, [x2], #16		/* load next round key */
 	subs	w3, w3, #3
 	aese	v0.16b, v3.16b
 	aesmc	v0.16b, v0.16b
@@ -137,31 +137,31 @@ CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
 	cmp	w4, #12				/* which key size? */
 	sub	w7, w4, #2			/* get modified # of rounds */
 	ins	v1.d[1], x9			/* no carry in lower ctr */
-	ld1	{v3.16b}, [x3]			/* load first round key */
+	ld1	{v3.4s}, [x3]			/* load first round key */
 	add	x10, x3, #16
 	bmi	1f
 	bne	4f
 	mov	v5.16b, v3.16b
 	b	3f
 1:	mov	v4.16b, v3.16b
-	ld1	{v5.16b}, [x10], #16		/* load 2nd round key */
+	ld1	{v5.4s}, [x10], #16		/* load 2nd round key */
 2:	/* inner loop: 3 rounds, 2x interleaved */
 	aese	v0.16b, v4.16b
 	aesmc	v0.16b, v0.16b
 	aese	v1.16b, v4.16b
 	aesmc	v1.16b, v1.16b
-3:	ld1	{v3.16b}, [x10], #16		/* load next round key */
+3:	ld1	{v3.4s}, [x10], #16		/* load next round key */
 	aese	v0.16b, v5.16b
 	aesmc	v0.16b, v0.16b
 	aese	v1.16b, v5.16b
 	aesmc	v1.16b, v1.16b
-4:	ld1	{v4.16b}, [x10], #16		/* load next round key */
+4:	ld1	{v4.4s}, [x10], #16		/* load next round key */
 	subs	w7, w7, #3
 	aese	v0.16b, v3.16b
 	aesmc	v0.16b, v0.16b
 	aese	v1.16b, v3.16b
 	aesmc	v1.16b, v1.16b
-	ld1	{v5.16b}, [x10], #16		/* load next round key */
+	ld1	{v5.4s}, [x10], #16		/* load next round key */
 	bpl	2b
 	aese	v0.16b, v4.16b
 	aese	v1.16b, v4.16b
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
index 50d9fe11d0c8..a0a0e5e3a8b5 100644
--- a/arch/arm64/crypto/aes-ce-cipher.c
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -1,7 +1,7 @@
 /*
  * aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions
  *
- * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -9,6 +9,7 @@
  */
 
 #include <asm/neon.h>
+#include <asm/unaligned.h>
 #include <crypto/aes.h>
 #include <linux/cpufeature.h>
 #include <linux/crypto.h>
@@ -47,24 +48,24 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 	kernel_neon_begin_partial(4);
 
 	__asm__("	ld1	{v0.16b}, %[in]			;"
-		"	ld1	{v1.16b}, [%[key]], #16		;"
+		"	ld1	{v1.4s}, [%[key]], #16		;"
 		"	cmp	%w[rounds], #10			;"
 		"	bmi	0f				;"
 		"	bne	3f				;"
 		"	mov	v3.16b, v1.16b			;"
 		"	b	2f				;"
 		"0:	mov	v2.16b, v1.16b			;"
-		"	ld1	{v3.16b}, [%[key]], #16		;"
+		"	ld1	{v3.4s}, [%[key]], #16		;"
 		"1:	aese	v0.16b, v2.16b			;"
 		"	aesmc	v0.16b, v0.16b			;"
-		"2:	ld1	{v1.16b}, [%[key]], #16		;"
+		"2:	ld1	{v1.4s}, [%[key]], #16		;"
 		"	aese	v0.16b, v3.16b			;"
 		"	aesmc	v0.16b, v0.16b			;"
-		"3:	ld1	{v2.16b}, [%[key]], #16		;"
+		"3:	ld1	{v2.4s}, [%[key]], #16		;"
 		"	subs	%w[rounds], %w[rounds], #3	;"
 		"	aese	v0.16b, v1.16b			;"
 		"	aesmc	v0.16b, v0.16b			;"
-		"	ld1	{v3.16b}, [%[key]], #16		;"
+		"	ld1	{v3.4s}, [%[key]], #16		;"
 		"	bpl	1b				;"
 		"	aese	v0.16b, v2.16b			;"
 		"	eor	v0.16b, v0.16b, v3.16b		;"
@@ -92,24 +93,24 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 	kernel_neon_begin_partial(4);
 
 	__asm__("	ld1	{v0.16b}, %[in]			;"
-		"	ld1	{v1.16b}, [%[key]], #16		;"
+		"	ld1	{v1.4s}, [%[key]], #16		;"
 		"	cmp	%w[rounds], #10			;"
 		"	bmi	0f				;"
 		"	bne	3f				;"
 		"	mov	v3.16b, v1.16b			;"
 		"	b	2f				;"
 		"0:	mov	v2.16b, v1.16b			;"
-		"	ld1	{v3.16b}, [%[key]], #16		;"
+		"	ld1	{v3.4s}, [%[key]], #16		;"
 		"1:	aesd	v0.16b, v2.16b			;"
 		"	aesimc	v0.16b, v0.16b			;"
-		"2:	ld1	{v1.16b}, [%[key]], #16		;"
+		"2:	ld1	{v1.4s}, [%[key]], #16		;"
 		"	aesd	v0.16b, v3.16b			;"
 		"	aesimc	v0.16b, v0.16b			;"
-		"3:	ld1	{v2.16b}, [%[key]], #16		;"
+		"3:	ld1	{v2.4s}, [%[key]], #16		;"
 		"	subs	%w[rounds], %w[rounds], #3	;"
 		"	aesd	v0.16b, v1.16b			;"
 		"	aesimc	v0.16b, v0.16b			;"
-		"	ld1	{v3.16b}, [%[key]], #16		;"
+		"	ld1	{v3.4s}, [%[key]], #16		;"
 		"	bpl	1b				;"
 		"	aesd	v0.16b, v2.16b			;"
 		"	eor	v0.16b, v0.16b, v3.16b		;"
@@ -165,20 +166,16 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
 	    key_len != AES_KEYSIZE_256)
 		return -EINVAL;
 
-	memcpy(ctx->key_enc, in_key, key_len);
 	ctx->key_length = key_len;
+	for (i = 0; i < kwords; i++)
+		ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32));
 
 	kernel_neon_begin_partial(2);
 	for (i = 0; i < sizeof(rcon); i++) {
 		u32 *rki = ctx->key_enc + (i * kwords);
 		u32 *rko = rki + kwords;
 
-#ifndef CONFIG_CPU_BIG_ENDIAN
 		rko[0] = ror32(aes_sub(rki[kwords - 1]), 8) ^ rcon[i] ^ rki[0];
-#else
-		rko[0] = rol32(aes_sub(rki[kwords - 1]), 8) ^ (rcon[i] << 24) ^
-			 rki[0];
-#endif
 		rko[1] = rko[0] ^ rki[1];
 		rko[2] = rko[1] ^ rki[2];
 		rko[3] = rko[2] ^ rki[3];
@@ -210,9 +207,9 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
 
 	key_dec[0] = key_enc[j];
 	for (i = 1, j--; j > 0; i++, j--)
-		__asm__("ld1	{v0.16b}, %[in]		;"
+		__asm__("ld1	{v0.4s}, %[in]		;"
 			"aesimc	v1.16b, v0.16b		;"
-			"st1	{v1.16b}, %[out]	;"
+			"st1	{v1.4s}, %[out]	;"
 
 		:	[out]	"=Q"(key_dec[i])
 		:	[in]	"Q"(key_enc[j])
diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S
index b46093d567e5..50330f5c3adc 100644
--- a/arch/arm64/crypto/aes-ce.S
+++ b/arch/arm64/crypto/aes-ce.S
@@ -2,7 +2,7 @@
  * linux/arch/arm64/crypto/aes-ce.S - AES cipher for ARMv8 with
  *                                    Crypto Extensions
  *
- * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -22,11 +22,11 @@
 	cmp		\rounds, #12
 	blo		2222f		/* 128 bits */
 	beq		1111f		/* 192 bits */
-	ld1		{v17.16b-v18.16b}, [\rk], #32
-1111:	ld1		{v19.16b-v20.16b}, [\rk], #32
-2222:	ld1		{v21.16b-v24.16b}, [\rk], #64
-	ld1		{v25.16b-v28.16b}, [\rk], #64
-	ld1		{v29.16b-v31.16b}, [\rk]
+	ld1		{v17.4s-v18.4s}, [\rk], #32
+1111:	ld1		{v19.4s-v20.4s}, [\rk], #32
+2222:	ld1		{v21.4s-v24.4s}, [\rk], #64
+	ld1		{v25.4s-v28.4s}, [\rk], #64
+	ld1		{v29.4s-v31.4s}, [\rk]
 	.endm
 
 	/* prepare for encryption with key in rk[] */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/12] crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig         |  3 ++-
 arch/arm64/crypto/aes-ce-cipher.c | 20 +++++++++++++++++---
 2 files changed, 19 insertions(+), 4 deletions(-)
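
The change follows the pattern used throughout the rest of the series: test
may_use_simd() up front, and only enter a kernel mode NEON section when it
returns true. Reduced to a sketch (the helper names are the ones this patch
introduces; the surrounding function bodies are elided):

	if (!may_use_simd()) {
		/* NEON not usable in this context: use the scalar AES core */
		__aes_arm64_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
		return;
	}

	kernel_neon_begin();
	/* ... ARMv8 Crypto Extensions implementation, unchanged ... */
	kernel_neon_end();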

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 8cd145f9c1ff..772801f263d9 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -50,8 +50,9 @@ config CRYPTO_AES_ARM64
 
 config CRYPTO_AES_ARM64_CE
 	tristate "AES core cipher using ARMv8 Crypto Extensions"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_ALGAPI
+	select CRYPTO_AES_ARM64
 
 config CRYPTO_AES_ARM64_CE_CCM
 	tristate "AES in CCM mode using ARMv8 Crypto Extensions"
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
index a0a0e5e3a8b5..6a75cd75ed11 100644
--- a/arch/arm64/crypto/aes-ce-cipher.c
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -9,6 +9,7 @@
  */
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <asm/unaligned.h>
 #include <crypto/aes.h>
 #include <linux/cpufeature.h>
@@ -21,6 +22,9 @@ MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
+asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
+asmlinkage void __aes_arm64_decrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
+
 struct aes_block {
 	u8 b[AES_BLOCK_SIZE];
 };
@@ -45,7 +49,12 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 	void *dummy0;
 	int dummy1;
 
-	kernel_neon_begin_partial(4);
+	if (!may_use_simd()) {
+		__aes_arm64_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
+		return;
+	}
+
+	kernel_neon_begin();
 
 	__asm__("	ld1	{v0.16b}, %[in]			;"
 		"	ld1	{v1.4s}, [%[key]], #16		;"
@@ -90,7 +99,12 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 	void *dummy0;
 	int dummy1;
 
-	kernel_neon_begin_partial(4);
+	if (!may_use_simd()) {
+		__aes_arm64_decrypt(ctx->key_dec, dst, src, num_rounds(ctx));
+		return;
+	}
+
+	kernel_neon_begin();
 
 	__asm__("	ld1	{v0.16b}, %[in]			;"
 		"	ld1	{v1.4s}, [%[key]], #16		;"
@@ -170,7 +184,7 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
 	for (i = 0; i < kwords; i++)
 		ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32));
 
-	kernel_neon_begin_partial(2);
+	kernel_neon_begin();
 	for (i = 0; i < sizeof(rcon); i++) {
 		u32 *rki = ctx->key_enc + (i * kwords);
 		u32 *rko = rki + kwords;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/12] crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

The arm64 kernel will shortly disallow nested kernel mode NEON.

So honour this in the ARMv8 Crypto Extensions implementation of CCM-AES,
and fall back to a dynamically instantiated ccm(aes) implementation if
necessary (which will in all likelihood be produced by the generic CCM,
CTR and AES drivers). Because this may break the boot-time algo tests,
this driver can now only be built as a module.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig           |   3 +-
 arch/arm64/crypto/aes-ce-ccm-glue.c | 152 +++++++++++++++-----
 2 files changed, 116 insertions(+), 39 deletions(-)
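
When NEON is off limits, the request is handed wholesale to the generic
ccm(aes) transform that ccm_init() allocates into the context. Condensed
from the encrypt path below (the decrypt path mirrors it with
crypto_aead_decrypt()):

	if (!may_use_simd()) {
		struct aead_request *fallback_req;

		fallback_req = aead_request_alloc(ctx->fallback, GFP_ATOMIC);
		if (!fallback_req)
			return -ENOMEM;

		/* same src/dst/assoclen/iv, processed by the generic ccm(aes) */
		aead_request_set_ad(fallback_req, req->assoclen);
		aead_request_set_crypt(fallback_req, req->src, req->dst,
				       req->cryptlen, req->iv);

		err = crypto_aead_encrypt(fallback_req);
		aead_request_free(fallback_req);
		return err;
	}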

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 772801f263d9..c3b74db72cc8 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -56,10 +56,11 @@ config CRYPTO_AES_ARM64_CE
 
 config CRYPTO_AES_ARM64_CE_CCM
 	tristate "AES in CCM mode using ARMv8 Crypto Extensions"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON && m
 	select CRYPTO_ALGAPI
 	select CRYPTO_AES_ARM64_CE
 	select CRYPTO_AEAD
+	select CRYPTO_CCM
 
 config CRYPTO_AES_ARM64_CE_BLK
 	tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions"
diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 6a7dbc7c83a6..c5ae50141988 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -1,7 +1,7 @@
 /*
  * aes-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
  *
- * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -9,6 +9,7 @@
  */
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <asm/unaligned.h>
 #include <crypto/aes.h>
 #include <crypto/scatterwalk.h>
@@ -18,6 +19,11 @@
 
 #include "aes-ce-setkey.h"
 
+struct crypto_aes_ccm_ctx {
+	struct crypto_aes_ctx	key;
+	struct crypto_aead	*fallback;
+};
+
 static int num_rounds(struct crypto_aes_ctx *ctx)
 {
 	/*
@@ -47,22 +53,33 @@ asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
 static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
 		      unsigned int key_len)
 {
-	struct crypto_aes_ctx *ctx = crypto_aead_ctx(tfm);
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(tfm);
 	int ret;
 
-	ret = ce_aes_expandkey(ctx, in_key, key_len);
-	if (!ret)
-		return 0;
+	ret = ce_aes_expandkey(&ctx->key, in_key, key_len);
+	if (ret) {
+		tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return ret;
+	}
+
+	ret = crypto_aead_setkey(ctx->fallback, in_key, key_len);
+	if (ret) {
+		if (ctx->fallback->base.crt_flags & CRYPTO_TFM_RES_BAD_KEY_LEN)
+			tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return ret;
+	}
 
-	tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
-	return -EINVAL;
+	return 0;
 }
 
 static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
 {
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(tfm);
+
 	if ((authsize & 1) || authsize < 4)
 		return -EINVAL;
-	return 0;
+
+	return crypto_aead_setauthsize(ctx->fallback, authsize);
 }
 
 static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
@@ -106,7 +123,7 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
 static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
-	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(aead);
 	struct __packed { __be16 l; __be32 h; u16 len; } ltag;
 	struct scatter_walk walk;
 	u32 len = req->assoclen;
@@ -122,8 +139,8 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
 		ltag.len = 6;
 	}
 
-	ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp, ctx->key_enc,
-			     num_rounds(ctx));
+	ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp,
+			     ctx->key.key_enc, num_rounds(&ctx->key));
 	scatterwalk_start(&walk, req->src);
 
 	do {
@@ -135,8 +152,8 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
 			n = scatterwalk_clamp(&walk, len);
 		}
 		p = scatterwalk_map(&walk);
-		ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key_enc,
-				     num_rounds(ctx));
+		ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key.key_enc,
+				     num_rounds(&ctx->key));
 		len -= n;
 
 		scatterwalk_unmap(p);
@@ -148,18 +165,34 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
 static int ccm_encrypt(struct aead_request *req)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
-	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(aead);
 	struct skcipher_walk walk;
 	u8 __aligned(8) mac[AES_BLOCK_SIZE];
 	u8 buf[AES_BLOCK_SIZE];
 	u32 len = req->cryptlen;
 	int err;
 
+	if (!may_use_simd()) {
+		struct aead_request *fallback_req;
+
+		fallback_req = aead_request_alloc(ctx->fallback, GFP_ATOMIC);
+		if (!fallback_req)
+			return -ENOMEM;
+
+		aead_request_set_ad(fallback_req, req->assoclen);
+		aead_request_set_crypt(fallback_req, req->src, req->dst,
+				       req->cryptlen, req->iv);
+
+		err = crypto_aead_encrypt(fallback_req);
+		aead_request_free(fallback_req);
+		return err;
+	}
+
 	err = ccm_init_mac(req, mac, len);
 	if (err)
 		return err;
 
-	kernel_neon_begin_partial(6);
+	kernel_neon_begin();
 
 	if (req->assoclen)
 		ccm_calculate_auth_mac(req, mac);
@@ -176,13 +209,14 @@ static int ccm_encrypt(struct aead_request *req)
 			tail = 0;
 
 		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				   walk.nbytes - tail, ctx->key_enc,
-				   num_rounds(ctx), mac, walk.iv);
+				   walk.nbytes - tail, ctx->key.key_enc,
+				   num_rounds(&ctx->key), mac, walk.iv);
 
 		err = skcipher_walk_done(&walk, tail);
 	}
 	if (!err)
-		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+		ce_aes_ccm_final(mac, buf, ctx->key.key_enc,
+				 num_rounds(&ctx->key));
 
 	kernel_neon_end();
 
@@ -199,7 +233,7 @@ static int ccm_encrypt(struct aead_request *req)
 static int ccm_decrypt(struct aead_request *req)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
-	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(aead);
 	unsigned int authsize = crypto_aead_authsize(aead);
 	struct skcipher_walk walk;
 	u8 __aligned(8) mac[AES_BLOCK_SIZE];
@@ -207,11 +241,27 @@ static int ccm_decrypt(struct aead_request *req)
 	u32 len = req->cryptlen - authsize;
 	int err;
 
+	if (!may_use_simd()) {
+		struct aead_request *fallback_req;
+
+		fallback_req = aead_request_alloc(ctx->fallback, GFP_ATOMIC);
+		if (!fallback_req)
+			return -ENOMEM;
+
+		aead_request_set_ad(fallback_req, req->assoclen);
+		aead_request_set_crypt(fallback_req, req->src, req->dst,
+				       req->cryptlen, req->iv);
+
+		err = crypto_aead_decrypt(fallback_req);
+		aead_request_free(fallback_req);
+		return err;
+	}
+
 	err = ccm_init_mac(req, mac, len);
 	if (err)
 		return err;
 
-	kernel_neon_begin_partial(6);
+	kernel_neon_begin();
 
 	if (req->assoclen)
 		ccm_calculate_auth_mac(req, mac);
@@ -228,13 +278,14 @@ static int ccm_decrypt(struct aead_request *req)
 			tail = 0;
 
 		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				   walk.nbytes - tail, ctx->key_enc,
-				   num_rounds(ctx), mac, walk.iv);
+				   walk.nbytes - tail, ctx->key.key_enc,
+				   num_rounds(&ctx->key), mac, walk.iv);
 
 		err = skcipher_walk_done(&walk, tail);
 	}
 	if (!err)
-		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+		ce_aes_ccm_final(mac, buf, ctx->key.key_enc,
+				 num_rounds(&ctx->key));
 
 	kernel_neon_end();
 
@@ -251,28 +302,53 @@ static int ccm_decrypt(struct aead_request *req)
 	return 0;
 }
 
+static int ccm_init(struct crypto_aead *aead)
+{
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(aead);
+	struct crypto_aead *tfm;
+
+	tfm = crypto_alloc_aead("ccm(aes)", 0,
+				CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
+
+	if (IS_ERR(tfm))
+		return PTR_ERR(tfm);
+
+	ctx->fallback = tfm;
+	return 0;
+}
+
+static void ccm_exit(struct crypto_aead *aead)
+{
+	struct crypto_aes_ccm_ctx *ctx = crypto_aead_ctx(aead);
+
+	crypto_free_aead(ctx->fallback);
+}
+
 static struct aead_alg ccm_aes_alg = {
-	.base = {
-		.cra_name		= "ccm(aes)",
-		.cra_driver_name	= "ccm-aes-ce",
-		.cra_priority		= 300,
-		.cra_blocksize		= 1,
-		.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
-		.cra_module		= THIS_MODULE,
-	},
-	.ivsize		= AES_BLOCK_SIZE,
-	.chunksize	= AES_BLOCK_SIZE,
-	.maxauthsize	= AES_BLOCK_SIZE,
-	.setkey		= ccm_setkey,
-	.setauthsize	= ccm_setauthsize,
-	.encrypt	= ccm_encrypt,
-	.decrypt	= ccm_decrypt,
+	.base.cra_name		= "ccm(aes)",
+	.base.cra_driver_name	= "ccm-aes-ce",
+	.base.cra_priority	= 300,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct crypto_aes_ccm_ctx),
+	.base.cra_module	= THIS_MODULE,
+	.base.cra_flags		= CRYPTO_ALG_NEED_FALLBACK,
+
+	.ivsize			= AES_BLOCK_SIZE,
+	.chunksize		= AES_BLOCK_SIZE,
+	.maxauthsize		= AES_BLOCK_SIZE,
+	.setkey			= ccm_setkey,
+	.setauthsize		= ccm_setauthsize,
+	.encrypt		= ccm_encrypt,
+	.decrypt		= ccm_decrypt,
+	.init			= ccm_init,
+	.exit			= ccm_exit,
 };
 
 static int __init aes_mod_init(void)
 {
 	if (!(elf_hwcap & HWCAP_AES))
 		return -ENODEV;
+
 	return crypto_register_aead(&ccm_aes_alg);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/12] crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

To accommodate systems that may disallow use of the NEON in kernel mode
in some circumstances, introduce a C fallback for synchronous AES in CTR
mode, and use it if may_use_simd() returns false.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig            |  7 ++-
 arch/arm64/crypto/aes-ctr-fallback.h | 55 ++++++++++++++++++++
 arch/arm64/crypto/aes-glue.c         | 17 +++++-
 3 files changed, 75 insertions(+), 4 deletions(-)
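
The fallback drives the scalar cipher directly, so it passes the round
count explicitly; the expression ctx->key_length / 4 + 6 used in the new
header gives the usual AES round counts. Worked out for reference (the
arithmetic comment below is illustrative, not part of the patch):

	/*
	 * rounds = key_length / 4 + 6
	 *   AES-128: 16 / 4 + 6 = 10
	 *   AES-192: 24 / 4 + 6 = 12
	 *   AES-256: 32 / 4 + 6 = 14
	 */
	__aes_arm64_encrypt(ctx->key_enc, buf, walk.iv, ctx->key_length / 4 + 6);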

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index c3b74db72cc8..6bd1921d8ca2 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -64,17 +64,20 @@ config CRYPTO_AES_ARM64_CE_CCM
 
 config CRYPTO_AES_ARM64_CE_BLK
 	tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_BLKCIPHER
 	select CRYPTO_AES_ARM64_CE
+	select CRYPTO_AES
 	select CRYPTO_SIMD
+	select CRYPTO_AES_ARM64
 
 config CRYPTO_AES_ARM64_NEON_BLK
 	tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions"
-	depends on ARM64 && KERNEL_MODE_NEON
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_BLKCIPHER
 	select CRYPTO_AES
 	select CRYPTO_SIMD
+	select CRYPTO_AES_ARM64
 
 config CRYPTO_CHACHA20_NEON
 	tristate "NEON accelerated ChaCha20 symmetric cipher"
diff --git a/arch/arm64/crypto/aes-ctr-fallback.h b/arch/arm64/crypto/aes-ctr-fallback.h
new file mode 100644
index 000000000000..4a6bfac6ecb5
--- /dev/null
+++ b/arch/arm64/crypto/aes-ctr-fallback.h
@@ -0,0 +1,55 @@
+/*
+ * Fallback for sync aes(ctr) in contexts where kernel mode NEON
+ * is not allowed
+ *
+ * Copyright (C) 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/aes.h>
+#include <crypto/internal/skcipher.h>
+
+asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
+
+static inline int aes_ctr_encrypt_fallback(struct crypto_aes_ctx *ctx,
+					   struct skcipher_request *req)
+{
+	struct skcipher_walk walk;
+	u8 buf[AES_BLOCK_SIZE];
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	while (walk.nbytes > 0) {
+		u8 *dst = walk.dst.virt.addr;
+		u8 *src = walk.src.virt.addr;
+		int nbytes = walk.nbytes;
+		int tail = 0;
+
+		if (nbytes < walk.total) {
+			nbytes = round_down(nbytes, AES_BLOCK_SIZE);
+			tail = walk.nbytes % AES_BLOCK_SIZE;
+		}
+
+		do {
+			int bsize = min(nbytes, AES_BLOCK_SIZE);
+
+			__aes_arm64_encrypt(ctx->key_enc, buf, walk.iv,
+					    ctx->key_length / 4 + 6);
+			if (dst != src)
+				memcpy(dst, src, bsize);
+			crypto_xor(dst, buf, bsize);
+			crypto_inc(walk.iv, AES_BLOCK_SIZE);
+
+			dst += AES_BLOCK_SIZE;
+			src += AES_BLOCK_SIZE;
+			nbytes -= AES_BLOCK_SIZE;
+		} while (nbytes > 0);
+
+		err = skcipher_walk_done(&walk, tail);
+	}
+	return err;
+}
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index bcf596b0197e..6806ad7d8dd4 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -10,6 +10,7 @@
 
 #include <asm/neon.h>
 #include <asm/hwcap.h>
+#include <asm/simd.h>
 #include <crypto/aes.h>
 #include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
@@ -19,6 +20,7 @@
 #include <crypto/xts.h>
 
 #include "aes-ce-setkey.h"
+#include "aes-ctr-fallback.h"
 
 #ifdef USE_V8_CRYPTO_EXTENSIONS
 #define MODE			"ce"
@@ -251,6 +253,17 @@ static int ctr_encrypt(struct skcipher_request *req)
 	return err;
 }
 
+static int ctr_encrypt_sync(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	if (!may_use_simd())
+		return aes_ctr_encrypt_fallback(ctx, req);
+
+	return ctr_encrypt(req);
+}
+
 static int xts_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
@@ -357,8 +370,8 @@ static struct skcipher_alg aes_algs[] = { {
 	.ivsize		= AES_BLOCK_SIZE,
 	.chunksize	= AES_BLOCK_SIZE,
 	.setkey		= skcipher_aes_setkey,
-	.encrypt	= ctr_encrypt,
-	.decrypt	= ctr_encrypt,
+	.encrypt	= ctr_encrypt_sync,
+	.decrypt	= ctr_encrypt_sync,
 }, {
 	.base = {
 		.cra_name		= "__xts(aes)",
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/12] crypto: arm64/chacha20 - take may_use_simd() into account
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

To accommodate systems that disallow the use of kernel mode NEON in
some circumstances, take the return value of may_use_simd() into
account when deciding whether to invoke the C fallback routine.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/chacha20-neon-glue.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
index a7cd575ea223..cbdb75d15cd0 100644
--- a/arch/arm64/crypto/chacha20-neon-glue.c
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -1,7 +1,7 @@
 /*
  * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
  *
- * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -26,6 +26,7 @@
 
 #include <asm/hwcap.h>
 #include <asm/neon.h>
+#include <asm/simd.h>
 
 asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
 asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
@@ -64,7 +65,7 @@ static int chacha20_neon(struct skcipher_request *req)
 	u32 state[16];
 	int err;
 
-	if (req->cryptlen <= CHACHA20_BLOCK_SIZE)
+	if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
 		return crypto_chacha20_crypt(req);
 
 	err = skcipher_walk_virt(&walk, req, true);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/12] crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-10 16:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-10 16:22 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, dave.martin
  Cc: Ard Biesheuvel

Of the various chaining modes implemented by the bit-sliced AES driver,
only CTR is exposed as a synchronous cipher, and requires a fallback in
order to remain usable once we update the kernel mode NEON handling logic
to disallow nested use. So wire up the existing CTR fallback C code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-neonbs-glue.c | 48 ++++++++++++++++++--
 1 file changed, 43 insertions(+), 5 deletions(-)
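
The synchronous CTR variant now carries two key schedules: the bit-sliced
one used on the NEON path and a generic struct crypto_aes_ctx used by the
scalar fallback. Condensed from the setkey routine below (a sketch, not the
literal hunks):

	struct aesbs_ctr_ctx {
		struct aesbs_ctx	key;		/* must be first member */
		struct crypto_aes_ctx	fallback;	/* generic schedule for the scalar path */
	};

	/* expand once with the generic code, then convert for the NEON path */
	err = crypto_aes_expand_key(&ctx->fallback, in_key, key_len);
	if (err)
		return err;

	ctx->key.rounds = 6 + key_len / 4;

	kernel_neon_begin();
	aesbs_convert_key(ctx->key.rk, ctx->fallback.key_enc, ctx->key.rounds);
	kernel_neon_end();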

diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index db2501d93550..5fe442c26ff1 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -1,7 +1,7 @@
 /*
  * Bit sliced AES using NEON instructions
  *
- * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -9,12 +9,15 @@
  */
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 #include <crypto/aes.h>
 #include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/xts.h>
 #include <linux/module.h>
 
+#include "aes-ctr-fallback.h"
+
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
@@ -58,6 +61,11 @@ struct aesbs_cbc_ctx {
 	u32			enc[AES_MAX_KEYLENGTH_U32];
 };
 
+struct aesbs_ctr_ctx {
+	struct aesbs_ctx	key;		/* must be first member */
+	struct crypto_aes_ctx	fallback;
+};
+
 struct aesbs_xts_ctx {
 	struct aesbs_ctx	key;
 	u32			twkey[AES_MAX_KEYLENGTH_U32];
@@ -196,6 +204,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	return err;
 }
 
+static int aesbs_ctr_setkey_sync(struct crypto_skcipher *tfm, const u8 *in_key,
+				 unsigned int key_len)
+{
+	struct aesbs_ctr_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = crypto_aes_expand_key(&ctx->fallback, in_key, key_len);
+	if (err)
+		return err;
+
+	ctx->key.rounds = 6 + key_len / 4;
+
+	kernel_neon_begin();
+	aesbs_convert_key(ctx->key.rk, ctx->fallback.key_enc, ctx->key.rounds);
+	kernel_neon_end();
+
+	return 0;
+}
+
 static int ctr_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
@@ -260,6 +287,17 @@ static int aesbs_xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
 	return aesbs_setkey(tfm, in_key, key_len);
 }
 
+static int ctr_encrypt_sync(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aesbs_ctr_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	if (!may_use_simd())
+		return aes_ctr_encrypt_fallback(&ctx->fallback, req);
+
+	return ctr_encrypt(req);
+}
+
 static int __xts_crypt(struct skcipher_request *req,
 		       void (*fn)(u8 out[], u8 const in[], u8 const rk[],
 				  int rounds, int blocks, u8 iv[]))
@@ -356,7 +394,7 @@ static struct skcipher_alg aes_algs[] = { {
 	.base.cra_driver_name	= "ctr-aes-neonbs",
 	.base.cra_priority	= 250 - 1,
 	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct aesbs_ctx),
+	.base.cra_ctxsize	= sizeof(struct aesbs_ctr_ctx),
 	.base.cra_module	= THIS_MODULE,
 
 	.min_keysize		= AES_MIN_KEY_SIZE,
@@ -364,9 +402,9 @@ static struct skcipher_alg aes_algs[] = { {
 	.chunksize		= AES_BLOCK_SIZE,
 	.walksize		= 8 * AES_BLOCK_SIZE,
 	.ivsize			= AES_BLOCK_SIZE,
-	.setkey			= aesbs_setkey,
-	.encrypt		= ctr_encrypt,
-	.decrypt		= ctr_encrypt,
+	.setkey			= aesbs_ctr_setkey_sync,
+	.encrypt		= ctr_encrypt_sync,
+	.decrypt		= ctr_encrypt_sync,
 }, {
 	.base.cra_name		= "__xts(aes)",
 	.base.cra_driver_name	= "__xts-aes-neonbs",
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] arm64: crypto: prepare for new kernel mode NEON policy
  2017-06-10 16:22 ` Ard Biesheuvel
@ 2017-06-12 14:31   ` Ard Biesheuvel
  -1 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-06-12 14:31 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Dave Martin
  Cc: Ard Biesheuvel

On 10 June 2017 at 18:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> TL;DR: preparatory work for expected changes in arm64's handling of kernel
>        mode SIMD
>
> @Herbert: The arm64 maintainers may want to take this through the arm64 tree,
>           and if not, we need their acks on patch #1. Thanks.
>

Please disregard this series for merging for now. I am looking into whether
it is possible to use time-invariant fallbacks for the AES routines instead
(and I missed a couple of AES MAC routines in the conversion).

> Currently, arm64 allows kernel mode NEON (KMN) in process, softirq or hardirq
> context. In the process case, we preserve/restore the NEON context lazily,
> but in the softirq/hardirq cases, we eagerly stash a slice of the NEON
> register file, and immediately restore it when kernel_neon_end() is called.
>
> Given the above, arm64 actually does not use the generic may_use_simd() API
> at all*, which was added to allow async wrappers of synchronous SIMD routines
> to be implemented in a generic manner. (On x86, kernel mode SIMD may be used
> in process context or while serving an interrupt taken from user space. On ARM,
> SIMD may only be used in process context)
>
> When adding support for the SVE architecture extension, which shares part of
> the NEON register file with the SIMD and crypto extensions, the eager preserve/
> restore in interrupt context is becoming a problem: it should either preserve
> and restore the entire SVE state (which may be up to 8 KB in size), or it
> should not be allowed to interrupt the lazy preserve, which does need to deal
> with the large SVE state anyway. Otherwise, such an interruption would corrupt
> the NEON state the lazy preserve sees after the interruption.
>
> Given how
> a) KMN is never actually used in hardirq context,
> b) KMN is only used in softirq context by mac80211 code running on behalf of
>    WiFi devices that don't perform the crypto in hardware,
> c) KMN in softirq context is statistically unlikely to interrupt the kernel
>    while it is doing kernel mode NEON in process context,
>
> the unconditional eager preserve/restore typically executes when no KMN in
> process context is in progress, and we can simplify things substantially by
> disallowing nested KMN, i.e., disallow KMN in hardirq context, and allow KMN
> in softirq only if no KMN in process context is already in progress.
>
> The no-nesting rule leaves only the outer SVE-aware lazy preserve/restore,
> which needs to execute with bottom halves disabled, but other than that, no
> intrusive changes should be needed to deal with the SVE payloads.
>
> Given that the no-nesting rule means SIMD is no longer guaranteed to be
> usable in every context, the KMN users need to be made aware of this. This
> series updates the current KMN users in the arm64 tree to take
> may_use_simd() into account. Since at this time, SIMD is still allowed in
> any context, an implementation of may_use_simd() is added that simply
> returns true (#1). It will be updated in the future when the no-nesting
> modifications are made.
>
> * may_use_simd() is only used as a hint in the SHA256 NEON code, since on some
>   microarchitectures, it is only marginally faster, and the eager preserve and
>   restore could actually make it slower.
>
> Ard Biesheuvel (12):
>   arm64: neon: replace generic definition of may_use_simd()
>   crypto: arm64/ghash-ce - add non-SIMD scalar fallback
>   crypto: arm64/crct10dif - add non-SIMD generic fallback
>   crypto: arm64/crc32 - add non-SIMD scalar fallback
>   crypto: arm64/sha1-ce - add non-SIMD generic fallback
>   crypto: arm64/sha2-ce - add non-SIMD scalar fallback
>   crypto: arm64/aes-ce-cipher - match round key endianness with generic
>     code
>   crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback
>   crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback
>   crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR
>   crypto: arm64/chacha20 - take may_use_simd() into account
>   crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR
>
>  arch/arm64/crypto/Kconfig              |  22 ++-
>  arch/arm64/crypto/aes-ce-ccm-core.S    |  30 ++--
>  arch/arm64/crypto/aes-ce-ccm-glue.c    | 152 +++++++++++++++-----
>  arch/arm64/crypto/aes-ce-cipher.c      |  55 ++++---
>  arch/arm64/crypto/aes-ce.S             |  12 +-
>  arch/arm64/crypto/aes-ctr-fallback.h   |  55 +++++++
>  arch/arm64/crypto/aes-glue.c           |  17 ++-
>  arch/arm64/crypto/aes-neonbs-glue.c    |  48 ++++++-
>  arch/arm64/crypto/chacha20-neon-glue.c |   5 +-
>  arch/arm64/crypto/crc32-ce-glue.c      |  11 +-
>  arch/arm64/crypto/crct10dif-ce-glue.c  |  13 +-
>  arch/arm64/crypto/ghash-ce-glue.c      |  49 +++++--
>  arch/arm64/crypto/sha1-ce-glue.c       |  18 ++-
>  arch/arm64/crypto/sha2-ce-glue.c       |  30 +++-
>  arch/arm64/crypto/sha256-glue.c        |   1 +
>  arch/arm64/include/asm/Kbuild          |   1 -
>  arch/arm64/include/asm/simd.h          |  24 ++++
>  17 files changed, 420 insertions(+), 123 deletions(-)
>  create mode 100644 arch/arm64/crypto/aes-ctr-fallback.h
>  create mode 100644 arch/arm64/include/asm/simd.h
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 28+ messages in thread
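
For reference, the arm64 may_use_simd() that patch #1 introduces is, per the
cover letter quoted above, no more than a stub for now. A rough sketch of its
shape follows; the actual arch/arm64/include/asm/simd.h may differ in guards
and annotations.

/* rough sketch only, not the in-tree header */
#ifndef __ASM_SIMD_H
#define __ASM_SIMD_H

#include <linux/types.h>

/*
 * Kernel mode NEON is currently allowed in any context, so callers may
 * always use SIMD. Once the no-nesting policy lands, this is expected to
 * return false in hardirq context, and in softirq context whenever it would
 * nest inside an in-progress kernel_neon_begin()/kernel_neon_end() section.
 */
static __must_check inline bool may_use_simd(void)
{
	return true;
}

#endif	/* __ASM_SIMD_H */

Callers written against this interface stay correct both before and after
the policy change: today the check always passes, and later it starts
routing hardirq/softirq callers to their scalar fallbacks.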

end of thread, other threads:[~2017-06-12 14:31 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-10 16:22 [PATCH 00/12] arm64: crypto: prepare for new kernel mode NEON policy Ard Biesheuvel
2017-06-10 16:22 ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 01/12] arm64: neon: replace generic definition of may_use_simd() Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 02/12] crypto: arm64/ghash-ce - add non-SIMD scalar fallback Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 03/12] crypto: arm64/crct10dif - add non-SIMD generic fallback Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 04/12] crypto: arm64/crc32 - add non-SIMD scalar fallback Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 05/12] crypto: arm64/sha1-ce - add non-SIMD generic fallback Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 06/12] crypto: arm64/sha2-ce - add non-SIMD scalar fallback Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 07/12] crypto: arm64/aes-ce-cipher - match round key endianness with generic code Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 08/12] crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 09/12] crypto: arm64/aes-ce-ccm: " Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 10/12] crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 11/12] crypto: arm64/chacha20 - take may_use_simd() into account Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-10 16:22 ` [PATCH 12/12] crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR Ard Biesheuvel
2017-06-10 16:22   ` Ard Biesheuvel
2017-06-12 14:31 ` [PATCH 00/12] arm64: crypto: prepare for new kernel mode NEON policy Ard Biesheuvel
2017-06-12 14:31   ` Ard Biesheuvel
