* [PATCH v3 00/16] crypto: SHA glue code consolidation
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Hello all,

This is v3 of what is now a complete glue code consolidation series
for generic, x86, arm and arm64 implementations of SHA-1, SHA-224/256
and SHA-384/512.

The purpose is to have a single, canonical implementation of the core
logic that gets reused by all versions of the algorithm. Note that this
is not about saving space in the binary, but about ensuring that the same
code is used everywhere, reducing the maintenance burden.

The base layer implements all the update and finalization logic around
the block transforms, where the prototypes of the latter look something
like this:

typedef void (shaXXX_block_fn)(int blocks, u8 const *src, uXX *state,
                             const u8 *head, void *p);

The block implementation should process the head block first, then
process the requested number of blocks starting at 'src'. The generic
pointer 'p' is passed down from the do_update/do_finalize() helpers;
this is used, for instance, by the ARM64 implementations to indicate to
the core ASM implementation that it should finalize the digest, which
it will do only if the input was an exact multiple of the block size.
The generic pointer is used here as a means of conveying that information
back and forth.
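
As an illustration, a block function honouring this contract could look
like the sketch below, where sha1_xform() is a made-up stand-in for an
implementation's single-block transform:

static void sha1_example_block_fn(int blocks, u8 const *src, u32 *state,
                                  const u8 *head, void *p)
{
        /* process the leftover partial block first, if there is one */
        if (head)
                sha1_xform(state, head);

        /* then process the requested number of full blocks at 'src' */
        while (blocks--) {
                sha1_xform(state, src);
                src += SHA1_BLOCK_SIZE;
        }
}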

Note that the base functions' prototypes all return int, but in practice
they all return 0. They should be invoked as tail calls where possible
to eliminate some of the function call overhead. If that is not possible,
the return values can be safely ignored.
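
For instance, a glue layer's .update hook can forward to the base layer
with a tail call, in the same way the patches below do (using the
hypothetical block function sketched above):

static int sha1_example_update(struct shash_desc *desc, const u8 *data,
                               unsigned int len)
{
        return sha1_base_do_update(desc, data, len, sha1_example_block_fn,
                                   NULL);
}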

Changes since v2:
- Replace the base modules with header files containing static inlines that
  implement the core logic. This avoids introducing new modules or new
  inter-module dependencies, and gives the compiler the opportunity for
  optimization.
- Now includes new glue for the existing SHA-1 NEON module and for Sami's
  new SHA-224/256 ASM+NEON module
- Use direct assignments instead of memcpy() to set the initial state (as is
  done in many of the call sites of the various init functions that are being
  converted by this series)

Changes since v1 (RFC):
- prefixed globally visible generic symbols with crypto_
- added SHA-1 base layer
- updated init code to only set the initial constants and clear the
  count; clearing the buffer is unnecessary [Markus]
- favor the small update path in crypto_sha_XXX_base_do_update() [Markus]
- update crypto_sha_XXX_do_finalize() to use memset() on the buffer directly
  rather than copying a statically allocated padding buffer into it
  [Markus]
- moved a bunch of existing arm and x86 implementations to use the new base
  layers

Note: looking at the generated asm (for arm64), I noticed that the
memcpy/memset invocations with compile-time-constant src and len arguments
(which includes the empty struct assignments) are eliminated completely and
replaced by direct loads and stores. Hopefully this addresses the concern
raised by Markus regarding this.
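
For instance, the context wipe at the end of the _finish() helpers,

        *sctx = (struct sha1_state){};

is compiled into a short sequence of direct stores instead of a memset()
call.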

Ard Biesheuvel (16):
  crypto: sha1: implement base layer for SHA-1
  crypto: sha256: implement base layer for SHA-256
  crypto: sha512: implement base layer for SHA-512
  crypto: sha1-generic: move to generic glue implementation
  crypto: sha256-generic: move to generic glue implementation
  crypto: sha512-generic: move to generic glue implementation
  crypto/arm: move SHA-1 ARM asm implementation to base layer
  crypto/arm: move SHA-1 NEON implementation to base layer
  crypto/arm: move SHA-1 ARMv8 implementation to base layer
  crypto/arm: move SHA-224/256 ASM/NEON implementation to base layer
  crypto/arm: move SHA-224/256 ARMv8 implementation to base layer
  crypto/arm64: move SHA-1 ARMv8 implementation to base layer
  crypto/arm64: move SHA-224/256 ARMv8 implementation to base layer
  crypto/x86: move SHA-1 SSSE3 implementation to base layer
  crypto/x86: move SHA-224/256 SSSE3 implementation to base layer
  crypto/x86: move SHA-384/512 SSSE3 implementation to base layer

 arch/arm/crypto/Kconfig                  |   3 +-
 arch/arm/crypto/sha1-ce-glue.c           | 111 +++++-----------
 arch/arm/{include/asm => }/crypto/sha1.h |   3 +
 arch/arm/crypto/sha1_glue.c              | 116 ++++-------------
 arch/arm/crypto/sha1_neon_glue.c         | 139 +++++---------------
 arch/arm/crypto/sha2-ce-glue.c           | 154 ++++++-----------------
 arch/arm/crypto/sha256_glue.c            | 174 +++++--------------------
 arch/arm/crypto/sha256_glue.h            |  17 +--
 arch/arm/crypto/sha256_neon_glue.c       | 144 +++++++--------------
 arch/arm64/crypto/sha1-ce-core.S         |  11 +-
 arch/arm64/crypto/sha1-ce-glue.c         | 133 ++++----------------
 arch/arm64/crypto/sha2-ce-core.S         |  11 +-
 arch/arm64/crypto/sha2-ce-glue.c         | 209 +++++--------------------------
 arch/x86/crypto/sha1_ssse3_glue.c        | 136 +++++---------------
 arch/x86/crypto/sha256_ssse3_glue.c      | 184 ++++++---------------------
 arch/x86/crypto/sha512_ssse3_glue.c      | 193 ++++++----------------------
 crypto/sha1_generic.c                    | 108 +++++-----------
 crypto/sha256_generic.c                  | 140 +++++----------------
 crypto/sha512_generic.c                  | 127 ++++---------------
 include/crypto/sha.h                     |   9 ++
 include/crypto/sha1_base.h               | 123 ++++++++++++++++++
 include/crypto/sha256_base.h             | 144 +++++++++++++++++++++
 include/crypto/sha512_base.h             | 147 ++++++++++++++++++++++
 23 files changed, 880 insertions(+), 1656 deletions(-)
 rename arch/arm/{include/asm => }/crypto/sha1.h (67%)
 create mode 100644 include/crypto/sha1_base.h
 create mode 100644 include/crypto/sha256_base.h
 create mode 100644 include/crypto/sha512_base.h

-- 
1.8.3.2

* [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-1
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 include/crypto/sha1_base.h | 123 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)
 create mode 100644 include/crypto/sha1_base.h

diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h
new file mode 100644
index 000000000000..919db0920203
--- /dev/null
+++ b/include/crypto/sha1_base.h
@@ -0,0 +1,123 @@
+/*
+ * sha1_base.h - core logic for SHA-1 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
+			     const u8 *head, void *p);
+
+static inline int sha1_base_init(struct shash_desc *desc)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA1_H0;
+	sctx->state[1] = SHA1_H1;
+	sctx->state[2] = SHA1_H2;
+	sctx->state[3] = SHA1_H3;
+	sctx->state[4] = SHA1_H4;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static inline int sha1_base_export(struct shash_desc *desc, void *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct sha1_state *dst = out;
+
+	*dst = *sctx;
+
+	return 0;
+}
+
+static inline int sha1_base_import(struct shash_desc *desc, const void *in)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct sha1_state const *src = in;
+
+	*sctx = *src;
+
+	return 0;
+}
+
+static inline int sha1_base_do_update(struct shash_desc *desc, const u8 *data,
+				      unsigned int len, sha1_block_fn *block_fn,
+				      void *p)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if (unlikely((partial + len) >= SHA1_BLOCK_SIZE)) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA1_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buffer + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		blocks = len / SHA1_BLOCK_SIZE;
+		len %= SHA1_BLOCK_SIZE;
+
+		block_fn(blocks, data, sctx->state,
+			 partial ? sctx->buffer : NULL, p);
+		data += blocks * SHA1_BLOCK_SIZE;
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buffer + partial, data, len);
+
+	return 0;
+}
+
+static inline int sha1_base_do_finalize(struct shash_desc *desc,
+					sha1_block_fn *block_fn, void *p)
+{
+	const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
+	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+	sctx->buffer[partial++] = 0x80;
+	if (partial > bit_offset) {
+		memset(sctx->buffer + partial, 0x0, SHA1_BLOCK_SIZE - partial);
+		partial = 0;
+
+		block_fn(1, sctx->buffer, sctx->state, NULL, p);
+	}
+
+	memset(sctx->buffer + partial, 0x0, bit_offset - partial);
+	*bits = cpu_to_be64(sctx->count << 3);
+	block_fn(1, sctx->buffer, sctx->state, NULL, p);
+
+	return 0;
+}
+
+static inline int sha1_base_finish(struct shash_desc *desc, u8 *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be32 *digest = (__be32 *)out;
+	int i;
+
+	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], digest++);
+
+	*sctx = (struct sha1_state){};
+	return 0;
+}
-- 
1.8.3.2

* [PATCH v3 02/16] crypto: sha256: implement base layer for SHA-256
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-256
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 include/crypto/sha256_base.h | 144 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 include/crypto/sha256_base.h

diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
new file mode 100644
index 000000000000..237d549b4093
--- /dev/null
+++ b/include/crypto/sha256_base.h
@@ -0,0 +1,144 @@
+/*
+ * sha256_base.h - core logic for SHA-256 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha256_block_fn)(int blocks, u8 const *src, u32 *state,
+			       const u8 *head, void *p);
+
+static inline int sha224_base_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA224_H0;
+	sctx->state[1] = SHA224_H1;
+	sctx->state[2] = SHA224_H2;
+	sctx->state[3] = SHA224_H3;
+	sctx->state[4] = SHA224_H4;
+	sctx->state[5] = SHA224_H5;
+	sctx->state[6] = SHA224_H6;
+	sctx->state[7] = SHA224_H7;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static inline int sha256_base_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA256_H0;
+	sctx->state[1] = SHA256_H1;
+	sctx->state[2] = SHA256_H2;
+	sctx->state[3] = SHA256_H3;
+	sctx->state[4] = SHA256_H4;
+	sctx->state[5] = SHA256_H5;
+	sctx->state[6] = SHA256_H6;
+	sctx->state[7] = SHA256_H7;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static inline int sha256_base_export(struct shash_desc *desc, void *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct sha256_state *dst = out;
+
+	*dst = *sctx;
+
+	return 0;
+}
+
+static inline int sha256_base_import(struct shash_desc *desc, const void *in)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct sha256_state const *src = in;
+
+	*sctx = *src;
+
+	return 0;
+}
+
+static inline int sha256_base_do_update(struct shash_desc *desc, const u8 *data,
+					unsigned int len,
+					sha256_block_fn *block_fn, void *p)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if (unlikely((partial + len) >= SHA256_BLOCK_SIZE)) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA256_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buf + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		blocks = len / SHA256_BLOCK_SIZE;
+		len %= SHA256_BLOCK_SIZE;
+
+		block_fn(blocks, data, sctx->state,
+			 partial ? sctx->buf : NULL, p);
+		data += blocks * SHA256_BLOCK_SIZE;
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buf + partial, data, len);
+
+	return 0;
+}
+
+static inline int sha256_base_do_finalize(struct shash_desc *desc,
+					  sha256_block_fn *block_fn, void *p)
+{
+	const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+	sctx->buf[partial++] = 0x80;
+	if (partial > bit_offset) {
+		memset(sctx->buf + partial, 0x0, SHA256_BLOCK_SIZE - partial);
+		partial = 0;
+
+		block_fn(1, sctx->buf, sctx->state, NULL, p);
+	}
+
+	memset(sctx->buf + partial, 0x0, bit_offset - partial);
+	*bits = cpu_to_be64(sctx->count << 3);
+	block_fn(1, sctx->buf, sctx->state, NULL, p);
+
+	return 0;
+}
+
+static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+{
+	unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *digest = (__be32 *)out;
+	int i;
+
+	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
+		put_unaligned_be32(sctx->state[i], digest++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
-- 
1.8.3.2

* [PATCH v3 03/16] crypto: sha512: implement base layer for SHA-512
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 include/crypto/sha512_base.h | 147 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 include/crypto/sha512_base.h

diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
new file mode 100644
index 000000000000..44351f781dce
--- /dev/null
+++ b/include/crypto/sha512_base.h
@@ -0,0 +1,147 @@
+/*
+ * sha512_base.h - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha512_block_fn)(int blocks, u8 const *src, u64 *state,
+			       const u8 *head, void *p);
+
+static inline int sha384_base_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA384_H0;
+	sctx->state[1] = SHA384_H1;
+	sctx->state[2] = SHA384_H2;
+	sctx->state[3] = SHA384_H3;
+	sctx->state[4] = SHA384_H4;
+	sctx->state[5] = SHA384_H5;
+	sctx->state[6] = SHA384_H6;
+	sctx->state[7] = SHA384_H7;
+	sctx->count[0] = sctx->count[1] = 0;
+
+	return 0;
+}
+
+static inline int sha512_base_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA512_H0;
+	sctx->state[1] = SHA512_H1;
+	sctx->state[2] = SHA512_H2;
+	sctx->state[3] = SHA512_H3;
+	sctx->state[4] = SHA512_H4;
+	sctx->state[5] = SHA512_H5;
+	sctx->state[6] = SHA512_H6;
+	sctx->state[7] = SHA512_H7;
+	sctx->count[0] = sctx->count[1] = 0;
+
+	return 0;
+}
+
+static inline int sha512_base_export(struct shash_desc *desc, void *out)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	struct sha512_state *dst = out;
+
+	*dst = *sctx;
+
+	return 0;
+}
+
+static inline int sha512_base_import(struct shash_desc *desc, const void *in)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	struct sha512_state const *src = in;
+
+	*sctx = *src;
+
+	return 0;
+}
+
+static inline int sha512_base_do_update(struct shash_desc *desc, const u8 *data,
+					unsigned int len,
+					sha512_block_fn *block_fn, void *p)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+	sctx->count[0] += len;
+	if (sctx->count[0] < len)
+		sctx->count[1]++;
+
+	if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA512_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buf + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		blocks = len / SHA512_BLOCK_SIZE;
+		len %= SHA512_BLOCK_SIZE;
+
+		block_fn(blocks, data, sctx->state,
+			 partial ? sctx->buf : NULL, p);
+		data += blocks * SHA512_BLOCK_SIZE;
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buf + partial, data, len);
+
+	return 0;
+}
+
+static inline int sha512_base_do_finalize(struct shash_desc *desc,
+				   sha512_block_fn *block_fn, void *p)
+{
+	const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+	sctx->buf[partial++] = 0x80;
+	if (partial > bit_offset) {
+		memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial);
+		partial = 0;
+
+		block_fn(1, sctx->buf, sctx->state, NULL, p);
+	}
+
+	memset(sctx->buf + partial, 0x0, bit_offset - partial);
+	bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
+	bits[1] = cpu_to_be64(sctx->count[0] << 3);
+	block_fn(1, sctx->buf, sctx->state, NULL, p);
+
+	return 0;
+}
+
+static inline int sha512_base_finish(struct shash_desc *desc, u8 *out)
+{
+	unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	__be64 *digest = (__be64 *)out;
+	int i;
+
+	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be64))
+		put_unaligned_be64(sctx->state[i], digest++);
+
+	*sctx = (struct sha512_state){};
+	return 0;
+}
-- 
1.8.3.2

* [PATCH v3 04/16] crypto: sha1-generic: move to generic glue implementation
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

This updates the generic SHA-1 implementation to use the generic
shared SHA-1 glue code.

It also implements a .finup hook, crypto_sha1_finup(), and exports
it to other modules.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/sha1_generic.c | 108 +++++++++++++-------------------------------------
 include/crypto/sha.h  |   3 ++
 2 files changed, 31 insertions(+), 80 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index a3e50c37eb6f..322a2278d939 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -23,109 +23,57 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <asm/byteorder.h>
 
-static int sha1_init(struct shash_desc *desc)
+static void sha1_generic_block_fn(int blocks, u8 const *src, u32 *state,
+				  const u8 *head, void *p)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
+	u32 temp[SHA_WORKSPACE_WORDS];
 
-	*sctx = (struct sha1_state){
-		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-	};
+	if (head)
+		sha_transform(state, head, temp);
 
-	return 0;
+	while (blocks--) {
+		sha_transform(state, src, temp);
+		src += SHA1_BLOCK_SIZE;
+	}
+	memzero_explicit(temp, sizeof(temp));
 }
 
 int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 			unsigned int len)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial, done;
-	const u8 *src;
-
-	partial = sctx->count % SHA1_BLOCK_SIZE;
-	sctx->count += len;
-	done = 0;
-	src = data;
-
-	if ((partial + len) >= SHA1_BLOCK_SIZE) {
-		u32 temp[SHA_WORKSPACE_WORDS];
-
-		if (partial) {
-			done = -partial;
-			memcpy(sctx->buffer + partial, data,
-			       done + SHA1_BLOCK_SIZE);
-			src = sctx->buffer;
-		}
-
-		do {
-			sha_transform(sctx->state, src, temp);
-			done += SHA1_BLOCK_SIZE;
-			src = data + done;
-		} while (done + SHA1_BLOCK_SIZE <= len);
-
-		memzero_explicit(temp, sizeof(temp));
-		partial = 0;
-	}
-	memcpy(sctx->buffer + partial, src, len - done);
-
-	return 0;
+	return sha1_base_do_update(desc, data, len, sha1_generic_block_fn,
+				   NULL);
 }
 EXPORT_SYMBOL(crypto_sha1_update);
 
-
-/* Add padding and return the message digest. */
-static int sha1_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	u32 i, index, padlen;
-	__be64 bits;
-	static const u8 padding[64] = { 0x80, };
-
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 */
-	index = sctx->count & 0x3f;
-	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-	crypto_sha1_update(desc, padding, padlen);
-
-	/* Append length */
-	crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
-
-	/* Store state in digest */
-	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
-
-	/* Wipe context */
-	memset(sctx, 0, sizeof *sctx);
-
-	return 0;
-}
-
-static int sha1_export(struct shash_desc *desc, void *out)
+int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
+		      unsigned int len, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(out, sctx, sizeof(*sctx));
-	return 0;
+	if (len)
+		sha1_base_do_update(desc, data, len, sha1_generic_block_fn,
+				    NULL);
+	sha1_base_do_finalize(desc, sha1_generic_block_fn, NULL);
+	return sha1_base_finish(desc, out);
 }
+EXPORT_SYMBOL(crypto_sha1_finup);
 
-static int sha1_import(struct shash_desc *desc, const void *in)
+/* Add padding and return the message digest. */
+static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(sctx, in, sizeof(*sctx));
-	return 0;
+	return crypto_sha1_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg alg = {
 	.digestsize	=	SHA1_DIGEST_SIZE,
-	.init		=	sha1_init,
+	.init		=	sha1_base_init,
 	.update		=	crypto_sha1_update,
 	.final		=	sha1_final,
-	.export		=	sha1_export,
-	.import		=	sha1_import,
+	.finup		=	crypto_sha1_finup,
+	.export		=	sha1_base_export,
+	.import		=	sha1_base_import,
 	.descsize	=	sizeof(struct sha1_state),
 	.statesize	=	sizeof(struct sha1_state),
 	.base		=	{
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 190f8a0e0242..10e2009f5464 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -87,6 +87,9 @@ struct shash_desc;
 extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
 
+extern int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
+			     unsigned int len, u8 *hash);
+
 extern int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
 
-- 
1.8.3.2

* [PATCH v3 05/16] crypto: sha256-generic: move to generic glue implementation
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

This updates the generic SHA-256 implementation to use the
new shared SHA-256 glue code.

It also implements a .finup hook, crypto_sha256_finup(), and exports
it to other modules.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/sha256_generic.c | 140 ++++++++++--------------------------------------
 include/crypto/sha.h    |   3 ++
 2 files changed, 31 insertions(+), 112 deletions(-)

diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index b001ff5c2efc..794e31889ac9 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha256_base.h>
 #include <asm/byteorder.h>
 #include <asm/unaligned.h>
 
@@ -214,136 +215,50 @@ static void sha256_transform(u32 *state, const u8 *input)
 	memzero_explicit(W, 64 * sizeof(u32));
 }
 
-static int sha224_init(struct shash_desc *desc)
+static void sha256_generic_block_fn(int blocks, u8 const *src, u32 *state,
+				    const u8 *head, void *p)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	sctx->state[0] = SHA224_H0;
-	sctx->state[1] = SHA224_H1;
-	sctx->state[2] = SHA224_H2;
-	sctx->state[3] = SHA224_H3;
-	sctx->state[4] = SHA224_H4;
-	sctx->state[5] = SHA224_H5;
-	sctx->state[6] = SHA224_H6;
-	sctx->state[7] = SHA224_H7;
-	sctx->count = 0;
+	if (head)
+		sha256_transform(state, head);
 
-	return 0;
-}
-
-static int sha256_init(struct shash_desc *desc)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	sctx->state[0] = SHA256_H0;
-	sctx->state[1] = SHA256_H1;
-	sctx->state[2] = SHA256_H2;
-	sctx->state[3] = SHA256_H3;
-	sctx->state[4] = SHA256_H4;
-	sctx->state[5] = SHA256_H5;
-	sctx->state[6] = SHA256_H6;
-	sctx->state[7] = SHA256_H7;
-	sctx->count = 0;
-
-	return 0;
+	while (blocks--) {
+		sha256_transform(state, src);
+		src += SHA256_BLOCK_SIZE;
+	}
 }
 
 int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
 			  unsigned int len)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial, done;
-	const u8 *src;
-
-	partial = sctx->count & 0x3f;
-	sctx->count += len;
-	done = 0;
-	src = data;
-
-	if ((partial + len) > 63) {
-		if (partial) {
-			done = -partial;
-			memcpy(sctx->buf + partial, data, done + 64);
-			src = sctx->buf;
-		}
-
-		do {
-			sha256_transform(sctx->state, src);
-			done += 64;
-			src = data + done;
-		} while (done + 63 < len);
-
-		partial = 0;
-	}
-	memcpy(sctx->buf + partial, src, len - done);
-
-	return 0;
+	return sha256_base_do_update(desc, data, len, sha256_generic_block_fn,
+				     NULL);
 }
 EXPORT_SYMBOL(crypto_sha256_update);
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	unsigned int index, pad_len;
-	int i;
-	static const u8 padding[64] = { 0x80, };
-
-	/* Save number of bits */
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64. */
-	index = sctx->count & 0x3f;
-	pad_len = (index < 56) ? (56 - index) : ((64+56) - index);
-	crypto_sha256_update(desc, padding, pad_len);
-
-	/* Append length (before padding) */
-	crypto_sha256_update(desc, (const u8 *)&bits, sizeof(bits));
-
-	/* Store state in digest */
-	for (i = 0; i < 8; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
-
-	/* Zeroize sensitive information. */
-	memset(sctx, 0, sizeof(*sctx));
-
-	return 0;
-}
-
-static int sha224_final(struct shash_desc *desc, u8 *hash)
-{
-	u8 D[SHA256_DIGEST_SIZE];
-
-	sha256_final(desc, D);
-
-	memcpy(hash, D, SHA224_DIGEST_SIZE);
-	memzero_explicit(D, SHA256_DIGEST_SIZE);
-
-	return 0;
-}
-
-static int sha256_export(struct shash_desc *desc, void *out)
+int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *hash)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(out, sctx, sizeof(*sctx));
-	return 0;
+	if (len)
+		sha256_base_do_update(desc, data, len, sha256_generic_block_fn,
+				      NULL);
+	sha256_base_do_finalize(desc, sha256_generic_block_fn, NULL);
+	return sha256_base_finish(desc, hash);
 }
+EXPORT_SYMBOL(crypto_sha256_finup);
 
-static int sha256_import(struct shash_desc *desc, const void *in)
+static int sha256_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(sctx, in, sizeof(*sctx));
-	return 0;
+	return crypto_sha256_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg sha256_algs[2] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
-	.init		=	sha256_init,
+	.init		=	sha256_base_init,
 	.update		=	crypto_sha256_update,
 	.final		=	sha256_final,
-	.export		=	sha256_export,
-	.import		=	sha256_import,
+	.finup		=	crypto_sha256_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
@@ -355,9 +270,10 @@ static struct shash_alg sha256_algs[2] = { {
 	}
 }, {
 	.digestsize	=	SHA224_DIGEST_SIZE,
-	.init		=	sha224_init,
+	.init		=	sha224_base_init,
 	.update		=	crypto_sha256_update,
-	.final		=	sha224_final,
+	.final		=	sha256_final,
+	.finup		=	crypto_sha256_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
 		.cra_name	=	"sha224",
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 10e2009f5464..cfdba9b1e539 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -93,6 +93,9 @@ extern int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
 extern int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
 
+extern int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
+			       unsigned int len, u8 *hash);
+
 extern int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
 #endif
-- 
1.8.3.2

* [PATCH v3 06/16] crypto: sha512-generic: move to generic glue implementation
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:51   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

This updates the generic SHA-512 implementation to use the
new shared SHA-512 glue code.

It also implements a .finup hook, crypto_sha512_finup(), and
exports it to other modules.
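
For reference, the buffering logic that sha512_base_do_update() wraps
around the supplied block function boils down to the following. This is
a condensed sketch of the new helper, not the literal contents of
<crypto/sha512_base.h>:

#include <crypto/sha.h>
#include <linux/string.h>

typedef void (sha512_block_fn)(int blocks, u8 const *src, u64 *state,
			       const u8 *head, void *p);

static int sha512_do_update_sketch(struct sha512_state *sctx,
				   const u8 *data, unsigned int len,
				   sha512_block_fn *block_fn, void *p)
{
	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;

	/* 128-bit byte count, as in the open-coded version removed below */
	if ((sctx->count[0] += len) < len)
		sctx->count[1]++;

	if (partial + len >= SHA512_BLOCK_SIZE) {
		int blocks;

		if (partial) {
			int n = SHA512_BLOCK_SIZE - partial;

			memcpy(sctx->buf + partial, data, n);
			data += n;
			len -= n;
		}

		blocks = len / SHA512_BLOCK_SIZE;
		len %= SHA512_BLOCK_SIZE;

		/* the buffered head block, if any, is passed via 'head' */
		block_fn(blocks, data, sctx->state,
			 partial ? sctx->buf : NULL, p);
		data += blocks * SHA512_BLOCK_SIZE;
		partial = 0;
	}
	if (len)
		memcpy(sctx->buf + partial, data, len);

	return 0;
}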

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/sha512_generic.c | 127 ++++++++++--------------------------------------
 include/crypto/sha.h    |   3 ++
 2 files changed, 29 insertions(+), 101 deletions(-)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 1c3c3767e079..8cf0082d7084 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -18,6 +18,7 @@
 #include <linux/crypto.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha512_base.h>
 #include <linux/percpu.h>
 #include <asm/byteorder.h>
 #include <asm/unaligned.h>
@@ -130,125 +131,48 @@ sha512_transform(u64 *state, const u8 *input)
 	a = b = c = d = e = f = g = h = t1 = t2 = 0;
 }
 
-static int
-sha512_init(struct shash_desc *desc)
+static void sha512_generic_block_fn(int blocks, u8 const *src, u64 *state,
+				    const u8 *head, void *p)
 {
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-	sctx->state[0] = SHA512_H0;
-	sctx->state[1] = SHA512_H1;
-	sctx->state[2] = SHA512_H2;
-	sctx->state[3] = SHA512_H3;
-	sctx->state[4] = SHA512_H4;
-	sctx->state[5] = SHA512_H5;
-	sctx->state[6] = SHA512_H6;
-	sctx->state[7] = SHA512_H7;
-	sctx->count[0] = sctx->count[1] = 0;
+	if (head)
+		sha512_transform(state, head);
 
-	return 0;
-}
-
-static int
-sha384_init(struct shash_desc *desc)
-{
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-	sctx->state[0] = SHA384_H0;
-	sctx->state[1] = SHA384_H1;
-	sctx->state[2] = SHA384_H2;
-	sctx->state[3] = SHA384_H3;
-	sctx->state[4] = SHA384_H4;
-	sctx->state[5] = SHA384_H5;
-	sctx->state[6] = SHA384_H6;
-	sctx->state[7] = SHA384_H7;
-	sctx->count[0] = sctx->count[1] = 0;
-
-	return 0;
+	while (blocks--) {
+		sha512_transform(state, src);
+		src += SHA512_BLOCK_SIZE;
+	}
 }
 
 int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
 			unsigned int len)
 {
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-
-	unsigned int i, index, part_len;
-
-	/* Compute number of bytes mod 128 */
-	index = sctx->count[0] & 0x7f;
-
-	/* Update number of bytes */
-	if ((sctx->count[0] += len) < len)
-		sctx->count[1]++;
-
-        part_len = 128 - index;
-
-	/* Transform as many times as possible. */
-	if (len >= part_len) {
-		memcpy(&sctx->buf[index], data, part_len);
-		sha512_transform(sctx->state, sctx->buf);
-
-		for (i = part_len; i + 127 < len; i+=128)
-			sha512_transform(sctx->state, &data[i]);
-
-		index = 0;
-	} else {
-		i = 0;
-	}
-
-	/* Buffer remaining input */
-	memcpy(&sctx->buf[index], &data[i], len - i);
-
-	return 0;
+	return sha512_base_do_update(desc, data, len, sha512_generic_block_fn,
+				     NULL);
 }
 EXPORT_SYMBOL(crypto_sha512_update);
 
-static int
-sha512_final(struct shash_desc *desc, u8 *hash)
+int crypto_sha512_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *hash)
 {
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-        static u8 padding[128] = { 0x80, };
-	__be64 *dst = (__be64 *)hash;
-	__be64 bits[2];
-	unsigned int index, pad_len;
-	int i;
-
-	/* Save number of bits */
-	bits[1] = cpu_to_be64(sctx->count[0] << 3);
-	bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
-
-	/* Pad out to 112 mod 128. */
-	index = sctx->count[0] & 0x7f;
-	pad_len = (index < 112) ? (112 - index) : ((128+112) - index);
-	crypto_sha512_update(desc, padding, pad_len);
-
-	/* Append length (before padding) */
-	crypto_sha512_update(desc, (const u8 *)bits, sizeof(bits));
-
-	/* Store state in digest */
-	for (i = 0; i < 8; i++)
-		dst[i] = cpu_to_be64(sctx->state[i]);
-
-	/* Zeroize sensitive information. */
-	memset(sctx, 0, sizeof(struct sha512_state));
-
-	return 0;
+	if (len)
+		sha512_base_do_update(desc, data, len, sha512_generic_block_fn,
+				      NULL);
+	sha512_base_do_finalize(desc, sha512_generic_block_fn, NULL);
+	return sha512_base_finish(desc, hash);
 }
+EXPORT_SYMBOL(crypto_sha512_finup);
 
-static int sha384_final(struct shash_desc *desc, u8 *hash)
+static int sha512_final(struct shash_desc *desc, u8 *hash)
 {
-	u8 D[64];
-
-	sha512_final(desc, D);
-
-	memcpy(hash, D, 48);
-	memzero_explicit(D, 64);
-
-	return 0;
+	return crypto_sha512_finup(desc, NULL, 0, hash);
 }
 
 static struct shash_alg sha512_algs[2] = { {
 	.digestsize	=	SHA512_DIGEST_SIZE,
-	.init		=	sha512_init,
+	.init		=	sha512_base_init,
 	.update		=	crypto_sha512_update,
 	.final		=	sha512_final,
+	.finup		=	crypto_sha512_finup,
 	.descsize	=	sizeof(struct sha512_state),
 	.base		=	{
 		.cra_name	=	"sha512",
@@ -259,9 +183,10 @@ static struct shash_alg sha512_algs[2] = { {
 	}
 }, {
 	.digestsize	=	SHA384_DIGEST_SIZE,
-	.init		=	sha384_init,
+	.init		=	sha384_base_init,
 	.update		=	crypto_sha512_update,
-	.final		=	sha384_final,
+	.final		=	sha512_final,
+	.finup		=	crypto_sha512_finup,
 	.descsize	=	sizeof(struct sha512_state),
 	.base		=	{
 		.cra_name	=	"sha384",
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index cfdba9b1e539..2bbbd6f7a1e3 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -98,4 +98,7 @@ extern int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
 
 extern int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
+
+extern int crypto_sha512_finup(struct shash_desc *desc, const u8 *data,
+			       unsigned int len, u8 *hash);
 #endif
-- 
1.8.3.2
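
Note that users of the "sha512" and "sha384" algorithms are unaffected by
the conversion; they keep going through the regular shash API. A small
usage example (standard crypto API calls, not code from this patch):

#include <crypto/hash.h>
#include <crypto/sha.h>
#include <linux/err.h>

static int sha512_digest_example(const u8 *data, unsigned int len,
				 u8 out[SHA512_DIGEST_SIZE])
{
	struct crypto_shash *tfm;
	int err;

	tfm = crypto_alloc_shash("sha512", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	{
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		desc->flags = 0;
		err = crypto_shash_digest(desc, data, len, out);
	}

	crypto_free_shash(tfm);
	return err;
}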

* [PATCH v3 07/16] crypto/arm: move SHA-1 ARM asm implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:51   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/sha1-ce-glue.c           |   3 +-
 arch/arm/{include/asm => }/crypto/sha1.h |   3 +
 arch/arm/crypto/sha1_glue.c              | 116 ++++++-------------------------
 arch/arm/crypto/sha1_neon_glue.c         |   2 +-
 4 files changed, 29 insertions(+), 95 deletions(-)
 rename arch/arm/{include/asm => }/crypto/sha1.h (67%)

diff --git a/arch/arm/crypto/sha1-ce-glue.c b/arch/arm/crypto/sha1-ce-glue.c
index a9dd90df9fd7..e93b24c1af1f 100644
--- a/arch/arm/crypto/sha1-ce-glue.c
+++ b/arch/arm/crypto/sha1-ce-glue.c
@@ -13,12 +13,13 @@
 #include <linux/crypto.h>
 #include <linux/module.h>
 
-#include <asm/crypto/sha1.h>
 #include <asm/hwcap.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
 #include <asm/unaligned.h>
 
+#include "sha1.h"
+
 MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
diff --git a/arch/arm/include/asm/crypto/sha1.h b/arch/arm/crypto/sha1.h
similarity index 67%
rename from arch/arm/include/asm/crypto/sha1.h
rename to arch/arm/crypto/sha1.h
index 75e6a417416b..ffd8bd08b1a7 100644
--- a/arch/arm/include/asm/crypto/sha1.h
+++ b/arch/arm/crypto/sha1.h
@@ -7,4 +7,7 @@
 extern int sha1_update_arm(struct shash_desc *desc, const u8 *data,
 			   unsigned int len);
 
+extern int sha1_finup_arm(struct shash_desc *desc, const u8 *data,
+			   unsigned int len, u8 *out);
+
 #endif
diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index e31b0440c613..c5a9519aaaad 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -22,125 +22,55 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <asm/byteorder.h>
-#include <asm/crypto/sha1.h>
 
+#include "sha1.h"
 
 asmlinkage void sha1_block_data_order(u32 *digest,
 		const unsigned char *data, unsigned int rounds);
 
-
-static int sha1_init(struct shash_desc *desc)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	*sctx = (struct sha1_state){
-		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-	};
-
-	return 0;
-}
-
-
-static int __sha1_update(struct sha1_state *sctx, const u8 *data,
-			 unsigned int len, unsigned int partial)
+static void sha1_arm_block_fn(int blocks, u8 const *src, u32 *state,
+			      const u8 *head, void *p)
 {
-	unsigned int done = 0;
-
-	sctx->count += len;
-
-	if (partial) {
-		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->buffer + partial, data, done);
-		sha1_block_data_order(sctx->state, sctx->buffer, 1);
-	}
-
-	if (len - done >= SHA1_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-		sha1_block_data_order(sctx->state, data + done, rounds);
-		done += rounds * SHA1_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buffer, data + done, len - done);
-	return 0;
+	if (head)
+		sha1_block_data_order(state, head, 1);
+	if (blocks)
+		sha1_block_data_order(state, src, blocks);
 }
 
-
 int sha1_update_arm(struct shash_desc *desc, const u8 *data,
 		    unsigned int len)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
-	int res;
-
-	/* Handle the fast case right here */
-	if (partial + len < SHA1_BLOCK_SIZE) {
-		sctx->count += len;
-		memcpy(sctx->buffer + partial, data, len);
-		return 0;
-	}
-	res = __sha1_update(sctx, data, len, partial);
-	return res;
+	return sha1_base_do_update(desc, data, len, sha1_arm_block_fn, NULL);
 }
 EXPORT_SYMBOL_GPL(sha1_update_arm);
 
-
-/* Add padding and return the message digest. */
-static int sha1_final(struct shash_desc *desc, u8 *out)
+int sha1_finup_arm(struct shash_desc *desc, const u8 *data,
+		   unsigned int len, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 and append length */
-	index = sctx->count % SHA1_BLOCK_SIZE;
-	padlen = (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
-	/* We need to fill a whole block for __sha1_update() */
-	if (padlen <= 56) {
-		sctx->count += padlen;
-		memcpy(sctx->buffer + index, padding, padlen);
-	} else {
-		__sha1_update(sctx, padding, padlen, index);
-	}
-	__sha1_update(sctx, (const u8 *)&bits, sizeof(bits), 56);
-
-	/* Store state in digest */
-	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
-
-	/* Wipe context */
-	memset(sctx, 0, sizeof(*sctx));
-	return 0;
-}
-
+	if (len)
+		sha1_base_do_update(desc, data, len, sha1_arm_block_fn, NULL);
+	sha1_base_do_finalize(desc, sha1_arm_block_fn, NULL);
 
-static int sha1_export(struct shash_desc *desc, void *out)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	memcpy(out, sctx, sizeof(*sctx));
-	return 0;
+	return sha1_base_finish(desc, out);
 }
+EXPORT_SYMBOL_GPL(sha1_finup_arm);
 
-
-static int sha1_import(struct shash_desc *desc, const void *in)
+/* Add padding and return the message digest. */
+static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	memcpy(sctx, in, sizeof(*sctx));
-	return 0;
+	return sha1_finup_arm(desc, NULL, 0, out);
 }
 
-
 static struct shash_alg alg = {
 	.digestsize	=	SHA1_DIGEST_SIZE,
-	.init		=	sha1_init,
+	.init		=	sha1_base_init,
 	.update		=	sha1_update_arm,
 	.final		=	sha1_final,
-	.export		=	sha1_export,
-	.import		=	sha1_import,
+	.finup		=	sha1_finup_arm,
+	.export		=	sha1_base_export,
+	.import		=	sha1_base_import,
 	.descsize	=	sizeof(struct sha1_state),
 	.statesize	=	sizeof(struct sha1_state),
 	.base		=	{
diff --git a/arch/arm/crypto/sha1_neon_glue.c b/arch/arm/crypto/sha1_neon_glue.c
index 0b0083757d47..5d9a1b4aac73 100644
--- a/arch/arm/crypto/sha1_neon_glue.c
+++ b/arch/arm/crypto/sha1_neon_glue.c
@@ -28,8 +28,8 @@
 #include <asm/byteorder.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
-#include <asm/crypto/sha1.h>
 
+#include "sha1.h"
 
 asmlinkage void sha1_transform_neon(void *state_h, const char *data,
 				    unsigned int rounds);
-- 
1.8.3.2

* [PATCH v3 08/16] crypto/arm: move SHA-1 NEON implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:51   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/sha1_neon_glue.c | 137 +++++++++------------------------------
 1 file changed, 30 insertions(+), 107 deletions(-)

diff --git a/arch/arm/crypto/sha1_neon_glue.c b/arch/arm/crypto/sha1_neon_glue.c
index 5d9a1b4aac73..4280f657fb9d 100644
--- a/arch/arm/crypto/sha1_neon_glue.c
+++ b/arch/arm/crypto/sha1_neon_glue.c
@@ -25,7 +25,7 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha1_base.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
 
@@ -34,136 +34,59 @@
 asmlinkage void sha1_transform_neon(void *state_h, const char *data,
 				    unsigned int rounds);
 
-
-static int sha1_neon_init(struct shash_desc *desc)
+static void sha1_neon_block_fn(int blocks, u8 const *src, u32 *state,
+			      const u8 *head, void *p)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	*sctx = (struct sha1_state){
-		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-	};
-
-	return 0;
-}
-
-static int __sha1_neon_update(struct shash_desc *desc, const u8 *data,
-			       unsigned int len, unsigned int partial)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int done = 0;
-
-	sctx->count += len;
-
-	if (partial) {
-		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->buffer + partial, data, done);
-		sha1_transform_neon(sctx->state, sctx->buffer, 1);
-	}
-
-	if (len - done >= SHA1_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-
-		sha1_transform_neon(sctx->state, data + done, rounds);
-		done += rounds * SHA1_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buffer, data + done, len - done);
-
-	return 0;
+	if (head)
+		sha1_transform_neon(state, head, 1);
+	if (blocks)
+		sha1_transform_neon(state, src, blocks);
 }
 
 static int sha1_neon_update(struct shash_desc *desc, const u8 *data,
-			     unsigned int len)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
-	int res;
-
-	/* Handle the fast case right here */
-	if (partial + len < SHA1_BLOCK_SIZE) {
-		sctx->count += len;
-		memcpy(sctx->buffer + partial, data, len);
-
-		return 0;
-	}
-
-	if (!may_use_simd()) {
-		res = sha1_update_arm(desc, data, len);
-	} else {
-		kernel_neon_begin();
-		res = __sha1_neon_update(desc, data, len, partial);
-		kernel_neon_end();
-	}
-
-	return res;
-}
-
-
-/* Add padding and return the message digest. */
-static int sha1_neon_final(struct shash_desc *desc, u8 *out)
+			  unsigned int len)
 {
 	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 and append length */
-	index = sctx->count % SHA1_BLOCK_SIZE;
-	padlen = (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
-	if (!may_use_simd()) {
-		sha1_update_arm(desc, padding, padlen);
-		sha1_update_arm(desc, (const u8 *)&bits, sizeof(bits));
-	} else {
-		kernel_neon_begin();
-		/* We need to fill a whole block for __sha1_neon_update() */
-		if (padlen <= 56) {
-			sctx->count += padlen;
-			memcpy(sctx->buffer + index, padding, padlen);
-		} else {
-			__sha1_neon_update(desc, padding, padlen, index);
-		}
-		__sha1_neon_update(desc, (const u8 *)&bits, sizeof(bits), 56);
-		kernel_neon_end();
-	}
 
-	/* Store state in digest */
-	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
+	if (!may_use_simd() ||
+	    (sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
+		return sha1_update_arm(desc, data, len);
 
-	/* Wipe context */
-	memset(sctx, 0, sizeof(*sctx));
+	kernel_neon_begin();
+	sha1_base_do_update(desc, data, len, sha1_neon_block_fn, NULL);
+	kernel_neon_end();
 
 	return 0;
 }
 
-static int sha1_neon_export(struct shash_desc *desc, void *out)
+static int sha1_neon_finup(struct shash_desc *desc, const u8 *data,
+			   unsigned int len, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
+	if (!may_use_simd())
+		return sha1_finup_arm(desc, data, len, out);
 
-	memcpy(out, sctx, sizeof(*sctx));
+	kernel_neon_begin();
+	if (len)
+		sha1_base_do_update(desc, data, len, sha1_neon_block_fn, NULL);
+	sha1_base_do_finalize(desc, sha1_neon_block_fn, NULL);
+	kernel_neon_end();
 
-	return 0;
+	return sha1_base_finish(desc, out);
 }
 
-static int sha1_neon_import(struct shash_desc *desc, const void *in)
+static int sha1_neon_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(sctx, in, sizeof(*sctx));
-
-	return 0;
+	return sha1_neon_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg alg = {
 	.digestsize	=	SHA1_DIGEST_SIZE,
-	.init		=	sha1_neon_init,
+	.init		=	sha1_base_init,
 	.update		=	sha1_neon_update,
 	.final		=	sha1_neon_final,
-	.export		=	sha1_neon_export,
-	.import		=	sha1_neon_import,
+	.finup		=	sha1_neon_finup,
+	.export		=	sha1_base_export,
+	.import		=	sha1_base_import,
 	.descsize	=	sizeof(struct sha1_state),
 	.statesize	=	sizeof(struct sha1_state),
 	.base		=	{
-- 
1.8.3.2

* [PATCH v3 09/16] crypto/arm: move SHA-1 ARMv8 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:51   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig        |   1 -
 arch/arm/crypto/sha1-ce-glue.c | 108 +++++++++++------------------------------
 2 files changed, 28 insertions(+), 81 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..5ed98bc6f95d 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -31,7 +31,6 @@ config CRYPTO_SHA1_ARM_CE
 	tristate "SHA1 digest algorithm (ARM v8 Crypto Extensions)"
 	depends on KERNEL_MODE_NEON
 	select CRYPTO_SHA1_ARM
-	select CRYPTO_SHA1
 	select CRYPTO_HASH
 	help
 	  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2) implemented
diff --git a/arch/arm/crypto/sha1-ce-glue.c b/arch/arm/crypto/sha1-ce-glue.c
index e93b24c1af1f..9d0e86e5647b 100644
--- a/arch/arm/crypto/sha1-ce-glue.c
+++ b/arch/arm/crypto/sha1-ce-glue.c
@@ -10,13 +10,13 @@
 
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
 
 #include <asm/hwcap.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
-#include <asm/unaligned.h>
 
 #include "sha1.h"
 
@@ -24,104 +24,52 @@ MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
-asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state, 
-				  u8 *head);
+asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+				  const u8 *head, void *p);
 
-static int sha1_init(struct shash_desc *desc)
+static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
 {
 	struct sha1_state *sctx = shash_desc_ctx(desc);
 
-	*sctx = (struct sha1_state){
-		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-	};
-	return 0;
-}
-
-static int sha1_update(struct shash_desc *desc, const u8 *data,
-		       unsigned int len)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial;
-
-	if (!may_use_simd())
+	if (!may_use_simd() ||
+	    (sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
 		return sha1_update_arm(desc, data, len);
 
-	partial = sctx->count % SHA1_BLOCK_SIZE;
-	sctx->count += len;
-
-	if ((partial + len) >= SHA1_BLOCK_SIZE) {
-		int blocks;
+	kernel_neon_begin();
+	sha1_base_do_update(desc, data, len, sha1_ce_transform, NULL);
+	kernel_neon_end();
 
-		if (partial) {
-			int p = SHA1_BLOCK_SIZE - partial;
-
-			memcpy(sctx->buffer + partial, data, p);
-			data += p;
-			len -= p;
-		}
-
-		blocks = len / SHA1_BLOCK_SIZE;
-		len %= SHA1_BLOCK_SIZE;
-
-		kernel_neon_begin();
-		sha1_ce_transform(blocks, data, sctx->state,
-				  partial ? sctx->buffer : NULL);
-		kernel_neon_end();
-
-		data += blocks * SHA1_BLOCK_SIZE;
-		partial = 0;
-	}
-	if (len)
-		memcpy(sctx->buffer + partial, data, len);
 	return 0;
 }
 
-static int sha1_final(struct shash_desc *desc, u8 *out)
+static int sha1_ce_finup(struct shash_desc *desc, const u8 *data,
+			 unsigned int len, u8 *out)
 {
-	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	__be64 bits = cpu_to_be64(sctx->count << 3);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	u32 padlen = SHA1_BLOCK_SIZE
-		     - ((sctx->count + sizeof(bits)) % SHA1_BLOCK_SIZE);
-
-	sha1_update(desc, padding, padlen);
-	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
-
-	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha1_state){};
-	return 0;
-}
+	if (!may_use_simd())
+		return sha1_finup_arm(desc, data, len, out);
 
-static int sha1_export(struct shash_desc *desc, void *out)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	struct sha1_state *dst = out;
+	kernel_neon_begin();
+	if (len)
+		sha1_base_do_update(desc, data, len, sha1_ce_transform, NULL);
+	sha1_base_do_finalize(desc, sha1_ce_transform, NULL);
+	kernel_neon_end();
 
-	*dst = *sctx;
-	return 0;
+	return sha1_base_finish(desc, out);
 }
 
-static int sha1_import(struct shash_desc *desc, const void *in)
+static int sha1_ce_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	struct sha1_state const *src = in;
-
-	*sctx = *src;
-	return 0;
+	return sha1_ce_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg alg = {
-	.init			= sha1_init,
-	.update			= sha1_update,
-	.final			= sha1_final,
-	.export			= sha1_export,
-	.import			= sha1_import,
+	.init			= sha1_base_init,
+	.update			= sha1_ce_update,
+	.final			= sha1_ce_final,
+	.finup			= sha1_ce_finup,
+	.export			= sha1_base_export,
+	.import			= sha1_base_import,
 	.descsize		= sizeof(struct sha1_state),
 	.digestsize		= SHA1_DIGEST_SIZE,
 	.statesize		= sizeof(struct sha1_state),
-- 
1.8.3.2

* [PATCH v3 10/16] crypto/arm: move SHA-224/256 ASM/NEON implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:51   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/sha256_glue.c      | 174 ++++++++-----------------------------
 arch/arm/crypto/sha256_glue.h      |  17 +---
 arch/arm/crypto/sha256_neon_glue.c | 144 +++++++++---------------------
 3 files changed, 81 insertions(+), 254 deletions(-)

diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
index ccef5e25bbcb..6f14a5a0a467 100644
--- a/arch/arm/crypto/sha256_glue.c
+++ b/arch/arm/crypto/sha256_glue.c
@@ -24,163 +24,56 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha256_base.h>
 #include <asm/simd.h>
 #include <asm/neon.h>
+
 #include "sha256_glue.h"
 
 asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
-				      unsigned int num_blks);
-
+					unsigned int num_blks);
 
-int sha256_init(struct shash_desc *desc)
+static void sha256_arm_block_fn(int blocks, u8 const *src, u32 *state,
+				const u8 *head, void *p)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	sctx->state[0] = SHA256_H0;
-	sctx->state[1] = SHA256_H1;
-	sctx->state[2] = SHA256_H2;
-	sctx->state[3] = SHA256_H3;
-	sctx->state[4] = SHA256_H4;
-	sctx->state[5] = SHA256_H5;
-	sctx->state[6] = SHA256_H6;
-	sctx->state[7] = SHA256_H7;
-	sctx->count = 0;
-
-	return 0;
+	if (head)
+		sha256_block_data_order(state, head, 1);
+	if (blocks)
+		sha256_block_data_order(state, src, blocks);
 }
 
-int sha224_init(struct shash_desc *desc)
+int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
+			     unsigned int len)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	sctx->state[0] = SHA224_H0;
-	sctx->state[1] = SHA224_H1;
-	sctx->state[2] = SHA224_H2;
-	sctx->state[3] = SHA224_H3;
-	sctx->state[4] = SHA224_H4;
-	sctx->state[5] = SHA224_H5;
-	sctx->state[6] = SHA224_H6;
-	sctx->state[7] = SHA224_H7;
-	sctx->count = 0;
-
-	return 0;
+	return sha256_base_do_update(desc, data, len, sha256_arm_block_fn,
+				     NULL);
 }
+EXPORT_SYMBOL(crypto_sha256_arm_update);
 
-int __sha256_update(struct shash_desc *desc, const u8 *data, unsigned int len,
-		    unsigned int partial)
+int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
+			    unsigned int len, u8 *hash)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int done = 0;
-
-	sctx->count += len;
-
-	if (partial) {
-		done = SHA256_BLOCK_SIZE - partial;
-		memcpy(sctx->buf + partial, data, done);
-		sha256_block_data_order(sctx->state, sctx->buf, 1);
-	}
-
-	if (len - done >= SHA256_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA256_BLOCK_SIZE;
-
-		sha256_block_data_order(sctx->state, data + done, rounds);
-		done += rounds * SHA256_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buf, data + done, len - done);
-
-	return 0;
+	if (len)
+		sha256_base_do_update(desc, data, len, sha256_arm_block_fn,
+				      NULL);
+	sha256_base_do_finalize(desc, sha256_arm_block_fn, NULL);
+	return sha256_base_finish(desc, hash);
 }
+EXPORT_SYMBOL(crypto_sha256_arm_finup);
 
-int sha256_update(struct shash_desc *desc, const u8 *data, unsigned int len)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
-
-	/* Handle the fast case right here */
-	if (partial + len < SHA256_BLOCK_SIZE) {
-		sctx->count += len;
-		memcpy(sctx->buf + partial, data, len);
-
-		return 0;
-	}
-
-	return __sha256_update(desc, data, len, partial);
-}
-
-/* Add padding and return the message digest. */
 static int sha256_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
-
-	/* save number of bits */
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 and append length */
-	index = sctx->count % SHA256_BLOCK_SIZE;
-	padlen = (index < 56) ? (56 - index) : ((SHA256_BLOCK_SIZE+56)-index);
-
-	/* We need to fill a whole block for __sha256_update */
-	if (padlen <= 56) {
-		sctx->count += padlen;
-		memcpy(sctx->buf + index, padding, padlen);
-	} else {
-		__sha256_update(desc, padding, padlen, index);
-	}
-	__sha256_update(desc, (const u8 *)&bits, sizeof(bits), 56);
-
-	/* Store state in digest */
-	for (i = 0; i < 8; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
-
-	/* Wipe context */
-	memset(sctx, 0, sizeof(*sctx));
-
-	return 0;
-}
-
-static int sha224_final(struct shash_desc *desc, u8 *out)
-{
-	u8 D[SHA256_DIGEST_SIZE];
-
-	sha256_final(desc, D);
-
-	memcpy(out, D, SHA224_DIGEST_SIZE);
-	memzero_explicit(D, SHA256_DIGEST_SIZE);
-
-	return 0;
-}
-
-int sha256_export(struct shash_desc *desc, void *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(out, sctx, sizeof(*sctx));
-
-	return 0;
-}
-
-int sha256_import(struct shash_desc *desc, const void *in)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(sctx, in, sizeof(*sctx));
-
-	return 0;
+	return crypto_sha256_arm_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg algs[] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
-	.init		=	sha256_init,
-	.update		=	sha256_update,
+	.init		=	sha256_base_init,
+	.update		=	crypto_sha256_arm_update,
 	.final		=	sha256_final,
-	.export		=	sha256_export,
-	.import		=	sha256_import,
+	.finup		=	crypto_sha256_arm_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
@@ -193,11 +86,12 @@ static struct shash_alg algs[] = { {
 	}
 }, {
 	.digestsize	=	SHA224_DIGEST_SIZE,
-	.init		=	sha224_init,
-	.update		=	sha256_update,
-	.final		=	sha224_final,
-	.export		=	sha256_export,
-	.import		=	sha256_import,
+	.init		=	sha224_base_init,
+	.update		=	crypto_sha256_arm_update,
+	.final		=	sha256_final,
+	.finup		=	crypto_sha256_arm_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
diff --git a/arch/arm/crypto/sha256_glue.h b/arch/arm/crypto/sha256_glue.h
index 0312f4ffe8cc..7cf0bf786ada 100644
--- a/arch/arm/crypto/sha256_glue.h
+++ b/arch/arm/crypto/sha256_glue.h
@@ -2,22 +2,13 @@
 #define _CRYPTO_SHA256_GLUE_H
 
 #include <linux/crypto.h>
-#include <crypto/sha.h>
 
 extern struct shash_alg sha256_neon_algs[2];
 
-extern int sha256_init(struct shash_desc *desc);
+int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
+			     unsigned int len);
 
-extern int sha224_init(struct shash_desc *desc);
-
-extern int __sha256_update(struct shash_desc *desc, const u8 *data,
-			   unsigned int len, unsigned int partial);
-
-extern int sha256_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int len);
-
-extern int sha256_export(struct shash_desc *desc, void *out);
-
-extern int sha256_import(struct shash_desc *desc, const void *in);
+int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
+			    unsigned int len, u8 *hash);
 
 #endif /* _CRYPTO_SHA256_GLUE_H */
diff --git a/arch/arm/crypto/sha256_neon_glue.c b/arch/arm/crypto/sha256_neon_glue.c
index c4da10090eee..90053d9dc5bd 100644
--- a/arch/arm/crypto/sha256_neon_glue.c
+++ b/arch/arm/crypto/sha256_neon_glue.c
@@ -19,129 +19,70 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <crypto/sha.h>
+#include <crypto/sha256_base.h>
 #include <asm/byteorder.h>
 #include <asm/simd.h>
 #include <asm/neon.h>
+
 #include "sha256_glue.h"
 
 asmlinkage void sha256_block_data_order_neon(u32 *digest, const void *data,
-				      unsigned int num_blks);
-
+					     unsigned int num_blks);
 
-static int __sha256_neon_update(struct shash_desc *desc, const u8 *data,
-				unsigned int len, unsigned int partial)
+static void sha256_neon_block_fn(int blocks, u8 const *src, u32 *state,
+				 const u8 *head, void *p)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int done = 0;
-
-	sctx->count += len;
-
-	if (partial) {
-		done = SHA256_BLOCK_SIZE - partial;
-		memcpy(sctx->buf + partial, data, done);
-		sha256_block_data_order_neon(sctx->state, sctx->buf, 1);
-	}
-
-	if (len - done >= SHA256_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA256_BLOCK_SIZE;
-
-		sha256_block_data_order_neon(sctx->state, data + done, rounds);
-		done += rounds * SHA256_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buf, data + done, len - done);
-
-	return 0;
+	if (head)
+		sha256_block_data_order_neon(state, head, 1);
+	if (blocks)
+		sha256_block_data_order_neon(state, src, blocks);
 }
 
-static int sha256_neon_update(struct shash_desc *desc, const u8 *data,
-			      unsigned int len)
+static int sha256_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
 {
 	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
-	int res;
 
-	/* Handle the fast case right here */
-	if (partial + len < SHA256_BLOCK_SIZE) {
-		sctx->count += len;
-		memcpy(sctx->buf + partial, data, len);
+	if (!may_use_simd() ||
+	    (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
+		return crypto_sha256_arm_update(desc, data, len);
 
-		return 0;
-	}
+	kernel_neon_begin();
+	sha256_base_do_update(desc, data, len, sha256_neon_block_fn, NULL);
+	kernel_neon_end();
 
-	if (!may_use_simd()) {
-		res = __sha256_update(desc, data, len, partial);
-	} else {
-		kernel_neon_begin();
-		res = __sha256_neon_update(desc, data, len, partial);
-		kernel_neon_end();
-	}
-
-	return res;
+	return 0;
 }
 
-/* Add padding and return the message digest. */
-static int sha256_neon_final(struct shash_desc *desc, u8 *out)
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
-
-	/* save number of bits */
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 and append length */
-	index = sctx->count % SHA256_BLOCK_SIZE;
-	padlen = (index < 56) ? (56 - index) : ((SHA256_BLOCK_SIZE+56)-index);
-
-	if (!may_use_simd()) {
-		sha256_update(desc, padding, padlen);
-		sha256_update(desc, (const u8 *)&bits, sizeof(bits));
-	} else {
-		kernel_neon_begin();
-		/* We need to fill a whole block for __sha256_neon_update() */
-		if (padlen <= 56) {
-			sctx->count += padlen;
-			memcpy(sctx->buf + index, padding, padlen);
-		} else {
-			__sha256_neon_update(desc, padding, padlen, index);
-		}
-		__sha256_neon_update(desc, (const u8 *)&bits,
-					sizeof(bits), 56);
-		kernel_neon_end();
-	}
+	if (!may_use_simd())
+		return crypto_sha256_arm_finup(desc, data, len, out);
 
-	/* Store state in digest */
-	for (i = 0; i < 8; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
+	kernel_neon_begin();
+	if (len)
+		sha256_base_do_update(desc, data, len, sha256_neon_block_fn,
+				      NULL);
+	sha256_base_do_finalize(desc, sha256_neon_block_fn, NULL);
+	kernel_neon_end();
 
-	/* Wipe context */
-	memzero_explicit(sctx, sizeof(*sctx));
-
-	return 0;
+	return sha256_base_finish(desc, out);
 }
 
-static int sha224_neon_final(struct shash_desc *desc, u8 *out)
+static int sha256_final(struct shash_desc *desc, u8 *out)
 {
-	u8 D[SHA256_DIGEST_SIZE];
-
-	sha256_neon_final(desc, D);
-
-	memcpy(out, D, SHA224_DIGEST_SIZE);
-	memzero_explicit(D, SHA256_DIGEST_SIZE);
-
-	return 0;
+	return sha256_finup(desc, NULL, 0, out);
 }
 
 struct shash_alg sha256_neon_algs[] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
-	.init		=	sha256_init,
-	.update		=	sha256_neon_update,
-	.final		=	sha256_neon_final,
-	.export		=	sha256_export,
-	.import		=	sha256_import,
+	.init		=	sha256_base_init,
+	.update		=	sha256_update,
+	.final		=	sha256_final,
+	.finup		=	sha256_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
@@ -154,11 +95,12 @@ struct shash_alg sha256_neon_algs[] = { {
 	}
 }, {
 	.digestsize	=	SHA224_DIGEST_SIZE,
-	.init		=	sha224_init,
-	.update		=	sha256_neon_update,
-	.final		=	sha224_neon_final,
-	.export		=	sha256_export,
-	.import		=	sha256_import,
+	.init		=	sha224_base_init,
+	.update		=	sha256_update,
+	.final		=	sha256_final,
+	.finup		=	sha256_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread
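
Note: the glue code above leans entirely on the buffering contract of
sha256_base_do_update(): the helper only ever hands whole blocks to the
block function, passing a buffered 'head' block first whenever an
earlier update left a partial block behind. A minimal model of that
contract, assuming the semantics visible in the open-coded update logic
being removed (hypothetical names, not part of the series; the real
helper lives in <crypto/sha256_base.h>):

#include <crypto/sha.h>
#include <linux/string.h>

typedef void (sha256_block_fn)(int blocks, u8 const *src, u32 *state,
			       const u8 *head, void *p);

static int model_do_update(struct sha256_state *sctx, const u8 *data,
			   unsigned int len, sha256_block_fn *block_fn,
			   void *p)
{
	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
	const u8 *head = NULL;

	sctx->count += len;

	if (partial + len >= SHA256_BLOCK_SIZE) {
		if (partial) {
			/* top up the partial block; it becomes 'head' */
			unsigned int fill = SHA256_BLOCK_SIZE - partial;

			memcpy(sctx->buf + partial, data, fill);
			data += fill;
			len -= fill;
			head = sctx->buf;
		}
		/* whole blocks are consumed straight from the source */
		block_fn(len / SHA256_BLOCK_SIZE, data, sctx->state,
			 head, p);
		data += (len / SHA256_BLOCK_SIZE) * SHA256_BLOCK_SIZE;
		len %= SHA256_BLOCK_SIZE;
		partial = 0;
	}
	if (len)
		memcpy(sctx->buf + partial, data, len);	/* remainder */
	return 0;
}

This is why sha256_arm_block_fn() only needs its two calls into the
assembly: 'head' is at most one block, and 'blocks' counts the whole
blocks that follow it.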

* [PATCH v3 11/16] crypto/arm: move SHA-224/256 ARMv8 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:51   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:51 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig        |   2 +-
 arch/arm/crypto/sha2-ce-glue.c | 154 ++++++++++-------------------------------
 2 files changed, 36 insertions(+), 120 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 5ed98bc6f95d..a267529d9577 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -39,7 +39,7 @@ config CRYPTO_SHA1_ARM_CE
 config CRYPTO_SHA2_ARM_CE
 	tristate "SHA-224/256 digest algorithm (ARM v8 Crypto Extensions)"
 	depends on KERNEL_MODE_NEON
-	select CRYPTO_SHA256
+	select CRYPTO_SHA256_ARM
 	select CRYPTO_HASH
 	help
 	  SHA-256 secure hash standard (DFIPS 180-2) implemented
diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c
index 0449eca3aab3..6110b937264c 100644
--- a/arch/arm/crypto/sha2-ce-glue.c
+++ b/arch/arm/crypto/sha2-ce-glue.c
@@ -10,6 +10,7 @@
 
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha256_base.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
 
@@ -18,145 +19,59 @@
 #include <asm/neon.h>
 #include <asm/unaligned.h>
 
+#include "sha256_glue.h"
+
 MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
 asmlinkage void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
-				  u8 *head);
+				  const u8 *head, void *p);
 
-static int sha224_init(struct shash_desc *desc)
+static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
 {
 	struct sha256_state *sctx = shash_desc_ctx(desc);
 
-	*sctx = (struct sha256_state){
-		.state = {
-			SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
-			SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
-		}
-	};
-	return 0;
-}
+	if (!may_use_simd() ||
+	    (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
+		return crypto_sha256_arm_update(desc, data, len);
 
-static int sha256_init(struct shash_desc *desc)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
+	kernel_neon_begin();
+	sha256_base_do_update(desc, data, len, sha2_ce_transform, NULL);
+	kernel_neon_end();
 
-	*sctx = (struct sha256_state){
-		.state = {
-			SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
-			SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
-		}
-	};
 	return 0;
 }
 
-static int sha2_update(struct shash_desc *desc, const u8 *data,
-		       unsigned int len)
+static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
+			 unsigned int len, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial;
-
 	if (!may_use_simd())
-		return crypto_sha256_update(desc, data, len);
-
-	partial = sctx->count % SHA256_BLOCK_SIZE;
-	sctx->count += len;
-
-	if ((partial + len) >= SHA256_BLOCK_SIZE) {
-		int blocks;
-
-		if (partial) {
-			int p = SHA256_BLOCK_SIZE - partial;
-
-			memcpy(sctx->buf + partial, data, p);
-			data += p;
-			len -= p;
-		}
-
-		blocks = len / SHA256_BLOCK_SIZE;
-		len %= SHA256_BLOCK_SIZE;
+		return crypto_sha256_arm_finup(desc, data, len, out);
 
-		kernel_neon_begin();
-		sha2_ce_transform(blocks, data, sctx->state,
-				  partial ? sctx->buf : NULL);
-		kernel_neon_end();
-
-		data += blocks * SHA256_BLOCK_SIZE;
-		partial = 0;
-	}
+	kernel_neon_begin();
 	if (len)
-		memcpy(sctx->buf + partial, data, len);
-	return 0;
-}
-
-static void sha2_final(struct shash_desc *desc)
-{
-	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
+		sha256_base_do_update(desc, data, len,
+				      sha2_ce_transform, NULL);
+	sha256_base_do_finalize(desc, sha2_ce_transform, NULL);
+	kernel_neon_end();
 
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be64 bits = cpu_to_be64(sctx->count << 3);
-	u32 padlen = SHA256_BLOCK_SIZE
-		     - ((sctx->count + sizeof(bits)) % SHA256_BLOCK_SIZE);
-
-	sha2_update(desc, padding, padlen);
-	sha2_update(desc, (const u8 *)&bits, sizeof(bits));
+	return sha256_base_finish(desc, out);
 }
 
-static int sha224_final(struct shash_desc *desc, u8 *out)
+static int sha2_ce_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	sha2_final(desc);
-
-	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha256_state){};
-	return 0;
-}
-
-static int sha256_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	sha2_final(desc);
-
-	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha256_state){};
-	return 0;
-}
-
-static int sha2_export(struct shash_desc *desc, void *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	struct sha256_state *dst = out;
-
-	*dst = *sctx;
-	return 0;
-}
-
-static int sha2_import(struct shash_desc *desc, const void *in)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	struct sha256_state const *src = in;
-
-	*sctx = *src;
-	return 0;
+	return sha2_ce_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg algs[] = { {
-	.init			= sha224_init,
-	.update			= sha2_update,
-	.final			= sha224_final,
-	.export			= sha2_export,
-	.import			= sha2_import,
+	.init			= sha224_base_init,
+	.update			= sha2_ce_update,
+	.final			= sha2_ce_final,
+	.finup			= sha2_ce_finup,
+	.export			= sha256_base_export,
+	.import			= sha256_base_import,
 	.descsize		= sizeof(struct sha256_state),
 	.digestsize		= SHA224_DIGEST_SIZE,
 	.statesize		= sizeof(struct sha256_state),
@@ -169,11 +84,12 @@ static struct shash_alg algs[] = { {
 		.cra_module		= THIS_MODULE,
 	}
 }, {
-	.init			= sha256_init,
-	.update			= sha2_update,
-	.final			= sha256_final,
-	.export			= sha2_export,
-	.import			= sha2_import,
+	.init			= sha256_base_init,
+	.update			= sha2_ce_update,
+	.final			= sha2_ce_final,
+	.finup			= sha2_ce_finup,
+	.export			= sha256_base_export,
+	.import			= sha256_base_import,
 	.descsize		= sizeof(struct sha256_state),
 	.digestsize		= SHA256_DIGEST_SIZE,
 	.statesize		= sizeof(struct sha256_state),
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread
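
Note: with CRYPTO_SHA2_ARM_CE now selecting CRYPTO_SHA256_ARM, the
!may_use_simd() fallback goes to the scalar ARM assembly rather than
the generic C implementation. Callers never see any of this; they pick
up whichever provider wins the priority race. A usage sketch (standard
crypto API calls, nothing specific to this series; error handling
trimmed for brevity):

#include <crypto/hash.h>
#include <crypto/sha.h>
#include <linux/err.h>

static int sketch_sha256_digest(const u8 *msg, unsigned int len,
				u8 out[SHA256_DIGEST_SIZE])
{
	struct crypto_shash *tfm;
	int err;

	/* binds to the highest-priority "sha256" provider: CE, NEON,
	 * scalar asm or generic, depending on the running CPU */
	tfm = crypto_alloc_shash("sha256", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	{
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		desc->flags = 0;	/* no CRYPTO_TFM_REQ_MAY_SLEEP */
		err = crypto_shash_digest(desc, msg, len, out);
	}

	crypto_free_shash(tfm);
	return err;
}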

* [PATCH v3 12/16] crypto/arm64: move SHA-1 ARMv8 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:52   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:52 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/sha1-ce-core.S |  11 ++--
 arch/arm64/crypto/sha1-ce-glue.c | 133 +++++++--------------------------------
 2 files changed, 31 insertions(+), 113 deletions(-)

diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 09d57d98609c..a2c3ad51286b 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -131,15 +131,18 @@ CPU_LE(	rev32		v11.16b, v11.16b	)
 
 	/*
 	 * Final block: add padding and total bit count.
-	 * Skip if we have no total byte count in x4. In that case, the input
-	 * size was not a round multiple of the block size, and the padding is
-	 * handled by the C code.
+	 * Skip if the input size was not a round multiple of the block size;
+	 * the padding is handled by the C code in that case.
 	 */
 	cbz		x4, 3f
+	ldr		x5, [x2, #-8]		// sha1_state::count
+	tst		x5, #0x3f		// round multiple of block size?
+	b.ne		3f
+	str		wzr, [x4]
 	movi		v9.2d, #0
 	mov		x8, #0x80000000
 	movi		v10.2d, #0
-	ror		x7, x4, #29		// ror(lsl(x4, 3), 32)
+	ror		x7, x5, #29		// ror(lsl(x5, 3), 32)
 	fmov		d8, x8
 	mov		x4, #0
 	mov		v11.d[0], xzr
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index 6fe83f37a750..141d5f3d7389 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -12,6 +12,7 @@
 #include <asm/unaligned.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <linux/cpufeature.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
@@ -21,132 +22,46 @@ MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
 asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
-				  u8 *head, long bytes);
+				  const u8 *head, void *p);
 
-static int sha1_init(struct shash_desc *desc)
+static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	*sctx = (struct sha1_state){
-		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-	};
-	return 0;
-}
-
-static int sha1_update(struct shash_desc *desc, const u8 *data,
-		       unsigned int len)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
-
-	sctx->count += len;
-
-	if ((partial + len) >= SHA1_BLOCK_SIZE) {
-		int blocks;
-
-		if (partial) {
-			int p = SHA1_BLOCK_SIZE - partial;
-
-			memcpy(sctx->buffer + partial, data, p);
-			data += p;
-			len -= p;
-		}
-
-		blocks = len / SHA1_BLOCK_SIZE;
-		len %= SHA1_BLOCK_SIZE;
-
-		kernel_neon_begin_partial(16);
-		sha1_ce_transform(blocks, data, sctx->state,
-				  partial ? sctx->buffer : NULL, 0);
-		kernel_neon_end();
-
-		data += blocks * SHA1_BLOCK_SIZE;
-		partial = 0;
-	}
-	if (len)
-		memcpy(sctx->buffer + partial, data, len);
-	return 0;
-}
-
-static int sha1_final(struct shash_desc *desc, u8 *out)
-{
-	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	__be64 bits = cpu_to_be64(sctx->count << 3);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	u32 padlen = SHA1_BLOCK_SIZE
-		     - ((sctx->count + sizeof(bits)) % SHA1_BLOCK_SIZE);
-
-	sha1_update(desc, padding, padlen);
-	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
-
-	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
+	kernel_neon_begin_partial(16);
+	sha1_base_do_update(desc, data, len, sha1_ce_transform, NULL);
+	kernel_neon_end();
 
-	*sctx = (struct sha1_state){};
 	return 0;
 }
 
-static int sha1_finup(struct shash_desc *desc, const u8 *data,
-		      unsigned int len, u8 *out)
+static int sha1_ce_finup(struct shash_desc *desc, const u8 *data,
+			 unsigned int len, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int blocks;
-	int i;
-
-	if (sctx->count || !len || (len % SHA1_BLOCK_SIZE)) {
-		sha1_update(desc, data, len);
-		return sha1_final(desc, out);
-	}
-
-	/*
-	 * Use a fast path if the input is a multiple of 64 bytes. In
-	 * this case, there is no need to copy data around, and we can
-	 * perform the entire digest calculation in a single invocation
-	 * of sha1_ce_transform()
-	 */
-	blocks = len / SHA1_BLOCK_SIZE;
+	u32 finalize = 1;
 
 	kernel_neon_begin_partial(16);
-	sha1_ce_transform(blocks, data, sctx->state, NULL, len);
+	if (len)
+		sha1_base_do_update(desc, data, len, sha1_ce_transform,
+				    &finalize);
+	if (finalize)
+		sha1_base_do_finalize(desc, sha1_ce_transform, NULL);
 	kernel_neon_end();
 
-	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha1_state){};
-	return 0;
+	return sha1_base_finish(desc, out);
 }
 
-static int sha1_export(struct shash_desc *desc, void *out)
+static int sha1_ce_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	struct sha1_state *dst = out;
-
-	*dst = *sctx;
-	return 0;
-}
-
-static int sha1_import(struct shash_desc *desc, const void *in)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	struct sha1_state const *src = in;
-
-	*sctx = *src;
-	return 0;
+	return sha1_ce_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg alg = {
-	.init			= sha1_init,
-	.update			= sha1_update,
-	.final			= sha1_final,
-	.finup			= sha1_finup,
-	.export			= sha1_export,
-	.import			= sha1_import,
+	.init			= sha1_base_init,
+	.update			= sha1_ce_update,
+	.final			= sha1_ce_final,
+	.finup			= sha1_ce_finup,
+	.export			= sha1_base_export,
+	.import			= sha1_base_import,
 	.descsize		= sizeof(struct sha1_state),
 	.digestsize		= SHA1_DIGEST_SIZE,
 	.statesize		= sizeof(struct sha1_state),
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread
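
Note: the interesting part of this patch is the finalize round trip
through the generic pointer: sha1_ce_finup() seeds a u32 flag with 1
and passes its address, and the core pads and finalizes in-line only
when the total count is a round multiple of the block size (the one
case where no partial data sits in sctx->buffer), clearing the flag
with 'str wzr, [x4]' to tell the C side it is done. A hypothetical C
model of a block function honouring that protocol (process_block() is
a stand-in for one compression step over a 64-byte block):

#include <crypto/sha.h>
#include <linux/stddef.h>
#include <linux/string.h>
#include <asm/byteorder.h>

static void process_block(u32 *state, const u8 *src);	/* stand-in */

static void model_sha1_block_fn(int blocks, u8 const *src, u32 *state,
				const u8 *head, void *p)
{
	/* count sits right before state[] in struct sha1_state; this
	 * mirrors the 'ldr x5, [x2, #-8]' in sha1-ce-core.S */
	struct sha1_state *sctx = (struct sha1_state *)
		((char *)state - offsetof(struct sha1_state, state));
	u32 *finalize = p;

	if (head)
		process_block(state, head);
	while (blocks--) {
		process_block(state, src);
		src += SHA1_BLOCK_SIZE;
	}

	if (finalize && !(sctx->count % SHA1_BLOCK_SIZE)) {
		/* message ended on a block boundary: the padding is
		 * exactly 0x80, 55 zero bytes and the 64-bit count */
		u8 pad[SHA1_BLOCK_SIZE] = { 0x80, };
		__be64 bits = cpu_to_be64(sctx->count << 3);

		memcpy(pad + SHA1_BLOCK_SIZE - sizeof(bits), &bits,
		       sizeof(bits));
		process_block(state, pad);
		*finalize = 0;	/* C side skips do_finalize() */
	}
}

When the base layer calls the block function from do_finalize(), p is
NULL, so the in-line padding path is never taken twice.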

* [PATCH v3 13/16] crypto/arm64: move SHA-224/256 ARMv8 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:52   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:52 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/sha2-ce-core.S |  11 ++-
 arch/arm64/crypto/sha2-ce-glue.c | 209 ++++++---------------------------------
 2 files changed, 38 insertions(+), 182 deletions(-)

diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 7f29fc031ea8..65ad56636fba 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -135,15 +135,18 @@ CPU_LE(	rev32		v19.16b, v19.16b	)
 
 	/*
 	 * Final block: add padding and total bit count.
-	 * Skip if we have no total byte count in x4. In that case, the input
-	 * size was not a round multiple of the block size, and the padding is
-	 * handled by the C code.
+	 * Skip if the input size was not a round multiple of the block size;
+	 * the padding is handled by the C code in that case.
 	 */
 	cbz		x4, 3f
+	ldr		x5, [x2, #-8]		// sha256_state::count
+	tst		x5, #0x3f		// round multiple of block size?
+	b.ne		3f
+	str		wzr, [x4]
 	movi		v17.2d, #0
 	mov		x8, #0x80000000
 	movi		v18.2d, #0
-	ror		x7, x4, #29		// ror(lsl(x4, 3), 32)
+	ror		x7, x5, #29		// ror(lsl(x5, 3), 32)
 	fmov		d16, x8
 	mov		x4, #0
 	mov		v19.d[0], xzr
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index ae67e88c28b9..91ac3682a730 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -12,6 +12,7 @@
 #include <asm/unaligned.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha256_base.h>
 #include <linux/cpufeature.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
@@ -20,195 +21,47 @@ MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
-asmlinkage int sha2_ce_transform(int blocks, u8 const *src, u32 *state,
-				 u8 *head, long bytes);
+asmlinkage void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+				  const u8 *head, void *p);
 
-static int sha224_init(struct shash_desc *desc)
+static int sha256_ce_update(struct shash_desc *desc, const u8 *data,
+			    unsigned int len)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	*sctx = (struct sha256_state){
-		.state = {
-			SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
-			SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
-		}
-	};
-	return 0;
-}
-
-static int sha256_init(struct shash_desc *desc)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	*sctx = (struct sha256_state){
-		.state = {
-			SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
-			SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
-		}
-	};
-	return 0;
-}
-
-static int sha2_update(struct shash_desc *desc, const u8 *data,
-		       unsigned int len)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
-
-	sctx->count += len;
-
-	if ((partial + len) >= SHA256_BLOCK_SIZE) {
-		int blocks;
-
-		if (partial) {
-			int p = SHA256_BLOCK_SIZE - partial;
-
-			memcpy(sctx->buf + partial, data, p);
-			data += p;
-			len -= p;
-		}
-
-		blocks = len / SHA256_BLOCK_SIZE;
-		len %= SHA256_BLOCK_SIZE;
-
-		kernel_neon_begin_partial(28);
-		sha2_ce_transform(blocks, data, sctx->state,
-				  partial ? sctx->buf : NULL, 0);
-		kernel_neon_end();
-
-		data += blocks * SHA256_BLOCK_SIZE;
-		partial = 0;
-	}
-	if (len)
-		memcpy(sctx->buf + partial, data, len);
-	return 0;
-}
-
-static void sha2_final(struct shash_desc *desc)
-{
-	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
-
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be64 bits = cpu_to_be64(sctx->count << 3);
-	u32 padlen = SHA256_BLOCK_SIZE
-		     - ((sctx->count + sizeof(bits)) % SHA256_BLOCK_SIZE);
-
-	sha2_update(desc, padding, padlen);
-	sha2_update(desc, (const u8 *)&bits, sizeof(bits));
-}
-
-static int sha224_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	sha2_final(desc);
-
-	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha256_state){};
-	return 0;
-}
-
-static int sha256_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	sha2_final(desc);
-
-	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha256_state){};
-	return 0;
-}
-
-static void sha2_finup(struct shash_desc *desc, const u8 *data,
-		       unsigned int len)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	int blocks;
-
-	if (sctx->count || !len || (len % SHA256_BLOCK_SIZE)) {
-		sha2_update(desc, data, len);
-		sha2_final(desc);
-		return;
-	}
-
-	/*
-	 * Use a fast path if the input is a multiple of 64 bytes. In
-	 * this case, there is no need to copy data around, and we can
-	 * perform the entire digest calculation in a single invocation
-	 * of sha2_ce_transform()
-	 */
-	blocks = len / SHA256_BLOCK_SIZE;
-
 	kernel_neon_begin_partial(28);
-	sha2_ce_transform(blocks, data, sctx->state, NULL, len);
+	sha256_base_do_update(desc, data, len, sha2_ce_transform, NULL);
 	kernel_neon_end();
-}
 
-static int sha224_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	sha2_finup(desc, data, len);
-
-	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha256_state){};
 	return 0;
 }
 
-static int sha256_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
+static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
+			   unsigned int len, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	__be32 *dst = (__be32 *)out;
-	int i;
-
-	sha2_finup(desc, data, len);
-
-	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
-		put_unaligned_be32(sctx->state[i], dst++);
-
-	*sctx = (struct sha256_state){};
-	return 0;
-}
+	u32 finalize = 1;
 
-static int sha2_export(struct shash_desc *desc, void *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	struct sha256_state *dst = out;
+	kernel_neon_begin_partial(28);
+	if (len)
+		sha256_base_do_update(desc, data, len, sha2_ce_transform,
+				      &finalize);
+	if (finalize)
+		sha256_base_do_finalize(desc, sha2_ce_transform, NULL);
+	kernel_neon_end();
 
-	*dst = *sctx;
-	return 0;
+	return sha256_base_finish(desc, out);
 }
 
-static int sha2_import(struct shash_desc *desc, const void *in)
+static int sha256_ce_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	struct sha256_state const *src = in;
-
-	*sctx = *src;
-	return 0;
+	return sha256_ce_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg algs[] = { {
-	.init			= sha224_init,
-	.update			= sha2_update,
-	.final			= sha224_final,
-	.finup			= sha224_finup,
-	.export			= sha2_export,
-	.import			= sha2_import,
+	.init			= sha224_base_init,
+	.update			= sha256_ce_update,
+	.final			= sha256_ce_final,
+	.finup			= sha256_ce_finup,
+	.export			= sha256_base_export,
+	.import			= sha256_base_import,
 	.descsize		= sizeof(struct sha256_state),
 	.digestsize		= SHA224_DIGEST_SIZE,
 	.statesize		= sizeof(struct sha256_state),
@@ -221,12 +74,12 @@ static struct shash_alg algs[] = { {
 		.cra_module		= THIS_MODULE,
 	}
 }, {
-	.init			= sha256_init,
-	.update			= sha2_update,
-	.final			= sha256_final,
-	.finup			= sha256_finup,
-	.export			= sha2_export,
-	.import			= sha2_import,
+	.init			= sha256_base_init,
+	.update			= sha256_ce_update,
+	.final			= sha256_ce_final,
+	.finup			= sha256_ce_finup,
+	.export			= sha256_base_export,
+	.import			= sha256_base_import,
 	.descsize		= sizeof(struct sha256_state),
 	.digestsize		= SHA256_DIGEST_SIZE,
 	.statesize		= sizeof(struct sha256_state),
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread
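
Note: both CE cores share the 'ror x7, x5, #29' idiom glossed as
ror(lsl(x5, 3), 32). The equivalence is easy to check in C; a
standalone sketch, assuming the byte count stays below 2^61 so the
shift to a bit count cannot overflow:

#include <stdint.h>

static inline uint64_t ror64(uint64_t v, unsigned int n)
{
	return (v >> n) | (v << (64 - n));
}

/* ror64(count, 29) == ror64(count << 3, 32) for count < 1ULL << 61:
 * rotating right by 29 is rotating left by 35 = 32 + 3, and a left
 * rotate by 3 equals a left shift by 3 when no bits wrap around. */

One rotate therefore both converts the byte count to a bit count and
swaps its 32-bit halves, matching how the cores assemble the length
words of the final block.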

* [PATCH v3 14/16] crypto/x86: move SHA-1 SSSE3 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:52   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:52 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/crypto/sha1_ssse3_glue.c | 136 +++++++++-----------------------------
 1 file changed, 30 insertions(+), 106 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 6c20fe04a738..8678dc75fbf3 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -28,7 +28,7 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha1_base.h>
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/xsave.h>
@@ -49,127 +49,50 @@ asmlinkage void sha1_transform_avx2(u32 *digest, const char *data,
 
 static asmlinkage void (*sha1_transform_asm)(u32 *, const char *, unsigned int);
 
-
-static int sha1_ssse3_init(struct shash_desc *desc)
+static void sha1_ssse3_block_fn(int blocks, u8 const *src, u32 *state,
+				const u8 *head, void *p)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	*sctx = (struct sha1_state){
-		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-	};
-
-	return 0;
-}
-
-static int __sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
-			       unsigned int len, unsigned int partial)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int done = 0;
-
-	sctx->count += len;
-
-	if (partial) {
-		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->buffer + partial, data, done);
-		sha1_transform_asm(sctx->state, sctx->buffer, 1);
-	}
-
-	if (len - done >= SHA1_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-
-		sha1_transform_asm(sctx->state, data + done, rounds);
-		done += rounds * SHA1_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buffer, data + done, len - done);
-
-	return 0;
+	if (head)
+		sha1_transform_asm(state, head, 1);
+	if (blocks)
+		sha1_transform_asm(state, src, blocks);
 }
 
 static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
 	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
-	int res;
 
-	/* Handle the fast case right here */
-	if (partial + len < SHA1_BLOCK_SIZE) {
-		sctx->count += len;
-		memcpy(sctx->buffer + partial, data, len);
+	if (!irq_fpu_usable() ||
+	    (sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
+		return crypto_sha1_update(desc, data, len);
 
-		return 0;
-	}
-
-	if (!irq_fpu_usable()) {
-		res = crypto_sha1_update(desc, data, len);
-	} else {
-		kernel_fpu_begin();
-		res = __sha1_ssse3_update(desc, data, len, partial);
-		kernel_fpu_end();
-	}
-
-	return res;
-}
-
-
-/* Add padding and return the message digest. */
-static int sha1_ssse3_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 and append length */
-	index = sctx->count % SHA1_BLOCK_SIZE;
-	padlen = (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
-	if (!irq_fpu_usable()) {
-		crypto_sha1_update(desc, padding, padlen);
-		crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
-	} else {
-		kernel_fpu_begin();
-		/* We need to fill a whole block for __sha1_ssse3_update() */
-		if (padlen <= 56) {
-			sctx->count += padlen;
-			memcpy(sctx->buffer + index, padding, padlen);
-		} else {
-			__sha1_ssse3_update(desc, padding, padlen, index);
-		}
-		__sha1_ssse3_update(desc, (const u8 *)&bits, sizeof(bits), 56);
-		kernel_fpu_end();
-	}
-
-	/* Store state in digest */
-	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
-
-	/* Wipe context */
-	memset(sctx, 0, sizeof(*sctx));
+	kernel_fpu_begin();
+	sha1_base_do_update(desc, data, len, sha1_ssse3_block_fn, NULL);
+	kernel_fpu_end();
 
 	return 0;
 }
 
-static int sha1_ssse3_export(struct shash_desc *desc, void *out)
+static int sha1_ssse3_finup(struct shash_desc *desc, const u8 *data,
+			      unsigned int len, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
+	if (!irq_fpu_usable())
+		return crypto_sha1_finup(desc, data, len, out);
 
-	memcpy(out, sctx, sizeof(*sctx));
+	kernel_fpu_begin();
+	if (len)
+		sha1_base_do_update(desc, data, len, sha1_ssse3_block_fn, NULL);
+	sha1_base_do_finalize(desc, sha1_ssse3_block_fn, NULL);
+	kernel_fpu_end();
 
-	return 0;
+	return sha1_base_finish(desc, out);
 }
 
-static int sha1_ssse3_import(struct shash_desc *desc, const void *in)
+/* Add padding and return the message digest. */
+static int sha1_ssse3_final(struct shash_desc *desc, u8 *out)
 {
-	struct sha1_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(sctx, in, sizeof(*sctx));
-
-	return 0;
+	return sha1_ssse3_finup(desc, NULL, 0, out);
 }
 
 #ifdef CONFIG_AS_AVX2
@@ -186,11 +109,12 @@ static void sha1_apply_transform_avx2(u32 *digest, const char *data,
 
 static struct shash_alg alg = {
 	.digestsize	=	SHA1_DIGEST_SIZE,
-	.init		=	sha1_ssse3_init,
+	.init		=	sha1_base_init,
 	.update		=	sha1_ssse3_update,
 	.final		=	sha1_ssse3_final,
-	.export		=	sha1_ssse3_export,
-	.import		=	sha1_ssse3_import,
+	.finup		=	sha1_ssse3_finup,
+	.export		=	sha1_base_export,
+	.import		=	sha1_base_import,
 	.descsize	=	sizeof(struct sha1_state),
 	.statesize	=	sizeof(struct sha1_state),
 	.base		=	{
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 15/16] crypto/x86: move SHA-224/256 SSSE3 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:52   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:52 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/crypto/sha256_ssse3_glue.c | 184 +++++++-----------------------------
 1 file changed, 36 insertions(+), 148 deletions(-)

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 8fad72f4dfd2..bd4ae0da0a49 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -36,7 +36,7 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha256_base.h>
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/xsave.h>
@@ -55,174 +55,61 @@ asmlinkage void sha256_transform_rorx(const char *data, u32 *digest,
 
 static asmlinkage void (*sha256_transform_asm)(const char *, u32 *, u64);
 
-
-static int sha256_ssse3_init(struct shash_desc *desc)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	sctx->state[0] = SHA256_H0;
-	sctx->state[1] = SHA256_H1;
-	sctx->state[2] = SHA256_H2;
-	sctx->state[3] = SHA256_H3;
-	sctx->state[4] = SHA256_H4;
-	sctx->state[5] = SHA256_H5;
-	sctx->state[6] = SHA256_H6;
-	sctx->state[7] = SHA256_H7;
-	sctx->count = 0;
-
-	return 0;
-}
-
-static int __sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
-			       unsigned int len, unsigned int partial)
+static void sha256_ssse3_block_fn(int blocks, u8 const *src, u32 *state,
+				  const u8 *head, void *p)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int done = 0;
-
-	sctx->count += len;
-
-	if (partial) {
-		done = SHA256_BLOCK_SIZE - partial;
-		memcpy(sctx->buf + partial, data, done);
-		sha256_transform_asm(sctx->buf, sctx->state, 1);
-	}
-
-	if (len - done >= SHA256_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA256_BLOCK_SIZE;
-
-		sha256_transform_asm(data + done, sctx->state, (u64) rounds);
-
-		done += rounds * SHA256_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buf, data + done, len - done);
-
-	return 0;
+	if (head)
+		sha256_transform_asm(head, state, 1);
+	if (blocks)
+		sha256_transform_asm(src, state, blocks);
 }
 
 static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
 	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
-	int res;
 
-	/* Handle the fast case right here */
-	if (partial + len < SHA256_BLOCK_SIZE) {
-		sctx->count += len;
-		memcpy(sctx->buf + partial, data, len);
-
-		return 0;
-	}
-
-	if (!irq_fpu_usable()) {
-		res = crypto_sha256_update(desc, data, len);
-	} else {
-		kernel_fpu_begin();
-		res = __sha256_ssse3_update(desc, data, len, partial);
-		kernel_fpu_end();
-	}
+	if (!irq_fpu_usable() ||
+	    (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
+		return crypto_sha256_update(desc, data, len);
 
-	return res;
-}
-
-
-/* Add padding and return the message digest. */
-static int sha256_ssse3_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be32 *dst = (__be32 *)out;
-	__be64 bits;
-	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
-
-	bits = cpu_to_be64(sctx->count << 3);
-
-	/* Pad out to 56 mod 64 and append length */
-	index = sctx->count % SHA256_BLOCK_SIZE;
-	padlen = (index < 56) ? (56 - index) : ((SHA256_BLOCK_SIZE+56)-index);
-
-	if (!irq_fpu_usable()) {
-		crypto_sha256_update(desc, padding, padlen);
-		crypto_sha256_update(desc, (const u8 *)&bits, sizeof(bits));
-	} else {
-		kernel_fpu_begin();
-		/* We need to fill a whole block for __sha256_ssse3_update() */
-		if (padlen <= 56) {
-			sctx->count += padlen;
-			memcpy(sctx->buf + index, padding, padlen);
-		} else {
-			__sha256_ssse3_update(desc, padding, padlen, index);
-		}
-		__sha256_ssse3_update(desc, (const u8 *)&bits,
-					sizeof(bits), 56);
-		kernel_fpu_end();
-	}
-
-	/* Store state in digest */
-	for (i = 0; i < 8; i++)
-		dst[i] = cpu_to_be32(sctx->state[i]);
-
-	/* Wipe context */
-	memset(sctx, 0, sizeof(*sctx));
+	kernel_fpu_begin();
+	sha256_base_do_update(desc, data, len, sha256_ssse3_block_fn, NULL);
+	kernel_fpu_end();
 
 	return 0;
 }
 
-static int sha256_ssse3_export(struct shash_desc *desc, void *out)
+static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
+			      unsigned int len, u8 *out)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
+	if (!irq_fpu_usable())
+		return crypto_sha256_finup(desc, data, len, out);
 
-	memcpy(out, sctx, sizeof(*sctx));
+	kernel_fpu_begin();
+	if (len)
+		sha256_base_do_update(desc, data, len, sha256_ssse3_block_fn,
+				      NULL);
+	sha256_base_do_finalize(desc, sha256_ssse3_block_fn, NULL);
+	kernel_fpu_end();
 
-	return 0;
-}
-
-static int sha256_ssse3_import(struct shash_desc *desc, const void *in)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(sctx, in, sizeof(*sctx));
-
-	return 0;
+	return sha256_base_finish(desc, out);
 }
 
-static int sha224_ssse3_init(struct shash_desc *desc)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	sctx->state[0] = SHA224_H0;
-	sctx->state[1] = SHA224_H1;
-	sctx->state[2] = SHA224_H2;
-	sctx->state[3] = SHA224_H3;
-	sctx->state[4] = SHA224_H4;
-	sctx->state[5] = SHA224_H5;
-	sctx->state[6] = SHA224_H6;
-	sctx->state[7] = SHA224_H7;
-	sctx->count = 0;
-
-	return 0;
-}
-
-static int sha224_ssse3_final(struct shash_desc *desc, u8 *hash)
+/* Add padding and return the message digest. */
+static int sha256_ssse3_final(struct shash_desc *desc, u8 *out)
 {
-	u8 D[SHA256_DIGEST_SIZE];
-
-	sha256_ssse3_final(desc, D);
-
-	memcpy(hash, D, SHA224_DIGEST_SIZE);
-	memzero_explicit(D, SHA256_DIGEST_SIZE);
-
-	return 0;
+	return sha256_ssse3_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg algs[] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
-	.init		=	sha256_ssse3_init,
+	.init		=	sha256_base_init,
 	.update		=	sha256_ssse3_update,
 	.final		=	sha256_ssse3_final,
-	.export		=	sha256_ssse3_export,
-	.import		=	sha256_ssse3_import,
+	.finup		=	sha256_ssse3_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
@@ -235,11 +122,12 @@ static struct shash_alg algs[] = { {
 	}
 }, {
 	.digestsize	=	SHA224_DIGEST_SIZE,
-	.init		=	sha224_ssse3_init,
+	.init		=	sha224_base_init,
 	.update		=	sha256_ssse3_update,
-	.final		=	sha224_ssse3_final,
-	.export		=	sha256_ssse3_export,
-	.import		=	sha256_ssse3_import,
+	.final		=	sha256_ssse3_final,
+	.finup		=	sha256_ssse3_finup,
+	.export		=	sha256_base_export,
+	.import		=	sha256_base_import,
 	.descsize	=	sizeof(struct sha256_state),
 	.statesize	=	sizeof(struct sha256_state),
 	.base		=	{
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 16/16] crypto/x86: move SHA-384/512 SSSE3 implementation to base layer
  2015-04-07  8:51 ` Ard Biesheuvel
@ 2015-04-07  8:52   ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-07  8:52 UTC (permalink / raw)
  To: linux-crypto, linux-arm-kernel, x86, herbert, samitolvanen,
	jussi.kivilinna
  Cc: stockhausen, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/crypto/sha512_ssse3_glue.c | 193 +++++++-----------------------------
 1 file changed, 36 insertions(+), 157 deletions(-)

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 0b6af26832bf..4daa27a5d347 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -34,7 +34,7 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha512_base.h>
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/xsave.h>
@@ -54,183 +54,61 @@ asmlinkage void sha512_transform_rorx(const char *data, u64 *digest,
 
 static asmlinkage void (*sha512_transform_asm)(const char *, u64 *, u64);
 
-
-static int sha512_ssse3_init(struct shash_desc *desc)
-{
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-
-	sctx->state[0] = SHA512_H0;
-	sctx->state[1] = SHA512_H1;
-	sctx->state[2] = SHA512_H2;
-	sctx->state[3] = SHA512_H3;
-	sctx->state[4] = SHA512_H4;
-	sctx->state[5] = SHA512_H5;
-	sctx->state[6] = SHA512_H6;
-	sctx->state[7] = SHA512_H7;
-	sctx->count[0] = sctx->count[1] = 0;
-
-	return 0;
-}
-
-static int __sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
-			       unsigned int len, unsigned int partial)
+static void sha512_ssse3_block_fn(int blocks, u8 const *src, u64 *state,
+				  const u8 *head, void *p)
 {
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-	unsigned int done = 0;
-
-	sctx->count[0] += len;
-	if (sctx->count[0] < len)
-		sctx->count[1]++;
-
-	if (partial) {
-		done = SHA512_BLOCK_SIZE - partial;
-		memcpy(sctx->buf + partial, data, done);
-		sha512_transform_asm(sctx->buf, sctx->state, 1);
-	}
-
-	if (len - done >= SHA512_BLOCK_SIZE) {
-		const unsigned int rounds = (len - done) / SHA512_BLOCK_SIZE;
-
-		sha512_transform_asm(data + done, sctx->state, (u64) rounds);
-
-		done += rounds * SHA512_BLOCK_SIZE;
-	}
-
-	memcpy(sctx->buf, data + done, len - done);
-
-	return 0;
+	if (head)
+		sha512_transform_asm(head, state, 1);
+	if (blocks)
+		sha512_transform_asm(src, state, blocks);
 }
 
 static int sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
 	struct sha512_state *sctx = shash_desc_ctx(desc);
-	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
-	int res;
-
-	/* Handle the fast case right here */
-	if (partial + len < SHA512_BLOCK_SIZE) {
-		sctx->count[0] += len;
-		if (sctx->count[0] < len)
-			sctx->count[1]++;
-		memcpy(sctx->buf + partial, data, len);
-
-		return 0;
-	}
-
-	if (!irq_fpu_usable()) {
-		res = crypto_sha512_update(desc, data, len);
-	} else {
-		kernel_fpu_begin();
-		res = __sha512_ssse3_update(desc, data, len, partial);
-		kernel_fpu_end();
-	}
-
-	return res;
-}
-
-
-/* Add padding and return the message digest. */
-static int sha512_ssse3_final(struct shash_desc *desc, u8 *out)
-{
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-	unsigned int i, index, padlen;
-	__be64 *dst = (__be64 *)out;
-	__be64 bits[2];
-	static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, };
-
-	/* save number of bits */
-	bits[1] = cpu_to_be64(sctx->count[0] << 3);
-	bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
-
-	/* Pad out to 112 mod 128 and append length */
-	index = sctx->count[0] & 0x7f;
-	padlen = (index < 112) ? (112 - index) : ((128+112) - index);
-
-	if (!irq_fpu_usable()) {
-		crypto_sha512_update(desc, padding, padlen);
-		crypto_sha512_update(desc, (const u8 *)&bits, sizeof(bits));
-	} else {
-		kernel_fpu_begin();
-		/* We need to fill a whole block for __sha512_ssse3_update() */
-		if (padlen <= 112) {
-			sctx->count[0] += padlen;
-			if (sctx->count[0] < padlen)
-				sctx->count[1]++;
-			memcpy(sctx->buf + index, padding, padlen);
-		} else {
-			__sha512_ssse3_update(desc, padding, padlen, index);
-		}
-		__sha512_ssse3_update(desc, (const u8 *)&bits,
-					sizeof(bits), 112);
-		kernel_fpu_end();
-	}
-
-	/* Store state in digest */
-	for (i = 0; i < 8; i++)
-		dst[i] = cpu_to_be64(sctx->state[i]);
-
-	/* Wipe context */
-	memset(sctx, 0, sizeof(*sctx));
-
-	return 0;
-}
-
-static int sha512_ssse3_export(struct shash_desc *desc, void *out)
-{
-	struct sha512_state *sctx = shash_desc_ctx(desc);
-
-	memcpy(out, sctx, sizeof(*sctx));
 
-	return 0;
-}
-
-static int sha512_ssse3_import(struct shash_desc *desc, const void *in)
-{
-	struct sha512_state *sctx = shash_desc_ctx(desc);
+	if (!irq_fpu_usable() ||
+	    (sctx->count[0] % SHA512_BLOCK_SIZE) + len < SHA512_BLOCK_SIZE)
+		return crypto_sha512_update(desc, data, len);
 
-	memcpy(sctx, in, sizeof(*sctx));
+	kernel_fpu_begin();
+	sha512_base_do_update(desc, data, len, sha512_ssse3_block_fn, NULL);
+	kernel_fpu_end();
 
 	return 0;
 }
 
-static int sha384_ssse3_init(struct shash_desc *desc)
+static int sha512_ssse3_finup(struct shash_desc *desc, const u8 *data,
+			      unsigned int len, u8 *out)
 {
-	struct sha512_state *sctx = shash_desc_ctx(desc);
+	if (!irq_fpu_usable())
+		return crypto_sha512_finup(desc, data, len, out);
 
-	sctx->state[0] = SHA384_H0;
-	sctx->state[1] = SHA384_H1;
-	sctx->state[2] = SHA384_H2;
-	sctx->state[3] = SHA384_H3;
-	sctx->state[4] = SHA384_H4;
-	sctx->state[5] = SHA384_H5;
-	sctx->state[6] = SHA384_H6;
-	sctx->state[7] = SHA384_H7;
+	kernel_fpu_begin();
+	if (len)
+		sha512_base_do_update(desc, data, len, sha512_ssse3_block_fn,
+				      NULL);
+	sha512_base_do_finalize(desc, sha512_ssse3_block_fn, NULL);
+	kernel_fpu_end();
 
-	sctx->count[0] = sctx->count[1] = 0;
-
-	return 0;
+	return sha512_base_finish(desc, out);
 }
 
-static int sha384_ssse3_final(struct shash_desc *desc, u8 *hash)
+/* Add padding and return the message digest. */
+static int sha512_ssse3_final(struct shash_desc *desc, u8 *out)
 {
-	u8 D[SHA512_DIGEST_SIZE];
-
-	sha512_ssse3_final(desc, D);
-
-	memcpy(hash, D, SHA384_DIGEST_SIZE);
-	memzero_explicit(D, SHA512_DIGEST_SIZE);
-
-	return 0;
+	return sha512_ssse3_finup(desc, NULL, 0, out);
 }
 
 static struct shash_alg algs[] = { {
 	.digestsize	=	SHA512_DIGEST_SIZE,
-	.init		=	sha512_ssse3_init,
+	.init		=	sha512_base_init,
 	.update		=	sha512_ssse3_update,
 	.final		=	sha512_ssse3_final,
-	.export		=	sha512_ssse3_export,
-	.import		=	sha512_ssse3_import,
+	.finup		=	sha512_ssse3_finup,
+	.export		=	sha512_base_export,
+	.import		=	sha512_base_import,
 	.descsize	=	sizeof(struct sha512_state),
 	.statesize	=	sizeof(struct sha512_state),
 	.base		=	{
@@ -243,11 +121,12 @@ static struct shash_alg algs[] = { {
 	}
 },  {
 	.digestsize	=	SHA384_DIGEST_SIZE,
-	.init		=	sha384_ssse3_init,
+	.init		=	sha384_base_init,
 	.update		=	sha512_ssse3_update,
-	.final		=	sha384_ssse3_final,
-	.export		=	sha512_ssse3_export,
-	.import		=	sha512_ssse3_import,
+	.final		=	sha512_ssse3_final,
+	.finup		=	sha512_ssse3_finup,
+	.export		=	sha512_base_export,
+	.import		=	sha512_base_import,
 	.descsize	=	sizeof(struct sha512_state),
 	.statesize	=	sizeof(struct sha512_state),
 	.base		=	{
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-07  8:51   ` Ard Biesheuvel
@ 2015-04-08 13:19     ` Herbert Xu
  -1 siblings, 0 replies; 50+ messages in thread
From: Herbert Xu @ 2015-04-08 13:19 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-crypto, linux-arm-kernel, x86, samitolvanen,
	jussi.kivilinna, stockhausen

On Tue, Apr 07, 2015 at 10:51:49AM +0200, Ard Biesheuvel wrote:
>
> +typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
> +			     const u8 *head, void *p);

Does this really need five arguments? First of all we can get rid
of head by just calling this function twice.  The last argument
appears to only be used by arm64 where it is simply another way
of saying (sctx->count + len) % SHA_BLOCK_SIZE != 0.  So why not
get rid of it and just use the conditional?
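
Spelled out as code, the conditional being suggested in place of the
extra argument would be something like this (a sketch, not taken from
any patch in this series):

	bool need_c_padding = ((sctx->count + len) % SHA1_BLOCK_SIZE) != 0;

computed once at the call site, instead of a void pointer threaded
through every block function.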

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 13:19     ` Herbert Xu
@ 2015-04-08 13:25       ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-08 13:25 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On 8 April 2015 at 15:19, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Apr 07, 2015 at 10:51:49AM +0200, Ard Biesheuvel wrote:
>>
>> +typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
>> +                          const u8 *head, void *p);
>
> Does this really need five arguments? First of all we can get rid
> of head by just calling this function twice.

Not having to call the function twice is the whole point. In the arm64
case, all the SHA-256 round keys can be kept in registers (it has 32
16-byte SIMD registers), and that is what motivates this pattern. By
passing a head block, a pointer to the source, and the generic pointer
(which arm64 uses to finalize the block), we can process all the data
in a single invocation of the block transform.
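
As a minimal sketch of the pattern (my_sha256_transform is a made-up
stand-in for an arch-specific transform, not code from this series):

void my_sha256_transform(u32 *state, const u8 *src, int blocks);

static void my_block_fn(int blocks, u8 const *src, u32 *state,
			const u8 *head, void *p)
{
	if (head)		/* buffered partial block, if any */
		my_sha256_transform(state, head, 1);
	if (blocks)		/* block aligned bulk of the input */
		my_sha256_transform(state, src, blocks);
}

One call into the block function covers both the buffered head block
and the aligned remainder of the input, instead of two separate calls.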

>  The last argument
> appears to only be used by arm64 where it is simply another way
> of saying (sctx->count + len) % SHA_BLOCK_SIZE != 0.  So why not
> get rid of it and just use the conditional?
>

Do note that these are only used by static inline functions, so the
unused arguments are all eliminated from the binary anyway. In fact,
looking at the generated code, the function calls don't use function
pointers at all any more but call the block transform directly, so the
typedef really only serves as a prototype.
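
To illustrate why the indirection disappears, consider this generic
sketch (using the kernel's u8/u32 types; none of these names exist in
the actual base layer):

typedef void (block_fn)(int blocks, const u8 *src, u32 *state);

static inline void do_update(int blocks, const u8 *src, u32 *state,
			     block_fn *fn)
{
	if (blocks)
		fn(blocks, src, state);	/* fn is a compile time constant */
}

static void my_transform(int blocks, const u8 *src, u32 *state)
{
	/* rounds elided */
}

void my_update(int blocks, const u8 *src, u32 *state)
{
	/* passing a literal function name into the static inline lets
	 * the compiler emit a direct call to my_transform (or inline
	 * it) rather than an indirect call through a pointer */
	do_update(blocks, src, state, my_transform);
}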

-- 
Ard.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 13:25       ` Ard Biesheuvel
@ 2015-04-08 13:30         ` Herbert Xu
  -1 siblings, 0 replies; 50+ messages in thread
From: Herbert Xu @ 2015-04-08 13:30 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On Wed, Apr 08, 2015 at 03:25:14PM +0200, Ard Biesheuvel wrote:
>
> Not having to call the function twice is the whole point. In the arm64
> case, all the SHA-256 round keys can be kept in registers (it has 32
> 16-byte SIMD registers), and that is what motivates this pattern. By
> passing a head block, a pointer to the source and the generic pointer
> (which arm64 uses to finalize the block, we can process all data in a
> single invocation of the block transform)

Does this really make any difference? With IPsec the partial code
path is never even going to get executed.

> Do note that these are only used by static inline functions, so the
> unused arguments are all eliminated from the binary anyway. In fact,
> looking at the generated code, the function calls don't use function
> pointers at all anymore,
> but just call the block transform directly, so the typedef is only
> used as a prototype, really.

It's not just the generated code.  The next guy that comes along
and writes a SHA implementation is going to go "WTH is this p
argument?"  I'm not going to add crap to the generic layer just
because ARM needs it.  In fact ARM doesn't even need it.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 13:30         ` Herbert Xu
@ 2015-04-08 13:40           ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-08 13:40 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On 8 April 2015 at 15:30, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Apr 08, 2015 at 03:25:14PM +0200, Ard Biesheuvel wrote:
>>
>> Not having to call the function twice is the whole point. In the arm64
>> case, all the SHA-256 round keys can be kept in registers (it has 32
>> 16-byte SIMD registers), and that is what motivates this pattern. By
>> passing a head block, a pointer to the source and the generic pointer
>> (which arm64 uses to finalize the block, we can process all data in a
>> single invocation of the block transform)
>
> Does this really make any difference? With IPsec the partial code
> path is never even going to get executed.
>

This is not the partial code path; it is the .finup path, in fact.
Anything that hashes data that is often a multiple of the block size
(which is more likely for block-based applications than for IPsec, I
think) should benefit from this. But even if it is not, using a head
block and a pointer to the src eliminates one invocation of the block
transform.

Note that, in the arm64 case, calling a SHA-256 block transform in
non-process context involves:
- stacking the contents of 28 SIMD registers (28 x 16 = 448 bytes)
- loading the SHA-256 constants (16 x 16 = 256 bytes)
- processing the data
- unstacking the contents of 28 SIMD registers (448 bytes)

so anything that can prevent needlessly calling these functions
multiple times in quick succession is going to help, and 'just
calling it twice' just doesn't cut it.

>> Do note that these are only used by static inline functions, so the
>> unused arguments are all eliminated from the binary anyway. In fact,
>> looking at the generated code, the function calls don't use function
>> pointers at all anymore,
>> but just call the block transform directly, so the typedef is only
>> used as a prototype, really.
>
> It's not just the generated code.  The next guy that comes along
> and writes a SHA implementation is going to go WTH is this p
> argument.  I'm not going to add crap to the generic layer just
> because ARM needs it.  In fact ARM doesn't even need it.
>

OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
- the head block
- the generic pointer

The generic pointer is used in the arm64 case to convey the
information that the current invocation of the block transform is the
final one, so that the core code can apply the padding and finalize,
/and/ pass back whether it has done so or not (the latter can easily
be done in the C code as well). I used a generic pointer to allow
other uses, but if you have a better idea for this particular use
case, I'd be happy to hear it.
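
For concreteness, a minimal sketch of how that could look in C; the
helper name and the bool smuggled through the opaque pointer are
illustrative only, not the code in this series:

#include <linux/types.h>

typedef void (sha256_block_fn)(int blocks, u8 const *src, u32 *state,
			       const u8 *head, void *p);

/* hypothetical caller in the base layer: a bool is passed through
 * the opaque pointer, and the transform sets it to true if it was
 * able to pad and finalize the digest itself */
static int sha256_do_finalize_sketch(u32 *state, const u8 *src,
				     int blocks, sha256_block_fn *block_fn)
{
	bool finalized = false;

	block_fn(blocks, src, state, NULL, &finalized);
	if (!finalized) {
		/* input was not a round multiple of the block size:
		 * apply the padding in C and call block_fn once more */
	}
	return 0;
}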

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 13:40           ` Ard Biesheuvel
@ 2015-04-08 13:52             ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-08 13:52 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On 8 April 2015 at 15:40, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 8 April 2015 at 15:30, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> On Wed, Apr 08, 2015 at 03:25:14PM +0200, Ard Biesheuvel wrote:
>>>
>>> Not having to call the function twice is the whole point. In the arm64
>>> case, all the SHA-256 round keys can be kept in registers (it has 32
>>> 16-byte SIMD registers), and that is what motivates this pattern. By
>>> passing a head block, a pointer to the source and the generic pointer
>>> (which arm64 uses to finalize the block), we can process all data in a
>>> single invocation of the block transform.
>>
>> Does this really make any difference? With IPsec the partial code
>> path is never even going to get executed.
>>
>
> This is not the partial code path; it is the .finup path, in fact.
> Anything that hashes data that is often a multiple of the block size
> (which is more likely for block based applications than for IPsec, I
> think) should benefit from this. But even if it is not, using a head
> block and a pointer to the src eliminates one call of the block
> transform.
>
> Note that, in the arm64 case, calling a SHA-256 block transform in
> non-process context involves:
> - stacking the contents of 28 SIMD registers (28 x 16 = 448 bytes)
> - loading the SHA-256 constants (16 x 16 = 256 bytes)
> - processing the data
> - unstacking the contents of 28 SIMD registers (448 bytes)
>
> so anything that can prevent needlessly calling these functions
> multiple times in quick succession is going to help, and 'just
> calling it twice' just doesn't cut it.
>

OK, the stacking/unstacking can be amortized over multiple invocations
of the block transform; only the loading of the round constants cannot.


>>> Do note that these are only used by static inline functions, so the
>>> unused arguments are all eliminated from the binary anyway. In fact,
>>> looking at the generated code, the function calls don't use function
>>> pointers at all anymore,
>>> but just call the block transform directly, so the typedef is only
>>> used as a prototype, really.
>>
>> It's not just the generated code.  The next guy that comes along
>> and writes a SHA implementation is going to go WTH is this p
>> argument.  I'm not going to add crap to the generic layer just
>> because ARM needs it.  In fact ARM doesn't even need it.
>>
>
> OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
> - the head block
> - the generic pointer
>
> The generic pointer is used in the arm64 case to convey the
> information that the current invocation of the block transform is the
> final one, so that the core code can apply the padding and finalize,
> /and/ pass back whether it has done so or not (the latter can easily
> be done in the C code as well). I used a generic pointer to allow
> other uses, but if you have a better idea for this particular use
> case, I'd be happy to hear it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 13:40           ` Ard Biesheuvel
@ 2015-04-08 14:06             ` Herbert Xu
  -1 siblings, 0 replies; 50+ messages in thread
From: Herbert Xu @ 2015-04-08 14:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On Wed, Apr 08, 2015 at 03:40:56PM +0200, Ard Biesheuvel wrote:
>
> This is not the partial code path; it is the .finup path, in fact.
> Anything that hashes data that is often a multiple of the block size
> (which is more likely for block based applications than for IPsec, I
> think) should benefit from this. But even if it is not, using a head
> block and a pointer to the src eliminates one call of the block
> transform.

This would only ever happen if you call update on a partial block,
which should be the slow path anyway.

IPsec only calls digest so it can never do a partial block update
unless you have a badly fragmented skb.

Incidentally, if you did want to deal with such cases in light
of the high overhead that you have, you may want to switch over
to ahash, which would allow you to do the saving and restoring
once per operation rather than once per SG entry.

> OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
> - the head block
> - the generic pointer

Yes, your patches look great otherwise.  I just want to keep things
simple because you're creating a template for all future SHA authors.

> The generic pointer is used in the arm64 case to convey the
> information that the current invocation of the block transform is the
> final one, so that the core code can apply the padding and finalize,
> /and/ pass back whether it has done so or not (the latter can easily
> be done in the C code as well). I used a generic pointer to allow
> other uses, but if you have a better idea for this particular use
> case, I'd be happy to hear it.

As I said in the original email, this appears to be equivalent to

	(sctx->count + len) % SHA1_BLOCK_SIZE != 0

So why not just do that check in C? It'll get compiled into a
simple mask so it should be no slower than a branch on finalize.

In fact if your assembly function always updates sctx->count
then you could simplify that to

	sctx->count % SHA1_BLOCK_SIZE != 0
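
As a rough illustration (the helper name is hypothetical, and it
assumes the transform has already updated sctx->count):

#include <crypto/sha.h>		/* struct sha1_state, SHA1_BLOCK_SIZE */
#include <linux/types.h>

/* sketch: decide in C whether the asm transform already applied the
 * padding, instead of passing that state back through a pointer */
static inline bool sha1_needs_padding(const struct sha1_state *sctx)
{
	/* SHA1_BLOCK_SIZE is a power of two, so the modulus compiles
	 * down to a simple mask of the low bits */
	return sctx->count % SHA1_BLOCK_SIZE != 0;
}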

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 14:06             ` Herbert Xu
@ 2015-04-08 14:18               ` Ard Biesheuvel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ard Biesheuvel @ 2015-04-08 14:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On 8 April 2015 at 16:06, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Apr 08, 2015 at 03:40:56PM +0200, Ard Biesheuvel wrote:
>>
>> This is not the partial code path; it is the .finup path, in fact.
>> Anything that hashes data that is often a multiple of the block size
>> (which is more likely for block based applications than for IPsec, I
>> think) should benefit from this. But even if it is not, using a head
>> block and a pointer to the src eliminates one call of the block
>> transform.
>
> This would only ever happen if you call update on a partial block
> which should be the slow path anyway.
>
> IPsec only calls digest so it can never do a partial block update
> unless you have a badly fragmented skb.
>

OK, so using digest() (which uses finup()), there will never be any
data stashed in the struct::buf[], so the possible invocations of the
block transform are
- first call with all data
- second call with the unaligned chunk at the end + padding
- third call with the end of the padding + count

where the second call is only made in the minority of cases in which
there is no room for the count and the single mandatory padding byte
after the data. And the head block only helps with partial data, not
with reducing the potential number of invocations done for the padding
and count.
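
For reference, the padding that drives this is a single 0x80 byte,
then zeroes, then the 64-bit bit count; whenever fewer than 9 bytes
remain free in the last block, the count spills into an extra block.
A sketch (hypothetical helper, not the code in this series):

#include <crypto/sha.h>
#include <linux/string.h>
#include <asm/byteorder.h>

/* pad the residual data in place; returns how many blocks (1 or 2)
 * the caller still has to run through the block transform */
static int sha1_pad_sketch(u8 buf[2 * SHA1_BLOCK_SIZE],
			   unsigned int partial, u64 total_bytes)
{
	int blocks = (partial + 1 + 8 > SHA1_BLOCK_SIZE) ? 2 : 1;
	__be64 bits = cpu_to_be64(total_bytes << 3);

	buf[partial] = 0x80;
	memset(buf + partial + 1, 0,
	       blocks * SHA1_BLOCK_SIZE - partial - 1 - sizeof(bits));
	memcpy(buf + blocks * SHA1_BLOCK_SIZE - sizeof(bits), &bits,
	       sizeof(bits));
	return blocks;
}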

> Incidentally if you did want to deal with such cases in light
> of the high overhead that you have you may want to switch over
> to ahash which would allow you to do the saving and restoring
> once per operation rather than once per SG entry.
>

OK I will look into that.

>> OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
>> - the head block
>> - the generic pointer
>
> Yes your patches look great otherwise.  I just want to keep things
> simple because you're creating a template for all future SHA authors.
>
>> The generic pointer is used in the arm64 case to convey the
>> information that the current invocation of the block transform is the
>> final one, so that the core code can apply the padding and finalize,
>> /and/ pass back whether it has done so or not (the latter can easily
>> be done in the C code as well). I used a generic pointer to allow
>> other uses, but if you have a better idea for this particular use
>> case, I'd be happy to hear it.
>
> As I said in the original email, this appears to be equivalent to
>
>         (sctx->count + len) % SHA1_BLOCK_SIZE != 0
>
> So why not just do that check in C? It'll get compiled into a
> simple mask so it should be no slower than a branch on finalize.
>
> In fact if your assembly function always updates sctx->count
> then you could simplify that to
>
>         sctx->count % SHA1_BLOCK_SIZE != 0
>

The problem is that the block transform needs to be informed that the
current invocation is the final one, so that it can perform the
padding, append the count, etc. The C code can figure out by looking
at the count whether the asm code has applied the padding, but it
cannot really tell the asm code whether it should apply the padding
without an additional argument.

So what about

> +typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
> +                          bool final);

?
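
In other words, roughly this (a hypothetical finup in the base layer,
not the posted patch):

#include <crypto/sha.h>
#include <linux/types.h>

typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
			     bool final);

/* ask the transform to finalize; Herbert's count check then tells us
 * whether it could (it only can when the input fills whole blocks) */
static void sha1_finup_sketch(struct sha1_state *sctx, const u8 *src,
			      unsigned int len, sha1_block_fn *block_fn)
{
	bool finalized = (sctx->count + len) % SHA1_BLOCK_SIZE == 0;

	block_fn(len / SHA1_BLOCK_SIZE, src, sctx->state, true);
	sctx->count += len;

	if (!finalized) {
		/* apply the padding and count in C, then invoke
		 * block_fn once more with final == false */
	}
}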

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1
  2015-04-08 14:18               ` Ard Biesheuvel
@ 2015-04-08 14:22                 ` Herbert Xu
  -1 siblings, 0 replies; 50+ messages in thread
From: Herbert Xu @ 2015-04-08 14:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-crypto, linux-arm-kernel, x86, Sami Tolvanen,
	Jussi Kivilinna, Markus Stockhausen

On Wed, Apr 08, 2015 at 04:18:56PM +0200, Ard Biesheuvel wrote:
>
> So what about
> 
> > +typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
> > +                          bool final);
> 
> ?

Looks good to me.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2015-04-08 14:22 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-07  8:51 [PATCH v3 00/16] crypto: SHA glue code consolidation Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1 Ard Biesheuvel
2015-04-08 13:19   ` Herbert Xu
2015-04-08 13:25     ` Ard Biesheuvel
2015-04-08 13:30       ` Herbert Xu
2015-04-08 13:40         ` Ard Biesheuvel
2015-04-08 13:52           ` Ard Biesheuvel
2015-04-08 14:06           ` Herbert Xu
2015-04-08 14:18             ` Ard Biesheuvel
2015-04-08 14:22               ` Herbert Xu
2015-04-07  8:51 ` [PATCH v3 02/16] crypto: sha256: implement base layer for SHA-256 Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 03/16] crypto: sha512: implement base layer for SHA-512 Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 04/16] crypto: sha1-generic: move to generic glue implementation Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 05/16] crypto: sha256-generic: " Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 06/16] crypto: sha512-generic: " Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 07/16] crypto/arm: move SHA-1 ARM asm implementation to base layer Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 08/16] crypto/arm: move SHA-1 NEON " Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 09/16] crypto/arm: move SHA-1 ARMv8 " Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 10/16] crypto/arm: move SHA-224/256 ASM/NEON " Ard Biesheuvel
2015-04-07  8:51 ` [PATCH v3 11/16] crypto/arm: move SHA-224/256 ARMv8 " Ard Biesheuvel
2015-04-07  8:52 ` [PATCH v3 12/16] crypto/arm64: move SHA-1 " Ard Biesheuvel
2015-04-07  8:52 ` [PATCH v3 13/16] crypto/arm64: move SHA-224/256 " Ard Biesheuvel
2015-04-07  8:52 ` [PATCH v3 14/16] crypto/x86: move SHA-1 SSSE3 " Ard Biesheuvel
2015-04-07  8:52 ` [PATCH v3 15/16] crypto/x86: move SHA-224/256 " Ard Biesheuvel
2015-04-07  8:52 ` [PATCH v3 16/16] crypto/x86: move SHA-384/512 " Ard Biesheuvel
