* [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization
@ 2020-10-20 20:39 Arvind Sankar
  2020-10-20 20:39 ` [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state Arvind Sankar
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

Patch 1 -- Use memzero_explicit() instead of structure assignment/plain
memset() to clear sensitive state.

Patch 2 -- I am not sure about this one: currently the temporary
variables used in the generic sha256 implementation are cleared, but the
clearing is optimized away due to lack of compiler barriers. I don't
think it's really necessary to clear them, but I'm not a cryptanalyst,
so I would like comments on whether it's indeed safe not to, or whether
we should instead add the required barriers to force clearing.

The last four patches are optimizations for generic sha256.
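
For reference, here is a minimal usage sketch of the lib/crypto/sha256
interface this series touches (illustrative only; it assumes the
sha256_init()/sha256_update()/sha256_final() declarations from
<crypto/sha.h> as of this series):

        #include <crypto/sha.h>

        static void example_sha256(const u8 *data, unsigned int len,
                                   u8 digest[SHA256_DIGEST_SIZE])
        {
                struct sha256_state sctx;

                sha256_init(&sctx);              /* load the standard initial state */
                sha256_update(&sctx, data, len); /* hash the input in 64-byte blocks */
                sha256_final(&sctx, digest);     /* write out the digest, then clear sctx */
        }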

v2:
- Add patch to combine K and W arrays, suggested by David
- Reformat SHA256_ROUND() macro a little

Arvind Sankar (6):
  crypto: Use memzero_explicit() for clearing state
  crypto: lib/sha256 - Don't clear temporary variables
  crypto: lib/sha256 - Clear W[] in sha256_update() instead of
    sha256_transform()
  crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
  crypto: lib/sha256 - Unroll LOAD and BLEND loops
  crypto: lib/sha - Combine round constants and message schedule

 include/crypto/sha1_base.h   |   3 +-
 include/crypto/sha256_base.h |   3 +-
 include/crypto/sha512_base.h |   3 +-
 include/crypto/sm3_base.h    |   3 +-
 lib/crypto/sha256.c          | 211 +++++++++++------------------------
 5 files changed, 71 insertions(+), 152 deletions(-)

-- 
2.26.2



* [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-20 20:39 [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization Arvind Sankar
@ 2020-10-20 20:39 ` Arvind Sankar
  2020-10-22  4:36   ` Eric Biggers
  2020-10-20 20:39 ` [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables Arvind Sankar
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
function can get inlined into sha256(), in which case the memset is
optimized away.
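
For context, memzero_explicit() is roughly a memset() followed by a
compiler barrier on the zeroed data; a simplified sketch of the helper
in include/linux/string.h:

        static inline void memzero_explicit(void *s, size_t count)
        {
                memset(s, 0, count);
                barrier_data(s); /* keeps the compiler from eliding the memset */
        }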

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 include/crypto/sha1_base.h   | 3 ++-
 include/crypto/sha256_base.h | 3 ++-
 include/crypto/sha512_base.h | 3 ++-
 include/crypto/sm3_base.h    | 3 ++-
 lib/crypto/sha256.c          | 2 +-
 5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h
index 20fd1f7468af..a5d6033efef7 100644
--- a/include/crypto/sha1_base.h
+++ b/include/crypto/sha1_base.h
@@ -12,6 +12,7 @@
 #include <crypto/sha.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
+#include <linux/string.h>
 
 #include <asm/unaligned.h>
 
@@ -101,7 +102,7 @@ static inline int sha1_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
 		put_unaligned_be32(sctx->state[i], digest++);
 
-	*sctx = (struct sha1_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
 
diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
index 6ded110783ae..93f9fd21cc06 100644
--- a/include/crypto/sha256_base.h
+++ b/include/crypto/sha256_base.h
@@ -12,6 +12,7 @@
 #include <crypto/sha.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
+#include <linux/string.h>
 
 #include <asm/unaligned.h>
 
@@ -105,7 +106,7 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
 		put_unaligned_be32(sctx->state[i], digest++);
 
-	*sctx = (struct sha256_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
 
diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
index fb19c77494dc..93ab73baa38e 100644
--- a/include/crypto/sha512_base.h
+++ b/include/crypto/sha512_base.h
@@ -12,6 +12,7 @@
 #include <crypto/sha.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
+#include <linux/string.h>
 
 #include <asm/unaligned.h>
 
@@ -126,7 +127,7 @@ static inline int sha512_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be64))
 		put_unaligned_be64(sctx->state[i], digest++);
 
-	*sctx = (struct sha512_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
 
diff --git a/include/crypto/sm3_base.h b/include/crypto/sm3_base.h
index 1cbf9aa1fe52..2f3a32ab97bb 100644
--- a/include/crypto/sm3_base.h
+++ b/include/crypto/sm3_base.h
@@ -13,6 +13,7 @@
 #include <crypto/sm3.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
+#include <linux/string.h>
 #include <asm/unaligned.h>
 
 typedef void (sm3_block_fn)(struct sm3_state *sst, u8 const *src, int blocks);
@@ -104,7 +105,7 @@ static inline int sm3_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; i < SM3_DIGEST_SIZE / sizeof(__be32); i++)
 		put_unaligned_be32(sctx->state[i], digest++);
 
-	*sctx = (struct sm3_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
 
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 2321f6cb322f..d43bc39ab05e 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -265,7 +265,7 @@ static void __sha256_final(struct sha256_state *sctx, u8 *out, int digest_words)
 		put_unaligned_be32(sctx->state[i], &dst[i]);
 
 	/* Zeroize sensitive information. */
-	memset(sctx, 0, sizeof(*sctx));
+	memzero_explicit(sctx, sizeof(*sctx));
 }
 
 void sha256_final(struct sha256_state *sctx, u8 *out)
-- 
2.26.2



* [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables
  2020-10-20 20:39 [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization Arvind Sankar
  2020-10-20 20:39 ` [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state Arvind Sankar
@ 2020-10-20 20:39 ` Arvind Sankar
  2020-10-22  4:58   ` Eric Biggers
  2020-10-20 20:39 ` [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform() Arvind Sankar
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

The assignments to clear a through h and t1/t2 are optimized out by the
compiler because they are unused after the assignments.

These variables shouldn't be very sensitive: t1/t2 can be calculated
from a through h, so they don't reveal any additional information.
Knowing a through h is equivalent to knowing one 64-byte block's SHA256
hash (with non-standard initial value) which, assuming SHA256 is secure,
doesn't reveal any information about the input.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 lib/crypto/sha256.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index d43bc39ab05e..099cd11f83c1 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
 
 	/* clear any sensitive info... */
-	a = b = c = d = e = f = g = h = t1 = t2 = 0;
 	memzero_explicit(W, 64 * sizeof(u32));
 }
 
-- 
2.26.2



* [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()
  2020-10-20 20:39 [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization Arvind Sankar
  2020-10-20 20:39 ` [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state Arvind Sankar
  2020-10-20 20:39 ` [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables Arvind Sankar
@ 2020-10-20 20:39 ` Arvind Sankar
  2020-10-22  4:59   ` Eric Biggers
  2020-10-20 20:39 ` [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64 Arvind Sankar
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

The temporary W[] array is currently zeroed out once every call to
sha256_transform(), i.e. once every 64 bytes of input data. Moving it to
sha256_update() instead so that it is cleared only once per update can
save about 2-3% of the total time taken to compute the digest, with a
reasonable memset() implementation, and considerably more (~20%) with a
bad one (eg the x86 purgatory currently uses a memset() coded in C).

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 lib/crypto/sha256.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 099cd11f83c1..c6bfeacc5b81 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -43,10 +43,9 @@ static inline void BLEND_OP(int I, u32 *W)
 	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
 }
 
-static void sha256_transform(u32 *state, const u8 *input)
+static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 {
 	u32 a, b, c, d, e, f, g, h, t1, t2;
-	u32 W[64];
 	int i;
 
 	/* load the input */
@@ -200,15 +199,13 @@ static void sha256_transform(u32 *state, const u8 *input)
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
-
-	/* clear any sensitive info... */
-	memzero_explicit(W, 64 * sizeof(u32));
 }
 
 void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 {
 	unsigned int partial, done;
 	const u8 *src;
+	u32 W[64];
 
 	partial = sctx->count & 0x3f;
 	sctx->count += len;
@@ -223,11 +220,13 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 		}
 
 		do {
-			sha256_transform(sctx->state, src);
+			sha256_transform(sctx->state, src, W);
 			done += 64;
 			src = data + done;
 		} while (done + 63 < len);
 
+		memzero_explicit(W, sizeof(W));
+
 		partial = 0;
 	}
 	memcpy(sctx->buf + partial, src, len - done);
-- 
2.26.2



* [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
  2020-10-20 20:39 [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization Arvind Sankar
                   ` (2 preceding siblings ...)
  2020-10-20 20:39 ` [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform() Arvind Sankar
@ 2020-10-20 20:39 ` Arvind Sankar
  2020-10-22  5:02   ` Eric Biggers
  2020-10-20 20:39 ` [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops Arvind Sankar
  2020-10-20 20:39 ` [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule Arvind Sankar
  5 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tesed on Broadwell Xeon).

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 lib/crypto/sha256.c | 166 ++++++++------------------------------------
 1 file changed, 30 insertions(+), 136 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index c6bfeacc5b81..5efd390706c6 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -18,6 +18,17 @@
 #include <crypto/sha.h>
 #include <asm/unaligned.h>
 
+static const u32 SHA256_K[] = {
+	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
+	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
+	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
+	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
+	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
+	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
+	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
+	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
+};
+
 static inline u32 Ch(u32 x, u32 y, u32 z)
 {
 	return z ^ (x & (y ^ z));
@@ -43,9 +54,17 @@ static inline void BLEND_OP(int I, u32 *W)
 	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
 }
 
+#define SHA256_ROUND(i, a, b, c, d, e, f, g, h) do {		\
+	u32 t1, t2;						\
+	t1 = h + e1(e) + Ch(e, f, g) + SHA256_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2;						\
+} while (0)
+
 static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 {
-	u32 a, b, c, d, e, f, g, h, t1, t2;
+	u32 a, b, c, d, e, f, g, h;
 	int i;
 
 	/* load the input */
@@ -61,141 +80,16 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 	e = state[4];  f = state[5];  g = state[6];  h = state[7];
 
 	/* now iterate */
-	t1 = h + e1(e) + Ch(e, f, g) + 0x428a2f98 + W[0];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x71374491 + W[1];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0xb5c0fbcf + W[2];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0xe9b5dba5 + W[3];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x3956c25b + W[4];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x59f111f1 + W[5];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x923f82a4 + W[6];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xab1c5ed5 + W[7];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0xd807aa98 + W[8];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x12835b01 + W[9];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x243185be + W[10];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x550c7dc3 + W[11];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x72be5d74 + W[12];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x80deb1fe + W[13];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x9bdc06a7 + W[14];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xc19bf174 + W[15];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0xe49b69c1 + W[16];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0xefbe4786 + W[17];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x0fc19dc6 + W[18];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x240ca1cc + W[19];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x2de92c6f + W[20];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x4a7484aa + W[21];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x5cb0a9dc + W[22];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x76f988da + W[23];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x983e5152 + W[24];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0xa831c66d + W[25];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0xb00327c8 + W[26];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0xbf597fc7 + W[27];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0xc6e00bf3 + W[28];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xd5a79147 + W[29];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x06ca6351 + W[30];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x14292967 + W[31];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x27b70a85 + W[32];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x2e1b2138 + W[33];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x4d2c6dfc + W[34];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x53380d13 + W[35];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x650a7354 + W[36];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x766a0abb + W[37];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x81c2c92e + W[38];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x92722c85 + W[39];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0xa2bfe8a1 + W[40];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0xa81a664b + W[41];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0xc24b8b70 + W[42];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0xc76c51a3 + W[43];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0xd192e819 + W[44];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xd6990624 + W[45];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0xf40e3585 + W[46];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x106aa070 + W[47];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x19a4c116 + W[48];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x1e376c08 + W[49];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x2748774c + W[50];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x34b0bcb5 + W[51];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x391c0cb3 + W[52];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x4ed8aa4a + W[53];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x5b9cca4f + W[54];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x682e6ff3 + W[55];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x748f82ee + W[56];
-	t2 = e0(a) + Maj(a, b, c);    d += t1;    h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x78a5636f + W[57];
-	t2 = e0(h) + Maj(h, a, b);    c += t1;    g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x84c87814 + W[58];
-	t2 = e0(g) + Maj(g, h, a);    b += t1;    f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x8cc70208 + W[59];
-	t2 = e0(f) + Maj(f, g, h);    a += t1;    e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x90befffa + W[60];
-	t2 = e0(e) + Maj(e, f, g);    h += t1;    d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xa4506ceb + W[61];
-	t2 = e0(d) + Maj(d, e, f);    g += t1;    c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0xbef9a3f7 + W[62];
-	t2 = e0(c) + Maj(c, d, e);    f += t1;    b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xc67178f2 + W[63];
-	t2 = e0(b) + Maj(b, c, d);    e += t1;    a = t1 + t2;
+	for (i = 0; i < 64; i += 8) {
+		SHA256_ROUND(i + 0, a, b, c, d, e, f, g, h);
+		SHA256_ROUND(i + 1, h, a, b, c, d, e, f, g);
+		SHA256_ROUND(i + 2, g, h, a, b, c, d, e, f);
+		SHA256_ROUND(i + 3, f, g, h, a, b, c, d, e);
+		SHA256_ROUND(i + 4, e, f, g, h, a, b, c, d);
+		SHA256_ROUND(i + 5, d, e, f, g, h, a, b, c);
+		SHA256_ROUND(i + 6, c, d, e, f, g, h, a, b);
+		SHA256_ROUND(i + 7, b, c, d, e, f, g, h, a);
+	}
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
-- 
2.26.2



* [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops
  2020-10-20 20:39 [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization Arvind Sankar
                   ` (3 preceding siblings ...)
  2020-10-20 20:39 ` [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64 Arvind Sankar
@ 2020-10-20 20:39 ` Arvind Sankar
  2020-10-22  5:02   ` Eric Biggers
  2020-10-20 20:39 ` [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule Arvind Sankar
  5 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
(tested on Broadwell Xeon) while not increasing code size too much.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 lib/crypto/sha256.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 5efd390706c6..3a8802d5f747 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -68,12 +68,28 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 	int i;
 
 	/* load the input */
-	for (i = 0; i < 16; i++)
-		LOAD_OP(i, W, input);
+	for (i = 0; i < 16; i += 8) {
+		LOAD_OP(i + 0, W, input);
+		LOAD_OP(i + 1, W, input);
+		LOAD_OP(i + 2, W, input);
+		LOAD_OP(i + 3, W, input);
+		LOAD_OP(i + 4, W, input);
+		LOAD_OP(i + 5, W, input);
+		LOAD_OP(i + 6, W, input);
+		LOAD_OP(i + 7, W, input);
+	}
 
 	/* now blend */
-	for (i = 16; i < 64; i++)
-		BLEND_OP(i, W);
+	for (i = 16; i < 64; i += 8) {
+		BLEND_OP(i + 0, W);
+		BLEND_OP(i + 1, W);
+		BLEND_OP(i + 2, W);
+		BLEND_OP(i + 3, W);
+		BLEND_OP(i + 4, W);
+		BLEND_OP(i + 5, W);
+		BLEND_OP(i + 6, W);
+		BLEND_OP(i + 7, W);
+	}
 
 	/* load the state into our registers */
 	a = state[0];  b = state[1];  c = state[2];  d = state[3];
-- 
2.26.2



* [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule
  2020-10-20 20:39 [PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization Arvind Sankar
                   ` (4 preceding siblings ...)
  2020-10-20 20:39 ` [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops Arvind Sankar
@ 2020-10-20 20:39 ` Arvind Sankar
  2020-10-20 21:36   ` David Laight
  2020-10-22  4:34   ` Eric Biggers
  5 siblings, 2 replies; 24+ messages in thread
From: Arvind Sankar @ 2020-10-20 20:39 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, David Laight; +Cc: linux-kernel

Putting the round constants and the message schedule arrays together in
one structure saves one register, which can be a significant benefit on
register-constrained architectures. On x86-32 (tested on Broadwell
Xeon), this gives a 10% performance benefit.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: David Laight <David.Laight@ACULAB.COM>
---
 lib/crypto/sha256.c | 49 ++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 3a8802d5f747..985cd0560d79 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -29,6 +29,11 @@ static const u32 SHA256_K[] = {
 	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
 };
 
+struct KW {
+	u32 K[64];
+	u32 W[64];
+};
+
 static inline u32 Ch(u32 x, u32 y, u32 z)
 {
 	return z ^ (x & (y ^ z));
@@ -56,39 +61,39 @@ static inline void BLEND_OP(int I, u32 *W)
 
 #define SHA256_ROUND(i, a, b, c, d, e, f, g, h) do {		\
 	u32 t1, t2;						\
-	t1 = h + e1(e) + Ch(e, f, g) + SHA256_K[i] + W[i];	\
+	t1 = h + e1(e) + Ch(e, f, g) + KW->K[i] + KW->W[i];	\
 	t2 = e0(a) + Maj(a, b, c);				\
 	d += t1;						\
 	h = t1 + t2;						\
 } while (0)
 
-static void sha256_transform(u32 *state, const u8 *input, u32 *W)
+static void sha256_transform(u32 *state, const u8 *input, struct KW *KW)
 {
 	u32 a, b, c, d, e, f, g, h;
 	int i;
 
 	/* load the input */
 	for (i = 0; i < 16; i += 8) {
-		LOAD_OP(i + 0, W, input);
-		LOAD_OP(i + 1, W, input);
-		LOAD_OP(i + 2, W, input);
-		LOAD_OP(i + 3, W, input);
-		LOAD_OP(i + 4, W, input);
-		LOAD_OP(i + 5, W, input);
-		LOAD_OP(i + 6, W, input);
-		LOAD_OP(i + 7, W, input);
+		LOAD_OP(i + 0, KW->W, input);
+		LOAD_OP(i + 1, KW->W, input);
+		LOAD_OP(i + 2, KW->W, input);
+		LOAD_OP(i + 3, KW->W, input);
+		LOAD_OP(i + 4, KW->W, input);
+		LOAD_OP(i + 5, KW->W, input);
+		LOAD_OP(i + 6, KW->W, input);
+		LOAD_OP(i + 7, KW->W, input);
 	}
 
 	/* now blend */
 	for (i = 16; i < 64; i += 8) {
-		BLEND_OP(i + 0, W);
-		BLEND_OP(i + 1, W);
-		BLEND_OP(i + 2, W);
-		BLEND_OP(i + 3, W);
-		BLEND_OP(i + 4, W);
-		BLEND_OP(i + 5, W);
-		BLEND_OP(i + 6, W);
-		BLEND_OP(i + 7, W);
+		BLEND_OP(i + 0, KW->W);
+		BLEND_OP(i + 1, KW->W);
+		BLEND_OP(i + 2, KW->W);
+		BLEND_OP(i + 3, KW->W);
+		BLEND_OP(i + 4, KW->W);
+		BLEND_OP(i + 5, KW->W);
+		BLEND_OP(i + 6, KW->W);
+		BLEND_OP(i + 7, KW->W);
 	}
 
 	/* load the state into our registers */
@@ -115,7 +120,7 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 {
 	unsigned int partial, done;
 	const u8 *src;
-	u32 W[64];
+	struct KW KW;
 
 	partial = sctx->count & 0x3f;
 	sctx->count += len;
@@ -129,13 +134,15 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 			src = sctx->buf;
 		}
 
+		memcpy(KW.K, SHA256_K, sizeof(KW.K));
+
 		do {
-			sha256_transform(sctx->state, src, W);
+			sha256_transform(sctx->state, src, &KW);
 			done += 64;
 			src = data + done;
 		} while (done + 63 < len);
 
-		memzero_explicit(W, sizeof(W));
+		memzero_explicit(KW.W, sizeof(KW.W));
 
 		partial = 0;
 	}
-- 
2.26.2



* RE: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule
  2020-10-20 20:39 ` [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule Arvind Sankar
@ 2020-10-20 21:36   ` David Laight
  2020-10-21 15:16     ` Arvind Sankar
  2020-10-22  4:34   ` Eric Biggers
  1 sibling, 1 reply; 24+ messages in thread
From: David Laight @ 2020-10-20 21:36 UTC (permalink / raw)
  To: 'Arvind Sankar', Herbert Xu, David S. Miller, linux-crypto
  Cc: linux-kernel

From: Arvind Sankar
> Sent: 20 October 2020 21:40
> 
> Putting the round constants and the message schedule arrays together in
> one structure saves one register, which can be a significant benefit on
> register-constrained architectures. On x86-32 (tested on Broadwell
> Xeon), this gives a 10% performance benefit.

I'm actually stunned it makes that much difference.
The object code must be truly horrid (before and after).

There are probably other strange tweaks that give a similar
improvement.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule
  2020-10-20 21:36   ` David Laight
@ 2020-10-21 15:16     ` Arvind Sankar
  0 siblings, 0 replies; 24+ messages in thread
From: Arvind Sankar @ 2020-10-21 15:16 UTC (permalink / raw)
  To: David Laight
  Cc: 'Arvind Sankar',
	Herbert Xu, David S. Miller, linux-crypto, linux-kernel

On Tue, Oct 20, 2020 at 09:36:00PM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 20 October 2020 21:40
> > 
> > Putting the round constants and the message schedule arrays together in
> > one structure saves one register, which can be a significant benefit on
> > register-constrained architectures. On x86-32 (tested on Broadwell
> > Xeon), this gives a 10% performance benefit.
> 
> I'm actually stunned it makes that much difference.
> The object code must be truly horrid (before and after).
> 
> There are probably other strange tweaks that give a similar
> improvement.
> 
> 	David
> 

Hm yes, I took a closer look at the generated code, and gcc seems to be
doing something completely braindead. Before this change, it actually
copies 8 words at a time from SHA256_K onto the stack, and uses those
stack temporaries for the calculation. So this patch is giving a benefit
just because it only does the copy once instead of every time around the
loop.

It doesn't even really need a register to hold SHA256_K since this isn't
PIC code, it could just access it directly as SHA256_K(%ecx) if it just
multiplied the loop counter i by 4.


* Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule
  2020-10-20 20:39 ` [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule Arvind Sankar
  2020-10-20 21:36   ` David Laight
@ 2020-10-22  4:34   ` Eric Biggers
  2020-10-22  8:20     ` David Laight
  1 sibling, 1 reply; 24+ messages in thread
From: Eric Biggers @ 2020-10-22  4:34 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Tue, Oct 20, 2020 at 04:39:57PM -0400, Arvind Sankar wrote:
> Putting the round constants and the message schedule arrays together in
> one structure saves one register, which can be a significant benefit on
> register-constrained architectures. On x86-32 (tested on Broadwell
> Xeon), this gives a 10% performance benefit.
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> Suggested-by: David Laight <David.Laight@ACULAB.COM>
> ---
>  lib/crypto/sha256.c | 49 ++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 21 deletions(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index 3a8802d5f747..985cd0560d79 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -29,6 +29,11 @@ static const u32 SHA256_K[] = {
>  	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
>  };
>  
> +struct KW {
> +	u32 K[64];
> +	u32 W[64];
> +};

Note that this doubles the stack usage from 256 to 512 bytes.  That's pretty
large for kernel code, especially when compiler options can increase the stack
usage well beyond the "expected" value.

So unless this gives a big performance improvement on architectures other than
32-bit x86 (which people don't really care about these days), we probably
shouldn't do this.

FWIW, it's possible to reduce the length of 'W' to 16 words by computing the
next W value just before each round 16-63, or by computing the next W values in
batches of 16 before rounds 16, 32, and 48.  (This is similar to what lib/sha1.c
does for SHA-1.)  In a quick userspace benchmark that seems to reduce
performance by about 25% on x86_64, though.
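
A rough sketch of that 16-word variant, with the schedule kept in a
circular buffer (the W_NEXT name is made up for illustration, and the
round loops would need restructuring to go with it):

        /* With a 16-entry buffer, W[i & 15] still holds W[i - 16] when
         * round i (16 <= i <= 63) starts, so the next schedule word can
         * be computed in place just before the round that consumes it. */
        #define W_NEXT(i, W) \
                (W[(i) & 15] += s1(W[((i) - 2) & 15]) + W[((i) - 7) & 15] + \
                                s0(W[((i) - 15) & 15]))

        /* rounds 16-63 would then use W_NEXT(i, W) in place of W[i] */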

- Eric


* Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-20 20:39 ` [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state Arvind Sankar
@ 2020-10-22  4:36   ` Eric Biggers
  2020-10-23 15:39     ` Arvind Sankar
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Biggers @ 2020-10-22  4:36 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote:
> Without the barrier_data() inside memzero_explicit(), the compiler may
> optimize away the state-clearing if it can tell that the state is not
> used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
> function can get inlined into sha256(), in which case the memset is
> optimized away.
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>

Reviewed-by: Eric Biggers <ebiggers@google.com>

Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too?

- Eric


* Re: [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables
  2020-10-20 20:39 ` [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables Arvind Sankar
@ 2020-10-22  4:58   ` Eric Biggers
  2020-10-23  3:17     ` Arvind Sankar
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Biggers @ 2020-10-22  4:58 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote:
> The assignments to clear a through h and t1/t2 are optimized out by the
> compiler because they are unused after the assignments.
> 
> These variables shouldn't be very sensitive: t1/t2 can be calculated
> from a through h, so they don't reveal any additional information.
> Knowing a through h is equivalent to knowing one 64-byte block's SHA256
> hash (with non-standard initial value) which, assuming SHA256 is secure,
> doesn't reveal any information about the input.
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>

I don't entirely buy the second paragraph.  It could be the case that the input
is less than or equal to one SHA-256 block (64 bytes), in which case leaking
'a' through 'h' would reveal the final SHA-256 hash if the input length is
known.  And note that callers might consider either the input, the resulting
hash, or both to be sensitive information -- it depends.

> ---
>  lib/crypto/sha256.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index d43bc39ab05e..099cd11f83c1 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
>  	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
>  
>  	/* clear any sensitive info... */
> -	a = b = c = d = e = f = g = h = t1 = t2 = 0;
>  	memzero_explicit(W, 64 * sizeof(u32));
>  }

Your change itself is fine, though.  As you mentioned, these assignments get
optimized out, so they weren't accomplishing anything.

The fact is, there just isn't any way to guarantee in C code that all sensitive
variables get cleared.

So we shouldn't (and generally don't) bother trying to clear individual u32's,
ints, etc. like this, but rather only structs and arrays, as clearing those is
more likely to work as intended.

- Eric


* Re: [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()
  2020-10-20 20:39 ` [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform() Arvind Sankar
@ 2020-10-22  4:59   ` Eric Biggers
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Biggers @ 2020-10-22  4:59 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Tue, Oct 20, 2020 at 04:39:54PM -0400, Arvind Sankar wrote:
> The temporary W[] array is currently zeroed out once every call to
> sha256_transform(), i.e. once every 64 bytes of input data. Moving it to
> sha256_update() instead so that it is cleared only once per update can
> save about 2-3% of the total time taken to compute the digest, with a
> reasonable memset() implementation, and considerably more (~20%) with a
> bad one (eg the x86 purgatory currently uses a memset() coded in C).
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>

Looks good,

Reviewed-by: Eric Biggers <ebiggers@google.com>


* Re: [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
  2020-10-20 20:39 ` [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64 Arvind Sankar
@ 2020-10-22  5:02   ` Eric Biggers
  2020-10-23  3:12     ` Arvind Sankar
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Biggers @ 2020-10-22  5:02 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote:
> This reduces code size substantially (on x86_64 with gcc-10 the size of
> sha256_update() goes from 7593 bytes to 1952 bytes including the new
> SHA256_K array), and on x86 is slightly faster than the full unroll
> (tesed on Broadwell Xeon).

tesed => tested

> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> ---
>  lib/crypto/sha256.c | 166 ++++++++------------------------------------
>  1 file changed, 30 insertions(+), 136 deletions(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index c6bfeacc5b81..5efd390706c6 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -18,6 +18,17 @@
>  #include <crypto/sha.h>
>  #include <asm/unaligned.h>
>  
> +static const u32 SHA256_K[] = {
> +	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
> +	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
> +	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
> +	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
> +	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
> +	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
> +	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
> +	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
> +};

Limit this to 80 columns?

Otherwise this looks good.

- Eric


* Re: [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops
  2020-10-20 20:39 ` [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops Arvind Sankar
@ 2020-10-22  5:02   ` Eric Biggers
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Biggers @ 2020-10-22  5:02 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Tue, Oct 20, 2020 at 04:39:56PM -0400, Arvind Sankar wrote:
> Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
> (tested on Broadwell Xeon) while not increasing code size too much.
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> ---

Looks good,

Reviewed-by: Eric Biggers <ebiggers@google.com>


* RE: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule
  2020-10-22  4:34   ` Eric Biggers
@ 2020-10-22  8:20     ` David Laight
  0 siblings, 0 replies; 24+ messages in thread
From: David Laight @ 2020-10-22  8:20 UTC (permalink / raw)
  To: 'Eric Biggers', Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, linux-kernel


From: Eric Biggers
> Sent: 22 October 2020 05:35
> 
> On Tue, Oct 20, 2020 at 04:39:57PM -0400, Arvind Sankar wrote:
> > Putting the round constants and the message schedule arrays together in
> > one structure saves one register, which can be a significant benefit on
> > register-constrained architectures. On x86-32 (tested on Broadwell
> > Xeon), this gives a 10% performance benefit.
> >
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> > Suggested-by: David Laight <David.Laight@ACULAB.COM>
> > ---
> >  lib/crypto/sha256.c | 49 ++++++++++++++++++++++++++-------------------
> >  1 file changed, 28 insertions(+), 21 deletions(-)
> >
> > diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> > index 3a8802d5f747..985cd0560d79 100644
> > --- a/lib/crypto/sha256.c
> > +++ b/lib/crypto/sha256.c
> > @@ -29,6 +29,11 @@ static const u32 SHA256_K[] = {
> >  	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
> >  };
> >
> > +struct KW {
> > +	u32 K[64];
> > +	u32 W[64];
> > +};
> 
> Note that this doubles the stack usage from 256 to 512 bytes.  That's pretty
> large for kernel code, especially when compiler options can increase the stack
> usage well beyond the "expected" value.
> 
> So unless this gives a big performance improvement on architectures other than
> 32-bit x86 (which people don't really care about these days), we probably
> shouldn't do this.

IIRC the gain came from an odd side effect - which can probably
be got (for some compiler versions) by other means.

> FWIW, it's possible to reduce the length of 'W' to 16 words by computing the
> next W value just before each round 16-63,

I was looking at that.
You'd need to do the first 16 rounds and then rounds 16-63 in a second
loop to avoid the conditional.
The problem is that it needs too many registers.
You'd need registers for 16 W values, the 8 a-h and a few spare.

...

Looking closely each round is like:
        t1 = h + e1(e) + Ch(e, f, g) + 0x428a2f98 + W[0];
        t2 = e0(a) + Maj(a, b, c);
        h = t1 + t2;   // Not used for a few rounds
        d += t1;       // Needed next round
So only needs 4 of the state variables (e, f, g, h).
The next round uses d, e, f and g.
So with extreme care d and h can use the same register.
Although I'm not sure how you'd get the compiler to do it.
Possibly making state[] volatile (or using READ/WRITE_ONCE).
So the last two lines become:
        state[7] = t1 + t2;
        d = state[3] + t1;
That might stop the x86 (32bit) spilling registers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
  2020-10-22  5:02   ` Eric Biggers
@ 2020-10-23  3:12     ` Arvind Sankar
  2020-10-23  3:16       ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-23  3:12 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Arvind Sankar, Herbert Xu, David S. Miller, linux-crypto,
	David Laight, linux-kernel

On Wed, Oct 21, 2020 at 10:02:19PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote:
> > This reduces code size substantially (on x86_64 with gcc-10 the size of
> > sha256_update() goes from 7593 bytes to 1952 bytes including the new
> > SHA256_K array), and on x86 is slightly faster than the full unroll
> > (tesed on Broadwell Xeon).
> 
> tesed => tested
> 
> > 
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> > ---
> >  lib/crypto/sha256.c | 166 ++++++++------------------------------------
> >  1 file changed, 30 insertions(+), 136 deletions(-)
> > 
> > diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> > index c6bfeacc5b81..5efd390706c6 100644
> > --- a/lib/crypto/sha256.c
> > +++ b/lib/crypto/sha256.c
> > @@ -18,6 +18,17 @@
> >  #include <crypto/sha.h>
> >  #include <asm/unaligned.h>
> >  
> > +static const u32 SHA256_K[] = {
> > +	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
> > +	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
> > +	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
> > +	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
> > +	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
> > +	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
> > +	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
> > +	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
> > +};
> 
> Limit this to 80 columns?

I was aiming for 8 columns per line to match all the other groupings by
eight. It does slightly exceed 100 columns but can this be an exception,
or should I maybe make it 4 columns per line?

> 
> Otherwise this looks good.
> 
> - Eric


* Re: [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
  2020-10-23  3:12     ` Arvind Sankar
@ 2020-10-23  3:16       ` Herbert Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Herbert Xu @ 2020-10-23  3:16 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Eric Biggers, David S. Miller, linux-crypto, David Laight, linux-kernel

On Thu, Oct 22, 2020 at 11:12:36PM -0400, Arvind Sankar wrote:
>
> I was aiming for 8 columns per line to match all the other groupings by
> eight. It does slightly exceed 100 columns but can this be an exception,
> or should I maybe make it 4 columns per line?

Please limit it to 4 columns.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables
  2020-10-22  4:58   ` Eric Biggers
@ 2020-10-23  3:17     ` Arvind Sankar
  0 siblings, 0 replies; 24+ messages in thread
From: Arvind Sankar @ 2020-10-23  3:17 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Arvind Sankar, Herbert Xu, David S. Miller, linux-crypto,
	David Laight, linux-kernel

On Wed, Oct 21, 2020 at 09:58:50PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote:
> > The assignments to clear a through h and t1/t2 are optimized out by the
> > compiler because they are unused after the assignments.
> > 
> > These variables shouldn't be very sensitive: t1/t2 can be calculated
> > from a through h, so they don't reveal any additional information.
> > Knowing a through h is equivalent to knowing one 64-byte block's SHA256
> > hash (with non-standard initial value) which, assuming SHA256 is secure,
> > doesn't reveal any information about the input.
> > 
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> 
> I don't entirely buy the second paragraph.  It could be the case that the input
> is less than or equal to one SHA-256 block (64 bytes), in which case leaking
> 'a' through 'h' would reveal the final SHA-256 hash if the input length is
> known.  And note that callers might consider either the input, the resulting
> hash, or both to be sensitive information -- it depends.

The "non-standard initial value" was just parenthetical -- my thinking
was that revealing the hash, whether the real SHA hash or an
intermediate one starting at some other initial value, shouldn't reveal
the input; not that you get any additional security from being an
intermediate block. But if the hash itself could be sensitive, yeah then
a-h are sensitive anyway.

> 
> > ---
> >  lib/crypto/sha256.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> > index d43bc39ab05e..099cd11f83c1 100644
> > --- a/lib/crypto/sha256.c
> > +++ b/lib/crypto/sha256.c
> > @@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
> >  	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
> >  
> >  	/* clear any sensitive info... */
> > -	a = b = c = d = e = f = g = h = t1 = t2 = 0;
> >  	memzero_explicit(W, 64 * sizeof(u32));
> >  }
> 
> Your change itself is fine, though.  As you mentioned, these assignments get
> optimized out, so they weren't accomplishing anything.
> 
> The fact is, there just isn't any way to guarantee in C code that all sensitive
> variables get cleared.
> 
> So we shouldn't (and generally don't) bother trying to clear individual u32's,
> ints, etc. like this, but rather only structs and arrays, as clearing those is
> more likely to work as intended.
> 
> - Eric

Ok, I'll just drop the second paragraph from the commit message then.


* Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-22  4:36   ` Eric Biggers
@ 2020-10-23 15:39     ` Arvind Sankar
  2020-10-23 15:56       ` Eric Biggers
  0 siblings, 1 reply; 24+ messages in thread
From: Arvind Sankar @ 2020-10-23 15:39 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Arvind Sankar, Herbert Xu, David S. Miller, linux-crypto,
	David Laight, linux-kernel

On Wed, Oct 21, 2020 at 09:36:33PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote:
> > Without the barrier_data() inside memzero_explicit(), the compiler may
> > optimize away the state-clearing if it can tell that the state is not
> > used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
> > function can get inlined into sha256(), in which case the memset is
> > optimized away.
> > 
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> 
> Reviewed-by: Eric Biggers <ebiggers@google.com>
> 
> Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too?
> 
> - Eric

Hm, there are a few more as well like that. But now I'm thinking it's
only the generic sha256.c that may be problematic. The rest of them are
in _final() functions which will be stored as function pointers in a
structure, so there should be no risk of them getting optimized away?


* Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-23 15:39     ` Arvind Sankar
@ 2020-10-23 15:56       ` Eric Biggers
  2020-10-23 20:45         ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Biggers @ 2020-10-23 15:56 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Herbert Xu, David S. Miller, linux-crypto, David Laight, linux-kernel

On Fri, Oct 23, 2020 at 11:39:27AM -0400, Arvind Sankar wrote:
> On Wed, Oct 21, 2020 at 09:36:33PM -0700, Eric Biggers wrote:
> > On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote:
> > > Without the barrier_data() inside memzero_explicit(), the compiler may
> > > optimize away the state-clearing if it can tell that the state is not
> > > used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
> > > function can get inlined into sha256(), in which case the memset is
> > > optimized away.
> > > 
> > > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> > 
> > Reviewed-by: Eric Biggers <ebiggers@google.com>
> > 
> > Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too?
> > 
> > - Eric
> 
> Hm, there are a few more as well like that. But now I'm thinking it's
> only the generic sha256.c that may be problematic. The rest of them are
> in _final() functions which will be stored as function pointers in a
> structure, so there should be no risk of them getting optimized away?

When clearing memory because "it may be sensitive" rather than "it's needed for
the code to behave correctly", I think it's best to use memzero_explicit() to
make the intent clear, even if it seems that memset() is sufficient.  Also keep
in mind that support for compiling the kernel with LTO (link-time optimization)
is being worked on (and some people already do it), which results in more code
being optimized out.

- Eric


* Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-23 15:56       ` Eric Biggers
@ 2020-10-23 20:45         ` Herbert Xu
  2020-10-23 21:53           ` Eric Biggers
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2020-10-23 20:45 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Arvind Sankar, David S. Miller, linux-crypto, David Laight, linux-kernel

On Fri, Oct 23, 2020 at 08:56:04AM -0700, Eric Biggers wrote:
>
> When clearing memory because "it may be sensitive" rather than "it's needed for
> the code to behave correctly", I think it's best to use memzero_explicit() to
> make the intent clear, even if it seems that memset() is sufficient.  Also keep
> in mind that support for compiling the kernel with LTO (link-time optimization)
> is being worked on (and some people already do it), which results in more code
> being optimized out.

The rule up until now has been that we only use memzero_explicit for
stack variables.  At this point please don't convert anything else
as it will cause too much churn.

If LTO did arrive we should do a global conversion.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-23 20:45         ` Herbert Xu
@ 2020-10-23 21:53           ` Eric Biggers
  2020-10-29  7:00             ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Biggers @ 2020-10-23 21:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Arvind Sankar, David S. Miller, linux-crypto, David Laight,
	linux-kernel, Sami Tolvanen

On Sat, Oct 24, 2020 at 07:45:36AM +1100, Herbert Xu wrote:
> On Fri, Oct 23, 2020 at 08:56:04AM -0700, Eric Biggers wrote:
> >
> > When clearing memory because "it may be sensitive" rather than "it's needed for
> > the code to behave correctly", I think it's best to use memzero_explicit() to
> > make the intent clear, even if it seems that memset() is sufficient.  Also keep
> > in mind that support for compiling the kernel with LTO (link-time optimization)
> > is being worked on (and some people already do it), which results in more code
> > being optimized out.
> 
> The rule up until now has been that we only use memzero_explicit for
> stack variables.  At this point please don't convert anything else
> as it will cause too much churn.
> 
> If LTO did arrive we should do a global conversion.
> 

LTO is actively being worked on, now up to v6:
https://lkml.kernel.org/lkml/20201013003203.4168817-1-samitolvanen@google.com/
And in the real world it's already being used; the Android Compatibility
Definition Document strongly recommends enabling CFI, which depends on LTO.

It's doubtful that anyone will do a global conversion from memset() to
memzero_explicit(), as it's too hard to find all the places that should be
converted.  They are in lots of different subsystems; the crypto subsystem will
have the most, but not all.  We just need to fix as many as we can.  If you'd
like to do something more comprehensive than this patch, that would be great,
but I hope we don't wait forever for a global conversion that never happens.

FWIW, kfree_sensitive() (formerly kzfree) already got converted by
https://git.kernel.org/linus/8982ae527fbef170, and it wasn't really
controversial.  Some people even wanted Cc stable (which I disagreed with, but
it apparently made the final version).

- Eric


* Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
  2020-10-23 21:53           ` Eric Biggers
@ 2020-10-29  7:00             ` Herbert Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Herbert Xu @ 2020-10-29  7:00 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Arvind Sankar, David S. Miller, linux-crypto, David Laight,
	linux-kernel, Sami Tolvanen

On Fri, Oct 23, 2020 at 02:53:29PM -0700, Eric Biggers wrote:
>
> It's doubtful that anyone will do a global conversion from memset() to
> memzero_explicit(), as it's too hard to find all the places that should be
> converted.  They are in lots of different subsystems; the crypto subsystem will
> have the most, but not all.  We just need to fix as many as we can.  If you'd
> like to do something more comprehensive than this patch, that would be great,
> but I hope we don't wait forever for a global conversion that never happens.

I had thought that there were more instances of memset in crypto/
that needed converting, but a quick grep seems to indicate that most
are not sensitive.  So I'm OK with this patch.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

