* [PATCH] Put sha1dc on a diet
@ 2017-03-01 0:30 Linus Torvalds
2017-03-01 18:42 ` Junio C Hamano
2017-03-01 19:07 ` Jeff King
0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-03-01 0:30 UTC (permalink / raw)
To: Junio C Hamano, Jeff King, Marc Stevens, Dan Shumow; +Cc: Git Mailing List
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 28 Feb 2017 16:12:32 -0800
Subject: [PATCH] Put sha1dc on a diet
This removes the unnecessary parts of the sha1dc code, shrinking things from
[torvalds@i7 git]$ size sha1dc/*.o
text data bss dec hex filename
277559 640 0 278199 43eb7 sha1dc/sha1.o
4438 11352 0 15790 3dae sha1dc/ubc_check.o
to
[torvalds@i7 git]$ size sha1dc/*.o
text data bss dec hex filename
13287 0 0 13287 33e7 sha1dc/sha1.o
4438 11352 0 15790 3dae sha1dc/ubc_check.o
so the sha1.o text size shrinks from about 271kB to about 13kB.
The shrinking comes mainly from only generating the recompressio functions
for the two rounds that are actually used (58 and 65), but also from
removing a couple of other unused functions. The sha1dc library lost its
"safe_hash" parameter to do that, since we check - and refuse to touch -
the colliding cases manually.
The git binary itself is about 2MB of text on my system. For other helper
binaries the size reduction is even more noticeable. A quarter MB here
and a quarter MB there, and suddenly you have a big binary ;)
This has been tested with the bad pdf image:
[torvalds@i7 git]$ ./t/helper/test-sha1 < ~/Downloads/bad.pdf
fatal: The SHA1 computation detected evidence of a collision attack;
refusing to process the contents.
although we obviously still don't have an actual git object to test with.
The normal git test-suite obviously also passes.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
I notice that sha1dc is in the 'pu' branch now, so let's put my money
where my mouth is, and send in the sha1dc diet patch.
sha1dc/sha1.c | 356 +++-------------------------------------------------------
sha1dc/sha1.h | 24 ----
2 files changed, 18 insertions(+), 362 deletions(-)
diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index 6569b403e..4910f0c35 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -9,6 +9,15 @@
#include "sha1dc/sha1.h"
#include "sha1dc/ubc_check.h"
+// function type for sha1_recompression_step_T (uint32_t ihvin[5], uint32_t ihvout[5], const uint32_t me2[80], const uint32_t state[5])
+// where 0 <= T < 80
+// me2 is an expanded message (the expansion of an original message block XOR'ed with a disturbance vector's message block difference)
+// state is the internal state (a,b,c,d,e) before step T of the SHA-1 compression function while processing the original message block
+// the function will return:
+// ihvin: the reconstructed input chaining value
+// ihvout: the reconstructed output chaining value
+typedef void(*sha1_recompression_type)(uint32_t*, uint32_t*, const uint32_t*, const uint32_t*);
+
#define rotate_right(x,n) (((x)>>(n))|((x)<<(32-(n))))
#define rotate_left(x,n) (((x)<<(n))|((x)>>(32-(n))))
@@ -39,212 +48,14 @@
-void sha1_message_expansion(uint32_t W[80])
+static void sha1_message_expansion(uint32_t W[80])
{
unsigned i;
for (i = 16; i < 80; ++i)
W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
}
-void sha1_compression(uint32_t ihv[5], const uint32_t m[16])
-{
- uint32_t W[80];
- uint32_t a, b, c, d, e;
- unsigned i;
-
- memcpy(W, m, 16 * 4);
- for (i = 16; i < 80; ++i)
- W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
-
- a = ihv[0];
- b = ihv[1];
- c = ihv[2];
- d = ihv[3];
- e = ihv[4];
-
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 0);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 1);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 2);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 3);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 4);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 5);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 6);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 7);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 8);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 9);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 10);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 11);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 12);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 13);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 14);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 15);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 16);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 17);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 18);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 19);
-
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 20);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 21);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 22);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 23);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 24);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 25);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 26);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 27);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 28);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 29);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 30);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 31);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 32);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 33);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 34);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 35);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 36);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 37);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 38);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 39);
-
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 40);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 41);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 42);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 43);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 44);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 45);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 46);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 47);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 48);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 49);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 50);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 51);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 52);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 53);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 54);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 55);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 56);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 57);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 58);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 59);
-
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 60);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 61);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 62);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 63);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 64);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 65);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 66);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 67);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 68);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 69);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 70);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 71);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 72);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 73);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 74);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 75);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 76);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 77);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 78);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 79);
-
- ihv[0] += a; ihv[1] += b; ihv[2] += c; ihv[3] += d; ihv[4] += e;
-}
-
-
-
-void sha1_compression_W(uint32_t ihv[5], const uint32_t W[80])
-{
- uint32_t a = ihv[0], b = ihv[1], c = ihv[2], d = ihv[3], e = ihv[4];
-
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 0);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 1);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 2);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 3);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 4);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 5);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 6);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 7);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 8);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 9);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 10);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 11);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 12);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 13);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 14);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 15);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 16);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 17);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 18);
- HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 19);
-
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 20);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 21);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 22);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 23);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 24);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 25);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 26);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 27);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 28);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 29);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 30);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 31);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 32);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 33);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 34);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 35);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 36);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 37);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 38);
- HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 39);
-
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 40);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 41);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 42);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 43);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 44);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 45);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 46);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 47);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 48);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 49);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 50);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 51);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 52);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 53);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 54);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 55);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 56);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 57);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 58);
- HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 59);
-
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 60);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 61);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 62);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 63);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 64);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 65);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 66);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 67);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 68);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 69);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 70);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 71);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 72);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 73);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 74);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 75);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 76);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 77);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 78);
- HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 79);
-
- ihv[0] += a; ihv[1] += b; ihv[2] += c; ihv[3] += d; ihv[4] += e;
-}
-
-
-
-void sha1_compression_states(uint32_t ihv[5], const uint32_t W[80], uint32_t states[80][5])
+static void sha1_compression_states(uint32_t ihv[5], const uint32_t W[80], uint32_t states[80][5])
{
uint32_t a = ihv[0], b = ihv[1], c = ihv[2], d = ihv[3], e = ihv[4];
@@ -664,7 +475,7 @@ void sha1_compression_states(uint32_t ihv[5], const uint32_t W[80], uint32_t sta
#define SHA1_RECOMPRESS(t) \
-void sha1recompress_fast_ ## t (uint32_t ihvin[5], uint32_t ihvout[5], const uint32_t me2[80], const uint32_t state[5]) \
+static void sha1recompress_fast_ ## t (uint32_t ihvin[5], uint32_t ihvout[5], const uint32_t me2[80], const uint32_t state[5]) \
{ \
uint32_t a = state[0], b = state[1], c = state[2], d = state[3], e = state[4]; \
if (t > 79) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(b, c, d, e, a, me2, 79); \
@@ -832,111 +643,20 @@ void sha1recompress_fast_ ## t (uint32_t ihvin[5], uint32_t ihvout[5], const uin
ihvout[0] = ihvin[0] + a; ihvout[1] = ihvin[1] + b; ihvout[2] = ihvin[2] + c; ihvout[3] = ihvin[3] + d; ihvout[4] = ihvin[4] + e; \
}
-SHA1_RECOMPRESS(0)
-SHA1_RECOMPRESS(1)
-SHA1_RECOMPRESS(2)
-SHA1_RECOMPRESS(3)
-SHA1_RECOMPRESS(4)
-SHA1_RECOMPRESS(5)
-SHA1_RECOMPRESS(6)
-SHA1_RECOMPRESS(7)
-SHA1_RECOMPRESS(8)
-SHA1_RECOMPRESS(9)
-
-SHA1_RECOMPRESS(10)
-SHA1_RECOMPRESS(11)
-SHA1_RECOMPRESS(12)
-SHA1_RECOMPRESS(13)
-SHA1_RECOMPRESS(14)
-SHA1_RECOMPRESS(15)
-SHA1_RECOMPRESS(16)
-SHA1_RECOMPRESS(17)
-SHA1_RECOMPRESS(18)
-SHA1_RECOMPRESS(19)
-
-SHA1_RECOMPRESS(20)
-SHA1_RECOMPRESS(21)
-SHA1_RECOMPRESS(22)
-SHA1_RECOMPRESS(23)
-SHA1_RECOMPRESS(24)
-SHA1_RECOMPRESS(25)
-SHA1_RECOMPRESS(26)
-SHA1_RECOMPRESS(27)
-SHA1_RECOMPRESS(28)
-SHA1_RECOMPRESS(29)
-
-SHA1_RECOMPRESS(30)
-SHA1_RECOMPRESS(31)
-SHA1_RECOMPRESS(32)
-SHA1_RECOMPRESS(33)
-SHA1_RECOMPRESS(34)
-SHA1_RECOMPRESS(35)
-SHA1_RECOMPRESS(36)
-SHA1_RECOMPRESS(37)
-SHA1_RECOMPRESS(38)
-SHA1_RECOMPRESS(39)
-
-SHA1_RECOMPRESS(40)
-SHA1_RECOMPRESS(41)
-SHA1_RECOMPRESS(42)
-SHA1_RECOMPRESS(43)
-SHA1_RECOMPRESS(44)
-SHA1_RECOMPRESS(45)
-SHA1_RECOMPRESS(46)
-SHA1_RECOMPRESS(47)
-SHA1_RECOMPRESS(48)
-SHA1_RECOMPRESS(49)
-
-SHA1_RECOMPRESS(50)
-SHA1_RECOMPRESS(51)
-SHA1_RECOMPRESS(52)
-SHA1_RECOMPRESS(53)
-SHA1_RECOMPRESS(54)
-SHA1_RECOMPRESS(55)
-SHA1_RECOMPRESS(56)
-SHA1_RECOMPRESS(57)
SHA1_RECOMPRESS(58)
-SHA1_RECOMPRESS(59)
-
-SHA1_RECOMPRESS(60)
-SHA1_RECOMPRESS(61)
-SHA1_RECOMPRESS(62)
-SHA1_RECOMPRESS(63)
-SHA1_RECOMPRESS(64)
SHA1_RECOMPRESS(65)
-SHA1_RECOMPRESS(66)
-SHA1_RECOMPRESS(67)
-SHA1_RECOMPRESS(68)
-SHA1_RECOMPRESS(69)
-
-SHA1_RECOMPRESS(70)
-SHA1_RECOMPRESS(71)
-SHA1_RECOMPRESS(72)
-SHA1_RECOMPRESS(73)
-SHA1_RECOMPRESS(74)
-SHA1_RECOMPRESS(75)
-SHA1_RECOMPRESS(76)
-SHA1_RECOMPRESS(77)
-SHA1_RECOMPRESS(78)
-SHA1_RECOMPRESS(79)
-
-sha1_recompression_type sha1_recompression_step[80] =
+
+static sha1_recompression_type sha1_recompression_step[80] =
{
- sha1recompress_fast_0, sha1recompress_fast_1, sha1recompress_fast_2, sha1recompress_fast_3, sha1recompress_fast_4, sha1recompress_fast_5, sha1recompress_fast_6, sha1recompress_fast_7, sha1recompress_fast_8, sha1recompress_fast_9,
- sha1recompress_fast_10, sha1recompress_fast_11, sha1recompress_fast_12, sha1recompress_fast_13, sha1recompress_fast_14, sha1recompress_fast_15, sha1recompress_fast_16, sha1recompress_fast_17, sha1recompress_fast_18, sha1recompress_fast_19,
- sha1recompress_fast_20, sha1recompress_fast_21, sha1recompress_fast_22, sha1recompress_fast_23, sha1recompress_fast_24, sha1recompress_fast_25, sha1recompress_fast_26, sha1recompress_fast_27, sha1recompress_fast_28, sha1recompress_fast_29,
- sha1recompress_fast_30, sha1recompress_fast_31, sha1recompress_fast_32, sha1recompress_fast_33, sha1recompress_fast_34, sha1recompress_fast_35, sha1recompress_fast_36, sha1recompress_fast_37, sha1recompress_fast_38, sha1recompress_fast_39,
- sha1recompress_fast_40, sha1recompress_fast_41, sha1recompress_fast_42, sha1recompress_fast_43, sha1recompress_fast_44, sha1recompress_fast_45, sha1recompress_fast_46, sha1recompress_fast_47, sha1recompress_fast_48, sha1recompress_fast_49,
- sha1recompress_fast_50, sha1recompress_fast_51, sha1recompress_fast_52, sha1recompress_fast_53, sha1recompress_fast_54, sha1recompress_fast_55, sha1recompress_fast_56, sha1recompress_fast_57, sha1recompress_fast_58, sha1recompress_fast_59,
- sha1recompress_fast_60, sha1recompress_fast_61, sha1recompress_fast_62, sha1recompress_fast_63, sha1recompress_fast_64, sha1recompress_fast_65, sha1recompress_fast_66, sha1recompress_fast_67, sha1recompress_fast_68, sha1recompress_fast_69,
- sha1recompress_fast_70, sha1recompress_fast_71, sha1recompress_fast_72, sha1recompress_fast_73, sha1recompress_fast_74, sha1recompress_fast_75, sha1recompress_fast_76, sha1recompress_fast_77, sha1recompress_fast_78, sha1recompress_fast_79,
+ [58] = sha1recompress_fast_58,
+ [65] = sha1recompress_fast_65,
};
-void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
+static void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
{
unsigned i, j;
uint32_t ubc_dv_mask[DVMASKSIZE];
@@ -973,12 +693,6 @@ void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
if (ctx->callback != NULL)
ctx->callback(ctx->total - 64, ctx->ihv1, ctx->ihv2, ctx->m1, ctx->m2);
- if (ctx->safe_hash)
- {
- sha1_compression_W(ctx->ihv, ctx->m1);
- sha1_compression_W(ctx->ihv, ctx->m1);
- }
-
break;
}
}
@@ -990,7 +704,7 @@ void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
-void swap_bytes(uint32_t val[16])
+static void swap_bytes(uint32_t val[16])
{
unsigned i;
for (i = 0; i < 16; ++i)
@@ -1011,7 +725,6 @@ void SHA1DCInit(SHA1_CTX* ctx)
ctx->ihv[3] = 0x10325476;
ctx->ihv[4] = 0xC3D2E1F0;
ctx->found_collision = 0;
- ctx->safe_hash = 1;
ctx->ubc_check = 1;
ctx->detect_coll = 1;
ctx->reduced_round_coll = 0;
@@ -1019,39 +732,6 @@ void SHA1DCInit(SHA1_CTX* ctx)
ctx->callback = NULL;
}
-void SHA1DCSetSafeHash(SHA1_CTX* ctx, int safehash)
-{
- if (safehash)
- ctx->safe_hash = 1;
- else
- ctx->safe_hash = 0;
-}
-
-
-void SHA1DCSetUseUBC(SHA1_CTX* ctx, int ubc_check)
-{
- if (ubc_check)
- ctx->ubc_check = 1;
- else
- ctx->ubc_check = 0;
-}
-
-void SHA1DCSetUseDetectColl(SHA1_CTX* ctx, int detect_coll)
-{
- if (detect_coll)
- ctx->detect_coll = 1;
- else
- ctx->detect_coll = 0;
-}
-
-void SHA1DCSetDetectReducedRoundCollision(SHA1_CTX* ctx, int reduced_round_coll)
-{
- if (reduced_round_coll)
- ctx->reduced_round_coll = 1;
- else
- ctx->reduced_round_coll = 0;
-}
-
void SHA1DCSetCallback(SHA1_CTX* ctx, collision_block_callback callback)
{
ctx->callback = callback;
diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
index 1bb0ace99..f126bf63b 100644
--- a/sha1dc/sha1.h
+++ b/sha1dc/sha1.h
@@ -5,29 +5,6 @@
* https://opensource.org/licenses/MIT
***/
-// uses SHA-1 message expansion to expand the first 16 words of W[] to 80 words
-void sha1_message_expansion(uint32_t W[80]);
-
-// sha-1 compression function; first version takes a message block pre-parsed as 16 32-bit integers, second version takes an already expanded message)
-void sha1_compression(uint32_t ihv[5], const uint32_t m[16]);
-void sha1_compression_W(uint32_t ihv[5], const uint32_t W[80]);
-
-// same as sha1_compression_W, but additionally store intermediate states
-// only stores states ii (the state between step ii-1 and step ii) when DOSTORESTATEii is defined in ubc_check.h
-void sha1_compression_states(uint32_t ihv[5], const uint32_t W[80], uint32_t states[80][5]);
-
-// function type for sha1_recompression_step_T (uint32_t ihvin[5], uint32_t ihvout[5], const uint32_t me2[80], const uint32_t state[5])
-// where 0 <= T < 80
-// me2 is an expanded message (the expansion of an original message block XOR'ed with a disturbance vector's message block difference)
-// state is the internal state (a,b,c,d,e) before step T of the SHA-1 compression function while processing the original message block
-// the function will return:
-// ihvin: the reconstructed input chaining value
-// ihvout: the reconstructed output chaining value
-typedef void(*sha1_recompression_type)(uint32_t*, uint32_t*, const uint32_t*, const uint32_t*);
-
-// table of sha1_recompression_step_0, ... , sha1_recompression_step_79
-extern sha1_recompression_type sha1_recompression_step[80];
-
// a callback function type that can be set to be called when a collision block has been found:
// void collision_block_callback(uint64_t byteoffset, const uint32_t ihvin1[5], const uint32_t ihvin2[5], const uint32_t m1[80], const uint32_t m2[80])
typedef void(*collision_block_callback)(uint64_t, const uint32_t*, const uint32_t*, const uint32_t*, const uint32_t*);
@@ -39,7 +16,6 @@ typedef struct {
unsigned char buffer[64];
int bigendian;
int found_collision;
- int safe_hash;
int detect_coll;
int ubc_check;
int reduced_round_coll;
--
2.12.0.4.g94589516d
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 0:30 [PATCH] Put sha1dc on a diet Linus Torvalds
@ 2017-03-01 18:42 ` Junio C Hamano
2017-03-01 18:49 ` Linus Torvalds
2017-03-01 19:07 ` Jeff King
1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2017-03-01 18:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> I notice that sha1dc is in the 'pu' branch now, so let's put my money
> where my mouth is, and send in the sha1dc diet patch.
I see //c99 comments and also T array[] = { [58] = val } both of
which I think we stay away from (and the former is from the initial
import), so some people on other platforms MAY have trouble with
this topic.
Let's see what happens by queuing it on 'pu' ;-)
Thanks.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 18:42 ` Junio C Hamano
@ 2017-03-01 18:49 ` Linus Torvalds
2017-03-01 19:41 ` Junio C Hamano
2017-03-01 19:53 ` Jeff King
0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-03-01 18:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
On Wed, Mar 1, 2017 at 10:42 AM, Junio C Hamano <gitster@pobox.com> wrote:
> I see //c99 comments
sha1dc is already full of // style comments. I just followed the
existing practice.
> and also T array[] = { [58] = val } both of
> which I think we stay away from (and the former is from the initial
> import), so some people on other platforms MAY have trouble with
> this topic.
Hmm. The "{ [58] = val; }" kind of initialization would be easy to
work around by just filling in everything else with NULL, but it would
make for a pretty nasty readability issue.
That said, if you mis-count the NULL's, the end result will pretty
immediately SIGSEGV, so I guess it wouldn't be much of a maintenance
problem.
But if you're just willing to take the "let's see" approach, I think
the explicitly numbered initializer is much better.
The main people who I assume would really want to use the sha1dc
library are hosting places. And they won't be using crazy compilers
from the last century.
That said, I think that it would be lovely to just default to
USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
No, it doesn't really seem to matter that much in practice.
Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 18:49 ` Linus Torvalds
@ 2017-03-01 19:41 ` Junio C Hamano
2017-03-01 21:56 ` Johannes Schindelin
2017-03-01 19:53 ` Jeff King
1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2017-03-01 19:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> That said, I think that it would be lovely to just default to
> USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
> No, it doesn't really seem to matter that much in practice.
Yes. It would be a very good goal.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 19:41 ` Junio C Hamano
@ 2017-03-01 21:56 ` Johannes Schindelin
2017-03-01 22:05 ` Junio C Hamano
2017-03-01 22:16 ` Linus Torvalds
0 siblings, 2 replies; 36+ messages in thread
From: Johannes Schindelin @ 2017-03-01 21:56 UTC (permalink / raw)
To: Junio C Hamano
Cc: Linus Torvalds, Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
Hi,
On Wed, 1 Mar 2017, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > That said, I think that it would be lovely to just default to
> > USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
> > No, it doesn't really seem to matter that much in practice.
>
> Yes. It would be a very good goal.
So let me get this straight: not only do we now implicitly want to bump
the required C compiler to C99 without any grace period worth mentioning
[*1*], we are also all of a sudden no longer worried about a double digit
percentage drop of speed [*2*]?
Puzzled,
Johannes
Footnote *1*: I know, it is easy to forget that some developers cannot
choose their tools, or even their hardware. In the past, we seemed to take
appropriate care, though.
Footnote *2*: With real-world repositories of notable size, that
performance regression hurts. A lot. We just spent time to get the speed
of SHA-1 down by a couple percent and it was a noticeable improvement here.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 21:56 ` Johannes Schindelin
@ 2017-03-01 22:05 ` Junio C Hamano
2017-03-01 22:16 ` Linus Torvalds
1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-03-01 22:05 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Linus Torvalds, Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
On Wed, Mar 1, 2017 at 1:56 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Wed, 1 Mar 2017, Junio C Hamano wrote:
>
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>> > That said, I think that it would be lovely to just default to
>> > USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
>> > No, it doesn't really seem to matter that much in practice.
>>
>> Yes. It would be a very good goal.
>
> So let me get this straight: not only do we now implicitly want to bump
> the required C compiler to C99 without any grace period worth mentioning
> [*1*], we are also all of a sudden no longer worried about a double digit
> percentage drop of speed [*2*]?
Before we get the code into shape suitable for 'next', it is more important to
make sure it operates correctly, adding necessary features if any (e.g. "hash
with or without check" knob) while it is in 'pu', and *1* is to allow
it to progress
faster without having to worry about something we could do mechanically
before making it ready for 'next'.
The performance thing is really "let's see how well it goes". With effort to
optimize still "just has began", I think it is too early to tell if
Linus's "doesn't
really seem to matter" is the case or not.
Queuing such a topic on 'pu' is one effective way to make sure people are
working off of the same codebase.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 21:56 ` Johannes Schindelin
2017-03-01 22:05 ` Junio C Hamano
@ 2017-03-01 22:16 ` Linus Torvalds
2017-03-01 22:51 ` Johannes Schindelin
1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-03-01 22:16 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Junio C Hamano, Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
On Wed, Mar 1, 2017 at 1:56 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Footnote *1*: I know, it is easy to forget that some developers cannot
> choose their tools, or even their hardware. In the past, we seemed to take
> appropriate care, though.
I don't think you need to worry about the Windows side. That can
continue to do something else.
When I advocated perhaps using USE_SHA1DC by default, I definitely
did not mean it in a "everywhere, regardless of issues" manner.
For example, the conmfig.mak.uname script already explicitly asks for
"BLK_SHA1 = YesPlease" for Windows. Don't bother changing that, it's
an explicit choice.
But the Linux rules don't actually specify which SHA1 version to use,
so the main Makefile currently defaults to just using openssl.
So that's the "default" choice I think we might want to change. Not
the "we're windows, and explicitly want BLK_SHA1 because of
environment and build infrastructure".
Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 22:16 ` Linus Torvalds
@ 2017-03-01 22:51 ` Johannes Schindelin
2017-03-01 23:05 ` Linus Torvalds
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin @ 2017-03-01 22:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Junio C Hamano, Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
Hi,
On Wed, 1 Mar 2017, Linus Torvalds wrote:
> On Wed, Mar 1, 2017 at 1:56 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > Footnote *1*: I know, it is easy to forget that some developers cannot
> > choose their tools, or even their hardware. In the past, we seemed to take
> > appropriate care, though.
>
> I don't think you need to worry about the Windows side.
I am not. I build G?t for Windows using GCC.
My concern is about that unexpected turn "oh, let's just switch to C99
because, well, because my compiler canehandle it, and everybody else
should just switch tn a modern compiler". That really sounded careless.
> That can continue to do something else.
>
> When I advocated perhaps using USE_SHA1DC by default, I definitely did
> not mean it in a "everywhere, regardless of issues" manner.
>
> For example, the conmfig.mak.uname script already explicitly asks for
> "BLK_SHA1 = YesPlease" for Windows. Don't bother changing that, it's an
> explicit choice.
That setting is only in git.git's version, not in gxt-for-windows/git.git.
We switched to OpenSSL because of speed improvements, in particular with
recent Intel processors.
> But the Linux rules don't actually specify which SHA1 version to use,
> so the main Makefile currently defaults to just using openssl.
>
> So that's the "default" choice I think we might want to change. Not
> the "we're windows, and explicitly want BLK_SHA1 because of
> environment and build infrastructure".
Since we switched away from BLOCK_SHA1, any such change would affect Git
for Windews.
But I think bigger than just developers on Windows OS. There are many
developers out there working on large repositories (yes, much larger than
Linux). Also using Macs and Linux. I am not at all sure that we want to
give them an updated Git they cannot fail to notice to be much slower than
before.
Don't get me wrong: I *hope* that you'll manage to get sha1dc
competitively fast. If you don't, well, then we simply cannot use it by
default for *all* of our calls (you already pointed out that the pack
index' checksum does not need collision detection, and in fact, *any*
operation that works on implicitly trusted data falls into the same court,
e.g. `git add`).
Ciao,
Johannes
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 22:51 ` Johannes Schindelin
@ 2017-03-01 23:05 ` Linus Torvalds
2017-03-01 23:19 ` Jeff King
2017-03-02 14:37 ` Johannes Schindelin
0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-03-01 23:05 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Junio C Hamano, Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
On Wed, Mar 1, 2017 at 2:51 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> But I think bigger than just developers on Windows OS. There are many
> developers out there working on large repositories (yes, much larger than
> Linux). Also using Macs and Linux. I am not at all sure that we want to
> give them an updated Git they cannot fail to notice to be much slower than
> before.
Johannes, have you *tried* the patches?
I really don't think you have. It is completely unnoticeable in any
normal situation. The two cases that it's noticeable is:
- a full fsck is noticeable slower
- a full non-local clone is slower (but not really noticeably so
since the network traffic dominates).
In other words, I think you're making shit up. I don't think you
understand how little the SHA1 performance actually matters. It's
noticeable in benchmarks. It's not noticeable in any normal operation.
.. and yes, I've actually been running the patches locally since I
posted my first version (which apparently didn't go out to the list
because of list size limits) and now running the version in 'pu'.
Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 23:05 ` Linus Torvalds
@ 2017-03-01 23:19 ` Jeff King
2017-03-02 6:10 ` Duy Nguyen
2017-03-02 14:39 ` Johannes Schindelin
2017-03-02 14:37 ` Johannes Schindelin
1 sibling, 2 replies; 36+ messages in thread
From: Jeff King @ 2017-03-01 23:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Junio C Hamano, Marc Stevens, Dan Shumow,
Git Mailing List
On Wed, Mar 01, 2017 at 03:05:25PM -0800, Linus Torvalds wrote:
> On Wed, Mar 1, 2017 at 2:51 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > But I think bigger than just developers on Windows OS. There are many
> > developers out there working on large repositories (yes, much larger than
> > Linux). Also using Macs and Linux. I am not at all sure that we want to
> > give them an updated Git they cannot fail to notice to be much slower than
> > before.
>
> Johannes, have you *tried* the patches?
>
> I really don't think you have. It is completely unnoticeable in any
> normal situation. The two cases that it's noticeable is:
>
> - a full fsck is noticeable slower
>
> - a full non-local clone is slower (but not really noticeably so
> since the network traffic dominates).
>
> In other words, I think you're making shit up. I don't think you
> understand how little the SHA1 performance actually matters. It's
> noticeable in benchmarks. It's not noticeable in any normal operation.
>
> .. and yes, I've actually been running the patches locally since I
> posted my first version (which apparently didn't go out to the list
> because of list size limits) and now running the version in 'pu'.
You have to remember that some of the Git for Windows users are doing
horrific things like using repositories with 450MB .git/index files, and
the speed to compute the sha1 during an update is noticeable there.
IMHO that is a good sign that the right approach is to switch to an
index format that doesn't require rewriting all 450MB to update one
entry. But obviously that is a much harder problem than just using an
optimized sha1 implementation.
I do think that could argue for turning on the collision detection only
during object-write operations, which is where it matters. It would be
really trivial to flip the "check collisions" bit on sha1dc. But I
suspect you could go faster still by compiling against two separate
implementations: the fast-as-possible one (which could be openssl or
blk-sha1), and the slower-but-careful sha1dc.
-Peff
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 23:19 ` Jeff King
@ 2017-03-02 6:10 ` Duy Nguyen
2017-03-02 14:45 ` Johannes Schindelin
2017-03-02 14:39 ` Johannes Schindelin
1 sibling, 1 reply; 36+ messages in thread
From: Duy Nguyen @ 2017-03-02 6:10 UTC (permalink / raw)
To: Jeff King
Cc: Linus Torvalds, Johannes Schindelin, Junio C Hamano,
Marc Stevens, Dan Shumow, Git Mailing List
On Thu, Mar 2, 2017 at 6:19 AM, Jeff King <peff@peff.net> wrote:
> You have to remember that some of the Git for Windows users are doing
> horrific things like using repositories with 450MB .git/index files, and
> the speed to compute the sha1 during an update is noticeable there.
We probably should separate this use case from the object hashing
anyway. Here we need a better, more reliable crc32 basically, to
detect bit flips. Even if we move to SHA-something, we can keep
staying with SHA-1 here (and with the fastest implementation)
--
Duy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-02 6:10 ` Duy Nguyen
@ 2017-03-02 14:45 ` Johannes Schindelin
2017-03-02 16:35 ` Linus Torvalds
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin @ 2017-03-02 14:45 UTC (permalink / raw)
To: Duy Nguyen
Cc: Jeff King, Linus Torvalds, Junio C Hamano, Marc Stevens,
Dan Shumow, Git Mailing List
Hi Duy,
On Thu, 2 Mar 2017, Duy Nguyen wrote:
> On Thu, Mar 2, 2017 at 6:19 AM, Jeff King <peff@peff.net> wrote:
> > You have to remember that some of the Git for Windows users are doing
> > horrific things like using repositories with 450MB .git/index files,
> > and the speed to compute the sha1 during an update is noticeable
> > there.
>
> We probably should separate this use case from the object hashing
> anyway. Here we need a better, more reliable crc32 basically, to detect
> bit flips. Even if we move to SHA-something, we can keep staying with
> SHA-1 here (and with the fastest implementation)
I guess it was convenient to use the same hash algorithm for all hashing
purposes in the beginning. The downside, of course, was that we kept
talking about SHA-1s instead of commit hashes and the index checksum (i.e.
using labels based on implementation details rather than semantically
meaningful names).
In the meantime, we use different hash algorithms where appropriate, of
course, and we typically encapsulate the exact hash algorithm so that it
is easy to switch when/if necessary (think the hash functions for strings
in our hashtables, and the hash functions in xdiff).
It would probably make sense to switch the index integrity check away from
SHA-1 because we really only care about detecting bit flips there, and we
have no need for the computational overhead of using a full-blown
cryptographic hash for that purpose.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-02 14:45 ` Johannes Schindelin
@ 2017-03-02 16:35 ` Linus Torvalds
2017-03-02 18:37 ` Jeff Hostetler
0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-03-02 16:35 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Duy Nguyen, Jeff King, Junio C Hamano, Marc Stevens, Dan Shumow,
Git Mailing List
On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> It would probably make sense to switch the index integrity check away from
> SHA-1 because we really only care about detecting bit flips there, and we
> have no need for the computational overhead of using a full-blown
> cryptographic hash for that purpose.
Which index do you actually see as being a problem, btw? The main file
index (.git/index) or the pack-file indexes?
We definitely don't need the checking version of sha1 for either of
those, but as Jeff already did the math, at least the pack-file index
is almost negligible, because the pack-file operations that update it
end up doing SHA1 over the objects - and the object SHA1 calculations
are much bigger.
And I don't think we even check the pack-file index hashes except on fsck.
Now, if your _file_ index is 300-400MB (and I do think we check the
SHA fingerprint on that even on just reading it - verify_hdr() in
do_read_index()), then that's going to be a somewhat noticeable hit on
every normal "git diff" etc.
But I'd have expected the stat() calls of all the files listed by that
index to be the _much_ bigger problem in that case. Or do you just
turn those off with assume-unchanged?
Yeah, those stat calls are threaded when preloading, but even so..
Anyway, the file index SHA1 checking could probably just be disabled
entirely (with a config flag). It's a corruption check that simply
isn't that important. So if that's your main SHA1 issue, that would be
easy to fix.
Everything else - like pack-file generation etc for a big clone() may
end up using a ton of SHA1 too, but the SHA1 costs all scale with the
other costs that drown them out (ie zlib, network, etc).
I'd love to see a profile if you have one.
Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-02 16:35 ` Linus Torvalds
@ 2017-03-02 18:37 ` Jeff Hostetler
2017-03-02 19:04 ` Linus Torvalds
0 siblings, 1 reply; 36+ messages in thread
From: Jeff Hostetler @ 2017-03-02 18:37 UTC (permalink / raw)
To: Linus Torvalds, Johannes Schindelin
Cc: Duy Nguyen, Jeff King, Junio C Hamano, Marc Stevens, Dan Shumow,
Git Mailing List
On 3/2/2017 11:35 AM, Linus Torvalds wrote:
> On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>> It would probably make sense to switch the index integrity check away from
>> SHA-1 because we really only care about detecting bit flips there, and we
>> have no need for the computational overhead of using a full-blown
>> cryptographic hash for that purpose.
> Which index do you actually see as being a problem, btw? The main file
> index (.git/index) or the pack-file indexes?
>
> We definitely don't need the checking version of sha1 for either of
> those, but as Jeff already did the math, at least the pack-file index
> is almost negligible, because the pack-file operations that update it
> end up doing SHA1 over the objects - and the object SHA1 calculations
> are much bigger.
>
> And I don't think we even check the pack-file index hashes except on fsck.
>
> Now, if your _file_ index is 300-400MB (and I do think we check the
> SHA fingerprint on that even on just reading it - verify_hdr() in
> do_read_index()), then that's going to be a somewhat noticeable hit on
> every normal "git diff" etc.
Yes, the .git/index is 450MB with ~3.1M entries. verify_hdr() is called
each time
we read it into memory.
We have been testing a patch in GfW to run the verification in a
separate thread
while the main thread parses (and mallocs) the cache_entries. This does
help
offset the time.
https://github.com/git-for-windows/git/pull/978/files
> But I'd have expected the stat() calls of all the files listed by that
> index to be the _much_ bigger problem in that case. Or do you just
> turn those off with assume-unchanged?
>
> Yeah, those stat calls are threaded when preloading, but even so..
Yes, the stat() calls are more significant percentage of the time (and
having
core.fscache and core.preloadindex help that greatly), but the total
time for a command
is just that -- the total -- so using the philosophy of "every little
bit helps", the faster
routines help us here.
> Anyway, the file index SHA1 checking could probably just be disabled
> entirely (with a config flag). It's a corruption check that simply
> isn't that important. So if that's your main SHA1 issue, that would be
> easy to fix.
Yes, in the GVFS effort, we disabled the verification with a config
setting and haven't
had any incidents.
> Everything else - like pack-file generation etc for a big clone() may
> end up using a ton of SHA1 too, but the SHA1 costs all scale with the
> other costs that drown them out (ie zlib, network, etc).
>
> I'd love to see a profile if you have one.
>
> Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-02 18:37 ` Jeff Hostetler
@ 2017-03-02 19:04 ` Linus Torvalds
0 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-03-02 19:04 UTC (permalink / raw)
To: Jeff Hostetler
Cc: Johannes Schindelin, Duy Nguyen, Jeff King, Junio C Hamano,
Marc Stevens, Dan Shumow, Git Mailing List
On Thu, Mar 2, 2017 at 10:37 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> Now, if your _file_ index is 300-400MB (and I do think we check the
>> SHA fingerprint on that even on just reading it - verify_hdr() in
>> do_read_index()), then that's going to be a somewhat noticeable hit on
>> every normal "git diff" etc.
>
> Yes, the .git/index is 450MB with ~3.1M entries. verify_hdr() is called
> each time we read it into memory.
Ok. So that's really just a purely historical artifact.
The file index is actually the first part of git to have ever been
written. You can't even see it in the history, because the initial
revision from Apr 7, 2005, obviously depended on the actual object
hashing.
But the file index actually came first. You can _kind_ of see that in
the layout of the original git tree, and how the main header file is
still called "cache.h", and how the original ".git" directory was
actually called ".dircache".
And the two biggest files (by a fairly big margin) are "read-cache.c"
and "update-cache.c".
So that file index cache was in many ways _the_ central part of the
original git model. The sha1 file indexing and object database was
just the backing store for the file index.
But part of that history is then how much I worried about corruption
of that index (and, let's face it, general corruption resistance _was_
one of the primary design goals - performance was high up there too,
but safety in the face of filesystem corruption was and is a primary
issue).
But realistically, I don't think we've *ever* hit anything serious on
the index file, and it's obviously not a security issue. It also isn't
even a compatibility issue, so it would be trivial to just bump the
version header and saying that the signature changes the meaning of
the checksum.
That said:
> We have been testing a patch in GfW to run the verification in a separate thread
> while the main thread parses (and mallocs) the cache_entries. This does help
> offset the time.
Yeah, that seems an even better solution, honestly.
The patch would be cleaner without the NO_PTHREADS things.
I wonder how meaningful that thing even is today. Looking at what
seems to select NO_PTHREADS, I suspect that's all entirely historical.
For example, you'll see it for QNX etc, which seems wrong - QNX
definitely has pthreads according to their docs, for example.
Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 23:19 ` Jeff King
2017-03-02 6:10 ` Duy Nguyen
@ 2017-03-02 14:39 ` Johannes Schindelin
1 sibling, 0 replies; 36+ messages in thread
From: Johannes Schindelin @ 2017-03-02 14:39 UTC (permalink / raw)
To: Jeff King
Cc: Linus Torvalds, Junio C Hamano, Marc Stevens, Dan Shumow,
Git Mailing List
Hi Peff,
On Wed, 1 Mar 2017, Jeff King wrote:
> I do think that could argue for turning on the collision detection only
> during object-write operations, which is where it matters. It would be
> really trivial to flip the "check collisions" bit on sha1dc. But I
> suspect you could go faster still by compiling against two separate
> implementations: the fast-as-possible one (which could be openssl or
> blk-sha1), and the slower-but-careful sha1dc.
Given the speed difference between OpenSSL and sha1dc, it would be a wise
thing indeed to do sha1dc only where objects enter from possibly untrusted
sources, and use OpenSSL for all other hashing.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 23:05 ` Linus Torvalds
2017-03-01 23:19 ` Jeff King
@ 2017-03-02 14:37 ` Johannes Schindelin
1 sibling, 0 replies; 36+ messages in thread
From: Johannes Schindelin @ 2017-03-02 14:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Junio C Hamano, Jeff King, Marc Stevens, Dan Shumow, Git Mailing List
Hi Linus,
On Wed, 1 Mar 2017, Linus Torvalds wrote:
> On Wed, Mar 1, 2017 at 2:51 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > But I think bigger than just developers on Windows OS. There are many
> > developers out there working on large repositories (yes, much larger
> > than Linux). Also using Macs and Linux. I am not at all sure that we
> > want to give them an updated Git they cannot fail to notice to be much
> > slower than before.
>
> Johannes, have you *tried* the patches?
>
> I really don't think you have. It is completely unnoticeable in any
> normal situation. The two cases that it's noticeable is:
>
> - a full fsck is noticeable slower
>
> - a full non-local clone is slower (but not really noticeably so
> since the network traffic dominates).
>
> In other words, I think you're making shit up. I don't think you
> understand how little the SHA1 performance actually matters. It's
> noticeable in benchmarks. It's not noticeable in any normal operation.
>
> .. and yes, I've actually been running the patches locally since I
> posted my first version (which apparently didn't go out to the list
> because of list size limits) and now running the version in 'pu'.
If you think that the Linux repository is a big one, then your reaction is
understandable.
I have zero interest in potty language, therefore my reply is very terse:
yes, I have been looking ad SHA-1 performance, and yes, it matters. Think
an index file of 300-400MB.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 18:49 ` Linus Torvalds
2017-03-01 19:41 ` Junio C Hamano
@ 2017-03-01 19:53 ` Jeff King
[not found] ` <CA+55aFwf3sxKW+dGTMjNAeHMOf=rvctEQohm+rbhEb=e3KLpHw@mail.gmail.com>
1 sibling, 1 reply; 36+ messages in thread
From: Jeff King @ 2017-03-01 19:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Marc Stevens, Dan Shumow, Git Mailing List
On Wed, Mar 01, 2017 at 10:49:55AM -0800, Linus Torvalds wrote:
> That said, I think that it would be lovely to just default to
> USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
> No, it doesn't really seem to matter that much in practice.
My biggest concern is the index-pack operation. Try this:
time git clone --no-local --bare linux tmp.git
with and without USE_SHA1DC. I get:
[w/ openssl]
real 1m52.307s
user 2m47.928s
sys 0m14.992s
[w/ sha1dc]
real 3m4.043s
user 6m16.412s
sys 0m13.772s
That's real latency the user will see. It's hard to break it down,
though. The actual "receiving" phase is generally going to be network
bound. The delta-resolution that happens afterwards is totally local and
CPU-bound (but does run in parallel).
And of course this repository tends to the larger side (though certainly
there are bigger ones), and you only feel the pain on clone or when
doing an initial push, not day-to-day.
So maybe we just suck it up and accept that it's a bit slower.
-Peff
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 0:30 [PATCH] Put sha1dc on a diet Linus Torvalds
2017-03-01 18:42 ` Junio C Hamano
@ 2017-03-01 19:07 ` Jeff King
2017-03-01 19:10 ` Linus Torvalds
1 sibling, 1 reply; 36+ messages in thread
From: Jeff King @ 2017-03-01 19:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Marc Stevens, Dan Shumow, Git Mailing List
On Tue, Feb 28, 2017 at 04:30:26PM -0800, Linus Torvalds wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 28 Feb 2017 16:12:32 -0800
> Subject: [PATCH] Put sha1dc on a diet
>
> This removes the unnecessary parts of the sha1dc code, shrinking things from
> [...]
So obviously the smaller object size is nice, and the diffstat is
certainly satisfying. My only qualm would be whether this conflicts with
the optimizations that Dan is working on (probably not conceptually, but
textually).
-Peff
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Put sha1dc on a diet
2017-03-01 19:07 ` Jeff King
@ 2017-03-01 19:10 ` Linus Torvalds
0 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-03-01 19:10 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Marc Stevens, Dan Shumow, Git Mailing List
On Wed, Mar 1, 2017 at 11:07 AM, Jeff King <peff@peff.net> wrote:
>
> So obviously the smaller object size is nice, and the diffstat is
> certainly satisfying. My only qualm would be whether this conflicts with
> the optimizations that Dan is working on (probably not conceptually, but
> textually).
Yeah. But I'll happily just re-apply the patch on any new version that
Dan posts. The patch is obviously trivial, even if size-wise it's a
fair number of lines.
So I wouldn't suggest using the patched version as some kind of
starting point. It's much easier to just take a new version of
upstream and repeat the diet patch on it.
.. and obviously later versions of the upstream sha1dc code may not
even need this at all since Dan and Marc are now aware of the issue.
Linus
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2017-03-16 22:07 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-01 0:30 [PATCH] Put sha1dc on a diet Linus Torvalds
2017-03-01 18:42 ` Junio C Hamano
2017-03-01 18:49 ` Linus Torvalds
2017-03-01 19:41 ` Junio C Hamano
2017-03-01 21:56 ` Johannes Schindelin
2017-03-01 22:05 ` Junio C Hamano
2017-03-01 22:16 ` Linus Torvalds
2017-03-01 22:51 ` Johannes Schindelin
2017-03-01 23:05 ` Linus Torvalds
2017-03-01 23:19 ` Jeff King
2017-03-02 6:10 ` Duy Nguyen
2017-03-02 14:45 ` Johannes Schindelin
2017-03-02 16:35 ` Linus Torvalds
2017-03-02 18:37 ` Jeff Hostetler
2017-03-02 19:04 ` Linus Torvalds
2017-03-02 14:39 ` Johannes Schindelin
2017-03-02 14:37 ` Johannes Schindelin
2017-03-01 19:53 ` Jeff King
[not found] ` <CA+55aFwf3sxKW+dGTMjNAeHMOf=rvctEQohm+rbhEb=e3KLpHw@mail.gmail.com>
2017-03-01 20:34 ` Jeff King
[not found] ` <CA+55aFwr1jncrk-cekn0Y8rs_S+zs7RrgQ-Jb-ZbgCvmVrHT_A@mail.gmail.com>
2017-03-01 23:13 ` Jeff King
2017-03-01 23:38 ` Linus Torvalds
2017-03-02 1:31 ` Dan Shumow
2017-03-02 4:38 ` Junio C Hamano
2017-03-04 1:07 ` Dan Shumow
2017-03-13 15:13 ` Jeff King
[not found] ` <CY1PR0301MB2107B3C5131D5DC7F91A0147C4250@CY1PR0301MB2107.namprd03.prod.outlook.com>
[not found] ` <CY1PR0301MB2107876B6E47FBCF03AB1EA1C4250@CY1PR0301MB2107.namprd03.prod.outlook.com>
2017-03-13 19:48 ` Jeff King
2017-03-13 20:12 ` Marc Stevens
2017-03-13 20:20 ` Linus Torvalds
2017-03-13 20:47 ` Marc Stevens
2017-03-13 21:00 ` Jeff King
2017-03-13 21:15 ` Marc Stevens
2017-03-16 18:22 ` Marc Stevens
2017-03-16 22:06 ` Jeff King
2017-03-16 22:07 ` Dan Shumow
2017-03-01 19:07 ` Jeff King
2017-03-01 19:10 ` Linus Torvalds
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.