All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] diff.c: Ensure "index $from..$to" line contains unambiguous SHA1s
@ 2010-05-30 13:37 Johan Herland
  2010-05-30 19:58 ` Johannes Sixt
  0 siblings, 1 reply; 3+ messages in thread
From: Johan Herland @ 2010-05-30 13:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

In the metainfo section of git diffs there's an "index" line providing
abbreviated (unless --full-index is used) blob SHA1s from the
pre-/post-images used to generate the diff. These provide hints that
can be used to reconstruct a 3-way merge when applying the patch
(see the --3way option to 'git am' for more details).

In order for this to work, however, the blob SHA1s must not be
abbreviated into ambiguity.

This patch eliminates the possible ambiguity by using find_unique_abbrev()
to produce the abbreviated SHA1s (instead of blind abbreviation by way of
"%.*s").

A testcase verifying the fix is also included.

Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                              |    6 +++---
 t/t4043-diff-index-unique-abbrev.sh |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100755 t/t4043-diff-index-unique-abbrev.sh

diff --git a/diff.c b/diff.c
index 494f560..1aefa66 100644
--- a/diff.c
+++ b/diff.c
@@ -2419,9 +2419,9 @@ static void fill_metainfo(struct strbuf *msg,
 			    (!fill_mmfile(&mf, two) && diff_filespec_is_binary(two)))
 				abbrev = 40;
 		}
-		strbuf_addf(msg, "index %.*s..%.*s",
-			    abbrev, sha1_to_hex(one->sha1),
-			    abbrev, sha1_to_hex(two->sha1));
+		strbuf_addf(msg, "index %s..",
+			    find_unique_abbrev(one->sha1, abbrev));
+		strbuf_addstr(msg, find_unique_abbrev(two->sha1, abbrev));
 		if (one->mode == two->mode)
 			strbuf_addf(msg, " %06o", one->mode);
 		strbuf_addch(msg, '\n');
diff --git a/t/t4043-diff-index-unique-abbrev.sh b/t/t4043-diff-index-unique-abbrev.sh
new file mode 100755
index 0000000..d5ce72b
--- /dev/null
+++ b/t/t4043-diff-index-unique-abbrev.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='test unique sha1 abbreviation on "index from..to" line'
+. ./test-lib.sh
+
+cat >expect_initial <<EOF
+100644 blob 51d2738463ea4ca66f8691c91e33ce64b7d41bb1	foo
+EOF
+
+cat >expect_update <<EOF
+100644 blob 51d2738efb4ad8a1e40bed839ab8e116f0a15e47	foo
+EOF
+
+test_expect_success 'setup' '
+	echo 4827 > foo &&
+	git add foo &&
+	git commit -m "initial" &&
+	git cat-file -p HEAD: > actual &&
+	test_cmp expect_initial actual &&
+	echo 11742 > foo &&
+	git commit -a -m "update" &&
+	git cat-file -p HEAD: > actual &&
+	test_cmp expect_update actual
+'
+
+cat >expect <<EOF
+index 51d27384..51d2738e 100644
+EOF
+
+test_expect_success 'diff does not produce ambiguous index line' '
+	git diff HEAD^..HEAD | grep index > actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] diff.c: Ensure "index $from..$to" line contains unambiguous SHA1s
  2010-05-30 13:37 [PATCH] diff.c: Ensure "index $from..$to" line contains unambiguous SHA1s Johan Herland
@ 2010-05-30 19:58 ` Johannes Sixt
  2010-05-30 23:23   ` Johan Herland
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Sixt @ 2010-05-30 19:58 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Sonntag, 30. Mai 2010, Johan Herland wrote:
> +cat >expect_initial <<EOF
> +100644 blob 51d2738463ea4ca66f8691c91e33ce64b7d41bb1	foo
> +EOF
> +
> +cat >expect_update <<EOF
> +100644 blob 51d2738efb4ad8a1e40bed839ab8e116f0a15e47	foo
> +EOF
> +
> +test_expect_success 'setup' '
> +	echo 4827 > foo &&
...
> +	echo 11742 > foo &&

How the fscking hell did you find these two simple values that are an 
almost-SHA1-collision? It's easier to hit the jackpot!?!

-- Hannes

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] diff.c: Ensure "index $from..$to" line contains unambiguous SHA1s
  2010-05-30 19:58 ` Johannes Sixt
@ 2010-05-30 23:23   ` Johan Herland
  0 siblings, 0 replies; 3+ messages in thread
From: Johan Herland @ 2010-05-30 23:23 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: Text/Plain, Size: 2071 bytes --]

On Sunday 30 May 2010, Johannes Sixt wrote:
> On Sonntag, 30. Mai 2010, Johan Herland wrote:
> > +cat >expect_initial <<EOF
> > +100644 blob 51d2738463ea4ca66f8691c91e33ce64b7d41bb1	foo
> > +EOF
> > +
> > +cat >expect_update <<EOF
> > +100644 blob 51d2738efb4ad8a1e40bed839ab8e116f0a15e47	foo
> > +EOF
> > +
> > +test_expect_success 'setup' '
> > +	echo 4827 > foo &&
> 
> ...
> 
> > +	echo 11742 > foo &&
> 
> How the fscking hell did you find these two simple values that are an
> almost-SHA1-collision? It's easier to hit the jackpot!?!

Finding matches in the intial 7 hex chars (== 28 bits) of a SHA1 isn't 
_that_ hard. In this case, I used the attached hack/program[1], which found 
the above values in <0.9 seconds, albeit you need ~1 GB free RAM to run 
it[2].

The program simply increments a 32-bit integer, and produces simple (but 
unique) blob objects (merely containing the 32-bit integer in decimal 
format) for each integer, then hashes that blob object and stores the 
original integer in a reverse dictionary (actually, a 2^28-entry array), 
keyed/indexed by the first 28 bits of the hash. Then, repeat until we find 
an integer whose blob hashes to the same location as a previous integer. 
Since we're using a 32-bit integer to generate inputs to a 2^28-entry array, 
we're bound to find collisions long before the integer overflows.

FTR, there are already several almost-collisions in real-world repos. I 
first found some in repos at my $DAYJOB, but there are also multiple cases 
in the git.git repo. The following command will list all 7-char ambiguities 
in a packfile:

  git verify-pack -v $pack.idx | cut -c1-7 | uniq -d


Have fun! :)

...Johan


[1]: Put the attached file in your git.git checkout and compile it with:

  gcc -o find_sha_dup find_sha_dup.c block-sha1/sha1.c


[2]: Double that for each additional bit you want to crack. I.e. cracking 
the first 8 hex chars (== 32 bits) would require ~16 GB free RAM. I'm sure 
there are more efficient ways of doing these things...

-- 
Johan Herland, <johan@herland.net>
www.herland.net

[-- Attachment #2: find_sha_dup.c --]
[-- Type: text/x-csrc, Size: 2042 bytes --]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#include "block-sha1/sha1.h"

/* Find SHA1 collisions within the first 7 hex chars (== 28 bits) */

#define num_entries (1 << 28) /* Need room for 2 ^ 28 entries */
uint32_t a[num_entries];

const char *str (uint32_t x, uint32_t *len)
{
	/* Generate unique, short, readable string from integer */
#define buf_len 15
	static char buf[buf_len];
	int l;
	l = snprintf(buf, buf_len, "%u\n", x);
	if (l >= buf_len || l < 0) {
		printf("FAIL! l == %u\n", l);
		exit(1);
	}
	if (len)
		*len = l;
	return buf;
}

const unsigned char *sha1 (const char *s, size_t len)
{
	static blk_SHA_CTX sha1_ctx;
	static unsigned char sha1_result[20];
	char hdr[32];
	int hdrlen;

	/* Make it look like a Git blob object */
	hdrlen = sprintf(hdr, "blob %lu", len) + 1;

	blk_SHA1_Init(&sha1_ctx);
	blk_SHA1_Update(&sha1_ctx, hdr, hdrlen);
	blk_SHA1_Update(&sha1_ctx, s, len);
	blk_SHA1_Final(sha1_result, &sha1_ctx);

	return sha1_result;
}

const unsigned char *sha1_x (uint32_t x)
{
	uint32_t len = 0;
	const char *s = str(x, &len);
	return sha1(s, len);
}

uint32_t a_index (const unsigned char *h)
{
	uint32_t ret = (h[0] << 20) | (h[1] << 12) | (h[2] << 4) | (h[3] >> 4);
	return ret;
}

char *sha1_to_hex(const unsigned char *sha1)
{
	static char buffer[41];
	static const char hex[] = "0123456789abcdef";
	int i;
	char *buf = buffer;

	for (i = 0; i < 20; i++) {
		unsigned int val = *sha1++;
		*buf++ = hex[val >> 4];
		*buf++ = hex[val & 0xf];
	}
	*buf = '\0';

	return buffer;
}

int main (void)
{
	uint32_t x = 0, i;

	memset(a, 0xff, num_entries * sizeof(uint32_t));

	while (x != 0xffffffff) {
		i = a_index(sha1_x(x));
		if (a[i] == 0xffffffff) {
			a[i] = x++;
			continue;
		}

		/* Found collision! */
		uint32_t y = a[i];
		printf("Found collision between entries %u and %u\n", y, x);
		printf("\t %u == '%s' => %s\n", y, str(y, NULL), sha1_to_hex(sha1_x(y)));
		printf("\t %u == '%s' => %s\n", x, str(x, NULL), sha1_to_hex(sha1_x(x)));
		return 1;
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-05-30 23:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-30 13:37 [PATCH] diff.c: Ensure "index $from..$to" line contains unambiguous SHA1s Johan Herland
2010-05-30 19:58 ` Johannes Sixt
2010-05-30 23:23   ` Johan Herland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.