All of lore.kernel.org
 help / color / mirror / Atom feed
* SHA1 collision in production repo?! (probably not)
@ 2017-03-31 16:05 Lars Schneider
  2017-03-31 17:27 ` Jeff King
  2017-03-31 17:35 ` Junio C Hamano
  0 siblings, 2 replies; 15+ messages in thread
From: Lars Schneider @ 2017-03-31 16:05 UTC (permalink / raw)
  To: Git List; +Cc: Jeff King

Hi,

I just got a report with the following output after a "git fetch" operation
using Git 2.11.0.windows.3 [1]:

remote: Counting objects: 5922, done.
remote: Compressing objects: 100% (14/14), done.
error: inflate: data stream error (unknown compression method)
error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
fatal: index-pack failed

I would be really surprised if we discovered a SHA1 collision in a production
repo. My guess is that this is somehow triggered by a network issue (see data
stream error). Any tips how to debug this?

Thanks,
Lars

[1] Git for Windows build based on Git v2.11.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 16:05 SHA1 collision in production repo?! (probably not) Lars Schneider
@ 2017-03-31 17:27 ` Jeff King
  2017-03-31 17:35 ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-03-31 17:27 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Git List

On Fri, Mar 31, 2017 at 06:05:17PM +0200, Lars Schneider wrote:

> I just got a report with the following output after a "git fetch" operation
> using Git 2.11.0.windows.3 [1]:
> 
> remote: Counting objects: 5922, done.
> remote: Compressing objects: 100% (14/14), done.
> error: inflate: data stream error (unknown compression method)
> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
> fatal: index-pack failed
> 
> I would be really surprised if we discovered a SHA1 collision in a production
> repo. My guess is that this is somehow triggered by a network issue (see data
> stream error). Any tips how to debug this?

I'd be surprised, too. :)

I'm not sure that inflate error actually comes from the network pack.
The "unable to unpack $sha1 header" message actually comes from
sha1_object_info_loose(). Which means we're failing to read our _local_
version of 6acd8f279a8b, which is an object we believe is coming in from
the network.

And that would explain the false-positive collision. We computed the
sha1 on something come from the network. We believe we have an object
with that sha1 already (but it's corrupted), and then when we compared
the real bytes to the corrupted bytes, they didn't match.

We should be able to confirm that by running "git fsck" on the local
repo, which I'd expect to complain about the loose object.

But what I find disturbing there is that we did not notice the failure
when accessing the loose object. It seems to have been lost in the call
chain. I think the problem is that sha1_loose_object_info() may report
errors in two ways: returning -1 if it did not find the object, or
putting OBJ_BAD into the type field if it found a corrupt object. But
callers are not aware of the second one.

I think it should probably return -1 for a corruption, too, and act as
if we don't have the object (because we effectively don't).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 16:05 SHA1 collision in production repo?! (probably not) Lars Schneider
  2017-03-31 17:27 ` Jeff King
@ 2017-03-31 17:35 ` Junio C Hamano
  2017-03-31 17:45   ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-03-31 17:35 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Git List, Jeff King

Lars Schneider <larsxschneider@gmail.com> writes:

> Hi,
>
> I just got a report with the following output after a "git fetch" operation
> using Git 2.11.0.windows.3 [1]:
>
> remote: Counting objects: 5922, done.
> remote: Compressing objects: 100% (14/14), done.
> error: inflate: data stream error (unknown compression method)
> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
> fatal: index-pack failed
>
> I would be really surprised if we discovered a SHA1 collision in a production
> repo. My guess is that this is somehow triggered by a network issue (see data
> stream error). Any tips how to debug this?

Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
to help you identify which one of identical 5 messages is firing.

My guess would be that the code saw an object that came over the
wire, hashed it to learn that its object name is
6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
already has the object with the same name, and tried to make sure
they have identical contents (otherwise, what came over the wire is
a successful second preimage attack).  But your loose object on disk
you already had was corrupt and didn't inflate correctly when
builtin/index-pack.c::compare_objects() or check_collision() tried
to.  The code saw no data, or truncated data, or
whatever---something different from what the other data that hashed
down to 6acd8..., and reported a collision when there is no
collision.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 17:35 ` Junio C Hamano
@ 2017-03-31 17:45   ` Jeff King
  2017-03-31 17:48     ` Jeff King
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jeff King @ 2017-03-31 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git List

On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote:

> Lars Schneider <larsxschneider@gmail.com> writes:
> 
> > Hi,
> >
> > I just got a report with the following output after a "git fetch" operation
> > using Git 2.11.0.windows.3 [1]:
> >
> > remote: Counting objects: 5922, done.
> > remote: Compressing objects: 100% (14/14), done.
> > error: inflate: data stream error (unknown compression method)
> > error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
> > fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
> > fatal: index-pack failed
> >
> > I would be really surprised if we discovered a SHA1 collision in a production
> > repo. My guess is that this is somehow triggered by a network issue (see data
> > stream error). Any tips how to debug this?
> 
> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
> to help you identify which one of identical 5 messages is firing.
> 
> My guess would be that the code saw an object that came over the
> wire, hashed it to learn that its object name is
> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
> already has the object with the same name, and tried to make sure
> they have identical contents (otherwise, what came over the wire is
> a successful second preimage attack).  But your loose object on disk
> you already had was corrupt and didn't inflate correctly when
> builtin/index-pack.c::compare_objects() or check_collision() tried
> to.  The code saw no data, or truncated data, or
> whatever---something different from what the other data that hashed
> down to 6acd8..., and reported a collision when there is no
> collision.

My guess is the one in compare_objects(). The "unable to unpack" error
comes from sha1_loose_object_info(). We'd normally then follow up with
read_sha1_file(), which would generate its own set of errors.

But if open_istream() got a bogus value for the object size (but didn't
correctly report an error), then it would probably quietly return 0
bytes from read_istream() later.

I suspect this may improve things, but I haven't dug deeper to see if
there are unwanted side effects, or if there are other spots that need
similar treatment.

diff --git a/sha1_file.c b/sha1_file.c
index 43990dec7..38411f90b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	if (status && oi->typep)
 		*oi->typep = status;
 	strbuf_release(&hdrbuf);
-	return 0;
+	return status;
 }
 
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

-Peff

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 17:45   ` Jeff King
@ 2017-03-31 17:48     ` Jeff King
  2017-03-31 18:19       ` Junio C Hamano
  2017-03-31 21:16     ` Junio C Hamano
  2017-09-12 16:18     ` SHA1 collision in production repo?! (probably not) Lars Schneider
  2 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-03-31 17:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git List

On Fri, Mar 31, 2017 at 01:45:15PM -0400, Jeff King wrote:

> I suspect this may improve things, but I haven't dug deeper to see if
> there are unwanted side effects, or if there are other spots that need
> similar treatment.
> 
> diff --git a/sha1_file.c b/sha1_file.c
> index 43990dec7..38411f90b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
>  	if (status && oi->typep)
>  		*oi->typep = status;
>  	strbuf_release(&hdrbuf);
> -	return 0;
> +	return status;
>  }
>  
>  int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

Er, no, that's totally wrong. "status' may be holding the type. It
should really be:

  return status < 0 ? status : 0;

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 17:48     ` Jeff King
@ 2017-03-31 18:19       ` Junio C Hamano
  2017-03-31 18:42         ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-03-31 18:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git List

Jeff King <peff@peff.net> writes:

> On Fri, Mar 31, 2017 at 01:45:15PM -0400, Jeff King wrote:
>
>> I suspect this may improve things, but I haven't dug deeper to see if
>> there are unwanted side effects, or if there are other spots that need
>> similar treatment.
>> 
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 43990dec7..38411f90b 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
>>  	if (status && oi->typep)
>>  		*oi->typep = status;
>>  	strbuf_release(&hdrbuf);
>> -	return 0;
>> +	return status;
>>  }
>>  
>>  int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
>
> Er, no, that's totally wrong. "status' may be holding the type. It
> should really be:
>
>   return status < 0 ? status : 0;

Sounds more like it.  The only caller will say "ah, that object is
not available to us---let's try packs again", which is exactly what
we want to happen.

There is another bug in the codepath: the assignment to *oi->typep
in the pre-context must guard against negative status value.  By
returning an error correctly like you do above, that bug becomes
more or less irrelevant, though.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 18:19       ` Junio C Hamano
@ 2017-03-31 18:42         ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-03-31 18:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git List

On Fri, Mar 31, 2017 at 11:19:54AM -0700, Junio C Hamano wrote:

> > Er, no, that's totally wrong. "status' may be holding the type. It
> > should really be:
> >
> >   return status < 0 ? status : 0;
> 
> Sounds more like it.  The only caller will say "ah, that object is
> not available to us---let's try packs again", which is exactly what
> we want to happen.

Right. Callers cannot distinguish between "did not have it" and
"corrupted", but that is no different than the rest of the sha1-file
interface.

> There is another bug in the codepath: the assignment to *oi->typep
> in the pre-context must guard against negative status value.  By
> returning an error correctly like you do above, that bug becomes
> more or less irrelevant, though.

I think that is intentional. OBJ_BAD is -1 (though the callers are
assuming that error() == OBJ_BAD). And that type gets relayed through
sha1_object_info() as the combined type/status return value.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 17:45   ` Jeff King
  2017-03-31 17:48     ` Jeff King
@ 2017-03-31 21:16     ` Junio C Hamano
  2017-04-01  8:03       ` Jeff King
  2017-09-12 16:18     ` SHA1 collision in production repo?! (probably not) Lars Schneider
  2 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-03-31 21:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git List

Before we forget...


-- >8 --
From: Jeff King <peff@peff.net>

When sha1_loose_object_info() finds that a loose object file cannot
be stat(2)ed or mmap(2)ed, it returns -1 to signal an error to the
caller.  However, if it found that the loose object file is corrupt
and the object data cannot be used from it, it forgets to return -1.

This can confuse the caller, who may be lead to mistakenly think
that there is a loose object and possibly gets an incorrect type and
size from the function.  The SHA-1 collision detection codepath, for
example, when it gets an object over the wire and tries to see the
data is the same as what is available as a loose object locally, can
get confused when the loose object is correupted due to this bug.

---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 71063890ff..368c89b5c3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	if (status && oi->typep)
 		*oi->typep = status;
 	strbuf_release(&hdrbuf);
-	return 0;
+	return (status < 0) ? status : 0;
 }
 
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 21:16     ` Junio C Hamano
@ 2017-04-01  8:03       ` Jeff King
  2017-04-01  8:05         ` [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects Jeff King
  2017-04-01  8:09         ` [PATCH 2/2] index-pack: detect local corruption in collision check Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2017-04-01  8:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git List

On Fri, Mar 31, 2017 at 02:16:29PM -0700, Junio C Hamano wrote:

> Before we forget...
> 
> -- >8 --
> From: Jeff King <peff@peff.net>
> 
> When sha1_loose_object_info() finds that a loose object file cannot
> be stat(2)ed or mmap(2)ed, it returns -1 to signal an error to the
> caller.  However, if it found that the loose object file is corrupt
> and the object data cannot be used from it, it forgets to return -1.
> 
> This can confuse the caller, who may be lead to mistakenly think
> that there is a loose object and possibly gets an incorrect type and
> size from the function.  The SHA-1 collision detection codepath, for
> example, when it gets an object over the wire and tries to see the
> data is the same as what is available as a loose object locally, can
> get confused when the loose object is correupted due to this bug.

Unfortunately this isn't quite right. I was able to reproduce the
problem that Lars saw, and this patch doesn't fix it.

So here's a two-patch series. The first fixes the problem described
above, along with a simpler test that demonstrates it. The second fixes
Lars's problem on top.

I know you're heading out for a week, so I'll post it now for review,
and then hold and repost when you get back.

  [1/2]: sha1_loose_object_info: return error for corrupted objects
  [2/2]: index-pack: detect local corruption in collision check

 builtin/index-pack.c         |  2 ++
 sha1_file.c                  |  2 +-
 t/t1060-object-corruption.sh | 24 ++++++++++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects
  2017-04-01  8:03       ` Jeff King
@ 2017-04-01  8:05         ` Jeff King
  2017-04-01 17:47           ` Junio C Hamano
  2017-04-01  8:09         ` [PATCH 2/2] index-pack: detect local corruption in collision check Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-04-01  8:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git List

When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller.  However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.

This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).

Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.

The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.

Signed-off-by: Jeff King <peff@peff.net>
---
So not only does it not fail, but running with --valgrind complains.

 sha1_file.c                  | 2 +-
 t/t1060-object-corruption.sh | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 71063890f..368c89b5c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	if (status && oi->typep)
 		*oi->typep = status;
 	strbuf_release(&hdrbuf);
-	return 0;
+	return (status < 0) ? status : 0;
 }
 
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 3f8705139..3a88d79c5 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -53,6 +53,13 @@ test_expect_success 'streaming a corrupt blob fails' '
 	)
 '
 
+test_expect_success 'getting type of a corrupt blob fails' '
+	(
+		cd bit-error &&
+		test_must_fail git cat-file -s HEAD:content.t
+	)
+'
+
 test_expect_success 'read-tree -u detects bit-errors in blobs' '
 	(
 		cd bit-error &&
-- 
2.12.2.943.g91cb50fd8


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] index-pack: detect local corruption in collision check
  2017-04-01  8:03       ` Jeff King
  2017-04-01  8:05         ` [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects Jeff King
@ 2017-04-01  8:09         ` Jeff King
  2017-04-01 18:04           ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-04-01  8:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git List

When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().

If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
reoprt that instead (just like we do if reading the rest of
the object content fails a few lines later).

Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).

Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.

Signed-off-by: Jeff King <peff@peff.net>
---
I was surprised to see that unpack-objects doesn't do the same collision
check. I think it relies on the loose-object precedence. But that seems
like it misses a case: if you have a packed version of an object and
receive a small fetch or push, we may unpack a duplicate object to its
loose format without doing any kind of collision check.

We may want to tighten that up. In the long run, I'd love to see
unpack-objects go away in favor of teaching index-pack how to write
loose objects. The two had very similar code once upon a time, but
index-pack has grown a lot of feature and bugfixes over the years.

 builtin/index-pack.c         |  2 ++
 t/t1060-object-corruption.sh | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 88d205f85..9f17adb37 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -809,6 +809,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 		unsigned long has_size;
 		read_lock();
 		has_type = sha1_object_info(sha1, &has_size);
+		if (has_type < 0)
+			die(_("cannot read existing object info %s"), sha1_to_hex(sha1));
 		if (has_type != type || has_size != size)
 			die(_("SHA1 COLLISION FOUND WITH %s !"), sha1_to_hex(sha1));
 		has_data = read_sha1_file(sha1, &has_type, &has_size);
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 3a88d79c5..ac1f189fd 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -21,6 +21,14 @@ test_expect_success 'setup corrupt repo' '
 		cd bit-error &&
 		test_commit content &&
 		corrupt_byte HEAD:content.t 10
+	) &&
+	git init no-bit-error &&
+	(
+		# distinct commit from bit-error, but containing a
+		# non-corrupted version of the same blob
+		cd no-bit-error &&
+		test_tick &&
+		test_commit content
 	)
 '
 
@@ -108,4 +116,13 @@ test_expect_failure 'clone --local detects misnamed objects' '
 	test_must_fail git clone --local misnamed misnamed-checkout
 '
 
+test_expect_success 'fetch into corrupted repo with index-pack' '
+	(
+		cd bit-error &&
+		test_must_fail git -c transfer.unpackLimit=1 \
+			fetch ../no-bit-error 2>stderr &&
+		test_i18ngrep ! -i collision stderr
+	)
+'
+
 test_done
-- 
2.12.2.943.g91cb50fd8

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects
  2017-04-01  8:05         ` [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects Jeff King
@ 2017-04-01 17:47           ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-04-01 17:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git List

Jeff King <peff@peff.net> writes:

> When sha1_loose_object_info() finds that a loose object file
> cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
> error to the caller.  However, if it found that the loose
> object file is corrupt and the object data cannot be used
> from it, it stuffs OBJ_BAD into "type" field of the
> object_info, but returns zero (i.e., success), which can
> confuse callers.

Thanks for a nice analysis and a fix.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] index-pack: detect local corruption in collision check
  2017-04-01  8:09         ` [PATCH 2/2] index-pack: detect local corruption in collision check Jeff King
@ 2017-04-01 18:04           ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-04-01 18:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Git List

Jeff King <peff@peff.net> writes:

> is not a collision but rather local corruption. We should
> reoprt that instead (just like we do if reading the rest of
> the object content fails a few lines later).

s/reoprt/report/ (locally amended while queuing).

> We may want to tighten that up. In the long run, I'd love to see
> unpack-objects go away in favor of teaching index-pack how to write
> loose objects. The two had very similar code once upon a time, but
> index-pack has grown a lot of feature and bugfixes over the years.

This sounds like a nice future to aspire to reach.  

Besides having to maintain two separate executables, another
downside of the current arrangement is that we have to make the
decision between streaming to a single pack and exploding into loose
objects too early and base our decision solely on the object count.
If we are moving to a single receiver, it is conceivable to switch
to a scheme based on the size of the incoming pack (e.g. spool the
first N MB and if we run out we write out loose objects, otherwise
create a new pack, and dump the spooled part and stream the rest
into it while indexing).

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-03-31 17:45   ` Jeff King
  2017-03-31 17:48     ` Jeff King
  2017-03-31 21:16     ` Junio C Hamano
@ 2017-09-12 16:18     ` Lars Schneider
  2017-09-12 17:38       ` Jeff King
  2 siblings, 1 reply; 15+ messages in thread
From: Lars Schneider @ 2017-09-12 16:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List


> On 31 Mar 2017, at 19:45, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote:
> 
>> Lars Schneider <larsxschneider@gmail.com> writes:
>> 
>>> Hi,
>>> 
>>> I just got a report with the following output after a "git fetch" operation
>>> using Git 2.11.0.windows.3 [1]:
>>> 
>>> remote: Counting objects: 5922, done.
>>> remote: Compressing objects: 100% (14/14), done.
>>> error: inflate: data stream error (unknown compression method)
>>> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
>>> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
>>> fatal: index-pack failed
>>> 
>>> I would be really surprised if we discovered a SHA1 collision in a production
>>> repo. My guess is that this is somehow triggered by a network issue (see data
>>> stream error). Any tips how to debug this?
>> 
>> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
>> to help you identify which one of identical 5 messages is firing.
>> 
>> My guess would be that the code saw an object that came over the
>> wire, hashed it to learn that its object name is
>> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
>> already has the object with the same name, and tried to make sure
>> they have identical contents (otherwise, what came over the wire is
>> a successful second preimage attack).  But your loose object on disk
>> you already had was corrupt and didn't inflate correctly when
>> builtin/index-pack.c::compare_objects() or check_collision() tried
>> to.  The code saw no data, or truncated data, or
>> whatever---something different from what the other data that hashed
>> down to 6acd8..., and reported a collision when there is no
>> collision.
> 
> My guess is the one in compare_objects(). The "unable to unpack" error
> comes from sha1_loose_object_info(). We'd normally then follow up with
> read_sha1_file(), which would generate its own set of errors.
> 
> But if open_istream() got a bogus value for the object size (but didn't
> correctly report an error), then it would probably quietly return 0
> bytes from read_istream() later.
> 
> I suspect this may improve things, but I haven't dug deeper to see if
> there are unwanted side effects, or if there are other spots that need
> similar treatment.
> 
> diff --git a/sha1_file.c b/sha1_file.c
> index 43990dec7..38411f90b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
> 	if (status && oi->typep)
> 		*oi->typep = status;
> 	strbuf_release(&hdrbuf);
> -	return 0;
> +	return status;
> }
> 
> int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

Hi Peff,

we are seeing this now in Git 2.14.1:

...
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
fatal: loose object 7b513f98a66ef9488e516e7abbc246438597c6d5 (stored in .git/objects/7b/513f98a66ef9488e516e7abbc246438597c6d5) is corrupt
fatal: The remote end hung up unexpectedly

I guess this means your fix [1] works properly :-)

At some point I will try to explore a retry mechanism for these cases.

Cheers,
Lars

[1] https://github.com/git/git/commit/93cff9a978e1c177ac3e889867004a56773301b2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SHA1 collision in production repo?! (probably not)
  2017-09-12 16:18     ` SHA1 collision in production repo?! (probably not) Lars Schneider
@ 2017-09-12 17:38       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-09-12 17:38 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, Git List

On Tue, Sep 12, 2017 at 06:18:32PM +0200, Lars Schneider wrote:

> we are seeing this now in Git 2.14.1:
> 
> ...
> error: inflate: data stream error (unknown compression method)
> error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
> error: inflate: data stream error (unknown compression method)
> error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
> fatal: loose object 7b513f98a66ef9488e516e7abbc246438597c6d5 (stored in .git/objects/7b/513f98a66ef9488e516e7abbc246438597c6d5) is corrupt
> fatal: The remote end hung up unexpectedly
> 
> I guess this means your fix [1] works properly :-)

Oh, good. :)

> At some point I will try to explore a retry mechanism for these cases.

I don't think we can generally retry loose-object failures. We use
copies from packs first, and then loose. So a corrupt loose can fallback
to another pack or to loose, but not the other way around (because we
would never look at the loose if we had a good copy elsewhere).

Though in your particular case, if I recall, you're receiving the object
over the network and the corrupted copy is in the way. So right now the
recovery process is:

  1. Notice the commit message.

  2. Run git-fsck to notice that we don't really[1] need the object.

  3. Run `rm .git/objects/7b/513f...`

  4. Re-run the fetch.

But in theory we should be able to say "oh, we don't _really_ have that,
the collision test isn't necessary" and then overwrite it. I actually
thought that's what would happen now (has_sha1_file() would return an
error), but I guess for what we need, it just does a stat() and calls it
a day, not realizing we ought to be overwriting.

-Peff

[1] git-fsck will actually complain if reflogs point to the object, and
    we can always expire those in a corrupted repo. So possibly what you
    want to know is whether it's reachable from actual refs. Of course
    this whole check is optional. If the object's corrupted, it's
    corrupted. But I get nervous calling `rm` on something that _could_
    be precious (say it's just a single-bit error that could be
    recovered). But if you have a known-good copy incoming, that's less
    of an issue.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-12 17:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-31 16:05 SHA1 collision in production repo?! (probably not) Lars Schneider
2017-03-31 17:27 ` Jeff King
2017-03-31 17:35 ` Junio C Hamano
2017-03-31 17:45   ` Jeff King
2017-03-31 17:48     ` Jeff King
2017-03-31 18:19       ` Junio C Hamano
2017-03-31 18:42         ` Jeff King
2017-03-31 21:16     ` Junio C Hamano
2017-04-01  8:03       ` Jeff King
2017-04-01  8:05         ` [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects Jeff King
2017-04-01 17:47           ` Junio C Hamano
2017-04-01  8:09         ` [PATCH 2/2] index-pack: detect local corruption in collision check Jeff King
2017-04-01 18:04           ` Junio C Hamano
2017-09-12 16:18     ` SHA1 collision in production repo?! (probably not) Lars Schneider
2017-09-12 17:38       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.