All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Add failing test: "fsck survives inflate errors"
@ 2014-07-20  4:43 Samuel Bronson
  2014-07-20 20:43 ` Samuel Bronson
  0 siblings, 1 reply; 5+ messages in thread
From: Samuel Bronson @ 2014-07-20  4:43 UTC (permalink / raw)
  To: git; +Cc: Samuel Bronson

While inflate errors are obviously NOT GOOD, and should perhaps be
fatal for most commands, git fsck is something of a special case
because it is useful to have *it* report as many corrupt objects as
possible in one run.

Signed-off-by: Samuel Bronson <naesten@gmail.com>
---
 t/t1450-fsck.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8c739c9..6dcc4b2 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,23 @@ test_expect_success 'setup: helpers for corruption tests' '
 	}
 '
 
+# git fsck should be able to detect more than one corrupt object per run
+test_expect_failure 'fsck survives inflate errors' '
+	hash1=ffffffffffffffffffffffffffffffffffffffff &&
+	hash2=fffffffffffffffffffffffffffffffffffffffe &&
+	mkdir -p .git/objects/ff &&
+	echo not-zlib >$(sha1_file $hash1) &&
+	test_when_finished "remove_object $hash1" &&
+	echo not-zlib >$(sha1_file $hash2) &&
+	test_when_finished "remove_object $hash2" &&
+
+	# Return value is not documented
+	test_might_fail git fsck 2>out &&
+	cat out && echo ====== &&
+	grep "$hash1.*corrupt" out &&
+	grep "$hash2.*corrupt" out
+'
+
 test_expect_success 'object with bad sha1' '
 	sha=$(echo blob | git hash-object -w --stdin) &&
 	old=$(echo $sha | sed "s+^..+&/+") &&
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Add failing test: "fsck survives inflate errors"
  2014-07-20  4:43 [PATCH] Add failing test: "fsck survives inflate errors" Samuel Bronson
@ 2014-07-20 20:43 ` Samuel Bronson
  2014-07-30  2:42   ` [BUG] parse_object() does not behave as documented Samuel Bronson
  2014-08-09  6:53   ` [PATCH] Add failing test: "fsck survives inflate errors" Duy Nguyen
  0 siblings, 2 replies; 5+ messages in thread
From: Samuel Bronson @ 2014-07-20 20:43 UTC (permalink / raw)
  To: git

The following message is a courtesy copy of an article
that has been posted to gmane.comp.version-control.git as well.

Oh, I forgot to provide any analysis of the problem.  Oops.

It may be just as well, though; I was tired enough that I might have
botched it in any case.  So, have an analysis:

While inflate errors are obviously NOT GOOD, and should perhaps be
instantly fatal for most commands, git fsck is something of a special
case because it is useful to have *it* report as many corrupt objects as
possible in one run.

Unfortunately, this is not currently the case, as shown by the provided
testcase.

The output for this testcase is:

,----
| checking known breakage:
|         hash1=ffffffffffffffffffffffffffffffffffffffff &&
|         hash2=fffffffffffffffffffffffffffffffffffffffe &&
|         mkdir -p .git/objects/ff &&
|         echo not-zlib >$(sha1_file $hash1) &&
|         test_when_finished "remove_object $hash1" &&
|         echo not-zlib >$(sha1_file $hash2) &&
|         test_when_finished "remove_object $hash2" &&
| 
|         # Return value is not documented
|         test_might_fail git fsck 2>out &&
|         cat out && echo ====== &&
|         grep "$hash1.*corrupt" out &&
|         grep "$hash2.*corrupt" out
| 
| error: inflate: data stream error (incorrect header check)
| error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
| error: inflate: data stream error (incorrect header check)
| fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is corrupt
| ======
| fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is corrupt
| not ok 5 - fsck survives inflate errors # TODO known breakage
`----

If I flip it from expect_failure to expect_success and run the test with
-i, then go into the trash directory and run "gdb ../../git-fsck", I can
obtain this (thoroughly rehearsed & trimmed) gdb transcript:

,----
| % gdb ../../git-fsck
| GNU gdb (Debian 7.7.1-3) 7.7.1
...
| Reading symbols from ../../git-fsck...done.
| (gdb) break error
| Breakpoint 1 at 0x813d24c: file usage.c, line 143.
| (gdb) break die
| Breakpoint 2 at 0x813d152: file usage.c, line 94.
| (gdb) run
| Starting program: /home/naesten/hacking/git/git-fsck
| [Thread debugging using libthread_db enabled]
| Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
| Checking object directories: 100% (256/256), done.
| 
| Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| 143     {
| (gdb) bt
| #0  error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| #1  0x081452ff in git_inflate (strm=0xbfffe6b8, flush=0)
|     at zlib.c:144
| #2  0x08125367 in unpack_sha1_header (stream=<optimized out>,
|     map=<optimized out>, mapsize=<optimized out>,
|     buffer=<optimized out>, bufsiz=<optimized out>)
|     at sha1_file.c:1515
| #3  0x08125546 in sha1_loose_object_info (
|     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
|     oi=oi@entry=0xbfffe788) at sha1_file.c:2528
| #4  0x08126b2d in sha1_object_info_extended (
|     sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
|     at sha1_file.c:2565
| #5  0x0812666f in sha1_object_info (
|     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
|     at sha1_file.c:2601
| #6  0x080f6941 in parse_object (
|     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
| #7  0x080758ac in fsck_sha1 (
|     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>)
|     at builtin/fsck.c:333
...
| (gdb) c
| Continuing.
| error: inflate: data stream error (incorrect header check)
| 
| Breakpoint 1, error (err=0x817c525 "unable to unpack %s header")
|     at usage.c:143
| 143     {
| (gdb) bt
| #0  error (err=0x817c525 "unable to unpack %s header") at usage.c:143
| #1  0x08125564 in sha1_loose_object_info (
|     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
|     oi=oi@entry=0xbfffe788) at sha1_file.c:2529
| #2  0x08126b2d in sha1_object_info_extended (
|     sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
|     at sha1_file.c:2565
| #3  0x0812666f in sha1_object_info (
|     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
|     at sha1_file.c:2601
| #4  0x080f6941 in parse_object (
|     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
...
| (gdb) frame 4
| #4  0x080f6941 in parse_object (
|     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
| warning: Source file is more recent than executable.
| 247                  sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
| (gdb) down
| #3  0x0812666f in sha1_object_info (
|     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
|     at sha1_file.c:2601
| 2601            if (sha1_object_info_extended(sha1, &oi, LOOKUP_REPLACE_OBJECT) < 0)
| (gdb) fin
| Run till exit from #3  0x0812666f in sha1_object_info (
|     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
|     at sha1_file.c:2601
| error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
| parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
|     at object.c:246
| 246                 (!obj && has_sha1_file(sha1) &&
| Value returned is $1 = -1
| (gdb) c
| Continuing.
| 
| Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| 143     {
| (gdb) bt
| #0  error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| #1  0x081452ff in git_inflate (strm=0xbfffe710, flush=0)
|     at zlib.c:144
| #2  0x08125367 in unpack_sha1_header (stream=<optimized out>,
|     map=<optimized out>, mapsize=<optimized out>,
|     buffer=<optimized out>, bufsiz=<optimized out>)
|     at sha1_file.c:1515
| #3  0x08125429 in unpack_sha1_file (map=map@entry=0xb7fdc000,
|     mapsize=<optimized out>, type=type@entry=0xbfffe7d8,
|     size=0xbfffe7dc, sha1=0x82659d4 '\377' <repeats 20 times>)
|     at sha1_file.c:1620
| #4  0x08126024 in read_object (
|     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
|     type=type@entry=0xbfffe7d8, size=size@entry=0xbfffe7dc)
|     at sha1_file.c:2667
| #5  0x08126c33 in read_sha1_file_extended (
|     sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
|     size=0xbfffe7dc, flag=1) at sha1_file.c:2690
| #6  0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
|     sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
| #7  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
|     at object.c:256
...
| (gdb) c
| Continuing.
| error: inflate: data stream error (incorrect header check)
| 
| Breakpoint 2, die (
|     err=0x817cdbc "loose object %s (stored in %s) is corrupt")
|     at usage.c:94
| 94      {
| (gdb) bt
| #0  die (err=0x817cdbc "loose object %s (stored in %s) is corrupt")
|     at usage.c:94
| #1  0x08126cc1 in read_sha1_file_extended (
|     sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
|     size=0xbfffe7dc, flag=1) at sha1_file.c:2705
| #2  0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
|     sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
| #3  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
|     at object.c:256
...
| (gdb) frame 3
| #3  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
|     at object.c:256
| 256             buffer = read_sha1_file(sha1, &type, &size); // <-- death
| (gdb)
`----

It's probably clearest what's happened here if I just show you this ...

>From object.c:

--8<---------------cut here---------------start------------->8---
struct object *parse_object(const unsigned char *sha1)
{
	unsigned long size;
	enum object_type type;
	int eaten;
	const unsigned char *repl = lookup_replace_object(sha1);
	void *buffer;
	struct object *obj;

	obj = lookup_object(sha1);
	if (obj && obj->parsed)
		return obj;

	if ((obj && obj->type == OBJ_BLOB) ||
	    (!obj && has_sha1_file(sha1) &&
	     sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
			error("sha1 mismatch %s", sha1_to_hex(repl));
			return NULL;
		}
		parse_blob_buffer(lookup_blob(sha1), NULL, 0);
		return lookup_object(sha1);
	}

	buffer = read_sha1_file(sha1, &type, &size); // <-- death
	if (buffer) {
		if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
			free(buffer);
			error("sha1 mismatch %s", sha1_to_hex(repl));
			return NULL;
		}

		obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
		if (!eaten)
			free(buffer);
		return obj;
	}
	return NULL;
}
--8<---------------cut here---------------end--------------->8---

(I've clearly added some marginal comments to my copy. ;-)

The first two lines of output

> error: inflate: data stream error (incorrect header check)
> error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header

come from deep within the call "sha1_object_info(sha1, NULL)".

The next two lines

> error: inflate: data stream error (incorrect header check)
> fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is corrupt

and death come from the call "read_sha1_file(sha1, &type, &size)", which
is fair because that's exactly what the documentation comment above
read_sha1_file_extended() *says* will happen:

--8<---------------cut here---------------start------------->8---
/*
 * This function dies on corrupt objects; the callers who want to
 * deal with them should arrange to call read_object() and give error
 * messages themselves.
 */
--8<---------------cut here---------------end--------------->8---

So, given that parse_object()'s documentation is:

--8<---------------cut here---------------start------------->8---
/*
 * Returns the object, having parsed it to find out what it is.
 *
 * Returns NULL if the object is missing or corrupt.
 */
--8<---------------cut here---------------end--------------->8---

it probably should not call read_sha1_file() on a corrupt object.

Options for fixing this would appear to include:

1. Saving the result of sha1_object_info(sha1, NULL) to a variable and
   returning early if the object is corrupt.  (But what happens if there
   is corruption far enough in that it isn't seen when trying to grab
   the object header?)

2. Calling read_object() and giving our own error messages.

3. Making read_sha1_file_extended only *optionally* die; since it's
   calling die() directly.

I'm also a bit nervous about this, though:

--8<---------------cut here---------------start------------->8---
static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
{
	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
--8<---------------cut here---------------end--------------->8---

Do we really want that happening while scanning the loose objects?

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [BUG] parse_object() does not behave as documented
  2014-07-20 20:43 ` Samuel Bronson
@ 2014-07-30  2:42   ` Samuel Bronson
  2014-08-09  5:43     ` Samuel Bronson
  2014-08-09  6:53   ` [PATCH] Add failing test: "fsck survives inflate errors" Duy Nguyen
  1 sibling, 1 reply; 5+ messages in thread
From: Samuel Bronson @ 2014-07-30  2:42 UTC (permalink / raw)
  To: git

[Hmm, nobody seems ot have commented on this analysis; maybe reposting
it with a subject containing [BUG] will help?]

Samuel Bronson <naesten@gmail.com> writes:

> The following message is a courtesy copy of an article
> that has been posted to gmane.comp.version-control.git as well.
>
> Oh, I forgot to provide any analysis of the problem.  Oops.
>
> It may be just as well, though; I was tired enough that I might have
> botched it in any case.  So, have an analysis:
>
> While inflate errors are obviously NOT GOOD, and should perhaps be
> instantly fatal for most commands, git fsck is something of a special
> case because it is useful to have *it* report as many corrupt objects as
> possible in one run.
>
> Unfortunately, this is not currently the case, as shown by the provided
> testcase.
>
> The output for this testcase is:
>
> ,----
> | checking known breakage:
> |         hash1=ffffffffffffffffffffffffffffffffffffffff &&
> |         hash2=fffffffffffffffffffffffffffffffffffffffe &&
> |         mkdir -p .git/objects/ff &&
> |         echo not-zlib >$(sha1_file $hash1) &&
> |         test_when_finished "remove_object $hash1" &&
> |         echo not-zlib >$(sha1_file $hash2) &&
> |         test_when_finished "remove_object $hash2" &&
> | 
> |         # Return value is not documented
> |         test_might_fail git fsck 2>out &&
> |         cat out && echo ====== &&
> |         grep "$hash1.*corrupt" out &&
> |         grep "$hash2.*corrupt" out
> | 
> | error: inflate: data stream error (incorrect header check)
> | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
> | error: inflate: data stream error (incorrect header check)
> | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
> | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
> | corrupt
> | ======
> | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
> | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
> | corrupt
> | not ok 5 - fsck survives inflate errors # TODO known breakage
> `----
>
> If I flip it from expect_failure to expect_success and run the test with
> -i, then go into the trash directory and run "gdb ../../git-fsck", I can
> obtain this (thoroughly rehearsed & trimmed) gdb transcript:
>
> ,----
> | % gdb ../../git-fsck
> | GNU gdb (Debian 7.7.1-3) 7.7.1
> ...
> | Reading symbols from ../../git-fsck...done.
> | (gdb) break error
> | Breakpoint 1 at 0x813d24c: file usage.c, line 143.
> | (gdb) break die
> | Breakpoint 2 at 0x813d152: file usage.c, line 94.
> | (gdb) run
> | Starting program: /home/naesten/hacking/git/git-fsck
> | [Thread debugging using libthread_db enabled]
> | Using host libthread_db library
> | "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
> | Checking object directories: 100% (256/256), done.
> | 
> | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
> | 143     {
> | (gdb) bt
> | #0  error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
> | #1  0x081452ff in git_inflate (strm=0xbfffe6b8, flush=0)
> |     at zlib.c:144
> | #2  0x08125367 in unpack_sha1_header (stream=<optimized out>,
> |     map=<optimized out>, mapsize=<optimized out>,
> |     buffer=<optimized out>, bufsiz=<optimized out>)
> |     at sha1_file.c:1515
> | #3  0x08125546 in sha1_loose_object_info (
> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
> |     oi=oi@entry=0xbfffe788) at sha1_file.c:2528
> | #4  0x08126b2d in sha1_object_info_extended (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
> |     at sha1_file.c:2565
> | #5  0x0812666f in sha1_object_info (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
> |     at sha1_file.c:2601
> | #6  0x080f6941 in parse_object (
> |     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
> | #7  0x080758ac in fsck_sha1 (
> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>)
> |     at builtin/fsck.c:333
> ...
> | (gdb) c
> | Continuing.
> | error: inflate: data stream error (incorrect header check)
> | 
> | Breakpoint 1, error (err=0x817c525 "unable to unpack %s header")
> |     at usage.c:143
> | 143     {
> | (gdb) bt
> | #0  error (err=0x817c525 "unable to unpack %s header") at usage.c:143
> | #1  0x08125564 in sha1_loose_object_info (
> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
> |     oi=oi@entry=0xbfffe788) at sha1_file.c:2529
> | #2  0x08126b2d in sha1_object_info_extended (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
> |     at sha1_file.c:2565
> | #3  0x0812666f in sha1_object_info (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
> |     at sha1_file.c:2601
> | #4  0x080f6941 in parse_object (
> |     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
> ...
> | (gdb) frame 4
> | #4  0x080f6941 in parse_object (
> |     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
> | warning: Source file is more recent than executable.
> | 247 sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
> | (gdb) down
> | #3  0x0812666f in sha1_object_info (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
> |     at sha1_file.c:2601
> | 2601 if (sha1_object_info_extended(sha1, &oi, LOOKUP_REPLACE_OBJECT)
> | < 0)
> | (gdb) fin
> | Run till exit from #3  0x0812666f in sha1_object_info (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
> |     at sha1_file.c:2601
> | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
> | parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
> |     at object.c:246
> | 246                 (!obj && has_sha1_file(sha1) &&
> | Value returned is $1 = -1
> | (gdb) c
> | Continuing.
> | 
> | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
> | 143     {
> | (gdb) bt
> | #0  error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
> | #1  0x081452ff in git_inflate (strm=0xbfffe710, flush=0)
> |     at zlib.c:144
> | #2  0x08125367 in unpack_sha1_header (stream=<optimized out>,
> |     map=<optimized out>, mapsize=<optimized out>,
> |     buffer=<optimized out>, bufsiz=<optimized out>)
> |     at sha1_file.c:1515
> | #3  0x08125429 in unpack_sha1_file (map=map@entry=0xb7fdc000,
> |     mapsize=<optimized out>, type=type@entry=0xbfffe7d8,
> |     size=0xbfffe7dc, sha1=0x82659d4 '\377' <repeats 20 times>)
> |     at sha1_file.c:1620
> | #4  0x08126024 in read_object (
> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
> |     type=type@entry=0xbfffe7d8, size=size@entry=0xbfffe7dc)
> |     at sha1_file.c:2667
> | #5  0x08126c33 in read_sha1_file_extended (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
> |     size=0xbfffe7dc, flag=1) at sha1_file.c:2690
> | #6  0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
> |     sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
> | #7  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
> |     at object.c:256
> ...
> | (gdb) c
> | Continuing.
> | error: inflate: data stream error (incorrect header check)
> | 
> | Breakpoint 2, die (
> |     err=0x817cdbc "loose object %s (stored in %s) is corrupt")
> |     at usage.c:94
> | 94      {
> | (gdb) bt
> | #0  die (err=0x817cdbc "loose object %s (stored in %s) is corrupt")
> |     at usage.c:94
> | #1  0x08126cc1 in read_sha1_file_extended (
> |     sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
> |     size=0xbfffe7dc, flag=1) at sha1_file.c:2705
> | #2  0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
> |     sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
> | #3  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
> |     at object.c:256
> ...
> | (gdb) frame 3
> | #3  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
> |     at object.c:256
> | 256             buffer = read_sha1_file(sha1, &type, &size); // <-- death
> | (gdb)
> `----
>
> It's probably clearest what's happened here if I just show you this ...
>
> From object.c:
>
> struct object *parse_object(const unsigned char *sha1)
> {
> 	unsigned long size;
> 	enum object_type type;
> 	int eaten;
> 	const unsigned char *repl = lookup_replace_object(sha1);
> 	void *buffer;
> 	struct object *obj;
>
> 	obj = lookup_object(sha1);
> 	if (obj && obj->parsed)
> 		return obj;
>
> 	if ((obj && obj->type == OBJ_BLOB) ||
> 	    (!obj && has_sha1_file(sha1) &&
> 	     sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
> 		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
> 			error("sha1 mismatch %s", sha1_to_hex(repl));
> 			return NULL;
> 		}
> 		parse_blob_buffer(lookup_blob(sha1), NULL, 0);
> 		return lookup_object(sha1);
> 	}
>
> 	buffer = read_sha1_file(sha1, &type, &size); // <-- death
> 	if (buffer) {
> 		if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
> 			free(buffer);
> 			error("sha1 mismatch %s", sha1_to_hex(repl));
> 			return NULL;
> 		}
>
> 		obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
> 		if (!eaten)
> 			free(buffer);
> 		return obj;
> 	}
> 	return NULL;
> }
>
> (I've clearly added some marginal comments to my copy. ;-)
>
> The first two lines of output
>
>> error: inflate: data stream error (incorrect header check)
>> error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>
> come from deep within the call "sha1_object_info(sha1, NULL)".
>
> The next two lines
>
>> error: inflate: data stream error (incorrect header check)
>> fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>> in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>> corrupt
>
> and death come from the call "read_sha1_file(sha1, &type, &size)", which
> is fair because that's exactly what the documentation comment above
> read_sha1_file_extended() *says* will happen:
>
> /*
>  * This function dies on corrupt objects; the callers who want to
>  * deal with them should arrange to call read_object() and give error
>  * messages themselves.
>  */
>
> So, given that parse_object()'s documentation is:
>
> /*
>  * Returns the object, having parsed it to find out what it is.
>  *
>  * Returns NULL if the object is missing or corrupt.
>  */
>
> it probably should not call read_sha1_file() on a corrupt object.
>
> Options for fixing this would appear to include:
>
> 1. Saving the result of sha1_object_info(sha1, NULL) to a variable and
>    returning early if the object is corrupt.  (But what happens if there
>    is corruption far enough in that it isn't seen when trying to grab
>    the object header?)
>
> 2. Calling read_object() and giving our own error messages.
>
> 3. Making read_sha1_file_extended only *optionally* die; since it's
>    calling die() directly.
>
> I'm also a bit nervous about this, though:
>
> static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
> {
> 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
> }
>
> Do we really want that happening while scanning the loose objects?

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] parse_object() does not behave as documented
  2014-07-30  2:42   ` [BUG] parse_object() does not behave as documented Samuel Bronson
@ 2014-08-09  5:43     ` Samuel Bronson
  0 siblings, 0 replies; 5+ messages in thread
From: Samuel Bronson @ 2014-08-09  5:43 UTC (permalink / raw)
  To: git

Ping?

Samuel Bronson <naesten@gmail.com> writes:

> [Hmm, nobody seems ot have commented on this analysis; maybe reposting
> it with a subject containing [BUG] will help?]
>
> Samuel Bronson <naesten@gmail.com> writes:
>
>> The following message is a courtesy copy of an article
>> that has been posted to gmane.comp.version-control.git as well.
>>
>> Oh, I forgot to provide any analysis of the problem.  Oops.
>>
>> It may be just as well, though; I was tired enough that I might have
>> botched it in any case.  So, have an analysis:
>>
>> While inflate errors are obviously NOT GOOD, and should perhaps be
>> instantly fatal for most commands, git fsck is something of a special
>> case because it is useful to have *it* report as many corrupt objects as
>> possible in one run.
>>
>> Unfortunately, this is not currently the case, as shown by the provided
>> testcase.
>>
>> The output for this testcase is:
>>
>> ,----
>> | checking known breakage:
>> |         hash1=ffffffffffffffffffffffffffffffffffffffff &&
>> |         hash2=fffffffffffffffffffffffffffffffffffffffe &&
>> |         mkdir -p .git/objects/ff &&
>> |         echo not-zlib >$(sha1_file $hash1) &&
>> |         test_when_finished "remove_object $hash1" &&
>> |         echo not-zlib >$(sha1_file $hash2) &&
>> |         test_when_finished "remove_object $hash2" &&
>> | 
>> |         # Return value is not documented
>> |         test_might_fail git fsck 2>out &&
>> |         cat out && echo ====== &&
>> |         grep "$hash1.*corrupt" out &&
>> |         grep "$hash2.*corrupt" out
>> | 
>> | error: inflate: data stream error (incorrect header check)
>> | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>> | error: inflate: data stream error (incorrect header check)
>> | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>> | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>> | corrupt
>> | ======
>> | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>> | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>> | corrupt
>> | not ok 5 - fsck survives inflate errors # TODO known breakage
>> `----
>>
>> If I flip it from expect_failure to expect_success and run the test with
>> -i, then go into the trash directory and run "gdb ../../git-fsck", I can
>> obtain this (thoroughly rehearsed & trimmed) gdb transcript:
>>
>> ,----
>> | % gdb ../../git-fsck
>> | GNU gdb (Debian 7.7.1-3) 7.7.1
>> ...
>> | Reading symbols from ../../git-fsck...done.
>> | (gdb) break error
>> | Breakpoint 1 at 0x813d24c: file usage.c, line 143.
>> | (gdb) break die
>> | Breakpoint 2 at 0x813d152: file usage.c, line 94.
>> | (gdb) run
>> | Starting program: /home/naesten/hacking/git/git-fsck
>> | [Thread debugging using libthread_db enabled]
>> | Using host libthread_db library
>> | "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
>> | Checking object directories: 100% (256/256), done.
>> | 
>> | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | 143     {
>> | (gdb) bt
>> | #0  error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | #1  0x081452ff in git_inflate (strm=0xbfffe6b8, flush=0)
>> |     at zlib.c:144
>> | #2  0x08125367 in unpack_sha1_header (stream=<optimized out>,
>> |     map=<optimized out>, mapsize=<optimized out>,
>> |     buffer=<optimized out>, bufsiz=<optimized out>)
>> |     at sha1_file.c:1515
>> | #3  0x08125546 in sha1_loose_object_info (
>> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
>> |     oi=oi@entry=0xbfffe788) at sha1_file.c:2528
>> | #4  0x08126b2d in sha1_object_info_extended (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
>> |     at sha1_file.c:2565
>> | #5  0x0812666f in sha1_object_info (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> |     at sha1_file.c:2601
>> | #6  0x080f6941 in parse_object (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
>> | #7  0x080758ac in fsck_sha1 (
>> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>)
>> |     at builtin/fsck.c:333
>> ...
>> | (gdb) c
>> | Continuing.
>> | error: inflate: data stream error (incorrect header check)
>> | 
>> | Breakpoint 1, error (err=0x817c525 "unable to unpack %s header")
>> |     at usage.c:143
>> | 143     {
>> | (gdb) bt
>> | #0  error (err=0x817c525 "unable to unpack %s header") at usage.c:143
>> | #1  0x08125564 in sha1_loose_object_info (
>> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
>> |     oi=oi@entry=0xbfffe788) at sha1_file.c:2529
>> | #2  0x08126b2d in sha1_object_info_extended (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
>> |     at sha1_file.c:2565
>> | #3  0x0812666f in sha1_object_info (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> |     at sha1_file.c:2601
>> | #4  0x080f6941 in parse_object (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
>> ...
>> | (gdb) frame 4
>> | #4  0x080f6941 in parse_object (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
>> | warning: Source file is more recent than executable.
>> | 247 sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
>> | (gdb) down
>> | #3  0x0812666f in sha1_object_info (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> |     at sha1_file.c:2601
>> | 2601 if (sha1_object_info_extended(sha1, &oi, LOOKUP_REPLACE_OBJECT)
>> | < 0)
>> | (gdb) fin
>> | Run till exit from #3  0x0812666f in sha1_object_info (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> |     at sha1_file.c:2601
>> | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>> | parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> |     at object.c:246
>> | 246                 (!obj && has_sha1_file(sha1) &&
>> | Value returned is $1 = -1
>> | (gdb) c
>> | Continuing.
>> | 
>> | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | 143     {
>> | (gdb) bt
>> | #0  error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | #1  0x081452ff in git_inflate (strm=0xbfffe710, flush=0)
>> |     at zlib.c:144
>> | #2  0x08125367 in unpack_sha1_header (stream=<optimized out>,
>> |     map=<optimized out>, mapsize=<optimized out>,
>> |     buffer=<optimized out>, bufsiz=<optimized out>)
>> |     at sha1_file.c:1515
>> | #3  0x08125429 in unpack_sha1_file (map=map@entry=0xb7fdc000,
>> |     mapsize=<optimized out>, type=type@entry=0xbfffe7d8,
>> |     size=0xbfffe7dc, sha1=0x82659d4 '\377' <repeats 20 times>)
>> |     at sha1_file.c:1620
>> | #4  0x08126024 in read_object (
>> |     sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
>> |     type=type@entry=0xbfffe7d8, size=size@entry=0xbfffe7dc)
>> |     at sha1_file.c:2667
>> | #5  0x08126c33 in read_sha1_file_extended (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
>> |     size=0xbfffe7dc, flag=1) at sha1_file.c:2690
>> | #6  0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
>> |     sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
>> | #7  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> |     at object.c:256
>> ...
>> | (gdb) c
>> | Continuing.
>> | error: inflate: data stream error (incorrect header check)
>> | 
>> | Breakpoint 2, die (
>> |     err=0x817cdbc "loose object %s (stored in %s) is corrupt")
>> |     at usage.c:94
>> | 94      {
>> | (gdb) bt
>> | #0  die (err=0x817cdbc "loose object %s (stored in %s) is corrupt")
>> |     at usage.c:94
>> | #1  0x08126cc1 in read_sha1_file_extended (
>> |     sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
>> |     size=0xbfffe7dc, flag=1) at sha1_file.c:2705
>> | #2  0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
>> |     sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
>> | #3  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> |     at object.c:256
>> ...
>> | (gdb) frame 3
>> | #3  parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> |     at object.c:256
>> | 256             buffer = read_sha1_file(sha1, &type, &size); // <-- death
>> | (gdb)
>> `----
>>
>> It's probably clearest what's happened here if I just show you this ...
>>
>> From object.c:
>>
>> struct object *parse_object(const unsigned char *sha1)
>> {
>> 	unsigned long size;
>> 	enum object_type type;
>> 	int eaten;
>> 	const unsigned char *repl = lookup_replace_object(sha1);
>> 	void *buffer;
>> 	struct object *obj;
>>
>> 	obj = lookup_object(sha1);
>> 	if (obj && obj->parsed)
>> 		return obj;
>>
>> 	if ((obj && obj->type == OBJ_BLOB) ||
>> 	    (!obj && has_sha1_file(sha1) &&
>> 	     sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
>> 		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
>> 			error("sha1 mismatch %s", sha1_to_hex(repl));
>> 			return NULL;
>> 		}
>> 		parse_blob_buffer(lookup_blob(sha1), NULL, 0);
>> 		return lookup_object(sha1);
>> 	}
>>
>> 	buffer = read_sha1_file(sha1, &type, &size); // <-- death
>> 	if (buffer) {
>> 		if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
>> 			free(buffer);
>> 			error("sha1 mismatch %s", sha1_to_hex(repl));
>> 			return NULL;
>> 		}
>>
>> 		obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
>> 		if (!eaten)
>> 			free(buffer);
>> 		return obj;
>> 	}
>> 	return NULL;
>> }
>>
>> (I've clearly added some marginal comments to my copy. ;-)
>>
>> The first two lines of output
>>
>>> error: inflate: data stream error (incorrect header check)
>>> error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>>
>> come from deep within the call "sha1_object_info(sha1, NULL)".
>>
>> The next two lines
>>
>>> error: inflate: data stream error (incorrect header check)
>>> fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>>> in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>>> corrupt
>>
>> and death come from the call "read_sha1_file(sha1, &type, &size)", which
>> is fair because that's exactly what the documentation comment above
>> read_sha1_file_extended() *says* will happen:
>>
>> /*
>>  * This function dies on corrupt objects; the callers who want to
>>  * deal with them should arrange to call read_object() and give error
>>  * messages themselves.
>>  */
>>
>> So, given that parse_object()'s documentation is:
>>
>> /*
>>  * Returns the object, having parsed it to find out what it is.
>>  *
>>  * Returns NULL if the object is missing or corrupt.
>>  */
>>
>> it probably should not call read_sha1_file() on a corrupt object.
>>
>> Options for fixing this would appear to include:
>>
>> 1. Saving the result of sha1_object_info(sha1, NULL) to a variable and
>>    returning early if the object is corrupt.  (But what happens if there
>>    is corruption far enough in that it isn't seen when trying to grab
>>    the object header?)
>>
>> 2. Calling read_object() and giving our own error messages.
>>
>> 3. Making read_sha1_file_extended only *optionally* die; since it's
>>    calling die() directly.
>>
>> I'm also a bit nervous about this, though:
>>
>> static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
>> {
>> 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
>> }
>>
>> Do we really want that happening while scanning the loose objects?

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Add failing test: "fsck survives inflate errors"
  2014-07-20 20:43 ` Samuel Bronson
  2014-07-30  2:42   ` [BUG] parse_object() does not behave as documented Samuel Bronson
@ 2014-08-09  6:53   ` Duy Nguyen
  1 sibling, 0 replies; 5+ messages in thread
From: Duy Nguyen @ 2014-08-09  6:53 UTC (permalink / raw)
  To: Samuel Bronson; +Cc: Git Mailing List

On Mon, Jul 21, 2014 at 3:43 AM, Samuel Bronson <naesten@gmail.com> wrote:
> So, given that parse_object()'s documentation is:
>
> --8<---------------cut here---------------start------------->8---
> /*
>  * Returns the object, having parsed it to find out what it is.
>  *
>  * Returns NULL if the object is missing or corrupt.
>  */
> --8<---------------cut here---------------end--------------->8---
>
> it probably should not call read_sha1_file() on a corrupt object.
>
> Options for fixing this would appear to include:
>
> 1. Saving the result of sha1_object_info(sha1, NULL) to a variable and
>    returning early if the object is corrupt.  (But what happens if there
>    is corruption far enough in that it isn't seen when trying to grab
>    the object header?)
>
> 2. Calling read_object() and giving our own error messages.
>
> 3. Making read_sha1_file_extended only *optionally* die; since it's
>    calling die() directly.

We've been using die() quite freely (or at least used to) and there
are many more cases that can trigger die() and parse_object() can do
nothing about it. Adding a "gentle" flag to read_sha1_file_extended
and pass it further down could be the first step. Patches welcome.
-- 
Duy

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-08-09  6:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-20  4:43 [PATCH] Add failing test: "fsck survives inflate errors" Samuel Bronson
2014-07-20 20:43 ` Samuel Bronson
2014-07-30  2:42   ` [BUG] parse_object() does not behave as documented Samuel Bronson
2014-08-09  5:43     ` Samuel Bronson
2014-08-09  6:53   ` [PATCH] Add failing test: "fsck survives inflate errors" Duy Nguyen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.