All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] resolved deltas
@ 2014-08-21 11:35 Petr Stodulka
  2014-08-21 18:25 ` Petr Stodulka
  2014-08-22 19:41 ` Martin von Gagern
  0 siblings, 2 replies; 30+ messages in thread
From: Petr Stodulka @ 2014-08-21 11:35 UTC (permalink / raw)
  To: git

Hi guys,
I wanted post you patch here for this bug, but I can't find primary 
source of this problem [0], because I don't understand some ideas in the 
code. So what I investigate:

Bug is reprodusible since git version 1.8.3.1 (may earlier 1.8.xx, but I 
don't test it) to actual upstream version.
This problem "doesn't exists" in version 1.7.xx - or more precisely is 
not reproducible. "May" this is reproducible
since commit "7218a215" - in this commit was added assert in file 
"builtin/index-pack.c" (actual line is 918), but I didn't test this.

This assert tests if object's (child) real type == OBJ_REF_DELTA. This 
will failure for object with real_type == OBJ_TREE (set as parent's 
type) and type == OBJ_REF_DELTA. Here some prints of important variables 
before failure assert() (from older version but I think that values are 
still actual in this case):
------------------------------
(gdb) p base->ref_first
$9 = 3223

(gdb) p deltas[3223]
$10 = {
   base = {
     sha1 = "\274\070k\343K\324x\037q\273h\327*n\n\356\061$ \036",
     offset = 2267795834784135356
   },
   obj_no = 11152
}

(gdb) p *child
$11 = {
   idx = {
     sha1 = "J\242i\251\261\273\305\067\236%CE\022\257\252\342[;\tD",
     crc32 = 2811659028,
     offset = 10392153
   },
   size = 30,
   hdr_size = 22,
   type = OBJ_REF_DELTA,
   real_type = OBJ_TREE,
   delta_depth = 0,
   base_object_no = 5458
}

(gdb) p objects[5458]
$13 = {
   idx = {
     sha1 = "\274\070k\343K\324x\037q\273h\327*n\n\356\061$ \036",
     crc32 = 3724458534,
     offset = 6879168
   },
   size = 236,
   hdr_size = 2,
   type = OBJ_TREE,
   real_type = OBJ_TREE,
   delta_depth = 0,
   base_object_no = 0
}
---------------------------------------

"base_object_no" is static 5458. "base->ref_first" child's object are 
dynamic. If you want stop process in same position my recommendation for 
gdb (if you use gdb) when you will be in file index-pack.c:
br 1093
cont
set variable nr_threads = 1
br 1111
cond 2 i == 6300
cont
br 916
cont
-----------
compilated without any optimisation, line numbers modified for commit 
"6c4ab27f2378ce67940b4496365043119d7ffff2"
Condition i == 6300 ---> last iteration before failure has dynamic rank 
in range 6304 to 6309 in the most cases (that's weird for me, when count 
of downloaded objects is statically 12806, may wrong search of children?)
-----------------------------------

Here I am lost. I don't know really what I can do next here, because I 
don't understand some ideas in code. e.g. searching of child - functions 
find_delta(), find_delta_children(). Calculation on line 618:
----
     int next = (first+last) / 2;
----
I still don't understand. I didn't find description of this searching 
algorithm in tech. documentation but I didn't read all yet. However I 
think that source of problems could be somewhere in these two functions. 
When child is found, its real_type is set to parent's type in function 
resolve_delta() on the line 865 and then lasts wait for failure. I don't 
think that problem is in repository itself [1], but it is possible.

Any next ideas/hints or explanation of these functions? I began study 
source code and mechanisms of the git this week, so don't beat me yet 
please :-)

Regards,
Petr

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919
[1] git clone https://code.google.com/p/mapsforge/ mapsforge.git

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-21 11:35 [BUG] resolved deltas Petr Stodulka
@ 2014-08-21 18:25 ` Petr Stodulka
  2014-08-22 19:41 ` Martin von Gagern
  1 sibling, 0 replies; 30+ messages in thread
From: Petr Stodulka @ 2014-08-21 18:25 UTC (permalink / raw)
  To: git


> <snip>
> Bug is reprodusible since git version 1.8.3.1 (may earlier 1.8.xx, but 
> I don't test it) to actual upstream version.
> This problem "doesn't exists" in version 1.7.xx - or more precisely is 
> not reproducible. "May" this is reproducible
> since commit "7218a215" - in this commit was added assert in file 
> "builtin/index-pack.c" (actual line is 918), but I didn't test this.
Ok so this is reproducible since this commit because of assert().

> Here I am lost. I don't know really what I can do next here, because I 
> don't understand some ideas in code. e.g. searching of child - 
> functions find_delta(), find_delta_children(). Calculation on line 618:
> ----
>     int next = (first+last) / 2;
> ----
> I still don't understand. I didn't find description of this searching 
> algorithm in tech. documentation but I didn't read all yet. However I 
> think that source of problems could be somewhere in these two 
> functions. When child is found, its real_type is set to parent's type 
> in function resolve_delta() on the line 865 and then lasts wait for 
> failure. I don't think that problem is in repository itself [1], but 
> it is possible.
I read history of commits and my idea seems to be incorrect. It seems 
more like some error in repository itself. But I'd rather get opinion 
from someone who knows this code and ideas better.

Regards,
Petr

> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919
> [1] git clone https://code.google.com/p/mapsforge/ mapsforge.git

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-21 11:35 [BUG] resolved deltas Petr Stodulka
  2014-08-21 18:25 ` Petr Stodulka
@ 2014-08-22 19:41 ` Martin von Gagern
  2014-08-23 10:12   ` René Scharfe
  1 sibling, 1 reply; 30+ messages in thread
From: Martin von Gagern @ 2014-08-22 19:41 UTC (permalink / raw)
  To: git

On 21.08.2014 13:35, Petr Stodulka wrote:
> Hi guys,
> I wanted post you patch here for this bug, but I can't find primary
> source of this problem [0], because I don't understand some ideas in the
> code.
>
> […]
> 
> Any next ideas/hints or explanation of these functions? I began study
> source code and mechanisms of the git this week, so don't beat me yet
> please :-)
> 
> Regards,
> Petr
> 
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919

Some pointers to related reports and investigations:

https://groups.google.com/forum/#!topic/mapsforge-dev/IF6mgmwvZmY
https://groups.google.com/forum/#!topic/mapsforge-dev/f2KvFALlkvo
https://code.google.com/p/support/issues/detail?id=31571
https://groups.google.com/forum/#!topic/mapsforge-dev/nomzr5dkkqc
http://thread.gmane.org/gmane.comp.version-control.git/254626

The last is my own bug report to this mailing list here, which
unfortunately received no reaction yet. In that report, I can confirm
that the commit introducing the assertion is the same commit which
causes things to fail:
https://github.com/git/git/commit/7218a215efc7ae46f7ca8d82442f354e

In this https://code.google.com/p/mapsforge/ repository, resolve_delta
gets called twice for some delta. The first time, type and real_type are
identical, but by the second pass in find_unresolved_deltas the real
type will have chaned to OBJ_TREE. This caused the old code to simply
scip the object, but the new code aborts instead.

So far my understanding. I'm not sure whether this kind of duplicate
resolution is something normal or indicates some breakage in the
repository in question. But aborting seems a bad solution in either case.

Greetings,
 Martin von Gagern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-22 19:41 ` Martin von Gagern
@ 2014-08-23 10:12   ` René Scharfe
  2014-08-23 10:56     ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: René Scharfe @ 2014-08-23 10:12 UTC (permalink / raw)
  To: Martin von Gagern, git; +Cc: Junio C Hamano, Jeff King

Am 22.08.2014 um 21:41 schrieb Martin von Gagern:
> On 21.08.2014 13:35, Petr Stodulka wrote:
>> Hi guys,
>> I wanted post you patch here for this bug, but I can't find primary
>> source of this problem [0], because I don't understand some ideas in the
>> code.
>>
>> […]
>>
>> Any next ideas/hints or explanation of these functions? I began study
>> source code and mechanisms of the git this week, so don't beat me yet
>> please :-)
>>
>> Regards,
>> Petr
>>
>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919
> 
> Some pointers to related reports and investigations:
> 
> https://groups.google.com/forum/#!topic/mapsforge-dev/IF6mgmwvZmY
> https://groups.google.com/forum/#!topic/mapsforge-dev/f2KvFALlkvo
> https://code.google.com/p/support/issues/detail?id=31571
> https://groups.google.com/forum/#!topic/mapsforge-dev/nomzr5dkkqc
> http://thread.gmane.org/gmane.comp.version-control.git/254626
> 
> The last is my own bug report to this mailing list here, which
> unfortunately received no reaction yet. In that report, I can confirm
> that the commit introducing the assertion is the same commit which
> causes things to fail:
> https://github.com/git/git/commit/7218a215efc7ae46f7ca8d82442f354e
> 
> In this https://code.google.com/p/mapsforge/ repository, resolve_delta
> gets called twice for some delta. The first time, type and real_type are
> identical, but by the second pass in find_unresolved_deltas the real
> type will have chaned to OBJ_TREE. This caused the old code to simply
> scip the object, but the new code aborts instead.
> 
> So far my understanding. I'm not sure whether this kind of duplicate
> resolution is something normal or indicates some breakage in the
> repository in question. But aborting seems a bad solution in either case.

Git 1.7.6 clones the repo without reporting an error or warning, but a
check shows that a tree object is missing:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into mapsforge...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.48 MiB/s, done.
  Resolving deltas: 100% (4349/4349), done.
  $ cd mapsforge/
  $ git fsck
  broken link from    tree eb95fb0268c43f512e2ce512e6625072acafbdb5
                to    tree a0155d8d5bb58afb9a5d20459be023899c9a1cef
  missing tree a0155d8d5bb58afb9a5d20459be023899c9a1cef
  dangling tree b6f32087526f43205bf8b5e6539936da787ecb1a

Git 2.1.0 hits an assertion:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.53 MiB/s, done.
  git: builtin/index-pack.c:918: find_unresolved_deltas_1: Assertion `child->real_type == OBJ_REF_DELTA' failed.
  error: index-pack died of signal 6
  fatal: index-pack failed

The patch below, which makes index-pack ignore objects with unexpected
real_type as before, changes the shown error message, but clone still
fails:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.43 MiB/s, done.
  Resolving deltas: 100% (4348/4348), done.
  fatal: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef
  fatal: index-pack failed

Turning that fatal error into a warning get us a bit further:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.38 MiB/s, done.
  Resolving deltas: 100% (4350/4350), done.
  warning: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef
  fatal: The same object bc386be34bd4781f71bb68d72a6e0aee3124201e appears twice in the pack
  fatal: index-pack failed

Disabling strict checking (WRITE_IDX_STRICT) as well lets clone
succeed, but a check shows that a tree is missing, as wiht git 1.7.6:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.37 MiB/s, done.
  Resolving deltas: 100% (4349/4349), done.
  warning: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef
  Checking connectivity... done.
  $ cd mapsforge/
  $ git fsck
  Checking object directories: 100% (256/256), done.
  Checking objects: 100% (12879/12879), done.
  broken link from    tree eb95fb0268c43f512e2ce512e6625072acafbdb5
                to    tree a0155d8d5bb58afb9a5d20459be023899c9a1cef
  missing tree a0155d8d5bb58afb9a5d20459be023899c9a1cef

Cloning the repo with Egit works fine, but git fsck shows the same
results as above.

https://github.com/applantation/mapsforge has that missing tree
object, by the way.

OK, what now?  It's better to show an error message instead of
failing an assertion if a repo is found to be corrupt because the
issue is caused by external input.  I don't know if the patch
below may have any bad side effects, though, so no sign-off at
this time.

Allowing git to clone a broken repo sounds useful, up to point.
In this particular case older versions had no problem doing that,
so it seems worth supporting.

And how did the tree object went missing in the first place?

René
---
 builtin/index-pack.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5568a5b..f7dc5b0 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -913,28 +913,33 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 	if (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + deltas[base->ref_first].obj_no;
-		struct base_data *result = alloc_base_data();
 
-		assert(child->real_type == OBJ_REF_DELTA);
-		resolve_delta(child, base, result);
-		if (base->ref_first == base->ref_last && base->ofs_last == -1)
-			free_base_data(base);
+		if (child->real_type == OBJ_REF_DELTA) {
+			struct base_data *result = alloc_base_data();
 
-		base->ref_first++;
-		return result;
+			resolve_delta(child, base, result);
+			if (base->ref_first == base->ref_last &&
+			    base->ofs_last == -1)
+				free_base_data(base);
+
+			base->ref_first++;
+			return result;
+		}
 	}
 
 	if (base->ofs_first <= base->ofs_last) {
 		struct object_entry *child = objects + deltas[base->ofs_first].obj_no;
-		struct base_data *result = alloc_base_data();
 
-		assert(child->real_type == OBJ_OFS_DELTA);
-		resolve_delta(child, base, result);
-		if (base->ofs_first == base->ofs_last)
-			free_base_data(base);
+		if (child->real_type == OBJ_OFS_DELTA) {
+			struct base_data *result = alloc_base_data();
 
-		base->ofs_first++;
-		return result;
+			resolve_delta(child, base, result);
+			if (base->ofs_first == base->ofs_last)
+				free_base_data(base);
+
+			base->ofs_first++;
+			return result;
+		}
 	}
 
 	unlink_base_data(base);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-23 10:12   ` René Scharfe
@ 2014-08-23 10:56     ` Jeff King
  2014-08-23 11:04       ` Jeff King
  2014-08-25 17:19       ` [BUG] resolved deltas Shawn Pearce
  0 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2014-08-23 10:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

[+cc spearce, as I think this is a bug in code.google.com's sending
 side, and he can probably get the attention of the right folks]

On Sat, Aug 23, 2014 at 12:12:08PM +0200, René Scharfe wrote:

> Git 1.7.6 clones the repo without reporting an error or warning, but a
> check shows that a tree object is missing:

Thanks for digging on this. I happened to be looking at it at the exact
same time, so now I can stop. :)

> The patch below, which makes index-pack ignore objects with unexpected
> real_type as before, changes the shown error message, but clone still
> fails:
> 
>   $ git clone https://code.google.com/p/mapsforge/
>   Cloning into 'mapsforge'...
>   remote: Counting objects: 12879, done.
>   Receiving objects: 100% (12879/12879), 10.19 MiB | 2.43 MiB/s, done.
>   Resolving deltas: 100% (4348/4348), done.
>   fatal: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef
>   fatal: index-pack failed

This makes sense. Early versions of index-pack didn't know how to check
for missing objects, but now it does. However, we only trigger that code
if we are using --strict, or if we are cloning (in which case we pass
the --check-self-contained-and-connected option).

If you are doing a regular fetch, we rely on check_everything_connected
after the pack is received. So (with your patch):

  $ git init
  $ git fetch https://code.google.com/p/mapsforge/
  remote: Counting objects: 12298, done.
  Receiving objects: 100% (12298/12298), 9.24 MiB | 959.00 KiB/s, done.
  Resolving deltas: 100% (4107/4107), done.
  error: Could not read a0155d8d5bb58afb9a5d20459be023899c9a1cef
  fatal: bad tree object a0155d8d5bb58afb9a5d20459be023899c9a1cef
  error: https://code.google.com/p/mapsforge/ did not send all necessary objects

That all makes sense.

> Turning that fatal error into a warning get us a bit further:
> 
>   $ git clone https://code.google.com/p/mapsforge/
>   Cloning into 'mapsforge'...
>   remote: Counting objects: 12879, done.
>   Receiving objects: 100% (12879/12879), 10.19 MiB | 2.38 MiB/s, done.
>   Resolving deltas: 100% (4350/4350), done.
>   warning: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef
>   fatal: The same object bc386be34bd4781f71bb68d72a6e0aee3124201e appears twice in the pack
>   fatal: index-pack failed
> 
> Disabling strict checking (WRITE_IDX_STRICT) as well lets clone
> succeed, but a check shows that a tree is missing, as wiht git 1.7.6:

Interesting that one object is duplicated and another object is missing.
The pack doesn't contain the sha1s of the objects; we compute them on
the fly in index-pack. So it's likely that the real problem is that one
entry in the pack either has the wrong delta data, or mentions the wrong
base object. And does it in such a way that we generate the another
object that does exist (so the packfile data isn't garbled or corrupted;
it's a bug on the sending side that uses the wrong data).

> https://github.com/applantation/mapsforge has that missing tree
> object, by the way.

Unsurprisingly, it's a tree object quite similar to the duplicated one.

> OK, what now?  It's better to show an error message instead of
> failing an assertion if a repo is found to be corrupt because the
> issue is caused by external input.  I don't know if the patch
> below may have any bad side effects, though, so no sign-off at
> this time.

Definitely. Your patch looks like an improvement. The objects in the
delta list must have had their real_types set to REF_DELTA and OFS_DELTA
at some point (since that is how they got on the list). And there is no
way for the type field to change from a delta type to anything else
_except_ by us resolving the delta. So I think the condition triggering
that assert cannot mean anything except that we have a duplicate object
(and it is not actually the delta object that is duplicated, but rather
its base, as seeing it again is what causes us to try to resolve twice).

So we know at this point that we have a duplicate object, and we could
warn or say something. But I think we probably would not want to. If
--strict is in effect, then we will notice and complain later. And if it
is not, then we should allow it without comment (in this case the repo
is broken, but it would not _have_ to be).

So I think your patch is doing the right thing.

> Allowing git to clone a broken repo sounds useful, up to point.
> In this particular case older versions had no problem doing that,
> so it seems worth supporting.

I think with your patch we are fine. Without --strict (which is implied
on a clone), you can still "git init" and "git fetch" the broken pack
(fetch will complain, but it leaves the pack in place).

We may want to adjust the fact that --check-self-contained-and-connected
implies strict (it builds on the fsck bits of index-pack, but it does
not have to imply a full fsck, nor strict index writing).

> And how did the tree object went missing in the first place?

My guess is a bug on the sending side. We have seen duplicate pack
objects before, but never (AFAIK) combined with a missing object. This
really seems like the sender just sent the wrong data for one object.

IIRC, code.google.com is backed by their custom Dulwich implementation
which runs on BigTable. We'll see if Shawn can get this to the right
people as a bug report. :)

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-23 10:56     ` Jeff King
@ 2014-08-23 11:04       ` Jeff King
  2014-08-23 11:18         ` Jeff King
  2014-08-25 17:19       ` [BUG] resolved deltas Shawn Pearce
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-23 11:04 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote:

> So I think your patch is doing the right thing.

By the way, if you want to add a test to your patch, there is
infrastructure in t5308 to create packs with duplicate objects. If I
understand the problem correctly, you could trigger this by having a
delta object whose base is duplicated, even without the missing object.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-23 11:04       ` Jeff King
@ 2014-08-23 11:18         ` Jeff King
  2014-08-25 16:39           ` René Scharfe
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-23 11:18 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

On Sat, Aug 23, 2014 at 07:04:59AM -0400, Jeff King wrote:

> On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote:
> 
> > So I think your patch is doing the right thing.
> 
> By the way, if you want to add a test to your patch, there is
> infrastructure in t5308 to create packs with duplicate objects. If I
> understand the problem correctly, you could trigger this by having a
> delta object whose base is duplicated, even without the missing object.

This actually turned out to be really easy. The test below fails without
your patch and passes with it. Please feel free to squash it in.

diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh
index 9c5a876..50f7a69 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with duplicates' '
 	test_expect_code 1 git cat-file -e $LO_SHA1
 '
 
+test_expect_success 'duplicated delta base does not trigger assert' '
+	clear_packs &&
+	{
+		A=01d7713666f4de822776c7622c10f1b07de280dc &&
+		B=e68fe8129b546b101aee9510c5328e7f21ca1d18 &&
+		pack_header 3 &&
+		pack_obj $A $B &&
+		pack_obj $B &&
+		pack_obj $B
+	} >dups.pack &&
+	pack_trailer dups.pack &&
+	git index-pack --stdin <dups.pack &&
+	test_must_fail git index-pack --stdin --strict <dups.pack
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-23 11:18         ` Jeff King
@ 2014-08-25 16:39           ` René Scharfe
  2014-08-28 22:08             ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: René Scharfe @ 2014-08-25 16:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

Am 23.08.2014 um 13:18 schrieb Jeff King:
> On Sat, Aug 23, 2014 at 07:04:59AM -0400, Jeff King wrote:
> 
>> On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote:
>>
>>> So I think your patch is doing the right thing.
>>
>> By the way, if you want to add a test to your patch, there is
>> infrastructure in t5308 to create packs with duplicate objects. If I
>> understand the problem correctly, you could trigger this by having a
>> delta object whose base is duplicated, even without the missing object.
> 
> This actually turned out to be really easy. The test below fails without
> your patch and passes with it. Please feel free to squash it in.
> 
> diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh
> index 9c5a876..50f7a69 100755
> --- a/t/t5308-pack-detect-duplicates.sh
> +++ b/t/t5308-pack-detect-duplicates.sh
> @@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with duplicates' '
>   	test_expect_code 1 git cat-file -e $LO_SHA1
>   '
>   
> +test_expect_success 'duplicated delta base does not trigger assert' '
> +	clear_packs &&
> +	{
> +		A=01d7713666f4de822776c7622c10f1b07de280dc &&
> +		B=e68fe8129b546b101aee9510c5328e7f21ca1d18 &&
> +		pack_header 3 &&
> +		pack_obj $A $B &&
> +		pack_obj $B &&
> +		pack_obj $B
> +	} >dups.pack &&
> +	pack_trailer dups.pack &&
> +	git index-pack --stdin <dups.pack &&
> +	test_must_fail git index-pack --stdin --strict <dups.pack
> +'
> +
>   test_done

Thanks, that looks good.  But while preparing the patch I noticed that
the added test sometimes fails.  Helgrind pointed outet a race
condition.  It is not caused by the patch to turn the asserts into
regular ifs, however -- here's a Helgrind report for the original code
with the new test:

==34949== Helgrind, a thread error detector
==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==34949== Command: /home/lsr/src/git/t/../bin-wrappers/git index-pack --stdin
==34949==
==34949== Helgrind, a thread error detector
==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==34949== Command: /home/lsr/src/git/git index-pack --stdin
==34949==
==34949== ---Thread-Announcement------------------------------------------
==34949==
==34949== Thread #3 was created
==34949==    at 0x594DF7E: clone (clone.S:74)
==34949==    by 0x544A2B9: do_clone.constprop.3 (createthread.c:75)
==34949==    by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==34949==    by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269)
==34949==    by 0x43ABB8: cmd_index_pack (index-pack.c:1097)
==34949==    by 0x405B6A: handle_builtin (git.c:351)
==34949==    by 0x404CE8: main (git.c:575)
==34949==
==34949== ---Thread-Announcement------------------------------------------
==34949==
==34949== Thread #2 was created
==34949==    at 0x594DF7E: clone (clone.S:74)
==34949==    by 0x544A2B9: do_clone.constprop.3 (createthread.c:75)
==34949==    by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==34949==    by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269)
==34949==    by 0x43ABB8: cmd_index_pack (index-pack.c:1097)
==34949==    by 0x405B6A: handle_builtin (git.c:351)
==34949==    by 0x404CE8: main (git.c:575)
==34949==
==34949== ----------------------------------------------------------------
==34949==
==34949== Possible data race during read of size 4 at 0x5E15910 by thread #3
==34949== Locks held: none
==34949==    at 0x439327: find_unresolved_deltas (index-pack.c:918)
==34949==    by 0x439666: threaded_second_pass (index-pack.c:1002)
==34949==    by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233)
==34949==    by 0x544B0A3: start_thread (pthread_create.c:309)
==34949==
==34949== This conflicts with a previous write of size 4 by thread #2
==34949== Locks held: none
==34949==    at 0x4390E2: resolve_delta (index-pack.c:865)
==34949==    by 0x439340: find_unresolved_deltas (index-pack.c:919)
==34949==    by 0x439666: threaded_second_pass (index-pack.c:1002)
==34949==    by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233)
==34949==    by 0x544B0A3: start_thread (pthread_create.c:309)
==34949==
==34949== Address 0x5E15910 is 48 bytes inside a block of size 256 alloc'd
==34949==    at 0x4C2A7D0: calloc (vg_replace_malloc.c:618)
==34949==    by 0x50CA83: xcalloc (wrapper.c:119)
==34949==    by 0x439AF6: cmd_index_pack (index-pack.c:1643)
==34949==    by 0x405B6A: handle_builtin (git.c:351)
==34949==    by 0x404CE8: main (git.c:575)
==34949==
git: builtin/index-pack.c:918: find_unresolved_deltas_1: Assertion `child->real_type == OBJ_REF_DELTA' failed.
==34949==
==34949== For counts of detected and suppressed errors, rerun with: -v
==34949== Use --history-level=approx or =none to gain increased speed, at
==34949== the cost of reduced accuracy of conflicting-access information
==34949== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 15 from 12)
Killed

It only happens in one of ten cases or so.  The write at index-pack.c
line 865 is in resolve_delta() and sets real_type of a delta object to
the real_type of its base.  It seems that threads can end up with work
items that end up requiring to touch another thread's part of the pie
if there are duplicates.

René

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-23 10:56     ` Jeff King
  2014-08-23 11:04       ` Jeff King
@ 2014-08-25 17:19       ` Shawn Pearce
  1 sibling, 0 replies; 30+ messages in thread
From: Shawn Pearce @ 2014-08-25 17:19 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Martin von Gagern, git, Junio C Hamano

On Sat, Aug 23, 2014 at 3:56 AM, Jeff King <peff@peff.net> wrote:
> [+cc spearce, as I think this is a bug in code.google.com's sending
>  side, and he can probably get the attention of the right folks]
...
> My guess is a bug on the sending side. We have seen duplicate pack
> objects before, but never (AFAIK) combined with a missing object. This
> really seems like the sender just sent the wrong data for one object.
>
> IIRC, code.google.com is backed by their custom Dulwich implementation
> which runs on BigTable. We'll see if Shawn can get this to the right
> people as a bug report. :)

Thanks. This is a bug in the code.google.com implementation that is
running on Bigtable. I forwarded the report to the team that manages
that service so they can investigate further.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-25 16:39           ` René Scharfe
@ 2014-08-28 22:08             ` Jeff King
  2014-08-28 22:15               ` Jeff King
  2014-08-28 22:22               ` Jeff King
  0 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2014-08-28 22:08 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

On Mon, Aug 25, 2014 at 06:39:45PM +0200, René Scharfe wrote:

> Thanks, that looks good.  But while preparing the patch I noticed that
> the added test sometimes fails.  Helgrind pointed outet a race
> condition.  It is not caused by the patch to turn the asserts into
> regular ifs, however -- here's a Helgrind report for the original code
> with the new test:

Interesting. I couldn't convince Helgrind to catch such a case, but it
makes sense. We split the delta-resolving work by dividing up the base
objects. We then find any deltas that need that base object (which is
read-only). If there's only one instances of the base, then we'll be the
only thread working on those delta. But if there are two such bases,
they're going to look at the same deltas.

So we need some kind of mutual exclusion so that only one thread
proceeds with resolving the delta. The "real_type" check sort-of
functions in that way (except of course it is not actually thread safe).
So one obvious option is just a coarse-grained global lock to modify or
check real_type fields. That would probably perform badly (lots of
useless lock contention on unrelated objects), but at least it would
work.

If we accept pushing a lock into _each_ object_entry, then we would get
a lot less lock contention (i.e., none at all in the common case of no
duplicates). The cost is storing one lock per object, though. Not great,
but probably OK.

Is there some way we can make real_type itself more atomic? I.e., use it
as the mutex with the rule that we do not "claim" the object_entry as
ours to work on until we atomically set delta_obj->real_type. I think
doing a compare-and-swap looking for OBJ_REF_DELTA and replacing it with
base->real_type would be enough there (and substitute OBJ_OFS_DELTA for
the second conditional, of course).

However, I'm not sure we can portably rely on having a compare-and-swap
primitive (or that we want to go through the hassle of conditionally
using it).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-28 22:08             ` Jeff King
@ 2014-08-28 22:15               ` Jeff King
  2014-08-28 23:04                 ` Jeff King
  2014-08-28 22:22               ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-28 22:15 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:

> So we need some kind of mutual exclusion so that only one thread
> proceeds with resolving the delta. The "real_type" check sort-of
> functions in that way (except of course it is not actually thread safe).

Here's a patch which implements that. Since I couldn't replicate the
original problem with helgrind, I am just guessing at whether it
properly fixes it (well, it is more than just a guess; I used my brain
to analyze it, but that is far from foolproof).

It uses a single lock. I did a best-of-five timing of "git index-pack
--verify" on the kernel repo, both before and after. The results ended
up quite similar (both about 57s), though the run-to-run numbers are all
over the place (up to about 65s in the worst case). So maybe it is not
so bad.

As I implemented, I realized that even with the mutex, I really was just
implementing compare_and_swap (and I wrote it that way to make it more
obvious). So if we wanted to, it would be trivial to replace the
"claim_delta" function with a true compare-and-swap instruction if the
compiler and processor support it.

---
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f7dc5b0..ed9e253 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -112,6 +112,10 @@ static pthread_mutex_t deepest_delta_mutex;
 #define deepest_delta_lock()	lock_mutex(&deepest_delta_mutex)
 #define deepest_delta_unlock()	unlock_mutex(&deepest_delta_mutex)
 
+static pthread_mutex_t object_entry_mutex;
+#define object_entry_lock()	lock_mutex(&object_entry_mutex)
+#define object_entry_unlock()	unlock_mutex(&object_entry_mutex)
+
 static pthread_key_t key;
 
 static inline void lock_mutex(pthread_mutex_t *mutex)
@@ -135,6 +139,7 @@ static void init_thread(void)
 	init_recursive_mutex(&read_mutex);
 	pthread_mutex_init(&counter_mutex, NULL);
 	pthread_mutex_init(&work_mutex, NULL);
+	pthread_mutex_init(&object_entry_mutex, NULL);
 	if (show_stat)
 		pthread_mutex_init(&deepest_delta_mutex, NULL);
 	pthread_key_create(&key, NULL);
@@ -157,6 +162,7 @@ static void cleanup_thread(void)
 	pthread_mutex_destroy(&read_mutex);
 	pthread_mutex_destroy(&counter_mutex);
 	pthread_mutex_destroy(&work_mutex);
+	pthread_mutex_destroy(&object_entry_mutex);
 	if (show_stat)
 		pthread_mutex_destroy(&deepest_delta_mutex);
 	for (i = 0; i < nr_threads; i++)
@@ -862,7 +868,6 @@ static void resolve_delta(struct object_entry *delta_obj,
 {
 	void *base_data, *delta_data;
 
-	delta_obj->real_type = base->obj->real_type;
 	if (show_stat) {
 		delta_obj->delta_depth = base->obj->delta_depth + 1;
 		deepest_delta_lock();
@@ -888,6 +893,21 @@ static void resolve_delta(struct object_entry *delta_obj,
 	counter_unlock();
 }
 
+static int claim_delta(struct object_entry *delta_obj,
+		       enum object_type delta_type,
+		       enum object_type base_type)
+{
+	enum object_type old_type;
+
+	object_entry_lock();
+	old_type = delta_obj->real_type;
+	if (old_type == delta_type)
+		delta_obj->real_type = base_type;
+	object_entry_unlock();
+
+	return old_type == delta_type;
+}
+
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 						  struct base_data *prev_base)
 {
@@ -914,7 +934,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 	if (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + deltas[base->ref_first].obj_no;
 
-		if (child->real_type == OBJ_REF_DELTA) {
+		if (claim_delta(child, OBJ_REF_DELTA, base->obj->real_type)) {
 			struct base_data *result = alloc_base_data();
 
 			resolve_delta(child, base, result);
@@ -930,7 +950,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 	if (base->ofs_first <= base->ofs_last) {
 		struct object_entry *child = objects + deltas[base->ofs_first].obj_no;
 
-		if (child->real_type == OBJ_OFS_DELTA) {
+		if (claim_delta(child, OBJ_OFS_DELTA, base->obj->real_type)) {
 			struct base_data *result = alloc_base_data();
 
 			resolve_delta(child, base, result);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-28 22:08             ` Jeff King
  2014-08-28 22:15               ` Jeff King
@ 2014-08-28 22:22               ` Jeff King
  2014-08-28 23:14                 ` Junio C Hamano
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-28 22:22 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:

> Interesting. I couldn't convince Helgrind to catch such a case...

Ugh. It helps if you actually helgrind the git binary, and not the
shell-script from bin-wrappers. I can easily replicate the problem now.
The patch I just posted seems to fix it (at least it has been running in
a loop for over a minute with no failures, whereas before it took only a
few seconds to provoke a failure).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-28 22:15               ` Jeff King
@ 2014-08-28 23:04                 ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2014-08-28 23:04 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Martin von Gagern, git, Junio C Hamano

On Thu, Aug 28, 2014 at 06:15:18PM -0400, Jeff King wrote:

> As I implemented, I realized that even with the mutex, I really was just
> implementing compare_and_swap (and I wrote it that way to make it more
> obvious). So if we wanted to, it would be trivial to replace the
> "claim_delta" function with a true compare-and-swap instruction if the
> compiler and processor support it.

That's something like this on top:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ed9e253..1782a46 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -897,6 +897,10 @@ static int claim_delta(struct object_entry *delta_obj,
 		       enum object_type delta_type,
 		       enum object_type base_type)
 {
+#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+	return __sync_bool_compare_and_swap(&delta_obj->real_type, delta_type,
+					    base_type);
+#else
 	enum object_type old_type;
 
 	object_entry_lock();
@@ -906,6 +910,7 @@ static int claim_delta(struct object_entry *delta_obj,
 	object_entry_unlock();
 
 	return old_type == delta_type;
+#endif
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,


Though I guess gcc's __sync stuff is old, and the new C11 thing is
stdatomic.h. I guess we could support either if we can find the right
preprocessor macros (I am not even sure if the above is correct; the "4"
is for the size of the data type, but this is an enum, so we don't even
know what the right size is...).

However, the above doesn't seem to be any faster for me, so it may not
be worth the hassle.

-Peff

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-28 22:22               ` Jeff King
@ 2014-08-28 23:14                 ` Junio C Hamano
  2014-08-29 20:55                   ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2014-08-28 23:14 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

Jeff King <peff@peff.net> writes:

> On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:
>
>> Interesting. I couldn't convince Helgrind to catch such a case...
>
> Ugh. It helps if you actually helgrind the git binary, and not the
> shell-script from bin-wrappers. I can easily replicate the problem now.
> The patch I just posted seems to fix it (at least it has been running in
> a loop for over a minute with no failures, whereas before it took only a
> few seconds to provoke a failure).

Thanks for digging.  I'll pick it up and may comment on it in
tomorrow's cycle (it is a bit too late for today's cycle,
unfortunately).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] resolved deltas
  2014-08-28 23:14                 ` Junio C Hamano
@ 2014-08-29 20:55                   ` Jeff King
  2014-08-29 20:57                     ` [PATCH 1/2] index-pack: fix race condition with duplicate bases Jeff King
                                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jeff King @ 2014-08-29 20:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

On Thu, Aug 28, 2014 at 04:14:01PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:
> >
> >> Interesting. I couldn't convince Helgrind to catch such a case...
> >
> > Ugh. It helps if you actually helgrind the git binary, and not the
> > shell-script from bin-wrappers. I can easily replicate the problem now.
> > The patch I just posted seems to fix it (at least it has been running in
> > a loop for over a minute with no failures, whereas before it took only a
> > few seconds to provoke a failure).
> 
> Thanks for digging.  I'll pick it up and may comment on it in
> tomorrow's cycle (it is a bit too late for today's cycle,
> unfortunately).

Here's a proposed series.

  [1/2]: index-pack: fix race condition with duplicate bases
  [2/2]: index-pack: handle duplicate base objects gracefully

There are two big changes from what has been posted already:

  1. While writing up the commit message, I realized that this can't
     ever happen for OFS_DELTA (neither the race condition, nor the
     duplicate resolution) because offsets by definition point to a
     unique entry in our object_entry array. So even if we have multiple
     copies of the base object in the pack, there's only one possible
     base for an OFS_DELTA.

  2. René's original patch just does nothing for a delta which has
     already been resolved, and continues through the function to try
     any OFS_DELTA objects. If there isn't one, we return NULL to say
     "nothing left to resolve". But that's not quite true. There may
     have been other REF_DELTA against the same base, but we never
     incremented our counter to find it.

     Instead, we need to actually increment our counter and loop again
     to see if there's another REF_DELTA to handle.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] index-pack: fix race condition with duplicate bases
  2014-08-29 20:55                   ` Jeff King
@ 2014-08-29 20:57                     ` Jeff King
  2014-08-29 20:58                     ` [PATCH 2/2] index-pack: handle duplicate base objects gracefully Jeff King
  2014-08-30 13:23                     ` [PATCH 3/2] t5309: mark delta-cycle failover tests as passing Jeff King
  2 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2014-08-29 20:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

When we are resolving deltas in an indexed pack, we do it by
first selecting a potential base (either one stored in full
in the pack, or one created by resolving another delta), and
then resolving any deltas that use that base.  When we
resolve a particular delta, we flip its "real_type" field
from OBJ_{REF,OFS}_DELTA to whatever the real type is.

We assume that traversing the objects this way will visit
each delta only once. This is correct for most packs; we
visit the delta only when we process its base, and each
object (and thus each base) appears only once. However, if a
base object appears multiple times in the pack, we will try
to resolve any deltas based on it once for each instance.

We can detect this case by noting that a delta we are about
to resolve has already had its real_type field flipped, and
we already do so with an assert().  However, if multiple
threads are in use, we may race with another thread on
comparing and flipping the field. We need to synchronize the
access.

The right mechanism for doing this is a compare-and-swap (we
atomically "claim" the delta for our own and find out
whether our claim was successful). We can implement this
in C by using a pthread mutex to protect the operation. This
is not the fastest way of doing a compare-and-swap; many
processors provide instructions for this, and gcc and other
compilers provide builtins to access them. However, some
experiments showed that lock contention does not cause a
significant slowdown here. Adding c-a-s support for many
compilers would increase the maintenance burden (and we
would still end up including the pthread version as a
fallback).

Note that we only need to touch the OBJ_REF_DELTA codepath
here. An OBJ_OFS_DELTA object points to its base using an
offset, and therefore has only one base, even if another
copy of that base object appears in the pack (we do still
touch it briefly because the setting of real_type is
factored out of resolve_data).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/index-pack.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5568a5b..eebf1a8 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -112,6 +112,10 @@ static pthread_mutex_t deepest_delta_mutex;
 #define deepest_delta_lock()	lock_mutex(&deepest_delta_mutex)
 #define deepest_delta_unlock()	unlock_mutex(&deepest_delta_mutex)
 
+static pthread_mutex_t type_cas_mutex;
+#define type_cas_lock()		lock_mutex(&type_cas_mutex)
+#define type_cas_unlock()	unlock_mutex(&type_cas_mutex)
+
 static pthread_key_t key;
 
 static inline void lock_mutex(pthread_mutex_t *mutex)
@@ -135,6 +139,7 @@ static void init_thread(void)
 	init_recursive_mutex(&read_mutex);
 	pthread_mutex_init(&counter_mutex, NULL);
 	pthread_mutex_init(&work_mutex, NULL);
+	pthread_mutex_init(&type_cas_mutex, NULL);
 	if (show_stat)
 		pthread_mutex_init(&deepest_delta_mutex, NULL);
 	pthread_key_create(&key, NULL);
@@ -157,6 +162,7 @@ static void cleanup_thread(void)
 	pthread_mutex_destroy(&read_mutex);
 	pthread_mutex_destroy(&counter_mutex);
 	pthread_mutex_destroy(&work_mutex);
+	pthread_mutex_destroy(&type_cas_mutex);
 	if (show_stat)
 		pthread_mutex_destroy(&deepest_delta_mutex);
 	for (i = 0; i < nr_threads; i++)
@@ -862,7 +868,6 @@ static void resolve_delta(struct object_entry *delta_obj,
 {
 	void *base_data, *delta_data;
 
-	delta_obj->real_type = base->obj->real_type;
 	if (show_stat) {
 		delta_obj->delta_depth = base->obj->delta_depth + 1;
 		deepest_delta_lock();
@@ -888,6 +893,26 @@ static void resolve_delta(struct object_entry *delta_obj,
 	counter_unlock();
 }
 
+/*
+ * Standard boolean compare-and-swap: atomically check whether "*type" is
+ * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
+ * and return false.
+ */
+static int compare_and_swap_type(enum object_type *type,
+				 enum object_type want,
+				 enum object_type set)
+{
+	enum object_type old;
+
+	type_cas_lock();
+	old = *type;
+	if (old == want)
+		*type = set;
+	type_cas_unlock();
+
+	return old == want;
+}
+
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 						  struct base_data *prev_base)
 {
@@ -915,7 +940,10 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		struct object_entry *child = objects + deltas[base->ref_first].obj_no;
 		struct base_data *result = alloc_base_data();
 
-		assert(child->real_type == OBJ_REF_DELTA);
+		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
+					   base->obj->real_type))
+			die("BUG: child->real_type != OBJ_REF_DELTA");
+
 		resolve_delta(child, base, result);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
@@ -929,6 +957,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		struct base_data *result = alloc_base_data();
 
 		assert(child->real_type == OBJ_OFS_DELTA);
+		child->real_type = base->obj->real_type;
 		resolve_delta(child, base, result);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-29 20:55                   ` Jeff King
  2014-08-29 20:57                     ` [PATCH 1/2] index-pack: fix race condition with duplicate bases Jeff King
@ 2014-08-29 20:58                     ` Jeff King
  2014-08-29 21:56                       ` Junio C Hamano
  2014-08-30 13:23                     ` [PATCH 3/2] t5309: mark delta-cycle failover tests as passing Jeff King
  2 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-29 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

If a pack contains duplicates of an object, and if that
object has any deltas pointing at it with REF_DELTA, we will
try to resolve the deltas twice. While unusual, this is not
strictly an error, but we currently die() with an unhelpful
message.  We should instead silently ignore the delta and
move on. The first resolution already filled in any data we
needed (like the sha1).

The duplicate base object is not our concern during the
resolution phase, and the .idx-writing phase will decide
whether to complain or allow it (based on whether --strict
is set).

Based-on-work-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/index-pack.c              | 18 ++++++++++--------
 t/t5308-pack-detect-duplicates.sh | 15 +++++++++++++++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index eebf1a8..acc30a9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -936,20 +936,22 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		link_base_data(prev_base, base);
 	}
 
-	if (base->ref_first <= base->ref_last) {
+	while (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + deltas[base->ref_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result = NULL;
 
-		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-					   base->obj->real_type))
-			die("BUG: child->real_type != OBJ_REF_DELTA");
+		if (compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
+					  base->obj->real_type)) {
+			result = alloc_base_data();
+			resolve_delta(child, base, result);
+		}
 
-		resolve_delta(child, base, result);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
-
 		base->ref_first++;
-		return result;
+
+		if (result)
+			return result;
 	}
 
 	if (base->ofs_first <= base->ofs_last) {
diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh
index 9c5a876..50f7a69 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with duplicates' '
 	test_expect_code 1 git cat-file -e $LO_SHA1
 '
 
+test_expect_success 'duplicated delta base does not trigger assert' '
+	clear_packs &&
+	{
+		A=01d7713666f4de822776c7622c10f1b07de280dc &&
+		B=e68fe8129b546b101aee9510c5328e7f21ca1d18 &&
+		pack_header 3 &&
+		pack_obj $A $B &&
+		pack_obj $B &&
+		pack_obj $B
+	} >dups.pack &&
+	pack_trailer dups.pack &&
+	git index-pack --stdin <dups.pack &&
+	test_must_fail git index-pack --stdin --strict <dups.pack
+'
+
 test_done
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-29 20:58                     ` [PATCH 2/2] index-pack: handle duplicate base objects gracefully Jeff King
@ 2014-08-29 21:56                       ` Junio C Hamano
  2014-08-29 22:08                         ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2014-08-29 21:56 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

Jeff King <peff@peff.net> writes:

> If a pack contains duplicates of an object, and if that
> object has any deltas pointing at it with REF_DELTA, we will
> try to resolve the deltas twice. While unusual, this is not
> strictly an error, but we currently die() with an unhelpful
> message.

Hmm, I vaguely recall Shawn declaring this as an error in the pack
stream, though.

> The duplicate base object is not our concern during the
> resolution phase, and the .idx-writing phase will decide
> whether to complain or allow it (based on whether --strict
> is set).

OK.  We still diagnose and just be more lenient without loosening
the correctness of the result, so that would be not just OK but a
very welcome change.

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-29 21:56                       ` Junio C Hamano
@ 2014-08-29 22:08                         ` Jeff King
  2014-08-30  2:59                           ` Shawn Pearce
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-29 22:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

On Fri, Aug 29, 2014 at 02:56:18PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > If a pack contains duplicates of an object, and if that
> > object has any deltas pointing at it with REF_DELTA, we will
> > try to resolve the deltas twice. While unusual, this is not
> > strictly an error, but we currently die() with an unhelpful
> > message.
> 
> Hmm, I vaguely recall Shawn declaring this as an error in the pack
> stream, though.

I agree it is probably a bug on the sending side, but I think last time
this came up we decided to try to be liberal in what we accept.  c.f.
http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-29 22:08                         ` Jeff King
@ 2014-08-30  2:59                           ` Shawn Pearce
  2014-08-30 13:16                             ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Shawn Pearce @ 2014-08-30  2:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, René Scharfe, Martin von Gagern, git

On Fri, Aug 29, 2014 at 3:08 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Aug 29, 2014 at 02:56:18PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > If a pack contains duplicates of an object, and if that
>> > object has any deltas pointing at it with REF_DELTA, we will
>> > try to resolve the deltas twice. While unusual, this is not
>> > strictly an error, but we currently die() with an unhelpful
>> > message.
>>
>> Hmm, I vaguely recall Shawn declaring this as an error in the pack
>> stream, though.
>
> I agree it is probably a bug on the sending side, but I think last time
> this came up we decided to try to be liberal in what we accept.  c.f.
> http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310

IIRC they aren't valid pack files to contain duplicates.

Once upon a time JGit had a bug and android.googlesource.com returned
duplicate objects in a Linux kernel repository. This caused at least
some versions of git-core to fail very badly in binary search at
object lookup time or something. We had a lot of users angry with us.
:)

I know Nico said its OK last year, but its really not. I don't think
implementations are capable of handling it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-30  2:59                           ` Shawn Pearce
@ 2014-08-30 13:16                             ` Jeff King
  2014-08-30 16:00                               ` René Scharfe
  2014-08-31  1:10                               ` Shawn Pearce
  0 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2014-08-30 13:16 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, René Scharfe, Martin von Gagern, git

On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote:

> > I agree it is probably a bug on the sending side, but I think last time
> > this came up we decided to try to be liberal in what we accept.  c.f.
> > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310
> 
> IIRC they aren't valid pack files to contain duplicates.
> 
> Once upon a time JGit had a bug and android.googlesource.com returned
> duplicate objects in a Linux kernel repository. This caused at least
> some versions of git-core to fail very badly in binary search at
> object lookup time or something. We had a lot of users angry with us.
> :)
> 
> I know Nico said its OK last year, but its really not. I don't think
> implementations are capable of handling it.

We do detect and complain if --strict is given. Should we make it the
default instead? I think it is still worthwhile to have a mode that can
handle these packs. It may be the only reasonable way to recover the
data from such a broken pack (and that broken pack may be the only copy
of the data you have, if you are stuck getting it out of a broken
implementation on a remote server).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/2] t5309: mark delta-cycle failover tests as passing
  2014-08-29 20:55                   ` Jeff King
  2014-08-29 20:57                     ` [PATCH 1/2] index-pack: fix race condition with duplicate bases Jeff King
  2014-08-29 20:58                     ` [PATCH 2/2] index-pack: handle duplicate base objects gracefully Jeff King
@ 2014-08-30 13:23                     ` Jeff King
  2014-08-31 15:15                       ` Jeff King
  2 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-30 13:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

t5309 checks that we detect and fail when there is a cycle
of deltas (i.e., A is a delta on B, which is a delta on A).
It also checks the cases where we have such a cycle, but we
have an extra copy of the object either in the same pack or
in another one. These cases are recoverable, because we can
use the alternate copy to reconstruct the delta object.

However, it did not work in practice (and the tests were
marked as expect_failure) because it led to us trying to
resolve some deltas multiple times, something we assert()
should not happen. E.g., we'd see the full base A, use it to
resolve delta B on top of A, then use that to resolve the
delta A on top of B, and then finally try resolving B on top
of A again.

Since the previous commit, we handle duplicate bases more
gracefully, and the tests pass. While we're here, let's also
make the tests a little more robust:

  1. For the case of finding the extra base in the same
     pack, we do not need to use --fix-thin; it should (and
     does) work without it.

  2. Confirm that the resulting pack has a duplicate object.
     In the case of using --fix-thin to pull the object from
     another pack, the duplicate is added to the pack by us
     (this is a good thing, as otherwise future readers
     would encounter the cycle again).

Signed-off-by: Jeff King <peff@peff.net>
---
An obvious follow-on to the other two patches.

The implications of this make me slightly nervous, though. In the
--fix-thin case, the resulting pack will have 3 objects:

  - A as a delta on B
  - B as a delta on A
  - a full copy of either A (or B) provided by --fix-thin

We create a .idx that has duplicate entries for A. If a reader is trying
to reconstruct B and they find the full copy of A, they're fine. If they
find the delta copy, what happens?

Ideally the reader would say "hey, I can't reconstruct A here, let me
try to find another copy". But I am not sure if that happens, or if we
are even capable of finding another copy of A (certainly we can find one
in another pack, but I do not think we are smart enough to find a
duplicate in the same pack).

By definition, we have a copy in another pack here (that is where the
--fix-thin base came from), but there is nothing to guarantee that the
other pack lives on.

 t/t5309-pack-delta-cycles.sh | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 3e7861b..5309095 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -56,13 +56,14 @@ test_expect_success 'index-pack detects REF_DELTA cycles' '
 	test_must_fail git index-pack --fix-thin --stdin <cycle.pack
 '
 
-test_expect_failure 'failover to an object in another pack' '
+test_expect_success 'failover to an object in another pack' '
 	clear_packs &&
 	git index-pack --stdin <ab.pack &&
-	git index-pack --stdin --fix-thin <cycle.pack
+	git index-pack --stdin --fix-thin <cycle.pack &&
+	test_must_fail git index-pack --strict --stdin --fix-thin <cycle.pack
 '
 
-test_expect_failure 'failover to a duplicate object in the same pack' '
+test_expect_success 'failover to a duplicate object in the same pack' '
 	clear_packs &&
 	{
 		pack_header 3 &&
@@ -71,7 +72,8 @@ test_expect_failure 'failover to a duplicate object in the same pack' '
 		pack_obj $A
 	} >recoverable.pack &&
 	pack_trailer recoverable.pack &&
-	git index-pack --fix-thin --stdin <recoverable.pack
+	git index-pack --stdin <recoverable.pack &&
+	test_must_fail git index-pack --strict --stdin <recoverable.pack
 '
 
 test_done
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-30 13:16                             ` Jeff King
@ 2014-08-30 16:00                               ` René Scharfe
  2014-08-31 15:17                                 ` Jeff King
  2014-08-31  1:10                               ` Shawn Pearce
  1 sibling, 1 reply; 30+ messages in thread
From: René Scharfe @ 2014-08-30 16:00 UTC (permalink / raw)
  To: Jeff King, Shawn Pearce; +Cc: Junio C Hamano, Martin von Gagern, git

Am 30.08.2014 um 15:16 schrieb Jeff King:
> On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote:
>
>>> I agree it is probably a bug on the sending side, but I think last time
>>> this came up we decided to try to be liberal in what we accept.  c.f.
>>> http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310
>>
>> IIRC they aren't valid pack files to contain duplicates.
>>
>> Once upon a time JGit had a bug and android.googlesource.com returned
>> duplicate objects in a Linux kernel repository. This caused at least
>> some versions of git-core to fail very badly in binary search at
>> object lookup time or something. We had a lot of users angry with us.
>> :)
>>
>> I know Nico said its OK last year, but its really not. I don't think
>> implementations are capable of handling it.
>
> We do detect and complain if --strict is given. Should we make it the
> default instead? I think it is still worthwhile to have a mode that can
> handle these packs. It may be the only reasonable way to recover the
> data from such a broken pack (and that broken pack may be the only copy
> of the data you have, if you are stuck getting it out of a broken
> implementation on a remote server).

Sounds reasonable; being able to extract code from broken repos -- 
especially in this real-world case -- is beneficial.

My only nit with patch 2: Petr Stodulka <pstodulk@redhat.com> and Martin 
von Gagern <Martin.vGagern@gmx.net> should be mentioned as bug reporters.

René

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-30 13:16                             ` Jeff King
  2014-08-30 16:00                               ` René Scharfe
@ 2014-08-31  1:10                               ` Shawn Pearce
  2014-08-31 15:24                                 ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Shawn Pearce @ 2014-08-31  1:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, René Scharfe, Martin von Gagern, git

On Sat, Aug 30, 2014 at 6:16 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote:
>
>> > I agree it is probably a bug on the sending side, but I think last time
>> > this came up we decided to try to be liberal in what we accept.  c.f.
>> > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310
>>
>> IIRC they aren't valid pack files to contain duplicates.
>>
>> Once upon a time JGit had a bug and android.googlesource.com returned
>> duplicate objects in a Linux kernel repository. This caused at least
>> some versions of git-core to fail very badly in binary search at
>> object lookup time or something. We had a lot of users angry with us.
>> :)
>>
>> I know Nico said its OK last year, but its really not. I don't think
>> implementations are capable of handling it.
>
> We do detect and complain if --strict is given. Should we make it the
> default instead? I think it is still worthwhile to have a mode that can
> handle these packs. It may be the only reasonable way to recover the
> data from such a broken pack (and that broken pack may be the only copy
> of the data you have, if you are stuck getting it out of a broken
> implementation on a remote server).

Eh, OK, that does make a lot of sense.

Unfortunately accepting it by default encourages broken
implementations to stay broken and ship packs with duplicate objects,
which they shouldn't do since there are readers out there that cannot
handle it.

In my opinion we should make --strict default, but its an excellent
point made that users should have an escape hatch to disable this
check, attempt accepting a broken pack and try fixing it locally with
repack.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] t5309: mark delta-cycle failover tests as passing
  2014-08-30 13:23                     ` [PATCH 3/2] t5309: mark delta-cycle failover tests as passing Jeff King
@ 2014-08-31 15:15                       ` Jeff King
  2014-09-02 17:19                         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-31 15:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

On Sat, Aug 30, 2014 at 09:23:11AM -0400, Jeff King wrote:

> The implications of this make me slightly nervous, though. In the
> --fix-thin case, the resulting pack will have 3 objects:
> 
>   - A as a delta on B
>   - B as a delta on A
>   - a full copy of either A (or B) provided by --fix-thin
> 
> We create a .idx that has duplicate entries for A. If a reader is trying
> to reconstruct B and they find the full copy of A, they're fine. If they
> find the delta copy, what happens?
> 
> Ideally the reader would say "hey, I can't reconstruct A here, let me
> try to find another copy". But I am not sure if that happens, or if we
> are even capable of finding another copy of A (certainly we can find one
> in another pack, but I do not think we are smart enough to find a
> duplicate in the same pack).

The main reason this was "makes me nervous" before is that I did not
fully understand _why_ it worked with the current code. That bugged me,
so I dug further. And the answer is that it does not, but just happens
to work for some small cases.

Try this on top:

diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 5309095..4086983 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -58,9 +58,19 @@ test_expect_success 'index-pack detects REF_DELTA cycles' '
 
 test_expect_success 'failover to an object in another pack' '
 	clear_packs &&
+	{
+		pack_header 100 &&
+		for i in $(test_seq 50); do
+			pack_obj $A $B &&
+			pack_obj $B $A || break
+		done
+	} >megacycle.pack &&
+	pack_trailer megacycle.pack &&
 	git index-pack --stdin <ab.pack &&
-	git index-pack --stdin --fix-thin <cycle.pack &&
-	test_must_fail git index-pack --strict --stdin --fix-thin <cycle.pack
+	git index-pack --stdin --fix-thin <megacycle.pack &&
+	echo >&2 indexed pack successfully... &&
+	git fsck &&
+	echo >&2 actually re-read pack successfully
 '
 
 test_expect_success 'failover to a duplicate object in the same pack' '


It has the same cycle problem, but we are just adding a larger number of
instances to the pack. Which means that any given sha1-lookup in the
index is more likely to hit a delta rather than the base object.

We successfully index the pack, but our fsck goes into an infinite loop.
Yikes.

I haven't really looked into it, but I suspect we would need some kind
of cycle detection on the delta resolution (and possibly to teach the
sha1-lookup to recognize duplicate objects in the pack and treat them
individually). Frankly, I don't think it is worth the effort or
complexity. We should probably just declare delta cycles insane and
reject them outright.

We used to do that because the only way to correctly resolve them was by
introducing a duplicate base object, and we did not allow that. Patch 2
from my series loosened this, which makes index-pack work, but not
necessarily the rest of git. And since index-pack is the gatekeeper on
receiving objects from remotes, it needs to be the _most_ picky. So my
series is definitely a regression as-is.

We can solve this in one of three ways:

  1. Teach the rest of git to handle recoverable delta cycles. This is
     probably crazy and not worth the effort (and it just lets crap
     through that will hurt other git implementations, too).

  2. Continue to let through duplicate objects in index-pack, but
     specifically detect and reject delta cycles. This is more work (I'm
     not sure yet how easy it would be to detect cycles), but it would
     mean we can treat duplicates (a much less nasty problem) and cycles
     differently.

  3. Go back to outlawing duplicate bases. This is very easy. Just drop
     my patch 2. :)

I am inclined to go with the third option. There has already been a
suggestion from Shawn that we disallow duplicates entirely, and I was
tempted to go that direction even without this finding. But to me this
makes it a no-brainer; the question has gone from "how strict do we want
to be" to "do we want to protect the rest of the code against useless
and potentially harmful violations of their assumptions".

If we do go with (3), that opens up two new questions:

  a. Should we disallow _all_ duplicates, or just those that are bases?
     This is actually easy to code; the assert() in find_unresolved_deltas
     catches the bases, and the .idx writer catches any other ones.

  b. How optional do we want to make this? Right now (without this
     series) the delta-base duplicates always die, and regular
     duplicates are prevented only under --strict.

     If we treat them the same, it should probably be die-by-default.
     Should there be an optional mode to let this stuff through (i.e., a
     "I know this might cause problems with the rest of git, but I am
     desperate to get the data out of this pack" mode?)

     If we treat them differently, there is not much harm in an option
     to loosen regular duplicates, as I think the rest of git handles
     it. For bases, in theory you might be able to recover some data.
     But you may also run into this infinite loop. It is very much "at
     your own risk".

     I wonder if index-pack is really the right place for such a "please
     help me get the data out of this broken pack" operation in the
     first place. If it is a broken pack, we are probably much better
     off to explode it into loose objects than try to index a broken
     pack. That's way less efficient, but this should be a last-resort.

I think my preference is to outlaw all duplicates unconditionally in
index-pack. Catch the duplicate base in find_unresolved_deltas as we do
now, but improve the error message. Confirm that unpack-objects can
handle these cases. Optionally attach advice to the duplicate errors
directing people to use "unpack-objects" and/or set
transfer.unpackLimit higher.

-Peff

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-30 16:00                               ` René Scharfe
@ 2014-08-31 15:17                                 ` Jeff King
  2014-08-31 16:30                                   ` René Scharfe
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-31 15:17 UTC (permalink / raw)
  To: René Scharfe; +Cc: Shawn Pearce, Junio C Hamano, Martin von Gagern, git

On Sat, Aug 30, 2014 at 06:00:59PM +0200, René Scharfe wrote:

> My only nit with patch 2: Petr Stodulka <pstodulk@redhat.com> and Martin von
> Gagern <Martin.vGagern@gmx.net> should be mentioned as bug reporters.

Yeah, I agree with that. And actually, you should get a Reported-by:
on the first patch. :)

However, I think there are some grave implications of this series; see
the message I just posted elsewhere in the thread. I think we will end
up dropping patch 2, but keep patch 1.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-31  1:10                               ` Shawn Pearce
@ 2014-08-31 15:24                                 ` Jeff King
  2014-08-31 22:23                                   ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-08-31 15:24 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, René Scharfe, Martin von Gagern, git

On Sat, Aug 30, 2014 at 06:10:33PM -0700, Shawn Pearce wrote:

> > We do detect and complain if --strict is given. Should we make it the
> > default instead? I think it is still worthwhile to have a mode that can
> > handle these packs. It may be the only reasonable way to recover the
> > data from such a broken pack (and that broken pack may be the only copy
> > of the data you have, if you are stuck getting it out of a broken
> > implementation on a remote server).
> 
> Eh, OK, that does make a lot of sense.
> 
> Unfortunately accepting it by default encourages broken
> implementations to stay broken and ship packs with duplicate objects,
> which they shouldn't do since there are readers out there that cannot
> handle it.

I was pretty much convinced by this line of reasoning after you reading
your message. But after discovering some horrible side effects that I
just posted elsewhere, I think it's a no-brainer.

> In my opinion we should make --strict default, but its an excellent
> point made that users should have an escape hatch to disable this
> check, attempt accepting a broken pack and try fixing it locally with
> repack.

I am not sure if you meant it this way, but --strict controls a lot more
than just duplicate objects. It also runs fsck checks against the
incoming objects. Turning all of that on is a much bigger change than I
think we've been talking about.

But it's one I think we should consider. We've had --strict on at GitHub
for a few years, and it has caught many problems. Frequently we get
push-back from users saying "but nowhere else I pushed this to
complains". My opinion is that it is not that we are wrong for being
picky, but that other servers are not being picky enough.

It's also a bit of a hassle, though. E.g., a lot of projects have broken
ident lines deep in their history (from old versions of git, or other
broken implementations) and it is often way too late to fix them. We
usually try to get people to rewrite the history if it's not a problem,
but otherwise will lift the restrictions temporarily to let them push up
old history.

Broken ident lines are annoying, but not _too_ fundamentally bad.
Duplicate tree entries are a lot worse. Fsck even distinguishes between
"error" and "warning", but "index-pack --strict" treats both as a reason
to reject the object. We could perhaps loosen that, and make sure our
error/warning categories are reasonable.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-31 15:17                                 ` Jeff King
@ 2014-08-31 16:30                                   ` René Scharfe
  0 siblings, 0 replies; 30+ messages in thread
From: René Scharfe @ 2014-08-31 16:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Junio C Hamano, Martin von Gagern, git

Am 31.08.2014 um 17:17 schrieb Jeff King:
> On Sat, Aug 30, 2014 at 06:00:59PM +0200, René Scharfe wrote:
>
>> My only nit with patch 2: Petr Stodulka <pstodulk@redhat.com> and Martin von
>> Gagern <Martin.vGagern@gmx.net> should be mentioned as bug reporters.
>
> Yeah, I agree with that. And actually, you should get a Reported-by:
> on the first patch. :)
>
> However, I think there are some grave implications of this series; see
> the message I just posted elsewhere in the thread. I think we will end
> up dropping patch 2, but keep patch 1.

OK, but it would still be a good idea to replace the assert()s with 
die()s and error messages that tell users that the repo is broken, not git.

René

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
  2014-08-31 15:24                                 ` Jeff King
@ 2014-08-31 22:23                                   ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2014-08-31 22:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, René Scharfe, Martin von Gagern, git

Jeff King <peff@peff.net> writes:

> Broken ident lines are annoying, but not _too_ fundamentally bad.
> Duplicate tree entries are a lot worse. Fsck even distinguishes between
> "error" and "warning", but "index-pack --strict" treats both as a reason
> to reject the object. We could perhaps loosen that, and make sure our
> error/warning categories are reasonable.

A pack stream that records the same object twice is not a breakage
that is buried deep in the history and cannot be corrected without a
wholesale history rewrite, unlike commit objects with broken ident
lines or tree objects with 0-filled file type designators.

As such, I think it is a reasonable thing to do the following:

 - "git repack" should be taught to dedup, as a way to recover from
   such a breakage, if not done already.

 - "git fsck" should complain, suggesting users to repack to
   recover.  It may even want to exit with non-zero status.

 - "git receive-pack" and "git fetch-pack" should warn, loudly,
   without failing.  They should ideally not keep such a corrupt
   pack stream on disk at all, de-duping the objects while
   streaming, but if that is not practical, at least they should
   give an easy way for the user to cause de-duping immediately (I
   do not mind if that is "we detected that the other side fed us a
   pack stream that is broken.  We automatically correct the
   breakage for you by repacking.  Be patient while we do so").

 - If the other side of "receive-pack/fetch-pack" implements the
   agent capability, upon detecting such a breakage, it may not be a
   bad idea to offer the user to send an e-mail reporting the
   version of the software to a wall-of-shame e-mail address ;-).

I agree that a tree that records the same path twice should be
outright rejected.  There is no sane recovery path.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] t5309: mark delta-cycle failover tests as passing
  2014-08-31 15:15                       ` Jeff King
@ 2014-09-02 17:19                         ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2014-09-02 17:19 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Shawn Pearce, Martin von Gagern, git

Jeff King <peff@peff.net> writes:

> We used to do that because the only way to correctly resolve them was by
> introducing a duplicate base object, and we did not allow that. Patch 2
> from my series loosened this, which makes index-pack work, but not
> necessarily the rest of git. And since index-pack is the gatekeeper on
> receiving objects from remotes, it needs to be the _most_ picky. So my
> series is definitely a regression as-is.

Yeah, at first, allowing the delta resolution so that we can
resurrect data from such a corrupt pack looked a no-brainer
improvement, but I think that is probably a right conclusion.
Thanks for digging this one through.

>      I wonder if index-pack is really the right place for such a "please
>      help me get the data out of this broken pack" operation in the
>      first place. If it is a broken pack, we are probably much better
>      off to explode it into loose objects than try to index a broken
>      pack. That's way less efficient, but this should be a last-resort.

Most objects in such a broken pack do not have to get unpacked, no?
I wonder if we can excise duplicate objects from the pack stream,
which would involve adjusting the delta offset for any ofs-delta
representations that appear after the part we cut out of the stream
to remove such cruft.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-09-02 17:19 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-21 11:35 [BUG] resolved deltas Petr Stodulka
2014-08-21 18:25 ` Petr Stodulka
2014-08-22 19:41 ` Martin von Gagern
2014-08-23 10:12   ` René Scharfe
2014-08-23 10:56     ` Jeff King
2014-08-23 11:04       ` Jeff King
2014-08-23 11:18         ` Jeff King
2014-08-25 16:39           ` René Scharfe
2014-08-28 22:08             ` Jeff King
2014-08-28 22:15               ` Jeff King
2014-08-28 23:04                 ` Jeff King
2014-08-28 22:22               ` Jeff King
2014-08-28 23:14                 ` Junio C Hamano
2014-08-29 20:55                   ` Jeff King
2014-08-29 20:57                     ` [PATCH 1/2] index-pack: fix race condition with duplicate bases Jeff King
2014-08-29 20:58                     ` [PATCH 2/2] index-pack: handle duplicate base objects gracefully Jeff King
2014-08-29 21:56                       ` Junio C Hamano
2014-08-29 22:08                         ` Jeff King
2014-08-30  2:59                           ` Shawn Pearce
2014-08-30 13:16                             ` Jeff King
2014-08-30 16:00                               ` René Scharfe
2014-08-31 15:17                                 ` Jeff King
2014-08-31 16:30                                   ` René Scharfe
2014-08-31  1:10                               ` Shawn Pearce
2014-08-31 15:24                                 ` Jeff King
2014-08-31 22:23                                   ` Junio C Hamano
2014-08-30 13:23                     ` [PATCH 3/2] t5309: mark delta-cycle failover tests as passing Jeff King
2014-08-31 15:15                       ` Jeff King
2014-09-02 17:19                         ` Junio C Hamano
2014-08-25 17:19       ` [BUG] resolved deltas Shawn Pearce

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.