All of lore.kernel.org
 help / color / mirror / Atom feed
* Multiple fetches when unshallowing a shallow clone
@ 2015-11-30 19:35 Jason Paller-Rzepka
  2015-12-04 20:46 ` Stefan Beller
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Paller-Rzepka @ 2015-11-30 19:35 UTC (permalink / raw)
  To: git

Hi all,

Would anyone be willing to help me understand some shallow-clone
behavior?  (I found a bug in Dulwich, and I'm looking for some context
so I can determine how to fix it.)

I learned that cgit sometimes performs two fetches for a `git fetch
--unshallow`: one with depth 'infinity', and a subsequent one with
depth zero.

Could anyone answer:
1) What is the purpose of the second fetch?
2) What does this depth of zero mean? Is it the same as a depth of
infinity?  (I assume not... but, since I thought the smallest
meaningful depth was 1, I don't know what else it might mean.)

Thank you!
Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-11-30 19:35 Multiple fetches when unshallowing a shallow clone Jason Paller-Rzepka
@ 2015-12-04 20:46 ` Stefan Beller
  2015-12-04 21:27   ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2015-12-04 20:46 UTC (permalink / raw)
  To: Jason Paller-Rzepka; +Cc: git

On Mon, Nov 30, 2015 at 11:35 AM, Jason Paller-Rzepka
<jasonpr@google.com> wrote:
> Hi all,
>
> Would anyone be willing to help me understand some shallow-clone
> behavior?  (I found a bug in Dulwich, and I'm looking for some context
> so I can determine how to fix it.)
>
> I learned that cgit sometimes performs two fetches for a `git fetch
> --unshallow`: one with depth 'infinity', and a subsequent one with
> depth zero.

Is there a condition to trigger this 'sometimes' ?

I just tried reproducing via
$ GIT_TRACE=1 git fetch --unshallow

and could not see a second fetch, but only a
fetch-pack with --depth=2147483647
>
> Could anyone answer:
> 1) What is the purpose of the second fetch?
> 2) What does this depth of zero mean? Is it the same as a depth of
> infinity?  (I assume not... but, since I thought the smallest
> meaningful depth was 1, I don't know what else it might mean.)
>
> Thank you!
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 20:46 ` Stefan Beller
@ 2015-12-04 21:27   ` Jeff King
  2015-12-04 21:36     ` Stefan Beller
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2015-12-04 21:27 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jason Paller-Rzepka, git

On Fri, Dec 04, 2015 at 12:46:59PM -0800, Stefan Beller wrote:

> On Mon, Nov 30, 2015 at 11:35 AM, Jason Paller-Rzepka
> <jasonpr@google.com> wrote:
> > Hi all,
> >
> > Would anyone be willing to help me understand some shallow-clone
> > behavior?  (I found a bug in Dulwich, and I'm looking for some context
> > so I can determine how to fix it.)
> >
> > I learned that cgit sometimes performs two fetches for a `git fetch
> > --unshallow`: one with depth 'infinity', and a subsequent one with
> > depth zero.
> 
> Is there a condition to trigger this 'sometimes' ?
> 
> I just tried reproducing via
> $ GIT_TRACE=1 git fetch --unshallow
> 
> and could not see a second fetch, but only a
> fetch-pack with --depth=2147483647

This seems to reproduce consistently for me:

  $ git clone --depth=1 git://github.com/git/git
  Cloning into 'git'...
  remote: Counting objects: 2925, done.
  remote: Compressing objects: 100% (2602/2602), done.
  remote: Total 2925 (delta 230), reused 2329 (delta 206), pack-reused 0
  Receiving objects: 100% (2925/2925), 6.17 MiB | 10.80 MiB/s, done.
  Resolving deltas: 100% (230/230), done.

  $ cd git
  $ git fetch --unshallow
  remote: Counting objects: 185430, done.
  remote: Compressing objects: 100% (46933/46933), done.
  remote: Total 185430 (delta 140505), reused 181589 (delta 136694), pack-reused 0
  Receiving objects: 100% (185430/185430), 52.80 MiB | 10.84 MiB/s, done.
  Resolving deltas: 100% (140505/140505), completed with 1784 local objects.
  remote: Counting objects: 579, done.
  remote: Compressing objects: 100% (579/579), done.
  remote: Total 579 (delta 0), reused 579 (delta 0), pack-reused 0
  Receiving objects: 100% (579/579), 266.85 KiB | 0 bytes/s, done.
  [... fetch output ...]

That looks like two packs being received for the --unshallow case.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:27   ` Jeff King
@ 2015-12-04 21:36     ` Stefan Beller
  2015-12-04 21:38     ` Jason Paller-Rzepka
  2015-12-04 21:57     ` Junio C Hamano
  2 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-04 21:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Jason Paller-Rzepka, git

On Fri, Dec 4, 2015 at 1:27 PM, Jeff King <peff@peff.net> wrote:
>>
>> and could not see a second fetch, but only a
>> fetch-pack with --depth=2147483647
>
> This seems to reproduce consistently for me:
>
>   $ git clone --depth=1 git://github.com/git/git

I used the http protocol, so I guess that's the difference.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:27   ` Jeff King
  2015-12-04 21:36     ` Stefan Beller
@ 2015-12-04 21:38     ` Jason Paller-Rzepka
  2015-12-04 21:50       ` Stefan Beller
  2015-12-04 21:51       ` Jeff King
  2015-12-04 21:57     ` Junio C Hamano
  2 siblings, 2 replies; 19+ messages in thread
From: Jason Paller-Rzepka @ 2015-12-04 21:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git

It appears that it happens when the shallow history grows to include a
commit that's pointed to by a previously unseen tag.  For example,
when I deepen a checkout of git to depth 8, I hit v2.5.2, and a second
fetch takes place.

```
$ git clone --depth=1 http://github.com/git/git
Cloning into 'git'...
remote: Counting objects: 2925, done.
remote: Compressing objects: 100% (2602/2602), done.
remote: Total 2925 (delta 230), reused 2329 (delta 206), pack-reused 0
Receiving objects: 100% (2925/2925), 6.17 MiB | 0 bytes/s, done.
Resolving deltas: 100% (230/230), done.
Checking connectivity... done.
$ git -C git fetch --depth=8
remote: Counting objects: 858, done.
remote: Compressing objects: 100% (774/774), done.
remote: Total 858 (delta 793), reused 138 (delta 80), pack-reused 0
Receiving objects: 100% (858/858), 364.53 KiB | 0 bytes/s, done.
Resolving deltas: 100% (793/793), completed with 476 local objects.
remote: Counting objects: 1, done.
remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (1/1), done.
From http://github.com/git/git
 * [new tag]         v2.5.2     -> v2.5.2
$
```

But why would fetching a tag (or set of tags) merit a depth of zero?
Doesn't depth 1 mean "give me the the objects, and none of their
descendants"?  Why use 0?

Thanks!
Jason

On Fri, Dec 4, 2015 at 4:27 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 04, 2015 at 12:46:59PM -0800, Stefan Beller wrote:
>
>> On Mon, Nov 30, 2015 at 11:35 AM, Jason Paller-Rzepka
>> <jasonpr@google.com> wrote:
>> > Hi all,
>> >
>> > Would anyone be willing to help me understand some shallow-clone
>> > behavior?  (I found a bug in Dulwich, and I'm looking for some context
>> > so I can determine how to fix it.)
>> >
>> > I learned that cgit sometimes performs two fetches for a `git fetch
>> > --unshallow`: one with depth 'infinity', and a subsequent one with
>> > depth zero.
>>
>> Is there a condition to trigger this 'sometimes' ?
>>
>> I just tried reproducing via
>> $ GIT_TRACE=1 git fetch --unshallow
>>
>> and could not see a second fetch, but only a
>> fetch-pack with --depth=2147483647
>
> This seems to reproduce consistently for me:
>
>   $ git clone --depth=1 git://github.com/git/git
>   Cloning into 'git'...
>   remote: Counting objects: 2925, done.
>   remote: Compressing objects: 100% (2602/2602), done.
>   remote: Total 2925 (delta 230), reused 2329 (delta 206), pack-reused 0
>   Receiving objects: 100% (2925/2925), 6.17 MiB | 10.80 MiB/s, done.
>   Resolving deltas: 100% (230/230), done.
>
>   $ cd git
>   $ git fetch --unshallow
>   remote: Counting objects: 185430, done.
>   remote: Compressing objects: 100% (46933/46933), done.
>   remote: Total 185430 (delta 140505), reused 181589 (delta 136694), pack-reused 0
>   Receiving objects: 100% (185430/185430), 52.80 MiB | 10.84 MiB/s, done.
>   Resolving deltas: 100% (140505/140505), completed with 1784 local objects.
>   remote: Counting objects: 579, done.
>   remote: Compressing objects: 100% (579/579), done.
>   remote: Total 579 (delta 0), reused 579 (delta 0), pack-reused 0
>   Receiving objects: 100% (579/579), 266.85 KiB | 0 bytes/s, done.
>   [... fetch output ...]
>
> That looks like two packs being received for the --unshallow case.
>
> -Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:38     ` Jason Paller-Rzepka
@ 2015-12-04 21:50       ` Stefan Beller
  2015-12-04 21:51       ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-04 21:50 UTC (permalink / raw)
  To: Jason Paller-Rzepka; +Cc: Jeff King, git

I can reproduce it now. Instead of using my $random version, I just
needed origin/master
to reproduce.

The second fetch is invoked via
(as outputted via GIT_TRACE=1 git -C git fetch --depth=8)

13:44:56.863841 run-command.c:343       trace: run_command:
'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin'
'https://github.com/git/git/'

so it seems like there is no explicit depth given, so I think the 0
comes from the initialization step and nobody touched it to fill with
meaningful values.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:38     ` Jason Paller-Rzepka
  2015-12-04 21:50       ` Stefan Beller
@ 2015-12-04 21:51       ` Jeff King
  2015-12-04 22:45         ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-12-04 21:51 UTC (permalink / raw)
  To: Jason Paller-Rzepka
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, git

On Fri, Dec 04, 2015 at 04:38:16PM -0500, Jason Paller-Rzepka wrote:

> It appears that it happens when the shallow history grows to include a
> commit that's pointed to by a previously unseen tag.  For example,
> when I deepen a checkout of git to depth 8, I hit v2.5.2, and a second
> fetch takes place.

Yeah. The code is in builtin/fetch.c:backfill_tags.

> But why would fetching a tag (or set of tags) merit a depth of zero?
> Doesn't depth 1 mean "give me the the objects, and none of their
> descendants"?  Why use 0?

That comes from this line:

  transport_set_option(transport, TRANS_OPT_DEPTH, "0");

That line blame back to b888d61 (Make fetch a builtin, 2007-09-10),
which isn't incredibly helpful.

I think that comes from the original git-fetch.sh, which had:

  ?*)
          # do not deepen a shallow tree when following tags
          shallow_depth=

Which makes sense. I think the code at that point is not aware that we
just "unshallowed" and can therefore drop the depth parameter
altogether. But I admit I am not all that familiar with the shallow
code.

+cc Duy, who can probably say something way more intelligent about this
off the top of his head. :)

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:27   ` Jeff King
  2015-12-04 21:36     ` Stefan Beller
  2015-12-04 21:38     ` Jason Paller-Rzepka
@ 2015-12-04 21:57     ` Junio C Hamano
  2015-12-04 22:10       ` Jeff King
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-12-04 21:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Jason Paller-Rzepka, git

Jeff King <peff@peff.net> writes:

> This seems to reproduce consistently for me:
>
>   $ git clone --depth=1 git://github.com/git/git
>   Cloning into 'git'...
>   remote: Counting objects: 2925, done.
>   remote: Compressing objects: 100% (2602/2602), done.
>   remote: Total 2925 (delta 230), reused 2329 (delta 206), pack-reused 0
>   Receiving objects: 100% (2925/2925), 6.17 MiB | 10.80 MiB/s, done.
>   Resolving deltas: 100% (230/230), done.
>
>   $ cd git
>   $ git fetch --unshallow
>   remote: Counting objects: 185430, done.
>   remote: Compressing objects: 100% (46933/46933), done.
>   remote: Total 185430 (delta 140505), reused 181589 (delta 136694), pack-reused 0
>   Receiving objects: 100% (185430/185430), 52.80 MiB | 10.84 MiB/s, done.
>   Resolving deltas: 100% (140505/140505), completed with 1784 local objects.
>   remote: Counting objects: 579, done.
>   remote: Compressing objects: 100% (579/579), done.
>   remote: Total 579 (delta 0), reused 579 (delta 0), pack-reused 0
>   Receiving objects: 100% (579/579), 266.85 KiB | 0 bytes/s, done.
>   [... fetch output ...]
>
> That looks like two packs being received for the --unshallow case.

What is puzzling is that I do not seem to see this "two fetches"
with the local transport.  I only see "deepen 2147483647" in the
protocol log.

Moreover, the only interesting lines in the output from

    $ git grep -B1 'deepen ' \*.[ch]

are

fetch-pack.c-   if (args->depth > 0)
fetch-pack.c:           packet_buf_write(&req_buf, "deepen %d", args->depth);

so I do not see how anybody would be sending "deepen 0" as Jason
saw.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:57     ` Junio C Hamano
@ 2015-12-04 22:10       ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-12-04 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jason Paller-Rzepka, git

On Fri, Dec 04, 2015 at 01:57:30PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This seems to reproduce consistently for me:
> >
> >   $ git clone --depth=1 git://github.com/git/git
> >   Cloning into 'git'...
> >   remote: Counting objects: 2925, done.
> >   remote: Compressing objects: 100% (2602/2602), done.
> >   remote: Total 2925 (delta 230), reused 2329 (delta 206), pack-reused 0
> >   Receiving objects: 100% (2925/2925), 6.17 MiB | 10.80 MiB/s, done.
> >   Resolving deltas: 100% (230/230), done.
> >
> >   $ cd git
> >   $ git fetch --unshallow
> >   remote: Counting objects: 185430, done.
> >   remote: Compressing objects: 100% (46933/46933), done.
> >   remote: Total 185430 (delta 140505), reused 181589 (delta 136694), pack-reused 0
> >   Receiving objects: 100% (185430/185430), 52.80 MiB | 10.84 MiB/s, done.
> >   Resolving deltas: 100% (140505/140505), completed with 1784 local objects.
> >   remote: Counting objects: 579, done.
> >   remote: Compressing objects: 100% (579/579), done.
> >   remote: Total 579 (delta 0), reused 579 (delta 0), pack-reused 0
> >   Receiving objects: 100% (579/579), 266.85 KiB | 0 bytes/s, done.
> >   [... fetch output ...]
> >
> > That looks like two packs being received for the --unshallow case.
> 
> What is puzzling is that I do not seem to see this "two fetches"
> with the local transport.  I only see "deepen 2147483647" in the
> protocol log.

Yeah, I do not ever see "deepen 0" from GIT_TRACE_PACKET output. FWIW,
here's the output I am using to reproduce this locally:

  # do this once
  git clone --bare git://github.com/git/git src.git

  # do this for each test run
  rm -rf repo
  git clone --no-local --depth=1 src.git repo
  cd repo
  echo "==> unshallow" &&
  git fetch --progress --unshallow 2>&1 | grep remote

And you can see that there are two separate sections (and I traced this
to backfill_tags() with gdb).

Note that the issue goes away if the shallow clone is done with "--bare"
(I guess because we pick up tags differently in the initial clone?).

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 21:51       ` Jeff King
@ 2015-12-04 22:45         ` Junio C Hamano
  2015-12-05  5:33           ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-12-04 22:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Jason Paller-Rzepka, Nguyễn Thái Ngọc Duy,
	Stefan Beller, git

Jeff King <peff@peff.net> writes:

>> But why would fetching a tag (or set of tags) merit a depth of zero?
>> Doesn't depth 1 mean "give me the the objects, and none of their
>> descendants"?  Why use 0?
>
> That comes from this line:
>
>   transport_set_option(transport, TRANS_OPT_DEPTH, "0");
>
> That line blame back to b888d61 (Make fetch a builtin, 2007-09-10),
> which isn't incredibly helpful.

Hmm, "0" means "no depth limitations", which is exactly what we want
in this "unshallow" case, I would think.  The behaviour observed is
just like a regular fetch that auto-follows tags, where it has to
make a second fetch if the primary fetch fails to include everything
that is needed for propagating the tag for whatever reason.

Having said that, IIRC, these days a depth limited clone is created
implicitly with --single-branch option, and I am not sure what the
right behaviour for the auto-following of tags in such a repository.

> I think that comes from the original git-fetch.sh, which had:
>
>   ?*)
>           # do not deepen a shallow tree when following tags
>           shallow_depth=
>
> Which makes sense. I think the code at that point is not aware that we
> just "unshallowed" and can therefore drop the depth parameter
> altogether. But I admit I am not all that familiar with the shallow
> code.
>
> +cc Duy, who can probably say something way more intelligent about this
> off the top of his head. :)
>
> -Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-04 22:45         ` Junio C Hamano
@ 2015-12-05  5:33           ` Duy Nguyen
  2015-12-06  4:00             ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2015-12-05  5:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Jason Paller-Rzepka, Stefan Beller, git

On Fri, Dec 4, 2015 at 11:45 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>>> But why would fetching a tag (or set of tags) merit a depth of zero?
>>> Doesn't depth 1 mean "give me the the objects, and none of their
>>> descendants"?  Why use 0?
>>
>> That comes from this line:
>>
>>   transport_set_option(transport, TRANS_OPT_DEPTH, "0");
>>
>> That line blame back to b888d61 (Make fetch a builtin, 2007-09-10),
>> which isn't incredibly helpful.
>
> Hmm, "0" means "no depth limitations", which is exactly what we want
> in this "unshallow" case, I would think.  The behaviour observed is

No depth 0 means "do not change depth", which is why Jeff saw no
'deepen' lines (and those lines should be rejected any way). It's
equivalent of doing "git fetch" without --depth.

> just like a regular fetch that auto-follows tags, where it has to
> make a second fetch if the primary fetch fails to include everything
> that is needed for propagating the tag for whatever reason.
>
> Having said that, IIRC, these days a depth limited clone is created
> implicitly with --single-branch option, and I am not sure what the
> right behaviour for the auto-following of tags in such a repository.

I suppose followtags feature has been around long enough that we can
simply trust that and skip the second fetch? But it's not that easy
for subsequent fetches after the initial fetch in git-clone, because
we no longer know if --single-branch was used (of if there is any new
branch fetched since).
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-05  5:33           ` Duy Nguyen
@ 2015-12-06  4:00             ` Junio C Hamano
  2015-12-06  6:37               ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-12-06  4:00 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Jason Paller-Rzepka, Stefan Beller, git

Duy Nguyen <pclouds@gmail.com> writes:

> On Fri, Dec 4, 2015 at 11:45 PM, Junio C Hamano <gitster@pobox.com> wrote:
> ...
>> just like a regular fetch that auto-follows tags, where it has to
>> make a second fetch if the primary fetch fails to include everything
>> that is needed for propagating the tag for whatever reason.
>>
>> Having said that, IIRC, these days a depth limited clone is created
>> implicitly with --single-branch option, and I am not sure what the
>> right behaviour for the auto-following of tags in such a repository.
>
> I suppose followtags feature has been around long enough that we can
> simply trust that and skip the second fetch?

Hmmmm, I wonder why the code needs the backfill fetch while talking
to a server that has the include-tag capability, then?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-06  4:00             ` Junio C Hamano
@ 2015-12-06  6:37               ` Jeff King
  2015-12-06  7:01                 ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-12-06  6:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Jason Paller-Rzepka, Stefan Beller, git

On Sat, Dec 05, 2015 at 08:00:28PM -0800, Junio C Hamano wrote:

> > I suppose followtags feature has been around long enough that we can
> > simply trust that and skip the second fetch?
> 
> Hmmmm, I wonder why the code needs the backfill fetch while talking
> to a server that has the include-tag capability, then?

The logic in find_non_local_tags makes no sense to me.

After we do the first fetch, we call this function to find out if there
are any tags we need to pick up. We iterate through the list of refs
given to us by the remote (including peeled lines), and do:

        if (ends_with(ref->name, "^{}")) {
                if (item && !has_sha1_file(ref->old_sha1) &&
                    !will_fetch(head, ref->old_sha1) &&
                    !has_sha1_file(item->util) &&
                    !will_fetch(head, item->util))
                        item->util = NULL;
                ...
        }

where "ref" is the peeled line (i.e., the pointed-to commit), and "item"
is the preceding line (i.e., the tag itself) with util set to its sha1.
Any such item whose util survives as non-NULL is fetched in the backfill
step.

So I'd think the logic for backfilling a given tag should be:

  1. We have the peeled object, and...

  2. We don't currently have the tag pointing to it, and...

  3. We are not already going to fetch it.

You could write that as:

  if (has_sha1_file(ref->old_sha1) &&
      !has_sha1_file(item->util) &&
      !will_fetch(head, item->util))

Of course the conditional in the code is "should we skip the backfill",
the negation. So using De Morgan's laws, we'd write:

  if (!has_sha1_file(ref->old_sha1) ||
      has_sha1_file(item->util) ||
      will_fetch(head, item->util))

Which is quite a bit different than what is there. I'm not sure at all
what the "will_fetch(head, ref->old_sha1)" is doing. In fact, for the
backfill step, it looks like we feed an empty "head" list, so the
!will_fetch() is always true.

And indeed, replacing the logic with what I wrote does make the backfill
go away in my test case. But it's so far from what is there that I feel
like I must be missing something.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-06  6:37               ` Jeff King
@ 2015-12-06  7:01                 ` Jeff King
  2015-12-06 10:46                   ` Duy Nguyen
  2015-12-07 21:27                   ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2015-12-06  7:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Jason Paller-Rzepka, Stefan Beller, git

On Sun, Dec 06, 2015 at 01:37:18AM -0500, Jeff King wrote:

> And indeed, replacing the logic with what I wrote does make the backfill
> go away in my test case. But it's so far from what is there that I feel
> like I must be missing something.

I think one thing I was missing is that we need to just grab the
_object_, but we need to realize that the ref needs updating[1]. So we
cannot skip backfill of any tag that we do not already have, even if we
already have the tag object.

Which made me wonder why this:

  git init parent &&
  git -C parent commit --allow-empty -m one &&
  git clone parent child &&
  git -C parent commit --allow-empty -m two &&
  git -C parent tag -m mytag foo &&
  git -C parent commit --allow-empty -m three &&
  git -C child fetch

does not appear to need to backfill to pick up refs/tags/foo. But it
does. It's just that it hits the quickfetch() code path and does not
have to ask the other side for a pack. And that explains why it does hit
in the --shallow case: we explicitly disable quickfetch in such cases.

For the unshallow case, of course we could use it (but only for the
second, backfill fetch). Something like this seems to work for me:

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ed84963..b33b90f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -881,6 +881,8 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
 
 	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
+	if (unshallow)
+		depth = NULL;
 	fetch_refs(transport, ref_map);
 
 	if (gsecondary) {

But I admit I am not at all confident that it doesn't cause other
problems, or that it covers all cases. Even in a shallow repo, we should
be able to quickfetch individual tags, shouldn't we? I wonder if we
could just always set "depth = NULL" here.

-Peff

[1] I'm still puzzled why find_non_local_tags uses has_sha1_file() on
    the tag object at all, then.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-06  7:01                 ` Jeff King
@ 2015-12-06 10:46                   ` Duy Nguyen
  2015-12-07 19:57                     ` Jason Paller-Rzepka
       [not found]                     ` <CACs8u9RzUVWw2Ld1K7JeO7Eci114JEiML8bbGy96m4pZZk=FnA@mail.gmail.com>
  2015-12-07 21:27                   ` Junio C Hamano
  1 sibling, 2 replies; 19+ messages in thread
From: Duy Nguyen @ 2015-12-06 10:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jason Paller-Rzepka, Stefan Beller, git

On Sun, Dec 6, 2015 at 8:01 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Dec 06, 2015 at 01:37:18AM -0500, Jeff King wrote:
>
>> And indeed, replacing the logic with what I wrote does make the backfill
>> go away in my test case. But it's so far from what is there that I feel
>> like I must be missing something.
>
> I think one thing I was missing is that we need to just grab the
> _object_, but we need to realize that the ref needs updating[1]. So we
> cannot skip backfill of any tag that we do not already have, even if we
> already have the tag object.

It's probably worth adding a few comment lines about this. I did
search back commit history but did not get this.

> Which made me wonder why this:
>
>   git init parent &&
>   git -C parent commit --allow-empty -m one &&
>   git clone parent child &&
>   git -C parent commit --allow-empty -m two &&
>   git -C parent tag -m mytag foo &&
>   git -C parent commit --allow-empty -m three &&
>   git -C child fetch
>
> does not appear to need to backfill to pick up refs/tags/foo. But it
> does. It's just that it hits the quickfetch() code path and does not
> have to ask the other side for a pack. And that explains why it does hit
> in the --shallow case: we explicitly disable quickfetch in such cases.
>
> For the unshallow case, of course we could use it (but only for the
> second, backfill fetch). Something like this seems to work for me:
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index ed84963..b33b90f 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -881,6 +881,8 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
>
>         transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
>         transport_set_option(transport, TRANS_OPT_DEPTH, "0");
> +       if (unshallow)
> +               depth = NULL;
>         fetch_refs(transport, ref_map);
>
>         if (gsecondary) {
>
> But I admit I am not at all confident that it doesn't cause other
> problems, or that it covers all cases. Even in a shallow repo, we should
> be able to quickfetch individual tags, shouldn't we?

Yes. depth is only non-NULL when you pass --depth (or --unshallow).
quickfetch should happen when you fetch without those options.

> I wonder if we could just always set "depth = NULL" here.

--unshallow is essentially --depth=<max>, so I don't see why
--unshallow should be singled out here. We probably want to restore
depth back (or pass a flag to explicitly ignore the "depth" exception
in quickfetch). For multiple fetches, we spawn new commands so depth
being NULL does not harm. Just in case somebody tries to fetch a
couple more times in the same process in future.
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-06 10:46                   ` Duy Nguyen
@ 2015-12-07 19:57                     ` Jason Paller-Rzepka
       [not found]                     ` <CACs8u9RzUVWw2Ld1K7JeO7Eci114JEiML8bbGy96m4pZZk=FnA@mail.gmail.com>
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Paller-Rzepka @ 2015-12-07 19:57 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Junio C Hamano, Stefan Beller, git

I should point out that I never saw a "deepen 0" line.  Rather, I saw
git send "option depth 0" to the remote helper.
Duy, you mentioned that depth=0 means "do not change depth".  I assume
that means the server should use exactly the shallows that the client
sent, and it does not need to traverse the tree or modify the shallow
or unshallow sets at all.  Right?
Duy, you also mentioned that "those lines should be rejected any way".
You just mean that a "deepen 0" line should be rejected, right? And
that's because the right way to tell git-upload-pack not to change the
depth is to omit the "deepen" line after the "shallow" lines, so
there's never a need to send "deepen 0"?

Thanks!

On Sun, Dec 6, 2015 at 5:46 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Sun, Dec 6, 2015 at 8:01 AM, Jeff King <peff@peff.net> wrote:
>> On Sun, Dec 06, 2015 at 01:37:18AM -0500, Jeff King wrote:
>>
>>> And indeed, replacing the logic with what I wrote does make the backfill
>>> go away in my test case. But it's so far from what is there that I feel
>>> like I must be missing something.
>>
>> I think one thing I was missing is that we need to just grab the
>> _object_, but we need to realize that the ref needs updating[1]. So we
>> cannot skip backfill of any tag that we do not already have, even if we
>> already have the tag object.
>
> It's probably worth adding a few comment lines about this. I did
> search back commit history but did not get this.
>
>> Which made me wonder why this:
>>
>>   git init parent &&
>>   git -C parent commit --allow-empty -m one &&
>>   git clone parent child &&
>>   git -C parent commit --allow-empty -m two &&
>>   git -C parent tag -m mytag foo &&
>>   git -C parent commit --allow-empty -m three &&
>>   git -C child fetch
>>
>> does not appear to need to backfill to pick up refs/tags/foo. But it
>> does. It's just that it hits the quickfetch() code path and does not
>> have to ask the other side for a pack. And that explains why it does hit
>> in the --shallow case: we explicitly disable quickfetch in such cases.
>>
>> For the unshallow case, of course we could use it (but only for the
>> second, backfill fetch). Something like this seems to work for me:
>>
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index ed84963..b33b90f 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -881,6 +881,8 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
>>
>>         transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
>>         transport_set_option(transport, TRANS_OPT_DEPTH, "0");
>> +       if (unshallow)
>> +               depth = NULL;
>>         fetch_refs(transport, ref_map);
>>
>>         if (gsecondary) {
>>
>> But I admit I am not at all confident that it doesn't cause other
>> problems, or that it covers all cases. Even in a shallow repo, we should
>> be able to quickfetch individual tags, shouldn't we?
>
> Yes. depth is only non-NULL when you pass --depth (or --unshallow).
> quickfetch should happen when you fetch without those options.
>
>> I wonder if we could just always set "depth = NULL" here.
>
> --unshallow is essentially --depth=<max>, so I don't see why
> --unshallow should be singled out here. We probably want to restore
> depth back (or pass a flag to explicitly ignore the "depth" exception
> in quickfetch). For multiple fetches, we spawn new commands so depth
> being NULL does not harm. Just in case somebody tries to fetch a
> couple more times in the same process in future.
> --
> Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
       [not found]                     ` <CACs8u9RzUVWw2Ld1K7JeO7Eci114JEiML8bbGy96m4pZZk=FnA@mail.gmail.com>
@ 2015-12-07 21:21                       ` Duy Nguyen
  0 siblings, 0 replies; 19+ messages in thread
From: Duy Nguyen @ 2015-12-07 21:21 UTC (permalink / raw)
  To: Jason Paller-Rzepka; +Cc: Jeff King, Junio C Hamano, Stefan Beller, git

On Mon, Dec 7, 2015 at 8:57 PM, Jason Paller-Rzepka <jasonpr@google.com> wrote:
> Duy, you mentioned that depth=0 means "do not change depth".  I assume that
> means the server should use exactly the shallows that the client sent, and
> it does not need to traverse the tree or modify the shallow or unshallow
> sets at all.  Right?

Correct. The server might send new shallow lines anyway though, if the
server repo is also shallow and the new fetched ref needs to be cut.
But I don't know if Dulwich supports that yet.

> Duy, you also mentioned that "those lines should be rejected any way".  You
> just mean that a "deepen 0" line should be rejected, right? And that's
> because the right way to tell git-upload-pack not to change the depth is to
> omit the "deepen" line after the "shallow" lines, so there's never a need to
> send "deepen 0"?

Also correct. I didn't check the code when I wrote that. But I have
checked and upload-pack does reject "deepen 0"

if (starts_with(line, "deepen ")) {
   char *end;
   depth = strtol(line + 7, &end, 0);
   if (end == line + 7 || depth <= 0)
      die("Invalid deepen: %s", line);
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-06  7:01                 ` Jeff King
  2015-12-06 10:46                   ` Duy Nguyen
@ 2015-12-07 21:27                   ` Junio C Hamano
  2015-12-07 21:42                     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-12-07 21:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Duy Nguyen, Jason Paller-Rzepka, Stefan Beller, git, Dennis Kaarsemaker

Jeff King <peff@peff.net> writes:

> I think one thing I was missing is that we need to just grab the
> _object_, but we need to realize that the ref needs updating[1]. So we
> cannot skip backfill of any tag that we do not already have, even if we
> already have the tag object.
> ...
> [1] I'm still puzzled why find_non_local_tags uses has_sha1_file() on
>     the tag object at all, then.

The designed semantics of auto-following tags (not necessarily as
implemented or documented, i.e. there may be implementation or
documentation bugs), I think, is to arrive at the same state as
doing a fetch (or a push) without the auto-following and then doing
a separate fetch (or a push) of tags that point at the objects that
are reachable from the tips of refs after finishing the first
(i.e. without auto-follow) fetch (or a push).  In a scenario where
we already have a commit reachable from existing remote-tracking
branch and the current transfer (be it a fetch or a push, with or
without auto-follow) does not update any remote-tracking branch
(because the source side did not have any changes), if the source
side added a tag that refers to that commit that the receiving end
lacks, that tag needs to be transferred and then stored.

So has_sha1_file() is not the right test---if anything, it needs to
be checking if the object being checked is reachable from a tip of
some ref.

But of course, that test is rather expensive, so perhaps the
implementation cheated and uses has_sha1_file() instead?  The only
case it would misidentify would be after an aborted fetch (or push)
left unconnected island of objects and some of these objects that
are not reachable are pointed at by tags the receiving end does not
have.





 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple fetches when unshallowing a shallow clone
  2015-12-07 21:27                   ` Junio C Hamano
@ 2015-12-07 21:42                     ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-12-07 21:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jason Paller-Rzepka, Stefan Beller, git, Dennis Kaarsemaker

On Mon, Dec 07, 2015 at 01:27:51PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think one thing I was missing is that we need to just grab the
> > _object_, but we need to realize that the ref needs updating[1]. So we
> > cannot skip backfill of any tag that we do not already have, even if we
> > already have the tag object.
> > ...
> > [1] I'm still puzzled why find_non_local_tags uses has_sha1_file() on
> >     the tag object at all, then.
> 
> The designed semantics of auto-following tags (not necessarily as
> implemented or documented, i.e. there may be implementation or
> documentation bugs), I think, is to arrive at the same state as
> doing a fetch (or a push) without the auto-following and then doing
> a separate fetch (or a push) of tags that point at the objects that
> are reachable from the tips of refs after finishing the first
> (i.e. without auto-follow) fetch (or a push).  In a scenario where
> we already have a commit reachable from existing remote-tracking
> branch and the current transfer (be it a fetch or a push, with or
> without auto-follow) does not update any remote-tracking branch
> (because the source side did not have any changes), if the source
> side added a tag that refers to that commit that the receiving end
> lacks, that tag needs to be transferred and then stored.
> 
> So has_sha1_file() is not the right test---if anything, it needs to
> be checking if the object being checked is reachable from a tip of
> some ref.
> 
> But of course, that test is rather expensive, so perhaps the
> implementation cheated and uses has_sha1_file() instead?  The only
> case it would misidentify would be after an aborted fetch (or push)
> left unconnected island of objects and some of these objects that
> are not reachable are pointed at by tags the receiving end does not
> have.

I may have confused myself. There are actually two has_sha1_file() calls
in find_non_local_tags.

I agree it is the only sensible test for "do we have the commit this tag
peels to, and if so, we want to grab the tag".  Reachability is too
expensive to compute.

But for the other one ("do we have the tag object itself"), I initially
claimed "if we have the tag object already, we do not have to do the
backfill fetch". Which is not quite true. We have to update the ref even
if we have the tag object. But then, what if we have the tag object for
other reasons (e.g., because another tag points at it?).

E.g. in this sequence:

  git -C parent commit --allow-empty -m base
  git -C parent tag -m mytag foo
  git clone parent child
  git -C parent update-ref refs/tags/bar foo
  git -C child fetch

we must backfill refs/tags/bar during the fetch, even though we already
have the object. I don't see any point in checking has_sha1_file() for
the tagged object at all. If we don't have it, we obviously must fetch.
And if we do have it, we must fetch the ref, even if that results in no
objects transferred.

It's entirely possible I'm just confused, and AFAICT nobody has noticed
any breakage here, so please don't feel you need to spend a lot of time
humoring me. I'm just writing up my confusion for posterity. :)

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-12-07 21:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-30 19:35 Multiple fetches when unshallowing a shallow clone Jason Paller-Rzepka
2015-12-04 20:46 ` Stefan Beller
2015-12-04 21:27   ` Jeff King
2015-12-04 21:36     ` Stefan Beller
2015-12-04 21:38     ` Jason Paller-Rzepka
2015-12-04 21:50       ` Stefan Beller
2015-12-04 21:51       ` Jeff King
2015-12-04 22:45         ` Junio C Hamano
2015-12-05  5:33           ` Duy Nguyen
2015-12-06  4:00             ` Junio C Hamano
2015-12-06  6:37               ` Jeff King
2015-12-06  7:01                 ` Jeff King
2015-12-06 10:46                   ` Duy Nguyen
2015-12-07 19:57                     ` Jason Paller-Rzepka
     [not found]                     ` <CACs8u9RzUVWw2Ld1K7JeO7Eci114JEiML8bbGy96m4pZZk=FnA@mail.gmail.com>
2015-12-07 21:21                       ` Duy Nguyen
2015-12-07 21:27                   ` Junio C Hamano
2015-12-07 21:42                     ` Jeff King
2015-12-04 21:57     ` Junio C Hamano
2015-12-04 22:10       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.