All of lore.kernel.org
 help / color / mirror / Atom feed
* git fetch --depth=* broken?
@ 2009-08-22  5:52 Nicolas Pitre
  2009-08-24  4:04 ` [PATCH] fix simple deepening of a repo Nicolas Pitre
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolas Pitre @ 2009-08-22  5:52 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin; +Cc: git


try out:

	git clone --depth=1 git://git.kernel.org/pub/scm/git/git
	cd git
	git fetch --depth=2

It then silently fails, except for the return code of 1.

With -v this is the same result.  Only if I remove --depth= do I get:

>From git://git.kernel.org/pub/scm/git/git
 = [up to date]      html       -> origin/html
 = [up to date]      maint      -> origin/maint
 = [up to date]      man        -> origin/man
 = [up to date]      master     -> origin/master
 = [up to date]      next       -> origin/next
 = [up to date]      pu         -> origin/pu
 = [up to date]      todo       -> origin/todo

and a return code of 0.

It seems that commit c6bc400585 is partly responsible for that 
misbehavior.  At least reverting it makes the status list appear again 
even with the presence of --depth=.

But still, actual result isn't any better.  Using --depth=2 or 
--depth=1000 doesn't change anything, unless there is _also_ a ref that 
was updated on the remote end.  Looks like the code is happy to conclude 
that there is nothing to do if local refs match remote refs and never go 
to talk further to the remote end ("no "shallow ..." nor "deepen ..." 
are sent over the wire) despite the fact that --depth=1000 would 
certainly have to trigger a pack transfer.

I'm also surprised that such thing as simple deepening of a repo is not 
in the test suite.  We certainly document this operation in the 
git-fetch man page though.

The code in builtin-fetch-pack.c still looks rather confusing to me, so 
hopefully you are more familiar with it and can provide a fix.


Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] fix simple deepening of a repo
  2009-08-22  5:52 git fetch --depth=* broken? Nicolas Pitre
@ 2009-08-24  4:04 ` Nicolas Pitre
  2009-08-24  4:49   ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolas Pitre @ 2009-08-24  4:04 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin; +Cc: git

If all refs sent by the remote repo during a fetch are reachable 
locally, then no further conversation is performed with the remote. This 
check is skipped when the --depth argument is provided to allow the 
deepening of a shallow clone which corresponding remote repo has no 
changed.

However, some additional filtering was added in commit c29727d5 to 
remove those refs which are equal on both sides.  If the remote repo has 
not changed, then the list of refs to give the remote process becomes 
empty and simply attempting to deepen a shallow repo always fails.

Let's stop being smart in that case and simply send the whole list over
when that condition is met.  The remote will do the right thing anyways.

Test cases for this issue are also provided.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---

On Sat, 22 Aug 2009, Nicolas Pitre wrote:

> try out:
> 
> 	git clone --depth=1 git://git.kernel.org/pub/scm/git/git
> 	cd git
> 	git fetch --depth=2
> 
> It then silently fails, except for the return code of 1.

Here's the fix.  Problem wasn't in builtin-fetch-pack.c after all.

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index a8c2ca2..18376d6 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -139,6 +139,36 @@ test_expect_success 'fsck in shallow repo' '
 	)
 '
 
+test_expect_success 'simple fetch in shallow repo' '
+	(
+		cd shallow &&
+		git fetch
+	)
+'
+
+test_expect_success 'no changes expected' '
+	(
+		cd shallow &&
+		git count-objects -v
+	) > count.shallow.2 &&
+	cmp count.shallow count.shallow.2
+'
+
+test_expect_success 'fetch same depth in shallow repo' '
+	(
+		cd shallow &&
+		git fetch --depth=2
+	)
+'
+
+test_expect_success 'no changes expected' '
+	(
+		cd shallow &&
+		git count-objects -v
+	) > count.shallow.3 &&
+	cmp count.shallow count.shallow.3
+'
+
 test_expect_success 'add two more' '
 	add B66 $B65 &&
 	add B67 $B66
@@ -201,4 +231,21 @@ test_expect_success 'pull in shallow repo with missing merge base' '
 	)
 '
 
+test_expect_success 'additional simple shallow deepenings' '
+	(
+		cd shallow &&
+		git fetch --depth=8 &&
+		git fetch --depth=10 &&
+		git fetch --depth=11
+	)
+'
+
+test_expect_success 'clone shallow object count' '
+	(
+		cd shallow &&
+		git count-objects -v
+	) > count.shallow &&
+	grep "^count: 52" count.shallow
+'
+
 test_done
diff --git a/transport.c b/transport.c
index a8df319..ce91387 100644
--- a/transport.c
+++ b/transport.c
@@ -925,11 +925,12 @@ const struct ref *transport_get_remote_refs(struct transport *transport)
 int transport_fetch_refs(struct transport *transport, const struct ref *refs)
 {
 	int rc;
-	int nr_heads = 0, nr_alloc = 0;
+	int nr_heads = 0, nr_alloc = 0, nr_refs = 0;
 	const struct ref **heads = NULL;
 	const struct ref *rm;
 
 	for (rm = refs; rm; rm = rm->next) {
+		nr_refs++;
 		if (rm->peer_ref &&
 		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
 			continue;
@@ -937,6 +938,19 @@ int transport_fetch_refs(struct transport *transport, const struct ref *refs)
 		heads[nr_heads++] = rm;
 	}
 
+	if (!nr_heads) {
+		/*
+		 * When deepening of a shallow repository is requested,
+		 * then local and remote refs are likely to still be equal.
+		 * Just feed them all to the fetch method in that case.
+		 * This condition shouldn't be met in a non-deepening fetch
+		 * (see builtin-fetch.c:quickfetch()).
+		 */
+		heads = xmalloc(nr_refs * sizeof(*heads));
+		for (rm = refs; rm; rm = rm->next)
+			heads[nr_heads++] = rm;
+	}
+
 	rc = transport->fetch(transport, nr_heads, heads);
 	free(heads);
 	return rc;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24  4:04 ` [PATCH] fix simple deepening of a repo Nicolas Pitre
@ 2009-08-24  4:49   ` Junio C Hamano
  2009-08-24 13:55     ` Nicolas Pitre
  2009-08-24 16:26     ` Daniel Barkalow
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2009-08-24  4:49 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Daniel Barkalow, Johannes Schindelin, git

Nicolas Pitre <nico@cam.org> writes:

> If all refs sent by the remote repo during a fetch are reachable 
> locally, then no further conversation is performed with the remote. This 
> check is skipped when the --depth argument is provided to allow the 
> deepening of a shallow clone which corresponding remote repo has no 
> changed.
>
> However, some additional filtering was added in commit c29727d5 to 
> remove those refs which are equal on both sides.  If the remote repo has 
> not changed, then the list of refs to give the remote process becomes 
> empty and simply attempting to deepen a shallow repo always fails.
>
> Let's stop being smart in that case and simply send the whole list over
> when that condition is met.  The remote will do the right thing anyways.
>
> Test cases for this issue are also provided.
>
> Signed-off-by: Nicolas Pitre <nico@cam.org>
> ---

Thanks.  The fix looks correct (as usual with patches from you).

But it makes me wonder if this logic to filter refs is buying us anything.

>  	for (rm = refs; rm; rm = rm->next) {
> +		nr_refs++;
>  		if (rm->peer_ref &&
>  		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
>  			continue;
		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
		heads[nr_heads++] = rm;
	}

What is the point of not asking for the refs that we know are the same?

In other words, what breaks (not necessarily in the correctness sense, but
also in the performance sense) if we get rid of this filtering altogether?

Over a native protocol, we will tell the other end what we have and the
remote end will notice that we are asking for the same thing, so it won't
include unnecessary objects in the pack anyway.  When calling out to a
walker, we will also notice that the ref matches what we have already and
will not fetch anything from the remote.  Because the commit at the tip of
the ref that is already up-to-date is marked as COMPLETE, we will not even
locally have to walk the object chain to make sure things are connected.

I think the only thing that this possibly gains is if _everything_ is up
to date.  Then we won't need to make a call to transport->fetch() and it
would save a new connection.

But that optimization is already done long ago by the caller's
quickfetch() in the normal case.

Which makes me suspect that the real fix should be something a lot
simpler, like this (untested) patch.

diff --git a/transport.c b/transport.c
index f231b35..14dab81 100644
--- a/transport.c
+++ b/transport.c
@@ -1053,18 +1053,16 @@ const struct ref *transport_get_remote_refs(struct transport *transport)
 int transport_fetch_refs(struct transport *transport, const struct ref *refs)
 {
 	int rc;
-	int nr_heads = 0, nr_alloc = 0;
+	int nr_heads = 0;
 	const struct ref **heads = NULL;
 	const struct ref *rm;
 
-	for (rm = refs; rm; rm = rm->next) {
-		if (rm->peer_ref &&
-		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
-			continue;
-		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
+	for (rm = refs; rm; rm = rm->next)
+		nr_heads++;
+	heads = xmalloc(nr_heads * sizeof(*heads));
+	nr_heads = 0;
+	for (rm = refs; rm; rm = rm->next)
 		heads[nr_heads++] = rm;
-	}
-
 	rc = transport->fetch(transport, nr_heads, heads);
 	free(heads);
 	return rc;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24  4:49   ` Junio C Hamano
@ 2009-08-24 13:55     ` Nicolas Pitre
  2009-08-24 14:20       ` Johan Herland
  2009-08-24 22:21       ` Junio C Hamano
  2009-08-24 16:26     ` Daniel Barkalow
  1 sibling, 2 replies; 22+ messages in thread
From: Nicolas Pitre @ 2009-08-24 13:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, Johannes Schindelin, git

On Sun, 23 Aug 2009, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > If all refs sent by the remote repo during a fetch are reachable 
> > locally, then no further conversation is performed with the remote. This 
> > check is skipped when the --depth argument is provided to allow the 
> > deepening of a shallow clone which corresponding remote repo has no 
> > changed.
> >
> > However, some additional filtering was added in commit c29727d5 to 
> > remove those refs which are equal on both sides.  If the remote repo has 
> > not changed, then the list of refs to give the remote process becomes 
> > empty and simply attempting to deepen a shallow repo always fails.
> >
> > Let's stop being smart in that case and simply send the whole list over
> > when that condition is met.  The remote will do the right thing anyways.
> >
> > Test cases for this issue are also provided.
> >
> > Signed-off-by: Nicolas Pitre <nico@cam.org>
> > ---
> 
> Thanks.  The fix looks correct (as usual with patches from you).
> 
> But it makes me wonder if this logic to filter refs is buying us anything.
> 
> >  	for (rm = refs; rm; rm = rm->next) {
> > +		nr_refs++;
> >  		if (rm->peer_ref &&
> >  		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
> >  			continue;
> 		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> 		heads[nr_heads++] = rm;
> 	}
> 
> What is the point of not asking for the refs that we know are the same?

I could see the advantage if the number of refs is really huge.  Wasn't 
some converted repositories producing a lot of branches and/or tags 
significantly slowing down a fetch operation?  Granted that was long ago 
when that issue got "fixed" so the details are fuzzy to me.

> In other words, what breaks (not necessarily in the correctness sense, but
> also in the performance sense) if we get rid of this filtering altogether?

If you really want to get rid of that filtering, I'd still do it in a 
separate patch.  That way if any issue appears because of that then 
bissection will point directly to that removal alone.


Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24 13:55     ` Nicolas Pitre
@ 2009-08-24 14:20       ` Johan Herland
  2009-08-24 22:21       ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Johan Herland @ 2009-08-24 14:20 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Junio C Hamano, Daniel Barkalow, Johannes Schindelin

On Monday 24 August 2009, Nicolas Pitre wrote:
> On Sun, 23 Aug 2009, Junio C Hamano wrote:
> > Nicolas Pitre <nico@cam.org> writes:
> > > If all refs sent by the remote repo during a fetch are reachable
> > > locally, then no further conversation is performed with the
> > > remote. This check is skipped when the --depth argument is
> > > provided to allow the deepening of a shallow clone which
> > > corresponding remote repo has no changed.
> > >
> > > However, some additional filtering was added in commit c29727d5
> > > to remove those refs which are equal on both sides.  If the
> > > remote repo has not changed, then the list of refs to give the
> > > remote process becomes empty and simply attempting to deepen a
> > > shallow repo always fails.
> > >
> > > Let's stop being smart in that case and simply send the whole
> > > list over when that condition is met.  The remote will do the
> > > right thing anyways.
> > >
> > > Test cases for this issue are also provided.
> > >
> > > Signed-off-by: Nicolas Pitre <nico@cam.org>
> > > ---
> >
> > Thanks.  The fix looks correct (as usual with patches from you).
> >
> > But it makes me wonder if this logic to filter refs is buying us
> > anything.
> >
> > >  	for (rm = refs; rm; rm = rm->next) {
> > > +		nr_refs++;
> > >  		if (rm->peer_ref &&
> > >  		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
> > >  			continue;
> >
> > 		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> > 		heads[nr_heads++] = rm;
> > 	}
> >
> > What is the point of not asking for the refs that we know are the
> > same?
>
> I could see the advantage if the number of refs is really huge. 
> Wasn't some converted repositories producing a lot of branches and/or
> tags significantly slowing down a fetch operation?  Granted that was
> long ago when that issue got "fixed" so the details are fuzzy to me.

I'm converting several CVS repos to Git with ~50 000 refs, so I'm happy 
with any change that can speed things up for repos with many refs.

Right now, my biggest gripe is that a 'git push --mirror' on such a repo 
can easily take ~10 min. even though the actual pack generation and 
transfer only takes a couple of seconds. It seems like it needs ~10 
minutes to generate the list of changed/added/deleted refs...
Unfortunately I haven't had time to look properly into it, yet...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24  4:49   ` Junio C Hamano
  2009-08-24 13:55     ` Nicolas Pitre
@ 2009-08-24 16:26     ` Daniel Barkalow
  2009-08-24 22:30       ` Julian Phillips
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Barkalow @ 2009-08-24 16:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Johannes Schindelin, git

On Sun, 23 Aug 2009, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > If all refs sent by the remote repo during a fetch are reachable 
> > locally, then no further conversation is performed with the remote. This 
> > check is skipped when the --depth argument is provided to allow the 
> > deepening of a shallow clone which corresponding remote repo has no 
> > changed.
> >
> > However, some additional filtering was added in commit c29727d5 to 
> > remove those refs which are equal on both sides.  If the remote repo has 
> > not changed, then the list of refs to give the remote process becomes 
> > empty and simply attempting to deepen a shallow repo always fails.
> >
> > Let's stop being smart in that case and simply send the whole list over
> > when that condition is met.  The remote will do the right thing anyways.
> >
> > Test cases for this issue are also provided.
> >
> > Signed-off-by: Nicolas Pitre <nico@cam.org>
> > ---
> 
> Thanks.  The fix looks correct (as usual with patches from you).
> 
> But it makes me wonder if this logic to filter refs is buying us anything.
> 
> >  	for (rm = refs; rm; rm = rm->next) {
> > +		nr_refs++;
> >  		if (rm->peer_ref &&
> >  		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
> >  			continue;
> 		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> 		heads[nr_heads++] = rm;
> 	}
> 
> What is the point of not asking for the refs that we know are the same?

This code is part of the original C implementation of fetch; I suspect the 
optimization was somehow in the shell version and made sense there, 
perhaps because there wasn't a quickfetch in the shell version or that 
there was some non-negligable per-ref cost in the code around there, since 
it was calling helper programs and such.

Anyway, I think it makes sense to remove the filtering from 
transport_fetch_refs(), like your patch does.

If it makes a difference for the actual protocol, fetch_refs_via_pack() 
could filter them at that stage.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24 13:55     ` Nicolas Pitre
  2009-08-24 14:20       ` Johan Herland
@ 2009-08-24 22:21       ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2009-08-24 22:21 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Daniel Barkalow, Johannes Schindelin, git

Nicolas Pitre <nico@cam.org> writes:

> If you really want to get rid of that filtering, I'd still do it in a 
> separate patch.  That way if any issue appears because of that then 
> bissection will point directly to that removal alone.

Fair enough.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24 16:26     ` Daniel Barkalow
@ 2009-08-24 22:30       ` Julian Phillips
  2009-08-25  0:18         ` Nicolas Pitre
  0 siblings, 1 reply; 22+ messages in thread
From: Julian Phillips @ 2009-08-24 22:30 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Nicolas Pitre, Johannes Schindelin, git

On Mon, 24 Aug 2009, Daniel Barkalow wrote:

> On Sun, 23 Aug 2009, Junio C Hamano wrote:
>
>> But it makes me wonder if this logic to filter refs is buying us anything.
>>
>>>  	for (rm = refs; rm; rm = rm->next) {
>>> +		nr_refs++;
>>>  		if (rm->peer_ref &&
>>>  		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
>>>  			continue;
>> 		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
>> 		heads[nr_heads++] = rm;
>> 	}
>>
>> What is the point of not asking for the refs that we know are the same?
>
> This code is part of the original C implementation of fetch; I suspect the
> optimization was somehow in the shell version and made sense there,
> perhaps because there wasn't a quickfetch in the shell version or that
> there was some non-negligable per-ref cost in the code around there, since
> it was calling helper programs and such.

I don't remember copying it from the shell version but my memory is 
terrible, so I could easily be wrong.  The relevant commit message was:

"git-fetch2: remove ref_maps that are not interesting

Once we have the full list of ref_maps, remove any where the local
and remote sha1s are the same - as we don't need to do anything for
them."

So that doesn't help.  I was very concerned about performance though 
(which was why I wanted fetch in C in the first place), so may have added 
it to speed up fetches that have only updated a few refs - and I assume 
that quickfetch was something that came along later after you absorbed the 
work into the transport series?

> Anyway, I think it makes sense to remove the filtering from
> transport_fetch_refs(), like your patch does.
>
> If it makes a difference for the actual protocol, fetch_refs_via_pack()
> could filter them at that stage.

I think it would certainly be worth investigating the performance aspects 
... no time tonight, but maybe tomorrow.

-- 
Julian

  ---
Some circumstantial evidence is very strong, as when you find a trout in
the milk.
 		-- Thoreau

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-24 22:30       ` Julian Phillips
@ 2009-08-25  0:18         ` Nicolas Pitre
  2009-08-25  2:12           ` Shawn O. Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolas Pitre @ 2009-08-25  0:18 UTC (permalink / raw)
  To: Julian Phillips; +Cc: Daniel Barkalow, Junio C Hamano, Johannes Schindelin, git

On Mon, 24 Aug 2009, Julian Phillips wrote:

> On Mon, 24 Aug 2009, Daniel Barkalow wrote:
> 
> > On Sun, 23 Aug 2009, Junio C Hamano wrote:
> > 
> > > What is the point of not asking for the refs that we know are the same?
> > 
> > This code is part of the original C implementation of fetch; I suspect the
> > optimization was somehow in the shell version and made sense there,
> > perhaps because there wasn't a quickfetch in the shell version or that
> > there was some non-negligable per-ref cost in the code around there, since
> > it was calling helper programs and such.
> 
> I don't remember copying it from the shell version but my memory is terrible,
> so I could easily be wrong.  The relevant commit message was:
> 
> "git-fetch2: remove ref_maps that are not interesting
> 
> Once we have the full list of ref_maps, remove any where the local
> and remote sha1s are the same - as we don't need to do anything for
> them."
> 
> So that doesn't help.  I was very concerned about performance though (which
> was why I wanted fetch in C in the first place), so may have added it to speed
> up fetches that have only updated a few refs - and I assume that quickfetch
> was something that came along later after you absorbed the work into the
> transport series?

Well... Johan Herland says he has to deal with repositories containing 
around 50000 refs.  So in that case it is certainly a good idea not to 
send the whole 50000 refs back if only one or two (or a hundred) need to 
be updated.  And quickfetch() won't help in that case since its purpose 
is only to determine if there is anything at all to update.

> > Anyway, I think it makes sense to remove the filtering from
> > transport_fetch_refs(), like your patch does.
> > 
> > If it makes a difference for the actual protocol, fetch_refs_via_pack()
> > could filter them at that stage.
> 
> I think it would certainly be worth investigating the performance aspects ...
> no time tonight, but maybe tomorrow.

50000 refs * 45 bytes each = 2.25 MB.  That's all wasted bandwidth if 
only one ref needs updating.


Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25  0:18         ` Nicolas Pitre
@ 2009-08-25  2:12           ` Shawn O. Pearce
  2009-08-25  5:00             ` Sverre Rabbelier
  2009-08-25  5:21             ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-25  2:12 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Julian Phillips, Daniel Barkalow, Junio C Hamano,
	Johannes Schindelin, git

Nicolas Pitre <nico@cam.org> wrote:
> Well... Johan Herland says he has to deal with repositories containing 
> around 50000 refs.  So in that case it is certainly a good idea not to 
> send the whole 50000 refs back if only one or two (or a hundred) need to 
> be updated.  And quickfetch() won't help in that case since its purpose 
> is only to determine if there is anything at all to update.
...
> 50000 refs * 45 bytes each = 2.25 MB.  That's all wasted bandwidth if 
> only one ref needs updating.

Not just Johan Herland.  Gerrit Code Review creates a new ref
for every patch proposed for review.  Imagine taking every email
message on git ML that has "[PATCH]" in the subject, and creating
a new ref for that in a git.git clone.

We aren't quite at the 50k ref stage yet, but we're starting to
consider that some of our repositories have a ton of refs, and
that the initial advertisement for either fetch or push is horrid.

Since the refs are immutable I could actually teach the JGit
daemon to hide them from JGit's receive-pack, thus cutting down the
advertisement on push, but the refs exist so you can literally say:

  git fetch URL refs/changes/88/4488/2
  git show FETCH_HEAD

to inspect the "v2" version of whatever 4488 is, and if 4488 was
the last commit in a patch series, you'd also be able to do:

  git log -p --reverse ..FETCH_HEAD

to see the complete series.

Given how infrequent it is to grab a given change is though, I'm
starting to consider either a protocol extension that allows the
client to probe for a ref which wasn't in the initial advertisement,
or take it on a command line flag, e.g.:

  git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2

Personally I'd prefer extending the protocol, because making the
end user supply information twice is stupid.

I don't know enough about Johan's case though to know whether or
not he can get away with hiding the bulk of the refs in the initial
advertisement.  In the case of Gerrit Code Review, the bulk of the
refs is under refs/changes/, only a handful of things are under the
refs/heads/ and ref/tags/ namespace, and most fetches actually are
for only refs/heads/ and refs/tags/.  So hiding the refs/changes/
namespace would make large improvement in the advertisement cost.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25  2:12           ` Shawn O. Pearce
@ 2009-08-25  5:00             ` Sverre Rabbelier
  2009-08-25  5:21             ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Sverre Rabbelier @ 2009-08-25  5:00 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Julian Phillips, Daniel Barkalow, Junio C Hamano,
	Johannes Schindelin, git

Heya,

On Mon, Aug 24, 2009 at 19:12, Shawn O. Pearce<spearce@spearce.org> wrote:
>  git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2
>
> Personally I'd prefer extending the protocol, because making the
> end user supply information twice is stupid.

$ git fetch --unadvertised URL refs/changes/88/4488/2

And we expand that to the above behind the screens?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25  2:12           ` Shawn O. Pearce
  2009-08-25  5:00             ` Sverre Rabbelier
@ 2009-08-25  5:21             ` Junio C Hamano
  2009-08-25  6:12               ` Shawn O. Pearce
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2009-08-25  5:21 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> We aren't quite at the 50k ref stage yet, but we're starting to
> consider that some of our repositories have a ton of refs, and
> that the initial advertisement for either fetch or push is horrid.
>
> Since the refs are immutable I could actually teach the JGit
> daemon to hide them from JGit's receive-pack, thus cutting down the
> advertisement on push, but the refs exist so you can literally say:

What do you mean "refs are immutable"?

Do you mean "In the particular application, Gerrit, the server knows that
certain refs will never move nor get deleted, once they are created"?  If
so, then I would understand, but otherwise what you are describing is not
git anymore ;-)

And I think it is probably worth thinking things through to find a way to
take advantage of that knowledge.

Even though refs under refs/changes/ hierarchy may have that property, the
client won't know what's available unless you advertise it in some way.

You could assume some offline measure outside the git protocol exists for
clients to find out about them, and protocol extension could say "I do not
want to hear about refs that match these globs during this exchange,
because I have learnt about them offline", and the server could skip
advertisement.

>   git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2
>
> Personally I'd prefer extending the protocol, because making the
> end user supply information twice is stupid.

In the upload-pack protocol, the server talks first, so it is rather hard
to shoehorn a request from a client to ask "I know about refs/changes/*
hiearchy, so don't talk about them".

I however think it is entirely reasonable to have a server side
configuration that tells upload-pack not to advertise refs/changes/*
hierarchy but still remembers they are OUR_REF.  In send_ref() in
upload-pack.c, you'd do something like (I know, I know, you'd be doing
an equivalent of this in jgit):

	static const char *capabilities = "multi_ack ...";
	struct object *o = parse_object(sha1);
	int skip_advertisement = exclude_ref_from_advertisement(refname);

	if (!o)
		die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1));

	if (!skip_advertisement) {
		if (capabilities)
			packet_write(1, "%s %s%c%s\n", sha1_to_hex(sha1), refname,
				0, capabilities);
		else
			packet_write(1, "%s %s\n", sha1_to_hex(sha1), refname);
		capabilities = NULL;
	}

	if (!(o->flags & OUR_REF)) {
		o->flags |= OUR_REF;
		nr_our_refs++;
	}
	if (o->type == OBJ_TAG) {
		o = deref_tag(o, refname, 0);
		if (o && !skip_advertisement)
			packet_write(1, "%s %s^{}\n", sha1_to_hex(o->sha1), refname);
	}
	return 0;

Doing it this way, receive_needs() will allow refs/changes/88/4488/2 to be
requested, because that is what send_ref() saw and marked as OUR_REF.  It
was just not sent to the client.  And get_common_commits() will behave the
same with or without this abbreviated advertisement,

Of course, the client side cannot grab everything with refs/*:refs/remotes/*
wildcard refspecs from such a server, but I think that can be considered a
feature.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25  5:21             ` Junio C Hamano
@ 2009-08-25  6:12               ` Shawn O. Pearce
  2009-08-25  6:33                 ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-25  6:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > We aren't quite at the 50k ref stage yet, but we're starting to
> > consider that some of our repositories have a ton of refs, and
> > that the initial advertisement for either fetch or push is horrid.
> >
> > Since the refs are immutable I could actually teach the JGit
> > daemon to hide them from JGit's receive-pack, thus cutting down the
> > advertisement on push, but the refs exist so you can literally say:
> 
> What do you mean "refs are immutable"?
> 
> Do you mean "In the particular application, Gerrit, the server knows that
> certain refs will never move nor get deleted, once they are created"?  If
> so, then I would understand, but otherwise what you are describing is not
> git anymore ;-)

The former.  :-)

I mean that this particular server implementation will deny any
update made to refs/changes/, as if one had the following as the
update hook on that repository:

  #!/bin/sh
  case "$1" in
  refs/changes/*) exit 1;;
  *) exit 0;
  esac

This of course is completely legal, and since the server knows the
ref cannot be moved, there is no need to advertise it to the client.
But this is a very specialized thing, its rare that the thing that
formats the advertisement knows what the update hook will permit
to be modified.
 
> >   git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2
> >
> > Personally I'd prefer extending the protocol, because making the
> > end user supply information twice is stupid.
> 
> In the upload-pack protocol, the server talks first, so it is rather hard
> to shoehorn a request from a client to ask "I know about refs/changes/*
> hiearchy, so don't talk about them".

Actually, that assumption is still a problem.

The client knows the *name* of the ref, but not the SHA-1 the ref is
currently valued at.  Thus when the client knows it wants a certain
ref by name, it needs to send a "want " line to the server that would
give it whatever that ref currently points at.  Unfortunately since we
have not obtained that value yet, we are stuck.

However, we do have one name we want to know about, but the server may
have 50k other names in the same namespace we do not know about.

I was thinking instead that we have a new protocol extension:

  S: ... HEAD\0side-band ... expand-refs
  S: ... refs/heads/master
  S: 0000

  C: ... expand refs/changes/88/4488/2
  C: 0000

  S: ... refs/changes/88/4488/2
  S: 0000

  C: ... want XXXXXX\0side-band-64k ...
 
> Of course, the client side cannot grab everything with refs/*:refs/remotes/*
> wildcard refspecs from such a server, but I think that can be considered a
> feature.

If expand accepted globs like fetch does, then fetch can ask for
expand of refs/changes/* if it does not see any refs/changes/*
on advertisement.  Or just expand a particular ref, or handful of
refs, that the user has asked for on the fetch line.

The problem with this is servers which are sending this expand-refs
tag have hidden certain namespaces from older clients.  Those names
can't be seen by older git clients, unless the user does an upgrade.

This might be OK for Gerrit Code Review's refs/changes/ namespace,
but it may not be good for others.


-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25  6:12               ` Shawn O. Pearce
@ 2009-08-25  6:33                 ` Junio C Hamano
  2009-08-25 15:14                   ` Shawn O. Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2009-08-25  6:33 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> The client knows the *name* of the ref, but not the SHA-1 the ref is
> currently valued at.  Thus when the client knows it wants a certain
> ref by name, it needs to send a "want " line to the server that would
> give it whatever that ref currently points at.  Unfortunately since we
> have not obtained that value yet, we are stuck.

That could be something you can fix in the out-of-band procedure Gerrit
uses (you let the client learn both name and value offline, and then the
client uses that value on "want" line).

However, even if we limit the discussion to Gerrit, you would need an
updated client that can be called with the out-of-band information
(i.e. "we know that changes/88/4488/2 points at X, so use X when
requesting") when talking with such an updated server.

So I think that expand-refs is a much nicer general solution than just
"server side is configured to hide but still allow certain refs", and
client updates cannot be avoided.

And again, 

> The problem with this is servers which are sending this expand-refs
> tag have hidden certain namespaces from older clients.  Those names
> can't be seen by older git clients, unless the user does an upgrade.

I do not think "generally hidden, but if you need to know you are allowed
to peek" is much of a problem.  You do not do that for regular refs, only
for "on-demand-as-needed" type things.  If we are going to make extensive
use of notes on commits to give richer annotations, I suspect notes
hierarchy could be hidden by default in a similar way.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25  6:33                 ` Junio C Hamano
@ 2009-08-25 15:14                   ` Shawn O. Pearce
  2009-08-26  2:10                     ` Shawn O. Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-25 15:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > The client knows the *name* of the ref, but not the SHA-1 the ref is
> > currently valued at.  Thus when the client knows it wants a certain
> > ref by name, it needs to send a "want " line to the server that would
> > give it whatever that ref currently points at.  Unfortunately since we
> > have not obtained that value yet, we are stuck.
> 
> That could be something you can fix in the out-of-band procedure Gerrit
> uses (you let the client learn both name and value offline, and then the
> client uses that value on "want" line).

Well, we're trying to reduce out-of-band things with Gerrit.

Its bad enough that Gerrit doesn't use git am and git send-email
to slug around changes for discussion.  As it is we're an island
among the git world, *despite* the fact that Gerrit speaks the git
protocol natively and you can push directly to it, avoiding the
send-email SMTP nonsense many folks run into.
 
> However, even if we limit the discussion to Gerrit, you would need an
> updated client that can be called with the out-of-band information
> (i.e. "we know that changes/88/4488/2 points at X, so use X when
> requesting") when talking with such an updated server.

Yes, exactly.  Existing clients wouldn't send an arbitrary want
request, even if the server had a whitelist of objects it would
allow to be requested.

One reason why Gerrit publishes pending changes with ref names is to
make it easier for any user to obtain the proposed change locally.
Its hard to beat `git fetch URL blah`, that's even easier than
"save to mbox, git am mbox".
 
> So I think that expand-refs is a much nicer general solution than just
> "server side is configured to hide but still allow certain refs", and
> client updates cannot be avoided.

Yes, I agree.  Given 20/20 hindsight, its the way the protocol
should have been implemented:

  C: 0014expand refs/heads/*
  C: 0013expand refs/tags/*
  C: 0000

  S: ...refs/heads/master
  S: ...refs/heads/next
  S: ...refs/tags/v1.0
  S: 0000

This would have permitted clients doing `git pull URL for-linus` to say:

  C: 0011expand for-linus
  C: 0000

  S: ...refs/heads/for-linus
  S: ...refs/remotes/k26/for-linus
  S: 0000

and thus significantly narrow the scope of what they are shown when
they connect for a given ref.

> > The problem with this is servers which are sending this expand-refs
> > tag have hidden certain namespaces from older clients.  Those names
> > can't be seen by older git clients, unless the user does an upgrade.
> 
> I do not think "generally hidden, but if you need to know you are allowed
> to peek" is much of a problem.  You do not do that for regular refs, only
> for "on-demand-as-needed" type things.  If we are going to make extensive
> use of notes on commits to give richer annotations, I suspect notes
> hierarchy could be hidden by default in a similar way.

After sleeping on it, I'm OK with hiding some refs from older clients.

Sometimes things evolve, and you should just update your software
to keep up with them.  If you really want the "hidden refs" that
Gerrit advertises, you should install a newer client.

We could consider supporting a legacy option through upload-pack,
such as:

  git fetch --upload-pack='git-upload-pack --expand refs/changes/' URL

which tells the remote side to additionally expand those refs during
the initial advertisement.  Then users have an escape hatch if:

* They know the remote is new enough to hide refs;
* They suspect the remote is hiding refs;
* They received an out-of-band notification telling this;
* They have an older client which doesn't support expanding refs;
* They cannot upgrade said client yet;

I'm thinking about writing an RFC patch for this today for git.git.
I think the expand refs feature neatly solves a number of problems
for me in Gerrit.  But I'm really hoping its not the only set of
repositories that would benefit from such a feature, because if so,
its not worth the headache of the protocol change.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-25 15:14                   ` Shawn O. Pearce
@ 2009-08-26  2:10                     ` Shawn O. Pearce
  2009-08-26  7:08                       ` Johannes Sixt
  2009-08-28 17:30                       ` [RFC PATCH] upload-pack: expand capability advertises additional refs Shawn O. Pearce
  0 siblings, 2 replies; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-26  2:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> > So I think that expand-refs is a much nicer general solution than just
> > "server side is configured to hide but still allow certain refs", and
> > client updates cannot be avoided.
> 
> Yes, I agree.
...
> I'm thinking about writing an RFC patch for this today for git.git.

RFC patch below.  This is only the upload-pack side, and lacks
test, etc, so its posted *ONLY* for discussion.  I'll try to flesh
it out further tomorrow into something that could be considered
more seriously.

--8<--
[RFC] upload-pack: expand capability advertises additional refs

The expand capability and associated command permits the client
to ask for information about refs which were not in the initial
advertisement sent when the connection was first opened.

In the below exchange the server initially only advertises its
current HEAD, refs/heads and refs/tags namespaces.  However,
the client has been instructed to fetch anything which matches
refs/remotes/jc/*.

Since no matching refs appeared in the initial advertisement,
the client requests the server to expand the desired pattern,
and terminates its expand request list with a flush.

Upon receiving a flush from the client, the server displays any
local refs which match any of the expand patterns requested,
and then closes this secondary advertisement list with a flush.
If no refs matched, the server immediately returns a flush.

If multiple expand patterns match the same ref, the ref is returned
only once in the secondary advertisement, avoid confusing the client
with duplicate results.

  S: 008f... HEAD\0...include-tag expand
  S: 0043... refs/heads/build-next
  S: 0040... refs/tags/v1.6.4.1
  S: 0043... refs/tags/v1.6.4.1^{}
  S: 0000

  C: 001dexpand refs/remotes/jc/*
  C: 0000

  S: 0043... refs/remotes/jc/maint
  S: 0044... refs/remotes/jc/master
  S: 0000

  C: 0031want ...
  C: 0000

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Shawn O. Pearce <sop@google.com>
---
 upload-pack.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 4d8be83..e1ec608 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "revision.h"
 #include "list-objects.h"
 #include "run-command.h"
+#include "remote.h"
+#include "string-list.h"
 
 static const char upload_pack_usage[] = "git upload-pack [--strict] [--timeout=nn] <dir>";
 
@@ -30,6 +32,18 @@ static int multi_ack, nr_our_refs;
 static int use_thin_pack, use_ofs_delta, use_include_tag;
 static int no_progress, daemon_mode;
 static int shallow_nr;
+
+struct adv_ref {
+	struct adv_ref *next;
+	char *name;
+	unsigned pattern:1;
+};
+static struct adv_ref *to_advertise;
+static struct adv_ref **advertise_tail = &to_advertise;
+
+static struct ref *local_refs;
+static struct ref **refs_tail = &local_refs;
+
 static struct object_array have_obj;
 static struct object_array want_obj;
 static unsigned int timeout;
@@ -470,6 +484,17 @@ static int get_common_commits(void)
 	}
 }
 
+static void push_advertise(const char *name)
+{
+	struct adv_ref *adv = xcalloc(1, sizeof(*adv));
+	adv->name = xstrdup(name);
+	adv->pattern = !!strchr(adv->name, '*');
+	*advertise_tail = adv;
+	advertise_tail = &adv->next;
+}
+
+static void send_refs(void);
+
 static void receive_needs(void)
 {
 	struct object_array shallows = {0, 0, NULL};
@@ -484,11 +509,22 @@ static void receive_needs(void)
 		unsigned char sha1_buf[20];
 		len = packet_read_line(0, line, sizeof(line));
 		reset_timeout();
-		if (!len)
+		if (!len) {
+			if (to_advertise) {
+				send_refs();
+				continue;
+			}
 			break;
+		}
 		if (debug_fd)
 			write_in_full(debug_fd, line, len);
 
+		if (!prefixcmp(line, "expand ")) {
+			if (line[len - 1] == '\n')
+				line[len - 1] = 0;
+			push_advertise(line + 7);
+			continue;
+		}
 		if (!prefixcmp(line, "shallow ")) {
 			unsigned char sha1[20];
 			struct object *object;
@@ -603,11 +639,14 @@ static void receive_needs(void)
 	free(shallows.objects);
 }
 
-static int send_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+static int send_ref(struct string_list_item *item, void *cb_data)
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag";
+		" include-tag expand";
+	const struct ref *ref = item->util;
+	const char *refname = ref->name;
+	const unsigned char *sha1 = ref->new_sha1;
 	struct object *o = parse_object(sha1);
 
 	if (!o)
@@ -631,12 +670,58 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 	return 0;
 }
 
+static void send_refs(void)
+{
+	struct ref *to_send = NULL, **tail = &to_send;
+	struct ref *ref;
+	struct adv_ref *adv, *next_adv;
+	struct string_list sorted_names;
+
+	for (adv = to_advertise; adv; adv = next_adv) {
+		struct refspec spec;
+
+		memset(&spec, 0, sizeof(spec));
+		spec.pattern = adv->pattern;
+		spec.src = adv->name;
+		spec.dst = adv->name;
+		next_adv = adv->next;
+		get_fetch_map(local_refs, &spec, &tail, 1);
+
+		free(adv->name);
+		free(adv);
+	}
+	to_advertise = NULL;
+	advertise_tail = &to_advertise;
+
+	memset(&sorted_names, 0, sizeof(sorted_names));
+	for (ref = to_send; ref; ref = ref->next)
+		string_list_insert(ref->name, &sorted_names)->util = ref;
+	for_each_string_list(send_ref, &sorted_names, NULL);
+	string_list_clear(&sorted_names, 0);
+	free_refs(to_send);
+	packet_flush(1);
+}
+
+static int scan_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+{
+	struct ref *r = alloc_ref(refname);
+	hashcpy(r->new_sha1, sha1);
+	*refs_tail = r;
+	refs_tail = &r->next;
+	return 0;
+}
+
 static void upload_pack(void)
 {
 	reset_timeout();
-	head_ref(send_ref, NULL);
-	for_each_ref(send_ref, NULL);
-	packet_flush(1);
+	head_ref(scan_ref, NULL);
+	for_each_ref(scan_ref, NULL);
+
+	push_advertise("HEAD");
+	push_advertise("refs/heads/*");
+	push_advertise("refs/tags/*");
+	send_refs();
+
 	receive_needs();
 	if (want_obj.nr) {
 		get_common_commits();
-- 
1.6.4.1.331.gda1d56

-- 
Shawn.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-26  2:10                     ` Shawn O. Pearce
@ 2009-08-26  7:08                       ` Johannes Sixt
  2009-08-26  8:22                         ` Shawn O. Pearce
  2009-08-28 17:30                       ` [RFC PATCH] upload-pack: expand capability advertises additional refs Shawn O. Pearce
  1 sibling, 1 reply; 22+ messages in thread
From: Johannes Sixt @ 2009-08-26  7:08 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

Shawn O. Pearce schrieb:
>  static void upload_pack(void)
>  {
>  	reset_timeout();
> -	head_ref(send_ref, NULL);
> -	for_each_ref(send_ref, NULL);
> -	packet_flush(1);
> +	head_ref(scan_ref, NULL);
> +	for_each_ref(scan_ref, NULL);
> +
> +	push_advertise("HEAD");
> +	push_advertise("refs/heads/*");
> +	push_advertise("refs/tags/*");
> +	send_refs();
> +

How will this mesh with 'git clone --mirror'? Is the client expected to
ask with 'expand refs/*'?

-- Hannes

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-26  7:08                       ` Johannes Sixt
@ 2009-08-26  8:22                         ` Shawn O. Pearce
  2009-08-26  9:03                           ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-26  8:22 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Junio C Hamano, Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

Johannes Sixt <j.sixt@viscovery.net> wrote:
> Shawn O. Pearce schrieb:
> >  static void upload_pack(void)
> >  {
> >  	reset_timeout();
> > -	head_ref(send_ref, NULL);
> > -	for_each_ref(send_ref, NULL);
> > -	packet_flush(1);
> > +	head_ref(scan_ref, NULL);
> > +	for_each_ref(scan_ref, NULL);
> > +
> > +	push_advertise("HEAD");
> > +	push_advertise("refs/heads/*");
> > +	push_advertise("refs/tags/*");
> > +	send_refs();
> > +
> 
> How will this mesh with 'git clone --mirror'?

Not well.

> Is the client expected to
> ask with 'expand refs/*'?

If the client is new enough to understand the "expand" extension,
yes, it would ask for "expand refs/*" and get everthing, allowing
it to complete a full mirror.

If the client is older and doesn't know this extension, then it
is hopeless.  The server isn't advertising refs/*, doesn't know
that the client can't ask for refs/*, and the client doesn't know
it is missing refs when it makes the mirror.

This backwards incompatible breakage is something I have no real
solution for.  :-|

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-26  8:22                         ` Shawn O. Pearce
@ 2009-08-26  9:03                           ` Junio C Hamano
  2009-08-26 17:03                             ` Shawn O. Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2009-08-26  9:03 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Johannes Sixt, Junio C Hamano, Nicolas Pitre, Julian Phillips,
	Daniel Barkalow, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

>> How will this mesh with 'git clone --mirror'?
>
> Not well.
>
>> Is the client expected to
>> ask with 'expand refs/*'?
>
> If the client is new enough to understand the "expand" extension,
> yes, it would ask for "expand refs/*" and get everthing, allowing
> it to complete a full mirror.
>
> If the client is older and doesn't know this extension, then it
> is hopeless.  The server isn't advertising refs/*, doesn't know
> that the client can't ask for refs/*, and the client doesn't know
> it is missing refs when it makes the mirror.
>
> This backwards incompatible breakage is something I have no real
> solution for.  :-|

But we at least can assume that the server operator is reasonable and
wouldn't go overboard, (ab)using this "abbreviated advertisement" feature
to hide heads and tags from the clients.  I would suspect that a server
operated by such a sick person who hides these normal refs will be shunned
by users so it won't either happen in practice, or even if it happens, it
won't be a problem---simply because nobody would want to interact with
such a server.  

Think about in what situation you would want to do a mirror clone.  The
most obvious is to have a back-up or secondary distribution point, and I
do not think of any other sane reason (other than "because I can", which
does not count).  If the original repository has so many refs to benefit
from the "abbreviated advertisement" feature (otherwise there is no point
using it in the first place), then its mirror repository would also want
to use the feature when talking to its clients, acting as a back-up
distribution point.  That means the version of git used to prime, update
and serve the mirror will know the expand extention.

I am hoping that we can finish 1.6.5 by mid September (let's tentatively
say we will shoot for 16th).  I expect the expand extention to be in
'next' by that time, cooking for 1.7.0.  How does that timetable sound?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fix simple deepening of a repo
  2009-08-26  9:03                           ` Junio C Hamano
@ 2009-08-26 17:03                             ` Shawn O. Pearce
  0 siblings, 0 replies; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-26 17:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> >> How will this mesh with 'git clone --mirror'?
> >
> > Not well.
> 
> But we at least can assume that the server operator is reasonable and
> wouldn't go overboard, (ab)using this "abbreviated advertisement" feature
> to hide heads and tags from the clients.

Yes.  My patch is hardcoded to show only heads and tags, and nothing else.

But I think we want to make this configurable, and show everything
by default, but if there is a configuration entry, show only what
the configuration entry patterns suggest to advertise.

Thus an admin could hide refs/heads/*, but maybe he wants to, and
show only refs/heads/master, refs/heads/maint, refs/heads/next by
default.  This is actually a rather clear indication to a client
that although there may be individual cooking topics scattered
through the expanded refs/heads/* space, any reasonable default
clone wouldn't take them.

> Think about in what situation you would want to do a mirror clone.
...
> That means the version of git used to prime, update
> and serve the mirror will know the expand extention.

Great point Junio.  The backwards compatibility may be a non-issue
then, especially if this is configurable and we advertise refs/*
by default like we do now, and any reasonable admin who does enable
the hiding still advertises the core namspaces that really matter
to the majority of clients.

> I am hoping that we can finish 1.6.5 by mid September (let's tentatively
> say we will shoot for 16th).  I expect the expand extention to be in
> 'next' by that time, cooking for 1.7.0.  How does that timetable sound?

Oh, if 1.6.5 is mid-September, this is certainly not 1.6.5 material.
I'm not in any rush, this should go in when its ready, but 1.7
might be reasonable.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH] upload-pack: expand capability advertises additional refs
  2009-08-26  2:10                     ` Shawn O. Pearce
  2009-08-26  7:08                       ` Johannes Sixt
@ 2009-08-28 17:30                       ` Shawn O. Pearce
  2009-08-28 19:07                         ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Shawn O. Pearce @ 2009-08-28 17:30 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin

The expand capability and associated command permits the client
to ask for information about refs which were not in the initial
advertisement sent when the connection was first opened.

In the below exchange the server initially only advertises its
current HEAD, refs/heads and refs/tags namespaces.  However,
the client has been instructed to fetch anything which matches
refs/remotes/jc/*.

Since no matching refs appeared in the initial advertisement,
the client requests the server to expand the desired pattern,
and terminates its expand request list with a flush.

Upon receiving a flush from the client, the server displays any
local refs which match any of the expand patterns requested,
and then closes this secondary advertisement list with a flush.
If no refs matched, the server immediately returns a flush.

If multiple expand patterns match the same ref, the ref is returned
only once in the secondary advertisement, avoid confusing the client
with duplicate results.

  S: 008f... HEAD\0...include-tag expand
  S: 0043... refs/heads/build-next
  S: 0040... refs/tags/v1.6.4.1
  S: 0043... refs/tags/v1.6.4.1^{}
  S: 0000

  C: 001dexpand refs/remotes/jc/*
  C: 0000

  S: 0043... refs/remotes/jc/maint
  S: 0044... refs/remotes/jc/master
  S: 0000

  C: 0031want ...
  C: 0000

Repository owners can control the set of refs which are sent as part
of the initial advertisement by configuring upload.advertise in the
repository configuration file.  If not set this is assumed to be
"refs/*", matching the prior behavior of advertising every local ref.

Fetch clients which are not using the anonymous git:// protocol
and which do not support the expand protocol extension may still
force the server to expand its configured upload.advertise set by
passing the --expand=<pattern> command line flag as part of the
--upload-pack= command line given to clone or fetch.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Shawn O. Pearce <sop@google.com>
---

 This is roughly my final server side version of this proposal.
 I still need to write the client code, but want to at least get
 this out there for further discussion.

 Documentation/config.txt          |   10 +++
 Documentation/git-upload-pack.txt |   10 +++-
 upload-pack.c                     |  131 +++++++++++++++++++++++++++++++++++--
 3 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5256c7f..07907b6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1489,6 +1489,16 @@ transfer.unpackLimit::
 	not set, the value of this variable is used instead.
 	The default value is 100.
 
+upload.advertise::
+	The default set of refs to advertise when a fetch or
+	clone client connects to this repository.  Additional
+	local refs not in the default advertisement can still
+	be guessed and requested by clients through additional
+	network round trips.  Refs may be expressed as a complete
+	name ("refs/heads/master") or as a pattern expected by
+	remote.<name>.fetch (such as "refs/heads/*").  If not
+	specified, all refs are advertised ("refs/*").
+
 url.<base>.insteadOf::
 	Any URL that starts with this value will be rewritten to
 	start, instead, with <base>. In cases where some site serves a
diff --git a/Documentation/git-upload-pack.txt b/Documentation/git-upload-pack.txt
index b8e49dc..4cd9cc0 100644
--- a/Documentation/git-upload-pack.txt
+++ b/Documentation/git-upload-pack.txt
@@ -8,7 +8,7 @@ git-upload-pack - Send objects packed back to git-fetch-pack
 
 SYNOPSIS
 --------
-'git upload-pack' [--strict] [--timeout=<n>] <directory>
+'git upload-pack' [--strict] [--timeout=<n>] [--expand=<pattern> ...] <directory>
 
 DESCRIPTION
 -----------
@@ -30,6 +30,14 @@ OPTIONS
 --timeout=<n>::
 	Interrupt transfer after <n> seconds of inactivity.
 
+--expand=<pattern>::
+	Expand the requested pattern and advertise matching refs,
+	even if those refs were not matched by upload.advertise.
+	This option may be repeated to request expansion of more
+	than one pattern.  This option is intended only as an
+	escape hatch for older clients to fetch from a server
+	which has hidden interesting refs.
+
 <directory>::
 	The repository to sync from.
 
diff --git a/upload-pack.c b/upload-pack.c
index 4d8be83..890c1c5 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "revision.h"
 #include "list-objects.h"
 #include "run-command.h"
+#include "remote.h"
+#include "string-list.h"
 
 static const char upload_pack_usage[] = "git upload-pack [--strict] [--timeout=nn] <dir>";
 
@@ -30,6 +32,19 @@ static int multi_ack, nr_our_refs;
 static int use_thin_pack, use_ofs_delta, use_include_tag;
 static int no_progress, daemon_mode;
 static int shallow_nr;
+
+struct adv_ref {
+	struct adv_ref *next;
+	char *name;
+	unsigned pattern:1;
+};
+static struct adv_ref *to_advertise;
+static struct adv_ref **advertise_tail = &to_advertise;
+static int configured_advertise;
+
+static struct ref *local_refs;
+static struct ref **refs_tail = &local_refs;
+
 static struct object_array have_obj;
 static struct object_array want_obj;
 static unsigned int timeout;
@@ -470,6 +485,69 @@ static int get_common_commits(void)
 	}
 }
 
+static void push_advertise(const char *name)
+{
+	struct adv_ref *adv = xcalloc(1, sizeof(*adv));
+	adv->name = xstrdup(name);
+	adv->pattern = !!strchr(adv->name, '*');
+	*advertise_tail = adv;
+	advertise_tail = &adv->next;
+}
+
+static int upload_pack_config(const char *var, const char *value, void *cb)
+{
+	if (strcmp(var, "upload.advertise") == 0) {
+		configured_advertise = 1;
+		push_advertise(value);
+		return 0;
+	}
+
+	return git_default_config(var, value, cb);
+}
+
+static int send_ref(struct string_list_item *item, void *cb_data);
+static void send_refs(void)
+{
+	struct ref *to_send = NULL, **tail = &to_send;
+	struct ref *ref;
+	struct adv_ref *adv, *next_adv;
+	struct string_list sorted_names;
+
+	for (adv = to_advertise; adv; adv = next_adv) {
+		struct refspec spec;
+
+		memset(&spec, 0, sizeof(spec));
+		spec.pattern = adv->pattern;
+		spec.src = adv->name;
+		spec.dst = adv->name;
+		next_adv = adv->next;
+		get_fetch_map(local_refs, &spec, &tail, 1);
+
+		free(adv->name);
+		free(adv);
+	}
+	to_advertise = NULL;
+	advertise_tail = &to_advertise;
+
+	/* We may have duplicate copies of the same ref above, if
+	 * two advertise records matched the same local name.  To
+	 * avoid sending the same ref twice to the client, we put
+	 * them into a sorted list and then skip duplicates as we
+	 * output them.
+	 */
+	memset(&sorted_names, 0, sizeof(sorted_names));
+	for (ref = to_send; ref; ref = ref->next)
+		string_list_append(ref->name, &sorted_names)->util = ref;
+	sort_string_list(&sorted_names);
+
+	ref = NULL;
+	for_each_string_list(send_ref, &sorted_names, &ref);
+
+	string_list_clear(&sorted_names, 0);
+	free_refs(to_send);
+	packet_flush(1);
+}
+
 static void receive_needs(void)
 {
 	struct object_array shallows = {0, 0, NULL};
@@ -484,11 +562,22 @@ static void receive_needs(void)
 		unsigned char sha1_buf[20];
 		len = packet_read_line(0, line, sizeof(line));
 		reset_timeout();
-		if (!len)
+		if (!len) {
+			if (to_advertise) {
+				send_refs();
+				continue;
+			}
 			break;
+		}
 		if (debug_fd)
 			write_in_full(debug_fd, line, len);
 
+		if (!prefixcmp(line, "expand ")) {
+			if (line[len - 1] == '\n')
+				line[len - 1] = 0;
+			push_advertise(line + 7);
+			continue;
+		}
 		if (!prefixcmp(line, "shallow ")) {
 			unsigned char sha1[20];
 			struct object *object;
@@ -603,13 +692,22 @@ static void receive_needs(void)
 	free(shallows.objects);
 }
 
-static int send_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+static int send_ref(struct string_list_item *item, void *cb_data)
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag";
-	struct object *o = parse_object(sha1);
+		" include-tag expand";
+	struct ref **last_ref = cb_data;
+	struct ref *ref = item->util;
+	const char *refname = ref->name;
+	const unsigned char *sha1 = ref->new_sha1;
+	struct object *o;
+
+	if (*last_ref && !strcmp(refname, (*last_ref)->name))
+		return 0;
+	*last_ref = ref;
 
+	o = parse_object(sha1);
 	if (!o)
 		die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1));
 
@@ -631,12 +729,26 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 	return 0;
 }
 
+static int scan_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+{
+	struct ref *r = alloc_ref(refname);
+	hashcpy(r->new_sha1, sha1);
+	*refs_tail = r;
+	refs_tail = &r->next;
+	return 0;
+}
+
 static void upload_pack(void)
 {
+	git_config(upload_pack_config, NULL);
+	if (!configured_advertise)
+		push_advertise("refs/*");
+
+	head_ref(scan_ref, NULL);
+	for_each_ref(scan_ref, NULL);
+
 	reset_timeout();
-	head_ref(send_ref, NULL);
-	for_each_ref(send_ref, NULL);
-	packet_flush(1);
+	send_refs();
 	receive_needs();
 	if (want_obj.nr) {
 		get_common_commits();
@@ -652,6 +764,7 @@ int main(int argc, char **argv)
 
 	git_extract_argv0_path(argv[0]);
 	read_replace_refs = 0;
+	push_advertise("HEAD");
 
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -667,6 +780,10 @@ int main(int argc, char **argv)
 			daemon_mode = 1;
 			continue;
 		}
+		if (!prefixcmp(arg, "--expand=")) {
+			push_advertise(arg + 9);
+			continue;
+		}
 		if (!strcmp(arg, "--")) {
 			i++;
 			break;
-- 
1.6.4.1.341.gf2a44

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] upload-pack: expand capability advertises additional refs
  2009-08-28 17:30                       ` [RFC PATCH] upload-pack: expand capability advertises additional refs Shawn O. Pearce
@ 2009-08-28 19:07                         ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2009-08-28 19:07 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: git, Nicolas Pitre, Julian Phillips, Daniel Barkalow,
	Johannes Schindelin

"Shawn O. Pearce" <spearce@spearce.org> writes:

>  static void upload_pack(void)
>  {
> +	git_config(upload_pack_config, NULL);
> +	if (!configured_advertise)
> +		push_advertise("refs/*");
> +
> +	head_ref(scan_ref, NULL);
> +	for_each_ref(scan_ref, NULL);
> +
>  	reset_timeout();
> -	head_ref(send_ref, NULL);
> -	for_each_ref(send_ref, NULL);
> -	packet_flush(1);
> +	send_refs();
>  	receive_needs();
>  	if (want_obj.nr) {
>  		get_common_commits();
> @@ -652,6 +764,7 @@ int main(int argc, char **argv)
>  
>  	git_extract_argv0_path(argv[0]);
>  	read_replace_refs = 0;
> +	push_advertise("HEAD");
>  
>  	for (i = 1; i < argc; i++) {
>  		char *arg = argv[i];
> @@ -667,6 +780,10 @@ int main(int argc, char **argv)
>  			daemon_mode = 1;
>  			continue;
>  		}
> +		if (!prefixcmp(arg, "--expand=")) {
> +			push_advertise(arg + 9);
> +			continue;
> +		}
>  		if (!strcmp(arg, "--")) {
>  			i++;
>  			break;

This arrangement of push_advertise() calls are rather curious.  I think
your design guidelines are:

 - We do want to advertise HEAD, so it can (and should) be unconditional;

 - We may want to restrict with configuration, or use "refs/*" as the
   fallback default;

 - We may want to (extend|replace) advertised set configured in the
   configuration file.

I would naively expect the above to be implemented in this order:

 - In main, first thing is to do the git_config() bit; if to_advertise is
   non-empty after git_config() returns, we have seen upload.advertise.
   Otherwise we haven't.  Push "refs/*" in the latter case.

 - Then, parse the command line.  Do we see "--expand="?  If so:

   . When handling the first "--expand=", clear to_advertise list.  This
     step is optional---necessary only if we want to make config variable
     override-able ;-);

   . Then, push the pattern in.

 - And finally push "HEAD" in unconditionally.

Which would mean a few things:

 - Your version always extends what is read from the configuration with
   --expand=, but I think replacing would be the right thing to do.

 - configured_advertise global variable can go.

 - Instead, to implement "--expand= replaces", we need a local variable in
   main() to remember if we already cleared to_advertise of the patterns
   read from the configuration file, and use it in the part that parses
   "--expand=".

But I may not be thinking clearly; I was up all night because I couldn't
sleep under loud noises of fire-fighting helicopters...

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-08-28 19:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-22  5:52 git fetch --depth=* broken? Nicolas Pitre
2009-08-24  4:04 ` [PATCH] fix simple deepening of a repo Nicolas Pitre
2009-08-24  4:49   ` Junio C Hamano
2009-08-24 13:55     ` Nicolas Pitre
2009-08-24 14:20       ` Johan Herland
2009-08-24 22:21       ` Junio C Hamano
2009-08-24 16:26     ` Daniel Barkalow
2009-08-24 22:30       ` Julian Phillips
2009-08-25  0:18         ` Nicolas Pitre
2009-08-25  2:12           ` Shawn O. Pearce
2009-08-25  5:00             ` Sverre Rabbelier
2009-08-25  5:21             ` Junio C Hamano
2009-08-25  6:12               ` Shawn O. Pearce
2009-08-25  6:33                 ` Junio C Hamano
2009-08-25 15:14                   ` Shawn O. Pearce
2009-08-26  2:10                     ` Shawn O. Pearce
2009-08-26  7:08                       ` Johannes Sixt
2009-08-26  8:22                         ` Shawn O. Pearce
2009-08-26  9:03                           ` Junio C Hamano
2009-08-26 17:03                             ` Shawn O. Pearce
2009-08-28 17:30                       ` [RFC PATCH] upload-pack: expand capability advertises additional refs Shawn O. Pearce
2009-08-28 19:07                         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.