* Fetching too many tags?
@ 2023-08-10  6:08 Ronan Pigott
  2023-08-11 18:09 ` Jeff King
  2023-08-11 22:06 ` Ronan Pigott
  0 siblings, 2 replies; 5+ messages in thread
From: Ronan Pigott @ 2023-08-10  6:08 UTC
  To: git

Hey git,

I am interested in git performance today and can't figure out what's going on
here. I was wondering why my git-fetch might be slow in an up-to-date repo:

  $ git pull
  Already up to date.
  $ time git fetch origin master
  From https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
   * branch                      master     -> FETCH_HEAD
  git fetch origin master  0.13s user 0.06s system 10% cpu 1.705 total

GIT_TRACE_CURL shows it spends most of the time transferring (all) tags from the
remote. It's much faster with --no-tags:

  $ time git fetch -n origin master
  From https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
   * branch                      master     -> FETCH_HEAD
  git fetch -n origin master  0.11s user 0.03s system 36% cpu 0.383 total

But I don't have tagOpt set:

  $ git config remote.origin.tagOpt || echo $?
  1

And the remote doesn't have to send me any commits, so I don't see why I should
receive any tags at all. Why might I be receiving so many tags?

Thanks,
Ronan


* Re: Fetching too many tags?
  2023-08-10  6:08 Fetching too many tags? Ronan Pigott
@ 2023-08-11 18:09 ` Jeff King
  2023-08-11 22:06 ` Ronan Pigott
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff King @ 2023-08-11 18:09 UTC
  To: Ronan Pigott; +Cc: git

On Thu, Aug 10, 2023 at 06:08:34AM +0000, Ronan Pigott wrote:

> I am interested in git performance today and can't figure out what's going on
> here. I was wondering why my git-fetch might be slow in an up-to-date repo:
> 
>   $ git pull
>   Already up to date.
>   $ time git fetch origin master
>   From https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
>    * branch                      master     -> FETCH_HEAD
>   git fetch origin master  0.13s user 0.06s system 10% cpu 1.705 total
> 
> GIT_TRACE_CURL shows it spends most of the time transferring (all) tags from the
> remote. It's much faster with --no-tags:
> 
>   $ time git fetch -n origin master
>   From https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
>    * branch                      master     -> FETCH_HEAD
>   git fetch -n origin master  0.11s user 0.03s system 36% cpu 0.383 total
> 
> But I don't have tagOpt set:
> 
>   $ git config remote.origin.tagOpt || echo $?
>   1
> 
> And the remote doesn't have to send me any commits, so I don't see why I should
> receive any tags at all. Why might I be receiving so many tags?

You didn't define "receiving tags", but I assume you just mean that you
saw the tag names and object ids in the trace output. From the output
above, it looks like no actual tag objects were transferred.

And the answer, then, is that this is how the Git protocol works. The
server says "here are all the refs I know about", then the client
decides what it wants from that list and asks the server to send the
necessary objects, after which it updates its local refs.

So the server will necessarily send all of the tags. Only the client
knows what it already has and whether any of them are new. And in the
default mode, which will fetch tags that point to commits we have, it is
checking each such new tag to see if it is worth fetching. Even if we
did not fetch new commits, we might see new tags that point to existing
commits.
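
The last point can be reproduced locally. What follows is only a minimal
sketch under assumptions: it builds two throwaway repositories in a temporary
directory and uses a file:// URL in place of kernel.org. The names "server",
"client" and "v-new", and the "commit" helper, are made up for illustration.

```shell
# Throwaway "server" and "client" repositories (hypothetical names).
tmp=$(mktemp -d)
srv="$tmp/server"; cli="$tmp/client"

# Helper (illustrative): create an empty commit with a fixed demo identity.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$2"
}

git init -q "$srv"
git -C "$srv" symbolic-ref HEAD refs/heads/master
commit "$srv" "first"
git clone -q "file://$srv" "$cli"

# A new tag appears on the server, pointing at a commit the client already has.
git -C "$srv" tag v-new

# With --no-tags the client ignores tags, so the new tag is not stored.
git -C "$cli" fetch -q --no-tags origin
after_no_tags=$(git -C "$cli" tag -l v-new)

# In the default mode the tag is picked up, although no commits are new.
git -C "$cli" fetch -q origin
after_default=$(git -C "$cli" tag -l v-new)

echo "after fetch --no-tags: '$after_no_tags'; after default fetch: '$after_default'"
rm -rf "$tmp"
```

Note that the sketch uses a plain "git fetch origin" rather than
"git fetch origin master", to stay with the simplest case of tag following.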

When you use "--no-tags", that explicitly says "do not bother with tags
at all". Recent versions of Git have a protocol extension where the
client can say "I am only interested in refs/heads/master; don't bother
telling me about other stuff". Since the client knows we do not care
about tags, it can use that extension to get a much smaller ref
advertisement from the server.

-Peff


* Re: Fetching too many tags?
  2023-08-10  6:08 Fetching too many tags? Ronan Pigott
  2023-08-11 18:09 ` Jeff King
@ 2023-08-11 22:06 ` Ronan Pigott
  2023-08-11 23:58   ` Jeff King
  2023-08-12  1:04   ` Ronan Pigott
  1 sibling, 2 replies; 5+ messages in thread
From: Ronan Pigott @ 2023-08-11 22:06 UTC
  To: Jeff King; +Cc: git

> And the answer, then, is that this is how the Git protocol works. The
> server says "here are all the refs I know about", then the client
> decides what it wants from that list and asks the server to send the
> necessary objects, after which it updates its local refs.

Thanks, this clears up some of my confusion. I had thought that the client sent
the server what we had and that the server would then decide what objects to
send over.

> When you use "--no-tags", that explicitly says "do not bother with tags
> at all". Recent versions of Git have a protocol extension where the
> client can say "I am only interested in refs/heads/master; don't bother
> telling me about other stuff". Since the client knows we do not care
> about tags, it can use that extension to get a much smaller ref
> advertisement from the server.

Do you mean the --negotiation-tip fetch option? In my experience, it doesn't
appear to have much of an effect in this case.

  $ time git fetch origin master
  From https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
   * branch                      master     -> FETCH_HEAD
  git fetch origin master  0.13s user 0.04s system 9% cpu 1.793 total
  $ time git fetch --negotiation-tip=master origin master
  From https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
   * branch                      master     -> FETCH_HEAD
  git fetch --negotiation-tip=master origin master  0.10s user 0.06s system 9% cpu 1.762 total

Is that because most of the tags point to commits reachable from master?

My prior (apparently incorrect) understanding of the fetch negotiation is based
on my interpretation of the description of this option in git-fetch(1):

> By default, Git will report, to the server, commits reachable from all local
> refs to find common commits in an attempt to reduce the size of the
> to-be-received packfile. If specified, Git will only report commits reachable
> from the given tips. This is useful to speed up fetches when the user knows
> which local ref is likely to have commits in common with the upstream ref being
> fetched.

Now, if I understand correctly, the report does not include the tags that we
already have? 

Cheers,
Ronan


* Re: Fetching too many tags?
  2023-08-11 22:06 ` Ronan Pigott
@ 2023-08-11 23:58   ` Jeff King
  2023-08-12  1:04   ` Ronan Pigott
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff King @ 2023-08-11 23:58 UTC
  To: Ronan Pigott; +Cc: git

On Fri, Aug 11, 2023 at 10:06:43PM +0000, Ronan Pigott wrote:

> > When you use "--no-tags", that explicitly says "do not bother with tags
> > at all". Recent versions of Git have a protocol extension where the
> > client can say "I am only interested in refs/heads/master; don't bother
> > telling me about other stuff". Since the client knows we do not care
> > about tags, it can use that extension to get a much smaller ref
> > advertisement from the server.
> 
> Do you mean the --negotiation-tip fetch option? In my experience, it doesn't
> appear to have much of an effect in this case.

No, the "negotiation" phase only happens when there are objects to
fetch, and the client and server have to agree on which ones. That's not
happening at all in your case (so --negotiation-tip won't have any
effect).

The feature I was thinking of is that in Git's "v2" protocol, the client
gets to speak first, and so it can say "btw, I am only interested in
these refs". v2 became the default in git v2.28 (of course both client
and server have to support it, but kernel.org is definitely up to date
there).

You can see it in action with something like this:

  GIT_TRACE_PACKET=1 git fetch --no-tags origin master

The "ref-prefix" lines are the client telling the server which prefixes
it's interested in (we have to ask for several variants because "master"
from the command line gets fully qualified based on what the other side
offers). Try it without --no-tags and you'll see a wider ref-prefix
request. If you try:

  GIT_TRACE_PACKET=1 git -c protocol.version=0 fetch --no-tags origin master

you'll see the full advertisement, even with --no-tags. In v0, the
server speaks first and just dumps its complete list of refs.
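
For anyone without a kernel checkout at hand, the same comparison can be
scripted against a scratch repository. This is a rough sketch, not a
definitive test: the repository names, the tag "v1" and the "trace_count"
helper are invented for the example, a file:// URL stands in for the real
remote, and the protocol version is passed explicitly so that the result does
not depend on local configuration.

```shell
# Scratch "server" with one commit and one tag, plus a clone of it.
tmp=$(mktemp -d)
srv="$tmp/server"; cli="$tmp/client"
git init -q "$srv"
git -C "$srv" symbolic-ref HEAD refs/heads/master
git -C "$srv" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "first"
git -C "$srv" tag v1
git clone -q "file://$srv" "$cli"

# Helper (illustrative): run an up-to-date fetch with packet tracing and count
# the trace lines matching a pattern.
# $1 = protocol version, $2 = grep pattern, remaining args = fetch options.
trace_count() {
    version=$1; pattern=$2; shift 2
    GIT_TRACE_PACKET=1 git -C "$cli" -c protocol.version="$version" \
        fetch "$@" origin master 2>&1 | grep -c "$pattern" || true
}

# v2: the bare "refs/tags/" prefix is requested only when tags are wanted.
v2_prefix_default=$(trace_count 2 'ref-prefix refs/tags/$')
v2_prefix_no_tags=$(trace_count 2 'ref-prefix refs/tags/$' --no-tags)

# Does the tag itself show up in the advertisement with --no-tags?
v2_adv_no_tags=$(trace_count 2 'refs/tags/v1' --no-tags)
v0_adv_no_tags=$(trace_count 0 'refs/tags/v1' --no-tags)

echo "v2 'refs/tags/' prefix lines: default=$v2_prefix_default no-tags=$v2_prefix_no_tags"
echo "lines naming refs/tags/v1 with --no-tags: v2=$v2_adv_no_tags v0=$v0_adv_no_tags"
rm -rf "$tmp"
```

The pattern is anchored with "$" because "master" is also expanded to a
"refs/tags/master" prefix, which would otherwise match in both runs.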

> > By default, Git will report, to the server, commits reachable from all local
> > refs to find common commits in an attempt to reduce the size of the
> > to-be-received packfile. If specified, Git will only report commits reachable
> > from the given tips. This is useful to speed up fetches when the user knows
> > which local ref is likely to have commits in common with the upstream ref being
> > fetched.
> 
> Now, if I understand correctly, the report does not include the tags that we
> already have? 

So there's no negotiation here at all, as I explained above. But when it
does happen, Git should use all refs, including tags and branches, to
try to reach a common point in the history graph. If you run with
GIT_TRACE_PACKET on a request that actually fetches objects, you'll see
"have" and "want" lines from the client.

For a vanilla fetch from a server you regularly fetch from, the
negotiation is pretty boring and fast (the client tells the server about
the old commit at the tip of the branch, and the server immediately says
"OK, I know about that").

A more interesting one is if you fetch the kernel from Linus's repo, and
then fetch from the stable kernel repo after that. Or maybe vice versa.
There you have two histories that share significant chunks, but also
have each diverged. So you should see the client and server dumping
sha1's at each other until they reach a common point. That's a case
where --negotiation-tip can sometimes speed things up.
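
A small-scale sketch of what --negotiation-tip changes in the "have" lines,
assuming a pair of scratch repositories rather than the two kernel trees. The
repository names, the "side" branch and the "commit" helper are hypothetical,
and with only a few commits there is of course no speed difference to
measure; the point is only which tips get reported.

```shell
tmp=$(mktemp -d)
srv="$tmp/server"; cli="$tmp/client"

# Helper (illustrative): create an empty commit with a fixed demo identity.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$2"
}

git init -q "$srv"
git -C "$srv" symbolic-ref HEAD refs/heads/master
commit "$srv" "first"
git clone -q "file://$srv" "$cli"

# Client-only history on a branch the server has never seen.
git -C "$cli" checkout -q -b side
commit "$cli" "local-only work"
base=$(git -C "$cli" rev-parse master)
side=$(git -C "$cli" rev-parse side)

# New upstream commit, so this fetch really has objects to negotiate over.
commit "$srv" "upstream work"
trace=$(GIT_TRACE_PACKET=1 git -C "$cli" fetch --negotiation-tip=master origin master 2>&1)

# Count what the client sent: "want" lines, and "have" lines for each tip.
wants=$(printf '%s\n' "$trace" | grep -c '> want ' || true)
have_base=$(printf '%s\n' "$trace" | grep -c "> have $base" || true)
have_side=$(printf '%s\n' "$trace" | grep -c "> have $side" || true)

echo "want lines: $wants, have(master tip): $have_base, have(side tip): $have_side"
rm -rf "$tmp"
```

With the tip restricted to master, the commit on "side" should not be offered
to the server at all, while the old tip of master still is.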

-Peff


* Re: Fetching too many tags?
  2023-08-11 22:06 ` Ronan Pigott
  2023-08-11 23:58   ` Jeff King
@ 2023-08-12  1:04   ` Ronan Pigott
  1 sibling, 0 replies; 5+ messages in thread
From: Ronan Pigott @ 2023-08-12  1:04 UTC
  To: Jeff King; +Cc: git

> No, the "negotiation" phase only happens when there are objects to
> fetch, and the client and server have to agree on which ones. That's not
> happening at all in your case (so --negotiation-tip won't have any
> effect).

Ah, I see.

> The feature I was thinking of is that in Git's "v2" protocol, the client
> gets to speak first, and so it can say "btw, I am only interested in
> these refs". v2 became the default in git v2.28 (of course both client
> and server have to support it, but kernel.org is definitely up to date
> there).
> 
> You can see it in action with something like this:
> 
>  GIT_TRACE_PACKET=1 git fetch --no-tags origin master
> 
> The "ref-prefix" lines are the client telling the server which prefixes
> it's interested in (we have to ask for several variants because "master"
> from the command line gets fully qualified based on what the other side
> offers). Try it without --no-tags and you'll see a wider ref-prefix
> request. If you try:

Thanks. I tried this and indeed without --no-tags there is an additional line

> 17:41:29.163545 pkt-line.c:86   packet:   git< ref-prefix refs/tags/

I understand now that this is why the server is telling me about all those tags.
I had thought it would only need to tell me about tags that point to something
reachable from master, and was confused why the server was advertising all the
tags.
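
A quick way to check that the advertisement is not filtered by reachability
is to list the tags of a scratch repository in which one tag points outside
master. This is a sketch under assumptions: the repository, the tag names
"v-on-master" and "v-side" and the "commit" helper are invented for the
example, and "git ls-remote --tags" is used here simply to print what the
server side lists for tags.

```shell
tmp=$(mktemp -d)
srv="$tmp/server"

# Helper (illustrative): create an empty commit with a fixed demo identity.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$2"
}

git init -q "$srv"
git -C "$srv" symbolic-ref HEAD refs/heads/master
commit "$srv" "first"
git -C "$srv" tag v-on-master

# Tag a commit on a side branch, then delete the branch so that only the tag
# refers to that commit; it is not reachable from master.
git -C "$srv" checkout -q -b side
commit "$srv" "side work"
git -C "$srv" tag v-side
git -C "$srv" checkout -q master
git -C "$srv" branch -q -D side

if git -C "$srv" merge-base --is-ancestor v-side master; then
    reachable=yes
else
    reachable=no
fi

# Both tags are listed, whether or not they are reachable from master.
advertised=$(git ls-remote --tags "file://$srv")
echo "v-side reachable from master: $reachable"
echo "$advertised"
rm -rf "$tmp"
```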

Thanks,
Ronan
