* Re: git gc expanding packed data?
@ 2009-08-08 1:11 Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
2009-08-09 2:56 ` Nicolas Pitre
0 siblings, 2 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-08-08 1:11 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Hin-Tak Leung, git
Nicolas Pitre <nico@cam.org> writes:
> It appears that the git installation serving clone requests for
> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> just cloned it and the pack I was sent contains 1383356 objects (can be
> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> However, there are only 978501 actually referenced objects in that
> cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
> for 404855 useless objects in the cloned repository.
Those objects are not useless. They are referenced by the remote refs
on the remote side, which are not fetched by default. If you clone a
mirror of the repository you'll see no unreferenced objects.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 26+ messages in thread
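The count of unreferenced objects quoted above is the difference between the two figures. A minimal sketch of that arithmetic follows; the helper name `unreferenced_count` is made up for illustration, and the subtraction assumes a fresh clone in which every reachable object sits in a pack and no object is stored twice.

```shell
# Objects present in the pack but not reachable from any local ref.
# $1 = objects listed in the pack index, $2 = objects reachable from refs.
# (unreferenced_count is a hypothetical helper, not a git command.)
unreferenced_count() {
    echo $(( $1 - $2 ))
}

# Figures reported for the gcc.git clone in the message above.
unreferenced_count 1383356 978501    # prints 404855

# Inside a real clone the two inputs could be gathered as follows
# (left commented out because it needs a repository to run in):
#   packed=$(for idx in .git/objects/pack/*.idx; do git show-index < "$idx"; done | wc -l)
#   reachable=$(git rev-list --all --objects | wc -l)
#   unreferenced_count "$packed" "$reachable"
```

The loop over `*.idx` files is only there so the sketch still works when a repository holds more than one pack.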
* Re: git gc expanding packed data?
2009-08-08 1:11 git gc expanding packed data? Andreas Schwab
@ 2009-08-08 13:05 ` Hin-Tak Leung
2009-08-08 13:25 ` Andreas Schwab
2009-08-09 2:56 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2009-08-08 13:05 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Nicolas Pitre, git

On Sat, Aug 8, 2009 at 2:11 AM, Andreas Schwab<schwab@linux-m68k.org> wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
>> It appears that the git installation serving clone requests for
>> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
>> just cloned it and the pack I was sent contains 1383356 objects (can be
>> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
>> However, there are only 978501 actually referenced objects in that
>> cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
>> for 404855 useless objects in the cloned repository.
>
> Those objects are not useless. They are referenced by the remote refs
> on the remote side, which are not fetched by default. If you clone a
> mirror of the repository you'll see no unreferenced objects.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>

Thanks... It is a difference between svn and git mentality probably -
one only pushes reasonably reliable code to a public git repository,
whereas anything transient is recorded in svn - I think many of the
unreferenced objects are svn user-branches (which are probably of use
to people who intend to work on gcc for fairly extended periods,
rather than casual users like me).

The case with gcc is probably quite extreme - many user branches, and
very large code base - but is there anything on the git side with git
gc which can lessen this kind of pathological behavior (expanding
packs)?

Thanks a lot for the explanation and the discussion.

Hin-Tak
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git gc expanding packed data?
2009-08-08 13:05 ` Hin-Tak Leung
@ 2009-08-08 13:25 ` Andreas Schwab
0 siblings, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-08-08 13:25 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: Nicolas Pitre, git

Hin-Tak Leung <hintak.leung@gmail.com> writes:

> Thanks... It is a difference between svn and git mentality probably -

It is just that the remote is a git-svn tree, with only a few branches
created as local branches.

> The case with gcc is probably quite extreme - many user branches, and
> very large code base - but is there anything on the git side with git
> gc which can lessen this kind of pathological behavior (expanding
> packs)?

If you fetch all refs, not only refs/heads/*, all objects will be
referenced.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git gc expanding packed data?
2009-08-08 1:11 git gc expanding packed data? Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
@ 2009-08-09 2:56 ` Nicolas Pitre
2009-08-09 7:43 ` Andreas Schwab
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-08-09 2:56 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Hin-Tak Leung, git

On Sat, 8 Aug 2009, Andreas Schwab wrote:

> Nicolas Pitre <nico@cam.org> writes:
>
> > It appears that the git installation serving clone requests for
> > git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> > just cloned it and the pack I was sent contains 1383356 objects (can be
> > determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> > However, there are only 978501 actually referenced objects in that
> > cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
> > for 404855 useless objects in the cloned repository.
>
> Those objects are not useless. They are referenced by the remote refs
> on the remote side, which are not fetched by default. If you clone a
> mirror of the repository you'll see no unreferenced objects.

If you do a clone using the git:// protocol and the server sends you
only the ref for the trunk branch, then it should send you only objects
reachable from that branch. Any extra objects sent by the server are
useless to me and waste my and everyone else's bandwidth, and on my
next repack those objects are pruned anyway.

The point of the git protocol is _not_ necessarily to send a copy of
the remote pack file over, even during a clone.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git gc expanding packed data?
2009-08-09 2:56 ` Nicolas Pitre
@ 2009-08-09 7:43 ` Andreas Schwab
2009-09-25 18:05 ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
0 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2009-08-09 7:43 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Hin-Tak Leung, git

Nicolas Pitre <nico@cam.org> writes:

> If you do a clone using the git:// protocol and the server sends you
> only the ref for the trunk branch,

A clone will fetch all branches from refs/heads/*.

> then it should send you only objects reachable from that branch.

Apparently this does not work. I'd guess the extra objects are needed
due to the delta compression.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects (was : git gc expanding packed data?)
2009-08-09 7:43 ` Andreas Schwab
@ 2009-09-25 18:05 ` Jason Merrill
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
0 siblings, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 18:05 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Nicolas Pitre, Hin-Tak Leung, git

On 08/09/2009 03:43 AM, Andreas Schwab wrote:
> Nicolas Pitre<nico@cam.org> writes:
>
>> If you do a clone using the git:// protocol and the server sends you
>> only the ref for the trunk branch,
>
> A clone will fetch all branches from refs/heads/*.
>
>> then it should send you only objects reachable from that branch.
>
> Apparently this does not work. I'd guess the extra objects are needed
> due to the delta compression.

I just tried doing a clone of the GCC repository, then git gc
--prune=now, and another clone specifying --reference to the first,
and it wanted to download all the unreachable objects again. So it
doesn't seem to be a compression issue.

This is with git 1.6.4 on both ends.

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 18:05 ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
@ 2009-09-25 19:34 ` Matthieu Moy
2009-09-25 19:43 ` Jason Merrill
2009-09-25 19:53 ` Nicolas Pitre
0 siblings, 2 replies; 26+ messages in thread
From: Matthieu Moy @ 2009-09-25 19:34 UTC (permalink / raw)
To: Jason Merrill; +Cc: git, Nicolas Pitre, Hin-Tak Leung

Jason Merrill <jason@redhat.com> writes:

> On 08/09/2009 03:43 AM, Andreas Schwab wrote:
>> Nicolas Pitre<nico@cam.org> writes:
>>
>>> If you do a clone using the git:// protocol and the server sends you
>>> only the ref for the trunk branch,
>>
>> A clone will fetch all branches from refs/heads/*.
>>
>>> then it should send you only objects reachable from that branch.
>>
>> Apparently this does not work. I'd guess the extra objects are needed
>> due to the delta compression.
>
> I just tried doing a clone of the GCC repository, then git gc
> --prune=now, and another clone specifying --reference to the first,
> and it wanted to download all the unreachable objects again. So it
> doesn't seem to be a compression issue.
>
> This is with git 1.6.4 on both ends.

Which protocol did you use?

If you use git:// or ssh://, it's normally a security feature that Git
sends you only reachable objects. If it doesn't, it's a serious bug.

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
@ 2009-09-25 19:43 ` Jason Merrill
2009-09-25 19:53 ` Nicolas Pitre
1 sibling, 0 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 19:43 UTC (permalink / raw)
To: Matthieu Moy; +Cc: git, Nicolas Pitre, Hin-Tak Leung

On 09/25/2009 03:34 PM, Matthieu Moy wrote:
> Which protocol did you use?

git://

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
2009-09-25 19:43 ` Jason Merrill
@ 2009-09-25 19:53 ` Nicolas Pitre
2009-09-25 20:20 ` Jason Merrill
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-25 19:53 UTC (permalink / raw)
To: Matthieu Moy; +Cc: Jason Merrill, git, Hin-Tak Leung

On Fri, 25 Sep 2009, Matthieu Moy wrote:

> Jason Merrill <jason@redhat.com> writes:
>
> > On 08/09/2009 03:43 AM, Andreas Schwab wrote:
> >> Nicolas Pitre<nico@cam.org> writes:
> >>
> >>> If you do a clone using the git:// protocol and the server sends you
> >>> only the ref for the trunk branch,
> >>
> >> A clone will fetch all branches from refs/heads/*.
> >>
> >>> then it should send you only objects reachable from that branch.
> >>
> >> Apparently this does not work. I'd guess the extra objects are needed
> >> due to the delta compression.
> >
> > I just tried doing a clone of the GCC repository, then git gc
> > --prune=now, and another clone specifying --reference to the first,
> > and it wanted to download all the unreachable objects again. So it
> > doesn't seem to be a compression issue.
> >
> > This is with git 1.6.4 on both ends.
>
> Which protocol did you use?
>
> If you use git:// or ssh://, it's normally a security feature that Git
> sends you only reachable objects. If it doesn't, it's a serious bug.

I did reproduce the issue with git:// back when this discussion started.
I also asked for more information about the remote which didn't come
forth.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 19:53 ` Nicolas Pitre
@ 2009-09-25 20:20 ` Jason Merrill
2009-09-25 20:47 ` Nicolas Pitre
2009-09-26 0:43 ` Hin-Tak Leung
0 siblings, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 20:20 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
> I did reproduce the issue with git:// back when this discussion started.
> I also asked for more information about the remote which didn't come
> forth.

Looking back, I only see you asking about the git version on the
server, which is 1.6.4.

So again:

git clone git://gcc.gnu.org/git/gcc.git
(1399509 objects, ~600MB .git dir)
git gc --prune=now (988906 objects, ~450MB .git dir)

...then

git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
(573401 objects, ~550MB .git dir)
git fsck (clean)
git gc --prune=now (5 objects, ~7MB .git dir)

What's going on here?

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 20:20 ` Jason Merrill
@ 2009-09-25 20:47 ` Nicolas Pitre
2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:43 ` Hin-Tak Leung
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-25 20:47 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Fri, 25 Sep 2009, Jason Merrill wrote:

> On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
> > I did reproduce the issue with git:// back when this discussion started.
> > I also asked for more information about the remote which didn't come
> > forth.
>
> Looking back, I only see you asking about the git version on the server, which
> is 1.6.4.
>
> So again:
>
> git clone git://gcc.gnu.org/git/gcc.git
> (1399509 objects, ~600MB .git dir)
> git gc --prune=now (988906 objects, ~450MB .git dir)
>
> ...then
>
> git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
> (573401 objects, ~550MB .git dir)
> git fsck (clean)
> git gc --prune=now (5 objects, ~7MB .git dir)
>
> What's going on here?

Some screw up.

Do you have access to the remote machine? Is it possible to have a
tarball of the gcc.git directory from there?

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 20:47 ` Nicolas Pitre
@ 2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:49 ` Nicolas Pitre
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
0 siblings, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 23:17 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> Do you have access to the remote machine? Is it possible to have a
> tarball of the gcc.git directory from there?

http://gcc.gnu.org/gcc-git.tar.gz

I'll leave it there for a few days.

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 23:17 ` Jason Merrill
@ 2009-09-26 0:49 ` Nicolas Pitre
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-26 0:49 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Fri, 25 Sep 2009, Jason Merrill wrote:

> On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> > Do you have access to the remote machine? Is it possible to have a
> > tarball of the gcc.git directory from there?
>
> http://gcc.gnu.org/gcc-git.tar.gz
>
> I'll leave it there for a few days.

Thanks, I got it now. And I was able to reproduce the issue locally.

Cloning the original repository does transfer objects which become
unreferenced in the clone. But cloning that cloned repository (before
pruning the unreferenced objects) does not transfer those objects again.

Just need to find out why.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 0:49 ` Nicolas Pitre
@ 2009-09-26 3:54 ` Nicolas Pitre
2009-09-26 7:21 ` Andreas Schwab
2009-09-26 19:50 ` Shawn O. Pearce
0 siblings, 2 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-26 3:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Current behavior of 'git clone' when not using --mirror is to fetch
everything from the peer, and then filter out unwanted refs just before
writing them out to the cloned repository. This may become highly
inefficient if the peer has an unusual ref namespace, or if it simply
has "remotes" refs of its own, and those locally unwanted refs are
connecting to a large set of objects which becomes unreferenced as soon
as they are fetched.

Let's filter out those unwanted refs from the peer _before_ asking it
what refs we want to fetch instead, which is the most logical thing to
do anyway.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---

On Fri, 25 Sep 2009, Nicolas Pitre wrote:

> On Fri, 25 Sep 2009, Jason Merrill wrote:
>
> > On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> > > Do you have access to the remote machine? Is it possible to have a
> > > tarball of the gcc.git directory from there?
> >
> > http://gcc.gnu.org/gcc-git.tar.gz
> >
> > I'll leave it there for a few days.
>
> Thanks, I got it now. And I was able to reproduce the issue locally.
>
> Cloning the original repository does transfer objects which become
> unreferenced in the clone. But cloning that cloned repository (before
> pruning the unreferenced objects) does not transfer those objects again.
>
> Just need to find out why.

And the "why" is described above. The problem was actually on the
client side and was affecting clones of any repository containing
anything outside refs/heads and refs/tags.

The fact that the git repository on gcc.gnu.org has lots of stuff in
"remote" branches that don't get cloned by default is a separate
configuration/policy issue on that server which might need (or not) to
be looked into. For instance at least, as a bare repository, it should
have all the git files in gcc.git/ directly instead of gcc.git/.git/.

diff --git a/builtin-clone.c b/builtin-clone.c
index bab2d84..edf7c7f 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -329,24 +329,28 @@ static void remove_junk_on_signal(int signo)
 	raise(signo);
 }
 
-static struct ref *write_remote_refs(const struct ref *refs,
-		struct refspec *refspec, const char *reflog)
+static struct ref *wanted_peer_refs(const struct ref *refs,
+		struct refspec *refspec)
 {
 	struct ref *local_refs = NULL;
 	struct ref **tail = &local_refs;
-	struct ref *r;
 
 	get_fetch_map(refs, refspec, &tail, 0);
 	if (!option_mirror)
 		get_fetch_map(refs, tag_refspec, &tail, 0);
 
+	return local_refs;
+}
+
+static void write_remote_refs(const struct ref *local_refs, const char *reflog)
+{
+	const struct ref *r;
+
 	for (r = local_refs; r; r = r->next)
 		add_extra_ref(r->peer_ref->name, r->old_sha1, 0);
 
 	pack_refs(PACK_REFS_ALL);
 	clear_extra_refs();
-
-	return local_refs;
 }
 
 int cmd_clone(int argc, const char **argv, const char *prefix)
@@ -495,9 +499,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	strbuf_reset(&value);
 
-	if (path && !is_bundle)
+	if (path && !is_bundle) {
 		refs = clone_local(path, git_dir);
-	else {
+		mapped_refs = wanted_peer_refs(refs, refspec);
+	} else {
 		struct remote *remote = remote_get(argv[0]);
 		transport = transport_get(remote, remote->url[0]);
 
@@ -520,14 +525,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 					 option_upload_pack);
 
 		refs = transport_get_remote_refs(transport);
-		if (refs)
-			transport_fetch_refs(transport, refs);
+		if (refs) {
+			mapped_refs = wanted_peer_refs(refs, refspec);
+			transport_fetch_refs(transport, mapped_refs);
+		}
 	}
 
 	if (refs) {
 		clear_extra_refs();
 
-		mapped_refs = write_remote_refs(refs, refspec, reflog_msg.buf);
+		write_remote_refs(mapped_refs, reflog_msg.buf);
 
 		remote_head = find_ref_by_name(refs, "HEAD");
 		remote_head_points_at =
^ permalink raw reply related [flat|nested] 26+ messages in thread
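The reordering described in the patch above (select the wanted refs first, then request only those) can be pictured outside of C with a small shell filter. This is a rough sketch under stated assumptions, not the code path git uses: the function below merely borrows the name `wanted_peer_refs` from the patch, and the ids and ref names in the sample advertisement are made up.

```shell
# Keep only the advertised refs a default (non-mirror) clone stores:
# branches under refs/heads/ and tags under refs/tags/.
# Input: one "<id> <refname>" pair per line on stdin.
wanted_peer_refs() {
    while read -r sha ref; do
        case "$ref" in
            refs/heads/*|refs/tags/*) printf '%s %s\n' "$sha" "$ref" ;;
        esac
    done
}

# Hypothetical ref advertisement from a server that also keeps
# refs of its own under refs/remotes/.
advertised='1111111 refs/heads/master
2222222 refs/heads/trunk
3333333 refs/remotes/example-user-branch
4444444 refs/tags/example-tag'

# Only the filtered list would be handed to the fetch step, so objects
# reachable solely from refs/remotes/* are never requested.
printf '%s\n' "$advertised" | wanted_peer_refs
```

Before the patch, the equivalent of the unfiltered `advertised` list was what got requested, which is why the extra objects arrived and then became unreferenced.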
* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
@ 2009-09-26 7:21 ` Andreas Schwab
2009-09-26 19:50 ` Shawn O. Pearce
1 sibling, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-09-26 7:21 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Nicolas Pitre <nico@fluxnic.net> writes:

> The fact that the git repository on gcc.gnu.org has lots of stuff in
> "remote" branches that don't get cloned by default is a separate
> configuration/policy issue on that server which might need (or not) to
> be looked into. For instance at least, as a bare repository, it should
> have all the git files in gcc.git/ directly instead of gcc.git/.git/.

The remote is just a git-svn tree.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26 7:21 ` Andreas Schwab
@ 2009-09-26 19:50 ` Shawn O. Pearce
2009-09-27 0:26 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2009-09-26 19:50 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Nicolas Pitre <nico@fluxnic.net> wrote:
> Current behavior of 'git clone' when not using --mirror is to fetch
> everything from the peer, and then filter out unwanted refs just before
> writing them out to the cloned repository. This may become highly
> inefficient if the peer has an unusual ref namespace, or if it simply
> has "remotes" refs of its own, and those locally unwanted refs are
> connecting to a large set of objects which becomes unreferenced as soon
> as they are fetched.
...
> +static void write_remote_refs(const struct ref *local_refs, const char *reflog)

Here reflog is now unused. I'm going to squash this in.

diff --git a/builtin-clone.c b/builtin-clone.c
index edf7c7f..4992c25 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -342,7 +342,7 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
 	return local_refs;
 }
 
-static void write_remote_refs(const struct ref *local_refs, const char *reflog)
+static void write_remote_refs(const struct ref *local_refs)
 {
 	const struct ref *r;
 
@@ -534,7 +534,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (refs) {
 		clear_extra_refs();
 
-		write_remote_refs(mapped_refs, reflog_msg.buf);
+		write_remote_refs(mapped_refs);
 
 		remote_head = find_ref_by_name(refs, "HEAD");
 		remote_head_points_at =

--
Shawn.
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 19:50 ` Shawn O. Pearce
@ 2009-09-27 0:26 ` Nicolas Pitre
0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 0:26 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@fluxnic.net> wrote:
> > Current behavior of 'git clone' when not using --mirror is to fetch
> > everything from the peer, and then filter out unwanted refs just before
> > writing them out to the cloned repository. This may become highly
> > inefficient if the peer has an unusual ref namespace, or if it simply
> > has "remotes" refs of its own, and those locally unwanted refs are
> > connecting to a large set of objects which becomes unreferenced as soon
> > as they are fetched.
> ...
> > +static void write_remote_refs(const struct ref *local_refs, const char *reflog)
>
> Here reflog is now unused. I'm going to squash this in.

Yeah, I noticed. Since I didn't know what was the original intent for
it, I just left it there.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:49 ` Nicolas Pitre
@ 2009-09-26 4:44 ` Jason Merrill
2009-09-26 13:33 ` Jason Merrill
2009-09-27 1:27 ` Nicolas Pitre
1 sibling, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-26 4:44 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

Incidentally, somewhat related to this issue, I've noticed that if I
fetch a branch which I don't currently have in my repository, and I
have most of the commits on that branch in my object store (or in an
alternate repository) but not the most recent commit, git fetch isn't
smart enough to only grab the commits I'm actually missing, it wants to
fetch much more.

I would expect that since the clone pulled down everything in the
gcc.git repository, I could then do

git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
git fetch

and have all the branches, not just the ones in refs/heads. But when I
do this git fetch wants to fetch some 500k redundant objects.

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
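For readers unfamiliar with the refspec used in the message above, the following sketch shows how a simple `SRC/*:DST/*` pattern renames a remote ref on its way into the local repository. It is an illustration only: `map_ref` is a hypothetical helper, the branch name is invented, and real refspecs have features (such as a leading `+`) that this does not handle.

```shell
# Map a remote ref name through a refspec of the form 'SRC/*:DST/*'.
# Prints the local name, or fails if the ref is outside the source pattern.
# (map_ref is a hypothetical helper written for this illustration.)
map_ref() {
    refspec=$1
    ref=$2
    src=${refspec%%:*}        # left-hand side, e.g. refs/remotes/*
    dst=${refspec#*:}         # right-hand side, e.g. refs/remotes/origin/*
    src_prefix=${src%\*}      # drop the trailing wildcard
    dst_prefix=${dst%\*}
    case "$ref" in
        "$src_prefix"*) printf '%s\n' "$dst_prefix${ref#"$src_prefix"}" ;;
        *) return 1 ;;
    esac
}

# The refspec from the message, applied to an invented branch name.
map_ref 'refs/remotes/*:refs/remotes/origin/*' refs/remotes/example-user-branch
# prints refs/remotes/origin/example-user-branch
```

With the default clone refspec (source side `refs/heads/*`), the same ref would not match at all, which is why those branches are absent from an ordinary clone.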
* Re: git clone sending unneeded objects
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
@ 2009-09-26 13:33 ` Jason Merrill
2009-09-27 2:26 ` Nicolas Pitre
2009-09-27 1:27 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-26 13:33 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

On 09/26/2009 12:44 AM, Jason Merrill wrote:
> git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> git fetch

git count-objects -v before:

count: 44
size: 1768
in-pack: 1399509
packs: 1
size-pack: 600456
prune-packable: 0
garbage: 0

and after (transferred 278MB):

count: 44
size: 1768
in-pack: 1947339
packs: 2
size-pack: 1178408
prune-packable: 8
garbage: 0

and then after git gc --prune=now:

count: 0
size: 0
in-pack: 1399613
packs: 1
size-pack: 839900
prune-packable: 0
garbage: 0

So I only actually needed 104 more objects, but fetch wasn't clever
enough to see that, and my new pack is much less efficient.

I've run into the same issue using alternates to set up multiple
working directories for different branches; if the alternate directory
isn't completely up-to-date, fetch wants to pull down lots of data
again rather than use what I have and only fetch the last one or two
commits.

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
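The "104 more objects" figure above comes from comparing the `in-pack` fields of the three reports. A small sketch of that comparison, feeding abbreviated copies of the quoted reports to a hypothetical `in_pack` helper (in a real repository the input would come from `git count-objects -v` instead):

```shell
# Extract the "in-pack" field from `git count-objects -v` style output
# read on stdin. (in_pack is a hypothetical helper, not a git command.)
in_pack() {
    awk -F': ' '$1 == "in-pack" { print $2 }'
}

# Abbreviated copies of the three reports quoted in the message above.
before=$(printf 'count: 44\nsize: 1768\nin-pack: 1399509\npacks: 1\n' | in_pack)
after_fetch=$(printf 'count: 44\nsize: 1768\nin-pack: 1947339\npacks: 2\n' | in_pack)
after_gc=$(printf 'count: 0\nsize: 0\nin-pack: 1399613\npacks: 1\n' | in_pack)

echo "objects added by the fetch:  $(( after_fetch - before ))"   # 547830
echo "objects kept after the gc:   $(( after_gc - before ))"      # 104
```

The first number counts pack entries, so it includes objects that were already present in the older pack.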
* Re: git clone sending unneeded objects
2009-09-26 13:33 ` Jason Merrill
@ 2009-09-27 2:26 ` Nicolas Pitre
0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 2:26 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Jason Merrill wrote:

> On 09/26/2009 12:44 AM, Jason Merrill wrote:
> > git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> > git fetch
>
> git count-objects -v before:
>
> count: 44
> size: 1768
> in-pack: 1399509
> packs: 1
> size-pack: 600456
> prune-packable: 0
> garbage: 0

I'm sure if you had done 'git rev-list --all --objects | wc -l' at that
point, the result would have been something around 900000. That's the
actual number of objects git had a reference to, compared to the total
objects contained in the object store.

> and after (transferred 278MB):
>
> count: 44
> size: 1768
> in-pack: 1947339
> packs: 2
> size-pack: 1178408
> prune-packable: 8
> garbage: 0

And those 500000 extra objects or so (minus a couple dozen which were
probably used to "complete" the fetched thin pack and are duplicates of
local objects -- the fetch progress message gave the exact number) were
obtained from the remote repository because git has no way to tell the
remote it already had them. That's what I was explaining in my previous
email.

> and then after git gc --prune=now:
>
> count: 0
> size: 0
> in-pack: 1399613
> packs: 1
> size-pack: 839900
> prune-packable: 0
> garbage: 0
>
> So I only actually needed 104 more objects, but fetch wasn't clever enough to
> see that, and my new pack is much less efficient.

Like I said, it's not that the fetch wasn't clever enough. Rather, your
initial clone asked for way too many objects in the first place. That's
what my patch fixed.

Now the pack efficiency can be explained as well. A single pack is
always going to be more efficient than 2 packs. The problem is that
when you do a gc, by default git does the least costly operation, which
consists of copying as much data as possible from existing packs
without extra processing. That means that many objects were copied from
the second (newly received) pack although a better delta representation
was most probably available in the other, larger pack (remember that
most objects from that second pack already existed in the first pack).
Git does select the second pack in preference to the other pack because
it is more recent, and normally more recent packs contain more recent
objects, which is a good heuristic to optimize the object enumeration.
In this case this didn't produce a good result, but again we're talking
about a scenario which is bogus from the start and shouldn't be.

So if you do a 'git gc --aggressive' and let it run for a while, you
should get back a smaller pack, possibly even much smaller than the
original one.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
2009-09-26 13:33 ` Jason Merrill
@ 2009-09-27 1:27 ` Nicolas Pitre
2009-09-27 2:04 ` Shawn O. Pearce
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 1:27 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Jason Merrill wrote:

> Incidentally, somewhat related to this issue, I've noticed that if I fetch a
> branch which I don't currently have in my repository, and I have most of the
> commits on that branch in my object store (or in an alternate repository) but
> not the most recent commit, git fetch isn't smart enough to only grab the
> commits I'm actually missing, it wants to fetch much more.
>
> I would expect that since the clone pulled down everything in the gcc.git
> repository, I could then do
>
> git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> git fetch
>
> and have all the branches, not just the ones in refs/heads. But when I do
> this git fetch wants to fetch some 500k redundant objects.

Well... Assuming a fixed git using the patch I posted yesterday, my
clone of gcc.git has 988941 objects. The source repository used for the
clone has 1399551 objects. Of course the source repo has more objects
because it has extra branches in the refs/remotes/ namespace that the
clone didn't fetch. If you wish to also fetch those branches as you
illustrated above then you'll get the difference, i.e. 410610
additional objects.

And even if the broken clone (before my patch) did pull everything from
gcc.git, in the cloned repository those 410610 extra objects are
considered as garbage because nothing actually references them. So even
if you decide to fetch the extra branches that the initial clone didn't
pick up, or if you do reference that repository with "garbage" objects
for another clone to which you want to add those extra branches, git has
no way to know that it already had access to those objects locally and
"ungarbage" them as they aren't referenced. Result is a useless fetch
of 410610 objects that you already have, but that you weren't supposed
to have in the first place.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-27 1:27 ` Nicolas Pitre
@ 2009-09-27 2:04 ` Shawn O. Pearce
2009-09-27 2:31 ` Nicolas Pitre
2009-09-27 4:35 ` Jason Merrill
0 siblings, 2 replies; 26+ messages in thread
From: Shawn O. Pearce @ 2009-09-27 2:04 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Nicolas Pitre <nico@fluxnic.net> wrote:
> And even if the broken clone (before my patch) did pull everything from
> gcc.git, in the cloned repository those 410610 extra objects are
> considered as garbage because nothing actually references them. So even
> if you decide to fetch the extra branches that the initial clone didn't
> pick up, or if you do reference that repository with "garbage" objects
> for another clone to which you want to add those extra branches, git has
> no way to know that it already had access to those objects locally and
> "ungarbage" them as they aren't referenced. Result is a useless fetch
> of 410610 objects that you already have, but that you weren't supposed
> to have in the first place.

Just to clarify a minor nit:

Actually, if those refs have not changed, quickfetch should kick in
and realize that all 410610 objects are reachable locally without
errors, permitting the client to avoid the object transfer.

However, if *ANY* of those refs were to change to something you
don't actually have, quickfetch would fail, and we would need to
fetch all 410610 objects.

--
Shawn.
^ permalink raw reply [flat|nested] 26+ messages in thread
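The all-or-nothing behaviour described in the message above can be modelled in a few lines of shell. This is a deliberately simplified sketch, not how git implements the check: `quickfetch_ok` is a hypothetical function, the ids are invented, and the model only tests whether each wanted tip is in a list of locally complete ids, whereas the real check is about everything reachable from those tips.

```shell
# quickfetch_ok HAVE TIP...
# HAVE is a space-separated list of ids already complete locally.
# Succeeds only if every requested TIP is in that list.
# (hypothetical model of the shortcut, not git's implementation)
quickfetch_ok() {
    have=" $1 "
    shift
    for tip in "$@"; do
        case "$have" in
            *" $tip "*) ;;        # this tip is already available locally
            *) return 1 ;;        # one missing tip defeats the shortcut
        esac
    done
    return 0
}

have="aaa111 bbb222 ccc333"       # invented ids standing in for ref tips

if quickfetch_ok "$have" aaa111 bbb222; then
    echo "refs unchanged: no object transfer needed"
fi

if ! quickfetch_ok "$have" aaa111 ddd444; then
    echo "one ref moved: fall back to a normal fetch"
fi
```

In the thread's scenario the fallback is expensive because the objects already on disk are unreferenced, so the client cannot advertise them to the server as something it has.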
* Re: git clone sending unneeded objects
2009-09-27 2:04 ` Shawn O. Pearce
@ 2009-09-27 2:31 ` Nicolas Pitre
2009-09-27 4:35 ` Jason Merrill
1 sibling, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 2:31 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@fluxnic.net> wrote:
> > And even if the broken clone (before my patch) did pull everything from
> > gcc.git, in the cloned repository those 410610 extra objects are
> > considered as garbage because nothing actually reference them. So even
> > if you decide to fetch the extra branches that the initial clone didn't
> > pick up, or if you do reference that repository with "garbage" objects
> > for another clone to which you want to add those extra branches, git has
> > no way to know that it already had access to those objects locally and
> > "ungarbage" them as they aren't referenced. Result is a useless fetch
> > of 410610 objects that you already have, but that you weren't supposed
> > to have in the first place.
>
> Just to clarify a minor nit:
>
> Actually, if those refs have not changed, quickfetch should kick in
> and realize that all 410610 objects are reachable locally without
> errors, permitting the client to avoid the object transfer.
>
> However, if *ANY* of those refs were to change to something you
> don't actually have, quickfetch would fail, and we would need to
> fetch all 410610 objects.

Right. But since we're talking about a git mirror for the gcc svn repo
and gcc is a rather active project, the likelihood of any ref changing
at any time is rather high.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-27 2:04 ` Shawn O. Pearce
2009-09-27 2:31 ` Nicolas Pitre
@ 2009-09-27 4:35 ` Jason Merrill
2009-09-28 4:18 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-27 4:35 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Nicolas Pitre, Matthieu Moy, git, Hin-Tak Leung

On 09/26/2009 10:04 PM, Shawn O. Pearce wrote:
> Actually, if those refs have not changed, quickfetch should kick in
> and realize that all 410610 objects are reachable locally without
> errors, permitting the client to avoid the object transfer.
>
> However, if *ANY* of those refs were to change to something you
> don't actually have, quickfetch would fail, and we would need to
> fetch all 410610 objects.

Right. That seems unfortunate to me; couldn't fetch do a bit more
checking before it decides to download the whole world again?

Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-27 4:35 ` Jason Merrill
@ 2009-09-28 4:18 ` Nicolas Pitre
0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-28 4:18 UTC (permalink / raw)
To: Jason Merrill; +Cc: Shawn O. Pearce, Matthieu Moy, git, Hin-Tak Leung

On Sun, 27 Sep 2009, Jason Merrill wrote:
> On 09/26/2009 10:04 PM, Shawn O. Pearce wrote:
> > Actually, if those refs have not changed, quickfetch should kick in
> > and realize that all 410610 objects are reachable locally without
> > errors, permitting the client to avoid the object transfer.
> >
> > However, if *ANY* of those refs were to change to something you
> > don't actually have, quickfetch would fail, and we would need to
> > fetch all 410610 objects.
>
> Right. That seems unfortunate to me; couldn't fetch do a bit more checking
> before it decides to download the whole world again?

The quickfetch test could be turned into a filter, so that refs that are
already available locally are simply not fetched, on a per-ref basis.
But that would be a rather expensive test which couldn't keep its "quick"
qualifier anymore, and all that for a case that normally shouldn't have
happened anyway if git didn't have a bug in its clone operation, as I've
explained already.

Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
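[Editor's sketch] The per-ref filter Nicolas mentions could look roughly like this in a toy model. It only illustrates the idea and is not a proposal for git's implementation: `is_complete`, `refs_still_needed`, the object ids and the ref names are all hypothetical.

```python
# Toy model only; names and structures are illustrative, not git internals.

def is_complete(objects, tip):
    """True if tip and everything it points at are present in objects."""
    seen = set()
    stack = [tip]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        if oid not in objects:
            return False
        seen.add(oid)
        stack.extend(objects[oid])
    return True


def refs_still_needed(objects, wanted):
    """Keep only the refs whose tip is not fully available locally."""
    # One connectivity walk per ref: this is the cost the message warns about.
    return {name: tip for name, tip in wanted.items()
            if not is_complete(objects, tip)}


local = {"c1": [], "c2": ["c1"], "u1": ["c1"]}
wanted = {
    "refs/heads/trunk": "c2",
    "refs/heads/user-a": "u1",  # unchanged, already present locally
    "refs/heads/user-b": "u2",  # moved to something not present locally
}

needed = refs_still_needed(local, wanted)
print(needed)  # {'refs/heads/user-b': 'u2'}
```

Unlike a single all-or-nothing check, this walks the history once for each requested ref, which on a repository with many branches is why the test would no longer deserve the name "quick".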
* Re: git clone sending unneeded objects
2009-09-25 20:20 ` Jason Merrill
2009-09-25 20:47 ` Nicolas Pitre
@ 2009-09-26 0:43 ` Hin-Tak Leung
1 sibling, 0 replies; 26+ messages in thread
From: Hin-Tak Leung @ 2009-09-26 0:43 UTC (permalink / raw)
To: Jason Merrill; +Cc: Nicolas Pitre, Matthieu Moy, git

On Fri, Sep 25, 2009 at 9:20 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
>>
>> I did reproduce the issue with git:// back when this discussion started.
>> I also asked for more information about the remote which didn't come
>> forth.
>
> Looking back, I only see you asking about the git version on the server,
> which is 1.6.4.

Hmm, I was under the impression from the previous thread that the server
is a bit older and/or has more backward-compatible settings to cater for
older git clients?

>
> So again:
>
> git clone git://gcc.gnu.org/git/gcc.git
> (1399509 objects, ~600MB .git dir)
> git gc --prune=now (988906 objects, ~450MB .git dir)
>
> ...then
>
> git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
> (573401 objects, ~550MB .git dir)
> git fsck (clean)
> git gc --prune=now (5 objects, ~7MB .git dir)
>
> What's going on here?

FWIW, I still have my clone (git://) and do my periodic 'git fetch' and
'git gc --prune=now' (learned my lessons!), and its .git dir is currently
about 350MB. (From the previous discussion the optimal size at the time
was about 300MB, so it has grown a bit in the last couple of months.)

And thanks everybody for all the discussion and advice. git is a great
tool. (And I have essentially stopped using svn, preferring git-svn!)
^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2009-08-08 1:11 git gc expanding packed data? Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
2009-08-08 13:25 ` Andreas Schwab
2009-08-09 2:56 ` Nicolas Pitre
2009-08-09 7:43 ` Andreas Schwab
2009-09-25 18:05 ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
2009-09-25 19:43 ` Jason Merrill
2009-09-25 19:53 ` Nicolas Pitre
2009-09-25 20:20 ` Jason Merrill
2009-09-25 20:47 ` Nicolas Pitre
2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:49 ` Nicolas Pitre
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26 7:21 ` Andreas Schwab
2009-09-26 19:50 ` Shawn O. Pearce
2009-09-27 0:26 ` Nicolas Pitre
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
2009-09-26 13:33 ` Jason Merrill
2009-09-27 2:26 ` Nicolas Pitre
2009-09-27 1:27 ` Nicolas Pitre
2009-09-27 2:04 ` Shawn O. Pearce
2009-09-27 2:31 ` Nicolas Pitre
2009-09-27 4:35 ` Jason Merrill
2009-09-28 4:18 ` Nicolas Pitre
2009-09-26 0:43 ` Hin-Tak Leung