* Re: git gc expanding packed data?
@ 2009-08-08 1:11 Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
2009-08-09 2:56 ` Nicolas Pitre
0 siblings, 2 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-08-08 1:11 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Hin-Tak Leung, git
Nicolas Pitre <nico@cam.org> writes:
> It appears that the git installation serving clone requests for
> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> just cloned it and the pack I was sent contains 1383356 objects (can be
> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> However, there are only 978501 actually referenced objects in that
> cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
> for 404855 useless objects in the cloned repository.
Those objects are not useless. They are referenced by the remote refs
on the remote side, which are not fetched by default. If you clone a
mirror of the repository you'll see no unreferenced objects.
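
For reference, the two counts quoted above can be compared with a small shell sketch. The helper name `unreferenced_count` is made up for this illustration, and the commented pipelines assume a fresh clone that contains exactly one pack.

```sh
#!/bin/sh
# Hypothetical helper: number of packed objects that no local ref reaches.
# $1 = objects stored in the pack, $2 = objects reachable from local refs.
unreferenced_count() {
    echo $(( $1 - $2 ))
}

# In a fresh single-pack clone the inputs could be gathered with the
# commands quoted in the message above (not run here):
#   packed=$(git show-index < .git/objects/pack/*.idx | wc -l)
#   reachable=$(git rev-list --all --objects | wc -l)
#   unreferenced_count "$packed" "$reachable"

# With the numbers reported in this thread:
unreferenced_count 1383356 978501   # prints 404855
```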
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git gc expanding packed data?
2009-08-08 1:11 git gc expanding packed data? Andreas Schwab
@ 2009-08-08 13:05 ` Hin-Tak Leung
2009-08-08 13:25 ` Andreas Schwab
2009-08-09 2:56 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2009-08-08 13:05 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Nicolas Pitre, git
On Sat, Aug 8, 2009 at 2:11 AM, Andreas Schwab<schwab@linux-m68k.org> wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
>> It appears that the git installation serving clone requests for
>> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
>> just cloned it and the pack I was sent contains 1383356 objects (can be
>> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
>> However, there are only 978501 actually referenced objects in that
>> cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
>> for 404855 useless objects in the cloned repository.
>
> Those objects are not useless. They are referenced by the remote refs
> on the remote side, which are not fetched by default. If you clone a
> mirror of the repository you'll see no unreferenced objects.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>
Thanks... It is a difference between svn and git mentality probably -
one only pushes reasonably reliable code to a public git repository,
whereas anything transient is recorded in svn - I think many of the
unreferenced objects are svn user-branches (which are probably of use
to people who intend to work on gcc for fairly extended periods,
rather than casual users like me).
The case with gcc is probably quite extreme - many user branches, and
very large code base - but is there anything on the git side with git
gc which can lessen this kind of pathological behavior (expanding
packs)?
Thanks a lot for the explanation and the discussion.
Hin-Tak
* Re: git gc expanding packed data?
2009-08-08 13:05 ` Hin-Tak Leung
@ 2009-08-08 13:25 ` Andreas Schwab
0 siblings, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-08-08 13:25 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: Nicolas Pitre, git
Hin-Tak Leung <hintak.leung@gmail.com> writes:
> Thanks... It is a difference between svn and git mentality probably -
It is just that the remote is a git-svn tree, with only a few branches
created as local branches.
> The case with gcc is probably quite extreme - many user branches, and
> very large code base - but is there anything on the git side with git
> gc which can lessen this kind of pathological behavior (expanding
> packs)?
If you fetch all refs, not only refs/heads/*, all objects will be
referenced.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: git gc expanding packed data?
2009-08-08 1:11 git gc expanding packed data? Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
@ 2009-08-09 2:56 ` Nicolas Pitre
2009-08-09 7:43 ` Andreas Schwab
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-08-09 2:56 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Hin-Tak Leung, git
On Sat, 8 Aug 2009, Andreas Schwab wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > It appears that the git installation serving clone requests for
> > git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> > just cloned it and the pack I was sent contains 1383356 objects (can be
> > determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> > However, there are only 978501 actually referenced objects in that
> > cloned repository ( 'git rev-list --all --objects | wc -l'). That makes
> > for 404855 useless objects in the cloned repository.
>
> Those objects are not useless. They are referenced by the remote refs
> on the remote side, which are not fetched by default. If you clone a
> mirror of the repository you'll see no unreferenced objects.
If you do a clone using the git:// protocol and the server sends you
only the ref for the trunk branch, then it should send you only objects
reachable from that branch. Any extra objects sent by the server are
useless to me and waste my and everyone else's bandwidth, and on my
next repack those objects are pruned anyway. The point of the git
protocol is _not_ necessarily to send a copy of the remote pack file
over, even during a clone.
Nicolas
* Re: git gc expanding packed data?
2009-08-09 2:56 ` Nicolas Pitre
@ 2009-08-09 7:43 ` Andreas Schwab
2009-09-25 18:05 ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
0 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2009-08-09 7:43 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Hin-Tak Leung, git
Nicolas Pitre <nico@cam.org> writes:
> If you do a clone using the git:// protocol and the server sends you
> only the ref for the trunk branch,
A clone will fetch all branches from refs/heads/*.
> then it should send you only objects reachable from that branch.
Apparently this does not work. I'd guess the extra objects are needed
due to the delta compression.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: git clone sending unneeded objects (was : git gc expanding packed data?)
2009-08-09 7:43 ` Andreas Schwab
@ 2009-09-25 18:05 ` Jason Merrill
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
0 siblings, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 18:05 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Nicolas Pitre, Hin-Tak Leung, git
On 08/09/2009 03:43 AM, Andreas Schwab wrote:
> Nicolas Pitre<nico@cam.org> writes:
>
>> If you do a clone using the git:// protocol and the server sends you
>> only the ref for the trunk branch,
>
> A clone will fetch all branches from refs/heads/*.
>
>> then it should send you only objects reachable from that branch.
>
> Apparently this does not work. I'd guess the extra objects are needed
> due to the delta compression.
I just tried doing a clone of the GCC repository, then git gc
--prune=now, and another clone specifying --reference to the first, and
it wanted to download all the unreachable objects again. So it doesn't
seem to be a compression issue.
This is with git 1.6.4 on both ends.
Jason
* Re: git clone sending unneeded objects
2009-09-25 18:05 ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
@ 2009-09-25 19:34 ` Matthieu Moy
2009-09-25 19:43 ` Jason Merrill
2009-09-25 19:53 ` Nicolas Pitre
0 siblings, 2 replies; 26+ messages in thread
From: Matthieu Moy @ 2009-09-25 19:34 UTC (permalink / raw)
To: Jason Merrill; +Cc: git, Nicolas Pitre, Hin-Tak Leung
Jason Merrill <jason@redhat.com> writes:
> On 08/09/2009 03:43 AM, Andreas Schwab wrote:
>> Nicolas Pitre<nico@cam.org> writes:
>>
>>> If you do a clone using the git:// protocol and the server sends you
>>> only the ref for the trunk branch,
>>
>> A clone will fetch all branches from refs/heads/*.
>>
>>> then it should send you only objects reachable from that branch.
>>
>> Apparently this does not work. I'd guess the extra objects are needed
>> due to the delta compression.
>
> I just tried doing a clone of the GCC repository, then git gc
> --prune=now, and another clone specifying --reference to the first,
> and it wanted to download all the unreachable objects again. So it
> doesn't seem to be a compression issue.
>
> This is with git 1.6.4 on both ends.
Which protocol did you use?
If you use git:// or ssh://, it's normally a security feature that Git
sends you only reachable objects. If it doesn't, it's a serious bug.
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
* Re: git clone sending unneeded objects
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
@ 2009-09-25 19:43 ` Jason Merrill
2009-09-25 19:53 ` Nicolas Pitre
1 sibling, 0 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 19:43 UTC (permalink / raw)
To: Matthieu Moy; +Cc: git, Nicolas Pitre, Hin-Tak Leung
On 09/25/2009 03:34 PM, Matthieu Moy wrote:
> Which protocol did you use?
git://
Jason
* Re: git clone sending unneeded objects
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
2009-09-25 19:43 ` Jason Merrill
@ 2009-09-25 19:53 ` Nicolas Pitre
2009-09-25 20:20 ` Jason Merrill
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-25 19:53 UTC (permalink / raw)
To: Matthieu Moy; +Cc: Jason Merrill, git, Hin-Tak Leung
On Fri, 25 Sep 2009, Matthieu Moy wrote:
> Jason Merrill <jason@redhat.com> writes:
>
> > On 08/09/2009 03:43 AM, Andreas Schwab wrote:
> >> Nicolas Pitre<nico@cam.org> writes:
> >>
> >>> If you do a clone using the git:// protocol and the server sends you
> >>> only the ref for the trunk branch,
> >>
> >> A clone will fetch all branches from refs/heads/*.
> >>
> >>> then it should send you only objects reachable from that branch.
> >>
> >> Apparently this does not work. I'd guess the extra objects are needed
> >> due to the delta compression.
> >
> > I just tried doing a clone of the GCC repository, then git gc
> > --prune=now, and another clone specifying --reference to the first,
> > and it wanted to download all the unreachable objects again. So it
> > doesn't seem to be a compression issue.
> >
> > This is with git 1.6.4 on both ends.
>
> Which protocol did you use?
>
> If you use git:// or ssh://, it's normally a security feature that Git
> sends you only reachable objects. If it doesn't, it's a serious bug.
I did reproduce the issue with git:// back when this discussion started.
I also asked for more information about the remote which didn't come
forth.
Nicolas
* Re: git clone sending unneeded objects
2009-09-25 19:53 ` Nicolas Pitre
@ 2009-09-25 20:20 ` Jason Merrill
2009-09-25 20:47 ` Nicolas Pitre
2009-09-26 0:43 ` Hin-Tak Leung
0 siblings, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 20:20 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung
On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
> I did reproduce the issue with git:// back when this discussion started.
> I also asked for more information about the remote which didn't come
> forth.
Looking back, I only see you asking about the git version on the server,
which is 1.6.4.
So again:
git clone git://gcc.gnu.org/git/gcc.git
(1399509 objects, ~600MB .git dir)
git gc --prune=now (988906 objects, ~450MB .git dir)
...then
git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
(573401 objects, ~550MB .git dir)
git fsck (clean)
git gc --prune=now (5 objects, ~7MB .git dir)
What's going on here?
Jason
* Re: git clone sending unneeded objects
2009-09-25 20:20 ` Jason Merrill
@ 2009-09-25 20:47 ` Nicolas Pitre
2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:43 ` Hin-Tak Leung
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-25 20:47 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung
On Fri, 25 Sep 2009, Jason Merrill wrote:
> On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
> > I did reproduce the issue with git:// back when this discussion started.
> > I also asked for more information about the remote which didn't come
> > forth.
>
> Looking back, I only see you asking about the git version on the server, which
> is 1.6.4.
>
> So again:
>
> git clone git://gcc.gnu.org/git/gcc.git
> (1399509 objects, ~600MB .git dir)
> git gc --prune=now (988906 objects, ~450MB .git dir)
>
> ...then
>
> git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
> (573401 objects, ~550MB .git dir)
> git fsck (clean)
> git gc --prune=now (5 objects, ~7MB .git dir)
>
> What's going on here?
Some screw up.
Do you have access to the remote machine? Is it possible to have a
tarball of the gcc.git directory from there?
Nicolas
* Re: git clone sending unneeded objects
2009-09-25 20:47 ` Nicolas Pitre
@ 2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:49 ` Nicolas Pitre
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
0 siblings, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 23:17 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung
On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> Do you have access to the remote machine? Is it possible to have a
> tarball of the gcc.git directory from there?
http://gcc.gnu.org/gcc-git.tar.gz
I'll leave it there for a few days.
Jason
* Re: git clone sending unneeded objects
2009-09-25 20:20 ` Jason Merrill
2009-09-25 20:47 ` Nicolas Pitre
@ 2009-09-26 0:43 ` Hin-Tak Leung
1 sibling, 0 replies; 26+ messages in thread
From: Hin-Tak Leung @ 2009-09-26 0:43 UTC (permalink / raw)
To: Jason Merrill; +Cc: Nicolas Pitre, Matthieu Moy, git
On Fri, Sep 25, 2009 at 9:20 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
>>
>> I did reproduce the issue with git:// back when this discussion started.
>> I also asked for more information about the remote which didn't come
>> forth.
>
> Looking back, I only see you asking about the git version on the server,
> which is 1.6.4.
Hmm, I was under the impression from the previous thread that the
server is a bit older and/or has more backward-compatible settings to
cater for older git clients?
>
> So again:
>
> git clone git://gcc.gnu.org/git/gcc.git
> (1399509 objects, ~600MB .git dir)
> git gc --prune=now (988906 objects, ~450MB .git dir)
>
> ...then
>
> git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
> (573401 objects, ~550MB .git dir)
> git fsck (clean)
> git gc --prune=now (5 objects, ~7MB .git dir)
>
> What's going on here?
FWIW, I still have my clone (git://) and do my periodic 'git fetch'
and 'git gc --prune=now' (learned my lessons!), and its .git dir is
currently about 350MB (from previous discussion the optimal at the time
was about 300MB, so it has grown a bit in the last couple of months).
And thanks everybody for all the discussion and advice. git is a great
tool. (and I have essentially stopped using svn, preferring git-svn!).
* Re: git clone sending unneeded objects
2009-09-25 23:17 ` Jason Merrill
@ 2009-09-26 0:49 ` Nicolas Pitre
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-26 0:49 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung
On Fri, 25 Sep 2009, Jason Merrill wrote:
> On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> > Do you have access to the remote machine? Is it possible to have a
> > tarball of the gcc.git directory from there?
>
> http://gcc.gnu.org/gcc-git.tar.gz
>
> I'll leave it there for a few days.
Thanks, I got it now. And I was able to reproduce the issue locally.
Cloning the original repository does transfer objects which become
unreferenced in the clone. But cloning that cloned repository (before
pruning the unreferenced objects) does not transfer those objects again.
Just need to find out why.
Nicolas
* [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 0:49 ` Nicolas Pitre
@ 2009-09-26 3:54 ` Nicolas Pitre
2009-09-26 7:21 ` Andreas Schwab
2009-09-26 19:50 ` Shawn O. Pearce
0 siblings, 2 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-26 3:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung
Current behavior of 'git clone' when not using --mirror is to fetch
everything from the peer, and then filter out unwanted refs just before
writing them out to the cloned repository. This may become highly
inefficient if the peer has an unusual ref namespace, or if it simply
has "remotes" refs of its own, and those locally unwanted refs are
connecting to a large set of objects which becomes unreferenced as soon
as they are fetched.
Let's filter out those unwanted refs from the peer _before_ asking it
what refs we want to fetch instead, which is the most logical thing to
do anyway.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---
On Fri, 25 Sep 2009, Nicolas Pitre wrote:
> On Fri, 25 Sep 2009, Jason Merrill wrote:
>
> > On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> > > Do you have access to the remote machine? Is it possible to have a
> > > tarball of the gcc.git directory from there?
> >
> > http://gcc.gnu.org/gcc-git.tar.gz
> >
> > I'll leave it there for a few days.
>
> Thanks, I got it now. And I was able to reproduce the issue locally.
>
> Cloning the original repository does transfer objects which become
> unreferenced in the clone. But cloning that cloned repository (before
> pruning the unreferenced objects) does not transfer those objects again.
>
> Just need to find out why.
And the "why" is described above. The problem was actually on the
client side and was affecting clones of any repository containing
anything outside refs/heads and refs/tags.
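
To illustrate the idea of filtering the peer's refs before asking for objects, here is a minimal shell sketch, not the patch itself. The function name merely echoes the one introduced in the diff; the input format is assumed to be the "<sha><TAB><refname>" lines printed by 'git ls-remote', and the ref names in the sample are made up.

```sh
#!/bin/sh
# Hypothetical filter: read "<sha><TAB><refname>" lines on stdin and keep
# only the ref names a default (non-mirror) clone would map locally,
# that is refs/heads/* and refs/tags/*.
wanted_peer_refs() {
    awk '$2 ~ /^refs\/(heads|tags)\// { print $2 }'
}

# Made-up advertisement: one branch, one server-side "remotes" ref, one tag.
printf '%s\t%s\n' \
    1111111111111111111111111111111111111111 refs/heads/trunk \
    2222222222222222222222222222222222222222 refs/remotes/some-svn-branch \
    3333333333333333333333333333333333333333 refs/tags/example-tag |
    wanted_peer_refs
```

Only the refs/heads and refs/tags entries survive the filter, so a fetch driven by this list would never name the refs/remotes tip as a "want".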
The fact that the git repository on gcc.gnu.org has lots of stuff in
"remote" branches that don't get cloned by default is a separate
configuration/policy issue on that server which might (or might not)
need to be looked into. At the very least, as a bare repository, it should
have all the git files in gcc.git/ directly instead of gcc.git/.git/.
diff --git a/builtin-clone.c b/builtin-clone.c
index bab2d84..edf7c7f 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -329,24 +329,28 @@ static void remove_junk_on_signal(int signo)
raise(signo);
}
-static struct ref *write_remote_refs(const struct ref *refs,
- struct refspec *refspec, const char *reflog)
+static struct ref *wanted_peer_refs(const struct ref *refs,
+ struct refspec *refspec)
{
struct ref *local_refs = NULL;
struct ref **tail = &local_refs;
- struct ref *r;
get_fetch_map(refs, refspec, &tail, 0);
if (!option_mirror)
get_fetch_map(refs, tag_refspec, &tail, 0);
+ return local_refs;
+}
+
+static void write_remote_refs(const struct ref *local_refs, const char *reflog)
+{
+ const struct ref *r;
+
for (r = local_refs; r; r = r->next)
add_extra_ref(r->peer_ref->name, r->old_sha1, 0);
pack_refs(PACK_REFS_ALL);
clear_extra_refs();
-
- return local_refs;
}
int cmd_clone(int argc, const char **argv, const char *prefix)
@@ -495,9 +499,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
strbuf_reset(&value);
- if (path && !is_bundle)
+ if (path && !is_bundle) {
refs = clone_local(path, git_dir);
- else {
+ mapped_refs = wanted_peer_refs(refs, refspec);
+ } else {
struct remote *remote = remote_get(argv[0]);
transport = transport_get(remote, remote->url[0]);
@@ -520,14 +525,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
option_upload_pack);
refs = transport_get_remote_refs(transport);
- if (refs)
- transport_fetch_refs(transport, refs);
+ if (refs) {
+ mapped_refs = wanted_peer_refs(refs, refspec);
+ transport_fetch_refs(transport, mapped_refs);
+ }
}
if (refs) {
clear_extra_refs();
- mapped_refs = write_remote_refs(refs, refspec, reflog_msg.buf);
+ write_remote_refs(mapped_refs, reflog_msg.buf);
remote_head = find_ref_by_name(refs, "HEAD");
remote_head_points_at =
* Re: git clone sending unneeded objects
2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:49 ` Nicolas Pitre
@ 2009-09-26 4:44 ` Jason Merrill
2009-09-26 13:33 ` Jason Merrill
2009-09-27 1:27 ` Nicolas Pitre
1 sibling, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-26 4:44 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung
Incidentally, somewhat related to this issue, I've noticed that if I
fetch a branch which I don't currently have in my repository, and I have
most of the commits on that branch in my object store (or in an
alternate repository) but not the most recent commit, git fetch isn't
smart enough to only grab the commits I'm actually missing, it wants to
fetch much more.
I would expect that since the clone pulled down everything in the
gcc.git repository, I could then do
git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
git fetch
and have all the branches, not just the ones in refs/heads. But when I
do this git fetch wants to fetch some 500k redundant objects.
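
A side note on the configuration step above: plain 'git config <name> <value>' replaces an existing single value, whereas 'git config --add' appends a further one. A minimal sketch of building such a refspec follows; the helper `fetch_refspec` is hypothetical, and the leading '+' (allow non-fast-forward updates) is an addition that is not part of the command quoted above.

```sh
#!/bin/sh
# Hypothetical helper: build a fetch refspec that maps the server's
# refs/<namespace>/* into refs/remotes/<remote>/* locally.
# $1 = namespace on the server (e.g. "remotes"), $2 = local remote name.
fetch_refspec() {
    printf '+refs/%s/*:refs/remotes/%s/*\n' "$1" "$2"
}

# Possible usage inside an existing clone (not run here); --add keeps the
# default refs/heads/* refspec instead of replacing it:
#   git config --add remote.origin.fetch "$(fetch_refspec remotes origin)"
#   git fetch origin

fetch_refspec remotes origin
```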
Jason
* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
@ 2009-09-26 7:21 ` Andreas Schwab
2009-09-26 19:50 ` Shawn O. Pearce
1 sibling, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-09-26 7:21 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung
Nicolas Pitre <nico@fluxnic.net> writes:
> The fact that the git repository on gcc.gnu.org has lots of stuff in
> "remote" branches that don't get cloned by default is a separate
> configuration/policy issue on that server which might need (or not) to
> be looked into. For instance at least, as a bare repository, it should
> have all the git files in gcc.git/ directly instead of gcc.git/.git/.
The remote is just a git-svn tree.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: git clone sending unneeded objects
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
@ 2009-09-26 13:33 ` Jason Merrill
2009-09-27 2:26 ` Nicolas Pitre
2009-09-27 1:27 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-26 13:33 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung
On 09/26/2009 12:44 AM, Jason Merrill wrote:
> git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> git fetch
git count-objects -v before:
count: 44
size: 1768
in-pack: 1399509
packs: 1
size-pack: 600456
prune-packable: 0
garbage: 0
and after (transferred 278MB):
count: 44
size: 1768
in-pack: 1947339
packs: 2
size-pack: 1178408
prune-packable: 8
garbage: 0
and then after git gc --prune=now:
count: 0
size: 0
in-pack: 1399613
packs: 1
size-pack: 839900
prune-packable: 0
garbage: 0
So I only actually needed 104 more objects, but fetch wasn't clever
enough to see that, and my new pack is much less efficient.
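
The arithmetic behind those figures can be sketched as follows. The helper `in_pack` is made up for this illustration; it only parses 'git count-objects -v' style text, and the sample input is an excerpt of the three outputs shown above.

```sh
#!/bin/sh
# Hypothetical helper: extract the "in-pack" value from
# 'git count-objects -v' output supplied on standard input.
in_pack() {
    awk '$1 == "in-pack:" { print $2 }'
}

# Excerpts of the outputs above: before the fetch, right after it,
# and after 'git gc --prune=now'.
before=$(printf 'count: 44\nin-pack: 1399509\npacks: 1\n' | in_pack)
fetched=$(printf 'count: 44\nin-pack: 1947339\npacks: 2\n' | in_pack)
after=$(printf 'count: 0\nin-pack: 1399613\npacks: 1\n' | in_pack)

echo "objects added to the store by the fetch: $(( fetched - before ))"
echo "objects actually gained after gc: $(( after - before ))"
```

In a real repository the same helper could be fed directly, as in `git count-objects -v | in_pack`.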
I've run into the same issue using alternates to set up multiple working
directories for different branches; if the alternate directory isn't
completely up-to-date, fetch wants to pull down lots of data again
rather than use what I have and only fetch the last one or two commits.
Jason
* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26 7:21 ` Andreas Schwab
@ 2009-09-26 19:50 ` Shawn O. Pearce
2009-09-27 0:26 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2009-09-26 19:50 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung
Nicolas Pitre <nico@fluxnic.net> wrote:
> Current behavior of 'git clone' when not using --mirror is to fetch
> everything from the peer, and then filter out unwanted refs just before
> writing them out to the cloned repository. This may become highly
> inefficient if the peer has an unusual ref namespace, or if it simply
> has "remotes" refs of its own, and those locally unwanted refs are
> connecting to a large set of objects which becomes unreferenced as soon
> as they are fetched.
...
> +static void write_remote_refs(const struct ref *local_refs, const char *reflog)
Here reflog is now unused. I'm going to squash this in.
diff --git a/builtin-clone.c b/builtin-clone.c
index edf7c7f..4992c25 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -342,7 +342,7 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
return local_refs;
}
-static void write_remote_refs(const struct ref *local_refs, const char *reflog)
+static void write_remote_refs(const struct ref *local_refs)
{
const struct ref *r;
@@ -534,7 +534,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (refs) {
clear_extra_refs();
- write_remote_refs(mapped_refs, reflog_msg.buf);
+ write_remote_refs(mapped_refs);
remote_head = find_ref_by_name(refs, "HEAD");
remote_head_points_at =
--
Shawn.
* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
2009-09-26 19:50 ` Shawn O. Pearce
@ 2009-09-27 0:26 ` Nicolas Pitre
0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 0:26 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung
On Sat, 26 Sep 2009, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@fluxnic.net> wrote:
> > Current behavior of 'git clone' when not using --mirror is to fetch
> > everything from the peer, and then filter out unwanted refs just before
> > writing them out to the cloned repository. This may become highly
> > inefficient if the peer has an unusual ref namespace, or if it simply
> > has "remotes" refs of its own, and those locally unwanted refs are
> > connecting to a large set of objects which becomes unreferenced as soon
> > as they are fetched.
> ...
> > +static void write_remote_refs(const struct ref *local_refs, const char *reflog)
>
> Here reflog is now unused. I'm going to squash this in.
Yeah, I noticed. Since I didn't know what the original intent for
it was, I just left it there.
Nicolas
* Re: git clone sending unneeded objects
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
2009-09-26 13:33 ` Jason Merrill
@ 2009-09-27 1:27 ` Nicolas Pitre
2009-09-27 2:04 ` Shawn O. Pearce
1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 1:27 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung
On Sat, 26 Sep 2009, Jason Merrill wrote:
> Incidentally, somewhat related to this issue, I've noticed that if I fetch a
> branch which I don't currently have in my repository, and I have most of the
> commits on that branch in my object store (or in an alternate repository) but
> not the most recent commit, git fetch isn't smart enough to only grab the
> commits I'm actually missing, it wants to fetch much more.
>
> I would expect that since the clone pulled down everything in the gcc.git
> repository, I could then do
>
> git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> git fetch
>
> and have all the branches, not just the ones in refs/heads. But when I do
> this git fetch wants to fetch some 500k redundant objects.
Well... Assuming a fixed git using the patch I posted yesterday, my
clone of gcc.git has 988941 objects. The source repository used for the
clone has 1399551 objects. Of course the source repo has more objects
because it has extra branches in the refs/remotes/ namespace that the
clone didn't fetch. If you wish to also fetch those branches as you
illustrated above then you'll get the difference i.e. 410610 additional
objects.
And even if the broken clone (before my patch) did pull everything from
gcc.git, in the cloned repository those 410610 extra objects are
considered as garbage because nothing actually references them. So even
if you decide to fetch the extra branches that the initial clone didn't
pick up, or if you do reference that repository with "garbage" objects
for another clone to which you want to add those extra branches, git has
no way to know that it already had access to those objects locally and
"ungarbage" them as they aren't referenced. Result is a useless fetch
of 410610 objects that you already have, but that you weren't supposed
to have in the first place.
Nicolas
* Re: git clone sending unneeded objects
2009-09-27 1:27 ` Nicolas Pitre
@ 2009-09-27 2:04 ` Shawn O. Pearce
2009-09-27 2:31 ` Nicolas Pitre
2009-09-27 4:35 ` Jason Merrill
0 siblings, 2 replies; 26+ messages in thread
From: Shawn O. Pearce @ 2009-09-27 2:04 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung
Nicolas Pitre <nico@fluxnic.net> wrote:
> And even if the broken clone (before my patch) did pull everything from
> gcc.git, in the cloned repository those 410610 extra objects are
> considered as garbage because nothing actually references them. So even
> if you decide to fetch the extra branches that the initial clone didn't
> pick up, or if you do reference that repository with "garbage" objects
> for another clone to which you want to add those extra branches, git has
> no way to know that it already had access to those objects locally and
> "ungarbage" them as they aren't referenced. Result is a useless fetch
> of 410610 objects that you already have, but that you weren't supposed
> to have in the first place.
Just to clarify a minor nit:
Actually, if those refs have not changed, quickfetch should kick in
and realize that all 410610 objects are reachable locally without
errors, permitting the client to avoid the object transfer.
However, if *ANY* of those refs were to change to something you
don't actually have, quickfetch would fail, and we would need to
fetch all 410610 objects.
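
The kind of local check described here can be approximated from the command line. This is only a rough sketch under the assumption that the check amounts to walking the new tips with 'git rev-list' while excluding everything reachable from existing refs; the helper name `already_complete` is made up, and the actual fetch code may differ in detail.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the full history of the given commit
# IDs can be walked using objects already present in the local object store.
# Any missing or unknown object makes 'git rev-list' fail, which is treated
# as "a real fetch is needed".
already_complete() {
    git rev-list --quiet --objects "$@" --not --all >/dev/null 2>&1
}

# Possible usage inside a repository (not run here):
#   if already_complete "$new_tip"; then
#       echo "everything is already local, no transfer needed"
#   else
#       echo "at least one object is missing, fetch required"
#   fi
```

A single tip that cannot be fully walked fails the whole check, which matches the all-or-nothing behaviour described above.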
--
Shawn.
* Re: git clone sending unneeded objects
2009-09-26 13:33 ` Jason Merrill
@ 2009-09-27 2:26 ` Nicolas Pitre
0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 2:26 UTC (permalink / raw)
To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung
On Sat, 26 Sep 2009, Jason Merrill wrote:
> On 09/26/2009 12:44 AM, Jason Merrill wrote:
> > git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> > git fetch
>
> git count-objects -v before:
>
> count: 44
> size: 1768
> in-pack: 1399509
> packs: 1
> size-pack: 600456
> prune-packable: 0
> garbage: 0
I'm sure if you had done 'git rev-list --all --objects | wc -l' at that
point, the result would have been something around 900000. That's the
actual number of objects git had a reference to, compared to the total
objects contained in the object store.
> and after (transferred 278MB):
>
> count: 44
> size: 1768
> in-pack: 1947339
> packs: 2
> size-pack: 1178408
> prune-packable: 8
> garbage: 0
And those 500000 extra objects or so (minus a couple dozen which were
probably used to "complete" the fetched thin pack and are duplicates of
local objects -- the fetch progress message gave the exact number) were
obtained from the remote repository because git has no way to tell the
remote it already had them. That's what I was explaining in my previous
email.
> and then after git gc --prune=now:
>
> count: 0
> size: 0
> in-pack: 1399613
> packs: 1
> size-pack: 839900
> prune-packable: 0
> garbage: 0
>
> So I only actually needed 104 more objects, but fetch wasn't clever enough to
> see that, and my new pack is much less efficient.
Like I said, it's not that the fetch wasn't clever enough. Rather,
your initial clone asked for way too many objects in the first place.
That's what my patch fixed.
Now the pack efficiency can be explained as well. A single pack is
always going to be more efficient than 2 packs. The problem is that when
you do a gc, by default git does the least costly operation, which
consists of copying as much data as possible from existing packs without
extra processing. That means that many objects were copied from the
second (newly received) pack although a better delta representation was
most probably available in the other, larger pack (remember that most
objects from that second pack already existed in the first pack). Git
selects the second pack in preference to the other pack because it is
more recent, and normally more recent packs contain more recent objects,
which is a good heuristic to optimize the object enumeration. In this
case this didn't produce a good result, but again we're talking about a
scenario which is bogus from the start and shouldn't happen.
So if you do a 'git gc --aggressive' and let it run for a while, you
should get back a smaller pack, possibly even much smaller than the
original one.
Nicolas
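The 'git count-objects -v' listings quoted in the message above can be
checked with a few lines of Python. This is only a sketch of the
arithmetic: the three listings are copied from the message, and
parse_count_objects is a hypothetical helper written for this
illustration, not part of git. Note that in-pack counts an object once
per pack that contains it, so the growth after the fetch includes the
few objects added to complete the thin pack.

```python
# Listings copied from the 'git count-objects -v' output quoted above.
BEFORE = """
count: 44
size: 1768
in-pack: 1399509
packs: 1
size-pack: 600456
prune-packable: 0
garbage: 0
"""

AFTER_FETCH = """
count: 44
size: 1768
in-pack: 1947339
packs: 2
size-pack: 1178408
prune-packable: 8
garbage: 0
"""

AFTER_GC = """
count: 0
size: 0
in-pack: 1399613
packs: 1
size-pack: 839900
prune-packable: 0
garbage: 0
"""


def parse_count_objects(text):
    """Turn 'key: value' lines into a dict of integers (hypothetical helper)."""
    stats = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        stats[key.strip()] = int(value)
    return stats


before = parse_count_objects(BEFORE)
after_fetch = parse_count_objects(AFTER_FETCH)
after_gc = parse_count_objects(AFTER_GC)

# Objects that landed in the second pack because of the fetch.
added_by_fetch = after_fetch["in-pack"] - before["in-pack"]
# Objects that remain once duplicates are merged into a single pack.
actually_new = after_gc["in-pack"] - before["in-pack"]
# Growth of the single pack after 'git gc --prune=now', in KiB.
pack_growth_kib = after_gc["size-pack"] - before["size-pack"]

print(added_by_fetch)   # 547830
print(actually_new)     # 104
print(pack_growth_kib)  # 239444
```

So roughly half a million objects were transferred to gain 104 new ones,
and the repacked store ended up about 234 MiB larger than the original
single pack.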
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-27 2:04 ` Shawn O. Pearce
@ 2009-09-27 2:31 ` Nicolas Pitre
2009-09-27 4:35 ` Jason Merrill
1 sibling, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27 2:31 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung
On Sat, 26 Sep 2009, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@fluxnic.net> wrote:
> > And even if the broken clone (before my patch) did pull everything from
> > gcc.git, in the cloned repository those 410610 extra objects are
> > considered as garbage because nothing actually reference them. So even
> > if you decide to fetch the extra branches that the initial clone didn't
> > pick up, or if you do reference that repository with "garbage" objects
> > for another clone to which you want to add those extra branches, git has
> > no way to know that it already had access to those objects locally and
> > "ungarbage" them as they aren't referenced. Result is a useless fetch
> > of 410610 objects that you already have, but that you weren't supposed
> > to have in the first place.
>
> Just to clarify a minor nit:
>
> Actually, if those refs have not changed, quickfetch should kick in
> and realize that all 410610 objects are reachable locally without
> errors, permitting the client to avoid the object transfer.
>
> However, if *ANY* of those refs were to change to something you
> don't actually have, quickfetch would fail, and we would need to
> fetch all 410610 objects.
Right. But since we're talking about a git mirror for the gcc svn repo
and gcc is a rather active project, the likelihood of any ref changing
at any time is rather high.
Nicolas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-27 2:04 ` Shawn O. Pearce
2009-09-27 2:31 ` Nicolas Pitre
@ 2009-09-27 4:35 ` Jason Merrill
2009-09-28 4:18 ` Nicolas Pitre
1 sibling, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-27 4:35 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Nicolas Pitre, Matthieu Moy, git, Hin-Tak Leung
On 09/26/2009 10:04 PM, Shawn O. Pearce wrote:
> Actually, if those refs have not changed, quickfetch should kick in
> and realize that all 410610 objects are reachable locally without
> errors, permitting the client to avoid the object transfer.
>
> However, if *ANY* of those refs were to change to something you
> don't actually have, quickfetch would fail, and we would need to
> fetch all 410610 objects.
Right. That seems unfortunate to me; couldn't fetch do a bit more
checking before it decides to download the whole world again?
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git clone sending unneeded objects
2009-09-27 4:35 ` Jason Merrill
@ 2009-09-28 4:18 ` Nicolas Pitre
0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-28 4:18 UTC (permalink / raw)
To: Jason Merrill; +Cc: Shawn O. Pearce, Matthieu Moy, git, Hin-Tak Leung
On Sun, 27 Sep 2009, Jason Merrill wrote:
> On 09/26/2009 10:04 PM, Shawn O. Pearce wrote:
> > Actually, if those refs have not changed, quickfetch should kick in
> > and realize that all 410610 objects are reachable locally without
> > errors, permitting the client to avoid the object transfer.
> >
> > However, if *ANY* of those refs were to change to something you
> > don't actually have, quickfetch would fail, and we would need to
> > fetch all 410610 objects.
>
> Right. That seems unfortunate to me; couldn't fetch do a bit more checking
> before it decides to download the whole world again?
The quickfetch test could be turned into a filter, so that refs that are
already available locally are simply not fetched, on a per-ref basis.
But that would be a rather expensive test which couldn't keep its
"quick" qualifier anymore, and all that for a case that shouldn't have
happened anyway if git didn't have a bug in its clone operation, as
I've explained already.
Nicolas
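The difference between the all-or-nothing quickfetch check and the
per-ref filter discussed above can be sketched with a toy model. This is
not git's implementation: the object graph, the ref names, and the
functions is_complete, quickfetch_all_or_nothing and per_ref_filter are
all assumptions made up for this illustration. An object counts as
"complete" when it and everything reachable from it exist in the local
object store, whether or not any local ref points at it.

```python
def is_complete(tip, parents, store):
    """Return True if tip and everything reachable from it are in store."""
    stack, seen = [tip], set()
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        if obj not in store:
            return False
        seen.add(obj)
        stack.extend(parents.get(obj, ()))
    return True


def quickfetch_all_or_nothing(wanted, parents, store):
    """Skip the transfer only if every wanted tip is complete locally."""
    if all(is_complete(tip, parents, store) for tip in wanted.values()):
        return {}
    # One missing tip is enough to fall back to asking for everything.
    return dict(wanted)


def per_ref_filter(wanted, parents, store):
    """Ask only for refs whose tip is not complete locally."""
    return {ref: tip for ref, tip in wanted.items()
            if not is_complete(tip, parents, store)}


# Toy history: two branches; the local store lacks only the new commit b3.
parents = {"a2": ["a1"], "b2": ["b1"], "b3": ["b2"]}
store = {"a1", "a2", "b1", "b2"}

unchanged = {"refs/remotes/x": "a2", "refs/remotes/y": "b2"}
one_moved = {"refs/remotes/x": "a2", "refs/remotes/y": "b3"}

nothing_to_fetch = quickfetch_all_or_nothing(unchanged, parents, store)
fetch_everything = quickfetch_all_or_nothing(one_moved, parents, store)
fetch_filtered = per_ref_filter(one_moved, parents, store)

print(nothing_to_fetch)   # no refs requested
print(fetch_everything)   # both refs requested
print(fetch_filtered)     # only refs/remotes/y requested
```

The per-ref variant runs one completeness walk per ref on every fetch,
which is the cost the message above refers to.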
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2009-09-28 4:18 UTC | newest]
Thread overview: 26+ messages
2009-08-08 1:11 git gc expanding packed data? Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
2009-08-08 13:25 ` Andreas Schwab
2009-08-09 2:56 ` Nicolas Pitre
2009-08-09 7:43 ` Andreas Schwab
2009-09-25 18:05 ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
2009-09-25 19:34 ` git clone sending unneeded objects Matthieu Moy
2009-09-25 19:43 ` Jason Merrill
2009-09-25 19:53 ` Nicolas Pitre
2009-09-25 20:20 ` Jason Merrill
2009-09-25 20:47 ` Nicolas Pitre
2009-09-25 23:17 ` Jason Merrill
2009-09-26 0:49 ` Nicolas Pitre
2009-09-26 3:54 ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26 7:21 ` Andreas Schwab
2009-09-26 19:50 ` Shawn O. Pearce
2009-09-27 0:26 ` Nicolas Pitre
2009-09-26 4:44 ` git clone sending unneeded objects Jason Merrill
2009-09-26 13:33 ` Jason Merrill
2009-09-27 2:26 ` Nicolas Pitre
2009-09-27 1:27 ` Nicolas Pitre
2009-09-27 2:04 ` Shawn O. Pearce
2009-09-27 2:31 ` Nicolas Pitre
2009-09-27 4:35 ` Jason Merrill
2009-09-28 4:18 ` Nicolas Pitre
2009-09-26 0:43 ` Hin-Tak Leung