* Re: git gc expanding packed data?
@ 2009-08-08  1:11 Andreas Schwab
  2009-08-08 13:05 ` Hin-Tak Leung
  2009-08-09  2:56 ` Nicolas Pitre
  0 siblings, 2 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-08-08  1:11 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Hin-Tak Leung, git

Nicolas Pitre <nico@cam.org> writes:

> It appears that the git installation serving clone requests for
> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> just cloned it and the pack I was sent contains 1383356 objects (can be
> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> However, there are only 978501 actually referenced objects in that
> cloned repository ( 'git rev-list --all --objects | wc -l').  That makes
> for 404855 useless objects in the cloned repository.

Those objects are not useless.  They are referenced by the remote refs
on the remote side, which are not fetched by default.  If you clone a
mirror of the repository you'll see no unreferenced objects.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: git gc expanding packed data?
  2009-08-08  1:11 git gc expanding packed data? Andreas Schwab
@ 2009-08-08 13:05 ` Hin-Tak Leung
  2009-08-08 13:25   ` Andreas Schwab
  2009-08-09  2:56 ` Nicolas Pitre
  1 sibling, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2009-08-08 13:05 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Nicolas Pitre, git

On Sat, Aug 8, 2009 at 2:11 AM, Andreas Schwab<schwab@linux-m68k.org> wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
>> It appears that the git installation serving clone requests for
>> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
>> just cloned it and the pack I was sent contains 1383356 objects (can be
>> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
>> However, there are only 978501 actually referenced objects in that
>> cloned repository ( 'git rev-list --all --objects | wc -l').  That makes
>> for 404855 useless objects in the cloned repository.
>
> Those objects are not useless.  They are referenced by the remote refs
> on the remote side, which are not fetched by default.  If you clone a
> mirror of the repository you'll see no unreferenced objects.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>

Thanks... It is a difference between svn and git mentality probably -
one only pushes reasonably reliable code to a public git repository,
whereas anything transient is recorded in svn - I think many of the
unreferenced objects are svn user-branches (which are probably of use
to people who intend to work on gcc for fairly extended periods,
rather than casual users like me).
The case with gcc is probably quite extreme - many user branches, and
very large code base - but is there anything on the git side with git
gc which can lessen this kind of pathological behavior (expanding
packs)?

Thanks a lot for the explanation and the discussion.

Hin-Tak


* Re: git gc expanding packed data?
  2009-08-08 13:05 ` Hin-Tak Leung
@ 2009-08-08 13:25   ` Andreas Schwab
  0 siblings, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-08-08 13:25 UTC (permalink / raw)
  To: Hin-Tak Leung; +Cc: Nicolas Pitre, git

Hin-Tak Leung <hintak.leung@gmail.com> writes:

> Thanks... It is a difference between svn and git mentality probably -

It is just that the remote is a git-svn tree, with only a few branches
created as local branches.

> The case with gcc is probably quite extreme - many user branches, and
> very large code base - but is there anything on the git side with git
> gc which can lessen this kind of pathological behavior (expanding
> packs)?

If you fetch all refs, not only refs/heads/*, all objects will be
referenced.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: git gc expanding packed data?
  2009-08-08  1:11 git gc expanding packed data? Andreas Schwab
  2009-08-08 13:05 ` Hin-Tak Leung
@ 2009-08-09  2:56 ` Nicolas Pitre
  2009-08-09  7:43   ` Andreas Schwab
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-08-09  2:56 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Hin-Tak Leung, git

On Sat, 8 Aug 2009, Andreas Schwab wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > It appears that the git installation serving clone requests for
> > git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> > just cloned it and the pack I was sent contains 1383356 objects (can be
> > determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> > However, there are only 978501 actually referenced objects in that
> > cloned repository ( 'git rev-list --all --objects | wc -l').  That makes
> > for 404855 useless objects in the cloned repository.
> 
> Those objects are not useless.  They are referenced by the remote refs
> on the remote side, which are not fetched by default.  If you clone a
> mirror of the repository you'll see no unreferenced objects.

If you do a clone using the git:// protocol and the server sends you 
only the ref for the trunk branch, then it should send you only objects 
reachable from that branch.  Any extra objects sent by the server are 
useless to me and waste my and everyone else's bandwidth, and on my 
next repack those objects are pruned anyway.  The point of the git 
protocol is _not_ necessarily to send a copy of the remote pack file 
over, even during a clone.


Nicolas
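
The comparison between packed and referenced objects can be tried on a small scale. What follows is only a sketch under stated assumptions: the throwaway repository, the file name and the unreferenced blob are all made up for illustration, and the pack is built by hand with 'git pack-objects' so that it deliberately carries one object that no ref needs. The two counts use the same commands quoted in this thread.

```shell
#!/bin/sh
# Sketch only: everything below lives in a temporary, hypothetical repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q .
echo one > file
git add file
git commit -q -m "first commit"

# Write a blob that no ref, tree or commit points at.
orphan=$(echo "unreferenced" | git hash-object -w --stdin)

# Build one pack holding every reachable object plus the orphan blob,
# mimicking a pack that carries more than the refs need.
pack=$({ git rev-list --all --objects; echo "$orphan"; } |
       git pack-objects -q .git/objects/pack/pack)

# The same two counts as used in the thread.
packed=$(( $(git show-index < ".git/objects/pack/pack-$pack.idx" | wc -l) ))
reachable=$(( $(git rev-list --all --objects | wc -l) ))
extra=$(( packed - reachable ))
echo "packed=$packed reachable=$reachable extra=$extra"
```

In this toy setup the difference is the single hand-made blob; in the gcc.git clone described above the same subtraction gives 1383356 - 978501 = 404855.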


* Re: git gc expanding packed data?
  2009-08-09  2:56 ` Nicolas Pitre
@ 2009-08-09  7:43   ` Andreas Schwab
  2009-09-25 18:05     ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
  0 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2009-08-09  7:43 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Hin-Tak Leung, git

Nicolas Pitre <nico@cam.org> writes:

> If you do a clone using the git:// protocol and the server sends you 
> only the ref for the trunk branch,

A clone will fetch all branches from refs/heads/*.

> then it should send you only objects reachable from that branch.

Apparently this does not work.  I'd guess the extra objects are needed
due to the delta compression.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: git clone sending unneeded objects (was : git gc expanding packed data?)
  2009-08-09  7:43   ` Andreas Schwab
@ 2009-09-25 18:05     ` Jason Merrill
  2009-09-25 19:34       ` git clone sending unneeded objects Matthieu Moy
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 18:05 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Nicolas Pitre, Hin-Tak Leung, git

On 08/09/2009 03:43 AM, Andreas Schwab wrote:
> Nicolas Pitre<nico@cam.org>  writes:
>
>> If you do a clone using the git:// protocol and the server sends you
>> only the ref for the trunk branch,
>
> A clone will fetch all branches from refs/heads/*.
>
>> then it should send you only objects reachable from that branch.
>
> Apparently this does not work.  I'd guess the extra objects are needed
> due to the delta compression.

I just tried doing a clone of the GCC repository, then git gc 
--prune=now, and another clone specifying --reference to the first, and 
it wanted to download all the unreachable objects again.  So it doesn't 
seem to be a compression issue.

This is with git 1.6.4 on both ends.

Jason


* Re: git clone sending unneeded objects
  2009-09-25 18:05     ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
@ 2009-09-25 19:34       ` Matthieu Moy
  2009-09-25 19:43         ` Jason Merrill
  2009-09-25 19:53         ` Nicolas Pitre
  0 siblings, 2 replies; 26+ messages in thread
From: Matthieu Moy @ 2009-09-25 19:34 UTC (permalink / raw)
  To: Jason Merrill; +Cc: git, Nicolas Pitre, Hin-Tak Leung

Jason Merrill <jason@redhat.com> writes:

> On 08/09/2009 03:43 AM, Andreas Schwab wrote:
>> Nicolas Pitre<nico@cam.org>  writes:
>>
>>> If you do a clone using the git:// protocol and the server sends you
>>> only the ref for the trunk branch,
>>
>> A clone will fetch all branches from refs/heads/*.
>>
>>> then it should send you only objects reachable from that branch.
>>
>> Apparently this does not work.  I'd guess the extra objects are needed
>> due to the delta compression.
>
> I just tried doing a clone of the GCC repository, then git gc
> --prune=now, and another clone specifying --reference to the first,
> and it wanted to download all the unreachable objects again.  So it
> doesn't seem to be a compression issue.
>
> This is with git 1.6.4 on both ends.

Which protocol did you use?

If you use git:// or ssh://, it's normally a security feature that Git
sends you only reachable objects. If it doesn't, it's a serious bug.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: git clone sending unneeded objects
  2009-09-25 19:34       ` git clone sending unneeded objects Matthieu Moy
@ 2009-09-25 19:43         ` Jason Merrill
  2009-09-25 19:53         ` Nicolas Pitre
  1 sibling, 0 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 19:43 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, Nicolas Pitre, Hin-Tak Leung

On 09/25/2009 03:34 PM, Matthieu Moy wrote:
> Which protocol did you use?

git://

Jason


* Re: git clone sending unneeded objects
  2009-09-25 19:34       ` git clone sending unneeded objects Matthieu Moy
  2009-09-25 19:43         ` Jason Merrill
@ 2009-09-25 19:53         ` Nicolas Pitre
  2009-09-25 20:20           ` Jason Merrill
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-25 19:53 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Jason Merrill, git, Hin-Tak Leung

On Fri, 25 Sep 2009, Matthieu Moy wrote:

> Jason Merrill <jason@redhat.com> writes:
> 
> > On 08/09/2009 03:43 AM, Andreas Schwab wrote:
> >> Nicolas Pitre<nico@cam.org>  writes:
> >>
> >>> If you do a clone using the git:// protocol and the server sends you
> >>> only the ref for the trunk branch,
> >>
> >> A clone will fetch all branches from refs/heads/*.
> >>
> >>> then it should send you only objects reachable from that branch.
> >>
> >> Apparently this does not work.  I'd guess the extra objects are needed
> >> due to the delta compression.
> >
> > I just tried doing a clone of the GCC repository, then git gc
> > --prune=now, and another clone specifying --reference to the first,
> > and it wanted to download all the unreachable objects again.  So it
> > doesn't seem to be a compression issue.
> >
> > This is with git 1.6.4 on both ends.
> 
> Which protocol did you use?
> 
> If you use git:// or ssh://, it's normally a security feature that Git
> sends you only reachable objects. If it doesn't, it's a serious bug.

I did reproduce the issue with git:// back when this discussion started. 
I also asked for more information about the remote which didn't come 
forth.


Nicolas


* Re: git clone sending unneeded objects
  2009-09-25 19:53         ` Nicolas Pitre
@ 2009-09-25 20:20           ` Jason Merrill
  2009-09-25 20:47             ` Nicolas Pitre
  2009-09-26  0:43             ` Hin-Tak Leung
  0 siblings, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 20:20 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
> I did reproduce the issue with git:// back when this discussion started.
> I also asked for more information about the remote which didn't come
> forth.

Looking back, I only see you asking about the git version on the server, 
which is 1.6.4.

So again:

git clone git://gcc.gnu.org/git/gcc.git
  (1399509 objects, ~600MB .git dir)
git gc --prune=now (988906 objects, ~450MB .git dir)

...then

git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
  (573401 objects, ~550MB .git dir)
git fsck (clean)
git gc --prune=now (5 objects, ~7MB .git dir)

What's going on here?

Jason
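
A scaled-down version of this experiment can be run without touching gcc.gnu.org. The sketch below is an assumption-laden stand-in, not the original setup: the repository layout, the branch name "trunk" and the ref refs/remotes/svn/side are invented to imitate a git-svn style remote whose extra branches live outside refs/heads. It clones over file:// so that the regular fetch machinery is used, then compares object totals taken from 'git count-objects -v'.

```shell
#!/bin/sh
# Sketch only: hypothetical source repository with one ref outside refs/heads.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
src=$work/src

# Loose plus packed object total, as reported by 'git count-objects -v'.
total_objects() {
    (cd "$1" && git count-objects -v) |
        awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n }'
}

git init -q "$src"
cd "$src"
git symbolic-ref HEAD refs/heads/trunk
echo trunk > file
git add file
git commit -q -m "trunk commit"

# A commit reachable only from refs/remotes/, similar to a git-svn branch.
git checkout -q -b side
echo side > side-file
git add side-file
git commit -q -m "side commit"
git update-ref refs/remotes/svn/side HEAD
git checkout -q trunk
git branch -q -D side

# file:// makes clone go through the normal fetch path instead of
# copying the object directory.
git clone -q "file://$src" "$work/clone"

src_total=$(total_objects "$src")
clone_total=$(total_objects "$work/clone")
echo "source=$src_total clone=$clone_total"
```

With a git that only asks for refs/heads/* and refs/tags/*, the clone should end up with just the three objects of the trunk commit while the source holds six. A clone total equal to the source total would reproduce the behavior reported here.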


* Re: git clone sending unneeded objects
  2009-09-25 20:20           ` Jason Merrill
@ 2009-09-25 20:47             ` Nicolas Pitre
  2009-09-25 23:17               ` Jason Merrill
  2009-09-26  0:43             ` Hin-Tak Leung
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-25 20:47 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Fri, 25 Sep 2009, Jason Merrill wrote:

> On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
> > I did reproduce the issue with git:// back when this discussion started.
> > I also asked for more information about the remote which didn't come
> > forth.
> 
> Looking back, I only see you asking about the git version on the server, which
> is 1.6.4.
> 
> So again:
> 
> git clone git://gcc.gnu.org/git/gcc.git
>  (1399509 objects, ~600MB .git dir)
> git gc --prune=now (988906 objects, ~450MB .git dir)
> 
> ...then
> 
> git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
>  (573401 objects, ~550MB .git dir)
> git fsck (clean)
> git gc --prune=now (5 objects, ~7MB .git dir)
> 
> What's going on here?

Some screw up.

Do you have access to the remote machine?  Is it possible to have a 
tarball of the gcc.git directory from there?


Nicolas


* Re: git clone sending unneeded objects
  2009-09-25 20:47             ` Nicolas Pitre
@ 2009-09-25 23:17               ` Jason Merrill
  2009-09-26  0:49                 ` Nicolas Pitre
  2009-09-26  4:44                 ` git clone sending unneeded objects Jason Merrill
  0 siblings, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-25 23:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> Do you have access to the remote machine?  Is it possible to have a
> tarball of the gcc.git directory from there?

http://gcc.gnu.org/gcc-git.tar.gz

I'll leave it there for a few days.

Jason


* Re: git clone sending unneeded objects
  2009-09-25 20:20           ` Jason Merrill
  2009-09-25 20:47             ` Nicolas Pitre
@ 2009-09-26  0:43             ` Hin-Tak Leung
  1 sibling, 0 replies; 26+ messages in thread
From: Hin-Tak Leung @ 2009-09-26  0:43 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Nicolas Pitre, Matthieu Moy, git

On Fri, Sep 25, 2009 at 9:20 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/25/2009 03:53 PM, Nicolas Pitre wrote:
>>
>> I did reproduce the issue with git:// back when this discussion started.
>> I also asked for more information about the remote which didn't come
>> forth.
>
> Looking back, I only see you asking about the git version on the server,
> which is 1.6.4.

Hmm, I was under the impression from the previous thread that the
server is a bit older and/or has more backward compatible settings to
cater for older git clients?

>
> So again:
>
> git clone git://gcc.gnu.org/git/gcc.git
>  (1399509 objects, ~600MB .git dir)
> git gc --prune=now (988906 objects, ~450MB .git dir)
>
> ...then
>
> git clone git://gcc.gnu.org/git/gcc.git --reference $firstclone
>  (573401 objects, ~550MB .git dir)
> git fsck (clean)
> git gc --prune=now (5 objects, ~7MB .git dir)
>
> What's going on here?

FWIW, I still have my clone (git://) and do my periodic 'git fetch'
and 'git gc --prune=now' (learned my lessons!), and its .git dir is
currently about 350MB (from the previous discussion the optimal size at
the time was about 300MB, so it has grown a bit in the last couple of
months).

And thanks everybody for all the discussion and advice. git is a great
tool. (and I have essentially stopped using svn, preferring git-svn!).


* Re: git clone sending unneeded objects
  2009-09-25 23:17               ` Jason Merrill
@ 2009-09-26  0:49                 ` Nicolas Pitre
  2009-09-26  3:54                   ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
  2009-09-26  4:44                 ` git clone sending unneeded objects Jason Merrill
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-26  0:49 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Fri, 25 Sep 2009, Jason Merrill wrote:

> On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> > Do you have access to the remote machine?  Is it possible to have a
> > tarball of the gcc.git directory from there?
> 
> http://gcc.gnu.org/gcc-git.tar.gz
> 
> I'll leave it there for a few days.

Thanks, I got it now.  And I was able to reproduce the issue locally.

Cloning the original repository does transfer objects which become 
unreferenced in the clone.  But cloning that cloned repository (before 
pruning the unreferenced objects) does not transfer those objects again.  

Just need to find out why.


Nicolas


* [PATCH] make 'git clone' ask the remote only for objects it cares about
  2009-09-26  0:49                 ` Nicolas Pitre
@ 2009-09-26  3:54                   ` Nicolas Pitre
  2009-09-26  7:21                     ` Andreas Schwab
  2009-09-26 19:50                     ` Shawn O. Pearce
  0 siblings, 2 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-26  3:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Current behavior of 'git clone' when not using --mirror is to fetch 
everything from the peer, and then filter out unwanted refs just before 
writing them out to the cloned repository.  This may become highly 
inefficient if the peer has an unusual ref namespace, or if it simply 
has "remotes" refs of its own, and those locally unwanted refs are 
connecting to a large set of objects which becomes unreferenced as soon 
as they are fetched.

Let's filter out those unwanted refs from the peer _before_ asking it 
what refs we want to fetch instead, which is the most logical thing to 
do anyway.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---

On Fri, 25 Sep 2009, Nicolas Pitre wrote:

> On Fri, 25 Sep 2009, Jason Merrill wrote:
> 
> > On 09/25/2009 04:47 PM, Nicolas Pitre wrote:
> > > Do you have access to the remote machine?  Is it possible to have a
> > > tarball of the gcc.git directory from there?
> > 
> > http://gcc.gnu.org/gcc-git.tar.gz
> > 
> > I'll leave it there for a few days.
> 
> Thanks, I got it now.  And I was able to reproduce the issue locally.
> 
> Cloning the original repository does transfer objects which become 
> unreferenced in the clone.  But cloning that cloned repository (before 
> pruning the unreferenced objects) does not transfer those objects again.  
> 
> Just need to find out why.

And the "why" is described above.  The problem was actually on the 
client side and was affecting clones of any repository containing 
anything outside refs/heads and refs/tags.

The fact that the git repository on gcc.gnu.org has lots of stuff in 
"remote" branches that don't get cloned by default is a separate 
configuration/policy issue on that server which might need (or not) to 
be looked into.  For instance at least, as a bare repository, it should 
have all the git files in gcc.git/ directly instead of gcc.git/.git/.

diff --git a/builtin-clone.c b/builtin-clone.c
index bab2d84..edf7c7f 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -329,24 +329,28 @@ static void remove_junk_on_signal(int signo)
 	raise(signo);
 }
 
-static struct ref *write_remote_refs(const struct ref *refs,
-		struct refspec *refspec, const char *reflog)
+static struct ref *wanted_peer_refs(const struct ref *refs,
+		struct refspec *refspec)
 {
 	struct ref *local_refs = NULL;
 	struct ref **tail = &local_refs;
-	struct ref *r;
 
 	get_fetch_map(refs, refspec, &tail, 0);
 	if (!option_mirror)
 		get_fetch_map(refs, tag_refspec, &tail, 0);
 
+	return local_refs;
+}
+
+static void write_remote_refs(const struct ref *local_refs, const char *reflog)
+{
+	const struct ref *r;
+
 	for (r = local_refs; r; r = r->next)
 		add_extra_ref(r->peer_ref->name, r->old_sha1, 0);
 
 	pack_refs(PACK_REFS_ALL);
 	clear_extra_refs();
-
-	return local_refs;
 }
 
 int cmd_clone(int argc, const char **argv, const char *prefix)
@@ -495,9 +499,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	strbuf_reset(&value);
 
-	if (path && !is_bundle)
+	if (path && !is_bundle) {
 		refs = clone_local(path, git_dir);
-	else {
+		mapped_refs = wanted_peer_refs(refs, refspec);
+	} else {
 		struct remote *remote = remote_get(argv[0]);
 		transport = transport_get(remote, remote->url[0]);
 
@@ -520,14 +525,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 					     option_upload_pack);
 
 		refs = transport_get_remote_refs(transport);
-		if (refs)
-			transport_fetch_refs(transport, refs);
+		if (refs) {
+			mapped_refs = wanted_peer_refs(refs, refspec);
+			transport_fetch_refs(transport, mapped_refs);
+		}
 	}
 
 	if (refs) {
 		clear_extra_refs();
 
-		mapped_refs = write_remote_refs(refs, refspec, reflog_msg.buf);
+		write_remote_refs(mapped_refs, reflog_msg.buf);
 
 		remote_head = find_ref_by_name(refs, "HEAD");
 		remote_head_points_at =
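
The idea of the patch above, filtering the peer's refs before requesting anything, can be illustrated outside of git's C code. The shell sketch below is not the implementation: the advertised ref list is hypothetical, the function merely borrows the name wanted_peer_refs from the patch, and plain glob matching stands in for the refs/heads/* and refs/tags/* mappings that a non-mirror clone applies through get_fetch_map().

```shell
#!/bin/sh
# Sketch only: a made-up ref advertisement, filtered the way a non-mirror
# clone is expected to filter it before requesting objects.
set -eu

advertised='HEAD
refs/heads/master
refs/heads/trunk
refs/remotes/svn/user-branch
refs/tags/some-release'

# Keep only names covered by the default clone refspecs.
wanted_peer_refs() {
    while IFS= read -r ref; do
        case "$ref" in
            refs/heads/*|refs/tags/*) printf '%s\n' "$ref" ;;
        esac
    done
}

wanted=$(printf '%s\n' "$advertised" | wanted_peer_refs)
printf '%s\n' "$wanted"
```

Only the names that survive this filter would then be passed on to the fetch, so objects reachable solely from refs/remotes/svn/user-branch are never requested.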


* Re: git clone sending unneeded objects
  2009-09-25 23:17               ` Jason Merrill
  2009-09-26  0:49                 ` Nicolas Pitre
@ 2009-09-26  4:44                 ` Jason Merrill
  2009-09-26 13:33                   ` Jason Merrill
  2009-09-27  1:27                   ` Nicolas Pitre
  1 sibling, 2 replies; 26+ messages in thread
From: Jason Merrill @ 2009-09-26  4:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

Incidentally, somewhat related to this issue, I've noticed that if I 
fetch a branch which I don't currently have in my repository, and I have 
most of the commits on that branch in my object store (or in an 
alternate repository) but not the most recent commit, git fetch isn't 
smart enough to only grab the commits I'm actually missing, it wants to 
fetch much more.

I would expect that since the clone pulled down everything in the 
gcc.git repository, I could then do

git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
git fetch

and have all the branches, not just the ones in refs/heads.  But when I 
do this git fetch wants to fetch some 500k redundant objects.

Jason


* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
  2009-09-26  3:54                   ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
@ 2009-09-26  7:21                     ` Andreas Schwab
  2009-09-26 19:50                     ` Shawn O. Pearce
  1 sibling, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2009-09-26  7:21 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Nicolas Pitre <nico@fluxnic.net> writes:

> The fact that the git repository on gcc.gnu.org has lots of stuff in 
> "remote" branches that don't get cloned by default is a separate 
> configuration/policy issue on that server which might need (or not) to 
> be looked into.  For instance at least, as a bare repository, it should 
> have all the git files in gcc.git/ directly instead of gcc.git/.git/.

The remote is just a git-svn tree.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: git clone sending unneeded objects
  2009-09-26  4:44                 ` git clone sending unneeded objects Jason Merrill
@ 2009-09-26 13:33                   ` Jason Merrill
  2009-09-27  2:26                     ` Nicolas Pitre
  2009-09-27  1:27                   ` Nicolas Pitre
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-26 13:33 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Matthieu Moy, git, Hin-Tak Leung

On 09/26/2009 12:44 AM, Jason Merrill wrote:
> git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> git fetch

git count-objects -v before:

count: 44
size: 1768
in-pack: 1399509
packs: 1
size-pack: 600456
prune-packable: 0
garbage: 0

and after (transferred 278MB):

count: 44
size: 1768
in-pack: 1947339
packs: 2
size-pack: 1178408
prune-packable: 8
garbage: 0

and then after git gc --prune=now:

count: 0
size: 0
in-pack: 1399613
packs: 1
size-pack: 839900
prune-packable: 0
garbage: 0

So I only actually needed 104 more objects, but fetch wasn't clever 
enough to see that, and my new pack is much less efficient.

I've run into the same issue using alternates to set up multiple working 
directories for different branches; if the alternate directory isn't 
completely up-to-date, fetch wants to pull down lots of data again 
rather than use what I have and only fetch the last one or two commits.

Jason
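
The figures quoted above can be checked with a few lines of shell arithmetic. This is only a worked calculation on the in-pack values from the three 'git count-objects -v' listings; the variable names are chosen here for illustration.

```shell
#!/bin/sh
# Worked arithmetic on the in-pack counts quoted in this message.
set -eu

before=1399509        # in-pack before the fetch
after_fetch=1947339   # in-pack after the fetch (two packs)
after_gc=1399613      # in-pack after 'git gc --prune=now'

transferred=$(( after_fetch - before ))   # objects in the newly fetched pack
needed=$(( after_gc - before ))           # objects that were actually new
redundant=$(( transferred - needed ))     # objects already present locally

echo "transferred=$transferred needed=$needed redundant=$redundant"
```

That is 547830 objects transferred for 104 that were really missing, leaving 547726 duplicates of objects already in the first pack.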


* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
  2009-09-26  3:54                   ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
  2009-09-26  7:21                     ` Andreas Schwab
@ 2009-09-26 19:50                     ` Shawn O. Pearce
  2009-09-27  0:26                       ` Nicolas Pitre
  1 sibling, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2009-09-26 19:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Nicolas Pitre <nico@fluxnic.net> wrote:
> Current behavior of 'git clone' when not using --mirror is to fetch 
> everything from the peer, and then filter out unwanted refs just before 
> writing them out to the cloned repository.  This may become highly 
> inefficient if the peer has an unusual ref namespace, or if it simply 
> has "remotes" refs of its own, and those locally unwanted refs are 
> connecting to a large set of objects which becomes unreferenced as soon 
> as they are fetched.
...
> +static void write_remote_refs(const struct ref *local_refs, const char *reflog)

Here reflog is now unused.  I'm going to squash this in.

diff --git a/builtin-clone.c b/builtin-clone.c
index edf7c7f..4992c25 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -342,7 +342,7 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
 	return local_refs;
 }
 
-static void write_remote_refs(const struct ref *local_refs, const char *reflog)
+static void write_remote_refs(const struct ref *local_refs)
 {
 	const struct ref *r;
 
@@ -534,7 +534,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (refs) {
 		clear_extra_refs();
 
-		write_remote_refs(mapped_refs, reflog_msg.buf);
+		write_remote_refs(mapped_refs);
 
 		remote_head = find_ref_by_name(refs, "HEAD");
 		remote_head_points_at =

-- 
Shawn.


* Re: [PATCH] make 'git clone' ask the remote only for objects it cares about
  2009-09-26 19:50                     ` Shawn O. Pearce
@ 2009-09-27  0:26                       ` Nicolas Pitre
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27  0:26 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@fluxnic.net> wrote:
> > Current behavior of 'git clone' when not using --mirror is to fetch 
> > everything from the peer, and then filter out unwanted refs just before 
> > writing them out to the cloned repository.  This may become highly 
> > inefficient if the peer has an unusual ref namespace, or if it simply 
> > has "remotes" refs of its own, and those locally unwanted refs are 
> > connecting to a large set of objects which becomes unreferenced as soon 
> > as they are fetched.
> ...
> > +static void write_remote_refs(const struct ref *local_refs, const char *reflog)
> 
> Here reflog is now unused.  I'm going to squash this in.

Yeah, I noticed.  Since I didn't know what was the original intent for 
it, I just left it there.


Nicolas


* Re: git clone sending unneeded objects
  2009-09-26  4:44                 ` git clone sending unneeded objects Jason Merrill
  2009-09-26 13:33                   ` Jason Merrill
@ 2009-09-27  1:27                   ` Nicolas Pitre
  2009-09-27  2:04                     ` Shawn O. Pearce
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27  1:27 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Jason Merrill wrote:

> Incidentally, somewhat related to this issue, I've noticed that if I fetch a
> branch which I don't currently have in my repository, and I have most of the
> commits on that branch in my object store (or in an alternate repository) but
> not the most recent commit, git fetch isn't smart enough to only grab the
> commits I'm actually missing, it wants to fetch much more.
> 
> I would expect that since the clone pulled down everything in the gcc.git
> repository, I could then do
> 
> git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> git fetch
> 
> and have all the branches, not just the ones in refs/heads.  But when I do
> this git fetch wants to fetch some 500k redundant objects.

Well...  Assuming a fixed git using the patch I posted yesterday, my 
clone of gcc.git has 988941 objects.  The source repository used for the 
clone has 1399551 objects.  Of course the source repo has more objects 
because it has extra branches in the refs/remotes/ namespace that the 
clone didn't fetch.  If you wish to also fetch those branches as you 
illustrated above then you'll get the difference i.e. 410610 additional 
objects.

And even if the broken clone (before my patch) did pull everything from 
gcc.git, in the cloned repository those 410610 extra objects are 
considered as garbage because nothing actually references them.  So even 
if you decide to fetch the extra branches that the initial clone didn't 
pick up, or if you do reference that repository with "garbage" objects 
for another clone to which you want to add those extra branches, git has 
no way to know that it already had access to those objects locally and 
"ungarbage" them as they aren't referenced.  Result is a useless fetch 
of 410610 objects that you already have, but that you weren't supposed 
to have in the first place.


Nicolas


* Re: git clone sending unneeded objects
  2009-09-27  1:27                   ` Nicolas Pitre
@ 2009-09-27  2:04                     ` Shawn O. Pearce
  2009-09-27  2:31                       ` Nicolas Pitre
  2009-09-27  4:35                       ` Jason Merrill
  0 siblings, 2 replies; 26+ messages in thread
From: Shawn O. Pearce @ 2009-09-27  2:04 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

Nicolas Pitre <nico@fluxnic.net> wrote:
> And even if the broken clone (before my patch) did pull everything from 
> gcc.git, in the cloned repository those 410610 extra objects are 
> considered as garbage because nothing actually references them.  So even 
> if you decide to fetch the extra branches that the initial clone didn't 
> pick up, or if you do reference that repository with "garbage" objects 
> for another clone to which you want to add those extra branches, git has 
> no way to know that it already had access to those objects locally and 
> "ungarbage" them as they aren't referenced.  Result is a useless fetch 
> of 410610 objects that you already have, but that you weren't supposed 
> to have in the first place.

Just to clarify a minor nit:

Actually, if those refs have not changed, quickfetch should kick in
and realize that all 410610 objects are reachable locally without
errors, permitting the client to avoid the object transfer.

However, if *ANY* of those refs were to change to something you
don't actually have, quickfetch would fail, and we would need to
fetch all 410610 objects.
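
A minimal sketch of that all-or-nothing check, assuming it boils down
to a single rev-list walk over the advertised tips (the function name
tips_complete_locally is hypothetical, and the exact invocation used
inside fetch is an assumption here, not something to rely on):

```shell
# Hypothetical helper: reads object ids (one per line) on stdin and
# succeeds only if every one of them, plus everything it reaches that
# is not already reachable from a local ref, exists in the local
# object store.
tips_complete_locally() {
    git rev-list --quiet --objects --stdin --not --all 2>/dev/null
}

# Usage sketch (not run here): feed it the ids the remote advertises.
#   git ls-remote origin 'refs/remotes/*' | cut -f1 | tips_complete_locally \
#       && echo "nothing to transfer" \
#       || echo "at least one tip is incomplete locally"
```

A single missing object anywhere makes the whole walk fail, which is
why one changed ref is enough to lose the shortcut.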

-- 
Shawn.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: git clone sending unneeded objects
  2009-09-26 13:33                   ` Jason Merrill
@ 2009-09-27  2:26                     ` Nicolas Pitre
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27  2:26 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Jason Merrill wrote:

> On 09/26/2009 12:44 AM, Jason Merrill wrote:
> > git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> > git fetch
> 
> git count-objects -v before:
> 
> count: 44
> size: 1768
> in-pack: 1399509
> packs: 1
> size-pack: 600456
> prune-packable: 0
> garbage: 0

I'm sure if you had done 'git rev-list --all --objects | wc -l' at that 
point, the result would have been something around 900000.  That's the 
actual number of objects git had a reference to, compared to the total 
objects contained in the object store.

> and after (transferred 278MB):
> 
> count: 44
> size: 1768
> in-pack: 1947339
> packs: 2
> size-pack: 1178408
> prune-packable: 8
> garbage: 0

And those 500000 or so extra objects (minus a couple dozen which were 
probably used to "complete" the fetched thin pack and are duplicates of 
local objects -- the fetch progress message gave the exact number) were 
obtained from the remote repository because git has no way to tell the 
remote it already had them.  That's what I was explaining in my previous 
email.

> and then after git gc --prune=now:
> 
> count: 0
> size: 0
> in-pack: 1399613
> packs: 1
> size-pack: 839900
> prune-packable: 0
> garbage: 0
> 
> So I only actually needed 104 more objects, but fetch wasn't clever enough to
> see that, and my new pack is much less efficient.

Like I said, it's not that the fetch wasn't clever enough.  Rather, 
your initial clone asked for way too many objects in the first place.  
That's what my patch fixed.
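
For reference, the in-pack figures from the three snapshots quoted
above can be compared with a few lines of shell.  This is only a
sketch: count_field is a hypothetical helper name, and the numbers are
copied from the quoted 'git count-objects -v' output rather than
measured.

```shell
# Hypothetical helper: read 'git count-objects -v' output on stdin and
# print the value of the named field.
count_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# in-pack values quoted above (before fetch, after fetch, after gc).
before=$(printf 'in-pack: 1399509\n' | count_field in-pack)
fetched=$(printf 'in-pack: 1947339\n' | count_field in-pack)
pruned=$(printf 'in-pack: 1399613\n' | count_field in-pack)

# Packed objects added by the fetch, duplicates included.
echo "added by fetch: $(( fetched - before ))"    # 547830
# Objects that were actually new once everything was repacked.
echo "kept after gc:  $(( pruned - before ))"     # 104
```

In a live repository the same helper could be fed directly, e.g.
'git count-objects -v | count_field in-pack'.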

Now the pack efficiency can be explained as well.  A single pack is 
always going to be more efficient than 2 packs.  The problem is that 
when you do a gc, by default git does the least costly operation, which 
consists of copying as much data as possible from existing packs without 
extra processing.  That means that many objects were copied from the 
second (newly received) pack although a better delta representation was 
most probably available in the other, larger pack (remember that most 
objects from that second pack already existed in the first pack).  Git 
does select the second pack in preference to the other pack because it 
is more recent, and normally more recent packs contain more recent 
objects, which is a good heuristic to optimize the object enumeration.  
In this case this didn't produce a good result, but again we're talking 
about a scenario which is bogus from the start and shouldn't happen.

So if you do a 'git gc --aggressive' and let it run for a while, you 
should get back a smaller pack, possibly even much smaller than the 
original one.


Nicolas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: git clone sending unneeded objects
  2009-09-27  2:04                     ` Shawn O. Pearce
@ 2009-09-27  2:31                       ` Nicolas Pitre
  2009-09-27  4:35                       ` Jason Merrill
  1 sibling, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-27  2:31 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Jason Merrill, Matthieu Moy, git, Hin-Tak Leung

On Sat, 26 Sep 2009, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@fluxnic.net> wrote:
> > And even if the broken clone (before my patch) did pull everything from 
> > gcc.git, in the cloned repository those 410610 extra objects are 
> > considered as garbage because nothing actually references them.  So even 
> > if you decide to fetch the extra branches that the initial clone didn't 
> > pick up, or if you do reference that repository with "garbage" objects 
> > for another clone to which you want to add those extra branches, git has 
> > no way to know that it already had access to those objects locally and 
> > "ungarbage" them as they aren't referenced.  Result is a useless fetch 
> > of 410610 objects that you already have, but that you weren't supposed 
> > to have in the first place.
> 
> Just to clarify a minor nit:
> 
> Actually, if those refs have not changed, quickfetch should kick in
> and realize that all 410610 objects are reachable locally without
> errors, permitting the client to avoid the object transfer.
> 
> However, if *ANY* of those refs were to change to something you
> don't actually have, quickfetch would fail, and we would need to
> fetch all 410610 objects.

Right.  But since we're talking about a git mirror for the gcc svn repo 
and gcc is a rather active project, the likelihood of any ref changing 
at any time is rather high.


Nicolas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: git clone sending unneeded objects
  2009-09-27  2:04                     ` Shawn O. Pearce
  2009-09-27  2:31                       ` Nicolas Pitre
@ 2009-09-27  4:35                       ` Jason Merrill
  2009-09-28  4:18                         ` Nicolas Pitre
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Merrill @ 2009-09-27  4:35 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Nicolas Pitre, Matthieu Moy, git, Hin-Tak Leung

On 09/26/2009 10:04 PM, Shawn O. Pearce wrote:
> Actually, if those refs have not changed, quickfetch should kick in
> and realize that all 410610 objects are reachable locally without
> errors, permitting the client to avoid the object transfer.
>
> However, if *ANY* of those refs were to change to something you
> don't actually have, quickfetch would fail, and we would need to
> fetch all 410610 objects.

Right.  That seems unfortunate to me; couldn't fetch do a bit more 
checking before it decides to download the whole world again?

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: git clone sending unneeded objects
  2009-09-27  4:35                       ` Jason Merrill
@ 2009-09-28  4:18                         ` Nicolas Pitre
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2009-09-28  4:18 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Shawn O. Pearce, Matthieu Moy, git, Hin-Tak Leung

On Sun, 27 Sep 2009, Jason Merrill wrote:

> On 09/26/2009 10:04 PM, Shawn O. Pearce wrote:
> > Actually, if those refs have not changed, quickfetch should kick in
> > and realize that all 410610 objects are reachable locally without
> > errors, permitting the client to avoid the object transfer.
> > 
> > However, if *ANY* of those refs were to change to something you
> > don't actually have, quickfetch would fail, and we would need to
> > fetch all 410610 objects.
> 
> Right.  That seems unfortunate to me; couldn't fetch do a bit more checking
> before it decides to download the whole world again?

The quickfetch test could be turned into a filter so that refs which are 
already available locally are simply not fetched, on a per-ref basis.  
But that would be a rather expensive test which couldn't keep its 
"quick" qualifier anymore, all for a case that wouldn't have happened 
in the first place if git didn't have a bug in its clone operation, as 
I've explained already.
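
To make the cost concrete, such a per-ref filter could look roughly
like the sketch below.  This is not how git works today; the function
name refs_needing_fetch and the "id ref" input format are assumptions
made for this example.

```shell
# Hypothetical per-ref filter: reads "<object-id> <refname>" lines on
# stdin and prints the names of the refs whose history is NOT fully
# available in the local object store.
refs_needing_fetch() {
    while read -r id ref; do
        # One full connectivity walk per ref: this is the expensive part.
        if ! echo "$id" |
            git rev-list --quiet --objects --stdin --not --all 2>/dev/null
        then
            echo "$ref"
        fi
    done
}

# Usage sketch (not run here):
#   git ls-remote origin 'refs/remotes/*' | refs_needing_fetch
```

Each ref triggers its own walk over unreferenced history, so with
hundreds of remote branches the check is anything but quick.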


Nicolas

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2009-09-28  4:18 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-08  1:11 git gc expanding packed data? Andreas Schwab
2009-08-08 13:05 ` Hin-Tak Leung
2009-08-08 13:25   ` Andreas Schwab
2009-08-09  2:56 ` Nicolas Pitre
2009-08-09  7:43   ` Andreas Schwab
2009-09-25 18:05     ` git clone sending unneeded objects (was : git gc expanding packed data?) Jason Merrill
2009-09-25 19:34       ` git clone sending unneeded objects Matthieu Moy
2009-09-25 19:43         ` Jason Merrill
2009-09-25 19:53         ` Nicolas Pitre
2009-09-25 20:20           ` Jason Merrill
2009-09-25 20:47             ` Nicolas Pitre
2009-09-25 23:17               ` Jason Merrill
2009-09-26  0:49                 ` Nicolas Pitre
2009-09-26  3:54                   ` [PATCH] make 'git clone' ask the remote only for objects it cares about Nicolas Pitre
2009-09-26  7:21                     ` Andreas Schwab
2009-09-26 19:50                     ` Shawn O. Pearce
2009-09-27  0:26                       ` Nicolas Pitre
2009-09-26  4:44                 ` git clone sending unneeded objects Jason Merrill
2009-09-26 13:33                   ` Jason Merrill
2009-09-27  2:26                     ` Nicolas Pitre
2009-09-27  1:27                   ` Nicolas Pitre
2009-09-27  2:04                     ` Shawn O. Pearce
2009-09-27  2:31                       ` Nicolas Pitre
2009-09-27  4:35                       ` Jason Merrill
2009-09-28  4:18                         ` Nicolas Pitre
2009-09-26  0:43             ` Hin-Tak Leung
