* Efficiency of initial clone from server
@ 2007-02-11 19:53 Jon Smirl
  2007-02-11 22:53 ` Shawn O. Pearce
  0 siblings, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-11 19:53 UTC (permalink / raw)
  To: Git Mailing List

I'm doing a clone right now and I see this:

jonsmirl@jonsmirl:/extra$ cg clone
git://git2.kernel.org/pub/scm/linux/kernel/git/linville/wireless-dev.git
Initialized empty Git repository in .git/
Fetching pack (head and objects)...
remote: Generating pack...
remote: Done counting 404120 objects.
remote: Deltifying 404120 objects.
remote:  100% (404120/404120) done
Indexing 404120 objects.
remote: Total 404120, written 404120 (delta 320324), reused 365290
(delta 282572)
 100% (404120/404120) done
Resolving 320324 deltas.
....

Is this happening because the repository on the server is not
completely packed? It is basically building a pack of the whole thing
and shipping it to me, right?

If that is the case, why not first pack the whole repository and then
copy it down the wire? Now the next clone that comes along doesn't
have to do so much work. Would this help to eliminate some of the load
at kernel.org?

Something is wrong with this tree too; what are these errors about?
fatal: pack: not a valid SHA1

fatal: pack: not a valid SHA1
Fetching tags... v2.6.12 v2.6.12-rc2 v2.6.12-rc3 v2.6.12-rc4
v2.6.12-rc5 v2.6.12-rc6 v2.6.13 v2.6.13-rc1 v2.6.13-rc2 v2.6.13-rc3
v2.6.13-rc4 v2.6.13-rc5 v2.6.13-rc6 v2.6.13-rc7 v2.6.14 v2.6.14-rc1
v2.6.14-rc2 v2.6.14-rc3 v2.6.14-rc4 v2.6.14-rc5 v2.6.15 v2.6.15-rc1
v2.6.15-rc2 v2.6.15-rc3 v2.6.15-rc4 v2.6.15-rc5 v2.6.15-rc6
v2.6.15-rc7 v2.6.16 v2.6.16-rc1 v2.6.16-rc2 v2.6.16-rc3 v2.6.16-rc4
v2.6.16-rc5 v2.6.16-rc6 v2.6.17 v2.6.17-rc1 v2.6.17-rc2 v2.6.17-rc3
v2.6.17-rc4 v2.6.17-rc5 v2.6.17-rc6 v2.6.18 v2.6.18-rc1 v2.6.18-rc2
v2.6.18-rc3 v2.6.18-rc4 v2.6.18-rc5 v2.6.18-rc6 v2.6.18-rc7 v2.6.19
v2.6.19-rc1 v2.6.19-rc2 v2.6.19-rc3 v2.6.19-rc4 v2.6.19-rc5
v2.6.19-rc6 v2.6.20-rc1 v2.6.20-rc2 v2.6.20-rc3 v2.6.20-rc4
v2.6.20-rc5 v2.6.20-rc6
remote: Generating pack...
remote: Done counting 63 objects.
remote: Deltifying 63 objects.
remote:  100% (63/63) done
Indexing 63 objects.
remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
 100% (63/63) done
fatal: pack: not a valid SHA1
New branch: 0953670fbcb75e26fb93340bddae934e85618f2e


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Efficiency of initial clone from server
  2007-02-11 19:53 Efficiency of initial clone from server Jon Smirl
@ 2007-02-11 22:53 ` Shawn O. Pearce
  2007-02-11 23:25   ` Jon Smirl
  2007-02-11 23:29   ` Jon Smirl
  0 siblings, 2 replies; 32+ messages in thread
From: Shawn O. Pearce @ 2007-02-11 22:53 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> Is this happening because the repository on the server is not
> completely packed? It is basically building a pack of the whole thing
> and shipping it to me, right?

Correct.  The wire protocol only allows us to send one pack.
So we have to pack everything and transmit it as a single unit.
 
> If that is the case, why not first pack the whole repository and then
> copy it down the wire? Now the next clone that comes along doesn't
> have to do so much work. Would this help to eliminate some of the load
> at kernel.org?

Probably, but then the daemon needs write access to the repository.
This isn't required right now; it can be strictly read-only and
still serve the contents.
 
> remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
> 100% (63/63) done
> fatal: pack: not a valid SHA1
> New branch: 0953670fbcb75e26fb93340bddae934e85618f2e

What version of git is this?  That looks like we're assuming the word
pack was an object, but I'm not sure why we would do such a thing...

-- 
Shawn.


* Re: Efficiency of initial clone from server
  2007-02-11 22:53 ` Shawn O. Pearce
@ 2007-02-11 23:25   ` Jon Smirl
  2007-02-11 23:51     ` Jon Smirl
  2007-02-12  1:38     ` Nicolas Pitre
  2007-02-11 23:29   ` Jon Smirl
  1 sibling, 2 replies; 32+ messages in thread
From: Jon Smirl @ 2007-02-11 23:25 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Git Mailing List

On 2/11/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > Is this happening because the repository on the server is not
> > completely packed? It is basically building a pack of the whole thing
> > and shipping it to me, right?
>
> Correct.  The wire protocol only allows us to send one pack.
> So we have to pack everything and transmit it as a single unit.
>
> > If that is the case, why not first pack the whole repository and then
> > copy it down the wire? Now the next clone that comes along doesn't
> > have to do so much work. Would this help to eliminate some of the load
> > at kernel.org?
>
> Probably, but then the daemon needs write access to the repository.
> This isn't required right now; it can be strictly read-only and
> still serve the contents.
>
> > remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
> > 100% (63/63) done
> > fatal: pack: not a valid SHA1
> > New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
>
> What version of git is this?  That looks like we're assuming the word
> pack was an object, but I'm not sure why we would do such a thing...

jonsmirl@jonsmirl:/usr/local/bin$ git --version
git version 1.5.0.rc2.g53551-dirty


>
> --
> Shawn.
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-11 22:53 ` Shawn O. Pearce
  2007-02-11 23:25   ` Jon Smirl
@ 2007-02-11 23:29   ` Jon Smirl
  1 sibling, 0 replies; 32+ messages in thread
From: Jon Smirl @ 2007-02-11 23:29 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Git Mailing List

On 2/11/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > Is this happening because the repository on the server is not
> > completely packed? It is basically building a pack of the whole thing
> > and shipping it to me, right?
>
> Correct.  The wire protocol only allows us to send one pack.
> So we have to pack everything and transmit it as a single unit.
>
> > If that is the case, why not first pack the whole repository and then
> > copy it down the wire? Now the next clone that comes along doesn't
> > have to do so much work. Would this help to eliminate some of the load
> > at kernel.org?
>
> Probably, but then the daemon needs write access to the repository.
> This isn't required right now; it can be strictly read-only and
> still serve the contents.

Does it need write access for push to work? Can the server check for
write access and save the complete repack if it has access? This
appears to be causing a lot of needless work for kernel.org.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-11 23:25   ` Jon Smirl
@ 2007-02-11 23:51     ` Jon Smirl
  2007-02-12  1:38     ` Nicolas Pitre
  1 sibling, 0 replies; 32+ messages in thread
From: Jon Smirl @ 2007-02-11 23:51 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Git Mailing List

On 2/11/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> > > remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
> > > 100% (63/63) done
> > > fatal: pack: not a valid SHA1
> > > New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
> >
> > What version of git is this?  That looks like we're assuming the word
> > pack was an object, but I'm not sure why we would do such a thing...
>
> jonsmirl@jonsmirl:/usr/local/bin$ git --version
> git version 1.5.0.rc2.g53551-dirty
>

I just whacked my git tree, cloned a new copy and installed it. Still
get the same errors.

jonsmirl@jonsmirl:/extra$ rm -rf wireless-dev
jonsmirl@jonsmirl:/extra$ cg clone
git://git2.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
Initialized empty Git repository in .git/
Fetching pack (head and objects)...
remote: Generating pack...
remote: Done counting 404120 objects.
remote: Deltifying 404120 objects.
remote:  100% (404120/404120) done
Indexing 404120 objects.
remote: Total 404120, written 404120 (delta 320324), reused 365290
(delta 282572)
 100% (404120/404120) done
Resolving 320324 deltas.
 100% (320324/320324) done
fatal: pack: not a valid SHA1
Fetching tags... v2.6.12 v2.6.12-rc2 v2.6.12-rc3 v2.6.12-rc4
v2.6.12-rc5 v2.6.12-rc6 v2.6.13 v2.6.13-rc1 v2.6.13-rc2 v2.6.13-rc3
v2.6.13-rc4 v2.6.13-rc5 v2.6.13-rc6 v2.6.13-rc7 v2.6.14 v2.6.14-rc1
v2.6.14-rc2 v2.6.14-rc3 v2.6.14-rc4 v2.6.14-rc5 v2.6.15 v2.6.15-rc1
v2.6.15-rc2 v2.6.15-rc3 v2.6.15-rc4 v2.6.15-rc5 v2.6.15-rc6
v2.6.15-rc7 v2.6.16 v2.6.16-rc1 v2.6.16-rc2 v2.6.16-rc3 v2.6.16-rc4
v2.6.16-rc5 v2.6.16-rc6 v2.6.17 v2.6.17-rc1 v2.6.17-rc2 v2.6.17-rc3
v2.6.17-rc4 v2.6.17-rc5 v2.6.17-rc6 v2.6.18 v2.6.18-rc1 v2.6.18-rc2
v2.6.18-rc3 v2.6.18-rc4 v2.6.18-rc5 v2.6.18-rc6 v2.6.18-rc7 v2.6.19
v2.6.19-rc1 v2.6.19-rc2 v2.6.19-rc3 v2.6.19-rc4 v2.6.19-rc5
v2.6.19-rc6 v2.6.20-rc1 v2.6.20-rc2 v2.6.20-rc3 v2.6.20-rc4
v2.6.20-rc5 v2.6.20-rc6
remote: Generating pack...
remote: Done counting 63 objects.
remote: Deltifying 63 objects.
remote:  100% (63/63) done
Indexing 63 objects.
remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
 100% (63/63) done
fatal: pack: not a valid SHA1
New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
Cloned to wireless-dev/ (origin
git://git2.kernel.org/pub/scm/linux/kernel/git/linville/wireless-dev.git
available as branch "origin")
jonsmirl@jonsmirl:/extra$ git --version
git version 1.5.0.rc4.g1843e
jonsmirl@jonsmirl:/extra$ cg --version
cogito-0.18.2 (cogito-0.18rc1-gb6a6e87)


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-11 23:25   ` Jon Smirl
  2007-02-11 23:51     ` Jon Smirl
@ 2007-02-12  1:38     ` Nicolas Pitre
  2007-02-12  2:15       ` Jon Smirl
  2007-02-12  4:16       ` Junio C Hamano
  1 sibling, 2 replies; 32+ messages in thread
From: Nicolas Pitre @ 2007-02-12  1:38 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Sun, 11 Feb 2007, Jon Smirl wrote:

> On 2/11/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> > Jon Smirl <jonsmirl@gmail.com> wrote:
> > > remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
> > > 100% (63/63) done
> > > fatal: pack: not a valid SHA1
> > > New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
> >
> > What version of git is this?  That looks like we're assuming the word
> > pack was an object, but I'm not sure why we would do such a thing...

This "pack" comes from index-pack when providing the name of the pack.
It is either "pack" or "keep" and specifies the name of the .keep file
to remove in the latter case.
This is caught by git-fetch.sh with some code identified with a comment 
that reads: "# special line coming from index-pack with the pack name"
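
As a rough sketch of the filtering described above (not the actual
git-fetch.sh code): a reader loop can treat a first field of "pack" or
"keep" as a special line from index-pack rather than as an object name.
The function name classify_line and the exact line layout used here are
assumptions made for illustration.

```shell
# Hypothetical helper, not taken from git-fetch.sh: classify one line of
# fetch output by its first field.
classify_line() {
	# $1 is the first field, $2 the rest of the line (assumed layout).
	case "$1" in
	pack)
		# special line coming from index-pack with the pack name
		echo "pack-name $2" ;;
	keep)
		# same, but a .keep file named after the pack must be removed later
		echo "keep-name $2" ;;
	*)
		# anything else is expected to be an object name (SHA1)
		echo "object $1" ;;
	esac
}

classify_line pack 1234abcd
classify_line 0953670fbcb75e26fb93340bddae934e85618f2e refs/heads/master
```

A caller without the first two branches would presumably hand the word
"pack" on as if it were a SHA1, which is consistent with the error
quoted above.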

> jonsmirl@jonsmirl:/usr/local/bin$ git --version
> git version 1.5.0.rc2.g53551-dirty

You must have conflicting vintages of GIT installed on your machine,
one of which lacks support for the "pack" and "keep" stuff described above.


Nicolas


* Re: Efficiency of initial clone from server
  2007-02-12  1:38     ` Nicolas Pitre
@ 2007-02-12  2:15       ` Jon Smirl
  2007-02-12  3:55         ` Nicolas Pitre
  2007-02-12  4:16       ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12  2:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Git Mailing List

On 2/11/07, Nicolas Pitre <nico@cam.org> wrote:
> On Sun, 11 Feb 2007, Jon Smirl wrote:
>
> > On 2/11/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> > > Jon Smirl <jonsmirl@gmail.com> wrote:
> > > > remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
> > > > 100% (63/63) done
> > > > fatal: pack: not a valid SHA1
> > > > New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
> > >
> > > What version of git is this?  That looks like we're assuming the word
> > > pack was an object, but I'm not sure why we would do such a thing...
>
> This "pack" comes from index-pack when providing the name of the pack.
> It is either "pack" or "keep" and specifies the name of the .keep file
> to remove in the latter case.
> This is caught by git-fetch.sh with some code identified with a comment
> that reads: "# special line coming from index-pack with the pack name"
>
> > jonsmirl@jonsmirl:/usr/local/bin$ git --version
> > git version 1.5.0.rc2.g53551-dirty
>
> You must have conflicting vintages of GIT installed on your machine,
> one of which lacks support for the "pack" and "keep" stuff described above.

I can clone Linus' git tree without getting errors.

Maybe there is something wrong with the wireless-dev tree.
It looks like someone is working on it:

jonsmirl@jonsmirl:/extra$ git clone
git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
Initialized empty Git repository in /extra/wireless-dev/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from
'git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git'
failed.
jonsmirl@jonsmirl:/extra$



-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  2:15       ` Jon Smirl
@ 2007-02-12  3:55         ` Nicolas Pitre
  2007-02-12  4:49           ` Shawn O. Pearce
  0 siblings, 1 reply; 32+ messages in thread
From: Nicolas Pitre @ 2007-02-12  3:55 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Sun, 11 Feb 2007, Jon Smirl wrote:

> Maybe there is something wrong with the wireless-dev tree.
> It looks like someone is working on it:
> 
> jonsmirl@jonsmirl:/extra$ git clone
> git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
> Initialized empty Git repository in /extra/wireless-dev/.git/
> fatal: The remote end hung up unexpectedly
> fetch-pack from
> 'git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git'
> failed.

Try with git2.kernel.org then.  It seems that git.kernel.org too often 
resolves to git1.kernel.org not sharing the load with git2.kernel.org 
appropriately.


Nicolas


* Re: Efficiency of initial clone from server
  2007-02-12  1:38     ` Nicolas Pitre
  2007-02-12  2:15       ` Jon Smirl
@ 2007-02-12  4:16       ` Junio C Hamano
  2007-02-12  4:29         ` Jon Smirl
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2007-02-12  4:16 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jon Smirl, Shawn O. Pearce, Git Mailing List

Nicolas Pitre <nico@cam.org> writes:

> On Sun, 11 Feb 2007, Jon Smirl wrote:
>
>> On 2/11/07, Shawn O. Pearce <spearce@spearce.org> wrote:
>> > Jon Smirl <jonsmirl@gmail.com> wrote:
>> > > remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
>> > > 100% (63/63) done
>> > > fatal: pack: not a valid SHA1
>> > > New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
>> >
>> > What version of git is this?  That looks like we're assuming the word
>> > pack was an object, but I'm not sure why we would do such a thing...
>
> This "pack" comes from index-pack when providing the name of the pack.
> It is either "pack" or "keep" and specifies the name of the .keep file
> to remove in the latter case.
> This is caught by git-fetch.sh with some code identified with a comment 
> that reads: "# special line coming from index-pack with the pack name"

That is true only if Jon used git-fetch, git-pull or git-clone.
Unfortunately I noticed that his commandline read "cg clone".


* Re: Efficiency of initial clone from server
  2007-02-12  4:16       ` Junio C Hamano
@ 2007-02-12  4:29         ` Jon Smirl
  2007-02-12  4:33           ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12  4:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/11/07, Junio C Hamano <junkio@cox.net> wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > On Sun, 11 Feb 2007, Jon Smirl wrote:
> >
> >> On 2/11/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> >> > Jon Smirl <jonsmirl@gmail.com> wrote:
> >> > > remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
> >> > > 100% (63/63) done
> >> > > fatal: pack: not a valid SHA1
> >> > > New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
> >> >
> >> > What version of git is this?  That looks like we're assuming the word
> >> > pack was an object, but I'm not sure why we would do such a thing...
> >
> > This "pack" comes from index-pack when providing the name of the pack.
> > It is either "pack" or "keep" and specifies the name of the .keep file
> > to remove in the latter case.
> > This is caught by git-fetch.sh with some code identified with a comment
> > that reads: "# special line coming from index-pack with the pack name"
>
> That is true only if Jon used git-fetch, git-pull or git-clone.
> Unfortunately I noticed that his commandline read "cg clone".

The wireless-dev trees are all dead.
I can still get to Linus' tree without problem. wireless-2.6 tree is ok too.

jonsmirl@jonsmirl:/extra$ git clone
git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
Initialized empty Git repository in /extra/wireless-dev/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from
'git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git'
failed.
jonsmirl@jonsmirl:/extra$ git clone
git://git2.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
Initialized empty Git repository in /extra/wireless-dev/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from
'git://git2.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git'
failed.
jonsmirl@jonsmirl:/extra$ git clone
git://git1.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
Initialized empty Git repository in /extra/wireless-dev/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from
'git://git1.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git'
failed.
jonsmirl@jonsmirl:/extra$


A clone of the git repo completes without errors:

jonsmirl@jonsmirl:/extra$ git clone git://git2.kernel.org/pub/scm/git/git.git
Initialized empty Git repository in /extra/git/.git/
remote: Generating pack...
remote: Done counting 39677 objects.
remote: Deltifying 39677 objects.
remote:  100% (39677/39677) done
Indexing 39677 objects.
remote: Total 39677, written 39677 (delta 27701), reused 39585 (delta 27642)
 100% (39677/39677) done
Resolving 27701 deltas.
 100% (27701/27701) done
Checking files out...
 100% (798/798) done
jonsmirl@jonsmirl:/extra$

I think something is wrong with the wireless-dev tree.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  4:29         ` Jon Smirl
@ 2007-02-12  4:33           ` Junio C Hamano
  2007-02-12  4:53             ` Jon Smirl
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2007-02-12  4:33 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Junio C Hamano, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

"Jon Smirl" <jonsmirl@gmail.com> writes:

> The wireless-dev trees are all dead.
> I can still get to Linus' tree without problem. wireless-2.6 tree is ok too.
>
> jonsmirl@jonsmirl:/extra$ git clone
> git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
> Initialized empty Git repository in /extra/wireless-dev/.git/
> fatal: The remote end hung up unexpectedly

Are you sure the above is ".../linux/kernel/gt/linville/..."?


* Re: Efficiency of initial clone from server
  2007-02-12  3:55         ` Nicolas Pitre
@ 2007-02-12  4:49           ` Shawn O. Pearce
  2007-02-12 16:42             ` Nicolas Pitre
  0 siblings, 1 reply; 32+ messages in thread
From: Shawn O. Pearce @ 2007-02-12  4:49 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jon Smirl, Git Mailing List

Nicolas Pitre <nico@cam.org> wrote:
> Try with git2.kernel.org then.  It seems that git.kernel.org too often 
> resolves to git1.kernel.org not sharing the load with git2.kernel.org 
> appropriately.

Hey!  You are giving out my secret!

(I use git2.kernel.org to fetch git.git.)

-- 
Shawn.


* Re: Efficiency of initial clone from server
  2007-02-12  4:33           ` Junio C Hamano
@ 2007-02-12  4:53             ` Jon Smirl
  2007-02-12  5:01               ` Jon Smirl
  2007-02-13 15:03               ` Andreas Ericsson
  0 siblings, 2 replies; 32+ messages in thread
From: Jon Smirl @ 2007-02-12  4:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/11/07, Junio C Hamano <junkio@cox.net> wrote:
> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
> > The wireless-dev trees are all dead.
> > I can still get to Linus' tree without problem. wireless-2.6 tree is ok too.
> >
> > jonsmirl@jonsmirl:/extra$ git clone
> > git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
> > Initialized empty Git repository in /extra/wireless-dev/.git/
> > fatal: The remote end hung up unexpectedly
>
> Are you sure the above is ".../linux/kernel/gt/linville/..."?

You're right, it should be git instead of gt; somewhere in copying
strings around I lost the 'i' and wasn't paying attention.  Still,
"fatal: The remote end hung up unexpectedly" is not a very good error
message for a missing repository.

Here's a git clone with the right string. It works:

jonsmirl@jonsmirl:/extra$ git clone
git://git2.kernel.org/pub/scm/linux/kernel/git/linville/wireless-dev.git
Initialized empty Git repository in /extra/wireless-dev/.git/
remote: Generating pack...
remote: Done counting 411874 objects.
remote: Deltifying 411874 objects.
remote:  100% (411874/411874) done
Indexing 411874 objects.
remote: Total 411874, written 411874 (delta 327049), reused 374287
(delta 289485)
 100% (411874/411874) done
Resolving 327049 deltas.
 100% (327049/327049) done
Checking files out...
 100% (21429/21429) done
jonsmirl@jonsmirl:/extra$

Same thing with cg clone, it's what is broken.
cg update is broken in the same way.
I'm using the current git version of cogito.
I'll switch to the git commands, git clone is about 10x faster for the
clone anyway.

jonsmirl@jonsmirl:/extra$ cg clone
git://git2.kernel.org/pub/scm/linux/kernel/git/linville/wireless-dev.git
Initialized empty Git repository in .git/
Fetching pack (head and objects)...
remote: Generating pack...
remote: Done counting 404120 objects.
remote: Deltifying 404120 objects.
remote:  100% (404120/404120) done
Indexing 404120 objects.
remote: Total 404120, written 404120 (delta 320324), reused 365290
(delta 282572)
 100% (404120/404120) done
Resolving 320324 deltas.
 100% (320324/320324) done
fatal: pack: not a valid SHA1
Fetching tags... v2.6.12 v2.6.12-rc2 v2.6.12-rc3 v2.6.12-rc4
v2.6.12-rc5 v2.6.12-rc6 v2.6.13 v2.6.13-rc1 v2.6.13-rc2 v2.6.13-rc3
v2.6.13-rc4 v2.6.13-rc5 v2.6.13-rc6 v2.6.13-rc7 v2.6.14 v2.6.14-rc1
v2.6.14-rc2 v2.6.14-rc3 v2.6.14-rc4 v2.6.14-rc5 v2.6.15 v2.6.15-rc1
v2.6.15-rc2 v2.6.15-rc3 v2.6.15-rc4 v2.6.15-rc5 v2.6.15-rc6
v2.6.15-rc7 v2.6.16 v2.6.16-rc1 v2.6.16-rc2 v2.6.16-rc3 v2.6.16-rc4
v2.6.16-rc5 v2.6.16-rc6 v2.6.17 v2.6.17-rc1 v2.6.17-rc2 v2.6.17-rc3
v2.6.17-rc4 v2.6.17-rc5 v2.6.17-rc6 v2.6.18 v2.6.18-rc1 v2.6.18-rc2
v2.6.18-rc3 v2.6.18-rc4 v2.6.18-rc5 v2.6.18-rc6 v2.6.18-rc7 v2.6.19
v2.6.19-rc1 v2.6.19-rc2 v2.6.19-rc3 v2.6.19-rc4 v2.6.19-rc5
v2.6.19-rc6 v2.6.20-rc1 v2.6.20-rc2 v2.6.20-rc3 v2.6.20-rc4
v2.6.20-rc5 v2.6.20-rc6
remote: Generating pack...
remote: Done counting 63 objects.
remote: Deltifying 63 objects.
remote:  100% (63/63) done
Indexing 63 objects.
remote: Total 63, written 63 (delta 0), reused 63 (delta 0)
 100% (63/63) done
fatal: pack: not a valid SHA1
New branch: 0953670fbcb75e26fb93340bddae934e85618f2e
Cloned to wireless-dev/ (origin
git://git2.kernel.org/pub/scm/linux/kernel/git/linville/wireless-dev.git
available as branch "origin")
jonsmirl@jonsmirl:/extra$


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  4:53             ` Jon Smirl
@ 2007-02-12  5:01               ` Jon Smirl
  2007-02-12  5:11                 ` Shawn O. Pearce
  2007-02-12  5:30                 ` Junio C Hamano
  2007-02-13 15:03               ` Andreas Ericsson
  1 sibling, 2 replies; 32+ messages in thread
From: Jon Smirl @ 2007-02-12  5:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/11/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> Same thing with cg clone, it's what is broken.
> cg update is broken in the same way.
> I'm using the current git version of cogito.
> I'll switch to the git commands, git clone is about 10x faster for the
> clone anyway.

Don't read anything into the 10x speed diff, my last git clone was
really slow. I'm probably fighting other people at kernel.org to keep
the tree in RAM.

But back to the original point, can't the server check and see if it
has write access so that it can keep the fully packed tree? I've just
caused kernel.org to needlessly repack the wireless-dev tree a dozen
times playing with this clone command. If it didn't have to keep
repacking for the clone, clone would be a lot faster.
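
The check being asked for could look roughly like this. It is only a
sketch under assumptions: the server would test whether it may write to
the repository's pack directory and only then keep the pack it
generated. The function name and the messages are illustrative and do
not describe anything git-daemon does.

```shell
# Hypothetical: decide whether a generated pack could be saved for reuse.
# $1 is the repository's objects/pack directory.
can_keep_pack() {
	[ -d "$1" ] && [ -w "$1" ]
}

# PACKDIR is an assumed variable; it defaults to the current repository.
packdir=${PACKDIR:-.git/objects/pack}
if can_keep_pack "$packdir"; then
	echo "writable: the full pack could be kept for the next clone"
else
	echo "read-only: generate the pack and stream it, as happens today"
fi
```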

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  5:01               ` Jon Smirl
@ 2007-02-12  5:11                 ` Shawn O. Pearce
  2007-02-12  5:17                   ` Jon Smirl
  2007-02-12  5:30                 ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Shawn O. Pearce @ 2007-02-12  5:11 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> But back to the original point, can't the server check and see if it
> has write access so that it can keep the fully packed tree? I've just
> caused kernel.org to needlessly repack the wireless-dev tree a dozen
> times playing with this clone command. If it didn't have to keep
> repacking for the clone, clone would be a lot faster.

We probably could.

I have actually been thinking about another problem that is
somewhat related.  We cannot put more than 4 GiB of data into a
single packfile, due to the current index size limitation, or more
than 2^32-1 objects into one packfile, due to the header nr_objects
field size.
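
For a sense of scale, the two limits can be written out with shell
arithmetic. This is only a sketch: it assumes 32-bit offsets in the
index and a 32-bit object count in the pack header, as described above.
The function name fits_in_one_pack and the byte size in the example
call are made up; the object count is the one from the clone earlier in
the thread.

```shell
# Limits implied by 32-bit fields (assumption: offsets and the object
# count are both stored as unsigned 32-bit values).
MAX_PACK_BYTES=$((1 << 32))            # 4 GiB = 4294967296 bytes
MAX_PACK_OBJECTS=$(( (1 << 32) - 1 ))  # 4294967295 objects

# Hypothetical check: succeed if a pack of $1 bytes holding $2 objects
# stays within both limits.
fits_in_one_pack() {
	[ "$1" -le "$MAX_PACK_BYTES" ] && [ "$2" -le "$MAX_PACK_OBJECTS" ]
}

fits_in_one_pack 200000000 411874 && echo "fits in a single pack"
```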

Right now we are sending a single packfile down to the client,
even if the remote server end has the repository broken down into
a couple of packfiles (such as "really old historical stuff" and
"active stuff from this year").  If we could send more than one
packfile to the client in a single stream, we could still keep the
file size limitations.

We can also avoid this huge repack case on the server.  Because it
could just send all of the packfiles that it already has, followed
by whatever is loose which wasn't in a prior packfile.  And no
write access required.
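
To make the idea concrete, here is a sketch of what such a server would
have to enumerate: the packfiles already on disk, plus the loose
objects. It only lists files under an objects directory; it does not
check which loose objects are already contained in a pack, and it is
not how upload-pack works. The function names are invented for the
example.

```shell
# Hypothetical helpers; $1 is a repository's objects directory.

# Print every existing packfile, one per line.
list_packs() {
	for p in "$1"/pack/*.pack; do
		[ -f "$p" ] && echo "$p"
	done
	return 0
}

# Print every loose object file (stored as objects/xx/<rest of the name>).
list_loose() {
	for f in "$1"/[0-9a-f][0-9a-f]/*; do
		[ -f "$f" ] && echo "$f"
	done
	return 0
}

# Example: objdir=.git/objects; list_packs "$objdir"; list_loose "$objdir"
```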

Of course, we still could do the optimization of caching the
packfile, but I'm not sure how well that would work on kernel.org,
as I understand the trees are owned by the devs which created them
while the git daemon is probably not running as their UNIX user.

-- 
Shawn.


* Re: Efficiency of initial clone from server
  2007-02-12  5:11                 ` Shawn O. Pearce
@ 2007-02-12  5:17                   ` Jon Smirl
  2007-02-12 15:20                     ` Nicolas Pitre
  0 siblings, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12  5:17 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List

On 2/12/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > But back to the original point, can't the server check and see if it
> > has write access so that it can keep the fully packed tree? I've just
> > caused kernel.org to needlessly repack the wireless-dev tree a dozen
> > times playing with this clone command. If it didn't have to keep
> > repacking for the clone, clone would be a lot faster.
>
> We probably could.
>
> I have actually been thinking about another problem that is
> somewhat related.  We cannot put more than 4 GiB of data into a
> single packfile, due to the current index size limitation, or more
> than 2^32-1 objects into one packfile, due to the header nr_objects
> field size.
>
> Right now we are sending a single packfile down to the client,
> even if the remote server end has the repository broken down into
> a couple of packfiles (such as "really old historical stuff" and
> "active stuff from this year").  If we could send more than one
> packfile to the client in a single stream, we could still keep the
> file size limitations.
>
> We can also avoid this huge repack case on the server.  Because it
> could just send all of the packfiles that it already has, followed
> by whatever is loose which wasn't in a prior packfile.  And no
> write access required.
>
> Of course, we still could do the optimization of caching the
> packfile, but I'm not sure how well that would work on kernel.org,
> as I understand the trees are owned by the devs which created them
> while the git daemon is probably not running as their UNIX user.

I didn't want to cache the packfile, instead I wanted to repack the
repository and then copy the resulting pack file down the wire. A
clone would just be a trigger to make sure everything in the repo was
packed (maybe into multiple packs) before starting to send anything.
Doing it this way means that everyone benefits from the packing.


>
> --
> Shawn.
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  5:01               ` Jon Smirl
  2007-02-12  5:11                 ` Shawn O. Pearce
@ 2007-02-12  5:30                 ` Junio C Hamano
  2007-02-12  5:55                   ` Jon Smirl
  2007-02-12 11:45                   ` Johannes Schindelin
  1 sibling, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2007-02-12  5:30 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Junio C Hamano, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

"Jon Smirl" <jonsmirl@gmail.com> writes:

> On 2/11/07, Jon Smirl <jonsmirl@gmail.com> wrote:
>> Same thing with cg clone, it's what is broken.
>> cg update is broken in the same way.
>> I'm using the current git version of cogito.
>> I'll switch to the git commands, git clone is about 10x faster for the
>> clone anyway.
>
> Don't read anything into the 10x speed diff, my last git clone was
> really slow. I'm probably fighting other people at kernel.org to keep
> the tree in RAM.
>
> But back to the original point, can't the server check and see if it
> has write access so that it can keep the fully packed tree? I've just
> caused kernel.org to needlessly repack the wireless-dev tree a dozen
> times playing with this clone command. If it didn't have to keep
> repacking for the clone, clone would be a lot faster.

You are assuming everybody does initial clone all the time.  I
do not think that holds true in practice.

For something like tglx historical tree that will _never_
change, there is a specific hack the repository owner can take
advantage of to always feed a prepackaged pack, although its use
is not advertised well enough (and I do not think it buys much
in practice).


* Re: Efficiency of initial clone from server
  2007-02-12  5:30                 ` Junio C Hamano
@ 2007-02-12  5:55                   ` Jon Smirl
  2007-02-12  6:08                     ` Junio C Hamano
  2007-02-12 11:45                   ` Johannes Schindelin
  1 sibling, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12  5:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/12/07, Junio C Hamano <junkio@cox.net> wrote:
> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
> > On 2/11/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> >> Same thing with cg clone, it's what is broken.
> >> cg update is broken in the same way.
> > >> I'm using the current git version of Cogito.
> >> I'll switch to the git commands, git clone is about 10x faster for the
> >> clone anyway.
> >
> > Don't read anything into the 10x speed diff, my last git clone was
> > really slow. I'm probably fighting other people at kernel.org to keep
> > the tree in RAM.
> >
> > > But back to the original point, can't the server check and see if it
> > has write access so that it can keep the fully packed tree? I've just
> > caused kernel.org to needlessly repack the wireless-dev tree a dozen
> > times playing with this clone command. If it didn't have to keep
> > repacking for the clone, clone would be a lot faster.
>
> You are assuming everybody does initial clone all the time.  I

Why not use an initial clone as a trigger for a repack? Given the
thousands of people playing with trees on kernel.org it must happen
quite a bit. Add a log message to the server and we can find out for
sure.

I am guilty of doing initial clones for different kernel trees from
kernel.org when I could be doing a local clone of linus' tree and then
pulling the deltas from kernel.org. But I'm lazy, I just kick the
clone off in the background and it finishes in three or four minutes.
I also do the clones when I have messed my local trees up so much that
I don't know what is in them anymore.

> do not think that holds true in practice.

git experts can avoid almost all of the clones, but most people don't
learn enough about git to avoid them.

For example, I've been using git for quite a while now and I still
haven't bothered to figure out how to do this: start with a local clone
of linus' tree, now I want to pull the wireless-dev tree into the same
local tree as another branch. And maybe pull the wireless-2.6 into yet
another branch. Then can I pull updates from all of my remote
repositories with a single command?
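
One way this could look, sketched with throwaway local repositories standing in for the kernel.org trees (the names `linus`, `wireless-dev` and `mytree` are placeholders, not anything from this thread): register each extra tree as a named remote, after which `git remote update` refreshes all of them in one command.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Two stand-in "upstream" repositories (placeholders for Linus's tree
# and for wireless-dev), each with a single empty commit.
for name in linus wireless-dev; do
    git init -q "$name"
    git -C "$name" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "initial commit in $name"
done

# Start from a clone of the first tree, then add the second as a remote.
git clone -q linus mytree
cd mytree
git remote add wireless-dev ../wireless-dev

# One command fetches from every configured remote.
git remote update >/dev/null 2>&1

# Each tree now has its own remote-tracking namespace.
remote_branches=$(git branch -r)
echo "$remote_branches"
```

A local branch can then be started from any of the remote-tracking branches listed at the end.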

>
> For something like tglx historical tree that will _never_
> change, there is a specific hack the repository owner can take
> advantage of to always feed a prepackaged pack, although its use
> is not advertised well enough (and I do not think it buys much
> in practice).
>
>
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  5:55                   ` Jon Smirl
@ 2007-02-12  6:08                     ` Junio C Hamano
  2007-02-12 15:24                       ` Jon Smirl
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2007-02-12  6:08 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Junio C Hamano, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

"Jon Smirl" <jonsmirl@gmail.com> writes:

> I am guilty of doing initial clones for different kernel trees from
> kernel.org when I could be doing a local clone of linus' tree and then
> pulling the deltas from kernel.org. But I'm lazy, I just kick the
> clone off in the background and it finishes in three or four minutes.
> I also do the clones when I have messed my local trees up so much that
> I don't know what is in them anymore.

Time to learn to use --reference perhaps?

	git clone --reference linux-2.6 git://.../linville/wireless-dev.git

where "linux-2.6" is local repository which is my personal copy
of Linus's repo.


* Re: Efficiency of initial clone from server
  2007-02-12  5:30                 ` Junio C Hamano
  2007-02-12  5:55                   ` Jon Smirl
@ 2007-02-12 11:45                   ` Johannes Schindelin
  2007-02-12 14:31                     ` Jon Smirl
  1 sibling, 1 reply; 32+ messages in thread
From: Johannes Schindelin @ 2007-02-12 11:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jon Smirl, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

Hi,

On Sun, 11 Feb 2007, Junio C Hamano wrote:

> You are assuming everybody does initial clone all the time.  I do not 
> think that holds true in practice.

It depends how you interpret "all the time". What you (Junio) are 
suggesting is that the count of initial clones is relatively small as 
compared to the total number of fetches.

However, you can interpret "all the time" in terms of "time". Most fetches 
are really small. They even often end up in no objects pulled at all. 
These are cheap for the server. The initial clones take a long time. They 
are expensive.

I'd be interested to learn how much of the CPU time is actually spent in 
initial clones, rather than other types of fetches. It might make sense 
yet to optimize initial clones.

Ciao,
Dscho


* Re: Efficiency of initial clone from server
  2007-02-12 11:45                   ` Johannes Schindelin
@ 2007-02-12 14:31                     ` Jon Smirl
  2007-02-12 17:06                       ` Shawn O. Pearce
  0 siblings, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12 14:31 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/12/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Sun, 11 Feb 2007, Junio C Hamano wrote:
>
> > You are assuming everybody does initial clone all the time.  I do not
> > think that holds true in practice.
>
> It depends how you interpret "all the time". What you (Junio) are
> suggesting is that the count of initial clones is relatively small as
> compared to the total number of fetches.
>
> However, you can interpret "all the time" in terms of "time". Most fetches
> are really small. They even often end up in no objects pulled at all.
> These are cheap for the server. The initial clones take a long time. They
> are expensive.
>
> I'd be interested to learn how much of the CPU time is actually spent in
> initial clones, rather than other types of fetches. It might make sense
> yet to optimize initial clones.

I don't think CPU is a problem at kernel.org, but disk IO definitely
is. The initial clones cause several minutes (sometimes 10 min or more
when kernel.org is loaded) worth of disk IO. They also totally
thrash the kernel.org cache. The alternative of using a clone to
trigger a repack would go through this once, and then use sendfile (is
gitd that smart?) to send the packs. Sendfile uses the smallest cache
required.

Why doesn't clone copy the existing packs down first with sendfile,
then build a small pack for what is left and avoid the initial step of
making a giant pack? Isn't clone going to break when the repo exceeds
2GB?


>
> Ciao,
> Dscho
>
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  5:17                   ` Jon Smirl
@ 2007-02-12 15:20                     ` Nicolas Pitre
  2007-02-12 19:35                       ` Theodore Tso
  0 siblings, 1 reply; 32+ messages in thread
From: Nicolas Pitre @ 2007-02-12 15:20 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Shawn O. Pearce, Junio C Hamano, Git Mailing List

On Mon, 12 Feb 2007, Jon Smirl wrote:

> I didn't want to cache the packfile, instead I wanted to repack the
> repository and then copy the resulting pack file down the wire. A
> clone would just be a trigger to make sure everything in the repo was
> packed (maybe into multiple packs) before starting to send anything.
> Doing it this way means that everyone benefits from the packing.

Repacking on clone is not the solution at all.

This problem is going to largely be resolved when GIT 1.5.0 gets 
installed on kernel.org.  With latest GIT, pushes are kept as packs on 
the remote end (when they're big enough which is over 100 objects by 
default).  Then repacking multiple packs into one is almost free as most 
of the data is simply copied from one pack and sent over the wire as a 
single pack.
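
To my knowledge the threshold mentioned above is exposed as the `receive.unpackLimit` setting of the receiving repository. A minimal sketch, using a throwaway bare repository (the path is a placeholder), that pins the value explicitly and reads it back:

```shell
set -e
repo=$(mktemp -d)/demo.git
git init -q --bare "$repo"

# Pushes carrying at least this many objects are kept as a pack
# instead of being exploded into loose objects.
git --git-dir="$repo" config receive.unpackLimit 100

unpack_limit=$(git --git-dir="$repo" config --get receive.unpackLimit)
echo "receive.unpackLimit = $unpack_limit"
```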

As for the cache problem on kernel.org, that would be largely resolved 
if all kernel related projects were repacked with reference to Linus' 
repository to avoid copying the same set of data all over the place.


Nicolas


* Re: Efficiency of initial clone from server
  2007-02-12  6:08                     ` Junio C Hamano
@ 2007-02-12 15:24                       ` Jon Smirl
  2007-02-12 16:40                         ` Jon Smirl
  0 siblings, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12 15:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/12/07, Junio C Hamano <junkio@cox.net> wrote:
> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
> > I am guilty of doing initial clones for different kernel trees from
> > kernel.org when I could be doing a local clone of linus' tree and then
> > pulling the deltas from kernel.org. But I'm lazy, I just kick the
> > clone off in the background and it finishes in three or four minutes.
> > I also do the clones when I have messed my local trees up so much that
> > I don't know what is in them anymore.
>
> Time to learn to use --reference perhaps?
>
>         git clone --reference linux-2.6 git://.../linville/wireless-dev.git
>
> where "linux-2.6" is local repository which is my personal copy
> of Linus's repo.

I knew you smart guys would have a command to do this. This is in the
category of a command that I use infrequently enough that I forget
about it.

Something like Cogito could be smart so that when you did a clone
command it could prompt you if you wanted a new repo or to share an
existing one.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12 15:24                       ` Jon Smirl
@ 2007-02-12 16:40                         ` Jon Smirl
  2007-02-12 17:04                           ` Shawn O. Pearce
  0 siblings, 1 reply; 32+ messages in thread
From: Jon Smirl @ 2007-02-12 16:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 2/12/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 2/12/07, Junio C Hamano <junkio@cox.net> wrote:
> > "Jon Smirl" <jonsmirl@gmail.com> writes:
> >
> > > I am guilty of doing initial clones for different kernel trees from
> > > kernel.org when I could be doing a local clone of linus' tree and then
> > > pulling the deltas from kernel.org. But I'm lazy, I just kick the
> > > clone off in the background and it finishes in three or four minutes.
> > > I also do the clones when I have messed my local trees up so much that
> > > I don't know what is in them anymore.
> >
> > Time to learn to use --reference perhaps?
> >
> >         git clone --reference linux-2.6 git://.../linville/wireless-dev.git
> >
> > where "linux-2.6" is local repository which is my personal copy
> > of Linus's repo.

Does this use hard links so that if I whack my linux-2.6 it won't also
destroy my wireless-dev repo?


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Efficiency of initial clone from server
  2007-02-12  4:49           ` Shawn O. Pearce
@ 2007-02-12 16:42             ` Nicolas Pitre
  0 siblings, 0 replies; 32+ messages in thread
From: Nicolas Pitre @ 2007-02-12 16:42 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Jon Smirl, Git Mailing List

On Sun, 11 Feb 2007, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@cam.org> wrote:
> > Try with git2.kernel.org then.  It seems that git.kernel.org too often 
> > resolves to git1.kernel.org not sharing the load with git2.kernel.org 
> > appropriately.
> 
> Hey!  You are giving out my secret!

Well... this issue (and work-around) is mentioned on the k.o front page 
at http://www.kernel.org/.

I didn't mean to spoil a secret... sorry.  ;-)


Nicolas


* Re: Efficiency of initial clone from server
  2007-02-12 16:40                         ` Jon Smirl
@ 2007-02-12 17:04                           ` Shawn O. Pearce
  0 siblings, 0 replies; 32+ messages in thread
From: Shawn O. Pearce @ 2007-02-12 17:04 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> On 2/12/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> >On 2/12/07, Junio C Hamano <junkio@cox.net> wrote:
> >> Time to learn to use --reference perhaps?
> >>
> >>         git clone --reference linux-2.6 git://.../linville/wireless-dev.git
> >>
> >> where "linux-2.6" is local repository which is my personal copy
> >> of Linus's repo.
> 
> Does this use hard links so that if I whack my linux-2.6 it won't also
> destroy my wireless-dev repo?

No.  It uses a Git specific 'symlink workalike'.  We record
the path to the reference repository in the plain text file

  .git/objects/info/alternates

This file can list multiple repositories that this repository borrows
objects from.  If you delete any of those, then this repository
will likely lose access to objects it thinks should be here,
thereby totally screwing this repository.

There's also some risk with repacking or pruning the reference
source (your linux-2.6 repository) as that repository doesn't know
what objects it has which it itself doesn't need, but which are in
use by the other repositories.
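
To see the mechanism in isolation, here is a small sketch that uses only throwaway local repositories (`upstream` is a stand-in for the remote wireless-dev tree; all paths are placeholders). It clones with --reference and prints the resulting alternates file:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Helper: record an empty commit in the given repository.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

git init -q linux-2.6
commit linux-2.6 "base history"

# "upstream" plays the role of the remote tree: same base plus one commit.
git clone -q linux-2.6 upstream
commit upstream "wireless work"

# Clone it while borrowing objects from the local linux-2.6.
git clone -q --reference "$work/linux-2.6" "file://$work/upstream" wireless-dev

alternates=$(cat wireless-dev/.git/objects/info/alternates)
echo "borrowing objects from: $alternates"
```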


The easiest way to prevent this from destroying a repository later
on is to make your shiny new wireless-dev clone standalone by
completely repacking it after creation (*without* -l !) :

  git repack -a -d
  rm .git/objects/info/alternates
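
The effect of those two commands can be checked with a sketch like the following, again with throwaway local repositories standing in for the real trees (all names are placeholders). After the repack and the removal of the alternates file, the reference repository is deleted and `git fsck` is used to confirm that nothing is missing:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Helper: record an empty commit in the given repository.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

git init -q linux-2.6
commit linux-2.6 "base history"
git clone -q linux-2.6 upstream
commit upstream "wireless work"
git clone -q --reference "$work/linux-2.6" "file://$work/upstream" wireless-dev

# Copy every reachable object into a local pack (no -l, so borrowed
# objects are included), then stop borrowing.
git -C wireless-dev repack -a -d -q
rm wireless-dev/.git/objects/info/alternates

# Simulate "whacking" the reference repository.
rm -rf linux-2.6

if git -C wireless-dev fsck >/dev/null 2>&1; then
    standalone=yes
else
    standalone=no
fi
echo "standalone: $standalone"
```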

Another solution is to not clone from the remote, but instead clone
locally, then update the origin and refetch:

  git clone -l -n linux-2.6 wireless-dev
  cd wireless-dev
  git config remote.origin.url git://.../linville/wireless-dev.git
  git fetch

The initial clone will set up hardlinks, but also uses the wrong
origin URL.  Hence we have to change it just before we attempt to
fetch from wireless-dev.
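
The same sequence can be tried end to end without touching the network. In this sketch `upstream` is a local stand-in for the remote wireless-dev tree and every path is a placeholder:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Helper: record an empty commit in the given repository.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

git init -q linux-2.6
commit linux-2.6 "base history"
git clone -q linux-2.6 upstream
commit upstream "wireless work"
branch=$(git -C upstream symbolic-ref --short HEAD)

# Local clone without checkout, then repoint origin and fetch.
git clone -q -l -n linux-2.6 wireless-dev
cd wireless-dev
git config remote.origin.url "$work/upstream"
git fetch -q

origin_url=$(git config --get remote.origin.url)
fetched=$(git rev-parse "origin/$branch")
upstream_head=$(git -C "$work/upstream" rev-parse HEAD)
echo "origin now points at $origin_url, tip $fetched"
```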

-- 
Shawn.


* Re: Efficiency of initial clone from server
  2007-02-12 14:31                     ` Jon Smirl
@ 2007-02-12 17:06                       ` Shawn O. Pearce
  0 siblings, 0 replies; 32+ messages in thread
From: Shawn O. Pearce @ 2007-02-12 17:06 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Johannes Schindelin, Junio C Hamano, Nicolas Pitre, Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> Why doesn't clone copy the existing packs down first with sendfile,
> then build a small pack for what is left and avoid the initial step of
> making a giant pack? Isn't clone going to break when the repo exceeds
> 2GB?

Because the network format allows an unlimited packfile size,
at least until you reach the 2^32-1 object count barrier anyway.
It also allows only one packfile to be sent.

The problem is the local system.  We are unable to build an index
file locally on a packfile that is larger than 4 GiB, as the offsets
to positions within the packfile are 32 bit unsigned big-endian
integers.  The index gets built on the client as it receives the
packfile from the server.
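
The 4 GiB figure follows directly from the width of the offset field. A quick arithmetic check (plain shell arithmetic, nothing Git-specific, assuming a shell with 64-bit integers):

```shell
# Largest value an unsigned 32-bit offset can hold, in bytes.
max_offset=$(( (1 << 32) - 1 ))

# The same boundary expressed in GiB (2^32 bytes / 2^30 bytes per GiB).
limit_gib=$(( (1 << 32) / (1024 * 1024 * 1024) ))

echo "max 32-bit offset: $max_offset bytes (just under $limit_gib GiB)"
```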

-- 
Shawn.


* Re: Efficiency of initial clone from server
  2007-02-12 15:20                     ` Nicolas Pitre
@ 2007-02-12 19:35                       ` Theodore Tso
  2007-02-12 20:53                         ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Theodore Tso @ 2007-02-12 19:35 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Jon Smirl, Shawn O. Pearce, Junio C Hamano, Git Mailing List

On Mon, Feb 12, 2007 at 10:20:31AM -0500, Nicolas Pitre wrote:
> Repacking on clone is not the solution at all.
> 
> This problem is going to largely be resolved when GIT 1.5.0 gets 
> installed on kernel.org.  With latest GIT, pushes are kept as packs on 
> the remote end (when they're big enough which is over 100 objects by 
> default).  Then repacking multiple packs into one is almost free as most 
> of the data is simply copied from one pack and sent over the wire as a 
> single pack.

Even before we get Git 1.5.0 installed on master.kernel.org (and we
should really ask hpa to do that), is there a reason we haven't done
something like this across all of the kernel repo's on
master.kernel.org?

for i in <list of kernel git repo's on master.kernel.org>
do
   pushd $i
   if [ ! -f objects/info/alternates ]; then
	echo /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects \
	   > objects/info/alternates
	git repack -a -d -l
   fi
   popd
done
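
A slightly more defensive variant of the same loop, as a sketch only: `share_with_reference` is a hypothetical helper name, the repositories are passed as arguments instead of the elided list, and the demo at the bottom uses throwaway bare repositories rather than anything on master.kernel.org.

```shell
set -e

# share_with_reference REF_OBJECTS REPO...
# For each bare repository without an alternates file, point it at
# REF_OBJECTS and repack so only objects missing there stay local.
share_with_reference() {
    ref_objects=$1
    shift
    for repo in "$@"; do
        # Never make the reference repository borrow from itself.
        if [ "$repo/objects" = "$ref_objects" ]; then
            continue
        fi
        # Leave repositories that already borrow objects alone.
        if [ -f "$repo/objects/info/alternates" ]; then
            continue
        fi
        mkdir -p "$repo/objects/info"
        echo "$ref_objects" > "$repo/objects/info/alternates"
        git --git-dir="$repo" repack -a -d -l -q
    done
}

# Demo with throwaway repositories (all paths are placeholders).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "base"
git clone -q --bare --no-hardlinks "$work/src" "$work/linux-2.6.git"
git clone -q --bare --no-hardlinks "$work/src" "$work/wireless-dev.git"

share_with_reference "$work/linux-2.6.git/objects" \
    "$work/linux-2.6.git" "$work/wireless-dev.git"

shared_alternates=$(cat "$work/wireless-dev.git/objects/info/alternates")
echo "wireless-dev.git now borrows from: $shared_alternates"
```

Using a subshell-free `--git-dir` instead of pushd/popd keeps the helper usable from plain sh.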


					- Ted


* Re: Efficiency of initial clone from server
  2007-02-12 19:35                       ` Theodore Tso
@ 2007-02-12 20:53                         ` Junio C Hamano
  2007-02-12 21:33                           ` Nicolas Pitre
  2007-02-13  0:51                           ` Jakub Narebski
  0 siblings, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2007-02-12 20:53 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Nicolas Pitre, Jon Smirl, Shawn O. Pearce, Git Mailing List

Theodore Tso <tytso@mit.edu> writes:

> Even before we get Git 1.5.0 installed on master.kernel.org (and we
> should really ask hpa to do that), is there a reason we haven't done
> something like this across all of the kernel repo's on
> master.kernel.org?
>
> for i in <list of kernel git repo's on master.kernel.org>
> do
>    pushd $i
>    if [ ! -f objects/info/alternates ]; then
> 	echo /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects \
> 	   > objects/info/alternates
> 	git repack -a -d -l
>    fi
>    popd
> done

Perhaps s/<list of kernel git repo's on master.kernel.org>/& minus a few/?

"Minus a few" are (obviously) Linus's repository and bkcvs
historical ones (I think there are two of them).

Other than that I cannot think of a major problem.  Repacking
into one would inconvenience http clients but that is not a new
issue and would have happened when the owner of the individual
repository chose to do so anyway.

Older clients do not understand the more efficient packfile
format (delta-base-offset encoding) that can be used with recent
git.  The feature is not turned on by default and is controlled
by configuration repack.usedeltabaseoffset.  Whoever does the
"git repack" above should make sure he does not enable it with
his $HOME/.gitconfig.
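
One way to make that independent of whoever runs the repack, sketched on a throwaway bare repository (the path is a placeholder): set the value in the repository's own config, which takes precedence over $HOME/.gitconfig, and read back the effective value before repacking.

```shell
set -e
repo=$(mktemp -d)/demo.git
git init -q --bare "$repo"

# Pin the setting in the repository itself so that a personal
# ~/.gitconfig cannot switch it on for this repository.
git --git-dir="$repo" config repack.usedeltabaseoffset false

# Effective value as "git repack" in this repository would see it.
offset_deltas=$(git --git-dir="$repo" config --bool --get repack.usedeltabaseoffset)
echo "repack.usedeltabaseoffset = $offset_deltas"
```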


* Re: Efficiency of initial clone from server
  2007-02-12 20:53                         ` Junio C Hamano
@ 2007-02-12 21:33                           ` Nicolas Pitre
  2007-02-13  0:51                           ` Jakub Narebski
  1 sibling, 0 replies; 32+ messages in thread
From: Nicolas Pitre @ 2007-02-12 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Jon Smirl, Shawn O. Pearce, Git Mailing List

On Mon, 12 Feb 2007, Junio C Hamano wrote:

> Older clients do not understand the more efficient packfile format 
> (delta-base-offset encoding) that can be used with recent git.  The 
> feature is not turned on by default and is controlled by configuration 
> repack.usedeltabaseoffset. Whoever does the "git repack" above should 
> make sure he does not enable it with his $HOME/.gitconfig.

Of course, if we don't care about old GIT clients using HTTP transport, 
then enabling this option won't affect GIT clients using the native 
protocol as the delta-base-offset encoding can be translated into the 
legacy encoding on the fly depending on the client's advertised 
capabilities.


Nicolas


* Re: Efficiency of initial clone from server
  2007-02-12 20:53                         ` Junio C Hamano
  2007-02-12 21:33                           ` Nicolas Pitre
@ 2007-02-13  0:51                           ` Jakub Narebski
  1 sibling, 0 replies; 32+ messages in thread
From: Jakub Narebski @ 2007-02-13  0:51 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> Theodore Tso <tytso@mit.edu> writes:
> 
>> Even before we get Git 1.5.0 installed on master.kernel.org (and we
>> should really ask hpa to do that), is there a reason we haven't done
>> something like this across all of the kernel repo's on
>> master.kernel.org?
>>
>> for i in <list of kernel git repo's on master.kernel.org>
>> do
>>    pushd $i
>>    if [ ! -f objects/info/alternates ]; then
>>      echo /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects \
>>         > objects/info/alternates
>>      git repack -a -d -l
>>    fi
>>    popd
>> done
> 
> Perhaps s/<list of kernel git repo's on master.kernel.org>/& minus a few/?
> 
> "Minus a few" are (obviously) Linus's repository and bkcvs
> historical ones (I think there are two of them).
> 
> Other than that I cannot think of a major problem.  Repacking
> into one would inconvenience http clients but that is not a new
> issue and would have happened when the owner of the individual
> repository chose to do so anyway.

So objects/info/http-alternates should be also set, I guess...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: Efficiency of initial clone from server
  2007-02-12  4:53             ` Jon Smirl
  2007-02-12  5:01               ` Jon Smirl
@ 2007-02-13 15:03               ` Andreas Ericsson
  1 sibling, 0 replies; 32+ messages in thread
From: Andreas Ericsson @ 2007-02-13 15:03 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Junio C Hamano, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

Jon Smirl wrote:
> On 2/11/07, Junio C Hamano <junkio@cox.net> wrote:
>> "Jon Smirl" <jonsmirl@gmail.com> writes:
>>
>> > jonsmirl@jonsmirl:/extra$ git clone
>> > git://git.kernel.org/pub/scm/linux/kernel/gt/linville/wireless-dev.git
>> > Initialized empty Git repository in /extra/wireless-dev/.git/
>> > fatal: The remote end hung up unexpectedly
>>
>> Are you sure the above is ".../linux/kernel/gt/linville/..."?
> 
> You're right it should be git instead of gt, somewhere in my copying
> strings around I lost the 'i' and wasn't paying attention.  That's not
> a very good error message: "fatal: The remote end hung up
> unexpectedly" for a missing repository.
> 

It's necessary for security reasons that the git daemon doesn't tell you
*why* it failed though, otherwise attackers could use the git daemon to
browse the existence of files and directories on the remote end.

It could be nice to add "Are you sure $path_to_repo hosts a repository?"
to the message though, which would toss any spelling errors in the users'
face.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231



Thread overview: 32+ messages
2007-02-11 19:53 Efficiency of initial clone from server Jon Smirl
2007-02-11 22:53 ` Shawn O. Pearce
2007-02-11 23:25   ` Jon Smirl
2007-02-11 23:51     ` Jon Smirl
2007-02-12  1:38     ` Nicolas Pitre
2007-02-12  2:15       ` Jon Smirl
2007-02-12  3:55         ` Nicolas Pitre
2007-02-12  4:49           ` Shawn O. Pearce
2007-02-12 16:42             ` Nicolas Pitre
2007-02-12  4:16       ` Junio C Hamano
2007-02-12  4:29         ` Jon Smirl
2007-02-12  4:33           ` Junio C Hamano
2007-02-12  4:53             ` Jon Smirl
2007-02-12  5:01               ` Jon Smirl
2007-02-12  5:11                 ` Shawn O. Pearce
2007-02-12  5:17                   ` Jon Smirl
2007-02-12 15:20                     ` Nicolas Pitre
2007-02-12 19:35                       ` Theodore Tso
2007-02-12 20:53                         ` Junio C Hamano
2007-02-12 21:33                           ` Nicolas Pitre
2007-02-13  0:51                           ` Jakub Narebski
2007-02-12  5:30                 ` Junio C Hamano
2007-02-12  5:55                   ` Jon Smirl
2007-02-12  6:08                     ` Junio C Hamano
2007-02-12 15:24                       ` Jon Smirl
2007-02-12 16:40                         ` Jon Smirl
2007-02-12 17:04                           ` Shawn O. Pearce
2007-02-12 11:45                   ` Johannes Schindelin
2007-02-12 14:31                     ` Jon Smirl
2007-02-12 17:06                       ` Shawn O. Pearce
2007-02-13 15:03               ` Andreas Ericsson
2007-02-11 23:29   ` Jon Smirl
