* git clone submodules recursive and reference
@ 2012-04-20 15:12 Samuel Maftoul
  2012-04-20 18:59 ` Jens Lehmann
  0 siblings, 1 reply; 7+ messages in thread
From: Samuel Maftoul @ 2012-04-20 15:12 UTC
  To: git

Hello,

I'm using git clone --reference, and it works like a charm!

Now I have submodules, so I call git clone with both --recursive and
--reference. That works only for the repo itself; the submodules are
cloned without the --reference option.

With GIT_TRACE=1 I can see this for the initial repo:

trace: built-in: git 'clone' '--recursive' '--reference' [...]

And this for the submodules:

trace: built-in: git 'clone' '-n' [...]

for every submodule.

Is this intended behavior?

How can I force the clones for submodules to be executed with the
--reference option?

I tried to wrap /usr/lib/git-core/git-clone, but as GIT_TRACE shows,
clone is a builtin, and indeed my wrapped version of git clone is not
executed!

Thanks

--

Samuel MAFTOUL


* Re: git clone submodules recursive and reference
  2012-04-20 15:12 git clone submodules recursive and reference Samuel Maftoul
@ 2012-04-20 18:59 ` Jens Lehmann
  2012-04-20 19:26   ` Samuel Maftoul
  2012-06-29 20:19   ` Phil Hord
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Lehmann @ 2012-04-20 18:59 UTC
  To: Samuel Maftoul; +Cc: git

On 20.04.2012 17:12, Samuel Maftoul wrote:
> Hello,
> 
> I'm using git clone --reference, and it works like a charm!
> 
> Now I have submodules, so I call git clone with both --recursive and
> --reference. That works only for the repo itself; the submodules are
> cloned without the --reference option.
> 
> With GIT_TRACE=1 I can see this for the initial repo:
> 
> trace: built-in: git 'clone' '--recursive' '--reference' [...]
> 
> And this for the submodules:
> 
> trace: built-in: git 'clone' '-n' [...]
> 
> for every submodule.
> 
> Is this intended behavior?

Hmm, to me it looks like passing the --reference option to the clone
run in the submodules doesn't make much sense, as that would make
all submodules and the superproject use the same alternates. And as
far as I know sharing objects between different repositories is not
supported.

> How can I force the clones for submodules to be executed with the
> --reference option ?

You'd have to use "git clone" without the --recursive option and
then do a "git submodule update --init --reference ...".
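As a rough illustration of that two-step approach, here is a self-contained sketch. It was not posted in the thread: it builds throwaway repositories under a temporary directory, so every path, the g wrapper, and the cache.git name are assumptions made for the demo. The protocol.file.allow setting is there only because the demo uses local, file-based submodule URLs, which recent git versions refuse by default.

```shell
set -eu
work=$(mktemp -d)

# Demo-only wrapper (assumption): allows file-based submodule URLs and
# supplies an identity for the test commits.
g() {
    git -c protocol.file.allow=always \
        -c user.name=demo -c user.email=demo@example.invalid "$@"
}

# Throwaway submodule and superproject standing in for the real remotes.
g init -q "$work/sub"
g -C "$work/sub" commit -q --allow-empty -m "sub: initial"
g init -q "$work/super"
g -C "$work/super" submodule --quiet add "$work/sub" sub
g -C "$work/super" commit -q -m "super: add submodule"

# Reference repository that already holds the objects of both projects.
cache="$work/cache.git"
g init -q --bare "$cache"
g -C "$cache" fetch -q --no-recurse-submodules "$work/super" \
    '+refs/heads/*:refs/remotes/super/*'
g -C "$cache" fetch -q "$work/sub" '+refs/heads/*:refs/remotes/sub/*'

# Step 1: clone the superproject only (no --recursive).
g clone -q --reference "$cache" "file://$work/super" "$work/clone"

# Step 2: initialise the submodules, handing them the same reference.
g -C "$work/clone" submodule --quiet update --init --reference "$cache"

# Both object stores should now list the cache as an alternate.
cat "$work/clone/.git/objects/info/alternates"
cat "$work/clone/.git/modules/sub/objects/info/alternates"
```

With real remotes only the two numbered steps are needed; everything above them is scaffolding for the demo.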


* Re: git clone submodules recursive and reference
  2012-04-20 18:59 ` Jens Lehmann
@ 2012-04-20 19:26   ` Samuel Maftoul
  2012-04-21 13:45     ` Jens Lehmann
  2012-06-29 20:19   ` Phil Hord
  1 sibling, 1 reply; 7+ messages in thread
From: Samuel Maftoul @ 2012-04-20 19:26 UTC
  To: Jens Lehmann; +Cc: git

> Hmm, to me it looks like passing the --reference option to the clone
> run in the submodules doesn't make much sense, as that would make
> all submodules and the superproject use the same alternates. And as
> far as I know sharing objects between different repositories is not
> supported.

I'm sharing objects between repositories by creating a bare
repository, adding the remotes for the repositories, and fetching them
into this bare repo.
So for me it makes sense to pass --reference to the submodule clones,
if the submodules' remotes have been added to this bare reference repo
and their objects are already fetched (which is my case, as I use a lot
of different projects that share the same set of submodules).
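A minimal sketch of such a shared bare repository, under stated assumptions: the two upstreams are throwaway repositories created on the spot, and the names projA and libB, the cache.git path, and the g wrapper are invented for the demo rather than taken from the thread.

```shell
set -eu
work=$(mktemp -d)

# Demo-only wrapper (assumption): supplies an identity for the test commits.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Two throwaway "upstream" repositories standing in for real project remotes.
for name in projA libB; do
    g init -q "$work/upstream/$name"
    g -C "$work/upstream/$name" commit -q --allow-empty -m "initial $name"
done

# One bare repository collects the objects of every upstream,
# with one remote per project.
cache="$work/cache.git"
g init -q --bare "$cache"
for name in projA libB; do
    g -C "$cache" remote add "$name" "file://$work/upstream/$name"
done
g -C "$cache" fetch -q --all

# Any clone can then borrow objects from the cache.
g clone -q --reference "$cache" "file://$work/upstream/projA" "$work/clone"

# The clone records the cache's object directory as an alternate.
cat "$work/clone/.git/objects/info/alternates"
```

Because each project gets its own remote namespace in the cache, branch names from different projects should not collide.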

>
>> How can I force the clones for submodules to be executed with the
>> --reference option?
>
> You'd have to use "git clone" without the --recursive option and
> then do a "git submodule update --init --reference ...".

Yes, this should do it, but I would have been happier with a
single command!

Thanks, Jens!


* Re: git clone submodules recursive and reference
  2012-04-20 19:26   ` Samuel Maftoul
@ 2012-04-21 13:45     ` Jens Lehmann
  2012-04-23  8:06       ` Samuel Maftoul
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Lehmann @ 2012-04-21 13:45 UTC
  To: Samuel Maftoul; +Cc: git

On 20.04.2012 21:26, Samuel Maftoul wrote:
>> Hmm, to me it looks like passing the --reference option to the clone
>> run in the submodules doesn't make much sense, as that would make
>> all submodules and the superproject use the same alternates. And as
>> far as I know sharing objects between different repositories is not
>> supported.

I take that back. I was thinking about the idea, discussed some time
ago, of storing the objects of all submodules in the superproject's
object store and then accessing them via alternates. That won't work
out of the box because the submodule commits would be dangling in the
superproject's repo.

> I'm sharing objects between repositories by creating a bare
> repository, adding the remotes for the repositories, and fetching them
> into this bare repo.

This sounds like a cool way to reduce the disk footprint of the
repos on our Jenkins server.

> So for me it makes sense to pass --reference to the submodule clones,
> if the submodules' remotes have been added to this bare reference repo
> and their objects are already fetched (which is my case, as I use a lot
> of different projects that share the same set of submodules).

How do you fetch then, do you fetch into the referenced repo first
and then do a fetch in the clones afterwards to just update the refs
there? Or is the bare repo just a starting point for the initial
clone?

>>> How can I force the clones for submodules to be executed with the
>>> --reference option?
>>
>> You'd have to use "git clone" without the --recursive option and
>> then do a "git submodule update --init --reference ...".
> 
> Yes, this should do it, but I would have been happier with a
> single command!

Hmm, me thinks we'd have to add a new option for that, and I'm not
sure it is worth it.


* Re: git clone submodules recursive and reference
  2012-04-21 13:45     ` Jens Lehmann
@ 2012-04-23  8:06       ` Samuel Maftoul
  2012-04-23 21:20         ` Jens Lehmann
  0 siblings, 1 reply; 7+ messages in thread
From: Samuel Maftoul @ 2012-04-23  8:06 UTC
  To: Jens Lehmann; +Cc: git

>> I'm sharing objects between repositories by creating a bare
>> repository, adding the remotes for the repositories, and fetching them
>> into this bare repo.
>
> This sounds like a cool way to reduce the disk footprint of the
> repos on our Jenkins server.

I'm not using --reference to reduce disk footprint, but rather to
cache git repos and reduce the impact of slow networks!
Why would it reduce the disk footprint?

>
>> So for me it makes sense to pass --reference to the submodule clones,
>> if the submodules' remotes have been added to this bare reference repo
>> and their objects are already fetched (which is my case, as I use a lot
>> of different projects that share the same set of submodules).
>
> How do you fetch then, do you fetch into the referenced repo first
> and then do a fetch in the clones afterwards to just update the refs
> there? Or is the bare repo just a starting point for the initial
> clone?

You need to fetch first in the bare repo, then in your clones. When
you use --reference, the reference repo is left untouched; it's your
job to update it. (It would be nice to have an option that updates the
reference at the same time as the clone, so there is no need to connect
twice to the remote repository.)
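The update order described above can be sketched as follows. This is an illustration under assumptions, not a script from the thread: the upstream is a throwaway local repository standing in for a slow remote, and the cache.git path and the g wrapper are invented for the demo.

```shell
set -eu
work=$(mktemp -d)

# Demo-only wrapper (assumption): supplies an identity for the test commits.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Throwaway upstream, a mirror used as the reference, and a clone borrowing from it.
g init -q "$work/upstream"
g -C "$work/upstream" commit -q --allow-empty -m "first"
g clone -q --mirror "$work/upstream" "$work/cache.git"
g clone -q --reference "$work/cache.git" "file://$work/upstream" "$work/clone"

# Upstream moves on.
g -C "$work/upstream" commit -q --allow-empty -m "second"

# 1. Refresh the reference repository first; with a real remote this is
#    the step that transfers objects over the network.
g -C "$work/cache.git" fetch -q origin

# 2. Then fetch in the clone; the new objects are already reachable
#    through the alternate, so mainly the refs need updating.
g -C "$work/clone" fetch -q origin

g -C "$work/clone" log --oneline -n 2 refs/remotes/origin/HEAD
```

Note that the clone still contacts the remote in step 2, which is the double connection mentioned above.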

> Hmm, me thinks we'd have to add a new option for that, and I'm not
> sure it is worth it.

Maybe it's not worth it ...


* Re: git clone submodules recursive and reference
  2012-04-23  8:06       ` Samuel Maftoul
@ 2012-04-23 21:20         ` Jens Lehmann
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Lehmann @ 2012-04-23 21:20 UTC
  To: Samuel Maftoul; +Cc: git

On 23.04.2012 10:06, Samuel Maftoul wrote:
>>> I'm sharing objects between repositories by creating a bare
>>> repository, adding the remotes for the repositories, and fetching them
>>> into this bare repo.
>>
>> This sounds like a cool way to reduce the disk footprint of the
>> repos on our Jenkins server.
> 
> I'm not using --reference to reduce disk footprint, but rather to
> cache git repos and reduce the impact of slow networks!
> Why would it reduce the disk footprint?

Because the object store of the referenced repo is reused, the cloned
.git directory takes up less space (I think that is why the man page
talks about "reducing network and local storage costs").

>>> So for me it makes sense to pass --reference to the submodule clones,
>>> if the submodules' remotes have been added to this bare reference repo
>>> and their objects are already fetched (which is my case, as I use a lot
>>> of different projects that share the same set of submodules).
>>
>> How do you fetch then, do you fetch into the referenced repo first
>> and then do a fetch in the clones afterwards to just update the refs
>> there? Or is the bare repo just a starting point for the initial
>> clone?
> 
> You need to fetch first in the bare repo, then in your clones. When
> you use --reference, the reference repo is left untouched; it's your
> job to update it. (It would be nice to have an option that updates the
> reference at the same time as the clone, so there is no need to connect
> twice to the remote repository.)

I think you could also just fetch in the cloned repos, but then you'll
have to download the objects over the network for each clone and also
won't share the new objects. So I think your approach makes a lot of
sense; now I'll just have to tune our Jenkins scripts a bit. ;-)


* Re: git clone submodules recursive and reference
  2012-04-20 18:59 ` Jens Lehmann
  2012-04-20 19:26   ` Samuel Maftoul
@ 2012-06-29 20:19   ` Phil Hord
  1 sibling, 0 replies; 7+ messages in thread
From: Phil Hord @ 2012-06-29 20:19 UTC
  To: Jens Lehmann; +Cc: Samuel Maftoul, git

On Fri, Apr 20, 2012 at 2:59 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> On 20.04.2012 17:12, Samuel Maftoul wrote:
>> Hello,
>>
>> I'm using git clone --reference, and it works like a charm!
>>
>> Now I have submodules, so I call git clone with both --recursive and
>> --reference. That works only for the repo itself; the submodules are
>> cloned without the --reference option.
>>
>> With GIT_TRACE=1 I can see this for the initial repo:
>>
>> trace: built-in: git 'clone' '--recursive' '--reference' [...]
>>
>> And this for the submodules:
>>
>> trace: built-in: git 'clone' '-n' [...]
>>
>> for every submodule.
>>
>> Is this intended behavior?
>
> Hmm, to me it looks like passing the --reference option to the clone
> run in the submodules doesn't make much sense, as that would make
> all submodules and the superproject use the same alternates. And as
> far as I know sharing objects between different repositories is not
> supported.

Suppose we had an option for submodules which would use the _relative_
submodule URL to augment the --reference path.  Would that be tenable,
or do we need some extra option?

On my Jenkins server, I have a local mirror of my repos on my Gerrit
server.  I intentionally set the mirror paths up to match the layout
of the Gerrit forest of repositories.  Thus, my relative URLs work for
local clones from this mirror as well.

Currently I am doing something like this:

myrepo=/tmp/test-mirror1
mirror=/var/lib/jenkins/mirror/superproject.git
remote=gerrit:superproject.git

#-- Clone fresh from local mirrors
git clone --recursive "${mirror}" "${myrepo}"
cd "${myrepo}"

#-- Switch to the remote server URL
git config remote.origin.url "${remote}"
git submodule sync

#-- Check out remote updates
git pull --ff-only --recurse-submodules origin
git submodule update


In my tests, this is about twice as fast[*1*] as doing it the normal way:

git clone --recursive ${remote} ${myrepo}


But I would like to just do it like this:

git clone --reference=${mirror} --recursive ${remote}  ${myrepo}

It would be silly for all the submodules to use that reference as-is,
except in the weird case where you've pulled the remote objects from
several repositories into one bare container.  I would argue we don't
need to support the weird case at all, or only with some --weirdo
switch added on.  Except there is already precedent for doing this
"the weird way":

git submodule update --recursive --reference=/var/mirror/foo

There's nothing there that stops the absolute reference from being
used for each submodule, and that's glaringly incongruent with my
proposed addition to git-clone.  So I can think of two ways to move
forward with the relative-reference idea.

1. Use another switch to turn this behavior on
    git clone --reference=${mirror} \
              --submodule-relative-references \
              --recursive ${remote} ${myrepo}

2. Use a different switch name for 'reference':
    git clone --reference-forest=${mirror} \
              --recursive ${remote} ${myrepo}

Does someone have an opinion to guide me?
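Neither switch above exists; both are proposals. For comparison, the relative-reference idea can be approximated with existing commands by mapping each relative submodule URL onto the mirror layout and passing the result to "git submodule update --reference". The sketch below is only an illustration: the function names resolve_ref and update_with_refs and the libfoo.git example are hypothetical, and it assumes the mirror tree matches the remote layout as described above.

```shell
# resolve_ref MIRROR_SUPER RELATIVE_URL
# Print the mirror path that corresponds to a relative submodule URL,
# resolving leading "../" and "./" components against MIRROR_SUPER.
resolve_ref() {
    _base=$1
    _rel=$2
    while :; do
        case $_rel in
            ../*) _base=${_base%/*}; _rel=${_rel#../} ;;
            ./*)  _rel=${_rel#./} ;;
            *)    break ;;
        esac
    done
    printf '%s/%s\n' "$_base" "$_rel"
}

# update_with_refs MIRROR_SUPER
# Run from the top level of a superproject cloned without --recursive.
# Submodules with relative URLs get a per-submodule reference; others
# are updated normally. A missing reference repository makes the
# underlying clone fail, so the mirror tree has to be complete.
update_with_refs() {
    _mirror=$1
    git config -f .gitmodules --get-regexp '^submodule\..*\.url$' |
    while read -r _key _url; do
        _name=${_key#submodule.}
        _name=${_name%.url}
        _path=$(git config -f .gitmodules "submodule.$_name.path")
        case $_url in
            ./*|../*)
                git submodule update --init \
                    --reference "$(resolve_ref "$_mirror" "$_url")" \
                    -- "$_path" </dev/null ;;
            *)
                git submodule update --init -- "$_path" </dev/null ;;
        esac
    done
}

# Example mapping, using the mirror path from the script above and a
# hypothetical submodule URL:
resolve_ref /var/lib/jenkins/mirror/superproject.git ../libfoo.git
# prints /var/lib/jenkins/mirror/libfoo.git
```

A typical call would be "git clone --reference=${mirror} ${remote} ${myrepo}" followed by update_with_refs "${mirror}" inside the new clone.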

>> How can I force the clones for submodules to be executed with the
>> --reference option?
>
> You'd have to use "git clone" without the --recursive option and
> then do a "git submodule update --init --reference ...".

Or the way I did it, but my method has its flaws.

Phil

[*1*] While it is twice as fast, it is subject to some fatal errors if
my mirrored repos are out of sync. That is, if a mirrored submodule
does not include the commit which is gitlinked from the
super-project's HEAD, then the original recursive clone will fail;
then I lose my speedup and possibly error out as well.  --reference
does not suffer from this same malady.
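One way to detect the out-of-sync case before cloning is to ask the mirror whether it already holds a given commit. The sketch below is an assumption-laden illustration, not part of the original workflow: the has_commit helper, the g wrapper, and the throwaway repositories are invented for the demo. In a real superproject, the gitlinked commit for a submodule at path "sub" can be read with "git rev-parse HEAD:sub".

```shell
set -eu
work=$(mktemp -d)

# Demo-only wrapper (assumption): supplies an identity for the test commits.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# has_commit REPO SHA: succeed if REPO already contains commit SHA.
has_commit() { git -C "$1" cat-file -e "$2^{commit}" 2>/dev/null; }

# Throwaway submodule repository and a mirror that then falls behind.
g init -q "$work/sub"
g -C "$work/sub" commit -q --allow-empty -m "first"
g clone -q --mirror "$work/sub" "$work/mirror.git"
old=$(g -C "$work/sub" rev-parse HEAD)
g -C "$work/sub" commit -q --allow-empty -m "second"
new=$(g -C "$work/sub" rev-parse HEAD)

# The mirror knows the old commit but not the new one.
for sha in "$old" "$new"; do
    if has_commit "$work/mirror.git" "$sha"; then
        echo "$sha: present in mirror"
    else
        echo "$sha: missing, refresh the mirror first"
    fi
done
```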

