* Bug? git submodule update --reference doesn't use the referenced repository
@ 2017-04-03  3:13 Maxime Viargues
  2017-04-03 16:32 ` Stefan Beller
  0 siblings, 1 reply; 3+ messages in thread
From: Maxime Viargues @ 2017-04-03  3:13 UTC
  To: git

Hi there,

I have been trying to use the --reference option to clone a big 
repository using a local copy, but I can't manage to make it work using 
sub-module update. I believe this is a bug, unless I missed something.
I am on Windows, Git 2.12.0

So the problem is as follows:
- I have got a repository with multiple sub-modules, say
     main
         lib1
             sub-module1.git
         lib2
             sub-module2.git
- The original repositories are on GitHub, which makes cloning slow
- I have done a normal git clone of the entire repository (not bare) and 
put it on a file server, say \\fileserver\ref-repo\
(Note that the problem also happens with a local copy)

So if I do a clone to get the repo and all the submodules with...
git clone --reference-if-able \\fileserver\ref-repo --recursive 
git@github.com:company/main
...then it all works, all the sub-modules get cloned and it's fast.

Now in my case I am working with Jenkins jobs and I need to first do a 
clone, and then get the sub-modules, but if I do...
git clone --reference-if-able \\fileserver\ref-repo 
git@github.com:company/main (so non-recursive)
cd main
git submodule update --init --reference \\fileserver\ref-repo
... then this takes ages, as it would normally do without the use of 
--reference. I suspect it's not actually using it.
The git clone documentation mentions that the reference is then passed 
to the sub-module clone commands, so I would expect "git clone 
--recursive" to work the same as "git submodule update", as far as 
--reference is concerned.

I noticed for a single module, doing a...
git submodule update --init --reference 
\\fileserver\ref-repo\lib1\sub-module1 -- lib1/sub-module1
...i.e. adding the sub-module path to the reference path, works. That 
kind of makes sense, but then how do you apply it to all the 
sub-modules (without writing a script to do that)?

If someone can confirm the problem or explain to me what I am doing 
wrong, that would be great.

Maxime


* Re: Bug? git submodule update --reference doesn't use the referenced repository
  2017-04-03  3:13 Bug? git submodule update --reference doesn't use the referenced repository Maxime Viargues
@ 2017-04-03 16:32 ` Stefan Beller
  2017-04-04  2:15   ` Maxime Viargues
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Beller @ 2017-04-03 16:32 UTC
  To: Maxime Viargues; +Cc: git

On Sun, Apr 2, 2017 at 8:13 PM, Maxime Viargues
<maxime.viargues@serato.com> wrote:
> Hi there,
>
> I have been trying to use the --reference option to clone a big repository
> using a local copy, but I can't manage to make it work using sub-module
> update. I believe this is a bug, unless I missed something.
> I am on Windows, Git 2.12.0

which is new enough that the new --reference code is in. :)

>
> So the problem is as follows:
> - I have got a repository with multiple sub-modules, say
>     main
>         lib1
>             sub-module1.git
>         lib2
>             sub-module2.git
> - The original repositories are on GitHub, which makes cloning slow
> - I have done a normal git clone of the entire repository (not bare) and put
> it on a file server, say \\fileserver\ref-repo\
> (Note that the problem also happens with a local copy)
>
> So if I do a clone to get the repo and all the submodules with...
> git clone --reference-if-able \\fileserver\ref-repo --recursive
> git@github.com:company/main
> ...then it all works, all the sub-modules get cloned and it's fast.

great. :)

> Now in my case I am working with Jenkins jobs and I need to first do a
> clone, and then get the sub-modules, but if I do...
> git clone --reference-if-able \\fileserver\ref-repo
> git@github.com:company/main (so non-recursive)
> cd main
> git submodule update --init --reference \\fileserver\ref-repo
> ... then this takes ages, as it would normally do without the use of
> --reference. I suspect it's not actually using it.

So to confirm your suspicion, can you run

  GIT_TRACE=1 git clone ...
  cd main && GIT_TRACE=1 git submodule update ...

to see which child processes are spawned to deal with the submodules?
Also to confirm, it is the "submodule update" that is taking so long for you?
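
A full trace is long, so it can help to keep only the lines that show a
clone being spawned. The following is a minimal sketch, assuming the trace
line format that Git 2.12 prints (quoted arguments after "trace:
run_command:" or "trace: built-in:"); other Git versions may format these
lines differently. The function name clone_trace_lines is made up for this
illustration.

```shell
#!/bin/sh
# Sketch only: keep the GIT_TRACE lines that mention a 'clone' command,
# which covers both "submodule--helper clone" and the real "git clone".
# clone_trace_lines is a hypothetical helper name, not a git command.
clone_trace_lines() {
    grep -E "trace: (run_command|built-in):.*'clone'"
}

# Usage (GIT_TRACE=1 writes the trace to stderr, hence the 2>&1):
#   GIT_TRACE=1 git submodule update --init 2>&1 | clone_trace_lines
```

The arguments after '--reference' on the remaining lines show which
reference path each submodule clone actually received.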

> The git clone documentation mentions that the reference is then passed to
> the sub-module clone commands, so I would expect "git clone --recursive" to
> work the same as "git submodule update", as far as --reference is concerned.

Oh, there we have an opportunity to improve the man page (or the code).

    git clone --reference --recursive ...

will set the config variables

    git config submodule.alternateLocation superproject
    git config submodule.alternateErrorStrategy die
    (or "info" for --reference-if-able)

and the clone for the submodules (which is an independent process, just
run after the clone of the superproject is done) will pick up these
config variables and act accordingly.

If you only run

    git clone --reference ...

then these variables are not set. Probably they should be set, so that
the later invocation of "git submodule update --init" behaves the same
as the git-clone of the superproject did.

So as a workaround to get you up to speed again, you can just set these
config variables yourself before running the "submodule update --init",
and it should work.
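
The workaround described above can be sketched as follows. This is an
untested illustration, assuming the superproject was already cloned with
--reference or --reference-if-able; the function name
set_submodule_alternates and the directory name "main" are placeholders.

```shell
#!/bin/sh
# Sketch only: set the two config variables that
# "git clone --reference --recursive" would have set.
# set_submodule_alternates is a hypothetical helper name, not a git command.
#   $1: superproject work tree
#   $2: error strategy, "die" for --reference or "info" for
#       --reference-if-able (defaults to "info" here)
set_submodule_alternates() {
    git -C "$1" config submodule.alternateLocation superproject &&
    git -C "$1" config submodule.alternateErrorStrategy "${2:-info}"
}

# Usage:
#   set_submodule_alternates main info
#   git -C main submodule update --init
```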

>
> I noticed for a single module, doing a...
> git submodule update --init --reference
> \\fileserver\ref-repo\lib1\sub-module1 -- lib1/sub-module1
> ...i.e. adding the sub-module path to the reference path, works. That kind
> of makes sense, but then how do you apply it to all the sub-modules
> (without writing a script to do that)?

I think that functionality is broken, as it takes the same reference
for all submodules, so you need to go through the submodules one by one
and give each its submodule-specific reference location.
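
The one-by-one approach described above can be sketched as a loop. This is
a minimal illustration, assuming it runs from the top of the superproject,
that the reference checkout mirrors the superproject layout (so the
reference for lib1/sub-module1 is at <reference>/lib1/sub-module1), and
that submodule names contain no spaces. The function names are made up for
this example.

```shell
#!/bin/sh
# Sketch only; submodule_paths and update_each_submodule are hypothetical
# helper names, not git commands.

# Print every submodule path recorded in .gitmodules, one per line.
# Assumes submodule names contain no spaces.
submodule_paths() {
    git config --file "${1:-.gitmodules}" --get-regexp '^submodule\..*\.path$' |
    cut -d' ' -f2-
}

# Run "git submodule update" once per submodule, each with its own
# reference directory below the reference root given as $1.
update_each_submodule() {
    ref_root=$1
    submodule_paths | while read -r path; do
        # </dev/null keeps git from consuming the path list on stdin.
        git submodule update --init --reference "$ref_root/$path" -- "$path" </dev/null
    done
}

# Usage (share name is the placeholder used in this thread):
#   update_each_submodule //fileserver/ref-repo
```

The submodules are processed one after another, so this does not speed up
the individual clones beyond what each reference provides.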

>
> If someone can confirm the problem or explain to me what I am doing wrong,
> that would be great.
>
> Maxime

Stefan


* Re: Bug? git submodule update --reference doesn't use the referenced repository
  2017-04-03 16:32 ` Stefan Beller
@ 2017-04-04  2:15   ` Maxime Viargues
  0 siblings, 0 replies; 3+ messages in thread
From: Maxime Viargues @ 2017-04-04  2:15 UTC
  To: Stefan Beller; +Cc: git

On 04-Apr-17 4:32 AM, Stefan Beller wrote:
> On Sun, Apr 2, 2017 at 8:13 PM, Maxime Viargues
> <maxime.viargues@serato.com> wrote:
>> Hi there,
>>
>> I have been trying to use the --reference option to clone a big repository
>> using a local copy, but I can't manage to make it work using sub-module
>> update. I believe this is a bug, unless I missed something.
>> I am on Windows, Git 2.12.0
> which is new enough that the new --reference code is in. :)
>
>> So the problem is as follows:
>> - I have got a repository with multiple sub-modules, say
>>      main
>>          lib1
>>              sub-module1.git
>>          lib2
>>              sub-module2.git
>> - The original repositories are on GitHub, which makes cloning slow
>> - I have done a normal git clone of the entire repository (not bare) and put
>> it on a file server, say \\fileserver\ref-repo\
>> (Note that the problem also happens with a local copy)
>>
>> So if I do a clone to get the repo and all the submodules with...
>> git clone --reference-if-able \\fileserver\ref-repo --recursive
>> git@github.com:company/main
>> ...then it all works, all the sub-modules get cloned and it's fast.
> great. :)
>
>> Now in my case I am working with Jenkins jobs and I need to first do a
>> clone, and then get the sub-modules, but if I do...
>> git clone --reference-if-able \\fileserver\ref-repo
>> git@github.com:company/main (so non-recursive)
>> cd main
>> git submodule update --init --reference \\fileserver\ref-repo
>> ... then this takes ages, as it would normally do without the use of
>> --reference. I suspect it's not actually using it.
> So to confirm your suspicion, can you run
>
>    GIT_TRACE=1 git clone ...
>    cd main && GIT_TRACE=1 git submodule update ...
>
> to see which child processes are spawned to deal with the submodules?
> Also to confirm, it is the "submodule update" that is taking so long for you?
Yes, I confirm it's the "submodule update" that is taking a long time. 
The clone with the reference is definitely working.

Running git submodule update with "GIT_TRACE=1", here is a snippet of 
what I get:

10:14:44.924684 git.c:596               trace: exec: 'git-submodule' 'update' '--init' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:14:44.925684 run-command.c:369       trace: run_command: 'git-submodule' 'update' '--init' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:14:45.146488 git.c:596               trace: exec: 'git-sh-i18n--envsubst' '--variables' 'usage: $dashless $USAGE'
10:14:45.146488 run-command.c:369       trace: run_command: 'git-sh-i18n--envsubst' '--variables' 'usage: $dashless $USAGE'
10:14:45.231548 git.c:596               trace: exec: 'git-sh-i18n--envsubst' 'usage: $dashless $USAGE'
10:14:45.231548 run-command.c:369       trace: run_command: 'git-sh-i18n--envsubst' 'usage: $dashless $USAGE'
10:14:45.357059 git.c:371               trace: built-in: git 'rev-parse' '--git-dir'
10:14:45.427806 git.c:371               trace: built-in: git 'rev-parse' '--git-path' 'objects'
10:14:45.487348 git.c:371               trace: built-in: git 'rev-parse' '-q' '--git-dir'
10:14:45.593794 git.c:371               trace: built-in: git 'rev-parse' '--show-prefix'
10:14:45.643162 git.c:371               trace: built-in: git 'rev-parse' '--show-toplevel'
10:14:45.700201 git.c:371               trace: built-in: git 'submodule--helper' 'init'
10:14:45.986024 git.c:371               trace: built-in: git 'submodule--helper' 'update-clone' '--reference=\\fileserver\Builds\reference_repos\main-repo'
10:14:45.988024 run-command.c:1155      run_processes_parallel: preparing to run up to 1 tasks
10:14:45.988024 run-command.c:369       trace: run_command: 'submodule--helper' 'clone' '--path' 'lib1/lib1_source' '--name' 'lib1/lib1_source' '--url' 'git@github.com:company/main-repo-lib1.git' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:15:06.204872 run-command.c:369       trace: run_command: 'submodule--helper' 'clone' '--path' 'lib2/lib2_source' '--name' 'lib2/lib2_source' '--url' 'git@github.com:company/main-repo-lib2.git' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:14:46.025555 git.c:371               trace: built-in: git 'submodule--helper' 'clone' '--path' 'lib1/lib1_source' '--name' 'lib1/lib1_source' '--url' 'git@github.com:company/main-repo-lib1.git' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:14:46.027555 run-command.c:369       trace: run_command: 'clone' '--no-checkout' '--reference' '\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib1/lib1_source' 'git@github.com:company/main-repo-lib1.git' 'D:/tmp2/git_clone_tests/main-repo/lib1/lib1_source'
10:14:46.061305 git.c:371               trace: built-in: git 'clone' '--no-checkout' '--reference' '\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib1/lib1_source' 'git@github.com:company/main-repo-lib1.git' 'D:/tmp2/git_clone_tests/main-repo/lib1/lib1_source'
10:14:46.115339 run-command.c:369       trace: run_command: 'ssh' 'git@github.com' 'git-upload-pack '\''company/main-repo-lib1.git'\'''
Cloning into 'D:/tmp2/git_clone_tests/main-repo/lib1/lib1_source'...
10:14:48.962590 run-command.c:369       trace: run_command: 'git-upload-pack '\''//fileserver/Builds/reference_repos/main-repo/.git'\'''
10:14:49.103908 run-command.c:369       trace: run_command: 'git-upload-pack '\''D:/GitHub/main-repo/.git'\'''
10:14:49.184477 run-command.c:369       trace: run_command: 'git-upload-pack '\''//fileserver/Builds/reference_repos/main-repo/.git'\'''
10:14:49.322365 run-command.c:369       trace: run_command: 'git-upload-pack '\''D:/GitHub/main-repo/.git'\'''
10:14:52.281044 run-command.c:369       trace: run_command: 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 5764 on WIN-1198' '--check-self-contained-and-connected'
10:14:52.315569 git.c:371               trace: built-in: git 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 5764 on WIN-1198' '--check-self-contained-and-connected'
10:15:06.119340 run-command.c:369       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
10:15:06.170876 git.c:371               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
10:15:13.072336 run-command.c:369       trace: run_command: 'submodule--helper' 'clone' '--path' 'lib3/lib3_source' '--name' 'lib3/lib3_source' '--url' 'git@github.com:company/main-repo-lib3' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:15:06.238893 git.c:371               trace: built-in: git 'submodule--helper' 'clone' '--path' 'lib2/lib2_source' '--name' 'lib2/lib2_source' '--url' 'git@github.com:company/main-repo-lib2.git' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:15:06.239894 run-command.c:369       trace: run_command: 'clone' '--no-checkout' '--reference' '\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib2/lib2_source' 'git@github.com:company/main-repo-lib2.git' 'D:/tmp2/git_clone_tests/main-repo/lib2/lib2_source'
10:15:06.273418 git.c:371               trace: built-in: git 'clone' '--no-checkout' '--reference' '\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib2/lib2_source' 'git@github.com:company/main-repo-lib2.git' 'D:/tmp2/git_clone_tests/main-repo/lib2/lib2_source'
10:15:06.309945 run-command.c:369       trace: run_command: 'ssh' 'git@github.com' 'git-upload-pack '\''company/main-repo-lib2.git'\'''
Cloning into 'D:/tmp2/git_clone_tests/main-repo/lib2/lib2_source'...
10:15:08.210491 run-command.c:369       trace: run_command: 'git-upload-pack '\''//fileserver/Builds/reference_repos/main-repo/.git'\'''
10:15:08.370561 run-command.c:369       trace: run_command: 'git-upload-pack '\''D:/GitHub/main-repo/.git'\'''
10:15:08.451234 run-command.c:369       trace: run_command: 'git-upload-pack '\''//fileserver/Builds/reference_repos/main-repo/.git'\'''
10:15:08.589129 run-command.c:369       trace: run_command: 'git-upload-pack '\''D:/GitHub/main-repo/.git'\'''
10:15:11.533328 run-command.c:369       trace: run_command: 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 9308 on WIN-1198' '--check-self-contained-and-connected'
10:15:11.575862 git.c:371               trace: built-in: git 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 9308 on WIN-1198' '--check-self-contained-and-connected'
10:15:12.986776 run-command.c:369       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
10:15:13.039314 git.c:371               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
10:15:49.633796 run-command.c:369       trace: run_command: 'submodule--helper' 'clone' '--path' 'lib4/lib4_source' '--name' 'lib4/lib4_source' '--url' 'git@github.com:company/main-repo-lib4.git' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:15:13.106441 git.c:371               trace: built-in: git 'submodule--helper' 'clone' '--path' 'lib3/lib3_source' '--name' 'lib3/lib3_source' '--url' 'git@github.com:company/main-repo-lib3' '--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:15:13.107441 run-command.c:369       trace: run_command: 'clone' '--no-checkout' '--reference' '\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib3/lib3_source' 'git@github.com:company/main-repo-lib3' 'D:/tmp2/git_clone_tests/main-repo/lib3/lib3_source'
10:15:13.141464 git.c:371               trace: built-in: git 'clone' '--no-checkout' '--reference' '\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib3/lib3_source' 'git@github.com:company/main-repo-lib3' 'D:/tmp2/git_clone_tests/main-repo/lib3/lib3_source'
10:15:13.174486 run-command.c:369       trace: run_command: 'ssh' 'git@github.com' 'git-upload-pack '\''company/main-repo-lib3'\'''
...
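
In the trace, every submodule clone receives the superproject's reference
path. One way to see what a clone actually recorded is to read its
alternates file. The following is a minimal sketch; list_alternates is a
made-up helper name, and an entry in the file only shows which object store
was registered, not whether that store holds the submodule's objects.

```shell
#!/bin/sh
# Sketch only: print the alternates recorded for a repository or submodule.
# list_alternates is a hypothetical helper name, not a git command.
#   $1: work tree of the repository or submodule to inspect
list_alternates() {
    (
        cd "$1" || exit 1
        # --git-path resolves the file inside the real git dir, which for a
        # submodule lives under the superproject's .git/modules/.
        file=$(git rev-parse --git-path objects/info/alternates) || exit 1
        if [ -f "$file" ]; then
            cat "$file"
        else
            echo "no alternates recorded"
        fi
    )
}

# Usage (submodule path taken from the trace):
#   list_alternates lib1/lib1_source
```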
>
>> The git clone documentation mentions that the reference is then passed to
>> the sub-module clone commands, so I would expect "git clone --recursive" to
>> work the same as "git submodule update", as far as --reference is concerned.
> Oh, there we have an opportunity to improve the man page (or the code).
>
>      git clone --reference --recursive ...
>
> will set the config variables
>
>      git config submodule.alternateLocation superproject
>      git config submodule.alternateErrorStrategy die
>      (or "info" for --reference-if-able)
>
> and the clone for the submodules (which is an independent process, just
> run after the clone of the superproject is done) will pick up these
> config variables and act accordingly.
>
> If you only run
>
>      git clone --reference ...
>
> then these variables are not set. Probably they should be set, so that
> the later invocation of "git submodule update --init" behaves the same
> as the git-clone of the superproject did.
>
> So as a workaround to get you up to speed again, you can just set these
> config variables yourself before running the "submodule update --init",
> and it should work.
OK, I'll try that.
>> I noticed for a single module, doing a...
>> git submodule update --init --reference
>> \\fileserver\ref-repo\lib1\sub-module1 -- lib1/sub-module1
>> ...i.e. adding the sub-module path to the reference path, works. That kind
>> of makes sense, but then how do you apply it to all the sub-modules
>> (without writing a script to do that)?
> I think that functionality is broken, as it takes the same reference
> for all submodules, so you need to go through the submodules one by one
> and give each its submodule-specific reference location.
I actually made a script to run it on each submodule, which works but is 
still quite slow as it cannot be parallelized (git doesn't like multiple 
submodule updates running concurrently).
>
>> If someone can confirm the problem or explain to me what I am doing wrong,
>> that would be great.
>>
>> Maxime
> Stefan
Thanks for your quick answer.

Maxime
