* Problem with --shallow-submodules option
@ 2016-06-20 13:06 Istvan Zakar
  2016-06-20 17:45 ` Stefan Beller
  2016-06-22 15:31 ` Fredrik Gustafsson
  0 siblings, 2 replies; 8+ messages in thread
From: Istvan Zakar @ 2016-06-20 13:06 UTC (permalink / raw)
  To: git

Hello,

I'm working on a relatively big project with many submodules. When cloning
for testing, I tried to decrease the amount of data that needs to be fetched
from the server by using the --shallow-submodules option in the clone
command. It seems to check out the tip of the remote repo, and if that is not
the commit registered in the superproject, the submodule update fails
(obviously). Can I somehow tell git to fetch the exact commit I need for my
superproject?

Thanks,
   Istvan



* Re: Problem with --shallow-submodules option
  2016-06-20 13:06 Problem with --shallow-submodules option Istvan Zakar
@ 2016-06-20 17:45 ` Stefan Beller
  2016-06-21  6:32   ` Istvan Zakar
  2016-06-22 15:31 ` Fredrik Gustafsson
  1 sibling, 1 reply; 8+ messages in thread
From: Stefan Beller @ 2016-06-20 17:45 UTC (permalink / raw)
  To: Istvan Zakar; +Cc: git

On Mon, Jun 20, 2016 at 6:06 AM, Istvan Zakar <istvan.zakar@gmail.com> wrote:
> Hello,
>
> I'm working on a relatively big project with many submodules. During
> cloning for testing I tried to decrease the amount of data that needs to be
> fetched from the server by using --shallow-submodules option in the clone
> command. It seems to check out the tip of the remote repo, and if it's not
> the commit registered in the superproject the submodule update fails
> (obviously).

Yes, that is broken, as the depth of a submodule is counted from its own HEAD,
not from the superproject's sha1 as it should be.

So it does

    git clone --depth=1 <submodule-url> <submodule-path>

    if HEAD != recorded gitlink sha1,
        git fetch <recorded gitlink sha1>

    git checkout <recorded gitlink sha1>

> Can I somehow tell to fetch that exact commit I need for my
> superproject?

Some servers support fetching a commit directly by its sha1, which is what
we make use of here; in that case it sort of works.

If the server doesn't support the capability to fetch an arbitrary sha1,
the submodule command fails, with a message such as

    error: no such remote ref $sha1
    Fetched in submodule path '<submodule>', but it did not contain $sha1. Direct fetching of that commit failed.

So if it breaks for you now, I would suggest not using that switch; I
don't think there is a quick workaround.
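
For illustration, the steps above can be approximated by hand for one
submodule whose shallow clone did not contain the recorded commit. This is
only a sketch under assumptions: `checkout_recorded_commit` is a made-up
helper name (not a git command), the submodule's remote is assumed to be
called `origin`, and the fetch still fails if the server refuses requests
for a bare sha1.

```shell
# Hypothetical helper, run from the top level of the superproject.
# $1 = path of an already cloned (shallow) submodule.
checkout_recorded_commit() {
    path=$1
    # The gitlink entry in the superproject's index holds the recorded sha1.
    sha=$(git rev-parse ":$path") || return 1
    # Ask the server for exactly that commit, keeping the history shallow.
    # This is the step that needs server-side support for sha1 wants.
    git -C "$path" fetch --depth=1 origin "$sha" || return 1
    # Detach HEAD at the recorded commit, as a submodule checkout would.
    git -C "$path" checkout --detach "$sha"
}
```

It would be called as `checkout_recorded_commit path/to/submodule` after the
failed update.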

>
> Thanks,
>    Istvan

Thanks,
Stefan


* Re: Problem with --shallow-submodules option
  2016-06-20 17:45 ` Stefan Beller
@ 2016-06-21  6:32   ` Istvan Zakar
  2016-06-21 20:35     ` Stefan Beller
  0 siblings, 1 reply; 8+ messages in thread
From: Istvan Zakar @ 2016-06-21  6:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi,

Thanks for the answer.
So it means that it is a setting on the server side which can be
activated? (I guess it depends on the version of the server)
I did some reading on the topic. Are you talking about the setting
"uploadpack.allowReachableSHA1InWant", or did I misunderstand what I
read?

Thanks,
    Istvan


* Re: Problem with --shallow-submodules option
  2016-06-21  6:32   ` Istvan Zakar
@ 2016-06-21 20:35     ` Stefan Beller
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2016-06-21 20:35 UTC (permalink / raw)
  To: Istvan Zakar; +Cc: git

On Mon, Jun 20, 2016 at 11:32 PM, Istvan Zakar <istvan.zakar@gmail.com> wrote:
> Hi,
>
> Thanks for the answer.
> So it means that it is a setting on the server side which can be
> activated? (I guess it depends on the version of the server)
> I did some reading on the topic. Are you talking about the setting
> "uploadpack.allowReachableSHA1InWant", or did I misunderstand what I
> read?

No that's exactly what I meant; sorry for not spelling that out.
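
As a rough sketch, enabling it looks like the following when run on the
server against the repository that serves the submodule. The temporary bare
repository and the name `server_repo` are stand-ins for illustration only,
and a hosted service may not let you change this setting at all.

```shell
# A throwaway bare repository stands in for the real server-side repository.
server_repo=$(mktemp -d)/submodule.git
git init --quiet --bare "$server_repo"

# Let clients request any commit that is reachable from a ref by its sha1.
git -C "$server_repo" config uploadpack.allowReachableSHA1InWant true

# Read the setting back to confirm it was stored.
git -C "$server_repo" config --bool --get uploadpack.allowReachableSHA1InWant
```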

Thanks,
Stefan


* Re: Problem with --shallow-submodules option
  2016-06-20 13:06 Problem with --shallow-submodules option Istvan Zakar
  2016-06-20 17:45 ` Stefan Beller
@ 2016-06-22 15:31 ` Fredrik Gustafsson
  2016-06-30 13:27   ` Istvan Zakar
  1 sibling, 1 reply; 8+ messages in thread
From: Fredrik Gustafsson @ 2016-06-22 15:31 UTC (permalink / raw)
  To: Istvan Zakar; +Cc: git

On Mon, Jun 20, 2016 at 01:06:39PM +0000, Istvan Zakar wrote:
> I'm working on a relatively big project with many submodules. When cloning
> for testing, I tried to decrease the amount of data that needs to be fetched
> from the server by using the --shallow-submodules option in the clone
> command. It seems to check out the tip of the remote repo, and if that is not
> the commit registered in the superproject, the submodule update fails
> (obviously). Can I somehow tell git to fetch the exact commit I need for my
> superproject?

Maybe. http://stackoverflow.com/questions/2144406/git-shallow-submodules
gives a good overview of this problem.

git fetches a branch and is shallow from that branch's tip, which might be
a different sha1 than the one the submodule points to (as you say). This
is/was one of the drawbacks of this method. However, since git 2.8,
git will try to fetch the sha1 directly (and not the branch). So it
will work if (!) the server supports direct access to sha1s. This was
previously not allowed due to security concerns (if I recall correctly).

So the answer is: yes, this will work if you have a recent version of git
and support on the server side for doing this. Unfortunately, I'm not
sure which git version is needed on the server side for this to work.

-- 
Fredrik Gustafsson

phone: +46 733-608274
e-mail: iveqy@iveqy.com
website: http://www.iveqy.com


* Re: Problem with --shallow-submodules option
  2016-06-22 15:31 ` Fredrik Gustafsson
@ 2016-06-30 13:27   ` Istvan Zakar
  2016-06-30 20:57     ` Stefan Beller
  0 siblings, 1 reply; 8+ messages in thread
From: Istvan Zakar @ 2016-06-30 13:27 UTC (permalink / raw)
  To: Fredrik Gustafsson; +Cc: git

Hello,

Thanks for your answers. I tested it after the changes were made on
the git server, and it seems to be working. But some other issue came
up.

We have quite a lot of submodules in our project, so I did a comparison:

If I do a clone with these parameters:
--jobs 20 --recurse-submodules

the clone takes ~53 seconds, and the total size of the folder is around 2 GB.

If I add the --shallow-submodules option, the size of the folder is
a bit below 1 GB, so the size decreased as I expected, but the time of
the clone itself increased to 90 seconds. It seems that the last step of
the command, checking out the submodules, is executed one-by-one and
not in parallel, so the jobs parameter does not seem to have any effect
at this step.

Is this intentional, or is there some option I missed?

I'm using git 2.9.0 on client side.

Thanks,
   Istvan

P.S. If I update the submodules with the --depth 1 parameter in parallel
using xargs, it takes about 18 seconds, so that is a workaround for this
issue, but it would be nice to do it with a single command.
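
The exact command is not given above; one possible form of such an xargs
workaround is sketched here. Both function names and the default of 8 jobs
are made up for illustration, `-0` and `-P` are GNU/BSD xargs extensions,
and concurrent `git submodule update` runs in one superproject may contend
for locks, so treat it as a starting point rather than a tested recipe.

```shell
# Hypothetical helper: print the submodule paths recorded in .gitmodules.
list_submodule_paths() {
    git config --file .gitmodules --get-regexp '^submodule\..*\.path$' |
        cut -d' ' -f2-
}

# Hypothetical helper: shallow-update all submodules, up to $1 at a time.
parallel_shallow_update() {
    njobs=${1:-8}
    # Register the submodules once, so the parallel runs only fetch and check out.
    git submodule init &&
    list_submodule_paths | tr '\n' '\0' |
        xargs -0 -n 1 -P "$njobs" git submodule update --depth 1 --
}
```

From the top level of the superproject it would be run as, for example,
`parallel_shallow_update 20`.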





* Re: Problem with --shallow-submodules option
  2016-06-30 13:27   ` Istvan Zakar
@ 2016-06-30 20:57     ` Stefan Beller
  2016-06-30 21:04       ` Istvan Zakar
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Beller @ 2016-06-30 20:57 UTC (permalink / raw)
  To: Istvan Zakar; +Cc: Fredrik Gustafsson, git

On Thu, Jun 30, 2016 at 6:27 AM, Istvan Zakar <istvan.zakar@gmail.com> wrote:
> Hello,
>
> Thanks for your answers. I tested it after the changes were made on
> the git server, and it seems to be working. But some other issue came
> up.
>
> We have quite a lot of submodules in our project, so I did a comparison:
>
> If I do a clone with these parameters:
> --jobs 20 --recurse-submodules
>
> the clone takes ~53 seconds, and the total size of the folder is around 2 GB.
>
> If I add the --shallow-submodules option, the size of the folder is
> a bit below 1 GB, so the size decreased as I expected, but the time of
> the clone itself increased to 90 seconds. It seems that the last step of
> the command, checking out the submodules, is executed one-by-one and
> not in parallel, so the jobs parameter does not seem to have any effect
> at this step.
>
> Is this intentional, or is there some option I missed?

It was intentional at the time of submitting the patches.
The checkout phase is a bit complicated as it combines the
newly cloned submodules as well as the submodules to incrementally
fetch into one bucket and treats them the same.

And for submodules that were fetched incrementally, you may run into problems
when combining that with the local state (e.g. rebase or merge configured in
`submodule.<name>.update` or passed on the command line). Those problems
require human interaction (resolving the merge conflict), and we want to
present them to the user one at a time.

From the user's point of view it is not quite clear when to stop; see:
15ffb7cde48b73b3d5ce259443db7d2e0ba13750 (submodule update: continue when a checkout fails)
877449c136539cf8b9b4ed9cfe33a796b7b93f93 (git-submodule.sh: clarify the "should we die now" logic)

So we want to die as soon as we see a merge conflict or other
error that is likely to require some human interaction.
To do that properly we need to have complicated logic or just update
one submodule at a time.

For initial checkouts we know that there will be no merge conflicts, i.e.
it will be a "checkout -f" (with an implicit must_die_on_failure=no).
So we could run all checkouts of submodules in parallel, too. We'd
just need to write the patch for that.

As the cloning is already done in parallel, we can hook into the initial
checkout there easily. I'd build that on top of [1], creating a similar commit.
In the successful case of `update_clone_task_finished` (the case with
`!result`  -> return 0;) we would need to add the checkout command to
the queue instead of just finishing.

[1] https://github.com/gitster/git/commit/665b35eccd39fefd714cb5c332277a6b94fd9386



* Re: Problem with --shallow-submodules option
  2016-06-30 20:57     ` Stefan Beller
@ 2016-06-30 21:04       ` Istvan Zakar
  0 siblings, 0 replies; 8+ messages in thread
From: Istvan Zakar @ 2016-06-30 21:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Fredrik Gustafsson, git

Hi,

Thanks for the clarification, it makes sense now.

Thanks,
    Istvan

