[PATCH v2] add test to demonstrate that shallow recursive clones fail

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] add test to demonstrate that shallow recursive clones fail
@ 2015-11-12  9:37 larsxschneider
  2015-11-12 23:34 ` Stefan Beller
  2015-11-13  5:35 ` Jeff King
  0 siblings, 2 replies; 35+ messages in thread
From: larsxschneider @ 2015-11-12  9:37 UTC (permalink / raw)
  To: git; +Cc: sbeller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

"git clone --recursive --depth 1 --single-branch <url>" clones the
submodules successfully. However, it does not obey "--depth 1" for
submodule cloning.

The following workaround does only work if the used submodule pointer
is on the default branch. Otherwise "git submodule update" fails with
"fatal: reference is not a tree:" and "Unable to checkout".
git clone --depth 1 --single-branch <url>
cd <repo-name>
git submodule update --init --recursive --depth 1

The workaround does not fail using the "--remote" flag. However, in that
case the wrong commit is checked out.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t7412-submodule-recursive.sh | 78 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100755 t/t7412-submodule-recursive.sh

diff --git a/t/t7412-submodule-recursive.sh b/t/t7412-submodule-recursive.sh
new file mode 100755
index 0000000..aaf252b
--- /dev/null
+++ b/t/t7412-submodule-recursive.sh
@@ -0,0 +1,78 @@
+#!/bin/sh
+
+test_description='Test shallow cloning of repos with submodules'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	git checkout -b master &&
+	echo file >file &&
+	git add file &&
+	test_tick &&
+	git commit -m "master commit 1" &&
+
+	git checkout -b branch &&
+	echo file >branch-file &&
+	git add branch-file &&
+	test_tick &&
+	git commit -m "branch commit 1" &&
+
+	git checkout master &&
+	git clone . repo &&
+	(
+		cd repo &&
+		git checkout master &&
+		git submodule add ../. submodule &&
+		(
+			cd submodule &&
+			git checkout branch
+		) &&
+		git add submodule &&
+		test_tick &&
+		git commit -m "master commit 2"
+	)
+'
+
+test_expect_failure shallow-clone-recursive '
+	URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
+	echo $URL &&
+	git clone --recursive --depth 1 --single-branch $URL clone-recursive &&
+	(
+		cd "clone-recursive" &&
+		git log --oneline >lines &&
+		test_line_count = 1 lines
+	) &&
+	(
+		cd "clone-recursive/submodule" &&
+		git log --oneline >lines &&
+		test_line_count = 1 lines
+	)
+'
+
+test_expect_failure shallow-clone-recursive-workaround '
+	URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
+	echo $URL &&
+	git clone --depth 1 --single-branch $URL clone-recursive-workaround &&
+	(
+		cd "clone-recursive-workaround" &&
+		git log --oneline >lines &&
+		test_line_count = 1 lines &&
+		git submodule update --init --recursive --depth 1
+	)
+'
+
+test_expect_failure shallow-clone-recursive-with-remote-workaround '
+	URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
+	echo $URL &&
+	git clone --depth 1 --single-branch $URL clone-recursive-remote-workaround &&
+	(
+		cd "clone-recursive-remote-workaround" &&
+		git log --oneline >lines &&
+		test_line_count = 1 lines &&
+		git submodule update --init --remote --recursive --depth 1 &&
+		git status submodule >status &&
+		test_must_fail grep "modified:" status
+	)
+'
+
+test_done
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-12  9:37 [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
@ 2015-11-12 23:34 ` Stefan Beller
  2015-11-15 12:43   ` Lars Schneider
  2015-11-13  5:35 ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-12 23:34 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git

On Thu, Nov 12, 2015 at 1:37 AM,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> "git clone --recursive --depth 1 --single-branch <url>" clones the
> submodules successfully. However, it does not obey "--depth 1" for
> submodule cloning.
>
> The following workaround does only work if the used submodule pointer
> is on the default branch. Otherwise "git submodule update" fails with
> "fatal: reference is not a tree:" and "Unable to checkout".
> git clone --depth 1 --single-branch <url>
> cd <repo-name>
> git submodule update --init --recursive --depth 1
>
> The workaround does not fail using the "--remote" flag. However, in that
> case the wrong commit is checked out.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---

Thanks for writing these tests. :)

> +test_expect_failure shallow-clone-recursive-workaround '
> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
> +       echo $URL &&
> +       git clone --depth 1 --single-branch $URL clone-recursive-workaround &&
> +       (
> +               cd "clone-recursive-workaround" &&
> +               git log --oneline >lines &&
> +               test_line_count = 1 lines &&
> +               git submodule update --init --recursive --depth 1

Should we prepend the lines with git submodule update with test_must_fail here?

> +       )
> +'
> +
> +test_expect_failure shallow-clone-recursive-with-remote-workaround '
> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
> +       echo $URL &&
> +       git clone --depth 1 --single-branch $URL clone-recursive-remote-workaround &&
> +       (
> +               cd "clone-recursive-remote-workaround" &&
> +               git log --oneline >lines &&
> +               test_line_count = 1 lines &&
> +               git submodule update --init --remote --recursive --depth 1 &&
> +               git status submodule >status &&
> +               test_must_fail grep "modified:" status

Use ! here instead of test_must_fail.

IIUC we use test_must_fail for git commands (to test that git does
return a non null value instead of segfaulting).
But on the other hand we trust grep to not segfault, so just negating
its output is enough here.

> +       )
> +'
> +
> +test_done
> --
> 2.5.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-12  9:37 [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
  2015-11-12 23:34 ` Stefan Beller
@ 2015-11-13  5:35 ` Jeff King
  2015-11-13 18:41   ` Stefan Beller
  2015-11-15 12:53   ` Lars Schneider
  1 sibling, 2 replies; 35+ messages in thread
From: Jeff King @ 2015-11-13  5:35 UTC (permalink / raw)
  To: larsxschneider; +Cc: git, sbeller

On Thu, Nov 12, 2015 at 10:37:41AM +0100, larsxschneider@gmail.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> "git clone --recursive --depth 1 --single-branch <url>" clones the
> submodules successfully. However, it does not obey "--depth 1" for
> submodule cloning.
> 
> The following workaround does only work if the used submodule pointer
> is on the default branch. Otherwise "git submodule update" fails with
> "fatal: reference is not a tree:" and "Unable to checkout".
> git clone --depth 1 --single-branch <url>
> cd <repo-name>
> git submodule update --init --recursive --depth 1
> 
> The workaround does not fail using the "--remote" flag. However, in that
> case the wrong commit is checked out.

Hrm. Do we want to make these workarounds work correctly? Or is the
final solution going to be that the first command you gave simply works,
and no workarounds are needed.  If the latter, I wonder if we want to be
adding tests for the workarounds in the first place.

I'm not clear on the expected endgame.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13  5:35 ` Jeff King
@ 2015-11-13 18:41   ` Stefan Beller
  2015-11-13 23:16     ` Stefan Beller
  2015-11-15 12:53   ` Lars Schneider
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-13 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, git

On Thu, Nov 12, 2015 at 9:35 PM, Jeff King <peff@peff.net> wrote:
>
> Hrm. Do we want to make these workarounds work correctly? Or is the
> final solution going to be that the first command you gave simply works,
> and no workarounds are needed.  If the latter, I wonder if we want to be
> adding tests for the workarounds in the first place.

I think we want to make the final solution just work. I dug into that and it is
harder than expected. I may even call it a bug. The bug doesn't occur often as
it is only triggered by things like rewriting history (forced pushes)
or a short dpeth
argument.

So if you invoke "git clone --recursive", it will internally just
delegate the submodule
handling to "git submodule update --init --recursive", which then (as
the submodule
doesn't exist yet) will delegate the cloning to "git submodule--helper
clone", which
will then call git clone for the actual cloning.

However in this whole chain of commands we never pass around the actual sha1
we need. The strategy is to clone first and then checkout the sha1, which the
superprojects wants to see. The desired sha1 was hopefully included in
the cloning,
so we can check it out.

But the sha1 may not be present if we have a very short depth argument, or if we
rewrote history. In case of a short depth argument, consider the
following history:

... <- A <- B

A is the recorded sha1 in the superproject, whereas B is the HEAD in the
remote you're cloning from. If cloning with depth=1, the most naive way
would have been to pass on the depth argument down the command chain,
but then we would end up cloning B with no further depth, and upon checkout
we cannot find A.

In case of the rewritten history, consider:

.. < - C <- B
         \
          A

whereas A is the recorded sha1 in the superproject, but on a different branch
(or even just a dangling commit. but used to be on master).
B is the master branch. In case we pass on --depth to cloning the submodule,
--single-branch is implied by --depth, so we would not clone A. In case of
A being a dangling commit, we wouldn't even clone it without the depth argument.

So I propose:
 * similar to fetch, we enable clone to obtain a specific sha1 from remote.
 * we explicitly pass the submodule sha1 as recorded in the superproject
   to the submodule fetch/clone in case we follow the exact sha1. In case of
   --remote or the branch field present in the superprojects .gitmodule file,
   we can just pass the branch name.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 18:41   ` Stefan Beller
@ 2015-11-13 23:16     ` Stefan Beller
  2015-11-13 23:38       ` Jeff King
  2015-11-30 18:11       ` Junio C Hamano
  0 siblings, 2 replies; 35+ messages in thread
From: Stefan Beller @ 2015-11-13 23:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 10:41 AM, Stefan Beller <sbeller@google.com> wrote:
> On Thu, Nov 12, 2015 at 9:35 PM, Jeff King <peff@peff.net> wrote:
>>
>> Hrm. Do we want to make these workarounds work correctly? Or is the
>> final solution going to be that the first command you gave simply works,
>> and no workarounds are needed.  If the latter, I wonder if we want to be
>> adding tests for the workarounds in the first place.
>
> I think we want to make the final solution just work. I dug into that and it is
> harder than expected. I may even call it a bug. The bug doesn't occur often as
> it is only triggered by things like rewriting history (forced pushes)
> or a short dpeth
> argument.
>
> So if you invoke "git clone --recursive", it will internally just
> delegate the submodule
> handling to "git submodule update --init --recursive", which then (as
> the submodule
> doesn't exist yet) will delegate the cloning to "git submodule--helper
> clone", which
> will then call git clone for the actual cloning.
>
> However in this whole chain of commands we never pass around the actual sha1
> we need. The strategy is to clone first and then checkout the sha1, which the
> superprojects wants to see. The desired sha1 was hopefully included in
> the cloning,
> so we can check it out.
>
> But the sha1 may not be present if we have a very short depth argument, or if we
> rewrote history. In case of a short depth argument, consider the
> following history:
>
> ... <- A <- B
>
> A is the recorded sha1 in the superproject, whereas B is the HEAD in the
> remote you're cloning from. If cloning with depth=1, the most naive way
> would have been to pass on the depth argument down the command chain,
> but then we would end up cloning B with no further depth, and upon checkout
> we cannot find A.
>
> In case of the rewritten history, consider:
>
> .. < - C <- B
>          \
>           A
>
> whereas A is the recorded sha1 in the superproject, but on a different branch
> (or even just a dangling commit. but used to be on master).
> B is the master branch. In case we pass on --depth to cloning the submodule,
> --single-branch is implied by --depth, so we would not clone A. In case of
> A being a dangling commit, we wouldn't even clone it without the depth argument.
>
> So I propose:
>  * similar to fetch, we enable clone to obtain a specific sha1 from remote.
>  * we explicitly pass the submodule sha1 as recorded in the superproject
>    to the submodule fetch/clone in case we follow the exact sha1. In case of
>    --remote or the branch field present in the superprojects .gitmodule file,
>    we can just pass the branch name.
>
> Thanks,
> Stefan

+cc Junio, Duy

So cloning from an arbitrary SHA1 is not a new thing I just came up with,
but has been discussed before[1].

Junio wrote on Oct 09, 2014:
> This is so non-standard a thing to do that I doubt it is worth
> supporting with "git clone".  "git clone --branch", which is about
"> I want to follow that particular branch", would not mesh well with
> "I want to see the history that leads to this exact commit", either.
> You would not know which branch(es) is that exact commit is on in
> the first place.

I disagree with this. This is the *exact* thing you actually want to do when
dealing with submodules. When fetching/cloning for a submodule, you want
to obtain the exact sha1, instead of a branch (which happens to be supported
too, but is not the original use case with submodules.)

> The "uploadpack.allowtipsha1inwant" is a wrong configuration to tie
> this into.  The intent of the configuration is to allow *ONLY*
> commits at the tip of the (possibly hidden) refs to be asked for.
> Those who want to hide some refs using "uploadpack.hiderefs" may
> want to enable "allowtipsha1inwant" to allow the tips of the hidden
> refs while still disallowing a request to fetch any random reachable
> commit not at the tip.

If the server contains at least one superproject/submodule, there is a legit
use case for fetching an exact sha1, which isn't a tip of a branch, but may
be in any branch  or even in no branch at all. So I wonder how we want
to add that as a non-hacky solution to allow for fetching specific sha1s
as we still need to differentiate between obligerated (forced pushed,
"go away sha1s") and legit submodule pointer sha1s.

As we don't want to lookup in a superproject (we don't even know
which superproject we'd need to look into), we can either go for a
more liberal sha1 fetching attitude or somehow modify the submodule
repository to mark sha1s which are good to fetch as "submodule update"
fetches.

Proposal:
    Could we have a refs/submodules/tracking which is tracking all the
    sha1s a superproject ever pointed to? That ref would need to be
    maintained by the superproject. If there is a forced push to the
    superproject, it would also need to obliterate some of the history
    in that special ref.
Problems with this proposal:
    How do we care about multiple superprojects,
    or superprojects with different branches?

[1] http://git.661346.n2.nabble.com/Can-I-fetch-an-arbitrary-commit-by-sha1-td7619396.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:16     ` Stefan Beller
@ 2015-11-13 23:38       ` Jeff King
  2015-11-13 23:41         ` Jeff King
  2015-11-30 18:11       ` Junio C Hamano
  1 sibling, 1 reply; 35+ messages in thread
From: Jeff King @ 2015-11-13 23:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:

> Junio wrote on Oct 09, 2014:
> > This is so non-standard a thing to do that I doubt it is worth
> > supporting with "git clone".  "git clone --branch", which is about
> "> I want to follow that particular branch", would not mesh well with
> > "I want to see the history that leads to this exact commit", either.
> > You would not know which branch(es) is that exact commit is on in
> > the first place.
> 
> I disagree with this. This is the *exact* thing you actually want to do when
> dealing with submodules. When fetching/cloning for a submodule, you want
> to obtain the exact sha1, instead of a branch (which happens to be supported
> too, but is not the original use case with submodules.)

I think this is already implemented in 68ee628 (upload-pack: optionally
allow fetching reachable sha1, 2015-05-21), isn't it?

> If the server contains at least one superproject/submodule, there is a legit
> use case for fetching an exact sha1, which isn't a tip of a branch, but may
> be in any branch  or even in no branch at all.

The patch above doesn't handle "no branch at all", but I'm not sure if
we want to (it violates git's usual access model; moreover, a git
repository does not necessarily have all ancestors of an unreachable
object, though these days it usually does).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:38       ` Jeff King
@ 2015-11-13 23:41         ` Jeff King
  2015-11-14  0:10           ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2015-11-13 23:41 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:

> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
> 
> > Junio wrote on Oct 09, 2014:
> > > This is so non-standard a thing to do that I doubt it is worth
> > > supporting with "git clone".  "git clone --branch", which is about
> > "> I want to follow that particular branch", would not mesh well with
> > > "I want to see the history that leads to this exact commit", either.
> > > You would not know which branch(es) is that exact commit is on in
> > > the first place.
> > 
> > I disagree with this. This is the *exact* thing you actually want to do when
> > dealing with submodules. When fetching/cloning for a submodule, you want
> > to obtain the exact sha1, instead of a branch (which happens to be supported
> > too, but is not the original use case with submodules.)
> 
> I think this is already implemented in 68ee628 (upload-pack: optionally
> allow fetching reachable sha1, 2015-05-21), isn't it?

Note that this just implements the server side. I think to use this with
submodules right now, you'd have to manually "git init && git fetch" in
the submodule. It might make sense to teach clone to handle this, to
avoid the submodule code duplicating what the clone code does.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:41         ` Jeff King
@ 2015-11-14  0:10           ` Stefan Beller
  2015-11-16 18:59             ` Jens Lehmann
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-14  0:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>
>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>
>> > Junio wrote on Oct 09, 2014:
>> > > This is so non-standard a thing to do that I doubt it is worth
>> > > supporting with "git clone".  "git clone --branch", which is about
>> > "> I want to follow that particular branch", would not mesh well with
>> > > "I want to see the history that leads to this exact commit", either.
>> > > You would not know which branch(es) is that exact commit is on in
>> > > the first place.
>> >
>> > I disagree with this. This is the *exact* thing you actually want to do when
>> > dealing with submodules. When fetching/cloning for a submodule, you want
>> > to obtain the exact sha1, instead of a branch (which happens to be supported
>> > too, but is not the original use case with submodules.)
>>
>> I think this is already implemented in 68ee628 (upload-pack: optionally
>> allow fetching reachable sha1, 2015-05-21), isn't it?
>
> Note that this just implements the server side. I think to use this with
> submodules right now, you'd have to manually "git init && git fetch" in
> the submodule. It might make sense to teach clone to handle this, to
> avoid the submodule code duplicating what the clone code does.

Yes I want to add it to clone, as that is a prerequisite for making
git clone --recursive --depth 1 to work as you'd expect. (such that
the submodule can be cloned&checkout instead of rewriting that to be
init&fetch.

Thanks for pointing out that we already have some kind of server support.

I wonder if we should add an additional way to make fetching only some
sha1s possible. ("I don't want users to fetch any sha1, but only those
where superprojects point{ed} to", even if you force push a superproject,
you want to want to only allow fetching all sha1s which exist in the current
superprojects branch.)

Maybe our emails crossed, but in the other mail I pointed out we could use
some sort of hidden ref (refs/superprojects/*) for that, which are
allowed to mark
any sort of sha1, which are allowed in the superproject/submodule context
to be fetched.

So whenever you push to a superproject (a project that has a gitlink),
we would need to check serverside if that submodule is at us and mark the
correct sha1s in the submodule. Then you can disallow fetching most of the sha1s
but still could have a correctly working submodule update mechanism.

Thanks,
Stefan

>
> -Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-12 23:34 ` Stefan Beller
@ 2015-11-15 12:43   ` Lars Schneider
  0 siblings, 0 replies; 35+ messages in thread
From: Lars Schneider @ 2015-11-15 12:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git


On 13 Nov 2015, at 00:34, Stefan Beller <sbeller@google.com> wrote:

> On Thu, Nov 12, 2015 at 1:37 AM,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> "git clone --recursive --depth 1 --single-branch <url>" clones the
>> submodules successfully. However, it does not obey "--depth 1" for
>> submodule cloning.
>> 
>> The following workaround does only work if the used submodule pointer
>> is on the default branch. Otherwise "git submodule update" fails with
>> "fatal: reference is not a tree:" and "Unable to checkout".
>> git clone --depth 1 --single-branch <url>
>> cd <repo-name>
>> git submodule update --init --recursive --depth 1
>> 
>> The workaround does not fail using the "--remote" flag. However, in that
>> case the wrong commit is checked out.
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
> 
> Thanks for writing these tests. :)
Thanks for looking into the issue :)


> 
>> +test_expect_failure shallow-clone-recursive-workaround '
>> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
>> +       echo $URL &&
>> +       git clone --depth 1 --single-branch $URL clone-recursive-workaround &&
>> +       (
>> +               cd "clone-recursive-workaround" &&
>> +               git log --oneline >lines &&
>> +               test_line_count = 1 lines &&
>> +               git submodule update --init --recursive --depth 1
> 
> Should we prepend the lines with git submodule update with test_must_fail here?
Wouldn't the test fail then? The test is expected to fail (see "test_expect_failure"). Am I missing something?


> 
>> +       )
>> +'
>> +
>> +test_expect_failure shallow-clone-recursive-with-remote-workaround '
>> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
>> +       echo $URL &&
>> +       git clone --depth 1 --single-branch $URL clone-recursive-remote-workaround &&
>> +       (
>> +               cd "clone-recursive-remote-workaround" &&
>> +               git log --oneline >lines &&
>> +               test_line_count = 1 lines &&
>> +               git submodule update --init --remote --recursive --depth 1 &&
>> +               git status submodule >status &&
>> +               test_must_fail grep "modified:" status
> 
> Use ! here instead of test_must_fail.
> 
> IIUC we use test_must_fail for git commands (to test that git does
> return a non null value instead of segfaulting).
> But on the other hand we trust grep to not segfault, so just negating
> its output is enough here.

OK! I will fix that in the next series!

Thanks,
Lars

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13  5:35 ` Jeff King
  2015-11-13 18:41   ` Stefan Beller
@ 2015-11-15 12:53   ` Lars Schneider
  2015-11-17 21:34     ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Lars Schneider @ 2015-11-15 12:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git, sbeller


On 13 Nov 2015, at 06:35, Jeff King <peff@peff.net> wrote:

> On Thu, Nov 12, 2015 at 10:37:41AM +0100, larsxschneider@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> "git clone --recursive --depth 1 --single-branch <url>" clones the
>> submodules successfully. However, it does not obey "--depth 1" for
>> submodule cloning.
>> 
>> The following workaround does only work if the used submodule pointer
>> is on the default branch. Otherwise "git submodule update" fails with
>> "fatal: reference is not a tree:" and "Unable to checkout".
>> git clone --depth 1 --single-branch <url>
>> cd <repo-name>
>> git submodule update --init --recursive --depth 1
>> 
>> The workaround does not fail using the "--remote" flag. However, in that
>> case the wrong commit is checked out.
> 
> Hrm. Do we want to make these workarounds work correctly? Or is the
> final solution going to be that the first command you gave simply works,
> and no workarounds are needed.  If the latter, I wonder if we want to be
> adding tests for the workarounds in the first place.
> 
> I'm not clear on the expected endgame.

I see your point. I'll remove the workaround tests in the next roll. That being said, I think the we should do something about the workarounds, too, because it certainly confused me as Git user. Would you merge a patch that prints a warning message like "--depth parameter not supported for submodules update command" or something if a user tries this command?

Thanks,
Lars

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-14  0:10           ` Stefan Beller
@ 2015-11-16 18:59             ` Jens Lehmann
  2015-11-16 19:25               ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Lehmann @ 2015-11-16 18:59 UTC (permalink / raw)
  To: Stefan Beller, Jeff King; +Cc: Lars Schneider, git, Junio C Hamano, Duy Nguyen

Am 14.11.2015 um 01:10 schrieb Stefan Beller:
> On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
>> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>>
>>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>>
>>>> Junio wrote on Oct 09, 2014:
>>>>> This is so non-standard a thing to do that I doubt it is worth
>>>>> supporting with "git clone".  "git clone --branch", which is about
>>>> "> I want to follow that particular branch", would not mesh well with
>>>>> "I want to see the history that leads to this exact commit", either.
>>>>> You would not know which branch(es) is that exact commit is on in
>>>>> the first place.
>>>>
>>>> I disagree with this. This is the *exact* thing you actually want to do when
>>>> dealing with submodules. When fetching/cloning for a submodule, you want
>>>> to obtain the exact sha1, instead of a branch (which happens to be supported
>>>> too, but is not the original use case with submodules.)

Yes, being able to fetch certain sha1s makes lots of sense for submodules
(this has been discussed some time ago at a GitTogether). But - apart from
the extra network load - it's rather helpful to get all the submodule
branches too (though that could be limited to the branches the sha1 is on).

>>> I think this is already implemented in 68ee628 (upload-pack: optionally
>>> allow fetching reachable sha1, 2015-05-21), isn't it?
>>
>> Note that this just implements the server side. I think to use this with
>> submodules right now, you'd have to manually "git init && git fetch" in
>> the submodule. It might make sense to teach clone to handle this, to
>> avoid the submodule code duplicating what the clone code does.
>
> Yes I want to add it to clone, as that is a prerequisite for making
> git clone --recursive --depth 1 to work as you'd expect. (such that
> the submodule can be cloned&checkout instead of rewriting that to be
> init&fetch.

Cool, that should help recursive fetch too.

> Thanks for pointing out that we already have some kind of server support.
>
> I wonder if we should add an additional way to make fetching only some
> sha1s possible. ("I don't want users to fetch any sha1, but only those
> where superprojects point{ed} to", even if you force push a superproject,
> you want to want to only allow fetching all sha1s which exist in the current
> superprojects branch.)

Me thinks the restrictions for sha1-fetching could come from the branches
these sha1s are found in the upstream submodule: if the client is allowed
to fetch a branch, it should be able to fetch any sha1 on that branch.

> Maybe our emails crossed, but in the other mail I pointed out we could use
> some sort of hidden ref (refs/superprojects/*) for that, which are
> allowed to mark
> any sort of sha1, which are allowed in the superproject/submodule context
> to be fetched.
>
> So whenever you push to a superproject (a project that has a gitlink),
> we would need to check serverside if that submodule is at us and mark the
> correct sha1s in the submodule. Then you can disallow fetching most of the sha1s
> but still could have a correctly working submodule update mechanism.

And what happens if the submodule isn't at us? Involving the serverside of
a superproject in submodule fetching sounds wrong to me. Me thinks that
the upstream of the submodule should always control if a sha1 is allowed
to be fetched. Or did I understand you wrong?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 18:59             ` Jens Lehmann
@ 2015-11-16 19:25               ` Stefan Beller
  2015-11-16 21:42                 ` Jens Lehmann
  2015-11-17 20:17                 ` Duy Nguyen
  0 siblings, 2 replies; 35+ messages in thread
From: Stefan Beller @ 2015-11-16 19:25 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>
>> On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
>>>
>>> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>>>
>>>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>>>
>>>>> Junio wrote on Oct 09, 2014:
>>>>>>
>>>>>> This is so non-standard a thing to do that I doubt it is worth
>>>>>> supporting with "git clone".  "git clone --branch", which is about
>>>>>
>>>>> "> I want to follow that particular branch", would not mesh well with
>>>>>>
>>>>>> "I want to see the history that leads to this exact commit", either.
>>>>>> You would not know which branch(es) is that exact commit is on in
>>>>>> the first place.
>>>>>
>>>>>
>>>>> I disagree with this. This is the *exact* thing you actually want to do
>>>>> when
>>>>> dealing with submodules. When fetching/cloning for a submodule, you
>>>>> want
>>>>> to obtain the exact sha1, instead of a branch (which happens to be
>>>>> supported
>>>>> too, but is not the original use case with submodules.)
>
>
> Yes, being able to fetch certain sha1s makes lots of sense for submodules
> (this has been discussed some time ago at a GitTogether). But - apart from
> the extra network load - it's rather helpful to get all the submodule
> branches too (though that could be limited to the branches the sha1 is on).

Ok, I did not attend that GitTogether ;)

>
>>>> I think this is already implemented in 68ee628 (upload-pack: optionally
>>>> allow fetching reachable sha1, 2015-05-21), isn't it?
>>>
>>>
>>> Note that this just implements the server side. I think to use this with
>>> submodules right now, you'd have to manually "git init && git fetch" in
>>> the submodule. It might make sense to teach clone to handle this, to
>>> avoid the submodule code duplicating what the clone code does.
>>
>>
>> Yes I want to add it to clone, as that is a prerequisite for making
>> git clone --recursive --depth 1 to work as you'd expect. (such that
>> the submodule can be cloned&checkout instead of rewriting that to be
>> init&fetch.
>
>
> Cool, that should help recursive fetch too.
>
>> Thanks for pointing out that we already have some kind of server support.
>>
>> I wonder if we should add an additional way to make fetching only some
>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>> where superprojects point{ed} to", even if you force push a superproject,
>> you want to want to only allow fetching all sha1s which exist in the
>> current
>> superprojects branch.)
>
>
> Me thinks the restrictions for sha1-fetching could come from the branches
> these sha1s are found in the upstream submodule: if the client is allowed
> to fetch a branch, it should be able to fetch any sha1 on that branch.

I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
set, is not sufficient though.

To fetch an arbitrary sha1, you would need to check if that sha1 is part
of the history of any advertised branch and then allow fetching serverside,
which sounds like some work for the server, which we may want to avoid
by having smarter data structures there.

Instead of having to search all branches for the requested sha1, we could have
some sort of data structure to make it not an O(n) operation (n being
all objects
in the repo).

Maybe I overestimate the work which needs to be done, because the server has
bitmaps nowadays.

Maybe a lazy reverse-pointer graph can be established on the serverside.
So I guess when we add the feature to fetch arbitrary sha1s, reachable from
any branch, people using submodules will make use of the feature. (such as with
git fetch --recurse --depth 1 or via a new `git fetch --recursive
--up-to-submodule-tip-only`)

So once the server is asked for a certain sha1, it will do the
reachability check,
which takes some effort, but then stores the result in the form:
"If ${current tip sha} of ${branch} is reachable, so is requested $sha1."

So when the next fetch request for $sha1 arrives, the server only needs to
check for ${current tip sha} to be part of $branch, which is expected to be
a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could
just tell you or at least shorten the walk even more)
If the ${branch} has changed, the next evaluation for $sha1 can update
the cache,
such that the reverse lookup is not expensive on expectation.

I assume this will mostly be used with submodules, so only a few sha1s need
this caching.

>
>> Maybe our emails crossed, but in the other mail I pointed out we could use
>> some sort of hidden ref (refs/superprojects/*) for that, which are
>> allowed to mark
>> any sort of sha1, which are allowed in the superproject/submodule context
>> to be fetched.
>>
>> So whenever you push to a superproject (a project that has a gitlink),
>> we would need to check serverside if that submodule is at us and mark the
>> correct sha1s in the submodule. Then you can disallow fetching most of the
>> sha1s
>> but still could have a correctly working submodule update mechanism.
>
>
> And what happens if the submodule isn't at us? Involving the serverside of
> a superproject in submodule fetching sounds wrong to me. Me thinks that
> the upstream of the submodule should always control if a sha1 is allowed
> to be fetched. Or did I understand you wrong?

Yes and no.
The serverside submodule repository should be responsible for the ultimate
decision if you are allowed to fetch that sha1. But maybe on pushing the
superproject, we can store a hint in the submodule, that this sha1 is legit.
Although I may be missguided in my thinking here as the superproject
should have no influence on the submodule.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 19:25               ` Stefan Beller
@ 2015-11-16 21:42                 ` Jens Lehmann
  2015-11-16 22:56                   ` Stefan Beller
  2015-11-17 20:17                 ` Duy Nguyen
  1 sibling, 1 reply; 35+ messages in thread
From: Jens Lehmann @ 2015-11-16 21:42 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

Am 16.11.2015 um 20:25 schrieb Stefan Beller:
> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>> Thanks for pointing out that we already have some kind of server support.
>>>
>>> I wonder if we should add an additional way to make fetching only some
>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>> where superprojects point{ed} to", even if you force push a superproject,
>>> you want to want to only allow fetching all sha1s which exist in the
>>> current
>>> superprojects branch.)
>>
>>
>> Me thinks the restrictions for sha1-fetching could come from the branches
>> these sha1s are found in the upstream submodule: if the client is allowed
>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>
> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
> set, is not sufficient though.
>
> To fetch an arbitrary sha1, you would need to check if that sha1 is part
> of the history of any advertised branch and then allow fetching serverside,
> which sounds like some work for the server, which we may want to avoid
> by having smarter data structures there.
>
> Instead of having to search all branches for the requested sha1, we could have
> some sort of data structure to make it not an O(n) operation (n being
> all objects
> in the repo).
>
> Maybe I overestimate the work which needs to be done, because the server has
> bitmaps nowadays.
>
> Maybe a lazy reverse-pointer graph can be established on the serverside.
> So I guess when we add the feature to fetch arbitrary sha1s, reachable from
> any branch, people using submodules will make use of the feature. (such as with
> git fetch --recurse --depth 1 or via a new `git fetch --recursive
> --up-to-submodule-tip-only`)
>
> So once the server is asked for a certain sha1, it will do the
> reachability check,
> which takes some effort, but then stores the result in the form:
> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>
> So when the next fetch request for $sha1 arrives, the server only needs to
> check for ${current tip sha} to be part of $branch, which is expected to be
> a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could
> just tell you or at least shorten the walk even more)
> If the ${branch} has changed, the next evaluation for $sha1 can update
> the cache,
> such that the reverse lookup is not expensive on expectation.

Makes sense, although I do not know enough about the server side to tell if
it would need such an optimization or will cope with the load just fine.

But even if we'd enable such a feature without having to set an extra config
option, a submodule fetch asking for certain sha1s would have to fall back
to a simple "fetch all" like we do now when the server doesn't support that
for backwards compatibility. But maybe that's just obvious.

> I assume this will mostly be used with submodules, so only a few sha1s need
> this caching.

I won't bet on that, some of the submodules at $DAYJOB are rather busy and
see almost the same traffic as their superprojects ;-)

>>> Maybe our emails crossed, but in the other mail I pointed out we could use
>>> some sort of hidden ref (refs/superprojects/*) for that, which are
>>> allowed to mark
>>> any sort of sha1, which are allowed in the superproject/submodule context
>>> to be fetched.
>>>
>>> So whenever you push to a superproject (a project that has a gitlink),
>>> we would need to check serverside if that submodule is at us and mark the
>>> correct sha1s in the submodule. Then you can disallow fetching most of the
>>> sha1s
>>> but still could have a correctly working submodule update mechanism.
>>
>>
>> And what happens if the submodule isn't at us? Involving the serverside of
>> a superproject in submodule fetching sounds wrong to me. Me thinks that
>> the upstream of the submodule should always control if a sha1 is allowed
>> to be fetched. Or did I understand you wrong?
>
> Yes and no.
> The serverside submodule repository should be responsible for the ultimate
> decision if you are allowed to fetch that sha1. But maybe on pushing the
> superproject, we can store a hint in the submodule, that this sha1 is legit.
> Although I may be missguided in my thinking here as the superproject
> should have no influence on the submodule.

Submodules should never be aware of their superproject. But a superproject
does know its submodules, so I don't think the influence you describe here
is a problem per se. It's just looking like a corner case to me, as in a
lot of scenarios submodules do not live on the same server. And even if
they do, a superproject has no canonical way of finding their submodule's
repos (except for submodules that use relative URLs). So I'd rather like
to see a generic solution first, before we think about adding an optimized
version for certain setups later ;-)

The only real itch I have with the "superproject declaring submodule sha1s
fetchable on the server" approach is that it smells like a security problem.
The access rights of superprojects are often different from those of the
submodules it contains and this feels like a privilege escalation waiting
to happen.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 21:42                 ` Jens Lehmann
@ 2015-11-16 22:56                   ` Stefan Beller
  2015-11-17 19:46                     ` Jens Lehmann
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-16 22:56 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Mon, Nov 16, 2015 at 1:42 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 16.11.2015 um 20:25 schrieb Stefan Beller:
>>
>> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de>
>> wrote:
>>>
>>> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>>>
>>>> Thanks for pointing out that we already have some kind of server
>>>> support.
>>>>
>>>> I wonder if we should add an additional way to make fetching only some
>>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>>> where superprojects point{ed} to", even if you force push a
>>>> superproject,
>>>> you want to want to only allow fetching all sha1s which exist in the
>>>> current
>>>> superprojects branch.)
>>>
>>>
>>>
>>> Me thinks the restrictions for sha1-fetching could come from the branches
>>> these sha1s are found in the upstream submodule: if the client is allowed
>>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>>
>>
>> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
>> set, is not sufficient though.
>>
>> To fetch an arbitrary sha1, you would need to check if that sha1 is part
>> of the history of any advertised branch and then allow fetching
>> serverside,
>> which sounds like some work for the server, which we may want to avoid
>> by having smarter data structures there.
>>
>> Instead of having to search all branches for the requested sha1, we could
>> have
>> some sort of data structure to make it not an O(n) operation (n being
>> all objects
>> in the repo).
>>
>> Maybe I overestimate the work which needs to be done, because the server
>> has
>> bitmaps nowadays.
>>
>> Maybe a lazy reverse-pointer graph can be established on the serverside.
>> So I guess when we add the feature to fetch arbitrary sha1s, reachable
>> from
>> any branch, people using submodules will make use of the feature. (such as
>> with
>> git fetch --recurse --depth 1 or via a new `git fetch --recursive
>> --up-to-submodule-tip-only`)
>>
>> So once the server is asked for a certain sha1, it will do the
>> reachability check,
>> which takes some effort, but then stores the result in the form:
>> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>>
>> So when the next fetch request for $sha1 arrives, the server only needs to
>> check for ${current tip sha} to be part of $branch, which is expected to
>> be
>> a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap
>> could
>> just tell you or at least shorten the walk even more)
>> If the ${branch} has changed, the next evaluation for $sha1 can update
>> the cache,
>> such that the reverse lookup is not expensive on expectation.
>
>
> Makes sense, although I do not know enough about the server side to tell if
> it would need such an optimization or will cope with the load just fine.
>
> But even if we'd enable such a feature without having to set an extra config
> option, a submodule fetch asking for certain sha1s would have to fall back
> to a simple "fetch all" like we do now when the server doesn't support that
> for backwards compatibility. But maybe that's just obvious.

It's not obvious to me.  Say you run the command:

    git clone --recursive --depth=1 ...

Currently the depth argument for the submodules is ignored, because it
doesn't work out conceptually. This is because recursive fetches fetch the
branch tips and not the submodule-specified sha1.

If we want to make it work, we would need to think about, what we want
to achieve here.
depth is usually used to reduce the transmit time/bandwidth required.

So if the server tells us it's not allowing fetching of arbitrary
sha1s by its cryptic message:

    $ git fetch origin 6f963a895a97d720c909fcf4eb0544a272ef7c49:refs/heads/copy
    error: no such remote ref 6f963a895a97d720c909fcf4eb0544a272ef7c49

we have two choices, either error out with

    die(_("Server doesn't support cloning of arbitrary sha1s"))

or we could pretend as if we know how to fix it by cloning regularly
with the whole history
attached and then present a tightened history by shallowing after
cloning. But that would
defeat the whole point of the depth argument in the first place, the
time and bandwidth would
have been wasted. So instead I'd rather have the user make the choice.



>
>> I assume this will mostly be used with submodules, so only a few sha1s
>> need
>> this caching.
>
>
> I won't bet on that, some of the submodules at $DAYJOB are rather busy and
> see almost the same traffic as their superprojects ;-)

But do you update the superproject with each submodules commit?
(We plan to update the superproject in Gerrit with each submodule eventually,
so yeah that point is nuts.)

>
> Submodules should never be aware of their superproject. But a superproject
> does know its submodules, so I don't think the influence you describe here
> is a problem per se. It's just looking like a corner case to me, as in a
> lot of scenarios submodules do not live on the same server. And even if
> they do, a superproject has no canonical way of finding their submodule's
> repos (except for submodules that use relative URLs). So I'd rather like
> to see a generic solution first, before we think about adding an optimized
> version for certain setups later ;-)

ok. :)

>
> The only real itch I have with the "superproject declaring submodule sha1s
> fetchable on the server" approach is that it smells like a security problem.
> The access rights of superprojects are often different from those of the
> submodules it contains and this feels like a privilege escalation waiting
> to happen.

Yeah, I agree. It was one of my first ideas, probably not the best
idea on this topic.
Currently I am amazed by the reverse lazy caching if that check should ever be
a problem on the server side.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 22:56                   ` Stefan Beller
@ 2015-11-17 19:46                     ` Jens Lehmann
  2015-11-17 20:04                       ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Lehmann @ 2015-11-17 19:46 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

Am 16.11.2015 um 23:56 schrieb Stefan Beller:
> On Mon, Nov 16, 2015 at 1:42 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> Am 16.11.2015 um 20:25 schrieb Stefan Beller:
>>>
>>> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de>
>>> wrote:
>>>>
>>>> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>>>>
>>>>> Thanks for pointing out that we already have some kind of server
>>>>> support.
>>>>>
>>>>> I wonder if we should add an additional way to make fetching only some
>>>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>>>> where superprojects point{ed} to", even if you force push a
>>>>> superproject,
>>>>> you want to want to only allow fetching all sha1s which exist in the
>>>>> current
>>>>> superprojects branch.)
>>>>
>>>>
>>>>
>>>> Me thinks the restrictions for sha1-fetching could come from the branches
>>>> these sha1s are found in the upstream submodule: if the client is allowed
>>>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>>>
>>>
>>> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
>>> set, is not sufficient though.
>>>
>>> To fetch an arbitrary sha1, you would need to check if that sha1 is part
>>> of the history of any advertised branch and then allow fetching
>>> serverside,
>>> which sounds like some work for the server, which we may want to avoid
>>> by having smarter data structures there.
>>>
>>> Instead of having to search all branches for the requested sha1, we could
>>> have
>>> some sort of data structure to make it not an O(n) operation (n being
>>> all objects
>>> in the repo).
>>>
>>> Maybe I overestimate the work which needs to be done, because the server
>>> has
>>> bitmaps nowadays.
>>>
>>> Maybe a lazy reverse-pointer graph can be established on the serverside.
>>> So I guess when we add the feature to fetch arbitrary sha1s, reachable
>>> from
>>> any branch, people using submodules will make use of the feature. (such as
>>> with
>>> git fetch --recurse --depth 1 or via a new `git fetch --recursive
>>> --up-to-submodule-tip-only`)
>>>
>>> So once the server is asked for a certain sha1, it will do the
>>> reachability check,
>>> which takes some effort, but then stores the result in the form:
>>> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>>>
>>> So when the next fetch request for $sha1 arrives, the server only needs to
>>> check for ${current tip sha} to be part of $branch, which is expected to
>>> be
>>> a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap
>>> could
>>> just tell you or at least shorten the walk even more)
>>> If the ${branch} has changed, the next evaluation for $sha1 can update
>>> the cache,
>>> such that the reverse lookup is not expensive on expectation.
>>
>>
>> Makes sense, although I do not know enough about the server side to tell if
>> it would need such an optimization or will cope with the load just fine.
>>
>> But even if we'd enable such a feature without having to set an extra config
>> option, a submodule fetch asking for certain sha1s would have to fall back
>> to a simple "fetch all" like we do now when the server doesn't support that
>> for backwards compatibility. But maybe that's just obvious.
>
> It's not obvious to me.  Say you run the command:
>
>      git clone --recursive --depth=1 ...
>
> Currently the depth argument for the submodules is ignored, because it
> doesn't work out conceptually. This is because recursive fetches fetch the
> branch tips and not the submodule-specified sha1.
>
> If we want to make it work, we would need to think about, what we want
> to achieve here.
> depth is usually used to reduce the transmit time/bandwidth required.
>
> So if the server tells us it's not allowing fetching of arbitrary
> sha1s by its cryptic message:
>
>      $ git fetch origin 6f963a895a97d720c909fcf4eb0544a272ef7c49:refs/heads/copy
>      error: no such remote ref 6f963a895a97d720c909fcf4eb0544a272ef7c49
>
> we have two choices, either error out with
>
>      die(_("Server doesn't support cloning of arbitrary sha1s"))
>
> or we could pretend as if we know how to fix it by cloning regularly
> with the whole history
> attached and then present a tightened history by shallowing after
> cloning. But that would
> defeat the whole point of the depth argument in the first place, the
> time and bandwidth would
> have been wasted. So instead I'd rather have the user make the choice.

But for quite some time you'll have older servers out there that
don't support fetching a single sha1 or aren't configured to do so.
Wouldn't it be better to give the user an appropriate warning and
fall back to cloning everything for those submodules while using the
optimized new method for all others and the superproject? Otherwise
you won't be able to limit the depth if only a single submodule
server doesn't support fetching a single sha1.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 19:46                     ` Jens Lehmann
@ 2015-11-17 20:04                       ` Stefan Beller
  2015-11-17 20:39                         ` Jens Lehmann
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-17 20:04 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Tue, Nov 17, 2015 at 11:46 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>
> But for quite some time you'll have older servers out there that
> don't support fetching a single sha1 or aren't configured to do so.

Only when talking about the open source side. If you have all the
submodules/superprojects on your companies mirror, you can control
the git installations there.

> Wouldn't it be better to give the user an appropriate warning and
> fall back to cloning everything for those submodules while using the
> optimized new method for all others and the superproject? Otherwise
> you won't be able to limit the depth if only a single submodule
> server doesn't support fetching a single sha1.
>

I think warnings are fine, but no fallbacks. The warning could look like:

    Server for submodule <foo> doesn't support fetching by sha1.
    Fetch again without depth argument.

and keep going with the other submodules. This would allow the user
to make an informed decision if they want to have the fallback solution
(which requires more band width, disk space)
On the other hand, that's what people do today, so it's not that bad either,
so I guess falling bad would work too.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 19:25               ` Stefan Beller
  2015-11-16 21:42                 ` Jens Lehmann
@ 2015-11-17 20:17                 ` Duy Nguyen
  2015-11-17 21:43                   ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2015-11-17 20:17 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jens Lehmann, Jeff King, Lars Schneider, git, Junio C Hamano

On Mon, Nov 16, 2015 at 8:25 PM, Stefan Beller <sbeller@google.com> wrote:
> Instead of having to search all branches for the requested sha1, we could have
> some sort of data structure to make it not an O(n) operation (n being
> all objects
> in the repo).
>
> Maybe I overestimate the work which needs to be done, because the server has
> bitmaps nowadays.

Quote from [1]

> If we take the kernel history in rev-list and pick two commits that
> are roughly ~10,000 commits apart from one another, JGit can compute
> the rev-list --objects between these two commits in about 120
> milliseconds (git-core should be faster, or at least comparable).

I think we should be fine (note that --objects is a lot heavier than
commit walking). Though.. I just tried it on git.git. 10k commits
(without --objects) take about 200ms with C Git..

[1] http://marc.info/?l=git&m=133374018207891&w=2
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:04                       ` Stefan Beller
@ 2015-11-17 20:39                         ` Jens Lehmann
  2015-11-17 20:49                           ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Lehmann @ 2015-11-17 20:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

Am 17.11.2015 um 21:04 schrieb Stefan Beller:
> On Tue, Nov 17, 2015 at 11:46 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>
>> But for quite some time you'll have older servers out there that
>> don't support fetching a single sha1 or aren't configured to do so.
>
> Only when talking about the open source side. If you have all the
> submodules/superprojects on your companies mirror, you can control
> the git installations there.

Sure. But that doesn't mean we should make life harder for the open
source side, no? We'll have to support both for quite some time.

>> Wouldn't it be better to give the user an appropriate warning and
>> fall back to cloning everything for those submodules while using the
>> optimized new method for all others and the superproject? Otherwise
>> you won't be able to limit the depth if only a single submodule
>> server doesn't support fetching a single sha1.
>>
>
> I think warnings are fine, but no fallbacks. The warning could look like:
>
>      Server for submodule <foo> doesn't support fetching by sha1.
>      Fetch again without depth argument.
>
> and keep going with the other submodules. This would allow the user
> to make an informed decision if they want to have the fallback solution
> (which requires more band width, disk space)

No, this is a regression. This worked before but now some submodules
are missing from the clone. And if that happens inside a Jenkins
script I doubt that Jenkins can make an informed decision, that job
will simply fail.

> On the other hand, that's what people do today, so it's not that bad either,
> so I guess falling bad would work too.

Not that bad? I don't see any other sane way. Don't break formerly
working use cases without a very good reason. Fall back to what we
did before (even if it is suboptimal) and only then use the new
optimized (and admittedly better) feature when it is available.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:39                         ` Jens Lehmann
@ 2015-11-17 20:49                           ` Stefan Beller
  2015-11-17 21:00                             ` Jens Lehmann
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-17 20:49 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

On Tue, Nov 17, 2015 at 12:39 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 17.11.2015 um 21:04 schrieb Stefan Beller:
>>
>> On Tue, Nov 17, 2015 at 11:46 AM, Jens Lehmann <Jens.Lehmann@web.de>
>> wrote:
>>>
>>>
>>> But for quite some time you'll have older servers out there that
>>> don't support fetching a single sha1 or aren't configured to do so.
>>
>>
>> Only when talking about the open source side. If you have all the
>> submodules/superprojects on your companies mirror, you can control
>> the git installations there.
>
>
> Sure. But that doesn't mean we should make life harder for the open
> source side, no? We'll have to support both for quite some time.
>
>>> Wouldn't it be better to give the user an appropriate warning and
>>> fall back to cloning everything for those submodules while using the
>>> optimized new method for all others and the superproject? Otherwise
>>> you won't be able to limit the depth if only a single submodule
>>> server doesn't support fetching a single sha1.
>>>
>>
>> I think warnings are fine, but no fallbacks. The warning could look like:
>>
>>      Server for submodule <foo> doesn't support fetching by sha1.
>>      Fetch again without depth argument.
>>
>> and keep going with the other submodules. This would allow the user
>> to make an informed decision if they want to have the fallback solution
>> (which requires more band width, disk space)
>
>
> No, this is a regression. This worked before but now some submodules
> are missing from the clone. And if that happens inside a Jenkins
> script I doubt that Jenkins can make an informed decision, that job
> will simply fail.
>
>> On the other hand, that's what people do today, so it's not that bad
>> either,
>> so I guess falling bad would work too.
>
>
> Not that bad? I don't see any other sane way. Don't break formerly
> working use cases without a very good reason. Fall back to what we
> did before (even if it is suboptimal) and only then use the new
> optimized (and admittedly better) feature when it is available.

I assumed we'd have yet another flag to activate the new behavior,
but if you want to roll out that new feature as a default, I agree on
needing the fallback.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:49                           ` Stefan Beller
@ 2015-11-17 21:00                             ` Jens Lehmann
  0 siblings, 0 replies; 35+ messages in thread
From: Jens Lehmann @ 2015-11-17 21:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jeff King, Lars Schneider, git, Junio C Hamano, Duy Nguyen

Am 17.11.2015 um 21:49 schrieb Stefan Beller:
> I assumed we'd have yet another flag to activate the new behavior,
> but if you want to roll out that new feature as a default, I agree on
> needing the fallback.

Ah, I was under the impression that users are surprised by --depth
not propagating into the submodules, so I considered this a bugfix
which doesn't need extra configuration (and the subject line seems
to support my impression ;-).

And if some users really need the old behavior, they can remove the
--recurse-submodules from the clone and issue a submodule update
afterwards (Unless they speak up now and tell us why having --depth
only apply to the superproject is a feature ... ;-).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-15 12:53   ` Lars Schneider
@ 2015-11-17 21:34     ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2015-11-17 21:34 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, sbeller

On Sun, Nov 15, 2015 at 01:53:31PM +0100, Lars Schneider wrote:

> > Hrm. Do we want to make these workarounds work correctly? Or is the
> > final solution going to be that the first command you gave simply works,
> > and no workarounds are needed.  If the latter, I wonder if we want to be
> > adding tests for the workarounds in the first place.
> > 
> > I'm not clear on the expected endgame.
> 
> I see your point. I'll remove the workaround tests in the next roll.
> That being said, I think the we should do something about the
> workarounds, too, because it certainly confused me as Git user. Would
> you merge a patch that prints a warning message like "--depth
> parameter not supported for submodules update command" or something if
> a user tries this command?

I would never promise to merge something without seeing it first. :)

It does sound sensible to me to warn the user when some of their request
is being ignored. I'm not familiar enough with submodules to know
whether there are hidden gotchas here (so I would rely on folks like
Stefan and Jens for review).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:17                 ` Duy Nguyen
@ 2015-11-17 21:43                   ` Jeff King
  2015-11-18 12:32                     ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2015-11-17 21:43 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Stefan Beller, Jens Lehmann, Lars Schneider, git, Junio C Hamano

On Tue, Nov 17, 2015 at 09:17:43PM +0100, Duy Nguyen wrote:

> On Mon, Nov 16, 2015 at 8:25 PM, Stefan Beller <sbeller@google.com> wrote:
> > Instead of having to search all branches for the requested sha1, we could have
> > some sort of data structure to make it not an O(n) operation (n being
> > all objects
> > in the repo).
> >
> > Maybe I overestimate the work which needs to be done, because the server has
> > bitmaps nowadays.
> 
> Quote from [1]
> 
> > If we take the kernel history in rev-list and pick two commits that
> > are roughly ~10,000 commits apart from one another, JGit can compute
> > the rev-list --objects between these two commits in about 120
> > milliseconds (git-core should be faster, or at least comparable).
> 
> I think we should be fine (note that --objects is a lot heavier than
> commit walking). Though.. I just tried it on git.git. 10k commits
> (without --objects) take about 200ms with C Git..

A lot of this depends on the endpoints. We can't store bitmaps for every
commit, so we often have to fall back to traversing from the commit,
collecting reachable objects until we hit a commit that does have
bitmaps.

I think the for the purposes of upload-pack and reachability, it might
be fine to just walk commits, which as you note is much cheaper. The C
git bitmap code does not currently have a way to say "I only care about
commits, do not bother filling in the trees and blobs when you have to
do a fallback traversal". But it would not be hard to add, I think.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 21:43                   ` Jeff King
@ 2015-11-18 12:32                     ` Duy Nguyen
  2015-11-18 21:11                       ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2015-11-18 12:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Stefan Beller, Jens Lehmann, Lars Schneider, git, Junio C Hamano

On Tue, Nov 17, 2015 at 10:43 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Nov 17, 2015 at 09:17:43PM +0100, Duy Nguyen wrote:
>
>> On Mon, Nov 16, 2015 at 8:25 PM, Stefan Beller <sbeller@google.com> wrote:
>> > Instead of having to search all branches for the requested sha1, we could have
>> > some sort of data structure to make it not an O(n) operation (n being
>> > all objects
>> > in the repo).
>> >
>> > Maybe I overestimate the work which needs to be done, because the server has
>> > bitmaps nowadays.
>>
>> Quote from [1]
>>
>> > If we take the kernel history in rev-list and pick two commits that
>> > are roughly ~10,000 commits apart from one another, JGit can compute
>> > the rev-list --objects between these two commits in about 120
>> > milliseconds (git-core should be faster, or at least comparable).
>>
>> I think we should be fine (note that --objects is a lot heavier than
>> commit walking). Though.. I just tried it on git.git. 10k commits
>> (without --objects) take about 200ms with C Git..
>
> A lot of this depends on the endpoints. We can't store bitmaps for every
> commit, so we often have to fall back to traversing from the commit,
> collecting reachable objects until we hit a commit that does have
> bitmaps.
>
> I think the for the purposes of upload-pack and reachability, it might
> be fine to just walk commits, which as you note is much cheaper. The C
> git bitmap code does not currently have a way to say "I only care about
> commits, do not bother filling in the trees and blobs when you have to
> do a fallback traversal". But it would not be hard to add, I think.

Yeah I think that was the 10k commits in Shawn's mail: the number of
commits we may have to walk until we hit a reachability bitmap. It
looks like C Git will create a bitmap every 5k commits, not 10k,
though, if I read the code correctly. The point is reachability test
with the presence of pack bitmaps is not O(n) anymore. Which is
probably good enough for now.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-18 12:32                     ` Duy Nguyen
@ 2015-11-18 21:11                       ` Jeff King
  2015-11-18 21:36                         ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2015-11-18 21:11 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Stefan Beller, Jens Lehmann, Lars Schneider, git, Junio C Hamano

On Wed, Nov 18, 2015 at 01:32:36PM +0100, Duy Nguyen wrote:

> Yeah I think that was the 10k commits in Shawn's mail: the number of
> commits we may have to walk until we hit a reachability bitmap. It
> looks like C Git will create a bitmap every 5k commits, not 10k,
> though, if I read the code correctly. The point is reachability test
> with the presence of pack bitmaps is not O(n) anymore. Which is
> probably good enough for now.

There are some pathological cases, though. I hit one recently that still
took 40s to do "rev-list --objects --all --use-bitmap-index" (it's 80s
without bitmaps).  The problem is that it has over 20,000 refs. We try
to put a bitmap at the tip of each ref, but it's tough when there are
that many.

I suspect there's room for improvement in the commit selection in such
cases. That code hasn't really been tweaked since it was originally
written, and repositories like that are extreme outliers.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-18 21:11                       ` Jeff King
@ 2015-11-18 21:36                         ` Stefan Beller
       [not found]                           ` <CAOE36qj2m4e3hw73-QoLbbpGv4RiyhBt_ou7eN4i4q8pF15rdA@mail.gmail.com>
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-11-18 21:36 UTC (permalink / raw)
  To: Jeff King, Terry Parker
  Cc: Duy Nguyen, Jens Lehmann, Lars Schneider, git, Junio C Hamano

On Wed, Nov 18, 2015 at 1:11 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 18, 2015 at 01:32:36PM +0100, Duy Nguyen wrote:
>
>> Yeah I think that was the 10k commits in Shawn's mail: the number of
>> commits we may have to walk until we hit a reachability bitmap. It
>> looks like C Git will create a bitmap every 5k commits, not 10k,
>> though, if I read the code correctly. The point is reachability test
>> with the presence of pack bitmaps is not O(n) anymore. Which is
>> probably good enough for now.
>
> There are some pathological cases, though. I hit one recently that still
> took 40s to do "rev-list --objects --all --use-bitmap-index" (it's 80s
> without bitmaps).  The problem is that it has over 20,000 refs. We try
> to put a bitmap at the tip of each ref, but it's tough when there are
> that many.

+Terry, who did optimize the JGit implementation for bitmaps,
as we also had a "lots of refs" hoarder repo, which underperformed
before.

>
> I suspect there's room for improvement in the commit selection in such
> cases. That code hasn't really been tweaked since it was originally
> written, and repositories like that are extreme outliers.
>
> -Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
       [not found]                           ` <CAOE36qj2m4e3hw73-QoLbbpGv4RiyhBt_ou7eN4i4q8pF15rdA@mail.gmail.com>
@ 2015-11-19 13:58                             ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2015-11-19 13:58 UTC (permalink / raw)
  To: Terry Parker
  Cc: Stefan Beller, Duy Nguyen, Jens Lehmann, Lars Schneider, git,
	Junio C Hamano

On Wed, Nov 18, 2015 at 03:40:02PM -0800, Terry Parker wrote:

> > +Terry, who did optimize the JGit implementation for bitmaps,
> > as we also had a "lots of refs" hoarder repo, which underperformed
> > before.
> 
> The performance issue with the "hoarder" repo was that the bitmap
> commit selection algorithm in JGit was selecting too many commits and
> the hoarder repo caused OOMs.
> 
> JGit's selection algorithm walks each branch and makes sure commits
> are selected every 100-200 commits for "recent" history and every
> 5000-5100 commits for more distant history. IIUC cgit sorts all commits
> by date and does a similar selection. Since it doesn't attempt to get
> even coverage on each branch, the selected commits may have
> large gaps for a particular branch.

Thanks for the data point. I agree that cgit seems to have the opposite
problem: selecting fewer commits, which gives worse performance at read
time, but no OOM at write time. :)

I suspect there's some low-hanging fruit in tweaking cgit's
implementation, where we might be able to do a better job of hitting
near the tips of tags without having to drastically increase the number
of bitmapped commits. But I haven't looked closely.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:16     ` Stefan Beller
  2015-11-13 23:38       ` Jeff King
@ 2015-11-30 18:11       ` Junio C Hamano
  2015-12-01  0:47         ` Stefan Beller
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2015-11-30 18:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jeff King, Lars Schneider, git, Duy Nguyen

Stefan Beller <sbeller@google.com> writes:

> +cc Junio, Duy
>
> So cloning from an arbitrary SHA1 is not a new thing I just came up with,
> but has been discussed before[1].
>
> Junio wrote on Oct 09, 2014:
>> This is so non-standard a thing to do that I doubt it is worth
>> supporting with "git clone".  "git clone --branch", which is about
> "> I want to follow that particular branch", would not mesh well with
>> "I want to see the history that leads to this exact commit", either.
>> You would not know which branch(es) is that exact commit is on in
>> the first place.
>
> I disagree with this. This is the *exact* thing you actually want to do when
> dealing with submodules.

Yup, I know, but I do not think the above disagrees with you (read
again ;-).  It merely says "--branch" option to "clone" is not a
good place to add a new "clone at this single commit" mode of
operation.

In order to propagate "--single-branch" thru "--recurse-submodules",
I suspect that you would need to teach "clone" a new option that is
different from "--branch" that allows you to clone the history
starting from the commit recorded in the tree of the superproject in
the submodule.  That would be orthogonal to "--depth $n", of course,
in other words, a top-level "--single-branch --recurse-submodules"
clone should trigger the "history reachable from a specified commit"
mode of clone in submodules, and if the top-level one specified the
"--depth" option, the lower-level ones can limit the depth
accordingly.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-30 18:11       ` Junio C Hamano
@ 2015-12-01  0:47         ` Stefan Beller
  2015-12-01 12:15           ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2015-12-01  0:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Lars Schneider, git, Duy Nguyen

On Mon, Nov 30, 2015 at 10:11 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +cc Junio, Duy
>>
>> So cloning from an arbitrary SHA1 is not a new thing I just came up with,
>> but has been discussed before[1].
>>
>> Junio wrote on Oct 09, 2014:
>>> This is so non-standard a thing to do that I doubt it is worth
>>> supporting with "git clone".  "git clone --branch", which is about
>> "> I want to follow that particular branch", would not mesh well with
>>> "I want to see the history that leads to this exact commit", either.
>>> You would not know which branch(es) is that exact commit is on in
>>> the first place.
>>
>> I disagree with this. This is the *exact* thing you actually want to do when
>> dealing with submodules.
>
> Yup, I know, but I do not think the above disagrees with you (read
> again ;-).  It merely says "--branch" option to "clone" is not a
> good place to add a new "clone at this single commit" mode of
> operation.

Ok. So maybe a bit of bike shedding time:

How does

    git clone --detached-head <sha1>

sound? I would imagine that this would either present you with a fresh clone
with a detached head at the specified sha1, or if the server doesn't support
getting a specific sha1, it would error out.

Having a command like that we could then use it inside of the submodule
code as  git clone --detached-head is surely compatible with --depth.
In case the --detached-head clone fails, we can still use the old behavior,
warning about --depth not being supported and ignoring --depth in
further commands and just performing a standard clone.

>
> In order to propagate "--single-branch" thru "--recurse-submodules",
> I suspect that you would need to teach "clone" a new option that is
> different from "--branch" that allows you to clone the history
> starting from the commit recorded in the tree of the superproject in
> the submodule.  That would be orthogonal to "--depth $n", of course,
> in other words, a top-level "--single-branch --recurse-submodules"
> clone should trigger the "history reachable from a specified commit"
> mode of clone in submodules, and if the top-level one specified the
> "--depth" option, the lower-level ones can limit the depth
> accordingly.
>
>
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-01  0:47         ` Stefan Beller
@ 2015-12-01 12:15           ` Duy Nguyen
  2015-12-01 21:47             ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2015-12-01 12:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Jeff King, Lars Schneider, git

On Tue, Dec 1, 2015 at 1:47 AM, Stefan Beller <sbeller@google.com> wrote:
> On Mon, Nov 30, 2015 at 10:11 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> +cc Junio, Duy
>>>
>>> So cloning from an arbitrary SHA1 is not a new thing I just came up with,
>>> but has been discussed before[1].
>>>
>>> Junio wrote on Oct 09, 2014:
>>>> This is so non-standard a thing to do that I doubt it is worth
>>>> supporting with "git clone".  "git clone --branch", which is about
>>> "> I want to follow that particular branch", would not mesh well with
>>>> "I want to see the history that leads to this exact commit", either.
>>>> You would not know which branch(es) is that exact commit is on in
>>>> the first place.
>>>
>>> I disagree with this. This is the *exact* thing you actually want to do when
>>> dealing with submodules.
>>
>> Yup, I know, but I do not think the above disagrees with you (read
>> again ;-).  It merely says "--branch" option to "clone" is not a
>> good place to add a new "clone at this single commit" mode of
>> operation.
>
> Ok. So maybe a bit of bike shedding time:
>
> How does
>
>     git clone --detached-head <sha1>

maybe

git clone --commit-id <sha1> repo (*)

instead. Detached head is implied, and this way you don't have to
disambiguate sha-1 vs refname. And --commit-id can also be added in
git-fetch. Actually the git-fetch case is even more interesting, what
do we do with refspec..

(*) as usual, we accept committish sha-1, not just comit sha-1, so
--commit-id may be confusing..? Or maybe just go with --tag where we
accept either tag names, tag sha-1 or commit-sha1

> sound? I would imagine that this would either present you with a fresh clone
> with a detached head at the specified sha1, or if the server doesn't support
> getting a specific sha1, it would error out.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-01 12:15           ` Duy Nguyen
@ 2015-12-01 21:47             ` Junio C Hamano
  2015-12-01 22:22               ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2015-12-01 21:47 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Jeff King, Lars Schneider, git

Duy Nguyen <pclouds@gmail.com> writes:

> maybe
>
> git clone --commit-id <sha1> repo (*)
>
> instead. Detached head is implied, and this way you don't have to
> disambiguate sha-1 vs refname. And --commit-id can also be added in
> git-fetch. Actually the git-fetch case is even more interesting, what
> do we do with refspec..
>
> (*) as usual, we accept committish sha-1, not just comit sha-1, so
> --commit-id may be confusing..? Or maybe just go with --tag where we
> accept either tag names, tag sha-1 or commit-sha1

I agree with you that it is sensible to think this topic around
"fetch" not "clone".

I however do not think "--commit-id" is a good name for that option.
You are naming the option after what it is that is given, not after
what the commit specified with that commit-id is used for.  It is
just as nonsense as renaming the "--not-merged" option used in
"branch --not-merged $commit" to "--commit-id" because the option
takes the object name for a committish.

Besides, "git fetch" can grab any object, not limited to committish,
which makes "--commit-id" a doubly unsuitable name for the option.

I do not think you would need a new option for this, by the way.
Just add a new syntax for the LFS of a refspec that cannot possibly
be confused with existing choices of what can come there (i.e. an
empty string to denote deletion, or a partial refname), e.g. come up
with an appropriate string in $sign and allow the following:

    $ git fetch ${sign}c78f7b5ed9dc
    $ git fetch ${sign}c78f7b5ed9dc:refs/remotes/origin/frotz

to do the obvious thing, perhaps?  We could even allow some form of
extended SHA-1 expressions with some restrictions (e.g. limit its
use in a protected friendly environment to avoid excessive resource
use at the server side):

    $ git fetch ${sign}c78f7b5ed9dc~12
    $ git fetch ${sign}HEAD@{4}:refs/remotes/origin/frotz

The ${sign} signals two things.  (1) That the endpoint of the
history (or the name of the object being fetched, be it a blob, a
tree and all its contents, a commit and everything reachable from
it, etc.) is specified and (2) that the specification will be
interpreted at the remote side.

Hmm?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-01 21:47             ` Junio C Hamano
@ 2015-12-01 22:22               ` Junio C Hamano
  2015-12-03 20:03                 ` Stefan Beller
  2015-12-04  7:52                 ` Duy Nguyen
  0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2015-12-01 22:22 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Jeff King, Lars Schneider, git

Junio C Hamano <gitster@pobox.com> writes:

> I do not think you would need a new option for this, by the way.
> Just add a new syntax for the LFS of a refspec that cannot possibly
> be confused with existing choices of what can come there (i.e. an
> empty string to denote deletion, or a partial refname), e.g. come up
> with an appropriate string in $sign and allow the following:
>
>     $ git fetch ${sign}c78f7b5ed9dc
>     $ git fetch ${sign}c78f7b5ed9dc:refs/remotes/origin/frotz
>
> to do the obvious thing, perhaps?  We could even allow some form of
> extended SHA-1 expressions with some restrictions ...

Note that the above example already uses a form of extended SHA-1
expression, and I personally do not think we should support it in
the very initial version.

This is because the actual object name, if resolved on the remote
side, will not be known by "fetch".  To support the "resolve on the
remote end", we would need protocol extension to have the remote end
tell the "fetch", i.e. "you asked to fetch HEAD@{4}, the exact
object name for that is 030000f4c81729d2cb862a317e41a60a7111b98d";
otherwise we cannot add a line to FETCH_HEAD and cannot update the
RHS of the refspec.

Instead, we should limit us to 40-hex object name and nothing else
in the initial incarnation.

i.e.

     $ git fetch ${sign}c78f7b5ed9dc1c6edc8db06ac65860151d54fd07
     $ git fetch ${sign}c78f7b5ed9dc1c6edc8db06ac65860151d54fd07:refs/remotes/origin/frotz

If the remote end (which, as Peff pointed out earlier, already knows
how to respond to a fetch request for an exact object when
configured to do so) allows such a fetch to go through, "fetch" can
(and will) update the ref named by the RHS of storing refspec with
the current code, so there is no need to do anything special to
support this.

As to ${sign}, I was tempted to say an empty string might be
sufficient (i.e. "do not use 40-hex as your branch name"), but it
probably is a bad idea.  A single dot "." would be a possibility
(i.e. a ref component cannot begin with a dot), but squating on it
and saying "anything that begins with . must be followed by 40-hex
(and in the future by an extended SHA-1)" would rob extensibility
from us, so perhaps ".@c78f7b5ed9dc1c6edc8db06ac65860151d54fd07" or
something?  That is leading "." denotes "this is an extended refspec"
and the next character denotes what kind of extended refspec it is.
For now we say that "@" denotes "exact object name is used instead
of a(n abbreviated) refname".

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-01 22:22               ` Junio C Hamano
@ 2015-12-03 20:03                 ` Stefan Beller
  2015-12-04 15:32                   ` Junio C Hamano
  2015-12-04 16:15                   ` Junio C Hamano
  2015-12-04  7:52                 ` Duy Nguyen
  1 sibling, 2 replies; 35+ messages in thread
From: Stefan Beller @ 2015-12-03 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Jeff King, Lars Schneider, git

On Tue, Dec 1, 2015 at 2:22 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> I do not think you would need a new option for this, by the way.
>> Just add a new syntax for the LFS of a refspec that cannot possibly
>> be confused with existing choices of what can come there (i.e. an
>> empty string to denote deletion, or a partial refname), e.g. come up
>> with an appropriate string in $sign and allow the following:
>>
>>     $ git fetch ${sign}c78f7b5ed9dc
>>     $ git fetch ${sign}c78f7b5ed9dc:refs/remotes/origin/frotz

That looks good to me.

>
> Instead, we should limit us to 40-hex object name and nothing else
> in the initial incarnation.

ok, will do.

>
> i.e.
>
>      $ git fetch ${sign}c78f7b5ed9dc1c6edc8db06ac65860151d54fd07
>      $ git fetch ${sign}c78f7b5ed9dc1c6edc8db06ac65860151d54fd07:refs/remotes/origin/frotz
>
> If the remote end (which, as Peff pointed out earlier, already knows
> how to respond to a fetch request for an exact object when
> configured to do so) allows such a fetch to go through, "fetch" can
> (and will) update the ref named by the RHS of storing refspec with
> the current code, so there is no need to do anything special to
> support this.
>
> As to ${sign}, I was tempted to say an empty string might be
> sufficient (i.e. "do not use 40-hex as your branch name"), but it
> probably is a bad idea.

I would think if sign is empty string the server will check if the given
40-hex is unique (either a branch named so, while there is no such
object or just that object and not branch/tag) or the remote would
reject due to disambiguation. This possibility can be done later though.

> A single dot "." would be a possibility
> (i.e. a ref component cannot begin with a dot), but squating on it
> and saying "anything that begins with . must be followed by 40-hex
> (and in the future by an extended SHA-1)" would rob extensibility
> from us, so perhaps ".@c78f7b5ed9dc1c6edc8db06ac65860151d54fd07" or
> something?

My gut reaction is to reject that notation, as it is very cryptic.
Looking at the @ sign, it reminds me of the reflog notion such as HEAD@{-1}.
So maybe it would be more appealing to specify
HEAD@{c78f7b5ed9dc1c6edc8db06ac65860151d54fd07}
to mean a specific commit. By saying HEAD we indicate it is not meant as
a branch (both on the remote as well as locally).
By having the @{ sequence this would also be dis-ambiguous from any
branch.

> That is leading "." denotes "this is an extended refspec"
> and the next character denotes what kind of extended refspec it is.
> For now we say that "@" denotes "exact object name is used instead
> of a(n abbreviated) refname".

So using @ as you propose I could also specify .@refs/heads/master as
an un-abbreviated ref?

Did you have any reason to pick . specifically or are we welcome to bikeshed
why a colon might be better? (or ":", "?", "[", "\", "^", "~", SP, or TAB)

We could use [id]c78f7b5ed9dc1c6edc8db06ac65860151d54fd07
or [const]c78f7b5ed9dc1c6edc8db06ac65860151d54fd07 ?

Looking at the big picture here, this being a preparation for improving
submodule cloning, we also want to allow tags here?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-01 22:22               ` Junio C Hamano
  2015-12-03 20:03                 ` Stefan Beller
@ 2015-12-04  7:52                 ` Duy Nguyen
  1 sibling, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2015-12-04  7:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, Lars Schneider, git

On Tue, Dec 1, 2015 at 11:22 PM, Junio C Hamano <gitster@pobox.com> wrote:
> As to ${sign}, I was tempted to say an empty string might be
> sufficient (i.e. "do not use 40-hex as your branch name"), but it
> probably is a bad idea.  A single dot "." would be a possibility
> (i.e. a ref component cannot begin with a dot), but squating on it
> and saying "anything that begins with . must be followed by 40-hex
> (and in the future by an extended SHA-1)" would rob extensibility
> from us, so perhaps ".@c78f7b5ed9dc1c6edc8db06ac65860151d54fd07" or
> something?  That is leading "." denotes "this is an extended refspec"
> and the next character denotes what kind of extended refspec it is.
> For now we say that "@" denotes "exact object name is used instead
> of a(n abbreviated) refname".

How about @{something}? "something" could be "sha1" (or even
rev-parse). It's not as cryptic, gives plenty of room for extension,
and (to me) immediately connects to extended sha-1 syntax. Yeah we
don't support _extended_ syntax right away, but in future I think we
should. We can decide on a quick shortcut syntax later.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-03 20:03                 ` Stefan Beller
@ 2015-12-04 15:32                   ` Junio C Hamano
  2015-12-04 16:15                   ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2015-12-04 15:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, Jeff King, Lars Schneider, git

Stefan Beller <sbeller@google.com> writes:

>> A single dot "." would be a possibility
>> (i.e. a ref component cannot begin with a dot), but squating on it
>> and saying "anything that begins with . must be followed by 40-hex
>> (and in the future by an extended SHA-1)" would rob extensibility
>> from us, so perhaps ".@c78f7b5ed9dc1c6edc8db06ac65860151d54fd07" or
>> something?
>
> My gut reaction is to reject that notation, as it is very cryptic.
> Looking at the @ sign, it reminds me of the reflog notion such as HEAD@{-1}.
> So maybe it would be more appealing to specify
> HEAD@{c78f7b5ed9dc1c6edc8db06ac65860151d54fd07}
> to mean a specific commit. By saying HEAD we indicate it is not meant as
> a branch (both on the remote as well as locally).
> By having the @{ sequence this would also be dis-ambiguous from any
> branch.

I specifically rejected that when I wrote the message you are
responding to, because I think that would make it harder to later
enhance the mechanism to ask for HEAD@{...}, i.e. extended SHA-1
expression.

But bikeshedding can be left as an exercise to those who have too
much time on hand.

> Looking at the big picture here, this being a preparation for improving
> submodule cloning, we also want to allow tags here?

The reason why we need to restrict to raw 40-hex in the initial
iteration is because we do not want protocol update for an uncooked
idea.  Hence a tag object name in 40-hex is fine, but a tag refname
e.g. v1.0 is not.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-12-03 20:03                 ` Stefan Beller
  2015-12-04 15:32                   ` Junio C Hamano
@ 2015-12-04 16:15                   ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2015-12-04 16:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, Jeff King, Lars Schneider, git

Stefan Beller <sbeller@google.com> writes:

> Did you have any reason to pick . specifically or are we welcome to bikeshed
> why a colon might be better? (or ":", "?", "[", "\", "^", "~", SP, or TAB)
>
> We could use [id]c78f7b5ed9dc1c6edc8db06ac65860151d54fd07
> or [const]c78f7b5ed9dc1c6edc8db06ac65860151d54fd07 ?

As to the bikeshedding, I kind of like the above.

    [object:c78f7b5ed9dc1c6edc8db06ac65860151d54fd07]

would be a more natural way to use the pairwise magic characters
like [], I suspect.

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2015-12-04 16:15 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12  9:37 [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
2015-11-12 23:34 ` Stefan Beller
2015-11-15 12:43   ` Lars Schneider
2015-11-13  5:35 ` Jeff King
2015-11-13 18:41   ` Stefan Beller
2015-11-13 23:16     ` Stefan Beller
2015-11-13 23:38       ` Jeff King
2015-11-13 23:41         ` Jeff King
2015-11-14  0:10           ` Stefan Beller
2015-11-16 18:59             ` Jens Lehmann
2015-11-16 19:25               ` Stefan Beller
2015-11-16 21:42                 ` Jens Lehmann
2015-11-16 22:56                   ` Stefan Beller
2015-11-17 19:46                     ` Jens Lehmann
2015-11-17 20:04                       ` Stefan Beller
2015-11-17 20:39                         ` Jens Lehmann
2015-11-17 20:49                           ` Stefan Beller
2015-11-17 21:00                             ` Jens Lehmann
2015-11-17 20:17                 ` Duy Nguyen
2015-11-17 21:43                   ` Jeff King
2015-11-18 12:32                     ` Duy Nguyen
2015-11-18 21:11                       ` Jeff King
2015-11-18 21:36                         ` Stefan Beller
     [not found]                           ` <CAOE36qj2m4e3hw73-QoLbbpGv4RiyhBt_ou7eN4i4q8pF15rdA@mail.gmail.com>
2015-11-19 13:58                             ` Jeff King
2015-11-30 18:11       ` Junio C Hamano
2015-12-01  0:47         ` Stefan Beller
2015-12-01 12:15           ` Duy Nguyen
2015-12-01 21:47             ` Junio C Hamano
2015-12-01 22:22               ` Junio C Hamano
2015-12-03 20:03                 ` Stefan Beller
2015-12-04 15:32                   ` Junio C Hamano
2015-12-04 16:15                   ` Junio C Hamano
2015-12-04  7:52                 ` Duy Nguyen
2015-11-15 12:53   ` Lars Schneider
2015-11-17 21:34     ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.