[RFC] On the --depth argument when fetching with submodules

From: Stefan Beller @ 2016-02-05 22:48 UTC
  To: Jens Lehmann, Jonathan Nieder, git

Currently when cloning a project including submodules, the --depth argument
is passed on recursively, i.e. when cloning with "--depth 2", both the
superproject and the submodule will have a depth of 2.  It is not
guaranteed that the commits specified by the superproject are included
in these 2 commits of the submodule.

Illustration:
(superproject with depth 2, so A would have more parents, not shown)

superproject/master: A <- B
                    /      \
submodule/master:  C <- D <- E <- F <- G

(Current behavior is to fetch G and F)

The submodule is referenced at C and E from the two superproject commits.
So it is a reasonable expectation to have C and E included when fetching
the superproject with depth of 2.

So to fetch the correct submodule commits, we need to:
* traverse the superproject and list all submodule commits it references
* fetch these submodule commits (C and E) by sha1
* in case of shallow clones, I'd propose to have the submodules be shallowed
  with depth 1, i.e. to fetch C and E and no parents thereof
* we need to think of a way to preserve the commits in the submodule for
  later use (i.e. gc must not delete those commits)
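The first two steps above could be sketched roughly as follows. This is
only an illustration under assumptions, not a proposed implementation: the
function names are made up for this sketch, the remote is assumed to be
called "origin", and fetching by sha1 only works if the server permits it
(e.g. via uploadpack.allowReachableSHA1InWant). It relies on the fact that
"git ls-tree -r" prints submodule entries with mode 160000 and type
"commit".

```python
import subprocess

GITLINK_MODE = "160000"  # mode git uses for submodule entries in a tree


def parse_gitlinks(ls_tree_output):
    """Return {path: sha1} for every submodule entry in `git ls-tree -r` output."""
    links = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<mode> <type> <object>\t<path>".
        meta, path = line.split("\t", 1)
        mode, objtype, sha = meta.split()
        if mode == GITLINK_MODE and objtype == "commit":
            links[path] = sha
    return links


def collect_submodule_commits(ls_tree_outputs):
    """Merge per-commit ls-tree outputs into {path: [sha1, ...]}, first seen first."""
    wanted = {}
    for output in ls_tree_outputs:
        for path, sha in parse_gitlinks(output).items():
            shas = wanted.setdefault(path, [])
            if sha not in shas:
                shas.append(sha)
    return wanted


def fetch_command(sha, remote="origin"):
    """Build the argv for fetching a single commit by sha1 without its parents."""
    return ["git", "fetch", "--depth=1", remote, sha]


def superproject_trees(rev="HEAD"):
    """Yield `git ls-tree -r` output for each commit reachable from rev.

    Must be run inside the superproject; in a shallow clone, rev-list
    only walks the commits that are actually present.
    """
    commits = subprocess.check_output(["git", "rev-list", rev], text=True).split()
    for commit in commits:
        yield subprocess.check_output(["git", "ls-tree", "-r", commit], text=True)
```

For the illustration above, collect_submodule_commits(superproject_trees())
would yield C and E for the submodule path, and each fetch_command() result
would then be run inside the submodule.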

For the latter I propose to use a ref in the submodule (refs/meta/superproject ?)
which will be an evil merge of all relevant commits to be preserved for the
superproject. It is an evil merge commit as we do not need to store any worktree
at all, but we only care about the parent relationship preventing gc from
collecting data required by the superproject. The parents should contain all
branches or, in case of disjoint shallow histories (like in the example above),
all the relevant commits.
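A rough sketch of how such a keep-alive commit could be created with
existing plumbing, offered only as an illustration: the helper names and
the commit message are invented here, and using the empty tree as the
commit's content is one possible reading of "do not need to store any
worktree". It uses "git commit-tree" with one -p per commit to preserve,
then "git update-ref" to point refs/meta/superproject at the result.

```python
import subprocess

# sha1 of git's well-known empty tree; the pin commit carries no content.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
KEEP_REF = "refs/meta/superproject"  # ref name floated in the proposal


def commit_tree_command(parents, message="keep commits referenced by superproject"):
    """Build the argv for a merge commit whose parents are the commits to keep."""
    argv = ["git", "commit-tree", EMPTY_TREE]
    for sha in parents:
        argv += ["-p", sha]
    argv += ["-m", message]
    return argv


def update_ref_command(new_commit, ref=KEEP_REF):
    """Build the argv that points the keep-alive ref at the new merge commit."""
    return ["git", "update-ref", ref, new_commit]


def pin_commits(parents, submodule_dir):
    """Create the merge commit in the submodule and update the ref to it."""
    new_commit = subprocess.check_output(
        commit_tree_command(parents), cwd=submodule_dir, text=True
    ).strip()
    subprocess.check_call(update_ref_command(new_commit), cwd=submodule_dir)
    return new_commit
```

In the example above, pin_commits([C, E], "path/to/submodule") would make
both C and E reachable from the ref, so gc would keep them; the path is a
placeholder.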

Using these ideas we can go a step further in submodules and no longer need
branches, but only detached heads, which are directed by the superproject.

As for the implementation of these ideas, whenever you commit in the
superproject with a modification to a submodule, the superproject would need
to modify the refs/meta/superproject submodule branch to make sure the
submodule is safe against garbage collection.

Thanks,
Stefan
