Git Mailing List Archive on lore.kernel.org
* RFC/Discussion - Submodule UX Improvements
From: Emily Shaffer @ 2021-04-16 23:36 UTC
  To: git; +Cc: avarab, jrnieder, albertcui, gitster, matheus.bernardino

Hi folks,

As hinted by a couple recent patches, I'm planning on some pretty big submodule
work over the next 6 months or so - and Ævar pointed out to me in
https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com that I probably
should share some of those plans ahead of time. :) So attached is a lightly
modified version of the doc that we've been working on internally at Google,
focusing on what we think would be an ideal submodule workflow.

I'm hoping that folks will get a chance to read some or all of it and let us
know what sounds cool (or sounds extremely broken). The best spot to start is
probably the "Overview" section, which describes what the "main path" would look
like for a user working on a project with submodules. Most of the work that
we're planning on doing is under the "What doesn't already work" headings.

Thanks in advance for any time you spend reading/discussing :)

 - Emily

Background
==========

It's worth mentioning that the main goal that's funding this work is to provide
an alternative for users whose projects use repo
(https://source.android.com/setup/develop#repo) today. That means that the main
focus is to try and reach feature parity to repo for an easier transition for
those who want to switch. As a result, some of the direction below is aimed
towards learning from what has worked well with repo (but hopefully more
flexible for users who want to do more, or differently).

There are also a few things mentioned that are specifically targeted to ease use
with Gerrit, which is in wide use here at Google (and therefore also a
consideration we need to make to keep getting paid ;) ).

Overview
========

When the work is completed, users should be able to have a clean, obvious
workflow when using best practices:

To download the code, they should be able to simply run 'git clone
https://example.com/superproject' to download the project and all its
submodules; if partial clone is configured, they should receive only the objects
allowed by the filter in their superproject as well as in each submodule.

To begin working on a feature, from the superproject they can 'git switch -c
feature', and since the new branch is being created, a new branch 'feature' will
be created for each submodule, pointing to the submodule's current 'HEAD'. They
can move to a submodule directory and begin to make changes, and when they
commit these changes normally with 'git commit' from the submodule directory,
running git status in the superproject will reflect that a submodule has
changed. Next, they can switch to a second submodule, making and committing more
changes.

When these changes are ready for review but need to be linked together, they
can switch back to the superproject, where 'git status'
indicates that there are changes in both submodules. They can commit these
changes to the superproject and use 'git push' to send a review; Git will
recurse into affected submodules and push those submodule commits appropriately
as well.

While the user is waiting for feedback on their review, to work on their next
task, they can 'git switch other-feature', which will check out the branches
specified in the superproject commit at the tip of 'other-feature'; now the user
can continue working as before.

When it's time to update their local repo, the user can do so as with a
single-repo project. First they can 'git checkout main && git pull' (or 'git
pull -r'); Git will first check out the branches associated with 'main' in
each submodule, then fetch and merge/rebase in each submodule appropriately.
Finally, they can 'git switch feature && git rebase', at which time Git will
recursively check out the branches associated with 'feature' in each submodule
and rebase each submodule appropriately.

Detailed Design
===============

The Well-Trodden Path: Basic Contribution Workflow
--------------------------------------------------

- git clone

1. git clone initializes the directory indicated by the user
2. git clone fetches the superproject
3. git clone checks out the superproject at server's HEAD (or at another commit
   as specified by the user, e.g. with --branch)
4. git clone warns the user that a recommended hook/config setup exists and
   provides a tip on how to install it
5. For each submodule encountered in step 3, git clone is invoked for the
   submodule, and steps 1-4 are repeated (but in directories indicated by the
   superproject commit, not by the user).

Note that this means options like '--branch' *don't* propagate directly to the
submodules. If superproject branch "foo" points its submodule to branch "main",
then 'git clone --branch foo https://superproject.git' will clone
superproject/submodule to branch 'main' instead. (It *may* be OK to take
'--branch' to mean "the branch specified by the parent *and* the branch named in
--branch if applicable, but no other branches".)

What doesn't already work:

  * --recurse-submodules should turn on submodule.recurse=true
  * superproject gets top-level config inherited by submodules
  * New --recurse-submodules --single-branch semantics
  * Progress bar for clone (see work estimates)
  * Recommended config from project owner


-- Partial clone

1. git clone initializes the directory indicated by the user
2. git clone applies the appropriate configs for the partial clone filter
   requested by the user
  a) These configs go to the config file shared by superproject and submodules.
3. git clone fetches the superproject
4. git clone checks out the superproject at server's HEAD
5. git clone warns the user that a recommended hook/config setup exists and
   provides a tip on how to install it
6. For each submodule encountered in step 4, git clone is invoked for the
   submodule, and steps 1-5 are repeated (but in directories indicated by the
   superproject commit, not by the user). The same filter supplied to the
   superproject applies to the submodules.


What doesn't already work:

  * --filter=blob:none with submodules (it's using global variables)
  * propagating --filter=blob:none to submodules (via submodules.config)
  * Recommended config from project owner


- git fetch

By default, git fetch looks for (1) the remote name(s) supplied at the command
line, (2) the remote which the currently checked out branch is tracking, or (3)
the remote named origin, in that order. For submodules, there is no guarantee
that (1) has anything to do with the state of the submodule referenced by the
superproject commit, so just start from (2).
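
As a rough illustration of the lookup order described above, here is a minimal
Python sketch. It is not how Git implements this; the function name
remotes_to_fetch and its parameters are made up for the example, and it only
models the proposed rule that command-line remotes are not passed down to
submodules.

```python
from typing import List, Optional, Sequence


def remotes_to_fetch(
    cli_remotes: Sequence[str],
    tracking_remote: Optional[str],
    is_submodule: bool,
) -> List[str]:
    """Pick the remote(s) to fetch from, following the order in the text.

    All names here are hypothetical; this is a model, not Git's code.
    """
    # (1) Remotes named on the command line. Under the proposal these are
    #     used for the superproject only and are skipped for submodules.
    if cli_remotes and not is_submodule:
        return list(cli_remotes)
    # (2) The remote that the currently checked-out branch is tracking.
    if tracking_remote is not None:
        return [tracking_remote]
    # (3) Fall back to the remote named "origin".
    return ["origin"]
```

For example, remotes_to_fetch(["upstream"], "fork", is_submodule=True) yields
["fork"], because the command-line remote is ignored inside the submodule.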

This operation can be extremely long-running if the project contains many large
submodules, so progress indicators should be displayed.

Caveat: this will mean that we should be more careful about ensuring that
submodule branches have tracking info set up correctly; that may be an issue for
users who want to branch within their submodule. This may be OK because users
will probably still have 'origin' as their submodule's remote, and if they want
more complicated behavior, they will be able to configure it.

What doesn't already work:

  * Make sure not to propagate (1) to submodules while recursing
  * Fetching new submodules.
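  * Avoiding compounding failures: a 0.95 success probability per submodule
    fetch, raised to the power of 100 submodules (0.95 ** 100), is a low overall
    success probability (that is, we need more retries during submodule fetch)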
  * Progress indicators


- git switch / git checkout

Submodules should continue to perform these operations the same way that they
have before, that is, the way that single-repo Git works. But superprojects
should behave as follows:


-- Create mode (git switch -c / git checkout -b)

1. The current worktree is checked for uncommitted changes to tracked files. The
   current worktree of each submodule is also checked.
2. A new branch is created on the superproject; that branch's ref is pointed to
   the current HEAD.
3. The new branch is checked out on the superproject.
4. A new branch with the same name is created on each submodule.
  a. If there is a naming conflict, we could prompt the user to resolve it, or
     we could just check out the branch by that name and print a warning to the
     user with advice on how to solve it (cd submodule && git switch -c
     different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
     the tree is identical to the tree we would have used as the start point
     (that is, the user switched branches in the submodule, then said "oh crap"
     and went back and switched branches in the superproject).
  b. Tracking info is set appropriately on each new branch to the upstream of
     the branch referenced by the parent of the new superproject commit, OR to
     the default branch's upstream.
5. The new branch is checked out on each of the submodules.

What doesn't already work:

  * Safety check when leaving uncommitted submodule changes
  * Propagating branch names to submodules currently requires a custom hacky
    repolike patch
  * Error handling + graceful non-error handling if the branch already exists
  * "Knowing what branch to push to": copying over which-branch-is-upstream info
    ** Needs some UX help, push.default is a mess
  * Tracking info setups


-- Switching to an existing branch (git switch / git checkout)

1. The current worktree is checked for uncommitted changes to tracked files. The
   current worktree of each submodule is also checked.
2. The requested branch is checked out on the superproject.
3. The submodule commit or branch referenced by the newly-checked-out
   superproject commit is checked out on each submodule.

What doesn't already work:

  * Same as in create mode


- git status

-- From superproject

The superproject is clean if:

  * No tracked files in the superproject have been modified and not committed
  * No tracked files in any submodules have been modified and not committed
  * No commits in any submodules differ from the commits referenced by the tip
    commit of the superproject
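
The three conditions above can be read as a single predicate. Below is a
minimal Python sketch of that reading; the SubmoduleState and SuperprojectState
types and their fields are hypothetical stand-ins for whatever Git would
actually inspect.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubmoduleState:
    head: str                              # commit checked out in the submodule
    has_uncommitted_changes: bool = False  # modified tracked files


@dataclass
class SuperprojectState:
    has_uncommitted_changes: bool
    gitlinks: Dict[str, str]               # path -> commit recorded by the tip commit
    submodules: Dict[str, SubmoduleState] = field(default_factory=dict)


def dirty_reasons(sp: SuperprojectState) -> List[str]:
    """List every reason the superproject would not be reported as clean."""
    reasons = []
    if sp.has_uncommitted_changes:
        reasons.append("superproject: uncommitted changes")
    for path, sub in sorted(sp.submodules.items()):
        if sub.has_uncommitted_changes:
            reasons.append(f"{path}: uncommitted changes")
        if sp.gitlinks.get(path) != sub.head:
            reasons.append(f"{path}: HEAD differs from the recorded commit")
    return reasons


def is_clean(sp: SuperprojectState) -> bool:
    return not dirty_reasons(sp)
```

Returning the list of reasons rather than a bare boolean would also give the
advice messages described below something to hang on to.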

Advice should describe:

  * How to commit or drop changes to files in the superproject
  * How to commit or drop changes to files in the submodules
  * How to commit changes to submodule references 
  * Which commit/branch to switch the submodule back to if the current work
    should be dropped: "Submodule "foo" no longer points to "main", 'git -C foo
    switch main' to discard changes"

What doesn't already work:

  * "git status" being super fast and actually possible to use.
    ** (That is, we've seen it move very slowly on projects with many
       submodules.)
  * Advice updates to use the appropriate submodule-y commands.

-- From submodule

git status's behavior for submodules does not change compared to
single-repository Git, except that a red warning line will also display if the
superproject commit does not point to the HEAD of the submodule. (This could
look similar to the detached-HEAD warning and tracking branch lines in git
status today, e.g. "HEAD is ahead of parent project by 2 commits".)

What doesn't already work:

  * "git status" from a submodule being aware of the superproject.


- git push

-- From superproject

Ideally, a push of the superproject commit results in a push of each submodule
which changed, to the appropriate Gerrit upstream. Commits pushed this way
across submodules should somehow be associated in the Gerrit UI, similar to the
"submitted together" display. This will need some work to make happen.

What doesn't already work:

  * Automatically setting Gerrit topic (with a hook)
  * "push --recurse-submodules" knowing where to push to in submodules to
    initiate a Gerrit review
    ** From `branch` field in .gitmodules?
    ** Gerrit accepting 'git push -o review origin main' pushes?
    ** Review URL with a remote helper that rewrites refs/heads/main to
       refs/for/main?
    ** Need UX help

-- From submodule

No change to client behavior is needed. With Gerrit submodule subscriptions, the
server knows how to generate superproject commits when merging submodule
commits.

- git pull / git rebase

Note: We're still thinking about this one :)

1. Performs a fetch as described above
2. For each superproject commit, replay the submodule commits against the newly
   updated submodule base; then, make a new superproject commit containing those
   changes

What doesn't already work:

  * Rewriting gitlinks in a superproject commit when 'rebase
    --recurse-submodules'-ing
  * Resuming after resolving a conflict during rebase

- git merge

The story for merges is a little bit muddled... and for our goals we don't need
it for quite a while, so we haven't thought much about it :) Any suggestions
folks have about reasonable ways to 'git merge --recurse-submodules' are totally
welcome. For now, though, we'll probably just stick in some error message saying
that merges with submodules aren't currently supported (maybe we will even add
that downstream).

What doesn't already work:

  * Erroring out for "not supported"


Aligning Teams
--------------

There are two pieces of work that we are relying on a lot, and both have
been mentioned upstream by now, so I'll just link out:

1. Recommended Hook Configurations
(https://lore.kernel.org/git/pull.908.v2.git.1616723016659.gitgitgadget@gmail.com)

2. Shared Configuration Across Submodules
(https://lore.kernel.org/git/20210408233936.533342-1-emilyshaffer@google.com)

Edge Cases, Mess Recovery, & Power Users
----------------------------------------

- Unstaged Changes in Submodules At Commit Time

-- Related Changes (Single Branch)

If a user has unstaged changes in multiple submodules and runs 'git commit
--all' from the superproject, they should be presented with an editor which
contains commit message drafts for each modified branch, including the
superproject, separated by scissors or some other delineator. After providing a
commit message, Git should perform each submodule commit, then finally perform
the superproject commit based on the submodules' new commit IDs and apply the
proposed superproject commit message.


What doesn't already work:

  * "git commit --recurse-submodules" that lets me write a commit message with
    scissors dividing things in each repository

-- Unrelated Changes (Separating Into Multiple Branches)

If a user has unstaged changes in multiple submodules and only wants to commit
some of them, and runs 'git add --patch' from the superproject, they should be
walked through 'git add --patch' for each submodule first. However, since this
could be a lengthy process, we need to think carefully about how the UX should
look compared to the existing `git add --patch` UX for single-repo projects.

What doesn't already work:

  * "git add --patch" that recurses through submodule hunks as well


- Recovering from Exploratory Changes with 'git restore' and 'git reset'

When a user has checked out some historical commit in at least one submodule for
the purpose of exploration/investigation, it should be easy to reset the entire
tree back to the state defined by the superproject commit. Running git restore
(or git reset) from the superproject should recurse by running git checkout on
each submodule - and when there are no untracked changes in the submodule, it
can do this without asking for user intervention or approval.

What doesn't already work:

  * Add some tests for good restore/reset behavior and make them pass


- Multiple Commits on a Superproject Branch

Generally, one superproject commit should represent one feature, where that one
feature may consist of multiple submodule commits. It could be thought of
similarly to a merge commit, which brings a stack of related changes into the
history and summarizes them in a single commit, without squashing or losing
history. So a user who has two commits in one superproject branch is working on
two features, one of which depends on the other. Reordering those commits should
involve replaying the commits in each submodule associated with each
superproject commit:


  superproject  submodule                    superproject  submodule

       A ---------> a1                            B ----------> b1
       |            |                             |             |
       |            a2                            |             b2
       |            |                             |             |
       |            a3                            A ----------> a1
       |            |          rebase             |             |
       B ---------> b1         =====>             |             a2
       |            |                             |             |
       |            b2                            |             a3
       |            |                             |             |
       o            o                             o             o
       |            |                             |             |
       o            o                             o             o
       |            |                             |             |
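
The reordering in the diagram can be modeled as moving whole groups of
submodule commits along with the superproject commit that owns them. The
Python sketch below is only a toy model under that assumption: the function
name reorder and the data layout are made up, and a real rebase would create
new commit IDs rather than reuse the labels.

```python
from typing import Dict, List


def reorder(
    superproject: List[str],
    owned: Dict[str, List[str]],
    new_order: List[str],
) -> List[str]:
    """Return the submodule history (newest first) after reordering.

    superproject: superproject commits, newest first.
    owned: superproject commit -> the submodule commits it brings in,
           newest first.
    new_order: the desired superproject order, newest first.
    """
    if sorted(new_order) != sorted(superproject):
        raise ValueError("new_order must be a permutation of the commits")
    replayed: List[str] = []
    for commit in new_order:
        # Each superproject commit carries its submodule commits as a unit.
        replayed.extend(owned[commit])
    return replayed


owned = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
print(reorder(["A", "B"], owned, ["B", "A"]))
```

For the diagram's input this gives b1, b2 on top of a1, a2, a3, matching the
right-hand side.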


- Branching in a Submodule

In addition to the 'git status' warning, users should also receive a warning
like detached-HEAD when switching branches in the submodule without a
superproject commit - "the branch you are leaving behind is not tracked by any
superproject commit". Users who are just working in and pushing from a single
submodule may find this warning annoying, so it should be clear how to disable
that warning per-submodule.


- Worktrees

When a user runs 'git worktree add' from the superproject, each submodule in the
new worktree should also be created as a worktree of the corresponding submodule
in the original project.

What doesn't already work:

  * worktrees and submodules getting along - submodules are now freshly cloned
    when creating a superproject worktree

- git clone --reference [--dissociate]

When cloning with an alternate directory, submodules should also try to use
object stores associated with the referenced project instead of cloning from
their remotes right away. It is unclear how much of this works today.


What doesn't already work:

  * Writing some tests and making them pass


* Re: RFC/Discussion - Submodule UX Improvements
From: Christian Couder @ 2021-04-18  5:22 UTC
  To: Emily Shaffer
  Cc: git, Ævar Arnfjörð Bjarmason, Jonathan Nieder,
	albertcui, Junio C Hamano, Matheus Tavares Bernardino,
	Shourya Shukla

Hi Emily,

On Sat, Apr 17, 2021 at 1:39 AM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Hi folks,
>
> As hinted by a couple recent patches, I'm planning on some pretty big submodule
> work over the next 6 months or so - and Ævar pointed out to me in
> https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com that I probably
> should share some of those plans ahead of time. :) So attached is a lightly
> modified version of the doc that we've been working on internally at Google,
> focusing on what we think would be an ideal submodule workflow.

Thanks for sharing this doc! My main concern with this is that we are
likely to have a GSoC student working soon on finishing the port of `git
submodule` to C code. And I wonder how that would interact with your
work.


* Re: RFC/Discussion - Submodule UX Improvements
From: Philippe Blain @ 2021-04-19  3:20 UTC
  To: Emily Shaffer, git
  Cc: avarab, jrnieder, albertcui, gitster, matheus.bernardino

Hi Emily,

On 2021-04-16 at 19:36, Emily Shaffer wrote:
> Hi folks,
> 
> As hinted by a couple recent patches, I'm planning on some pretty big submodule
> work over the next 6 months or so - and Ævar pointed out to me in
> https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com that I probably
> should share some of those plans ahead of time. :) So attached is a lightly
> modified version of the doc that we've been working on internally at Google,
> focusing on what we think would be an ideal submodule workflow.
> 
> I'm hoping that folks will get a chance to read some or all of it and let us
> know what sounds cool (or sounds extremely broken). The best spot to start is
> probably the "Overview" section, which describes what the "main path" would look
> like for a user working on a project with submodules. Most of the work that
> we're planning on doing is under the "What doesn't already work" headings.
> 
> Thanks in advance for any time you spend reading/discussing :)

Thanks a lot for sharing this roadmap. There are a lot of good ideas there that would really
improve the situation for projects using submodules. I've added some thoughts on specific
items below.

> 
>   - Emily
> 
> Background
> ==========
> 
> It's worth mentioning that the main goal that's funding this work is to provide
> an alternative for users whose projects use repo
> (https://source.android.com/setup/develop#repo) today. That means that the main
> focus is to try and reach feature parity to repo for an easier transition for
> those who want to switch. As a result, some of the direction below is aimed
> towards learning from what has worked well with repo (but hopefully more
> flexible for users who want to do more, or differently).
> 
> There are also a few things mentioned that are specifically targeted to ease use
> with Gerrit, which is in wide use here at Google (and therefore also a
> consideration we need to make to keep getting paid ;) ).
> 
> Overview
> =======
> 
> When the work is completed, users should be able to have a clean, obvious
> workflow when using best practices:
> 
> To download the code, they should be able to run simply git clone
> https://example.com/superproject to download the project and all its submodules;

Playing the devil's advocate here, but some projects do not want / need all of
their submodules in a "regular" checkout, so I guess that would have to be somehow
configurable. I've always felt that since each project is different in that regard,
it would be better if each project could declare if their submodules are non-optional
and need to be also cloned when the superproject is cloned. Maybe an additional field
in '.gitmodules', like a boolean 'submodule.<name>.optional', could be added,
so that submodules that are optional are not cloned, but others are. If that setting
is opt-in (meaning that it defaults to 'true', i.e., submodules are considered optional by default),
then it would be easier to argue for 'git clone' to mean 'git clone --recurse-submodules':
'git clone' would clone the superproject and any non-optional submodule.
Then eventually, when the usage of 'submodule.<name>.optional' becomes more widespread,
we can switch the default and then projects would need to explicitly declare their submodules
optional if they don't want them cloned by a simple 'git clone'.
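
To make the two phases concrete, here is a minimal Python sketch of the selection
rule. Note that 'submodule.<name>.optional' is only a suggestion in this message and
does not exist in Git; the function name submodules_to_clone and the dictionary layout
are likewise made up for illustration.

```python
from typing import Dict, List, Optional


def submodules_to_clone(
    declared: Dict[str, Optional[bool]],
    default_optional: bool = True,
) -> List[str]:
    """Return the submodules a plain 'git clone' would also clone.

    declared maps a submodule name to its hypothetical 'optional' setting,
    or None when .gitmodules does not set it.
    """
    selected = []
    for name, optional in declared.items():
        if optional is None:
            # Unset entries follow the project-wide default.
            optional = default_optional
        if not optional:
            selected.append(name)
    return selected


declared = {"libfoo": False, "docs": True, "tools": None}
print(submodules_to_clone(declared))                          # first phase
print(submodules_to_clone(declared, default_optional=False))  # after the switch
```

In the first phase only 'libfoo' is cloned; once the default flips, the unset 'tools'
submodule is cloned too, and 'docs' stays out because it is explicitly optional.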


> if partial clone is configured, they should receive only the objects allowed by
> the filter in their superproject as well as in each submodule.
> 
> To begin working on a feature, from the superproject they can 'git switch -c
> feature', and since the new branch is being created, a new branch 'feature' will
> be created for each submodule, pointing to the submodule's current 'HEAD'. They
> can move to a submodule directory and begin to make changes, and when they
> commit these changes normally with 'git commit' from the submodule directory,
> running git status in the superproject will reflect that a submodule has
> changed. Next, they can switch to a second submodule, making and committing more
> changes.

Yes. Apart from recursive 'git checkout -b / git switch -c', the workflow you describe
already works well, from my experience.

> 
> When they are ready to send these changes which are ready for review but need to
> be linked together, they can switch back to the superproject, where 'git status'
> indicates that there are changes in both submodules. They can commit these
> changes to the superproject and use 'git push' to send a review; Git will
> recurse into affected submodules and push those submodule commits appropriately
> as well.
> 
> While the user is waiting for feedback on their review, to work on their next
> task, they can 'git switch other-feature', which will checkout the branches
> specified in the superproject commit at the tip of 'other-feature'; now the user
> can continue working as before.

Here, I'm not sure what you mean by "the branches (plural) specified in the superproject
commit at the tip of other-feature". Today, with 'submodule.recurse = true', 'git checkout some-feature'
already checks out each submodule in detached HEAD at the commit recorded in the superproject commit
at the tip of some-feature. It's unclear if you are proposing to instead record submodule branch
names in the superproject commit. Is that what's going on here? (Or is it just a typo?)

> 
> When it's time to update their local repo, the user can do so as with a
> single-repo project. First they can 'git checkout main && git pull' (or 'git
> pull -r'); Git will first checkout the branches associated with main in each
> submodule, then fetch and merge/rebase in each submodule appropriately. 

What if some submodule does not use the same branch name for their primary integration branch?
Sometimes as a superproject using another project as a submodule, you do not
control that...

> Finally,
> they can 'git switch feature && git rebase', at which time Git will recursively
> checkout the branches associated with 'feature' in each submodule and rebase
> each submodule appropriately.
> 
> Detailed Design
> ===============
> 
> The Well-Tread Path: Basic Contribution Workflow
> ------------------------------------------------
> 
> - git clone
> 
> 1. git clone initializes the directory indicated by the user
> 2. git clone fetches the superproject
> 3. git clone checks out the superproject at server's HEAD (or at another commit
>     as specified by the user, e.g. with --branch)
> 4. git clone warns the user that a recommended hook/config setup exists and
>     provides a tip on how to install it
> 5. For each submodule encountered in step 3, git clone is invoked for the
>     submodule, and steps 1-4 are repeated (but in directories indicated by the
>     superproject commit, not by the user).
> 
> Note that this means options like '--branch' *don't* propagate directly to the
> submodules. If superproject branch "foo" points its submodule to branch "main",

Here again, I'm not sure what you mean, because right now there is no concept of
the superproject having a submodule "pointing to some branch", only to a specific
commit. 'submodule.<name>.branch' is only ever used by the command 'git submodule update --remote'.
Is there an implicit proposal to change that?

> then 'git clone --branch foo https://superproject.git' will clone
> superproject/submodule to branch 'main' instead. (It *may* be OK to take
> '--branch' to mean "the branch specified by the parent *and* the branch named in
> --branch if applicable, but no other branches".)
> 
> What doesn't already work:
> 
>    * --recurse-submodules should turn on submodule.recurse=true

That's actually a very good idea, but I think it should be explicitly mentioned
(in the output of the command, I mean).

>    * superproject gets top-level config inherited by submodules
>    * New --recurse-submodules --single-branch semantics
>    * Progress bar for clone (see work estimates)
>    * Recommended config from project owner
> 
> 
> -- Partial clone
> 
> 1. git clone initializes the directory indicated by the user
> 2. git clone applies the appropriate configs for the partial clone filter
>     requested by the user
>    a) These configs go to the config file shared by superproject and submodules.
> 3. git clone fetches the superproject
> 4. git clone checks out the superproject at server's HEAD
> 5. git clone warns the user that a recommended hook/config setup exists and
>     provides a tip on how to install it
> 6. For each submodule encountered in step 4, git clone is invoked for the
>     submodule, and steps 1-4 are repeated (but in directories indicated by the
>     superproject commit, not by the user). The same filter supplied to the
>     superproject applies to the submodules.
> 
> 
> What doesn't already work:
> 
>    * --filter=blob:none with submodules (it's using global variables)
>    * propagating --filter=blob:none to submodules (via submodules.config)
>    * Recommended config from project owner
> 
> 
> - git fetch
> 
> By default, git fetch looks for (1) the remote name(s) supplied at the command
> line, (2) the remote which the currently checked out branch is tracking, or (3)
> the remote named origin, in that order. For submodules, there is no guarantee
> that (1) has anything to do with the state of the submodule referenced by the
> superproject commit, so just start from (2).
> 
> This operation can be extremely long-running if the project contains many large
> submodules, so progress indicators should be displayed.
> 
> Caveat: this will mean that we should be more careful about ensuring that
> submodule branches have tracking info set up correctly; that may be an issue for
> users who want to branch within their submodule. This may be OK because users
> will probably still have 'origin' as their submodule's remote, and if they want
> more complicated behavior, they will be able to configure it.
> 
> What doesn't already work:
> 
>    * Make sure not to propagate (1) to submodules while recursing
>    * Fetching new submodules.
>    * Not having 0.95 success probability ** 100 = low success probability (that
>      is, we need more retries during submodule fetch)
>    * Progress indicators

I would add the following:

- Fix 'git fetch upstream' when 'submodule.recurse' and 'fetch.recurseSubmodules=on-demand'
are both set (the submodule is not fetched even if the superproject changed the submodule
commit).

- Do not rely on 'origin' existing in the submodule (or being pushable to). Right now,
renaming the 'origin' remote to 'upstream' in a submodule, and using 'origin' for one's own
fork of a submodule (as is often done in the superproject), breaks 'git fetch --recurse-submodules'
(or 'git fetch' if 'submodule.recurse' is set), in the sense that the fetch does not recurse
to the submodule, as it should. I do not have a simple reproducer handy but
I've seen it happen and there are a couple of hard-coded "origin" strings in the submodule code [1], [2].

> 
> 
> - git switch / git checkout
> 
> Submodules should continue to perform these operations the same way that they
> have before, that is, the way that single-repo Git works. But superprojects
> should behave as follows:
> 
> 
> -- Create mode (git switch -c / git checkout -b)
> 
> 1. The current worktree is checked for uncommitted changes to tracked files. The
>     current worktree of each submodule is also checked.
> 2. A new branch is created on the superproject; that branch's ref is pointed to
>     the current HEAD.
> 3. The new branch is checked out on the superproject.
> 4. A new branch with the same name is created on each submodule.

That might not be wanted by all, so I think it should be configurable.

>    a. If there is a naming conflict, we could prompt the user to resolve it, or
>       we could just check out the branch by that name and print a warning to the
>       user with advice on how to solve it (cd submodule && git switch -c
>       different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
>       the tree is identical to the tree we would have used as the start point
>       (that is, the user switched branches in the submodule, then said "oh crap"
>       and went back and switched branches in the superproject).
>    b. Tracking info is set appropriately on each new branch to the upstream of
>       the branch referenced by the parent of the new superproject commit, OR to
>       the default branch's upstream.

This last point is a little unclear: which "new superproject commit"? (We are creating
a branch, so there is no new commit yet.) And again, you talk about a (submodule?) branch being
referenced by a superproject commit, which is not a concept that actually exists today.
Also, tracking info is usually only set automatically when using the form
'git checkout -b new-branch upstream/master' or the like. Do you also propose that
'git checkout -b new-branch', by itself, should automatically set tracking info?
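
To illustrate the difference in today's behavior, here is a small sketch using scratch repositories. It assumes 'branch.autoSetupMerge' is at its default value of 'true' (set explicitly below); the repository and branch names are made up for the example.

```shell
tmp=$(mktemp -d)

# A scratch "upstream" repository with one commit on 'master'.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/master
git -C "$tmp/upstream" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Clone it and name the remote 'upstream', as in the text above.
git clone -q "$tmp/upstream" "$tmp/work"
git -C "$tmp/work" remote rename origin upstream

# Start point is a remote-tracking branch: tracking info gets configured.
git -C "$tmp/work" -c branch.autoSetupMerge=true \
    checkout -q -b with-start upstream/master

# No start point (defaults to HEAD): no tracking info is configured.
git -C "$tmp/work" -c branch.autoSetupMerge=true \
    checkout -q -b without-start

with_start=$(git -C "$tmp/work" config branch.with-start.merge)
without_start=$(git -C "$tmp/work" config branch.without-start.merge || true)

echo "with-start tracks:    ${with_start:-<nothing>}"
echo "without-start tracks: ${without_start:-<nothing>}"
```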


> 5. The new branch is checked out on each of the submodules.
> 
> What doesn't already work:
> 
>    * Safety check when leaving uncommitted submodule changes

Yes, that has been reported several times ([3], [4], [5]). I have fixes for this,
not quite ready to send because I'm trying to write extensive tests (maybe too extensive)...

>    * Propagating branch names to submodules currently requires a custom hacky
>      repolike patch
>    * Error handling + graceful non-error handling if the branch already exists
>    * "Knowing what branch to push to": copying over which-branch-is-upstream info
>      ** Needs some UX help, push.default is a mess
>    * Tracking info setups
> 
> -- Switching to an existing branch (git switch / git checkout)
> 
> 1. The current worktree is checked for uncommitted changes to tracked files. The
>     current worktree of each submodule is also checked.
> 2. The requested branch is checked out on the superproject.
> 3. The submodule commit or branch referenced by the newly-checked-out
>     superproject commit is checked out on each submodule.
> 
> What doesn't already work:
> 
>    * Same as in create mode

Here, I would add that 'git checkout --recurse-submodules', along with 'git clone --recurse-submodules',
has trouble correctly checking out an older commit that records a submodule that
has since been removed from the project. The user experience around this use case is currently very, very bad [6].
This is partly due to 'git clone --recurse-submodules' only cloning submodules that are recorded in
the tip commit of the default branch of the superproject, which could certainly be improved.

> 
> 
> - git status
> 
> -- From superproject
> The superproject is clean if:
> 
>    * No tracked files in the superproject have been modified and not committed
>    * No tracked files in any submodules have been modified and not committed
>    * No commits in any submodules differ from the commits referenced by the tip
>      commit of the superproject
> 
> Advices should describe:
> 
>    * How to commit or drop changes to files in the superproject
>    * How to commit or drop changes to files in the submodules
>    * How to commit changes to submodule references
>    * Which commit/branch to switch the submodule back to if the current work
>      should be dropped: "Submodule "foo" no longer points to "main", 'git -C foo
>      switch main' to discard changes"
> 
> What doesn't already work:
> 
>    * "git status" being super fast and actually possible to use.
>      ** (That is, we've seen it move very slowly on projects with many
>         submodules.)
>    * Advice updates to use the appropriate submodule-y commands.

I would add that 'git status' should show the submodule as "rewind" if the
currently checked out submodule commit is *behind* what's recorded in the current superproject
commit. That is shown by 'git diff --submodule=<log | diff>' and 'git submodule summary'
and is quite useful to prevent a subsequent 'git commit -am' in the superproject from regressing
the submodule commit by mistake. It would be nice if 'git status' could also show this information
(code in submodule.c::show_submodule_header).
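
As a sketch of what that looks like today, the following builds a scratch superproject whose submodule is checked out one commit behind the recorded one, then asks 'git diff --submodule=log' about it. All paths and commit messages are made up; 'protocol.file.allow=always' is only there so that recent Git versions accept a local path as the submodule URL.

```shell
tmp=$(mktemp -d)

# Helper: git with a throwaway identity and local-path submodules allowed.
g() {
    git -c user.name=Demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Source repository for the submodule, with two commits.
g init -q "$tmp/sub-src"
g -C "$tmp/sub-src" commit -q --allow-empty -m "sub: first"
g -C "$tmp/sub-src" commit -q --allow-empty -m "sub: second"

# Superproject recording the submodule at its second commit.
g init -q "$tmp/super"
g -C "$tmp/super" submodule --quiet add "$tmp/sub-src" sub
g -C "$tmp/super" commit -q -m "super: add sub at its second commit"

# Move the submodule checkout back by one commit.
g -C "$tmp/super/sub" checkout -q --detach HEAD~1

# The header line of this report is expected to end in "(rewind):".
rewind_report=$(g -C "$tmp/super" diff --submodule=log)
printf '%s\n' "$rewind_report"
```

Running 'git status' in the same superproject is a quick way to compare how much less it says about the direction of the change.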

> 
> -- From submodule
> 
> git status's behavior for submodules does not change compared to
> single-repository Git, except that a red warning line will also display if the
> superproject commit does not point to the HEAD of the submodule. (This could
> look similar to the detached-HEAD warning and tracking branch lines in git
> status today, e.g. "HEAD is ahead of parent project by 2 commits".)

That would be a nice addition :)

> 
> What doesn't already work:
> 
>    * "git status" from a submodule being aware of the superproject.
> 
> 
> - git push
> 
> -- From superproject
> 
> Ideally, a push of the superproject commit results in a push of each submodule
> which changed, to the appropriate Gerrit upstream. Commits pushed this way
> across submodules should somehow be associated in the Gerrit UI, similar to the
> "submitted together" display. This will need some work to make happen.
> 
> What doesn't already work:
> 
>    * Automatically setting Gerrit topic (with a hook)
>    * "push --recurse-submodules" knowing where to push to in submodules to
>      initiate a Gerrit review
>      ** From `branch` field in .gitmodules?
>      ** Gerrit accepting 'git push -o review origin main' pushes?
>      ** Review URL with a remote helper that rewrites refs/heads/main to
>         refs/for/main?
>      ** Need UX help

It would be nice if 'git push' would not force users to use the same
remote names and branch names in the superproject and the submodule.
Previous discussions around this that I had spotted are at [7] and [8].

> 
> From submodule
> No change to client behavior is needed. With Gerrit submodule subscriptions, the
> server knows how to generate superproject commits when merging submodule
> commits.
> 
> - git pull / git rebase
> 
> Note: We're still thinking about this one :)
> 
> 1. Performs a fetch as described above
> 2. For each superproject commit, replay the submodule commits against the newly
>     updated submodule base; then, make a new superproject commit containing those
>     changes
> 
> What doesn't already work:
> 
>    * Rewriting gitlinks in a superproject commit when 'rebase
>      --recurse-submodules'-ing
>    * Resuming after resolving a conflict during rebase

In general, rebase is not well aware of 'submodule.recurse'. Even if you do not
need to rewrite superproject commits, there are a couple of use cases that are broken
right now:

- 'git rebase upstream/master', when upstream updated the submodule, will correctly
(recursively) check out upstream/master before starting the rebase, but upon
'git rebase --abort', the submodule will stay checked out at the commit recorded in
'upstream/master', which is confusing. This only happens when 'submodule.recurse' is true (!).
- 'git rebase -i' which stops at a commit 'A' where the submodule commit is changed,
does not correctly check out the submodule tree. It's checked out at the commit recorded in A~1
(and this also only happens if submodule.recurse is true)
- In some cases, like 'rebase -i'-ing across the addition of new submodules, at the end
of the rebase the submodules are empty, and 'git submodule update' must be run to
re-populate them.

> 
> - git merge
> 
> The story for merges is a little bit muddled... and for our goals we don't need
> it for quite a while, so we haven't thought much about it :) Any suggestions
> folks have about reasonable ways to 'git merge --recurse-submodules' are totally
> welcome. For now, though, we'll probably just stick in some error message saying
> that merges with submodules isn't currently supported (maybe we will even add
> that downstream).

What is "downstream" here ?

Also, there is quite a bit of a future plan in the commit message of
a6d7eb2c7a (pull: optionally rebase submodules (remote submodule changes only), 2017-06-23).
It would be nice to revisit this, I think (regarding both rebase and merge).

> 
> What doesn't already work:
> 
>    * Erroring out for "not supported"
> 
> 
> Aligning Teams
> --------------
> 
> [ ... ] 
> 
> 
> - Worktrees
> 
> When a user runs 'git worktree add' from the superproject, each submodule in the
> new worktree should also be created as a worktree of the corresponding submodule
> in the original project.
> 
> What doesn't already work:
> 
>    * worktrees and submodules getting along - submodules are now freshly cloned
>      when creating a superproject worktree

That would certainly be nice. I've been using worktrees with submodule-containing
projects and everything has been working fine (there were 2 bugs but I fixed them).
Once we are no longer wasting disk space by re-cloning the submodules, we should remove
the 'not recommended' mention in the docs about using worktrees with projects containing
submodules.

> 
> - git clone --reference [--dissociate]
> 
> When cloning with an alternate directory, submodules should also try to use
> object stores associated with the referenced project instead of cloning from
> their remotes right away. It is unclear how much of this works today.
> 
> 
> What doesn't already work:
> 
>    * Writing some tests and making them pass
> 


Thanks again for providing these details,

Philippe.

[1] https://github.com/git/git/blob/b0c09ab8796fb736efa432b8e817334f3e5ee75a/builtin/submodule--helper.c#L43-L51
[2] https://github.com/git/git/blob/b0c09ab8796fb736efa432b8e817334f3e5ee75a/submodule.c#L1525
[3] https://lore.kernel.org/git/CAHsG2VT4YB_nf8PrEmrHwK-iY-AQo0VDcvXGVsf8cEYXws4nig@mail.gmail.com/
[4] https://lore.kernel.org/git/20200525094019.22padbzuk7ukr5uv@overdrive.tratt.net/T/#u
[5] https://lore.kernel.org/git/05afbdeb-6c72-f14c-cdf0-e14894de05a3@gmail.com/T/#t
[6] https://github.com/gitgitgadget/git/issues/752
[7] https://lore.kernel.org/git/20170405174719.1297-6-bmwill@google.com/t/#m224c2475b1bad333e1118f68c80465b638ed87ee
[8] https://public-inbox.org/git/20170627162307.GE161648@aiede.mtv.corp.google.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: RFC/Discussion - Submodule UX Improvements
  2021-04-16 23:36 RFC/Discussion - Submodule UX Improvements Emily Shaffer
  2021-04-18  5:22 ` Christian Couder
  2021-04-19  3:20 ` Philippe Blain
@ 2021-04-19 12:56 ` Randall S. Becker
  2021-04-19 12:56 ` Aaron Schrab
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Randall S. Becker @ 2021-04-19 12:56 UTC (permalink / raw)
  To: 'Emily Shaffer', git
  Cc: avarab, jrnieder, albertcui, gitster, matheus.bernardino

> -----Original Message-----
> From: Emily Shaffer <emilyshaffer@google.com>
On April 16, 2021 7:37 PM, Emily Shaffer wrote:
> As hinted by a couple recent patches, I'm planning on some pretty big
> submodule work over the next 6 months or so - and Ævar pointed out to me in
> https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com that I
> probably should share some of those plans ahead of time. :) So attached is a
> lightly modified version of the doc that we've been working on internally at
> Google, focusing on what we think would be an ideal submodule workflow.
>
> I'm hoping that folks will get a chance to read some or all of it and let us know
> what sounds cool (or sounds extremely broken). The best spot to start is
> probably the "Overview" section, which describes what the "main path" would
> look like for a user working on a project with submodules. Most of the work
> that we're planning on doing is under the "What doesn't already work"
> headings.
> 
> Thanks in advance for any time you spend reading/discussing :)
<big snip>

Just adding my voice here, this is something my teams would be very happy to
consider.

> - Worktrees
> When a user runs 'git worktree add' from the superproject, each submodule
>  in the new worktree should also be created as a worktree of the corresponding
>  submodule in the original project.
> What doesn't already work:
>   * worktrees and submodules getting along - submodules are now freshly cloned
>     when creating a superproject worktree

My teams are currently debating the use of submodules (we have gone back and
forth over the years on these) and worktrees (which seem to have some
positive process implications for those more legacy-ish team members more
used to centralised workflows). I have not seen any worktree/submodule
combinations used but fear the worst - as in I'm pretty sure I know which of
my team members is going to try this. It is probably a separate matter to
make the two get along better.

Cheers,
Randall

-- Brief whoami:
NonStop developer since approximately 211288444200000000
UNIX developer since approximately 421664400
-- In my real life, I talk too much.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-16 23:36 RFC/Discussion - Submodule UX Improvements Emily Shaffer
                   ` (2 preceding siblings ...)
  2021-04-19 12:56 ` Randall S. Becker
@ 2021-04-19 12:56 ` Aaron Schrab
  2021-04-20 18:49   ` Emily Shaffer
  2021-04-19 19:14 ` Jacob Keller
  2021-04-22 15:32 ` Jacob Keller
  5 siblings, 1 reply; 18+ messages in thread
From: Aaron Schrab @ 2021-04-19 12:56 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, avarab, jrnieder, albertcui, gitster, matheus.bernardino

At 16:36 -0700 16 Apr 2021, Emily Shaffer <emilyshaffer@google.com> wrote:
>- git switch / git checkout

(snip)

>4. A new branch with the same name is created on each submodule.
>  a. If there is a naming conflict, we could prompt the user to resolve it, or
>     we could just check out the branch by that name and print a warning to the
>     user with advice on how to solve it (cd submodule && git switch -c
>     different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
>     the tree is identical to the tree we would have used as the start point
>     (that is, the user switched branches in the submodule, then said "oh crap"
>     and went back and switched branches in the superproject).
>  b. Tracking info is set appropriately on each new branch to the upstream of
>     the branch referenced by the parent of the new superproject commit, OR to
>     the default branch's upstream.
>5. The new branch is checked out on each of the submodules.

In many cases the branch name for the superproject isn't going to be 
appropriate for submodules.

This seems likely to create a LOT of junk branches. Do you also have a 
proposal for cleaning those up?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-16 23:36 RFC/Discussion - Submodule UX Improvements Emily Shaffer
                   ` (3 preceding siblings ...)
  2021-04-19 12:56 ` Aaron Schrab
@ 2021-04-19 19:14 ` Jacob Keller
  2021-04-19 19:28   ` Randall S. Becker
  2021-04-22 15:32 ` Jacob Keller
  5 siblings, 1 reply; 18+ messages in thread
From: Jacob Keller @ 2021-04-19 19:14 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Git mailing list, Ævar Arnfjörð Bjarmason,
	Jonathan Nieder, albertcui, Junio C Hamano, matheus.bernardino

On Fri, Apr 16, 2021 at 4:38 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Hi folks,
>
> As hinted by a couple recent patches, I'm planning on some pretty big submodule
> work over the next 6 months or so - and Ævar pointed out to me in
> https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com that I probably
> should share some of those plans ahead of time. :) So attached is a lightly
> modified version of the doc that we've been working on internally at Google,
> focusing on what we think would be an ideal submodule workflow.
>
> I'm hoping that folks will get a chance to read some or all of it and let us
> know what sounds cool (or sounds extremely broken). The best spot to start is
> probably the "Overview" section, which describes what the "main path" would look
> like for a user working on a project with submodules. Most of the work that
> we're planning on doing is under the "What doesn't already work" headings.
>
> Thanks in advance for any time you spend reading/discussing :)
>
>  - Emily
>
> Background
> ==========
>
> It's worth mentioning that the main goal that's funding this work is to provide
> an alternative for users whose projects use repo
> (https://source.android.com/setup/develop#repo) today. That means that the main
> focus is to try and reach feature parity to repo for an easier transition for
> those who want to switch. As a result, some of the direction below is aimed
> towards learning from what has worked well with repo (but hopefully more
> flexible for users who want to do more, or differently).
>
> There are also a few things mentioned that are specifically targeted to ease use
> with Gerrit, which is in wide use here at Google (and therefore also a
> consideration we need to make to keep getting paid ;) ).
>
> Overview
> =======
>

One thing that I didn't see covered when I scanned this, and that I find
difficult or annoying to deal with, is using "blame" with submodules.
I use blame a lot to do code history analysis to
understand how something got to the way it is. (Often this helps
resolve issues or bugs by using new context to understand why an old
change was broken).

It has bothered me in the past when I try to do "git blame
<path/to/submodule>" and I get nothing. Obviously there are ways
around this: you can for example just log the path and get the commit
that changed it most recently, or try to search for when the submodule
was set to a given commit.

A sort of dream I had was a flow where I could do something from the
parent like "git blame <path/to/submodule>/submodule/file" and have it
present a blame of that file's contents keyed on the *parent* commit
that changed the submodule to have that line, as opposed to being
forced to go into the submodule and figure out what commit introduced
it and then go back to the parent and find out what commit changed the
submodule to include that submodule commit.

> When the work is completed, users should be able to have a clean, obvious
> workflow when using best practices:
>
> To download the code, they should be able to run simply git clone
> https://example.com/superproject to download the project and all its submodules;
> if partial clone is configured, they should receive only the objects allowed by
> the filter in their superproject as well as in each submodule.
>
> To begin working on a feature, from the superproject they can 'git switch -c
> feature', and since the new branch is being created, a new branch 'feature' will
> be created for each submodule, pointing to the submodule's current 'HEAD'. They
> can move to a submodule directory and begin to make changes, and when they
> commit these changes normally with 'git commit' from the submodule directory,
> running git status in the superproject will reflect that a submodule has
> changed. Next, they can switch to a second submodule, making and committing more
> changes.
>
> When they are ready to send these changes which are ready for review but need to
> be linked together, they can switch back to the superproject, where 'git status'
> indicates that there are changes in both submodules. They can commit these
> changes to the superproject and use 'git push' to send a review; Git will
> recurse into affected submodules and push those submodule commits appropriately
> as well.
>
> While the user is waiting for feedback on their review, to work on their next
> task, they can 'git switch other-feature', which will checkout the branches
> specified in the superproject commit at the tip of 'other-feature'; now the user
> can continue working as before.
>
> When it's time to update their local repo, the user can do so as with a
> single-repo project. First they can 'git checkout main && git pull' (or 'git
> pull -r'); Git will first checkout the branches associated with main in each
> submodule, then fetch and merge/rebase in each submodule appropriately. Finally,
> they can 'git switch feature && git rebase', at which time Git will recursively
> checkout the branches associated with 'feature' in each submodule and rebase
> each submodule appropriately.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: RFC/Discussion - Submodule UX Improvements
  2021-04-19 19:14 ` Jacob Keller
@ 2021-04-19 19:28   ` Randall S. Becker
  2021-04-20 16:18     ` Jacob Keller
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2021-04-19 19:28 UTC (permalink / raw)
  To: 'Jacob Keller', 'Emily Shaffer'
  Cc: 'Git mailing list',
	'Ævar Arnfjörð Bjarmason',
	'Jonathan Nieder', albertcui, 'Junio C Hamano',
	matheus.bernardino

On April 19, 2021 3:15 PM, Jacob Keller wrote:
> On Fri, Apr 16, 2021 at 4:38 PM Emily Shaffer <emilyshaffer@google.com>
> wrote:
> >
> > Hi folks,
> >
> > As hinted by a couple recent patches, I'm planning on some pretty big
> > submodule work over the next 6 months or so - and Ævar pointed out to
> > me in https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com
> > that I probably should share some of those plans ahead of time. :) So
> > attached is a lightly modified version of the doc that we've been
> > working on internally at Google, focusing on what we think would be an ideal
> submodule workflow.
> >
> > I'm hoping that folks will get a chance to read some or all of it and
> > let us know what sounds cool (or sounds extremely broken). The best
> > spot to start is probably the "Overview" section, which describes what
> > the "main path" would look like for a user working on a project with
> > submodules. Most of the work that we're planning on doing is under the
> "What doesn't already work" headings.
> >
> > Thanks in advance for any time you spend reading/discussing :)
> >
> >  - Emily
> >
> > Background
> > ==========
> >
> > It's worth mentioning that the main goal that's funding this work is
> > to provide an alternative for users whose projects use repo
> > (https://source.android.com/setup/develop#repo) today. That means that
> > the main focus is to try and reach feature parity to repo for an
> > easier transition for those who want to switch. As a result, some of
> > the direction below is aimed towards learning from what has worked
> > well with repo (but hopefully more flexible for users who want to do more, or
> differently).
> >
> > There are also a few things mentioned that are specifically targeted
> > to ease use with Gerrit, which is in wide use here at Google (and
> > therefore also a consideration we need to make to keep getting paid ;) ).
> >
> > Overview
> > =======
> >
> 
> One thing that I think I didn't see covered when I scanned this, that is
> something I find difficult or annoying to resolve is using "blame"
> with submodules. I use blame a lot to do code history analysis to understand
> how something got to the way it is. (Often this helps resolve issues or bugs by
> using new context to understand why an old change was broken).
> 
> It has bothered me in the past when I try to do "git blame
> <path/to/submodule>" and I get nothing. Obviously there are ways around this:
> you can for example just log the path and get the commit that changed it most
> recently, or try to search for when the submodule was set to a given commit.
> 
> A sort of dream I had was a flow where I could do something from the parent
> like "git blame <path/to/submodule>/submodule/file" and have it present a
> blame of that files contents keyed on the *parent* commit that changed the
> submodule to have that line, as opposed to being forced to go into the
> submodule and figure out what commit introduced it and then go back to the
> parent and find out what commit changed the submodule to include that
> submodule commit.

Not going to disagree, but are you looking for the blame on the submodule ref file itself or files in the submodule? It's hard to teach git to do a blame on a one-line file.

Otherwise, and I think this is what you really are going for, teaching it to do a blame based on "git blame <path/to/submodule>/submodule/file" would be very nice and abstracts out the need for the user (or more importantly to me = scripts) to understand that a submodule is involved; however, it is opening up a very large door: "should/could we teach git to abstract submodules out of every command". This would potentially replace a significant part of the use cases for the "git submodule foreach" sub-command. In your ask, the current paradigm "cd <path/to/submodule>/submodule && git blame file" or pretty much every other command does work, but it requires the user/script to know you have a submodule in the path. So my question is: is this worth the effort? I don't have a good answer to that question. Half of my brain would like this very much/the other half is scared of the impact to the code.

Just my musings.

Randall


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-19 19:28   ` Randall S. Becker
@ 2021-04-20 16:18     ` Jacob Keller
  2021-04-20 18:47       ` Emily Shaffer
  0 siblings, 1 reply; 18+ messages in thread
From: Jacob Keller @ 2021-04-20 16:18 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: Emily Shaffer, Git mailing list,
	Ævar Arnfjörð Bjarmason, Jonathan Nieder,
	Albert Cui, Junio C Hamano, Matheus Tavares Bernardino

On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker
<rsbecker@nexbridge.com> wrote:
> On April 19, 2021 3:15 PM, Jacob Keller wrote:
> > A sort of dream I had was a flow where I could do something from the parent
> > like "git blame <path/to/submodule>/submodule/file" and have it present a
> > blame of that files contents keyed on the *parent* commit that changed the
> > submodule to have that line, as opposed to being forced to go into the
> > submodule and figure out what commit introduced it and then go back to the
> > parent and find out what commit changed the submodule to include that
> > submodule commit.
>
> Not going to disagree, but are you looking for the blame on the submodule ref file itself or files in the submodule? It's hard to teach git to do a blame on a one-line file.
>

Well, I would like it if "git blame <path/to/submodule>" did... something
other than just fail. Sometimes my brain is working in a "blame where
this came from" mode and I type that out and then get frustrated when it
fails. Additionally...

> Otherwise, and I think this is what you really are going for, teaching it to do a blame based on "git blame <path/to/submodule>/submodule/file" would be very nice and abstracts out the need for the user (or more importantly to me = scripts) to understand that a submodule is involved; however, it is opening up a very large door: "should/could we teach git to abstract submodules out of every command". This would potentially replace a significant part of the use cases for the "git submodule foreach" sub-command. In your ask, the current paradigm "cd <path/to/submodule>/submodule && git blame file" or pretty much every other command does work, but it requires the user/script to know you have a submodule in the path. So my question is: is this worth the effort? I don't have a good answer to that question. Half of my brain would like this very much/the other half is scared of the impact to the code.
>
> Just my musings.

I'm not asking for "git blame <path/to/submodule>/<file>" to give the
same output as "cd <path/to/submodule> && git blame <file>".

What I'm asking is: given this file, tell me in which commit in the
parent the line got introduced. So basically I want to walk over
the changes to the submodule pointer and find out when the line got
introduced into the parent, not when it got introduced into the
submodule itself.

This is a related question, but it is actually not trivial to go
instantly from "it was in xyz submodule commit" to "it was then pulled
in by xyz parent commit". It's something that is quite tedious to do
manually, especially since the submodule pointer could change
arbitrarily so knowing the submodule commit doesn't mean you can
simply grep for which commit set the submodule exactly to that commit.
Essentially, I want a 'git blame' that ignores all changes which
aren't actually submodule pointer updates.

I think that's something that is much harder to do manually, but feels
like it should be relatively simple to implement within the blame
algorithm. I don't feel like this is something strictly replaceable by
"git submodule foreach"
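
To make the manual procedure concrete, here is a rough sketch of the lookup for a single submodule commit. It assumes a linear superproject history and a checked-out submodule; the helper name first_super_commit is made up, and this is only the tedious manual walk written down, not a proposal for how blame itself should do it.

```shell
# first_super_commit SUPER SUBPATH SUBCOMMIT
# Prints the oldest superproject commit touching SUBPATH whose recorded
# submodule commit (gitlink) contains SUBCOMMIT in its history.
first_super_commit() {
    super=$1 subpath=$2 subcommit=$3

    # Oldest first: every superproject commit that changed the gitlink.
    git -C "$super" log --reverse --format=%H -- "$subpath" |
    while read -r c; do
        # The submodule commit recorded by superproject commit $c
        # (skipped if the path does not exist in that commit).
        gitlink=$(git -C "$super" rev-parse --verify -q "$c:$subpath") || continue

        # Is the commit we care about reachable from that gitlink?
        if git -C "$super/$subpath" merge-base --is-ancestor \
               "$subcommit" "$gitlink" 2>/dev/null; then
            echo "$c"
            break
        fi
    done
}
```

A hypothetical call would be 'first_super_commit . path/to/submodule <submodule-commit>', where the submodule commit comes from running 'git blame' inside the submodule first. Doing this for every blamed line is the part that gets tedious by hand.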

>
> Randall
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-20 16:18     ` Jacob Keller
@ 2021-04-20 18:47       ` Emily Shaffer
  2021-04-20 19:38         ` Randall S. Becker
  2021-04-21  6:57         ` Jacob Keller
  0 siblings, 2 replies; 18+ messages in thread
From: Emily Shaffer @ 2021-04-20 18:47 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Randall S. Becker, Git mailing list,
	Ævar Arnfjörð Bjarmason, Jonathan Nieder,
	Albert Cui, Junio C Hamano, Matheus Tavares Bernardino

On Tue, Apr 20, 2021 at 09:18:05AM -0700, Jacob Keller wrote:
> 
> On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker
> <rsbecker@nexbridge.com> wrote:
> > On April 19, 2021 3:15 PM, Jacob Keller wrote:
> > > A sort of dream I had was a flow where I could do something from the parent
> > > like "git blame <path/to/submodule>/submodule/file" and have it present a
> > > blame of that files contents keyed on the *parent* commit that changed the
> > > submodule to have that line, as opposed to being forced to go into the
> > > submodule and figure out what commit introduced it and then go back to the
> > > parent and find out what commit changed the submodule to include that
> > > submodule commit.
> >
> > Not going to disagree, but are you looking for the blame on the submodule ref file itself or files in the submodule? It's hard to teach git to do a blame on a one-line file.
> >
> 
> Well, I would like if "git blame <path/to/submodule>" did.. something
> other than just fail. Sometimes my brain is working in a "blame where
> this came from" and I type that out and then get frustrated when it
> fails. Additionally...
> 
> > Otherwise, and I think this is what you really are going for, teaching it to do a blame based on "git blame <path/to/submodule>/submodule/file" would be very nice and abstracts out the need for the user (or more importantly to me = scripts) to understand that a submodule is involved; however, it is opening up a very large door: "should/could we teach git to abstract submodules out of every command". This would potentially replace a significant part of the use cases for the "git submodule foreach" sub-command. In your ask, the current paradigm "cd <path/to/submodule>/submodule && git blame file" or pretty much every other command does work, but it requires the user/script to know you have a submodule in the path. So my question is: is this worth the effort? I don't have a good answer to that question. Half of my brain would like this very much/the other half is scared of the impact to the code.
> >
> > Just my musings.
> 
> I'm not asking for "git blame <path/to/submodule>/<file>" to give the
> same output as "cd <path/to/submodule> && git blame <file>".
> 
> What I'm asking is: given this file, tell me in which commit in the
> parent the line got introduced. So basically I want to walk over
> the changes to the submodule pointer and find out when the line got
> introduced into the parent, not when it got introduced into the
> submodule itself.
> 
> This is a related question, but it is actually not trivial to go
> instantly from "it was in xyz submodule commit" to "it was then pulled
> in by xyz parent commit". It's something that is quite tedious to do
> manually, especially since the submodule pointer could change
> arbitrarily so knowing the submodule commit doesn't mean you can
> simply grep for which commit set the submodule exactly to that commit.
> Essentially, I want a 'git blame' that ignores all changes which
> aren't actually submodule pointer updates.
> 
> I think that's something that is much harder to do manually, but feels
> like it should be relatively simple to implement within the blame
> algorithm. I don't feel like this is something strictly replaceable by
> "git submodule foreach"

I think I understand what you're saying. Something like the following
tree:

super   sub
b------->4
         3
         2
a------->1

producing something like this:

'git -C sub blame main.c'

1 AU Thor	2020-01-01
2 CO Mitter	2020-01-02		int main() {
4 AU Thor	2020-01-04		  printf("Hello world!\n");
3 Dev E		2020-01-03		  return 0;
2 CO Mitter	2020-01-02		}

and
'git blame sub/main.c'

a Mai N		2020-01-01
b Senior Dev	2020-01-04		int main() {
b Senior Dev	2020-01-04		  printf("Hello world!\n");
b Senior Dev	2020-01-04		  return 0;
b Senior Dev	2020-01-04		}

or to put it another way: if we are treating superproject commit as "the
whole feature", then it could be useful to see "which feature added this
change" instead of "which atomic commit inside a feature added this
change".

To me, it sounds expensive to compute... wouldn't you need to say, for
each blame line, "is this commit an ancestor of the commit associated in
THIS superproject commit? ...how about the next superproject commit?"
But I also don't have much experience with the blame implementation so
maybe I'm thinking naively :) :)

And even if it is expensive, considering that Jacob and Randall both had
different ideas of what their ideal 'git blame' recursive behavior would
be, maybe it makes sense to use a flag to ask for the more expensive
behavior, e.g. 'git blame --show-superproject-commit sub/main.c'?
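
To make the mapping concrete, here is a minimal sketch of the lookup on the
toy history above. It is only an illustration, not how 'git blame' works
today: the function name and data layout are made up, both histories are
assumed to be linear (so "is an ancestor of" becomes an index comparison),
and the gitlink is assumed to only ever move forward.

```python
from typing import Dict, List, Optional


def first_super_commit_containing(
    sub_commit: str,
    sub_history: List[str],
    super_history: List[str],
    gitlinks: Dict[str, str],
) -> Optional[str]:
    """Return the oldest superproject commit whose gitlink includes sub_commit.

    Hypothetical helper: sub_history and super_history are oldest-first and
    assumed linear, so ancestry reduces to comparing list positions.
    """
    position = {commit: i for i, commit in enumerate(sub_history)}
    for super_commit in super_history:
        pinned = gitlinks[super_commit]
        # sub_commit is "contained" once the pinned commit is it or a descendant.
        if position[sub_commit] <= position[pinned]:
            return super_commit
    return None


# The toy history from the diagram: superproject a pins 1, b pins 4.
sub_history = ["1", "2", "3", "4"]
super_history = ["a", "b"]
gitlinks = {"a": "1", "b": "4"}

# Per-line result of the blame inside the submodule, top to bottom.
blame_in_sub = ["1", "2", "4", "3", "2"]
blame_in_super = [
    first_super_commit_containing(c, sub_history, super_history, gitlinks)
    for c in blame_in_sub
]
print(blame_in_super)
```

Each blamed line costs one walk over the superproject history here, which is
the expense worried about above; a real implementation would have to handle
merges and gitlinks that move backwards as well.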

 - Emily

* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-19 12:56 ` Aaron Schrab
@ 2021-04-20 18:49   ` Emily Shaffer
  2021-04-20 19:29     ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Emily Shaffer @ 2021-04-20 18:49 UTC (permalink / raw)
  To: git, avarab, jrnieder, albertcui, gitster, matheus.bernardino

On Mon, Apr 19, 2021 at 08:56:43AM -0400, Aaron Schrab wrote:
> 
> At 16:36 -0700 16 Apr 2021, Emily Shaffer <emilyshaffer@google.com> wrote:
> > - git switch / git checkout
> 
> (snip)
> 
> > 4. A new branch with the same name is created on each submodule.
> >  a. If there is a naming conflict, we could prompt the user to resolve it, or
> >     we could just check out the branch by that name and print a warning to the
> >     user with advice on how to solve it (cd submodule && git switch -c
> >     different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
> >     the tree is identical to the tree we would have used as the start point
> >     (that is, the user switched branches in the submodule, then said "oh crap"
> >     and went back and switched branches in the superproject).
> >  b. Tracking info is set appropriately on each new branch to the upstream of
> >     the branch referenced by the parent of the new superproject commit, OR to
> >     the default branch's upstream.
> > 5. The new branch is checked out on each of the submodules.
> 
> In many cases the branch name for the superproject isn't going to be
> appropriate for submodules.
> 
> This seems likely to create a LOT of junk branches. Do you also have a
> proposal for cleaning those up?

Yeah, I think we have a point internally for "clean up alllll the
submodule branches that are unreferenced/already merged". You're right
that in a workflow where I have a superproject with eight submodules,
because I need them to build, but only do active development on one
submodule out of the eight, I'll have a ton of junk refs in the other
seven submodules. Yuck :)
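
As a rough sketch of the "already merged" half of that cleanup: a branch is
a candidate for deletion when its tip is reachable from the branch being
kept, which is the same test 'git branch --merged' applies. The commit graph,
branch names and helper names below are invented for illustration, and the
"unreferenced" half is not modelled.

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Return every commit reachable from tip by following parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def junk_branches(
    parents: Dict[str, List[str]], branches: Dict[str, str], keep: str
) -> List[str]:
    """List branches (other than keep) whose tips are already merged into keep."""
    merged_into_keep = reachable(parents, branches[keep])
    return sorted(
        name
        for name, tip in branches.items()
        if name != keep and tip in merged_into_keep
    )


# Hypothetical submodule history: c1 <- c2 <- c3 (main), with w1 branching off c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "w1": ["c2"]}
branches = {"main": "c3", "feature-x": "c2", "feature-y": "w1"}

print(junk_branches(parents, branches, keep="main"))
```

Here feature-x is reported because it points at a commit main already
contains, while feature-y still carries unmerged work and is left alone.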

 - Emily

* RE: RFC/Discussion - Submodule UX Improvements
  2021-04-20 18:49   ` Emily Shaffer
@ 2021-04-20 19:29     ` Randall S. Becker
  0 siblings, 0 replies; 18+ messages in thread
From: Randall S. Becker @ 2021-04-20 19:29 UTC (permalink / raw)
  To: 'Emily Shaffer',
	git, avarab, jrnieder, albertcui, gitster, matheus.bernardino

On April 20, 2021 2:50 PM, Emily Shaffer wrote:
> On Mon, Apr 19, 2021 at 08:56:43AM -0400, Aaron Schrab wrote:
> >
> > At 16:36 -0700 16 Apr 2021, Emily Shaffer <emilyshaffer@google.com> wrote:
> > > - git switch / git checkout
> >
> > (snip)
> >
> > > 4. A new branch with the same name is created on each submodule.
> > >  a. If there is a naming conflict, we could prompt the user to resolve it, or
> > >     we could just check out the branch by that name and print a warning to the
> > >     user with advice on how to solve it (cd submodule && git switch -c
> > >     different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
> > >     the tree is identical to the tree we would have used as the start point
> > >     (that is, the user switched branches in the submodule, then said "oh crap"
> > >     and went back and switched branches in the superproject).
> > >  b. Tracking info is set appropriately on each new branch to the upstream of
> > >     the branch referenced by the parent of the new superproject commit, OR to
> > >     the default branch's upstream.
> > > 5. The new branch is checked out on each of the submodules.
> >
> > In many cases the branch name for the superproject isn't going to be
> > appropriate for submodules.
> >
> > This seems likely to create a LOT of junk branches. Do you also have a
> > proposal for cleaning those up?
> 
> Yeah, I think we have a point internally for "clean up alllll the
> submodule branches that are unreferenced/already merged". You're right
> that in a workflow where I have a superproject with eight submodules,
> because I need them to build, but only do active development on one
> submodule out of the eight, I'll have a ton of junk refs in the other
> seven submodules. Yuck :)

In fact, this yuck is a reason why many organizations have gone to
monolithic repositories instead of multiple smaller ones - because of the
touch points. However, the argument for using multiple smaller repos mirrors
this particular use case, so while "yuck", it might have value when
mirroring what happens in the issue tracking systems that have massive touch
points. We were there and moved to monolithic per product release group, but
when we had the other approach, this particular feature actually would have
helped a whole lot. I wonder whether this mess might have more value than we
think.

Regards,
Randall

-- Brief whoami:
NonStop developer since approximately 211288444200000000
UNIX developer since approximately 421664400
MVS not admitting to anything
-- In my real life, I talk too much.




* RE: RFC/Discussion - Submodule UX Improvements
  2021-04-20 18:47       ` Emily Shaffer
@ 2021-04-20 19:38         ` Randall S. Becker
  2021-04-21  6:57         ` Jacob Keller
  1 sibling, 0 replies; 18+ messages in thread
From: Randall S. Becker @ 2021-04-20 19:38 UTC (permalink / raw)
  To: 'Emily Shaffer', 'Jacob Keller'
  Cc: 'Git mailing list',
	'Ævar Arnfjörð Bjarmason',
	'Jonathan Nieder', 'Albert Cui',
	'Junio C Hamano', 'Matheus Tavares Bernardino'

On April 20, 2021 2:48 PM, Emily Shaffer wrote:
> On Tue, Apr 20, 2021 at 09:18:05AM -0700, Jacob Keller wrote:
> >
> > On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker
> > <rsbecker@nexbridge.com> wrote:
> > > On April 19, 2021 3:15 PM, Jacob Keller wrote:
> > > > A sort of dream I had was a flow where I could do something from
> > > > the parent like "git blame <path/to/submodule>/submodule/file" and
> > > > have it present a blame of that file's contents keyed on the
> > > > *parent* commit that changed the submodule to have that line, as
> > > > opposed to being forced to go into the submodule and figure out
> > > > what commit introduced it and then go back to the parent and find
> > > > out what commit changed the submodule to include that submodule
> > > > commit.
> > >
> > > Not going to disagree, but are you looking for the blame on the
> > > submodule ref file itself or files in the submodule? It's hard to
> > > teach git to do a blame on a one-line file.
> > >
> >
> > Well, I would like if "git blame <path/to/submodule>" did.. something
> > other than just fail. Sometimes my brain is working in a "blame where
> > this came from" and I type that out and then get frustrated when it
> > fails. Additionally...
> >
> > > Otherwise, and I think this is what you really are going for,
> > > teaching it to do a blame based on
> > > "git blame <path/to/submodule>/submodule/file" would be very nice
> > > and abstracts out the need for the user (or more importantly to me =
> > > scripts) to understand that a submodule is involved; however, it is
> > > opening up a very large door: "should/could we teach git to
> > > abstract submodules out of every command". This would potentially
> > > replace a significant part of the use cases for the
> > > "git submodule foreach" sub-command. In your ask, the current paradigm
> > > "cd <path/to/submodule>/submodule && git blame file" or pretty much
> > > every other command does work, but it requires the user/script to
> > > know you have a submodule in the path. So my question is: is this
> > > worth the effort? I don't have a good answer to that question. Half
> > > of my brain would like this very much/the other half is scared of
> > > the impact to the code.
> > >
> > > Just my musings.
> >
> > I'm not asking for "git blame <path/to/submodule>/<file>" to give
> > the same output as "cd <path/to/submodule> && git blame <file>".
> >
> > What I'm asking is: given this file, tell me which commit in the
> > parent introduced the line. So basically I want to walk over
> > the changes to the submodule pointer and find out when it got
> > introduced into the parent, not when it got introduced into the
> > submodule itself.
> >
> > This is a related question, but it is actually not trivial to go
> > instantly from "it was in xyz submodule commit" to "it was then pulled
> > in by xyz parent commit". It's something that is quite tedious to do
> > manually, especially since the submodule pointer could change
> > arbitrarily so knowing the submodule commit doesn't mean you can
> > simply grep for which commit set the submodule exactly to that commit.
> > Essentially, I want a 'git blame' that ignores all changes which
> > aren't actually submodule pointer updates.
> >
> > I think that's something that is much harder to do manually, but feels
> > like it should be relatively simple to implement within the blame
> > algorithm. I don't feel like this is something strictly replaceable by
> > "git submodule foreach"
> 
> I think I understand what you're saying. Something like the following
> tree:
> 
> super   sub
> b------->4
>          3
>          2
> a------->1
> 
> producing something like this:
> 
> 'git -C sub blame main.c'
> 
> 1 AU Thor	2020-01-01
> 2 CO Mitter	2020-01-02		int main() {
> 4 AU Thor	2020-01-04		  printf("Hello world!\n");
> 3 Dev E		2020-01-03		  return 0;
> 2 CO Mitter	2020-01-02		}
> 
> and
> 'git blame sub/main.c'
> 
> a Mai N		2020-01-01
> b Senior Dev	2020-01-04		int main() {
> b Senior Dev	2020-01-04		  printf("Hello world!\n");
> b Senior Dev	2020-01-04		  return 0;
> b Senior Dev	2020-01-04		}
> 
> or to put it another way: if we are treating superproject commit as "the
> whole feature", then it could be useful to see "which feature added this
> change" instead of "which atomic commit inside a feature added this
> change".
> 
> To me, it sounds expensive to compute... wouldn't you need to say, for
> each blame line, "is this commit an ancestor of the commit associated in
> THIS superproject commit? ...how about the next superproject commit?"
> But I also don't have much experience with the blame implementation so
> maybe I'm thinking naively :) :)
> 
> And even if it is expensive, considering that Jacob and Randall both had
> different ideas of what their ideal 'git blame' recursive behavior would
> be, maybe it makes sense to use a flag to ask for the more expensive
> behavior, e.g. 'git blame --show-superproject-commit sub/main.c'?

I was partly trying to figure out which path Jacob was requesting, and "both"
seem useful to me. Looking at our own super-repo history and comparing what
is in one specific submodule, we have the commit on the submodule ref file
flipping repeatedly between two commits during a period of time (in
timestamp order) where there were multiple submodule topic branches in
parallel with super-repo topic branches. I'm not saying that was a good
thing, just the reality of one particular icky submodule. Once back on the
main branch, things moved to something rational, but a blame of changing
submodule contents of sub/main.c --show-superproject-commit would lead to
something inherently non-deterministic until the topic branches are all
pruned and/or merged, at least in this degenerate situation. After the
merge, everything was back to deterministic and simple to compute.

Randall


* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-19  3:20 ` Philippe Blain
@ 2021-04-20 23:03   ` Emily Shaffer
  2021-04-20 23:30     ` Junio C Hamano
  2021-04-21  2:27     ` Philippe Blain
  0 siblings, 2 replies; 18+ messages in thread
From: Emily Shaffer @ 2021-04-20 23:03 UTC (permalink / raw)
  To: Philippe Blain
  Cc: git, avarab, jrnieder, albertcui, gitster, matheus.bernardino

On Sun, Apr 18, 2021 at 11:20:06PM -0400, Philippe Blain wrote:
> > To download the code, they should be able to run simply git clone
> > https://example.com/superproject to download the project and all its submodules;
> 
> Playing the devil's advocate here, but some projects do not want / need all of
> their submodules in a "regular" checkout, so I guess that would have to be somehow
> configurable. I've always felt that since each project is different in that regard,
> it would be better if each project could declare if their submodules are non-optional
> and need to be also cloned when the superproject is cloned. Maybe an additional field
> in '.gitmodules', like a boolean 'submodule.<name>.optional', could be added,
> so that submodules that are optional are not cloned, but others are. If that setting
> is opt-in (meaning that it defaults to 'true', i.e., submodules are considered optional by default),
> then it would be easier to argue for 'git clone' to mean 'git clone --recurse-submodules':
> 'git clone' would clone the superproject and any non-optional submodule.
> Then eventually, when the usage of 'submodule.<name>.optional' becomes more widespread,
> we can switch the default and then projects would need to explicitly declare their submodule
> optional if they don't want them cloned by a simple 'git clone'.

This is actually a point we discussed internally and I cut out of the
doc before sharing, because it is very far down our roadmap (not
expecting to address until probably the second half of the year). As I
understand it, this can also be achieved today by setting
'submodule.path/to/module.active = false' in the superproject's
.git/config.

However, it seems to me like this would be a really cool application of
sparse-checkout, especially if you could distribute the sparse-checkout
config (for example, during clone* ;) ) before a user has the chance to
try and clone all necessary repos for the initial checkout.

* https://lore.kernel.org/git/pull.908.git.1616105016055.gitgitgadget@gmail.com

> > While the user is waiting for feedback on their review, to work on their next
> > task, they can 'git switch other-feature', which will checkout the branches
> > specified in the superproject commit at the tip of 'other-feature'; now the user
> > can continue working as before.
> 
> Here, I'm not sure what you mean by "the branches (plural) specified in the superproject
> commit at the tip of other-feature". Today, with 'submodule.recurse = true', 'git checkout some-feature'
> already checks out each submodule in detached HEAD at the commit recorded in the superproject commit
> at the tip of some-feature. It's unclear if you are proposing to instead record submodule branch
> names in the superproject commit.. is that what's going on here ? (or is it just a typo ?)

Yeah, I'm not sure that it makes sense to record branch name in the
commit itself - but I could see it being useful to do some inference on
the client side and put the user in some state besides detached-HEAD on
checkout. Hmmm.

> 
> > 
> > When it's time to update their local repo, the user can do so as with a
> > single-repo project. First they can 'git checkout main && git pull' (or 'git
> > pull -r'); Git will first checkout the branches associated with main in each
> > submodule, then fetch and merge/rebase in each submodule appropriately.
> 
> What if some submodule does not use the same branch name for their primary integration branch?
> Sometimes as a superproject using another project as a submodule, you do not
> control that...

Yeah, you're right that that's an important consideration - "how can I
teach my superproject to default to a different branch than the name of
the superproject's current branch?" I wonder whether the branch config
in .gitmodules (or an equivalent in superproject's .git/config) would
make sense to try and use here?
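
A minimal sketch of what that lookup could look like, assuming a policy
(hypothetical, not current Git behavior) where a submodule uses the 'branch'
value from .gitmodules when one is set and otherwise falls back to the
superproject's current branch name. The value "." is already documented to
mean "same name as the superproject's current branch". The parser only
handles the simple subset of the file format shown here, and the module
names and URLs are placeholders.

```python
import re
from typing import Dict, Optional

# Matches section headers such as: [submodule "libfoo"]
SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Parse a simple .gitmodules body into {submodule name: {key: value}}."""
    modules: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = SECTION.match(line)
        if header:
            current = modules.setdefault(header.group("name"), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


def branch_for(module: Dict[str, str], superproject_branch: str) -> str:
    """Pick the submodule branch under the assumed fallback policy."""
    branch = module.get("branch")
    if branch is None or branch == ".":
        return superproject_branch
    return branch


GITMODULES = """
[submodule "libfoo"]
    path = third_party/libfoo
    url = https://example.com/libfoo
    branch = stable
[submodule "libbar"]
    path = third_party/libbar
    url = https://example.com/libbar
"""

modules = parse_gitmodules(GITMODULES)
chosen = {name: branch_for(cfg, "my-feature") for name, cfg in modules.items()}
print(chosen)
```

With the superproject on 'my-feature', libfoo would stay on its configured
'stable' branch and libbar would follow the superproject's branch name.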

> > Note that this means options like '--branch' *don't* propagate directly to the
> > submodules. If superproject branch "foo" points its submodule to branch "main",
> 
> Here again, I'm not sure what you mean, because right now there is no concept of
> the superproject having a submodule "pointing to some branch", only to a specific
> commit. 'submodule.<name>.branch' is only ever used by the command 'git submodule update --remote'.
> Is there an implicit proposal to change that ?

There is no proposal to change the concept of superproject commits
referencing submodule commits. I do not think it is a good idea to try
to have superproject commits reference submodule branch names. No. :) I
think the answer here is the same as above - detached-HEAD is an
inconvenient state for the user unless they specifically ask for it, and
it would be better to check out some branch predictably if possible.

That means that it should be hard for users to end up in a state where
the submodule commit 123 referenced by the superproject commit abc
doesn't have some ref pointing to it in the submodule; I think this is
what I was trying to get at with the 'git status' improvements and 'git
checkout'/'git switch' warnings.

> 
> > then 'git clone --branch foo https://superproject.git' will clone
> > superproject/submodule to branch 'main' instead. (It *may* be OK to take
> > '--branch' to mean "the branch specified by the parent *and* the branch named in
> > --branch if applicable, but no other branches".)
> > 
> > What doesn't already work:
> > 
> >    * --recurse-submodules should turn on submodule.recurse=true
> 
> That's actually a very good idea, but maybe it should be explicitly mentioned, I think
> (in the output of the command I mean).
> 
> >    * superproject gets top-level config inherited by submodules
> >    * New --recurse-submodules --single-branch semantics
> >    * Progress bar for clone (see work estimates)
> >    * Recommended config from project owner
> > 
> > 
> > -- Partial clone
> > 
> > 1. git clone initializes the directory indicated by the user
> > 2. git clone applies the appropriate configs for the partial clone filter
> >     requested by the user
> >    a) These configs go to the config file shared by superproject and submodules.
> > 3. git clone fetches the superproject
> > 4. git clone checks out the superproject at server's HEAD
> > 5. git clone warns the user that a recommended hook/config setup exists and
> >     provides a tip on how to install it
> > 6. For each submodule encountered in step 4, git clone is invoked for the
> >     submodule, and steps 1-4 are repeated (but in directories indicated by the
> >     superproject commit, not by the user). The same filter supplied to the
> >     superproject applies to the submodules.
> > 
> > 
> > What doesn't already work:
> > 
> >    * --filter=blob:none with submodules (it's using global variables)
> >    * propagating --filter=blob:none to submodules (via submodules.config)
> >    * Recommended config from project owner
> > 
> > 
> > - git fetch
> > 
> > By default, git fetch looks for (1) the remote name(s) supplied at the command
> > line, (2) the remote which the currently checked out branch is tracking, or (3)
> > the remote named origin, in that order. For submodules, there is no guarantee
> > that (1) has anything to do with the state of the submodule referenced by the
> > superproject commit, so just start from (2).
> > 
> > This operation can be extremely long-running if the project contains many large
> > submodules, so progress indicators should be displayed.
> > 
> > Caveat: this will mean that we should be more careful about ensuring that
> > submodule branches have tracking info set up correctly; that may be an issue for
> > users who want to branch within their submodule. This may be OK because users
> > will probably still have 'origin' as their submodule's remote, and if they want
> > more complicated behavior, they will be able to configure it.
> > 
> > What doesn't already work:
> > 
> >    * Make sure not to propagate (1) to submodules while recursing
> >    * Fetching new submodules.
> >    * Not having 0.95 success probability ** 100 = low success probability (that
> >      is, we need more retries during submodule fetch)
> >    * Progress indicators
> 
> I would add the following:
> 
> - Fix 'git fetch upstream' when 'submodule.recurse' and 'fetch.recurseSubmodules=on-demand'
> are both set  (the submodule is not fetched even if the superproject changed the submodule
> commit).

Interesting. Sounds like it's worth writing a test case to see what does
happen/what should happen and make it work :)
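
Separately, on the "0.95 ** 100" bullet in the quoted list: the arithmetic
can be checked with a short sketch. It assumes each submodule fetch fails
independently with the same probability, which real network failures may not
do, and the function name is made up for illustration.

```python
def all_fetches_succeed(p_single: float, submodules: int, attempts: int = 1) -> float:
    """Probability that every submodule fetch succeeds within `attempts` tries.

    Assumes independent attempts with the same per-attempt success rate.
    """
    # A submodule fails overall only if every one of its attempts fails.
    p_module = 1.0 - (1.0 - p_single) ** attempts
    return p_module ** submodules


no_retry = all_fetches_succeed(0.95, 100)
three_tries = all_fetches_succeed(0.95, 100, attempts=3)
print(f"single attempt: {no_retry:.4f}")
print(f"three attempts: {three_tries:.4f}")
```

Under those assumptions a single attempt per submodule gets all 100 fetches
through well under 1% of the time, while allowing three attempts each brings
that to roughly 99%.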

> 
> - Do not rely on 'origin' existing in the submodule (or being pushable to). Right now,
> renaming the 'origin' remote to 'upstream' in a submodule, and using 'origin' for one's own
> fork of a submodule, (as is often done in the superproject), breaks 'git fetch --recurse-submodules'
> (or 'git fetch' if 'submodule.recurse' is set), in the sense that the fetch does not recurse
> to the submodule, as it should. I do not have a simple reproducer handy but
> I've seen it happen and there are a couple hard-coded "origin" in the submodule code [1], [2].

This sounds to me like a specific example of a more general goal (which
may or may not have ended up in this doc?): choosing the right remote
for fetching and pushing. So, definitely :)

> > 
> > 
> > - git switch / git checkout
> > 
> > Submodules should continue to perform these operations the same way that they
> > have before, that is, the way that single-repo Git works. But superprojects
> > should behave as follows:
> > 
> > 
> > -- Create mode (git switch -c / git checkout -b)
> > 
> > 1. The current worktree is checked for uncommitted changes to tracked files. The
> >     current worktree of each submodule is also checked.
> > 2. A new branch is created on the superproject; that branch's ref is pointed to
> >     the current HEAD.
> > 3. The new branch is checked out on the superproject.
> > 4. A new branch with the same name is created on each submodule.
> 
> That might not be wanted by all, so I think it should be configurable.
> 
> >    a. If there is a naming conflict, we could prompt the user to resolve it, or
> >       we could just check out the branch by that name and print a warning to the
> >       user with advice on how to solve it (cd submodule && git switch -c
> >       different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
> >       the tree is identical to the tree we would have used as the start point
> >       (that is, the user switched branches in the submodule, then said "oh crap"
> >       and went back and switched branches in the superproject).
> >    b. Tracking info is set appropriately on each new branch to the upstream of
> >       the branch referenced by the parent of the new superproject commit, OR to
> >       the default branch's upstream.
> 
> This last point is a little unclear: which "new superproject commit" ? (we are creating
> a branch, so there is no new commit yet?). And again, you talk about a (submodule?) branch being referenced
> by a superproject commit, which is not a concept that actually exists today.

Yeah, I can clean up the wording here, thanks for pointing it out.

> Also, usually tracking info is only set
> automatically when using the form 'git checkout -b new-branch upstream/master' or
> the like. Do you also propose that 'git checkout -b new-branch', by itself, should
> automatically set tracking info ?

Yes - that is an approach that we want to explore, to solve the general
push/fetch remote+branch problem.

> 
> 
> > 5. The new branch is checked out on each of the submodules.
> > 
> > What doesn't already work:
> > 
> >    * Safety check when leaving uncommitted submodule changes
> 
> Yes, that has been reported several times ([3], [4], [5]). I have fixes for this,
> not quite ready to send because I'm trying to write extensive tests (maybe too extensive)...
> 
> >    * Propagating branch names to submodules currently requires a custom hacky
> >      repolike patch
> >    * Error handling + graceful non-error handling if the branch already exists
> >    * "Knowing what branch to push to": copying over which-branch-is-upstream info
> >      ** Needs some UX help, push.default is a mess
> >    * Tracking info setups
> > 
> > -- Switching to an existing branch (git switch / git checkout)
> > 
> > 1. The current worktree is checked for uncommitted changes to tracked files. The
> >     current worktree of each submodule is also checked.
> > 2. The requested branch is checked out on the superproject.
> > 3. The submodule commit or branch referenced by the newly-checked-out
> >     superproject commit is checked out on each submodule.
> > 
> > What doesn't already work:
> > 
> >    * Same as in create mode
> 
> Here, I would add that 'git checkout --recurse-submodules', along with 'git clone --recurse-submodules',
> have trouble with correctly checkout-ing an older commit that records a submodule that
> was since removed from the project. The user experience around this use case is currently very very bad [6].
> This is partly due to 'git clone --recurse-submodules' only cloning submodules that are recorded in
> the tip commit of the default branch of the superproject, which could certainly be improved.

Yeah, we are aware of this pain internally too, thanks for pointing it
out.

> 
> > 
> > 
> > - git status
> > 
> > -- From superproject
> > The superproject is clean if:
> > 
> >    * No tracked files in the superproject have been modified and not committed
> >    * No tracked files in any submodules have been modified and not committed
> >    * No commits in any submodules differ from the commits referenced by the tip
> >      commit of the superproject
> > 
> > Advices should describe:
> > 
> >    * How to commit or drop changes to files in the superproject
> >    * How to commit or drop changes to files in the submodules
> >    * How to commit changes to submodule references
> >    * Which commit/branch to switch the submodule back to if the current work
> >      should be dropped: "Submodule "foo" no longer points to "main", 'git -C foo
> >      switch main' to discard changes"
> > 
> > What doesn't already work:
> > 
> >    * "git status" being super fast and actually possible to use.
> >      ** (That is, we've seen it move very slowly on projects with many
> >         submodules.)
> >    * Advice updates to use the appropriate submodule-y commands.
> 
> I would add that 'git status' should show the submodule as "rewind" if the
> currently checked out submodule commit is *behind* what's recorded in the current superproject
> commit. That is shown by 'git diff --submodule=<log | diff>' and 'git submodule summary'
> and is quite useful to prevent a following 'git commit -am' in the superproject to regress the submodule commit
> by mistake. It would be nice if 'git status' could also show this information (code in
> submodule.c::show_submodule_header).

Oh interesting, that's a good point. Thanks.

> 
> > 
> > -- From submodule
> > 
> > git status's behavior for submodules does not change compared to
> > single-repository Git, except that a red warning line will also display if the
> > superproject commit does not point to the HEAD of the submodule. (This could
> > look similar to the detached-HEAD warning and tracking branch lines in git
> > status today, e.g. "HEAD is ahead of parent project by 2 commits".)
> 
> That would be a nice addition :)
> 
> > 
> > What doesn't already work:
> > 
> >    * "git status" from a submodule being aware of the superproject.
> > 
> > 
> > - git push
> > 
> > -- From superproject
> > 
> > Ideally, a push of the superproject commit results in a push of each submodule
> > which changed, to the appropriate Gerrit upstream. Commits pushed this way
> > across submodules should somehow be associated in the Gerrit UI, similar to the
> > "submitted together" display. This will need some work to make happen.
> > 
> > What doesn't already work:
> > 
> >    * Automatically setting Gerrit topic (with a hook)
> >    * "push --recurse-submodules" knowing where to push to in submodules to
> >      initiate a Gerrit review
> >      ** From `branch` field in .gitmodules?
> >      ** Gerrit accepting 'git push -o review origin main' pushes?
> >      ** Review URL with a remote helper that rewrites refs/heads/main to
> >         refs/for/main?
> >      ** Need UX help
> 
> It would be nice if 'git push' would not force users to use the same
> remote names and branch names in the superproject and the submodule.
> Previous discussion around this that I had spotted are at [7] and [8].
> 
> > 
> > -- From submodule
> > No change to client behavior is needed. With Gerrit submodule subscriptions, the
> > server knows how to generate superproject commits when merging submodule
> > commits.
> > 
> > - git pull / git rebase
> > 
> > Note: We're still thinking about this one :)
> > 
> > 1. Performs a fetch as described above
> > 2. For each superproject commit, replay the submodule commits against the newly
> >     updated submodule base; then, make a new superproject commit containing those
> >     changes
> > 
> > What doesn't already work:
> > 
> >    * Rewriting gitlinks in a superproject commit when 'rebase
> >      --recurse-submodules'-ing
> >    * Resuming after resolving a conflict during rebase
> 
> In general, rebase is not well aware of 'submodule.recurse'. Even if you do not
> need to rewrite superproject commits, there are a couple of use cases that are broken
> right now:
> 
> - 'git rebase upstream/master' when upstream updated the submodule, will correctly
> (recursively) checkout upstream/master before starting the rebase, but upon
> 'git rebase --abort', the submodule will stay checked out at the commit recorded in
> 'upstream/master', which is confusing. This only happens when 'submodule.recurse' is true (!).
> - 'git rebase -i' which stops at a commit 'A' where the submodule commit is changed,
> does not correctly check out the submodule tree. It's checked out at the commit recorded in A~1
> (and this also only happens if submodule.recurse is true)
> - In some cases, like 'rebase -i'-ing across the addition of new submodules, at the end
> of the rebase the submodules are empty, and 'git submodule update' must be run to
> re-populate them.

Interesting, thanks for pointing these out.

> 
> > 
> > - git merge
> > 
> > The story for merges is a little bit muddled... and for our goals we don't need
> > it for quite a while, so we haven't thought much about it :) Any suggestions
> > folks have about reasonable ways to 'git merge --recurse-submodules' are totally
> > welcome. For now, though, we'll probably just stick in some error message saying
> > that merges with submodules aren't currently supported (maybe we will even add
> > that downstream).
> 
> What is "downstream" here ?

"Downstream" meaning the version (fork? ehh) of Git that we build and
ship to developers at Google. We carry a handful of patches - mostly for
stuff that only makes sense internally, like certain transports or
authentication helpers - and occasionally experimental stuff (for
example, we ship config-based hooks to Googlers this way right now). If
we're expecting "No, you can't merge with submodules!" to be a temporary
error message, then it might not make sense to try and upstream that
error string at all.



Thanks for the thorough read and all the pointers, I really appreciate
it.

 - Emily


* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-18  5:22 ` Christian Couder
@ 2021-04-20 23:10   ` Emily Shaffer
  0 siblings, 0 replies; 18+ messages in thread
From: Emily Shaffer @ 2021-04-20 23:10 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Ævar Arnfjörð Bjarmason, Jonathan Nieder,
	albertcui, Junio C Hamano, Matheus Tavares Bernardino,
	Shourya Shukla

On Sun, Apr 18, 2021 at 07:22:07AM +0200, Christian Couder wrote:
> 
> Hi Emily,
> 
> On Sat, Apr 17, 2021 at 1:39 AM Emily Shaffer <emilyshaffer@google.com> wrote:
> >
> > Hi folks,
> >
> > As hinted by a couple recent patches, I'm planning on some pretty big submodule
> > work over the next 6 months or so - and Ævar pointed out to me in
> > https://lore.kernel.org/git/87v98p17im.fsf@evledraar.gmail.com that I probably
> > should share some of those plans ahead of time. :) So attached is a lightly
> > modified version of the doc that we've been working on internally at Google,
> > focusing on what we think would be an ideal submodule workflow.
> 
> Thanks for sharing this doc! My main concern with this is that we are
> likely to have a GSoC student working soon on finishing to port `git
> submodule` to C code. And I wonder how that would interact with your
> work.

I discussed this a little with Jonathan N and Albert and we think it
probably won't matter too much. If anything, I expect mostly we would
touch the submodule--helper, and not the 'git submodule' builtin. But
just in case - it would be useful if any GSoC student were publishing
their code to a feature branch (on a fork, maybe) so that I could keep
an eye out for possible conflicts that way. Or, at very least, CCing me
and Jonathan N on patches :)

 - Emily


* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-20 23:03   ` Emily Shaffer
@ 2021-04-20 23:30     ` Junio C Hamano
  2021-04-21  2:27     ` Philippe Blain
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2021-04-20 23:30 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Philippe Blain, git, avarab, jrnieder, albertcui, matheus.bernardino

Emily Shaffer <emilyshaffer@google.com> writes:

> This is actually a point we discussed internally and I cut out of the
> doc before sharing, because it is very far down our roadmap (not
> expecting to address until probably the second half of the year). As I
> understand it, this can also be achieved today by setting
> 'submodule.path/to/module.active = false' in the superproject's
> .git/config.

Yeah, I think we also added support to choose which submodules can
be "active" based on the attributes system.
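
To make the precedence between these settings concrete, here is a minimal
sketch in Python, assuming the order described in gitsubmodules(7): an
explicit submodule.<name>.active wins, then the submodule.active pathspec,
then the presence of submodule.<name>.url. The `is_active` function and the
dict-based `config` are hypothetical stand-ins, and `fnmatch` is only a rough
substitute for real pathspec matching (attribute-based selection is not
modeled at all).

```python
from fnmatch import fnmatch


def is_active(name, path, config):
    """Toy model of how a submodule is judged active.

    `config` is a plain dict standing in for the superproject's
    .git/config; it is an assumption of this sketch, not a Git API.
    """
    # 1. An explicit per-submodule boolean takes precedence.
    explicit = config.get(f"submodule.{name}.active")
    if explicit is not None:
        return explicit

    # 2. Otherwise, if submodule.active is set, the path must match one
    #    of its patterns (fnmatch approximates pathspec matching here).
    patterns = config.get("submodule.active")
    if patterns is not None:
        return any(fnmatch(path, pattern) for pattern in patterns)

    # 3. Fall back to whether a URL has been configured for the submodule.
    return f"submodule.{name}.url" in config


config = {
    "submodule.path/to/module.active": False,
    "submodule.path/to/module.url": "https://example.com/module.git",
}
print(is_active("path/to/module", "path/to/module", config))
```

With the explicit `active = false` entry present, the URL no longer matters,
which is the behaviour the config setting mentioned above relies on.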

There are many ways to apply band-aid to a tree that should have
been a monolithic single repository but has been split into many
submodules only because we historically did not scale well.  As you
mentioned, sparse-checkout and lazy/partial cloning may change the
picture drastically: not only may "sparse" allow such "a set of
artificially split out submodules" to be selectively populated, but
it may let you more directly clone and work with only the parts you
are interested in within a monolithic repository.


* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-20 23:03   ` Emily Shaffer
  2021-04-20 23:30     ` Junio C Hamano
@ 2021-04-21  2:27     ` Philippe Blain
  1 sibling, 0 replies; 18+ messages in thread
From: Philippe Blain @ 2021-04-21  2:27 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, avarab, jrnieder, albertcui, gitster, matheus.bernardino



Le 2021-04-20 à 19:03, Emily Shaffer a écrit :

>>> When it's time to update their local repo, the user can do so as with a
>>> single-repo project. First they can 'git checkout main && git pull' (or 'git
>>> pull -r'); Git will first checkout the branches associated with main in each
>>> submodule, then fetch and merge/rebase in each submodule appropriately.
>>
>> What if some submodule does not use the same branch name for their primary integration branch?
>> Sometimes as a superproject using another project as a submodule, you do not
>> control that...
> 
> Yeah, you're right that that's an important consideration - "how can I
> teach my superproject to default to a different branch than the name of
> the superproject's current branch?" I wonder whether the branch config
> in .gitmodules (or an equivalent in superproject's .git/config) would
> make sense to try and use here?

I think it depends on the workflow. Re-reading the above, I would definitely *not* want
'git pull --recurse-submodules' in the superproject to go into each submodule
and do 'git pull' there ! Because maybe some submodule introduced breaking changes
in its API or something and I do not want to deal with that now; I just want to update my tree
with the latest changes *to the superproject* (and maybe to the submodules *if* they
were updated by the superproject, but not if they were updated in the submodule upstream project).
For me, 'git pull --recurse-submodules'
has mostly the right behaviour today, except what does not work (doing something useful
when both sides record changes to the submodule pointer).


> 
>> Also, usually tracking info is only set
>> automatically when using the form 'git checkout -b new-branch upstream/master' or
>> the like. Do you also propose that 'git checkout -b new-branch', by itself, should
>> automatically set tracking info ?
> 
> Yes - that is an approach that we want to explore, to solve the general
> push/fetch remote+branch problem.

Yeah, it would be nice if the triangular workflow capabilities of Git would be expanded
(if I understand correctly that's what you are hinting at here). My personal TODO list
for that has the following items (just dumping that here in case it's useful to someone):

# improve UI/UX around 'branch.pushRemote' and 'remote.pushDefault'
- git branch --verbose could show difference with @{push} in addition to / instead of @{upstream}
- git status "
- git prompt "
- add config branch.<name>.pushBranch (or pushRef)
- add 'git branch --set-push-to remote/name' to set branch.name.pushRemote and branch.name.pushRef
- add 'git push -p <remote> <branch>' to set 'branch.name.pushRemote' and 'branch.name.pushRef' (and warn if push.default is not 'current') OR:
- allow 'branch.pushRemote' and 'remote.pushDefault' to work if push.default=simple
- reword push.default section in git-config  (very unclear)

https://lore.kernel.org/git/87d0q72du2.fsf@javad.com/t/#u
https://lore.kernel.org/git/20130607124146.GF28668@sociomantic.com/t/#u
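
For reference, the existing precedence between these settings can be sketched
as below. This is a toy model in Python, assuming the order documented in
git-config(1): branch.<name>.pushRemote overrides remote.pushDefault, which
overrides branch.<name>.remote, with "origin" as the last resort. The
`push_remote` helper and the dict-based `config` are hypothetical, and the
sketch deliberately ignores push.default.

```python
def push_remote(branch, config):
    """Return the remote a plain 'git push' would target for `branch`.

    `config` is a dict standing in for Git configuration (an assumption
    of this sketch). Keys are looked up exactly as spelled here.
    """
    candidates = (
        f"branch.{branch}.pushRemote",  # most specific: per-branch push remote
        "remote.pushDefault",           # repository-wide push default
        f"branch.{branch}.remote",      # the fetch/upstream remote
    )
    for key in candidates:
        if key in config:
            return config[key]
    return "origin"


# Triangular workflow: fetch from 'upstream', push to a personal fork.
config = {
    "branch.topic.remote": "upstream",
    "remote.pushDefault": "myfork",
}
print(push_remote("topic", config))
```

A per-branch "pushBranch"/"pushRef" setting as proposed in the list above
would add a second lookup of the same shape for the destination ref.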


Cheers,

Philippe.


* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-20 18:47       ` Emily Shaffer
  2021-04-20 19:38         ` Randall S. Becker
@ 2021-04-21  6:57         ` Jacob Keller
  1 sibling, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2021-04-21  6:57 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Randall S. Becker, Git mailing list,
	Ævar Arnfjörð Bjarmason, Jonathan Nieder,
	Albert Cui, Junio C Hamano, Matheus Tavares Bernardino

On Tue, Apr 20, 2021 at 11:47 AM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Tue, Apr 20, 2021 at 09:18:05AM -0700, Jacob Keller wrote:
> >
> > On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker
> > <rsbecker@nexbridge.com> wrote:
> > > On April 19, 2021 3:15 PM, Jacob Keller wrote:
> > > > A sort of dream I had was a flow where I could do something from the parent
> > > > like "git blame <path/to/submodule>/submodule/file" and have it present a
> > > > blame of that file's contents keyed on the *parent* commit that changed the
> > > > submodule to have that line, as opposed to being forced to go into the
> > > > submodule and figure out what commit introduced it and then go back to the
> > > > parent and find out what commit changed the submodule to include that
> > > > submodule commit.
> > >
> > > Not going to disagree, but are you looking for the blame on the
> > > submodule ref file itself or files in the submodule? It's hard to teach
> > > git to do a blame on a one-line file.
> > >
> >
> > Well, I would like if "git blame <path/to/submodule>" did.. something
> > other than just fail. Sometimes my brain is working in a "blame where
> > this came from" and I type that out and then get frustrated when it
> > fails. Additionally...
> >
> > > Otherwise, and I think this is what you really are going for, teaching
> > > it to do a blame based on "git blame <path/to/submodule>/submodule/file"
> > > would be very nice and abstracts out the need for the user (or more
> > > importantly to me = scripts) to understand that a submodule is involved;
> > > however, it is opening up a very large door: "should/could we teach git
> > > to abstract submodules out of every command". This would potentially
> > > replace a significant part of the use cases for the "git submodule
> > > foreach" sub-command. In your ask, the current paradigm
> > > "cd <path/to/submodule>/submodule && git blame file" or pretty much every
> > > other command does work, but it requires the user/script to know you have
> > > a submodule in the path. So my question is: is this worth the effort? I
> > > don't have a good answer to that question. Half of my brain would like
> > > this very much/the other half is scared of the impact to the code.
> > >
> > > Just my musings.
> >
> > I'm not asking for "git blame <path/to/submodule>/<file>" to give
> > the same output as "cd <path/to/submodule> && git blame <file>"
> >
> > What I'm asking is: given this file, tell me in which commit in the
> > parent the line got introduced. So basically I want to walk over
> > the changes to the submodule pointer and find out when it got
> > introduced into the parent, not when it got introduced into the
> > submodule itself.
> >
> > This is a related question, but it is actually not trivial to go
> > instantly from "it was in xyz submodule commit" to "it was then pulled
> > in by xyz parent commit". It's something that is quite tedious to do
> > manually, especially since the submodule pointer could change
> > arbitrarily so knowing the submodule commit doesn't mean you can
> > simply grep for which commit set the submodule exactly to that commit.
> > Essentially, I want a 'git blame' that ignores all changes which
> > aren't actually submodule pointer updates.
> >
> > I think that's something that is much harder to do manually, but feels
> > like it should be relatively simple to implement within the blame
> > algorithm. I don't feel like this is something strictly replaceable by
> > "git submodule foreach"
>
> I think I understand what you're saying. Something like the following
> tree:
>
> super   sub
> b------->4
>          3
>          2
> a------->1
>
> producing something like this:
>
> 'git -C sub blame main.c'
>
> 1 AU Thor       2020-01-01
> 2 CO Mitter     2020-01-02              int main() {
> 4 AU Thor       2020-01-04                printf("Hello world!\n");
> 3 Dev E         2020-01-03                return 0;
> 2 CO Mitter     2020-01-02              }
>
> and
> 'git blame sub/main.c'
>
> a Mai N         2020-01-01
> b Senior Dev    2020-01-04              int main() {
> b Senior Dev    2020-01-04                printf("Hello world!\n");
> b Senior Dev    2020-01-04                return 0;
> b Senior Dev    2020-01-04              }
>
> or to put it another way: if we are treating superproject commit as "the
> whole feature", then it could be useful to see "which feature added this
> change" instead of "which atomic commit inside a feature added this
> change".
>

Right. I often want to find out when some change actually made it into
the super project.

> To me, it sounds expensive to compute... wouldn't you  need to say, for
> each blame line, "is this commit an ancestor of the commit associated in
> THIS superproject commit? ...how about the next superproject commit?"
> But I also don't have much experience with the blame implementation so
> maybe I'm thinking naively :) :)

Well I imagine it has to be similar to how we compute the blame for a
regular file? I imagine we start at some commit and walk backwards up
the tree, no?

I imagine the current blame algorithm starts from the current commit
and walks backwards through the commit history, determining which
commit was last to have a given line.

In the submodule case I highlighted, we would be doing the same thing:
Follow the super project history. When you find a submodule file, pull
its contents from the matching submodule commit that the parent
history saw. No need to dig any further into the submodule commit
history, just give me that contents and then I can treat it as if that
contents was what was in the super project for this commit, and use
the normal blame algorithm.

It's much more difficult to do that manually (hence why we invented
blame/annotate in the first place), and trying to go from "git -C
<submodule> blame file" to then figure out which super project commit
introduced the change is also tedious and non-trivial considering you
might now have intermediate or unrelated changes (i.e. it's actually
possible that that particular commit *never* made it into the super
project at all, because it got skipped over, and it might even be
after the file got re-written)

My idea for how blame of submodules would work is to essentially pretend as
if you had subtree merged the contents of the submodule into regular
parent project files with those paths, and then do blame on that using
just the parent project history.... If that makes sense?
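
To illustrate the idea, here is a small sketch in Python. It is a toy model
under loud assumptions, not how 'git blame' is implemented: history is
linear, each commit is a full snapshot of one file, and the blame walks
forward from the oldest snapshot using difflib (the real algorithm walks
backwards). The names `blame_linear`, `superproject_blame`, `SUB_SNAPSHOTS`
and `SUPER_HISTORY` are all made up for this example; the data mirrors the
a/b and 1..4 diagram quoted above.

```python
from difflib import SequenceMatcher


def blame_linear(snapshots):
    """Blame the newest snapshot of a file over a linear history.

    `snapshots` is a list of (commit_id, lines), oldest first.
    Returns a list of (commit_id, line) for the newest snapshot.
    """
    blamed, prev_lines = [], []
    for commit_id, lines in snapshots:
        # Start by blaming every line on this commit...
        new_blamed = [(commit_id, line) for line in lines]
        # ...then carry over the blame for lines that survived unchanged.
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for a_start, b_start, size in matcher.get_matching_blocks():
            for k in range(size):
                new_blamed[b_start + k] = blamed[a_start + k]
        blamed, prev_lines = new_blamed, lines
    return blamed


def superproject_blame(super_history, sub_snapshots):
    """Blame a submodule file keyed on superproject commits.

    For each superproject commit, substitute the file contents from the
    submodule commit its gitlink points at, then blame as usual. The
    submodule's own intermediate commits are never visited.
    """
    snapshots = [(super_id, sub_snapshots[sub_id])
                 for super_id, sub_id in super_history]
    return blame_linear(snapshots)


# Contents of sub/main.c at submodule commits 1..4 (hypothetical data).
SUB_SNAPSHOTS = {
    "1": [""],
    "2": ["", "int main() {", "}"],
    "3": ["", "int main() {", "  return 0;", "}"],
    "4": ["", "int main() {", '  printf("Hello world!\\n");',
          "  return 0;", "}"],
}
# Superproject commits and the submodule commit each one records.
SUPER_HISTORY = [("a", "1"), ("b", "4")]

for commit_id, line in superproject_blame(SUPER_HISTORY, SUB_SNAPSHOTS):
    print(commit_id, line)
```

In this model the cost is one diff per superproject commit that touched the
gitlink, and no ancestry queries against the submodule history are needed.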

>
> And even if it is expensive, considering that Jacob and Randall both had
> different ideas of what their ideal 'git blame' recursive behavior would
> be, maybe it makes sense to use a flag to ask for the more expensive
> behavior, e.g. 'git blame --show-superproject-commit sub/main.c'?
>

Right I imagine that in some ways both are useful, and it depends on
the context of what you're looking for.

The reason I bring up the blame example is because the idea for what I
want is quite tedious to mimic by hand, and requires more than just a
simple git submodule foreach or a cd into the submodule to operate on
it as a standalone repository.

>  - Emily


* Re: RFC/Discussion - Submodule UX Improvements
  2021-04-16 23:36 RFC/Discussion - Submodule UX Improvements Emily Shaffer
                   ` (4 preceding siblings ...)
  2021-04-19 19:14 ` Jacob Keller
@ 2021-04-22 15:32 ` Jacob Keller
  5 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2021-04-22 15:32 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Git mailing list, Ævar Arnfjörð Bjarmason,
	Jonathan Nieder, Albert Cui, Junio C Hamano,
	Matheus Tavares Bernardino

On Fri, Apr 16, 2021 at 4:38 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> -- Create mode (git switch -c / git checkout -b)
>
> 1. The current worktree is checked for uncommitted changes to tracked files. The
>    current worktree of each submodule is also checked.
> 2. A new branch is created on the superproject; that branch's ref is pointed to
>    the current HEAD.
> 3. The new branch is checked out on the superproject.
> 4. A new branch with the same name is created on each submodule.
>   a. If there is a naming conflict, we could prompt the user to resolve it, or
>      we could just check out the branch by that name and print a warning to the
>      user with advice on how to solve it (cd submodule && git switch -c
>      different-branch-name HEAD@{1}). Maybe we could skip the warning/advice if
>      the tree is identical to the tree we would have used as the start point
>      (that is, the user switched branches in the submodule, then said "oh crap"
>      and went back and switched branches in the superproject).
>   b. Tracking info is set appropriately on each new branch to the upstream of
>      the branch referenced by the parent of the new superproject commit, OR to
>      the default branch's upstream.
> 5. The new branch is checked out on each of the submodules.
>
> What doesn't already work:
>
>   * Safety check when leaving uncommitted submodule changes
>   * Propagating branch names to submodules currently requires a custom hacky
>     repolike patch
>   * Error handling + graceful non-error handling if the branch already exists
>   * "Knowing what branch to push to": copying over which-branch-is-upstream info
>     ** Needs some UX help, push.default is a mess
>   * Tracking info setups
>


As someone who uses submodules extensively for various projects, I'm
not sure about propagating branches into the submodules.

I think I'd only want this behavior if/when I intend to work on a
submodule. Because of the nature of submodules being distinct, we tend
towards doing submodule work separately, merging it, and then pulling
that change into the super project.
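
For readers following along, the quoted create-mode steps can be sketched as
a toy model in Python. Everything here is hypothetical (the `Repo` class, the
`create_branch_recursive` function, the field names); it models the "check
out the existing branch and warn" option from step 4a and leaves out the
tracking info from step 4b entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    name: str
    head: str                                      # commit currently checked out
    branches: dict = field(default_factory=dict)   # branch name -> commit
    current_branch: str = ""
    dirty: bool = False                            # uncommitted tracked changes?
    submodules: list = field(default_factory=list)


def create_branch_recursive(superproject, new_branch):
    """Create and check out `new_branch` in the superproject and submodules.

    Returns a list of warnings for submodules where the name already existed.
    """
    repos = [superproject, *superproject.submodules]

    # Step 1: refuse to proceed if any worktree has uncommitted changes.
    dirty = [repo.name for repo in repos if repo.dirty]
    if dirty:
        raise RuntimeError(f"uncommitted changes in: {', '.join(dirty)}")
    if new_branch in superproject.branches:
        raise RuntimeError(f"branch '{new_branch}' already exists")

    warnings = []
    for repo in repos:
        if new_branch in repo.branches:
            # Step 4a: naming conflict in a submodule; reuse the branch and warn.
            warnings.append(
                f"{repo.name}: branch '{new_branch}' already exists, "
                "checked it out as-is"
            )
        else:
            # Steps 2 and 4: new branch pointing at the current HEAD.
            repo.branches[new_branch] = repo.head
        # Steps 3 and 5: check the branch out.
        repo.current_branch = new_branch
        repo.head = repo.branches[new_branch]
    return warnings
```

An opt-in variant, as suggested above, would simply restrict `repos` to the
superproject plus the submodules the user has said they intend to work on.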

>
> -- Switching to an existing branch (git switch / git checkout)
>
> 1. The current worktree is checked for uncommitted changes to tracked files. The
>    current worktree of each submodule is also checked.
> 2. The requested branch is checked out on the superproject.
> 3. The submodule commit or branch referenced by the newly-checked-out
>    superproject commit is checked out on each submodule.
>
> What doesn't already work:
>
>   * Same as in create mode

I'd imagine there are multiple cases here. For cases where you're not
actively developing a submodule, you want to just checkout the right
contents (i.e. what is tracked by the super project). But if you're
developing the submodule in conjunction with the super project you
might want to instead checkout the matching work (as in above where
you create a branch within the submodule?)

> - git status
>
> -- From superproject
> The superproject is clean if:
>
>   * No tracked files in the superproject have been modified and not committed
>   * No tracked files in any submodules have been modified and not committed
>   * No commits in any submodules differ from the commits referenced by the tip
>     commit of the superproject
>
> Advices should describe:
>
>   * How to commit or drop changes to files in the superproject
>   * How to commit or drop changes to files in the submodules
>   * How to commit changes to submodule references
>   * Which commit/branch to switch the submodule back to if the current work
>     should be dropped: "Submodule "foo" no longer points to "main", 'git -C foo
>     switch main' to discard changes"
>
> What doesn't already work:
>
>   * "git status" being super fast and actually possible to use.
>     ** (That is, we've seen it move very slowly on projects with many
>        submodules.)
>   * Advice updates to use the appropriate submodule-y commands.
>

Yea, a slow status means people tend to not use it!
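
As a rough illustration of the three "clean" conditions quoted above, here is
a minimal Python sketch. The `Submodule` class and `unclean_reasons` function
are hypothetical names for this example only; a real implementation would
have to gather this state from each repository, which is where the slowness
comes from.

```python
from dataclasses import dataclass, field


@dataclass
class Submodule:
    path: str
    head: str       # commit currently checked out in the submodule
    recorded: str   # commit recorded by the superproject's tip commit
    modified: list = field(default_factory=list)  # modified tracked files


def unclean_reasons(super_modified, submodules):
    """Return why the superproject is not clean; an empty list means clean."""
    # Condition 1: modified tracked files in the superproject itself.
    reasons = [f"modified in superproject: {path}" for path in super_modified]
    for sub in submodules:
        # Condition 2: modified tracked files inside a submodule.
        reasons += [f"modified in {sub.path}: {path}" for path in sub.modified]
        # Condition 3: submodule HEAD differs from the recorded commit.
        if sub.head != sub.recorded:
            reasons.append(
                f"{sub.path} is at {sub.head}, "
                f"superproject records {sub.recorded}"
            )
    return reasons


subs = [Submodule("foo", head="abc", recorded="abc"),
        Submodule("bar", head="def", recorded="123", modified=["x.c"])]
for reason in unclean_reasons([], subs):
    print(reason)
```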

> -- From submodule
>
> git status's behavior for submodules does not change compared to
> single-repository Git, except that a red warning line will also display if the
> superproject commit does not point to the HEAD of the submodule. (This could
> look similar to the detached-HEAD warning and tracking branch lines in git
> status today, e.g. "HEAD is ahead of parent project by 2 commits".)
>
> What doesn't already work:
>
>   * "git status" from a submodule being aware of the superproject.
>

This seems like a very good improvement. One of the biggest complaints
about submodules I've had to deal with when helping coworkers is the
fact that submodules weren't moved forward automatically, and that
they had no real idea that the submodule was different. This tended to
lead towards commits including submodule rewinds by accident.


> - Worktrees
>
> When a user runs 'git worktree add' from the superproject, each submodule in the
> new worktree should also be created as a worktree of the corresponding submodule
> in the original project.
>
> What doesn't already work:
>
>   * worktrees and submodules getting along - submodules are now freshly cloned
>     when creating a superproject worktree

This is something I would love to see fixed!  Right now using
worktrees on a project with submodules is problematic, especially for
a project with many submodules, as this ends up making many extra
clones, which takes disk space and network time to set up.

>
> - git clone --reference [--dissociate]
>
> When cloning with an alternate directory, submodules should also try to use
> object stores associated with the referenced project instead of cloning from
> their remotes right away. It is unclear how much of this works today.
>
>
> What doesn't already work:
>
>   * Writing some tests and making them pass


Thread overview: 18+ messages
2021-04-16 23:36 RFC/Discussion - Submodule UX Improvements Emily Shaffer
2021-04-18  5:22 ` Christian Couder
2021-04-20 23:10   ` Emily Shaffer
2021-04-19  3:20 ` Philippe Blain
2021-04-20 23:03   ` Emily Shaffer
2021-04-20 23:30     ` Junio C Hamano
2021-04-21  2:27     ` Philippe Blain
2021-04-19 12:56 ` Randall S. Becker
2021-04-19 12:56 ` Aaron Schrab
2021-04-20 18:49   ` Emily Shaffer
2021-04-20 19:29     ` Randall S. Becker
2021-04-19 19:14 ` Jacob Keller
2021-04-19 19:28   ` Randall S. Becker
2021-04-20 16:18     ` Jacob Keller
2021-04-20 18:47       ` Emily Shaffer
2021-04-20 19:38         ` Randall S. Becker
2021-04-21  6:57         ` Jacob Keller
2021-04-22 15:32 ` Jacob Keller
