All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Tim Renouf <tpr.ll@botech.co.uk>,
	newren@gmail.com, dstolee@microsoft.com, git@vger.kernel.org
Subject: Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
Date: Tue, 1 Jun 2021 22:00:50 -0400	[thread overview]
Message-ID: <f6d39636-308c-c846-55b5-3f16a155e69d@gmail.com> (raw)
In-Reply-To: <20210601183106.3084008-1-tpr.ll@botech.co.uk>

On 6/1/2021 2:31 PM, Tim Renouf wrote:
> When doing a checkout (or other index merge from a tree) causes the
> removal of a path that is outside sparse-checkout, the file is removed
> from the working tree, even if it is dirty.
> 
> That is probably what you want if the file got there by being
> materialized in a merge conflict. But it is not what you want if you
> deliberately put the file there.
> 
> This commit adds the above config item, defaulting to "true" to get the
> old behavior. Set it to "false" to avoid removing such a file from the
> worktree.

I don't think config is necessary here. This behavior is niche
enough to create a behavior-breaking change. However, we do want
to ensure that the full flow of eventually clearing the file when
clean is handled.

> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index a0eeaeb02e..31adb38b1d 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -111,6 +111,17 @@ the sparse-checkout file.
>  To repopulate the working directory with all files, use the
>  `git sparse-checkout disable` command.
>  
> +The `core.sparseCheckoutRmFiles` config setting

If we _are_ going to go with a config option, then I'm not a big
fan of this name. I've also been thinking that the sparse-checkout
config has been squatting in the "core.*" space too long. Perhaps
it is time to create its own section?

What about something like sparseCheckout.keepDirtyFiles, which
defaults to false?

Alternatively, we could add things to the index.* space. We
already have "index.sparse" for the sparse index feature. For this
topic, a config option "index.keepDirtySparseFiles" would have a
similar meaning to my other suggestion.

At the least, you would need to update the appropriate file in
Documentation/config/*.txt.

>  extern int core_apply_sparse_checkout;
>  extern int core_sparse_checkout_cone;
> +extern int core_sparse_checkout_rm_files;

These previous variables being global is unfortunate and it
might be time to fix that. At minimum, I think this new
option might be able to inject somewhere else without
running a lot of git_config_get_value() calls in a loop.

Since your changes are within unpack-trees.c, maybe adding
a flag to 'struct unpack_trees_options' would suffice. It
could be initialized in unpack_trees() so only happen once
per index update.

> +test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '

This test name is too long. Perhaps

	'sparse-checkout removes dirty non-matching file'

> +	git config core.sparseCheckout false &&
> +	git checkout -f top &&
> +	echo added3 >added3 &&
> +	git add added3 &&
> +	git commit -madded3 &&
> +	git checkout top &&
> +	test_path_is_missing added3 &&
> +	git config core.sparseCheckout true &&
> +	git config core.sparseCheckoutRmFiles true &&
> +	echo init.t >.git/info/sparse-checkout &&

Perhaps we could use a more modern approach, such as

	git sparse-checkout init &&
	git sparse-checkout set init.t &&

(and use 'git sparse-checkout disable' at the start of the
test.)

> +	git checkout HEAD@{1} &&

I'd typically expect 'git checkout HEAD~1' instead of
using the reflog, since it is more robust to changing
the test in the future. Better even to create a new
branch earlier and then switch between named branches.

> +	test_path_is_missing added3 &&
> +	echo dirty >added3 &&

I appreciate that you confirm the file is missing before
you create it.

> +	git checkout top &&
> +	test_path_is_missing added3

Thought: does this happen also with 'git sparse-checkout
reapply'?

> +'
> +
> +test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '

and here:

	'sparse-checkout keeps dirty non-matching file'

These tests are very similar. Perhaps they could be
grouped and just have a check at the end that makes
'added3' dirty and checks the behavior of 'git checkout'
with the two config settings?

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  
>  	/*
>  	 * 1. Pretend the narrowest worktree: only unmerged entries
> -	 * are checked out
> +	 * are checked out. If core.sparseCheckoutRmFiles is off, then
> +	 * treat a file being removed as merged, so it does not get
> +	 * removed from the worktree.
>  	 */
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> @@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  		if (select_flag && !(ce->ce_flags & select_flag))
>  			continue;
>  
> -		if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
> +		if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
> +		    ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
>  			ce->ce_flags |= skip_wt_flag;
>  		else
>  			ce->ce_flags &= ~skip_wt_flag;
> @@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  
>  	if (!o->skip_sparse_checkout) {
>  		/*
> -		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
> +		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
> +		 * If !core.sparseCheckoutRmFiles, include files being removed so ones
> +		 * outside sparse-checkout patterns do not get removed from the worktree.
>  		 * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
>  		 * so apply_sparse_checkout() won't attempt to remove it from worktree
>  		 */
> +		int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
>  		mark_new_skip_worktree(o->pl, &o->result,
> -				       CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
> +				       mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
>  				       o->verbose_update);

I'm a bit worried about this use of CE_REMOVE. I see its use listed in
cache-tree.c with this comment:

		/*
		 * CE_REMOVE entries are removed before the index is
		 * written to disk. Skip them to remain consistent
		 * with the future on-disk index.
		 */

This makes me think that the entries are actually not present in the
written index, which is incorrect. It will make it look like we have
staged a deletion of that file. Can you check that the output of
'git status' shows the file with no staged changes, but an unstaged
_modification_? Alternatively: the modification might be ignored since
it is a sparse entry, but we would probably want to fix that before
taking this change. If my understanding is correct*, then 'git status'
will show it as a staged deletion and an unstaged addition.

(*) This is a BIG IF.

Thank you for your interest here. Elijah is right that the area is
fraught with complexity, and I'm learning something new about it
every day as I work on my sparse-index feature. I'm still trying
to grasp the subtleties like this and their ramifications before
changing the existing behavior because I want to be _sure_ we are
moving in a more stable direction and not just another unstable
point that frustrates users.

This change seems like a focused attempt, but I think we need to
track down those other details before committing to such a change:

1. How does a user with a dirty, tracked, sparse file get back
   into a state where that file is deleted? What commands do
   they need to run? Can that be tested and added to the sparse-
   checkout documentation?

2. What does 'git status' look like when a user is in this state?
   Is that helpful?

Thanks,
-Stolee

  reply	other threads:[~2021-06-02  2:00 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28 20:14 bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Tim Renouf (open source)
2021-05-28 22:44 ` Elijah Newren
2021-06-01 18:31   ` [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config Tim Renouf
2021-06-02  2:00     ` Derrick Stolee [this message]
2021-06-02  2:32       ` Junio C Hamano
2021-06-02  5:53         ` Elijah Newren
2021-06-02  6:13           ` Tim Renouf (open source)
2021-06-02 23:41             ` Elijah Newren
2021-06-05 15:33               ` Tim Renouf (open source)
2021-06-02  1:37 ` bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f6d39636-308c-c846-55b5-3f16a155e69d@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=tpr.ll@botech.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.