git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout
@ 2021-05-28 20:14 Tim Renouf (open source)
  2021-05-28 22:44 ` Elijah Newren
  2021-06-02  1:37 ` bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Derrick Stolee
  0 siblings, 2 replies; 10+ messages in thread
From: Tim Renouf (open source) @ 2021-05-28 20:14 UTC (permalink / raw)
  To: git

Hi all

I have a bug report: git checkout deletes a worktree file even though it is excluded by sparse-checkout, even if it is dirty.

Watch this (starting in an empty directory):

$ git init
Initialised empty Git repository in /mnt/amd/home/trenouf/amd/tmp/git/.git/
$ echo file1 >file1; echo file2 >file2
$ git add file1 file2
$ git commit -m"commit 1"
[master (root-commit) 4f7035d] commit 1
 2 files changed, 2 insertions(+)
 create mode 100644 file1
 create mode 100644 file2
$ git rm file2
rm 'file2'
$ git commit -m"rm file2"
[master d025c99] rm file2
 1 file changed, 1 deletion(-)
 delete mode 100644 file2
$ git checkout master~1
HEAD is now at 4f7035d commit 1
$ git sparse-checkout set /file1
$ ls
file1
$ echo dirty >file2
$ ls
file1  file2
$ git checkout master
Previous HEAD position was 4f7035d commit 1
Switched to branch 'master'
$ ls
file1

I set up sparse-checkout to include only file1, not file2. file2 is now not in the worktree, even though it is in the commit I am checked out at. Then I create file2 with arbitrary content. Then a git checkout switching to the commit where file2 is removed also deletes it from the worktree.

I assert that file2 should be left untouched by that checkout, because it is excluded by sparse-checkout. I guess file2 had its skip-worktree bit set before the checkout that removed it from the index; that should stop it being deleted in the worktree.

To be clear, I expect that last “ls” to still show “file1  file2”.

Thank you for your attention if you have got this far.

-tpr

[System Info]
git version:
git version 2.31.1
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.8.0-53-generic #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021 x86_64
compiler info: gnuc: 9.3
libc info: glibc: 2.31
$SHELL (typically, interactive shell): /bin/bash


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout
  2021-05-28 20:14 bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Tim Renouf (open source)
@ 2021-05-28 22:44 ` Elijah Newren
  2021-06-01 18:31   ` [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config Tim Renouf
  2021-06-02  1:37 ` bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Derrick Stolee
  1 sibling, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2021-05-28 22:44 UTC (permalink / raw)
  To: Tim Renouf (open source); +Cc: Git Mailing List

Hi,

On Fri, May 28, 2021 at 1:46 PM Tim Renouf (open source)
<tpr.ll@botech.co.uk> wrote:
>
> Hi all
>
> I have a bug report: git checkout deletes a worktree file even though it is excluded by sparse-checkout, even if it is dirty.
>
> Watch this (starting in an empty directory):
>
> $ git init
> Initialised empty Git repository in /mnt/amd/home/trenouf/amd/tmp/git/.git/
> $ echo file1 >file1; echo file2 >file2
> $ git add file1 file2
> $ git commit -m"commit 1"
> [master (root-commit) 4f7035d] commit 1
>  2 files changed, 2 insertions(+)
>  create mode 100644 file1
>  create mode 100644 file2
> $ git rm file2
> rm 'file2'
> $ git commit -m"rm file2"
> [master d025c99] rm file2
>  1 file changed, 1 deletion(-)
>  delete mode 100644 file2
> $ git checkout master~1
> HEAD is now at 4f7035d commit 1
> $ git sparse-checkout set /file1
> $ ls
> file1
> $ echo dirty >file2
> $ ls
> file1  file2
> $ git checkout master
> Previous HEAD position was 4f7035d commit 1
> Switched to branch 'master'
> $ ls
> file1
>
> I set up sparse-checkout to include only file1, not file2. file2 is now not in the worktree, even though it is in the commit I am checked out at. Then I create file2 with arbitrary content. Then a git checkout switching to the commit where file2 is removed also deletes it from the worktree.
>
> I assert that file2 should be left untouched by that checkout, because it is excluded by sparse-checkout. I guess file2 had its skip-worktree bit set before the checkout that removed it from the index; that should stop it being deleted in the worktree.
>
> To be clear, I expect that last “ls” to still show “file1  file2”.
>
> Thank you for your attention if you have got this far.

Thanks for the report.  It's another example of how
"skip-worktree-means-treat-the-file-as-matching-head" causes
confusion.  You can find more issues with
present-despite-skip-worktree files at
https://lore.kernel.org/git/CABPp-BF6GpoDtMfpzf=3VWL_puuRH-cNV=9KajdF1003Fe05jA@mail.gmail.com/,
including reasons they come up more often than you'd think.

We do need to get these fixed up, though I don't want to step on
Stolee's toes with his sparse-index work, and I'm hesitant to open new
big projects until merge-ort is complete.

> -tpr
>
> [System Info]
> git version:
> git version 2.31.1
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Linux 5.8.0-53-generic #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021 x86_64
> compiler info: gnuc: 9.3
> libc info: glibc: 2.31
> $SHELL (typically, interactive shell): /bin/bash

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-05-28 22:44 ` Elijah Newren
@ 2021-06-01 18:31   ` Tim Renouf
  2021-06-02  2:00     ` Derrick Stolee
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Renouf @ 2021-06-01 18:31 UTC (permalink / raw)
  To: newren, dstolee, git; +Cc: Tim Renouf

When doing a checkout (or other index merge from a tree) causes the
removal of a path that is outside sparse-checkout, the file is removed
from the working tree, even if it is dirty.

That is probably what you want if the file got there by being
materialized in a merge conflict. But it is not what you want if you
deliberately put the file there.

This commit adds the above config item, defaulting to "true" to get the
old behavior. Set it to "false" to avoid removing such a file from the
worktree.

Signed-off-by: Tim Renouf <tpr.ll@botech.co.uk>
---
Here is a fix for my problem, hidden under a config option as it might
not be what everyone wants (there are a few tests that rely on the
existing behavior). I realize this is a bit of a piecemeal approach.
Hopefully it will be superseded by the sparse-index work when that
arrives.

 Documentation/git-sparse-checkout.txt | 11 ++++++++
 cache.h                               |  1 +
 config.c                              |  5 ++++
 environment.c                         |  1 +
 t/t1011-read-tree-sparse-checkout.sh  | 36 +++++++++++++++++++++++++++
 unpack-trees.c                        | 14 ++++++++---
 6 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index a0eeaeb02e..31adb38b1d 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -111,6 +111,17 @@ the sparse-checkout file.
 To repopulate the working directory with all files, use the
 `git sparse-checkout disable` command.
 
+The `core.sparseCheckoutRmFiles` config setting determines what to do when a
+checkout (or other index merge from a tree) causes the removal of a path that
+is outside the sparse-checkout patterns but the file exists in the worktree
+anyway. The default is `true`, which causes such a file to be removed from the
+worktree, which is probably what you want to remove outside-sparse-checkout
+files that were materialized by a merge conflict. Setting it to `false` means
+that such a file is not removed, which is probably what you want if you
+deliberately have files in the outside-sparse-checkout part of a worktree.
+
+The behavior with regard to worktree files that are outside sparse-checkout
+patterns is likely to change in the future.
 
 FULL PATTERN SET
 ----------------
diff --git a/cache.h b/cache.h
index 6fda8091f1..19ee1cfc02 100644
--- a/cache.h
+++ b/cache.h
@@ -964,6 +964,7 @@ extern const char *core_fsmonitor;
 
 extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
+extern int core_sparse_checkout_rm_files;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index 6428393a41..dd24e753a8 100644
--- a/config.c
+++ b/config.c
@@ -1552,6 +1552,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.sparsecheckoutrmfiles")) {
+		core_sparse_checkout_rm_files = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.precomposeunicode")) {
 		precomposed_unicode = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index 2f27008424..cff6bbbe62 100644
--- a/environment.c
+++ b/environment.c
@@ -70,6 +70,7 @@ char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
 int core_sparse_checkout_cone;
+int core_sparse_checkout_rm_files = 1;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 24092c09a9..67690b31c3 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -280,4 +280,40 @@ test_expect_success 'checkout with --ignore-skip-worktree-bits' '
 	git diff --exit-code HEAD
 '
 
+test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '
+	git config core.sparseCheckout false &&
+	git checkout -f top &&
+	echo added3 >added3 &&
+	git add added3 &&
+	git commit -madded3 &&
+	git checkout top &&
+	test_path_is_missing added3 &&
+	git config core.sparseCheckout true &&
+	git config core.sparseCheckoutRmFiles true &&
+	echo init.t >.git/info/sparse-checkout &&
+	git checkout HEAD@{1} &&
+	test_path_is_missing added3 &&
+	echo dirty >added3 &&
+	git checkout top &&
+	test_path_is_missing added3
+'
+
+test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '
+	git config core.sparseCheckout false &&
+	git checkout -f top &&
+	echo added4 >added4 &&
+	git add added4 &&
+	git commit -madded4 &&
+	git checkout top &&
+	test_path_is_missing added4 &&
+	git config core.sparseCheckout true &&
+	git config core.sparseCheckoutRmFiles false &&
+	echo init.t >.git/info/sparse-checkout &&
+	git checkout HEAD@{1} &&
+	test_path_is_missing added4 &&
+	echo dirty >added4 &&
+	git checkout top &&
+	test_path_is_file added4
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 9298fe1d9b..cdc3915974 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
 
 	/*
 	 * 1. Pretend the narrowest worktree: only unmerged entries
-	 * are checked out
+	 * are checked out. If core.sparseCheckoutRmFiles is off, then
+	 * treat a file being removed as merged, so it does not get
+	 * removed from the worktree.
 	 */
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
@@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
 		if (select_flag && !(ce->ce_flags & select_flag))
 			continue;
 
-		if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
+		if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
+		    ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
 			ce->ce_flags |= skip_wt_flag;
 		else
 			ce->ce_flags &= ~skip_wt_flag;
@@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	if (!o->skip_sparse_checkout) {
 		/*
-		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
+		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
+		 * If !core.sparseCheckoutRmFiles, include files being removed so ones
+		 * outside sparse-checkout patterns do not get removed from the worktree.
 		 * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
 		 * so apply_sparse_checkout() won't attempt to remove it from worktree
 		 */
+		int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
 		mark_new_skip_worktree(o->pl, &o->result,
-				       CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
+				       mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
 				       o->verbose_update);
 
 		ret = 0;
-- 
2.32.0.rc2.3.g151f456769


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout
  2021-05-28 20:14 bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Tim Renouf (open source)
  2021-05-28 22:44 ` Elijah Newren
@ 2021-06-02  1:37 ` Derrick Stolee
  1 sibling, 0 replies; 10+ messages in thread
From: Derrick Stolee @ 2021-06-02  1:37 UTC (permalink / raw)
  To: Tim Renouf (open source), git

On 5/28/2021 4:14 PM, Tim Renouf (open source) wrote:
> Hi all
> 
> I have a bug report: git checkout deletes a worktree file even though it is excluded by sparse-checkout, even if it is dirty.

...

> I set up sparse-checkout to include only file1, not file2. file2 is now not in the worktree, even though it is in the commit I am checked out at. Then I create file2 with arbitrary content. Then a git checkout switching to the commit where file2 is removed also deletes it from the worktree.
>
> I assert that file2 should be left untouched by that checkout, because it is excluded by sparse-checkout. I guess file2 had its skip-worktree bit set before the checkout that removed it from the index; that should stop it being deleted in the worktree.

In this case, 'file2' exists in the index at 'master' and
'git checkout master' made the working directory appear as
if it was at that position with the sparse-checkout
patterns applied.

The confusion happens that Git behaves differently when a
file exists outside the sparse-checkout patterns based on
whether the file actually exists in the index or not. Only
an exact path match with a sparse index entry will cause
this deletion.

This is another reason why I prefer the "cone mode" patterns:
it is harder to get this kind of problem. You could easily
make it happen with "folder2/file" where "folder2" is not in
the cone mode sparse-checkout, but it's less likely in some
way.
 
> To be clear, I expect that last “ls” to still show “file1  file2”.

I can understand this expectation. I don't think that if
we changed this behavior that it would cause any confusion.
That is, as long as the file disappears when no longer dirty,
_and_ a command such as "git reset --hard" overwrites the
file. It is important to provide ways for users to remove
the file when they want it gone.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-06-01 18:31   ` [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config Tim Renouf
@ 2021-06-02  2:00     ` Derrick Stolee
  2021-06-02  2:32       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Derrick Stolee @ 2021-06-02  2:00 UTC (permalink / raw)
  To: Tim Renouf, newren, dstolee, git

On 6/1/2021 2:31 PM, Tim Renouf wrote:
> When doing a checkout (or other index merge from a tree) causes the
> removal of a path that is outside sparse-checkout, the file is removed
> from the working tree, even if it is dirty.
> 
> That is probably what you want if the file got there by being
> materialized in a merge conflict. But it is not what you want if you
> deliberately put the file there.
> 
> This commit adds the above config item, defaulting to "true" to get the
> old behavior. Set it to "false" to avoid removing such a file from the
> worktree.

I don't think config is necessary here. This behavior is niche
enough to create a behavior-breaking change. However, we do want
to ensure that the full flow of eventually clearing the file when
clean is handled.

> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index a0eeaeb02e..31adb38b1d 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -111,6 +111,17 @@ the sparse-checkout file.
>  To repopulate the working directory with all files, use the
>  `git sparse-checkout disable` command.
>  
> +The `core.sparseCheckoutRmFiles` config setting

If we _are_ going to go with a config option, then I'm not a big
fan of this name. I've also been thinking that the sparse-checkout
config has been squatting in the "core.*" space too long. Perhaps
it is time to create its own section?

What about something like sparseCheckout.keepDirtyFiles, which
defaults to false?

Alternatively, we could add things to the index.* space. We
already have "index.sparse" for the sparse index feature. For this
topic, a config option "index.keepDirtySparseFiles" would have a
similar meaning to my other suggestion.

At the least, you would need to update the appropriate file in
Documentation/config/*.txt.

>  extern int core_apply_sparse_checkout;
>  extern int core_sparse_checkout_cone;
> +extern int core_sparse_checkout_rm_files;

These previous variables being global is unfortunate and it
might be time to fix that. At minimum, I think this new
option might be able to inject somewhere else without
running a lot of git_config_get_value() calls in a loop.

Since your changes are within unpack-trees.c, maybe adding
a flag to 'struct unpack_trees_options' would suffice. It
could be initialized in unpack_trees() so only happen once
per index update.

> +test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '

This test name is too long. Perhaps

	'sparse-checkout removes dirty non-matching file'

> +	git config core.sparseCheckout false &&
> +	git checkout -f top &&
> +	echo added3 >added3 &&
> +	git add added3 &&
> +	git commit -madded3 &&
> +	git checkout top &&
> +	test_path_is_missing added3 &&
> +	git config core.sparseCheckout true &&
> +	git config core.sparseCheckoutRmFiles true &&
> +	echo init.t >.git/info/sparse-checkout &&

Perhaps we could use a more modern approach, such as

	git sparse-checkout init &&
	git sparse-checkout set init.t &&

(and use 'git sparse-checkout disable' at the start of the
test.)

> +	git checkout HEAD@{1} &&

I'd typically expect 'git checkout HEAD~1' instead of
using the reflog, since it is more robust to changing
the test in the future. Better even to create a new
branch earlier and then switch between named branches.

> +	test_path_is_missing added3 &&
> +	echo dirty >added3 &&

I appreciate that you confirm the file is missing before
you create it.

> +	git checkout top &&
> +	test_path_is_missing added3

Thought: does this happen also with 'git sparse-checkout
reapply'?

> +'
> +
> +test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '

and here:

	'sparse-checkout keeps dirty non-matching file'

These tests are very similar. Perhaps they could be
grouped and just have a check at the end that makes
'added3' dirty and checks the behavior of 'git checkout'
with the two config settings?

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  
>  	/*
>  	 * 1. Pretend the narrowest worktree: only unmerged entries
> -	 * are checked out
> +	 * are checked out. If core.sparseCheckoutRmFiles is off, then
> +	 * treat a file being removed as merged, so it does not get
> +	 * removed from the worktree.
>  	 */
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> @@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  		if (select_flag && !(ce->ce_flags & select_flag))
>  			continue;
>  
> -		if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
> +		if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
> +		    ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
>  			ce->ce_flags |= skip_wt_flag;
>  		else
>  			ce->ce_flags &= ~skip_wt_flag;
> @@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  
>  	if (!o->skip_sparse_checkout) {
>  		/*
> -		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
> +		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
> +		 * If !core.sparseCheckoutRmFiles, include files being removed so ones
> +		 * outside sparse-checkout patterns do not get removed from the worktree.
>  		 * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
>  		 * so apply_sparse_checkout() won't attempt to remove it from worktree
>  		 */
> +		int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
>  		mark_new_skip_worktree(o->pl, &o->result,
> -				       CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
> +				       mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
>  				       o->verbose_update);

I'm a bit worried about this use of CE_REMOVE. I see its use listed in
cache-tree.c with this comment:

		/*
		 * CE_REMOVE entries are removed before the index is
		 * written to disk. Skip them to remain consistent
		 * with the future on-disk index.
		 */

This makes me think that the entries are actually not present in the
written index, which is incorrect. It will make it look like we have
staged a deletion of that file. Can you check that the output of
'git status' shows the file with no staged changes, but an unstaged
_modification_? Alternatively: the modification might be ignored since
it is a sparse entry, but we would probably want to fix that before
taking this change. If my understanding is correct*, then 'git status'
will show it as a staged deletion and an unstaged addition.

(*) This is a BIG IF.

Thank you for your interest here. Elijah is right that the area is
fraught with complexity, and I'm learning something new about it
every day as I work on my sparse-index feature. I'm still trying
to grasp the subtleties like this and their ramifications before
changing the existing behavior because I want to be _sure_ we are
moving in a more stable direction and not just another unstable
point that frustrates users.

This change seems like a focused attempt, but I think we need to
track down those other details before committing to such a change:

1. How does a user with a dirty, tracked, sparse file get back
   into a state where that file is deleted? What commands do
   they need to run? Can that be tested and added to the sparse-
   checkout documentation?

2. What does 'git status' look like when a user is in this state?
   Is that helpful?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-06-02  2:00     ` Derrick Stolee
@ 2021-06-02  2:32       ` Junio C Hamano
  2021-06-02  5:53         ` Elijah Newren
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2021-06-02  2:32 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Tim Renouf, newren, dstolee, git

Derrick Stolee <stolee@gmail.com> writes:

>> This commit adds the above config item, defaulting to "true" to get the
>> old behavior. Set it to "false" to avoid removing such a file from the
>> worktree.
>
> I don't think config is necessary here. This behavior is niche
> enough to create a behavior-breaking change. However, we do want
> to ensure that the full flow of eventually clearing the file when
> clean is handled.

I didn't have a chance to get around to commenting on the patch
(instead I was preparing for -rc3), but you covered pretty much
everything I wanted to say.  It is unusual for those who are using
sparse checkout to have a modified (=tracked and dirty) file that
shouldn't be there, and making sure the user notices these unusual
files, instead of silently losing the changes to them, is probably a
"bugfix".

An explicit request to destructively overwrite the path ("git
restore -- path") or remove the working tree file along with
modification ("git reset --hard") is a good thing to have, but
the branch switching "git switch" is supposed to preserve local
modification (or fail to switch), whether the dirty path is inside
or outside the sparse checkout area.

> If we _are_ going to go with a config option, then I'm not a big
> fan of this name. I've also been thinking that the sparse-checkout
> config has been squatting in the "core.*" space too long. Perhaps
> it is time to create its own section?
>
> What about something like sparseCheckout.keepDirtyFiles, which
> defaults to false?

What about not having a configuration?

> 1. How does a user with a dirty, tracked, sparse file get back
>    into a state where that file is deleted? What commands do
>    they need to run? Can that be tested and added to the sparse-
>    checkout documentation?
>
> 2. What does 'git status' look like when a user is in this state?
>    Is that helpful?

Good points.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-06-02  2:32       ` Junio C Hamano
@ 2021-06-02  5:53         ` Elijah Newren
  2021-06-02  6:13           ` Tim Renouf (open source)
  0 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2021-06-02  5:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Tim Renouf, Derrick Stolee, Git Mailing List

Derrick and Junio already made some really good points, I'll just try
to add a few comments.

On Tue, Jun 1, 2021 at 7:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Derrick Stolee <stolee@gmail.com> writes:
>
> >> This commit adds the above config item, defaulting to "true" to get the
> >> old behavior. Set it to "false" to avoid removing such a file from the
> >> worktree.
> >
> > I don't think config is necessary here. This behavior is niche
> > enough to create a behavior-breaking change. However, we do want
> > to ensure that the full flow of eventually clearing the file when
> > clean is handled.
>
> I didn't have a chance to get around to commenting on the patch
> (instead I was preparing for -rc3), but you covered pretty much
> everything I wanted to say.  It is unusual for those who are using
> sparse checkout to have a modified (=tracked and dirty) file that
> shouldn't be there, and making sure the user notices these unusual
> files, instead of silently losing the changes to them, is probably a
> "bugfix".
>
> An explicit request to destructively overwrite the path ("git
> restore -- path") or remove the working tree file along with
> modification ("git reset --hard") is a good thing to have, but
> the branch switching "git switch" is supposed to preserve local
> modification (or fail to switch), whether the dirty path is inside
> or outside the sparse checkout area.

Yes, I think this is the best phrasing of the bug.  Since
SKIP_WORKTREE was originally implemented as "assume file contents
match HEAD" (which is sometimes a reasonable first approximation to
the intended behavior, but which keeps leading us astray into bad
behavior), it's pretty easy to see what's likely happening here:
unpack-trees.c is almost certainly just assuming that the file
contents match HEAD and not even bothering to verify that assumption
before deleting the file.  We should make it verify when a file is
SKIP_WORKTREE.

> > If we _are_ going to go with a config option, then I'm not a big
> > fan of this name. I've also been thinking that the sparse-checkout
> > config has been squatting in the "core.*" space too long. Perhaps
> > it is time to create its own section?
> >
> > What about something like sparseCheckout.keepDirtyFiles, which
> > defaults to false?
>
> What about not having a configuration?

I agree; we all concur that the reported behavior is a bug; adding a
config option to turn a bug off for some people just doesn't make any
sense.  Let's fix the bug instead.

I'm also worried that making a config option may have masked subtle
bugs in the patch that the rest of the testsuite would have turned up.

> > 1. How does a user with a dirty, tracked, sparse file get back
> >    into a state where that file is deleted? What commands do
> >    they need to run? Can that be tested and added to the sparse-
> >    checkout documentation?

Yeah, I've wondered if 'git sparse-checkout reapply' should do
something special here...or whether additional subcommands might be
needed.  I'm still not quite sure.

> > 2. What does 'git status' look like when a user is in this state?
> >    Is that helpful?

I'm worried that trying to make 'git status' report these files would
be expensive and defeat some of the advantages of a sparse-checkout.
Even in cone mode, and even if we could just stat the leading
directories to see they aren't there, would we still end up iterating
over all the index entries just to verify that their leading directory
is missing so we don't have to stat it?

Whereas checkout, if it might end up deleting something, it makes
perfect sense to pay the cost of a stat at that point and throw an
error message and abort if things aren't clean.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-06-02  5:53         ` Elijah Newren
@ 2021-06-02  6:13           ` Tim Renouf (open source)
  2021-06-02 23:41             ` Elijah Newren
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Renouf (open source) @ 2021-06-02  6:13 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Derrick Stolee, Derrick Stolee, Git Mailing List

Hi all

Thanks for the reviews and comments.

My use case is that I never want it to remove or otherwise touch those files outside of sparse-checkout that happen to be the same path as index paths. I realize that currently gives me complications (e.g. I must never try and merge/cherry-pick/rebase a commit that might cause a merge conflict out there), and I realize that’s not what everyone else wants. For example, I don’t want git reset --hard or whatever to remove those files. Hence the config option.

Am I right in saying that the sparse-index work makes it easier to achieve my use case? In that those outside-sparse-checkout paths would not ever get merged into the index, even if I merge/cherry-pick/rebase a tree with paths there?

I can go into more details on how I arrived ay my use case if it helps.

So maybe there are two separate things here:

1. The bug that checkout removes my file when it is dirty, instead of refusing (unless -f) or just ignoring it.
2. My use case, which is to do its best to never remove or otherwise touch worktree files outside sparse-checkout.

> I'm also worried that making a config option may have masked subtle
> bugs in the patch that the rest of the testsuite would have turned up.

It is true that not hiding it in a config option makes a few tests fail, including ones that specifically test that a git reset after a materialization from a merge conflict causes the file to be removed.

Thanks.

-tpr


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-06-02  6:13           ` Tim Renouf (open source)
@ 2021-06-02 23:41             ` Elijah Newren
  2021-06-05 15:33               ` Tim Renouf (open source)
  0 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2021-06-02 23:41 UTC (permalink / raw)
  To: Tim Renouf (open source)
  Cc: Junio C Hamano, Derrick Stolee, Derrick Stolee, Git Mailing List

Hi,

On Tue, Jun 1, 2021 at 11:14 PM Tim Renouf (open source)
<tpr.ll@botech.co.uk> wrote:
>
> Hi all
>
> Thanks for the reviews and comments.
>
> My use case is that I never want it to remove or otherwise touch those files outside of sparse-checkout that happen to be the same path as index paths. I realize that currently gives me complications (e.g. I must never try and merge/cherry-pick/rebase a commit that might cause a merge conflict out there), and I realize that’s not what everyone else wants. For example, I don’t want git reset --hard or whatever to remove those files. Hence the config option.

Every case I've ever seen of present-despite-skip-worktree files is an
accidental and erroneous condition.  If you're trying to use it for
something else, we'd really need to understand the higher level
use-cases so that we can make ranges of commands work together nicely.
The sparse checkout capability itself was started by a low-level
implementational detail ("treat paths as though they matched HEAD and
don't write them to the working tree") and led to a maze of surprises,
bugs, edge cases, etc.  I'd rather not compound that problem by adding
another configuration option defined via a similar low-level
implementational detail.

I'm also leaning towards having `git reset --hard` not clear
present-despite-skip-worktree files, and not having `git status`
report them; both seem like an unnecessary performance issue for this
rare accidental/erroneous condition.  I totally agree that `git
switch/checkout` should not delete or overwrite such files if they
differ from HEAD; but similar to how having a dirty file prevents a
branch switch to some branch that deletes the file, a
present-despite-skip-worktree file that differs from HEAD should
result in an error message and an aborted switch/checkout.  At least
for the standard sparse-checkout behavior; don't know what your
usecase is.

> Am I right in saying that the sparse-index work makes it easier to achieve my use case? In that those outside-sparse-checkout paths would not ever get merged into the index, even if I merge/cherry-pick/rebase a tree with paths there?

If you merge/cherry-pick/rebase a commit with changes to a path
outside the sparse-checkout, and it merges cleanly with your current
branch, then with sparse-index it wouldn't even show up in the index
(though one of its parent trees would be modified because of the
changes to that file).  But that's not very different than without the
sparse-index, at least with the ort merge backend (available and ready
for using starting with git-2.32).  The recursive merge strategy (the
default) is known to write files to the working tree despite the
sparse-checkout requests, even when the merge is clean, but that's
just a bug in the recursive backend.  (One that isn't easily fixable
within its design, which is one of many reasons it was being written
in the form of the ort backend.)

If you merge/cherry-pick/rebase a commit with changes to a path
outside the sparse-checkout, and it conflicts with your branch, then
with or without sparse-index, that file is going to show up in the
index with higher order stages and be written to the working tree.
That is, so long as you don't have a file in the way.  Since the ort
backend makes use of the checkout machinery to switch from the current
state to the new state, fixing checkout to throw an error and abort
when a present-despite-skip-worktree file is present would cause
merges/rebases/cherry-picks to do the same.

> I can go into more details on how I arrived ay my use case if it helps.
>
> So maybe there are two separate things here:
>
> 1. The bug that checkout removes my file when it is dirty, instead of refusing (unless -f) or just ignoring it.

Yep, we need to fix this.

> 2. My use case, which is to do its best to never remove or otherwise touch worktree files outside sparse-checkout.

Can you expand on this use case a bit?  Should git diff or git status
report on your changes for your usecase?  Should "git restore" ignore
the given paths and not overwrite it?  What if the user explicitly
listed the path in question?  Should stash save these changes or
ignore them?  Should add include these changes or ignore them?  Since
your indexed file is different than the worktree file, should "git mv"
move just the file recorded in the index, just the one in the
worktree, or both?  If someone tries to run "git archive", should your
modified files be included?  If you don't like anything touching these
paths, does that mean they are also uninteresting?  So for example,
should "git log -p" or "git grep $REVISION" only return results inside
the sparse-checkout list of paths?

From a higher level, what are you trying to achieve?  Is it similar to
`git update-index --assume-unchanged`?  The point of sparse-checkout
was to reduce the number of files in the working tree, but it appears
you aren't trying to do that; you are putting files back into the
working tree anyway.  So, what's the point of sparse-checkout then?

It's possible sparse-checkout doesn't meet your needs, and just isn't
designed to meet them, and we need another special concept for your
case.  Or perhaps there's a config option that's meaningful.  Or maybe
you're just struggling with the bugs of sparse-checkout, of which
there are *many*.  See e.g.
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

> > I'm also worried that making a config option may have masked subtle
> > bugs in the patch that the rest of the testsuite would have turned up.
>
> It is true that not hiding it in a config option makes a few tests fail, including ones that specifically test that a git reset after a materialization from a merge conflict causes the file to be removed.

Yeah, not so surprising that it has weird interactions with merging
(and perhaps different weird interactions with different merge
backends).  Anything that touches unpack-trees.c either needs to be
part of standard operation, or if it's behind a special config option,
then it needs to be accompanied with a huge number of tests from many
different commands since it affects so many things.  We're trying to
add a battery of tests for sparse-checkout and sparse-index, and we
still obviously need a lot more.

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
  2021-06-02 23:41             ` Elijah Newren
@ 2021-06-05 15:33               ` Tim Renouf (open source)
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Renouf (open source) @ 2021-06-05 15:33 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Derrick Stolee, Derrick Stolee, Git Mailing List

Hi all

> Can you expand on this use case a bit? 

Fundamentally, I wanted sparse-checkout to cause all git commands to completely ignore all files outside sparse-checkout patterns, even if they happen to correspond to paths in the index. I realize that that is difficult to achieve, but I was hoping that my patch would make it close enough to be workable. In particular, I would need to avoid doing a merge (cherry-pick, rebase) with conflicts outside the sparse-checkout patterns.

Here is the (maybe slightly specialized) use case:

Due to decisions out of my hands, I have a monorepo containing some shared code and some separate components in subdirectories. There is a “main” branch where the code is built and delivered from. Each component is developed separately and has its own dev branch, e.g. “dev-component1”, “dev-component2”. Many developers develop on just one component, so they check out its dev branch. The “dev-component1” branch still contains the “main” version of the shared code and the other components, so you can merge it up to the main branch.

But some people want to be able to develop on two components at the same time. So they want to check out component1 on its dev-component1 branch, and component2 on its dev-component2 branch. They would ideally want to do this at the same time as maintaining the overall monorepo directory layout. The monorepo does not give a good way to do that.

I was experimenting with an approach combining sparse-checkout, worktrees, and the core.worktree config item, as follows:

The monorepo has its main worktree at the root of the overall workspace, with sparse-checkout set to check out just the shared code not considered part of any separate component. Each component has its own worktree in its subdirectory, with sparse-checkout set to only check out that component. To get the monorepo-like directory structure, a component’s worktree has its core.worktree config item set to point back up to the root of the overall workspace.

So, with that somewhat hairy structure, anything that touches a file outside of the worktree’s sparse-checkout actually touches a file in a different worktree, giving some surprising results.

Even with my patch, there is still a risk of that happening. And having a component’s worktree rooted somewhere other than where you think it is rooted gave some other surprising results.

I think I am going to abandon that approach and go a different way.

Thanks for the attention.

-tpr

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-06-05 15:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-28 20:14 bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Tim Renouf (open source)
2021-05-28 22:44 ` Elijah Newren
2021-06-01 18:31   ` [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config Tim Renouf
2021-06-02  2:00     ` Derrick Stolee
2021-06-02  2:32       ` Junio C Hamano
2021-06-02  5:53         ` Elijah Newren
2021-06-02  6:13           ` Tim Renouf (open source)
2021-06-02 23:41             ` Elijah Newren
2021-06-05 15:33               ` Tim Renouf (open source)
2021-06-02  1:37 ` bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Derrick Stolee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).