git.vger.kernel.org archive mirror
* [PATCH 0/6] Fix various issues around removal of untracked files/directories
@ 2021-09-18 23:15 Elijah Newren via GitGitGadget
  2021-09-18 23:15 ` [PATCH 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov, Elijah Newren

This series depends on en/am-abort-fix.

We have multiple codepaths that delete untracked files/directories but
shouldn't. There are also some codepaths where we delete untracked
files/directories intentionally (based on mailing list discussion), but
where that intent is not documented. Fix the documentation, add several new
(mostly failing) testcases, fix some of the new testcases, and add comments
about some potential remaining problems. (I found these as a side-effect of
looking at [1], though [2] pointed out one explicitly while I was working on
it.)

Note that I'm using Junio's declaration about checkout -f and reset --hard
(and also presuming that, since read-tree --reset is plumbing, its
behavior should be left alone)[3] in this series.

SIDENOTE about treating (some) ignored files as precious:

There's another related topic here that came up in the mailing list threads
that is separate even if similar: namely, treating ignored files as precious
instead of deleting them. I do not try to handle that here, but I believe
that would actually be relatively easy to handle. If you leave
unpack_trees_options->dir as NULL, then ignored files are treated as
precious (my original patch 2 made that mistake). There are a few other
locations that already optionally set up unpack_trees_options->dir (a quick
search for "overwrite_ignore" and "overwrite-ignore" will find them), so
we'd just need to implement that option flag in more places corresponding to
the new callsites (and perhaps make a global core.overwrite_ignored config
option to affect all of these). Of course, doing so would globally treat
ignored files as precious rather than allowing them to be configured on a
per-path basis, but honestly I think the idea of configuring ignored files
as precious on a per-path basis sounds like insanity. (We have enough bugs
with untracked and ignored files without adding yet another type. Also,
tla/baz was excessively confusing to me due in part to the number of types
of files and I'd rather not see such ideas ported to git. And, of course,
configuring per-path rules sounds like lots of work for end users to
configure. There may be additional reasons against it.) So, if someone wants
to pursue the precious-ignored concept then I'd much rather see it done as a
global setting. Just my $0.02.
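
To make the sidenote concrete, here is a minimal, self-contained sketch of the idea. It is a toy model, not git's code: `toy_opts`, `toy_dir`, `toy_setup()` and `toy_may_overwrite()` are hypothetical names, and the `overwrite_ignored` parameter only stands in for the proposed global setting. The one thing it takes from the series is the rule that a NULL `dir` leaves ignored files precious, while a populated `dir` allows ignored files in the way to be overwritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical path classification, for illustration only. */
enum toy_path_kind {
	TOY_UNTRACKED,	/* untracked and not ignored */
	TOY_IGNORED	/* untracked and matched by an ignore rule */
};

/* Stand-in for the excludes data that opts->dir would point at. */
struct toy_dir {
	int show_ignored;
};

struct toy_opts {
	struct toy_dir *dir;	/* NULL: nobody loaded the ignore rules */
};

/*
 * Hypothetical global knob (modeling the "core.overwrite_ignored" idea):
 * when false, leave dir as NULL so that ignored files stay precious.
 */
static void toy_setup(struct toy_opts *o, struct toy_dir *storage,
		      bool overwrite_ignored)
{
	o->dir = NULL;
	if (overwrite_ignored) {
		storage->show_ignored = 1;
		o->dir = storage;
	}
}

/* May a file of this kind that is "in the way" be overwritten? */
static bool toy_may_overwrite(const struct toy_opts *o, enum toy_path_kind kind)
{
	if (kind == TOY_IGNORED)
		return o->dir != NULL;
	return false;	/* plain untracked files are always kept in this model */
}
```

In this model a single boolean decides the fate of every ignored file, which is the "global rather than per-path" behavior argued for above.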

[1] https://lore.kernel.org/git/xmqqv93n7q1v.fsf@gitster.g/
[2] https://lore.kernel.org/git/C357A648-8B13-45C3-9388-C0C7F7D40DAE@gmail.com/
[3] https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/

Elijah Newren (6):
  t2500: add various tests for nuking untracked files
  Split unpack_trees 'reset' flag into two for untracked handling
  unpack-trees: avoid nuking untracked dir in way of unmerged file
  unpack-trees: avoid nuking untracked dir in way of locally deleted
    file
  Comment important codepaths regarding nuking untracked files/dirs
  Documentation: call out commands that nuke untracked files/directories

 Documentation/git-checkout.txt   |   5 +-
 Documentation/git-read-tree.txt  |   5 +-
 Documentation/git-reset.txt      |   3 +-
 builtin/am.c                     |   6 +-
 builtin/checkout.c               |  10 +-
 builtin/read-tree.c              |  11 +-
 builtin/reset.c                  |  15 +-
 builtin/stash.c                  |   3 +-
 builtin/submodule--helper.c      |   4 +
 builtin/worktree.c               |   5 +
 contrib/rerere-train.sh          |   2 +-
 reset.c                          |   9 +-
 submodule.c                      |   1 +
 t/t1013-read-tree-submodule.sh   |   4 +-
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 unpack-trees.c                   |  56 +++++--
 unpack-trees.h                   |   4 +-
 17 files changed, 359 insertions(+), 28 deletions(-)
 create mode 100755 t/t2500-untracked-overwriting.sh


base-commit: c5ead19ea282a288e01d86536349a4ae4a093e4b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1036%2Fnewren%2Funtracked_removal-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1036/newren/untracked_removal-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1036
-- 
gitgitgadget


* [PATCH 1/6] t2500: add various tests for nuking untracked files
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
@ 2021-09-18 23:15 ` Elijah Newren via GitGitGadget
  2021-09-19 13:44   ` Ævar Arnfjörð Bjarmason
  2021-09-18 23:15 ` [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Noting that unpack_trees treats reset=1 & update=1 as license to nuke
untracked files, I looked for code paths that use this combination and
tried to generate testcases which demonstrated unintentional loss of
untracked files and directories.  I found several.

I also include testcases for `git reset --{hard,merge,keep}`.  A hard
reset is perhaps the most direct test of unpack_trees' reset=1 behavior,
but we cannot make `git reset --hard` preserve untracked files without
some migration work.

Also, the two commands `checkout --force` (because of the --force) and
`read-tree --reset` (because it's plumbing and we need to keep it
backward compatible) were left out as we expect those to continue
removing untracked files and directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100755 t/t2500-untracked-overwriting.sh

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
new file mode 100755
index 00000000000..a1a6dfa671e
--- /dev/null
+++ b/t/t2500-untracked-overwriting.sh
@@ -0,0 +1,244 @@
+#!/bin/sh
+
+test_description='Test handling of overwriting untracked files'
+
+. ./test-lib.sh
+
+test_setup_reset () {
+	test_create_repo reset_$1 &&
+	(
+		cd reset_$1 &&
+		test_commit init &&
+
+		git branch stable &&
+		git branch work &&
+
+		git checkout work &&
+		test_commit foo &&
+
+		git checkout stable
+	)
+}
+
+test_expect_success 'reset --hard will nuke untracked files/dirs' '
+	test_setup_reset hard &&
+	(
+		cd reset_hard &&
+		git ls-tree -r stable &&
+		git log --all --name-status --oneline &&
+		git ls-tree -r work &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		echo foo >expect &&
+
+		git reset --hard work &&
+
+		# check that untracked directory foo.t/ was nuked
+		test_path_is_file foo.t &&
+		test_cmp expect foo.t
+	)
+'
+
+test_expect_success 'reset --merge will preserve untracked files/dirs' '
+	test_setup_reset merge &&
+	(
+		cd reset_merge &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --merge work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating.*foo.t.*would lose untracked files" error
+	)
+'
+
+test_expect_success 'reset --keep will preserve untracked files/dirs' '
+	test_setup_reset keep &&
+	(
+		cd reset_keep &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --keep work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating.*foo.t.*would lose untracked files" error
+	)
+'
+
+test_setup_checkout_m () {
+	test_create_repo checkout &&
+	(
+		cd checkout &&
+		test_commit init &&
+
+		test_write_lines file has some >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		git branch stable &&
+
+		git switch -c work &&
+		echo stuff >notes.txt &&
+		test_write_lines file has some words >filler &&
+		git add notes.txt filler &&
+		git commit -m filler &&
+
+		git checkout stable
+	)
+}
+
+test_expect_failure 'checkout -m does not nuke untracked file' '
+	test_setup_checkout_m &&
+	(
+		cd checkout &&
+
+		# Tweak filler
+		test_write_lines this file has some >filler &&
+		# Make an untracked file, save its contents in "expect"
+		echo precious >notes.txt &&
+		cp notes.txt expect &&
+
+		test_must_fail git checkout -m work &&
+		test_cmp expect notes.txt
+	)
+'
+
+test_setup_sequencing () {
+	test_create_repo sequencing_$1 &&
+	(
+		cd sequencing_$1 &&
+		test_commit init &&
+
+		test_write_lines this file has some words >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		mkdir -p foo/bar &&
+		test_commit foo/bar/baz &&
+
+		git branch simple &&
+		git branch fooey &&
+
+		git checkout fooey &&
+		git rm foo/bar/baz.t &&
+		echo stuff >>filler &&
+		git add -u &&
+		git commit -m "changes" &&
+
+		git checkout simple &&
+		echo items >>filler &&
+		echo newstuff >>newfile &&
+		git add filler newfile &&
+		git commit -m another
+	)
+}
+
+test_expect_failure 'git rebase --abort and untracked files' '
+	test_setup_sequencing rebase_abort_and_untracked &&
+	(
+		cd sequencing_rebase_abort_and_untracked &&
+		git checkout fooey &&
+		test_must_fail git rebase simple &&
+
+		cat init.t &&
+		git rm init.t &&
+		echo precious >init.t &&
+		cp init.t expect &&
+		git status --porcelain &&
+		test_must_fail git rebase --abort &&
+		test_cmp expect init.t
+	)
+'
+
+test_expect_failure 'git rebase fast forwarding and untracked files' '
+	test_setup_sequencing rebase_fast_forward_and_untracked &&
+	(
+		cd sequencing_rebase_fast_forward_and_untracked &&
+		git checkout init &&
+		echo precious >filler &&
+		cp filler expect &&
+		test_must_fail git rebase init simple &&
+		test_cmp expect filler
+	)
+'
+
+test_expect_failure 'git rebase --autostash and untracked files' '
+	test_setup_sequencing rebase_autostash_and_untracked &&
+	(
+		cd sequencing_rebase_autostash_and_untracked &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git rebase --autostash init &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git stash and untracked files' '
+	test_setup_sequencing stash_and_untracked_files &&
+	(
+		cd sequencing_stash_and_untracked_files &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git status --porcelain &&
+		git stash push &&
+		git status --porcelain &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+	test_setup_sequencing am_abort_and_untracked &&
+	(
+		cd sequencing_am_abort_and_untracked &&
+		git format-patch -1 --stdout fooey >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete the conflicted file; we will stage and commit it later
+		rm filler &&
+
+		# Put an unrelated untracked directory there
+		mkdir filler &&
+		echo foo >filler/file1 &&
+		echo bar >filler/file2 &&
+
+		test_must_fail git am --abort 2>errors &&
+		test_path_is_dir filler &&
+		grep "Updating .filler. would lose untracked files in it" errors
+	)
+'
+
+test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+	test_setup_sequencing am_skip_and_untracked &&
+	(
+		cd sequencing_am_skip_and_untracked &&
+		git checkout fooey &&
+		git format-patch -1 --stdout simple >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete newfile
+		rm newfile &&
+
+		# Put an unrelated untracked directory there
+		mkdir newfile &&
+		echo foo >newfile/file1 &&
+		echo bar >newfile/file2 &&
+
+		# Change our mind about resolutions, just skip this patch
+		test_must_fail git am --skip 2>errors &&
+		test_path_is_dir newfile &&
+		grep "Updating .newfile. would lose untracked files in it" errors
+	)
+'
+
+test_done
-- 
gitgitgadget



* [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
  2021-09-18 23:15 ` [PATCH 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
@ 2021-09-18 23:15 ` Elijah Newren via GitGitGadget
  2021-09-19 13:48   ` Ævar Arnfjörð Bjarmason
  2021-09-20 10:19   ` Phillip Wood
  2021-09-18 23:15 ` [PATCH 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way.  This was used by
`git read-tree --reset`, but then started appearing in other places as
well.  However, many of the other uses should not be deleting untracked
files in the way.  Split this into two separate fields:
   reset_nuke_untracked
   reset_keep_untracked
and, since many code paths in unpack_trees need to be followed for both
of these flags, introduce a third one for convenience:
   reset_either
which is simply an or-ing of the other two.

Modify existing callers so that
   read-tree --reset
   reset --hard
   checkout --force
continue using reset_nuke_untracked, but so that other callers,
including
   am
   checkout without --force
   stash  (though currently dead code; reset always had a value of 0)
   numerous callers from rebase/sequencer to reset_head()
will use the new reset_keep_untracked field.
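
As a rough illustration of how the three fields relate, here is a small self-contained model. It is a sketch under stated assumptions, not the real code: `toy_unpack_opts`, `toy_prepare()` and `toy_must_check_untracked()` are hypothetical names, and the struct carries only the fields needed to mirror the sanity check added to unpack_trees() and the early return in verify_absent_1().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the relevant fields; NOT git's real struct. */
struct toy_unpack_opts {
	unsigned int reset_nuke_untracked;
	unsigned int reset_keep_untracked;
	unsigned int reset_either;	/* derived, internal use only */
	unsigned int update;
	unsigned int index_only;
};

/*
 * Models the check at the top of unpack_trees(): the two public flags
 * are mutually exclusive, and reset_either is simply their OR.
 * Returns -1 where the real code would call BUG().
 */
static int toy_prepare(struct toy_unpack_opts *o)
{
	if (o->reset_nuke_untracked && o->reset_keep_untracked)
		return -1;
	o->reset_either = o->reset_nuke_untracked || o->reset_keep_untracked;
	return 0;
}

/*
 * Models the early return in verify_absent_1(): the untracked-path check
 * is skipped for index-only operations, when the working tree is not
 * updated, or when the caller explicitly asked to nuke untracked paths.
 */
static bool toy_must_check_untracked(const struct toy_unpack_opts *o)
{
	return !(o->index_only || o->reset_nuke_untracked || !o->update);
}
```

With this model, a caller in the `reset --hard` style (nuke flag plus update) skips the untracked check, while a caller in the `checkout` without `--force` style (keep flag plus update) still performs it, even though both see `reset_either` set.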

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/am.c                     |  6 +++++-
 builtin/checkout.c               | 10 +++++++++-
 builtin/read-tree.c              | 11 ++++++++---
 builtin/reset.c                  | 15 +++++++++++++--
 builtin/stash.c                  |  2 +-
 reset.c                          |  9 +++++++--
 t/t1013-read-tree-submodule.sh   |  4 ++--
 t/t2500-untracked-overwriting.sh |  6 +++---
 unpack-trees.c                   | 17 ++++++++++++-----
 unpack-trees.h                   |  4 +++-
 10 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c79e0167e98..dbe6cbe6a33 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1918,8 +1918,12 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.dst_index = &the_index;
 	opts.update = 1;
 	opts.merge = 1;
-	opts.reset = reset;
+	opts.reset_keep_untracked = reset;
 	opts.fn = twoway_merge;
+	/* Setup opts.dir so that ignored files in the way get overwritten */
+	opts.dir = xcalloc(1, sizeof(*opts.dir));
+	opts.dir->flags |= DIR_SHOW_IGNORED;
+	setup_standard_excludes(opts.dir);
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
 
diff --git a/builtin/checkout.c b/builtin/checkout.c
index b5d477919a7..ab0bb4d94f0 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -646,12 +646,20 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	opts.head_idx = -1;
 	opts.update = worktree;
 	opts.skip_unmerged = !worktree;
-	opts.reset = 1;
+	if (o->force)
+		opts.reset_nuke_untracked = 1;
+	else
+		opts.reset_keep_untracked = 1;
 	opts.merge = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = o->show_progress;
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
+	if (o->overwrite_ignore) {
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
+	}
 	init_checkout_metadata(&opts.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 485e7b04794..8b94e1aa261 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -133,7 +133,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 			 N_("3-way merge if no file level merging required")),
 		OPT_BOOL(0, "aggressive", &opts.aggressive,
 			 N_("3-way merge in presence of adds and removes")),
-		OPT_BOOL(0, "reset", &opts.reset,
+		OPT_BOOL(0, "reset", &opts.reset_keep_untracked,
 			 N_("same as -m, but discard unmerged entries")),
 		{ OPTION_STRING, 0, "prefix", &opts.prefix, N_("<subdirectory>/"),
 		  N_("read the tree into the index under <subdirectory>/"),
@@ -162,6 +162,11 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	opts.head_idx = -1;
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
+	if (opts.reset_keep_untracked) {
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
+	}
 
 	git_config(git_read_tree_config, NULL);
 
@@ -171,7 +176,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	prefix_set = opts.prefix ? 1 : 0;
-	if (1 < opts.merge + opts.reset + prefix_set)
+	if (1 < opts.merge + opts.reset_keep_untracked + prefix_set)
 		die("Which one? -m, --reset, or --prefix?");
 
 	/*
@@ -183,7 +188,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	 * mode.
 	 */
 
-	if (opts.reset || opts.merge || opts.prefix) {
+	if (opts.reset_keep_untracked || opts.merge || opts.prefix) {
 		if (read_cache_unmerged() && (opts.prefix || opts.merge))
 			die(_("You need to resolve your current index first"));
 		stage = opts.merge = 1;
diff --git a/builtin/reset.c b/builtin/reset.c
index 43e855cb887..ba39c4882a6 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -10,6 +10,7 @@
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
 #include "lockfile.h"
 #include "tag.h"
 #include "object.h"
@@ -70,9 +71,19 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 		break;
 	case HARD:
 		opts.update = 1;
-		/* fallthrough */
+		opts.reset_nuke_untracked = 1;
+		break;
+	case MIXED:
+		opts.reset_keep_untracked = 1; /* but opts.update=0, so untracked left alone */
+		break;
 	default:
-		opts.reset = 1;
+		BUG("invalid reset_type passed to reset_index");
+	}
+	if (opts.reset_keep_untracked) {
+		/* Setup opts.dir so we can overwrite ignored files */
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
 	}
 
 	read_cache_unmerged();
diff --git a/builtin/stash.c b/builtin/stash.c
index 8f42360ca91..4ceb3581b47 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -256,7 +256,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
 	opts.merge = 1;
-	opts.reset = reset;
+	opts.reset_keep_untracked = reset;
 	opts.update = update;
 	opts.fn = oneway_merge;
 
diff --git a/reset.c b/reset.c
index 79310ae071b..0880c76aef9 100644
--- a/reset.c
+++ b/reset.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "cache-tree.h"
+#include "dir.h"
 #include "lockfile.h"
 #include "refs.h"
 #include "reset.h"
@@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
-	if (!detach_head)
-		unpack_tree_opts.reset = 1;
+	if (!detach_head) {
+		unpack_tree_opts.reset_keep_untracked = 1;
+		unpack_tree_opts.dir = xcalloc(1, sizeof(*unpack_tree_opts.dir));
+		unpack_tree_opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(unpack_tree_opts.dir);
+	}
 
 	if (repo_read_index_unmerged(r) < 0) {
 		ret = error(_("could not read index"));
diff --git a/t/t1013-read-tree-submodule.sh b/t/t1013-read-tree-submodule.sh
index b6df7444c05..4e485c223ad 100755
--- a/t/t1013-read-tree-submodule.sh
+++ b/t/t1013-read-tree-submodule.sh
@@ -10,10 +10,10 @@ KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
 
 test_submodule_switch_recursing_with_args "read-tree -u -m"
 
-test_submodule_forced_switch_recursing_with_args "read-tree -u --reset"
+test_submodule_switch_recursing_with_args "read-tree -u --reset"
 
 test_submodule_switch "read-tree -u -m"
 
-test_submodule_forced_switch "read-tree -u --reset"
+test_submodule_switch "read-tree -u --reset"
 
 test_done
diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index a1a6dfa671e..786ec33d63a 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -92,7 +92,7 @@ test_setup_checkout_m () {
 	)
 }
 
-test_expect_failure 'checkout -m does not nuke untracked file' '
+test_expect_success 'checkout -m does not nuke untracked file' '
 	test_setup_checkout_m &&
 	(
 		cd checkout &&
@@ -138,7 +138,7 @@ test_setup_sequencing () {
 	)
 }
 
-test_expect_failure 'git rebase --abort and untracked files' '
+test_expect_success 'git rebase --abort and untracked files' '
 	test_setup_sequencing rebase_abort_and_untracked &&
 	(
 		cd sequencing_rebase_abort_and_untracked &&
@@ -155,7 +155,7 @@ test_expect_failure 'git rebase --abort and untracked files' '
 	)
 '
 
-test_expect_failure 'git rebase fast forwarding and untracked files' '
+test_expect_success 'git rebase fast forwarding and untracked files' '
 	test_setup_sequencing rebase_fast_forward_and_untracked &&
 	(
 		cd sequencing_rebase_fast_forward_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 5786645f315..d952eebe96a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -301,7 +301,7 @@ static int check_submodule_move_head(const struct cache_entry *ce,
 	if (!sub)
 		return 0;
 
-	if (o->reset)
+	if (o->reset_nuke_untracked)
 		flags |= SUBMODULE_MOVE_HEAD_FORCE;
 
 	if (submodule_move_head(ce->name, old_id, new_id, flags))
@@ -1696,6 +1696,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
+	if (o->reset_nuke_untracked && o->reset_keep_untracked)
+		BUG("reset_nuke_untracked and reset_keep_untracked are incompatible");
+
+	o->reset_either = 0;
+	if (o->reset_nuke_untracked || o->reset_keep_untracked)
+		o->reset_either = 1;
+
 	trace_performance_enter();
 	trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
 
@@ -1989,7 +1996,7 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 	 */
 	if ((ce->ce_flags & CE_VALID) || ce_skip_worktree(ce))
 		; /* keep checking */
-	else if (o->reset || ce_uptodate(ce))
+	else if (o->reset_either || ce_uptodate(ce))
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
@@ -2218,7 +2225,7 @@ static int verify_absent_1(const struct cache_entry *ce,
 	int len;
 	struct stat st;
 
-	if (o->index_only || o->reset || !o->update)
+	if (o->index_only || o->reset_nuke_untracked || !o->update)
 		return 0;
 
 	len = check_leading_path(ce->name, ce_namelen(ce), 0);
@@ -2585,7 +2592,7 @@ int twoway_merge(const struct cache_entry * const *src,
 
 	if (current) {
 		if (current->ce_flags & CE_CONFLICTED) {
-			if (same(oldtree, newtree) || o->reset) {
+			if (same(oldtree, newtree) || o->reset_either) {
 				if (!newtree)
 					return deleted_entry(current, current, o);
 				else
@@ -2683,7 +2690,7 @@ int oneway_merge(const struct cache_entry * const *src,
 
 	if (old && same(old, a)) {
 		int update = 0;
-		if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
+		if (o->reset_either && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
 			!(old->ce_flags & CE_FSMONITOR_VALID)) {
 			struct stat st;
 			if (lstat(old->name, &st) ||
diff --git a/unpack-trees.h b/unpack-trees.h
index 2d88b19dca7..c419bf8b1f9 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -46,7 +46,9 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
 void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
 
 struct unpack_trees_options {
-	unsigned int reset,
+	unsigned int reset_nuke_untracked,
+		     reset_keep_untracked,
+		     reset_either, /* internal use only */
 		     merge,
 		     update,
 		     clone,
-- 
gitgitgadget



* [PATCH 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
  2021-09-18 23:15 ` [PATCH 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
  2021-09-18 23:15 ` [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling Elijah Newren via GitGitGadget
@ 2021-09-18 23:15 ` Elijah Newren via GitGitGadget
  2021-09-18 23:15 ` [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh |  2 +-
 unpack-trees.c                   | 35 ++++++++++++++++++++++++++++----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 786ec33d63a..017946a494f 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -197,7 +197,7 @@ test_expect_failure 'git stash and untracked files' '
 	)
 '
 
-test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	test_setup_sequencing am_abort_and_untracked &&
 	(
 		cd sequencing_am_abort_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index d952eebe96a..3b3d1c0ff40 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2163,9 +2163,15 @@ static int icase_exists(struct unpack_trees_options *o, const char *name, int le
 	return src && !ie_match_stat(o->src_index, src, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
 }
 
+enum absent_checking_type {
+	COMPLETELY_ABSENT,
+	ABSENT_ANY_DIRECTORY
+};
+
 static int check_ok_to_remove(const char *name, int len, int dtype,
 			      const struct cache_entry *ce, struct stat *st,
 			      enum unpack_trees_error_types error_type,
+			      enum absent_checking_type absent_type,
 			      struct unpack_trees_options *o)
 {
 	const struct cache_entry *result;
@@ -2200,6 +2206,10 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 		return 0;
 	}
 
+	/* If we only care about directories, then we can remove */
+	if (absent_type == ABSENT_ANY_DIRECTORY)
+		return 0;
+
 	/*
 	 * The previous round may already have decided to
 	 * delete this path, which is in a subdirectory that
@@ -2220,6 +2230,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
  */
 static int verify_absent_1(const struct cache_entry *ce,
 			   enum unpack_trees_error_types error_type,
+			   enum absent_checking_type absent_type,
 			   struct unpack_trees_options *o)
 {
 	int len;
@@ -2245,7 +2256,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 								NULL, o);
 			else
 				ret = check_ok_to_remove(path, len, DT_UNKNOWN, NULL,
-							 &st, error_type, o);
+							 &st, error_type,
+							 absent_type, o);
 		}
 		free(path);
 		return ret;
@@ -2260,7 +2272,7 @@ static int verify_absent_1(const struct cache_entry *ce,
 
 		return check_ok_to_remove(ce->name, ce_namelen(ce),
 					  ce_to_dtype(ce), ce, &st,
-					  error_type, o);
+					  error_type, absent_type, o);
 	}
 }
 
@@ -2270,14 +2282,23 @@ static int verify_absent(const struct cache_entry *ce,
 {
 	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
+}
+
+static int verify_absent_if_directory(const struct cache_entry *ce,
+				      enum unpack_trees_error_types error_type,
+				      struct unpack_trees_options *o)
+{
+	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+		return 0;
+	return verify_absent_1(ce, error_type, ABSENT_ANY_DIRECTORY, o);
 }
 
 static int verify_absent_sparse(const struct cache_entry *ce,
 				enum unpack_trees_error_types error_type,
 				struct unpack_trees_options *o)
 {
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
 }
 
 static int merged_entry(const struct cache_entry *ce,
@@ -2351,6 +2372,12 @@ static int merged_entry(const struct cache_entry *ce,
 		 * Previously unmerged entry left as an existence
 		 * marker by read_index_unmerged();
 		 */
+		if (verify_absent_if_directory(merge,
+				  ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o)) {
+			discard_cache_entry(merge);
+			return -1;
+		}
+
 		invalidate_ce_path(old, o);
 	}
 
-- 
gitgitgadget



* [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-09-18 23:15 ` [PATCH 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
@ 2021-09-18 23:15 ` Elijah Newren via GitGitGadget
  2021-09-19 13:52   ` Ævar Arnfjörð Bjarmason
  2021-09-18 23:15 ` [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 2 +-
 unpack-trees.c                   | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 017946a494f..d4d9dc928aa 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -218,7 +218,7 @@ test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	)
 '
 
-test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+test_expect_success 'git am --skip and untracked dir vs deleted file' '
 	test_setup_sequencing am_skip_and_untracked &&
 	(
 		cd sequencing_am_skip_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 3b3d1c0ff40..858595a13f1 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2395,7 +2395,11 @@ static int deleted_entry(const struct cache_entry *ce,
 		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
 			return -1;
 		return 0;
+	} else {
+		if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
+			return -1;
 	}
+
 	if (!(old->ce_flags & CE_CONFLICTED) && verify_uptodate(old, o))
 		return -1;
 	add_entry(o, ce, CE_REMOVE, 0);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-09-18 23:15 ` [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
@ 2021-09-18 23:15 ` Elijah Newren via GitGitGadget
  2021-09-24 11:47   ` Luke Diamand
  2021-09-18 23:15 ` [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
  6 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other high-level
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/stash.c             | 1 +
 builtin/submodule--helper.c | 4 ++++
 builtin/worktree.c          | 5 +++++
 contrib/rerere-train.sh     | 2 +-
 submodule.c                 | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 4ceb3581b47..ce5e0364c68 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1519,6 +1519,7 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		} else {
 			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.git_cmd = 1;
+			/* BUG: this nukes untracked files in the way */
 			strvec_pushl(&cp.args, "reset", "--hard", "-q",
 				     "--no-recurse-submodules", NULL);
 			if (run_command(&cp)) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e45..a49242d15ae 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2864,6 +2864,10 @@ static int add_submodule(const struct add_data *add_data)
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.dir = add_data->sm_path;
+		/*
+		 * NOTE: we only get here if add_data->force is true, so
+		 * passing --force to checkout is reasonable.
+		 */
 		strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL);
 
 		if (add_data->branch) {
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 0d0a80da61f..383947ff54f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -356,6 +356,11 @@ static int add_worktree(const char *path, const char *refname,
 	if (opts->checkout) {
 		cp.argv = NULL;
 		strvec_clear(&cp.args);
+		/*
+		 * NOTE: reset --hard is okay here, because 'worktree add'
+		 * refuses to work in an extant non-empty directory, so there
+		 * is no risk of deleting untracked files.
+		 */
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
 			strvec_push(&cp.args, "--quiet");
diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index eeee45dd341..75125d6ae00 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -91,7 +91,7 @@ do
 		git checkout -q $commit -- .
 		git rerere
 	fi
-	git reset -q --hard
+	git reset -q --hard  # Might nuke untracked files...
 done
 
 if test -z "$branch"
diff --git a/submodule.c b/submodule.c
index 8e611fe1dbf..a9b71d585cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1866,6 +1866,7 @@ static void submodule_reset_index(const char *path)
 
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     get_super_prefix_or_empty(), path);
+	/* TODO: determine if this might overwrite untracked files */
 	strvec_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
 
 	strvec_push(&cp.args, empty_tree_oid_hex());
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-09-18 23:15 ` [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
@ 2021-09-18 23:15 ` Elijah Newren via GitGitGadget
  2021-09-19 10:52   ` Philip Oakley
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
  6 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-18 23:15 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Some commands have traditionally also removed untracked files (or
directories) that were in the way of a tracked file we needed.  Document
these cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-checkout.txt  | 5 +++--
 Documentation/git-read-tree.txt | 5 +++--
 Documentation/git-reset.txt     | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index b1a6fe44997..d473c9bf387 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -118,8 +118,9 @@ OPTIONS
 -f::
 --force::
 	When switching branches, proceed even if the index or the
-	working tree differs from `HEAD`.  This is used to throw away
-	local changes.
+	working tree differs from `HEAD`, and even if there are untracked
+	files in the way.  This is used to throw away local changes and
+	any untracked files or directories that are in the way.
 +
 When checking out paths from the index, do not fail upon unmerged
 entries; instead, unmerged entries are ignored.
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 5fa8bab64c2..4731ec3283f 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -39,8 +39,9 @@ OPTIONS
 
 --reset::
 	Same as -m, except that unmerged entries are discarded instead
-	of failing. When used with `-u`, updates leading to loss of
-	working tree changes will not abort the operation.
+	of failing.  When used with `-u`, updates leading to loss of
+	working tree changes or untracked files or directories will not
+	abort the operation.
 
 -u::
 	After a successful merge, update the files in the work
diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index 252e2d4e47d..6f7685f53d5 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -69,7 +69,8 @@ linkgit:git-add[1]).
 
 --hard::
 	Resets the index and working tree. Any changes to tracked files in the
-	working tree since `<commit>` are discarded.
+	working tree since `<commit>` are discarded.  Any untracked files or
+	directories in the way of writing any tracked files are simply deleted.
 
 --merge::
 	Resets the index and updates the files in the working tree that are
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories
  2021-09-18 23:15 ` [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
@ 2021-09-19 10:52   ` Philip Oakley
  2021-09-19 13:36     ` Philip Oakley
  0 siblings, 1 reply; 82+ messages in thread
From: Philip Oakley @ 2021-09-19 10:52 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov, Elijah Newren

truly minor nit.
On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Some commands have traditionally also removed untracked files (or
> directories) that were in the way of a tracked file we needed.  Document
> these cases.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-checkout.txt  | 5 +++--
>  Documentation/git-read-tree.txt | 5 +++--
>  Documentation/git-reset.txt     | 3 ++-
>  3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
> index b1a6fe44997..d473c9bf387 100644
> --- a/Documentation/git-checkout.txt
> +++ b/Documentation/git-checkout.txt
> @@ -118,8 +118,9 @@ OPTIONS
>  -f::
>  --force::
>  	When switching branches, proceed even if the index or the
> -	working tree differs from `HEAD`.  This is used to throw away
> -	local changes.
> +	working tree differs from `HEAD`, and even if there are untracked
> +	files in the way.  This is used to throw away local changes and
double space after full stop?
> +	any untracked files or directories that are in the way.
>  +
>  When checking out paths from the index, do not fail upon unmerged
>  entries; instead, unmerged entries are ignored.
> diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
> index 5fa8bab64c2..4731ec3283f 100644
> --- a/Documentation/git-read-tree.txt
> +++ b/Documentation/git-read-tree.txt
> @@ -39,8 +39,9 @@ OPTIONS
>  
>  --reset::
>  	Same as -m, except that unmerged entries are discarded instead
> -	of failing. When used with `-u`, updates leading to loss of
> -	working tree changes will not abort the operation.
> +	of failing.  When used with `-u`, updates leading to loss of
Is the single space to double space change desired?
I had the impression that the project had decided on single spaces, but
I can't see anything in SubmittingPatches or CodingGuidelines. I don't
think there are DocumentationGuidelines.
> +	working tree changes or untracked files or directories will not
> +	abort the operation.
>  
>  -u::
>  	After a successful merge, update the files in the work
> diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
> index 252e2d4e47d..6f7685f53d5 100644
> --- a/Documentation/git-reset.txt
> +++ b/Documentation/git-reset.txt
> @@ -69,7 +69,8 @@ linkgit:git-add[1]).
>  
>  --hard::
>  	Resets the index and working tree. Any changes to tracked files in the
> -	working tree since `<commit>` are discarded.
> +	working tree since `<commit>` are discarded.  Any untracked files or
as above,  s /.  /. /  ?
> +	directories in the way of writing any tracked files are simply deleted.
>  
>  --merge::
>  	Resets the index and updates the files in the working tree that are
--
Philip

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories
  2021-09-19 10:52   ` Philip Oakley
@ 2021-09-19 13:36     ` Philip Oakley
  2021-09-20 16:29       ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Philip Oakley @ 2021-09-19 13:36 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov, Elijah Newren

On 19/09/2021 11:52, Philip Oakley wrote:
> truly minor nit.
> On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
>> From: Elijah Newren <newren@gmail.com>
>>
>> Some commands have traditionally also removed untracked files (or
>> directories) that were in the way of a tracked file we needed.  Document
>> these cases.
>>
>> Signed-off-by: Elijah Newren <newren@gmail.com>
>> ---
>>  Documentation/git-checkout.txt  | 5 +++--
>>  Documentation/git-read-tree.txt | 5 +++--
>>  Documentation/git-reset.txt     | 3 ++-
>>  3 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
>> index b1a6fe44997..d473c9bf387 100644
>> --- a/Documentation/git-checkout.txt
>> +++ b/Documentation/git-checkout.txt
>> @@ -118,8 +118,9 @@ OPTIONS
>>  -f::
>>  --force::
>>  	When switching branches, proceed even if the index or the
>> -	working tree differs from `HEAD`.  This is used to throw away
>> -	local changes.
>> +	working tree differs from `HEAD`, and even if there are untracked
>> +	files in the way.  This is used to throw away local changes and
> double space after full stop?
>> +	any untracked files or directories that are in the way.
>>  +
>>  When checking out paths from the index, do not fail upon unmerged
>>  entries; instead, unmerged entries are ignored.
>> diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
>> index 5fa8bab64c2..4731ec3283f 100644
>> --- a/Documentation/git-read-tree.txt
>> +++ b/Documentation/git-read-tree.txt
>> @@ -39,8 +39,9 @@ OPTIONS
>>  
>>  --reset::
>>  	Same as -m, except that unmerged entries are discarded instead
>> -	of failing. When used with `-u`, updates leading to loss of
>> -	working tree changes will not abort the operation.
>> +	of failing.  When used with `-u`, updates leading to loss of
> Is the single space to double space change desired?
> I had the impression that the project had decided on single spaces, but
> I can't see anything in SubmittingPatches or CodingGuidelines. I don't
> think there are DocumentationGuidelines.

I may have been mistaken about any project decision. I had a look around
the archives and only came up with a 2008 post [1] by Junio that, at the
time, was looking for two spaces after the full stop.

It's not clear if we consider the man pages to be 'typeset' such that a
single space would be the norm, or mono-spaced 'typewriter' style (two
spaces). There's much commentary in the Wikipedia article [2].

So still minor.
--
Philip

>> +	working tree changes or untracked files or directories will not
>> +	abort the operation.
>>  
>>  -u::
>>  	After a successful merge, update the files in the work
>> diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
>> index 252e2d4e47d..6f7685f53d5 100644
>> --- a/Documentation/git-reset.txt
>> +++ b/Documentation/git-reset.txt
>> @@ -69,7 +69,8 @@ linkgit:git-add[1]).
>>  
>>  --hard::
>>  	Resets the index and working tree. Any changes to tracked files in the
>> -	working tree since `<commit>` are discarded.
>> +	working tree since `<commit>` are discarded.  Any untracked files or
> as above,  s /.  /. /  ?
>> +	directories in the way of writing any tracked files are simply deleted.
>>  
>>  --merge::
>>  	Resets the index and updates the files in the working tree that are
>>
[1] https://lore.kernel.org/git/7vfxtu3fku.fsf@gitster.siamese.dyndns.org/
[2] https://en.wikipedia.org/wiki/Sentence_spacing

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] t2500: add various tests for nuking untracked files
  2021-09-18 23:15 ` [PATCH 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
@ 2021-09-19 13:44   ` Ævar Arnfjörð Bjarmason
  2021-09-20 14:48     ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-19 13:44 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Fedor Biryukov, Elijah Newren


On Sat, Sep 18 2021, Elijah Newren via GitGitGadget wrote:

> +	test_create_repo reset_$1 &&

s/test_create_repo/git init/ these days (also for the rest).

> +		mkdir foo.t &&
> +		echo precious >foo.t/file &&
> +		cp foo.t/file expect &&
> +
> +		test_must_fail git reset --merge work 2>error &&
> +		test_cmp expect foo.t/file &&
> +		grep "Updating.*foo.t.*would lose untracked files" error

The test is ambiguous about whether we complain about foo.t/file or
foo.t. If there were foo.t/{file,file-two}, would we complain just once or
twice?

I think it's just the directory, but probably worthwhile for the test to
make the distinction. If it's a "a/sub/dir/file" do we complain about
"a/" or "a/sub/dir/" ?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-18 23:15 ` [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling Elijah Newren via GitGitGadget
@ 2021-09-19 13:48   ` Ævar Arnfjörð Bjarmason
  2021-09-20 15:20     ` Elijah Newren
  2021-09-20 10:19   ` Phillip Wood
  1 sibling, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-19 13:48 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Fedor Biryukov, Elijah Newren


On Sat, Sep 18 2021, Elijah Newren via GitGitGadget wrote:

> -	opts.reset = reset;
> +	opts.reset_keep_untracked = reset;
>  	opts.fn = twoway_merge;
> +	/* Setup opts.dir so that ignored files in the way get overwritten */
> +	opts.dir = xcalloc(1, sizeof(*opts.dir));
> +	opts.dir->flags |= DIR_SHOW_IGNORED;
> +	setup_standard_excludes(opts.dir);

Is the "opts.dir" free'd later somehow?

>  	opts.head_idx = -1;
>  	opts.update = worktree;
>  	opts.skip_unmerged = !worktree;
> -	opts.reset = 1;
> +	if (o->force)
> +		opts.reset_nuke_untracked = 1;
> +	else
> +		opts.reset_keep_untracked = 1;

In both cases opts.reset_keep_untracked is set to 1, I assume it's a
mistake; aside from that, perhaps better as:

    opts.reset_keep_untracked = o->force; /* or !o->force, depending... */

>  	opts.merge = 1;
>  	opts.fn = oneway_merge;
>  	opts.verbose_update = o->show_progress;
>  	opts.src_index = &the_index;
>  	opts.dst_index = &the_index;
> +	if (o->overwrite_ignore) {
> +		opts.dir = xcalloc(1, sizeof(*opts.dir));

ditto potential leak.

> +		opts.dir = xcalloc(1, sizeof(*opts.dir));
> +		opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(opts.dir);
> +	}


ditto (also more omitted).

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file
  2021-09-18 23:15 ` [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
@ 2021-09-19 13:52   ` Ævar Arnfjörð Bjarmason
  2021-09-20 16:12     ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-19 13:52 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Fedor Biryukov, Elijah Newren


On Sat, Sep 18 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t2500-untracked-overwriting.sh | 2 +-
>  unpack-trees.c                   | 4 ++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
> index 017946a494f..d4d9dc928aa 100755
> --- a/t/t2500-untracked-overwriting.sh
> +++ b/t/t2500-untracked-overwriting.sh
> @@ -218,7 +218,7 @@ test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
>  	)
>  '
>  
> -test_expect_failure 'git am --skip and untracked dir vs deleted file' '
> +test_expect_success 'git am --skip and untracked dir vs deleted file' '
>  	test_setup_sequencing am_skip_and_untracked &&
>  	(
>  		cd sequencing_am_skip_and_untracked &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3b3d1c0ff40..858595a13f1 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -2395,7 +2395,11 @@ static int deleted_entry(const struct cache_entry *ce,
>  		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
>  			return -1;
>  		return 0;
> +	} else {
> +		if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
> +			return -1;
>  	}

Maybe just "else if" ?
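
For illustration only, the "else if" shape suggested here could look like
the sketch below. The struct and the two verify helpers are hypothetical
stand-ins (they are not the real unpack-trees.c types or functions); only
the branch structure is meant to match the quoted hunk.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cache entry; not the real struct cache_entry. */
struct entry_sketch {
	int blocked_by_untracked;     /* some untracked path is in the way */
	int blocked_by_untracked_dir; /* an untracked directory is in the way */
};

/* Stand-in for verify_absent(): non-zero means "refuse". */
static int verify_absent_sketch(const struct entry_sketch *ce)
{
	return ce->blocked_by_untracked ? -1 : 0;
}

/* Stand-in for verify_absent_if_directory(): non-zero means "refuse". */
static int verify_absent_if_directory_sketch(const struct entry_sketch *ce)
{
	return ce->blocked_by_untracked_dir ? -1 : 0;
}

/* Returns -1 to refuse the deletion, 0 when it may proceed. */
int deleted_entry_sketch(const struct entry_sketch *ce,
			 const struct entry_sketch *old)
{
	if (!old) {
		if (verify_absent_sketch(ce))
			return -1;
		return 0;
	} else if (verify_absent_if_directory_sketch(ce)) {
		return -1;
	}
	/* The real function continues with further checks on 'old' here. */
	return 0;
}
```

The behavior is the same as the nested form in the patch; the only
difference is one less level of indentation.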

[...]

> +

Stray whitespace change

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-18 23:15 ` [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling Elijah Newren via GitGitGadget
  2021-09-19 13:48   ` Ævar Arnfjörð Bjarmason
@ 2021-09-20 10:19   ` Phillip Wood
  2021-09-20 16:05     ` Elijah Newren
  1 sibling, 1 reply; 82+ messages in thread
From: Phillip Wood @ 2021-09-20 10:19 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov, Elijah Newren

On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Traditionally, unpack_trees_options->reset was used to signal that it
> was okay to delete any untracked files in the way.  This was used by
> `git read-tree --reset`, but then started appearing in other places as
> well.  However, many of the other uses should not be deleting untracked
> files in the way.  Split this into two separate fields:
>     reset_nuke_untracked
>     reset_keep_untracked
> and, since many code paths in unpack_trees need to be followed for both
> of these flags, introduce a third one for convenience:
>     reset_either
> which is simply an or-ing of the other two.

See [1] for an alternative approach that used an enum instead of adding 
mutually exclusive flags.

> Modify existing callers so that
>     read-tree --reset

It would be nice if read-tree callers could choose whether they want to 
remove untracked files or not - that could always be added later. This 
patch changes the behavior of 'git read-tree -m -u' (and other commands) 
so that they will overwrite ignored files - I'm in favor of that change 
but it would be good to spell out the change in the commit message.

>     reset --hard
>     checkout --force

I often use checkout --force to clear unwanted changes when I'm 
switching branches, I'd prefer it if it did not remove untracked files.

> continue using reset_nuke_untracked, but so that other callers,
> including
>     am
>     checkout without --force
>     stash  (though currently dead code; reset always had a value of 0)
>     numerous callers from rebase/sequencer to reset_head()
> will use the new reset_keep_untracked field.

This is great. In the discussion around [1] there is a mention of 'git 
checkout <pathspec>' which also overwrites untracked files. It does not 
use unpack_trees() so is arguably outside the scope of what you're doing 
here but it might be worth mentioning.

> [...]
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index 485e7b04794..8b94e1aa261 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -133,7 +133,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>   			 N_("3-way merge if no file level merging required")),
>   		OPT_BOOL(0, "aggressive", &opts.aggressive,
>   			 N_("3-way merge in presence of adds and removes")),
> -		OPT_BOOL(0, "reset", &opts.reset,
> +		OPT_BOOL(0, "reset", &opts.reset_keep_untracked,
>   			 N_("same as -m, but discard unmerged entries")),
>   		{ OPTION_STRING, 0, "prefix", &opts.prefix, N_("<subdirectory>/"),
>   		  N_("read the tree into the index under <subdirectory>/"),
> @@ -162,6 +162,11 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>   	opts.head_idx = -1;
>   	opts.src_index = &the_index;
>   	opts.dst_index = &the_index;
> +	if (opts.reset_keep_untracked) {
> +		opts.dir = xcalloc(1, sizeof(*opts.dir));
> +		opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(opts.dir);
> +	}

Does this clobber any excludes added by --exclude-per-directory?

> diff --git a/builtin/reset.c b/builtin/reset.c
> index 43e855cb887..ba39c4882a6 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -10,6 +10,7 @@
>   #define USE_THE_INDEX_COMPATIBILITY_MACROS
>   #include "builtin.h"
>   #include "config.h"
> +#include "dir.h"
>   #include "lockfile.h"
>   #include "tag.h"
>   #include "object.h"
> @@ -70,9 +71,19 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>   		break;
>   	case HARD:
>   		opts.update = 1;
> -		/* fallthrough */
> +		opts.reset_nuke_untracked = 1;
> +		break;
> +	case MIXED:
> +		opts.reset_keep_untracked = 1; /* but opts.update=0, so untracked left alone */
> +		break;
>   	default:
> -		opts.reset = 1;
> +		BUG("invalid reset_type passed to reset_index");

There is no case SOFT: but in that case we don't call reset_index() so 
we're OK.

> diff --git a/reset.c b/reset.c
> index 79310ae071b..0880c76aef9 100644
> --- a/reset.c
> +++ b/reset.c
> @@ -1,5 +1,6 @@
>   #include "git-compat-util.h"
>   #include "cache-tree.h"
> +#include "dir.h"
>   #include "lockfile.h"
>   #include "refs.h"
>   #include "reset.h"
> @@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
>   	unpack_tree_opts.update = 1;
>   	unpack_tree_opts.merge = 1;
>   	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
> -	if (!detach_head)
> -		unpack_tree_opts.reset = 1;

Unrelated to this patch but this looks dodgy to me. For 'git rebase 
<upstream> <branch>' where <branch> is ahead of <upstream> we skip the 
rebase and use reset_head() to checkout <branch> without 'detach_head' 
set. I think this should be checking 'reset_hard' instead of 'detach_head'.

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 5786645f315..d952eebe96a 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -301,7 +301,7 @@ static int check_submodule_move_head(const struct cache_entry *ce,
>   	if (!sub)
>   		return 0;
>   
> -	if (o->reset)
> +	if (o->reset_nuke_untracked)
>   		flags |= SUBMODULE_MOVE_HEAD_FORCE;
>   
>   	if (submodule_move_head(ce->name, old_id, new_id, flags))
> @@ -1696,6 +1696,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	if (len > MAX_UNPACK_TREES)
>   		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>   
> +	if (o->reset_nuke_untracked && o->reset_keep_untracked)
> +		BUG("reset_nuke_untracked and reset_keep_untracked are incompatible");
> +
> +	o->reset_either = 0;
> +	if (o->reset_nuke_untracked || o->reset_keep_untracked)
> +		o->reset_either = 1;

<bikeshed>
o->reset_either = o->reset_nuke_untracked | o->reset_keep_untracked
</bikeshed>
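
As a rough, standalone model of the hunk quoted above together with the
one-line form: the struct below is a hypothetical reduction of
unpack_trees_options to the three flags, and the function returns -1 where
the real code would call BUG(). It assumes callers only ever store 0 or 1
in the two input flags, which is what makes the bitwise OR equivalent to
the if-based version.

```c
/* Hypothetical, reduced model of the three flags from the quoted hunk. */
struct reset_opts_sketch {
	unsigned int reset_nuke_untracked;
	unsigned int reset_keep_untracked;
	unsigned int reset_either; /* derived; callers never set it */
};

/*
 * Returns -1 if both flags are set (the real code calls BUG() instead);
 * otherwise derives reset_either from the other two and returns 0.
 */
int finalize_reset_flags(struct reset_opts_sketch *o)
{
	if (o->reset_nuke_untracked && o->reset_keep_untracked)
		return -1;
	/* With 0/1 inputs this matches the if-based assignment exactly. */
	o->reset_either = o->reset_nuke_untracked | o->reset_keep_untracked;
	return 0;
}
```

Any stale value a caller left in reset_either is overwritten, which matches
the "internal use only" intent of that field.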

> diff --git a/unpack-trees.h b/unpack-trees.h
> index 2d88b19dca7..c419bf8b1f9 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -46,7 +46,9 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
>   void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
>   
>   struct unpack_trees_options {
> -	unsigned int reset,
> +	unsigned int reset_nuke_untracked,
> +		     reset_keep_untracked,
> +		     reset_either, /* internal use only */

I think I prefer the enum approach in [1], but I'm biased and I'm not 
sure it's worth getting excited about. Thanks for working on this; it 
will be great to have git stop overwriting untracked files so often.

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/20190501101403.20294-1-phillip.wood123@gmail.com/


>   		     merge,
>   		     update,
>   		     clone,
> 


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] t2500: add various tests for nuking untracked files
  2021-09-19 13:44   ` Ævar Arnfjörð Bjarmason
@ 2021-09-20 14:48     ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-20 14:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov

On Sun, Sep 19, 2021 at 6:47 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Sep 18 2021, Elijah Newren via GitGitGadget wrote:
>
> > +     test_create_repo reset_$1 &&
>
> s/test_create_repo/git init/ these days (also for the rest).

Ah, from your f0d4d398e2 ("test-lib: split up and deprecate
test_create_repo()", 2021-05-10).  Interesting history; I'll
switch over to git init.

> > +             mkdir foo.t &&
> > +             echo precious >foo.t/file &&
> > +             cp foo.t/file expect &&
> > +
> > +             test_must_fail git reset --merge work 2>error &&
> > +             test_cmp expect foo.t/file &&
> > +             grep "Updating.*foo.t.*would lose untracked files" error
>
> The test is ambiguous about whether we complain about foo.t/file or
> foo.t. If there were foo.t/{file,file-two}, would we complain just once or
> twice?
>
> I think it's just the directory, but probably worthwhile for the test to
> make the distinction. If it's a "a/sub/dir/file" do we complain about
> "a/" or "a/sub/dir/" ?

Yeah, I'll switch it to

    grep "Updating .foo.t. would lose untracked files" error

to make it clearer (where I'm using '.' instead of attempting to
escape single quote characters appropriately).

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-19 13:48   ` Ævar Arnfjörð Bjarmason
@ 2021-09-20 15:20     ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-20 15:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov

On Sun, Sep 19, 2021 at 6:51 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Sep 18 2021, Elijah Newren via GitGitGadget wrote:
>
> > -     opts.reset = reset;
> > +     opts.reset_keep_untracked = reset;
> >       opts.fn = twoway_merge;
> > +     /* Setup opts.dir so that ignored files in the way get overwritten */
> > +     opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +     opts.dir->flags |= DIR_SHOW_IGNORED;
> > +     setup_standard_excludes(opts.dir);
>
> Is the "opts.dir" free'd later somehow?

No, much like other callsites that set this up (e.g.
builtin/checkout.c), there isn't a place that frees it.  In copying
how they worked, I also copied their bugs.  ;-)

I'm tempted to move this code into setup_unpack_trees_porcelain() and
then free it in clear_unpack_trees_porcelain()...though not all
callers make use of those functions.  Hmm...
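
A minimal sketch of that idea, pairing the allocation with a matching
release. Everything here is hypothetical: the struct names, the flag
constant, and both function names are stand-ins, and plain calloc() is
used in place of xcalloc(). It is not what setup_unpack_trees_porcelain()
or clear_unpack_trees_porcelain() actually do.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct dir_struct and unpack_trees_options. */
struct dir_sketch {
	unsigned int flags;
};

struct opts_sketch {
	struct dir_sketch *dir; /* NULL means "never set up" */
};

#define DIR_SHOW_IGNORED_SKETCH 1u

/* Allocate and configure opts->dir; returns 0 on success, -1 on failure. */
int setup_opts_dir_sketch(struct opts_sketch *opts)
{
	opts->dir = calloc(1, sizeof(*opts->dir));
	if (!opts->dir)
		return -1;
	opts->dir->flags |= DIR_SHOW_IGNORED_SKETCH;
	return 0;
}

/* Release what setup allocated; safe to call more than once. */
void clear_opts_dir_sketch(struct opts_sketch *opts)
{
	free(opts->dir);
	opts->dir = NULL;
}
```

Resetting the pointer to NULL makes a second clear a no-op. In git itself,
whatever the directory structure owns internally would presumably need
releasing as well before the free.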

> >       opts.head_idx = -1;
> >       opts.update = worktree;
> >       opts.skip_unmerged = !worktree;
> > -     opts.reset = 1;
> > +     if (o->force)
> > +             opts.reset_nuke_untracked = 1;
> > +     else
> > +             opts.reset_keep_untracked = 1;
>
> In both cases opts.reset_keep_untracked is set to 1, I assume it's a
> mistake

No, it only sets opts.reset_keep_untracked to 1 when o->force is
false.  If o->force is true, it instead sets opts.reset_nuke_untracked
to 1.  There's no mistake there.
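
To spell that out with a tiny standalone sketch (the struct and function
names are hypothetical; only the two field names come from the patch), the
if/else sets exactly one of the two flags for either value of force:

```c
/* Hypothetical holder for the two mutually exclusive flags. */
struct reset_choice_sketch {
	unsigned int reset_nuke_untracked;
	unsigned int reset_keep_untracked;
};

/* Mirror of the quoted if/else: 'force' picks exactly one flag. */
struct reset_choice_sketch choose_reset_flags(int force)
{
	struct reset_choice_sketch opts = { 0, 0 };

	if (force)
		opts.reset_nuke_untracked = 1;
	else
		opts.reset_keep_untracked = 1;
	return opts;
}
```

A single assignment such as "opts.reset_keep_untracked = !force" would
cover only the else branch; the force case still needs the nuke flag set.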

> >       opts.merge = 1;
> >       opts.fn = oneway_merge;
> >       opts.verbose_update = o->show_progress;
> >       opts.src_index = &the_index;
> >       opts.dst_index = &the_index;
> > +     if (o->overwrite_ignore) {
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
>
> ditto potential leak.
>
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +             opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(opts.dir);
> > +     }
>
>
> ditto (also more omitted).

Yep.


* Re: [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-20 10:19   ` Phillip Wood
@ 2021-09-20 16:05     ` Elijah Newren
  2021-09-20 18:11       ` Phillip Wood
  0 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-09-20 16:05 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov

On Mon, Sep 20, 2021 at 3:19 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Traditionally, unpack_trees_options->reset was used to signal that it
> > was okay to delete any untracked files in the way.  This was used by
> > `git read-tree --reset`, but then started appearing in other places as
> > well.  However, many of the other uses should not be deleting untracked
> > files in the way.  Split this into two separate fields:
> >     reset_nuke_untracked
> >     reset_keep_untracked
> > and, since many code paths in unpack_trees need to be followed for both
> > of these flags, introduce a third one for convenience:
> >     reset_either
> > which is simply an or-ing of the other two.
>
> See [1] for an alternative approach that used an enum instead of adding
> mutually exclusive flags.

Oh, interesting.  Any reason you didn't pursue that old series further?

> > Modify existing callers so that
> >     read-tree --reset
>
> it would be nice if read-tree callers could choose whether they want to
> remove untracked files or not - that could always be added later. This
> patch changes the behavior of 'git read-tree -m -u' (and other commands)
> so that they will overwrite ignored files - I'm in favor of that change
> but it would be good to spell out the change in the commit message.

Those commands made no distinction between untracked and ignored files
previously, and overwrote all of them.  This patch changes those
commands so that they stop overwriting untracked files, unless those
files are ignored.  So, there's no change in behavior for ignored
files, only for non-ignored untracked files.

Your suggestion to point out the behavior relative to ignored files in
the commit message, though, is probably a good idea.  I should mention
that ignored files will continue to be removed by these commands.

> >     reset --hard
> >     checkout --force
>
> I often use checkout --force to clear unwanted changes when I'm
> switching branches, I'd prefer it if it did not remove untracked files.

I originally started down that path to see what it looked like, but
Junio weighed in and explicitly called out checkout --force as being a
command that should remove untracked files in the way.  See
https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/.  Seems you
also felt that way previously, at
https://lore.kernel.org/git/d4c36a24-b40c-a6ca-7a05-572ab93a0101@gmail.com/
-- any reason for your change of opinion?

> > continue using reset_nuke_untracked, but so that other callers,
> > including
> >     am
> >     checkout without --force
> >     stash  (though currently dead code; reset always had a value of 0)
> >     numerous callers from rebase/sequencer to reset_head()
> > will use the new reset_keep_untracked field.
>
> This is great. In the discussion around [1] there is a mention of 'git
> checkout <pathspec>' which also overwrites untracked files. It does not
> use unpack_trees() so is arguably outside the scope of what you're doing
> here but it might be worth mentioning.

Oh, that's interesting.  Yeah, that's worth mentioning and perhaps digging into.

>
> > [...]
> > diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> > index 485e7b04794..8b94e1aa261 100644
> > --- a/builtin/read-tree.c
> > +++ b/builtin/read-tree.c
> > @@ -133,7 +133,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >                        N_("3-way merge if no file level merging required")),
> >               OPT_BOOL(0, "aggressive", &opts.aggressive,
> >                        N_("3-way merge in presence of adds and removes")),
> > -             OPT_BOOL(0, "reset", &opts.reset,
> > +             OPT_BOOL(0, "reset", &opts.reset_keep_untracked,
> >                        N_("same as -m, but discard unmerged entries")),
> >               { OPTION_STRING, 0, "prefix", &opts.prefix, N_("<subdirectory>/"),
> >                 N_("read the tree into the index under <subdirectory>/"),
> > @@ -162,6 +162,11 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >       opts.head_idx = -1;
> >       opts.src_index = &the_index;
> >       opts.dst_index = &the_index;
> > +     if (opts.reset_keep_untracked) {
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +             opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(opts.dir);
> > +     }
>
> Does this clobber any excludes added by --exclude-per-directory?

Oh, um...I've basically implemented a --exclude-standard and assumed
it was passed, ignoring whatever setting of opts.dir was already set
up by exclude-per-directory.  Oops.
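
One possible way to avoid the clobbering, sketched with stand-in types: only
install the standard excludes when nothing has populated the dir pointer
yet. This is an illustration under assumed names (everything prefixed
`sketch_` is hypothetical), not necessarily the fix that ends up in the
series.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_DIR_SHOW_IGNORED 1u

/* Stand-in for git's struct dir_struct. */
struct sketch_dir {
	unsigned flags;
	const char *exclude_per_dir; /* e.g. ".gitignore"; NULL if unset */
};

/* Stand-in for struct unpack_trees_options. */
struct sketch_opts {
	int reset;
	struct sketch_dir *dir;
};

/*
 * Returns 1 if this call allocated opts->dir (the caller must free it),
 * or 0 if it left an existing configuration untouched.
 */
static int sketch_maybe_setup_excludes(struct sketch_opts *opts)
{
	assert(opts != NULL);
	if (!opts->reset || opts->dir)
		return 0;
	opts->dir = calloc(1, sizeof(*opts->dir));
	if (!opts->dir)
		abort();
	opts->dir->flags |= SKETCH_DIR_SHOW_IGNORED;
	return 1;
}
```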

> > diff --git a/builtin/reset.c b/builtin/reset.c
> > index 43e855cb887..ba39c4882a6 100644
> > --- a/builtin/reset.c
> > +++ b/builtin/reset.c
> > @@ -10,6 +10,7 @@
> >   #define USE_THE_INDEX_COMPATIBILITY_MACROS
> >   #include "builtin.h"
> >   #include "config.h"
> > +#include "dir.h"
> >   #include "lockfile.h"
> >   #include "tag.h"
> >   #include "object.h"
> > @@ -70,9 +71,19 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
> >               break;
> >       case HARD:
> >               opts.update = 1;
> > -             /* fallthrough */
> > +             opts.reset_nuke_untracked = 1;
> > +             break;
> > +     case MIXED:
> > +             opts.reset_keep_untracked = 1; /* but opts.update=0, so untracked left alone */
> > +             break;
> >       default:
> > -             opts.reset = 1;
> > +             BUG("invalid reset_type passed to reset_index");
>
> There is no case SOFT: but in that case we don't call reset_index() so
> we're OK.
>
> > diff --git a/reset.c b/reset.c
> > index 79310ae071b..0880c76aef9 100644
> > --- a/reset.c
> > +++ b/reset.c
> > @@ -1,5 +1,6 @@
> >   #include "git-compat-util.h"
> >   #include "cache-tree.h"
> > +#include "dir.h"
> >   #include "lockfile.h"
> >   #include "refs.h"
> >   #include "reset.h"
> > @@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
> >       unpack_tree_opts.update = 1;
> >       unpack_tree_opts.merge = 1;
> >       init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
> > -     if (!detach_head)
> > -             unpack_tree_opts.reset = 1;
>
> Unrelated to this patch but this looks dodgy to me. For 'git rebase
> <upstream> <branch>' where <branch> is ahead of <upstream> we skip the
> rebase and use reset_head() to checkout <branch> without 'detach_head'
> set. I think this should be checking 'reset_hard' instead of 'detach_head'
>
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index 5786645f315..d952eebe96a 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -301,7 +301,7 @@ static int check_submodule_move_head(const struct cache_entry *ce,
> >       if (!sub)
> >               return 0;
> >
> > -     if (o->reset)
> > +     if (o->reset_nuke_untracked)
> >               flags |= SUBMODULE_MOVE_HEAD_FORCE;
> >
> >       if (submodule_move_head(ce->name, old_id, new_id, flags))
> > @@ -1696,6 +1696,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >       if (len > MAX_UNPACK_TREES)
> >               die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> >
> > +     if (o->reset_nuke_untracked && o->reset_keep_untracked)
> > +             BUG("reset_nuke_untracked and reset_keep_untracked are incompatible");
> > +
> > +     o->reset_either = 0;
> > +     if (o->reset_nuke_untracked || o->reset_keep_untracked)
> > +             o->reset_either = 1;
>
> <bikeshed>
> o->reset_either = o->reset_nuke_untracked | o->reset_keep_untracked
> </bikeshed>

Goes away entirely if we adopt your enum suggestion.

> > diff --git a/unpack-trees.h b/unpack-trees.h
> > index 2d88b19dca7..c419bf8b1f9 100644
> > --- a/unpack-trees.h
> > +++ b/unpack-trees.h
> > @@ -46,7 +46,9 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
> >   void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
> >
> >   struct unpack_trees_options {
> > -     unsigned int reset,
> > +     unsigned int reset_nuke_untracked,
> > +                  reset_keep_untracked,
> > +                  reset_either, /* internal use only */
>
> I think I prefer the enum approach in [1] but I'm biased and I'm not
> sure it's worth getting excited about. Thanks for working on this it
> will be great to have git stop overwriting untracked files so often.

I think the enum approach makes sense; I'll try it out.
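
For reference, a rough sketch of what an enum could look like in place of
the three flags. The names are hypothetical and not taken from either
series. With an enum, the "both flags set" BUG() check and the derived
reset_either field become unnecessary, because a single value cannot be in
two states at once.

```c
#include <assert.h>

/* Hypothetical replacement for the reset_* flag trio. */
enum sketch_reset_type {
	SKETCH_RESET_NONE = 0,
	SKETCH_RESET_KEEP_UNTRACKED,
	SKETCH_RESET_NUKE_UNTRACKED
};

/* Replaces the derived reset_either flag. */
static int sketch_is_reset(enum sketch_reset_type t)
{
	assert(t >= SKETCH_RESET_NONE && t <= SKETCH_RESET_NUKE_UNTRACKED);
	return t != SKETCH_RESET_NONE;
}

/* Replaces direct tests of reset_nuke_untracked. */
static int sketch_may_remove_untracked(enum sketch_reset_type t)
{
	assert(t >= SKETCH_RESET_NONE && t <= SKETCH_RESET_NUKE_UNTRACKED);
	return t == SKETCH_RESET_NUKE_UNTRACKED;
}
```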


* Re: [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file
  2021-09-19 13:52   ` Ævar Arnfjörð Bjarmason
@ 2021-09-20 16:12     ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-20 16:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov

On Sun, Sep 19, 2021 at 6:52 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Sat, Sep 18 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  t/t2500-untracked-overwriting.sh | 2 +-
> >  unpack-trees.c                   | 4 ++++
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
> > index 017946a494f..d4d9dc928aa 100755
> > --- a/t/t2500-untracked-overwriting.sh
> > +++ b/t/t2500-untracked-overwriting.sh
> > @@ -218,7 +218,7 @@ test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
> >       )
> >  '
> >
> > -test_expect_failure 'git am --skip and untracked dir vs deleted file' '
> > +test_expect_success 'git am --skip and untracked dir vs deleted file' '
> >       test_setup_sequencing am_skip_and_untracked &&
> >       (
> >               cd sequencing_am_skip_and_untracked &&
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index 3b3d1c0ff40..858595a13f1 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -2395,7 +2395,11 @@ static int deleted_entry(const struct cache_entry *ce,
> >               if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
> >                       return -1;
> >               return 0;
> > +     } else {
> > +             if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
> > +                     return -1;
> >       }
>
> Maybe just "else if" ?

Yeah, that makes sense.
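
A stand-alone model of the flattened control flow, with the two verify
calls reduced to integer results. The function name, parameters, and the
final return value are assumptions for illustration only; the real
deleted_entry() does more work after the if-block.

```c
#include <assert.h>

/*
 * have_old:        nonzero if there is an old entry for the path
 * absent_err:      result of the plain "verify absent" check
 * absent_dir_err:  result of the "verify absent if directory" check
 * Returns -1 on error, 0 if there was nothing to delete, 1 otherwise.
 */
static int sketch_deleted_entry(int have_old, int absent_err, int absent_dir_err)
{
	assert(have_old == 0 || have_old == 1);
	if (!have_old) {
		if (absent_err)
			return -1;
		return 0;
	} else if (absent_dir_err) {
		return -1;
	}

	/* Remaining work on the entry would go here. */
	return 1;
}
```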

> [...]

That's kind of misleading.  ;-)  You trimmed out a single line here,
and in particular one that only contained a trailing curly brace.
Thus, your "trimming" here actually made things longer.

>
> > +
>
> Stray whitespace change

No, the whitespace addition was after making the if-block above it
more complex with the extra else block.  That if-block is now
approximately 2/3 of the length of the function, and is the part that
is relevant to the comment above it.  Since the code that follows the
if-block is separate from the comment above and the if-block became
more complex, it felt natural to add a bit of spacing.  So, it wasn't
stray, but intentional and related to the changes above.


* Re: [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories
  2021-09-19 13:36     ` Philip Oakley
@ 2021-09-20 16:29       ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-20 16:29 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov

On Sun, Sep 19, 2021 at 6:36 AM Philip Oakley <philipoakley@iee.email> wrote:
>
> On 19/09/2021 11:52, Philip Oakley wrote:
> > truly minor nit.
> > On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
> >> From: Elijah Newren <newren@gmail.com>
> >>
> >> Some commands have traditionally also removed untracked files (or
> >> directories) that were in the way of a tracked file we needed.  Document
> >> these cases.
> >>
> >> Signed-off-by: Elijah Newren <newren@gmail.com>
> >> ---
> >>  Documentation/git-checkout.txt  | 5 +++--
> >>  Documentation/git-read-tree.txt | 5 +++--
> >>  Documentation/git-reset.txt     | 3 ++-
> >>  3 files changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
> >> index b1a6fe44997..d473c9bf387 100644
> >> --- a/Documentation/git-checkout.txt
> >> +++ b/Documentation/git-checkout.txt
> >> @@ -118,8 +118,9 @@ OPTIONS
> >>  -f::
> >>  --force::
> >>      When switching branches, proceed even if the index or the
> >> -    working tree differs from `HEAD`.  This is used to throw away
> >> -    local changes.
> >> +    working tree differs from `HEAD`, and even if there are untracked
> >> +    files in the way.  This is used to throw away local changes and
> > double space after full stop?

Note that the original also had a double space after full stop
(looking at the previous sentence).

> >> +    any untracked files or directories that are in the way.
> >>  +
> >>  When checking out paths from the index, do not fail upon unmerged
> >>  entries; instead, unmerged entries are ignored.
> >> diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
> >> index 5fa8bab64c2..4731ec3283f 100644
> >> --- a/Documentation/git-read-tree.txt
> >> +++ b/Documentation/git-read-tree.txt
> >> @@ -39,8 +39,9 @@ OPTIONS
> >>
> >>  --reset::
> >>      Same as -m, except that unmerged entries are discarded instead
> >> -    of failing. When used with `-u`, updates leading to loss of
> >> -    working tree changes will not abort the operation.
> >> +    of failing.  When used with `-u`, updates leading to loss of
> > Is the single space to double space change desired?
> > I had the impression that the project had decided on single spaces, but
> > I can't see anything in SubmittingPatches or CodingGuidelines. I don't
> > think there are DocumentationGuidelines.

Double space is better as per Junio's declaration here:
https://lore.kernel.org/git/xmqqftchkext.fsf@gitster.c.googlers.com/

However, it's so minor that I wouldn't normally bother to change it
specifically.  In this case, I originally was tweaking the sentence
before as well, and when modifying both sentences I just naturally put
two spaces after the full stop between them.  But then I re-read and
decided to reword and ended up restoring the original first sentence
and didn't even notice that resulted in a change of spacing at the end
of the sentence.

> I may have been mistaken about any project decision. I had a look around
> the archives and only came up with a 2008 post [1] by Junio that, at the
> time, was looking for two spaces after the full stop.
>
> It's not clear if we consider the man pages to be 'typeset' such that a
> single space would be the norm, or mono-spaced 'typewriter' style (two
> spaces). There's much commentary in the Wikipedia article [2].
>
> So still minor.

Yes, the double space is "more correct" in the sources as per Junio's
declaration in the link I provided above, but I agree it's pretty
minor.

> [1] https://lore.kernel.org/git/7vfxtu3fku.fsf@gitster.siamese.dyndns.org/
> [2] https://en.wikipedia.org/wiki/Sentence_spacing


* Re: [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-20 16:05     ` Elijah Newren
@ 2021-09-20 18:11       ` Phillip Wood
  2021-09-24  2:27         ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Phillip Wood @ 2021-09-20 18:11 UTC (permalink / raw)
  To: Elijah Newren, Phillip Wood
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov

On 20/09/2021 17:05, Elijah Newren wrote:
> On Mon, Sep 20, 2021 at 3:19 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
>>>
>>> Traditionally, unpack_trees_options->reset was used to signal that it
>>> was okay to delete any untracked files in the way.  This was used by
>>> `git read-tree --reset`, but then started appearing in other places as
>>> well.  However, many of the other uses should not be deleting untracked
>>> files in the way.  Split this into two separate fields:
>>>      reset_nuke_untracked
>>>      reset_keep_untracked
>>> and, since many code paths in unpack_trees need to be followed for both
>>> of these flags, introduce a third one for convenience:
>>>      reset_either
>>> which is simply an or-ing of the other two.
>>
>> See [1] for an alternative approach that used an enum instead of adding
>> mutually exclusive flags.
> 
> Oh, interesting.  Any reason you didn't pursue that old series further?

Mainly lack of time/distracted by other things. I was also not that 
confident about modifying the unpack_trees() code. Duy was very helpful 
but then moved on quite soon after I posted that series I think and 
there didn't seem to be much interest from others.

>>> Modify existing callers so that
>>>      read-tree --reset
>>
>> it would be nice if read-tree callers could choose whether they want to
>> remove untracked files or not - that could always be added later. This
>> patch changes the behavior of 'git read-tree -m -u' (and other commands)
>> so that they will overwrite ignored files - I'm in favor of that change
>> but it would be good to spell out the change in the commit message.
> 
> Those commands made no distinction between untracked and ignored files
> previously, and overwrote all of them.

Are you sure? I thought 'read-tree -m -u', unlike 'read-tree --reset -u',
currently refused to overwrite untracked and ignored files.

>  This patch changes those
> commands so that they stop overwriting untracked files, unless those
> files are ignored.  So, there's no change in behavior for ignored
> files, only for non-ignored untracked files.
> 
> Your suggestion to point out the behavior relative to ignored files in
> the commit message, though, is probably a good idea.  I should mention
> that ignored files will continue to be removed by these commands.
> 
>>>      reset --hard
>>>      checkout --force
>>
>> I often use checkout --force to clear unwanted changes when I'm
>> switching branches, I'd prefer it if it did not remove untracked files.
> 
> I originally started down that path to see what it looked like, but
> Junio weighed in and explicitly called out checkout --force as being a
> command that should remove untracked files in the way.  See
> https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/.  Seems you
> also felt that way previously, at
> https://lore.kernel.org/git/d4c36a24-b40c-a6ca-7a05-572ab93a0101@gmail.com/
> -- any reason for your change of opinion?

I've no recollection of writing that email! When I was writing today I
thought that 'checkout -f' and 'switch --discard-changes' behaved the
same way, but it appears from that other message that they do not. So
maybe it is OK for 'checkout -f' to nuke everything if there is a safe
alternative available in the form of 'switch --discard-changes'.

>>> continue using reset_nuke_untracked, but so that other callers,
>>> including
>>>      am
>>>      checkout without --force
>>>      stash  (though currently dead code; reset always had a value of 0)
>>>      numerous callers from rebase/sequencer to reset_head()
>>> will use the new reset_keep_untracked field.
>>
>> This is great. In the discussion around [1] there is a mention of 'git
>> checkout <pathspec>' which also overwrites untracked files. It does not
>> use unpack_trees() so is arguably outside the scope of what you're doing
>> here but it might be worth mentioning.
> 
> Oh, that's interesting.  Yeah, that's worth mentioning and perhaps digging into.

It'd be fantastic to fix that if you have the time and inclination to 
dig into it.

Best Wishes

Phillip

>>> [...]
>>> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
>>> index 485e7b04794..8b94e1aa261 100644
>>> --- a/builtin/read-tree.c
>>> +++ b/builtin/read-tree.c
>>> @@ -133,7 +133,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>>>                         N_("3-way merge if no file level merging required")),
>>>                OPT_BOOL(0, "aggressive", &opts.aggressive,
>>>                         N_("3-way merge in presence of adds and removes")),
>>> -             OPT_BOOL(0, "reset", &opts.reset,
>>> +             OPT_BOOL(0, "reset", &opts.reset_keep_untracked,
>>>                         N_("same as -m, but discard unmerged entries")),
>>>                { OPTION_STRING, 0, "prefix", &opts.prefix, N_("<subdirectory>/"),
>>>                  N_("read the tree into the index under <subdirectory>/"),
>>> @@ -162,6 +162,11 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>>>        opts.head_idx = -1;
>>>        opts.src_index = &the_index;
>>>        opts.dst_index = &the_index;
>>> +     if (opts.reset_keep_untracked) {
>>> +             opts.dir = xcalloc(1, sizeof(*opts.dir));
>>> +             opts.dir->flags |= DIR_SHOW_IGNORED;
>>> +             setup_standard_excludes(opts.dir);
>>> +     }
>>
>> Does this clobber any excludes added by --exclude-per-directory?
> 
> Oh, um...I've basically implemented a --exclude-standard and assumed
> it was passed, ignoring whatever setting of opts.dir was already set
> up by exclude-per-directory.  Oops.
> 
>>> diff --git a/builtin/reset.c b/builtin/reset.c
>>> index 43e855cb887..ba39c4882a6 100644
>>> --- a/builtin/reset.c
>>> +++ b/builtin/reset.c
>>> @@ -10,6 +10,7 @@
>>>    #define USE_THE_INDEX_COMPATIBILITY_MACROS
>>>    #include "builtin.h"
>>>    #include "config.h"
>>> +#include "dir.h"
>>>    #include "lockfile.h"
>>>    #include "tag.h"
>>>    #include "object.h"
>>> @@ -70,9 +71,19 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>>>                break;
>>>        case HARD:
>>>                opts.update = 1;
>>> -             /* fallthrough */
>>> +             opts.reset_nuke_untracked = 1;
>>> +             break;
>>> +     case MIXED:
>>> +             opts.reset_keep_untracked = 1; /* but opts.update=0, so untracked left alone */
>>> +             break;
>>>        default:
>>> -             opts.reset = 1;
>>> +             BUG("invalid reset_type passed to reset_index");
>>
>> There is no case SOFT: but in that case we don't call reset_index() so
>> we're OK.
>>
>>> diff --git a/reset.c b/reset.c
>>> index 79310ae071b..0880c76aef9 100644
>>> --- a/reset.c
>>> +++ b/reset.c
>>> @@ -1,5 +1,6 @@
>>>    #include "git-compat-util.h"
>>>    #include "cache-tree.h"
>>> +#include "dir.h"
>>>    #include "lockfile.h"
>>>    #include "refs.h"
>>>    #include "reset.h"
>>> @@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
>>>        unpack_tree_opts.update = 1;
>>>        unpack_tree_opts.merge = 1;
>>>        init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
>>> -     if (!detach_head)
>>> -             unpack_tree_opts.reset = 1;
>>
>> Unrelated to this patch but this looks dodgy to me. For 'git rebase
>> <upstream> <branch>' where <branch> is ahead of <upstream> we skip the
>> rebase and use reset_head() to checkout <branch> without 'detach_head'
>> set. I think this should be checking 'reset_hard' instead of 'detach_head'
>>
>>> diff --git a/unpack-trees.c b/unpack-trees.c
>>> index 5786645f315..d952eebe96a 100644
>>> --- a/unpack-trees.c
>>> +++ b/unpack-trees.c
>>> @@ -301,7 +301,7 @@ static int check_submodule_move_head(const struct cache_entry *ce,
>>>        if (!sub)
>>>                return 0;
>>>
>>> -     if (o->reset)
>>> +     if (o->reset_nuke_untracked)
>>>                flags |= SUBMODULE_MOVE_HEAD_FORCE;
>>>
>>>        if (submodule_move_head(ce->name, old_id, new_id, flags))
>>> @@ -1696,6 +1696,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>>        if (len > MAX_UNPACK_TREES)
>>>                die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>>>
>>> +     if (o->reset_nuke_untracked && o->reset_keep_untracked)
>>> +             BUG("reset_nuke_untracked and reset_keep_untracked are incompatible");
>>> +
>>> +     o->reset_either = 0;
>>> +     if (o->reset_nuke_untracked || o->reset_keep_untracked)
>>> +             o->reset_either = 1;
>>
>> <bikeshed>
>> o->reset_either = o->reset_nuke_untracked | o->reset_keep_untracked
>> </bikeshed>
> 
> Goes away entirely if we adopt your enum suggestion.
> 
>>> diff --git a/unpack-trees.h b/unpack-trees.h
>>> index 2d88b19dca7..c419bf8b1f9 100644
>>> --- a/unpack-trees.h
>>> +++ b/unpack-trees.h
>>> @@ -46,7 +46,9 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
>>>    void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
>>>
>>>    struct unpack_trees_options {
>>> -     unsigned int reset,
>>> +     unsigned int reset_nuke_untracked,
>>> +                  reset_keep_untracked,
>>> +                  reset_either, /* internal use only */
>>
>> I think I prefer the enum approach in [1] but I'm biased and I'm not
>> sure it's worth getting excited about. Thanks for working on this it
>> will be great to have git stop overwriting untracked files so often.
> 
> I think the enum approach makes sense; I'll try it out.
> 



* Re: [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling
  2021-09-20 18:11       ` Phillip Wood
@ 2021-09-24  2:27         ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-24  2:27 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov

On Mon, Sep 20, 2021 at 11:11 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> On 20/09/2021 17:05, Elijah Newren wrote:
> > On Mon, Sep 20, 2021 at 3:19 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
> >>
> >> On 19/09/2021 00:15, Elijah Newren via GitGitGadget wrote:
> >>> From: Elijah Newren <newren@gmail.com>
> >>>
> >>> Traditionally, unpack_trees_options->reset was used to signal that it
> >>> was okay to delete any untracked files in the way.  This was used by
> >>> `git read-tree --reset`, but then started appearing in other places as
> >>> well.  However, many of the other uses should not be deleting untracked
> >>> files in the way.  Split this into two separate fields:
> >>>      reset_nuke_untracked
> >>>      reset_keep_untracked
> >>> and, since many code paths in unpack_trees need to be followed for both
> >>> of these flags, introduce a third one for convenience:
> >>>      reset_either
> >>> which is simply an or-ing of the other two.
> >>
> >> See [1] for an alternative approach that used an enum instead of adding
> >> mutually exclusive flags.
> >
> > Oh, interesting.  Any reason you didn't pursue that old series further?
>
> Mainly lack of time/distracted by other things. I was also not that
> confident about modifying the unpack_trees() code. Duy was very helpful
> but then moved on quite soon after I posted that series I think and
> there didn't seem to be much interest from others.
>
> >>> Modify existing callers so that
> >>>      read-tree --reset
> >>
> >> it would be nice if read-tree callers could choose whether they want to
> >> remove untracked files or not - that could always be added later. This
> >> patch changes the behavior of 'git read-tree -m -u' (and other commands)
> >> so that they will overwrite ignored files - I'm in favor of that change
> >> but it would be good to spell out the change in the commit message.
> >
> > Those commands made no distinction between untracked and ignored files
> > previously, and overwrote all of them.
>
> Are you sure? I thought 'read-tree -m -u', unlike 'read-tree --reset -u',
> currently refused to overwrite untracked and ignored files.

Doh, I was thinking of read-tree --reset -u rather than read-tree -m
-u, despite the fact that you explicitly called out (and I even quoted
you on) the latter.  You are right.
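
To pin down the distinction being discussed, here is a small decision
function modelling when an untracked path in the way may be overwritten. It
is a simplified sketch under assumptions drawn from this thread (the nuke
mode overwrites everything; otherwise only ignored files are overwritten,
and only when the excludes have been loaded into opts.dir). All names are
hypothetical.

```c
#include <assert.h>

enum sketch_mode {
	SKETCH_MERGE_ONLY,   /* e.g. read-tree -m -u */
	SKETCH_RESET_KEEP,   /* reset flavor that protects untracked files */
	SKETCH_RESET_NUKE    /* e.g. read-tree --reset -u, reset --hard */
};

/* Returns 1 if an untracked path in the way may be overwritten. */
static int sketch_may_overwrite(enum sketch_mode mode, int is_ignored,
				int excludes_loaded)
{
	assert(mode >= SKETCH_MERGE_ONLY && mode <= SKETCH_RESET_NUKE);
	if (mode == SKETCH_RESET_NUKE)
		return 1;
	/* With no excludes loaded, ignored files are treated as precious. */
	return is_ignored && excludes_loaded;
}
```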

> >  This patch changes those
> > commands so that they stop overwriting untracked files, unless those
> > files are ignored.  So, there's no change in behavior for ignored
> > files, only for non-ignored untracked files.
> >
> > Your suggestion to point out the behavior relative to ignored files in
> > the commit message, though, is probably a good idea.  I should mention
> > that ignored files will continue to be removed by these commands.
> >
> >>>      reset --hard
> >>>      checkout --force
> >>
> >> I often use checkout --force to clear unwanted changes when I'm
> >> switching branches, I'd prefer it if it did not remove untracked files.
> >
> > I originally started down that path to see what it looked like, but
> > Junio weighed in and explicitly called out checkout --force as being a
> > command that should remove untracked files in the way.  See
> > https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/.  Seems you
> > also felt that way previously, at
> > https://lore.kernel.org/git/d4c36a24-b40c-a6ca-7a05-572ab93a0101@gmail.com/
> > -- any reason for your change of opinion?
>
> I've no recollection of writing that email! When I was writing today I
> thought that 'checkout -f' and 'switch --discard-changes' behaved the
> same way, but it appears from that other message that they do not. So
> maybe it is OK for 'checkout -f' to nuke everything if there is a safe
> alternative available in the form of 'switch --discard-changes'.
>
> >>> continue using reset_nuke_untracked, but so that other callers,
> >>> including
> >>>      am
> >>>      checkout without --force
> >>>      stash  (though currently dead code; reset always had a value of 0)
> >>>      numerous callers from rebase/sequencer to reset_head()
> >>> will use the new reset_keep_untracked field.
> >>
> >> This is great. In the discussion around [1] there is a mention of 'git
> >> checkout <pathspec>' which also overwrites untracked files. It does not
> >> use unpack_trees() so is arguably outside the scope of what you're doing
> >> here but it might be worth mentioning.
> >
> > Oh, that's interesting.  Yeah, that's worth mentioning and perhaps digging into.
>
> It'd be fantastic to fix that if you have the time and inclination to
> dig into it.

I won't include it in this series, but I'll throw it on my (long) pile
of things to perhaps look at later.


Thanks for the suggestions and pointers in your reviews!


* [PATCH v2 0/6] Fix various issues around removal of untracked files/directories
  2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-09-18 23:15 ` [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
@ 2021-09-24  6:37 ` Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
                     ` (6 more replies)
  6 siblings, 7 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren

This series depends on en/am-abort-fix.

We have multiple codepaths that delete untracked files/directories but
shouldn't. There are also some codepaths where we delete untracked
files/directories intentionally (based on mailing list discussion), but
where that intent is not documented. Fix the documentation, add several new
(mostly failing) testcases, fix some of the new testcases, and add comments
about some potential remaining problems. (I found these as a side-effect of
looking at [1], though [2] pointed out one explicitly while I was working on
it.)

Note that I'm using Junio's declaration about checkout -f and reset --hard
(and also presuming that, since read-tree --reset is plumbing, its
behavior should be left alone)[3] in this series.

Changes since v1:

 * Various small cleanups (suggested by Ævar)
 * Fixed memory leaks of unpack_trees_opts->dir (also suggested by Ævar)
 * Use an enum for unpack_trees_options->reset, instead of multiple fields
   (suggested by Phillip)
 * Avoid changing behavior for cases that do not set
   unpack_trees_options.reset to a nonzero value (even if it may make sense
   to nuke ignored files when running either read-tree -m -u or the various
   reset flavors run internally by rebase/sequencer); we can revisit that
   later.

SIDENOTE about treating (some) ignored files as precious:

There's another related topic here that came up in the mailing list threads
that is separate even if similar: namely, treating ignored files as precious
instead of deleting them. I do not try to handle that here, but I believe
that would actually be relatively easy to handle. If you leave
unpack_trees_options->dir as NULL, then ignored files are treated as
precious (my original patch 2 made that mistake). There are a few other
locations that already optionally set up unpack_trees_options->dir (a quick
search for "overwrite_ignore" and "overwrite-ignore" will find them), so
we'd just need to implement that option flag in more places corresponding to
the new callsites (and perhaps make a global core.overwrite_ignored config
option to affect all of these). Of course, doing so would globally treat
ignored files as precious rather than allowing them to be configured on a
per-path basis, but honestly I think the idea of configuring ignored files
as precious on a per-path basis sounds like insanity. (We have enough bugs
with untracked and ignored files without adding yet another type. Also,
tla/baz was excessively confusing to me due in part to the number of types
of files and I'd rather not see such ideas ported to git. And, of course,
configuring per-path rules sounds like lots of work for end users to
configure. There may be additional reasons against it.) So, if someone wants
to pursue the precious-ignored concept then I'd much rather see it done as a
global setting. Just my $0.02.
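
To make the "dir left as NULL" rule above concrete, here is a minimal
sketch of the decision being described. It is not code from git:
may_overwrite(), enum path_kind and the have_dir parameter are hypothetical
names used only for illustration, and the sketch assumes the working tree
is actually being updated (update=1, index_only=0). Only the reset enum
values are taken from this series.

```c
#include <stdbool.h>

/* Same values as the enum this series adds to unpack-trees.h. */
enum unpack_trees_reset_type {
	UNPACK_RESET_NONE = 0,
	UNPACK_RESET_INVALID = 1,
	UNPACK_RESET_PROTECT_UNTRACKED,
	UNPACK_RESET_OVERWRITE_UNTRACKED
};

/* Hypothetical: what kind of unindexed path is in the way. */
enum path_kind {
	PATH_UNTRACKED,	/* not in the index, not matched by ignore rules */
	PATH_IGNORED	/* not in the index, matched by ignore rules */
};

/*
 * Hypothetical helper, not a function in git.  have_dir stands in for
 * "unpack_trees_options->dir is non-NULL".
 */
bool may_overwrite(enum unpack_trees_reset_type reset, bool have_dir,
		   enum path_kind kind)
{
	if (reset == UNPACK_RESET_OVERWRITE_UNTRACKED)
		return true;		/* anything in the way may be nuked */
	if (kind == PATH_IGNORED)
		return have_dir;	/* dir == NULL: ignored is precious */
	return false;			/* plain untracked files are kept */
}
```

Under these assumptions, a global "precious ignored files" setting would
roughly amount to always calling this with have_dir == false.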

[1] https://lore.kernel.org/git/xmqqv93n7q1v.fsf@gitster.g/
[2] https://lore.kernel.org/git/C357A648-8B13-45C3-9388-C0C7F7D40DAE@gmail.com/
[3] https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/

Elijah Newren (6):
  t2500: add various tests for nuking untracked files
  Change unpack_trees' 'reset' flag into an enum
  unpack-trees: avoid nuking untracked dir in way of unmerged file
  unpack-trees: avoid nuking untracked dir in way of locally deleted
    file
  Comment important codepaths regarding nuking untracked files/dirs
  Documentation: call out commands that nuke untracked files/directories

 Documentation/git-checkout.txt   |   5 +-
 Documentation/git-read-tree.txt  |   5 +-
 Documentation/git-reset.txt      |   3 +-
 builtin/am.c                     |  13 +-
 builtin/checkout.c               |  18 ++-
 builtin/read-tree.c              |   3 +
 builtin/reset.c                  |  20 ++-
 builtin/stash.c                  |  18 ++-
 builtin/submodule--helper.c      |   4 +
 builtin/worktree.c               |   5 +
 contrib/rerere-train.sh          |   2 +-
 reset.c                          |  13 +-
 submodule.c                      |   1 +
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 unpack-trees.c                   |  44 +++++-
 unpack-trees.h                   |  11 +-
 16 files changed, 387 insertions(+), 22 deletions(-)
 create mode 100755 t/t2500-untracked-overwriting.sh


base-commit: c5ead19ea282a288e01d86536349a4ae4a093e4b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1036%2Fnewren%2Funtracked_removal-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1036/newren/untracked_removal-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1036

Range-diff vs v1:

 1:  b634136a74b ! 1:  9460a49c7ed t2500: add various tests for nuking untracked files
     @@ t/t2500-untracked-overwriting.sh (new)
      +. ./test-lib.sh
      +
      +test_setup_reset () {
     -+	test_create_repo reset_$1 &&
     ++	git init reset_$1 &&
      +	(
      +		cd reset_$1 &&
      +		test_commit init &&
     @@ t/t2500-untracked-overwriting.sh (new)
      +
      +		test_must_fail git reset --merge work 2>error &&
      +		test_cmp expect foo.t/file &&
     -+		grep "Updating.*foo.t.*would lose untracked files" error
     ++		grep "Updating .foo.t. would lose untracked files" error
      +	)
      +'
      +
     @@ t/t2500-untracked-overwriting.sh (new)
      +'
      +
      +test_setup_checkout_m () {
     -+	test_create_repo checkout &&
     ++	git init checkout &&
      +	(
      +		cd checkout &&
      +		test_commit init &&
     @@ t/t2500-untracked-overwriting.sh (new)
      +'
      +
      +test_setup_sequencing () {
     -+	test_create_repo sequencing_$1 &&
     ++	git init sequencing_$1 &&
      +	(
      +		cd sequencing_$1 &&
      +		test_commit init &&
 2:  45bd05a945f ! 2:  b77692b8f49 Split unpack_trees 'reset' flag into two for untracked handling
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    Split unpack_trees 'reset' flag into two for untracked handling
     +    Change unpack_trees' 'reset' flag into an enum
      
          Traditionally, unpack_trees_options->reset was used to signal that it
          was okay to delete any untracked files in the way.  This was used by
          `git read-tree --reset`, but then started appearing in other places as
          well.  However, many of the other uses should not be deleting untracked
     -    files in the way.  Split this into two separate fields:
     -       reset_nuke_untracked
     -       reset_keep_untracked
     -    and, since many code paths in unpack_trees need to be followed for both
     -    of these flags, introduce a third one for convenience:
     -       reset_either
     -    which is simply an or-ing of the other two.
     +    files in the way.  Change this value to an enum so that a value of 1
     +    (i.e. "true") can be split into two:
     +       UNPACK_RESET_PROTECT_UNTRACKED,
     +       UNPACK_RESET_OVERWRITE_UNTRACKED
     +    In order to catch accidental misuses, define with the enum a special
     +    value of
     +       UNPACK_RESET_INVALID = 1
     +    which will trigger a BUG().
      
          Modify existing callers so that
             read-tree --reset
             reset --hard
             checkout --force
     -    continue using reset_nuke_untracked, but so that other callers,
     -    including
     +    continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other
     +    callers, including
             am
             checkout without --force
             stash  (though currently dead code; reset always had a value of 0)
             numerous callers from rebase/sequencer to reset_head()
     -    will use the new reset_keep_untracked field.
     +    will use the new UNPACK_RESET_PROTECT_UNTRACKED value.
     +
     +    In order to protect untracked files but still allow deleting of ignored
     +    files, we also have to setup unpack_trees_opt.dir.  It may make sense to
     +    set up unpack_trees_opt.dir in more cases, but here I tried to only do
     +    so in cases where we switched from deleting all untracked files to
     +    avoiding doing so (i.e. where we now use
     +    UNPACK_RESET_PROTECT_UNTRACKED).
     +
     +    Also, note that 'git checkout <pathspec>' currently also allows
     +    overwriting untracked files.  That case should also be fixed, but it
     +    does not use unpack_trees() and thus is outside the scope of the current
     +    changes.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote,
       	opts.update = 1;
       	opts.merge = 1;
      -	opts.reset = reset;
     -+	opts.reset_keep_untracked = reset;
     ++	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
       	opts.fn = twoway_merge;
     -+	/* Setup opts.dir so that ignored files in the way get overwritten */
     -+	opts.dir = xcalloc(1, sizeof(*opts.dir));
     -+	opts.dir->flags |= DIR_SHOW_IGNORED;
     -+	setup_standard_excludes(opts.dir);
     ++	if (opts.reset) {
     ++		/* Allow ignored files in the way to get overwritten */
     ++		opts.dir = xcalloc(1, sizeof(*opts.dir));
     ++		opts.dir->flags |= DIR_SHOW_IGNORED;
     ++		setup_standard_excludes(opts.dir);
     ++	}
       	init_tree_desc(&t[0], head->buffer, head->size);
       	init_tree_desc(&t[1], remote->buffer, remote->size);
       
     +@@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
     + 		return -1;
     + 	}
     + 
     ++	if (opts.reset) {
     ++		dir_clear(opts.dir);
     ++		FREE_AND_NULL(opts.dir);
     ++	}
     ++
     + 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
     + 		die(_("unable to write new index file"));
     + 
      
       ## builtin/checkout.c ##
      @@ builtin/checkout.c: static int reset_tree(struct tree *tree, const struct checkout_opts *o,
     + {
     + 	struct unpack_trees_options opts;
     + 	struct tree_desc tree_desc;
     ++	int unpack_trees_ret;
     + 
     + 	memset(&opts, 0, sizeof(opts));
       	opts.head_idx = -1;
       	opts.update = worktree;
       	opts.skip_unmerged = !worktree;
      -	opts.reset = 1;
     -+	if (o->force)
     -+		opts.reset_nuke_untracked = 1;
     -+	else
     -+		opts.reset_keep_untracked = 1;
     ++	opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
     ++				UNPACK_RESET_PROTECT_UNTRACKED;
       	opts.merge = 1;
       	opts.fn = oneway_merge;
       	opts.verbose_update = o->show_progress;
     @@ builtin/checkout.c: static int reset_tree(struct tree *tree, const struct checko
       	init_checkout_metadata(&opts.meta, info->refname,
       			       info->commit ? &info->commit->object.oid : null_oid(),
       			       NULL);
     + 	parse_tree(tree);
     + 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
     +-	switch (unpack_trees(1, &tree_desc, &opts)) {
     ++	unpack_trees_ret = unpack_trees(1, &tree_desc, &opts);
     ++
     ++	if (o->overwrite_ignore) {
     ++		dir_clear(opts.dir);
     ++		FREE_AND_NULL(opts.dir);
     ++	}
     ++
     ++	switch (unpack_trees_ret) {
     + 	case -2:
     + 		*writeout_error = 1;
     + 		/*
      
       ## builtin/read-tree.c ##
      @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
     - 			 N_("3-way merge if no file level merging required")),
     - 		OPT_BOOL(0, "aggressive", &opts.aggressive,
     - 			 N_("3-way merge in presence of adds and removes")),
     --		OPT_BOOL(0, "reset", &opts.reset,
     -+		OPT_BOOL(0, "reset", &opts.reset_keep_untracked,
     - 			 N_("same as -m, but discard unmerged entries")),
     - 		{ OPTION_STRING, 0, "prefix", &opts.prefix, N_("<subdirectory>/"),
     - 		  N_("read the tree into the index under <subdirectory>/"),
     -@@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
     - 	opts.head_idx = -1;
     - 	opts.src_index = &the_index;
     - 	opts.dst_index = &the_index;
     -+	if (opts.reset_keep_untracked) {
     -+		opts.dir = xcalloc(1, sizeof(*opts.dir));
     -+		opts.dir->flags |= DIR_SHOW_IGNORED;
     -+		setup_standard_excludes(opts.dir);
     -+	}
     - 
     - 	git_config(git_read_tree_config, NULL);
     - 
     -@@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
     - 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
     - 
     - 	prefix_set = opts.prefix ? 1 : 0;
     --	if (1 < opts.merge + opts.reset + prefix_set)
     -+	if (1 < opts.merge + opts.reset_keep_untracked + prefix_set)
     + 	if (1 < opts.merge + opts.reset + prefix_set)
       		die("Which one? -m, --reset, or --prefix?");
       
     ++	if (opts.reset)
     ++		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
     ++
       	/*
     -@@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
     - 	 * mode.
     - 	 */
     - 
     --	if (opts.reset || opts.merge || opts.prefix) {
     -+	if (opts.reset_keep_untracked || opts.merge || opts.prefix) {
     - 		if (read_cache_unmerged() && (opts.prefix || opts.merge))
     - 			die(_("You need to resolve your current index first"));
     - 		stage = opts.merge = 1;
     + 	 * NEEDSWORK
     + 	 *
      
       ## builtin/reset.c ##
      @@
     @@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id
       	case HARD:
       		opts.update = 1;
      -		/* fallthrough */
     -+		opts.reset_nuke_untracked = 1;
     ++		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
      +		break;
      +	case MIXED:
     -+		opts.reset_keep_untracked = 1; /* but opts.update=0, so untracked left alone */
     ++		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
     ++		/* but opts.update=0, so working tree not updated */
      +		break;
       	default:
      -		opts.reset = 1;
      +		BUG("invalid reset_type passed to reset_index");
      +	}
     -+	if (opts.reset_keep_untracked) {
     ++	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
      +		/* Setup opts.dir so we can overwrite ignored files */
      +		opts.dir = xcalloc(1, sizeof(*opts.dir));
      +		opts.dir->flags |= DIR_SHOW_IGNORED;
     @@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id
       	}
       
       	read_cache_unmerged();
     +@@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id *oid, int reset_t
     + 	ret = 0;
     + 
     + out:
     ++	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
     ++		dir_clear(opts.dir);
     ++		FREE_AND_NULL(opts.dir);
     ++	}
     + 	for (i = 0; i < nr; i++)
     + 		free((void *)desc[i].buffer);
     + 	return ret;
      
       ## builtin/stash.c ##
     +@@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int reset)
     + 	struct tree_desc t[MAX_UNPACK_TREES];
     + 	struct tree *tree;
     + 	struct lock_file lock_file = LOCK_INIT;
     ++	int unpack_trees_ret;
     + 
     + 	read_cache_preload(NULL);
     + 	if (refresh_cache(REFRESH_QUIET))
      @@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int reset)
       	opts.src_index = &the_index;
       	opts.dst_index = &the_index;
       	opts.merge = 1;
      -	opts.reset = reset;
     -+	opts.reset_keep_untracked = reset;
     ++	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
     ++	if (opts.reset) {
     ++		opts.dir = xcalloc(1, sizeof(*opts.dir));
     ++		opts.dir->flags |= DIR_SHOW_IGNORED;
     ++		setup_standard_excludes(opts.dir);
     ++	}
       	opts.update = update;
       	opts.fn = oneway_merge;
       
     +-	if (unpack_trees(nr_trees, t, &opts))
     ++	unpack_trees_ret = unpack_trees(nr_trees, t, &opts);
     ++
     ++	if (opts.reset) {
     ++		dir_clear(opts.dir);
     ++		FREE_AND_NULL(opts.dir);
     ++	}
     ++
     ++	if (unpack_trees_ret)
     + 		return -1;
     + 
     + 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
      
       ## reset.c ##
      @@
     @@ reset.c: int reset_head(struct repository *r, struct object_id *oid, const char
      -	if (!detach_head)
      -		unpack_tree_opts.reset = 1;
      +	if (!detach_head) {
     -+		unpack_tree_opts.reset_keep_untracked = 1;
     ++		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
      +		unpack_tree_opts.dir = xcalloc(1, sizeof(*unpack_tree_opts.dir));
      +		unpack_tree_opts.dir->flags |= DIR_SHOW_IGNORED;
      +		setup_standard_excludes(unpack_tree_opts.dir);
     @@ reset.c: int reset_head(struct repository *r, struct object_id *oid, const char
       
       	if (repo_read_index_unmerged(r) < 0) {
       		ret = error(_("could not read index"));
     -
     - ## t/t1013-read-tree-submodule.sh ##
     -@@ t/t1013-read-tree-submodule.sh: KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
     - 
     - test_submodule_switch_recursing_with_args "read-tree -u -m"
     - 
     --test_submodule_forced_switch_recursing_with_args "read-tree -u --reset"
     -+test_submodule_switch_recursing_with_args "read-tree -u --reset"
     +@@ reset.c: reset_head_refs:
     + 			    oid_to_hex(oid), "1", NULL);
       
     - test_submodule_switch "read-tree -u -m"
     - 
     --test_submodule_forced_switch "read-tree -u --reset"
     -+test_submodule_switch "read-tree -u --reset"
     - 
     - test_done
     + leave_reset_head:
     ++	if (unpack_tree_opts.dir) {
     ++		dir_clear(unpack_tree_opts.dir);
     ++		FREE_AND_NULL(unpack_tree_opts.dir);
     ++	}
     + 	strbuf_release(&msg);
     + 	rollback_lock_file(&lock);
     + 	clear_unpack_trees_porcelain(&unpack_tree_opts);
      
       ## t/t2500-untracked-overwriting.sh ##
      @@ t/t2500-untracked-overwriting.sh: test_setup_checkout_m () {
     @@ t/t2500-untracked-overwriting.sh: test_expect_failure 'git rebase --abort and un
       		cd sequencing_rebase_fast_forward_and_untracked &&
      
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static int check_submodule_move_head(const struct cache_entry *ce,
     - 	if (!sub)
     - 		return 0;
     - 
     --	if (o->reset)
     -+	if (o->reset_nuke_untracked)
     - 		flags |= SUBMODULE_MOVE_HEAD_FORCE;
     - 
     - 	if (submodule_move_head(ce->name, old_id, new_id, flags))
      @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
     - 	if (len > MAX_UNPACK_TREES)
     - 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
     + 	struct pattern_list pl;
     + 	int free_pattern_list = 0;
       
     -+	if (o->reset_nuke_untracked && o->reset_keep_untracked)
     -+		BUG("reset_nuke_untracked and reset_keep_untracked are incompatible");
     ++	if (o->reset == UNPACK_RESET_INVALID)
     ++		BUG("o->reset had a value of 1; should be UNPACK_RESET_*_UNTRACKED");
      +
     -+	o->reset_either = 0;
     -+	if (o->reset_nuke_untracked || o->reset_keep_untracked)
     -+		o->reset_either = 1;
     -+
     - 	trace_performance_enter();
     - 	trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
     - 
     -@@ unpack-trees.c: static int verify_uptodate_1(const struct cache_entry *ce,
     - 	 */
     - 	if ((ce->ce_flags & CE_VALID) || ce_skip_worktree(ce))
     - 		; /* keep checking */
     --	else if (o->reset || ce_uptodate(ce))
     -+	else if (o->reset_either || ce_uptodate(ce))
     - 		return 0;
     + 	if (len > MAX_UNPACK_TREES)
     + 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
       
     - 	if (!lstat(ce->name, &st)) {
      @@ unpack-trees.c: static int verify_absent_1(const struct cache_entry *ce,
       	int len;
       	struct stat st;
       
      -	if (o->index_only || o->reset || !o->update)
     -+	if (o->index_only || o->reset_nuke_untracked || !o->update)
     ++	if (o->index_only || !o->update ||
     ++	    o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED)
       		return 0;
       
       	len = check_leading_path(ce->name, ce_namelen(ce), 0);
     -@@ unpack-trees.c: int twoway_merge(const struct cache_entry * const *src,
     - 
     - 	if (current) {
     - 		if (current->ce_flags & CE_CONFLICTED) {
     --			if (same(oldtree, newtree) || o->reset) {
     -+			if (same(oldtree, newtree) || o->reset_either) {
     - 				if (!newtree)
     - 					return deleted_entry(current, current, o);
     - 				else
     -@@ unpack-trees.c: int oneway_merge(const struct cache_entry * const *src,
     - 
     - 	if (old && same(old, a)) {
     - 		int update = 0;
     --		if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
     -+		if (o->reset_either && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
     - 			!(old->ce_flags & CE_FSMONITOR_VALID)) {
     - 			struct stat st;
     - 			if (lstat(old->name, &st) ||
      
       ## unpack-trees.h ##
      @@ unpack-trees.h: void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
     +  */
       void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
       
     ++enum unpack_trees_reset_type {
     ++	UNPACK_RESET_NONE = 0,    /* traditional "false" value; still valid */
     ++	UNPACK_RESET_INVALID = 1, /* "true" no longer valid; use below values */
     ++	UNPACK_RESET_PROTECT_UNTRACKED,
     ++	UNPACK_RESET_OVERWRITE_UNTRACKED
     ++};
     ++
       struct unpack_trees_options {
      -	unsigned int reset,
     -+	unsigned int reset_nuke_untracked,
     -+		     reset_keep_untracked,
     -+		     reset_either, /* internal use only */
     - 		     merge,
     +-		     merge,
     ++	unsigned int merge,
       		     update,
       		     clone,
     + 		     index_only,
     +@@ unpack-trees.h: struct unpack_trees_options {
     + 		     exiting_early,
     + 		     show_all_errors,
     + 		     dry_run;
     ++	enum unpack_trees_reset_type reset;
     + 	const char *prefix;
     + 	int cache_bottom;
     + 	struct dir_struct *dir;
 3:  a69117a1c9e = 3:  208f3b3ebe5 unpack-trees: avoid nuking untracked dir in way of unmerged file
 4:  01bf850bb0f ! 4:  0a0997d081b unpack-trees: avoid nuking untracked dir in way of locally deleted file
     @@ unpack-trees.c: static int deleted_entry(const struct cache_entry *ce,
       		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
       			return -1;
       		return 0;
     -+	} else {
     -+		if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
     -+			return -1;
     ++	} else if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o)) {
     ++		return -1;
       	}
      +
       	if (!(old->ce_flags & CE_CONFLICTED) && verify_uptodate(old, o))
 5:  60c5d6b4615 = 5:  4b78a526d2a Comment important codepaths regarding nuking untracked files/dirs
 6:  6ea23d165cf = 6:  993451a8036 Documentation: call out commands that nuke untracked files/directories

-- 
gitgitgadget

* [PATCH v2 1/6] t2500: add various tests for nuking untracked files
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
@ 2021-09-24  6:37   ` Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Noting that unpack_trees treats reset=1 & update=1 as license to nuke
untracked files, I looked for code paths that use this combination and
tried to generate testcases which demonstrated unintentional loss of
untracked files and directories.  I found several.

I also include testcases for `git reset --{hard,merge,keep}`.  A hard
reset is perhaps the most direct test of unpack_trees' reset=1 behavior,
but we cannot make `git reset --hard` preserve untracked files without
some migration work.

Also, the two commands `checkout --force` (because of the --force) and
`read-tree --reset` (because it's plumbing and we need to keep it
backward compatible) were left out as we expect those to continue
removing untracked files and directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100755 t/t2500-untracked-overwriting.sh

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
new file mode 100755
index 00000000000..2412d121ea8
--- /dev/null
+++ b/t/t2500-untracked-overwriting.sh
@@ -0,0 +1,244 @@
+#!/bin/sh
+
+test_description='Test handling of overwriting untracked files'
+
+. ./test-lib.sh
+
+test_setup_reset () {
+	git init reset_$1 &&
+	(
+		cd reset_$1 &&
+		test_commit init &&
+
+		git branch stable &&
+		git branch work &&
+
+		git checkout work &&
+		test_commit foo &&
+
+		git checkout stable
+	)
+}
+
+test_expect_success 'reset --hard will nuke untracked files/dirs' '
+	test_setup_reset hard &&
+	(
+		cd reset_hard &&
+		git ls-tree -r stable &&
+		git log --all --name-status --oneline &&
+		git ls-tree -r work &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		echo foo >expect &&
+
+		git reset --hard work &&
+
+		# check that untracked directory foo.t/ was nuked
+		test_path_is_file foo.t &&
+		test_cmp expect foo.t
+	)
+'
+
+test_expect_success 'reset --merge will preserve untracked files/dirs' '
+	test_setup_reset merge &&
+	(
+		cd reset_merge &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --merge work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating .foo.t. would lose untracked files" error
+	)
+'
+
+test_expect_success 'reset --keep will preserve untracked files/dirs' '
+	test_setup_reset keep &&
+	(
+		cd reset_keep &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --keep work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating.*foo.t.*would lose untracked files" error
+	)
+'
+
+test_setup_checkout_m () {
+	git init checkout &&
+	(
+		cd checkout &&
+		test_commit init &&
+
+		test_write_lines file has some >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		git branch stable &&
+
+		git switch -c work &&
+		echo stuff >notes.txt &&
+		test_write_lines file has some words >filler &&
+		git add notes.txt filler &&
+		git commit -m filler &&
+
+		git checkout stable
+	)
+}
+
+test_expect_failure 'checkout -m does not nuke untracked file' '
+	test_setup_checkout_m &&
+	(
+		cd checkout &&
+
+		# Tweak filler
+		test_write_lines this file has some >filler &&
+		# Make an untracked file, save its contents in "expect"
+		echo precious >notes.txt &&
+		cp notes.txt expect &&
+
+		test_must_fail git checkout -m work &&
+		test_cmp expect notes.txt
+	)
+'
+
+test_setup_sequencing () {
+	git init sequencing_$1 &&
+	(
+		cd sequencing_$1 &&
+		test_commit init &&
+
+		test_write_lines this file has some words >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		mkdir -p foo/bar &&
+		test_commit foo/bar/baz &&
+
+		git branch simple &&
+		git branch fooey &&
+
+		git checkout fooey &&
+		git rm foo/bar/baz.t &&
+		echo stuff >>filler &&
+		git add -u &&
+		git commit -m "changes" &&
+
+		git checkout simple &&
+		echo items >>filler &&
+		echo newstuff >>newfile &&
+		git add filler newfile &&
+		git commit -m another
+	)
+}
+
+test_expect_failure 'git rebase --abort and untracked files' '
+	test_setup_sequencing rebase_abort_and_untracked &&
+	(
+		cd sequencing_rebase_abort_and_untracked &&
+		git checkout fooey &&
+		test_must_fail git rebase simple &&
+
+		cat init.t &&
+		git rm init.t &&
+		echo precious >init.t &&
+		cp init.t expect &&
+		git status --porcelain &&
+		test_must_fail git rebase --abort &&
+		test_cmp expect init.t
+	)
+'
+
+test_expect_failure 'git rebase fast forwarding and untracked files' '
+	test_setup_sequencing rebase_fast_forward_and_untracked &&
+	(
+		cd sequencing_rebase_fast_forward_and_untracked &&
+		git checkout init &&
+		echo precious >filler &&
+		cp filler expect &&
+		test_must_fail git rebase init simple &&
+		test_cmp expect filler
+	)
+'
+
+test_expect_failure 'git rebase --autostash and untracked files' '
+	test_setup_sequencing rebase_autostash_and_untracked &&
+	(
+		cd sequencing_rebase_autostash_and_untracked &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git rebase --autostash init &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git stash and untracked files' '
+	test_setup_sequencing stash_and_untracked_files &&
+	(
+		cd sequencing_stash_and_untracked_files &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git status --porcelain &&
+		git stash push &&
+		git status --porcelain &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+	test_setup_sequencing am_abort_and_untracked &&
+	(
+		cd sequencing_am_abort_and_untracked &&
+		git format-patch -1 --stdout fooey >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete the conflicted file; we will stage and commit it later
+		rm filler &&
+
+		# Put an unrelated untracked directory there
+		mkdir filler &&
+		echo foo >filler/file1 &&
+		echo bar >filler/file2 &&
+
+		test_must_fail git am --abort 2>errors &&
+		test_path_is_dir filler &&
+		grep "Updating .filler. would lose untracked files in it" errors
+	)
+'
+
+test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+	test_setup_sequencing am_skip_and_untracked &&
+	(
+		cd sequencing_am_skip_and_untracked &&
+		git checkout fooey &&
+		git format-patch -1 --stdout simple >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete newfile
+		rm newfile &&
+
+		# Put an unrelated untracked directory there
+		mkdir newfile &&
+		echo foo >newfile/file1 &&
+		echo bar >newfile/file2 &&
+
+		# Change our mind about resolutions, just skip this patch
+		test_must_fail git am --skip 2>errors &&
+		test_path_is_dir newfile &&
+		grep "Updating .newfile. would lose untracked files in it" errors
+	)
+'
+
+test_done
-- 
gitgitgadget


* [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
@ 2021-09-24  6:37   ` Elijah Newren via GitGitGadget
  2021-09-24 17:35     ` Junio C Hamano
  2021-09-24  6:37   ` [PATCH v2 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way.  This was used by
`git read-tree --reset`, but then started appearing in other places as
well.  However, many of the other uses should not be deleting untracked
files in the way.  Change this value to an enum so that a value of 1
(i.e. "true") can be split into two:
   UNPACK_RESET_PROTECT_UNTRACKED,
   UNPACK_RESET_OVERWRITE_UNTRACKED
In order to catch accidental misuses, define with the enum a special
value of
   UNPACK_RESET_INVALID = 1
which will trigger a BUG().

Modify existing callers so that
   read-tree --reset
   reset --hard
   checkout --force
continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other
callers, including
   am
   checkout without --force
   stash  (though currently dead code; reset always had a value of 0)
   numerous callers from rebase/sequencer to reset_head()
will use the new UNPACK_RESET_PROTECT_UNTRACKED value.

In order to protect untracked files but still allow deleting of ignored
files, we also have to set up unpack_trees_options->dir.  It may make
sense to set up unpack_trees_options->dir in more cases, but here I tried
to only do so in cases where we switched from deleting all untracked
files to avoiding doing so (i.e. where we now use
UNPACK_RESET_PROTECT_UNTRACKED).

Also, note that 'git checkout <pathspec>' currently also allows
overwriting untracked files.  That case should also be fixed, but it
does not use unpack_trees() and thus is outside the scope of the current
changes.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
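Reviewer aid, not part of the commit: a minimal, self-contained sketch of
how the early return in verify_absent_1() changes with this patch.  The
enum mirrors the one added to unpack-trees.h below; the two helper
functions and their plain int parameters are hypothetical names made up
for illustration and do not exist in git.

```c
#include <assert.h>

/* Mirrors the enum this patch adds to unpack-trees.h. */
enum unpack_trees_reset_type {
	UNPACK_RESET_NONE = 0,    /* traditional "false" value */
	UNPACK_RESET_INVALID = 1, /* old "true"; no longer accepted */
	UNPACK_RESET_PROTECT_UNTRACKED,
	UNPACK_RESET_OVERWRITE_UNTRACKED
};

/*
 * Hypothetical helper: the old condition.  Any nonzero 'reset'
 * skipped the "is something untracked in the way?" check.
 */
int skips_untracked_check_old(int index_only, int update, int reset)
{
	return index_only || reset || !update;
}

/*
 * Hypothetical helper: the new condition.  Only the explicit
 * OVERWRITE value skips the check.
 */
int skips_untracked_check_new(int index_only, int update,
			      enum unpack_trees_reset_type reset)
{
	/* Stand-in for the BUG() added at the top of unpack_trees(). */
	assert(reset != UNPACK_RESET_INVALID);

	return index_only || !update ||
	       reset == UNPACK_RESET_OVERWRITE_UNTRACKED;
}
```

With update set and index_only unset, the old condition skipped the check
for every caller that passed reset = 1; under the new condition only
callers that ask for UNPACK_RESET_OVERWRITE_UNTRACKED skip it.
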
 builtin/am.c                     | 13 ++++++++++++-
 builtin/checkout.c               | 18 ++++++++++++++++--
 builtin/read-tree.c              |  3 +++
 builtin/reset.c                  | 20 ++++++++++++++++++--
 builtin/stash.c                  | 17 +++++++++++++++--
 reset.c                          | 13 +++++++++++--
 t/t2500-untracked-overwriting.sh |  6 +++---
 unpack-trees.c                   |  6 +++++-
 unpack-trees.h                   | 11 +++++++++--
 9 files changed, 92 insertions(+), 15 deletions(-)
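
Another reviewer aid, again not part of the commit: a toy decision table
for a path that is in the way of a tracked file, based only on what the
commit message and cover letter say about the reset value and the dir
field.  It assumes update is set and index_only is not.  All names here
(reset_mode, path_kind, may_remove_path_in_way) are hypothetical and do
not exist in git.

```c
#include <assert.h>

/* Toy stand-ins for the two valid "reset" behaviours. */
enum reset_mode { PROTECT_UNTRACKED, OVERWRITE_UNTRACKED };

/* Toy classification of the path that is in the way. */
enum path_kind { PATH_UNTRACKED, PATH_IGNORED };

/*
 * Returns 1 if the path in the way may be removed, 0 if the
 * operation should refuse.  'have_dir' models whether the dir
 * field was set up with the standard exclude rules.
 */
int may_remove_path_in_way(enum reset_mode mode, int have_dir,
			   enum path_kind kind)
{
	assert(kind == PATH_UNTRACKED || kind == PATH_IGNORED);

	/* The overwrite mode skips the check entirely. */
	if (mode == OVERWRITE_UNTRACKED)
		return 1;

	/* With ignore rules loaded, ignored paths are expendable. */
	if (have_dir && kind == PATH_IGNORED)
		return 1;

	/* Otherwise the path is protected (ignored counts as precious). */
	return 0;
}
```

The have_dir = 0 row is why the callers switched to
UNPACK_RESET_PROTECT_UNTRACKED also allocate and populate the dir field:
without it, ignored files would become precious as a side effect.
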

diff --git a/builtin/am.c b/builtin/am.c
index c79e0167e98..b17baa67ad8 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1918,8 +1918,14 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.dst_index = &the_index;
 	opts.update = 1;
 	opts.merge = 1;
-	opts.reset = reset;
+	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
 	opts.fn = twoway_merge;
+	if (opts.reset) {
+		/* Allow ignored files in the way to get overwritten */
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
+	}
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
 
@@ -1928,6 +1934,11 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 		return -1;
 	}
 
+	if (opts.reset) {
+		dir_clear(opts.dir);
+		FREE_AND_NULL(opts.dir);
+	}
+
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
diff --git a/builtin/checkout.c b/builtin/checkout.c
index b5d477919a7..52826e0d145 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -641,23 +641,37 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 {
 	struct unpack_trees_options opts;
 	struct tree_desc tree_desc;
+	int unpack_trees_ret;
 
 	memset(&opts, 0, sizeof(opts));
 	opts.head_idx = -1;
 	opts.update = worktree;
 	opts.skip_unmerged = !worktree;
-	opts.reset = 1;
+	opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
+				UNPACK_RESET_PROTECT_UNTRACKED;
 	opts.merge = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = o->show_progress;
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
+	if (o->overwrite_ignore) {
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
+	}
 	init_checkout_metadata(&opts.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
 	parse_tree(tree);
 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
-	switch (unpack_trees(1, &tree_desc, &opts)) {
+	unpack_trees_ret = unpack_trees(1, &tree_desc, &opts);
+
+	if (o->overwrite_ignore) {
+		dir_clear(opts.dir);
+		FREE_AND_NULL(opts.dir);
+	}
+
+	switch (unpack_trees_ret) {
 	case -2:
 		*writeout_error = 1;
 		/*
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 485e7b04794..740fc0335af 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -174,6 +174,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (1 < opts.merge + opts.reset + prefix_set)
 		die("Which one? -m, --reset, or --prefix?");
 
+	if (opts.reset)
+		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+
 	/*
 	 * NEEDSWORK
 	 *
diff --git a/builtin/reset.c b/builtin/reset.c
index 43e855cb887..a12ee986e9f 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -10,6 +10,7 @@
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
 #include "lockfile.h"
 #include "tag.h"
 #include "object.h"
@@ -70,9 +71,20 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 		break;
 	case HARD:
 		opts.update = 1;
-		/* fallthrough */
+		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+		break;
+	case MIXED:
+		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		/* but opts.update=0, so working tree not updated */
+		break;
 	default:
-		opts.reset = 1;
+		BUG("invalid reset_type passed to reset_index");
+	}
+	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
+		/* Set up opts.dir so we can overwrite ignored files */
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
 	}
 
 	read_cache_unmerged();
@@ -104,6 +116,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	ret = 0;
 
 out:
+	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
+		dir_clear(opts.dir);
+		FREE_AND_NULL(opts.dir);
+	}
 	for (i = 0; i < nr; i++)
 		free((void *)desc[i].buffer);
 	return ret;
diff --git a/builtin/stash.c b/builtin/stash.c
index 8f42360ca91..563f590afbd 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -237,6 +237,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	struct tree_desc t[MAX_UNPACK_TREES];
 	struct tree *tree;
 	struct lock_file lock_file = LOCK_INIT;
+	int unpack_trees_ret;
 
 	read_cache_preload(NULL);
 	if (refresh_cache(REFRESH_QUIET))
@@ -256,11 +257,23 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
 	opts.merge = 1;
-	opts.reset = reset;
+	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
+	if (opts.reset) {
+		opts.dir = xcalloc(1, sizeof(*opts.dir));
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
+	}
 	opts.update = update;
 	opts.fn = oneway_merge;
 
-	if (unpack_trees(nr_trees, t, &opts))
+	unpack_trees_ret = unpack_trees(nr_trees, t, &opts);
+
+	if (opts.reset) {
+		dir_clear(opts.dir);
+		FREE_AND_NULL(opts.dir);
+	}
+
+	if (unpack_trees_ret)
 		return -1;
 
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
diff --git a/reset.c b/reset.c
index 79310ae071b..1695f3828c5 100644
--- a/reset.c
+++ b/reset.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "cache-tree.h"
+#include "dir.h"
 #include "lockfile.h"
 #include "refs.h"
 #include "reset.h"
@@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
-	if (!detach_head)
-		unpack_tree_opts.reset = 1;
+	if (!detach_head) {
+		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		unpack_tree_opts.dir = xcalloc(1, sizeof(*unpack_tree_opts.dir));
+		unpack_tree_opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(unpack_tree_opts.dir);
+	}
 
 	if (repo_read_index_unmerged(r) < 0) {
 		ret = error(_("could not read index"));
@@ -131,6 +136,10 @@ reset_head_refs:
 			    oid_to_hex(oid), "1", NULL);
 
 leave_reset_head:
+	if (unpack_tree_opts.dir) {
+		dir_clear(unpack_tree_opts.dir);
+		FREE_AND_NULL(unpack_tree_opts.dir);
+	}
 	strbuf_release(&msg);
 	rollback_lock_file(&lock);
 	clear_unpack_trees_porcelain(&unpack_tree_opts);
diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 2412d121ea8..18604360df8 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -92,7 +92,7 @@ test_setup_checkout_m () {
 	)
 }
 
-test_expect_failure 'checkout -m does not nuke untracked file' '
+test_expect_success 'checkout -m does not nuke untracked file' '
 	test_setup_checkout_m &&
 	(
 		cd checkout &&
@@ -138,7 +138,7 @@ test_setup_sequencing () {
 	)
 }
 
-test_expect_failure 'git rebase --abort and untracked files' '
+test_expect_success 'git rebase --abort and untracked files' '
 	test_setup_sequencing rebase_abort_and_untracked &&
 	(
 		cd sequencing_rebase_abort_and_untracked &&
@@ -155,7 +155,7 @@ test_expect_failure 'git rebase --abort and untracked files' '
 	)
 '
 
-test_expect_failure 'git rebase fast forwarding and untracked files' '
+test_expect_success 'git rebase fast forwarding and untracked files' '
 	test_setup_sequencing rebase_fast_forward_and_untracked &&
 	(
 		cd sequencing_rebase_fast_forward_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 5786645f315..fcbe63bbed9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1693,6 +1693,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	struct pattern_list pl;
 	int free_pattern_list = 0;
 
+	if (o->reset == UNPACK_RESET_INVALID)
+		BUG("o->reset had a value of 1; should be UNPACK_RESET_*_UNTRACKED");
+
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
@@ -2218,7 +2221,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 	int len;
 	struct stat st;
 
-	if (o->index_only || o->reset || !o->update)
+	if (o->index_only || !o->update ||
+	    o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED)
 		return 0;
 
 	len = check_leading_path(ce->name, ce_namelen(ce), 0);
diff --git a/unpack-trees.h b/unpack-trees.h
index 2d88b19dca7..1f386fb16cc 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -45,9 +45,15 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
  */
 void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
 
+enum unpack_trees_reset_type {
+	UNPACK_RESET_NONE = 0,    /* traditional "false" value; still valid */
+	UNPACK_RESET_INVALID = 1, /* "true" no longer valid; use below values */
+	UNPACK_RESET_PROTECT_UNTRACKED,
+	UNPACK_RESET_OVERWRITE_UNTRACKED
+};
+
 struct unpack_trees_options {
-	unsigned int reset,
-		     merge,
+	unsigned int merge,
 		     update,
 		     clone,
 		     index_only,
@@ -64,6 +70,7 @@ struct unpack_trees_options {
 		     exiting_early,
 		     show_all_errors,
 		     dry_run;
+	enum unpack_trees_reset_type reset;
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct *dir;
-- 
gitgitgadget



* [PATCH v2 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
@ 2021-09-24  6:37   ` Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
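Reviewer aid, not part of the commit: a toy model of the two checking
modes this patch introduces, as I read the diff below.  The enum
absent_checking_type is copied from the patch; the on_disk enum and the
function would_lose_untracked() are hypothetical.  The model assumes the
path is neither ignored nor up to date in the index, and that the check
was not skipped earlier by the reset or update settings.

```c
#include <assert.h>

/* Copied from the patch. */
enum absent_checking_type {
	COMPLETELY_ABSENT,
	ABSENT_ANY_DIRECTORY
};

/* Hypothetical: what currently sits at the path in the working tree. */
enum on_disk {
	NOTHING,
	UNTRACKED_FILE,
	DIR_WITH_UNTRACKED_FILES
};

/*
 * Returns -1 if proceeding would lose untracked content, 0 if it
 * is fine to proceed, following the 0 / -1 convention of
 * check_ok_to_remove().
 */
int would_lose_untracked(enum on_disk what, enum absent_checking_type type)
{
	assert(type == COMPLETELY_ABSENT || type == ABSENT_ANY_DIRECTORY);

	if (what == NOTHING)
		return 0;

	/* A directory holding untracked files is a problem in both modes. */
	if (what == DIR_WITH_UNTRACKED_FILES)
		return -1;

	/* A plain file only matters when the path must be wholly absent. */
	return type == COMPLETELY_ABSENT ? -1 : 0;
}
```

In the unmerged case handled in merged_entry(), a plain file at the path
is presumably the conflicted file itself, so only a directory in the way
is treated as something the user could lose.
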
 t/t2500-untracked-overwriting.sh |  2 +-
 unpack-trees.c                   | 35 ++++++++++++++++++++++++++++----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 18604360df8..5ec66058cfc 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -197,7 +197,7 @@ test_expect_failure 'git stash and untracked files' '
 	)
 '
 
-test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	test_setup_sequencing am_abort_and_untracked &&
 	(
 		cd sequencing_am_abort_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index fcbe63bbed9..f7d0088a4fd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2159,9 +2159,15 @@ static int icase_exists(struct unpack_trees_options *o, const char *name, int le
 	return src && !ie_match_stat(o->src_index, src, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
 }
 
+enum absent_checking_type {
+	COMPLETELY_ABSENT,
+	ABSENT_ANY_DIRECTORY
+};
+
 static int check_ok_to_remove(const char *name, int len, int dtype,
 			      const struct cache_entry *ce, struct stat *st,
 			      enum unpack_trees_error_types error_type,
+			      enum absent_checking_type absent_type,
 			      struct unpack_trees_options *o)
 {
 	const struct cache_entry *result;
@@ -2196,6 +2202,10 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 		return 0;
 	}
 
+	/* If we only care about directories, then we can remove */
+	if (absent_type == ABSENT_ANY_DIRECTORY)
+		return 0;
+
 	/*
 	 * The previous round may already have decided to
 	 * delete this path, which is in a subdirectory that
@@ -2216,6 +2226,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
  */
 static int verify_absent_1(const struct cache_entry *ce,
 			   enum unpack_trees_error_types error_type,
+			   enum absent_checking_type absent_type,
 			   struct unpack_trees_options *o)
 {
 	int len;
@@ -2242,7 +2253,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 								NULL, o);
 			else
 				ret = check_ok_to_remove(path, len, DT_UNKNOWN, NULL,
-							 &st, error_type, o);
+							 &st, error_type,
+							 absent_type, o);
 		}
 		free(path);
 		return ret;
@@ -2257,7 +2269,7 @@ static int verify_absent_1(const struct cache_entry *ce,
 
 		return check_ok_to_remove(ce->name, ce_namelen(ce),
 					  ce_to_dtype(ce), ce, &st,
-					  error_type, o);
+					  error_type, absent_type, o);
 	}
 }
 
@@ -2267,14 +2279,23 @@ static int verify_absent(const struct cache_entry *ce,
 {
 	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
+}
+
+static int verify_absent_if_directory(const struct cache_entry *ce,
+				      enum unpack_trees_error_types error_type,
+				      struct unpack_trees_options *o)
+{
+	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+		return 0;
+	return verify_absent_1(ce, error_type, ABSENT_ANY_DIRECTORY, o);
 }
 
 static int verify_absent_sparse(const struct cache_entry *ce,
 				enum unpack_trees_error_types error_type,
 				struct unpack_trees_options *o)
 {
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
 }
 
 static int merged_entry(const struct cache_entry *ce,
@@ -2348,6 +2369,12 @@ static int merged_entry(const struct cache_entry *ce,
 		 * Previously unmerged entry left as an existence
 		 * marker by read_index_unmerged();
 		 */
+		if (verify_absent_if_directory(merge,
+				  ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o)) {
+			discard_cache_entry(merge);
+			return -1;
+		}
+
 		invalidate_ce_path(old, o);
 	}
 
-- 
gitgitgadget



* [PATCH v2 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-09-24  6:37   ` [PATCH v2 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
@ 2021-09-24  6:37   ` Elijah Newren via GitGitGadget
  2021-09-24  6:37   ` [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 2 +-
 unpack-trees.c                   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 5ec66058cfc..5c0bf4d21fc 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -218,7 +218,7 @@ test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	)
 '
 
-test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+test_expect_success 'git am --skip and untracked dir vs deleted file' '
 	test_setup_sequencing am_skip_and_untracked &&
 	(
 		cd sequencing_am_skip_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index f7d0088a4fd..b1e7ee9dfc0 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2392,7 +2392,10 @@ static int deleted_entry(const struct cache_entry *ce,
 		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
 			return -1;
 		return 0;
+	} else if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o)) {
+		return -1;
 	}
+
 	if (!(old->ce_flags & CE_CONFLICTED) && verify_uptodate(old, o))
 		return -1;
 	add_entry(o, ce, CE_REMOVE, 0);
-- 
gitgitgadget



* [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-09-24  6:37   ` [PATCH v2 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
@ 2021-09-24  6:37   ` Elijah Newren via GitGitGadget
  2021-09-24 17:50     ` Eric Sunshine
  2021-09-24  6:37   ` [PATCH v2 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
  6 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other highlevel
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/stash.c             | 1 +
 builtin/submodule--helper.c | 4 ++++
 builtin/worktree.c          | 5 +++++
 contrib/rerere-train.sh     | 2 +-
 submodule.c                 | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 563f590afbd..0d071183eba 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1532,6 +1532,7 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		} else {
 			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.git_cmd = 1;
+			/* BUG: this nukes untracked files in the way */
 			strvec_pushl(&cp.args, "reset", "--hard", "-q",
 				     "--no-recurse-submodules", NULL);
 			if (run_command(&cp)) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e45..a49242d15ae 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2864,6 +2864,10 @@ static int add_submodule(const struct add_data *add_data)
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.dir = add_data->sm_path;
+		/*
+		 * NOTE: we only get here if add_data->force is true, so
+		 * passing --force to checkout is reasonable.
+		 */
 		strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL);
 
 		if (add_data->branch) {
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 0d0a80da61f..383947ff54f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -356,6 +356,11 @@ static int add_worktree(const char *path, const char *refname,
 	if (opts->checkout) {
 		cp.argv = NULL;
 		strvec_clear(&cp.args);
+		/*
+		 * NOTE: reset --hard is okay here, because 'worktree add'
+		 * refuses to work in an extant non-empty directory, so there
+		 * is no risk of deleting untracked files.
+		 */
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
 			strvec_push(&cp.args, "--quiet");
diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index eeee45dd341..75125d6ae00 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -91,7 +91,7 @@ do
 		git checkout -q $commit -- .
 		git rerere
 	fi
-	git reset -q --hard
+	git reset -q --hard  # Might nuke untracked files...
 done
 
 if test -z "$branch"
diff --git a/submodule.c b/submodule.c
index 8e611fe1dbf..a9b71d585cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1866,6 +1866,7 @@ static void submodule_reset_index(const char *path)
 
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     get_super_prefix_or_empty(), path);
+	/* TODO: determine if this might overwrite untracked files */
 	strvec_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
 
 	strvec_push(&cp.args, empty_tree_oid_hex());
-- 
gitgitgadget



* [PATCH v2 6/6] Documentation: call out commands that nuke untracked files/directories
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-09-24  6:37   ` [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
@ 2021-09-24  6:37   ` Elijah Newren via GitGitGadget
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
  6 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-24  6:37 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Some commands have traditionally also removed untracked files (or
directories) that were in the way of a tracked file we needed.  Document
these cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-checkout.txt  | 5 +++--
 Documentation/git-read-tree.txt | 5 +++--
 Documentation/git-reset.txt     | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index b1a6fe44997..d473c9bf387 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -118,8 +118,9 @@ OPTIONS
 -f::
 --force::
 	When switching branches, proceed even if the index or the
-	working tree differs from `HEAD`.  This is used to throw away
-	local changes.
+	working tree differs from `HEAD`, and even if there are untracked
+	files in the way.  This is used to throw away local changes and
+	any untracked files or directories that are in the way.
 +
 When checking out paths from the index, do not fail upon unmerged
 entries; instead, unmerged entries are ignored.
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 5fa8bab64c2..4731ec3283f 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -39,8 +39,9 @@ OPTIONS
 
 --reset::
 	Same as -m, except that unmerged entries are discarded instead
-	of failing. When used with `-u`, updates leading to loss of
-	working tree changes will not abort the operation.
+	of failing.  When used with `-u`, updates leading to loss of
+	working tree changes or untracked files or directories will not
+	abort the operation.
 
 -u::
 	After a successful merge, update the files in the work
diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index 252e2d4e47d..6f7685f53d5 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -69,7 +69,8 @@ linkgit:git-add[1]).
 
 --hard::
 	Resets the index and working tree. Any changes to tracked files in the
-	working tree since `<commit>` are discarded.
+	working tree since `<commit>` are discarded.  Any untracked files or
+	directories in the way of writing any tracked files are simply deleted.
 
 --merge::
 	Resets the index and updates the files in the working tree that are
-- 
gitgitgadget


* Re: [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-18 23:15 ` [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
@ 2021-09-24 11:47   ` Luke Diamand
  2021-09-24 13:41     ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Luke Diamand @ 2021-09-24 11:47 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: Git Users, Ævar Arnfjörð Bjarmason,
	Fedor Biryukov, Elijah Newren

On Sun, 19 Sept 2021 at 00:15, Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> In the last few commits we focused on code in unpack-trees.c that
> mistakenly removed untracked files or directories.  There may be more of
> those, but in this commit we change our focus: callers of toplevel
> commands that are expected to remove untracked files or directories.
>
> As noted previously, we have toplevel commands that are expected to
> delete untracked files such as 'read-tree --reset', 'reset --hard', and
> 'checkout --force'.  However, that does not mean that other highlevel
> commands that happen to call these other commands thought about or
> conveyed to users the possibility that untracked files could be removed.
> Audit the code for such callsites, and add comments near existing
> callsites to mention whether these are safe or not.
>
> My auditing is somewhat incomplete, though; it skipped several cases:
>   * git-rebase--preserve-merges.sh: is in the process of being
>     deprecated/removed, so I won't leave a note that there are
>     likely more bugs in that script.
>   * contrib/git-new-workdir: why is the -f flag being used in a new
>     empty directory??  It shouldn't hurt, but it seems useless.
>   * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
>     not and is just superfluous), but I'm not at all familiar with
>     the p4 stuff

Assuming you're talking about this code in git-p4.py:

            print("Synchronizing p4 checkout...")
            if new_client_dir:
                # old one was destroyed, and maybe nobody told p4
                p4_sync("...", "-f")
            else:
                p4_sync("...")

This is doing a Perforce sync in the P4 repo, not the git repo.

In the usual/happy case, this directory already exists, the Perforce
server knows about its state, and a normal "p4 sync ..." will bring it
up to date.

But, if someone manually deleted the directory then "p4 sync ..." will
only update modified files, and all sorts of things will then go wrong
(e.g. the files we updated in the git view won't be present, and
git-p4 will fall flat on its face).

So in this case, do a forced sync, which syncs everything ignoring the
P4 server's idea of what files are/not present.

Luke


* Re: [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-24 11:47   ` Luke Diamand
@ 2021-09-24 13:41     ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-24 13:41 UTC (permalink / raw)
  To: Luke Diamand
  Cc: Elijah Newren via GitGitGadget, Git Users,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov

On Fri, Sep 24, 2021 at 4:47 AM Luke Diamand <luke@diamand.org> wrote:
>
> On Sun, 19 Sept 2021 at 00:15, Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Elijah Newren <newren@gmail.com>
> >
> > In the last few commits we focused on code in unpack-trees.c that
> > mistakenly removed untracked files or directories.  There may be more of
> > those, but in this commit we change our focus: callers of toplevel
> > commands that are expected to remove untracked files or directories.
> >
> > As noted previously, we have toplevel commands that are expected to
> > delete untracked files such as 'read-tree --reset', 'reset --hard', and
> > 'checkout --force'.  However, that does not mean that other highlevel
> > commands that happen to call these other commands thought about or
> > conveyed to users the possibility that untracked files could be removed.
> > Audit the code for such callsites, and add comments near existing
> > callsites to mention whether these are safe or not.
> >
> > My auditing is somewhat incomplete, though; it skipped several cases:
> >   * git-rebase--preserve-merges.sh: is in the process of being
> >     deprecated/removed, so I won't leave a note that there are
> >     likely more bugs in that script.
> >   * contrib/git-new-workdir: why is the -f flag being used in a new
> >     empty directory??  It shouldn't hurt, but it seems useless.
> >   * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
> >     not and is just superfluous), but I'm not at all familiar with
> >     the p4 stuff
>
> Assuming you're talking about this code in git-p4.py:
>
>             print("Synchronizing p4 checkout...")
>             if new_client_dir:
>                 # old one was destroyed, and maybe nobody told p4
>                 p4_sync("...", "-f")
>             else:
>                 p4_sync("...")

No, I was talking about this code:

                system([ "git", "checkout", "-f" ])


* Re: [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum
  2021-09-24  6:37   ` [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
@ 2021-09-24 17:35     ` Junio C Hamano
  2021-09-26  6:50       ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Junio C Hamano @ 2021-09-24 17:35 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Also, note that 'git checkout <pathspec>' currently also allows
> overwriting untracked files.  That case should also be fixed, ...

I wasted a few minutes wondering about the example.  Please make it
clear that you are checking out of a tree-ish that is different from
HEAD, as there will by definition be no "overwriting untracked" if you
are checking out of the index.

E.g. "git checkout <tree-ish> -- <pathspec>".

With this command line:

   $ git checkout HEAD~24 -- path

where path used to be there as late as 24 revisions ago, but since
then we removed it, and the user wants to materialize the file out of
the old version, path, be it tracked, untracked, or even a
directory, should be made identical to the copy from the given
version, no?  Where does the "should also be fixed" come from?

> diff --git a/builtin/am.c b/builtin/am.c
> index c79e0167e98..b17baa67ad8 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1918,8 +1918,14 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
>  	opts.dst_index = &the_index;
>  	opts.update = 1;
>  	opts.merge = 1;
> -	opts.reset = reset;
> +	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
>  	opts.fn = twoway_merge;
> +	if (opts.reset) {
> +		/* Allow ignored files in the way to get overwritten */
> +		opts.dir = xcalloc(1, sizeof(*opts.dir));
> +		opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(opts.dir);

Do these three lines make a recurring pattern when opts.reset is set?
I am wondering if this can be done more centrally by the unpack-trees
machinery (i.e. "gee this one has o->reset set to X, so let's set up
the o->dir before doing anything").
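
For illustration only (this is not code from the series): a minimal
sketch of what such centralization might look like, assuming stand-in
types.  The names opts_stub, dir_stub, SHOW_IGNORED and
setup_dir_if_needed() are hypothetical; real code would work on struct
unpack_trees_options and call setup_standard_excludes().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for git's real types; hypothetical, for illustration only. */
enum reset_type {
	RESET_NONE = 0,
	RESET_INVALID = 1,
	RESET_PROTECT_UNTRACKED,
	RESET_OVERWRITE_UNTRACKED
};

#define SHOW_IGNORED 1u	/* stand-in for DIR_SHOW_IGNORED */

struct dir_stub {
	unsigned flags;
};

struct opts_stub {
	enum reset_type reset;
	struct dir_stub *dir;
};

/*
 * If the caller asked for a reset but did not prepare o->dir, prepare
 * it here so that ignored files can be recognized.  Returns 1 if this
 * call allocated o->dir, 0 otherwise.
 */
static int setup_dir_if_needed(struct opts_stub *o)
{
	assert(o->reset != RESET_INVALID);	/* mirrors the BUG() check */
	if (o->reset == RESET_NONE || o->dir)
		return 0;
	o->dir = calloc(1, sizeof(*o->dir));
	if (!o->dir)
		abort();
	o->dir->flags |= SHOW_IGNORED;
	/* real code would call setup_standard_excludes(o->dir) here */
	return 1;
}
```

With something like this called at the top of the machinery, callers
would only need to pick the reset value.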

> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index b5d477919a7..52826e0d145 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -641,23 +641,37 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
>  {
>  	struct unpack_trees_options opts;
>  	struct tree_desc tree_desc;
> +	int unpack_trees_ret;
>  
>  	memset(&opts, 0, sizeof(opts));
>  	opts.head_idx = -1;
>  	opts.update = worktree;
>  	opts.skip_unmerged = !worktree;
> -	opts.reset = 1;
> +	opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
> +				UNPACK_RESET_PROTECT_UNTRACKED;
>  	opts.merge = 1;
>  	opts.fn = oneway_merge;
>  	opts.verbose_update = o->show_progress;
>  	opts.src_index = &the_index;
>  	opts.dst_index = &the_index;
> +	if (o->overwrite_ignore) {
> +		opts.dir = xcalloc(1, sizeof(*opts.dir));
> +		opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(opts.dir);
> +	}

If our longer term goal is to decide classification of files not in
the index (currently, "ignored" and "untracked", but we may want to
add a new "precious" class) and (across various commands that build
on the unpack-trees infrastructure) to protect the "untracked" and
"precious" ones, with --[no-]overwrite-{ignore,untracked} options as
escape hatches, uniformly, perhaps the --[no-]overwrite-ignore
option may be stolen from here and shifted to unpack_trees_options to
help us go in that direction?  This is just an observation for
longer term, not a suggestion to include the first step for such a
move in this series.

>  	init_checkout_metadata(&opts.meta, info->refname,
>  			       info->commit ? &info->commit->object.oid : null_oid(),
>  			       NULL);
>  	parse_tree(tree);
>  	init_tree_desc(&tree_desc, tree->buffer, tree->size);
> -	switch (unpack_trees(1, &tree_desc, &opts)) {
> +	unpack_trees_ret = unpack_trees(1, &tree_desc, &opts);
> +
> +	if (o->overwrite_ignore) {
> +		dir_clear(opts.dir);
> +		FREE_AND_NULL(opts.dir);

This dir_clear() is also a recurring theme.  See below.

> +	}
> +
> +	switch (unpack_trees_ret) {
>  	case -2:
>  		*writeout_error = 1;
>  		/*
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index 485e7b04794..740fc0335af 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -174,6 +174,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>  	if (1 < opts.merge + opts.reset + prefix_set)
>  		die("Which one? -m, --reset, or --prefix?");
>  
> +	if (opts.reset)
> +		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
> +

We do not do anything about opts.dir here by default, which means we
by default do not overwrite ignored, but that's OK, because this old
command explicitly takes --exclude-per-directory to tell it what the
command should consider "ignored" and with the option given we do
prepare opts.dir just fine.

> diff --git a/builtin/reset.c b/builtin/reset.c
> index 43e855cb887..a12ee986e9f 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -10,6 +10,7 @@
>  #define USE_THE_INDEX_COMPATIBILITY_MACROS
>  #include "builtin.h"
>  #include "config.h"
> +#include "dir.h"
>  #include "lockfile.h"
>  #include "tag.h"
>  #include "object.h"
> @@ -70,9 +71,20 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>  		break;
>  	case HARD:
>  		opts.update = 1;
> -		/* fallthrough */
> +		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
> +		break;
> +	case MIXED:
> +		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
> +		/* but opts.update=0, so working tree not updated */
> +		break;
>  	default:
> -		opts.reset = 1;
> +		BUG("invalid reset_type passed to reset_index");
> +	}
> +	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {

Unlike the one in "am", this cares about .reset being a particular
value, not just being non-zero.  Puzzling.

It is a bit counter-intuitive in that we do not allow overwriting
ignored files (which is currently a synonym for "expendable") when
.reset is set to allow us to overwrite untracked.

> +		/* Setup opts.dir so we can overwrite ignored files */
> +		opts.dir = xcalloc(1, sizeof(*opts.dir));
> +		opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(opts.dir);

> @@ -104,6 +116,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>  	ret = 0;
>  
>  out:
> +	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
> +		dir_clear(opts.dir);
> +		FREE_AND_NULL(opts.dir);

This dir_clear() is also a recurring theme.  See below.

> diff --git a/builtin/stash.c b/builtin/stash.c
> index 8f42360ca91..563f590afbd 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -237,6 +237,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>  	struct tree_desc t[MAX_UNPACK_TREES];
>  	struct tree *tree;
>  	struct lock_file lock_file = LOCK_INIT;
> +	int unpack_trees_ret;
>  
>  	read_cache_preload(NULL);
>  	if (refresh_cache(REFRESH_QUIET))
> @@ -256,11 +257,23 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>  	opts.src_index = &the_index;
>  	opts.dst_index = &the_index;
>  	opts.merge = 1;
> -	opts.reset = reset;
> +	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
> +	if (opts.reset) {
> +		opts.dir = xcalloc(1, sizeof(*opts.dir));
> +		opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(opts.dir);
> +	}
>  	opts.update = update;
>  	opts.fn = oneway_merge;
>  
> -	if (unpack_trees(nr_trees, t, &opts))
> +	unpack_trees_ret = unpack_trees(nr_trees, t, &opts);
> +
> +	if (opts.reset) {
> +		dir_clear(opts.dir);
> +		FREE_AND_NULL(opts.dir);

This dir_clear() is also a recurring theme.  Why aren't their guards
uniformly "if (opts.dir)"?  The logic to decide if we set up opts.dir
or not may be far from here and may be different from code path to
code path, but the need to clear opts.dir should not have to care
why opts.dir was populated, no?
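
To illustrate the point (a sketch only, with hypothetical stand-in
types rather than git's struct dir_struct, dir_clear() and
FREE_AND_NULL()): cleanup keyed purely on the pointer works the same
no matter which code path decided to populate it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct dir_struct and the options struct. */
struct dir_stub {
	char **entries;
	size_t nr;
};

struct opts_stub {
	struct dir_stub *dir;
};

/* Stand-in for dir_clear(): release whatever the dir owns. */
static void dir_stub_clear(struct dir_stub *dir)
{
	for (size_t i = 0; i < dir->nr; i++)
		free(dir->entries[i]);
	free(dir->entries);
	dir->entries = NULL;
	dir->nr = 0;
}

/*
 * Cleanup guarded only by "was o->dir populated?", not by the reason
 * it was populated.  Safe to call when o->dir is NULL and safe to
 * call twice.
 */
static void release_dir(struct opts_stub *o)
{
	assert(o != NULL);
	if (!o->dir)
		return;
	dir_stub_clear(o->dir);
	free(o->dir);
	o->dir = NULL;	/* the effect FREE_AND_NULL() has in git */
}
```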


> +	}
> +
> +	if (unpack_trees_ret)
>  		return -1;
>  
>  	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
> diff --git a/reset.c b/reset.c
> index 79310ae071b..1695f3828c5 100644
> --- a/reset.c
> +++ b/reset.c
> @@ -1,5 +1,6 @@
>  #include "git-compat-util.h"
>  #include "cache-tree.h"
> +#include "dir.h"
>  #include "lockfile.h"
>  #include "refs.h"
>  #include "reset.h"
> @@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
>  	unpack_tree_opts.update = 1;
>  	unpack_tree_opts.merge = 1;
>  	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
> -	if (!detach_head)
> -		unpack_tree_opts.reset = 1;
> +	if (!detach_head) {
> +		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
> +		unpack_tree_opts.dir = xcalloc(1, sizeof(*unpack_tree_opts.dir));
> +		unpack_tree_opts.dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(unpack_tree_opts.dir);
> +	}
>  
>  	if (repo_read_index_unmerged(r) < 0) {
>  		ret = error(_("could not read index"));
> @@ -131,6 +136,10 @@ reset_head_refs:
>  			    oid_to_hex(oid), "1", NULL);
>  
>  leave_reset_head:
> +	if (unpack_tree_opts.dir) {
> +		dir_clear(unpack_tree_opts.dir);
> +		FREE_AND_NULL(unpack_tree_opts.dir);

Yes, I think this is the right way to decide if we call dir_clear(),
and all other hunks in this patch should do the same.

> +	}
>  	strbuf_release(&msg);
>  	rollback_lock_file(&lock);
>  	clear_unpack_trees_porcelain(&unpack_tree_opts);
> diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
> index 2412d121ea8..18604360df8 100755
> --- a/t/t2500-untracked-overwriting.sh
> +++ b/t/t2500-untracked-overwriting.sh
> @@ -92,7 +92,7 @@ test_setup_checkout_m () {
>  	)
>  }
>  
> -test_expect_failure 'checkout -m does not nuke untracked file' '
> +test_expect_success 'checkout -m does not nuke untracked file' '
>  	test_setup_checkout_m &&
>  	(
>  		cd checkout &&
> @@ -138,7 +138,7 @@ test_setup_sequencing () {
>  	)
>  }
>  
> -test_expect_failure 'git rebase --abort and untracked files' '
> +test_expect_success 'git rebase --abort and untracked files' '
>  	test_setup_sequencing rebase_abort_and_untracked &&
>  	(
>  		cd sequencing_rebase_abort_and_untracked &&
> @@ -155,7 +155,7 @@ test_expect_failure 'git rebase --abort and untracked files' '
>  	)
>  '
>  
> -test_expect_failure 'git rebase fast forwarding and untracked files' '
> +test_expect_success 'git rebase fast forwarding and untracked files' '
>  	test_setup_sequencing rebase_fast_forward_and_untracked &&
>  	(
>  		cd sequencing_rebase_fast_forward_and_untracked &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 5786645f315..fcbe63bbed9 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1693,6 +1693,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  	struct pattern_list pl;
>  	int free_pattern_list = 0;
>  
> +	if (o->reset == UNPACK_RESET_INVALID)
> +		BUG("o->reset had a value of 1; should be UNPACK_RESET_*_UNTRACKED");
> +
>  	if (len > MAX_UNPACK_TREES)
>  		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>  
> @@ -2218,7 +2221,8 @@ static int verify_absent_1(const struct cache_entry *ce,
>  	int len;
>  	struct stat st;
>  
> -	if (o->index_only || o->reset || !o->update)
> +	if (o->index_only || !o->update ||
> +	    o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED)
>  		return 0;
>  
>  	len = check_leading_path(ce->name, ce_namelen(ce), 0);
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 2d88b19dca7..1f386fb16cc 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -45,9 +45,15 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
>   */
>  void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
>  
> +enum unpack_trees_reset_type {
> +	UNPACK_RESET_NONE = 0,    /* traditional "false" value; still valid */
> +	UNPACK_RESET_INVALID = 1, /* "true" no longer valid; use below values */
> +	UNPACK_RESET_PROTECT_UNTRACKED,
> +	UNPACK_RESET_OVERWRITE_UNTRACKED
> +};
> +
>  struct unpack_trees_options {
> -	unsigned int reset,
> -		     merge,
> +	unsigned int merge,
>  		     update,
>  		     clone,
>  		     index_only,
> @@ -64,6 +70,7 @@ struct unpack_trees_options {
>  		     exiting_early,
>  		     show_all_errors,
>  		     dry_run;
> +	enum unpack_trees_reset_type reset;
>  	const char *prefix;
>  	int cache_bottom;
>  	struct dir_struct *dir;


* Re: [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-24  6:37   ` [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
@ 2021-09-24 17:50     ` Eric Sunshine
  2021-09-26  6:35       ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Eric Sunshine @ 2021-09-24 17:50 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: Git List, Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren

On Fri, Sep 24, 2021 at 2:37 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In the last few commits we focused on code in unpack-trees.c that
> mistakenly removed untracked files or directories.  There may be more of
> those, but in this commit we change our focus: callers of toplevel
> commands that are expected to remove untracked files or directories.
>
> As noted previously, we have toplevel commands that are expected to
> delete untracked files such as 'read-tree --reset', 'reset --hard', and
> 'checkout --force'.  However, that does not mean that other highlevel
> commands that happen to call these other commands thought about or
> conveyed to users the possibility that untracked files could be removed.
> Audit the code for such callsites, and add comments near existing
> callsites to mention whether these are safe or not.
> [...]
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -356,6 +356,11 @@ static int add_worktree(const char *path, const char *refname,
> +               /*
> +                * NOTE: reset --hard is okay here, because 'worktree add'
> +                * refuses to work in an extant non-empty directory, so there
> +                * is no risk of deleting untracked files.
> +                */
>                 strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);

I understand that this comment helps you or some other person auditing
similar cases in the future, however, as a standalone comment for a
reader who isn't aware of the intention, it seems more confusing than
illuminating. It also detracts from the important purpose of `--hard`
here, which is that it is necessary in order to get `git reset` to
actually "checkout" the files into the empty directory, so use of
`--hard` is not an accident or carelessness.

These days, we'd probably just use:

    git restore --no-recurse-submodules .

instead (including the final `.`) to achieve the same, and that
wouldn't need any sort of cautionary comment like the one being added
by this patch. So, perhaps that's a better way to go, or maybe it's
outside the scope of this series...


* Re: [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-24 17:50     ` Eric Sunshine
@ 2021-09-26  6:35       ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-26  6:35 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Elijah Newren via GitGitGadget, Git List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood

On Fri, Sep 24, 2021 at 10:50 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, Sep 24, 2021 at 2:37 AM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > In the last few commits we focused on code in unpack-trees.c that
> > mistakenly removed untracked files or directories.  There may be more of
> > those, but in this commit we change our focus: callers of toplevel
> > commands that are expected to remove untracked files or directories.
> >
> > As noted previously, we have toplevel commands that are expected to
> > delete untracked files such as 'read-tree --reset', 'reset --hard', and
> > 'checkout --force'.  However, that does not mean that other highlevel
> > commands that happen to call these other commands thought about or
> > conveyed to users the possibility that untracked files could be removed.
> > Audit the code for such callsites, and add comments near existing
> > callsites to mention whether these are safe or not.
> > [...]
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > @@ -356,6 +356,11 @@ static int add_worktree(const char *path, const char *refname,
> > +               /*
> > +                * NOTE: reset --hard is okay here, because 'worktree add'
> > +                * refuses to work in an extant non-empty directory, so there
> > +                * is no risk of deleting untracked files.
> > +                */
> >                 strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
>
> I understand that this comment helps you or some other person auditing
> similar cases in the future, however, as a standalone comment for a
> reader who isn't aware of the intention, it seems more confusing than
> illuminating. It also detracts from the important purpose of `--hard`
> here, which is that it is necessary in order to get `git reset` to
> actually "checkout" the files into the empty directory, so use of
> `--hard` is not an accident or carelessness.

Fair enough; I'll strike it.

> These days, we'd probably just use:
>
>     git restore --no-recurse-submodules .
>
> instead (including the final `.`) to achieve the same, and that
> wouldn't need any sort of cautionary comment like the one being added
> by this patch. So, perhaps that's a better way to go, or maybe it's
> outside the scope of this series...

Yeah, that'd make sense.  Though it'd make even more sense to get rid
of the subprocess forking.  Definitely something for a different
series, though.


* Re: [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum
  2021-09-24 17:35     ` Junio C Hamano
@ 2021-09-26  6:50       ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-26  6:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood

On Fri, Sep 24, 2021 at 10:35 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Also, note that 'git checkout <pathspec>' currently also allows
> > overwriting untracked files.  That case should also be fixed, ...
>
> I wasted a few minutes wondering about the example.  Please make it
> clear that you are checking out of a tree-ish that is different from
> HEAD, as there will by definition be no "overwriting untracked" if you
> are checking out of the index.
>
> E.g. "git checkout <tree-ish> -- <pathspec>".
>
> With this command line:
>
>    $ git checkout HEAD~24 -- path
>
> where path used to be there as late as 24 revisions ago, but since
> then we removed it, and the user wants to materialize the file out of
> the old version, path, be it tracked, untracked, or even a
> directory, should be made identical to the copy from the given
> version, no?  Where does the "should also be fixed" come from?

Sorry, I should have been more careful.  Phillip noted this other case
that overwrote untracked files and suggested mentioning it; see
https://lore.kernel.org/git/acef3628-9542-d777-2534-577de9707e15@gmail.com/
and the links from there.  Perhaps I would have been better off
mentioning that link and the other links in it.

> > diff --git a/builtin/am.c b/builtin/am.c
> > index c79e0167e98..b17baa67ad8 100644
> > --- a/builtin/am.c
> > +++ b/builtin/am.c
> > @@ -1918,8 +1918,14 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
> >       opts.dst_index = &the_index;
> >       opts.update = 1;
> >       opts.merge = 1;
> > -     opts.reset = reset;
> > +     opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
> >       opts.fn = twoway_merge;
> > +     if (opts.reset) {
> > +             /* Allow ignored files in the way to get overwritten */
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +             opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(opts.dir);
>
> Do these three lines make a recurring pattern when opts.reset is set?
> I am wondering if this can be done more centrally by the unpack-trees
> machinery (i.e. "gee this one has o->reset set to X, so let's set up
> the o->dir before doing anything").

It's a very common pattern...but not universal due to some prevailing
weirdness and differences in how commands handle ignored files.  But,
it looks like that can be fixed up, and then this can be centralized.
I've got some patches doing this.

> > diff --git a/builtin/checkout.c b/builtin/checkout.c
> > index b5d477919a7..52826e0d145 100644
> > --- a/builtin/checkout.c
> > +++ b/builtin/checkout.c
> > @@ -641,23 +641,37 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
> >  {
> >       struct unpack_trees_options opts;
> >       struct tree_desc tree_desc;
> > +     int unpack_trees_ret;
> >
> >       memset(&opts, 0, sizeof(opts));
> >       opts.head_idx = -1;
> >       opts.update = worktree;
> >       opts.skip_unmerged = !worktree;
> > -     opts.reset = 1;
> > +     opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
> > +                             UNPACK_RESET_PROTECT_UNTRACKED;
> >       opts.merge = 1;
> >       opts.fn = oneway_merge;
> >       opts.verbose_update = o->show_progress;
> >       opts.src_index = &the_index;
> >       opts.dst_index = &the_index;
> > +     if (o->overwrite_ignore) {
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +             opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(opts.dir);
> > +     }
>
> If our longer term goal is to decide classification of files not in
> the index (currently, "ignored" and "untracked", but we may want to
> add a new "precious" class) and (across various commands that build
> on the unpack-trees infrastructure) to protect the "untracked" and
> "precious" ones, with --[no-]overwrite-{ignore,untracked} options as
> escape hatches, uniformly, perhaps the --[no-]overwrite-ignore
> option may be stolen from here and shifted to unpack_trees_options to
> help us go in that direction?  This is just an observation for
> longer term, not a suggestion to include the first step for such a
> move in this series.

I had thought about that, but I was already pretty deep down the
rabbit hole of trying to avoid removing the current working directory
as a side effect of other commands.  I noted in my cover letter the
--no-overwrite-ignore options and how they could be used building on
this series, and now you bring it up too.  And your previous comment
about the common pattern for unpack_trees_options.dir forces me to go
further down this hole a bit.  So, I've got patches that implement a
fair amount of this, but don't actually add the new
--no-overwrite-ignore options or plumb them through.  They do,
however, add a FIXME comment where a single boolean value could be set
in order to make such options work.

>
> >       init_checkout_metadata(&opts.meta, info->refname,
> >                              info->commit ? &info->commit->object.oid : null_oid(),
> >                              NULL);
> >       parse_tree(tree);
> >       init_tree_desc(&tree_desc, tree->buffer, tree->size);
> > -     switch (unpack_trees(1, &tree_desc, &opts)) {
> > +     unpack_trees_ret = unpack_trees(1, &tree_desc, &opts);
> > +
> > +     if (o->overwrite_ignore) {
> > +             dir_clear(opts.dir);
> > +             FREE_AND_NULL(opts.dir);
>
> This dir_clear() is also a recurring theme.  See below.
>
> > +     }
> > +
> > +     switch (unpack_trees_ret) {
> >       case -2:
> >               *writeout_error = 1;
> >               /*
> > diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> > index 485e7b04794..740fc0335af 100644
> > --- a/builtin/read-tree.c
> > +++ b/builtin/read-tree.c
> > @@ -174,6 +174,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >       if (1 < opts.merge + opts.reset + prefix_set)
> >               die("Which one? -m, --reset, or --prefix?");
> >
> > +     if (opts.reset)
> > +             opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
> > +
>
> We do not do anything about opts.dir here by default, which means we
> by default do not overwrite ignored, but that's OK, because this old
> command explicitly takes --exclude-per-directory to tell it what the
> command should consider "ignored" and with the option given we do
> prepare opts.dir just fine.
>
> > diff --git a/builtin/reset.c b/builtin/reset.c
> > index 43e855cb887..a12ee986e9f 100644
> > --- a/builtin/reset.c
> > +++ b/builtin/reset.c
> > @@ -10,6 +10,7 @@
> >  #define USE_THE_INDEX_COMPATIBILITY_MACROS
> >  #include "builtin.h"
> >  #include "config.h"
> > +#include "dir.h"
> >  #include "lockfile.h"
> >  #include "tag.h"
> >  #include "object.h"
> > @@ -70,9 +71,20 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
> >               break;
> >       case HARD:
> >               opts.update = 1;
> > -             /* fallthrough */
> > +             opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
> > +             break;
> > +     case MIXED:
> > +             opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
> > +             /* but opts.update=0, so working tree not updated */
> > +             break;
> >       default:
> > -             opts.reset = 1;
> > +             BUG("invalid reset_type passed to reset_index");
> > +     }
> > +     if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
>
> Unlike the one in "am", this cares about .reset being a particular
> value, not just being non-zero.  Puzzling.
>
> It is a bit counter-intuitive in that we do not allow overwriting
> ignored files (which is currently a synonym for "expendable") when
> .reset is set to allow us to overwrite untracked.

No, when opts.reset == UNPACK_RESET_OVERWRITE_UNTRACKED, unpack_trees()
has no way to differentiate between untracked and ignored and thus
just deletes them all.  Setting up unpack_trees_options.dir so that we
can differentiate between untracked and ignored files when we are just
going to treat them the same (delete them both) is wasted effort.  So
the check for a certain type of reset value was merely an
optimization.  (And that optimization didn't apply in the am case,
since it doesn't use UNPACK_RESET_OVERWRITE_UNTRACKED.)

However, consolidating the unpack_trees_options.dir handling into
unpack_trees() allows me to better document the optimization and only
worry about it in one place, which seems to make the code much
clearer.
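
A simplified model of that reasoning, as a sketch only: the names
below (may_clobber(), path_kind, have_excludes) are hypothetical and
this is not the logic in unpack-trees.c, just the decision table as I
described it above.  have_excludes stands for "opts.dir was set up".

```c
#include <assert.h>

/* Hypothetical stand-in for enum unpack_trees_reset_type. */
enum reset_type {
	RESET_NONE = 0,
	RESET_INVALID = 1,
	RESET_PROTECT_UNTRACKED,
	RESET_OVERWRITE_UNTRACKED
};

enum path_kind { PATH_UNTRACKED, PATH_IGNORED };

/*
 * Returns 1 if a file that is in the way may be overwritten, 0 if it
 * must be kept.
 */
static int may_clobber(enum reset_type reset, int have_excludes,
		       enum path_kind kind)
{
	assert(reset != RESET_INVALID);
	if (reset == RESET_OVERWRITE_UNTRACKED)
		return 1;	/* exclude rules are never consulted */
	/* without exclude rules, ignored files look like untracked ones */
	return have_excludes && kind == PATH_IGNORED;
}
```

The first branch never looks at have_excludes, which is why loading
the exclude rules in that mode buys nothing.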

>
> > +             /* Setup opts.dir so we can overwrite ignored files */
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +             opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(opts.dir);
>
> > @@ -104,6 +116,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
> >       ret = 0;
> >
> >  out:
> > +     if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
> > +             dir_clear(opts.dir);
> > +             FREE_AND_NULL(opts.dir);
>
> This dir_clear() is also a recurring theme.  See below.
>
> > diff --git a/builtin/stash.c b/builtin/stash.c
> > index 8f42360ca91..563f590afbd 100644
> > --- a/builtin/stash.c
> > +++ b/builtin/stash.c
> > @@ -237,6 +237,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> >       struct tree_desc t[MAX_UNPACK_TREES];
> >       struct tree *tree;
> >       struct lock_file lock_file = LOCK_INIT;
> > +     int unpack_trees_ret;
> >
> >       read_cache_preload(NULL);
> >       if (refresh_cache(REFRESH_QUIET))
> > @@ -256,11 +257,23 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> >       opts.src_index = &the_index;
> >       opts.dst_index = &the_index;
> >       opts.merge = 1;
> > -     opts.reset = reset;
> > +     opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
> > +     if (opts.reset) {
> > +             opts.dir = xcalloc(1, sizeof(*opts.dir));
> > +             opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(opts.dir);
> > +     }
> >       opts.update = update;
> >       opts.fn = oneway_merge;
> >
> > -     if (unpack_trees(nr_trees, t, &opts))
> > +     unpack_trees_ret = unpack_trees(nr_trees, t, &opts);
> > +
> > +     if (opts.reset) {
> > +             dir_clear(opts.dir);
> > +             FREE_AND_NULL(opts.dir);
>
> This dir_clear() is also a recurring theme.  Why aren't their guards
> uniformly "if (opts.dir)"?  The logic to decide if we set up opts.dir
> or not may be far from here and may be different from code path to
> code path, but the need to clear opts.dir should not have to care
> why opts.dir was populated, no?

Yeah, you're right; I should have used "if (opts.dir)" for all these cases.

> > +     }
> > +
> > +     if (unpack_trees_ret)
> >               return -1;
> >
> >       if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
> > diff --git a/reset.c b/reset.c
> > index 79310ae071b..1695f3828c5 100644
> > --- a/reset.c
> > +++ b/reset.c
> > @@ -1,5 +1,6 @@
> >  #include "git-compat-util.h"
> >  #include "cache-tree.h"
> > +#include "dir.h"
> >  #include "lockfile.h"
> >  #include "refs.h"
> >  #include "reset.h"
> > @@ -57,8 +58,12 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
> >       unpack_tree_opts.update = 1;
> >       unpack_tree_opts.merge = 1;
> >       init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
> > -     if (!detach_head)
> > -             unpack_tree_opts.reset = 1;
> > +     if (!detach_head) {
> > +             unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
> > +             unpack_tree_opts.dir = xcalloc(1, sizeof(*unpack_tree_opts.dir));
> > +             unpack_tree_opts.dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(unpack_tree_opts.dir);
> > +     }
> >
> >       if (repo_read_index_unmerged(r) < 0) {
> >               ret = error(_("could not read index"));
> > @@ -131,6 +136,10 @@ reset_head_refs:
> >                           oid_to_hex(oid), "1", NULL);
> >
> >  leave_reset_head:
> > +     if (unpack_tree_opts.dir) {
> > +             dir_clear(unpack_tree_opts.dir);
> > +             FREE_AND_NULL(unpack_tree_opts.dir);
>
> Yes, I think this is the right way to decide if we call dir_clear(),
> and all other hunks in this patch should do the same.

Will fix, though the consolidation ends up making it just one place anyway.


* [PATCH v3 00/11] Fix various issues around removal of untracked files/directories
  2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-09-24  6:37   ` [PATCH v2 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
@ 2021-09-27 16:33   ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 01/11] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
                       ` (14 more replies)
  6 siblings, 15 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren

We have multiple codepaths that delete untracked files/directories but
shouldn't. There are also some codepaths where we delete untracked
files/directories intentionally (based on mailing list discussion), but
where that intent is not documented. We also have some codepaths that
preserve ignored files but shouldn't. Fix the documentation, add several
new (mostly failing) testcases, fix some of the new testcases, and add
comments about some potential remaining problems. (I found these as a
side-effect of looking at [1], though [2] pointed out one explicitly while I
was working on it.)

Note that I'm using Junio's declaration about checkout -f and reset --hard
(and also presuming that since read-tree --reset is plumbing that its
behavior should be left alone)[3] in this series.

Changes since v2 (all due to Junio's request to consolidate
unpack_trees_options.dir handling):

 * fix some (pre-existing) memory leaks, in preparation for consolidating
   some common code (new patch 2)
 * New patches (3 & 6) to make a few more commands remove ignored files by
   default -- which also fixes an existing testcase
 * New patches (4 & 5) to consolidate the various places handling
   unpack_trees_options.dir and default to treating ignored files as
   expendable within unpack_trees(). These changes also make it very easy to
   add --no-overwrite-ignore options in the future to additional commands
   (checkout and merge already have such an option, though merge only passes
   that along to the fast-forwarding backend)
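
As a rough illustration of the consolidation in patches 4 & 5, here is
a standalone model of the idea: callers set only a preserve_ignored
bit, and the unpack routine sets up its own on-stack dir struct when
ignored files are expendable. All model_* names are hypothetical;
calling the excludes setup at that point and clearing o->dir before
returning are my assumptions about a sensible shape, not a quote of
the real unpack_trees():

```c
#include <assert.h>
#include <string.h>

#define MODEL_DIR_SHOW_IGNORED 1u /* stand-in for DIR_SHOW_IGNORED */

/* Hypothetical stand-in for struct dir_struct. */
struct model_dir {
	unsigned flags;
	int excludes_loaded;
};

/* Hypothetical stand-in for struct unpack_trees_options. */
struct model_unpack_opts {
	unsigned preserve_ignored : 1;
	struct model_dir *dir; /* internal-only after consolidation */
};

/* Stand-in for setup_standard_excludes(). */
void model_setup_standard_excludes(struct model_dir *dir)
{
	dir->excludes_loaded = 1;
}

/*
 * Returns 1 if ignored files in the way would be treated as
 * expendable during this call, 0 if they would be preserved.
 */
int model_unpack_trees(struct model_unpack_opts *o)
{
	struct model_dir dir;
	int ignored_expendable;

	assert(o);
	memset(&dir, 0, sizeof(dir));

	if (!o->preserve_ignored) {
		o->dir = &dir;
		o->dir->flags |= MODEL_DIR_SHOW_IGNORED;
		model_setup_standard_excludes(o->dir);
	}

	/* A NULL dir means ignored files are treated as precious. */
	ignored_expendable = (o->dir != NULL);

	/* Assumption: never let a pointer to the stack struct escape. */
	o->dir = NULL;
	return ignored_expendable;
}
```

With this shape, no caller has to allocate, populate, or free a dir
struct; adding --no-overwrite-ignore to a command reduces to setting
one bit.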

Changes since v1:

 * Various small cleanups (suggested by Ævar)
 * Fixed memory leaks of unpack_trees_opts->dir (also suggested by Ævar)
 * Use an enum for unpack_trees_options->reset, instead of multiple fields
   (suggested by Phillip)
 * Avoid changing behavior for cases not setting unpack_trees_options.reset
   > 0 (even if it may make sense to nuke ignored files when running either
   read-tree -m -u or the various reset flavors run internally by
   rebase/sequencer); we can revisit that later.
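
To make the enum change concrete, here is a small standalone sketch of
the idea. The SKETCH_* names are hypothetical mirrors of the
UNPACK_RESET_* values named in the range-diff below; the name of the
zero value is my assumption, and where the series calls BUG() this
model just returns -1 so that it can be exercised:

```c
#include <assert.h>

enum sketch_reset_type {
	SKETCH_RESET_NONE = 0,        /* assumed name for "no reset" */
	SKETCH_RESET_INVALID = 1,     /* what an old "reset = 1" becomes */
	SKETCH_RESET_PROTECT_UNTRACKED,
	SKETCH_RESET_OVERWRITE_UNTRACKED
};

/*
 * Returns 0 if the combination is acceptable and -1 for the
 * combinations the series rejects with BUG().
 */
int sketch_check_reset(enum sketch_reset_type reset, int preserve_ignored)
{
	assert(reset >= SKETCH_RESET_NONE &&
	       reset <= SKETCH_RESET_OVERWRITE_UNTRACKED);

	/* Callers still writing "reset = 1" land here. */
	if (reset == SKETCH_RESET_INVALID)
		return -1;
	/* Nuking untracked files while preserving ignored ones is rejected. */
	if (reset == SKETCH_RESET_OVERWRITE_UNTRACKED && preserve_ignored)
		return -1;
	return 0;
}

/* 1 if untracked files in the way may be removed, else 0. */
int sketch_may_nuke_untracked(enum sketch_reset_type reset, int update)
{
	return update && reset == SKETCH_RESET_OVERWRITE_UNTRACKED;
}
```

Reserving the value 1 as invalid means any caller that was not
converted fails loudly instead of silently picking one of the two new
behaviors.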

SIDENOTE about treating ignored files as precious:

The patches are now getting pretty close to being able to handle ignored
files as precious. The only things left would be making merge pass the
--no-overwrite-ignore option along to more backends, and adding the
--no-overwrite-ignore option that both checkout and merge take to more
commands. There are already comments in the code about what boolean would need
to be set by that flag. And then perhaps also make a global
core.overwrite_ignored config option to affect all of these. Granted, doing
this would globally treat ignored files as precious rather than allowing
them to be configured on a per-path basis, but honestly I think the idea of
configuring ignored files as precious on a per-path basis sounds like
insanity. (We have enough bugs with untracked and ignored files without
adding yet another type. And, of course, configuring per-path rules sounds
like lots of work for end users to configure. There may be additional
reasons against it.) So, if someone wants to pursue the precious-ignored
concept then I'd much rather see it done as a global setting. Just my $0.02.
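
A hedged sketch of how such a global setting might combine with the
per-command flag is below. Nothing like this exists in the series: the
tri-state flag, the config fallback, and all toy_* names are
hypothetical; only the final expression follows the checkout hunk in
the range-diff (preserve_ignored = !force && !overwrite_ignore):

```c
#include <assert.h>

/* Hypothetical per-command flags. */
struct toy_cmd_flags {
	int force;            /* e.g. checkout --force */
	int overwrite_ignore; /* -1 unset, 0 --no-overwrite-ignore, 1 set */
};

/*
 * config_overwrite_ignored models a global boolean such as the
 * core.overwrite_ignored idea above; it only applies when the
 * command-line flag was not given.
 */
int toy_preserve_ignored(const struct toy_cmd_flags *f,
			 int config_overwrite_ignored)
{
	int overwrite;

	assert(f);
	overwrite = f->overwrite_ignore;
	if (overwrite < 0)
		overwrite = config_overwrite_ignored;

	/* --force always wins: nothing in the way is preserved. */
	return !f->force && !overwrite;
}
```

The result would feed the preserve_ignored bit directly, so a command
gaining the option needs no other changes in this model.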

[1] https://lore.kernel.org/git/xmqqv93n7q1v.fsf@gitster.g/
[2] https://lore.kernel.org/git/C357A648-8B13-45C3-9388-C0C7F7D40DAE@gmail.com/
[3] https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/

Elijah Newren (11):
  t2500: add various tests for nuking untracked files
  checkout, read-tree: fix leak of unpack_trees_options.dir
  read-tree, merge-recursive: overwrite ignored files by default
  unpack-trees: introduce preserve_ignored to unpack_trees_options
  unpack-trees: make dir an internal-only struct
  Remove ignored files by default when they are in the way
  Change unpack_trees' 'reset' flag into an enum
  unpack-trees: avoid nuking untracked dir in way of unmerged file
  unpack-trees: avoid nuking untracked dir in way of locally deleted
    file
  Comment important codepaths regarding nuking untracked files/dirs
  Documentation: call out commands that nuke untracked files/directories

 Documentation/git-checkout.txt   |   5 +-
 Documentation/git-read-tree.txt  |  23 +--
 Documentation/git-reset.txt      |   3 +-
 builtin/am.c                     |   3 +-
 builtin/checkout.c               |  10 +-
 builtin/clone.c                  |   1 +
 builtin/merge.c                  |   1 +
 builtin/read-tree.c              |  26 ++--
 builtin/reset.c                  |  10 +-
 builtin/stash.c                  |   5 +-
 builtin/submodule--helper.c      |   4 +
 contrib/rerere-train.sh          |   2 +-
 merge-ort.c                      |   8 +-
 merge-recursive.c                |   5 +-
 merge.c                          |   8 +-
 reset.c                          |   3 +-
 sequencer.c                      |   1 +
 submodule.c                      |   1 +
 t/t1013-read-tree-submodule.sh   |   1 -
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 t/t7112-reset-submodule.sh       |   1 -
 unpack-trees.c                   |  61 +++++++-
 unpack-trees.h                   |  14 +-
 23 files changed, 366 insertions(+), 74 deletions(-)
 create mode 100755 t/t2500-untracked-overwriting.sh


base-commit: ddb1055343948e0d0bc81f8d20245f1ada6430a0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1036%2Fnewren%2Funtracked_removal-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1036/newren/untracked_removal-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1036

Range-diff vs v2:

  1:  9460a49c7ed =  1:  66270ffc74e t2500: add various tests for nuking untracked files
  -:  ----------- >  2:  0c74285b253 checkout, read-tree: fix leak of unpack_trees_options.dir
  -:  ----------- >  3:  2501a0c552a read-tree, merge-recursive: overwrite ignored files by default
  -:  ----------- >  4:  f1a0700e598 unpack-trees: introduce preserve_ignored to unpack_trees_options
  -:  ----------- >  5:  0d119142778 unpack-trees: make dir an internal-only struct
  -:  ----------- >  6:  b7fe354efff Remove ignored files by default when they are in the way
  2:  b77692b8f49 !  7:  9eb20121fc3 Change unpack_trees' 'reset' flag into an enum
     @@ Commit message
          (i.e. "true") can be split into two:
             UNPACK_RESET_PROTECT_UNTRACKED,
             UNPACK_RESET_OVERWRITE_UNTRACKED
     -    In order to catch accidental misuses, define with the enum a special
     -    value of
     +    In order to catch accidental misuses (i.e. where folks call it the way
     +    they traditionally used to), define the special enum value of
             UNPACK_RESET_INVALID = 1
          which will trigger a BUG().
      
     @@ Commit message
             numerous callers from rebase/sequencer to reset_head()
          will use the new UNPACK_RESET_PROTECT_UNTRACKED value.
      
     -    In order to protect untracked files but still allow deleting of ignored
     -    files, we also have to setup unpack_trees_opt.dir.  It may make sense to
     -    set up unpack_trees_opt.dir in more cases, but here I tried to only do
     -    so in cases where we switched from deleting all untracked files to
     -    avoiding doing so (i.e. where we now use
     -    UNPACK_RESET_PROTECT_UNTRACKED).
     +    Also, note that it has been reported that 'git checkout <treeish>
     +    <pathspec>' currently also allows overwriting untracked files[1].  That
     +    case should also be fixed, but it does not use unpack_trees() and thus
     +    is outside the scope of the current changes.
      
     -    Also, note that 'git checkout <pathspec>' currently also allows
     -    overwriting untracked files.  That case should also be fixed, but it
     -    does not use unpack_trees() and thus is outside the scope of the current
     -    changes.
     +    [1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote,
       	opts.update = 1;
       	opts.merge = 1;
      -	opts.reset = reset;
     +-	if (!reset)
     +-		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
      +	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
     ++	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
       	opts.fn = twoway_merge;
     -+	if (opts.reset) {
     -+		/* Allow ignored files in the way to get overwritten */
     -+		opts.dir = xcalloc(1, sizeof(*opts.dir));
     -+		opts.dir->flags |= DIR_SHOW_IGNORED;
     -+		setup_standard_excludes(opts.dir);
     -+	}
       	init_tree_desc(&t[0], head->buffer, head->size);
       	init_tree_desc(&t[1], remote->buffer, remote->size);
     - 
     -@@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
     - 		return -1;
     - 	}
     - 
     -+	if (opts.reset) {
     -+		dir_clear(opts.dir);
     -+		FREE_AND_NULL(opts.dir);
     -+	}
     -+
     - 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
     - 		die(_("unable to write new index file"));
     - 
      
       ## builtin/checkout.c ##
      @@ builtin/checkout.c: static int reset_tree(struct tree *tree, const struct checkout_opts *o,
     - {
     - 	struct unpack_trees_options opts;
     - 	struct tree_desc tree_desc;
     -+	int unpack_trees_ret;
     - 
     - 	memset(&opts, 0, sizeof(opts));
       	opts.head_idx = -1;
       	opts.update = worktree;
       	opts.skip_unmerged = !worktree;
      -	opts.reset = 1;
      +	opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
      +				UNPACK_RESET_PROTECT_UNTRACKED;
     ++	opts.preserve_ignored = (!o->force && !o->overwrite_ignore);
       	opts.merge = 1;
     +-	opts.preserve_ignored = 0;
       	opts.fn = oneway_merge;
       	opts.verbose_update = o->show_progress;
       	opts.src_index = &the_index;
     - 	opts.dst_index = &the_index;
     -+	if (o->overwrite_ignore) {
     -+		opts.dir = xcalloc(1, sizeof(*opts.dir));
     -+		opts.dir->flags |= DIR_SHOW_IGNORED;
     -+		setup_standard_excludes(opts.dir);
     -+	}
     - 	init_checkout_metadata(&opts.meta, info->refname,
     - 			       info->commit ? &info->commit->object.oid : null_oid(),
     - 			       NULL);
     - 	parse_tree(tree);
     - 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
     --	switch (unpack_trees(1, &tree_desc, &opts)) {
     -+	unpack_trees_ret = unpack_trees(1, &tree_desc, &opts);
     -+
     -+	if (o->overwrite_ignore) {
     -+		dir_clear(opts.dir);
     -+		FREE_AND_NULL(opts.dir);
     -+	}
     -+
     -+	switch (unpack_trees_ret) {
     - 	case -2:
     - 		*writeout_error = 1;
     - 		/*
      
       ## builtin/read-tree.c ##
      @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
     @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *
       	 *
      
       ## builtin/reset.c ##
     -@@
     - #define USE_THE_INDEX_COMPATIBILITY_MACROS
     - #include "builtin.h"
     - #include "config.h"
     -+#include "dir.h"
     - #include "lockfile.h"
     - #include "tag.h"
     - #include "object.h"
      @@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id *oid, int reset_t
       		break;
       	case HARD:
     @@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id
       	default:
      -		opts.reset = 1;
      +		BUG("invalid reset_type passed to reset_index");
     -+	}
     -+	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
     -+		/* Setup opts.dir so we can overwrite ignored files */
     -+		opts.dir = xcalloc(1, sizeof(*opts.dir));
     -+		opts.dir->flags |= DIR_SHOW_IGNORED;
     -+		setup_standard_excludes(opts.dir);
       	}
       
       	read_cache_unmerged();
     -@@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id *oid, int reset_t
     - 	ret = 0;
     - 
     - out:
     -+	if (opts.reset == UNPACK_RESET_PROTECT_UNTRACKED) {
     -+		dir_clear(opts.dir);
     -+		FREE_AND_NULL(opts.dir);
     -+	}
     - 	for (i = 0; i < nr; i++)
     - 		free((void *)desc[i].buffer);
     - 	return ret;
      
       ## builtin/stash.c ##
     -@@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int reset)
     - 	struct tree_desc t[MAX_UNPACK_TREES];
     - 	struct tree *tree;
     - 	struct lock_file lock_file = LOCK_INIT;
     -+	int unpack_trees_ret;
     - 
     - 	read_cache_preload(NULL);
     - 	if (refresh_cache(REFRESH_QUIET))
      @@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int reset)
       	opts.src_index = &the_index;
       	opts.dst_index = &the_index;
       	opts.merge = 1;
      -	opts.reset = reset;
      +	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
     -+	if (opts.reset) {
     -+		opts.dir = xcalloc(1, sizeof(*opts.dir));
     -+		opts.dir->flags |= DIR_SHOW_IGNORED;
     -+		setup_standard_excludes(opts.dir);
     -+	}
       	opts.update = update;
     +-	if (update && !reset)
     ++	if (update)
     + 		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
       	opts.fn = oneway_merge;
       
     --	if (unpack_trees(nr_trees, t, &opts))
     -+	unpack_trees_ret = unpack_trees(nr_trees, t, &opts);
     -+
     -+	if (opts.reset) {
     -+		dir_clear(opts.dir);
     -+		FREE_AND_NULL(opts.dir);
     -+	}
     -+
     -+	if (unpack_trees_ret)
     - 		return -1;
     - 
     - 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
      
       ## reset.c ##
     -@@
     - #include "git-compat-util.h"
     - #include "cache-tree.h"
     -+#include "dir.h"
     - #include "lockfile.h"
     - #include "refs.h"
     - #include "reset.h"
      @@ reset.c: int reset_head(struct repository *r, struct object_id *oid, const char *action,
     - 	unpack_tree_opts.update = 1;
     - 	unpack_tree_opts.merge = 1;
     + 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
       	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
     --	if (!detach_head)
     + 	if (!detach_head)
      -		unpack_tree_opts.reset = 1;
     -+	if (!detach_head) {
      +		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
     -+		unpack_tree_opts.dir = xcalloc(1, sizeof(*unpack_tree_opts.dir));
     -+		unpack_tree_opts.dir->flags |= DIR_SHOW_IGNORED;
     -+		setup_standard_excludes(unpack_tree_opts.dir);
     -+	}
       
       	if (repo_read_index_unmerged(r) < 0) {
       		ret = error(_("could not read index"));
     -@@ reset.c: reset_head_refs:
     - 			    oid_to_hex(oid), "1", NULL);
     - 
     - leave_reset_head:
     -+	if (unpack_tree_opts.dir) {
     -+		dir_clear(unpack_tree_opts.dir);
     -+		FREE_AND_NULL(unpack_tree_opts.dir);
     -+	}
     - 	strbuf_release(&msg);
     - 	rollback_lock_file(&lock);
     - 	clear_unpack_trees_porcelain(&unpack_tree_opts);
      
       ## t/t2500-untracked-overwriting.sh ##
      @@ t/t2500-untracked-overwriting.sh: test_setup_checkout_m () {
     @@ t/t2500-untracked-overwriting.sh: test_expect_failure 'git rebase --abort and un
      
       ## unpack-trees.c ##
      @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
     - 	struct pattern_list pl;
       	int free_pattern_list = 0;
     + 	struct dir_struct dir = DIR_INIT;
       
      +	if (o->reset == UNPACK_RESET_INVALID)
      +		BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
      +
       	if (len > MAX_UNPACK_TREES)
       		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
     + 	if (o->dir)
     +@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
     + 		ensure_full_index(o->dst_index);
     + 	}
       
     ++	if (o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED &&
     ++	    o->preserve_ignored)
     ++		BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
     ++
     + 	if (!o->preserve_ignored) {
     + 		o->dir = &dir;
     + 		o->dir->flags |= DIR_SHOW_IGNORED;
      @@ unpack-trees.c: static int verify_absent_1(const struct cache_entry *ce,
       	int len;
       	struct stat st;
     @@ unpack-trees.h: void setup_unpack_trees_porcelain(struct unpack_trees_options *o
      -		     merge,
      +	unsigned int merge,
       		     update,
     + 		     preserve_ignored,
       		     clone,
     - 		     index_only,
      @@ unpack-trees.h: struct unpack_trees_options {
       		     exiting_early,
       		     show_all_errors,
     @@ unpack-trees.h: struct unpack_trees_options {
      +	enum unpack_trees_reset_type reset;
       	const char *prefix;
       	int cache_bottom;
     - 	struct dir_struct *dir;
     + 	struct pathspec *pathspec;
  3:  208f3b3ebe5 =  8:  e4c42d43b09 unpack-trees: avoid nuking untracked dir in way of unmerged file
  4:  0a0997d081b =  9:  1a770681704 unpack-trees: avoid nuking untracked dir in way of locally deleted file
  5:  4b78a526d2a ! 10:  6b42a80bf3d Comment important codepaths regarding nuking untracked files/dirs
     @@ Commit message
            * git-archimport.perl: Don't care; arch is long since dead
            * git-cvs*.perl: Don't care; cvs is long since dead
      
     +    Also, the reset --hard in builtin/worktree.c looks safe, due to only
     +    running in an empty directory.
     +
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## builtin/stash.c ##
     @@ builtin/submodule--helper.c: static int add_submodule(const struct add_data *add
       
       		if (add_data->branch) {
      
     - ## builtin/worktree.c ##
     -@@ builtin/worktree.c: static int add_worktree(const char *path, const char *refname,
     - 	if (opts->checkout) {
     - 		cp.argv = NULL;
     - 		strvec_clear(&cp.args);
     -+		/*
     -+		 * NOTE: reset --hard is okay here, because 'worktree add'
     -+		 * refuses to work in an extant non-empty directory, so there
     -+		 * is no risk of deleting untracked files.
     -+		 */
     - 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
     - 		if (opts->quiet)
     - 			strvec_push(&cp.args, "--quiet");
     -
       ## contrib/rerere-train.sh ##
      @@ contrib/rerere-train.sh: do
       		git checkout -q $commit -- .
  6:  993451a8036 = 11:  de416f887d7 Documentation: call out commands that nuke untracked files/directories

-- 
gitgitgadget

* [PATCH v3 01/11] t2500: add various tests for nuking untracked files
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 02/11] checkout, read-tree: fix leak of unpack_trees_options.dir Elijah Newren via GitGitGadget
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Noting that unpack_trees treats reset=1 & update=1 as license to nuke
untracked files, I looked for code paths that use this combination and
tried to generate testcases which demonstrated unintentional loss of
untracked files and directories.  I found several.

I also include testcases for `git reset --{hard,merge,keep}`.  A hard
reset is perhaps the most direct test of unpack_trees' reset=1 behavior,
but we cannot make `git reset --hard` preserve untracked files without
some migration work.

Also, the two commands `checkout --force` (because of the --force) and
`read-tree --reset` (because it's plumbing and we need to keep it
backward compatible) were left out as we expect those to continue
removing untracked files and directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100755 t/t2500-untracked-overwriting.sh

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
new file mode 100755
index 00000000000..2412d121ea8
--- /dev/null
+++ b/t/t2500-untracked-overwriting.sh
@@ -0,0 +1,244 @@
+#!/bin/sh
+
+test_description='Test handling of overwriting untracked files'
+
+. ./test-lib.sh
+
+test_setup_reset () {
+	git init reset_$1 &&
+	(
+		cd reset_$1 &&
+		test_commit init &&
+
+		git branch stable &&
+		git branch work &&
+
+		git checkout work &&
+		test_commit foo &&
+
+		git checkout stable
+	)
+}
+
+test_expect_success 'reset --hard will nuke untracked files/dirs' '
+	test_setup_reset hard &&
+	(
+		cd reset_hard &&
+		git ls-tree -r stable &&
+		git log --all --name-status --oneline &&
+		git ls-tree -r work &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		echo foo >expect &&
+
+		git reset --hard work &&
+
+		# check that untracked directory foo.t/ was nuked
+		test_path_is_file foo.t &&
+		test_cmp expect foo.t
+	)
+'
+
+test_expect_success 'reset --merge will preserve untracked files/dirs' '
+	test_setup_reset merge &&
+	(
+		cd reset_merge &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --merge work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating .foo.t. would lose untracked files" error
+	)
+'
+
+test_expect_success 'reset --keep will preserve untracked files/dirs' '
+	test_setup_reset keep &&
+	(
+		cd reset_keep &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --keep work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating.*foo.t.*would lose untracked files" error
+	)
+'
+
+test_setup_checkout_m () {
+	git init checkout &&
+	(
+		cd checkout &&
+		test_commit init &&
+
+		test_write_lines file has some >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		git branch stable &&
+
+		git switch -c work &&
+		echo stuff >notes.txt &&
+		test_write_lines file has some words >filler &&
+		git add notes.txt filler &&
+		git commit -m filler &&
+
+		git checkout stable
+	)
+}
+
+test_expect_failure 'checkout -m does not nuke untracked file' '
+	test_setup_checkout_m &&
+	(
+		cd checkout &&
+
+		# Tweak filler
+		test_write_lines this file has some >filler &&
+		# Make an untracked file, save its contents in "expect"
+		echo precious >notes.txt &&
+		cp notes.txt expect &&
+
+		test_must_fail git checkout -m work &&
+		test_cmp expect notes.txt
+	)
+'
+
+test_setup_sequencing () {
+	git init sequencing_$1 &&
+	(
+		cd sequencing_$1 &&
+		test_commit init &&
+
+		test_write_lines this file has some words >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		mkdir -p foo/bar &&
+		test_commit foo/bar/baz &&
+
+		git branch simple &&
+		git branch fooey &&
+
+		git checkout fooey &&
+		git rm foo/bar/baz.t &&
+		echo stuff >>filler &&
+		git add -u &&
+		git commit -m "changes" &&
+
+		git checkout simple &&
+		echo items >>filler &&
+		echo newstuff >>newfile &&
+		git add filler newfile &&
+		git commit -m another
+	)
+}
+
+test_expect_failure 'git rebase --abort and untracked files' '
+	test_setup_sequencing rebase_abort_and_untracked &&
+	(
+		cd sequencing_rebase_abort_and_untracked &&
+		git checkout fooey &&
+		test_must_fail git rebase simple &&
+
+		cat init.t &&
+		git rm init.t &&
+		echo precious >init.t &&
+		cp init.t expect &&
+		git status --porcelain &&
+		test_must_fail git rebase --abort &&
+		test_cmp expect init.t
+	)
+'
+
+test_expect_failure 'git rebase fast forwarding and untracked files' '
+	test_setup_sequencing rebase_fast_forward_and_untracked &&
+	(
+		cd sequencing_rebase_fast_forward_and_untracked &&
+		git checkout init &&
+		echo precious >filler &&
+		cp filler expect &&
+		test_must_fail git rebase init simple &&
+		test_cmp expect filler
+	)
+'
+
+test_expect_failure 'git rebase --autostash and untracked files' '
+	test_setup_sequencing rebase_autostash_and_untracked &&
+	(
+		cd sequencing_rebase_autostash_and_untracked &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git rebase --autostash init &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git stash and untracked files' '
+	test_setup_sequencing stash_and_untracked_files &&
+	(
+		cd sequencing_stash_and_untracked_files &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git status --porcelain &&
+		git stash push &&
+		git status --porcelain &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+	test_setup_sequencing am_abort_and_untracked &&
+	(
+		cd sequencing_am_abort_and_untracked &&
+		git format-patch -1 --stdout fooey >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete the conflicted file; we will stage and commit it later
+		rm filler &&
+
+		# Put an unrelated untracked directory there
+		mkdir filler &&
+		echo foo >filler/file1 &&
+		echo bar >filler/file2 &&
+
+		test_must_fail git am --abort 2>errors &&
+		test_path_is_dir filler &&
+		grep "Updating .filler. would lose untracked files in it" errors
+	)
+'
+
+test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+	test_setup_sequencing am_skip_and_untracked &&
+	(
+		cd sequencing_am_skip_and_untracked &&
+		git checkout fooey &&
+		git format-patch -1 --stdout simple >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete newfile
+		rm newfile &&
+
+		# Put an unrelated untracked directory there
+		mkdir newfile &&
+		echo foo >newfile/file1 &&
+		echo bar >newfile/file2 &&
+
+		# Change our mind about resolutions, just skip this patch
+		test_must_fail git am --skip 2>errors &&
+		test_path_is_dir newfile &&
+		grep "Updating .newfile. would lose untracked files in it" errors
+	)
+'
+
+test_done
-- 
gitgitgadget


* [PATCH v3 02/11] checkout, read-tree: fix leak of unpack_trees_options.dir
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 01/11] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default Elijah Newren via GitGitGadget
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/checkout.c  | 4 ++++
 builtin/read-tree.c | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8c69dcdf72a..5335435d616 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -760,6 +760,10 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
 
 		ret = unpack_trees(2, trees, &topts);
+		if (topts.dir) {
+			dir_clear(topts.dir);
+			FREE_AND_NULL(topts.dir);
+		}
 		clear_unpack_trees_porcelain(&topts);
 		if (ret == -1) {
 			/*
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 485e7b04794..96102c222bf 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -250,6 +250,11 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (unpack_trees(nr_trees, t, &opts))
 		return 128;
 
+	if (opts.dir) {
+		dir_clear(opts.dir);
+		FREE_AND_NULL(opts.dir);
+	}
+
 	if (opts.debug_unpack || opts.dry_run)
 		return 0; /* do not write the index out */
 
-- 
gitgitgadget


* [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 01/11] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 02/11] checkout, read-tree: fix leak of unpack_trees_options.dir Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-12-13 17:12       ` Jack O'Connor
  2021-09-27 16:33     ` [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options Elijah Newren via GitGitGadget
                       ` (11 subsequent siblings)
  14 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This fixes a long-standing patchwork of ignored files handling in
read-tree and merge-recursive, called out and suggested by Junio long
ago.  Quoting from commit dcf0c16ef1 ("core.excludesfile clean-up",
2007-11-16):

    git-read-tree takes --exclude-per-directory=<gitignore>,
    not because the flexibility was needed.  Again, this was
    because the option predates the standardization of the ignore
    files.

    ...

    On the other hand, I think it makes perfect sense to fix
    git-read-tree, git-merge-recursive and git-clean to follow the
    same rule as other commands.  I do not think of a valid use case
    to give an exclude-per-directory that is nonstandard to
    read-tree command, outside a "negative" test in the t1004 test
    script.

    This patch is the first step to untangle this mess.

    The next step would be to teach read-tree, merge-recursive and
    clean (in C) to use setup_standard_excludes().

History shows each of these was partially or fully fixed:

  * clean was taught the new trick in 1617adc7a0 ("Teach git clean to
    use setup_standard_excludes()", 2007-11-14).

  * read-tree was primarily used by checkout & merge scripts.  checkout
    and merge later became builtins and were both fixed to use the new
    setup_standard_excludes() handling in fc001b526c ("checkout,merge:
    loosen overwriting untracked file check based on info/exclude",
    2011-11-27).  So the primary users were fixed, though read-tree
    itself was not.

  * merge-recursive has now been replaced as the default merge backend
    by merge-ort.  merge-ort fixed this by using
    setup_standard_excludes() starting early in its implementation; see
    commit 6681ce5cf6 ("merge-ort: add implementation of checkout()",
    2020-12-13), largely due to its design depending on checkout() and
    thus being influenced by the checkout code.  However,
    merge-recursive itself was not fixed here, in part because its
    design meant it had difficulty differentiating between untracked
    files, ignored files, leftover tracked files that haven't been
    removed yet due to order of processing files, and files written by
    itself due to collisions.

Make the conversion more complete by now handling read-tree and
handling at least the unpack_trees() portion of merge-recursive.  While
merge-recursive is on its way out, fixing the unpack_trees() portion is
easy and facilitates some of the later changes in this series.  Note
that fixing read-tree makes the --exclude-per-directory option to
read-tree useless, so we remove it from the documentation (though we
continue to accept it if passed).

The read-tree changes happen to fix a bug in t1013.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-read-tree.txt | 18 +-----------------
 builtin/read-tree.c             | 25 ++++++++++---------------
 merge-recursive.c               | 11 ++++++++++-
 t/t1013-read-tree-submodule.sh  |  1 -
 4 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 5fa8bab64c2..0222a27c5af 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -10,8 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git read-tree' [[-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>]
-		[-u [--exclude-per-directory=<gitignore>] | -i]]
-		[--index-output=<file>] [--no-sparse-checkout]
+		[-u | -i]] [--index-output=<file>] [--no-sparse-checkout]
 		(--empty | <tree-ish1> [<tree-ish2> [<tree-ish3>]])
 
 
@@ -88,21 +87,6 @@ OPTIONS
 	The command will refuse to overwrite entries that already
 	existed in the original index file.
 
---exclude-per-directory=<gitignore>::
-	When running the command with `-u` and `-m` options, the
-	merge result may need to overwrite paths that are not
-	tracked in the current branch.  The command usually
-	refuses to proceed with the merge to avoid losing such a
-	path.  However this safety valve sometimes gets in the
-	way.  For example, it often happens that the other
-	branch added a file that used to be a generated file in
-	your branch, and the safety valve triggers when you try
-	to switch to that branch after you ran `make` but before
-	running `make clean` to remove the generated file.  This
-	option tells the command to read per-directory exclude
-	file (usually '.gitignore') and allows such an untracked
-	but explicitly ignored file to be overwritten.
-
 --index-output=<file>::
 	Instead of writing the results out to `$GIT_INDEX_FILE`,
 	write the resulting index in the named file.  While the
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 96102c222bf..73cb957a69b 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -38,7 +38,7 @@ static int list_tree(struct object_id *oid)
 }
 
 static const char * const read_tree_usage[] = {
-	N_("git read-tree [(-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>) [-u [--exclude-per-directory=<gitignore>] | -i]] [--no-sparse-checkout] [--index-output=<file>] (--empty | <tree-ish1> [<tree-ish2> [<tree-ish3>]])"),
+	N_("git read-tree [(-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>) [-u | -i]] [--no-sparse-checkout] [--index-output=<file>] (--empty | <tree-ish1> [<tree-ish2> [<tree-ish3>]])"),
 	NULL
 };
 
@@ -53,24 +53,16 @@ static int index_output_cb(const struct option *opt, const char *arg,
 static int exclude_per_directory_cb(const struct option *opt, const char *arg,
 				    int unset)
 {
-	struct dir_struct *dir;
 	struct unpack_trees_options *opts;
 
 	BUG_ON_OPT_NEG(unset);
 
 	opts = (struct unpack_trees_options *)opt->value;
 
-	if (opts->dir)
-		die("more than one --exclude-per-directory given.");
-
-	dir = xcalloc(1, sizeof(*opts->dir));
-	dir->flags |= DIR_SHOW_IGNORED;
-	dir->exclude_per_dir = arg;
-	opts->dir = dir;
-	/* We do not need to nor want to do read-directory
-	 * here; we are merely interested in reusing the
-	 * per directory ignore stack mechanism.
-	 */
+	if (!opts->update)
+		die("--exclude-per-directory is meaningless unless -u");
+	if (strcmp(arg, ".gitignore"))
+		die("--exclude-per-directory argument must be .gitignore");
 	return 0;
 }
 
@@ -209,8 +201,11 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if ((opts.update || opts.index_only) && !opts.merge)
 		die("%s is meaningless without -m, --reset, or --prefix",
 		    opts.update ? "-u" : "-i");
-	if ((opts.dir && !opts.update))
-		die("--exclude-per-directory is meaningless unless -u");
+	if (opts.update && !opts.reset) {
+		CALLOC_ARRAY(opts.dir, 1);
+		opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opts.dir);
+	}
 	if (opts.merge && !opts.index_only)
 		setup_work_tree();
 
diff --git a/merge-recursive.c b/merge-recursive.c
index e594d4c3fa1..233d9f686ad 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -408,8 +408,13 @@ static int unpack_trees_start(struct merge_options *opt,
 	memset(&opt->priv->unpack_opts, 0, sizeof(opt->priv->unpack_opts));
 	if (opt->priv->call_depth)
 		opt->priv->unpack_opts.index_only = 1;
-	else
+	else {
 		opt->priv->unpack_opts.update = 1;
+		/* FIXME: should only do this if !overwrite_ignore */
+		CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
+		opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(opt->priv->unpack_opts.dir);
+	}
 	opt->priv->unpack_opts.merge = 1;
 	opt->priv->unpack_opts.head_idx = 2;
 	opt->priv->unpack_opts.fn = threeway_merge;
@@ -423,6 +428,10 @@ static int unpack_trees_start(struct merge_options *opt,
 	init_tree_desc_from_tree(t+2, merge);
 
 	rc = unpack_trees(3, t, &opt->priv->unpack_opts);
+	if (opt->priv->unpack_opts.dir) {
+		dir_clear(opt->priv->unpack_opts.dir);
+		FREE_AND_NULL(opt->priv->unpack_opts.dir);
+	}
 	cache_tree_free(&opt->repo->index->cache_tree);
 
 	/*
diff --git a/t/t1013-read-tree-submodule.sh b/t/t1013-read-tree-submodule.sh
index b6df7444c05..bfc90d4cf27 100755
--- a/t/t1013-read-tree-submodule.sh
+++ b/t/t1013-read-tree-submodule.sh
@@ -6,7 +6,6 @@ test_description='read-tree can handle submodules'
 . "$TEST_DIRECTORY"/lib-submodule-update.sh
 
 KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
-KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
 
 test_submodule_switch_recursing_with_args "read-tree -u -m"
 
-- 
gitgitgadget



* [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-29  9:22       ` Ævar Arnfjörð Bjarmason
  2021-09-27 16:33     ` [PATCH v3 05/11] unpack-trees: make dir an internal-only struct Elijah Newren via GitGitGadget
                       ` (10 subsequent siblings)
  14 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Currently, every caller of unpack_trees() that wants to ensure ignored
files are overwritten by default needs to:
   * allocate unpack_trees_options.dir
   * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
   * call setup_standard_excludes
AND then after the call to unpack_trees() needs to
   * call dir_clear()
   * deallocate unpack_trees_options.dir
That's a fair amount of boilerplate, and every caller uses identical
code.  Make this easier by instead introducing a new boolean value where
the default value (0) does what we want so that new callers of
unpack_trees() automatically get the appropriate behavior.  And move all
the handling of unpack_trees_options.dir into unpack_trees() itself.
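
A minimal sketch of the pattern described above, using hypothetical
stand-in types (mock_dir, mock_unpack_opts, mock_unpack) rather than
git's real structures.  It only illustrates how a flag whose zero value
means "overwrite ignored files" lets the callee own the setup and
teardown that callers previously repeated:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not git's real definitions. */
struct mock_dir {
	int show_ignored;   /* models the DIR_SHOW_IGNORED flag */
	int excludes_ready; /* models setup_standard_excludes() having run */
};

struct mock_unpack_opts {
	unsigned int update;
	unsigned int preserve_ignored; /* 0 (default): ignored files may be overwritten */
	struct mock_dir *dir;          /* allocated and freed by mock_unpack() */
	int overwrote_ignored;         /* result flag used only by this sketch */
};

/* The callee performs the allocate/flag/setup and clear/free steps itself. */
static int mock_unpack(struct mock_unpack_opts *o)
{
	assert(o);

	if (!o->preserve_ignored) {
		o->dir = calloc(1, sizeof(*o->dir));
		if (!o->dir)
			return -1;
		o->dir->show_ignored = 1;
		o->dir->excludes_ready = 1;
	}

	/* Ignored paths are overwritable only when exclude rules were loaded. */
	o->overwrote_ignored = o->dir && o->dir->excludes_ready;

	/* Teardown that every caller used to repeat after the call. */
	free(o->dir);
	o->dir = NULL;
	return 0;
}

/* A caller now sets at most one flag; zero-initialization gives the default. */
static int ignored_file_overwritten(unsigned int preserve_ignored)
{
	struct mock_unpack_opts o = { 0 };

	o.update = 1;
	o.preserve_ignored = preserve_ignored;
	if (mock_unpack(&o))
		return -1;
	return o.overwrote_ignored;
}
```

With this shape, a caller that zero-initializes its options gets the
overwrite-ignored behavior without touching dir at all, and only callers
that want to keep ignored files need to set the flag.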

While preserve_ignored = 0 is the behavior we feel is the appropriate
default, we defer fixing commands to use the appropriate default until a
later commit.  So, this commit introduces several locations where we
manually set preserve_ignored=1.  This makes it clear where code paths
were previously preserving ignored files when they should not have been;
a future commit will flip these to instead use a value of 0 to get the
behavior we want.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/am.c        |  3 +++
 builtin/checkout.c  | 11 ++---------
 builtin/clone.c     |  2 ++
 builtin/merge.c     |  2 ++
 builtin/read-tree.c | 13 +++----------
 builtin/reset.c     |  2 ++
 builtin/stash.c     |  3 +++
 merge-ort.c         |  8 +-------
 merge-recursive.c   |  8 +-------
 merge.c             |  8 +-------
 reset.c             |  2 ++
 sequencer.c         |  2 ++
 unpack-trees.c      | 10 ++++++++++
 unpack-trees.h      |  1 +
 14 files changed, 35 insertions(+), 40 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index e4a0ff9cd7c..1ee70692bc3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1918,6 +1918,9 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.update = 1;
 	opts.merge = 1;
 	opts.reset = reset;
+	if (!reset)
+		/* FIXME: Default should be to remove ignored files */
+		opts.preserve_ignored = 1;
 	opts.fn = twoway_merge;
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5335435d616..5e7957dd068 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -648,6 +648,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	opts.skip_unmerged = !worktree;
 	opts.reset = 1;
 	opts.merge = 1;
+	opts.preserve_ignored = 0;
 	opts.fn = oneway_merge;
 	opts.verbose_update = o->show_progress;
 	opts.src_index = &the_index;
@@ -746,11 +747,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				       new_branch_info->commit ?
 				       &new_branch_info->commit->object.oid :
 				       &new_branch_info->oid, NULL);
-		if (opts->overwrite_ignore) {
-			topts.dir = xcalloc(1, sizeof(*topts.dir));
-			topts.dir->flags |= DIR_SHOW_IGNORED;
-			setup_standard_excludes(topts.dir);
-		}
+		topts.preserve_ignored = !opts->overwrite_ignore;
 		tree = parse_tree_indirect(old_branch_info->commit ?
 					   &old_branch_info->commit->object.oid :
 					   the_hash_algo->empty_tree);
@@ -760,10 +757,6 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
 
 		ret = unpack_trees(2, trees, &topts);
-		if (topts.dir) {
-			dir_clear(topts.dir);
-			FREE_AND_NULL(topts.dir);
-		}
 		clear_unpack_trees_porcelain(&topts);
 		if (ret == -1) {
 			/*
diff --git a/builtin/clone.c b/builtin/clone.c
index ff1d3d447a3..be1c3840d62 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -687,6 +687,8 @@ static int checkout(int submodule_progress)
 	opts.update = 1;
 	opts.merge = 1;
 	opts.clone = 1;
+	/* FIXME: Default should be to remove ignored files */
+	opts.preserve_ignored = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/builtin/merge.c b/builtin/merge.c
index 3fbdacc7db4..1e5fff095fc 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -680,6 +680,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	opts.verbose_update = 1;
 	opts.trivial_merges_only = 1;
 	opts.merge = 1;
+	/* FIXME: Default should be to remove ignored files */
+	opts.preserve_ignored = 1;
 	trees[nr_trees] = parse_tree_indirect(common);
 	if (!trees[nr_trees++])
 		return -1;
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 73cb957a69b..443d206eca6 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -201,11 +201,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if ((opts.update || opts.index_only) && !opts.merge)
 		die("%s is meaningless without -m, --reset, or --prefix",
 		    opts.update ? "-u" : "-i");
-	if (opts.update && !opts.reset) {
-		CALLOC_ARRAY(opts.dir, 1);
-		opts.dir->flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(opts.dir);
-	}
+	if (opts.update && !opts.reset)
+		opts.preserve_ignored = 0;
+	/* otherwise, opts.preserve_ignored is irrelevant */
 	if (opts.merge && !opts.index_only)
 		setup_work_tree();
 
@@ -245,11 +243,6 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (unpack_trees(nr_trees, t, &opts))
 		return 128;
 
-	if (opts.dir) {
-		dir_clear(opts.dir);
-		FREE_AND_NULL(opts.dir);
-	}
-
 	if (opts.debug_unpack || opts.dry_run)
 		return 0; /* do not write the index out */
 
diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..7f38656f018 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -67,6 +67,8 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case KEEP:
 	case MERGE:
 		opts.update = 1;
+		/* FIXME: Default should be to remove ignored files */
+		opts.preserve_ignored = 1;
 		break;
 	case HARD:
 		opts.update = 1;
diff --git a/builtin/stash.c b/builtin/stash.c
index 8f42360ca91..88287b890d5 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -258,6 +258,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.merge = 1;
 	opts.reset = reset;
 	opts.update = update;
+	if (update && !reset)
+		/* FIXME: Default should be to remove ignored files */
+		opts.preserve_ignored = 1;
 	opts.fn = oneway_merge;
 
 	if (unpack_trees(nr_trees, t, &opts))
diff --git a/merge-ort.c b/merge-ort.c
index 35aa979c3a4..0d64ec716bd 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4045,11 +4045,7 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
-	if (1/* FIXME: opts->overwrite_ignore*/) {
-		CALLOC_ARRAY(unpack_opts.dir, 1);
-		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(unpack_opts.dir);
-	}
+	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
 	parse_tree(prev);
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
 	parse_tree(next);
@@ -4057,8 +4053,6 @@ static int checkout(struct merge_options *opt,
 
 	ret = unpack_trees(2, trees, &unpack_opts);
 	clear_unpack_trees_porcelain(&unpack_opts);
-	dir_clear(unpack_opts.dir);
-	FREE_AND_NULL(unpack_opts.dir);
 	return ret;
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 233d9f686ad..2be3f5d4044 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -411,9 +411,7 @@ static int unpack_trees_start(struct merge_options *opt,
 	else {
 		opt->priv->unpack_opts.update = 1;
 		/* FIXME: should only do this if !overwrite_ignore */
-		CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
-		opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(opt->priv->unpack_opts.dir);
+		opt->priv->unpack_opts.preserve_ignored = 0;
 	}
 	opt->priv->unpack_opts.merge = 1;
 	opt->priv->unpack_opts.head_idx = 2;
@@ -428,10 +426,6 @@ static int unpack_trees_start(struct merge_options *opt,
 	init_tree_desc_from_tree(t+2, merge);
 
 	rc = unpack_trees(3, t, &opt->priv->unpack_opts);
-	if (opt->priv->unpack_opts.dir) {
-		dir_clear(opt->priv->unpack_opts.dir);
-		FREE_AND_NULL(opt->priv->unpack_opts.dir);
-	}
 	cache_tree_free(&opt->repo->index->cache_tree);
 
 	/*
diff --git a/merge.c b/merge.c
index 6e736881d90..2382ff66d35 100644
--- a/merge.c
+++ b/merge.c
@@ -53,7 +53,6 @@ int checkout_fast_forward(struct repository *r,
 	struct unpack_trees_options opts;
 	struct tree_desc t[MAX_UNPACK_TREES];
 	int i, nr_trees = 0;
-	struct dir_struct dir = DIR_INIT;
 	struct lock_file lock_file = LOCK_INIT;
 
 	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
@@ -80,11 +79,7 @@ int checkout_fast_forward(struct repository *r,
 	}
 
 	memset(&opts, 0, sizeof(opts));
-	if (overwrite_ignore) {
-		dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&dir);
-		opts.dir = &dir;
-	}
+	opts.preserve_ignored = !overwrite_ignore;
 
 	opts.head_idx = 1;
 	opts.src_index = r->index;
@@ -101,7 +96,6 @@ int checkout_fast_forward(struct repository *r,
 		clear_unpack_trees_porcelain(&opts);
 		return -1;
 	}
-	dir_clear(&dir);
 	clear_unpack_trees_porcelain(&opts);
 
 	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
diff --git a/reset.c b/reset.c
index 79310ae071b..41b3e2d88de 100644
--- a/reset.c
+++ b/reset.c
@@ -56,6 +56,8 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
+	/* FIXME: Default should be to remove ignored files */
+	unpack_tree_opts.preserve_ignored = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (!detach_head)
 		unpack_tree_opts.reset = 1;
diff --git a/sequencer.c b/sequencer.c
index 614d56f5e21..098566c68d9 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3699,6 +3699,8 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.fn = oneway_merge;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
+	/* FIXME: Default should be to remove ignored files */
+	unpack_tree_opts.preserve_ignored = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..1e4eae1dc7d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1707,6 +1707,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		ensure_full_index(o->dst_index);
 	}
 
+	if (!o->preserve_ignored) {
+		CALLOC_ARRAY(o->dir, 1);
+		o->dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(o->dir);
+	}
+
 	if (!core_apply_sparse_checkout || !o->update)
 		o->skip_sparse_checkout = 1;
 	if (!o->skip_sparse_checkout && !o->pl) {
@@ -1868,6 +1874,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 done:
 	if (free_pattern_list)
 		clear_pattern_list(&pl);
+	if (o->dir) {
+		dir_clear(o->dir);
+		FREE_AND_NULL(o->dir);
+	}
 	trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
 	trace_performance_leave("unpack_trees");
 	return ret;
diff --git a/unpack-trees.h b/unpack-trees.h
index 2d88b19dca7..f98cfd49d7b 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -49,6 +49,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     preserve_ignored,
 		     clone,
 		     index_only,
 		     nontrivial_merge,
-- 
gitgitgadget



* [PATCH v3 05/11] unpack-trees: make dir an internal-only struct
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 06/11] Remove ignored files by default when they are in the way Elijah Newren via GitGitGadget
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Avoid accidental misuse or confusion over ownership by clearly making
unpack_trees_options.dir an internal-only variable.
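
As a rough illustration of the ownership rule, the sketch below uses
hypothetical stand-in types (mock_dir, mock_unpack_opts, mock_unpack),
not git's real ones.  It assumes a guard that returns -1 where git
itself would call BUG(), so that the behavior can be observed without
aborting:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not git's real definitions. */
struct mock_dir {
	int show_ignored;
};

struct mock_unpack_opts {
	unsigned int preserve_ignored;
	struct mock_dir *dir; /* for internal use only */
};

/*
 * Returns 0 on success, or -1 if the caller pre-populated o->dir.
 * (Assumption of this sketch: git reports that case with BUG() instead.)
 */
static int mock_unpack(struct mock_unpack_opts *o)
{
	struct mock_dir dir = { 0 }; /* lives on the callee's stack */

	assert(o);
	if (o->dir)
		return -1;

	if (!o->preserve_ignored) {
		o->dir = &dir;
		o->dir->show_ignored = 1;
	}

	/* ... the actual unpacking work would happen here ... */

	/* Never leave a pointer to this stack frame in the caller's struct. */
	o->dir = NULL;
	return 0;
}
```

Because the storage is on the callee's stack, there is nothing for a
caller to allocate or free, and resetting the pointer before returning
keeps the options struct reusable for a later call.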

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 unpack-trees.c | 7 +++++--
 unpack-trees.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1e4eae1dc7d..e067cce0fcd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1694,9 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
+	struct dir_struct dir = DIR_INIT;
 
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
+	if (o->dir)
+		BUG("o->dir is for internal use only");
 
 	trace_performance_enter();
 	trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
@@ -1708,7 +1711,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	}
 
 	if (!o->preserve_ignored) {
-		CALLOC_ARRAY(o->dir, 1);
+		o->dir = &dir;
 		o->dir->flags |= DIR_SHOW_IGNORED;
 		setup_standard_excludes(o->dir);
 	}
@@ -1876,7 +1879,7 @@ done:
 		clear_pattern_list(&pl);
 	if (o->dir) {
 		dir_clear(o->dir);
-		FREE_AND_NULL(o->dir);
+		o->dir = NULL;
 	}
 	trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
 	trace_performance_leave("unpack_trees");
diff --git a/unpack-trees.h b/unpack-trees.h
index f98cfd49d7b..61da25dafee 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -67,7 +67,6 @@ struct unpack_trees_options {
 		     dry_run;
 	const char *prefix;
 	int cache_bottom;
-	struct dir_struct *dir;
 	struct pathspec *pathspec;
 	merge_fn_t fn;
 	const char *msgs[NB_UNPACK_TREES_WARNING_TYPES];
@@ -89,6 +88,7 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct pattern_list *pl; /* for internal use */
+	struct dir_struct *dir; /* for internal use only */
 	struct checkout_metadata meta;
 };
 
-- 
gitgitgadget



* [PATCH v3 06/11] Remove ignored files by default when they are in the way
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 05/11] unpack-trees: make dir an internal-only struct Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 07/11] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Change several commands to remove ignored files by default when they are
in the way.  Some commands (checkout, merge) already take a
--no-overwrite-ignore option that lets the user configure this.  It may
make sense to add that option to more commands and, in the case of
merge, to plumb it through to more of the backends than just the
fast-forwarding special case.  Add short comments marking where such
flags would be used.

Incidentally, this fixes a test failure in t7112.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/am.c               | 3 +--
 builtin/clone.c            | 3 +--
 builtin/merge.c            | 3 +--
 builtin/reset.c            | 3 +--
 builtin/stash.c            | 3 +--
 merge-ort.c                | 2 +-
 reset.c                    | 3 +--
 sequencer.c                | 3 +--
 t/t7112-reset-submodule.sh | 1 -
 9 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 1ee70692bc3..57738eff0c5 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1919,8 +1919,7 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.merge = 1;
 	opts.reset = reset;
 	if (!reset)
-		/* FIXME: Default should be to remove ignored files */
-		opts.preserve_ignored = 1;
+		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
diff --git a/builtin/clone.c b/builtin/clone.c
index be1c3840d62..11ec6c5f2c8 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -687,8 +687,7 @@ static int checkout(int submodule_progress)
 	opts.update = 1;
 	opts.merge = 1;
 	opts.clone = 1;
-	/* FIXME: Default should be to remove ignored files */
-	opts.preserve_ignored = 1;
+	opts.preserve_ignored = 0;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/builtin/merge.c b/builtin/merge.c
index 1e5fff095fc..0ccd5e1ac83 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -680,8 +680,7 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	opts.verbose_update = 1;
 	opts.trivial_merges_only = 1;
 	opts.merge = 1;
-	/* FIXME: Default should be to remove ignored files */
-	opts.preserve_ignored = 1;
+	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	trees[nr_trees] = parse_tree_indirect(common);
 	if (!trees[nr_trees++])
 		return -1;
diff --git a/builtin/reset.c b/builtin/reset.c
index 7f38656f018..5df01cc42e0 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -67,8 +67,7 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case KEEP:
 	case MERGE:
 		opts.update = 1;
-		/* FIXME: Default should be to remove ignored files */
-		opts.preserve_ignored = 1;
+		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 		break;
 	case HARD:
 		opts.update = 1;
diff --git a/builtin/stash.c b/builtin/stash.c
index 88287b890d5..d60cdaf32f5 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -259,8 +259,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.reset = reset;
 	opts.update = update;
 	if (update && !reset)
-		/* FIXME: Default should be to remove ignored files */
-		opts.preserve_ignored = 1;
+		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = oneway_merge;
 
 	if (unpack_trees(nr_trees, t, &opts))
diff --git a/merge-ort.c b/merge-ort.c
index 0d64ec716bd..04596b5e7b3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4045,7 +4045,7 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
-	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
+	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
 	parse_tree(prev);
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
 	parse_tree(next);
diff --git a/reset.c b/reset.c
index 41b3e2d88de..f40a8ecf663 100644
--- a/reset.c
+++ b/reset.c
@@ -56,8 +56,7 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
-	/* FIXME: Default should be to remove ignored files */
-	unpack_tree_opts.preserve_ignored = 1;
+	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (!detach_head)
 		unpack_tree_opts.reset = 1;
diff --git a/sequencer.c b/sequencer.c
index 098566c68d9..6872b7b00a4 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3699,8 +3699,7 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.fn = oneway_merge;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
-	/* FIXME: Default should be to remove ignored files */
-	unpack_tree_opts.preserve_ignored = 1;
+	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
diff --git a/t/t7112-reset-submodule.sh b/t/t7112-reset-submodule.sh
index 19830d90365..a3e2413bc33 100755
--- a/t/t7112-reset-submodule.sh
+++ b/t/t7112-reset-submodule.sh
@@ -6,7 +6,6 @@ test_description='reset can handle submodules'
 . "$TEST_DIRECTORY"/lib-submodule-update.sh
 
 KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
-KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
 
 test_submodule_switch_recursing_with_args "reset --keep"
 
-- 
gitgitgadget



* [PATCH v3 07/11] Change unpack_trees' 'reset' flag into an enum
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 06/11] Remove ignored files by default when they are in the way Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 08/11] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
                       ` (7 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way.  This was used by
`git read-tree --reset`, but then started appearing in other places as
well.  However, many of the other uses should not be deleting untracked
files in the way.  Change this value to an enum so that a value of 1
(i.e. "true") can be split into two:
   UNPACK_RESET_PROTECT_UNTRACKED,
   UNPACK_RESET_OVERWRITE_UNTRACKED
In order to catch accidental misuses (i.e. where folks call it the way
they traditionally used to), define the special enum value of
   UNPACK_RESET_INVALID = 1
which will trigger a BUG().
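
A small sketch of how such an enum can catch legacy callers.  The three
UNPACK_RESET_* names above come from this commit message; the name
UNPACK_RESET_NONE, the helper may_delete_untracked(), and the use of a
-1 return in place of BUG() are assumptions made for illustration only:

```c
#include <assert.h>

enum unpack_trees_reset_type {
	UNPACK_RESET_NONE = 0,    /* assumed name for the old "false" value */
	UNPACK_RESET_INVALID = 1, /* the old "true" value; no longer allowed */
	UNPACK_RESET_PROTECT_UNTRACKED,
	UNPACK_RESET_OVERWRITE_UNTRACKED
};

/* A legacy "reset = 1" must land exactly on the invalid value. */
static_assert(UNPACK_RESET_INVALID == 1,
	      "legacy boolean true must map to the invalid value");

/*
 * Hypothetical helper: returns 1 if untracked files in the way may be
 * deleted, 0 if they must be kept, and -1 for the invalid legacy value
 * (where git itself would trigger BUG()).
 */
static int may_delete_untracked(enum unpack_trees_reset_type reset)
{
	switch (reset) {
	case UNPACK_RESET_OVERWRITE_UNTRACKED:
		return 1;
	case UNPACK_RESET_NONE:
	case UNPACK_RESET_PROTECT_UNTRACKED:
		return 0;
	case UNPACK_RESET_INVALID:
	default:
		return -1;
	}
}
```

Reserving the value 1 means that any caller still written as
"opts.reset = 1" is detected at run time instead of silently picking
one of the two new behaviors.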

Modify existing callers so that
   read-tree --reset
   reset --hard
   checkout --force
continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other
callers, including
   am
   checkout without --force
   stash  (though currently dead code; reset always had a value of 0)
   numerous callers from rebase/sequencer to reset_head()
will use the new UNPACK_RESET_PROTECT_UNTRACKED value.

Also, note that it has been reported that 'git checkout <treeish>
<pathspec>' currently also allows overwriting untracked files[1].  That
case should also be fixed, but it does not use unpack_trees() and thus
is outside the scope of the current changes.

[1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/am.c                     |  5 ++---
 builtin/checkout.c               |  5 +++--
 builtin/read-tree.c              |  3 +++
 builtin/reset.c                  |  9 +++++++--
 builtin/stash.c                  |  4 ++--
 reset.c                          |  2 +-
 t/t2500-untracked-overwriting.sh |  6 +++---
 unpack-trees.c                   | 10 +++++++++-
 unpack-trees.h                   | 11 +++++++++--
 9 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 57738eff0c5..f296226e95f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1917,9 +1917,8 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.dst_index = &the_index;
 	opts.update = 1;
 	opts.merge = 1;
-	opts.reset = reset;
-	if (!reset)
-		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
+	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5e7957dd068..cbf73b8c9f6 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -646,9 +646,10 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	opts.head_idx = -1;
 	opts.update = worktree;
 	opts.skip_unmerged = !worktree;
-	opts.reset = 1;
+	opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
+				UNPACK_RESET_PROTECT_UNTRACKED;
+	opts.preserve_ignored = (!o->force && !o->overwrite_ignore);
 	opts.merge = 1;
-	opts.preserve_ignored = 0;
 	opts.fn = oneway_merge;
 	opts.verbose_update = o->show_progress;
 	opts.src_index = &the_index;
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 443d206eca6..2109c4c9e5c 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -166,6 +166,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (1 < opts.merge + opts.reset + prefix_set)
 		die("Which one? -m, --reset, or --prefix?");
 
+	if (opts.reset)
+		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+
 	/*
 	 * NEEDSWORK
 	 *
diff --git a/builtin/reset.c b/builtin/reset.c
index 5df01cc42e0..73935953494 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -71,9 +71,14 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 		break;
 	case HARD:
 		opts.update = 1;
-		/* fallthrough */
+		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+		break;
+	case MIXED:
+		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		/* but opts.update=0, so working tree not updated */
+		break;
 	default:
-		opts.reset = 1;
+		BUG("invalid reset_type passed to reset_index");
 	}
 
 	read_cache_unmerged();
diff --git a/builtin/stash.c b/builtin/stash.c
index d60cdaf32f5..0e3662a230c 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -256,9 +256,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
 	opts.merge = 1;
-	opts.reset = reset;
+	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
 	opts.update = update;
-	if (update && !reset)
+	if (update)
 		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = oneway_merge;
 
diff --git a/reset.c b/reset.c
index f40a8ecf663..f214df3d96c 100644
--- a/reset.c
+++ b/reset.c
@@ -59,7 +59,7 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (!detach_head)
-		unpack_tree_opts.reset = 1;
+		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
 
 	if (repo_read_index_unmerged(r) < 0) {
 		ret = error(_("could not read index"));
diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 2412d121ea8..18604360df8 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -92,7 +92,7 @@ test_setup_checkout_m () {
 	)
 }
 
-test_expect_failure 'checkout -m does not nuke untracked file' '
+test_expect_success 'checkout -m does not nuke untracked file' '
 	test_setup_checkout_m &&
 	(
 		cd checkout &&
@@ -138,7 +138,7 @@ test_setup_sequencing () {
 	)
 }
 
-test_expect_failure 'git rebase --abort and untracked files' '
+test_expect_success 'git rebase --abort and untracked files' '
 	test_setup_sequencing rebase_abort_and_untracked &&
 	(
 		cd sequencing_rebase_abort_and_untracked &&
@@ -155,7 +155,7 @@ test_expect_failure 'git rebase --abort and untracked files' '
 	)
 '
 
-test_expect_failure 'git rebase fast forwarding and untracked files' '
+test_expect_success 'git rebase fast forwarding and untracked files' '
 	test_setup_sequencing rebase_fast_forward_and_untracked &&
 	(
 		cd sequencing_rebase_fast_forward_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index e067cce0fcd..812e4c66713 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1696,6 +1696,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	int free_pattern_list = 0;
 	struct dir_struct dir = DIR_INIT;
 
+	if (o->reset == UNPACK_RESET_INVALID)
+		BUG("o->reset had a value of 1; should be UNPACK_RESET_*_UNTRACKED");
+
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 	if (o->dir)
@@ -1710,6 +1713,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		ensure_full_index(o->dst_index);
 	}
 
+	if (o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED &&
+	    o->preserve_ignored)
+		BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
+
 	if (!o->preserve_ignored) {
 		o->dir = &dir;
 		o->dir->flags |= DIR_SHOW_IGNORED;
@@ -2233,7 +2240,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 	int len;
 	struct stat st;
 
-	if (o->index_only || o->reset || !o->update)
+	if (o->index_only || !o->update ||
+	    o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED)
 		return 0;
 
 	len = check_leading_path(ce->name, ce_namelen(ce), 0);
diff --git a/unpack-trees.h b/unpack-trees.h
index 61da25dafee..71ffb7eeb0c 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -45,9 +45,15 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
  */
 void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
 
+enum unpack_trees_reset_type {
+	UNPACK_RESET_NONE = 0,    /* traditional "false" value; still valid */
+	UNPACK_RESET_INVALID = 1, /* "true" no longer valid; use below values */
+	UNPACK_RESET_PROTECT_UNTRACKED,
+	UNPACK_RESET_OVERWRITE_UNTRACKED
+};
+
 struct unpack_trees_options {
-	unsigned int reset,
-		     merge,
+	unsigned int merge,
 		     update,
 		     preserve_ignored,
 		     clone,
@@ -65,6 +71,7 @@ struct unpack_trees_options {
 		     exiting_early,
 		     show_all_errors,
 		     dry_run;
+	enum unpack_trees_reset_type reset;
 	const char *prefix;
 	int cache_bottom;
 	struct pathspec *pathspec;
-- 
gitgitgadget



* [PATCH v3 08/11] unpack-trees: avoid nuking untracked dir in way of unmerged file
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 07/11] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 09/11] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh |  2 +-
 unpack-trees.c                   | 35 ++++++++++++++++++++++++++++----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 18604360df8..5ec66058cfc 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -197,7 +197,7 @@ test_expect_failure 'git stash and untracked files' '
 	)
 '
 
-test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	test_setup_sequencing am_abort_and_untracked &&
 	(
 		cd sequencing_am_abort_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 812e4c66713..080118f2325 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2178,9 +2178,15 @@ static int icase_exists(struct unpack_trees_options *o, const char *name, int le
 	return src && !ie_match_stat(o->src_index, src, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
 }
 
+enum absent_checking_type {
+	COMPLETELY_ABSENT,
+	ABSENT_ANY_DIRECTORY
+};
+
 static int check_ok_to_remove(const char *name, int len, int dtype,
 			      const struct cache_entry *ce, struct stat *st,
 			      enum unpack_trees_error_types error_type,
+			      enum absent_checking_type absent_type,
 			      struct unpack_trees_options *o)
 {
 	const struct cache_entry *result;
@@ -2215,6 +2221,10 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 		return 0;
 	}
 
+	/* If we only care about directories, then we can remove */
+	if (absent_type == ABSENT_ANY_DIRECTORY)
+		return 0;
+
 	/*
 	 * The previous round may already have decided to
 	 * delete this path, which is in a subdirectory that
@@ -2235,6 +2245,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
  */
 static int verify_absent_1(const struct cache_entry *ce,
 			   enum unpack_trees_error_types error_type,
+			   enum absent_checking_type absent_type,
 			   struct unpack_trees_options *o)
 {
 	int len;
@@ -2261,7 +2272,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 								NULL, o);
 			else
 				ret = check_ok_to_remove(path, len, DT_UNKNOWN, NULL,
-							 &st, error_type, o);
+							 &st, error_type,
+							 absent_type, o);
 		}
 		free(path);
 		return ret;
@@ -2276,7 +2288,7 @@ static int verify_absent_1(const struct cache_entry *ce,
 
 		return check_ok_to_remove(ce->name, ce_namelen(ce),
 					  ce_to_dtype(ce), ce, &st,
-					  error_type, o);
+					  error_type, absent_type, o);
 	}
 }
 
@@ -2286,14 +2298,23 @@ static int verify_absent(const struct cache_entry *ce,
 {
 	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
+}
+
+static int verify_absent_if_directory(const struct cache_entry *ce,
+				      enum unpack_trees_error_types error_type,
+				      struct unpack_trees_options *o)
+{
+	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+		return 0;
+	return verify_absent_1(ce, error_type, ABSENT_ANY_DIRECTORY, o);
 }
 
 static int verify_absent_sparse(const struct cache_entry *ce,
 				enum unpack_trees_error_types error_type,
 				struct unpack_trees_options *o)
 {
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
 }
 
 static int merged_entry(const struct cache_entry *ce,
@@ -2367,6 +2388,12 @@ static int merged_entry(const struct cache_entry *ce,
 		 * Previously unmerged entry left as an existence
 		 * marker by read_index_unmerged();
 		 */
+		if (verify_absent_if_directory(merge,
+				  ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o)) {
+			discard_cache_entry(merge);
+			return -1;
+		}
+
 		invalidate_ce_path(old, o);
 	}
 
-- 
gitgitgadget



* [PATCH v3 09/11] unpack-trees: avoid nuking untracked dir in way of locally deleted file
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 08/11] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 10/11] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
                       ` (5 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 2 +-
 unpack-trees.c                   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 5ec66058cfc..5c0bf4d21fc 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -218,7 +218,7 @@ test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	)
 '
 
-test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+test_expect_success 'git am --skip and untracked dir vs deleted file' '
 	test_setup_sequencing am_skip_and_untracked &&
 	(
 		cd sequencing_am_skip_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 080118f2325..a7e1712d236 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2411,7 +2411,10 @@ static int deleted_entry(const struct cache_entry *ce,
 		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
 			return -1;
 		return 0;
+	} else if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o)) {
+		return -1;
 	}
+
 	if (!(old->ce_flags & CE_CONFLICTED) && verify_uptodate(old, o))
 		return -1;
 	add_entry(o, ce, CE_REMOVE, 0);
-- 
gitgitgadget



* [PATCH v3 10/11] Comment important codepaths regarding nuking untracked files/dirs
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 09/11] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 16:33     ` [PATCH v3 11/11] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, some toplevel commands are expected to delete
untracked files, such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other high-level
commands that happen to call them were written with that in mind, or
that they convey to users the possibility that untracked files could be
removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Also, the reset --hard in builtin/worktree.c looks safe, due to only
running in an empty directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/stash.c             | 1 +
 builtin/submodule--helper.c | 4 ++++
 contrib/rerere-train.sh     | 2 +-
 submodule.c                 | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 0e3662a230c..aa31163a5a1 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1521,6 +1521,7 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		} else {
 			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.git_cmd = 1;
+			/* BUG: this nukes untracked files in the way */
 			strvec_pushl(&cp.args, "reset", "--hard", "-q",
 				     "--no-recurse-submodules", NULL);
 			if (run_command(&cp)) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 5336daf186d..e13f2a0bcd0 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -3090,6 +3090,10 @@ static int add_submodule(const struct add_data *add_data)
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.dir = add_data->sm_path;
+		/*
+		 * NOTE: we only get here if add_data->force is true, so
+		 * passing --force to checkout is reasonable.
+		 */
 		strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL);
 
 		if (add_data->branch) {
diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index eeee45dd341..75125d6ae00 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -91,7 +91,7 @@ do
 		git checkout -q $commit -- .
 		git rerere
 	fi
-	git reset -q --hard
+	git reset -q --hard  # Might nuke untracked files...
 done
 
 if test -z "$branch"
diff --git a/submodule.c b/submodule.c
index 78aed03d928..c8ba93cc708 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1908,6 +1908,7 @@ static void submodule_reset_index(const char *path)
 
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     get_super_prefix_or_empty(), path);
+	/* TODO: determine if this might overwrite untracked files */
 	strvec_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
 
 	strvec_push(&cp.args, empty_tree_oid_hex());
-- 
gitgitgadget



* [PATCH v3 11/11] Documentation: call out commands that nuke untracked files/directories
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (9 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 10/11] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
@ 2021-09-27 16:33     ` Elijah Newren via GitGitGadget
  2021-09-27 20:36     ` [PATCH v3 00/11] Fix various issues around removal of " Junio C Hamano
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-09-27 16:33 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Some commands have traditionally also removed untracked files (or
directories) that were in the way of a tracked file we needed.  Document
these cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-checkout.txt  | 5 +++--
 Documentation/git-read-tree.txt | 5 +++--
 Documentation/git-reset.txt     | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index b1a6fe44997..d473c9bf387 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -118,8 +118,9 @@ OPTIONS
 -f::
 --force::
 	When switching branches, proceed even if the index or the
-	working tree differs from `HEAD`.  This is used to throw away
-	local changes.
+	working tree differs from `HEAD`, and even if there are untracked
+	files in the way.  This is used to throw away local changes and
+	any untracked files or directories that are in the way.
 +
 When checking out paths from the index, do not fail upon unmerged
 entries; instead, unmerged entries are ignored.
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 0222a27c5af..8c3aceb8324 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -38,8 +38,9 @@ OPTIONS
 
 --reset::
 	Same as -m, except that unmerged entries are discarded instead
-	of failing. When used with `-u`, updates leading to loss of
-	working tree changes will not abort the operation.
+	of failing.  When used with `-u`, updates leading to loss of
+	working tree changes or untracked files or directories will not
+	abort the operation.
 
 -u::
 	After a successful merge, update the files in the work
diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index 252e2d4e47d..6f7685f53d5 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -69,7 +69,8 @@ linkgit:git-add[1]).
 
 --hard::
 	Resets the index and working tree. Any changes to tracked files in the
-	working tree since `<commit>` are discarded.
+	working tree since `<commit>` are discarded.  Any untracked files or
+	directories in the way of writing any tracked files are simply deleted.
 
 --merge::
 	Resets the index and updates the files in the working tree that are
-- 
gitgitgadget


* Re: [PATCH v3 00/11] Fix various issues around removal of untracked files/directories
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (10 preceding siblings ...)
  2021-09-27 16:33     ` [PATCH v3 11/11] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
@ 2021-09-27 20:36     ` Junio C Hamano
  2021-09-27 20:41       ` Elijah Newren
  2021-09-30 14:00     ` Phillip Wood
                       ` (2 subsequent siblings)
  14 siblings, 1 reply; 82+ messages in thread
From: Junio C Hamano @ 2021-09-27 20:36 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Elijah Newren, Eric Sunshine

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Changes since v2 (all due to Junio's request to consolidate
> unpack_trees_options.dir handling):

Heh, don't blame me.  I even explicitly said it was merely an
observation for longer term, not a suggestion to include the first
step for such a move in this series.


* Re: [PATCH v3 00/11] Fix various issues around removal of untracked files/directories
  2021-09-27 20:36     ` [PATCH v3 00/11] Fix various issues around removal of " Junio C Hamano
@ 2021-09-27 20:41       ` Elijah Newren
  2021-09-27 21:31         ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-09-27 20:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine

On Mon, Sep 27, 2021 at 1:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Changes since v2 (all due to Junio's request to consolidate
> > unpack_trees_options.dir handling):
>
> Heh, don't blame me.  I even explicitly said it was merely an
> observation for longer term, not a suggestion to include the first
> step for such a move in this series.

Well...the repetitive code for setting up and clearing out
unpack_trees_options.dir that already existed (and which my series was
copying to more places) bugged me too, but I was worried that it was a
bit messy to clean up (and the fact that it took five patches suggests
it was).  But then you also brought it up as an issue when reviewing,
so I figured I might as well dive in...


* Re: [PATCH v3 00/11] Fix various issues around removal of untracked files/directories
  2021-09-27 20:41       ` Elijah Newren
@ 2021-09-27 21:31         ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-09-27 21:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine

On Mon, Sep 27, 2021 at 1:41 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Sep 27, 2021 at 1:36 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > Changes since v2 (all due to Junio's request to consolidate
> > > unpack_trees_options.dir handling):
> >
> > Heh, don't blame me.  I even explicitly said it was merely an
> > observation for longer term, not a suggestion to include the first
> > step for such a move in this series.
>
> Well...the repetitive code for setting up and clearing out
> unpack_trees_options.dir that already existed (and which my series was
> copying to more places) bugged me too, but I was worried that it was a
> bit messy to clean up (and the fact that it took five patches suggests
> it was).  But then you also brought it up as an issue when reviewing,
> so I figured I might as well dive in...

I guess I should add that some of your other review comments were
related, e.g. your puzzlement/assumption that some of my changes
preserved ignored files when untracked files were being overwritten
(which was not what the patches actually did).  Trying to make the
code clearer was in some ways easier by first consolidating all those
other bits.


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-27 16:33     ` [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options Elijah Newren via GitGitGadget
@ 2021-09-29  9:22       ` Ævar Arnfjörð Bjarmason
  2021-09-29 15:35         ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-29  9:22 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Fedor Biryukov, Philip Oakley, Phillip Wood, Eric Sunshine,
	Elijah Newren


On Mon, Sep 27 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Currently, every caller of unpack_trees() that wants to ensure ignored
> files are overwritten by default needs to:
>    * allocate unpack_trees_options.dir
>    * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
>    * call setup_standard_excludes
> AND then after the call to unpack_trees() needs to
>    * call dir_clear()
>    * deallocate unpack_trees_options.dir
> That's a fair amount of boilerplate, and every caller uses identical
> code.  Make this easier by instead introducing a new boolean value where
> the default value (0) does what we want so that new callers of
> unpack_trees() automatically get the appropriate behavior.  And move all
> the handling of unpack_trees_options.dir into unpack_trees() itself.
>
> While preserve_ignored = 0 is the behavior we feel is the appropriate
> default, we defer fixing commands to use the appropriate default until a
> later commit.  So, this commit introduces several locations where we
> manually set preserve_ignored=1.  This makes it clear where code paths
> were previously preserving ignored files when they should not have been;
> a future commit will flip these to instead use a value of 0 to get the
> behavior we want.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  builtin/am.c        |  3 +++
>  builtin/checkout.c  | 11 ++---------
>  builtin/clone.c     |  2 ++
>  builtin/merge.c     |  2 ++
>  builtin/read-tree.c | 13 +++----------
>  builtin/reset.c     |  2 ++
>  builtin/stash.c     |  3 +++
>  merge-ort.c         |  8 +-------
>  merge-recursive.c   |  8 +-------
>  merge.c             |  8 +-------
>  reset.c             |  2 ++
>  sequencer.c         |  2 ++
>  unpack-trees.c      | 10 ++++++++++
>  unpack-trees.h      |  1 +
>  14 files changed, 35 insertions(+), 40 deletions(-)
>
> diff --git a/builtin/am.c b/builtin/am.c
> index e4a0ff9cd7c..1ee70692bc3 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1918,6 +1918,9 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
>  	opts.update = 1;
>  	opts.merge = 1;
>  	opts.reset = reset;
> +	if (!reset)
> +		/* FIXME: Default should be to remove ignored files */
> +		opts.preserve_ignored = 1;
>  	opts.fn = twoway_merge;
>  	init_tree_desc(&t[0], head->buffer, head->size);
>  	init_tree_desc(&t[1], remote->buffer, remote->size);
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 5335435d616..5e7957dd068 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -648,6 +648,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
>  	opts.skip_unmerged = !worktree;
>  	opts.reset = 1;
>  	opts.merge = 1;
> +	opts.preserve_ignored = 0;
>  	opts.fn = oneway_merge;
>  	opts.verbose_update = o->show_progress;
>  	opts.src_index = &the_index;
> @@ -746,11 +747,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
>  				       new_branch_info->commit ?
>  				       &new_branch_info->commit->object.oid :
>  				       &new_branch_info->oid, NULL);
> -		if (opts->overwrite_ignore) {
> -			topts.dir = xcalloc(1, sizeof(*topts.dir));
> -			topts.dir->flags |= DIR_SHOW_IGNORED;
> -			setup_standard_excludes(topts.dir);
> -		}
> +		topts.preserve_ignored = !opts->overwrite_ignore;
>  		tree = parse_tree_indirect(old_branch_info->commit ?
>  					   &old_branch_info->commit->object.oid :
>  					   the_hash_algo->empty_tree);
> @@ -760,10 +757,6 @@ static int merge_working_tree(const struct checkout_opts *opts,
>  		init_tree_desc(&trees[1], tree->buffer, tree->size);
>  
>  		ret = unpack_trees(2, trees, &topts);
> -		if (topts.dir) {
> -			dir_clear(topts.dir);
> -			FREE_AND_NULL(topts.dir);
> -		}
>  		clear_unpack_trees_porcelain(&topts);
>  		if (ret == -1) {
>  			/*
> diff --git a/builtin/clone.c b/builtin/clone.c
> index ff1d3d447a3..be1c3840d62 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -687,6 +687,8 @@ static int checkout(int submodule_progress)
>  	opts.update = 1;
>  	opts.merge = 1;
>  	opts.clone = 1;
> +	/* FIXME: Default should be to remove ignored files */
> +	opts.preserve_ignored = 1;
>  	opts.fn = oneway_merge;
>  	opts.verbose_update = (option_verbosity >= 0);
>  	opts.src_index = &the_index;
> diff --git a/builtin/merge.c b/builtin/merge.c
> index 3fbdacc7db4..1e5fff095fc 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -680,6 +680,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
>  	opts.verbose_update = 1;
>  	opts.trivial_merges_only = 1;
>  	opts.merge = 1;
> +	/* FIXME: Default should be to remove ignored files */
> +	opts.preserve_ignored = 1;
>  	trees[nr_trees] = parse_tree_indirect(common);
>  	if (!trees[nr_trees++])
>  		return -1;
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index 73cb957a69b..443d206eca6 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -201,11 +201,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>  	if ((opts.update || opts.index_only) && !opts.merge)
>  		die("%s is meaningless without -m, --reset, or --prefix",
>  		    opts.update ? "-u" : "-i");
> -	if (opts.update && !opts.reset) {
> -		CALLOC_ARRAY(opts.dir, 1);
> -		opts.dir->flags |= DIR_SHOW_IGNORED;
> -		setup_standard_excludes(opts.dir);
> -	}
> +	if (opts.update && !opts.reset)
> +		opts.preserve_ignored = 0;
> +	/* otherwise, opts.preserve_ignored is irrelevant */
>  	if (opts.merge && !opts.index_only)
>  		setup_work_tree();
>  
> @@ -245,11 +243,6 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>  	if (unpack_trees(nr_trees, t, &opts))
>  		return 128;
>  
> -	if (opts.dir) {
> -		dir_clear(opts.dir);
> -		FREE_AND_NULL(opts.dir);
> -	}
> -
>  	if (opts.debug_unpack || opts.dry_run)
>  		return 0; /* do not write the index out */
>  
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 51c9e2f43ff..7f38656f018 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -67,6 +67,8 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>  	case KEEP:
>  	case MERGE:
>  		opts.update = 1;
> +		/* FIXME: Default should be to remove ignored files */
> +		opts.preserve_ignored = 1;
>  		break;
>  	case HARD:
>  		opts.update = 1;
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 8f42360ca91..88287b890d5 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -258,6 +258,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>  	opts.merge = 1;
>  	opts.reset = reset;
>  	opts.update = update;
> +	if (update && !reset)
> +		/* FIXME: Default should be to remove ignored files */
> +		opts.preserve_ignored = 1;
>  	opts.fn = oneway_merge;
>  
>  	if (unpack_trees(nr_trees, t, &opts))
> diff --git a/merge-ort.c b/merge-ort.c
> index 35aa979c3a4..0d64ec716bd 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -4045,11 +4045,7 @@ static int checkout(struct merge_options *opt,
>  	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
>  	unpack_opts.verbose_update = (opt->verbosity > 2);
>  	unpack_opts.fn = twoway_merge;
> -	if (1/* FIXME: opts->overwrite_ignore*/) {
> -		CALLOC_ARRAY(unpack_opts.dir, 1);
> -		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
> -		setup_standard_excludes(unpack_opts.dir);
> -	}
> +	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
>  	parse_tree(prev);
>  	init_tree_desc(&trees[0], prev->buffer, prev->size);
>  	parse_tree(next);
> @@ -4057,8 +4053,6 @@ static int checkout(struct merge_options *opt,
>  
>  	ret = unpack_trees(2, trees, &unpack_opts);
>  	clear_unpack_trees_porcelain(&unpack_opts);
> -	dir_clear(unpack_opts.dir);
> -	FREE_AND_NULL(unpack_opts.dir);
>  	return ret;
>  }
>  
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 233d9f686ad..2be3f5d4044 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -411,9 +411,7 @@ static int unpack_trees_start(struct merge_options *opt,
>  	else {
>  		opt->priv->unpack_opts.update = 1;
>  		/* FIXME: should only do this if !overwrite_ignore */
> -		CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
> -		opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
> -		setup_standard_excludes(opt->priv->unpack_opts.dir);
> +		opt->priv->unpack_opts.preserve_ignored = 0;
>  	}
>  	opt->priv->unpack_opts.merge = 1;
>  	opt->priv->unpack_opts.head_idx = 2;
> @@ -428,10 +426,6 @@ static int unpack_trees_start(struct merge_options *opt,
>  	init_tree_desc_from_tree(t+2, merge);
>  
>  	rc = unpack_trees(3, t, &opt->priv->unpack_opts);
> -	if (opt->priv->unpack_opts.dir) {
> -		dir_clear(opt->priv->unpack_opts.dir);
> -		FREE_AND_NULL(opt->priv->unpack_opts.dir);
> -	}
>  	cache_tree_free(&opt->repo->index->cache_tree);
>  
>  	/*
> diff --git a/merge.c b/merge.c
> index 6e736881d90..2382ff66d35 100644
> --- a/merge.c
> +++ b/merge.c
> @@ -53,7 +53,6 @@ int checkout_fast_forward(struct repository *r,
>  	struct unpack_trees_options opts;
>  	struct tree_desc t[MAX_UNPACK_TREES];
>  	int i, nr_trees = 0;
> -	struct dir_struct dir = DIR_INIT;
>  	struct lock_file lock_file = LOCK_INIT;
>  
>  	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
> @@ -80,11 +79,7 @@ int checkout_fast_forward(struct repository *r,
>  	}
>  
>  	memset(&opts, 0, sizeof(opts));
> -	if (overwrite_ignore) {
> -		dir.flags |= DIR_SHOW_IGNORED;
> -		setup_standard_excludes(&dir);
> -		opts.dir = &dir;
> -	}
> +	opts.preserve_ignored = !overwrite_ignore;
>  
>  	opts.head_idx = 1;
>  	opts.src_index = r->index;
> @@ -101,7 +96,6 @@ int checkout_fast_forward(struct repository *r,
>  		clear_unpack_trees_porcelain(&opts);
>  		return -1;
>  	}
> -	dir_clear(&dir);
>  	clear_unpack_trees_porcelain(&opts);
>  
>  	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
> diff --git a/reset.c b/reset.c
> index 79310ae071b..41b3e2d88de 100644
> --- a/reset.c
> +++ b/reset.c
> @@ -56,6 +56,8 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
>  	unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
>  	unpack_tree_opts.update = 1;
>  	unpack_tree_opts.merge = 1;
> +	/* FIXME: Default should be to remove ignored files */
> +	unpack_tree_opts.preserve_ignored = 1;
>  	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
>  	if (!detach_head)
>  		unpack_tree_opts.reset = 1;
> diff --git a/sequencer.c b/sequencer.c
> index 614d56f5e21..098566c68d9 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3699,6 +3699,8 @@ static int do_reset(struct repository *r,
>  	unpack_tree_opts.fn = oneway_merge;
>  	unpack_tree_opts.merge = 1;
>  	unpack_tree_opts.update = 1;
> +	/* FIXME: Default should be to remove ignored files */
> +	unpack_tree_opts.preserve_ignored = 1;
>  	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
>  
>  	if (repo_read_index_unmerged(r)) {
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 8ea0a542da8..1e4eae1dc7d 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1707,6 +1707,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  		ensure_full_index(o->dst_index);
>  	}
>  
> +	if (!o->preserve_ignored) {
> +		CALLOC_ARRAY(o->dir, 1);
> +		o->dir->flags |= DIR_SHOW_IGNORED;
> +		setup_standard_excludes(o->dir);
> +	}
> +
>  	if (!core_apply_sparse_checkout || !o->update)
>  		o->skip_sparse_checkout = 1;
>  	if (!o->skip_sparse_checkout && !o->pl) {
> @@ -1868,6 +1874,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  done:
>  	if (free_pattern_list)
>  		clear_pattern_list(&pl);
> +	if (o->dir) {
> +		dir_clear(o->dir);
> +		FREE_AND_NULL(o->dir);
> +	}
>  	trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>  	trace_performance_leave("unpack_trees");
>  	return ret;
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 2d88b19dca7..f98cfd49d7b 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -49,6 +49,7 @@ struct unpack_trees_options {
>  	unsigned int reset,
>  		     merge,
>  		     update,
> +		     preserve_ignored,
>  		     clone,
>  		     index_only,
>  		     nontrivial_merge,

I think getting rid of the boilerplate makes sense, but it doesn't sound
from the commit message like you've considered just making that "struct
dir*" member a "struct dir" instead.

That simplifies things a lot, i.e. we can just DIR_INIT it, and don't
need every caller to malloc/free it.

Sometimes a pointer makes sense, but in this case the "struct
unpack_trees_options" can just own it.
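
To make the ownership argument concrete, here is a minimal standalone
sketch of the "embed the struct, initialize it with a macro" pattern.
It is not git code: "struct demo_dir", "struct demo_opts", the
DEMO_* macros and the demo_*() helpers are hypothetical stand-ins for
"struct dir_struct", "struct unpack_trees_options", DIR_INIT and
friends, and the real structs have many more members.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's "struct dir_struct"; not the real layout. */
struct demo_dir {
	unsigned flags;
	char *exclude_per_dir; /* owned by the struct in this sketch */
};
#define DEMO_DIR_INIT { 0 }
#define DEMO_SHOW_IGNORED (1u << 0)

/* Hypothetical stand-in for "struct unpack_trees_options". */
struct demo_opts {
	unsigned update;
	struct demo_dir dir; /* embedded member: no malloc/free by callers */
};
#define DEMO_OPTS_INIT { .dir = DEMO_DIR_INIT }

/* Release whatever the embedded struct owns and reset it to zeroes. */
static void demo_dir_clear(struct demo_dir *dir)
{
	free(dir->exclude_per_dir);
	memset(dir, 0, sizeof(*dir));
}

/* One cleanup entry point for the options struct, which owns "dir". */
static void demo_opts_clear(struct demo_opts *opts)
{
	demo_dir_clear(&opts->dir);
}

/* With no pointer to test, "in use" is inferred from the flag instead. */
static int demo_overwrites_ignored(const struct demo_opts *opts)
{
	return !!(opts->dir.flags & DEMO_SHOW_IGNORED);
}
```

A caller would declare "struct demo_opts opts = DEMO_OPTS_INIT;", set
opts.dir.flags as needed, and call demo_opts_clear(&opts) once at the
end. The trade-off is that a NULL check is no longer available to tell
whether the member was configured, so some other field has to carry
that information.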

I'd already implemented that as part of some unsubmitted WIP leak fixes;
the patch follows below.

I think the part of it that deals with managing the "struct dir" is much
nicer, but you might still want to keep the "preserve_ignored" you've
added.

Oh, and I noticed that I removed the dir_clear() here but didn't add it
to clear_unpack_trees_porcelain(). That also needs to be done (I did it
in a later fix that I should squash in), but I can't be bothered to
re-do the diff below just for that, since the point is how we manage
the struct itself (the freeing is rather trivial...).

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8c69dcdf72a..632da036717 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -747,9 +747,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				       &new_branch_info->commit->object.oid :
 				       &new_branch_info->oid, NULL);
 		if (opts->overwrite_ignore) {
-			topts.dir = xcalloc(1, sizeof(*topts.dir));
-			topts.dir->flags |= DIR_SHOW_IGNORED;
-			setup_standard_excludes(topts.dir);
+			topts.dir.flags |= DIR_SHOW_IGNORED;
+			setup_standard_excludes(&topts.dir);
 		}
 		tree = parse_tree_indirect(old_branch_info->commit ?
 					   &old_branch_info->commit->object.oid :
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 485e7b04794..6d529c77c49 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -53,20 +53,17 @@ static int index_output_cb(const struct option *opt, const char *arg,
 static int exclude_per_directory_cb(const struct option *opt, const char *arg,
 				    int unset)
 {
-	struct dir_struct *dir;
 	struct unpack_trees_options *opts;
 
 	BUG_ON_OPT_NEG(unset);
 
 	opts = (struct unpack_trees_options *)opt->value;
 
-	if (opts->dir)
+	if (opts->dir.exclude_per_dir)
 		die("more than one --exclude-per-directory given.");
 
-	dir = xcalloc(1, sizeof(*opts->dir));
-	dir->flags |= DIR_SHOW_IGNORED;
-	dir->exclude_per_dir = arg;
-	opts->dir = dir;
+	opts->dir.flags |= DIR_SHOW_IGNORED;
+	opts->dir.exclude_per_dir = arg;
 	/* We do not need to nor want to do read-directory
 	 * here; we are merely interested in reusing the
 	 * per directory ignore stack mechanism.
@@ -209,7 +206,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if ((opts.update || opts.index_only) && !opts.merge)
 		die("%s is meaningless without -m, --reset, or --prefix",
 		    opts.update ? "-u" : "-i");
-	if ((opts.dir && !opts.update))
+	if ((opts.dir.exclude_per_dir && !opts.update))
 		die("--exclude-per-directory is meaningless unless -u");
 	if (opts.merge && !opts.index_only)
 		setup_work_tree();
diff --git a/merge-ort.c b/merge-ort.c
index 35aa979c3a4..e526b78b88d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4021,9 +4021,8 @@ static int checkout(struct merge_options *opt,
 	/* Switch the index/working copy from old to new */
 	int ret;
 	struct tree_desc trees[2];
-	struct unpack_trees_options unpack_opts;
+	struct unpack_trees_options unpack_opts = UNPACK_TREES_OPTIONS_INIT;
 
-	memset(&unpack_opts, 0, sizeof(unpack_opts));
 	unpack_opts.head_idx = -1;
 	unpack_opts.src_index = opt->repo->index;
 	unpack_opts.dst_index = opt->repo->index;
@@ -4046,9 +4045,8 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
 	if (1/* FIXME: opts->overwrite_ignore*/) {
-		CALLOC_ARRAY(unpack_opts.dir, 1);
-		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(unpack_opts.dir);
+		unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&unpack_opts.dir);
 	}
 	parse_tree(prev);
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
@@ -4057,8 +4055,6 @@ static int checkout(struct merge_options *opt,
 
 	ret = unpack_trees(2, trees, &unpack_opts);
 	clear_unpack_trees_porcelain(&unpack_opts);
-	dir_clear(unpack_opts.dir);
-	FREE_AND_NULL(unpack_opts.dir);
 	return ret;
 }
 
diff --git a/merge.c b/merge.c
index 6e736881d90..9cb32990dd9 100644
--- a/merge.c
+++ b/merge.c
@@ -50,10 +50,9 @@ int checkout_fast_forward(struct repository *r,
 			  int overwrite_ignore)
 {
 	struct tree *trees[MAX_UNPACK_TREES];
-	struct unpack_trees_options opts;
+	struct unpack_trees_options opts = UNPACK_TREES_OPTIONS_INIT;
 	struct tree_desc t[MAX_UNPACK_TREES];
 	int i, nr_trees = 0;
-	struct dir_struct dir = DIR_INIT;
 	struct lock_file lock_file = LOCK_INIT;
 
 	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
@@ -79,11 +78,9 @@ int checkout_fast_forward(struct repository *r,
 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
 	}
 
-	memset(&opts, 0, sizeof(opts));
 	if (overwrite_ignore) {
-		dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&dir);
-		opts.dir = &dir;
+		opts.dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&opts.dir);
 	}
 
 	opts.head_idx = 1;
@@ -101,7 +98,6 @@ int checkout_fast_forward(struct repository *r,
 		clear_unpack_trees_porcelain(&opts);
 		return -1;
 	}
-	dir_clear(&dir);
 	clear_unpack_trees_porcelain(&opts);
 
 	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..33a2dc23ffc 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2081,7 +2081,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
 	 */
 	int namelen;
 	int i;
-	struct dir_struct d;
+	struct dir_struct d = DIR_INIT;
 	char *pathbuf;
 	int cnt = 0;
 
@@ -2132,9 +2132,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
 	 */
 	pathbuf = xstrfmt("%.*s/", namelen, ce->name);
 
-	memset(&d, 0, sizeof(d));
-	if (o->dir)
-		d.exclude_per_dir = o->dir->exclude_per_dir;
+	d.exclude_per_dir = o->dir.exclude_per_dir;
 	i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
 	if (i)
 		return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
@@ -2175,8 +2173,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 	if (ignore_case && icase_exists(o, name, len, st))
 		return 0;
 
-	if (o->dir &&
-	    is_excluded(o->dir, o->src_index, name, &dtype))
+	if (is_excluded(&o->dir, o->src_index, name, &dtype))
 		/*
 		 * ce->name is explicitly excluded, so it is Ok to
 		 * overwrite it.
diff --git a/unpack-trees.h b/unpack-trees.h
index 2d88b19dca7..6fa6a4dfc3e 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -5,6 +5,7 @@
 #include "strvec.h"
 #include "string-list.h"
 #include "tree-walk.h"
+#include "dir.h"
 
 #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
 
@@ -66,7 +67,7 @@ struct unpack_trees_options {
 		     dry_run;
 	const char *prefix;
 	int cache_bottom;
-	struct dir_struct *dir;
+	struct dir_struct dir;
 	struct pathspec *pathspec;
 	merge_fn_t fn;
 	const char *msgs[NB_UNPACK_TREES_WARNING_TYPES];
@@ -90,6 +91,9 @@ struct unpack_trees_options {
 	struct pattern_list *pl; /* for internal use */
 	struct checkout_metadata meta;
 };
+#define UNPACK_TREES_OPTIONS_INIT { \
+	.dir = DIR_INIT, \
+}
 
 int unpack_trees(unsigned n, struct tree_desc *t,
 		 struct unpack_trees_options *options);

* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-29  9:22       ` Ævar Arnfjörð Bjarmason
@ 2021-09-29 15:35         ` Elijah Newren
  2021-09-29 18:30           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-09-29 15:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine

On Wed, Sep 29, 2021 at 2:27 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Mon, Sep 27 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Currently, every caller of unpack_trees() that wants to ensure ignored
> > files are overwritten by default needs to:
> >    * allocate unpack_trees_options.dir
> >    * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
> >    * call setup_standard_excludes
> > AND then after the call to unpack_trees() needs to
> >    * call dir_clear()
> >    * deallocate unpack_trees_options.dir
> > That's a fair amount of boilerplate, and every caller uses identical
> > code.  Make this easier by instead introducing a new boolean value where
> > the default value (0) does what we want so that new callers of
> > unpack_trees() automatically get the appropriate behavior.  And move all
> > the handling of unpack_trees_options.dir into unpack_trees() itself.
> >
> > While preserve_ignored = 0 is the behavior we feel is the appropriate
> > default, we defer fixing commands to use the appropriate default until a
> > later commit.  So, this commit introduces several locations where we
> > manually set preserve_ignored=1.  This makes it clear where code paths
> > were previously preserving ignored files when they should not have been;
> > a future commit will flip these to instead use a value of 0 to get the
> > behavior we want.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  builtin/am.c        |  3 +++
> >  builtin/checkout.c  | 11 ++---------
> >  builtin/clone.c     |  2 ++
> >  builtin/merge.c     |  2 ++
> >  builtin/read-tree.c | 13 +++----------
> >  builtin/reset.c     |  2 ++
> >  builtin/stash.c     |  3 +++
> >  merge-ort.c         |  8 +-------
> >  merge-recursive.c   |  8 +-------
> >  merge.c             |  8 +-------
> >  reset.c             |  2 ++
> >  sequencer.c         |  2 ++
> >  unpack-trees.c      | 10 ++++++++++
> >  unpack-trees.h      |  1 +
> >  14 files changed, 35 insertions(+), 40 deletions(-)
> >
> > diff --git a/builtin/am.c b/builtin/am.c
> > index e4a0ff9cd7c..1ee70692bc3 100644
> > --- a/builtin/am.c
> > +++ b/builtin/am.c
> > @@ -1918,6 +1918,9 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
> >       opts.update = 1;
> >       opts.merge = 1;
> >       opts.reset = reset;
> > +     if (!reset)
> > +             /* FIXME: Default should be to remove ignored files */
> > +             opts.preserve_ignored = 1;
> >       opts.fn = twoway_merge;
> >       init_tree_desc(&t[0], head->buffer, head->size);
> >       init_tree_desc(&t[1], remote->buffer, remote->size);
> > diff --git a/builtin/checkout.c b/builtin/checkout.c
> > index 5335435d616..5e7957dd068 100644
> > --- a/builtin/checkout.c
> > +++ b/builtin/checkout.c
> > @@ -648,6 +648,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
> >       opts.skip_unmerged = !worktree;
> >       opts.reset = 1;
> >       opts.merge = 1;
> > +     opts.preserve_ignored = 0;
> >       opts.fn = oneway_merge;
> >       opts.verbose_update = o->show_progress;
> >       opts.src_index = &the_index;
> > @@ -746,11 +747,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
> >                                      new_branch_info->commit ?
> >                                      &new_branch_info->commit->object.oid :
> >                                      &new_branch_info->oid, NULL);
> > -             if (opts->overwrite_ignore) {
> > -                     topts.dir = xcalloc(1, sizeof(*topts.dir));
> > -                     topts.dir->flags |= DIR_SHOW_IGNORED;
> > -                     setup_standard_excludes(topts.dir);
> > -             }
> > +             topts.preserve_ignored = !opts->overwrite_ignore;
> >               tree = parse_tree_indirect(old_branch_info->commit ?
> >                                          &old_branch_info->commit->object.oid :
> >                                          the_hash_algo->empty_tree);
> > @@ -760,10 +757,6 @@ static int merge_working_tree(const struct checkout_opts *opts,
> >               init_tree_desc(&trees[1], tree->buffer, tree->size);
> >
> >               ret = unpack_trees(2, trees, &topts);
> > -             if (topts.dir) {
> > -                     dir_clear(topts.dir);
> > -                     FREE_AND_NULL(topts.dir);
> > -             }
> >               clear_unpack_trees_porcelain(&topts);
> >               if (ret == -1) {
> >                       /*
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index ff1d3d447a3..be1c3840d62 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -687,6 +687,8 @@ static int checkout(int submodule_progress)
> >       opts.update = 1;
> >       opts.merge = 1;
> >       opts.clone = 1;
> > +     /* FIXME: Default should be to remove ignored files */
> > +     opts.preserve_ignored = 1;
> >       opts.fn = oneway_merge;
> >       opts.verbose_update = (option_verbosity >= 0);
> >       opts.src_index = &the_index;
> > diff --git a/builtin/merge.c b/builtin/merge.c
> > index 3fbdacc7db4..1e5fff095fc 100644
> > --- a/builtin/merge.c
> > +++ b/builtin/merge.c
> > @@ -680,6 +680,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
> >       opts.verbose_update = 1;
> >       opts.trivial_merges_only = 1;
> >       opts.merge = 1;
> > +     /* FIXME: Default should be to remove ignored files */
> > +     opts.preserve_ignored = 1;
> >       trees[nr_trees] = parse_tree_indirect(common);
> >       if (!trees[nr_trees++])
> >               return -1;
> > diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> > index 73cb957a69b..443d206eca6 100644
> > --- a/builtin/read-tree.c
> > +++ b/builtin/read-tree.c
> > @@ -201,11 +201,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >       if ((opts.update || opts.index_only) && !opts.merge)
> >               die("%s is meaningless without -m, --reset, or --prefix",
> >                   opts.update ? "-u" : "-i");
> > -     if (opts.update && !opts.reset) {
> > -             CALLOC_ARRAY(opts.dir, 1);
> > -             opts.dir->flags |= DIR_SHOW_IGNORED;
> > -             setup_standard_excludes(opts.dir);
> > -     }
> > +     if (opts.update && !opts.reset)
> > +             opts.preserve_ignored = 0;
> > +     /* otherwise, opts.preserve_ignored is irrelevant */
> >       if (opts.merge && !opts.index_only)
> >               setup_work_tree();
> >
> > @@ -245,11 +243,6 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >       if (unpack_trees(nr_trees, t, &opts))
> >               return 128;
> >
> > -     if (opts.dir) {
> > -             dir_clear(opts.dir);
> > -             FREE_AND_NULL(opts.dir);
> > -     }
> > -
> >       if (opts.debug_unpack || opts.dry_run)
> >               return 0; /* do not write the index out */
> >
> > diff --git a/builtin/reset.c b/builtin/reset.c
> > index 51c9e2f43ff..7f38656f018 100644
> > --- a/builtin/reset.c
> > +++ b/builtin/reset.c
> > @@ -67,6 +67,8 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
> >       case KEEP:
> >       case MERGE:
> >               opts.update = 1;
> > +             /* FIXME: Default should be to remove ignored files */
> > +             opts.preserve_ignored = 1;
> >               break;
> >       case HARD:
> >               opts.update = 1;
> > diff --git a/builtin/stash.c b/builtin/stash.c
> > index 8f42360ca91..88287b890d5 100644
> > --- a/builtin/stash.c
> > +++ b/builtin/stash.c
> > @@ -258,6 +258,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> >       opts.merge = 1;
> >       opts.reset = reset;
> >       opts.update = update;
> > +     if (update && !reset)
> > +             /* FIXME: Default should be to remove ignored files */
> > +             opts.preserve_ignored = 1;
> >       opts.fn = oneway_merge;
> >
> >       if (unpack_trees(nr_trees, t, &opts))
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 35aa979c3a4..0d64ec716bd 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -4045,11 +4045,7 @@ static int checkout(struct merge_options *opt,
> >       unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
> >       unpack_opts.verbose_update = (opt->verbosity > 2);
> >       unpack_opts.fn = twoway_merge;
> > -     if (1/* FIXME: opts->overwrite_ignore*/) {
> > -             CALLOC_ARRAY(unpack_opts.dir, 1);
> > -             unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
> > -             setup_standard_excludes(unpack_opts.dir);
> > -     }
> > +     unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
> >       parse_tree(prev);
> >       init_tree_desc(&trees[0], prev->buffer, prev->size);
> >       parse_tree(next);
> > @@ -4057,8 +4053,6 @@ static int checkout(struct merge_options *opt,
> >
> >       ret = unpack_trees(2, trees, &unpack_opts);
> >       clear_unpack_trees_porcelain(&unpack_opts);
> > -     dir_clear(unpack_opts.dir);
> > -     FREE_AND_NULL(unpack_opts.dir);
> >       return ret;
> >  }
> >
> > diff --git a/merge-recursive.c b/merge-recursive.c
> > index 233d9f686ad..2be3f5d4044 100644
> > --- a/merge-recursive.c
> > +++ b/merge-recursive.c
> > @@ -411,9 +411,7 @@ static int unpack_trees_start(struct merge_options *opt,
> >       else {
> >               opt->priv->unpack_opts.update = 1;
> >               /* FIXME: should only do this if !overwrite_ignore */
> > -             CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
> > -             opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
> > -             setup_standard_excludes(opt->priv->unpack_opts.dir);
> > +             opt->priv->unpack_opts.preserve_ignored = 0;
> >       }
> >       opt->priv->unpack_opts.merge = 1;
> >       opt->priv->unpack_opts.head_idx = 2;
> > @@ -428,10 +426,6 @@ static int unpack_trees_start(struct merge_options *opt,
> >       init_tree_desc_from_tree(t+2, merge);
> >
> >       rc = unpack_trees(3, t, &opt->priv->unpack_opts);
> > -     if (opt->priv->unpack_opts.dir) {
> > -             dir_clear(opt->priv->unpack_opts.dir);
> > -             FREE_AND_NULL(opt->priv->unpack_opts.dir);
> > -     }
> >       cache_tree_free(&opt->repo->index->cache_tree);
> >
> >       /*
> > diff --git a/merge.c b/merge.c
> > index 6e736881d90..2382ff66d35 100644
> > --- a/merge.c
> > +++ b/merge.c
> > @@ -53,7 +53,6 @@ int checkout_fast_forward(struct repository *r,
> >       struct unpack_trees_options opts;
> >       struct tree_desc t[MAX_UNPACK_TREES];
> >       int i, nr_trees = 0;
> > -     struct dir_struct dir = DIR_INIT;
> >       struct lock_file lock_file = LOCK_INIT;
> >
> >       refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
> > @@ -80,11 +79,7 @@ int checkout_fast_forward(struct repository *r,
> >       }
> >
> >       memset(&opts, 0, sizeof(opts));
> > -     if (overwrite_ignore) {
> > -             dir.flags |= DIR_SHOW_IGNORED;
> > -             setup_standard_excludes(&dir);
> > -             opts.dir = &dir;
> > -     }
> > +     opts.preserve_ignored = !overwrite_ignore;
> >
> >       opts.head_idx = 1;
> >       opts.src_index = r->index;
> > @@ -101,7 +96,6 @@ int checkout_fast_forward(struct repository *r,
> >               clear_unpack_trees_porcelain(&opts);
> >               return -1;
> >       }
> > -     dir_clear(&dir);
> >       clear_unpack_trees_porcelain(&opts);
> >
> >       if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
> > diff --git a/reset.c b/reset.c
> > index 79310ae071b..41b3e2d88de 100644
> > --- a/reset.c
> > +++ b/reset.c
> > @@ -56,6 +56,8 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
> >       unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
> >       unpack_tree_opts.update = 1;
> >       unpack_tree_opts.merge = 1;
> > +     /* FIXME: Default should be to remove ignored files */
> > +     unpack_tree_opts.preserve_ignored = 1;
> >       init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
> >       if (!detach_head)
> >               unpack_tree_opts.reset = 1;
> > diff --git a/sequencer.c b/sequencer.c
> > index 614d56f5e21..098566c68d9 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -3699,6 +3699,8 @@ static int do_reset(struct repository *r,
> >       unpack_tree_opts.fn = oneway_merge;
> >       unpack_tree_opts.merge = 1;
> >       unpack_tree_opts.update = 1;
> > +     /* FIXME: Default should be to remove ignored files */
> > +     unpack_tree_opts.preserve_ignored = 1;
> >       init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
> >
> >       if (repo_read_index_unmerged(r)) {
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index 8ea0a542da8..1e4eae1dc7d 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1707,6 +1707,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >               ensure_full_index(o->dst_index);
> >       }
> >
> > +     if (!o->preserve_ignored) {
> > +             CALLOC_ARRAY(o->dir, 1);
> > +             o->dir->flags |= DIR_SHOW_IGNORED;
> > +             setup_standard_excludes(o->dir);
> > +     }
> > +
> >       if (!core_apply_sparse_checkout || !o->update)
> >               o->skip_sparse_checkout = 1;
> >       if (!o->skip_sparse_checkout && !o->pl) {
> > @@ -1868,6 +1874,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >  done:
> >       if (free_pattern_list)
> >               clear_pattern_list(&pl);
> > +     if (o->dir) {
> > +             dir_clear(o->dir);
> > +             FREE_AND_NULL(o->dir);
> > +     }
> >       trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
> >       trace_performance_leave("unpack_trees");
> >       return ret;
> > diff --git a/unpack-trees.h b/unpack-trees.h
> > index 2d88b19dca7..f98cfd49d7b 100644
> > --- a/unpack-trees.h
> > +++ b/unpack-trees.h
> > @@ -49,6 +49,7 @@ struct unpack_trees_options {
> >       unsigned int reset,
> >                    merge,
> >                    update,
> > +                  preserve_ignored,
> >                    clone,
> >                    index_only,
> >                    nontrivial_merge,
>
> I think getting rid of the boilerplate makes sense, but it doesn't sound
> from the commit message like you've considered just making that "struct
> dir*" member a "struct dir" instead.
>
> That simplifies things a lot, i.e. we can just DIR_INIT it, and don't
> need every caller to malloc/free it.

See the next patch in the series.  :-)

> Sometimes a pointer makes sense, but in this case the "struct
> unpack_trees_options" can just own it.

I did make it internal to unpack_trees_options in the next patch, but
kept it as a pointer just because that let me know whether it was used
or not.  I guess I could have added a boolean as well.  But I don't
actually allocate anything, because it's either a NULL pointer, or a
pointer to something on the stack.  So, I do get to just use DIR_INIT.
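
Roughly, the shape I'm describing looks like the standalone sketch
below.  This is not the code from the next patch: "struct demo_dir",
"struct demo_opts", the DEMO_* macros and demo_unpack() are
hypothetical names used only to illustrate an internal pointer that is
either NULL or points at a local variable of the function doing the
work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "struct dir_struct". */
struct demo_dir {
	unsigned flags;
};
#define DEMO_DIR_INIT { 0 }
#define DEMO_SHOW_IGNORED (1u << 0)

/* Hypothetical stand-in for "struct unpack_trees_options". */
struct demo_opts {
	unsigned preserve_ignored; /* 0 (default): ignored files may be overwritten */
	struct demo_dir *dir;      /* internal: NULL, or a local in demo_unpack() */
};

/* A non-NULL pointer doubles as the "ignore rules are in use" signal. */
static int demo_ok_to_remove_ignored(const struct demo_opts *o)
{
	return o->dir && (o->dir->flags & DEMO_SHOW_IGNORED);
}

/* Returns 1 if an ignored file would be overwritten, 0 if preserved. */
static int demo_unpack(struct demo_opts *o)
{
	struct demo_dir dir = DEMO_DIR_INIT; /* lives on the stack */
	int overwrote;

	assert(!o->dir); /* callers must leave the internal member alone */
	if (!o->preserve_ignored) {
		dir.flags |= DEMO_SHOW_IGNORED;
		o->dir = &dir;
	}

	overwrote = demo_ok_to_remove_ignored(o);

	o->dir = NULL; /* never let a pointer to a local escape */
	return overwrote;
}
```

Callers only ever touch preserve_ignored; zero-initialized options get
the "overwrite ignored files" behavior, and nothing is allocated or
freed on the heap.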

* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-29 15:35         ` Elijah Newren
@ 2021-09-29 18:30           ` Ævar Arnfjörð Bjarmason
  2021-09-30  4:25             ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-29 18:30 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine


On Wed, Sep 29 2021, Elijah Newren wrote:

> On Wed, Sep 29, 2021 at 2:27 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Mon, Sep 27 2021, Elijah Newren via GitGitGadget wrote:
>>
>> > From: Elijah Newren <newren@gmail.com>
>> >
>> > Currently, every caller of unpack_trees() that wants to ensure ignored
>> > files are overwritten by default needs to:
>> >    * allocate unpack_trees_options.dir
>> >    * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
>> >    * call setup_standard_excludes
>> > AND then after the call to unpack_trees() needs to
>> >    * call dir_clear()
>> >    * deallocate unpack_trees_options.dir
>> > That's a fair amount of boilerplate, and every caller uses identical
>> > code.  Make this easier by instead introducing a new boolean value where
>> > the default value (0) does what we want so that new callers of
>> > unpack_trees() automatically get the appropriate behavior.  And move all
>> > the handling of unpack_trees_options.dir into unpack_trees() itself.
>> >
>> > While preserve_ignored = 0 is the behavior we feel is the appropriate
>> > default, we defer fixing commands to use the appropriate default until a
>> > later commit.  So, this commit introduces several locations where we
>> > manually set preserve_ignored=1.  This makes it clear where code paths
>> > were previously preserving ignored files when they should not have been;
>> > a future commit will flip these to instead use a value of 0 to get the
>> > behavior we want.
>> >
>> > Signed-off-by: Elijah Newren <newren@gmail.com>
>> > ---
>> >  builtin/am.c        |  3 +++
>> >  builtin/checkout.c  | 11 ++---------
>> >  builtin/clone.c     |  2 ++
>> >  builtin/merge.c     |  2 ++
>> >  builtin/read-tree.c | 13 +++----------
>> >  builtin/reset.c     |  2 ++
>> >  builtin/stash.c     |  3 +++
>> >  merge-ort.c         |  8 +-------
>> >  merge-recursive.c   |  8 +-------
>> >  merge.c             |  8 +-------
>> >  reset.c             |  2 ++
>> >  sequencer.c         |  2 ++
>> >  unpack-trees.c      | 10 ++++++++++
>> >  unpack-trees.h      |  1 +
>> >  14 files changed, 35 insertions(+), 40 deletions(-)
>> >
>> > diff --git a/builtin/am.c b/builtin/am.c
>> > index e4a0ff9cd7c..1ee70692bc3 100644
>> > --- a/builtin/am.c
>> > +++ b/builtin/am.c
>> > @@ -1918,6 +1918,9 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
>> >       opts.update = 1;
>> >       opts.merge = 1;
>> >       opts.reset = reset;
>> > +     if (!reset)
>> > +             /* FIXME: Default should be to remove ignored files */
>> > +             opts.preserve_ignored = 1;
>> >       opts.fn = twoway_merge;
>> >       init_tree_desc(&t[0], head->buffer, head->size);
>> >       init_tree_desc(&t[1], remote->buffer, remote->size);
>> > diff --git a/builtin/checkout.c b/builtin/checkout.c
>> > index 5335435d616..5e7957dd068 100644
>> > --- a/builtin/checkout.c
>> > +++ b/builtin/checkout.c
>> > @@ -648,6 +648,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
>> >       opts.skip_unmerged = !worktree;
>> >       opts.reset = 1;
>> >       opts.merge = 1;
>> > +     opts.preserve_ignored = 0;
>> >       opts.fn = oneway_merge;
>> >       opts.verbose_update = o->show_progress;
>> >       opts.src_index = &the_index;
>> > @@ -746,11 +747,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
>> >                                      new_branch_info->commit ?
>> >                                      &new_branch_info->commit->object.oid :
>> >                                      &new_branch_info->oid, NULL);
>> > -             if (opts->overwrite_ignore) {
>> > -                     topts.dir = xcalloc(1, sizeof(*topts.dir));
>> > -                     topts.dir->flags |= DIR_SHOW_IGNORED;
>> > -                     setup_standard_excludes(topts.dir);
>> > -             }
>> > +             topts.preserve_ignored = !opts->overwrite_ignore;
>> >               tree = parse_tree_indirect(old_branch_info->commit ?
>> >                                          &old_branch_info->commit->object.oid :
>> >                                          the_hash_algo->empty_tree);
>> > @@ -760,10 +757,6 @@ static int merge_working_tree(const struct checkout_opts *opts,
>> >               init_tree_desc(&trees[1], tree->buffer, tree->size);
>> >
>> >               ret = unpack_trees(2, trees, &topts);
>> > -             if (topts.dir) {
>> > -                     dir_clear(topts.dir);
>> > -                     FREE_AND_NULL(topts.dir);
>> > -             }
>> >               clear_unpack_trees_porcelain(&topts);
>> >               if (ret == -1) {
>> >                       /*
>> > diff --git a/builtin/clone.c b/builtin/clone.c
>> > index ff1d3d447a3..be1c3840d62 100644
>> > --- a/builtin/clone.c
>> > +++ b/builtin/clone.c
>> > @@ -687,6 +687,8 @@ static int checkout(int submodule_progress)
>> >       opts.update = 1;
>> >       opts.merge = 1;
>> >       opts.clone = 1;
>> > +     /* FIXME: Default should be to remove ignored files */
>> > +     opts.preserve_ignored = 1;
>> >       opts.fn = oneway_merge;
>> >       opts.verbose_update = (option_verbosity >= 0);
>> >       opts.src_index = &the_index;
>> > diff --git a/builtin/merge.c b/builtin/merge.c
>> > index 3fbdacc7db4..1e5fff095fc 100644
>> > --- a/builtin/merge.c
>> > +++ b/builtin/merge.c
>> > @@ -680,6 +680,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
>> >       opts.verbose_update = 1;
>> >       opts.trivial_merges_only = 1;
>> >       opts.merge = 1;
>> > +     /* FIXME: Default should be to remove ignored files */
>> > +     opts.preserve_ignored = 1;
>> >       trees[nr_trees] = parse_tree_indirect(common);
>> >       if (!trees[nr_trees++])
>> >               return -1;
>> > diff --git a/builtin/read-tree.c b/builtin/read-tree.c
>> > index 73cb957a69b..443d206eca6 100644
>> > --- a/builtin/read-tree.c
>> > +++ b/builtin/read-tree.c
>> > @@ -201,11 +201,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>> >       if ((opts.update || opts.index_only) && !opts.merge)
>> >               die("%s is meaningless without -m, --reset, or --prefix",
>> >                   opts.update ? "-u" : "-i");
>> > -     if (opts.update && !opts.reset) {
>> > -             CALLOC_ARRAY(opts.dir, 1);
>> > -             opts.dir->flags |= DIR_SHOW_IGNORED;
>> > -             setup_standard_excludes(opts.dir);
>> > -     }
>> > +     if (opts.update && !opts.reset)
>> > +             opts.preserve_ignored = 0;
>> > +     /* otherwise, opts.preserve_ignored is irrelevant */
>> >       if (opts.merge && !opts.index_only)
>> >               setup_work_tree();
>> >
>> > @@ -245,11 +243,6 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>> >       if (unpack_trees(nr_trees, t, &opts))
>> >               return 128;
>> >
>> > -     if (opts.dir) {
>> > -             dir_clear(opts.dir);
>> > -             FREE_AND_NULL(opts.dir);
>> > -     }
>> > -
>> >       if (opts.debug_unpack || opts.dry_run)
>> >               return 0; /* do not write the index out */
>> >
>> > diff --git a/builtin/reset.c b/builtin/reset.c
>> > index 51c9e2f43ff..7f38656f018 100644
>> > --- a/builtin/reset.c
>> > +++ b/builtin/reset.c
>> > @@ -67,6 +67,8 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>> >       case KEEP:
>> >       case MERGE:
>> >               opts.update = 1;
>> > +             /* FIXME: Default should be to remove ignored files */
>> > +             opts.preserve_ignored = 1;
>> >               break;
>> >       case HARD:
>> >               opts.update = 1;
>> > diff --git a/builtin/stash.c b/builtin/stash.c
>> > index 8f42360ca91..88287b890d5 100644
>> > --- a/builtin/stash.c
>> > +++ b/builtin/stash.c
>> > @@ -258,6 +258,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>> >       opts.merge = 1;
>> >       opts.reset = reset;
>> >       opts.update = update;
>> > +     if (update && !reset)
>> > +             /* FIXME: Default should be to remove ignored files */
>> > +             opts.preserve_ignored = 1;
>> >       opts.fn = oneway_merge;
>> >
>> >       if (unpack_trees(nr_trees, t, &opts))
>> > diff --git a/merge-ort.c b/merge-ort.c
>> > index 35aa979c3a4..0d64ec716bd 100644
>> > --- a/merge-ort.c
>> > +++ b/merge-ort.c
>> > @@ -4045,11 +4045,7 @@ static int checkout(struct merge_options *opt,
>> >       unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
>> >       unpack_opts.verbose_update = (opt->verbosity > 2);
>> >       unpack_opts.fn = twoway_merge;
>> > -     if (1/* FIXME: opts->overwrite_ignore*/) {
>> > -             CALLOC_ARRAY(unpack_opts.dir, 1);
>> > -             unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
>> > -             setup_standard_excludes(unpack_opts.dir);
>> > -     }
>> > +     unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
>> >       parse_tree(prev);
>> >       init_tree_desc(&trees[0], prev->buffer, prev->size);
>> >       parse_tree(next);
>> > @@ -4057,8 +4053,6 @@ static int checkout(struct merge_options *opt,
>> >
>> >       ret = unpack_trees(2, trees, &unpack_opts);
>> >       clear_unpack_trees_porcelain(&unpack_opts);
>> > -     dir_clear(unpack_opts.dir);
>> > -     FREE_AND_NULL(unpack_opts.dir);
>> >       return ret;
>> >  }
>> >
>> > diff --git a/merge-recursive.c b/merge-recursive.c
>> > index 233d9f686ad..2be3f5d4044 100644
>> > --- a/merge-recursive.c
>> > +++ b/merge-recursive.c
>> > @@ -411,9 +411,7 @@ static int unpack_trees_start(struct merge_options *opt,
>> >       else {
>> >               opt->priv->unpack_opts.update = 1;
>> >               /* FIXME: should only do this if !overwrite_ignore */
>> > -             CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
>> > -             opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
>> > -             setup_standard_excludes(opt->priv->unpack_opts.dir);
>> > +             opt->priv->unpack_opts.preserve_ignored = 0;
>> >       }
>> >       opt->priv->unpack_opts.merge = 1;
>> >       opt->priv->unpack_opts.head_idx = 2;
>> > @@ -428,10 +426,6 @@ static int unpack_trees_start(struct merge_options *opt,
>> >       init_tree_desc_from_tree(t+2, merge);
>> >
>> >       rc = unpack_trees(3, t, &opt->priv->unpack_opts);
>> > -     if (opt->priv->unpack_opts.dir) {
>> > -             dir_clear(opt->priv->unpack_opts.dir);
>> > -             FREE_AND_NULL(opt->priv->unpack_opts.dir);
>> > -     }
>> >       cache_tree_free(&opt->repo->index->cache_tree);
>> >
>> >       /*
>> > diff --git a/merge.c b/merge.c
>> > index 6e736881d90..2382ff66d35 100644
>> > --- a/merge.c
>> > +++ b/merge.c
>> > @@ -53,7 +53,6 @@ int checkout_fast_forward(struct repository *r,
>> >       struct unpack_trees_options opts;
>> >       struct tree_desc t[MAX_UNPACK_TREES];
>> >       int i, nr_trees = 0;
>> > -     struct dir_struct dir = DIR_INIT;
>> >       struct lock_file lock_file = LOCK_INIT;
>> >
>> >       refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
>> > @@ -80,11 +79,7 @@ int checkout_fast_forward(struct repository *r,
>> >       }
>> >
>> >       memset(&opts, 0, sizeof(opts));
>> > -     if (overwrite_ignore) {
>> > -             dir.flags |= DIR_SHOW_IGNORED;
>> > -             setup_standard_excludes(&dir);
>> > -             opts.dir = &dir;
>> > -     }
>> > +     opts.preserve_ignored = !overwrite_ignore;
>> >
>> >       opts.head_idx = 1;
>> >       opts.src_index = r->index;
>> > @@ -101,7 +96,6 @@ int checkout_fast_forward(struct repository *r,
>> >               clear_unpack_trees_porcelain(&opts);
>> >               return -1;
>> >       }
>> > -     dir_clear(&dir);
>> >       clear_unpack_trees_porcelain(&opts);
>> >
>> >       if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
>> > diff --git a/reset.c b/reset.c
>> > index 79310ae071b..41b3e2d88de 100644
>> > --- a/reset.c
>> > +++ b/reset.c
>> > @@ -56,6 +56,8 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
>> >       unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
>> >       unpack_tree_opts.update = 1;
>> >       unpack_tree_opts.merge = 1;
>> > +     /* FIXME: Default should be to remove ignored files */
>> > +     unpack_tree_opts.preserve_ignored = 1;
>> >       init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
>> >       if (!detach_head)
>> >               unpack_tree_opts.reset = 1;
>> > diff --git a/sequencer.c b/sequencer.c
>> > index 614d56f5e21..098566c68d9 100644
>> > --- a/sequencer.c
>> > +++ b/sequencer.c
>> > @@ -3699,6 +3699,8 @@ static int do_reset(struct repository *r,
>> >       unpack_tree_opts.fn = oneway_merge;
>> >       unpack_tree_opts.merge = 1;
>> >       unpack_tree_opts.update = 1;
>> > +     /* FIXME: Default should be to remove ignored files */
>> > +     unpack_tree_opts.preserve_ignored = 1;
>> >       init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
>> >
>> >       if (repo_read_index_unmerged(r)) {
>> > diff --git a/unpack-trees.c b/unpack-trees.c
>> > index 8ea0a542da8..1e4eae1dc7d 100644
>> > --- a/unpack-trees.c
>> > +++ b/unpack-trees.c
>> > @@ -1707,6 +1707,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>> >               ensure_full_index(o->dst_index);
>> >       }
>> >
>> > +     if (!o->preserve_ignored) {
>> > +             CALLOC_ARRAY(o->dir, 1);
>> > +             o->dir->flags |= DIR_SHOW_IGNORED;
>> > +             setup_standard_excludes(o->dir);
>> > +     }
>> > +
>> >       if (!core_apply_sparse_checkout || !o->update)
>> >               o->skip_sparse_checkout = 1;
>> >       if (!o->skip_sparse_checkout && !o->pl) {
>> > @@ -1868,6 +1874,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>> >  done:
>> >       if (free_pattern_list)
>> >               clear_pattern_list(&pl);
>> > +     if (o->dir) {
>> > +             dir_clear(o->dir);
>> > +             FREE_AND_NULL(o->dir);
>> > +     }
>> >       trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>> >       trace_performance_leave("unpack_trees");
>> >       return ret;
>> > diff --git a/unpack-trees.h b/unpack-trees.h
>> > index 2d88b19dca7..f98cfd49d7b 100644
>> > --- a/unpack-trees.h
>> > +++ b/unpack-trees.h
>> > @@ -49,6 +49,7 @@ struct unpack_trees_options {
>> >       unsigned int reset,
>> >                    merge,
>> >                    update,
>> > +                  preserve_ignored,
>> >                    clone,
>> >                    index_only,
>> >                    nontrivial_merge,
>>
>> I think getting rid of the boilerplate makes sense, but it doesn't sound
>> from the commit message like you've considered just making that "struct
>> dir*" member a "struct dir" instead.
>>
>> That simplifies things a lot, i.e. we can just DIR_INIT it, and don't
>> need every caller to malloc/free it.
>
> See the next patch in the series.  :-)

Ah!

>> Sometimes a pointer makes sense, but in this case the "struct
>> unpack_trees_options" can just own it.
>
> I did make it internal to unpack_trees_options in the next patch, but
> kept it as a pointer just because that let me know whether it was used
> or not.  I guess I could have added a boolean as well.  But I don't
> actually allocate anything, because it's either a NULL pointer, or a
> pointer to something on the stack.  So, I do get to just use DIR_INIT.

I think I'm probably missing something. I just made it allocated on the
stack by the caller using "struct unpack_trees_options", but then you
end up having a dir* in the struct, but that's only filled in as a
pointer to the stack variable? Maybe there's some subtlety I'm missing
here...


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-29 18:30           ` Ævar Arnfjörð Bjarmason
@ 2021-09-30  4:25             ` Elijah Newren
  2021-09-30 14:04               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-09-30  4:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine

On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Sep 29 2021, Elijah Newren wrote:
>
> > On Wed, Sep 29, 2021 at 2:27 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
...
> >>
> >> I think getting rid of the boilerplate makes sense, but it doesn't sound
> >> from the commit message like you've considered just making that "struct
> >> dir*" member a "struct dir" instead.
> >>
> >> That simplifies things a lot, i.e. we can just DIR_INIT it, and don't
> >> need every caller to malloc/free it.
> >
> > See the next patch in the series.  :-)
>
> Ah!
>
> >> Sometimes a pointer makes sense, but in this case the "struct
> >> unpack_trees_options" can just own it.
> >
> > I did make it internal to unpack_trees_options in the next patch, but
> > kept it as a pointer just because that let me know whether it was used
> > or not.  I guess I could have added a boolean as well.  But I don't
> > actually allocate anything, because it's either a NULL pointer, or a
> > pointer to something on the stack.  So, I do get to just use DIR_INIT.
>
> I think I'm probably missing something. I just made it allocated on the
> stack by the caller using "struct unpack_trees_options", but then you
> end up having a dir* in the struct, but that's only filled in as a
> pointer to the stack variable? Maybe there's some subtlety I'm missing
> here...

As per the next patch:

int unpack_trees(..., struct unpack_trees_options *o)
{
    struct dir_struct dir = DIR_INIT;
    ...
    if (!o->preserve_ignored) {
        /* Setup 'dir', make o->dir point to it */
        ....
        o->dir = &dir;
    }
    ...
    if (o->dir)
        /* cleanup */
    ....
}

The caller doesn't touch o->dir (other than initializing it to zeros);
unpack_trees() is wholly responsible for it.  I'd kind of like to
entirely remove dir from struct unpack_trees_options, but I need a way of
passing it down through all the other functions in unpack-trees.c, and
leaving it in unpack_trees_options seems the easiest way to do so.  So
I just marked it as "for internal use only".
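
To make the lifetime of that pointer concrete, here is a minimal,
self-contained sketch of the same pattern.  Everything named toy_* (and
TOY_*) is a hypothetical stand-in invented for illustration, not git's
real types or functions; only the shape (a stack variable owned by the
function, a pointer in the options struct that doubles as an "in use"
flag, and a reset to NULL before returning) follows the outline above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not git's real types. */
struct toy_dir {
    int flags;
};
#define TOY_DIR_INIT { 0 }
#define TOY_SHOW_IGNORED 1

struct toy_options {
    unsigned preserve_ignored;
    struct toy_dir *dir; /* for internal use only */
};

/* A helper deeper in the call chain that only receives the options. */
static int toy_ignored_may_be_overwritten(const struct toy_options *o)
{
    return o->dir && (o->dir->flags & TOY_SHOW_IGNORED);
}

/* Returns 1 if ignored files would be overwritten, 0 if preserved. */
int toy_unpack(struct toy_options *o)
{
    struct toy_dir dir = TOY_DIR_INIT; /* lives on this stack frame */
    int ret;

    assert(o != NULL);
    if (!o->preserve_ignored) {
        dir.flags |= TOY_SHOW_IGNORED;
        o->dir = &dir; /* non-NULL doubles as the "in use" flag */
    }

    ret = toy_ignored_may_be_overwritten(o);

    /* Cleanup: the pointer must not outlive this function's frame. */
    o->dir = NULL;
    return ret;
}
```

A caller would zero-initialize a struct toy_options, optionally set
preserve_ignored, and call toy_unpack(); it never reads or writes the
dir member itself.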


* Re: [PATCH v3 00/11] Fix various issues around removal of untracked files/directories
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (11 preceding siblings ...)
  2021-09-27 20:36     ` [PATCH v3 00/11] Fix various issues around removal of " Junio C Hamano
@ 2021-09-30 14:00     ` Phillip Wood
       [not found]     ` <aaa8ea3b-0902-f9e6-c1a4-0ca2b1b2f57b@gmail.com>
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
  14 siblings, 0 replies; 82+ messages in thread
From: Phillip Wood @ 2021-09-30 14:00 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Elijah Newren, Eric Sunshine

[Resending as it did not seem to get through to the list last time]

Hi Elijah

On 27/09/2021 17:33, Elijah Newren via GitGitGadget wrote:
> We have multiple codepaths that delete untracked files/directories but
> shouldn't. There are also some codepaths where we delete untracked
> files/directories intentionally (based on mailing list discussion), but
> where that intent is not documented. We also have some codepaths that
> preserve ignored files, which shouldn't. Fix the documentation, add several
> new (mostly failing) testcases, fix some of the new testcases, and add
> comments about some potential remaining problems. (I found these as a
> side-effect of looking at [1], though [2] pointed out one explicitly while I
> was working on it.)
> 
> Note that I'm using Junio's declaration about checkout -f and reset --hard
> (and also presuming that since read-tree --reset is porcelain that its
> behavior should be left alone)[3] in this series.

I've had a read through and I don't have any specific comments. I like
the way you have simplified adding the standard excludes for callers and
made the existing value of reset invalid when converting to an enum. I
think there is a small risk someone will complain about read-tree
changing how it handles ignored files, but hopefully everyone was just
passing ".gitignore" to --exclude-per-directory and they won't mind
'read-tree -m -u' removing ignored files now.

Best Wishes

Phillip


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-30  4:25             ` Elijah Newren
@ 2021-09-30 14:04               ` Ævar Arnfjörð Bjarmason
  2021-10-01  1:53                 ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 14:04 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine


On Wed, Sep 29 2021, Elijah Newren wrote:

> On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Wed, Sep 29 2021, Elijah Newren wrote:
>>
>> > On Wed, Sep 29, 2021 at 2:27 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>
> ...
>> >>
>> >> I think getting rid of the boilerplate makes sense, but it doesn't sound
>> >> from the commit message like you've considered just making that "struct
>> >> dir*" member a "struct dir" instead.
>> >>
>> >> That simplifies things a lot, i.e. we can just DIR_INIT it, and don't
>> >> need every caller to malloc/free it.
>> >
>> > See the next patch in the series.  :-)
>>
>> Ah!
>>
>> >> Sometimes a pointer makes sense, but in this case the "struct
>> >> unpack_trees_options" can just own it.
>> >
>> > I did make it internal to unpack_trees_options in the next patch, but
>> > kept it as a pointer just because that let me know whether it was used
>> > or not.  I guess I could have added a boolean as well.  But I don't
>> > actually allocate anything, because it's either a NULL pointer, or a
>> > pointer to something on the stack.  So, I do get to just use DIR_INIT.
>>
>> I think I'm probably missing something. I just made it allocated on the
>> stack by the caller using "struct unpack_trees_options", but then you
>> end up having a dir* in the struct, but that's only filled in as a
>> pointer to the stack variable? Maybe there's some subtlety I'm missing
>> here...
>
> As per the next patch:
>
> int unpack_trees(..., struct unpack_trees_options *o)
> {
>     struct dir_struct dir = DIR_INIT;
>     ...
>     if (!o->preserve_ignored) {
>         /* Setup 'dir', make o->dir point to it */
>         ....
>         o->dir = &dir;
>     }
>     ...
>     if (o->dir)
>         /* cleanup */
>     ....
> }
>
> The caller doesn't touch o->dir (other than initializing it to zeros);
> unpack_trees() is wholly responsible for it.  I'd kind of like to
> entirely remove dir from unpack_trees_options(), but I need a way of
> passing it down through all the other functions in unpack-trees.c, and
> leaving it in unpack_trees_options seems the easiest way to do so.  So
> I just marked it as "for internal use only".

I think I understand *how* it works, I'm puzzled by why you went for
this whole level of indirection when you're using a struct on the stack
in the end anyway, just ... put that in "struct unpack_trees_options"?

Anyway, I see I have only myself to blame here, as you added these leak
fixes in the v2 in response to some of my offhand comments.

FWIW I then went on to do some deeper fixes not just on these leaks but
the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
topic for a while. I suppose I only have myself to blame :)

Below is a patch-on-top that I think makes this whole thing much simpler
by doing away with the pointer entirely.

I suppose this is also a partial reply to
https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@mail.gmail.com/;
but I quite dislike this pattern of including a pointer like this where
it's not needed just for the practicalities of memory management.

I.e. here you use DIR_INIT. In my local patches to fix up the wider
memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
and DIR_INIT will in turn be referenced by a
UNPACK_TREES_OPTIONS_INIT. It's quite nice, if you're having to
initialize with "UNPACK_TREES_OPTIONS_INIT", to have that initialization
work all the way down the chain, and not need e.g. a manual
strbuf_init(), dir_init() etc.
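
As a rough illustration of what "work all the way down the chain" means,
here is a small sketch with made-up miniature types.  The toy_* names
and TOY_* macros are assumptions for this example only; they loosely
mirror the idea of STRBUF_INIT, DIR_INIT and a future
UNPACK_TREES_OPTIONS_INIT, not their actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature string buffer; buf is never NULL once initialized. */
struct toy_strbuf {
    size_t alloc, len;
    char *buf;
};
static char toy_slopbuf[1]; /* shared empty string for fresh buffers */
#define TOY_STRBUF_INIT { .buf = toy_slopbuf }

/* Hypothetical directory state embedding a string buffer by value. */
struct toy_dir {
    int flags;
    struct toy_strbuf basebuf;
};
#define TOY_DIR_INIT { .basebuf = TOY_STRBUF_INIT }

/* Hypothetical options struct embedding the directory state by value. */
struct toy_unpack_options {
    unsigned preserve_ignored;
    struct toy_dir dir;
};
#define TOY_UNPACK_OPTIONS_INIT { .dir = TOY_DIR_INIT }

/* Returns 1 if every nested member is in its "ready to use" state. */
int toy_options_ready(const struct toy_unpack_options *o)
{
    assert(o != NULL);
    return o->dir.basebuf.buf != NULL &&
           o->dir.basebuf.len == 0 &&
           o->dir.basebuf.buf[0] == '\0';
}
```

With this layout a single "struct toy_unpack_options o =
TOY_UNPACK_OPTIONS_INIT;" leaves the innermost buffer usable without any
separate init call; only the teardown side would need explicit code.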

I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
with DIR_INIT, 2021-07-01), but would probably need to bring it back. Of
course you need some "release()" method for the
UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
(well, "dir_clear()" in that case), and it needs to call
"strbuf_release()". It's just nicer if that boilerplate is all on
destruction, but not also on struct/object setup.

We do need that setup in some cases (although a lot could just be
replaced by lazy initialization), but if we don't....

diff --git a/unpack-trees.c b/unpack-trees.c
index a7e1712d236..de5cc6cd025 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
-	struct dir_struct dir = DIR_INIT;
 
 	if (o->reset == UNPACK_RESET_INVALID)
 		BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
 
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
-	if (o->dir)
-		BUG("o->dir is for internal use only");
 
 	trace_performance_enter();
 	trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
@@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
 
 	if (!o->preserve_ignored) {
-		o->dir = &dir;
-		o->dir->flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(o->dir);
+		o->dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&o->dir);
 	}
 
 	if (!core_apply_sparse_checkout || !o->update)
@@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 done:
 	if (free_pattern_list)
 		clear_pattern_list(&pl);
-	if (o->dir) {
-		dir_clear(o->dir);
-		o->dir = NULL;
-	}
+	dir_clear(&o->dir);
 	trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
 	trace_performance_leave("unpack_trees");
 	return ret;
@@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
 	pathbuf = xstrfmt("%.*s/", namelen, ce->name);
 
 	memset(&d, 0, sizeof(d));
-	if (o->dir)
-		d.exclude_per_dir = o->dir->exclude_per_dir;
+	d.exclude_per_dir = o->dir.exclude_per_dir;
 	i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
 	if (i)
 		return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
@@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 	if (ignore_case && icase_exists(o, name, len, st))
 		return 0;
 
-	if (o->dir &&
-	    is_excluded(o->dir, o->src_index, name, &dtype))
+	if (is_excluded(&o->dir, o->src_index, name, &dtype))
 		/*
 		 * ce->name is explicitly excluded, so it is Ok to
 		 * overwrite it.
diff --git a/unpack-trees.h b/unpack-trees.h
index 71ffb7eeb0c..a8afbb20170 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -5,6 +5,7 @@
 #include "strvec.h"
 #include "string-list.h"
 #include "tree-walk.h"
+#include "dir.h"
 
 #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
 
@@ -95,7 +96,7 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct pattern_list *pl; /* for internal use */
-	struct dir_struct *dir; /* for internal use only */
+	struct dir_struct dir; /* for internal use only */
 	struct checkout_metadata meta;
 };
 


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-09-30 14:04               ` Ævar Arnfjörð Bjarmason
@ 2021-10-01  1:53                 ` Elijah Newren
  2021-10-01  8:15                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-10-01  1:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine

On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Sep 29 2021, Elijah Newren wrote:
>
> > On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >> On Wed, Sep 29 2021, Elijah Newren wrote:
> >>
...
> > As per the next patch:
> >
> > int unpack_trees(..., struct unpack_trees_options *o)
> > {
> >     struct dir_struct dir = DIR_INIT;
> >     ...
> >     if (!o->preserve_ignored) {
> >         /* Setup 'dir', make o->dir point to it */
> >         ....
> >         o->dir = &dir;
> >     }
> >     ...
> >     if (o->dir)
> >         /* cleanup */
> >     ....
> > }
> >
> > The caller doesn't touch o->dir (other than initializing it to zeros);
> > unpack_trees() is wholly responsible for it.  I'd kind of like to
> > entirely remove dir from unpack_trees_options(), but I need a way of
> > passing it down through all the other functions in unpack-trees.c, and
> > leaving it in unpack_trees_options seems the easiest way to do so.  So
> > I just marked it as "for internal use only".
>
> I think I understand *how* it works, I'm puzzled by why you went for
> this whole level of indirection when you're using a struct on the stack
> in the end anyway, just ... put that in "struct unpack_trees_options"?
>
> Anyway, I see I have only myself to blame here, as you added these leak
> fixes in the v2 in response to some of my offhand comments.
>
> FWIW I then went on to do some deeper fixes not just on these leaks but
> the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
> topic for a while. I suppose I only have myself to blame :)
>
> Below is a patch-on-top that I think makes this whole thing much simpler
> by doing away with the pointer entirely.
>
> I suppose this is also a partial reply to
> https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@mail.gmail.com/;
> but I quite dislike this pattern of including a pointer like this where
> it's not needed just for the practicalities of memory management.
>
> I.e. here you use DIR_INIT. In my local patches to fix up the wider
> memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
> and DIR_INIT will in turn be referenced by a
> UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to
> initialize with "UNPACK_TREES_OPTIONS_INIT" have that initialization
> work all the way down the chain, and not need e.g. a manual
> strbuf_init(), dir_init() etc.

And you can keep using UNPACK_TREES_OPTIONS_INIT, because the
unpack_trees_opts->dir should be initialized to NULL.

> I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
> with DIR_INIT, 2021-07-01),

I might be going on a tangent here, but looking at that patch, I'm
worried that dir_init() was buggy and that you perpetuated that bug
with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
member, which neither dir_init() nor DIR_INIT initializes properly
(using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
dir.c relies either on strbuf_add() calls just happening to work with
this incorrectly initialized strbuf, or else on the strbuf_init()
call in prep_exclude() to initialize it, using the following snippet:

    if (!dir->basebuf.buf)
        strbuf_init(&dir->basebuf, PATH_MAX);

However, earlier in that same function we see

    if (stk->baselen <= baselen &&
        !strncmp(dir->basebuf.buf, base, stk->baselen))
            break;

So either that function can never have dir->basebuf.buf be NULL and
the strbuf_init() is dead code, or else it's possible for us to
trigger a segfault.  If it's the former, it may just be a ticking time
bomb that will transform into the latter with some other change,
because it's not at all obvious to me how dir->basebuf gets
initialized appropriately to avoid that strncmp call.  Perhaps there
is some invariant where exclude_stack is only set up by previous calls
to prep_exclude() and those won't set up exclude_stack until first
initializing basebuf.  But that really at least deserves a comment
about how we're abusing basebuf, and would probably be cleaner if we
initialized basebuf to STRBUF_INIT.
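
To spell out the hazard in isolation, here is a small sketch using a
made-up miniature buffer type (toy_strbuf, TOY_STRBUF_INIT and
toy_prefix_matches are hypothetical names for this example, not dir.c
code).  A buffer that was merely zeroed has a NULL buf, and passing that
to strncmp() is undefined behaviour; a buffer set up with an
initializer that points buf at an empty string is always safe to
compare.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature string buffer for illustration. */
struct toy_strbuf {
    size_t alloc, len;
    char *buf;
};
static char toy_slopbuf[1]; /* shared empty string */
#define TOY_STRBUF_INIT { .buf = toy_slopbuf }

/* Returns 1 when it is safe to hand sb->buf to strncmp(). */
static int toy_buf_usable(const struct toy_strbuf *sb)
{
    return sb->buf != NULL;
}

/*
 * Returns 1 if the first n bytes of sb match base, 0 if they differ,
 * and -1 if sb was never properly initialized (buf is NULL), which is
 * the case where an unguarded strncmp() would be undefined behaviour.
 */
int toy_prefix_matches(const struct toy_strbuf *sb, const char *base, size_t n)
{
    assert(sb != NULL && base != NULL);
    if (!toy_buf_usable(sb))
        return -1;
    return !strncmp(sb->buf, base, n);
}
```

A memset()-zeroed toy_strbuf takes the -1 path, while one declared with
TOY_STRBUF_INIT can be compared immediately; that is the difference an
explicit initializer would make obvious to readers.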

> but would probably need to bring it back, of

If you need to bring it back, it's unrelated to my changes here, and
would only be because of the lack of basebuf initialization above.

> course you need some "release()" method for the
> UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
> (well, "dir_clear()" in that case), and it needs to call
> "strbuf_release()". It's just nicer if that boilerplate is all on
> destruction, but not also on struct/object setup.

The caller should *not* be initializing or tearing down
unpack_trees_options->dir beyond setting that field to NULL; it should
then leave it alone.

> We do need that setup in some cases (although a lot could just be
> replaced by lazy initialization), but if we don't....
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index a7e1712d236..de5cc6cd025 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>         static struct cache_entry *dfc;
>         struct pattern_list pl;
>         int free_pattern_list = 0;
> -       struct dir_struct dir = DIR_INIT;
>
>         if (o->reset == UNPACK_RESET_INVALID)
>                 BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
>
>         if (len > MAX_UNPACK_TREES)
>                 die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> -       if (o->dir)
> -               BUG("o->dir is for internal use only");

I think this was an important check that you've tossed without
replacement.  Historically, callers set up and tweaked o->dir with
various values.  With my patch, we are no longer allowing that...which
introduces a transition problem -- people might have written or are
now writing patches that make new calls of unpack_trees() previous to
this change of mine, but submit them after this change of mine gets
merged.  Without this check I added, they'd probably just do a
mechanical `o->dir->` change to `o->dir.` and assume it's good...and
then possibly have ugly bugs to hunt down.

So, I think it's helpful to have a check that provides an early
warning that tweaking o->dir is not only no longer required, but also
no longer allowed.
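
As a rough sketch of what that kind of guard buys us (hypothetical
names throughout; mini_options, mini_dir and mini_unpack() are
illustrations, and the -1 return stands in for where the real code
calls BUG()):

```c
#include <assert.h>
#include <stddef.h>

struct mini_dir {
	int flags;
};

struct mini_options {
	int preserve_ignored;
	struct mini_dir *dir; /* for internal use only */
};

/*
 * Returns 0 on success, or -1 if the caller pre-populated the
 * internal-only field (where the real code would BUG() out).
 */
static int mini_unpack(struct mini_options *o)
{
	struct mini_dir dir = { 0 };

	assert(o);
	if (o->dir)
		return -1; /* caller still follows the old convention */

	if (!o->preserve_ignored) {
		o->dir = &dir;
		o->dir->flags |= 1;
	}

	/* ... work that consults o->dir would go here ... */

	/* Never leave a pointer to our stack frame behind. */
	o->dir = NULL;
	return 0;
}
```

A caller that was written against the old convention and still sets
up the field fails immediately, instead of silently having its
settings overwritten.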

>         trace_performance_enter();
>         trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
> @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>                 BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
>
>         if (!o->preserve_ignored) {
> -               o->dir = &dir;
> -               o->dir->flags |= DIR_SHOW_IGNORED;
> -               setup_standard_excludes(o->dir);
> +               o->dir.flags |= DIR_SHOW_IGNORED;
> +               setup_standard_excludes(&o->dir);
>         }
>
>         if (!core_apply_sparse_checkout || !o->update)
> @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  done:
>         if (free_pattern_list)
>                 clear_pattern_list(&pl);
> -       if (o->dir) {
> -               dir_clear(o->dir);
> -               o->dir = NULL;
> -       }
> +       dir_clear(&o->dir);

Unconditionally calling dir_clear()...

>         trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>         trace_performance_leave("unpack_trees");
>         return ret;
> @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
>         pathbuf = xstrfmt("%.*s/", namelen, ce->name);
>
>         memset(&d, 0, sizeof(d));
> -       if (o->dir)
> -               d.exclude_per_dir = o->dir->exclude_per_dir;
> +       d.exclude_per_dir = o->dir.exclude_per_dir;
>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>         if (i)
>                 return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
> @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
>         if (ignore_case && icase_exists(o, name, len, st))
>                 return 0;
>
> -       if (o->dir &&
> -           is_excluded(o->dir, o->src_index, name, &dtype))
> +       if (is_excluded(&o->dir, o->src_index, name, &dtype))

Unconditionally calling is_excluded()...

>                 /*
>                  * ce->name is explicitly excluded, so it is Ok to
>                  * overwrite it.
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 71ffb7eeb0c..a8afbb20170 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -5,6 +5,7 @@
>  #include "strvec.h"
>  #include "string-list.h"
>  #include "tree-walk.h"
> +#include "dir.h"
>
>  #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
>
> @@ -95,7 +96,7 @@ struct unpack_trees_options {
>         struct index_state result;
>
>         struct pattern_list *pl; /* for internal use */
> -       struct dir_struct *dir; /* for internal use only */
> +       struct dir_struct dir; /* for internal use only */
>         struct checkout_metadata meta;
>  };
>

Not only did you drop the important safety check that o->dir not be
setup by the caller (which needs to be reinstated in some form), your
solution also involves unconditionally calling dir_clear() and
is_excluded().  It is not clear to me that those calls are safe...and
that they will continue to be safe in the future.  Even if it is safe
and will continue to be, I don't think this should be squashed into my
patches.  I think it should be a separate patch with its own commit
message that explicitly calls out this assumption.  Especially since
this is dir.c, which is an area where attempting to fix one very
simple little bug results in years of refactoring and fixing all kinds
of historical messes, sometimes waiting a year and a half for
responses to RFCs/review requests, and where we have to sometimes just
give up on attempting to understand the purpose of various bits of
code and instead rely on the regression tests and hope they are good
enough.  I still think that dir.c deserves a little warning at the
top, like the one I suggested in [1].

[1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/

* Re: [PATCH v3 00/11] Fix various issues around removal of untracked files/directories
       [not found]     ` <aaa8ea3b-0902-f9e6-c1a4-0ca2b1b2f57b@gmail.com>
@ 2021-10-01  2:08       ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-10-01  2:08 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Fedor Biryukov,
	Philip Oakley, Eric Sunshine

Hi Phillip,

On Thu, Sep 30, 2021 at 3:08 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Elijah
>
> On 27/09/2021 17:33, Elijah Newren via GitGitGadget wrote:
> > We have multiple codepaths that delete untracked files/directories but
> > shouldn't. There are also some codepaths where we delete untracked
> > files/directories intentionally (based on mailing list discussion), but
> > where that intent is not documented. We also have some codepaths that
> > preserve ignored files, which shouldn't. Fix the documentation, add several
> > new (mostly failing) testcases, fix some of the new testcases, and add
> > comments about some potential remaining problems. (I found these as a
> > side-effect of looking at [1], though [2] pointed out one explicitly while I
> > was working on it.)
> >
> > Note that I'm using Junio's declaration about checkout -f and reset --hard
> > (and also presuming that since read-tree --reset is porcelain that its
> > behavior should be left alone)[3] in this series.
> >
>
> I've had a read through and I don't have any specific comments, I like
> the way you have simplified adding the standard excludes for callers and
> making the existing value of reset invalid when converting to an enum. I
> think there is a small risk someone will complain about read-tree
> changing how it handles ignored files, but hopefully everyone was just
> passing ".gitignore" to --exclude-per-directory and they won't mind
> 'read-tree -m -u' removing ignored files now.

Thanks for taking a look!

* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-01  1:53                 ` Elijah Newren
@ 2021-10-01  8:15                   ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:53                     ` Ævar Arnfjörð Bjarmason
  2021-10-01 18:50                     ` Elijah Newren
  0 siblings, 2 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  8:15 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine


On Thu, Sep 30 2021, Elijah Newren wrote:

> On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Wed, Sep 29 2021, Elijah Newren wrote:
>>
>> > On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>
>> >> On Wed, Sep 29 2021, Elijah Newren wrote:
>> >>
> ...
>> > As per the next patch:
>> >
>> > int unpack_trees(..., struct unpack_trees_options *o)
>> > {
>> >     struct dir_struct dir = DIR_INIT;
>> >     ...
>> >     if (!o->preserve_ignored) {
>> >         /* Setup 'dir', make o->dir point to it */
>> >         ....
>> >         o->dir = &dir;
>> >     }
>> >     ...
>> >     if (o->dir)
>> >         /* cleanup */
>> >     ....
>> > }
>> >
>> > The caller doesn't touch o->dir (other than initializing it to zeros);
>> > unpack_trees() is wholly responsible for it.  I'd kind of like to
>> > entirely remove dir from unpack_trees_options(), but I need a way of
>> > passing it down through all the other functions in unpack-trees.c, and
>> > leaving it in unpack_trees_options seems the easiest way to do so.  So
>> > I just marked it as "for internal use only".
>>
>> I think I understand *how* it works, I'm puzzled by why you went for
>> this whole level of indirection when you're using a struct on the stack
>> in the end anyway, just ... put that in "struct unpack_trees_options"?
>>
>> Anyway, I see I have only myself to blame here, as you added these leak
>> fixes in the v2 in response to some of my offhand comments.
>>
>> FWIW I then went on to do some deeper fixes not just on these leaks but
>> the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
>> topic for a while. I suppose I only have myself to blame :)
>>
>> Below is a patch-on-top that I think makes this whole thing much simpler
>> by doing away with the pointer entirely.
>>
>> I suppose this is also a partial reply to
>> https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@mail.gmail.com/;
>> but I quite dislike this pattern of including a pointer like this where
>> it's not needed just for the practicalities of memory management.
>>
>> I.e. here you use DIR_INIT. In my local patches to fix up the wider
>> memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
>> and DIR_INIT will in turn be referenced by a
>> UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to
>> initialize with "UNPACK_TREES_OPTIONS_INIT" have that initialization
>> work all the way down the chain, and not need e.g. a manual
>> strbuf_init(), dir_init() etc.
>
> And you can keep using UNPACK_TREES_OPTIONS_INIT, because the
> unpack_trees_opts->dir should be initialized to NULL.

But I don't want it initialized to NULL, I want DIR_INIT....

>> I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
>> with DIR_INIT, 2021-07-01),
>
> I might be going on a tangent here, but looking at that patch, I'm
> worried that dir_init() was buggy and that you perpetuated that bug
> with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
> member, which neither dir_init() nor DIR_INIT initializes properly
> (using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
> dir.c relies on either strbuf_add() calls to just happen to work with
> this incorrectly initialized strbuf, or else use the strbuf_init()
> call in prep_exclude() to do so, using the following snippet:
>
>     if (!dir->basebuf.buf)
>         strbuf_init(&dir->basebuf, PATH_MAX);
>
> However, earlier in that same function we see
>
>     if (stk->baselen <= baselen &&
>         !strncmp(dir->basebuf.buf, base, stk->baselen))
>             break;
>
> So either that function can never have dir->basebuf.buf be NULL and
> the strbuf_init() is dead code, or else it's possible for us to
> trigger a segfault.  If it's the former, it may just be a ticking time
> bomb that will transform into the latter with some other change,
> because it's not at all obvious to me how dir->basebuf gets
> initialized appropriately to avoid that strncmp call.  Perhaps there
> is some invariant where exclude_stack is only set up by previous calls
> to prep_exclude() and those won't set up exclude_stack until first
> initializing basebuf.  But that really at least deserves a comment
> about how we're abusing basebuf, and would probably be cleaner if we
> initialized basebuf to STRBUF_INIT.

...because yes, I forgot about that when sending you the diff-on-top,
sorry. Yes that's buggy with the diff-on-top I sent you.

I've got that fixed in the version I have. I.e. first I add a
UNPACK_TREES_OPTIONS_INIT macro, then deal with that lazy initialization
case (at which point DIR_INIT starts initializing that strbuf), then
change the "dir_struct" from a pointer to embedding it, and finally fix
a memory leak with that new API.

WIP patches here:
https://github.com/avar/git/compare/avar/post-sanitize-leak-test-mode-add-and-use-revisions-release...avar/post-sanitize-leak-test-mode-unpack-trees-and-dir

>> but would probably need to bring it back, of
>
> If you need to bring it back, it's unrelated to my changes here, and
> would only be because of the lack of basebuf initialization above.

Yes, in this case. I mean that generally speaking I think it's a good
pattern to use to have structs be initialized by macros like this,
because it means you can embed them N levels deep (as we sometimes do)
without having to call functions to initialize them.

So yes, in this case as long as DIR_INIT is { 0 } it doesn't matter, but
it does as soon as it has a member that needs initialization, and
generally speaking for any FOO_INIT that needs a BAR_INIT ....
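
A minimal sketch of that chaining, using made-up types rather than
the real ones (mini_buf, mini_dir, mini_opts and their *_INIT macros
are all hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char empty_buf[1];

struct mini_buf {
	char *buf;
	size_t len;
};
#define MINI_BUF_INIT { .buf = empty_buf }

struct mini_dir {
	int flags;
	struct mini_buf basebuf;
};
#define MINI_DIR_INIT { .basebuf = MINI_BUF_INIT }

struct mini_opts {
	int update;
	struct mini_dir dir;
};
#define MINI_OPTS_INIT { .dir = MINI_DIR_INIT }

/* Returns 1 if the innermost member was set up, 0 otherwise. */
static int opts_ready(const struct mini_opts *o)
{
	assert(o);
	return o->dir.basebuf.buf != NULL;
}
```

With the outermost macro every level is initialized without any
function call, whereas a plain memset() to zero leaves the innermost
pointer NULL.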

>> course you need some "release()" method for the
>> UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
>> (well, "dir_clear()" in that case), and it needs to call
>> "strbuf_release()". It's just nicer if that boilerplate is all on
>> destruction, but not also on struct/object setup.
>
> The caller should *not* be initializing or tearing down
> unpack_trees_options->dir beyond setting that field to NULL; it should
> then leave it alone.

s/NULL/DIR_INIT/ in my version, but yes.

>> We do need that setup in some cases (although a lot could just be
>> replaced by lazy initialization), but if we don't....
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index a7e1712d236..de5cc6cd025 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>         static struct cache_entry *dfc;
>>         struct pattern_list pl;
>>         int free_pattern_list = 0;
>> -       struct dir_struct dir = DIR_INIT;
>>
>>         if (o->reset == UNPACK_RESET_INVALID)
>>                 BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
>>
>>         if (len > MAX_UNPACK_TREES)
>>                 die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>> -       if (o->dir)
>> -               BUG("o->dir is for internal use only");
>
> I think this was an important check that you've tossed without
> replacement.  Historically, callers set up and tweaked o->dir with
> various values.  With my patch, we are no longer allowing that...which
> introduces a transition problem -- people might have written or are
> now writing patches that make new calls of unpack_trees() previous to
> this change of mine, but submit them after this change of mine gets
> merged.  Without this check I added, they'd probably just do a
> mechanical `o->dir->` change to `o->dir.` and assume it's good...and
> then possibly have ugly bugs to hunt down.
>
> So, I think it's helpful to have a check that provides an early
> warning that tweaking o->dir is not only no longer required, but also
> no longer allowed.

The compiler will catch any such use of the pointer version on a
mis-merge, or do you just mean that the person running into that might
get the resolution wrong? I.e. before we could check o->dir being NULL
for "do we have an exclude", but &o->dir will always be true?

>>         trace_performance_enter();
>>         trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
>> @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>                 BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
>>
>>         if (!o->preserve_ignored) {
>> -               o->dir = &dir;
>> -               o->dir->flags |= DIR_SHOW_IGNORED;
>> -               setup_standard_excludes(o->dir);
>> +               o->dir.flags |= DIR_SHOW_IGNORED;
>> +               setup_standard_excludes(&o->dir);
>>         }
>>
>>         if (!core_apply_sparse_checkout || !o->update)
>> @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>  done:
>>         if (free_pattern_list)
>>                 clear_pattern_list(&pl);
>> -       if (o->dir) {
>> -               dir_clear(o->dir);
>> -               o->dir = NULL;
>> -       }
>> +       dir_clear(&o->dir);
>
> Unconditionally calling dir_clear()...

As before I'm not sure about bugs in the ad-hoc patch on top, but I
don't think this is a bug in my version linked above.

I.e. it's zero'd out, and the dir_clear() either ends up calling
free(NULL) or tries to loop over 0..N where N will be 0, no?
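
To illustrate the general shape of that argument (this is not
dir_clear() itself; mini_dir, mini_dir_add() and mini_dir_clear() are
hypothetical), a clear function written like this is a no-op on a
zeroed struct:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mini_dir {
	char **entries;
	size_t nr;
};

/* Appends a copy of 'name'; returns 0 on success, -1 on allocation failure. */
static int mini_dir_add(struct mini_dir *d, const char *name)
{
	size_t len = strlen(name) + 1;
	char **grown = realloc(d->entries, (d->nr + 1) * sizeof(char *));
	char *copy;

	if (!grown)
		return -1;
	d->entries = grown;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	d->entries[d->nr++] = copy;
	return 0;
}

/*
 * Safe on a zeroed struct: the loop runs zero times and free(NULL)
 * is defined to do nothing.  Leaves the struct zeroed again.
 */
static void mini_dir_clear(struct mini_dir *d)
{
	size_t i;

	assert(d);
	for (i = 0; i < d->nr; i++)
		free(d->entries[i]);
	free(d->entries);
	memset(d, 0, sizeof(*d));
}
```

Whether the real function stays that way in every code path is of
course the assumption being debated here.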

>>         trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>>         trace_performance_leave("unpack_trees");
>>         return ret;
>> @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
>>         pathbuf = xstrfmt("%.*s/", namelen, ce->name);
>>
>>         memset(&d, 0, sizeof(d));
>> -       if (o->dir)
>> -               d.exclude_per_dir = o->dir->exclude_per_dir;
>> +       d.exclude_per_dir = o->dir.exclude_per_dir;
>>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>>         if (i)
>>                 return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
>> @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
>>         if (ignore_case && icase_exists(o, name, len, st))
>>                 return 0;
>>
>> -       if (o->dir &&
>> -           is_excluded(o->dir, o->src_index, name, &dtype))
>> +       if (is_excluded(&o->dir, o->src_index, name, &dtype))
>
> Unconditionally calling is_excluded()...

Which will just return "it's not", won't it? Just like dir_clear() deals
with an "empty" dir_struct. There's existing callers of both with that
pattern in e.g. builtin/{add,clean}.c.

Maybe I've missed an edge case, but I think the only reason that "o->dir
&&" was there was because it was dynamically malloc'd before, but in my
version where we'll always have it initialized...

>>                 /*
>>                  * ce->name is explicitly excluded, so it is Ok to
>>                  * overwrite it.
>> diff --git a/unpack-trees.h b/unpack-trees.h
>> index 71ffb7eeb0c..a8afbb20170 100644
>> --- a/unpack-trees.h
>> +++ b/unpack-trees.h
>> @@ -5,6 +5,7 @@
>>  #include "strvec.h"
>>  #include "string-list.h"
>>  #include "tree-walk.h"
>> +#include "dir.h"
>>
>>  #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
>>
>> @@ -95,7 +96,7 @@ struct unpack_trees_options {
>>         struct index_state result;
>>
>>         struct pattern_list *pl; /* for internal use */
>> -       struct dir_struct *dir; /* for internal use only */
>> +       struct dir_struct dir; /* for internal use only */
>>         struct checkout_metadata meta;
>>  };
>>
>
> Not only did you drop the important safety check that o->dir not be
> setup by the caller (which needs to be reinstated in some form), your
> solution also involves unconditionally calling dir_clear() and
> is_excluded().  It is not clear to me that those calls are safe...and
> that they will continue to be safe in the future.

It is a common pattern we rely on, e.g. strbuf_release() and various
other custom free-like functions generally act as NOOP if they've got
nothing to do, just like free()...

> Even if it is safe
> and will continue to be, I don't think this should be squashed into my
> patches.  I think it should be a separate patch with its own commit
> message that explicitly calls out this assumption.  Especially since
> this is dir.c, which is an area where attempting to fix one very
> simple little bug results in years of refactoring and fixing all kinds
> of historical messes, sometimes waiting a year and a half for
> responses to RFCs/review requests, and where we have to sometimes just
> give up on attempting to understand the purpose of various bits of
> code and instead rely on the regression tests and hope they are good
> enough.  I still think that dir.c deserves a little warning at the
> top, like the one I suggested in [1].
>
> [1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/

*nod* I can always submit something like this afterwards.

Just on this series: Perhaps this discussion is a sign that this memory
leak fixing should be its own cleanup series where we could hash out any
approaches to doing that? I.e. as noted before I realize I'm to blame
for suggesting it in the first place, but those parts of these changes
don't seem like they're needed by other parts of the series (I tried
compiling with the two relevant patches ejected out).

Having a humongous set of memory leak fixes locally at this point, I
think it's generally not very worth the effort to fix a leak in b()
where a() calls b() and b() calls c(), and all of [abc]() are
leaking. I.e. often narrowly fixing leaks in b() will lead to different
solutions than if you're trying to resolve all of [abc](), as their
interaction comes into play.

Aside about safety: One thing I'll sometimes do when I'm unsure about
those sorts of fixes is to have my new INIT set a new "sentinel" field
to "12345" or whatever, then just BUG() out in an entry point in the API
that you can't avoid calling if it's not set like that, e.g. dir_clear()
or whatever the setup/work function is.

We don't have 100% test coverage, but we usually have at least *some*,
and doing that is good about catching e.g. a memset() at a distance, as
happens in this code with the merge code embedding the relevant struct
and memsetting it, which might be missed in some migration of just a
grep for "struct dir_struct" or whatever...
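
A small sketch of the memset()-at-a-distance case, with hypothetical
names (mini_opts, mini_merge, MINI_SENTINEL and mini_entry_point()
are made up, and the -1 return stands in for a BUG() call):

```c
#include <assert.h>
#include <string.h>

#define MINI_SENTINEL 12345

struct mini_opts {
	int update;
	int sentinel;
};
#define MINI_OPTS_INIT { .sentinel = MINI_SENTINEL }

/* An outer struct that embeds the options, as the merge code does. */
struct mini_merge {
	struct mini_opts opts;
	int depth;
};

/* Returns 0 if the options went through the macro, -1 otherwise. */
static int mini_entry_point(const struct mini_opts *o)
{
	assert(o);
	if (o->sentinel != MINI_SENTINEL)
		return -1; /* real code would BUG() here */
	return 0;
}
```

An embedding struct that is simply memset() to zero never mentions
the inner type by name, so a grep would miss it, but the sentinel
check still trips at the first entry point.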

* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-01  8:15                   ` Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:53                     ` Ævar Arnfjörð Bjarmason
  2021-10-01 18:50                     ` Elijah Newren
  1 sibling, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:53 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood, Eric Sunshine


On Fri, Oct 01 2021, Ævar Arnfjörð Bjarmason wrote:

> Aside about safety: One thing I'll sometimes do when I'm unsure about
> those sorts of fixes is to have my new INIT set a new "sentinel" field
> to "12345" or whatever, then just BUG() out in an entry point in the API
> that you can't avoid calling if it's not set like that, e.g. dir_clear()
> or whatever the setup/work function is.

For reference: Something like the below, which passes with my WIP
patches. Showing that no non-static entry point can reach the code in
unpack-trees.c without "foo" being 12345, which can only be the case if
callers have used the macro (and the code internal to unpack-trees.c is
easy enough to audit).

 unpack-trees.c | 25 +++++++++++++++++++++++++
 unpack-trees.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index d40af221e1c..f2365ecf215 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -199,6 +199,8 @@ void clear_unpack_trees_porcelain(struct unpack_trees_options *opts)
 {
 	strvec_clear(&opts->msgs_to_free);
 	dir_clear(&opts->dir);
+	if (opts->foo != 12345)
+		BUG("noes");
 	memset(opts->msgs, 0, sizeof(opts->msgs));
 }
 
@@ -1702,6 +1704,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	struct pattern_list pl;
 	int free_pattern_list = 0;
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
@@ -1903,6 +1908,9 @@ enum update_sparsity_result update_sparsity(struct unpack_trees_options *o)
 	unsigned old_show_all_errors;
 	int free_pattern_list = 0;
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	old_show_all_errors = o->show_all_errors;
 	o->show_all_errors = 1;
 
@@ -2033,6 +2041,8 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 int verify_uptodate(const struct cache_entry *ce,
 		    struct unpack_trees_options *o)
 {
+	if (o->foo != 12345)
+		BUG("noes");
 	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
 	return verify_uptodate_1(ce, o, ERROR_NOT_UPTODATE_FILE);
@@ -2417,6 +2427,9 @@ int threeway_merge(const struct cache_entry * const *stages,
 	int no_anc_exists = 1;
 	int i;
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	for (i = 1; i < o->head_idx; i++) {
 		if (!stages[i] || stages[i] == o->df_conflict_entry)
 			any_anc_missing = 1;
@@ -2580,6 +2593,9 @@ int twoway_merge(const struct cache_entry * const *src,
 	const struct cache_entry *oldtree = src[1];
 	const struct cache_entry *newtree = src[2];
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	if (o->merge_size != 2)
 		return error("Cannot do a twoway merge of %d trees",
 			     o->merge_size);
@@ -2654,6 +2670,9 @@ int bind_merge(const struct cache_entry * const *src,
 	const struct cache_entry *old = src[0];
 	const struct cache_entry *a = src[1];
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	if (o->merge_size != 1)
 		return error("Cannot do a bind merge of %d trees",
 			     o->merge_size);
@@ -2680,6 +2699,9 @@ int oneway_merge(const struct cache_entry * const *src,
 	const struct cache_entry *old = src[0];
 	const struct cache_entry *a = src[1];
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	if (o->merge_size != 1)
 		return error("Cannot do a oneway merge of %d trees",
 			     o->merge_size);
@@ -2717,6 +2739,9 @@ int stash_worktree_untracked_merge(const struct cache_entry * const *src,
 	const struct cache_entry *worktree = src[1];
 	const struct cache_entry *untracked = src[2];
 
+	if (o->foo != 12345)
+		BUG("noes");
+
 	if (o->merge_size != 2)
 		BUG("invalid merge_size: %d", o->merge_size);
 
diff --git a/unpack-trees.h b/unpack-trees.h
index 75b67f90ccd..8dae0938ad1 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -90,10 +90,12 @@ struct unpack_trees_options {
 
 	struct pattern_list *pl; /* for internal use */
 	struct checkout_metadata meta;
+	int foo;
 };
 #define UNPACK_TREES_OPTIONS_INIT { \
 	.msgs_to_free = STRVEC_INIT, \
 	.dir = DIR_INIT, \
+	.foo = 12345, \
 }
 
 void unpack_trees_init(struct unpack_trees_options *options);

* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-01  8:15                   ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:53                     ` Ævar Arnfjörð Bjarmason
@ 2021-10-01 18:50                     ` Elijah Newren
  2021-10-02  8:44                       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-10-01 18:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood

On Fri, Oct 1, 2021 at 1:47 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Thu, Sep 30 2021, Elijah Newren wrote:
>
> > On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >> On Wed, Sep 29 2021, Elijah Newren wrote:
> >>
> >> > On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
> >> > <avarab@gmail.com> wrote:
> >> >>
> >> >> On Wed, Sep 29 2021, Elijah Newren wrote:
> >> >>
> > ...
> >> > As per the next patch:
> >> >
> >> > int unpack_trees(..., struct unpack_trees_options *o)
> >> > {
> >> >     struct dir_struct dir = DIR_INIT;
> >> >     ...
> >> >     if (!o->preserve_ignored) {
> >> >         /* Setup 'dir', make o->dir point to it */
> >> >         ....
> >> >         o->dir = &dir;
> >> >     }
> >> >     ...
> >> >     if (o->dir)
> >> >         /* cleanup */
> >> >     ....
> >> > }
> >> >
> >> > The caller doesn't touch o->dir (other than initializing it to zeros);
> >> > unpack_trees() is wholly responsible for it.  I'd kind of like to
> >> > entirely remove dir from unpack_trees_options(), but I need a way of
> >> > passing it down through all the other functions in unpack-trees.c, and
> >> > leaving it in unpack_trees_options seems the easiest way to do so.  So
> >> > I just marked it as "for internal use only".
> >>
> >> I think I understand *how* it works, I'm puzzled by why you went for
> >> this whole level of indirection when you're using a struct on the stack
> >> in the end anyway, just ... put that in "struct unpack_trees_options"?
> >>
> >> Anyway, I see I have only myself to blame here, as you added these leak
> >> fixes in the v2 in response to some of my offhand comments.
> >>
> >> FWIW I then went on to do some deeper fixes not just on these leaks but
> >> the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
> >> topic for a while. I suppose I only have myself to blame :)
> >>
> >> Below is a patch-on-top that I think makes this whole thing much simpler
> >> by doing away with the pointer entirely.
> >>
> >> I suppose this is also a partial reply to
> >> https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@mail.gmail.com/;
> >> but I quite dislike this pattern of including a pointer like this where
> >> it's not needed just for the practicalities of memory management.
> >>
> >> I.e. here you use DIR_INIT. In my local patches to fix up the wider
> >> memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
> >> and DIR_INIT will in turn be referenced by a
> >> UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to
> >> initialize with "UNPACK_TREES_OPTIONS_INIT" have that initialization
> >> work all the way down the chain, and not need e.g. a manual
> >> strbuf_init(), dir_init() etc.
> >
> > And you can keep using UNPACK_TREES_OPTIONS_INIT, because the
> > unpack_trees_opts->dir should be initialized to NULL.
>
> But I don't want it initialized to NULL, I want DIR_INIT....

Why?  For what purpose?  How does that help anything?  I've seen you
say that you want it, but I haven't yet seen you state how it helps
you do anything easier.

What I really want, though, is not even to have it be a pointer, but
to avoid exposing internal implementation details inside a struct that
is meant to convey the public API.  Instead unpack-trees should do
something similar to merge-ort, where it hides all those internal-only
details (by e.g. having a void* priv that happens to point to a struct
unpack_trees_options_internal, the latter of which is only defined in
unpack-trees.c).  However, I didn't want to go through that work for
just one member.
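
For illustration only, a minimal sketch of that opaque-pointer shape.
The names are hypothetical (mini_opts, mini_opts_internal and
mini_unpack() are not git's API, and merge-ort's real layout differs
in detail); in a real split the internal struct would be defined only
in the .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header might expose --- */
struct mini_opts {
	int update;
	int preserve_ignored;
	void *priv; /* opaque; owned entirely by mini_unpack() */
};

/* --- what would be visible only inside the implementation file --- */
struct mini_opts_internal {
	int dir_flags;
	int merge_size;
};

/*
 * Returns the internal dir_flags value used for this run (0 or 1),
 * or -1 if the caller touched 'priv' or allocation failed.
 */
static int mini_unpack(struct mini_opts *o)
{
	struct mini_opts_internal *in;
	int ret;

	assert(o);
	if (o->priv)
		return -1; /* callers must leave priv alone */

	in = calloc(1, sizeof(*in));
	if (!in)
		return -1;
	o->priv = in;

	if (!o->preserve_ignored)
		in->dir_flags |= 1;
	ret = in->dir_flags;

	/* Tear down internal state before returning to the caller. */
	free(in);
	o->priv = NULL;
	return ret;
}
```

The point of the shape is that callers cannot even name the internal
fields, so there is nothing for them to set by accident.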

But you've inspired me to check if there are other fields that
shouldn't be exposed.  Turns out that there is a lot of cruft in
unpack_trees_options that callers shouldn't be messing with (and which
isn't at all clear to people trying to use the API): cache_bottom,
dir, msgs, msgs_to_free, nontrivial_merge, skip_sparse_checkout,
show_all_errors (!), unpack_rejects, df_conflict_entry, merge_size,
result, and perhaps pl.  A few of those have gotten slightly
entangled.  And there may have been others that people just started
setting because it was an existing field, and now I can't
differentiate between an intentional API usage passing some kind of
interesting value and an accidental setting of something meant to be
internal.

So maybe I'll submit some patches on top that rip these direct members
out of unpack_trees_options and push them inside some opaque
struct.

> >> I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
> >> with DIR_INIT, 2021-07-01),
> >
> > I might be going on a tangent here, but looking at that patch, I'm
> > worried that dir_init() was buggy and that you perpetuated that bug
> > with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
> > member, which neither dir_init() nor DIR_INIT initialize properly
> > (using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
> > dir.c relies on either strbuf_add() calls to just happen to work with
> > this incorrectly initialized strbuf, or else use the strbuf_init()
> > call in prep_exclude() to do so, using the following snippet:
> >
> >     if (!dir->basebuf.buf)
> >         strbuf_init(&dir->basebuf, PATH_MAX);
> >
> > However, earlier in that same function we see
> >
> >     if (stk->baselen <= baselen &&
> >         !strncmp(dir->basebuf.buf, base, stk->baselen))
> >             break;
> >
> > So either that function can never have dir->basebuf.buf be NULL and
> > the strbuf_init() is dead code, or else it's possible for us to
> > trigger a segfault.  If it's the former, it may just be a ticking time
> > bomb that will transform into the latter with some other change,
> > because it's not at all obvious to me how dir->basebuf gets
> > initialized appropriately to avoid that strncmp call.  Perhaps there
> > is some invariant where exclude_stack is only set up by previous calls
> > to prep_exclude() and those won't set up exclude_stack until first
> > initializing basebuf.  But that really at least deserves a comment
> > about how we're abusing basebuf, and would probably be cleaner if we
> > initialized basebuf to STRBUF_INIT.
>
> ...because yes, I forgot about that when sending you the diff-on-top,
> sorry. Yes that's buggy with the diff-on-top I sent you.

That bug didn't come from the diff-on-top you sent me, it came from
the commit already merged to master -- ce93a4c6127  (dir.[ch]: replace
dir_init() with DIR_INIT, 2021-07-01), merged as part of
ab/struct-init on Jul 16.

> I've got that fixed in the version I have. I.e. first I add a
> UNPACK_TREES_OPTIONS_INIT macro, then deal with that lazy initialization
> case (at which point DIR_INIT starts initializing that strbuf), then
> change the "dir_struct" from a pointer to embedding it, and finally fix
> a memory leak with that new API.
>
> WIP patches here:
> https://github.com/avar/git/compare/avar/post-sanitize-leak-test-mode-add-and-use-revisions-release...avar/post-sanitize-leak-test-mode-unpack-trees-and-dir

Yes, that fixes DIR_INIT nicely.  Looks good!

> >> but would probably need to bring it back, of
> >
> > If you need to bring it back, it's unrelated to my changes here, and
> > would only be because of the lack of basebuf initialization above.
>
> Yes, in this case. I mean that generally speaking I think it's a good
> pattern to use to have structs be initialized by macros like this,
> because it means you can embed them N levels deep (as we sometimes do)
> without having to call functions to initialize them.
>
> So yes, in this case as long as DIR_INIT is { 0 } it doesn't matter, but
> it does as soon as it has a member that needs initialization, and
> generally speaking for any FOO_INIT that needs a BAR_INIT ...

Callers SHOULD NOT call a function to initialize
unpack_trees_opts->dir in my patches.  It's a ****BUG**** if they do
so.  So if you're complaining that my changes force callers to also
invoke some additional function, then I think you're just not
understanding my patch.

So, I still see no reason given for wanting opts->dir to be a struct.
But maybe we can fix this by just removing 'dir' (and several other
members) from opts, so that callers can't initialize it in any way to
anything.

> >> course you need some "release()" method for the
> >> UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
> >> (well, "dir_clear()" in that case), and it needs to call
> >> "strbuf_release()". It's just nicer if that boilerplate is all on
> >> destruction, but not also on struct/object setup.
> >
> > The caller should *not* be initializing or tearing down
> > unpack_trees_options->dir beyond setting that field to NULL; it should
> > then leave it alone.
>
> s/NULL/DIR_INIT/ in my version, but yes.
>
> >> We do need that setup in some cases (although a lot could just be
> >> replaced by lazy initialization), but if we don't....
> >>
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index a7e1712d236..de5cc6cd025 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >>         static struct cache_entry *dfc;
> >>         struct pattern_list pl;
> >>         int free_pattern_list = 0;
> >> -       struct dir_struct dir = DIR_INIT;
> >>
> >>         if (o->reset == UNPACK_RESET_INVALID)
> >>                 BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
> >>
> >>         if (len > MAX_UNPACK_TREES)
> >>                 die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> >> -       if (o->dir)
> >> -               BUG("o->dir is for internal use only");
> >
> > I think this was an important check that you've tossed without
> > replacement.  Historically, callers set up and tweaked o->dir with
> > various values.  With my patch, we are no longer allowing that..which
> > introduces a transition problem -- people might have written or are
> > now writing patches that make new calls of unpack_trees() previous to
> > this change of mine, but submit them after this change of mine gets
> > merged.  Without this check I added, they'd probably just do a
> > mechanical `o->dir->` change to `o->dir.` and assume it's good...and
> > then possibly have ugly bugs to hunt down.
> >
> > So, I think it's helpful to have a check that provides an early
> > warning that tweaking o->dir is not only no longer required, but also
> > no longer allowed.
>
> The compiler will catch any such use of the pointer version on a
> mis-merge, or do you just mean that the person running into that might
> get the resolution wrong? I.e. before we could check o->dir being NULL
> for "do we have an exclude", but &o->dir will always be true?

The compiler catches and reports, and then the human sees the error
and just transliterates "o->dir->" to "o->dir.".  And then it
compiles, and the person assumes they fixed it correctly, but the
transliteration was WRONG and has subtle bugs because they had been
setting up o->dir with special values and something needs to warn them
that they shouldn't be touching o->dir anymore.  You removed the
safety check that would have let them know that their straightforward
transliteration was wrong.  I added that safety check intentionally,
and don't like seeing it ripped out without a replacement.
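
To make the role of that check concrete, here is a minimal sketch
using stand-in types (demo_dir, demo_opts, demo_unpack and the status
enum are hypothetical; the real code calls BUG() rather than returning
a status):

```c
#include <assert.h>
#include <stddef.h>

struct demo_dir { unsigned flags; };

struct demo_opts {
	int preserve_ignored;
	struct demo_dir *dir;	/* for internal use only */
};

enum demo_status { DEMO_OK, DEMO_CALLER_SET_DIR };

enum demo_status demo_unpack(struct demo_opts *o, unsigned *flags_used)
{
	struct demo_dir dir = { 0 };

	assert(o != NULL && flags_used != NULL);

	/* Early warning: a caller still doing the old-style setup. */
	if (o->dir)
		return DEMO_CALLER_SET_DIR;

	if (!o->preserve_ignored) {
		dir.flags |= 1;	/* stand-in for DIR_SHOW_IGNORED */
		o->dir = &dir;
	}
	*flags_used = o->dir ? o->dir->flags : 0;

	o->dir = NULL;		/* the function owns the field end to end */
	return DEMO_OK;
}
```

With an embedded struct there is no NULL state to test, so a caller
that still fills in the field would compile and run without any such
warning.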

> >>         trace_performance_enter();
> >>         trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
> >> @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >>                 BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
> >>
> >>         if (!o->preserve_ignored) {
> >> -               o->dir = &dir;
> >> -               o->dir->flags |= DIR_SHOW_IGNORED;
> >> -               setup_standard_excludes(o->dir);
> >> +               o->dir.flags |= DIR_SHOW_IGNORED;
> >> +               setup_standard_excludes(&o->dir);
> >>         }
> >>
> >>         if (!core_apply_sparse_checkout || !o->update)
> >> @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >>  done:
> >>         if (free_pattern_list)
> >>                 clear_pattern_list(&pl);
> >> -       if (o->dir) {
> >> -               dir_clear(o->dir);
> >> -               o->dir = NULL;
> >> -       }
> >> +       dir_clear(&o->dir);
> >
> > Unconditionally calling dir_clear()...
>
> As before I'm not sure about bugs in the ad-hoc patch on top, but I
> don't think this is a bug in my version linked above.
>
> I.e. it's zero'd out, and the dir_clear() either ends up calling
> free(NULL) or tries to loop over 0..N where N will be 0, no?
>
> >>         trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
> >>         trace_performance_leave("unpack_trees");
> >>         return ret;
> >> @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
> >>         pathbuf = xstrfmt("%.*s/", namelen, ce->name);
> >>
> >>         memset(&d, 0, sizeof(d));
> >> -       if (o->dir)
> >> -               d.exclude_per_dir = o->dir->exclude_per_dir;
> >> +       d.exclude_per_dir = o->dir.exclude_per_dir;
> >>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
> >>         if (i)
> >>                 return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
> >> @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
> >>         if (ignore_case && icase_exists(o, name, len, st))
> >>                 return 0;
> >>
> >> -       if (o->dir &&
> >> -           is_excluded(o->dir, o->src_index, name, &dtype))
> >> +       if (is_excluded(&o->dir, o->src_index, name, &dtype))
> >
> > Unconditionally calling is_excluded()...
>
> Which will just return "it's not", won't it? Just like dir_clear() deals
> with an "empty" dir_struct. There's existing callers of both with that
> pattern in e.g. builtin/{add,clean}.c.

That's a question that someone writing this patch should investigate
and document.  I read through is_excluded() and the functions it
calls, and I _think_ it's possibly correct in the current code.  But
it's not trivial to verify.  And I'd look a bit closer at it to be
sure it's correct before making this change.

Yeah, I'm being pretty picky here, but this is dir.c we are dealing with.

> Maybe I've missed an edge case, but I think the only reason that "o->dir
> &&" was there was because it was dynamically malloc'd before, but in my
> version where we'll always have it initialized...

I think after checking the code to verify this, that it deserves being
mentioned in the commit message (at least given that this is dir.c
we're talking about).

> >>                 /*
> >>                  * ce->name is explicitly excluded, so it is Ok to
> >>                  * overwrite it.
> >> diff --git a/unpack-trees.h b/unpack-trees.h
> >> index 71ffb7eeb0c..a8afbb20170 100644
> >> --- a/unpack-trees.h
> >> +++ b/unpack-trees.h
> >> @@ -5,6 +5,7 @@
> >>  #include "strvec.h"
> >>  #include "string-list.h"
> >>  #include "tree-walk.h"
> >> +#include "dir.h"
> >>
> >>  #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
> >>
> >> @@ -95,7 +96,7 @@ struct unpack_trees_options {
> >>         struct index_state result;
> >>
> >>         struct pattern_list *pl; /* for internal use */
> >> -       struct dir_struct *dir; /* for internal use only */
> >> +       struct dir_struct dir; /* for internal use only */
> >>         struct checkout_metadata meta;
> >>  };
> >>
> >
> > Not only did you drop the important safety check that o->dir not be
> > setup by the caller (which needs to be reinstated in some form), your
> > solution also involves unconditionally calling dir_clear() and
> > is_excluded().  It is not clear to me that those calls are safe...and
> > that they will continue to be safe in the future.
>
> It is a common pattern we rely on, e.g. strbuf_release() and various
> other custom free-like functions generally act as NOOP if they've got
> nothing to do, just like free()...

Yeah, "common pattern" is a useful way to want things to behave, and
people assuming that in other parts of git's codebase is probably
fine, but I don't trust dir.c to already behave according to common
patterns.  My mistrust of it is deep enough that I think someone
should verify the relevant parts of dir.c do behave that way before
making changes like this, and then explicitly call it out in the
commit message when the change is made.  dir.c had some weird dragons
in it.  And after being repeatedly sucked back into dir.c problems for
years because of weird side effects, and seeing how many piles of
nearly-cancelling bugs it had, I don't trust it anymore.

> > Even if it is safe
> > and will continue to be, I don't think this should be squashed into my
> > patches.  I think it should be a separate patch with its own commit
> > message that explicitly calls out this assumption.  Especially since
> > this is dir.c, which is an area where attempting to fix one very
> > simple little bug results in years of refactoring and fixing all kinds
> > of historical messes, sometimes waiting a year and a half for
> > responses to RFCs/review requests, and where we have to sometimes just
> > give up on attempting to understand the purpose of various bits of
> > code and instead rely on the regression tests and hope they are good
> > enough.  I still think that dir.c deserves a little warning at the
> > top, like the one I suggested in [1].
> >
> > [1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/
>
> *nod* I can always submit something like this afterwards.

:-)

> Just on this series: Perhaps this discussion is a sign that this memory
> leak fixing should be its own cleanup series where we could hash out any
> approaches to doing that? I.e. as noted before I realize I'm to blame
> for suggesting it in the first place, but those parts of these changes
> don't seem like they're needed by other parts of the series (I tried
> compiling with the two relevant patches ejected out).
>
> Having a humongous set of memory leak fixes locally at this point, I
> think it's generally not very worth the effort to fix a leak in b()
> where a() calls b() and b() calls c(), and all of [abc]() are
> leaking. I.e. often narrowly fixing leaks in b() will lead to different
> solutions than if you're trying to resolve all of [abc](), as their
> interaction comes into play.

The patch we are commenting on isn't a leakfix, and is very much
integral to the series for reasons other than actual leakfix (which
occurred two patches before this one).  It's simplifying the API to
not require a bunch of boilerplate setup of opts->dir and instead
requiring callers to just set a simple boolean INSTEAD (and only if
they want non-default behavior, rather than requiring setting
opts->dir by all those who did want default behavior).  That way, when
I need to make sure several additional callers who should have gotten
the default behavior actually get it, they don't have to call a bunch
of functions to setup and cleanup opts->dir.

> Aside about safety: One thing I'll sometimes do when I'm unsure about
> those sorts of fixes is to have my new INIT set a new "sentinel" field
> to "12345" or whatever, then just BUG() out in an entry point in the API
> that you can't avoid calling if it's not set like that, e.g. dir_clear()
> or whatever the setup/work function is.
>
> We don't have 100% test coverage, but we usually have at least *some*,
> and doing that is good about catching e.g. a memset() at a distance, as
> happens in this code with the merge code embedding the relevant struct
> and memsetting it, which might be missed in some migration of just a
> grep for "struct dir_struct" or whatever...


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-01 18:50                     ` Elijah Newren
@ 2021-10-02  8:44                       ` Ævar Arnfjörð Bjarmason
  2021-10-03 22:21                         ` Ævar Arnfjörð Bjarmason
  2021-10-04 13:45                         ` Elijah Newren
  0 siblings, 2 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-02  8:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood


On Fri, Oct 01 2021, Elijah Newren wrote:

> On Fri, Oct 1, 2021 at 1:47 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>> On Thu, Sep 30 2021, Elijah Newren wrote:
>>
>> > On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>
>> >> On Wed, Sep 29 2021, Elijah Newren wrote:
>> >>
>> >> > On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
>> >> > <avarab@gmail.com> wrote:
>> >> >>
>> >> >> On Wed, Sep 29 2021, Elijah Newren wrote:
>> >> >>
>> > ...
>> >> > As per the next patch:
>> >> >
>> >> > int unpack_trees(..., struct unpack_trees_options *o)
>> >> > {
>> >> >     struct dir_struct dir = DIR_INIT;
>> >> >     ...
>> >> >     if (!o->preserve_ignored) {
>> >> >         /* Setup 'dir', make o->dir point to it */
>> >> >         ....
>> >> >         o->dir = &dir;
>> >> >     }
>> >> >     ...
>> >> >     if (o->dir)
>> >> >         /* cleanup */
>> >> >     ....
>> >> > }
>> >> >
>> >> > The caller doesn't touch o->dir (other than initializing it to zeros);
>> >> > unpack_trees() is wholly responsible for it.  I'd kind of like to
>> >> > entirely remove dir from unpack_trees_options(), but I need a way of
>> >> > passing it down through all the other functions in unpack-trees.c, and
>> >> > leaving it in unpack_trees_options seems the easiest way to do so.  So
>> >> > I just marked it as "for internal use only".
>> >>
>> >> I think I understand *how* it works, I'm puzzled by why you went for
>> >> this whole level of indirection when you're using a struct on the stack
>> >> in the end anyway, just ... put that in "struct unpack_trees_options"?
>> >>
>> >> Anyway, I see I have only myself to blame here, as you added these leak
>> >> fixes in the v2 in response to some of my offhand comments.
>> >>
>> >> FWIW I then went on to do some deeper fixes not just on these leaks but
>> >> the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
>> >> topic for a while. I suppose I only have myself to blame :)
>> >>
>> >> Below is a patch-on-top that I think makes this whole thing much simpler
>> >> by doing away with the pointer entirely.
>> >>
>> >> I suppose this is also a partial reply to
>> >> https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@mail.gmail.com/;
>> >> but I quite dislike this pattern of including a pointer like this where
>> >> it's not needed just for the practicalities of memory management.
>> >>
>> >> I.e. here you use DIR_INIT. In my local patches to fix up the wider
>> >> memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
>> >> and DIR_INIT will in turn be referenced by a
>> >> UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to
>> >> initialize with "UNPACK_TREES_OPTIONS_INIT" to have that initialization
>> >> work all the way down the chain, and not need e.g. a manual
>> >> strbuf_init(), dir_init() etc.
>> >
>> > And you can keep using UNPACK_TREES_OPTIONS_INIT, because the
>> > unpack_trees_opts->dir should be initialized to NULL.
>>
>> But I don't want it initialized to NULL, I want DIR_INIT....
>
> Why?  For what purpose?  How does that help anything?  I've seen you
> say that you want it, but I haven't yet seen you state how it helps
> you do anything easier.

Upthread of here in <87k0ixrv23.fsf@evledraar.gmail.com> I linked to WIP
patches that get rid of lazy initialization in dir.c. I.e. this change:
https://github.com/avar/git/commit/5a18133e927f82a9b6ffcb0c43c3f8657f8836af

[I see from reading ahead a bit that you saw that later...]

I.e. if you initialize it to NULL the embedded strbuf won't be in the
right state to be used, which is why we initialize it on the fly in that
part of prep_exclude(). If everything uses DIR_INIT we can just have it
be initialized already.

More generally I think that even if you don't have an immediate use-case
like that it makes sense to consistently use the macros, because once
something *does* need a strbuf or whatever you're not needing to do a
refactoring of ensuring that everything that hardcoded a "{ 0 }" or its
own memset() is converted to use it, or needing to do initialization
on-the-fly as in that code in prep_exclude().
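
A minimal sketch of that chaining, with stand-in types rather than the
real strbuf/dir_struct/unpack_trees_options (every demo_* and DEMO_*
name below is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for strbuf: "buf" must never be NULL once initialized. */
struct demo_buf {
	size_t len;
	char *buf;
};
static char demo_slop[1];	/* shared empty string */
#define DEMO_BUF_INIT { .buf = demo_slop }

/* Stand-in for dir_struct, embedding the buffer. */
struct demo_dir {
	int flags;
	struct demo_buf basebuf;
};
#define DEMO_DIR_INIT { .basebuf = DEMO_BUF_INIT }

/* Stand-in for unpack_trees_options, embedding the dir. */
struct demo_opts {
	int preserve_ignored;
	struct demo_dir dir;
};
#define DEMO_OPTS_INIT { .dir = DEMO_DIR_INIT }

/* No lazy "if (!buf) init" needed, as long as the INIT chain was used. */
int demo_base_is_empty(const struct demo_opts *o)
{
	assert(o->dir.basebuf.buf != NULL);
	return o->dir.basebuf.buf[0] == '\0';
}
```

A caller writing "struct demo_opts o = DEMO_OPTS_INIT;" gets a usable
buffer two levels down, whereas "{ 0 }" or a memset() would leave
basebuf.buf as NULL.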

> What I really want, though, is not even to have it be a pointer, but
> to avoid exposing internal implementation details inside a struct that
> is meant to convey the public API.  Instead unpack-trees should do
> something similar to merge-ort, where it hides all those internal-only
> details (by e.g. having a void* priv that happens to point to a struct
> unpack_trees_options_internal, the latter of which is only defined in
> unpack_trees.c).  However, I didn't want to go through that work for
> just one member.

Yes, and I like the part of your change that stops callers from changing
"struct dir_struct" and instead exposes a field in unpack_trees_options
to flag the desired behavior. That the code then wants to use a "struct
dir_struct" can just be an implementation detail.

> But you've inspired me to check if there are other fields that
> shouldn't be exposed.  Turns out that there is a lot of cruft in
> unpack_trees_options that callers shouldn't be messing with (and which
> isn't at all clear to people trying to use the API): cache_bottom,
> dir, msgs, msgs_to_free, nontrivial_merge, skip_sparse_checkout,
> show_all_errors (!), unpack_rejects, df_conflict_entry, merge_size,
> result, and perhaps pl.  A few of those have gotten slightly
> entangled.  And there may have been others that people just started
> setting because it was an existing field, and now I can't
> differentiate between an intentional API usage passing some kind of
> interesting value and an accidental setting of something meant to be
> internal.
>
> So maybe I'll submit some patches on top that rip these direct members
> out of unpack_trees_options and push them inside some opaque
> struct.

Sure, that sounds good. I only had a mild objection to doing it in a way
where you'll need that sort of code I removed in the linked commit in
prep_exclude() because you were trying not to expose that at any cost,
including via some *_INIT macro. I.e. if it's private we can just name
it "priv_*" or have a:

    struct dont_touch_this {
        struct dir_struct dir;
    };

Which are both ways of /messaging/ that it's private, and since the
target audience is just the rest of the git.git codebase I think that
ultimately something that 1) sends the right message 2) makes accidents
pretty much impossible suffices. I.e. you don't accidentally introduce a
new API user accessing a field called "->priv_*" or
"->private_*". Someone will review those patches...

>> >> I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
>> >> with DIR_INIT, 2021-07-01),
>> >
>> > I might be going on a tangent here, but looking at that patch, I'm
>> > worried that dir_init() was buggy and that you perpetuated that bug
>> > with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
>> > member, which neither dir_init() nor DIR_INIT initialize properly
>> > (using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
>> > dir.c relies on either strbuf_add() calls to just happen to work with
>> > this incorrectly initialized strbuf, or else use the strbuf_init()
>> > call in prep_exclude() to do so, using the following snippet:
>> >
>> >     if (!dir->basebuf.buf)
>> >         strbuf_init(&dir->basebuf, PATH_MAX);
>> >
>> > However, earlier in that same function we see
>> >
>> >     if (stk->baselen <= baselen &&
>> >         !strncmp(dir->basebuf.buf, base, stk->baselen))
>> >             break;
>> >
>> > So either that function can never have dir->basebuf.buf be NULL and
>> > the strbuf_init() is dead code, or else it's possible for us to
>> > trigger a segfault.  If it's the former, it may just be a ticking time
>> > bomb that will transform into the latter with some other change,
>> > because it's not at all obvious to me how dir->basebuf gets
>> > initialized appropriately to avoid that strncmp call.  Perhaps there
>> > is some invariant where exclude_stack is only set up by previous calls
>> > to prep_exclude() and those won't set up exclude_stack until first
>> > initializing basebuf.  But that really at least deserves a comment
>> > about how we're abusing basebuf, and would probably be cleaner if we
>> > initialized basebuf to STRBUF_INIT.
>>
>> ...because yes, I forgot about that when sending you the diff-on-top,
>> sorry. Yes that's buggy with the diff-on-top I sent you.
>
> That bug didn't come from the diff-on-top you sent me, it came from
> the commit already merged to master -- ce93a4c6127  (dir.[ch]: replace
> dir_init() with DIR_INIT, 2021-07-01), merged as part of
> ab/struct-init on Jul 16.

Ah, I misunderstood you there. I'll look at that / fix it. Sorry.

>> I've got that fixed in the version I have. I.e. first I add a
>> UNPACK_TREES_OPTIONS_INIT macro, then deal with that lazy initialization
>> case (at which point DIR_INIT starts initializing that strbuf), then
>> change the "dir_struct" from a pointer to embedding it, and finally fix
>> a memory leak with that new API.
>>
>> WIP patches here:
>> https://github.com/avar/git/compare/avar/post-sanitize-leak-test-mode-add-and-use-revisions-release...avar/post-sanitize-leak-test-mode-unpack-trees-and-dir
>
> Yes, that fixes DIR_INIT nicely.  Looks good!
>
>> >> but would probably need to bring it back, of
>> >
>> > If you need to bring it back, it's unrelated to my changes here, and
>> > would only be because of the lack of basebuf initialization above.
>>
>> Yes, in this case. I mean that generally speaking I think it's a good
>> pattern to use to have structs be initialized by macros like this,
>> because it means you can embed them N levels deep (as we sometimes do)
>> without having to call functions to initialize them.
>>
>> So yes, in this case as long as DIR_INIT is { 0 } it doesn't matter, but
>> it does as soon as it has a member that needs initialization, and
>> generally speaking for any FOO_INIT that needs a BAR_INIT ...
>
> Callers SHOULD NOT call a function to initialize
> unpack_trees_opts->dir in my patches.  It's a ****BUG**** if they do
> so.  So if you're complaining that my changes force callers to also
> invoke some additional function, then I think you're just not
> understanding my patch.
>
> So, I still see no reason given for wanting opts->dir to be a struct.
> But maybe we can fix this by just removing 'dir' (and several other
> members) from opts, so that callers can't initialize it in any way to
> anything.

I think your patches are fine as-is, and yes that would be a bug without
some of the changes I have in that WIP series.

As noted in
https://lore.kernel.org/git/87fstlrumj.fsf@evledraar.gmail.com/ I think
we're just having a side-discussion about init patterns, and what
direction to take these APIs when it comes to that...

>> >> course you need some "release()" method for the
>> >> UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
>> >> (well, "dir_clear()" in that case), and it needs to call
>> >> "strbuf_release()". It's just nicer if that boilerplate is all on
>> >> destruction, but not also on struct/object setup.
>> >
>> > The caller should *not* be initializing or tearing down
>> > unpack_trees_options->dir beyond setting that field to NULL; it should
>> > then leave it alone.
>>
>> s/NULL/DIR_INIT/ in my version, but yes.
>>
>> >> We do need that setup in some cases (although a lot could just be
>> >> replaced by lazy initialization), but if we don't....
>> >>
>> >> diff --git a/unpack-trees.c b/unpack-trees.c
>> >> index a7e1712d236..de5cc6cd025 100644
>> >> --- a/unpack-trees.c
>> >> +++ b/unpack-trees.c
>> >> @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>> >>         static struct cache_entry *dfc;
>> >>         struct pattern_list pl;
>> >>         int free_pattern_list = 0;
>> >> -       struct dir_struct dir = DIR_INIT;
>> >>
>> >>         if (o->reset == UNPACK_RESET_INVALID)
>> >>                 BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
>> >>
>> >>         if (len > MAX_UNPACK_TREES)
>> >>                 die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>> >> -       if (o->dir)
>> >> -               BUG("o->dir is for internal use only");
>> >
>> > I think this was an important check that you've tossed without
>> > replacement.  Historically, callers set up and tweaked o->dir with
>> > various values.  With my patch, we are no longer allowing that..which
>> > introduces a transition problem -- people might have written or are
>> > now writing patches that make new calls of unpack_trees() previous to
>> > this change of mine, but submit them after this change of mine gets
>> > merged.  Without this check I added, they'd probably just do a
>> > mechanical `o->dir->` change to `o->dir.` and assume it's good...and
>> > then possibly have ugly bugs to hunt down.
>> >
>> > So, I think it's helpful to have a check that provides an early
>> > warning that tweaking o->dir is not only no longer required, but also
>> > no longer allowed.
>>
>> The compiler will catch any such use of the pointer version on a
>> mis-merge, or do you just mean that the person running into that might
>> get the resolution wrong? I.e. before we could check o->dir being NULL
>> for "do we have an exclude", but &o->dir will always be true?
>
> The compiler catches and reports, and then the human sees the error
> and just transliterates "o->dir->" to "o->dir.".  And then it
> compiles, and the person assumes they fixed it correctly, but the
> transliteration was WRONG and has subtle bugs because they had been
> setting up o->dir with special values and something needs to warn them
> that they shouldn't be touching o->dir anymore.  You removed the
> safety check that would have let them know that their straightforward
> transliteration was wrong.  I added that safety check intentionally,
> and don't like seeing it ripped out without a replacement.

I think that in general we've dealt with that in other places, e.g. if
we say have a "struct strbuf *" and memzero the containing struct we can
check against NULL as an alias for "do we have this at all", but if it's
a "struct strbuf" initialized with STRBUF_INIT we'll either need a
"have_the_string" side-member, or (if unambiguous) to check if .len == 0
(or more nastily perhaps, check .alloc or if it's the slopbuf).
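
Roughly, and only as a sketch with made-up types (demo_dir, opts_ptr,
opts_embedded and the have_dir flag are all hypothetical), the two
shapes compare like this:

```c
#include <assert.h>
#include <stddef.h>

struct demo_dir { const char *exclude_per_dir; };

/* Pointer member: NULL doubles as "no excludes were set up". */
struct opts_ptr {
	struct demo_dir *dir;
};

/* Embedded member: always present, so presence needs its own flag. */
struct opts_embedded {
	struct demo_dir dir;
	unsigned have_dir : 1;
};

int ptr_has_excludes(const struct opts_ptr *o)
{
	assert(o != NULL);
	return o->dir != NULL;
}

int embedded_has_excludes(const struct opts_embedded *o)
{
	assert(o != NULL);
	/* "&o->dir" is never NULL, so testing it would always be true. */
	return o->have_dir;
}
```

The side-member costs one bit, but it keeps the "do we have this at
all" question explicit instead of inferring it from the contents.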

>> >>         trace_performance_enter();
>> >>         trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
>> >> @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>> >>                 BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
>> >>
>> >>         if (!o->preserve_ignored) {
>> >> -               o->dir = &dir;
>> >> -               o->dir->flags |= DIR_SHOW_IGNORED;
>> >> -               setup_standard_excludes(o->dir);
>> >> +               o->dir.flags |= DIR_SHOW_IGNORED;
>> >> +               setup_standard_excludes(&o->dir);
>> >>         }
>> >>
>> >>         if (!core_apply_sparse_checkout || !o->update)
>> >> @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>> >>  done:
>> >>         if (free_pattern_list)
>> >>                 clear_pattern_list(&pl);
>> >> -       if (o->dir) {
>> >> -               dir_clear(o->dir);
>> >> -               o->dir = NULL;
>> >> -       }
>> >> +       dir_clear(&o->dir);
>> >
>> > Unconditionally calling dir_clear()...
>>
>> As before I'm not sure about bugs in the ad-hoc patch on top, but I
>> don't think this is a bug in my version linked above.
>>
>> I.e. it's zero'd out, and the dir_clear() either ends up calling
>> free(NULL) or tries to loop over 0..N where N will be 0, no?
>>
>> >>         trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>> >>         trace_performance_leave("unpack_trees");
>> >>         return ret;
>> >> @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
>> >>         pathbuf = xstrfmt("%.*s/", namelen, ce->name);
>> >>
>> >>         memset(&d, 0, sizeof(d));
>> >> -       if (o->dir)
>> >> -               d.exclude_per_dir = o->dir->exclude_per_dir;
>> >> +       d.exclude_per_dir = o->dir.exclude_per_dir;
>> >>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>> >>         if (i)
>> >>                 return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
>> >> @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
>> >>         if (ignore_case && icase_exists(o, name, len, st))
>> >>                 return 0;
>> >>
>> >> -       if (o->dir &&
>> >> -           is_excluded(o->dir, o->src_index, name, &dtype))
>> >> +       if (is_excluded(&o->dir, o->src_index, name, &dtype))
>> >
>> > Unconditionally calling is_excluded()...
>>
>> Which will just return "it's not", won't it? Just like dir_clear() deals
>> with an "empty" dir_struct. There's existing callers of both with that
>> pattern in e.g. builtin/{add,clean}.c.
>
> That's a question that someone writing this patch should investigate
> and document.  I read through is_excluded() and the functions it
> calls, and I _think_ it's possibly correct in the current code.  But
> it's not trivial to verify.  And I'd look a bit closer at it to be
> sure it's correct before making this change.
>
> Yeah, I'm being pretty picky here, but this is dir.c we are dealing with.

I'll make sure to make that clear / test it etc. if & when I do end up
submitting that WIP stuff...

>> Maybe I've missed an edge case, but I think the only reason that "o->dir
>> &&" was there was because it was dynamically malloc'd before, but in my
>> version where we'll always have it initialized...
>
> I think after checking the code to verify this, that it deserves being
> mentioned in the commit message (at least given that this is dir.c
> we're talking about).

*nod*. See also my side-thread <877dexrqvg.fsf@evledraar.gmail.com> for
some exhaustive verification. Will reference that...

>> >>                 /*
>> >>                  * ce->name is explicitly excluded, so it is Ok to
>> >>                  * overwrite it.
>> >> diff --git a/unpack-trees.h b/unpack-trees.h
>> >> index 71ffb7eeb0c..a8afbb20170 100644
>> >> --- a/unpack-trees.h
>> >> +++ b/unpack-trees.h
>> >> @@ -5,6 +5,7 @@
>> >>  #include "strvec.h"
>> >>  #include "string-list.h"
>> >>  #include "tree-walk.h"
>> >> +#include "dir.h"
>> >>
>> >>  #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
>> >>
>> >> @@ -95,7 +96,7 @@ struct unpack_trees_options {
>> >>         struct index_state result;
>> >>
>> >>         struct pattern_list *pl; /* for internal use */
>> >> -       struct dir_struct *dir; /* for internal use only */
>> >> +       struct dir_struct dir; /* for internal use only */
>> >>         struct checkout_metadata meta;
>> >>  };
>> >>
>> >
>> > Not only did you drop the important safety check that o->dir not be
>> > setup by the caller (which needs to be reinstated in some form), your
>> > solution also involves unconditionally calling dir_clear() and
>> > is_excluded().  It is not clear to me that those calls are safe...and
>> > that they will continue to be safe in the future.
>>
>> It is a common pattern we rely on, e.g. strbuf_release() and various
>> other custom free-like functions generally act as NOOP if they've got
>> nothing to do, just like free()...
>
> Yeah, "common pattern" is a useful way to want things to behave, and
> people assuming that in other parts of git's codebase is probably
> fine, but I don't trust dir.c to already behave according to common
> patterns.  My mistrust of it is deep enough that I think someone
> should verify the relevant parts of dir.c do behave that way before
> making changes like this, and then explicitly call it out in the
> commit message when the change is made.  dir.c had some weird dragons
> in it.  And after being repeatedly sucked back into dir.c problems for
> years because of weird side effects, and seeing how many piles of
> nearly-cancelling bugs it had, I don't trust it anymore.

*nod*

>> > Even if it is safe
>> > and will continue to be, I don't think this should be squashed into my
>> > patches.  I think it should be a separate patch with its own commit
>> > message that explicitly calls out this assumption.  Especially since
>> > this is dir.c, which is an area where attempting to fix one very
>> > simple little bug results in years of refactoring and fixing all kinds
>> > of historical messes, sometimes waiting a year and a half for
>> > responses to RFCs/review requests, and where we have to sometimes just
>> > give up on attempting to understand the purpose of various bits of
>> > code and instead rely on the regression tests and hope they are good
>> > enough.  I still think that dir.c deserves a little warning at the
>> > top, like the one I suggested in [1].
>> >
>> > [1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/
>>
>> *nod* I can always submit something like this afterwards.
>
> :-)
>
>> Just on this series: Perhaps this discussion is a sign that this memory
>> leak fixing should be its own cleanup series where we could hash out any
>> approaches to doing that? I.e. as noted before I realize I'm to blame
>> for suggesting it in the first place, but those parts of these changes
>> don't seem like they're needed by other parts of the series (I tried
>> compiling with the two relevant patches ejected out).
>>
>> Having a humongous set of memory leak fixes locally at this point, I
>> think it's generally not very worth the effort to fix a leak in b()
>> where a() calls b() and b() calls c(), and all of [abc]() are
>> leaking. I.e. often narrowly fixing leaks in b() will lead to different
>> solutions than if you're trying to resolve all of [abc](), as their
>> interaction comes into play.
>
> The patch we are commenting on isn't a leakfix, and is very much
> integral to the series for reasons other than actual leakfix (which
> occurred two patches before this one).  It's simplifying the API to
> not require a bunch of boilerplate setup of opts->dir and instead
> requiring callers to just set a simple boolean INSTEAD (and only if
> they want non-default behavior, rather than requiring setting
> opts->dir by all those who did want default behavior).  That way, when
> I need to make sure several additional callers who should have gotten
> the default behavior actually get it, they don't have to call a bunch
> of functions to setup and cleanup opts->dir.

I stand corrected. I have not given this series as a whole any careful
review, sorry. This whole side-thread came about because I noticed the
memory allocation part of this, since it was something I was working
on in that WIP...


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-02  8:44                       ` Ævar Arnfjörð Bjarmason
@ 2021-10-03 22:21                         ` Ævar Arnfjörð Bjarmason
  2021-10-04 13:45                           ` Elijah Newren
  2021-10-04 13:45                         ` Elijah Newren
  1 sibling, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-03 22:21 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood


On Sat, Oct 02 2021, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Oct 01 2021, Elijah Newren wrote:
>
>> On Fri, Oct 1, 2021 at 1:47 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>>
>>> On Thu, Sep 30 2021, Elijah Newren wrote:
>>>
>>> > On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
>>> > <avarab@gmail.com> wrote:
>>> >>
>>> >> On Wed, Sep 29 2021, Elijah Newren wrote:
[...]
>>> > I might be going on a tangent here, but looking at that patch, I'm
>>> > worried that dir_init() was buggy and that you perpetuated that bug
>>> > with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
>>> > member, which neither dir_init() or DIR_INIT initialize properly
>>> > (using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
>>> > dir.c relies on either strbuf_add() calls to just happen to work with
>>> > this incorrectly initialized strbuf, or else use the strbuf_init()
>>> > call in prep_exclude() to do so, using the following snippet:
>>> >
>>> >     if (!dir->basebuf.buf)
>>> >         strbuf_init(&dir->basebuf, PATH_MAX);
>>> >
>>> > However, earlier in that same function we see
>>> >
>>> >     if (stk->baselen <= baselen &&
>>> >         !strncmp(dir->basebuf.buf, base, stk->baselen))
>>> >             break;
>>> >
>>> > So either that function can never have dir->basebuf.buf be NULL and
>>> > the strbuf_init() is dead code, or else it's possible for us to
>>> > trigger a segfault.  If it's the former, it may just be a ticking time
>>> > bomb that will transform into the latter with some other change,
>>> > because it's not at all obvious to me how dir->basebuf gets
>>> > initialized appropriately to avoid that strncmp call.  Perhaps there
>>> > is some invariant where exclude_stack is only set up by previous calls
>>> > to prep_exclude() and those won't set up exclude_stack until first
>>> > initializing basebuf.  But that really at least deserves a comment
>>> > about how we're abusing basebuf, and would probably be cleaner if we
>>> > initialized basebuf to STRBUF_INIT.
>>>
>>> ...because yes, I forgot about that when sending you the diff-on-top,
>>> sorry. Yes that's buggy with the diff-on-top I sent you.
>>
>> That bug didn't come from the diff-on-top you sent me, it came from
>> the commit already merged to master -- ce93a4c6127  (dir.[ch]: replace
>> dir_init() with DIR_INIT, 2021-07-01), merged as part of
>> ab/struct-init on Jul 16.
>
> Ah, I misunderstood you there. I'll look at that / fix it. Sorry.

Just to tie up this loose end: Yes, this control flow sucks, and I've got
some patches to unpack-trees.[ch] & dir.[ch] I'm about to submit to fix
it. But just to comment on the existing behavior of the code, i.e. your
comment (above):

    "So either that function can never have dir->basebuf.buf be NULL and
    the strbuf_init() is dead code, or else it's possible for us to
    trigger a segfault.".

I hadn't had time to look into it when I said I'd fix it, but now that I
have I found that there's nothing to fix, and this code wasn't buggy
either before or after my ce93a4c6127 (dir.[ch]: replace dir_init() with
DIR_INIT, 2021-07-01). I.e. we do have the invariant you mentioned.

The dir.[ch] API has always relied on the "struct dir_struct" being
zero'd out. First with memset() before your eceba532141 (dir: fix
problematic API to avoid memory leaks, 2020-08-18), and after my
ce93a4c6127 with the DIR_INIT, which both amount to the same thing.

We both missed a caller that neither used dir_init() then nor uses
DIR_INIT now; it uses "{ 0 }" instead, so it's always zero'd too.
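
As a small illustration of why those forms are interchangeable, here is
a sketch with a made-up miniature struct. "struct mini_dir", its fields
and MINI_DIR_INIT are hypothetical, and the assumption that the macro
expands to "{ 0 }" applies to this sketch only:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of "struct dir_struct"; the fields are illustrative. */
struct mini_dir {
	int flags;
	const char *exclude_per_dir;
	void *exclude_stack;
};

/* Assumption of this sketch: the initializer macro is just "{ 0 }". */
#define MINI_DIR_INIT { 0 }

/* Returns 1 if every member holds its "zero" value. */
int mini_dir_is_zeroed(const struct mini_dir *d)
{
	return !d->flags && !d->exclude_per_dir && !d->exclude_stack;
}

/* The older style: zero the struct at runtime. */
void mini_dir_init_memset(struct mini_dir *d)
{
	/* Assumes NULL is all-bits-zero, which holds on common platforms. */
	memset(d, 0, sizeof(*d));
}
```

Whichever of the three forms a caller picks, code that only inspects
the members sees the same starting state.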

Now, of course it being zero'd *would* segfault if you feed
"dir->basebuf.buf" to strncmp() as you note above, but that code isn't
reachable. The structure of that function is (pseudocode):

void prep_exclude(...)
{
	struct exclude_stack *stk = NULL;
	[...]

	while ((stk = dir->exclude_stack) != NULL)
		/* the strncmp() against "dir->basebuf.buf" is here */

	/* maybe we'll early return here */

	if (!dir->basebuf.buf)
		strbuf_init(&dir->basebuf, PATH_MAX);

	/*
	 * Code that sets dir->exclude_stack to non-NULL for the first
	 * time follows...
	 */
}

I.e. dir->exclude_stack is *only* referenced in this function and
dir_clear() (where we also check it for NULL first).

It's state management between calls to prep_exclude(), so that initial
while-loop can only be entered on the second or a later call to
prep_exclude().

We'll then either have reached that strbuf_init() already, or if we took
an early return before the strbuf_init() we couldn't have set
dir->exclude_stack either. So that "dir->basebuf.buf" dereference is
safe in either case.
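
The invariant can be modelled in a few lines. This is a heavily
simplified sketch, not the real dir.c code: "struct model_dir",
"struct model_stack", model_prep_exclude(), model_dir_clear() and the
"early_return" parameter are all hypothetical, and a plain "char *"
stands in for dir->basebuf.buf:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PATH_MAX 64

/* Hypothetical, simplified model of the exclude stack. */
struct model_stack {
	struct model_stack *prev;
	size_t baselen;
};

struct model_dir {
	char *basebuf; /* stands in for dir->basebuf.buf */
	struct model_stack *exclude_stack;
};

/* Returns 1 if an entry was pushed, 0 on early return, -1 on allocation failure. */
int model_prep_exclude(struct model_dir *dir, const char *base, int early_return)
{
	struct model_stack *stk;

	while ((stk = dir->exclude_stack) != NULL) {
		/* Only reachable if an earlier call pushed an entry... */
		assert(dir->basebuf != NULL); /* ...so basebuf was set up then */
		if (!strncmp(dir->basebuf, base, stk->baselen))
			break; /* still under this directory, keep the entry */
		dir->exclude_stack = stk->prev;
		free(stk);
	}

	if (early_return)
		return 0; /* nothing pushed, so the stack state is unchanged */

	if (!dir->basebuf) {
		dir->basebuf = calloc(1, MODEL_PATH_MAX);
		if (!dir->basebuf)
			return -1;
	}

	/* The stack only becomes non-NULL after basebuf exists. */
	stk = calloc(1, sizeof(*stk));
	if (!stk)
		return -1;
	snprintf(dir->basebuf, MODEL_PATH_MAX, "%s", base);
	stk->baselen = strlen(dir->basebuf);
	stk->prev = dir->exclude_stack;
	dir->exclude_stack = stk;
	return 1;
}

/* Frees everything; safe on a zero'd struct because each step tolerates NULL. */
void model_dir_clear(struct model_dir *dir)
{
	struct model_stack *stk;

	while ((stk = dir->exclude_stack) != NULL) {
		dir->exclude_stack = stk->prev;
		free(stk);
	}
	free(dir->basebuf);
	dir->basebuf = NULL;
}
```

The assert() inside the loop is the invariant in question: it cannot
fire on a zero'd struct, because the only code that makes exclude_stack
non-NULL runs after basebuf has been allocated.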


* [RFC PATCH v4 00/10] Fix various issues around removal of untracked files/directories
  2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
                       ` (13 preceding siblings ...)
       [not found]     ` <aaa8ea3b-0902-f9e6-c1a4-0ca2b1b2f57b@gmail.com>
@ 2021-10-04  1:11     ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 01/10] t2500: add various tests for nuking untracked files Ævar Arnfjörð Bjarmason
                         ` (10 more replies)
  14 siblings, 11 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

This is an RFC proposed v4 of Elijah's en/removing-untracked-fixes
series[1] based on top of my memory leak fixes in the "unpack-trees" &
"dir" APIs[2].

As noted in [2] Elijah and I have been having a back & forth about the
approach his series takes to fixing memory leaks in those APIs. I
think submitting working code is more productive than continuing that
point-by-point discussion, so here we are.

I've avoided making any changes to this series except those narrowly
required to rebase it on top of mine, and to those parts of Elijah's
commit messages that became outdated as a result. In particular
3/10[3] is significantly changed, as much of its commit message
discusses complexities that have gone away due to my preceding
series[2].

The "make dir an internal-only struct" commit has been replaced by one
that renames that struct member from "dir" to "private_dir". I think
even that is unnecessary as argued in [4], but I think the judgement
that something must be done to address that is Elijah's design
decision, so I did my best to retain it.

I did drop the dynamic allocation & it being a pointer, since with my
preceding [2] and the subsequent unsubmitted memory leak fixes I've got
on top, having it be embedded in "struct unpack_trees_options" makes
things easier to manage.
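
To illustrate the shape of that arrangement, here is a rough sketch
under stated assumptions; it is not the code in the series. All names
(sketch_dir, sketch_opts, sketch_unpack(), sketch_opts_release(),
SKETCH_SHOW_IGNORED) are hypothetical, and storing a single pattern
stands in for whatever setup_standard_excludes() really does:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SHOW_IGNORED 1u

/* Hypothetical miniature of a "dir"-like struct that owns heap memory. */
struct sketch_dir {
	unsigned flags;
	char **patterns;
	size_t nr;
};

struct sketch_opts {
	int preserve_ignored;          /* default 0: ignored files may be overwritten */
	struct sketch_dir private_dir; /* embedded, so no separate allocation or free */
};

/* No-op on a zero'd struct: the loop runs 0 times and free(NULL) does nothing. */
void sketch_dir_clear(struct sketch_dir *d)
{
	size_t i;

	for (i = 0; i < d->nr; i++)
		free(d->patterns[i]);
	free(d->patterns);
	memset(d, 0, sizeof(*d));
}

/* Returns 1 if ignored files may be overwritten, 0 if preserved, -1 on OOM. */
int sketch_unpack(struct sketch_opts *o)
{
	struct sketch_dir *d = &o->private_dir;

	if (o->preserve_ignored)
		return 0; /* private_dir stays zero'd */

	d->flags |= SKETCH_SHOW_IGNORED;
	/* Stand-in for the real exclude setup: remember one pattern. */
	d->patterns = calloc(1, sizeof(*d->patterns));
	if (!d->patterns)
		return -1;
	d->patterns[0] = malloc(4);
	if (!d->patterns[0]) {
		free(d->patterns);
		d->patterns = NULL;
		return -1;
	}
	memcpy(d->patterns[0], "*.o", 4);
	d->nr = 1;
	return 1;
}

/* Callers release unconditionally, whether or not private_dir was ever set up. */
void sketch_opts_release(struct sketch_opts *o)
{
	sketch_dir_clear(&o->private_dir);
}
```

With the member embedded, a caller only zero-initializes the options,
optionally sets the boolean, and calls the release function afterwards;
there is no pointer to check against NULL.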

Having read through all this code quite thoroughly at this point I do
have other comments on it, but I'll reserve those until we've found
out what direction we're going forward with vis-a-vis what this will
be based on top of.

I'm (obviously) hoping for an answer of either on top of my series[2],
or alternatively that Elijah's series can stick to introducing the
"preserve_ignored" flag, but not change how the memory
management/name/type of the embedded "dir" happens (and we could thus
proceed in parallel).

But I'll hold off on any such general comments until we've got a way
forward with this. If I start commenting inline on patches in Elijah's
v3, or on this RFC v4, about something unrelated to this proposed
re-arrangement, that'll likely just confuse things, particularly as
some of those comments would differ depending on whether the base is
his v3 or my series[2] as in this RFC v4.

1. https://lore.kernel.org/git/pull.1036.v3.git.1632760428.gitgitgadget@gmail.com
2. https://lore.kernel.org/git/cover-00.10-00000000000-20211004T002226Z-avarab@gmail.com/
3. https://lore.kernel.org/git/RFC-patch-v4-03.10-739e9b871c4-20211004T004902Z-avarab@gmail.com
4. https://lore.kernel.org/git/87k0ivpzfx.fsf@evledraar.gmail.com

Elijah Newren (10):
  t2500: add various tests for nuking untracked files
  read-tree, merge-recursive: overwrite ignored files by default
  unpack-trees: introduce preserve_ignored to unpack_trees_options
  unpack-trees: rename "dir" to "private_dir"
  Remove ignored files by default when they are in the way
  Change unpack_trees' 'reset' flag into an enum
  unpack-trees: avoid nuking untracked dir in way of unmerged file
  unpack-trees: avoid nuking untracked dir in way of locally deleted
    file
  Comment important codepaths regarding nuking untracked files/dirs
  Documentation: call out commands that nuke untracked files/directories

 Documentation/git-checkout.txt   |   5 +-
 Documentation/git-read-tree.txt  |  23 +--
 Documentation/git-reset.txt      |   3 +-
 builtin/am.c                     |   3 +-
 builtin/checkout.c               |   9 +-
 builtin/clone.c                  |   1 +
 builtin/merge.c                  |   1 +
 builtin/read-tree.c              |  23 ++-
 builtin/reset.c                  |  10 +-
 builtin/stash.c                  |   5 +-
 builtin/submodule--helper.c      |   4 +
 contrib/rerere-train.sh          |   2 +-
 merge-ort.c                      |   5 +-
 merge-recursive.c                |   5 +-
 merge.c                          |   6 +-
 reset.c                          |   3 +-
 sequencer.c                      |   1 +
 submodule.c                      |   1 +
 t/t1013-read-tree-submodule.sh   |   1 -
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 t/t7112-reset-submodule.sh       |   1 -
 unpack-trees.c                   |  59 +++++++-
 unpack-trees.h                   |  16 +-
 23 files changed, 362 insertions(+), 69 deletions(-)
 create mode 100755 t/t2500-untracked-overwriting.sh

Range-diff against v3:
 1:  66270ffc74e !  1:  3a3203beee6 t2500: add various tests for nuking untracked files
    @@ Commit message
         removing untracked files and directories.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## t/t2500-untracked-overwriting.sh (new) ##
     @@
 2:  0c74285b253 <  -:  ----------- checkout, read-tree: fix leak of unpack_trees_options.dir
 3:  2501a0c552a !  2:  8e5f4006604 read-tree, merge-recursive: overwrite ignored files by default
    @@ Commit message
         The read-tree changes happen to fix a bug in t1013.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/git-read-tree.txt ##
     @@ Documentation/git-read-tree.txt: SYNOPSIS
    @@ builtin/read-tree.c: static int list_tree(struct object_id *oid)
      	NULL
      };
      
    -@@ builtin/read-tree.c: static int index_output_cb(const struct option *opt, const char *arg,
    - static int exclude_per_directory_cb(const struct option *opt, const char *arg,
    - 				    int unset)
    - {
    --	struct dir_struct *dir;
    - 	struct unpack_trees_options *opts;
    - 
    - 	BUG_ON_OPT_NEG(unset);
    +@@ builtin/read-tree.c: static int exclude_per_directory_cb(const struct option *opt, const char *arg,
      
      	opts = (struct unpack_trees_options *)opt->value;
      
    --	if (opts->dir)
    +-	if (opts->dir.exclude_per_dir)
     -		die("more than one --exclude-per-directory given.");
     -
    --	dir = xcalloc(1, sizeof(*opts->dir));
    --	dir->flags |= DIR_SHOW_IGNORED;
    --	dir->exclude_per_dir = arg;
    --	opts->dir = dir;
    +-	opts->dir.flags |= DIR_SHOW_IGNORED;
    +-	opts->dir.exclude_per_dir = arg;
     -	/* We do not need to nor want to do read-directory
     -	 * here; we are merely interested in reusing the
     -	 * per directory ignore stack mechanism.
    @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *
      	if ((opts.update || opts.index_only) && !opts.merge)
      		die("%s is meaningless without -m, --reset, or --prefix",
      		    opts.update ? "-u" : "-i");
    --	if ((opts.dir && !opts.update))
    +-	if ((opts.dir.exclude_per_dir && !opts.update))
     -		die("--exclude-per-directory is meaningless unless -u");
     +	if (opts.update && !opts.reset) {
    -+		CALLOC_ARRAY(opts.dir, 1);
    -+		opts.dir->flags |= DIR_SHOW_IGNORED;
    -+		setup_standard_excludes(opts.dir);
    ++		opts.dir.flags |= DIR_SHOW_IGNORED;
    ++		setup_standard_excludes(&opts.dir);
     +	}
      	if (opts.merge && !opts.index_only)
      		setup_work_tree();
    @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *
     
      ## merge-recursive.c ##
     @@ merge-recursive.c: static int unpack_trees_start(struct merge_options *opt,
    - 	memset(&opt->priv->unpack_opts, 0, sizeof(opt->priv->unpack_opts));
    + 	unpack_trees_options_init(&opt->priv->unpack_opts);
      	if (opt->priv->call_depth)
      		opt->priv->unpack_opts.index_only = 1;
     -	else
     +	else {
      		opt->priv->unpack_opts.update = 1;
    -+		/* FIXME: should only do this if !overwrite_ignore */
    -+		CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
    -+		opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
    -+		setup_standard_excludes(opt->priv->unpack_opts.dir);
    ++		opt->priv->unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
    ++		setup_standard_excludes(&opt->priv->unpack_opts.dir);
     +	}
      	opt->priv->unpack_opts.merge = 1;
      	opt->priv->unpack_opts.head_idx = 2;
      	opt->priv->unpack_opts.fn = threeway_merge;
    -@@ merge-recursive.c: static int unpack_trees_start(struct merge_options *opt,
    - 	init_tree_desc_from_tree(t+2, merge);
    - 
    - 	rc = unpack_trees(3, t, &opt->priv->unpack_opts);
    -+	if (opt->priv->unpack_opts.dir) {
    -+		dir_clear(opt->priv->unpack_opts.dir);
    -+		FREE_AND_NULL(opt->priv->unpack_opts.dir);
    -+	}
    - 	cache_tree_free(&opt->repo->index->cache_tree);
    - 
    - 	/*
     
      ## t/t1013-read-tree-submodule.sh ##
     @@ t/t1013-read-tree-submodule.sh: test_description='read-tree can handle submodules'
 4:  f1a0700e598 !  3:  739e9b871c4 unpack-trees: introduce preserve_ignored to unpack_trees_options
    @@ Commit message
     
         Currently, every caller of unpack_trees() that wants to ensure ignored
         files are overwritten by default needs to:
    -       * allocate unpack_trees_options.dir
    -       * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
    -       * call setup_standard_excludes
    -    AND then after the call to unpack_trees() needs to
    -       * call dir_clear()
    -       * deallocate unpack_trees_options.dir
    -    That's a fair amount of boilerplate, and every caller uses identical
    -    code.  Make this easier by instead introducing a new boolean value where
    +
    +       * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir.flags
    +       * call setup_standard_excludes(&unpack_trees_options.dir)
    +
    +    Avoid that boilerplate by introducing a new boolean value where
         the default value (0) does what we want so that new callers of
         unpack_trees() automatically get the appropriate behavior.  And move all
         the handling of unpack_trees_options.dir into unpack_trees() itself.
    @@ Commit message
         behavior we want.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/am.c ##
     @@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
    @@ builtin/checkout.c: static int merge_working_tree(const struct checkout_opts *op
      				       &new_branch_info->commit->object.oid :
      				       &new_branch_info->oid, NULL);
     -		if (opts->overwrite_ignore) {
    --			topts.dir = xcalloc(1, sizeof(*topts.dir));
    --			topts.dir->flags |= DIR_SHOW_IGNORED;
    --			setup_standard_excludes(topts.dir);
    +-			topts.dir.flags |= DIR_SHOW_IGNORED;
    +-			setup_standard_excludes(&topts.dir);
     -		}
     +		topts.preserve_ignored = !opts->overwrite_ignore;
      		tree = parse_tree_indirect(old_branch_info->commit ?
      					   &old_branch_info->commit->object.oid :
      					   the_hash_algo->empty_tree);
    -@@ builtin/checkout.c: static int merge_working_tree(const struct checkout_opts *opts,
    - 		init_tree_desc(&trees[1], tree->buffer, tree->size);
    - 
    - 		ret = unpack_trees(2, trees, &topts);
    --		if (topts.dir) {
    --			dir_clear(topts.dir);
    --			FREE_AND_NULL(topts.dir);
    --		}
    - 		clear_unpack_trees_porcelain(&topts);
    - 		if (ret == -1) {
    - 			/*
     
      ## builtin/clone.c ##
     @@ builtin/clone.c: static int checkout(int submodule_progress)
    @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *
      		die("%s is meaningless without -m, --reset, or --prefix",
      		    opts.update ? "-u" : "-i");
     -	if (opts.update && !opts.reset) {
    --		CALLOC_ARRAY(opts.dir, 1);
    --		opts.dir->flags |= DIR_SHOW_IGNORED;
    --		setup_standard_excludes(opts.dir);
    +-		opts.dir.flags |= DIR_SHOW_IGNORED;
    +-		setup_standard_excludes(&opts.dir);
     -	}
     +	if (opts.update && !opts.reset)
     +		opts.preserve_ignored = 0;
    @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *
      	if (opts.merge && !opts.index_only)
      		setup_work_tree();
      
    -@@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
    - 	if (unpack_trees(nr_trees, t, &opts))
    - 		return 128;
    - 
    --	if (opts.dir) {
    --		dir_clear(opts.dir);
    --		FREE_AND_NULL(opts.dir);
    --	}
    --
    - 	if (opts.debug_unpack || opts.dry_run)
    - 		return 0; /* do not write the index out */
    - 
     
      ## builtin/reset.c ##
     @@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id *oid, int reset_t
    @@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int
     +		opts.preserve_ignored = 1;
      	opts.fn = oneway_merge;
      
    - 	if (unpack_trees(nr_trees, t, &opts))
    + 	if (unpack_trees(nr_trees, t, &opts)) {
     
      ## merge-ort.c ##
     @@ merge-ort.c: static int checkout(struct merge_options *opt,
    @@ merge-ort.c: static int checkout(struct merge_options *opt,
      	unpack_opts.verbose_update = (opt->verbosity > 2);
      	unpack_opts.fn = twoway_merge;
     -	if (1/* FIXME: opts->overwrite_ignore*/) {
    --		CALLOC_ARRAY(unpack_opts.dir, 1);
    --		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
    --		setup_standard_excludes(unpack_opts.dir);
    +-		unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
    +-		setup_standard_excludes(&unpack_opts.dir);
     -	}
     +	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
      	parse_tree(prev);
      	init_tree_desc(&trees[0], prev->buffer, prev->size);
      	parse_tree(next);
    -@@ merge-ort.c: static int checkout(struct merge_options *opt,
    - 
    - 	ret = unpack_trees(2, trees, &unpack_opts);
    - 	clear_unpack_trees_porcelain(&unpack_opts);
    --	dir_clear(unpack_opts.dir);
    --	FREE_AND_NULL(unpack_opts.dir);
    - 	return ret;
    - }
    - 
     
      ## merge-recursive.c ##
     @@ merge-recursive.c: static int unpack_trees_start(struct merge_options *opt,
    + 		opt->priv->unpack_opts.index_only = 1;
      	else {
      		opt->priv->unpack_opts.update = 1;
    - 		/* FIXME: should only do this if !overwrite_ignore */
    --		CALLOC_ARRAY(opt->priv->unpack_opts.dir, 1);
    --		opt->priv->unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
    --		setup_standard_excludes(opt->priv->unpack_opts.dir);
    +-		opt->priv->unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
    +-		setup_standard_excludes(&opt->priv->unpack_opts.dir);
    ++		/* FIXME: should only do this if !overwrite_ignore */
     +		opt->priv->unpack_opts.preserve_ignored = 0;
      	}
      	opt->priv->unpack_opts.merge = 1;
      	opt->priv->unpack_opts.head_idx = 2;
    -@@ merge-recursive.c: static int unpack_trees_start(struct merge_options *opt,
    - 	init_tree_desc_from_tree(t+2, merge);
    - 
    - 	rc = unpack_trees(3, t, &opt->priv->unpack_opts);
    --	if (opt->priv->unpack_opts.dir) {
    --		dir_clear(opt->priv->unpack_opts.dir);
    --		FREE_AND_NULL(opt->priv->unpack_opts.dir);
    --	}
    - 	cache_tree_free(&opt->repo->index->cache_tree);
    - 
    - 	/*
     
      ## merge.c ##
     @@ merge.c: int checkout_fast_forward(struct repository *r,
    - 	struct unpack_trees_options opts;
    - 	struct tree_desc t[MAX_UNPACK_TREES];
    - 	int i, nr_trees = 0;
    --	struct dir_struct dir = DIR_INIT;
    - 	struct lock_file lock_file = LOCK_INIT;
    - 
    - 	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
    -@@ merge.c: int checkout_fast_forward(struct repository *r,
    + 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
      	}
      
    - 	memset(&opts, 0, sizeof(opts));
     -	if (overwrite_ignore) {
    --		dir.flags |= DIR_SHOW_IGNORED;
    --		setup_standard_excludes(&dir);
    --		opts.dir = &dir;
    +-		opts.dir.flags |= DIR_SHOW_IGNORED;
    +-		setup_standard_excludes(&opts.dir);
     -	}
    +-
     +	opts.preserve_ignored = !overwrite_ignore;
    - 
      	opts.head_idx = 1;
      	opts.src_index = r->index;
    -@@ merge.c: int checkout_fast_forward(struct repository *r,
    - 		clear_unpack_trees_porcelain(&opts);
    - 		return -1;
    - 	}
    --	dir_clear(&dir);
    - 	clear_unpack_trees_porcelain(&opts);
    - 
    - 	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
    + 	opts.dst_index = r->index;
     
      ## reset.c ##
     @@ reset.c: int reset_head(struct repository *r, struct object_id *oid, const char *action,
    @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpac
      	}
      
     +	if (!o->preserve_ignored) {
    -+		CALLOC_ARRAY(o->dir, 1);
    -+		o->dir->flags |= DIR_SHOW_IGNORED;
    -+		setup_standard_excludes(o->dir);
    ++		o->dir.flags |= DIR_SHOW_IGNORED;
    ++		setup_standard_excludes(&o->dir);
     +	}
     +
      	if (!core_apply_sparse_checkout || !o->update)
      		o->skip_sparse_checkout = 1;
      	if (!o->skip_sparse_checkout && !o->pl) {
    -@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
    - done:
    - 	if (free_pattern_list)
    - 		clear_pattern_list(&pl);
    -+	if (o->dir) {
    -+		dir_clear(o->dir);
    -+		FREE_AND_NULL(o->dir);
    -+	}
    - 	trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
    - 	trace_performance_leave("unpack_trees");
    - 	return ret;
     
      ## unpack-trees.h ##
     @@ unpack-trees.h: struct unpack_trees_options {
 5:  0d119142778 !  4:  296c1e03673 unpack-trees: make dir an internal-only struct
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    unpack-trees: make dir an internal-only struct
    +    unpack-trees: rename "dir" to "private_dir"
     
    -    Avoid accidental misuse or confusion over ownership by clearly making
    -    unpack_trees_options.dir an internal-only variable.
    +    Until the introduction of the "preserve_ignored" flag in the preceding
    +    commit callers who wanted its behavior needed to adjust "dir.flags"
    +    and call setup_standard_excludes() themselves.
    +
    +    Now that we have no external users of "dir" anymore let's rename it to
    +    "private_dir" and add a comment indicating that we'd like it not to be
    +    messed with by external callers. This should avoid accidental
    +    misuse or confusion over its ownership.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## unpack-trees.c ##
    -@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
    - 	static struct cache_entry *dfc;
    - 	struct pattern_list pl;
    - 	int free_pattern_list = 0;
    -+	struct dir_struct dir = DIR_INIT;
    +@@ unpack-trees.c: void unpack_trees_options_init(struct unpack_trees_options *o)
    + void unpack_trees_options_release(struct unpack_trees_options *opts)
    + {
    + 	strvec_clear(&opts->msgs_to_free);
    +-	dir_clear(&opts->dir);
    ++	dir_clear(&opts->private_dir);
    + }
      
    - 	if (len > MAX_UNPACK_TREES)
    - 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
    -+	if (o->dir)
    -+		BUG("o->dir is for internal use only");
    - 
    - 	trace_performance_enter();
    - 	trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
    + static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
     @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
      	}
      
      	if (!o->preserve_ignored) {
    --		CALLOC_ARRAY(o->dir, 1);
    -+		o->dir = &dir;
    - 		o->dir->flags |= DIR_SHOW_IGNORED;
    - 		setup_standard_excludes(o->dir);
    - 	}
    -@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
    - 		clear_pattern_list(&pl);
    - 	if (o->dir) {
    - 		dir_clear(o->dir);
    --		FREE_AND_NULL(o->dir);
    -+		o->dir = NULL;
    +-		o->dir.flags |= DIR_SHOW_IGNORED;
    +-		setup_standard_excludes(&o->dir);
    ++		o->private_dir.flags |= DIR_SHOW_IGNORED;
    ++		setup_standard_excludes(&o->private_dir);
      	}
    - 	trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
    - 	trace_performance_leave("unpack_trees");
    + 
    + 	if (!core_apply_sparse_checkout || !o->update)
    +@@ unpack-trees.c: static int verify_clean_subdirectory(const struct cache_entry *ce,
    + 	 */
    + 	pathbuf = xstrfmt("%.*s/", namelen, ce->name);
    + 
    +-	d.exclude_per_dir = o->dir.exclude_per_dir;
    ++	d.exclude_per_dir = o->private_dir.exclude_per_dir;
    + 	i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
    + 	dir_clear(&d);
    + 	free(pathbuf);
    +@@ unpack-trees.c: static int check_ok_to_remove(const char *name, int len, int dtype,
    + 	if (ignore_case && icase_exists(o, name, len, st))
    + 		return 0;
    + 
    +-	if (is_excluded(&o->dir, o->src_index, name, &dtype))
    ++	if (is_excluded(&o->private_dir, o->src_index, name, &dtype))
    + 		/*
    + 		 * ce->name is explicitly excluded, so it is Ok to
    + 		 * overwrite it.
     
      ## unpack-trees.h ##
     @@ unpack-trees.h: struct unpack_trees_options {
      		     dry_run;
      	const char *prefix;
      	int cache_bottom;
    --	struct dir_struct *dir;
    +-	struct dir_struct dir;
    ++	struct dir_struct private_dir; /* for internal use only */
      	struct pathspec *pathspec;
      	merge_fn_t fn;
      	const char *msgs[NB_UNPACK_TREES_WARNING_TYPES];
     @@ unpack-trees.h: struct unpack_trees_options {
    - 	struct index_state result;
      
    - 	struct pattern_list *pl; /* for internal use */
    -+	struct dir_struct *dir; /* for internal use only */
    - 	struct checkout_metadata meta;
    - };
    + #define UNPACK_TREES_OPTIONS_INIT { \
    + 	.msgs_to_free = STRVEC_INIT, \
    +-	.dir = DIR_INIT, \
    ++	.private_dir = DIR_INIT, \
    + }
    + void unpack_trees_options_init(struct unpack_trees_options *o);
      
 6:  b7fe354efff !  5:  27496506430 Remove ignored files by default when they are in the way
    @@ Commit message
         Incidentally, this fixes a test failure in t7112.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/am.c ##
     @@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
    @@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int
     +		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
      	opts.fn = oneway_merge;
      
    - 	if (unpack_trees(nr_trees, t, &opts))
    + 	if (unpack_trees(nr_trees, t, &opts)) {
     
      ## merge-ort.c ##
     @@ merge-ort.c: static int checkout(struct merge_options *opt,
 7:  9eb20121fc3 !  6:  7b539a120b9 Change unpack_trees' 'reset' flag into an enum
    @@ Commit message
         [1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/am.c ##
     @@ builtin/am.c: static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
    @@ t/t2500-untracked-overwriting.sh: test_expect_failure 'git rebase --abort and un
     
      ## unpack-trees.c ##
     @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
    + 	struct pattern_list pl;
      	int free_pattern_list = 0;
    - 	struct dir_struct dir = DIR_INIT;
      
     +	if (o->reset == UNPACK_RESET_INVALID)
     +		BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
     +
      	if (len > MAX_UNPACK_TREES)
      		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
    - 	if (o->dir)
    + 
     @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
      		ensure_full_index(o->dst_index);
      	}
    @@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpac
     +		BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
     +
      	if (!o->preserve_ignored) {
    - 		o->dir = &dir;
    - 		o->dir->flags |= DIR_SHOW_IGNORED;
    + 		o->private_dir.flags |= DIR_SHOW_IGNORED;
    + 		setup_standard_excludes(&o->private_dir);
     @@ unpack-trees.c: static int verify_absent_1(const struct cache_entry *ce,
      	int len;
      	struct stat st;
    @@ unpack-trees.c: static int verify_absent_1(const struct cache_entry *ce,
      ## unpack-trees.h ##
     @@ unpack-trees.h: void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
       */
    - void clear_unpack_trees_porcelain(struct unpack_trees_options *opts);
    + void unpack_trees_options_release(struct unpack_trees_options *opts);
      
     +enum unpack_trees_reset_type {
     +	UNPACK_RESET_NONE = 0,    /* traditional "false" value; still valid */
    @@ unpack-trees.h: struct unpack_trees_options {
     +	enum unpack_trees_reset_type reset;
      	const char *prefix;
      	int cache_bottom;
    - 	struct pathspec *pathspec;
    + 	struct dir_struct private_dir; /* for internal use only */
 8:  e4c42d43b09 !  7:  b6769f629ae unpack-trees: avoid nuking untracked dir in way of unmerged file
    @@ Commit message
         unpack-trees: avoid nuking untracked dir in way of unmerged file
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## t/t2500-untracked-overwriting.sh ##
     @@ t/t2500-untracked-overwriting.sh: test_expect_failure 'git stash and untracked files' '
 9:  1a770681704 !  8:  10a7cbf049e unpack-trees: avoid nuking untracked dir in way of locally deleted file
    @@ Commit message
         unpack-trees: avoid nuking untracked dir in way of locally deleted file
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## t/t2500-untracked-overwriting.sh ##
     @@ t/t2500-untracked-overwriting.sh: test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
10:  6b42a80bf3d !  9:  b2f27d961a9 Comment important codepaths regarding nuking untracked files/dirs
    @@ Commit message
         running in an empty directory.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/stash.c ##
     @@ builtin/stash.c: static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
11:  de416f887d7 ! 10:  e88f81baa50 Documentation: call out commands that nuke untracked files/directories
    @@ Commit message
         these cases.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/git-checkout.txt ##
     @@ Documentation/git-checkout.txt: OPTIONS
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 01/10] t2500: add various tests for nuking untracked files
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 02/10] read-tree, merge-recursive: overwrite ignored files by default Ævar Arnfjörð Bjarmason
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Noting that unpack_trees treats reset=1 & update=1 as license to nuke
untracked files, I looked for code paths that use this combination and
tried to generate testcases which demonstrated unintentional loss of
untracked files and directories.  I found several.

I also include testcases for `git reset --{hard,merge,keep}`.  A hard
reset is perhaps the most direct test of unpack_trees' reset=1 behavior,
but we cannot make `git reset --hard` preserve untracked files without
some migration work.

Also, the two commands `checkout --force` (because of the --force) and
`read-tree --reset` (because it's plumbing and we need to keep it
backward compatible) were left out as we expect those to continue
removing untracked files and directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 244 +++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100755 t/t2500-untracked-overwriting.sh

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
new file mode 100755
index 00000000000..2412d121ea8
--- /dev/null
+++ b/t/t2500-untracked-overwriting.sh
@@ -0,0 +1,244 @@
+#!/bin/sh
+
+test_description='Test handling of overwriting untracked files'
+
+. ./test-lib.sh
+
+test_setup_reset () {
+	git init reset_$1 &&
+	(
+		cd reset_$1 &&
+		test_commit init &&
+
+		git branch stable &&
+		git branch work &&
+
+		git checkout work &&
+		test_commit foo &&
+
+		git checkout stable
+	)
+}
+
+test_expect_success 'reset --hard will nuke untracked files/dirs' '
+	test_setup_reset hard &&
+	(
+		cd reset_hard &&
+		git ls-tree -r stable &&
+		git log --all --name-status --oneline &&
+		git ls-tree -r work &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		echo foo >expect &&
+
+		git reset --hard work &&
+
+		# check that untracked directory foo.t/ was nuked
+		test_path_is_file foo.t &&
+		test_cmp expect foo.t
+	)
+'
+
+test_expect_success 'reset --merge will preserve untracked files/dirs' '
+	test_setup_reset merge &&
+	(
+		cd reset_merge &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --merge work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating .foo.t. would lose untracked files" error
+	)
+'
+
+test_expect_success 'reset --keep will preserve untracked files/dirs' '
+	test_setup_reset keep &&
+	(
+		cd reset_keep &&
+
+		mkdir foo.t &&
+		echo precious >foo.t/file &&
+		cp foo.t/file expect &&
+
+		test_must_fail git reset --keep work 2>error &&
+		test_cmp expect foo.t/file &&
+		grep "Updating.*foo.t.*would lose untracked files" error
+	)
+'
+
+test_setup_checkout_m () {
+	git init checkout &&
+	(
+		cd checkout &&
+		test_commit init &&
+
+		test_write_lines file has some >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		git branch stable &&
+
+		git switch -c work &&
+		echo stuff >notes.txt &&
+		test_write_lines file has some words >filler &&
+		git add notes.txt filler &&
+		git commit -m filler &&
+
+		git checkout stable
+	)
+}
+
+test_expect_failure 'checkout -m does not nuke untracked file' '
+	test_setup_checkout_m &&
+	(
+		cd checkout &&
+
+		# Tweak filler
+		test_write_lines this file has some >filler &&
+		# Make an untracked file, save its contents in "expect"
+		echo precious >notes.txt &&
+		cp notes.txt expect &&
+
+		test_must_fail git checkout -m work &&
+		test_cmp expect notes.txt
+	)
+'
+
+test_setup_sequencing () {
+	git init sequencing_$1 &&
+	(
+		cd sequencing_$1 &&
+		test_commit init &&
+
+		test_write_lines this file has some words >filler &&
+		git add filler &&
+		git commit -m filler &&
+
+		mkdir -p foo/bar &&
+		test_commit foo/bar/baz &&
+
+		git branch simple &&
+		git branch fooey &&
+
+		git checkout fooey &&
+		git rm foo/bar/baz.t &&
+		echo stuff >>filler &&
+		git add -u &&
+		git commit -m "changes" &&
+
+		git checkout simple &&
+		echo items >>filler &&
+		echo newstuff >>newfile &&
+		git add filler newfile &&
+		git commit -m another
+	)
+}
+
+test_expect_failure 'git rebase --abort and untracked files' '
+	test_setup_sequencing rebase_abort_and_untracked &&
+	(
+		cd sequencing_rebase_abort_and_untracked &&
+		git checkout fooey &&
+		test_must_fail git rebase simple &&
+
+		cat init.t &&
+		git rm init.t &&
+		echo precious >init.t &&
+		cp init.t expect &&
+		git status --porcelain &&
+		test_must_fail git rebase --abort &&
+		test_cmp expect init.t
+	)
+'
+
+test_expect_failure 'git rebase fast forwarding and untracked files' '
+	test_setup_sequencing rebase_fast_forward_and_untracked &&
+	(
+		cd sequencing_rebase_fast_forward_and_untracked &&
+		git checkout init &&
+		echo precious >filler &&
+		cp filler expect &&
+		test_must_fail git rebase init simple &&
+		test_cmp expect filler
+	)
+'
+
+test_expect_failure 'git rebase --autostash and untracked files' '
+	test_setup_sequencing rebase_autostash_and_untracked &&
+	(
+		cd sequencing_rebase_autostash_and_untracked &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git rebase --autostash init &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git stash and untracked files' '
+	test_setup_sequencing stash_and_untracked_files &&
+	(
+		cd sequencing_stash_and_untracked_files &&
+		git checkout simple &&
+		git rm filler &&
+		mkdir filler &&
+		echo precious >filler/file &&
+		cp filler/file expect &&
+		git status --porcelain &&
+		git stash push &&
+		git status --porcelain &&
+		test_path_is_file filler/file
+	)
+'
+
+test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+	test_setup_sequencing am_abort_and_untracked &&
+	(
+		cd sequencing_am_abort_and_untracked &&
+		git format-patch -1 --stdout fooey >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete the conflicted file; we will stage and commit it later
+		rm filler &&
+
+		# Put an unrelated untracked directory there
+		mkdir filler &&
+		echo foo >filler/file1 &&
+		echo bar >filler/file2 &&
+
+		test_must_fail git am --abort 2>errors &&
+		test_path_is_dir filler &&
+		grep "Updating .filler. would lose untracked files in it" errors
+	)
+'
+
+test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+	test_setup_sequencing am_skip_and_untracked &&
+	(
+		cd sequencing_am_skip_and_untracked &&
+		git checkout fooey &&
+		git format-patch -1 --stdout simple >changes.mbox &&
+		test_must_fail git am --3way changes.mbox &&
+
+		# Delete newfile
+		rm newfile &&
+
+		# Put an unrelated untracked directory there
+		mkdir newfile &&
+		echo foo >newfile/file1 &&
+		echo bar >newfile/file2 &&
+
+		# Change our mind about resolutions, just skip this patch
+		test_must_fail git am --skip 2>errors &&
+		test_path_is_dir newfile &&
+		grep "Updating .newfile. would lose untracked files in it" errors
+	)
+'
+
+test_done
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 02/10] read-tree, merge-recursive: overwrite ignored files by default
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 01/10] t2500: add various tests for nuking untracked files Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 03/10] unpack-trees: introduce preserve_ignored to unpack_trees_options Ævar Arnfjörð Bjarmason
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

This fixes a long-standing patchwork of ignored files handling in
read-tree and merge-recursive, called out and suggested by Junio long
ago.  Quoting from commit dcf0c16ef1 ("core.excludesfile clean-up",
2007-11-16):

    git-read-tree takes --exclude-per-directory=<gitignore>,
    not because the flexibility was needed.  Again, this was
    because the option predates the standardization of the ignore
    files.

    ...

    On the other hand, I think it makes perfect sense to fix
    git-read-tree, git-merge-recursive and git-clean to follow the
    same rule as other commands.  I do not think of a valid use case
    to give an exclude-per-directory that is nonstandard to
    read-tree command, outside a "negative" test in the t1004 test
    script.

    This patch is the first step to untangle this mess.

    The next step would be to teach read-tree, merge-recursive and
    clean (in C) to use setup_standard_excludes().

History shows each of these was partially or fully fixed:

  * clean was taught the new trick in 1617adc7a0 ("Teach git clean to
    use setup_standard_excludes()", 2007-11-14).

  * read-tree was primarily used by checkout & merge scripts.  checkout
    and merge later became builtins and were both fixed to use the new
    setup_standard_excludes() handling in fc001b526c ("checkout,merge:
    loosen overwriting untracked file check based on info/exclude",
    2011-11-27).  So the primary users were fixed, though read-tree
    itself was not.

  * merge-recursive has now been replaced as the default merge backend
    by merge-ort.  merge-ort fixed this by using
    setup_standard_excludes() starting early in its implementation; see
    commit 6681ce5cf6 ("merge-ort: add implementation of checkout()",
    2020-12-13), largely due to its design depending on checkout() and
    thus being influenced by the checkout code.  However,
    merge-recursive itself was not fixed here, in part because its
    design meant it had difficulty differentiating between untracked
    files, ignored files, leftover tracked files that haven't been
    removed yet due to order of processing files, and files written by
    itself due to collisions.

Make the conversion more complete by now handling read-tree and
handling at least the unpack_trees() portion of merge-recursive.  While
merge-recursive is on its way out, fixing the unpack_trees() portion is
easy and facilitates some of the later changes in this series.  Note
that fixing read-tree makes the --exclude-per-directory option to
read-tree useless, so we remove it from the documentation (though we
continue to accept it if passed).

The read-tree changes happen to fix a bug in t1013.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
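
For reviewers who want the new handling of --exclude-per-directory in
isolation, here is a minimal sketch of the check the option callback performs
after this patch.  The function toy_check_exclude_per_directory() is
hypothetical; it returns the message that the real callback passes to die(),
or NULL when the option is accepted.  In both cases the argument no longer
influences which exclude files are read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the checks in
 * exclude_per_directory_cb(): "update" mirrors opts->update (the -u
 * flag) and "arg" is the value given to --exclude-per-directory.
 */
const char *toy_check_exclude_per_directory(int update, const char *arg)
{
	assert(arg != NULL);
	if (!update)
		return "--exclude-per-directory is meaningless unless -u";
	if (strcmp(arg, ".gitignore"))
		return "--exclude-per-directory argument must be .gitignore";
	return NULL; /* accepted, but otherwise a no-op */
}
```

So existing scripts that pass --exclude-per-directory=.gitignore together
with -u keep working, while any other value is rejected instead of being
silently honored.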
 Documentation/git-read-tree.txt | 18 +-----------------
 builtin/read-tree.c             | 21 +++++++++------------
 merge-recursive.c               |  5 ++++-
 t/t1013-read-tree-submodule.sh  |  1 -
 4 files changed, 14 insertions(+), 31 deletions(-)

diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 5fa8bab64c2..0222a27c5af 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -10,8 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git read-tree' [[-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>]
-		[-u [--exclude-per-directory=<gitignore>] | -i]]
-		[--index-output=<file>] [--no-sparse-checkout]
+		[-u | -i]] [--index-output=<file>] [--no-sparse-checkout]
 		(--empty | <tree-ish1> [<tree-ish2> [<tree-ish3>]])
 
 
@@ -88,21 +87,6 @@ OPTIONS
 	The command will refuse to overwrite entries that already
 	existed in the original index file.
 
---exclude-per-directory=<gitignore>::
-	When running the command with `-u` and `-m` options, the
-	merge result may need to overwrite paths that are not
-	tracked in the current branch.  The command usually
-	refuses to proceed with the merge to avoid losing such a
-	path.  However this safety valve sometimes gets in the
-	way.  For example, it often happens that the other
-	branch added a file that used to be a generated file in
-	your branch, and the safety valve triggers when you try
-	to switch to that branch after you ran `make` but before
-	running `make clean` to remove the generated file.  This
-	option tells the command to read per-directory exclude
-	file (usually '.gitignore') and allows such an untracked
-	but explicitly ignored file to be overwritten.
-
 --index-output=<file>::
 	Instead of writing the results out to `$GIT_INDEX_FILE`,
 	write the resulting index in the named file.  While the
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8f1b8a7e74c..d0f88bbf3e3 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -38,7 +38,7 @@ static int list_tree(struct object_id *oid)
 }
 
 static const char * const read_tree_usage[] = {
-	N_("git read-tree [(-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>) [-u [--exclude-per-directory=<gitignore>] | -i]] [--no-sparse-checkout] [--index-output=<file>] (--empty | <tree-ish1> [<tree-ish2> [<tree-ish3>]])"),
+	N_("git read-tree [(-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>) [-u | -i]] [--no-sparse-checkout] [--index-output=<file>] (--empty | <tree-ish1> [<tree-ish2> [<tree-ish3>]])"),
 	NULL
 };
 
@@ -59,15 +59,10 @@ static int exclude_per_directory_cb(const struct option *opt, const char *arg,
 
 	opts = (struct unpack_trees_options *)opt->value;
 
-	if (opts->dir.exclude_per_dir)
-		die("more than one --exclude-per-directory given.");
-
-	opts->dir.flags |= DIR_SHOW_IGNORED;
-	opts->dir.exclude_per_dir = arg;
-	/* We do not need to nor want to do read-directory
-	 * here; we are merely interested in reusing the
-	 * per directory ignore stack mechanism.
-	 */
+	if (!opts->update)
+		die("--exclude-per-directory is meaningless unless -u");
+	if (strcmp(arg, ".gitignore"))
+		die("--exclude-per-directory argument must be .gitignore");
 	return 0;
 }
 
@@ -206,8 +201,10 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if ((opts.update || opts.index_only) && !opts.merge)
 		die("%s is meaningless without -m, --reset, or --prefix",
 		    opts.update ? "-u" : "-i");
-	if ((opts.dir.exclude_per_dir && !opts.update))
-		die("--exclude-per-directory is meaningless unless -u");
+	if (opts.update && !opts.reset) {
+		opts.dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&opts.dir);
+	}
 	if (opts.merge && !opts.index_only)
 		setup_work_tree();
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 316cb2ca907..a4131b8837b 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -408,8 +408,11 @@ static int unpack_trees_start(struct merge_options *opt,
 	unpack_trees_options_init(&opt->priv->unpack_opts);
 	if (opt->priv->call_depth)
 		opt->priv->unpack_opts.index_only = 1;
-	else
+	else {
 		opt->priv->unpack_opts.update = 1;
+		opt->priv->unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&opt->priv->unpack_opts.dir);
+	}
 	opt->priv->unpack_opts.merge = 1;
 	opt->priv->unpack_opts.head_idx = 2;
 	opt->priv->unpack_opts.fn = threeway_merge;
diff --git a/t/t1013-read-tree-submodule.sh b/t/t1013-read-tree-submodule.sh
index b6df7444c05..bfc90d4cf27 100755
--- a/t/t1013-read-tree-submodule.sh
+++ b/t/t1013-read-tree-submodule.sh
@@ -6,7 +6,6 @@ test_description='read-tree can handle submodules'
 . "$TEST_DIRECTORY"/lib-submodule-update.sh
 
 KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
-KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
 
 test_submodule_switch_recursing_with_args "read-tree -u -m"
 
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 03/10] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 01/10] t2500: add various tests for nuking untracked files Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 02/10] read-tree, merge-recursive: overwrite ignored files by default Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 04/10] unpack-trees: rename "dir" to "private_dir" Ævar Arnfjörð Bjarmason
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Currently, every caller of unpack_trees() that wants to ensure ignored
files are overwritten by default needs to:

   * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir.flags
   * call setup_standard_excludes(&unpack_trees_options.dir)

Avoid that boilerplate by introducing a new boolean value where
the default value (0) does what we want so that new callers of
unpack_trees() automatically get the appropriate behavior.  And move all
the handling of unpack_trees_options.dir into unpack_trees() itself.

While preserve_ignored = 0 is the behavior we feel is the appropriate
default, we defer fixing commands to use the appropriate default until a
later commit.  So, this commit introduces several locations where we
manually set preserve_ignored=1.  This makes it clear where code paths
were previously preserving ignored files when they should not have been;
a future commit will flip these to instead use a value of 0 to get the
behavior we want.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
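
To illustrate why the default of 0 matters, here is a toy model of the
boilerplate moving from the callers into unpack_trees() itself.  Everything
prefixed with toy_ or TOY_ is hypothetical and only stands in for the real
dir_struct, DIR_SHOW_IGNORED and setup_standard_excludes(); it is a sketch
of the idea, not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DIR_SHOW_IGNORED (1u << 0)	/* stands in for DIR_SHOW_IGNORED */

/* Hypothetical stand-in for struct dir_struct. */
struct toy_dir {
	unsigned flags;
	bool standard_excludes_loaded;
};

/* Hypothetical stand-in for struct unpack_trees_options. */
struct toy_unpack_opts {
	bool preserve_ignored;	/* 0 (the default): ignored files may go */
	struct toy_dir dir;
};

/* Stands in for setup_standard_excludes(). */
void toy_setup_standard_excludes(struct toy_dir *dir)
{
	dir->standard_excludes_loaded = true;
}

/* The two-step setup now lives here, not in every caller. */
void toy_unpack_trees(struct toy_unpack_opts *o)
{
	if (!o->preserve_ignored) {
		o->dir.flags |= TOY_DIR_SHOW_IGNORED;
		toy_setup_standard_excludes(&o->dir);
	}
}

/* True when ignored files in the way would be overwritten. */
bool toy_may_overwrite_ignored(const struct toy_unpack_opts *o)
{
	return (o->dir.flags & TOY_DIR_SHOW_IGNORED) &&
	       o->dir.standard_excludes_loaded;
}
```

A caller that zero-initializes its options and sets nothing else gets the
overwrite-ignored behavior for free; only callers that explicitly set
preserve_ignored = 1 (the FIXME sites in this patch) opt out.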
 builtin/am.c        | 3 +++
 builtin/checkout.c  | 6 ++----
 builtin/clone.c     | 2 ++
 builtin/merge.c     | 2 ++
 builtin/read-tree.c | 7 +++----
 builtin/reset.c     | 2 ++
 builtin/stash.c     | 3 +++
 merge-ort.c         | 5 +----
 merge-recursive.c   | 4 ++--
 merge.c             | 6 +-----
 reset.c             | 2 ++
 sequencer.c         | 2 ++
 unpack-trees.c      | 5 +++++
 unpack-trees.h      | 1 +
 14 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 4d4bb473c0f..3c6efe2a46b 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1918,6 +1918,9 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.update = 1;
 	opts.merge = 1;
 	opts.reset = reset;
+	if (!reset)
+		/* FIXME: Default should be to remove ignored files */
+		opts.preserve_ignored = 1;
 	opts.fn = twoway_merge;
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index fd76b504861..0c5187025c5 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -648,6 +648,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	opts.skip_unmerged = !worktree;
 	opts.reset = 1;
 	opts.merge = 1;
+	opts.preserve_ignored = 0;
 	opts.fn = oneway_merge;
 	opts.verbose_update = o->show_progress;
 	opts.src_index = &the_index;
@@ -749,10 +750,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				       new_branch_info->commit ?
 				       &new_branch_info->commit->object.oid :
 				       &new_branch_info->oid, NULL);
-		if (opts->overwrite_ignore) {
-			topts.dir.flags |= DIR_SHOW_IGNORED;
-			setup_standard_excludes(&topts.dir);
-		}
+		topts.preserve_ignored = !opts->overwrite_ignore;
 		tree = parse_tree_indirect(old_branch_info->commit ?
 					   &old_branch_info->commit->object.oid :
 					   the_hash_algo->empty_tree);
diff --git a/builtin/clone.c b/builtin/clone.c
index df3bb9a7884..e76c38e4e81 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -686,6 +686,8 @@ static int checkout(int submodule_progress)
 	opts.update = 1;
 	opts.merge = 1;
 	opts.clone = 1;
+	/* FIXME: Default should be to remove ignored files */
+	opts.preserve_ignored = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/builtin/merge.c b/builtin/merge.c
index 28089e2c5ed..55aac869e5a 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -680,6 +680,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	opts.verbose_update = 1;
 	opts.trivial_merges_only = 1;
 	opts.merge = 1;
+	/* FIXME: Default should be to remove ignored files */
+	opts.preserve_ignored = 1;
 	trees[nr_trees] = parse_tree_indirect(common);
 	if (!trees[nr_trees++])
 		return -1;
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index d0f88bbf3e3..7f3c987b126 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -201,10 +201,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if ((opts.update || opts.index_only) && !opts.merge)
 		die("%s is meaningless without -m, --reset, or --prefix",
 		    opts.update ? "-u" : "-i");
-	if (opts.update && !opts.reset) {
-		opts.dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&opts.dir);
-	}
+	if (opts.update && !opts.reset)
+		opts.preserve_ignored = 0;
+	/* otherwise, opts.preserve_ignored is irrelevant */
 	if (opts.merge && !opts.index_only)
 		setup_work_tree();
 
diff --git a/builtin/reset.c b/builtin/reset.c
index 713d084c3eb..73477239146 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -66,6 +66,8 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case KEEP:
 	case MERGE:
 		opts.update = 1;
+		/* FIXME: Default should be to remove ignored files */
+		opts.preserve_ignored = 1;
 		break;
 	case HARD:
 		opts.update = 1;
diff --git a/builtin/stash.c b/builtin/stash.c
index be6ecb1ae11..78492013529 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -257,6 +257,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.merge = 1;
 	opts.reset = reset;
 	opts.update = update;
+	if (update && !reset)
+		/* FIXME: Default should be to remove ignored files */
+		opts.preserve_ignored = 1;
 	opts.fn = oneway_merge;
 
 	if (unpack_trees(nr_trees, t, &opts)) {
diff --git a/merge-ort.c b/merge-ort.c
index 0a5937364c9..e5620bda212 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4044,10 +4044,7 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
-	if (1/* FIXME: opts->overwrite_ignore*/) {
-		unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&unpack_opts.dir);
-	}
+	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
 	parse_tree(prev);
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
 	parse_tree(next);
diff --git a/merge-recursive.c b/merge-recursive.c
index a4131b8837b..5c6b95a79c0 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -410,8 +410,8 @@ static int unpack_trees_start(struct merge_options *opt,
 		opt->priv->unpack_opts.index_only = 1;
 	else {
 		opt->priv->unpack_opts.update = 1;
-		opt->priv->unpack_opts.dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&opt->priv->unpack_opts.dir);
+		/* FIXME: should only do this if !overwrite_ignore */
+		opt->priv->unpack_opts.preserve_ignored = 0;
 	}
 	opt->priv->unpack_opts.merge = 1;
 	opt->priv->unpack_opts.head_idx = 2;
diff --git a/merge.c b/merge.c
index 2e3714ccaa0..e1f3165e407 100644
--- a/merge.c
+++ b/merge.c
@@ -79,11 +79,7 @@ int checkout_fast_forward(struct repository *r,
 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
 	}
 
-	if (overwrite_ignore) {
-		opts.dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&opts.dir);
-	}
-
+	opts.preserve_ignored = !overwrite_ignore;
 	opts.head_idx = 1;
 	opts.src_index = r->index;
 	opts.dst_index = r->index;
diff --git a/reset.c b/reset.c
index f4bf3fbfac0..cd344f47f13 100644
--- a/reset.c
+++ b/reset.c
@@ -56,6 +56,8 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
+	/* FIXME: Default should be to remove ignored files */
+	unpack_tree_opts.preserve_ignored = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (!detach_head)
 		unpack_tree_opts.reset = 1;
diff --git a/sequencer.c b/sequencer.c
index abd85b6c562..669ea15944c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3698,6 +3698,8 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.fn = oneway_merge;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
+	/* FIXME: Default should be to remove ignored files */
+	unpack_tree_opts.preserve_ignored = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
diff --git a/unpack-trees.c b/unpack-trees.c
index 260e7ec5bb4..02bc999c6c3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1711,6 +1711,11 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		ensure_full_index(o->dst_index);
 	}
 
+	if (!o->preserve_ignored) {
+		o->dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&o->dir);
+	}
+
 	if (!core_apply_sparse_checkout || !o->update)
 		o->skip_sparse_checkout = 1;
 	if (!o->skip_sparse_checkout && !o->pl) {
diff --git a/unpack-trees.h b/unpack-trees.h
index a8d1f083b33..65a8d99d4ef 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -52,6 +52,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     preserve_ignored,
 		     clone,
 		     index_only,
 		     nontrivial_merge,
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 04/10] unpack-trees: rename "dir" to "private_dir"
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 03/10] unpack-trees: introduce preserve_ignored to unpack_trees_options Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 05/10] Remove ignored files by default when they are in the way Ævar Arnfjörð Bjarmason
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Until the introduction of the "preserve_ignored" flag in the preceding
commit, callers who wanted its behavior needed to adjust "dir.flags"
and call setup_standard_excludes() themselves.

Now that we have no external users of "dir" anymore, let's rename it to
"private_dir" and add a comment indicating that we'd like it not to be
messed with by external callers. This should avoid accidental
misuse or confusion over its ownership.
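
As a rough illustration of the ownership change described above, here
is a minimal, self-contained sketch. It is not git's code: the struct
names, the MODEL_DIR_SHOW_IGNORED constant and the "excludes_loaded"
field are hypothetical stand-ins (the last one for the effect of
setup_standard_excludes()). It only shows that callers set one flag and
the unpack step alone touches the private member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's DIR_SHOW_IGNORED flag bit. */
#define MODEL_DIR_SHOW_IGNORED (1u << 0)

/* Hypothetical, trimmed-down stand-in for struct dir_struct. */
struct model_dir {
	unsigned int flags;
	int excludes_loaded; /* models setup_standard_excludes() having run */
};

/* Hypothetical, trimmed-down stand-in for struct unpack_trees_options. */
struct model_opts {
	unsigned int preserve_ignored;
	struct model_dir private_dir; /* for internal use only */
};

/*
 * Models the block inside unpack_trees(): callers only set
 * preserve_ignored; this function is the sole writer of private_dir.
 */
void model_unpack_setup(struct model_opts *o)
{
	assert(o != NULL);
	if (!o->preserve_ignored) {
		o->private_dir.flags |= MODEL_DIR_SHOW_IGNORED;
		o->private_dir.excludes_loaded = 1;
	}
}
```

A caller in this model would zero-initialize a struct model_opts, set
preserve_ignored as needed, and call model_unpack_setup() without ever
reading or writing private_dir itself.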

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 unpack-trees.c | 10 +++++-----
 unpack-trees.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 02bc999c6c3..512011cfa42 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -196,7 +196,7 @@ void unpack_trees_options_init(struct unpack_trees_options *o)
 void unpack_trees_options_release(struct unpack_trees_options *opts)
 {
 	strvec_clear(&opts->msgs_to_free);
-	dir_clear(&opts->dir);
+	dir_clear(&opts->private_dir);
 }
 
 static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
@@ -1712,8 +1712,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	}
 
 	if (!o->preserve_ignored) {
-		o->dir.flags |= DIR_SHOW_IGNORED;
-		setup_standard_excludes(&o->dir);
+		o->private_dir.flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(&o->private_dir);
 	}
 
 	if (!core_apply_sparse_checkout || !o->update)
@@ -2141,7 +2141,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
 	 */
 	pathbuf = xstrfmt("%.*s/", namelen, ce->name);
 
-	d.exclude_per_dir = o->dir.exclude_per_dir;
+	d.exclude_per_dir = o->private_dir.exclude_per_dir;
 	i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
 	dir_clear(&d);
 	free(pathbuf);
@@ -2183,7 +2183,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 	if (ignore_case && icase_exists(o, name, len, st))
 		return 0;
 
-	if (is_excluded(&o->dir, o->src_index, name, &dtype))
+	if (is_excluded(&o->private_dir, o->src_index, name, &dtype))
 		/*
 		 * ce->name is explicitly excluded, so it is Ok to
 		 * overwrite it.
diff --git a/unpack-trees.h b/unpack-trees.h
index 65a8d99d4ef..2eb633bf771 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -70,7 +70,7 @@ struct unpack_trees_options {
 		     dry_run;
 	const char *prefix;
 	int cache_bottom;
-	struct dir_struct dir;
+	struct dir_struct private_dir; /* for internal use only */
 	struct pathspec *pathspec;
 	merge_fn_t fn;
 	const char *msgs[NB_UNPACK_TREES_WARNING_TYPES];
@@ -97,7 +97,7 @@ struct unpack_trees_options {
 
 #define UNPACK_TREES_OPTIONS_INIT { \
 	.msgs_to_free = STRVEC_INIT, \
-	.dir = DIR_INIT, \
+	.private_dir = DIR_INIT, \
 }
 void unpack_trees_options_init(struct unpack_trees_options *o);
 
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 05/10] Remove ignored files by default when they are in the way
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 04/10] unpack-trees: rename "dir" to "private_dir" Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 06/10] Change unpack_trees' 'reset' flag into an enum Ævar Arnfjörð Bjarmason
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Change several commands to remove ignored files by default when they are
in the way.  Since some commands (checkout, merge) take a
--no-overwrite-ignore option to allow the user to configure this, and it
may make sense to add that option to more commands (and in the case of
merge, actually plumb that configuration option through to more of the
backends than just the fast-forwarding special case), add little
comments about where such flags would be used.

Incidentally, this fixes a test failure in t7112.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c               | 3 +--
 builtin/clone.c            | 3 +--
 builtin/merge.c            | 3 +--
 builtin/reset.c            | 3 +--
 builtin/stash.c            | 3 +--
 merge-ort.c                | 2 +-
 reset.c                    | 3 +--
 sequencer.c                | 3 +--
 t/t7112-reset-submodule.sh | 1 -
 9 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 3c6efe2a46b..8cb7e72e6c5 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1919,8 +1919,7 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.merge = 1;
 	opts.reset = reset;
 	if (!reset)
-		/* FIXME: Default should be to remove ignored files */
-		opts.preserve_ignored = 1;
+		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
diff --git a/builtin/clone.c b/builtin/clone.c
index e76c38e4e81..49edb4a2aaa 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -686,8 +686,7 @@ static int checkout(int submodule_progress)
 	opts.update = 1;
 	opts.merge = 1;
 	opts.clone = 1;
-	/* FIXME: Default should be to remove ignored files */
-	opts.preserve_ignored = 1;
+	opts.preserve_ignored = 0;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/builtin/merge.c b/builtin/merge.c
index 55aac869e5a..7f990e36651 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -680,8 +680,7 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	opts.verbose_update = 1;
 	opts.trivial_merges_only = 1;
 	opts.merge = 1;
-	/* FIXME: Default should be to remove ignored files */
-	opts.preserve_ignored = 1;
+	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	trees[nr_trees] = parse_tree_indirect(common);
 	if (!trees[nr_trees++])
 		return -1;
diff --git a/builtin/reset.c b/builtin/reset.c
index 73477239146..9d1391335a1 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -66,8 +66,7 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case KEEP:
 	case MERGE:
 		opts.update = 1;
-		/* FIXME: Default should be to remove ignored files */
-		opts.preserve_ignored = 1;
+		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 		break;
 	case HARD:
 		opts.update = 1;
diff --git a/builtin/stash.c b/builtin/stash.c
index 78492013529..92ad3241270 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -258,8 +258,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.reset = reset;
 	opts.update = update;
 	if (update && !reset)
-		/* FIXME: Default should be to remove ignored files */
-		opts.preserve_ignored = 1;
+		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = oneway_merge;
 
 	if (unpack_trees(nr_trees, t, &opts)) {
diff --git a/merge-ort.c b/merge-ort.c
index e5620bda212..5c443b2b00d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4044,7 +4044,7 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
-	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore*/
+	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
 	parse_tree(prev);
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
 	parse_tree(next);
diff --git a/reset.c b/reset.c
index cd344f47f13..5f69311b9f4 100644
--- a/reset.c
+++ b/reset.c
@@ -56,8 +56,7 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
-	/* FIXME: Default should be to remove ignored files */
-	unpack_tree_opts.preserve_ignored = 1;
+	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (!detach_head)
 		unpack_tree_opts.reset = 1;
diff --git a/sequencer.c b/sequencer.c
index 669ea15944c..05395db9e01 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3698,8 +3698,7 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.fn = oneway_merge;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
-	/* FIXME: Default should be to remove ignored files */
-	unpack_tree_opts.preserve_ignored = 1;
+	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
diff --git a/t/t7112-reset-submodule.sh b/t/t7112-reset-submodule.sh
index 19830d90365..a3e2413bc33 100755
--- a/t/t7112-reset-submodule.sh
+++ b/t/t7112-reset-submodule.sh
@@ -6,7 +6,6 @@ test_description='reset can handle submodules'
 . "$TEST_DIRECTORY"/lib-submodule-update.sh
 
 KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
-KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
 
 test_submodule_switch_recursing_with_args "reset --keep"
 
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 06/10] Change unpack_trees' 'reset' flag into an enum
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 05/10] Remove ignored files by default when they are in the way Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 07/10] unpack-trees: avoid nuking untracked dir in way of unmerged file Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way.  This was used by
`git read-tree --reset`, but then started appearing in other places as
well.  However, many of the other uses should not be deleting untracked
files in the way.  Change this value to an enum so that a value of 1
(i.e. "true") can be split into two:
   UNPACK_RESET_PROTECT_UNTRACKED,
   UNPACK_RESET_OVERWRITE_UNTRACKED
In order to catch accidental misuses (i.e. where folks call it the way
they traditionally used to), define the special enum value of
   UNPACK_RESET_INVALID = 1
which will trigger a BUG().

Modify existing callers so that
   read-tree --reset
   reset --hard
   checkout --force
continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other
callers, including
   am
   checkout without --force
   stash  (though currently dead code; reset always had a value of 0)
   numerous callers from rebase/sequencer to reset_head()
will use the new UNPACK_RESET_PROTECT_UNTRACKED value.

Also, note that it has been reported that 'git checkout <treeish>
<pathspec>' currently also allows overwriting untracked files[1].  That
case should also be fixed, but it does not use unpack_trees() and thus
is outside the scope of the current changes.

[1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/
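
The checks described in this message can be modeled in isolation. The
sketch below is an assumption-laden illustration, not the patch itself:
the enum matches the one added to unpack-trees.h, but "struct
model_opts" and both helper functions are hypothetical, and the helpers
return -1 where the real code calls BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Same enumerators and values as the enum this patch adds. */
enum unpack_trees_reset_type {
	UNPACK_RESET_NONE = 0,    /* traditional "false" value */
	UNPACK_RESET_INVALID = 1, /* traditional "true"; now rejected */
	UNPACK_RESET_PROTECT_UNTRACKED,
	UNPACK_RESET_OVERWRITE_UNTRACKED
};

/* Hypothetical, trimmed-down stand-in for struct unpack_trees_options. */
struct model_opts {
	enum unpack_trees_reset_type reset;
	unsigned int preserve_ignored, index_only, update;
};

/*
 * Hypothetical helper: returns -1 for the two combinations that
 * unpack_trees() rejects with BUG() in this patch, 0 otherwise.
 */
int model_check_options(const struct model_opts *o)
{
	assert(o != NULL);
	if (o->reset == UNPACK_RESET_INVALID)
		return -1;
	if (o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED && o->preserve_ignored)
		return -1;
	return 0;
}

/*
 * Hypothetical helper mirroring the early return in verify_absent_1():
 * nonzero means the untracked-file check is skipped entirely.
 */
int model_skips_untracked_check(const struct model_opts *o)
{
	assert(o != NULL);
	return o->index_only || !o->update ||
	       o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED;
}
```

In this model a legacy caller that still assigns "reset = 1" lands on
UNPACK_RESET_INVALID and is rejected, which is the point of reserving
that value: old call sites fail loudly instead of silently picking one
of the two new behaviors.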

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c                     |  5 ++---
 builtin/checkout.c               |  5 +++--
 builtin/read-tree.c              |  3 +++
 builtin/reset.c                  |  9 +++++++--
 builtin/stash.c                  |  4 ++--
 reset.c                          |  2 +-
 t/t2500-untracked-overwriting.sh |  6 +++---
 unpack-trees.c                   | 10 +++++++++-
 unpack-trees.h                   | 11 +++++++++--
 9 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 8cb7e72e6c5..dfe2bf207e6 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1917,9 +1917,8 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.dst_index = &the_index;
 	opts.update = 1;
 	opts.merge = 1;
-	opts.reset = reset;
-	if (!reset)
-		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
+	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
 	init_tree_desc(&t[0], head->buffer, head->size);
 	init_tree_desc(&t[1], remote->buffer, remote->size);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0c5187025c5..ee3e450537f 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -646,9 +646,10 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	opts.head_idx = -1;
 	opts.update = worktree;
 	opts.skip_unmerged = !worktree;
-	opts.reset = 1;
+	opts.reset = o->force ? UNPACK_RESET_OVERWRITE_UNTRACKED :
+				UNPACK_RESET_PROTECT_UNTRACKED;
+	opts.preserve_ignored = (!o->force && !o->overwrite_ignore);
 	opts.merge = 1;
-	opts.preserve_ignored = 0;
 	opts.fn = oneway_merge;
 	opts.verbose_update = o->show_progress;
 	opts.src_index = &the_index;
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 7f3c987b126..5e10caeff5a 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -166,6 +166,9 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (1 < opts.merge + opts.reset + prefix_set)
 		die("Which one? -m, --reset, or --prefix?");
 
+	if (opts.reset)
+		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+
 	/*
 	 * NEEDSWORK
 	 *
diff --git a/builtin/reset.c b/builtin/reset.c
index 9d1391335a1..00d2de392a8 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -70,9 +70,14 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 		break;
 	case HARD:
 		opts.update = 1;
-		/* fallthrough */
+		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+		break;
+	case MIXED:
+		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		/* but opts.update=0, so working tree not updated */
+		break;
 	default:
-		opts.reset = 1;
+		BUG("invalid reset_type passed to reset_index");
 	}
 
 	read_cache_unmerged();
diff --git a/builtin/stash.c b/builtin/stash.c
index 92ad3241270..061237cf9a4 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -255,9 +255,9 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
 	opts.merge = 1;
-	opts.reset = reset;
+	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
 	opts.update = update;
-	if (update && !reset)
+	if (update)
 		opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = oneway_merge;
 
diff --git a/reset.c b/reset.c
index 5f69311b9f4..5788b1926f3 100644
--- a/reset.c
+++ b/reset.c
@@ -59,7 +59,7 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (!detach_head)
-		unpack_tree_opts.reset = 1;
+		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
 
 	if (repo_read_index_unmerged(r) < 0) {
 		ret = error(_("could not read index"));
diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 2412d121ea8..18604360df8 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -92,7 +92,7 @@ test_setup_checkout_m () {
 	)
 }
 
-test_expect_failure 'checkout -m does not nuke untracked file' '
+test_expect_success 'checkout -m does not nuke untracked file' '
 	test_setup_checkout_m &&
 	(
 		cd checkout &&
@@ -138,7 +138,7 @@ test_setup_sequencing () {
 	)
 }
 
-test_expect_failure 'git rebase --abort and untracked files' '
+test_expect_success 'git rebase --abort and untracked files' '
 	test_setup_sequencing rebase_abort_and_untracked &&
 	(
 		cd sequencing_rebase_abort_and_untracked &&
@@ -155,7 +155,7 @@ test_expect_failure 'git rebase --abort and untracked files' '
 	)
 '
 
-test_expect_failure 'git rebase fast forwarding and untracked files' '
+test_expect_success 'git rebase fast forwarding and untracked files' '
 	test_setup_sequencing rebase_fast_forward_and_untracked &&
 	(
 		cd sequencing_rebase_fast_forward_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 512011cfa42..37f769030ab 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1699,6 +1699,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	struct pattern_list pl;
 	int free_pattern_list = 0;
 
+	if (o->reset == UNPACK_RESET_INVALID)
+		BUG("o->reset had a value of 1; should be UNPACK_RESET_*_UNTRACKED");
+
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
@@ -1711,6 +1714,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		ensure_full_index(o->dst_index);
 	}
 
+	if (o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED &&
+	    o->preserve_ignored)
+		BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
+
 	if (!o->preserve_ignored) {
 		o->private_dir.flags |= DIR_SHOW_IGNORED;
 		setup_standard_excludes(&o->private_dir);
@@ -2227,7 +2234,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 	int len;
 	struct stat st;
 
-	if (o->index_only || o->reset || !o->update)
+	if (o->index_only || !o->update ||
+	    o->reset == UNPACK_RESET_OVERWRITE_UNTRACKED)
 		return 0;
 
 	len = check_leading_path(ce->name, ce_namelen(ce), 0);
diff --git a/unpack-trees.h b/unpack-trees.h
index 2eb633bf771..1582599606a 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -48,9 +48,15 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
  */
 void unpack_trees_options_release(struct unpack_trees_options *opts);
 
+enum unpack_trees_reset_type {
+	UNPACK_RESET_NONE = 0,    /* traditional "false" value; still valid */
+	UNPACK_RESET_INVALID = 1, /* "true" no longer valid; use below values */
+	UNPACK_RESET_PROTECT_UNTRACKED,
+	UNPACK_RESET_OVERWRITE_UNTRACKED
+};
+
 struct unpack_trees_options {
-	unsigned int reset,
-		     merge,
+	unsigned int merge,
 		     update,
 		     preserve_ignored,
 		     clone,
@@ -68,6 +74,7 @@ struct unpack_trees_options {
 		     exiting_early,
 		     show_all_errors,
 		     dry_run;
+	enum unpack_trees_reset_type reset;
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct private_dir; /* for internal use only */
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 07/10] unpack-trees: avoid nuking untracked dir in way of unmerged file
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (5 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 06/10] Change unpack_trees' 'reset' flag into an enum Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 08/10] unpack-trees: avoid nuking untracked dir in way of locally deleted file Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t2500-untracked-overwriting.sh |  2 +-
 unpack-trees.c                   | 35 ++++++++++++++++++++++++++++----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 18604360df8..5ec66058cfc 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -197,7 +197,7 @@ test_expect_failure 'git stash and untracked files' '
 	)
 '
 
-test_expect_failure 'git am --abort and untracked dir vs. unmerged file' '
+test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	test_setup_sequencing am_abort_and_untracked &&
 	(
 		cd sequencing_am_abort_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 37f769030ab..8408a8fcfff 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2173,9 +2173,15 @@ static int icase_exists(struct unpack_trees_options *o, const char *name, int le
 	return src && !ie_match_stat(o->src_index, src, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
 }
 
+enum absent_checking_type {
+	COMPLETELY_ABSENT,
+	ABSENT_ANY_DIRECTORY
+};
+
 static int check_ok_to_remove(const char *name, int len, int dtype,
 			      const struct cache_entry *ce, struct stat *st,
 			      enum unpack_trees_error_types error_type,
+			      enum absent_checking_type absent_type,
 			      struct unpack_trees_options *o)
 {
 	const struct cache_entry *result;
@@ -2209,6 +2215,10 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 		return 0;
 	}
 
+	/* If we only care about directories, then we can remove */
+	if (absent_type == ABSENT_ANY_DIRECTORY)
+		return 0;
+
 	/*
 	 * The previous round may already have decided to
 	 * delete this path, which is in a subdirectory that
@@ -2229,6 +2239,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
  */
 static int verify_absent_1(const struct cache_entry *ce,
 			   enum unpack_trees_error_types error_type,
+			   enum absent_checking_type absent_type,
 			   struct unpack_trees_options *o)
 {
 	int len;
@@ -2255,7 +2266,8 @@ static int verify_absent_1(const struct cache_entry *ce,
 								NULL, o);
 			else
 				ret = check_ok_to_remove(path, len, DT_UNKNOWN, NULL,
-							 &st, error_type, o);
+							 &st, error_type,
+							 absent_type, o);
 		}
 		free(path);
 		return ret;
@@ -2270,7 +2282,7 @@ static int verify_absent_1(const struct cache_entry *ce,
 
 		return check_ok_to_remove(ce->name, ce_namelen(ce),
 					  ce_to_dtype(ce), ce, &st,
-					  error_type, o);
+					  error_type, absent_type, o);
 	}
 }
 
@@ -2280,14 +2292,23 @@ static int verify_absent(const struct cache_entry *ce,
 {
 	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
+}
+
+static int verify_absent_if_directory(const struct cache_entry *ce,
+				      enum unpack_trees_error_types error_type,
+				      struct unpack_trees_options *o)
+{
+	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+		return 0;
+	return verify_absent_1(ce, error_type, ABSENT_ANY_DIRECTORY, o);
 }
 
 static int verify_absent_sparse(const struct cache_entry *ce,
 				enum unpack_trees_error_types error_type,
 				struct unpack_trees_options *o)
 {
-	return verify_absent_1(ce, error_type, o);
+	return verify_absent_1(ce, error_type, COMPLETELY_ABSENT, o);
 }
 
 static int merged_entry(const struct cache_entry *ce,
@@ -2361,6 +2382,12 @@ static int merged_entry(const struct cache_entry *ce,
 		 * Previously unmerged entry left as an existence
 		 * marker by read_index_unmerged();
 		 */
+		if (verify_absent_if_directory(merge,
+				  ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o)) {
+			discard_cache_entry(merge);
+			return -1;
+		}
+
 		invalidate_ce_path(old, o);
 	}
 
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 08/10] unpack-trees: avoid nuking untracked dir in way of locally deleted file
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (6 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 07/10] unpack-trees: avoid nuking untracked dir in way of unmerged file Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 09/10] Comment important codepaths regarding nuking untracked files/dirs Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t2500-untracked-overwriting.sh | 2 +-
 unpack-trees.c                   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/t2500-untracked-overwriting.sh b/t/t2500-untracked-overwriting.sh
index 5ec66058cfc..5c0bf4d21fc 100755
--- a/t/t2500-untracked-overwriting.sh
+++ b/t/t2500-untracked-overwriting.sh
@@ -218,7 +218,7 @@ test_expect_success 'git am --abort and untracked dir vs. unmerged file' '
 	)
 '
 
-test_expect_failure 'git am --skip and untracked dir vs deleted file' '
+test_expect_success 'git am --skip and untracked dir vs deleted file' '
 	test_setup_sequencing am_skip_and_untracked &&
 	(
 		cd sequencing_am_skip_and_untracked &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 8408a8fcfff..703e7953d62 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2405,7 +2405,10 @@ static int deleted_entry(const struct cache_entry *ce,
 		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o))
 			return -1;
 		return 0;
+	} else if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o)) {
+		return -1;
 	}
+
 	if (!(old->ce_flags & CE_CONFLICTED) && verify_uptodate(old, o))
 		return -1;
 	add_entry(o, ce, CE_REMOVE, 0);
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 09/10] Comment important codepaths regarding nuking untracked files/dirs
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (7 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 08/10] unpack-trees: avoid nuking untracked dir in way of locally deleted file Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04  1:11       ` [RFC PATCH v4 10/10] Documentation: call out commands that nuke untracked files/directories Ævar Arnfjörð Bjarmason
  2021-10-04 14:38       ` [RFC PATCH v4 00/10] Fix various issues around removal of " Elijah Newren
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other highlevel
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Also, the reset --hard in builtin/worktree.c looks safe, due to only
running in an empty directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/stash.c             | 1 +
 builtin/submodule--helper.c | 4 ++++
 contrib/rerere-train.sh     | 2 +-
 submodule.c                 | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 061237cf9a4..fabfb63632e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1525,6 +1525,7 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		} else {
 			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.git_cmd = 1;
+			/* BUG: this nukes untracked files in the way */
 			strvec_pushl(&cp.args, "reset", "--hard", "-q",
 				     "--no-recurse-submodules", NULL);
 			if (run_command(&cp)) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 57f09925157..e40e4b6aacc 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -3091,6 +3091,10 @@ static int add_submodule(const struct add_data *add_data)
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.dir = add_data->sm_path;
+		/*
+		 * NOTE: we only get here if add_data->force is true, so
+		 * passing --force to checkout is reasonable.
+		 */
 		strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL);
 
 		if (add_data->branch) {
diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index eeee45dd341..75125d6ae00 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -91,7 +91,7 @@ do
 		git checkout -q $commit -- .
 		git rerere
 	fi
-	git reset -q --hard
+	git reset -q --hard  # Might nuke untracked files...
 done
 
 if test -z "$branch"
diff --git a/submodule.c b/submodule.c
index 03cf36707ae..9f3d9b7ee73 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1910,6 +1910,7 @@ static void submodule_reset_index(const char *path)
 
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     get_super_prefix_or_empty(), path);
+	/* TODO: determine if this might overwrite untracked files */
 	strvec_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
 
 	strvec_push(&cp.args, empty_tree_oid_hex());
-- 
2.33.0.1404.g83021034c5d



* [RFC PATCH v4 10/10] Documentation: call out commands that nuke untracked files/directories
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (8 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 09/10] Comment important codepaths regarding nuking untracked files/dirs Ævar Arnfjörð Bjarmason
@ 2021-10-04  1:11       ` Ævar Arnfjörð Bjarmason
  2021-10-04 14:38       ` [RFC PATCH v4 00/10] Fix various issues around removal of " Elijah Newren
  10 siblings, 0 replies; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  1:11 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood, Ævar Arnfjörð Bjarmason

From: Elijah Newren <newren@gmail.com>

Some commands have traditionally also removed untracked files (or
directories) that were in the way of a tracked file we needed.  Document
these cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-checkout.txt  | 5 +++--
 Documentation/git-read-tree.txt | 5 +++--
 Documentation/git-reset.txt     | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index b1a6fe44997..d473c9bf387 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -118,8 +118,9 @@ OPTIONS
 -f::
 --force::
 	When switching branches, proceed even if the index or the
-	working tree differs from `HEAD`.  This is used to throw away
-	local changes.
+	working tree differs from `HEAD`, and even if there are untracked
+	files in the way.  This is used to throw away local changes and
+	any untracked files or directories that are in the way.
 +
 When checking out paths from the index, do not fail upon unmerged
 entries; instead, unmerged entries are ignored.
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 0222a27c5af..8c3aceb8324 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -38,8 +38,9 @@ OPTIONS
 
 --reset::
 	Same as -m, except that unmerged entries are discarded instead
-	of failing. When used with `-u`, updates leading to loss of
-	working tree changes will not abort the operation.
+	of failing.  When used with `-u`, updates leading to loss of
+	working tree changes or untracked files or directories will not
+	abort the operation.
 
 -u::
 	After a successful merge, update the files in the work
diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index 252e2d4e47d..6f7685f53d5 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -69,7 +69,8 @@ linkgit:git-add[1]).
 
 --hard::
 	Resets the index and working tree. Any changes to tracked files in the
-	working tree since `<commit>` are discarded.
+	working tree since `<commit>` are discarded.  Any untracked files or
+	directories in the way of writing any tracked files are simply deleted.
 
 --merge::
 	Resets the index and updates the files in the working tree that are
-- 
2.33.0.1404.g83021034c5d



* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-02  8:44                       ` Ævar Arnfjörð Bjarmason
  2021-10-03 22:21                         ` Ævar Arnfjörð Bjarmason
@ 2021-10-04 13:45                         ` Elijah Newren
  2021-10-04 14:07                           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 82+ messages in thread
From: Elijah Newren @ 2021-10-04 13:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood

On Sat, Oct 2, 2021 at 2:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Fri, Oct 01 2021, Elijah Newren wrote:
>
...
> > So maybe I'll submit some patches on top that rip these direct members
> > out of unpack_trees_options and push them inside some opaque
> > struct.
>
> Sure, that sounds good. I only had a mild objection to doing it in a way
> where you'll need that sort of code I removed in the linked commit in
> prep_exclude() because you were trying not to expose that at any cost,
> including via some *_INIT macro. I.e. if it's private we can just name
> it "priv_*" or have a :
>
>     struct dont_touch_this {
>         struct dir_struct dir;
>     };
>
> Which are both ways of /messaging/ that it's private, and since the
> target audience is just the rest of the git.git codebase I think that
> ultimately something that 1) sends the right message 2) makes accidents
> pretty much impossible suffices. I.e. you don't accidentally introduce a
> new API user accessing a field called "->priv_*" or
> "->private_*". Someone will review those patches...

An internal struct with all the members meant to be internal-only
provides nearly all the advantages that I was going for with the
opaque struct, while also being a smaller change than what I was
thinking of doing.  I like that idea; thanks for the suggestion.
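
The nested-internal-struct idea can be sketched as follows.  This is only
an illustration under stated assumptions: `struct walker`, `WALKER_INIT`,
`walker_push()` and `walker_clear()` are hypothetical names invented for
this example and are not git's dir.[ch] or unpack-trees.[ch] API.  The
point is that allocation bookkeeping lives under `internal`, so any
outside access has to spell out `->internal.` and stands out in review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example type; NOT git's struct dir_struct. */
struct walker {
	/* Public: number of valid items in `entries`. */
	size_t nr;
	/* Public: the collected entries. */
	int *entries;

	/* Everything below is meant for the implementation file only. */
	struct walker_internal {
		/* Allocated capacity of `entries`. */
		size_t alloc;
	} internal;
};

/* Zero-initialization covers the public and the internal members. */
#define WALKER_INIT { 0 }

/* Append a value, growing the array on demand; 0 on success, -1 on OOM. */
static int walker_push(struct walker *w, int value)
{
	if (w->nr == w->internal.alloc) {
		size_t new_alloc = w->internal.alloc ? 2 * w->internal.alloc : 4;
		int *tmp = realloc(w->entries, new_alloc * sizeof(*tmp));

		if (!tmp)
			return -1;
		w->entries = tmp;
		w->internal.alloc = new_alloc;
	}
	w->entries[w->nr++] = value;
	return 0;
}

/* Release the array and return the struct to its initial state. */
static void walker_clear(struct walker *w)
{
	free(w->entries);
	w->entries = NULL;
	w->nr = 0;
	w->internal.alloc = 0;
}
```

A caller would only ever read `nr` and `entries`; the embedded struct
adds no indirection, since it is stored inline rather than behind a
pointer.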


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-03 22:21                         ` Ævar Arnfjörð Bjarmason
@ 2021-10-04 13:45                           ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-10-04 13:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood

On Sun, Oct 3, 2021 at 3:38 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Sat, Oct 02 2021, Ævar Arnfjörð Bjarmason wrote:
>
> > On Fri, Oct 01 2021, Elijah Newren wrote:
> >
> >> On Fri, Oct 1, 2021 at 1:47 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >>>
> >>> On Thu, Sep 30 2021, Elijah Newren wrote:
> >>>
> >>> > On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
> >>> > <avarab@gmail.com> wrote:
> >>> >>
> >>> >> On Wed, Sep 29 2021, Elijah Newren wrote:
> [...]
> >>> > I might be going on a tangent here, but looking at that patch, I'm
> >>> > worried that dir_init() was buggy and that you perpetuated that bug
> >>> > with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
> >>> > member, which neither dir_init() or DIR_INIT initialize properly
> >>> > (using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
> >>> > dir.c relies on either strbuf_add() calls to just happen to work with
> >>> > this incorrectly initialized strbuf, or else use the strbuf_init()
> >>> > call in prep_exclude() to do so, using the following snippet:
> >>> >
> >>> >     if (!dir->basebuf.buf)
> >>> >         strbuf_init(&dir->basebuf, PATH_MAX);
> >>> >
> >>> > However, earlier in that same function we see
> >>> >
> >>> >     if (stk->baselen <= baselen &&
> >>> >         !strncmp(dir->basebuf.buf, base, stk->baselen))
> >>> >             break;
> >>> >
> >>> > So either that function can never have dir->basebuf.buf be NULL and
> >>> > the strbuf_init() is dead code, or else it's possible for us to
> >>> > trigger a segfault.  If it's the former, it may just be a ticking time
> >>> > bomb that will transform into the latter with some other change,
> >>> > because it's not at all obvious to me how dir->basebuf gets
> >>> > initialized appropriately to avoid that strncmp call.  Perhaps there
> >>> > is some invariant where exclude_stack is only set up by previous calls
> >>> > to prep_exclude() and those won't set up exclude_stack until first
> >>> > initializing basebuf.  But that really at least deserves a comment
> >>> > about how we're abusing basebuf, and would probably be cleaner if we
> >>> > initialized basebuf to STRBUF_INIT.
> >>>
> >>> ...because yes, I forgot about that when sending you the diff-on-top,
> >>> sorry. Yes that's buggy with the diff-on-top I sent you.
> >>
> >> That bug didn't come from the diff-on-top you sent me, it came from
> >> the commit already merged to master -- ce93a4c6127  (dir.[ch]: replace
> >> dir_init() with DIR_INIT, 2021-07-01), merged as part of
> >> ab/struct-init on Jul 16.
> >
> > Ah, I misunderstood you there. I'll look at that / fix it. Sorry.
>
> Just to tie up this loose end: Yes, this control flow sucks, and I've got
> some patches to unpack-trees.[ch] & dir.[ch] I'm about to submit to fix
> it. But just to comment on the existing behavior of the code, i.e. your
> (above):
>
>     "So either that function can never have dir->basebuf.buf be NULL and
>     the strbuf_init() is dead code, or else it's possible for us to
>     trigger a segfault.".
>
> I hadn't had time to look into it when I said I'd fix it, but now that I
> have I found that there's nothing to fix, and this code wasn't buggy
> either before or after my ce93a4c6127 (dir.[ch]: replace dir_init() with
> DIR_INIT, 2021-07-01). I.e. we do have the invariant you mentioned.
>
> The dir.[ch] API has always relied on the "struct dir_struct" being
> zero'd out. First with memset() before your eceba532141 (dir: fix
> problematic API to avoid memory leaks, 2020-08-18), and after my
> ce93a4c6127 with the DIR_INIT, which both amount to the same thing.
>
> We both missed a caller that used neither dir_init() nor uses DIR_INIT
> now, but it uses "{ 0 }", so it's always zero'd.
>
> Now, of course it being zero'd *would* segfault if you feed
> "dir->basebuf.buf" to strncmp() as you note above, but that code isn't
> reachable. The structure of that function is (pseudocode):
>
> void prep_exclude(...)
> {
>         struct exclude_stack *stk = NULL;
>         [...]
>
>         while ((stk = dir->exclude_stack) != NULL)
>                 /* the strncmp() against "dir->basebuf.buf" is here */
>
>         /* maybe we'll early return here */
>
>         if (!dir->basebuf.buf)
>                 strbuf_init(&dir->basebuf, PATH_MAX);
>
>         /*
>          * Code that sets dir->exclude_stack to non-NULL for the first
>          * time follows...
>          */
> }
>
> I.e. dir->exclude_stack is *only* referenced in this function and
> dir_clear() (where we also check it for NULL first).
>
> It's state management between calls to prep_exclude(), so that
> initial while-loop can only be entered on the second or a later call
> to prep_exclude().
>
> We'll then either have reached that strbuf_init() already, or if we took
> an early return before the strbuf_init() we couldn't have set
> dir->exclude_stack either. So that "dir->basebuf.buf" dereference is
> safe in either case.

Thanks for digging into this.  I wonder if dir_struct could use some
separation of putting things inside an embedded internal struct as
well, similar to our discussions with unpack_trees_options.
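
The invariant described in the quoted analysis can be modeled in a few
lines.  This is a simplified sketch, not the real prep_exclude():
`struct state`, `struct frame`, `prep()` and `state_clear()` are
hypothetical stand-ins, and the buffer handling is deliberately naive.
It only shows why the read of the lazily allocated buffer is safe: the
loop that reads it runs only when the stack is non-NULL, and the stack
becomes non-NULL only after the buffer was allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for git's structures. */
struct frame {
	struct frame *prev;
	size_t baselen;
};

struct state {
	struct frame *stack; /* NULL until the first successful push */
	char *buf;           /* NULL until lazily allocated */
};

#define STATE_INIT { 0 }

/*
 * Returns 1 if the loop read `buf`, 0 if it did not, -1 on OOM.
 * `early_return` models the "maybe we'll early return here" branch.
 */
static int prep(struct state *s, const char *base, int early_return)
{
	int read_buf = 0;
	struct frame *f;

	while ((f = s->stack) != NULL) {
		/* A non-NULL stack implies buf was allocated earlier. */
		assert(s->buf != NULL);
		read_buf = 1;
		if (!strncmp(s->buf, base, f->baselen))
			break;
		s->stack = f->prev;
		free(f);
	}

	if (early_return)
		return read_buf; /* neither buf nor stack is touched */

	if (!s->buf) {
		s->buf = calloc(1, 64);
		if (!s->buf)
			return -1;
	}

	/* A frame is pushed only after buf is known to exist. */
	f = malloc(sizeof(*f));
	if (!f)
		return -1;
	strncpy(s->buf, base, 63); /* byte 63 stays NUL from calloc() */
	f->baselen = strlen(s->buf);
	f->prev = s->stack;
	s->stack = f;
	return read_buf;
}

/* Free all frames and the buffer. */
static void state_clear(struct state *s)
{
	while (s->stack) {
		struct frame *prev = s->stack->prev;

		free(s->stack);
		s->stack = prev;
	}
	free(s->buf);
	s->buf = NULL;
}
```

An early return on the very first call leaves both members NULL, so the
next call still skips the loop; that is the "either case" argument made
in the quoted message.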


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-04 13:45                         ` Elijah Newren
@ 2021-10-04 14:07                           ` Ævar Arnfjörð Bjarmason
  2021-10-04 14:57                             ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04 14:07 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood


On Mon, Oct 04 2021, Elijah Newren wrote:

> On Sat, Oct 2, 2021 at 2:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>> On Fri, Oct 01 2021, Elijah Newren wrote:
>>
> ...
>> > So maybe I'll submit some patches on top that rip these direct members
>> > out of unpack_trees_options and push them inside some opaque
>> > struct.
>>
>> Sure, that sounds good. I only had a mild objection to doing it in a way
>> where you'll need that sort of code I removed in the linked commit in
>> prep_exclude() because you were trying not to expose that at any cost,
>> including via some *_INIT macro. I.e. if it's private we can just name
>> it "priv_*" or have a :
>>
>>     struct dont_touch_this {
>>         struct dir_struct dir;
>>     };
>>
>> Which are both ways of /messaging/ that it's private, and since the
>> target audience is just the rest of the git.git codebase I think that
>> ultimately something that 1) sends the right message 2) makes accidents
>> pretty much impossible suffices. I.e. you don't accidentally introduce a
>> new API user accessing a field called "->priv_*" or
>> "->private_*". Someone will review those patches...
>
> An internal struct with all the members meant to be internal-only
> provides nearly all the advantages that I was going for with the
> opaque struct, while also being a smaller change than what I was
> thinking of doing.  I like that idea; thanks for the suggestion.

Yeah, just to provide an explicit example something like the below. It
compiles to the same assembly (at least under -O3, didn't exhaustively
try other optimization levels).

I'm rather "meh" on it vs. just prefixing the relevant member names
with "priv_" or "private_", but it results in the same semantics &
machine code, so it's effectively just a way of doing the labeling for
human consumption.

diff --git a/dir.c b/dir.c
index 39fce3bcba7..a714640e782 100644
--- a/dir.c
+++ b/dir.c
@@ -1533,12 +1533,12 @@ static void prep_exclude(struct dir_struct *dir,
 	 * which originate from directories not in the prefix of the
 	 * path being checked.
 	 */
-	while ((stk = dir->exclude_stack) != NULL) {
+	while ((stk = dir->private.exclude_stack) != NULL) {
 		if (stk->baselen <= baselen &&
 		    !strncmp(dir->basebuf.buf, base, stk->baselen))
 			break;
-		pl = &group->pl[dir->exclude_stack->exclude_ix];
-		dir->exclude_stack = stk->prev;
+		pl = &group->pl[dir->private.exclude_stack->exclude_ix];
+		dir->private.exclude_stack = stk->prev;
 		dir->pattern = NULL;
 		free((char *)pl->src); /* see strbuf_detach() below */
 		clear_pattern_list(pl);
@@ -1584,7 +1584,7 @@ static void prep_exclude(struct dir_struct *dir,
 						 base + current,
 						 cp - base - current);
 		}
-		stk->prev = dir->exclude_stack;
+		stk->prev = dir->private.exclude_stack;
 		stk->baselen = cp - base;
 		stk->exclude_ix = group->nr;
 		stk->ucd = untracked;
@@ -1605,7 +1605,7 @@ static void prep_exclude(struct dir_struct *dir,
 			    dir->pattern->flags & PATTERN_FLAG_NEGATIVE)
 				dir->pattern = NULL;
 			if (dir->pattern) {
-				dir->exclude_stack = stk;
+				dir->private.exclude_stack = stk;
 				return;
 			}
 		}
@@ -1662,7 +1662,7 @@ static void prep_exclude(struct dir_struct *dir,
 			invalidate_gitignore(dir->untracked, untracked);
 			oidcpy(&untracked->exclude_oid, &oid_stat.oid);
 		}
-		dir->exclude_stack = stk;
+		dir->private.exclude_stack = stk;
 		current = stk->baselen;
 	}
 	strbuf_setlen(&dir->basebuf, baselen);
@@ -3302,7 +3302,7 @@ void dir_clear(struct dir_struct *dir)
 	free(dir->ignored);
 	free(dir->entries);
 
-	stk = dir->exclude_stack;
+	stk = dir->private.exclude_stack;
 	while (stk) {
 		struct exclude_stack *prev = stk->prev;
 		free(stk);
diff --git a/dir.h b/dir.h
index 83f46c0fb4c..d30d294308d 100644
--- a/dir.h
+++ b/dir.h
@@ -209,6 +209,11 @@ struct untracked_cache {
  * record the paths discovered. A single `struct dir_struct` is used regardless
  * of whether or not the traversal recursively descends into subdirectories.
  */
+
+struct dir_struct_private {
+	struct exclude_stack *exclude_stack;
+};
+
 struct dir_struct {
 
 	/* The number of members in `entries[]` array. */
@@ -327,7 +332,7 @@ struct dir_struct {
 	 * (sub)directory in the traversal. Exclude points to the
 	 * matching exclude struct if the directory is excluded.
 	 */
-	struct exclude_stack *exclude_stack;
+	struct dir_struct_private private;
 	struct path_pattern *pattern;
 	struct strbuf basebuf;
 


* Re: [RFC PATCH v4 00/10] Fix various issues around removal of untracked files/directories
  2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
                         ` (9 preceding siblings ...)
  2021-10-04  1:11       ` [RFC PATCH v4 10/10] Documentation: call out commands that nuke untracked files/directories Ævar Arnfjörð Bjarmason
@ 2021-10-04 14:38       ` Elijah Newren
  2021-10-04 16:08         ` Ævar Arnfjörð Bjarmason
  2021-10-04 18:17         ` Junio C Hamano
  10 siblings, 2 replies; 82+ messages in thread
From: Elijah Newren @ 2021-10-04 14:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood

On Sun, Oct 3, 2021 at 6:12 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> This is an RFC proposed v4 of Elijah's en/removing-untracked-fixes
> series[1] based on top of my memory leak fixes in the "unpack-trees" &
> "dir" APIs[2].
>
> As noted in [2] Elijah and I have been having a back & forth about the
> approach his series takes to fixing memory leaks in those APIs. I
> think submitting working code is more productive than continuing that
> point-by-point discussion, so here we are.
>
> I've avoided making any changes to this series except those narrowly
> required to rebase it on top of mine, and to those parts of Elijah's
> commit messages that became outdated as a result. In particular
> 3/10[3] is significantly changed, as much of its commit message
> discusses complexities that have gone away due to my preceding
> series[2].
>
> The "make dir an internal-only struct" has been replaced by a commit
> that renames that struct member from "dir" to "private_dir". I think
> even that is unnecessary as argued in [4], but I think the judgement
> that something must be done to address that is Elijah's design
> decision, so I did my best to retain it.
>
> I did drop the dynamic allocation & it being a pointer, since with my
> preceding [2] and subsequent unsubmitted memory leak fixes I've got on
> top having it be embedded in "struct unpack_trees_options" makes
> things easier to manage.
>
> Having read through all this code quite thoroughly at this point I do
> have other comments on it, but I'll reserve those until we've found
> out what direction we're going forward with vis-a-vis what this will
> be based on top of.
>
> I'm (obviously) hoping for an answer of either on top of my series[2],
> or alternatively that Elijah's series can stick to introducing the
> "preserve_ignored" flag, but not change how the memory
> management/name/type of the embedded "dir" happens (and we could thus
> proceed in parallel).

???

This really bothers me.  I'm not quite sure how to put this into
words, so let me just try my best.  Let me start out by saying that I
think you often provide good feedback and ideas.  Sure, I sometimes
don't agree with some of the feedback or ideas, but overall your
feedback and contributions are definitely valuable.  I also think your
other series you rebased this on has some good ideas and some good
bugfixes.  There is something that seems off here, though.

In this particular case, to start with, Junio already said let's take
v3 as-is[1].  So your series should be rebased on mine, not
vice-versa.

Further, while your other series that you are basing this on has some
memory leak fixes, to me it mostly looks like refactorings for
stylistic code changes.  Even though some of those stylistic changes
are good, making a series such as mine, which includes bugfixes (to a
user-reported bug, no less) and which most reviewers are fine with
after multiple rounds, suddenly depend on a new, big, and unrelated
treewide stylistic refactoring series feels very off to me.  But that
doesn't quite fully explain my misgivings either; there's a bit more:

  * Junio has referred to several of your series as "Meh" and "code
    churn".  That makes me think we'd have a higher than normal chance of
    a user-reported bug ending up blocked on unrelated stylistic changes.
    (Two of them actually, since I have another series depending on this
    one that I've waited to submit until this merges to next.)
  * Your stylistic refactorings also manage to confuse the code in
    merge-recursive.c, overall making the code potentially much harder to
    understand[2][3].  And you open a foot-gun in
    clear_unpack_trees_porcelain[3].
  * At least half the series of yours I've reviewed have had
    significant bugs[4][5][6] (in addition to [2] and [3]).  This would be
    fine if it was complex code that had bugs we were fixing, or if we
    were adding new features, but:
  * You submit a huge volume of patches, with a very
    disproportionately high ratio of stylistic refactorings rather than
    bugfixes and new features.  (This is by no means bad on its own, it's
    the combination of this with other factors.)
  * You misrepresent my changes in multiple ways, including ways I had
    pointed out corrections for in our previous discussions (including
    some of which you acknowledged and agreed with), and you do so even
    after you have rebased my patches and added your signed-off-by to them
    suggesting you ought to be familiar with them[7].

So, I guess trying to distill what bugs me, I'd say: it seems to me
that you have ignored what Junio said about taking my series, and then
you rebased my series on top of unrelated stylistic churn, with that
churn containing three issues that trigger ongoing misgivings I have
about the care being put behind these refactorings, especially
considering their value compared to the features and bugfixes we are
getting, and you seem to fail to try to understand my changes and
misrepresent them in the process.  I hope I'm not overreacting but
something feels wrong to me here.


A big thumbs down on this reroll.

[1] https://lore.kernel.org/git/xmqq35pk8ylz.fsf@gitster.g/
[2] https://lore.kernel.org/git/CABPp-BFYxWXZQXvDSrM1Y1ZaQ1d2TENQDvx1cyawvrDO1OvExA@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BH4ubjJ98Nvgp2iyKxmU9X+ypw4m1o=iL9Z4vSNZ-QTDw@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BGE+e1er6qFuG90j9+eVG34O8TN=imX=jtcb+jbQjN1QQ@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BEPkukGz32rrro1hzMvSQzX4v7U17CAcV-G2NS6v0u55g@mail.gmail.com/
[6] https://lore.kernel.org/git/xmqqfstppxzm.fsf@gitster.g/  [Note:
problem was flagged by j6t; I was about to flag the same problem when
I noticed he had already done so.]
[7] https://lore.kernel.org/git/CABPp-BEr28xzbpEZc5dq-RVDupXy+h-+PH6CoANF4e0kmxqf0Q@mail.gmail.com/


* Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
  2021-10-04 14:07                           ` Ævar Arnfjörð Bjarmason
@ 2021-10-04 14:57                             ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-10-04 14:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Fedor Biryukov,
	Philip Oakley, Phillip Wood

On Mon, Oct 4, 2021 at 7:12 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Mon, Oct 04 2021, Elijah Newren wrote:
>
> > On Sat, Oct 2, 2021 at 2:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >>
> >> On Fri, Oct 01 2021, Elijah Newren wrote:
> >>
> > ...
> >> > So maybe I'll submit some patches on top that rip these direct members
> >> > out of unpack_trees_options and push them inside some opaque
> >> > struct.
> >>
> >> Sure, that sounds good. I only had a mild objection to doing it in a way
> >> where you'll need that sort of code I removed in the linked commit in
> >> prep_exclude() because you were trying not to expose that at any cost,
> >> including via some *_INIT macro. I.e. if it's private we can just name
> >> it "priv_*" or have a :
> >>
> >>     struct dont_touch_this {
> >>         struct dir_struct dir;
> >>     };
> >>
> >> Which are both ways of /messaging/ that it's private, and since the
> >> target audience is just the rest of the git.git codebase I think that
> >> ultimately something that 1) sends the right message 2) makes accidents
> >> pretty much impossible suffices. I.e. you don't accidentally introduce a
> >> new API user accessing a field called "->priv_*" or
> >> "->private_*". Someone will review those patches...
> >
> > An internal struct with all the members meant to be internal-only
> > provides nearly all the advantages that I was going for with the
> > opaque struct, while also being a smaller change than what I was
> > thinking of doing.  I like that idea; thanks for the suggestion.
>
> Yeah, just to provide an explicit example something like the below. It
> compiles to the same assembly (at least under -O3, didn't exhaustively
> try other optimization levels).
>
> I'm rather "meh" on it vs. just prefixing the relevant member names
> with "priv_" or "private_", but it results in the same semantics &
> machine code, so it's effectively just a way of doing the labeling for
> human consumption.
>
> diff --git a/dir.c b/dir.c
> index 39fce3bcba7..a714640e782 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1533,12 +1533,12 @@ static void prep_exclude(struct dir_struct *dir,
>          * which originate from directories not in the prefix of the
>          * path being checked.
>          */
> -       while ((stk = dir->exclude_stack) != NULL) {
> +       while ((stk = dir->private.exclude_stack) != NULL) {
>                 if (stk->baselen <= baselen &&
>                     !strncmp(dir->basebuf.buf, base, stk->baselen))
>                         break;
> -               pl = &group->pl[dir->exclude_stack->exclude_ix];
> -               dir->exclude_stack = stk->prev;
> +               pl = &group->pl[dir->private.exclude_stack->exclude_ix];
> +               dir->private.exclude_stack = stk->prev;
>                 dir->pattern = NULL;
>                 free((char *)pl->src); /* see strbuf_detach() below */
>                 clear_pattern_list(pl);
> @@ -1584,7 +1584,7 @@ static void prep_exclude(struct dir_struct *dir,
>                                                  base + current,
>                                                  cp - base - current);
>                 }
> -               stk->prev = dir->exclude_stack;
> +               stk->prev = dir->private.exclude_stack;
>                 stk->baselen = cp - base;
>                 stk->exclude_ix = group->nr;
>                 stk->ucd = untracked;
> @@ -1605,7 +1605,7 @@ static void prep_exclude(struct dir_struct *dir,
>                             dir->pattern->flags & PATTERN_FLAG_NEGATIVE)
>                                 dir->pattern = NULL;
>                         if (dir->pattern) {
> -                               dir->exclude_stack = stk;
> +                               dir->private.exclude_stack = stk;
>                                 return;
>                         }
>                 }
> @@ -1662,7 +1662,7 @@ static void prep_exclude(struct dir_struct *dir,
>                         invalidate_gitignore(dir->untracked, untracked);
>                         oidcpy(&untracked->exclude_oid, &oid_stat.oid);
>                 }
> -               dir->exclude_stack = stk;
> +               dir->private.exclude_stack = stk;
>                 current = stk->baselen;
>         }
>         strbuf_setlen(&dir->basebuf, baselen);
> @@ -3302,7 +3302,7 @@ void dir_clear(struct dir_struct *dir)
>         free(dir->ignored);
>         free(dir->entries);
>
> -       stk = dir->exclude_stack;
> +       stk = dir->private.exclude_stack;
>         while (stk) {
>                 struct exclude_stack *prev = stk->prev;
>                 free(stk);
> diff --git a/dir.h b/dir.h
> index 83f46c0fb4c..d30d294308d 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -209,6 +209,11 @@ struct untracked_cache {
>   * record the paths discovered. A single `struct dir_struct` is used regardless
>   * of whether or not the traversal recursively descends into subdirectories.
>   */
> +
> +struct dir_struct_private {
> +       struct exclude_stack *exclude_stack;
> +};
> +
>  struct dir_struct {
>
>         /* The number of members in `entries[]` array. */
> @@ -327,7 +332,7 @@ struct dir_struct {
>          * (sub)directory in the traversal. Exclude points to the
>          * matching exclude struct if the directory is excluded.
>          */
> -       struct exclude_stack *exclude_stack;
> +       struct dir_struct_private private;
>         struct path_pattern *pattern;
>         struct strbuf basebuf;

Yeah, that doesn't help much at all, and I'd argue even makes things
worse, because you're just looking at a single member.  This subtly
implies that all the other private variables are public API.  The
dir.h portion of the patch should look more like this:

$ git diff -w dir.h
diff --git a/dir.h b/dir.h
index 83f46c0fb4..93a9f02688 100644
--- a/dir.h
+++ b/dir.h
@@ -214,14 +214,9 @@ struct dir_struct {
        /* The number of members in `entries[]` array. */
        int nr;

-       /* Internal use; keeps track of allocation of `entries[]` array.*/
-       int alloc;
-
        /* The number of members in `ignored[]` array. */
        int ignored_nr;

-       int ignored_alloc;
-
        /* bit-field of options */
        enum {

@@ -301,11 +296,19 @@ struct dir_struct {
         */
        const char *exclude_per_dir;

+       struct dir_struct_internal {
+               /* Keeps track of allocation of `entries[]` array.*/
+               int alloc;
+
+               /* Keeps track of allocation of `ignored[]` array. */
+               int ignored_alloc;
+
                /*
                 * We maintain three groups of exclude pattern lists:
                 *
                 * EXC_CMDL lists patterns explicitly given on the command line.
-        * EXC_DIRS lists patterns obtained from per-directory ignore files.
+                * EXC_DIRS lists patterns obtained from per-directory ignore
+                *          files.
                 * EXC_FILE lists patterns from fallback ignore files, e.g.
                 *   - .git/info/exclude
                 *   - core.excludesfile
@@ -340,6 +343,7 @@ struct dir_struct {
                /* Stats about the traversal */
                unsigned visited_paths;
                unsigned visited_directories;
+       } internal;
 };

 #define DIR_INIT { 0 }


The above change would make it clear that there are 12 variables meant
for use only within dir.c that external callers should not be
initializing or reading for output after the fact -- and only 6 that
are part of the public API that they need worry about.  It also makes
it easier for folks messing with dir.c to know which parts are just
internal state management, which I think would have made it easier to
understand the weird basebuf/exclude_stack stuff in prep_exclude()
that you nicely tracked down.  But overall, I'm really most happy
about the part of this patch that lets external callers realize they
only need to worry about 6 out of 18 fields and that they can ignore
the rest.

unpack_trees_options should have something similar done with it, and
maybe some others.
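
As a minimal sketch of the grouping idea described above (every `demo_*`
name here is hypothetical and is not part of git's dir.h), internal
bookkeeping can be nested in a sub-struct so that the handful of public
fields stand apart from state that only the implementation touches:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-simplified stand-in for a struct like dir_struct. */
struct demo_dir {
	/* Public API: callers may read these after a traversal. */
	int nr;
	const char **entries;

	/* Internal state: only the implementation file should touch this. */
	struct demo_dir_internal {
		int alloc;              /* allocation size of entries[] */
		unsigned visited_paths; /* traversal statistics */
	} internal;
};

/* Zero-initializes every member, including the nested struct. */
#define DEMO_DIR_INIT { 0 }

static void demo_dir_add(struct demo_dir *dir, const char *path)
{
	if (dir->nr == dir->internal.alloc) {
		int new_alloc = dir->internal.alloc ? 2 * dir->internal.alloc : 4;
		const char **grown = realloc(dir->entries,
					     new_alloc * sizeof(*dir->entries));
		if (!grown)
			abort();
		dir->entries = grown;
		dir->internal.alloc = new_alloc;
	}
	/* Invariant: there is room for at least one more entry. */
	assert(dir->nr < dir->internal.alloc);
	dir->entries[dir->nr++] = path;
	dir->internal.visited_paths++;
}

static void demo_dir_clear(struct demo_dir *dir)
{
	struct demo_dir zero = DEMO_DIR_INIT;

	free(dir->entries);
	*dir = zero;
}
```

A caller would declare `struct demo_dir d = DEMO_DIR_INIT;`, call
`demo_dir_add()` and read only `d.nr` and `d.entries`; anything spelled
`d.internal.*` signals at the call site that it is not public API.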


* Re: [RFC PATCH v4 00/10] Fix various issues around removal of untracked files/directories
  2021-10-04 14:38       ` [RFC PATCH v4 00/10] Fix various issues around removal of " Elijah Newren
@ 2021-10-04 16:08         ` Ævar Arnfjörð Bjarmason
  2021-10-05  7:40           ` Elijah Newren
  2021-10-04 18:17         ` Junio C Hamano
  1 sibling, 1 reply; 82+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04 16:08 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood


On Mon, Oct 04 2021, Elijah Newren wrote:

> On Sun, Oct 3, 2021 at 6:12 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>> This is an RFC proposed v4 of Elijah's en/removing-untracked-fixes
>> series[1] based on top of my memory leak fixes in the "unpack-trees" &
>> "dir" APIs[2].
>>
>> As noted in [2] Elijah and I have been having a back & forth about the
>> approach his series takes to fixing memory leaks in those APIs. I
>> think submitting working code is more productive than continuing that
>> point-by-point discussion, so here we are.
>>
>> I've avoided making any changes to this series except those narrowly
>> required to rebase it on top of mine, and to those parts of Elijah's
>> commit messages that became outdated as a result. In particular
>> 3/10[3]'s is significantly changed, as much of its commit message
> >> discusses complexities that have gone away due to my preceding
>> series[2].
>>
>> The "make dir an internal-only struct" has been replaced by a commit
>> that renames that struct member from "dir" to "private_dir". I think
>> even that is unnecessary as argued in [4], but I think the judgement
>> that something must be done to address that is Elijah's design
>> decision, so I did my best to retain it.
>>
>> I did drop the dynamic allocation & it being a pointer, since with my
>> preceding [2] and subsequent unsubmitted memory leak fixes I've got on
>> top having it be embedded in "struct unpack_trees_options" makes
>> things easier to manage.
>>
> >> Having read through all this code quite thoroughly at this point I do
>> have other comments on it, but I'll reserve those until we've found
>> out what direction we're going forward with vis-a-vis what this will
>> be based on top of.
>>
>> I'm (obviously) hoping for an answer of either on top of my series[2],
>> or alternatively that Elijah's series can stick to introducing the
>> "preserve_ignored" flag, but not change how the memory
>> management/name/type of the embedded "dir" happens (and we could thus
>> proceed in parallel).
>
> ???
>
> This really bothers me.  I'm not quite sure how to put this into
> words, so let me just try my best.  Let me start out by saying that I
> think you often provide good feedback and ideas.  Sure, I sometimes
> don't agree with some of the feedback or ideas, but overall your
> feedback and contributions are definitely valuable.  I also think your
> other series you rebased this on has some good ideas and some good
> bugfixes.  There is something that seems off here, though.

Just for Junio / anyone else following along: let's drop this RFC & the
related/proposed "unpack-trees & dir APIs: fix memory
leaks". Point-by-point commentary below (probably not needed/interesting
for those just interested in the state of those two series).

> In this particular case, to start with, Junio already said let's take
> v3 as-is[1].  So your series should be rebased on mine, not
> vice-versa.

I understand your annoyance at that. I wouldn't have submitted this if
I'd seen that before; I somehow managed to miss that mail in my mail
queue. I believed the status was at "Will merge to 'next'?" upthread of
[1].

We've then been having an extended back & forth about how to manage
"private" data/structs/leak patterns starting at
https://lore.kernel.org/git/87ilyjviiy.fsf@evledraar.gmail.com/.

At least some of that has been confused by my having quoted a working
but out-of-context diff from what I ended up submitting as
https://lore.kernel.org/git/cover-00.10-00000000000-20211004T002226Z-avarab@gmail.com/

So, sorry about stepping on your toes. I figured having a discussion
with working patches would be more productive, and that it would help
focus on the important changes in your series.

E.g. your 2nd and 3rd patch set up a "dir_clear()" that your 4th
consolidates, which as shown in this series are intermediate steps that
can be skipped. So perhaps there's some added churn, but also reduced
churn in your resulting series on top...

> Further while your other series that you are basing this on has some
> memory leak fixes; to me, it mostly looks like refactorings for
> stylistic code changes. [...]

Are you referring to the s/memset/UNPACK_TREES_OPTIONS_INIT/ bulk change
at the start?

I agree that it's not strictly necessary, but it's pretty much the same
as your earlier eceba532141 (dir: fix problematic API to avoid memory
leaks, 2020-08-18), and makes e.g. a later change you seemed to like
possible:
https://lore.kernel.org/git/CABPp-BFpyyJ-e8p5fbmCvyaEsfUow=RP45Nw0ckiwNEvVC4zrg@mail.gmail.com/

> Even though some of those stylistic changes
> are good, making a series such as mine that includes bugfixes (to a
> user reported bug no less), after multiple rounds and most reviewers
> are fine with it, suddenly depend on a new big and unrelated treewide
> stylistic refactoring series feels very off to me.  But that doesn't
> quite fully explain my misgivings either; there's a bit more:

[...]

>   * Junio has referred to several of your series as "Meh" and "code
> churn".  That makes me think we'd have a higher than normal chance of
> a user-reported bug ending up blocked on unrelated stylistic changes.
> (Two of them actually, since I have another series depending on this
> one that I've waited to submit until this merges to next.)

I'll stay out of this area for a while. Sorry about that.

> [...]
>   * You misrepresent my changes in multiple ways, including ways I had
> pointed out corrections for in our previous discussions (including
> some of which you acknowledged and agreed with), and you do so even
> after you have rebased my patches and added your signed-off-by to them
> suggesting you ought to be familiar with them[7].

You're absolutely right about that, and the comment starting at "*Sigh*"
in your linked [7] is entirely accurate. I'd like to apologize for that.

If I was in your shoes I'd be *very* annoyed at that chain from [7] to
the upthread cover-letter here making that false claim again.

For what it's worth I do know that your patches aren't allocating the
"struct dir_struct" on the heap, having rebased them etc. But through
some combination of a brainfart and it being late when I wrote that CL
last night I falsely claimed that they did, PBCAK.

What I *meant* to say was some summary that your series's end state
has a pointer to a dir struct that's dynamically set up, vs. mine, which
just initializes it via the macro from the start.
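
To illustrate the distinction being drawn, here is a rough sketch of the
two shapes. All `demo_*` names are hypothetical, and neither variant is
taken from either series: variant A keeps a pointer that is pointed at a
local struct only for the duration of a call (no heap allocation), while
variant B embeds the struct and sets it up via an initializer macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a helper struct such as a dir struct. */
struct demo_dir {
	int flags;
};
#define DEMO_DIR_INIT { 0 }

/* Variant A: a pointer that is set up dynamically, per call. */
struct demo_opts_ptr {
	struct demo_dir *dir; /* NULL outside of demo_run_ptr() */
};

static int demo_run_ptr(struct demo_opts_ptr *o)
{
	struct demo_dir dir = DEMO_DIR_INIT; /* lives on this stack frame */
	int result;

	assert(!o->dir);   /* callers are not supposed to set this */
	o->dir = &dir;     /* set up only for the duration of the call */
	o->dir->flags |= 1;
	result = o->dir->flags;
	o->dir = NULL;     /* the pointer must not outlive `dir` */
	return result;
}

/* Variant B: the struct is embedded and initialized from the start. */
struct demo_opts_embedded {
	struct demo_dir dir;
};
#define DEMO_OPTS_EMBEDDED_INIT { .dir = DEMO_DIR_INIT }

static int demo_run_embedded(struct demo_opts_embedded *o)
{
	o->dir.flags |= 1;
	return o->dir.flags;
}
```

In variant A nothing is allocated or freed, but the pointer has to be
reset before returning; in variant B the member is always valid, at the
cost of being visible to every caller that declares the options struct.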

> So, I guess trying to distill what bugs me, I'd say: it seems to me
> that you have ignored what Junio said about taking my series, and then
> you rebased my series on top of unrelated stylistic churn, with that
> churn containing three issues that trigger ongoing misgivings I have
> about the care being put behind these refactorings, especially
> considering their value compared to the features and bugfixes we are
> getting, and you seem to fail to try to understand my changes and
> misrepresent them in the process.  I hope I'm not overreacting but
> something feels wrong to me here.

I don't think you're overreacting, and sorry again. Hopefully it helps
somewhat that, for the "ignoring Junio [and charging ahead with this]"
part and the 2nd false claim about heap allocation, I was (believe it or
not) just honestly mistaken instead of trying to get on your nerves.

As some of my early feedback on the whole topic of gitignore-related
shredding/precious etc. should hopefully indicate, I'm really
happy that you've picked up this topic.

I thought this would help it along, but that's clearly not the
case. Sorry again.


* Re: [RFC PATCH v4 00/10] Fix various issues around removal of untracked files/directories
  2021-10-04 14:38       ` [RFC PATCH v4 00/10] Fix various issues around removal of " Elijah Newren
  2021-10-04 16:08         ` Ævar Arnfjörð Bjarmason
@ 2021-10-04 18:17         ` Junio C Hamano
  1 sibling, 0 replies; 82+ messages in thread
From: Junio C Hamano @ 2021-10-04 18:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood

Elijah Newren <newren@gmail.com> writes:

> So, I guess trying to distill what bugs me, I'd say: it seems to me
> that you have ignored what Junio said about taking my series, and then
> you rebased my series on top of unrelated stylistic churn, with that
> churn containing three issues that trigger ongoing misgivings I have
> about the care being put behind these refactorings, especially
> considering their value compared to the features and bugfixes we are
> getting, and you seem to fail to try to understand my changes and
> misrepresent them in the process.  I hope I'm not overreacting but
> something feels wrong to me here.

Just one thing.  It is fairly easy to rebase and merge and generally
muck with somebody else's changes without fully understanding it.  I
do that all the time ;-)

I would prefer to see us assume misunderstanding by incompetence,
rather than misrepresentation by malice to further one's own agenda.


* Re: [RFC PATCH v4 00/10] Fix various issues around removal of untracked files/directories
  2021-10-04 16:08         ` Ævar Arnfjörð Bjarmason
@ 2021-10-05  7:40           ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-10-05  7:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Martin Ågren,
	Andrzej Hunt, Jeff King, Fedor Biryukov, Philip Oakley,
	Phillip Wood

Hi Ævar,

On Mon, Oct 4, 2021 at 10:56 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Mon, Oct 04 2021, Elijah Newren wrote:
>
> > So, I guess trying to distill what bugs me, I'd say: [...] I hope I'm not overreacting but
> > something feels wrong to me here.
>
> I don't think you're overreacting, and sorry again. Hopefully it helps
> somewhat that, for the "ignoring Junio [and charging ahead with this]"
> part and the 2nd false claim about heap allocation, I was (believe it or
> not) just honestly mistaken instead of trying to get on your nerves.

I really wish I could take back that email.  And yes, I was totally
overreacting (in part due to unrelated non-git stuff going on
recently).  I should have waited a day, and then I'd probably have
realized it.  I owe you an apology, Ævar.  I'm very sorry.

In regards to the worst part of my email:

"""
  * At least half the series of yours I've reviewed have had
significant bugs[4][5][6] (in addition to [2] and [3]).  This would be
fine if it was complex code that had bugs we were fixing, or if we
were adding new features, but:
"""

I'm no stranger to introducing pretty bad bugs either, some caught in
review (one case of repository corruption caught in a review just last
week!), some making it into releases:

https://lore.kernel.org/git/YVOn3hDsb5pnxR53@coredump.intra.peff.net/
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
https://lore.kernel.org/git/CABPp-BFWfwkYAPyySjWOMZ02_+YLf=TJ_aVMaHaizJWAsCL67g@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BENB=mqfFU4FGb2OS9VDV=9VdT71HhFLZwtyxD8MpdTMQ@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BEBKyE2NVfREov6k5qML5jryLjtzw=Y21EA=fHXA0PO5A@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BF8eokQTVwgo80ffq3tn=NA=mPf7oymce5P4itDyZBiMg@mail.gmail.com/
https://lore.kernel.org/git/7v1uzu5a70.fsf@alter.siamese.dyndns.org/
https://lore.kernel.org/git/CABPp-BHL4P0RxQ6OAuDSev9BXVM0uKTYD3M4JGTQvSwcBv4K0Q@mail.gmail.com/

and I could easily find more; it was pretty easy to come up with that
list of bugs that still make me shudder.  Touching difficult code, or
code one isn't quite as familiar with, is more prone to bigger
problems.  That doesn't mean we should avoid working on those areas,
but I'm afraid my email probably served only to scare people away from
doing so.  Treewide refactoring is, by its nature, likely to take one
into many areas of the codebase, and thus it'd be natural to expect
problems that hopefully get caught in review.  The fact that such
treewide refactorings did have problems was by itself not my point
and not what I'd consider unusual.  I was attempting to make a more
nuanced point about lots of treewide refactorings in a short time
period coupled with lack of understanding of motivation for some of
those refactorings all combined with additional things on top, but
utterly failed at anything more than coming across as a jerk.  I'm
sorry.

Thanks for responding to my email so diplomatically; I'm super
impressed with that.  And just to be clear, I respect your
contributions and hold them in high esteem (prove, i18n, pcre2, faster
send-email to name just a few).  I think that message was
unfortunately completely lost in my email, which is plenty of reason
that I just shouldn't have sent it.


* Re: [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default
  2021-09-27 16:33     ` [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default Elijah Newren via GitGitGadget
@ 2021-12-13 17:12       ` Jack O'Connor
  2021-12-13 20:10         ` Elijah Newren
  0 siblings, 1 reply; 82+ messages in thread
From: Jack O'Connor @ 2021-12-13 17:12 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren via GitGitGadget

> read-tree, merge-recursive: overwrite ignored files by default

When this patch shipped in v2.34, a test broke in a project of mine
(https://github.com/buildinspace/peru/blob/e9ba6e0024ea08105a8d027f958899cca39aeb9a/tests/test_cache.py#L111-L117)
that was relying on git read-tree *not* to respect .gitignore files.
(Obligatory https://xkcd.com/1172.) That peru tool is using git
plumbing commands to manage trees of files, but it tries to keep this
implementation detail internal, and behaving differently in the
presence of a .gitignore file belonging to the user would leak this
internal implementation detail. I've been trying to figure out a way
to reproduce the Git 2.33 behavior in Git 2.34, but so far I haven't
found any flags or configs to do that. (For example, putting !* in
.git/info/exclude doesn't seem to help, I think because a .gitignore
file in the working tree takes precedence.) Can anyone suggest another
workaround?

This is my first mail to this list, so please let me know if I mess up
the etiquette.

- Jack


* Re: [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default
  2021-12-13 17:12       ` Jack O'Connor
@ 2021-12-13 20:10         ` Elijah Newren
  0 siblings, 0 replies; 82+ messages in thread
From: Elijah Newren @ 2021-12-13 20:10 UTC (permalink / raw)
  To: Jack O'Connor; +Cc: git, Elijah Newren via GitGitGadget

On Mon, Dec 13, 2021 at 9:12 AM Jack O'Connor <oconnor663@gmail.com> wrote:
>
> > read-tree, merge-recursive: overwrite ignored files by default
>
> When this patch shipped in v2.34, a test broke in a project of mine
> (https://github.com/buildinspace/peru/blob/e9ba6e0024ea08105a8d027f958899cca39aeb9a/tests/test_cache.py#L111-L117)
> that was relying on git read-tree *not* to respect .gitignore files.
> (Obligatory https://xkcd.com/1172.) That peru tool is using git
> plumbing commands to manage trees of files, but it tries to keep this
> implementation detail internal, and behaving differently in the
> presence of a .gitignore file belonging to the user would leak this
> internal implementation detail. I've been trying to figure out a way
> to reproduce the Git 2.33 behavior in Git 2.34, but so far I haven't
> found any flags or configs to do that. (For example, putting !* in
> .git/info/exclude doesn't seem to help, I think because a .gitignore
> file in the working tree takes precedence.) Can anyone suggest another
> workaround?
>
> This is my first mail to this list, so please let me know if I mess up
> the etiquette.

Your email is fine.  :-)  Interesting use case; thanks for sending it along.

Digging a bit into your repository, it appears this all started
because you noticed that checkout would overwrite ignored files, and
so you switched to reset --keep (in your 637d5c042262 (make cache
export refuse to pave .gitgnored files, 2014-07-22)) and then to
read-tree (in your 057d1af600f9 (Rewrite `export_tree` to allow
deleted files., 2014-08-05)) to avoid having ignored files be
overwritten. You could have stuck with `git checkout` all along, and
just passed it the --no-overwrite-ignore flag.  You probably just
missed the existence of that flag, because Duy forgot to document it
for 8 years (see git.git's commit 9d223d43e5 ("doc: document
--overwrite-ignore", 2019-03-29)).  Going back to checkout might
provide you a workaround.  (Also, another random thing I noticed while
looking at your repo: `--diff-filter=d` is a much better way of
checking for not-deleted-changes than using `--diff-filter=ACMRTUXB`.
Note the lowercase 'd' rather than uppercase.)
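
As a rough model of why the two filter strings select the same paths
(this is a simplified sketch with hypothetical `demo_*` names, not git's
actual option parsing, and it ignores any special handling git gives
individual letters): uppercase letters select a status, lowercase
letters exclude it, and a filter made only of exclusions starts from
"everything".

```c
#include <assert.h>
#include <ctype.h>

/* The status letters that appear in the two filters being compared. */
static const char demo_statuses[] = "ACDMRTUXB";

/*
 * Return 1 if `status` passes `filter` in this simplified model:
 * a lowercase letter excludes its status, an uppercase letter selects
 * it, and a filter with no uppercase letters selects everything that
 * was not excluded.
 */
static int demo_filter_matches(const char *filter, char status)
{
	int has_include = 0, included = 0;
	const char *p;

	assert(filter);
	for (p = filter; *p; p++) {
		if (islower((unsigned char)*p)) {
			if (toupper((unsigned char)*p) == status)
				return 0; /* explicitly excluded */
		} else {
			has_include = 1;
			if (*p == status)
				included = 1;
		}
	}
	return has_include ? included : 1;
}

/* Return 1 if both filters accept exactly the same status letters. */
static int demo_filters_equivalent(const char *a, const char *b)
{
	const char *s;

	for (s = demo_statuses; *s; s++)
		if (demo_filter_matches(a, *s) != demo_filter_matches(b, *s))
			return 0;
	return 1;
}
```

Under this model, `demo_filters_equivalent("d", "ACMRTUXB")` holds,
while the short form states the intent ("everything except deletions")
directly instead of enumerating every other letter.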

Your report suggests more places should accept the
--no-overwrite-ignore flag, which I alluded to as a possibility in the
sixth patch in the series ("Remove ignored files by default when they
are in the way"[1]) and the comments in the cover letter about
precious ignored files (under "SIDENOTE about treating ignored files
as precious"[2]).  And perhaps we could have a core.overwriteIgnore
config option for setting a different global default (also as alluded
to in my cover letter).  Doing these things would provide additional
workarounds, and finally provide the "precious ignored" concept that
has been discussed occasionally.  I think it's not too hard to do that
on top of my previous patch series.  I'll try to take a look after
some other in-flight series finally land.


[1] https://lore.kernel.org/git/b7fe354effff8da3de53bd9cc40a03b5fd455f67.1632760428.git.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.1036.v3.git.1632760428.gitgitgadget@gmail.com/


