git.vger.kernel.org archive mirror
* [PATCH 00/10] Sparse-index: integrate with status and add
@ 2021-04-13 14:01 Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' and 'git add' very fast when a sparse-index is enabled on a
repository with cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * git add -A
 * git add . (and other pathspecs)
 * FS Monitor index extension

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee

Derrick Stolee (10):
  t1092: add tests for status/add and sparse files
  unpack-trees: make sparse aware
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  dir: use expand_to_path() for sparse directories
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  t7519: add sparse directories to FS monitor tests
  fsmonitor: test with sparse index

 builtin/add.c                            |  3 +
 builtin/commit.c                         |  3 +
 dir.c                                    |  5 ++
 dir.h                                    |  2 +-
 pathspec.c                               |  2 -
 preload-index.c                          |  2 +
 read-cache.c                             |  5 +-
 t/t1092-sparse-checkout-compatibility.sh | 73 +++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              | 65 +++++++++++++++++++++
 unpack-trees.c                           | 24 +++++++-
 wt-status.c                              | 14 ++++-
 wt-status.h                              |  1 +
 12 files changed, 186 insertions(+), 13 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/932
-- 
gitgitgadget


* [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 21:52   ` Elijah Newren
  2021-04-21 15:14   ` Matheus Tavares Bernardino
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file
'folder1/a' is shown as modified, but adding it fails. This is new
behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
entries, 2021-04-08). Before that change, these adds would be silently
ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..6598c12a2069 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails in the sparse-checkout
+	# repos because folder1/a has SKIP_WORKTREE set. Restore
+	# the file in the full repo so that all repos have the
+	# same staged environment afterward.
+	test_must_fail git -C sparse-checkout add folder1/a &&
+	test_must_fail git -C sparse-index add folder1/a &&
+	git -C full-checkout checkout HEAD -- folder1/a &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:00   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As a first step to integrate 'git status' and 'git add' with the sparse
index, we must start integrating unpack_trees() with sparse directory
entries. These changes are currently impossible to trigger because
unpack_trees() calls ensure_full_index() if command_requires_full_index
is true. This is the case for all commands at the moment. As we expand
more commands to be sparse-aware, we might find that more changes are
required to unpack_trees(). The current changes will suffice for
'status' and 'add'.

unpack_trees() calls the traverse_trees() API using unpack_callback()
to decide if we should recurse into a subtree. We must add new abilities
to skip a subtree if it corresponds to a sparse directory entry.
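As a rough illustration of the recursion change, here is a toy tree walk
in C. The toy_tree type and its is_sparse_dir flag are hypothetical
stand-ins, not Git's struct tree or struct name_entry; the only point is
that a node corresponding to a sparse directory entry is visited, but
its subtree is not, which is the effect of the new 'recurse' flag in
unpack_callback().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tree node; not Git's struct tree or name_entry. */
struct toy_tree {
	const char *name;
	int is_sparse_dir;          /* index holds a sparse directory entry here */
	struct toy_tree *children;
	size_t nr_children;
};

/*
 * Count the nodes visited by a walk that, like the recurse flag in
 * unpack_callback(), stops descending at sparse directory entries.
 */
static size_t count_visited(const struct toy_tree *t)
{
	size_t visited = 1;         /* this node itself */
	size_t i;

	assert(t != NULL);
	if (t->is_sparse_dir)
		return visited;     /* visit the directory, skip its contents */

	for (i = 0; i < t->nr_children; i++)
		visited += count_visited(&t->children[i]);
	return visited;
}
```

For a root holding a sparse 'folder1/' and a regular 'deep/', each with
two files, the walk visits five nodes: the root, 'folder1/' itself,
'deep/', and the two files under 'deep/'.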

It is important to be careful about the trailing directory separator
that exists in the sparse directory entries but not in the subtree
paths.
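A minimal sketch of the separator issue, assuming a plain byte-wise
comparison rather than Git's df_name_compare(): a sparse directory entry
is stored as "folder1/", while the tree walk reports "folder1", so the
trailing '/' must be dropped before comparing. The helper name
compare_entry_to_tree_path() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare an index entry name with a tree path.
 * For a sparse directory entry, ignore the trailing '/' so that
 * "folder1/" lines up with the tree path "folder1".
 */
static int compare_entry_to_tree_path(const char *ce_name, int is_sparse_dir,
				      const char *tree_path)
{
	size_t ce_len = strlen(ce_name);
	size_t tree_len = strlen(tree_path);
	size_t min_len;
	int cmp;

	if (is_sparse_dir) {
		assert(ce_len > 0 && ce_name[ce_len - 1] == '/');
		ce_len--;           /* drop the trailing directory separator */
	}

	min_len = ce_len < tree_len ? ce_len : tree_len;
	cmp = memcmp(ce_name, tree_path, min_len);
	if (cmp)
		return cmp;
	/* Equal prefixes: the shorter name sorts first. */
	return (ce_len > tree_len) - (ce_len < tree_len);
}
```

Without the adjustment, "folder1/" would compare as greater than
"folder1", and the walk would fail to pair the sparse directory entry
with its subtree.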

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.h           |  2 +-
 preload-index.c |  2 ++
 read-cache.c    |  3 +++
 unpack-trees.c  | 24 ++++++++++++++++++++++--
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/dir.h b/dir.h
index 51cb0e217247..9d6666f520f3 100644
--- a/dir.h
+++ b/dir.h
@@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
 				char *seen)
 {
 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
-			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
 }
 
 static inline int dir_path_match(struct index_state *istate,
diff --git a/preload-index.c b/preload-index.c
index e5529a586366..35e67057ca9b 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
 			continue;
 		if (S_ISGITLINK(ce->ce_mode))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
 		if (ce_uptodate(ce))
 			continue;
 		if (ce_skip_worktree(ce))
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..6308234b4838 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..9a62e823928a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
@@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
+	/* remove directory separator if a sparse directory entry */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		ce_len--;
 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
 }
 
@@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/* If ce is a sparse directory, then allow equality here. */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
@@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned recurse = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 					}
 				}
 				src[0] = ce;
+
+				if (S_ISSPARSEDIR(ce->ce_mode))
+					recurse = 0;
 			}
 			break;
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (recurse &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (recurse &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget



* [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:21   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
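The string manipulation in the hunk below can be sketched in isolation.
This is a minimal, hypothetical helper (build_match_path() is not a Git
function) that uses a fixed-size buffer instead of a strbuf: it prepends
'/' like the existing code, and if the result ends with '/', it appends
the fake file name "-".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper modeled on the hunk in path_matches_pattern_list():
 * build "/<path>" and, if it ends in '/', append "-" so that a directory
 * query looks like a query for a file inside that directory.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_match_path(const char *path, char *buf, size_t bufsz)
{
	size_t len;
	int n;

	assert(path != NULL && bufsz > 0);
	n = snprintf(buf, bufsz, "/%s", path);
	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	len = (size_t)n;

	if (len > 1 && buf[len - 1] == '/') {
		if (len + 1 >= bufsz)
			return -1;  /* no room for '-' plus the terminator */
		buf[len] = '-';
		buf[len + 1] = '\0';
	}
	return 0;
}
```

A directory query for "folder1/" becomes "/folder1/-", which the
cone-mode lookups then treat as a file directly inside folder1, while
file queries such as "folder1/a" pass through unchanged.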

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..57e22e605cec 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/* Directory requests should be added as if they are a file */
+	if (parent_pathname.len > 1 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:26   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
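The output choice reduces to two negative sentinels that cannot collide
with a real percentage in the range 0..100. The sketch below reuses the
two sentinel values from the wt-status.h hunk in this patch; the
format_sparse_message() helper is hypothetical and writes into a buffer
instead of calling status_printf_ln().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sentinel values as defined in wt-status.h by this patch. */
#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/*
 * Hypothetical sketch: choose the status line for a percentage state.
 * Returns 0 if nothing should be printed, 1 if buf holds a message.
 */
static int format_sparse_message(int percentage, char *buf, size_t bufsz)
{
	assert(bufsz > 0);
	if (percentage == SPARSE_CHECKOUT_DISABLED)
		return 0;
	if (percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
		snprintf(buf, bufsz, "You are in a sparse checkout.");
	else
		snprintf(buf, bufsz,
			 "You are in a sparse checkout with %d%% of tracked files present.",
			 percentage);
	return 1;
}
```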

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not need this message to
match across both modes; instead, we compare only the important
information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 6598c12a2069..e488ef9bd941 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -196,6 +196,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH 05/10] status: use sparse-index throughout
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:44   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

By running the debugger for 'git status -uno' after that change, we find
two instances of ensure_full_index() that were added for extra safety,
but can be removed without issue.

In refresh_index(), we loop through the index entries. The
refresh_cache_ent() method copies the sparse directories into the
refreshed index without issue.

The loop within run_diff_files() skips entries that are in stage 0 and
have skip-worktree enabled, so it seems safe to disable
ensure_full_index() here.
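A minimal sketch of why that loop is safe, assuming (as in the
sparse-index format) that sparse directory entries are stage 0 and carry
the skip-worktree bit. The toy_entry type and count_diffed() helper are
hypothetical; the real loop operates on struct cache_entry and does far
more than count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened index entry; not Git's struct cache_entry. */
struct toy_entry {
	const char *name;
	int stage;              /* 0 = merged, 1..3 = conflict stages */
	int skip_worktree;      /* SKIP_WORKTREE bit */
};

/*
 * Count the entries a worktree diff would inspect. Stage-0 entries with
 * skip-worktree set are skipped, so a sparse directory entry such as
 * "folder1/" is never compared against the worktree or expanded.
 */
static size_t count_diffed(const struct toy_entry *entries, size_t nr)
{
	size_t i, diffed = 0;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (entries[i].stage == 0 && entries[i].skip_worktree)
			continue;
		diffed++;
	}
	return diffed;
}
```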

This allows some cases of 'git status' to no longer expand a sparse
index to a full one, giving the following performance improvements for
p2000-sparse-operations.sh:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading; instead, we should celebrate the improvement
from 0.37s to 0.05s, roughly 86%. This is more indicative of the
performance gains we expect from using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             |  2 --
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 6308234b4838..83e6bdef7604 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e488ef9bd941..380a085f8ec4 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status -uno &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:52   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The recently implemented expand_to_path() method can answer position
queries faster when they ask for a path within the sparse cone. Since
this is the most common scenario, this provides a significant speedup.
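The intuition can be sketched as a containment check: a query needs the
index expanded only when the path lies inside some sparse directory
entry, and paths within the sparse cone never do. The needs_expansion()
helper below is hypothetical and uses a linear scan for clarity; it is
not how Git implements expand_to_path().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if `path` lies inside one of the sparse
 * directory entries (each stored with a trailing '/'), 0 otherwise.
 * Keeping the trailing '/' in the prefix avoids false matches such as
 * "folder10" against "folder1/".
 */
static int needs_expansion(const char *const *sparse_dirs, size_t nr,
			   const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < nr; i++) {
		size_t len = strlen(sparse_dirs[i]);
		if (!strncmp(path, sparse_dirs[i], len))
			return 1;
	}
	return 0;
}
```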

Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
status' does not expand a sparse index to a full one, even when there
exist untracked files.

The performance test script p2000-sparse-operations.sh demonstrates
that this is the final hole to fill to allow 'git status' to speed up
when using a sparse index:

Test                                  HEAD~1            HEAD
------------------------------------------------------------------------------
2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 380a085f8ec4..b937d7096afd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
 	init_repos &&
 
 	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index status -uno &&
+		git -C sparse-index status &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget



* [PATCH 07/10] add: allow operating on a sparse-only index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' do not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    1.35(1.00+0.20)  1.33(0.98+0.19) -1.5%
2000.7: git add -A (full-index-v4)    1.25(0.97+0.17)  1.23(0.96+0.16) -1.6%
2000.8: git add -A (sparse-index-v3)  2.38(2.28+0.13)  0.06(0.04+0.08) -97.5%
2000.9: git add -A (sparse-index-v4)  2.39(2.25+0.18)  0.06(0.04+0.07) -97.5%

While the 97% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
1.33s to 0.06s, which is "only" a 95% improvement.
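The percentages in these tables are relative changes in wall-clock time,
(before - after) / before; the table prints them with a minus sign
because the time decreased. A small helper (hypothetical, for
checking the arithmetic only) reproduces both figures quoted above.

```c
#include <assert.h>

/* Relative improvement, in percent, of `after` over `before` (seconds). */
static double improvement_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

For the full-index comparison, 1.33s to 0.06s gives about 95.5%, and the
sparse-index row, 2.38s to 0.06s, gives about 97.5%, matching the table.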

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index 58ee3f954ef7..0572d0344065 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -526,6 +526,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index b937d7096afd..c210dba78067 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,6 +459,18 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/README.md &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add -A &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/extra.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add extra.txt &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget



* [PATCH 08/10] pathspec: stop calling ensure_full_index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:57   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() function focuses on matching a
pathspec to file entries in the index. This already works correctly for
its only use: checking whether untracked files exist in the index.
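As a simplified picture of what that function computes, the sketch
below marks seen[j] for every pathspec item that matches at least one
index entry, skipping skip-worktree entries when asked to. The
mark_matches() helper is hypothetical and uses plain prefix matching;
real pathspecs also support wildcards and magic, and Git matches through
ce_path_match() rather than strncmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: set seen[j] = 1 when pathspec prefix specs[j]
 * matches some index entry name. Entries with skip-worktree set are
 * ignored when ignore_skip_worktree is nonzero.
 */
static void mark_matches(const char *const *names, const int *skip_worktree,
			 size_t nr, const char *const *specs, size_t nr_specs,
			 int ignore_skip_worktree, int *seen)
{
	size_t i, j;

	assert(seen != NULL);
	for (i = 0; i < nr; i++) {
		if (ignore_skip_worktree && skip_worktree[i])
			continue;
		for (j = 0; j < nr_specs; j++)
			if (!seen[j] &&
			    !strncmp(names[i], specs[j], strlen(specs[j])))
				seen[j] = 1;
	}
}
```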

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      1.37(1.02+0.18)   1.38(1.01+0.20) +0.7%
2000.11: git add . (full-index-v4)      1.26(1.00+0.15)   1.27(0.99+0.17) +0.8%
2000.12: git add . (sparse-index-v3)    2.39(2.29+0.14)   0.06(0.05+0.07) -97.5%
2000.13: git add . (sparse-index-v4)    2.42(2.32+0.14)   0.06(0.05+0.06) -97.5%

While the test results show a 97% improvement, it is worth noting that
expanding the sparse index was adding overhead in previous commits.
Compared to the full index case, performance goes from 1.27s to 0.06s,
a 95% improvement.
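
For reference, both percentages follow directly from the v4 rows of the
table above (times in seconds, HEAD~1 sparse vs. HEAD sparse, then HEAD
full vs. HEAD sparse):

```latex
\frac{0.06 - 2.42}{2.42} \approx -0.975 = -97.5\%
\qquad
\frac{0.06 - 1.27}{1.27} \approx -0.953 \approx -95\%
```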

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 54813c0c4e8e..b51b48471fe6 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c210dba78067..738013b00191 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -471,6 +471,12 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/extra.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index add extra.txt &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add . &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget



* [PATCH 09/10] t7519: add sparse directories to FS monitor tests
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The File System Monitor (FS Monitor) tests in t7519 demonstrate some
important interactions with the index and the response from the FS
Monitor hook. Later changes will integrate the FS Monitor extension in
the index with the existence of sparse directory entries in a sparse
index. To do so, we need to include directories outside of the sparse
checkout definition.

Add a new directory, dir1a, between dir1 and dir2 in the test repo used
by this script. By inserting it in the middle, we are more likely to
trigger incorrect behavior when the fsmonitor_dirty bitmap is involved
with sparse directories changing the position of cache entries.
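
A toy illustration of the hazard described above, assuming (as a
simplification) that dirty bits are recorded purely by cache position.
The arrays, covering_pos(), and remap_dirty_bit() are hypothetical and
are not Git's fsmonitor code; the sketch only shows why collapsing dir1a/
into one sparse directory entry shifts every later position.

```c
#include <string.h>

/* Hypothetical full index for the t7519 repo, in index (sorted) order. */
static const char *full[] = {
	"dir1/modified", "dir1/tracked", "dir1a/a", "dir1a/b",
	"dir2/modified", "dir2/tracked",
};
/* The same index with dir1a/ collapsed into one sparse directory entry. */
static const char *sparse[] = {
	"dir1/modified", "dir1/tracked", "dir1a/",
	"dir2/modified", "dir2/tracked",
};
#define SPARSE_NR ((int)(sizeof(sparse) / sizeof(sparse[0])))

/* Find the entry covering 'name': an exact match, or a sparse directory
 * entry (trailing '/') that contains it. Returns -1 if none. */
static int covering_pos(const char **entries, int nr, const char *name)
{
	for (int i = 0; i < nr; i++) {
		size_t len = strlen(entries[i]);

		if (!strcmp(entries[i], name))
			return i;
		if (len && entries[i][len - 1] == '/' &&
		    !strncmp(entries[i], name, len))
			return i;
	}
	return -1;
}

/* Translate a dirty bit recorded against the full index into the
 * position of the matching sparse-index entry, by name, not position. */
static int remap_dirty_bit(int full_pos)
{
	return covering_pos(sparse, SPARSE_NR, full[full_pos]);
}
```

Reusing bit 4 positionally would flag dir2/tracked in the sparse index,
while remapping by name yields position 3 (dir2/modified); bits 2 and 3
(dir1a/a and dir1a/b) both land on the sparse directory entry at 2.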

I could have modified the test to create two repos, one sparse and one
not, but that would cause confusion in the expected output. Further, it
would make the test take twice as long. With this approach, we can
validate that FS Monitor works with the sparse index feature using the
GIT_TEST_SPARSE_INDEX=1 environment variable. The test currently fails
with that environment variable because FS Monitor is disabled when a
sparse index exists. The following changes will update this behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..23879d967297 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -62,11 +62,16 @@ test_expect_success 'setup' '
 	mkdir dir1 &&
 	: >dir1/tracked &&
 	: >dir1/modified &&
+	mkdir dir1a &&
+	: >dir1a/a &&
+	: >dir1a/b &&
 	mkdir dir2 &&
 	: >dir2/tracked &&
 	: >dir2/modified &&
 	git -c core.fsmonitor= add . &&
 	git -c core.fsmonitor= commit -m initial &&
+	git sparse-checkout init --cone --no-sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
 	git config core.fsmonitor .git/hooks/fsmonitor-test &&
 	cat >.gitignore <<-\EOF
 	.gitignore
@@ -99,6 +104,8 @@ test_expect_success 'update-index --no-fsmonitor" removes the fsmonitor extensio
 cat >expect <<EOF &&
 h dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 h dir2/modified
 H dir2/tracked
 h modified
@@ -121,6 +128,8 @@ test_expect_success 'update-index --fsmonitor-valid" sets the fsmonitor valid bi
 cat >expect <<EOF &&
 H dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 H dir2/tracked
 H modified
@@ -139,6 +148,8 @@ test_expect_success 'update-index --no-fsmonitor-valid" clears the fsmonitor val
 cat >expect <<EOF &&
 H dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 H dir2/tracked
 H modified
@@ -158,6 +169,8 @@ cat >expect <<EOF &&
 H dir1/modified
 h dir1/new
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 h dir2/new
 H dir2/tracked
@@ -182,6 +195,8 @@ cat >expect <<EOF &&
 H dir1/modified
 h dir1/new
 h dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 h dir2/new
 h dir2/tracked
@@ -201,6 +216,8 @@ test_expect_success 'all unmodified files get marked valid' '
 cat >expect <<EOF &&
 H dir1/modified
 h dir1/tracked
+S dir1a/a
+S dir1a/b
 h dir2/modified
 h dir2/tracked
 h modified
-- 
gitgitgadget



* [PATCH 10/10] fsmonitor: test with sparse index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  7:00   ` Elijah Newren
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During the effort to protect uses of the index to operate on a full
index, we did not modify fsmonitor.c. This is because it already works
effectively with only the change to index_name_stage_pos(). The only
thing left to do is to test that it works correctly.

These tests demonstrate that the behavior is the same across a full
index and a sparse index, and also that file modifications to a tracked
directory outside of the sparse cone trigger ensure_full_index().
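
The fsmonitor-test hooks written by the test in this patch print a
NUL-terminated token followed by NUL-terminated paths. The hypothetical
helper below (not part of Git) walks a response of that shape and counts
reported paths under a sparse directory prefix, which is the case the
last test expects to trigger ensure_full_index(). It assumes a
well-formed buffer in which every field, including the last, is
NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Walk a response shaped like "<token>\0<path>\0<path>\0..." and count the
 * paths that start with 'sparse_dir' (e.g. "dir1a/"). The token is returned
 * through 'token_out'. Assumes every field is NUL-terminated within 'len'.
 */
static int count_paths_under(const char *buf, size_t len,
			     const char *sparse_dir, const char **token_out)
{
	size_t prefix_len = strlen(sparse_dir);
	const char *p = buf;
	const char *end = buf + len;
	int count = 0;

	*token_out = p;            /* first field: the update token */
	p += strlen(p) + 1;
	while (p < end) {          /* remaining fields: one path each */
		if (!strncmp(p, sparse_dir, prefix_len))
			count++;
		p += strlen(p) + 1;
	}
	return count;
}
```

When the response is a string literal, passing sizeof(literal) - 1 as
len drops only the compiler's implicit trailing NUL, so the explicit \0
after the last path is kept.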

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 23879d967297..306157d48abf 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -78,6 +78,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -400,4 +401,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+test_expect_success 'status succeeds with sparse index' '
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region index ensure_full_index trace2.txt &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH 00/10] Sparse-index: integrate with status and add
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-04-13 20:45 ` Matheus Tavares Bernardino
  2021-04-14 16:31   ` Derrick Stolee
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 215+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-13 20:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee

Hi, Stolee

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' and 'git add' very fast when a sparse-index is enabled on a
> repository with cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.

I just noticed that our ds/sparse-index-protections and
mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
appear before, but it does now with this new series.

ds/sparse-index-protections added `ensure_full_index()` guards before
the loops that traverse over all cache entries. At the same time,
mt/add-rm-sparse-checkout added yet another one of these loops, at
`pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
new place didn't get the `ensure_full_index()` guard, all of its
callers (in `add` and `rm`) did call `ensure_full_index()` before
calling it, so it was fine.

However, patches 7 and 8 remove some of these protections in `add`'s
code. And, as a result, if "dir" is a sparse directory entry, `git add
[--refresh] dir/file` no longer emits the warning added at
mt/add-rm-sparse-checkout.

Adding `ensure_full_index()` at
`find_pathspecs_matching_skip_worktree()` fixes the problem. We have
to consider the performance implications, but they _might_ be
acceptable as we only call this function when a pathspec given to
`add` or `rm` does not match any non-ignored file inside the sparse
checkout.
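
A deliberately simplified model of the conflict described above, using
hypothetical types rather than Git's struct index_state, and an exact
name comparison standing in for the real pathspec match. A loop over
SKIP_WORKTREE entries, in the style of
find_pathspecs_matching_skip_worktree(), cannot see "dir/file" once it
is hidden inside a sparse directory entry "dir/", while the same check
run as if the index had been expanded first (what an ensure_full_index()
guard provides) finds it.

```c
#include <string.h>

/* Hypothetical stand-in for an index entry. A sparse directory entry is
 * named "dir/" and records the file names it collapses. */
struct entry {
	const char *name;
	int skip_worktree;
	const char **collapsed;   /* NULL-terminated list, or NULL for files */
};

/* Unguarded loop: only sees entry names as stored, so it is correct on a
 * full index but misses files hidden inside a sparse directory entry. */
static int unguarded_match(const struct entry *ents, int nr, const char *path)
{
	for (int i = 0; i < nr; i++)
		if (ents[i].skip_worktree && !strcmp(ents[i].name, path))
			return 1;
	return 0;
}

/* Guarded loop: behaves as if the index had been expanded first, so each
 * file inside a sparse directory is checked as its own SKIP_WORKTREE entry. */
static int guarded_match(const struct entry *ents, int nr, const char *path)
{
	for (int i = 0; i < nr; i++) {
		if (ents[i].collapsed) {
			for (const char **f = ents[i].collapsed; *f; f++)
				if (!strcmp(*f, path))
					return 1;
		} else if (ents[i].skip_worktree && !strcmp(ents[i].name, path)) {
			return 1;
		}
	}
	return 0;
}
```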

Additionally, the tests I added at t3705 won't catch this problem,
even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
they don't set core.sparseCheckout and core.sparseCheckoutCone, they
only set individual index entries with the SKIP_WORKTREE bit. And
therefore, the index is always written fully. Should I perhaps reroll
my series using cone mode for these tests?

(And a semi-related question: do you plan on adding
GIT_TEST_SPARSE_INDEX=true to one of the CI jobs?)


* Re: [PATCH 00/10] Sparse-index: integrate with status and add
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
@ 2021-04-14 16:31   ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-14 16:31 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee

On 4/13/2021 4:45 PM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' and 'git add' very fast when a sparse-index is enabled on a
>> repository with cone-mode sparse-checkout (and a small populated set).
>>
>> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> 
> I just noticed that our ds/sparse-index-protections and
> mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
> appear before, but it does now with this new series.

Thank you for taking a close look.
 
> ds/sparse-index-protections added `ensure_full_index()` guards before
> the loops that traverse over all cache entries. At the same time,
> mt/add-rm-sparse-checkout added yet another one of these loops, at
> `pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
> new place didn't get the `ensure_full_index()` guard, all of its
> callers (in `add` and `rm`) did call `ensure_full_index()` before
> calling it, so it was fine.
>
> However, patches 7 and 8 remove some of these protections in `add`'s
> code. And, as a result, if "dir" is a sparse directory entry, `git add
> [--refresh] dir/file` no longer emits the warning added at
> mt/add-rm-sparse-checkout.

You are right, it does not emit the warning. I will add a test that
ensures that behavior is the same across the two sparse repos in
t1092 as part of my v2 in this series.
 
> Adding `ensure_full_index()` at
> `find_pathspecs_matching_skip_worktree()` fixes the problem. We have
> to consider the performance implications, but they _might_ be
> acceptable as we only call this function when a pathspec given to
> `add` or `rm` does not match any non-ignored file inside the sparse
> checkout.

I'll want to do the right thing here to make the warning work, so
I'll take a look soon.

> Additionally, the tests I added at t3705 won't catch this problem,
> even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
> they don't set core.sparseCheckout and core.sparseCheckoutCone, they
> only set individual index entries with the SKIP_WORKTREE bit. And
> therefore, the index is always written fully. Should I perhaps reroll
> my series using cone mode for these tests?

Your series should not be re-rolled for this. Instead, this is valuable
feedback for this series: there is behavior in 'git add' that I am not
checking stays the same when the sparse-index is enabled. That's my
responsibility and I'll get it fixed.
 
> (And a semi-related question: do you plan on adding
> GIT_TEST_SPARSE_INDEX=true to one of the CI jobs?)

I do plan to add that after things calm down. It won't do much right
now because it requires core.sparseCheckout[Cone] to be enabled. Not
many tests provide that, so they don't add much coverage. At one point I
considered adjusting the initial repo creation to include a
sparse-checkout in cone mode, but that would change too many tests.
I still haven't found the right way to expand the test coverage to
take advantage of our deep test suite for this feature.

Thanks,
-Stolee


* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-20 21:52   ` Elijah Newren
  2021-04-21 13:21     ` Derrick Stolee
  2021-04-21 15:14   ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-20 21:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before moving to update 'git status' and 'git add' to work with sparse
> indexes, add an explicit test that ensures the sparse-index works the
> same as a normal sparse-checkout when the worktree contains directories
> and files outside of the sparse cone.
>
> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
> not in the sparse cone. When 'folder1/a' is modified, the file
> 'folder1/a' is shown as modified, but adding it fails. This is new
> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
> entries, 2021-04-08). Before that change, these adds would be silently
> ignored.
>
> Untracked files are fine: adding new files both with 'git add .' and
> 'git add folder1/' works just as in a full checkout. This may not be
> entirely desirable, but we are not intending to change behavior at the
> moment, only document it.

Personally, I'd say not desirable and we should throw an error just
like we do with skip-worktree entries that the user happens to try to
git add.  I've had reports from users who got confused by what
happens after this.  I've been meaning to create some patches to fix
it up, but wanted to avoid getting in the way of the sparse-index work
and have been a bit tied up on other projects to boot.

I'll note in particular that, after running "git add", it's easy for
users to run other things such as "git sparse-checkout reapply" or "git
switch $otherbranch" and suddenly have the file disappear from the
working tree.  From the sparse-checkout machinery's point of view that
makes sense; this path doesn't match the .git/info/sparse-checkout list
of paths, so it should be removed from the working tree.  But it's very
disorienting to users, especially when some of those commands are side
effects of other commands (e.g. our build system invokes "git
sparse-checkout reapply" in various cases, the most common being that
even a simple "git pull" can bring down code with dependency changes and
thus a need for new sparsity rules and whatnot).  It can definitely also
happen with users' own commands in ways they don't expect (e.g. the git
switch/checkout example).

The patch looks good, but it'd be nice if, while documenting it, we also
added a comment saying that we believe we want to change the behavior
(for sparse-checkout both with and without sparse-index).  It's one of
those many paper cuts we still have.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..6598c12a2069 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&
> +
> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&
> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +       test_must_fail git -C sparse-checkout add folder1/a &&
> +       test_must_fail git -C sparse-index add folder1/a &&
> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
@ 2021-04-20 23:00   ` Elijah Newren
  2021-04-21 13:41     ` Derrick Stolee
  2021-04-21 17:27     ` Derrick Stolee
  0 siblings, 2 replies; 215+ messages in thread
From: Elijah Newren @ 2021-04-20 23:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As a first step to integrate 'git status' and 'git add' with the sparse
> index, we must start integrating unpack_trees() with sparse directory
> entries. These changes are currently impossible to trigger because
> unpack_trees() calls ensure_full_index() if command_requires_full_index
> is true. This is the case for all commands at the moment. As we expand
> more commands to be sparse-aware, we might find that more changes are
> required to unpack_trees(). The current changes will suffice for
> 'status' and 'add'.
>
> unpack_trees() calls the traverse_trees() API using unpack_callback()
> to decide if we should recurse into a subtree. We must add new abilities
> to skip a subtree if it corresponds to a sparse directory entry.
>
> It is important to be careful about the trailing directory separator
> that exists in the sparse directory entries but not in the subtree
> paths.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.h           |  2 +-
>  preload-index.c |  2 ++
>  read-cache.c    |  3 +++
>  unpack-trees.c  | 24 ++++++++++++++++++++++--
>  4 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/dir.h b/dir.h
> index 51cb0e217247..9d6666f520f3 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>                                 char *seen)
>  {
>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));

I'm confused about why this change would be needed, or why it would be
semantically meaningful here.  Doesn't S_ISSPARSEDIR() being true imply
that S_ISDIR() is true (and perhaps even vice versa)?

By chance, was this a leftover from your early RFC changes from a few
series ago when you had an entirely different mode for sparse
directory entries?

>  }
>
>  static inline int dir_path_match(struct index_state *istate,
> diff --git a/preload-index.c b/preload-index.c
> index e5529a586366..35e67057ca9b 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>                         continue;
>                 if (S_ISGITLINK(ce->ce_mode))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
>                 if (ce_uptodate(ce))
>                         continue;
>                 if (ce_skip_worktree(ce))

Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
 Is this a duplicate check?  If so, is it still desirable for
future-proofing or code clarity, or is it strictly redundant?

> diff --git a/read-cache.c b/read-cache.c
> index 29ffa9ac5db9..6308234b4838 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>                         continue;
>
> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
> +

I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
!ignore_skip_worktree and why it'd be desirable to refresh
skip-worktree entries.  However, this is tangential to your patch and
has apparently been around since 2009 (in particular, from 56cac48c35
("ie_match_stat(): do not ignore skip-worktree bit with
CE_MATCH_IGNORE_VALID", 2009-12-14)).

>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>                         filtered = 1;
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index dddf106d5bd4..9a62e823928a 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>  {
>         ce->ce_flags |= CE_UNPACKED;
>
> +       /*
> +        * If this is a sparse directory, don't advance cache_bottom.
> +        * That will be advanced later using the cache-tree data.
> +        */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return;
> +

I don't understand cache_bottom stuff; we might want to get Junio to
look over it.  Or maybe I just need to dig a bit further and attempt
to understand it.

>         if (o->cache_bottom < o->src_index->cache_nr &&
>             o->src_index->cache[o->cache_bottom] == ce) {
>                 int bottom = o->cache_bottom;
> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> +       /* remove directory separator if a sparse directory entry */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               ce_len--;
>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);

Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?

Note the following sort order:
   foo
   foo.txt
   foo/
   foo/bar

You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
but df_name_compare() exists to make "foo" sort exactly where "foo/"
would when "foo" is a directory.  Will your df_name_compare() call
here result in foo.txt being placed after all the "foo/<subpath>"
entries in the index and perhaps cause other problems down the line?
(Are there issues, e.g. with cache-trees getting wrong ordering from
this, or even writing out indexes or tree objects with the wrong
ordering?  I've written out trees to disk with wrong ordering before
and git usually survives but gets really confused with diffs.)
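
To make the ordering concern concrete, here is a stand-alone sketch of
the tree-entry ordering rule, in which a directory compares as though
its name ended in '/'.  tree_order_cmp() is a hypothetical helper, not
Git's df_name_compare() (which has additional D/F-conflict semantics).
Comparing the trimmed sparse directory name "foo" as a regular file
sorts it before "foo.txt", while comparing it as a directory sorts it
after, matching the sort order listed above.

```c
#include <string.h>

/*
 * Order two entry names the way tree entries are sorted: when one name is
 * a prefix of the other, a directory behaves as if it had a trailing '/'.
 */
static int tree_order_cmp(const char *n1, int is_dir1,
			  const char *n2, int is_dir2)
{
	size_t l1 = strlen(n1), l2 = strlen(n2);
	size_t len = l1 < l2 ? l1 : l2;
	unsigned char c1, c2;
	int cmp = memcmp(n1, n2, len);

	if (cmp)
		return cmp;
	/* Next character, or the implied '/' (directory) or end (file). */
	c1 = len < l1 ? (unsigned char)n1[len] : (is_dir1 ? '/' : '\0');
	c2 = len < l2 ? (unsigned char)n2[len] : (is_dir2 ? '/' : '\0');
	return c1 - c2;
}
```

Since '.' (0x2E) sorts just below '/' (0x2F), tree_order_cmp("foo", 1,
"foo.txt", 0) is positive while tree_order_cmp("foo", 0, "foo.txt", 0)
is negative, which is exactly the difference between passing the real
mode and passing S_IFREG.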

Since at least one caller of compare_entry() takes the return result
and does a "if (cmp < 0)", this order is going to matter in some
cases.  Perhaps we need some testcases where there is a sparse
directory entry named "foo/" and a file recorded in some relevant tree
with the name "foo.txt" to be able to trigger these lines of code?

>  }
>
> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow equality here. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;
> +

Um...so a sparse directory compares equal to _anything_ at all?  I'm
really confused why this would be desirable.  Am I missing something
here?

>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned recurse = 1;

"recurse" sent my mind off into questions about safety checks, base
cases, etc., instead of just the simple "we don't want to read in
directories corresponding to sparse entries".  I think this would be
clearer either if the variable had the sparsity concept embedded in
its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
(!sparse_entry) instead of (recurse) below), or with a comment about
why there are cases where you want to avoid recursion.
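
For illustration, a hypothetical toy traversal (not unpack_trees() or
traverse_trees()) that uses a sparsity-specific flag name as suggested:
a directory marked as a sparse entry is visited once as a single entry
and its subtree is never read, while files are the ordinary base case.

```c
#include <stddef.h>

/* Hypothetical toy tree: a node is a file or a directory with children. */
struct node {
	const char *name;
	int is_dir;
	int sparse_entry;          /* directory is collapsed into one entry */
	const struct node *children;
	size_t nr_children;
};

/* Count the entries the walk visits. A sparse directory is visited as a
 * single entry and its subtree is not read. */
static size_t walk(const struct node *n)
{
	size_t visited = 1;

	if (!n->is_dir || n->sparse_entry)
		return visited;    /* file, or sparse directory entry */
	for (size_t i = 0; i < n->nr_children; i++)
		visited += walk(&n->children[i]);
	return visited;
}
```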

>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       recurse = 0;

Ah, the context here doesn't show it but this is in the "if (!cmp)"
block, i.e. if we found a match for the sparse directory.  This makes
sense to me, _if_ we ignore the above question about sparse
directories matching equal to anything and everything.

>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (recurse &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (recurse &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;

Nice.  :-)


I think your patch was mostly about the recurse stuff, which, other
than the name or a comment about it, looks good to me.  However, all the
other small preparatory tweaks brought up a lot of questions or
confusion for me.  I'm worried there might be a bug or two, though I
may have just misunderstood some of the code bits.


* Re: [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-20 23:21   ` Elijah Newren
  2021-04-21 13:47     ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-20 23:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When we have sparse directory entries in the index, we want to compare
> that directory against sparse-checkout patterns. Those pattern matching
> algorithms are built expecting a file path, not a directory path. This
> is especially important in the "cone mode" patterns which will match
> files that exist within the "parent directories" as well as the
> recursive directory matches.
>
> If path_matches_pattern_list() is given a directory, we can add a fake
> filename ("-") to the directory and get the same results as before,
> assuming we are in cone mode. Since sparse index requires cone mode
> patterns, this is an acceptable assumption.

Makes sense; thanks for the good description.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index 166238e79f52..57e22e605cec 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>         strbuf_addch(&parent_pathname, '/');
>         strbuf_add(&parent_pathname, pathname, pathlen);
>
> +       /* Directory requests should be added as if they are a file */

"added" or "matched"?  Also, the description seems a bit brief and
likely to surprise; I'd at least want to expand "file" to "file within
their given directory", but it might be nice to include a summarized
version of the commit message, or at least state that "-" is just an
arbitrary simple name within the given directory.

> +       if (parent_pathname.len > 1 &&

Is this line...

> +           parent_pathname.buf[parent_pathname.len - 1] == '/')

to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
above ensure the condition is always met?

Or are we trying to avoid adding the "-" when parent_pathname is
just a plain "/"?

> +               strbuf_add(&parent_pathname, "-", 1);
> +

Sorry for all the questions on such a tiny change.  It makes sense to
me, I'm just curious whether it'll confuse future code readers.


>         if (hashmap_contains_path(&pl->recursive_hashmap,
>                                   &parent_pathname)) {
>                 result = MATCHED_RECURSIVE;
> --
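
A minimal sketch of the directory-as-file trick under discussion. A caller-provided char buffer stands in for Git's struct strbuf (an assumption for self-containment), and paths start with '/' as parent_pathname does in the hunk. It shows one reading of the `> 1` condition: a bare "/" is left alone, while any longer path ending in '/' gets the placeholder "-" appended. `dir_as_file()` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the directory-as-file trick: when `path` ends in '/', append
 * the placeholder name "-" so that cone-mode lookups treat it as a file
 * inside that directory.  Returns the resulting length, or -1 if `cap`
 * is too small.
 */
static int dir_as_file(char *buf, size_t cap, const char *path)
{
	size_t len;

	assert(buf && path);
	len = strlen(path);
	if (len + 2 > cap) /* room for an appended '-' and the NUL */
		return -1;
	memcpy(buf, path, len + 1);

	/* "len > 1" leaves a bare "/" alone, like the condition under review. */
	if (len > 1 && buf[len - 1] == '/') {
		buf[len++] = '-';
		buf[len] = '\0';
	}
	return (int)len;
}
```

For example, "/deep/" becomes "/deep/-", while "/deep/a.txt" and "/" are copied unchanged.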


* Re: [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-20 23:26   ` Elijah Newren
  2021-04-21 13:51     ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-20 23:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> 'git status' began reporting a percentage of populated paths when
> sparse-checkout is enabled in 051df3cf (wt-status: show sparse
> checkout status as well, 2020-07-18). This percentage is incorrect when
> the index has sparse directories. It would also be expensive to
> calculate as we would need to parse trees to count the total number of
> possible paths.
>
> Avoid the expensive computation by simplifying the output to only report
> that a sparse checkout exists, without the percentage.

Makes sense.  The percentage wasn't critical; it was just a nice UI
bonus.  The critical part is notifying about being in a sparse
checkout.

It makes me wonder slightly if we'd want to remove the percentage for
both modes just to keep them more similar.  I'll ask some folks for
their thoughts/opinions.  Of course, that could always be tweaked
later and doesn't necessarily need to go into your series.

> This change is the reason we use 'git status --porcelain=v2' in
> t1092-sparse-checkout-compatibility.sh. We don't need this message to
> be equal across both modes; instead, we compare only the important
> information about staged, modified, and untracked files.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
>  wt-status.c                              | 14 +++++++++++---
>  wt-status.h                              |  1 +
>  3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 6598c12a2069..e488ef9bd941 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -196,6 +196,14 @@ test_expect_success 'status with options' '
>         test_all_match git status --porcelain=v2 -uno
>  '
>
> +test_expect_success 'status reports sparse-checkout' '
> +       init_repos &&
> +       git -C sparse-checkout status >full &&
> +       git -C sparse-index status >sparse &&
> +       test_i18ngrep "You are in a sparse checkout with " full &&
> +       test_i18ngrep "You are in a sparse checkout." sparse
> +'
> +
>  test_expect_success 'add, commit, checkout' '
>         init_repos &&
>
> diff --git a/wt-status.c b/wt-status.c
> index 0c8287a023e4..0425169c1895 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
>         if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
>                 return;
>
> -       status_printf_ln(s, color,
> -                        _("You are in a sparse checkout with %d%% of tracked files present."),
> -                        s->state.sparse_checkout_percentage);
> +       if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
> +               status_printf_ln(s, color, _("You are in a sparse checkout."));
> +       else
> +               status_printf_ln(s, color,
> +                               _("You are in a sparse checkout with %d%% of tracked files present."),
> +                               s->state.sparse_checkout_percentage);
>         wt_longstatus_print_trailer(s);
>  }
>
> @@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
>                 return;
>         }
>
> +       if (r->index->sparse_index) {
> +               state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
> +               return;
> +       }
> +
>         for (i = 0; i < r->index->cache_nr; i++) {
>                 struct cache_entry *ce = r->index->cache[i];
>                 if (ce_skip_worktree(ce))
> diff --git a/wt-status.h b/wt-status.h
> index 0d32799b28e1..ab9cc9d8f032 100644
> --- a/wt-status.h
> +++ b/wt-status.h
> @@ -78,6 +78,7 @@ enum wt_status_format {
>  };
>
>  #define SPARSE_CHECKOUT_DISABLED -1
> +#define SPARSE_CHECKOUT_SPARSE_INDEX -2
>
>  struct wt_status_state {
>         int merge_in_progress;
> --
> gitgitgadget

Looks good.
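
A minimal sketch of the negative-sentinel approach the patch uses, where the percentage field doubles as a state flag. The two sentinel values come from the wt-status.h hunk; `sparse_state()` and `sparse_message()` are hypothetical stand-ins that take counts directly instead of walking `r->index`. The percentage formula and the empty-index guard are assumptions about the existing full-index path, not quotes from it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h> /* strcmp() in the usage below */

/* Sentinel values taken from the wt-status.h hunk above. */
#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/* Hypothetical stand-in for wt_status_check_sparse_checkout(). */
static int sparse_state(int sparse_checkout, int sparse_index,
			int nr_entries, int nr_skip_worktree)
{
	if (!sparse_checkout || nr_entries == 0) /* also avoids dividing by 0 */
		return SPARSE_CHECKOUT_DISABLED;
	if (sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX; /* no tree parsing needed */
	return 100 - (100 * nr_skip_worktree) / nr_entries;
}

/* Mirrors the branch added to show_sparse_checkout_in_use(). */
static void sparse_message(char *buf, size_t cap, int state)
{
	assert(buf && cap > 0);
	if (state == SPARSE_CHECKOUT_DISABLED)
		buf[0] = '\0';
	else if (state == SPARSE_CHECKOUT_SPARSE_INDEX)
		snprintf(buf, cap, "You are in a sparse checkout.");
	else
		snprintf(buf, cap,
			 "You are in a sparse checkout with %d%% of tracked files present.",
			 state);
}
```

Because real percentages are never negative, the sentinels cannot collide with a computed value, which is what lets the patch reuse the existing int field.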


* Re: [PATCH 05/10] status: use sparse-index throughout
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-21  0:44   ` Elijah Newren
  2021-04-21 13:55     ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-21  0:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> By testing 'git -c core.fsmonitor= status -uno', we can check for the
> simplest index operations that can be made sparse-aware. The necessary
> implementation details are already integrated with sparse-checkout, so
> modify command_requires_full_index to be zero for cmd_status().
>
> By running the debugger for 'git status -uno' after that change, we find
> two instances of ensure_full_index() that were added for extra safety,
> but can be removed without issue.
>
> In refresh_index(), we loop through the index entries. The
> refresh_cache_ent() method copies the sparse directories into the
> refreshed index without issue.

I do see the removal of a call to ensure_full_index() in
refresh_index() that you mention in this paragraph in the patch below.

I'm confused, though; I would have thought we wanted to avoid a
refresh_cache_ent() call.  Also, one of your previous patches added a

    if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
        continue;

check before the code ever gets to the refresh_cache_ent() call, so as
far as I can tell, that function won't be called from refresh_index()
for sparse entries.  Maybe your commit message here is out-of-date?
Or am I confused somehow?
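
A minimal sketch of the loop shape being discussed, with toy types and a hypothetical `refresh_one()` stub standing in for refresh_cache_ent(). It illustrates the point above: once the sparse-directory `continue` is in place, the per-entry refresh is never reached for sparse directory entries, whether or not the index was expanded first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy entry; the flags mirror Git's concepts, not its struct cache_entry. */
struct toy_entry {
	const char *name;
	int is_sparse_dir; /* stands in for S_ISSPARSEDIR(ce->ce_mode) */
	int skip_worktree; /* stands in for ce_skip_worktree(ce) */
};

static int refresh_calls; /* counts calls to the stub below */

/* Hypothetical stub standing in for refresh_cache_ent(). */
static void refresh_one(const struct toy_entry *ce)
{
	(void)ce;
	refresh_calls++;
}

/* Shape of the refresh_index() loop after the sparse-directory check. */
static void refresh_all(const struct toy_entry *entries, size_t nr,
			int sparse_index, int ignore_skip_worktree)
{
	assert(entries || !nr);
	for (size_t i = 0; i < nr; i++) {
		const struct toy_entry *ce = &entries[i];

		if (ignore_skip_worktree && ce->skip_worktree)
			continue;
		if (sparse_index && ce->is_sparse_dir)
			continue; /* sparse directories never reach refresh_one() */
		refresh_one(ce);
	}
}
```

With entries "a.txt", "deep/" (a sparse directory) and "b.txt", a sparse-index refresh calls the stub twice rather than three times.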

> The loop within run_diff_files() skips things that are in stage 0 and
> have skip-worktree enabled, so it seems safe to disable ensure_full_index()
> here.

Unlike the above, I don't see a removal of an ensure_full_index() call
in run_diff_files() as claimed by this paragraph.  Has the commit
message gotten out of date with refactorings you did while developing
this series?

> This allows some cases of 'git status' to no longer expand a sparse
> index to a full one, giving the following performance improvements for
> p2000-sparse-checkout-operations.sh:
>
> Test                                  HEAD~1           HEAD
> -----------------------------------------------------------------------------
> 2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
> 2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>
> Note that since HEAD~1 was expanding the sparse index by parsing trees,
> it was artificially slower than the full index case. Thus, the 98%
> improvement is misleading, and instead we should celebrate the 0.37s to
> 0.05s improvement of 82%. This is more indicative of the performance
> gains we are expecting by using a sparse index.

82%, very nice.  Was this with git.git as the test repository, or some
other repo?  If it's git.git, then we'd actually expect a much bigger
speedup for other repositories, as git.git is pretty small.


> Note: we are dropping the assignment of core.fsmonitor here. This is not
> necessary for the test script as we are not altering the config any
> other way. Correct integration with FS Monitor will be validated in
> later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c                         |  3 +++
>  read-cache.c                             |  2 --
>  t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
>  3 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index cf0c36d1dcb2..e529da7beadd 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage_with_options(builtin_status_usage, builtin_status_options);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         status_init_config(&s, git_status_config);
>         argc = parse_options(argc, argv, prefix,
>                              builtin_status_options,
> diff --git a/read-cache.c b/read-cache.c
> index 6308234b4838..83e6bdef7604 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>          */
>         preload_index(istate, pathspec, 0);
>         trace2_region_enter("index", "refresh", NULL);
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce, *new_entry;
>                 int cache_errno = 0;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e488ef9bd941..380a085f8ec4 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' '
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>                 git -C sparse-index -c core.fsmonitor="" reset --hard &&
>         test_region index convert_to_sparse trace2.txt &&
> -       test_region index ensure_full_index trace2.txt &&
> +       test_region index ensure_full_index trace2.txt
> +'
>
> -       rm trace2.txt &&
> +test_expect_success 'sparse-index is not expanded' '
> +       init_repos &&
> +
> +       rm -f trace2.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index -c core.fsmonitor="" status -uno &&
> -       test_region index ensure_full_index trace2.txt
> +               git -C sparse-index status -uno &&
> +       test_region ! index ensure_full_index trace2.txt
>  '
>
>  test_done
> --
> gitgitgadget

Other than what looks like a couple issues in the commit message, the
change looks good to me.


* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-21  0:52   ` Elijah Newren
  2021-04-21  0:53     ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-21  0:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The recently-implemented expand_to_path() method can supply position
> queries a faster response if they are specifically asking for a path
> within the sparse cone. Since this is the most-common scenario, this
> provides a significant speedup.
>
> Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
> status' does not expand a sparse index to a full one, even when there
> exist untracked files.
>
> The performance test script p2000-sparse-operations.sh demonstrates
> that this is the final hole to fill to allow 'git status' to speed up
> when using a sparse index:
>
> Test                                  HEAD~1            HEAD
> ------------------------------------------------------------------------------
> 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
> 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%

Um, I'm confused.  In the previous patch you claimed the following speedups:

2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%

I don't understand why the "Before" for this patch claims 1.50 as the
initial speed, if the "After" for the last patch was 0.04.  Should the
previous commit message have instead claimed:

2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%

?
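
A quick arithmetic check that the proposed figures are self-consistent, assuming the p2000 tables report relative change as (after - before) / before * 100, rounded to one decimal place. `pct_change()` is a hypothetical helper, not part of the perf framework.

```c
#include <assert.h>

/* Relative change in percent: (after - before) / before * 100. */
static double pct_change(double before, double after)
{
	assert(before > 0);
	return (after - before) / before * 100.0;
}
```

For instance, pct_change(2.43, 1.50) is about -38.27 and pct_change(2.44, 1.50) about -38.52, which round to the -38.3% and -38.5% proposed above; pct_change(1.50, 0.04) is about -97.33, matching the -97.3% in this patch's own table.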

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 380a085f8ec4..b937d7096afd 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
>         init_repos &&
>
>         rm -f trace2.txt &&
> +       echo >>sparse-index/untracked.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index status -uno &&
> +               git -C sparse-index status &&
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> --
> gitgitgadget

Oh!  So, the previous patch was testing without enumerating untracked
files (because it did those slowly), whereas this one enumerates
untracked files and is still able to achieve the same performance?
This wasn't very clear from the commit message.  Maybe I'm just bad at
reading, but perhaps the commit message could be tweaked slightly to
make this more clear?


* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-21  0:52   ` Elijah Newren
@ 2021-04-21  0:53     ` Elijah Newren
  2021-04-21 14:03       ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-21  0:53 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

One more thing:

On Tue, Apr 20, 2021 at 5:52 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > The recently-implemented expand_to_path() method can supply position
> > queries a faster response if they are specifically asking for a path
> > within the sparse cone. Since this is the most-common scenario, this
> > provides a significant speedup.
> >
> > Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
> > status' does not expand a sparse index to a full one, even when there
> > exist untracked files.
> >
> > The performance test script p2000-sparse-operations.sh demonstrates
> > that this is the final hole to fill to allow 'git status' to speed up
> > when using a sparse index:
> >
> > Test                                  HEAD~1            HEAD
> > ------------------------------------------------------------------------------
> > 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
> > 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%
>
> Um, I'm confused.  In the previous patch you claimed the following speedups:
>
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>
> I don't understand why the "Before" for this patch claims 1.50 as the
> initial speed, if the "After" for the last patch was 0.04.  Should the
> previous commit message have instead claimed:
>
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%
>
> ?
>
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  t/t1092-sparse-checkout-compatibility.sh | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> > index 380a085f8ec4..b937d7096afd 100755
> > --- a/t/t1092-sparse-checkout-compatibility.sh
> > +++ b/t/t1092-sparse-checkout-compatibility.sh
> > @@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
> >         init_repos &&
> >
> >         rm -f trace2.txt &&
> > +       echo >>sparse-index/untracked.txt &&
> >         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> > -               git -C sparse-index status -uno &&
> > +               git -C sparse-index status &&
> >         test_region ! index ensure_full_index trace2.txt
> >  '
> >
> > --
> > gitgitgadget
>
> Oh!  So, the previous patch was testing without enumerating untracked
> files (because it did those slowly), whereas this one enumerates
> untracked files and is still able to achieve the same performance?
> This wasn't very clear from the commit message.  Maybe I'm just bad at
> reading, but perhaps the commit message could be tweaked slightly to
> make this more clear?

Why is the subject of this commit "dir: use expand_to_path() ..." if
it only touches t1092-sparse-checkout-compatibility.sh?


* Re: [PATCH 08/10] pathspec: stop calling ensure_full_index
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-04-21  0:57   ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-04-21  0:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The add_pathspec_matches_against_index() method focuses on matching a pathspec
> to file entries in the index. This already works correctly for its only
> use: checking if untracked files exist in the index.
>
> The compatibility checks in t1092 already test that 'git add <dir>'
> works for a directory outside of the sparse cone. That provides coverage
> for removing this guard.
>
> This finalizes our ability to run 'git add .' without expanding a sparse
> index to a full one. This is evidenced by an update to t1092 and by
> these performance numbers for p2000-sparse-operations.sh:
>
> Test                                    HEAD~1            HEAD
> --------------------------------------------------------------------------------
> 2000.10: git add . (full-index-v3)      1.37(1.02+0.18)   1.38(1.01+0.20) +0.7%
> 2000.11: git add . (full-index-v4)      1.26(1.00+0.15)   1.27(0.99+0.17) +0.8%
> 2000.12: git add . (sparse-index-v3)    2.39(2.29+0.14)   0.06(0.05+0.07) -97.5%
> 2000.13: git add . (sparse-index-v4)    2.42(2.32+0.14)   0.06(0.05+0.06) -97.5%
>
> While the 97% improvement is shown by the test results, it is worth
> noting that expanding the sparse index was adding overhead in previous
> commits. Comparing to the full index case, we see the performance go
> from 1.27s to 0.06s, a 95% improvement.

This is awesome.  :-)

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  pathspec.c                               | 2 --
>  t/t1092-sparse-checkout-compatibility.sh | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index 54813c0c4e8e..b51b48471fe6 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>                 if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c210dba78067..738013b00191 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -471,6 +471,12 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/extra.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>                 git -C sparse-index add extra.txt &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +
> +       rm trace2.txt &&
> +       echo >>sparse-index/untracked.txt &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git -C sparse-index add . &&
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> --
> gitgitgadget
>


* Re: [PATCH 10/10] fsmonitor: test with sparse index
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-04-21  7:00   ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-04-21  7:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> During the effort to protect uses of the index to operate on a full
> index, we did not modify fsmonitor.c. This is because it already works
> effectively with only the change to index_name_stage_pos(). The only
> thing left to do is to test that it works correctly.
>
> These tests are added to demonstrate that the behavior is the same
> across a full index and a sparse index, but also that file modifications
> to a tracked directory outside of the sparse cone will trigger
> ensure_full_index().
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index 23879d967297..306157d48abf 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -78,6 +78,7 @@ test_expect_success 'setup' '
>         expect*
>         actual*
>         marker*
> +       trace2*
>         EOF
>  '
>
> @@ -400,4 +401,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
>         )
>  '
>
> +test_expect_success 'status succeeds with sparse index' '
> +       test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +       rm trace2.txt &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +       rm trace2.txt &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +               printf "dir1/modified\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +               printf "dir1a/modified\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region index ensure_full_index trace2.txt &&
> +       test_cmp expect actual

There's a lot of duplicated lines here; would it make sense to have a
helper function you call, making it easier to see the differences
between the four subsections of this test?  Also, do you want to use
test_config instead of git config, so that it automatically gets unset
at the end of the test?


* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-20 21:52   ` Elijah Newren
@ 2021-04-21 13:21     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:21 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 5:52 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> I'll note in particular that it's easy for users after running "git
> add" to run other things such as "git sparse-checkout reapply" or "git
> switch $otherbranch" and suddenly the file disappears from the working
> tree.  From the sparse-checkout machinery that makes sense; this path
> doesn't match the .git/info/sparse-checkout list of paths, so it
> should be removed from the working tree.  But it's very disorienting
> to users.  Especially if some of those commands are side-effects of
> other commands (e.g. our build system invokes "git sparse-checkout
> reapply" in various cases, most common of which is that even a simple
> "git pull" can bring down code with dependency changes and thus a need
> for new sparsity rules and whatnot), but it definitely can just happen
> in ways users don't expect with their own commands (e.g. the git
> switch/checkout example).
> 
> The patch looks good, but it'd be nice if while documenting it we also
> add a comment that we believe we want to change the behavior (for
> sparse-checkout both with and without sparse-index).  It's one of
> those many paper-cuts we still have.

I can add comments to these corner-case tests noting that the behavior
is not intended to be permanent, especially since they already need a
comment explaining how strangely it behaves.

Thanks,
-Stolee


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-20 23:00   ` Elijah Newren
@ 2021-04-21 13:41     ` Derrick Stolee
  2021-04-21 16:11       ` Elijah Newren
  2021-04-21 17:27     ` Derrick Stolee
  1 sibling, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:41 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:00 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/dir.h b/dir.h
>> index 51cb0e217247..9d6666f520f3 100644
>> --- a/dir.h
>> +++ b/dir.h
>> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>>                                 char *seen)
>>  {
>>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
>> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
>> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> 
> I'm confused why this change would be needed, or why it'd semantically
> be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
> S_ISDIR() is true (and perhaps even vice versa?).
> 
> By chance, was this a leftover from your early RFC changes from a few
> series ago when you had an entirely different mode for sparse
> directory entries?

I will double-check on this with additional testing and debugging.
Your comments below make it clear that this patch would benefit from
some additional splitting.
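
A minimal sketch of why the extra S_ISSPARSEDIR() test in ce_path_match() looks redundant. It assumes the series defines S_ISSPARSEDIR(m) as the exact mode S_IFDIR with no permission bits. Toy macros with the traditional Unix mode values are used so the sketch does not depend on <sys/stat.h> feature-test macros; the TOY_* names and helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Traditional Unix mode constants, spelled out for self-containment. */
#define TOY_S_IFMT  0170000
#define TOY_S_IFDIR 0040000
#define TOY_S_ISDIR(m) (((m) & TOY_S_IFMT) == TOY_S_IFDIR)

/* Assumed to match the series' S_ISSPARSEDIR(): exactly S_IFDIR. */
#define TOY_S_ISSPARSEDIR(m) ((m) == TOY_S_IFDIR)

/* Returns 1 when "sparse dir implies dir" holds for `mode`. */
static int sparse_dir_implies_dir(unsigned int mode)
{
	return !TOY_S_ISSPARSEDIR(mode) || TOY_S_ISDIR(mode);
}

/* Checks the implication over a list of modes. */
static int implication_holds_for_all(const unsigned int *modes, size_t nr)
{
	assert(modes || !nr);
	for (size_t i = 0; i < nr; i++)
		if (!sparse_dir_implies_dir(modes[i]))
			return 0;
	return 1;
}
```

Under this definition the converse does not hold in general (0040755 is a directory mode but not a sparse directory), so "vice versa" would additionally rely on the index only ever storing directory entries with the exact mode S_IFDIR.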

>>  }
>>
>>  static inline int dir_path_match(struct index_state *istate,
>> diff --git a/preload-index.c b/preload-index.c
>> index e5529a586366..35e67057ca9b 100644
>> --- a/preload-index.c
>> +++ b/preload-index.c
>> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>>                         continue;
>>                 if (S_ISGITLINK(ce->ce_mode))
>>                         continue;
>> +               if (S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>>                 if (ce_uptodate(ce))
>>                         continue;
>>                 if (ce_skip_worktree(ce))
> 
> Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
>  Is this a duplicate check?  If so, is it still desirable for
> future-proofing or code clarity, or is it strictly redundant?

You're right, we could skip this one because the ce_skip_worktree(ce)
is enough to cover this case. I think I created this one because I was
auditing uses of S_ISGITLINK().

>> diff --git a/read-cache.c b/read-cache.c
>> index 29ffa9ac5db9..6308234b4838 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>>                         continue;
>>
>> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>> +
> 
> I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> !ignore_skip_worktree and why it'd be desirable to refresh
> skip-worktree entries.  However, this is tangential to your patch and
> has apparently been around since 2009 (in particular, from 56cac48c35
> ("ie_match_stat(): do not ignore skip-worktree bit with
> CE_MATCH_IGNORE_VALID", 2009-12-14)).

This is probably better served with a statement like this earlier in
the method:

	if (ignore_skip_worktree)
		ensure_full_index(istate);

It seems like ignoring the skip worktree bits is a rare occasion and
it will be worth expanding the index for that case.

>>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>>                         filtered = 1;
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index dddf106d5bd4..9a62e823928a 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>>  {
>>         ce->ce_flags |= CE_UNPACKED;
>>
>> +       /*
>> +        * If this is a sparse directory, don't advance cache_bottom.
>> +        * That will be advanced later using the cache-tree data.
>> +        */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               return;
>> +
> 
> I don't understand cache_bottom stuff; we might want to get Junio to
> look over it.  Or maybe I just need to dig a bit further and attempt
> to understand it.

I remember looking very carefully at this when I created this (and found
it worth a comment) but I don't recall enough off the top of my head.
This is worth splitting out with a careful message, which will force me
to reexamine the cache_bottom member.

>>         if (o->cache_bottom < o->src_index->cache_nr &&
>>             o->src_index->cache[o->cache_bottom] == ce) {
>>                 int bottom = o->cache_bottom;
>> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>>         ce_len -= pathlen;
>>         ce_name = ce->name + pathlen;
>>
>> +       /* remove directory separator if a sparse directory entry */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               ce_len--;
>>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> 
> Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?
> 
> Note the following sort order:
>    foo
>    foo.txt
>    foo/
>    foo/bar
> 
> You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
> but df_name_compare() exists to make "foo" sort exactly where "foo/"
> would when "foo" is a directory.  Will your df_name_compare() call
> here result in foo.txt being placed after all the "foo/<subpath>"
> entries in the index and perhaps cause other problems down the line?
> (Are there issues, e.g. with cache-trees getting wrong ordering from
> this, or even writing out indexes or tree objects with the wrong
> ordering?  I've written out trees to disk with wrong ordering before
> and git usually survives but gets really confused with diffs.)
> 
> Since at least one caller of compare_entry() takes the return result
> and does a "if (cmp < 0)", this order is going to matter in some
> cases.  Perhaps we need some testcases where there is a sparse
> directory entry named "foo/" and a file recorded in some relevant tree
> with the name "foo.txt" to be able to trigger these lines of code?

I will do some testing to find out why removing the separator here was
necessary or valuable.

>>  }
>>
>> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>>         if (cmp)
>>                 return cmp;
>>
>> +       /* If ce is a sparse directory, then allow equality here. */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               return 0;
>> +
> 
> Um...so a sparse directory compares equal to _anything_ at all?  I'm
> really confused why this would be desirable.  Am I missing something
> here?

The context missing from the patch is that "cmp" is the
response from do_compare_entry(), which does a length-limited comparison.
If cmp is non-zero, then we've already returned the difference.

The rest of the method is checking if the 'info' input is actually a
parent directory of the _path_ given at this cache entry.

>>         /*
>>          * Even if the beginning compared identically, the ce should
>>          * compare as bigger than a directory leading up to it!

The line after this is:

	return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n));

This comparison is saying "these paths match up to the directory specified
by info and n, but we need 'ce' to be a file within that directory." But
in the case of a sparse directory entry, we can skip this comparison.
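The following sketch shows the tail check being described. It assumes the prefixes have already compared equal. ce_len stands in for ce_namelen(ce) and dir_len for traverse_path_len(info, tree_entry_len(n)); the helper name is hypothetical. A sparse directory entry such as "folder1/" is one character longer than the traversed path "folder1". Without the special case, it would therefore compare as bigger than (that is, as an entry inside) the tree it represents, instead of matching it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical tail of compare_entry(), reached only when the prefixes
 * already compared equal. Returns 0 when ce matches this position and
 * 1 when ce must compare as bigger than the directory leading up to it.
 */
static int sketch_compare_tail(size_t ce_len, size_t dir_len, bool is_sparse_dir)
{
	if (is_sparse_dir)
		return 0;	/* the sparse entry stands for this very tree */
	return ce_len > dir_len;
}
```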

>> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>>         struct unpack_trees_options *o = info->data;
>>         const struct name_entry *p = names;
>> +       unsigned recurse = 1;
> 
> "recurse" sent my mind off into questions about safety checks, base
> cases, etc., instead of just the simple "we don't want to read in
> directories corresponding to sparse entries".  I think this would be
> clearer either if the variable had the sparsity concept embedded in
> its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
> (!sparse_entry) instead of (recurse) below), or with a comment about
> why there are cases where you want to avoid recursion.

I can understand that. This callback is confusing because it _does_
recurse, but through a sequence of methods instead of actually calling
itself.

It would be better to say something like "unpack_subdirectories = 1"
and disable it when we are in a sparse directory.

>>
>>         /* Find first entry with a real name (we could use "mask" too) */
>>         while (!p->mode)
>> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                                         }
>>                                 }
>>                                 src[0] = ce;
>> +
>> +                               if (S_ISSPARSEDIR(ce->ce_mode))
>> +                                       recurse = 0;
> 
> Ah, the context here doesn't show it but this is in the "if (!cmp)"
> block, i.e. if we found a match for the sparse directory.  This makes
> sense, to me, _if_ we ignore the above question about sparse
> directories matching equal to anything and everything.

I believe that "anything and everything" concern has been resolved.

>> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                         }
>>                 }
>>
>> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>> +               if (recurse &&
>> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>                                              names, info) < 0)
>>                         return -1;
>>                 return mask;
> 
> Nice.  :-)
> 
> 
> I think your patch was mostly about the recurse stuff, which other
> than the name or a comment about it look good to me.  However, all the
> other preparatory small tweaks brought up a lot of questions or
> confusion for me.  I'm worried there might be a bug or two, though I
> may have just misunderstood some of the code bits.
 
This patch could probably be split up a little to make these things
clearer. Thanks for bringing up the tricky bits.

-Stolee

* Re: [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-20 23:21   ` Elijah Newren
@ 2021-04-21 13:47     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:47 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:21 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When we have sparse directory entries in the index, we want to compare
>> that directory against sparse-checkout patterns. Those pattern matching
>> algorithms are built expecting a file path, not a directory path. This
>> is especially important in the "cone mode" patterns which will match
>> files that exist within the "parent directories" as well as the
>> recursive directory matches.
>>
>> If path_matches_pattern_list() is given a directory, we can add a fake
>> filename ("-") to the directory and get the same results as before,
>> assuming we are in cone mode. Since sparse index requires cone mode
>> patterns, this is an acceptable assumption.
> 
> Makes sense; thanks for the good description.
> 
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  dir.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/dir.c b/dir.c
>> index 166238e79f52..57e22e605cec 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>>         strbuf_addch(&parent_pathname, '/');
>>         strbuf_add(&parent_pathname, pathname, pathlen);
>>
>> +       /* Directory requests should be added as if they are a file */
> 
> "added" or "matched"?  Also, the description seems a bit brief and
> likely to surprise; I'd at least want to expand "file" to "file within
> their given directory" but it might be nice to get some summarized
> version of the commit message or at least state that "-" is just a
> random simple name within the given directory.

I can improve this comment.

>> +       if (parent_pathname.len > 1 &&
> 
> Is this line...
> 
>> +           parent_pathname.buf[parent_pathname.len - 1] == '/')
> 
> to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
> ">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
> above ensure the condition is always met?
> 
> Or are we trying to avoid adding the "-" when parent_pathname is
> just a plain "/"?

I believe plain "/" is impossible. There needs to be a valid tree entry
before that first slash ("a/", for example). But that isn't super
important to the logic here and just adds confusion.

> 
>> +               strbuf_add(&parent_pathname, "-", 1);
>> +
> 
> Sorry for all the questions on such a tiny change.  It makes sense to
> me, I'm just curious whether it'll confuse future code readers.

Yes, let's avoid confusion by doing the simple thing and using "> 0".
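Here is a minimal sketch of the resulting logic. It uses a fixed-size buffer instead of a strbuf, and sketch_dir_as_file() is hypothetical, not the dir.c code. The path is prefixed with '/' as parent_pathname is, and a trailing '/' gets the placeholder basename "-". Because of that leading '/', the "> 0" length check always holds here; it only documents the bounds of the index expression.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: build "/<pathname>" in buf and, if it names a
 * directory (trailing '/'), append "-" so cone-mode matching sees a
 * file inside that directory.
 */
static void sketch_dir_as_file(char *buf, size_t size, const char *pathname)
{
	size_t len;

	snprintf(buf, size, "/%s", pathname);
	len = strlen(buf);

	/* room is needed for '-' at buf[len] and the NUL at buf[len + 1] */
	if (len > 0 && buf[len - 1] == '/' && len + 1 < size) {
		buf[len] = '-';
		buf[len + 1] = '\0';
	}
}
```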

Thanks,
-Stolee

* Re: [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-20 23:26   ` Elijah Newren
@ 2021-04-21 13:51     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:51 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:26 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Avoid the expensive computation by simplifying the output to only report
>> that a sparse checkout exists, without the percentage.
> 
> Makes sense.  The percentage wasn't critical, it was just a nice UI
> bonus.  The critical part is notifying about being in a sparse
> checkout.
> 
> It makes me wonder slightly if we'd want to remove the percentage for
> both modes just to keep them more similar.  I'll ask some folks for
> their thoughts/opinions.  Of course, that could always be tweaked
> later and doesn't necessarily need to go into your series.

I find the percentage helpful for users who are exploring the
sparse-checkout feature in their repositories. It's nice to know how
much time it is saving, because "percentage of files" frequently
translates to "percentage of time it takes to update the worktree".

I was sad to lose it here, but I don't see any way to keep it.
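The following sketch shows why the number is out of reach with a sparse index. It assumes a hypothetical entry struct in which sparse_dir marks a collapsed directory. The percentage is present files over tracked files. A sparse directory entry stands for an unknown number of tracked files, so the denominator cannot be counted without expanding it, which is exactly the cost being avoided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down index entry. */
struct pct_entry {
	bool skip_worktree;
	bool sparse_dir;	/* collapsed directory: file count unknown */
};

/*
 * Percentage of tracked files present in the worktree, or -1 when a
 * sparse directory entry makes the total unknowable without expansion.
 * An empty index is treated as fully present (a choice of this sketch).
 */
static int sketch_sparse_percentage(const struct pct_entry *e, size_t nr)
{
	size_t present = 0;

	for (size_t i = 0; i < nr; i++) {
		if (e[i].sparse_dir)
			return -1;
		if (!e[i].skip_worktree)
			present++;
	}
	return nr ? (int)(present * 100 / nr) : 100;
}
```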

Thanks,
-Stolee

* Re: [PATCH 05/10] status: use sparse-index throughout
  2021-04-21  0:44   ` Elijah Newren
@ 2021-04-21 13:55     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:55 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 8:44 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> By testing 'git -c core.fsmonitor= status -uno', we can check for the
>> simplest index operations that can be made sparse-aware. The necessary
>> implementation details are already integrated with sparse-checkout, so
>> modify command_requires_full_index to be zero for cmd_status().
>>
>> By running the debugger for 'git status -uno' after that change, we find
>> two instances of ensure_full_index() that were added for extra safety,
>> but can be removed without issue.
>>
>> In refresh_index(), we loop through the index entries. The
>> refresh_cache_ent() method copies the sparse directories into the
>> refreshed index without issue.
> 
> I do see the removal of a call to ensure_full_index() in
> refresh_index() that you mention in this paragraph in the patch below.
> 
> I'm confused, though; I would have thought we wanted to avoid a
> refresh_cache_ent() call.  Also, one of your previous patches added a
> 
>     if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>         continue;
> 
> check before the code ever gets to the refresh_cache_ent() call, so as
> far as I can tell, that function won't be called from refresh_entry()
> for sparse entries.  Maybe your commit message here is out-of-date?
> Or am I confused somehow?
> 
>> The loop within run_diff_files() skips things that are in stage 0 and
>> have skip-worktree enabled, so seems safe to disable ensure_full_index()
>> here.
> 
> Unlike the above, I don't see a removal of a ensure_full_index() call
> in run_diff_files() as claimed by this paragraph.  Has the commit
> message gotten out of date with refactorings you did while developing
> this series?

I greatly reduced the number of ensure_full_index() calls in the
previous topic (ds/sparse-index-protections) since first writing this
patch, so it is very likely to be out-of-date. Thanks for calling it out.

>> This allows some cases of 'git status' to no longer expand a sparse
>> index to a full one, giving the following performance improvements for
>> p2000-sparse-checkout-operations.sh:
>>
>> Test                                  HEAD~1           HEAD
>> -----------------------------------------------------------------------------
>> 2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
>> 2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>>
>> Note that since HEAD~1 was expanding the sparse index by parsing trees,
>> it was artificially slower than the full index case. Thus, the 98%
>> improvement is misleading, and instead we should celebrate the 0.37s to
>> 0.05s improvement of 82%. This is more indicative of the performance
>> gains we are expecting by using a sparse index.
> 
> 82%, very nice.  Was this with git.git as the test repository, or some
> other repo?  If it's git.git, then we'd actually expect a much bigger
> speedup for other repositories, as git.git is pretty small.

This test script takes the input repository (git.git in this case) and
creates a tree that contains that repository many times over, but only
four copies remain in the sparse-checkout definition. This creates the
big speedup, because of the enormous difference in index size.
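For readers checking the timing tables in this thread: the p2000 percentages match the relative change (after - before) / before * 100, which the small helper below computes. The function name is illustrative. For example, 2.43s to 0.04s gives about -98.4%, and 1.50s to 0.04s gives about -97.3%. By the same formula, the 0.37s to 0.05s comparison works out to roughly 86%.

```c
#include <assert.h>

/* Relative change in percent: (after - before) / before * 100. */
static double sketch_relative_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Printing the result with "%.1f" gives the one-decimal form used in the tables.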

As I am exploring commands such as 'merge' and 'rebase' I am finding
that this test setup is too expensive to cover those commands. I will
need to reduce the size of the test repository (by a factor of 4) and
that will reduce how impressive these results are while making the more
complicated commands testable in a reasonable amount of time.

Thanks,
-Stolee

* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-21  0:53     ` Elijah Newren
@ 2021-04-21 14:03       ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 14:03 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 8:53 PM, Elijah Newren wrote:
> One more thing:
> 
> On Tue, Apr 20, 2021 at 5:52 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>> Test                                  HEAD~1            HEAD
>>> ------------------------------------------------------------------------------
>>> 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
>>> 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%
>>
>> Um, I'm confused.  In the previous patch you claimed the following speedups:
>>
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>>
>> I don't understand why the "Before" for this patch claims 1.50 as the
>> initial speed, if the "After" for the last patch was 0.04.  Should the
>> previous commit message have instead claimed:
>>
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%
...
>> Oh!  So, the previous patch was testing without enumerating untracked
>> files (because it did those slowly), whereas this one enumerates
>> untracked files and is still able to achieve the same performance?
>> This wasn't very clear from the commit message.  Maybe I'm just bad at
>> reading, but perhaps the commit message could be tweaked slightly to
>> make this more clear?
> 
> Why is the subject of this commit "dir: use expand_to_path() ..." if
> it only touches t1092-sparse-checkout-compatibility.sh?
 
You are right to be confused. This is another patch that was simplified
by refactors in the protections branch. It should just be squashed into
the previous one.

For context: an earlier version inserted ensure_full_index() before
every call to index_name_pos() and then this patch swapped that for
a call to expand_to_path(). The change in the protections branch was
to have index_name_pos() call expand_to_path() itself, preventing the
need for these ensure_full_index() calls.
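The following toy sketch shows that design: the lookup itself triggers expansion, so callers do not have to. The struct and both functions are hypothetical stand-ins for index_name_pos() and expand_to_path(). The stand-in expander only counts how often a real expansion would have been needed, and sparse directory entries are represented as paths ending in '/'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy index: sorted paths; sparse directories end in '/'. */
struct sketch_index {
	const char *paths[8];
	int nr;
	int expansions;
};

/* Stand-in for expand_to_path(): records when path is in a sparse dir. */
static void sketch_expand_to_path(struct sketch_index *istate, const char *path)
{
	for (int i = 0; i < istate->nr; i++) {
		const char *p = istate->paths[i];
		size_t len = strlen(p);

		if (len && p[len - 1] == '/' && !strncmp(p, path, len)) {
			istate->expansions++;
			return;
		}
	}
}

/* Lookup that handles expansion itself; returns the position or -1. */
static int sketch_index_name_pos(struct sketch_index *istate, const char *path)
{
	sketch_expand_to_path(istate, path);
	for (int i = 0; i < istate->nr; i++)
		if (!strcmp(istate->paths[i], path))
			return i;
	return -1;
}
```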

Thanks,
-Stolee

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-20 21:52   ` Elijah Newren
@ 2021-04-21 15:14   ` Matheus Tavares Bernardino
  2021-04-23 20:12     ` Derrick Stolee
  1 sibling, 1 reply; 215+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 15:14 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

Hi, Stolee

You already said you will make changes in this test to make sure
git-add's sparse warning is kept on a sparse index (BTW thanks for
that :), but I just wanted to give a couple suggestions that came to
my mind while reading the patch.

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..6598c12a2069 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&
> +
> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&
> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +
> +       test_must_fail git -C sparse-checkout add folder1/a &&
> +       test_must_fail git -C sparse-index add folder1/a &&

To make sure the output is the same, could we collapse these two lines into:

test_sparse_match test_must_fail git add folder1/a ?

And additionally, I think we could repeat this check with `add
--refresh` and also after removing `folder1/a`. The reason I'm saying
this is because the check currently succeeds when `folder1/a` is in
the working tree (maybe because `fill_directory()` ends up expanding
the sparse index in this case?), but not under the two other
circumstances I mentioned (as we've discussed in [1]).

[1]: https://lore.kernel.org/git/CAHd-oW7vCKC-XRM=rX37+jQn_XDzjtar9nNHKQ-4OHSZ=2=KFA@mail.gmail.com/

> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_sparse_match git status --porcelain=v2 &&

Hmm, shouldn't this be `test_all_match`? IIUC, we've reset
`folder1/a` on the full repo to make sure the status report is the
same across all repos, right?

> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 13:41     ` Derrick Stolee
@ 2021-04-21 16:11       ` Elijah Newren
  2021-04-22  2:24         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-21 16:11 UTC (permalink / raw)
  To: Derrick Stolee, Matheus Tavares Bernardino
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

// Adding Matheus to cc due to the ignore_skip_worktree bit, given his
experience and expertise with the checkout and unpack-trees code.

On Wed, Apr 21, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> diff --git a/dir.h b/dir.h
> >> index 51cb0e217247..9d6666f520f3 100644
> >> --- a/dir.h
> >> +++ b/dir.h
> >> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
> >>                                 char *seen)
> >>  {
> >>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> >> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> >> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> >
> > I'm confused why this change would be needed, or why it'd semantically
> > be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
> > S_ISDIR() is true (and perhaps even vice versa?).
> >
> > By chance, was this a leftover from your early RFC changes from a few
> > series ago when you had an entirely different mode for sparse
> > directory entries?
>
> I will double-check on this with additional testing and debugging.
> Your comments below make it clear that this patch would benefit from
> some additional splitting.
>
> >>  }
> >>
> >>  static inline int dir_path_match(struct index_state *istate,
> >> diff --git a/preload-index.c b/preload-index.c
> >> index e5529a586366..35e67057ca9b 100644
> >> --- a/preload-index.c
> >> +++ b/preload-index.c
> >> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
> >>                         continue;
> >>                 if (S_ISGITLINK(ce->ce_mode))
> >>                         continue;
> >> +               if (S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >>                 if (ce_uptodate(ce))
> >>                         continue;
> >>                 if (ce_skip_worktree(ce))
> >
> > Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
> >  Is this a duplicate check?  If so, is it still desirable for
> > future-proofing or code clarity, or is it strictly redundant?
>
> You're right, we could skip this one because the ce_skip_worktree(ce)
> is enough to cover this case. I think I created this one because I was
> auditing uses of S_ISGITLINK().
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> This is probably better served with a statement like this earlier in
> the method:
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> It seems like ignoring the skip worktree bits is a rare occasion and
> it will be worth expanding the index for that case.

Maybe...I read the commit message that introduced the behavior and
it's not very convincing to me that SKIP_WORKTREE should be ignored
(it's also not that clear to me what the conditions are; is it just
update-index --really-refresh?); it may be worth double checking on
that assumption first, especially given how many other bugs existed
with skip_worktree stuff for years.  If it's necessary, then I agree
that your extra if-check makes sense.

In particular, I think it'd be really dumb for "update-index
--really-refresh" to read in and populate a huge subdirectory just to
stat files that don't exist because they are in directories that don't
exist.  And I think there's a pretty good argument to not update stat
information for skip_worktree entries in non-sparse-index cases even
in the presence of that flag, especially given Matheus' other recent
changes in this area (the emails just before we got to the point of
discussing SKIP_WORKTREE and racy clean entries...speaking of which,
it might be worthwhile pinging Matheus for opinions on this issue
too.)

> >>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
> >>                         filtered = 1;
> >>
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index dddf106d5bd4..9a62e823928a 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
> >>  {
> >>         ce->ce_flags |= CE_UNPACKED;
> >>
> >> +       /*
> >> +        * If this is a sparse directory, don't advance cache_bottom.
> >> +        * That will be advanced later using the cache-tree data.
> >> +        */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               return;
> >> +
> >
> > I don't understand cache_bottom stuff; we might want to get Junio to
> > look over it.  Or maybe I just need to dig a bit further and attempt
> > to understand it.
>
> I remember looking very carefully at this when I created this (and found
> it worth a comment) but I don't recall enough off the top of my head.
> This is worth splitting out with a careful message, which will force me
> to reexamine the cache_bottom member.
>
> >>         if (o->cache_bottom < o->src_index->cache_nr &&
> >>             o->src_index->cache[o->cache_bottom] == ce) {
> >>                 int bottom = o->cache_bottom;
> >> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
> >>         ce_len -= pathlen;
> >>         ce_name = ce->name + pathlen;
> >>
> >> +       /* remove directory separator if a sparse directory entry */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               ce_len--;
> >>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> >
> > Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?
> >
> > Note the following sort order:
> >    foo
> >    foo.txt
> >    foo/
> >    foo/bar
> >
> > You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
> > but df_name_compare() exists to make "foo" sort exactly where "foo/"
> > would when "foo" is a directory.  Will your df_name_compare() call
> > here result in foo.txt being placed after all the "foo/<subpath>"
> > entries in the index and perhaps cause other problems down the line?
> > (Are there issues, e.g. with cache-trees getting wrong ordering from
> > this, or even writing out indexes or tree objects with the wrong
> > ordering?  I've written out trees to disk with wrong ordering before
> > and git usually survives but gets really confused with diffs.)
> >
> > Since at least one caller of compare_entry() takes the return result
> > and does a "if (cmp < 0)", this order is going to matter in some
> > cases.  Perhaps we need some testcases where there is a sparse
> > directory entry named "foo/" and a file recorded in some relevant tree
> > with the name "foo.txt" to be able to trigger these lines of code?
>
> I will do some testing to find out why removing the separator here was
> necessary or valuable.

I think you removed the separator because df_name_compare() assumes it
gets a regular filename (i.e. no trailing '/') and manually adds one
based on mode for directories.  You were probably worried about what
amounts to a nonsensical double '/', but df_name_compare() wouldn't
actually get to that point unless someone somehow recorded a path
within a git tree object that ended with a trailing '/'.  I'd rather
not have to worry about the double '/' and explain why it isn't
possible (or wonder about whether git trees with trailing '/'
characters could be recorded on some OS), so I think the trimming of
the separator as you did makes sense.

What doesn't make sense to me is that the code just below had a
hardcoded S_IFREG that it passed to df_name_compare, based on "this is
a cache entry, and index entries are _always_ regular files".  You
didn't change that, even though it's now a false assumption.
Symlinks and regular files should be passed as S_IFREG there.  I'm
not sure what should be passed for submodules (though the fact that
it's been using S_IFREG for years suggests that may be the mode we
want for them, so we can't simply pass ce->ce_mode), and I'm pretty
sure sparse directory entries should be passed as S_IFDIR in order to
get the sorting right unless you stop stripping the trailing '/'
character.
I'm not exactly sure where the sorting for do_compare_entry() affects
the code later, but I tried to trace it out a little in my comments
above in order to guide some testing.
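
To make the ordering concern concrete, here is a minimal sketch of a
tree-order comparator in which a directory name compares as if it
ended in '/'.  The name tree_order_cmp() is hypothetical and only
approximates the semantics discussed above; it is not git's
df_name_compare().  It shows that a trimmed sparse directory "foo"
passed as a regular file sorts before "foo.txt", while the same name
passed as a directory sorts after it, where "foo/" belongs.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch: compare two names in git tree order, where a
 * directory sorts as though its name carried a trailing '/'.
 * Names are given with explicit lengths and no trailing '/'.
 */
static int tree_order_cmp(const char *a, size_t alen, int a_is_dir,
			  const char *b, size_t blen, int b_is_dir)
{
	size_t len = alen < blen ? alen : blen;
	int cmp = memcmp(a, b, len);
	unsigned char ca, cb;

	if (cmp)
		return cmp;

	/* Past the end of a name, use '/' for directories, NUL for files. */
	ca = len < alen ? (unsigned char)a[len] : (a_is_dir ? '/' : '\0');
	cb = len < blen ? (unsigned char)b[len] : (b_is_dir ? '/' : '\0');
	return (int)ca - (int)cb;
}
```

For example, tree_order_cmp("foo", 3, 0, "foo.txt", 7, 0) is negative
(the trimmed entry lands before "foo.txt"), while passing a_is_dir = 1
makes the result positive because '/' sorts after '.'.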

> >>  }
> >>
> >> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
> >>         if (cmp)
> >>                 return cmp;
> >>
> >> +       /* If ce is a sparse directory, then allow equality here. */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               return 0;
> >> +
> >
> > Um...so a sparse directory compares equal to _anything_ at all?  I'm
> > really confused why this would be desirable.  Am I missing something
> > here?
>
> The context that is removed from the patch is that "cmp" is the
> response from do_compare_entry(), which does a length-limited comparison.
> If cmp is non-zero, then we've already returned the difference.
>
> The rest of the method is checking if the 'info' input is actually a
> parent directory of the _path_ given at this cache entry.

Ah, thanks for the explanation.  So the only way we get here with
cmp==0 when we're dealing with a sparse directory entry is if we found
a directory by the same name....

> >>         /*
> >>          * Even if the beginning compared identically, the ce should
> >>          * compare as bigger than a directory leading up to it!
>
> The line after this is:
>
>         return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n));
>
> This comparison is saying "these paths match up to the directory specified
> by info and n, but we need 'ce' to be a file within that directory." But
> in the case of a sparse directory entry, we can skip this comparison.

Isn't this "must skip" rather than "can skip"?  If we're considering
the ce path "foo/bar/", then the traverse_path would be "foo/bar" and
we'd have:
    ce_namelen(ce) == 1 + traverse_path_len(info, tree_entry_len(n))
so this would return 1 for the comparison, causing them to be treated
as non-equal even though they are what we consider equal entries.

In any event, it seems like this new check could use a better comment
than "then allow equality here".
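
The length arithmetic above can be checked with a small sketch.  Here
tail_compare() is a hypothetical stand-in for the last step of
compare_entry(): it takes the entry's full path (sparse directories
keep their trailing '/') and the length of the traversal path without
a trailing '/'.  Without the early return, "foo/bar/" compares as
bigger than "foo/bar", which is why the check is required rather than
merely allowed.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the tail of compare_entry(), reached only
 * after the length-limited prefix comparison has returned 0.
 */
static int tail_compare(const char *ce_path, int ce_is_sparse_dir,
			size_t traverse_path_len)
{
	/*
	 * A sparse directory entry such as "foo/bar/" names the traversed
	 * directory "foo/bar" itself, so it must compare equal.
	 */
	if (ce_is_sparse_dir)
		return 0;

	/* A file under the directory must compare as bigger than it. */
	return strlen(ce_path) > traverse_path_len;
}
```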

> >> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> >>         struct unpack_trees_options *o = info->data;
> >>         const struct name_entry *p = names;
> >> +       unsigned recurse = 1;
> >
> > "recurse" sent my mind off into questions about safety checks, base
> > cases, etc., instead of just the simple "we don't want to read in
> > directories corresponding to sparse entries".  I think this would be
> > clearer either if the variable had the sparsity concept embedded in
> > its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
> > (!sparse_entry) instead of (recurse) below), or with a comment about
> > why there are cases where you want to avoid recursion.
>
> I can understand that. This callback is confusing because it _does_
> recurse, but through a sequence of methods instead of actually calling
> itself.
>
> It would be better to say something like "unpack_subdirectories = 1"
> and disable it when we are in a sparse directory.

I like that name.
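
As a rough illustration of the renamed flag, the following toy walk
counts visited entries and declines to descend into entries marked as
sparse directories.  The struct toy_entry type and toy_walk() are
hypothetical; in unpack-trees.c the recursion goes through
traverse_trees_recursive() rather than a direct self-call, but the
flag plays the same role of gating that recursion.

```c
#include <assert.h>
#include <stddef.h>

struct toy_entry {
	const char *name;                  /* NULL terminates a list */
	int is_sparse_dir;                 /* collapsed sparse directory */
	const struct toy_entry *children;  /* NULL for files */
};

/* Count entries visited; sparse directories are visited but not entered. */
static int toy_walk(const struct toy_entry *e)
{
	int visited = 0;

	for (; e && e->name; e++) {
		int unpack_subdirectories = 1;

		visited++;
		if (e->is_sparse_dir)
			unpack_subdirectories = 0;
		if (unpack_subdirectories && e->children)
			visited += toy_walk(e->children);
	}
	return visited;
}
```

With one in-cone directory and one sparse directory that each hold two
files, the walk visits four entries: both directories plus the two
files of the in-cone one.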

>
> >>
> >>         /* Find first entry with a real name (we could use "mask" too) */
> >>         while (!p->mode)
> >> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>                                         }
> >>                                 }
> >>                                 src[0] = ce;
> >> +
> >> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> >> +                                       recurse = 0;
> >
> > Ah, the context here doesn't show it but this is in the "if (!cmp)"
> > block, i.e. if we found a match for the sparse directory.  This makes
> > sense to me, _if_ we ignore the above question about sparse
> > directories matching equal to anything and everything.
>
> I believe that "anything and everything" concern has been resolved.

Yes, if we just improve the "then allow equality here" comment.

> >> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>                         }
> >>                 }
> >>
> >> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >> +               if (recurse &&
> >> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >>                                              names, info) < 0)
> >>                         return -1;
> >>                 return mask;
> >
> > Nice.  :-)
> >
> >
> > I think your patch was mostly about the recurse stuff, which, other
> > than the name or a comment about it, looks good to me.  However, all the
> > other preparatory small tweaks brought up a lot of questions or
> > confusion for me.  I'm worried there might be a bug or two, though I
> > may have just misunderstood some of the code bits.
>
> This patch could probably be split up a little to make these things
> clearer. Thanks for bringing up the tricky bits.
>
> -Stolee

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-20 23:00   ` Elijah Newren
  2021-04-21 13:41     ` Derrick Stolee
@ 2021-04-21 17:27     ` Derrick Stolee
  2021-04-21 18:55       ` Matheus Tavares Bernardino
  2021-04-21 18:56       ` Elijah Newren
  1 sibling, 2 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-21 17:27 UTC
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On 4/20/2021 7:00 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> diff --git a/read-cache.c b/read-cache.c
>> index 29ffa9ac5db9..6308234b4838 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>>                         continue;
>>
>> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>> +
> 
> I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> !ignore_skip_worktree and why it'd be desirable to refresh
> skip-worktree entries.  However, this is tangential to your patch and
> has apparently been around since 2009 (in particular, from 56cac48c35
> ("ie_match_stat(): do not ignore skip-worktree bit with
> CE_MATCH_IGNORE_VALID", 2009-12-14)).

I did some more digging on this part here. There has been movement in
this space!

The thing that triggers this ignore_skip_worktree variable inside
refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
introduced recently and is set only by builtin/add.c:refresh(), by
Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08).

This means that we can (for now) keep the behavior the same by adding

	if (ignore_skip_worktree)
		ensure_full_index(istate);

before the loop. This prevents the expansion during 'git status', but
requires modification before we are ready for 'git add' to work
correctly. Specifically, 'git add' currently warns only when adding
something that exactly matches a tracked file with SKIP_WORKTREE. It
does _not_ warn when adding something that is untracked but would have
the SKIP_WORKTREE bit if it was tracked. We will need to add that
extra warning if we want to avoid expanding during 'git add'.

Alternatively, we can decide to change the behavior here and send an
error() and return failure if they try to add something that would
live within a sparse-directory entry. I will think more on this and
have a good answer before v2 is ready.

Thanks,
-Stolee

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 17:27     ` Derrick Stolee
@ 2021-04-21 18:55       ` Matheus Tavares Bernardino
  2021-04-21 19:10         ` Elijah Newren
  2021-04-21 18:56       ` Elijah Newren
  1 sibling, 1 reply; 215+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 18:55 UTC
  To: Derrick Stolee
  Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> I did some more digging on this part here. There has been movement in
> this space!
>
> The thing that triggers this ignore_skip_worktree variable inside
> refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> introduced recently and is set only by builtin/add.c:refresh(), by
> Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> 2021-04-08).
>
> This means that we can (for now) keep the behavior the same by adding
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> before the loop.

Hmm, I don't think we need to expand the index here.
ignore_skip_worktree makes the loop below ignore entries with the
skip_worktree bit set. Since sparse dirs also have this bit set, we
will already get the behavior we want :)

However, I think we will need to expand the index at
`find_pathspecs_matching_against_index()` in order to find and warn
about the pathspecs that have matches among skip_worktree entries...
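
Matheus's first point can be sketched with a toy loop.  The flag bits
and struct toy_ce are hypothetical stand-ins for the skip-worktree bit
and S_ISSPARSEDIR(); the sketch assumes, as stated above, that every
sparse directory entry also carries the skip-worktree bit.  Under that
assumption, when ignore_skip_worktree is set the first check already
skips sparse directories, so there is no need to expand the index
before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the real cache_entry state. */
#define TOY_SKIP_WORKTREE 0x1
#define TOY_SPARSE_DIR    0x2

struct toy_ce {
	const char *name;
	unsigned flags;
};

/* Return how many entries reach the rest of the loop body. */
static size_t toy_refresh(const struct toy_ce *ces, size_t nr,
			  int ignore_skip_worktree)
{
	size_t processed = 0;

	for (size_t i = 0; i < nr; i++) {
		const struct toy_ce *ce = &ces[i];

		if (ignore_skip_worktree && (ce->flags & TOY_SKIP_WORKTREE))
			continue;
		if (ce->flags & TOY_SPARSE_DIR)
			continue; /* the check added by the patch */
		processed++;
	}
	return processed;
}
```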

> This prevents the expansion during 'git status', but
> requires modification before we are ready for 'git add' to work
> correctly. Specifically, 'git add' currently warns only when adding
> something that exactly matches a tracked file with SKIP_WORKTREE. It
> does _not_ warn when adding something that is untracked but would have
> the SKIP_WORKTREE bit if it was tracked. We will need to add that
> extra warning if we want to avoid expanding during 'git add'.

Hmm, I see :( I was trying to think if it would be possible to do the
pathspec matching (for the warning) without having to expand the
index, but then there are the untracked files... If the user gives
"a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
is a skip_worktree entry or an untracked file without expanding the
index...

> Alternatively, we can decide to change the behavior here and send an
> error() and return failure if they try to add something that would
> live within a sparse-directory entry.

I think this behavior would be tricky to replicate on non-sparse-index
sparse-checkouts, if we were to do that. We would have to pathspec
match each untracked file against the sparsity patterns, perhaps?

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 17:27     ` Derrick Stolee
  2021-04-21 18:55       ` Matheus Tavares Bernardino
@ 2021-04-21 18:56       ` Elijah Newren
  2021-04-23 20:16         ` Derrick Stolee
  1 sibling, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-21 18:56 UTC
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On Wed, Apr 21, 2021 at 10:27 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> I did some more digging on this part here. There has been movement in
> this space!
>
> The thing that triggers this ignore_skip_worktree variable inside
> refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> introduced recently and is set only by builtin/add.c:refresh(), by
> Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> 2021-04-08).
>
> This means that we can (for now) keep the behavior the same by adding
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> before the loop. This prevents the expansion during 'git status', but
> requires modification before we are ready for 'git add' to work
> correctly. Specifically, 'git add' currently warns only when adding
> something that exactly matches a tracked file with SKIP_WORKTREE. It
> does _not_ warn when adding something that is untracked but would have
> the SKIP_WORKTREE bit if it was tracked. We will need to add that
> extra warning if we want to avoid expanding during 'git add'.
>
> Alternatively, we can decide to change the behavior here and send an
> error() and return failure if they try to add something that would
> live within a sparse-directory entry. I will think more on this and
> have a good answer before v2 is ready.

See my comments on 01/10; users are already getting surprised by "git
add" today, and this has been going on for months (though not super
frequently).  When they try to "git add" an untracked path that would
not match any path specifications in $GIT_DIR/info/sparse-checkout,
the fact that "git add" doesn't error out (or at the very least give a
warning) causes _subsequent_ commands to surprise the user with their
behavior; the fact that it is some later command that does weird stuff
(removing the file from the working tree) makes it harder for them to
try to understand and make sense of.  So, I'd say we do want to change
the behavior here...and not just for sparse-indexes but
sparse-checkouts in general.

As for how this affects the code, I think I'm behind both you and
Matheus on understanding here, but I'm starting to think it was a good
idea for me to spout my offhand comment on what looked like a funny
code smell that I thought was unrelated to your patch.  Sounds like it
is causing some good digging...I'll try to read up more on the results
when you send v2.  :-)

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 18:55       ` Matheus Tavares Bernardino
@ 2021-04-21 19:10         ` Elijah Newren
  2021-04-21 19:51           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-04-21 19:10 UTC
  To: Matheus Tavares Bernardino
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 11:55 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> >
> > >> diff --git a/read-cache.c b/read-cache.c
> > >> index 29ffa9ac5db9..6308234b4838 100644
> > >> --- a/read-cache.c
> > >> +++ b/read-cache.c
> > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > >>                         continue;
> > >>
> > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > >> +                       continue;
> > >> +
> > >
> > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > skip-worktree entries.  However, this is tangential to your patch and
> > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > CE_MATCH_IGNORE_VALID", 2009-12-14)).
> >
> > I did some more digging on this part here. There has been movement in
> > this space!
> >
> > The thing that triggers this ignore_skip_worktree variable inside
> > refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> > introduced recently and is set only by builtin/add.c:refresh(), by
> > Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> > 2021-04-08).
> >
> > This means that we can (for now) keep the behavior the same by adding
> >
> >         if (ignore_skip_worktree)
> >                 ensure_full_index(istate);
> >
> > before the loop.
>
> Hmm, I don't think we need to expand the index here.
> ignore_skip_worktree makes the loop below ignore entries with the
> skip_worktree bit set. Since sparse dirs also have this bit set, we
> will already get the behavior we want :)
>
> However, I think we will need to expand the index at
> `find_pathspecs_matching_against_index()` in order to find and warn
> about the pathspecs that have matches among skip_worktree entries...
>
> > This prevents the expansion during 'git status', but
> > requires modification before we are ready for 'git add' to work
> > correctly. Specifically, 'git add' currently warns only when adding
> > something that exactly matches a tracked file with SKIP_WORKTREE. It
> > does _not_ warn when adding something that is untracked but would have
> > the SKIP_WORKTREE bit if it was tracked. We will need to add that
> > extra warning if we want to avoid expanding during 'git add'.
>
> Hmm, I see :( I was trying to think if it would be possible to do the
> pathspec matching (for the warning) without having to expand the
> index, but then there are the untracked files... If the user gives
> "a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
> is a skip_worktree entry or an untracked file without expanding the
> index...

I thought Stolee's series added something that could allow us to check
that e.g. "a/b/c" corresponded to an entry under the sparse directory
"a/b/" and thus is a would-be-sparse entry.  Can we use that?
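
A linear-scan sketch of the check described above, assuming sparse
directory entries are stored with a trailing '/'.  The helper name is
hypothetical, and a real implementation would exploit the sorted index
(for example with a binary search) instead of scanning.  Note that it
only says whether a path would live under a sparse directory, not
whether that path is tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the sparse directory entry
 * (stored with its trailing '/') that contains path, or -1 if none.
 */
static int find_containing_sparse_dir(const char *const *sparse_dirs,
				      size_t nr, const char *path)
{
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(sparse_dirs[i]);

		/* The trailing '/' keeps "a/bc" from matching "a/b/". */
		if (!strncmp(path, sparse_dirs[i], len))
			return (int)i;
	}
	return -1;
}
```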

> > Alternatively, we can decide to change the behavior here and send an
> > error() and return failure if they try to add something that would
> > live within a sparse-directory entry.
>
> I think this behavior would be tricky to replicate on non-sparse-index
> sparse-checkouts, if we were to do that. We would have to pathspec
> match each untracked file against the sparsity patterns, perhaps?

By way of analogy, don't we have to pay the cost of pathspec matching
each tree entry against the sparsity patterns when doing a checkout
before putting those entries into the index?  Since "git add" is
trying to put new entries into the index, doesn't it make sense for it
to pay the same cost for the untracked paths it is about to place
there?

Sure, that can be expensive for non-cone mode, but that's the price
users pay for using sparse-checkouts and not using cone mode, and they
pay it every time they try to update the index with some new checkout.
I think "git add" should be treated similarly as another way to update
the index -- especially since users will get confused (and have gotten
confused) by subsequent commands if we don't do those checks.
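
For cone mode the proposed check stays cheap, since membership reduces
to prefix tests on directories.  The sketch below assumes the
documented cone-mode rules (files at the root are included, everything
below a recursive directory is included, and files directly inside any
ancestor of a recursive directory are included).  The in_cone() helper
and its input format, recursive directories listed without a trailing
'/', are hypothetical and not git's pattern-list code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Is path somewhere below directory dir? */
static int dir_is_prefix(const char *dir, const char *path)
{
	size_t len = strlen(dir);

	return !strncmp(path, dir, len) && path[len] == '/';
}

/* Hypothetical: return 1 if the file path would be inside the cone. */
static int in_cone(const char *const *recursive, size_t nr, const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t parent_len;

	if (!slash)
		return 1; /* files at the root are always included */
	parent_len = (size_t)(slash - path);

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(recursive[i]);

		/* path lies below a recursive directory */
		if (dir_is_prefix(recursive[i], path))
			return 1;
		/* path's parent is an ancestor of a recursive directory */
		if (parent_len < len &&
		    !strncmp(recursive[i], path, parent_len) &&
		    recursive[i][parent_len] == '/')
			return 1;
	}
	return 0;
}
```

With "a/b/c" as the only recursive directory, "a/f.txt" and
"a/b/g.txt" are in the cone, while "a/x/y.txt" and "a/b/cd/e.txt" are
not, so "git add" could warn about the latter two.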

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 19:10         ` Elijah Newren
@ 2021-04-21 19:51           ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 215+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 19:51 UTC
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 4:11 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Apr 21, 2021 at 11:55 AM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
> > >
> > > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > > <gitgitgadget@gmail.com> wrote:
> > >
> > > >> diff --git a/read-cache.c b/read-cache.c
> > > >> index 29ffa9ac5db9..6308234b4838 100644
> > > >> --- a/read-cache.c
> > > >> +++ b/read-cache.c
> > > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > > >>                         continue;
> > > >>
> > > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > > >> +                       continue;
> > > >> +
> > > >
> > > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > > skip-worktree entries.  However, this is tangential to your patch and
> > > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > > CE_MATCH_IGNORE_VALID", 2009-12-14)).
> > >
> > > I did some more digging on this part here. There has been movement in
> > > this space!
> > >
> > > The thing that triggers this ignore_skip_worktree variable inside
> > > refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> > > introduced recently and is set only by builtin/add.c:refresh(), by
> > > Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> > > 2021-04-08).
> > >
> > > This means that we can (for now) keep the behavior the same by adding
> > >
> > >         if (ignore_skip_worktree)
> > >                 ensure_full_index(istate);
> > >
> > > before the loop.
> >
> > Hmm, I don't think we need to expand the index here.
> > ignore_skip_worktree makes the loop below ignore entries with the
> > skip_worktree bit set. Since sparse dirs also have this bit set, we
> > will already get the behavior we want :)
> >
> > However, I think we will need to expand the index at
> > `find_pathspecs_matching_against_index()` in order to find and warn
> > about the pathspecs that have matches among skip_worktree entries...
> >
> > > This prevents the expansion during 'git status', but
> > > requires modification before we are ready for 'git add' to work
> > > correctly. Specifically, 'git add' currently warns only when adding
> > > something that exactly matches a tracked file with SKIP_WORKTREE. It
> > > does _not_ warn when adding something that is untracked but would have
> > > the SKIP_WORKTREE bit if it was tracked. We will need to add that
> > > extra warning if we want to avoid expanding during 'git add'.
> >
> > Hmm, I see :( I was trying to think if it would be possible to do the
> > pathspec matching (for the warning) without having to expand the
> > index, but then there are the untracked files... If the user gives
> > "a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
> > is a skip_worktree entry or an untracked file without expanding the
> > index...
>
> I thought Stolee's series added something that could allow us to check
> that e.g. "a/b/c" corresponded to an entry under the sparse directory
> "a/b/" and thus is a would-be-sparse entry.  Can we use that?

Yes, you mean for the warning on untracked paths that would become
sparse entries, right? The problem I was considering there was the
warning on tracked entries only, in which case I'm not sure if it
would help.

> > > Alternatively, we can decide to change the behavior here and send an
> > > error() and return failure if they try to add something that would
> > > live within a sparse-directory entry.
> >
> > I think this behavior would be tricky to replicate on non-sparse-index
> > sparse-checkouts, if we were to do that. We would have to pathspec
> > match each untracked file against the sparsity patterns, perhaps?
>
> By way of analogy, don't we have to pay the cost of pathspec matching
> each tree entry against the sparsity patterns when doing a checkout
> before putting those entries into the index?  Since "git add" is
> trying to put new entries into the index, doesn't it make sense for it
> to pay the same cost for the untracked paths it is about to place
> there?
>
> Sure, that can be expensive for non-cone mode, but that's the price
> users pay for using sparse-checkouts and not using cone mode, and they
> pay it every time they try to update the index with some new checkout.
> I think "git add" should be treated similarly as another way to update
> the index -- especially since users will get confused (and have gotten
> confused) by subsequent commands if we don't do those checks.

Good point. Yeah, that all makes sense :)

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 16:11       ` Elijah Newren
@ 2021-04-22  2:24         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 215+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-22  2:24 UTC
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 1:11 PM Elijah Newren <newren@gmail.com> wrote:
>
> // Adding Matheus to cc due to the ignore_skip_worktree bit, given his
> experience and expertise with the checkout and unpack-trees code.
>
> On Wed, Apr 21, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > >>
> > >> diff --git a/read-cache.c b/read-cache.c
> > >> index 29ffa9ac5db9..6308234b4838 100644
> > >> --- a/read-cache.c
> > >> +++ b/read-cache.c
> > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > >>                         continue;
> > >>
> > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > >> +                       continue;
> > >> +
> > >
> > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > skip-worktree entries.

The skip-worktree entries are not really refreshed in refresh_index(),
even when !ignore_skip_worktree (which is the default case; i.e.
without the REFRESH_IGNORE_SKIP_WORKTREE flag).

This flag (which is currently only used by `git add --refresh`'s code
at `builtin/add.c:refresh()`), just makes refresh_index() skip the
following operations on skip-worktree entries: pathspec matching,
marking the matches on `seen`, checking/warning if unmerged, and
marking the entry as up-to-date (i.e. with the in-memory CE_UPTODATE
bit).

I added this flag in mt/add-rm-in-sparse-checkout and changed
`builtin/add.c:refresh()` to use it mainly because we needed a `seen`
array with only matches from non-skip-worktree entries so that we
could later decide when to emit the warning. (In fact, the original
implementation of the flag only controlled whether sparse matches
would be marked on `seen` or not [1])

[1]: https://lore.kernel.org/git/d65b214dd1d83a2e8710a9bbf98477c1929f0d5e.1614138107.git.matheus.bernardino@usp.br/

Perhaps we could alternatively make refresh_index() skip the
previously mentioned operations on all skip-worktree entries
*unconditionally*. I.e. having, early in the loop:

if (ce_skip_worktree(ce))
        continue;

But I'm not familiar enough with CE_UPTODATE and how it's used in
different parts of the code base, so I didn't want to risk introducing
any bugs at refresh_index() callers that might want/expect the
function to set the CE_UPTODATE bit on the skip-worktree entries. The
case of `git add --refresh` was much narrower and easier to analyze,
and that's what we were interested in for the warning. That's why I
only changed the behavior there :)
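
The role of the `seen` array described above can be sketched as
follows.  Pathspecs are reduced to exact path matches, and struct
toy_ce and pathspecs_needing_warning() are hypothetical; the point is
that only non-skip-worktree matches count as "seen", so a pathspec
that matched only skip-worktree entries is exactly the one that should
trigger the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_ce {
	const char *name;
	int skip_worktree;
};

/*
 * Hypothetical: set warn[j] when specs[j] matched skip-worktree entries
 * only, and return how many pathspecs need the warning.
 */
static size_t pathspecs_needing_warning(const struct toy_ce *ces, size_t nr,
					const char *const *specs, size_t nspecs,
					int *warn)
{
	size_t count = 0;

	for (size_t j = 0; j < nspecs; j++) {
		int seen = 0, sparse_match = 0;

		for (size_t i = 0; i < nr; i++) {
			if (strcmp(ces[i].name, specs[j]))
				continue;
			if (ces[i].skip_worktree)
				sparse_match = 1; /* not recorded in seen */
			else
				seen = 1;
		}
		warn[j] = !seen && sparse_match;
		count += warn[j];
	}
	return count;
}
```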

> > > However, this is tangential to your patch and
> > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > CE_MATCH_IGNORE_VALID", 2009-12-14)).

Note that the `CE_MATCH_IGNORE_SKIP_WORKTREE` added in this patch does
control if refresh_cache_ent() will refresh skip-worktree entries, but
refresh_index() always calls this function *without* this flag.

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-21 15:14   ` Matheus Tavares Bernardino
@ 2021-04-23 20:12     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-23 20:12 UTC
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

On 4/21/2021 11:14 AM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> You already said you will make changes in this test to make sure
> git-add's sparse warning is kept on a sparse index (BTW thanks for
> that :), but I just wanted to give a couple of suggestions that came to
> my mind while reading the patch.

I appreciate the suggestions! More tests always help me from
making mistakes, and you are definitely more of a 'git add'
expert than me.
 
>> +       test_must_fail git -C sparse-checkout add folder1/a &&
>> +       test_must_fail git -C sparse-index add folder1/a &&
> 
> To make sure the output is the same, could we collapse these two lines into:
> 
> test_sparse_match test_must_fail git add folder1/a ?

This is elegant. I'm sad I didn't think of it earlier.

> And additionally, I think we could repeat this check with `add
> --refresh` and also after removing `folder1/a`. The reason I'm saying
> this is because the check currently succeeds when `folder1/a` is in
> the working tree (maybe because `fill_directory()` ends up expanding
> the sparse index in this case?), but not under the two other
> circumstances I mentioned (as we've discussed in [1]).
> 
> [1]: https://lore.kernel.org/git/CAHd-oW7vCKC-XRM=rX37+jQn_XDzjtar9nNHKQ-4OHSZ=2=KFA@mail.gmail.com/

Can do!

>> +       git -C full-checkout checkout HEAD -- folder1/a &&
>> +       test_sparse_match git status --porcelain=v2 &&
> 
> Hmm, shouldn't this be `test_all_match`? IIUC, we've reset
> `folder1/a` on the full repo to make sure the status report is the
> same across all repos, right?

Yes!

Thanks,
-Stolee

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 18:56       ` Elijah Newren
@ 2021-04-23 20:16         ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-04-23 20:16 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On 4/21/2021 2:56 PM, Elijah Newren wrote:
> On Wed, Apr 21, 2021 at 10:27 AM Derrick Stolee <stolee@gmail.com> wrote:
>> Alternatively, we can decide to change the behavior here and send an
>> error() and return failure if they try to add something that would
>> live within a sparse-directory entry. I will think more on this and
>> have a good answer before v2 is ready.
> 
> See my comments on 01/10; users are already getting surprised by "git
> add" today, and this has been going on for months (though not super
> frequently).  When they try to "git add" an untracked path that would
> not match any path specifications in $GIT_DIR/info/sparse-checkout,
> the fact that "git add" doesn't error out (or at the very least give a
> warning) causes _subsequent_ commands to surprise the user with their
> behavior; the fact that it is some later command that does weird stuff
> (removing the file from the working tree) makes it harder for them to
> try to understand and make sense of.  So, I'd say we do want to change
> the behavior here...and not just for sparse-indexes but
> sparse-checkouts in general.
> 
> As for how this affects the code, I think I'm behind both you and
> Matheus on understanding here, but I'm starting to think it was a good
> idea for me to spout my offhand comment on what looked like a funny
> code smell that I thought was unrelated to your patch.  Sounds like it
> is causing some good digging...I'll try to read up more on the results
> when you send v2.  :-)

I think there are enough strange things happening with 'git add' that I
want to take some time to figure out the right approach here. In v2, I
will delete the changes to builtin/add.c and instead focus on making
'git status' faster with a sparse-index. The 'git add' improvements
will follow in another series after I take enough time to understand
all of these special modes.

I think this split is especially important if we decide that changing
the behavior is the best thing to do here.

Thanks,
-Stolee

* [PATCH v2 0/8] Sparse-index: integrate with status
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
@ 2021-04-23 21:34 ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                     ` (9 more replies)
  11 siblings, 10 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (8):
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  fsmonitor: test with sparse index

 builtin/commit.c                         |  3 ++
 dir.c                                    | 11 +++++
 read-cache.c                             | 10 +++-
 t/t1092-sparse-checkout-compatibility.sh | 61 ++++++++++++++++++++++--
 t/t7519-status-fsmonitor.sh              | 48 +++++++++++++++++++
 unpack-trees.c                           | 25 ++++++++--
 wt-status.c                              | 14 ++++--
 wt-status.h                              |  1 +
 8 files changed, 161 insertions(+), 12 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v1:

  1:  b2cb5401eff8 !  1:  3bac9edae7d8 t1092: add tests for status/add and sparse files
     @@ Commit message
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
          entirely desirable, but we are not intending to change behavior at the
     -    moment, only document it.
     +    moment, only document it. A future change could alter the behavior to
     +    be more sensible, and this test could be modified to satisfy the new
     +    expected behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	# This "git add folder1/a" is completely ignored
      +	# by the sparse-checkout repos. It causes the
      +	# full repo to have a different staged environment.
     -+	test_must_fail git -C sparse-checkout add folder1/a &&
     -+	test_must_fail git -C sparse-index add folder1/a &&
     ++	#
     ++	# This is not a desirable behavior, but this test
     ++	# ensures that the sparse-index is not the cause
     ++	# of a behavior change.
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++	test_sparse_match test_must_fail git add --refresh folder1/a &&
      +	git -C full-checkout checkout HEAD -- folder1/a &&
     -+	test_sparse_match git status --porcelain=v2 &&
     ++	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
      +	test_all_match git status --porcelain=v2 &&
  -:  ------------ >  2:  19344394379d unpack-trees: preserve cache_bottom
  -:  ------------ >  3:  24e71d8c0622 unpack-trees: compare sparse directories correctly
  2:  0a3892d2ec9e !  4:  d3c8948d0a33 unpack-trees: make sparse aware
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    unpack-trees: make sparse aware
     +    unpack-trees: stop recursing into sparse directories
      
     -    As a first step to integrate 'git status' and 'git add' with the sparse
     -    index, we must start integrating unpack_trees() with sparse directory
     -    entries. These changes are currently impossible to trigger because
     -    unpack_trees() calls ensure_full_index() if command_requires_full_index
     -    is true. This is the case for all commands at the moment. As we expand
     -    more commands to be sparse-aware, we might find that more changes are
     -    required to unpack_trees(). The current changes will suffice for
     -    'status' and 'add'.
     +    When walking trees using traverse_trees_recursive() and
     +    unpack_callback(), we must not attempt to walk into a sparse directory
     +    entry. There are no index entries within that directory to compare to
     +    the tree object at that position, so skip over the entries of that tree.
      
     -    unpack_trees() calls the traverse_trees() API using unpack_callback()
     -    to decide if we should recurse into a subtree. We must add new abilities
     -    to skip a subtree if it corresponds to a sparse directory entry.
     -
     -    It is important to be careful about the trailing directory separator
     -    that exists in the sparse directory entries but not in the subtree
     -    paths.
     +    This code is used in many places, so the only way to test it is to start
     +    removing the command_requres_full_index option from one builtin at a
     +    time and carefully test that its use of unpack_trees() behaves correctly
     +    with a sparse-index. Such tests will be added by later changes.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     - ## dir.h ##
     -@@ dir.h: static inline int ce_path_match(struct index_state *istate,
     - 				char *seen)
     - {
     - 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
     --			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     -+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     - }
     - 
     - static inline int dir_path_match(struct index_state *istate,
     -
     - ## preload-index.c ##
     -@@ preload-index.c: static void *preload_thread(void *_data)
     - 			continue;
     - 		if (S_ISGITLINK(ce->ce_mode))
     - 			continue;
     -+		if (S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     - 		if (ce_uptodate(ce))
     - 			continue;
     - 		if (ce_skip_worktree(ce))
     -
     - ## read-cache.c ##
     -@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     - 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     - 			continue;
     - 
     -+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     -+
     - 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     - 			filtered = 1;
     - 
     -
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
     - {
     - 	ce->ce_flags |= CE_UNPACKED;
     - 
     -+	/*
     -+	 * If this is a sparse directory, don't advance cache_bottom.
     -+	 * That will be advanced later using the cache-tree data.
     -+	 */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return;
     -+
     - 	if (o->cache_bottom < o->src_index->cache_nr &&
     - 	    o->src_index->cache[o->cache_bottom] == ce) {
     - 		int bottom = o->cache_bottom;
     -@@ unpack-trees.c: static int do_compare_entry(const struct cache_entry *ce,
     - 	ce_len -= pathlen;
     - 	ce_name = ce->name + pathlen;
     - 
     -+	/* remove directory separator if a sparse directory entry */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		ce_len--;
     - 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
     - }
     - 
     -@@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
     - 	if (cmp)
     - 		return cmp;
     - 
     -+	/* If ce is a sparse directory, then allow equality here. */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return 0;
     -+
     - 	/*
     - 	 * Even if the beginning compared identically, the ce should
     - 	 * compare as bigger than a directory leading up to it!
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
       	const struct name_entry *p = names;
     -+	unsigned recurse = 1;
     ++	unsigned unpack_tree = 1;
       
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       				src[0] = ce;
      +
      +				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					recurse = 0;
     ++					unpack_tree = 0;
       			}
       			break;
       		}
       	}
       
      -	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     -+	if (recurse &&
     ++	if (unpack_tree &&
      +	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
       		return -1;
       
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       		}
       
      -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     -+		if (recurse &&
     ++		if (unpack_tree &&
      +		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
       					     names, info) < 0)
       			return -1;
  3:  28ca717e6526 !  5:  fd96b71968b6 dir.c: accept a directory as part of cone-mode patterns
     @@ dir.c: enum pattern_match_result path_matches_pattern_list(
       	strbuf_addch(&parent_pathname, '/');
       	strbuf_add(&parent_pathname, pathname, pathlen);
       
     -+	/* Directory requests should be added as if they are a file */
     -+	if (parent_pathname.len > 1 &&
     ++	/*
     ++	 * Directory entries are matched if and only if a file
     ++	 * contained immediately within them is matched. For the
     ++	 * case of a directory entry, modify the path to create
     ++	 * a fake filename within this directory, allowing us to
     ++	 * use the file-base matching logic in an equivalent way.
     ++	 */
     ++	if (parent_pathname.len > 0 &&
      +	    parent_pathname.buf[parent_pathname.len - 1] == '/')
      +		strbuf_add(&parent_pathname, "-", 1);
      +
  4:  e86f874dd412 =  6:  1f4ba56e7416 status: skip sparse-checkout percentage with sparse-index
  5:  d7d4cad8be0b !  7:  3d09368c0541 status: use sparse-index throughout
     @@ Commit message
          implementation details are already integrated with sparse-checkout, so
          modify command_requires_full_index to be zero for cmd_status().
      
     -    By running the debugger for 'git status -uno' after that change, we find
     -    two instances of ensure_full_index() that were added for extra safety,
     -    but can be removed without issue.
     +    In refresh_index(), we loop through the index entries to refresh their
     +    stat() information. However, sparse directories have no stat()
     +    information to populate. Ignore these entries.
      
     -    In refresh_index(), we loop through the index entries. The
     -    refresh_cache_ent() method copies the sparse directories into the
     -    refreshed index without issue.
     +    This allows 'git status' to no longer expand a sparse index to a full
     +    one. This is further tested by dropping the "-uno" option and adding an
     +    untracked file into the worktree.
      
     -    The loop within run_diff_files() skips things that are in stage 0 and
     -    have skip-worktree enabled, so seems safe to disable ensure_full_index()
     -    here.
     -
     -    This allows some cases of 'git status' to no longer expand a sparse
     -    index to a full one, giving the following performance improvements for
     -    p2000-sparse-checkout-operations.sh:
     +    The performance test p2000-sparse-checkout-operations.sh demonstrates
     +    these improvements:
      
          Test                                  HEAD~1           HEAD
          -----------------------------------------------------------------------------
     -    2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
     -    2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
     -    2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
     -    2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
     +    2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
     +    2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
     +    2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
     +    2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%
      
          Note that since HEAD~1 was expanding the sparse index by parsing trees,
          it was artificially slower than the full index case. Thus, the 98%
     -    improvement is misleading, and instead we should celebrate the 0.37s to
     -    0.05s improvement of 82%. This is more indicative of the peformance
     +    improvement is misleading, and instead we should celebrate the 0.34s to
     +    0.05s improvement of 85%. This is more indicative of the peformance
          gains we are expecting by using a sparse index.
      
          Note: we are dropping the assignment of core.fsmonitor here. This is not
     @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
       	trace2_region_enter("index", "refresh", NULL);
      -	/* TODO: audit for interaction with sparse-index. */
      -	ensure_full_index(istate);
     ++
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce, *new_entry;
       		int cache_errno = 0;
     +@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     + 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     + 			continue;
     + 
     ++		/*
     ++		 * If this entry is a sparse directory, then there isn't
     ++		 * any stat() information to update. Ignore the entry.
     ++		 */
     ++		if (S_ISSPARSEDIR(ce->ce_mode))
     ++			continue;
     ++
     + 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     + 			filtered = 1;
     + 
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' '
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is e
      +	init_repos &&
      +
      +	rm -f trace2.txt &&
     ++	echo >>sparse-index/untracked.txt &&
       	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      -		git -C sparse-index -c core.fsmonitor="" status -uno &&
      -	test_region index ensure_full_index trace2.txt
     -+		git -C sparse-index status -uno &&
     ++		git -C sparse-index status &&
      +	test_region ! index ensure_full_index trace2.txt
       '
       
  6:  434306541613 <  -:  ------------ dir: use expand_to_path() for sparse directories
  7:  f1a9ce4ef0e5 <  -:  ------------ add: allow operating on a sparse-only index
  8:  6d7f30f2b90a <  -:  ------------ pathspec: stop calling ensure_full_index
  9:  75199bbe8ca1 <  -:  ------------ t7519: add sparse directories to FS monitor tests
 10:  9d1183ddd280 !  8:  1fd033a6ebb2 fsmonitor: test with sparse index
     @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
       	)
       '
       
     -+test_expect_success 'status succeeds with sparse index' '
     -+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++# Usage:
     ++# check_sparse_index_behavior [!]
     ++# If "!" is supplied, then we verify that we do not call ensure_full_index
     ++# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
     ++check_sparse_index_behavior () {
      +	git status --porcelain=v2 >expect &&
      +	git sparse-checkout init --cone --sparse-index &&
     ++	git sparse-checkout set dir1 dir2 &&
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      +		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     ++	test_region $1 index ensure_full_index trace2.txt &&
      +	test_cmp expect actual &&
      +	rm trace2.txt &&
     ++	git sparse-checkout disable
     ++}
     ++
     ++test_expect_success 'status succeeds with sparse index' '
     ++	git reset --hard &&
     ++
     ++	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +	EOF
      +	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     -+	rm trace2.txt &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     ++	check_sparse_index_behavior ! &&
      +
     ++	cp -r dir1 dir1a &&
     ++	git add dir1a &&
     ++	git commit -m "add dir1a" &&
     ++
     ++	# This one modifies outside the sparse-checkout definition
     ++	# and hence we expect to expand the sparse-index.
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1a/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual
     ++	check_sparse_index_behavior
      +'
      +
       test_done

-- 
gitgitgadget

* [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13 12:40     ` Matheus Tavares Bernardino
  2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file
'folder1/a' is shown as modified, but adding it fails. This is new
behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
entries, 2021-04-08). Before that change, these adds would be silently
ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..0ec487acd283 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" is completely ignored
+	# by the sparse-checkout repos. It causes the
+	# full repo to have a different staged environment.
+	#
+	# This is not a desirable behavior, but this test
+	# ensures that the sparse-index is not the cause
+	# of a behavior change.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	git -C full-checkout checkout HEAD -- folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


* [PATCH v2 2/8] unpack-trees: preserve cache_bottom
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


* [PATCH v2 3/8] unpack-trees: compare sparse directories correctly
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  3:26     ` Elijah Newren
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..3af797093095 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/* If ce is a sparse directory, then allow an exact match. */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  3:31     ` Elijah Newren
  2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When walking trees using traverse_trees_recursive() and
unpack_callback(), we must not attempt to walk into a sparse directory
entry. There are no index entries within that directory to compare to
the tree object at that position, so skip over the entries of that tree.

This code is used in many places, so the only way to test it is to start
removing the command_requires_full_index option from one builtin at a
time and carefully test that its use of unpack_trees() behaves correctly
with a sparse-index. Such tests will be added by later changes.
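Here is a toy model of the traversal change. It assumes a simple in-memory tree in place of Git's tree descriptors and index. A sparse directory counts as one visited entry, and its children are never walked, much as unpack_callback() clears unpack_tree for S_ISSPARSEDIR entries. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical toy tree node; not Git's struct cache_entry or tree_desc. */
struct sketch_node {
	const char *name;
	int is_sparse_dir;                  /* collapsed directory in the index */
	const struct sketch_node *children;
	size_t nr_children;
};

/*
 * Count the entries a walk would visit. A sparse directory is visited
 * as a single entry, but the walk does not recurse into it.
 */
static size_t sketch_walk(const struct sketch_node *node)
{
	size_t visited = 1;
	size_t i;

	if (node->is_sparse_dir)
		return visited;

	for (i = 0; i < node->nr_children; i++)
		visited += sketch_walk(&node->children[i]);
	return visited;
}
```

With "folder1/" marked sparse, its two children are skipped. Clearing the flag makes the walk descend again.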

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3af797093095..67777570f829 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1256,6 +1256,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned unpack_tree = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1297,12 +1298,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 					}
 				}
 				src[0] = ce;
+
+				if (S_ISSPARSEDIR(ce->ce_mode))
+					unpack_tree = 0;
 			}
 			break;
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_tree &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1332,7 +1337,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (unpack_tree &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget



* [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
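Here is a minimal sketch of the fake-filename trick, paired with a toy cone-mode matcher. The helpers sketch_dir_to_fake_file() and sketch_cone_match() are hypothetical. Git's real path_matches_pattern_list() instead uses hashmaps of paths with a leading '/' and distinguishes MATCHED from MATCHED_RECURSIVE. The toy matcher includes a file if its immediate parent is in the parent set ("" is the root), or if any ancestor directory is in the recursive set.

```c
#include <stdio.h>
#include <string.h>

/*
 * If path names a directory (ends in '/'), append the fake filename "-".
 * Returns 0 on success, -1 if out is too small.
 */
static int sketch_dir_to_fake_file(const char *path, char *out, size_t outsz)
{
	size_t len = strlen(path);
	int is_dir = len > 0 && path[len - 1] == '/';
	int n = snprintf(out, outsz, "%s%s", path, is_dir ? "-" : "");

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Toy cone-mode matcher for a file path without a leading '/'. */
static int sketch_cone_match(const char *file,
			     const char *const *recursive, size_t nr_recursive,
			     const char *const *parents, size_t nr_parents)
{
	const char *slash = strrchr(file, '/');
	size_t parent_len = slash ? (size_t)(slash - file) : 0;
	size_t i;

	/* Files directly inside a "parent" directory are included. */
	for (i = 0; i < nr_parents; i++)
		if (strlen(parents[i]) == parent_len &&
		    !strncmp(parents[i], file, parent_len))
			return 1;

	/* Anything below a "recursive" directory is included. */
	for (i = 0; i < nr_recursive; i++) {
		size_t rlen = strlen(recursive[i]);
		if (rlen < strlen(file) && file[rlen] == '/' &&
		    !strncmp(recursive[i], file, rlen))
			return 1;
	}
	return 0;
}
```

Take a cone of "deep/deeper1". The directory "deep/deeper1/" becomes "deep/deeper1/-" and matches. "deep/deeper2/" becomes "deep/deeper2/-" and does not, so it can stay collapsed.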

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-based matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
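The sketch below shows why a per-entry count goes wrong on a sparse index. It assumes the percentage is derived from the fraction of index entries marked skip-worktree, and it is a hypothetical simplification, not wt_status_check_sparse_checkout() itself. Suppose 3 files are present and 97 out-of-cone files are collapsed into a single sparse directory entry. A naive count then reports 75% instead of 3%, which is why a sentinel is returned instead.

```c
#include <stddef.h>

#define SKETCH_SPARSE_CHECKOUT_DISABLED -1
#define SKETCH_SPARSE_CHECKOUT_SPARSE_INDEX -2

/* Hypothetical: percentage of tracked files present, or a sentinel. */
static int sketch_sparse_percentage(int sparse_checkout, int sparse_index,
				    size_t nr_entries, size_t nr_skip_worktree)
{
	if (!sparse_checkout)
		return SKETCH_SPARSE_CHECKOUT_DISABLED;
	/* One sparse-directory entry can stand for any number of files. */
	if (sparse_index)
		return SKETCH_SPARSE_CHECKOUT_SPARSE_INDEX;
	if (!nr_entries)
		return 100; /* avoid dividing by zero in this toy version */
	return 100 - (int)((100 * nr_skip_worktree) / nr_entries);
}
```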

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not want to require that
this message be identical across both modes; instead, we compare only
the important information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0ec487acd283..0dc551b25f67 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -196,6 +196,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v2 7/8] status: use sparse-index throughout
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.
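The percentages above can be checked with a small helper. It assumes the usual definition of relative change, (after - before) / before * 100. The helper name is hypothetical.

```c
/* Relative change from "before" to "after", in percent. */
static double sketch_pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

sketch_pct_change(0.34, 0.05) is about -85.3, which matches the 85% quoted above. sketch_pct_change(2.35, 0.04) is about -98.3, which matches test 2000.4 in the table.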

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0dc551b25f67..5a8fe88dc894 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -453,12 +453,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v2 8/8] fsmonitor: test with sparse index
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During the effort to protect uses of the index to operate on a full
index, we did not modify fsmonitor.c. This is because it already works
effectively with only the change to index_name_stage_pos(). The only
thing left to do is to test that it works correctly.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().
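The next sketch models the decision these tests exercise. It assumes the reply format produced by the test hooks in this patch: a token terminated by NUL, followed by zero or more NUL-terminated paths. It counts how many reported paths fall outside the cone, which is the case expected to trigger ensure_full_index(). The cone is modeled as a list of top-level directories. The function and that model are hypothetical and far simpler than fsmonitor.c.

```c
#include <stddef.h>
#include <string.h>

/* Length of the NUL-terminated field at p, bounded by avail bytes. */
static size_t sketch_field_len(const char *p, size_t avail)
{
	const char *nul = memchr(p, '\0', avail);
	return nul ? (size_t)(nul - p) : avail;
}

/* Count reported paths that are not inside any cone directory. */
static size_t sketch_count_outside_cone(const char *buf, size_t len,
					const char *const *cone, size_t nr_cone)
{
	size_t pos = sketch_field_len(buf, len) + 1; /* skip token + NUL */
	size_t outside = 0;

	while (pos < len) {
		const char *path = buf + pos;
		size_t plen = sketch_field_len(path, len - pos);
		int inside = 0;
		size_t i;

		for (i = 0; i < nr_cone; i++) {
			size_t clen = strlen(cone[i]);
			if (plen > clen && path[clen] == '/' &&
			    !strncmp(path, cone[i], clen))
				inside = 1;
		}
		if (!inside)
			outside++;
		pos += plen + 1;
	}
	return outside;
}
```

With a cone of "dir1" and "dir2", "dir1/modified" stays inside the cone. "dir1a/modified" does not, because "dir1" must be followed by '/' to count as an ancestor.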

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v2 3/8] unpack-trees: compare sparse directories correctly
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-13  3:26     ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-05-13  3:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As we further integrate the sparse-index into unpack-trees, we need to
> ensure that we compare sparse directory entries correctly with other
> entries. This affects searching for an exact path as well as sorting
> index entries.
>
> Sparse directory entries contain the trailing directory separator. This
> is important for the sorting, in particular. Thus, within
> do_compare_entry() we stop using S_IFREG in all cases, since sparse
> directories should use S_IFDIR to indicate that the comparison should
> treat the entry name as a directory.
>
> Within compare_entry(), it first calls do_compare_entry() to check the
> leading portion of the name. When the input path is a directory name, we
> could match exactly already. Thus, we should return 0 if we have an
> exact string match on a sparse directory entry.

Thanks for splitting up patch 2 from the original series; it's much
easier to understand these separate patches.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 1067db19c9d2..3af797093095 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
>         int pathlen, ce_len;
>         const char *ce_name;
>         int cmp;
> +       unsigned ce_mode;
>
>         /*
>          * If we have not precomputed the traverse path, it is quicker
> @@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> -       return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> +       ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;

Ah, so here the fact that S_ISSPARSEDIR is defined as
   #define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
whereas S_ISDIR is defined as
   #define S_ISDIR(m)      (((m) & S_IFMT) == S_IFDIR)
turns out to be critically important, because if you used S_ISDIR()
here, then we'd get ce_mode = S_IFDIR for submodules and break the
sorting.  S_ISSPARSEDIR() gives us the correct value.

> +       return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
>  }
>
>  static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
> @@ -1000,6 +1002,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow an exact match. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;

I think the comment from the commit message belongs in the code; the
comment in the code is too jarring without the more detailed
explanation.

> +
>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> --
> gitgitgadget


* Re: [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-13  3:31     ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-05-13  3:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When walking trees using traverse_trees_recursive() and
> unpack_callback(), we must not attempt to walk into a sparse directory
> entry. There are no index entries within that directory to compare to
> the tree object at that position, so skip over the entries of that tree.
>
> This code is used in many places, so the only way to test it is to start
> removing the command_requires_full_index option from one builtin at a
> time and carefully test that its use of unpack_trees() behaves correctly
> with a sparse-index. Such tests will be added by later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3af797093095..67777570f829 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1256,6 +1256,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned unpack_tree = 1;
>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1297,12 +1298,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       unpack_tree = 0;
>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_tree &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1332,7 +1337,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (unpack_tree &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;
> --
> gitgitgadget

The splitting of the previous patch looks really good here too, and
the variable rename makes it flow nicely.  Looking good.


* Re: [PATCH v2 0/8] Sparse-index: integrate with status
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-05-13  4:12   ` Elijah Newren
  2021-05-14 18:28     ` Derrick Stolee
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-13  4:12 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these are checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.
>
> The performance tests in p2000-sparse-operations.sh improve by 95% or more,
> even when compared with the full-index cases, not just the sparse-index
> cases that previously had extra overhead.
>
> Hopefully this is the first example of how ds/sparse-index-protections has
> done the basic work to do these conversions safely, making them look easier
> than they seemed when starting this adventure.
>
> Thanks, -Stolee
>
>
> Updates in V2
> =============
>
>  * Based on the feedback, it is clear that 'git add' will require much more
>    careful testing and thought. I'm splitting it out of this series and it
>    will return with a follow-up.
>  * Test cases are improved, both in coverage and organization.
>  * The previous "unpack-trees: make sparse aware" patch is split into three
>    now.
>  * Stale messages based on an old implementation of the "protections" topic
>    are now fixed.
>  * Performance tests were re-run.

I read through the topic, both my old comments, the range-diff, and
the new patches where the range-diff wasn't enough.  I tried to spot
issues, and was hoping to find problems you alluded to in your recent
comments at https://lore.kernel.org/git/05932ebc-04ac-b3c5-a460-5d37d8604fd9@gmail.com/,
but I failed to spot them.  I hope it has to do with the cache bottom
stuff that I just don't understand, because otherwise I just missed
the problems in my review.  I can say that in v2 you fixed the issues
I did spot in my review of v1.

I'll look forward to v3 to see what it was I missed.  If I somehow
don't respond soon (in a week at the latest), do feel free to ping me;
sorry for somehow having this one slip through the cracks.


* Re: [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-13 12:40     ` Matheus Tavares Bernardino
  2021-05-14 12:27       ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-13 12:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 6:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before moving to update 'git status' and 'git add' to work with sparse
> indexes, add an explicit test that ensures the sparse-index works the
> same as a normal sparse-checkout when the worktree contains directories
> and files outside of the sparse cone.
>
> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
> not in the sparse cone. When 'folder1/a' is modified, the file
> 'folder1/a' is shown as modified, but adding it fails.

Hmm, I might be doing something wrong, but I think `folder1/a` is not
shown as modified.

$ git init test
$ mkdir test/folder1
$ echo original >test/folder1/a
$ echo original >test/b
$ git -C test add . && git -C test commit -m files
$ git -C test sparse-checkout init --cone --sparse-index
$ ls test
b
$ mkdir test/folder1 && echo modified >test/folder1/a
$ git -C test status
On branch master
You are in a sparse checkout with 50% of tracked files present.
nothing to commit, working tree clean

> This is new
> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
> entries, 2021-04-08). Before that change, these adds would be silently
> ignored.
>
> Untracked files are fine: adding new files both with 'git add .' and
> 'git add folder1/' works just as in a full checkout. This may not be
> entirely desirable, but we are not intending to change behavior at the
> moment, only document it. A future change could alter the behavior to
> be more sensible, and this test could be modified to satisfy the new
> expected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..0ec487acd283 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&

A minor suggestion: before recreating folder1/a, we could also test
that `git add folder1/a` will not remove the sparse entry from the
index and will properly warn about it on both sparse repos. I.e.
adding a:

        test_sparse_match test_must_fail git add folder1/a

> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&

Hmm, we modify `folder1/a` in all repos, but we only try adding it on
the sparse repos, and then we immediately restore it on the full repo.
As we won't use the modified version on the full repo, could this
perhaps be `run_on_sparse` instead? If so, we could also save the
later `git -C full-checkout checkout HEAD -- folder1/a`.

> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +       #
> +       # This is not a desirable behavior, but this test
> +       # ensures that the sparse-index is not the cause
> +       # of a behavior change.

I'm not sure I understand what the undesirable behavior is in this
sentence. Is it "git add folder1/a" erroring out and not updating
`folder1/a`? Or the full repo having a different staged environment?

> +       test_sparse_match test_must_fail git add folder1/a &&
> +       test_sparse_match test_must_fail git add --refresh folder1/a &&
> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>

* Re: [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-05-13 12:40     ` Matheus Tavares Bernardino
@ 2021-05-14 12:27       ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-05-14 12:27 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

On 5/13/2021 8:40 AM, Matheus Tavares Bernardino wrote:
> On Fri, Apr 23, 2021 at 6:34 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Before moving to update 'git status' and 'git add' to work with sparse
>> indexes, add an explicit test that ensures the sparse-index works the
>> same as a normal sparse-checkout when the worktree contains directories
>> and files outside of the sparse cone.
>>
>> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
>> not in the sparse cone. When 'folder1/a' is modified, the file
>> 'folder1/a' is shown as modified, but adding it fails.
> 
> Hmm, I might be doing something wrong, but I think `folder1/a` is not
> shown as modified.
> 
> $ git init test
> $ mkdir test/folder1
> $ echo original >test/folder1/a
> $ echo original >test/b
> $ git -C test add . && git -C test commit -m files
> $ git -C test sparse-checkout init --cone --sparse-index
> $ ls test
> b
> $ mkdir test/folder1 && echo modified >test/folder1/a
> $ git -C test status
> On branch master
> You are in a sparse checkout with 50% of tracked files present.
> nothing to commit, working tree clean

You are correct. This happens in both the sparse-index case and the
regular full-index case. The modifications outside of the sparse-checkout
definition are ignored, as long as they match a tracked file.

I checked my latest code against this example and see that the sparse
index is not expanded to a full one. It _will_ be if we add an untracked
file outside of the sparse cone.

>> This is new
>> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
>> entries, 2021-04-08). Before that change, these adds would be silently
>> ignored.
>>
>> Untracked files are fine: adding new files both with 'git add .' and
>> 'git add folder1/' works just as in a full checkout. This may not be
>> entirely desirable, but we are not intending to change behavior at the
>> moment, only document it. A future change could alter the behavior to
>> be more sensible, and this test could be modified to satisfy the new
>> expected behavior.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 12e6c453024f..0ec487acd283 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
>>         test_all_match git checkout -
>>  '
>>
>> +test_expect_success 'status/add: outside sparse cone' '
>> +       init_repos &&
> 
> A minor suggestion: before recreating folder1/a, we could also test
> that `git add folder1/a` will not remove the sparse entry from the
> index and will properly warn about it on both sparse repos. I.e.
> adding a:
> 
>         test_sparse_match test_must_fail git add folder1/a

Will do.

>> +       # folder1 is at HEAD, but outside the sparse cone
>> +       run_on_sparse mkdir folder1 &&
>> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
>> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
>> +
>> +       test_sparse_match git status &&
>> +
>> +       write_script edit-contents <<-\EOF &&
>> +       echo text >>$1
>> +       EOF
>> +       run_on_all ../edit-contents folder1/a &&
> 
> Hmm, we modify `folder1/a` in all repos, but we only try adding it on
> the sparse repos, and then we immediately restore it on the full repo.
> As we won't use the modified version on the full repo, could this
> perhaps be `run_on_sparse` instead? If so, we could also save the
> later `git -C full-checkout checkout HEAD -- folder1/a`.

Good idea.

>> +       run_on_all ../edit-contents folder1/new &&
>> +
>> +       test_sparse_match git status --porcelain=v2 &&
>> +
>> +       # This "git add folder1/a" is completely ignored
>> +       # by the sparse-checkout repos. It causes the
>> +       # full repo to have a different staged environment.
>> +       #
>> +       # This is not a desirable behavior, but this test
>> +       # ensures that the sparse-index is not the cause
>> +       # of a behavior change.
> 
> I'm not sure I understand what the undesirable behavior is in this
> sentence. Is it "git add folder1/a" erroring out and not updating
> `folder1/a`? Or the full repo having a different staged environment?

Perhaps this isn't undesirable after all, now that we are actually
returning an error. It's no longer silent, so maybe my comment is
stale from an earlier version.

Thanks,
-Stolee

* Re: [PATCH v2 0/8] Sparse-index: integrate with status
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
@ 2021-05-14 18:28     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-05-14 18:28 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee

On 5/13/2021 12:12 AM, Elijah Newren wrote:
> On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' very fast when a sparse-index is enabled on a repository with
>> cone-mode sparse-checkout (and a small populated set).
> 
> I read through the topic, both my old comments, the range-diff, and
> the new patches where the range-diff wasn't enough.  I tried to spot
> issues, and was hoping to find problems you alluded to in your recent
> comments at https://lore.kernel.org/git/05932ebc-04ac-b3c5-a460-5d37d8604fd9@gmail.com/,
> but I failed to spot them.  I hope it has to do with the cache bottom
> stuff that I just don't understand, because otherwise I just missed
> the problems in my review.  I can say that in v2 you fixed the issues
> I did spot in my review of v1.
> 
> I'll look forward to v3 to see what it was I missed.  If I somehow
> don't respond soon (in a week at the latest), do feel free to ping me;
> sorry for somehow having this one slip through the cracks.

v3 is on the way. The changes related to issues I found in my
deeper testing are mostly about cases my test script did not
previously cover, rather than things actually being wrong in the
patch series. (There is one case where some new code was incorrect,
but it wasn't being tested because of the test repo's data shape.)

Thanks,
-Stolee

* [PATCH v3 00/12] Sparse-index: integrate with status
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
@ 2021-05-14 18:30   ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                       ` (12 more replies)
  9 siblings, 13 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:30 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature, inherited from special cases that arose in the
virtualized environment of VFS for Git. This version contains my fixes
based on that investigation. Most of these were easy to identify and fix,
but I was blocked for a long time by a bug that appears when combining the
sparse-index with the builtin FS Monitor feature; I have already reported
my findings [1].

[1] https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status (for example, when a
     sparse-directory entry is staged but doesn't exist at HEAD, as in an
     orphaned commit), we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points, with the expectation that these conversions will
     eventually be rare enough for the FS Monitor feature to work efficiently.


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (12):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |   6 ++
 dir.c                                    |  11 +++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++++-
 t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++++++
 unpack-trees.c                           |  27 +++++-
 wt-status.c                              |  64 ++++++++++++-
 wt-status.h                              |   1 +
 10 files changed, 300 insertions(+), 14 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v2:

  -:  ------------ >  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  -:  ------------ >  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  -:  ------------ >  3:  70971b1f9261 t1092: expand repository data shape
  1:  3bac9edae7d8 !  4:  a80b5a41153f t1092: add tests for status/add and sparse files
     @@ Commit message
          and files outside of the sparse cone.
      
          Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
     -    not in the sparse cone. When 'folder1/a' is modified, the file
     -    'folder1/a' is shown as modified, but adding it fails. This is new
     -    behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
     -    entries, 2021-04-08). Before that change, these adds would be silently
     -    ignored.
     +    not in the sparse cone. When 'folder1/a' is modified, the file is not
     +    shown as modified and adding it will fail. This is new behavior as of
     +    a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
     +    2021-04-08). Before that change, these adds would be silently ignored.
      
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +test_expect_success 'status/add: outside sparse cone' '
      +	init_repos &&
      +
     ++	# adding a "missing" file outside the cone should fail
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++
      +	# folder1 is at HEAD, but outside the sparse cone
      +	run_on_sparse mkdir folder1 &&
      +	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	write_script edit-contents <<-\EOF &&
      +	echo text >>$1
      +	EOF
     -+	run_on_all ../edit-contents folder1/a &&
     ++	run_on_sparse ../edit-contents folder1/a &&
      +	run_on_all ../edit-contents folder1/new &&
      +
      +	test_sparse_match git status --porcelain=v2 &&
      +
     -+	# This "git add folder1/a" is completely ignored
     -+	# by the sparse-checkout repos. It causes the
     -+	# full repo to have a different staged environment.
     -+	#
     -+	# This is not a desirable behavior, but this test
     -+	# ensures that the sparse-index is not the cause
     -+	# of a behavior change.
     ++	# This "git add folder1/a" fails with a warning
     ++	# in the sparse repos, differing from the full
     ++	# repo. This is intentional.
      +	test_sparse_match test_must_fail git add folder1/a &&
      +	test_sparse_match test_must_fail git add --refresh folder1/a &&
     -+	git -C full-checkout checkout HEAD -- folder1/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
  2:  19344394379d =  5:  07a45b661c4a unpack-trees: preserve cache_bottom
  3:  24e71d8c0622 !  6:  cc4a526e7947 unpack-trees: compare sparse directories correctly
     @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
       	if (cmp)
       		return cmp;
       
     -+	/* If ce is a sparse directory, then allow an exact match. */
     ++	/*
     ++	 * At this point, we know that we have a prefix match. If ce
     ++	 * is a sparse directory, then allow an exact match. This only
     ++	 * works when the input name is a directory, since ce->name
     ++	 * ends in a directory separator.
     ++	 */
      +	if (S_ISSPARSEDIR(ce->ce_mode))
      +		return 0;
      +
  4:  d3c8948d0a33 !  7:  598375d3531f unpack-trees: stop recursing into sparse directories
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## diff-lib.c ##
     +@@ diff-lib.c: static void show_new_file(struct rev_info *revs,
     + 	unsigned int mode;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_file->ce_mode))
     ++		return;
     ++
     + 	/*
     + 	 * New file in the index: it might actually be different in
     + 	 * the working tree.
     +@@ diff-lib.c: static int show_modified(struct rev_info *revs,
     + 	const struct object_id *oid;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_entry->ce_mode))
     ++		return 0;
     ++
     + 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
     + 			  &dirty_submodule, &revs->diffopt) < 0) {
     + 		if (report_missing)
     +
       ## unpack-trees.c ##
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     - 					}
     - 				}
     - 				src[0] = ce;
     -+
     -+				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					unpack_tree = 0;
     - 			}
     - 			break;
       		}
       	}
       
  5:  fd96b71968b6 =  8:  47da2b317237 dir.c: accept a directory as part of cone-mode patterns
  6:  1f4ba56e7416 =  9:  bc1512981493 status: skip sparse-checkout percentage with sparse-index
  7:  3d09368c0541 = 10:  5b1ae369a7cd status: use sparse-index throughout
  -:  ------------ > 11:  3b42783d4a86 wt-status: expand added sparse directory entries
  8:  1fd033a6ebb2 ! 12:  b72507f514d1 fsmonitor: test with sparse index
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    fsmonitor: test with sparse index
     +    fsmonitor: integrate with sparse index
      
     -    During the effort to protect uses of the index to operate on a full
     -    index, we did not modify fsmonitor.c. This is because it already works
     -    effectively with only the change to index_name_stage_pos(). The only
     -    thing left to do is to test that it works correctly.
     +    If we need to expand a sparse-index into a full one, then the FS Monitor
     +    bitmap is going to be incorrect. Ensure that we start fresh at such an
     +    event.
     +
     +    While this is currently a performance drawback, the eventual hope of the
     +    sparse-index feature is that these expansions will be rare and hence we
     +    will be able to keep the FS Monitor data accurate across multiple Git
     +    commands.
      
          These tests are added to demonstrate that the behavior is the same
          across a full index and a sparse index, but also that file modifications
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## sparse-index.c ##
     +@@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
     + 	cache_tree_free(&istate->cache_tree);
     + 	cache_tree_update(istate, 0);
     + 
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     ++
     + 	istate->sparse_index = 1;
     + 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
     + 	return 0;
     +@@ sparse-index.c: void ensure_full_index(struct index_state *istate)
     + 	istate->cache = full->cache;
     + 	istate->cache_nr = full->cache_nr;
     + 	istate->cache_alloc = full->cache_alloc;
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     + 
     + 	strbuf_release(&base);
     + 	free(full);
     +
       ## t/t7519-status-fsmonitor.sh ##
      @@ t/t7519-status-fsmonitor.sh: test_expect_success 'setup' '
       	expect*

-- 
gitgitgadget

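The patch 12 hunks in the range-diff above reset fsmonitor_has_run_once and free fsmonitor_dirty and fsmonitor_last_update whenever the index is expanded or collapsed. A minimal sketch of that invalidation pattern follows. struct toy_index_state, toy_discard_fsmonitor() and toy_set_sparse() are hypothetical, and the dirty data is modeled as a plain buffer rather than git's real bitmap type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subset of an index state: only the FS Monitor cache fields. */
struct toy_index_state {
	unsigned int fsmonitor_has_run_once : 1;
	char *fsmonitor_last_update;    /* token from the previous query */
	unsigned char *fsmonitor_dirty; /* per-entry dirty data (stand-in) */
	int sparse_index;
};

/* Drop every cached FS Monitor result so the next query starts fresh. */
static void toy_discard_fsmonitor(struct toy_index_state *istate)
{
	assert(istate != NULL);
	istate->fsmonitor_has_run_once = 0;
	free(istate->fsmonitor_dirty);
	istate->fsmonitor_dirty = NULL;
	free(istate->fsmonitor_last_update);
	istate->fsmonitor_last_update = NULL;
}

/*
 * Switch between sparse and full representations. Any change of shape
 * invalidates the cached FS Monitor data before the flag is updated.
 */
static void toy_set_sparse(struct toy_index_state *istate, int sparse)
{
	if (istate->sparse_index != sparse)
		toy_discard_fsmonitor(istate);
	istate->sparse_index = sparse;
}
```

Discarding is the conservative choice. In this sketch the dirty data describes entries by their position, so once the entry list changes shape the next command has to start a fresh FS Monitor query rather than trust stale results.
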
* [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget

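Here is a minimal standalone sketch of the scan that index_has_unmerged_entries() performs, so the check can be studied in isolation. It assumes a stripped-down entry that carries only ce_flags. struct toy_entry and the toy_* names are hypothetical. The stage mask and shift are modeled on how git's ce_stage() reads the merge stage from bits 12-13 of ce_flags.

```c
#include <assert.h>
#include <stddef.h>

/* Modeled on git's cache.h: the merge stage (0-3) lives in bits 12-13. */
#define TOY_STAGEMASK  0x3000u
#define TOY_STAGESHIFT 12

/* Hypothetical, stripped-down cache entry: only the flags matter here. */
struct toy_entry {
	unsigned int ce_flags;
};

static unsigned int toy_stage(const struct toy_entry *ce)
{
	return (ce->ce_flags & TOY_STAGEMASK) >> TOY_STAGESHIFT;
}

/*
 * Return 1 if any entry has a nonzero stage (i.e. belongs to an
 * unresolved merge conflict), 0 otherwise. Linear in the entry count.
 */
static int toy_index_has_unmerged_entries(const struct toy_entry *cache,
					  size_t nr)
{
	size_t i;

	assert(cache != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (toy_stage(&cache[i]))
			return 1;
	}
	return 0;
}
```

A caller modeled on convert_to_sparse() would return 0 and keep the index full whenever this reports an unmerged entry. That is what the NEEDSWORK hunk above does before calling cache_tree_update().
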
* [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  1:33       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                       ` (10 subsequent siblings)
  12 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget

* [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  1:49       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  12 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.
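To see why these shapes are interesting, consider the index order they produce. The sketch below is a hypothetical C helper that sorts entry names in plain byte order, which approximates git's index name order for these paths. A trailing '/' marks a sparse directory entry. Assume the cone is deep/deeper1/deepest, as a later test in this series sets it. Then deep/before/ and deep/later/ land on either side of the in-cone entries, and deep/deeper1/0/ becomes the first entry under deep/deeper1/ because '0' sorts before any lowercase letter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte order on entry names. */
static int cmp_names(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;
	return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort a list of index entry names in place. */
static void sort_entries(const char **names, size_t nr)
{
	assert(names != NULL || nr == 0);
	qsort(names, nr, sizeof(*names), cmp_names);
}
```

Sorting { "deep/later/", "deep/deeper1/a", "deep/before/", "deep/deeper1/0/", "deep/a" } places the two new sparse directories before and after deep/deeper1/, and puts deep/deeper1/0/ ahead of deep/deeper1/a.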

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..98257695979a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,16 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 04/12] t1092: add tests for status/add and sparse files
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 98257695979a..fba98d5484ae 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -238,6 +238,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 05/12] unpack-trees: preserve cache_bottom
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().
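The following is a simplified C model of this bookkeeping. It uses hypothetical stand-ins for struct cache_entry and struct unpack_trees_options. Marking an ordinary entry advances cache_bottom past the run of already-unpacked entries. Marking a sparse directory entry leaves cache_bottom alone, so the cache-tree walk can move it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real git structures. */
struct sketch_entry {
	const char *name;
	int is_sparse_dir;
	int unpacked;
};

struct sketch_opts {
	struct sketch_entry **cache;
	int cache_nr;
	int cache_bottom;
};

static void sketch_mark_used(struct sketch_entry *ce, struct sketch_opts *o)
{
	assert(ce != NULL && o != NULL);
	ce->unpacked = 1;

	/* Sparse directories: leave cache_bottom to the cache-tree walk. */
	if (ce->is_sparse_dir)
		return;

	/* Advance past the contiguous run of already-unpacked entries. */
	if (o->cache_bottom < o->cache_nr && o->cache[o->cache_bottom] == ce) {
		int bottom = o->cache_bottom;
		while (bottom < o->cache_nr && o->cache[bottom]->unpacked)
			bottom++;
		o->cache_bottom = bottom;
	}
}
```

In this model, marking a and then folder1/ in the index [a, folder1/, z] leaves cache_bottom at 1. Only the cache-tree walk would move it past folder1/.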

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 06/12] unpack-trees: compare sparse directories correctly
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, the two
names may already match exactly. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.
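The effect of the mode argument can be sketched with a hypothetical comparison helper. It is written in the style of git's name comparison functions but is not a copy of df_name_compare(). Once the common prefix matches, a directory compares as if its name continued with '/', while a file name simply ends.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical comparison: after a common prefix, a directory name
 * behaves as if it ended in '/', a file name as if it ended in '\0'.
 */
static int sketch_name_compare(const char *name1, int len1, int isdir1,
			       const char *name2, int len2, int isdir2)
{
	int len = len1 < len2 ? len1 : len2;
	unsigned char c1, c2;
	int cmp = memcmp(name1, name2, len);

	if (cmp)
		return cmp;
	/* Next character after the shared prefix, or '\0' at the end. */
	c1 = len < len1 ? (unsigned char)name1[len] : '\0';
	c2 = len < len2 ? (unsigned char)name2[len] : '\0';
	if (!c1 && isdir1)
		c1 = '/';
	if (!c2 && isdir2)
		c2 = '/';
	return (c1 > c2) - (c1 < c2);
}
```

Under this rule a directory foo sorts after the file foo.c, because '/' (0x2f) is greater than '.' (0x2e). Treating foo as a regular file would sort it first instead. This is why passing S_IFREG for a sparse directory could misorder entries.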

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  2:03       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  12 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When walking trees using traverse_trees_recursive() and
unpack_callback(), we must not attempt to walk into a sparse directory
entry. There are no index entries within that directory to compare to
the tree object at that position, so skip over the entries of that tree.

This code is used in many places, so the only way to test it is to start
removing the command_requires_full_index option from one builtin at a
time and carefully test that its use of unpack_trees() behaves correctly
with a sparse-index. Such tests will be added by later changes.
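Skipping a sparse directory during a tree walk can be sketched with a hypothetical node type. The walk visits the sparse directory itself but does not descend, since the index holds no entries below it. This models the control flow only, not traverse_trees_recursive().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; not git's struct tree. */
struct sketch_node {
	const char *name;
	int is_sparse_dir;          /* collapsed into one index entry */
	struct sketch_node **kids;  /* NULL for files */
	size_t nr_kids;
};

/*
 * Count the nodes visited by a walk that treats a sparse directory
 * as a single leaf instead of recursing into it.
 */
static size_t sketch_walk(const struct sketch_node *n)
{
	size_t visited = 1;

	assert(n != NULL);
	if (n->is_sparse_dir)
		return visited; /* no index entries below this point */
	for (size_t i = 0; i < n->nr_kids; i++)
		visited += sketch_walk(n->kids[i]);
	return visited;
}
```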

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c     | 6 ++++++
 unpack-trees.c | 7 +++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..d5e7e01132ee 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -322,6 +322,9 @@ static void show_new_file(struct rev_info *revs,
 	unsigned int mode;
 	unsigned dirty_submodule = 0;
 
+	if (S_ISSPARSEDIR(new_file->ce_mode))
+		return;
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -343,6 +346,9 @@ static int show_modified(struct rev_info *revs,
 	const struct object_id *oid;
 	unsigned dirty_submodule = 0;
 
+	if (S_ISSPARSEDIR(new_entry->ce_mode))
+		return 0;
+
 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..703b0bdc9dfd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned unpack_tree = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_tree &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (unpack_tree &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
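Here is a minimal C sketch of the path rewrite described above, assuming a NUL-terminated buffer. `sketch_add_fake_filename()` is hypothetical and mirrors only the check in the patch: a path ending in '/' gains a "-" placeholder.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: if buf names a directory (ends in '/'), append
 * the placeholder file name "-" so that file-based cone matching can
 * be reused. Returns 0 on success, -1 if the buffer is too small.
 */
static int sketch_add_fake_filename(char *buf, size_t cap)
{
	size_t len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len == 0 || buf[len - 1] != '/')
		return 0; /* already a file path: leave unchanged */
	if (len + 2 > cap)
		return -1; /* no room for "-" plus the terminator */
	buf[len] = '-';
	buf[len + 1] = '\0';
	return 0;
}
```

For example, folder1/ becomes folder1/-, which the file-based logic then treats as a file directly inside folder1. A file path such as folder1/a is left untouched.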

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
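The following hypothetical C sketch shows the reporting decision, reusing the two sentinel values from the patch. It assumes the non-sparse-index percentage is computed as 100 minus the share of skip-worktree entries. With a sparse index, cache_nr counts each sparse directory as a single entry, so that ratio would no longer describe files.

```c
#include <assert.h>

#define SPARSE_CHECKOUT_DISABLED (-1)
#define SPARSE_CHECKOUT_SPARSE_INDEX (-2)

/*
 * Hypothetical helper: percentage of tracked files present in the
 * worktree, or a sentinel when the figure is not computed.
 */
static int sketch_sparse_percentage(int sparse_enabled, int sparse_index,
				    int cache_nr, int skip_worktree)
{
	if (!sparse_enabled || cache_nr <= 0)
		return SPARSE_CHECKOUT_DISABLED; /* also avoids division by 0 */
	if (sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX; /* entries are not files */
	return 100 - (100 * skip_worktree) / cache_nr;
}
```

For instance, 3 skip-worktree entries out of 10 gives 70 in this model. The same counts with a sparse index yield the SPARSE_CHECKOUT_SPARSE_INDEX sentinel.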

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not need this message to
be identical across both modes; instead, we compare only the important
information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fba98d5484ae..34dae7fbcadd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -202,6 +202,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 10/12] status: use sparse-index throughout
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.
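The percentages in the table are relative changes in wall-clock time. A minimal C helper, hypothetical and meant only for checking the arithmetic, reproduces them.

```c
#include <assert.h>

/* Relative change from before to after, in percent (negative = faster). */
static double sketch_percent_change(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (after - before) / before;
}
```

For example, 0.34s to 0.05s is about -85.3%, 2.35s to 0.04s is about -98.3%, and 0.31s to 0.34s is about +9.7%.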

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 34dae7fbcadd..59faf7381093 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,12 +479,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  2:27       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
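The expansion can be sketched in C with a hypothetical in-memory tree type standing in for git's tree objects. It plays the role that read_tree_at() and add_file_to_list() play in the patch: it joins each blob name onto the directory prefix and recurses into subtrees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tree node standing in for a git tree object. */
struct sketch_tree {
	const char *name;            /* entry name, "/"-suffixed for trees */
	struct sketch_tree **kids;   /* NULL for blobs */
	size_t nr_kids;
};

/*
 * Write the full path of every blob below t into out[], prefixing
 * each with base. Returns the number of paths written (at most max).
 */
static size_t sketch_collect_added(const struct sketch_tree *t, const char *base,
				   char out[][64], size_t max)
{
	size_t n = 0;
	char path[64];

	assert(t != NULL && base != NULL);
	snprintf(path, sizeof(path), "%s%s", base, t->name);
	if (!t->kids) {
		if (max > 0) {
			strcpy(out[0], path); /* both buffers are 64 bytes */
			n = 1;
		}
		return n;
	}
	for (size_t i = 0; i < t->nr_kids && n < max; i++)
		n += sketch_collect_added(t->kids[i], path, out + n, max - n);
	return n;
}
```

In this model, expanding a sparse directory folder1/ that contains a and 0/x reports folder1/0/x and folder1/a as added. Those are the same paths a full index would list as separate entries.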

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 59faf7381093..cd3669d36b53 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+	test_all_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v3 12/12] fsmonitor: integrate with sparse index
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (10 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.
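To illustrate why the FS Monitor data goes stale, here is a toy model in C. It assumes, as I read fsmonitor.c, that bit i of the dirty bitmap refers to the i-th cache entry. The names struct toy_index, dirty_entry() and reset_dirty() are hypothetical and are not Git APIs. Expanding the index shifts entry positions, so a bit recorded against the sparse index can end up pointing at an unrelated path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: a dirty bitmap whose bit i refers to cache entry i,
 * in the spirit of the FS Monitor extension's dirty bitmap.
 */
struct toy_index {
	const char **names;   /* sorted entry names */
	size_t nr;
	unsigned long dirty;  /* bit i set => names[i] may be modified */
};

/* Return the name that dirty bit `bit` refers to, or NULL if it is clear. */
static const char *dirty_entry(const struct toy_index *idx, size_t bit)
{
	assert(bit < 8 * sizeof(idx->dirty));
	if (bit >= idx->nr || !(idx->dirty & (1UL << bit)))
		return NULL;
	return idx->names[bit];
}

/* What the patch does on expansion: forget the stale bitmap entirely. */
static void reset_dirty(struct toy_index *idx)
{
	idx->dirty = 0;
}
```

Clearing the bitmap, as the patch does with FREE_AND_NULL(istate->fsmonitor_dirty), costs one slower status run but never trusts a misaligned bit.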

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-18  1:33       ` Elijah Newren
  2021-05-18 14:57         ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-18  1:33 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When creating a full index from a sparse one, we create cache entries
> for every blob within a given sparse directory entry. These are
> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> correctly written to disk in the case that the index is not converted
> back down to a sparse-index.

This seems odd to me.  When sparse-index is not involved and we are
just doing simple sparse checkouts, do we mark CE_SKIP_WORKTREE
entries with CE_EXTENDED?  I can't find any code that does so.

Is it possible that the setting of CE_EXTENDED is just a workaround
that happens to force the index to be written in cases where the logic
is otherwise thinking it can get away without one?  Or is there
something I'm missing about why the CE_EXTENDED flag is actually
needed here?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 1b49898d0cb7..b2b3fbd75050 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
>         strbuf_addstr(base, path);
>
>         ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
> -       ce->ce_flags |= CE_SKIP_WORKTREE;
> +       ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>         set_index_entry(istate, istate->cache_nr++, ce);
>
>         strbuf_setlen(base, len);
> --
> gitgitgadget
>


* Re: [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-18  1:49       ` Elijah Newren
  2021-05-18 14:59         ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-18  1:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As more features integrate with the sparse-index feature, more and more
> special cases arise that require different data shapes within the tree
> structure of the repository in order to demonstrate those cases.
>
> Add several interesting special cases all at once instead of sprinkling
> them across several commits. The interesting cases being added here are:
>
> * Add sparse-directory entries on both sides of directories within the
>   sparse-checkout definition.
>
> * Add directories outside the sparse-checkout definition that have only
>   one entry and are the first entry of a directory with multiple
>   entries.
>
> Later tests will take advantage of these shapes, but they also deepen
> the tests that already exist.

Makes sense.  Do we also want to add ones of the form

   foo/bar
   foo.txt

?

Here we'd be particularly looking that if foo is a sparse directory,
we want to avoid messing up its order.  ('foo' sorts before 'foo.txt',
but 'foo/' sorts after, and thus 'foo' the directory should be after
'foo.txt')
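A small sketch of the ordering concern, assuming plain byte-wise strcmp() order over full paths and that a sparse directory entry is stored with its trailing slash (as "foo/"). The helpers cmp_names() and sorted_position() are illustrative only. Since '.' (0x2E) sorts before '/' (0x2F), "foo.txt" comes before both "foo/bar" and "foo/", whereas a directory recorded as a bare "foo" would wrongly sort first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of C strings, byte-wise like strcmp(). */
static int cmp_names(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;

	return strcmp(*x, *y);
}

/* Sort `names` in place and return the position of `needle`, or -1. */
static int sorted_position(const char **names, size_t nr, const char *needle)
{
	assert(names);
	qsort(names, nr, sizeof(*names), cmp_names);
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(names[i], needle))
			return (int)i;
	return -1;
}
```

With the full index ("foo/bar", "foo.txt") and the sparse index ("foo/", "foo.txt"), the directory lands in position 1 both times; only the slash-less spelling moves it to position 0.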


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4f2f09b53a32..98257695979a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -17,7 +17,7 @@ test_expect_success 'setup' '
>                 echo "after folder1" >g &&
>                 echo "after x" >z &&
>                 mkdir folder1 folder2 deep x &&
> -               mkdir deep/deeper1 deep/deeper2 &&
> +               mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
>                 mkdir deep/deeper1/deepest &&
>                 echo "after deeper1" >deep/e &&
>                 echo "after deepest" >deep/deeper1/e &&
> @@ -25,10 +25,16 @@ test_expect_success 'setup' '
>                 cp a folder2 &&
>                 cp a x &&
>                 cp a deep &&
> +               cp a deep/before &&
>                 cp a deep/deeper1 &&
>                 cp a deep/deeper2 &&
> +               cp a deep/later &&
>                 cp a deep/deeper1/deepest &&
>                 cp -r deep/deeper1/deepest deep/deeper2 &&
> +               mkdir deep/deeper1/0 &&
> +               mkdir deep/deeper1/0/0 &&
> +               touch deep/deeper1/0/1 &&
> +               touch deep/deeper1/0/0/0 &&
>                 git add . &&
>                 git commit -m "initial commit" &&
>                 git checkout -b base &&
> --
> gitgitgadget

Looks good.


* Re: [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-18  2:03       ` Elijah Newren
  2021-05-18  2:06         ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-18  2:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When walking trees using traverse_trees_recursive() and
> unpack_callback(), we must not attempt to walk into a sparse directory
> entry. There are no index entries within that directory to compare to
> the tree object at that position, so skip over the entries of that tree.
>
> This code is used in many places, so the only way to test it is to start
> removing the command_requires_full_index option from one builtin at a
> time and carefully test that its use of unpack_trees() behaves correctly
> with a sparse-index. Such tests will be added by later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c     | 6 ++++++
>  unpack-trees.c | 7 +++++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index b73cc1859a49..d5e7e01132ee 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -322,6 +322,9 @@ static void show_new_file(struct rev_info *revs,
>         unsigned int mode;
>         unsigned dirty_submodule = 0;
>
> +       if (S_ISSPARSEDIR(new_file->ce_mode))
> +               return;
> +

Makes sense, but is this related to the unpack-trees.c changes and the
commit message, or should it be in a separate commit?

>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -343,6 +346,9 @@ static int show_modified(struct rev_info *revs,
>         const struct object_id *oid;
>         unsigned dirty_submodule = 0;
>
> +       if (S_ISSPARSEDIR(new_entry->ce_mode))
> +               return 0;
> +

Same question as above.  And a few more questions...

What if the old commit/tree had a file at this path, and the new
commit/tree has a (sparse) directory at this path?  Shouldn't
_something_ be shown for the file deletion?  Or does such a case not
run through this code path?

Also, wouldn't we expect it to be an error for show_modified() to be
called on a sparse directory?  If two sparse directories differed, we
should have inflated the trees to find the differences in the path
underneath them, right?  And if they didn't differ, then
show_modified() should not have been invoked?

I can see cases where we wouldn't want to bother looking at the
differences between two sparse directories, e.g. a
--restrict-to-sparsity-paths option to diff/log/etc, but I don't see
you setting this behind an option here.

>         if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ef6a2b1c951c..703b0bdc9dfd 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned unpack_tree = 1;
>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_tree &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (unpack_tree &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;
> --
> gitgitgadget

The unpack-trees.c changes make sense to me still.


* Re: [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-18  2:03       ` Elijah Newren
@ 2021-05-18  2:06         ` Elijah Newren
  2021-05-18 19:20           ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-18  2:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

Sorry, I spoke too soon...

On Mon, May 17, 2021 at 7:03 PM Elijah Newren <newren@gmail.com> wrote:
>
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index ef6a2b1c951c..703b0bdc9dfd 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> >         struct unpack_trees_options *o = info->data;
> >         const struct name_entry *p = names;
> > +       unsigned unpack_tree = 1;

Here, you set unpack_tree to 1.

> >
> >         /* Find first entry with a real name (we could use "mask" too) */
> >         while (!p->mode)
> > @@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >                 }
> >         }
> >
> > -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> > +       if (unpack_tree &&

You check its value here...

> > +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> >                 return -1;
> >
> >         if (o->merge && src[0]) {
> > @@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >                         }
> >                 }
> >
> > -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> > +               if (unpack_tree &&
...and here...

> > +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >                                              names, info) < 0)
> >                         return -1;
> >                 return mask;

but you never set unpack_tree to 0, so this is wasted effort and you
always recurse.  The previous iteration had a case where it'd set
unpack_tree to 0 in a certain case, but you deleted that code in this
version.  Why?
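For reference, a hypothetical sketch of where such a flag would need to be cleared for the guards to mean anything. The toy_entry type and should_recurse() are made up for illustration, and the condition (the index entry at this position is a sparse directory) is an assumption about what the earlier iteration checked, not a quote of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache entry: a name and whether it is a sparse dir. */
struct toy_entry {
	const char *name;
	bool is_sparse_dir;
};

/*
 * Decide whether to descend into the tree at this position. The flag
 * starts at 1, as in the patch, but is cleared when the index already
 * holds this directory as a single sparse-directory entry; without such
 * a branch, every "if (unpack_tree && ...)" guard is always true.
 */
static int should_recurse(const struct toy_entry *index_entry)
{
	int unpack_tree = 1;

	assert(!index_entry || index_entry->name);
	if (index_entry && index_entry->is_sparse_dir)
		unpack_tree = 0;
	return unpack_tree;
}
```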


* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-18  2:27       ` Elijah Newren
  2021-05-18 18:26         ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-18  2:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> It is difficult, but possible, to get into a state where we intend to
> add a directory that is outside of the sparse-checkout definition. Add a

Then we need to fix that; allowing things to be added outside the
sparse-checkout definition is a bug[1][2].  That's an invariant I
believe we should maintain everywhere; things get really confusing to
users somewhere later down the road if we don't.  Matheus worked to
fix that with 'git add'; if there are other commands that need fixing
too, then we should also fix them.

[1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/

> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> using a combination of 'git reset --mixed' and 'git checkout --orphan'.

I think `git checkout --orphan` should just throw an error if
sparse-checkout is in use.  Allowing adding paths outside the
sparse-checkout set causes too much collateral and deferred confusion
for users.

> This test failed before because the output of 'git status
> --porcelain=v2' would not match on the lines for folder1/:
>
> * The sparse-checkout repo (with a full index) would output each path
>   name that is intended to be added.
>
> * The sparse-index repo would only output that "folder1/" is staged for
>   addition.
>
> The status should report the full list of files to be added, and so this
> sparse-directory entry should be expanded to a full list when reaching
> it inside the wt_status_collect_changes_initial() method. Use
> read_tree_at() to assist.

Having a sparse directory entry whose object_id in the index does not
match HEAD should be an error.  Having a CE_SKIP_WORKTREE non-directory
whose object_id in the index does not match HEAD should also be an
error.  I don't think we should complicate the code to try to handle
violations of those assumptions.  I do think we should add checks to
enforce that constraint (or BUG() if it's violated).

And yeah, that also means 'git sparse-checkout add/set' would need to
error out if paths are requested to be sparsified despite being
different from HEAD.
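A rough sketch of the kind of consistency check being proposed, using simplified stand-in types (toy_ce, toy_head_entry) rather than Git's struct cache_entry and struct object_id, with hex strings in place of object IDs. It counts skip-worktree or sparse-directory entries whose object name differs from HEAD, or that are absent from HEAD entirely; a real check would presumably call BUG() or die() instead of counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified records standing in for index and HEAD entries. */
struct toy_ce {
	const char *name;
	const char *oid;     /* hex object name, simplified */
	bool skip_worktree;  /* CE_SKIP_WORKTREE or a sparse directory entry */
};

struct toy_head_entry {
	const char *name;
	const char *oid;
};

/* Look up `name` in HEAD; NULL means the path does not exist there. */
static const char *head_oid(const struct toy_head_entry *head, size_t nr,
			    const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(head[i].name, name))
			return head[i].oid;
	return NULL;
}

/* Count entries violating "skip-worktree entries must match HEAD". */
static size_t count_sparse_violations(const struct toy_ce *ce, size_t ce_nr,
				      const struct toy_head_entry *head,
				      size_t head_nr)
{
	size_t bad = 0;

	assert(ce || !ce_nr);
	for (size_t i = 0; i < ce_nr; i++) {
		const char *oid;

		if (!ce[i].skip_worktree)
			continue;
		oid = head_oid(head, head_nr, ce[i].name);
		if (!oid || strcmp(oid, ce[i].oid))
			bad++;
	}
	return bad;
}
```

Under this rule, the state reached by 'git checkout --orphan' in the test above would be flagged, because no sparse entry can match a HEAD that has no tree yet.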

> Somehow, this loop over the cache entries was not guarded by
> ensure_full_index() as intended.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
>  wt-status.c                              | 50 ++++++++++++++++++++++++
>  2 files changed, 78 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 59faf7381093..cd3669d36b53 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +test_expect_success 'reset mixed and checkout orphan' '
> +       init_repos &&
> +
> +       test_all_match git checkout rename-out-to-in &&
> +       test_all_match git reset --mixed HEAD~1 &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # At this point, sparse-checkouts behave differently
> +       # from the full-checkout.
> +       test_sparse_match git checkout --orphan new-branch &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git status --porcelain=v2
> +'
> +
> +test_expect_success 'add everything with deep new file' '
> +       init_repos &&
> +
> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
> +
> +       run_on_all touch deep/deeper1/x &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  test_done
> diff --git a/wt-status.c b/wt-status.c
> index 0425169c1895..90db8bd659fa 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>         run_diff_index(&rev, 1);
>  }
>
> +static int add_file_to_list(const struct object_id *oid,
> +                           struct strbuf *base, const char *path,
> +                           unsigned int mode, void *context)
> +{
> +       struct string_list_item *it;
> +       struct wt_status_change_data *d;
> +       struct wt_status *s = context;
> +       char *full_name;
> +
> +       if (S_ISDIR(mode))
> +               return READ_TREE_RECURSIVE;
> +
> +       full_name = xstrfmt("%s%s", base->buf, path);
> +       it = string_list_insert(&s->change, full_name);
> +       d = it->util;
> +       if (!d) {
> +               CALLOC_ARRAY(d, 1);
> +               it->util = d;
> +       }
> +
> +       d->index_status = DIFF_STATUS_ADDED;
> +       /* Leave {mode,oid}_head zero for adds. */
> +       d->mode_index = mode;
> +       oidcpy(&d->oid_index, oid);
> +       s->committable = 1;
> +       return 0;
> +}
> +
>  static void wt_status_collect_changes_initial(struct wt_status *s)
>  {
>         struct index_state *istate = s->repo->index;
> @@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
>                         continue;
>                 if (ce_intent_to_add(ce))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode)) {
> +                       /*
> +                        * This is a sparse directory entry, so we want to collect all
> +                        * of the added files within the tree. This requires recursively
> +                        * expanding the trees to find the elements that are new in this
> +                        * tree and marking them with DIFF_STATUS_ADDED.
> +                        */
> +                       struct strbuf base = STRBUF_INIT;
> +                       struct pathspec ps;
> +                       struct tree *tree = lookup_tree(istate->repo, &ce->oid);
> +
> +                       memset(&ps, 0, sizeof(ps));
> +                       ps.recursive = 1;
> +                       ps.has_wildcard = 1;
> +                       ps.max_depth = -1;
> +
> +                       strbuf_add(&base, ce->name, ce->ce_namelen);
> +                       read_tree_at(istate->repo, tree, &base, &ps,
> +                                    add_file_to_list, s);
> +                       continue;
> +               }
> +
>                 it = string_list_insert(&s->change, ce->name);
>                 d = it->util;
>                 if (!d) {
> --
> gitgitgadget

It was a really nice catch to come up with this particular testcase.
While I disagree with the fix, I do have to say nice work on the catch
and the implementation otherwise.
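To make the expansion in add_file_to_list() easier to follow, here is a self-contained model of the callback contract it relies on: return READ_TREE_RECURSIVE for directories and record files under the accumulated base prefix. The toy_tree type, toy_read_tree_at() and collect_added() are hypothetical stand-ins, not Git's read_tree_at(). Like the patch, the sketch assumes the base starts as the sparse directory name including its trailing slash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for READ_TREE_RECURSIVE. */
#define TOY_READ_TREE_RECURSIVE 1

/* Hypothetical in-memory tree: directories have is_dir set and children. */
struct toy_tree {
	const char *name;
	bool is_dir;
	const struct toy_tree *kids;
	size_t nr_kids;
};

typedef int (*toy_read_tree_fn)(const char *base,
				const struct toy_tree *entry, void *ctx);

/*
 * Visit each child of `t`, calling `fn` with the current base prefix.
 * When `fn` asks for recursion, append "name/" to `base`, descend, then
 * truncate back. `base` holds `cap` bytes and already ends in '/'.
 */
static void toy_read_tree_at(const struct toy_tree *t, char *base, size_t cap,
			     toy_read_tree_fn fn, void *ctx)
{
	size_t len = strlen(base);

	for (size_t i = 0; i < t->nr_kids; i++) {
		const struct toy_tree *kid = &t->kids[i];
		int n;

		if (fn(base, kid, ctx) != TOY_READ_TREE_RECURSIVE || !kid->is_dir)
			continue;
		n = snprintf(base + len, cap - len, "%s/", kid->name);
		assert(n > 0 && (size_t)n < cap - len);
		toy_read_tree_at(kid, base, cap, fn, ctx);
		base[len] = '\0';
	}
}

/* Collected "added" paths, standing in for wt_status's change list. */
struct added_list {
	char paths[16][64];
	size_t nr;
};

/* Modeled on add_file_to_list(): recurse into directories, record files. */
static int collect_added(const char *base, const struct toy_tree *entry,
			 void *ctx)
{
	struct added_list *list = ctx;

	if (entry->is_dir)
		return TOY_READ_TREE_RECURSIVE;
	assert(list->nr < 16);
	snprintf(list->paths[list->nr++], sizeof(list->paths[0]), "%s%s",
		 base, entry->name);
	return 0;
}
```

Starting from base "folder1/" over a tree holding "a" and "sub/c", this records "folder1/a" and "folder1/sub/c" and leaves the base restored to "folder1/".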


* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-18  1:33       ` Elijah Newren
@ 2021-05-18 14:57         ` Derrick Stolee
  2021-05-18 17:48           ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-05-18 14:57 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 9:33 PM, Elijah Newren wrote:
> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When creating a full index from a sparse one, we create cache entries
>> for every blob within a given sparse directory entry. These are
>> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
>> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
>> correctly written to disk in the case that the index is not converted
>> back down to a sparse-index.
> 
> This seems odd to me.  When sparse-index is not involved and we are
> just doing simple sparse checkouts, do we mark CE_SKIP_WORKTREE
> entries with CE_EXTENDED?  I can't find any code that does so.
> 
> Is it possible that the setting of CE_EXTENDED is just a workaround
> that happens to force the index to be written in cases where the logic
> is otherwise thinking it can get away without one?  Or is there
> something I'm missing about why the CE_EXTENDED flag is actually
> needed here?

This is happening within the context of ensure_full_index(), so we
are creating new cache entries and want to mimic what they would
look like on-disk. Something within do_write_index() discovers that
since CE_SKIP_WORKTREE is set, then also CE_EXTENDED should be set
in order to ensure that the on-disk representation has enough room
for the CE_SKIP_WORKTREE bit.

I suppose this might not have a meaningful purpose other than when
I compare a full index against an expanded sparse-index and check
if their flags match.

Thanks,
-Stolee


* Re: [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-18  1:49       ` Elijah Newren
@ 2021-05-18 14:59         ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-05-18 14:59 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 9:49 PM, Elijah Newren wrote:
> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Later tests will take advantage of these shapes, but they also deepen
>> the tests that already exist.
> 
> Makes sense.  Do we also want to add ones of the form
> 
>    foo/bar
>    foo.txt
> 
> ?
> 
> Here we'd be particularly looking that if foo is a sparse directory,
> we want to avoid messing up its order.  ('foo' sorts before 'foo.txt',
> but 'foo/' sorts after, and thus 'foo' the directory should be after
> 'foo.txt')

Good idea!

Thanks,
-Stolee


* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-18 14:57         ` Derrick Stolee
@ 2021-05-18 17:48           ` Elijah Newren
  2021-05-18 18:16             ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-05-18 17:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Tue, May 18, 2021 at 7:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/17/2021 9:33 PM, Elijah Newren wrote:
> > On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> When creating a full index from a sparse one, we create cache entries
> >> for every blob within a given sparse directory entry. These are
> >> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> >> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> >> correctly written to disk in the case that the index is not converted
> >> back down to a sparse-index.
> >
> > This seems odd to me.  When sparse-index is not involved and we are
> > just doing simple sparse checkouts, do we mark CE_SKIP_WORKTREE
> > entries with CE_EXTENDED?  I can't find any code that does so.
> >
> > Is it possible that the setting of CE_EXTENDED is just a workaround
> > that happens to force the index to be written in cases where the logic
> > is otherwise thinking it can get away without one?  Or is there
> > something I'm missing about why the CE_EXTENDED flag is actually
> > needed here?
>
> This is happening within the context of ensure_full_index(), so we
> are creating new cache entries and want to mimic what they would
> look like on-disk. Something within do_write_index() discovers that
> since CE_SKIP_WORKTREE is set, then also CE_EXTENDED should be set
> in order to ensure that the on-disk representation has enough room
> for the CE_SKIP_WORKTREE bit.

Yeah, I think it's this part:

        /* reduce extended entries if possible */
        cache[i]->ce_flags &= ~CE_EXTENDED;
        if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
            extended++;
            cache[i]->ce_flags |= CE_EXTENDED;
        }

>
> I suppose this might not have a meaningful purpose other than when
> I compare a full index against an expanded sparse-index and check
> if their flags match.

Ah, you're just setting this flag in advance of do_write_index() being
called so that you can compare in memory values and check they match
without doing a write-to-disk-and-read-back cycle.  Makes sense, but
it'd be nice to see this in the commit message.
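A minimal model of the normalization quoted above, assuming the flag values from Git's cache.h at the time of this series (CE_EXTENDED is 0x4000, CE_INTENT_TO_ADD is bit 29, CE_SKIP_WORKTREE is bit 30). normalize_extended() is a hypothetical helper. It shows that an expanded entry carrying only CE_SKIP_WORKTREE differs in memory from one that has been through the write path, while setting CE_EXTENDED up front makes the two match.

```c
#include <assert.h>

/* Values as defined in Git's cache.h around this series (re-check before relying on them). */
#define CE_EXTENDED       (0x4000)
#define CE_INTENT_TO_ADD  (1 << 29)
#define CE_SKIP_WORKTREE  (1 << 30)
#define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)

/* Same logic as the "reduce extended entries if possible" step in do_write_index(). */
static unsigned int normalize_extended(unsigned int ce_flags)
{
	ce_flags &= ~(unsigned int)CE_EXTENDED;
	if (ce_flags & CE_EXTENDED_FLAGS)
		ce_flags |= CE_EXTENDED;
	return ce_flags;
}
```

Normalizing CE_SKIP_WORKTREE alone yields CE_SKIP_WORKTREE | CE_EXTENDED, which is exactly what the patch now sets during expansion, so an in-memory flag comparison agrees without a write-and-read-back cycle.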


* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-18 17:48           ` Elijah Newren
@ 2021-05-18 18:16             ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-05-18 18:16 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 5/18/2021 1:48 PM, Elijah Newren wrote:
> On Tue, May 18, 2021 at 7:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 5/17/2021 9:33 PM, Elijah Newren wrote:
>>> Is it possible that the setting of CE_EXTENDED is just a workaround
>>> that happens to force the index to be written in cases where the logic
>>> is otherwise thinking it can get away without one?  Or is there
>>> something I'm missing about why the CE_EXTENDED flag is actually
>>> needed here?
>>
>> This is happening within the context of ensure_full_index(), so we
>> are creating new cache entries and want to mimic what they would
>> look like on-disk. Something within do_write_index() discovers that
>> since CE_SKIP_WORKTREE is set, then also CE_EXTENDED should be set
>> in order to ensure that the on-disk representation has enough room
>> for the CE_SKIP_WORKTREE bit.
> 
> Yeah, I think it's this part:
> 
>         /* reduce extended entries if possible */
>         cache[i]->ce_flags &= ~CE_EXTENDED;
>         if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
>             extended++;
>             cache[i]->ce_flags |= CE_EXTENDED;
>         }
> 
>>
>> I suppose this might not have a meaningful purpose other than when
>> I compare a full index against an expanded sparse-index and check
>> if their flags match.
> 
> Ah, you're just setting this flag in advance of do_write_index() being
> called so that you can compare in memory values and check they match
> without doing a write-to-disk-and-read-back cycle.  Makes sense, but
> it'd be nice to see this in the commit message.

Will do. Thanks,

-Stolee

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-18  2:27       ` Elijah Newren
@ 2021-05-18 18:26         ` Derrick Stolee
  2021-05-18 19:04           ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-05-18 18:26 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 10:27 PM, Elijah Newren wrote:
> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> It is difficult, but possible, to get into a state where we intend to
>> add a directory that is outside of the sparse-checkout definition. Add a
> 
> Then we need to fix that; allowing things to be added outside the
> sparse-checkout definition is a bug[1][2].  That's an invariant I
> believe we should maintain everywhere; things get really confusing to
> users somewhere later down the road if we don't.  Matheus worked to
> fix that with 'git add'; if there are other commands that need fixing
> too, then we should also fix them.
> 
> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/
> 
>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
>> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
> 
> I think `git checkout --orphan` should just throw an error if
> sparse-checkout is in use.  Allowing adding paths outside the
> sparse-checkout set causes too much collateral and deferred confusion
> for users.

I've been trying to strike an interesting balance of creating
performance improvements without changing behavior, trying to
defer those behavior changes to an isolated instance. I think
that approach is unavoidable with the 'git add' work that I
pulled out of this series and will return to soon.

However, I think it would be too much to start throwing an error
in this case.

The thing I can try to do, instead of the current approach, is
to not allow sparse directory entries to differ between the
index and HEAD. That will satisfy this case, but also a lot of
other painful cases.

I have no idea how to actually accomplish that, but I'll start
digging.

>> This test failed before because the output of 'git status
>> --porcelain=v2' would not match on the lines for folder1/:
>>
>> * The sparse-checkout repo (with a full index) would output each path
>>   name that is intended to be added.
>>
>> * The sparse-index repo would only output that "folder1/" is staged for
>>   addition.
>>
>> The status should report the full list of files to be added, and so this
>> sparse-directory entry should be expanded to a full list when reaching
>> it inside the wt_status_collect_changes_initial() method. Use
>> read_tree_at() to assist.
> 
> Having a sparse directory entry whose object_id in the index does not
> match HEAD should be an error.

I can get behind this understanding.

>  Having a CE_SKIP_WORKTREE non-directory
> whose object_id in the index does not match HEAD should also be an
> error.

I'm less convinced here. At minimum, I'm not willing to stake
a firm claim and change the behavior around this statement in
the current series.

>  I don't think we should complicate the code to try to handle
> violations of those assumptions.  I do think we should add checks to
> enforce that constraint (or BUG() if it's violated).

A BUG() is likely too strict, because existing Git clients can
get users into this state, and then they upgrade and are suddenly
in a BUG() state. We should perhaps do our best effort to avoid
this case and handle it as appropriately as possible.

> And yeah, that also means 'git sparse-checkout add/set' would need to
> error out if paths are requested to be sparsified despite being
> different from HEAD.

This would be a reasonable thing, assuming the established
behavior is changed.

>> Somehow, this loop over the cache entries was not guarded by
>> ensure_full_index() as intended.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
>>  wt-status.c                              | 50 ++++++++++++++++++++++++
>>  2 files changed, 78 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 59faf7381093..cd3669d36b53 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
>>         test_region ! index ensure_full_index trace2.txt
>>  '
>>
>> +test_expect_success 'reset mixed and checkout orphan' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout rename-out-to-in &&
>> +       test_all_match git reset --mixed HEAD~1 &&
>> +       test_sparse_match test-tool read-cache --table --expand &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git status --porcelain=v2 &&
>> +
>> +       # At this point, sparse-checkouts behave differently
>> +       # from the full-checkout.
>> +       test_sparse_match git checkout --orphan new-branch &&
>> +       test_sparse_match test-tool read-cache --table --expand &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_sparse_match git status --porcelain=v2
>> +'
>> +
>> +test_expect_success 'add everything with deep new file' '
>> +       init_repos &&
>> +
>> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
>> +
>> +       run_on_all touch deep/deeper1/x &&
>> +       test_all_match git add . &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git status --porcelain=v2
>> +'
>
> This was a really nice catch that you got this particular testcase.
> While I disagree with the fix, I do have to say nice work on the catch
> and the implementation otherwise.

This test exists almost verbatim in the Scalar and VFS For Git
functional tests. I have no idea what context caused it to be
necessary.

I can understand your aversion to the solution I presented here.
Preventing sparse directory entries that differ from the tree at
HEAD for that path should be more robust to future integrations.

Thanks,
-Stolee

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-18 18:26         ` Derrick Stolee
@ 2021-05-18 19:04           ` Derrick Stolee
  2021-05-19  8:38             ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-05-18 19:04 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/18/2021 2:26 PM, Derrick Stolee wrote:
> On 5/17/2021 10:27 PM, Elijah Newren wrote:
>> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> It is difficult, but possible, to get into a state where we intend to
>>> add a directory that is outside of the sparse-checkout definition. Add a
>>
>> Then we need to fix that; allowing things to be added outside the
>> sparse-checkout definition is a bug[1][2].  That's an invariant I
>> believe we should maintain everywhere; things get really confusing to
>> users somewhere later down the road if we don't.  Matheus worked to
>> fix that with 'git add'; if there are other commands that need fixing
>> too, then we should also fix them.
>>
>> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
>> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/
>>
>>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
>>> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
>>
>> I think `git checkout --orphan` should just throw an error if
>> sparse-checkout is in use.  Allowing adding paths outside the
>> sparse-checkout set causes too much collateral and deferred confusion
>> for users.
> 
> I've been trying to strike an interesting balance of creating
> performance improvements without changing behavior, trying to
> defer those behavior changes to an isolated instance. I think
> that approach is unavoidable with the 'git add' work that I
> pulled out of this series and will return to soon.
> 
> However, I think it would be too much to start throwing an error
> in this case.
> 
> The thing I can try to do, instead of the current approach, is
> to not allow sparse directory entries to differ between the
> index and HEAD. That will satisfy this case, but also a lot of
> other painful cases.
> 
> I have no idea how to actually accomplish that, but I'll start
> digging.

It didn't take much digging to discover that this is likely
impossible, or rather it would be a drastic change to make this
happen.

The immediate issue is trying to prevent sparse directory entries
from existing when the contained paths don't match what exists at
HEAD. However, in the 'git checkout --orphan' case, we are using
a full index for the unpack_trees() that updates the in-memory
index according to the paths at HEAD, then updates HEAD to point
to a non-existing ref. The sparse directories are only created as
part of convert_to_sparse() within do_write_index(). At that
point, there is no HEAD provided. Trying to load it from scratch
violates the fact that HEAD is being staged to change _after_ the
index updates in a command like 'git checkout'.

So, the drastic change to make this work would be to update the
index API to require a root tree to be provided whenever writing
the index. However, that doesn't make sense, either! What do we
do when in a conflicted state?

What if a user modifies HEAD manually to point to a new ref?

Such a change would couple the index to the concept of HEAD in
an unproductive way, I think. The index data structure exists
as a separate entity that is frequently _compared_ to HEAD, and
the solution in this patch keeps the comparison between a
sparse-index and HEAD the same as if we had a full index.

So, after looking into it, I'm back in favor of this change and
forever allowing sparse cache entries to differ from HEAD,
because there is no way to avoid it.

Thanks,
-Stolee

* Re: [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-18  2:06         ` Elijah Newren
@ 2021-05-18 19:20           ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-05-18 19:20 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 10:06 PM, Elijah Newren wrote:
> Sorry, I spoke too soon...
> 
> On Mon, May 17, 2021 at 7:03 PM Elijah Newren <newren@gmail.com> wrote:
>>
>>> diff --git a/unpack-trees.c b/unpack-trees.c
>>> index ef6a2b1c951c..703b0bdc9dfd 100644
>>> --- a/unpack-trees.c
>>> +++ b/unpack-trees.c
>>> @@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>>>         struct unpack_trees_options *o = info->data;
>>>         const struct name_entry *p = names;
>>> +       unsigned unpack_tree = 1;
> 
> Here, you set unpack_tree to 1.
> 
>>>
>>>         /* Find first entry with a real name (we could use "mask" too) */
>>>         while (!p->mode)
>>> @@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>>                 }
>>>         }
>>>
>>> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>>> +       if (unpack_tree &&
> 
> You check its value here...
> 
>>> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>>>                 return -1;
>>>
>>>         if (o->merge && src[0]) {
>>> @@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>>                         }
>>>                 }
>>>
>>> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>> +               if (unpack_tree &&
> ...and here....
> 
>>> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>>                                              names, info) < 0)
>>>                         return -1;
>>>                 return mask;
> 
> but you never set unpack_tree to 0, so this is wasted effort and you
> always recurse.  The previous iteration had a case where it'd set
> unpack_tree to 0 in a certain case, but you deleted that code in this
> version.  Why?

It appears that the changes to unpack-trees.c are no longer relevant,
and instead the changes to diff-lib.c (which were already out of place)
should instead be the focus. In fact, those changes to diff-lib.c can
be simplified and moved to patch 10, so I will do that.

Thanks,
-Stolee

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-18 19:04           ` Derrick Stolee
@ 2021-05-19  8:38             ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-05-19  8:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Tue, May 18, 2021 at 12:05 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/18/2021 2:26 PM, Derrick Stolee wrote:
> > On 5/17/2021 10:27 PM, Elijah Newren wrote:
> >> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> >> <gitgitgadget@gmail.com> wrote:
> >>>
> >>> From: Derrick Stolee <dstolee@microsoft.com>
> >>>
> >>> It is difficult, but possible, to get into a state where we intend to
> >>> add a directory that is outside of the sparse-checkout definition. Add a
> >>
> >> Then we need to fix that; allowing things to be added outside the
> >> sparse-checkout definition is a bug[1][2].  That's an invariant I
> >> believe we should maintain everywhere; things get really confusing to
> >> users somewhere later down the road if we don't.  Matheus worked to
> >> fix that with 'git add'; if there are other commands that need fixing
> >> too, then we should also fix them.
> >>
> >> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
> >> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/
> >>
> >>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> >>> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
> >>
> >> I think `git checkout --orphan` should just throw an error if
> >> sparse-checkout is in use.  Allowing adding paths outside the
> >> sparse-checkout set causes too much collateral and deferred confusion
> >> for users.
> >
> > I've been trying to strike an interesting balance of creating
> > performance improvements without changing behavior, trying to
> > defer those behavior changes to an isolated instance. I think
> > that approach is unavoidable with the 'git add' work that I
> > pulled out of this series and will return to soon.
> >
> > However, I think it would be too much to start throwing an error
> > in this case.
> >
> > The thing I can try to do, instead of the current approach, is
> > to not allow sparse directory entries to differ between the
> > index and HEAD. That will satisfy this case, but also a lot of
> > other painful cases.
> >
> > I have no idea how to actually accomplish that, but I'll start
> > digging.
>
> It didn't take much digging to discover that this is likely
> impossible, or rather it would be a drastic change to make this
> happen.
>
> The immediate issue is trying to prevent sparse directory entries
> from existing when the contained paths don't match what exists at
> HEAD. However, in the 'git checkout --orphan' case, we are using
> a full index for the unpack_trees() that updates the in-memory
> index according to the paths at HEAD, then updates HEAD to point
> to a non-existing ref. The sparse directories are only created as
> part of convert_to_sparse() within do_write_index(). At that
> point, there is no HEAD provided. Trying to load it from scratch
> violates the fact that HEAD is being staged to change _after_ the
> index updates in a command like 'git checkout'.
>
> So, the drastic change to make this work would be to update the
> index API to require a root tree to be provided whenever writing
> the index. However, that doesn't make sense, either! What do we
> do when in a conflicted state?
>
> What if a user modifies HEAD manually to point to a new ref?
>
> Such a change would couple the index to the concept of HEAD in
> an unproductive way, I think. The index data structure exists
> as a separate entity that is frequently _compared_ to HEAD, and
> the solution in this patch keeps the comparison between a
> sparse-index and HEAD the same as if we had a full index.
>
> So, after looking into it, I'm back in favor of this change and
> forever allowing sparse cache entries to differ from HEAD,
> because there is no way to avoid it.

Doh, thanks for digging in and entertaining the idea.  I'm worried
we'll get lots of confused users over the years from not being able to
do this, but you do make some good points.

I still think `git checkout --orphan` should be an error when in a
sparse checkout -- the point of a sparse checkout is that you only
care about a subset of files, whereas checkout --orphan fundamentally
says you are throwing away history but care about each and every file
since you are staging "changes" from all of them to include in some
new commit soon.  They just seem in strong opposition to me, and it
seems likely to surprise users who, despite the --orphan request and
fixing up the working directory how they like, get a new commit that
contains files that aren't in their working tree.  (In contrast,
`git switch --orphan` would
probably be fine in a sparse checkout, precisely because it really
does empty everything).  However, I do agree with you that such a
change belongs in a separate series.  So, yes, your patch is good, and
I'll raise the behavioral change later.

(Sorry for being slow to respond and still not getting to all your
good reviews of my series; I'm a bit limited in my time for git right
now...)

* [PATCH v4 00/12] Sparse-index: integrate with status
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (11 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59     ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                         ` (12 more replies)
  12 siblings, 13 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V4
=============

 * The previous patch "unpack-trees: stop recursing into sparse directories"
   was confusing, and actually a bit sloppy.
 * It has been replaced with "unpack-trees: be careful around sparse
   directory entries", which moves the sparse-directory checks higher up,
   into unpack-trees.c instead of diff-lib.c.


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature inherited from the virtualized environment of VFS
for Git. This version contains my fixes based on that investigation. Most of
these were easy to identify and fix, but I was blocked for a long time by a
bug that appears when combining the sparse-index with the builtin FS Monitor
feature; I have already reported my findings [1].

[1] https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status (for example, when a
     sparse directory is staged but does not exist at HEAD, as in an
     orphaned commit), we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare so the FS Monitor feature can work
     efficiently.


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (12):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: be careful around sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 dir.c                                    |  11 +++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++++-
 t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++++++
 unpack-trees.c                           |  26 ++++-
 wt-status.c                              |  64 ++++++++++++-
 wt-status.h                              |   1 +
 9 files changed, 295 insertions(+), 12 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v3:

  1:  5a2ed3d1d701 =  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  2:  8aa41e749471 =  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  3:  70971b1f9261 =  3:  70971b1f9261 t1092: expand repository data shape
  4:  a80b5a41153f =  4:  a80b5a41153f t1092: add tests for status/add and sparse files
  5:  07a45b661c4a =  5:  07a45b661c4a unpack-trees: preserve cache_bottom
  6:  cc4a526e7947 =  6:  cc4a526e7947 unpack-trees: compare sparse directories correctly
  7:  598375d3531f <  -:  ------------ unpack-trees: stop recursing into sparse directories
  -:  ------------ >  7:  e28df7f9395d unpack-trees: be careful around sparse directory entries
  8:  47da2b317237 =  8:  2cc3a93d4434 dir.c: accept a directory as part of cone-mode patterns
  9:  bc1512981493 =  9:  5011feb1aa04 status: skip sparse-checkout percentage with sparse-index
 10:  5b1ae369a7cd = 10:  9f2ce5301dc9 status: use sparse-index throughout
 11:  3b42783d4a86 = 11:  24417e095243 wt-status: expand added sparse directory entries
 12:  b72507f514d1 = 12:  584d4b559a91 fsmonitor: integrate with sparse index

-- 
gitgitgadget

* [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                         ` (11 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget


* [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                         ` (10 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.
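Why the flag matters comes from the on-disk index format. According to Git's index-format documentation, the skip-worktree bit lives in an optional second 16-bit flags word (index version 3 and later). That word is present only when the "extended" bit (0x4000) is set in the first flags word. The toy encoder below models only that rule. `struct toy_entry`, `encode_flags()` and `decoded_skip_worktree()` are hypothetical names for illustration, not Git's actual write path.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as described in Git's index-format documentation (v3+). */
#define ONDISK_EXTENDED      0x4000u /* first word: a second word follows */
#define ONDISK_NAME_MASK     0x0FFFu /* first word: name length (capped) */
#define ONDISK_STAGE_SHIFT   12      /* first word: 2-bit merge stage */
#define ONDISK_SKIP_WORKTREE 0x4000u /* second word: skip-worktree bit */

/* Hypothetical in-memory entry; 'extended' plays the role of CE_EXTENDED. */
struct toy_entry {
	unsigned name_len;
	unsigned stage;
	int skip_worktree;
	int extended;
};

/* Encode the flag words of one entry; returns how many words were used. */
static int encode_flags(const struct toy_entry *e, uint16_t out[2])
{
	unsigned len = e->name_len < ONDISK_NAME_MASK ? e->name_len : ONDISK_NAME_MASK;

	assert(e->stage <= 3);
	out[0] = (uint16_t)(len | (e->stage << ONDISK_STAGE_SHIFT));
	if (!e->extended)
		return 1; /* no second word: skip_worktree is silently lost */

	out[0] |= ONDISK_EXTENDED;
	out[1] = e->skip_worktree ? ONDISK_SKIP_WORKTREE : 0;
	return 2;
}

/* A reader can only recover skip-worktree from the second word. */
static int decoded_skip_worktree(const uint16_t *words)
{
	if (!(words[0] & ONDISK_EXTENDED))
		return 0;
	return (words[1] & ONDISK_SKIP_WORKTREE) != 0;
}
```

In this model, an entry with skip_worktree set but extended clear encodes to a single word. decoded_skip_worktree() then reports 0, which mirrors the loss the commit message describes.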

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget



* [PATCH v4 03/12] t1092: expand repository data shape
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                         ` (9 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..98257695979a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,16 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
-- 
gitgitgadget



* [PATCH v4 04/12] t1092: add tests for status/add and sparse files
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                         ` (8 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 98257695979a..fba98d5484ae 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -238,6 +238,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v4 05/12] unpack-trees: preserve cache_bottom
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                         ` (7 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local variable and then restored as
that method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v4 06/12] unpack-trees: compare sparse directories correctly
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, that check
may already be an exact match. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.
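The ordering that do_compare_entry() relies on can be illustrated with a simplified comparison in the spirit of df_name_compare(). Once the common prefix is exhausted, a directory compares as if followed by '/', while a file compares as if terminated. `tree_order_compare()` is a hypothetical helper, not Git's code. It also uses NUL-terminated strings instead of explicit lengths.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of Git's tree ordering rule: after the common
 * prefix, a directory name behaves as if it ended in '/', while a file
 * name simply ends ('\0').
 */
static int tree_order_compare(const char *a, int a_is_dir,
			      const char *b, int b_is_dir)
{
	size_t la = strlen(a), lb = strlen(b);
	size_t len = la < lb ? la : lb;
	int cmp = memcmp(a, b, len);
	unsigned char ca, cb;

	if (cmp)
		return cmp;

	/* Next character, or the implicit terminator for this kind of entry. */
	ca = len < la ? (unsigned char)a[len] : (a_is_dir ? '/' : '\0');
	cb = len < lb ? (unsigned char)b[len] : (b_is_dir ? '/' : '\0');
	return (int)ca - (int)cb;
}
```

Because '/' (0x2F) sorts after '.' (0x2E), a directory "foo" orders after a file "foo.c", while a file "foo" orders before it. The mode passed for an entry therefore decides where a name that ends at the comparison point lands relative to its siblings.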

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-28 11:36         ` Derrick Stolee
  2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                         ` (5 subsequent siblings)
  12 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The methods traverse_by_cache_tree() and unpack_nondirectories() behave
similarly: both determine the differences between an index and a tree,
although they walk the index in different ways.

Each of these expects every cache entry to correspond to a file
path. We need to skip over the sparse directory entries in the case of a
sparse-index. Those entries are discovered in the portion that looks for
subtrees among the cache entries by scanning the paths for slashes.

Skipping these sparse directory entries will have a measurable effect
when we relax 'git status' to work with sparse-indexes: without this
change these methods would call call_unpack_fn() which in turn calls
oneway_diff() and then shows these sparse directory entries as added or
modified files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..22634d98e72b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -802,6 +802,9 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 
 		src[0] = o->src_index->cache[pos + i];
 
+		if (S_ISSPARSEDIR(src[0]->ce_mode))
+			continue;
+
 		len = ce_namelen(src[0]);
 		new_ce_len = cache_entry_size(len);
 
@@ -1074,6 +1077,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))
+		return 0;
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
-- 
gitgitgadget



* [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
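The fake-filename trick can be modelled with a small, self-contained matcher. Everything below is hypothetical: `struct toy_cone`, `match_file()` and `match_entry()` are invented names. The matcher implements only a simplified version of the cone-mode rules. A path matches recursively if an ancestor directory is a recursive pattern, and it matches if its immediate parent is a parent pattern. It is not Git's path_matches_pattern_list(). The point is the wrapper: the file matcher asserts that it receives a file name, and a directory entry is handled by appending the fake name "-".

```c
#include <assert.h>
#include <string.h>

enum toy_match { TOY_NOT_MATCHED, TOY_MATCHED, TOY_MATCHED_RECURSIVE };

/* Hypothetical cone definition: recursive dirs and parent dirs. */
struct toy_cone {
	const char *const *rec;
	size_t nrec;
	const char *const *par;
	size_t npar;
};

static int in_set(const char *s, const char *const *set, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(s, set[i]))
			return 1;
	return 0;
}

/*
 * File-only matcher with simplified cone rules: root files match; a file
 * matches if its immediate parent is a parent dir; it matches
 * recursively if any ancestor is a recursive dir.
 */
static enum toy_match match_file(const struct toy_cone *c, const char *file)
{
	char buf[4096];
	char *slash;
	int immediate = 1;
	size_t len = strlen(file);

	assert(len > 0 && file[len - 1] != '/'); /* must name a file */
	if (len >= sizeof(buf))
		return TOY_NOT_MATCHED;
	memcpy(buf, file, len + 1);
	if (!strchr(buf, '/'))
		return TOY_MATCHED;

	while ((slash = strrchr(buf, '/')) != NULL) {
		*slash = '\0'; /* buf now names the next ancestor directory */
		if (in_set(buf, c->rec, c->nrec))
			return TOY_MATCHED_RECURSIVE;
		if (immediate && in_set(buf, c->par, c->npar))
			return TOY_MATCHED;
		immediate = 0;
	}
	return TOY_NOT_MATCHED;
}

/* A directory "dir/" matches iff a file directly inside it would. */
static enum toy_match match_entry(const struct toy_cone *c, const char *path)
{
	char buf[4096];
	size_t len = strlen(path);

	if (len == 0)
		return TOY_NOT_MATCHED;
	if (path[len - 1] != '/')
		return match_file(c, path);
	if (len + 2 > sizeof(buf))
		return TOY_NOT_MATCHED;
	memcpy(buf, path, len);
	buf[len] = '-'; /* the fake file name */
	buf[len + 1] = '\0';
	return match_file(c, buf);
}
```

Without the "-", a directory such as "deep/deeper1/" would violate match_file()'s precondition. With it, the directory gets exactly the answer that a file directly inside it would get.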

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
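The branching this patch introduces can be summarized in a few lines. The sketch reuses the two sentinel values from wt-status.h in this patch, but `struct toy_index` and `toy_sparse_checkout_percentage()` are hypothetical. Two further points are assumptions made for the sketch: the formula (100 minus the integer share of skip-worktree entries) and treating an empty index as "disabled".

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel values from wt-status.h in this patch. */
#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/* Hypothetical summary of the state that the real code inspects. */
struct toy_index {
	int sparse_checkout;     /* is sparse-checkout enabled? */
	int sparse_index;        /* does the index contain sparse directories? */
	size_t nr_entries;       /* number of index entries */
	size_t nr_skip_worktree; /* entries marked skip-worktree */
};

static int toy_sparse_checkout_percentage(const struct toy_index *idx)
{
	if (!idx->sparse_checkout || !idx->nr_entries)
		return SPARSE_CHECKOUT_DISABLED;
	if (idx->sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX; /* per-file counts unavailable */

	assert(idx->nr_skip_worktree <= idx->nr_entries);
	/* percentage of tracked files present in the worktree */
	return (int)(100 - (100 * idx->nr_skip_worktree) / idx->nr_entries);
}
```

With a sparse index, a count of skip-worktree entries would see one entry per sparse directory rather than one per file. The sketch therefore returns the sentinel instead of a misleading number, and the caller prints the shorter message.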

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not require this message
to be equal across both modes; instead, we compare only the important
information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fba98d5484ae..34dae7fbcadd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -202,6 +202,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v4 10/12] status: use sparse-index throughout
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (8 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.
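The percentages in the table and the 85% figure are plain relative changes, (after - before) / before. A tiny helper reproduces them from the timings. `percent_change()` is hypothetical and not part of the perf suite.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent (negative = faster). */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For example, percent_change(2.35, 0.05) is about -97.9, and percent_change(0.34, 0.05) is about -85.3.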

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 34dae7fbcadd..59faf7381093 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,12 +479,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v4 11/12] wt-status: expand added sparse directory entries
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (9 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
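The expansion is a depth-first walk that reports every blob beneath the sparse directory with its full path. The sketch below models that walk over a hypothetical in-memory tree. `struct toy_tree`, `toy_expand()`, `struct toy_list` and `collect()` are invented for illustration. The real patch instead reads tree objects with read_tree_at() and records each path in s->change as DIFF_STATUS_ADDED.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory tree; Git reads real tree objects instead. */
struct toy_tree {
	const char *name;                /* entry name, without slashes */
	const struct toy_tree *children; /* NULL for a file (blob) */
	size_t nr_children;
};

typedef void (*toy_file_fn)(const char *full_path, void *ctx);

/*
 * Depth-first walk reporting every file below 'tree' with its full path.
 * 'base' is the tree's own path, including the trailing '/'. Directories
 * are descended into rather than reported, much like a read_tree_at()
 * callback returning READ_TREE_RECURSIVE.
 */
static void toy_expand(const struct toy_tree *tree, const char *base,
		       toy_file_fn fn, void *ctx)
{
	char path[4096];
	size_t i;

	for (i = 0; i < tree->nr_children; i++) {
		const struct toy_tree *child = &tree->children[i];
		int n;

		if (child->children) {
			n = snprintf(path, sizeof(path), "%s%s/", base, child->name);
			if (n > 0 && (size_t)n < sizeof(path))
				toy_expand(child, path, fn, ctx);
		} else {
			n = snprintf(path, sizeof(path), "%s%s", base, child->name);
			if (n > 0 && (size_t)n < sizeof(path))
				fn(path, ctx);
		}
	}
}

/* Example callback: collect up to 8 paths (stands in for s->change). */
struct toy_list {
	char paths[8][64];
	size_t nr;
};

static void collect(const char *full_path, void *ctx)
{
	struct toy_list *list = ctx;

	assert(list->nr < 8);
	snprintf(list->paths[list->nr++], sizeof(list->paths[0]), "%s", full_path);
}
```

Here collect() plays the role that add_file_to_list() plays in the patch. A sparse entry "folder1/" containing "a" and "0/1" yields "folder1/0/1" and "folder1/a", which matches the per-file lines that a full index reports.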

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 59faf7381093..cd3669d36b53 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+	test_all_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread
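
For illustration, here is a minimal, self-contained sketch of the expansion that wt_status_collect_changes_initial() performs above through read_tree_at() and add_file_to_list(). It does not use Git's object database. `struct toy_tree` and `collect_added()` are hypothetical stand-ins. The sketch only shows how a sparse-directory entry such as `folder1/` becomes one "added" path per file, with each subtree extending the base path by `name/`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified tree node (not Git's struct tree).
 * A blob has children == NULL; a tree has a NULL-terminated array.
 */
struct toy_tree {
	const char *name;
	struct toy_tree **children;
};

/*
 * Record "base + name" for every blob below 'tree', and recurse into
 * subtrees with 'base' extended by "name/". This mirrors the idea of
 * add_file_to_list() returning READ_TREE_RECURSIVE for directories
 * and recording one added path per file.
 */
static void collect_added(const struct toy_tree *tree, const char *base,
			  char **out, size_t cap, size_t *nr)
{
	size_t i;

	for (i = 0; tree->children && tree->children[i]; i++) {
		const struct toy_tree *child = tree->children[i];
		/* room for base, name, an optional '/', and the NUL */
		size_t len = strlen(base) + strlen(child->name) + 2;
		char *path = malloc(len);

		if (!path)
			abort();
		if (child->children) {
			/* Subtree: extend the base and recurse. */
			snprintf(path, len, "%s%s/", base, child->name);
			collect_added(child, path, out, cap, nr);
			free(path);
		} else {
			/* Blob: report the full path as added. */
			snprintf(path, len, "%s%s", base, child->name);
			assert(*nr < cap);
			out[(*nr)++] = path;
		}
	}
}
```

With a `folder1/` tree containing `0/0/0`, `0/1` and `a`, calling `collect_added(&folder1, "folder1/", paths, 8, &nr)` fills `paths` with `folder1/0/0/0`, `folder1/0/1` and `folder1/a` in tree order. The caller owns the returned strings and must free() them.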

* [PATCH v4 12/12] fsmonitor: integrate with sparse index
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (10 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

The added tests demonstrate that the behavior is the same with a full
index and a sparse index, and also that modifying files in a tracked
directory outside of the sparse cone triggers ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget
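
The patch above resets the FS Monitor state in both convert_to_sparse() and ensure_full_index(). The reasoning, assuming the dirty bitmap is keyed by index position (our reading of fsmonitor.c, not something this patch states), is that those positions shift whenever sparse-directory entries are expanded or collapsed, so the old data cannot be reused. A minimal sketch of the reset pattern follows. `struct toy_index_state` and the `toy_` functions are hypothetical, and the `unsigned char *` field only stands in for Git's EWAH bitmap.

```c
#include <assert.h>
#include <stdlib.h>

/* Like Git's FREE_AND_NULL(): free a pointer and clear it. */
#define TOY_FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical subset of struct index_state touched by the reset. */
struct toy_index_state {
	int sparse_index;
	int fsmonitor_has_run_once;
	unsigned char *fsmonitor_dirty;   /* stand-in for the EWAH bitmap */
	char *fsmonitor_last_update;      /* last token from the hook */
};

/* Forget everything FS Monitor reported; the next query starts fresh. */
static void toy_discard_fsmonitor(struct toy_index_state *istate)
{
	assert(istate);
	istate->fsmonitor_has_run_once = 0;
	TOY_FREE_AND_NULL(istate->fsmonitor_dirty);
	TOY_FREE_AND_NULL(istate->fsmonitor_last_update);
}

/* Both conversions change entry positions, so both discard the data. */
static void toy_convert_to_sparse(struct toy_index_state *istate)
{
	/* ...collapse file entries into sparse directories here... */
	toy_discard_fsmonitor(istate);
	istate->sparse_index = 1;
}

static void toy_ensure_full_index(struct toy_index_state *istate)
{
	/* ...expand sparse directories into file entries here... */
	toy_discard_fsmonitor(istate);
	istate->sparse_index = 0;
}
```

The design choice mirrors the commit message: discarding is always safe, and the cost is only a slower next query, which matters little if conversions become rare.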


* Re: [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries
  2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-28 11:36         ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-05-28 11:36 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 5/21/2021 7:59 AM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The methods traverse_by_cache_tree() and unpack_nondirectories() have
> similar behavior in trying to demonstrate the difference between an
> index and a tree, with some differences about how they walk the index.

As I have been working on further sparse-index integrations,
specifically with 'git checkout', I have found an issue with this
patch that does not show up in the current t1092 test script but
appears in more complicated scenarios.

I am pursuing the correct fix (that will also make 'git checkout'
work better) but it might be a week or two before I can send a v5
with that fix.

Thanks,
-Stolee


* [PATCH v5 00/14] Sparse-index: integrate with status
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (11 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-06-07 12:33       ` Derrick Stolee via GitGitGadget
  2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                           ` (14 more replies)
  12 siblings, 15 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:33 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V5
=============

I replaced one patch with a few that are more complicated. The reason is
that I started integrating with git checkout and realized that some of the
changes I was making in unpack-trees.c were incorrect for that situation, so
I might as well do them right here. The tests cannot demonstrate the bugs
in the previous version until we integrate with git checkout, which will
follow in another series after this one is submitted.

For testing, I've integrated this series along with extensions that work for
git commit and git checkout into the Scalar functional tests, which test
many scenarios with cone mode sparse-checkout and hence provide good
evidence that this is working correctly.


Updates in V4
=============

 * The previous patch "unpack-trees: stop recursing into sparse directories"
   was confusing, and actually a bit sloppy.
 * It has been replaced with "unpack-trees: be careful around sparse
   directory entries", which moves the sparse-directory checks higher up
   into unpack-trees.c instead of diff-lib.c.


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature as they were inherited from special cases that arose
in the virtualized environment of VFS for Git. This version contains my
fixes based on that investigation. Most of these were easy to identify and
fix, but I was blocked for a long time by a bug that appears when combining
the sparse-index with the builtin FS Monitor feature. I have already
reported my findings [1].

[1] https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status, for example when a
     sparse directory is staged but does not exist at HEAD (as in an
     orphaned commit), we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare, so the FS Monitor feature can work
     efficiently.


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (14):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: unpack sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               | 188 +++++++++++++++++++++++
 dir.c                                    |  11 ++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 158 ++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++
 unpack-trees.c                           | 121 +++++++++++++--
 wt-status.c                              |  64 +++++++-
 wt-status.h                              |   1 +
 10 files changed, 607 insertions(+), 24 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v4:

  1:  5a2ed3d1d701 =  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  2:  8aa41e749471 =  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  -:  ------------ >  3:  b99371c7dd61 t1092: replace incorrect 'echo' with 'cat'
  3:  70971b1f9261 !  4:  f4dddac1859e t1092: expand repository data shape
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
      +		mkdir deep/deeper1/0/0 &&
      +		touch deep/deeper1/0/1 &&
      +		touch deep/deeper1/0/0/0 &&
     ++		cp -r deep/deeper1/0 folder1 &&
     ++		cp -r deep/deeper1/0 folder2 &&
     ++		echo >>folder1/0/0/0 &&
     ++		echo >>folder2/0/1 &&
       		git add . &&
       		git commit -m "initial commit" &&
       		git checkout -b base &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
     + 		mv folder1/a folder2/b &&
     + 		mv folder1/larger-content folder2/edited-content &&
     + 		echo >>folder2/edited-content &&
     ++		echo >>folder2/0/1 &&
     ++		echo stuff >>deep/deeper1/a &&
     + 		git add . &&
     + 		git commit -m "rename folder1/... to folder2/..." &&
     + 
     + 		git checkout -b rename-out-to-in rename-base &&
     + 		mv folder1/a deep/deeper1/b &&
     ++		echo more stuff >>deep/deeper1/a &&
     ++		rm folder2/0/1 &&
     ++		mkdir folder2/0/1 &&
     ++		echo >>folder2/0/1/1 &&
     + 		mv folder1/larger-content deep/deeper1/edited-content &&
     + 		echo >>deep/deeper1/edited-content &&
     + 		git add . &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
     + 
     + 		git checkout -b rename-in-to-out rename-base &&
     + 		mv deep/deeper1/a folder1/b &&
     ++		echo >>folder2/0/1 &&
     ++		rm -rf folder1/0/0 &&
     ++		echo >>folder1/0/0 &&
     + 		mv deep/deeper1/larger-content folder1/edited-content &&
     + 		echo >>folder1/edited-content &&
     + 		git add . &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
     + 	test_all_match git diff --staged
     + '
     + 
     +-test_expect_success 'diff with renames' '
     ++test_expect_success 'diff with renames and conflicts' '
     + 	init_repos &&
     + 
     + 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
     + 	do
     + 		test_all_match git checkout rename-base &&
     + 		test_all_match git checkout $branch -- .&&
     ++		test_all_match git status --porcelain=v2 &&
     ++		test_all_match git diff --staged --no-renames &&
     ++		test_all_match git diff --staged --find-renames || return 1
     ++	done
     ++'
     ++
     ++test_expect_success 'diff with directory/file conflicts' '
     ++	init_repos &&
     ++
     ++	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
     ++	do
     ++		git -C full-checkout reset --hard &&
     ++		test_sparse_match git reset --hard &&
     ++		test_all_match git checkout $branch &&
     ++		test_all_match git checkout rename-base -- . &&
     ++		test_all_match git status --porcelain=v2 &&
     + 		test_all_match git diff --staged --no-renames &&
     + 		test_all_match git diff --staged --find-renames || return 1
     + 	done
  4:  a80b5a41153f =  5:  856346b72f79 t1092: add tests for status/add and sparse files
  5:  07a45b661c4a =  6:  f3f6223e955f unpack-trees: preserve cache_bottom
  6:  cc4a526e7947 =  7:  45ae96adf285 unpack-trees: compare sparse directories correctly
  7:  e28df7f9395d !  8:  724194eef9f6 unpack-trees: be careful around sparse directory entries
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    unpack-trees: be careful around sparse directory entries
     +    unpack-trees: unpack sparse directory entries
      
     -    The methods traverse_by_cache_tree() and unpack_nondirectories() have
     -    similar behavior in trying to demonstrate the difference between and
     -    index and a tree, with some differences about how they walk the index.
     +    During unpack_callback(), index entries are compared against tree
     +    entries. These are matched according to names and types. One goal is to
     +    decide if we should recurse into subtrees or simply operate on one index
     +    entry.
      
     -    Each of these is expecting every cache entry to correspond to a file
     -    path. We need to skip over the sparse directory entries in the case of a
     -    sparse-index. Those entries are discovered in the portion that looks for
     -    subtrees among the cache entries by scanning the paths for slashes.
     +    In the case of a sparse-directory entry, we do not want to recurse into
     +    that subtree and instead simply compare the trees. In some cases, we
     +    might want to perform a merge operation on the entry, such as during
     +    'git checkout <commit>' which wants to replace a sparse tree entry with
     +    the tree for that path at the target commit. We extend the logic within
     +    unpack_nondirectories() to create a sparse-directory entry in this case,
     +    and then that is sent to call_unpack_fn().
      
     -    Skipping these sparse directory entries will have a measurable effect
     -    when we relax 'git status' to work with sparse-indexes: without this
     -    change these methods would call call_unpack_fn() which in turn calls
     -    oneway_diff() and then shows these sparse directory entries as added or
     -    modified files.
     +    There are some subtleties in this process. For instance, we need to
     +    update find_cache_entry() to allow finding a sparse-directory entry that
     +    exactly matches a given path.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
     +@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
     + 	const struct name_entry *n,
     + 	int stage,
     + 	struct index_state *istate,
     +-	int is_transient)
     ++	int is_transient,
     ++	int is_sparse_directory)
     + {
     + 	size_t len = traverse_path_len(info, tree_entry_len(n));
     ++	size_t alloc_len = is_sparse_directory ? len + 1 : len;
     + 	struct cache_entry *ce =
     + 		is_transient ?
     +-		make_empty_transient_cache_entry(len) :
     +-		make_empty_cache_entry(istate, len);
     ++		make_empty_transient_cache_entry(alloc_len) :
     ++		make_empty_cache_entry(istate, alloc_len);
       
     - 		src[0] = o->src_index->cache[pos + i];
     + 	ce->ce_mode = create_ce_mode(n->mode);
     + 	ce->ce_flags = create_ce_flags(stage);
     +@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
     + 	/* len+1 because the cache_entry allocates space for NUL */
     + 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
       
     -+		if (S_ISSPARSEDIR(src[0]->ce_mode))
     -+			continue;
     ++	if (is_sparse_directory) {
     ++		ce->name[len] = '/';
     ++		ce->name[len + 1] = 0;
     ++		ce->ce_namelen++;
     ++		ce->ce_flags |= CE_SKIP_WORKTREE;
     ++	}
      +
     - 		len = ce_namelen(src[0]);
     - 		new_ce_len = cache_entry_size(len);
     + 	return ce;
     + }
       
      @@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
     + 				 unsigned long dirmask,
     + 				 struct cache_entry **src,
     + 				 const struct name_entry *names,
     +-				 const struct traverse_info *info)
     ++				 const struct traverse_info *info,
     ++				 int sparse_directory)
     + {
     + 	int i;
     + 	struct unpack_trees_options *o = info->data;
     + 	unsigned long conflicts = info->df_conflicts | dirmask;
     + 
     +-	/* Do we have *only* directories? Nothing to do */
       	if (mask == dirmask && !src[0])
       		return 0;
       
     -+	if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))
     ++	/* no-op if our cache entry doesn't match the expectations. */
     ++	if (sparse_directory) {
     ++		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
     ++			BUG("expected sparse directory entry");
     ++	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
      +		return 0;
     ++	}
      +
       	/*
       	 * Ok, we've filled in up to any potential index entry in src[0],
       	 * now do the rest.
     +@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
     + 		 * not stored in the index.  otherwise construct the
     + 		 * cache entry from the index aware logic.
     + 		 */
     +-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
     ++		src[i + o->merge] = create_ce_entry(info, names + i, stage,
     ++						    &o->result, o->merge,
     ++						    sparse_directory);
     + 	}
     + 
     + 	if (o->merge) {
     +@@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
     + static struct cache_entry *find_cache_entry(struct traverse_info *info,
     + 					    const struct name_entry *p)
     + {
     ++	struct cache_entry *ce;
     + 	int pos = find_cache_pos(info, p->path, p->pathlen);
     + 	struct unpack_trees_options *o = info->data;
     + 
     + 	if (0 <= pos)
     + 		return o->src_index->cache[pos];
     +-	else
     ++
     ++	/*
     ++	 * Check for a sparse-directory entry named "path/".
     ++	 * Due to the input p->path not having a trailing
     ++	 * slash, the negative 'pos' value overshoots the
     ++	 * expected position by one, hence "-2" here.
     ++	 */
     ++	pos = -pos - 2;
     ++
     ++	if (pos < 0 || pos >= o->src_index->cache_nr)
     ++		return NULL;
     ++
     ++	ce = o->src_index->cache[pos];
     ++
     ++	if (!S_ISSPARSEDIR(ce->ce_mode))
     + 		return NULL;
     ++
     ++	/*
     ++	 * Compare ce->name to info->name + '/' + p->path + '/'
     ++	 * if info->name is non-empty. Compare ce->name to
     ++	 * p-.path + '/' otherwise.
     ++	 */
     ++	if (info->namelen) {
     ++		if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
     ++		    ce->name[info->namelen] == '/' &&
     ++		    !strncmp(ce->name, info->name, info->namelen) &&
     ++		    !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
     ++			return ce;
     ++	} else if (ce->ce_namelen == p->pathlen + 1 &&
     ++		   !strncmp(ce->name, p->path, p->pathlen))
     ++		return ce;
     ++	return NULL;
     + }
     + 
     + static void debug_path(struct traverse_info *info)
     +@@ unpack-trees.c: static void debug_unpack_callback(int n,
     + 		debug_name_entry(i, names + i);
     + }
     + 
     ++/*
     ++ * Returns true if and only if the given cache_entry is a
     ++ * sparse-directory entry that matches the given name_entry
     ++ * from the tree walk at the given traverse_info.
     ++ */
     ++static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
     ++{
     ++	size_t expected_len, name_start;
     ++
     ++	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
     ++		return 0;
     ++
     ++	if (info->namelen)
     ++		name_start = info->namelen + 1;
     ++	else
     ++		name_start = 0;
     ++	expected_len = name->pathlen + 1 + name_start;
     ++
     ++	if (ce->ce_namelen != expected_len ||
     ++	    strncmp(ce->name, info->name, info->namelen) ||
     ++	    strncmp(ce->name + name_start, name->path, name->pathlen))
     ++		return 0;
     ++
     ++	return 1;
     ++}
     ++
     + /*
     +  * Note that traverse_by_cache_tree() duplicates some logic in this function
     +  * without actually calling it. If you change the logic here you may need to
     +@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     + 		}
     + 	}
     + 
     +-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     ++	if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
     + 		return -1;
     + 
     + 	if (o->merge && src[0]) {
     +@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     + 			}
     + 		}
     + 
     +-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     +-					     names, info) < 0)
     ++		if (is_sparse_directory_entry(src[0], names, info)) {
     ++			if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
     ++				return -1;
     ++		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     ++						    names, info) < 0) {
     + 			return -1;
     ++		}
     ++
     + 		return mask;
     + 	}
     + 
  8:  2cc3a93d4434 =  9:  b8ff179f43e3 dir.c: accept a directory as part of cone-mode patterns
  -:  ------------ > 10:  b9b97e011293 diff-lib: handle index diffs with sparse dirs
  9:  5011feb1aa04 = 11:  611b9f61fb2c status: skip sparse-checkout percentage with sparse-index
 10:  9f2ce5301dc9 = 12:  0c0a765dde80 status: use sparse-index throughout
 11:  24417e095243 ! 13:  02f2c7b63982 wt-status: expand added sparse directory entries
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	init_repos &&
      +
      +	test_all_match git checkout rename-out-to-in &&
     -+	test_all_match git reset --mixed HEAD~1 &&
     ++
     ++	# Sparse checkouts do not agree with full checkouts about
     ++	# how to report a directory/file conflict during a reset.
     ++	# This command would fail with test_all_match because the
     ++	# full checkout reports "T folder1/0/1" while a sparse
     ++	# checkout reports "D folder1/0/1". This matches because
     ++	# the sparse checkouts skip "adding" the other side of
     ++	# the conflict.
     ++	test_sparse_match git reset --mixed HEAD~1 &&
      +	test_sparse_match test-tool read-cache --table --expand &&
     -+	test_all_match git status --porcelain=v2 &&
     -+	test_all_match git status --porcelain=v2 &&
     ++	test_sparse_match git status --porcelain=v2 &&
     ++	test_sparse_match git status --porcelain=v2 &&
      +
      +	# At this point, sparse-checkouts behave differently
      +	# from the full-checkout.
 12:  584d4b559a91 = 14:  46ca150c3548 fsmonitor: integrate with sparse index

-- 
gitgitgadget
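
Both find_cache_entry() and is_sparse_directory_entry() in the range-diff above decide whether an index entry is the sparse directory for a tree entry by comparing lengths and name pieces. The expected name is `<prefix>/<path>/`, or just `<path>/` at the root, because create_ce_entry() appends a trailing slash to sparse-directory names. The helper below is a hypothetical, standalone sketch of that comparison on plain strings. It takes the full leading directory path as `prefix`, and since it has no ce_mode to consult, it checks both the separator and the trailing slash explicitly.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if 'ce_name' (length ce_len) is the sparse-directory name
 * built from the leading path 'prefix' (may be empty) and the tree
 * entry 'path': "prefix/path/", or "path/" when prefix is empty.
 */
static int matches_sparse_dir_name(const char *ce_name, size_t ce_len,
				   const char *prefix, size_t prefix_len,
				   const char *path, size_t path_len)
{
	/* Offset of 'path' inside ce_name: skip "prefix/" if present. */
	size_t name_start = prefix_len ? prefix_len + 1 : 0;

	assert(ce_name && path);

	/* Length must be exactly prefix + '/' + path + '/'. */
	if (ce_len != name_start + path_len + 1)
		return 0;
	if (prefix_len &&
	    (strncmp(ce_name, prefix, prefix_len) || ce_name[prefix_len] != '/'))
		return 0;
	if (strncmp(ce_name + name_start, path, path_len))
		return 0;
	/* Sparse-directory names always end in a slash. */
	return ce_name[ce_len - 1] == '/';
}
```

For example, `deep/deeper1/` matches prefix `deep` with path `deeper1`, while `deep/deeper1` (no trailing slash) and `deep/deeper2/` do not.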


* [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-06-07 12:33         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                           ` (13 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:33 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because the check could be
removed with a suitable update to cache_tree_update() or a similar method
that can construct a cache-tree with invalid nodes but still create the
nodes necessary for sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-08 18:56           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
                           ` (12 subsequent siblings)
  14 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget



* [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat'
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-08 19:18           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                           ` (11 subsequent siblings)
  14 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

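This fixes the test data shape to be as expected. 'echo' ignores its
standard input, so the here-document was discarded and the file received
only an empty line. With 'cat', the 'larger-content' file actually has
meaningful lines, allowing rename detection to work properly.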

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..d55478a1902b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -40,7 +40,7 @@ test_expect_success 'setup' '
 		done &&
 
 		git checkout -b rename-base base &&
-		echo >folder1/larger-content <<-\EOF &&
+		cat >folder1/larger-content <<-\EOF &&
 		matching
 		lines
 		help
-- 
gitgitgadget



* [PATCH v5 04/14] t1092: expand repository data shape
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                           ` (10 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 39 ++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index d55478a1902b..014a507d8b06 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,20 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
+		cp -r deep/deeper1/0 folder1 &&
+		cp -r deep/deeper1/0 folder2 &&
+		echo >>folder1/0/0/0 &&
+		echo >>folder2/0/1 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
@@ -56,11 +66,17 @@ test_expect_success 'setup' '
 		mv folder1/a folder2/b &&
 		mv folder1/larger-content folder2/edited-content &&
 		echo >>folder2/edited-content &&
+		echo >>folder2/0/1 &&
+		echo stuff >>deep/deeper1/a &&
 		git add . &&
 		git commit -m "rename folder1/... to folder2/..." &&
 
 		git checkout -b rename-out-to-in rename-base &&
 		mv folder1/a deep/deeper1/b &&
+		echo more stuff >>deep/deeper1/a &&
+		rm folder2/0/1 &&
+		mkdir folder2/0/1 &&
+		echo >>folder2/0/1/1 &&
 		mv folder1/larger-content deep/deeper1/edited-content &&
 		echo >>deep/deeper1/edited-content &&
 		git add . &&
@@ -68,6 +84,9 @@ test_expect_success 'setup' '
 
 		git checkout -b rename-in-to-out rename-base &&
 		mv deep/deeper1/a folder1/b &&
+		echo >>folder2/0/1 &&
+		rm -rf folder1/0/0 &&
+		echo >>folder1/0/0 &&
 		mv deep/deeper1/larger-content folder1/edited-content &&
 		echo >>folder1/edited-content &&
 		git add . &&
@@ -262,13 +281,29 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
-test_expect_success 'diff with renames' '
+test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- .&&
+		test_all_match git status --porcelain=v2 &&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'diff with directory/file conflicts' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		git -C full-checkout reset --hard &&
+		test_sparse_match git reset --hard &&
+		test_all_match git checkout $branch &&
+		test_all_match git checkout rename-base -- . &&
+		test_all_match git status --porcelain=v2 &&
 		test_all_match git diff --staged --no-renames &&
 		test_all_match git diff --staged --find-renames || return 1
 	done
-- 
gitgitgadget



* [PATCH v5 05/14] t1092: add tests for status/add and sparse files
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (3 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                           ` (9 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 014a507d8b06..851a83388e4b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -251,6 +251,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v5 06/14] unpack-trees: preserve cache_bottom
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (4 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                           ` (8 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
saved in a local variable on the call stack and then restored as that
method returns.

The mark_ce_used() method normally advances the cache_bottom member
when cache_bottom points at the entry being marked. However, sparse
directory entries are stored as nodes in the cache-tree data structure
as of 2de37c53 (cache-tree: integrate with sparse directory entries,
2021-03-30). Thus, cache_bottom will already be modified as the
cache-tree walk advances. Do not also update it within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v5 07/14] unpack-trees: compare sparse directories correctly
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (5 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
                           ` (7 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, a sparse
directory entry can already be an exact match at that point. Thus, we
should return 0 if we have an exact string match on a sparse directory
entry.
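
The ordering rule behind passing S_IFDIR can be sketched on its own. This is a simplified, hypothetical stand-in in the spirit of git's df_name_compare() and base_name_compare(); the real functions' signatures and edge cases may differ. After a common prefix, a name flagged as a directory behaves as if it were followed by '/', so a directory "foo" sorts after "foo.c" ('/' is 0x2F, '.' is 0x2E), while a regular file "foo" sorts before it.

```c
#include <stddef.h>
#include <string.h>

/*
 * The character that follows the common prefix, treating the end of
 * a directory name as an implicit '/'.
 */
static unsigned char sketch_next_char(const char *name, size_t len,
				      size_t at, int is_dir)
{
	if (at < len)
		return (unsigned char)name[at];
	return is_dir ? '/' : '\0';
}

/* Returns <0, 0 or >0, like memcmp(). */
static int sketch_dir_aware_compare(const char *n1, size_t l1, int is_dir1,
				    const char *n2, size_t l2, int is_dir2)
{
	size_t len = l1 < l2 ? l1 : l2;
	int cmp = memcmp(n1, n2, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp;
	c1 = sketch_next_char(n1, l1, len, is_dir1);
	c2 = sketch_next_char(n2, l2, len, is_dir2);
	/* Equal here also covers "dir" versus "dir/..." (a prefix match). */
	return (c1 > c2) - (c1 < c2);
}
```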

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v5 08/14] unpack-trees: unpack sparse directory entries
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (6 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-09  3:48           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                           ` (6 subsequent siblings)
  14 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.

In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_nondirectories() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().

There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path.
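
The exact-match check that find_cache_entry() gains can be illustrated with plain strings. This sketch assumes the parent directory path is available as a single prefix string (the patch reads it from info->name), and sketch_is_sparse_dir_name() is a hypothetical name. A sparse-directory entry matches when its name is exactly prefix + '/' + path + '/', or just path + '/' at the root.

```c
#include <stddef.h>
#include <string.h>

static int sketch_is_sparse_dir_name(const char *ce_name, size_t ce_len,
				     const char *prefix, size_t prefix_len,
				     const char *path, size_t path_len)
{
	/* Offset of the final path component within ce_name. */
	size_t start = prefix_len ? prefix_len + 1 : 0;

	/* Expect exactly: [prefix '/'] path '/' */
	if (ce_len != start + path_len + 1)
		return 0;
	if (prefix_len &&
	    (memcmp(ce_name, prefix, prefix_len) || ce_name[prefix_len] != '/'))
		return 0;
	if (memcmp(ce_name + start, path, path_len))
		return 0;
	/* Sparse directory names always end in a directory separator. */
	return ce_name[ce_len - 1] == '/';
}
```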

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 101 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 91 insertions(+), 10 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..ff448ee8424e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1037,13 +1037,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	const struct name_entry *n,
 	int stage,
 	struct index_state *istate,
-	int is_transient)
+	int is_transient,
+	int is_sparse_directory)
 {
 	size_t len = traverse_path_len(info, tree_entry_len(n));
+	size_t alloc_len = is_sparse_directory ? len + 1 : len;
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len) :
-		make_empty_cache_entry(istate, len);
+		make_empty_transient_cache_entry(alloc_len) :
+		make_empty_cache_entry(istate, alloc_len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
@@ -1052,6 +1054,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	/* len+1 because the cache_entry allocates space for NUL */
 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
 
+	if (is_sparse_directory) {
+		ce->name[len] = '/';
+		ce->name[len + 1] = 0;
+		ce->ce_namelen++;
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	}
+
 	return ce;
 }
 
@@ -1064,16 +1073,24 @@ static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
 				 const struct name_entry *names,
-				 const struct traverse_info *info)
+				 const struct traverse_info *info,
+				 int sparse_directory)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
 
-	/* Do we have *only* directories? Nothing to do */
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	/* no-op if our cache entry doesn't match the expectations. */
+	if (sparse_directory) {
+		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
+			BUG("expected sparse directory entry");
+	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
+		return 0;
+	}
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
@@ -1103,7 +1120,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
 		 * not stored in the index.  otherwise construct the
 		 * cache entry from the index aware logic.
 		 */
-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
+		src[i + o->merge] = create_ce_entry(info, names + i, stage,
+						    &o->result, o->merge,
+						    sparse_directory);
 	}
 
 	if (o->merge) {
@@ -1210,13 +1229,44 @@ static int find_cache_pos(struct traverse_info *info,
 static struct cache_entry *find_cache_entry(struct traverse_info *info,
 					    const struct name_entry *p)
 {
+	struct cache_entry *ce;
 	int pos = find_cache_pos(info, p->path, p->pathlen);
 	struct unpack_trees_options *o = info->data;
 
 	if (0 <= pos)
 		return o->src_index->cache[pos];
-	else
+
+	/*
+	 * Check for a sparse-directory entry named "path/".
+	 * Due to the input p->path not having a trailing
+	 * slash, the negative 'pos' value overshoots the
+	 * expected position by one, hence "-2" here.
+	 */
+	pos = -pos - 2;
+
+	if (pos < 0 || pos >= o->src_index->cache_nr)
+		return NULL;
+
+	ce = o->src_index->cache[pos];
+
+	if (!S_ISSPARSEDIR(ce->ce_mode))
 		return NULL;
+
+	/*
+	 * Compare ce->name to info->name + '/' + p->path + '/'
+	 * if info->name is non-empty. Compare ce->name to
+	 * p->path + '/' otherwise.
+	 */
+	if (info->namelen) {
+		if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
+		    ce->name[info->namelen] == '/' &&
+		    !strncmp(ce->name, info->name, info->namelen) &&
+		    !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
+			return ce;
+	} else if (ce->ce_namelen == p->pathlen + 1 &&
+		   !strncmp(ce->name, p->path, p->pathlen))
+		return ce;
+	return NULL;
 }
 
 static void debug_path(struct traverse_info *info)
@@ -1251,6 +1301,32 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Returns true if and only if the given cache_entry is a
+ * sparse-directory entry that matches the given name_entry
+ * from the tree walk at the given traverse_info.
+ */
+static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
+{
+	size_t expected_len, name_start;
+
+	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
+	if (info->namelen)
+		name_start = info->namelen + 1;
+	else
+		name_start = 0;
+	expected_len = name->pathlen + 1 + name_start;
+
+	if (ce->ce_namelen != expected_len ||
+	    strncmp(ce->name, info->name, info->namelen) ||
+	    strncmp(ce->name + name_start, name->path, name->pathlen))
+		return 0;
+
+	return 1;
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1307,7 +1383,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1337,9 +1413,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
-					     names, info) < 0)
+		if (is_sparse_directory_entry(src[0], names, info)) {
+			if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
+				return -1;
+		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+						    names, info) < 0) {
 			return -1;
+		}
+
 		return mask;
 	}
 
-- 
gitgitgadget



* [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (7 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
                           ` (5 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
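
Here is a minimal sketch of the path rewrite. It uses a fixed-size buffer instead of git's strbuf, and the function name sketch_dir_to_fake_file() is hypothetical. Like the patch, it treats a trailing '/' as the marker of a directory entry and appends the placeholder name "-", so the result names a file directly inside that directory.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'path' into 'out' (capacity 'cap'). If it names a directory
 * (trailing '/'), append the placeholder file name "-".
 * Returns the resulting length, or -1 if 'out' is too small.
 */
static int sketch_dir_to_fake_file(const char *path, char *out, size_t cap)
{
	size_t len = strlen(path);
	int is_dir = len > 0 && path[len - 1] == '/';
	size_t need = len + (is_dir ? 1 : 0) + 1; /* +1 for the NUL */

	if (need > cap)
		return -1;
	memcpy(out, path, len);
	if (is_dir)
		out[len++] = '-';
	out[len] = '\0';
	return (int)len;
}
```

File paths pass through unchanged, so only directory entries take the new code path.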

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (8 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 15:26           ` Derrick Stolee
  2021-06-09  5:47           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                           ` (4 subsequent siblings)
  14 siblings, 2 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

In the case where a tree is modified, we need to expand the tree
recursively, and start comparing each contained entry as either an
addition, deletion, or modification. This causes an interesting
recursion that did not exist before.
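
The recursion for a newly added sparse directory can be sketched without git's object machinery. This standalone example assumes a hypothetical in-memory tree (struct sketch_tree) in place of read_tree_at() and its callback. It only covers the "new directory" case, where every blob below the directory is reported as an addition with its full path. Comparing two existing trees for modifications would need a parallel walk and is not shown.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tree node: 'children' is NULL for a blob. */
struct sketch_tree {
	const char *name;
	const struct sketch_tree *children;
	size_t nr_children;
};

/*
 * Append "<sign><base><name>\n" to 'out' for every blob below 'dir',
 * recursing into subtrees. 'out' must already be NUL-terminated.
 * Returns the number of blobs reported.
 */
static size_t sketch_show_tree(const struct sketch_tree *dir, const char *base,
			       char sign, char *out, size_t cap)
{
	size_t i, count = 0;

	for (i = 0; i < dir->nr_children; i++) {
		const struct sketch_tree *child = &dir->children[i];

		if (child->children) {
			/* Subtree: extend the base with "name/" and recurse. */
			char sub[256];

			snprintf(sub, sizeof(sub), "%s%s/", base, child->name);
			count += sketch_show_tree(child, sub, sign, out, cap);
		} else {
			size_t used = strlen(out);

			snprintf(out + used, cap - used, "%c%s%s\n",
				 sign, base, child->name);
			count++;
		}
	}
	return count;
}
```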

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..ba4c683d4bc4 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -314,6 +314,48 @@ static int get_stat_data(const struct cache_entry *ce,
 	return 0;
 }
 
+struct show_new_tree_context {
+	struct rev_info *revs;
+	unsigned added:1;
+};
+
+static int show_new_file_from_tree(const struct object_id *oid,
+				   struct strbuf *base, const char *path,
+				   unsigned int mode, void *context)
+{
+	struct show_new_tree_context *ctx = context;
+	struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0);
+
+	diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0);
+	discard_cache_entry(new_file);
+	return 0;
+}
+
+static void show_directory(struct rev_info *revs,
+			   const struct cache_entry *new_dir,
+			   int added)
+{
+	/*
+	 * new_dir is a sparse directory entry, so we want to collect all
+	 * of the new files within the tree. This requires recursively
+	 * expanding the trees.
+	 */
+	struct show_new_tree_context ctx = { revs, added };
+	struct repository *r = revs->repo;
+	struct strbuf base = STRBUF_INIT;
+	struct pathspec ps;
+	struct tree *tree = lookup_tree(r, &new_dir->oid);
+
+	memset(&ps, 0, sizeof(ps));
+	ps.recursive = 1;
+	ps.has_wildcard = 1;
+	ps.max_depth = -1;
+
+	strbuf_add(&base, new_dir->name, new_dir->ce_namelen);
+	read_tree_at(r, tree, &base, &ps,
+			show_new_file_from_tree, &ctx);
+}
+
 static void show_new_file(struct rev_info *revs,
 			  const struct cache_entry *new_file,
 			  int cached, int match_missing)
@@ -322,6 +364,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned int mode;
 	unsigned dirty_submodule = 0;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		show_directory(revs, new_file, /*added */ 1);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -333,6 +380,136 @@ static void show_new_file(struct rev_info *revs,
 	diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
 }
 
+static int show_modified(struct rev_info *revs,
+			 const struct cache_entry *old_entry,
+			 const struct cache_entry *new_entry,
+			 int report_missing,
+			 int cached, int match_missing);
+
+static int compare_within_sparse_dir(int n, unsigned long mask,
+				     unsigned long dirmask, struct name_entry *entry,
+				     struct traverse_info *info)
+{
+	struct rev_info *revs = info->data;
+	struct object_id *oid0 = &entry[0].oid;
+	struct object_id *oid1 = &entry[1].oid;
+
+	if (oideq(oid0, oid1))
+		return mask;
+
+	/* Directory/file conflicts are handled earlier. */
+	if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) {
+		struct tree_desc t[2];
+		void *buf[2];
+		struct traverse_info info_r = { NULL, };
+
+		info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path);
+		info_r.namelen = strlen(info_r.name);
+		info_r.traverse_path = xstrfmt("%s/", info_r.name);
+		info_r.fn = compare_within_sparse_dir;
+		info_r.prev = info;
+		info_r.mode = entry[0].mode;
+		info_r.pathlen = entry[0].pathlen;
+		info_r.df_conflicts = 0;
+		info_r.data = revs;
+
+		buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0);
+		buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1);
+
+		traverse_trees(NULL, 2, t, &info_r);
+
+		free((char *)info_r.name);
+		free((char *)info_r.traverse_path);
+		free(buf[0]);
+		free(buf[1]);
+	} else {
+		char *old_path = NULL, *new_path = NULL;
+		struct cache_entry *old_entry = NULL, *new_entry = NULL;
+
+		if (entry[0].path) {
+			old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path);
+			old_entry = make_transient_cache_entry(
+					entry[0].mode, &entry[0].oid,
+					old_path, /* stage */ 0);
+			old_entry->ce_flags |= CE_SKIP_WORKTREE;
+		}
+		if (entry[1].path) {
+			new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path);
+			new_entry = make_transient_cache_entry(
+					entry[1].mode, &entry[1].oid,
+					new_path, /* stage */ 0);
+			new_entry->ce_flags |= CE_SKIP_WORKTREE;
+		}
+
+		if (entry[0].path && entry[1].path)
+			show_modified(revs, old_entry, new_entry, 0, 1, 0);
+		else if (entry[0].path)
+			diff_index_show_file(revs, revs->prefix,
+					     old_entry, &entry[0].oid,
+					     0, entry[0].mode, 0);
+		else if (entry[1].path)
+			show_new_file(revs, new_entry, 1, 0);
+
+		discard_cache_entry(old_entry);
+		discard_cache_entry(new_entry);
+		free(old_path);
+		free(new_path);
+	}
+
+	return mask;
+}
+
+static void show_modified_sparse_directory(struct rev_info *revs,
+			 const struct cache_entry *old_entry,
+			 const struct cache_entry *new_entry,
+			 int report_missing,
+			 int cached, int match_missing)
+{
+	struct tree_desc t[2];
+	void *buf[2];
+	struct traverse_info info = { NULL };
+	struct strbuf name = STRBUF_INIT;
+	struct strbuf parent_path = STRBUF_INIT;
+	char *last_dir_sep;
+
+	if (oideq(&old_entry->oid, &new_entry->oid))
+		return;
+
+	info.fn = compare_within_sparse_dir;
+	info.prev = &info;
+
+	strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1);
+	info.name = name.buf;
+	info.namelen = name.len;
+
+	strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1);
+	if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf)
+		strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1);
+	else
+		strbuf_setlen(&parent_path, 0);
+
+	info.pathlen = parent_path.len;
+
+	if (parent_path.len)
+		info.traverse_path = parent_path.buf;
+	else
+		info.traverse_path = "";
+
+	info.mode = new_entry->ce_mode;
+	info.df_conflicts = 0;
+	info.data = revs;
+
+	buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid);
+	buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid);
+
+	traverse_trees(NULL, 2, t, &info);
+
+	free(buf[0]);
+	free(buf[1]);
+	strbuf_release(&name);
+	strbuf_release(&parent_path);
+}
+
 static int show_modified(struct rev_info *revs,
 			 const struct cache_entry *old_entry,
 			 const struct cache_entry *new_entry,
@@ -343,6 +520,17 @@ static int show_modified(struct rev_info *revs,
 	const struct object_id *oid;
 	unsigned dirty_submodule = 0;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
+		return 0;
+	}
+
 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
-- 
gitgitgadget



* [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (9 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                           ` (3 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not need this message to
be equal across both modes; instead, we compare only the important
information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 851a83388e4b..f6b124e0500f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -215,6 +215,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v5 12/14] status: use sparse-index throughout
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (10 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                           ` (2 subsequent siblings)
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f6b124e0500f..099dc2bf440f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -508,12 +508,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v5 13/14] wt-status: expand added sparse directory entries
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (11 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-09  5:27           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  14 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 36 +++++++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 099dc2bf440f..39b86fbe2be6 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -521,4 +521,40 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+
+	# Sparse checkouts do not agree with full checkouts about
+	# how to report a directory/file conflict during a reset.
+	# This command would fail with test_all_match because the
+	# full checkout reports "T folder1/0/1" while a sparse
+	# checkout reports "D folder1/0/1". This matches because
+	# the sparse checkouts skip "adding" the other side of
+	# the conflict.
+	test_sparse_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget



* [PATCH v5 14/14] fsmonitor: integrate with sparse index
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (12 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  14 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests demonstrate that the behavior is the same for a full index
and a sparse index, and also that modifying files in a tracked
directory outside of the sparse cone triggers ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-06-07 15:26           ` Derrick Stolee
  2021-06-08  1:05             ` Junio C Hamano
  2021-06-09  5:47           ` Elijah Newren
  1 sibling, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-06-07 15:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
...
> +			old_entry = make_transient_cache_entry(
> +					entry[0].mode, &entry[0].oid,
> +					old_path, /* stage */ 0);

I didn't realize this before I started integrating with
v2.32.0 (which I should have done before submitting v5) that
make_transient_cache_entry() has changed its prototype to
include a memory pool parameter.

I'm working on a v6 that makes only this update and it will
probably be ready tomorrow.

Thanks,
-Stolee



* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 15:26           ` Derrick Stolee
@ 2021-06-08  1:05             ` Junio C Hamano
  2021-06-08 13:00               ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Junio C Hamano @ 2021-06-08  1:05 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, newren,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
> ...
>> +			old_entry = make_transient_cache_entry(
>> +					entry[0].mode, &entry[0].oid,
>> +					old_path, /* stage */ 0);
>
> I didn't realize this before I started integrating with
> v2.32.0 (which I should have done before submitting v5) that
> make_transient_cache_entry() has changed its prototype to
> include a memory pool parameter.

Sorry for the trouble---these are usually all known to me for topics
I happened to have picked up in 'seen', since I try to make it a rule
that 'seen' must be a descendant of 'master'.

How can I usefully communicate the conflicts I find out during the
integration cycles to topic owners, I wonder.

Thanks.


* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-08  1:05             ` Junio C Hamano
@ 2021-06-08 13:00               ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-06-08 13:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, newren,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On 6/7/2021 9:05 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <dstolee@microsoft.com>
>> ...
>>> +			old_entry = make_transient_cache_entry(
>>> +					entry[0].mode, &entry[0].oid,
>>> +					old_path, /* stage */ 0);
>>
>> I didn't realize this before I started integrating with
>> v2.32.0 (which I should have done before submitting v5) that
>> make_transient_cache_entry() has changed its prototype to
>> include a memory pool parameter.
> 
> Sorry for the trouble---these are usually all known to me for topics
> I happened to have picked up in 'seen', since I try to make it a rule
> that 'seen' must be a descendant of 'master'.
> 
> How can I usefully communicate the conflicts I find out during the
> integration cycles to topic owners, I wonder.

This is my fault for stacking topics. I used a GitGitGadget PR
to target a custom merge of other topics in flight, so my
merges were testing against a static target. When those topics
were merged, I should have updated my PR to point to 'master' or
even 'next'.

Thanks,
-Stolee


* Re: [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-06-08 18:56           ` Elijah Newren
  2021-06-09 17:39             ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-08 18:56 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When creating a full index from a sparse one, we create cache entries
> for every blob within a given sparse directory entry. These are
> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> correctly written to disk in the case that the index is not converted
> back down to a sparse-index.

In our previous discussion on this patch from v3
(https://lore.kernel.org/git/cb9161ca-dc6e-b77b-1a41-385ed8920bb2@gmail.com/),
you said you'd explain the reason for this change in a bit more
detail, but the commit message has not changed.

Could this be corrected?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 1b49898d0cb7..b2b3fbd75050 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
>         strbuf_addstr(base, path);
>
>         ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
> -       ce->ce_flags |= CE_SKIP_WORKTREE;
> +       ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>         set_index_entry(istate, istate->cache_nr++, ce);
>
>         strbuf_setlen(base, len);
> --
> gitgitgadget


* Re: [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat'
  2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-06-08 19:18           ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-06-08 19:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> This fixes the test data shape to be as expected, allowing rename
> detection to work properly now that the 'larger-conent' file actually

s/conent/content/

> has meaningful lines.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4f2f09b53a32..d55478a1902b 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -40,7 +40,7 @@ test_expect_success 'setup' '
>                 done &&
>
>                 git checkout -b rename-base base &&
> -               echo >folder1/larger-content <<-\EOF &&
> +               cat >folder1/larger-content <<-\EOF &&
>                 matching
>                 lines
>                 help
> --
> gitgitgadget


* Re: [PATCH v5 08/14] unpack-trees: unpack sparse directory entries
  2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-09  3:48           ` Elijah Newren
  2021-06-09 20:21             ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-09  3:48 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> During unpack_callback(), index entries are compared against tree
> entries. These are matched according to names and types. One goal is to
> decide if we should recurse into subtrees or simply operate on one index
> entry.
>
> In the case of a sparse-directory entry, we do not want to recurse into
> that subtree and instead simply compare the trees. In some cases, we
> might want to perform a merge operation on the entry, such as during
> 'git checkout <commit>' which wants to replace a sparse tree entry with
> the tree for that path at the target commit. We extend the logic within
> unpack_nondirectories() to create a sparse-directory entry in this case,
> and then that is sent to call_unpack_fn().

Does this presume that all callbacks are prepared to accept a sparse
directory entry?  Or do we have an external flag that ensures we do
not reach this code path when using callbacks that aren't prepared to
handle it properly?

I hope that the answer is the latter, and that the ensure_full_index()
calls are what prevents the code from reaching this point if a
callback would be used that couldn't handle a sparse directory entry.

I'd be particularly concerned that merge-recursive would call this
code with unpack_opts.fn = threeway_merge.  threeway_merge is kind of
interesting in that it might just happen to not die when passed a
sparse directory entry, but would pass along data that'd just break
stuff downstream in various subtle ways.  For example, if there were
conflicts in the sparse directory entries because both had been
modified, the merge should recurse and resolve individual paths
underneath, which the merge-recursive code would not be prepared to do
since unpack_trees() has already returned.  Also, even if there wasn't
a "conflict" because only one side modified, blindly doing a trivial
directory resolution will break rename detection.  I mention
merge-recursive not because it's worth fixing (well, it was and the
fix is called merge-ort) but because I'm most familiar with it.  The
other callbacks _might_ have similar problems, though it's possible
that it's safe for one- and two-way merging and just fails once you
get to three-way.

> There are some subtleties in this process. For instance, we need to
> update find_cache_entry() to allow finding a sparse-directory entry that
> exactly matches a given path.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 101 ++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 91 insertions(+), 10 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ef6a2b1c951c..ff448ee8424e 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1037,13 +1037,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>         const struct name_entry *n,
>         int stage,
>         struct index_state *istate,
> -       int is_transient)
> +       int is_transient,
> +       int is_sparse_directory)
>  {
>         size_t len = traverse_path_len(info, tree_entry_len(n));
> +       size_t alloc_len = is_sparse_directory ? len + 1 : len;
>         struct cache_entry *ce =
>                 is_transient ?
> -               make_empty_transient_cache_entry(len) :
> -               make_empty_cache_entry(istate, len);
> +               make_empty_transient_cache_entry(alloc_len) :
> +               make_empty_cache_entry(istate, alloc_len);
>
>         ce->ce_mode = create_ce_mode(n->mode);
>         ce->ce_flags = create_ce_flags(stage);
> @@ -1052,6 +1054,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>         /* len+1 because the cache_entry allocates space for NUL */
>         make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
>
> +       if (is_sparse_directory) {
> +               ce->name[len] = '/';
> +               ce->name[len + 1] = 0;

Should this be '\0', for clarity?
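
To spell out the sizing: a plain name needs len + 1 bytes (for the NUL), and a sparse directory name needs one more byte for the trailing '/'. That extra byte is why the patch bumps the allocation length by one. Here is a standalone sketch of the same idea; sketch_make_name() is a hypothetical helper, not a function in unpack-trees.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'len' bytes of 'path' into a new buffer,
 * appending a trailing '/' when the entry is a sparse directory.
 */
static char *sketch_make_name(const char *path, size_t len, int is_sparse_dir)
{
	size_t name_len = is_sparse_dir ? len + 1 : len;
	char *name;

	assert(path != NULL);
	name = malloc(name_len + 1);	/* +1 for the NUL terminator */
	if (!name)
		return NULL;
	memcpy(name, path, len);
	if (is_sparse_dir)
		name[len] = '/';
	name[name_len] = '\0';	/* spelled '\0' rather than 0 for clarity */
	return name;
}
```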

> +               ce->ce_namelen++;
> +               ce->ce_flags |= CE_SKIP_WORKTREE;
> +       }
> +
>         return ce;
>  }
>
> @@ -1064,16 +1073,24 @@ static int unpack_nondirectories(int n, unsigned long mask,
>                                  unsigned long dirmask,
>                                  struct cache_entry **src,
>                                  const struct name_entry *names,
> -                                const struct traverse_info *info)
> +                                const struct traverse_info *info,
> +                                int sparse_directory)
>  {
>         int i;
>         struct unpack_trees_options *o = info->data;
>         unsigned long conflicts = info->df_conflicts | dirmask;
>
> -       /* Do we have *only* directories? Nothing to do */

You've removed the comment, but not the code.  So it still returns
immediately if there are only directories...right?  Am I missing
something?  Is this code still correct?  Or is the comment just
misleading now that src[0] can be a directory?

>         if (mask == dirmask && !src[0])
>                 return 0;
>
> +       /* no-op if our cache entry doesn't match the expectations. */
> +       if (sparse_directory) {
> +               if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
> +                       BUG("expected sparse directory entry");
> +       } else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
> +               return 0;
> +       }

This code reads like "If sparse_directory is false but the cache
entry is a sparse directory, we'll just keep it as-is and ignore
changed or conflicting directories or files from the names
name_entry."  However, I think this has to be coupled with knowledge
about the changes you made to unpack_callback(), where you introduce
an extra call to unpack_nondirectories() for the sparse directory
case, and the second call does the useful work.  So "no-op" is kind
of misleading; it's more of a deferral until the later
unpack_nondirectories() call.

Or, at least so I think after trying to read over this patch.  Am I
understanding this right?

> +
>         /*
>          * Ok, we've filled in up to any potential index entry in src[0],
>          * now do the rest.
> @@ -1103,7 +1120,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
>                  * not stored in the index.  otherwise construct the
>                  * cache entry from the index aware logic.
>                  */
> -               src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
> +               src[i + o->merge] = create_ce_entry(info, names + i, stage,
> +                                                   &o->result, o->merge,
> +                                                   sparse_directory);
>         }
>
>         if (o->merge) {
> @@ -1210,13 +1229,44 @@ static int find_cache_pos(struct traverse_info *info,
>  static struct cache_entry *find_cache_entry(struct traverse_info *info,
>                                             const struct name_entry *p)
>  {
> +       struct cache_entry *ce;
>         int pos = find_cache_pos(info, p->path, p->pathlen);
>         struct unpack_trees_options *o = info->data;
>
>         if (0 <= pos)
>                 return o->src_index->cache[pos];
> -       else
> +
> +       /*
> +        * Check for a sparse-directory entry named "path/".
> +        * Due to the input p->path not having a trailing
> +        * slash, the negative 'pos' value overshoots the
> +        * expected position by one, hence "-2" here.
> +        */
> +       pos = -pos - 2;
> +
> +       if (pos < 0 || pos >= o->src_index->cache_nr)
> +               return NULL;
> +
> +       ce = o->src_index->cache[pos];
> +
> +       if (!S_ISSPARSEDIR(ce->ce_mode))
>                 return NULL;
> +
> +       /*
> +        * Compare ce->name to info->name + '/' + p->path + '/'
> +        * if info->name is non-empty. Compare ce->name to
> +        * p-.path + '/' otherwise.

p->path, not p-.path

Also, you state in both cases that you are comparing against a
trailing '/', but...

> +        */
> +       if (info->namelen) {
> +               if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
> +                   ce->name[info->namelen] == '/' &&
> +                   !strncmp(ce->name, info->name, info->namelen) &&
> +                   !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))

You only checked for one of the two '/' characters here.  Are you
omitting the check for the final '/' due to the S_ISSPARSEDIR() check
above?

> +                       return ce;
> +       } else if (ce->ce_namelen == p->pathlen + 1 &&
> +                  !strncmp(ce->name, p->path, p->pathlen))

Here you didn't check for the final '/'.  Is that intentional because
of the S_ISSPARSEDIR() check above?  If so, should the comment above
this block be corrected?
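
For illustration, a stricter check could build the full expected name, "<prefix>/<path>/" (or "<path>/" when the prefix is empty), and compare it whole, so that neither '/' can be skipped. This is only a sketch: it uses a fixed-size buffer for brevity where real code would use a strbuf, and sketch_matches_sparse_dir() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical strict matcher: construct the expected sparse-directory
 * name and compare it byte-for-byte, checking both separators.
 */
static int sketch_matches_sparse_dir(const char *ce_name, size_t ce_namelen,
				     const char *prefix, size_t prefixlen,
				     const char *path, size_t pathlen)
{
	char expected[4096];
	int n;

	if (prefixlen)
		n = snprintf(expected, sizeof(expected), "%.*s/%.*s/",
			     (int)prefixlen, prefix, (int)pathlen, path);
	else
		n = snprintf(expected, sizeof(expected), "%.*s/",
			     (int)pathlen, path);

	/* Treat truncation or an encoding error as "no match". */
	if (n < 0 || (size_t)n >= sizeof(expected))
		return 0;
	return ce_namelen == (size_t)n && !memcmp(ce_name, expected, ce_namelen);
}
```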

> +               return ce;
> +       return NULL;
>  }
>
>  static void debug_path(struct traverse_info *info)
> @@ -1251,6 +1301,32 @@ static void debug_unpack_callback(int n,
>                 debug_name_entry(i, names + i);
>  }
>
> +/*
> + * Returns true if and only if the given cache_entry is a
> + * sparse-directory entry that matches the given name_entry
> + * from the tree walk at the given traverse_info.
> + */
> +static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
> +{
> +       size_t expected_len, name_start;
> +
> +       if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;
> +
> +       if (info->namelen)
> +               name_start = info->namelen + 1;
> +       else
> +               name_start = 0;
> +       expected_len = name->pathlen + 1 + name_start;
> +
> +       if (ce->ce_namelen != expected_len ||
> +           strncmp(ce->name, info->name, info->namelen) ||
> +           strncmp(ce->name + name_start, name->path, name->pathlen))
> +               return 0;

What about the intervening '/' character?  Could we get a false hit
between "foo/bar/" and "foo.bar/"?

Also, do we have to worry about the trailing '/'?
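
Both concerns are easy to reproduce in isolation with a standalone copy of the length-plus-two-strncmp() logic above. In this sketch, sketch_loose_match() is a hypothetical stand-in that mirrors the shape of the check in is_sparse_directory_entry(), with the traverse_info and name_entry fields flattened into plain arguments. It never looks at the byte between the two fragments or at the final byte.

```c
#include <assert.h>
#include <string.h>

/*
 * Mirrors the shape of the patch's check: compare the total length and
 * the two name fragments, but never inspect the separator byte or the
 * trailing byte.
 */
static int sketch_loose_match(const char *ce_name, size_t ce_namelen,
			      const char *prefix, size_t prefixlen,
			      const char *path, size_t pathlen)
{
	size_t name_start = prefixlen ? prefixlen + 1 : 0;
	size_t expected_len = pathlen + 1 + name_start;

	if (ce_namelen != expected_len ||
	    strncmp(ce_name, prefix, prefixlen) ||
	    strncmp(ce_name + name_start, path, pathlen))
		return 0;
	return 1;
}
```

With prefix "foo" and path "bar", this accepts "foo/bar/" as intended, but it also accepts "foo.bar/" and "foo/barX".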

> +
> +       return 1;
> +}
> +
>  /*
>   * Note that traverse_by_cache_tree() duplicates some logic in this function
>   * without actually calling it. If you change the logic here you may need to
> @@ -1307,7 +1383,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1337,9 +1413,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> -                                            names, info) < 0)
> +               if (is_sparse_directory_entry(src[0], names, info)) {
> +                       if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
> +                               return -1;
> +               } else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +                                                   names, info) < 0) {
>                         return -1;
> +               }
> +
>                 return mask;
>         }
>
> --
> gitgitgadget


* Re: [PATCH v5 13/14] wt-status: expand added sparse directory entries
  2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-09  5:27           ` Elijah Newren
  2021-06-09 20:49             ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-09  5:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> It is difficult, but possible, to get into a state where we intend to
> add a directory that is outside of the sparse-checkout definition. Add a
> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
>
> This test failed before because the output of 'git status
> --porcelain=v2' would not match on the lines for folder1/:
>
> * The sparse-checkout repo (with a full index) would output each path
>   name that is intended to be added.
>
> * The sparse-index repo would only output that "folder1/" is staged for
>   addition.
>
> The status should report the full list of files to be added, and so this
> sparse-directory entry should be expanded to a full list when reaching
> it inside the wt_status_collect_changes_initial() method. Use
> read_tree_at() to assist.
>
> Somehow, this loop over the cache entries was not guarded by
> ensure_full_index() as intended.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 36 +++++++++++++++++
>  wt-status.c                              | 50 ++++++++++++++++++++++++
>  2 files changed, 86 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 099dc2bf440f..39b86fbe2be6 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -521,4 +521,40 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +test_expect_success 'reset mixed and checkout orphan' '
> +       init_repos &&
> +
> +       test_all_match git checkout rename-out-to-in &&
> +
> +       # Sparse checkouts do not agree with full checkouts about
> +       # how to report a directory/file conflict during a reset.
> +       # This command would fail with test_all_match because the
> +       # full checkout reports "T folder1/0/1" while a sparse
> +       # checkout reports "D folder1/0/1". This matches because
> +       # the sparse checkouts skip "adding" the other side of
> +       # the conflict.
> +       test_sparse_match git reset --mixed HEAD~1 &&

Ooh!  I think you found a sparse-checkout bug here.  I agree that
sparse-checkouts and full-checkouts should give different output in
this case, but I don't think the current difference is the correct
one.  Digging in a little closer, before running `git reset --mixed
HEAD~1` I see:

$ git ls-files -t | grep folder
S folder1/0/0/0
S folder1/0/1
S folder2/0/0/0
S folder2/0/1/1
S folder2/a
S folder2/larger-content

and after running `git reset --mixed HEAD~1`, I see:

S folder1/0/0/0
S folder1/0/1
H folder1/a
H folder1/larger-content
S folder2/0/0/0
H folder2/0/1
S folder2/a
S folder2/larger-content

meaning that the reset of the index failed.  It thinks some entries
are present in the working copy, though it didn't actually check any
of them out, leaving them to be marked as deleted.  This leaves the
sparse-checkout in a messed up state.  To correct it, I need to run
either of the following:

    git diff --diff-filter=D --name-only | xargs git update-index --skip-worktree

or

    git sparse-checkout reapply

(Though one could ask whether sparse-checkout reapply should take a
missing file that isn't SKIP_WORKTREE and determine it's okay to just
mark it as SKIP_WORKTREE rather than treating it as dirty.  I'm not
sure of the answer to that...)

I really think that `git reset --mixed ...` should get the sparsity
right on its own, without the manual fixup I had to apply afterwards.

> +       test_sparse_match test-tool read-cache --table --expand &&

If both the full and the sparse checkouts do a reset --mixed, I would
think that this step should be able to use a test_all_match...at least
if reset --mixed weren't broken.

> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git status --porcelain=v2 &&

Why is this test run twice?

> +
> +       # At this point, sparse-checkouts behave differently
> +       # from the full-checkout.
> +       test_sparse_match git checkout --orphan new-branch &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git status --porcelain=v2

And again, you run the status twice...why?

> +'
> +
> +test_expect_success 'add everything with deep new file' '
> +       init_repos &&
> +
> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
> +
> +       run_on_all touch deep/deeper1/x &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2

same question.

> +'
> +
>  test_done
> diff --git a/wt-status.c b/wt-status.c
> index 0425169c1895..90db8bd659fa 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>         run_diff_index(&rev, 1);
>  }
>
> +static int add_file_to_list(const struct object_id *oid,
> +                           struct strbuf *base, const char *path,
> +                           unsigned int mode, void *context)
> +{
> +       struct string_list_item *it;
> +       struct wt_status_change_data *d;
> +       struct wt_status *s = context;
> +       char *full_name;
> +
> +       if (S_ISDIR(mode))
> +               return READ_TREE_RECURSIVE;
> +
> +       full_name = xstrfmt("%s%s", base->buf, path);
> +       it = string_list_insert(&s->change, full_name);
> +       d = it->util;
> +       if (!d) {
> +               CALLOC_ARRAY(d, 1);
> +               it->util = d;
> +       }
> +
> +       d->index_status = DIFF_STATUS_ADDED;
> +       /* Leave {mode,oid}_head zero for adds. */
> +       d->mode_index = mode;
> +       oidcpy(&d->oid_index, oid);
> +       s->committable = 1;
> +       return 0;
> +}
> +
>  static void wt_status_collect_changes_initial(struct wt_status *s)
>  {
>         struct index_state *istate = s->repo->index;
> @@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
>                         continue;
>                 if (ce_intent_to_add(ce))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode)) {
> +                       /*
> +                        * This is a sparse directory entry, so we want to collect all
> +                        * of the added files within the tree. This requires recursively
> +                        * expanding the trees to find the elements that are new in this
> +                        * tree and marking them with DIFF_STATUS_ADDED.
> +                        */
> +                       struct strbuf base = STRBUF_INIT;
> +                       struct pathspec ps;
> +                       struct tree *tree = lookup_tree(istate->repo, &ce->oid);
> +
> +                       memset(&ps, 0, sizeof(ps));
> +                       ps.recursive = 1;
> +                       ps.has_wildcard = 1;
> +                       ps.max_depth = -1;
> +
> +                       strbuf_add(&base, ce->name, ce->ce_namelen);
> +                       read_tree_at(istate->repo, tree, &base, &ps,
> +                                    add_file_to_list, s);
> +                       continue;
> +               }
> +
>                 it = string_list_insert(&s->change, ce->name);
>                 d = it->util;
>                 if (!d) {
> --
> gitgitgadget
>
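
For readers less familiar with read_tree_at(), here is how add_file_to_list() ends up with full paths. The callback is invoked for each tree entry along with the accumulated base path, and returning READ_TREE_RECURSIVE for a directory asks the walker to descend into it. Below is a toy model of that contract. It uses made-up in-memory types (sketch_tree, SKETCH_RECURSE, and friends are all hypothetical) in place of git's tree objects and strbuf, and it ignores the error handling the real walker does for other return values.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory tree standing in for git's tree objects. */
struct sketch_tree {
	const char *name;
	const struct sketch_tree *children;	/* NULL for a file */
	size_t nr;
};

#define SKETCH_RECURSE 1	/* stands in for READ_TREE_RECURSIVE */

typedef int (*sketch_fn)(const char *base, const struct sketch_tree *t, void *ctx);

/* Call fn for each entry of 'dir', descending only when fn asks to. */
static void sketch_read_tree_at(const struct sketch_tree *dir, char *base,
				size_t baselen, size_t cap, sketch_fn fn, void *ctx)
{
	for (size_t i = 0; i < dir->nr; i++) {
		const struct sketch_tree *t = &dir->children[i];

		if (fn(base, t, ctx) == SKETCH_RECURSE && t->children) {
			int n = snprintf(base + baselen, cap - baselen, "%s/", t->name);

			if (n < 0 || (size_t)n >= cap - baselen)
				continue;	/* path too long: skipped in this sketch */
			sketch_read_tree_at(t, base, baselen + n, cap, fn, ctx);
			base[baselen] = '\0';	/* restore base for the next sibling */
		}
	}
}

/* Like add_file_to_list(): recurse into trees, record full file paths. */
struct sketch_list {
	char paths[16][64];
	size_t nr;
};

static int sketch_add_file(const char *base, const struct sketch_tree *t, void *ctx)
{
	struct sketch_list *list = ctx;

	if (t->children)
		return SKETCH_RECURSE;
	if (list->nr < 16)
		snprintf(list->paths[list->nr++], 64, "%s%s", base, t->name);
	return 0;
}
```

Starting from a base of "folder1/" (the sparse directory's name, trailing slash included), a tree containing "0/0" and "a" yields "folder1/0/0" and "folder1/a".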


* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
  2021-06-07 15:26           ` Derrick Stolee
@ 2021-06-09  5:47           ` Elijah Newren
  2021-06-09  6:32             ` Junio C Hamano
  1 sibling, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-09  5:47 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> While comparing an index to a tree, we may see a sparse directory entry.
> In this case, we should compare that portion of the tree to the tree
> represented by that entry. This could include a new tree which needs to
> be expanded to a full list of added files. It could also include an
> existing tree, in which case all of the changes inside are important to
> describe, including the modifications, additions, and deletions. Note
> that the case where the tree has a path and the index does not remains
> identical to before: the lack of a cache entry is the same with a sparse
> index.
>
> In the case where a tree is modified, we need to expand the tree
> recursively, and start comparing each contained entry as either an
> addition, deletion, or modification. This causes an interesting
> recursion that did not exist before.

So, I haven't read through this in detail yet...but there's a big
question I'm curious about:

Git already has code for comparing an index to a tree, a tree to a
tree, or a tree to the working directory, right?  So, when comparing a
sparse-index to a tree, can't we reuse the tree-to-tree comparison
code when we hit a sparse directory?

Maybe there's a really good reason to conceptually duplicate the
tree-to-tree comparison code, but the commit message should at least
give that reason and explain why we need to reimplement that logic.
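
To show the shape of what a tree-to-tree comparison does (and what compare_within_sparse_dir() reimplements on top of traverse_trees()), here is a toy two-pointer walk over two name-sorted entry lists. It reports '-' for names that exist only on the old side, '+' for names that exist only on the new side, and 'M' for names present on both sides with different ids. This is only a flat illustration with hypothetical types. Real trees sort directories as if their names ended in '/', and nested trees would need recursion. Git's existing tree-diff machinery (e.g. diff_tree_oid() in tree-diff.c) already handles both, which is the point of the question.

```c
#include <string.h>

/* Hypothetical flat tree entry: a name plus an id standing in for an OID. */
struct sketch_entry {
	const char *name;
	int oid;
};

/*
 * Walk two name-sorted lists in step, writing one status character per
 * reported change into 'out' (at most 'cap'); returns the count written.
 */
static size_t sketch_diff(const struct sketch_entry *a, size_t na,
			  const struct sketch_entry *b, size_t nb,
			  char *out, size_t cap)
{
	size_t i = 0, j = 0, n = 0;

	while ((i < na || j < nb) && n < cap) {
		int cmp;

		if (i == na)
			cmp = 1;		/* only new entries remain */
		else if (j == nb)
			cmp = -1;		/* only old entries remain */
		else
			cmp = strcmp(a[i].name, b[j].name);

		if (cmp < 0) {
			out[n++] = '-';		/* removed */
			i++;
		} else if (cmp > 0) {
			out[n++] = '+';		/* added */
			j++;
		} else {
			if (a[i].oid != b[j].oid)
				out[n++] = 'M';	/* modified */
			i++;
			j++;
		}
	}
	return n;
}
```

For example, old entries {a:1, b:2, d:4} against new entries {a:1, b:3, c:5} report 'M' for b, '+' for c, and '-' for d.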


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 188 insertions(+)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index b73cc1859a49..ba4c683d4bc4 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -314,6 +314,48 @@ static int get_stat_data(const struct cache_entry *ce,
>         return 0;
>  }
>
> +struct show_new_tree_context {
> +       struct rev_info *revs;
> +       unsigned added:1;
> +};
> +
> +static int show_new_file_from_tree(const struct object_id *oid,
> +                                  struct strbuf *base, const char *path,
> +                                  unsigned int mode, void *context)
> +{
> +       struct show_new_tree_context *ctx = context;
> +       struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0);
> +
> +       diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0);
> +       discard_cache_entry(new_file);
> +       return 0;
> +}
> +
> +static void show_directory(struct rev_info *revs,
> +                          const struct cache_entry *new_dir,
> +                          int added)
> +{
> +       /*
> +        * new_dir is a sparse directory entry, so we want to collect all
> +        * of the new files within the tree. This requires recursively
> +        * expanding the trees.
> +        */
> +       struct show_new_tree_context ctx = { revs, added };
> +       struct repository *r = revs->repo;
> +       struct strbuf base = STRBUF_INIT;
> +       struct pathspec ps;
> +       struct tree *tree = lookup_tree(r, &new_dir->oid);
> +
> +       memset(&ps, 0, sizeof(ps));
> +       ps.recursive = 1;
> +       ps.has_wildcard = 1;
> +       ps.max_depth = -1;
> +
> +       strbuf_add(&base, new_dir->name, new_dir->ce_namelen);
> +       read_tree_at(r, tree, &base, &ps,
> +                       show_new_file_from_tree, &ctx);
> +}
> +
>  static void show_new_file(struct rev_info *revs,
>                           const struct cache_entry *new_file,
>                           int cached, int match_missing)
> @@ -322,6 +364,11 @@ static void show_new_file(struct rev_info *revs,
>         unsigned int mode;
>         unsigned dirty_submodule = 0;
>
> +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> +               show_directory(revs, new_file, /*added */ 1);
> +               return;
> +       }
> +
>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -333,6 +380,136 @@ static void show_new_file(struct rev_info *revs,
>         diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
>  }
>
> +static int show_modified(struct rev_info *revs,
> +                        const struct cache_entry *old_entry,
> +                        const struct cache_entry *new_entry,
> +                        int report_missing,
> +                        int cached, int match_missing);
> +
> +static int compare_within_sparse_dir(int n, unsigned long mask,
> +                                    unsigned long dirmask, struct name_entry *entry,
> +                                    struct traverse_info *info)
> +{
> +       struct rev_info *revs = info->data;
> +       struct object_id *oid0 = &entry[0].oid;
> +       struct object_id *oid1 = &entry[1].oid;
> +
> +       if (oideq(oid0, oid1))
> +               return mask;
> +
> +       /* Directory/file conflicts are handled earlier. */
> +       if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) {
> +               struct tree_desc t[2];
> +               void *buf[2];
> +               struct traverse_info info_r = { NULL, };
> +
> +               info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path);
> +               info_r.namelen = strlen(info_r.name);
> +               info_r.traverse_path = xstrfmt("%s/", info_r.name);
> +               info_r.fn = compare_within_sparse_dir;
> +               info_r.prev = info;
> +               info_r.mode = entry[0].mode;
> +               info_r.pathlen = entry[0].pathlen;
> +               info_r.df_conflicts = 0;
> +               info_r.data = revs;
> +
> +               buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0);
> +               buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1);
> +
> +               traverse_trees(NULL, 2, t, &info_r);
> +
> +               free((char *)info_r.name);
> +               free((char *)info_r.traverse_path);
> +               free(buf[0]);
> +               free(buf[1]);
> +       } else {
> +               char *old_path = NULL, *new_path = NULL;
> +               struct cache_entry *old_entry = NULL, *new_entry = NULL;
> +
> +               if (entry[0].path) {
> +                       old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path);
> +                       old_entry = make_transient_cache_entry(
> +                                       entry[0].mode, &entry[0].oid,
> +                                       old_path, /* stage */ 0);
> +                       old_entry->ce_flags |= CE_SKIP_WORKTREE;
> +               }
> +               if (entry[1].path) {
> +                       new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path);
> +                       new_entry = make_transient_cache_entry(
> +                                       entry[1].mode, &entry[1].oid,
> +                                       new_path, /* stage */ 0);
> +                       new_entry->ce_flags |= CE_SKIP_WORKTREE;
> +               }
> +
> +               if (entry[0].path && entry[1].path)
> +                       show_modified(revs, old_entry, new_entry, 0, 1, 0);
> +               else if (entry[0].path)
> +                       diff_index_show_file(revs, revs->prefix,
> +                                            old_entry, &entry[0].oid,
> +                                            0, entry[0].mode, 0);
> +               else if (entry[1].path)
> +                       show_new_file(revs, new_entry, 1, 0);
> +
> +               discard_cache_entry(old_entry);
> +               discard_cache_entry(new_entry);
> +               free(old_path);
> +               free(new_path);
> +       }
> +
> +       return mask;
> +}
> +
> +static void show_modified_sparse_directory(struct rev_info *revs,
> +                        const struct cache_entry *old_entry,
> +                        const struct cache_entry *new_entry,
> +                        int report_missing,
> +                        int cached, int match_missing)
> +{
> +       struct tree_desc t[2];
> +       void *buf[2];
> +       struct traverse_info info = { NULL };
> +       struct strbuf name = STRBUF_INIT;
> +       struct strbuf parent_path = STRBUF_INIT;
> +       char *last_dir_sep;
> +
> +       if (oideq(&old_entry->oid, &new_entry->oid))
> +               return;
> +
> +       info.fn = compare_within_sparse_dir;
> +       info.prev = &info;
> +
> +       strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1);
> +       info.name = name.buf;
> +       info.namelen = name.len;
> +
> +       strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1);
> +       if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf)
> +               strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1);
> +       else
> +               strbuf_setlen(&parent_path, 0);
> +
> +       info.pathlen = parent_path.len;
> +
> +       if (parent_path.len)
> +               info.traverse_path = parent_path.buf;
> +       else
> +               info.traverse_path = "";
> +
> +       info.mode = new_entry->ce_mode;
> +       info.df_conflicts = 0;
> +       info.data = revs;
> +
> +       buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid);
> +       buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid);
> +
> +       traverse_trees(NULL, 2, t, &info);
> +
> +       free(buf[0]);
> +       free(buf[1]);
> +       strbuf_release(&name);
> +       strbuf_release(&parent_path);
> +}
> +
>  static int show_modified(struct rev_info *revs,
>                          const struct cache_entry *old_entry,
>                          const struct cache_entry *new_entry,
> @@ -343,6 +520,17 @@ static int show_modified(struct rev_info *revs,
>         const struct object_id *oid;
>         unsigned dirty_submodule = 0;
>
> +       /*
> +        * If both are sparse directory entries, then expand the
> +        * modifications to the file level.
> +        */
> +       if (old_entry && new_entry &&
> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
> +               return 0;
> +       }
> +
>         if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 215+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09  5:47           ` Elijah Newren
@ 2021-06-09  6:32             ` Junio C Hamano
  2021-06-09  8:11               ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Junio C Hamano @ 2021-06-09  6:32 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee,
	Derrick Stolee

Elijah Newren <newren@gmail.com> writes:

> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> While comparing an index to a tree, we may see a sparse directory entry.
>> In this case, we should compare that portion of the tree to the tree
>> represented by that entry. This could include a new tree which needs to
>> be expanded to a full list of added files. It could also include an
>> existing tree, in which case all of the changes inside are important to
>> describe, including the modifications, additions, and deletions. Note
>> that the case where the tree has a path and the index does not remains
>> identical to before: the lack of a cache entry is the same with a sparse
>> index.
>>
>> In the case where a tree is modified, we need to expand the tree
>> recursively, and start comparing each contained entry as either an
>> addition, deletion, or modification. This causes an interesting
>> recursion that did not exist before.
>
> So, I haven't read through this in detail yet...but there's a big
> question I'm curious about:
>
> Git already has code for comparing an index to a tree, a tree to a
> tree, or a tree to the working directory, right?  So, when comparing a
> sparse-index to a tree...can't we re-use the compare a tree to a tree
> code when we hit a sparse directory?

Offhand I do not think of a reason why that cannot work.

The tree-diff machinery takes two trees, walks them in parallel and
repeatedly calls either diff_addremove() or diff_change(), which
appends diff_filepair() to the diff_queue[] structure.  If you see
an unexpanded tree on the index side, you should be able to pass
that tree with the subtree you are comparing against to the tree-diff
machinery to come up with a series of filepairs, and then tweak the
pathnames of these filepairs (as such a two-tree comparison would be
comparing two trees representing a single subdirectory of two different
vintages) before adding them to the diff_queue[] you are collecting
the index-vs-tree diff, for example.
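
The idea Junio describes can be illustrated with a small standalone sketch. It is not Git's tree-diff code, which reports through diff_addremove()/diff_change(). It is only a parallel walk over two sorted, already-flattened entry lists for one subdirectory, re-applying the subdirectory prefix to each reported path. The `struct entry` type and `diff_subtrees()` are hypothetical. Ordering is simplified to plain strcmp() on full paths.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened tree entry: a path relative to the subtree
 * and an object id (a string here, for simplicity). */
struct entry {
	const char *path;
	const char *oid;
};

/* Print one filepair-like line, re-applying the subdirectory prefix. */
static void emit(char status, const char *prefix, const char *path)
{
	printf("%c %s%s\n", status, prefix, path);
}

/*
 * Walk two strcmp()-sorted entry lists in parallel, as a tree-diff
 * does, reporting deletions (D), additions (A) and modifications (M).
 * Returns the number of reported pairs.
 */
static int diff_subtrees(const char *prefix,
			 const struct entry *a, size_t na,
			 const struct entry *b, size_t nb)
{
	size_t i = 0, j = 0;
	int count = 0;

	while (i < na || j < nb) {
		int cmp;

		if (i == na)
			cmp = 1;	/* only new entries remain */
		else if (j == nb)
			cmp = -1;	/* only old entries remain */
		else
			cmp = strcmp(a[i].path, b[j].path);

		if (cmp < 0) {
			emit('D', prefix, a[i++].path);
			count++;
		} else if (cmp > 0) {
			emit('A', prefix, b[j++].path);
			count++;
		} else {
			/* Same path on both sides: report only if changed. */
			if (strcmp(a[i].oid, b[j].oid)) {
				emit('M', prefix, a[i].path);
				count++;
			}
			i++;
			j++;
		}
	}
	return count;
}
```

Keeping the prefix out of the walk and adding it only when a pair is reported is what lets a single-subdirectory comparison feed into a larger index-vs-tree result.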

But if a part of the index is represented as a tree because it is
outside the cone of interest, should we even be showing the
difference in that part of the tree?  If t/ directory is outside the
cone of interest, should "git diff HEAD~100 HEAD t/" show anything
to begin with (the same question for "git diff --cached HEAD t/")?

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09  6:32             ` Junio C Hamano
@ 2021-06-09  8:11               ` Elijah Newren
  2021-06-09 20:33                 ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-09  8:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee,
	Derrick Stolee

On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> While comparing an index to a tree, we may see a sparse directory entry.
> >> In this case, we should compare that portion of the tree to the tree
> >> represented by that entry. This could include a new tree which needs to
> >> be expanded to a full list of added files. It could also include an
> >> existing tree, in which case all of the changes inside are important to
> >> describe, including the modifications, additions, and deletions. Note
> >> that the case where the tree has a path and the index does not remains
> >> identical to before: the lack of a cache entry is the same with a sparse
> >> index.
> >>
> >> In the case where a tree is modified, we need to expand the tree
> >> recursively, and start comparing each contained entry as either an
> >> addition, deletion, or modification. This causes an interesting
> >> recursion that did not exist before.
> >
> > So, I haven't read through this in detail yet...but there's a big
> > question I'm curious about:
> >
> > Git already has code for comparing an index to a tree, a tree to a
> > tree, or a tree to the working directory, right?  So, when comparing a
> > sparse-index to a tree...can't we re-use the compare a tree to a tree
> > code when we hit a sparse directory?
>
> Offhand I do not think of a reason why that cannot work.
>
> The tree-diff machinery takes two trees, walks them in parallel and
> repeatedly calls either diff_addremove() or diff_change(), which
> appends diff_filepair() to the diff_queue[] structure.  If you see
> an unexpanded tree on the index side, you should be able to pass
> that tree with the subtree you are comparing against to the tree-diff
> machinery to come up with a series of filepairs, and then tweak the
> pathnames of these filepairs (as such a two-tree comparison would be
> comparing two trees representing a single subdirectory of two different
> vintages) before adding them to the diff_queue[] you are collecting
> the index-vs-tree diff, for example.

Good to know it seems my idea might be reasonable.

> But if a part of the index is represented as a tree because it is
> outside the cone of interest, should we even be showing the
> difference in that part of the tree?  If t/ directory is outside the
> cone of interest, should "git diff HEAD~100 HEAD t/" show anything
> to begin with (the same question for "git diff --cached HEAD t/")?

Excellent question...and not just for diff, but log, grep with
revisions, and other commands.  We discussed this a while back[1] and
we seemed to lean towards eventually adding a flag because there are
use cases both for (1) viewing full history while having sparsity paths
restrict just the working copy, and (2) also restricting the view of
history to the sparsity paths.

[1] It's been discussed a few times, but there's a relatively
comprehensive discussion at the "Commands that would change for
behavior A" section from
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

* Re: [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-08 18:56           ` Elijah Newren
@ 2021-06-09 17:39             ` Derrick Stolee
  2021-06-09 18:11               ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-06-09 17:39 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 6/8/2021 2:56 PM, Elijah Newren wrote:
> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When creating a full index from a sparse one, we create cache entries
>> for every blob within a given sparse directory entry. These are
>> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
>> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
>> correctly written to disk in the case that the index is not converted
>> back down to a sparse-index.
> 
> In our previous discussion on this patch from v3
> (https://lore.kernel.org/git/cb9161ca-dc6e-b77b-1a41-385ed8920bb2@gmail.com/),
> you said you'd explain the reason for this change in a bit more
> detail, but the commit message has not changed.

Thank you for the reminder.

> Could this be corrected?

How does this sound?

    When creating a full index from a sparse one, we create cache entries
    for every blob within a given sparse directory entry. These are
    correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
    flag is not included. The CE_EXTENDED flag would exist if we loaded a
    full index from disk with these entries marked with CE_SKIP_WORKTREE, so
    we can add the flag here to be consistent. This allows us to directly
    compare the flags present in cache entries when testing the sparse-index
    feature, but has no significance to its correctness in the user-facing
    functionality.

I have this in my local branch for now, but can update it before the next
version.
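
The consistency argument in the proposed commit message can be shown with a simplified model. This is not Git's read-cache.c. When an entry carrying an extended flag such as skip-worktree is loaded from disk, CE_EXTENDED ends up set alongside it, so an entry created by expansion should carry both bits if a flag-by-flag comparison is to succeed. The numeric flag values are assumed to mirror cache.h of that era and are illustrative only. flags_from_disk() and flags_from_expansion() are hypothetical.

```c
/* Flag values assumed to mirror Git's cache.h (illustrative only). */
#define CE_EXTENDED       (0x4000)
#define CE_INTENT_TO_ADD  (1 << 29)
#define CE_SKIP_WORKTREE  (1 << 30)
#define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)

/*
 * Simplified model: in-memory flags of an entry loaded from disk.
 * If any extended flag is present, the entry was written with
 * CE_EXTENDED, so that bit is set in memory too.
 */
static unsigned int flags_from_disk(unsigned int extended_bits)
{
	unsigned int flags = extended_bits & CE_EXTENDED_FLAGS;

	if (flags)
		flags |= CE_EXTENDED;
	return flags;
}

/* Flags for a blob entry created while expanding a sparse directory. */
static unsigned int flags_from_expansion(void)
{
	return CE_SKIP_WORKTREE | CE_EXTENDED;
}
```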

Thanks,
-Stolee

* Re: [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-09 17:39             ` Derrick Stolee
@ 2021-06-09 18:11               ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-06-09 18:11 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Wed, Jun 9, 2021 at 10:39 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/8/2021 2:56 PM, Elijah Newren wrote:
> > On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> When creating a full index from a sparse one, we create cache entries
> >> for every blob within a given sparse directory entry. These are
> >> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> >> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> >> correctly written to disk in the case that the index is not converted
> >> back down to a sparse-index.
> >
> > In our previous discussion on this patch from v3
> > (https://lore.kernel.org/git/cb9161ca-dc6e-b77b-1a41-385ed8920bb2@gmail.com/),
> > you said you'd explain the reason for this change in a bit more
> > detail, but the commit message has not changed.
>
> Thank you for the reminder.
>
> > Could this be corrected?
>
> How does this sound?
>
>     When creating a full index from a sparse one, we create cache entries
>     for every blob within a given sparse directory entry. These are
>     correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
>     flag is not included. The CE_EXTENDED flag would exist if we loaded a
>     full index from disk with these entries marked with CE_SKIP_WORKTREE, so
>     we can add the flag here to be consistent. This allows us to directly
>     compare the flags present in cache entries when testing the sparse-index
>     feature, but has no significance to its correctness in the user-facing
>     functionality.
>
> I have this in my local branch for now, but can update it before the next
> version.

Thanks; this looks good to me.

* Re: [PATCH v5 08/14] unpack-trees: unpack sparse directory entries
  2021-06-09  3:48           ` Elijah Newren
@ 2021-06-09 20:21             ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-06-09 20:21 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 6/8/2021 11:48 PM, Elijah Newren wrote:
> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> During unpack_callback(), index entries are compared against tree
>> entries. These are matched according to names and types. One goal is to
>> decide if we should recurse into subtrees or simply operate on one index
>> entry.
>>
>> In the case of a sparse-directory entry, we do not want to recurse into
>> that subtree and instead simply compare the trees. In some cases, we
>> might want to perform a merge operation on the entry, such as during
>> 'git checkout <commit>' which wants to replace a sparse tree entry with
>> the tree for that path at the target commit. We extend the logic within
>> unpack_nondirectories() to create a sparse-directory entry in this case,
>> and then that is sent to call_unpack_fn().
> 
> Does this presume that all callbacks are prepared to accept a sparse
> directory entry?  Or do we have an external flag that ensures we do
> not reach this code path when using callbacks that aren't prepared to
> handle it properly?
> 
> I hope that the answer is the latter, and that the ensure_full_index()
> calls are what prevents the code from reaching this point if a
> callback would be used that couldn't handle a sparse directory entry.

To the best of my knowledge, callbacks that are not prepared for
sparse directory entries have ensure_full_index() protecting them. At
minimum, the repository setting command_requires_full_index is enabled
by default, causing a sparse index to be expanded to a full one
immediately upon parsing (and also after writing) to protect any cases
we might have missed. That remains the case for each command until we
have tests covering it and can disable the setting for that command.
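
The default-safe behavior described here can be modeled in a few lines. Unless a command has explicitly turned off command_requires_full_index, the index is expanded as soon as it is parsed, so unconverted code paths (including unpack callbacks) never see a sparse directory entry. This is a hypothetical model. `struct settings`, `ensure_full()` and `post_read_hook()` are illustrative stand-ins, not Git's definitions.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for Git's structures. */
struct settings {
	bool command_requires_full_index; /* true unless a command opts out */
};

struct index_state {
	bool sparse_index; /* true while sparse directory entries exist */
	int expansions;    /* number of expansions, for demonstration */
};

/* Stand-in for ensure_full_index(): replace sparse dirs with blobs. */
static void ensure_full(struct index_state *istate)
{
	if (!istate->sparse_index)
		return; /* already full: nothing to do */
	istate->sparse_index = false;
	istate->expansions++;
}

/* Run after the index is parsed (and again after it is written). */
static void post_read_hook(const struct settings *s,
			   struct index_state *istate)
{
	if (s->command_requires_full_index)
		ensure_full(istate);
}
```

A converted command clears the setting and keeps the sparse index. Every other command gets a full index without needing to know sparse directories exist.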

> I'd be particularly concerned that merge-recursive would call this
> code with unpack_opts.fn = threeway_merge.  threeway_merge is kind of
> interesting in that it might just happen to not die when passed a
> sparse directory entry, but would pass along data that'd just break
> stuff downstream in various subtle ways.  For example, if there were
> conflicts in the sparse directory entries because both had been
> modified, the merge should recurse and resolve individual paths
> underneath, which the merge-recursive code would not be prepared to do
> since unpack_trees() has already returned.  Also, even if there wasn't
> a "conflict" because only one side modified, blindly doing a trivial
> directory resolution will break rename detection.  I mention
> merge-recursive not because it's worth fixing (well, it was and the
> fix is called merge-ort) but because I'm most familiar with it.  The
> other callbacks _might_ have similar problems, though it's possible
> that it's safe for one- and two- way merging and just fails once you
> get to three-way.

I also believe that threeway_merge might be difficult to integrate
and that its use should be protected with ensure_full_index() even if
'git merge' in general does not do it by default. We can cross that
bridge when we get to it. Merge, rebase, and cherry-pick are next on
my list of "commands to integrate with sparse-index", but I have not
yet done the work to integrate them.

>> +       if (is_sparse_directory) {
>> +               ce->name[len] = '/';
>> +               ce->name[len + 1] = 0;
> 
> Should this be '\0', for clarity?

Sure.

>> @@ -1064,16 +1073,24 @@ static int unpack_nondirectories(int n, unsigned long mask,
>>                                  unsigned long dirmask,
>>                                  struct cache_entry **src,
>>                                  const struct name_entry *names,
>> -                                const struct traverse_info *info)
>> +                                const struct traverse_info *info,
>> +                                int sparse_directory)
>>  {
>>         int i;
>>         struct unpack_trees_options *o = info->data;
>>         unsigned long conflicts = info->df_conflicts | dirmask;
>>
>> -       /* Do we have *only* directories? Nothing to do */
> 
> You've removed the comment, but not the code.  So it still returns
> immediately if there are only directories...right?  Am I missing
> something?  Is this code still correct?  Or is the comment just
> misleading now that src[0] can be a directory?

Yes, the comment is misleading now that we can call this method
with sparse-directory entries. The method name is also a bit
misleading: this should perhaps be renamed to unpack_single_entry()
or something like that. That will signal that we are not recursing
with traverse_trees_recursive() as we do in the other case.
 
>>         if (mask == dirmask && !src[0])
>>                 return 0;
>>
>> +       /* no-op if our cache entry doesn't match the expectations. */
>> +       if (sparse_directory) {
>> +               if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
>> +                       BUG("expected sparse directory entry");
>> +       } else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
>> +               return 0;
>> +       }
> 
> This code reads like "If sparse_directory is false, but the cache
> entry is a sparse directory, we'll just keep it as-is and ignore
> changed or conflicting directories or files from the names name_entry.
> However, I think this has to be coupled with knowledge about changes
> to unpack_callback() you made, where you introduce an extra call to
> unpack_nondirectories() for the sparse directory case, and in the
> second one you would do useful work.  So "no-op" is kind of
> misleading, it's more deferral until the later unpack_nondirectories()
> call.
> 
> Or, at least so I think after trying to read over this patch.  Am I
> understanding this right?

I think they are both correct: we defer until later by doing a no-op
right now. But using "defer" is more informative of the context of
this call.
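
The deferral agreed on here can be summarized as a small decision table. This is a hypothetical model of the guard quoted in this message, not the patch itself. classify() and enum unpack_action are illustrative names, and the earlier "only directories" early return is left out.

```c
#include <stdbool.h>

/* Hypothetical outcome of one unpack_nondirectories()-style call. */
enum unpack_action {
	UNPACK_PROCESS, /* handle this entry now */
	UNPACK_DEFER,   /* leave it for the sparse-directory call */
	UNPACK_BUG      /* the caller broke the contract */
};

/*
 * have_entry:          src[0] is non-NULL
 * entry_is_sparse_dir: S_ISSPARSEDIR(src[0]->ce_mode)
 * sparse_pass:         the new sparse_directory parameter
 */
static enum unpack_action classify(bool have_entry, bool entry_is_sparse_dir,
				   bool sparse_pass)
{
	if (sparse_pass) {
		/* The second call must only ever see sparse directories. */
		if (have_entry && !entry_is_sparse_dir)
			return UNPACK_BUG;
		return UNPACK_PROCESS;
	}
	/* First call: a sparse directory is handled later, not ignored. */
	if (have_entry && entry_is_sparse_dir)
		return UNPACK_DEFER;
	return UNPACK_PROCESS;
}
```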

>> +
>> +       /*
>> +        * Compare ce->name to info->name + '/' + p->path + '/'
>> +        * if info->name is non-empty. Compare ce->name to
>> +        * p-.path + '/' otherwise.
> 
> p->path, not p-.path

Thanks!
 
> Also, you state in both cases that you are comparing against a
> trailing '/', but...
> 
>> +        */
>> +       if (info->namelen) {
>> +               if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
>> +                   ce->name[info->namelen] == '/' &&
>> +                   !strncmp(ce->name, info->name, info->namelen) &&
>> +                   !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
> 
> You only checked for one of the two '/' characters here.

The first '/' check is to verify that we match "{info->name}/{p->path}/"
and not "{info->name}.{p->path}/" ('.' means "any character").

>  Are you
> omitting the check for the final '/' due to the S_ISSPARSEDIR() check
> above?

Since we know at this point that ce is a sparse directory entry, the
final character _must_ be a trailing slash. There is not a trailing
slash in the input p->path.

>> +                       return ce;
>> +       } else if (ce->ce_namelen == p->pathlen + 1 &&
>> +                  !strncmp(ce->name, p->path, p->pathlen))
> 
> Here you didn't check for the final '/'.  Is that intentional because
> of the S_ISSPARSEDIR() check above?  If so, should the comment above
> this block be corrected?

Yes, will do.

>> +               return ce;
>> +       return NULL;
>>  }
>>
>>  static void debug_path(struct traverse_info *info)
>> @@ -1251,6 +1301,32 @@ static void debug_unpack_callback(int n,
>>                 debug_name_entry(i, names + i);
>>  }
>>
>> +/*
>> + * Returns true if and only if the given cache_entry is a
>> + * sparse-directory entry that matches the given name_entry
>> + * from the tree walk at the given traverse_info.
>> + */
>> +static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
>> +{
>> +       size_t expected_len, name_start;
>> +
>> +       if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
>> +               return 0;
>> +
>> +       if (info->namelen)
>> +               name_start = info->namelen + 1;
>> +       else
>> +               name_start = 0;
>> +       expected_len = name->pathlen + 1 + name_start;
>> +
>> +       if (ce->ce_namelen != expected_len ||
>> +           strncmp(ce->name, info->name, info->namelen) ||
>> +           strncmp(ce->name + name_start, name->path, name->pathlen))
>> +               return 0;
> 
> What about the intervening '/' character?  Could we get a false hit
> between "foo/bar/" and "foo.bar/"?

Here, you are right that I missed this check. I will add it.

> Also, do we have to worry about the trailing '/'?

No, the index would be malformed without it. Since this code
is so similar to the other check (just the negation of it)
I will add a clearly-commented helper method.
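
Here is a sketch of what such a helper could look like. It checks the intervening '/' that is missing from is_sparse_directory_entry() and, defensively, the trailing '/'. The name sparse_dir_matches() and its parameter list are hypothetical. In the patch the inputs would come from the cache_entry, name_entry and traverse_info.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: does the sparse-directory entry name ce_name
 * (length ce_namelen) equal "<base>/<path>/" when base is non-empty,
 * or "<path>/" when it is empty?
 */
static int sparse_dir_matches(const char *ce_name, size_t ce_namelen,
			      const char *base, size_t baselen,
			      const char *path, size_t pathlen)
{
	size_t start = baselen ? baselen + 1 : 0;

	/* Lengths must agree, counting both separators. */
	if (ce_namelen != start + pathlen + 1)
		return 0;

	/* "<base>/" prefix: rejects "foo.bar/" when looking for "foo/bar/". */
	if (baselen &&
	    (strncmp(ce_name, base, baselen) || ce_name[baselen] != '/'))
		return 0;

	if (strncmp(ce_name + start, path, pathlen))
		return 0;

	/*
	 * A well-formed sparse directory entry always ends in '/', so
	 * this is redundant in the index; it keeps the helper standalone.
	 */
	return ce_name[ce_namelen - 1] == '/';
}
```

The final check costs one byte comparison. Callers that already know the entry is a sparse directory may prefer to drop it, as discussed above.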

Thanks,
-Stolee

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09  8:11               ` Elijah Newren
@ 2021-06-09 20:33                 ` Derrick Stolee
  2021-06-10 17:45                   ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-06-09 20:33 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On 6/9/2021 4:11 AM, Elijah Newren wrote:
> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>>>
>>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>>
>>>> While comparing an index to a tree, we may see a sparse directory entry.
>>>> In this case, we should compare that portion of the tree to the tree
>>>> represented by that entry. This could include a new tree which needs to
>>>> be expanded to a full list of added files. It could also include an
>>>> existing tree, in which case all of the changes inside are important to
>>>> describe, including the modifications, additions, and deletions. Note
>>>> that the case where the tree has a path and the index does not remains
>>>> identical to before: the lack of a cache entry is the same with a sparse
>>>> index.
>>>>
>>>> In the case where a tree is modified, we need to expand the tree
>>>> recursively, and start comparing each contained entry as either an
>>>> addition, deletion, or modification. This causes an interesting
>>>> recursion that did not exist before.
>>>
>>> So, I haven't read through this in detail yet...but there's a big
>>> question I'm curious about:
>>>
>>> Git already has code for comparing an index to a tree, a tree to a
>>> tree, or a tree to the working directory, right?  So, when comparing a
>>> sparse-index to a tree...can't we re-use the compare a tree to a tree
>>> code when we hit a sparse directory?
>>
>> Offhand I do not think of a reason why that cannot work.
>>
>> The tree-diff machinery takes two trees, walks them in parallel and
>> repeatedly calls either diff_addremove() or diff_change(), which
>> appends diff_filepair() to the diff_queue[] structure.  If you see
>> an unexpanded tree on the index side, you should be able to pass
>> that tree with the subtree you are comparing against to the tree-diff
>> machinery to come up with a series of filepairs, and then tweak the
>> pathnames of these filepairs (as such a two-tree comparison would be
>> comparing two trees representing a single subdirectory of two different
>> vintages) before adding them to the diff_queue[] you are collecting
>> the index-vs-tree diff, for example.
> 
> Good to know it seems my idea might be reasonable.

I agree that this is reasonable. I just didn't look hard enough
to find existing code for this, since I found traverse_trees and
thought that _was_ the library for this.

>> But if a part of the index is represented as a tree because it is
>> outside the cone of interest, should we even be showing the
>> difference in that part of the tree?  If t/ directory is outside the
>> cone of interest, should "git diff HEAD~100 HEAD t/" show anything
>> to begin with (the same question for "git diff --cached HEAD t/")?
> 
> Excellent question...and not just for diff, but log, grep with
> revisions, and other commands.  We discussed this a while back[1] and
> we seemed to lean towards eventually adding a flag because there are
> use cases both for (1) viewing full history while having sparsity paths
> restrict just the working copy, and (2) also restricting the view of
> history to the sparsity paths.
> 
> [1] It's been discussed a few times, but there's a relatively
> comprehensive discussion at the "Commands that would change for
> behavior A" section from
> https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

Yes, we could investigate this behavior change in the future. The
good thing is that these points that handle sparse directory
entries create clear branching points for that future behavior
change.

Thanks,
-Stolee

* Re: [PATCH v5 13/14] wt-status: expand added sparse directory entries
  2021-06-09  5:27           ` Elijah Newren
@ 2021-06-09 20:49             ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-06-09 20:49 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 6/9/2021 1:27 AM, Elijah Newren wrote:
> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> +test_expect_success 'reset mixed and checkout orphan' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout rename-out-to-in &&
>> +
>> +       # Sparse checkouts do not agree with full checkouts about
>> +       # how to report a directory/file conflict during a reset.
>> +       # This command would fail with test_all_match because the
>> +       # full checkout reports "T folder1/0/1" while a sparse
>> +       # checkout reports "D folder1/0/1". This matches because
>> +       # the sparse checkouts skip "adding" the other side of
>> +       # the conflict.
>> +       test_sparse_match git reset --mixed HEAD~1 &&
> 
> Ooh!  I think you found a sparse-checkout bug here.  I agree that
> sparse-checkouts and full-checkouts should give different output in
> this case, but I don't think the current difference is the correct
> one.  Digging in a little closer, before running `git reset --mixed
> HEAD~1` I see:
> 
> $ git ls-files -t | grep folder
> S folder1/0/0/0
> S folder1/0/1
> S folder2/0/0/0
> S folder2/0/1/1
> S folder2/a
> S folder2/larger-content
> 
> and after running git reset --mixed HEAD~1, I see:
> S folder1/0/0/0
> S folder1/0/1
> H folder1/a
> H folder1/larger-content
> S folder2/0/0/0
> H folder2/0/1
> S folder2/a
> S folder2/larger-content
> 
> meaning that the reset of the index failed.  It thinks some entries
> are present in the working copy, though it didn't actually check any
> of them out, leaving them to be marked as deleted.  This leaves the
> sparse-checkout in a messed up state.  To correct it, I need to run
> either of the following:
> 
>     git diff --diff-filter=D --name-only | xargs git update-index --skip-worktree
> 
> or
> 
>     git sparse-checkout reapply
> 
> (Though one could ask whether sparse-checkout reapply should take a
> missing file that isn't SKIP_WORKTREE and determine it's okay to just
> mark it as SKIP_WORKTREE rather than treating it as dirty.  I'm not
> sure of the answer to that...)
> 
> I really think that `git reset --mixed ...` should have been getting
> the sparsity right on its own without the manual fixup afterwards that
> I needed to add.
> 
>> +       test_sparse_match test-tool read-cache --table --expand &&
> 
> If both the full and the sparse checkouts do a reset --mixed, I would
> think that this step should be able to use a test_all_match...at least
> if reset --mixed weren't broken.

I will add this to my list when getting to 'git reset' integration
with sparse-checkout. Thanks.

>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_sparse_match git status --porcelain=v2 &&
> 
> Why is this test run twice?
> 
>> +
>> +       # At this point, sparse-checkouts behave differently
>> +       # from the full-checkout.
>> +       test_sparse_match git checkout --orphan new-branch &&
>> +       test_sparse_match test-tool read-cache --table --expand &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_sparse_match git status --porcelain=v2
> 
> And again, you run the status twice...why?
> 
>> +'
>> +
>> +test_expect_success 'add everything with deep new file' '
>> +       init_repos &&
>> +
>> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
>> +
>> +       run_on_all touch deep/deeper1/x &&
>> +       test_all_match git add . &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git status --porcelain=v2
> 
> same question.

These double 'git status' calls are a bit subtle: an earlier
version had a bug that only appeared when running 'git status'
twice. The first call kept the sparse index without expanding it
but wrote that index incorrectly; only the second 'git status'
would notice the problem. I started adding two calls to my tests
because of that, but they are no longer necessary.

I can drop the second call from the Git tests because I test
all of my submissions against the Scalar functional tests, which
run 'git status' multiple times in each test scenario and also
catch the problem. In the future, 'git add' will keep the sparse
index in memory, which will also exercise this behavior
sufficiently.

Thanks,
-Stolee

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09 20:33                 ` Derrick Stolee
@ 2021-06-10 17:45                   ` Derrick Stolee
  2021-06-10 21:31                     ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-06-10 17:45 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On 6/9/2021 4:33 PM, Derrick Stolee wrote:
> On 6/9/2021 4:11 AM, Elijah Newren wrote:
>> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> Elijah Newren <newren@gmail.com> writes:
>>>
>>> The tree-diff machinery takes two trees, walks them in parallel and
>>> repeatedly calls either diff_addremove() or diff_change(), which
>>> appends diff_filepair() to the diff_queue[] structure.  If you see
>>> an unexpanded tree on the index side, you should be able to pass
>>> that tree with the subtree you are comparing against to the tree-diff
>>> machinery to come up with a series of filepairs, and then tweak the
>>> pathnames of these filepairs (as such a two-tree comparison would be
>>> comparing two trees representing a single subdirectory of two different
>>> vintages) before adding them to the diff_queue[] you are collecting
>>> the index-vs-tree diff, for example.
>>
>> Good to know it seems my idea might be reasonable.
> 
> I agree that this is reasonable. I just didn't look hard enough
> to find existing code for this, since I found traverse_trees and
> thought that _was_ the library for this.

This was surprisingly simple, since most of the complicated stuff
is built into diff_tree_oid() and its use of revs->diffopt. The
new patch works as shown below the cut-line.
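
As a rough illustration of the tree-diff approach Junio describes,
here is a minimal, standalone C sketch. It is not git's tree-diff
code. It walks two name-sorted entry lists in parallel, reports
additions, removals and modifications, and prefixes every path with
the sparse directory's name. The struct entry type and
walk_two_trees() are hypothetical. The sketch uses plain strcmp()
ordering and does not recurse, whereas real tree walks use git's
tree ordering and may descend into subtrees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened tree entry: a file name and its content id. */
struct entry {
	const char *name;
	const char *oid;
};

/*
 * Walk two name-sorted entry lists in parallel and append one line per
 * difference to 'out': "- path" (only in a), "+ path" (only in b) or
 * "M path" (in both, different oid). Each path gets 'prefix' prepended,
 * the way paths inside a sparse directory entry such as "folder1/" must
 * be rewritten. Returns the number of differences, or -1 if 'out' is
 * too small.
 */
static int walk_two_trees(const char *prefix,
			  const struct entry *a, size_t na,
			  const struct entry *b, size_t nb,
			  char *out, size_t outsz)
{
	size_t i = 0, j = 0, used = 0;
	int count = 0;

	assert(outsz > 0);
	out[0] = '\0';
	while (i < na || j < nb) {
		const char *name;
		char kind;
		int cmp, n;

		if (i == na)
			cmp = 1;	/* only 'b' has entries left: additions */
		else if (j == nb)
			cmp = -1;	/* only 'a' has entries left: removals */
		else
			cmp = strcmp(a[i].name, b[j].name);

		if (cmp < 0) {
			kind = '-';
			name = a[i++].name;
		} else if (cmp > 0) {
			kind = '+';
			name = b[j++].name;
		} else {
			int same = !strcmp(a[i].oid, b[j].oid);

			name = a[i].name;
			i++;
			j++;
			if (same)
				continue;	/* unchanged: report nothing */
			kind = 'M';
		}

		n = snprintf(out + used, outsz - used, "%c %s%s\n",
			     kind, prefix, name);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Given old entries a.txt, b.txt and c.txt, new entries a.txt,
b.txt (changed) and d.txt, and the prefix "folder1/", it reports
"M folder1/b.txt", "- folder1/c.txt" and "+ folder1/d.txt". In the
patch below, diff_tree_oid() receives the sparse directory's name
as its base path and does this work itself.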

I was incredibly suspicious of how quickly this came together,
but it passes all the tests I have for it (including Scalar
functional tests with the commit, checkout, and add integrations).

I'll send a new version with this patch tomorrow, as well as the
other recommended edits.

Thanks,
-Stolee

--- >8 ---


diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..b631df89343 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -316,6 +316,13 @@ static int get_stat_data(const struct index_state *istate,
 	return 0;
 }
 
+static void show_directory(struct rev_info *revs,
+			   const struct cache_entry *new_dir,
+			   int added)
+{
+	diff_tree_oid(NULL, &new_dir->oid, new_dir->name, &revs->diffopt);
+}
+
 static void show_new_file(struct rev_info *revs,
 			  const struct cache_entry *new_file,
 			  int cached, int match_missing)
@@ -325,6 +332,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		show_directory(revs, new_file, /*added */ 1);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -336,6 +348,15 @@ static void show_new_file(struct rev_info *revs,
 	diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
 }
 
+static void show_modified_sparse_directory(struct rev_info *revs,
+			 const struct cache_entry *old_entry,
+			 const struct cache_entry *new_entry,
+			 int report_missing,
+			 int cached, int match_missing)
+{
+	diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+}
+
 static int show_modified(struct rev_info *revs,
 			 const struct cache_entry *old_entry,
 			 const struct cache_entry *new_entry,
@@ -347,6 +368,17 @@ static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)


* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-10 17:45                   ` Derrick Stolee
@ 2021-06-10 21:31                     ` Elijah Newren
  2021-06-11 12:57                       ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-10 21:31 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/9/2021 4:33 PM, Derrick Stolee wrote:
> > On 6/9/2021 4:11 AM, Elijah Newren wrote:
> >> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>>
> >>> Elijah Newren <newren@gmail.com> writes:
> >>>
> >>> The tree-diff machinery takes two trees, walks them in parallel and
> >>> repeatedly calls either diff_addremove() or diff_change(), which
> >>> appends diff_filepair() to the diff_queue[] structure.  If you see
> >>> an unexpanded tree on the index side, you should be able to pass
> >>> that tree with the subtree you are comparing against to the tree-diff
> >>> machinery to come up with a series of filepairs, and then tweak the
> >>> pathnames of these filepairs (as such a two-tree comparison would be
> >>> comparing two trees representing a single subdirectory of two different
> >>> vintages) before adding them to the diff_queue[] you are collecting
> >>> the index-vs-tree diff, for example.
> >>
> >> Good to know it seems my idea might be reasonable.
> >
> > I agree that this is reasonable. I just didn't look hard enough
> > to find existing code for this, since I found traverse_trees and
> > thought that _was_ the library for this.
>
> This was surprisingly simple, since most of the complicated stuff
> is built into diff_tree_oid() and its use of revs->diffopt. The
> new patch works as shown below the cut-line.
>
> I was incredibly suspicious of how quickly this came together,
> but it passes all the tests I have for it (including Scalar
> functional tests with the commit, checkout, and add integrations).

Nice!

> I'll send a new version with this patch tomorrow, as well as the
> other recommended edits.
>
> Thanks,
> -Stolee
>
> --- >8 ---
>
>
> diff --git a/diff-lib.c b/diff-lib.c
> index c2ac9250fe9..b631df89343 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -316,6 +316,13 @@ static int get_stat_data(const struct index_state *istate,
>         return 0;
>  }
>
> +static void show_directory(struct rev_info *revs,
> +                          const struct cache_entry *new_dir,
> +                          int added)
> +{
> +       diff_tree_oid(NULL, &new_dir->oid, new_dir->name, &revs->diffopt);
> +}
> +
>  static void show_new_file(struct rev_info *revs,
>                           const struct cache_entry *new_file,
>                           int cached, int match_missing)
> @@ -325,6 +332,11 @@ static void show_new_file(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> +               show_directory(revs, new_file, /*added */ 1);
> +               return;
> +       }
> +
>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -336,6 +348,15 @@ static void show_new_file(struct rev_info *revs,
>         diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
>  }
>
> +static void show_modified_sparse_directory(struct rev_info *revs,
> +                        const struct cache_entry *old_entry,
> +                        const struct cache_entry *new_entry,
> +                        int report_missing,
> +                        int cached, int match_missing)
> +{
> +       diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
> +}
> +
>  static int show_modified(struct rev_info *revs,
>                          const struct cache_entry *old_entry,
>                          const struct cache_entry *new_entry,
> @@ -347,6 +368,17 @@ static int show_modified(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       /*
> +        * If both are sparse directory entries, then expand the
> +        * modifications to the file level.
> +        */
> +       if (old_entry && new_entry &&
> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
> +               return 0;
> +       }

What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ?

> +
>         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)


* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-10 21:31                     ` Elijah Newren
@ 2021-06-11 12:57                       ` Derrick Stolee
  2021-06-11 17:27                         ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-06-11 12:57 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 6/10/2021 5:31 PM, Elijah Newren wrote:
> On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/9/2021 4:33 PM, Derrick Stolee wrote:
>>> On 6/9/2021 4:11 AM, Elijah Newren wrote:
>>>> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>>>
>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>
>>>>> The tree-diff machinery takes two trees, walks them in parallel and
>>>>> repeatedly calls either diff_addremove() or diff_change(), which
>>>>> appends diff_filepair() to the diff_queue[] structure.  If you see
>>>>> an unexpanded tree on the index side, you should be able to pass
>>>>> that tree with the subtree you are comparing against to the tree-diff
>>>>> machinery to come up with a series of filepairs, and then tweak the
>>>>> pathnames of these filepairs (as such a two-tree comparison would be
>>>>> comparing two trees representing a single subdirectory of two different
>>>>> vintages) before adding them to the diff_queue[] you are collecting
>>>>> the index-vs-tree diff, for example.
>>>>
>>>> Good to know it seems my idea might be reasonable.
>>>
>>> I agree that this is reasonable. I just didn't look hard enough
>>> to find existing code for this, since I found traverse_trees and
>>> thought that _was_ the library for this.
>>
>> This was surprisingly simple, since most of the complicated stuff
>> is built into diff_tree_oid() and its use of revs->diffopt. The
>> new patch works as shown below the cut-line.
>>
>> I was incredibly suspicious of how quickly this came together,
>> but it passes all the tests I have for it (including Scalar
>> functional tests with the commit, checkout, and add integrations).
> 
> Nice!
> 
>> I'll send a new version with this patch tomorrow, as well as the
>> other recommended edits.

...still planning on this today, but...

>> +       /*
>> +        * If both are sparse directory entries, then expand the
>> +        * modifications to the file level.
>> +        */
>> +       if (old_entry && new_entry &&
>> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
>> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
>> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
>> +               return 0;
>> +       }
> 
> What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ?

You make a good point that something different would happen
in the case of a directory/file conflict on the sparse checkout
boundary. This can be as simple as the trivial "only files at
root" cone-mode sparse-checkout definition, with "folder/" (tree)
changing to "folder" (blob).

I'll see what I can do to create a test scenario for
this and add the correct cases.

Thanks,
-Stolee


* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-11 12:57                       ` Derrick Stolee
@ 2021-06-11 17:27                         ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-06-11 17:27 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 6/11/2021 8:57 AM, Derrick Stolee wrote:
> On 6/10/2021 5:31 PM, Elijah Newren wrote:
>> On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote:
>>>
>>> I'll send a new version with this patch tomorrow, as well as the
>>> other recommended edits.
> 
> ...still planning on this today, but...

So optimistic!
 
>>> +       /*
>>> +        * If both are sparse directory entries, then expand the
>>> +        * modifications to the file level.
>>> +        */
>>> +       if (old_entry && new_entry &&
>>> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
>>> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
>>> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
>>> +               return 0;
>>> +       }
>>
>> What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ?
> 
> You make a good point that something different would happen
> in the case of a directory/file conflict on the sparse checkout
> boundary. This can be as simple as the trivial "only files at
> root" cone-mode sparse-checkout definition, with "folder/" (tree)
> changing to "folder" (blob).
> 
> I'll see what I can do to create a test scenario for
> this and add the correct cases.

Creating a directory/file conflict in this way exposes a bug in
a different codepath in unpack_trees(), although it isn't visible
until 'git checkout' allows the index to stay sparse. It's due to
the code in unpack_callback() that handles blobs and trees
differently, and hence the blob/tree conflict isn't handled
appropriately there. The changes from Patch 8 are to blame for
these first errors.
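
Blob/tree pairs need care in any parallel walk for a general
reason: git orders tree entries as if a directory name ended in
'/'. As a result, a blob "folder" and a tree "folder" do not land
in the same position relative to neighbors such as "folder-x". The
standalone sketch below illustrates that ordering rule. It follows
the idea behind git's base_name_compare() but is a simplified,
hypothetical helper (tree_order_cmp), not the code in
unpack_callback().

```c
#include <assert.h>
#include <string.h>

/*
 * Compare two entry names the way tree entries are ordered: the end of
 * a directory name behaves like a trailing '/'. Names are NUL-terminated.
 * Returns <0, 0 or >0 like strcmp().
 */
static int tree_order_cmp(const char *name1, int is_dir1,
			  const char *name2, int is_dir2)
{
	size_t len1, len2, len;
	unsigned char c1, c2;
	int cmp;

	assert(name1 && name2);
	len1 = strlen(name1);
	len2 = strlen(name2);
	len = len1 < len2 ? len1 : len2;

	cmp = memcmp(name1, name2, len);
	if (cmp)
		return cmp;

	/* Past the common prefix, a directory's end counts as '/'. */
	c1 = (unsigned char)name1[len];
	c2 = (unsigned char)name2[len];
	if (!c1 && is_dir1)
		c1 = '/';
	if (!c2 && is_dir2)
		c2 = '/';
	return (c1 > c2) - (c1 < c2);
}
```

For example, the blob "folder" sorts before "folder-x", while the
tree "folder" sorts after it (because '-' < '/'). A walk that
expects the two to meet at the same position can miss the pairing.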

At least, those are the first errors I have discovered with these
conflicts. There might be other scenarios that care about this
section of diff-lib.c, but I have not gotten to a point where
such behavior would be exposed.

I don't expect to succeed in squashing this bug today, so I'll
try again next week.

Thanks,
-Stolee


* [PATCH v6 00/14] Sparse-index: integrate with status
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (13 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51         ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                             ` (15 more replies)
  14 siblings, 16 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Update in V6
============

I'm sorry that this revision took so long. Initially I was blocked on
getting the directory/file conflict figured out (I did), but also my team
was very busy with some things. Eventually, we reached an internal deadline
to make an experimental release available [1] with initial sparse-index
performance boosts. Creating that included some additional review by Jeff
Hostetler and Johannes Schindelin which led to more changes in this version.

The good news is that this series has now created the basis for many Git
commands to integrate with the sparse-index without much additional work.
This effort was unfortunately overloaded on this series because the changes
needed for things like 'git checkout' or 'git add' all intersect with the
changes needed for 'git status'. Might as well get it right the first time.

Because the range-diff is a bit difficult to read this time, I'll break the
changes down on a patch-by-patch basis.

 1. sparse-index: skip indexes with unmerged entries
    
    (no change)

 2. sparse-index: include EXTENDED flag when expanding

 * Commit message better describes the purpose of the change.

 3. t1092: replace incorrect 'echo' with 'cat'

 * Typo fix

 4. t1092: expand repository data shape

 * some files are added that surround "folder1/" immediately before and
   after, based on the sorting with the trailing slash. This provides extra
   coverage.

 5. t1092: add tests for status/add and sparse files
    
    (no change)

 6. unpack-trees: preserve cache_bottom
    
    (no change)

 7. unpack-trees: compare sparse directories correctly

 * We were previously not comparing the path lengths, which caused a problem
   (with later updates) when a sparse directory such as "folder1/0/" was
   compared to the tree name "folder1".

 8. unpack-trees: rename unpack_nondirectories()

 * This new commit renames the method so the name fits its new behavior,
   which can now modify a sparse directory entry. The new name emphasizes
   that the method handles a single entry, in contrast to recursing into trees.

 9. unpack-trees: unpack sparse directory entries

 * THIS is the biggest change from previous versions. There were a few
   things going on that were tricky to get right, especially with the
   directory/file conflict (handled in an update in the following, new
   patch).

 * The changes to create_ce_entry() regarding alloc_len missed a spot that
   was critical to getting the length right in the allocated entry.

 * Use '\0' instead of 0 to represent the terminating character.

 * We don't need a "sparse_directory" parameter to unpack_nondirectories()
   (which was renamed to unpack_single_entry() by the previous new patch)
   because we can use dirmask to discover if src[0] (or any other value)
   should be a sparse directory entry.

 * Similarly, we don't need to call the method twice from unpack_callback().

 * The 'conflicts' variable was set to match the dirmask in the beginning,
   but it should instead depend on whether we have a sparse directory entry
   and whether all trees that have the path have a directory there.

 * The implementation of find_cache_entry() uses find_cache_pos() to find an
   insertion position for a path if it doesn't find an exact match. Before,
   we subtracted one to find the sparse directory entry, but there could be
   multiple paths between the sparse directory entry and the insertion
   point, so we need to walk backwards until we find it. This requires many
   paths having the same prefix, so it is hopefully a rare case. Some of the
   test data changes were added to cover the need for this logic. This uses
   a helper method, sparse_dir_matches_path, which is also used by
   is_sparse_directory_entry.

 10. unpack-trees: handle dir/file conflict of sparse entries

 * This new logic inside twoway_merge handles the special case for dealing
   with a directory/file conflict during a 'git checkout'. The necessary
   data and tests are also added here, though the logic will only take
   serious effect when we integrate with 'git checkout' later.

 11. dir.c: accept a directory as part of cone-mode patterns

 * The value slash_pos was previously a pointer within a strbuf, but in some
   cases we append to that strbuf, which can reallocate its buffer and leave
   slash_pos dangling. Now slash_pos is an integer offset within the string,
   so it stays valid even if the string is reallocated for an append.

 12. diff-lib: handle index diffs with sparse dirs

 * As recommended in the previous review, a simple diff_tree_oid() replaces
   the complicated use of read_tree_at() and traverse_trees() in the
   previous version.

 13. status: skip sparse-checkout percentage with sparse-index
     
     (no change)

 14. status: use sparse-index throughout
     
     (no change)

 15. wt-status: expand added sparse directory entries

 * Duplicate 'git status --porcelain=v2' lines are removed from tests.

 * The pathspec is initialized using "= { 0 }" instead of memset().

 16. fsmonitor: integrate with sparse index

 * An extra test_region is added to ensure that the filesystem monitor hook
   is still being called, and we are not simply disabling the feature
   entirely.
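
The length checks behind items 7 and 9 are easiest to see in
isolation. Below is a standalone adaptation of the
sparse_dir_matches_path() helper from the range-diff, with git's
cache_entry, traverse_info and name_entry replaced by plain strings.
The name sparse_dir_matches() and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Does the sparse directory entry 'ce_name' (which must end in '/')
 * name exactly the directory base + '/' + path + '/'? An empty 'base'
 * means 'path' sits at the repository root.
 */
static int sparse_dir_matches(const char *ce_name, const char *base,
			      const char *path)
{
	size_t ce_len = strlen(ce_name);
	size_t base_len = strlen(base);
	size_t path_len = strlen(path);

	assert(ce_len > 0 && ce_name[ce_len - 1] == '/');

	if (base_len)
		return ce_len == base_len + path_len + 2 &&
		       ce_name[base_len] == '/' &&
		       !strncmp(ce_name, base, base_len) &&
		       !strncmp(ce_name + base_len + 1, path, path_len);

	/* The length check rejects e.g. "folder1/0/" when looking for "folder1". */
	return ce_len == path_len + 1 &&
	       !strncmp(ce_name, path, path_len);
}
```

The length comparison rejects the sparse directory "folder1/0/"
when the tree walk is at "folder1", which is the case described in
item 7. The same entry still matches when the walk is at "0" inside
"folder1".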

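The slash_pos fix in item 11 follows a common C pattern. A pointer
into a growable buffer goes stale once an append reallocates, while
an integer offset stays valid. The minimal sketch below uses a
hypothetical growable buffer (struct buf, buf_append() and
last_slash_offset()) that stands in for git's strbuf. It does not
use the strbuf API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable string, standing in for git's strbuf. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append 's'; growing the buffer may move 'data' to a new address. */
static void buf_append(struct buf *b, const char *s)
{
	size_t n;

	assert(b && s);
	n = strlen(s);
	if (b->len + n + 1 > b->alloc) {
		size_t new_alloc = (b->len + n + 1) * 2;
		char *p = realloc(b->data, new_alloc);

		if (!p)
			abort();
		b->data = p;	/* any old char * into the buffer is now stale */
		b->alloc = new_alloc;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/*
 * Return the offset of the last '/' in 'b', or -1 if there is none.
 * Unlike a char *, the offset stays valid after later appends.
 */
static long last_slash_offset(const struct buf *b)
{
	const char *p;

	if (!b->len)
		return -1;
	p = strrchr(b->data, '/');
	return p ? (long)(p - b->data) : -1;
}
```

If slash_pos is kept as such an offset, reading b.data[slash_pos]
after later appends is safe even when buf_append() reallocates.
A char * obtained from strrchr() before the append could instead
point into freed memory.
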
Derrick Stolee (14):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: unpack sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |  16 +++
 dir.c                                    |  11 ++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 155 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 +++++++
 unpack-trees.c                           | 130 ++++++++++++++++---
 wt-status.c                              |  63 ++++++++-
 wt-status.h                              |   1 +
 10 files changed, 436 insertions(+), 28 deletions(-)


base-commit: ebf3c04b262aa27fbb97f8a0156c2347fecafafb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v5:

  1:  5a2ed3d1d70 =  1:  2a4a7256304 sparse-index: skip indexes with unmerged entries
  2:  8aa41e74947 !  2:  f5bae86014d sparse-index: include EXTENDED flag when expanding
     @@ Commit message
      
          When creating a full index from a sparse one, we create cache entries
          for every blob within a given sparse directory entry. These are
     -    correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
     -    marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
     -    correctly written to disk in the case that the index is not converted
     -    back down to a sparse-index.
     +    correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
     +    flag is not included. The CE_EXTENDED flag would exist if we loaded a
     +    full index from disk with these entries marked with CE_SKIP_WORKTREE, so
     +    we can add the flag here to be consistent. This allows us to directly
     +    compare the flags present in cache entries when testing the sparse-index
     +    feature, but has no significance to its correctness in the user-facing
     +    functionality.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
  3:  b99371c7dd6 !  3:  d965669c766 t1092: replace incorrect 'echo' with 'cat'
     @@ Commit message
          t1092: replace incorrect 'echo' with 'cat'
      
          This fixes the test data shape to be as expected, allowing rename
     -    detection to work properly now that the 'larger-conent' file actually
     +    detection to work properly now that the 'larger-content' file actually
          has meaningful lines.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
  4:  f4dddac1859 =  4:  44a940211b2 t1092: expand repository data shape
  5:  856346b72f7 =  5:  701ac0e8ff6 t1092: add tests for status/add and sparse files
  6:  f3f6223e955 =  6:  587333f7c61 unpack-trees: preserve cache_bottom
  7:  45ae96adf28 =  7:  6fc898ac23e unpack-trees: compare sparse directories correctly
  8:  724194eef9f !  8:  b676ef4925b unpack-trees: unpack sparse directory entries
     @@ Commit message
          'git checkout <commit>' which wants to replace a sparse tree entry with
          the tree for that path at the target commit. We extend the logic within
          unpack_nondirectories() to create a sparse-directory entry in this case,
     -    and then that is sent to call_unpack_fn().
     +    and then that is sent to call_unpack_fn(). Since the name becomes
     +    confusing by handling directories, rename it to unpack_single_entry()
     +    since it handles a blob entry or a sparse directory entry without using
     +    traverse_trees_recursive().
      
          There are some subtleties in this process. For instance, we need to
          update find_cache_entry() to allow finding a sparse-directory entry that
     -    exactly matches a given path.
     +    exactly matches a given path. Use the new helper method
     +    sparse_dir_matches_path() for this.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
      +	size_t alloc_len = is_sparse_directory ? len + 1 : len;
       	struct cache_entry *ce =
       		is_transient ?
     --		make_empty_transient_cache_entry(len) :
     +-		make_empty_transient_cache_entry(len, NULL) :
      -		make_empty_cache_entry(istate, len);
     -+		make_empty_transient_cache_entry(alloc_len) :
     ++		make_empty_transient_cache_entry(alloc_len, NULL) :
      +		make_empty_cache_entry(istate, alloc_len);
       
       	ce->ce_mode = create_ce_mode(n->mode);
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
       
      +	if (is_sparse_directory) {
      +		ce->name[len] = '/';
     -+		ce->name[len + 1] = 0;
     ++		ce->name[len + 1] = '\0';
      +		ce->ce_namelen++;
      +		ce->ce_flags |= CE_SKIP_WORKTREE;
      +	}
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
       	return ce;
       }
       
     -@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
     - 				 unsigned long dirmask,
     - 				 struct cache_entry **src,
     - 				 const struct name_entry *names,
     +@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
     +  * without actually calling it. If you change the logic here you may need to
     +  * check and change there as well.
     +  */
     +-static int unpack_nondirectories(int n, unsigned long mask,
     +-				 unsigned long dirmask,
     +-				 struct cache_entry **src,
     +-				 const struct name_entry *names,
      -				 const struct traverse_info *info)
     -+				 const struct traverse_info *info,
     -+				 int sparse_directory)
     ++static int unpack_single_entry(int n, unsigned long mask,
     ++			       unsigned long dirmask,
     ++			       struct cache_entry **src,
     ++			       const struct name_entry *names,
     ++			       const struct traverse_info *info,
     ++			       int sparse_directory)
       {
       	int i;
       	struct unpack_trees_options *o = info->data;
     @@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
       	if (mask == dirmask && !src[0])
       		return 0;
       
     -+	/* no-op if our cache entry doesn't match the expectations. */
     ++	/* defer work if our cache entry doesn't match the expectations. */
      +	if (sparse_directory) {
      +		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
      +			BUG("expected sparse directory entry");
     @@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
       
       	if (o->merge) {
      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
     + 	return -1;
     + }
     + 
     ++/*
     ++ * Given a sparse directory entry 'ce', compare ce->name to
     ++ * info->name + '/' + p->path + '/' if info->name is non-empty.
     ++ * Compare ce->name to p->path + '/' otherwise. Note that
     ++ * ce->name must end in a trailing '/' because it is a sparse
     ++ * directory entry.
     ++ */
     ++static int sparse_dir_matches_path(const struct cache_entry *ce,
     ++				   struct traverse_info *info,
     ++				   const struct name_entry *p)
     ++{
     ++	assert(S_ISSPARSEDIR(ce->ce_mode));
     ++	assert(ce->name[ce->ce_namelen - 1] == '/');
     ++
     ++	if (info->namelen)
     ++		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
     ++		       ce->name[info->namelen] == '/' &&
     ++		       !strncmp(ce->name, info->name, info->namelen) &&
     ++		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
     ++	return ce->ce_namelen == p->pathlen + 1 &&
     ++	       !strncmp(ce->name, p->path, p->pathlen);
     ++}
     ++
       static struct cache_entry *find_cache_entry(struct traverse_info *info,
       					    const struct name_entry *p)
       {
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
      +	pos = -pos - 2;
      +
      +	if (pos < 0 || pos >= o->src_index->cache_nr)
     -+		return NULL;
     + 		return NULL;
      +
      +	ce = o->src_index->cache[pos];
      +
      +	if (!S_ISSPARSEDIR(ce->ce_mode))
     - 		return NULL;
     ++		return NULL;
      +
     -+	/*
     -+	 * Compare ce->name to info->name + '/' + p->path + '/'
     -+	 * if info->name is non-empty. Compare ce->name to
     -+	 * p-.path + '/' otherwise.
     -+	 */
     -+	if (info->namelen) {
     -+		if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
     -+		    ce->name[info->namelen] == '/' &&
     -+		    !strncmp(ce->name, info->name, info->namelen) &&
     -+		    !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
     -+			return ce;
     -+	} else if (ce->ce_namelen == p->pathlen + 1 &&
     -+		   !strncmp(ce->name, p->path, p->pathlen))
     ++	if (sparse_dir_matches_path(ce, info, p))
      +		return ce;
     ++
      +	return NULL;
       }
       
     @@ unpack-trees.c: static void debug_unpack_callback(int n,
      + * sparse-directory entry that matches the given name_entry
      + * from the tree walk at the given traverse_info.
      + */
     -+static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
     ++static int is_sparse_directory_entry(struct cache_entry *ce,
     ++				     struct name_entry *name,
     ++				     struct traverse_info *info)
      +{
     -+	size_t expected_len, name_start;
     -+
      +	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
      +		return 0;
      +
     -+	if (info->namelen)
     -+		name_start = info->namelen + 1;
     -+	else
     -+		name_start = 0;
     -+	expected_len = name->pathlen + 1 + name_start;
     -+
     -+	if (ce->ce_namelen != expected_len ||
     -+	    strncmp(ce->name, info->name, info->namelen) ||
     -+	    strncmp(ce->name + name_start, name->path, name->pathlen))
     -+		return 0;
     -+
     -+	return 1;
     ++	return sparse_dir_matches_path(ce, info, name);
      +}
      +
       /*
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	}
       
      -	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     -+	if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
     ++	if (unpack_single_entry(n, mask, dirmask, src, names, info, 0) < 0)
       		return -1;
       
       	if (o->merge && src[0]) {
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
      -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
      -					     names, info) < 0)
      +		if (is_sparse_directory_entry(src[0], names, info)) {
     -+			if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
     ++			if (unpack_single_entry(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
      +				return -1;
      +		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
      +						    names, info) < 0) {
  9:  b8ff179f43e =  9:  d693f00d9a2 dir.c: accept a directory as part of cone-mode patterns
 10:  b9b97e01129 ! 10:  ed11cfc791f diff-lib: handle index diffs with sparse dirs
     @@ Commit message
          identical to before: the lack of a cache entry is the same with a sparse
          index.
      
     -    In the case where a tree is modified, we need to expand the tree
     -    recursively, and start comparing each contained entry as either an
     -    addition, deletion, or modification. This causes an interesting
     -    recursion that did not exist before.
     +    Use diff_tree_oid() to compute the diff appropriately.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## diff-lib.c ##
     -@@ diff-lib.c: static int get_stat_data(const struct cache_entry *ce,
     - 	return 0;
     - }
     - 
     -+struct show_new_tree_context {
     -+	struct rev_info *revs;
     -+	unsigned added:1;
     -+};
     -+
     -+static int show_new_file_from_tree(const struct object_id *oid,
     -+				   struct strbuf *base, const char *path,
     -+				   unsigned int mode, void *context)
     -+{
     -+	struct show_new_tree_context *ctx = context;
     -+	struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0);
     -+
     -+	diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0);
     -+	discard_cache_entry(new_file);
     -+	return 0;
     -+}
     -+
     -+static void show_directory(struct rev_info *revs,
     -+			   const struct cache_entry *new_dir,
     -+			   int added)
     -+{
     -+	/*
     -+	 * new_dir is a sparse directory entry, so we want to collect all
     -+	 * of the new files within the tree. This requires recursively
     -+	 * expanding the trees.
     -+	 */
     -+	struct show_new_tree_context ctx = { revs, added };
     -+	struct repository *r = revs->repo;
     -+	struct strbuf base = STRBUF_INIT;
     -+	struct pathspec ps;
     -+	struct tree *tree = lookup_tree(r, &new_dir->oid);
     -+
     -+	memset(&ps, 0, sizeof(ps));
     -+	ps.recursive = 1;
     -+	ps.has_wildcard = 1;
     -+	ps.max_depth = -1;
     -+
     -+	strbuf_add(&base, new_dir->name, new_dir->ce_namelen);
     -+	read_tree_at(r, tree, &base, &ps,
     -+			show_new_file_from_tree, &ctx);
     -+}
     -+
     - static void show_new_file(struct rev_info *revs,
     - 			  const struct cache_entry *new_file,
     - 			  int cached, int match_missing)
      @@ diff-lib.c: static void show_new_file(struct rev_info *revs,
     - 	unsigned int mode;
       	unsigned dirty_submodule = 0;
     + 	struct index_state *istate = revs->diffopt.repo->index;
       
      +	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
     -+		show_directory(revs, new_file, /*added */ 1);
     ++		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
      +		return;
      +	}
      +
       	/*
       	 * New file in the index: it might actually be different in
       	 * the working tree.
     -@@ diff-lib.c: static void show_new_file(struct rev_info *revs,
     - 	diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
     - }
     - 
     -+static int show_modified(struct rev_info *revs,
     -+			 const struct cache_entry *old_entry,
     -+			 const struct cache_entry *new_entry,
     -+			 int report_missing,
     -+			 int cached, int match_missing);
     -+
     -+static int compare_within_sparse_dir(int n, unsigned long mask,
     -+				     unsigned long dirmask, struct name_entry *entry,
     -+				     struct traverse_info *info)
     -+{
     -+	struct rev_info *revs = info->data;
     -+	struct object_id *oid0 = &entry[0].oid;
     -+	struct object_id *oid1 = &entry[1].oid;
     -+
     -+	if (oideq(oid0, oid1))
     -+		return mask;
     -+
     -+	/* Directory/file conflicts are handled earlier. */
     -+	if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) {
     -+		struct tree_desc t[2];
     -+		void *buf[2];
     -+		struct traverse_info info_r = { NULL, };
     -+
     -+		info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path);
     -+		info_r.namelen = strlen(info_r.name);
     -+		info_r.traverse_path = xstrfmt("%s/", info_r.name);
     -+		info_r.fn = compare_within_sparse_dir;
     -+		info_r.prev = info;
     -+		info_r.mode = entry[0].mode;
     -+		info_r.pathlen = entry[0].pathlen;
     -+		info_r.df_conflicts = 0;
     -+		info_r.data = revs;
     -+
     -+		buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0);
     -+		buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1);
     -+
     -+		traverse_trees(NULL, 2, t, &info_r);
     -+
     -+		free((char *)info_r.name);
     -+		free((char *)info_r.traverse_path);
     -+		free(buf[0]);
     -+		free(buf[1]);
     -+	} else {
     -+		char *old_path = NULL, *new_path = NULL;
     -+		struct cache_entry *old_entry = NULL, *new_entry = NULL;
     -+
     -+		if (entry[0].path) {
     -+			old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path);
     -+			old_entry = make_transient_cache_entry(
     -+					entry[0].mode, &entry[0].oid,
     -+					old_path, /* stage */ 0);
     -+			old_entry->ce_flags |= CE_SKIP_WORKTREE;
     -+		}
     -+		if (entry[1].path) {
     -+			new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path);
     -+			new_entry = make_transient_cache_entry(
     -+					entry[1].mode, &entry[1].oid,
     -+					new_path, /* stage */ 0);
     -+			new_entry->ce_flags |= CE_SKIP_WORKTREE;
     -+		}
     -+
     -+		if (entry[0].path && entry[1].path)
     -+			show_modified(revs, old_entry, new_entry, 0, 1, 0);
     -+		else if (entry[0].path)
     -+			diff_index_show_file(revs, revs->prefix,
     -+					     old_entry, &entry[0].oid,
     -+					     0, entry[0].mode, 0);
     -+		else if (entry[1].path)
     -+			show_new_file(revs, new_entry, 1, 0);
     -+
     -+		discard_cache_entry(old_entry);
     -+		discard_cache_entry(new_entry);
     -+		free(old_path);
     -+		free(new_path);
     -+	}
     -+
     -+	return mask;
     -+}
     -+
     -+static void show_modified_sparse_directory(struct rev_info *revs,
     -+			 const struct cache_entry *old_entry,
     -+			 const struct cache_entry *new_entry,
     -+			 int report_missing,
     -+			 int cached, int match_missing)
     -+{
     -+	struct tree_desc t[2];
     -+	void *buf[2];
     -+	struct traverse_info info = { NULL };
     -+	struct strbuf name = STRBUF_INIT;
     -+	struct strbuf parent_path = STRBUF_INIT;
     -+	char *last_dir_sep;
     -+
     -+	if (oideq(&old_entry->oid, &new_entry->oid))
     -+		return;
     -+
     -+	info.fn = compare_within_sparse_dir;
     -+	info.prev = &info;
     -+
     -+	strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1);
     -+	info.name = name.buf;
     -+	info.namelen = name.len;
     -+
     -+	strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1);
     -+	if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf)
     -+		strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1);
     -+	else
     -+		strbuf_setlen(&parent_path, 0);
     -+
     -+	info.pathlen = parent_path.len;
     -+
     -+	if (parent_path.len)
     -+		info.traverse_path = parent_path.buf;
     -+	else
     -+		info.traverse_path = "";
     -+
     -+	info.mode = new_entry->ce_mode;
     -+	info.df_conflicts = 0;
     -+	info.data = revs;
     -+
     -+	buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid);
     -+	buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid);
     -+
     -+	traverse_trees(NULL, 2, t, &info);
     -+
     -+	free(buf[0]);
     -+	free(buf[1]);
     -+	strbuf_release(&name);
     -+	strbuf_release(&parent_path);
     -+}
     -+
     - static int show_modified(struct rev_info *revs,
     - 			 const struct cache_entry *old_entry,
     - 			 const struct cache_entry *new_entry,
      @@ diff-lib.c: static int show_modified(struct rev_info *revs,
     - 	const struct object_id *oid;
       	unsigned dirty_submodule = 0;
     + 	struct index_state *istate = revs->diffopt.repo->index;
       
      +	/*
      +	 * If both are sparse directory entries, then expand the
     @@ diff-lib.c: static int show_modified(struct rev_info *revs,
      +	if (old_entry && new_entry &&
      +	    S_ISSPARSEDIR(old_entry->ce_mode) &&
      +	    S_ISSPARSEDIR(new_entry->ce_mode)) {
     -+		show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
     ++		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
      +		return 0;
      +	}
      +
     - 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
     + 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
       			  &dirty_submodule, &revs->diffopt) < 0) {
       		if (report_missing)
 11:  611b9f61fb2 = 11:  48fd25aacbe status: skip sparse-checkout percentage with sparse-index
 12:  0c0a765dde8 = 12:  3499105eb67 status: use sparse-index throughout
 13:  02f2c7b6398 ! 13:  08225483d69 wt-status: expand added sparse directory entries
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	test_sparse_match git reset --mixed HEAD~1 &&
      +	test_sparse_match test-tool read-cache --table --expand &&
      +	test_sparse_match git status --porcelain=v2 &&
     -+	test_sparse_match git status --porcelain=v2 &&
      +
      +	# At this point, sparse-checkouts behave differently
      +	# from the full-checkout.
      +	test_sparse_match git checkout --orphan new-branch &&
      +	test_sparse_match test-tool read-cache --table --expand &&
     -+	test_sparse_match git status --porcelain=v2 &&
      +	test_sparse_match git status --porcelain=v2
      +'
      +
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +
      +	run_on_all touch deep/deeper1/x &&
      +	test_all_match git add . &&
     -+	test_all_match git status --porcelain=v2 &&
      +	test_all_match git status --porcelain=v2
      +'
      +
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      
       ## wt-status.c ##
      @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
     - 	run_diff_index(&rev, 1);
     + 	clear_pathspec(&rev.prune_data);
       }
       
      +static int add_file_to_list(const struct object_id *oid,
     @@ wt-status.c: static void wt_status_collect_changes_initial(struct wt_status *s)
      +			 * tree and marking them with DIFF_STATUS_ADDED.
      +			 */
      +			struct strbuf base = STRBUF_INIT;
     -+			struct pathspec ps;
     ++			struct pathspec ps = { 0 };
      +			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
      +
     -+			memset(&ps, 0, sizeof(ps));
      +			ps.recursive = 1;
      +			ps.has_wildcard = 1;
      +			ps.max_depth = -1;
 14:  46ca150c354 = 14:  711e403a63a fsmonitor: integrate with sparse index

-- 
gitgitgadget
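
The range-diff above factors the name check shared by find_cache_entry() and is_sparse_directory_entry() into sparse_dir_matches_path(). The comparison can be sketched as a standalone function. This is a minimal sketch, not git's code: struct sketch_entry and sketch_sparse_dir_matches() are hypothetical stand-ins, and the traverse_info prefix and name_entry path are passed as plain (pointer, length) pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sparse directory cache entry: only the
 * name (which always ends in '/') and its length are modeled.
 */
struct sketch_entry {
	const char *name;
	size_t namelen;
};

/*
 * Return 1 if 'ce' names "<prefix>/<path>/" when prefixlen is nonzero,
 * or "<path>/" when the prefix is empty; return 0 otherwise.
 * Modeled on sparse_dir_matches_path() from the range-diff.
 */
static int sketch_sparse_dir_matches(const struct sketch_entry *ce,
				     const char *prefix, size_t prefixlen,
				     const char *path, size_t pathlen)
{
	/* Sparse directory entries always carry a trailing '/'. */
	assert(ce->namelen > 0 && ce->name[ce->namelen - 1] == '/');

	if (prefixlen)
		/* Expected length: prefix + '/' + path + trailing '/'. */
		return ce->namelen == prefixlen + pathlen + 2 &&
		       ce->name[prefixlen] == '/' &&
		       !strncmp(ce->name, prefix, prefixlen) &&
		       !strncmp(ce->name + prefixlen + 1, path, pathlen);

	/* No prefix: expected length is path + trailing '/'. */
	return ce->namelen == pathlen + 1 &&
	       !strncmp(ce->name, path, pathlen);
}
```

For example, an entry named "deep/deeper2/" matches prefix "deep" with path "deeper2", but not path "deeper1". Checking the total length first is what turns the two strncmp() calls into an exact match rather than a prefix match.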


* [PATCH v6 01/14] sparse-index: skip indexes with unmerged entries
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                             ` (14 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index affc4048f27..2c695930275 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -116,6 +116,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -152,6 +163,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e9a815ca7aa..ba2fd94adaf 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget
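
The early exit added to convert_to_sparse() can be modeled in isolation. The sketch below assumes a simplified index in which each entry records only its path and merge stage; struct sketch_stage_entry, struct sketch_index and both functions are hypothetical, and the real conversion work (cache_tree_update() and collapsing directories) is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified entry: path plus merge stage
 * (0 = merged, 1-3 = conflict stages). */
struct sketch_stage_entry {
	const char *name;
	unsigned stage;
};

/* Hypothetical simplified index. */
struct sketch_index {
	struct sketch_stage_entry *cache;
	size_t cache_nr;
	int sparse; /* set to 1 once "converted" */
};

/* Like index_has_unmerged_entries(): any nonzero stage is a conflict. */
static int sketch_has_unmerged(const struct sketch_index *istate)
{
	size_t i;

	assert(istate != NULL);
	for (i = 0; i < istate->cache_nr; i++)
		if (istate->cache[i].stage)
			return 1;
	return 0;
}

/*
 * Like the new check in convert_to_sparse(): with unmerged entries,
 * return 0 and leave the index full instead of attempting a
 * cache-tree update that would fail.
 */
static int sketch_convert_to_sparse(struct sketch_index *istate)
{
	if (sketch_has_unmerged(istate))
		return 0;

	istate->sparse = 1;
	return 0;
}
```

Returning 0 rather than -1 mirrors the patch: staying full because of unmerged entries is an expected outcome, whereas a failed cache-tree update still warns and returns -1.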



* [PATCH v6 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
flag is not included. The CE_EXTENDED flag would exist if we loaded a
full index from disk with these entries marked with CE_SKIP_WORKTREE, so
we can add the flag here to be consistent. This allows us to directly
compare the flags present in cache entries when testing the sparse-index
feature, but has no effect on the correctness of user-facing
functionality.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 2c695930275..ef53bd2198b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -213,7 +213,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget
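
A minimal sketch of why the extra flag matters for tests that compare cache entry flags directly. The bit values below are hypothetical placeholders, not git's actual CE_* definitions, and the helper functions are illustrative only.

```c
#include <assert.h>

/* Hypothetical bit values for illustration only; git defines its
 * own CE_* constants. */
#define SKETCH_CE_EXTENDED      (1u << 14)
#define SKETCH_CE_SKIP_WORKTREE (1u << 30)

/* Flags of a skip-worktree entry read from an on-disk full index:
 * per the commit message, CE_EXTENDED is present there as well. */
static unsigned sketch_flags_from_disk(void)
{
	return SKETCH_CE_SKIP_WORKTREE | SKETCH_CE_EXTENDED;
}

/* Flags given to a blob entry created while expanding a sparse
 * directory entry, before (0) and after (1) this patch. */
static unsigned sketch_flags_when_expanding(int with_fix)
{
	unsigned flags = SKETCH_CE_SKIP_WORKTREE;

	if (with_fix)
		flags |= SKETCH_CE_EXTENDED;
	return flags;
}

/* Test-style check: an expanded entry should carry exactly the
 * flags of one loaded from disk. */
static void sketch_check_consistent(unsigned expanded)
{
	assert(expanded == sketch_flags_from_disk());
}
```

Without the fix, the expanded entry differs from its on-disk counterpart only in the extended bit, which is enough to make a direct flag comparison fail.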



* [PATCH v6 03/14] t1092: replace incorrect 'echo' with 'cat'
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                             ` (12 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

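The 'echo' command ignores its standard input, so the here-document was
discarded and 'folder1/larger-content' contained only a newline. Using
'cat' fixes the test data shape to be as expected, allowing rename
detection to work properly now that the 'larger-content' file actually
has meaningful lines.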

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index ba2fd94adaf..ebbba044f77 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -40,7 +40,7 @@ test_expect_success 'setup' '
 		done &&
 
 		git checkout -b rename-base base &&
-		echo >folder1/larger-content <<-\EOF &&
+		cat >folder1/larger-content <<-\EOF &&
 		matching
 		lines
 		help
-- 
gitgitgadget



* [PATCH v6 04/14] t1092: expand repository data shape
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (2 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                             ` (11 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 39 ++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index ebbba044f77..4a7b48e8d3f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,20 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
+		cp -r deep/deeper1/0 folder1 &&
+		cp -r deep/deeper1/0 folder2 &&
+		echo >>folder1/0/0/0 &&
+		echo >>folder2/0/1 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
@@ -56,11 +66,17 @@ test_expect_success 'setup' '
 		mv folder1/a folder2/b &&
 		mv folder1/larger-content folder2/edited-content &&
 		echo >>folder2/edited-content &&
+		echo >>folder2/0/1 &&
+		echo stuff >>deep/deeper1/a &&
 		git add . &&
 		git commit -m "rename folder1/... to folder2/..." &&
 
 		git checkout -b rename-out-to-in rename-base &&
 		mv folder1/a deep/deeper1/b &&
+		echo more stuff >>deep/deeper1/a &&
+		rm folder2/0/1 &&
+		mkdir folder2/0/1 &&
+		echo >>folder2/0/1/1 &&
 		mv folder1/larger-content deep/deeper1/edited-content &&
 		echo >>deep/deeper1/edited-content &&
 		git add . &&
@@ -68,6 +84,9 @@ test_expect_success 'setup' '
 
 		git checkout -b rename-in-to-out rename-base &&
 		mv deep/deeper1/a folder1/b &&
+		echo >>folder2/0/1 &&
+		rm -rf folder1/0/0 &&
+		echo >>folder1/0/0 &&
 		mv deep/deeper1/larger-content folder1/edited-content &&
 		echo >>folder1/edited-content &&
 		git add . &&
@@ -262,13 +281,29 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
-test_expect_success 'diff with renames' '
+test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- .&&
+		test_all_match git status --porcelain=v2 &&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'diff with directory/file conflicts' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		git -C full-checkout reset --hard &&
+		test_sparse_match git reset --hard &&
+		test_all_match git checkout $branch &&
+		test_all_match git checkout rename-base -- . &&
+		test_all_match git status --porcelain=v2 &&
 		test_all_match git diff --staged --no-renames &&
 		test_all_match git diff --staged --find-renames || return 1
 	done
-- 
gitgitgadget



* [PATCH v6 05/14] t1092: add tests for status/add and sparse files
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (3 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                             ` (10 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4a7b48e8d3f..7c78e40b861 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -251,6 +251,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v6 06/14] unpack-trees: preserve cache_bottom
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (4 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                             ` (9 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack in a local variable and then restored as
that method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index f88a69f8e71..87c1ed204c8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -600,6 +600,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


* [PATCH v6 07/14] unpack-trees: compare sparse directories correctly
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (5 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
                             ` (8 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, that
prefix comparison may already be an exact match. Thus, we should return
0 if we have an exact string match on a sparse directory entry.
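The ordering can be made concrete with a hypothetical comparator. When one name is a prefix of the other, it treats the character after a directory's name as '/'. This is the effect that switching from S_IFREG to S_IFDIR has for sparse directories. It is a simplified model, not Git's df_name_compare(), and it omits the special cases that function has for directory/file conflicts.

```c
#include <string.h>

/*
 * Hypothetical comparator: compare two path names; past the end of a
 * name, a directory contributes '/' and a file contributes NUL.
 * Returns <0, 0, or >0.
 */
static int toy_df_compare(const char *n1, int is_dir1,
			  const char *n2, int is_dir2)
{
	size_t len1 = strlen(n1), len2 = strlen(n2);
	size_t len = len1 < len2 ? len1 : len2;
	int cmp = memcmp(n1, n2, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp;

	/* Compare the first character after the common prefix. */
	c1 = len < len1 ? (unsigned char)n1[len] : (is_dir1 ? '/' : '\0');
	c2 = len < len2 ? (unsigned char)n2[len] : (is_dir2 ? '/' : '\0');
	return c1 - c2;
}
```

Under this rule, a directory folder1 sorts after folder1.txt ('/' > '.') but before folder10 ('/' < '0'). A sparse directory name "folder1/" also compares equal to a tree directory named folder1, which is the exact match compare_entry() now accepts.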

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 87c1ed204c8..7a507ddfe05 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -983,6 +983,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -1005,7 +1006,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1014,6 +1016,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget


* [PATCH v6 08/14] unpack-trees: unpack sparse directory entries
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (6 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                             ` (7 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.

In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_nondirectories() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn(). Since the old name is
confusing now that the function also handles directories, rename it to
unpack_single_entry(): it handles either a blob entry or a sparse
directory entry without using traverse_trees_recursive().
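The naming change in create_ce_entry() amounts to reserving one more byte and appending '/'. Here is a standalone sketch of that string handling. It uses a hypothetical make_entry_name() helper and plain malloc() instead of Git's cache-entry allocators.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: join a parent path and a child name the way a
 * tree walk would, appending a trailing '/' when the result names a
 * sparse directory. Returns a malloc'd string the caller must free.
 */
static char *make_entry_name(const char *parent, const char *child,
			     int is_sparse_directory)
{
	size_t plen = strlen(parent), clen = strlen(child);
	size_t len = plen ? plen + 1 + clen : clen;
	/* one extra byte for '/' on sparse directories, one for NUL */
	char *name = malloc(len + (is_sparse_directory ? 1 : 0) + 1);

	if (!name)
		return NULL;
	if (plen) {
		memcpy(name, parent, plen);
		name[plen] = '/';
		memcpy(name + plen + 1, child, clen);
	} else {
		memcpy(name, child, clen);
	}
	if (is_sparse_directory)
		name[len++] = '/';
	name[len] = '\0';
	return name;
}
```

For example, make_entry_name("deep", "deeper2", 1) yields "deep/deeper2/", the form a sparse-directory entry name takes.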

There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path. Use the new helper method
sparse_dir_matches_path() for this.
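sparse_dir_matches_path() boils down to a length check plus two prefix comparisons. The following string-only model follows the same logic. sparse_dir_name_matches() is a hypothetical name, and it takes plain C strings instead of Git's cache_entry, traverse_info and name_entry.

```c
#include <string.h>

/*
 * Hypothetical model: does the sparse-directory name 'ce_name' (which
 * must end in '/') equal parent + "/" + path + "/", or path + "/"
 * when 'parent' is empty?
 */
static int sparse_dir_name_matches(const char *ce_name, const char *parent,
				   const char *path)
{
	size_t ce_len = strlen(ce_name);
	size_t plen = strlen(parent), len = strlen(path);

	/* a sparse-directory name always ends in '/' */
	if (!ce_len || ce_name[ce_len - 1] != '/')
		return 0;

	if (plen)
		return ce_len == plen + len + 2 &&
		       ce_name[plen] == '/' &&
		       !strncmp(ce_name, parent, plen) &&
		       !strncmp(ce_name + plen + 1, path, len);
	return ce_len == len + 1 && !strncmp(ce_name, path, len);
}
```

The length check is what rejects near-misses such as "folder1/" against the path folder10: the prefixes agree, but the lengths do not.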

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 110 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 96 insertions(+), 14 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 7a507ddfe05..d205a15d61f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1051,13 +1051,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	const struct name_entry *n,
 	int stage,
 	struct index_state *istate,
-	int is_transient)
+	int is_transient,
+	int is_sparse_directory)
 {
 	size_t len = traverse_path_len(info, tree_entry_len(n));
+	size_t alloc_len = is_sparse_directory ? len + 1 : len;
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len, NULL) :
-		make_empty_cache_entry(istate, len);
+		make_empty_transient_cache_entry(alloc_len, NULL) :
+		make_empty_cache_entry(istate, alloc_len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
@@ -1066,6 +1068,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	/* len+1 because the cache_entry allocates space for NUL */
 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
 
+	if (is_sparse_directory) {
+		ce->name[len] = '/';
+		ce->name[len + 1] = '\0';
+		ce->ce_namelen++;
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	}
+
 	return ce;
 }
 
@@ -1074,20 +1083,28 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
  * without actually calling it. If you change the logic here you may need to
  * check and change there as well.
  */
-static int unpack_nondirectories(int n, unsigned long mask,
-				 unsigned long dirmask,
-				 struct cache_entry **src,
-				 const struct name_entry *names,
-				 const struct traverse_info *info)
+static int unpack_single_entry(int n, unsigned long mask,
+			       unsigned long dirmask,
+			       struct cache_entry **src,
+			       const struct name_entry *names,
+			       const struct traverse_info *info,
+			       int sparse_directory)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
 
-	/* Do we have *only* directories? Nothing to do */
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	/* defer work if our cache entry doesn't match the expectations. */
+	if (sparse_directory) {
+		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
+			BUG("expected sparse directory entry");
+	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
+		return 0;
+	}
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
@@ -1117,7 +1134,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
 		 * not stored in the index.  otherwise construct the
 		 * cache entry from the index aware logic.
 		 */
-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
+		src[i + o->merge] = create_ce_entry(info, names + i, stage,
+						    &o->result, o->merge,
+						    sparse_directory);
 	}
 
 	if (o->merge) {
@@ -1221,16 +1240,59 @@ static int find_cache_pos(struct traverse_info *info,
 	return -1;
 }
 
+/*
+ * Given a sparse directory entry 'ce', compare ce->name to
+ * info->name + '/' + p->path + '/' if info->name is non-empty.
+ * Compare ce->name to p->path + '/' otherwise. Note that
+ * ce->name must end in a trailing '/' because it is a sparse
+ * directory entry.
+ */
+static int sparse_dir_matches_path(const struct cache_entry *ce,
+				   struct traverse_info *info,
+				   const struct name_entry *p)
+{
+	assert(S_ISSPARSEDIR(ce->ce_mode));
+	assert(ce->name[ce->ce_namelen - 1] == '/');
+
+	if (info->namelen)
+		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
+		       ce->name[info->namelen] == '/' &&
+		       !strncmp(ce->name, info->name, info->namelen) &&
+		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	return ce->ce_namelen == p->pathlen + 1 &&
+	       !strncmp(ce->name, p->path, p->pathlen);
+}
+
 static struct cache_entry *find_cache_entry(struct traverse_info *info,
 					    const struct name_entry *p)
 {
+	struct cache_entry *ce;
 	int pos = find_cache_pos(info, p->path, p->pathlen);
 	struct unpack_trees_options *o = info->data;
 
 	if (0 <= pos)
 		return o->src_index->cache[pos];
-	else
+
+	/*
+	 * Check for a sparse-directory entry named "path/".
+	 * Due to the input p->path not having a trailing
+	 * slash, the negative 'pos' value overshoots the
+	 * expected position by one, hence "-2" here.
+	 */
+	pos = -pos - 2;
+
+	if (pos < 0 || pos >= o->src_index->cache_nr)
 		return NULL;
+
+	ce = o->src_index->cache[pos];
+
+	if (!S_ISSPARSEDIR(ce->ce_mode))
+		return NULL;
+
+	if (sparse_dir_matches_path(ce, info, p))
+		return ce;
+
+	return NULL;
 }
 
 static void debug_path(struct traverse_info *info)
@@ -1265,6 +1327,21 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Returns true if and only if the given cache_entry is a
+ * sparse-directory entry that matches the given name_entry
+ * from the tree walk at the given traverse_info.
+ */
+static int is_sparse_directory_entry(struct cache_entry *ce,
+				     struct name_entry *name,
+				     struct traverse_info *info)
+{
+	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
+	return sparse_dir_matches_path(ce, info, name);
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1321,7 +1398,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info, 0) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1351,9 +1428,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
-					     names, info) < 0)
+		if (is_sparse_directory_entry(src[0], names, info)) {
+			if (unpack_single_entry(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
+				return -1;
+		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+						    names, info) < 0) {
 			return -1;
+		}
+
 		return mask;
 	}
 
-- 
gitgitgadget


* [PATCH v6 09/14] dir.c: accept a directory as part of cone-mode patterns
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (7 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
                             ` (6 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
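The path rewrite itself is a one-line decision: if the name ends in '/', append "-". A directory folder1/ is then checked as folder1/-, a file whose immediate parent is folder1. This is a sketch with a hypothetical directory_to_fake_file() that writes into a caller buffer instead of using Git's strbuf.

```c
#include <string.h>

/*
 * Hypothetical sketch: copy 'path' into 'out'; if it names a directory
 * (ends in '/'), append the placeholder filename "-" so file-based
 * cone-mode matching can be reused. 'out' must hold strlen(path) + 2
 * bytes.
 */
static void directory_to_fake_file(const char *path, char *out)
{
	size_t len = strlen(path);

	memcpy(out, path, len + 1);
	if (len > 0 && out[len - 1] == '/') {
		out[len] = '-';
		out[len + 1] = '\0';
	}
}
```

File paths pass through unchanged, so only directory entries are affected.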

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index ebe5ec046e0..2155107b1d7 100644
--- a/dir.c
+++ b/dir.c
@@ -1397,6 +1397,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


* [PATCH v6 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (8 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

Use diff_tree_oid() to compute the diff for these entries.
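Conceptually, expanding a sparse directory into per-file changes means walking both trees and reporting additions, deletions, and modifications. The toy below flattens each tree into a sorted list of (path, object ID) pairs and merge-walks them. The types and the diff_leaves() name are hypothetical, and real tree diffing recurses into subtrees rather than working on a flat list. Passing an empty "before" list models the show_new_file() case, where every file is reported as added.

```c
#include <string.h>

/* Hypothetical flattened tree: (path, oid) pairs sorted by path. */
struct leaf {
	const char *path;
	const char *oid;
};

/*
 * Merge-walk two sorted leaf lists and write one status letter per
 * changed path into 'out' ('D' deleted, 'A' added, 'M' modified),
 * NUL-terminated. 'out' needs room for nbefore + nafter + 1 bytes.
 * Returns the number of changes.
 */
static int diff_leaves(const struct leaf *before, int nbefore,
		       const struct leaf *after, int nafter, char *out)
{
	int i = 0, j = 0, n = 0;

	while (i < nbefore || j < nafter) {
		int cmp;

		if (i >= nbefore)
			cmp = 1;        /* only "after" entries remain */
		else if (j >= nafter)
			cmp = -1;       /* only "before" entries remain */
		else
			cmp = strcmp(before[i].path, after[j].path);

		if (cmp < 0) {
			out[n++] = 'D';
			i++;
		} else if (cmp > 0) {
			out[n++] = 'A';
			j++;
		} else {
			if (strcmp(before[i].oid, after[j].oid))
				out[n++] = 'M';
			i++;
			j++;
		}
	}
	out[n] = '\0';
	return n;
}
```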

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..3f32f038371 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -347,6 +352,17 @@ static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
-- 
gitgitgadget


* [PATCH v6 11/14] status: skip sparse-checkout percentage with sparse-index
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (9 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories, because a single sparse directory
entry stands in for every path beneath it. Computing the correct value
would also be expensive, since we would need to parse trees to count
the total number of possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
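Here is a small model of the reporting logic, with the two sentinel values taken from wt-status.h in this patch. Several parts are assumptions for illustration: the percentage formula, the empty-index guard, and the toy_* function names. The hunk only shows that skip-worktree entries are counted.

```c
#include <stdio.h>
#include <string.h>

#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/* Assumed formula: percentage of entries that are NOT skip-worktree. */
static int toy_sparse_percentage(int sparse_enabled, int is_sparse_index,
				 int cache_nr, int skip_worktree_nr)
{
	/* assumed: treat an empty index as "disabled" to avoid dividing by 0 */
	if (!sparse_enabled || cache_nr <= 0)
		return SPARSE_CHECKOUT_DISABLED;
	if (is_sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX;
	return 100 - (100 * skip_worktree_nr) / cache_nr;
}

/* Format the status line the way the patched wt-status.c chooses it. */
static void toy_sparse_message(int pct, char *buf, size_t size)
{
	if (pct == SPARSE_CHECKOUT_DISABLED)
		buf[0] = '\0';
	else if (pct == SPARSE_CHECKOUT_SPARSE_INDEX)
		snprintf(buf, size, "You are in a sparse checkout.");
	else
		snprintf(buf, size,
			 "You are in a sparse checkout with %d%% of tracked files present.",
			 pct);
}
```

Using a distinct negative sentinel keeps the existing "disabled" check untouched, while still letting the message code tell the sparse-index case apart from a real percentage.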

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not require this message
to be identical across both modes; instead, we compare only the
important information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 7c78e40b861..9035adcb7db 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -215,6 +215,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 42b67357169..96db3e74962 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1493,9 +1493,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1653,6 +1656,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e..ab9cc9d8f03 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


* [PATCH v6 12/14] status: use sparse-index throughout
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (10 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                             ` (3 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.
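The skip added to refresh_index() can be illustrated with a toy loop. The struct and function below are hypothetical. The real loop also handles pathspecs, flags and lstat() results, which this sketch omits.

```c
#include <stddef.h>

/* Hypothetical index entry with only the fields this sketch needs. */
struct refresh_entry {
	const char *name;
	int is_sparse_dir;  /* stand-in for S_ISSPARSEDIR(ce->ce_mode) */
	int skip_worktree;  /* stand-in for ce_skip_worktree(ce) */
	int refreshed;
};

/*
 * Refresh every entry that could have on-disk stat() data. Sparse
 * directory entries have none, so they are skipped unconditionally.
 * Returns the number of entries refreshed.
 */
static int toy_refresh(struct refresh_entry *entries, size_t nr,
		       int ignore_skip_worktree)
{
	int count = 0;

	for (size_t i = 0; i < nr; i++) {
		struct refresh_entry *ce = &entries[i];

		if (ignore_skip_worktree && ce->skip_worktree)
			continue;
		if (ce->is_sparse_dir)
			continue;

		ce->refreshed = 1; /* the real code would compare lstat() data here */
		count++;
	}
	return count;
}
```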

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.
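The percentages in the table are relative changes, computed as (after - before) / before. A tiny helper reproduces them, including the 85% figure for 0.34s to 0.05s. relative_change() is a hypothetical name.

```c
/* Relative change in percent; a negative value means the run got faster. */
static double relative_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}
```

For example, relative_change(0.34, 0.05) is about -85.3, and relative_change(2.35, 0.04) is about -98.3, matching the 2000.4 row.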

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 190d215d43b..12f51db158a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1510,6 +1510,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 1b3c2eb408b..277c2970a03 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1584,8 +1584,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1600,6 +1599,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 9035adcb7db..1e9737cb4b7 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -508,12 +508,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


* [PATCH v6 13/14] wt-status: expand added sparse directory entries
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (11 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  1:51           ` [PATCH v6 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
                             ` (2 subsequent siblings)
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
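The expansion that read_tree_at() performs with add_file_to_list() can be sketched without Git's object store. The sketch walks an in-memory tree, recurses into subtrees, and records every blob's full path. Each path is prefixed by the sparse directory's name, which already ends in '/'. The toy_tree type and collect_added() are hypothetical, and a node with no children is treated as a blob. In the real code, each path is inserted into s->change with DIFF_STATUS_ADDED.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory tree: a node is a blob when it has no children. */
struct toy_tree {
	const char *name;
	const struct toy_tree *children;
	size_t nr_children;
};

/*
 * Recursively collect the full paths of all blobs under 'tree', each
 * prefixed with 'base' (which, like a sparse-directory name, ends in
 * '/'). Paths are appended newline-separated to 'out', which must be
 * NUL-terminated on entry. Returns the number of paths collected.
 */
static int collect_added(const struct toy_tree *tree, const char *base,
			 char *out, size_t out_size)
{
	int count = 0;

	for (size_t i = 0; i < tree->nr_children; i++) {
		const struct toy_tree *child = &tree->children[i];
		char path[256];

		snprintf(path, sizeof(path), "%s%s", base, child->name);
		if (child->nr_children) {
			/* subtree: recurse with "path/" as the new base */
			strncat(path, "/", sizeof(path) - strlen(path) - 1);
			count += collect_added(child, path, out, out_size);
		} else {
			strncat(out, path, out_size - strlen(out) - 1);
			strncat(out, "\n", out_size - strlen(out) - 1);
			count++;
		}
	}
	return count;
}
```

With a sparse directory "folder1/" that contains 0/1 and a, the sketch reports folder1/0/1 and folder1/a. This matches the full-index behavior of listing each added path rather than the directory alone.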

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 33 ++++++++++++++++
 wt-status.c                              | 49 ++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 1e9737cb4b7..ef918437908 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -521,4 +521,37 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+
+	# Sparse checkouts do not agree with full checkouts about
+	# how to report a directory/file conflict during a reset.
+	# This command would fail with test_all_match because the
+	# full checkout reports "T folder1/0/1" while a sparse
+	# checkout reports "D folder1/0/1". This matches because
+	# the sparse checkouts skip "adding" the other side of
+	# the conflict.
+	test_sparse_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 96db3e74962..a90c7b6aa8a 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -657,6 +657,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	clear_pathspec(&rev.prune_data);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -671,6 +699,27 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps = { 0 };
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v6 14/14] fsmonitor: integrate with sparse index
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (12 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-29  1:51           ` Derrick Stolee via GitGitGadget
  2021-06-29  2:02           ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  1:51 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.
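The reset can be sketched as below. This is a minimal sketch under stated assumptions: struct toy_index is a hypothetical, simplified stand-in that stores one FS Monitor dirty flag per entry position, loosely modeled on the fsmonitor_dirty, fsmonitor_last_update and fsmonitor_has_run_once fields touched by this patch. toy_reset_fsmonitor() and toy_expand() are illustrative names, not git functions.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-in for the FS Monitor fields of an
 * index. This is NOT git's struct index_state.
 */
struct toy_index {
	size_t nr;                 /* number of cache entries */
	unsigned char *fsm_dirty;  /* one dirty flag per entry position */
	char *fsm_last_update;     /* token returned by the last hook query */
	int fsm_has_run_once;      /* nonzero once the hook has been queried */
};

/* Forget everything the FS Monitor has told us so far. */
static void toy_reset_fsmonitor(struct toy_index *ix)
{
	ix->fsm_has_run_once = 0;
	free(ix->fsm_dirty);
	ix->fsm_dirty = NULL;
	free(ix->fsm_last_update);
	ix->fsm_last_update = NULL;
}

/*
 * Model of expanding sparse directory entries: the entry count grows,
 * so every position-based dirty flag may now point at the wrong entry.
 * Instead of remapping the flags, start fresh.
 */
static void toy_expand(struct toy_index *ix, size_t new_nr)
{
	assert(new_nr >= ix->nr); /* expansion never removes entries */
	ix->nr = new_nr;
	toy_reset_fsmonitor(ix);
}
```

After toy_expand(), both cached pieces of state are NULL and the has-run flag is cleared, so the next query starts from scratch. Remapping the flags onto the new positions would also be possible in principle, but discarding them is simpler and cannot leave stale data behind.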

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().
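The condition that the last test exercises can be sketched as follows. The sketch assumes that the names of the sparse directory entries are available as a plain array of strings that end in '/' (for example "dir1a/" once 'dir1a' is outside the cone). path_in_sparse_dir() and reports_need_expansion() are hypothetical helpers for illustration, not git's actual code path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if 'path' lies inside one of the
 * sparse directory entries in 'sparse_dirs'. Sparse directory names
 * end in '/', so a plain prefix test is enough and "dir1/" does not
 * accidentally match "dir1a/modified".
 */
static int path_in_sparse_dir(const char *path,
			      const char *const *sparse_dirs, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(sparse_dirs[i]);

		assert(len > 0 && sparse_dirs[i][len - 1] == '/');
		if (!strncmp(path, sparse_dirs[i], len))
			return 1;
	}
	return 0;
}

/*
 * Would handling these FS Monitor reports require expanding the
 * sparse index? Yes as soon as any reported path falls inside a
 * sparse directory entry, i.e. outside the sparse-checkout cone.
 */
static int reports_need_expansion(const char *const *reported, size_t nr_reported,
				  const char *const *sparse_dirs, size_t nr_dirs)
{
	size_t i;

	for (i = 0; i < nr_reported; i++)
		if (path_in_sparse_dir(reported[i], sparse_dirs, nr_dirs))
			return 1;
	return 0;
}
```

With 'dir1' and 'dir2' in the cone, as in the test below, a report of "dir1/modified" needs no expansion, while "dir1a/modified" lands inside the sparse directory entry "dir1a/" and does.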

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index ef53bd2198b..53c8f711ccc 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -186,6 +186,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -282,6 +286,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 637391c6ce4..8f9240b9b1a 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget

* Re: [PATCH v6 00/14] Sparse-index: integrate with status
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (13 preceding siblings ...)
  2021-06-29  1:51           ` [PATCH v6 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-06-29  2:02           ` Derrick Stolee
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
  15 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-06-29  2:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, git,
	johannes.schindelin, Derrick Stolee

On 6/28/2021 9:51 PM, Derrick Stolee via GitGitGadget wrote:
...
> Update in V6
> ============

I'm very sorry for the noise. I was working on my cover letter and
adjusting the patches relative to some nits I discovered, including
the ordering of some patches in my larger topic on this subject,
and forgot to push those before submitting the v6. Please ignore
this version. A correct v7 is coming soon.

Again, sorry for the noise.

-Stolee

* [PATCH v7 00/16] Sparse-index: integrate with status
  2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                             ` (14 preceding siblings ...)
  2021-06-29  2:02           ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee
@ 2021-06-29  2:04           ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                               ` (18 more replies)
  15 siblings, 19 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Update in V7 (relative to v5)
=============================

APOLOGIES: As I was working on this cover letter, I was still organizing my
big list of patches, including reordering some into this series. I forgot to
actually include them in my v6 submission, so here is a re-submission.
Please ignore v6.

I'm sorry that this revision took so long. Initially I was blocked on
getting the directory/file conflict figured out (I did), but also my team
was very busy with some things. Eventually, we reached an internal deadline
to make an experimental release available [1] with initial sparse-index
performance boosts. Creating that included some additional review by Jeff
Hostetler and Johannes Schindelin which led to more changes in this version.

The good news is that this series has now created the basis for many Git
commands to integrate with the sparse-index without much additional work.
Unfortunately, much of this work landed in this series because the changes
needed for things like 'git checkout' or 'git add' all intersect with the
changes needed for 'git status'. Might as well get it right the first time.

Because the range-diff is a bit difficult to read this time, I'll break the
changes down on a patch-by-patch basis.

 1. sparse-index: skip indexes with unmerged entries
    
    (no change)

 2. sparse-index: include EXTENDED flag when expanding

 * Commit message better describes the purpose of the change.

 3. t1092: replace incorrect 'echo' with 'cat'

 * Typo fix

 4. t1092: expand repository data shape

 * Some files are added that sort immediately before and after "folder1/"
   (taking the trailing slash into account). This provides extra coverage.

 5. t1092: add tests for status/add and sparse files
    
    (no change)

 6. unpack-trees: preserve cache_bottom
    
    (no change)

 7. unpack-trees: compare sparse directories correctly

 * We were previously not comparing the path lengths, which causes a problem
   (with later updates) when a sparse directory such as "folder1/0/" gets
   compared to a tree name "folder1".

 8. unpack-trees: rename unpack_nondirectories()

 * This new commit changes the name to make more sense with its new behavior
   that could modify a sparse directory entry. The point of the method is in
   contrast to recursing into trees.

 9. unpack-trees: unpack sparse directory entries

 * THIS is the biggest change from previous versions. There were a few
   things going on that were tricky to get right, especially with the
   directory/file conflict (handled in an update in the following, new
   patch).

 * The changes to create_ce_entry() regarding alloc_len missed a spot that
   was critical to getting the length right in the allocated entry.

 * Use '\0' over 0 to represent the terminating character.

 * We don't need a "sparse_directory" parameter to unpack_nondirectories()
   (which was renamed to unpack_single_entry() by the previous new patch)
   because we can use dirmask to discover if src[0] (or any other value)
   should be a sparse directory entry.

 * Similarly, we don't need to call the method twice from unpack_callback().

 * The 'conflicts' variable is set to match the dirmask in the beginning,
   but it should depend on whether or not we have a sparse directory entry
   instead, and if all trees that have the path have a directory.

 * The implementation of find_cache_entry() uses find_cache_pos() to find an
   insertion position for a path if it doesn't find an exact match. Before,
   we subtracted one to find the sparse directory entry, but there could be
   multiple paths between the sparse directory entry and the insertion
   point, so we need to walk backwards until we find it. This requires many
   paths sharing the same prefix, so it is hopefully a rare case. Some of the
   test data changes were added to cover the need for this logic. This uses
   a helper method, sparse_dir_matches_path, which is also used by
   is_sparse_directory_entry.

 10. unpack-trees: handle dir/file conflict of sparse entries

 * This new logic inside twoway_merge handles the special case for dealing
   with a directory/file conflict during a 'git checkout'. The necessary
   data and tests are also added here, though the logic will only take
   serious effect when we integrate with 'git checkout' later.

 11. dir.c: accept a directory as part of cone-mode patterns

 * The value slash_pos was previously a pointer within a strbuf, but in some
   cases we add to that strbuf and that could reallocate the pointer, making
   slash_pos be invalid. The replacement is to have slash_pos be an integer
   position within the string, so it is consistent even if the string is
   reallocated for an append.

 12. diff-lib: handle index diffs with sparse dirs

 * As recommended in the previous review, a simple diff_tree_oid() replaces
   the complicated use of read_tree_at() and traverse_trees() in the
   previous version.

 13. status: skip sparse-checkout percentage with sparse-index
     
     (no change)

 14. status: use sparse-index throughout
     
     (no change)

 15. wt-status: expand added sparse directory entries

 * Duplicate 'git status --porcelain=v2' lines are removed from tests.

 * The pathspec is initialized using "= { 0 }" instead of memset().

 16. fsmonitor: integrate with sparse index

 * An extra test_region is added to ensure that the filesystem monitor hook
   is still being called, and we are not simply disabling the feature
   entirely.
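The slash_pos change in patch 11 guards against a general hazard of growable buffers: a pointer into the buffer is invalidated when an append reallocates it, while an integer offset is not. A minimal sketch, using a hypothetical toy_buf type in place of git's strbuf and a hypothetical toy_parent_dir() that only mirrors the shape of the updated logic in path_matches_pattern_list():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable string standing in for git's strbuf. */
struct toy_buf {
	char *s;
	size_t len, alloc;
};

/* Append n bytes; may realloc() and therefore move b->s. */
static int toy_buf_add(struct toy_buf *b, const char *data, size_t n)
{
	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->len + n + 1) * 2;
		char *p = realloc(b->s, want);

		if (!p)
			return -1;
		b->s = p;
		b->alloc = want;
	}
	memcpy(b->s + b->len, data, n);
	b->len += n;
	b->s[b->len] = '\0';
	return 0;
}

/*
 * Remember the slash as an offset, append "-" for a directory path
 * (which may reallocate), then cut the buffer back at that offset.
 * A saved 'char *' could dangle after the reallocation; a size_t
 * offset stays valid.
 */
static int toy_parent_dir(struct toy_buf *b)
{
	size_t slash_pos;

	assert(b->s != NULL); /* caller must have added a path */

	if (b->len > 0 && b->s[b->len - 1] == '/') {
		slash_pos = b->len - 1;
		if (toy_buf_add(b, "-", 1) < 0)
			return -1;
	} else {
		const char *slash_ptr = strrchr(b->s, '/');

		slash_pos = slash_ptr ? (size_t)(slash_ptr - b->s) : 0;
	}

	/*
	 * The real code consults the recursive hashmap at this point and
	 * treats slash_pos == 0 as "file in the root" before truncating.
	 */
	b->len = slash_pos;
	b->s[b->len] = '\0';
	return 0;
}
```

For example, "/folder1/" is cut back to "/folder1" and "/folder1/0/a" to "/folder1/0", whether or not the intermediate append moved the buffer.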

Derrick Stolee (16):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: rename unpack_nondirectories()
  unpack-trees: unpack sparse directory entries
  unpack-trees: handle dir/file conflict of sparse entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |  16 ++
 dir.c                                    |  24 ++-
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 181 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  49 ++++++
 unpack-trees.c                           | 145 +++++++++++++++---
 wt-status.c                              |  65 +++++++-
 wt-status.h                              |   1 +
 10 files changed, 485 insertions(+), 36 deletions(-)


base-commit: ebf3c04b262aa27fbb97f8a0156c2347fecafafb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v7
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v6:

  1:  2a4a7256304 =  1:  2a4a7256304 sparse-index: skip indexes with unmerged entries
  2:  f5bae86014d =  2:  f5bae86014d sparse-index: include EXTENDED flag when expanding
  3:  d965669c766 =  3:  d965669c766 t1092: replace incorrect 'echo' with 'cat'
  4:  44a940211b2 !  4:  e10fa11cfdb t1092: expand repository data shape
     @@ Commit message
            one entry and are the first entry of a directory with multiple
            entries.
      
     +    * Add filenames adjacent to a sparse directory entry that sort before
     +      and after the trailing slash.
     +
          Later tests will take advantage of these shapes, but they also deepen
          the tests that already exist.
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
      +		mkdir deep/deeper1/0/0 &&
      +		touch deep/deeper1/0/1 &&
      +		touch deep/deeper1/0/0/0 &&
     ++		>folder1- &&
     ++		>folder1.x &&
     ++		>folder10 &&
      +		cp -r deep/deeper1/0 folder1 &&
      +		cp -r deep/deeper1/0 folder2 &&
      +		echo >>folder1/0/0/0 &&
  5:  701ac0e8ff6 =  5:  e94ffa07d46 t1092: add tests for status/add and sparse files
  6:  587333f7c61 =  6:  a8dda933567 unpack-trees: preserve cache_bottom
  7:  6fc898ac23e !  7:  e52166f6e4c unpack-trees: compare sparse directories correctly
     @@ Commit message
          Within compare_entry(), it first calls do_compare_entry() to check the
          leading portion of the name. When the input path is a directory name, we
          could match exactly already. Thus, we should return 0 if we have an
     -    exact string match on a sparse directory entry.
     +    exact string match on a sparse directory entry. The final check is a
     +    length comparison between the strings.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
      +	 * works when the input name is a directory, since ce->name
      +	 * ends in a directory separator.
      +	 */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     ++	if (S_ISSPARSEDIR(ce->ce_mode) &&
     ++	    ce->ce_namelen == traverse_path_len(info, tree_entry_len(n)) + 1)
      +		return 0;
      +
       	/*
  -:  ----------- >  8:  d04b62381b8 unpack-trees: rename unpack_nondirectories()
  8:  b676ef4925b !  9:  237ccf4e43d unpack-trees: unpack sparse directory entries
     @@ Commit message
          might want to perform a merge operation on the entry, such as during
          'git checkout <commit>' which wants to replace a sparse tree entry with
          the tree for that path at the target commit. We extend the logic within
     -    unpack_nondirectories() to create a sparse-directory entry in this case,
     -    and then that is sent to call_unpack_fn(). Since the name becomes
     -    confusing by handling directories, rename it to unpack_single_entry()
     -    since it handles a blob entry or a sparse directory entry without using
     -    traverse_trees_recursive().
     +    unpack_single_entry() to create a sparse-directory entry in this case,
     +    and then that is sent to call_unpack_fn().
      
          There are some subtleties in this process. For instance, we need to
          update find_cache_entry() to allow finding a sparse-directory entry that
          exactly matches a given path. Use the new helper method
     -    sparse_dir_matches_path() for this.
     +    sparse_dir_matches_path() for this. We also need to ignore conflict
     +    markers in the case that the entries correspond to directories and we
     +    already have a sparse directory entry.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
       	return ce;
       }
       
     -@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
     -  * without actually calling it. If you change the logic here you may need to
     -  * check and change there as well.
     -  */
     --static int unpack_nondirectories(int n, unsigned long mask,
     --				 unsigned long dirmask,
     --				 struct cache_entry **src,
     --				 const struct name_entry *names,
     --				 const struct traverse_info *info)
     -+static int unpack_single_entry(int n, unsigned long mask,
     -+			       unsigned long dirmask,
     -+			       struct cache_entry **src,
     -+			       const struct name_entry *names,
     -+			       const struct traverse_info *info,
     -+			       int sparse_directory)
     - {
     - 	int i;
     +@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
       	struct unpack_trees_options *o = info->data;
       	unsigned long conflicts = info->df_conflicts | dirmask;
       
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
       	if (mask == dirmask && !src[0])
       		return 0;
       
     -+	/* defer work if our cache entry doesn't match the expectations. */
     -+	if (sparse_directory) {
     -+		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
     -+			BUG("expected sparse directory entry");
     -+	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
     -+		return 0;
     -+	}
     ++	/*
     ++	 * When we have a sparse directory entry for src[0],
     ++	 * then this isn't necessarily a directory-file conflict.
     ++	 */
     ++	if (mask == dirmask && src[0] &&
     ++	    S_ISSPARSEDIR(src[0]->ce_mode))
     ++		conflicts = 0;
      +
       	/*
       	 * Ok, we've filled in up to any potential index entry in src[0],
       	 * now do the rest.
     -@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
     +@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
       		 * not stored in the index.  otherwise construct the
       		 * cache entry from the index aware logic.
       		 */
      -		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
      +		src[i + o->merge] = create_ce_entry(info, names + i, stage,
      +						    &o->result, o->merge,
     -+						    sparse_directory);
     ++						    bit & dirmask);
       	}
       
       	if (o->merge) {
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
      +	 * Check for a sparse-directory entry named "path/".
      +	 * Due to the input p->path not having a trailing
      +	 * slash, the negative 'pos' value overshoots the
     -+	 * expected position by one, hence "-2" here.
     ++	 * expected position by at least one, hence "-2" here.
      +	 */
      +	pos = -pos - 2;
      +
      +	if (pos < 0 || pos >= o->src_index->cache_nr)
       		return NULL;
      +
     -+	ce = o->src_index->cache[pos];
     ++	/*
     ++	 * We might have multiple entries between 'pos' and
     ++	 * the actual sparse-directory entry, so start walking
     ++	 * back until finding it or passing where it would be.
     ++	 */
     ++	while (pos >= 0) {
     ++		ce = o->src_index->cache[pos];
     ++
     ++		if (strncmp(ce->name, p->path, p->pathlen))
     ++			return NULL;
      +
     -+	if (!S_ISSPARSEDIR(ce->ce_mode))
     -+		return NULL;
     ++		if (S_ISSPARSEDIR(ce->ce_mode) &&
     ++		    sparse_dir_matches_path(ce, info, p))
     ++			return ce;
      +
     -+	if (sparse_dir_matches_path(ce, info, p))
     -+		return ce;
     ++		pos--;
     ++	}
      +
      +	return NULL;
       }
     @@ unpack-trees.c: static void debug_unpack_callback(int n,
       /*
        * Note that traverse_by_cache_tree() duplicates some logic in this function
        * without actually calling it. If you change the logic here you may need to
     -@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     - 		}
     - 	}
     - 
     --	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     -+	if (unpack_single_entry(n, mask, dirmask, src, names, info, 0) < 0)
     - 		return -1;
     - 
     - 	if (o->merge && src[0]) {
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       			}
       		}
       
      -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
      -					     names, info) < 0)
     -+		if (is_sparse_directory_entry(src[0], names, info)) {
     -+			if (unpack_single_entry(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
     -+				return -1;
     -+		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     ++		if (!is_sparse_directory_entry(src[0], names, info) &&
     ++		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
      +						    names, info) < 0) {
       			return -1;
      +		}
  -:  ----------- > 10:  9f31c691af6 unpack-trees: handle dir/file conflict of sparse entries
  9:  d693f00d9a2 ! 11:  2a43287c47e dir.c: accept a directory as part of cone-mode patterns
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## dir.c ##
     +@@ dir.c: enum pattern_match_result path_matches_pattern_list(
     + 	struct path_pattern *pattern;
     + 	struct strbuf parent_pathname = STRBUF_INIT;
     + 	int result = NOT_MATCHED;
     +-	const char *slash_pos;
     ++	size_t slash_pos;
     + 
     + 	if (!pl->use_cone_patterns) {
     + 		pattern = last_matching_pattern_from_list(pathname, pathlen, basename,
      @@ dir.c: enum pattern_match_result path_matches_pattern_list(
       	strbuf_addch(&parent_pathname, '/');
       	strbuf_add(&parent_pathname, pathname, pathlen);
     @@ dir.c: enum pattern_match_result path_matches_pattern_list(
      +	 * use the file-base matching logic in an equivalent way.
      +	 */
      +	if (parent_pathname.len > 0 &&
     -+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
     ++	    parent_pathname.buf[parent_pathname.len - 1] == '/') {
     ++		slash_pos = parent_pathname.len - 1;
      +		strbuf_add(&parent_pathname, "-", 1);
     ++	} else {
     ++		const char *slash_ptr = strrchr(parent_pathname.buf, '/');
     ++		slash_pos = slash_ptr ? slash_ptr - parent_pathname.buf : 0;
     ++	}
      +
       	if (hashmap_contains_path(&pl->recursive_hashmap,
       				  &parent_pathname)) {
       		result = MATCHED_RECURSIVE;
     + 		goto done;
     + 	}
     + 
     +-	slash_pos = strrchr(parent_pathname.buf, '/');
     +-
     +-	if (slash_pos == parent_pathname.buf) {
     ++	if (!slash_pos) {
     + 		/* include every file in root */
     + 		result = MATCHED;
     + 		goto done;
     + 	}
     + 
     +-	strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
     ++	strbuf_setlen(&parent_pathname, slash_pos);
     + 
     + 	if (hashmap_contains_path(&pl->parent_hashmap, &parent_pathname)) {
     + 		result = MATCHED;
 10:  ed11cfc791f ! 12:  f83aa08ff6b diff-lib: handle index diffs with sparse dirs
     @@ Commit message
          identical to before: the lack of a cache entry is the same with a sparse
          index.
      
     -    Use diff_tree_oid() appropriately to appropriately compute the diff.
     +    Use diff_tree_oid() appropriately to compute the diff.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
 11:  48fd25aacbe = 13:  35063ffb8ed status: skip sparse-checkout percentage with sparse-index
 12:  3499105eb67 = 14:  b4033a9bf36 status: use sparse-index throughout
 13:  08225483d69 ! 15:  717a3f49f97 wt-status: expand added sparse directory entries
     @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
      +	struct string_list_item *it;
      +	struct wt_status_change_data *d;
      +	struct wt_status *s = context;
     -+	char *full_name;
     ++	struct strbuf full_name = STRBUF_INIT;
      +
      +	if (S_ISDIR(mode))
      +		return READ_TREE_RECURSIVE;
      +
     -+	full_name = xstrfmt("%s%s", base->buf, path);
     -+	it = string_list_insert(&s->change, full_name);
     ++	strbuf_add(&full_name, base->buf, base->len);
     ++	strbuf_addstr(&full_name, path);
     ++	it = string_list_insert(&s->change, full_name.buf);
      +	d = it->util;
      +	if (!d) {
      +		CALLOC_ARRAY(d, 1);
     @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
      +	d->mode_index = mode;
      +	oidcpy(&d->oid_index, oid);
      +	s->committable = 1;
     ++	strbuf_release(&full_name);
      +	return 0;
      +}
      +
 14:  711e403a63a ! 16:  1d744848ee6 fsmonitor: integrate with sparse index
     @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      +		git status --porcelain=v2 >actual &&
      +	test_region $1 index ensure_full_index trace2.txt &&
     ++	test_region fsm_hook query trace2.txt &&
      +	test_cmp expect actual &&
      +	rm trace2.txt &&
      +	git sparse-checkout disable

-- 
gitgitgadget

* [PATCH v7 01/16] sparse-index: skip indexes with unmerged entries
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 02/16] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                               ` (17 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.
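That rule can be sketched as follows, assuming a hypothetical simplified entry type that holds only a path and a merge stage. The real conversion in sparse-index.c applies additional conditions before collapsing a directory; this sketch shows only the stage check described here, and can_collapse() is an illustrative name, not a git function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified cache entry: a path plus its merge stage
 * (0 = merged, 1..3 = conflict stages). Not git's struct cache_entry.
 */
struct toy_entry {
	const char *name;
	int stage;
};

/*
 * May all entries under 'dir' (which must end in '/') be collapsed
 * into one sparse directory entry? Any entry with a nonzero stage
 * blocks the collapse, so conflicts stay visible as individual
 * entries.
 */
static int can_collapse(const struct toy_entry *entries, size_t nr,
			const char *dir)
{
	size_t i, dirlen = strlen(dir);

	assert(dirlen > 0 && dir[dirlen - 1] == '/');
	for (i = 0; i < nr; i++) {
		if (strncmp(entries[i].name, dir, dirlen))
			continue; /* not under 'dir' */
		if (entries[i].stage)
			return 0;
	}
	return 1;
}
```

A directory whose entries are all at stage 0 may collapse; as soon as one path under it has conflict stages, the directory stays expanded.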

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index affc4048f27..2c695930275 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -116,6 +116,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -152,6 +163,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e9a815ca7aa..ba2fd94adaf 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v7 02/16] sparse-index: include EXTENDED flag when expanding
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 03/16] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
                               ` (16 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
flag is not included. The CE_EXTENDED flag would exist if we loaded a
full index from disk with these entries marked with CE_SKIP_WORKTREE, so
we can add the flag here to be consistent. This allows us to directly
compare the flags present in cache entries when testing the sparse-index
feature, but it has no effect on the correctness of user-facing
functionality.
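
The consistency argument can be shown with plain flag arithmetic. The flag values below are assumed to mirror git's CE_EXTENDED (0x4000) and CE_SKIP_WORKTREE (1 << 30). The two helper functions are hypothetical and only model the flags before and after this change.

```c
/*
 * Assumed to mirror git's definitions: CE_EXTENDED marks an entry
 * that carries extended flags, and CE_SKIP_WORKTREE is one of them.
 */
#define SKETCH_CE_EXTENDED      0x4000u
#define SKETCH_CE_SKIP_WORKTREE (1u << 30)

/* Flags of a skip-worktree entry read from an on-disk full index. */
static unsigned int flags_as_loaded_from_disk(void)
{
	return SKETCH_CE_SKIP_WORKTREE | SKETCH_CE_EXTENDED;
}

/* Flags set by the expansion code without (0) or with (1) this patch. */
static unsigned int flags_when_expanding(int with_extended)
{
	unsigned int flags = SKETCH_CE_SKIP_WORKTREE;

	if (with_extended)
		flags |= SKETCH_CE_EXTENDED;
	return flags;
}
```

Only the patched variant produces flags identical to the disk-loaded entry, which is what lets tests compare ce_flags directly.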

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 2c695930275..ef53bd2198b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -213,7 +213,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget



* [PATCH v7 03/16] t1092: replace incorrect 'echo' with 'cat'
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 02/16] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 04/16] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                               ` (15 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

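This fixes the test data shape to be as expected. 'echo' ignores its
standard input, so the here-document was discarded and
'folder1/larger-content' contained only an empty line. With 'cat', the
file actually has meaningful lines, allowing rename detection to work
properly.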

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index ba2fd94adaf..ebbba044f77 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -40,7 +40,7 @@ test_expect_success 'setup' '
 		done &&
 
 		git checkout -b rename-base base &&
-		echo >folder1/larger-content <<-\EOF &&
+		cat >folder1/larger-content <<-\EOF &&
 		matching
 		lines
 		help
-- 
gitgitgadget



* [PATCH v7 04/16] t1092: expand repository data shape
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (2 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 03/16] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 05/16] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                               ` (14 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

* Add filenames adjacent to a sparse directory entry that sort before
  and after the trailing slash.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.
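
The new top-level files 'folder1-', 'folder1.x' and 'folder10' are chosen for their byte values relative to '/': '-' is 0x2D, '.' is 0x2E, '/' is 0x2F and '0' is 0x30. The sketch below assumes plain byte-wise name order (as strcmp() gives for these ASCII names). It shows where each name lands relative to the sparse directory entry 'folder1/'. The helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: byte-wise order of the pointed-to strings. */
static int compare_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort names in byte-wise order. "folder1-" and "folder1.x" end up
 * before the sparse directory entry "folder1/", and "folder10" after.
 */
static void sort_names(const char **names, size_t nr)
{
	qsort(names, nr, sizeof(*names), compare_names);
}
```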

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 42 ++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index ebbba044f77..363d605c209 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,23 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
+		>folder1- &&
+		>folder1.x &&
+		>folder10 &&
+		cp -r deep/deeper1/0 folder1 &&
+		cp -r deep/deeper1/0 folder2 &&
+		echo >>folder1/0/0/0 &&
+		echo >>folder2/0/1 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
@@ -56,11 +69,17 @@ test_expect_success 'setup' '
 		mv folder1/a folder2/b &&
 		mv folder1/larger-content folder2/edited-content &&
 		echo >>folder2/edited-content &&
+		echo >>folder2/0/1 &&
+		echo stuff >>deep/deeper1/a &&
 		git add . &&
 		git commit -m "rename folder1/... to folder2/..." &&
 
 		git checkout -b rename-out-to-in rename-base &&
 		mv folder1/a deep/deeper1/b &&
+		echo more stuff >>deep/deeper1/a &&
+		rm folder2/0/1 &&
+		mkdir folder2/0/1 &&
+		echo >>folder2/0/1/1 &&
 		mv folder1/larger-content deep/deeper1/edited-content &&
 		echo >>deep/deeper1/edited-content &&
 		git add . &&
@@ -68,6 +87,9 @@ test_expect_success 'setup' '
 
 		git checkout -b rename-in-to-out rename-base &&
 		mv deep/deeper1/a folder1/b &&
+		echo >>folder2/0/1 &&
+		rm -rf folder1/0/0 &&
+		echo >>folder1/0/0 &&
 		mv deep/deeper1/larger-content folder1/edited-content &&
 		echo >>folder1/edited-content &&
 		git add . &&
@@ -262,13 +284,29 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
-test_expect_success 'diff with renames' '
+test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- .&&
+		test_all_match git status --porcelain=v2 &&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'diff with directory/file conflicts' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		git -C full-checkout reset --hard &&
+		test_sparse_match git reset --hard &&
+		test_all_match git checkout $branch &&
+		test_all_match git checkout rename-base -- . &&
+		test_all_match git status --porcelain=v2 &&
 		test_all_match git diff --staged --no-renames &&
 		test_all_match git diff --staged --find-renames || return 1
 	done
-- 
gitgitgadget



* [PATCH v7 05/16] t1092: add tests for status/add and sparse files
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (3 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 04/16] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 06/16] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                               ` (13 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 363d605c209..3f61e5686b5 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -254,6 +254,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v7 06/16] unpack-trees: preserve cache_bottom
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (4 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 05/16] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 07/16] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                               ` (12 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
saved in a local variable on the call stack and then restored as that
method returns.

The mark_ce_used() method normally advances the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
cache_bottom is already advanced by the cache-tree walk for these
entries, so mark_ce_used() should not advance it as well.
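
The sketch below is a simplified, hypothetical model of mark_ce_used() with the early return added by this patch. The struct layouts and flag values are invented for illustration. SKETCH_SPARSE_DIR stands in for the S_ISSPARSEDIR(ce->ce_mode) check. The loop that skips already-unpacked entries sits past the visible diff context and is an assumption about the surrounding code.

```c
/* Hypothetical flags; SKETCH_SPARSE_DIR replaces S_ISSPARSEDIR(). */
#define SKETCH_CE_UNPACKED 0x1u
#define SKETCH_SPARSE_DIR  0x2u

struct model_ce {
	unsigned int flags;
};

struct model_opts {
	struct model_ce **cache;
	int cache_nr;
	int cache_bottom;
};

static void sketch_mark_ce_used(struct model_ce *ce, struct model_opts *o)
{
	ce->flags |= SKETCH_CE_UNPACKED;

	/* Sparse directory: the cache-tree walk advances cache_bottom. */
	if (ce->flags & SKETCH_SPARSE_DIR)
		return;

	/* Otherwise, move cache_bottom past the run of unpacked entries. */
	if (o->cache_bottom < o->cache_nr && o->cache[o->cache_bottom] == ce) {
		int bottom = o->cache_bottom;

		while (bottom < o->cache_nr &&
		       (o->cache[bottom]->flags & SKETCH_CE_UNPACKED))
			bottom++;
		o->cache_bottom = bottom;
	}
}
```

In this model, marking a sparse directory entry sets CE_UNPACKED but leaves cache_bottom unchanged. Marking the regular entry at cache_bottom moves it past every unpacked entry.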

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index f88a69f8e71..87c1ed204c8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -600,6 +600,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v7 07/16] unpack-trees: compare sparse directories correctly
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (5 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 06/16] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 08/16] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
                               ` (11 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we no longer pass S_IFREG unconditionally: sparse
directory entries use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, a sparse
directory entry may already be an exact match, since its name is the
input path plus a trailing directory separator. Thus, we return 0 when
a sparse directory entry's name is exactly one character longer than
the traverse path. Otherwise, the existing final check compares the
lengths of the two names.
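
The effect of the mode argument can be sketched with a simplified tree-order comparison in the spirit of git's base_name_compare(). After a common prefix, a name that ends there compares as if it continued with '/' when it is a directory, or with NUL when it is a file. The function name is hypothetical. The patch itself calls df_name_compare(), which additionally special-cases directory/file conflicts; this sketch omits that.

```c
#include <string.h>

/*
 * Hypothetical tree-order comparison with explicit lengths. A name
 * that is exhausted at the end of the common prefix continues with
 * '/' if it is a directory, or with NUL if it is a regular file.
 */
static int sketch_name_compare(const char *name1, size_t len1, int is_dir1,
			       const char *name2, size_t len2, int is_dir2)
{
	size_t len = len1 < len2 ? len1 : len2;
	unsigned char c1, c2;
	int cmp = memcmp(name1, name2, len);

	if (cmp)
		return cmp;

	c1 = len < len1 ? (unsigned char)name1[len] : (is_dir1 ? '/' : '\0');
	c2 = len < len2 ? (unsigned char)name2[len] : (is_dir2 ? '/' : '\0');
	return (c1 > c2) - (c1 < c2);
}
```

Treated as a file, "folder1" sorts before "folder1-" and "folder1.x". Treated as a directory, it sorts after both and still before "folder10". This matches where the sparse directory entry "folder1/" sits in the index.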

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 87c1ed204c8..b113cc750f2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -983,6 +983,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -1005,7 +1006,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1014,6 +1016,16 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode) &&
+	    ce->ce_namelen == traverse_path_len(info, tree_entry_len(n)) + 1)
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v7 08/16] unpack-trees: rename unpack_nondirectories()
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (6 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 07/16] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
                               ` (10 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In the next change, we will use this method to unpack a sparse directory
entry, so rename it to unpack_single_entry() to reflect that it applies
to those entries as well. The new name also reflects that we will not
recurse into trees in order to resolve the conflicts.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index b113cc750f2..d26386ce8b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -804,7 +804,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 		BUG("We need cache-tree to do this optimization");
 
 	/*
-	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * Do what unpack_callback() and unpack_single_entry() normally
 	 * do. But we walk all paths in an iterative loop instead.
 	 *
 	 * D/F conflicts and higher stage entries are not a concern
@@ -1075,11 +1075,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
  * without actually calling it. If you change the logic here you may need to
  * check and change there as well.
  */
-static int unpack_nondirectories(int n, unsigned long mask,
-				 unsigned long dirmask,
-				 struct cache_entry **src,
-				 const struct name_entry *names,
-				 const struct traverse_info *info)
+static int unpack_single_entry(int n, unsigned long mask,
+			       unsigned long dirmask,
+			       struct cache_entry **src,
+			       const struct name_entry *names,
+			       const struct traverse_info *info)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
@@ -1322,7 +1322,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
-- 
gitgitgadget



* [PATCH v7 09/16] unpack-trees: unpack sparse directory entries
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (7 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 08/16] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-07-07 22:25               ` Elijah Newren
  2021-06-29  2:04             ` [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries Derrick Stolee via GitGitGadget
                               ` (9 subsequent siblings)
  18 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.

In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_single_entry() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().

There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path. Use the new helper method
sparse_dir_matches_path() for this. We also need to ignore conflict
markers in the case that the entries correspond to directories and we
already have a sparse directory entry.
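
The name check performed by sparse_dir_matches_path() can be sketched on plain strings. The hypothetical helper below asks whether a sparse directory name equals parent + "/" + path + "/", or path + "/" when the parent is empty. Here "parent" stands for the prefix that the patch reads from info->name. Explicit lengths replace the struct fields, and the trailing-'/' assertion becomes a return of 0.

```c
#include <string.h>

/*
 * Hypothetical, string-only version of the sparse directory name
 * check. ce_name must end in '/' because it names a sparse directory.
 */
static int sketch_sparse_dir_matches(const char *ce_name, size_t ce_len,
				     const char *parent, size_t parent_len,
				     const char *path, size_t path_len)
{
	if (!ce_len || ce_name[ce_len - 1] != '/')
		return 0;

	if (parent_len)
		return ce_len == parent_len + path_len + 2 &&
		       !memcmp(ce_name, parent, parent_len) &&
		       ce_name[parent_len] == '/' &&
		       !memcmp(ce_name + parent_len + 1, path, path_len);

	return ce_len == path_len + 1 &&
	       !memcmp(ce_name, path, path_len);
}
```

The length check rules out near misses such as "folder10/" when looking for "folder1". It also guarantees the byte comparisons stay in bounds.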

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 105 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 97 insertions(+), 8 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index d26386ce8b2..d141dffbd94 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1052,13 +1052,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	const struct name_entry *n,
 	int stage,
 	struct index_state *istate,
-	int is_transient)
+	int is_transient,
+	int is_sparse_directory)
 {
 	size_t len = traverse_path_len(info, tree_entry_len(n));
+	size_t alloc_len = is_sparse_directory ? len + 1 : len;
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len, NULL) :
-		make_empty_cache_entry(istate, len);
+		make_empty_transient_cache_entry(alloc_len, NULL) :
+		make_empty_cache_entry(istate, alloc_len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
@@ -1067,6 +1069,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	/* len+1 because the cache_entry allocates space for NUL */
 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
 
+	if (is_sparse_directory) {
+		ce->name[len] = '/';
+		ce->name[len + 1] = '\0';
+		ce->ce_namelen++;
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	}
+
 	return ce;
 }
 
@@ -1085,10 +1094,17 @@ static int unpack_single_entry(int n, unsigned long mask,
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
 
-	/* Do we have *only* directories? Nothing to do */
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	/*
+	 * When we have a sparse directory entry for src[0],
+	 * then this isn't necessarily a directory-file conflict.
+	 */
+	if (mask == dirmask && src[0] &&
+	    S_ISSPARSEDIR(src[0]->ce_mode))
+		conflicts = 0;
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
@@ -1118,7 +1134,9 @@ static int unpack_single_entry(int n, unsigned long mask,
 		 * not stored in the index.  otherwise construct the
 		 * cache entry from the index aware logic.
 		 */
-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
+		src[i + o->merge] = create_ce_entry(info, names + i, stage,
+						    &o->result, o->merge,
+						    bit & dirmask);
 	}
 
 	if (o->merge) {
@@ -1222,16 +1240,69 @@ static int find_cache_pos(struct traverse_info *info,
 	return -1;
 }
 
+/*
+ * Given a sparse directory entry 'ce', compare ce->name to
+ * info->name + '/' + p->path + '/' if info->name is non-empty.
+ * Compare ce->name to p->path + '/' otherwise. Note that
+ * ce->name must end in a trailing '/' because it is a sparse
+ * directory entry.
+ */
+static int sparse_dir_matches_path(const struct cache_entry *ce,
+				   struct traverse_info *info,
+				   const struct name_entry *p)
+{
+	assert(S_ISSPARSEDIR(ce->ce_mode));
+	assert(ce->name[ce->ce_namelen - 1] == '/');
+
+	if (info->namelen)
+		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
+		       ce->name[info->namelen] == '/' &&
+		       !strncmp(ce->name, info->name, info->namelen) &&
+		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	return ce->ce_namelen == p->pathlen + 1 &&
+	       !strncmp(ce->name, p->path, p->pathlen);
+}
+
 static struct cache_entry *find_cache_entry(struct traverse_info *info,
 					    const struct name_entry *p)
 {
+	struct cache_entry *ce;
 	int pos = find_cache_pos(info, p->path, p->pathlen);
 	struct unpack_trees_options *o = info->data;
 
 	if (0 <= pos)
 		return o->src_index->cache[pos];
-	else
+
+	/*
+	 * Check for a sparse-directory entry named "path/".
+	 * Due to the input p->path not having a trailing
+	 * slash, the negative 'pos' value overshoots the
+	 * expected position by at least one, hence "-2" here.
+	 */
+	pos = -pos - 2;
+
+	if (pos < 0 || pos >= o->src_index->cache_nr)
 		return NULL;
+
+	/*
+	 * We might have multiple entries between 'pos' and
+	 * the actual sparse-directory entry, so start walking
+	 * back until finding it or passing where it would be.
+	 */
+	while (pos >= 0) {
+		ce = o->src_index->cache[pos];
+
+		if (strncmp(ce->name, p->path, p->pathlen))
+			return NULL;
+
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    sparse_dir_matches_path(ce, info, p))
+			return ce;
+
+		pos--;
+	}
+
+	return NULL;
 }
 
 static void debug_path(struct traverse_info *info)
@@ -1266,6 +1337,21 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Returns true if and only if the given cache_entry is a
+ * sparse-directory entry that matches the given name_entry
+ * from the tree walk at the given traverse_info.
+ */
+static int is_sparse_directory_entry(struct cache_entry *ce,
+				     struct name_entry *name,
+				     struct traverse_info *info)
+{
+	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
+	return sparse_dir_matches_path(ce, info, name);
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1352,9 +1438,12 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
-					     names, info) < 0)
+		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+						    names, info) < 0) {
 			return -1;
+		}
+
 		return mask;
 	}
 
-- 
gitgitgadget



* [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (8 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-07-07 23:19               ` Elijah Newren
  2021-06-29  2:04             ` [PATCH v7 11/16] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                               ` (8 subsequent siblings)
  18 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++--
 unpack-trees.c                           |  5 ++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 3f61e5686b5..4e6446e7545 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -95,6 +95,19 @@ test_expect_success 'setup' '
 		git add . &&
 		git commit -m "rename deep/deeper1/... to folder1/..." &&
 
+		git checkout -b df-conflict base &&
+		rm -rf folder1 &&
+		echo content >folder1 &&
+		git add . &&
+		git commit -m df &&
+
+		git checkout -b fd-conflict base &&
+		rm a &&
+		mkdir a &&
+		echo content >a/a &&
+		git add . &&
+		git commit -m fd &&
+
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
 		git commit -a -m "update deepest" &&
@@ -325,7 +338,11 @@ test_expect_success 'diff --staged' '
 test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
-	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	for branch in rename-out-to-out \
+		      rename-out-to-in \
+		      rename-in-to-out \
+		      df-conflict \
+		      fd-conflict
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- .&&
@@ -338,7 +355,11 @@ test_expect_success 'diff with renames and conflicts' '
 test_expect_success 'diff with directory/file conflicts' '
 	init_repos &&
 
-	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	for branch in rename-out-to-out \
+		      rename-out-to-in \
+		      rename-in-to-out \
+		      df-conflict \
+		      fd-conflict
 	do
 		git -C full-checkout reset --hard &&
 		test_sparse_match git reset --hard &&
diff --git a/unpack-trees.c b/unpack-trees.c
index d141dffbd94..e63b2dcacbc 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2617,7 +2617,10 @@ int twoway_merge(const struct cache_entry * const *src,
 			 same(current, oldtree) && !same(current, newtree)) {
 			/* 20 or 21 */
 			return merged_entry(newtree, current, o);
-		} else
+		} else if (current && !oldtree && newtree &&
+			   S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
+			return merged_entry(newtree, current, o);
+		else
 			return reject_merge(current, o);
 	}
 	else if (newtree) {
-- 
gitgitgadget



* [PATCH v7 11/16] dir.c: accept a directory as part of cone-mode patterns
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (9 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
                               ` (7 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
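To make the matching order concrete, here is a standalone sketch of the steps in the hunk below, including the fake "-" file name appended to a directory path. It is not dir.c: the hashmaps are replaced by small hypothetical string arrays (recursive_dirs, parent_dirs), and the names cone_match(), in_set() and under_recursive() are invented for illustration. Paths carry a leading '/' as in dir.c.

```c
#include <string.h>

/* Hypothetical cone-mode sets (stand-ins for the recursive/parent hashmaps). */
static const char *recursive_dirs[] = { "/deep/deeper1" };
static const char *parent_dirs[] = { "/deep" };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

enum cone_result { CONE_NOT_MATCHED, CONE_MATCHED, CONE_MATCHED_RECURSIVE };

static int in_set(const char **set, size_t n, const char *s)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(set[i], s))
			return 1;
	return 0;
}

/* Is dir, or any directory above it, in the recursive set? Modifies dir. */
static int under_recursive(char *dir)
{
	char *slash;

	for (;;) {
		if (in_set(recursive_dirs, ARRAY_LEN(recursive_dirs), dir))
			return 1;
		slash = strrchr(dir, '/');
		if (!slash || slash == dir)
			return 0;
		*slash = '\0';
	}
}

/* path is a file ("deep/a") or a sparse directory ending in '/' ("folder1/"). */
static enum cone_result cone_match(const char *path)
{
	char buf[256];
	size_t len, slash_pos;

	if (!*path || strlen(path) + 3 > sizeof(buf))
		return CONE_NOT_MATCHED;
	buf[0] = '/';
	strcpy(buf + 1, path);
	len = strlen(buf);

	if (buf[len - 1] == '/') {
		/* Directory: add a fake file name so it is matched like a file inside it. */
		slash_pos = len - 1;
		strcat(buf, "-");
	} else {
		slash_pos = (size_t)(strrchr(buf, '/') - buf);
	}

	if (in_set(recursive_dirs, ARRAY_LEN(recursive_dirs), buf))
		return CONE_MATCHED_RECURSIVE;
	if (slash_pos == 0)
		return CONE_MATCHED; /* every file at the root is included */

	buf[slash_pos] = '\0'; /* keep only the containing directory */
	if (in_set(parent_dirs, ARRAY_LEN(parent_dirs), buf))
		return CONE_MATCHED;
	if (under_recursive(buf))
		return CONE_MATCHED_RECURSIVE;
	return CONE_NOT_MATCHED;
}
```

With these sets, "deep/deeper1/" matches recursively, "deep/" matches because its immediate files are in the cone, and "folder1/" does not match, so it can stay collapsed as a sparse directory entry.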

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index ebe5ec046e0..0c5264b3b20 100644
--- a/dir.c
+++ b/dir.c
@@ -1376,7 +1376,7 @@ enum pattern_match_result path_matches_pattern_list(
 	struct path_pattern *pattern;
 	struct strbuf parent_pathname = STRBUF_INIT;
 	int result = NOT_MATCHED;
-	const char *slash_pos;
+	size_t slash_pos;
 
 	if (!pl->use_cone_patterns) {
 		pattern = last_matching_pattern_from_list(pathname, pathlen, basename,
@@ -1397,21 +1397,35 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-based matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/') {
+		slash_pos = parent_pathname.len - 1;
+		strbuf_add(&parent_pathname, "-", 1);
+	} else {
+		const char *slash_ptr = strrchr(parent_pathname.buf, '/');
+		slash_pos = slash_ptr ? slash_ptr - parent_pathname.buf : 0;
+	}
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
 		goto done;
 	}
 
-	slash_pos = strrchr(parent_pathname.buf, '/');
-
-	if (slash_pos == parent_pathname.buf) {
+	if (!slash_pos) {
 		/* include every file in root */
 		result = MATCHED;
 		goto done;
 	}
 
-	strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
+	strbuf_setlen(&parent_pathname, slash_pos);
 
 	if (hashmap_contains_path(&pl->parent_hashmap, &parent_pathname)) {
 		result = MATCHED;
-- 
gitgitgadget



* [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (10 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 11/16] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-07-08 23:10               ` Elijah Newren
  2021-06-29  2:04             ` [PATCH v7 13/16] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                               ` (6 subsequent siblings)
  18 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

Use diff_tree_oid() appropriately to compute the diff.
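The cases described above (a new sparse directory expanding to a list of added files, and an existing one yielding modifications, additions and deletions) can be modeled with a merge walk over two sorted file lists. This is only a simplified sketch of what a tree diff reports, not diff_tree_oid() itself; struct flat_entry, struct diff_counts and diff_flat() are hypothetical, and an int stands in for each blob's object ID.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened tree: file paths sorted by strcmp(). */
struct flat_entry {
	const char *path;
	int blob; /* stand-in for a blob object ID */
};

struct diff_counts {
	int added, modified, deleted;
};

/*
 * Merge-walk two sorted lists, printing "A", "M" or "D" per changed path.
 * Passing old_nr == 0 models a sparse directory that is new in the index:
 * every file inside it is reported as added.
 */
static struct diff_counts diff_flat(const struct flat_entry *old_list, size_t old_nr,
				    const struct flat_entry *new_list, size_t new_nr)
{
	struct diff_counts c = { 0, 0, 0 };
	size_t i = 0, j = 0;

	while (i < old_nr || j < new_nr) {
		int cmp;

		if (i == old_nr)
			cmp = 1;  /* only new entries remain */
		else if (j == new_nr)
			cmp = -1; /* only old entries remain */
		else
			cmp = strcmp(old_list[i].path, new_list[j].path);

		if (cmp < 0) {
			printf("D\t%s\n", old_list[i++].path);
			c.deleted++;
		} else if (cmp > 0) {
			printf("A\t%s\n", new_list[j++].path);
			c.added++;
		} else {
			/* Same path on both sides: report it only if the content differs. */
			if (old_list[i].blob != new_list[j].blob) {
				printf("M\t%s\n", new_list[j].path);
				c.modified++;
			}
			i++;
			j++;
		}
	}
	return c;
}
```

Calling diff_flat(NULL, 0, entries, n) corresponds to the show_new_file() case in the hunk below, while passing both lists corresponds to the show_modified() case where both entries are sparse directories.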

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..3f32f038371 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -347,6 +352,17 @@ static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
-- 
gitgitgadget



* [PATCH v7 13/16] status: skip sparse-checkout percentage with sparse-index
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (11 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 14/16] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                               ` (5 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
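As a rough sketch of the resulting logic: the percentage is derived from the share of index entries marked skip-worktree, and a sparse index short-circuits to a sentinel because a sparse directory stands for an unknown number of files. The two sentinel values come from the wt-status.h hunk below; struct mini_index, sparse_checkout_percentage() and format_sparse_message() are hypothetical simplifications, and the exact integer formula is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/* Hypothetical, simplified view of the index: one flag per entry. */
struct mini_index {
	const int *skip_worktree; /* 1 if the entry is marked skip-worktree */
	size_t nr;
	int sparse_index;         /* 1 if entries may be sparse directories */
};

static int sparse_checkout_percentage(const struct mini_index *idx,
				      int sparse_checkout_enabled)
{
	size_t skipped = 0;

	if (!sparse_checkout_enabled || !idx->nr)
		return SPARSE_CHECKOUT_DISABLED; /* also avoids division by zero */
	if (idx->sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX; /* counting would require parsing trees */
	for (size_t i = 0; i < idx->nr; i++)
		if (idx->skip_worktree[i])
			skipped++;
	return (int)(100 - (100 * skipped) / idx->nr);
}

/* Pick the status line; callers skip it entirely when disabled. */
static void format_sparse_message(char *buf, size_t len, int pct)
{
	if (pct == SPARSE_CHECKOUT_SPARSE_INDEX)
		snprintf(buf, len, "You are in a sparse checkout.");
	else
		snprintf(buf, len,
			 "You are in a sparse checkout with %d%% of tracked files present.",
			 pct);
}
```

For four entries with one skip-worktree entry this yields 75; with a sparse index it yields the -2 sentinel and the shorter message.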

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not require this message
to be identical across both modes; instead, we compare only the
important information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4e6446e7545..d372932cd12 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -231,6 +231,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 42b67357169..96db3e74962 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1493,9 +1493,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1653,6 +1656,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e..ab9cc9d8f03 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v7 14/16] status: use sparse-index throughout
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (12 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 13/16] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:04             ` [PATCH v7 15/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                               ` (4 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.
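The percentages in the table and in the paragraph above are relative changes, (after - before) / before. A tiny helper (hypothetical name relative_change()) reproduces them from the reported timings.

```c
/* Relative change, in percent, from a "before" timing to an "after" timing. */
static double relative_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For example, relative_change(0.34, 0.05) is about -85.3, relative_change(2.35, 0.04) is about -98.3, and relative_change(0.31, 0.34) is about +9.7, matching the figures quoted above.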

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 190d215d43b..12f51db158a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1510,6 +1510,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 1b3c2eb408b..277c2970a03 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1584,8 +1584,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1600,6 +1599,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index d372932cd12..fed0440bafe 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -532,12 +532,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v7 15/16] wt-status: expand added sparse directory entries
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (13 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 14/16] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-07-09  1:03               ` Elijah Newren
  2021-06-29  2:04             ` [PATCH v7 16/16] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
                               ` (3 subsequent siblings)
  18 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.
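The expansion performed with read_tree_at() can be pictured as a recursive walk that records every file below the sparse directory, prefixed by the directory's name, and never records directories themselves. The sketch below is a standalone model under that assumption: struct node stands in for a Git tree object, and struct added_list and collect_added() are hypothetical rather than Git APIs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory tree node standing in for a Git tree object. */
struct node {
	const char *name;
	int is_dir;
	const struct node *children;
	size_t nr;
};

#define MAX_ADDED 16
#define MAX_PATH_LEN 128

struct added_list {
	char paths[MAX_ADDED][MAX_PATH_LEN];
	size_t nr;
};

/*
 * Record every file below tree as "added", prefixing each with base.
 * Directories are not recorded; we descend into them instead, much as a
 * read_tree_at() callback that returns READ_TREE_RECURSIVE would.
 */
static void collect_added(const struct node *tree, const char *base,
			  struct added_list *out)
{
	for (size_t i = 0; i < tree->nr; i++) {
		const struct node *child = &tree->children[i];
		char full[MAX_PATH_LEN];

		snprintf(full, sizeof(full), "%s%s", base, child->name);
		if (child->is_dir) {
			/* Extend the base with a trailing slash and descend. */
			strncat(full, "/", sizeof(full) - strlen(full) - 1);
			collect_added(child, full, out);
		} else if (out->nr < MAX_ADDED) {
			snprintf(out->paths[out->nr++], MAX_PATH_LEN, "%s", full);
		}
	}
}
```

Starting from a sparse directory entry named "folder1/" (the base, which already ends in a slash), a tree holding "0/0" and "1" yields the added paths "folder1/0/0" and "folder1/1", which is the per-file list the full-index repository reports.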

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 33 +++++++++++++++
 wt-status.c                              | 51 ++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fed0440bafe..df217a2d10b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -545,4 +545,37 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+
+	# Sparse checkouts do not agree with full checkouts about
+	# how to report a directory/file conflict during a reset.
+	# This command would fail with test_all_match because the
+	# full checkout reports "T folder1/0/1" while a sparse
+	# checkout reports "D folder1/0/1". This matches because
+	# the sparse checkouts skip "adding" the other side of
+	# the conflict.
+	test_sparse_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 96db3e74962..0317baef87e 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -657,6 +657,36 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	clear_pathspec(&rev.prune_data);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	struct strbuf full_name = STRBUF_INIT;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_add(&full_name, base->buf, base->len);
+	strbuf_addstr(&full_name, path);
+	it = string_list_insert(&s->change, full_name.buf);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	strbuf_release(&full_name);
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -671,6 +701,27 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps = { 0 };
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget



* [PATCH v7 16/16] fsmonitor: integrate with sparse index
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (14 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 15/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-29  2:04             ` Derrick Stolee via GitGitGadget
  2021-06-29  2:16             ` [PATCH v7 00/16] Sparse-index: integrate with status Derrick Stolee
                               ` (2 subsequent siblings)
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-29  2:04 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 49 +++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index ef53bd2198b..53c8f711ccc 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -186,6 +186,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -282,6 +286,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 637391c6ce4..deea88d4431 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,52 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_region fsm_hook query trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v7 00/16] Sparse-index: integrate with status
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (15 preceding siblings ...)
  2021-06-29  2:04             ` [PATCH v7 16/16] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-06-29  2:16             ` Derrick Stolee
  2021-06-30 14:32             ` Elijah Newren
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
  18 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-06-29  2:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, git,
	johannes.schindelin, Derrick Stolee

On 6/28/2021 10:04 PM, Derrick Stolee via GitGitGadget wrote:
...
> Update in V7 (relative to v5)
> =============================
> 
> APOLOGIES: As I was working on this cover letter, I was still organizing my
> big list of patches, including reordering some into this series. I forgot to
> actually include them in my v6 submission, so here is a re-submission.
> Please ignore v6.

Since v6 was a mistake, here is the full range-diff of v5 versus v7:

 1:  5a2ed3d1d70 =  1:  2a4a7256304 sparse-index: skip indexes with unmerged entries
 2:  8aa41e74947 !  2:  f5bae86014d sparse-index: include EXTENDED flag when expanding
    @@ Commit message
     
         When creating a full index from a sparse one, we create cache entries
         for every blob within a given sparse directory entry. These are
    -    correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
    -    marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
    -    correctly written to disk in the case that the index is not converted
    -    back down to a sparse-index.
    +    correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
    +    flag is not included. The CE_EXTENDED flag would exist if we loaded a
    +    full index from disk with these entries marked with CE_SKIP_WORKTREE, so
    +    we can add the flag here to be consistent. This allows us to directly
    +    compare the flags present in cache entries when testing the sparse-index
    +    feature, but has no significance to its correctness in the user-facing
    +    functionality.
     
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     
 3:  b99371c7dd6 !  3:  d965669c766 t1092: replace incorrect 'echo' with 'cat'
    @@ Commit message
         t1092: replace incorrect 'echo' with 'cat'
     
         This fixes the test data shape to be as expected, allowing rename
    -    detection to work properly now that the 'larger-conent' file actually
    +    detection to work properly now that the 'larger-content' file actually
         has meaningful lines.
     
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
 4:  f4dddac1859 !  4:  e10fa11cfdb t1092: expand repository data shape
    @@ Commit message
           one entry and are the first entry of a directory with multiple
           entries.
     
    +    * Add filenames adjacent to a sparse directory entry that sort before
    +      and after the trailing slash.
    +
         Later tests will take advantage of these shapes, but they also deepen
         the tests that already exist.
     
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
     +		mkdir deep/deeper1/0/0 &&
     +		touch deep/deeper1/0/1 &&
     +		touch deep/deeper1/0/0/0 &&
    ++		>folder1- &&
    ++		>folder1.x &&
    ++		>folder10 &&
     +		cp -r deep/deeper1/0 folder1 &&
     +		cp -r deep/deeper1/0 folder2 &&
     +		echo >>folder1/0/0/0 &&
 5:  856346b72f7 =  5:  e94ffa07d46 t1092: add tests for status/add and sparse files
 6:  f3f6223e955 =  6:  a8dda933567 unpack-trees: preserve cache_bottom
 7:  45ae96adf28 !  7:  e52166f6e4c unpack-trees: compare sparse directories correctly
    @@ Commit message
         Within compare_entry(), it first calls do_compare_entry() to check the
         leading portion of the name. When the input path is a directory name, we
         could match exactly already. Thus, we should return 0 if we have an
    -    exact string match on a sparse directory entry.
    +    exact string match on a sparse directory entry. The final check is a
    +    length comparison between the strings.
     
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     
    @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
     +	 * works when the input name is a directory, since ce->name
     +	 * ends in a directory separator.
     +	 */
    -+	if (S_ISSPARSEDIR(ce->ce_mode))
    ++	if (S_ISSPARSEDIR(ce->ce_mode) &&
    ++	    ce->ce_namelen == traverse_path_len(info, tree_entry_len(n)) + 1)
     +		return 0;
     +
      	/*
 -:  ----------- >  8:  d04b62381b8 unpack-trees: rename unpack_nondirectories()
 8:  724194eef9f !  9:  237ccf4e43d unpack-trees: unpack sparse directory entries
    @@ Commit message
         might want to perform a merge operation on the entry, such as during
         'git checkout <commit>' which wants to replace a sparse tree entry with
         the tree for that path at the target commit. We extend the logic within
    -    unpack_nondirectories() to create a sparse-directory entry in this case,
    +    unpack_single_entry() to create a sparse-directory entry in this case,
         and then that is sent to call_unpack_fn().
     
         There are some subtleties in this process. For instance, we need to
         update find_cache_entry() to allow finding a sparse-directory entry that
    -    exactly matches a given path.
    +    exactly matches a given path. Use the new helper method
    +    sparse_dir_matches_path() for this. We also need to ignore conflict
    +    markers in the case that the entries correspond to directories and we
    +    already have a sparse directory entry.
     
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     
    @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
     +	size_t alloc_len = is_sparse_directory ? len + 1 : len;
      	struct cache_entry *ce =
      		is_transient ?
    --		make_empty_transient_cache_entry(len) :
    +-		make_empty_transient_cache_entry(len, NULL) :
     -		make_empty_cache_entry(istate, len);
    -+		make_empty_transient_cache_entry(alloc_len) :
    ++		make_empty_transient_cache_entry(alloc_len, NULL) :
     +		make_empty_cache_entry(istate, alloc_len);
      
      	ce->ce_mode = create_ce_mode(n->mode);
    @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
      
     +	if (is_sparse_directory) {
     +		ce->name[len] = '/';
    -+		ce->name[len + 1] = 0;
    ++		ce->name[len + 1] = '\0';
     +		ce->ce_namelen++;
     +		ce->ce_flags |= CE_SKIP_WORKTREE;
     +	}
    @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
      	return ce;
      }
      
    -@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
    - 				 unsigned long dirmask,
    - 				 struct cache_entry **src,
    - 				 const struct name_entry *names,
    --				 const struct traverse_info *info)
    -+				 const struct traverse_info *info,
    -+				 int sparse_directory)
    - {
    - 	int i;
    +@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
      	struct unpack_trees_options *o = info->data;
      	unsigned long conflicts = info->df_conflicts | dirmask;
      
    @@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
      	if (mask == dirmask && !src[0])
      		return 0;
      
    -+	/* no-op if our cache entry doesn't match the expectations. */
    -+	if (sparse_directory) {
    -+		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
    -+			BUG("expected sparse directory entry");
    -+	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
    -+		return 0;
    -+	}
    ++	/*
    ++	 * When we have a sparse directory entry for src[0],
    ++	 * then this isn't necessarily a directory-file conflict.
    ++	 */
    ++	if (mask == dirmask && src[0] &&
    ++	    S_ISSPARSEDIR(src[0]->ce_mode))
    ++		conflicts = 0;
     +
      	/*
      	 * Ok, we've filled in up to any potential index entry in src[0],
      	 * now do the rest.
    -@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
    +@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
      		 * not stored in the index.  otherwise construct the
      		 * cache entry from the index aware logic.
      		 */
     -		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
     +		src[i + o->merge] = create_ce_entry(info, names + i, stage,
     +						    &o->result, o->merge,
    -+						    sparse_directory);
    ++						    bit & dirmask);
      	}
      
      	if (o->merge) {
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
    + 	return -1;
    + }
    + 
    ++/*
    ++ * Given a sparse directory entry 'ce', compare ce->name to
    ++ * info->name + '/' + p->path + '/' if info->name is non-empty.
    ++ * Compare ce->name to p->path + '/' otherwise. Note that
    ++ * ce->name must end in a trailing '/' because it is a sparse
    ++ * directory entry.
    ++ */
    ++static int sparse_dir_matches_path(const struct cache_entry *ce,
    ++				   struct traverse_info *info,
    ++				   const struct name_entry *p)
    ++{
    ++	assert(S_ISSPARSEDIR(ce->ce_mode));
    ++	assert(ce->name[ce->ce_namelen - 1] == '/');
    ++
    ++	if (info->namelen)
    ++		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
    ++		       ce->name[info->namelen] == '/' &&
    ++		       !strncmp(ce->name, info->name, info->namelen) &&
    ++		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
    ++	return ce->ce_namelen == p->pathlen + 1 &&
    ++	       !strncmp(ce->name, p->path, p->pathlen);
    ++}
    ++
      static struct cache_entry *find_cache_entry(struct traverse_info *info,
      					    const struct name_entry *p)
      {
    @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
     +	 * Check for a sparse-directory entry named "path/".
     +	 * Due to the input p->path not having a trailing
     +	 * slash, the negative 'pos' value overshoots the
    -+	 * expected position by one, hence "-2" here.
    ++	 * expected position by at least one, hence "-2" here.
     +	 */
     +	pos = -pos - 2;
     +
     +	if (pos < 0 || pos >= o->src_index->cache_nr)
    -+		return NULL;
    -+
    -+	ce = o->src_index->cache[pos];
    -+
    -+	if (!S_ISSPARSEDIR(ce->ce_mode))
      		return NULL;
     +
     +	/*
    -+	 * Compare ce->name to info->name + '/' + p->path + '/'
    -+	 * if info->name is non-empty. Compare ce->name to
    -+	 * p-.path + '/' otherwise.
    ++	 * We might have multiple entries between 'pos' and
    ++	 * the actual sparse-directory entry, so start walking
    ++	 * back until finding it or passing where it would be.
     +	 */
    -+	if (info->namelen) {
    -+		if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
    -+		    ce->name[info->namelen] == '/' &&
    -+		    !strncmp(ce->name, info->name, info->namelen) &&
    -+		    !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
    ++	while (pos >= 0) {
    ++		ce = o->src_index->cache[pos];
    ++
    ++		if (strncmp(ce->name, p->path, p->pathlen))
    ++			return NULL;
    ++
    ++		if (S_ISSPARSEDIR(ce->ce_mode) &&
    ++		    sparse_dir_matches_path(ce, info, p))
     +			return ce;
    -+	} else if (ce->ce_namelen == p->pathlen + 1 &&
    -+		   !strncmp(ce->name, p->path, p->pathlen))
    -+		return ce;
    ++
    ++		pos--;
    ++	}
    ++
     +	return NULL;
      }
      
    @@ unpack-trees.c: static void debug_unpack_callback(int n,
     + * sparse-directory entry that matches the given name_entry
     + * from the tree walk at the given traverse_info.
     + */
    -+static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
    ++static int is_sparse_directory_entry(struct cache_entry *ce,
    ++				     struct name_entry *name,
    ++				     struct traverse_info *info)
     +{
    -+	size_t expected_len, name_start;
    -+
     +	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
     +		return 0;
     +
    -+	if (info->namelen)
    -+		name_start = info->namelen + 1;
    -+	else
    -+		name_start = 0;
    -+	expected_len = name->pathlen + 1 + name_start;
    -+
    -+	if (ce->ce_namelen != expected_len ||
    -+	    strncmp(ce->name, info->name, info->namelen) ||
    -+	    strncmp(ce->name + name_start, name->path, name->pathlen))
    -+		return 0;
    -+
    -+	return 1;
    ++	return sparse_dir_matches_path(ce, info, name);
     +}
     +
      /*
       * Note that traverse_by_cache_tree() duplicates some logic in this function
       * without actually calling it. If you change the logic here you may need to
    -@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
    - 		}
    - 	}
    - 
    --	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
    -+	if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
    - 		return -1;
    - 
    - 	if (o->merge && src[0]) {
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
      			}
      		}
      
     -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     -					     names, info) < 0)
    -+		if (is_sparse_directory_entry(src[0], names, info)) {
    -+			if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
    -+				return -1;
    -+		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
    ++		if (!is_sparse_directory_entry(src[0], names, info) &&
    ++		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     +						    names, info) < 0) {
      			return -1;
     +		}
 -:  ----------- > 10:  9f31c691af6 unpack-trees: handle dir/file conflict of sparse entries
 9:  b8ff179f43e ! 11:  2a43287c47e dir.c: accept a directory as part of cone-mode patterns
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     
      ## dir.c ##
    +@@ dir.c: enum pattern_match_result path_matches_pattern_list(
    + 	struct path_pattern *pattern;
    + 	struct strbuf parent_pathname = STRBUF_INIT;
    + 	int result = NOT_MATCHED;
    +-	const char *slash_pos;
    ++	size_t slash_pos;
    + 
    + 	if (!pl->use_cone_patterns) {
    + 		pattern = last_matching_pattern_from_list(pathname, pathlen, basename,
     @@ dir.c: enum pattern_match_result path_matches_pattern_list(
      	strbuf_addch(&parent_pathname, '/');
      	strbuf_add(&parent_pathname, pathname, pathlen);
    @@ dir.c: enum pattern_match_result path_matches_pattern_list(
     +	 * use the file-base matching logic in an equivalent way.
     +	 */
     +	if (parent_pathname.len > 0 &&
    -+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
    ++	    parent_pathname.buf[parent_pathname.len - 1] == '/') {
    ++		slash_pos = parent_pathname.len - 1;
     +		strbuf_add(&parent_pathname, "-", 1);
    ++	} else {
    ++		const char *slash_ptr = strrchr(parent_pathname.buf, '/');
    ++		slash_pos = slash_ptr ? slash_ptr - parent_pathname.buf : 0;
    ++	}
     +
      	if (hashmap_contains_path(&pl->recursive_hashmap,
      				  &parent_pathname)) {
      		result = MATCHED_RECURSIVE;
    + 		goto done;
    + 	}
    + 
    +-	slash_pos = strrchr(parent_pathname.buf, '/');
    +-
    +-	if (slash_pos == parent_pathname.buf) {
    ++	if (!slash_pos) {
    + 		/* include every file in root */
    + 		result = MATCHED;
    + 		goto done;
    + 	}
    + 
    +-	strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
    ++	strbuf_setlen(&parent_pathname, slash_pos);
    + 
    + 	if (hashmap_contains_path(&pl->parent_hashmap, &parent_pathname)) {
    + 		result = MATCHED;
10:  b9b97e01129 ! 12:  f83aa08ff6b diff-lib: handle index diffs with sparse dirs
    @@ Commit message
         identical to before: the lack of a cache entry is the same with a sparse
         index.
     
    -    In the case where a tree is modified, we need to expand the tree
    -    recursively, and start comparing each contained entry as either an
    -    addition, deletion, or modification. This causes an interesting
    -    recursion that did not exist before.
    +    Use diff_tree_oid() appropriately to compute the diff.
     
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     
      ## diff-lib.c ##
    -@@ diff-lib.c: static int get_stat_data(const struct cache_entry *ce,
    - 	return 0;
    - }
    - 
    -+struct show_new_tree_context {
    -+	struct rev_info *revs;
    -+	unsigned added:1;
    -+};
    -+
    -+static int show_new_file_from_tree(const struct object_id *oid,
    -+				   struct strbuf *base, const char *path,
    -+				   unsigned int mode, void *context)
    -+{
    -+	struct show_new_tree_context *ctx = context;
    -+	struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0);
    -+
    -+	diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0);
    -+	discard_cache_entry(new_file);
    -+	return 0;
    -+}
    -+
    -+static void show_directory(struct rev_info *revs,
    -+			   const struct cache_entry *new_dir,
    -+			   int added)
    -+{
    -+	/*
    -+	 * new_dir is a sparse directory entry, so we want to collect all
    -+	 * of the new files within the tree. This requires recursively
    -+	 * expanding the trees.
    -+	 */
    -+	struct show_new_tree_context ctx = { revs, added };
    -+	struct repository *r = revs->repo;
    -+	struct strbuf base = STRBUF_INIT;
    -+	struct pathspec ps;
    -+	struct tree *tree = lookup_tree(r, &new_dir->oid);
    -+
    -+	memset(&ps, 0, sizeof(ps));
    -+	ps.recursive = 1;
    -+	ps.has_wildcard = 1;
    -+	ps.max_depth = -1;
    -+
    -+	strbuf_add(&base, new_dir->name, new_dir->ce_namelen);
    -+	read_tree_at(r, tree, &base, &ps,
    -+			show_new_file_from_tree, &ctx);
    -+}
    -+
    - static void show_new_file(struct rev_info *revs,
    - 			  const struct cache_entry *new_file,
    - 			  int cached, int match_missing)
     @@ diff-lib.c: static void show_new_file(struct rev_info *revs,
    - 	unsigned int mode;
      	unsigned dirty_submodule = 0;
    + 	struct index_state *istate = revs->diffopt.repo->index;
      
     +	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
    -+		show_directory(revs, new_file, /*added */ 1);
    ++		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
     +		return;
     +	}
     +
      	/*
      	 * New file in the index: it might actually be different in
      	 * the working tree.
    -@@ diff-lib.c: static void show_new_file(struct rev_info *revs,
    - 	diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
    - }
    - 
    -+static int show_modified(struct rev_info *revs,
    -+			 const struct cache_entry *old_entry,
    -+			 const struct cache_entry *new_entry,
    -+			 int report_missing,
    -+			 int cached, int match_missing);
    -+
    -+static int compare_within_sparse_dir(int n, unsigned long mask,
    -+				     unsigned long dirmask, struct name_entry *entry,
    -+				     struct traverse_info *info)
    -+{
    -+	struct rev_info *revs = info->data;
    -+	struct object_id *oid0 = &entry[0].oid;
    -+	struct object_id *oid1 = &entry[1].oid;
    -+
    -+	if (oideq(oid0, oid1))
    -+		return mask;
    -+
    -+	/* Directory/file conflicts are handled earlier. */
    -+	if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) {
    -+		struct tree_desc t[2];
    -+		void *buf[2];
    -+		struct traverse_info info_r = { NULL, };
    -+
    -+		info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path);
    -+		info_r.namelen = strlen(info_r.name);
    -+		info_r.traverse_path = xstrfmt("%s/", info_r.name);
    -+		info_r.fn = compare_within_sparse_dir;
    -+		info_r.prev = info;
    -+		info_r.mode = entry[0].mode;
    -+		info_r.pathlen = entry[0].pathlen;
    -+		info_r.df_conflicts = 0;
    -+		info_r.data = revs;
    -+
    -+		buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0);
    -+		buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1);
    -+
    -+		traverse_trees(NULL, 2, t, &info_r);
    -+
    -+		free((char *)info_r.name);
    -+		free((char *)info_r.traverse_path);
    -+		free(buf[0]);
    -+		free(buf[1]);
    -+	} else {
    -+		char *old_path = NULL, *new_path = NULL;
    -+		struct cache_entry *old_entry = NULL, *new_entry = NULL;
    -+
    -+		if (entry[0].path) {
    -+			old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path);
    -+			old_entry = make_transient_cache_entry(
    -+					entry[0].mode, &entry[0].oid,
    -+					old_path, /* stage */ 0);
    -+			old_entry->ce_flags |= CE_SKIP_WORKTREE;
    -+		}
    -+		if (entry[1].path) {
    -+			new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path);
    -+			new_entry = make_transient_cache_entry(
    -+					entry[1].mode, &entry[1].oid,
    -+					new_path, /* stage */ 0);
    -+			new_entry->ce_flags |= CE_SKIP_WORKTREE;
    -+		}
    -+
    -+		if (entry[0].path && entry[1].path)
    -+			show_modified(revs, old_entry, new_entry, 0, 1, 0);
    -+		else if (entry[0].path)
    -+			diff_index_show_file(revs, revs->prefix,
    -+					     old_entry, &entry[0].oid,
    -+					     0, entry[0].mode, 0);
    -+		else if (entry[1].path)
    -+			show_new_file(revs, new_entry, 1, 0);
    -+
    -+		discard_cache_entry(old_entry);
    -+		discard_cache_entry(new_entry);
    -+		free(old_path);
    -+		free(new_path);
    -+	}
    -+
    -+	return mask;
    -+}
    -+
    -+static void show_modified_sparse_directory(struct rev_info *revs,
    -+			 const struct cache_entry *old_entry,
    -+			 const struct cache_entry *new_entry,
    -+			 int report_missing,
    -+			 int cached, int match_missing)
    -+{
    -+	struct tree_desc t[2];
    -+	void *buf[2];
    -+	struct traverse_info info = { NULL };
    -+	struct strbuf name = STRBUF_INIT;
    -+	struct strbuf parent_path = STRBUF_INIT;
    -+	char *last_dir_sep;
    -+
    -+	if (oideq(&old_entry->oid, &new_entry->oid))
    -+		return;
    -+
    -+	info.fn = compare_within_sparse_dir;
    -+	info.prev = &info;
    -+
    -+	strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1);
    -+	info.name = name.buf;
    -+	info.namelen = name.len;
    -+
    -+	strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1);
    -+	if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf)
    -+		strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1);
    -+	else
    -+		strbuf_setlen(&parent_path, 0);
    -+
    -+	info.pathlen = parent_path.len;
    -+
    -+	if (parent_path.len)
    -+		info.traverse_path = parent_path.buf;
    -+	else
    -+		info.traverse_path = "";
    -+
    -+	info.mode = new_entry->ce_mode;
    -+	info.df_conflicts = 0;
    -+	info.data = revs;
    -+
    -+	buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid);
    -+	buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid);
    -+
    -+	traverse_trees(NULL, 2, t, &info);
    -+
    -+	free(buf[0]);
    -+	free(buf[1]);
    -+	strbuf_release(&name);
    -+	strbuf_release(&parent_path);
    -+}
    -+
    - static int show_modified(struct rev_info *revs,
    - 			 const struct cache_entry *old_entry,
    - 			 const struct cache_entry *new_entry,
     @@ diff-lib.c: static int show_modified(struct rev_info *revs,
    - 	const struct object_id *oid;
      	unsigned dirty_submodule = 0;
    + 	struct index_state *istate = revs->diffopt.repo->index;
      
     +	/*
     +	 * If both are sparse directory entries, then expand the
    @@ diff-lib.c: static int show_modified(struct rev_info *revs,
     +	if (old_entry && new_entry &&
     +	    S_ISSPARSEDIR(old_entry->ce_mode) &&
     +	    S_ISSPARSEDIR(new_entry->ce_mode)) {
    -+		show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
    ++		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
     +		return 0;
     +	}
     +
    - 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
    + 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
      			  &dirty_submodule, &revs->diffopt) < 0) {
      		if (report_missing)
11:  611b9f61fb2 = 13:  35063ffb8ed status: skip sparse-checkout percentage with sparse-index
12:  0c0a765dde8 = 14:  b4033a9bf36 status: use sparse-index throughout
13:  02f2c7b6398 ! 15:  717a3f49f97 wt-status: expand added sparse directory entries
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
     +	test_sparse_match git reset --mixed HEAD~1 &&
     +	test_sparse_match test-tool read-cache --table --expand &&
     +	test_sparse_match git status --porcelain=v2 &&
    -+	test_sparse_match git status --porcelain=v2 &&
     +
     +	# At this point, sparse-checkouts behave differently
     +	# from the full-checkout.
     +	test_sparse_match git checkout --orphan new-branch &&
     +	test_sparse_match test-tool read-cache --table --expand &&
    -+	test_sparse_match git status --porcelain=v2 &&
     +	test_sparse_match git status --porcelain=v2
     +'
     +
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
     +
     +	run_on_all touch deep/deeper1/x &&
     +	test_all_match git add . &&
    -+	test_all_match git status --porcelain=v2 &&
     +	test_all_match git status --porcelain=v2
     +'
     +
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
     
      ## wt-status.c ##
     @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
    - 	run_diff_index(&rev, 1);
    + 	clear_pathspec(&rev.prune_data);
      }
      
     +static int add_file_to_list(const struct object_id *oid,
    @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
     +	struct string_list_item *it;
     +	struct wt_status_change_data *d;
     +	struct wt_status *s = context;
    -+	char *full_name;
    ++	struct strbuf full_name = STRBUF_INIT;
     +
     +	if (S_ISDIR(mode))
     +		return READ_TREE_RECURSIVE;
     +
    -+	full_name = xstrfmt("%s%s", base->buf, path);
    -+	it = string_list_insert(&s->change, full_name);
    ++	strbuf_add(&full_name, base->buf, base->len);
    ++	strbuf_addstr(&full_name, path);
    ++	it = string_list_insert(&s->change, full_name.buf);
     +	d = it->util;
     +	if (!d) {
     +		CALLOC_ARRAY(d, 1);
    @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
     +	d->mode_index = mode;
     +	oidcpy(&d->oid_index, oid);
     +	s->committable = 1;
    ++	strbuf_release(&full_name);
     +	return 0;
     +}
     +
    @@ wt-status.c: static void wt_status_collect_changes_initial(struct wt_status *s)
     +			 * tree and marking them with DIFF_STATUS_ADDED.
     +			 */
     +			struct strbuf base = STRBUF_INIT;
    -+			struct pathspec ps;
    ++			struct pathspec ps = { 0 };
     +			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
     +
    -+			memset(&ps, 0, sizeof(ps));
     +			ps.recursive = 1;
     +			ps.has_wildcard = 1;
     +			ps.max_depth = -1;
14:  46ca150c354 ! 16:  1d744848ee6 fsmonitor: integrate with sparse index
    @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     +		git status --porcelain=v2 >actual &&
     +	test_region $1 index ensure_full_index trace2.txt &&
    ++	test_region fsm_hook query trace2.txt &&
     +	test_cmp expect actual &&
     +	rm trace2.txt &&
     +	git sparse-checkout disable
 

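sparse_dir_matches_path(), added in patch 9 of the range-diff above, compares a sparse directory entry's name against the traversal path plus a trailing '/'. Its length checks reject near misses such as "folder1/0/" or "folder10/" when the target is "folder1/". The following is a minimal standalone sketch of that comparison using plain C strings. sparse_dir_name_matches() and its parameters are hypothetical stand-ins for the cache_entry, traverse_info and name_entry arguments, not Git's API.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical simplification of sparse_dir_matches_path().
 * Returns 1 if 'ce_name' equals "<parent>/<path>/", or "<path>/"
 * when 'parent' is empty; returns 0 otherwise.
 */
static int sparse_dir_name_matches(const char *ce_name, const char *parent,
				   const char *path)
{
	size_t ce_len = strlen(ce_name);
	size_t parent_len = strlen(parent);
	size_t path_len = strlen(path);

	/* A sparse directory entry name always ends in '/'. */
	assert(ce_len > 0 && ce_name[ce_len - 1] == '/');

	/*
	 * Equal lengths plus matching prefixes imply an exact match,
	 * because the only remaining character is the final '/'.
	 */
	if (parent_len)
		return ce_len == parent_len + path_len + 2 &&
		       ce_name[parent_len] == '/' &&
		       !strncmp(ce_name, parent, parent_len) &&
		       !strncmp(ce_name + parent_len + 1, path, path_len);
	return ce_len == path_len + 1 &&
	       !strncmp(ce_name, path, path_len);
}
```

Without the length comparison, a prefix check alone would accept "folder1/0/" for the path "folder1", which is the kind of false match patch 7 guards against.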

* Re: [PATCH v7 00/16] Sparse-index: integrate with status
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (16 preceding siblings ...)
  2021-06-29  2:16             ` [PATCH v7 00/16] Sparse-index: integrate with status Derrick Stolee
@ 2021-06-30 14:32             ` Elijah Newren
  2021-07-09  1:16               ` Elijah Newren
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
  18 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-06-30 14:32 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee

On Mon, Jun 28, 2021 at 7:04 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these is checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.
>
> The performance tests in p2000-sparse-operations.sh improve by 95% or more,
> even when compared with the full-index cases, not just the sparse-index
> cases that previously had extra overhead.
>
> Hopefully this is the first example of how ds/sparse-index-protections has
> done the basic work to do these conversions safely, making them look easier
> than they seemed when starting this adventure.
>
> Thanks, -Stolee
>
>
> Update in V7 (relative to v5)
> =============================
>
> APOLOGIES: As I was working on this cover letter, I was still organizing my
> big list of patches, including reordering some into this series. I forgot to
> actually include them in my v6 submission, so here is a re-submission.
> Please ignore v6.
>
> I'm sorry that this revision took so long. Initially I was blocked on
> getting the directory/file conflict figured out (I did), but also my team
> was very busy with some things. Eventually, we reached an internal deadline
> to make an experimental release available [1] with initial sparse-index
> performance boosts. Creating that included some additional review by Jeff
> Hostetler and Johannes Schindelin which led to more changes in this version.
>
> The good news is that this series has now created the basis for many Git
> commands to integrate with the sparse-index without much additional work.
> This effort was unfortunately overloaded on this series because the changes
> needed for things like 'git checkout' or 'git add' all intersect with the
> changes needed for 'git status'. Might as well get it right the first time.
>
> Because the range-diff is a bit difficult to read this time, I'll break the
> changes down on a patch-by-patch basis.
>
>  1. sparse-index: skip indexes with unmerged entries
>
>     (no change)
>
>  2. sparse-index: include EXTENDED flag when expanding
>
>  * Commit message better describes the purpose of the change.
>
>  3. t1092: replace incorrect 'echo' with 'cat'
>
>  * Typo fix
>
>  4. t1092: expand repository data shape
>
>  * Some files are added that sort immediately before and after "folder1/"
>    (taking the trailing slash into account). This provides extra
>    coverage.
>
>  5. t1092: add tests for status/add and sparse files
>
>     (no change)
>
>  6. unpack-trees: preserve cache_bottom
>
>     (no change)
>
>  7. unpack-trees: compare sparse directories correctly
>
>  * We were previously not comparing the path lengths, which caused a problem
>    (with later updates) when a sparse directory such as "folder1/0/" is
>    compared to a tree name "folder1".
>
>  8. unpack-trees: rename unpack_nondirectories()
>
>  * This new commit renames the method to better fit its new behavior, which
>    could modify a sparse directory entry. The method handles a single entry,
>    in contrast to recursing into trees.
>
>  9. unpack-trees: unpack sparse directory entries
>
>  * THIS is the biggest change from previous versions. There were a few
>    things going on that were tricky to get right, especially with the
>    directory/file conflict (handled in an update in the following, new
>    patch).
>
>  * The changes to create_ce_entry() regarding alloc_len missed a spot that
>    was critical to getting the length right in the allocated entry.
>
>  * Use '\0' instead of 0 to represent the terminating character.
>
>  * We don't need a "sparse_directory" parameter to unpack_nondirectories()
>    (which was renamed to unpack_single_entry() by the previous new patch)
>    because we can use dirmask to discover if src[0] (or any other value)
>    should be a sparse directory entry.
>
>  * Similarly, we don't need to call the method twice from unpack_callback().
>
>  * The 'conflicts' variable is initialized from dirmask, but it should
>    instead depend on whether we have a sparse directory entry and whether
>    all trees that have the path have a directory there.
>
>  * The implementation of find_cache_entry() uses find_cache_pos() to find an
>    insertion position for a path if it doesn't find an exact match. Before,
>    we subtracted one to find the sparse directory entry, but there could be
>    multiple paths between the sparse directory entry and the insertion
>    point, so we need to walk backwards until we find it. This requires many
>    paths having the same prefix, so hopefully is a rare case. Some of the
>    test data changes were added to cover the need for this logic. This uses
>    a helper method, sparse_dir_matches_path, which is also used by
>    is_sparse_directory_entry.
>
>  10. unpack-trees: handle dir/file conflict of sparse entries
>
>  * This new logic inside twoway_merge handles the special case of a
>    directory/file conflict during a 'git checkout'. The necessary data and
>    tests are also added here, though the logic will only take serious effect
>    when we integrate with 'git checkout' later.
>
>  11. dir.c: accept a directory as part of cone-mode patterns
>
>  * The value slash_pos was previously a pointer into a strbuf, but in some
>    cases we append to that strbuf, which can reallocate its buffer and
>    leave slash_pos dangling. Now slash_pos is an integer offset into the
>    string, so it remains valid even if the buffer is reallocated by an
>    append.
>
>  12. diff-lib: handle index diffs with sparse dirs
>
>  * As recommended in the previous review, a simple diff_tree_oid() replaces
>    the complicated use of read_tree_at() and traverse_trees() in the
>    previous version.
>
>  13. status: skip sparse-checkout percentage with sparse-index
>
>      (no change)
>
>  14. status: use sparse-index throughout
>
>      (no change)
>
>  15. wt-status: expand added sparse directory entries
>
>  * Duplicate 'git status --porcelain=v2' lines are removed from tests.
>
>  * The pathspec is initialized using "= { 0 }" instead of memset().
>
>  16. fsmonitor: integrate with sparse index
>
>  * An extra test_region is added to ensure that the filesystem monitor hook
>    is still being called, and we are not simply disabling the feature
>    entirely.
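The slash_pos change in patch 11 fixes a general C pitfall: keeping a pointer into a buffer that may later be reallocated. A minimal sketch, assuming a hypothetical `struct buf` in place of git's strbuf (this is not the dir.c code), shows why an integer offset survives an append while a saved `char *` may not:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer, standing in for git's strbuf. */
struct buf {
	char *data;
	size_t len;
	size_t alloc;
};

/* Append 's'; growing the buffer with realloc() may move 'data'. */
void buf_append(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t new_alloc = b->len + n + 1;
		char *p = realloc(b->data, new_alloc);

		if (!p)
			abort();
		b->data = p;	/* any old pointer into b->data may now dangle */
		b->alloc = new_alloc;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/*
 * Remember the last '/' as an offset rather than a pointer, append more
 * text, and return the offset. Requires a non-empty 'b' containing '/'.
 */
size_t slash_pos_across_append(struct buf *b, const char *more)
{
	size_t slash_pos = (size_t)(strrchr(b->data, '/') - b->data);

	buf_append(b, more);	/* a 'char *' saved before this could be stale */
	return slash_pos;	/* the integer offset is still valid */
}
```

The offset costs nothing extra to compute and stays meaningful no matter where the allocator places the grown buffer.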

This is SUPER exciting.  I've only read the cover letter, but it
strongly suggests you've not only handled all my feedback in previous
rounds, but got things pretty solidly nailed down.  I'll try to make
some time to go over it all soon.

^ permalink raw reply	[flat|nested] 215+ messages in thread

* Re: [PATCH v7 09/16] unpack-trees: unpack sparse directory entries
  2021-06-29  2:04             ` [PATCH v7 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-07-07 22:25               ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-07-07 22:25 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee, Derrick Stolee

On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> -       else
> +
> +       /*
> +        * Check for a sparse-directory entry named "path/".
> +        * Due to the input p->path not having a trailing
> +        * slash, the negative 'pos' value overshoots the
> +        * expected position by at least one, hence "-2" here.

You added the qualifier "at least" to this comment since v5.  I think
it's slightly misleading because it sounds like -2 is the end of the
special handling of the "at least" one overshoot.  Perhaps if you
ended with... "hence '-2' instead of '-1' here, and we also need to
check below if we overshot more than one".

> +        */
> +       pos = -pos - 2;
> +
> +       if (pos < 0 || pos >= o->src_index->cache_nr)
>                 return NULL;
> +
> +       /*
> +        * We might have multiple entries between 'pos' and
> +        * the actual sparse-directory entry, so start walking
> +        * back until finding it or passing where it would be.

It might be helpful to add a quick comment about the scenario where
this comes up.  e.g.

    This arises due to lexicographic sort ordering and sparse
    directory entries coming with a trailing slash, causing there to be
    multiple entries between "subdir" and "subdir/" (such as anything
    beginning with "subdir." or "subdir-").  We are trying to walk back
    from "subdir/" to "subdir" here.


> +        */
> +       while (pos >= 0) {
> +               ce = o->src_index->cache[pos];
> +
> +               if (strncmp(ce->name, p->path, p->pathlen))
> +                       return NULL;
> +
> +               if (S_ISSPARSEDIR(ce->ce_mode) &&
> +                   sparse_dir_matches_path(ce, info, p))
> +                       return ce;
> +
> +               pos--;
> +       }
> +
> +       return NULL;
>  }
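The ordering described above comes straight from byte values: '-' (0x2D) and '.' (0x2E) sort before '/' (0x2F), which sorts before '0' (0x30). So with byte-wise name ordering, "folder1-a" and "folder1.b" land between "folder1" and the sparse directory entry "folder1/", while "folder10" lands after it. The sketch below is a simplified model of the backwards walk, not git's find_cache_entry(): it works on a plain sorted array of names, and walk_back_to_sparse_dir() is a hypothetical helper. Its stopping rule mirrors the strncmp() check in the patch.

```c
#include <string.h>

/*
 * Simplified sketch (not git's find_cache_entry()): 'names' is sorted
 * by strcmp(). Starting at 'pos', walk backwards looking for the
 * sparse-directory entry "<path>/". Stop as soon as an entry no longer
 * starts with 'path', since nothing earlier can match.
 * Returns the index of the match, or -1.
 */
int walk_back_to_sparse_dir(const char **names, int pos, const char *path)
{
	size_t len = strlen(path);

	while (pos >= 0) {
		const char *name = names[pos];

		if (strncmp(name, path, len))
			return -1;	/* left the run of "<path>..." names */
		if (name[len] == '/' && name[len + 1] == '\0')
			return pos;	/* found "<path>/" */
		pos--;
	}
	return -1;
}
```

For example, in { "a", "folder1-a", "folder1.b", "folder1/", "folder10", "g" }, starting at "folder10" steps back once to reach "folder1/", while starting at "folder1.b" walks past "folder1-a" and stops at "a" without a match.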

* Re: [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries
  2021-06-29  2:04             ` [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries Derrick Stolee via GitGitGadget
@ 2021-07-07 23:19               ` Elijah Newren
  2021-07-09  0:58                 ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-07 23:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee, Derrick Stolee

On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++--
>  unpack-trees.c                           |  5 ++++-
>  2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 3f61e5686b5..4e6446e7545 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -95,6 +95,19 @@ test_expect_success 'setup' '
>                 git add . &&
>                 git commit -m "rename deep/deeper1/... to folder1/..." &&
>
> +               git checkout -b df-conflict base &&
> +               rm -rf folder1 &&
> +               echo content >folder1 &&
> +               git add . &&
> +               git commit -m df &&
> +
> +               git checkout -b fd-conflict base &&
> +               rm a &&
> +               mkdir a &&
> +               echo content >a/a &&
> +               git add . &&
> +               git commit -m fd &&
> +
>                 git checkout -b deepest base &&
>                 echo "updated deepest" >deep/deeper1/deepest/a &&
>                 git commit -a -m "update deepest" &&
> @@ -325,7 +338,11 @@ test_expect_success 'diff --staged' '
>  test_expect_success 'diff with renames and conflicts' '
>         init_repos &&
>
> -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> +       for branch in rename-out-to-out \
> +                     rename-out-to-in \
> +                     rename-in-to-out \
> +                     df-conflict \
> +                     fd-conflict
>         do
>                 test_all_match git checkout rename-base &&
>                 test_all_match git checkout $branch -- .&&
> @@ -338,7 +355,11 @@ test_expect_success 'diff with renames and conflicts' '
>  test_expect_success 'diff with directory/file conflicts' '
>         init_repos &&
>
> -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> +       for branch in rename-out-to-out \
> +                     rename-out-to-in \
> +                     rename-in-to-out \
> +                     df-conflict \
> +                     fd-conflict
>         do
>                 git -C full-checkout reset --hard &&
>                 test_sparse_match git reset --hard &&

Tests look good...

> diff --git a/unpack-trees.c b/unpack-trees.c
> index d141dffbd94..e63b2dcacbc 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -2617,7 +2617,10 @@ int twoway_merge(const struct cache_entry * const *src,
>                          same(current, oldtree) && !same(current, newtree)) {
>                         /* 20 or 21 */
>                         return merged_entry(newtree, current, o);
> -               } else
> +               } else if (current && !oldtree && newtree &&
> +                          S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
> +                       return merged_entry(newtree, current, o);
> +               else
>                         return reject_merge(current, o);
>         }
>         else if (newtree) {

This seems wrong to me, but I'm having a hard time nailing down a
testcase to prove it.  The logic looks to me like "if the old tree has
nothing in the index at the given path, and the newtree has something,
and the index had something staged, but the newtree and staged index
entry disagree on the type of the object, do some weird merged_entry()
logic on both types of trees that I thought tends to just take the
newer one, but who knows what functions like verify_uptodate(entry) do
when entry is a sparse directory...".

So, I'm not so sure about this.  Could you explain this a bit more?
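To make the condition under discussion easier to enumerate, here it is restated as a standalone predicate. This is a sketch, not git code: `struct entry` and takes_new_dir_file_branch() are hypothetical, and the S_ISSPARSEDIR() definition below assumes, as git does, that sparse directory entries carry the directory mode (written here as the hypothetical constant MODE_DIR).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode constants; git's tree modes use these octal values. */
#define MODE_DIR  0040000
#define MODE_FILE 0100644

/* Assumption: mirrors git's check that a sparse directory has the dir mode. */
#define S_ISSPARSEDIR(m) ((m) == MODE_DIR)

/* Hypothetical stand-in for struct cache_entry; only the mode matters here. */
struct entry {
	unsigned int ce_mode;
};

/*
 * The new branch in twoway_merge(), restated: there is a staged entry,
 * the old tree has nothing at this path, the new tree has something,
 * and exactly one of the staged entry and the new-tree entry is a
 * sparse directory.
 */
bool takes_new_dir_file_branch(const struct entry *current,
			       const struct entry *oldtree,
			       const struct entry *newtree)
{
	return current && !oldtree && newtree &&
	       S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode);
}
```

Written this way, only the combinations (file staged, nothing old, sparse dir new) and (sparse dir staged, nothing old, file new) reach merged_entry(); every other case still falls through to reject_merge().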

However, I did find a testcase that aborts with a fatal error...though
I can't tell if it's even triggering the above logic; I think it isn't
because I have an "ignoreme" on both sides of the history.  Here's the
testcase:

# Make a little test repo
git init dumb
cd dumb

# Setup old commit
touch tracked
echo foo >ignoreme
git add .
git commit -m "Initial"
git branch orig

# Setup new commit
git rm ignoreme
mkdir ignoreme
touch ignoreme/file
git add ignoreme/file
git commit -m "whatever"

# Switch to old commit
git checkout orig

# Make index != new (and index != old)
git rm ignoreme
mkdir ignoreme
echo user-data >ignoreme/file
git add ignoreme/file

# Sparsify
GIT_TEST_SPARSE_INDEX=0 # GIT_TEST_SPARSE_INDEX is documented as a boolean;
                        # but the traditional boolean value is ignored and it
                        # really only cares about set/unset.  Confusing.
git sparse-checkout init --cone --sparse-index
git sparse-checkout set tracked

# Check status and dirs/paths in index
git status --porcelain
test-tool read-cache --table
test-tool read-cache --table --expand

# Run a command that aborts with a fatal error
git checkout -m master

* Re: [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs
  2021-06-29  2:04             ` [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-07-08 23:10               ` Elijah Newren
  2021-07-08 23:51                 ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-08 23:10 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee, Derrick Stolee

On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> While comparing an index to a tree, we may see a sparse directory entry.
> In this case, we should compare that portion of the tree to the tree
> represented by that entry. This could include a new tree which needs to
> be expanded to a full list of added files. It could also include an
> existing tree, in which case all of the changes inside are important to
> describe, including the modifications, additions, and deletions. Note
> that the case where the tree has a path and the index does not remains
> identical to before: the lack of a cache entry is the same with a sparse
> index.
>
> Use diff_tree_oid() appropriately to compute the diff.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index c2ac9250fe9..3f32f038371 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> +               diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
> +               return;
> +       }
> +
>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -347,6 +352,17 @@ static int show_modified(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       /*
> +        * If both are sparse directory entries, then expand the
> +        * modifications to the file level.
> +        */
> +       if (old_entry && new_entry &&
> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> +               diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
> +               return 0;
> +       }
> +
>         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)

Love the simpler patch.

I'm curious about the case where S_ISSPARSEDIR(old_entry->ce_mode) !=
S_ISSPARSEDIR(new_entry->ce_mode), though; how is that handled?
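As the commit message explains, show_new_file() hands a new sparse directory to diff_tree_oid() with a NULL old side, so every file under it is reported as added, while show_modified() diffs two trees and can report modifications, additions and deletions. The file-level result can be modeled as a merge walk over two sorted, fully expanded path lists. This is a toy with hypothetical types (`struct file`, toy_tree_diff()), not git's tree-diff code, which walks tree objects rather than flat lists.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened tree: paths sorted by strcmp(), with object ids. */
struct file {
	const char *path;
	const char *oid;
};

struct diff_counts {
	int added, modified, deleted;
};

/*
 * Toy two-way diff over sorted file lists, printing "A", "M" or "D"
 * lines. An empty 'a' list reports every file in 'b' as added, which
 * corresponds to the show_new_file() case in the patch.
 */
struct diff_counts toy_tree_diff(const struct file *a, int na,
				 const struct file *b, int nb)
{
	struct diff_counts c = { 0, 0, 0 };
	int i = 0, j = 0;

	while (i < na || j < nb) {
		int cmp;

		if (i >= na)
			cmp = 1;	/* only 'b' left: additions */
		else if (j >= nb)
			cmp = -1;	/* only 'a' left: deletions */
		else
			cmp = strcmp(a[i].path, b[j].path);

		if (cmp < 0) {
			printf("D\t%s\n", a[i++].path);
			c.deleted++;
		} else if (cmp > 0) {
			printf("A\t%s\n", b[j++].path);
			c.added++;
		} else {
			if (strcmp(a[i].oid, b[j].oid)) {
				printf("M\t%s\n", b[j].path);
				c.modified++;
			}
			i++;
			j++;
		}
	}
	return c;
}
```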

* Re: [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs
  2021-07-08 23:10               ` Elijah Newren
@ 2021-07-08 23:51                 ` Elijah Newren
  2021-07-12 13:52                   ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-08 23:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee, Derrick Stolee

On Thu, Jul 8, 2021 at 4:10 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > While comparing an index to a tree, we may see a sparse directory entry.
> > In this case, we should compare that portion of the tree to the tree
> > represented by that entry. This could include a new tree which needs to
> > be expanded to a full list of added files. It could also include an
> > existing tree, in which case all of the changes inside are important to
> > describe, including the modifications, additions, and deletions. Note
> > that the case where the tree has a path and the index does not remains
> > identical to before: the lack of a cache entry is the same with a sparse
> > index.
> >
> > Use diff_tree_oid() appropriately to compute the diff.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  diff-lib.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/diff-lib.c b/diff-lib.c
> > index c2ac9250fe9..3f32f038371 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
> > @@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
> >         unsigned dirty_submodule = 0;
> >         struct index_state *istate = revs->diffopt.repo->index;
> >
> > +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> > +               diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
> > +               return;
> > +       }
> > +
> >         /*
> >          * New file in the index: it might actually be different in
> >          * the working tree.
> > @@ -347,6 +352,17 @@ static int show_modified(struct rev_info *revs,
> >         unsigned dirty_submodule = 0;
> >         struct index_state *istate = revs->diffopt.repo->index;
> >
> > +       /*
> > +        * If both are sparse directory entries, then expand the
> > +        * modifications to the file level.
> > +        */
> > +       if (old_entry && new_entry &&
> > +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> > +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> > +               diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
> > +               return 0;
> > +       }
> > +
> >         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
> >                           &dirty_submodule, &revs->diffopt) < 0) {
> >                 if (report_missing)
>
> Love the simpler patch.
>
> I'm curious about the case where S_ISSPARSEDIR(old_entry->ce_mode) !=
> S_ISSPARSEDIR(new_entry->ce_mode), though; how is that handled?

Digging a little deeper, it appears that we could add this just before
your new if-block:

    assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
           S_ISSPARSEDIR(new_entry->ce_mode));

And the code still functions, while that also removes some of the
surprise factor.  I'm guessing that the difference between "folder1"
and "folder1/" causes us never to try to directly compare a file to a
directory...but if that's accurate, a comment to that effect might
help make this code a little clearer and make readers less likely
to wonder why you need to check that both old and new are sparse
directories.
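The invariant guessed at above can be stated in a few lines. This sketch assumes, as the reply suggests, that old and new entries are only ever paired when their names are identical, and that an entry is a sparse directory exactly when its name ends in '/'; both helper names are hypothetical. Under those assumptions the proposed assert() cannot fire, because the trailing slash alone keeps a file "folder1" from pairing with a sparse directory "folder1/".

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: an entry is a sparse directory exactly when its name ends in '/'. */
bool is_sparse_dir_name(const char *name)
{
	size_t len = strlen(name);

	return len > 0 && name[len - 1] == '/';
}

/*
 * If entries are paired only when their names match, then a pair can
 * never mix a sparse directory with a non-directory: identical names
 * imply identical "sparse-dir-ness".
 */
bool pair_is_consistent(const char *old_name, const char *new_name)
{
	if (strcmp(old_name, new_name))
		return true;	/* different paths are never compared */
	return is_sparse_dir_name(old_name) == is_sparse_dir_name(new_name);
}
```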

* Re: [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries
  2021-07-07 23:19               ` Elijah Newren
@ 2021-07-09  0:58                 ` Elijah Newren
  2021-07-12 13:46                   ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-09  0:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee, Derrick Stolee

On Wed, Jul 7, 2021 at 4:19 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++--
> >  unpack-trees.c                           |  5 ++++-
> >  2 files changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> > index 3f61e5686b5..4e6446e7545 100755
> > --- a/t/t1092-sparse-checkout-compatibility.sh
> > +++ b/t/t1092-sparse-checkout-compatibility.sh
> > @@ -95,6 +95,19 @@ test_expect_success 'setup' '
> >                 git add . &&
> >                 git commit -m "rename deep/deeper1/... to folder1/..." &&
> >
> > +               git checkout -b df-conflict base &&
> > +               rm -rf folder1 &&
> > +               echo content >folder1 &&
> > +               git add . &&
> > +               git commit -m df &&
> > +
> > +               git checkout -b fd-conflict base &&
> > +               rm a &&
> > +               mkdir a &&
> > +               echo content >a/a &&
> > +               git add . &&
> > +               git commit -m fd &&
> > +
> >                 git checkout -b deepest base &&
> >                 echo "updated deepest" >deep/deeper1/deepest/a &&
> >                 git commit -a -m "update deepest" &&
> > @@ -325,7 +338,11 @@ test_expect_success 'diff --staged' '
> >  test_expect_success 'diff with renames and conflicts' '
> >         init_repos &&
> >
> > -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> > +       for branch in rename-out-to-out \
> > +                     rename-out-to-in \
> > +                     rename-in-to-out \
> > +                     df-conflict \
> > +                     fd-conflict
> >         do
> >                 test_all_match git checkout rename-base &&
> >                 test_all_match git checkout $branch -- .&&
> > @@ -338,7 +355,11 @@ test_expect_success 'diff with renames and conflicts' '
> >  test_expect_success 'diff with directory/file conflicts' '
> >         init_repos &&
> >
> > -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> > +       for branch in rename-out-to-out \
> > +                     rename-out-to-in \
> > +                     rename-in-to-out \
> > +                     df-conflict \
> > +                     fd-conflict
> >         do
> >                 git -C full-checkout reset --hard &&
> >                 test_sparse_match git reset --hard &&
>
> Tests look good...
>
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index d141dffbd94..e63b2dcacbc 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -2617,7 +2617,10 @@ int twoway_merge(const struct cache_entry * const *src,
> >                          same(current, oldtree) && !same(current, newtree)) {
> >                         /* 20 or 21 */
> >                         return merged_entry(newtree, current, o);
> > -               } else
> > +               } else if (current && !oldtree && newtree &&
> > +                          S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
> > +                       return merged_entry(newtree, current, o);
> > +               else
> >                         return reject_merge(current, o);
> >         }
> >         else if (newtree) {

t1092 still passes if you replace the
    return merged_entry(newtree, current, o);
line with
    die("This line is never hit.");

Is it possible that you thought you needed this block but further
refactoring removed the need?  Or that it is only needed by the later
ds/commit-and-checkout-with-sparse-index topic (which I haven't yet
reviewed, because I was reviewing this topic first)?  It seems this
code change should either be dropped, or moved out to the relevant
series that uses it.

> This seems wrong to me, but I'm having a hard time nailing down a
> testcase to prove it.  The logic looks to me like "if the old tree has
> nothing in the index at the given path, and the newtree has something,
> and the index had something staged, but the newtree and staged index
> entry disagree on the type of the object, do some weird merged_entry()
> logic on both types of trees that I thought tends to just take the
> newer one, but who knows what functions like verify_uptodate(entry) do
> when entry is a sparse directory...".
>
> So, I'm not so sure about this.  Could you explain this a bit more?
>
> However, I did find a testcase that aborts with a fatal error...though
> I can't tell if it's even triggering the above logic; I think it isn't
> because I have an "ignoreme" on both sides of the history.  Here's the
> testcase:
>
> # Make a little test repo
> git init dumb
> cd dumb
>
> # Setup old commit
> touch tracked
> echo foo >ignoreme
> git add .
> git commit -m "Initial"
> git branch orig
>
> # Setup new commit
> git rm ignoreme
> mkdir ignoreme
> touch ignoreme/file
> git add ignoreme/file
> git commit -m "whatever"
>
> # Switch to old commit
> git checkout orig
>
> # Make index != new (and index != old)
> git rm ignoreme
> mkdir ignoreme
> echo user-data >ignoreme/file
> git add ignoreme/file
>
> # Sparsify
> GIT_TEST_SPARSE_INDEX=0 # GIT_TEST_SPARSE_INDEX is documented as a boolean;
>                         # but the traditional boolean value is ignored and it
>                         # really only cares about set/unset.  Confusing.
> git sparse-checkout init --cone --sparse-index
> git sparse-checkout set tracked
>
> # Check status and dirs/paths in index
> git status --porcelain
> test-tool read-cache --table
> test-tool read-cache --table --expand
>
> # Run a command that aborts with a fatal error
> git checkout -m master

It turns out that this testcase I provided still triggers the same
fatal error if you omit the --sparse-index flag, so it's not a
sparse-index-specific bug.

So, perhaps it shouldn't hold up this series, but given that a lot of
your correctness verification in t1092 relies on comparisons between
sparse checkouts and sparse indexes, it might be worth trying to get
to the root of this.

* Re: [PATCH v7 15/16] wt-status: expand added sparse directory entries
  2021-06-29  2:04             ` [PATCH v7 15/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-07-09  1:03               ` Elijah Newren
  2021-07-12 13:56                 ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-09  1:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee, Derrick Stolee

On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> It is difficult, but possible, to get into a state where we intend to
> add a directory that is outside of the sparse-checkout definition. Add a
> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
>
> This test failed before because the output of 'git status
> --porcelain=v2' would not match on the lines for folder1/:
>
> * The sparse-checkout repo (with a full index) would output each path
>   name that is intended to be added.
>
> * The sparse-index repo would only output that "folder1/" is staged for
>   addition.
>
> The status should report the full list of files to be added, and so this
> sparse-directory entry should be expanded to a full list when reaching
> it inside the wt_status_collect_changes_initial() method. Use
> read_tree_at() to assist.
>
> Somehow, this loop over the cache entries was not guarded by
> ensure_full_index() as intended.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 33 +++++++++++++++
>  wt-status.c                              | 51 ++++++++++++++++++++++++
>  2 files changed, 84 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index fed0440bafe..df217a2d10b 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -545,4 +545,37 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +test_expect_success 'reset mixed and checkout orphan' '
> +       init_repos &&
> +
> +       test_all_match git checkout rename-out-to-in &&
> +
> +       # Sparse checkouts do not agree with full checkouts about
> +       # how to report a directory/file conflict during a reset.
> +       # This command would fail with test_all_match because the
> +       # full checkout reports "T folder1/0/1" while a sparse
> +       # checkout reports "D folder1/0/1". This matches because
> +       # the sparse checkouts skip "adding" the other side of
> +       # the conflict.

The same issue I highlighted last time is still present.  If you
insert an "exit 1" right here, then run
    ./t1092-sparse-checkout-compatibility.sh --ver --imm -x
until it stops, then
    cd "t/trash directory.t1092-sparse-checkout-compatibility/sparse-checkout"
    git ls-files -t | grep folder  # Note the files that are sparse
    git reset --mixed HEAD~1
    git ls-files -t | grep folder  # Note the files that are sparse;
                                   # there are some that aren't that should be
    git sparse-checkout reapply
    git ls-files -t | grep folder  # Note the files that are sparse

Granted, this is a bug with sparse-checkout without sparse-index, so
not something new to your series.  But since you are using comparisons
between regular sparse-checkouts and sparse-index to verify
correctness, this seems problematic to me.

> +       test_sparse_match git reset --mixed HEAD~1 &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # At this point, sparse-checkouts behave differently
> +       # from the full-checkout.
> +       test_sparse_match git checkout --orphan new-branch &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_sparse_match git status --porcelain=v2
> +'
> +
> +test_expect_success 'add everything with deep new file' '
> +       init_repos &&
> +
> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
> +
> +       run_on_all touch deep/deeper1/x &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  test_done
> diff --git a/wt-status.c b/wt-status.c
> index 96db3e74962..0317baef87e 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -657,6 +657,36 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>         clear_pathspec(&rev.prune_data);
>  }
>
> +static int add_file_to_list(const struct object_id *oid,
> +                           struct strbuf *base, const char *path,
> +                           unsigned int mode, void *context)
> +{
> +       struct string_list_item *it;
> +       struct wt_status_change_data *d;
> +       struct wt_status *s = context;
> +       struct strbuf full_name = STRBUF_INIT;
> +
> +       if (S_ISDIR(mode))
> +               return READ_TREE_RECURSIVE;
> +
> +       strbuf_add(&full_name, base->buf, base->len);
> +       strbuf_addstr(&full_name, path);
> +       it = string_list_insert(&s->change, full_name.buf);
> +       d = it->util;
> +       if (!d) {
> +               CALLOC_ARRAY(d, 1);
> +               it->util = d;
> +       }
> +
> +       d->index_status = DIFF_STATUS_ADDED;
> +       /* Leave {mode,oid}_head zero for adds. */
> +       d->mode_index = mode;
> +       oidcpy(&d->oid_index, oid);
> +       s->committable = 1;
> +       strbuf_release(&full_name);
> +       return 0;
> +}
> +
>  static void wt_status_collect_changes_initial(struct wt_status *s)
>  {
>         struct index_state *istate = s->repo->index;
> @@ -671,6 +701,27 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
>                         continue;
>                 if (ce_intent_to_add(ce))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode)) {
> +                       /*
> +                        * This is a sparse directory entry, so we want to collect all
> +                        * of the added files within the tree. This requires recursively
> +                        * expanding the trees to find the elements that are new in this
> +                        * tree and marking them with DIFF_STATUS_ADDED.
> +                        */
> +                       struct strbuf base = STRBUF_INIT;
> +                       struct pathspec ps = { 0 };
> +                       struct tree *tree = lookup_tree(istate->repo, &ce->oid);
> +
> +                       ps.recursive = 1;
> +                       ps.has_wildcard = 1;
> +                       ps.max_depth = -1;
> +
> +                       strbuf_add(&base, ce->name, ce->ce_namelen);
> +                       read_tree_at(istate->repo, tree, &base, &ps,
> +                                    add_file_to_list, s);
> +                       continue;
> +               }
> +
>                 it = string_list_insert(&s->change, ce->name);
>                 d = it->util;
>                 if (!d) {
> --
> gitgitgadget
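The expansion in wt_status_collect_changes_initial() relies on read_tree_at()'s callback protocol: the callback sees a base prefix plus an entry name, and add_file_to_list() returns READ_TREE_RECURSIVE for directories so the walk descends into them. A toy model of that protocol might look like this. The types and names (toy_node, toy_read_tree_at, collect_added, TOY_RECURSE) are entirely hypothetical, and an in-memory tree stands in for git objects.

```c
#include <stdio.h>
#include <string.h>

#define TOY_RECURSE 1	/* hypothetical stand-in for READ_TREE_RECURSIVE */

/* Hypothetical in-memory tree node; 'children' is NULL for files. */
struct toy_node {
	const char *name;
	const struct toy_node *children;
	int nr_children;
};

typedef int (*toy_fn)(const char *base, const char *path, int is_dir, void *ctx);

/*
 * Walk 'dir' like read_tree_at(): call 'fn' for every child with the
 * current 'base' prefix (ending in '/'), and descend only when the
 * callback asks for it by returning TOY_RECURSE.
 */
void toy_read_tree_at(const struct toy_node *dir, const char *base,
		      toy_fn fn, void *ctx)
{
	for (int i = 0; i < dir->nr_children; i++) {
		const struct toy_node *child = &dir->children[i];
		int is_dir = child->children != NULL;

		if (fn(base, child->name, is_dir, ctx) == TOY_RECURSE && is_dir) {
			char sub[4096];

			snprintf(sub, sizeof(sub), "%s%s/", base, child->name);
			toy_read_tree_at(child, sub, fn, ctx);
		}
	}
}

/* Collected full paths (fixed capacity, for the sketch only). */
struct added {
	char paths[8][256];
	int nr;
};

/* Like add_file_to_list(): recurse into directories, record base + path. */
int collect_added(const char *base, const char *path, int is_dir, void *ctx)
{
	struct added *a = ctx;

	if (is_dir)
		return TOY_RECURSE;
	if (a->nr < 8)
		snprintf(a->paths[a->nr++], sizeof(a->paths[0]), "%s%s", base, path);
	return 0;
}
```

Starting the walk with the sparse entry's name ("folder1/", trailing slash included) as the base, as the patch does with ce->name, yields full paths such as "folder1/0/a" without any extra separator handling in the callback.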

* Re: [PATCH v7 00/16] Sparse-index: integrate with status
  2021-06-30 14:32             ` Elijah Newren
@ 2021-07-09  1:16               ` Elijah Newren
  2021-07-12 14:46                 ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-09  1:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee

On Wed, Jun 30, 2021 at 7:32 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jun 28, 2021 at 7:04 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > This is the first "payoff" series in the sparse-index work. It makes 'git
> > status' very fast when a sparse-index is enabled on a repository with
> > cone-mode sparse-checkout (and a small populated set).
> >
...
> > Because the range-diff is a bit difficult to read this time, I'll break the
> > changes down on a patch-by-patch basis.

Thanks for doing this; it was helpful.

> This is SUPER exciting.  I've only read the cover letter, but it
> strongly suggests you've not only handled all my feedback in previous
> rounds, but got things pretty solidly nailed down.  I'll try to make
> some time to go over it all soon.

You have indeed addressed nearly all my feedback in previous rounds,
and I found few problems with all the new code in this round.
Overall, this round is looking really good, though there are a couple
things I called out in comments on individual patches that I'll
summarize here:

Patch 9: a few minor suggestions for improving comments

Patch 10: the new code is never triggered and probably should either
be dropped or made part of a later series if the later series needs
it.  Also, although it doesn't necessarily need to hold up this
series, I found a bug affecting sparse-checkouts with or without
sparse-index while trying to understand this patch.

Patch 12: the code was slightly confusing to me, but I found that
there seems to be an invariant it is based upon.  Adding an assert
with a comment or just a comment about this invariant might help make
the code more readable.

Patch 15: since the new tests in t1092 are written to compare
sparse-checkout and sparse-index, it seems we should investigate a bug
where the testsuite commands we invoke are giving incorrect behavior
in both sparse-checkout and sparse-index.


* Re: [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries
  2021-07-09  0:58                 ` Elijah Newren
@ 2021-07-12 13:46                   ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-07-12 13:46 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Jeff Hostetler, Johannes Schindelin, Derrick Stolee,
	Derrick Stolee

On 7/8/2021 8:58 PM, Elijah Newren wrote:
> On Wed, Jul 7, 2021 at 4:19 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
...
>>> +               } else if (current && !oldtree && newtree &&
>>> +                          S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
>>> +                       return merged_entry(newtree, current, o);
>>> +               else
>>>                         return reject_merge(current, o);
>>>         }
>>>         else if (newtree) {
> 
> t1092 still passes if you replace the
>     return merged_entry(newtree, current, o);
> line with
>     die("This line is never hit.");
> 
> Is it possible that you thought you needed this block but further
> refactoring removed the need?  Or that it is only needed by the later
> ds/commit-and-checkout-with-sparse-index topic (which I haven't yet
> reviewed, because I was reviewing this topic first)?  It seems this
> code change should either be dropped, or moved out to the relevant
> series that uses it.

I have been working with the whole stack of patches (including the
ones that update 'git add') and trying to make the necessary changes
in this series, especially when updating the test data shape or
modifying methods that were necessary for 'git status'.

It is likely that I was being overcautious in thinking this was necessary
so early. It might also have been necessary due to some strange case
that is only exposed in a Scalar functional test. I'll try moving it
to the next series and double-check those tests.

>> This seems wrong to me but I'm having a hard time nailing down a
>> testcase to prove it.  The logic looks to me like "if the old tree has
>> nothing in the index at the given path, and the newtree has something,
>> and the index had something staged, but the newtree and staged index
>> entry disagree on the type of the object, do some weird merged_entry()
>> logic on both types of trees that tends to just take the newer I
>> thought but who knows what functions like verify_uptodate(entry) do
>> when entry is a sparse directory...".
>>
>> So, I'm not so sure about this.  Could you explain this a bit more?

The most important point is that 'current' and 'newtree' are both
present but have different types (blob and tree) and the tree is
necessarily at the edge of the sparse-checkout cone. In the cases
where I was able to trigger this logic in the debugger, 'oldtree'
was NULL, so I added it as a condition to be extra cautious around
unexpected initialization. It is possible that the '!oldtree'
condition is unnecessary. I will investigate.

As for verify_uptodate(), it ignores the sparse directory unless
o->skip_sparse_checkout is set. I wonder if it is possible to
have that set and hit this case.

>> However, I did find a testcase that aborts with a fatal error...though
>> I can't tell if it's even triggering the above logic; I think it isn't
>> because I have an "ignoreme" on both sides of the history.  Here's the
>> testcase:
>>
...
>> # Run a command that aborts with a fatal error
>> git checkout -m master

Thank you for finding an interesting test!

> It turns out that this testcase I provided still triggers the same
> fatal error if you omit the --sparse-index flag, so it's not a
> sparse-index-specific bug.
> 
> So, perhaps it shouldn't hold up this series, but given that a lot of
> your correctness verification in t1092 relies on comparisons between
> sparse checkouts and sparse indexes, it might be worth trying to get
> to the root of this.
 
I have some upcoming thoughts on changing how sparse-checkout works
for other complicated cases, so I will add this to that pile.

Thanks,
-Stolee


* Re: [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs
  2021-07-08 23:51                 ` Elijah Newren
@ 2021-07-12 13:52                   ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-07-12 13:52 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Jeff Hostetler, Johannes Schindelin, Derrick Stolee,
	Derrick Stolee

On 7/8/2021 7:51 PM, Elijah Newren wrote:
> On Thu, Jul 8, 2021 at 4:10 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
...
>>> +       /*
>>> +        * If both are sparse directory entries, then expand the
>>> +        * modifications to the file level.
>>> +        */
>>> +       if (old_entry && new_entry &&
>>> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
>>> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
>>> +               diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
>>> +               return 0;
>>> +       }
>>> +
>>>         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
>>>                           &dirty_submodule, &revs->diffopt) < 0) {
>>>                 if (report_missing)
>>
>> Love the simpler patch.
>>
>> I'm curious about the case where S_ISSPARSEDIR(old_entry->ce_mode) !=
>> S_ISSPARSEDIR(new_entry->ce_mode), though; how is that handled?
> 
> Digging a little deeper, it appears that we could add this just before
> your new if-block:
> 
>     assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
>            S_ISSPARSEDIR(new_entry->ce_mode));
> 
> And the code still functions, while that also removes some of the
> surprise factor.  I'm guessing that the difference between "folder1"
> and "folder1/" cause us to never try to directly compare a file to a
> directory...but if that's accurate, a comment of some effect might
> help make this code be a little clearer and make readers less likely
> to wonder why you need to check that both old and new are sparse
> directories.

I was surprised that this worked, because my patch conditioned on
old_entry and new_entry being non-NULL. But of course show_modified()
requires them to be non-NULL, so that check can be dropped.

Adding the assert documents this expectation, and I will also
expand upon the comment.

Thanks,
-Stolee


* Re: [PATCH v7 15/16] wt-status: expand added sparse directory entries
  2021-07-09  1:03               ` Elijah Newren
@ 2021-07-12 13:56                 ` Derrick Stolee
  2021-07-12 19:32                   ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-07-12 13:56 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Jeff Hostetler, Johannes Schindelin, Derrick Stolee,
	Derrick Stolee

On 7/8/2021 9:03 PM, Elijah Newren wrote:
> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> +test_expect_success 'reset mixed and checkout orphan' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout rename-out-to-in &&
>> +
>> +       # Sparse checkouts do not agree with full checkouts about
>> +       # how to report a directory/file conflict during a reset.
>> +       # This command would fail with test_all_match because the
>> +       # full checkout reports "T folder1/0/1" while a sparse
>> +       # checkout reports "D folder1/0/1". This matches because
>> +       # the sparse checkouts skip "adding" the other side of
>> +       # the conflict.
> 
> The same issue I highlighted last time is still present.  If you
> insert an "exit 1" right here, then run
>     ./t1092-sparse-checkout-compatibility.sh --ver --imm -x
> until it stops, then
>     cd t/trash directory.t1092-sparse-checkout-compatibility/sparse-checkout
>     git ls-files -t | grep folder  # Note the files that are sparse
>     git reset --mixed HEAD~1
>     git ls-files -t | grep folder  # Note the files that are sparse --
> there are some that aren't that should be
>     git sparse-checkout reapply
>     git ls-files -t | grep folder  # Note the files that are sparse
> 
> Granted, this is a bug with sparse-checkout without sparse-index, so
> not something new to your series.  But since you are using comparisons
> between regular sparse-checkouts and sparse-index to verify
> correctness, this seems problematic to me.

I'll add it to the pile, but I want to continue having this series
focus on making the sparse-index work quickly without a change in
behavior from a normal index. Changing the behavior of the sparse-
checkout feature should be a separate series.

Thanks,
-Stolee


* Re: [PATCH v7 00/16] Sparse-index: integrate with status
  2021-07-09  1:16               ` Elijah Newren
@ 2021-07-12 14:46                 ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-07-12 14:46 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Jeff Hostetler, Johannes Schindelin, Derrick Stolee

On 7/8/2021 9:16 PM, Elijah Newren wrote:
> On Wed, Jun 30, 2021 at 7:32 AM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Mon, Jun 28, 2021 at 7:04 PM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
>>> This is the first "payoff" series in the sparse-index work. It makes 'git
>>> status' very fast when a sparse-index is enabled on a repository with
>>> cone-mode sparse-checkout (and a small populated set).
>>>
> ...
>>> Because the range-diff is a bit difficult to read this time, I'll break the
>>> changes down on a patch-by-patch basis.
> 
> Thanks for doing this; it was helpful.
> 
>> This is SUPER exciting.  I've only read the cover letter, but it
>> strongly suggests you've not only handled all my feedback in previous
>> rounds, but got things pretty solidly nailed away.  I'll try to make
>> some time to go over it all soon.
> 
> You have indeed addressed nearly all my feedback in previous rounds,
> and I found few problems with all the new code in this round.
> Overall, this round is looking really good, though there are a couple
> things I called out in comments on individual patches that I'll
> summarize here:

This summary is nice. I will try to do a similar thing myself
in the future.

> Patch 9: a few minor suggestions for improving comments

Done.

> Patch 10: the new code is never triggered and probably should either
> be dropped or made part of a later series if the later series needs
> it.  Also, although it doesn't necessarily need to hold up this
> series, I found a bug affecting sparse-checkouts with or without
> sparse-index while trying to understand this patch.

I checked and this patch is not necessary until the next series,
so I will move it to that one.

> Patch 12: the code was slightly confusing to me, but I found that
> there seems to be an invariant it is based upon.  Adding an assert
> with a comment or just a comment about this invariant might help make
> the code more readable.

The assert() plus a better comment makes this patch cleaner.

> Patch 15: since the new tests in t1092 are written to compare
> sparse-checkout and sparse-index, it seems we should investigate a bug
> where the testsuite commands we invoke are giving incorrect behavior
> in both sparse-checkout and sparse-index.

I have made a note in an internal tracker to follow up on this,
along with some other existing sparse-checkout behaviors, since they
should be pursued in a separate series.

I'm not ignoring your comments, but I would like to keep these
patches, which change the underlying data structure, from being
conflated with behavior changes. The fact that I am discovering
(and documenting in tests) these differences only highlights that
they already exist.

Thanks,
-Stolee


* [PATCH v8 00/15] Sparse-index: integrate with status
  2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
                               ` (17 preceding siblings ...)
  2021-06-30 14:32             ` Elijah Newren
@ 2021-07-12 17:55             ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 01/15] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                                 ` (16 more replies)
  18 siblings, 17 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Update in V8
============

 * The directory/file conflict patch is removed and delayed to the next
   series where it will be required. (It will also be improved in that
   series.)

 * Some comments have been improved, including a new assert() that helps
   document the situation.

Derrick Stolee (15):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: rename unpack_nondirectories()
  unpack-trees: unpack sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |  19 +++
 dir.c                                    |  24 +++-
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 158 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  49 +++++++
 unpack-trees.c                           | 142 +++++++++++++++++---
 wt-status.c                              |  65 +++++++++-
 wt-status.h                              |   1 +
 10 files changed, 464 insertions(+), 34 deletions(-)


base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v8
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v8
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v7:

  1:  2a4a7256304 =  1:  1815c148e8c sparse-index: skip indexes with unmerged entries
  2:  f5bae86014d =  2:  7bcde075d8d sparse-index: include EXTENDED flag when expanding
  3:  d965669c766 =  3:  05981e30b97 t1092: replace incorrect 'echo' with 'cat'
  4:  e10fa11cfdb !  4:  d38b66e9ee4 t1092: expand repository data shape
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
       	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
       	do
       		test_all_match git checkout rename-base &&
     - 		test_all_match git checkout $branch -- .&&
     + 		test_all_match git checkout $branch -- . &&
      +		test_all_match git status --porcelain=v2 &&
      +		test_all_match git diff --staged --no-renames &&
      +		test_all_match git diff --staged --find-renames || return 1
  5:  e94ffa07d46 =  5:  95ddd3abe4e t1092: add tests for status/add and sparse files
  6:  a8dda933567 =  6:  b182b456613 unpack-trees: preserve cache_bottom
  7:  e52166f6e4c =  7:  988ddce4d45 unpack-trees: compare sparse directories correctly
  8:  d04b62381b8 =  8:  d67ad048b08 unpack-trees: rename unpack_nondirectories()
  9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
      +	 * Check for a sparse-directory entry named "path/".
      +	 * Due to the input p->path not having a trailing
      +	 * slash, the negative 'pos' value overshoots the
     -+	 * expected position by at least one, hence "-2" here.
     ++	 * expected position, hence "-2" instead of "-1".
      +	 */
      +	pos = -pos - 2;
      +
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
       		return NULL;
      +
      +	/*
     -+	 * We might have multiple entries between 'pos' and
     -+	 * the actual sparse-directory entry, so start walking
     -+	 * back until finding it or passing where it would be.
     ++	 * Due to lexicographic sorting and sparse directory
     ++	 * entried ending with a trailing slash, our path as a
     ++	 * sparse directory (e.g "subdir/") and	our path as a
     ++	 * file (e.g. "subdir") might be separated by other
     ++	 * paths (e.g. "subdir-").
      +	 */
      +	while (pos >= 0) {
      +		ce = o->src_index->cache[pos];
 10:  9f31c691af6 <  -:  ----------- unpack-trees: handle dir/file conflict of sparse entries
 11:  2a43287c47e = 10:  76c7528f78f dir.c: accept a directory as part of cone-mode patterns
 12:  f83aa08ff6b ! 11:  d875a7f8585 diff-lib: handle index diffs with sparse dirs
     @@ diff-lib.c: static int show_modified(struct rev_info *revs,
       	unsigned dirty_submodule = 0;
       	struct index_state *istate = revs->diffopt.repo->index;
       
     ++	assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
     ++	       S_ISSPARSEDIR(new_entry->ce_mode));
     ++
      +	/*
      +	 * If both are sparse directory entries, then expand the
     -+	 * modifications to the file level.
     ++	 * modifications to the file level. If only one was a sparse
     ++	 * directory, then they appear as an add and delete instead of
     ++	 * a modification.
      +	 */
     -+	if (old_entry && new_entry &&
     -+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
     -+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
     ++	if (S_ISSPARSEDIR(new_entry->ce_mode)) {
      +		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
      +		return 0;
      +	}
 13:  35063ffb8ed = 12:  2b72cc2d985 status: skip sparse-checkout percentage with sparse-index
 14:  b4033a9bf36 = 13:  1c1feef3733 status: use sparse-index throughout
 15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
 16:  1d744848ee6 = 15:  bdc771cf373 fsmonitor: integrate with sparse index

-- 
gitgitgadget
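
Two parts of this series rely on byte-wise ordering: the range-diff comment about "subdir/" versus "subdir-", and the new folder1-, folder1.x and folder10 files in t1092. In that ordering, '-' (0x2D) and '.' (0x2E) sort before '/' (0x2F), while '0' (0x30) sorts after it. The sketch below demonstrates that ordering and a hypothetical lookup for a sparse directory entry "path/" given "path". It scans forward from the lower bound of "path", which is one way to skip over the separating names. The patch itself walks backward from a computed position, so treat this as an illustration of the ordering rather than the unpack-trees.c logic.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Assumption: for NUL-free names, strcmp() order matches the index
 * order (memcmp over the shorter length, then shorter name first).
 */
static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static void sort_names(const char **names, size_t nr)
{
	qsort(names, nr, sizeof(*names), cmp_names);
}

/*
 * Hypothetical helper: all names starting with 'path' are contiguous
 * and begin at the lower bound of 'path'. Scan that run, skipping
 * names like "path-" or "path.x", until "path/" is found.
 */
static long find_sparse_dir(const char **names, size_t nr, const char *path)
{
	size_t len = strlen(path);
	size_t lo = 0, hi = nr;

	while (lo < hi) { /* binary search for the lower bound */
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(names[mid], path) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	for (; lo < nr && !strncmp(names[lo], path, len); lo++) {
		if (names[lo][len] == '/' && names[lo][len + 1] == '\0')
			return (long)lo;
	}
	return -1;
}
```

With the t1092 names sorted, the order is "folder1-", "folder1.x", "folder1/", "folder10". The sparse directory entry therefore sits between its prefix-sharing neighbors rather than immediately after where "folder1" would go.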


* [PATCH v8 01/15] sparse-index: skip indexes with unmerged entries
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 02/15] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                                 ` (15 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index affc4048f27..2c695930275 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -116,6 +116,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -152,6 +163,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index d028b73eba1..b8617ceef71 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget
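
The new index scan only needs to know whether any entry has a nonzero stage. The following is a standalone sketch. It assumes the stage occupies bits 12-13 of the flags word, as git's CE_STAGEMASK and CE_STAGESHIFT define it. The struct and function names here are hypothetical.

```c
#include <stddef.h>

/* Assumed layout, mirroring git's ce_flags: stage 0-3 in bits 12-13. */
#define STAGE_MASK  0x3000u
#define STAGE_SHIFT 12

/* Hypothetical, simplified stand-in for struct cache_entry. */
struct entry {
	const char *name;
	unsigned int flags;
};

static unsigned int entry_stage(const struct entry *e)
{
	return (e->flags & STAGE_MASK) >> STAGE_SHIFT;
}

/* Returns 1 if any entry is at stage 1, 2 or 3 (i.e. unmerged). */
static int has_unmerged(const struct entry *entries, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (entry_stage(&entries[i]))
			return 1;
	return 0;
}
```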



* [PATCH v8 02/15] sparse-index: include EXTENDED flag when expanding
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 01/15] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 03/15] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
                                 ` (14 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
flag is not included. The CE_EXTENDED flag would exist if we loaded a
full index from disk with these entries marked with CE_SKIP_WORKTREE, so
we can add the flag here to be consistent. This allows us to directly
compare the flags present in cache entries when testing the sparse-index
feature, but has no significance to its correctness in the user-facing
functionality.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 2c695930275..ef53bd2198b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -213,7 +213,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget
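
The point of this change is that an expanded entry should carry the same flags as one read from disk, where a skip-worktree entry also keeps the extended-flags marker. The following sketches that comparison. It uses hypothetical flag names whose values are assumed to mirror git's CE_EXTENDED (0x4000) and CE_SKIP_WORKTREE (1 << 30).

```c
/*
 * Assumed flag values, modeled on git's in-memory ce_flags: EXT_BIT
 * marks an entry that carries the extended flag word on disk, and
 * SKIP_WORKTREE is one of those extended flags.
 */
#define EXT_BIT        0x4000u
#define SKIP_WORKTREE  (1u << 30)
#define EXTENDED_FLAGS SKIP_WORKTREE /* the only extended flag modeled */

/* Flags an entry would have after being loaded from disk. */
static unsigned int flags_as_loaded(unsigned int flags)
{
	if (flags & EXTENDED_FLAGS)
		flags |= EXT_BIT;
	return flags;
}

/* Flags assigned when expanding a sparse directory, as in the patch. */
static unsigned int flags_when_expanding(unsigned int flags)
{
	return flags | SKIP_WORKTREE | EXT_BIT;
}
```

Without EXT_BIT, an expanded entry would differ from a loaded one only in that marker. That mismatch is the one that made direct flag comparisons in tests unreliable.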



* [PATCH v8 03/15] t1092: replace incorrect 'echo' with 'cat'
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 01/15] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 02/15] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-14  0:02                 ` Bagas Sanjaya
  2021-07-12 17:55               ` [PATCH v8 04/15] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                                 ` (13 subsequent siblings)
  16 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This fixes the test data shape to be as expected, allowing rename
detection to work properly now that the 'larger-content' file actually
has meaningful lines.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index b8617ceef71..87f1014a1c9 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -40,7 +40,7 @@ test_expect_success 'setup' '
 		done &&
 
 		git checkout -b rename-base base &&
-		echo >folder1/larger-content <<-\EOF &&
+		cat >folder1/larger-content <<-\EOF &&
 		matching
 		lines
 		help
-- 
gitgitgadget



* [PATCH v8 04/15] t1092: expand repository data shape
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (2 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 03/15] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 05/15] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                                 ` (12 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

* Add filenames adjacent to a sparse directory entry that sort before
  and after the trailing slash.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 42 ++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 87f1014a1c9..0e71a623619 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,23 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
+		>folder1- &&
+		>folder1.x &&
+		>folder10 &&
+		cp -r deep/deeper1/0 folder1 &&
+		cp -r deep/deeper1/0 folder2 &&
+		echo >>folder1/0/0/0 &&
+		echo >>folder2/0/1 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
@@ -56,11 +69,17 @@ test_expect_success 'setup' '
 		mv folder1/a folder2/b &&
 		mv folder1/larger-content folder2/edited-content &&
 		echo >>folder2/edited-content &&
+		echo >>folder2/0/1 &&
+		echo stuff >>deep/deeper1/a &&
 		git add . &&
 		git commit -m "rename folder1/... to folder2/..." &&
 
 		git checkout -b rename-out-to-in rename-base &&
 		mv folder1/a deep/deeper1/b &&
+		echo more stuff >>deep/deeper1/a &&
+		rm folder2/0/1 &&
+		mkdir folder2/0/1 &&
+		echo >>folder2/0/1/1 &&
 		mv folder1/larger-content deep/deeper1/edited-content &&
 		echo >>deep/deeper1/edited-content &&
 		git add . &&
@@ -68,6 +87,9 @@ test_expect_success 'setup' '
 
 		git checkout -b rename-in-to-out rename-base &&
 		mv deep/deeper1/a folder1/b &&
+		echo >>folder2/0/1 &&
+		rm -rf folder1/0/0 &&
+		echo >>folder1/0/0 &&
 		mv deep/deeper1/larger-content folder1/edited-content &&
 		echo >>folder1/edited-content &&
 		git add . &&
@@ -262,13 +284,29 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
-test_expect_success 'diff with renames' '
+test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- . &&
+		test_all_match git status --porcelain=v2 &&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'diff with directory/file conflicts' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		git -C full-checkout reset --hard &&
+		test_sparse_match git reset --hard &&
+		test_all_match git checkout $branch &&
+		test_all_match git checkout rename-base -- . &&
+		test_all_match git status --porcelain=v2 &&
 		test_all_match git diff --staged --no-renames &&
 		test_all_match git diff --staged --find-renames || return 1
 	done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 215+ messages in thread

* [PATCH v8 05/15] t1092: add tests for status/add and sparse files
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (3 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 04/15] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 06/15] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                                 ` (11 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0e71a623619..2269f44e033 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -254,6 +254,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v8 06/15] unpack-trees: preserve cache_bottom
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (4 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 05/15] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 07/15] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                                 ` (10 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local variable and then restored as
that method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
cache_bottom is already advanced as the cache-tree walk proceeds, so do
not also update it within mark_ce_used().
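The C sketch below is a heavily simplified model of mark_ce_used() with the new early return. The toy_* types and names are hypothetical stand-ins for cache_entry and unpack_trees_options. The patch context shows only the start of the advancing logic, so this sketch assumes that the code skips past consecutive already-unpacked entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins carrying only the fields this sketch needs. */
struct toy_entry {
	bool unpacked;		/* plays the role of CE_UNPACKED */
	bool sparse_dir;	/* plays the role of S_ISSPARSEDIR(ce_mode) */
};

struct toy_opts {
	struct toy_entry **cache;	/* index entries in index order */
	int cache_nr;
	int cache_bottom;		/* first entry not known to be unpacked */
};

static void toy_mark_used(struct toy_entry *ce, struct toy_opts *o)
{
	assert(ce && o);
	ce->unpacked = true;

	/* Sparse directories: leave cache_bottom to the cache-tree walk. */
	if (ce->sparse_dir)
		return;

	if (o->cache_bottom < o->cache_nr && o->cache[o->cache_bottom] == ce) {
		int bottom = o->cache_bottom;

		/* Assumed: advance over every consecutive unpacked entry. */
		while (bottom < o->cache_nr && o->cache[bottom]->unpacked)
			bottom++;
		o->cache_bottom = bottom;
	}
}
```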

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index f88a69f8e71..87c1ed204c8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -600,6 +600,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v8 07/15] unpack-trees: compare sparse directories correctly
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (5 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 06/15] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 08/15] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
                                 ` (9 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, a sparse
directory entry may already match it exactly, so return 0 when the
sparse directory entry is an exact string match. The final check is a
length comparison between the strings.
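The hypothetical sketch below is in the spirit of df_name_compare(), but it is not a copy of it. It shows how the directory flag changes ordering: once one name runs out, a directory behaves as if it were followed by '/', and a file behaves as if it were followed by NUL. It uses bool flags instead of S_IFDIR/S_IFREG modes. The second helper restates the exact-match test in compare_entry(), where a sparse directory name is one byte (the '/') longer than the tree path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical D/F-aware comparison of two length-delimited names. */
static int df_compare_sketch(const char *name1, size_t len1, bool is_dir1,
			     const char *name2, size_t len2, bool is_dir2)
{
	size_t len = len1 < len2 ? len1 : len2;
	unsigned char c1, c2;
	int cmp;

	assert(name1 && name2);
	cmp = memcmp(name1, name2, len);
	if (cmp)
		return cmp;
	if (len1 == len2)
		return 0;

	/* Exactly one name has ended; substitute its implied next byte. */
	c1 = len < len1 ? (unsigned char)name1[len] : (is_dir1 ? '/' : '\0');
	c2 = len < len2 ? (unsigned char)name2[len] : (is_dir2 ? '/' : '\0');

	/* Treat "x" and "x/" as the same name; D/F conflicts are the caller's job. */
	if ((c1 == '/' && !c2) || (c2 == '/' && !c1))
		return 0;
	return c1 - c2;
}

/* Exact match as in compare_entry(): "path/" is one byte longer than "path". */
static bool sparse_dir_exact_match(size_t ce_namelen, size_t tree_path_len)
{
	return ce_namelen == tree_path_len + 1;
}
```

Under these rules, a directory named folder1 sorts after folder1- and folder1.x but before folder10. A file named folder1 sorts before all three.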

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 87c1ed204c8..b113cc750f2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -983,6 +983,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -1005,7 +1006,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1014,6 +1016,16 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode) &&
+	    ce->ce_namelen == traverse_path_len(info, tree_entry_len(n)) + 1)
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v8 08/15] unpack-trees: rename unpack_nondirectories()
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (6 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 07/15] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 09/15] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
                                 ` (8 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In the next change, we will use this method to unpack a sparse directory
entry, so rename it to unpack_single_entry(), since it will no longer
apply only to non-directories. The new name reflects that we will not
recurse into trees in order to resolve the conflicts.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index b113cc750f2..d26386ce8b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -804,7 +804,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 		BUG("We need cache-tree to do this optimization");
 
 	/*
-	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * Do what unpack_callback() and unpack_single_entry() normally
 	 * do. But we walk all paths in an iterative loop instead.
 	 *
 	 * D/F conflicts and higher stage entries are not a concern
@@ -1075,11 +1075,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
  * without actually calling it. If you change the logic here you may need to
  * check and change there as well.
  */
-static int unpack_nondirectories(int n, unsigned long mask,
-				 unsigned long dirmask,
-				 struct cache_entry **src,
-				 const struct name_entry *names,
-				 const struct traverse_info *info)
+static int unpack_single_entry(int n, unsigned long mask,
+			       unsigned long dirmask,
+			       struct cache_entry **src,
+			       const struct name_entry *names,
+			       const struct traverse_info *info)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
@@ -1322,7 +1322,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
-- 
gitgitgadget



* [PATCH v8 09/15] unpack-trees: unpack sparse directory entries
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (7 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 08/15] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 10/15] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                                 ` (7 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.

In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_single_entry() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().
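In the diff below, create_ce_entry() reserves one extra byte (alloc_len = len + 1). It does this so that it can write '/' at name[len] and the terminating NUL at name[len + 1]. The hypothetical helper below illustrates only that name layout in plain C. It does not model cache_entry, make_traverse_path(), or the CE_SKIP_WORKTREE flag that the patch also sets.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: join a parent path and a tree entry name
 * ("deep" + "deeper1" -> "deep/deeper1") and append the trailing '/'
 * that marks a sparse-directory entry. The caller frees the result.
 */
static char *sparse_dir_name(const char *prefix, const char *name)
{
	size_t prefix_len, name_len, len;
	char *buf;

	assert(prefix && name);
	prefix_len = strlen(prefix);
	name_len = strlen(name);

	/* Length of the plain path, without the sparse-directory slash. */
	len = prefix_len + (prefix_len ? 1 : 0) + name_len;

	/* One extra byte for '/' (like alloc_len = len + 1) plus the NUL. */
	buf = malloc(len + 2);
	if (!buf)
		return NULL;

	if (prefix_len) {
		memcpy(buf, prefix, prefix_len);
		buf[prefix_len] = '/';
	}
	memcpy(buf + len - name_len, name, name_len);
	buf[len] = '/';
	buf[len + 1] = '\0';
	return buf;
}
```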

There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path. Use the new helper method
sparse_dir_matches_path() for this. We also need to ignore conflict
markers in the case that the entries correspond to directories and we
already have a sparse directory entry.
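sparse_dir_matches_path() asks whether a sparse-directory name equals info->name + '/' + p->path + '/', or p->path + '/' at the root. The hypothetical function below restates that check over plain strings and explicit lengths. It uses memcmp() where the patch uses strncmp(). The assert() plays the role of the patch's trailing-slash assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical restatement of sparse_dir_matches_path() over plain strings. */
static bool sparse_dir_matches(const char *ce_name, size_t ce_len,
			       const char *prefix, size_t prefix_len,
			       const char *path, size_t path_len)
{
	/* Sparse-directory names always end in '/'. */
	assert(ce_len > 0 && ce_name[ce_len - 1] == '/');

	if (prefix_len)
		/* "prefix" + '/' + "path" + '/' */
		return ce_len == prefix_len + path_len + 2 &&
		       ce_name[prefix_len] == '/' &&
		       !memcmp(ce_name, prefix, prefix_len) &&
		       !memcmp(ce_name + prefix_len + 1, path, path_len);

	/* At the root: "path" + '/' */
	return ce_len == path_len + 1 &&
	       !memcmp(ce_name, path, path_len);
}
```

The length comparison runs first, so each memcmp() only reads bytes that exist in ce_name.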

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 107 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 99 insertions(+), 8 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index d26386ce8b2..62ccd5a0ff6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1052,13 +1052,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	const struct name_entry *n,
 	int stage,
 	struct index_state *istate,
-	int is_transient)
+	int is_transient,
+	int is_sparse_directory)
 {
 	size_t len = traverse_path_len(info, tree_entry_len(n));
+	size_t alloc_len = is_sparse_directory ? len + 1 : len;
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len, NULL) :
-		make_empty_cache_entry(istate, len);
+		make_empty_transient_cache_entry(alloc_len, NULL) :
+		make_empty_cache_entry(istate, alloc_len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
@@ -1067,6 +1069,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	/* len+1 because the cache_entry allocates space for NUL */
 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
 
+	if (is_sparse_directory) {
+		ce->name[len] = '/';
+		ce->name[len + 1] = '\0';
+		ce->ce_namelen++;
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	}
+
 	return ce;
 }
 
@@ -1085,10 +1094,17 @@ static int unpack_single_entry(int n, unsigned long mask,
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
 
-	/* Do we have *only* directories? Nothing to do */
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	/*
+	 * When we have a sparse directory entry for src[0],
+	 * then this isn't necessarily a directory-file conflict.
+	 */
+	if (mask == dirmask && src[0] &&
+	    S_ISSPARSEDIR(src[0]->ce_mode))
+		conflicts = 0;
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
@@ -1118,7 +1134,9 @@ static int unpack_single_entry(int n, unsigned long mask,
 		 * not stored in the index.  otherwise construct the
 		 * cache entry from the index aware logic.
 		 */
-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
+		src[i + o->merge] = create_ce_entry(info, names + i, stage,
+						    &o->result, o->merge,
+						    bit & dirmask);
 	}
 
 	if (o->merge) {
@@ -1222,16 +1240,71 @@ static int find_cache_pos(struct traverse_info *info,
 	return -1;
 }
 
+/*
+ * Given a sparse directory entry 'ce', compare ce->name to
+ * info->name + '/' + p->path + '/' if info->name is non-empty.
+ * Compare ce->name to p->path + '/' otherwise. Note that
+ * ce->name must end in a trailing '/' because it is a sparse
+ * directory entry.
+ */
+static int sparse_dir_matches_path(const struct cache_entry *ce,
+				   struct traverse_info *info,
+				   const struct name_entry *p)
+{
+	assert(S_ISSPARSEDIR(ce->ce_mode));
+	assert(ce->name[ce->ce_namelen - 1] == '/');
+
+	if (info->namelen)
+		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
+		       ce->name[info->namelen] == '/' &&
+		       !strncmp(ce->name, info->name, info->namelen) &&
+		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	return ce->ce_namelen == p->pathlen + 1 &&
+	       !strncmp(ce->name, p->path, p->pathlen);
+}
+
 static struct cache_entry *find_cache_entry(struct traverse_info *info,
 					    const struct name_entry *p)
 {
+	struct cache_entry *ce;
 	int pos = find_cache_pos(info, p->path, p->pathlen);
 	struct unpack_trees_options *o = info->data;
 
 	if (0 <= pos)
 		return o->src_index->cache[pos];
-	else
+
+	/*
+	 * Check for a sparse-directory entry named "path/".
+	 * Due to the input p->path not having a trailing
+	 * slash, the negative 'pos' value overshoots the
+	 * expected position, hence "-2" instead of "-1".
+	 */
+	pos = -pos - 2;
+
+	if (pos < 0 || pos >= o->src_index->cache_nr)
 		return NULL;
+
+	/*
+	 * Due to lexicographic sorting and sparse directory
+	 * entries ending with a trailing slash, our path as a
+	 * sparse directory (e.g. "subdir/") and our path as a
+	 * file (e.g. "subdir") might be separated by other
+	 * paths (e.g. "subdir-").
+	 */
+	while (pos >= 0) {
+		ce = o->src_index->cache[pos];
+
+		if (strncmp(ce->name, p->path, p->pathlen))
+			return NULL;
+
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    sparse_dir_matches_path(ce, info, p))
+			return ce;
+
+		pos--;
+	}
+
+	return NULL;
 }
 
 static void debug_path(struct traverse_info *info)
@@ -1266,6 +1339,21 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Returns true if and only if the given cache_entry is a
+ * sparse-directory entry that matches the given name_entry
+ * from the tree walk at the given traverse_info.
+ */
+static int is_sparse_directory_entry(struct cache_entry *ce,
+				     struct name_entry *name,
+				     struct traverse_info *info)
+{
+	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
+	return sparse_dir_matches_path(ce, info, name);
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1352,9 +1440,12 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
-					     names, info) < 0)
+		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+						    names, info) < 0) {
 			return -1;
+		}
+
 		return mask;
 	}
 
-- 
gitgitgadget



* [PATCH v8 10/15] dir.c: accept a directory as part of cone-mode patterns
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (8 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 09/15] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 11/15] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
                                 ` (6 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern-matching
algorithms are built expecting a file path, not a directory path. This
is especially important for "cone mode" patterns, which match files
that exist within the "parent directories" as well as the recursive
directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
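To make the fake-filename trick concrete, the hypothetical cone_paths() below mimics the string handling that this patch adds to path_matches_pattern_list(). It prefixes the path with '/', appends "-" when the path ends in '/', and computes where the parent directory ends. Only the strings are modeled; the hashmap lookups are not.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF 256

/*
 * Hypothetical model: write "/<path>" into full (plus a fake "-" file
 * when path ends in '/'), write the part before the final separator
 * into parent, and return that separator's position (0 means root).
 */
static size_t cone_paths(const char *path, char full[PATH_BUF],
			 char parent[PATH_BUF])
{
	size_t len, slash_pos;

	assert(path && strlen(path) + 3 <= PATH_BUF);
	snprintf(full, PATH_BUF, "/%s", path);
	len = strlen(full);

	if (full[len - 1] == '/') {
		/* Directory: its own trailing slash separates parent and fake file. */
		slash_pos = len - 1;
		full[len] = '-';
		full[len + 1] = '\0';
	} else {
		const char *slash = strrchr(full, '/');
		slash_pos = slash ? (size_t)(slash - full) : 0;
	}

	memcpy(parent, full, slash_pos);
	parent[slash_pos] = '\0';
	return slash_pos;
}
```

For "deep/deeper1/", the recursive lookup string becomes "/deep/deeper1/-" and the parent becomes "/deep/deeper1". This is the same as for a real file "deep/deeper1/-". For "a", the slash position is 0, which the patch treats as a file in the root.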

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index ebe5ec046e0..0c5264b3b20 100644
--- a/dir.c
+++ b/dir.c
@@ -1376,7 +1376,7 @@ enum pattern_match_result path_matches_pattern_list(
 	struct path_pattern *pattern;
 	struct strbuf parent_pathname = STRBUF_INIT;
 	int result = NOT_MATCHED;
-	const char *slash_pos;
+	size_t slash_pos;
 
 	if (!pl->use_cone_patterns) {
 		pattern = last_matching_pattern_from_list(pathname, pathlen, basename,
@@ -1397,21 +1397,35 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-based matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/') {
+		slash_pos = parent_pathname.len - 1;
+		strbuf_add(&parent_pathname, "-", 1);
+	} else {
+		const char *slash_ptr = strrchr(parent_pathname.buf, '/');
+		slash_pos = slash_ptr ? slash_ptr - parent_pathname.buf : 0;
+	}
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
 		goto done;
 	}
 
-	slash_pos = strrchr(parent_pathname.buf, '/');
-
-	if (slash_pos == parent_pathname.buf) {
+	if (!slash_pos) {
 		/* include every file in root */
 		result = MATCHED;
 		goto done;
 	}
 
-	strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
+	strbuf_setlen(&parent_pathname, slash_pos);
 
 	if (hashmap_contains_path(&pl->parent_hashmap, &parent_pathname)) {
 		result = MATCHED;
-- 
gitgitgadget



* [PATCH v8 11/15] diff-lib: handle index diffs with sparse dirs
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (9 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 10/15] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 12/15] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                                 ` (5 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

Use diff_tree_oid() appropriately to compute the diff.
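The sketch below gives a rough picture of what expanding a sparse directory means for the output. It compares two flattened, path-sorted file lists and reports additions, deletions, and modifications. It is a toy model with hypothetical types, not how diff_tree_oid() walks trees. An empty old list corresponds to the show_new_file() case, and two non-empty lists correspond to the show_modified() case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened tree entry: a path and an object id as text. */
struct toy_file {
	const char *path;
	const char *oid;
};

/* Walk two path-sorted lists in lockstep; write "A/D/M path" lines to out. */
static void toy_tree_diff(const struct toy_file *old_files, size_t old_nr,
			  const struct toy_file *new_files, size_t new_nr,
			  char *out, size_t out_sz)
{
	size_t i = 0, j = 0, used = 0;

	assert(out && out_sz > 0);
	out[0] = '\0';

	while (i < old_nr || j < new_nr) {
		const char *path;
		char status;
		int cmp, n;

		if (i == old_nr)
			cmp = 1;	/* only new entries remain */
		else if (j == new_nr)
			cmp = -1;	/* only old entries remain */
		else
			cmp = strcmp(old_files[i].path, new_files[j].path);

		if (cmp < 0) {
			status = 'D';
			path = old_files[i++].path;
		} else if (cmp > 0) {
			status = 'A';
			path = new_files[j++].path;
		} else {
			int same = !strcmp(old_files[i].oid, new_files[j].oid);

			path = old_files[i].path;
			i++;
			j++;
			if (same)
				continue;	/* unchanged file: nothing to report */
			status = 'M';
		}

		n = snprintf(out + used, out_sz - used, "%c %s\n", status, path);
		if (n < 0 || (size_t)n >= out_sz - used)
			return;		/* output buffer full: stop early */
		used += (size_t)n;
	}
}
```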

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..f9eadc4fc1a 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -347,6 +352,20 @@ static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
+	       S_ISSPARSEDIR(new_entry->ce_mode));
+
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level. If only one was a sparse
+	 * directory, then they appear as an add and delete instead of
+	 * a modification.
+	 */
+	if (S_ISSPARSEDIR(new_entry->ce_mode)) {
+		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
-- 
gitgitgadget



* [PATCH v8 12/15] status: skip sparse-checkout percentage with sparse-index
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (10 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 11/15] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 13/15] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                                 ` (4 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories, because each sparse directory entry
counts as a single entry even though it stands for many files. It would
also be expensive to calculate, as we would need to parse trees to count
the total number of possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not require this message
to match across both modes; instead, we compare only the important
information about staged, modified, and untracked files.
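The C sketch below is a hypothetical model of how the two sentinel values select the status line. The toy_* names are illustrative. The percentage formula is an assumption: it counts the share of entries without the skip-worktree bit. Treating an empty index as "disabled" is also an assumption, made here to avoid dividing by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Sentinel values as added to wt-status.h by this patch. */
#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/* skip[i] says whether index entry i has the skip-worktree bit set. */
static int toy_sparse_percentage(const bool *skip, int nr,
				 bool sparse_checkout, bool sparse_index)
{
	int skipped = 0;

	if (!sparse_checkout || nr == 0)
		return SPARSE_CHECKOUT_DISABLED;

	/* One sparse-directory entry hides an unknown number of files. */
	if (sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX;

	for (int i = 0; i < nr; i++)
		if (skip[i])
			skipped++;
	return 100 - (100 * skipped) / nr;
}

/* Pick the message the way show_sparse_checkout_in_use() does. */
static void toy_sparse_message(int pct, char *buf, size_t sz)
{
	assert(buf && sz > 0);
	if (pct == SPARSE_CHECKOUT_DISABLED)
		buf[0] = '\0';
	else if (pct == SPARSE_CHECKOUT_SPARSE_INDEX)
		snprintf(buf, sz, "You are in a sparse checkout.");
	else
		snprintf(buf, sz,
			 "You are in a sparse checkout with %d%% of tracked files present.",
			 pct);
}
```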

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2269f44e033..375b0d35565 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -218,6 +218,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 42b67357169..96db3e74962 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1493,9 +1493,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1653,6 +1656,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e..ab9cc9d8f03 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


* [PATCH v8 13/15] status: use sparse-index throughout
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (11 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 12/15] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 14/15] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                                 ` (3 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().
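The opt-in is a per-repository flag that a builtin flips before it reads the index. Below is a minimal model of that gate using hypothetical mock types. It assumes, rather than quotes, two behaviors: the flag defaults to 1, and a sparse index is expanded right after loading whenever the flag is set.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for repository settings and index. */
struct mock_settings {
	int command_requires_full_index; /* 1: expand unless the command opts out */
};

struct mock_index {
	int sparse_index; /* 1 if the index contains sparse directories */
	int expansions;   /* how many times it was "expanded" */
};

static void mock_settings_init(struct mock_settings *s)
{
	s->command_requires_full_index = 1; /* assumed default */
}

static void mock_ensure_full_index(struct mock_index *istate)
{
	if (!istate->sparse_index)
		return;
	istate->sparse_index = 0;
	istate->expansions++;
}

/* After loading, expand only if the command has not opted in. */
static void mock_read_index(struct mock_index *istate,
			    const struct mock_settings *s)
{
	assert(istate && s);
	if (s->command_requires_full_index)
		mock_ensure_full_index(istate);
}
```

Setting `command_requires_full_index = 0` before reading, as cmd_status() now does, leaves the sparse index intact.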

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading; the more meaningful comparison is the drop
from 0.34s (full-index-v4) to 0.05s (sparse-index-v4), an improvement of
about 85%. This is more indicative of the performance gains we are
expecting by using a sparse index.
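The last column of the table is a relative change, computed as (after - before) / before. The small helper below reproduces those figures, including the 85% comparison of 0.34s (full-index-v4) against 0.05s (sparse-index-v4) at HEAD. The function name is ours; it is not part of the perf suite.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent (negative = faster). */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For example, percent_change(2.35, 0.04) is about -98.3 and percent_change(0.34, 0.05) is about -85.3.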

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 190d215d43b..12f51db158a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1510,6 +1510,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 0c3ac3cefc0..6a1337cc905 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1585,8 +1585,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1601,6 +1600,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 375b0d35565..751f397cc7f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -511,12 +511,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


* [PATCH v8 14/15] wt-status: expand added sparse directory entries
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (12 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 13/15] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 17:55               ` [PATCH v8 15/15] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
                                 ` (2 subsequent siblings)
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
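Conceptually, expanding an added sparse directory entry means walking the tree it names and reporting every blob below it under its full path. The sketch below uses a hypothetical in-memory `struct mock_tree` in place of Git's tree objects and read_tree_at(). It only illustrates the base/path bookkeeping that add_file_to_list() performs, and it uses a fixed-size path buffer for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 64

/* Hypothetical in-memory tree; Git reads real tree objects instead. */
struct mock_tree {
	const char *name;                 /* one path component */
	const struct mock_tree *children; /* NULL means a file (blob) */
	size_t nr_children;
};

/*
 * Append the full path of every file below 'tree' to 'out', prefixing
 * each with 'base' (for example "folder1/"). Subtrees are recursed into,
 * much as add_file_to_list() returns READ_TREE_RECURSIVE for directories.
 * Returns the new number of collected paths.
 */
static size_t collect_added_files(const struct mock_tree *tree,
				  const char *base,
				  char out[][MAX_PATH_LEN],
				  size_t nr, size_t max)
{
	size_t i;

	for (i = 0; i < tree->nr_children; i++) {
		const struct mock_tree *child = &tree->children[i];
		char path[MAX_PATH_LEN];

		snprintf(path, sizeof(path), "%s%s", base, child->name);
		if (child->children) {
			/* A subtree: extend the base with "name/" and recurse. */
			strncat(path, "/", sizeof(path) - strlen(path) - 1);
			nr = collect_added_files(child, path, out, nr, max);
			continue;
		}
		assert(nr < max);
		snprintf(out[nr++], MAX_PATH_LEN, "%s", path);
	}
	return nr;
}
```

A sparse entry "folder1/" whose tree holds "0/1", "0/2" and "a" thus yields three added paths rather than a single "folder1/" line.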

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 33 +++++++++++++++
 wt-status.c                              | 51 ++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 751f397cc7f..2394c36d881 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -524,4 +524,37 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+
+	# Sparse checkouts do not agree with full checkouts about
+	# how to report a directory/file conflict during a reset.
+	# This command would fail with test_all_match because the
+	# full checkout reports "T folder1/0/1" while a sparse
+	# checkout reports "D folder1/0/1". This matches because
+	# the sparse checkouts skip "adding" the other side of
+	# the conflict.
+	test_sparse_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 96db3e74962..0317baef87e 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -657,6 +657,36 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	clear_pathspec(&rev.prune_data);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	struct strbuf full_name = STRBUF_INIT;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_add(&full_name, base->buf, base->len);
+	strbuf_addstr(&full_name, path);
+	it = string_list_insert(&s->change, full_name.buf);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	strbuf_release(&full_name);
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -671,6 +701,27 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps = { 0 };
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


* [PATCH v8 15/15] fsmonitor: integrate with sparse index
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (13 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 14/15] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-07-12 17:55               ` Derrick Stolee via GitGitGadget
  2021-07-12 19:38               ` [PATCH v8 00/15] Sparse-index: integrate with status Elijah Newren
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
  16 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-12 17:55 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

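If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect, since it records dirty entries by their
position in the index. Ensure that we start fresh after such an event.
Converting a full index into a sparse one gets the same reset.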

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.
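The dirty flags are tied to entry positions, which is why the state must be dropped. Expanding "folder1/" into several file entries shifts every later position, so a flag recorded for position 2 would afterward describe a different path. Below is a hypothetical model of the reset. The patch itself uses Git's own bitmap type and FREE_AND_NULL().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model of the FS Monitor state kept on an index. The dirty
 * flags are indexed by entry position, so they are only meaningful for
 * the exact entry list they were computed against.
 */
struct mock_index {
	size_t cache_nr;
	unsigned char *fsmonitor_dirty; /* one flag per position, or NULL */
	char *fsmonitor_last_update;    /* hook token, or NULL */
	int fsmonitor_has_run_once;
};

/*
 * Replace the entry list (expanding or collapsing sparse directories)
 * and drop the FS Monitor state, mirroring what the patch does in both
 * ensure_full_index() and convert_to_sparse().
 */
static void mock_replace_entries(struct mock_index *istate, size_t new_nr)
{
	assert(istate);
	istate->cache_nr = new_nr;
	istate->fsmonitor_has_run_once = 0;
	free(istate->fsmonitor_dirty);
	istate->fsmonitor_dirty = NULL;
	free(istate->fsmonitor_last_update);
	istate->fsmonitor_last_update = NULL;
}
```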

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 49 +++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index ef53bd2198b..53c8f711ccc 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -186,6 +186,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -282,6 +286,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 637391c6ce4..deea88d4431 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,52 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_region fsm_hook query trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget

* Re: [PATCH v7 15/16] wt-status: expand added sparse directory entries
  2021-07-12 13:56                 ` Derrick Stolee
@ 2021-07-12 19:32                   ` Elijah Newren
  2021-07-12 19:41                     ` Derrick Stolee
  0 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-12 19:32 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Jeff Hostetler,
	Johannes Schindelin, Derrick Stolee, Derrick Stolee

On Mon, Jul 12, 2021 at 6:56 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/8/2021 9:03 PM, Elijah Newren wrote:
> > On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> ...
> >> +test_expect_success 'reset mixed and checkout orphan' '
> >> +       init_repos &&
> >> +
> >> +       test_all_match git checkout rename-out-to-in &&
> >> +
> >> +       # Sparse checkouts do not agree with full checkouts about
> >> +       # how to report a directory/file conflict during a reset.
> >> +       # This command would fail with test_all_match because the
> >> +       # full checkout reports "T folder1/0/1" while a sparse
> >> +       # checkout reports "D folder1/0/1". This matches because
> >> +       # the sparse checkouts skip "adding" the other side of
> >> +       # the conflict.
> >
> > The same issue I highlighted last time is still present.  If you
> > insert an "exit 1" right here, then run
> >     ./t1092-sparse-checkout-compatibility.sh --ver --imm -x
> > until it stops, then
> >     cd t/trash directory.t1092-sparse-checkout-compatibility/sparse-checkout
> >     git ls-files -t | grep folder  # Note the files that are sparse
> >     git reset --mixed HEAD~1
> >     git ls-files -t | grep folder  # Note the files that are sparse -- there are some that aren't that should be
> >     git sparse-checkout reapply
> >     git ls-files -t | grep folder  # Note the files that are sparse
> >
> > Granted, this is a bug with sparse-checkout without sparse-index, so
> > not something new to your series.  But since you are using comparisons
> > between regular sparse-checkouts and sparse-index to verify
> > correctness, this seems problematic to me.
>
> I'll add it to the pile, but I want to continue having this series
> focus on making the sparse-index work quickly without a change in
> behavior from a normal index. Changing the behavior of the sparse-
> checkout feature should be a separate series.

Hmm..perhaps there's some middle ground?  I appreciate that you want
to have this series focus on making the sparse-index work without
worrying about behavioral changes to sparse-checkout.  I'm concerned,
though, that testcases tend to be treated as documentation of intended
behavior, even when the tests are testing something else.  These tests
are clearly triggering buggy behavior, and I think your comments and
subsequent command may be affected by it.  I don't want to leave
future folks (even ourselves) to have to try to explain away why the
behavior expected in this test should not be expected.

Perhaps we can just add a comment that this testcase is triggering a
bug in both sparse-checkout and sparse-index but we're only checking
that the two match, and that once the bug is fixed, the testcase itself
may need tweaking?

* Re: [PATCH v8 00/15] Sparse-index: integrate with status
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (14 preceding siblings ...)
  2021-07-12 17:55               ` [PATCH v8 15/15] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-07-12 19:38               ` Elijah Newren
  2021-07-13 12:57                 ` Derrick Stolee
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
  16 siblings, 1 reply; 215+ messages in thread
From: Elijah Newren @ 2021-07-12 19:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Derrick Stolee

On Mon, Jul 12, 2021 at 10:55 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
...
> Update in V8
> ============
>
>  * The directory/file conflict patch is removed and delayed to the next
>    series where it will be required. (It will also be improved in that
>    series.)
>
>  * Some comments have been improved, including a new assert() that helps
>    document the situation.
>

This one looks really good.  Just two minor comments:

> Range-diff vs v7:
>
...
>   9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>       +  * Check for a sparse-directory entry named "path/".
>       +  * Due to the input p->path not having a trailing
>       +  * slash, the negative 'pos' value overshoots the
>      -+  * expected position by at least one, hence "-2" here.
>      ++  * expected position, hence "-2" instead of "-1".
>       +  */
>       + pos = -pos - 2;
>       +
>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>                 return NULL;
>       +
>       + /*
>      -+  * We might have multiple entries between 'pos' and
>      -+  * the actual sparse-directory entry, so start walking
>      -+  * back until finding it or passing where it would be.
>      ++  * Due to lexicographic sorting and sparse directory
>      ++  * entried ending with a trailing slash, our path as a

s/entried/entries/ ?


>      ++  * sparse directory (e.g "subdir/") and our path as a
>      ++  * file (e.g. "subdir") might be separated by other
>      ++  * paths (e.g. "subdir-").
>       +  */
>       + while (pos >= 0) {
>       +         ce = o->src_index->cache[pos];
...
>  15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
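The ordering argument in the new comment can be checked directly. '-' (0x2D) and '.' (0x2E) both sort before '/' (0x2F), so in byte order the search key "subdir" and the sparse-directory entry "subdir/" can have other paths between them. The sketch assumes the index orders names like plain strcmp() and ignores stages. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compare two paths byte by byte (plain strcmp() order). */
static int cmp_paths(const void *a, const void *b)
{
	const char *const *pa = a;
	const char *const *pb = b;

	return strcmp(*pa, *pb);
}

static void sort_paths(const char **paths, size_t nr)
{
	assert(paths || !nr);
	qsort(paths, nr, sizeof(*paths), cmp_paths);
}

/*
 * Count the names that sort strictly between the file-style name 'path'
 * and the sparse-directory name "path/".
 */
static size_t entries_between(const char **names, size_t nr, const char *path)
{
	char dir[256];
	size_t i, count = 0;

	snprintf(dir, sizeof(dir), "%s/", path);
	for (i = 0; i < nr; i++)
		if (strcmp(path, names[i]) < 0 && strcmp(names[i], dir) < 0)
			count++;
	return count;
}
```

Sorting { "subdir/", "subdir", "subdir-", "subdir.txt", "a" } gives "a", "subdir", "subdir-", "subdir.txt", "subdir/". Two entries separate "subdir" from "subdir/", which is why the lookup has to walk rather than step by one.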

As I commented over at [1], I would appreciate if we could at least
add a comment in the testcase that we know this testcase triggers a
bug for both sparse-index and sparse-checkout...and that fixing it
might affect the other comments and commands within that testcase in
the future...but for now, we're just testing as best we can that the
two give the same behavior.

[1] https://lore.kernel.org/git/CABPp-BGJ+LTubgS=zvGJjk3kgyfW-7UFEa=qg-0mdyrY32j0pQ@mail.gmail.com/


If you agree and include the two fixups above, the entire series is:
Reviewed-by: Elijah Newren <newren@gmail.com>

If you disagree, then all patches other than 9 and 14 can have my
Reviewed-by tag.  :-)


Thanks for all the awesome work!

* Re: [PATCH v7 15/16] wt-status: expand added sparse directory entries
  2021-07-12 19:32                   ` Elijah Newren
@ 2021-07-12 19:41                     ` Derrick Stolee
  0 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee @ 2021-07-12 19:41 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Jeff Hostetler,
	Johannes Schindelin, Derrick Stolee, Derrick Stolee

On 7/12/2021 3:32 PM, Elijah Newren wrote:
> On Mon, Jul 12, 2021 at 6:56 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 7/8/2021 9:03 PM, Elijah Newren wrote:
>>> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>> ...
>>>> +test_expect_success 'reset mixed and checkout orphan' '
>>>> +       init_repos &&
>>>> +
>>>> +       test_all_match git checkout rename-out-to-in &&
>>>> +
>>>> +       # Sparse checkouts do not agree with full checkouts about
>>>> +       # how to report a directory/file conflict during a reset.
>>>> +       # This command would fail with test_all_match because the
>>>> +       # full checkout reports "T folder1/0/1" while a sparse
>>>> +       # checkout reports "D folder1/0/1". This matches because
>>>> +       # the sparse checkouts skip "adding" the other side of
>>>> +       # the conflict.
>>>
>>> The same issue I highlighted last time is still present.  If you
>>> insert an "exit 1" right here, then run
>>>     ./t1092-sparse-checkout-compatibility.sh --ver --imm -x
>>> until it stops, then
>>>     cd t/trash directory.t1092-sparse-checkout-compatibility/sparse-checkout
>>>     git ls-files -t | grep folder  # Note the files that are sparse
>>>     git reset --mixed HEAD~1
>>>     git ls-files -t | grep folder  # Note the files that are sparse -- there are some that aren't that should be
>>>     git sparse-checkout reapply
>>>     git ls-files -t | grep folder  # Note the files that are sparse
>>>
>>> Granted, this is a bug with sparse-checkout without sparse-index, so
>>> not something new to your series.  But since you are using comparisons
>>> between regular sparse-checkouts and sparse-index to verify
>>> correctness, this seems problematic to me.
>>
>> I'll add it to the pile, but I want to continue having this series
>> focus on making the sparse-index work quickly without a change in
>> behavior from a normal index. Changing the behavior of the sparse-
>> checkout feature should be a separate series.
> 
> Hmm..perhaps there's some middle ground?  I appreciate that you want
> to have this series focus on making the sparse-index work without
> worrying about behavioral changes to sparse-checkout.  I'm concerned,
> though, that testcases tend to be treated as documentation of intended
> behavior, even when the tests are testing something else.  These tests
> are clearly triggering buggy behavior, and I think your comments and
> subsequent command may be affected by it.  I don't want to leave
> future folks (even ourselves) to have to try to explain away why the
> behavior expected in this test should not be expected.
> 
> Perhaps we can just add a comment that this testcase is triggering a
> bug in both sparse-checkout and sparse-index but we're only checking
> that the two match, and that once the bug is fixed, the testcase itself
> may need tweaking?
 
I can get behind that approach: document the bug, but comment that it
_is_ a bug and should be changed in the future.

Thanks,
-Stolee

* Re: [PATCH v8 00/15] Sparse-index: integrate with status
  2021-07-12 19:38               ` [PATCH v8 00/15] Sparse-index: integrate with status Elijah Newren
@ 2021-07-13 12:57                 ` Derrick Stolee
  2021-07-13 17:37                   ` Elijah Newren
  0 siblings, 1 reply; 215+ messages in thread
From: Derrick Stolee @ 2021-07-13 12:57 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Jeff Hostetler, Johannes Schindelin, Derrick Stolee

On 7/12/2021 3:38 PM, Elijah Newren wrote:
> On Mon, Jul 12, 2021 at 10:55 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>   9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
>>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>>       +  * Check for a sparse-directory entry named "path/".
>>       +  * Due to the input p->path not having a trailing
>>       +  * slash, the negative 'pos' value overshoots the
>>      -+  * expected position by at least one, hence "-2" here.
>>      ++  * expected position, hence "-2" instead of "-1".
>>       +  */
>>       + pos = -pos - 2;
>>       +
>>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>>                 return NULL;
>>       +
>>       + /*
>>      -+  * We might have multiple entries between 'pos' and
>>      -+  * the actual sparse-directory entry, so start walking
>>      -+  * back until finding it or passing where it would be.
>>      ++  * Due to lexicographic sorting and sparse directory
>>      ++  * entried ending with a trailing slash, our path as a
> 
> s/entried/entries/ ?

Oops! Yes, that would be a valuable fixup. Thanks for catching it.

> 
>>      ++  * sparse directory (e.g "subdir/") and our path as a
>>      ++  * file (e.g. "subdir") might be separated by other
>>      ++  * paths (e.g. "subdir-").
>>       +  */
>>       + while (pos >= 0) {
>>       +         ce = o->src_index->cache[pos];
> ...
>>  15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
> 
> As I commented over at [1], I would appreciate if we could at least
> add a comment in the testcase that we know this testcase triggers a
> bug for both sparse-index and sparse-checkout...and that fixing it
> might affect the other comments and commands within that testcase in
> the future...but for now, we're just testing as best we can that the
> two give the same behavior.
> 
> [1] https://lore.kernel.org/git/CABPp-BGJ+LTubgS=zvGJjk3kgyfW-7UFEa=qg-0mdyrY32j0pQ@mail.gmail.com/

How do you feel about a new patch that focuses on adding these
comments, including an older test that had a similar documentation
of the behavior change? A patch that could be queued on top of
this series is pasted below the cutline.

Thanks,
-Stolee


-- >8 --

From 8e69def90f5844c117cc1e9efd673c92b85c9238 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 13 Jul 2021 08:50:24 -0400
Subject: [PATCH 16/15] t1092: document bad sparse-checkout behavior

There are several situations where a repository with sparse-checkout
enabled will act differently than a normal repository, and in ways that
are not intentional. The test t1092-sparse-checkout-compatibility.sh
documents some of these deviations, but a casual reader might think
these are intentional behavior changes.

Add comments on these tests that make it clear that these behaviors
should be updated. Using 'NEEDSWORK' helps contributors find that these
are potential areas for improvement.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2394c36d881..cabbd42e339 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -392,8 +392,8 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# TODO: reset currently does not behave as expected when in a
-# sparse-checkout.
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_failure 'checkout and reset (mixed)' '
 	init_repos &&
 
@@ -403,8 +403,8 @@ test_expect_failure 'checkout and reset (mixed)' '
 	test_all_match git reset update-folder2
 '
 
-# Ensure that sparse-index behaves identically to
-# sparse-checkout with a full index.
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_success 'checkout and reset (mixed) [sparse]' '
 	init_repos &&
 
@@ -524,6 +524,8 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
 	init_repos &&
 
-- 
2.32.0.15.g108d1b86e21





* Re: [PATCH v8 00/15] Sparse-index: integrate with status
  2021-07-13 12:57                 ` Derrick Stolee
@ 2021-07-13 17:37                   ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-07-13 17:37 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Jeff Hostetler,
	Johannes Schindelin, Derrick Stolee

On Tue, Jul 13, 2021 at 5:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/12/2021 3:38 PM, Elijah Newren wrote:
> > On Mon, Jul 12, 2021 at 10:55 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>   9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
> >>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
> >>       +  * Check for a sparse-directory entry named "path/".
> >>       +  * Due to the input p->path not having a trailing
> >>       +  * slash, the negative 'pos' value overshoots the
> >>      -+  * expected position by at least one, hence "-2" here.
> >>      ++  * expected position, hence "-2" instead of "-1".
> >>       +  */
> >>       + pos = -pos - 2;
> >>       +
> >>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
> >>                 return NULL;
> >>       +
> >>       + /*
> >>      -+  * We might have multiple entries between 'pos' and
> >>      -+  * the actual sparse-directory entry, so start walking
> >>      -+  * back until finding it or passing where it would be.
> >>      ++  * Due to lexicographic sorting and sparse directory
> >>      ++  * entried ending with a trailing slash, our path as a
> >
> > s/entried/entries/ ?
>
> Oops! Yes, that would be a valuable fixup. Thanks for catching it.
>
> >
> >>      ++  * sparse directory (e.g "subdir/") and our path as a
> >>      ++  * file (e.g. "subdir") might be separated by other
> >>      ++  * paths (e.g. "subdir-").
> >>       +  */
> >>       + while (pos >= 0) {
> >>       +         ce = o->src_index->cache[pos];
> > ...
> >>  15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
> >
> > As I commented over at [1], I would appreciate if we could at least
> > add a comment in the testcase that we know this testcase triggers a
> > bug for both sparse-index and sparse-checkout...and that fixing it
> > might affect the other comments and commands within that testcase in
> > the future...but for now, we're just testing as best we can that the
> > two give the same behavior.
> >
> > [1] https://lore.kernel.org/git/CABPp-BGJ+LTubgS=zvGJjk3kgyfW-7UFEa=qg-0mdyrY32j0pQ@mail.gmail.com/
>
> How do you feel about a new patch that focuses on adding these
> comments, including an older test that had a similar documentation
> of the behavior change? A patch that could be queued on top of
> this series is pasted below the cutline.

Looks good to me.  Re-roll with my Reviewed-by and let's get this
series merged down to next.  :-)

>
> Thanks,
> -Stolee
>
>
> -- >8 --
>
> From 8e69def90f5844c117cc1e9efd673c92b85c9238 Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Tue, 13 Jul 2021 08:50:24 -0400
> Subject: [PATCH 16/15] t1092: document bad sparse-checkout behavior
>
> There are several situations where a repository with sparse-checkout
> enabled will act differently than a normal repository, and in ways that
> are not intentional. The test t1092-sparse-checkout-compatibility.sh
> documents some of these deviations, but a casual reader might think
> these are intentional behavior changes.
>
> Add comments on these tests that make it clear that these behaviors
> should be updated. Using 'NEEDSWORK' helps contributors find that these
> are potential areas for improvement.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 2394c36d881..cabbd42e339 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -392,8 +392,8 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
>         test_all_match git blame deep/deeper2/deepest/a
>  '
>
> -# TODO: reset currently does not behave as expected when in a
> -# sparse-checkout.
> +# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> +# in this scenario, but it shouldn't.
>  test_expect_failure 'checkout and reset (mixed)' '
>         init_repos &&
>
> @@ -403,8 +403,8 @@ test_expect_failure 'checkout and reset (mixed)' '
>         test_all_match git reset update-folder2
>  '
>
> -# Ensure that sparse-index behaves identically to
> -# sparse-checkout with a full index.
> +# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> +# in this scenario, but it shouldn't.
>  test_expect_success 'checkout and reset (mixed) [sparse]' '
>         init_repos &&
>
> @@ -524,6 +524,8 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> +# in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '
>         init_repos &&
>
> --


* Re: [PATCH v8 03/15] t1092: replace incorrect 'echo' with 'cat'
  2021-07-12 17:55               ` [PATCH v8 03/15] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-07-14  0:02                 ` Bagas Sanjaya
  0 siblings, 0 replies; 215+ messages in thread
From: Bagas Sanjaya @ 2021-07-14  0:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Derrick Stolee, Derrick Stolee

On 13/07/21 00.55, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> This fixes the test data shape to be as expected, allowing rename
> detection to work properly now that the 'larger-content' file actually
> has meaningful lines.
> 

What's the connection with s/echo/cat/ ?

> @@ -40,7 +40,7 @@ test_expect_success 'setup' '
>   		done &&
>   
>   		git checkout -b rename-base base &&
> -		echo >folder1/larger-content <<-\EOF &&
> +		cat >folder1/larger-content <<-\EOF &&
>   		matching
>   		lines
>   		help
> 

OK, because echo does not read its standard input: the here-document was
silently discarded and the file got just an empty line. To write a
multi-line file from a here-document, one must use cat.

-- 
An old man doll... just what I always wanted! - Clara


* [PATCH v9 00/16] Sparse-index: integrate with status
  2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
                                 ` (15 preceding siblings ...)
  2021-07-12 19:38               ` [PATCH v8 00/15] Sparse-index: integrate with status Elijah Newren
@ 2021-07-14 13:12               ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                                   ` (17 more replies)
  16 siblings, 18 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Update in V9
============

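 * Fixed a typo ("entried" -> "entries") in a comment in unpack-trees.c.

 * Added patch 16, which marks known sparse-checkout deviations in t1092
   with NEEDSWORK comments.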

 * All patches are marked as Reviewed-by Elijah. Thanks for the careful
   review!

Derrick Stolee (16):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: rename unpack_nondirectories()
  unpack-trees: unpack sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index
  t1092: document bad sparse-checkout behavior

 builtin/commit.c                         |   3 +
 diff-lib.c                               |  19 +++
 dir.c                                    |  24 +++-
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 168 +++++++++++++++++++++--
 t/t7519-status-fsmonitor.sh              |  49 +++++++
 unpack-trees.c                           | 142 ++++++++++++++++---
 wt-status.c                              |  65 ++++++++-
 wt-status.h                              |   1 +
 10 files changed, 470 insertions(+), 38 deletions(-)


base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v9
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v9
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v8:

  1:  1815c148e8c !  1:  ecab56fd57f sparse-index: skip indexes with unmerged entries
     @@ Commit message
          conversion does not need to happen. Thus, this can be deferred until the
          merge machinery is made to integrate with the sparse-index.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## sparse-index.c ##
  2:  7bcde075d8d !  2:  f3de9ce7baa sparse-index: include EXTENDED flag when expanding
     @@ Commit message
          feature, but has no significance to its correctness in the user-facing
          functionality.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## sparse-index.c ##
  3:  05981e30b97 !  3:  5b59436011c t1092: replace incorrect 'echo' with 'cat'
     @@ Commit message
          detection to work properly now that the 'larger-content' file actually
          has meaningful lines.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
  4:  d38b66e9ee4 !  4:  4d06d972911 t1092: expand repository data shape
     @@ Commit message
          Later tests will take advantage of these shapes, but they also deepen
          the tests that already exist.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
  5:  95ddd3abe4e !  5:  cc83a0cd307 t1092: add tests for status/add and sparse files
     @@ Commit message
          be more sensible, and this test could be modified to satisfy the new
          expected behavior.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
  6:  b182b456613 !  6:  18cb1f6ea9b unpack-trees: preserve cache_bottom
     @@ Commit message
          the cache_bottom will be modified as the cache-tree walk advances. Do
          not update it as well within mark_ce_used().
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## unpack-trees.c ##
  7:  988ddce4d45 !  7:  bdd47bd30e0 unpack-trees: compare sparse directories correctly
     @@ Commit message
          exact string match on a sparse directory entry. The final check is a
          length comparison between the strings.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## unpack-trees.c ##
  8:  d67ad048b08 !  8:  0e4b7390f47 unpack-trees: rename unpack_nondirectories()
     @@ Commit message
          apply. The new name reflects that we will not recurse into trees in
          order to resolve the conflicts.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## unpack-trees.c ##
  9:  c0b0b58584c !  9:  602525f5d71 unpack-trees: unpack sparse directory entries
     @@ Commit message
          markers in the case that the entries correspond to directories and we
          already have a sparse directory entry.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## unpack-trees.c ##
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
      +
      +	/*
      +	 * Due to lexicographic sorting and sparse directory
     -+	 * entried ending with a trailing slash, our path as a
     ++	 * entries ending with a trailing slash, our path as a
      +	 * sparse directory (e.g "subdir/") and	our path as a
      +	 * file (e.g. "subdir") might be separated by other
      +	 * paths (e.g. "subdir-").
 10:  76c7528f78f ! 10:  b051c0847a5 dir.c: accept a directory as part of cone-mode patterns
     @@ Commit message
          assuming we are in cone mode. Since sparse index requires cone mode
          patterns, this is an acceptable assumption.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## dir.c ##
 11:  d875a7f8585 ! 11:  e749fd41cda diff-lib: handle index diffs with sparse dirs
     @@ Commit message
      
          Use diff_tree_oid() appropriately to compute the diff.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## diff-lib.c ##
 12:  2b72cc2d985 ! 12:  7f782e3fe50 status: skip sparse-checkout percentage with sparse-index
     @@ Commit message
          this message is equal across both modes, but instead just the important
          information about staged, modified, and untracked files are compared.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
 13:  1c1feef3733 ! 13:  ad1715e3319 status: use sparse-index throughout
     @@ Commit message
          other way. Correct integration with FS Monitor will be validated in
          later changes.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/commit.c ##
 14:  dada1b91bdc ! 14:  aa2258be302 wt-status: expand added sparse directory entries
     @@ Commit message
          Somehow, this loop over the cache entries was not guarded by
          ensure_full_index() as intended.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
 15:  bdc771cf373 ! 15:  1d4b1f8aea0 fsmonitor: integrate with sparse index
     @@ Commit message
          to a tracked directory outside of the sparse cone will trigger
          ensure_full_index().
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## sparse-index.c ##
  -:  ----------- > 16:  45861118991 t1092: document bad sparse-checkout behavior

-- 
gitgitgadget


* [PATCH v9 01/16] sparse-index: skip indexes with unmerged entries
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 02/16] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index affc4048f27..2c695930275 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -116,6 +116,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -152,6 +163,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index d028b73eba1..b8617ceef71 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v9 02/16] sparse-index: include EXTENDED flag when expanding
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 03/16] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
                                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
flag is not included. The CE_EXTENDED flag would exist if we loaded a
full index from disk with these entries marked with CE_SKIP_WORKTREE, so
we can add the flag here to be consistent. This allows us to directly
compare the flags present in cache entries when testing the sparse-index
feature, but has no significance to its correctness in the user-facing
functionality.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 2c695930275..ef53bd2198b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -213,7 +213,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget



* [PATCH v9 03/16] t1092: replace incorrect 'echo' with 'cat'
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 02/16] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 04/16] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This fixes the test data shape to be as expected, allowing rename
detection to work properly now that the 'larger-content' file actually
has meaningful lines.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index b8617ceef71..87f1014a1c9 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -40,7 +40,7 @@ test_expect_success 'setup' '
 		done &&
 
 		git checkout -b rename-base base &&
-		echo >folder1/larger-content <<-\EOF &&
+		cat >folder1/larger-content <<-\EOF &&
 		matching
 		lines
 		help
-- 
gitgitgadget



* [PATCH v9 04/16] t1092: expand repository data shape
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (2 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 03/16] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 05/16] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and that are the first entry of a directory with multiple
  entries.

* Add filenames adjacent to a sparse directory entry that sort before
  and after the trailing slash.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 42 ++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 87f1014a1c9..0e71a623619 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,23 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
+		>folder1- &&
+		>folder1.x &&
+		>folder10 &&
+		cp -r deep/deeper1/0 folder1 &&
+		cp -r deep/deeper1/0 folder2 &&
+		echo >>folder1/0/0/0 &&
+		echo >>folder2/0/1 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
@@ -56,11 +69,17 @@ test_expect_success 'setup' '
 		mv folder1/a folder2/b &&
 		mv folder1/larger-content folder2/edited-content &&
 		echo >>folder2/edited-content &&
+		echo >>folder2/0/1 &&
+		echo stuff >>deep/deeper1/a &&
 		git add . &&
 		git commit -m "rename folder1/... to folder2/..." &&
 
 		git checkout -b rename-out-to-in rename-base &&
 		mv folder1/a deep/deeper1/b &&
+		echo more stuff >>deep/deeper1/a &&
+		rm folder2/0/1 &&
+		mkdir folder2/0/1 &&
+		echo >>folder2/0/1/1 &&
 		mv folder1/larger-content deep/deeper1/edited-content &&
 		echo >>deep/deeper1/edited-content &&
 		git add . &&
@@ -68,6 +87,9 @@ test_expect_success 'setup' '
 
 		git checkout -b rename-in-to-out rename-base &&
 		mv deep/deeper1/a folder1/b &&
+		echo >>folder2/0/1 &&
+		rm -rf folder1/0/0 &&
+		echo >>folder1/0/0 &&
 		mv deep/deeper1/larger-content folder1/edited-content &&
 		echo >>folder1/edited-content &&
 		git add . &&
@@ -262,13 +284,29 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
-test_expect_success 'diff with renames' '
+test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- . &&
+		test_all_match git status --porcelain=v2 &&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'diff with directory/file conflicts' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		git -C full-checkout reset --hard &&
+		test_sparse_match git reset --hard &&
+		test_all_match git checkout $branch &&
+		test_all_match git checkout rename-base -- . &&
+		test_all_match git status --porcelain=v2 &&
 		test_all_match git diff --staged --no-renames &&
 		test_all_match git diff --staged --find-renames || return 1
 	done
-- 
gitgitgadget



* [PATCH v9 05/16] t1092: add tests for status/add and sparse files
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (3 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 04/16] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 06/16] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0e71a623619..2269f44e033 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -254,6 +254,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v9 06/16] unpack-trees: preserve cache_bottom
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (4 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 05/16] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 07/16] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index f88a69f8e71..87c1ed204c8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -600,6 +600,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v9 07/16] unpack-trees: compare sparse directories correctly
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (5 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 06/16] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 08/16] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
                                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, that
check may already be an exact match. Thus, we should return 0 if we
have an exact string match on a sparse directory entry. The final check
is a length comparison between the strings.
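
To illustrate why the mode matters for ordering, here is a small,
self-contained sketch of a directory-aware name comparison. It is a
hypothetical helper (dir_aware_compare() is not a git function) that
follows the general rule of git's tree ordering: a directory compares
as if its name ended in '/'. It does not reproduce git's
df_name_compare(), which has additional handling for directory/file
conflicts.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two names, treating a directory as if
 * its name were followed by '/' and a file as if followed by NUL.
 */
static int dir_aware_compare(const char *a, size_t alen, int a_is_dir,
			     const char *b, size_t blen, int b_is_dir)
{
	size_t len = alen < blen ? alen : blen;
	int cmp = memcmp(a, b, len);
	unsigned char ca, cb;

	if (cmp)
		return cmp;

	/* Past the shared prefix: next char, else '/' for a dir, NUL for a file. */
	ca = len < alen ? (unsigned char)a[len] : (a_is_dir ? '/' : '\0');
	cb = len < blen ? (unsigned char)b[len] : (b_is_dir ? '/' : '\0');
	return ca - cb;
}
```

With this rule, a directory "subdir" sorts after a file "subdir-"
because '/' (0x2F) is greater than '-' (0x2D), whereas as two plain
files the shorter name "subdir" sorts first. Passing S_IFDIR instead of
S_IFREG for a sparse directory entry accounts for the same kind of
ordering difference.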

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 87c1ed204c8..b113cc750f2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -983,6 +983,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -1005,7 +1006,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1014,6 +1016,16 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode) &&
+	    ce->ce_namelen == traverse_path_len(info, tree_entry_len(n)) + 1)
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v9 08/16] unpack-trees: rename unpack_nondirectories()
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (6 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 07/16] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
                                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In the next change, we will use this method to unpack a sparse directory
entry, so rename it to unpack_single_entry() to reflect that it applies
to these entries as well. The new name also reflects that we will not
recurse into trees in order to resolve the conflicts.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index b113cc750f2..d26386ce8b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -804,7 +804,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 		BUG("We need cache-tree to do this optimization");
 
 	/*
-	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * Do what unpack_callback() and unpack_single_entry() normally
 	 * do. But we walk all paths in an iterative loop instead.
 	 *
 	 * D/F conflicts and higher stage entries are not a concern
@@ -1075,11 +1075,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
  * without actually calling it. If you change the logic here you may need to
  * check and change there as well.
  */
-static int unpack_nondirectories(int n, unsigned long mask,
-				 unsigned long dirmask,
-				 struct cache_entry **src,
-				 const struct name_entry *names,
-				 const struct traverse_info *info)
+static int unpack_single_entry(int n, unsigned long mask,
+			       unsigned long dirmask,
+			       struct cache_entry **src,
+			       const struct name_entry *names,
+			       const struct traverse_info *info)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
@@ -1322,7 +1322,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
-- 
gitgitgadget



* [PATCH v9 09/16] unpack-trees: unpack sparse directory entries
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (7 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 08/16] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 10/16] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.

In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_single_entry() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().

There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path. Use the new helper method
sparse_dir_matches_path() for this. We also need to ignore conflict
markers in the case that the entries correspond to directories and we
already have a sparse directory entry.
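
The lookup problem described above can be modeled with a sorted array
of strings. The sketch below is hypothetical: lower_bound() and
find_sparse_dir() are illustrative names. Unlike the patch, which scans
backward from the position computed by find_cache_pos(), this sketch
scans forward from a binary-search lower bound. Both approaches rely on
the same property: every entry that starts with the path, including
"path-..." entries and the sparse directory "path/", is contiguous in
sorted order.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first name not less than key in a strcmp-sorted array. */
static size_t lower_bound(const char **names, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Hypothetical lookup: return the index of "path/" or -1. Entries such
 * as "path-x" may sit between "path" and "path/", so skip over every
 * entry that shares the prefix until "path/" is found.
 */
static long find_sparse_dir(const char **names, size_t nr, const char *path)
{
	size_t len = strlen(path);
	size_t pos;

	for (pos = lower_bound(names, nr, path); pos < nr; pos++) {
		const char *name = names[pos];

		if (strncmp(name, path, len))
			return -1; /* left the block of entries sharing the prefix */
		if (name[len] == '/' && name[len + 1] == '\0')
			return (long)pos;
	}
	return -1;
}
```

For the sorted names { "a", "subdir-", "subdir-x", "subdir/", "z" },
find_sparse_dir(names, 5, "subdir") skips the two "subdir-" entries and
returns 3, the index of "subdir/".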

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 107 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 99 insertions(+), 8 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index d26386ce8b2..0a5135ab397 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1052,13 +1052,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	const struct name_entry *n,
 	int stage,
 	struct index_state *istate,
-	int is_transient)
+	int is_transient,
+	int is_sparse_directory)
 {
 	size_t len = traverse_path_len(info, tree_entry_len(n));
+	size_t alloc_len = is_sparse_directory ? len + 1 : len;
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len, NULL) :
-		make_empty_cache_entry(istate, len);
+		make_empty_transient_cache_entry(alloc_len, NULL) :
+		make_empty_cache_entry(istate, alloc_len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
@@ -1067,6 +1069,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	/* len+1 because the cache_entry allocates space for NUL */
 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
 
+	if (is_sparse_directory) {
+		ce->name[len] = '/';
+		ce->name[len + 1] = '\0';
+		ce->ce_namelen++;
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	}
+
 	return ce;
 }
 
@@ -1085,10 +1094,17 @@ static int unpack_single_entry(int n, unsigned long mask,
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
 
-	/* Do we have *only* directories? Nothing to do */
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	/*
+	 * When we have a sparse directory entry for src[0],
+	 * then this isn't necessarily a directory-file conflict.
+	 */
+	if (mask == dirmask && src[0] &&
+	    S_ISSPARSEDIR(src[0]->ce_mode))
+		conflicts = 0;
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
@@ -1118,7 +1134,9 @@ static int unpack_single_entry(int n, unsigned long mask,
 		 * not stored in the index.  otherwise construct the
 		 * cache entry from the index aware logic.
 		 */
-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
+		src[i + o->merge] = create_ce_entry(info, names + i, stage,
+						    &o->result, o->merge,
+						    bit & dirmask);
 	}
 
 	if (o->merge) {
@@ -1222,16 +1240,71 @@ static int find_cache_pos(struct traverse_info *info,
 	return -1;
 }
 
+/*
+ * Given a sparse directory entry 'ce', compare ce->name to
+ * info->name + '/' + p->path + '/' if info->name is non-empty.
+ * Compare ce->name to p->path + '/' otherwise. Note that
+ * ce->name must end in a trailing '/' because it is a sparse
+ * directory entry.
+ */
+static int sparse_dir_matches_path(const struct cache_entry *ce,
+				   struct traverse_info *info,
+				   const struct name_entry *p)
+{
+	assert(S_ISSPARSEDIR(ce->ce_mode));
+	assert(ce->name[ce->ce_namelen - 1] == '/');
+
+	if (info->namelen)
+		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
+		       ce->name[info->namelen] == '/' &&
+		       !strncmp(ce->name, info->name, info->namelen) &&
+		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	return ce->ce_namelen == p->pathlen + 1 &&
+	       !strncmp(ce->name, p->path, p->pathlen);
+}
+
 static struct cache_entry *find_cache_entry(struct traverse_info *info,
 					    const struct name_entry *p)
 {
+	struct cache_entry *ce;
 	int pos = find_cache_pos(info, p->path, p->pathlen);
 	struct unpack_trees_options *o = info->data;
 
 	if (0 <= pos)
 		return o->src_index->cache[pos];
-	else
+
+	/*
+	 * Check for a sparse-directory entry named "path/".
+	 * Due to the input p->path not having a trailing
+	 * slash, the negative 'pos' value overshoots the
+	 * expected position, hence "-2" instead of "-1".
+	 */
+	pos = -pos - 2;
+
+	if (pos < 0 || pos >= o->src_index->cache_nr)
 		return NULL;
+
+	/*
+	 * Due to lexicographic sorting and sparse directory
+	 * entries ending with a trailing slash, our path as a
+	 * sparse directory (e.g. "subdir/") and our path as a
+	 * file (e.g. "subdir") might be separated by other
+	 * paths (e.g. "subdir-").
+	 */
+	while (pos >= 0) {
+		ce = o->src_index->cache[pos];
+
+		if (strncmp(ce->name, p->path, p->pathlen))
+			return NULL;
+
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    sparse_dir_matches_path(ce, info, p))
+			return ce;
+
+		pos--;
+	}
+
+	return NULL;
 }
 
 static void debug_path(struct traverse_info *info)
@@ -1266,6 +1339,21 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Returns true if and only if the given cache_entry is a
+ * sparse-directory entry that matches the given name_entry
+ * from the tree walk at the given traverse_info.
+ */
+static int is_sparse_directory_entry(struct cache_entry *ce,
+				     struct name_entry *name,
+				     struct traverse_info *info)
+{
+	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
+	return sparse_dir_matches_path(ce, info, name);
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1352,9 +1440,12 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
-					     names, info) < 0)
+		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+						    names, info) < 0) {
 			return -1;
+		}
+
 		return mask;
 	}
 
-- 
gitgitgadget



* [PATCH v9 10/16] dir.c: accept a directory as part of cone-mode patterns
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (8 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 11/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
                                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
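
The fake-filename trick can be sketched in isolation. The example below
is a simplified, hypothetical model of cone-mode matching: plain string
arrays stand in for the recursive and parent hashmaps, and cone_match()
is not a git function. Following the patch, a path ending in '/' gets a
"-" appended so that the file-based logic (recursive ancestor check,
root files, then the parent-directory check) applies unchanged.

```c
#include <stdio.h>
#include <string.h>

enum match_result { NOT_MATCHED, MATCHED, MATCHED_RECURSIVE };

/* Linear lookup standing in for the hashmaps: is buf[0..len) in set? */
static int in_set(const char **set, size_t nr, const char *buf, size_t len)
{
	for (size_t i = 0; i < nr; i++)
		if (strlen(set[i]) == len && !strncmp(set[i], buf, len))
			return 1;
	return 0;
}

static enum match_result cone_match(const char *path,
				    const char **recursive, size_t nr_rec,
				    const char **parents, size_t nr_par)
{
	char buf[4096];
	size_t len, pos;
	int n = snprintf(buf, sizeof(buf), "/%s", path);

	if (n < 0 || (size_t)n + 2 > sizeof(buf))
		return NOT_MATCHED; /* too long for this sketch */
	len = (size_t)n;

	/* A directory "dir/" becomes "/dir/-", a fake file inside it. */
	if (buf[len - 1] == '/') {
		buf[len++] = '-';
		buf[len] = '\0';
	}

	/* Any ancestor directory in the recursive set matches recursively. */
	for (pos = len - 1; pos > 0; pos--)
		if (buf[pos] == '/' && in_set(recursive, nr_rec, buf, pos))
			return MATCHED_RECURSIVE;

	/* Files directly in the root are always included. */
	pos = (size_t)(strrchr(buf, '/') - buf);
	if (pos == 0)
		return MATCHED;

	/* Files directly inside a "parent" directory are included. */
	return in_set(parents, nr_par, buf, pos) ? MATCHED : NOT_MATCHED;
}
```

With the recursive set { "/deep/deeper1" } and the parent set
{ "/deep" } (roughly what cone mode derives for deep/deeper1), the
directory "deep/" is MATCHED because files directly inside it are
included, "deep/deeper1/" is MATCHED_RECURSIVE, and "folder1/" is
NOT_MATCHED.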

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index ebe5ec046e0..0c5264b3b20 100644
--- a/dir.c
+++ b/dir.c
@@ -1376,7 +1376,7 @@ enum pattern_match_result path_matches_pattern_list(
 	struct path_pattern *pattern;
 	struct strbuf parent_pathname = STRBUF_INIT;
 	int result = NOT_MATCHED;
-	const char *slash_pos;
+	size_t slash_pos;
 
 	if (!pl->use_cone_patterns) {
 		pattern = last_matching_pattern_from_list(pathname, pathlen, basename,
@@ -1397,21 +1397,35 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/') {
+		slash_pos = parent_pathname.len - 1;
+		strbuf_add(&parent_pathname, "-", 1);
+	} else {
+		const char *slash_ptr = strrchr(parent_pathname.buf, '/');
+		slash_pos = slash_ptr ? slash_ptr - parent_pathname.buf : 0;
+	}
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
 		goto done;
 	}
 
-	slash_pos = strrchr(parent_pathname.buf, '/');
-
-	if (slash_pos == parent_pathname.buf) {
+	if (!slash_pos) {
 		/* include every file in root */
 		result = MATCHED;
 		goto done;
 	}
 
-	strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
+	strbuf_setlen(&parent_pathname, slash_pos);
 
 	if (hashmap_contains_path(&pl->parent_hashmap, &parent_pathname)) {
 		result = MATCHED;
-- 
gitgitgadget



* [PATCH v9 11/16] diff-lib: handle index diffs with sparse dirs
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (9 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 10/16] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 12/16] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

Use diff_tree_oid() appropriately to compute the diff.
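
Expanding a sparse directory to file-level changes amounts to diffing
two trees path by path. The sketch below is a toy merge-walk over two
sorted lists of (path, object id) pairs. It is not diff_tree_oid(),
which walks real tree objects recursively, and struct toy_entry and
toy_tree_diff() are hypothetical. Passing an empty "old" list
corresponds to the show_new_file() case, where every path under the new
sparse directory is reported as added.

```c
#include <stdio.h>
#include <string.h>

struct toy_entry {
	const char *path; /* full path, sorted with strcmp() */
	const char *oid;  /* stand-in for an object id */
};

struct diff_counts {
	int added, deleted, modified;
};

/* Walk both sorted lists in step, printing A/D/M lines like a name-status diff. */
static struct diff_counts toy_tree_diff(const struct toy_entry *a, size_t na,
					const struct toy_entry *b, size_t nb)
{
	struct diff_counts c = { 0, 0, 0 };
	size_t i = 0, j = 0;

	while (i < na || j < nb) {
		int cmp;

		if (i == na)
			cmp = 1;	/* only new entries remain */
		else if (j == nb)
			cmp = -1;	/* only old entries remain */
		else
			cmp = strcmp(a[i].path, b[j].path);

		if (cmp < 0) {
			printf("D\t%s\n", a[i++].path);
			c.deleted++;
		} else if (cmp > 0) {
			printf("A\t%s\n", b[j++].path);
			c.added++;
		} else {
			if (strcmp(a[i].oid, b[j].oid)) {
				printf("M\t%s\n", b[j].path);
				c.modified++;
			}
			i++;
			j++;
		}
	}
	return c;
}
```

Calling toy_tree_diff(NULL, 0, entries, nr) reports every entry as
added, mirroring how a brand-new sparse directory is expanded into a
full list of added files.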

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..f9eadc4fc1a 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -347,6 +352,20 @@ static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
+	       S_ISSPARSEDIR(new_entry->ce_mode));
+
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level. If only one was a sparse
+	 * directory, then they appear as an add and delete instead of
+	 * a modification.
+	 */
+	if (S_ISSPARSEDIR(new_entry->ce_mode)) {
+		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
-- 
gitgitgadget



* [PATCH v9 12/16] status: skip sparse-checkout percentage with sparse-index
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (10 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 11/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 13/16] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
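
The resulting control flow can be sketched as follows. Only the two
sentinel values and the two message strings come from this patch; the
functions, their parameters, and the integer-percentage formula are
assumptions made for illustration, and this sketch arbitrarily reports
an empty index as 100%.

```c
#include <stdio.h>

/* Sentinel values as defined in wt-status.h by this patch. */
#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/* Hypothetical model: skip_worktree[i] is nonzero for entries outside the cone. */
static int sparse_checkout_percentage(int sparse_checkout_enabled,
				      int is_sparse_index,
				      const int *skip_worktree, int nr)
{
	int i, skipped = 0;

	if (!sparse_checkout_enabled)
		return SPARSE_CHECKOUT_DISABLED;

	/* A sparse index short-circuits: no per-entry walk, no tree parsing. */
	if (is_sparse_index)
		return SPARSE_CHECKOUT_SPARSE_INDEX;

	for (i = 0; i < nr; i++)
		if (skip_worktree[i])
			skipped++;
	return nr ? 100 - (100 * skipped) / nr : 100;
}

/* Formats the status line; returns 0 when nothing should be printed. */
static int format_sparse_message(int percentage, char *buf, size_t size)
{
	if (percentage == SPARSE_CHECKOUT_DISABLED)
		return 0;
	if (percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
		snprintf(buf, size, "You are in a sparse checkout.");
	else
		snprintf(buf, size,
			 "You are in a sparse checkout with %d%% of tracked files present.",
			 percentage);
	return 1;
}
```

For example, four entries with three marked skip-worktree yield 25 and
the message "You are in a sparse checkout with 25% of tracked files
present.", while a sparse index returns SPARSE_CHECKOUT_SPARSE_INDEX
without visiting any entries.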

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not need this message to
be identical across both modes; instead, we compare only the important
information about staged, modified, and untracked files.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2269f44e033..375b0d35565 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -218,6 +218,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 42b67357169..96db3e74962 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1493,9 +1493,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1653,6 +1656,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e..ab9cc9d8f03 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v9 13/16] status: use sparse-index throughout
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (11 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 12/16] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 14/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().
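
As we understand the series this builds on, the opt-in works because the index is expanded right after it is read unless the running command has cleared command_requires_full_index. Here is a toy model of that gate. All toy_* names are hypothetical; in git the check sits in the index-reading path and calls ensure_full_index().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for repo settings and index. */
struct toy_settings {
	bool command_requires_full_index; /* true unless a builtin opts out */
};

struct toy_index {
	bool sparse;    /* still contains sparse-directory entries? */
	int expansions; /* number of (expensive) expansions performed */
};

static void toy_ensure_full_index(struct toy_index *istate)
{
	if (!istate->sparse)
		return;
	istate->sparse = false; /* the real code parses trees here */
	istate->expansions++;
}

/* Models the check made after the index is read from disk. */
static void toy_read_index(const struct toy_settings *s, struct toy_index *istate)
{
	istate->sparse = true; /* pretend the on-disk index is sparse */
	istate->expansions = 0;
	if (s->command_requires_full_index)
		toy_ensure_full_index(istate);
}
```

A command that has not been audited keeps the safe default and always sees a full index. cmd_status() now opts out and works on the sparse entries directly.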

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading; instead, we should celebrate the 0.34s to
0.05s improvement (about 85%) over the full-index case. This is more
indicative of the performance gains we expect from using a sparse index.
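
The percentages in the table are plain relative changes, (after - before) / before. A quick check with a hypothetical helper, pct_change(), reproduces the quoted figures. It also gives the roughly 85% gain measured against the full-index v4 time rather than the inflated sparse-index baseline.

```c
#include <assert.h>
#include <math.h>

/* Relative change from 'before' to 'after', in percent. */
static double pct_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For instance, pct_change(2.35, 0.04) is about -98.3 and pct_change(0.34, 0.05) is about -85.3.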

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 190d215d43b..12f51db158a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1510,6 +1510,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 0c3ac3cefc0..6a1337cc905 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1585,8 +1585,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1601,6 +1600,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 375b0d35565..751f397cc7f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -511,12 +511,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v9 14/16] wt-status: expand added sparse directory entries
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (12 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 13/16] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 15/16] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
                                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
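
In the patch, read_tree_at() walks the sparse directory's tree object. add_file_to_list() returns READ_TREE_RECURSIVE for subtrees and records each blob under its full path as DIFF_STATUS_ADDED. The sketch below shows the same callback-driven expansion over a hypothetical in-memory tree. TOY_RECURSE, toy_tree, added_list and toy_read_tree_at() are illustrative names, not git APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_RECURSE 1 /* hypothetical analogue of READ_TREE_RECURSIVE */
#define MAX_PATH_LEN 256
#define MAX_ADDED 32

struct toy_tree {
	const char *name; /* entry name within its parent */
	int is_dir;
	const struct toy_tree *children;
	int nr_children;
};

struct added_list {
	char paths[MAX_ADDED][MAX_PATH_LEN];
	int nr;
};

typedef int (*toy_fn)(const char *base, const struct toy_tree *t, void *ctx);

/* Like add_file_to_list(): recurse into trees, record blobs as added. */
static int collect_added(const char *base, const struct toy_tree *t, void *ctx)
{
	struct added_list *list = ctx;

	if (t->is_dir)
		return TOY_RECURSE;
	if (list->nr < MAX_ADDED)
		snprintf(list->paths[list->nr++], MAX_PATH_LEN, "%s%s",
			 base, t->name);
	return 0;
}

/* Walk the children of 'tree'; 'base' is its full path ending in '/'. */
static void toy_read_tree_at(const struct toy_tree *tree, const char *base,
			     toy_fn fn, void *ctx)
{
	for (int i = 0; i < tree->nr_children; i++) {
		const struct toy_tree *child = &tree->children[i];

		if (fn(base, child, ctx) == TOY_RECURSE) {
			char sub[MAX_PATH_LEN];

			snprintf(sub, sizeof(sub), "%s%s/", base, child->name);
			toy_read_tree_at(child, sub, fn, ctx);
		}
	}
}
```

Starting from a sparse entry "folder1/" that contains "0/1" and "a", the walk reports "folder1/0/1" and "folder1/a", which is what the full-index repository lists.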

This loop over the cache entries should have been guarded by
ensure_full_index(), but somehow it was not.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 33 +++++++++++++++
 wt-status.c                              | 51 ++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 751f397cc7f..2394c36d881 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -524,4 +524,37 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+
+	# Sparse checkouts do not agree with full checkouts about
+	# how to report a directory/file conflict during a reset.
+	# This command would fail with test_all_match because the
+	# full checkout reports "T folder1/0/1" while a sparse
+	# checkout reports "D folder1/0/1". This matches because
+	# the sparse checkouts skip "adding" the other side of
+	# the conflict.
+	test_sparse_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 96db3e74962..0317baef87e 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -657,6 +657,36 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	clear_pathspec(&rev.prune_data);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	struct strbuf full_name = STRBUF_INIT;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_add(&full_name, base->buf, base->len);
+	strbuf_addstr(&full_name, path);
+	it = string_list_insert(&s->change, full_name.buf);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	strbuf_release(&full_name);
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -671,6 +701,27 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps = { 0 };
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget



* [PATCH v9 15/16] fsmonitor: integrate with sparse index
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (13 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 14/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 13:12                 ` [PATCH v9 16/16] t1092: document bad sparse-checkout behavior Derrick Stolee via GitGitGadget
                                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we expand a sparse-index into a full one (or collapse a full index
into a sparse one), then the FS Monitor bitmap no longer matches the
index entries. Discard the FS Monitor state on either conversion so
that the next query starts fresh.
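
The bitmap records per-entry state, so it stops lining up with the index once a conversion changes the list of entries. Below is a sketch of the reset that both hunks perform. It assumes a hypothetical toy_istate and uses a plain byte array in place of the real bitmap; only the three field names follow the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of index_state: just the FS Monitor bookkeeping. */
struct toy_istate {
	int sparse_index;
	int fsmonitor_has_run_once;
	unsigned char *fsmonitor_dirty; /* stand-in for the per-entry bitmap */
	char *fsmonitor_last_update;    /* opaque token from the hook */
};

/* Drop all FS Monitor state so the next query starts from scratch. */
static void toy_discard_fsmonitor(struct toy_istate *istate)
{
	istate->fsmonitor_has_run_once = 0;
	free(istate->fsmonitor_dirty);
	istate->fsmonitor_dirty = NULL;
	free(istate->fsmonitor_last_update);
	istate->fsmonitor_last_update = NULL;
}

/* Any conversion rewrites the entry list, which invalidates the bitmap. */
static void toy_set_sparse(struct toy_istate *istate, int sparse)
{
	/* ...the entry list would be rewritten here... */
	toy_discard_fsmonitor(istate);
	istate->sparse_index = sparse;
}
```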

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().
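
In the last test case, the hook reports a path under a directory that has been collapsed into a sparse-directory entry ("dir1a/" once the cone is dir1 and dir2). As we understand it, the index lookup expands the index when a path falls strictly inside such an entry. The sketch below models only that prefix test. path_in_sparse_dir() is hypothetical and assumes each entry name ends in '/'.

```c
#include <assert.h>
#include <string.h>

/* Does 'path' lie strictly inside one of the sparse-directory entries? */
static int path_in_sparse_dir(const char *path, const char *const *sparse_dirs,
			      size_t nr)
{
	size_t path_len = strlen(path);

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(sparse_dirs[i]);

		/* Entries end in '/', so a prefix match means "inside". */
		if (path_len > len && !strncmp(path, sparse_dirs[i], len))
			return 1;
	}
	return 0;
}
```

With the sparse entries "dir1a/" and "dir3/", the path "dir1a/modified" would force an expansion, but "dir1/modified" would not.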

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 49 +++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index ef53bd2198b..53c8f711ccc 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -186,6 +186,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -282,6 +286,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 637391c6ce4..deea88d4431 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,52 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_region fsm_hook query trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v9 16/16] t1092: document bad sparse-checkout behavior
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (14 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 15/16] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-07-14 13:12                 ` Derrick Stolee via GitGitGadget
  2021-07-14 15:08                 ` [PATCH v9 00/16] Sparse-index: integrate with status Elijah Newren
  2021-07-14 20:37                 ` Junio C Hamano
  17 siblings, 0 replies; 215+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-14 13:12 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

There are several situations where a repository with sparse-checkout
enabled will act differently than a normal repository, and in ways that
are not intentional. The test t1092-sparse-checkout-compatibility.sh
documents some of these deviations, but a casual reader might think
these are intentional behavior changes.

Add comments to these tests that make it clear these behaviors should
be fixed. Using the 'NEEDSWORK' marker helps contributors find these
potential areas for improvement.

Helped-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2394c36d881..cabbd42e339 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -392,8 +392,8 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# TODO: reset currently does not behave as expected when in a
-# sparse-checkout.
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_failure 'checkout and reset (mixed)' '
 	init_repos &&
 
@@ -403,8 +403,8 @@ test_expect_failure 'checkout and reset (mixed)' '
 	test_all_match git reset update-folder2
 '
 
-# Ensure that sparse-index behaves identically to
-# sparse-checkout with a full index.
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_success 'checkout and reset (mixed) [sparse]' '
 	init_repos &&
 
@@ -524,6 +524,8 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
 	init_repos &&
 
-- 
gitgitgadget


* Re: [PATCH v9 00/16] Sparse-index: integrate with status
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (15 preceding siblings ...)
  2021-07-14 13:12                 ` [PATCH v9 16/16] t1092: document bad sparse-checkout behavior Derrick Stolee via GitGitGadget
@ 2021-07-14 15:08                 ` Elijah Newren
  2021-07-14 20:37                 ` Junio C Hamano
  17 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-07-14 15:08 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Jeff Hostetler, Johannes Schindelin,
	Bagas Sanjaya, Derrick Stolee

On Wed, Jul 14, 2021 at 6:12 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these are checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.
>
> The performance tests in p2000-sparse-operations.sh improve by 95% or more,
> even when compared with the full-index cases, not just the sparse-index
> cases that previously had extra overhead.
>
> Hopefully this is the first example of how ds/sparse-index-protections has
> done the basic work to do these conversions safely, making them look easier
> than they seemed when starting this adventure.
>
> Thanks, -Stolee
>
>
> Update in V9
> ============
>
>  * Fixed typo.
>
>  * All patches are marked as Reviewed-by Elijah. Thanks for the careful
>    review!

Thanks for all the hard work and pushing this feature forward!


* Re: [PATCH v9 00/16] Sparse-index: integrate with status
  2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
                                   ` (16 preceding siblings ...)
  2021-07-14 15:08                 ` [PATCH v9 00/16] Sparse-index: integrate with status Elijah Newren
@ 2021-07-14 20:37                 ` Junio C Hamano
  2021-07-15  2:41                   ` Elijah Newren
  17 siblings, 1 reply; 215+ messages in thread
From: Junio C Hamano @ 2021-07-14 20:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Matheus Tavares Bernardino, Derrick Stolee, git,
	johannes.schindelin, Bagas Sanjaya, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).

The first payoff turning out to be  a long time coming ;-)

> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these are checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.

I said this already but I think the approach makes sense.  On a
related tangent, I wonder if we can teach "ls-files" to help
scriptors with a mode that does not pretend we have the full
index (i.e. when asked to show sparse index state, "ls-files
--sparse" would show the same output as "ls-files --stage" without
expanding each tree entry in the index into its flattened list of
paths).

Thanks.


* Re: [PATCH v9 00/16] Sparse-index: integrate with status
  2021-07-14 20:37                 ` Junio C Hamano
@ 2021-07-15  2:41                   ` Elijah Newren
  0 siblings, 0 replies; 215+ messages in thread
From: Elijah Newren @ 2021-07-15  2:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Jeff Hostetler,
	Johannes Schindelin, Bagas Sanjaya, Derrick Stolee

On Wed, Jul 14, 2021 at 1:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > This is the first "payoff" series in the sparse-index work. It makes 'git
> > status' very fast when a sparse-index is enabled on a repository with
> > cone-mode sparse-checkout (and a small populated set).
>
> The first payoff turning out to be  a long time coming ;-)

True, but much, much less time than it took between when I started the
merge machinery overhaul and its first big payoffs.  ;-)


Thread overview: 215+ messages
2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-04-20 21:52   ` Elijah Newren
2021-04-21 13:21     ` Derrick Stolee
2021-04-21 15:14   ` Matheus Tavares Bernardino
2021-04-23 20:12     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
2021-04-20 23:00   ` Elijah Newren
2021-04-21 13:41     ` Derrick Stolee
2021-04-21 16:11       ` Elijah Newren
2021-04-22  2:24         ` Matheus Tavares Bernardino
2021-04-21 17:27     ` Derrick Stolee
2021-04-21 18:55       ` Matheus Tavares Bernardino
2021-04-21 19:10         ` Elijah Newren
2021-04-21 19:51           ` Matheus Tavares Bernardino
2021-04-21 18:56       ` Elijah Newren
2021-04-23 20:16         ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-20 23:21   ` Elijah Newren
2021-04-21 13:47     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-20 23:26   ` Elijah Newren
2021-04-21 13:51     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-21  0:44   ` Elijah Newren
2021-04-21 13:55     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
2021-04-21  0:52   ` Elijah Newren
2021-04-21  0:53     ` Elijah Newren
2021-04-21 14:03       ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-04-21  0:57   ` Elijah Newren
2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-04-21  7:00   ` Elijah Newren
2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
2021-04-14 16:31   ` Derrick Stolee
2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-13 12:40     ` Matheus Tavares Bernardino
2021-05-14 12:27       ` Derrick Stolee
2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-13  3:26     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-13  3:31     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
2021-05-14 18:28     ` Derrick Stolee
2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-18  1:33       ` Elijah Newren
2021-05-18 14:57         ` Derrick Stolee
2021-05-18 17:48           ` Elijah Newren
2021-05-18 18:16             ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-18  1:49       ` Elijah Newren
2021-05-18 14:59         ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-18  2:03       ` Elijah Newren
2021-05-18  2:06         ` Elijah Newren
2021-05-18 19:20           ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-18  2:27       ` Elijah Newren
2021-05-18 18:26         ` Derrick Stolee
2021-05-18 19:04           ` Derrick Stolee
2021-05-19  8:38             ` Elijah Newren
2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
2021-05-28 11:36         ` Derrick Stolee
2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-06-08 18:56           ` Elijah Newren
2021-06-09 17:39             ` Derrick Stolee
2021-06-09 18:11               ` Elijah Newren
2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-06-08 19:18           ` Elijah Newren
2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-06-09  3:48           ` Elijah Newren
2021-06-09 20:21             ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-06-07 15:26           ` Derrick Stolee
2021-06-08  1:05             ` Junio C Hamano
2021-06-08 13:00               ` Derrick Stolee
2021-06-09  5:47           ` Elijah Newren
2021-06-09  6:32             ` Junio C Hamano
2021-06-09  8:11               ` Elijah Newren
2021-06-09 20:33                 ` Derrick Stolee
2021-06-10 17:45                   ` Derrick Stolee
2021-06-10 21:31                     ` Elijah Newren
2021-06-11 12:57                       ` Derrick Stolee
2021-06-11 17:27                         ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-06-09  5:27           ` Elijah Newren
2021-06-09 20:49             ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-06-29  1:51         ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-06-29  1:51           ` [PATCH v6 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-06-29  2:02           ` [PATCH v6 00/14] Sparse-index: integrate with status Derrick Stolee
2021-06-29  2:04           ` [PATCH v7 00/16] " Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 02/16] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 03/16] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 04/16] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 05/16] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 06/16] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 07/16] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 08/16] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-07-07 22:25               ` Elijah Newren
2021-06-29  2:04             ` [PATCH v7 10/16] unpack-trees: handle dir/file conflict of sparse entries Derrick Stolee via GitGitGadget
2021-07-07 23:19               ` Elijah Newren
2021-07-09  0:58                 ` Elijah Newren
2021-07-12 13:46                   ` Derrick Stolee
2021-06-29  2:04             ` [PATCH v7 11/16] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 12/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-07-08 23:10               ` Elijah Newren
2021-07-08 23:51                 ` Elijah Newren
2021-07-12 13:52                   ` Derrick Stolee
2021-06-29  2:04             ` [PATCH v7 13/16] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 14/16] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-06-29  2:04             ` [PATCH v7 15/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-07-09  1:03               ` Elijah Newren
2021-07-12 13:56                 ` Derrick Stolee
2021-07-12 19:32                   ` Elijah Newren
2021-07-12 19:41                     ` Derrick Stolee
2021-06-29  2:04             ` [PATCH v7 16/16] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-06-29  2:16             ` [PATCH v7 00/16] Sparse-index: integrate with status Derrick Stolee
2021-06-30 14:32             ` Elijah Newren
2021-07-09  1:16               ` Elijah Newren
2021-07-12 14:46                 ` Derrick Stolee
2021-07-12 17:55             ` [PATCH v8 00/15] " Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 01/15] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 02/15] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 03/15] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-07-14  0:02                 ` Bagas Sanjaya
2021-07-12 17:55               ` [PATCH v8 04/15] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 05/15] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 06/15] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 07/15] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 08/15] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 09/15] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 10/15] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 11/15] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 12/15] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 13/15] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 14/15] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-07-12 17:55               ` [PATCH v8 15/15] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-07-12 19:38               ` [PATCH v8 00/15] Sparse-index: integrate with status Elijah Newren
2021-07-13 12:57                 ` Derrick Stolee
2021-07-13 17:37                   ` Elijah Newren
2021-07-14 13:12               ` [PATCH v9 00/16] " Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 01/16] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 02/16] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 03/16] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 04/16] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 05/16] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 06/16] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 07/16] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 08/16] unpack-trees: rename unpack_nondirectories() Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 09/16] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 10/16] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 11/16] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 12/16] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 13/16] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 14/16] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 15/16] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-07-14 13:12                 ` [PATCH v9 16/16] t1092: document bad sparse-checkout behavior Derrick Stolee via GitGitGadget
2021-07-14 15:08                 ` [PATCH v9 00/16] Sparse-index: integrate with status Elijah Newren
2021-07-14 20:37                 ` Junio C Hamano
2021-07-15  2:41                   ` Elijah Newren