Git Mailing List Archive on lore.kernel.org
* [PATCH 00/10] Sparse-index: integrate with status and add
@ 2021-04-13 14:01 Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' and 'git add' very fast when a sparse-index is enabled on a
repository with cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * git add -A
 * git add . (and other pathspecs)
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee

Derrick Stolee (10):
  t1092: add tests for status/add and sparse files
  unpack-trees: make sparse aware
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  dir: use expand_to_path() for sparse directories
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  t7519: add sparse directories to FS monitor tests
  fsmonitor: test with sparse index

 builtin/add.c                            |  3 +
 builtin/commit.c                         |  3 +
 dir.c                                    |  5 ++
 dir.h                                    |  2 +-
 pathspec.c                               |  2 -
 preload-index.c                          |  2 +
 read-cache.c                             |  5 +-
 t/t1092-sparse-checkout-compatibility.sh | 73 +++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              | 65 +++++++++++++++++++++
 unpack-trees.c                           | 24 +++++++-
 wt-status.c                              | 14 ++++-
 wt-status.h                              |  1 +
 12 files changed, 186 insertions(+), 13 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/932
-- 
gitgitgadget


* [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 21:52   ` Elijah Newren
  2021-04-21 15:14   ` Matheus Tavares Bernardino
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file
'folder1/a' is shown as modified, but adding it fails. This is new
behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
entries, 2021-04-08). Before that change, these adds would be silently
ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..6598c12a2069 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_must_fail git -C sparse-checkout add folder1/a &&
+	test_must_fail git -C sparse-index add folder1/a &&
+	git -C full-checkout checkout HEAD -- folder1/a &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:00   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As a first step to integrate 'git status' and 'git add' with the sparse
index, we must start integrating unpack_trees() with sparse directory
entries. These changes are currently impossible to trigger because
unpack_trees() calls ensure_full_index() if command_requires_full_index
is true. This is the case for all commands at the moment. As we expand
more commands to be sparse-aware, we might find that more changes are
required to unpack_trees(). The current changes will suffice for
'status' and 'add'.

unpack_trees() calls the traverse_trees() API using unpack_callback()
to decide if we should recurse into a subtree. We must add new abilities
to skip a subtree if it corresponds to a sparse directory entry.

It is important to be careful about the trailing directory separator
that exists in the sparse directory entries but not in the subtree
paths.
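
To illustrate the separator issue, here is a minimal sketch (not code from
this patch). The helper name and its flag parameter are hypothetical; it
only shows that a sparse directory entry such as "folder1/" has to lose its
trailing '/' before it can compare equal to the subtree name "folder1".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): compare an index entry name
 * against a tree entry name, ignoring the single trailing '/' that a
 * sparse directory entry carries.
 * Returns 1 when the names refer to the same path, 0 otherwise.
 */
static int same_path_ignoring_dir_sep(const char *index_name,
				      int is_sparse_dir,
				      const char *tree_name)
{
	size_t index_len = strlen(index_name);
	size_t tree_len = strlen(tree_name);

	if (is_sparse_dir) {
		/* A sparse directory entry must end with '/'. */
		assert(index_len > 0 && index_name[index_len - 1] == '/');
		index_len--;
	}

	return index_len == tree_len &&
	       !memcmp(index_name, tree_name, tree_len);
}
```

With this sketch, "folder1/" (sparse directory) matches the subtree
"folder1" but not "folder10", which is the off-by-one that the length
adjustment in do_compare_entry() guards against.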

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.h           |  2 +-
 preload-index.c |  2 ++
 read-cache.c    |  3 +++
 unpack-trees.c  | 24 ++++++++++++++++++++++--
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/dir.h b/dir.h
index 51cb0e217247..9d6666f520f3 100644
--- a/dir.h
+++ b/dir.h
@@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
 				char *seen)
 {
 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
-			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
 }
 
 static inline int dir_path_match(struct index_state *istate,
diff --git a/preload-index.c b/preload-index.c
index e5529a586366..35e67057ca9b 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
 			continue;
 		if (S_ISGITLINK(ce->ce_mode))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
 		if (ce_uptodate(ce))
 			continue;
 		if (ce_skip_worktree(ce))
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..6308234b4838 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..9a62e823928a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
@@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
+	/* remove directory separator if a sparse directory entry */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		ce_len--;
 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
 }
 
@@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/* If ce is a sparse directory, then allow equality here. */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
@@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned recurse = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 					}
 				}
 				src[0] = ce;
+
+				if (S_ISSPARSEDIR(ce->ce_mode))
+					recurse = 0;
 			}
 			break;
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (recurse &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (recurse &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget



* [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:21   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
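
The fake-filename idea can be sketched in isolation as follows. This is a
simplified illustration, not the code in dir.c: both helpers are
hypothetical, use fixed-size buffers instead of strbufs, and skip the
hashmap lookups entirely. It only shows that a directory path and a file
directly inside that directory reduce to the same parent directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "/" + path into out, appending a fake
 * filename "-" when path names a directory (ends with '/').
 * Returns 0 on success, -1 if out is too small.
 */
static int to_file_like_path(const char *path, char *out, size_t out_size)
{
	size_t len = strlen(path);
	const char *fake = (len > 0 && path[len - 1] == '/') ? "-" : "";
	int n = snprintf(out, out_size, "/%s%s", path, fake);

	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Truncate a file-like path to its parent directory, in place. */
static void strip_to_parent(char *file_like_path)
{
	char *slash = strrchr(file_like_path, '/');

	assert(slash != NULL);
	if (slash == file_like_path)
		slash[1] = '\0';	/* parent is the root "/" */
	else
		*slash = '\0';
}
```

Under these assumptions, "folder1/" becomes "/folder1/-" and "folder1/a"
becomes "/folder1/a"; both strip down to "/folder1", so a parent-directory
lookup sees the same key in either case.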

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..57e22e605cec 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/* Directory requests should be added as if they are a file */
+	if (parent_pathname.len > 1 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:26   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not need this message to
be equal across both modes; we only compare the important information
about staged, modified, and untracked files.
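
The sentinel approach can be sketched outside of wt-status.c like this. The
two macro names and the message strings are taken from the patch; the
formatter function itself is hypothetical and writes into a caller-supplied
buffer rather than using status_printf_ln().

```c
#include <assert.h>
#include <stdio.h>

/* Sentinel values mirroring the names used in the patch. */
#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/*
 * Hypothetical formatter: fill buf with the status line for a given
 * percentage-or-sentinel value. Returns the number of characters
 * written (0 when sparse-checkout is disabled and nothing is reported).
 */
static int format_sparse_status(int percentage, char *buf, size_t size)
{
	assert(size > 0);
	buf[0] = '\0';

	if (percentage == SPARSE_CHECKOUT_DISABLED)
		return 0;
	if (percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
		return snprintf(buf, size, "You are in a sparse checkout.");
	return snprintf(buf, size,
			"You are in a sparse checkout with %d%% of tracked files present.",
			percentage);
}
```

Because valid percentages are never negative, the negative sentinels can
share the same int field without ambiguity.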

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 6598c12a2069..e488ef9bd941 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -196,6 +196,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH 05/10] status: use sparse-index throughout
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:44   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

By running the debugger for 'git status -uno' after that change, we find
two instances of ensure_full_index() that were added for extra safety,
but can be removed without issue.

In refresh_index(), we loop through the index entries. The
refresh_cache_ent() method copies the sparse directories into the
refreshed index without issue.

The loop within run_diff_files() skips things that are in stage 0 and
have skip-worktree enabled, so it seems safe to remove
ensure_full_index() here.

This allows some cases of 'git status' to no longer expand a sparse
index to a full one, giving the following performance improvements for
p2000-sparse-operations.sh:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.37s to
0.05s improvement of 86%. This is more indicative of the performance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             |  2 --
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 6308234b4838..83e6bdef7604 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e488ef9bd941..380a085f8ec4 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status -uno &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:52   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The recently-implemented expand_to_path() method lets position queries
respond faster when they ask for a path within the sparse cone. Since
this is the most common scenario, this provides a significant speedup.

Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
status' does not expand a sparse index to a full one, even when there
exist untracked files.

The performance test script p2000-sparse-operations.sh demonstrates
that this is the final hole to fill to allow 'git status' to speed up
when using a sparse index:

Test                                  HEAD~1            HEAD
------------------------------------------------------------------------------
2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 380a085f8ec4..b937d7096afd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
 	init_repos &&
 
 	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index status -uno &&
+		git -C sparse-index status &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget



* [PATCH 07/10] add: allow operating on a sparse-only index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.
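
As a rough mental model of when a lookup forces expansion, consider the toy
sketch below. It is not how expand_to_path() is implemented; the struct and
function are hypothetical and assume that sparse directory entries are the
only index entries whose names end in '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy index entry; "name" ends with '/' only for sparse directories. */
struct toy_entry {
	const char *name;
	int is_sparse_dir;
};

/*
 * Hypothetical check: return 1 if "path" lies strictly inside some
 * sparse directory entry, meaning a lookup for it would need the index
 * to be expanded; return 0 if the sparse index can answer as-is.
 */
static int path_needs_expansion(const struct toy_entry *entries, size_t nr,
				const char *path)
{
	size_t i;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		size_t len = strlen(entries[i].name);

		if (entries[i].is_sparse_dir &&
		    !strncmp(path, entries[i].name, len) &&
		    path[len] != '\0')
			return 1;
	}
	return 0;
}
```

In this model, adding "extra.txt" or "deep/new" never touches a sparse
directory, while "folder1/a" does when "folder1/" is collapsed, which
matches the cases the new test exercises.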

Add a test to check that 'git add -A' and 'git add <file>' do not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    1.35(1.00+0.20)  1.33(0.98+0.19) -1.5%
2000.7: git add -A (full-index-v4)    1.25(0.97+0.17)  1.23(0.96+0.16) -1.6%
2000.8: git add -A (sparse-index-v3)  2.38(2.28+0.13)  0.06(0.04+0.08) -97.5%
2000.9: git add -A (sparse-index-v4)  2.39(2.25+0.18)  0.06(0.04+0.07) -97.5%

While the 97% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
1.33s to 0.06s, which is "only" a 95% improvement.
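
For readers who want to check these percentages, the arithmetic is a plain
relative reduction. The helper below is a hypothetical illustration and is
not part of the perf suite.

```c
#include <assert.h>

/* Relative reduction in running time, as a percentage. */
static double percent_reduction(double before_seconds, double after_seconds)
{
	assert(before_seconds > 0.0);
	return 100.0 * (before_seconds - after_seconds) / before_seconds;
}
```

Calling percent_reduction(2.38, 0.06) gives roughly 97.5, matching the
sparse-index-v3 row, and percent_reduction(1.33, 0.06) gives roughly 95.5
for the comparison against the full index.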

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index 58ee3f954ef7..0572d0344065 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -526,6 +526,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index b937d7096afd..c210dba78067 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,6 +459,18 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/README.md &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add -A &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/extra.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add extra.txt &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget



* [PATCH 08/10] pathspec: stop calling ensure_full_index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:57   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() method focuses on matching a
pathspec to file entries in the index. This already works correctly for
its only use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      1.37(1.02+0.18)   1.38(1.01+0.20) +0.7%
2000.11: git add . (full-index-v4)      1.26(1.00+0.15)   1.27(0.99+0.17) +0.8%
2000.12: git add . (sparse-index-v3)    2.39(2.29+0.14)   0.06(0.05+0.07) -97.5%
2000.13: git add . (sparse-index-v4)    2.42(2.32+0.14)   0.06(0.05+0.06) -97.5%

While the 97% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 1.27s to 0.06s, a 95% improvement.
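
Both percentages can be reproduced from the table above. The following is
only a minimal sketch, assuming a POSIX shell with awk; the helper name
pct_improvement is made up for this illustration and is not part of the
performance suite.

```sh
# pct_improvement is a hypothetical helper for this illustration only.
# It prints the relative reduction from $1 (before) to $2 (after),
# as a percentage with one decimal place.
pct_improvement () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# sparse-index-v4, HEAD~1 vs HEAD (the -97.5% row in the table)
pct_improvement 2.42 0.06

# full-index-v4 at HEAD vs sparse-index-v4 at HEAD (the "95%" figure)
pct_improvement 1.27 0.06
```

The first call compares the sparse index against itself across commits,
so it includes the expansion overhead. The second compares against the
full index at the same commit, which is the fairer baseline.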

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 54813c0c4e8e..b51b48471fe6 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c210dba78067..738013b00191 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -471,6 +471,12 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/extra.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index add extra.txt &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add . &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget



* [PATCH 09/10] t7519: add sparse directories to FS monitor tests
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The File System Monitor (FS Monitor) tests in t7519 demonstrate some
important interactions with the index and the response from the FS
Monitor hook. Later changes will integrate the FS Monitor extension in
the index with the existence of sparse directory entries in a sparse
index. To do so, we need to include directories outside of the sparse
checkout definition.

Add a new directory, dir1a, between dir1 and dir2 in the test repo used
by this script. By inserting it in the middle, we are more likely to
trigger incorrect behavior when the fsmonitor_dirty bitmap is involved
with sparse directories changing the position of cache entries.

I could have modified the test to create two repos, one sparse and one
not, but that causes confusion in the expected output. Further, it makes
the test take twice as long. With this approach, we can validate that FS
Monitor works with the sparse index feature using the
GIT_TEST_SPARSE_INDEX=1 environment variable. The test currently fails
with that environment variable because FS Monitor is disabled when a
sparse index exists. The following changes will update this behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..23879d967297 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -62,11 +62,16 @@ test_expect_success 'setup' '
 	mkdir dir1 &&
 	: >dir1/tracked &&
 	: >dir1/modified &&
+	mkdir dir1a &&
+	: >dir1a/a &&
+	: >dir1a/b &&
 	mkdir dir2 &&
 	: >dir2/tracked &&
 	: >dir2/modified &&
 	git -c core.fsmonitor= add . &&
 	git -c core.fsmonitor= commit -m initial &&
+	git sparse-checkout init --cone --no-sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
 	git config core.fsmonitor .git/hooks/fsmonitor-test &&
 	cat >.gitignore <<-\EOF
 	.gitignore
@@ -99,6 +104,8 @@ test_expect_success 'update-index --no-fsmonitor" removes the fsmonitor extensio
 cat >expect <<EOF &&
 h dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 h dir2/modified
 H dir2/tracked
 h modified
@@ -121,6 +128,8 @@ test_expect_success 'update-index --fsmonitor-valid" sets the fsmonitor valid bi
 cat >expect <<EOF &&
 H dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 H dir2/tracked
 H modified
@@ -139,6 +148,8 @@ test_expect_success 'update-index --no-fsmonitor-valid" clears the fsmonitor val
 cat >expect <<EOF &&
 H dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 H dir2/tracked
 H modified
@@ -158,6 +169,8 @@ cat >expect <<EOF &&
 H dir1/modified
 h dir1/new
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 h dir2/new
 H dir2/tracked
@@ -182,6 +195,8 @@ cat >expect <<EOF &&
 H dir1/modified
 h dir1/new
 h dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 h dir2/new
 h dir2/tracked
@@ -201,6 +216,8 @@ test_expect_success 'all unmodified files get marked valid' '
 cat >expect <<EOF &&
 H dir1/modified
 h dir1/tracked
+S dir1a/a
+S dir1a/b
 h dir2/modified
 h dir2/tracked
 h modified
-- 
gitgitgadget



* [PATCH 10/10] fsmonitor: test with sparse index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  7:00   ` Elijah Newren
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During the effort to protect uses of the index to operate on a full
index, we did not modify fsmonitor.c. This is because it already works
effectively with only the change to index_name_stage_pos(). The only
thing left to do is to test that it works correctly.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 23879d967297..306157d48abf 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -78,6 +78,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -400,4 +401,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+test_expect_success 'status succeeds with sparse index' '
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region index ensure_full_index trace2.txt &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH 00/10] Sparse-index: integrate with status and add
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-04-13 20:45 ` Matheus Tavares Bernardino
  2021-04-14 16:31   ` Derrick Stolee
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 66+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-13 20:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee

Hi, Stolee

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' and 'git add' very fast when a sparse-index is enabled on a
> repository with cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.

I just noticed that our ds/sparse-index-protections and
mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
appear before, but it does now with this new series.

ds/sparse-index-protections added `ensure_full_index()` guards before
the loops that traverse over all cache entries. At the same time,
mt/add-rm-sparse-checkout added yet another one of these loops, at
`pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
new place didn't get the `ensure_full_index()` guard, all of its
callers (in `add` and `rm`) did call `ensure_full_index()` before
calling it, so it was fine.

However, patches 7 and 8 remove some of these protections in `add`'s
code. And, as a result, if "dir" is a sparse directory entry, `git add
[--refresh] dir/file` no longer emits the warning added at
mt/add-rm-sparse-checkout.

Adding `ensure_full_index()` at
`find_pathspecs_matching_skip_worktree()` fixes the problem. We have
to consider the performance implications, but they _might_ be
acceptable as we only call this function when a pathspec given to
`add` or `rm` does not match any non-ignored file inside the sparse
checkout.

Additionally, the tests I added at t3705 won't catch this problem,
even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
they don't set core.sparseCheckout and core.sparseCheckoutCone, they
only set individual index entries with the SKIP_WORKTREE bit. And
therefore, the index is always written fully. Perhaps, should I reroll
my series using cone mode for these tests?

(And a semi-related question: do you plan on adding
GIT_TEST_SPARSE_INDEX=true to one of the CI jobs? )


* Re: [PATCH 00/10] Sparse-index: integrate with status and add
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
@ 2021-04-14 16:31   ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-14 16:31 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee

On 4/13/2021 4:45 PM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' and 'git add' very fast when a sparse-index is enabled on a
>> repository with cone-mode sparse-checkout (and a small populated set).
>>
>> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> 
> I just noticed that our ds/sparse-index-protections and
> mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
> appear before, but it does now with this new series.

Thank you for taking a close look.
 
> ds/sparse-index-protections added `ensure_full_index()` guards before
> the loops that traverse over all cache entries. At the same time,
> mt/add-rm-sparse-checkout added yet another one of these loops, at
> `pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
> new place didn't get the `ensure_full_index()` guard, all of its
> callers (in `add` and `rm`) did call `ensure_full_index()` before
> calling it, so it was fine.
>
> However, patches 7 and 8 remove some of these protections in `add`'s
> code. And, as a result, if "dir" is a sparse directory entry, `git add
> [--refresh] dir/file` no longer emits the warning added at
> mt/add-rm-sparse-checkout.

You are right, it does not emit the warning. I will add a test that
ensures that behavior is the same across the two sparse repos in
t1092 as part of my v2 in this series.
 
> Adding `ensure_full_index()` at
> `find_pathspecs_matching_skip_worktree()` fixes the problem. We have
> to consider the performance implications, but they _might_ be
> acceptable as we only call this function when a pathspec given to
> `add` or `rm` does not match any non-ignored file inside the sparse
> checkout.

I'll want to do the right thing here to make the warning work, so
I'll take a look soon.

> Additionally, the tests I added at t3705 won't catch this problem,
> even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
> they don't set core.sparseCheckout and core.sparseCheckoutCone, they
> only set individual index entries with the SKIP_WORKTREE bit. And
> therefore, the index is always written fully. Perhaps, should I reroll
> my series using cone mode for these tests?

Your series should not be re-rolled for this. Instead, this is valuable
feedback for this series: there is behavior in 'git add' that I am not
checking stays the same when the sparse-index is enabled. That's my
responsibility and I'll get it fixed.
 
> (And a semi-related question: do you plan on adding
> GIT_TEST_SPARSE_INDEX=true to one of the CI jobs? )

I do plan to add that, after things calm down. It won't do much right
now because it requires core.sparseCheckout[Cone] to be enabled. Not
many tests provide that, so they don't add much coverage. I thought at
one point to adjust the initial repo creation to include a
sparse-checkout in cone mode, but that would change too many tests.
I still haven't found the right way to expand the test coverage to
take advantage of our deep test suite for this feature.

Thanks,
-Stolee


* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-20 21:52   ` Elijah Newren
  2021-04-21 13:21     ` Derrick Stolee
  2021-04-21 15:14   ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-20 21:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before moving to update 'git status' and 'git add' to work with sparse
> indexes, add an explicit test that ensures the sparse-index works the
> same as a normal sparse-checkout when the worktree contains directories
> and files outside of the sparse cone.
>
> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
> not in the sparse cone. When 'folder1/a' is modified, the file
> 'folder1/a' is shown as modified, but adding it fails. This is new
> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
> entries, 2021-04-08). Before that change, these adds would be silently
> ignored.
>
> Untracked files are fine: adding new files both with 'git add .' and
> 'git add folder1/' works just as in a full checkout. This may not be
> entirely desirable, but we are not intending to change behavior at the
> moment, only document it.

Personally, I'd say not desirable and we should throw an error just
like we do with skip-worktree entries that the user happens to try to
git add.  I've had reports from users that got confused by what
happens after this.  I've been meaning to create some patches to fix
it up, but wanted to avoid getting in the way of the sparse-index work
and have been a bit tied up on other projects to boot.

I'll note in particular that it's easy for users after running "git
add" to run other things such as "git sparse-checkout reapply" or "git
switch $otherbranch" and suddenly the file disappears from the working
tree.  From the sparse-checkout machinery that makes sense; this path
doesn't match the .git/info/sparse-checkout list of paths, so it
should be removed from the working tree.  But it's very disorienting
to users.  Especially if some of those commands are side-effects of
other commands (e.g. our build system invokes "git sparse-checkout
reapply" in various cases, most common of which is that even a simple
"git pull" can bring down code with dependency changes and thus a need
for new sparsity rules and whatnot), but it definitely can just happen
in ways users don't expect with their own commands (e.g. the git
switch/checkout example).

The patch looks good, but it'd be nice if while documenting it we also
add a comment that we believe we want to change the behavior (for
sparse-checkout both with and without sparse-index).  It's one of
those many paper-cuts we still have.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..6598c12a2069 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&
> +
> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&
> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +       test_must_fail git -C sparse-checkout add folder1/a &&
> +       test_must_fail git -C sparse-index add folder1/a &&
> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
@ 2021-04-20 23:00   ` Elijah Newren
  2021-04-21 13:41     ` Derrick Stolee
  2021-04-21 17:27     ` Derrick Stolee
  0 siblings, 2 replies; 66+ messages in thread
From: Elijah Newren @ 2021-04-20 23:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As a first step to integrate 'git status' and 'git add' with the sparse
> index, we must start integrating unpack_trees() with sparse directory
> entries. These changes are currently impossible to trigger because
> unpack_trees() calls ensure_full_index() if command_requires_full_index
> is true. This is the case for all commands at the moment. As we expand
> more commands to be sparse-aware, we might find that more changes are
> required to unpack_trees(). The current changes will suffice for
> 'status' and 'add'.
>
> unpack_trees() calls the traverse_trees() API using unpack_callback()
> to decide if we should recurse into a subtree. We must add new abilities
> to skip a subtree if it corresponds to a sparse directory entry.
>
> It is important to be careful about the trailing directory separator
> that exists in the sparse directory entries but not in the subtree
> paths.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.h           |  2 +-
>  preload-index.c |  2 ++
>  read-cache.c    |  3 +++
>  unpack-trees.c  | 24 ++++++++++++++++++++++--
>  4 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/dir.h b/dir.h
> index 51cb0e217247..9d6666f520f3 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>                                 char *seen)
>  {
>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));

I'm confused why this change would be needed, or why it'd be
semantically meaningful here.  Doesn't S_ISSPARSEDIR() being true imply
S_ISDIR() is true (and perhaps even vice versa)?

By chance, was this a leftover from your early RFC changes from a few
series ago when you had an entirely different mode for sparse
directory entries?

>  }
>
>  static inline int dir_path_match(struct index_state *istate,
> diff --git a/preload-index.c b/preload-index.c
> index e5529a586366..35e67057ca9b 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>                         continue;
>                 if (S_ISGITLINK(ce->ce_mode))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
>                 if (ce_uptodate(ce))
>                         continue;
>                 if (ce_skip_worktree(ce))

Doesn't S_ISSPARSEDIR(ce->ce_mode) imply ce_skip_worktree(ce)?  Is
this a duplicate check?  If so, is it still desirable for
future-proofing or code clarity, or is it strictly redundant?

> diff --git a/read-cache.c b/read-cache.c
> index 29ffa9ac5db9..6308234b4838 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>                         continue;
>
> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
> +

I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
!ignore_skip_worktree and why it'd be desirable to refresh
skip-worktree entries.  However, this is tangential to your patch and
has apparently been around since 2009 (in particular, from 56cac48c35
("ie_match_stat(): do not ignore skip-worktree bit with
CE_MATCH_IGNORE_VALID", 2009-12-14)).

>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>                         filtered = 1;
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index dddf106d5bd4..9a62e823928a 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>  {
>         ce->ce_flags |= CE_UNPACKED;
>
> +       /*
> +        * If this is a sparse directory, don't advance cache_bottom.
> +        * That will be advanced later using the cache-tree data.
> +        */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return;
> +

I don't understand cache_bottom stuff; we might want to get Junio to
look over it.  Or maybe I just need to dig a bit further and attempt
to understand it.

>         if (o->cache_bottom < o->src_index->cache_nr &&
>             o->src_index->cache[o->cache_bottom] == ce) {
>                 int bottom = o->cache_bottom;
> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> +       /* remove directory separator if a sparse directory entry */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               ce_len--;
>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);

Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?

Note the following sort order:
   foo
   foo.txt
   foo/
   foo/bar

You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
but df_name_compare() exists to make "foo" sort exactly where "foo/"
would when "foo" is a directory.  Will your df_name_compare() call
here result in foo.txt being placed after all the "foo/<subpath>"
entries in the index and perhaps cause other problems down the line?
(Are there issues, e.g. with cache-trees getting wrong ordering from
this, or even writing out indexes or tree objects with the wrong
ordering?  I've written out trees to disk with wrong ordering before
and git usually survives but gets really confused with diffs.)

Since at least one caller of compare_entry() takes the return result
and does a "if (cmp < 0)", this order is going to matter in some
cases.  Perhaps we need some testcases where there is a sparse
directory entry named "foo/" and a file recorded in some relevant tree
with the name "foo.txt" to be able to trigger these lines of code?
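
To make the ordering concern concrete, here is a minimal sketch of the
byte-wise comparison, assuming a POSIX shell and sort. It only mimics
plain name ordering, not df_name_compare() itself, and the byte_order
helper is made up for illustration.

```sh
# Byte-wise order of the four example paths: '.' (0x2e) sorts before
# '/' (0x2f), so "foo.txt" lands between "foo" and "foo/".
printf '%s\n' foo/bar foo.txt foo/ foo | LC_ALL=C sort

# Hypothetical helper for illustration only: print how $1 orders
# against $2 byte-wise, as "<", "=" or ">".
byte_order () {
	if [ "$1" = "$2" ]; then
		echo "="
	elif [ "$(printf '%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)" = "$1" ]; then
		echo "<"
	else
		echo ">"
	fi
}

# With the trailing '/' kept, the sparse directory sorts after foo.txt ...
byte_order foo/ foo.txt

# ... but once the '/' is trimmed and the name is compared like a
# regular file name, it sorts before foo.txt.
byte_order foo foo.txt
```

If the comparison in do_compare_entry() behaves like the second call,
a sparse directory "foo/" and a tree entry "foo.txt" would be visited in
a different relative order than the index stores them, which is the case
a dedicated test could try to trigger.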

>  }
>
> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow equality here. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;
> +

Um...so a sparse directory compares equal to _anything_ at all?  I'm
really confused why this would be desirable.  Am I missing something
here?

>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned recurse = 1;

"recurse" sent my mind off into questions about safety checks, base
cases, etc., instead of just the simple "we don't want to read in
directories corresponding to sparse entries".  I think this would be
clearer either if the variable had the sparsity concept embedded in
its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
(!sparse_entry) instead of (recurse) below), or with a comment about
why there are cases where you want to avoid recursion.

>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       recurse = 0;

Ah, the context here doesn't show it but this is in the "if (!cmp)"
block, i.e. if we found a match for the sparse directory.  This makes
sense, to me, _if_ we ignore the above question about sparse
directories matching equal to anything and everything.

>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (recurse &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (recurse &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;

Nice.  :-)


I think your patch was mostly about the recurse stuff, which other
than the name or a comment about it look good to me.  However, all the
other preparatory small tweaks brought up a lot of questions or
confusion for me.  I'm worried there might be a bug or two, though I
may have just misunderstood some of the code bits.


* Re: [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-20 23:21   ` Elijah Newren
  2021-04-21 13:47     ` Derrick Stolee
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-20 23:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When we have sparse directory entries in the index, we want to compare
> that directory against sparse-checkout patterns. Those pattern matching
> algorithms are built expecting a file path, not a directory path. This
> is especially important in the "cone mode" patterns which will match
> files that exist within the "parent directories" as well as the
> recursive directory matches.
>
> If path_matches_pattern_list() is given a directory, we can add a fake
> filename ("-") to the directory and get the same results as before,
> assuming we are in cone mode. Since sparse index requires cone mode
> patterns, this is an acceptable assumption.

Makes sense; thanks for the good description.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index 166238e79f52..57e22e605cec 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>         strbuf_addch(&parent_pathname, '/');
>         strbuf_add(&parent_pathname, pathname, pathlen);
>
> +       /* Directory requests should be added as if they are a file */

"added" or "matched"?  Also, the description seems a bit brief and
likely to surprise; I'd at least want to expand "file" to "file within
their given directory" but it might be nice to get some summarized
version of the commit message or at least state that "-" is just a
random simple name within the given directory.

> +       if (parent_pathname.len > 1 &&

Is this line...

> +           parent_pathname.buf[parent_pathname.len - 1] == '/')

to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
above ensure the condition is always met?

Or are we trying to avoid adding the "-" when parent_pathname is
just a plain "/"?

> +               strbuf_add(&parent_pathname, "-", 1);
> +

Sorry for all the questions on such a tiny change.  It makes sense to
me, I'm just curious whether it'll confuse future code readers.
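
To make my question about the "> 1" check concrete, here is a minimal
standalone sketch of how I read those lines.  It is not git's code: it
uses a plain char buffer instead of a strbuf, assumes a NUL-terminated
pathname, and build_parent_pathname() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the patched lines in
 * path_matches_pattern_list(), using a plain char buffer instead of
 * git's strbuf. Builds "/" + pathname and, if the result is a
 * directory path (ends in '/') longer than a bare "/", appends the
 * placeholder filename "-".
 */
static void build_parent_pathname(const char *pathname, char *out, size_t outsz)
{
	size_t len;

	/* Room for '/' + pathname + '-' + NUL. */
	assert(strlen(pathname) + 3 <= outsz);

	out[0] = '/';
	strcpy(out + 1, pathname);
	len = strlen(out); /* always >= 1 because of the leading '/' */

	if (len > 1 && out[len - 1] == '/')
		strcat(out, "-");
}
```

With this reading, "deep/dir/" becomes "/deep/dir/-", "deep/file.c" is
left alone as "/deep/file.c", and only an empty pathname yields the bare
"/" that the "> 1" check skips.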


>         if (hashmap_contains_path(&pl->recursive_hashmap,
>                                   &parent_pathname)) {
>                 result = MATCHED_RECURSIVE;
> --

* Re: [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-20 23:26   ` Elijah Newren
  2021-04-21 13:51     ` Derrick Stolee
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-20 23:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> 'git status' began reporting a percentage of populated paths when
> sparse-checkout is enabled in 051df3cf (wt-status: show sparse
> checkout status as well, 2020-07-18). This percentage is incorrect when
> the index has sparse directories. It would also be expensive to
> calculate as we would need to parse trees to count the total number of
> possible paths.
>
> Avoid the expensive computation by simplifying the output to only report
> that a sparse checkout exists, without the percentage.

Makes sense.  The percentage wasn't critical, it was just a nice UI
bonus.  The critical part is notifying about being in a sparse
checkout.

It makes me wonder slightly if we'd want to remove the percentage for
both modes just to keep them more similar.  I'll ask some folks for
their thoughts/opinions.  Of course, that could always be tweaked
later and doesn't necessarily need to go into your series.

> This change is the reason we use 'git status --porcelain=v2' in
> t1092-sparse-checkout-compatibility.sh. We don't want to require that
> this message be equal across both modes; instead, only the important
> information about staged, modified, and untracked files is compared.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
>  wt-status.c                              | 14 +++++++++++---
>  wt-status.h                              |  1 +
>  3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 6598c12a2069..e488ef9bd941 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -196,6 +196,14 @@ test_expect_success 'status with options' '
>         test_all_match git status --porcelain=v2 -uno
>  '
>
> +test_expect_success 'status reports sparse-checkout' '
> +       init_repos &&
> +       git -C sparse-checkout status >full &&
> +       git -C sparse-index status >sparse &&
> +       test_i18ngrep "You are in a sparse checkout with " full &&
> +       test_i18ngrep "You are in a sparse checkout." sparse
> +'
> +
>  test_expect_success 'add, commit, checkout' '
>         init_repos &&
>
> diff --git a/wt-status.c b/wt-status.c
> index 0c8287a023e4..0425169c1895 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
>         if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
>                 return;
>
> -       status_printf_ln(s, color,
> -                        _("You are in a sparse checkout with %d%% of tracked files present."),
> -                        s->state.sparse_checkout_percentage);
> +       if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
> +               status_printf_ln(s, color, _("You are in a sparse checkout."));
> +       else
> +               status_printf_ln(s, color,
> +                               _("You are in a sparse checkout with %d%% of tracked files present."),
> +                               s->state.sparse_checkout_percentage);
>         wt_longstatus_print_trailer(s);
>  }
>
> @@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
>                 return;
>         }
>
> +       if (r->index->sparse_index) {
> +               state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
> +               return;
> +       }
> +
>         for (i = 0; i < r->index->cache_nr; i++) {
>                 struct cache_entry *ce = r->index->cache[i];
>                 if (ce_skip_worktree(ce))
> diff --git a/wt-status.h b/wt-status.h
> index 0d32799b28e1..ab9cc9d8f032 100644
> --- a/wt-status.h
> +++ b/wt-status.h
> @@ -78,6 +78,7 @@ enum wt_status_format {
>  };
>
>  #define SPARSE_CHECKOUT_DISABLED -1
> +#define SPARSE_CHECKOUT_SPARSE_INDEX -2
>
>  struct wt_status_state {
>         int merge_in_progress;
> --
> gitgitgadget

Looks good.
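
For anyone skimming the diff, the sentinel scheme boils down to
something like the sketch below.  It is standalone and only mirrors the
two #defines and the message strings from the patch;
format_sparse_status() is a made-up helper, not a function in
wt-status.c.

```c
#include <assert.h>
#include <stdio.h>

#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/*
 * Hypothetical helper: writes the status line for the given
 * sparse_checkout_percentage value into 'out' and returns the message
 * length. Negative values are sentinels, not real percentages.
 */
static int format_sparse_status(int percentage, char *out, size_t outsz)
{
	assert(outsz > 0);
	assert(percentage >= SPARSE_CHECKOUT_SPARSE_INDEX && percentage <= 100);

	if (percentage == SPARSE_CHECKOUT_DISABLED) {
		out[0] = '\0'; /* not a sparse checkout: print nothing */
		return 0;
	}
	if (percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
		return snprintf(out, outsz, "You are in a sparse checkout.");

	return snprintf(out, outsz,
			"You are in a sparse checkout with %d%% of tracked files present.",
			percentage);
}
```

The point is that the sparse-index case returns before any percentage is
computed, so no per-entry counting is needed for it.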

* Re: [PATCH 05/10] status: use sparse-index throughout
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-21  0:44   ` Elijah Newren
  2021-04-21 13:55     ` Derrick Stolee
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-21  0:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> By testing 'git -c core.fsmonitor= status -uno', we can check for the
> simplest index operations that can be made sparse-aware. The necessary
> implementation details are already integrated with sparse-checkout, so
> modify command_requires_full_index to be zero for cmd_status().
>
> By running the debugger for 'git status -uno' after that change, we find
> two instances of ensure_full_index() that were added for extra safety,
> but can be removed without issue.
>
> In refresh_index(), we loop through the index entries. The
> refresh_cache_ent() method copies the sparse directories into the
> refreshed index without issue.

I do see the removal of a call to ensure_full_index() in
refresh_index() that you mention in this paragraph in the patch below.

I'm confused, though; I would have thought we wanted to avoid a
refresh_cache_ent() call.  Also, one of your previous patches added a

    if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
        continue;

check before the code ever gets to the refresh_cache_ent() call, so as
far as I can tell, that function won't be called from refresh_index()
for sparse entries.  Maybe your commit message here is out-of-date?
Or am I confused somehow?

> The loop within run_diff_files() skips things that are in stage 0 and
> have skip-worktree enabled, so it seems safe to disable ensure_full_index()
> here.

Unlike the above, I don't see a removal of an ensure_full_index() call
in run_diff_files() as claimed by this paragraph.  Has the commit
message gotten out of date with refactorings you did while developing
this series?

> This allows some cases of 'git status' to no longer expand a sparse
> index to a full one, giving the following performance improvements for
> p2000-sparse-operations.sh:
>
> Test                                  HEAD~1           HEAD
> -----------------------------------------------------------------------------
> 2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
> 2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>
> Note that since HEAD~1 was expanding the sparse index by parsing trees,
> it was artificially slower than the full index case. Thus, the 98%
> improvement is misleading, and instead we should celebrate the 0.37s to
> 0.05s improvement of 82%. This is more indicative of the performance
> gains we are expecting by using a sparse index.

82%, very nice.  Was this with git.git as the test repository, or some
other repo?  If it's git.git, then we'd actually expect a much bigger
speedup for other repositories, as git.git is pretty small.


> Note: we are dropping the assignment of core.fsmonitor here. This is not
> necessary for the test script as we are not altering the config any
> other way. Correct integration with FS Monitor will be validated in
> later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c                         |  3 +++
>  read-cache.c                             |  2 --
>  t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
>  3 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index cf0c36d1dcb2..e529da7beadd 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage_with_options(builtin_status_usage, builtin_status_options);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         status_init_config(&s, git_status_config);
>         argc = parse_options(argc, argv, prefix,
>                              builtin_status_options,
> diff --git a/read-cache.c b/read-cache.c
> index 6308234b4838..83e6bdef7604 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>          */
>         preload_index(istate, pathspec, 0);
>         trace2_region_enter("index", "refresh", NULL);
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce, *new_entry;
>                 int cache_errno = 0;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e488ef9bd941..380a085f8ec4 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' '
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>                 git -C sparse-index -c core.fsmonitor="" reset --hard &&
>         test_region index convert_to_sparse trace2.txt &&
> -       test_region index ensure_full_index trace2.txt &&
> +       test_region index ensure_full_index trace2.txt
> +'
>
> -       rm trace2.txt &&
> +test_expect_success 'sparse-index is not expanded' '
> +       init_repos &&
> +
> +       rm -f trace2.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index -c core.fsmonitor="" status -uno &&
> -       test_region index ensure_full_index trace2.txt
> +               git -C sparse-index status -uno &&
> +       test_region ! index ensure_full_index trace2.txt
>  '
>
>  test_done
> --
> gitgitgadget

Other than what looks like a couple issues in the commit message, the
change looks good to me.

* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-21  0:52   ` Elijah Newren
  2021-04-21  0:53     ` Elijah Newren
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-21  0:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The recently-implemented expand_to_path() method can supply position
> queries a faster response if they are specifically asking for a path
> within the sparse cone. Since this is the most-common scenario, this
> provides a significant speedup.
>
> Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
> status' does not expand a sparse index to a full one, even when there
> exist untracked files.
>
> The performance test script p2000-sparse-operations.sh demonstrates
> that this is the final hole to fill to allow 'git status' to speed up
> when using a sparse index:
>
> Test                                  HEAD~1            HEAD
> ------------------------------------------------------------------------------
> 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
> 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%

Um, I'm confused.  In the previous patch you claimed the following speedups:

2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%

I don't understand why the "Before" for this patch claims 1.50 as the
initial speed, if the "After" for the last patch was 0.04.  Should the
previous commit message have instead claimed:

2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%

?
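
(Those percentages are just (after - before) / before on the elapsed
times.  A tiny sketch to double-check them; percent_change() is a
hypothetical helper, not something from the perf suite.)

```c
#include <assert.h>

/*
 * Relative change from 'before' to 'after', in percent.
 * Negative values mean the command got faster.
 */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

percent_change(2.43, 1.50) comes out near -38.3 and
percent_change(2.44, 1.50) near -38.5, matching the rows above.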

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 380a085f8ec4..b937d7096afd 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
>         init_repos &&
>
>         rm -f trace2.txt &&
> +       echo >>sparse-index/untracked.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index status -uno &&
> +               git -C sparse-index status &&
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> --
> gitgitgadget

Oh!  So, the previous patch was testing without enumerating untracked
files (because it did those slowly), whereas this one enumerates
untracked files and is still able to achieve the same performance?
This wasn't very clear from the commit message.  Maybe I'm just bad at
reading, but perhaps the commit message could be tweaked slightly to
make this more clear?

* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-21  0:52   ` Elijah Newren
@ 2021-04-21  0:53     ` Elijah Newren
  2021-04-21 14:03       ` Derrick Stolee
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-21  0:53 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

One more thing:

On Tue, Apr 20, 2021 at 5:52 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > The recently-implemented expand_to_path() method can supply position
> > queries a faster response if they are specifically asking for a path
> > within the sparse cone. Since this is the most-common scenario, this
> > provides a significant speedup.
> >
> > Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
> > status' does not expand a sparse index to a full one, even when there
> > exist untracked files.
> >
> > The performance test script p2000-sparse-operations.sh demonstrates
> > that this is the final hole to fill to allow 'git status' to speed up
> > when using a sparse index:
> >
> > Test                                  HEAD~1            HEAD
> > ------------------------------------------------------------------------------
> > 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
> > 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%
>
> Um, I'm confused.  In the previous patch you claimed the following speedups:
>
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>
> I don't understand why the "Before" for this patch claims 1.50 as the
> initial speed, if the "After" for the last patch was 0.04.  Should the
> previous commit message have instead claimed:
>
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%
>
> ?
>
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  t/t1092-sparse-checkout-compatibility.sh | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> > index 380a085f8ec4..b937d7096afd 100755
> > --- a/t/t1092-sparse-checkout-compatibility.sh
> > +++ b/t/t1092-sparse-checkout-compatibility.sh
> > @@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
> >         init_repos &&
> >
> >         rm -f trace2.txt &&
> > +       echo >>sparse-index/untracked.txt &&
> >         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> > -               git -C sparse-index status -uno &&
> > +               git -C sparse-index status &&
> >         test_region ! index ensure_full_index trace2.txt
> >  '
> >
> > --
> > gitgitgadget
>
> Oh!  So, the previous patch was testing without enumerating untracked
> files (because it did those slowly), whereas this one enumerates
> untracked files and is still able to achieve the same performance?
> This wasn't very clear from the commit message.  Maybe I'm just bad at
> reading, but perhaps the commit message could be tweaked slightly to
> make this more clear?

Why is the subject of this commit "dir: use expand_to_path() ..." if
it only touches t1092-sparse-checkout-compatibility.sh?

* Re: [PATCH 08/10] pathspec: stop calling ensure_full_index
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-04-21  0:57   ` Elijah Newren
  0 siblings, 0 replies; 66+ messages in thread
From: Elijah Newren @ 2021-04-21  0:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The add_pathspec_matches_against_index() method focuses on matching a pathspec
> to file entries in the index. This already works correctly for its only
> use: checking if untracked files exist in the index.
>
> The compatibility checks in t1092 already test that 'git add <dir>'
> works for a directory outside of the sparse cone. That provides coverage
> for removing this guard.
>
> This finalizes our ability to run 'git add .' without expanding a sparse
> index to a full one. This is evidenced by an update to t1092 and by
> these performance numbers for p2000-sparse-operations.sh:
>
> Test                                    HEAD~1            HEAD
> --------------------------------------------------------------------------------
> 2000.10: git add . (full-index-v3)      1.37(1.02+0.18)   1.38(1.01+0.20) +0.7%
> 2000.11: git add . (full-index-v4)      1.26(1.00+0.15)   1.27(0.99+0.17) +0.8%
> 2000.12: git add . (sparse-index-v3)    2.39(2.29+0.14)   0.06(0.05+0.07) -97.5%
> 2000.13: git add . (sparse-index-v4)    2.42(2.32+0.14)   0.06(0.05+0.06) -97.5%
>
> While the 97% improvement is shown by the test results, it is worth
> noting that expanding the sparse index was adding overhead in previous
> commits. Comparing to the full index case, we see the performance go
> from 1.27s to 0.06s, a 95% improvement.

This is awesome.  :-)

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  pathspec.c                               | 2 --
>  t/t1092-sparse-checkout-compatibility.sh | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index 54813c0c4e8e..b51b48471fe6 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>                 if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c210dba78067..738013b00191 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -471,6 +471,12 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/extra.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>                 git -C sparse-index add extra.txt &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +
> +       rm trace2.txt &&
> +       echo >>sparse-index/untracked.txt &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git -C sparse-index add . &&
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> --
> gitgitgadget
>

* Re: [PATCH 10/10] fsmonitor: test with sparse index
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-04-21  7:00   ` Elijah Newren
  0 siblings, 0 replies; 66+ messages in thread
From: Elijah Newren @ 2021-04-21  7:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> During the effort to protect uses of the index to operate on a full
> index, we did not modify fsmonitor.c. This is because it already works
> effectively with only the change to index_name_stage_pos(). The only
> thing left to do is to test that it works correctly.
>
> These tests are added to demonstrate that the behavior is the same
> across a full index and a sparse index, but also that file modifications
> to a tracked directory outside of the sparse cone will trigger
> ensure_full_index().
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index 23879d967297..306157d48abf 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -78,6 +78,7 @@ test_expect_success 'setup' '
>         expect*
>         actual*
>         marker*
> +       trace2*
>         EOF
>  '
>
> @@ -400,4 +401,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
>         )
>  '
>
> +test_expect_success 'status succeeds with sparse index' '
> +       test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +       rm trace2.txt &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +       rm trace2.txt &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +               printf "dir1/modified\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +               printf "dir1a/modified\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region index ensure_full_index trace2.txt &&
> +       test_cmp expect actual

There are a lot of duplicated lines here; would it make sense to have a
helper function you call, making it easier to see the differences
between the four subsections of this test?  Also, do you want to use
test_config instead of git config, so that it automatically gets unset
at the end of the test?

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-20 21:52   ` Elijah Newren
@ 2021-04-21 13:21     ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:21 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 5:52 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> I'll note in particular that it's easy for users after running "git
> add" to run other things such as "git sparse-checkout reapply" or "git
> switch $otherbranch" and suddenly the file disappears from the working
> tree.  From the sparse-checkout machinery that makes sense; this path
> doesn't match the .git/info/sparse-checkout list of paths, so it
> should be removed from the working tree.  But it's very disorienting
> to users.  Especially if some of those commands are side-effects of
> other commands (e.g. our build system invokes "git sparse-checkout
> reapply" in various cases, most common of which is that even a simple
> "git pull" can bring down code with dependency changes and thus a need
> for new sparsity rules and whatnot), but it definitely can just happen
> in ways users don't expect with their own commands (e.g. the git
> switch/checkout example).
> 
> The patch looks good, but it'd be nice if while documenting it we also
> add a comment that we believe we want to change the behavior (for
> sparse-checkout both with and without sparse-index).  It's one of
> those many paper-cuts we still have.

I can add comments to these corner-case tests noting that the behavior is
not intended to be permanent, especially where I already need to comment
on how strangely it is acting.

Thanks,
-Stolee

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-20 23:00   ` Elijah Newren
@ 2021-04-21 13:41     ` Derrick Stolee
  2021-04-21 16:11       ` Elijah Newren
  2021-04-21 17:27     ` Derrick Stolee
  1 sibling, 1 reply; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:41 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:00 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/dir.h b/dir.h
>> index 51cb0e217247..9d6666f520f3 100644
>> --- a/dir.h
>> +++ b/dir.h
>> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>>                                 char *seen)
>>  {
>>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
>> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
>> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> 
> I'm confused why this change would be needed, or why it'd semantically
> be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
> S_ISDIR() is true (and perhaps even vice versa?).
> 
> By chance, was this a leftover from your early RFC changes from a few
> series ago when you had an entirely different mode for sparse
> directory entries?

I will double-check on this with additional testing and debugging.
Your comments below make it clear that this patch would benefit from
some additional splitting.

>>  }
>>
>>  static inline int dir_path_match(struct index_state *istate,
>> diff --git a/preload-index.c b/preload-index.c
>> index e5529a586366..35e67057ca9b 100644
>> --- a/preload-index.c
>> +++ b/preload-index.c
>> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>>                         continue;
>>                 if (S_ISGITLINK(ce->ce_mode))
>>                         continue;
>> +               if (S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>>                 if (ce_uptodate(ce))
>>                         continue;
>>                 if (ce_skip_worktree(ce))
> 
> Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
>  Is this a duplicate check?  If so, is it still desirable for
> future-proofing or code clarity, or is it strictly redundant?

You're right, we could skip this one because the ce_skip_worktree(ce)
is enough to cover this case. I think I created this one because I was
auditing uses of S_ISGITLINK().

>> diff --git a/read-cache.c b/read-cache.c
>> index 29ffa9ac5db9..6308234b4838 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>>                         continue;
>>
>> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>> +
> 
> I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> !ignore_skip_worktree and why it'd be desirable to refresh
> skip-worktree entries.  However, this is tangential to your patch and
> has apparently been around since 2009 (in particular, from 56cac48c35
> ("ie_match_stat(): do not ignore skip-worktree bit with
> CE_MATCH_IGNORE_VALID", 2009-12-14)).

This is probably better served with a statement like this earlier in
the method:

	if (ignore_skip_worktree)
		ensure_full_index(istate);

It seems like ignoring the skip worktree bits is a rare occasion and
it will be worth expanding the index for that case.
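
For reference, the two skip conditions in the quoted hunk can be modeled on their own. This is only a toy sketch: struct demo_entry and demo_count_refreshed() are hypothetical names, and the real loop does much more per entry.

```c
#include <stddef.h>

/* Hypothetical, reduced view of an index entry. */
struct demo_entry {
	int skip_worktree; /* entry carries the skip-worktree bit */
	int sparse_dir;    /* entry is a sparse directory */
};

/* Counts the entries that get past both "continue" checks in the hunk. */
static size_t demo_count_refreshed(const struct demo_entry *entries, size_t nr,
				   int sparse_index, int ignore_skip_worktree)
{
	size_t i, refreshed = 0;

	for (i = 0; i < nr; i++) {
		if (ignore_skip_worktree && entries[i].skip_worktree)
			continue;
		if (sparse_index && entries[i].sparse_dir)
			continue;
		refreshed++;
	}
	return refreshed;
}
```

With one regular entry, one skip-worktree file and one sparse directory, only the skip-worktree file changes status depending on ignore_skip_worktree; the sparse directory is skipped either way whenever the index is sparse.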

>>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>>                         filtered = 1;
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index dddf106d5bd4..9a62e823928a 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>>  {
>>         ce->ce_flags |= CE_UNPACKED;
>>
>> +       /*
>> +        * If this is a sparse directory, don't advance cache_bottom.
>> +        * That will be advanced later using the cache-tree data.
>> +        */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               return;
>> +
> 
> I don't understand cache_bottom stuff; we might want to get Junio to
> look over it.  Or maybe I just need to dig a bit further and attempt
> to understand it.

I remember looking very carefully at this when I created this (and found
it worth a comment) but I don't recall enough off the top of my head.
This is worth splitting out with a careful message, which will force me
to reexamine the cache_bottom member.

>>         if (o->cache_bottom < o->src_index->cache_nr &&
>>             o->src_index->cache[o->cache_bottom] == ce) {
>>                 int bottom = o->cache_bottom;
>> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>>         ce_len -= pathlen;
>>         ce_name = ce->name + pathlen;
>>
>> +       /* remove directory separator if a sparse directory entry */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               ce_len--;
>>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> 
> Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?
> 
> Note the following sort order:
>    foo
>    foo.txt
>    foo/
>    foo/bar
> 
> You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
> but df_name_compare() exists to make "foo" sort exactly where "foo/"
> would when "foo" is a directory.  Will your df_name_compare() call
> here result in foo.txt being placed after all the "foo/<subpath>"
> entries in the index and perhaps cause other problems down the line?
> (Are there issues, e.g. with cache-trees getting wrong ordering from
> this, or even writing out indexes or tree objects with the wrong
> ordering?  I've written out trees to disk with wrong ordering before
> and git usually survives but gets really confused with diffs.)
> 
> Since at least one caller of compare_entry() takes the return result
> and does a "if (cmp < 0)", this order is going to matter in some
> cases.  Perhaps we need some testcases where there is a sparse
> directory entry named "foo/" and a file recorded in some relevant tree
> with the name "foo.txt" to be able to trigger these lines of code?

I will do some testing to find out why removing the separator here was
necessary or valuable.
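
The sort order quoted above is plain byte order: '.' (0x2e) sorts before '/' (0x2f), so "foo.txt" lands between "foo" and "foo/". A small sketch to reproduce it, using hypothetical helper names and nothing from git.git:

```c
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compare two C strings byte by byte. */
static int demo_cmp_paths(const void *a, const void *b)
{
	const char *const *pa = a;
	const char *const *pb = b;

	return strcmp(*pa, *pb);
}

/* Sorts an array of path strings in place, in byte order. */
static void demo_sort_paths(const char **paths, size_t nr)
{
	qsort(paths, nr, sizeof(*paths), demo_cmp_paths);
}
```

Sorting { "foo/bar", "foo/", "foo.txt", "foo" } this way gives foo, foo.txt, foo/, foo/bar, which is the order listed in the question.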

>>  }
>>
>> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>>         if (cmp)
>>                 return cmp;
>>
>> +       /* If ce is a sparse directory, then allow equality here. */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               return 0;
>> +
> 
> Um...so a sparse directory compares equal to _anything_ at all?  I'm
> really confused why this would be desirable.  Am I missing something
> here?

The context that was removed from the patch is that "cmp" is the
response from do_compare_entry(), which does a length-limited comparison.
If cmp is non-zero, then we've already returned the difference.

The rest of the method is checking if the 'info' input is actually a
parent directory of the _path_ given at this cache entry.

>>         /*
>>          * Even if the beginning compared identically, the ce should
>>          * compare as bigger than a directory leading up to it!

The line after this is:

	return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n));

This comparison is saying "these paths match up to the directory specified
by info and n, but we need 'ce' to be a file within that directory." But
in the case of a sparse directory entry, we can skip this comparison.
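
A rough model of that control flow, under heavy simplification: the length-limited comparison is approximated with strncmp() against a single directory path, and struct demo_ce and demo_compare_entry() are hypothetical names rather than the real unpack-trees code.

```c
#include <string.h>

/* Hypothetical, reduced view of a cache entry. */
struct demo_ce {
	const char *name;
	int sparse_dir; /* nonzero for a sparse directory entry such as "folder1/" */
};

/*
 * Compares an entry against the directory currently being traversed.
 * Step 1 stands in for the length-limited comparison; step 2 is the early
 * exit for sparse directories; step 3 is the "must be inside the directory"
 * check that the early exit skips.
 */
static int demo_compare_entry(const struct demo_ce *ce, const char *dir_path)
{
	size_t dir_len = strlen(dir_path);
	size_t ce_len = strlen(ce->name);
	size_t n = ce_len < dir_len ? ce_len : dir_len;
	int cmp = strncmp(ce->name, dir_path, n);

	if (cmp)
		return cmp;
	if (ce->sparse_dir)
		return 0;
	return ce_len > dir_len;
}
```

In this model the sparse entry "folder1/" compares equal to the directory "folder1" only because the prefix comparison already returned zero; an entry with a different prefix never reaches the early exit.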

>> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>>         struct unpack_trees_options *o = info->data;
>>         const struct name_entry *p = names;
>> +       unsigned recurse = 1;
> 
> "recurse" sent my mind off into questions about safety checks, base
> cases, etc., instead of just the simple "we don't want to read in
> directories corresponding to sparse entries".  I think this would be
> clearer either if the variable had the sparsity concept embedded in
> its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
> (!sparse_entry) instead of (recurse) below), or with a comment about
> why there are cases where you want to avoid recursion.

I can understand that. This callback is confusing because it _does_
recurse, but through a sequence of methods instead of actually calling
itself.

It would be better to say something like "unpack_subdirectories = 1"
and disabling it when we are in a sparse directory.
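
Roughly what I have in mind, as a toy sketch. Unlike the real callback, this demo recurses directly, and struct demo_tree and demo_unpack() are made-up names for illustration only.

```c
#include <stddef.h>

/* Hypothetical tree node for the demo. */
struct demo_tree {
	int sparse_dir;                   /* matched a sparse directory entry */
	const struct demo_tree *children; /* subtrees, may be NULL */
	size_t nr_children;
};

/* Returns the number of trees visited, including the one passed in. */
static size_t demo_unpack(const struct demo_tree *tree)
{
	int unpack_subdirectories = 1;
	size_t i, visited = 1;

	/* A sparse directory stands in for its whole subtree. */
	if (tree->sparse_dir)
		unpack_subdirectories = 0;

	if (unpack_subdirectories)
		for (i = 0; i < tree->nr_children; i++)
			visited += demo_unpack(&tree->children[i]);

	return visited;
}
```

The flag name says what is being switched off, which should be easier to follow than a bare "recurse".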

>>
>>         /* Find first entry with a real name (we could use "mask" too) */
>>         while (!p->mode)
>> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                                         }
>>                                 }
>>                                 src[0] = ce;
>> +
>> +                               if (S_ISSPARSEDIR(ce->ce_mode))
>> +                                       recurse = 0;
> 
> Ah, the context here doesn't show it but this is in the "if (!cmp)"
> block, i.e. if we found a match for the sparse directory.  This makes
> sense, to me, _if_ we ignore the above question about sparse
> directories matching equal to anything and everything.

I believe that "anything and everything" concern has been resolved.

>> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                         }
>>                 }
>>
>> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>> +               if (recurse &&
>> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>                                              names, info) < 0)
>>                         return -1;
>>                 return mask;
> 
> Nice.  :-)
> 
> 
> I think your patch was mostly about the recurse stuff, which other
> than the name or a comment about it look good to me.  However, all the
> other preparatory small tweaks brought up a lot of questions or
> confusion for me.  I'm worried there might be a bug or two, though I
> may have just misunderstood some of the code bits.
 
This patch could probably be split up a little to make these things
clearer. Thanks for bringing up the tricky bits.

-Stolee


* Re: [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-20 23:21   ` Elijah Newren
@ 2021-04-21 13:47     ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:47 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:21 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When we have sparse directory entries in the index, we want to compare
>> that directory against sparse-checkout patterns. Those pattern matching
>> algorithms are built expecting a file path, not a directory path. This
>> is especially important in the "cone mode" patterns which will match
>> files that exist within the "parent directories" as well as the
>> recursive directory matches.
>>
>> If path_matches_pattern_list() is given a directory, we can add a fake
>> filename ("-") to the directory and get the same results as before,
>> assuming we are in cone mode. Since sparse index requires cone mode
>> patterns, this is an acceptable assumption.
> 
> Makes sense; thanks for the good description.
> 
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  dir.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/dir.c b/dir.c
>> index 166238e79f52..57e22e605cec 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>>         strbuf_addch(&parent_pathname, '/');
>>         strbuf_add(&parent_pathname, pathname, pathlen);
>>
>> +       /* Directory requests should be added as if they are a file */
> 
> "added" or "matched"?  Also, the description seems a bit brief and
> likely to surprise; I'd at least want to expand "file" to "file within
> their given directory" but it might be nice to get some summarized
> version of the commit message or at least state that "-" is just a
> random simple name within the given directory.

I can improve this comment.

>> +       if (parent_pathname.len > 1 &&
> 
> Is this line...
> 
>> +           parent_pathname.buf[parent_pathname.len - 1] == '/')
> 
> to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
> ">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
> above ensure the condition is always met?
> 
> Or are we trying to avoid adding the "-" when parent_pathname is
> just a plain "/"?

I believe plain "/" is impossible. There needs to be a valid tree entry
before that first slash ("a/", for example). But that isn't super
important to the logic here and just adds confusion.

> 
>> +               strbuf_add(&parent_pathname, "-", 1);
>> +
> 
> Sorry for all the questions on such a tiny change.  It makes sense to
> me, I'm just curious whether it'll confuse future code readers.

Yes, let's avoid confusion by doing the simple thing and use "> 0".
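
As a sketch of the "> 0" version, using a plain char buffer instead of a strbuf. The function name demo_build_match_path() and the fixed-size output buffer are assumptions made for this example only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes "/" followed by pathname into out. If the result ends in '/',
 * appends "-" so that a directory is matched as if it were a file inside
 * that directory. Returns 0 on success, -1 if out is too small.
 */
static int demo_build_match_path(const char *pathname, char *out, size_t outsz)
{
	int n = snprintf(out, outsz, "/%s", pathname);

	if (n < 0 || (size_t)n >= outsz)
		return -1;

	if (n > 0 && out[n - 1] == '/') {
		if ((size_t)n + 1 >= outsz)
			return -1;
		out[n] = '-';
		out[n + 1] = '\0';
	}
	return 0;
}
```

With this form, "folder1/" becomes "/folder1/-" and "folder1/a" is left as "/folder1/a". An empty pathname would also get the suffix, but as noted above that input is not expected.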

Thanks,
-Stolee


* Re: [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-20 23:26   ` Elijah Newren
@ 2021-04-21 13:51     ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:51 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:26 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Avoid the expensive computation by simplifying the output to only report
>> that a sparse checkout exists, without the percentage.
> 
> Makes sense.  The percentage wasn't critical, it was just a nice UI
> bonus.  The critical part is notifying about being in a sparse
> checkout.
> 
> It makes me wonder slightly if we'd want to remove the percentage for
> both modes just to keep them more similar.  I'll ask some folks for
> their thoughts/opinions.  Of course, that could always be tweaked
> later and doesn't necessarily need to go into your series.

I find the percentage helpful for users who are exploring the
sparse-checkout feature in their repositories. It's nice to know how
much time it is saving, because "percentage of files" frequently
translates to "percentage of time it takes to update the worktree".

I was sad to lose it here, but I don't see any way to keep it.
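
To spell out why: the percentage needs a count of file entries with and without the skip-worktree bit. A minimal sketch of that kind of computation follows; demo_sparse_percentage() is a hypothetical name, and the exact formula used by 'git status' is an assumption here.

```c
#include <stddef.h>

/*
 * skip_worktree[i] is nonzero when file entry i has the skip-worktree bit.
 * Returns the percentage of entries that are present in the worktree.
 */
static int demo_sparse_percentage(const int *skip_worktree, size_t nr)
{
	size_t i, skipped = 0;

	if (!nr)
		return 0;
	for (i = 0; i < nr; i++)
		if (skip_worktree[i])
			skipped++;
	return (int)((100 * (nr - skipped)) / nr);
}
```

In a sparse index a single directory entry replaces an unknown number of file entries, so neither the total nor the skipped count is available without expanding the index first.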

Thanks,
-Stolee


* Re: [PATCH 05/10] status: use sparse-index throughout
  2021-04-21  0:44   ` Elijah Newren
@ 2021-04-21 13:55     ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:55 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 8:44 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> By testing 'git -c core.fsmonitor= status -uno', we can check for the
>> simplest index operations that can be made sparse-aware. The necessary
>> implementation details are already integrated with sparse-checkout, so
>> modify command_requires_full_index to be zero for cmd_status().
>>
>> By running the debugger for 'git status -uno' after that change, we find
>> two instances of ensure_full_index() that were added for extra safety,
>> but can be removed without issue.
>>
>> In refresh_index(), we loop through the index entries. The
>> refresh_cache_ent() method copies the sparse directories into the
>> refreshed index without issue.
> 
> I do see the removal of a call to ensure_full_index() in
> refresh_index() that you mention in this paragraph in the patch below.
> 
> I'm confused, though; I would have thought we wanted to avoid a
> refresh_cache_ent() call.  Also, one of your previous patches added a
> 
>     if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>         continue;
> 
> check before the code ever gets to the refresh_cache_ent() call, so as
> far as I can tell, that function won't be called from refresh_entry()
> for sparse entries.  Maybe your commit message here is out-of-date?
> Or am I confused somehow?
> 
>> The loop within run_diff_files() skips things that are in stage 0 and
>> have skip-worktree enabled, so seems safe to disable ensure_full_index()
>> here.
> 
> Unlike the above, I don't see a removal of an ensure_full_index() call
> in run_diff_files() as claimed by this paragraph.  Has the commit
> message gotten out of date with refactorings you did while developing
> this series?

I greatly reduced the number of ensure_full_index() calls in the
previous topic (ds/sparse-index-protections) since first writing this
patch, so it is very likely to be out-of-date. Thanks for calling it out.

>> This allows some cases of 'git status' to no longer expand a sparse
>> index to a full one, giving the following performance improvements for
>> p2000-sparse-checkout-operations.sh:
>>
>> Test                                  HEAD~1           HEAD
>> -----------------------------------------------------------------------------
>> 2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
>> 2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>>
>> Note that since HEAD~1 was expanding the sparse index by parsing trees,
>> it was artificially slower than the full index case. Thus, the 98%
>> improvement is misleading, and instead we should celebrate the 0.37s to
>> 0.05s improvement of 82%. This is more indicative of the performance
>> gains we are expecting by using a sparse index.
> 
> 82%, very nice.  Was this with git.git as the test repository, or some
> other repo?  If it's git.git, then we'd actually expect a much bigger
> speedup for other repositories, as git.git is pretty small.

This test script takes the input repository (git.git in this case) and
creates a tree that contains that repository many times over, but only
four copies remain in the sparse-checkout definition. This creates the
big speedup, because of the enormous difference in index size.
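
For anyone checking the last column of the table, the relative change is 100 * (after - before) / before. A tiny sketch, with demo_percent_change() being a hypothetical helper and not part of the perf test suite:

```c
/* Relative change from before to after, in percent (negative means faster). */
static double demo_percent_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}
```

For 2.43s to 0.04s this gives about -98.4%, and for 0.38s to 0.37s about -2.6%, matching the table. Applied to 0.37s and 0.05s, the same formula gives roughly -86%, so the 82% mentioned in the commit message may have come from a different pair of measurements.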

As I am exploring commands such as 'merge' and 'rebase' I am finding
that this test setup is too expensive to cover those commands. I will
need to reduce the size of the test repository (by a factor of 4) and
that will reduce how impressive these results are while making the more
complicated commands testable in a reasonable amount of time.

Thanks,
-Stolee


* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-21  0:53     ` Elijah Newren
@ 2021-04-21 14:03       ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 14:03 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 8:53 PM, Elijah Newren wrote:
> One more thing:
> 
> On Tue, Apr 20, 2021 at 5:52 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>> Test                                  HEAD~1            HEAD
>>> ------------------------------------------------------------------------------
>>> 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
>>> 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%
>>
>> Um, I'm confused.  In the previous patch you claimed the following speedups:
>>
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>>
>> I don't understand why the "Before" for this patch claims 1.50 as the
>> initial speed, if the "After" for the last patch was 0.04.  Should the
>> previous commit message have instead claimed:
>>
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%
...
>> Oh!  So, the previous patch was testing without enumerating untracked
>> files (because it did those slowly), whereas this one enumerates
>> untracked files and is still able to achieve the same performance?
>> This wasn't very clear from the commit message.  Maybe I'm just bad at
>> reading, but perhaps the commit message could be tweaked slightly to
>> make this more clear?
> 
> Why is the subject of this commit "dir: use expand_to_path() ..." if
> it only touches t1092-sparse-checkout-compatibility.sh?
 
You are right to be confused. This is another patch that was simplified due
to refactors in the protections branch. This should just be squashed into
the previous.

For context: an earlier version inserted ensure_full_index() before
every call to index_name_pos() and then this patch swapped that for
a call to expand_to_path(). The change in the protections branch was
to have index_name_pos() call expand_to_path() itself, preventing the
need for these ensure_full_index() calls.
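
The idea, as a simplified sketch: a lookup only needs the full index when the requested path lies inside one of the sparse directories. The helper below is hypothetical (demo_path_needs_expansion() does not exist in git.git) and assumes each sparse directory name ends with '/'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 when path lies strictly inside one of the sparse directories,
 * meaning the entry for path is hidden behind a sparse directory entry.
 * Returns 0 when path can be looked up without expanding anything.
 */
static int demo_path_needs_expansion(const char *const *sparse_dirs, size_t nr,
				     const char *path)
{
	size_t i, path_len = strlen(path);

	for (i = 0; i < nr; i++) {
		size_t len = strlen(sparse_dirs[i]);

		if (path_len > len && !strncmp(path, sparse_dirs[i], len))
			return 1;
	}
	return 0;
}
```

Paths inside the sparse-checkout cone never match a sparse directory prefix, so lookups for them stay cheap.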

Thanks,
-Stolee


* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-20 21:52   ` Elijah Newren
@ 2021-04-21 15:14   ` Matheus Tavares Bernardino
  2021-04-23 20:12     ` Derrick Stolee
  1 sibling, 1 reply; 66+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 15:14 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

Hi, Stolee

You already said you will make changes in this test to make sure
git-add's sparse warning is kept on a sparse index (BTW thanks for
that :), but I just wanted to give a couple suggestions that came to
my mind while reading the patch.

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..6598c12a2069 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&
> +
> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&
> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +
> +       test_must_fail git -C sparse-checkout add folder1/a &&
> +       test_must_fail git -C sparse-index add folder1/a &&

To make sure the output is the same, could we collapse these two lines into:

test_sparse_match test_must_fail git add folder1/a ?

And additionally, I think we could repeat this check with `add
--refresh` and also after removing `folder1/a`. The reason I'm saying
this is because the check currently succeeds when `folder1/a` is in
the working tree (maybe because `fill_directory()` ends up expanding
the sparse index in this case?), but not under the two other
circumstances I mentioned (as we've discussed in [1]).

[1]: https://lore.kernel.org/git/CAHd-oW7vCKC-XRM=rX37+jQn_XDzjtar9nNHKQ-4OHSZ=2=KFA@mail.gmail.com/

> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_sparse_match git status --porcelain=v2 &&

Hmm, shouldn't this be `test_all_match`? IIUC, we've reset
`folder1/a` on the full repo to make sure the status report is the
same across all repos, right?

> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 13:41     ` Derrick Stolee
@ 2021-04-21 16:11       ` Elijah Newren
  2021-04-22  2:24         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-21 16:11 UTC (permalink / raw)
  To: Derrick Stolee, Matheus Tavares Bernardino
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

// Adding Matheus to cc due to the ignore_skip_worktree bit, given his
experience and expertise with the checkout and unpack-trees code.

On Wed, Apr 21, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> diff --git a/dir.h b/dir.h
> >> index 51cb0e217247..9d6666f520f3 100644
> >> --- a/dir.h
> >> +++ b/dir.h
> >> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
> >>                                 char *seen)
> >>  {
> >>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> >> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> >> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> >
> > I'm confused why this change would be needed, or why it'd semantically
> > be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
> > S_ISDIR() is true (and perhaps even vice versa?).
> >
> > By chance, was this a leftover from your early RFC changes from a few
> > series ago when you had an entirely different mode for sparse
> > directory entries?
>
> I will double-check on this with additional testing and debugging.
> Your comments below make it clear that this patch would benefit from
> some additional splitting.
>
> >>  }
> >>
> >>  static inline int dir_path_match(struct index_state *istate,
> >> diff --git a/preload-index.c b/preload-index.c
> >> index e5529a586366..35e67057ca9b 100644
> >> --- a/preload-index.c
> >> +++ b/preload-index.c
> >> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
> >>                         continue;
> >>                 if (S_ISGITLINK(ce->ce_mode))
> >>                         continue;
> >> +               if (S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >>                 if (ce_uptodate(ce))
> >>                         continue;
> >>                 if (ce_skip_worktree(ce))
> >
> > Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
> >  Is this a duplicate check?  If so, is it still desirable for
> > future-proofing or code clarity, or is it strictly redundant?
>
> You're right, we could skip this one because the ce_skip_worktree(ce)
> is enough to cover this case. I think I created this one because I was
> auditing uses of S_ISGITLINK().
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> This is probably better served with a statement like this earlier in
> the method:
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> It seems like ignoring the skip worktree bits is a rare occasion and
> it will be worth expanding the index for that case.

Maybe...I read the commit message that introduced the behavior and
it's not very convincing to me that SKIP_WORKTREE should be ignored
(it's also not that clear to me what the conditions are; is it just
update-index --really-refresh?); it may be worth double checking on
that assumption first, especially given how many other bugs existed
with skip_worktree stuff for years.  If it's necessary, then I agree
that your extra if-check makes sense.

In particular, I think it'd be really dumb for "update-index
--really-refresh" to read in and populate a huge subdirectory just to
stat files that don't exist because they are in directories that don't
exist.  And I think there's a pretty good argument to not update stat
information for skip_worktree entries in non-sparse-index cases even
in the presence of that flag, especially given Matheus' other recent
changes in this area (the emails just before we got to the point of
discussing SKIP_WORKTREE and racy clean entries...speaking of which,
it might be worthwhile pinging Matheus for opinions on this issue
too.)

> >>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
> >>                         filtered = 1;
> >>
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index dddf106d5bd4..9a62e823928a 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
> >>  {
> >>         ce->ce_flags |= CE_UNPACKED;
> >>
> >> +       /*
> >> +        * If this is a sparse directory, don't advance cache_bottom.
> >> +        * That will be advanced later using the cache-tree data.
> >> +        */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               return;
> >> +
> >
> > I don't understand cache_bottom stuff; we might want to get Junio to
> > look over it.  Or maybe I just need to dig a bit further and attempt
> > to understand it.
>
> I remember looking very carefully at this when I created this (and found
> it worth a comment) but I don't recall enough off the top of my head.
> This is worth splitting out with a careful message, which will force me
> to reexamine the cache_bottom member.
>
> >>         if (o->cache_bottom < o->src_index->cache_nr &&
> >>             o->src_index->cache[o->cache_bottom] == ce) {
> >>                 int bottom = o->cache_bottom;
> >> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
> >>         ce_len -= pathlen;
> >>         ce_name = ce->name + pathlen;
> >>
> >> +       /* remove directory separator if a sparse directory entry */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               ce_len--;
> >>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> >
> > Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?
> >
> > Note the following sort order:
> >    foo
> >    foo.txt
> >    foo/
> >    foo/bar
> >
> > You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
> > but df_name_compare() exists to make "foo" sort exactly where "foo/"
> > would when "foo" is a directory.  Will your df_name_compare() call
> > here result in foo.txt being placed after all the "foo/<subpath>"
> > entries in the index and perhaps cause other problems down the line?
> > (Are there issues, e.g. with cache-trees getting wrong ordering from
> > this, or even writing out indexes or tree objects with the wrong
> > ordering?  I've written out trees to disk with wrong ordering before
> > and git usually survives but gets really confused with diffs.)
> >
> > Since at least one caller of compare_entry() takes the return result
> > and does a "if (cmp < 0)", this order is going to matter in some
> > cases.  Perhaps we need some testcases where there is a sparse
> > directory entry named "foo/" and a file recorded in some relevant tree
> > with the name "foo.txt" to be able to trigger these lines of code?
>
> I will do some testing to find out why removing the separator here was
> necessary or valuable.

I think you removed the separator because df_name_compare() assumes it
gets a regular filename (i.e. no trailing '/') and manually adds one
based on mode for directories.  You were probably worried about what
amounts to a non-sensical double '/', but df_name_compare() wouldn't
actually get to that point unless someone somehow recorded a path
within a git tree object that ended with a trailing '/'.  I'd rather
not have to worry about the double '/' and explain why it isn't
possible (or wonder about whether git trees with trailing '/'
characters could be recorded on some OS), so I think the trimming of
the separator as you did makes sense.

What doesn't make sense to me is that the code just below had a
hardcoded S_IFREG that it passed to df_name_compare, based on "this is
a cache entry, and index entries are _always_ regular files".  You
didn't change that, even though it's now a false assumption.
Symlinks and regular files should be passed as S_IFREG there.  I'm not
sure what should be passed for submodules (though the fact that it's
been using S_IFREG for years suggests maybe that is the mode we want
for them, so we can't just use ce->ce_mode).  And I'm pretty sure sparse
directory entries should be passed as S_IFDIR in order to get the
sorting right, unless you stop stripping the trailing '/' character.
I'm not exactly sure where the sorting for do_compare_entry() affects
the code later, but I tried to trace it out a little in my comments
above in order to guide some testing.
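
To make the ordering concern concrete, here is a simplified sketch of a
directory/file-aware name comparison.  It is NOT git's df_name_compare();
the name sketch_df_compare() and the two-valued mode enum are assumptions
made for illustration, and the special D/F equality rules are left out.
It only shows how the mode argument decides whether a trimmed "foo" sorts
before or after "foo.txt".

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the file-type bits; not git's definitions. */
enum sketch_mode { SKETCH_REG, SKETCH_DIR };

/*
 * Compare two length-delimited names.  When one name runs out, a
 * directory is treated as if it were followed by '/', a regular file
 * as if it were followed by NUL.
 */
static int sketch_df_compare(const char *n1, size_t len1, enum sketch_mode m1,
			     const char *n2, size_t len2, enum sketch_mode m2)
{
	size_t len = len1 < len2 ? len1 : len2;
	unsigned char c1, c2;
	int cmp = memcmp(n1, n2, len);

	if (cmp)
		return cmp;
	if (len1 == len2)
		return 0;

	/* Past the end of a name, pretend to see '/' for directories. */
	c1 = len < len1 ? (unsigned char)n1[len]
			: (unsigned char)(m1 == SKETCH_DIR ? '/' : 0);
	c2 = len < len2 ? (unsigned char)n2[len]
			: (unsigned char)(m2 == SKETCH_DIR ? '/' : 0);
	return (int)c1 - (int)c2;
}
```

With the trailing '/' trimmed, passing SKETCH_REG for "foo" orders it
before "foo.txt", while passing SKETCH_DIR orders it after "foo.txt",
which is where "foo/" belongs.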

> >>  }
> >>
> >> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
> >>         if (cmp)
> >>                 return cmp;
> >>
> >> +       /* If ce is a sparse directory, then allow equality here. */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               return 0;
> >> +
> >
> > Um...so a sparse directory compares equal to _anything_ at all?  I'm
> > really confused why this would be desirable.  Am I missing something
> > here?
>
> The context that was removed from the patch is that "cmp" is the
> response from do_compare_entry(), which does a length-limited comparison.
> If cmp is non-zero, then we've already returned the difference.
>
> The rest of the method is checking if the 'info' input is actually a
> parent directory of the _path_ given at this cache entry.

Ah, thanks for the explanation.  So the only way we get here with
cmp==0 when we're dealing with a sparse directory entry is if we found
a directory by the same name....

> >>         /*
> >>          * Even if the beginning compared identically, the ce should
> >>          * compare as bigger than a directory leading up to it!
>
> The line after this is:
>
>         return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n));
>
> This comparison is saying "these paths match up to the directory specified
> by info and n, but we need 'ce' to be a file within that directory." But
> in the case of a sparse directory entry, we can skip this comparison.

Isn't this "must skip" rather than "can skip"?  If we're considering
the ce path "foo/bar/", then the traverse_path would be "foo/bar" and
we'd have:
    ce_namelen(ce) == 1 + traverse_path_len(info, tree_entry_len(n))
so this would return 1 for the comparison making them be treated as
non-equal even though they are what we consider equal entries.
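
A minimal sketch of that arithmetic, assuming a hypothetical helper
sketch_tail_compare() that stands in for the last step of
compare_entry() (it is only reached once the length-limited prefix
comparison has returned 0):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical tail of a compare_entry()-like function.
 * ce_name:       full name of the cache entry, e.g. "foo/bar/"
 * traverse_len:  length of the traversal path, e.g. strlen("foo/bar")
 * is_sparse_dir: nonzero if the cache entry is a sparse directory
 */
static int sketch_tail_compare(const char *ce_name, size_t traverse_len,
			       int is_sparse_dir)
{
	/* A sparse directory that matched the prefix IS the directory. */
	if (is_sparse_dir)
		return 0;

	/* Otherwise the entry must be a path inside that directory. */
	return strlen(ce_name) > traverse_len;
}
```

For "foo/bar/" against a traversal path of length 7, the length test
alone yields 1 because of the trailing '/', so the early return is
required rather than optional.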

In any event, it seems like this new check could use a better comment
than "then allow equality here".

> >> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> >>         struct unpack_trees_options *o = info->data;
> >>         const struct name_entry *p = names;
> >> +       unsigned recurse = 1;
> >
> > "recurse" sent my mind off into questions about safety checks, base
> > cases, etc., instead of just the simple "we don't want to read in
> > directories corresponding to sparse entries".  I think this would be
> > clearer either if the variable had the sparsity concept embedded in
> > its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
> > (!sparse_entry) instead of (recurse) below), or with a comment about
> > why there are cases where you want to avoid recursion.
>
> I can understand that. This callback is confusing because it _does_
> recurse, but through a sequence of methods instead of actually calling
> itself.
>
> It would be better to say something like "unpack_subdirectories = 1"
> and disabling it when we are in a sparse directory.

I like that name.

>
> >>
> >>         /* Find first entry with a real name (we could use "mask" too) */
> >>         while (!p->mode)
> >> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>                                         }
> >>                                 }
> >>                                 src[0] = ce;
> >> +
> >> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> >> +                                       recurse = 0;
> >
> > Ah, the context here doesn't show it but this is in the "if (!cmp)"
> > block, i.e. if we found a match for the sparse directory.  This makes
> > sense, to me, _if_ we ignore the above question about sparse
> > directories matching equal to anything and everything.
>
> I believe that "anything and everything" concern has been resolved.

Yes, if we just improve the "then allow equality here" comment.

> >> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>                         }
> >>                 }
> >>
> >> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >> +               if (recurse &&
> >> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >>                                              names, info) < 0)
> >>                         return -1;
> >>                 return mask;
> >
> > Nice.  :-)
> >
> >
> > I think your patch was mostly about the recurse stuff, which other
> > than the name or a comment about it look good to me.  However, all the
> > other preparatory small tweaks brought up a lot of questions or
> > confusion for me.  I'm worried there might be a bug or two, though I
> > may have just misunderstood some of the code bits.
>
> This patch could probably be split up a little to make these things
> clearer. Thanks for bringing up the tricky bits.
>
> -Stolee


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-20 23:00   ` Elijah Newren
  2021-04-21 13:41     ` Derrick Stolee
@ 2021-04-21 17:27     ` Derrick Stolee
  2021-04-21 18:55       ` Matheus Tavares Bernardino
  2021-04-21 18:56       ` Elijah Newren
  1 sibling, 2 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-21 17:27 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On 4/20/2021 7:00 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> diff --git a/read-cache.c b/read-cache.c
>> index 29ffa9ac5db9..6308234b4838 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>>                         continue;
>>
>> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>> +
> 
> I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> !ignore_skip_worktree and why it'd be desirable to refresh
> skip-worktree entries.  However, this is tangential to your patch and
> has apparently been around since 2009 (in particular, from 56cac48c35
> ("ie_match_stat(): do not ignore skip-worktree bit with
> CE_MATCH_IGNORE_VALID", 2009-12-14)).

I did some more digging on this part here. There has been movement in
this space!

The thing that triggers this ignore_skip_worktree variable inside
refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
introduced recently and is set only by builtin/add.c:refresh(), by
Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08).

This means that we can (for now) keep the behavior the same by adding

	if (ignore_skip_worktree)
		ensure_full_index(istate);

before the loop. This prevents the expansion during 'git status', but
requires modification before we are ready for 'git add' to work
correctly. Specifically, 'git add' currently warns only when adding
something that exactly matches a tracked file with SKIP_WORKTREE. It
does _not_ warn when adding something that is untracked but would have
the SKIP_WORKTREE bit if it was tracked. We will need to add that
extra warning if we want to avoid expanding during 'git add'.

Alternatively, we can decide to change the behavior here and send an
error() and return failure if they try to add something that would
live within a sparse-directory entry. I will think more on this and
have a good answer before v2 is ready.

Thanks,
-Stolee


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 17:27     ` Derrick Stolee
@ 2021-04-21 18:55       ` Matheus Tavares Bernardino
  2021-04-21 19:10         ` Elijah Newren
  2021-04-21 18:56       ` Elijah Newren
  1 sibling, 1 reply; 66+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 18:55 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> I did some more digging on this part here. There has been movement in
> this space!
>
> The thing that triggers this ignore_skip_worktree variable inside
> refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> introduced recently and is set only by builtin/add.c:refresh(), by
> Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> 2021-04-08).
>
> This means that we can (for now) keep the behavior the same by adding
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> before the loop.

Hmm, I don't think we need to expand the index here.
ignore_skip_worktree makes the loop below ignore entries with the
skip_worktree bit set. Since sparse dirs also have this bit set, we
will already get the behavior we want :)
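
As a toy illustration of that point (the struct, the flag values and
sketch_refresh() are made up for this sketch and are not git's
definitions), a loop that skips on the skip-worktree bit never reaches
a sparse directory entry, as long as every such entry carries that bit:

```c
#include <stddef.h>

/* Toy flag values; not git's CE_SKIP_WORKTREE or mode bits. */
#define SKETCH_SKIP_WORKTREE 0x1
#define SKETCH_SPARSE_DIR    0x2

struct sketch_entry {
	const char *name;
	unsigned flags;
};

/* Returns how many entries the loop would actually process. */
static size_t sketch_refresh(const struct sketch_entry *e, size_t nr,
			     int ignore_skip_worktree)
{
	size_t visited = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		/* Sparse directories have this bit too, so they stop here. */
		if (ignore_skip_worktree &&
		    (e[i].flags & SKETCH_SKIP_WORKTREE))
			continue;
		visited++;
	}
	return visited;
}
```

No expansion step appears in the sketch because the existing check
already filters the collapsed entries.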

However, I think we will need to expand the index at
`find_pathspecs_matching_against_index()` in order to find and warn
about the pathspecs that have matches among skip_worktree entries...

> This prevents the expansion during 'git status', but
> requires modification before we are ready for 'git add' to work
> correctly. Specifically, 'git add' currently warns only when adding
> something that exactly matches a tracked file with SKIP_WORKTREE. It
> does _not_ warn when adding something that is untracked but would have
> the SKIP_WORKTREE bit if it was tracked. We will need to add that
> extra warning if we want to avoid expanding during 'git add'.

Hmm, I see :( I was trying to think if it would be possible to do the
pathspec matching (for the warning) without having to expand the
index, but then there are the untracked files... If the user gives
"a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
is a skip_worktree entry or an untracked file without expanding the
index...
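
The ambiguity can be seen in a small sketch.  The helper name
sketch_path_under_sparse_dir() is hypothetical; it only answers "would
this path live under that sparse directory entry?" by prefix, and it
cannot say whether the path is tracked:

```c
#include <string.h>

/*
 * Hypothetical helper: sparse_dir is the name of a sparse directory
 * entry and is expected to end with '/', e.g. "a/b/".
 * Returns 1 if path names something strictly inside that directory.
 */
static int sketch_path_under_sparse_dir(const char *path,
					const char *sparse_dir)
{
	size_t len = strlen(sparse_dir);

	if (!len || sparse_dir[len - 1] != '/')
		return 0;
	return strlen(path) > len && !strncmp(path, sparse_dir, len);
}
```

The check succeeds for "a/b/c" against "a/b/" whether "a/b/c" is a
skip_worktree entry hidden inside the collapsed tree or an untracked
file, so it is enough for a "would be sparse" warning but not for
telling the two cases apart.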

> Alternatively, we can decide to change the behavior here and send an
> error() and return failure if they try to add something that would
> live within a sparse-directory entry.

I think this behavior would be tricky to replicate on non-sparse-index
sparse-checkouts, if we were to do that. We would have to pathspec
match each untracked file against the sparsity patterns, perhaps?


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 17:27     ` Derrick Stolee
  2021-04-21 18:55       ` Matheus Tavares Bernardino
@ 2021-04-21 18:56       ` Elijah Newren
  2021-04-23 20:16         ` Derrick Stolee
  1 sibling, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-21 18:56 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On Wed, Apr 21, 2021 at 10:27 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> I did some more digging on this part here. There has been movement in
> this space!
>
> The thing that triggers this ignore_skip_worktree variable inside
> refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> introduced recently and is set only by builtin/add.c:refresh(), by
> Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> 2021-04-08).
>
> This means that we can (for now) keep the behavior the same by adding
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> before the loop. This prevents the expansion during 'git status', but
> requires modification before we are ready for 'git add' to work
> correctly. Specifically, 'git add' currently warns only when adding
> something that exactly matches a tracked file with SKIP_WORKTREE. It
> does _not_ warn when adding something that is untracked but would have
> the SKIP_WORKTREE bit if it was tracked. We will need to add that
> extra warning if we want to avoid expanding during 'git add'.
>
> Alternatively, we can decide to change the behavior here and send an
> error() and return failure if they try to add something that would
> live within a sparse-directory entry. I will think more on this and
> have a good answer before v2 is ready.

See my comments on 01/10; users are already getting surprised by "git
add" today, and this has been going on for months (though not super
frequently).  When they try to "git add" an untracked path that would
not match any path specifications in $GIT_DIR/info/sparse-checkout,
the fact that "git add" doesn't error out (or at the very least give a
warning) causes _subsequent_ commands to surprise the user with their
behavior; the fact that it is some later command that does weird stuff
(removing the file from the working tree) makes it harder for them to
try to understand and make sense of.  So, I'd say we do want to change
the behavior here...and not just for sparse-indexes but
sparse-checkouts in general.

As for how this affects the code, I think I'm behind both you and
Matheus on understanding here, but I'm starting to think it was a good
idea for me to spout my offhand comment on what looked like a funny
code smell that I thought was unrelated to your patch.  Sounds like it
is causing some good digging...I'll try to read up more on the results
when you send v2.  :-)


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 18:55       ` Matheus Tavares Bernardino
@ 2021-04-21 19:10         ` Elijah Newren
  2021-04-21 19:51           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-04-21 19:10 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 11:55 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> >
> > >> diff --git a/read-cache.c b/read-cache.c
> > >> index 29ffa9ac5db9..6308234b4838 100644
> > >> --- a/read-cache.c
> > >> +++ b/read-cache.c
> > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > >>                         continue;
> > >>
> > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > >> +                       continue;
> > >> +
> > >
> > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > skip-worktree entries.  However, this is tangential to your patch and
> > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > CE_MATCH_IGNORE_VALID", 2009-12-14)).
> >
> > I did some more digging on this part here. There has been movement in
> > this space!
> >
> > The thing that triggers this ignore_skip_worktree variable inside
> > refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> > introduced recently and is set only by builtin/add.c:refresh(), by
> > Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> > 2021-04-08).
> >
> > This means that we can (for now) keep the behavior the same by adding
> >
> >         if (ignore_skip_worktree)
> >                 ensure_full_index(istate);
> >
> > before the loop.
>
> Hmm, I don't think we need to expand the index here.
> ignore_skip_worktree makes the loop below ignore entries with the
> skip_worktree bit set. Since sparse dirs also have this bit set, we
> will already get the behavior we want :)
>
> However, I think we will need to expand the index at
> `find_pathspecs_matching_against_index()` in order to find and warn
> about the pathspecs that have matches among skip_worktree entries...
>
> > This prevents the expansion during 'git status', but
> > requires modification before we are ready for 'git add' to work
> > correctly. Specifically, 'git add' currently warns only when adding
> > something that exactly matches a tracked file with SKIP_WORKTREE. It
> > does _not_ warn when adding something that is untracked but would have
> > the SKIP_WORKTREE bit if it was tracked. We will need to add that
> > extra warning if we want to avoid expanding during 'git add'.
>
> Hmm, I see :( I was trying to think if it would be possible to do the
> pathspec matching (for the warning) without having to expand the
> index, but then there are the untracked files... If the user gives
> "a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
> is a skip_worktree entry or an untracked file without expanding the
> index...

I thought Stolee's series added something that could allow us to check
that e.g. "a/b/c" corresponded to an entry under the sparse directory
"a/b/" and thus is a would-be-sparse entry.  Can we use that?

> > Alternatively, we can decide to change the behavior here and send an
> > error() and return failure if they try to add something that would
> > live within a sparse-directory entry.
>
> I think this behavior would be tricky to replicate on non-sparse-index
> sparse-checkouts, if we were to do that. We would have to pathspec
> match each untracked file against the sparsity patterns, perhaps?

By way of analogy, don't we have to pay the cost of pathspec matching
each tree entry against the sparsity patterns when doing a checkout
before putting those entries into the index?  Since "git add" is
trying to put new entries into the index, doesn't it make sense for it
to pay the same cost for the untracked paths it is about to place
there?

Sure, that can be expensive for non-cone mode, but that's the price
users pay for using sparse-checkouts and not using cone mode, and they
pay it every time they try to update the index with some new checkout.
I think "git add" should be treated similarly as another way to update
the index -- especially since users will get confused (and have gotten
confused) by subsequent commands if we don't do those checks.
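
For cone mode, such a check can be cheap.  The following is a rough
sketch under stated assumptions, not git's pattern-matching code:
sketch_path_in_cone() is a hypothetical name, and the cone is modeled
as a plain list of recursively included directories (no trailing '/').
It assumes the usual cone rules: files at the root are always in the
cone, files directly inside an ancestor of an included directory are in
the cone, and everything below an included directory is in the cone.

```c
#include <stddef.h>
#include <string.h>

static int sketch_path_in_cone(const char *path,
			       const char *const *dirs, size_t nr)
{
	const char *slash = strrchr(path, '/');
	size_t path_len = strlen(path);
	size_t parent_len;
	size_t i;

	/* Files at the top level are always part of the cone. */
	if (!slash)
		return 1;
	parent_len = (size_t)(slash - path);

	for (i = 0; i < nr; i++) {
		size_t dlen = strlen(dirs[i]);

		/* path lies somewhere below an included directory */
		if (path_len > dlen && !strncmp(path, dirs[i], dlen) &&
		    path[dlen] == '/')
			return 1;

		/* path's directory is a strict ancestor of an included one */
		if (dlen > parent_len &&
		    !strncmp(dirs[i], path, parent_len) &&
		    dirs[i][parent_len] == '/')
			return 1;
	}
	return 0;
}
```

An "add" that wanted to warn or fail could run a test of this shape on
each untracked path before placing it in the index.  Non-cone patterns
would need full pattern matching instead, which is the more expensive
case mentioned above.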


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 19:10         ` Elijah Newren
@ 2021-04-21 19:51           ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 66+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 19:51 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 4:11 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Apr 21, 2021 at 11:55 AM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
> > >
> > > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > > <gitgitgadget@gmail.com> wrote:
> > >
> > > >> diff --git a/read-cache.c b/read-cache.c
> > > >> index 29ffa9ac5db9..6308234b4838 100644
> > > >> --- a/read-cache.c
> > > >> +++ b/read-cache.c
> > > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > > >>                         continue;
> > > >>
> > > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > > >> +                       continue;
> > > >> +
> > > >
> > > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > > skip-worktree entries.  However, this is tangential to your patch and
> > > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > > CE_MATCH_IGNORE_VALID", 2009-12-14)).
> > >
> > > I did some more digging on this part here. There has been movement in
> > > this space!
> > >
> > > The thing that triggers this ignore_skip_worktree variable inside
> > > refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> > > introduced recently and is set only by builtin/add.c:refresh(), by
> > > Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> > > 2021-04-08).
> > >
> > > This means that we can (for now) keep the behavior the same by adding
> > >
> > >         if (ignore_skip_worktree)
> > >                 ensure_full_index(istate);
> > >
> > > before the loop.
> >
> > Hmm, I don't think we need to expand the index here.
> > ignore_skip_worktree makes the loop below ignore entries with the
> > skip_worktree bit set. Since sparse dirs also have this bit set, we
> > will already get the behavior we want :)
> >
> > However, I think we will need to expand the index at
> > `find_pathspecs_matching_against_index()` in order to find and warn
> > about the pathspecs that have matches among skip_worktree entries...
> >
> > > This prevents the expansion during 'git status', but
> > > requires modification before we are ready for 'git add' to work
> > > correctly. Specifically, 'git add' currently warns only when adding
> > > something that exactly matches a tracked file with SKIP_WORKTREE. It
> > > does _not_ warn when adding something that is untracked but would have
> > > the SKIP_WORKTREE bit if it was tracked. We will need to add that
> > > extra warning if we want to avoid expanding during 'git add'.
> >
> > Hmm, I see :( I was trying to think if it would be possible to do the
> > pathspec matching (for the warning) without having to expand the
> > index, but then there are the untracked files... If the user gives
> > "a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
> > is a skip_worktree entry or an untracked file without expanding the
> > index...
>
> I thought Stolee's series added something that could allow us to check
> that e.g. "a/b/c" corresponded to an entry under the sparse directory
> "a/b/" and thus is a would-be-sparse entry.  Can we use that?

Yes, you mean for the warning on untracked paths that would become
sparse entries, right? The problem I was considering there was the
warning on tracked entries only, in which case I'm not sure if it
would help.

> > > Alternatively, we can decide to change the behavior here and send an
> > > error() and return failure if they try to add something that would
> > > live within a sparse-directory entry.
> >
> > I think this behavior would be tricky to replicate on non-sparse-index
> > sparse-checkouts, if we were to do that. We would have to pathspec
> > match each untracked file against the sparsity patterns, perhaps?
>
> By way of analogy, don't we have to pay the cost of pathspec matching
> each tree entry against the sparsity patterns when doing a checkout
> before putting those entries into the index?  Since "git add" is
> trying to put new entries into the index, doesn't it make sense for it
> to pay the same cost for the untracked paths it is about to place
> there?
>
> Sure, that can be expensive for non-cone mode, but that's the price
> users pay for using sparse-checkouts and not using cone mode, and they
> pay it every time they try to update the index with some new checkout.
> I think "git add" should be treated similarly as another way to update
> the index -- especially since users will get confused (and have gotten
> confused) by subsequent commands if we don't do those checks.

Good point. Yeah, that all makes sense :)


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 16:11       ` Elijah Newren
@ 2021-04-22  2:24         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 66+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-22  2:24 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 1:11 PM Elijah Newren <newren@gmail.com> wrote:
>
> // Adding Matheus to cc due to the ignore_skip_worktree bit, given his
> experience and expertise with the checkout and unpack-trees code.
>
> On Wed, Apr 21, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > >>
> > >> diff --git a/read-cache.c b/read-cache.c
> > >> index 29ffa9ac5db9..6308234b4838 100644
> > >> --- a/read-cache.c
> > >> +++ b/read-cache.c
> > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > >>                         continue;
> > >>
> > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > >> +                       continue;
> > >> +
> > >
> > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > skip-worktree entries.

The skip-worktree entries are not really refreshed in refresh_index(),
even when !ignore_skip_worktree (which is the default case; i.e.
without the REFRESH_IGNORE_SKIP_WORKTREE flag).

This flag (which is currently only used by `git add --refresh`'s code
at `builtin/add.c:refresh()`) just makes refresh_index() skip the
following operations on skip-worktree entries: pathspec matching,
marking the matches on `seen`, checking/warning if unmerged, and
marking the entry as up-to-date (i.e. with the in-memory CE_UPTODATE
bit).

I added this flag in mt/add-rm-in-sparse-checkout and changed
`builtin/add.c:refresh()` to use it mainly because we needed a `seen`
array with only matches from non-skip-worktree entries so that we
could later decide when to emit the warning. (In fact, the original
implementation of the flag only controlled whether sparse matches
would be marked on `seen` or not [1])

[1]: https://lore.kernel.org/git/d65b214dd1d83a2e8710a9bbf98477c1929f0d5e.1614138107.git.matheus.bernardino@usp.br/

Perhaps we could alternatively make refresh_index() skip the
previously mentioned operations on all skip-worktree entries
*unconditionally*. I.e. having, early in the loop:

        if (ce_skip_worktree(ce))
                continue;

But I'm not familiar enough with CE_UPTODATE and how it's used in
different parts of the code base, so I didn't want to risk introducing
any bugs at refresh_index() callers that might want/expect the
function to set the CE_UPTODATE bit on the skip-worktree entries. The
case of `git add --refresh` was much narrower and easier to analyze,
and that's what we were interested in for the warning. That's why I
only changed the behavior there :)
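
A toy model of the difference, with everything in it assumed for
illustration (the struct, the flag values, the single prefix standing in
for a pathspec, and the single `seen` byte are not git's data
structures):

```c
#include <stddef.h>
#include <string.h>

/* Toy flag values; not git's CE_SKIP_WORKTREE / CE_UPTODATE. */
#define SKETCH_SKIP_WORKTREE 0x1
#define SKETCH_UPTODATE      0x2

struct sketch_entry {
	const char *name;
	unsigned flags;
};

/*
 * Walk the entries the way the description above suggests: with
 * ignore_skip_worktree set, skip-worktree entries are neither matched
 * against the pathspec, nor recorded in *seen, nor marked up to date.
 */
static void sketch_refresh(struct sketch_entry *e, size_t nr,
			   const char *prefix, int ignore_skip_worktree,
			   char *seen)
{
	size_t plen = strlen(prefix);
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ignore_skip_worktree &&
		    (e[i].flags & SKETCH_SKIP_WORKTREE))
			continue;
		if (strncmp(e[i].name, prefix, plen))
			continue;	/* pathspec does not match */
		*seen = 1;
		e[i].flags |= SKETCH_UPTODATE;
	}
}
```

With the flag set, `seen` only reflects matches among non-skip-worktree
entries, which is what the warning logic needs.  Making the skip
unconditional would also stop the up-to-date marking for every other
caller, which is the risk described above.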

> > > However, this is tangential to your patch and
> > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > CE_MATCH_IGNORE_VALID", 2009-12-14)).

Note that the `CE_MATCH_IGNORE_SKIP_WORKTREE` added in this patch does
control if refresh_cache_ent() will refresh skip-worktree entries, but
refresh_index() always calls this function *without* this flag.


* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-21 15:14   ` Matheus Tavares Bernardino
@ 2021-04-23 20:12     ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-23 20:12 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

On 4/21/2021 11:14 AM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> You already said you will make changes in this test to make sure
> git-add's sparse warning is kept on a sparse index (BTW thanks for
> that :), but I just wanted to give a couple suggestions that came to
> my mind while reading the patch.

I appreciate the suggestions! More tests always help keep me from
making mistakes, and you are definitely more of a 'git add'
expert than I am.
 
>> +       test_must_fail git -C sparse-checkout add folder1/a &&
>> +       test_must_fail git -C sparse-index add folder1/a &&
> 
> To make sure the output is the same, could we collapse these two lines into:
> 
> test_sparse_match test_must_fail git add folder1/a ?

This is elegant. I'm sad I didn't think of it earlier.

> And additionally, I think we could repeat this check with `add
> --refresh` and also after removing `folder1/a`. The reason I'm saying
> this is because the check currently succeeds when `folder1/a` is in
> the working tree (maybe because `fill_directory()` ends up expanding
> the sparse index in this case?), but not under the two other
> circumstances I mentioned (as we've discussed in [1]).
> 
> [1]: https://lore.kernel.org/git/CAHd-oW7vCKC-XRM=rX37+jQn_XDzjtar9nNHKQ-4OHSZ=2=KFA@mail.gmail.com/

Can do!

>> +       git -C full-checkout checkout HEAD -- folder1/a &&
>> +       test_sparse_match git status --porcelain=v2 &&
> 
> Hmm, shouldn't this be `test_all_match`? IIUC, we've reset
> `folder1/a` on the full repo to make sure the status report is the
> same across all repos, right?

Yes!

Thanks,
-Stolee


* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 18:56       ` Elijah Newren
@ 2021-04-23 20:16         ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-04-23 20:16 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On 4/21/2021 2:56 PM, Elijah Newren wrote:
> On Wed, Apr 21, 2021 at 10:27 AM Derrick Stolee <stolee@gmail.com> wrote:
>> Alternatively, we can decide to change the behavior here and send an
>> error() and return failure if they try to add something that would
>> live within a sparse-directory entry. I will think more on this and
>> have a good answer before v2 is ready.
> 
> See my comments on 01/10; users are already getting surprised by "git
> add" today, and this has been going on for months (though not super
> frequently).  When they try to "git add" an untracked path that would
> not match any path specifications in $GIT_DIR/info/sparse-checkout,
> the fact that "git add" doesn't error out (or at the very least give a
> warning) causes _subsequent_ commands to surprise the user with their
> behavior; the fact that it is some later command that does weird stuff
> (removing the file from the working tree) makes the behavior harder
> for them to understand.  So, I'd say we do want to change
> the behavior here...and not just for sparse-indexes but
> sparse-checkouts in general.
> 
> As for how this affects the code, I think I'm behind both you and
> Matheus on understanding here, but I'm starting to think it was a good
> idea for me to spout my offhand comment on what looked like a funny
> code smell that I thought was unrelated to your patch.  Sounds like it
> is causing some good digging...I'll try to read up more on the results
> when you send v2.  :-)

I think there are enough strange things happening with 'git add' that I
want to take some time to figure out the right approach here. In v2, I
will delete the changes to builtin/add.c and instead focus on making
'git status' faster with a sparse-index. The 'git add' improvements
will follow in another series after I take enough time to understand
all of these special modes.

I think this split is especially important if we decide that changing
the behavior is the best thing to do here.

Thanks,
-Stolee


* [PATCH v2 0/8] Sparse-index: integrate with status
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
@ 2021-04-23 21:34 ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                     ` (9 more replies)
  11 siblings, 10 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (8):
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  fsmonitor: test with sparse index

 builtin/commit.c                         |  3 ++
 dir.c                                    | 11 +++++
 read-cache.c                             | 10 +++-
 t/t1092-sparse-checkout-compatibility.sh | 61 ++++++++++++++++++++++--
 t/t7519-status-fsmonitor.sh              | 48 +++++++++++++++++++
 unpack-trees.c                           | 25 ++++++++--
 wt-status.c                              | 14 ++++--
 wt-status.h                              |  1 +
 8 files changed, 161 insertions(+), 12 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v1:

  1:  b2cb5401eff8 !  1:  3bac9edae7d8 t1092: add tests for status/add and sparse files
     @@ Commit message
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
          entirely desirable, but we are not intending to change behavior at the
     -    moment, only document it.
     +    moment, only document it. A future change could alter the behavior to
     +    be more sensible, and this test could be modified to satisfy the new
     +    expected behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	# This "git add folder1/a" is completely ignored
      +	# by the sparse-checkout repos. It causes the
      +	# full repo to have a different staged environment.
     -+	test_must_fail git -C sparse-checkout add folder1/a &&
     -+	test_must_fail git -C sparse-index add folder1/a &&
     ++	#
     ++	# This is not a desirable behavior, but this test
     ++	# ensures that the sparse-index is not the cause
     ++	# of a behavior change.
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++	test_sparse_match test_must_fail git add --refresh folder1/a &&
      +	git -C full-checkout checkout HEAD -- folder1/a &&
     -+	test_sparse_match git status --porcelain=v2 &&
     ++	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
      +	test_all_match git status --porcelain=v2 &&
  -:  ------------ >  2:  19344394379d unpack-trees: preserve cache_bottom
  -:  ------------ >  3:  24e71d8c0622 unpack-trees: compare sparse directories correctly
  2:  0a3892d2ec9e !  4:  d3c8948d0a33 unpack-trees: make sparse aware
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    unpack-trees: make sparse aware
     +    unpack-trees: stop recursing into sparse directories
      
     -    As a first step to integrate 'git status' and 'git add' with the sparse
     -    index, we must start integrating unpack_trees() with sparse directory
     -    entries. These changes are currently impossible to trigger because
     -    unpack_trees() calls ensure_full_index() if command_requires_full_index
     -    is true. This is the case for all commands at the moment. As we expand
     -    more commands to be sparse-aware, we might find that more changes are
     -    required to unpack_trees(). The current changes will suffice for
     -    'status' and 'add'.
     +    When walking trees using traverse_trees_recursive() and
     +    unpack_callback(), we must not attempt to walk into a sparse directory
     +    entry. There are no index entries within that directory to compare to
     +    the tree object at that position, so skip over the entries of that tree.
      
     -    unpack_trees() calls the traverse_trees() API using unpack_callback()
     -    to decide if we should recurse into a subtree. We must add new abilities
     -    to skip a subtree if it corresponds to a sparse directory entry.
     -
     -    It is important to be careful about the trailing directory separator
     -    that exists in the sparse directory entries but not in the subtree
     -    paths.
     +    This code is used in many places, so the only way to test it is to start
     +    removing the command_requres_full_index option from one builtin at a
     +    time and carefully test that its use of unpack_trees() behaves correctly
     +    with a sparse-index. Such tests will be added by later changes.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     - ## dir.h ##
     -@@ dir.h: static inline int ce_path_match(struct index_state *istate,
     - 				char *seen)
     - {
     - 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
     --			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     -+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     - }
     - 
     - static inline int dir_path_match(struct index_state *istate,
     -
     - ## preload-index.c ##
     -@@ preload-index.c: static void *preload_thread(void *_data)
     - 			continue;
     - 		if (S_ISGITLINK(ce->ce_mode))
     - 			continue;
     -+		if (S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     - 		if (ce_uptodate(ce))
     - 			continue;
     - 		if (ce_skip_worktree(ce))
     -
     - ## read-cache.c ##
     -@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     - 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     - 			continue;
     - 
     -+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     -+
     - 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     - 			filtered = 1;
     - 
     -
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
     - {
     - 	ce->ce_flags |= CE_UNPACKED;
     - 
     -+	/*
     -+	 * If this is a sparse directory, don't advance cache_bottom.
     -+	 * That will be advanced later using the cache-tree data.
     -+	 */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return;
     -+
     - 	if (o->cache_bottom < o->src_index->cache_nr &&
     - 	    o->src_index->cache[o->cache_bottom] == ce) {
     - 		int bottom = o->cache_bottom;
     -@@ unpack-trees.c: static int do_compare_entry(const struct cache_entry *ce,
     - 	ce_len -= pathlen;
     - 	ce_name = ce->name + pathlen;
     - 
     -+	/* remove directory separator if a sparse directory entry */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		ce_len--;
     - 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
     - }
     - 
     -@@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
     - 	if (cmp)
     - 		return cmp;
     - 
     -+	/* If ce is a sparse directory, then allow equality here. */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return 0;
     -+
     - 	/*
     - 	 * Even if the beginning compared identically, the ce should
     - 	 * compare as bigger than a directory leading up to it!
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
       	const struct name_entry *p = names;
     -+	unsigned recurse = 1;
     ++	unsigned unpack_tree = 1;
       
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       				src[0] = ce;
      +
      +				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					recurse = 0;
     ++					unpack_tree = 0;
       			}
       			break;
       		}
       	}
       
      -	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     -+	if (recurse &&
     ++	if (unpack_tree &&
      +	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
       		return -1;
       
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       		}
       
      -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     -+		if (recurse &&
     ++		if (unpack_tree &&
      +		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
       					     names, info) < 0)
       			return -1;
  3:  28ca717e6526 !  5:  fd96b71968b6 dir.c: accept a directory as part of cone-mode patterns
     @@ dir.c: enum pattern_match_result path_matches_pattern_list(
       	strbuf_addch(&parent_pathname, '/');
       	strbuf_add(&parent_pathname, pathname, pathlen);
       
     -+	/* Directory requests should be added as if they are a file */
     -+	if (parent_pathname.len > 1 &&
     ++	/*
     ++	 * Directory entries are matched if and only if a file
     ++	 * contained immediately within them is matched. For the
     ++	 * case of a directory entry, modify the path to create
     ++	 * a fake filename within this directory, allowing us to
     ++	 * use the file-base matching logic in an equivalent way.
     ++	 */
     ++	if (parent_pathname.len > 0 &&
      +	    parent_pathname.buf[parent_pathname.len - 1] == '/')
      +		strbuf_add(&parent_pathname, "-", 1);
      +
  4:  e86f874dd412 =  6:  1f4ba56e7416 status: skip sparse-checkout percentage with sparse-index
  5:  d7d4cad8be0b !  7:  3d09368c0541 status: use sparse-index throughout
     @@ Commit message
          implementation details are already integrated with sparse-checkout, so
          modify command_requires_full_index to be zero for cmd_status().
      
     -    By running the debugger for 'git status -uno' after that change, we find
     -    two instances of ensure_full_index() that were added for extra safety,
     -    but can be removed without issue.
     +    In refresh_index(), we loop through the index entries to refresh their
     +    stat() information. However, sparse directories have no stat()
     +    information to populate. Ignore these entries.
      
     -    In refresh_index(), we loop through the index entries. The
     -    refresh_cache_ent() method copies the sparse directories into the
     -    refreshed index without issue.
     +    This allows 'git status' to no longer expand a sparse index to a full
     +    one. This is further tested by dropping the "-uno" option and adding an
     +    untracked file into the worktree.
      
     -    The loop within run_diff_files() skips things that are in stage 0 and
     -    have skip-worktree enabled, so seems safe to disable ensure_full_index()
     -    here.
     -
     -    This allows some cases of 'git status' to no longer expand a sparse
     -    index to a full one, giving the following performance improvements for
     -    p2000-sparse-checkout-operations.sh:
     +    The performance test p2000-sparse-checkout-operations.sh demonstrates
     +    these improvements:
      
          Test                                  HEAD~1           HEAD
          -----------------------------------------------------------------------------
     -    2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
     -    2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
     -    2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
     -    2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
     +    2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
     +    2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
     +    2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
     +    2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%
      
          Note that since HEAD~1 was expanding the sparse index by parsing trees,
          it was artificially slower than the full index case. Thus, the 98%
     -    improvement is misleading, and instead we should celebrate the 0.37s to
     -    0.05s improvement of 82%. This is more indicative of the peformance
     +    improvement is misleading, and instead we should celebrate the 0.34s to
     +    0.05s improvement of 85%. This is more indicative of the peformance
          gains we are expecting by using a sparse index.
      
          Note: we are dropping the assignment of core.fsmonitor here. This is not
     @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
       	trace2_region_enter("index", "refresh", NULL);
      -	/* TODO: audit for interaction with sparse-index. */
      -	ensure_full_index(istate);
     ++
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce, *new_entry;
       		int cache_errno = 0;
     +@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     + 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     + 			continue;
     + 
     ++		/*
     ++		 * If this entry is a sparse directory, then there isn't
     ++		 * any stat() information to update. Ignore the entry.
     ++		 */
     ++		if (S_ISSPARSEDIR(ce->ce_mode))
     ++			continue;
     ++
     + 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     + 			filtered = 1;
     + 
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' '
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is e
      +	init_repos &&
      +
      +	rm -f trace2.txt &&
     ++	echo >>sparse-index/untracked.txt &&
       	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      -		git -C sparse-index -c core.fsmonitor="" status -uno &&
      -	test_region index ensure_full_index trace2.txt
     -+		git -C sparse-index status -uno &&
     ++		git -C sparse-index status &&
      +	test_region ! index ensure_full_index trace2.txt
       '
       
  6:  434306541613 <  -:  ------------ dir: use expand_to_path() for sparse directories
  7:  f1a9ce4ef0e5 <  -:  ------------ add: allow operating on a sparse-only index
  8:  6d7f30f2b90a <  -:  ------------ pathspec: stop calling ensure_full_index
  9:  75199bbe8ca1 <  -:  ------------ t7519: add sparse directories to FS monitor tests
 10:  9d1183ddd280 !  8:  1fd033a6ebb2 fsmonitor: test with sparse index
     @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
       	)
       '
       
     -+test_expect_success 'status succeeds with sparse index' '
     -+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++# Usage:
     ++# check_sparse_index_behavior [!]
     ++# If "!" is supplied, then we verify that we do not call ensure_full_index
     ++# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
     ++check_sparse_index_behavior () {
      +	git status --porcelain=v2 >expect &&
      +	git sparse-checkout init --cone --sparse-index &&
     ++	git sparse-checkout set dir1 dir2 &&
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      +		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     ++	test_region $1 index ensure_full_index trace2.txt &&
      +	test_cmp expect actual &&
      +	rm trace2.txt &&
     ++	git sparse-checkout disable
     ++}
     ++
     ++test_expect_success 'status succeeds with sparse index' '
     ++	git reset --hard &&
     ++
     ++	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +	EOF
      +	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     -+	rm trace2.txt &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     ++	check_sparse_index_behavior ! &&
      +
     ++	cp -r dir1 dir1a &&
     ++	git add dir1a &&
     ++	git commit -m "add dir1a" &&
     ++
     ++	# This one modifies outside the sparse-checkout definition
     ++	# and hence we expect to expand the sparse-index.
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1a/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual
     ++	check_sparse_index_behavior
      +'
      +
       test_done

-- 
gitgitgadget


* [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13 12:40     ` Matheus Tavares Bernardino
  2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file
'folder1/a' is shown as modified, but adding it fails. This is new
behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
entries, 2021-04-08). Before that change, these adds would be silently
ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..0ec487acd283 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" is completely ignored
+	# by the sparse-checkout repos. It causes the
+	# full repo to have a different staged environment.
+	#
+	# This is not a desirable behavior, but this test
+	# ensures that the sparse-index is not the cause
+	# of a behavior change.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	git -C full-checkout checkout HEAD -- folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v2 2/8] unpack-trees: preserve cache_bottom
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances, so do
not also update it for such entries within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
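
The rule in the commit message can be illustrated with a toy model. This
is a sketch only: 'struct toy_ce', 'struct toy_options' and
toy_mark_ce_used() are hypothetical names, and the loop that advances
the bottom past already-unpacked entries is an assumption about the
surrounding code that the hunk below does not show in full.

```c
#include <stddef.h>

struct toy_ce {
	int unpacked;    /* stands in for CE_UNPACKED */
	int sparse_dir;  /* stands in for S_ISSPARSEDIR(ce->ce_mode) */
};

struct toy_options {
	struct toy_ce *cache;
	size_t cache_nr;
	size_t cache_bottom;
};

/*
 * Mark an entry as used. For ordinary entries, advance cache_bottom
 * past every leading entry that is already unpacked. For sparse
 * directory entries, leave cache_bottom alone: in this model the
 * cache-tree walk is responsible for moving it.
 */
static void toy_mark_ce_used(struct toy_ce *ce, struct toy_options *o)
{
	ce->unpacked = 1;

	if (ce->sparse_dir)
		return;

	if (o->cache_bottom < o->cache_nr &&
	    &o->cache[o->cache_bottom] == ce) {
		size_t bottom = o->cache_bottom;

		while (bottom < o->cache_nr && o->cache[bottom].unpacked)
			bottom++;
		o->cache_bottom = bottom;
	}
}
```

With a cache of { file, sparse directory, file }, marking the first file
moves the bottom to index 1; marking the sparse directory sets its flag
but leaves the bottom at 1.
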
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v2 3/8] unpack-trees: compare sparse directories correctly
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  3:26     ` Elijah Newren
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we no longer pass S_IFREG unconditionally: sparse
directories use S_IFDIR to indicate that the comparison should treat
the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the leading
portion of the name. When the input path is a directory name, that call
could already find an exact match. Thus, we should return 0 if we have
an exact string match on a sparse directory entry.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..3af797093095 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/* If ce is a sparse directory, then allow an exact match. */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  3:31     ` Elijah Newren
  2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When walking trees using traverse_trees_recursive() and
unpack_callback(), we must not attempt to walk into a sparse directory
entry. There are no index entries within that directory to compare to
the tree object at that position, so skip over the entries of that tree.

This code is used in many places, so the only way to test it is to start
removing the command_requires_full_index option from one builtin at a
time and carefully test that its use of unpack_trees() behaves correctly
with a sparse-index. Such tests will be added by later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
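
Note for readers: the idea of "do not walk into a sparse directory" can be
sketched with a toy tree walk. Everything here (struct toy_node, toy_walk())
is hypothetical and only stands in for the real traversal; the local
variable descend plays the role that unpack_tree plays in the patch.

```c
#include <stddef.h>

/* Hypothetical toy node; this is not a Git data structure. */
struct toy_node {
	int is_sparse_dir;	/* stands in for S_ISSPARSEDIR(ce->ce_mode) */
	size_t nr_children;
	const struct toy_node *children;
};

/*
 * Count the nodes visited. A sparse directory is visited itself, but
 * nothing below it is walked, because no index entries exist there.
 */
static size_t toy_walk(const struct toy_node *node)
{
	size_t visited = 1;
	size_t i;
	int descend = !node->is_sparse_dir;

	if (!descend)
		return visited;
	for (i = 0; i < node->nr_children; i++)
		visited += toy_walk(&node->children[i]);
	return visited;
}
```

For a root with two directories of two files each, marking one directory as
sparse reduces the number of visited nodes from 7 to 5.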

diff --git a/unpack-trees.c b/unpack-trees.c
index 3af797093095..67777570f829 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1256,6 +1256,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned unpack_tree = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1297,12 +1298,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 					}
 				}
 				src[0] = ce;
+
+				if (S_ISSPARSEDIR(ce->ce_mode))
+					unpack_tree = 0;
 			}
 			break;
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_tree &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1332,7 +1337,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (unpack_tree &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget



* [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
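
Note for readers: the fake-filename trick can be sketched on plain C
strings. The helper names below are hypothetical and the buffer handling is
an assumption for illustration; the patch itself works on a strbuf. The
point is that after appending "-", stripping the last path component yields
the directory itself, so file-based parent lookups behave as they would for
any file directly inside that directory.

```c
#include <string.h>

/*
 * If the path in buf ends in '/', append the placeholder file name "-".
 * Returns 1 if the placeholder was added, 0 if the path was left alone,
 * and -1 if buf (capacity cap, including the NUL) is too small.
 */
static int add_placeholder_name(char *buf, size_t cap)
{
	size_t len = strlen(buf);

	if (!len || buf[len - 1] != '/')
		return 0;
	if (len + 2 > cap)
		return -1;
	buf[len] = '-';
	buf[len + 1] = '\0';
	return 1;
}

/* Truncate buf at its last '/', leaving the containing directory. */
static void strip_last_component(char *buf)
{
	char *slash = strrchr(buf, '/');

	if (slash)
		*slash = '\0';
}
```

Starting from "/deep/folder/", the placeholder step gives "/deep/folder/-",
and stripping the last component gives "/deep/folder".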

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not want to require that
this message be identical across both modes; instead, we compare only
the important information about staged, modified, and untracked files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)
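
Note for readers: the message selection can be sketched in isolation. The
two macros and the message strings are taken from this patch;
format_sparse_message() is a hypothetical helper that writes into a buffer
instead of calling status_printf_ln(), and it skips translation.

```c
#include <stdio.h>

#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/*
 * Hypothetical helper: write the status line for the given percentage
 * state into buf. Returns the message length, or 0 when sparse-checkout
 * is disabled and nothing should be printed.
 */
static int format_sparse_message(char *buf, size_t cap, int percentage)
{
	if (percentage == SPARSE_CHECKOUT_DISABLED) {
		if (cap)
			buf[0] = '\0';
		return 0;
	}
	/* With a sparse index, the percentage is not computed at all. */
	if (percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
		return snprintf(buf, cap, "You are in a sparse checkout.");
	return snprintf(buf, cap,
			"You are in a sparse checkout with %d%% of tracked files present.",
			percentage);
}
```

Using negative sentinel values keeps the existing int field while leaving
the whole non-negative range free for real percentages.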

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0ec487acd283..0dc551b25f67 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -196,6 +196,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v2 7/8] status: use sparse-index throughout
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)
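
Note for readers: the percentages in the timing table of the commit message
are relative changes in wall-clock time. A minimal sketch of that
arithmetic, with a hypothetical helper name:

```c
/*
 * Relative change from "before" to "after", in percent.
 * Negative values mean the command became faster.
 * The caller must pass a non-zero "before".
 */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For example, percent_change(2.35, 0.04) is about -98.3, which matches the
sparse-index-v3 row, while percent_change(0.34, 0.05) is about -85.3, the
comparison against the full-index time that the commit message recommends
quoting instead.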

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0dc551b25f67..5a8fe88dc894 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -453,12 +453,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v2 8/8] fsmonitor: test with sparse index
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During the effort to protect uses of the index to operate on a full
index, we did not modify fsmonitor.c. This is because it already works
effectively with only the change to index_name_stage_pos(). The only
thing left to do is to test that it works correctly.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v2 3/8] unpack-trees: compare sparse directories correctly
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-13  3:26     ` Elijah Newren
  0 siblings, 0 replies; 66+ messages in thread
From: Elijah Newren @ 2021-05-13  3:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As we further integrate the sparse-index into unpack-trees, we need to
> ensure that we compare sparse directory entries correctly with other
> entries. This affects searching for an exact path as well as sorting
> index entries.
>
> Sparse directory entries contain the trailing directory separator. This
> is important for the sorting, in particular. Thus, within
> do_compare_entry() we stop using S_IFREG in all cases, since sparse
> directories should use S_IFDIR to indicate that the comparison should
> treat the entry name as a directory.
>
> Within compare_entry(), it first calls do_compare_entry() to check the
> leading portion of the name. When the input path is a directory name, we
> could match exactly already. Thus, we should return 0 if we have an
> exact string match on a sparse directory entry.

Thanks for splitting up patch 2 from the original series; it's much
easier to understand these separate patches.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 1067db19c9d2..3af797093095 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
>         int pathlen, ce_len;
>         const char *ce_name;
>         int cmp;
> +       unsigned ce_mode;
>
>         /*
>          * If we have not precomputed the traverse path, it is quicker
> @@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> -       return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> +       ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;

Ah, so here the fact that S_ISSPARSEDIR is defined as
   #define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
whereas S_ISDIR is defined as
   #define S_ISDIR(m)      (((m) & S_IFMT) == S_IFDIR)
turns out to be critically important, because if you used S_ISDIR()
here, then we'd get ce_mode = S_IFDIR for submodules and break the
sorting.  S_ISSPARSEDIR() gives us the correct value.

> +       return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
>  }
>
>  static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
> @@ -1000,6 +1002,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow an exact match. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;

I think the comment from the commit message belongs in the code; the
comment in the code is too jarring without the more detailed
explanation.

> +
>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> --
> gitgitgadget


* Re: [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-13  3:31     ` Elijah Newren
  0 siblings, 0 replies; 66+ messages in thread
From: Elijah Newren @ 2021-05-13  3:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When walking trees using traverse_trees_recursive() and
> unpack_callback(), we must not attempt to walk into a sparse directory
> entry. There are no index entries within that directory to compare to
> the tree object at that position, so skip over the entries of that tree.
>
> This code is used in many places, so the only way to test it is to start
> removing the command_requires_full_index option from one builtin at a
> time and carefully test that its use of unpack_trees() behaves correctly
> with a sparse-index. Such tests will be added by later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3af797093095..67777570f829 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1256,6 +1256,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned unpack_tree = 1;
>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1297,12 +1298,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       unpack_tree = 0;
>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_tree &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1332,7 +1337,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (unpack_tree &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;
> --
> gitgitgadget

The splitting of the previous patch looks really good here too, and
the variable rename makes it flow nicely.  Looking good.


* Re: [PATCH v2 0/8] Sparse-index: integrate with status
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-05-13  4:12   ` Elijah Newren
  2021-05-14 18:28     ` Derrick Stolee
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 66+ messages in thread
From: Elijah Newren @ 2021-05-13  4:12 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these are checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.
>
> The performance tests in p2000-sparse-operations.sh improve by 95% or more,
> even when compared with the full-index cases, not just the sparse-index
> cases that previously had extra overhead.
>
> Hopefully this is the first example of how ds/sparse-index-protections has
> done the basic work to do these conversions safely, making them look easier
> than they seemed when starting this adventure.
>
> Thanks, -Stolee
>
>
> Updates in V2
> =============
>
>  * Based on the feedback, it is clear that 'git add' will require much more
>    careful testing and thought. I'm splitting it out of this series and it
>    will return with a follow-up.
>  * Test cases are improved, both in coverage and organization.
>  * The previous "unpack-trees: make sparse aware" patch is split into three
>    now.
>  * Stale messages based on an old implementation of the "protections" topic
>    are now fixed.
>  * Performance tests were re-run.

I read through the topic: my old comments, the range-diff, and
the new patches where the range-diff wasn't enough.  I tried to spot
issues, and was hoping to find problems you alluded to in your recent
comments at https://lore.kernel.org/git/05932ebc-04ac-b3c5-a460-5d37d8604fd9@gmail.com/,
but I failed to spot them.  I hope it has to do with the cache bottom
stuff that I just don't understand, because otherwise I just missed
the problems in my review.  I can say that in v2 you fixed the issues
I did spot in my review of v1.

I'll look forward to v3 to see what it was I missed.  If I somehow
don't respond soon (in a week at the latest), do feel free to ping me;
sorry for somehow having this one slip through the cracks.


* Re: [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-13 12:40     ` Matheus Tavares Bernardino
  2021-05-14 12:27       ` Derrick Stolee
  0 siblings, 1 reply; 66+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-13 12:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 6:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before moving to update 'git status' and 'git add' to work with sparse
> indexes, add an explicit test that ensures the sparse-index works the
> same as a normal sparse-checkout when the worktree contains directories
> and files outside of the sparse cone.
>
> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
> not in the sparse cone. When 'folder1/a' is modified, the file
> 'folder1/a' is shown as modified, but adding it fails.

Hmm, I might be doing something wrong, but I think `folder1/a` is not
shown as modified.

$ git init test
$ mkdir test/folder1
$ echo original >test/folder1/a
$ echo original >test/b
$ git -C test add . && git -C test commit -m files
$ git -C test sparse-checkout init --cone --sparse-index
$ ls test
b
$ mkdir test/folder1 && echo modified >test/folder1/a
$ git -C test status
On branch master
You are in a sparse checkout with 50% of tracked files present.
nothing to commit, working tree clean

> This is new
> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
> entries, 2021-04-08). Before that change, these adds would be silently
> ignored.
>
> Untracked files are fine: adding new files both with 'git add .' and
> 'git add folder1/' works just as in a full checkout. This may not be
> entirely desirable, but we are not intending to change behavior at the
> moment, only document it. A future change could alter the behavior to
> be more sensible, and this test could be modified to satisfy the new
> expected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..0ec487acd283 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&

A minor suggestion: before recreating folder1/a, we could also test
that `git add folder1/a` will not remove the sparse entry from the
index and will properly warn about it on both sparse repos. I.e.
adding a:

        test_sparse_match test_must_fail git add folder1/a

> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&

Hmm, we modify `folder1/a` in all repos, but we only try adding it on
the sparse repos, and then we immediately restore it on the full repo.
As we won't use the modified version on the full repo, could this
perhaps be `run_on_sparse` instead? If so, we could also save the
later `git -C full-checkout checkout HEAD -- folder1/a`.

> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +       #
> +       # This is not a desirable behavior, but this test
> +       # ensures that the sparse-index is not the cause
> +       # of a behavior change.

I'm not sure I understand what the undesirable behavior is in this
sentence. Is it "git add folder1/a" erroring out and not updating
`folder1/a`? Or the full repo having a different staged environment?

> +       test_sparse_match test_must_fail git add folder1/a &&
> +       test_sparse_match test_must_fail git add --refresh folder1/a &&
> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>


* Re: [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-05-13 12:40     ` Matheus Tavares Bernardino
@ 2021-05-14 12:27       ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-05-14 12:27 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

On 5/13/2021 8:40 AM, Matheus Tavares Bernardino wrote:
> On Fri, Apr 23, 2021 at 6:34 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Before moving to update 'git status' and 'git add' to work with sparse
>> indexes, add an explicit test that ensures the sparse-index works the
>> same as a normal sparse-checkout when the worktree contains directories
>> and files outside of the sparse cone.
>>
>> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
>> not in the sparse cone. When 'folder1/a' is modified, the file
>> 'folder1/a' is shown as modified, but adding it fails.
> 
> Hmm, I might be doing something wrong, but I think `folder1/a` is not
> shown as modified.
> 
> $ git init test
> $ mkdir test/folder1
> $ echo original >test/folder1/a
> $ echo original >test/b
> $ git -C test add . && git -C test commit -m files
> $ git -C test sparse-checkout init --cone --sparse-index
> $ ls test
> b
> $ mkdir test/folder1 && echo modified >test/folder1/a
> $ git -C test status
> On branch master
> You are in a sparse checkout with 50% of tracked files present.
> nothing to commit, working tree clean

You are correct. This happens in both the sparse-index case and the
regular full-index case. Modifications outside of the sparse-checkout
definition are ignored, as long as they match a tracked file.

I checked my latest code against this example and see that the sparse
index is not expanded to a full one. It _will_ be if we add an untracked
file outside of the sparse cone.

>> This is new
>> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
>> entries, 2021-04-08). Before that change, these adds would be silently
>> ignored.
>>
>> Untracked files are fine: adding new files both with 'git add .' and
>> 'git add folder1/' works just as in a full checkout. This may not be
>> entirely desirable, but we are not intending to change behavior at the
>> moment, only document it. A future change could alter the behavior to
>> be more sensible, and this test could be modified to satisfy the new
>> expected behavior.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 12e6c453024f..0ec487acd283 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
>>         test_all_match git checkout -
>>  '
>>
>> +test_expect_success 'status/add: outside sparse cone' '
>> +       init_repos &&
> 
> A minor suggestion: before recreating folder1/a, we could also test
> that `git add folder1/a` will not remove the sparse entry from the
> index and will properly warn about it on both sparse repos. I.e.
> adding a:
> 
>         test_sparse_match test_must_fail git add folder1/a

Will do.

>> +       # folder1 is at HEAD, but outside the sparse cone
>> +       run_on_sparse mkdir folder1 &&
>> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
>> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
>> +
>> +       test_sparse_match git status &&
>> +
>> +       write_script edit-contents <<-\EOF &&
>> +       echo text >>$1
>> +       EOF
>> +       run_on_all ../edit-contents folder1/a &&
> 
> Hmm, we modify `folder1/a` in all repos, but we only try adding it on
> the sparse repos, and then we immediately restore it on the full repo.
> As we won't use the modified version on the full repo, could this
> perhaps be `run_on_sparse` instead? If so, we could also save the
> later `git -C full-checkout checkout HEAD -- folder1/a`.

Good idea.

>> +       run_on_all ../edit-contents folder1/new &&
>> +
>> +       test_sparse_match git status --porcelain=v2 &&
>> +
>> +       # This "git add folder1/a" is completely ignored
>> +       # by the sparse-checkout repos. It causes the
>> +       # full repo to have a different staged environment.
>> +       #
>> +       # This is not a desirable behavior, but this test
>> +       # ensures that the sparse-index is not the cause
>> +       # of a behavior change.
> 
> I'm not sure I understand what the undesirable behavior is in this
> sentence. Is it "git add folder1/a" erroring out and not updating
> `folder1/a`? Or the full repo having a different staged environment?

Perhaps this isn't actually undesirable, now that we are actually
returning an error. It's no longer silent, so maybe my comment is
stale from an earlier version.

Thanks,
-Stolee


* Re: [PATCH v2 0/8] Sparse-index: integrate with status
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
@ 2021-05-14 18:28     ` Derrick Stolee
  0 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee @ 2021-05-14 18:28 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee

On 5/13/2021 12:12 AM, Elijah Newren wrote:
> On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' very fast when a sparse-index is enabled on a repository with
>> cone-mode sparse-checkout (and a small populated set).
> 
> I read through the topic, both my old comments, the range-diff, and
> the new patches where the range-diff wasn't enough.  I tried to spot
> issues, and was hoping to find problems you alluded to in your recent
> comments at https://lore.kernel.org/git/05932ebc-04ac-b3c5-a460-5d37d8604fd9@gmail.com/,
> but I failed to spot them.  I hope it has to do with the cache bottom
> stuff that I just don't understand, because otherwise I just missed
> the problems in my review.  I can say that in v2 you fixed the issues
> I did spot in my review of v1.
> 
> I'll look forward to v3 to see what it was I missed.  If I somehow
> don't respond soon (in a week at the latest), do feel free to ping me;
> sorry for somehow having this one slip through the cracks.

v3 is on the way. The changes related to issues I found in my
deeper testing are more about what wasn't previously tested in
my test script as opposed to things actually being wrong in
the patch series. (There is one case where some new code was
incorrect, but it wasn't being tested because of the test repo's
data shape.)

Thanks,
-Stolee


* [PATCH v3 00/12] Sparse-index: integrate with status
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
@ 2021-05-14 18:30   ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                       ` (11 more replies)
  9 siblings, 12 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:30 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these is checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature, inherited from special cases that arose in the
virtualized environment of VFS for Git. This version contains my fixes based
on that investigation. Most of these were easy to identify and fix. However,
I was blocked for a long time by a bug that appears when combining the
sparse-index with the builtin FS Monitor feature; I've reported my findings
already [1].

[1]
https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status, for example when a
     sparse-directory is staged but doesn't exist at HEAD (as in an
     orphaned commit), we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare so the FS Monitor feature can work
     efficiently.


Updates in V2
-------------

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (12):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |   6 ++
 dir.c                                    |  11 +++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++++-
 t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++++++
 unpack-trees.c                           |  27 +++++-
 wt-status.c                              |  64 ++++++++++++-
 wt-status.h                              |   1 +
 10 files changed, 300 insertions(+), 14 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v2:

  -:  ------------ >  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  -:  ------------ >  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  -:  ------------ >  3:  70971b1f9261 t1092: expand repository data shape
  1:  3bac9edae7d8 !  4:  a80b5a41153f t1092: add tests for status/add and sparse files
     @@ Commit message
          and files outside of the sparse cone.
      
          Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
     -    not in the sparse cone. When 'folder1/a' is modified, the file
     -    'folder1/a' is shown as modified, but adding it fails. This is new
     -    behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
     -    entries, 2021-04-08). Before that change, these adds would be silently
     -    ignored.
     +    not in the sparse cone. When 'folder1/a' is modified, the file is not
     +    shown as modified and adding it will fail. This is new behavior as of
     +    a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
     +    2021-04-08). Before that change, these adds would be silently ignored.
      
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +test_expect_success 'status/add: outside sparse cone' '
      +	init_repos &&
      +
     ++	# adding a "missing" file outside the cone should fail
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++
      +	# folder1 is at HEAD, but outside the sparse cone
      +	run_on_sparse mkdir folder1 &&
      +	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	write_script edit-contents <<-\EOF &&
      +	echo text >>$1
      +	EOF
     -+	run_on_all ../edit-contents folder1/a &&
     ++	run_on_sparse ../edit-contents folder1/a &&
      +	run_on_all ../edit-contents folder1/new &&
      +
      +	test_sparse_match git status --porcelain=v2 &&
      +
     -+	# This "git add folder1/a" is completely ignored
     -+	# by the sparse-checkout repos. It causes the
     -+	# full repo to have a different staged environment.
     -+	#
     -+	# This is not a desirable behavior, but this test
     -+	# ensures that the sparse-index is not the cause
     -+	# of a behavior change.
     ++	# This "git add folder1/a" fails with a warning
     ++	# in the sparse repos, differing from the full
     ++	# repo. This is intentional.
      +	test_sparse_match test_must_fail git add folder1/a &&
      +	test_sparse_match test_must_fail git add --refresh folder1/a &&
     -+	git -C full-checkout checkout HEAD -- folder1/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
  2:  19344394379d =  5:  07a45b661c4a unpack-trees: preserve cache_bottom
  3:  24e71d8c0622 !  6:  cc4a526e7947 unpack-trees: compare sparse directories correctly
     @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
       	if (cmp)
       		return cmp;
       
     -+	/* If ce is a sparse directory, then allow an exact match. */
     ++	/*
     ++	 * At this point, we know that we have a prefix match. If ce
     ++	 * is a sparse directory, then allow an exact match. This only
     ++	 * works when the input name is a directory, since ce->name
     ++	 * ends in a directory separator.
     ++	 */
      +	if (S_ISSPARSEDIR(ce->ce_mode))
      +		return 0;
      +
  4:  d3c8948d0a33 !  7:  598375d3531f unpack-trees: stop recursing into sparse directories
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## diff-lib.c ##
     +@@ diff-lib.c: static void show_new_file(struct rev_info *revs,
     + 	unsigned int mode;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_file->ce_mode))
     ++		return;
     ++
     + 	/*
     + 	 * New file in the index: it might actually be different in
     + 	 * the working tree.
     +@@ diff-lib.c: static int show_modified(struct rev_info *revs,
     + 	const struct object_id *oid;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_entry->ce_mode))
     ++		return 0;
     ++
     + 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
     + 			  &dirty_submodule, &revs->diffopt) < 0) {
     + 		if (report_missing)
     +
       ## unpack-trees.c ##
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     - 					}
     - 				}
     - 				src[0] = ce;
     -+
     -+				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					unpack_tree = 0;
     - 			}
     - 			break;
       		}
       	}
       
  5:  fd96b71968b6 =  8:  47da2b317237 dir.c: accept a directory as part of cone-mode patterns
  6:  1f4ba56e7416 =  9:  bc1512981493 status: skip sparse-checkout percentage with sparse-index
  7:  3d09368c0541 = 10:  5b1ae369a7cd status: use sparse-index throughout
  -:  ------------ > 11:  3b42783d4a86 wt-status: expand added sparse directory entries
  8:  1fd033a6ebb2 ! 12:  b72507f514d1 fsmonitor: test with sparse index
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    fsmonitor: test with sparse index
     +    fsmonitor: integrate with sparse index
      
     -    During the effort to protect uses of the index to operate on a full
     -    index, we did not modify fsmonitor.c. This is because it already works
     -    effectively with only the change to index_name_stage_pos(). The only
     -    thing left to do is to test that it works correctly.
     +    If we need to expand a sparse-index into a full one, then the FS Monitor
     +    bitmap is going to be incorrect. Ensure that we start fresh at such an
     +    event.
     +
     +    While this is currently a performance drawback, the eventual hope of the
     +    sparse-index feature is that these expansions will be rare and hence we
     +    will be able to keep the FS Monitor data accurate across multiple Git
     +    commands.
      
          These tests are added to demonstrate that the behavior is the same
          across a full index and a sparse index, but also that file modifications
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## sparse-index.c ##
     +@@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
     + 	cache_tree_free(&istate->cache_tree);
     + 	cache_tree_update(istate, 0);
     + 
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     ++
     + 	istate->sparse_index = 1;
     + 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
     + 	return 0;
     +@@ sparse-index.c: void ensure_full_index(struct index_state *istate)
     + 	istate->cache = full->cache;
     + 	istate->cache_nr = full->cache_nr;
     + 	istate->cache_alloc = full->cache_alloc;
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     + 
     + 	strbuf_release(&base);
     + 	free(full);
     +
       ## t/t7519-status-fsmonitor.sh ##
      @@ t/t7519-status-fsmonitor.sh: test_expect_success 'setup' '
       	expect*

-- 
gitgitgadget
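
The FS Monitor change described in the cover letter (abandon all FS Monitor
data whenever the index is expanded or contracted) can be illustrated with a
small stand-alone model. This is only a sketch under stated assumptions: the
struct and the toy_* functions below are hypothetical and are not Git's
index_state or its API. Only the three field names are borrowed from the
range-diff above. The idea shown is that cached per-entry data is dropped at
every conversion point, so the next command has to start fresh.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an index with FS Monitor data. */
struct toy_index {
	int sparse;                       /* 1 if the index is in sparse form */
	int fsmonitor_has_run_once;       /* 1 once cached data was refreshed */
	unsigned char *fsmonitor_dirty;   /* cached per-entry "dirty" data */
	char *fsmonitor_last_update;      /* opaque token of the last query */
};

/* Forget all cached FS Monitor state so that it is recomputed later. */
static void toy_drop_fsmonitor(struct toy_index *idx)
{
	assert(idx != NULL);
	idx->fsmonitor_has_run_once = 0;
	free(idx->fsmonitor_dirty);
	idx->fsmonitor_dirty = NULL;
	free(idx->fsmonitor_last_update);
	idx->fsmonitor_last_update = NULL;
}

/* Sparse -> full: the entry list changes, so cached data is abandoned. */
static void toy_expand(struct toy_index *idx)
{
	assert(idx != NULL);
	if (!idx->sparse)
		return; /* already full, cached data stays valid */
	/* (a real implementation would rewrite the entry list here) */
	idx->sparse = 0;
	toy_drop_fsmonitor(idx);
}

/* Full -> sparse: the same reasoning applies in the other direction. */
static void toy_collapse(struct toy_index *idx)
{
	assert(idx != NULL);
	if (idx->sparse)
		return;
	/* (a real implementation would collapse directories here) */
	idx->sparse = 1;
	toy_drop_fsmonitor(idx);
}
```

In this model a caller that finds fsmonitor_has_run_once == 0 after a
conversion would have to query the file system monitor again. That matches the
trade-off the cover letter names: it costs performance today, with the
expectation that conversions become rare.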


* [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget
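
The guard added in this patch can be modelled outside of Git as follows. This
is a minimal sketch, assuming a toy entry type whose stage is stored in two
bits of a flags word. The toy_* names and TOY_* constants are hypothetical
stand-ins and not Git's cache_entry or ce_stage(). It shows the same
decision: scan all entries, and if any has a nonzero stage, leave the index
full instead of attempting the conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the merge stage (0-3) sits in bits 12-13. */
#define TOY_STAGEMASK  0x3000u
#define TOY_STAGESHIFT 12

struct toy_entry {
	const char *name;
	unsigned int flags;
};

/* Stage 0 means "merged"; stages 1-3 are the sides of a conflict. */
static unsigned int toy_stage(const struct toy_entry *e)
{
	return (e->flags & TOY_STAGEMASK) >> TOY_STAGESHIFT;
}

/* Return 1 if any entry is unmerged, 0 otherwise. */
static int toy_has_unmerged(const struct toy_entry *entries, size_t nr)
{
	size_t i;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (toy_stage(&entries[i]))
			return 1;
	}
	return 0;
}

/*
 * Return 1 if the toy index would be collapsed to sparse form,
 * 0 if it stays full because of unmerged entries.
 */
static int toy_convert_to_sparse(const struct toy_entry *entries, size_t nr)
{
	if (toy_has_unmerged(entries, nr))
		return 0; /* stay full, and do not treat this as an error */
	/* (a real implementation would build sparse directories here) */
	return 1;
}
```

The scan is linear in the number of entries, which is cheap compared to the
conversion itself. Note that staying full is reported as success in the patch
(return 0), not as a failure, and the toy model keeps that distinction by
never signalling an error for the conflicted case.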



* [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget
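
Why a second flag is needed may be easier to see with a toy serializer. The
sketch below is an assumption-laden model, not Git's index writer: the toy_*
functions and TOY_* constants are hypothetical, and the real writer has more
logic than this. The model stores the low 16 bits of the flags in one on-disk
word and writes a second word with the high bits only when the "extended"
marker is set. A high bit such as skip-worktree is then lost on a round trip
unless the marker accompanies it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this model only. */
#define TOY_EXTENDED      0x4000u     /* "a second flags word follows" */
#define TOY_SKIP_WORKTREE (1u << 30)  /* lives in the upper 16 bits */

/* Serialize flags into buf; returns the number of 16-bit words written. */
static size_t toy_write_flags(uint32_t flags, uint16_t buf[2])
{
	assert(buf != NULL);
	buf[0] = (uint16_t)(flags & 0xFFFFu);
	if (!(flags & TOY_EXTENDED))
		return 1; /* upper bits are silently dropped */
	buf[1] = (uint16_t)(flags >> 16);
	return 2;
}

/* Parse flags back; the second word is read only if the marker is set. */
static uint32_t toy_read_flags(const uint16_t *buf)
{
	uint32_t flags;

	assert(buf != NULL);
	flags = buf[0];
	if (flags & TOY_EXTENDED)
		flags |= (uint32_t)buf[1] << 16;
	return flags;
}

/* Write and immediately re-read a flags value. */
static uint32_t toy_round_trip(uint32_t flags)
{
	uint16_t buf[2] = { 0, 0 };

	toy_write_flags(flags, buf);
	return toy_read_flags(buf);
}
```

In this model, toy_round_trip(TOY_SKIP_WORKTREE) loses the skip-worktree bit,
while toy_round_trip(TOY_SKIP_WORKTREE | TOY_EXTENDED) preserves it. That is
the situation the one-line change above guards against for entries created
while expanding a sparse directory.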



* [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition that have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..98257695979a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,16 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
-- 
gitgitgadget



* [PATCH v3 04/12] t1092: add tests for status/add and sparse files
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 98257695979a..fba98d5484ae 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -238,6 +238,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v3 05/12] unpack-trees: preserve cache_bottom
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget



* [PATCH v3 06/12] unpack-trees: compare sparse directories correctly
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a directory.

compare_entry() first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.
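
The ordering rule this relies on can be sketched in isolation. The helper
below is a hypothetical, simplified stand-in: the name
name_compare_sketch() and its is_dir flags are assumptions made for
illustration, not the signature or body of df_name_compare(). It only
shows how treating a directory name as if it ended in '/' changes the
result of a comparison.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch, not Git's implementation: compare two
 * NUL-terminated names, treating a directory as if its name
 * carried an implicit trailing '/'.
 */
static int name_compare_sketch(const char *a, int a_is_dir,
			       const char *b, int b_is_dir)
{
	size_t alen, blen, len;
	unsigned char ca, cb;
	int cmp;

	assert(a && b);
	alen = strlen(a);
	blen = strlen(b);
	len = alen < blen ? alen : blen;

	/* Compare the shared prefix first. */
	cmp = memcmp(a, b, len);
	if (cmp)
		return cmp;

	/* Reading index len is safe: both strings are NUL-terminated. */
	ca = (unsigned char)a[len];
	cb = (unsigned char)b[len];
	if (!ca && a_is_dir)
		ca = '/';
	if (!cb && b_is_dir)
		cb = '/';
	return (int)ca - (int)cb;
}
```

Under this rule, a sparse directory entry named "deep/" compares equal to
a tree entry "deep" that is flagged as a directory, while "deep" as a
directory sorts after the file "deep.c" because '/' sorts after '.'.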

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget



* [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When walking trees using traverse_trees_recursive() and
unpack_callback(), we must not attempt to walk into a sparse directory
entry. There are no index entries within that directory to compare to
the tree object at that position, so skip over the entries of that tree.

This code is used in many places, so the only way to test it is to start
removing the command_requires_full_index option from one builtin at a
time and carefully test that its use of unpack_trees() behaves correctly
with a sparse-index. Such tests will be added by later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c     | 6 ++++++
 unpack-trees.c | 7 +++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..d5e7e01132ee 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -322,6 +322,9 @@ static void show_new_file(struct rev_info *revs,
 	unsigned int mode;
 	unsigned dirty_submodule = 0;
 
+	if (S_ISSPARSEDIR(new_file->ce_mode))
+		return;
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -343,6 +346,9 @@ static int show_modified(struct rev_info *revs,
 	const struct object_id *oid;
 	unsigned dirty_submodule = 0;
 
+	if (S_ISSPARSEDIR(new_entry->ce_mode))
+		return 0;
+
 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..703b0bdc9dfd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned unpack_tree = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_tree &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (unpack_tree &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget



* [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
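
As a rough illustration of the fake-filename idea, the sketch below
builds the '/'-prefixed path used for matching and appends "-" when the
input ends in a directory separator. The function name
build_match_path() and the fixed-size output buffer are assumptions for
this example; the actual change operates on a strbuf inside
path_matches_pattern_list().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: build the "/"-prefixed path used for matching,
 * appending a fake filename "-" when the input names a directory
 * (that is, when it ends in '/').
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int build_match_path(const char *pathname, char *out, size_t outlen)
{
	size_t len;
	int n;

	assert(pathname && out);
	len = strlen(pathname);
	n = snprintf(out, outlen, "/%s%s", pathname,
		     (len > 0 && pathname[len - 1] == '/') ? "-" : "");
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the sparse directory "folder1/" this yields "/folder1/-", which the
file-based cone-mode logic can then treat like any other file directly
inside folder1; a regular file path such as "folder1/a" is left
unchanged apart from the leading '/'.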

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget



* [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
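
To see why the percentage breaks down, consider this minimal sketch of
an entry-counting computation with the two sentinel values. The function
sparse_percentage_sketch() and its parameters are hypothetical, and the
formula is an assumption about how such a percentage could be derived
from index entry counts; only the two macro values come from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define SPARSE_CHECKOUT_DISABLED -1
#define SPARSE_CHECKOUT_SPARSE_INDEX -2

/*
 * Hypothetical sketch: derive the "percentage of tracked files
 * present" from index entry counts. With a sparse index, one
 * sparse directory entry stands in for an unknown number of paths,
 * so a sentinel is returned instead of a misleading number.
 */
static int sparse_percentage_sketch(int sparse_checkout_enabled,
				    int index_is_sparse,
				    size_t total_entries,
				    size_t skipped_entries)
{
	assert(skipped_entries <= total_entries);

	if (!sparse_checkout_enabled || !total_entries)
		return SPARSE_CHECKOUT_DISABLED;
	if (index_is_sparse)
		return SPARSE_CHECKOUT_SPARSE_INDEX;
	return 100 - (int)((100 * skipped_entries) / total_entries);
}
```

With a full index of 200 entries, 150 of them marked skip-worktree, this
reports 25. With a sparse index the same worktree might hold only a
handful of entries, so counting them would say little about the real
number of tracked paths without parsing trees.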

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We do not want to require that
this message be identical across both modes; instead, only the important
information about staged, modified, and untracked files is compared.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fba98d5484ae..34dae7fbcadd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -202,6 +202,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget



* [PATCH v3 10/12] status: use sparse-index throughout
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the performance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 34dae7fbcadd..59faf7381093 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,12 +479,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget



* [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 59faf7381093..cd3669d36b53 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+	test_all_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget



* [PATCH v3 12/12] fsmonitor: integrate with sparse index
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (10 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 66+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we expand a sparse-index into a full one, or convert a full index
into a sparse one, then the FS Monitor bitmap is going to be incorrect.
Reset the FS Monitor state so that we start fresh at such an event.
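
As a rough illustration of why the bitmap cannot simply be kept, here is
a toy model (not Git's code). It assumes the dirty bitmap holds one flag
per index entry position, so replacing one sparse directory entry with
its contents shifts every later position. struct toy_index and the toy_*
functions are hypothetical; the real fields touched by this patch are
fsmonitor_dirty, fsmonitor_last_update and fsmonitor_has_run_once.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not Git's struct index_state. */
struct toy_index {
	size_t nr;            /* number of index entries */
	unsigned char *dirty; /* one flag per entry position, or NULL */
	int has_run_once;     /* nonzero once the monitor has been queried */
};

/* Forget all monitor state so the next command queries from scratch. */
void toy_reset_monitor(struct toy_index *idx)
{
	free(idx->dirty);
	idx->dirty = NULL;
	idx->has_run_once = 0;
}

/*
 * Replace the entry at `pos` with `count` entries, as expanding one sparse
 * directory would. Every later entry shifts by count - 1 positions, so the
 * old per-position flags no longer describe the right entries.
 */
void toy_expand_entry(struct toy_index *idx, size_t pos, size_t count)
{
	assert(pos < idx->nr && count > 0);
	idx->nr += count - 1;
	toy_reset_monitor(idx);
}
```

Dropping the state is the simple choice in this sketch; remapping the
flags to the new positions would avoid the re-query, but it would also
have to be kept correct for every way the index can change shape.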

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

Add tests to demonstrate that 'git status' gives the same output with a
full index and a sparse index, and that a file modification in a tracked
directory outside of the sparse cone triggers ensure_full_index().
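
The tests rely on "dir1a/modified" being outside a cone of "dir1" and
"dir2" even though it shares the prefix "dir1". A minimal sketch of that
distinction follows, assuming a cone made only of top-level directories.
toy_path_in_cone() is a hypothetical helper, not a Git function, and it
ignores the parent-directory patterns that real cone mode also handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: return 1 if `path` lies inside one
 * of the `nr` cone directories in `dirs`, and 0 otherwise.
 */
int toy_path_in_cone(const char *path, const char *const *dirs, size_t nr)
{
	size_t i;

	assert(path != NULL);

	/* Files directly at the root are always included in cone mode. */
	if (!strchr(path, '/'))
		return 1;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(dirs[i]);

		/* Require '/' after the prefix so "dir1a/x" does not match "dir1". */
		if (!strncmp(path, dirs[i], len) && path[len] == '/')
			return 1;
	}
	return 0;
}
```

With dirs = { "dir1", "dir2" }, "dir1/modified" is inside the cone and
"dir1a/modified" is not, which mirrors the last case in the test below,
where the sparse index is expected to expand.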

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget



Thread overview: 66+ messages
2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-04-20 21:52   ` Elijah Newren
2021-04-21 13:21     ` Derrick Stolee
2021-04-21 15:14   ` Matheus Tavares Bernardino
2021-04-23 20:12     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
2021-04-20 23:00   ` Elijah Newren
2021-04-21 13:41     ` Derrick Stolee
2021-04-21 16:11       ` Elijah Newren
2021-04-22  2:24         ` Matheus Tavares Bernardino
2021-04-21 17:27     ` Derrick Stolee
2021-04-21 18:55       ` Matheus Tavares Bernardino
2021-04-21 19:10         ` Elijah Newren
2021-04-21 19:51           ` Matheus Tavares Bernardino
2021-04-21 18:56       ` Elijah Newren
2021-04-23 20:16         ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-20 23:21   ` Elijah Newren
2021-04-21 13:47     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-20 23:26   ` Elijah Newren
2021-04-21 13:51     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-21  0:44   ` Elijah Newren
2021-04-21 13:55     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
2021-04-21  0:52   ` Elijah Newren
2021-04-21  0:53     ` Elijah Newren
2021-04-21 14:03       ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-04-21  0:57   ` Elijah Newren
2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-04-21  7:00   ` Elijah Newren
2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
2021-04-14 16:31   ` Derrick Stolee
2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-13 12:40     ` Matheus Tavares Bernardino
2021-05-14 12:27       ` Derrick Stolee
2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-13  3:26     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-13  3:31     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
2021-05-14 18:28     ` Derrick Stolee
2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
