Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/10] Sparse-index: integrate with status and add
@ 2021-04-13 14:01 Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' and 'git add' very fast when a sparse-index is enabled on a
repository with cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * git add -A
 * git add . (and other pathspecs)
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee

Derrick Stolee (10):
  t1092: add tests for status/add and sparse files
  unpack-trees: make sparse aware
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  dir: use expand_to_path() for sparse directories
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  t7519: add sparse directories to FS monitor tests
  fsmonitor: test with sparse index

 builtin/add.c                            |  3 +
 builtin/commit.c                         |  3 +
 dir.c                                    |  5 ++
 dir.h                                    |  2 +-
 pathspec.c                               |  2 -
 preload-index.c                          |  2 +
 read-cache.c                             |  5 +-
 t/t1092-sparse-checkout-compatibility.sh | 73 +++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              | 65 +++++++++++++++++++++
 unpack-trees.c                           | 24 +++++++-
 wt-status.c                              | 14 ++++-
 wt-status.h                              |  1 +
 12 files changed, 186 insertions(+), 13 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/932
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 21:52   ` Elijah Newren
  2021-04-21 15:14   ` Matheus Tavares Bernardino
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file
'folder1/a' is shown as modified, but adding it fails. This is new
behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
entries, 2021-04-08). Before that change, these adds would be silently
ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..6598c12a2069 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" is completely ignored
+	# by the sparse-checkout repos. It causes the
+	# full repo to have a different staged environment.
+	test_must_fail git -C sparse-checkout add folder1/a &&
+	test_must_fail git -C sparse-index add folder1/a &&
+	git -C full-checkout checkout HEAD -- folder1/a &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:00   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As a first step to integrate 'git status' and 'git add' with the sparse
index, we must start integrating unpack_trees() with sparse directory
entries. These changes are currently impossible to trigger because
unpack_trees() calls ensure_full_index() if command_requires_full_index
is true. This is the case for all commands at the moment. As we expand
more commands to be sparse-aware, we might find that more changes are
required to unpack_trees(). The current changes will suffice for
'status' and 'add'.

unpack_trees() calls the traverse_trees() API using unpack_callback()
to decide if we should recurse into a subtree. We must add new abilities
to skip a subtree if it corresponds to a sparse directory entry.

It is important to be careful about the trailing directory separator
that exists in the sparse directory entries but not in the subtree
paths.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.h           |  2 +-
 preload-index.c |  2 ++
 read-cache.c    |  3 +++
 unpack-trees.c  | 24 ++++++++++++++++++++++--
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/dir.h b/dir.h
index 51cb0e217247..9d6666f520f3 100644
--- a/dir.h
+++ b/dir.h
@@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
 				char *seen)
 {
 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
-			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
 }
 
 static inline int dir_path_match(struct index_state *istate,
diff --git a/preload-index.c b/preload-index.c
index e5529a586366..35e67057ca9b 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
 			continue;
 		if (S_ISGITLINK(ce->ce_mode))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
 		if (ce_uptodate(ce))
 			continue;
 		if (ce_skip_worktree(ce))
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..6308234b4838 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..9a62e823928a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
@@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
+	/* remove directory separator if a sparse directory entry */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		ce_len--;
 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
 }
 
@@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/* If ce is a sparse directory, then allow equality here. */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
@@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned recurse = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 					}
 				}
 				src[0] = ce;
+
+				if (S_ISSPARSEDIR(ce->ce_mode))
+					recurse = 0;
 			}
 			break;
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (recurse &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (recurse &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:21   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..57e22e605cec 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/* Directory requests should be added as if they are a file */
+	if (parent_pathname.len > 1 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-20 23:26   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 6598c12a2069..e488ef9bd941 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -196,6 +196,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 05/10] status: use sparse-index throughout
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:44   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

By running the debugger for 'git status -uno' after that change, we find
two instances of ensure_full_index() that were added for extra safety,
but can be removed without issue.

In refresh_index(), we loop through the index entries. The
refresh_cache_ent() method copies the sparse directories into the
refreshed index without issue.

The loop within run_diff_files() skips things that are in stage 0 and
have skip-worktree enabled, so seems safe to disable ensure_full_index()
here.

This allows some cases of 'git status' to no longer expand a sparse
index to a full one, giving the following performance improvements for
p2000-sparse-checkout-operations.sh:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.37s to
0.05s improvement of 82%. This is more indicative of the peformance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             |  2 --
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 6308234b4838..83e6bdef7604 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e488ef9bd941..380a085f8ec4 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status -uno &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:52   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The recently-implemented expand_to_path() method can supply position
queries a faster response if they are specifically asking for a path
within the sparse cone. Since this is the most-common scenario, this
provides a significant speedup.

Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
status' does not expand a sparse index to a full one, even when there
exist untracked files.

The performance test script p2000-sparse-operations.sh demonstrates
that this is the final hole to fill to allow 'git status' to speed up
when using a sparse index:

Test                                  HEAD~1            HEAD
------------------------------------------------------------------------------
2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 380a085f8ec4..b937d7096afd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
 	init_repos &&
 
 	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index status -uno &&
+		git -C sparse-index status &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 07/10] add: allow operating on a sparse-only index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    1.35(1.00+0.20)  1.33(0.98+0.19) -1.5%
2000.7: git add -A (full-index-v4)    1.25(0.97+0.17)  1.23(0.96+0.16) -1.6%
2000.8: git add -A (sparse-index-v3)  2.38(2.28+0.13)  0.06(0.04+0.08) -97.5%
2000.9: git add -A (sparse-index-v4)  2.39(2.25+0.18)  0.06(0.04+0.07) -97.5%

While the 97% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
1.33s to 0.06s, which is "only" a 95% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index 58ee3f954ef7..0572d0344065 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -526,6 +526,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index b937d7096afd..c210dba78067 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,6 +459,18 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/README.md &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add -A &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/extra.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add extra.txt &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 08/10] pathspec: stop calling ensure_full_index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  0:57   ` Elijah Newren
  2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      1.37(1.02+0.18)   1.38(1.01+0.20) +0.7%
2000.11: git add . (full-index-v4)      1.26(1.00+0.15)   1.27(0.99+0.17) +0.8%
2000.12: git add . (sparse-index-v3)    2.39(2.29+0.14)   0.06(0.05+0.07) -97.5%
2000.13: git add . (sparse-index-v4)    2.42(2.32+0.14)   0.06(0.05+0.06) -97.5%

While the 97% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 1.27s to 0.06s, a 95% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 54813c0c4e8e..b51b48471fe6 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c210dba78067..738013b00191 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -471,6 +471,12 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/extra.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index add extra.txt &&
+	test_region ! index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index add . &&
 	test_region ! index ensure_full_index trace2.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 09/10] t7519: add sparse directories to FS monitor tests
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The File System Monitor (FS Monitor) tests in t7519 demonstrate some
important interactions with the index and the response from the FS
Monitor hook. Later changes will integrate the FS Monitor extension in
the index with the existence of sparse directory entries in a sparse
index. To do so, we need to include directories outside of the sparse
checkout definition.

Add a new directory, dir1a, between dir1 and dir2 in the test repo used
by this script. By inserting it in the middle, we are more likely to
trigger incorrect behavior when the fsmonitor_dirty bitmap is involved
with sparse directories changing the position of cache entries.

I could have modified the test to create two repos, one sparse and one
not, but that causes confusion in the expected output. Further, it makes
the test take twice as long. With this approach, we can validate that FS
Monitor works with the sparse index feature using the
GIT_TEST_SPARSE_INDEX=1 environment variable. The test currently fails
with that environment variable because FS Monitor is disabled when a
sparse index exists. The following changes will update this behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..23879d967297 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -62,11 +62,16 @@ test_expect_success 'setup' '
 	mkdir dir1 &&
 	: >dir1/tracked &&
 	: >dir1/modified &&
+	mkdir dir1a &&
+	: >dir1a/a &&
+	: >dir1a/b &&
 	mkdir dir2 &&
 	: >dir2/tracked &&
 	: >dir2/modified &&
 	git -c core.fsmonitor= add . &&
 	git -c core.fsmonitor= commit -m initial &&
+	git sparse-checkout init --cone --no-sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
 	git config core.fsmonitor .git/hooks/fsmonitor-test &&
 	cat >.gitignore <<-\EOF
 	.gitignore
@@ -99,6 +104,8 @@ test_expect_success 'update-index --no-fsmonitor" removes the fsmonitor extensio
 cat >expect <<EOF &&
 h dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 h dir2/modified
 H dir2/tracked
 h modified
@@ -121,6 +128,8 @@ test_expect_success 'update-index --fsmonitor-valid" sets the fsmonitor valid bi
 cat >expect <<EOF &&
 H dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 H dir2/tracked
 H modified
@@ -139,6 +148,8 @@ test_expect_success 'update-index --no-fsmonitor-valid" clears the fsmonitor val
 cat >expect <<EOF &&
 H dir1/modified
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 H dir2/tracked
 H modified
@@ -158,6 +169,8 @@ cat >expect <<EOF &&
 H dir1/modified
 h dir1/new
 H dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 h dir2/new
 H dir2/tracked
@@ -182,6 +195,8 @@ cat >expect <<EOF &&
 H dir1/modified
 h dir1/new
 h dir1/tracked
+S dir1a/a
+S dir1a/b
 H dir2/modified
 h dir2/new
 h dir2/tracked
@@ -201,6 +216,8 @@ test_expect_success 'all unmodified files get marked valid' '
 cat >expect <<EOF &&
 H dir1/modified
 h dir1/tracked
+S dir1a/a
+S dir1a/b
 h dir2/modified
 h dir2/tracked
 h modified
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH 10/10] fsmonitor: test with sparse index
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
@ 2021-04-13 14:01 ` Derrick Stolee via GitGitGadget
  2021-04-21  7:00   ` Elijah Newren
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-13 14:01 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During the effort to protect uses of the index to operate on a full
index, we did not modify fsmonitor.c. This is because it already works
effectively with only the change to index_name_stage_pos(). The only
thing left to do is to test that it works correctly.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 23879d967297..306157d48abf 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -78,6 +78,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -400,4 +401,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+test_expect_success 'status succeeds with sparse index' '
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region ! index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region index ensure_full_index trace2.txt &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 00/10] Sparse-index: integrate with status and add
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-04-13 20:45 ` Matheus Tavares Bernardino
  2021-04-14 16:31   ` Derrick Stolee
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-13 20:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee

Hi, Stolee

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' and 'git add' very fast when a sparse-index is enabled on a
> repository with cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.

I just noticed that our ds/sparse-index-protections and
mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
appear before, but it does now with this new series.

ds/sparse-index-protections added `ensure_full_index()` guards before
the loops that traverse over all cache entries. At the same time,
mt/add-rm-sparse-checkout added yet another one of these loops, at
`pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
new place didn't get the `ensure_full_index()` guard, all of its
callers (in `add` and `rm`) did call `ensure_full_index()` before
calling it, so it was fine.

However, patches 7 and 8 remove some of these protections in `add`s
code. And, as a result, if "dir" is a sparse directory entry, `git add
[--refresh] dir/file` no longer emits the warning added at
mt/add-rm-sparse-checkout.

Adding `ensure_full_index()` at
`find_pathspecs_matching_skip_worktree()` fixes the problem. We have
to consider the performance implications, but they _might_ be
acceptable as we only call this function when a pathspec given to
`add` or `rm` does not match any non-ignored file inside the sparse
checkout.

Additionally, the tests I added at t3705 won't catch this problem,
even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
they don't set core.sparseCheckout and core.sparseCheckoutCone, they
only set individual index entries with the SKIP_WORKTREE bit. And
therefore, the index is always written fully. Perhaps, should I reroll
my series using cone mode for these tests?

(And a semi-related question: do you plan on adding
GIT_TEST_SPARSE_INDEX=true to one of the CI jobs? )

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 00/10] Sparse-index: integrate with status and add
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
@ 2021-04-14 16:31   ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-14 16:31 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee

On 4/13/2021 4:45 PM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' and 'git add' very fast when a sparse-index is enabled on a
>> repository with cone-mode sparse-checkout (and a small populated set).
>>
>> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> 
> I just noticed that our ds/sparse-index-protections and
> mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
> appear before, but it does now with this new series.

Thank you for taking a close look.
 
> ds/sparse-index-protections added `ensure_full_index()` guards before
> the loops that traverse over all cache entries. At the same time,
> mt/add-rm-sparse-checkout added yet another one of these loops, at
> `pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
> new place didn't get the `ensure_full_index()` guard, all of its
> callers (in `add` and `rm`) did call `ensure_full_index()` before
> calling it, so it was fine.
>
> However, patches 7 and 8 remove some of these protections in `add`s
> code. And, as a result, if "dir" is a sparse directory entry, `git add
> [--refresh] dir/file` no longer emits the warning added at
> mt/add-rm-sparse-checkout.

You are right, it does not emit the warning. I will add a test that
ensures that behavior is the same across the two sparse repos in
t1092 as part of my v2 in this series.
 
> Adding `ensure_full_index()` at
> `find_pathspecs_matching_skip_worktree()` fixes the problem. We have
> to consider the performance implications, but they _might_ be
> acceptable as we only call this function when a pathspec given to
> `add` or `rm` does not match any non-ignored file inside the sparse
> checkout.

I'll want to do the right thing here to make the warning work, so
I'll take a look soon.

> Additionally, the tests I added at t3705 won't catch this problem,
> even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
> they don't set core.sparseCheckout and core.sparseCheckoutCone, they
> only set individual index entries with the SKIP_WORKTREE bit. And
> therefore, the index is always written fully. Perhaps, should I reroll
> my series using cone mode for these tests?

Your series should not be re-rolled for this. Instead, this is valuable
feedback for this series: there is behavior in 'git add' that I am not
checking stays the same when the sparse-index is enabled. That's my
responsibility and I'll get it fixed.
 
> (And a semi-related question: do you plan on adding
> GIT_TEST_SPARSE_INDEX=true to one of the CI jobs? )

I do plan to add that, after things calm down. It won't do much right
now because it requires core.sparseCheckout[Cone] to be enabled. Not
many tests provide that, so they don't add much coverage. I thought at
one point to adjust the initial repo creation to include a
sparse-checkout in cone mode, but that would change too many tests.
I still haven't found the right way to expand the test coverage to
take advantage of our deep test suite for this feature.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-20 21:52   ` Elijah Newren
  2021-04-21 13:21     ` Derrick Stolee
  2021-04-21 15:14   ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-20 21:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before moving to update 'git status' and 'git add' to work with sparse
> indexes, add an explicit test that ensures the sparse-index works the
> same as a normal sparse-checkout when the worktree contains directories
> and files outside of the sparse cone.
>
> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
> not in the sparse cone. When 'folder1/a' is modified, the file
> 'folder1/a' is shown as modified, but adding it fails. This is new
> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
> entries, 2021-04-08). Before that change, these adds would be silently
> ignored.
>
> Untracked files are fine: adding new files both with 'git add .' and
> 'git add folder1/' works just as in a full checkout. This may not be
> entirely desirable, but we are not intending to change behavior at the
> moment, only document it.

Personally, I'd say not desirable and we should throw an error just
like we do with skip-worktree entries that the user happens to try to
git add.  I've had reports from users that got confused by what
happens after this.  I've been meaning to create some patches to fix
it up, but wanted to avoid getting in the way of the sparse-index work
and have been a bit tied up on other projects to boot.

I'll note in particular that it's easy for users after running "git
add" to run other things such as "git sparse-checkout reapply" or "git
switch $otherbranch" and suddenly the file disappears from the working
tree.  From the sparse-checkout machinery that makes sense; this path
doesn't match the .git/info/sparse-checkout list of paths, so it
should be removed from the working tree.  But it's very disorienting
to users.  Especially if some of those commands are side-effects of
other commands (e.g. our build system invokes "git sparse-checkout
reapply" in various cases, most common of which is that even a simple
"git pull" can bring down code with dependency changes and thus a need
for new sparsity rules and whatnot), but it definitely can just happen
in ways users don't expect with their own commands (e.g. the git
switch/checkout example).

The patch looks good, but it'd be nice if while documenting it we also
add a comment that we believe we want to change the behavior (for
sparse-checkout both with and without sparse-index).  It's one of
those many paper-cuts we still have.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..6598c12a2069 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&
> +
> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&
> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +       test_must_fail git -C sparse-checkout add folder1/a &&
> +       test_must_fail git -C sparse-index add folder1/a &&
> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
@ 2021-04-20 23:00   ` Elijah Newren
  2021-04-21 13:41     ` Derrick Stolee
  2021-04-21 17:27     ` Derrick Stolee
  0 siblings, 2 replies; 127+ messages in thread
From: Elijah Newren @ 2021-04-20 23:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As a first step to integrate 'git status' and 'git add' with the sparse
> index, we must start integrating unpack_trees() with sparse directory
> entries. These changes are currently impossible to trigger because
> unpack_trees() calls ensure_full_index() if command_requires_full_index
> is true. This is the case for all commands at the moment. As we expand
> more commands to be sparse-aware, we might find that more changes are
> required to unpack_trees(). The current changes will suffice for
> 'status' and 'add'.
>
> unpack_trees() calls the traverse_trees() API using unpack_callback()
> to decide if we should recurse into a subtree. We must add new abilities
> to skip a subtree if it corresponds to a sparse directory entry.
>
> It is important to be careful about the trailing directory separator
> that exists in the sparse directory entries but not in the subtree
> paths.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.h           |  2 +-
>  preload-index.c |  2 ++
>  read-cache.c    |  3 +++
>  unpack-trees.c  | 24 ++++++++++++++++++++++--
>  4 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/dir.h b/dir.h
> index 51cb0e217247..9d6666f520f3 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>                                 char *seen)
>  {
>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));

I'm confused why this change would be needed, or why it'd semantically
be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
S_ISDIR() is true (and perhaps even vice versa?).

By chance, was this a leftover from your early RFC changes from a few
series ago when you had an entirely different mode for sparse
directory entries?

>  }
>
>  static inline int dir_path_match(struct index_state *istate,
> diff --git a/preload-index.c b/preload-index.c
> index e5529a586366..35e67057ca9b 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>                         continue;
>                 if (S_ISGITLINK(ce->ce_mode))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
>                 if (ce_uptodate(ce))
>                         continue;
>                 if (ce_skip_worktree(ce))

Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
 Is this a duplicate check?  If so, is it still desirable for
future-proofing or code clarity, or is it strictly redundant?

> diff --git a/read-cache.c b/read-cache.c
> index 29ffa9ac5db9..6308234b4838 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>                         continue;
>
> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
> +

I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
!ignore_skip_worktree and why it'd be desirable to refresh
skip-worktree entries.  However, this is tangential to your patch and
has apparently been around since 2009 (in particular, from 56cac48c35
("ie_match_stat(): do not ignore skip-worktree bit with
CE_MATCH_IGNORE_VALID", 2009-12-14)).

>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>                         filtered = 1;
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index dddf106d5bd4..9a62e823928a 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>  {
>         ce->ce_flags |= CE_UNPACKED;
>
> +       /*
> +        * If this is a sparse directory, don't advance cache_bottom.
> +        * That will be advanced later using the cache-tree data.
> +        */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return;
> +

I don't understand cache_bottom stuff; we might want to get Junio to
look over it.  Or maybe I just need to dig a bit further and attempt
to understand it.

>         if (o->cache_bottom < o->src_index->cache_nr &&
>             o->src_index->cache[o->cache_bottom] == ce) {
>                 int bottom = o->cache_bottom;
> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> +       /* remove directory separator if a sparse directory entry */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               ce_len--;
>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);

Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?

Note the following sort order:
   foo
   foo.txt
   foo/
   foo/bar

You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
but df_name_compare() exists to make "foo" sort exactly where "foo/"
would when "foo" is a directory.  Will your df_name_compare() call
here result in foo.txt being placed after all the "foo/<subpath>"
entries in the index and perhaps cause other problems down the line?
(Are there issues, e.g. with cache-trees getting wrong ordering from
this, or even writing out indexes or tree objects with the wrong
ordering?  I've written out trees to disk with wrong ordering before
and git usually survives but gets really confused with diffs.)

Since at least one caller of compare_entry() takes the return result
and does a "if (cmp < 0)", this order is going to matter in some
cases.  Perhaps we need some testcases where there is a sparse
directory entry named "foo/" and a file recorded in some relevant tree
with the name "foo.txt" to be able to trigger these lines of code?

>  }
>
> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow equality here. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;
> +

Um...so a sparse directory compares equal to _anything_ at all?  I'm
really confused why this would be desirable.  Am I missing something
here?

>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned recurse = 1;

"recurse" sent my mind off into questions about safety checks, base
cases, etc., instead of just the simple "we don't want to read in
directories corresponding to sparse entries".  I think this would be
clearer either if the variable had the sparsity concept embedded in
its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
(!sparse_entry) instead of (recurse) below), or with a comment about
why there are cases where you want to avoid recursion.

>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       recurse = 0;

Ah, the context here doesn't show it but this is in the "if (!cmp)"
block, i.e. if we found a match for the sparse directory.  This makes
sense, to me, _if_ we ignore the above question about sparse
directories matching equal to anything and everything.

>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (recurse &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (recurse &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;

Nice.  :-)


I think your patch was mostly about the recurse stuff, which other
than the name or a comment about it look good to me.  However, all the
other preparatory small tweaks brought up a lot of questions or
confusion for me.  I'm worried there might be a bug or two, though I
may have just misunderstood some of the code bits.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-20 23:21   ` Elijah Newren
  2021-04-21 13:47     ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-20 23:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When we have sparse directory entries in the index, we want to compare
> that directory against sparse-checkout patterns. Those pattern matching
> algorithms are built expecting a file path, not a directory path. This
> is especially important in the "cone mode" patterns which will match
> files that exist within the "parent directories" as well as the
> recursive directory matches.
>
> If path_matches_pattern_list() is given a directory, we can add a fake
> filename ("-") to the directory and get the same results as before,
> assuming we are in cone mode. Since sparse index requires cone mode
> patterns, this is an acceptable assumption.

Makes sense; thanks for the good description.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index 166238e79f52..57e22e605cec 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>         strbuf_addch(&parent_pathname, '/');
>         strbuf_add(&parent_pathname, pathname, pathlen);
>
> +       /* Directory requests should be added as if they are a file */

"added" or "matched"?  Also, the description seems a bit brief and
likely to surprise; I'd at least want to expand "file" to "file within
their given directory" but it might be nice to get some summarized
version of the commit message or at least state that "-" is just a
random simple name within the given directory.

> +       if (parent_pathname.len > 1 &&

Is this line...

> +           parent_pathname.buf[parent_pathname.len - 1] == '/')

to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
above ensure the condition is always met?

Or are we trying to avoid adding the "-" when we parent_pathname is
just a plain "/"?

> +               strbuf_add(&parent_pathname, "-", 1);
> +

Sorry for all the questions on such a tiny change.  It makes sense to
me, I'm just curious whether it'll confuse future code readers.


>         if (hashmap_contains_path(&pl->recursive_hashmap,
>                                   &parent_pathname)) {
>                 result = MATCHED_RECURSIVE;
> --

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-20 23:26   ` Elijah Newren
  2021-04-21 13:51     ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-20 23:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> 'git status' began reporting a percentage of populated paths when
> sparse-checkout is enabled in 051df3cf (wt-status: show sparse
> checkout status as well, 2020-07-18). This percentage is incorrect when
> the index has sparse directories. It would also be expensive to
> calculate as we would need to parse trees to count the total number of
> possible paths.
>
> Avoid the expensive computation by simplifying the output to only report
> that a sparse checkout exists, without the percentage.

Makes sense.  The percentage wasn't critical, it was just a nice UI
bonus.  The critical part is notifying about being in a sparse
checkout.

It makes me wonder slightly if we'd want to remove the percentage for
both modes just to keep them more similar.  I'll ask some folks for
their thoughts/opinions.  Of course, that could always be tweaked
later and doesn't necessarily need to go into your series.

> This change is the reason we use 'git status --porcelain=v2' in
> t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
> this message is equal across both modes, but instead just the important
> information about staged, modified, and untracked files are compared.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
>  wt-status.c                              | 14 +++++++++++---
>  wt-status.h                              |  1 +
>  3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 6598c12a2069..e488ef9bd941 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -196,6 +196,14 @@ test_expect_success 'status with options' '
>         test_all_match git status --porcelain=v2 -uno
>  '
>
> +test_expect_success 'status reports sparse-checkout' '
> +       init_repos &&
> +       git -C sparse-checkout status >full &&
> +       git -C sparse-index status >sparse &&
> +       test_i18ngrep "You are in a sparse checkout with " full &&
> +       test_i18ngrep "You are in a sparse checkout." sparse
> +'
> +
>  test_expect_success 'add, commit, checkout' '
>         init_repos &&
>
> diff --git a/wt-status.c b/wt-status.c
> index 0c8287a023e4..0425169c1895 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
>         if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
>                 return;
>
> -       status_printf_ln(s, color,
> -                        _("You are in a sparse checkout with %d%% of tracked files present."),
> -                        s->state.sparse_checkout_percentage);
> +       if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
> +               status_printf_ln(s, color, _("You are in a sparse checkout."));
> +       else
> +               status_printf_ln(s, color,
> +                               _("You are in a sparse checkout with %d%% of tracked files present."),
> +                               s->state.sparse_checkout_percentage);
>         wt_longstatus_print_trailer(s);
>  }
>
> @@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
>                 return;
>         }
>
> +       if (r->index->sparse_index) {
> +               state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
> +               return;
> +       }
> +
>         for (i = 0; i < r->index->cache_nr; i++) {
>                 struct cache_entry *ce = r->index->cache[i];
>                 if (ce_skip_worktree(ce))
> diff --git a/wt-status.h b/wt-status.h
> index 0d32799b28e1..ab9cc9d8f032 100644
> --- a/wt-status.h
> +++ b/wt-status.h
> @@ -78,6 +78,7 @@ enum wt_status_format {
>  };
>
>  #define SPARSE_CHECKOUT_DISABLED -1
> +#define SPARSE_CHECKOUT_SPARSE_INDEX -2
>
>  struct wt_status_state {
>         int merge_in_progress;
> --
> gitgitgadget

Looks good.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 05/10] status: use sparse-index throughout
  2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-21  0:44   ` Elijah Newren
  2021-04-21 13:55     ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-21  0:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> By testing 'git -c core.fsmonitor= status -uno', we can check for the
> simplest index operations that can be made sparse-aware. The necessary
> implementation details are already integrated with sparse-checkout, so
> modify command_requires_full_index to be zero for cmd_status().
>
> By running the debugger for 'git status -uno' after that change, we find
> two instances of ensure_full_index() that were added for extra safety,
> but can be removed without issue.
>
> In refresh_index(), we loop through the index entries. The
> refresh_cache_ent() method copies the sparse directories into the
> refreshed index without issue.

I do see the removal of a call to ensure_full_index() in
refresh_index() that you mention in this paragraph in the patch below.

I'm confused, though; I would have thought we wanted to avoid a
refresh_cache_ent() call.  Also, one of your previous patches added a

    if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
        continue;

check before the code ever gets to the refresh_cache_ent() call, so as
far as I can tell, that function won't be called from refresh_entry()
for sparse entries.  Maybe your commit message here is out-of-date?
Or am I confused somehow?

> The loop within run_diff_files() skips things that are in stage 0 and
> have skip-worktree enabled, so seems safe to disable ensure_full_index()
> here.

Unlike the above, I don't see a removal of a ensure_full_index() call
in run_diff_files() as claimed by this paragraph.  Has the commit
message gotten out of date with refactorings you did while developing
this series?

> This allows some cases of 'git status' to no longer expand a sparse
> index to a full one, giving the following performance improvements for
> p2000-sparse-checkout-operations.sh:
>
> Test                                  HEAD~1           HEAD
> -----------------------------------------------------------------------------
> 2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
> 2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>
> Note that since HEAD~1 was expanding the sparse index by parsing trees,
> it was artificially slower than the full index case. Thus, the 98%
> improvement is misleading, and instead we should celebrate the 0.37s to
> 0.05s improvement of 82%. This is more indicative of the peformance
> gains we are expecting by using a sparse index.

82%, very nice.  Was this with git.git as the test repository, or some
other repo?  If it's git.git, then we'd actually expect a much bigger
speedup for other repositories, as git.git is pretty small.


> Note: we are dropping the assignment of core.fsmonitor here. This is not
> necessary for the test script as we are not altering the config any
> other way. Correct integration with FS Monitor will be validated in
> later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c                         |  3 +++
>  read-cache.c                             |  2 --
>  t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
>  3 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index cf0c36d1dcb2..e529da7beadd 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage_with_options(builtin_status_usage, builtin_status_options);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         status_init_config(&s, git_status_config);
>         argc = parse_options(argc, argv, prefix,
>                              builtin_status_options,
> diff --git a/read-cache.c b/read-cache.c
> index 6308234b4838..83e6bdef7604 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>          */
>         preload_index(istate, pathspec, 0);
>         trace2_region_enter("index", "refresh", NULL);
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce, *new_entry;
>                 int cache_errno = 0;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e488ef9bd941..380a085f8ec4 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' '
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>                 git -C sparse-index -c core.fsmonitor="" reset --hard &&
>         test_region index convert_to_sparse trace2.txt &&
> -       test_region index ensure_full_index trace2.txt &&
> +       test_region index ensure_full_index trace2.txt
> +'
>
> -       rm trace2.txt &&
> +test_expect_success 'sparse-index is not expanded' '
> +       init_repos &&
> +
> +       rm -f trace2.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index -c core.fsmonitor="" status -uno &&
> -       test_region index ensure_full_index trace2.txt
> +               git -C sparse-index status -uno &&
> +       test_region ! index ensure_full_index trace2.txt
>  '
>
>  test_done
> --
> gitgitgadget

Other than what looks like a couple issues in the commit message, the
change looks good to me.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-21  0:52   ` Elijah Newren
  2021-04-21  0:53     ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-21  0:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The recently-implemented expand_to_path() method can supply position
> queries a faster response if they are specifically asking for a path
> within the sparse cone. Since this is the most-common scenario, this
> provides a significant speedup.
>
> Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
> status' does not expand a sparse index to a full one, even when there
> exist untracked files.
>
> The performance test script p2000-sparse-operations.sh demonstrates
> that this is the final hole to fill to allow 'git status' to speed up
> when using a sparse index:
>
> Test                                  HEAD~1            HEAD
> ------------------------------------------------------------------------------
> 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
> 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%

Um, I'm confused.  In the previous patch you claimed the following speedups:

2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%

I don't understand why the "Before" for this patch claims 1.50 as the
initial speed, if the "After" for the last patch was 0.04.  Should the
previous commit message have instead claimed:

2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%

?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 380a085f8ec4..b937d7096afd 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
>         init_repos &&
>
>         rm -f trace2.txt &&
> +       echo >>sparse-index/untracked.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index status -uno &&
> +               git -C sparse-index status &&
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> --
> gitgitgadget

Oh!  So, the previous patch was testing without enumerating untracked
files (because it did those slowly), whereas this one enumerates
untracked files and is still able to achieve the same performance?
This wasn't very clear from the commit message.  Maybe I'm just bad at
reading, but perhaps the commit message could be tweaked slightly to
make this more clear?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-21  0:52   ` Elijah Newren
@ 2021-04-21  0:53     ` Elijah Newren
  2021-04-21 14:03       ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-21  0:53 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

One more thing:

On Tue, Apr 20, 2021 at 5:52 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > The recently-implemented expand_to_path() method can supply position
> > queries a faster response if they are specifically asking for a path
> > within the sparse cone. Since this is the most-common scenario, this
> > provides a significant speedup.
> >
> > Update t1092-sparse-checkout-compatibility.sh to fully ensure that 'git
> > status' does not expand a sparse index to a full one, even when there
> > exist untracked files.
> >
> > The performance test script p2000-sparse-operations.sh demonstrates
> > that this is the final hole to fill to allow 'git status' to speed up
> > when using a sparse index:
> >
> > Test                                  HEAD~1            HEAD
> > ------------------------------------------------------------------------------
> > 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
> > 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%
>
> Um, I'm confused.  In the previous patch you claimed the following speedups:
>
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>
> I don't understand why the "Before" for this patch claims 1.50 as the
> initial speed, if the "After" for the last patch was 0.04.  Should the
> previous commit message have instead claimed:
>
> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%
>
> ?
>
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  t/t1092-sparse-checkout-compatibility.sh | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> > index 380a085f8ec4..b937d7096afd 100755
> > --- a/t/t1092-sparse-checkout-compatibility.sh
> > +++ b/t/t1092-sparse-checkout-compatibility.sh
> > @@ -456,8 +456,9 @@ test_expect_success 'sparse-index is not expanded' '
> >         init_repos &&
> >
> >         rm -f trace2.txt &&
> > +       echo >>sparse-index/untracked.txt &&
> >         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> > -               git -C sparse-index status -uno &&
> > +               git -C sparse-index status &&
> >         test_region ! index ensure_full_index trace2.txt
> >  '
> >
> > --
> > gitgitgadget
>
> Oh!  So, the previous patch was testing without enumerating untracked
> files (because it did those slowly), whereas this one enumerates
> untracked files and is still able to achieve the same performance?
> This wasn't very clear from the commit message.  Maybe I'm just bad at
> reading, but perhaps the commit message could be tweaked slightly to
> make this more clear?

Why is the subject of this commit "dir: use expand_to_path() ..." if
it only touches t1092-sparse-checkout-compatibility.sh?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 08/10] pathspec: stop calling ensure_full_index
  2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-04-21  0:57   ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-04-21  0:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The add_pathspec_matches_against_index() focuses on matching a pathspec
> to file entries in the index. This already works correctly for its only
> use: checking if untracked files exist in the index.
>
> The compatibility checks in t1092 already test that 'git add <dir>'
> works for a directory outside of the sparse cone. That provides coverage
> for removing this guard.
>
> This finalizes our ability to run 'git add .' without expanding a sparse
> index to a full one. This is evidenced by an update to t1092 and by
> these performance numbers for p2000-sparse-operations.sh:
>
> Test                                    HEAD~1            HEAD
> --------------------------------------------------------------------------------
> 2000.10: git add . (full-index-v3)      1.37(1.02+0.18)   1.38(1.01+0.20) +0.7%
> 2000.11: git add . (full-index-v4)      1.26(1.00+0.15)   1.27(0.99+0.17) +0.8%
> 2000.12: git add . (sparse-index-v3)    2.39(2.29+0.14)   0.06(0.05+0.07) -97.5%
> 2000.13: git add . (sparse-index-v4)    2.42(2.32+0.14)   0.06(0.05+0.06) -97.5%
>
> While the 97% improvement is shown by the test results, it is worth
> noting that expanding the sparse index was adding overhead in previous
> commits. Comparing to the full index case, we see the performance go
> from 1.27s to 0.06s, a 95% improvement.

This is awesome.  :-)

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  pathspec.c                               | 2 --
>  t/t1092-sparse-checkout-compatibility.sh | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index 54813c0c4e8e..b51b48471fe6 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>                 if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c210dba78067..738013b00191 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -471,6 +471,12 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/extra.txt &&
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>                 git -C sparse-index add extra.txt &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +
> +       rm trace2.txt &&
> +       echo >>sparse-index/untracked.txt &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git -C sparse-index add . &&
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 10/10] fsmonitor: test with sparse index
  2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-04-21  7:00   ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-04-21  7:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> During the effort to protect uses of the index to operate on a full
> index, we did not modify fsmonitor.c. This is because it already works
> effectively with only the change to index_name_stage_pos(). The only
> thing left to do is to test that it works correctly.
>
> These tests are added to demonstrate that the behavior is the same
> across a full index and a sparse index, but also that file modifications
> to a tracked directory outside of the sparse cone will trigger
> ensure_full_index().
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index 23879d967297..306157d48abf 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -78,6 +78,7 @@ test_expect_success 'setup' '
>         expect*
>         actual*
>         marker*
> +       trace2*
>         EOF
>  '
>
> @@ -400,4 +401,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
>         )
>  '
>
> +test_expect_success 'status succeeds with sparse index' '
> +       test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +       rm trace2.txt &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +       rm trace2.txt &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +               printf "dir1/modified\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region ! index ensure_full_index trace2.txt &&
> +       test_cmp expect actual &&
> +
> +       write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +               printf "last_update_token\0"
> +               printf "dir1a/modified\0"
> +       EOF
> +       git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +       git status --porcelain=v2 >expect &&
> +       git sparse-checkout init --cone --sparse-index &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git status --porcelain=v2 >actual &&
> +       test_region index ensure_full_index trace2.txt &&
> +       test_cmp expect actual

There's a lot of duplicated lines here; would it make sense to have a
helper function you call, making it easier to see the differences
between the four subsections of this test?  Also, do you want to use
test_config instead of git config, so that it automatically gets unset
at the end of the test?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-20 21:52   ` Elijah Newren
@ 2021-04-21 13:21     ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:21 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 5:52 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> I'll note in particular that it's easy for users after running "git
> add" to run other things such as "git sparse-checkout reapply" or "git
> switch $otherbranch" and suddenly the file disappears from the working
> tree.  From the sparse-checkout machinery that makes sense; this path
> doesn't match the .git/info/sparse-checkout list of paths, so it
> should be removed from the working tree.  But it's very disorienting
> to users.  Especially if some of those commands are side-effects of
> other commands (e.g. our build system invokes "git sparse-checkout
> reapply" in various cases, most common of which is that even a simple
> "git pull" can bring down code with dependency changes and thus a need
> for new sparsity rules and whatnot), but it definitely can just happen
> in ways users don't expect with their own commands (e.g. the git
> switch/checkout example).
> 
> The patch looks good, but it'd be nice if while documenting it we also
> add a comment that we believe we want to change the behavior (for
> sparse-checkout both with and without sparse-index).  It's one of
> those many paper-cuts we still have.

I can try to comment on these corner case tests that the behavior is
not intended to be permanent, especially when already needing to comment
how strange it is acting.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-20 23:00   ` Elijah Newren
@ 2021-04-21 13:41     ` Derrick Stolee
  2021-04-21 16:11       ` Elijah Newren
  2021-04-21 17:27     ` Derrick Stolee
  1 sibling, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:41 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:00 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/dir.h b/dir.h
>> index 51cb0e217247..9d6666f520f3 100644
>> --- a/dir.h
>> +++ b/dir.h
>> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>>                                 char *seen)
>>  {
>>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
>> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
>> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> 
> I'm confused why this change would be needed, or why it'd semantically
> be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
> S_ISDIR() is true (and perhaps even vice versa?).
> 
> By chance, was this a leftover from your early RFC changes from a few
> series ago when you had an entirely different mode for sparse
> directory entries?

I will double-check on this with additional testing and debugging.
Your comments below make it clear that this patch would benefit from
some additional splitting.

>>  }
>>
>>  static inline int dir_path_match(struct index_state *istate,
>> diff --git a/preload-index.c b/preload-index.c
>> index e5529a586366..35e67057ca9b 100644
>> --- a/preload-index.c
>> +++ b/preload-index.c
>> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>>                         continue;
>>                 if (S_ISGITLINK(ce->ce_mode))
>>                         continue;
>> +               if (S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>>                 if (ce_uptodate(ce))
>>                         continue;
>>                 if (ce_skip_worktree(ce))
> 
> Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
>  Is this a duplicate check?  If so, is it still desirable for
> future-proofing or code clarity, or is it strictly redundant?

You're right, we could skip this one because the ce_skip_worktree(ce)
is enough to cover this case. I think I created this one because I was
auditing uses of S_ISGITLINK().

>> diff --git a/read-cache.c b/read-cache.c
>> index 29ffa9ac5db9..6308234b4838 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>>                         continue;
>>
>> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>> +
> 
> I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> !ignore_skip_worktree and why it'd be desirable to refresh
> skip-worktree entries.  However, this is tangential to your patch and
> has apparently been around since 2009 (in particular, from 56cac48c35
> ("ie_match_stat(): do not ignore skip-worktree bit with
> CE_MATCH_IGNORE_VALID", 2009-12-14)).

This is probably better served with a statement like this earlier in
the method:

	if (ignore_skip_worktree)
		ensure_full_index(istate);

It seems like ignoring the skip worktree bits is a rare occasion and
it will be worth expanding the index for that case.

>>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>>                         filtered = 1;
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index dddf106d5bd4..9a62e823928a 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>>  {
>>         ce->ce_flags |= CE_UNPACKED;
>>
>> +       /*
>> +        * If this is a sparse directory, don't advance cache_bottom.
>> +        * That will be advanced later using the cache-tree data.
>> +        */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               return;
>> +
> 
> I don't understand cache_bottom stuff; we might want to get Junio to
> look over it.  Or maybe I just need to dig a bit further and attempt
> to understand it.

I remember looking very careful at this when I created this (and found
it worth a comment) but I don't recall enough off the top of my head.
This is worth splitting out with a careful message, which will force me
to reexamine the cache_bottom member.

>>         if (o->cache_bottom < o->src_index->cache_nr &&
>>             o->src_index->cache[o->cache_bottom] == ce) {
>>                 int bottom = o->cache_bottom;
>> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>>         ce_len -= pathlen;
>>         ce_name = ce->name + pathlen;
>>
>> +       /* remove directory separator if a sparse directory entry */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               ce_len--;
>>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> 
> Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?
> 
> Note the following sort order:
>    foo
>    foo.txt
>    foo/
>    foo/bar
> 
> You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
> but df_name_compare() exists to make "foo" sort exactly where "foo/"
> would when "foo" is a directory.  Will your df_name_compare() call
> here result in foo.txt being placed after all the "foo/<subpath>"
> entries in the index and perhaps cause other problems down the line?
> (Are there issues, e.g. with cache-trees getting wrong ordering from
> this, or even writing out indexes or tree objects with the wrong
> ordering?  I've written out trees to disk with wrong ordering before
> and git usually survives but gets really confused with diffs.)
> 
> Since at least one caller of compare_entry() takes the return result
> and does a "if (cmp < 0)", this order is going to matter in some
> cases.  Perhaps we need some testcases where there is a sparse
> directory entry named "foo/" and a file recorded in some relevant tree
> with the name "foo.txt" to be able to trigger these lines of code?

I will do some testing to find out why removing the separator here was
necessary or valuable.

>>  }
>>
>> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>>         if (cmp)
>>                 return cmp;
>>
>> +       /* If ce is a sparse directory, then allow equality here. */
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               return 0;
>> +
> 
> Um...so a sparse directory compares equal to _anything_ at all?  I'm
> really confused why this would be desirable.  Am I missing something
> here?

The context is that is removed from the patch is that "cmp" is the
response from do_compare_entry(), which does a length-limited comparison.
If cmp is non-zero, then we've already returned the difference.

The rest of the method is checking if the 'info' input is actually a
parent directory of the _path_ given at this cache entry.

>>         /*
>>          * Even if the beginning compared identically, the ce should
>>          * compare as bigger than a directory leading up to it!

The line after this is:

	return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n));

This comparison is saying "these paths match up to the directory specified
by info and n, but we need 'ce' to be a file within that directory." But
in the case of a sparse directory entry, we can skip this comparison.

>> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>>         struct unpack_trees_options *o = info->data;
>>         const struct name_entry *p = names;
>> +       unsigned recurse = 1;
> 
> "recurse" sent my mind off into questions about safety checks, base
> cases, etc., instead of just the simple "we don't want to read in
> directories corresponding to sparse entries".  I think this would be
> clearer either if the variable had the sparsity concept embedded in
> its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
> (!sparse_entry) instead of (recurse) below), or with a comment about
> why there are cases where you want to avoid recursion.

I can understand that. This callback is confusing because it _does_
recurse, but through a sequence of methods instead of actually calling
itself.

It would be better to say something like "unpack_subdirectories = 1"
and disabling it when we are in a sparse directory.

>>
>>         /* Find first entry with a real name (we could use "mask" too) */
>>         while (!p->mode)
>> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                                         }
>>                                 }
>>                                 src[0] = ce;
>> +
>> +                               if (S_ISSPARSEDIR(ce->ce_mode))
>> +                                       recurse = 0;
> 
> Ah, the context here doesn't show it but this is in the "if (!cmp)"
> block, i.e. if we found a match for the sparse directory.  This makes
> sense, to me, _if_ we ignore the above question about sparse
> directories matching equal to anything and everything.

I believe that "anything and everything" concern has been resolved.

>> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                         }
>>                 }
>>
>> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>> +               if (recurse &&
>> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>                                              names, info) < 0)
>>                         return -1;
>>                 return mask;
> 
> Nice.  :-)
> 
> 
> I think your patch was mostly about the recurse stuff, which other
> than the name or a comment about it look good to me.  However, all the
> other preparatory small tweaks brought up a lot of questions or
> confusion for me.  I'm worried there might be a bug or two, though I
> may have just misunderstood some of the code bits.
 
This patch could probably be split up a little to make these things
clearer. Thanks for bringing up the tricky bits.

-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns
  2021-04-20 23:21   ` Elijah Newren
@ 2021-04-21 13:47     ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:47 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:21 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When we have sparse directory entries in the index, we want to compare
>> that directory against sparse-checkout patterns. Those pattern matching
>> algorithms are built expecting a file path, not a directory path. This
>> is especially important in the "cone mode" patterns which will match
>> files that exist within the "parent directories" as well as the
>> recursive directory matches.
>>
>> If path_matches_pattern_list() is given a directory, we can add a fake
>> filename ("-") to the directory and get the same results as before,
>> assuming we are in cone mode. Since sparse index requires cone mode
>> patterns, this is an acceptable assumption.
> 
> Makes sense; thanks for the good description.
> 
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  dir.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/dir.c b/dir.c
>> index 166238e79f52..57e22e605cec 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>>         strbuf_addch(&parent_pathname, '/');
>>         strbuf_add(&parent_pathname, pathname, pathlen);
>>
>> +       /* Directory requests should be added as if they are a file */
> 
> "added" or "matched"?  Also, the description seems a bit brief and
> likely to surprise; I'd at least want to expand "file" to "file within
> their given directory" but it might be nice to get some summarized
> version of the commit message or at least state that "-" is just a
> random simple name within the given directory.

I can improve this comment.

>> +       if (parent_pathname.len > 1 &&
> 
> Is this line...
> 
>> +           parent_pathname.buf[parent_pathname.len - 1] == '/')
> 
> to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
> ">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
> above ensure the condition is always met?
> 
> Or are we trying to avoid adding the "-" when we parent_pathname is
> just a plain "/"?

I believe plain "/" is impossible. There needs to be a valid tree entry
before that first slash ("a/", for example). But that isn't super
important to the logic here and just adds confusion.

> 
>> +               strbuf_add(&parent_pathname, "-", 1);
>> +
> 
> Sorry for all the questions on such a tiny change.  It makes sense to
> me, I'm just curious whether it'll confuse future code readers.

Yes, let's avoid confusion by doing the simple thing and use "> 0".

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index
  2021-04-20 23:26   ` Elijah Newren
@ 2021-04-21 13:51     ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:51 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 7:26 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Avoid the expensive computation by simplifying the output to only report
>> that a sparse checkout exists, without the percentage.
> 
> Makes sense.  The percentage wasn't critical, it was just a nice UI
> bonus.  The critical part is notifying about being in a sparse
> checkout.
> 
> It makes me wonder slightly if we'd want to remove the percentage for
> both modes just to keep them more similar.  I'll ask some folks for
> their thoughts/opinions.  Of course, that could always be tweaked
> later and doesn't necessarily need to go into your series.

I find the percentage helpful for users who are exploring the
sparse-checkout feature in their repositories. It's nice to know how
much time it is saving, because "percentage of files" frequently
translates to "percentage of time it takes to update the worktree".

I was sad to lose it here, but I don't see any way to keep it.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 05/10] status: use sparse-index throughout
  2021-04-21  0:44   ` Elijah Newren
@ 2021-04-21 13:55     ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 13:55 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 8:44 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> By testing 'git -c core.fsmonitor= status -uno', we can check for the
>> simplest index operations that can be made sparse-aware. The necessary
>> implementation details are already integrated with sparse-checkout, so
>> modify command_requires_full_index to be zero for cmd_status().
>>
>> By running the debugger for 'git status -uno' after that change, we find
>> two instances of ensure_full_index() that were added for extra safety,
>> but can be removed without issue.
>>
>> In refresh_index(), we loop through the index entries. The
>> refresh_cache_ent() method copies the sparse directories into the
>> refreshed index without issue.
> 
> I do see the removal of a call to ensure_full_index() in
> refresh_index() that you mention in this paragraph in the patch below.
> 
> I'm confused, though; I would have thought we wanted to avoid a
> refresh_cache_ent() call.  Also, one of your previous patches added a
> 
>     if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>         continue;
> 
> check before the code ever gets to the refresh_cache_ent() call, so as
> far as I can tell, that function won't be called from refresh_entry()
> for sparse entries.  Maybe your commit message here is out-of-date?
> Or am I confused somehow?
> 
>> The loop within run_diff_files() skips things that are in stage 0 and
>> have skip-worktree enabled, so seems safe to disable ensure_full_index()
>> here.
> 
> Unlike the above, I don't see a removal of a ensure_full_index() call
> in run_diff_files() as claimed by this paragraph.  Has the commit
> message gotten out of date with refactorings you did while developing
> this series?

I greatly reduced the number of ensure_full_index() calls in the
previous topic (ds/sparse-index-protections) since first writing this
patch, so it is very likely to be out-of-date. Thanks for calling it out.

>> This allows some cases of 'git status' to no longer expand a sparse
>> index to a full one, giving the following performance improvements for
>> p2000-sparse-checkout-operations.sh:
>>
>> Test                                  HEAD~1           HEAD
>> -----------------------------------------------------------------------------
>> 2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
>> 2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>>
>> Note that since HEAD~1 was expanding the sparse index by parsing trees,
>> it was artificially slower than the full index case. Thus, the 98%
>> improvement is misleading, and instead we should celebrate the 0.37s to
>> 0.05s improvement of 82%. This is more indicative of the peformance
>> gains we are expecting by using a sparse index.
> 
> 82%, very nice.  Was this with git.git as the test repository, or some
> other repo?  If it's git.git, then we'd actually expect a much bigger
> speedup for other repositories, as git.git is pretty small.
This test script takes the input repository (git.git in this case) and
creates a tree that contains that repository many times over, but only
four copies remain in the sparse-checkout definition. This creates the
big speedup, because of the enormous difference in index size.

As I am exploring commands such as 'merge' and 'rebase' I am finding
that this test setup is too expensive to cover those commands. I will
need to reduce the size of the test repository (by a factor of 4) and
that will reduce how impressive these results are while making the more
complicated commands testable in a reasonable amount of time.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 06/10] dir: use expand_to_path() for sparse directories
  2021-04-21  0:53     ` Elijah Newren
@ 2021-04-21 14:03       ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 14:03 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 4/20/2021 8:53 PM, Elijah Newren wrote:
> One more thing:
> 
> On Tue, Apr 20, 2021 at 5:52 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>> Test                                  HEAD~1            HEAD
>>> ------------------------------------------------------------------------------
>>> 2000.4: git status (sparse-index-v3)  1.50(1.43+0.10)   0.04(0.04+0.03) -97.3%
>>> 2000.5: git status (sparse-index-v4)  1.50(1.43+0.10)   0.04(0.03+0.04) -97.3%
>>
>> Um, I'm confused.  In the previous patch you claimed the following speedups:
>>
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
>>
>> I don't understand why the "Before" for this patch claims 1.50 as the
>> initial speed, if the "After" for the last patch was 0.04.  Should the
>> previous commit message have instead claimed:
>>
>> 2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  1.50(1.43+0.10) -38.3%
>> 2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  1.50(1.43+0.10) -38.5%
...
>> Oh!  So, the previous patch was testing without enumerating untracked
>> files (because it did those slowly), whereas this one enumerates
>> untracked files and is still able to achieve the same performance?
>> This wasn't very clear from the commit message.  Maybe I'm just bad at
>> reading, but perhaps the commit message could be tweaked slightly to
>> make this more clear?
> 
> Why is the subject of this commit "dir: use expand_to_path() ..." if
> it only touches t1092-sparse-checkout-compatibility.sh?
 
You are right to be confused. This is another patch that simplified due
to refactors in the protections branch. This should just be squashed into
the previous.

For context: an earlier version inserted ensure_full_index() before
every call to index_name_pos() and then this patch swapped that for
a call to expand_to_path(). The change in the protections branch was
to have index_name_pos() call expand_to_path() itself, preventing the
need for these ensure_full_index() calls.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-20 21:52   ` Elijah Newren
@ 2021-04-21 15:14   ` Matheus Tavares Bernardino
  2021-04-23 20:12     ` Derrick Stolee
  1 sibling, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 15:14 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

Hi, Stolee

You already said you will make changes in this test to make sure
git-add's sparse warning is kept on a sparse index (BTW thanks for
that :), but I just wanted to give a couple suggestions that came to
my mind while reading the patch.

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..6598c12a2069 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,42 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&
> +
> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&
> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +
> +       test_must_fail git -C sparse-checkout add folder1/a &&
> +       test_must_fail git -C sparse-index add folder1/a &&

To make sure the output is the same, could we collapse these two lines into:

test_sparse_match test_must_fail git add folder1/a ?

And additionally, I think we could repeat this check with `add
--refresh` and also after removing `folder1/a`. The reason I'm saying
this is because the check currently succeeds when `folder1/a` is in
the working tree (maybe because `fill_directory()` ends up expanding
the sparse index in this case?), but not under the two other
circumstances I mentioned (as we've discussed in [1]).

[1]: https://lore.kernel.org/git/CAHd-oW7vCKC-XRM=rX37+jQn_XDzjtar9nNHKQ-4OHSZ=2=KFA@mail.gmail.com/

> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_sparse_match git status --porcelain=v2 &&

Hmm, shouldn't this be `test_all_match`? IIUC, we've resetted
`folder1/a` on the full repo to make sure the status report is the
same across all repos, right?

> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 13:41     ` Derrick Stolee
@ 2021-04-21 16:11       ` Elijah Newren
  2021-04-22  2:24         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-21 16:11 UTC (permalink / raw)
  To: Derrick Stolee, Matheus Tavares Bernardino
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

// Adding Matheus to cc due to the ignore_skip_worktree bit, given his
experience and expertise with the checkout and unpack-trees code.

On Wed, Apr 21, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> diff --git a/dir.h b/dir.h
> >> index 51cb0e217247..9d6666f520f3 100644
> >> --- a/dir.h
> >> +++ b/dir.h
> >> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
> >>                                 char *seen)
> >>  {
> >>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> >> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> >> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> >
> > I'm confused why this change would be needed, or why it'd semantically
> > be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
> > S_ISDIR() is true (and perhaps even vice versa?).
> >
> > By chance, was this a leftover from your early RFC changes from a few
> > series ago when you had an entirely different mode for sparse
> > directory entries?
>
> I will double-check on this with additional testing and debugging.
> Your comments below make it clear that this patch would benefit from
> some additional splitting.
>
> >>  }
> >>
> >>  static inline int dir_path_match(struct index_state *istate,
> >> diff --git a/preload-index.c b/preload-index.c
> >> index e5529a586366..35e67057ca9b 100644
> >> --- a/preload-index.c
> >> +++ b/preload-index.c
> >> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
> >>                         continue;
> >>                 if (S_ISGITLINK(ce->ce_mode))
> >>                         continue;
> >> +               if (S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >>                 if (ce_uptodate(ce))
> >>                         continue;
> >>                 if (ce_skip_worktree(ce))
> >
> > Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
> >  Is this a duplicate check?  If so, is it still desirable for
> > future-proofing or code clarity, or is it strictly redundant?
>
> You're right, we could skip this one because the ce_skip_worktree(ce)
> is enough to cover this case. I think I created this one because I was
> auditing uses of S_ISGITLINK().
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> This is probably better served with a statement like this earlier in
> the method:
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> It seems like ignoring the skip worktree bits is a rare occasion and
> it will be worth expanding the index for that case.

Maybe...I read the commit message that introduced the behavior and
it's not very convincing to me that SKIP_WORKTREE should be ignored
(it's also not that clear to me what the conditions are; is it just
update-index --really-refresh?); it may be worth double checking on
that assumption first, especially given how many other bugs existed
with skip_worktree stuff for years.  If it's necessary, then I agree
that your extra if-check makes sense.

In particular, I think it'd be really dumb for "update-index
--really-refresh" to read in and populate a huge subdirectory just to
stat files that don't exist because they are in directories that don't
exist.  And I think there's a pretty good argument to not update stat
information for skip_worktree entries in non-sparse-index cases even
in the presence of that flag, especially given Matheus' other recent
changes in this area (the emails just before we got to the point of
discussing SKIP_WORKTREE and racy clean entries...speaking of which,
it might be worthwhile pinging Matheus' for opinions on this issue
too.)

> >>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
> >>                         filtered = 1;
> >>
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index dddf106d5bd4..9a62e823928a 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
> >>  {
> >>         ce->ce_flags |= CE_UNPACKED;
> >>
> >> +       /*
> >> +        * If this is a sparse directory, don't advance cache_bottom.
> >> +        * That will be advanced later using the cache-tree data.
> >> +        */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               return;
> >> +
> >
> > I don't understand cache_bottom stuff; we might want to get Junio to
> > look over it.  Or maybe I just need to dig a bit further and attempt
> > to understand it.
>
> I remember looking very careful at this when I created this (and found
> it worth a comment) but I don't recall enough off the top of my head.
> This is worth splitting out with a careful message, which will force me
> to reexamine the cache_bottom member.
>
> >>         if (o->cache_bottom < o->src_index->cache_nr &&
> >>             o->src_index->cache[o->cache_bottom] == ce) {
> >>                 int bottom = o->cache_bottom;
> >> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
> >>         ce_len -= pathlen;
> >>         ce_name = ce->name + pathlen;
> >>
> >> +       /* remove directory separator if a sparse directory entry */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               ce_len--;
> >>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> >
> > Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?
> >
> > Note the following sort order:
> >    foo
> >    foo.txt
> >    foo/
> >    foo/bar
> >
> > You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
> > but df_name_compare() exists to make "foo" sort exactly where "foo/"
> > would when "foo" is a directory.  Will your df_name_compare() call
> > here result in foo.txt being placed after all the "foo/<subpath>"
> > entries in the index and perhaps cause other problems down the line?
> > (Are there issues, e.g. with cache-trees getting wrong ordering from
> > this, or even writing out indexes or tree objects with the wrong
> > ordering?  I've written out trees to disk with wrong ordering before
> > and git usually survives but gets really confused with diffs.)
> >
> > Since at least one caller of compare_entry() takes the return result
> > and does a "if (cmp < 0)", this order is going to matter in some
> > cases.  Perhaps we need some testcases where there is a sparse
> > directory entry named "foo/" and a file recorded in some relevant tree
> > with the name "foo.txt" to be able to trigger these lines of code?
>
> I will do some testing to find out why removing the separator here was
> necessary or valuable.

I think you removed the separator because df_name_compare() assumes it
gets a regular filename (i.e. no trailing '/') and manually adds one
based on mode for directories.  You were probably worried about what
amounts to a non-sensical double '/', but df_name_compare() wouldn't
actually get to that point unless someone somehow recorded a path
within a git tree object that ended with a trailing '/'.  I'd rather
not have to worry about the double '/' and explain why it isn't
possible (or wonder about whether git trees with trailing '/'
characters could be recorded on some OS), so I think the trimming of
the separator as you did makes sense.

What doesn't make sense to me is that the code just below had a
hardcoded S_IFREG that it passed to df_name_compare, based on "this is
a cache entry, and index entries are _always_ regular files".  You
didn't change that, even though it's now a false assumption.
symlinks, and regular files should be passed as S_IFREG there, I'm not
sure what should be passed for submodules (though the fact that it's
been using S_IFREG for years suggests maybe that is the mode we want
for it, so we can't use ce->ce_mode), and I'm pretty sure sparse
directory entries should be passed as S_IFDIR in order to get the
sorting right unless you stop stripping the trailing '/' character.
I'm not exactly sure where the sorting for do_compare_entry() affects
the code later, but I tried to trace it out a little in my comments
above in order to guide some testing.

> >>  }
> >>
> >> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
> >>         if (cmp)
> >>                 return cmp;
> >>
> >> +       /* If ce is a sparse directory, then allow equality here. */
> >> +       if (S_ISSPARSEDIR(ce->ce_mode))
> >> +               return 0;
> >> +
> >
> > Um...so a sparse directory compares equal to _anything_ at all?  I'm
> > really confused why this would be desirable.  Am I missing something
> > here?
>
> The context is that is removed from the patch is that "cmp" is the
> response from do_compare_entry(), which does a length-limited comparison.
> If cmp is non-zero, then we've already returned the difference.
>
> The rest of the method is checking if the 'info' input is actually a
> parent directory of the _path_ given at this cache entry.

Ah, thanks for the explanation.  So the only way we get here with
cmp==0 when we're dealing with a sparse directory entry is if we found
a directory by the same name....

> >>         /*
> >>          * Even if the beginning compared identically, the ce should
> >>          * compare as bigger than a directory leading up to it!
>
> The line after this is:
>
>         return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n));
>
> This comparison is saying "these paths match up to the directory specified
> by info and n, but we need 'ce' to be a file within that directory." But
> in the case of a sparse directory entry, we can skip this comparison.

Isn't this "must skip" rather than "can skip"?  If we're considering
the ce path "foo/bar/", then the traverse_path would be "foo/bar" and
we'd have:
    ce_namelen(ce) == 1 + traverse_path_len(info, tree_entry_len(n))
so this would return 1 for the comparison making them be treated as
non-equal even though they are what we consider equal entries.

In any event, it seems like this new check could use a better comment
than "then allow equality here".

> >> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> >>         struct unpack_trees_options *o = info->data;
> >>         const struct name_entry *p = names;
> >> +       unsigned recurse = 1;
> >
> > "recurse" sent my mind off into questions about safety checks, base
> > cases, etc., instead of just the simple "we don't want to read in
> > directories corresponding to sparse entries".  I think this would be
> > clearer either if the variable had the sparsity concept embedded in
> > its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
> > (!sparse_entry) instead of (recurse) below), or with a comment about
> > why there are cases where you want to avoid recursion.
>
> I can understand that. This callback is confusing because it _does_
> recurse, but through a sequence of methods instead of actually calling
> itself.
>
> It would be better to say something like "unpack_subdirectories = 1"
> and disabling it when we are in a sparse directory.

I like that name.

>
> >>
> >>         /* Find first entry with a real name (we could use "mask" too) */
> >>         while (!p->mode)
> >> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>                                         }
> >>                                 }
> >>                                 src[0] = ce;
> >> +
> >> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> >> +                                       recurse = 0;
> >
> > Ah, the context here doesn't show it but this is in the "if (!cmp)"
> > block, i.e. if we found a match for the sparse directory.  This makes
> > sense, to me, _if_ we ignore the above question about sparse
> > directories matching equal to anything and everything.
>
> I believe that "anything and everything" concern has been resolved.

Yes, if we just improve the "then allow equality here" comment.

> >> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >>                         }
> >>                 }
> >>
> >> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >> +               if (recurse &&
> >> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >>                                              names, info) < 0)
> >>                         return -1;
> >>                 return mask;
> >
> > Nice.  :-)
> >
> >
> > I think your patch was mostly about the recurse stuff, which other
> > than the name or a comment about it look good to me.  However, all the
> > other preparatory small tweaks brought up a lot of questions or
> > confusion for me.  I'm worried there might be a bug or two, though I
> > may have just misunderstood some of the code bits.
>
> This patch could probably be split up a little to make these things
> clearer. Thanks for bringing up the tricky bits.
>
> -Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-20 23:00   ` Elijah Newren
  2021-04-21 13:41     ` Derrick Stolee
@ 2021-04-21 17:27     ` Derrick Stolee
  2021-04-21 18:55       ` Matheus Tavares Bernardino
  2021-04-21 18:56       ` Elijah Newren
  1 sibling, 2 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-21 17:27 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On 4/20/2021 7:00 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> diff --git a/read-cache.c b/read-cache.c
>> index 29ffa9ac5db9..6308234b4838 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>>                         continue;
>>
>> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
>> +                       continue;
>> +
> 
> I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> !ignore_skip_worktree and why it'd be desirable to refresh
> skip-worktree entries.  However, this is tangential to your patch and
> has apparently been around since 2009 (in particular, from 56cac48c35
> ("ie_match_stat(): do not ignore skip-worktree bit with
> CE_MATCH_IGNORE_VALID", 2009-12-14)).

I did some more digging on this part here. There has been movement in
this space!

The thing that triggers this ignore_skip_worktree variable inside
refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
introduced recently and is set only by builtin/add.c:refresh(), by
Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08).

This means that we can (for now) keep the behavior the same by adding

	if (ignore_skip_worktree)
		ensure_full_index(istate);

before the loop. This prevents the expansion during 'git status', but
requires modification before we are ready for 'git add' to work
correctly. Specifically, 'git add' currently warns only when adding
something that exactly matches a tracked file with SKIP_WORKTREE. It
does _not_ warn when adding something that is untracked but would have
the SKIP_WORKTREE bit if it was tracked. We will need to add that
extra warning if we want to avoid expanding during 'git add'.

Alternatively, we can decide to change the behavior here and send an
error() and return failure if they try to add something that would
live within a sparse-directory entry. I will think more on this and
have a good answer before v2 is ready.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 17:27     ` Derrick Stolee
@ 2021-04-21 18:55       ` Matheus Tavares Bernardino
  2021-04-21 19:10         ` Elijah Newren
  2021-04-21 18:56       ` Elijah Newren
  1 sibling, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 18:55 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> I did some more digging on this part here. There has been movement in
> this space!
>
> The thing that triggers this ignore_skip_worktree variable inside
> refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> introduced recently and is set only by builtin/add.c:refresh(), by
> Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> 2021-04-08).
>
> This means that we can (for now) keep the behavior the same by adding
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> before the loop.

Hmm, I don't think we need to expand the index here.
ignore_skip_worktree makes the loop below ignore entries with the
skip_worktree bit set. Since sparse dirs also have this bit set, we
will already get the behavior we want :)

However, I think we will need to expand the index at
`find_pathspecs_matching_against_index()` in order to find and warn
about the pathspecs that have matches among skip_worktree entries...

> This prevents the expansion during 'git status', but
> requires modification before we are ready for 'git add' to work
> correctly. Specifically, 'git add' currently warns only when adding
> something that exactly matches a tracked file with SKIP_WORKTREE. It
> does _not_ warn when adding something that is untracked but would have
> the SKIP_WORKTREE bit if it was tracked. We will need to add that
> extra warning if we want to avoid expanding during 'git add'.

Hmm, I see :( I was trying to think if it would be possible to do the
pathspec matching (for the warning) without having to expand the
index, but then there are the untracked files... If the user gives
"a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
is a skip_worktree entry or an untracked file without expanding the
index...

> Alternatively, we can decide to change the behavior here and send an
> error() and return failure if they try to add something that would
> live within a sparse-directory entry.

I think this behavior would be tricky to replicate on non-sparse-index
sparse-checkouts, if we were to do that. We would have to pathspec
match each untracked file against the sparsity patterns, perhaps?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 17:27     ` Derrick Stolee
  2021-04-21 18:55       ` Matheus Tavares Bernardino
@ 2021-04-21 18:56       ` Elijah Newren
  2021-04-23 20:16         ` Derrick Stolee
  1 sibling, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-21 18:56 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On Wed, Apr 21, 2021 at 10:27 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
>
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 29ffa9ac5db9..6308234b4838 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> >>                         continue;
> >>
> >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> >> +                       continue;
> >> +
> >
> > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > !ignore_skip_worktree and why it'd be desirable to refresh
> > skip-worktree entries.  However, this is tangential to your patch and
> > has apparently been around since 2009 (in particular, from 56cac48c35
> > ("ie_match_stat(): do not ignore skip-worktree bit with
> > CE_MATCH_IGNORE_VALID", 2009-12-14)).
>
> I did some more digging on this part here. There has been movement in
> this space!
>
> The thing that triggers this ignore_skip_worktree variable inside
> refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> introduced recently and is set only by builtin/add.c:refresh(), by
> Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> 2021-04-08).
>
> This means that we can (for now) keep the behavior the same by adding
>
>         if (ignore_skip_worktree)
>                 ensure_full_index(istate);
>
> before the loop. This prevents the expansion during 'git status', but
> requires modification before we are ready for 'git add' to work
> correctly. Specifically, 'git add' currently warns only when adding
> something that exactly matches a tracked file with SKIP_WORKTREE. It
> does _not_ warn when adding something that is untracked but would have
> the SKIP_WORKTREE bit if it was tracked. We will need to add that
> extra warning if we want to avoid expanding during 'git add'.
>
> Alternatively, we can decide to change the behavior here and send an
> error() and return failure if they try to add something that would
> live within a sparse-directory entry. I will think more on this and
> have a good answer before v2 is ready.

See my comments on 01/10; users are already getting surprised by "git
add" today and has been going on for months (though not super
frequently).  When they try to "git add" an untracked path that would
not match any path specifications in $GIT_DIR/info/sparse-checkout,
the fact that "git add" doesn't error out (or at the very least give a
warning) causes _subsequent_ commands to surprise the user with their
behavior; the fact that it is some later command that does weird stuff
(removing the file from the working tree) makes it harder for them to
try to understand and make sense of.  So, I'd say we do want to change
the behavior here...and not just for sparse-indexes but
sparse-checkouts in general.

As for how this affects the code, I think I'm behind both you and
Matheus on understanding here, but I'm starting to think it was a good
idea for me to spout my offhand comment on what looked like a funny
code smell that I thought was unrelated to your patch.  Sounds like it
is causing some good digging...I'll try to read up more on the results
when you send v2.  :-)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 18:55       ` Matheus Tavares Bernardino
@ 2021-04-21 19:10         ` Elijah Newren
  2021-04-21 19:51           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-04-21 19:10 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 11:55 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> >
> > >> diff --git a/read-cache.c b/read-cache.c
> > >> index 29ffa9ac5db9..6308234b4838 100644
> > >> --- a/read-cache.c
> > >> +++ b/read-cache.c
> > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > >>                         continue;
> > >>
> > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > >> +                       continue;
> > >> +
> > >
> > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > skip-worktree entries.  However, this is tangential to your patch and
> > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > CE_MATCH_IGNORE_VALID", 2009-12-14)).
> >
> > I did some more digging on this part here. There has been movement in
> > this space!
> >
> > The thing that triggers this ignore_skip_worktree variable inside
> > refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> > introduced recently and is set only by builtin/add.c:refresh(), by
> > Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> > 2021-04-08).
> >
> > This means that we can (for now) keep the behavior the same by adding
> >
> >         if (ignore_skip_worktree)
> >                 ensure_full_index(istate);
> >
> > before the loop.
>
> Hmm, I don't think we need to expand the index here.
> ignore_skip_worktree makes the loop below ignore entries with the
> skip_worktree bit set. Since sparse dirs also have this bit set, we
> will already get the behavior we want :)
>
> However, I think we will need to expand the index at
> `find_pathspecs_matching_against_index()` in order to find and warn
> about the pathspecs that have matches among skip_worktree entries...
>
> > This prevents the expansion during 'git status', but
> > requires modification before we are ready for 'git add' to work
> > correctly. Specifically, 'git add' currently warns only when adding
> > something that exactly matches a tracked file with SKIP_WORKTREE. It
> > does _not_ warn when adding something that is untracked but would have
> > the SKIP_WORKTREE bit if it was tracked. We will need to add that
> > extra warning if we want to avoid expanding during 'git add'.
>
> Hmm, I see :( I was trying to think if it would be possible to do the
> pathspec matching (for the warning) without having to expand the
> index, but then there are the untracked files... If the user gives
> "a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
> is a skip_worktree entry or an untracked file without expanding the
> index...

I thought Stolee's series added something that could allow us to check
that e.g. "a/b/c" corresponded to an entry under the sparse directory
"a/b/" and thus is a would-be-sparse entry.  Can we use that?

> > Alternatively, we can decide to change the behavior here and send an
> > error() and return failure if they try to add something that would
> > live within a sparse-directory entry.
>
> I think this behavior would be tricky to replicate on non-sparse-index
> sparse-checkouts, if we were to do that. We would have to pathspec
> match each untracked file against the sparsity patterns, perhaps?

By way of analogy, don't we have to pay the cost of pathspec matching
each tree entry against the sparsity patterns when doing a checkout
before putting those entries into the index?  Since "git add" is
trying to put new entries into the index, doesn't it make sense for it
to pay the same cost for the untracked paths it is about to place
there?

Sure, that can be expensive for non-cone mode, but that's the price
users pay for using sparse-checkouts and not using cone mode, and they
pay it every time they try to update the index with some new checkout.
I think "git add" should be treated similarly as another way to update
the index -- especially since users will get confused (and have gotten
confused) by subsequent commands if we don't do those checks.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 19:10         ` Elijah Newren
@ 2021-04-21 19:51           ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-21 19:51 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 4:11 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Apr 21, 2021 at 11:55 AM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > On Wed, Apr 21, 2021 at 2:27 PM Derrick Stolee <stolee@gmail.com> wrote:
> > >
> > > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > > <gitgitgadget@gmail.com> wrote:
> > >
> > > >> diff --git a/read-cache.c b/read-cache.c
> > > >> index 29ffa9ac5db9..6308234b4838 100644
> > > >> --- a/read-cache.c
> > > >> +++ b/read-cache.c
> > > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > > >>                         continue;
> > > >>
> > > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > > >> +                       continue;
> > > >> +
> > > >
> > > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > > skip-worktree entries.  However, this is tangential to your patch and
> > > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > > CE_MATCH_IGNORE_VALID", 2009-12-14)).
> > >
> > > I did some more digging on this part here. There has been movement in
> > > this space!
> > >
> > > The thing that triggers this ignore_skip_worktree variable inside
> > > refresh_index() is now the REFRESH_IGNORE_SKIP_WORKTREE flag which was
> > > introduced recently and is set only by builtin/add.c:refresh(), by
> > > Matheus: a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
> > > 2021-04-08).
> > >
> > > This means that we can (for now) keep the behavior the same by adding
> > >
> > >         if (ignore_skip_worktree)
> > >                 ensure_full_index(istate);
> > >
> > > before the loop.
> >
> > Hmm, I don't think we need to expand the index here.
> > ignore_skip_worktree makes the loop below ignore entries with the
> > skip_worktree bit set. Since sparse dirs also have this bit set, we
> > will already get the behavior we want :)
> >
> > However, I think we will need to expand the index at
> > `find_pathspecs_matching_against_index()` in order to find and warn
> > about the pathspecs that have matches among skip_worktree entries...
> >
> > > This prevents the expansion during 'git status', but
> > > requires modification before we are ready for 'git add' to work
> > > correctly. Specifically, 'git add' currently warns only when adding
> > > something that exactly matches a tracked file with SKIP_WORKTREE. It
> > > does _not_ warn when adding something that is untracked but would have
> > > the SKIP_WORKTREE bit if it was tracked. We will need to add that
> > > extra warning if we want to avoid expanding during 'git add'.
> >
> > Hmm, I see :( I was trying to think if it would be possible to do the
> > pathspec matching (for the warning) without having to expand the
> > index, but then there are the untracked files... If the user gives
> > "a/*/c" and we have "a/b/" as a sparse dir, we don't know if "a/b/c"
> > is a skip_worktree entry or an untracked file without expanding the
> > index...
>
> I thought Stolee's series added something that could allow us to check
> that e.g. "a/b/c" corresponded to an entry under the sparse directory
> "a/b/" and thus is a would-be-sparse entry.  Can we use that?

Yes, you mean for the warning on untracked paths that would become
sparse entries, right? The problem I was considering there was the
warning on tracked entries only, in which case I'm not sure if it
would help.

> > > Alternatively, we can decide to change the behavior here and send an
> > > error() and return failure if they try to add something that would
> > > live within a sparse-directory entry.
> >
> > I think this behavior would be tricky to replicate on non-sparse-index
> > sparse-checkouts, if we were to do that. We would have to pathspec
> > match each untracked file against the sparsity patterns, perhaps?
>
> By way of analogy, don't we have to pay the cost of pathspec matching
> each tree entry against the sparsity patterns when doing a checkout
> before putting those entries into the index?  Since "git add" is
> trying to put new entries into the index, doesn't it make sense for it
> to pay the same cost for the untracked paths it is about to place
> there?
>
> Sure, that can be expensive for non-cone mode, but that's the price
> users pay for using sparse-checkouts and not using cone mode, and they
> pay it every time they try to update the index with some new checkout.
> I think "git add" should be treated similarly as another way to update
> the index -- especially since users will get confused (and have gotten
> confused) by subsequent commands if we don't do those checks.

Good point. Yeah, that all makes sense :)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 16:11       ` Elijah Newren
@ 2021-04-22  2:24         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-22  2:24 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Apr 21, 2021 at 1:11 PM Elijah Newren <newren@gmail.com> wrote:
>
> // Adding Matheus to cc due to the ignore_skip_worktree bit, given his
> experience and expertise with the checkout and unpack-trees code.
>
> On Wed, Apr 21, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 4/20/2021 7:00 PM, Elijah Newren wrote:
> > > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > >>
> > >> diff --git a/read-cache.c b/read-cache.c
> > >> index 29ffa9ac5db9..6308234b4838 100644
> > >> --- a/read-cache.c
> > >> +++ b/read-cache.c
> > >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
> > >>                         continue;
> > >>
> > >> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> > >> +                       continue;
> > >> +
> > >
> > > I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
> > > !ignore_skip_worktree and why it'd be desirable to refresh
> > > skip-worktree entries.

The skip-worktree entries are not really refreshed in refresh_index(),
even when !ignore_skip_worktree (which is the default case; i.e.
without the REFRESH_IGNORE_SKIP_WORKTREE flag).

This flag (which is currently only used by `git add --refresh`s code
at `builtin/add.c:refresh()`), just makes refresh_index() skip the
following operations on skip-worktree entries: pathspec matching,
marking the matches on `seen`, checking/warning if unmerged, and
marking the entry as up-to-date (i.e. with the in-memory CE_UPTODATE
bit).

I added this flag in mt/add-rm-in-sparse-checkout and changed
`builtin/add.c:refresh()` to use it mainly because we needed a `seen`
array with only matches from non-skip-worktree entries so that we
could later decide when to emit the warning. (In fact, the original
implementation of the flag only controlled whether sparse matches
would be marked on `seen` or not [1])

[1]: https://lore.kernel.org/git/d65b214dd1d83a2e8710a9bbf98477c1929f0d5e.1614138107.git.matheus.bernardino@usp.br/

Perhaps we could alternatively make refresh_index() skip the
previously mentioned operations on all skip-worktrees entries
*unconditionally*. I.e. having, early in the loop:

if (ce_skip_worktree(ce))
        continue;

But I'm not familiar enough with CE_UPTODATE and how it's used in
different parts of the code base, so I didn't want to risk introducing
any bugs at refresh_index() callers that might want/expect the
function to set the CE_UPTODATE bit on the skip-worktree entries. The
case of `git add --refresh` was much narrower and easier to analyze,
and that's what we were interested in for the warning. That's why I
only changed the behavior there :)

> > > However, this is tangential to your patch and
> > > has apparently been around since 2009 (in particular, from 56cac48c35
> > > ("ie_match_stat(): do not ignore skip-worktree bit with
> > > CE_MATCH_IGNORE_VALID", 2009-12-14)).

Note that the `CE_MATCH_IGNORE_SKIP_WORKTREE` added in this patch does
control if refresh_cache_ent() will refresh skip-worktree entries, but
refresh_index() allways calls this function *without* this flag.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 01/10] t1092: add tests for status/add and sparse files
  2021-04-21 15:14   ` Matheus Tavares Bernardino
@ 2021-04-23 20:12     ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-23 20:12 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

On 4/21/2021 11:14 AM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> You already said you will make changes in this test to make sure
> git-add's sparse warning is kept on a sparse index (BTW thanks for
> that :), but I just wanted to give a couple suggestions that came to
> my mind while reading the patch.

I appreciate the suggestions! More tests always help me from
making mistakes, and you are definitely more of a 'git add'
expert than me.
 
>> +       test_must_fail git -C sparse-checkout add folder1/a &&
>> +       test_must_fail git -C sparse-index add folder1/a &&
> 
> To make sure the output is the same, could we collapse these two lines into:
> 
> test_sparse_match test_must_fail git add folder1/a ?

This is elegant. I'm sad I didn't think of it earlier.

> And additionally, I think we could repeat this check with `add
> --refresh` and also after removing `folder1/a`. The reason I'm saying
> this is because the check currently succeeds when `folder1/a` is in
> the working tree (maybe because `fill_directory()` ends up expanding
> the sparse index in this case?), but not under the two other
> circumstances I mentioned (as we've discussed in [1]).
> 
> [1]: https://lore.kernel.org/git/CAHd-oW7vCKC-XRM=rX37+jQn_XDzjtar9nNHKQ-4OHSZ=2=KFA@mail.gmail.com/

Can do!

>> +       git -C full-checkout checkout HEAD -- folder1/a &&
>> +       test_sparse_match git status --porcelain=v2 &&
> 
> Hmm, shouldn't this be `test_all_match`? IIUC, we've resetted
> `folder1/a` on the full repo to make sure the status report is the
> same across all repos, right?

Yes!

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH 02/10] unpack-trees: make sparse aware
  2021-04-21 18:56       ` Elijah Newren
@ 2021-04-23 20:16         ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-04-23 20:16 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee,
	Matheus Tavares Bernardino

On 4/21/2021 2:56 PM, Elijah Newren wrote:
> On Wed, Apr 21, 2021 at 10:27 AM Derrick Stolee <stolee@gmail.com> wrote:
>> Alternatively, we can decide to change the behavior here and send an
>> error() and return failure if they try to add something that would
>> live within a sparse-directory entry. I will think more on this and
>> have a good answer before v2 is ready.
> 
> See my comments on 01/10; users are already getting surprised by "git
> add" today and has been going on for months (though not super
> frequently).  When they try to "git add" an untracked path that would
> not match any path specifications in $GIT_DIR/info/sparse-checkout,
> the fact that "git add" doesn't error out (or at the very least give a
> warning) causes _subsequent_ commands to surprise the user with their
> behavior; the fact that it is some later command that does weird stuff
> (removing the file from the working tree) makes it harder for them to
> try to understand and make sense of.  So, I'd say we do want to change
> the behavior here...and not just for sparse-indexes but
> sparse-checkouts in general.
> 
> As for how this affects the code, I think I'm behind both you and
> Matheus on understanding here, but I'm starting to think it was a good
> idea for me to spout my offhand comment on what looked like a funny
> code smell that I thought was unrelated to your patch.  Sounds like it
> is causing some good digging...I'll try to read up more on the results
> when you send v2.  :-)

I think there are enough strange thing happening with 'git add' that I
want to take some time to figure out the right approach here. In v2, I
will delete the changes to builtin/add.c and instead focus on making
'git status' faster with a sparse-index. The 'git add' improvements
will follow in another series after I take enough time to understand
all of these special modes.

I think this split is especially important if we decide that changing
the behavior is the best thing to do here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 0/8] Sparse-index: integrate with status
  2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
@ 2021-04-23 21:34 ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                     ` (9 more replies)
  11 siblings, 10 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (8):
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  fsmonitor: test with sparse index

 builtin/commit.c                         |  3 ++
 dir.c                                    | 11 +++++
 read-cache.c                             | 10 +++-
 t/t1092-sparse-checkout-compatibility.sh | 61 ++++++++++++++++++++++--
 t/t7519-status-fsmonitor.sh              | 48 +++++++++++++++++++
 unpack-trees.c                           | 25 ++++++++--
 wt-status.c                              | 14 ++++--
 wt-status.h                              |  1 +
 8 files changed, 161 insertions(+), 12 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v1:

  1:  b2cb5401eff8 !  1:  3bac9edae7d8 t1092: add tests for status/add and sparse files
     @@ Commit message
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
          entirely desirable, but we are not intending to change behavior at the
     -    moment, only document it.
     +    moment, only document it. A future change could alter the behavior to
     +    be more sensible, and this test could be modified to satisfy the new
     +    expected behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	# This "git add folder1/a" is completely ignored
      +	# by the sparse-checkout repos. It causes the
      +	# full repo to have a different staged environment.
     -+	test_must_fail git -C sparse-checkout add folder1/a &&
     -+	test_must_fail git -C sparse-index add folder1/a &&
     ++	#
     ++	# This is not a desirable behavior, but this test
     ++	# ensures that the sparse-index is not the cause
     ++	# of a behavior change.
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++	test_sparse_match test_must_fail git add --refresh folder1/a &&
      +	git -C full-checkout checkout HEAD -- folder1/a &&
     -+	test_sparse_match git status --porcelain=v2 &&
     ++	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
      +	test_all_match git status --porcelain=v2 &&
  -:  ------------ >  2:  19344394379d unpack-trees: preserve cache_bottom
  -:  ------------ >  3:  24e71d8c0622 unpack-trees: compare sparse directories correctly
  2:  0a3892d2ec9e !  4:  d3c8948d0a33 unpack-trees: make sparse aware
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    unpack-trees: make sparse aware
     +    unpack-trees: stop recursing into sparse directories
      
     -    As a first step to integrate 'git status' and 'git add' with the sparse
     -    index, we must start integrating unpack_trees() with sparse directory
     -    entries. These changes are currently impossible to trigger because
     -    unpack_trees() calls ensure_full_index() if command_requires_full_index
     -    is true. This is the case for all commands at the moment. As we expand
     -    more commands to be sparse-aware, we might find that more changes are
     -    required to unpack_trees(). The current changes will suffice for
     -    'status' and 'add'.
     +    When walking trees using traverse_trees_recursive() and
     +    unpack_callback(), we must not attempt to walk into a sparse directory
     +    entry. There are no index entries within that directory to compare to
     +    the tree object at that position, so skip over the entries of that tree.
      
     -    unpack_trees() calls the traverse_trees() API using unpack_callback()
     -    to decide if we should recurse into a subtree. We must add new abilities
     -    to skip a subtree if it corresponds to a sparse directory entry.
     -
     -    It is important to be careful about the trailing directory separator
     -    that exists in the sparse directory entries but not in the subtree
     -    paths.
     +    This code is used in many places, so the only way to test it is to start
     +    removing the command_requres_full_index option from one builtin at a
     +    time and carefully test that its use of unpack_trees() behaves correctly
     +    with a sparse-index. Such tests will be added by later changes.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     - ## dir.h ##
     -@@ dir.h: static inline int ce_path_match(struct index_state *istate,
     - 				char *seen)
     - {
     - 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
     --			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     -+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     - }
     - 
     - static inline int dir_path_match(struct index_state *istate,
     -
     - ## preload-index.c ##
     -@@ preload-index.c: static void *preload_thread(void *_data)
     - 			continue;
     - 		if (S_ISGITLINK(ce->ce_mode))
     - 			continue;
     -+		if (S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     - 		if (ce_uptodate(ce))
     - 			continue;
     - 		if (ce_skip_worktree(ce))
     -
     - ## read-cache.c ##
     -@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     - 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     - 			continue;
     - 
     -+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     -+
     - 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     - 			filtered = 1;
     - 
     -
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
     - {
     - 	ce->ce_flags |= CE_UNPACKED;
     - 
     -+	/*
     -+	 * If this is a sparse directory, don't advance cache_bottom.
     -+	 * That will be advanced later using the cache-tree data.
     -+	 */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return;
     -+
     - 	if (o->cache_bottom < o->src_index->cache_nr &&
     - 	    o->src_index->cache[o->cache_bottom] == ce) {
     - 		int bottom = o->cache_bottom;
     -@@ unpack-trees.c: static int do_compare_entry(const struct cache_entry *ce,
     - 	ce_len -= pathlen;
     - 	ce_name = ce->name + pathlen;
     - 
     -+	/* remove directory separator if a sparse directory entry */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		ce_len--;
     - 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
     - }
     - 
     -@@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
     - 	if (cmp)
     - 		return cmp;
     - 
     -+	/* If ce is a sparse directory, then allow equality here. */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return 0;
     -+
     - 	/*
     - 	 * Even if the beginning compared identically, the ce should
     - 	 * compare as bigger than a directory leading up to it!
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
       	const struct name_entry *p = names;
     -+	unsigned recurse = 1;
     ++	unsigned unpack_tree = 1;
       
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       				src[0] = ce;
      +
      +				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					recurse = 0;
     ++					unpack_tree = 0;
       			}
       			break;
       		}
       	}
       
      -	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     -+	if (recurse &&
     ++	if (unpack_tree &&
      +	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
       		return -1;
       
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       		}
       
      -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     -+		if (recurse &&
     ++		if (unpack_tree &&
      +		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
       					     names, info) < 0)
       			return -1;
  3:  28ca717e6526 !  5:  fd96b71968b6 dir.c: accept a directory as part of cone-mode patterns
     @@ dir.c: enum pattern_match_result path_matches_pattern_list(
       	strbuf_addch(&parent_pathname, '/');
       	strbuf_add(&parent_pathname, pathname, pathlen);
       
     -+	/* Directory requests should be added as if they are a file */
     -+	if (parent_pathname.len > 1 &&
     ++	/*
     ++	 * Directory entries are matched if and only if a file
     ++	 * contained immediately within them is matched. For the
     ++	 * case of a directory entry, modify the path to create
     ++	 * a fake filename within this directory, allowing us to
     ++	 * use the file-base matching logic in an equivalent way.
     ++	 */
     ++	if (parent_pathname.len > 0 &&
      +	    parent_pathname.buf[parent_pathname.len - 1] == '/')
      +		strbuf_add(&parent_pathname, "-", 1);
      +
  4:  e86f874dd412 =  6:  1f4ba56e7416 status: skip sparse-checkout percentage with sparse-index
  5:  d7d4cad8be0b !  7:  3d09368c0541 status: use sparse-index throughout
     @@ Commit message
          implementation details are already integrated with sparse-checkout, so
          modify command_requires_full_index to be zero for cmd_status().
      
     -    By running the debugger for 'git status -uno' after that change, we find
     -    two instances of ensure_full_index() that were added for extra safety,
     -    but can be removed without issue.
     +    In refresh_index(), we loop through the index entries to refresh their
     +    stat() information. However, sparse directories have no stat()
     +    information to populate. Ignore these entries.
      
     -    In refresh_index(), we loop through the index entries. The
     -    refresh_cache_ent() method copies the sparse directories into the
     -    refreshed index without issue.
     +    This allows 'git status' to no longer expand a sparse index to a full
     +    one. This is further tested by dropping the "-uno" option and adding an
     +    untracked file into the worktree.
      
     -    The loop within run_diff_files() skips things that are in stage 0 and
     -    have skip-worktree enabled, so seems safe to disable ensure_full_index()
     -    here.
     -
     -    This allows some cases of 'git status' to no longer expand a sparse
     -    index to a full one, giving the following performance improvements for
     -    p2000-sparse-checkout-operations.sh:
     +    The performance test p2000-sparse-checkout-operations.sh demonstrates
     +    these improvements:
      
          Test                                  HEAD~1           HEAD
          -----------------------------------------------------------------------------
     -    2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
     -    2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
     -    2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
     -    2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
     +    2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
     +    2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
     +    2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
     +    2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%
      
          Note that since HEAD~1 was expanding the sparse index by parsing trees,
          it was artificially slower than the full index case. Thus, the 98%
     -    improvement is misleading, and instead we should celebrate the 0.37s to
     -    0.05s improvement of 82%. This is more indicative of the peformance
     +    improvement is misleading, and instead we should celebrate the 0.34s to
     +    0.05s improvement of 85%. This is more indicative of the peformance
          gains we are expecting by using a sparse index.
      
          Note: we are dropping the assignment of core.fsmonitor here. This is not
     @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
       	trace2_region_enter("index", "refresh", NULL);
      -	/* TODO: audit for interaction with sparse-index. */
      -	ensure_full_index(istate);
     ++
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce, *new_entry;
       		int cache_errno = 0;
     +@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     + 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     + 			continue;
     + 
     ++		/*
     ++		 * If this entry is a sparse directory, then there isn't
     ++		 * any stat() information to update. Ignore the entry.
     ++		 */
     ++		if (S_ISSPARSEDIR(ce->ce_mode))
     ++			continue;
     ++
     + 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     + 			filtered = 1;
     + 
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' '
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is e
      +	init_repos &&
      +
      +	rm -f trace2.txt &&
     ++	echo >>sparse-index/untracked.txt &&
       	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      -		git -C sparse-index -c core.fsmonitor="" status -uno &&
      -	test_region index ensure_full_index trace2.txt
     -+		git -C sparse-index status -uno &&
     ++		git -C sparse-index status &&
      +	test_region ! index ensure_full_index trace2.txt
       '
       
  6:  434306541613 <  -:  ------------ dir: use expand_to_path() for sparse directories
  7:  f1a9ce4ef0e5 <  -:  ------------ add: allow operating on a sparse-only index
  8:  6d7f30f2b90a <  -:  ------------ pathspec: stop calling ensure_full_index
  9:  75199bbe8ca1 <  -:  ------------ t7519: add sparse directories to FS monitor tests
 10:  9d1183ddd280 !  8:  1fd033a6ebb2 fsmonitor: test with sparse index
     @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
       	)
       '
       
     -+test_expect_success 'status succeeds with sparse index' '
     -+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++# Usage:
     ++# check_sparse_index_behavior [!]
     ++# If "!" is supplied, then we verify that we do not call ensure_full_index
     ++# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
     ++check_sparse_index_behavior () {
      +	git status --porcelain=v2 >expect &&
      +	git sparse-checkout init --cone --sparse-index &&
     ++	git sparse-checkout set dir1 dir2 &&
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      +		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     ++	test_region $1 index ensure_full_index trace2.txt &&
      +	test_cmp expect actual &&
      +	rm trace2.txt &&
     ++	git sparse-checkout disable
     ++}
     ++
     ++test_expect_success 'status succeeds with sparse index' '
     ++	git reset --hard &&
     ++
     ++	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +	EOF
      +	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     -+	rm trace2.txt &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     ++	check_sparse_index_behavior ! &&
      +
     ++	cp -r dir1 dir1a &&
     ++	git add dir1a &&
     ++	git commit -m "add dir1a" &&
     ++
     ++	# This one modifies outside the sparse-checkout definition
     ++	# and hence we expect to expand the sparse-index.
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1a/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual
     ++	check_sparse_index_behavior
      +'
      +
       test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13 12:40     ` Matheus Tavares Bernardino
  2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file
'folder1/a' is shown as modified, but adding it fails. This is new
behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
entries, 2021-04-08). Before that change, these adds would be silently
ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..0ec487acd283 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" is completely ignored
+	# by the sparse-checkout repos. It causes the
+	# full repo to have a different staged environment.
+	#
+	# This is not a desirable behavior, but this test
+	# ensures that the sparse-index is not the cause
+	# of a behavior change.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	git -C full-checkout checkout HEAD -- folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 2/8] unpack-trees: preserve cache_bottom
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 3/8] unpack-trees: compare sparse directories correctly
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  3:26     ` Elijah Newren
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a dirctory.

Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..3af797093095 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/* If ce is a sparse directory, then allow an exact match. */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  3:31     ` Elijah Newren
  2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When walking trees using traverse_trees_recursive() and
unpack_callback(), we must not attempt to walk into a sparse directory
entry. There are no index entries within that directory to compare to
the tree object at that position, so skip over the entries of that tree.

This code is used in many places, so the only way to test it is to start
removing the command_requres_full_index option from one builtin at a
time and carefully test that its use of unpack_trees() behaves correctly
with a sparse-index. Such tests will be added by later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3af797093095..67777570f829 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1256,6 +1256,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned unpack_tree = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1297,12 +1298,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 					}
 				}
 				src[0] = ce;
+
+				if (S_ISSPARSEDIR(ce->ce_mode))
+					unpack_tree = 0;
 			}
 			break;
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_tree &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1332,7 +1337,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (unpack_tree &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0ec487acd283..0dc551b25f67 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -196,6 +196,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 7/8] status: use sparse-index throughout
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0dc551b25f67..5a8fe88dc894 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -453,12 +453,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v2 8/8] fsmonitor: test with sparse index
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-04-23 21:34   ` Derrick Stolee via GitGitGadget
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-23 21:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During the effort to protect uses of the index to operate on a full
index, we did not modify fsmonitor.c. This is because it already works
effectively with only the change to index_name_stage_pos(). The only
thing left to do is to test that it works correctly.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v2 3/8] unpack-trees: compare sparse directories correctly
  2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-13  3:26     ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-05-13  3:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As we further integrate the sparse-index into unpack-trees, we need to
> ensure that we compare sparse directory entries correctly with other
> entries. This affects searching for an exact path as well as sorting
> index entries.
>
> Sparse directory entries contain the trailing directory separator. This
> is important for the sorting, in particular. Thus, within
> do_compare_entry() we stop using S_IFREG in all cases, since sparse
> directories should use S_IFDIR to indicate that the comparison should
> treat the entry name as a dirctory.
>
> Within compare_entry(), it first calls do_compare_entry() to check the
> leading portion of the name. When the input path is a directory name, we
> could match exactly already. Thus, we should return 0 if we have an
> exact string match on a sparse directory entry.

Thanks for splitting up patch 2 from the original series; it's much
easier to understand these separate patches.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 1067db19c9d2..3af797093095 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
>         int pathlen, ce_len;
>         const char *ce_name;
>         int cmp;
> +       unsigned ce_mode;
>
>         /*
>          * If we have not precomputed the traverse path, it is quicker
> @@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> -       return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
> +       ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;

Ah, so here the fact that S_ISSPARSEDIR is defined as
   #define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
whereas S_ISDIR is defined as
   #define S_ISDIR(m)      (((m) & S_IFMT) == S_IFDIR)
turns out to be critically important, because if you used S_ISDIR()
here, then we'd get ce_mode = S_IFDIR for submodules and break the
sorting.  S_ISSPARSEDIR() gives us the correct value.

> +       return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
>  }
>
>  static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
> @@ -1000,6 +1002,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow an exact match. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;

I think the comment from the commit message belongs in the code; the
comment in the code is too jarring without the more detailed
explanation.

> +
>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories
  2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-13  3:31     ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-05-13  3:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When walking trees using traverse_trees_recursive() and
> unpack_callback(), we must not attempt to walk into a sparse directory
> entry. There are no index entries within that directory to compare to
> the tree object at that position, so skip over the entries of that tree.
>
> This code is used in many places, so the only way to test it is to start
> removing the command_requres_full_index option from one builtin at a
> time and carefully test that its use of unpack_trees() behaves correctly
> with a sparse-index. Such tests will be added by later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3af797093095..67777570f829 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1256,6 +1256,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned unpack_tree = 1;
>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1297,12 +1298,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       unpack_tree = 0;
>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_tree &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1332,7 +1337,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (unpack_tree &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;
> --
> gitgitgadget

The splitting of the previous patch looks really good here too, and
the variable rename makes it flow nicely.  Looking good.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v2 0/8] Sparse-index: integrate with status
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
@ 2021-05-13  4:12   ` Elijah Newren
  2021-05-14 18:28     ` Derrick Stolee
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-13  4:12 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these are checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.
>
> The performance tests in p2000-sparse-operations.sh improve by 95% or more,
> even when compared with the full-index cases, not just the sparse-index
> cases that previously had extra overhead.
>
> Hopefully this is the first example of how ds/sparse-index-protections has
> done the basic work to do these conversions safely, making them look easier
> than they seemed when starting this adventure.
>
> Thanks, -Stolee
>
>
> Updates in V2
> =============
>
>  * Based on the feedback, it is clear that 'git add' will require much more
>    careful testing and thought. I'm splitting it out of this series and it
>    will return with a follow-up.
>  * Test cases are improved, both in coverage and organization.
>  * The previous "unpack-trees: make sparse aware" patch is split into three
>    now.
>  * Stale messages based on an old implementation of the "protections" topic
>    are now fixed.
>  * Performance tests were re-run.

I read through the topic, both my old comments, the range-diff, and
the new patches where the range-diff wasn't enough.  I tried to spot
issues, and was hoping to find problems you alluded to in your recent
comments at https://lore.kernel.org/git/05932ebc-04ac-b3c5-a460-5d37d8604fd9@gmail.com/,
but I failed to spot them.  I hope it has to do with the cache bottom
stuff that I just don't understand, because otherwise I just missed
the problems in my review.  I can say that in v2 you fixed the issues
I did spot in my review of v1.

I'll look forward to v3 to see what it was I missed.  If I somehow
don't respond soon (in a week at the latest), do feel free to ping me;
sorry for somehow having this one slip through the cracks.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-13 12:40     ` Matheus Tavares Bernardino
  2021-05-14 12:27       ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-13 12:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Apr 23, 2021 at 6:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before moving to update 'git status' and 'git add' to work with sparse
> indexes, add an explicit test that ensures the sparse-index works the
> same as a normal sparse-checkout when the worktree contains directories
> and files outside of the sparse cone.
>
> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
> not in the sparse cone. When 'folder1/a' is modified, the file
> 'folder1/a' is shown as modified, but adding it fails.

Hmm, I might be doing something wrong, but I think `folder1/a` is not
shown as modified.

$ git init test
$ mkdir test/folder1
$ echo original >test/folder1/a
$ echo original >test/b
$ git -C test add . && git -C test commit -m files
$ git -C test sparse-checkout init --cone --sparse-index
$ ls test
b
$ mkdir test/folder1 && echo modified >test/folder1/a
$ git -C test status
On branch master
You are in a sparse checkout with 50% of tracked files present.
nothing to commit, working tree clean

> This is new
> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
> entries, 2021-04-08). Before that change, these adds would be silently
> ignored.
>
> Untracked files are fine: adding new files both with 'git add .' and
> 'git add folder1/' works just as in a full checkout. This may not be
> entirely desirable, but we are not intending to change behavior at the
> moment, only document it. A future change could alter the behavior to
> be more sensible, and this test could be modified to satisfy the new
> expected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 12e6c453024f..0ec487acd283 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'status/add: outside sparse cone' '
> +       init_repos &&

A minor suggestion: before recreating folder1/a, we could also test
that `git add folder1/a` will not remove the sparse entry from the
index and will properly warn about it on both sparse repos. I.e.
adding a:

        test_sparse_match test_must_fail git add folder1/a

> +       # folder1 is at HEAD, but outside the sparse cone
> +       run_on_sparse mkdir folder1 &&
> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
> +
> +       test_sparse_match git status &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all ../edit-contents folder1/a &&

Hmm, we modify `folder1/a` in all repos, but we only try adding it on
the sparse repos, and then we immediately restore it on the full repo.
As we won't use the modified version on the full repo, could this
perhaps be `run_on_sparse` instead? If so, we could also save the
later `git -C full-checkout checkout HEAD -- folder1/a`.

> +       run_on_all ../edit-contents folder1/new &&
> +
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       # This "git add folder1/a" is completely ignored
> +       # by the sparse-checkout repos. It causes the
> +       # full repo to have a different staged environment.
> +       #
> +       # This is not a desirable behavior, but this test
> +       # ensures that the sparse-index is not the cause
> +       # of a behavior change.

I'm not sure I understand what the undesirable behavior is in this
sentence. Is it "git add folder1/a" erroring out and not updating
`folder1/a`? Or the full repo having a different staged environment?

> +       test_sparse_match test_must_fail git add folder1/a &&
> +       test_sparse_match test_must_fail git add --refresh folder1/a &&
> +       git -C full-checkout checkout HEAD -- folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/new &&
> +
> +       run_on_all ../edit-contents folder1/newer &&
> +       test_all_match git add folder1/ &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m folder1/newer
> +'
> +
>  test_expect_success 'checkout and reset --hard' '
>         init_repos &&
>
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v2 1/8] t1092: add tests for status/add and sparse files
  2021-05-13 12:40     ` Matheus Tavares Bernardino
@ 2021-05-14 12:27       ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-05-14 12:27 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee via GitGitGadget
  Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee, Derrick Stolee

On 5/13/2021 8:40 AM, Matheus Tavares Bernardino wrote:
> On Fri, Apr 23, 2021 at 6:34 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Before moving to update 'git status' and 'git add' to work with sparse
>> indexes, add an explicit test that ensures the sparse-index works the
>> same as a normal sparse-checkout when the worktree contains directories
>> and files outside of the sparse cone.
>>
>> Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
>> not in the sparse cone. When 'folder1/a' is modified, the file
>> 'folder1/a' is shown as modified, but adding it fails.
> 
> Hmm, I might be doing something wrong, but I think `folder1/a` is not
> shown as modified.
> 
> $ git init test
> $ mkdir test/folder1
> $ echo original >test/folder1/a
> $ echo original >test/b
> $ git -C test add . && git -C test commit -m files
> $ git -C test sparse-checkout init --cone --sparse-index
> $ ls test
> b
> $ mkdir test/folder1 && echo modified >test/folder1/a
> $ git -C test status
> On branch master
> You are in a sparse checkout with 50% of tracked files present.
> nothing to commit, working tree clean

You are correct. This happens in both the sparse-index case and the
regular full-index case. The modifications outside of the sparse-checkout
definition are ignored, as long as they matched a tracked file.

I checked my latest code against this example and see that the sparse
index is not expanded to a full one. It _will_ be if we add an untracked
file outside of the sparse cone.

>> This is new
>> behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
>> entries, 2021-04-08). Before that change, these adds would be silently
>> ignored.
>>
>> Untracked files are fine: adding new files both with 'git add .' and
>> 'git add folder1/' works just as in a full checkout. This may not be
>> entirely desirable, but we are not intending to change behavior at the
>> moment, only document it. A future change could alter the behavior to
>> be more sensible, and this test could be modified to satisfy the new
>> expected behavior.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 12e6c453024f..0ec487acd283 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -232,6 +232,46 @@ test_expect_success 'add, commit, checkout' '
>>         test_all_match git checkout -
>>  '
>>
>> +test_expect_success 'status/add: outside sparse cone' '
>> +       init_repos &&
> 
> A minor suggestion: before recreating folder1/a, we could also test
> that `git add folder1/a` will not remove the sparse entry from the
> index and will properly warn about it on both sparse repos. I.e.
> adding a:
> 
>         test_sparse_match test_must_fail git add folder1/a

Will do.

>> +       # folder1 is at HEAD, but outside the sparse cone
>> +       run_on_sparse mkdir folder1 &&
>> +       cp initial-repo/folder1/a sparse-checkout/folder1/a &&
>> +       cp initial-repo/folder1/a sparse-index/folder1/a &&
>> +
>> +       test_sparse_match git status &&
>> +
>> +       write_script edit-contents <<-\EOF &&
>> +       echo text >>$1
>> +       EOF
>> +       run_on_all ../edit-contents folder1/a &&
> 
> Hmm, we modify `folder1/a` in all repos, but we only try adding it on
> the sparse repos, and then we immediately restore it on the full repo.
> As we won't use the modified version on the full repo, could this
> perhaps be `run_on_sparse` instead? If so, we could also save the
> later `git -C full-checkout checkout HEAD -- folder1/a`.

Good idea.

>> +       run_on_all ../edit-contents folder1/new &&
>> +
>> +       test_sparse_match git status --porcelain=v2 &&
>> +
>> +       # This "git add folder1/a" is completely ignored
>> +       # by the sparse-checkout repos. It causes the
>> +       # full repo to have a different staged environment.
>> +       #
>> +       # This is not a desirable behavior, but this test
>> +       # ensures that the sparse-index is not the cause
>> +       # of a behavior change.
> 
> I'm not sure I understand what the undesirable behavior is in this
> sentence. Is it "git add folder1/a" erroring out and not updating
> `folder1/a`? Or the full repo having a different staged environment?

Perhaps this isn't actually undesirable, now that we are actually
returning an error. It's no longer silent, so maybe my comment is
stale from an earlier version.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v2 0/8] Sparse-index: integrate with status
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
@ 2021-05-14 18:28     ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-05-14 18:28 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee

On 5/13/2021 12:12 AM, Elijah Newren wrote:
> On Fri, Apr 23, 2021 at 2:34 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' very fast when a sparse-index is enabled on a repository with
>> cone-mode sparse-checkout (and a small populated set).
> 
> I read through the topic, both my old comments, the range-diff, and
> the new patches where the range-diff wasn't enough.  I tried to spot
> issues, and was hoping to find problems you alluded to in your recent
> comments at https://lore.kernel.org/git/05932ebc-04ac-b3c5-a460-5d37d8604fd9@gmail.com/,
> but I failed to spot them.  I hope it has to do with the cache bottom
> stuff that I just don't understand, because otherwise I just missed
> the problems in my review.  I can say that in v2 you fixed the issues
> I did spot in my review of v1.
> 
> I'll look forward to v3 to see what it was I missed.  If I somehow
> don't respond soon (in a week at the latest), do feel free to ping me;
> sorry for somehow having this one slip through the cracks.

v3 is on the way. The changes related to issues I found in my
deeper testing are more about what wasn't previously tested in
my test script as opposed to things actually being wrong in
the patch series. (There is one case where some new code was
incorrect, but it wasn't being tested because of the test repo's
data shape.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 00/12] Sparse-index: integrate with status
  2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
@ 2021-05-14 18:30   ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                       ` (12 more replies)
  9 siblings, 13 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:30 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature as they were inherited from special cases that arose
in the virtualized environment of VFS for Git. This version contains my
fixes based on that investigation. Most of these were easy to identify and
fix, but I was blocked for a long time struggling with a bug when combining
the sparse-index with the builtin FS Monitor feature, but I've reported my
findings already [1].

[1]
https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status, such as when a
     sparse-directory is staged but doesn't exist at HEAD (such as in an
     orphaned commit) we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare so the FS Monitor feature can work
     efficiently. Updates in V2

----------------------------------------------------------------------------

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (12):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |   6 ++
 dir.c                                    |  11 +++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++++-
 t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++++++
 unpack-trees.c                           |  27 +++++-
 wt-status.c                              |  64 ++++++++++++-
 wt-status.h                              |   1 +
 10 files changed, 300 insertions(+), 14 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v2:

  -:  ------------ >  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  -:  ------------ >  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  -:  ------------ >  3:  70971b1f9261 t1092: expand repository data shape
  1:  3bac9edae7d8 !  4:  a80b5a41153f t1092: add tests for status/add and sparse files
     @@ Commit message
          and files outside of the sparse cone.
      
          Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
     -    not in the sparse cone. When 'folder1/a' is modified, the file
     -    'folder1/a' is shown as modified, but adding it fails. This is new
     -    behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
     -    entries, 2021-04-08). Before that change, these adds would be silently
     -    ignored.
     +    not in the sparse cone. When 'folder1/a' is modified, the file is not
     +    shown as modified and adding it will fail. This is new behavior as of
     +    a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
     +    2021-04-08). Before that change, these adds would be silently ignored.
      
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +test_expect_success 'status/add: outside sparse cone' '
      +	init_repos &&
      +
     ++	# adding a "missing" file outside the cone should fail
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++
      +	# folder1 is at HEAD, but outside the sparse cone
      +	run_on_sparse mkdir folder1 &&
      +	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	write_script edit-contents <<-\EOF &&
      +	echo text >>$1
      +	EOF
     -+	run_on_all ../edit-contents folder1/a &&
     ++	run_on_sparse ../edit-contents folder1/a &&
      +	run_on_all ../edit-contents folder1/new &&
      +
      +	test_sparse_match git status --porcelain=v2 &&
      +
     -+	# This "git add folder1/a" is completely ignored
     -+	# by the sparse-checkout repos. It causes the
     -+	# full repo to have a different staged environment.
     -+	#
     -+	# This is not a desirable behavior, but this test
     -+	# ensures that the sparse-index is not the cause
     -+	# of a behavior change.
     ++	# This "git add folder1/a" fails with a warning
     ++	# in the sparse repos, differing from the full
     ++	# repo. This is intentional.
      +	test_sparse_match test_must_fail git add folder1/a &&
      +	test_sparse_match test_must_fail git add --refresh folder1/a &&
     -+	git -C full-checkout checkout HEAD -- folder1/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
  2:  19344394379d =  5:  07a45b661c4a unpack-trees: preserve cache_bottom
  3:  24e71d8c0622 !  6:  cc4a526e7947 unpack-trees: compare sparse directories correctly
     @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
       	if (cmp)
       		return cmp;
       
     -+	/* If ce is a sparse directory, then allow an exact match. */
     ++	/*
     ++	 * At this point, we know that we have a prefix match. If ce
     ++	 * is a sparse directory, then allow an exact match. This only
     ++	 * works when the input name is a directory, since ce->name
     ++	 * ends in a directory separator.
     ++	 */
      +	if (S_ISSPARSEDIR(ce->ce_mode))
      +		return 0;
      +
  4:  d3c8948d0a33 !  7:  598375d3531f unpack-trees: stop recursing into sparse directories
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## diff-lib.c ##
     +@@ diff-lib.c: static void show_new_file(struct rev_info *revs,
     + 	unsigned int mode;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_file->ce_mode))
     ++		return;
     ++
     + 	/*
     + 	 * New file in the index: it might actually be different in
     + 	 * the working tree.
     +@@ diff-lib.c: static int show_modified(struct rev_info *revs,
     + 	const struct object_id *oid;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_entry->ce_mode))
     ++		return 0;
     ++
     + 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
     + 			  &dirty_submodule, &revs->diffopt) < 0) {
     + 		if (report_missing)
     +
       ## unpack-trees.c ##
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     - 					}
     - 				}
     - 				src[0] = ce;
     -+
     -+				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					unpack_tree = 0;
     - 			}
     - 			break;
       		}
       	}
       
  5:  fd96b71968b6 =  8:  47da2b317237 dir.c: accept a directory as part of cone-mode patterns
  6:  1f4ba56e7416 =  9:  bc1512981493 status: skip sparse-checkout percentage with sparse-index
  7:  3d09368c0541 = 10:  5b1ae369a7cd status: use sparse-index throughout
  -:  ------------ > 11:  3b42783d4a86 wt-status: expand added sparse directory entries
  8:  1fd033a6ebb2 ! 12:  b72507f514d1 fsmonitor: test with sparse index
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    fsmonitor: test with sparse index
     +    fsmonitor: integrate with sparse index
      
     -    During the effort to protect uses of the index to operate on a full
     -    index, we did not modify fsmonitor.c. This is because it already works
     -    effectively with only the change to index_name_stage_pos(). The only
     -    thing left to do is to test that it works correctly.
     +    If we need to expand a sparse-index into a full one, then the FS Monitor
     +    bitmap is going to be incorrect. Ensure that we start fresh at such an
     +    event.
     +
     +    While this is currently a performance drawback, the eventual hope of the
     +    sparse-index feature is that these expansions will be rare and hence we
     +    will be able to keep the FS Monitor data accurate across multiple Git
     +    commands.
      
          These tests are added to demonstrate that the behavior is the same
          across a full index and a sparse index, but also that file modifications
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## sparse-index.c ##
     +@@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
     + 	cache_tree_free(&istate->cache_tree);
     + 	cache_tree_update(istate, 0);
     + 
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     ++
     + 	istate->sparse_index = 1;
     + 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
     + 	return 0;
     +@@ sparse-index.c: void ensure_full_index(struct index_state *istate)
     + 	istate->cache = full->cache;
     + 	istate->cache_nr = full->cache_nr;
     + 	istate->cache_alloc = full->cache_alloc;
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     + 
     + 	strbuf_release(&base);
     + 	free(full);
     +
       ## t/t7519-status-fsmonitor.sh ##
      @@ t/t7519-status-fsmonitor.sh: test_expect_success 'setup' '
       	expect*

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  1:33       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                       ` (10 subsequent siblings)
  12 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  1:49       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  12 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition who have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..98257695979a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,16 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 04/12] t1092: add tests for status/add and sparse files
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 98257695979a..fba98d5484ae 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -238,6 +238,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 05/12] unpack-trees: preserve cache_bottom
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 06/12] unpack-trees: compare sparse directories correctly
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a dirctory.

Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  2:03       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  12 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When walking trees using traverse_trees_recursive() and
unpack_callback(), we must not attempt to walk into a sparse directory
entry. There are no index entries within that directory to compare to
the tree object at that position, so skip over the entries of that tree.

This code is used in many places, so the only way to test it is to start
removing the command_requres_full_index option from one builtin at a
time and carefully test that its use of unpack_trees() behaves correctly
with a sparse-index. Such tests will be added by later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c     | 6 ++++++
 unpack-trees.c | 7 +++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..d5e7e01132ee 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -322,6 +322,9 @@ static void show_new_file(struct rev_info *revs,
 	unsigned int mode;
 	unsigned dirty_submodule = 0;
 
+	if (S_ISSPARSEDIR(new_file->ce_mode))
+		return;
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -343,6 +346,9 @@ static int show_modified(struct rev_info *revs,
 	const struct object_id *oid;
 	unsigned dirty_submodule = 0;
 
+	if (S_ISSPARSEDIR(new_entry->ce_mode))
+		return 0;
+
 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..703b0bdc9dfd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	unsigned unpack_tree = 1;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_tree &&
+	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+		if (unpack_tree &&
+		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 					     names, info) < 0)
 			return -1;
 		return mask;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fba98d5484ae..34dae7fbcadd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -202,6 +202,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 10/12] status: use sparse-index throughout
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 34dae7fbcadd..59faf7381093 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,12 +479,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-18  2:27       ` Elijah Newren
  2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 59faf7381093..cd3669d36b53 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+	test_all_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v3 12/12] fsmonitor: integrate with sparse index
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (10 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-14 18:31     ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-14 18:31 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-18  1:33       ` Elijah Newren
  2021-05-18 14:57         ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-18  1:33 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When creating a full index from a sparse one, we create cache entries
> for every blob within a given sparse directory entry. These are
> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> correctly written to disk in the case that the index is not converted
> back down to a sparse-index.

This seems odd to me.  When sparse-index is not involved and we are
just doing simple sparse checkouts, do we mark CE_SKIP_WORKTREE
entries with CE_EXTENDED?  I can't find any code that does so.

Is it possible that the setting of CE_EXTENDED is just a workaround
that happens to force the index to be written in cases where the logic
is otherwise thinking it can get away without one?  Or is there
something I'm missing about why the CE_EXTENDED flag is actually
needed here?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 1b49898d0cb7..b2b3fbd75050 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
>         strbuf_addstr(base, path);
>
>         ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
> -       ce->ce_flags |= CE_SKIP_WORKTREE;
> +       ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>         set_index_entry(istate, istate->cache_nr++, ce);
>
>         strbuf_setlen(base, len);
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-18  1:49       ` Elijah Newren
  2021-05-18 14:59         ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-18  1:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As more features integrate with the sparse-index feature, more and more
> special cases arise that require different data shapes within the tree
> structure of the repository in order to demonstrate those cases.
>
> Add several interesting special cases all at once instead of sprinkling
> them across several commits. The interesting cases being added here are:
>
> * Add sparse-directory entries on both sides of directories within the
>   sparse-checkout definition.
>
> * Add directories outside the sparse-checkout definition who have only
>   one entry and are the first entry of a directory with multiple
>   entries.
>
> Later tests will take advantage of these shapes, but they also deepen
> the tests that already exist.

Makes sense.  Do we also want to add ones of the form

   foo/bar
   foo.txt

?

Here we'd be particularly looking that if foo is a sparse directory,
we want to avoid messing up its order.  ('foo' sorts before 'foo.txt',
but 'foo/' sorts after, and thus 'foo' the directory should be after
'foo.txt')


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4f2f09b53a32..98257695979a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -17,7 +17,7 @@ test_expect_success 'setup' '
>                 echo "after folder1" >g &&
>                 echo "after x" >z &&
>                 mkdir folder1 folder2 deep x &&
> -               mkdir deep/deeper1 deep/deeper2 &&
> +               mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
>                 mkdir deep/deeper1/deepest &&
>                 echo "after deeper1" >deep/e &&
>                 echo "after deepest" >deep/deeper1/e &&
> @@ -25,10 +25,16 @@ test_expect_success 'setup' '
>                 cp a folder2 &&
>                 cp a x &&
>                 cp a deep &&
> +               cp a deep/before &&
>                 cp a deep/deeper1 &&
>                 cp a deep/deeper2 &&
> +               cp a deep/later &&
>                 cp a deep/deeper1/deepest &&
>                 cp -r deep/deeper1/deepest deep/deeper2 &&
> +               mkdir deep/deeper1/0 &&
> +               mkdir deep/deeper1/0/0 &&
> +               touch deep/deeper1/0/1 &&
> +               touch deep/deeper1/0/0/0 &&
>                 git add . &&
>                 git commit -m "initial commit" &&
>                 git checkout -b base &&
> --
> gitgitgadget

Looks good.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
@ 2021-05-18  2:03       ` Elijah Newren
  2021-05-18  2:06         ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-18  2:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When walking trees using traverse_trees_recursive() and
> unpack_callback(), we must not attempt to walk into a sparse directory
> entry. There are no index entries within that directory to compare to
> the tree object at that position, so skip over the entries of that tree.
>
> This code is used in many places, so the only way to test it is to start
> removing the command_requres_full_index option from one builtin at a
> time and carefully test that its use of unpack_trees() behaves correctly
> with a sparse-index. Such tests will be added by later changes.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c     | 6 ++++++
>  unpack-trees.c | 7 +++++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index b73cc1859a49..d5e7e01132ee 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -322,6 +322,9 @@ static void show_new_file(struct rev_info *revs,
>         unsigned int mode;
>         unsigned dirty_submodule = 0;
>
> +       if (S_ISSPARSEDIR(new_file->ce_mode))
> +               return;
> +

Makes sense, but is this related to the unpack-trees.c changes and the
commit message, or should it be in a separate commit?

>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -343,6 +346,9 @@ static int show_modified(struct rev_info *revs,
>         const struct object_id *oid;
>         unsigned dirty_submodule = 0;
>
> +       if (S_ISSPARSEDIR(new_entry->ce_mode))
> +               return 0;
> +

Same question as above.  And a few more questions...

What if the old commit/tree had a file at this path, and the new
commit/tree has a (sparse) directory at this path?  Shouldn't
_something_ be shown for the file deletion?  Or does such a case not
run through this code path?

Also, wouldn't we expect it to be an error for show_modified() to be
called on a sparse directory?  If two sparse directories differed, we
should have inflated the trees to find the differences in the path
underneath them, right?  And if they didn't differ, then
show_modified() should not have been invoked?

I can see cases where we wouldn't want to bother looking at the
differences between to sparse directories, e.g. a
--restrict-to-sparsity-paths option to diff/log/etc, but I don't see
you setting this behind an option here.

>         if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ef6a2b1c951c..703b0bdc9dfd 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned unpack_tree = 1;
>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_tree &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (unpack_tree &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;
> --
> gitgitgadget

The unpack-trees.c changes make sense to me still.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-18  2:03       ` Elijah Newren
@ 2021-05-18  2:06         ` Elijah Newren
  2021-05-18 19:20           ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-18  2:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

Sorry, I spoke too soon...

On Mon, May 17, 2021 at 7:03 PM Elijah Newren <newren@gmail.com> wrote:
>
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index ef6a2b1c951c..703b0bdc9dfd 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> >         struct unpack_trees_options *o = info->data;
> >         const struct name_entry *p = names;
> > +       unsigned unpack_tree = 1;

Here, you set unpack_tree to 1.

> >
> >         /* Find first entry with a real name (we could use "mask" too) */
> >         while (!p->mode)
> > @@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >                 }
> >         }
> >
> > -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> > +       if (unpack_tree &&

You check it's value here...

> > +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> >                 return -1;
> >
> >         if (o->merge && src[0]) {
> > @@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
> >                         }
> >                 }
> >
> > -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> > +               if (unpack_tree &&
...and here....

> > +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> >                                              names, info) < 0)
> >                         return -1;
> >                 return mask;

but you never set unpack_tree to 0, so this is wasted effort and you
always recurse.  The previous iteration had a case where it'd set
unpack_tree to 0 in a certain case, but you deleted that code in this
version.  Why?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-18  2:27       ` Elijah Newren
  2021-05-18 18:26         ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-18  2:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> It is difficult, but possible, to get into a state where we intend to
> add a directory that is outside of the sparse-checkout definition. Add a

Then we need to fix that; allowing things to be added outside the
sparse-checkout definition is a bug[1][2].  That's an invariant I
believe we should maintain everywhere; things get really confusing to
users somewhere later down the road if we don't.  Matheus worked to
fix that with 'git add'; if there are other commands that need fixing
too, then we should also fix them.

[1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/

> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> using a combination of 'git reset --mixed' and 'git checkout --orphan'.

I think `git checkout --orphan` should just throw an error if
sparse-checkout is in use.  Allowing adding paths outside the
sparse-checkout set causes too much collateral and deferred confusion
for users.

> This test failed before because the output of 'git status
> --porcelain=v2' would not match on the lines for folder1/:
>
> * The sparse-checkout repo (with a full index) would output each path
>   name that is intended to be added.
>
> * The sparse-index repo would only output that "folder1/" is staged for
>   addition.
>
> The status should report the full list of files to be added, and so this
> sparse-directory entry should be expanded to a full list when reaching
> it inside the wt_status_collect_changes_initial() method. Use
> read_tree_at() to assist.

Having a sparse directory entry whose object_id in the index does not
match HEAD should be an error.  Have a CE_SKIP_WORKTREE non-directory
whose object_id in the index does not match HEAD should also be an
error.  I don't think we should complicate the code to try to handle
violations of those assumptions.  I do think we should add checks to
enforce that constraint (or BUG() if it's violated).

And yeah, that also means 'git sparse-checkout add/set' would need to
error out if paths are requested to be sparsified despite being
different from HEAD.

> Somehow, this loop over the cache entries was not guarded by
> ensure_full_index() as intended.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
>  wt-status.c                              | 50 ++++++++++++++++++++++++
>  2 files changed, 78 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 59faf7381093..cd3669d36b53 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +test_expect_success 'reset mixed and checkout orphan' '
> +       init_repos &&
> +
> +       test_all_match git checkout rename-out-to-in &&
> +       test_all_match git reset --mixed HEAD~1 &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # At this point, sparse-checkouts behave differently
> +       # from the full-checkout.
> +       test_sparse_match git checkout --orphan new-branch &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git status --porcelain=v2
> +'
> +
> +test_expect_success 'add everything with deep new file' '
> +       init_repos &&
> +
> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
> +
> +       run_on_all touch deep/deeper1/x &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  test_done
> diff --git a/wt-status.c b/wt-status.c
> index 0425169c1895..90db8bd659fa 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>         run_diff_index(&rev, 1);
>  }
>
> +static int add_file_to_list(const struct object_id *oid,
> +                           struct strbuf *base, const char *path,
> +                           unsigned int mode, void *context)
> +{
> +       struct string_list_item *it;
> +       struct wt_status_change_data *d;
> +       struct wt_status *s = context;
> +       char *full_name;
> +
> +       if (S_ISDIR(mode))
> +               return READ_TREE_RECURSIVE;
> +
> +       full_name = xstrfmt("%s%s", base->buf, path);
> +       it = string_list_insert(&s->change, full_name);
> +       d = it->util;
> +       if (!d) {
> +               CALLOC_ARRAY(d, 1);
> +               it->util = d;
> +       }
> +
> +       d->index_status = DIFF_STATUS_ADDED;
> +       /* Leave {mode,oid}_head zero for adds. */
> +       d->mode_index = mode;
> +       oidcpy(&d->oid_index, oid);
> +       s->committable = 1;
> +       return 0;
> +}
> +
>  static void wt_status_collect_changes_initial(struct wt_status *s)
>  {
>         struct index_state *istate = s->repo->index;
> @@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
>                         continue;
>                 if (ce_intent_to_add(ce))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode)) {
> +                       /*
> +                        * This is a sparse directory entry, so we want to collect all
> +                        * of the added files within the tree. This requires recursively
> +                        * expanding the trees to find the elements that are new in this
> +                        * tree and marking them with DIFF_STATUS_ADDED.
> +                        */
> +                       struct strbuf base = STRBUF_INIT;
> +                       struct pathspec ps;
> +                       struct tree *tree = lookup_tree(istate->repo, &ce->oid);
> +
> +                       memset(&ps, 0, sizeof(ps));
> +                       ps.recursive = 1;
> +                       ps.has_wildcard = 1;
> +                       ps.max_depth = -1;
> +
> +                       strbuf_add(&base, ce->name, ce->ce_namelen);
> +                       read_tree_at(istate->repo, tree, &base, &ps,
> +                                    add_file_to_list, s);
> +                       continue;
> +               }
> +
>                 it = string_list_insert(&s->change, ce->name);
>                 d = it->util;
>                 if (!d) {
> --
> gitgitgadget

This was a really nice catch that you got this particular testcase.
While I disagree with the fix, I do have to say nice work on the catch
and the implementation otherwise.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-18  1:33       ` Elijah Newren
@ 2021-05-18 14:57         ` Derrick Stolee
  2021-05-18 17:48           ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-05-18 14:57 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 9:33 PM, Elijah Newren wrote:
> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When creating a full index from a sparse one, we create cache entries
>> for every blob within a given sparse directory entry. These are
>> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
>> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
>> correctly written to disk in the case that the index is not converted
>> back down to a sparse-index.
> 
> This seems odd to me.  When sparse-index is not involved and we are
> just doing simple sparse checkouts, do we mark CE_SKIP_WORKTREE
> entries with CE_EXTENDED?  I can't find any code that does so.
> 
> Is it possible that the setting of CE_EXTENDED is just a workaround
> that happens to force the index to be written in cases where the logic
> is otherwise thinking it can get away without one?  Or is there
> something I'm missing about why the CE_EXTENDED flag is actually
> needed here?

This is happening within the context of ensure_full_index(), so we
are creating new cache entries and want to mimic what they would
look like on-disk. Something within do_write_index() discovers that
since CE_SKIP_WORKTREE is set, then also CE_EXTENDED should be set
in order to ensure that the on-disk representation has enough room
for the CE_SKIP_WORKTREE bit.

I suppose this might not have a meaningful purpose other than when
I compare a full index against an expanded sparse-index and check
if their flags match.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 03/12] t1092: expand repository data shape
  2021-05-18  1:49       ` Elijah Newren
@ 2021-05-18 14:59         ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-05-18 14:59 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 9:49 PM, Elijah Newren wrote:
> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Later tests will take advantage of these shapes, but they also deepen
>> the tests that already exist.
> 
> Makes sense.  Do we also want to add ones of the form
> 
>    foo/bar
>    foo.txt
> 
> ?
> 
> Here we'd be particularly looking that if foo is a sparse directory,
> we want to avoid messing up its order.  ('foo' sorts before 'foo.txt',
> but 'foo/' sorts after, and thus 'foo' the directory should be after
> 'foo.txt')

Good idea!

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-18 14:57         ` Derrick Stolee
@ 2021-05-18 17:48           ` Elijah Newren
  2021-05-18 18:16             ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-05-18 17:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Tue, May 18, 2021 at 7:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/17/2021 9:33 PM, Elijah Newren wrote:
> > On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> When creating a full index from a sparse one, we create cache entries
> >> for every blob within a given sparse directory entry. These are
> >> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> >> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> >> correctly written to disk in the case that the index is not converted
> >> back down to a sparse-index.
> >
> > This seems odd to me.  When sparse-index is not involved and we are
> > just doing simple sparse checkouts, do we mark CE_SKIP_WORKTREE
> > entries with CE_EXTENDED?  I can't find any code that does so.
> >
> > Is it possible that the setting of CE_EXTENDED is just a workaround
> > that happens to force the index to be written in cases where the logic
> > is otherwise thinking it can get away without one?  Or is there
> > something I'm missing about why the CE_EXTENDED flag is actually
> > needed here?
>
> This is happening within the context of ensure_full_index(), so we
> are creating new cache entries and want to mimic what they would
> look like on-disk. Something within do_write_index() discovers that
> since CE_SKIP_WORKTREE is set, then also CE_EXTENDED should be set
> in order to ensure that the on-disk representation has enough room
> for the CE_SKIP_WORKTREE bit.

Yeah, I think it's this part:

        /* reduce extended entries if possible */
        cache[i]->ce_flags &= ~CE_EXTENDED;
        if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
            extended++;
            cache[i]->ce_flags |= CE_EXTENDED;
        }

>
> I suppose this might not have a meaningful purpose other than when
> I compare a full index against an expanded sparse-index and check
> if their flags match.

Ah, you're just setting this flag in advance of do_write_index() being
called so that you can compare in memory values and check they match
without doing a write-to-disk-and-read-back cycle.  Makes sense, but
it'd be nice to see this in the commit message.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-18 17:48           ` Elijah Newren
@ 2021-05-18 18:16             ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-05-18 18:16 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 5/18/2021 1:48 PM, Elijah Newren wrote:
> On Tue, May 18, 2021 at 7:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 5/17/2021 9:33 PM, Elijah Newren wrote:
>>> Is it possible that the setting of CE_EXTENDED is just a workaround
>>> that happens to force the index to be written in cases where the logic
>>> is otherwise thinking it can get away without one?  Or is there
>>> something I'm missing about why the CE_EXTENDED flag is actually
>>> needed here?
>>
>> This is happening within the context of ensure_full_index(), so we
>> are creating new cache entries and want to mimic what they would
>> look like on-disk. Something within do_write_index() discovers that
>> since CE_SKIP_WORKTREE is set, then also CE_EXTENDED should be set
>> in order to ensure that the on-disk representation has enough room
>> for the CE_SKIP_WORKTREE bit.
> 
> Yeah, I think it's this part:
> 
>         /* reduce extended entries if possible */
>         cache[i]->ce_flags &= ~CE_EXTENDED;
>         if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
>             extended++;
>             cache[i]->ce_flags |= CE_EXTENDED;
>         }
> 
>>
>> I suppose this might not have a meaningful purpose other than when
>> I compare a full index against an expanded sparse-index and check
>> if their flags match.
> 
> Ah, you're just setting this flag in advance of do_write_index() being
> called so that you can compare in memory values and check they match
> without doing a write-to-disk-and-read-back cycle.  Makes sense, but
> it'd be nice to see this in the commit message.

Will do. Thanks,

-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-18  2:27       ` Elijah Newren
@ 2021-05-18 18:26         ` Derrick Stolee
  2021-05-18 19:04           ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-05-18 18:26 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 10:27 PM, Elijah Newren wrote:
> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> It is difficult, but possible, to get into a state where we intend to
>> add a directory that is outside of the sparse-checkout definition. Add a
> 
> Then we need to fix that; allowing things to be added outside the
> sparse-checkout definition is a bug[1][2].  That's an invariant I
> believe we should maintain everywhere; things get really confusing to
> users somewhere later down the road if we don't.  Matheus worked to
> fix that with 'git add'; if there are other commands that need fixing
> too, then we should also fix them.
> 
> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/
> 
>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
>> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
> 
> I think `git checkout --orphan` should just throw an error if
> sparse-checkout is in use.  Allowing adding paths outside the
> sparse-checkout set causes too much collateral and deferred confusion
> for users.

I've been trying to strike an interesting balance of creating
performance improvements without changing behavior, trying to
defer those behavior changes to an isolated instance. I think
that approach is unavoidable with the 'git add' work that I
pulled out of this series and will return to soon.

However, here I think it would be too much to start throwing
an error in this case. I think that change is a bit too much.

The thing I can try to do, instead of the current approach, is
to not allow sparse directory entries to differ between the
index and HEAD. That will satisfy this case, but also a lot of
other painful cases.

I have no idea how to actually accomplish that, but I'll start
digging.

>> This test failed before because the output of 'git status
>> --porcelain=v2' would not match on the lines for folder1/:
>>
>> * The sparse-checkout repo (with a full index) would output each path
>>   name that is intended to be added.
>>
>> * The sparse-index repo would only output that "folder1/" is staged for
>>   addition.
>>
>> The status should report the full list of files to be added, and so this
>> sparse-directory entry should be expanded to a full list when reaching
>> it inside the wt_status_collect_changes_initial() method. Use
>> read_tree_at() to assist.
> 
> Having a sparse directory entry whose object_id in the index does not
> match HEAD should be an error.

I can get behind this understanding.

>  Have a CE_SKIP_WORKTREE non-directory
> whose object_id in the index does not match HEAD should also be an
> error.

I'm less convinced here. At minimum, I'm not willing to stake
a firm claim and change the behavior around this statement in
the current series.

>  I don't think we should complicate the code to try to handle
> violations of those assumptions.  I do think we should add checks to
> enforce that constraint (or BUG() if it's violated).

A BUG() is likely too strict, because existing Git clients can
get users into this state, and then they upgrade and are suddenly
in a BUG() state. We should perhaps do our best effort to avoid
this case and handle it as appropriately as possible.

> And yeah, that also means 'git sparse-checkout add/set' would need to
> error out if paths are requested to be sparsified despite being
> different from HEAD.

This would be a reasonable thing, assuming the established
behavior is changed.

>> Somehow, this loop over the cache entries was not guarded by
>> ensure_full_index() as intended.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
>>  wt-status.c                              | 50 ++++++++++++++++++++++++
>>  2 files changed, 78 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 59faf7381093..cd3669d36b53 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
>>         test_region ! index ensure_full_index trace2.txt
>>  '
>>
>> +test_expect_success 'reset mixed and checkout orphan' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout rename-out-to-in &&
>> +       test_all_match git reset --mixed HEAD~1 &&
>> +       test_sparse_match test-tool read-cache --table --expand &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git status --porcelain=v2 &&
>> +
>> +       # At this point, sparse-checkouts behave differently
>> +       # from the full-checkout.
>> +       test_sparse_match git checkout --orphan new-branch &&
>> +       test_sparse_match test-tool read-cache --table --expand &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_sparse_match git status --porcelain=v2
>> +'
>> +
>> +test_expect_success 'add everything with deep new file' '
>> +       init_repos &&
>> +
>> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
>> +
>> +       run_on_all touch deep/deeper1/x &&
>> +       test_all_match git add . &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git status --porcelain=v2
>> +'>
> This was a really nice catch that you got this particular testcase.
> While I disagree with the fix, I do have to say nice work on the catch
> and the implementation otherwise.

This test exists almost verbatim in the Scalar and VFS For Git
functional tests. I have no idea what context caused it to be
necessary.

I can understand your aversion to the solution I presented here.
Preventing sparse directory entries that differ from the tree at
HEAD for that path should be more robust to future integrations.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-18 18:26         ` Derrick Stolee
@ 2021-05-18 19:04           ` Derrick Stolee
  2021-05-19  8:38             ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-05-18 19:04 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/18/2021 2:26 PM, Derrick Stolee wrote:
> On 5/17/2021 10:27 PM, Elijah Newren wrote:
>> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> It is difficult, but possible, to get into a state where we intend to
>>> add a directory that is outside of the sparse-checkout definition. Add a
>>
>> Then we need to fix that; allowing things to be added outside the
>> sparse-checkout definition is a bug[1][2].  That's an invariant I
>> believe we should maintain everywhere; things get really confusing to
>> users somewhere later down the road if we don't.  Matheus worked to
>> fix that with 'git add'; if there are other commands that need fixing
>> too, then we should also fix them.
>>
>> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
>> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/
>>
>>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
>>> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
>>
>> I think `git checkout --orphan` should just throw an error if
>> sparse-checkout is in use.  Allowing adding paths outside the
>> sparse-checkout set causes too much collateral and deferred confusion
>> for users.
> 
> I've been trying to strike an interesting balance of creating
> performance improvements without changing behavior, trying to
> defer those behavior changes to an isolated instance. I think
> that approach is unavoidable with the 'git add' work that I
> pulled out of this series and will return to soon.
> 
> However, here I think it would be too much to start throwing
> an error in this case. I think that change is a bit too much.
> 
> The thing I can try to do, instead of the current approach, is
> to not allow sparse directory entries to differ between the
> index and HEAD. That will satisfy this case, but also a lot of
> other painful cases.
> 
> I have no idea how to actually accomplish that, but I'll start
> digging.

It didn't take much digging to discover that this is likely
impossible, or rather it would be a drastic change to make this
happen.

The immediate issue is trying to prevent sparse directory entries
from existing when the contained paths don't match what exists at
HEAD. However, in the 'git checkout --orphan' case, we are using
a full index for the unpack_trees() that updates the in-memory
index according to the paths at HEAD, then updates HEAD to point
to a non-existing ref. The sparse directories are only created as
part of convert_to_sparse() within do_write_index(). At that
point, there is no HEAD provided. Trying to load it from scratch
violates the fact that HEAD is being staged to change _after_ the
index updates in a command like 'git checkout'.

So, the drastic change to make this work would be to update the
index API to require a root tree to be provided whenever writing
the index. However, that doesn't make sense, either! What do we
do when in a conflicted state?

What if a user modifies HEAD manually to point to a new ref?

Such a change would couple the index to the concept of HEAD in
an unproductive way, I think. The index data structure exists
as a separate entity that is frequently _compared_ to HEAD, and
the solution presented in this patch presents a way to keep the
comparison of a sparse-index and HEAD to be the same as if we
had a full index.

So, after looking into it, I'm back in favor of this change and
forever allowing sparse cache entries to differ from HEAD,
because there is no way to avoid it.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories
  2021-05-18  2:06         ` Elijah Newren
@ 2021-05-18 19:20           ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-05-18 19:20 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 5/17/2021 10:06 PM, Elijah Newren wrote:
> Sorry, I spoke too soon...
> 
> On Mon, May 17, 2021 at 7:03 PM Elijah Newren <newren@gmail.com> wrote:
>>
>>> diff --git a/unpack-trees.c b/unpack-trees.c
>>> index ef6a2b1c951c..703b0bdc9dfd 100644
>>> --- a/unpack-trees.c
>>> +++ b/unpack-trees.c
>>> @@ -1261,6 +1261,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>>>         struct unpack_trees_options *o = info->data;
>>>         const struct name_entry *p = names;
>>> +       unsigned unpack_tree = 1;
> 
> Here, you set unpack_tree to 1.
> 
>>>
>>>         /* Find first entry with a real name (we could use "mask" too) */
>>>         while (!p->mode)
>>> @@ -1307,7 +1308,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>>                 }
>>>         }
>>>
>>> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>>> +       if (unpack_tree &&
> 
> You check it's value here...
> 
>>> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>>>                 return -1;
>>>
>>>         if (o->merge && src[0]) {
>>> @@ -1337,7 +1339,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>>                         }
>>>                 }
>>>
>>> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>> +               if (unpack_tree &&
> ...and here....
> 
>>> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>>                                              names, info) < 0)
>>>                         return -1;
>>>                 return mask;
> 
> but you never set unpack_tree to 0, so this is wasted effort and you
> always recurse.  The previous iteration had a case where it'd set
> unpack_tree to 0 in a certain case, but you deleted that code in this
> version.  Why?

It appears that the changes to unpack-trees.c are no longer relevant,
and instead the changes to diff-lib.c (which were already out of place)
should instead be the focus. In fact, those changes to diff-lib.c can
be simplified and moved to path 10, so I will do that.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v3 11/12] wt-status: expand added sparse directory entries
  2021-05-18 19:04           ` Derrick Stolee
@ 2021-05-19  8:38             ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-05-19  8:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Tue, May 18, 2021 at 12:05 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/18/2021 2:26 PM, Derrick Stolee wrote:
> > On 5/17/2021 10:27 PM, Elijah Newren wrote:
> >> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> >> <gitgitgadget@gmail.com> wrote:
> >>>
> >>> From: Derrick Stolee <dstolee@microsoft.com>
> >>>
> >>> It is difficult, but possible, to get into a state where we intend to
> >>> add a directory that is outside of the sparse-checkout definition. Add a
> >>
> >> Then we need to fix that; allowing things to be added outside the
> >> sparse-checkout definition is a bug[1][2].  That's an invariant I
> >> believe we should maintain everywhere; things get really confusing to
> >> users somewhere later down the road if we don't.  Matheus worked to
> >> fix that with 'git add'; if there are other commands that need fixing
> >> too, then we should also fix them.
> >>
> >> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@mail.gmail.com/
> >> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@mail.gmail.com/
> >>
> >>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> >>> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
> >>
> >> I think `git checkout --orphan` should just throw an error if
> >> sparse-checkout is in use.  Allowing adding paths outside the
> >> sparse-checkout set causes too much collateral and deferred confusion
> >> for users.
> >
> > I've been trying to strike an interesting balance of creating
> > performance improvements without changing behavior, trying to
> > defer those behavior changes to an isolated instance. I think
> > that approach is unavoidable with the 'git add' work that I
> > pulled out of this series and will return to soon.
> >
> > However, here I think it would be too much to start throwing
> > an error in this case. I think that change is a bit too much.
> >
> > The thing I can try to do, instead of the current approach, is
> > to not allow sparse directory entries to differ between the
> > index and HEAD. That will satisfy this case, but also a lot of
> > other painful cases.
> >
> > I have no idea how to actually accomplish that, but I'll start
> > digging.
>
> It didn't take much digging to discover that this is likely
> impossible, or rather it would be a drastic change to make this
> happen.
>
> The immediate issue is trying to prevent sparse directory entries
> from existing when the contained paths don't match what exists at
> HEAD. However, in the 'git checkout --orphan' case, we are using
> a full index for the unpack_trees() that updates the in-memory
> index according to the paths at HEAD, then updates HEAD to point
> to a non-existing ref. The sparse directories are only created as
> part of convert_to_sparse() within do_write_index(). At that
> point, there is no HEAD provided. Trying to load it from scratch
> violates the fact that HEAD is being staged to change _after_ the
> index updates in a command like 'git checkout'.
>
> So, the drastic change to make this work would be to update the
> index API to require a root tree to be provided whenever writing
> the index. However, that doesn't make sense, either! What do we
> do when in a conflicted state?
>
> What if a user modifies HEAD manually to point to a new ref?
>
> Such a change would couple the index to the concept of HEAD in
> an unproductive way, I think. The index data structure exists
> as a separate entity that is frequently _compared_ to HEAD, and
> the solution presented in this patch presents a way to keep the
> comparison of a sparse-index and HEAD to be the same as if we
> had a full index.
>
> So, after looking into it, I'm back in favor of this change and
> forever allowing sparse cache entries to differ from HEAD,
> because there is no way to avoid it.

Doh, thanks for digging in and entertaining the idea.  I'm worried
we'll get lots of confused users over the years from not being able to
do this, but you do make some good points.

I still think `git checkout --orphan` should be an error when in a
sparse checkout -- the point of a sparse checkout is that you only
care about a subset of files, whereas checkout --orphan fundamentally
says you are throwing away history but care about each and every file
since you are staging "changes" from all of them to include in some
new commit soon.  They just seem in strong opposition to me, and it
seems likely to result in surprises for some of the users when despite
the --orphan request and them fixing up the working directory how they
like, they get some new commit that contains files that aren't in
their working tree.  (In contrast, `git switch --orphan` would
probably be fine in a sparse checkout, precisely because it really
does empty everything).  However, I do agree with you that such a
change belongs in a separate series.  So, yes, your patch is good, and
I'll raise the behavioral change later.

(Sorry for being slow to respond and still not getting to all your
good reviews of my series; I'm a bit limited in my time for git right
now...)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 00/12] Sparse-index: integrate with status
  2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
                       ` (11 preceding siblings ...)
  2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59     ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                         ` (12 more replies)
  12 siblings, 13 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V4
=============

 * The previous patch "unpack-trees: stop recursing into sparse directories"
   was confusing, and actually a bit sloppy.
 * It has been replaced with "unpack-trees: be careful around sparse
   directory entries" which takes the sparse-directory checks and raises
   them higher up into unpack_trees.c instead of in diff-lib.c.


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature as they were inherited from special cases that arose
in the virtualized environment of VFS for Git. This version contains my
fixes based on that investigation. Most of these were easy to identify and
fix, but I was blocked for a long time struggling with a bug when combining
the sparse-index with the builtin FS Monitor feature, but I've reported my
findings already [1].

[1]
https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status, such as when a
     sparse-directory is staged but doesn't exist at HEAD (such as in an
     orphaned commit) we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare so the FS Monitor feature can work
     efficiently. Updates in V2

----------------------------------------------------------------------------

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (12):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: be careful around sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 dir.c                                    |  11 +++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++++-
 t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++++++
 unpack-trees.c                           |  26 ++++-
 wt-status.c                              |  64 ++++++++++++-
 wt-status.h                              |   1 +
 9 files changed, 295 insertions(+), 12 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v3:

  1:  5a2ed3d1d701 =  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  2:  8aa41e749471 =  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  3:  70971b1f9261 =  3:  70971b1f9261 t1092: expand repository data shape
  4:  a80b5a41153f =  4:  a80b5a41153f t1092: add tests for status/add and sparse files
  5:  07a45b661c4a =  5:  07a45b661c4a unpack-trees: preserve cache_bottom
  6:  cc4a526e7947 =  6:  cc4a526e7947 unpack-trees: compare sparse directories correctly
  7:  598375d3531f <  -:  ------------ unpack-trees: stop recursing into sparse directories
  -:  ------------ >  7:  e28df7f9395d unpack-trees: be careful around sparse directory entries
  8:  47da2b317237 =  8:  2cc3a93d4434 dir.c: accept a directory as part of cone-mode patterns
  9:  bc1512981493 =  9:  5011feb1aa04 status: skip sparse-checkout percentage with sparse-index
 10:  5b1ae369a7cd = 10:  9f2ce5301dc9 status: use sparse-index throughout
 11:  3b42783d4a86 = 11:  24417e095243 wt-status: expand added sparse directory entries
 12:  b72507f514d1 = 12:  584d4b559a91 fsmonitor: integrate with sparse index

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                         ` (11 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                         ` (10 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 03/12] t1092: expand repository data shape
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                         ` (9 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition who have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..98257695979a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,16 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 04/12] t1092: add tests for status/add and sparse files
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                         ` (8 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 98257695979a..fba98d5484ae 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -238,6 +238,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 05/12] unpack-trees: preserve cache_bottom
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                         ` (7 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 06/12] unpack-trees: compare sparse directories correctly
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a dirctory.

Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-28 11:36         ` Derrick Stolee
  2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                         ` (5 subsequent siblings)
  12 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The methods traverse_by_cache_tree() and unpack_nondirectories() have
similar behavior in trying to demonstrate the difference between and
index and a tree, with some differences about how they walk the index.

Each of these is expecting every cache entry to correspond to a file
path. We need to skip over the sparse directory entries in the case of a
sparse-index. Those entries are discovered in the portion that looks for
subtrees among the cache entries by scanning the paths for slashes.

Skipping these sparse directory entries will have a measurable effect
when we relax 'git status' to work with sparse-indexes: without this
change these methods would call call_unpack_fn() which in turn calls
oneway_diff() and then shows these sparse directory entries as added or
modified files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..22634d98e72b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -802,6 +802,9 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 
 		src[0] = o->src_index->cache[pos + i];
 
+		if (S_ISSPARSEDIR(src[0]->ce_mode))
+			continue;
+
 		len = ce_namelen(src[0]);
 		new_ce_len = cache_entry_size(len);
 
@@ -1074,6 +1077,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))
+		return 0;
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fba98d5484ae..34dae7fbcadd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -202,6 +202,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 10/12] status: use sparse-index throughout
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (8 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 34dae7fbcadd..59faf7381093 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,12 +479,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 11/12] wt-status: expand added sparse directory entries
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (9 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 28 +++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 59faf7381093..cd3669d36b53 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -492,4 +492,32 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+	test_all_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v4 12/12] fsmonitor: integrate with sparse index
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (10 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-21 11:59       ` Derrick Stolee via GitGitGadget
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-21 11:59 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries
  2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-05-28 11:36         ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-05-28 11:36 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 5/21/2021 7:59 AM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The methods traverse_by_cache_tree() and unpack_nondirectories() have
> similar behavior in trying to demonstrate the difference between and
> index and a tree, with some differences about how they walk the index.

As I have been working on further sparse-index integrations,
specifically with 'git checkout', I have found an issue with this
patch that doesn't show itself in the current t1092 test script,
but appears later as more complicated scenarios appear.

I am pursuing the correct fix (that will also make 'git checkout'
work better) but it might be a week or two before I can send a v5
with that fix.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 00/14] Sparse-index: integrate with status
  2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                         ` (11 preceding siblings ...)
  2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2021-06-07 12:33       ` Derrick Stolee via GitGitGadget
  2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
                           ` (13 more replies)
  12 siblings, 14 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:33 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V5
=============

I replaced one patch with a few that are more complicated. The reason is
that I started integrating with git checkout and realized that some of the
changes I was making in unpack-trees.c were incorrect for that situation, so
I might as well do them right here. The tests can't demonstrate the bugs
with the previous case until we integrate with git checkout, which will
follow in another series after this one is submitted.

For testing, I've integrated this series along with extensions that work for
git commit and git checkout into the Scalar functional tests, which test
many scenarios with cone mode sparse-checkout and hence provides good
evidence that this is working correctly.


Updates in V4
=============

 * The previous patch "unpack-trees: stop recursing into sparse directories"
   was confusing, and actually a bit sloppy.
 * It has been replaced with "unpack-trees: be careful around sparse
   directory entries" which takes the sparse-directory checks and raises
   them higher up into unpack_trees.c instead of in diff-lib.c.


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature as they were inherited from special cases that arose
in the virtualized environment of VFS for Git. This version contains my
fixes based on that investigation. Most of these were easy to identify and
fix, but I was blocked for a long time struggling with a bug when combining
the sparse-index with the builtin FS Monitor feature, but I've reported my
findings already [1].

[1]
https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status, such as when a
     sparse-directory is staged but doesn't exist at HEAD (such as in an
     orphaned commit) we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare so the FS Monitor feature can work
     efficiently. Updates in V2

----------------------------------------------------------------------------

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (14):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: unpack sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               | 188 +++++++++++++++++++++++
 dir.c                                    |  11 ++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 158 ++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++
 unpack-trees.c                           | 121 +++++++++++++--
 wt-status.c                              |  64 +++++++-
 wt-status.h                              |   1 +
 10 files changed, 607 insertions(+), 24 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v4:

  1:  5a2ed3d1d701 =  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  2:  8aa41e749471 =  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  -:  ------------ >  3:  b99371c7dd61 t1092: replace incorrect 'echo' with 'cat'
  3:  70971b1f9261 !  4:  f4dddac1859e t1092: expand repository data shape
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
      +		mkdir deep/deeper1/0/0 &&
      +		touch deep/deeper1/0/1 &&
      +		touch deep/deeper1/0/0/0 &&
     ++		cp -r deep/deeper1/0 folder1 &&
     ++		cp -r deep/deeper1/0 folder2 &&
     ++		echo >>folder1/0/0/0 &&
     ++		echo >>folder2/0/1 &&
       		git add . &&
       		git commit -m "initial commit" &&
       		git checkout -b base &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
     + 		mv folder1/a folder2/b &&
     + 		mv folder1/larger-content folder2/edited-content &&
     + 		echo >>folder2/edited-content &&
     ++		echo >>folder2/0/1 &&
     ++		echo stuff >>deep/deeper1/a &&
     + 		git add . &&
     + 		git commit -m "rename folder1/... to folder2/..." &&
     + 
     + 		git checkout -b rename-out-to-in rename-base &&
     + 		mv folder1/a deep/deeper1/b &&
     ++		echo more stuff >>deep/deeper1/a &&
     ++		rm folder2/0/1 &&
     ++		mkdir folder2/0/1 &&
     ++		echo >>folder2/0/1/1 &&
     + 		mv folder1/larger-content deep/deeper1/edited-content &&
     + 		echo >>deep/deeper1/edited-content &&
     + 		git add . &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
     + 
     + 		git checkout -b rename-in-to-out rename-base &&
     + 		mv deep/deeper1/a folder1/b &&
     ++		echo >>folder2/0/1 &&
     ++		rm -rf folder1/0/0 &&
     ++		echo >>folder1/0/0 &&
     + 		mv deep/deeper1/larger-content folder1/edited-content &&
     + 		echo >>folder1/edited-content &&
     + 		git add . &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
     + 	test_all_match git diff --staged
     + '
     + 
     +-test_expect_success 'diff with renames' '
     ++test_expect_success 'diff with renames and conflicts' '
     + 	init_repos &&
     + 
     + 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
     + 	do
     + 		test_all_match git checkout rename-base &&
     + 		test_all_match git checkout $branch -- .&&
     ++		test_all_match git status --porcelain=v2 &&
     ++		test_all_match git diff --staged --no-renames &&
     ++		test_all_match git diff --staged --find-renames || return 1
     ++	done
     ++'
     ++
     ++test_expect_success 'diff with directory/file conflicts' '
     ++	init_repos &&
     ++
     ++	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
     ++	do
     ++		git -C full-checkout reset --hard &&
     ++		test_sparse_match git reset --hard &&
     ++		test_all_match git checkout $branch &&
     ++		test_all_match git checkout rename-base -- . &&
     ++		test_all_match git status --porcelain=v2 &&
     + 		test_all_match git diff --staged --no-renames &&
     + 		test_all_match git diff --staged --find-renames || return 1
     + 	done
  4:  a80b5a41153f =  5:  856346b72f79 t1092: add tests for status/add and sparse files
  5:  07a45b661c4a =  6:  f3f6223e955f unpack-trees: preserve cache_bottom
  6:  cc4a526e7947 =  7:  45ae96adf285 unpack-trees: compare sparse directories correctly
  7:  e28df7f9395d !  8:  724194eef9f6 unpack-trees: be careful around sparse directory entries
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    unpack-trees: be careful around sparse directory entries
     +    unpack-trees: unpack sparse directory entries
      
     -    The methods traverse_by_cache_tree() and unpack_nondirectories() have
     -    similar behavior in trying to demonstrate the difference between and
     -    index and a tree, with some differences about how they walk the index.
     +    During unpack_callback(), index entries are compared against tree
     +    entries. These are matched according to names and types. One goal is to
     +    decide if we should recurse into subtrees or simply operate on one index
     +    entry.
      
     -    Each of these is expecting every cache entry to correspond to a file
     -    path. We need to skip over the sparse directory entries in the case of a
     -    sparse-index. Those entries are discovered in the portion that looks for
     -    subtrees among the cache entries by scanning the paths for slashes.
     +    In the case of a sparse-directory entry, we do not want to recurse into
     +    that subtree and instead simply compare the trees. In some cases, we
     +    might want to perform a merge operation on the entry, such as during
     +    'git checkout <commit>' which wants to replace a sparse tree entry with
     +    the tree for that path at the target commit. We extend the logic within
     +    unpack_nondirectories() to create a sparse-directory entry in this case,
     +    and then that is sent to call_unpack_fn().
      
     -    Skipping these sparse directory entries will have a measurable effect
     -    when we relax 'git status' to work with sparse-indexes: without this
     -    change these methods would call call_unpack_fn() which in turn calls
     -    oneway_diff() and then shows these sparse directory entries as added or
     -    modified files.
     +    There are some subtleties in this process. For instance, we need to
     +    update find_cache_entry() to allow finding a sparse-directory entry that
     +    exactly matches a given path.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
     +@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
     + 	const struct name_entry *n,
     + 	int stage,
     + 	struct index_state *istate,
     +-	int is_transient)
     ++	int is_transient,
     ++	int is_sparse_directory)
     + {
     + 	size_t len = traverse_path_len(info, tree_entry_len(n));
     ++	size_t alloc_len = is_sparse_directory ? len + 1 : len;
     + 	struct cache_entry *ce =
     + 		is_transient ?
     +-		make_empty_transient_cache_entry(len) :
     +-		make_empty_cache_entry(istate, len);
     ++		make_empty_transient_cache_entry(alloc_len) :
     ++		make_empty_cache_entry(istate, alloc_len);
       
     - 		src[0] = o->src_index->cache[pos + i];
     + 	ce->ce_mode = create_ce_mode(n->mode);
     + 	ce->ce_flags = create_ce_flags(stage);
     +@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
     + 	/* len+1 because the cache_entry allocates space for NUL */
     + 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
       
     -+		if (S_ISSPARSEDIR(src[0]->ce_mode))
     -+			continue;
     ++	if (is_sparse_directory) {
     ++		ce->name[len] = '/';
     ++		ce->name[len + 1] = 0;
     ++		ce->ce_namelen++;
     ++		ce->ce_flags |= CE_SKIP_WORKTREE;
     ++	}
      +
     - 		len = ce_namelen(src[0]);
     - 		new_ce_len = cache_entry_size(len);
     + 	return ce;
     + }
       
      @@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
     + 				 unsigned long dirmask,
     + 				 struct cache_entry **src,
     + 				 const struct name_entry *names,
     +-				 const struct traverse_info *info)
     ++				 const struct traverse_info *info,
     ++				 int sparse_directory)
     + {
     + 	int i;
     + 	struct unpack_trees_options *o = info->data;
     + 	unsigned long conflicts = info->df_conflicts | dirmask;
     + 
     +-	/* Do we have *only* directories? Nothing to do */
       	if (mask == dirmask && !src[0])
       		return 0;
       
     -+	if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))
     ++	/* no-op if our cache entry doesn't match the expectations. */
     ++	if (sparse_directory) {
     ++		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
     ++			BUG("expected sparse directory entry");
     ++	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
      +		return 0;
     ++	}
      +
       	/*
       	 * Ok, we've filled in up to any potential index entry in src[0],
       	 * now do the rest.
     +@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
     + 		 * not stored in the index.  otherwise construct the
     + 		 * cache entry from the index aware logic.
     + 		 */
     +-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
     ++		src[i + o->merge] = create_ce_entry(info, names + i, stage,
     ++						    &o->result, o->merge,
     ++						    sparse_directory);
     + 	}
     + 
     + 	if (o->merge) {
     +@@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
     + static struct cache_entry *find_cache_entry(struct traverse_info *info,
     + 					    const struct name_entry *p)
     + {
     ++	struct cache_entry *ce;
     + 	int pos = find_cache_pos(info, p->path, p->pathlen);
     + 	struct unpack_trees_options *o = info->data;
     + 
     + 	if (0 <= pos)
     + 		return o->src_index->cache[pos];
     +-	else
     ++
     ++	/*
     ++	 * Check for a sparse-directory entry named "path/".
     ++	 * Due to the input p->path not having a trailing
     ++	 * slash, the negative 'pos' value overshoots the
     ++	 * expected position by one, hence "-2" here.
     ++	 */
     ++	pos = -pos - 2;
     ++
     ++	if (pos < 0 || pos >= o->src_index->cache_nr)
     ++		return NULL;
     ++
     ++	ce = o->src_index->cache[pos];
     ++
     ++	if (!S_ISSPARSEDIR(ce->ce_mode))
     + 		return NULL;
     ++
     ++	/*
     ++	 * Compare ce->name to info->name + '/' + p->path + '/'
     ++	 * if info->name is non-empty. Compare ce->name to
     ++	 * p-.path + '/' otherwise.
     ++	 */
     ++	if (info->namelen) {
     ++		if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
     ++		    ce->name[info->namelen] == '/' &&
     ++		    !strncmp(ce->name, info->name, info->namelen) &&
     ++		    !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
     ++			return ce;
     ++	} else if (ce->ce_namelen == p->pathlen + 1 &&
     ++		   !strncmp(ce->name, p->path, p->pathlen))
     ++		return ce;
     ++	return NULL;
     + }
     + 
     + static void debug_path(struct traverse_info *info)
     +@@ unpack-trees.c: static void debug_unpack_callback(int n,
     + 		debug_name_entry(i, names + i);
     + }
     + 
     ++/*
     ++ * Returns true if and only if the given cache_entry is a
     ++ * sparse-directory entry that matches the given name_entry
     ++ * from the tree walk at the given traverse_info.
     ++ */
     ++static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
     ++{
     ++	size_t expected_len, name_start;
     ++
     ++	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
     ++		return 0;
     ++
     ++	if (info->namelen)
     ++		name_start = info->namelen + 1;
     ++	else
     ++		name_start = 0;
     ++	expected_len = name->pathlen + 1 + name_start;
     ++
     ++	if (ce->ce_namelen != expected_len ||
     ++	    strncmp(ce->name, info->name, info->namelen) ||
     ++	    strncmp(ce->name + name_start, name->path, name->pathlen))
     ++		return 0;
     ++
     ++	return 1;
     ++}
     ++
     + /*
     +  * Note that traverse_by_cache_tree() duplicates some logic in this function
     +  * without actually calling it. If you change the logic here you may need to
     +@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     + 		}
     + 	}
     + 
     +-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     ++	if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
     + 		return -1;
     + 
     + 	if (o->merge && src[0]) {
     +@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     + 			}
     + 		}
     + 
     +-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     +-					     names, info) < 0)
     ++		if (is_sparse_directory_entry(src[0], names, info)) {
     ++			if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
     ++				return -1;
     ++		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     ++						    names, info) < 0) {
     + 			return -1;
     ++		}
     ++
     + 		return mask;
     + 	}
     + 
  8:  2cc3a93d4434 =  9:  b8ff179f43e3 dir.c: accept a directory as part of cone-mode patterns
  -:  ------------ > 10:  b9b97e011293 diff-lib: handle index diffs with sparse dirs
  9:  5011feb1aa04 = 11:  611b9f61fb2c status: skip sparse-checkout percentage with sparse-index
 10:  9f2ce5301dc9 = 12:  0c0a765dde80 status: use sparse-index throughout
 11:  24417e095243 ! 13:  02f2c7b63982 wt-status: expand added sparse directory entries
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	init_repos &&
      +
      +	test_all_match git checkout rename-out-to-in &&
     -+	test_all_match git reset --mixed HEAD~1 &&
     ++
     ++	# Sparse checkouts do not agree with full checkouts about
     ++	# how to report a directory/file conflict during a reset.
     ++	# This command would fail with test_all_match because the
     ++	# full checkout reports "T folder1/0/1" while a sparse
     ++	# checkout reports "D folder1/0/1". This matches because
     ++	# the sparse checkouts skip "adding" the other side of
     ++	# the conflict.
     ++	test_sparse_match git reset --mixed HEAD~1 &&
      +	test_sparse_match test-tool read-cache --table --expand &&
     -+	test_all_match git status --porcelain=v2 &&
     -+	test_all_match git status --porcelain=v2 &&
     ++	test_sparse_match git status --porcelain=v2 &&
     ++	test_sparse_match git status --porcelain=v2 &&
      +
      +	# At this point, sparse-checkouts behave differently
      +	# from the full-checkout.
 12:  584d4b559a91 = 14:  46ca150c3548 fsmonitor: integrate with sparse index

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
@ 2021-06-07 12:33         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
                           ` (12 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:33 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.

However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.

Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.

It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c                           | 18 ++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 6f21397e2ee0..1b49898d0cb7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -125,6 +125,17 @@ int set_sparse_index_config(struct repository *repo, int enable)
 	return res;
 }
 
+static int index_has_unmerged_entries(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (ce_stage(istate->cache[i]))
+			return 1;
+	}
+
+	return 0;
+}
+
 int convert_to_sparse(struct index_state *istate)
 {
 	int test_env;
@@ -161,6 +172,13 @@ int convert_to_sparse(struct index_state *istate)
 		return -1;
 	}
 
+	/*
+	 * NEEDSWORK: If we have unmerged entries, then stay full.
+	 * Unmerged entries prevent the cache-tree extension from working.
+	 */
+	if (index_has_unmerged_entries(istate))
+		return 0;
+
 	if (cache_tree_update(istate, 0)) {
 		warning(_("unable to update cache-tree, staying full"));
 		return -1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 12e6c453024f..4f2f09b53a32 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -352,6 +352,28 @@ test_expect_success 'merge with outside renames' '
 	done
 '
 
+# Sparse-index fails to convert the index in the
+# final 'git cherry-pick' command.
+test_expect_success 'cherry-pick with conflicts' '
+	init_repos &&
+
+	write_script edit-conflict <<-\EOF &&
+	echo $1 >conflict
+	EOF
+
+	test_all_match git checkout -b to-cherry-pick &&
+	run_on_all ../edit-conflict ABC &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict to pick" &&
+
+	test_all_match git checkout -B base HEAD~1 &&
+	run_on_all ../edit-conflict DEF &&
+	test_all_match git add conflict &&
+	test_all_match git commit -m "conflict in base" &&
+
+	test_all_match test_must_fail git cherry-pick to-cherry-pick
+'
+
 test_expect_success 'clean' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-08 18:56           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
                           ` (11 subsequent siblings)
  13 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
correctly written to disk in the case that the index is not converted
back down to a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1b49898d0cb7..b2b3fbd75050 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
 	strbuf_addstr(base, path);
 
 	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
-	ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
 	set_index_entry(istate, istate->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat'
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
  2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-08 19:18           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
                           ` (10 subsequent siblings)
  13 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This fixes the test data shape to be as expected, allowing rename
detection to work properly now that the 'larger-conent' file actually
has meaningful lines.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4f2f09b53a32..d55478a1902b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -40,7 +40,7 @@ test_expect_success 'setup' '
 		done &&
 
 		git checkout -b rename-base base &&
-		echo >folder1/larger-content <<-\EOF &&
+		cat >folder1/larger-content <<-\EOF &&
 		matching
 		lines
 		help
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 04/14] t1092: expand repository data shape
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
                           ` (9 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.

Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:

* Add sparse-directory entries on both sides of directories within the
  sparse-checkout definition.

* Add directories outside the sparse-checkout definition who have only
  one entry and are the first entry of a directory with multiple
  entries.

Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 39 ++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index d55478a1902b..014a507d8b06 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -17,7 +17,7 @@ test_expect_success 'setup' '
 		echo "after folder1" >g &&
 		echo "after x" >z &&
 		mkdir folder1 folder2 deep x &&
-		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
@@ -25,10 +25,20 @@ test_expect_success 'setup' '
 		cp a folder2 &&
 		cp a x &&
 		cp a deep &&
+		cp a deep/before &&
 		cp a deep/deeper1 &&
 		cp a deep/deeper2 &&
+		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
 		cp -r deep/deeper1/deepest deep/deeper2 &&
+		mkdir deep/deeper1/0 &&
+		mkdir deep/deeper1/0/0 &&
+		touch deep/deeper1/0/1 &&
+		touch deep/deeper1/0/0/0 &&
+		cp -r deep/deeper1/0 folder1 &&
+		cp -r deep/deeper1/0 folder2 &&
+		echo >>folder1/0/0/0 &&
+		echo >>folder2/0/1 &&
 		git add . &&
 		git commit -m "initial commit" &&
 		git checkout -b base &&
@@ -56,11 +66,17 @@ test_expect_success 'setup' '
 		mv folder1/a folder2/b &&
 		mv folder1/larger-content folder2/edited-content &&
 		echo >>folder2/edited-content &&
+		echo >>folder2/0/1 &&
+		echo stuff >>deep/deeper1/a &&
 		git add . &&
 		git commit -m "rename folder1/... to folder2/..." &&
 
 		git checkout -b rename-out-to-in rename-base &&
 		mv folder1/a deep/deeper1/b &&
+		echo more stuff >>deep/deeper1/a &&
+		rm folder2/0/1 &&
+		mkdir folder2/0/1 &&
+		echo >>folder2/0/1/1 &&
 		mv folder1/larger-content deep/deeper1/edited-content &&
 		echo >>deep/deeper1/edited-content &&
 		git add . &&
@@ -68,6 +84,9 @@ test_expect_success 'setup' '
 
 		git checkout -b rename-in-to-out rename-base &&
 		mv deep/deeper1/a folder1/b &&
+		echo >>folder2/0/1 &&
+		rm -rf folder1/0/0 &&
+		echo >>folder1/0/0 &&
 		mv deep/deeper1/larger-content folder1/edited-content &&
 		echo >>folder1/edited-content &&
 		git add . &&
@@ -262,13 +281,29 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
-test_expect_success 'diff with renames' '
+test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
 	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- .&&
+		test_all_match git status --porcelain=v2 &&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'diff with directory/file conflicts' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		git -C full-checkout reset --hard &&
+		test_sparse_match git reset --hard &&
+		test_all_match git checkout $branch &&
+		test_all_match git checkout rename-base -- . &&
+		test_all_match git status --porcelain=v2 &&
 		test_all_match git diff --staged --no-renames &&
 		test_all_match git diff --staged --find-renames || return 1
 	done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 05/14] t1092: add tests for status/add and sparse files
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (3 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
                           ` (8 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.

Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.

Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 014a507d8b06..851a83388e4b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -251,6 +251,44 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_success 'status/add: outside sparse cone' '
+	init_repos &&
+
+	# adding a "missing" file outside the cone should fail
+	test_sparse_match test_must_fail git add folder1/a &&
+
+	# folder1 is at HEAD, but outside the sparse cone
+	run_on_sparse mkdir folder1 &&
+	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
+	cp initial-repo/folder1/a sparse-index/folder1/a &&
+
+	test_sparse_match git status &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	test_sparse_match git status --porcelain=v2 &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/new &&
+
+	run_on_all ../edit-contents folder1/newer &&
+	test_all_match git add folder1/ &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m folder1/newer
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 06/14] unpack-trees: preserve cache_bottom
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (4 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
                           ` (7 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.

The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index dddf106d5bd4..1067db19c9d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
+	/*
+	 * If this is a sparse directory, don't advance cache_bottom.
+	 * That will be advanced later using the cache-tree data.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return;
+
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 07/14] unpack-trees: compare sparse directories correctly
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (5 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
                           ` (6 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.

Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a dirctory.

Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 1067db19c9d2..ef6a2b1c951c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -969,6 +969,7 @@ static int do_compare_entry(const struct cache_entry *ce,
 	int pathlen, ce_len;
 	const char *ce_name;
 	int cmp;
+	unsigned ce_mode;
 
 	/*
 	 * If we have not precomputed the traverse path, it is quicker
@@ -991,7 +992,8 @@ static int do_compare_entry(const struct cache_entry *ce,
 	ce_len -= pathlen;
 	ce_name = ce->name + pathlen;
 
-	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
+	ce_mode = S_ISSPARSEDIR(ce->ce_mode) ? S_IFDIR : S_IFREG;
+	return df_name_compare(ce_name, ce_len, ce_mode, name, namelen, mode);
 }
 
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
@@ -1000,6 +1002,15 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
 	if (cmp)
 		return cmp;
 
+	/*
+	 * At this point, we know that we have a prefix match. If ce
+	 * is a sparse directory, then allow an exact match. This only
+	 * works when the input name is a directory, since ce->name
+	 * ends in a directory separator.
+	 */
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
 	/*
 	 * Even if the beginning compared identically, the ce should
 	 * compare as bigger than a directory leading up to it!
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 08/14] unpack-trees: unpack sparse directory entries
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (6 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-09  3:48           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
                           ` (5 subsequent siblings)
  13 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.

In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_nondirectories() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().

There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 101 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 91 insertions(+), 10 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index ef6a2b1c951c..ff448ee8424e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1037,13 +1037,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	const struct name_entry *n,
 	int stage,
 	struct index_state *istate,
-	int is_transient)
+	int is_transient,
+	int is_sparse_directory)
 {
 	size_t len = traverse_path_len(info, tree_entry_len(n));
+	size_t alloc_len = is_sparse_directory ? len + 1 : len;
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len) :
-		make_empty_cache_entry(istate, len);
+		make_empty_transient_cache_entry(alloc_len) :
+		make_empty_cache_entry(istate, alloc_len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
@@ -1052,6 +1054,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	/* len+1 because the cache_entry allocates space for NUL */
 	make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
 
+	if (is_sparse_directory) {
+		ce->name[len] = '/';
+		ce->name[len + 1] = 0;
+		ce->ce_namelen++;
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	}
+
 	return ce;
 }
 
@@ -1064,16 +1073,24 @@ static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
 				 const struct name_entry *names,
-				 const struct traverse_info *info)
+				 const struct traverse_info *info,
+				 int sparse_directory)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
 
-	/* Do we have *only* directories? Nothing to do */
 	if (mask == dirmask && !src[0])
 		return 0;
 
+	/* no-op if our cache entry doesn't match the expectations. */
+	if (sparse_directory) {
+		if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
+			BUG("expected sparse directory entry");
+	} else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
+		return 0;
+	}
+
 	/*
 	 * Ok, we've filled in up to any potential index entry in src[0],
 	 * now do the rest.
@@ -1103,7 +1120,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
 		 * not stored in the index.  otherwise construct the
 		 * cache entry from the index aware logic.
 		 */
-		src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
+		src[i + o->merge] = create_ce_entry(info, names + i, stage,
+						    &o->result, o->merge,
+						    sparse_directory);
 	}
 
 	if (o->merge) {
@@ -1210,13 +1229,44 @@ static int find_cache_pos(struct traverse_info *info,
 static struct cache_entry *find_cache_entry(struct traverse_info *info,
 					    const struct name_entry *p)
 {
+	struct cache_entry *ce;
 	int pos = find_cache_pos(info, p->path, p->pathlen);
 	struct unpack_trees_options *o = info->data;
 
 	if (0 <= pos)
 		return o->src_index->cache[pos];
-	else
+
+	/*
+	 * Check for a sparse-directory entry named "path/".
+	 * Due to the input p->path not having a trailing
+	 * slash, the negative 'pos' value overshoots the
+	 * expected position by one, hence "-2" here.
+	 */
+	pos = -pos - 2;
+
+	if (pos < 0 || pos >= o->src_index->cache_nr)
+		return NULL;
+
+	ce = o->src_index->cache[pos];
+
+	if (!S_ISSPARSEDIR(ce->ce_mode))
 		return NULL;
+
+	/*
+	 * Compare ce->name to info->name + '/' + p->path + '/'
+	 * if info->name is non-empty. Compare ce->name to
+	 * p-.path + '/' otherwise.
+	 */
+	if (info->namelen) {
+		if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
+		    ce->name[info->namelen] == '/' &&
+		    !strncmp(ce->name, info->name, info->namelen) &&
+		    !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
+			return ce;
+	} else if (ce->ce_namelen == p->pathlen + 1 &&
+		   !strncmp(ce->name, p->path, p->pathlen))
+		return ce;
+	return NULL;
 }
 
 static void debug_path(struct traverse_info *info)
@@ -1251,6 +1301,32 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Returns true if and only if the given cache_entry is a
+ * sparse-directory entry that matches the given name_entry
+ * from the tree walk at the given traverse_info.
+ */
+static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
+{
+	size_t expected_len, name_start;
+
+	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+
+	if (info->namelen)
+		name_start = info->namelen + 1;
+	else
+		name_start = 0;
+	expected_len = name->pathlen + 1 + name_start;
+
+	if (ce->ce_namelen != expected_len ||
+	    strncmp(ce->name, info->name, info->namelen) ||
+	    strncmp(ce->name + name_start, name->path, name->pathlen))
+		return 0;
+
+	return 1;
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1307,7 +1383,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1337,9 +1413,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
-					     names, info) < 0)
+		if (is_sparse_directory_entry(src[0], names, info)) {
+			if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
+				return -1;
+		} else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
+						    names, info) < 0) {
 			return -1;
+		}
+
 		return mask;
 	}
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (7 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
                           ` (4 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index 166238e79f52..ab76ef286495 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,17 @@ enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/*
+	 * Directory entries are matched if and only if a file
+	 * contained immediately within them is matched. For the
+	 * case of a directory entry, modify the path to create
+	 * a fake filename within this directory, allowing us to
+	 * use the file-base matching logic in an equivalent way.
+	 */
+	if (parent_pathname.len > 0 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (8 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 15:26           ` Derrick Stolee
  2021-06-09  5:47           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
                           ` (3 subsequent siblings)
  13 siblings, 2 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

In the case where a tree is modified, we need to expand the tree
recursively, and start comparing each contained entry as either an
addition, deletion, or modification. This causes an interesting
recursion that did not exist before.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..ba4c683d4bc4 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -314,6 +314,48 @@ static int get_stat_data(const struct cache_entry *ce,
 	return 0;
 }
 
+struct show_new_tree_context {
+	struct rev_info *revs;
+	unsigned added:1;
+};
+
+static int show_new_file_from_tree(const struct object_id *oid,
+				   struct strbuf *base, const char *path,
+				   unsigned int mode, void *context)
+{
+	struct show_new_tree_context *ctx = context;
+	struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0);
+
+	diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0);
+	discard_cache_entry(new_file);
+	return 0;
+}
+
+static void show_directory(struct rev_info *revs,
+			   const struct cache_entry *new_dir,
+			   int added)
+{
+	/*
+	 * new_dir is a sparse directory entry, so we want to collect all
+	 * of the new files within the tree. This requires recursively
+	 * expanding the trees.
+	 */
+	struct show_new_tree_context ctx = { revs, added };
+	struct repository *r = revs->repo;
+	struct strbuf base = STRBUF_INIT;
+	struct pathspec ps;
+	struct tree *tree = lookup_tree(r, &new_dir->oid);
+
+	memset(&ps, 0, sizeof(ps));
+	ps.recursive = 1;
+	ps.has_wildcard = 1;
+	ps.max_depth = -1;
+
+	strbuf_add(&base, new_dir->name, new_dir->ce_namelen);
+	read_tree_at(r, tree, &base, &ps,
+			show_new_file_from_tree, &ctx);
+}
+
 static void show_new_file(struct rev_info *revs,
 			  const struct cache_entry *new_file,
 			  int cached, int match_missing)
@@ -322,6 +364,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned int mode;
 	unsigned dirty_submodule = 0;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		show_directory(revs, new_file, /*added */ 1);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -333,6 +380,136 @@ static void show_new_file(struct rev_info *revs,
 	diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
 }
 
+static int show_modified(struct rev_info *revs,
+			 const struct cache_entry *old_entry,
+			 const struct cache_entry *new_entry,
+			 int report_missing,
+			 int cached, int match_missing);
+
+static int compare_within_sparse_dir(int n, unsigned long mask,
+				     unsigned long dirmask, struct name_entry *entry,
+				     struct traverse_info *info)
+{
+	struct rev_info *revs = info->data;
+	struct object_id *oid0 = &entry[0].oid;
+	struct object_id *oid1 = &entry[1].oid;
+
+	if (oideq(oid0, oid1))
+		return mask;
+
+	/* Directory/file conflicts are handled earlier. */
+	if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) {
+		struct tree_desc t[2];
+		void *buf[2];
+		struct traverse_info info_r = { NULL, };
+
+		info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path);
+		info_r.namelen = strlen(info_r.name);
+		info_r.traverse_path = xstrfmt("%s/", info_r.name);
+		info_r.fn = compare_within_sparse_dir;
+		info_r.prev = info;
+		info_r.mode = entry[0].mode;
+		info_r.pathlen = entry[0].pathlen;
+		info_r.df_conflicts = 0;
+		info_r.data = revs;
+
+		buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0);
+		buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1);
+
+		traverse_trees(NULL, 2, t, &info_r);
+
+		free((char *)info_r.name);
+		free((char *)info_r.traverse_path);
+		free(buf[0]);
+		free(buf[1]);
+	} else {
+		char *old_path = NULL, *new_path = NULL;
+		struct cache_entry *old_entry = NULL, *new_entry = NULL;
+
+		if (entry[0].path) {
+			old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path);
+			old_entry = make_transient_cache_entry(
+					entry[0].mode, &entry[0].oid,
+					old_path, /* stage */ 0);
+			old_entry->ce_flags |= CE_SKIP_WORKTREE;
+		}
+		if (entry[1].path) {
+			new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path);
+			new_entry = make_transient_cache_entry(
+					entry[1].mode, &entry[1].oid,
+					new_path, /* stage */ 0);
+			new_entry->ce_flags |= CE_SKIP_WORKTREE;
+		}
+
+		if (entry[0].path && entry[1].path)
+			show_modified(revs, old_entry, new_entry, 0, 1, 0);
+		else if (entry[0].path)
+			diff_index_show_file(revs, revs->prefix,
+					     old_entry, &entry[0].oid,
+					     0, entry[0].mode, 0);
+		else if (entry[1].path)
+			show_new_file(revs, new_entry, 1, 0);
+
+		discard_cache_entry(old_entry);
+		discard_cache_entry(new_entry);
+		free(old_path);
+		free(new_path);
+	}
+
+	return mask;
+}
+
+static void show_modified_sparse_directory(struct rev_info *revs,
+			 const struct cache_entry *old_entry,
+			 const struct cache_entry *new_entry,
+			 int report_missing,
+			 int cached, int match_missing)
+{
+	struct tree_desc t[2];
+	void *buf[2];
+	struct traverse_info info = { NULL };
+	struct strbuf name = STRBUF_INIT;
+	struct strbuf parent_path = STRBUF_INIT;
+	char *last_dir_sep;
+
+	if (oideq(&old_entry->oid, &new_entry->oid))
+		return;
+
+	info.fn = compare_within_sparse_dir;
+	info.prev = &info;
+
+	strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1);
+	info.name = name.buf;
+	info.namelen = name.len;
+
+	strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1);
+	if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf)
+		strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1);
+	else
+		strbuf_setlen(&parent_path, 0);
+
+	info.pathlen = parent_path.len;
+
+	if (parent_path.len)
+		info.traverse_path = parent_path.buf;
+	else
+		info.traverse_path = "";
+
+	info.mode = new_entry->ce_mode;
+	info.df_conflicts = 0;
+	info.data = revs;
+
+	buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid);
+	buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid);
+
+	traverse_trees(NULL, 2, t, &info);
+
+	free(buf[0]);
+	free(buf[1]);
+	strbuf_release(&name);
+	strbuf_release(&parent_path);
+}
+
 static int show_modified(struct rev_info *revs,
 			 const struct cache_entry *old_entry,
 			 const struct cache_entry *new_entry,
@@ -343,6 +520,17 @@ static int show_modified(struct rev_info *revs,
 	const struct object_id *oid;
 	unsigned dirty_submodule = 0;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
+		return 0;
+	}
+
 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (9 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
                           ` (2 subsequent siblings)
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.

Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.

This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  8 ++++++++
 wt-status.c                              | 14 +++++++++++---
 wt-status.h                              |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 851a83388e4b..f6b124e0500f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -215,6 +215,14 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 -uno
 '
 
+test_expect_success 'status reports sparse-checkout' '
+	init_repos &&
+	git -C sparse-checkout status >full &&
+	git -C sparse-index status >sparse &&
+	test_i18ngrep "You are in a sparse checkout with " full &&
+	test_i18ngrep "You are in a sparse checkout." sparse
+'
+
 test_expect_success 'add, commit, checkout' '
 	init_repos &&
 
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023e4..0425169c1895 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1490,9 +1490,12 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
 		return;
 
-	status_printf_ln(s, color,
-			 _("You are in a sparse checkout with %d%% of tracked files present."),
-			 s->state.sparse_checkout_percentage);
+	if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+		status_printf_ln(s, color, _("You are in a sparse checkout."));
+	else
+		status_printf_ln(s, color,
+				_("You are in a sparse checkout with %d%% of tracked files present."),
+				s->state.sparse_checkout_percentage);
 	wt_longstatus_print_trailer(s);
 }
 
@@ -1650,6 +1653,11 @@ static void wt_status_check_sparse_checkout(struct repository *r,
 		return;
 	}
 
+	if (r->index->sparse_index) {
+		state->sparse_checkout_percentage = SPARSE_CHECKOUT_SPARSE_INDEX;
+		return;
+	}
+
 	for (i = 0; i < r->index->cache_nr; i++) {
 		struct cache_entry *ce = r->index->cache[i];
 		if (ce_skip_worktree(ce))
diff --git a/wt-status.h b/wt-status.h
index 0d32799b28e1..ab9cc9d8f032 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -78,6 +78,7 @@ enum wt_status_format {
 };
 
 #define SPARSE_CHECKOUT_DISABLED -1
+#define SPARSE_CHECKOUT_SPARSE_INDEX -2
 
 struct wt_status_state {
 	int merge_in_progress;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 12/14] status: use sparse-index throughout
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (10 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
  2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c                         |  3 +++
 read-cache.c                             | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index cf0c36d1dcb2..e529da7beadd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_status_usage, builtin_status_options);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	status_init_config(&s, git_status_config);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_status_options,
diff --git a/read-cache.c b/read-cache.c
index 29ffa9ac5db9..f80e26831b36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1578,8 +1578,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -1594,6 +1593,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_skip_worktree && ce_skip_worktree(ce))
 			continue;
 
+		/*
+		 * If this entry is a sparse directory, then there isn't
+		 * any stat() information to update. Ignore the entry.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+
 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
 			filtered = 1;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f6b124e0500f..099dc2bf440f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -508,12 +508,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
 		git -C sparse-index -c core.fsmonitor="" reset --hard &&
 	test_region index convert_to_sparse trace2.txt &&
-	test_region index ensure_full_index trace2.txt &&
+	test_region index ensure_full_index trace2.txt
+'
 
-	rm trace2.txt &&
+test_expect_success 'sparse-index is not expanded' '
+	init_repos &&
+
+	rm -f trace2.txt &&
+	echo >>sparse-index/untracked.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" status -uno &&
-	test_region index ensure_full_index trace2.txt
+		git -C sparse-index status &&
+	test_region ! index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 13/14] wt-status: expand added sparse directory entries
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (11 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  2021-06-09  5:27           ` Elijah Newren
  2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
  13 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
  name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
  addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 36 +++++++++++++++++
 wt-status.c                              | 50 ++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 099dc2bf440f..39b86fbe2be6 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -521,4 +521,40 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+test_expect_success 'reset mixed and checkout orphan' '
+	init_repos &&
+
+	test_all_match git checkout rename-out-to-in &&
+
+	# Sparse checkouts do not agree with full checkouts about
+	# how to report a directory/file conflict during a reset.
+	# This command would fail with test_all_match because the
+	# full checkout reports "T folder1/0/1" while a sparse
+	# checkout reports "D folder1/0/1". This matches because
+	# the sparse checkouts skip "adding" the other side of
+	# the conflict.
+	test_sparse_match git reset --mixed HEAD~1 &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# At this point, sparse-checkouts behave differently
+	# from the full-checkout.
+	test_sparse_match git checkout --orphan new-branch &&
+	test_sparse_match test-tool read-cache --table --expand &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_success 'add everything with deep new file' '
+	init_repos &&
+
+	run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
+
+	run_on_all touch deep/deeper1/x &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 0425169c1895..90db8bd659fa 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	run_diff_index(&rev, 1);
 }
 
+static int add_file_to_list(const struct object_id *oid,
+			    struct strbuf *base, const char *path,
+			    unsigned int mode, void *context)
+{
+	struct string_list_item *it;
+	struct wt_status_change_data *d;
+	struct wt_status *s = context;
+	char *full_name;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	full_name = xstrfmt("%s%s", base->buf, path);
+	it = string_list_insert(&s->change, full_name);
+	d = it->util;
+	if (!d) {
+		CALLOC_ARRAY(d, 1);
+		it->util = d;
+	}
+
+	d->index_status = DIFF_STATUS_ADDED;
+	/* Leave {mode,oid}_head zero for adds. */
+	d->mode_index = mode;
+	oidcpy(&d->oid_index, oid);
+	s->committable = 1;
+	return 0;
+}
+
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
 	struct index_state *istate = s->repo->index;
@@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			continue;
 		if (ce_intent_to_add(ce))
 			continue;
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			/*
+			 * This is a sparse directory entry, so we want to collect all
+			 * of the added files within the tree. This requires recursively
+			 * expanding the trees to find the elements that are new in this
+			 * tree and marking them with DIFF_STATUS_ADDED.
+			 */
+			struct strbuf base = STRBUF_INIT;
+			struct pathspec ps;
+			struct tree *tree = lookup_tree(istate->repo, &ce->oid);
+
+			memset(&ps, 0, sizeof(ps));
+			ps.recursive = 1;
+			ps.has_wildcard = 1;
+			ps.max_depth = -1;
+
+			strbuf_add(&base, ce->name, ce->ce_namelen);
+			read_tree_at(istate->repo, tree, &base, &ps,
+				     add_file_to_list, s);
+			continue;
+		}
+
 		it = string_list_insert(&s->change, ce->name);
 		d = it->util;
 		if (!d) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [PATCH v5 14/14] fsmonitor: integrate with sparse index
  2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
                           ` (12 preceding siblings ...)
  2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-07 12:34         ` Derrick Stolee via GitGitGadget
  13 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c              |  7 ++++++
 t/t7519-status-fsmonitor.sh | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index b2b3fbd75050..32ba0d17ef7c 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -195,6 +195,10 @@ int convert_to_sparse(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -291,6 +295,9 @@ void ensure_full_index(struct index_state *istate)
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
 	istate->cache_alloc = full->cache_alloc;
+	istate->fsmonitor_has_run_once = 0;
+	FREE_AND_NULL(istate->fsmonitor_dirty);
+	FREE_AND_NULL(istate->fsmonitor_last_update);
 
 	strbuf_release(&base);
 	free(full);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..f70fe961902e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -73,6 +73,7 @@ test_expect_success 'setup' '
 	expect*
 	actual*
 	marker*
+	trace2*
 	EOF
 '
 
@@ -383,4 +384,51 @@ test_expect_success 'status succeeds after staging/unstaging' '
 	)
 '
 
+# Usage:
+# check_sparse_index_behavior [!]
+# If "!" is supplied, then we verify that we do not call ensure_full_index
+# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
+check_sparse_index_behavior () {
+	git status --porcelain=v2 >expect &&
+	git sparse-checkout init --cone --sparse-index &&
+	git sparse-checkout set dir1 dir2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git status --porcelain=v2 >actual &&
+	test_region $1 index ensure_full_index trace2.txt &&
+	test_cmp expect actual &&
+	rm trace2.txt &&
+	git sparse-checkout disable
+}
+
+test_expect_success 'status succeeds with sparse index' '
+	git reset --hard &&
+
+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+	EOF
+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
+	check_sparse_index_behavior ! &&
+
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1/modified\0"
+	EOF
+	check_sparse_index_behavior ! &&
+
+	cp -r dir1 dir1a &&
+	git add dir1a &&
+	git commit -m "add dir1a" &&
+
+	# This one modifies outside the sparse-checkout definition
+	# and hence we expect to expand the sparse-index.
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+		printf "last_update_token\0"
+		printf "dir1a/modified\0"
+	EOF
+	check_sparse_index_behavior
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
@ 2021-06-07 15:26           ` Derrick Stolee
  2021-06-08  1:05             ` Junio C Hamano
  2021-06-09  5:47           ` Elijah Newren
  1 sibling, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-06-07 15:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, newren, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
...
> +			old_entry = make_transient_cache_entry(
> +					entry[0].mode, &entry[0].oid,
> +					old_path, /* stage */ 0);

I didn't realize this before I started integrating with
v2.32.0 (which I should have done before submitting v5) that
make_transient_cache_entry() has changed its prototype to
include a memory pool parameter.

I'm working on a v6 that makes only this update and it will
probably be ready tomorrow.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 15:26           ` Derrick Stolee
@ 2021-06-08  1:05             ` Junio C Hamano
  2021-06-08 13:00               ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Junio C Hamano @ 2021-06-08  1:05 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, newren,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
> ...
>> +			old_entry = make_transient_cache_entry(
>> +					entry[0].mode, &entry[0].oid,
>> +					old_path, /* stage */ 0);
>
> I didn't realize this before I started integrating with
> v2.32.0 (which I should have done before submitting v5) that
> make_transient_cache_entry() has changed its prototype to
> include a memory pool parameter.

Sorry for the trouble---these are usually all known to me for topics
I happened to have picked up in 'seen', since I try to make it a rule
that 'seen' must be a descendant of 'master'.

How can I usefully communicate the conflicts I find out during the
integration cycles to topic owners, I wonder.

Thanks.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-08  1:05             ` Junio C Hamano
@ 2021-06-08 13:00               ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-06-08 13:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, newren,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On 6/7/2021 9:05 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <dstolee@microsoft.com>
>> ...
>>> +			old_entry = make_transient_cache_entry(
>>> +					entry[0].mode, &entry[0].oid,
>>> +					old_path, /* stage */ 0);
>>
>> I didn't realize this before I started integrating with
>> v2.32.0 (which I should have done before submitting v5) that
>> make_transient_cache_entry() has changed its prototype to
>> include a memory pool parameter.
> 
> Sorry for the trouble---these are usually all known to me for topics
> I happened to have picked up in 'seen', since I try to make it a rule
> that 'seen' must be a descendant of 'master'.
> 
> How can I usefully communicate the conflicts I find out during the
> integration cycles to topic owners, I wonder.

This is my fault for stacking topics. I used a GitGitGadget PR
to target a custom merge of other topics in flight, so my
merges were testing against a static target. When those topics
were merged, I should have updated my PR to point to 'master' or
even 'next'.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
@ 2021-06-08 18:56           ` Elijah Newren
  2021-06-09 17:39             ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-06-08 18:56 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When creating a full index from a sparse one, we create cache entries
> for every blob within a given sparse directory entry. These are
> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> correctly written to disk in the case that the index is not converted
> back down to a sparse-index.

In our previous discussion on this patch from v3
(https://lore.kernel.org/git/cb9161ca-dc6e-b77b-1a41-385ed8920bb2@gmail.com/),
you said you'd explain the reason for this change in a bit more
detail, but the commit message has not changed.

Could this be corrected?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 1b49898d0cb7..b2b3fbd75050 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -222,7 +222,7 @@ static int add_path_to_index(const struct object_id *oid,
>         strbuf_addstr(base, path);
>
>         ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
> -       ce->ce_flags |= CE_SKIP_WORKTREE;
> +       ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>         set_index_entry(istate, istate->cache_nr++, ce);
>
>         strbuf_setlen(base, len);
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat'
  2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
@ 2021-06-08 19:18           ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-06-08 19:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> This fixes the test data shape to be as expected, allowing rename
> detection to work properly now that the 'larger-conent' file actually

s/conent/content/

> has meaningful lines.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4f2f09b53a32..d55478a1902b 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -40,7 +40,7 @@ test_expect_success 'setup' '
>                 done &&
>
>                 git checkout -b rename-base base &&
> -               echo >folder1/larger-content <<-\EOF &&
> +               cat >folder1/larger-content <<-\EOF &&
>                 matching
>                 lines
>                 help
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 08/14] unpack-trees: unpack sparse directory entries
  2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-09  3:48           ` Elijah Newren
  2021-06-09 20:21             ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-06-09  3:48 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> During unpack_callback(), index entries are compared against tree
> entries. These are matched according to names and types. One goal is to
> decide if we should recurse into subtrees or simply operate on one index
> entry.
>
> In the case of a sparse-directory entry, we do not want to recurse into
> that subtree and instead simply compare the trees. In some cases, we
> might want to perform a merge operation on the entry, such as during
> 'git checkout <commit>' which wants to replace a sparse tree entry with
> the tree for that path at the target commit. We extend the logic within
> unpack_nondirectories() to create a sparse-directory entry in this case,
> and then that is sent to call_unpack_fn().

Does this presume that all callbacks are prepared to accept a sparse
directory entry?  Or do we have an external flag that ensures we do
not reach this code path when using callbacks that aren't prepared to
handle it properly?

I hope that the answer is the latter, and that the ensure_full_index()
calls are what prevents the code from reaching this point if a
callback would be used that couldn't handle a sparse directory entry.

I'd be particularly concerned that merge-recursive would call this
code with unpack_opts.fn = threeway_merge.  threeway_merge is kind of
interesting in that it might just happen to not die when passed a
sparse directory entry, but would pass along data that'd just break
stuff downstream in various subtle ways.  For example, if there were
conflicts in the sparse directory entries because both had been
modified, the merge should recurse and resolve individual paths
underneath, which the merge-recursive code would not be prepared to do
since unpack_trees() has already returned.  Also, even if there wasn't
a "conflict" because only one side modified, blindly doing a trivial
directory resolution will break rename detection.  I mention
merge-recursive not because it's worth fixing (well, it was and the
fix is called merge-ort) but because I'm most familiar with it.  The
other callbacks _might_ have similar problems, though its possible
that it's safe for one- and two- way merging and just fails once you
get to three-way.

> There are some subtleties in this process. For instance, we need to
> update find_cache_entry() to allow finding a sparse-directory entry that
> exactly matches a given path.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 101 ++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 91 insertions(+), 10 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ef6a2b1c951c..ff448ee8424e 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1037,13 +1037,15 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>         const struct name_entry *n,
>         int stage,
>         struct index_state *istate,
> -       int is_transient)
> +       int is_transient,
> +       int is_sparse_directory)
>  {
>         size_t len = traverse_path_len(info, tree_entry_len(n));
> +       size_t alloc_len = is_sparse_directory ? len + 1 : len;
>         struct cache_entry *ce =
>                 is_transient ?
> -               make_empty_transient_cache_entry(len) :
> -               make_empty_cache_entry(istate, len);
> +               make_empty_transient_cache_entry(alloc_len) :
> +               make_empty_cache_entry(istate, alloc_len);
>
>         ce->ce_mode = create_ce_mode(n->mode);
>         ce->ce_flags = create_ce_flags(stage);
> @@ -1052,6 +1054,13 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>         /* len+1 because the cache_entry allocates space for NUL */
>         make_traverse_path(ce->name, len + 1, info, n->path, n->pathlen);
>
> +       if (is_sparse_directory) {
> +               ce->name[len] = '/';
> +               ce->name[len + 1] = 0;

Should this be '\0', for clarity?

> +               ce->ce_namelen++;
> +               ce->ce_flags |= CE_SKIP_WORKTREE;
> +       }
> +
>         return ce;
>  }
>
> @@ -1064,16 +1073,24 @@ static int unpack_nondirectories(int n, unsigned long mask,
>                                  unsigned long dirmask,
>                                  struct cache_entry **src,
>                                  const struct name_entry *names,
> -                                const struct traverse_info *info)
> +                                const struct traverse_info *info,
> +                                int sparse_directory)
>  {
>         int i;
>         struct unpack_trees_options *o = info->data;
>         unsigned long conflicts = info->df_conflicts | dirmask;
>
> -       /* Do we have *only* directories? Nothing to do */

You've removed the comment, but not the code.  So it still returns
immediately if there are only directories...right?  Am I missing
something?  Is this code still correct?  Or is the comment just
misleading now that src[0] can be a directory?

>         if (mask == dirmask && !src[0])
>                 return 0;
>
> +       /* no-op if our cache entry doesn't match the expectations. */
> +       if (sparse_directory) {
> +               if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
> +                       BUG("expected sparse directory entry");
> +       } else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
> +               return 0;
> +       }

This code reads like "If sparse_directory is false, but the cache
entry is a sparse directory, we'll just keep it as-is and ignore
changed or conflicting directories or files from the names name_entry.
However, I think this has to be coupled with knowledge about changes
to unpack_callback() you made, where you introduce an extra call to
unpack_nondirectories() for the sparse directory case, and in the
second one you would do useful work.  So "no-op" is kind of
misleading, it's more deferral until the later unpack_nondirectories()
call.

Or, at least so I think after trying to read over this patch.  Am I
understanding this right?

> +
>         /*
>          * Ok, we've filled in up to any potential index entry in src[0],
>          * now do the rest.
> @@ -1103,7 +1120,9 @@ static int unpack_nondirectories(int n, unsigned long mask,
>                  * not stored in the index.  otherwise construct the
>                  * cache entry from the index aware logic.
>                  */
> -               src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
> +               src[i + o->merge] = create_ce_entry(info, names + i, stage,
> +                                                   &o->result, o->merge,
> +                                                   sparse_directory);
>         }
>
>         if (o->merge) {
> @@ -1210,13 +1229,44 @@ static int find_cache_pos(struct traverse_info *info,
>  static struct cache_entry *find_cache_entry(struct traverse_info *info,
>                                             const struct name_entry *p)
>  {
> +       struct cache_entry *ce;
>         int pos = find_cache_pos(info, p->path, p->pathlen);
>         struct unpack_trees_options *o = info->data;
>
>         if (0 <= pos)
>                 return o->src_index->cache[pos];
> -       else
> +
> +       /*
> +        * Check for a sparse-directory entry named "path/".
> +        * Due to the input p->path not having a trailing
> +        * slash, the negative 'pos' value overshoots the
> +        * expected position by one, hence "-2" here.
> +        */
> +       pos = -pos - 2;
> +
> +       if (pos < 0 || pos >= o->src_index->cache_nr)
> +               return NULL;
> +
> +       ce = o->src_index->cache[pos];
> +
> +       if (!S_ISSPARSEDIR(ce->ce_mode))
>                 return NULL;
> +
> +       /*
> +        * Compare ce->name to info->name + '/' + p->path + '/'
> +        * if info->name is non-empty. Compare ce->name to
> +        * p-.path + '/' otherwise.

p->path, not p-.path

Also, you state in both cases that you are comparing against a
trailing '/', but...

> +        */
> +       if (info->namelen) {
> +               if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
> +                   ce->name[info->namelen] == '/' &&
> +                   !strncmp(ce->name, info->name, info->namelen) &&
> +                   !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))

You only checked for one of the two '/' characters here.  Are you
omitting the check for the final '/' do to the S_ISSPARSEDIR() check
above?

> +                       return ce;
> +       } else if (ce->ce_namelen == p->pathlen + 1 &&
> +                  !strncmp(ce->name, p->path, p->pathlen))

Here you didn't check for the final '/'.  Is that intentional because
of the S_ISSPARSEDIR() check above?  If so, should the comment above
this block be corrected?

> +               return ce;
> +       return NULL;
>  }
>
>  static void debug_path(struct traverse_info *info)
> @@ -1251,6 +1301,32 @@ static void debug_unpack_callback(int n,
>                 debug_name_entry(i, names + i);
>  }
>
> +/*
> + * Returns true if and only if the given cache_entry is a
> + * sparse-directory entry that matches the given name_entry
> + * from the tree walk at the given traverse_info.
> + */
> +static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
> +{
> +       size_t expected_len, name_start;
> +
> +       if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;
> +
> +       if (info->namelen)
> +               name_start = info->namelen + 1;
> +       else
> +               name_start = 0;
> +       expected_len = name->pathlen + 1 + name_start;
> +
> +       if (ce->ce_namelen != expected_len ||
> +           strncmp(ce->name, info->name, info->namelen) ||
> +           strncmp(ce->name + name_start, name->path, name->pathlen))
> +               return 0;

What about the intervening '/' character?  Could we get a false hit
between "foo/bar/" and "foo.bar/"?

Also, do we have to worry about the trailing '/'?

> +
> +       return 1;
> +}
> +
>  /*
>   * Note that traverse_by_cache_tree() duplicates some logic in this function
>   * without actually calling it. If you change the logic here you may need to
> @@ -1307,7 +1383,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (unpack_nondirectories(n, mask, dirmask, src, names, info, 0) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1337,9 +1413,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> -                                            names, info) < 0)
> +               if (is_sparse_directory_entry(src[0], names, info)) {
> +                       if (unpack_nondirectories(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
> +                               return -1;
> +               } else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +                                                   names, info) < 0) {
>                         return -1;
> +               }
> +
>                 return mask;
>         }
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 13/14] wt-status: expand added sparse directory entries
  2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
@ 2021-06-09  5:27           ` Elijah Newren
  2021-06-09 20:49             ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-06-09  5:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> It is difficult, but possible, to get into a state where we intend to
> add a directory that is outside of the sparse-checkout definition. Add a
> test to t1092-sparse-checkout-compatibility.sh that demonstrates this
> using a combination of 'git reset --mixed' and 'git checkout --orphan'.
>
> This test failed before because the output of 'git status
> --porcelain=v2' would not match on the lines for folder1/:
>
> * The sparse-checkout repo (with a full index) would output each path
>   name that is intended to be added.
>
> * The sparse-index repo would only output that "folder1/" is staged for
>   addition.
>
> The status should report the full list of files to be added, and so this
> sparse-directory entry should be expanded to a full list when reaching
> it inside the wt_status_collect_changes_initial() method. Use
> read_tree_at() to assist.
>
> Somehow, this loop over the cache entries was not guarded by
> ensure_full_index() as intended.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 36 +++++++++++++++++
>  wt-status.c                              | 50 ++++++++++++++++++++++++
>  2 files changed, 86 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 099dc2bf440f..39b86fbe2be6 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -521,4 +521,40 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +test_expect_success 'reset mixed and checkout orphan' '
> +       init_repos &&
> +
> +       test_all_match git checkout rename-out-to-in &&
> +
> +       # Sparse checkouts do not agree with full checkouts about
> +       # how to report a directory/file conflict during a reset.
> +       # This command would fail with test_all_match because the
> +       # full checkout reports "T folder1/0/1" while a sparse
> +       # checkout reports "D folder1/0/1". This matches because
> +       # the sparse checkouts skip "adding" the other side of
> +       # the conflict.
> +       test_sparse_match git reset --mixed HEAD~1 &&

Ooh!  I think you found a sparse-checkout bug here.  I agree that
sparse-checkouts and full-checkouts should give different output in
this case, but I don't think the current difference is the correct
one.  Digging in a little closer, before running `git reset --mixed
HEAD~1` I see:

$ git ls-files -t | grep folder
S folder1/0/0/0
S folder1/0/1
S folder2/0/0/0
S folder2/0/1/1
S folder2/a
S folder2/larger-content

and after running git reset --mixed HEAD~1, I see:
S folder1/0/0/0
S folder1/0/1
H folder1/a
H folder1/larger-content
S folder2/0/0/0
H folder2/0/1
S folder2/a
S folder2/larger-content

meaning that the reset of the index failed.  It thinks some entries
are present in the working copy, though it didn't actually check any
of them out, leaving them to be marked as deleted.  This leaves the
sparse-checkout in a messed up state.  To correct it, I need to run
either of the following:

    git diff --diff-filter=D --name-only | xargs git update-index
--skip-worktree

or

    git sparse-checkout reapply

(Though one could ask whether sparse-checkout reapply should take a
missing file that isn't SKIP_WORKTREE and determine it's okay to just
mark it as SKIP_WORKTREE rather than treating it as dirty.  I'm not
sure the answer to that...)

I really think that `git reset --mixed ...` should have been getting
the sparsity right on its own without the manual fixup afterwards that
I needed to add.

> +       test_sparse_match test-tool read-cache --table --expand &&

If both the full and the sparse checkouts do a reset --mixed, I would
think that this step should be able to use a test_all_match...at least
if reset --mixed weren't broken.

> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git status --porcelain=v2 &&

Why is this test run twice?

> +
> +       # At this point, sparse-checkouts behave differently
> +       # from the full-checkout.
> +       test_sparse_match git checkout --orphan new-branch &&
> +       test_sparse_match test-tool read-cache --table --expand &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git status --porcelain=v2

And again, you run the status twice...why?

> +'
> +
> +test_expect_success 'add everything with deep new file' '
> +       init_repos &&
> +
> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
> +
> +       run_on_all touch deep/deeper1/x &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2

same question.

> +'
> +
>  test_done
> diff --git a/wt-status.c b/wt-status.c
> index 0425169c1895..90db8bd659fa 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -654,6 +654,34 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>         run_diff_index(&rev, 1);
>  }
>
> +static int add_file_to_list(const struct object_id *oid,
> +                           struct strbuf *base, const char *path,
> +                           unsigned int mode, void *context)
> +{
> +       struct string_list_item *it;
> +       struct wt_status_change_data *d;
> +       struct wt_status *s = context;
> +       char *full_name;
> +
> +       if (S_ISDIR(mode))
> +               return READ_TREE_RECURSIVE;
> +
> +       full_name = xstrfmt("%s%s", base->buf, path);
> +       it = string_list_insert(&s->change, full_name);
> +       d = it->util;
> +       if (!d) {
> +               CALLOC_ARRAY(d, 1);
> +               it->util = d;
> +       }
> +
> +       d->index_status = DIFF_STATUS_ADDED;
> +       /* Leave {mode,oid}_head zero for adds. */
> +       d->mode_index = mode;
> +       oidcpy(&d->oid_index, oid);
> +       s->committable = 1;
> +       return 0;
> +}
> +
>  static void wt_status_collect_changes_initial(struct wt_status *s)
>  {
>         struct index_state *istate = s->repo->index;
> @@ -668,6 +696,28 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
>                         continue;
>                 if (ce_intent_to_add(ce))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode)) {
> +                       /*
> +                        * This is a sparse directory entry, so we want to collect all
> +                        * of the added files within the tree. This requires recursively
> +                        * expanding the trees to find the elements that are new in this
> +                        * tree and marking them with DIFF_STATUS_ADDED.
> +                        */
> +                       struct strbuf base = STRBUF_INIT;
> +                       struct pathspec ps;
> +                       struct tree *tree = lookup_tree(istate->repo, &ce->oid);
> +
> +                       memset(&ps, 0, sizeof(ps));
> +                       ps.recursive = 1;
> +                       ps.has_wildcard = 1;
> +                       ps.max_depth = -1;
> +
> +                       strbuf_add(&base, ce->name, ce->ce_namelen);
> +                       read_tree_at(istate->repo, tree, &base, &ps,
> +                                    add_file_to_list, s);
> +                       continue;
> +               }
> +
>                 it = string_list_insert(&s->change, ce->name);
>                 d = it->util;
>                 if (!d) {
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
  2021-06-07 15:26           ` Derrick Stolee
@ 2021-06-09  5:47           ` Elijah Newren
  2021-06-09  6:32             ` Junio C Hamano
  1 sibling, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-06-09  5:47 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> While comparing an index to a tree, we may see a sparse directory entry.
> In this case, we should compare that portion of the tree to the tree
> represented by that entry. This could include a new tree which needs to
> be expanded to a full list of added files. It could also include an
> existing tree, in which case all of the changes inside are important to
> describe, including the modifications, additions, and deletions. Note
> that the case where the tree has a path and the index does not remains
> identical to before: the lack of a cache entry is the same with a sparse
> index.
>
> In the case where a tree is modified, we need to expand the tree
> recursively, and start comparing each contained entry as either an
> addition, deletion, or modification. This causes an interesting
> recursion that did not exist before.

So, I haven't read through this in detail yet...but there's a big
question I'm curious about:

Git already has code for comparing an index to a tree, a tree to a
tree, or a tree to the working directory, right?  So, when comparing a
sparse-index to a tree...can't we re-use the compare a tree to a tree
code when we hit a sparse directory?

Maybe there's a really good reason to conceptually duplicate the
compare a tree to a tree code, but it seems the commit message should
at least address that reason and why we need to reimplement that
logic.


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 188 insertions(+)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index b73cc1859a49..ba4c683d4bc4 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -314,6 +314,48 @@ static int get_stat_data(const struct cache_entry *ce,
>         return 0;
>  }
>
> +struct show_new_tree_context {
> +       struct rev_info *revs;
> +       unsigned added:1;
> +};
> +
> +static int show_new_file_from_tree(const struct object_id *oid,
> +                                  struct strbuf *base, const char *path,
> +                                  unsigned int mode, void *context)
> +{
> +       struct show_new_tree_context *ctx = context;
> +       struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0);
> +
> +       diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0);
> +       discard_cache_entry(new_file);
> +       return 0;
> +}
> +
> +static void show_directory(struct rev_info *revs,
> +                          const struct cache_entry *new_dir,
> +                          int added)
> +{
> +       /*
> +        * new_dir is a sparse directory entry, so we want to collect all
> +        * of the new files within the tree. This requires recursively
> +        * expanding the trees.
> +        */
> +       struct show_new_tree_context ctx = { revs, added };
> +       struct repository *r = revs->repo;
> +       struct strbuf base = STRBUF_INIT;
> +       struct pathspec ps;
> +       struct tree *tree = lookup_tree(r, &new_dir->oid);
> +
> +       memset(&ps, 0, sizeof(ps));
> +       ps.recursive = 1;
> +       ps.has_wildcard = 1;
> +       ps.max_depth = -1;
> +
> +       strbuf_add(&base, new_dir->name, new_dir->ce_namelen);
> +       read_tree_at(r, tree, &base, &ps,
> +                       show_new_file_from_tree, &ctx);
> +}
> +
>  static void show_new_file(struct rev_info *revs,
>                           const struct cache_entry *new_file,
>                           int cached, int match_missing)
> @@ -322,6 +364,11 @@ static void show_new_file(struct rev_info *revs,
>         unsigned int mode;
>         unsigned dirty_submodule = 0;
>
> +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> +               show_directory(revs, new_file, /*added */ 1);
> +               return;
> +       }
> +
>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -333,6 +380,136 @@ static void show_new_file(struct rev_info *revs,
>         diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
>  }
>
> +static int show_modified(struct rev_info *revs,
> +                        const struct cache_entry *old_entry,
> +                        const struct cache_entry *new_entry,
> +                        int report_missing,
> +                        int cached, int match_missing);
> +
> +static int compare_within_sparse_dir(int n, unsigned long mask,
> +                                    unsigned long dirmask, struct name_entry *entry,
> +                                    struct traverse_info *info)
> +{
> +       struct rev_info *revs = info->data;
> +       struct object_id *oid0 = &entry[0].oid;
> +       struct object_id *oid1 = &entry[1].oid;
> +
> +       if (oideq(oid0, oid1))
> +               return mask;
> +
> +       /* Directory/file conflicts are handled earlier. */
> +       if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) {
> +               struct tree_desc t[2];
> +               void *buf[2];
> +               struct traverse_info info_r = { NULL, };
> +
> +               info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path);
> +               info_r.namelen = strlen(info_r.name);
> +               info_r.traverse_path = xstrfmt("%s/", info_r.name);
> +               info_r.fn = compare_within_sparse_dir;
> +               info_r.prev = info;
> +               info_r.mode = entry[0].mode;
> +               info_r.pathlen = entry[0].pathlen;
> +               info_r.df_conflicts = 0;
> +               info_r.data = revs;
> +
> +               buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0);
> +               buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1);
> +
> +               traverse_trees(NULL, 2, t, &info_r);
> +
> +               free((char *)info_r.name);
> +               free((char *)info_r.traverse_path);
> +               free(buf[0]);
> +               free(buf[1]);
> +       } else {
> +               char *old_path = NULL, *new_path = NULL;
> +               struct cache_entry *old_entry = NULL, *new_entry = NULL;
> +
> +               if (entry[0].path) {
> +                       old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path);
> +                       old_entry = make_transient_cache_entry(
> +                                       entry[0].mode, &entry[0].oid,
> +                                       old_path, /* stage */ 0);
> +                       old_entry->ce_flags |= CE_SKIP_WORKTREE;
> +               }
> +               if (entry[1].path) {
> +                       new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path);
> +                       new_entry = make_transient_cache_entry(
> +                                       entry[1].mode, &entry[1].oid,
> +                                       new_path, /* stage */ 0);
> +                       new_entry->ce_flags |= CE_SKIP_WORKTREE;
> +               }
> +
> +               if (entry[0].path && entry[1].path)
> +                       show_modified(revs, old_entry, new_entry, 0, 1, 0);
> +               else if (entry[0].path)
> +                       diff_index_show_file(revs, revs->prefix,
> +                                            old_entry, &entry[0].oid,
> +                                            0, entry[0].mode, 0);
> +               else if (entry[1].path)
> +                       show_new_file(revs, new_entry, 1, 0);
> +
> +               discard_cache_entry(old_entry);
> +               discard_cache_entry(new_entry);
> +               free(old_path);
> +               free(new_path);
> +       }
> +
> +       return mask;
> +}
> +
> +static void show_modified_sparse_directory(struct rev_info *revs,
> +                        const struct cache_entry *old_entry,
> +                        const struct cache_entry *new_entry,
> +                        int report_missing,
> +                        int cached, int match_missing)
> +{
> +       struct tree_desc t[2];
> +       void *buf[2];
> +       struct traverse_info info = { NULL };
> +       struct strbuf name = STRBUF_INIT;
> +       struct strbuf parent_path = STRBUF_INIT;
> +       char *last_dir_sep;
> +
> +       if (oideq(&old_entry->oid, &new_entry->oid))
> +               return;
> +
> +       info.fn = compare_within_sparse_dir;
> +       info.prev = &info;
> +
> +       strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1);
> +       info.name = name.buf;
> +       info.namelen = name.len;
> +
> +       strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1);
> +       if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf)
> +               strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1);
> +       else
> +               strbuf_setlen(&parent_path, 0);
> +
> +       info.pathlen = parent_path.len;
> +
> +       if (parent_path.len)
> +               info.traverse_path = parent_path.buf;
> +       else
> +               info.traverse_path = "";
> +
> +       info.mode = new_entry->ce_mode;
> +       info.df_conflicts = 0;
> +       info.data = revs;
> +
> +       buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid);
> +       buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid);
> +
> +       traverse_trees(NULL, 2, t, &info);
> +
> +       free(buf[0]);
> +       free(buf[1]);
> +       strbuf_release(&name);
> +       strbuf_release(&parent_path);
> +}
> +
>  static int show_modified(struct rev_info *revs,
>                          const struct cache_entry *old_entry,
>                          const struct cache_entry *new_entry,
> @@ -343,6 +520,17 @@ static int show_modified(struct rev_info *revs,
>         const struct object_id *oid;
>         unsigned dirty_submodule = 0;
>
> +       /*
> +        * If both are sparse directory entries, then expand the
> +        * modifications to the file level.
> +        */
> +       if (old_entry && new_entry &&
> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
> +               return 0;
> +       }
> +
>         if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09  5:47           ` Elijah Newren
@ 2021-06-09  6:32             ` Junio C Hamano
  2021-06-09  8:11               ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Junio C Hamano @ 2021-06-09  6:32 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee,
	Derrick Stolee

Elijah Newren <newren@gmail.com> writes:

> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> While comparing an index to a tree, we may see a sparse directory entry.
>> In this case, we should compare that portion of the tree to the tree
>> represented by that entry. This could include a new tree which needs to
>> be expanded to a full list of added files. It could also include an
>> existing tree, in which case all of the changes inside are important to
>> describe, including the modifications, additions, and deletions. Note
>> that the case where the tree has a path and the index does not remains
>> identical to before: the lack of a cache entry is the same with a sparse
>> index.
>>
>> In the case where a tree is modified, we need to expand the tree
>> recursively, and start comparing each contained entry as either an
>> addition, deletion, or modification. This causes an interesting
>> recursion that did not exist before.
>
> So, I haven't read through this in detail yet...but there's a big
> question I'm curious about:
>
> Git already has code for comparing an index to a tree, a tree to a
> tree, or a tree to the working directory, right?  So, when comparing a
> sparse-index to a tree...can't we re-use the compare a tree to a tree
> code when we hit a sparse directory?

Offhand I do not think of a reason why that cannot work.

The tree-diff machinery takes two trees, walks them in parallel and
repeatedly calls either diff_addremove() or diff_change(), which
appends diff_filepair() to the diff_queue[] structure.  If you see
an unexpanded tree on the index side, you should be able to pass
that tree with the subtree you are comparing against to the tree-diff
machinery to come up with a series of filepairs, and then tweak the
pathnames of these filepairs (as such a two-tree comparison would be
comparing two trees representing a single subdirectory of two different
vintages) before adding them to the diff_queue[] you are collecting
the index-vs-tree diff, for example.

But if a part of the index is represented as a tree because it is
outside the cone of interest, should we even be showing the
difference in that part of the tree?  If t/ directory is outside the
cone of interest, should "git diff HEAD~100 HEAD t/" show anything
to begin with (the same question for "git diff --cached HEAD t/")?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09  6:32             ` Junio C Hamano
@ 2021-06-09  8:11               ` Elijah Newren
  2021-06-09 20:33                 ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-06-09  8:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee,
	Derrick Stolee

On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> While comparing an index to a tree, we may see a sparse directory entry.
> >> In this case, we should compare that portion of the tree to the tree
> >> represented by that entry. This could include a new tree which needs to
> >> be expanded to a full list of added files. It could also include an
> >> existing tree, in which case all of the changes inside are important to
> >> describe, including the modifications, additions, and deletions. Note
> >> that the case where the tree has a path and the index does not remains
> >> identical to before: the lack of a cache entry is the same with a sparse
> >> index.
> >>
> >> In the case where a tree is modified, we need to expand the tree
> >> recursively, and start comparing each contained entry as either an
> >> addition, deletion, or modification. This causes an interesting
> >> recursion that did not exist before.
> >
> > So, I haven't read through this in detail yet...but there's a big
> > question I'm curious about:
> >
> > Git already has code for comparing an index to a tree, a tree to a
> > tree, or a tree to the working directory, right?  So, when comparing a
> > sparse-index to a tree...can't we re-use the compare a tree to a tree
> > code when we hit a sparse directory?
>
> Offhand I do not think of a reason why that cannot work.
>
> The tree-diff machinery takes two trees, walks them in parallel and
> repeatedly calls either diff_addremove() or diff_change(), which
> appends diff_filepair() to the diff_queue[] structure.  If you see
> an unexpanded tree on the index side, you should be able to pass
> that tree with the subtree you are comparing against to the tree-diff
> machinery to come up with a series of filepairs, and then tweak the
> pathnames of these filepairs (as such a two-tree comparison would be
> comparing two trees representing a single subdirectory of two different
> vintages) before adding them to the diff_queue[] you are collecting
> the index-vs-tree diff, for example.

Good to know it seems my idea might be reasonable.

> But if a part of the index is represented as a tree because it is
> outside the cone of interest, should we even be showing the
> difference in that part of the tree?  If t/ directory is outside the
> cone of interest, should "git diff HEAD~100 HEAD t/" show anything
> to begin with (the same question for "git diff --cached HEAD t/")?

Excellent question...and not just for diff, but log, grep with
revisions, and other commands.  We discussed this a while back[1] and
we seemed to lean towards eventually adding a flag because there are
usecases both for (1) viewing full history while having sparsity paths
restrict just the working copy, and (2) also restricting the view of
history to the sparsity paths.

[1] It's been discussed a few times, but there's a relatively
comprehensive discussion at the "Commands that would change for
behavior A" section from
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-08 18:56           ` Elijah Newren
@ 2021-06-09 17:39             ` Derrick Stolee
  2021-06-09 18:11               ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-06-09 17:39 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 6/8/2021 2:56 PM, Elijah Newren wrote:
> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When creating a full index from a sparse one, we create cache entries
>> for every blob within a given sparse directory entry. These are
>> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
>> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
>> correctly written to disk in the case that the index is not converted
>> back down to a sparse-index.
> 
> In our previous discussion on this patch from v3
> (https://lore.kernel.org/git/cb9161ca-dc6e-b77b-1a41-385ed8920bb2@gmail.com/),
> you said you'd explain the reason for this change in a bit more
> detail, but the commit message has not changed.

Thank you for the reminder.

> Could this be corrected?

How does this sound?

    When creating a full index from a sparse one, we create cache entries
    for every blob within a given sparse directory entry. These are
    correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
    flag is not included. The CE_EXTENDED flag would exist if we loaded a
    full index from disk with these entries marked with CE_SKIP_WORKTREE, so
    we can add the flag here to be consistent. This allows us to directly
    compare the flags present in cache entries when testing the sparse-index
    feature, but has no significance to its correctness in the user-facing
    functionality.

I have this in my local branch for now, but can update it before the next
version.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding
  2021-06-09 17:39             ` Derrick Stolee
@ 2021-06-09 18:11               ` Elijah Newren
  0 siblings, 0 replies; 127+ messages in thread
From: Elijah Newren @ 2021-06-09 18:11 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Wed, Jun 9, 2021 at 10:39 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/8/2021 2:56 PM, Elijah Newren wrote:
> > On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> When creating a full index from a sparse one, we create cache entries
> >> for every blob within a given sparse directory entry. These are
> >> correctly marked with the CE_SKIP_WORKTREE flag, but they must also be
> >> marked with the CE_EXTENDED flag to ensure that the skip-worktree bit is
> >> correctly written to disk in the case that the index is not converted
> >> back down to a sparse-index.
> >
> > In our previous discussion on this patch from v3
> > (https://lore.kernel.org/git/cb9161ca-dc6e-b77b-1a41-385ed8920bb2@gmail.com/),
> > you said you'd explain the reason for this change in a bit more
> > detail, but the commit message has not changed.
>
> Thank you for the reminder.
>
> > Could this be corrected?
>
> How does this sound?
>
>     When creating a full index from a sparse one, we create cache entries
>     for every blob within a given sparse directory entry. These are
>     correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
>     flag is not included. The CE_EXTENDED flag would exist if we loaded a
>     full index from disk with these entries marked with CE_SKIP_WORKTREE, so
>     we can add the flag here to be consistent. This allows us to directly
>     compare the flags present in cache entries when testing the sparse-index
>     feature, but has no significance to its correctness in the user-facing
>     functionality.
>
> I have this in my local branch for now, but can update it before the next
> version.

Thanks; this looks good to me.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 08/14] unpack-trees: unpack sparse directory entries
  2021-06-09  3:48           ` Elijah Newren
@ 2021-06-09 20:21             ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-06-09 20:21 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 6/8/2021 11:48 PM, Elijah Newren wrote:
> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> During unpack_callback(), index entries are compared against tree
>> entries. These are matched according to names and types. One goal is to
>> decide if we should recurse into subtrees or simply operate on one index
>> entry.
>>
>> In the case of a sparse-directory entry, we do not want to recurse into
>> that subtree and instead simply compare the trees. In some cases, we
>> might want to perform a merge operation on the entry, such as during
>> 'git checkout <commit>' which wants to replace a sparse tree entry with
>> the tree for that path at the target commit. We extend the logic within
>> unpack_nondirectories() to create a sparse-directory entry in this case,
>> and then that is sent to call_unpack_fn().
> 
> Does this presume that all callbacks are prepared to accept a sparse
> directory entry?  Or do we have an external flag that ensures we do
> not reach this code path when using callbacks that aren't prepared to
> handle it properly?
> 
> I hope that the answer is the latter, and that the ensure_full_index()
> calls are what prevents the code from reaching this point if a
> callback would be used that couldn't handle a sparse directory entry.

To the best of my knowledge, callbacks that are not protected have
ensure_full_index() protecting them. At minimum, the repository setting
command_requires_full_index is enabled by default, causing a sparse
index to be expanded to a full one immediately upon parsing (and also
after writing) to protect cases that might be missing. That is, until
we can create tests for each command before disabling it for that
command.

> I'd be particularly concerned that merge-recursive would call this
> code with unpack_opts.fn = threeway_merge.  threeway_merge is kind of
> interesting in that it might just happen to not die when passed a
> sparse directory entry, but would pass along data that'd just break
> stuff downstream in various subtle ways.  For example, if there were
> conflicts in the sparse directory entries because both had been
> modified, the merge should recurse and resolve individual paths
> underneath, which the merge-recursive code would not be prepared to do
> since unpack_trees() has already returned.  Also, even if there wasn't
> a "conflict" because only one side modified, blindly doing a trivial
> directory resolution will break rename detection.  I mention
> merge-recursive not because it's worth fixing (well, it was and the
> fix is called merge-ort) but because I'm most familiar with it.  The
> other callbacks _might_ have similar problems, though its possible
> that it's safe for one- and two- way merging and just fails once you
> get to three-way.

I also believe that threeway_merge might be difficult to integrate
and that its use should be protected with ensure_full_index() even if
'git merge' in general does not do it by default. We can cross that
bridge when we get to it. Merge, rebase, and cherry-pick are next on
my list of "commands to integrate with sparse-index" but I haven't
done the work yet to make them work.
>> +       if (is_sparse_directory) {
>> +               ce->name[len] = '/';
>> +               ce->name[len + 1] = 0;
> 
> Should this be '\0', for clarity?

Sure.

>> @@ -1064,16 +1073,24 @@ static int unpack_nondirectories(int n, unsigned long mask,
>>                                  unsigned long dirmask,
>>                                  struct cache_entry **src,
>>                                  const struct name_entry *names,
>> -                                const struct traverse_info *info)
>> +                                const struct traverse_info *info,
>> +                                int sparse_directory)
>>  {
>>         int i;
>>         struct unpack_trees_options *o = info->data;
>>         unsigned long conflicts = info->df_conflicts | dirmask;
>>
>> -       /* Do we have *only* directories? Nothing to do */
> 
> You've removed the comment, but not the code.  So it still returns
> immediately if there are only directories...right?  Am I missing
> something?  Is this code still correct?  Or is the comment just
> misleading now that src[0] can be a directory?
Yes, the comment is misleading now that we can call this method
with sparse-directory entries. The method name is also a bit
misleading: this should perhaps be renamed to unpack_single_entry()
or something like that. That will signal that we are not recursing
with traverse_trees_recursive() as we do in the other case.
 
>>         if (mask == dirmask && !src[0])
>>                 return 0;
>>
>> +       /* no-op if our cache entry doesn't match the expectations. */
>> +       if (sparse_directory) {
>> +               if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
>> +                       BUG("expected sparse directory entry");
>> +       } else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
>> +               return 0;
>> +       }
> 
> This code reads like "If sparse_directory is false, but the cache
> entry is a sparse directory, we'll just keep it as-is and ignore
> changed or conflicting directories or files from the names name_entry.
> However, I think this has to be coupled with knowledge about changes
> to unpack_callback() you made, where you introduce an extra call to
> unpack_nondirectories() for the sparse directory case, and in the
> second one you would do useful work.  So "no-op" is kind of
> misleading, it's more deferral until the later unpack_nondirectories()
> call.
> 
> Or, at least so I think after trying to read over this patch.  Am I
> understanding this right?

I think they are both correct: we defer until later by doing a no-op
right now. But using "defer" is more informative of the context of
this call.

>> +
>> +       /*
>> +        * Compare ce->name to info->name + '/' + p->path + '/'
>> +        * if info->name is non-empty. Compare ce->name to
>> +        * p-.path + '/' otherwise.
> 
> p->path, not p-.path

Thanks!
 
> Also, you state in both cases that you are comparing against a
> trailing '/', but...
> 
>> +        */
>> +       if (info->namelen) {
>> +               if (ce->ce_namelen == info->namelen + p->pathlen + 2 &&
>> +                   ce->name[info->namelen] == '/' &&
>> +                   !strncmp(ce->name, info->name, info->namelen) &&
>> +                   !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen))
> 
> You only checked for one of the two '/' characters here.

The first '/' check is to verify that we match "{info->name}/{p->name}/"
and not "{info->name}.{p->name}/" ('.' means "any character").

>  Are you
> omitting the check for the final '/' do to the S_ISSPARSEDIR() check
> above?

Since we know at this point that ce is a sparse directory entry, the
final character _must_ be a trailing slash. There is not a trailing
slash in the input p->path.

>> +                       return ce;
>> +       } else if (ce->ce_namelen == p->pathlen + 1 &&
>> +                  !strncmp(ce->name, p->path, p->pathlen))
> 
> Here you didn't check for the final '/'.  Is that intentional because
> of the S_ISSPARSEDIR() check above?  If so, should the comment above
> this block be corrected?

Yes, will do.

>> +               return ce;
>> +       return NULL;
>>  }
>>
>>  static void debug_path(struct traverse_info *info)
>> @@ -1251,6 +1301,32 @@ static void debug_unpack_callback(int n,
>>                 debug_name_entry(i, names + i);
>>  }
>>
>> +/*
>> + * Returns true if and only if the given cache_entry is a
>> + * sparse-directory entry that matches the given name_entry
>> + * from the tree walk at the given traverse_info.
>> + */
>> +static int is_sparse_directory_entry(struct cache_entry *ce, struct name_entry *name, struct traverse_info *info)
>> +{
>> +       size_t expected_len, name_start;
>> +
>> +       if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
>> +               return 0;
>> +
>> +       if (info->namelen)
>> +               name_start = info->namelen + 1;
>> +       else
>> +               name_start = 0;
>> +       expected_len = name->pathlen + 1 + name_start;
>> +
>> +       if (ce->ce_namelen != expected_len ||
>> +           strncmp(ce->name, info->name, info->namelen) ||
>> +           strncmp(ce->name + name_start, name->path, name->pathlen))
>> +               return 0;
> 
> What about the intervening '/' character?  Could we get a false hit
> between "foo/bar/" and "foo.bar/"?

Here, you are right that I missed this check. I will add it.

> Also, do we have to worry about the trailing '/'?

No, the index would be malformed without it. Since this code
is so similar to the other check (just the negation of it)
I will add a clearly-commented helper method.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09  8:11               ` Elijah Newren
@ 2021-06-09 20:33                 ` Derrick Stolee
  2021-06-10 17:45                   ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-06-09 20:33 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On 6/9/2021 4:11 AM, Elijah Newren wrote:
> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>>>
>>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>>
>>>> While comparing an index to a tree, we may see a sparse directory entry.
>>>> In this case, we should compare that portion of the tree to the tree
>>>> represented by that entry. This could include a new tree which needs to
>>>> be expanded to a full list of added files. It could also include an
>>>> existing tree, in which case all of the changes inside are important to
>>>> describe, including the modifications, additions, and deletions. Note
>>>> that the case where the tree has a path and the index does not remains
>>>> identical to before: the lack of a cache entry is the same with a sparse
>>>> index.
>>>>
>>>> In the case where a tree is modified, we need to expand the tree
>>>> recursively, and start comparing each contained entry as either an
>>>> addition, deletion, or modification. This causes an interesting
>>>> recursion that did not exist before.
>>>
>>> So, I haven't read through this in detail yet...but there's a big
>>> question I'm curious about:
>>>
>>> Git already has code for comparing an index to a tree, a tree to a
>>> tree, or a tree to the working directory, right?  So, when comparing a
>>> sparse-index to a tree...can't we re-use the compare a tree to a tree
>>> code when we hit a sparse directory?
>>
>> Offhand I do not think of a reason why that cannot work.
>>
>> The tree-diff machinery takes two trees, walks them in parallel and
>> repeatedly calls either diff_addremove() or diff_change(), which
>> appends diff_filepair() to the diff_queue[] structure.  If you see
>> an unexpanded tree on the index side, you should be able to pass
>> that tree with the subtree you are comparing against to the tree-diff
>> machinery to come up with a series of filepairs, and then tweak the
>> pathnames of these filepairs (as such a two-tree comparison would be
>> comparing two trees representing a single subdirectory of two different
>> vintages) before adding them to the diff_queue[] you are collecting
>> the index-vs-tree diff, for example.
> 
> Good to know it seems my idea might be reasonable.

I agree that this is reasonable. I just didn't look hard enough
to find existing code for this, since I found traverse_trees and
thought that _was_ the library for this.

>> But if a part of the index is represented as a tree because it is
>> outside the cone of interest, should we even be showing the
>> difference in that part of the tree?  If t/ directory is outside the
>> cone of interest, should "git diff HEAD~100 HEAD t/" show anything
>> to begin with (the same question for "git diff --cached HEAD t/")?
> 
> Excellent question...and not just for diff, but log, grep with
> revisions, and other commands.  We discussed this a while back[1] and
> we seemed to lean towards eventually adding a flag because there are
> usecases both for (1) viewing full history while having sparsity paths
> restrict just the working copy, and (2) also restricting the view of
> history to the sparsity paths.
> 
> [1] It's been discussed a few times, but there's a relatively
> comprehensive discussion at the "Commands that would change for
> behavior A" section from
> https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

Yes, we could investigate this behavior change in the future. The
good thing is that these points that handle sparse directory
entries create clear branching points for that future behavior
change.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 13/14] wt-status: expand added sparse directory entries
  2021-06-09  5:27           ` Elijah Newren
@ 2021-06-09 20:49             ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-06-09 20:49 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 6/9/2021 1:27 AM, Elijah Newren wrote:
> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> +test_expect_success 'reset mixed and checkout orphan' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout rename-out-to-in &&
>> +
>> +       # Sparse checkouts do not agree with full checkouts about
>> +       # how to report a directory/file conflict during a reset.
>> +       # This command would fail with test_all_match because the
>> +       # full checkout reports "T folder1/0/1" while a sparse
>> +       # checkout reports "D folder1/0/1". This matches because
>> +       # the sparse checkouts skip "adding" the other side of
>> +       # the conflict.
>> +       test_sparse_match git reset --mixed HEAD~1 &&
> 
> Ooh!  I think you found a sparse-checkout bug here.  I agree that
> sparse-checkouts and full-checkouts should give different output in
> this case, but I don't think the current difference is the correct
> one.  Digging in a little closer, before running `git reset --mixed
> HEAD~1` I see:
> 
> $ git ls-files -t | grep folder
> S folder1/0/0/0
> S folder1/0/1
> S folder2/0/0/0
> S folder2/0/1/1
> S folder2/a
> S folder2/larger-content
> 
> and after running git reset --mixed HEAD~1, I see:
> S folder1/0/0/0
> S folder1/0/1
> H folder1/a
> H folder1/larger-content
> S folder2/0/0/0
> H folder2/0/1
> S folder2/a
> S folder2/larger-content
> 
> meaning that the reset of the index failed.  It thinks some entries
> are present in the working copy, though it didn't actually check any
> of them out, leaving them to be marked as deleted.  This leaves the
> sparse-checkout in a messed up state.  To correct it, I need to run
> either of the following:
> 
>     git diff --diff-filter=D --name-only | xargs git update-index
> --skip-worktree
> 
> or
> 
>     git sparse-checkout reapply
> 
> (Though one could ask whether sparse-checkout reapply should take a
> missing file that isn't SKIP_WORKTREE and determine it's okay to just
> mark it as SKIP_WORKTREE rather than treating it as dirty.  I'm not
> sure the answer to that...)
> 
> I really think that `git reset --mixed ...` should have been getting
> the sparsity right on its own without the manual fixup afterwards that
> I needed to add.
> 
>> +       test_sparse_match test-tool read-cache --table --expand &&
> 
> If both the full and the sparse checkouts do a reset --mixed, I would
> think that this step should be able to use a test_all_match...at least
> if reset --mixed weren't broken.

I will add this to my list when getting to 'git reset' integration
with sparse-checkout. Thanks.

>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_sparse_match git status --porcelain=v2 &&
> 
> Why is this test run twice?
> 
>> +
>> +       # At this point, sparse-checkouts behave differently
>> +       # from the full-checkout.
>> +       test_sparse_match git checkout --orphan new-branch &&
>> +       test_sparse_match test-tool read-cache --table --expand &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_sparse_match git status --porcelain=v2
> 
> And again, you run the status twice...why?
> 
>> +'
>> +
>> +test_expect_success 'add everything with deep new file' '
>> +       init_repos &&
>> +
>> +       run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
>> +
>> +       run_on_all touch deep/deeper1/x &&
>> +       test_all_match git add . &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git status --porcelain=v2
> 
> same question.

These double 'git status' calls are actually a bit subtle: there
was a bug in an earlier version that only appeared when using
'git status' twice, because the first kept the sparse index
without expanding it, and the bug actually had an incorrect
result when writing that index. Only the second 'git status'
would notice the problem. I started adding two calls to my tests,
but it is not necessary any more.

The reason to leave it out of the Git tests is that I'm testing
all of my submissions against the Scalar functional tests which
run 'git status' multiple times throughout each test situation
and that catches the problem as well. In the future, we will have
'git add' keeping the sparse index in-memory; that will also
expose this behavior sufficiently.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-09 20:33                 ` Derrick Stolee
@ 2021-06-10 17:45                   ` Derrick Stolee
  2021-06-10 21:31                     ` Elijah Newren
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-06-10 17:45 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On 6/9/2021 4:33 PM, Derrick Stolee wrote:
> On 6/9/2021 4:11 AM, Elijah Newren wrote:
>> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> Elijah Newren <newren@gmail.com> writes:
>>>
>>> The tree-diff machinery takes two trees, walks them in parallel and
>>> repeatedly calls either diff_addremove() or diff_change(), which
>>> appends diff_filepair() to the diff_queue[] structure.  If you see
>>> an unexpanded tree on the index side, you should be able to pass
>>> that tree with the subtree you are comparing against to the tree-diff
>>> machinery to come up with a series of filepairs, and then tweak the
>>> pathnames of these filepairs (as such a two-tree comparison would be
>>> comparing two trees representing a single subdirectory of two different
>>> vintages) before adding them to the diff_queue[] you are collecting
>>> the index-vs-tree diff, for example.
>>
>> Good to know it seems my idea might be reasonable.
> 
> I agree that this is reasonable. I just didn't look hard enough
> to find existing code for this, since I found traverse_trees and
> thought that _was_ the library for this.

This was surprisingly simple, since most of the complicated stuff
is built into diff_tree_oid() and its use of revs->diffopt. The
new patch works as shown below the cut-line.

I was incredibly suspicious of how quickly this came together,
but it passes all the tests I have for it (including Scalar
functional tests with the commit, checkout, and add integrations).

I'll send a new version with this patch tomorrow, as well as the
other recommended edits.

Thanks,
-Stolee

--- >8 ---


diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..b631df89343 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -316,6 +316,13 @@ static int get_stat_data(const struct index_state *istate,
 	return 0;
 }
 
+static void show_directory(struct rev_info *revs,
+			   const struct cache_entry *new_dir,
+			   int added)
+{
+	diff_tree_oid(NULL, &new_dir->oid, new_dir->name, &revs->diffopt);
+}
+
 static void show_new_file(struct rev_info *revs,
 			  const struct cache_entry *new_file,
 			  int cached, int match_missing)
@@ -325,6 +332,11 @@ static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		show_directory(revs, new_file, /*added */ 1);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -336,6 +348,15 @@ static void show_new_file(struct rev_info *revs,
 	diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
 }
 
+static void show_modified_sparse_directory(struct rev_info *revs,
+			 const struct cache_entry *old_entry,
+			 const struct cache_entry *new_entry,
+			 int report_missing,
+			 int cached, int match_missing)
+{
+	diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+}
+
 static int show_modified(struct rev_info *revs,
 			 const struct cache_entry *old_entry,
 			 const struct cache_entry *new_entry,
@@ -347,6 +368,17 @@ static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-10 17:45                   ` Derrick Stolee
@ 2021-06-10 21:31                     ` Elijah Newren
  2021-06-11 12:57                       ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Elijah Newren @ 2021-06-10 21:31 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/9/2021 4:33 PM, Derrick Stolee wrote:
> > On 6/9/2021 4:11 AM, Elijah Newren wrote:
> >> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>>
> >>> Elijah Newren <newren@gmail.com> writes:
> >>>
> >>> The tree-diff machinery takes two trees, walks them in parallel and
> >>> repeatedly calls either diff_addremove() or diff_change(), which
> >>> appends diff_filepair() to the diff_queue[] structure.  If you see
> >>> an unexpanded tree on the index side, you should be able to pass
> >>> that tree with the subtree you are comparing against to the tree-diff
> >>> machinery to come up with a series of filepairs, and then tweak the
> >>> pathnames of these filepairs (as such a two-tree comparison would be
> >>> comparing two trees representing a single subdirectory of two different
> >>> vintages) before adding them to the diff_queue[] you are collecting
> >>> the index-vs-tree diff, for example.
> >>
> >> Good to know it seems my idea might be reasonable.
> >
> > I agree that this is reasonable. I just didn't look hard enough
> > to find existing code for this, since I found traverse_trees and
> > thought that _was_ the library for this.
>
> This was surprisingly simple, since most of the complicated stuff
> is built into diff_tree_oid() and its use of revs->diffopt. The
> new patch works as shown below the cut-line.
>
> I was incredibly suspicious of how quickly this came together,
> but it passes all the tests I have for it (including Scalar
> functional tests with the commit, checkout, and add integrations).

Nice!

> I'll send a new version with this patch tomorrow, as well as the
> other recommended edits.
>
> Thanks,
> -Stolee
>
> --- >8 ---
>
>
> diff --git a/diff-lib.c b/diff-lib.c
> index c2ac9250fe9..b631df89343 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -316,6 +316,13 @@ static int get_stat_data(const struct index_state *istate,
>         return 0;
>  }
>
> +static void show_directory(struct rev_info *revs,
> +                          const struct cache_entry *new_dir,
> +                          int added)
> +{
> +       diff_tree_oid(NULL, &new_dir->oid, new_dir->name, &revs->diffopt);
> +}
> +
>  static void show_new_file(struct rev_info *revs,
>                           const struct cache_entry *new_file,
>                           int cached, int match_missing)
> @@ -325,6 +332,11 @@ static void show_new_file(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> +               show_directory(revs, new_file, /*added */ 1);
> +               return;
> +       }
> +
>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -336,6 +348,15 @@ static void show_new_file(struct rev_info *revs,
>         diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule);
>  }
>
> +static void show_modified_sparse_directory(struct rev_info *revs,
> +                        const struct cache_entry *old_entry,
> +                        const struct cache_entry *new_entry,
> +                        int report_missing,
> +                        int cached, int match_missing)
> +{
> +       diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
> +}
> +
>  static int show_modified(struct rev_info *revs,
>                          const struct cache_entry *old_entry,
>                          const struct cache_entry *new_entry,
> @@ -347,6 +368,17 @@ static int show_modified(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       /*
> +        * If both are sparse directory entries, then expand the
> +        * modifications to the file level.
> +        */
> +       if (old_entry && new_entry &&
> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
> +               return 0;
> +       }

What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ?

> +
>         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-10 21:31                     ` Elijah Newren
@ 2021-06-11 12:57                       ` Derrick Stolee
  2021-06-11 17:27                         ` Derrick Stolee
  0 siblings, 1 reply; 127+ messages in thread
From: Derrick Stolee @ 2021-06-11 12:57 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 6/10/2021 5:31 PM, Elijah Newren wrote:
> On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/9/2021 4:33 PM, Derrick Stolee wrote:
>>> On 6/9/2021 4:11 AM, Elijah Newren wrote:
>>>> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>>>
>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>
>>>>> The tree-diff machinery takes two trees, walks them in parallel and
>>>>> repeatedly calls either diff_addremove() or diff_change(), which
>>>>> appends diff_filepair() to the diff_queue[] structure.  If you see
>>>>> an unexpanded tree on the index side, you should be able to pass
>>>>> that tree with the subtree you are comparing against to the tree-diff
>>>>> machinery to come up with a series of filepairs, and then tweak the
>>>>> pathnames of these filepairs (as such a two-tree comparison would be
>>>>> comparing two trees representing a single subdirectory of two different
>>>>> vintages) before adding them to the diff_queue[] you are collecting
>>>>> the index-vs-tree diff, for example.
>>>>
>>>> Good to know it seems my idea might be reasonable.
>>>
>>> I agree that this is reasonable. I just didn't look hard enough
>>> to find existing code for this, since I found traverse_trees and
>>> thought that _was_ the library for this.
>>
>> This was surprisingly simple, since most of the complicated stuff
>> is built into diff_tree_oid() and its use of revs->diffopt. The
>> new patch works as shown below the cut-line.
>>
>> I was incredibly suspicious of how quickly this came together,
>> but it passes all the tests I have for it (including Scalar
>> functional tests with the commit, checkout, and add integrations).
> 
> Nice!
> 
>> I'll send a new version with this patch tomorrow, as well as the
>> other recommended edits.

...still planning on this today, but...

>> +       /*
>> +        * If both are sparse directory entries, then expand the
>> +        * modifications to the file level.
>> +        */
>> +       if (old_entry && new_entry &&
>> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
>> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
>> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
>> +               return 0;
>> +       }
> 
> What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ?

You make a good point that something different would happen
in the case of a directory/file conflict on the sparse checkout
boundary. This can be as simple as the trivial "only files at
root" cone-mode sparse-checkout definition, with "folder/" (tree)
changing to "folder" (blob).

I'll see what I can do to create a test scenario for
this and add the correct cases.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs
  2021-06-11 12:57                       ` Derrick Stolee
@ 2021-06-11 17:27                         ` Derrick Stolee
  0 siblings, 0 replies; 127+ messages in thread
From: Derrick Stolee @ 2021-06-11 17:27 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee

On 6/11/2021 8:57 AM, Derrick Stolee wrote:
> On 6/10/2021 5:31 PM, Elijah Newren wrote:
>> On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote:
>>>
>>> I'll send a new version with this patch tomorrow, as well as the
>>> other recommended edits.
> 
> ...still planning on this today, but...

So optimistic!
 
>>> +       /*
>>> +        * If both are sparse directory entries, then expand the
>>> +        * modifications to the file level.
>>> +        */
>>> +       if (old_entry && new_entry &&
>>> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
>>> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
>>> +               show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing);
>>> +               return 0;
>>> +       }
>>
>> What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ?
> 
> You make a good point that something different would happen
> in the case of a directory/file conflict on the sparse checkout
> boundary. This can be as simple as the trivial "only files at
> root" cone-mode sparse-checkout definition, with "folder/" (tree)
> changing to "folder" (blob).
> 
> I'll see what I can do to create a test scenario for
> this and add the correct cases.

Creating a directory/file conflict in this way exposes a bug in
a different codepath in unpack_trees(), although it isn't visible
until 'git checkout' allows the index to stay sparse. It's due to
the code in unpack_callback() that handles blobs and trees
differently, and hence the blob/tree conflict isn't handled
appropriately there. The changes from Patch 8 are to blame for
these first errors.

At least, those are the first errors I have discovered with these
conflicts. There might be other scenarios that care about this
section of diff-lib.c, but I have not gotten to a point where
such behavior would be exposed.

I don't expect to succeed in squashing this bug today, so I'll
try again next week.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 127+ messages in thread

end of thread, back to index

Thread overview: 127+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-04-20 21:52   ` Elijah Newren
2021-04-21 13:21     ` Derrick Stolee
2021-04-21 15:14   ` Matheus Tavares Bernardino
2021-04-23 20:12     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
2021-04-20 23:00   ` Elijah Newren
2021-04-21 13:41     ` Derrick Stolee
2021-04-21 16:11       ` Elijah Newren
2021-04-22  2:24         ` Matheus Tavares Bernardino
2021-04-21 17:27     ` Derrick Stolee
2021-04-21 18:55       ` Matheus Tavares Bernardino
2021-04-21 19:10         ` Elijah Newren
2021-04-21 19:51           ` Matheus Tavares Bernardino
2021-04-21 18:56       ` Elijah Newren
2021-04-23 20:16         ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-20 23:21   ` Elijah Newren
2021-04-21 13:47     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-20 23:26   ` Elijah Newren
2021-04-21 13:51     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-21  0:44   ` Elijah Newren
2021-04-21 13:55     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
2021-04-21  0:52   ` Elijah Newren
2021-04-21  0:53     ` Elijah Newren
2021-04-21 14:03       ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-04-21  0:57   ` Elijah Newren
2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-04-21  7:00   ` Elijah Newren
2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
2021-04-14 16:31   ` Derrick Stolee
2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-13 12:40     ` Matheus Tavares Bernardino
2021-05-14 12:27       ` Derrick Stolee
2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-13  3:26     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-13  3:31     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
2021-05-14 18:28     ` Derrick Stolee
2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-18  1:33       ` Elijah Newren
2021-05-18 14:57         ` Derrick Stolee
2021-05-18 17:48           ` Elijah Newren
2021-05-18 18:16             ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-18  1:49       ` Elijah Newren
2021-05-18 14:59         ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-18  2:03       ` Elijah Newren
2021-05-18  2:06         ` Elijah Newren
2021-05-18 19:20           ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-18  2:27       ` Elijah Newren
2021-05-18 18:26         ` Derrick Stolee
2021-05-18 19:04           ` Derrick Stolee
2021-05-19  8:38             ` Elijah Newren
2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
2021-05-28 11:36         ` Derrick Stolee
2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-06-08 18:56           ` Elijah Newren
2021-06-09 17:39             ` Derrick Stolee
2021-06-09 18:11               ` Elijah Newren
2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-06-08 19:18           ` Elijah Newren
2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-06-09  3:48           ` Elijah Newren
2021-06-09 20:21             ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-06-07 15:26           ` Derrick Stolee
2021-06-08  1:05             ` Junio C Hamano
2021-06-08 13:00               ` Derrick Stolee
2021-06-09  5:47           ` Elijah Newren
2021-06-09  6:32             ` Junio C Hamano
2021-06-09  8:11               ` Elijah Newren
2021-06-09 20:33                 ` Derrick Stolee
2021-06-10 17:45                   ` Derrick Stolee
2021-06-10 21:31                     ` Elijah Newren
2021-06-11 12:57                       ` Derrick Stolee
2021-06-11 17:27                         ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-06-09  5:27           ` Elijah Newren
2021-06-09 20:49             ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git