All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix use of 'cache_bottom' in sparse index
@ 2022-03-16 20:11 Victoria Dye via GitGitGadget
  2022-03-16 20:12 ` [PATCH 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-16 20:11 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye

An issue was found by @stolee after experimenting with the structure of the
test repo used in 't/t1092-sparse-checkout-compatibility.sh' (patch [1/3])
and finding that certain commands (namely those that internally performed a
"cache diff") failed when a sparse directory appeared lexicographically
before the in-cone directory. The root cause was traced to the
'unpack_trees_options.cache_bottom' variable, which was not being advanced
properly for sparse directories. The result was entries being
double-unpacked or improperly unpacked by 'unpack_trees', causing failures
in tests using 'git reset -- <pathspec>' and 'git diff --staged --
<pathspec>' using pathspecs.

The 'cache_bottom' was handled differently in sparse indexes in the first
place based on the reasoning laid out in 17a1bb570b (unpack-trees: preserve
cache_bottom, 2021-07-14). In short, the 'cache_bottom' advancement in
'mark_ce_used(...)' was disabled for sparse directories because
'unpack_callback(...)' would advance the 'cache_bottom' based on the number
of index entries matching or inside the tree being unpacked. If the tree was
a sparse directory, 'cache_bottom' would appropriately advance by 1, and
therefore didn't need the extra advancement in 'mark_ce_used(...)'. However,
this was insufficient to properly advance the 'cache_bottom' for two
reasons:

 1. 'unpack_callback(...)' would only advance the 'cache_bottom' for sparse
    directories if the operation in progress was a "cache diff" (true for
    'git diff --staged' and 'git reset --mixed', but not - for instance -
    'git reset --hard' or 'git read-tree -m').
 2. If sparse directories were unpacked with 'unpack_index_entry(...)'
    (e.g., in 'git reset -- <pathspec>'), the cache tree-based advancement
    of 'cache_bottom' would not happen.

Luckily, the first did not appear to have any behavioral impact. However,
the latter led to incorrect values being returned by 'next_cache_entry(...)'
depending on the structure of the index, causing the test failures observed
in 't1092'.

To fix this, the 'cache_bottom' advancement is reinstated in
'mark_ce_used(...)', and instead it is disabled in 'unpack_callback(...)' if
the tree in question is a sparse directory. This corrects both the
non-"cache diff" cases and the 'unpack_index_entry(...)' cases while
preventing the double-advancement 17a1bb570b originally intended to avoid
(patch [2/3]).

Finally, now that the cache bottom is advanced properly, we can revert the
"performance improvement" introduced in f2a454e0a5 (unpack-trees: improve
performance of next_cache_entry, 2021-11-29) that mitigated performance
issues arising in 'next_cache_entry(...)' from the non-advancing
'cache_bottom' (patch [3/3]). The performance results in
'p2000-sparse-operations.sh' showed expected variability around 0% change in
execution time (+/= 0.04s, depending on the command), with example results
for potentially-affected commands below.

'git reset'                      master            this_series                  
------------------------------------------------------------------------
full-v3                          0.51(0.21+0.27)   0.50(0.21+0.25) -2.0%
full-v4                          0.51(0.22+0.27)   0.50(0.21+0.24) -2.0%
sparse-v3                        0.30(0.04+0.55)   0.28(0.04+0.50) -6.7%
sparse-v4                        0.31(0.04+0.51)   0.29(0.04+0.51) -6.5%

'git reset -- does-not-exist'    master            this_series                  
------------------------------------------------------------------------
full-v3                          0.54(0.23+0.27)   0.55(0.22+0.28) +1.9%
full-v4                          0.56(0.25+0.26)   0.54(0.24+0.26) -3.6%
sparse-v3                        0.31(0.04+0.54)   0.31(0.04+0.50) +0.0%
sparse-v4                        0.31(0.04+0.52)   0.31(0.04+0.50) +0.0%

'git diff --cached'              master            this_series    
-------------------------------------------------------------------------
full-v3                          0.09(0.04+0.04)   0.09(0.04+0.04) +0.0%
full-v4                          0.09(0.04+0.04)   0.09(0.04+0.04) +0.0%
sparse-v3                        0.05(0.01+0.02)   0.05(0.01+0.03) +0.0%
sparse-v4                        0.04(0.01+0.02)   0.04(0.01+0.02) +0.0%


Thanks! -Victoria

Victoria Dye (3):
  t1092: add sparse directory before cone in test repo
  unpack-trees: increment cache_bottom for sparse directories
  Revert "unpack-trees: improve performance of next_cache_entry"

 t/t1092-sparse-checkout-compatibility.sh |  6 +++-
 unpack-trees.c                           | 39 +++++++++---------------
 2 files changed, 19 insertions(+), 26 deletions(-)


base-commit: 1a4874565fa3b6668042216189551b98b4dc0b1b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1179%2Fvdye%2Fbugfix%2Fsparse-index-cache-bottom-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1179/vdye/bugfix/sparse-index-cache-bottom-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1179
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] t1092: add sparse directory before cone in test repo
  2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
@ 2022-03-16 20:12 ` Victoria Dye via GitGitGadget
  2022-03-16 20:12 ` [PATCH 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-16 20:12 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a sparse directory 'before/' containing files 'a' and 'b' to the test
repo used in 't/t1092-sparse-checkout-compatibility.sh'. This is meant to
ensure that no sparse index integrations rely on the in-cone path(s) being
lexicographically first in the repo.

Unfortunately, some existing tests do not handle this repo architecture
properly:

* 'add outside sparse cone'
* 'status/add: outside sparse cone'
* 'reset with pathspecs inside sparse definition'

All three of these are due to the incorrect handling of the
'unpack_trees_options.cache_bottom' when performing a cache diff via
'unpack_trees'. This will be corrected in a future patch; in the meantime,
mark the tests with 'test_expect_failure'.

Finally, update the 'ls-files' test to include the 'before/' directory in
its expected results.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2a04b532f91..11141221b4d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -16,7 +16,9 @@ test_expect_success 'setup' '
 		echo "after deep" >e &&
 		echo "after folder1" >g &&
 		echo "after x" >z &&
-		mkdir folder1 folder2 deep x &&
+		mkdir folder1 folder2 deep before x &&
+		echo "before deep" >before/a &&
+		echo "before deep again" >before/b &&
 		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		mkdir deep/deeper1/deepest2 &&
@@ -312,7 +314,7 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
-test_expect_success 'add outside sparse cone' '
+test_expect_failure 'add outside sparse cone' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -354,7 +356,7 @@ test_expect_success 'commit including unstaged changes' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_success 'status/add: outside sparse cone' '
+test_expect_failure 'status/add: outside sparse cone' '
 	init_repos &&
 
 	# folder1 is at HEAD, but outside the sparse cone
@@ -565,7 +567,7 @@ test_expect_success 'checkout and reset (keep)' '
 	test_all_match test_must_fail git reset --keep deepest
 '
 
-test_expect_success 'reset with pathspecs inside sparse definition' '
+test_expect_failure 'reset with pathspecs inside sparse definition' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
@@ -1311,6 +1313,7 @@ test_expect_success 'ls-files' '
 
 	cat >expect <<-\EOF &&
 	a
+	before/
 	deep/
 	e
 	folder1-
@@ -1358,6 +1361,7 @@ test_expect_success 'ls-files' '
 
 	cat >expect <<-\EOF &&
 	a
+	before/
 	deep/
 	e
 	folder1-
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] unpack-trees: increment cache_bottom for sparse directories
  2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
  2022-03-16 20:12 ` [PATCH 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
@ 2022-03-16 20:12 ` Victoria Dye via GitGitGadget
  2022-03-16 20:12 ` [PATCH 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-16 20:12 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Correct tracking of the 'cache_bottom' for cases where sparse directories
are present in the index.

BACKGROUND
----------
The 'unpack_trees_options.cache_bottom' is a variable that tracks the
in-progress "bottom" of the cache as 'unpack_trees()' iterates through the
contents of the index. Most importantly, this value informs the sequential
return values of 'next_cache_entry()' which, in the "diff cache" usage of
'unpack_callback()', are either unpacked as-is or are passed into the diff
machinery.

The 'cache_bottom' is intended to track the position of the first entry in
the index that has not yet been diffed or unpacked. It is advanced in two
main ways: either it is incremented when an index entry is marked as "used"
(in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a
directory is unpacked, in which case it is increased by an amount equaling
the number of index entries inside that tree.

In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was
identified that sparse directories posed a problem to the above
'cache_bottom' advancement logic - because a sparse directory was both an
index entry that could be "used" and a directory that can be unpacked, the
'cache_bottom' would be incremented too many times. To solve this problem,
the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse
directories.

INCORRECT CACHE_BOTTOM TRACKING
-------------------------------
Skipping the 'cache_bottom' advancement for sparse directories in
'mark_ce_used()' breaks down in two cases:

1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the
   directory contents-based incrementing of 'cache_bottom' does not happen).
2. When a cache diff is performed with a pathspec (because
   'unpack_index_entry()' will unpack a sparse directory not matched by the
   pathspec without performing the directory contents-based increment).

The former luckily does not appear to affect 'git' behavior, likely because
'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses
'find_index_entry()' - rather than 'next_cache_entry()' - to find the index
entries to unpack).

The latter, however, causes 'cache_bottom' to "lag behind" its intended
position by an amount equal to the number of sparse directories unpacked so
far with 'unpack_index_entry()'. If a repository is structured such that any
sparse directories are ordered lexicographically *after* any
pathspec-matching directories, though, this issue won't present any adverse
behavior.

This was the case with the 't1092-sparse-checkout-compatibility.sh' tests
before the addition of the 'before/' sparse directory (ordered *before* the
in-cone 'deep/' directory), therefore sidestepping the issue. Once the
'before/' directory was added, though, 'cache_bottom' began to lag behind
its intended position, causing 'next_cache_entry()' to return index entries
it had already processed and, ultimately, an incorrect diff.

CORRECTING CACHE_BOTTOM
-----------------------
The problems observed in 't1092' come from 'cache_bottom' lagging behind in
cases where the cache tree-based advancement doesn't occur. To solve this,
then, the fix in 17a1bb570b is "reversed"; rather than skipping
'cache_bottom' advancement in 'mark_ce_used()', we skip the directory
contents-based advancement for sparse directories. Now, every index entry
can be accounted for in 'cache_bottom':

* if you're working with a single index entry, 'cache_bottom' is incremented
  in 'mark_ce_used()'
* if you're working with a directory that contains index entries (but is not
  one itself), 'cache_bottom' is incremented by the number of entries in
  that directory.

Finally, change the 'test_expect_failure' tests in 't1092' failing due to
this bug back to 'test_expect_success'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  6 +++---
 unpack-trees.c                           | 16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 11141221b4d..e9533832aab 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -314,7 +314,7 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
-test_expect_failure 'add outside sparse cone' '
+test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -356,7 +356,7 @@ test_expect_success 'commit including unstaged changes' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_failure 'status/add: outside sparse cone' '
+test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
 	# folder1 is at HEAD, but outside the sparse cone
@@ -567,7 +567,7 @@ test_expect_success 'checkout and reset (keep)' '
 	test_all_match test_must_fail git reset --keep deepest
 '
 
-test_expect_failure 'reset with pathspecs inside sparse definition' '
+test_expect_success 'reset with pathspecs inside sparse definition' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 96525d2ec26..aac927faf08 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -595,13 +595,6 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
-	/*
-	 * If this is a sparse directory, don't advance cache_bottom.
-	 * That will be advanced later using the cache-tree data.
-	 */
-	if (S_ISSPARSEDIR(ce->ce_mode))
-		return;
-
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
@@ -1442,7 +1435,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			 * it does not do any look-ahead, so this is safe.
 			 */
 			if (matches) {
-				o->cache_bottom += matches;
+				/*
+				 * Only increment the cache_bottom if the
+				 * directory isn't a sparse directory index
+				 * entry (if it is, it was already incremented)
+				 * in 'mark_ce_used()'
+				 */
+				if (!src[0] || !S_ISSPARSEDIR(src[0]->ce_mode))
+					o->cache_bottom += matches;
 				return mask;
 			}
 		}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] Revert "unpack-trees: improve performance of next_cache_entry"
  2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
  2022-03-16 20:12 ` [PATCH 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
  2022-03-16 20:12 ` [PATCH 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
@ 2022-03-16 20:12 ` Victoria Dye via GitGitGadget
  2022-03-16 20:21 ` [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Junio C Hamano
  2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
  4 siblings, 0 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-16 20:12 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

This reverts commit f2a454e0a5 (unpack-trees: improve performance of
next_cache_entry, 2021-11-29).

The "hint" value was originally needed to improve performance in 'git reset
-- <pathspec>' caused by 'cache_bottom' lagging behind its correct value
when using a sparse index. The 'cache_bottom' tracking has since been
corrected, removing the need for an additional "pseudo-cache_bottom"
tracking variable.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index aac927faf08..7e5715c42b3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,24 +644,17 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
-	if (*hint > pos)
-		pos = *hint;
-
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED)) {
-			*hint = pos + 1;
+		if (!(ce->ce_flags & CE_UNPACKED))
 			return ce;
-		}
 		pos++;
 	}
-
-	*hint = pos;
 	return NULL;
 }
 
@@ -1373,13 +1366,12 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
-		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o, &hint);
+				ce = next_cache_entry(o);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1706,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, hint, ret;
+	int i, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1795,15 +1787,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
-			hint = -1;
-
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o, &hint);
+				struct cache_entry *ce = next_cache_entry(o);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1824,9 +1814,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
-		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o, &hint);
+			struct cache_entry *ce = next_cache_entry(o);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Fix use of 'cache_bottom' in sparse index
  2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-03-16 20:12 ` [PATCH 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
@ 2022-03-16 20:21 ` Junio C Hamano
  2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
  4 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2022-03-16 20:21 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, derrickstolee, newren, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> ...
> To fix this, the 'cache_bottom' advancement is reinstated in
> 'mark_ce_used(...)', and instead it is disabled in 'unpack_callback(...)' if
> the tree in question is a sparse directory. This corrects both the
> non-"cache diff" cases and the 'unpack_index_entry(...)' cases while
> preventing the double-advancement 17a1bb570b originally intended to avoid
> (patch [2/3]).

;-).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 0/3] Fix use of 'cache_bottom' in sparse index
  2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-03-16 20:21 ` [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Junio C Hamano
@ 2022-03-17 15:55 ` Victoria Dye via GitGitGadget
  2022-03-17 15:55   ` [PATCH v2 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
                     ` (3 more replies)
  4 siblings, 4 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-17 15:55 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye

An issue was found by @stolee after experimenting with the structure of the
test repo used in 't/t1092-sparse-checkout-compatibility.sh' (patch [1/3])
and finding that certain commands (namely those that internally performed a
"cache diff") failed when a sparse directory appeared lexicographically
before the in-cone directory. The root cause was traced to the
'unpack_trees_options.cache_bottom' variable, which was not being advanced
properly for sparse directories. The result was entries being
double-unpacked or improperly unpacked by 'unpack_trees', causing failures
in tests using 'git reset -- <pathspec>' and 'git diff --staged --
<pathspec>' using pathspecs.

The 'cache_bottom' was handled differently in sparse indexes in the first
place based on the reasoning laid out in 17a1bb570b (unpack-trees: preserve
cache_bottom, 2021-07-14). In short, the 'cache_bottom' advancement in
'mark_ce_used(...)' was disabled for sparse directories because
'unpack_callback(...)' would advance the 'cache_bottom' based on the number
of index entries matching or inside the tree being unpacked. If the tree was
a sparse directory, 'cache_bottom' would appropriately advance by 1, and
therefore didn't need the extra advancement in 'mark_ce_used(...)'. However,
this was insufficient to properly advance the 'cache_bottom' for two
reasons:

 1. 'unpack_callback(...)' would only advance the 'cache_bottom' for sparse
    directories if the operation in progress was a "cache diff" (true for
    'git diff --staged' and 'git reset --mixed', but not - for instance -
    'git reset --hard' or 'git read-tree -m').
 2. If sparse directories were unpacked with 'unpack_index_entry(...)'
    (e.g., in 'git reset -- <pathspec>'), the cache tree-based advancement
    of 'cache_bottom' would not happen.

Luckily, the first did not appear to have any behavioral impact. However,
the latter led to incorrect values being returned by 'next_cache_entry(...)'
depending on the structure of the index, causing the test failures observed
in 't1092'.

To fix this, the 'cache_bottom' advancement is reinstated in
'mark_ce_used(...)', and instead it is disabled in 'unpack_callback(...)' if
the tree in question is a sparse directory. This corrects both the
non-"cache diff" cases and the 'unpack_index_entry(...)' cases while
preventing the double-advancement 17a1bb570b originally intended to avoid
(patch [2/3]).

Finally, now that the cache bottom is advanced properly, we can revert the
"performance improvement" introduced in f2a454e0a5 (unpack-trees: improve
performance of next_cache_entry, 2021-11-29) that mitigated performance
issues arising in 'next_cache_entry(...)' from the non-advancing
'cache_bottom' (patch [3/3]). The performance results in
'p2000-sparse-operations.sh' showed expected variability around 0% change in
execution time (+/= 0.04s, depending on the command), with example results
for potentially-affected commands below.

'git reset'                      master            this_series                  
------------------------------------------------------------------------
full-v3                          0.51(0.21+0.27)   0.50(0.21+0.25) -2.0%
full-v4                          0.51(0.22+0.27)   0.50(0.21+0.24) -2.0%
sparse-v3                        0.30(0.04+0.55)   0.28(0.04+0.50) -6.7%
sparse-v4                        0.31(0.04+0.51)   0.29(0.04+0.51) -6.5%

'git reset -- does-not-exist'    master            this_series                  
------------------------------------------------------------------------
full-v3                          0.54(0.23+0.27)   0.55(0.22+0.28) +1.9%
full-v4                          0.56(0.25+0.26)   0.54(0.24+0.26) -3.6%
sparse-v3                        0.31(0.04+0.54)   0.31(0.04+0.50) +0.0%
sparse-v4                        0.31(0.04+0.52)   0.31(0.04+0.50) +0.0%

'git diff --cached'              master            this_series    
-------------------------------------------------------------------------
full-v3                          0.09(0.04+0.04)   0.09(0.04+0.04) +0.0%
full-v4                          0.09(0.04+0.04)   0.09(0.04+0.04) +0.0%
sparse-v3                        0.05(0.01+0.02)   0.05(0.01+0.03) +0.0%
sparse-v4                        0.04(0.01+0.02)   0.04(0.01+0.02) +0.0%



Changes since V1
================

 * Rebase on top of 'master' (to take changes from 'vd/sparse-read-tree')
 * Add 'before/' to expected index results in 't1092' test 'root directory
   cannot be sparse'

Thanks! -Victoria

Victoria Dye (3):
  t1092: add sparse directory before cone in test repo
  unpack-trees: increment cache_bottom for sparse directories
  Revert "unpack-trees: improve performance of next_cache_entry"

 t/t1092-sparse-checkout-compatibility.sh |  7 ++++-
 unpack-trees.c                           | 39 +++++++++---------------
 2 files changed, 20 insertions(+), 26 deletions(-)


base-commit: 74cc1aa55f30ed76424a0e7226ab519aa6265061
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1179%2Fvdye%2Fbugfix%2Fsparse-index-cache-bottom-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1179/vdye/bugfix/sparse-index-cache-bottom-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1179

Range-diff vs v1:

 1:  726b947bcbf ! 1:  d1491d299a9 t1092: add sparse directory before cone in test repo
     @@ Commit message
          'unpack_trees'. This will be corrected in a future patch; in the meantime,
          mark the tests with 'test_expect_failure'.
      
     -    Finally, update the 'ls-files' test to include the 'before/' directory in
     -    its expected results.
     +    Finally, update the 'ls-files' and 'root directory cannot be sparse' tests
     +    to include the 'before/' directory in their expected index contents.
      
          Co-authored-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
       		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
       		mkdir deep/deeper1/deepest &&
       		mkdir deep/deeper1/deepest2 &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'root directory cannot be sparse' '
     + 
     + 	# Verify sparse directories still present, root directory is not sparse
     + 	cat >expect <<-EOF &&
     ++	before/
     + 	folder1/
     + 	folder2/
     + 	x/
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'deep changes during checkout' '
       	test_all_match git checkout base
       '
 2:  8ebfebcc347 = 2:  cf8dc50c300 unpack-trees: increment cache_bottom for sparse directories
 3:  ec6b9686e8f = 3:  d74b304f560 Revert "unpack-trees: improve performance of next_cache_entry"

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/3] t1092: add sparse directory before cone in test repo
  2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
@ 2022-03-17 15:55   ` Victoria Dye via GitGitGadget
  2022-03-17 15:55   ` [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-17 15:55 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a sparse directory 'before/' containing files 'a' and 'b' to the test
repo used in 't/t1092-sparse-checkout-compatibility.sh'. This is meant to
ensure that no sparse index integrations rely on the in-cone path(s) being
lexicographically first in the repo.

Unfortunately, some existing tests do not handle this repo architecture
properly:

* 'add outside sparse cone'
* 'status/add: outside sparse cone'
* 'reset with pathspecs inside sparse definition'

All three of these are due to the incorrect handling of the
'unpack_trees_options.cache_bottom' when performing a cache diff via
'unpack_trees'. This will be corrected in a future patch; in the meantime,
mark the tests with 'test_expect_failure'.

Finally, update the 'ls-files' and 'root directory cannot be sparse' tests
to include the 'before/' directory in their expected index contents.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index dcc0a30d4ad..dcd7061fb3b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -16,7 +16,9 @@ test_expect_success 'setup' '
 		echo "after deep" >e &&
 		echo "after folder1" >g &&
 		echo "after x" >z &&
-		mkdir folder1 folder2 deep x &&
+		mkdir folder1 folder2 deep before x &&
+		echo "before deep" >before/a &&
+		echo "before deep again" >before/b &&
 		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		mkdir deep/deeper1/deepest2 &&
@@ -254,6 +256,7 @@ test_expect_success 'root directory cannot be sparse' '
 
 	# Verify sparse directories still present, root directory is not sparse
 	cat >expect <<-EOF &&
+	before/
 	folder1/
 	folder2/
 	x/
@@ -337,7 +340,7 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
-test_expect_success 'add outside sparse cone' '
+test_expect_failure 'add outside sparse cone' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -379,7 +382,7 @@ test_expect_success 'commit including unstaged changes' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_success 'status/add: outside sparse cone' '
+test_expect_failure 'status/add: outside sparse cone' '
 	init_repos &&
 
 	# folder1 is at HEAD, but outside the sparse cone
@@ -590,7 +593,7 @@ test_expect_success 'checkout and reset (keep)' '
 	test_all_match test_must_fail git reset --keep deepest
 '
 
-test_expect_success 'reset with pathspecs inside sparse definition' '
+test_expect_failure 'reset with pathspecs inside sparse definition' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
@@ -1444,6 +1447,7 @@ test_expect_success 'ls-files' '
 
 	cat >expect <<-\EOF &&
 	a
+	before/
 	deep/
 	e
 	folder1-
@@ -1491,6 +1495,7 @@ test_expect_success 'ls-files' '
 
 	cat >expect <<-\EOF &&
 	a
+	before/
 	deep/
 	e
 	folder1-
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories
  2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
  2022-03-17 15:55   ` [PATCH v2 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
@ 2022-03-17 15:55   ` Victoria Dye via GitGitGadget
  2022-03-21 19:03     ` Derrick Stolee
  2022-03-17 15:55   ` [PATCH v2 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
  2022-03-21 19:12   ` [PATCH v2 0/3] Fix use of 'cache_bottom' in sparse index Derrick Stolee
  3 siblings, 1 reply; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-17 15:55 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Correct tracking of the 'cache_bottom' for cases where sparse directories
are present in the index.

BACKGROUND
----------
The 'unpack_trees_options.cache_bottom' is a variable that tracks the
in-progress "bottom" of the cache as 'unpack_trees()' iterates through the
contents of the index. Most importantly, this value informs the sequential
return values of 'next_cache_entry()' which, in the "diff cache" usage of
'unpack_callback()', are either unpacked as-is or are passed into the diff
machinery.

The 'cache_bottom' is intended to track the position of the first entry in
the index that has not yet been diffed or unpacked. It is advanced in two
main ways: either it is incremented when an index entry is marked as "used"
(in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a
directory is unpacked, in which case it is increased by an amount equaling
the number of index entries inside that tree.

In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was
identified that sparse directories posed a problem to the above
'cache_bottom' advancement logic - because a sparse directory was both an
index entry that could be "used" and a directory that can be unpacked, the
'cache_bottom' would be incremented too many times. To solve this problem,
the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse
directories.

INCORRECT CACHE_BOTTOM TRACKING
-------------------------------
Skipping the 'cache_bottom' advancement for sparse directories in
'mark_ce_used()' breaks down in two cases:

1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the
   directory contents-based incrementing of 'cache_bottom' does not happen).
2. When a cache diff is performed with a pathspec (because
   'unpack_index_entry()' will unpack a sparse directory not matched by the
   pathspec without performing the directory contents-based increment).

The former luckily does not appear to affect 'git' behavior, likely because
'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses
'find_index_entry()' - rather than 'next_cache_entry()' - to find the index
entries to unpack).

The latter, however, causes 'cache_bottom' to "lag behind" its intended
position by an amount equal to the number of sparse directories unpacked so
far with 'unpack_index_entry()'. If a repository is structured such that any
sparse directories are ordered lexicographically *after* any
pathspec-matching directories, though, this issue won't present any adverse
behavior.

This was the case with the 't1092-sparse-checkout-compatibility.sh' tests
before the addition of the 'before/' sparse directory (ordered *before* the
in-cone 'deep/' directory), therefore sidestepping the issue. Once the
'before/' directory was added, though, 'cache_bottom' began to lag behind
its intended position, causing 'next_cache_entry()' to return index entries
it had already processed and, ultimately, an incorrect diff.

CORRECTING CACHE_BOTTOM
-----------------------
The problems observed in 't1092' come from 'cache_bottom' lagging behind in
cases where the cache tree-based advancement doesn't occur. To solve this,
then, the fix in 17a1bb570b is "reversed"; rather than skipping
'cache_bottom' advancement in 'mark_ce_used()', we skip the directory
contents-based advancement for sparse directories. Now, every index entry
can be accounted for in 'cache_bottom':

* if you're working with a single index entry, 'cache_bottom' is incremented
  in 'mark_ce_used()'
* if you're working with a directory that contains index entries (but is not
  one itself), 'cache_bottom' is incremented by the number of entries in
  that directory.

Finally, change the 'test_expect_failure' tests in 't1092' failing due to
this bug back to 'test_expect_success'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  6 +++---
 unpack-trees.c                           | 16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index dcd7061fb3b..236ab530284 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -340,7 +340,7 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
-test_expect_failure 'add outside sparse cone' '
+test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -382,7 +382,7 @@ test_expect_success 'commit including unstaged changes' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_failure 'status/add: outside sparse cone' '
+test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
 	# folder1 is at HEAD, but outside the sparse cone
@@ -593,7 +593,7 @@ test_expect_success 'checkout and reset (keep)' '
 	test_all_match test_must_fail git reset --keep deepest
 '
 
-test_expect_failure 'reset with pathspecs inside sparse definition' '
+test_expect_success 'reset with pathspecs inside sparse definition' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 2763a029a17..b82c1a9705e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -595,13 +595,6 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
-	/*
-	 * If this is a sparse directory, don't advance cache_bottom.
-	 * That will be advanced later using the cache-tree data.
-	 */
-	if (S_ISSPARSEDIR(ce->ce_mode))
-		return;
-
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
@@ -1478,7 +1471,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			 * it does not do any look-ahead, so this is safe.
 			 */
 			if (matches) {
-				o->cache_bottom += matches;
+				/*
+				 * Only increment the cache_bottom if the
+				 * directory isn't a sparse directory index
+				 * entry (if it is, it was already incremented)
+				 * in 'mark_ce_used()'
+				 */
+				if (!src[0] || !S_ISSPARSEDIR(src[0]->ce_mode))
+					o->cache_bottom += matches;
 				return mask;
 			}
 		}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] Revert "unpack-trees: improve performance of next_cache_entry"
  2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
  2022-03-17 15:55   ` [PATCH v2 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
  2022-03-17 15:55   ` [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
@ 2022-03-17 15:55   ` Victoria Dye via GitGitGadget
  2022-03-21 19:12   ` [PATCH v2 0/3] Fix use of 'cache_bottom' in sparse index Derrick Stolee
  3 siblings, 0 replies; 12+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-03-17 15:55 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

This reverts commit f2a454e0a5 (unpack-trees: improve performance of
next_cache_entry, 2021-11-29).

The "hint" value was originally needed to improve performance in 'git reset
-- <pathspec>' caused by 'cache_bottom' lagging behind its correct value
when using a sparse index. The 'cache_bottom' tracking has since been
corrected, removing the need for an additional "pseudo-cache_bottom"
tracking variable.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index b82c1a9705e..7f528d35cc2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,24 +644,17 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
-	if (*hint > pos)
-		pos = *hint;
-
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED)) {
-			*hint = pos + 1;
+		if (!(ce->ce_flags & CE_UNPACKED))
 			return ce;
-		}
 		pos++;
 	}
-
-	*hint = pos;
 	return NULL;
 }
 
@@ -1409,13 +1402,12 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
-		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o, &hint);
+				ce = next_cache_entry(o);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1777,7 +1769,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, hint, ret;
+	int i, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1869,15 +1861,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
-			hint = -1;
-
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o, &hint);
+				struct cache_entry *ce = next_cache_entry(o);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1898,9 +1888,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
-		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o, &hint);
+			struct cache_entry *ce = next_cache_entry(o);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories
  2022-03-17 15:55   ` [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
@ 2022-03-21 19:03     ` Derrick Stolee
  2022-03-21 20:52       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2022-03-21 19:03 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: gitster, newren, Victoria Dye

On 3/17/2022 11:55 AM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Correct tracking of the 'cache_bottom' for cases where sparse directories
> are present in the index.

Thank you for the detailed background (that I cut from my reply).

> CORRECTING CACHE_BOTTOM
> -----------------------
> The problems observed in 't1092' come from 'cache_bottom' lagging behind in
> cases where the cache tree-based advancement doesn't occur. To solve this,
> then, the fix in 17a1bb570b is "reversed"; rather than skipping
> 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory
> contents-based advancement for sparse directories. Now, every index entry
> can be accounted for in 'cache_bottom':

I have sufficient background to be confident that you are doing the
right thing here!

> -test_expect_failure 'add outside sparse cone' '
> +test_expect_success 'add outside sparse cone' '

> -test_expect_failure 'status/add: outside sparse cone' '
> +test_expect_success 'status/add: outside sparse cone' '

> -test_expect_failure 'reset with pathspecs inside sparse definition' '
> +test_expect_success 'reset with pathspecs inside sparse definition' '

Love to see it.

>  			if (matches) {
> -				o->cache_bottom += matches;
> +				/*
> +				 * Only increment the cache_bottom if the
> +				 * directory isn't a sparse directory index
> +				 * entry (if it is, it was already incremented)
> +				 * in 'mark_ce_used()'
> +				 */
> +				if (!src[0] || !S_ISSPARSEDIR(src[0]->ce_mode))
> +					o->cache_bottom += matches;
>  				return mask;

Looks great.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/3] Fix use of 'cache_bottom' in sparse index
  2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-03-17 15:55   ` [PATCH v2 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
@ 2022-03-21 19:12   ` Derrick Stolee
  3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2022-03-21 19:12 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: gitster, newren, Victoria Dye

On 3/17/2022 11:55 AM, Victoria Dye via GitGitGadget wrote:
> To fix this, the 'cache_bottom' advancement is reinstated in
> 'mark_ce_used(...)', and instead it is disabled in 'unpack_callback(...)' if
> the tree in question is a sparse directory. This corrects both the
> non-"cache diff" cases and the 'unpack_index_entry(...)' cases while
> preventing the double-advancement 17a1bb570b originally intended to avoid
> (patch [2/3]).

Thank you for digging deep and finding the root cause here _and_ the
correct fix.
 
> Finally, now that the cache bottom is advanced properly, we can revert the
> "performance improvement" introduced in f2a454e0a5 (unpack-trees: improve
> performance of next_cache_entry, 2021-11-29) that mitigated performance
> issues arising in 'next_cache_entry(...)' from the non-advancing
> 'cache_bottom' (patch [3/3]). The performance results in
> 'p2000-sparse-operations.sh' showed expected variability around 0% change in
> execution time (+/= 0.04s, depending on the command), with example results
> for potentially-affected commands below.
> 
> 'git reset'                      master            this_series                  
> ------------------------------------------------------------------------
> full-v3                          0.51(0.21+0.27)   0.50(0.21+0.25) -2.0%
> full-v4                          0.51(0.22+0.27)   0.50(0.21+0.24) -2.0%
> sparse-v3                        0.30(0.04+0.55)   0.28(0.04+0.50) -6.7%
> sparse-v4                        0.31(0.04+0.51)   0.29(0.04+0.51) -6.5%
> 
> 'git reset -- does-not-exist'    master            this_series                  
> ------------------------------------------------------------------------
> full-v3                          0.54(0.23+0.27)   0.55(0.22+0.28) +1.9%
> full-v4                          0.56(0.25+0.26)   0.54(0.24+0.26) -3.6%
> sparse-v3                        0.31(0.04+0.54)   0.31(0.04+0.50) +0.0%
> sparse-v4                        0.31(0.04+0.52)   0.31(0.04+0.50) +0.0%
> 
> 'git diff --cached'              master            this_series    
> -------------------------------------------------------------------------
> full-v3                          0.09(0.04+0.04)   0.09(0.04+0.04) +0.0%
> full-v4                          0.09(0.04+0.04)   0.09(0.04+0.04) +0.0%
> sparse-v3                        0.05(0.01+0.02)   0.05(0.01+0.03) +0.0%
> sparse-v4                        0.04(0.01+0.02)   0.04(0.01+0.02) +0.0%

I'm glad this also makes things simpler here. It's interesting that
it previously manifested only as a performance issue and not a
correctness issue.
  
I carefully read these patches and think they are ready to go.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories
  2022-03-21 19:03     ` Derrick Stolee
@ 2022-03-21 20:52       ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2022-03-21 20:52 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye via GitGitGadget, git, newren, Victoria Dye

Derrick Stolee <derrickstolee@github.com> writes:

> On 3/17/2022 11:55 AM, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>> 
>> Correct tracking of the 'cache_bottom' for cases where sparse directories
>> are present in the index.
>
> Thank you for the detailed background (that I cut from my reply).
>
>> CORRECTING CACHE_BOTTOM
>> -----------------------
>> The problems observed in 't1092' come from 'cache_bottom' lagging behind in
>> cases where the cache tree-based advancement doesn't occur. To solve this,
>> then, the fix in 17a1bb570b is "reversed"; rather than skipping
>> 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory
>> contents-based advancement for sparse directories. Now, every index entry
>> can be accounted for in 'cache_bottom':
>
> I have sufficient background to be confident that you are doing the
> right thing here!

;-)  

The "partly lagging" walking of the index guided by cache_bottom has
been a tricky thing in the unpack_trees machinery, and I am very
happy that we now have two more experts on the topic ;-)

Thanks, both.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-03-21 20:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
2022-03-16 20:12 ` [PATCH 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
2022-03-16 20:12 ` [PATCH 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
2022-03-16 20:12 ` [PATCH 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
2022-03-16 20:21 ` [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Junio C Hamano
2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
2022-03-17 15:55   ` [PATCH v2 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
2022-03-17 15:55   ` [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
2022-03-21 19:03     ` Derrick Stolee
2022-03-21 20:52       ` Junio C Hamano
2022-03-17 15:55   ` [PATCH v2 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
2022-03-21 19:12   ` [PATCH v2 0/3] Fix use of 'cache_bottom' in sparse index Derrick Stolee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.