* [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns
@ 2021-12-04  2:55 Derrick Stolee via GitGitGadget
  2021-12-04  2:55 ` [PATCH 1/2] t1092: add deeper changes during a checkout Derrick Stolee via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-12-04  2:55 UTC (permalink / raw)
  To: git; +Cc: stolee, vdye, gitster, newren, Derrick Stolee

This week, we rolled out the sparse index to a large internal monorepo. We
got two very similar bug reports that dealt with a strange error that
involved the same set of paths. One was during git pull (pull was a red
herring) and the other was git checkout. The git checkout case gave enough
of a reproduction to debug deep into unpack-trees.c and find the problem.

This bug dates back to 523506d (unpack-trees: unpack sparse directory
entries, 2021-07-14). We didn't hit this before because triggering it
requires the following:

 1. The sparse-checkout definition needs to have recursive inclusion of deep
    folders (depth 3 or more).
 2. Adjacent to those deep folders, we need a deep sparse directory entry
    that receives changes.
 3. In this particular repo, deep directories are added to the
    sparse-checkout only on rare occasions, and those adjacent folders are
    rarely updated. They were updated this week and hit our sparse index
    dogfooders in surprising ways.
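
Conditions 1 and 2 can be illustrated with a toy model, using the paths
from the test added in patch 1/2. This is only a sketch and not git's
implementation: has_prefix() and is_sparse_dir_entry() are hypothetical
helpers, and the model assumes cone mode with a single recursively
included directory.

```c
#include <string.h>

/* Returns 1 when 'prefix' is a leading substring of 's'. */
static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Toy model (an assumption for illustration, not git code): 'dir' and
 * 'cone' are '/'-terminated paths, and 'cone' is the only recursively
 * included directory. Returns 1 when 'dir' is the topmost directory
 * outside the cone, which is where a sparse directory entry would sit.
 */
static int is_sparse_dir_entry(const char *dir, const char *cone)
{
	size_t len = strlen(dir);
	size_t parent_len;

	if (!len)
		return 0;

	/* Inside the cone, or a parent of the cone: stays expanded. */
	if (has_prefix(dir, cone) || has_prefix(cone, dir))
		return 0;

	/* Length of the parent path, including its trailing '/'. */
	parent_len = len - 1;
	while (parent_len > 0 && dir[parent_len - 1] != '/')
		parent_len--;

	/* Collapsed at this level only if the parent is itself expanded. */
	return !strncmp(cone, dir, parent_len);
}
```

With the cone "deep/deeper1/deepest/", this model treats "deep/deeper1/"
as expanded and "deep/deeper1/deepest2/" as a sparse directory entry at
depth three, which is the shape the bug needs.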

The first patch adds a test that fails without the fix. It requires
modifying our test data to make adjacent, deep sparse directory entries
possible. It's a rather simple test after we have that data change.

The second patch includes the actual fix. The bug came from confusing the
name and traverse_path members of struct traverse_info. name stores only a
single tree entry name, while traverse_path holds the full path from the
root. The method we are editing also takes a struct name_entry that
supplies the tree entry on top of the traverse_path, which explains how this
worked to depth
two, but not depth three.
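
The depth-two versus depth-three behavior can be reproduced with a
simplified model of the two comparisons from patch 2/2. This is a sketch
under stated assumptions: toy_info and toy_entry are hypothetical
stand-ins that carry only the fields the comparison reads, and
traverse_path is assumed to end in '/' (which the new check on
ce->name[info->pathlen - 1] relies on).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct traverse_info (fields used here only). */
struct toy_info {
	const char *name;          /* last tree entry name, e.g. "deeper1" */
	size_t namelen;
	const char *traverse_path; /* full path with '/', e.g. "deep/deeper1/" */
	size_t pathlen;
};

/* Hypothetical stand-in for struct name_entry. */
struct toy_entry {
	const char *path;          /* entry name within the current tree */
	size_t pathlen;
};

/* Comparison as written before the fix: uses info->name. */
static int matches_old(const char *ce_name, const struct toy_info *info,
		       const struct toy_entry *p)
{
	size_t ce_namelen = strlen(ce_name);

	if (info->namelen)
		return ce_namelen == info->namelen + p->pathlen + 2 &&
		       ce_name[info->namelen] == '/' &&
		       !strncmp(ce_name, info->name, info->namelen) &&
		       !strncmp(ce_name + info->namelen + 1, p->path, p->pathlen);
	return ce_namelen == p->pathlen + 1 &&
	       !strncmp(ce_name, p->path, p->pathlen);
}

/* Comparison as written after the fix: uses info->traverse_path. */
static int matches_new(const char *ce_name, const struct toy_info *info,
		       const struct toy_entry *p)
{
	size_t ce_namelen = strlen(ce_name);

	if (info->pathlen)
		return ce_namelen == info->pathlen + p->pathlen + 1 &&
		       ce_name[info->pathlen - 1] == '/' &&
		       !strncmp(ce_name, info->traverse_path, info->pathlen) &&
		       !strncmp(ce_name + info->pathlen, p->path, p->pathlen);
	return ce_namelen == p->pathlen + 1 &&
	       !strncmp(ce_name, p->path, p->pathlen);
}
```

For the sparse directory entry "deep/before/" seen while traversing
"deep/", name and traverse_path differ only by the trailing '/', so both
versions match. For "deep/deeper1/deepest2/" seen while traversing
"deep/deeper1/", the old version compares against "deeper1" alone and
fails on the length check, while the new version matches.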

Thanks, -Stolee

Derrick Stolee (2):
  t1092: add deeper changes during a checkout
  unpack-trees: use traverse_path instead of name

 t/t1092-sparse-checkout-compatibility.sh | 16 +++++++++++++++-
 unpack-trees.c                           | 10 +++++-----
 2 files changed, 20 insertions(+), 6 deletions(-)


base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1092%2Fderrickstolee%2Fsparse-index%2Fcheckout-bug-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1092/derrickstolee/sparse-index/checkout-bug-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1092
-- 
gitgitgadget


* [PATCH 1/2] t1092: add deeper changes during a checkout
  2021-12-04  2:55 [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns Derrick Stolee via GitGitGadget
@ 2021-12-04  2:55 ` Derrick Stolee via GitGitGadget
  2021-12-04  2:55 ` [PATCH 2/2] unpack-trees: use traverse_path instead of name Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-12-04  2:55 UTC (permalink / raw)
  To: git; +Cc: stolee, vdye, gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Extend the repository data in the setup of t1092 to include more
directories within two parent directories. This reproduces a bug found
by users of the sparse index feature with suitably-complicated
sparse-checkout definitions.

Add a test that fails in its first 'git checkout deepest' run in
the sparse index case with this error:

  error: Your local changes to the following files would be overwritten by checkout:
          deep/deeper1/deepest2/a
          deep/deeper1/deepest3/a
  Please commit your changes or stash them before you switch branches.
  Aborting

The next change will fix this error, and that fix will make it clear why
the extra depth is necessary for revealing this bug. The assignment of
the sparse-checkout definition to include deep/deeper1/deepest as a
sibling directory is important to ensure that deep/deeper1 is not a
sparse directory entry, but deep/deeper1/deepest2 is.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 16fbd2c6db9..e6aef40e9b3 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -19,6 +19,8 @@ test_expect_success 'setup' '
 		mkdir folder1 folder2 deep x &&
 		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
+		mkdir deep/deeper1/deepest2 &&
+		mkdir deep/deeper1/deepest3 &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
 		cp a folder1 &&
@@ -30,7 +32,9 @@ test_expect_success 'setup' '
 		cp a deep/deeper2 &&
 		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
-		cp -r deep/deeper1/deepest deep/deeper2 &&
+		cp a deep/deeper1/deepest2 &&
+		cp a deep/deeper1/deepest3 &&
+		cp -r deep/deeper1/ deep/deeper2 &&
 		mkdir deep/deeper1/0 &&
 		mkdir deep/deeper1/0/0 &&
 		touch deep/deeper1/0/1 &&
@@ -126,6 +130,8 @@ test_expect_success 'setup' '
 
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
+		echo "updated deepest2" >deep/deeper1/deepest2/a &&
+		echo "updated deepest3" >deep/deeper1/deepest3/a &&
 		git commit -a -m "update deepest" &&
 
 		git checkout -f base &&
@@ -301,6 +307,14 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_failure 'deep changes during checkout' '
+	init_repos &&
+
+	test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
+	test_all_match git checkout deepest &&
+	test_all_match git checkout base
+'
+
 test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH 2/2] unpack-trees: use traverse_path instead of name
  2021-12-04  2:55 [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns Derrick Stolee via GitGitGadget
  2021-12-04  2:55 ` [PATCH 1/2] t1092: add deeper changes during a checkout Derrick Stolee via GitGitGadget
@ 2021-12-04  2:55 ` Derrick Stolee via GitGitGadget
  2021-12-04  5:42   ` Elijah Newren
  2021-12-04  5:45 ` [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns Elijah Newren
  2021-12-06 14:10 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  3 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-12-04  2:55 UTC (permalink / raw)
  To: git; +Cc: stolee, vdye, gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse_dir_matches_path() method compares a cache entry that is a
sparse directory entry against a 'struct traverse_info *info' and a
'struct name_entry *p' to see if the cache entry has exactly the right
name for those other inputs.

This method was introduced in 523506d (unpack-trees: unpack sparse
directory entries, 2021-07-14), but included a significant mistake. The
path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while
'info->traverse_path' stores the full path from root, this method does
not work when 'info' is in a subdirectory of a directory. Using the
right strings and their corresponding lengths makes the method work
properly.

The previous change included a failing test that exposes this issue.
That test now passes. The critical detail is that as we go deep into
unpack_trees(), the logic for merging a sparse directory entry with a
tree entry during 'git checkout' relies on this
sparse_dir_matches_path() in order to avoid calling
traverse_trees_recursive() during unpack_callback() in this hunk:

	if (!is_sparse_directory_entry(src[0], names, info) &&
	    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
					    names, info) < 0) {
		return -1;
	}

For deep paths, the short-circuit never occurred, so
traverse_trees_recursive() was called incorrectly, which caused other
strange issues. Specifically, the output from the now-passing test
previously included this error:

      error: Your local changes to the following files would be overwritten by checkout:
              deep/deeper1/deepest2/a
              deep/deeper1/deepest3/a
      Please commit your changes or stash them before you switch branches.
      Aborting

These messages occurred because the 'current' cache entry in
twoway_merge() was NULL: the index did not contain
entries for the paths contained within the sparse directory entries. We
instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
entry in the target tree. This led to reject_merge() listing these
paths.

Now that sparse_dir_matches_path() works the same for deep paths as it
does for shallow depths, the rest of the logic kicks in to properly
handle modifying the sparse directory entries as designed.

Reported-by: Gustave Granroth <gus.gran@gmail.com>
Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  2 +-
 unpack-trees.c                           | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e6aef40e9b3..f04a02c6b20 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -307,7 +307,7 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
-test_expect_failure 'deep changes during checkout' '
+test_expect_success 'deep changes during checkout' '
 	init_repos &&
 
 	test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 89ca95ce90b..7381c275768 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
 	assert(S_ISSPARSEDIR(ce->ce_mode));
 	assert(ce->name[ce->ce_namelen - 1] == '/');
 
-	if (info->namelen)
-		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
-		       ce->name[info->namelen] == '/' &&
-		       !strncmp(ce->name, info->name, info->namelen) &&
-		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	if (info->pathlen)
+		return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
+		       ce->name[info->pathlen - 1] == '/' &&
+		       !strncmp(ce->name, info->traverse_path, info->pathlen) &&
+		       !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
 	return ce->ce_namelen == p->pathlen + 1 &&
 	       !strncmp(ce->name, p->path, p->pathlen);
 }
-- 
gitgitgadget


* Re: [PATCH 2/2] unpack-trees: use traverse_path instead of name
  2021-12-04  2:55 ` [PATCH 2/2] unpack-trees: use traverse_path instead of name Derrick Stolee via GitGitGadget
@ 2021-12-04  5:42   ` Elijah Newren
  2021-12-06 13:59     ` Derrick Stolee
  0 siblings, 1 reply; 9+ messages in thread
From: Elijah Newren @ 2021-12-04  5:42 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Victoria Dye, Junio C Hamano,
	Derrick Stolee, Derrick Stolee

On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The sparse_dir_matches_path() method compares a cache entry that is a
> sparse directory entry against a 'struct traverse_info *info' and a
> 'struct name_entry *p' to see if the cache entry has exactly the right
> name for those other inputs.
>
> This method was introduced in 523506d (unpack-trees: unpack sparse
> directory entries, 2021-07-14), but included a significant mistake. The
> path comparisons used 'info->name' instead of 'info->traverse_path'.
> Since 'info->name' only stores a single tree entry name while
> 'info->traverse_path' stores the full path from root, this method does
> not work when 'info' is in a subdirectory of a directory. Replacing the
> right strings and their corresponding lengths make the method work
> properly.
>
> The previous change included a failing test that exposes this issue.
> That test now passes. The critical detail is that as we go deep into
> unpack_trees(), the logic for merging a sparse directory entry with a
> tree entry during 'git checkout' relies on this
> sparse_dir_matches_path() in order to avoid calling
> traverse_trees_recursive() during unpack_callback() in this hunk:
>
>         if (!is_sparse_directory_entry(src[0], names, info) &&
>             traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                             names, info) < 0) {
>                 return -1;
>         }
>
> For deep paths, the short-circuit never occurred and
> traverse_trees_recursive() was being called incorrectly and that was
> causing other strange issues. Specifically, the error message from the
> now-passing test previously included this:
>
>       error: Your local changes to the following files would be overwritten by checkout:
>               deep/deeper1/deepest2/a
>               deep/deeper1/deepest3/a
>       Please commit your changes or stash them before you switch branches.
>       Aborting
>
> These messages occurred because the 'current' cache entry in
> twoway_merge() was showing as NULL because the index did not contain
> entries for the paths contained within the sparse directory entries. We
> instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
> entry in the target tree. This led to reject_merge() listing these
> paths.
>
> Now that sparse_dir_matches_path() works the same for deep paths as it
> does for shallow depths, the rest of the logic kicks in to properly
> handle modifying the sparse directory entries as designed.

Eek, sorry for not catching this in my earlier review.  Thanks for the
detailed explanation; well analyzed.

>
> Reported-by: Gustave Granroth <gus.gran@gmail.com>
> Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh |  2 +-
>  unpack-trees.c                           | 10 +++++-----
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e6aef40e9b3..f04a02c6b20 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -307,7 +307,7 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> -test_expect_failure 'deep changes during checkout' '
> +test_expect_success 'deep changes during checkout' '
>         init_repos &&
>
>         test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 89ca95ce90b..7381c275768 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
>         assert(S_ISSPARSEDIR(ce->ce_mode));
>         assert(ce->name[ce->ce_namelen - 1] == '/');
>
> -       if (info->namelen)
> -               return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
> -                      ce->name[info->namelen] == '/' &&
> -                      !strncmp(ce->name, info->name, info->namelen) &&
> -                      !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
> +       if (info->pathlen)
> +               return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
> +                      ce->name[info->pathlen - 1] == '/' &&
> +                      !strncmp(ce->name, info->traverse_path, info->pathlen) &&
> +                      !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
>         return ce->ce_namelen == p->pathlen + 1 &&
>                !strncmp(ce->name, p->path, p->pathlen);
>  }
> --

The comment at the beginning of this function (not shown in this
patch) is now stale and misleading; it should be corrected too.


* Re: [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns
  2021-12-04  2:55 [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns Derrick Stolee via GitGitGadget
  2021-12-04  2:55 ` [PATCH 1/2] t1092: add deeper changes during a checkout Derrick Stolee via GitGitGadget
  2021-12-04  2:55 ` [PATCH 2/2] unpack-trees: use traverse_path instead of name Derrick Stolee via GitGitGadget
@ 2021-12-04  5:45 ` Elijah Newren
  2021-12-06 14:10 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 9+ messages in thread
From: Elijah Newren @ 2021-12-04  5:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Victoria Dye, Junio C Hamano,
	Derrick Stolee

On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This week, we rolled out the sparse index to a large internal monorepo. We
> got two very similar bug reports that dealt with a strange error that
> involved the same set of paths. One was during git pull (pull was a red
> herring) and the other was git checkout. The git checkout case gave enough
> of a reproduction to debug deep into unpack-trees.c and find the problem.
>
> This bug dates back to 523506d (unpack-trees: unpack sparse directory
> entries, 2021-07-14). The reason we didn't hit this before is because it
> requires the following:
>
>  1. The sparse-checkout definition needs to have recursive inclusion of deep
>     folders (depth 3 or more).
>  2. Adjacent to those deep folders, we need a deep sparse directory entry
>     that receives changes.
>  3. In this particular repo, deep directories are only added to the
>     sparse-checkout in rare occasions and those adjacent folders are rarely
>     updated. They happened to update this week and hit our sparse index
>     dogfooders in surprising ways.
>
> The first patch adds a test that fails without the fix. It requires
> modifying our test data to make adjacent, deep sparse directory entries
> possible. It's a rather simple test after we have that data change.
>
> The second patch includes the actual fix. It's really just an error of not
> understanding the difference between the name and traverse_path members of
> the struct traverse_info structure. name only stores a single tree entry
> while traverse_path actually includes the full name from root. The method we
> are editing also has an additional struct name_entry that fills in the tree
> entry on top of the traverse_path, which explains how this worked to depth
> two, but not depth three.

Thanks for the detailed explanation.  I looked around for similar
potential problems elsewhere, but only noted that the comment at the
top of the function is also wrong and should be updated (as I
commented on Patch 2).  After you fix the comment similarly, feel free
to add my

Reviewed-by: Elijah Newren <newren@gmail.com>


* Re: [PATCH 2/2] unpack-trees: use traverse_path instead of name
  2021-12-04  5:42   ` Elijah Newren
@ 2021-12-06 13:59     ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2021-12-06 13:59 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Victoria Dye, Junio C Hamano, Derrick Stolee,
	Derrick Stolee

On 12/4/2021 12:42 AM, Elijah Newren wrote:
> On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> @@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
>>         assert(S_ISSPARSEDIR(ce->ce_mode));
>>         assert(ce->name[ce->ce_namelen - 1] == '/');
>>
>> -       if (info->namelen)
>> -               return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
>> -                      ce->name[info->namelen] == '/' &&
>> -                      !strncmp(ce->name, info->name, info->namelen) &&
>> -                      !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
>> +       if (info->pathlen)
>> +               return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
>> +                      ce->name[info->pathlen - 1] == '/' &&
>> +                      !strncmp(ce->name, info->traverse_path, info->pathlen) &&
>> +                      !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
>>         return ce->ce_namelen == p->pathlen + 1 &&
>>                !strncmp(ce->name, p->path, p->pathlen);
>>  }
>> --
> 
> The comment at the beginning of this function (not shown in this
> patch) is now stale and misleading; it should be corrected too.
 
Will do! Thanks for catching that.

Thanks,
-Stolee


* [PATCH v2 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns
  2021-12-04  2:55 [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-12-04  5:45 ` [PATCH 0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns Elijah Newren
@ 2021-12-06 14:10 ` Derrick Stolee via GitGitGadget
  2021-12-06 14:10   ` [PATCH v2 1/2] t1092: add deeper changes during a checkout Derrick Stolee via GitGitGadget
  2021-12-06 14:10   ` [PATCH v2 2/2] unpack-trees: use traverse_path instead of name Derrick Stolee via GitGitGadget
  3 siblings, 2 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-12-06 14:10 UTC (permalink / raw)
  To: git; +Cc: stolee, vdye, gitster, newren, Derrick Stolee

This week, we rolled out the sparse index to a large internal monorepo. We
got two very similar bug reports that dealt with a strange error that
involved the same set of paths. One was during git pull (pull was a red
herring) and the other was git checkout. The git checkout case gave enough
of a reproduction to debug deep into unpack-trees.c and find the problem.

This bug dates back to 523506d (unpack-trees: unpack sparse directory
entries, 2021-07-14). We didn't hit this before because triggering it
requires the following:

 1. The sparse-checkout definition needs to have recursive inclusion of deep
    folders (depth 3 or more).
 2. Adjacent to those deep folders, we need a deep sparse directory entry
    that receives changes.
 3. In this particular repo, deep directories are added to the
    sparse-checkout only on rare occasions, and those adjacent folders are
    rarely updated. They were updated this week and hit our sparse index
    dogfooders in surprising ways.

The first patch adds a test that fails without the fix. It requires
modifying our test data to make adjacent, deep sparse directory entries
possible. It's a rather simple test after we have that data change.

The second patch includes the actual fix. The bug came from confusing the
name and traverse_path members of struct traverse_info. name stores only a
single tree entry name, while traverse_path holds the full path from the
root. The method we are editing also takes a struct name_entry that
supplies the tree entry on top of the traverse_path, which explains how this
worked to depth
two, but not depth three.


Update in v2
============

 * Fixed the comment describing the sparse_dir_matches_path() method.

Thanks, -Stolee

Derrick Stolee (2):
  t1092: add deeper changes during a checkout
  unpack-trees: use traverse_path instead of name

 t/t1092-sparse-checkout-compatibility.sh | 16 +++++++++++++++-
 unpack-trees.c                           | 14 ++++++++------
 2 files changed, 23 insertions(+), 7 deletions(-)


base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1092%2Fderrickstolee%2Fsparse-index%2Fcheckout-bug-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1092/derrickstolee/sparse-index/checkout-bug-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1092

Range-diff vs v1:

 1:  ba05d7d4149 = 1:  ba05d7d4149 t1092: add deeper changes during a checkout
 2:  c9142199656 ! 2:  aa37168dcb4 unpack-trees: use traverse_path instead of name
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
       	test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
      
       ## unpack-trees.c ##
     +@@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
     + 
     + /*
     +  * Given a sparse directory entry 'ce', compare ce->name to
     +- * info->name + '/' + p->path + '/' if info->name is non-empty.
     ++ * info->traverse_path + p->path + '/' if info->traverse_path
     ++ * is non-empty.
     ++ *
     +  * Compare ce->name to p->path + '/' otherwise. Note that
     +  * ce->name must end in a trailing '/' because it is a sparse
     +  * directory entry.
      @@ unpack-trees.c: static int sparse_dir_matches_path(const struct cache_entry *ce,
       	assert(S_ISSPARSEDIR(ce->ce_mode));
       	assert(ce->name[ce->ce_namelen - 1] == '/');

-- 
gitgitgadget


* [PATCH v2 1/2] t1092: add deeper changes during a checkout
  2021-12-06 14:10 ` [PATCH v2 " Derrick Stolee via GitGitGadget
@ 2021-12-06 14:10   ` Derrick Stolee via GitGitGadget
  2021-12-06 14:10   ` [PATCH v2 2/2] unpack-trees: use traverse_path instead of name Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-12-06 14:10 UTC (permalink / raw)
  To: git; +Cc: stolee, vdye, gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Extend the repository data in the setup of t1092 to include more
directories within two parent directories. This reproduces a bug found
by users of the sparse index feature with suitably-complicated
sparse-checkout definitions.

Add a test that fails in its first 'git checkout deepest' run in
the sparse index case with this error:

  error: Your local changes to the following files would be overwritten by checkout:
          deep/deeper1/deepest2/a
          deep/deeper1/deepest3/a
  Please commit your changes or stash them before you switch branches.
  Aborting

The next change will fix this error, and that fix will make it clear why
the extra depth is necessary for revealing this bug. The assignment of
the sparse-checkout definition to include deep/deeper1/deepest as a
sibling directory is important to ensure that deep/deeper1 is not a
sparse directory entry, but deep/deeper1/deepest2 is.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 16fbd2c6db9..e6aef40e9b3 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -19,6 +19,8 @@ test_expect_success 'setup' '
 		mkdir folder1 folder2 deep x &&
 		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
+		mkdir deep/deeper1/deepest2 &&
+		mkdir deep/deeper1/deepest3 &&
 		echo "after deeper1" >deep/e &&
 		echo "after deepest" >deep/deeper1/e &&
 		cp a folder1 &&
@@ -30,7 +32,9 @@ test_expect_success 'setup' '
 		cp a deep/deeper2 &&
 		cp a deep/later &&
 		cp a deep/deeper1/deepest &&
-		cp -r deep/deeper1/deepest deep/deeper2 &&
+		cp a deep/deeper1/deepest2 &&
+		cp a deep/deeper1/deepest3 &&
+		cp -r deep/deeper1/ deep/deeper2 &&
 		mkdir deep/deeper1/0 &&
 		mkdir deep/deeper1/0/0 &&
 		touch deep/deeper1/0/1 &&
@@ -126,6 +130,8 @@ test_expect_success 'setup' '
 
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
+		echo "updated deepest2" >deep/deeper1/deepest2/a &&
+		echo "updated deepest3" >deep/deeper1/deepest3/a &&
 		git commit -a -m "update deepest" &&
 
 		git checkout -f base &&
@@ -301,6 +307,14 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
+test_expect_failure 'deep changes during checkout' '
+	init_repos &&
+
+	test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
+	test_all_match git checkout deepest &&
+	test_all_match git checkout base
+'
+
 test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v2 2/2] unpack-trees: use traverse_path instead of name
  2021-12-06 14:10 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2021-12-06 14:10   ` [PATCH v2 1/2] t1092: add deeper changes during a checkout Derrick Stolee via GitGitGadget
@ 2021-12-06 14:10   ` Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-12-06 14:10 UTC (permalink / raw)
  To: git; +Cc: stolee, vdye, gitster, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse_dir_matches_path() method compares a cache entry that is a
sparse directory entry against a 'struct traverse_info *info' and a
'struct name_entry *p' to see if the cache entry has exactly the right
name for those other inputs.

This method was introduced in 523506d (unpack-trees: unpack sparse
directory entries, 2021-07-14), but included a significant mistake. The
path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while
'info->traverse_path' stores the full path from root, this method does
not work when 'info' is in a subdirectory of a directory. Using the
right strings and their corresponding lengths makes the method work
properly.

The previous change included a failing test that exposes this issue.
That test now passes. The critical detail is that as we go deep into
unpack_trees(), the logic for merging a sparse directory entry with a
tree entry during 'git checkout' relies on this
sparse_dir_matches_path() in order to avoid calling
traverse_trees_recursive() during unpack_callback() in this hunk:

	if (!is_sparse_directory_entry(src[0], names, info) &&
	    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
					    names, info) < 0) {
		return -1;
	}

For deep paths, the short-circuit never occurred, so
traverse_trees_recursive() was called incorrectly, which caused other
strange issues. Specifically, the output from the now-passing test
previously included this error:

      error: Your local changes to the following files would be overwritten by checkout:
              deep/deeper1/deepest2/a
              deep/deeper1/deepest3/a
      Please commit your changes or stash them before you switch branches.
      Aborting

These messages occurred because the 'current' cache entry in
twoway_merge() was NULL: the index did not contain
entries for the paths contained within the sparse directory entries. We
instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
entry in the target tree. This led to reject_merge() listing these
paths.

Now that sparse_dir_matches_path() works the same for deep paths as it
does for shallow depths, the rest of the logic kicks in to properly
handle modifying the sparse directory entries as designed.

Reported-by: Gustave Granroth <gus.gran@gmail.com>
Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  2 +-
 unpack-trees.c                           | 14 ++++++++------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e6aef40e9b3..f04a02c6b20 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -307,7 +307,7 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
-test_expect_failure 'deep changes during checkout' '
+test_expect_success 'deep changes during checkout' '
 	init_repos &&
 
 	test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 89ca95ce90b..d2363b44ec3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1231,7 +1231,9 @@ static int find_cache_pos(struct traverse_info *info,
 
 /*
  * Given a sparse directory entry 'ce', compare ce->name to
- * info->name + '/' + p->path + '/' if info->name is non-empty.
+ * info->traverse_path + p->path + '/' if info->traverse_path
+ * is non-empty.
+ *
  * Compare ce->name to p->path + '/' otherwise. Note that
  * ce->name must end in a trailing '/' because it is a sparse
  * directory entry.
@@ -1243,11 +1245,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
 	assert(S_ISSPARSEDIR(ce->ce_mode));
 	assert(ce->name[ce->ce_namelen - 1] == '/');
 
-	if (info->namelen)
-		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
-		       ce->name[info->namelen] == '/' &&
-		       !strncmp(ce->name, info->name, info->namelen) &&
-		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	if (info->pathlen)
+		return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
+		       ce->name[info->pathlen - 1] == '/' &&
+		       !strncmp(ce->name, info->traverse_path, info->pathlen) &&
+		       !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
 	return ce->ce_namelen == p->pathlen + 1 &&
 	       !strncmp(ce->name, p->path, p->pathlen);
 }
-- 
gitgitgadget

