git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] check-attr: integrate with sparse-index
@ 2023-07-01  6:48 Shuqi Liang
  2023-07-01  6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
                   ` (3 more replies)
  0 siblings, 4 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-01  6:48 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Turn on sparse-index feature within `git check-attr` command.
Add necessary modifications and test them.

Shuqi Liang (3):
  attr.c: read attributes in a sparse directory
  t1092: add tests for `git check-attr`
  check-attr: integrate with sparse-index

 attr.c                                   | 64 ++++++++++++++++--------
 builtin/check-attr.c                     |  3 ++
 t/t1092-sparse-checkout-compatibility.sh | 40 +++++++++++++++
 3 files changed, 86 insertions(+), 21 deletions(-)


base-commit: 9748a6820043d5815bee770ffa51647e0adc2cf0
-- 
2.39.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v1 1/3] attr.c: read attributes in a sparse directory
  2023-07-01  6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-07-01  6:48 ` Shuqi Liang
  2023-07-03 17:59   ` Victoria Dye
  2023-07-01  6:48 ` [PATCH v1 2/3] t1092: add tests for `git check-attr` Shuqi Liang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-01  6:48 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Before this patch,`git check-attr` can't find the attributes of a file
within a sparse directory. In order to read attributes from
'.gitattributes' files that may be in a sparse directory:

When path is in cone mode of sparse checkout:

1.If path is a sparse directory, read the tree OIDs from the sparse
directory.

2.If path is a regular files, read the attributes directly from the blob
data stored in the cache.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 attr.c | 64 +++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/attr.c b/attr.c
index 7d39ac4a29..b0d26da102 100644
--- a/attr.c
+++ b/attr.c
@@ -808,35 +808,57 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
 static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path, unsigned flags)
 {
+	struct attr_stack *stack = NULL;
+	int i;
+	struct strbuf path1 = STRBUF_INIT;
+	struct strbuf path2 = STRBUF_INIT;
+	char *first_slash = NULL;
 	char *buf;
 	unsigned long size;
 
 	if (!istate)
 		return NULL;
 
-	/*
-	 * The .gitattributes file only applies to files within its
-	 * parent directory. In the case of cone-mode sparse-checkout,
-	 * the .gitattributes file is sparse if and only if all paths
-	 * within that directory are also sparse. Thus, don't load the
-	 * .gitattributes file since it will not matter.
-	 *
-	 * In the case of a sparse index, it is critical that we don't go
-	 * looking for a .gitattributes file, as doing so would cause the
-	 * index to expand.
-	 */
-	if (!path_in_cone_mode_sparse_checkout(path, istate))
-		return NULL;
-
-	buf = read_blob_data_from_index(istate, path, &size);
-	if (!buf)
-		return NULL;
-	if (size >= ATTR_MAX_FILE_SIZE) {
-		warning(_("ignoring overly large gitattributes blob '%s'"), path);
-		return NULL;
+	first_slash = strchr(path, '/');
+	if (first_slash) {
+		strbuf_add(&path1, path, first_slash - path + 1);
+		strbuf_addstr(&path2, first_slash + 1);
 	}
 
-	return read_attr_from_buf(buf, path, flags);
+	if(!path_in_cone_mode_sparse_checkout(path, istate)){
+		for (i = 0; i < istate->cache_nr; i++) {
+			struct cache_entry *ce = istate->cache[i];
+			if ( !strcmp(istate->cache[i]->name, path1.buf)&&S_ISSPARSEDIR(ce->ce_mode)) {
+				stack = read_attr_from_blob(istate, &ce->oid, path2.buf, flags);
+			}else if(S_ISREG(ce->ce_mode) && !strcmp(istate->cache[i]->name, path)){
+				unsigned long sz;
+				enum object_type type;
+				void *data;
+
+				data = repo_read_object_file(the_repository, &istate->cache[i]->oid,
+							&type, &sz);
+				if (!data || type != OBJ_BLOB) {
+					free(data);
+					strbuf_release(&path1);
+					strbuf_release(&path2);
+					return NULL;
+				}
+				stack = read_attr_from_buf(data, path, flags);
+			}
+		}
+	}else{
+		buf = read_blob_data_from_index(istate, path, &size);
+		if (!buf)
+			return NULL;
+		if (size >= ATTR_MAX_FILE_SIZE) {
+			warning(_("ignoring overly large gitattributes blob '%s'"), path);
+			return NULL;
+		}
+		 stack = read_attr_from_buf(buf, path, flags);
+	}
+	strbuf_release(&path1);
+	strbuf_release(&path2);
+	return stack;
 }
 
 static struct attr_stack *read_attr(struct index_state *istate,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v1 2/3] t1092: add tests for `git check-attr`
  2023-07-01  6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
  2023-07-01  6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-07-01  6:48 ` Shuqi Liang
  2023-07-03 18:11   ` Victoria Dye
  2023-07-01  6:48 ` [PATCH v1 3/3] check-attr: integrate with sparse-index Shuqi Liang
  2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
  3 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-01  6:48 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Add tests for `git check-attr`, make sure it behaves as expected when
path is both inside or outside of sparse-checkout definition.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 29 ++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8a95adf4b5..4edfa3c168 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2259,4 +2259,33 @@ test_expect_success 'worktree is not expanded' '
 	ensure_not_expanded worktree remove .worktrees/hotfix
 '
 
+test_expect_success 'check-attr with pathspec inside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_all cp ../.gitattributes ./deep &&
+
+	test_all_match git check-attr -a -- deep/a &&
+
+	test_all_match git add deep/.gitattributes &&
+	test_all_match git check-attr -a --cached -- deep/a
+'
+
+test_expect_success 'check-attr with pathspec outside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	run_on_all cp a folder1/a &&
+
+	test_all_match git check-attr -a -- folder1/a &&
+
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git check-attr  -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v1 3/3] check-attr: integrate with sparse-index
  2023-07-01  6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
  2023-07-01  6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
  2023-07-01  6:48 ` [PATCH v1 2/3] t1092: add tests for `git check-attr` Shuqi Liang
@ 2023-07-01  6:48 ` Shuqi Liang
  2023-07-03 18:21   ` Victoria Dye
  2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
  3 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-01  6:48 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Set the requires-full-index to false for "diff-tree".

Add test to ensure the index is not expanded when the
sparse index is enabled.

The `p2000` tests demonstrate a ~63% execution time reduction for
'git check-attr' using a sparse index.

Test                                            before  after
-----------------------------------------------------------------------
2000.106: git check-attr -a f2/f4/a (full-v3)    0.05   0.05 +0.0%
2000.107: git check-attr -a f2/f4/a (full-v4)    0.05   0.05 +0.0%
2000.108: git check-attr -a f2/f4/a (sparse-v3)  0.04   0.02 -50.0%
2000.109: git check-attr -a f2/f4/a (sparse-v4)  0.04   0.01 -75.0%

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 builtin/check-attr.c                     |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 11 +++++++++++
 2 files changed, 14 insertions(+)

diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index b22ff748c3..02267f9bc1 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -122,6 +122,9 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, check_attr_options,
 			     check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+	
 	if (repo_read_index(the_repository) < 0) {
 		die("invalid cache");
 	}
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4edfa3c168..317ccc8ec5 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2288,4 +2288,15 @@ test_expect_success 'check-attr with pathspec outside sparse definition' '
 	test_all_match git check-attr  -a --cached -- folder1/a
 '
 
+test_expect_success 'sparse-index is not expanded: check-attr' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_all cp ../.gitattributes ./deep &&
+
+	ensure_not_expanded check-attr -a -- deep/a &&
+	run_on_all git add deep/.gitattributes &&
+	ensure_not_expanded check-attr -a --cached -- deep/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v1 1/3] attr.c: read attributes in a sparse directory
  2023-07-01  6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-07-03 17:59   ` Victoria Dye
  0 siblings, 0 replies; 39+ messages in thread
From: Victoria Dye @ 2023-07-03 17:59 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> Before this patch,`git check-attr` can't find the attributes of a file
> within a sparse directory. In order to read attributes from
> '.gitattributes' files that may be in a sparse directory:
> 
> When path is in cone mode of sparse checkout:
> 
> 1.If path is a sparse directory, read the tree OIDs from the sparse

s/path is a sparse directory/path is in a sparse directory(?)

> directory.
> 
> 2.If path is a regular files, read the attributes directly from the blob

s/files/file

> data stored in the cache.
> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  attr.c | 64 +++++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..b0d26da102 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,57 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {

nit: there are a few instances below of spacing inconsistent with project
styling: 'if(' instead of 'if (', 'x&&y' instead of 'x && y', etc. Please
adjust your  to match in your next re-roll (using CodingGuidelines and/or
surrounding code for reference).

> +	struct attr_stack *stack = NULL;
> +	int i;
> +	struct strbuf path1 = STRBUF_INIT;
> +	struct strbuf path2 = STRBUF_INIT;
> +	char *first_slash = NULL;
>  	char *buf;
>  	unsigned long size;
>  
>  	if (!istate)
>  		return NULL;
>  
> -	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> -	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;

Could you add some details to your commit message explaining why the
reasoning in this comment no longer applies? I agree with your approach, but
the extra context will make it easier for reviewers and future readers to
evaluate whether _they_ agree with it, as well as determine whether your
implementation aligns with your stated goal.

As for this review, I'll assume that we now _always_ want to read
.gitattributes, regardless of 'SKIP_WORKTREE' or whether .gitattributes is
contained within a sparse directory. Please correct me if that
interpretation is incorrect!

> -
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> +	first_slash = strchr(path, '/');
> +	if (first_slash) {
> +		strbuf_add(&path1, path, first_slash - path + 1);
> +		strbuf_addstr(&path2, first_slash + 1);
>  	}

At this point, 'path1' is the first component of a given path, and 'path2'
is "everything else". If 'path' is 'path/to/my/.gitattributes', 'path1' is
"path" and 'path2' is "to/my/.gitattributes". Looks good.

>  
> -	return read_attr_from_buf(buf, path, flags);
> +	if(!path_in_cone_mode_sparse_checkout(path, istate)){> +		for (i = 0; i < istate->cache_nr; i++) {
> +			struct cache_entry *ce = istate->cache[i];
> +			if ( !strcmp(istate->cache[i]->name, path1.buf)&&S_ISSPARSEDIR(ce->ce_mode)) {
> +				stack = read_attr_from_blob(istate, &ce->oid, path2.buf, flags);

Here, you use 'read_attr_from_blob()' to read from the sparse directory's
tree directly _without_ needing to expand the index. Nice!

> +			}else if(S_ISREG(ce->ce_mode) && !strcmp(istate->cache[i]->name, path)){
> +				unsigned long sz;
> +				enum object_type type;
> +				void *data;
> +
> +				data = repo_read_object_file(the_repository, &istate->cache[i]->oid,
> +							&type, &sz);
> +				if (!data || type != OBJ_BLOB) {
> +					free(data);
> +					strbuf_release(&path1);
> +					strbuf_release(&path2);
> +					return NULL;
> +				}
> +				stack = read_attr_from_buf(data, path, flags);
> +			}
> +		}


On the whole, this patch updates the the treatment of a 'path' outside the
sparse-checkout patterns to first iterate through the index and at each
entry:

1. If the entry is a sparse directory _and_ the first component of 'path'
   matches the sparse directory name, read the .gitattributes with
   'read_attr_from_blob()'. 'read_attr_from_blob()' reads from the tree
   pointed to by the sparse directory using only the part of 'path' that is
   inside that sparse directory.
2. If the entry is _not_ a sparse directory _and_ its name matches the full
   'path', we read the blob by OID into a buffer, then
   'read_attr_from_buffer()'.

The general idea behind this makes sense (if .gitattributes is in a sparse
directory, read from the sparse directory tree; if not, directly read the
index), but the implementation as it is now has a few gaps/inefficiencies:

- If the sparse directory is not top-level (e.g., a sparse directory at
  'folder1/foo/'), the .gitattributes will be ignored completely.
- The iteration through the index continues even after we've read from the
  correct .gitattributes entry.
- The "else if" case above shouldn't functionally be any different from the
  "else" case below (both read the .gitattributes blob directly from the
  index) but their implementations are different.

To avoid those issues, you could adjust the structure of the code to more
explicitly match what you described in your commit message:

	if (*path is inside sparse directory*)
		stack = read_attr_from_blob(istate, 
					    *sparse directory containing path*, 
					    *path relative to sparse directory*, 
					    flags);
	else
		stack = *read .gitattributes from index blob*

Then fill in the pseudocode bits with concrete details:

- "read .gitattributes from index blob" is the most straightforward; it's
  what you have in the "else" block below.
- "path is inside sparse directory" can be determined using a combination of
  'path_in_cone_mode_sparse_checkout()' & 'index_name_pos_sparse()'. An
  example of similar logic can be found in 'entry_is_new_sparse_dir()' in
  'unpack-trees.c'.
- "sparse directory containing path" and "path relative to sparse directory"
  can be determined from the results of 'index_name_pos_sparse()'.

> +	}else{
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		 stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	strbuf_release(&path1);
> +	strbuf_release(&path2);
> +	return stack;

These changes should affect the behavior sparse index-integrated commands
that read attributes (e.g. 'git merge'). Would it be possible to test that?
E.g. take the 't1092' test 'merge with conflict outside cone', but add
smudge/clean filters in .gitattributes files inside the affected sparse
directories.

>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v1 2/3] t1092: add tests for `git check-attr`
  2023-07-01  6:48 ` [PATCH v1 2/3] t1092: add tests for `git check-attr` Shuqi Liang
@ 2023-07-03 18:11   ` Victoria Dye
  0 siblings, 0 replies; 39+ messages in thread
From: Victoria Dye @ 2023-07-03 18:11 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> Add tests for `git check-attr`, make sure it behaves as expected when
> path is both inside or outside of sparse-checkout definition.
> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 29 ++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 8a95adf4b5..4edfa3c168 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2259,4 +2259,33 @@ test_expect_success 'worktree is not expanded' '
>  	ensure_not_expanded worktree remove .worktrees/hotfix
>  '
>  
> +test_expect_success 'check-attr with pathspec inside sparse definition' '
> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_all cp ../.gitattributes ./deep &&
> +
> +	test_all_match git check-attr -a -- deep/a &&

First, ensure 'check-attr' reads the attributes in the untracked
.gitattributes...

> +
> +	test_all_match git add deep/.gitattributes &&
> +	test_all_match git check-attr -a --cached -- deep/a

Then, once .gitattributes is in the index, 'check-attr --cached' reads the
attributes from the index. Makes sense.

> +'
> +
> +test_expect_success 'check-attr with pathspec outside sparse definition' '
> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_sparse mkdir folder1 &&
> +	run_on_all cp ../.gitattributes ./folder1 &&
> +	run_on_all cp a folder1/a &&
> +
> +	test_all_match git check-attr -a -- folder1/a &&

This test starts the same way as the last, ensuring a .gitattributes file on
disk is read. The difference is, this one is outside the sparse cone;
without the previous patch [1], this would not work correctly. Good!

[1] https://lore.kernel.org/git/20230701064843.147496-2-cheskaqiqi@gmail.com/

> +
> +	git -C full-checkout add folder1/.gitattributes &&
> +	run_on_sparse git add --sparse folder1/.gitattributes &&
> +	run_on_all git commit -m "add .gitattributes" &&
> +	test_sparse_match git sparse-checkout reapply &&
> +	test_all_match git check-attr  -a --cached -- folder1/a

Now, add the file to the index and reapply the sparse-checkout patterns. In
both 'sparse-checkout' and 'sparse-index', the file is removed from disk; in
'sparse-index', the file is now contained in a sparse directory. Despite
this, the attributes are still read correctly by 'check-attr --cached'. 

These tests look great to me! 

> +'
> +
>  test_done


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v1 3/3] check-attr: integrate with sparse-index
  2023-07-01  6:48 ` [PATCH v1 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-07-03 18:21   ` Victoria Dye
  0 siblings, 0 replies; 39+ messages in thread
From: Victoria Dye @ 2023-07-03 18:21 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> Set the requires-full-index to false for "diff-tree".
> 
> Add test to ensure the index is not expanded when the
> sparse index is enabled.
> 
> The `p2000` tests demonstrate a ~63% execution time reduction for
> 'git check-attr' using a sparse index.
> 
> Test                                            before  after
> -----------------------------------------------------------------------
> 2000.106: git check-attr -a f2/f4/a (full-v3)    0.05   0.05 +0.0%
> 2000.107: git check-attr -a f2/f4/a (full-v4)    0.05   0.05 +0.0%
> 2000.108: git check-attr -a f2/f4/a (sparse-v3)  0.04   0.02 -50.0%
> 2000.109: git check-attr -a f2/f4/a (sparse-v4)  0.04   0.01 -75.0%

Great results as usual!

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  builtin/check-attr.c                     |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 11 +++++++++++

Did you forget to add the perf test to this patch? 

>  2 files changed, 14 insertions(+)
> 
> diff --git a/builtin/check-attr.c b/builtin/check-attr.c
> index b22ff748c3..02267f9bc1 100644
> --- a/builtin/check-attr.c
> +++ b/builtin/check-attr.c
> @@ -122,6 +122,9 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
>  	argc = parse_options(argc, argv, prefix, check_attr_options,
>  			     check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;

Given that you updated 'read_attr_from_index()' to handle sparse directories
in [1], it makes sense that disabling 'command_requires_full_index' is all
that's needed to enable the sparse index here.

[1] https://lore.kernel.org/git/20230701064843.147496-2-cheskaqiqi@gmail.com/

> +	
>  	if (repo_read_index(the_repository) < 0) {
>  		die("invalid cache");
>  	}
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4edfa3c168..317ccc8ec5 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2288,4 +2288,15 @@ test_expect_success 'check-attr with pathspec outside sparse definition' '
>  	test_all_match git check-attr  -a --cached -- folder1/a
>  '
>  
> +test_expect_success 'sparse-index is not expanded: check-attr' '
> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_all cp ../.gitattributes ./deep &&

nit: we're only verifying behavior in 'sparse-index', so this should
probably be

	cp .gitattributes ./sparse-index/deep &&

> +
> +	ensure_not_expanded check-attr -a -- deep/a &&
> +	run_on_all git add deep/.gitattributes &&

Similar to above, this should probably be:

	git -C sparse-index add deep/.gitattributes &&

> +	ensure_not_expanded check-attr -a --cached -- deep/a
It'd be nice to show that the index is also not expanded files outside of
the sparse-checkout cone, e.g. 'folder1/.gitattributes' or
'folder1/0/.gitattributes' (similar to what you did for the correctness
tests in [2]).

[2] https://lore.kernel.org/git/20230701064843.147496-3-cheskaqiqi@gmail.com/

> +'
> +
>  test_done


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 0/3] check-attr: integrate with sparse-index
  2023-07-01  6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
                   ` (2 preceding siblings ...)
  2023-07-01  6:48 ` [PATCH v1 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-07-07 15:18 ` Shuqi Liang
  2023-07-07 15:18   ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
                     ` (3 more replies)
  3 siblings, 4 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-07 15:18 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster


change against v1:

* update a new commit message with more details

* update read_attr_from_index function

* add smudge/clean filters in test 'merge with conflict outside cone'

* add the missing perf test

* only verifying behavior in 'sparse-index' in test 'sparse-index is
not expanded: check-attr'

* show that the index is also not expanded files outside of
the sparse-checkout cone

Shuqi Liang (3):
  Enable gitattributes read from sparse directories
  t1092: add tests for `git check-attr`
  check-attr: integrate with sparse-index

 attr.c                                   | 42 +++++++++---------
 builtin/check-attr.c                     |  3 ++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 55 ++++++++++++++++++++++++
 4 files changed, 80 insertions(+), 21 deletions(-)

Range-diff against v1:
1:  afa27ebe2d < -:  ---------- attr.c: read attributes in a sparse directory
-:  ---------- > 1:  0ff2ab9430 Enable gitattributes read from sparse directories
2:  5bb40b0327 ! 2:  835e1176b0 t1092: add tests for `git check-attr`
    @@ Metadata
      ## Commit message ##
         t1092: add tests for `git check-attr`
     
    +    Add smudge/clean filters in .gitattributes files inside the affected
    +    sparse directories in test 'merge with conflict outside cone', make sure
    +    it behaves as expected when path is outside of sparse-checkout.
    +
         Add tests for `git check-attr`, make sure it behaves as expected when
         path is both inside or outside of sparse-checkout definition.
     
    @@ Commit message
         Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
     
      ## t/t1092-sparse-checkout-compatibility.sh ##
    +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge with conflict outside cone' '
    + 
    + 	test_all_match git checkout -b merge-tip merge-left &&
    + 	test_all_match git status --porcelain=v2 &&
    ++
    ++	echo "a filter=rot13" >>.gitattributes &&
    ++	run_on_sparse mkdir folder1 &&
    ++	run_on_all cp ../.gitattributes ./folder1 &&
    ++	git -C full-checkout add folder1/.gitattributes &&
    ++	run_on_sparse git add --sparse folder1/.gitattributes &&
    ++	run_on_all git commit -m "add .gitattributes" &&
    ++	test_sparse_match git sparse-checkout reapply &&
    ++	git config filter.rot13.clean "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
    ++	git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
    ++
    + 	test_all_match test_must_fail git merge -m merge merge-right &&
    + 	test_all_match git status --porcelain=v2 &&
    + 
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'worktree is not expanded' '
      	ensure_not_expanded worktree remove .worktrees/hotfix
      '
3:  abd14ddda7 < -:  ---------- check-attr: integrate with sparse-index
-:  ---------- > 3:  672d692e51 check-attr: integrate with sparse-index
-- 
2.39.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 1/3] Enable gitattributes read from sparse directories
  2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
@ 2023-07-07 15:18   ` Shuqi Liang
  2023-07-07 23:15     ` Junio C Hamano
  2023-07-07 15:18   ` [PATCH v2 2/3] t1092: add tests for `git check-attr` Shuqi Liang
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-07 15:18 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

'git check-attr' cannot currently find attributes of a file within a
sparse directory. This is due to .gitattributes files are irrelevant in
sparse-checkout cone mode, as the file is considered sparse only if all
paths within its parent directory are also sparse. In addition,
searching for a .gitattributes file causes expansion of the sparse
index, which is avoided to prevent potential performance degradation.

However, this behavior can lead to missing attributes for files inside
sparse directories, causing inconsistencies in file handling.

To resolve this, revise 'git check-attr' to allow attribute reading for
files in sparse directories from the corresponding .gitattributes files:

1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
to check if a path falls within a sparse directory.

2.If path is inside a sparse directory, employ the value of
index_name_pos_sparse() to find the sparse directory containing path and
path relative to sparse directory. Proceed to read attributes from the
tree OID of the sparse directory using read_attr_from_blob().

3.If path is not inside a sparse directory,ensure that attributes are
fetched from the index blob with read_blob_data_from_index().

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 attr.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/attr.c b/attr.c
index 7d39ac4a29..03deb18fb3 100644
--- a/attr.c
+++ b/attr.c
@@ -808,35 +808,35 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
 static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path, unsigned flags)
 {
+	struct attr_stack *stack = NULL;
 	char *buf;
 	unsigned long size;
+	int pos;
 
 	if (!istate)
 		return NULL;
 
-	/*
-	 * The .gitattributes file only applies to files within its
-	 * parent directory. In the case of cone-mode sparse-checkout,
-	 * the .gitattributes file is sparse if and only if all paths
-	 * within that directory are also sparse. Thus, don't load the
-	 * .gitattributes file since it will not matter.
-	 *
-	 * In the case of a sparse index, it is critical that we don't go
-	 * looking for a .gitattributes file, as doing so would cause the
-	 * index to expand.
-	 */
-	if (!path_in_cone_mode_sparse_checkout(path, istate))
-		return NULL;
+	pos = index_name_pos_sparse(istate, path, strlen(path));
+	pos = -pos-2;
+	if (!path_in_cone_mode_sparse_checkout(path, istate) && pos>=0) {
+		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
+			return NULL;
 
-	buf = read_blob_data_from_index(istate, path, &size);
-	if (!buf)
-		return NULL;
-	if (size >= ATTR_MAX_FILE_SIZE) {
-		warning(_("ignoring overly large gitattributes blob '%s'"), path);
-		return NULL;
+		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {
+			const char *relative_path = path + ce_namelen(istate->cache[pos]);  
+			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
+		}
+	} else {
+		buf = read_blob_data_from_index(istate, path, &size);
+		if (!buf)
+			return NULL;
+		if (size >= ATTR_MAX_FILE_SIZE) {
+			warning(_("ignoring overly large gitattributes blob '%s'"), path);
+			return NULL;
+		}
+		stack = read_attr_from_buf(buf, path, flags);
 	}
-
-	return read_attr_from_buf(buf, path, flags);
+	return stack;
 }
 
 static struct attr_stack *read_attr(struct index_state *istate,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 2/3] t1092: add tests for `git check-attr`
  2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
  2023-07-07 15:18   ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
@ 2023-07-07 15:18   ` Shuqi Liang
  2023-07-07 15:18   ` [PATCH v2 3/3] check-attr: integrate with sparse-index Shuqi Liang
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
  3 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-07 15:18 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Add smudge/clean filters in .gitattributes files inside the affected
sparse directories in test 'merge with conflict outside cone', make sure
it behaves as expected when path is outside of sparse-checkout.

Add tests for `git check-attr`, make sure it behaves as expected when
path is both inside or outside of sparse-checkout definition.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8a95adf4b5..839e08d8dd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1006,6 +1006,17 @@ test_expect_success 'merge with conflict outside cone' '
 
 	test_all_match git checkout -b merge-tip merge-left &&
 	test_all_match git status --porcelain=v2 &&
+
+	echo "a filter=rot13" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	git config filter.rot13.clean "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
+	git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
+
 	test_all_match test_must_fail git merge -m merge merge-right &&
 	test_all_match git status --porcelain=v2 &&
 
@@ -2259,4 +2270,33 @@ test_expect_success 'worktree is not expanded' '
 	ensure_not_expanded worktree remove .worktrees/hotfix
 '
 
+test_expect_success 'check-attr with pathspec inside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_all cp ../.gitattributes ./deep &&
+
+	test_all_match git check-attr -a -- deep/a &&
+
+	test_all_match git add deep/.gitattributes &&
+	test_all_match git check-attr -a --cached -- deep/a
+'
+
+test_expect_success 'check-attr with pathspec outside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	run_on_all cp a folder1/a &&
+
+	test_all_match git check-attr -a -- folder1/a &&
+
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git check-attr  -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 3/3] check-attr: integrate with sparse-index
  2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
  2023-07-07 15:18   ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
  2023-07-07 15:18   ` [PATCH v2 2/3] t1092: add tests for `git check-attr` Shuqi Liang
@ 2023-07-07 15:18   ` Shuqi Liang
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
  3 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-07 15:18 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Set the requires-full-index to false for "diff-tree".

Add a test to ensure that the index is not expanded whether the files
are outside or inside the sparse-checkout cone when the sparse index is
enabled.

The `p2000` tests demonstrate a ~63% execution time reduction for
'git check-attr' using a sparse index.

Test                                            before  after
-----------------------------------------------------------------------
2000.106: git check-attr -a f2/f4/a (full-v3)    0.05   0.05 +0.0%
2000.107: git check-attr -a f2/f4/a (full-v4)    0.05   0.05 +0.0%
2000.108: git check-attr -a f2/f4/a (sparse-v3)  0.04   0.02 -50.0%
2000.109: git check-attr -a f2/f4/a (sparse-v4)  0.04   0.01 -75.0%

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 builtin/check-attr.c                     |  3 +++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index b22ff748c3..c1da1d184e 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -122,6 +122,9 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, check_attr_options,
 			     check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (repo_read_index(the_repository) < 0) {
 		die("invalid cache");
 	}
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 96ed3e1d69..39e92b0841 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -134,5 +134,6 @@ test_perf_on_all git diff-files -- $SPARSE_CONE/a
 test_perf_on_all git diff-tree HEAD
 test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
 test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
+test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 839e08d8dd..db2c38ab70 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2299,4 +2299,19 @@ test_expect_success 'check-attr with pathspec outside sparse definition' '
 	test_all_match git check-attr  -a --cached -- folder1/a
 '
 
+test_expect_success 'sparse-index is not expanded: check-attr' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	mkdir ./sparse-index/folder1 &&
+	cp ./sparse-index/a ./sparse-index/folder1/a &&
+	cp .gitattributes ./sparse-index/deep &&
+	cp .gitattributes ./sparse-index/folder1 &&
+
+	git -C sparse-index add deep/.gitattributes &&
+	git -C sparse-index add --sparse  folder1/.gitattributes &&
+	ensure_not_expanded check-attr -a --cached -- deep/a &&
+	ensure_not_expanded check-attr -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/3] Enable gitattributes read from sparse directories
  2023-07-07 15:18   ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
@ 2023-07-07 23:15     ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-07-07 23:15 UTC (permalink / raw)
  To: Shuqi Liang; +Cc: git, vdye

Shuqi Liang <cheskaqiqi@gmail.com> writes:

> +	pos = index_name_pos_sparse(istate, path, strlen(path));
> +	pos = -pos-2;

With SP at appropriate places, i.e. "pos = -pos - 2".

But more importantly, where does the -2 come from?  For a missing
entry, we get a negative number, and the location that the cache
entry with the given path would be inserted can be recovered by
computing -pos - 1, and that is why 

	if (0 <= pos) {
		... handle existing ce at pos ...
	} else if (pos < 0) {
		pos = -pos - 1;
		... if such a path were in the index, it would have
		... been at pos
	}

looks fairly familiar to those who have read our code.  Even in such
a case, we do not blindly compute "-pos - 1", though.

In any case, this magic "adjustment" of the returned value needs to
be explained, perhaps in in-code comment around there.

> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && pos>=0) {

With SP at appropriate places, i.e. " && 0 <= pos".

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 0/3] check-attr: integrate with sparse-index
  2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
                     ` (2 preceding siblings ...)
  2023-07-07 15:18   ` [PATCH v2 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-07-11 13:30   ` Shuqi Liang
  2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
                       ` (4 more replies)
  3 siblings, 5 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-11 13:30 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

change against v2:

* add SP at appropriate places

* add in-code comment around magic "adjustment"

Shuqi Liang (3):
  attr.c: read attributes in a sparse directory
  t1092: add tests for `git check-attr`
  check-attr: integrate with sparse-index

 attr.c                                   | 47 ++++++++++++--------
 builtin/check-attr.c                     |  3 ++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 55 ++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 19 deletions(-)

Range-diff against v2:
1:  0ff2ab9430 ! 1:  199cc90a5b Enable gitattributes read from sparse directories
    @@ Metadata
     Author: Shuqi Liang <cheskaqiqi@gmail.com>
     
      ## Commit message ##
    -    Enable gitattributes read from sparse directories
    +    attr.c: read attributes in a sparse directory
     
         'git check-attr' cannot currently find attributes of a file within a
         sparse directory. This is due to .gitattributes files are irrelevant in
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
      	if (!istate)
      		return NULL;
      
    --	/*
    + 	/*
     -	 * The .gitattributes file only applies to files within its
     -	 * parent directory. In the case of cone-mode sparse-checkout,
     -	 * the .gitattributes file is sparse if and only if all paths
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     -	 * In the case of a sparse index, it is critical that we don't go
     -	 * looking for a .gitattributes file, as doing so would cause the
     -	 * index to expand.
    --	 */
    ++	 * If the pos value is negative, it means the path is not in the index. 
    ++	 * However, the absolute value of pos minus 1 gives us the position where the path 
    ++	 * would be inserted in lexicographic order. By subtracting another 1 from this 
    ++	 * value (pos = -pos - 2), we find the position of the last index entry 
    ++	 * which is lexicographically smaller than the provided path. This would be 
    ++	 * the sparse directory containing the path.
    + 	 */
     -	if (!path_in_cone_mode_sparse_checkout(path, istate))
     -		return NULL;
     +	pos = index_name_pos_sparse(istate, path, strlen(path));
    -+	pos = -pos-2;
    -+	if (!path_in_cone_mode_sparse_checkout(path, istate) && pos>=0) {
    -+		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
    -+			return NULL;
    ++	pos = - pos - 2;
      
     -	buf = read_blob_data_from_index(istate, path, &size);
     -	if (!buf)
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     -	if (size >= ATTR_MAX_FILE_SIZE) {
     -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
     -		return NULL;
    +-	}
    ++	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
    ++		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
    ++			return NULL;
    + 
    +-	return read_attr_from_buf(buf, path, flags);
     +		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {
     +			const char *relative_path = path + ce_namelen(istate->cache[pos]);  
     +			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     +			return NULL;
     +		}
     +		stack = read_attr_from_buf(buf, path, flags);
    - 	}
    --
    --	return read_attr_from_buf(buf, path, flags);
    ++	}
     +	return stack;
      }
      
2:  835e1176b0 = 2:  eefce85083 t1092: add tests for `git check-attr`
3:  672d692e51 = 3:  65c2624504 check-attr: integrate with sparse-index
-- 
2.39.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 1/3] attr.c: read attributes in a sparse directory
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
@ 2023-07-11 13:30     ` Shuqi Liang
  2023-07-11 21:15       ` Junio C Hamano
  2023-07-11 21:24       ` Victoria Dye
  2023-07-11 13:30     ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
                       ` (3 subsequent siblings)
  4 siblings, 2 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-11 13:30 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

'git check-attr' cannot currently find attributes of a file within a
sparse directory. This is due to .gitattributes files are irrelevant in
sparse-checkout cone mode, as the file is considered sparse only if all
paths within its parent directory are also sparse. In addition,
searching for a .gitattributes file causes expansion of the sparse
index, which is avoided to prevent potential performance degradation.

However, this behavior can lead to missing attributes for files inside
sparse directories, causing inconsistencies in file handling.

To resolve this, revise 'git check-attr' to allow attribute reading for
files in sparse directories from the corresponding .gitattributes files:

1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
to check if a path falls within a sparse directory.

2.If path is inside a sparse directory, employ the value of
index_name_pos_sparse() to find the sparse directory containing path and
path relative to sparse directory. Proceed to read attributes from the
tree OID of the sparse directory using read_attr_from_blob().

3.If path is not inside a sparse directory,ensure that attributes are
fetched from the index blob with read_blob_data_from_index().

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 attr.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/attr.c b/attr.c
index 7d39ac4a29..be06747b0d 100644
--- a/attr.c
+++ b/attr.c
@@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
 static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path, unsigned flags)
 {
+	struct attr_stack *stack = NULL;
 	char *buf;
 	unsigned long size;
+	int pos;
 
 	if (!istate)
 		return NULL;
 
 	/*
-	 * The .gitattributes file only applies to files within its
-	 * parent directory. In the case of cone-mode sparse-checkout,
-	 * the .gitattributes file is sparse if and only if all paths
-	 * within that directory are also sparse. Thus, don't load the
-	 * .gitattributes file since it will not matter.
-	 *
-	 * In the case of a sparse index, it is critical that we don't go
-	 * looking for a .gitattributes file, as doing so would cause the
-	 * index to expand.
+	 * If the pos value is negative, it means the path is not in the index. 
+	 * However, the absolute value of pos minus 1 gives us the position where the path 
+	 * would be inserted in lexicographic order. By subtracting another 1 from this 
+	 * value (pos = -pos - 2), we find the position of the last index entry 
+	 * which is lexicographically smaller than the provided path. This would be 
+	 * the sparse directory containing the path.
 	 */
-	if (!path_in_cone_mode_sparse_checkout(path, istate))
-		return NULL;
+	pos = index_name_pos_sparse(istate, path, strlen(path));
+	pos = - pos - 2;
 
-	buf = read_blob_data_from_index(istate, path, &size);
-	if (!buf)
-		return NULL;
-	if (size >= ATTR_MAX_FILE_SIZE) {
-		warning(_("ignoring overly large gitattributes blob '%s'"), path);
-		return NULL;
-	}
+	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
+		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
+			return NULL;
 
-	return read_attr_from_buf(buf, path, flags);
+		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {
+			const char *relative_path = path + ce_namelen(istate->cache[pos]);  
+			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
+		}
+	} else {
+		buf = read_blob_data_from_index(istate, path, &size);
+		if (!buf)
+			return NULL;
+		if (size >= ATTR_MAX_FILE_SIZE) {
+			warning(_("ignoring overly large gitattributes blob '%s'"), path);
+			return NULL;
+		}
+		stack = read_attr_from_buf(buf, path, flags);
+	}
+	return stack;
 }
 
 static struct attr_stack *read_attr(struct index_state *istate,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 2/3] t1092: add tests for `git check-attr`
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
  2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-07-11 13:30     ` Shuqi Liang
  2023-07-11 18:52       ` Junio C Hamano
  2023-07-11 20:47       ` Victoria Dye
  2023-07-11 13:30     ` [PATCH v3 3/3] check-attr: integrate with sparse-index Shuqi Liang
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-11 13:30 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Add smudge/clean filters in .gitattributes files inside the affected
sparse directories in test 'merge with conflict outside cone', make sure
it behaves as expected when path is outside of sparse-checkout.

Add tests for `git check-attr`, make sure it behaves as expected when
path is both inside or outside of sparse-checkout definition.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8a95adf4b5..839e08d8dd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1006,6 +1006,17 @@ test_expect_success 'merge with conflict outside cone' '
 
 	test_all_match git checkout -b merge-tip merge-left &&
 	test_all_match git status --porcelain=v2 &&
+
+	echo "a filter=rot13" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	git config filter.rot13.clean "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
+	git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
+
 	test_all_match test_must_fail git merge -m merge merge-right &&
 	test_all_match git status --porcelain=v2 &&
 
@@ -2259,4 +2270,33 @@ test_expect_success 'worktree is not expanded' '
 	ensure_not_expanded worktree remove .worktrees/hotfix
 '
 
+test_expect_success 'check-attr with pathspec inside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_all cp ../.gitattributes ./deep &&
+
+	test_all_match git check-attr -a -- deep/a &&
+
+	test_all_match git add deep/.gitattributes &&
+	test_all_match git check-attr -a --cached -- deep/a
+'
+
+test_expect_success 'check-attr with pathspec outside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	run_on_all cp a folder1/a &&
+
+	test_all_match git check-attr -a -- folder1/a &&
+
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git check-attr  -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 3/3] check-attr: integrate with sparse-index
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
  2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
  2023-07-11 13:30     ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
@ 2023-07-11 13:30     ` Shuqi Liang
  2023-07-11 20:07       ` Junio C Hamano
  2023-07-11 16:56     ` [PATCH v3 0/3] " Junio C Hamano
  2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
  4 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-11 13:30 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Set the requires-full-index to false for "diff-tree".

Add a test to ensure that the index is not expanded whether the files
are outside or inside the sparse-checkout cone when the sparse index is
enabled.

The `p2000` tests demonstrate a ~63% execution time reduction for
'git check-attr' using a sparse index.

Test                                            before  after
-----------------------------------------------------------------------
2000.106: git check-attr -a f2/f4/a (full-v3)    0.05   0.05 +0.0%
2000.107: git check-attr -a f2/f4/a (full-v4)    0.05   0.05 +0.0%
2000.108: git check-attr -a f2/f4/a (sparse-v3)  0.04   0.02 -50.0%
2000.109: git check-attr -a f2/f4/a (sparse-v4)  0.04   0.01 -75.0%

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 builtin/check-attr.c                     |  3 +++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index b22ff748c3..c1da1d184e 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -122,6 +122,9 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, check_attr_options,
 			     check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (repo_read_index(the_repository) < 0) {
 		die("invalid cache");
 	}
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 96ed3e1d69..39e92b0841 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -134,5 +134,6 @@ test_perf_on_all git diff-files -- $SPARSE_CONE/a
 test_perf_on_all git diff-tree HEAD
 test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
 test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
+test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 839e08d8dd..db2c38ab70 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2299,4 +2299,19 @@ test_expect_success 'check-attr with pathspec outside sparse definition' '
 	test_all_match git check-attr  -a --cached -- folder1/a
 '
 
+test_expect_success 'sparse-index is not expanded: check-attr' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	mkdir ./sparse-index/folder1 &&
+	cp ./sparse-index/a ./sparse-index/folder1/a &&
+	cp .gitattributes ./sparse-index/deep &&
+	cp .gitattributes ./sparse-index/folder1 &&
+
+	git -C sparse-index add deep/.gitattributes &&
+	git -C sparse-index add --sparse  folder1/.gitattributes &&
+	ensure_not_expanded check-attr -a --cached -- deep/a &&
+	ensure_not_expanded check-attr -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 0/3] check-attr: integrate with sparse-index
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
                       ` (2 preceding siblings ...)
  2023-07-11 13:30     ` [PATCH v3 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-07-11 16:56     ` Junio C Hamano
  2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
  4 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-07-11 16:56 UTC (permalink / raw)
  To: Shuqi Liang; +Cc: git, vdye

Seems to have leftover whitespace glitches.  They are not a reason
to reroll alone, but if you need to create v4 then please make sure
your patches are free of these.

Thanks.

.git/rebase-apply/patch:31: trailing whitespace.
	 * If the pos value is negative, it means the path is not in the index. 
.git/rebase-apply/patch:32: trailing whitespace.
	 * However, the absolute value of pos minus 1 gives us the position where the path 
.git/rebase-apply/patch:33: trailing whitespace.
	 * would be inserted in lexicographic order. By subtracting another 1 from this 
.git/rebase-apply/patch:34: trailing whitespace.
	 * value (pos = -pos - 2), we find the position of the last index entry 
.git/rebase-apply/patch:35: trailing whitespace.
	 * which is lexicographically smaller than the provided path. This would be 
warning: squelched 1 whitespace error
warning: 6 lines applied after fixing whitespace errors.
Applying: attr.c: read attributes in a sparse directory
Applying: t1092: add tests for `git check-attr`
Applying: check-attr: integrate with sparse-index

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 2/3] t1092: add tests for `git check-attr`
  2023-07-11 13:30     ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
@ 2023-07-11 18:52       ` Junio C Hamano
  2023-07-11 20:47       ` Victoria Dye
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-07-11 18:52 UTC (permalink / raw)
  To: Shuqi Liang; +Cc: git, vdye

Shuqi Liang <cheskaqiqi@gmail.com> writes:

> Add smudge/clean filters in .gitattributes files inside the affected
> sparse directories in test 'merge with conflict outside cone', make sure
> it behaves as expected when path is outside of sparse-checkout.

Please rewrite "as expected" into a more concrete form.  What is the
expectation when path is outside of sparse-checkout?  The attributes
file does not need to be read and that can be observed by the filter
program not triggering?  The attribute file does get read and that
can be observed by the filter program running?

> Add tests for `git check-attr`, make sure it behaves as expected when
> path is both inside or outside of sparse-checkout definition.

Ditto.

> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)

Thanks.

>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 8a95adf4b5..839e08d8dd 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1006,6 +1006,17 @@ test_expect_success 'merge with conflict outside cone' '
>  
>  	test_all_match git checkout -b merge-tip merge-left &&
>  	test_all_match git status --porcelain=v2 &&
> +
> +	echo "a filter=rot13" >>.gitattributes &&
> +	run_on_sparse mkdir folder1 &&
> +	run_on_all cp ../.gitattributes ./folder1 &&
> +	git -C full-checkout add folder1/.gitattributes &&
> +	run_on_sparse git add --sparse folder1/.gitattributes &&
> +	run_on_all git commit -m "add .gitattributes" &&
> +	test_sparse_match git sparse-checkout reapply &&
> +	git config filter.rot13.clean "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
> +	git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
> +
>  	test_all_match test_must_fail git merge -m merge merge-right &&
>  	test_all_match git status --porcelain=v2 &&
>  
> @@ -2259,4 +2270,33 @@ test_expect_success 'worktree is not expanded' '
>  	ensure_not_expanded worktree remove .worktrees/hotfix
>  '
>  
> +test_expect_success 'check-attr with pathspec inside sparse definition' '
> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_all cp ../.gitattributes ./deep &&
> +
> +	test_all_match git check-attr -a -- deep/a &&
> +
> +	test_all_match git add deep/.gitattributes &&
> +	test_all_match git check-attr -a --cached -- deep/a
> +'
> +
> +test_expect_success 'check-attr with pathspec outside sparse definition' '
> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_sparse mkdir folder1 &&
> +	run_on_all cp ../.gitattributes ./folder1 &&
> +	run_on_all cp a folder1/a &&
> +
> +	test_all_match git check-attr -a -- folder1/a &&
> +
> +	git -C full-checkout add folder1/.gitattributes &&
> +	run_on_sparse git add --sparse folder1/.gitattributes &&
> +	run_on_all git commit -m "add .gitattributes" &&
> +	test_sparse_match git sparse-checkout reapply &&
> +	test_all_match git check-attr  -a --cached -- folder1/a
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 3/3] check-attr: integrate with sparse-index
  2023-07-11 13:30     ` [PATCH v3 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-07-11 20:07       ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-07-11 20:07 UTC (permalink / raw)
  To: Shuqi Liang; +Cc: git, vdye

Shuqi Liang <cheskaqiqi@gmail.com> writes:

> Set the requires-full-index to false for "diff-tree".

Really...?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 2/3] t1092: add tests for `git check-attr`
  2023-07-11 13:30     ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
  2023-07-11 18:52       ` Junio C Hamano
@ 2023-07-11 20:47       ` Victoria Dye
  1 sibling, 0 replies; 39+ messages in thread
From: Victoria Dye @ 2023-07-11 20:47 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> Add smudge/clean filters in .gitattributes files inside the affected
> sparse directories in test 'merge with conflict outside cone', make sure
> it behaves as expected when path is outside of sparse-checkout.
> 
> Add tests for `git check-attr`, make sure it behaves as expected when
> path is both inside or outside of sparse-checkout definition.
> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 40 ++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 8a95adf4b5..839e08d8dd 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1006,6 +1006,17 @@ test_expect_success 'merge with conflict outside cone' '
>  
>  	test_all_match git checkout -b merge-tip merge-left &&
>  	test_all_match git status --porcelain=v2 &&
> +
> +	echo "a filter=rot13" >>.gitattributes &&
> +	run_on_sparse mkdir folder1 &&
> +	run_on_all cp ../.gitattributes ./folder1 &&
> +	git -C full-checkout add folder1/.gitattributes &&
> +	run_on_sparse git add --sparse folder1/.gitattributes &&
> +	run_on_all git commit -m "add .gitattributes" &&
> +	test_sparse_match git sparse-checkout reapply &&
> +	git config filter.rot13.clean "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
> +	git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
> +

In general, we try to add tests demonstrating behavior in context with the
implementation of that behavior. Patch 1 [1] contains the update to
attribute reading that's being tested here, so this block should be moved
there accordingly.

Also, does this test fail before patch 1 but succeed after? It's a bit
difficult to tell how this demonstrates that the in-sparse-directory
`.gitattributes` is applied properly now but wasn't before. An
additional comment in the test or commit message would be helpful for
understanding it better.

[1] https://lore.kernel.org/git/20230711133035.16916-2-cheskaqiqi@gmail.com/

>  	test_all_match test_must_fail git merge -m merge merge-right &&
>  	test_all_match git status --porcelain=v2 &&
>  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory
  2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-07-11 21:15       ` Junio C Hamano
  2023-07-11 22:08         ` Junio C Hamano
  2023-07-13 20:13         ` Shuqi Liang
  2023-07-11 21:24       ` Victoria Dye
  1 sibling, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-07-11 21:15 UTC (permalink / raw)
  To: Shuqi Liang; +Cc: git, vdye

Shuqi Liang <cheskaqiqi@gmail.com> writes:

> 'git check-attr' cannot currently find attributes of a file within a
> sparse directory. This is due to .gitattributes files are irrelevant in
> sparse-checkout cone mode, as the file is considered sparse only if all
> paths within its parent directory are also sparse.

I do not quite understand what these two sentences want to say.  If
the attribute files are truly irrelevant then "cannot find" does not
matter, because there is no point in finding irrelevant things that
by definition will not affect the outcome of any commands at all,
no?

> In addition,
> searching for a .gitattributes file causes expansion of the sparse
> index, which is avoided to prevent potential performance degradation.

Does this sentence want to say that there is a price to pay, in
order to read an attribute file that is not part of the cones of
interest, that you first need to expand the sparse index?  I think
that is a given and I am not sure what the point of saying it is.

> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.

I agree.  Not reading attribute files correctly will lead to a bug.

Let me rephase what (I think) you wrote below to see if I understand
what you are doing correctly.

Suppose that sub1/.gitattributes need to be read, when the calling
command wants to know about attributes of sub1/file.  Imagine that
sub1/ and sub2/ are both outside the cones of interest. It would be
better not to expand sub2/ even though we need to expand sub1/.  Not
calling ensure_full_index() upfront and instead expanding the
necessary subdirectories on demand would be a good way to solve it.

Is that what going on?

> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
>
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
>
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
>
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().
>
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  attr.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)

Thanks.

> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..be06747b0d 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.

Imagine that you have a tree with sub1/ outside the cones of
interest and sub2/ and sub9/ inside the cones of interest, and
further imagine that sub1/.gitattributes and sub2/.gitattributes
give attribute X to sub1/file and sub2/file respectively.  There
is no sub9/.gitattributes file.

Then "git ls-files ':(attr:X)sub[0-9]'" _could_ have two equally
sensible behaviours:

 (1) Only show sub2/file because sub1/ is outside the cones of
     interest and the user does not want to clutter the output
     from the parts of the tree they are not interested in.

 (2) Show both sub1/file and sub2/file, even though sub1/ is outside
     the cones of interest, in response to the fact that the mention
     of "sub[0-9]" on the command line is an explicit indication of
     interest by the user (it would become more and more interesting
     if the pathspec gets less specific, like ":(attr:X)" that is
     treewide, though).

The original comment seems to say that only behaviour (1) is
supported, but I wonder if we eventually want to support both,
choice made by the calling code (and perhaps options)?  In any case,
offering the choice of (2) is a good thing in the longer run.
Anyway...

> +	 * If the pos value is negative, it means the path is not in the index. 
> +	 * However, the absolute value of pos minus 1 gives us the position where the path 
> +	 * would be inserted in lexicographic order. By subtracting another 1 from this 
> +	 * value (pos = -pos - 2), we find the position of the last index entry 
> +	 * which is lexicographically smaller than the provided path. This would be 
> +	 * the sparse directory containing the path.

That is true only if the directory containing the .gitattribute file
is sparsified (e.g. sub1/.gitattributes does not appear in the index
but sub1/ does; sub2/.gitattributes however does appear in the index
and there is no sub2/ in the index).

If not, there are two cases:

 * sub2/.gitattributes does appear in the index (and there is no
   sub2/ in the index).  "pos = - pos - 2" computes a nonsense
   number in this case; hopefully we can reject it early by noticing
   that the resulting pos is negative.

 * sub9/.gitattributes does not belong to the project.  The pos is
   negative and "- pos - 2" does not poihnt at sub9/ (as it is not
   sparse).  Depending on what other paths appear in sub9/., the
   path that appears at (-pos-2) may be inside or outside sub9/.  In
   the worst case, it could be a sparsified directory that sorts
   directly before sub9/ (say, there is sub8/ that is sparse, which
   may have .gitattributes in it).  Would the updated code
   mistakenly check S_ISSPARSEDIR() on sub8/ that has no relevance
   when we are dealing with sub9/.gitattributes that does not exist?

> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	pos = index_name_pos_sparse(istate, path, strlen(path));
> +	pos = - pos - 2;
>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> -	}
> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
> +		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> +			return NULL;

So earlier, the code, given say sub1/.gitattributes, checked if that
path is outside the cones of interest and skipped reading it.  But
the updated code tries to check the same "is it outside or inside?"
condition for sub1/ directory itself.  Does it make a practical
difference that you can demonstrate with a test?

I do not know if the updated code does the right thing for
sub2/.gitattributes (exists in a non-sparse directory) and
sub9/.gitattributes (does not exist in non-sparse directory),
though.

> +		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {

Don't compare with "==0", write !strncmp(...) instead.

> +			const char *relative_path = path + ce_namelen(istate->cache[pos]);  
> +			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +		}

If the earlier "- pos - 2" misidentified the parent sparse directory
entry in the index and the strncmp() noticed that mistake, we would
come here without reading any new attribute stack frame.  Don't we
need to fallback reading from the path in the correct directory that
is not at "- pos - 2"?

Let's imagine this case where sub/ is a directory outside the cones
of interest, and our sparse-index may or may not have it as a
directory in the index, and then the caller asks to read from the
"sub/sub1/.gitattributes" file.  Even when "sub/" is expanded in the
index, "sub/sub1/" may not and appear as a directory in the index.

The above "find relative_path and read from the tree object" code
would of course work when the direct parent directory of
".gitattributes" is visible in the index, but interestingly, it
would also work when it does not.  E.g. if "sub/" is represented as
a directory in the index, then asking for "sub1/.gitattributes"
inside the tree object of "sub/" would work as get_tree_entry() used
by read_attr_from_blob() would get to the right object recursively,
so that is nice.  If that is why "'- pos - 2' must be the directory
entry in the index that _would_ include $leadingpath/.gitattributes
regardless of how many levels of directory hierarchy there are
inside $leadingpath" idea was chosen, I'd have to say that it is
clever ;-)

I however find the "'- pos - 2' must be the directory entry in the
index" trick hard to reason about and explain.  I wonder if we write
this in a more straight-forward and stupid way, the result becomes
easier to read and less prone to future bugs...

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory
  2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
  2023-07-11 21:15       ` Junio C Hamano
@ 2023-07-11 21:24       ` Victoria Dye
  1 sibling, 0 replies; 39+ messages in thread
From: Victoria Dye @ 2023-07-11 21:24 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> 'git check-attr' cannot currently find attributes of a file within a
> sparse directory. This is due to .gitattributes files are irrelevant in
> sparse-checkout cone mode, as the file is considered sparse only if all
> paths within its parent directory are also sparse. 

If .gitattributes files are irrelevant in sparse-checkout cone mode, then
why are we changing the behavior? If you're challenging that assertion,
please state so clearly.

> In addition,> searching for a .gitattributes file causes expansion of the sparse
> index, which is avoided to prevent potential performance degradation.

This isn't an unchangeable fact (as your implementation below shows).
Expanding the index is just the most straightforward approach, but the
performance cost of that is (AFAICT) a reason used to justify why we didn't
read sparse directory attributes in the past.

> 
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
> 
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
> 
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
> 
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
> 
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().

Makes sense to me.

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  attr.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..be06747b0d 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> +	 * If the pos value is negative, it means the path is not in the index. 
> +	 * However, the absolute value of pos minus 1 gives us the position where the path 
> +	 * would be inserted in lexicographic order. By subtracting another 1 from this 
> +	 * value (pos = -pos - 2), we find the position of the last index entry 
> +	 * which is lexicographically smaller than the provided path. This would be 
> +	 * the sparse directory containing the path.

This is a good explanation of what '-pos - 2' represents, but it doesn't
explain why we'd want that value. Could you add a bit of detail around why
1) we care whether 'pos' identifies a value that exists in the index or not,
and 2) why we're looking for the sparse directory containing the path?

>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	pos = index_name_pos_sparse(istate, path, strlen(path));
> +	pos = - pos - 2;

nit: don't add the space between '-' and 'pos'. This should be:

	pos = -pos - 2;

>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> -	}
> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {

Typically, we try to put the less expensive operation first in a condition
like this (if the first part of the condition is 'false', the second part
won't be evaluated). 'path_in_cone_mode_sparse_checkout()' is more expensive
than a simple numerical check, so this should probably be:

	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate)) {

But on a more general note, why check 'path_in_cone_mode_sparse_checkout()'
at all? The goal is to determine whether 'path' is inside a sparse
directory, so first you search the index to find where that directory would
be, then - if 'path' isn't in the sparse-checkout cone - check whether the
index entry you found is a sparse directory. But sparse directories can't
exist within the sparse-checkout cone in the first place, so the
'path_in_cone_mode_sparse_checkout()' is redundant. 

Instead, 'path_in_cone_mode_sparse_checkout()' (and probably
'istate->sparse_index', since sparse directories can't exist if the index
isn't sparse) could be used to avoid calculating 'index_name_pos_sparse()'
in the first place; the index search operation is generally more expensive
than 'path_in_cone_mode_sparse_checkout()', especially when sparse-checkout
is disabled entirely.

> +		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> +			return NULL;
>  
> -	return read_attr_from_buf(buf, path, flags);
> +		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {

All of these nested conditions could be simplified/collapsed into a single,
top-level condition:

	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos]))) {

IMO, this also more clearly reflects _why_ you'd want to enter this
condition and read from the index directly:

* If the path is not in the sparse-checkout cone
* AND the index entry preceding 'path' is a sparse directory
* AND the sparse directory is the prefix of 'path' (i.e., 'path' is in the
  directory) 
    -> Read from the sparse directory's tree

One other quick sanity check - for the sparse directory prefixing check to
work, 'path' needs to be a normalized path relative to the root of the repo.
Is that guaranteed to be the case here?

> +			const char *relative_path = path + ce_namelen(istate->cache[pos]);  

Here, you get the relative path within the sparse directory by skipping past
the sparse directory name in 'path'. If 'path' is normalized (see above),
this works. Nice!

> +			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +		}
> +	} else {
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	return stack;
>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory
  2023-07-11 21:15       ` Junio C Hamano
@ 2023-07-11 22:08         ` Junio C Hamano
  2023-07-13 20:22           ` Shuqi Liang
  2023-07-13 20:13         ` Shuqi Liang
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2023-07-11 22:08 UTC (permalink / raw)
  To: Shuqi Liang; +Cc: git, vdye

Junio C Hamano <gitster@pobox.com> writes:

>> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
>> -		return NULL;
>> +	pos = index_name_pos_sparse(istate, path, strlen(path));
>> +	pos = - pos - 2;
>>  
>> -	buf = read_blob_data_from_index(istate, path, &size);
>> -	if (!buf)
>> -		return NULL;
>> -	if (size >= ATTR_MAX_FILE_SIZE) {
>> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
>> -		return NULL;
>> -	}
>> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
>> +		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
>> +			return NULL;

Another thing I forgot to ask.  When we are asked to read
".gitattributes" at the top level, does this code work correctly?
As ".gitattributes" is at the root level, it won't be hidden inside
a sparsified directory in the index, and we do not have to search
for its parent.  I just wanted to see if the relative_path computation
and other things we see below will safely be skipped in such a case.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory
  2023-07-11 21:15       ` Junio C Hamano
  2023-07-11 22:08         ` Junio C Hamano
@ 2023-07-13 20:13         ` Shuqi Liang
  1 sibling, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-13 20:13 UTC (permalink / raw)
  To: Junio C Hamano, git, Victoria Dye

On Tue, Jul 11, 2023 at 5:15 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Shuqi Liang <cheskaqiqi@gmail.com> writes:
>
> > 'git check-attr' cannot currently find attributes of a file within a
> > sparse directory. This is due to .gitattributes files are irrelevant in
> > sparse-checkout cone mode, as the file is considered sparse only if all
> > paths within its parent directory are also sparse.
>
> I do not quite understand what these two sentences want to say.  If
> the attribute files are truly irrelevant then "cannot find" does not
> matter, because there is no point in finding irrelevant things that
> by definition will not affect the outcome of any commands at all,
> no?
>
> > In addition,
> > searching for a .gitattributes file causes expansion of the sparse
> > index, which is avoided to prevent potential performance degradation.
>
> Does this sentence want to say that there is a price to pay, in
> order to read an attribute file that is not part of the cones of
> interest, that you first need to expand the sparse index?  I think
> that is a given and I am not sure what the point of saying it is.
>
> > However, this behavior can lead to missing attributes for files inside
> > sparse directories, causing inconsistencies in file handling.
>
> I agree.  Not reading attribute files correctly will lead to a bug.
>
> Let me rephase what (I think) you wrote below to see if I understand
> what you are doing correctly.
>
> Suppose that sub1/.gitattributes need to be read, when the calling
> command wants to know about attributes of sub1/file.  Imagine that
> sub1/ and sub2/ are both outside the cones of interest. It would be
> better not to expand sub2/ even though we need to expand sub1/.  Not
> calling ensure_full_index() upfront and instead expanding the
> necessary subdirectories on demand would be a good way to solve it.
>
> Is that what going on?

Sorry for the confusion. I was actually trying to explain why the original
comment isn't needed anymore. Here's my updated comment - does this
make more sense?

Previously, the `read_attr_from_index()` function was structured to handle
attribute reading from a `.gitattributes` file only when the file
paths were part
of the cone-mode sparse-checkout. This was based on the fact that the
`.gitattributes` file only applied to files within its parent
directory. As a result,
we avoided loading the `.gitattributes` file if the sparse-checkout was in
cone-mode, as the file is sparse only if all paths within that directory are
also sparse.

However, this approach was not capable of handling scenarios where we
needed to read attributes from sparse directories。

To resolve this, revise 'git check-attr' to allow attribute reading for
files in sparse directories from the corresponding .gitattributes files:

1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
to check if a path falls within a sparse directory.

2.If path is inside a sparse directory, employ the value of
index_name_pos_sparse() to find the sparse directory containing path and
path relative to sparse directory. Proceed to read attributes from the
tree OID of the sparse directory using read_attr_from_blob().

3.If path is not inside a sparse directory,ensure that attributes are
fetched from the index blob with read_blob_data_from_index().



> > diff --git a/attr.c b/attr.c
> > index 7d39ac4a29..be06747b0d 100644
> > --- a/attr.c
> > +++ b/attr.c
> > @@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
> >  static struct attr_stack *read_attr_from_index(struct index_state *istate,
> >                                              const char *path, unsigned flags)
> >  {
> > +     struct attr_stack *stack = NULL;
> >       char *buf;
> >       unsigned long size;
> > +     int pos;
> >
> >       if (!istate)
> >               return NULL;
> >
> >       /*
> > -      * The .gitattributes file only applies to files within its
> > -      * parent directory. In the case of cone-mode sparse-checkout,
> > -      * the .gitattributes file is sparse if and only if all paths
> > -      * within that directory are also sparse. Thus, don't load the
> > -      * .gitattributes file since it will not matter.
>
> Imagine that you have a tree with sub1/ outside the cones of
> interest and sub2/ and sub9/ inside the cones of interest, and
> further imagine that sub1/.gitattributes and sub2/.gitattributes
> give attribute X to sub1/file and sub2/file respectively.  There
> is no sub9/.gitattributes file.
>
> Then "git ls-files ':(attr:X)sub[0-9]'" _could_ have two equally
> sensible behaviours:
>
>  (1) Only show sub2/file because sub1/ is outside the cones of
>      interest and the user does not want to clutter the output
>      from the parts of the tree they are not interested in.
>
>  (2) Show both sub1/file and sub2/file, even though sub1/ is outside
>      the cones of interest, in response to the fact that the mention
>      of "sub[0-9]" on the command line is an explicit indication of
>      interest by the user (it would become more and more interesting
>      if the pathspec gets less specific, like ":(attr:X)" that is
>      treewide, though).
> The original comment seems to say that only behaviour (1) is
> supported, but I wonder if we eventually want to support both,
> choice made by the calling code (and perhaps options)?  In any case,
> offering the choice of (2) is a good thing in the longer run.
> Anyway...

If we use '--sparse' as an option, then by default we'd only apply (1).
However, if the user types '--sparse', we'd switch to (2). Do you think that's
a good approach?

> > +      * If the pos value is negative, it means the path is not in the index.
> > +      * However, the absolute value of pos minus 1 gives us the position where the path
> > +      * would be inserted in lexicographic order. By subtracting another 1 from this
> > +      * value (pos = -pos - 2), we find the position of the last index entry
> > +      * which is lexicographically smaller than the provided path. This would be
> > +      * the sparse directory containing the path.
>
> That is true only if the directory containing the .gitattribute file
> is sparsified (e.g. sub1/.gitattributes does not appear in the index
> but sub1/ does; sub2/.gitattributes however does appear in the index
> and there is no sub2/ in the index).
>
> If not, there are two cases:
>
>  * sub2/.gitattributes does appear in the index (and there is no
>    sub2/ in the index).  "pos = - pos - 2" computes a nonsense
>    number in this case; hopefully we can reject it early by noticing
>    that the resulting pos is negative.
>
>  * sub9/.gitattributes does not belong to the project.  The pos is
>    negative and "- pos - 2" does not poihnt at sub9/ (as it is not
>    sparse).  Depending on what other paths appear in sub9/., the
>    path that appears at (-pos-2) may be inside or outside sub9/.  In
>    the worst case, it could be a sparsified directory that sorts
>    directly before sub9/ (say, there is sub8/ that is sparse, which
>    may have .gitattributes in it).  Would the updated code
>    mistakenly check S_ISSPARSEDIR() on sub8/ that has no relevance
>    when we are dealing with sub9/.gitattributes that does not exist?

I made some modifications to the code to address your points. Now, only
calculating 'pos' when the path falls outside of the cone. Moreover, we
only compute '-pos -2' when 'pos' is negative. I believe this adjustment can
effectively address the issues you've brought up.

> > -     if (!path_in_cone_mode_sparse_checkout(path, istate))
> > -             return NULL;
> > +     pos = index_name_pos_sparse(istate, path, strlen(path));
> > +     pos = - pos - 2;
> >
> > -     buf = read_blob_data_from_index(istate, path, &size);
> > -     if (!buf)
> > -             return NULL;
> > -     if (size >= ATTR_MAX_FILE_SIZE) {
> > -             warning(_("ignoring overly large gitattributes blob '%s'"), path);
> > -             return NULL;
> > -     }
> > +     if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
> > +             if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> > +                     return NULL;
>
> So earlier, the code, given say sub1/.gitattributes, checked if that
> path is outside the cones of interest and skipped reading it.  But
> the updated code tries to check the same "is it outside or inside?"
> condition for sub1/ directory itself.  Does it make a practical
> difference that you can demonstrate with a test?

I'm not sure I understand the point. Are you suggesting that checking
(!S_ISSPARSEDIR(istate->cache[pos]->ce_mode) is unnecessary because
 we don't need to verify it as we already outside the cone?

> I do not know if the updated code does the right thing for
> sub2/.gitattributes (exists in a non-sparse directory) and
> sub9/.gitattributes (does not exist in non-sparse directory),
> though.
>
> > +             if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {
>
> Don't compare with "==0", write !strncmp(...) instead.

Will fix!

> > +                     const char *relative_path = path + ce_namelen(istate->cache[pos]);
> > +                     stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> > +             }
>
> If the earlier "- pos - 2" misidentified the parent sparse directory
> entry in the index and the strncmp() noticed that mistake, we would
> come here without reading any new attribute stack frame.  Don't we
> need to fallback reading from the path in the correct directory that
> is not at "- pos - 2"?
>
> Let's imagine this case where sub/ is a directory outside the cones
> of interest, and our sparse-index may or may not have it as a
> directory in the index, and then the caller asks to read from the
> "sub/sub1/.gitattributes" file.  Even when "sub/" is expanded in the
> index, "sub/sub1/" may not and appear as a directory in the index.
>
> The above "find relative_path and read from the tree object" code
> would of course work when the direct parent directory of
> ".gitattributes" is visible in the index, but interestingly, it
> would also work when it does not.  E.g. if "sub/" is represented as
> a directory in the index, then asking for "sub1/.gitattributes"
> inside the tree object of "sub/" would work as get_tree_entry() used
> by read_attr_from_blob() would get to the right object recursively,
> so that is nice.  If that is why "'- pos - 2' must be the directory
> entry in the index that _would_ include $leadingpath/.gitattributes
> regardless of how many levels of directory hierarchy there are
> inside $leadingpath" idea was chosen, I'd have to say that it is
> clever ;-)
>
> I however find the "'- pos - 2' must be the directory entry in the
> index" trick hard to reason about and explain.  I wonder if we write
> this in a more straight-forward and stupid way, the result becomes
> easier to read and less prone to future bugs...

Here is my updated code. Is it more straightforward and less prone to bugs?

if (!path_in_cone_mode_sparse_checkout(path, istate)) {
    pos = index_name_pos_sparse(istate, path, strlen(path));

    if (pos < 0)
        pos = -pos - 2;
}

if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
!strncmp(istate->cache[pos]->name, path,
ce_namelen(istate->cache[pos])) &&
!normalize_path_copy(normalize_path, path)) {
    relative_path = normalize_path + ce_namelen(istate->cache[pos]);
    stack = read_attr_from_blob(istate, &istate->cache[pos]->oid,
relative_path, flags);

    stack = read_attr_from_blob(istate, &istate->cache[pos]->oid,
relative_path, flags);
}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory
  2023-07-11 22:08         ` Junio C Hamano
@ 2023-07-13 20:22           ` Shuqi Liang
  0 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-13 20:22 UTC (permalink / raw)
  To: Junio C Hamano, git, Victoria Dye

On Tue, Jul 11, 2023 at 6:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> >> -    if (!path_in_cone_mode_sparse_checkout(path, istate))
> >> -            return NULL;
> >> +    pos = index_name_pos_sparse(istate, path, strlen(path));
> >> +    pos = - pos - 2;
> >>
> >> -    buf = read_blob_data_from_index(istate, path, &size);
> >> -    if (!buf)
> >> -            return NULL;
> >> -    if (size >= ATTR_MAX_FILE_SIZE) {
> >> -            warning(_("ignoring overly large gitattributes blob '%s'"), path);
> >> -            return NULL;
> >> -    }
> >> +    if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
> >> +            if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> >> +                    return NULL;
>
> Another thing I forgot to ask.  When we are asked to read
> ".gitattributes" at the top level, does this code work correctly?
> As ".gitattributes" is at the root level, it won't be hidden inside
> a sparsified directory in the index, and we do not have to search
> for its parent.  I just wanted to see if the relative_path computation
> and other things we see below will safely be skipped in such a case.

Yeah, this code works correctly. I added those tests in t1092 and they
passed successfully.

test_expect_success 'check-attr with pathspec inside sparse definition' '
init_repos &&

    echo "a -crlf myAttr" >>.gitattributes &&
    run_on_all cp ../.gitattributes . &&

    test_all_match git check-attr -a -- deep/a &&

    test_all_match git add .gitattributes &&
    test_all_match git check-attr -a --cached -- deep/a
'
test_expect_success 'check-attr with pathspec inside sparse definition' '
init_repos &&

    echo "a -crlf myAttr" >>.gitattributes &&
    run_on_all cp ../.gitattributes . &&

    test_all_match git check-attr -a -- folder1/a &&

    test_all_match git add .gitattributes &&
    test_all_match git check-attr -a --cached -- folder1/a
'

Do I need to modify t1092 to include cases like this?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v4 0/3] check-attr: integrate with sparse-index
  2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
                       ` (3 preceding siblings ...)
  2023-07-11 16:56     ` [PATCH v3 0/3] " Junio C Hamano
@ 2023-07-18 23:29     ` Shuqi Liang
  2023-07-18 23:29       ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
                         ` (3 more replies)
  4 siblings, 4 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-18 23:29 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

change against v3:

* update a new commit message

* update read_attr_from_index function

* The order of the patches has been rearranged to better illustrate
the problem and its solution.

1.t1092: add tests for git check-attr 
2.attr.c: read attributes in a sparse directory
3.check-attr: integrate with sparse-index

The new order of patches allows us to introduce a failing test case in
the patch 1, which then hen back to  "test_expect_success" in patch 2.
This approach is designed to concretely show why reading attributes
from a sparse directory is needed: without this functionality, the
sparse index case doesn't work correctly for git check-attr.

* Enhanced the comments in the code to provide more detail. Added
explanations as to why 1) it matters whether 'pos' identifies a
value that exists in the index or not, and 2) the rationale behind
looking for the sparse directory containing the path

* Add a test 'diff --check with pathspec outside sparse definition'.
It starts by disabling the trailing whitespace and space-before-tab
checks using the core.whitespace configuration option. Then, it
specifically re-enables the trailing whitespace check for a file located
in a sparse directory. This is accomplished by adding a
whitespace=trailing-space rule to the .gitattributes file within that
directory. To ensure that only the .gitattributes file in the index is
being read, and not any .gitattributes files in the working tree, the
test removes the .gitattributes file from the working tree after adding
it to the index. The final part of the test uses 'git diff --check' to
verify the correct application of the attribute rules. This ensures that
the .gitattributes file is correctly read from index and applied, even
when the file's path falls outside of the sparse-checkout definition.

* fix whitespace error 

Shuqi Liang (3):
  t1092: add tests for 'git check-attr'
  attr.c: read attributes in a sparse directory
  check-attr: integrate with sparse-index

 attr.c                                   | 60 +++++++++++++++-------
 builtin/check-attr.c                     |  3 ++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 63 ++++++++++++++++++++++++
 4 files changed, 109 insertions(+), 18 deletions(-)

Range-diff against v3:
1:  199cc90a5b < -:  ---------- attr.c: read attributes in a sparse directory
2:  eefce85083 ! 1:  9c43eea9cc t1092: add tests for `git check-attr`
    @@ Metadata
     Author: Shuqi Liang <cheskaqiqi@gmail.com>
     
      ## Commit message ##
    -    t1092: add tests for `git check-attr`
    +    t1092: add tests for 'git check-attr'
     
    -    Add smudge/clean filters in .gitattributes files inside the affected
    -    sparse directories in test 'merge with conflict outside cone', make sure
    -    it behaves as expected when path is outside of sparse-checkout.
    +    Add tests for `git check-attr`, make sure attribute file does get read
    +    from index when path is either inside or outside of sparse-checkout
    +    definition.
     
    -    Add tests for `git check-attr`, make sure it behaves as expected when
    -    path is both inside or outside of sparse-checkout definition.
    +    Add a test named 'diff --check with pathspec outside sparse definition'.
    +    It starts by disabling the trailing whitespace and space-before-tab
    +    checks using the core.whitespace configuration option. Then, it
    +    specifically re-enables the trailing whitespace check for a file located
    +    in a sparse directory. This is accomplished by adding a
    +    whitespace=trailing-space rule to the .gitattributes file within that
    +    directory. To ensure that only the .gitattributes file in the index is
    +    being read, and not any .gitattributes files in the working tree, the
    +    test removes the .gitattributes file from the working tree after adding
    +    it to the index. The final part of the test uses 'git diff --check' to
    +    verify the correct application of the attribute rules. This ensures that
    +    the .gitattributes file is correctly read from index and applied, even
    +    when the file's path falls outside of the sparse-checkout definition.
     
         Helped-by: Victoria Dye <vdye@github.com>
         Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
     
      ## t/t1092-sparse-checkout-compatibility.sh ##
    -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge with conflict outside cone' '
    - 
    - 	test_all_match git checkout -b merge-tip merge-left &&
    - 	test_all_match git status --porcelain=v2 &&
    -+
    -+	echo "a filter=rot13" >>.gitattributes &&
    -+	run_on_sparse mkdir folder1 &&
    -+	run_on_all cp ../.gitattributes ./folder1 &&
    -+	git -C full-checkout add folder1/.gitattributes &&
    -+	run_on_sparse git add --sparse folder1/.gitattributes &&
    -+	run_on_all git commit -m "add .gitattributes" &&
    -+	test_sparse_match git sparse-checkout reapply &&
    -+	git config filter.rot13.clean "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
    -+	git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'" &&
    -+
    - 	test_all_match test_must_fail git merge -m merge merge-right &&
    - 	test_all_match git status --porcelain=v2 &&
    - 
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'worktree is not expanded' '
      	ensure_not_expanded worktree remove .worktrees/hotfix
      '
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'worktree is not e
     +	test_all_match git check-attr -a --cached -- deep/a
     +'
     +
    -+test_expect_success 'check-attr with pathspec outside sparse definition' '
    ++test_expect_failure 'check-attr with pathspec outside sparse definition' '
     +	init_repos &&
     +
     +	echo "a -crlf myAttr" >>.gitattributes &&
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'worktree is not e
     +	test_sparse_match git sparse-checkout reapply &&
     +	test_all_match git check-attr  -a --cached -- folder1/a
     +'
    ++
    ++test_expect_failure 'diff --check with pathspec outside sparse definition' '
    ++	init_repos &&
    ++
    ++	write_script edit-contents <<-\EOF &&
    ++	echo "a " >"$1"
    ++	EOF
    ++
    ++	git config core.whitespace -trailing-space,-space-before-tab &&
    ++
    ++	echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
    ++	run_on_all mkdir -p folder1 &&
    ++	run_on_all cp ../.gitattributes ./folder1 &&
    ++	git -C full-checkout add folder1/.gitattributes &&
    ++	run_on_sparse git add --sparse folder1/.gitattributes &&
    ++	run_on_all rm folder1/.gitattributes &&
    ++	run_on_all  ../edit-contents folder1/a &&
    ++	test_all_match test_must_fail git diff --check -- folder1/a
    ++'
     +
      test_done
-:  ---------- > 2:  63ff110b1c attr.c: read attributes in a sparse directory
3:  65c2624504 ! 3:  7a9c2da30d check-attr: integrate with sparse-index
    @@ Metadata
      ## Commit message ##
         check-attr: integrate with sparse-index
     
    -    Set the requires-full-index to false for "diff-tree".
    +    Set the requires-full-index to false for "check-attr".
     
         Add a test to ensure that the index is not expanded whether the files
         are outside or inside the sparse-checkout cone when the sparse index is
    @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git diff-files -- $SPARSE_CO
      test_done
     
      ## t/t1092-sparse-checkout-compatibility.sh ##
    -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'check-attr with pathspec outside sparse definition' '
    - 	test_all_match git check-attr  -a --cached -- folder1/a
    +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --check with pathspec outside sparse definition' '
    + 	test_all_match test_must_fail git diff --check -- folder1/a
      '
      
     +test_expect_success 'sparse-index is not expanded: check-attr' '
-- 
2.39.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v4 1/3] t1092: add tests for 'git check-attr'
  2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
@ 2023-07-18 23:29       ` Shuqi Liang
  2023-07-20 18:43         ` Victoria Dye
  2023-07-18 23:29       ` [PATCH v4 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-18 23:29 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Add tests for `git check-attr`, make sure attribute file does get read
from index when path is either inside or outside of sparse-checkout
definition.

Add a test named 'diff --check with pathspec outside sparse definition'.
It starts by disabling the trailing whitespace and space-before-tab
checks using the core.whitespace configuration option. Then, it
specifically re-enables the trailing whitespace check for a file located
in a sparse directory. This is accomplished by adding a
whitespace=trailing-space rule to the .gitattributes file within that
directory. To ensure that only the .gitattributes file in the index is
being read, and not any .gitattributes files in the working tree, the
test removes the .gitattributes file from the working tree after adding
it to the index. The final part of the test uses 'git diff --check' to
verify the correct application of the attribute rules. This ensures that
the .gitattributes file is correctly read from index and applied, even
when the file's path falls outside of the sparse-checkout definition.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 48 ++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8a95adf4b5..90633f383a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2259,4 +2259,52 @@ test_expect_success 'worktree is not expanded' '
 	ensure_not_expanded worktree remove .worktrees/hotfix
 '
 
+test_expect_success 'check-attr with pathspec inside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_all cp ../.gitattributes ./deep &&
+
+	test_all_match git check-attr -a -- deep/a &&
+
+	test_all_match git add deep/.gitattributes &&
+	test_all_match git check-attr -a --cached -- deep/a
+'
+
+test_expect_failure 'check-attr with pathspec outside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	run_on_all cp a folder1/a &&
+
+	test_all_match git check-attr -a -- folder1/a &&
+
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git check-attr  -a --cached -- folder1/a
+'
+
+test_expect_failure 'diff --check with pathspec outside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo "a " >"$1"
+	EOF
+
+	git config core.whitespace -trailing-space,-space-before-tab &&
+
+	echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
+	run_on_all mkdir -p folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	git -C full-checkout add folder1/.gitattributes &&
+	run_on_sparse git add --sparse folder1/.gitattributes &&
+	run_on_all rm folder1/.gitattributes &&
+	run_on_all  ../edit-contents folder1/a &&
+	test_all_match test_must_fail git diff --check -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 2/3] attr.c: read attributes in a sparse directory
  2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
  2023-07-18 23:29       ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
@ 2023-07-18 23:29       ` Shuqi Liang
  2023-07-20 20:18         ` Victoria Dye
  2023-07-18 23:29       ` [PATCH v4 3/3] check-attr: integrate with sparse-index Shuqi Liang
  2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
  3 siblings, 1 reply; 39+ messages in thread
From: Shuqi Liang @ 2023-07-18 23:29 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Before this patch, git check-attr was unable to read the attributes from
a .gitattributes file within a sparse directory. The original comment
was operating under the assumption that users are only interested in
files or directories inside the cones. Therefore, in the original code,
in the case of a cone-mode sparse-checkout, we didn't load the
.gitattributes file.

However, this behavior can lead to missing attributes for files inside
sparse directories, causing inconsistencies in file handling.

To resolve this, revise 'git check-attr' to allow attribute reading for
files in sparse directories from the corresponding .gitattributes files:

1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
to check if a path falls within a sparse directory.

2.If path is inside a sparse directory, employ the value of
index_name_pos_sparse() to find the sparse directory containing path and
path relative to sparse directory. Proceed to read attributes from the
tree OID of the sparse directory using read_attr_from_blob().

3.If path is not inside a sparse directory,ensure that attributes are
fetched from the index blob with read_blob_data_from_index().

Modify previous tests so such difference is not considered as an error.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 attr.c                                   | 60 +++++++++++++++++-------
 t/t1092-sparse-checkout-compatibility.sh |  4 +-
 2 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/attr.c b/attr.c
index 7d39ac4a29..7650f5481a 100644
--- a/attr.c
+++ b/attr.c
@@ -808,35 +808,59 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
 static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path, unsigned flags)
 {
+	struct attr_stack *stack = NULL;
 	char *buf;
 	unsigned long size;
+	int pos = -1;
+	char normalize_path[PATH_MAX];
+	const char *relative_path;
 
 	if (!istate)
 		return NULL;
 
 	/*
-	 * The .gitattributes file only applies to files within its
-	 * parent directory. In the case of cone-mode sparse-checkout,
-	 * the .gitattributes file is sparse if and only if all paths
-	 * within that directory are also sparse. Thus, don't load the
-	 * .gitattributes file since it will not matter.
-	 *
-	 * In the case of a sparse index, it is critical that we don't go
-	 * looking for a .gitattributes file, as doing so would cause the
-	 * index to expand.
+	 * When handling sparse-checkouts, .gitattributes files
+	 * may reside within a sparse directory. We distinguish
+	 * whether a path exists directly in the index or not by
+	 * evaluating if 'pos' is negative.
+	 * If 'pos' is negative, the path is not directly present
+	 * in the index and is likely within a sparse directory.
+	 * For paths not in the index, The absolute value of 'pos'
+	 * minus 1 gives us the position where the path would be
+	 * inserted in lexicographic order within the index.
+	 * We then subtract another 1 from this value
+	 * (pos = -pos - 2) to find the position of the last
+	 * index entry which is lexicographically smaller than
+	 * the path. This would be the sparse directory containing
+	 * the path. By identifying the sparse directory containing
+	 * the path, we can correctly read the attributes specified
+	 * in the .gitattributes file from the tree object of the
+	 * sparse directory.
 	 */
-	if (!path_in_cone_mode_sparse_checkout(path, istate))
-		return NULL;
+	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
+		pos = index_name_pos_sparse(istate, path, strlen(path));
 
-	buf = read_blob_data_from_index(istate, path, &size);
-	if (!buf)
-		return NULL;
-	if (size >= ATTR_MAX_FILE_SIZE) {
-		warning(_("ignoring overly large gitattributes blob '%s'"), path);
-		return NULL;
+		if (pos < 0)
+			pos = -pos - 2;
 	}
 
-	return read_attr_from_buf(buf, path, flags);
+	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
+	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
+	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
+	    !normalize_path_copy(normalize_path, path)) {
+		relative_path = normalize_path + ce_namelen(istate->cache[pos]);
+		stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
+	} else {
+		buf = read_blob_data_from_index(istate, path, &size);
+		if (!buf)
+			return NULL;
+		if (size >= ATTR_MAX_FILE_SIZE) {
+			warning(_("ignoring overly large gitattributes blob '%s'"), path);
+			return NULL;
+		}
+		stack = read_attr_from_buf(buf, path, flags);
+	}
+	return stack;
 }
 
 static struct attr_stack *read_attr(struct index_state *istate,
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 90633f383a..3f32c1f972 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2271,7 +2271,7 @@ test_expect_success 'check-attr with pathspec inside sparse definition' '
 	test_all_match git check-attr -a --cached -- deep/a
 '
 
-test_expect_failure 'check-attr with pathspec outside sparse definition' '
+test_expect_success 'check-attr with pathspec outside sparse definition' '
 	init_repos &&
 
 	echo "a -crlf myAttr" >>.gitattributes &&
@@ -2288,7 +2288,7 @@ test_expect_failure 'check-attr with pathspec outside sparse definition' '
 	test_all_match git check-attr  -a --cached -- folder1/a
 '
 
-test_expect_failure 'diff --check with pathspec outside sparse definition' '
+test_expect_success 'diff --check with pathspec outside sparse definition' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 3/3] check-attr: integrate with sparse-index
  2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
  2023-07-18 23:29       ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
  2023-07-18 23:29       ` [PATCH v4 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-07-18 23:29       ` Shuqi Liang
  2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
  3 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-07-18 23:29 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Set the requires-full-index to false for "check-attr".

Add a test to ensure that the index is not expanded whether the files
are outside or inside the sparse-checkout cone when the sparse index is
enabled.

The `p2000` tests demonstrate a ~63% execution time reduction for
'git check-attr' using a sparse index.

Test                                            before  after
-----------------------------------------------------------------------
2000.106: git check-attr -a f2/f4/a (full-v3)    0.05   0.05 +0.0%
2000.107: git check-attr -a f2/f4/a (full-v4)    0.05   0.05 +0.0%
2000.108: git check-attr -a f2/f4/a (sparse-v3)  0.04   0.02 -50.0%
2000.109: git check-attr -a f2/f4/a (sparse-v4)  0.04   0.01 -75.0%

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 builtin/check-attr.c                     |  3 +++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index b22ff748c3..c1da1d184e 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -122,6 +122,9 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, check_attr_options,
 			     check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (repo_read_index(the_repository) < 0) {
 		die("invalid cache");
 	}
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 96ed3e1d69..39e92b0841 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -134,5 +134,6 @@ test_perf_on_all git diff-files -- $SPARSE_CONE/a
 test_perf_on_all git diff-tree HEAD
 test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
 test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
+test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 3f32c1f972..125b205b0d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2307,4 +2307,19 @@ test_expect_success 'diff --check with pathspec outside sparse definition' '
 	test_all_match test_must_fail git diff --check -- folder1/a
 '
 
+test_expect_success 'sparse-index is not expanded: check-attr' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	mkdir ./sparse-index/folder1 &&
+	cp ./sparse-index/a ./sparse-index/folder1/a &&
+	cp .gitattributes ./sparse-index/deep &&
+	cp .gitattributes ./sparse-index/folder1 &&
+
+	git -C sparse-index add deep/.gitattributes &&
+	git -C sparse-index add --sparse  folder1/.gitattributes &&
+	ensure_not_expanded check-attr -a --cached -- deep/a &&
+	ensure_not_expanded check-attr -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 1/3] t1092: add tests for 'git check-attr'
  2023-07-18 23:29       ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
@ 2023-07-20 18:43         ` Victoria Dye
  0 siblings, 0 replies; 39+ messages in thread
From: Victoria Dye @ 2023-07-20 18:43 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> Add tests for `git check-attr`, make sure attribute file does get read
> from index when path is either inside or outside of sparse-checkout
> definition.
> 
> Add a test named 'diff --check with pathspec outside sparse definition'.
> It starts by disabling the trailing whitespace and space-before-tab
> checks using the core.whitespace configuration option. Then, it
> specifically re-enables the trailing whitespace check for a file located
> in a sparse directory. This is accomplished by adding a
> whitespace=trailing-space rule to the .gitattributes file within that
> directory. To ensure that only the .gitattributes file in the index is
> being read, and not any .gitattributes files in the working tree, the
> test removes the .gitattributes file from the working tree after adding
> it to the index. The final part of the test uses 'git diff --check' to
> verify the correct application of the attribute rules. This ensures that
> the .gitattributes file is correctly read from index and applied, even
> when the file's path falls outside of the sparse-checkout definition.

Thanks for the thorough explanation! This presents a compelling case for why
.gitattributes should be read from sparse directories (if it isn't, the
behavior in sparse-index vs. full-checkout and sparse-checkout doesn't
match).

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 48 ++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 8a95adf4b5..90633f383a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2259,4 +2259,52 @@ test_expect_success 'worktree is not expanded' '
>  	ensure_not_expanded worktree remove .worktrees/hotfix
>  '
>  
> +test_expect_success 'check-attr with pathspec inside sparse definition' '
> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_all cp ../.gitattributes ./deep &&
> +
> +	test_all_match git check-attr -a -- deep/a &&
> +
> +	test_all_match git add deep/.gitattributes &&
> +	test_all_match git check-attr -a --cached -- deep/a
> +'
> +
> +test_expect_failure 'check-attr with pathspec outside sparse definition' '

Could you explain (either in a "NEEDSWORK" comment here or in the commit
message) why this is 'test_expect_failure'? 

> +	init_repos &&
> +
> +	echo "a -crlf myAttr" >>.gitattributes &&
> +	run_on_sparse mkdir folder1 &&
> +	run_on_all cp ../.gitattributes ./folder1 &&
> +	run_on_all cp a folder1/a &&
> +
> +	test_all_match git check-attr -a -- folder1/a &&
> +
> +	git -C full-checkout add folder1/.gitattributes &&
> +	run_on_sparse git add --sparse folder1/.gitattributes &&
> +	run_on_all git commit -m "add .gitattributes" &&
> +	test_sparse_match git sparse-checkout reapply &&
> +	test_all_match git check-attr  -a --cached -- folder1/a
> +'
> +
> +test_expect_failure 'diff --check with pathspec outside sparse definition' '

Same here.

However, when I apply this patch locally and run this test, I get:

	ok 94 - diff --check with pathspec outside sparse definition # TODO known breakage vanished
	# 1 known breakage(s) vanished; please update test(s)

Looking at 'sparse-checkout-out', I see:

	folder1/a:1: trailing whitespace.
	+a 

This test _should_ fail (as your 'test_expect_failure' indicates), but it
passes because the outside-of-cone '.gitattributes' is somehow being applied
to 'folder1/a'. After some debugging, I traced the issue to...

> +	init_repos &&
> +
> +	write_script edit-contents <<-\EOF &&
> +	echo "a " >"$1"
> +	EOF
> +
> +	git config core.whitespace -trailing-space,-space-before-tab &&

...here. This 'git config' doesn't actually apply the configuration to any
of the test repositories, it applies the config to the parent directory. To
apply the config to the test repos, use:

	test_all_match git config core.whitespace -trailing-space,-space-before-tab &&

> +
> +	echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
> +	run_on_all mkdir -p folder1 &&
> +	run_on_all cp ../.gitattributes ./folder1 &&
> +	git -C full-checkout add folder1/.gitattributes &&
> +	run_on_sparse git add --sparse folder1/.gitattributes &&

nit: 'git add --sparse' will work in 'full-checkout' - there's no need to
have separate calls for 'full-checkout' and the sparse checkouts. 

Also, please use 'test_(all|sparse)_match' when running Git commands in
these tests, even if they're not the command explicitly being tested. It
adds an extra level of verification to the test essentially "for free".
Conversely, using 'run_on_(all|sparse)' could conceal subtle issues in the
test or mask bugs in the implementation.

> +	run_on_all rm folder1/.gitattributes &&
> +	run_on_all  ../edit-contents folder1/a &&

nit: extra space between 'run_on_all' and '../edit-contents'.

> +	test_all_match test_must_fail git diff --check -- folder1/a

Because '.gitattributes' was added and 'folder1/a' exists on disk, the
'folder1/' sparse directory is "unsparsified" by the time of this check. As
a result, there's essentially no difference in how 'sparse-index' is handled
vs. 'sparse-checkout'. It would be nice to have this verify the attribute
parsing in a sparse directory; one way to do that would be something like:

----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 125b205b0d..183fce8531 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2300,11 +2300,12 @@ test_expect_success 'diff --check with pathspec outside sparse definition' '
        echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
        run_on_all mkdir -p folder1 &&
        run_on_all cp ../.gitattributes ./folder1 &&
-       git -C full-checkout add folder1/.gitattributes &&
-       run_on_sparse git add --sparse folder1/.gitattributes &&
-       run_on_all rm folder1/.gitattributes &&
-       run_on_all  ../edit-contents folder1/a &&
-       test_all_match test_must_fail git diff --check -- folder1/a
+       test_all_match git add --sparse folder1/.gitattributes &&
+       run_on_all ../edit-contents folder1/a &&
+       test_all_match git add --sparse folder1/a &&
+
+       test_sparse_match git sparse-checkout reapply &&
+       test_all_match test_must_fail git diff --check --cached -- folder1/a
 '
 
 test_expect_success 'sparse-index is not expanded: check-attr' '
----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----

The main differences from the current patch are:

1. adding 'folder1/a' to the index then "re-sparsifying" 'folder1/' with
   'git sparse-checkout reapply'.
2. using the '--cached' option to 'git diff' to compare "index vs. HEAD"
   rather than "working tree vs. index".

Finally, as a general point of feedback - all of the version of this series
so far have included some minor whitespace/stylistic issues (missing spaces
in expressions, double spaces, trailing whitespace, etc.). Please check your
patches carefully before submitting them to avoid excess re-rolls & get your
changes merged faster.

[1] https://lore.kernel.org/git/20230718232916.31660-3-cheskaqiqi@gmail.com/

> +'
> +
>  test_done


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 2/3] attr.c: read attributes in a sparse directory
  2023-07-18 23:29       ` [PATCH v4 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-07-20 20:18         ` Victoria Dye
  2023-08-03 16:22           ` Glen Choo
  0 siblings, 1 reply; 39+ messages in thread
From: Victoria Dye @ 2023-07-20 20:18 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> Before this patch, git check-attr was unable to read the attributes from
> a .gitattributes file within a sparse directory. The original comment
> was operating under the assumption that users are only interested in
> files or directories inside the cones. Therefore, in the original code,
> in the case of a cone-mode sparse-checkout, we didn't load the
> .gitattributes file.
> 
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
> 
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
> 
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
> 
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
> 
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().
> 
> Modify previous tests so such difference is not considered as an error.

I don't quite follow what you mean here "such difference". I see that you
changed the 'test_expect_failure' to 'test_expect_success', but that's
because the attributes inside a sparse directory are now being read rather
than ignored.

The rest of the commit message looks good to me, though.

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  attr.c                                   | 60 +++++++++++++++++-------
>  t/t1092-sparse-checkout-compatibility.sh |  4 +-
>  2 files changed, 44 insertions(+), 20 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..7650f5481a 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,59 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos = -1;
> +	char normalize_path[PATH_MAX];
> +	const char *relative_path;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> +	 * When handling sparse-checkouts, .gitattributes files
> +	 * may reside within a sparse directory. We distinguish
> +	 * whether a path exists directly in the index or not by
> +	 * evaluating if 'pos' is negative.
> +	 * If 'pos' is negative, the path is not directly present
> +	 * in the index and is likely within a sparse directory.
> +	 * For paths not in the index, The absolute value of 'pos'
> +	 * minus 1 gives us the position where the path would be
> +	 * inserted in lexicographic order within the index.
> +	 * We then subtract another 1 from this value
> +	 * (pos = -pos - 2) to find the position of the last
> +	 * index entry which is lexicographically smaller than
> +	 * the path. This would be the sparse directory containing
> +	 * the path. By identifying the sparse directory containing
> +	 * the path, we can correctly read the attributes specified
> +	 * in the .gitattributes file from the tree object of the
> +	 * sparse directory.
>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
> +		pos = index_name_pos_sparse(istate, path, strlen(path));
>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> +		if (pos < 0)
> +			pos = -pos - 2;
>  	}

What 'pos' represents after this block is somewhat confusing/possibly
misleading. Consider the possible cases for 'path':

1. 'path' is inside the sparse-checkout cone.
2. 'path' is not inside the sparse-checkout cone, but it is in the index
   (i.e., the sparse directory has been expanded for some reason).
3. 'path' is inside a sparse directory.

In case #1, 'pos' will be '-1'. We never enter the
'!path_in_cone_mode_sparse_checkout()' if-statement, so the value is never
updated.

In case #2, 'pos' will be positive, since it exists in the index.

In case #3, the return value of 'index_name_pos_sparse()' will be negative,
then assigned the value of '-pos - 2' to (in many cases) create a positive
value pointing to the entry before the insertion point of 'path'.

Based on your explanation in the code comment above, I would assume that, if
'pos' was >= 0 coming out of this block, it represented the index position
of the potential sparse directory containing 'path'. But that's not true in
case #2.

To make the purpose of these variables clearer, instead of changing 'pos'
in-place, you could create a variable like 'sparse_dir_pos'. Like 'pos' is
now, it'd be initialized to '-1', but instead of unconditionally assigning
it the output of 'index_name_pos_sparse()', you'd only assign it if the
output of that function is negative:

	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
		int pos = index_name_pos_sparse(istate, path, strlen(path));
		if (pos < 0)
			sparse_dir_pos = -pos - 2;
	}

Then, you'd use 'sparse_dir_pos' in the condition below, and it'd always be
-1 when 'path' is definitely not contained in a sparse directory.

>  
> -	return read_attr_from_buf(buf, path, flags);
> +	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&

nit: the check of '!path_in_cone_mode_sparse_checkout()' is redundant, since
'pos' will always be '-1' if 'path_in_cone_mode_sparse_checkout()' is true.

> +	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
> +	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
> +	    !normalize_path_copy(normalize_path, path)) {

The normalization should come before the 'strncmp' so that you're comparing
'normalize_path' to the index entry.  

That said, I know I asked about path normalization earlier [1], but did you
confirm that 'path' _isn't_ normalized? Because if it's normalized by all of
the potential callers of this function, the check here would be unnecessary.

[1] https://lore.kernel.org/git/e4a77d0f-cf1d-ef76-fe26-ad5e58372a02@github.com/

> +		relative_path = normalize_path + ce_namelen(istate->cache[pos]);

'relative_path' should be declared here, since it's not used outside this
block. 

> +		stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +	} else {
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	return stack;
>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 90633f383a..3f32c1f972 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2271,7 +2271,7 @@ test_expect_success 'check-attr with pathspec inside sparse definition' '
>  	test_all_match git check-attr -a --cached -- deep/a
>  '
>  
> -test_expect_failure 'check-attr with pathspec outside sparse definition' '
> +test_expect_success 'check-attr with pathspec outside sparse definition' '

Re: my suggested change to the test in patch 1 [2], when I applied _this_
patch, the test still failed for the 'sparse-index' case. It doesn't seem to
be a problem with your patch, but rather a bug in 'diff' that can be
reproduced with this test (using the infrastructure in t1092):

test_expect_failure 'diff --cached shows differences in sparse directory' '
	init_repos &&

	test_all_match git reset --soft update-folder1 &&
	test_all_match git diff --cached -- folder1/a
'

It's not immediately obvious to me what the problem is, but my guess is it's
some mis-handling of sparse directories in the internal diff machinery.
Given the likely complexity of the issue, I'd be content with you leaving
the 'diff --check' test as 'test_expect_failure' with a note about the bug
in 'diff' to fix later. Or, if you do want to investigate & fix it now, I
wouldn't be opposed to that either. :) 

[2] https://lore.kernel.org/git/c3ebe3b4-88b9-8ca2-2ee3-39a3e0d82201@github.com/

>  	init_repos &&
>  
>  	echo "a -crlf myAttr" >>.gitattributes &&
> @@ -2288,7 +2288,7 @@ test_expect_failure 'check-attr with pathspec outside sparse definition' '
>  	test_all_match git check-attr  -a --cached -- folder1/a
>  '
>  
> -test_expect_failure 'diff --check with pathspec outside sparse definition' '
> +test_expect_success 'diff --check with pathspec outside sparse definition' '
>  	init_repos &&
>  
>  	write_script edit-contents <<-\EOF &&


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 2/3] attr.c: read attributes in a sparse directory
  2023-07-20 20:18         ` Victoria Dye
@ 2023-08-03 16:22           ` Glen Choo
  2023-08-15  8:05             ` Shuqi Liang
  0 siblings, 1 reply; 39+ messages in thread
From: Glen Choo @ 2023-08-03 16:22 UTC (permalink / raw)
  To: Victoria Dye, Shuqi Liang, git; +Cc: gitster

Here's something odd that I spotted that I think other reviewers haven't
mentioned. That said, Victoria has already given quite extensive review
and I trust her judgement on this series, so if I accidentally end up
contradicting her, ignore me and trust her instead :)

Victoria Dye <vdye@github.com> writes:

>> -test_expect_failure 'check-attr with pathspec outside sparse definition' '
>> +test_expect_success 'check-attr with pathspec outside sparse definition' '
>
> Re: my suggested change to the test in patch 1 [2], when I applied _this_
> patch, the test still failed for the 'sparse-index' case. It doesn't seem to
> be a problem with your patch, but rather a bug in 'diff' that can be
> reproduced with this test (using the infrastructure in t1092):
>
> test_expect_failure 'diff --cached shows differences in sparse directory' '
> 	init_repos &&
>
> 	test_all_match git reset --soft update-folder1 &&
> 	test_all_match git diff --cached -- folder1/a
> '
>
> It's not immediately obvious to me what the problem is, but my guess is it's
> some mis-handling of sparse directories in the internal diff machinery.
> Given the likely complexity of the issue, I'd be content with you leaving
> the 'diff --check' test as 'test_expect_failure' with a note about the bug
> in 'diff' to fix later. Or, if you do want to investigate & fix it now, I
> wouldn't be opposed to that either. :) 
>
> [2] https://lore.kernel.org/git/c3ebe3b4-88b9-8ca2-2ee3-39a3e0d82201@github.com/

Because the 'diff --check' test is broken, and 'git check-attr' still
expands the index (as noted in the next patch), the code that implements
'read from a blob if the .gitattributes is not in the index' is not
exercised by the tests in this patch (it gets exercised in the next
patch). IOW, you can remove this logic and the tests still pass, like
so:

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
diff --git a/attr.c b/attr.c
index 1488b8e18a..abfa2078ac 100644
--- a/attr.c
+++ b/attr.c
@@ -23,6 +23,7 @@
 #include "thread-utils.h"
 #include "tree-walk.h"
 #include "object-name.h"
+#include "trace2.h"
 
 const char git_attr__true[] = "(builtin)true";
 const char git_attr__false[] = "\0(builtin)false";
@@ -847,9 +848,11 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate,
 	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
 	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
 	    !normalize_path_copy(normalize_path, path)) {
-		relative_path = normalize_path + ce_namelen(istate->cache[pos]);
-		stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
+		/* relative_path = normalize_path + ce_namelen(istate->cache[pos]); */
+		/* stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags); */
+		trace2_printf("Tried to read from blob");
 	} else {
+		trace2_printf("Tried to read from index");
 		buf = read_blob_data_from_index(istate, path, &size);
 		if (!buf)
 			return NULL;
----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

If I were writing patches and encountered this situation, I would squash
patches 2-3/3 together since both are closely related and quite small,
but I'll leave the decision to you + other reviewers.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 0/3] check-attr: integrate with sparse-index
  2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
                         ` (2 preceding siblings ...)
  2023-07-18 23:29       ` [PATCH v4 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-08-11 14:22       ` Shuqi Liang
  2023-08-11 14:22         ` [PATCH v5 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
                           ` (3 more replies)
  3 siblings, 4 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-08-11 14:22 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 11779 bytes --]

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

change against v4:

1/3:
* Add a commit message to explain why 'test_expect_failure' is set
and why 'test_expect_success' is set.

* Update 'diff --check with pathspec outside sparse definition' to
compare "index vs HEAD" rather than "working tree vs index".

* Use 'test_all_match' to apply the config to the test repos instead of
the parent .

* Use 'test_(all|sparse)_match' when running Git commands in
these tests.

2/3:
* Create a variable named 'sparse_dir_pos' to make the purpose of
variable clearer.

* Remove the redundant check of '!path_in_cone_mode_sparse_checkout()'
since 'pos' will always be '-1' if 'path_in_cone_mode_sparse_checkout()'
is true.

* Remove normalize path check because 'prefix_path'(builtin/check-attr.c)
call to 'normalize_path_copy_len' (path.c:1124). This confirms that the
path has indeed been normalized.

* Leave the 'diff --check' test as 'test_expect_failure' with a note about
the bug in 'diff' to fix later.


Shuqi Liang (3):
  t1092: add tests for 'git check-attr'
  attr.c: read attributes in a sparse directory
  check-attr: integrate with sparse-index

 attr.c                                   | 57 +++++++++++++------
 builtin/check-attr.c                     |  3 +
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 72 ++++++++++++++++++++++++
 4 files changed, 115 insertions(+), 18 deletions(-)

Range-diff against v4:
1:  9c43eea9cc ! 1:  78d0fc0df1 t1092: add tests for 'git check-attr'
    @@ Commit message
     
         Add a test named 'diff --check with pathspec outside sparse definition'.
         It starts by disabling the trailing whitespace and space-before-tab
    -    checks using the core.whitespace configuration option. Then, it
    +    checks using the core. whitespace configuration option. Then, it
         specifically re-enables the trailing whitespace check for a file located
    -    in a sparse directory. This is accomplished by adding a
    -    whitespace=trailing-space rule to the .gitattributes file within that
    -    directory. To ensure that only the .gitattributes file in the index is
    -    being read, and not any .gitattributes files in the working tree, the
    -    test removes the .gitattributes file from the working tree after adding
    -    it to the index. The final part of the test uses 'git diff --check' to
    -    verify the correct application of the attribute rules. This ensures that
    -    the .gitattributes file is correctly read from index and applied, even
    -    when the file's path falls outside of the sparse-checkout definition.
    +    in a sparse directory by adding a whitespace=trailing-space rule to the
    +    .gitattributes file within that directory. Next, create and populate the
    +    folder1 directory, and then add the .gitattributes file to the index.
    +    Edit the contents of folder1/a, add it to the index, and proceed to
    +    "re-sparsify" 'folder1/' with 'git sparse-checkout reapply'. Finally,
    +    use 'git diff --check --cached' to compare the 'index vs. HEAD',
    +    ensuring the correct application of the attribute rules even when the
    +    file's path is outside the sparse-checkout definition.
    +
    +    Mark the two tests 'check-attr with pathspec outside sparse definition'
    +    and 'diff --check with pathspec outside sparse definition' as
    +    'test_expect_failure' to reflect an existing issue where the attributes
    +    inside a sparse directory are ignored. Ensure that the 'check-attr'
    +    command fails to read the required attributes to demonstrate this
    +    expected failure.
     
         Helped-by: Victoria Dye <vdye@github.com>
         Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'worktree is not e
     +	test_all_match git check-attr -a -- folder1/a &&
     +
     +	git -C full-checkout add folder1/.gitattributes &&
    -+	run_on_sparse git add --sparse folder1/.gitattributes &&
    -+	run_on_all git commit -m "add .gitattributes" &&
    ++	test_sparse_match git add --sparse folder1/.gitattributes &&
    ++	test_all_match git commit -m "add .gitattributes" &&
     +	test_sparse_match git sparse-checkout reapply &&
    -+	test_all_match git check-attr  -a --cached -- folder1/a
    ++	test_all_match git check-attr -a --cached -- folder1/a
     +'
     +
     +test_expect_failure 'diff --check with pathspec outside sparse definition' '
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'worktree is not e
     +	echo "a " >"$1"
     +	EOF
     +
    -+	git config core.whitespace -trailing-space,-space-before-tab &&
    ++	test_all_match git config core.whitespace -trailing-space,-space-before-tab &&
     +
     +	echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
     +	run_on_all mkdir -p folder1 &&
     +	run_on_all cp ../.gitattributes ./folder1 &&
    -+	git -C full-checkout add folder1/.gitattributes &&
    -+	run_on_sparse git add --sparse folder1/.gitattributes &&
    -+	run_on_all rm folder1/.gitattributes &&
    -+	run_on_all  ../edit-contents folder1/a &&
    -+	test_all_match test_must_fail git diff --check -- folder1/a
    ++	test_all_match git add --sparse folder1/.gitattributes &&
    ++	run_on_all ../edit-contents folder1/a &&
    ++	test_all_match git add --sparse folder1/a &&
    ++
    ++	test_sparse_match git sparse-checkout reapply &&
    ++	test_all_match test_must_fail git diff --check --cached -- folder1/a
     +'
     +
      test_done
2:  63ff110b1c ! 2:  ef866930c6 attr.c: read attributes in a sparse directory
    @@ Commit message
         3.If path is not inside a sparse directory,ensure that attributes are
         fetched from the index blob with read_blob_data_from_index().
     
    -    Modify previous tests so such difference is not considered as an error.
    +    Change the test 'check-attr with pathspec outside sparse definition' to
    +    'test_expect_success' to reflect that the attributes inside a sparse
    +    directory can now be read. Ensure that the sparse index case works
    +    correctly for git check-attr to illustrate the successful handling of
    +    attributes within sparse directories.
     
         Helped-by: Victoria Dye <vdye@github.com>
         Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     +	struct attr_stack *stack = NULL;
      	char *buf;
      	unsigned long size;
    -+	int pos = -1;
    -+	char normalize_path[PATH_MAX];
    -+	const char *relative_path;
    ++	int sparse_dir_pos = -1;
      
      	if (!istate)
      		return NULL;
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     +	 * minus 1 gives us the position where the path would be
     +	 * inserted in lexicographic order within the index.
     +	 * We then subtract another 1 from this value
    -+	 * (pos = -pos - 2) to find the position of the last
    -+	 * index entry which is lexicographically smaller than
    ++	 * (sparse_dir_pos = -pos - 2) to find the position of the
    ++	 * last index entry which is lexicographically smaller than
     +	 * the path. This would be the sparse directory containing
     +	 * the path. By identifying the sparse directory containing
     +	 * the path, we can correctly read the attributes specified
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     -	if (!path_in_cone_mode_sparse_checkout(path, istate))
     -		return NULL;
     +	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
    -+		pos = index_name_pos_sparse(istate, path, strlen(path));
    ++		int pos = index_name_pos_sparse(istate, path, strlen(path));
      
     -	buf = read_blob_data_from_index(istate, path, &size);
     -	if (!buf)
    @@ attr.c: static struct attr_stack *read_attr_from_blob(struct index_state *istate
     -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
     -		return NULL;
     +		if (pos < 0)
    -+			pos = -pos - 2;
    ++			sparse_dir_pos = -pos - 2;
      	}
      
     -	return read_attr_from_buf(buf, path, flags);
    -+	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
    -+	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
    -+	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
    -+	    !normalize_path_copy(normalize_path, path)) {
    -+		relative_path = normalize_path + ce_namelen(istate->cache[pos]);
    -+		stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
    ++	if (sparse_dir_pos >= 0 &&
    ++	    S_ISSPARSEDIR(istate->cache[sparse_dir_pos]->ce_mode) &&
    ++	    !strncmp(istate->cache[sparse_dir_pos]->name, path, ce_namelen(istate->cache[sparse_dir_pos]))) {
    ++		const char *relative_path = path + ce_namelen(istate->cache[sparse_dir_pos]);
    ++		stack = read_attr_from_blob(istate, &istate->cache[sparse_dir_pos]->oid, relative_path, flags);
     +	} else {
     +		buf = read_blob_data_from_index(istate, path, &size);
     +		if (!buf)
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'check-attr with p
      
      	echo "a -crlf myAttr" >>.gitattributes &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'check-attr with pathspec outside sparse definition' '
    - 	test_all_match git check-attr  -a --cached -- folder1/a
    + 	test_all_match git check-attr -a --cached -- folder1/a
      '
      
    --test_expect_failure 'diff --check with pathspec outside sparse definition' '
    -+test_expect_success 'diff --check with pathspec outside sparse definition' '
    ++# NEEDSWORK: The 'diff --check' test is left as 'test_expect_failure' due
    ++# to an underlying issue in oneway_diff() within diff-lib.c.
    ++# 'do_oneway_diff()' is not called as expected for paths that could match
    ++# inside of a sparse directory. Specifically, the 'ce_path_match()' function
    ++# fails to recognize files inside a sparse directory (e.g., when 'folder1/'
    ++# is a sparse directory, 'folder1/a' cannot be recognized). The goal is to
    ++# proceed with 'do_oneway_diff()' if the pathspec could match inside of a
    ++# sparse directory.
    + test_expect_failure 'diff --check with pathspec outside sparse definition' '
      	init_repos &&
      
    - 	write_script edit-contents <<-\EOF &&
3:  7a9c2da30d ! 3:  310397de6d check-attr: integrate with sparse-index
    @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git diff-files -- $SPARSE_CO
      test_done
     
      ## t/t1092-sparse-checkout-compatibility.sh ##
    -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --check with pathspec outside sparse definition' '
    - 	test_all_match test_must_fail git diff --check -- folder1/a
    +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'diff --check with pathspec outside sparse definition' '
    + 	test_all_match test_must_fail git diff --check --cached -- folder1/a
      '
      
     +test_expect_success 'sparse-index is not expanded: check-attr' '
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --check with
     +	cp .gitattributes ./sparse-index/folder1 &&
     +
     +	git -C sparse-index add deep/.gitattributes &&
    -+	git -C sparse-index add --sparse  folder1/.gitattributes &&
    ++	git -C sparse-index add --sparse folder1/.gitattributes &&
     +	ensure_not_expanded check-attr -a --cached -- deep/a &&
     +	ensure_not_expanded check-attr -a --cached -- folder1/a
     +'
-- 
2.39.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v5 1/3] t1092: add tests for 'git check-attr'
  2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
@ 2023-08-11 14:22         ` Shuqi Liang
  2023-08-11 14:22         ` [PATCH v5 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-08-11 14:22 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Add tests for `git check-attr`, make sure attribute file does get read
from index when path is either inside or outside of sparse-checkout
definition.

Add a test named 'diff --check with pathspec outside sparse definition'.
It starts by disabling the trailing whitespace and space-before-tab
checks using the core. whitespace configuration option. Then, it
specifically re-enables the trailing whitespace check for a file located
in a sparse directory by adding a whitespace=trailing-space rule to the
.gitattributes file within that directory. Next, create and populate the
folder1 directory, and then add the .gitattributes file to the index.
Edit the contents of folder1/a, add it to the index, and proceed to
"re-sparsify" 'folder1/' with 'git sparse-checkout reapply'. Finally,
use 'git diff --check --cached' to compare the 'index vs. HEAD',
ensuring the correct application of the attribute rules even when the
file's path is outside the sparse-checkout definition.

Mark the two tests 'check-attr with pathspec outside sparse definition'
and 'diff --check with pathspec outside sparse definition' as
'test_expect_failure' to reflect an existing issue where the attributes
inside a sparse directory are ignored. Ensure that the 'check-attr'
command fails to read the required attributes to demonstrate this
expected failure.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 49 ++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8a95adf4b5..2d7fa65d81 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2259,4 +2259,53 @@ test_expect_success 'worktree is not expanded' '
 	ensure_not_expanded worktree remove .worktrees/hotfix
 '
 
+test_expect_success 'check-attr with pathspec inside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_all cp ../.gitattributes ./deep &&
+
+	test_all_match git check-attr -a -- deep/a &&
+
+	test_all_match git add deep/.gitattributes &&
+	test_all_match git check-attr -a --cached -- deep/a
+'
+
+test_expect_failure 'check-attr with pathspec outside sparse definition' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	run_on_sparse mkdir folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	run_on_all cp a folder1/a &&
+
+	test_all_match git check-attr -a -- folder1/a &&
+
+	git -C full-checkout add folder1/.gitattributes &&
+	test_sparse_match git add --sparse folder1/.gitattributes &&
+	test_all_match git commit -m "add .gitattributes" &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git check-attr -a --cached -- folder1/a
+'
+
+test_expect_failure 'diff --check with pathspec outside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo "a " >"$1"
+	EOF
+
+	test_all_match git config core.whitespace -trailing-space,-space-before-tab &&
+
+	echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
+	run_on_all mkdir -p folder1 &&
+	run_on_all cp ../.gitattributes ./folder1 &&
+	test_all_match git add --sparse folder1/.gitattributes &&
+	run_on_all ../edit-contents folder1/a &&
+	test_all_match git add --sparse folder1/a &&
+
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match test_must_fail git diff --check --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 2/3] attr.c: read attributes in a sparse directory
  2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
  2023-08-11 14:22         ` [PATCH v5 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
@ 2023-08-11 14:22         ` Shuqi Liang
  2023-08-11 14:22         ` [PATCH v5 3/3] check-attr: integrate with sparse-index Shuqi Liang
  2023-08-14 16:24         ` [PATCH v5 0/3] " Victoria Dye
  3 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-08-11 14:22 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Before this patch, git check-attr was unable to read the attributes from
a .gitattributes file within a sparse directory. The original comment
was operating under the assumption that users are only interested in
files or directories inside the cones. Therefore, in the original code,
in the case of a cone-mode sparse-checkout, we didn't load the
.gitattributes file.

However, this behavior can lead to missing attributes for files inside
sparse directories, causing inconsistencies in file handling.

To resolve this, revise 'git check-attr' to allow attribute reading for
files in sparse directories from the corresponding .gitattributes files:

1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
to check if a path falls within a sparse directory.

2.If path is inside a sparse directory, employ the value of
index_name_pos_sparse() to find the sparse directory containing path and
path relative to sparse directory. Proceed to read attributes from the
tree OID of the sparse directory using read_attr_from_blob().

3.If path is not inside a sparse directory,ensure that attributes are
fetched from the index blob with read_blob_data_from_index().

Change the test 'check-attr with pathspec outside sparse definition' to
'test_expect_success' to reflect that the attributes inside a sparse
directory can now be read. Ensure that the sparse index case works
correctly for git check-attr to illustrate the successful handling of
attributes within sparse directories.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 attr.c                                   | 57 ++++++++++++++++--------
 t/t1092-sparse-checkout-compatibility.sh | 10 ++++-
 2 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/attr.c b/attr.c
index 7d39ac4a29..1d34e48ea2 100644
--- a/attr.c
+++ b/attr.c
@@ -808,35 +808,56 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
 static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path, unsigned flags)
 {
+	struct attr_stack *stack = NULL;
 	char *buf;
 	unsigned long size;
+	int sparse_dir_pos = -1;
 
 	if (!istate)
 		return NULL;
 
 	/*
-	 * The .gitattributes file only applies to files within its
-	 * parent directory. In the case of cone-mode sparse-checkout,
-	 * the .gitattributes file is sparse if and only if all paths
-	 * within that directory are also sparse. Thus, don't load the
-	 * .gitattributes file since it will not matter.
-	 *
-	 * In the case of a sparse index, it is critical that we don't go
-	 * looking for a .gitattributes file, as doing so would cause the
-	 * index to expand.
+	 * When handling sparse-checkouts, .gitattributes files
+	 * may reside within a sparse directory. We distinguish
+	 * whether a path exists directly in the index or not by
+	 * evaluating if 'pos' is negative.
+	 * If 'pos' is negative, the path is not directly present
+	 * in the index and is likely within a sparse directory.
+	 * For paths not in the index, The absolute value of 'pos'
+	 * minus 1 gives us the position where the path would be
+	 * inserted in lexicographic order within the index.
+	 * We then subtract another 1 from this value
+	 * (sparse_dir_pos = -pos - 2) to find the position of the
+	 * last index entry which is lexicographically smaller than
+	 * the path. This would be the sparse directory containing
+	 * the path. By identifying the sparse directory containing
+	 * the path, we can correctly read the attributes specified
+	 * in the .gitattributes file from the tree object of the
+	 * sparse directory.
 	 */
-	if (!path_in_cone_mode_sparse_checkout(path, istate))
-		return NULL;
+	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
+		int pos = index_name_pos_sparse(istate, path, strlen(path));
 
-	buf = read_blob_data_from_index(istate, path, &size);
-	if (!buf)
-		return NULL;
-	if (size >= ATTR_MAX_FILE_SIZE) {
-		warning(_("ignoring overly large gitattributes blob '%s'"), path);
-		return NULL;
+		if (pos < 0)
+			sparse_dir_pos = -pos - 2;
 	}
 
-	return read_attr_from_buf(buf, path, flags);
+	if (sparse_dir_pos >= 0 &&
+	    S_ISSPARSEDIR(istate->cache[sparse_dir_pos]->ce_mode) &&
+	    !strncmp(istate->cache[sparse_dir_pos]->name, path, ce_namelen(istate->cache[sparse_dir_pos]))) {
+		const char *relative_path = path + ce_namelen(istate->cache[sparse_dir_pos]);
+		stack = read_attr_from_blob(istate, &istate->cache[sparse_dir_pos]->oid, relative_path, flags);
+	} else {
+		buf = read_blob_data_from_index(istate, path, &size);
+		if (!buf)
+			return NULL;
+		if (size >= ATTR_MAX_FILE_SIZE) {
+			warning(_("ignoring overly large gitattributes blob '%s'"), path);
+			return NULL;
+		}
+		stack = read_attr_from_buf(buf, path, flags);
+	}
+	return stack;
 }
 
 static struct attr_stack *read_attr(struct index_state *istate,
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2d7fa65d81..dc84b3e2e1 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2271,7 +2271,7 @@ test_expect_success 'check-attr with pathspec inside sparse definition' '
 	test_all_match git check-attr -a --cached -- deep/a
 '
 
-test_expect_failure 'check-attr with pathspec outside sparse definition' '
+test_expect_success 'check-attr with pathspec outside sparse definition' '
 	init_repos &&
 
 	echo "a -crlf myAttr" >>.gitattributes &&
@@ -2288,6 +2288,14 @@ test_expect_failure 'check-attr with pathspec outside sparse definition' '
 	test_all_match git check-attr -a --cached -- folder1/a
 '
 
+# NEEDSWORK: The 'diff --check' test is left as 'test_expect_failure' due
+# to an underlying issue in oneway_diff() within diff-lib.c.
+# 'do_oneway_diff()' is not called as expected for paths that could match
+# inside of a sparse directory. Specifically, the 'ce_path_match()' function
+# fails to recognize files inside a sparse directory (e.g., when 'folder1/'
+# is a sparse directory, 'folder1/a' cannot be recognized). The goal is to
+# proceed with 'do_oneway_diff()' if the pathspec could match inside of a
+# sparse directory.
 test_expect_failure 'diff --check with pathspec outside sparse definition' '
 	init_repos &&
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 3/3] check-attr: integrate with sparse-index
  2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
  2023-08-11 14:22         ` [PATCH v5 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
  2023-08-11 14:22         ` [PATCH v5 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
@ 2023-08-11 14:22         ` Shuqi Liang
  2023-08-14 16:24         ` [PATCH v5 0/3] " Victoria Dye
  3 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-08-11 14:22 UTC (permalink / raw)
  To: git; +Cc: Shuqi Liang, vdye, gitster

Set the requires-full-index to false for "check-attr".

Add a test to ensure that the index is not expanded whether the files
are outside or inside the sparse-checkout cone when the sparse index is
enabled.

The `p2000` tests demonstrate a ~63% execution time reduction for
'git check-attr' using a sparse index.

Test                                            before  after
-----------------------------------------------------------------------
2000.106: git check-attr -a f2/f4/a (full-v3)    0.05   0.05 +0.0%
2000.107: git check-attr -a f2/f4/a (full-v4)    0.05   0.05 +0.0%
2000.108: git check-attr -a f2/f4/a (sparse-v3)  0.04   0.02 -50.0%
2000.109: git check-attr -a f2/f4/a (sparse-v4)  0.04   0.01 -75.0%

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
---
 builtin/check-attr.c                     |  3 +++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index b22ff748c3..c1da1d184e 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -122,6 +122,9 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, check_attr_options,
 			     check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (repo_read_index(the_repository) < 0) {
 		die("invalid cache");
 	}
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 96ed3e1d69..39e92b0841 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -134,5 +134,6 @@ test_perf_on_all git diff-files -- $SPARSE_CONE/a
 test_perf_on_all git diff-tree HEAD
 test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
 test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
+test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index dc84b3e2e1..2a4f35e984 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2316,4 +2316,19 @@ test_expect_failure 'diff --check with pathspec outside sparse definition' '
 	test_all_match test_must_fail git diff --check --cached -- folder1/a
 '
 
+test_expect_success 'sparse-index is not expanded: check-attr' '
+	init_repos &&
+
+	echo "a -crlf myAttr" >>.gitattributes &&
+	mkdir ./sparse-index/folder1 &&
+	cp ./sparse-index/a ./sparse-index/folder1/a &&
+	cp .gitattributes ./sparse-index/deep &&
+	cp .gitattributes ./sparse-index/folder1 &&
+
+	git -C sparse-index add deep/.gitattributes &&
+	git -C sparse-index add --sparse folder1/.gitattributes &&
+	ensure_not_expanded check-attr -a --cached -- deep/a &&
+	ensure_not_expanded check-attr -a --cached -- folder1/a
+'
+
 test_done
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/3] check-attr: integrate with sparse-index
  2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
                           ` (2 preceding siblings ...)
  2023-08-11 14:22         ` [PATCH v5 3/3] check-attr: integrate with sparse-index Shuqi Liang
@ 2023-08-14 16:24         ` Victoria Dye
  2023-08-14 17:10           ` Junio C Hamano
  3 siblings, 1 reply; 39+ messages in thread
From: Victoria Dye @ 2023-08-14 16:24 UTC (permalink / raw)
  To: Shuqi Liang, git; +Cc: gitster

Shuqi Liang wrote:
> change against v4:

I've reviewed the patches in this version and all of my prior feedback
appears to be addressed. Overall, I think this is ready to merge. 

I see that you didn't take the suggestion from [1], though. I personally
don't consider it a blocking issue, but I am curious to hear your
thoughts/reasoning behind sticking with your current patch organization over
what was suggested there.

[1] https://lore.kernel.org/git/kl6la5v82izn.fsf@chooglen-macbookpro.roam.corp.google.com/

Otherwise, a couple notes:

> 
> 1/3:
> * Add a commit message to explain why 'test_expect_failure' is set
> and why 'test_expect_success' is set.
> 
> * Update 'diff --check with pathspec outside sparse definition' to
> compare "index vs HEAD" rather than "working tree vs index".
> 
> * Use 'test_all_match' to apply the config to the test repos instead of
> the parent .
> 
> * Use 'test_(all|sparse)_match' when running Git commands in
> these tests.
> 
> 2/3:
> * Create a variable named 'sparse_dir_pos' to make the purpose of
> variable clearer.
> 
> * Remove the redundant check of '!path_in_cone_mode_sparse_checkout()'
> since 'pos' will always be '-1' if 'path_in_cone_mode_sparse_checkout()'
> is true.
> 
> * Remove normalize path check because 'prefix_path'(builtin/check-attr.c)
> call to 'normalize_path_copy_len' (path.c:1124). This confirms that the
> path has indeed been normalized.

Nice, thanks for looking into this! I'm glad we're able to avoid the
normalization, it simplifies the code quite a bit.

> 
> * Leave the 'diff --check' test as 'test_expect_failure' with a note about
> the bug in 'diff' to fix later.

Makes sense. The extra detail added in the "NEEDSWORK" comment is especially
helpful in pointing out which part of the diff machinery causes the issue.

> 
> 
> Shuqi Liang (3):
>   t1092: add tests for 'git check-attr'
>   attr.c: read attributes in a sparse directory
>   check-attr: integrate with sparse-index


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/3] check-attr: integrate with sparse-index
  2023-08-14 16:24         ` [PATCH v5 0/3] " Victoria Dye
@ 2023-08-14 17:10           ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-08-14 17:10 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Shuqi Liang, git

Victoria Dye <vdye@github.com> writes:

> Shuqi Liang wrote:
>> change against v4:
>
> I've reviewed the patches in this version and all of my prior feedback
> appears to be addressed. Overall, I think this is ready to merge. 
>
> I see that you didn't take the suggestion from [1], though. I personally
> don't consider it a blocking issue, but I am curious to hear your
> thoughts/reasoning behind sticking with your current patch organization over
> what was suggested there.
>
> [1] https://lore.kernel.org/git/kl6la5v82izn.fsf@chooglen-macbookpro.roam.corp.google.com/
>
> Otherwise, a couple notes:

Thanks for a review Victoria, and Shuqi, thanks for working on the
topic.

I too am curious what your response to [1] would be, by the way.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 2/3] attr.c: read attributes in a sparse directory
  2023-08-03 16:22           ` Glen Choo
@ 2023-08-15  8:05             ` Shuqi Liang
  0 siblings, 0 replies; 39+ messages in thread
From: Shuqi Liang @ 2023-08-15  8:05 UTC (permalink / raw)
  To: Glen Choo, git, Junio C Hamano, Victoria Dye

Hi Glen,

I just realized I missed your previous email – my apologies for the delay!
Really appreciate your insights on the patches!

I'm trying to wrap my head around the points you've made. If I'm reading
you right, you're suggesting that in my patch2, the logic for 'read from
a blob if the .gitattributes is not in the index' isn't being executed?
You think the index is still being expanded and hence this part of the
code doesn't influence the test results. In particular, you're referring
to the 'check-attr with pathspec outside sparse definition' and
'diff --check with pathspec outside sparse definition' tests, right?

I tried out the code changes you shared, but the tests didn’t pass for
me. In the original setup, even with the expanded index, the base code
couldn't read the attributes from files within a sparse directory.
So, I'm inclined to think that the modifications in patch2 have a direct
bearing on whether the tests pass or fail.

Would love to hear more about your thoughts on this. Thanks again for
diving deep into the patches!

On Fri, Aug 4, 2023 at 12:22 AM Glen Choo <chooglen@google.com> wrote:
>
> Here's something odd that I spotted that I think other reviewers haven't
> mentioned. That said, Victoria has already given quite extensive review
> and I trust her judgement on this series, so if I accidentally end up
> contradicting her, ignore me and trust her instead :)
>
> Victoria Dye <vdye@github.com> writes:
>
> >> -test_expect_failure 'check-attr with pathspec outside sparse definition' '
> >> +test_expect_success 'check-attr with pathspec outside sparse definition' '
> >
> > Re: my suggested change to the test in patch 1 [2], when I applied _this_
> > patch, the test still failed for the 'sparse-index' case. It doesn't seem to
> > be a problem with your patch, but rather a bug in 'diff' that can be
> > reproduced with this test (using the infrastructure in t1092):
> >
> > test_expect_failure 'diff --cached shows differences in sparse directory' '
> >       init_repos &&
> >
> >       test_all_match git reset --soft update-folder1 &&
> >       test_all_match git diff --cached -- folder1/a
> > '
> >
> > It's not immediately obvious to me what the problem is, but my guess is it's
> > some mis-handling of sparse directories in the internal diff machinery.
> > Given the likely complexity of the issue, I'd be content with you leaving
> > the 'diff --check' test as 'test_expect_failure' with a note about the bug
> > in 'diff' to fix later. Or, if you do want to investigate & fix it now, I
> > wouldn't be opposed to that either. :)
> >
> > [2] https://lore.kernel.org/git/c3ebe3b4-88b9-8ca2-2ee3-39a3e0d82201@github.com/
>
> Because the 'diff --check' test is broken, and 'git check-attr' still
> expands the index (as noted in the next patch), the code that implements
> 'read from a blob if the .gitattributes is not in the index' is not
> exercised by the tests in this patch (it gets exercised in the next
> patch). IOW, you can remove this logic and the tests still pass, like
> so:
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
> diff --git a/attr.c b/attr.c
> index 1488b8e18a..abfa2078ac 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -23,6 +23,7 @@
>  #include "thread-utils.h"
>  #include "tree-walk.h"
>  #include "object-name.h"
> +#include "trace2.h"
>
>  const char git_attr__true[] = "(builtin)true";
>  const char git_attr__false[] = "\0(builtin)false";
> @@ -847,9 +848,11 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate,
>             S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
>             !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
>             !normalize_path_copy(normalize_path, path)) {
> -               relative_path = normalize_path + ce_namelen(istate->cache[pos]);
> -               stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +               /* relative_path = normalize_path + ce_namelen(istate->cache[pos]); */
> +               /* stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags); */
> +               trace2_printf("Tried to read from blob");
>         } else {
> +               trace2_printf("Tried to read from index");
>                 buf = read_blob_data_from_index(istate, path, &size);
>                 if (!buf)
>                         return NULL;
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>
> If I were writing patches and encountered this situation, I would squash
> patches 2-3/3 together since both are closely related and quite small,
> but I'll leave the decision to you + other reviewers.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2023-08-15  8:07 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-01  6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-01  6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-03 17:59   ` Victoria Dye
2023-07-01  6:48 ` [PATCH v1 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-03 18:11   ` Victoria Dye
2023-07-01  6:48 ` [PATCH v1 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-03 18:21   ` Victoria Dye
2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
2023-07-07 15:18   ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
2023-07-07 23:15     ` Junio C Hamano
2023-07-07 15:18   ` [PATCH v2 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-07 15:18   ` [PATCH v2 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-11 21:15       ` Junio C Hamano
2023-07-11 22:08         ` Junio C Hamano
2023-07-13 20:22           ` Shuqi Liang
2023-07-13 20:13         ` Shuqi Liang
2023-07-11 21:24       ` Victoria Dye
2023-07-11 13:30     ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-11 18:52       ` Junio C Hamano
2023-07-11 20:47       ` Victoria Dye
2023-07-11 13:30     ` [PATCH v3 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-11 20:07       ` Junio C Hamano
2023-07-11 16:56     ` [PATCH v3 0/3] " Junio C Hamano
2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
2023-07-18 23:29       ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
2023-07-20 18:43         ` Victoria Dye
2023-07-18 23:29       ` [PATCH v4 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-20 20:18         ` Victoria Dye
2023-08-03 16:22           ` Glen Choo
2023-08-15  8:05             ` Shuqi Liang
2023-07-18 23:29       ` [PATCH v4 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
2023-08-11 14:22         ` [PATCH v5 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
2023-08-11 14:22         ` [PATCH v5 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-08-11 14:22         ` [PATCH v5 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-08-14 16:24         ` [PATCH v5 0/3] " Victoria Dye
2023-08-14 17:10           ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).