All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
To: shaoxuan.yuan02@gmail.com
Cc: derrickstolee@github.com, vdye@github.com, git@vger.kernel.org,
	gitster@pobox.com
Subject: [PATCH v5 0/3] grep: integrate with sparse index
Date: Wed,  7 Sep 2022 17:18:51 -0700	[thread overview]
Message-ID: <20220908001854.206789-1-shaoxuan.yuan02@gmail.com> (raw)
In-Reply-To: <20220817075633.217934-1-shaoxuan.yuan02@gmail.com>

Integrate `git-grep` with sparse-index and test the performance
improvement.

Changes since v4
----------------
* Reset the length of `struct strbuf name` back to `name_base_len`,
  instead of 0, after `grep_tree()` returns.

* Add test cases in t1092 for `grep` recursing into submodules.

* Add a few NEEDSWORK to explain the current problem with submodules.

Changes since v3
----------------
* Shorten the perf result tables in commit message.

* Update the commit message to reflect the changes in the commit.

* Update the commit message to indicate the performance improvement
  is dependent on the pathspec.

* Stop passing `ce_mode` through `check_attr`. Instead, set the
  `base_len` to 0 to make the code more reasonable and less abuse of
  `check_attr`.

* Remove another invention of `base`. Use the existing `name` as the
  argument for `grep_tree()`, and reset it back to `ce->name` after
  `grep_tree()` returns.

* Update the p2000 test to use a more general pathspec for better
  compatibility (i.e. do not use git repository specific pathspec).

* Add tests to t1092 'grep is not expanded' to verify the change
  brought by "builtin/grep.c: walking tree instead of expanding index
  with --sparse": the index *never* expands.

Changes since v2
----------------

* Modify the commit message for "builtin/grep.c: integrate with sparse
  index" to make it obvious that the perf test results are not from
  p2000 tests, but from manual perf runs.

* Add tree-walking logic as an extra (the third) patch to improve the
  performance when --sparse is used. This resolved the left-over-bit
  in v2 [1].

[1] https://lore.kernel.org/git/20220829232843.183711-1-shaoxuan.yuan02@gmail.com/

Changes since v1
----------------

* Rewrite the commit message for "builtin/grep.c: add --sparse option"
  to be clearer.

* Update the documentation (both in-code and man page) for --sparse.

* Add a few tests to test the new behavior (when _only_ --cached is
  supplied).

* Reformat the perf test results to not look like directly from p2000
  tests.

* Put the "command_requires_full_index" lines right after parse_options().

* Add a pathspec test in t1092, and reword a few test documentations.

Shaoxuan Yuan (3):
  builtin/grep.c: add --sparse option
  builtin/grep.c: integrate with sparse index
  builtin/grep.c: walking tree instead of expanding index with --sparse

 Documentation/git-grep.txt               |  5 +-
 builtin/grep.c                           | 58 +++++++++++++++++--
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 72 ++++++++++++++++++++++++
 t/t7817-grep-sparse-checkout.sh          | 34 +++++++++--
 5 files changed, 159 insertions(+), 11 deletions(-)

Range-diff against v4:
1:  00a8b3a68e = 1:  c3d33e487c builtin/grep.c: add --sparse option
2:  3e0786722c = 2:  c5366f51b8 builtin/grep.c: integrate with sparse index
3:  81afe2fcb3 ! 3:  52bb802eae builtin/grep.c: walking tree instead of expanding index with --sparse
    @@ Commit message
                 2000.80: git grep ... (sparse-v3)   6.57   6.44 (≈)
                 2000.81: git grep ... (sparse-v4)   6.65   6.28 (≈)
     
    +    --------------------------
    +    NEEDSWORK about submodules
    +
    +    There are a few NEEDSWORKs that belong to improvements beyond this
    +    topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for
    +    more context. The other two NEEDSWORKs in t1092 are also relative.
    +
         Suggested-by: Derrick Stolee <derrickstolee@github.com>
         Helped-by: Derrick Stolee <derrickstolee@github.com>
         Helped-by: Victoria Dye <vdye@github.com>
         Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
     
      ## builtin/grep.c ##
    +@@ builtin/grep.c: static int grep_submodule(struct grep_opt *opt,
    + 	 * subrepo's odbs to the in-memory alternates list.
    + 	 */
    + 	obj_read_lock();
    ++
    ++	/*
    ++	 * NEEDSWORK: when reading a submodule, the sparsity settings in the
    ++	 * superproject are incorrectly forgotten or misused. For example:
    ++	 *
    ++	 * 1. "command_requires_full_index"
    ++	 * 	When this setting is turned on for `grep`, only the superproject
    ++	 *	knows it. All the submodules are read with their own configs
    ++	 *	and get prepare_repo_settings()'d. Therefore, these submodules
    ++	 *	"forget" the sparse-index feature switch. As a result, the index
    ++	 *	of these submodules are expanded unexpectedly.
    ++	 *
    ++	 * 2. "core_apply_sparse_checkout"
    ++	 *	When running `grep` in the superproject, this setting is
    ++	 *	populated using the superproject's configs. However, once
    ++	 *	initialized, this config is globally accessible and is read by
    ++	 *	prepare_repo_settings() for the submodules. For instance, if a
    ++	 *	submodule is using a sparse-checkout, however, the superproject
    ++	 *	is not, the result is that the config from the superproject will
    ++	 *	dictate the behavior for the submodule, making it "forget" its
    ++	 *	sparse-checkout state.
    ++	 *
    ++	 * 3. "core_sparse_checkout_cone"
    ++	 *	ditto.
    ++	 *
    ++	 * Note that this list is not exhaustive.
    ++	 */
    + 	repo_read_gitmodules(subrepo, 0);
    + 
    + 	/*
     @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt,
      	if (repo_read_index(repo) < 0)
      		die(_("index file corrupt"));
    @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt,
      
     -		if (S_ISREG(ce->ce_mode) &&
     +			hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
    -+			strbuf_reset(&name);
    ++			strbuf_setlen(&name, name_base_len);
     +			strbuf_addstr(&name, ce->name);
     +			free(data);
     +		} else if (S_ISREG(ce->ce_mode) &&
    @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git read-tree -mu HEAD
      test_done
     
      ## t/t1092-sparse-checkout-compatibility.sh ##
    +@@ t/t1092-sparse-checkout-compatibility.sh: init_repos () {
    + 	git -C sparse-index sparse-checkout set deep
    + }
    + 
    ++init_repos_as_submodules () {
    ++	git reset --hard &&
    ++	init_repos &&
    ++	git submodule add ./full-checkout &&
    ++	git submodule add ./sparse-checkout &&
    ++	git submodule add ./sparse-index &&
    ++
    ++	git submodule status >actual &&
    ++	grep full-checkout actual &&
    ++	grep sparse-checkout actual &&
    ++	grep sparse-index actual
    ++}
    ++
    + run_on_sparse () {
    + 	(
    + 		cd sparse-checkout &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'grep is not expanded' '
      
      	# All files within the folder1/* pathspec are sparse,
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'grep is not expan
     +	# test in-cone pathspec with or without wildcard
     +	ensure_not_expanded grep --sparse --cached a -- "deep/a" &&
     +	ensure_not_expanded grep --sparse --cached a -- "deep/*"
    ++'
    ++
    ++# NEEDSWORK: when running `grep` in the superproject with --recurse-submodules,
    ++# Git expands the index of the submodules unexpectedly. Even though `grep`
    ++# builtin is marked as "command_requires_full_index = 0", this config is only
    ++# useful for the superproject. Namely, the submodules have their own configs,
    ++# which are _not_ populated by the one-time sparse-index feature switch.
    ++test_expect_failure 'grep within submodules is not expanded' '
    ++	init_repos_as_submodules &&
    ++
    ++	# do not use ensure_not_expanded() here, becasue `grep` should be
    ++	# run in the superproject, not in "./sparse-index"
    ++	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    ++	git grep --sparse --cached --recurse-submodules a -- "*/folder1/*" &&
    ++	test_region ! index ensure_full_index trace2.txt
    ++'
    ++
    ++# NEEDSWORK: this test is not actually testing the code. The design purpose
    ++# of this test is to verify the grep result when the submodules are using a
    ++# sparse-index. Namely, we want "folder1/" as a tree (a sparse directory); but
    ++# because of the index expansion, we are now grepping the "folder1/a" blob.
    ++# Because of the problem stated above 'grep within submodules is not expanded',
    ++# we don't have the ideal test environment yet.
    ++test_expect_success 'grep sparse directory within submodules' '
    ++	init_repos_as_submodules &&
    ++
    ++	cat >expect <<-\EOF &&
    ++	full-checkout/folder1/a:a
    ++	sparse-checkout/folder1/a:a
    ++	sparse-index/folder1/a:a
    ++	EOF
    ++	git grep --sparse --cached --recurse-submodules a -- "*/folder1/*" >actual &&
    ++	test_cmp actual expect
      '
      
      test_done

base-commit: 79f2338b3746d23454308648b2491e5beba4beff
-- 
2.37.0


  parent reply	other threads:[~2022-09-08  0:19 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17  7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12   ` Derrick Stolee
2022-08-17 17:13     ` Junio C Hamano
2022-08-17 17:34       ` Victoria Dye
2022-08-17 17:43         ` Derrick Stolee
2022-08-17 18:47           ` Junio C Hamano
2022-08-17 17:37     ` Elijah Newren
2022-08-24 18:20     ` Shaoxuan Yuan
2022-08-24 19:08       ` Derrick Stolee
2022-08-17  7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23   ` Derrick Stolee
2022-08-24 21:06     ` Shaoxuan Yuan
2022-08-25  0:39       ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45     ` Derrick Stolee
2022-09-01  4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03     ` Derrick Stolee
2022-09-01 18:31       ` Shaoxuan Yuan
2022-09-01 17:17     ` Junio C Hamano
2022-09-01 17:27       ` Junio C Hamano
2022-09-01 22:49         ` Shaoxuan Yuan
2022-09-01 22:36       ` Shaoxuan Yuan
2022-09-02  3:28     ` Victoria Dye
2022-09-02 18:47       ` Shaoxuan Yuan
2022-09-03  0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03  4:39     ` Junio C Hamano
2022-09-08  0:24       ` Shaoxuan Yuan
2022-09-08  0:18 ` Shaoxuan Yuan [this message]
2022-09-08  0:18   ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10  1:07     ` Victoria Dye
2022-09-14  6:08     ` Elijah Newren
2022-09-15  2:57       ` Junio C Hamano
2022-09-18  2:14         ` Elijah Newren
2022-09-18 19:52           ` Victoria Dye
2022-09-19  1:23             ` Junio C Hamano
2022-09-19  4:27             ` Shaoxuan Yuan
2022-09-19 11:03             ` Ævar Arnfjörð Bjarmason
2022-09-20  7:13             ` Elijah Newren
2022-09-17  3:34       ` Shaoxuan Yuan
2022-09-18  4:24         ` Elijah Newren
2022-09-19  4:13           ` Shaoxuan Yuan
2022-09-17  3:45       ` Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59     ` Junio C Hamano
2022-09-08 20:46       ` Derrick Stolee
2022-09-08 20:56         ` Junio C Hamano
2022-09-08 21:06           ` Shaoxuan Yuan
2022-09-09 12:49           ` Derrick Stolee
2022-09-13 17:23         ` Junio C Hamano
2022-09-10  2:04     ` Victoria Dye
2022-09-23  4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23  4:18   ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40     ` Junio C Hamano
2022-09-23 16:58     ` Junio C Hamano
2022-09-26 17:28       ` Junio C Hamano
2022-09-23 14:13   ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01   ` Victoria Dye
2022-09-23 17:08     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220908001854.206789-1-shaoxuan.yuan02@gmail.com \
    --to=shaoxuan.yuan02@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.