From: Glen Choo <chooglen@google.com>
To: Calvin Wan <calvinwan@google.com>, git@vger.kernel.org
Cc: Calvin Wan <calvinwan@google.com>,
	avarab@gmail.com, newren@gmail.com, jonathantanmy@google.com,
	phillip.wood123@gmail.com
Subject: Re: [PATCH v8 6/6] diff-lib: parallelize run_diff_files for submodules
Date: Mon, 13 Feb 2023 16:36:18 +0800
Message-ID: <kl6lh6vqarnh.fsf@chooglen-macbookpro.roam.corp.google.com>
In-Reply-To: <20230209000212.1892457-7-calvinwan@google.com>

Calvin Wan <calvinwan@google.com> writes:

> @@ -244,6 +266,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			newmode = ce->ce_mode;
>  		} else {
>  			struct stat st;
> +			unsigned ignore_untracked = 0;
> +			int defer_submodule_status = 1;
>  
>  			changed = check_removed(istate, ce, &st);
>  			if (changed) {

Previously [1] it wasn't entirely clear whether we intended to always
parallelize submodule diffing, but now it seems that we always try to
parallelize. In essence, this means that we don't have a serial
implementation any more, but maybe that's okay.

[1] https://lore.kernel.org/git/kl6lilgtveoe.fsf@chooglen-macbookpro.roam.corp.google.com/

> @@ -265,14 +289,53 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			}
>  
>  			changed = match_stat_with_submodule(&revs->diffopt, ce, &st,
> -							    ce_option, &dirty_submodule);
> +							    ce_option, &dirty_submodule,
> +							    &defer_submodule_status,
> +							    &ignore_untracked);

Here we get the 'changed' bit of the submodule. Because we always defer,
we never call is_submodule_modified() inside
match_stat_with_submodule() and as such, we never set "dirty_submodule"
here. If so, could we remove the variable altogether?

>  			newmode = ce_mode_from_stat(ce, st.st_mode);
> +			if (defer_submodule_status) {
> +				struct submodule_status_util tmp = {
> +					.changed = changed,
> +					.dirty_submodule = 0,
> +					.ignore_untracked = ignore_untracked,
> +					.newmode = newmode,
> +					.ce = ce,
> +					.path = ce->name,
> +				};
> +				struct string_list_item *item;
> +
> +				item = string_list_append(&submodules, ce->name);
> +				item->util = xmalloc(sizeof(tmp));
> +				memcpy(item->util, &tmp, sizeof(tmp));
> +				continue;
> +			}
>  		}
>  
>  		if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule,
>  				       changed, istate, ce))

I'm surprised to see that we still call "diff_change_helper()" even
though we've 'deferred' the submodule diff, especially since "changed"
is set and "dirty_submodule" is unset. Even if this is safe, I think we
shouldn't do this because...

> +	if (submodules.nr) {
> +		unsigned long parallel_jobs;
> +		struct string_list_item *item;
> +
> +		if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs))
> +			parallel_jobs = 1;
> +		else if (!parallel_jobs)
> +			parallel_jobs = online_cpus();
> +
> +		if (get_submodules_status(&submodules, parallel_jobs))
> +			die(_("submodule status failed"));
> +		for_each_string_list_item(item, &submodules) {
> +			struct submodule_status_util *util = item->util;
> +
> +			if (diff_change_helper(&revs->diffopt, util->newmode,
> +				       util->dirty_submodule, util->changed,
> +				       istate, util->ce))

Here we call "diff_change_helper()" again on the deferred submodule, but
now with the "dirty_submodule" value we expected. At best this is
wasteful, but at worst this is possibly wrong.

For good measure, I applied this patch to see if we needed either
"dirty_submodule" or the second "diff_change_helper()" call; our
test suite still passes after I remove both of them.

  diff --git a/diff-lib.c b/diff-lib.c
  index 2dde575ec6..21adcc7fd6 100644
  --- a/diff-lib.c
  +++ b/diff-lib.c
  @@ -156,6 +156,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
      struct cache_entry *ce = istate->cache[i];
      int changed;
      unsigned dirty_submodule = 0;
  +		int defer_submodule_status = 1;

      if (diff_can_quit_early(&revs->diffopt))
        break;
  @@ -267,7 +268,6 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
      } else {
        struct stat st;
        unsigned ignore_untracked = 0;
  -			int defer_submodule_status = 1;

        changed = check_removed(istate, ce, &st);
        if (changed) {
  @@ -311,9 +311,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
        }
      }

  -		if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule,
  -				       changed, istate, ce))
  -			continue;
  +		if (!defer_submodule_status)
  +			diff_change_helper(&revs->diffopt, newmode, 0,
  +					   changed, istate, ce);
    }
    if (submodules.nr) {
      unsigned long parallel_jobs;
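
As a rough illustration of the shape argued for above (this is not code
from the patch), the sketch below queues submodule entries in a first
pass and reports each of them exactly once, after a batched status
step. All names here (struct entry, struct deferred, compute_status(),
diff_entries()) are hypothetical stand-ins, and the status step is a
serial placeholder for whatever get_submodules_status() would do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 64 /* arbitrary bound for this sketch */

struct entry {
	const char *name;
	int is_submodule;
	int changed;
};

struct deferred {
	const struct entry *e;
	unsigned dirty;
};

/* Hypothetical stand-in for the batched (possibly parallel) status step. */
static void compute_status(struct deferred *d, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		d[i].dirty = d[i].e->changed ? 1u : 0u;
}

/* Returns how many entries were reported as changed. */
static size_t diff_entries(const struct entry *entries, size_t nr)
{
	struct deferred queue[MAX_DEFERRED];
	size_t queued = 0, reported = 0;

	/* Pass 1: report ordinary entries now, queue submodules for later. */
	for (size_t i = 0; i < nr; i++) {
		if (entries[i].is_submodule) {
			assert(queued < MAX_DEFERRED);
			queue[queued].e = &entries[i];
			queue[queued].dirty = 0;
			queued++;
			continue; /* deferred: do not report yet */
		}
		if (entries[i].changed)
			reported++;
	}

	/* Pass 2: fill in the status, then report each submodule once. */
	compute_status(queue, queued);
	for (size_t i = 0; i < queued; i++)
		if (queue[i].e->changed || queue[i].dirty)
			reported++;

	return reported;
}
```

The point of the "continue" in the first pass is that a deferred entry
never reaches the immediate reporting path, so there is only one place
where its final "dirty" value is consumed.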


> +static void parse_status_porcelain_strbuf(struct strbuf *buf,
> +				   unsigned *dirty_submodule,
> +				   int ignore_untracked)
> +{
> +	struct string_list list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *item;
> +
> +	string_list_split(&list, buf->buf, '\n', -1);
> +
> +	for_each_string_list_item(item, &list) {
> +		if (parse_status_porcelain(item->string,
> +					   strlen(item->string),
> +					   dirty_submodule,
> +					   ignore_untracked))
> +			break;
> +	}
> +	string_list_clear(&list, 0);
> +}

Given that this function has only one caller, is quite simple, and isn't
really a strbuf version of "parse_status_porcelain()" (it's a multiline
version that also happens to accept a strbuf), I think this might be
better inlined.
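
To make the inlining suggestion concrete, here is a minimal sketch of
walking the buffer line by line at the call site, without building a
string_list first. It is not taken from the patch: parse_line() is a
simplified, hypothetical stand-in for parse_status_porcelain(), and the
DIRTY_* values are placeholders rather than Git's real flags. The only
behavior assumed from the quoted code is that the per-line parser
returns nonzero when the caller can stop early.

```c
#include <assert.h>
#include <string.h>

#define DIRTY_MODIFIED  (1u << 0) /* placeholder flag values */
#define DIRTY_UNTRACKED (1u << 1)

/*
 * Hypothetical per-line parser: a line starting with '?' counts as
 * untracked, any other non-empty line as modified. Returns nonzero
 * once no further line could change the result.
 */
static int parse_line(const char *line, size_t len,
		      unsigned *dirty, int ignore_untracked)
{
	if (len && line[0] == '?') {
		if (!ignore_untracked)
			*dirty |= DIRTY_UNTRACKED;
	} else if (len) {
		*dirty |= DIRTY_MODIFIED;
	}
	return (*dirty & DIRTY_MODIFIED) &&
	       (ignore_untracked || (*dirty & DIRTY_UNTRACKED));
}

/* Inlined multiline loop: no intermediate list, no per-line copies. */
static unsigned parse_status_output(const char *buf, int ignore_untracked)
{
	unsigned dirty = 0;
	const char *line = buf;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (parse_line(line, len, &dirty, ignore_untracked))
			break;
		line += len;
		if (*line == '\n')
			line++;
	}
	return dirty;
}
```

Compared with string_list_split(), this avoids duplicating every line
just to iterate over it, at the cost of a few lines of pointer
arithmetic in the caller.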

> +test_expect_success 'status in superproject with submodules (parallel)' '
> +	git -C super status --porcelain >output &&
> +	git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel &&
> +	diff output output_parallel
> +'
> +
>  test_done

When I first suggested this test, I thought that we would sometimes
defer submodule status and sometimes not, so this would be a good way to
compare the two modes. Now it is less useful, since it only checks that
parallelism > 1 doesn't affect the output, but it's still a useful
reasonableness check IMO. Thanks.
