git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it
Date: Thu, 06 Sep 2018 11:03:30 -0700	[thread overview]
Message-ID: <xmqqbm9a1p1p.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180904230149.180332-7-sbeller@google.com> (Stefan Beller's message of "Tue, 4 Sep 2018 16:01:44 -0700")

Stefan Beller <sbeller@google.com> writes:

> Instead of sorting it after we created an unsorted list, we could insert
> correctly into the list.

It is unclear what problem you are solving, especially with
subjunctive "could" there.  We are creating an unsorted list and
then sorting it and you see it as a problem because it is just as
easy and efficient to do the insertion sort while building up the
list?  (don't react and answer without reading all the way to the
end; I think I know what is going on).

> As the unsorted append is in order of cache entry
> names, this is already sorted if names were equal to paths for submodules.

That may be a statement of a fact, but it is unclear how that fact
relates to what the code is doing before this patch, or what the
code updated by this patch is doing.

> As submodule names are often the same as their path, the input is sorted
> pretty well already, so let's just do the sort afterwards.

It is unclear what (performance?) trade-off this senttence is trying
to make.  It sounds as if it is claiming this:

	We can string_list_insert() to maintain sorted-ness of the
	list as we find new items, or we can string_list_append() to
	build an unsorted list and sort it at the end just once.

	To pick which one is more appropriate, we notice the fact
	that we discover new items more or less in the already
	sorted order.  That makes "append then sort" more
	appropriate.

But is that reasoning sensible?  

I'd imagine that append-then-sort would always be more efficient
than insert-at-the-right-place-as-we-go, and the reason why we
sometimes would need to do the latter is when we need to look up
elements in the middle of building the list (e.g. we may want to
de-dup, which requires us to look up a name from the ones we
collected so far).

And in this application, calculate_changed_submodule_paths()
discover paths by calling collect_changed_submodules() which finds a
mapping <submodule name, oid of commits> into a list sorted by
submodule name, and then goes through that list and builds a list of
submodules paths (which could be different from submodule names) by
appending.  Only after this list is fully built, get_next_submodule()
gets called, so making the latter use string_list_lookup() that assumes
a sorted list is safe if we built the list by append-then-sort (iow,
sortedness while building the list does not matter).

Having analysed all that, I find it somewhat iffy that _append() is
used there in the calculate_changed_submodule_paths() function.  It
would cause the resulting changed_submodule_names list to hold the
same name twice (or more), but I do not know if that would pose a
problem to the consumer of the list.  Using "accumulate then sort
before calling look-up" would not change it as string_list_sort()
would not dedup, so I do not think this patch would introduce a new
problem, though.



> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index 8345d423fda..8bee455137a 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1255,7 +1255,7 @@ static int get_next_submodule(struct child_process *cp,
>  		default:
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
> -			if (!submodule || !unsorted_string_list_lookup(
> +			if (!submodule || !string_list_lookup(
>  					&changed_submodule_names,
>  					submodule->name))
>  				continue;
> @@ -1349,6 +1349,7 @@ int fetch_populated_submodules(struct repository *r,
>  	/* default value, "--submodule-prefix" and its value are added later */
>  
>  	calculate_changed_submodule_paths();
> +	string_list_sort(&changed_submodule_names);
>  	run_processes_parallel(max_parallel_jobs,
>  			       get_next_submodule,
>  			       fetch_start_failure,

  reply	other threads:[~2018-09-06 18:03 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
2018-09-06 16:52   ` Junio C Hamano
2018-09-06 16:56     ` Jeff King
2018-09-06 22:16       ` Stefan Beller
2018-09-07  0:04         ` Jeff King
2018-09-07  9:53           ` Junio C Hamano
2018-09-07 17:21             ` Stefan Beller
2018-09-10 21:58               ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
2018-09-10 21:58                 ` [PATCH 2/2] string-list: remove unused function print_string_list Stefan Beller
2018-09-10 22:32                 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
2018-09-10 22:38                   ` Stefan Beller
2018-09-11  0:52                     ` Junio C Hamano
2018-09-11  3:08                       ` Stefan Beller
2018-09-11 21:03                         ` Jeff King
2018-09-11 18:48                       ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
2018-09-11 19:27                         ` Junio C Hamano
2018-09-11 19:30                           ` Junio C Hamano
2018-09-11 19:47                             ` Stefan Beller
2018-09-11 20:53                               ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
2018-09-06 17:10   ` Junio C Hamano
2018-09-06 22:29     ` Stefan Beller
2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
2018-09-06 17:20   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
2018-09-06 17:31   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
2018-09-06 17:34   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-06 18:03   ` Junio C Hamano [this message]
2018-09-11 18:31     ` Stefan Beller
2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-06 18:04   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
2018-09-06 18:20   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-04 23:01 ` [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqbm9a1p1p.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).