Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>,
	gitster@pobox.com, Philippe Blain <levraiphilippeblain@gmail.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
	gitster@pobox.com, jrnieder@gmail.com, olyatelezhnaya@gmail.com,
	pclouds@gmail.com, jonathantanmy@google.com, peff@peff.net,
	Brandon Williams <bwilliams.eng@gmail.com>,
	Stefan Beller <stefanbeller@gmail.com>
Subject: Re: [PATCH v3 08/12] grep: allow submodule functions to run in parallel
Date: Wed, 29 Jan 2020 12:26:13 +0100
Message-ID: <20200129112613.GE10482@szeder.dev> (raw)
In-Reply-To: <af8ad95d413aa3d763769eb3ae9544e25ccbe2d1.1579141989.git.matheus.bernardino@usp.br>

Junio, Matheus, Philippe,

this patch below and a7f3240877 (grep: ignore --recurse-submodules if
--no-index is given, 2020-01-26) on topic
'pb/do-not-recurse-grep-no-index' don't work well together, and cause
failure of the test 'grep --recurse-submodules --no-index ignores
--recurse-submodules' in 't7814-grep-recurse-submodules.sh', i.e. in
the new test added in a7f3240877.

More below.

On Wed, Jan 15, 2020 at 11:39:56PM -0300, Matheus Tavares wrote:
> Now that object reading operations are internally protected, the
> submodule initialization functions at builtin/grep.c:grep_submodule()
> are very close to being thread-safe. Let's take a look at each call and
> remove from the critical section what we can, for better performance:
> 
> - submodule_from_path() and is_submodule_active() cannot be called in
>   parallel yet only because they call repo_read_gitmodules() which
>   contains, in its call stack, operations that would otherwise be in
>   race condition with object reading (for example parse_object() and
>   is_promisor_remote()). However, they only call repo_read_gitmodules()
>   if it wasn't read before. So let's pre-read it before firing the
>   threads and allow these two functions to safely be called in
>   parallel.
> 
> - repo_submodule_init() is already thread-safe, so remove it from the
>   critical section without other necessary changes.
> 
> - The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as
>   no other thread is performing object reading operations in the subrepo
>   yet. However, threads might be working in the superproject, and this
>   function calls add_to_alternates_memory() internally, which is racy
>   with object readings in the superproject. So it must be kept
>   protected for now. Let's add a "NEEDSWORK" to it, informing why it
>   cannot be removed from the critical section yet.
> 
> - Finally, add_to_alternates_memory() must be kept protected for the
>   same reason as the item above.
> 
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  builtin/grep.c | 38 ++++++++++++++++++++++----------------
>  1 file changed, 22 insertions(+), 16 deletions(-)
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index d3ed05c1da..ac3d86c2e5 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -401,25 +401,23 @@ static int grep_submodule(struct grep_opt *opt,
>  	struct grep_opt subopt;
>  	int hit;
>  
> -	/*
> -	 * NEEDSWORK: submodules functions need to be protected because they
> -	 * call config_from_gitmodules(): the latter contains in its call stack
> -	 * many thread-unsafe operations that are racy with object reading, such
> -	 * as parse_object() and is_promisor_object().
> -	 */
> -	obj_read_lock();
>  	sub = submodule_from_path(superproject, &null_oid, path);
>  
> -	if (!is_submodule_active(superproject, path)) {
> -		obj_read_unlock();
> +	if (!is_submodule_active(superproject, path))
>  		return 0;
> -	}
>  
> -	if (repo_submodule_init(&subrepo, superproject, sub)) {
> -		obj_read_unlock();
> +	if (repo_submodule_init(&subrepo, superproject, sub))
>  		return 0;
> -	}
>  
> +	/*
> +	 * NEEDSWORK: repo_read_gitmodules() might call
> +	 * add_to_alternates_memory() via config_from_gitmodules(). This
> +	 * operation causes a race condition with concurrent object readings
> +	 * performed by the worker threads. That's why we need obj_read_lock()
> +	 * here. It should be removed once it's no longer necessary to add the
> +	 * subrepo's odbs to the in-memory alternates list.
> +	 */
> +	obj_read_lock();
>  	repo_read_gitmodules(&subrepo, 0);
>  
>  	/*
> @@ -1052,6 +1050,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	pathspec.recursive = 1;
>  	pathspec.recurse_submodules = !!recurse_submodules;
>  
> +	if (recurse_submodules && (!use_index || untracked))
> +		die(_("option not supported with --recurse-submodules"));

So this patch moves this condition here, expecting git to die with 
'--recurse-submodules --no-index'.  However, a7f3240877 removes the
'!use_index' part of the condition, so we won't die here ...

>  	if (list.nr || cached || show_in_pager) {
>  		if (num_threads > 1)
>  			warning(_("invalid option combination, ignoring --threads"));
> @@ -1071,6 +1072,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		    && (opt.pre_context || opt.post_context ||
>  			opt.file_break || opt.funcbody))
>  			skip_first_line = 1;
> +
> +		/*
> +		 * Pre-read gitmodules (if not read already) to prevent racy
> +		 * lazy reading in worker threads.
> +		 */
> +		if (recurse_submodules)
> +			repo_read_gitmodules(the_repository, 1);

... and eventually reach this condition, which then reads the
submodules even with '--no-index', which is just what a7f3240877 tried
to avoid, thus triggering the test failure.

It might be that all we need is changing this condition to:

  if (recurse_submodules && use_index)

Or maybe not, but this change on top of 'pu' makes t7814 succeed
again.

However, I'm not familiar with the intricacies of either threaded grep
or submodules, much less the combination of the two...  so just an
idea.

>  		start_threads(&opt);
>  	} else {
>  		/*
> @@ -1105,9 +1114,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  
> -	if (recurse_submodules && (!use_index || untracked))
> -		die(_("option not supported with --recurse-submodules"));
> -
>  	if (!show_in_pager && !opt.status_only)
>  		setup_pager();
>  
> -- 
> 2.24.1
> 

  reply index

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-10 20:27 [GSoC][PATCH 0/4] grep: re-enable threads when cached, w/ parallel inflation Matheus Tavares
2019-09-30  1:50 ` [PATCH v2 00/11] grep: improve threading and fix race conditions Matheus Tavares
2020-01-16  2:39   ` [PATCH v3 00/12] " Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 01/12] grep: fix race conditions on userdiff calls Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 02/12] grep: fix race conditions at grep_submodule() Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 03/12] grep: fix racy calls in grep_objects() Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 04/12] replace-object: make replace operations thread-safe Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 05/12] object-store: allow threaded access to object reading Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 06/12] grep: replace grep_read_mutex by internal obj read lock Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 07/12] submodule-config: add skip_if_read option to repo_read_gitmodules() Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 08/12] grep: allow submodule functions to run in parallel Matheus Tavares
2020-01-29 11:26       ` SZEDER Gábor [this message]
2020-01-29 18:49         ` Junio C Hamano
2020-01-29 18:57         ` Junio C Hamano
2020-01-29 20:42           ` Matheus Tavares Bernardino
2020-01-30 13:28             ` Philippe Blain
2020-01-16  2:39     ` [PATCH v3 09/12] grep: protect packed_git [re-]initialization Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 10/12] grep: re-enable threads in non-worktree case Matheus Tavares
2020-01-16  2:39     ` [PATCH v3 11/12] grep: move driver pre-load out of critical section Matheus Tavares
2020-01-16  2:40     ` [PATCH v3 12/12] grep: use no. of cores as the default no. of threads Matheus Tavares
2020-01-16 13:11       ` Victor Leschuk
2020-01-16 14:47         ` [PATCH] " Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200129112613.GE10482@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=bwilliams.eng@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=levraiphilippeblain@gmail.com \
    --cc=matheus.bernardino@usp.br \
    --cc=olyatelezhnaya@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git