lttng-dev.lists.lttng.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers via lttng-dev <lttng-dev@lists.lttng.org>
To: paulmck <paulmck@kernel.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	 Alan Stern <stern@rowland.harvard.edu>,
	 Lai Jiangshan <jiangshanlai@gmail.com>,
	 lttng-dev <lttng-dev@lists.lttng.org>,
	 linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [lttng-dev] [PATCH] call_rcu: Fix race between rcu_barrier() and call_rcu_data_free()
Date: Mon, 26 Oct 2020 15:58:11 -0400 (EDT)	[thread overview]
Message-ID: <1576751762.38206.1603742291604.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20201022223021.GA8535@paulmck-ThinkPad-P72>

----- On Oct 22, 2020, at 6:30 PM, paulmck paulmck@kernel.org wrote:

> The current code can lose RCU callbacks at shutdown time, which can
> result in hangs.  This lossage can happen as follows:
> 
> o       A thread invokes call_rcu_data_free(), which executes up through
>        the wake_call_rcu_thread().  At this point, the call_rcu_data
>        structure has been drained of callbacks, but is still on the
>        call_rcu_data_list.  Note that this thread does not hold the
>        call_rcu_mutex.
> 
> o       Another thread invokes rcu_barrier(), which traverses the
>        call_rcu_data_list under the protection of call_rcu_mutex,
>        a list which still includes the above newly drained structure.
>        This thread therefore adds a callback to the newly drained
>        call_rcu_data structure.  It then releases call_rcu_mutex and
>        enters a mystifying loop that does futex stuff.
> 
> o       The first thread finishes executing call_rcu_data_free(),
>        which acquires call_rcu_mutex just long enough to remove the
>        newly drained call_rcu_data structure from call_rcu_data_list.
>        Which causes one of the rcu_barrier() invocation's callbacks to
>        be leaked.
> 
> o       The second thread's rcu_barrier() invocation never returns
>        resulting in a hang.
> 
> This commit therefore changes call_rcu_data_free() to acquire
> call_rcu_mutex before checking the call_rcu_data structure for callbacks.
> In the case where there are no callbacks, call_rcu_mutex is held across
> both the check and the removal from call_rcu_data_list, thus preventing
> rcu_barrier() from adding a callback in the meantime.  In the case where
> there are callbacks, call_rcu_mutex must be momentarily dropped across
> the call to get_default_call_rcu_data(), which can itself acquire
> call_rcu_mutex.  This momentary drop is not a problem because any
> callbacks that rcu_barrier() might queue during that period of time will
> be moved to the default call_rcu_data structure, and the lock will be
> held across the full time including moving those callbacks and removing
> the call_rcu_data structure that was passed into call_rcu_data_free()
> from call_rcu_data_list.
> 
> With this fix, a several-hundred-CPU test successfully completes more
> than 5,000 executions.  Without this fix, it fails within a few tens
> of executions.  Although the failures happen more quickly on larger
> systems, in theory this could happen on a single-CPU system, courtesy
> of preemption.

I agree with this fix, will merge in liburcu master, stable-0.12, and stable-2.11.
Out of curiosity, which test is hanging ?  Is it a test which is part of the liburcu
tree or some out-of-tree test ? I wonder why we did not catch it in our CI [1].

Thanks,

Mathieu

[1] https://ci.lttng.org/view/Liburcu/

> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: <lttng-dev@lists.lttng.org>
> Cc: <linux-kernel@vger.kernel.org>
> 
> ---
> 
> urcu-call-rcu-impl.h |    7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
> index b6ec6ba..18fd65a 100644
> --- a/src/urcu-call-rcu-impl.h
> +++ b/src/urcu-call-rcu-impl.h
> @@ -772,9 +772,13 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> 		while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0)
> 			(void) poll(NULL, 0, 1);
> 	}
> +	call_rcu_lock(&call_rcu_mutex);
> 	if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) {
> -		/* Create default call rcu data if need be */
> +		call_rcu_unlock(&call_rcu_mutex);
> +		/* Create default call rcu data if need be. */
> +		/* CBs queued here will be handed to the default list. */
> 		(void) get_default_call_rcu_data();
> +		call_rcu_lock(&call_rcu_mutex);
> 		__cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head,
> 			&default_call_rcu_data->cbs_tail,
> 			&crdp->cbs_head, &crdp->cbs_tail);
> @@ -783,7 +787,6 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> 		wake_call_rcu_thread(default_call_rcu_data);
> 	}
> 
> -	call_rcu_lock(&call_rcu_mutex);
> 	cds_list_del(&crdp->list);
>  	call_rcu_unlock(&call_rcu_mutex);

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

  reply	other threads:[~2020-10-26 19:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-22 22:30 Paul E. McKenney via lttng-dev
2020-10-26 19:58 ` Mathieu Desnoyers via lttng-dev [this message]
2020-10-26 20:53   ` Paul E. McKenney via lttng-dev
2020-10-27 14:47     ` Mathieu Desnoyers via lttng-dev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1576751762.38206.1603742291604.JavaMail.zimbra@efficios.com \
    --to=lttng-dev@lists.lttng.org \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=stephen@networkplumber.org \
    --cc=stern@rowland.harvard.edu \
    --subject='Re: [lttng-dev] [PATCH] call_rcu: Fix race between rcu_barrier() and call_rcu_data_free()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).