From: Mathieu Desnoyers via lttng-dev <lttng-dev@lists.lttng.org> To: paulmck <paulmck@kernel.org> Cc: Stephen Hemminger <stephen@networkplumber.org>, Alan Stern <stern@rowland.harvard.edu>, Lai Jiangshan <jiangshanlai@gmail.com>, lttng-dev <lttng-dev@lists.lttng.org>, linux-kernel <linux-kernel@vger.kernel.org> Subject: Re: [lttng-dev] [PATCH] call_rcu: Fix race between rcu_barrier() and call_rcu_data_free() Date: Mon, 26 Oct 2020 15:58:11 -0400 (EDT) [thread overview] Message-ID: <1576751762.38206.1603742291604.JavaMail.zimbra@efficios.com> (raw) In-Reply-To: <20201022223021.GA8535@paulmck-ThinkPad-P72> ----- On Oct 22, 2020, at 6:30 PM, paulmck paulmck@kernel.org wrote: > The current code can lose RCU callbacks at shutdown time, which can > result in hangs. This lossage can happen as follows: > > o A thread invokes call_rcu_data_free(), which executes up through > the wake_call_rcu_thread(). At this point, the call_rcu_data > structure has been drained of callbacks, but is still on the > call_rcu_data_list. Note that this thread does not hold the > call_rcu_mutex. > > o Another thread invokes rcu_barrier(), which traverses the > call_rcu_data_list under the protection of call_rcu_mutex, > a list which still includes the above newly drained structure. > This thread therefore adds a callback to the newly drained > call_rcu_data structure. It then releases call_rcu_mutex and > enters a mystifying loop that does futex stuff. > > o The first thread finishes executing call_rcu_data_free(), > which acquires call_rcu_mutex just long enough to remove the > newly drained call_rcu_data structure from call_rcu_data_list. > Which causes one of the rcu_barrier() invocation's callbacks to > be leaked. > > o The second thread's rcu_barrier() invocation never returns > resulting in a hang. > > This commit therefore changes call_rcu_data_free() to acquire > call_rcu_mutex before checking the call_rcu_data structure for callbacks. > In the case where there are no callbacks, call_rcu_mutex is held across > both the check and the removal from call_rcu_data_list, thus preventing > rcu_barrier() from adding a callback in the meantime. In the case where > there are callbacks, call_rcu_mutex must be momentarily dropped across > the call to get_default_call_rcu_data(), which can itself acquire > call_rcu_mutex. This momentary drop is not a problem because any > callbacks that rcu_barrier() might queue during that period of time will > be moved to the default call_rcu_data structure, and the lock will be > held across the full time including moving those callbacks and removing > the call_rcu_data structure that was passed into call_rcu_data_free() > from call_rcu_data_list. > > With this fix, a several-hundred-CPU test successfully completes more > than 5,000 executions. Without this fix, it fails within a few tens > of executions. Although the failures happen more quickly on larger > systems, in theory this could happen on a single-CPU system, courtesy > of preemption. I agree with this fix, will merge in liburcu master, stable-0.12, and stable-2.11. Out of curiosity, which test is hanging ? Is it a test which is part of the liburcu tree or some out-of-tree test ? I wonder why we did not catch it in our CI [1]. Thanks, Mathieu [1] https://ci.lttng.org/view/Liburcu/ > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Cc: Stephen Hemminger <stephen@networkplumber.org> > Cc: Alan Stern <stern@rowland.harvard.edu> > Cc: Lai Jiangshan <jiangshanlai@gmail.com> > Cc: <lttng-dev@lists.lttng.org> > Cc: <linux-kernel@vger.kernel.org> > > --- > > urcu-call-rcu-impl.h | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h > index b6ec6ba..18fd65a 100644 > --- a/src/urcu-call-rcu-impl.h > +++ b/src/urcu-call-rcu-impl.h > @@ -772,9 +772,13 @@ void call_rcu_data_free(struct call_rcu_data *crdp) > while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0) > (void) poll(NULL, 0, 1); > } > + call_rcu_lock(&call_rcu_mutex); > if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) { > - /* Create default call rcu data if need be */ > + call_rcu_unlock(&call_rcu_mutex); > + /* Create default call rcu data if need be. */ > + /* CBs queued here will be handed to the default list. */ > (void) get_default_call_rcu_data(); > + call_rcu_lock(&call_rcu_mutex); > __cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head, > &default_call_rcu_data->cbs_tail, > &crdp->cbs_head, &crdp->cbs_tail); > @@ -783,7 +787,6 @@ void call_rcu_data_free(struct call_rcu_data *crdp) > wake_call_rcu_thread(default_call_rcu_data); > } > > - call_rcu_lock(&call_rcu_mutex); > cds_list_del(&crdp->list); > call_rcu_unlock(&call_rcu_mutex); -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
next prev parent reply other threads:[~2020-10-26 19:58 UTC|newest] Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-10-22 22:30 Paul E. McKenney via lttng-dev 2020-10-26 19:58 ` Mathieu Desnoyers via lttng-dev [this message] 2020-10-26 20:53 ` Paul E. McKenney via lttng-dev 2020-10-27 14:47 ` Mathieu Desnoyers via lttng-dev
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1576751762.38206.1603742291604.JavaMail.zimbra@efficios.com \ --to=lttng-dev@lists.lttng.org \ --cc=jiangshanlai@gmail.com \ --cc=linux-kernel@vger.kernel.org \ --cc=mathieu.desnoyers@efficios.com \ --cc=paulmck@kernel.org \ --cc=stephen@networkplumber.org \ --cc=stern@rowland.harvard.edu \ --subject='Re: [lttng-dev] [PATCH] call_rcu: Fix race between rcu_barrier() and call_rcu_data_free()' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).