From: "Paul E. McKenney" <paulmck@kernel.org>
To: mathieu.desnoyers@efficios.com
Cc: stephen@networkplumber.org, stern@rowland.harvard.edu,
jiangshanlai@gmail.com, lttng-dev@lists.lttng.org,
linux-kernel@vger.kernel.org
Subject: [PATCH] call_rcu: Fix race between rcu_barrier() and call_rcu_data_free()
Date: Thu, 22 Oct 2020 15:30:21 -0700 [thread overview]
Message-ID: <20201022223021.GA8535@paulmck-ThinkPad-P72> (raw)
The current code can lose RCU callbacks at shutdown time, which can
result in hangs. This lossage can happen as follows:
o A thread invokes call_rcu_data_free(), which executes up through
the wake_call_rcu_thread(). At this point, the call_rcu_data
structure has been drained of callbacks, but is still on the
call_rcu_data_list. Note that this thread does not hold the
call_rcu_mutex.
o Another thread invokes rcu_barrier(), which traverses the
call_rcu_data_list under the protection of call_rcu_mutex,
a list which still includes the above newly drained structure.
This thread therefore adds a callback to the newly drained
call_rcu_data structure. It then releases call_rcu_mutex and
enters a mystifying loop that does futex stuff.
o The first thread finishes executing call_rcu_data_free(),
which acquires call_rcu_mutex just long enough to remove the
newly drained call_rcu_data structure from call_rcu_data_list.
Which causes one of the rcu_barrier() invocation's callbacks to
be leaked.
o The second thread's rcu_barrier() invocation never returns
resulting in a hang.
This commit therefore changes call_rcu_data_free() to acquire
call_rcu_mutex before checking the call_rcu_data structure for callbacks.
In the case where there are no callbacks, call_rcu_mutex is held across
both the check and the removal from call_rcu_data_list, thus preventing
rcu_barrier() from adding a callback in the meantime. In the case where
there are callbacks, call_rcu_mutex must be momentarily dropped across
the call to get_default_call_rcu_data(), which can itself acquire
call_rcu_mutex. This momentary drop is not a problem because any
callbacks that rcu_barrier() might queue during that period of time will
be moved to the default call_rcu_data structure, and the lock will be
held across the full time including moving those callbacks and removing
the call_rcu_data structure that was passed into call_rcu_data_free()
from call_rcu_data_list.
With this fix, a several-hundred-CPU test successfully completes more
than 5,000 executions. Without this fix, it fails within a few tens
of executions. Although the failures happen more quickly on larger
systems, in theory this could happen on a single-CPU system, courtesy
of preemption.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: <lttng-dev@lists.lttng.org>
Cc: <linux-kernel@vger.kernel.org>
---
urcu-call-rcu-impl.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
index b6ec6ba..18fd65a 100644
--- a/src/urcu-call-rcu-impl.h
+++ b/src/urcu-call-rcu-impl.h
@@ -772,9 +772,13 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0)
(void) poll(NULL, 0, 1);
}
+ call_rcu_lock(&call_rcu_mutex);
if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) {
- /* Create default call rcu data if need be */
+ call_rcu_unlock(&call_rcu_mutex);
+ /* Create default call rcu data if need be. */
+ /* CBs queued here will be handed to the default list. */
(void) get_default_call_rcu_data();
+ call_rcu_lock(&call_rcu_mutex);
__cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head,
&default_call_rcu_data->cbs_tail,
&crdp->cbs_head, &crdp->cbs_tail);
@@ -783,7 +787,6 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
wake_call_rcu_thread(default_call_rcu_data);
}
- call_rcu_lock(&call_rcu_mutex);
cds_list_del(&crdp->list);
call_rcu_unlock(&call_rcu_mutex);
next reply other threads:[~2020-10-22 22:30 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-22 22:30 Paul E. McKenney [this message]
2020-10-26 19:58 ` [PATCH] call_rcu: Fix race between rcu_barrier() and call_rcu_data_free() Mathieu Desnoyers
2020-10-26 20:53 ` Paul E. McKenney
2020-10-27 14:47 ` Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201022223021.GA8535@paulmck-ThinkPad-P72 \
--to=paulmck@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lttng-dev@lists.lttng.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=stephen@networkplumber.org \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).