From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org,
"Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <quic_neeraju@quicinc.com>,
Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>
Cc: Zqiang <qiang1.zhang@intel.com>, rcu@vger.kernel.org
Subject: [PATCH 1/9] rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
Date: Wed, 15 Mar 2023 19:43:41 +0000 [thread overview]
Message-ID: <20230315194349.10798-1-joel@joelfernandes.org> (raw)
From: Zqiang <qiang1.zhang@intel.com>
For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
in the scheduling-clock interrupt remaining enabled on a holdout CPU after
its quiescent state has been reported:
CPU1 CPU2
rcu_report_exp_cpu_mult synchronize_rcu_expedited_wait
acquires rnp->lock mask = rnp->expmask;
for_each_leaf_node_cpu_mask(rnp, cpu, mask)
rnp->expmask = rnp->expmask & ~mask; rdp = per_cpu_ptr(&rcu_data, cpu1);
for_each_leaf_node_cpu_mask(rnp, cpu, mask)
rdp = per_cpu_ptr(&rcu_data, cpu1);
if (!rdp->rcu_forced_tick_exp)
continue; rdp->rcu_forced_tick_exp = true;
tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
The problem is that CPU2's sampling of rnp->expmask is obsolete by the
time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
CPU2 got around to executing its tick_dep_set_cpu(), which would still
leave the victim CPU with its scheduler-clock tick running.
Either way, an nohz_full real-time application running on the victim
CPU would have its latency needlessly degraded.
Note that expedited RCU grace periods look at context-tracking
information, and so if the CPU is executing in nohz_full usermode
throughout, that CPU cannot be victimized in this manner.
This commit therefore causes synchronize_rcu_expedited_wait to hold
the rcu_node structure's ->lock when checking for holdout CPUs, setting
TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
this race.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/rcu/tree_exp.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 249c2967d9e6..7cc4856da081 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_node *rnp_root = rcu_get_root();
+ unsigned long flags;
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
if (synchronize_rcu_expedited_wait_once(1))
return;
rcu_for_each_leaf_node(rnp) {
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
mask = READ_ONCE(rnp->expmask);
for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- preempt_disable();
if (cpu_online(cpu))
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
- preempt_enable();
}
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
j = READ_ONCE(jiffies_till_first_fqs);
if (synchronize_rcu_expedited_wait_once(j + HZ))
--
2.40.0.rc1.284.g88254d51c5-goog
next reply other threads:[~2023-03-15 19:44 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-15 19:43 Joel Fernandes (Google) [this message]
2023-03-15 19:43 ` [PATCH 2/9] rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check Joel Fernandes (Google)
2023-03-15 19:43 ` [PATCH 3/9] entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up Joel Fernandes (Google)
2023-03-21 14:20 ` [tip: core/urgent] " tip-bot2 for Frederic Weisbecker
2023-03-15 19:43 ` [PATCH 4/9] rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels Joel Fernandes (Google)
2023-03-15 23:19 ` Frederic Weisbecker
2023-03-16 2:28 ` Joel Fernandes
2023-03-15 19:43 ` [PATCH 5/9] rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() Joel Fernandes (Google)
2023-03-15 19:43 ` [PATCH 6/9] rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early Joel Fernandes (Google)
2023-03-15 19:43 ` [PATCH 7/9] rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() Joel Fernandes (Google)
2023-03-15 19:43 ` [PATCH 8/9] rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed Joel Fernandes (Google)
2023-03-15 19:43 ` [PATCH 9/9] rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access Joel Fernandes (Google)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230315194349.10798-1-joel@joelfernandes.org \
--to=joel@joelfernandes.org \
--cc=frederic@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=qiang1.zhang@intel.com \
--cc=quic_neeraju@quicinc.com \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).