* [PATCH 1/9] rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
@ 2023-03-15 19:43 Joel Fernandes (Google)
From: Joel Fernandes (Google) @ 2023-03-15 19:43 UTC
  To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes
  Cc: Zqiang, rcu

From: Zqiang <qiang1.zhang@intel.com>

For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
in the scheduling-clock interrupt remaining enabled on a holdout CPU after
its quiescent state has been reported:

	CPU1                                                 CPU2
rcu_report_exp_cpu_mult                          synchronize_rcu_expedited_wait
   acquires rnp->lock                               mask = rnp->expmask;
                                                    for_each_leaf_node_cpu_mask(rnp, cpu, mask)
   rnp->expmask = rnp->expmask & ~mask;                rdp = per_cpu_ptr(&rcu_data, cpu1);
   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
      rdp = per_cpu_ptr(&rcu_data, cpu1);
      if (!rdp->rcu_forced_tick_exp)
             continue;                                 rdp->rcu_forced_tick_exp = true;
                                                       tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);

The problem is that CPU2's sampling of rnp->expmask is obsolete by the
time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
CPU2's store to ->rcu_forced_tick_exp in time to clear it.  And even if
CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
CPU2 gets around to executing its tick_dep_set_cpu(), which would still
leave the victim CPU with its scheduling-clock tick running.

Either way, a nohz_full real-time application running on the victim
CPU would have its latency needlessly degraded.
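
For reference, the pre-patch polling loop in synchronize_rcu_expedited_wait()
looked roughly as follows (reconstructed from the "before" side of the diff
below).  Note that nothing here orders the sampling of ->expmask against the
clearing performed by rcu_report_exp_cpu_mult():

	rcu_for_each_leaf_node(rnp) {
		mask = READ_ONCE(rnp->expmask);  /* May be stale by the time it is used. */
		for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
			rdp = per_cpu_ptr(&rcu_data, cpu);
			if (rdp->rcu_forced_tick_exp)
				continue;
			rdp->rcu_forced_tick_exp = true;
			preempt_disable();
			if (cpu_online(cpu))
				tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
			preempt_enable();
		}
	}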

Note that expedited RCU grace periods look at context-tracking
information, and so if the CPU is executing in nohz_full usermode
throughout, that CPU cannot be victimized in this manner.

This commit therefore causes synchronize_rcu_expedited_wait() to hold
the rcu_node structure's ->lock when checking for holdout CPUs, setting
->rcu_forced_tick_exp, and invoking tick_dep_set_cpu(), thus preventing
this race.
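
For context, the other side of the race already does its clearing under
that same ->lock.  The following is a simplified sketch of
rcu_report_exp_cpu_mult() based on the diagram above (quiescent-state
reporting and wakeup details omitted), showing why serializing on ->lock
orders the two sides:

	/* Simplified sketch; not the verbatim source. */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);  /* Clear the reported CPUs. */
	for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
		rdp = per_cpu_ptr(&rcu_data, cpu);
		if (!rdp->rcu_forced_tick_exp)
			continue;
		rdp->rcu_forced_tick_exp = false;
		tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
	}
	/* ... report quiescent states up the tree, then release ->lock ... */

Here raw_spin_lock_irqsave_rcu_node() is RCU's wrapper that acquires the
rcu_node structure's ->lock and supplies smp_mb__after_unlock_lock()
ordering.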

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_exp.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 249c2967d9e6..7cc4856da081 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
+	unsigned long flags;
 
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
 		if (synchronize_rcu_expedited_wait_once(1))
 			return;
 		rcu_for_each_leaf_node(rnp) {
+			raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			mask = READ_ONCE(rnp->expmask);
 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
 				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->rcu_forced_tick_exp)
 					continue;
 				rdp->rcu_forced_tick_exp = true;
-				preempt_disable();
 				if (cpu_online(cpu))
 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
-				preempt_enable();
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		j = READ_ONCE(jiffies_till_first_fqs);
 		if (synchronize_rcu_expedited_wait_once(j + HZ))
-- 
2.40.0.rc1.284.g88254d51c5-goog

