linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] rcu/nocb updates v2
@ 2021-02-23  0:09 Frederic Weisbecker
  2021-02-23  0:09 ` [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm Frederic Weisbecker
                   ` (12 more replies)
  0 siblings, 13 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:09 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

It's a v2 of the previous set (https://lore.kernel.org/lkml/20210128171222.131380-1-frederic@kernel.org/)
minus the patches already applied in rcu/dev. And this is based on 
latest rcu/dev.

Changelog since v1:

"rcu/nocb: Fix potential missed nocb_timer rearm"
	* Remove nocb_defer_wakeup reset from do_nocb_deferred_wakeup_common() (paulmck)
	* Only reset/del if the timer is actually armed
	* Add secondary potential cause for missed rearm in the changelog

"rcu/nocb: Disable bypass when CPU isn't completely offloaded"
	* Improve comments on state machine (paulmck)
	* Add comment (a full quote from Paul) explaining why early flush is enough (paulmck)
	* Move sanity check to the very end of deoffloading (paulmck)
	* Clarify some comments about nocb locking on de-offloading (paulmck)

"rcu/nocb: Remove stale comment above rcu_segcblist_offload()"
	* New patch, reported by (paulmck)

"rcu/nocb: Merge nocb_timer to the rdp leader"
	* Remove rcu_running_nocb_timer() and its use in rcu_rdp_is_offloaded()
	  debugging since the timer doesn't refer to any rdp offloading anymore.
	* Only delete nocb_timer when armed, in nocb_gp_wait()
	* Clarify some comments about nocb locking on de-offloading (paulmck)
	* Remove stale code "re-enabling" nocb timer on offloading. Not necessary
	  anymore and even buggy.

"timer: Revert "timer: Add timer_curr_running()""
	* New patch

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/dev-v2

HEAD: 925ee3076eb694db893e2c6664d90ad8fb9cb6e5

Thanks,
	Frederic
---

Frederic Weisbecker (13):
      rcu/nocb: Fix potential missed nocb_timer rearm
      rcu/nocb: Disable bypass when CPU isn't completely offloaded
      rcu/nocb: Remove stale comment above rcu_segcblist_offload()
      rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possible
      rcu/nocb: Merge nocb_timer to the rdp leader
      timer: Revert "timer: Add timer_curr_running()"
      rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
      rcu/nocb: Allow de-offloading rdp leader
      rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
      rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
      rcu/nocb: Only cancel nocb timer if not polling
      rcu/nocb: Prepare for finegrained deferred wakeup
      rcu/nocb: Unify timers


 include/linux/rcu_segcblist.h |   7 +-
 include/linux/timer.h         |   2 -
 include/trace/events/rcu.h    |   1 +
 kernel/rcu/rcu_segcblist.c    |   3 +-
 kernel/rcu/tree.c             |   2 +-
 kernel/rcu/tree.h             |   9 +-
 kernel/rcu/tree_plugin.h      | 233 +++++++++++++++++++++++-------------------
 kernel/time/timer.c           |  14 ---
 8 files changed, 141 insertions(+), 130 deletions(-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
@ 2021-02-23  0:09 ` Frederic Weisbecker
  2021-02-24 18:37   ` Paul E. McKenney
  2021-02-23  0:10 ` [PATCH 02/13] rcu/nocb: Disable bypass when CPU isn't completely offloaded Frederic Weisbecker
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:09 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Two situations can cause a missed nocb timer rearm:

1) rdp(CPU A) queues its nocb timer. The grace period elapses before
   the timer get a chance to fire. The nocb_gp kthread is awaken by
   rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
   process the callbacks, again before the nocb_timer for CPU A get a
   chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
   kthread, cancelling the pending nocb_timer without resetting the
   corresponding nocb_defer_wakeup.

2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
   the pending "nocb_timer" (note they are not the same timers) for the
   given rdp without resetting the matching state stored in nocb_defer
   wakeup.

On both situations, a future call_rcu() on that rdp may be fooled and
think the timer is armed when it's not, missing a deferred nocb_gp
wakeup.

Case 1) is very unlikely due to timing constraint (the timer fires after
1 jiffy) but still possible in theory. Case 2) is more likely to happen.
But in any case such scenario require the CPU to spend a long time
within a kernel thread without exiting to idle or user space, which is
a pretty exotic behaviour.

Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
timer.

Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
Cc: Stable <stable@vger.kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_plugin.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 2ec9d7f55f99..dd0dc66c282d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		return false;
 	}
-	del_timer(&rdp->nocb_timer);
+
+	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		del_timer(&rdp->nocb_timer);
+	}
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 		return false;
 	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 02/13] rcu/nocb: Disable bypass when CPU isn't completely offloaded
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
  2021-02-23  0:09 ` [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 03/13] rcu/nocb: Remove stale comment above rcu_segcblist_offload() Frederic Weisbecker
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Instead of flushing bypass at the very last moment in the deoffloading
process, just disable bypass enqueue at soon as we start the deoffloading
process and flush the pending bypass early. It's less fragile and we
leave some time to the kthreads and softirqs to process quietly.

Symmetrically, only enable bypass once we safely complete the offloading
process.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/rcu_segcblist.h |  7 ++++---
 kernel/rcu/tree_plugin.h      | 38 ++++++++++++++++++++++++++---------
 2 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 8afe886e85f1..3db96c4f45fd 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -109,7 +109,7 @@ struct rcu_cblist {
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
  *  |   Kthreads handle callbacks holding nocb_lock, local rcu_core() stops    |
- *  |   handling callbacks.                                                    |
+ *  |   handling callbacks. Enable bypass queueing.                            |
  *  ----------------------------------------------------------------------------
  */
 
@@ -125,7 +125,7 @@ struct rcu_cblist {
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
  *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()    |
- *  |   ignores callbacks.                                                     |
+ *  |   ignores callbacks. Bypass enqueue is enabled.                          |
  *  ----------------------------------------------------------------------------
  *                                      |
  *                                      v
@@ -134,7 +134,8 @@ struct rcu_cblist {
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
  *  |   CB/GP kthreads and local rcu_core() handle callbacks concurrently      |
- *  |   holding nocb_lock. Wake up CB and GP kthreads if necessary.            |
+ *  |   holding nocb_lock. Wake up CB and GP kthreads if necessary. Disable    |
+ *  |   bypass enqueue.                                                        |
  *  ----------------------------------------------------------------------------
  *                                      |
  *                                      v
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index dd0dc66c282d..924fa3d1df0d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1842,11 +1842,22 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	unsigned long j = jiffies;
 	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 
+	lockdep_assert_irqs_disabled();
+
+	// Pure softirq/rcuc based processing: no bypassing, no
+	// locking.
 	if (!rcu_rdp_is_offloaded(rdp)) {
+		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+		return false;
+	}
+
+	// In the process of (de-)offloading: no bypassing, but
+	// locking.
+	if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
+		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 		return false; /* Not offloaded, no bypassing. */
 	}
-	lockdep_assert_irqs_disabled();
 
 	// Don't use ->nocb_bypass during early boot.
 	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING) {
@@ -2429,7 +2440,16 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	pr_info("De-offloading %d\n", rdp->cpu);
 
 	rcu_nocb_lock_irqsave(rdp, flags);
-
+	/*
+	 * Flush once and for all now. This suffices because we are
+	 * running on the target CPU holding ->nocb_lock (thus having
+	 * interrupts disabled), and because rdp_offload_toggle()
+	 * invokes rcu_segcblist_offload(), which clears SEGCBLIST_OFFLOADED.
+	 * Thus future calls to rcu_segcblist_completely_offloaded() will
+	 * return false, which means that future calls to rcu_nocb_try_bypass()
+	 * will refuse to put anything into the bypass.
+	 */
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
 	ret = rdp_offload_toggle(rdp, false, flags);
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
@@ -2441,21 +2461,21 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	del_timer_sync(&rdp->nocb_timer);
 
 	/*
-	 * Flush bypass. While IRQs are disabled and once we set
-	 * SEGCBLIST_SOFTIRQ_ONLY, no callback is supposed to be
-	 * enqueued on bypass.
+	 * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY with CB unlocked
+	 * and IRQs disabled but let's be paranoid.
 	 */
 	rcu_nocb_lock_irqsave(rdp, flags);
-	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
 	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
 	/*
 	 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
-	 * rcu_nocb_unlock_irqrestore() anymore. Theoretically we
-	 * could set SEGCBLIST_SOFTIRQ_ONLY with cb unlocked and IRQs
-	 * disabled now, but let's be paranoid.
+	 * rcu_nocb_unlock_irqrestore() anymore.
 	 */
 	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 
+	/* Sanity check */
+	WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
+
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 03/13] rcu/nocb: Remove stale comment above rcu_segcblist_offload()
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
  2021-02-23  0:09 ` [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 02/13] rcu/nocb: Disable bypass when CPU isn't completely offloaded Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 04/13] rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possible Frederic Weisbecker
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Remove stale comment claiming that the cblist must be empty before
changing the offloading state. This applied when the offloaded state was
defined exclusively on boot.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/rcu_segcblist.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 7f181c9675f7..aaa111237b60 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -261,8 +261,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
 }
 
 /*
- * Mark the specified rcu_segcblist structure as offloaded.  This
- * structure must be empty.
+ * Mark the specified rcu_segcblist structure as offloaded.
  */
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 04/13] rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possible
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 03/13] rcu/nocb: Remove stale comment above rcu_segcblist_offload() Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 05/13] rcu/nocb: Merge nocb_timer to the rdp leader Frederic Weisbecker
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Those tracing calls don't need to be under the nocb lock. Move them
outside.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 924fa3d1df0d..587df271d640 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1715,9 +1715,9 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
 
 	lockdep_assert_held(&rdp->nocb_lock);
 	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
+		rcu_nocb_unlock_irqrestore(rdp, flags);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("AlreadyAwake"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
 		return false;
 	}
 
@@ -1967,9 +1967,9 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	// If we are being polled or there is no kthread, just leave.
 	t = READ_ONCE(rdp->nocb_gp_kthread);
 	if (rcu_nocb_poll || !t) {
+		rcu_nocb_unlock_irqrestore(rdp, flags);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("WakeNotPoll"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
 		return;
 	}
 	// Need to actually to a wakeup.
@@ -2004,8 +2004,8 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 					   TPS("WakeOvfIsDeferred"));
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 	} else {
+		rcu_nocb_unlock_irqrestore(rdp, flags);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
 	}
 	return;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 05/13] rcu/nocb: Merge nocb_timer to the rdp leader
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 04/13] rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possible Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-03-03  1:15   ` [PATCH 05/13] rcu/nocb: Use the rcuog CPU's ->nocb_timer Paul E. McKenney
  2021-02-23  0:10 ` [PATCH 06/13] timer: Revert "timer: Add timer_curr_running()" Frederic Weisbecker
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Currently each offline rdp has its own nocb_timer armed when the
nocb_gp wakeup must be deferred. This layout has many drawbacks,
compared to a solution based on a single timer per rdp group:

* It's a lot of timers to maintain.

* The per rdp nocb lock must be held to arm and cancel the timer and it's
  already quite contended.

* Firing one timer doesn't cancel the other rdp's timers in the same
  group:
  - They may conflict and end up with spurious wake ups
  - Each of the rdp that armed a timer need to lock both nocb_lock
    and then nocb_gp_lock upon exit to idle/user/guest mode.

* We can't cancel all of them if we detect an unflushed bypass in
  nocb_gp_wait(). In fact currently we only ever cancel the nocb_timer
  of the leader group.

* The leader group's nocb_timer is cancelled without locking nocb_lock
  in nocb_gp_wait(). It appears to be safe currently but is very error
  prone and hairy for reviewers.

* Since the timer takes the nocb_lock, it requires extra care in the NOCB
  (de-)offloading process, needing it to be either enabled or disabled
  and flushed.

Use a per rdp leader timer instead. It is based on nocb_gp_lock that is
_way_ less contended and stays so after this change. As a matter of fact,
the nocb_timer almost never fires and the deferred wakeup is mostly
handled on idle/user/guest entry. Now the early check performed at this
point in do_nocb_deferred_wakeup() is done on rdp_gp->nocb_defer_wakeup,
which is racy of course. It doesn't matter though because we only need
the guarantee to see the timer armed if we were the last one to arm it.
Any other situation (another rdp has armed it and we either see it or not)
is fine.

This solves all the issues listed above.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.h        |   1 -
 kernel/rcu/tree_plugin.h | 142 +++++++++++++++++++++------------------
 2 files changed, 78 insertions(+), 65 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 71821d59d95c..b280a843bd2c 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -257,7 +257,6 @@ struct rcu_data {
 };
 
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
-#define RCU_NOCB_WAKE_OFF	-1
 #define RCU_NOCB_WAKE_NOT	0
 #define RCU_NOCB_WAKE		1
 #define RCU_NOCB_WAKE_FORCE	2
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 587df271d640..847636d3e93d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -33,10 +33,6 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
 	return false;
 }
 
-static inline bool rcu_running_nocb_timer(struct rcu_data *rdp)
-{
-	return (timer_curr_running(&rdp->nocb_timer) && !in_irq());
-}
 #else
 static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
 {
@@ -48,11 +44,6 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
 	return false;
 }
 
-static inline bool rcu_running_nocb_timer(struct rcu_data *rdp)
-{
-	return false;
-}
-
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
@@ -72,8 +63,7 @@ static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
 		  rcu_lockdep_is_held_nocb(rdp) ||
 		  (rdp == this_cpu_ptr(&rcu_data) &&
 		   !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) ||
-		  rcu_current_is_nocb_kthread(rdp) ||
-		  rcu_running_nocb_timer(rdp)),
+		  rcu_current_is_nocb_kthread(rdp)),
 		"Unsafe read of RCU_NOCB offloaded state"
 	);
 
@@ -1702,43 +1692,50 @@ bool rcu_is_nocb_cpu(int cpu)
 	return false;
 }
 
-/*
- * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
- * and this function releases it.
- */
-static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
-			 unsigned long flags)
-	__releases(rdp->nocb_lock)
+static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
+			   struct rcu_data *rdp,
+			   bool force, unsigned long flags)
+	__releases(rdp_gp->nocb_gp_lock)
 {
 	bool needwake = false;
-	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
-	lockdep_assert_held(&rdp->nocb_lock);
 	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("AlreadyAwake"));
 		return false;
 	}
 
-	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
-		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-		del_timer(&rdp->nocb_timer);
+	if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		del_timer(&rdp_gp->nocb_timer);
 	}
-	rcu_nocb_unlock_irqrestore(rdp, flags);
-	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
 		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
 		needwake = true;
+	}
+	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
+	if (needwake) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake"));
-	}
-	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
-	if (needwake)
 		wake_up_process(rdp_gp->nocb_gp_kthread);
+	}
 
 	return needwake;
 }
 
+/*
+ * Kick the GP kthread for this NOCB group.
+ */
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+{
+	unsigned long flags;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+
+	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
+}
+
 /*
  * Arrange to wake the GP kthread for this NOCB group at some future
  * time when it is safe to do so.
@@ -1746,12 +1743,18 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
 static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 			       const char *reason)
 {
-	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_OFF)
-		return;
-	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
-		mod_timer(&rdp->nocb_timer, jiffies + 1);
-	if (rdp->nocb_defer_wakeup < waketype)
-		WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);
+	unsigned long flags;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+
+	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+
+	if (rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
+		mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
+	if (rdp_gp->nocb_defer_wakeup < waketype)
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+
+	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
+
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason);
 }
 
@@ -1978,13 +1981,14 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		rdp->qlen_last_fqs_check = len;
 		if (!irqs_disabled_flags(flags)) {
 			/* ... if queue was empty ... */
-			wake_nocb_gp(rdp, false, flags);
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			wake_nocb_gp(rdp, false);
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("WakeEmpty"));
 		} else {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
 					   TPS("WakeEmptyIsDeferred"));
-			rcu_nocb_unlock_irqrestore(rdp, flags);
 		}
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
 		/* ... or if many callbacks queued. */
@@ -1999,10 +2003,14 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		smp_mb(); /* Enqueue before timer_pending(). */
 		if ((rdp->nocb_cb_sleep ||
 		     !rcu_segcblist_ready_cbs(&rdp->cblist)) &&
-		    !timer_pending(&rdp->nocb_bypass_timer))
+		    !timer_pending(&rdp->nocb_bypass_timer)) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
 					   TPS("WakeOvfIsDeferred"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		} else {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
+		}
 	} else {
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
@@ -2128,11 +2136,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			bypass = true;
 		}
 		rnp = rdp->mynode;
-		if (bypass) {  // Avoid race with first bypass CB.
-			WRITE_ONCE(my_rdp->nocb_defer_wakeup,
-				   RCU_NOCB_WAKE_NOT);
-			del_timer(&my_rdp->nocb_timer);
-		}
+
 		// Advance callbacks if helpful and low contention.
 		needwake_gp = false;
 		if (!rcu_segcblist_restempty(&rdp->cblist,
@@ -2178,11 +2182,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	my_rdp->nocb_gp_bypass = bypass;
 	my_rdp->nocb_gp_gp = needwait_gp;
 	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
-	if (bypass && !rcu_nocb_poll) {
-		// At least one child with non-empty ->nocb_bypass, so set
-		// timer in order to avoid stranding its callbacks.
+	if (bypass) {
 		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
-		mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
+		// Avoid race with first bypass CB.
+		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
+			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+			del_timer(&my_rdp->nocb_timer);
+		}
+		if (!rcu_nocb_poll) {
+			// At least one child with non-empty ->nocb_bypass, so set
+			// timer in order to avoid stranding its callbacks.
+			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
+		}
 		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
 	}
 	if (rcu_nocb_poll) {
@@ -2356,15 +2367,18 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	int ndw;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 	int ret;
 
-	rcu_nocb_lock_irqsave(rdp, flags);
-	if (!rcu_nocb_need_deferred_wakeup(rdp)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+
+	if (!rcu_nocb_need_deferred_wakeup(rdp_gp)) {
+		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 		return false;
 	}
-	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
+
+	ndw = rdp_gp->nocb_defer_wakeup;
+	ret = __wake_nocb_gp(rdp_gp, rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
 
 	return ret;
@@ -2385,7 +2399,10 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
  */
 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
 {
-	if (rcu_nocb_need_deferred_wakeup(rdp))
+	if (!rdp->nocb_gp_rdp)
+		return false;
+
+	if (rcu_nocb_need_deferred_wakeup(rdp->nocb_gp_rdp))
 		return do_nocb_deferred_wakeup_common(rdp);
 	return false;
 }
@@ -2454,17 +2471,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
 							SEGCBLIST_KTHREAD_GP));
-	rcu_nocb_lock_irqsave(rdp, flags);
-	/* Make sure nocb timer won't stay around */
-	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_OFF);
-	rcu_nocb_unlock_irqrestore(rdp, flags);
-	del_timer_sync(&rdp->nocb_timer);
-
 	/*
-	 * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY with CB unlocked
-	 * and IRQs disabled but let's be paranoid.
+	 * Lock one last time to acquire latest callback updates from kthreads
+	 * so we can later handle callbacks locally without locking.
 	 */
 	rcu_nocb_lock_irqsave(rdp, flags);
+	/*
+	 * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb
+	 * lock is released but how about being paranoid for once?
+	 */
 	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
 	/*
 	 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
@@ -2528,8 +2543,7 @@ static long rcu_nocb_rdp_offload(void *arg)
 	 * SEGCBLIST_SOFTIRQ_ONLY mode.
 	 */
 	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
-	/* Re-enable nocb timer */
-	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+
 	/*
 	 * We didn't take the nocb lock while working on the
 	 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 06/13] timer: Revert "timer: Add timer_curr_running()"
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 05/13] rcu/nocb: Merge nocb_timer to the rdp leader Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 07/13] rcu/nocb: Directly call __wake_nocb_gp() from bypass timer Frederic Weisbecker
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

This reverts commit dcd42591ebb8a25895b551a5297ea9c24414ba54.
The only user was RCU/nocb.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timer.h |  2 --
 kernel/time/timer.c   | 14 --------------
 2 files changed, 16 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 4118a97e62fb..fda13c9d1256 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -192,8 +192,6 @@ extern int try_to_del_timer_sync(struct timer_list *timer);
 
 #define del_singleshot_timer_sync(t) del_timer_sync(t)
 
-extern bool timer_curr_running(struct timer_list *timer);
-
 extern void init_timers(void);
 struct hrtimer;
 extern enum hrtimer_restart it_real_fn(struct hrtimer *);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index f475f1a027c8..8dbc008f8942 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1237,20 +1237,6 @@ int try_to_del_timer_sync(struct timer_list *timer)
 }
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
-bool timer_curr_running(struct timer_list *timer)
-{
-	int i;
-
-	for (i = 0; i < NR_BASES; i++) {
-		struct timer_base *base = this_cpu_ptr(&timer_bases[i]);
-
-		if (base->running_timer == timer)
-			return true;
-	}
-
-	return false;
-}
-
 #ifdef CONFIG_PREEMPT_RT
 static __init void timer_base_init_expiry_lock(struct timer_base *base)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 07/13] rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 06/13] timer: Revert "timer: Add timer_curr_running()" Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 08/13] rcu/nocb: Allow de-offloading rdp leader Frederic Weisbecker
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

The bypass timer calls __call_rcu_nocb_wake() instead of directly
calling __wake_nocb_gp(). The only difference here is that
rdp->qlen_last_fqs_check gets overriden. But resetting the deferred
force quiescent state base shouldn't be relevant for that timer. In fact
the bypass queue in concern can be for any rdp from the group and not
necessarily the rdp leader on which the bypass timer is attached.

Therefore we can simply call directly __wake_nocb_gp(). This way we
don't even need to lock the nocb_lock.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 847636d3e93d..c80b214a86bb 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2025,9 +2025,10 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
 	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
 
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
-	rcu_nocb_lock_irqsave(rdp, flags);
+
+	raw_spin_lock_irqsave(&rdp->nocb_gp_lock, flags);
 	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
-	__call_rcu_nocb_wake(rdp, true, flags);
+	__wake_nocb_gp(rdp, rdp, false, flags);
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/13] rcu/nocb: Allow de-offloading rdp leader
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 07/13] rcu/nocb: Directly call __wake_nocb_gp() from bypass timer Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 09/13] rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup Frederic Weisbecker
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

The only thing that prevented an rdp leader from being de-offloaded was
the nocb_bypass_timer that used to lock the nocb_lock of the rdp leader.

If an rdp gets de-offloaded, it will subtely ignore rcu_nocb_lock()
calls and do its job in the timer unsafely. Worse yet: if it gets
re-offloaded in the middle of the timer, rcu_nocb_unlock() would try to
unlock, leaving it imbalanced.

Now that the nocb_bypass_timer doesn't use the nocb_lock anymore, we can
safely allow rdp leader to be de-offloaded.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_plugin.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c80b214a86bb..0fdf0223f08f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2500,10 +2500,6 @@ int rcu_nocb_cpu_deoffload(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	int ret = 0;
 
-	if (rdp == rdp->nocb_gp_rdp) {
-		pr_info("Can't deoffload an rdp GP leader (yet)\n");
-		return -EINVAL;
-	}
 	mutex_lock(&rcu_state.barrier_mutex);
 	cpus_read_lock();
 	if (rcu_rdp_is_offloaded(rdp)) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/13] rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 08/13] rcu/nocb: Allow de-offloading rdp leader Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 10/13] rcu/nocb: Delete bypass_timer " Frederic Weisbecker
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

As we wake up in nocb_gp_wait(), there is no need to keep the nocb_timer
around as we are going to go through the whole rdp list again. Any update
performed before the timer was armed will now be visible after the
nocb_gp_lock acquire.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0fdf0223f08f..b62ad79bbda5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2221,6 +2221,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
 		if (bypass)
 			del_timer(&my_rdp->nocb_bypass_timer);
+		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
+			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+			del_timer(&my_rdp->nocb_timer);
+		}
 		WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
 		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/13] rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 09/13] rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-03-03  1:24   ` Paul E. McKenney
  2021-02-23  0:10 ` [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling Frederic Weisbecker
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

A NOCB-gp wake up can safely delete the nocb_bypass_timer. nocb_gp_wait()
is going to check again the bypass state and rearm the bypass timer if
necessary.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b62ad79bbda5..9da67b0d3997 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1711,6 +1711,8 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 		del_timer(&rdp_gp->nocb_timer);
 	}
 
+	del_timer(&rdp_gp->nocb_bypass_timer);
+
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
 		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
 		needwake = true;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 10/13] rcu/nocb: Delete bypass_timer " Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-03-03  1:22   ` Paul E. McKenney
  2021-02-23  0:10 ` [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup Frederic Weisbecker
  2021-02-23  0:10 ` [PATCH 13/13] rcu/nocb: Unify timers Frederic Weisbecker
  12 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

No need to disarm the nocb_timer if rcu_nocb is polling because it
shouldn't be armed either.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 9da67b0d3997..d8b50ff40e4b 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2186,18 +2186,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	my_rdp->nocb_gp_gp = needwait_gp;
 	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
 	if (bypass) {
-		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
-		// Avoid race with first bypass CB.
-		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
-			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-			del_timer(&my_rdp->nocb_timer);
-		}
 		if (!rcu_nocb_poll) {
+			raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
+			// Avoid race with first bypass CB.
+			if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
+				WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+				del_timer(&my_rdp->nocb_timer);
+			}
 			// At least one child with non-empty ->nocb_bypass, so set
 			// timer in order to avoid stranding its callbacks.
 			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
+			raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
 		}
-		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
 	}
 	if (rcu_nocb_poll) {
 		/* Polling, so trace if first poll in the series. */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  2021-03-16  3:02   ` Paul E. McKenney
  2021-02-23  0:10 ` [PATCH 13/13] rcu/nocb: Unify timers Frederic Weisbecker
  12 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Provide a way to tune the deferred wakeup level we want to perform from
a safe wakeup point. Currently those sites are:

* nocb_timer
* user/idle/guest entry
* CPU down
* softirq/rcuc

All of these sites perform the wake up for both RCU_NOCB_WAKE and
RCU_NOCB_WAKE_FORCE.

In order to merge nocb_timer and nocb_bypass_timer together, we plan to
add a new RCU_NOCB_WAKE_BYPASS that really want to be deferred until
a timer fires so that we don't wake up the NOCB-gp kthread too early.

To prepare for that, integrate a wake up level/limit that a callsite is
deemed to perform.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.c        |  2 +-
 kernel/rcu/tree.h        |  2 +-
 kernel/rcu/tree_plugin.h | 15 ++++++++-------
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2c9cf4df942c..9951a4bef504 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3823,7 +3823,7 @@ static int rcu_pending(int user)
 	check_cpu_stall(rdp);
 
 	/* Does this CPU need a deferred NOCB wakeup? */
-	if (rcu_nocb_need_deferred_wakeup(rdp))
+	if (rcu_nocb_need_deferred_wakeup(rdp, RCU_NOCB_WAKE))
 		return 1;
 
 	/* Is this a nohz_full CPU in userspace or idle?  (Ignore RCU if so.) */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index b280a843bd2c..2510e86265c1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -433,7 +433,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				bool *was_alldone, unsigned long flags);
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 				 unsigned long flags);
-static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp);
+static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
 static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
 static void rcu_spawn_cpu_nocb_kthread(int cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d8b50ff40e4b..e0420e3b30e6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2364,13 +2364,14 @@ static int rcu_nocb_cb_kthread(void *arg)
 }
 
 /* Is a deferred wakeup of rcu_nocb_kthread() required? */
-static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
+static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level)
 {
-	return READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT;
+	return READ_ONCE(rdp->nocb_defer_wakeup) >= level;
 }
 
 /* Do a deferred wakeup of rcu_nocb_kthread(). */
-static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
+static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp,
+					   int level)
 {
 	unsigned long flags;
 	int ndw;
@@ -2379,7 +2380,7 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
-	if (!rcu_nocb_need_deferred_wakeup(rdp_gp)) {
+	if (!rcu_nocb_need_deferred_wakeup(rdp_gp, level)) {
 		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 		return false;
 	}
@@ -2396,7 +2397,7 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
 {
 	struct rcu_data *rdp = from_timer(rdp, t, nocb_timer);
 
-	do_nocb_deferred_wakeup_common(rdp);
+	do_nocb_deferred_wakeup_common(rdp, RCU_NOCB_WAKE);
 }
 
 /*
@@ -2409,8 +2410,8 @@ static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
 	if (!rdp->nocb_gp_rdp)
 		return false;
 
-	if (rcu_nocb_need_deferred_wakeup(rdp->nocb_gp_rdp))
-		return do_nocb_deferred_wakeup_common(rdp);
+	if (rcu_nocb_need_deferred_wakeup(rdp->nocb_gp_rdp, RCU_NOCB_WAKE))
+		return do_nocb_deferred_wakeup_common(rdp, RCU_NOCB_WAKE);
 	return false;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 13/13] rcu/nocb: Unify timers
  2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2021-02-23  0:10 ` [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup Frederic Weisbecker
@ 2021-02-23  0:10 ` Frederic Weisbecker
  12 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-23  0:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Stable,
	Joel Fernandes

Now that nocb_timer and nocb_bypass_timer have become very similar,
merge them together. A new RCU_NOCB_WAKE_BYPASS wake level is introduced.
As a result, timers perform all kinds of deferred wake ups but other
deferred wakeup callsites only handle non-bypass wakeups in order not
to wake up rcuo too early.

The timer also performs the full barrier all the time to order
timer_pending() and callback enqueue although the path performing
RCU_NOCB_WAKE_FORCE that makes use of it is debatable. It should also
test against the rdp leader instead of the current rdp.

The permanent full barrier shouldn't bring visible overhead since the
timers almost never fire.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 include/trace/events/rcu.h |  1 +
 kernel/rcu/tree.h          |  6 +--
 kernel/rcu/tree_plugin.h   | 92 ++++++++++++++++----------------------
 3 files changed, 43 insertions(+), 56 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5fc29400e1a2..c16cb7d78f51 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -278,6 +278,7 @@ TRACE_EVENT_RCU(rcu_exp_funnel_lock,
  * "WakeNot": Don't wake rcuo kthread.
  * "WakeNotPoll": Don't wake rcuo kthread because it is polling.
  * "WakeOvfIsDeferred": Wake rcuo kthread later, CB list is huge.
+ * "WakeBypassIsDeferred": Wake rcuo kthread later, bypass list is contended.
  * "WokeEmpty": rcuo CB kthread woke to find empty list.
  */
 TRACE_EVENT_RCU(rcu_nocb_wake,
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2510e86265c1..9a16487edfca 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -218,7 +218,6 @@ struct rcu_data {
 
 	/* The following fields are used by GP kthread, hence own cacheline. */
 	raw_spinlock_t nocb_gp_lock ____cacheline_internodealigned_in_smp;
-	struct timer_list nocb_bypass_timer; /* Force nocb_bypass flush. */
 	u8 nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
 	u8 nocb_gp_bypass;		/* Found a bypass on last scan? */
 	u8 nocb_gp_gp;			/* GP to wait for on last scan? */
@@ -258,8 +257,9 @@ struct rcu_data {
 
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
 #define RCU_NOCB_WAKE_NOT	0
-#define RCU_NOCB_WAKE		1
-#define RCU_NOCB_WAKE_FORCE	2
+#define RCU_NOCB_WAKE_BYPASS	1
+#define RCU_NOCB_WAKE		2
+#define RCU_NOCB_WAKE_FORCE	3
 
 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
 					/* For jiffies_till_first_fqs and */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e0420e3b30e6..6bf35a1fe68e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1711,8 +1711,6 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 		del_timer(&rdp_gp->nocb_timer);
 	}
 
-	del_timer(&rdp_gp->nocb_bypass_timer);
-
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
 		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
 		needwake = true;
@@ -1750,10 +1748,19 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
-	if (rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
-		mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
-	if (rdp_gp->nocb_defer_wakeup < waketype)
+	/*
+	 * Bypass wakeup overrides previous deferments. In case
+	 * of callback storm, no need to wake up too early.
+	 */
+	if (waketype == RCU_NOCB_WAKE_BYPASS) {
+		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+	} else {
+		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
+			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
+		if (rdp_gp->nocb_defer_wakeup < waketype)
+			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+	}
 
 	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 
@@ -2005,7 +2012,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		smp_mb(); /* Enqueue before timer_pending(). */
 		if ((rdp->nocb_cb_sleep ||
 		     !rcu_segcblist_ready_cbs(&rdp->cblist)) &&
-		    !timer_pending(&rdp->nocb_bypass_timer)) {
+		    !timer_pending(&rdp->nocb_timer)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
 					   TPS("WakeOvfIsDeferred"));
@@ -2020,19 +2027,6 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	return;
 }
 
-/* Wake up the no-CBs GP kthread to flush ->nocb_bypass. */
-static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
-{
-	unsigned long flags;
-	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
-
-	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
-
-	raw_spin_lock_irqsave(&rdp->nocb_gp_lock, flags);
-	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
-	__wake_nocb_gp(rdp, rdp, false, flags);
-}
-
 /*
  * Check if we ignore this rdp.
  *
@@ -2185,19 +2179,12 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	my_rdp->nocb_gp_bypass = bypass;
 	my_rdp->nocb_gp_gp = needwait_gp;
 	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
-	if (bypass) {
-		if (!rcu_nocb_poll) {
-			raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
-			// Avoid race with first bypass CB.
-			if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
-				WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-				del_timer(&my_rdp->nocb_timer);
-			}
-			// At least one child with non-empty ->nocb_bypass, so set
-			// timer in order to avoid stranding its callbacks.
-			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
-			raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
-		}
+
+	if (bypass && !rcu_nocb_poll) {
+		// At least one child with non-empty ->nocb_bypass, so set
+		// timer in order to avoid stranding its callbacks.
+		wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
+				   TPS("WakeBypassIsDeferred"));
 	}
 	if (rcu_nocb_poll) {
 		/* Polling, so trace if first poll in the series. */
@@ -2221,8 +2208,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	}
 	if (!rcu_nocb_poll) {
 		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
-		if (bypass)
-			del_timer(&my_rdp->nocb_bypass_timer);
 		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
 			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 			del_timer(&my_rdp->nocb_timer);
@@ -2370,16 +2355,14 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level)
 }
 
 /* Do a deferred wakeup of rcu_nocb_kthread(). */
-static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp,
-					   int level)
+static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp_gp,
+					   struct rcu_data *rdp, int level,
+					   unsigned long flags)
+	__releases(rdp_gp->nocb_gp_lock)
 {
-	unsigned long flags;
 	int ndw;
-	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 	int ret;
 
-	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
-
 	if (!rcu_nocb_need_deferred_wakeup(rdp_gp, level)) {
 		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 		return false;
@@ -2395,9 +2378,15 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp,
 /* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */
 static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
 {
+	unsigned long flags;
 	struct rcu_data *rdp = from_timer(rdp, t, nocb_timer);
 
-	do_nocb_deferred_wakeup_common(rdp, RCU_NOCB_WAKE);
+	WARN_ON_ONCE(rdp->nocb_gp_rdp != rdp);
+	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
+
+	raw_spin_lock_irqsave(&rdp->nocb_gp_lock, flags);
+	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
+	do_nocb_deferred_wakeup_common(rdp, rdp, RCU_NOCB_WAKE_BYPASS, flags);
 }
 
 /*
@@ -2407,12 +2396,14 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
  */
 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
 {
-	if (!rdp->nocb_gp_rdp)
+	unsigned long flags;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+
+	if (!rdp_gp || !rcu_nocb_need_deferred_wakeup(rdp_gp, RCU_NOCB_WAKE))
 		return false;
 
-	if (rcu_nocb_need_deferred_wakeup(rdp->nocb_gp_rdp, RCU_NOCB_WAKE))
-		return do_nocb_deferred_wakeup_common(rdp, RCU_NOCB_WAKE);
-	return false;
+	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+	return do_nocb_deferred_wakeup_common(rdp_gp, rdp, RCU_NOCB_WAKE, flags);
 }
 
 void rcu_nocb_flush_deferred_wakeup(void)
@@ -2655,7 +2646,6 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 	raw_spin_lock_init(&rdp->nocb_bypass_lock);
 	raw_spin_lock_init(&rdp->nocb_gp_lock);
 	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
-	timer_setup(&rdp->nocb_bypass_timer, do_nocb_bypass_wakeup_timer, 0);
 	rcu_cblist_init(&rdp->nocb_bypass);
 }
 
@@ -2814,13 +2804,12 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
 {
 	struct rcu_node *rnp = rdp->mynode;
 
-	pr_info("nocb GP %d %c%c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n",
+	pr_info("nocb GP %d %c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n",
 		rdp->cpu,
 		"kK"[!!rdp->nocb_gp_kthread],
 		"lL"[raw_spin_is_locked(&rdp->nocb_gp_lock)],
 		"dD"[!!rdp->nocb_defer_wakeup],
 		"tT"[timer_pending(&rdp->nocb_timer)],
-		"bB"[timer_pending(&rdp->nocb_bypass_timer)],
 		"sS"[!!rdp->nocb_gp_sleep],
 		".W"[swait_active(&rdp->nocb_gp_wq)],
 		".W"[swait_active(&rnp->nocb_gp_wq[0])],
@@ -2841,7 +2830,6 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
 	char bufr[20];
 	struct rcu_segcblist *rsclp = &rdp->cblist;
 	bool waslocked;
-	bool wastimer;
 	bool wassleep;
 
 	if (rdp->nocb_gp_rdp == rdp)
@@ -2878,15 +2866,13 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
 		return;
 
 	waslocked = raw_spin_is_locked(&rdp->nocb_gp_lock);
-	wastimer = timer_pending(&rdp->nocb_bypass_timer);
 	wassleep = swait_active(&rdp->nocb_gp_wq);
-	if (!rdp->nocb_gp_sleep && !waslocked && !wastimer && !wassleep)
+	if (!rdp->nocb_gp_sleep && !waslocked && !wassleep)
 		return;  /* Nothing untowards. */
 
-	pr_info("   nocb GP activity on CB-only CPU!!! %c%c%c%c %c\n",
+	pr_info("   nocb GP activity on CB-only CPU!!! %c%c%c %c\n",
 		"lL"[waslocked],
 		"dD"[!!rdp->nocb_defer_wakeup],
-		"tT"[wastimer],
 		"sS"[!!rdp->nocb_gp_sleep],
 		".W"[wassleep]);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-23  0:09 ` [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm Frederic Weisbecker
@ 2021-02-24 18:37   ` Paul E. McKenney
  2021-02-24 22:06     ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-02-24 18:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> Two situations can cause a missed nocb timer rearm:
> 
> 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
>    the timer get a chance to fire. The nocb_gp kthread is awaken by
>    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
>    process the callbacks, again before the nocb_timer for CPU A get a
>    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
>    kthread, cancelling the pending nocb_timer without resetting the
>    corresponding nocb_defer_wakeup.

As discussed offlist, expanding the above scenario results in this
sequence of steps:

1.	There are no callbacks queued for any CPU covered by CPU 0-2's
	->nocb_gp_kthread.

2.	CPU 0 enqueues its first callback with interrupts disabled, and
	thus must defer awakening its ->nocb_gp_kthread.  It therefore
	queues its rcu_data structure's ->nocb_timer.

3.	CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a
	callback, but with interrupts enabled, allowing it to directly
	awaken the ->nocb_gp_kthread.

4.	The newly awakened ->nocb_gp_kthread associates both CPU 0's
	and CPU 1's callbacks with a future grace period and arranges
	for that grace period to be started.

5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
	future grace period.

6.	This grace period elapses before the CPU 0's timer fires.
	This is normally improbably given that the timer is set for only
	one jiffy, but timers can be delayed.  Besides, it is possible
	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

7.	The grace period ends, so rcu_gp_kthread awakens the
	->nocb_gp_kthread, which in turn awakens both CPU 0's and
	CPU 1's ->nocb_cb_kthread.

8.	CPU 0's ->nocb_cb_kthread invokes its callback.

9.	Note that neither kthread updated any ->nocb_timer state,
	so CPU 0's ->nocb_defer_wakeup is still set to either
	RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.

10.	CPU 0 enqueues its second callback, again with interrupts
	disabled, and thus must again defer awakening its
	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
	CPU 0 from queueing the timer.

So far so good, but why isn't the timer still queued from back in step 2?
What am I missing here?  Either way, could you please update the commit
logs to tell the full story?  At some later time, you might be very
happy that you did.  ;-)

							Thanx, Paul

> 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
>    the pending "nocb_timer" (note they are not the same timers) for the
>    given rdp without resetting the matching state stored in nocb_defer
>    wakeup.
> 
> On both situations, a future call_rcu() on that rdp may be fooled and
> think the timer is armed when it's not, missing a deferred nocb_gp
> wakeup.
> 
> Case 1) is very unlikely due to timing constraint (the timer fires after
> 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> But in any case such scenario require the CPU to spend a long time
> within a kernel thread without exiting to idle or user space, which is
> a pretty exotic behaviour.
> 
> Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> timer.
> 
> Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> Cc: Stable <stable@vger.kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree_plugin.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 2ec9d7f55f99..dd0dc66c282d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  		return false;
>  	}
> -	del_timer(&rdp->nocb_timer);
> +
> +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> +		del_timer(&rdp->nocb_timer);
> +	}
>  	rcu_nocb_unlock_irqrestore(rdp, flags);
>  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
>  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
>  		return false;
>  	}
>  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
>  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
>  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-24 18:37   ` Paul E. McKenney
@ 2021-02-24 22:06     ` Frederic Weisbecker
  2021-02-25  0:14       ` Paul E. McKenney
  2021-03-02  1:48       ` Paul E. McKenney
  0 siblings, 2 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-24 22:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > Two situations can cause a missed nocb timer rearm:
> > 
> > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> >    process the callbacks, again before the nocb_timer for CPU A get a
> >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> >    kthread, cancelling the pending nocb_timer without resetting the
> >    corresponding nocb_defer_wakeup.
> 
> As discussed offlist, expanding the above scenario results in this
> sequence of steps:
> 
> 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> 	->nocb_gp_kthread.
> 
> 2.	CPU 0 enqueues its first callback with interrupts disabled, and
> 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> 	queues its rcu_data structure's ->nocb_timer.
> 
> 3.	CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a
> 	callback, but with interrupts enabled, allowing it to directly
> 	awaken the ->nocb_gp_kthread.
> 
> 4.	The newly awakened ->nocb_gp_kthread associates both CPU 0's
> 	and CPU 1's callbacks with a future grace period and arranges
> 	for that grace period to be started.
> 
> 5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
> 	future grace period.
> 
> 6.	This grace period elapses before the CPU 0's timer fires.
> 	This is normally improbably given that the timer is set for only
> 	one jiffy, but timers can be delayed.  Besides, it is possible
> 	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> 
> 7.	The grace period ends, so rcu_gp_kthread awakens the
> 	->nocb_gp_kthread, which in turn awakens both CPU 0's and
> 	CPU 1's ->nocb_cb_kthread.
> 
> 8.	CPU 0's ->nocb_cb_kthread invokes its callback.
> 
> 9.	Note that neither kthread updated any ->nocb_timer state,
> 	so CPU 0's ->nocb_defer_wakeup is still set to either
> 	RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.
> 
> 10.	CPU 0 enqueues its second callback, again with interrupts
> 	disabled, and thus must again defer awakening its
> 	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
> 	CPU 0 from queueing the timer.

I managed to recollect some pieces of my brain. So keep the above but
let's change the point 10:

10.     CPU 0 enqueues its second callback, this time with interrupts
 	enabled so it can wake directly	->nocb_gp_kthread.
	It does so with calling __wake_nocb_gp() which also cancels the
	pending timer that got queued in step 2. But that doesn't reset
	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now
	desynchronized.

11.	->nocb_gp_kthread associates the callback queued in 10 with a new
	grace period, arrange for it to start and sleeps on it.

12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up
	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.

13.	CPU 0 enqueues its third callback, this time with interrupts
	disabled so it tries to queue a deferred wakeup. However
	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.

14.     CPU 0 has its pending callback and it may go unnoticed until
        some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls
	an explicit deferred wake up caller like idle entry.

I hope I'm not missing something this time...

Thanks.
	

> 
> So far so good, but why isn't the timer still queued from back in step 2?
> What am I missing here?  Either way, could you please update the commit
> logs to tell the full story?  At some later time, you might be very
> happy that you did.  ;-)
> 
> 							Thanx, Paul
> 
> > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> >    the pending "nocb_timer" (note they are not the same timers) for the
> >    given rdp without resetting the matching state stored in nocb_defer
> >    wakeup.
> > 
> > On both situations, a future call_rcu() on that rdp may be fooled and
> > think the timer is armed when it's not, missing a deferred nocb_gp
> > wakeup.
> > 
> > Case 1) is very unlikely due to timing constraint (the timer fires after
> > 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> > But in any case such scenario require the CPU to spend a long time
> > within a kernel thread without exiting to idle or user space, which is
> > a pretty exotic behaviour.
> > 
> > Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> > timer.
> > 
> > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> > Cc: Stable <stable@vger.kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/rcu/tree_plugin.h | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 2ec9d7f55f99..dd0dc66c282d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> >  		rcu_nocb_unlock_irqrestore(rdp, flags);
> >  		return false;
> >  	}
> > -	del_timer(&rdp->nocb_timer);
> > +
> > +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > +		del_timer(&rdp->nocb_timer);
> > +	}
> >  	rcu_nocb_unlock_irqrestore(rdp, flags);
> >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> > @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> >  		return false;
> >  	}
> >  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> > -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> >  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
> >  
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-24 22:06     ` Frederic Weisbecker
@ 2021-02-25  0:14       ` Paul E. McKenney
  2021-02-25  0:48         ` Frederic Weisbecker
  2021-03-02  1:48       ` Paul E. McKenney
  1 sibling, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-02-25  0:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > > Two situations can cause a missed nocb timer rearm:
> > > 
> > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> > >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> > >    process the callbacks, again before the nocb_timer for CPU A get a
> > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> > >    kthread, cancelling the pending nocb_timer without resetting the
> > >    corresponding nocb_defer_wakeup.
> > 
> > As discussed offlist, expanding the above scenario results in this
> > sequence of steps:
> > 
> > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> > 	->nocb_gp_kthread.
> > 
> > 2.	CPU 0 enqueues its first callback with interrupts disabled, and
> > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> > 	queues its rcu_data structure's ->nocb_timer.
> > 
> > 3.	CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a
> > 	callback, but with interrupts enabled, allowing it to directly
> > 	awaken the ->nocb_gp_kthread.
> > 
> > 4.	The newly awakened ->nocb_gp_kthread associates both CPU 0's
> > 	and CPU 1's callbacks with a future grace period and arranges
> > 	for that grace period to be started.
> > 
> > 5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
> > 	future grace period.
> > 
> > 6.	This grace period elapses before the CPU 0's timer fires.
> > 	This is normally improbably given that the timer is set for only
> > 	one jiffy, but timers can be delayed.  Besides, it is possible
> > 	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> > 
> > 7.	The grace period ends, so rcu_gp_kthread awakens the
> > 	->nocb_gp_kthread, which in turn awakens both CPU 0's and
> > 	CPU 1's ->nocb_cb_kthread.
> > 
> > 8.	CPU 0's ->nocb_cb_kthread invokes its callback.
> > 
> > 9.	Note that neither kthread updated any ->nocb_timer state,
> > 	so CPU 0's ->nocb_defer_wakeup is still set to either
> > 	RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.
> > 
> > 10.	CPU 0 enqueues its second callback, again with interrupts
> > 	disabled, and thus must again defer awakening its
> > 	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
> > 	CPU 0 from queueing the timer.
> 
> I managed to recollect some pieces of my brain. So keep the above but
> let's change the point 10:
> 
> 10.     CPU 0 enqueues its second callback, this time with interrupts
>  	enabled so it can wake directly	->nocb_gp_kthread.
> 	It does so with calling __wake_nocb_gp() which also cancels the
> 	pending timer that got queued in step 2. But that doesn't reset
> 	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
> 	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now
> 	desynchronized.
> 
> 11.	->nocb_gp_kthread associates the callback queued in 10 with a new
> 	grace period, arrange for it to start and sleeps on it.
> 
> 12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up
> 	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.
> 
> 13.	CPU 0 enqueues its third callback, this time with interrupts
> 	disabled so it tries to queue a deferred wakeup. However
> 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
> 	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.
> 
> 14.     CPU 0 has its pending callback and it may go unnoticed until
>         some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls
> 	an explicit deferred wake up caller like idle entry.
> 
> I hope I'm not missing something this time...

Thank you, that does sound plausible.  I guess I can see how rcutorture
might have missed this one!

							Thanx, Paul

> Thanks.
> 	
> 
> > 
> > So far so good, but why isn't the timer still queued from back in step 2?
> > What am I missing here?  Either way, could you please update the commit
> > logs to tell the full story?  At some later time, you might be very
> > happy that you did.  ;-)
> > 
> > 							Thanx, Paul
> > 
> > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > >    the pending "nocb_timer" (note they are not the same timers) for the
> > >    given rdp without resetting the matching state stored in nocb_defer
> > >    wakeup.
> > > 
> > > On both situations, a future call_rcu() on that rdp may be fooled and
> > > think the timer is armed when it's not, missing a deferred nocb_gp
> > > wakeup.
> > > 
> > > Case 1) is very unlikely due to timing constraint (the timer fires after
> > > 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> > > But in any case such scenario require the CPU to spend a long time
> > > within a kernel thread without exiting to idle or user space, which is
> > > a pretty exotic behaviour.
> > > 
> > > Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> > > timer.
> > > 
> > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> > > Cc: Stable <stable@vger.kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 2ec9d7f55f99..dd0dc66c282d 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > >  		rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  		return false;
> > >  	}
> > > -	del_timer(&rdp->nocb_timer);
> > > +
> > > +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > > +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > > +		del_timer(&rdp->nocb_timer);
> > > +	}
> > >  	rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> > >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> > > @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> > >  		return false;
> > >  	}
> > >  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> > > -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > >  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
> > >  
> > > -- 
> > > 2.25.1
> > > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-25  0:14       ` Paul E. McKenney
@ 2021-02-25  0:48         ` Frederic Weisbecker
  2021-02-25  1:07           ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-02-25  0:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Feb 24, 2021 at 04:14:25PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> > I managed to recollect some pieces of my brain. So keep the above but
> > let's change the point 10:
> > 
> > 10.     CPU 0 enqueues its second callback, this time with interrupts
> >  	enabled so it can wake directly	->nocb_gp_kthread.
> > 	It does so with calling __wake_nocb_gp() which also cancels the
> > 	pending timer that got queued in step 2. But that doesn't reset
> > 	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
> > 	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now
> > 	desynchronized.
> > 
> > 11.	->nocb_gp_kthread associates the callback queued in 10 with a new
> > 	grace period, arrange for it to start and sleeps on it.
> > 
> > 12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up
> > 	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.
> > 
> > 13.	CPU 0 enqueues its third callback, this time with interrupts
> > 	disabled so it tries to queue a deferred wakeup. However
> > 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
> > 	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.
> > 
> > 14.     CPU 0 has its pending callback and it may go unnoticed until
> >         some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls
> > 	an explicit deferred wake up caller like idle entry.
> > 
> > I hope I'm not missing something this time...
> 
> Thank you, that does sound plausible.  I guess I can see how rcutorture
> might have missed this one!

I must admit it requires a lot of stars to be aligned :-)

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-25  0:48         ` Frederic Weisbecker
@ 2021-02-25  1:07           ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-02-25  1:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Thu, Feb 25, 2021 at 01:48:13AM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 24, 2021 at 04:14:25PM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> > > I managed to recollect some pieces of my brain. So keep the above but
> > > let's change the point 10:
> > > 
> > > 10.     CPU 0 enqueues its second callback, this time with interrupts
> > >  	enabled so it can wake directly	->nocb_gp_kthread.
> > > 	It does so with calling __wake_nocb_gp() which also cancels the
> > > 	pending timer that got queued in step 2. But that doesn't reset
> > > 	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
> > > 	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now
> > > 	desynchronized.
> > > 
> > > 11.	->nocb_gp_kthread associates the callback queued in 10 with a new
> > > 	grace period, arrange for it to start and sleeps on it.
> > > 
> > > 12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up
> > > 	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.
> > > 
> > > 13.	CPU 0 enqueues its third callback, this time with interrupts
> > > 	disabled so it tries to queue a deferred wakeup. However
> > > 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
> > > 	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.
> > > 
> > > 14.     CPU 0 has its pending callback and it may go unnoticed until
> > >         some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls
> > > 	an explicit deferred wake up caller like idle entry.
> > > 
> > > I hope I'm not missing something this time...
> > 
> > Thank you, that does sound plausible.  I guess I can see how rcutorture
> > might have missed this one!
> 
> I must admit it requires a lot of stars to be aligned :-)

It nevertheless constitutes a bug in rcutorture.  Or maybe an additional
challenge for the formal verification people.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-02-24 22:06     ` Frederic Weisbecker
  2021-02-25  0:14       ` Paul E. McKenney
@ 2021-03-02  1:48       ` Paul E. McKenney
  2021-03-02 12:34         ` Frederic Weisbecker
  1 sibling, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-02  1:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > > Two situations can cause a missed nocb timer rearm:
> > > 
> > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> > >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> > >    process the callbacks, again before the nocb_timer for CPU A get a
> > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> > >    kthread, cancelling the pending nocb_timer without resetting the
> > >    corresponding nocb_defer_wakeup.
> > 
> > As discussed offlist, expanding the above scenario results in this
> > sequence of steps:

I renumbered the CPUs, since the ->nocb_gp_kthread would normally be
associated with CPU 0.  If the first CPU to enqueue a callback was also
CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which
would prevent this scenario from playing out.  (But admittedly only if
some other CPU handled by this same ->nocb_gp_kthread used its bypass.)

> > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> > 	->nocb_gp_kthread.

And ->nocb_gp_kthread is associated with CPU 0.

> > 2.	CPU 1 enqueues its first callback with interrupts disabled, and
> > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> > 	queues its rcu_data structure's ->nocb_timer.

At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

> > 3.	CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
> > 	callback, but with interrupts enabled, allowing it to directly
> > 	awaken the ->nocb_gp_kthread.
> > 
> > 4.	The newly awakened ->nocb_gp_kthread associates both CPU 1's
> > 	and CPU 2's callbacks with a future grace period and arranges
> > 	for that grace period to be started.
> > 
> > 5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
> > 	future grace period.
> > 
> > 6.	This grace period elapses before the CPU 1's timer fires.
> > 	This is normally improbably given that the timer is set for only
> > 	one jiffy, but timers can be delayed.  Besides, it is possible
> > 	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> > 
> > 7.	The grace period ends, so rcu_gp_kthread awakens the
> > 	->nocb_gp_kthread, which in turn awakens both CPU 1's and
> > 	CPU 2's ->nocb_cb_kthread.

And then ->nocb_cb_kthread sleeps waiting for more callbacks.

> > 8.	CPU 1's ->nocb_cb_kthread invokes its callback.
> > 
> > 9.	Note that neither kthread updated any ->nocb_timer state,
> > 	so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
> > 
> > 10.	CPU 1 enqueues its second callback, again with interrupts
> > 	disabled, and thus must again defer awakening its
> > 	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
> > 	CPU 1 from queueing the timer.
> 
> I managed to recollect some pieces of my brain. So keep the above but
> let's change the point 10:
> 
> 10.	CPU 1 enqueues its second callback, this time with interrupts
>  	enabled so it can wake directly	->nocb_gp_kthread.
> 	It does so with calling __wake_nocb_gp() which also cancels the

wake_nocb_gp() in current -rcu, correct?

> 	pending timer that got queued in step 2. But that doesn't reset
> 	CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
> 	So CPU 1's ->nocb_defer_wakeup and CPU 1's ->nocb_timer are now
> 	desynchronized.

Agreed, and agreed that this is a bug.  Thank you for persisting on
this one!

> 11.	->nocb_gp_kthread associates the callback queued in 10 with a new
> 	grace period, arrange for it to start and sleeps on it.
> 
> 12.	The grace period ends, ->nocb_gp_kthread awakens and wakes up
> 	CPU 1's ->nocb_cb_kthread which invokes the callback queued in 10.
> 
> 13.	CPU 1 enqueues its third callback, this time with interrupts
> 	disabled so it tries to queue a deferred wakeup. However
> 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
> 	the CPU 1's ->nocb_timer, that got cancelled in 10, from being armed.
> 
> 14.	CPU 1 has its pending callback and it may go unnoticed until
> 	some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever calls
> 	an explicit deferred wake up caller like idle entry.
> 
> I hope I'm not missing something this time...

If you are missing something, then so am I!  ;-)

> > So far so good, but why isn't the timer still queued from back in step 2?
> > What am I missing here?  Either way, could you please update the commit
> > logs to tell the full story?  At some later time, you might be very
> > happy that you did.  ;-)
> > 
> > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > >    the pending "nocb_timer" (note they are not the same timers) for the
> > >    given rdp without resetting the matching state stored in nocb_defer
> > >    wakeup.

Would like to similarly expand this one, or would you prefer to rest your
case on Case 1) above?

							Thanx, Paul

> > > On both situations, a future call_rcu() on that rdp may be fooled and
> > > think the timer is armed when it's not, missing a deferred nocb_gp
> > > wakeup.
> > > 
> > > Case 1) is very unlikely due to timing constraint (the timer fires after
> > > 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> > > But in any case such scenario require the CPU to spend a long time
> > > within a kernel thread without exiting to idle or user space, which is
> > > a pretty exotic behaviour.
> > > 
> > > Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> > > timer.
> > > 
> > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> > > Cc: Stable <stable@vger.kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 2ec9d7f55f99..dd0dc66c282d 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > >  		rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  		return false;
> > >  	}
> > > -	del_timer(&rdp->nocb_timer);
> > > +
> > > +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > > +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > > +		del_timer(&rdp->nocb_timer);
> > > +	}
> > >  	rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> > >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> > > @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> > >  		return false;
> > >  	}
> > >  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> > > -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > >  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
> > >  
> > > -- 
> > > 2.25.1
> > > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-03-02  1:48       ` Paul E. McKenney
@ 2021-03-02 12:34         ` Frederic Weisbecker
  2021-03-02 18:17           ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-02 12:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Mon, Mar 01, 2021 at 05:48:29PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> > On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> > > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > > > Two situations can cause a missed nocb timer rearm:
> > > > 
> > > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> > > >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> > > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> > > >    process the callbacks, again before the nocb_timer for CPU A get a
> > > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> > > >    kthread, cancelling the pending nocb_timer without resetting the
> > > >    corresponding nocb_defer_wakeup.
> > > 
> > > As discussed offlist, expanding the above scenario results in this
> > > sequence of steps:
> 
> I renumbered the CPUs, since the ->nocb_gp_kthread would normally be
> associated with CPU 0.  If the first CPU to enqueue a callback was also
> CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which
> would prevent this scenario from playing out.  (But admittedly only if
> some other CPU handled by this same ->nocb_gp_kthread used its bypass.)

Ok good point.

> 
> > > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> > > 	->nocb_gp_kthread.
> 
> And ->nocb_gp_kthread is associated with CPU 0.
> 
> > > 2.	CPU 1 enqueues its first callback with interrupts disabled, and
> > > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> > > 	queues its rcu_data structure's ->nocb_timer.
> 
> At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

Right.

> > > 7.	The grace period ends, so rcu_gp_kthread awakens the
> > > 	->nocb_gp_kthread, which in turn awakens both CPU 1's and
> > > 	CPU 2's ->nocb_cb_kthread.
> 
> And then ->nocb_cb_kthread sleeps waiting for more callbacks.

Yep

> > I managed to recollect some pieces of my brain. So keep the above but
> > let's change the point 10:
> > 
> > 10.	CPU 1 enqueues its second callback, this time with interrupts
> >  	enabled so it can wake directly	->nocb_gp_kthread.
> > 	It does so with calling __wake_nocb_gp() which also cancels the
> 
> wake_nocb_gp() in current -rcu, correct?

Heh, right.

> > > So far so good, but why isn't the timer still queued from back in step 2?
> > > What am I missing here?  Either way, could you please update the commit
> > > logs to tell the full story?  At some later time, you might be very
> > > happy that you did.  ;-)
> > > 
> > > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > > >    the pending "nocb_timer" (note they are not the same timers) for the
> > > >    given rdp without resetting the matching state stored in nocb_defer
> > > >    wakeup.
> 
> Would like to similarly expand this one, or would you prefer to rest your
> case on Case 1) above?

I was about to say that we can skip that one, the changelog will already be
big enough but the "Fixes:" tag refers to the second scenario, since it's the
oldest vulnerable commit AFAICS.

> > > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-03-02 12:34         ` Frederic Weisbecker
@ 2021-03-02 18:17           ` Paul E. McKenney
  2021-03-03  1:35             ` Frederic Weisbecker
  2021-03-03 11:15             ` Neeraj Upadhyay
  0 siblings, 2 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-02 18:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 01, 2021 at 05:48:29PM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > > > > Two situations can cause a missed nocb timer rearm:
> > > > > 
> > > > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> > > > >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> > > > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> > > > >    process the callbacks, again before the nocb_timer for CPU A get a
> > > > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> > > > >    kthread, cancelling the pending nocb_timer without resetting the
> > > > >    corresponding nocb_defer_wakeup.
> > > > 
> > > > As discussed offlist, expanding the above scenario results in this
> > > > sequence of steps:
> > 
> > I renumbered the CPUs, since the ->nocb_gp_kthread would normally be
> > associated with CPU 0.  If the first CPU to enqueue a callback was also
> > CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which
> > would prevent this scenario from playing out.  (But admittedly only if
> > some other CPU handled by this same ->nocb_gp_kthread used its bypass.)
> 
> Ok good point.
> 
> > 
> > > > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> > > > 	->nocb_gp_kthread.
> > 
> > And ->nocb_gp_kthread is associated with CPU 0.
> > 
> > > > 2.	CPU 1 enqueues its first callback with interrupts disabled, and
> > > > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> > > > 	queues its rcu_data structure's ->nocb_timer.
> > 
> > At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
> 
> Right.
> 
> > > > 7.	The grace period ends, so rcu_gp_kthread awakens the
> > > > 	->nocb_gp_kthread, which in turn awakens both CPU 1's and
> > > > 	CPU 2's ->nocb_cb_kthread.
> > 
> > And then ->nocb_cb_kthread sleeps waiting for more callbacks.
> 
> Yep
> 
> > > I managed to recollect some pieces of my brain. So keep the above but
> > > let's change the point 10:
> > > 
> > > 10.	CPU 1 enqueues its second callback, this time with interrupts
> > >  	enabled so it can wake directly	->nocb_gp_kthread.
> > > 	It does so with calling __wake_nocb_gp() which also cancels the
> > 
> > wake_nocb_gp() in current -rcu, correct?
> 
> Heh, right.
> 
> > > > So far so good, but why isn't the timer still queued from back in step 2?
> > > > What am I missing here?  Either way, could you please update the commit
> > > > logs to tell the full story?  At some later time, you might be very
> > > > happy that you did.  ;-)
> > > > 
> > > > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > > > >    the pending "nocb_timer" (note they are not the same timers) for the
> > > > >    given rdp without resetting the matching state stored in nocb_defer
> > > > >    wakeup.
> > 
> > Would like to similarly expand this one, or would you prefer to rest your
> > case on Case 1) above?
> 
> I was about to say that we can skip that one, the changelog will already be
> big enough but the "Fixes:" tag refers to the second scenario, since it's the
> oldest vulnerable commit AFAICS.
> 
> > > > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)

OK, how about if I queue a temporary commit (shown below) that just
calls out the first scenario so that I can start testing, and you get
me more detail on the second scenario?  I can then update the commit.

							Thanx, Paul

------------------------------------------------------------------------

commit 302fd54b9ae98f678624cbf9bf7a4ca88455a8f9
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue Feb 23 01:09:59 2021 +0100

    rcu/nocb: Fix missed nocb_timer requeue
    
    This sequence of events can lead to a failure to requeue a CPU's
    ->nocb_timer:
    
    1.      There are no callbacks queued for any CPU covered by CPU 0-2's
            ->nocb_gp_kthread.  Note that ->nocb_gp_kthread is associated
            with CPU 0.
    
    2.      CPU 1 enqueues its first callback with interrupts disabled, and
            thus must defer awakening its ->nocb_gp_kthread.  It therefore
            queues its rcu_data structure's ->nocb_timer.  At this point,
            CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
    
    3.      CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
            callback, but with interrupts enabled, allowing it to directly
            awaken the ->nocb_gp_kthread.
    
    4.      The newly awakened ->nocb_gp_kthread associates both CPU 1's
            and CPU 2's callbacks with a future grace period and arranges
            for that grace period to be started.
    
    5.      This ->nocb_gp_kthread goes to sleep waiting for the end of this
            future grace period.
    
    6.      This grace period elapses before the CPU 1's timer fires.
            This is normally improbably given that the timer is set for only
            one jiffy, but timers can be delayed.  Besides, it is possible
            that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
    
    7.      The grace period ends, so rcu_gp_kthread awakens the
            ->nocb_gp_kthread, which in turn awakens both CPU 1's and
            CPU 2's ->nocb_cb_kthread.  Then ->nocb_gb_kthread sleeps
            waiting for more newly queued callbacks.
    
    8.      CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
            waiting for more invocable callbacks.
    
    9.      Note that neither kthread updated any ->nocb_timer state,
            so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
    
    10.     CPU 1 enqueues its second callback, this time with interrupts
            enabled so it can wake directly ->nocb_gp_kthread.
            It does so with calling wake_nocb_gp() which also cancels the
            pending timer that got queued in step 2. But that doesn't reset
            CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
            So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
            desynchronized.
    
    11.     ->nocb_gp_kthread associates the callback queued in 10 with a new
            grace period, arranges for that grace period to start and sleeps
            waiting for it to complete.
    
    12.     The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
            which in turn wakes up CPU 1's ->nocb_cb_kthread which then
            invokes the callback queued in 10.
    
    13.     CPU 1 enqueues its third callback, this time with interrupts
            disabled so it must queue a timer for a deferred wakeup. However
            the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
            incorrectly indicates that a timer is already queued.  Instead,
            CPU 1's ->nocb_timer was cancelled in 10.  CPU 1 therefore fails
            to queue the ->nocb_timer.
    
    14.     CPU 1 has its pending callback and it may go unnoticed until
            some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever
            calls an explicit deferred wakeup, for example, during idle entry.
    
    This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime
    we delete the ->nocb_timer.
    
    Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
    Cc: Stable <stable@vger.kernel.org>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 44746d8..429491d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1721,7 +1721,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		return false;
 	}
-	del_timer(&rdp->nocb_timer);
+
+	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		del_timer(&rdp->nocb_timer);
+	}
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2350,7 +2354,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 		return false;
 	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
 

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/13] rcu/nocb: Use the rcuog CPU's ->nocb_timer
  2021-02-23  0:10 ` [PATCH 05/13] rcu/nocb: Merge nocb_timer to the rdp leader Frederic Weisbecker
@ 2021-03-03  1:15   ` Paul E. McKenney
  2021-03-10 22:05     ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-03  1:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Feb 23, 2021 at 01:10:03AM +0100, Frederic Weisbecker wrote:
> Currently each offline rdp has its own nocb_timer armed when the
> nocb_gp wakeup must be deferred. This layout has many drawbacks,
> compared to a solution based on a single timer per rdp group:
> 
> * There are a lot of timers to maintain.
> 
> * The per-rdp ->nocb_lock must be held to queue and cancel the timer
>   and this lock can already be quite contended.
> 
> * One timer firing doesn't cancel the other timers in the same group:
>   - These other timers can thus cause spurious wakeups
>   - Each rdp that queued a timer must lock both ->nocb_lock and then
>     ->nocb_gp_lock upon exit from the kernel to idle/user/guest mode.
> 
> * We can't cancel all of them if we detect an unflushed bypass in
>   nocb_gp_wait(). In fact currently we only ever cancel the nocb_timer
>   of the leader group.
> 
> * The leader group's nocb_timer is cancelled without locking ->nocb_lock
>   in nocb_gp_wait().  This currently appears to be safe but is an
>   accident waiting to happen.
> 
> * Since the timer acquires ->nocb_lock, it requires extra care in the
>   NOCB (de-)offloading process, requiring that it be either enabled or
>   disabled and flushed.
> 
> This commit instead uses the rcuog kthread's CPU's ->nocb_timer instead.
> It is protected by nocb_gp_lock, which is _way_ less contended and
> remains so even after this change.  As a matter of fact, the nocb_timer
> almost never fires and the deferred wakeup is mostly carried out upon
> idle/user/guest entry.  Now the early check performed at this point in
> do_nocb_deferred_wakeup() is done on rdp_gp->nocb_defer_wakeup, which
> is of course racy.  However, this raciness is harmless because we only
> need the guarantee that the timer is queued if we were the last one to
> queue it.  Any other situation (another CPU has queued it and we either
> see it or not) is fine.
> 
> This solves all the issues listed above.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>

I pulled in the previous three (2-4/13) with the usual commit-log wordsmithing,
thank you!  And I could not resist wordsmithing above.

I do very much like the general approach, but a few questions below.

The first question is of course: Did you try this with lockdep enabled?  ;-)

> ---
>  kernel/rcu/tree.h        |   1 -
>  kernel/rcu/tree_plugin.h | 142 +++++++++++++++++++++------------------
>  2 files changed, 78 insertions(+), 65 deletions(-)
> 
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 71821d59d95c..b280a843bd2c 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -257,7 +257,6 @@ struct rcu_data {
>  };
>  
>  /* Values for nocb_defer_wakeup field in struct rcu_data. */
> -#define RCU_NOCB_WAKE_OFF	-1
>  #define RCU_NOCB_WAKE_NOT	0
>  #define RCU_NOCB_WAKE		1
>  #define RCU_NOCB_WAKE_FORCE	2
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 587df271d640..847636d3e93d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -33,10 +33,6 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>  	return false;
>  }
>  
> -static inline bool rcu_running_nocb_timer(struct rcu_data *rdp)
> -{
> -	return (timer_curr_running(&rdp->nocb_timer) && !in_irq());
> -}
>  #else
>  static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>  {
> @@ -48,11 +44,6 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>  	return false;
>  }
>  
> -static inline bool rcu_running_nocb_timer(struct rcu_data *rdp)
> -{
> -	return false;
> -}
> -
>  #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
>  
>  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> @@ -72,8 +63,7 @@ static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
>  		  rcu_lockdep_is_held_nocb(rdp) ||
>  		  (rdp == this_cpu_ptr(&rcu_data) &&
>  		   !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) ||
> -		  rcu_current_is_nocb_kthread(rdp) ||
> -		  rcu_running_nocb_timer(rdp)),
> +		  rcu_current_is_nocb_kthread(rdp)),
>  		"Unsafe read of RCU_NOCB offloaded state"
>  	);
>  
> @@ -1702,43 +1692,50 @@ bool rcu_is_nocb_cpu(int cpu)
>  	return false;
>  }
>  
> -/*
> - * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
> - * and this function releases it.
> - */
> -static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> -			 unsigned long flags)
> -	__releases(rdp->nocb_lock)
> +static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
> +			   struct rcu_data *rdp,
> +			   bool force, unsigned long flags)
> +	__releases(rdp_gp->nocb_gp_lock)
>  {
>  	bool needwake = false;
> -	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>  
> -	lockdep_assert_held(&rdp->nocb_lock);
>  	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>  				    TPS("AlreadyAwake"));
>  		return false;
>  	}
>  
> -	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> -		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> -		del_timer(&rdp->nocb_timer);
> +	if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {

So there are no longer any data races involving ->nocb_defer_wakeup?

(Yes, I could fire up KCSAN, but my KCSAN-capable system is otherwise
occupied for several more hours.)

> +		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> +		del_timer(&rdp_gp->nocb_timer);
>  	}
> -	rcu_nocb_unlock_irqrestore(rdp, flags);
> -	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> +
>  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
>  		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
>  		needwake = true;
> +	}
> +	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> +	if (needwake) {
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake"));
> -	}
> -	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> -	if (needwake)
>  		wake_up_process(rdp_gp->nocb_gp_kthread);
> +	}
>  
>  	return needwake;
>  }
>  
> +/*
> + * Kick the GP kthread for this NOCB group.
> + */
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> +
> +	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> +	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
> +}
> +
>  /*
>   * Arrange to wake the GP kthread for this NOCB group at some future
>   * time when it is safe to do so.
> @@ -1746,12 +1743,18 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
>  static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>  			       const char *reason)
>  {
> -	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_OFF)
> -		return;
> -	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
> -		mod_timer(&rdp->nocb_timer, jiffies + 1);
> -	if (rdp->nocb_defer_wakeup < waketype)
> -		WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);
> +	unsigned long flags;
> +	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> +
> +	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> +
> +	if (rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
> +		mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> +	if (rdp_gp->nocb_defer_wakeup < waketype)
> +		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> +
> +	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> +
>  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason);
>  }
>  
> @@ -1978,13 +1981,14 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>  		rdp->qlen_last_fqs_check = len;
>  		if (!irqs_disabled_flags(flags)) {
>  			/* ... if queue was empty ... */
> -			wake_nocb_gp(rdp, false, flags);
> +			rcu_nocb_unlock_irqrestore(rdp, flags);
> +			wake_nocb_gp(rdp, false);
>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>  					    TPS("WakeEmpty"));
>  		} else {
> +			rcu_nocb_unlock_irqrestore(rdp, flags);
>  			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
>  					   TPS("WakeEmptyIsDeferred"));
> -			rcu_nocb_unlock_irqrestore(rdp, flags);
>  		}
>  	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
>  		/* ... or if many callbacks queued. */
> @@ -1999,10 +2003,14 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>  		smp_mb(); /* Enqueue before timer_pending(). */
>  		if ((rdp->nocb_cb_sleep ||
>  		     !rcu_segcblist_ready_cbs(&rdp->cblist)) &&
> -		    !timer_pending(&rdp->nocb_bypass_timer))
> +		    !timer_pending(&rdp->nocb_bypass_timer)) {
> +			rcu_nocb_unlock_irqrestore(rdp, flags);
>  			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
>  					   TPS("WakeOvfIsDeferred"));
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +		} else {
> +			rcu_nocb_unlock_irqrestore(rdp, flags);
> +			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
> +		}
>  	} else {
>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
> @@ -2128,11 +2136,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  			bypass = true;
>  		}
>  		rnp = rdp->mynode;
> -		if (bypass) {  // Avoid race with first bypass CB.
> -			WRITE_ONCE(my_rdp->nocb_defer_wakeup,
> -				   RCU_NOCB_WAKE_NOT);
> -			del_timer(&my_rdp->nocb_timer);
> -		}

What ensures that the timer will be properly queued when a non-empty
bypass needs it to be?

Never mind, I now see the code below...

> +
>  		// Advance callbacks if helpful and low contention.
>  		needwake_gp = false;
>  		if (!rcu_segcblist_restempty(&rdp->cblist,
> @@ -2178,11 +2182,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  	my_rdp->nocb_gp_bypass = bypass;
>  	my_rdp->nocb_gp_gp = needwait_gp;
>  	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
> -	if (bypass && !rcu_nocb_poll) {
> -		// At least one child with non-empty ->nocb_bypass, so set
> -		// timer in order to avoid stranding its callbacks.
> +	if (bypass) {
>  		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> -		mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> +		// Avoid race with first bypass CB.
> +		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> +			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> +			del_timer(&my_rdp->nocb_timer);
> +		}

Given that the timer does not get queued if rcu_nocb_poll, why not move the
above "if" statement under the one following?

> +		if (!rcu_nocb_poll) {
> +			// At least one child with non-empty ->nocb_bypass, so set
> +			// timer in order to avoid stranding its callbacks.
> +			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> +		}
>  		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
>  	}
>  	if (rcu_nocb_poll) {
> @@ -2356,15 +2367,18 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	int ndw;
> +	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>  	int ret;
>  
> -	rcu_nocb_lock_irqsave(rdp, flags);
> -	if (!rcu_nocb_need_deferred_wakeup(rdp)) {
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> +
> +	if (!rcu_nocb_need_deferred_wakeup(rdp_gp)) {
> +		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
>  		return false;
>  	}
> -	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> -	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> +
> +	ndw = rdp_gp->nocb_defer_wakeup;
> +	ret = __wake_nocb_gp(rdp_gp, rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
>  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
>  
>  	return ret;
> @@ -2385,7 +2399,10 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
>   */
>  static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
>  {
> -	if (rcu_nocb_need_deferred_wakeup(rdp))
> +	if (!rdp->nocb_gp_rdp)
> +		return false;

This check was not necessary previously because each CPU used its own rdp,
correct?  The theory is that this early return is taken only during boot,
and that the spawning of the kthreads will act as an implicit wakeup?
Or am I missing something subtle here?

> +
> +	if (rcu_nocb_need_deferred_wakeup(rdp->nocb_gp_rdp))
>  		return do_nocb_deferred_wakeup_common(rdp);
>  	return false;
>  }
> @@ -2454,17 +2471,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	swait_event_exclusive(rdp->nocb_state_wq,
>  			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
>  							SEGCBLIST_KTHREAD_GP));
> -	rcu_nocb_lock_irqsave(rdp, flags);
> -	/* Make sure nocb timer won't stay around */
> -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_OFF);
> -	rcu_nocb_unlock_irqrestore(rdp, flags);
> -	del_timer_sync(&rdp->nocb_timer);
> -
>  	/*
> -	 * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY with CB unlocked
> -	 * and IRQs disabled but let's be paranoid.
> +	 * Lock one last time to acquire latest callback updates from kthreads
> +	 * so we can later handle callbacks locally without locking.
>  	 */
>  	rcu_nocb_lock_irqsave(rdp, flags);
> +	/*
> +	 * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb
> +	 * lock is released but how about being paranoid for once?
> +	 */
>  	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
>  	/*
>  	 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
> @@ -2528,8 +2543,7 @@ static long rcu_nocb_rdp_offload(void *arg)
>  	 * SEGCBLIST_SOFTIRQ_ONLY mode.
>  	 */
>  	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> -	/* Re-enable nocb timer */
> -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> +
>  	/*
>  	 * We didn't take the nocb lock while working on the
>  	 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling
  2021-02-23  0:10 ` [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling Frederic Weisbecker
@ 2021-03-03  1:22   ` Paul E. McKenney
  2021-03-10 22:08     ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-03  1:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Feb 23, 2021 at 01:10:09AM +0100, Frederic Weisbecker wrote:
> No need to disarm the nocb_timer if rcu_nocb is polling because it
> shouldn't be armed either.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>

OK, so it does make sense to move that del_timer() under the following
"if" statement, then.  ;-)

> ---
>  kernel/rcu/tree_plugin.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 9da67b0d3997..d8b50ff40e4b 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2186,18 +2186,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  	my_rdp->nocb_gp_gp = needwait_gp;
>  	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
>  	if (bypass) {
> -		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> -		// Avoid race with first bypass CB.
> -		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> -			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> -			del_timer(&my_rdp->nocb_timer);
> -		}
>  		if (!rcu_nocb_poll) {
> +			raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> +			// Avoid race with first bypass CB.
> +			if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> +				WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> +				del_timer(&my_rdp->nocb_timer);
> +			}
>  			// At least one child with non-empty ->nocb_bypass, so set
>  			// timer in order to avoid stranding its callbacks.
>  			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> +			raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
>  		}
> -		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
>  	}
>  	if (rcu_nocb_poll) {
>  		/* Polling, so trace if first poll in the series. */
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/13] rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  2021-02-23  0:10 ` [PATCH 10/13] rcu/nocb: Delete bypass_timer " Frederic Weisbecker
@ 2021-03-03  1:24   ` Paul E. McKenney
  2021-03-10 22:17     ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-03  1:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Feb 23, 2021 at 01:10:08AM +0100, Frederic Weisbecker wrote:
> A NOCB-gp wake up can safely delete the nocb_bypass_timer. nocb_gp_wait()
> is going to check again the bypass state and rearm the bypass timer if
> necessary.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>

Give that you delete this code a couple of patches later in this series,
why not just leave it out entirely?  ;-)

							Thanx, Paul

> ---
>  kernel/rcu/tree_plugin.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index b62ad79bbda5..9da67b0d3997 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1711,6 +1711,8 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
>  		del_timer(&rdp_gp->nocb_timer);
>  	}
>  
> +	del_timer(&rdp_gp->nocb_bypass_timer);
> +
>  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
>  		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
>  		needwake = true;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-03-02 18:17           ` Paul E. McKenney
@ 2021-03-03  1:35             ` Frederic Weisbecker
  2021-03-03  2:06               ` Paul E. McKenney
  2021-03-03 11:15             ` Neeraj Upadhyay
  1 sibling, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-03  1:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> 
> OK, how about if I queue a temporary commit (shown below) that just
> calls out the first scenario so that I can start testing, and you get
> me more detail on the second scenario?  I can then update the commit.

Sure, meanwhile here is an attempt for a nocb_bypass_timer based
scenario, it's overly hairy and perhaps I picture more power
in the hands of callbacks advancing on nocb_cb_wait() than it
really has:


0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
            executed all the ready callbacks. Its segcblist is now
            entirely empty. It's preempted while calling local_bh_enable().

1.          A new callback is enqueued on CPU 0 with IRQs enabled. So
            the ->nocb_gp_kthread for CPU 0-2's is awaken. Then a storm
            of callbacks enqueue follows on CPU 0 and even reaches the
            bypass queue. Note that ->nocb_gp_kthread is also associated
            with CPU 0.

2.          CPU 0 queues one last bypass callback.

3.          The ->nocb_gp_kthread wakes up and associates a grace period
            with the whole queue of regular callbacks on CPU 0. It also
            tries to flush the bypass queue of CPU 0 but the bypass lock
            is contended due to the concurrent enqueuing on the previous
            step 2, so the flush fails.

4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
            to sleep waiting for the end of this future grace period.

5.          This grace period elapses before the ->nocb_bypass_timer timer
            fires. This is normally improbably given that the timer is set
            for only two jiffies, but timers can be delayed.  Besides, it
            is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

6.          The grace period ends, so rcu_gp_kthread awakens the
            ->nocb_gp_kthread but it doesn't get a chance to run on a CPU
            before a while.

7.          CPU 0's ->nocb_cb_kthread get back to the CPU after its preemption.
            As it notices the new completed grace period, it advances the callbacks
            and executes them. Then it gets preempted again on local_bh_enabled().

8.          A new callback enqueue on CPU 0 flushes itself the bypass queue
            because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.

9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) initiate and
            elapse a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
            got an opportunity to run on a CPU and its ->nocb_bypass_timer still
            hasn't fired.

10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
            new grace periods that have elapsed, advance all the callbacks and
            executes them. Then it goes to sleep waiting for invocable callbacks.

11.         CPU 0 enqueues a new callback with interrupts disabled, and
            defers awakening its ->nocb_gp_kthread even though ->nocb_gp_sleep
            is actually false. It therefore queues its rcu_data structure's
            ->nocb_timer. At this point, CPU 0's rdp->nocb_defer_wakeup is
            RCU_NOCB_WAKE.

12.         The ->nocb_bypass_timer finally fires! It doesn't wake up
            ->nocb_gp_kthread because it's actually awaken already.
            But it cancels CPU 0's ->nocb_timer armed at 11. Yet it doesn't
            re-initialize CPU 0's ->nocb_defer_wakeup which stays with the
            stale RCU_NOCB_WAKE value. So CPU 0's->nocb_defer_wakeup and
            its ->nocb_timer are now desynchronized.
            
13.         The ->nocb_gp_kthread finally runs. It cancels the ->nocb_bypass_timer
            which has already fired. It sees the new callback on CPU 0 and
            associate it with a new grace period then sleep on it.
            
14.         The grace period elapses, rcu_gp_kthread wakes up ->nocb_gb_kthread
            which wakes up CPU 0's->nocb_cb_kthread which runs the callback.
            Both ->nocb_gp_kthread and CPU 0's->nocb_cb_kthread now wait for new
            callbacks.
            
15.         CPU 0 enqueues another callback, again with interrupts
            disabled so it must queue a timer for a deferred wakeup. However
            the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
            incorrectly indicates that a timer is already queued.  Instead,
            CPU 0's ->nocb_timer was cancelled in 12.  CPU 0 therefore fails
            to queue the ->nocb_timer.

16.         CPU 0 has its pending callback and it may go unnoticed until
            some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever
            calls an explicit deferred wakeup, for example, during idle entry.


Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-03-03  1:35             ` Frederic Weisbecker
@ 2021-03-03  2:06               ` Paul E. McKenney
  2021-03-03  2:17                 ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-03  2:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Mar 03, 2021 at 02:35:33AM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> > On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> > 
> > OK, how about if I queue a temporary commit (shown below) that just
> > calls out the first scenario so that I can start testing, and you get
> > me more detail on the second scenario?  I can then update the commit.
> 
> Sure, meanwhile here is an attempt for a nocb_bypass_timer based
> scenario, it's overly hairy and perhaps I picture more power
> in the hands of callbacks advancing on nocb_cb_wait() than it
> really has:

Thank you very much!

I must defer looking through this in detail until I am more awake,
but I do very much like the fine-grained exposition.

							Thanx, Paul

> 0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
>             executed all the ready callbacks. Its segcblist is now
>             entirely empty. It's preempted while calling local_bh_enable().
> 
> 1.          A new callback is enqueued on CPU 0 with IRQs enabled. So
>             the ->nocb_gp_kthread for CPU 0-2's is awaken. Then a storm
>             of callbacks enqueue follows on CPU 0 and even reaches the
>             bypass queue. Note that ->nocb_gp_kthread is also associated
>             with CPU 0.
> 
> 2.          CPU 0 queues one last bypass callback.
> 
> 3.          The ->nocb_gp_kthread wakes up and associates a grace period
>             with the whole queue of regular callbacks on CPU 0. It also
>             tries to flush the bypass queue of CPU 0 but the bypass lock
>             is contended due to the concurrent enqueuing on the previous
>             step 2, so the flush fails.
> 
> 4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
>             to sleep waiting for the end of this future grace period.
> 
> 5.          This grace period elapses before the ->nocb_bypass_timer timer
>             fires. This is normally improbably given that the timer is set
>             for only two jiffies, but timers can be delayed.  Besides, it
>             is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> 
> 6.          The grace period ends, so rcu_gp_kthread awakens the
>             ->nocb_gp_kthread but it doesn't get a chance to run on a CPU
>             before a while.
> 
> 7.          CPU 0's ->nocb_cb_kthread get back to the CPU after its preemption.
>             As it notices the new completed grace period, it advances the callbacks
>             and executes them. Then it gets preempted again on local_bh_enabled().
> 
> 8.          A new callback enqueue on CPU 0 flushes itself the bypass queue
>             because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.
> 
> 9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) initiate and
>             elapse a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
>             got an opportunity to run on a CPU and its ->nocb_bypass_timer still
>             hasn't fired.
> 
> 10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
>             new grace periods that have elapsed, advance all the callbacks and
>             executes them. Then it goes to sleep waiting for invocable callbacks.
> 
> 11.         CPU 0 enqueues a new callback with interrupts disabled, and
>             defers awakening its ->nocb_gp_kthread even though ->nocb_gp_sleep
>             is actually false. It therefore queues its rcu_data structure's
>             ->nocb_timer. At this point, CPU 0's rdp->nocb_defer_wakeup is
>             RCU_NOCB_WAKE.
> 
> 12.         The ->nocb_bypass_timer finally fires! It doesn't wake up
>             ->nocb_gp_kthread because it's actually awaken already.
>             But it cancels CPU 0's ->nocb_timer armed at 11. Yet it doesn't
>             re-initialize CPU 0's ->nocb_defer_wakeup which stays with the
>             stale RCU_NOCB_WAKE value. So CPU 0's->nocb_defer_wakeup and
>             its ->nocb_timer are now desynchronized.
>             
> 13.         The ->nocb_gp_kthread finally runs. It cancels the ->nocb_bypass_timer
>             which has already fired. It sees the new callback on CPU 0 and
>             associate it with a new grace period then sleep on it.
>             
> 14.         The grace period elapses, rcu_gp_kthread wakes up ->nocb_gb_kthread
>             which wakes up CPU 0's->nocb_cb_kthread which runs the callback.
>             Both ->nocb_gp_kthread and CPU 0's->nocb_cb_kthread now wait for new
>             callbacks.
>             
> 15.         CPU 0 enqueues another callback, again with interrupts
>             disabled so it must queue a timer for a deferred wakeup. However
>             the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
>             incorrectly indicates that a timer is already queued.  Instead,
>             CPU 0's ->nocb_timer was cancelled in 12.  CPU 0 therefore fails
>             to queue the ->nocb_timer.
> 
> 16.         CPU 0 has its pending callback and it may go unnoticed until
>             some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever
>             calls an explicit deferred wakeup, for example, during idle entry.
> 
> 
> Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-03-03  2:06               ` Paul E. McKenney
@ 2021-03-03  2:17                 ` Frederic Weisbecker
  0 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-03  2:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 02, 2021 at 06:06:43PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 02:35:33AM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> > > On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> > > 
> > > OK, how about if I queue a temporary commit (shown below) that just
> > > calls out the first scenario so that I can start testing, and you get
> > > me more detail on the second scenario?  I can then update the commit.
> > 
> > Sure, meanwhile here is an attempt for a nocb_bypass_timer based
> > scenario, it's overly hairy and perhaps I picture more power
> > in the hands of callbacks advancing on nocb_cb_wait() than it
> > really has:
> 
> Thank you very much!
> 
> I must defer looking through this in detail until I am more awake,
> but I do very much like the fine-grained exposition.
> 
> 							Thanx, Paul
> 
> > 0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
> >             executed all the ready callbacks. Its segcblist is now
> >             entirely empty. It's preempted while calling local_bh_enable().
> > 
> > 1.          A new callback is enqueued on CPU 0 with IRQs enabled. So
> >             the ->nocb_gp_kthread for CPU 0-2's is awaken. Then a storm
> >             of callbacks enqueue follows on CPU 0 and even reaches the
> >             bypass queue. Note that ->nocb_gp_kthread is also associated
> >             with CPU 0.
> > 
> > 2.          CPU 0 queues one last bypass callback.
> > 
> > 3.          The ->nocb_gp_kthread wakes up and associates a grace period
> >             with the whole queue of regular callbacks on CPU 0. It also
> >             tries to flush the bypass queue of CPU 0 but the bypass lock
> >             is contended due to the concurrent enqueuing on the previous
> >             step 2, so the flush fails.
> > 
> > 4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
> >             to sleep waiting for the end of this future grace period.
> > 
> > 5.          This grace period elapses before the ->nocb_bypass_timer timer
> >             fires. This is normally improbably given that the timer is set
> >             for only two jiffies, but timers can be delayed.  Besides, it
> >             is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> > 
> > 6.          The grace period ends, so rcu_gp_kthread awakens the
> >             ->nocb_gp_kthread but it doesn't get a chance to run on a CPU
> >             before a while.
> > 
> > 7.          CPU 0's ->nocb_cb_kthread get back to the CPU after its preemption.
> >             As it notices the new completed grace period, it advances the callbacks
> >             and executes them. Then it gets preempted again on local_bh_enabled().
> > 
> > 8.          A new callback enqueue on CPU 0 flushes itself the bypass queue
> >             because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.
> > 
> > 9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) initiate and
> >             elapse a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
> >             got an opportunity to run on a CPU and its ->nocb_bypass_timer still
> >             hasn't fired.
> > 
> > 10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
> >             new grace periods that have elapsed, advance all the callbacks and
> >             executes them. Then it goes to sleep waiting for invocable
> >             callbacks.

I'm just not so sure about the above point 10. Even though a few grace periods
have elapsed, the callback queued in 8 is in RCU_NEXT_TAIL at this
point. Perhaps one more grace period is necessary after that.

Anyway, I need to be more awake as well before checking that again.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm
  2021-03-02 18:17           ` Paul E. McKenney
  2021-03-03  1:35             ` Frederic Weisbecker
@ 2021-03-03 11:15             ` Neeraj Upadhyay
  1 sibling, 0 replies; 39+ messages in thread
From: Neeraj Upadhyay @ 2021-03-03 11:15 UTC (permalink / raw)
  To: paulmck, Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan, Josh Triplett,
	Stable, Joel Fernandes

Hi,

On 3/2/2021 11:47 PM, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
>> On Mon, Mar 01, 2021 at 05:48:29PM -0800, Paul E. McKenney wrote:
>>> On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
>>>> On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
>>>>> On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
>>>>>> Two situations can cause a missed nocb timer rearm:
>>>>>>
>>>>>> 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
>>>>>>     the timer get a chance to fire. The nocb_gp kthread is awaken by
>>>>>>     rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
>>>>>>     process the callbacks, again before the nocb_timer for CPU A get a
>>>>>>     chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
>>>>>>     kthread, cancelling the pending nocb_timer without resetting the
>>>>>>     corresponding nocb_defer_wakeup.
>>>>>
>>>>> As discussed offlist, expanding the above scenario results in this
>>>>> sequence of steps:
>>>
>>> I renumbered the CPUs, since the ->nocb_gp_kthread would normally be
>>> associated with CPU 0.  If the first CPU to enqueue a callback was also
>>> CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which
>>> would prevent this scenario from playing out.  (But admittedly only if
>>> some other CPU handled by this same ->nocb_gp_kthread used its bypass.)
>>
>> Ok good point.
>>
>>>
>>>>> 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
>>>>> 	->nocb_gp_kthread.
>>>
>>> And ->nocb_gp_kthread is associated with CPU 0.
>>>
>>>>> 2.	CPU 1 enqueues its first callback with interrupts disabled, and
>>>>> 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
>>>>> 	queues its rcu_data structure's ->nocb_timer.
>>>
>>> At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
>>
>> Right.
>>
>>>>> 7.	The grace period ends, so rcu_gp_kthread awakens the
>>>>> 	->nocb_gp_kthread, which in turn awakens both CPU 1's and
>>>>> 	CPU 2's ->nocb_cb_kthread.
>>>
>>> And then ->nocb_cb_kthread sleeps waiting for more callbacks.
>>
>> Yep
>>
>>>> I managed to recollect some pieces of my brain. So keep the above but
>>>> let's change the point 10:
>>>>
>>>> 10.	CPU 1 enqueues its second callback, this time with interrupts
>>>>   	enabled so it can wake directly	->nocb_gp_kthread.
>>>> 	It does so with calling __wake_nocb_gp() which also cancels the
>>>
>>> wake_nocb_gp() in current -rcu, correct?
>>
>> Heh, right.
>>
>>>>> So far so good, but why isn't the timer still queued from back in step 2?
>>>>> What am I missing here?  Either way, could you please update the commit
>>>>> logs to tell the full story?  At some later time, you might be very
>>>>> happy that you did.  ;-)
>>>>>
>>>>>> 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
>>>>>>     the pending "nocb_timer" (note they are not the same timers) for the
>>>>>>     given rdp without resetting the matching state stored in nocb_defer
>>>>>>     wakeup.
>>>
>>> Would like to similarly expand this one, or would you prefer to rest your
>>> case on Case 1) above?
>>
>> I was about to say that we can skip that one, the changelog will already be
>> big enough but the "Fixes:" tag refers to the second scenario, since it's the
>> oldest vulnerable commit AFAICS.
>>
>>>>>> Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> 
> OK, how about if I queue a temporary commit (shown below) that just
> calls out the first scenario so that I can start testing, and you get
> me more detail on the second scenario?  I can then update the commit.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 302fd54b9ae98f678624cbf9bf7a4ca88455a8f9
> Author: Frederic Weisbecker <frederic@kernel.org>
> Date:   Tue Feb 23 01:09:59 2021 +0100
> 
>      rcu/nocb: Fix missed nocb_timer requeue
>      
>      This sequence of events can lead to a failure to requeue a CPU's
>      ->nocb_timer:
>      
>      1.      There are no callbacks queued for any CPU covered by CPU 0-2's
>              ->nocb_gp_kthread.  Note that ->nocb_gp_kthread is associated
>              with CPU 0.
>      
>      2.      CPU 1 enqueues its first callback with interrupts disabled, and
>              thus must defer awakening its ->nocb_gp_kthread.  It therefore
>              queues its rcu_data structure's ->nocb_timer.  At this point,
>              CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
>      
>      3.      CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
>              callback, but with interrupts enabled, allowing it to directly
>              awaken the ->nocb_gp_kthread.
>      
>      4.      The newly awakened ->nocb_gp_kthread associates both CPU 1's
>              and CPU 2's callbacks with a future grace period and arranges
>              for that grace period to be started.
>      
>      5.      This ->nocb_gp_kthread goes to sleep waiting for the end of this
>              future grace period.
>      
>      6.      This grace period elapses before the CPU 1's timer fires.
>              This is normally improbably given that the timer is set for only
>              one jiffy, but timers can be delayed.  Besides, it is possible
>              that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
>      
>      7.      The grace period ends, so rcu_gp_kthread awakens the
>              ->nocb_gp_kthread, which in turn awakens both CPU 1's and
>              CPU 2's ->nocb_cb_kthread.  Then ->nocb_gb_kthread sleeps
>              waiting for more newly queued callbacks.
>      
>      8.      CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
>              waiting for more invocable callbacks.
>      
>      9.      Note that neither kthread updated any ->nocb_timer state,
>              so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
>      
>      10.     CPU 1 enqueues its second callback, this time with interrupts
>              enabled so it can wake directly ->nocb_gp_kthread.
>              It does so with calling wake_nocb_gp() which also cancels the
>              pending timer that got queued in step 2. But that doesn't reset
>              CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
>              So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
>              desynchronized.
>      
>      11.     ->nocb_gp_kthread associates the callback queued in 10 with a new
>              grace period, arranges for that grace period to start and sleeps
>              waiting for it to complete.
>      
>      12.     The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
>              which in turn wakes up CPU 1's ->nocb_cb_kthread which then
>              invokes the callback queued in 10.
>      
>      13.     CPU 1 enqueues its third callback, this time with interrupts
>              disabled so it must queue a timer for a deferred wakeup. However
>              the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
>              incorrectly indicates that a timer is already queued.  Instead,
>              CPU 1's ->nocb_timer was cancelled in 10.  CPU 1 therefore fails
>              to queue the ->nocb_timer.
>      
>      14.     CPU 1 has its pending callback and it may go unnoticed until
>              some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever
>              calls an explicit deferred wakeup, for example, during idle entry.
>      
>      This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime
>      we delete the ->nocb_timer.
>      
>      Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
>      Cc: Stable <stable@vger.kernel.org>
>      Cc: Josh Triplett <josh@joshtriplett.org>
>      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
>      Cc: Joel Fernandes <joel@joelfernandes.org>
>      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
>      Cc: Boqun Feng <boqun.feng@gmail.com>
>      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 44746d8..429491d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1721,7 +1721,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
>   		rcu_nocb_unlock_irqrestore(rdp, flags);
>   		return false;
>   	}
> -	del_timer(&rdp->nocb_timer);
> +
> +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> +		del_timer(&rdp->nocb_timer);
> +	}
>   	rcu_nocb_unlock_irqrestore(rdp, flags);
>   	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
>   	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> @@ -2350,7 +2354,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
>   		return false;
>   	}
>   	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
>   	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
>   	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
>   
> 

Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>


Thanks
Neeraj

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/13] rcu/nocb: Use the rcuog CPU's ->nocb_timer
  2021-03-03  1:15   ` [PATCH 05/13] rcu/nocb: Use the rcuog CPU's ->nocb_timer Paul E. McKenney
@ 2021-03-10 22:05     ` Frederic Weisbecker
  2021-03-16  0:02       ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-10 22:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 02, 2021 at 05:15:57PM -0800, Paul E. McKenney wrote:
> The first question is of course: Did you try this with lockdep enabled?  ;-)

Yep I always do. But I may miss some configs on my testings. I usually
test at least TREE01 on x86 and arm64.

> > @@ -1702,43 +1692,50 @@ bool rcu_is_nocb_cpu(int cpu)
> >  	return false;
> >  }
> >  
> > -/*
> > - * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
> > - * and this function releases it.
> > - */
> > -static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > -			 unsigned long flags)
> > -	__releases(rdp->nocb_lock)
> > +static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
> > +			   struct rcu_data *rdp,
> > +			   bool force, unsigned long flags)
> > +	__releases(rdp_gp->nocb_gp_lock)
> >  {
> >  	bool needwake = false;
> > -	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> >  
> > -	lockdep_assert_held(&rdp->nocb_lock);
> >  	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
> > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > +		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> >  				    TPS("AlreadyAwake"));
> >  		return false;
> >  	}
> >  
> > -	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > -		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > -		del_timer(&rdp->nocb_timer);
> > +	if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> 
> So there are no longer any data races involving ->nocb_defer_wakeup?
> 
> (Yes, I could fire up KCSAN, but my KCSAN-capable system is otherwise
> occupied for several more hours.)

To be more specific, there is no more unlocked write to the timer (queue/cancel)
and its nocb_defer_wakeup matching state. And there is only one (on purpose) racy
reader of ->nocb_defer_wakeup which is the non-timer deferred wakeup.

So the writes to the timer keep their WRITE_ONCE() and only the reader in
do_nocb_deferred_wakeup() keeps its READ_ONCE(). Other readers are protected
by the ->nocb_gp_lock.

> > +
> >  		// Advance callbacks if helpful and low contention.
> >  		needwake_gp = false;
> >  		if (!rcu_segcblist_restempty(&rdp->cblist,
> > @@ -2178,11 +2182,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  	my_rdp->nocb_gp_bypass = bypass;
> >  	my_rdp->nocb_gp_gp = needwait_gp;
> >  	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
> > -	if (bypass && !rcu_nocb_poll) {
> > -		// At least one child with non-empty ->nocb_bypass, so set
> > -		// timer in order to avoid stranding its callbacks.
> > +	if (bypass) {
> >  		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> > -		mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> > +		// Avoid race with first bypass CB.
> > +		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> > +			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > +			del_timer(&my_rdp->nocb_timer);
> > +		}
> 
> Given that the timer does not get queued if rcu_nocb_poll, why not move the
> above "if" statement under the one following?

It's done later in the set.

> 
> > +		if (!rcu_nocb_poll) {
> > +			// At least one child with non-empty ->nocb_bypass, so set
> > +			// timer in order to avoid stranding its callbacks.
> > +			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> > +		}
> >  		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
> >  	}
> >  	if (rcu_nocb_poll) {
> > @@ -2385,7 +2399,10 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
> >   */
> >  static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
> >  {
> > -	if (rcu_nocb_need_deferred_wakeup(rdp))
> > +	if (!rdp->nocb_gp_rdp)
> > +		return false;
> 
> This check was not necessary previously because each CPU used its own rdp,
> correct?

Exactly!

> The theory is that this early return is taken only during boot,
> and that the spawning of the kthreads will act as an implicit wakeup?

You guessed right! That probably deserve a comment.

Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling
  2021-03-03  1:22   ` Paul E. McKenney
@ 2021-03-10 22:08     ` Frederic Weisbecker
  0 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-10 22:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 02, 2021 at 05:22:29PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 23, 2021 at 01:10:09AM +0100, Frederic Weisbecker wrote:
> > No need to disarm the nocb_timer if rcu_nocb is polling because it
> > shouldn't be armed either.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> 
> OK, so it does make sense to move that del_timer() under the following
> "if" statement, then.  ;-)

Right, probably I should have handled that in the beginning of the patchset
instead of the end but heh, my mind is never that clear.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/13] rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  2021-03-03  1:24   ` Paul E. McKenney
@ 2021-03-10 22:17     ` Frederic Weisbecker
  2021-03-15 14:53       ` Boqun Feng
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-10 22:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 02, 2021 at 05:24:56PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 23, 2021 at 01:10:08AM +0100, Frederic Weisbecker wrote:
> > A NOCB-gp wake up can safely delete the nocb_bypass_timer. nocb_gp_wait()
> > is going to check again the bypass state and rearm the bypass timer if
> > necessary.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> 
> Give that you delete this code a couple of patches later in this series,
> why not just leave it out entirely?  ;-)

It's not exactly deleted later, it's rather merged within the
"del_timer(&rdp_gp->nocb_timer)".

The purpose of that patch is to make it clear that we explicitly cancel
the nocb_bypass_timer here before we do it implicitly later with the
merge of nocb_bypass_timer into nocb_timer.

We could drop that patch, the resulting code in the end of the patchset
will be the same of course but the behaviour detail described here might
slip out of the reviewers attention :-)

> 
> 							Thanx, Paul
> 
> > ---
> >  kernel/rcu/tree_plugin.h | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index b62ad79bbda5..9da67b0d3997 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1711,6 +1711,8 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
> >  		del_timer(&rdp_gp->nocb_timer);
> >  	}
> >  
> > +	del_timer(&rdp_gp->nocb_bypass_timer);
> > +
> >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> >  		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
> >  		needwake = true;

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/13] rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  2021-03-10 22:17     ` Frederic Weisbecker
@ 2021-03-15 14:53       ` Boqun Feng
  2021-03-15 22:56         ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Boqun Feng @ 2021-03-15 14:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Mar 10, 2021 at 11:17:02PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 02, 2021 at 05:24:56PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 23, 2021 at 01:10:08AM +0100, Frederic Weisbecker wrote:
> > > A NOCB-gp wake up can safely delete the nocb_bypass_timer. nocb_gp_wait()
> > > is going to check again the bypass state and rearm the bypass timer if
> > > necessary.
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > 
> > Give that you delete this code a couple of patches later in this series,
> > why not just leave it out entirely?  ;-)
> 
> It's not exactly deleted later, it's rather merged within the
> "del_timer(&rdp_gp->nocb_timer)".
> 
> The purpose of that patch is to make it clear that we explicitly cancel
> the nocb_bypass_timer here before we do it implicitly later with the
> merge of nocb_bypass_timer into nocb_timer.
> 
> We could drop that patch, the resulting code in the end of the patchset
> will be the same of course but the behaviour detail described here might
> slip out of the reviewers attention :-)
> 

How about merging the timers first and adding those small improvements
later? i.e. move patch #12 #13 right after #7 (IIUC, #7 is the last
requirement you need for merging timers), and then patch #8~#11 just
follow, because IIUC, those patches are not about correctness but more
about avoiding necessary timer fire-ups, right?

Just my 2 cents. The overall patchset looks good to me ;-)

Feel free to add

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

> > 
> > 							Thanx, Paul
> > 
> > > ---
> > >  kernel/rcu/tree_plugin.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index b62ad79bbda5..9da67b0d3997 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1711,6 +1711,8 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
> > >  		del_timer(&rdp_gp->nocb_timer);
> > >  	}
> > >  
> > > +	del_timer(&rdp_gp->nocb_bypass_timer);
> > > +
> > >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> > >  		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
> > >  		needwake = true;
> 
> Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/13] rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  2021-03-15 14:53       ` Boqun Feng
@ 2021-03-15 22:56         ` Frederic Weisbecker
  2021-03-16  0:02           ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-15 22:56 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Mon, Mar 15, 2021 at 10:53:36PM +0800, Boqun Feng wrote:
> On Wed, Mar 10, 2021 at 11:17:02PM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 02, 2021 at 05:24:56PM -0800, Paul E. McKenney wrote:
> > > On Tue, Feb 23, 2021 at 01:10:08AM +0100, Frederic Weisbecker wrote:
> > > > A NOCB-gp wake up can safely delete the nocb_bypass_timer. nocb_gp_wait()
> > > > is going to check again the bypass state and rearm the bypass timer if
> > > > necessary.
> > > > 
> > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > 
> > > Give that you delete this code a couple of patches later in this series,
> > > why not just leave it out entirely?  ;-)
> > 
> > It's not exactly deleted later, it's rather merged within the
> > "del_timer(&rdp_gp->nocb_timer)".
> > 
> > The purpose of that patch is to make it clear that we explicitly cancel
> > the nocb_bypass_timer here before we do it implicitly later with the
> > merge of nocb_bypass_timer into nocb_timer.
> > 
> > We could drop that patch, the resulting code in the end of the patchset
> > will be the same of course but the behaviour detail described here might
> > slip out of the reviewers attention :-)
> > 
> 
> How about merging the timers first and adding those small improvements
> later? i.e. move patch #12 #13 right after #7 (IIUC, #7 is the last
> requirement you need for merging timers)

Hmm, nope, patches 9 and 10 are actually preparation work for timers merge.
In fact they could even be skipped and timers could be merged directly but I
wanted the unified behaviour to be fully explicit for reviewers through those
incremental changes before merging the timers together.

>, and then patch #8~#11 just follow

Patch 8 really need to stay where it is because it is an important limitation
on nocb de-offloading that can be removed right after patch 7 (which itself
removes the sole reason for rdp leader to remain nocb) and it doesn't depend
on the timers unification that comes after.

> 
> Just my 2 cents. The overall patchset looks good to me ;-)
> 
> Feel free to add
> 
> Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Thanks a lot for checking that!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/13] rcu/nocb: Use the rcuog CPU's ->nocb_timer
  2021-03-10 22:05     ` Frederic Weisbecker
@ 2021-03-16  0:02       ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-16  0:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Wed, Mar 10, 2021 at 11:05:07PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 02, 2021 at 05:15:57PM -0800, Paul E. McKenney wrote:
> > The first question is of course: Did you try this with lockdep enabled?  ;-)
> 
> Yep I always do. But I may miss some configs on my testings. I usually
> test at least TREE01 on x86 and arm64.
> 
> > > @@ -1702,43 +1692,50 @@ bool rcu_is_nocb_cpu(int cpu)
> > >  	return false;
> > >  }
> > >  
> > > -/*
> > > - * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
> > > - * and this function releases it.
> > > - */
> > > -static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > > -			 unsigned long flags)
> > > -	__releases(rdp->nocb_lock)
> > > +static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
> > > +			   struct rcu_data *rdp,
> > > +			   bool force, unsigned long flags)
> > > +	__releases(rdp_gp->nocb_gp_lock)
> > >  {
> > >  	bool needwake = false;
> > > -	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> > >  
> > > -	lockdep_assert_held(&rdp->nocb_lock);
> > >  	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
> > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > +		raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> > >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > >  				    TPS("AlreadyAwake"));
> > >  		return false;
> > >  	}
> > >  
> > > -	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > > -		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > > -		del_timer(&rdp->nocb_timer);
> > > +	if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> > 
> > So there are no longer any data races involving ->nocb_defer_wakeup?
> > 
> > (Yes, I could fire up KCSAN, but my KCSAN-capable system is otherwise
> > occupied for several more hours.)
> 
> To be more specific, there is no more unlocked write to the timer (queue/cancel)
> and its nocb_defer_wakeup matching state. And there is only one (on purpose) racy
> reader of ->nocb_defer_wakeup which is the non-timer deferred wakeup.
> 
> So the writes to the timer keep their WRITE_ONCE() and only the reader in
> do_nocb_deferred_wakeup() keeps its READ_ONCE(). Other readers are protected
> by the ->nocb_gp_lock.
> 
> > > +
> > >  		// Advance callbacks if helpful and low contention.
> > >  		needwake_gp = false;
> > >  		if (!rcu_segcblist_restempty(&rdp->cblist,
> > > @@ -2178,11 +2182,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >  	my_rdp->nocb_gp_bypass = bypass;
> > >  	my_rdp->nocb_gp_gp = needwait_gp;
> > >  	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
> > > -	if (bypass && !rcu_nocb_poll) {
> > > -		// At least one child with non-empty ->nocb_bypass, so set
> > > -		// timer in order to avoid stranding its callbacks.
> > > +	if (bypass) {
> > >  		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> > > -		mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> > > +		// Avoid race with first bypass CB.
> > > +		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> > > +			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > > +			del_timer(&my_rdp->nocb_timer);
> > > +		}
> > 
> > Given that the timer does not get queued if rcu_nocb_poll, why not move the
> > above "if" statement under the one following?
> 
> It's done later in the set.
> 
> > 
> > > +		if (!rcu_nocb_poll) {
> > > +			// At least one child with non-empty ->nocb_bypass, so set
> > > +			// timer in order to avoid stranding its callbacks.
> > > +			mod_timer(&my_rdp->nocb_bypass_timer, j + 2);
> > > +		}
> > >  		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
> > >  	}
> > >  	if (rcu_nocb_poll) {
> > > @@ -2385,7 +2399,10 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
> > >   */
> > >  static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
> > >  {
> > > -	if (rcu_nocb_need_deferred_wakeup(rdp))
> > > +	if (!rdp->nocb_gp_rdp)
> > > +		return false;
> > 
> > This check was not necessary previously because each CPU used its own rdp,
> > correct?
> 
> Exactly!
> 
> > The theory is that this early return is taken only during boot,
> > and that the spawning of the kthreads will act as an implicit wakeup?
> 
> You guessed right! That probably deserve a comment.

OK, I have queued these for for further review and testing.  Also to
look at the overall effect.  Thank you very much!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/13] rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  2021-03-15 22:56         ` Frederic Weisbecker
@ 2021-03-16  0:02           ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-16  0:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, LKML, Thomas Gleixner, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Mon, Mar 15, 2021 at 11:56:33PM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 15, 2021 at 10:53:36PM +0800, Boqun Feng wrote:
> > On Wed, Mar 10, 2021 at 11:17:02PM +0100, Frederic Weisbecker wrote:
> > > On Tue, Mar 02, 2021 at 05:24:56PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Feb 23, 2021 at 01:10:08AM +0100, Frederic Weisbecker wrote:
> > > > > A NOCB-gp wake up can safely delete the nocb_bypass_timer. nocb_gp_wait()
> > > > > is going to check again the bypass state and rearm the bypass timer if
> > > > > necessary.
> > > > > 
> > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > > 
> > > > Give that you delete this code a couple of patches later in this series,
> > > > why not just leave it out entirely?  ;-)
> > > 
> > > It's not exactly deleted later, it's rather merged within the
> > > "del_timer(&rdp_gp->nocb_timer)".
> > > 
> > > The purpose of that patch is to make it clear that we explicitly cancel
> > > the nocb_bypass_timer here before we do it implicitly later with the
> > > merge of nocb_bypass_timer into nocb_timer.
> > > 
> > > We could drop that patch, the resulting code in the end of the patchset
> > > will be the same of course but the behaviour detail described here might
> > > slip out of the reviewers attention :-)
> > > 
> > 
> > How about merging the timers first and adding those small improvements
> > later? i.e. move patch #12 #13 right after #7 (IIUC, #7 is the last
> > requirement you need for merging timers)
> 
> Hmm, nope, patches 9 and 10 are actually preparation work for timers merge.
> In fact they could even be skipped and timers could be merged directly but I
> wanted the unified behaviour to be fully explicit for reviewers through those
> incremental changes before merging the timers together.
> 
> >, and then patch #8~#11 just follow
> 
> Patch 8 really need to stay where it is because it is an important limitation
> on nocb de-offloading that can be removed right after patch 7 (which itself
> removes the sole reason for rdp leader to remain nocb) and it doesn't depend
> on the timers unification that comes after.
> 
> > 
> > Just my 2 cents. The overall patchset looks good to me ;-)
> > 
> > Feel free to add
> > 
> > Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
> 
> Thanks a lot for checking that!

Applied to 10/13, thank you both!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup
  2021-02-23  0:10 ` [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup Frederic Weisbecker
@ 2021-03-16  3:02   ` Paul E. McKenney
  2021-03-16 11:45     ` Frederic Weisbecker
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-16  3:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Feb 23, 2021 at 01:10:10AM +0100, Frederic Weisbecker wrote:
> Provide a way to tune the deferred wakeup level we want to perform from
> a safe wakeup point. Currently those sites are:
> 
> * nocb_timer
> * user/idle/guest entry
> * CPU down
> * softirq/rcuc
> 
> All of these sites perform the wake up for both RCU_NOCB_WAKE and
> RCU_NOCB_WAKE_FORCE.
> 
> In order to merge nocb_timer and nocb_bypass_timer together, we plan to
> add a new RCU_NOCB_WAKE_BYPASS that really want to be deferred until
> a timer fires so that we don't wake up the NOCB-gp kthread too early.
> 
> To prepare for that, integrate a wake up level/limit that a callsite is
> deemed to perform.

This appears to need the following in order to build for non-NOCB
configurations.  I will fold it in and am retesting.

							Thanx, Paul

------------------------------------------------------------------------

commit 55f59dd75a11455cf558fd387fbf9011017dcc8a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon Mar 15 20:00:34 2021 -0700

    squash! rcu/nocb: Prepare for fine-grained deferred wakeup
    
    [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0cc7f68..dfb048e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2926,7 +2926,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 {
 }
 
-static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
+static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level)
 {
 	return false;
 }

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup
  2021-03-16  3:02   ` Paul E. McKenney
@ 2021-03-16 11:45     ` Frederic Weisbecker
  2021-03-16 14:02       ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2021-03-16 11:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Mon, Mar 15, 2021 at 08:02:39PM -0700, Paul E. McKenney wrote:
> On Tue, Feb 23, 2021 at 01:10:10AM +0100, Frederic Weisbecker wrote:
> > Provide a way to tune the deferred wakeup level we want to perform from
> > a safe wakeup point. Currently those sites are:
> > 
> > * nocb_timer
> > * user/idle/guest entry
> > * CPU down
> > * softirq/rcuc
> > 
> > All of these sites perform the wake up for both RCU_NOCB_WAKE and
> > RCU_NOCB_WAKE_FORCE.
> > 
> > In order to merge nocb_timer and nocb_bypass_timer together, we plan to
> > add a new RCU_NOCB_WAKE_BYPASS that really want to be deferred until
> > a timer fires so that we don't wake up the NOCB-gp kthread too early.
> > 
> > To prepare for that, integrate a wake up level/limit that a callsite is
> > deemed to perform.
> 
> This appears to need the following in order to build for non-NOCB
> configurations.  I will fold it in and am retesting.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 55f59dd75a11455cf558fd387fbf9011017dcc8a
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Mon Mar 15 20:00:34 2021 -0700
> 
>     squash! rcu/nocb: Prepare for fine-grained deferred wakeup
>     
>     [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ]
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 0cc7f68..dfb048e 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2926,7 +2926,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
>  {
>  }
>  
> -static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> +static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level)
>  {
>  	return false;
>  }


Oops thanks, I missed that.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup
  2021-03-16 11:45     ` Frederic Weisbecker
@ 2021-03-16 14:02       ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-03-16 14:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Stable, Joel Fernandes

On Tue, Mar 16, 2021 at 12:45:26PM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 15, 2021 at 08:02:39PM -0700, Paul E. McKenney wrote:
> > On Tue, Feb 23, 2021 at 01:10:10AM +0100, Frederic Weisbecker wrote:
> > > Provide a way to tune the deferred wakeup level we want to perform from
> > > a safe wakeup point. Currently those sites are:
> > > 
> > > * nocb_timer
> > > * user/idle/guest entry
> > > * CPU down
> > > * softirq/rcuc
> > > 
> > > All of these sites perform the wake up for both RCU_NOCB_WAKE and
> > > RCU_NOCB_WAKE_FORCE.
> > > 
> > > In order to merge nocb_timer and nocb_bypass_timer together, we plan to
> > > add a new RCU_NOCB_WAKE_BYPASS that really want to be deferred until
> > > a timer fires so that we don't wake up the NOCB-gp kthread too early.
> > > 
> > > To prepare for that, integrate a wake up level/limit that a callsite is
> > > deemed to perform.
> > 
> > This appears to need the following in order to build for non-NOCB
> > configurations.  I will fold it in and am retesting.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 55f59dd75a11455cf558fd387fbf9011017dcc8a
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Mon Mar 15 20:00:34 2021 -0700
> > 
> >     squash! rcu/nocb: Prepare for fine-grained deferred wakeup
> >     
> >     [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ]
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 0cc7f68..dfb048e 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2926,7 +2926,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> >  {
> >  }
> >  
> > -static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> > +static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level)
> >  {
> >  	return false;
> >  }
> 
> Oops thanks, I missed that.

And with this, light rcutorture testing passed.  I hope to pound on it
harder later this week.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-03-16 14:02 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-23  0:09 [PATCH 00/13] rcu/nocb updates v2 Frederic Weisbecker
2021-02-23  0:09 ` [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm Frederic Weisbecker
2021-02-24 18:37   ` Paul E. McKenney
2021-02-24 22:06     ` Frederic Weisbecker
2021-02-25  0:14       ` Paul E. McKenney
2021-02-25  0:48         ` Frederic Weisbecker
2021-02-25  1:07           ` Paul E. McKenney
2021-03-02  1:48       ` Paul E. McKenney
2021-03-02 12:34         ` Frederic Weisbecker
2021-03-02 18:17           ` Paul E. McKenney
2021-03-03  1:35             ` Frederic Weisbecker
2021-03-03  2:06               ` Paul E. McKenney
2021-03-03  2:17                 ` Frederic Weisbecker
2021-03-03 11:15             ` Neeraj Upadhyay
2021-02-23  0:10 ` [PATCH 02/13] rcu/nocb: Disable bypass when CPU isn't completely offloaded Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 03/13] rcu/nocb: Remove stale comment above rcu_segcblist_offload() Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 04/13] rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possible Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 05/13] rcu/nocb: Merge nocb_timer to the rdp leader Frederic Weisbecker
2021-03-03  1:15   ` [PATCH 05/13] rcu/nocb: Use the rcuog CPU's ->nocb_timer Paul E. McKenney
2021-03-10 22:05     ` Frederic Weisbecker
2021-03-16  0:02       ` Paul E. McKenney
2021-02-23  0:10 ` [PATCH 06/13] timer: Revert "timer: Add timer_curr_running()" Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 07/13] rcu/nocb: Directly call __wake_nocb_gp() from bypass timer Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 08/13] rcu/nocb: Allow de-offloading rdp leader Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 09/13] rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 10/13] rcu/nocb: Delete bypass_timer " Frederic Weisbecker
2021-03-03  1:24   ` Paul E. McKenney
2021-03-10 22:17     ` Frederic Weisbecker
2021-03-15 14:53       ` Boqun Feng
2021-03-15 22:56         ` Frederic Weisbecker
2021-03-16  0:02           ` Paul E. McKenney
2021-02-23  0:10 ` [PATCH 11/13] rcu/nocb: Only cancel nocb timer if not polling Frederic Weisbecker
2021-03-03  1:22   ` Paul E. McKenney
2021-03-10 22:08     ` Frederic Weisbecker
2021-02-23  0:10 ` [PATCH 12/13] rcu/nocb: Prepare for finegrained deferred wakeup Frederic Weisbecker
2021-03-16  3:02   ` Paul E. McKenney
2021-03-16 11:45     ` Frederic Weisbecker
2021-03-16 14:02       ` Paul E. McKenney
2021-02-23  0:10 ` [PATCH 13/13] rcu/nocb: Unify timers Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).