linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state
@ 2020-05-13 16:47 Frederic Weisbecker
  2020-05-13 16:47 ` [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints Frederic Weisbecker
                   ` (10 more replies)
  0 siblings, 11 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

This is a necessary step toward making nohz_full controllable through
cpuset. The next step should be to allow a CPU to become nocb even if it
wasn't part of the nocb set at boot.

The core design of this set is mostly based on suggestions from Paul,
of course.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/nohz

HEAD: 31cb4ee9da4e9cc6314498ff22d83f0d872b1a88

Thanks,
	Frederic
---

Frederic Weisbecker (10):
      rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
      rcu: Use direct rdp->nocb_lock operations on local calls
      rcu: Make locking explicit in do_nocb_deferred_wakeup_common()
      rcu: Implement rcu_segcblist_is_offloaded() config dependent
      rcu: Remove useless conditional nocb unlock
      rcu: Make nocb_cb kthread parkable
      rcu: Temporarily assume that nohz full CPUs might not be NOCB
      rcu: Allow to deactivate nocb on a CPU
      rcu: Allow to re-offload a CPU that used to be nocb
      rcu: Nocb (de)activate through sysfs


 include/linux/rcu_segcblist.h |   2 +
 include/linux/rcupdate.h      |   4 ++
 kernel/cpu.c                  |  23 +++++++
 kernel/rcu/rcu_segcblist.c    |   6 +-
 kernel/rcu/rcu_segcblist.h    |   8 ++-
 kernel/rcu/tree.c             |  24 +++----
 kernel/rcu/tree.h             |   2 +-
 kernel/rcu/tree_plugin.h      | 149 ++++++++++++++++++++++++++++++++++--------
 8 files changed, 172 insertions(+), 46 deletions(-)


* [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-20 12:29   ` Joel Fernandes
  2020-05-13 16:47 ` [PATCH 02/10] rcu: Use direct rdp->nocb_lock operations on local calls Frederic Weisbecker
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
timers) can unconditionally lock rdp->nocb_lock as they always execute
in the context of an offloaded rdp.

This also prepares for toggling CPUs to/from callback offloaded mode,
where the offloaded state may change while rdp->nocb_lock isn't held.
We'll still want the entrypoints to lock the rdp in any case.
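
For contrast, a simplified sketch (not the verbatim tree_plugin.h
definition) of what the conditional wrapper does: it only takes the
lock when the rdp is offloaded, which is always true at these
entrypoints anyway.

	/*
	 * Simplified sketch of the conditional wrapper, for contrast:
	 * the lock is only taken when the rdp is offloaded.  Pure nocb
	 * entrypoints always run for offloaded rdps, so they can take
	 * rdp->nocb_lock unconditionally instead.
	 */
	static void sketch_rcu_nocb_lock(struct rcu_data *rdp)
	{
		if (rcu_segcblist_is_offloaded(&rdp->cblist))
			raw_spin_lock(&rdp->nocb_lock);
	}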

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/rcu/tree_plugin.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 097635c41135..523570469864 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
 	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
 
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
-	rcu_nocb_lock_irqsave(rdp, flags);
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
 	__call_rcu_nocb_wake(rdp, true, flags);
 }
@@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 */
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
-		rcu_nocb_lock_irqsave(rdp, flags);
+		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			(void)rcu_nocb_try_flush_bypass(rdp, j);
 			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
-			rcu_nocb_unlock_irqrestore(rdp, flags);
+			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 			continue; /* No callbacks here, try next. */
 		}
 		if (bypass_ncbs) {
@@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		} else {
 			needwake = false;
 		}
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		if (needwake) {
 			swake_up_one(&rdp->nocb_cb_wq);
 			gotcbs = true;
@@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	rcu_do_batch(rdp);
 	local_bh_enable();
 	lockdep_assert_irqs_enabled();
-	rcu_nocb_lock_irqsave(rdp, flags);
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
@@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
 	}
 	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		if (needwake_gp)
 			rcu_gp_kthread_wake();
 		return;
@@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
 	WRITE_ONCE(rdp->nocb_cb_sleep, true);
-	rcu_nocb_unlock_irqrestore(rdp, flags);
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	if (needwake_gp)
 		rcu_gp_kthread_wake();
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
-- 
2.25.0



* [PATCH 02/10] rcu: Use direct rdp->nocb_lock operations on local calls
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
  2020-05-13 16:47 ` [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 16:47 ` [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common() Frederic Weisbecker
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

Unconditionally lock rdp->nocb_lock in nocb code that is called after
we have verified that the rdp is offloaded.

This clarifies the locking rules and expectations.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/rcu/tree_plugin.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 523570469864..1d22b16c03e0 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1628,11 +1628,11 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
 	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("AlreadyAwake"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
 	}
 	del_timer(&rdp->nocb_timer);
-	rcu_nocb_unlock_irqrestore(rdp, flags);
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
 		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
@@ -1753,7 +1753,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 
 	// Don't use ->nocb_bypass during early boot.
 	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING) {
-		rcu_nocb_lock(rdp);
+		raw_spin_lock(&rdp->nocb_lock);
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 		return false;
@@ -1778,7 +1778,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
 	// ->nocb_bypass first.
 	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
-		rcu_nocb_lock(rdp);
+		raw_spin_lock(&rdp->nocb_lock);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 		if (*was_alldone)
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
@@ -1792,7 +1792,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// flush ->nocb_bypass to ->cblist.
 	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
 	    ncbs >= qhimark) {
-		rcu_nocb_lock(rdp);
+		raw_spin_lock(&rdp->nocb_lock);
 		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
 			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 			if (*was_alldone)
@@ -1807,7 +1807,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
 		}
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return true; // Callback already enqueued.
 	}
 
@@ -1827,7 +1827,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 		local_irq_restore(flags);
 	} else {
 		// No-CBs GP kthread might be indefinitely asleep, if so, wake.
-		rcu_nocb_lock(rdp); // Rare during call_rcu() flood.
+		raw_spin_lock(&rdp->nocb_lock); // Rare during call_rcu() flood.
 		if (!rcu_segcblist_pend_cbs(&rdp->cblist)) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstBQwake"));
@@ -1835,7 +1835,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 		} else {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstBQnoWake"));
-			rcu_nocb_unlock_irqrestore(rdp, flags);
+			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		}
 	}
 	return true; // Callback already enqueued.
@@ -1861,7 +1861,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	if (rcu_nocb_poll || !t) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("WakeNotPoll"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
 	}
 	// Need to actually to a wakeup.
@@ -1876,7 +1876,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		} else {
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
 					   TPS("WakeEmptyIsDeferred"));
-			rcu_nocb_unlock_irqrestore(rdp, flags);
+			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		}
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
 		/* ... or if many callbacks queued. */
@@ -1894,10 +1894,10 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		    !timer_pending(&rdp->nocb_bypass_timer))
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
 					   TPS("WakeOvfIsDeferred"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	} else {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	}
 	return;
 }
-- 
2.25.0



* [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common()
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
  2020-05-13 16:47 ` [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints Frederic Weisbecker
  2020-05-13 16:47 ` [PATCH 02/10] rcu: Use direct rdp->nocb_lock operations on local calls Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-26 19:54   ` Joel Fernandes
  2020-05-26 19:59   ` Joel Fernandes
  2020-05-13 16:47 ` [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

It can be called either inline (locally or with CPU hotplug locked) when
rdp->nocb_defer_wakeup is pending, or from the nocb timer. In both cases
the rdp is offloaded and we want to take the nocb lock.

Clarify the locking rules and expectations accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/rcu/tree_plugin.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1d22b16c03e0..1dd3fdd675a1 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2142,9 +2142,9 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 	unsigned long flags;
 	int ndw;
 
-	rcu_nocb_lock_irqsave(rdp, flags);
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 	if (!rcu_nocb_need_deferred_wakeup(rdp)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
 	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-- 
2.25.0



* [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common() Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 18:20   ` Paul E. McKenney
  2020-05-13 16:47 ` [PATCH 05/10] rcu: Remove useless conditional nocb unlock Frederic Weisbecker
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

This simplifies the use of this API and avoids checking the kernel
config from the callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 include/linux/rcu_segcblist.h |  2 ++
 kernel/rcu/rcu_segcblist.c    |  2 ++
 kernel/rcu/rcu_segcblist.h    |  6 ++++++
 kernel/rcu/tree.c             | 21 +++++++--------------
 4 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index b36afe7b22c9..0ced0a0ecbcf 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -73,7 +73,9 @@ struct rcu_segcblist {
 	long len;
 #endif
 	u8 enabled;
+#ifdef CONFIG_RCU_NOCB_CPU
 	u8 offloaded;
+#endif
 };
 
 #define RCU_SEGCBLIST_INITIALIZER(n) \
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9a0f66133b4b..d8ea2bef5574 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -166,6 +166,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
 	rsclp->enabled = 0;
 }
 
+#ifdef CONFIG_RCU_NOCB_CPU
 /*
  * Mark the specified rcu_segcblist structure as offloaded.  This
  * structure must be empty.
@@ -174,6 +175,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
 {
 	rsclp->offloaded = 1;
 }
+#endif
 
 /*
  * Does the specified rcu_segcblist structure contain callbacks that
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 5c293afc07b8..4c1503a82492 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -62,7 +62,11 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
 /* Is the specified rcu_segcblist offloaded?  */
 static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 {
+#ifdef CONFIG_RCU_NOCB_CPU
 	return rsclp->offloaded;
+#else
+	return false;
+#endif
 }
 
 /*
@@ -78,7 +82,9 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
 void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
+#ifdef CONFIG_RCU_NOCB_CPU
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
+#endif
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
 bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
 struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d9a49cd6065a..804cf7dfff03 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1401,8 +1401,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	bool ret = false;
 	bool need_qs;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 
 	raw_lockdep_assert_held_rcu_node(rnp);
 
@@ -1790,8 +1789,7 @@ static void rcu_gp_cleanup(void)
 		needgp = true;
 	}
 	/* Advance CBs to reduce false positives below. */
-	offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-		    rcu_segcblist_is_offloaded(&rdp->cblist);
+	offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
 		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
 		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
@@ -1985,8 +1983,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
 	unsigned long flags;
 	unsigned long mask;
 	bool needwake = false;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_node *rnp;
 
 	rnp = rdp->mynode;
@@ -2153,8 +2150,7 @@ int rcutree_dead_cpu(unsigned int cpu)
 static void rcu_do_batch(struct rcu_data *rdp)
 {
 	unsigned long flags;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count;
@@ -2397,8 +2393,7 @@ static __latent_entropy void rcu_core(void)
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 
 	if (cpu_is_offline(smp_processor_id()))
 		return;
@@ -2695,8 +2690,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 				   rcu_segcblist_n_cbs(&rdp->cblist));
 
 	/* Go handle any RCU core processing required. */
-	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
+	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
 		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
 	} else {
 		__call_rcu_core(rdp, head, flags);
@@ -3243,8 +3237,7 @@ static int rcu_pending(int user)
 
 	/* Has RCU gone idle with this CPU needing another grace period? */
 	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
-	    (!IS_ENABLED(CONFIG_RCU_NOCB_CPU) ||
-	     !rcu_segcblist_is_offloaded(&rdp->cblist)) &&
+	    !rcu_segcblist_is_offloaded(&rdp->cblist) &&
 	    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 		return 1;
 
-- 
2.25.0



* [PATCH 05/10] rcu: Remove useless conditional nocb unlock
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 16:47 ` [PATCH 06/10] rcu: Make nocb_cb kthread parkable Frederic Weisbecker
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

Not only is the unlock in the wrong order (rdp->nocb_lock should be
unlocked after rnp), but it's also dead code since we are in the
!offloaded path.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 804cf7dfff03..cc95419f6491 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3680,7 +3680,6 @@ void rcutree_migrate_callbacks(int cpu)
 		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
 		__call_rcu_nocb_wake(my_rdp, true, flags);
 	} else {
-		rcu_nocb_unlock(my_rdp); /* irqs remain disabled. */
 		raw_spin_unlock_irqrestore_rcu_node(my_rnp, flags);
 	}
 	if (needwake)
-- 
2.25.0



* [PATCH 06/10] rcu: Make nocb_cb kthread parkable
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 05/10] rcu: Remove useless conditional nocb unlock Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-06-11  1:34   ` Joel Fernandes
  2020-05-13 16:47 ` [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB Frederic Weisbecker
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

This will be necessary to correctly implement rdp de-offloading. We
don't want rcu_do_batch() in the nocb_cb kthread to race with a local
rcu_do_batch().

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/rcu/tree_plugin.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1dd3fdd675a1..43ecc047af26 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2104,7 +2104,9 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	if (needwake_gp)
 		rcu_gp_kthread_wake();
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
-				 !READ_ONCE(rdp->nocb_cb_sleep));
+				    !READ_ONCE(rdp->nocb_cb_sleep) ||
+				    kthread_should_park());
+
 	if (!smp_load_acquire(&rdp->nocb_cb_sleep)) { /* VVV */
 		/* ^^^ Ensure CB invocation follows _sleep test. */
 		return;
@@ -2125,6 +2127,8 @@ static int rcu_nocb_cb_kthread(void *arg)
 	// if there are no more ready callbacks, waits for them.
 	for (;;) {
 		nocb_cb_wait(rdp);
+		if (kthread_should_park())
+			kthread_parkme();
 		cond_resched_tasks_rcu_qs();
 	}
 	return 0;
-- 
2.25.0



* [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 06/10] rcu: Make nocb_cb kthread parkable Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 18:25   ` Paul E. McKenney
  2020-05-13 16:47 ` [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU Frederic Weisbecker
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

So far nohz_full CPUs have had to be nocb. This requirement is relaxed
temporarily while we prepare RCU to be able to toggle the nocb state of
a CPU. Once that is done and nohz_full can also be toggled dynamically,
we'll restore the initial requirement.

Thus, as a temporary state, make rcu_nohz_full_cpu() aware of nohz_full
CPUs that are not nocb so that they can handle their callbacks locally.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/rcu/tree.c        | 2 +-
 kernel/rcu/tree.h        | 2 +-
 kernel/rcu/tree_plugin.h | 7 ++++---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cc95419f6491..74b6798309ef 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3223,7 +3223,7 @@ static int rcu_pending(int user)
 		return 1;
 
 	/* Is this a nohz_full CPU in userspace or idle?  (Ignore RCU if so.) */
-	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
+	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu(rdp))
 		return 0;
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 9dc2ec021da5..4b9643d9f5e0 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -451,7 +451,7 @@ do {									\
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
 static void rcu_bind_gp_kthread(void);
-static bool rcu_nohz_full_cpu(void);
+static bool rcu_nohz_full_cpu(struct rcu_data *rdp);
 static void rcu_dynticks_task_enter(void);
 static void rcu_dynticks_task_exit(void);
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 43ecc047af26..f19e81e0c691 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2532,13 +2532,14 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
  * The idea is to avoid waking up RCU core processing on such a
  * CPU unless the grace period has extended for too long.
  *
- * This code relies on the fact that all NO_HZ_FULL CPUs are also
- * CONFIG_RCU_NOCB_CPU CPUs.
+ * This code relies on the fact that NO_HZ_FULL CPUs might not
+ * be CONFIG_RCU_NOCB_CPU CPUs (temporary development state).
  */
-static bool rcu_nohz_full_cpu(void)
+static bool rcu_nohz_full_cpu(struct rcu_data *rdp)
 {
 #ifdef CONFIG_NO_HZ_FULL
 	if (tick_nohz_full_cpu(smp_processor_id()) &&
+	    rcu_segcblist_is_offloaded(&rdp->cblist) &&
 	    (!rcu_gp_in_progress() ||
 	     ULONG_CMP_LT(jiffies, READ_ONCE(rcu_state.gp_start) + HZ)))
 		return true;
-- 
2.25.0



* [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 18:38   ` Paul E. McKenney
  2020-05-26 21:20   ` Joel Fernandes
  2020-05-13 16:47 ` [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb Frederic Weisbecker
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

Allow a CPU's rdp to quit the callback offloaded mode.
The switch happens on the target CPU with IRQs disabled and
rdp->nocb_lock held to avoid races between local callback handling and
kthread offloaded callback handling.

The nocb_cb kthread is first parked to avoid any future race with
concurrent rcu_do_batch() executions. Then the cblist offloaded state is
cleared so that the nocb_gp kthread ignores this rdp.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 include/linux/rcupdate.h   |  2 ++
 kernel/rcu/rcu_segcblist.c |  4 +--
 kernel/rcu/rcu_segcblist.h |  2 +-
 kernel/rcu/tree_plugin.h   | 50 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 2678a37c3169..1d3a4c37c3c1 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -96,8 +96,10 @@ static inline void rcu_user_exit(void) { }
 
 #ifdef CONFIG_RCU_NOCB_CPU
 void rcu_init_nohz(void);
+void rcu_nocb_cpu_deoffload(int cpu);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 static inline void rcu_init_nohz(void) { }
+static inline void rcu_nocb_cpu_deoffload(int cpu) { }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
 /**
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index d8ea2bef5574..4bed48da7702 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
  * Mark the specified rcu_segcblist structure as offloaded.  This
  * structure must be empty.
  */
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
+void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
 {
-	rsclp->offloaded = 1;
+	rsclp->offloaded = offload;
 }
 #endif
 
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 4c1503a82492..8f7c6c34cb1b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
 #ifdef CONFIG_RCU_NOCB_CPU
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
+void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
 #endif
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
 bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index f19e81e0c691..c74a4df8d5f2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
 		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+		if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
+			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+			continue;
+		}
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
 		do_nocb_deferred_wakeup_common(rdp);
 }
 
+static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
+{
+	unsigned long flags;
+	struct rcu_node *rnp = rdp->mynode;
+
+	printk("De-offloading %d\n", rdp->cpu);
+	kthread_park(rdp->nocb_cb_kthread);
+
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
+	raw_spin_lock_rcu_node(rnp);
+	rcu_segcblist_offload(&rdp->cblist, false);
+	raw_spin_unlock_rcu_node(rnp);
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+}
+
+static long rcu_nocb_rdp_deoffload(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	__rcu_nocb_rdp_deoffload(rdp);
+
+	return 0;
+}
+
+void rcu_nocb_cpu_deoffload(int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+	mutex_lock(&rcu_state.barrier_mutex);
+	cpus_read_lock();
+	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
+		if (cpu_online(cpu)) {
+			work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
+		} else {
+			__rcu_nocb_rdp_deoffload(rdp);
+		}
+		cpumask_clear_cpu(cpu, rcu_nocb_mask);
+	}
+	cpus_read_unlock();
+	mutex_unlock(&rcu_state.barrier_mutex);
+}
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
 		rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (rcu_segcblist_empty(&rdp->cblist))
 			rcu_segcblist_init(&rdp->cblist);
-		rcu_segcblist_offload(&rdp->cblist);
+		rcu_segcblist_offload(&rdp->cblist, true);
 	}
 	rcu_organize_nocb_kthreads();
 }
-- 
2.25.0



* [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 18:41   ` Paul E. McKenney
  2020-05-13 16:47 ` [PATCH 10/10] rcu: Nocb (de)activate through sysfs Frederic Weisbecker
  2020-05-13 18:15 ` [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Paul E. McKenney
  10 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

This is essentially the reverse operation of de-offloading. For now it's
only supported on CPUs that used to be offloaded and therefore still have
the relevant nocb_cb/nocb_gp kthreads around.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 include/linux/rcupdate.h |  2 ++
 kernel/rcu/tree_plugin.h | 44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1d3a4c37c3c1..ee95e49d675f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -96,9 +96,11 @@ static inline void rcu_user_exit(void) { }
 
 #ifdef CONFIG_RCU_NOCB_CPU
 void rcu_init_nohz(void);
+void rcu_nocb_cpu_offload(int cpu);
 void rcu_nocb_cpu_deoffload(int cpu);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 static inline void rcu_init_nohz(void) { }
+static inline void rcu_nocb_cpu_offload(int cpu) { }
 static inline void rcu_nocb_cpu_deoffload(int cpu) { }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c74a4df8d5f2..ae4b5e9f2fc5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2224,6 +2224,50 @@ void rcu_nocb_cpu_deoffload(int cpu)
 	mutex_unlock(&rcu_state.barrier_mutex);
 }
 
+static void __rcu_nocb_rdp_offload(struct rcu_data *rdp)
+{
+	unsigned long flags;
+	struct rcu_node *rnp = rdp->mynode;
+
+	printk("Offloading %d\n", rdp->cpu);
+
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	raw_spin_lock_rcu_node(rnp);
+	rcu_segcblist_offload(&rdp->cblist, true);
+	raw_spin_unlock_rcu_node(rnp);
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+
+	kthread_unpark(rdp->nocb_cb_kthread);
+}
+
+static long rcu_nocb_rdp_offload(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	__rcu_nocb_rdp_offload(rdp);
+
+	return 0;
+}
+
+void rcu_nocb_cpu_offload(int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+	mutex_lock(&rcu_state.barrier_mutex);
+	cpus_read_lock();
+	if (!rcu_segcblist_is_offloaded(&rdp->cblist) && rdp->nocb_cb_kthread) {
+		if (cpu_online(cpu)) {
+			work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
+		} else {
+			__rcu_nocb_rdp_offload(rdp);
+		}
+		cpumask_set_cpu(cpu, rcu_nocb_mask);
+	}
+	cpus_read_unlock();
+	mutex_unlock(&rcu_state.barrier_mutex);
+}
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
-- 
2.25.0



* [PATCH 10/10] rcu: Nocb (de)activate through sysfs
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb Frederic Weisbecker
@ 2020-05-13 16:47 ` Frederic Weisbecker
  2020-05-13 18:42   ` Paul E. McKenney
  2020-05-13 18:15 ` [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Paul E. McKenney
  10 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Josh Triplett

Not for merge.

Make nocb togglable for a given CPU using:
	/sys/devices/system/cpu/cpu*/hotplug/nocb

This is only intended for those who want to test this patchset. The real
interfaces will be cpuset/isolation and rcutorture.
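
For example, assuming CPU 3 was part of the boot-time nocb set:

	# handle CPU 3's callbacks locally (de-offload)
	echo 0 > /sys/devices/system/cpu/cpu3/hotplug/nocb
	# hand them back to the nocb kthreads (re-offload)
	echo 1 > /sys/devices/system/cpu/cpu3/hotplug/nocb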

Not-Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 kernel/cpu.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2371292f30b0..ac6283dcb897 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2208,10 +2208,33 @@ static ssize_t show_cpuhp_fail(struct device *dev,
 
 static DEVICE_ATTR(fail, 0644, show_cpuhp_fail, write_cpuhp_fail);
 
+static ssize_t write_nocb(struct device *dev,
+			  struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	int val, ret;
+
+	ret = kstrtoint(buf, 10, &val);
+	if (ret)
+		return ret;
+
+	if (val == 0)
+		rcu_nocb_cpu_deoffload(dev->id);
+	else if (val == 1)
+		rcu_nocb_cpu_offload(dev->id);
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static DEVICE_ATTR(nocb, 0644, NULL, write_nocb);
+
 static struct attribute *cpuhp_cpu_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_target.attr,
 	&dev_attr_fail.attr,
+	&dev_attr_nocb.attr,
 	NULL
 };
 
-- 
2.25.0



* Re: [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state
  2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2020-05-13 16:47 ` [PATCH 10/10] rcu: Nocb (de)activate through sysfs Frederic Weisbecker
@ 2020-05-13 18:15 ` Paul E. McKenney
  10 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-13 18:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 06:47:04PM +0200, Frederic Weisbecker wrote:
> This is a necessary step toward making nohz_full controllable through
> cpuset. Next step should be to allow a CPU to be nocb even if it wasn't
> part of the nocb set on boot.
> 
> The core design of this set is mostly based on suggestions from Paul
> of course.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	rcu/nohz
> 
> HEAD: 31cb4ee9da4e9cc6314498ff22d83f0d872b1a88

Very cool!!!  A few comments on individual commits on a quick first
scan, and more later.

							Thanx, Paul

> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (10):
>       rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
>       rcu: Use direct rdp->nocb_lock operations on local calls
>       rcu: Make locking explicit in do_nocb_deferred_wakeup_common()
>       rcu: Implement rcu_segcblist_is_offloaded() config dependent
>       rcu: Remove useless conditional nocb unlock
>       rcu: Make nocb_cb kthread parkable
>       rcu: Temporarily assume that nohz full CPUs might not be NOCB
>       rcu: Allow to deactivate nocb on a CPU
>       rcu: Allow to re-offload a CPU that used to be nocb
>       rcu: Nocb (de)activate through sysfs
> 
> 
>  include/linux/rcu_segcblist.h |   2 +
>  include/linux/rcupdate.h      |   4 ++
>  kernel/cpu.c                  |  23 +++++++
>  kernel/rcu/rcu_segcblist.c    |   6 +-
>  kernel/rcu/rcu_segcblist.h    |   8 ++-
>  kernel/rcu/tree.c             |  24 +++----
>  kernel/rcu/tree.h             |   2 +-
>  kernel/rcu/tree_plugin.h      | 149 ++++++++++++++++++++++++++++++++++--------
>  8 files changed, 172 insertions(+), 46 deletions(-)


* Re: [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent
  2020-05-13 16:47 ` [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
@ 2020-05-13 18:20   ` Paul E. McKenney
  2020-05-13 23:03     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-13 18:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 06:47:08PM +0200, Frederic Weisbecker wrote:
> This simplify the usage of this API and avoid checking the kernel
> config from the callers.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  include/linux/rcu_segcblist.h |  2 ++
>  kernel/rcu/rcu_segcblist.c    |  2 ++
>  kernel/rcu/rcu_segcblist.h    |  6 ++++++
>  kernel/rcu/tree.c             | 21 +++++++--------------
>  4 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index b36afe7b22c9..0ced0a0ecbcf 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -73,7 +73,9 @@ struct rcu_segcblist {
>  	long len;
>  #endif
>  	u8 enabled;
> +#ifdef CONFIG_RCU_NOCB_CPU
>  	u8 offloaded;
> +#endif

Given that this is only one byte and that removing it won't actually
save any memory on most architectures, why not just leave it and
adjust as shown below?

>  };
>  
>  #define RCU_SEGCBLIST_INITIALIZER(n) \
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f66133b4b..d8ea2bef5574 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -166,6 +166,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>  	rsclp->enabled = 0;
>  }
>  
> +#ifdef CONFIG_RCU_NOCB_CPU
>  /*
>   * Mark the specified rcu_segcblist structure as offloaded.  This
>   * structure must be empty.
> @@ -174,6 +175,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
>  {
>  	rsclp->offloaded = 1;
>  }
> +#endif

Leave this unconditional, as it is nowhere near a fastpath.

>  /*
>   * Does the specified rcu_segcblist structure contain callbacks that
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 5c293afc07b8..4c1503a82492 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -62,7 +62,11 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
>  /* Is the specified rcu_segcblist offloaded?  */
>  static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
>  {
> +#ifdef CONFIG_RCU_NOCB_CPU
>  	return rsclp->offloaded;
> +#else
> +	return false;
> +#endif
>  }

Then this can just be:

	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;

>  /*
> @@ -78,7 +82,9 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
>  void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> +#ifdef CONFIG_RCU_NOCB_CPU
>  void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> +#endif

This can then remain unconditional.

>  bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
>  bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
>  struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d9a49cd6065a..804cf7dfff03 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1401,8 +1401,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
>  {
>  	bool ret = false;
>  	bool need_qs;
> -	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> -			       rcu_segcblist_is_offloaded(&rdp->cblist);
> +	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);

The adjustment to rcu_segcblist_is_offloaded() allows this (welcome!)
simplification to remain.

>  	raw_lockdep_assert_held_rcu_node(rnp);
>  
> @@ -1790,8 +1789,7 @@ static void rcu_gp_cleanup(void)
>  		needgp = true;
>  	}
>  	/* Advance CBs to reduce false positives below. */
> -	offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> -		    rcu_segcblist_is_offloaded(&rdp->cblist);
> +	offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);

And here as well.

>  	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
>  		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
>  		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> @@ -1985,8 +1983,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
>  	unsigned long flags;
>  	unsigned long mask;
>  	bool needwake = false;
> -	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> -			       rcu_segcblist_is_offloaded(&rdp->cblist);
> +	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);

Ditto.

>  	struct rcu_node *rnp;
>  
>  	rnp = rdp->mynode;
> @@ -2153,8 +2150,7 @@ int rcutree_dead_cpu(unsigned int cpu)
>  static void rcu_do_batch(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
> -	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> -			       rcu_segcblist_is_offloaded(&rdp->cblist);
> +	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);

Ditto.

>  	struct rcu_head *rhp;
>  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
>  	long bl, count;
> @@ -2397,8 +2393,7 @@ static __latent_entropy void rcu_core(void)
>  	unsigned long flags;
>  	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
>  	struct rcu_node *rnp = rdp->mynode;
> -	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> -			       rcu_segcblist_is_offloaded(&rdp->cblist);
> +	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);

Ditto.

>  
>  	if (cpu_is_offline(smp_processor_id()))
>  		return;
> @@ -2695,8 +2690,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>  				   rcu_segcblist_n_cbs(&rdp->cblist));
>  
>  	/* Go handle any RCU core processing required. */
> -	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> -	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
> +	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {

Ditto.

>  		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
>  	} else {
>  		__call_rcu_core(rdp, head, flags);
> @@ -3243,8 +3237,7 @@ static int rcu_pending(int user)
>  
>  	/* Has RCU gone idle with this CPU needing another grace period? */
>  	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> -	    (!IS_ENABLED(CONFIG_RCU_NOCB_CPU) ||
> -	     !rcu_segcblist_is_offloaded(&rdp->cblist)) &&
> +	    !rcu_segcblist_is_offloaded(&rdp->cblist) &&

Ditto.

As in "Why didn't I do it that way to start with???"  ;-)

							Thanx, Paul

>  	    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
>  		return 1;
>  
> -- 
> 2.25.0
> 


* Re: [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB
  2020-05-13 16:47 ` [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB Frederic Weisbecker
@ 2020-05-13 18:25   ` Paul E. McKenney
  2020-05-13 23:08     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-13 18:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 06:47:11PM +0200, Frederic Weisbecker wrote:
> So far nohz_full CPUs had to be nocb. This requirement may change
> temporarily as we are working on preparing RCU to be able to toggle the
> nocb state of a CPU. Once that is done and nohz_full can be toggled as
> well dynamically, we'll restore that initial requirement.

Would it simplify anything to make the CPU exit nohz_full first and
then exit rcu_nocb and vice versa in the other direction?  That way the
assumption about nohz_full CPUs always being rcu_nocb could remain while
still allowing runtime changes to both states.

Of course, given that setup, it would not be possible to cause a CPU to
exit rcu_nocb state if it was still in nohz_full state.

My fear is that allowing a CPU to be in nohz_full state without also
being in rcu_nocb state will cause needless confusion and bug reports.
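
For illustration, a rough sketch of that ordering (the nohz_full
runtime toggle helpers below are hypothetical, no such interface exists
yet):

	/*
	 * Hypothetical wrappers preserving "nohz_full implies nocb":
	 * tick_nohz_full_clear_cpu()/tick_nohz_full_set_cpu() stand in
	 * for a future runtime nohz_full toggle.
	 */
	static int isolation_clear_cpu(int cpu)
	{
		int ret;

		/* Leave nohz_full first... */
		ret = tick_nohz_full_clear_cpu(cpu);
		if (ret)
			return ret;
		/* ...so de-offloading never happens on a nohz_full CPU. */
		rcu_nocb_cpu_deoffload(cpu);
		return 0;
	}

	static int isolation_set_cpu(int cpu)
	{
		/* Become nocb first... */
		rcu_nocb_cpu_offload(cpu);
		/* ...then enter nohz_full, keeping the invariant. */
		return tick_nohz_full_set_cpu(cpu);
	}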

							Thanx, Paul

> Thus for now as a temporary state, make rcu_nohz_full_cpu() aware of
> nohz_full CPUs that are not nocb so that they can handle the callbacks
> locally.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c        | 2 +-
>  kernel/rcu/tree.h        | 2 +-
>  kernel/rcu/tree_plugin.h | 7 ++++---
>  3 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cc95419f6491..74b6798309ef 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3223,7 +3223,7 @@ static int rcu_pending(int user)
>  		return 1;
>  
>  	/* Is this a nohz_full CPU in userspace or idle?  (Ignore RCU if so.) */
> -	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
> +	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu(rdp))
>  		return 0;
>  
>  	/* Is the RCU core waiting for a quiescent state from this CPU? */
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 9dc2ec021da5..4b9643d9f5e0 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -451,7 +451,7 @@ do {									\
>  #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>  
>  static void rcu_bind_gp_kthread(void);
> -static bool rcu_nohz_full_cpu(void);
> +static bool rcu_nohz_full_cpu(struct rcu_data *rdp);
>  static void rcu_dynticks_task_enter(void);
>  static void rcu_dynticks_task_exit(void);
>  
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 43ecc047af26..f19e81e0c691 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2532,13 +2532,14 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
>   * The idea is to avoid waking up RCU core processing on such a
>   * CPU unless the grace period has extended for too long.
>   *
> - * This code relies on the fact that all NO_HZ_FULL CPUs are also
> - * CONFIG_RCU_NOCB_CPU CPUs.
> + * This code relies on the fact that NO_HZ_FULL CPUs might not
> + * be CONFIG_RCU_NOCB_CPU CPUs (temporary development state).
>   */
> -static bool rcu_nohz_full_cpu(void)
> +static bool rcu_nohz_full_cpu(struct rcu_data *rdp)
>  {
>  #ifdef CONFIG_NO_HZ_FULL
>  	if (tick_nohz_full_cpu(smp_processor_id()) &&
> +	    rcu_segcblist_is_offloaded(&rdp->cblist) &&
>  	    (!rcu_gp_in_progress() ||
>  	     ULONG_CMP_LT(jiffies, READ_ONCE(rcu_state.gp_start) + HZ)))
>  		return true;
> -- 
> 2.25.0
> 


* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-13 16:47 ` [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU Frederic Weisbecker
@ 2020-05-13 18:38   ` Paul E. McKenney
  2020-05-13 22:45     ` Frederic Weisbecker
  2020-05-26 21:20   ` Joel Fernandes
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-13 18:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> Allow a CPU's rdp to quit the callback offlined mode.
> The switch happens on the target with IRQs disabled and rdp->nocb_lock
> held to avoid races between local callbacks handling and kthread
> offloaded callbacks handling.
> 
> nocb_cb kthread is first parked to avoid any future race with
> concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> so that the nocb_gp kthread ignores this rdp.
> 
> Inspired-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  include/linux/rcupdate.h   |  2 ++
>  kernel/rcu/rcu_segcblist.c |  4 +--
>  kernel/rcu/rcu_segcblist.h |  2 +-
>  kernel/rcu/tree_plugin.h   | 50 +++++++++++++++++++++++++++++++++++++-
>  4 files changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 2678a37c3169..1d3a4c37c3c1 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -96,8 +96,10 @@ static inline void rcu_user_exit(void) { }
>  
>  #ifdef CONFIG_RCU_NOCB_CPU
>  void rcu_init_nohz(void);
> +void rcu_nocb_cpu_deoffload(int cpu);
>  #else /* #ifdef CONFIG_RCU_NOCB_CPU */
>  static inline void rcu_init_nohz(void) { }
> +static inline void rcu_nocb_cpu_deoffload(int cpu) { }
>  #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>  
>  /**
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index d8ea2bef5574..4bed48da7702 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>   * Mark the specified rcu_segcblist structure as offloaded.  This
>   * structure must be empty.
>   */
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
>  {
> -	rsclp->offloaded = 1;
> +	rsclp->offloaded = offload;
>  }
>  #endif
>  
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 4c1503a82492..8f7c6c34cb1b 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
>  #ifdef CONFIG_RCU_NOCB_CPU
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
>  #endif
>  bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
>  bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index f19e81e0c691..c74a4df8d5f2 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
>  		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> +		if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +			continue;
> +		}
>  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>  		if (bypass_ncbs &&
>  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
>  		do_nocb_deferred_wakeup_common(rdp);
>  }
>  
> +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> +{
> +	unsigned long flags;
> +	struct rcu_node *rnp = rdp->mynode;
> +
> +	printk("De-offloading %d\n", rdp->cpu);
> +	kthread_park(rdp->nocb_cb_kthread);

I am a bit concerned about this, because from this point until the
end of this function, no RCU callbacks can be invoked for this CPU.
This could be a problem if there is a callback flood in progress, and
such callback floods are in fact one reason that the sysadm might want
to be switching from offloaded to non-offloaded.  Is it possible to
move this kthread_park() to the end of this function?  Yes, this can
result in concurrent invocation of different callbacks for this CPU,
but because we have excluded rcu_barrier(), that should be OK.
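
In other words, roughly this ordering (just a sketch of the idea, not a
tested patch; the races it opens up are what the follow-ups below discuss):

	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
	raw_spin_lock_rcu_node(rnp);
	rcu_segcblist_offload(&rdp->cblist, false);
	raw_spin_unlock_rcu_node(rnp);
	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
	/* Parking moved last so callbacks keep being invoked meanwhile. */
	kthread_park(rdp->nocb_cb_kthread);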

Or is there some failure mode that I am failing to see?  (Wouldn't
be the first time...)

> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> +	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
> +	raw_spin_lock_rcu_node(rnp);
> +	rcu_segcblist_offload(&rdp->cblist, false);
> +	raw_spin_unlock_rcu_node(rnp);
> +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +}
> +
> +static long rcu_nocb_rdp_deoffload(void *arg)
> +{
> +	struct rcu_data *rdp = arg;
> +
> +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> +	__rcu_nocb_rdp_deoffload(rdp);
> +
> +	return 0;
> +}

For example, is the problem caused by invocations of this
rcu_nocb_rdp_deoffload() function?  But if so, do we really need to
acquire rcu_state.barrier_mutex?

> That aside, is it possible to do the switch without interrupting
callback invocation?  Or is your idea that because we are always executing
on the CPU being deoffloaded, that CPU has been prevented from posting
callbacks in any case?  If the latter, does that mean that it is not
possible to deoffload offlined CPUs?  (Not sure whether this restriction
is a real problem, but figured that I should ask.)

							Thanx, Paul

> +void rcu_nocb_cpu_deoffload(int cpu)
> +{
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> +	mutex_lock(&rcu_state.barrier_mutex);
> +	cpus_read_lock();
> +	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> +		if (cpu_online(cpu)) {
> +			work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> +		} else {
> +			__rcu_nocb_rdp_deoffload(rdp);
> +		}
> +		cpumask_clear_cpu(cpu, rcu_nocb_mask);
> +	}
> +	cpus_read_unlock();
> +	mutex_unlock(&rcu_state.barrier_mutex);
> +}
> +
>  void __init rcu_init_nohz(void)
>  {
>  	int cpu;
> @@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
>  		rdp = per_cpu_ptr(&rcu_data, cpu);
>  		if (rcu_segcblist_empty(&rdp->cblist))
>  			rcu_segcblist_init(&rdp->cblist);
> -		rcu_segcblist_offload(&rdp->cblist);
> +		rcu_segcblist_offload(&rdp->cblist, true);
>  	}
>  	rcu_organize_nocb_kthreads();
>  }
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb
  2020-05-13 16:47 ` [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb Frederic Weisbecker
@ 2020-05-13 18:41   ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-13 18:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 06:47:13PM +0200, Frederic Weisbecker wrote:
> This is essentially the reverse operation of de-offloading. For now it's
> only supported on CPUs that used to be offloaded and therefore still have
> the relevant nocb_cb/nocb_gp kthreads around.

And I still believe that this is a good restriction to put in place,
especially to start with.  ;-)

> Inspired-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  include/linux/rcupdate.h |  2 ++
>  kernel/rcu/tree_plugin.h | 44 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 1d3a4c37c3c1..ee95e49d675f 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -96,9 +96,11 @@ static inline void rcu_user_exit(void) { }
>  
>  #ifdef CONFIG_RCU_NOCB_CPU
>  void rcu_init_nohz(void);
> +void rcu_nocb_cpu_offload(int cpu);
>  void rcu_nocb_cpu_deoffload(int cpu);
>  #else /* #ifdef CONFIG_RCU_NOCB_CPU */
>  static inline void rcu_init_nohz(void) { }
> +static inline void rcu_nocb_cpu_offload(int cpu) { }
>  static inline void rcu_nocb_cpu_deoffload(int cpu) { }
>  #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>  
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c74a4df8d5f2..ae4b5e9f2fc5 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2224,6 +2224,50 @@ void rcu_nocb_cpu_deoffload(int cpu)
>  	mutex_unlock(&rcu_state.barrier_mutex);
>  }
>  
> +static void __rcu_nocb_rdp_offload(struct rcu_data *rdp)
> +{
> +	unsigned long flags;
> +	struct rcu_node *rnp = rdp->mynode;
> +
> +	printk("Offloading %d\n", rdp->cpu);
> +
> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> +	raw_spin_lock_rcu_node(rnp);
> +	rcu_segcblist_offload(&rdp->cblist, true);
> +	raw_spin_unlock_rcu_node(rnp);
> +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +
> +	kthread_unpark(rdp->nocb_cb_kthread);

And of course I have a similar set of questions here regarding
interruption of callback invocation.  ;-)

							Thanx, Paul

> +}
> +
> +static long rcu_nocb_rdp_offload(void *arg)
> +{
> +	struct rcu_data *rdp = arg;
> +
> +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> +	__rcu_nocb_rdp_offload(rdp);
> +
> +	return 0;
> +}
> +
> +void rcu_nocb_cpu_offload(int cpu)
> +{
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> +	mutex_lock(&rcu_state.barrier_mutex);
> +	cpus_read_lock();
> +	if (!rcu_segcblist_is_offloaded(&rdp->cblist) && rdp->nocb_cb_kthread) {
> +		if (cpu_online(cpu)) {
> +			work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
> +		} else {
> +			__rcu_nocb_rdp_offload(rdp);
> +		}
> +		cpumask_set_cpu(cpu, rcu_nocb_mask);
> +	}
> +	cpus_read_unlock();
> +	mutex_unlock(&rcu_state.barrier_mutex);
> +}
> +
>  void __init rcu_init_nohz(void)
>  {
>  	int cpu;
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/10] rcu: Nocb (de)activate through sysfs
  2020-05-13 16:47 ` [PATCH 10/10] rcu: Nocb (de)activate through sysfs Frederic Weisbecker
@ 2020-05-13 18:42   ` Paul E. McKenney
  2020-05-13 23:23     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-13 18:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 06:47:14PM +0200, Frederic Weisbecker wrote:
> Not for merge.
> 
> Make nocb toggleable for a given CPU using:
> 	/sys/devices/system/cpu/cpu*/hotplug/nocb
> 
> This is only intended for those who want to test this patchset. The real
> interfaces will be cpuset/isolation and rcutorture.

Makes sense!

Speaking of rcutorture, what level of testing has this series undergone?

							Thanx, Paul

> Not-Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  kernel/cpu.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 2371292f30b0..ac6283dcb897 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2208,10 +2208,33 @@ static ssize_t show_cpuhp_fail(struct device *dev,
>  
>  static DEVICE_ATTR(fail, 0644, show_cpuhp_fail, write_cpuhp_fail);
>  
> +static ssize_t write_nocb(struct device *dev,
> +			  struct device_attribute *attr,
> +			  const char *buf, size_t count)
> +{
> +	int val, ret;
> +
> +	ret = kstrtoint(buf, 10, &val);
> +	if (ret)
> +		return ret;
> +
> +	if (val == 0)
> +		rcu_nocb_cpu_deoffload(dev->id);
> +	else if (val == 1)
> +		rcu_nocb_cpu_offload(dev->id);
> +	else
> +		return -EINVAL;
> +
> +	return count;
> +}
> +
> +static DEVICE_ATTR(nocb, 0644, NULL, write_nocb);
> +
>  static struct attribute *cpuhp_cpu_attrs[] = {
>  	&dev_attr_state.attr,
>  	&dev_attr_target.attr,
>  	&dev_attr_fail.attr,
> +	&dev_attr_nocb.attr,
>  	NULL
>  };
>  
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-13 18:38   ` Paul E. McKenney
@ 2020-05-13 22:45     ` Frederic Weisbecker
  2020-05-14 15:47       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 22:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> > +	kthread_park(rdp->nocb_cb_kthread);
> 
> I am a bit concerned about this, because from this point until the
> end of this function, no RCU callbacks can be invoked for this CPU.
> This could be a problem if there is a callback flood in progress, and
> such callback floods are in fact one reason that the sysadm might want
> to be switching from offloaded to non-offloaded.  Is it possible to
> move this kthread_park() to the end of this function?  Yes, this can
> result in concurrent invocation of different callbacks for this CPU,
> but because we have excluded rcu_barrier(), that should be OK.
> 
> Or is there some failure mode that I am failing to see?  (Wouldn't
> be the first time...)

Heh I actually worried about that. Ok the issue is that it leaves
a window where nocb_cb and local caller of rcu_do_batch() can
compete but the local caller doesn't lock the nocb_lock.

So there are two ways to solve that:

1)  - set cblist->offloaded = 0 locally
    - From the kthread while calling rcu_do_batch():
      check the value of cblist->offloaded every time after
      we call rcu_nocb_lock() and stop messing with the
      cblist and return when we see cblist->offloaded == 0
    - Allow to handle cblist locally without taking the nocb_lock
    - Park kthread

But there I'm worried about races. Imagine we have:


      Kthread                     Local
      --------                   -------
      rcu_do_batch() {
          rcu_nocb_lock()
          do stuff with cblist
          rcu_nocb_unlock()
                                 rcu_nocb_lock()
                                 set cblist->offloaded = 0
                                 rcu_nocb_unlock()
                                 ===> INT or preemption
                                 rcu_do_batch() {
                                     do stuff with cblist

Are we guaranteed that the Local CPU will see the updates from
the kthread while calling rcu_do_batch()? I would tend to say
yes but I'm not entirely sure...

Oh wait, that solution also implies that we can't re-enqueue
extracted callbacks if we spent too much time in threaded
rcu_do_batch(), as the cblist may have been de-offloaded while
we executed the extracted callbacks.

That's a lot of corner cases to handle, which is why I prefer
the other solution:

2) enum cblist_offloaded {
        CBLIST_NOT_OFFLOADED,
        CBLIST_(DE)OFFLOADING,
        CBLIST_OFFLOADED
   }

 - Locally set cblist->offloaded =  CBLIST_DEOFFLOADING
 - From the kthread while calling rcu_do_batch(), do as
   usual.
 - Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
   rcu_nocb_lock() will take the lock.
 - Park kthread
 - Locally set cblist->offloaded =  CBLIST_NOT_OFFLOADED
 - Local calls to rcu_do_batch() won't take the lock anymore.
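
A minimal C sketch of how that could look, reusing the enum above (with
CBLIST_(DE)OFFLOADING expanded to CBLIST_DEOFFLOADING for this direction;
the helper names are made up for illustration, this is not a tested
implementation):

	/*
	 * Local callers keep taking ->nocb_lock as long as the cblist
	 * hasn't fully returned to CBLIST_NOT_OFFLOADED, so they can't
	 * race with the nocb_cb/nocb_gp kthreads during the transition.
	 */
	static void rcu_nocb_lock_sketch(struct rcu_data *rdp)
	{
		if (READ_ONCE(rdp->cblist.offloaded) != CBLIST_NOT_OFFLOADED)
			raw_spin_lock(&rdp->nocb_lock);
	}

	/* The de-offloading sequence itself, following the steps above: */
	static void rcu_nocb_deoffload_sketch(struct rcu_data *rdp)
	{
		WRITE_ONCE(rdp->cblist.offloaded, CBLIST_DEOFFLOADING);
		/* rcu_do_batch() on this CPU still takes ->nocb_lock here. */
		kthread_park(rdp->nocb_cb_kthread);
		WRITE_ONCE(rdp->cblist.offloaded, CBLIST_NOT_OFFLOADED);
		/* From now on, local rcu_do_batch() calls skip the lock. */
	}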


> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > +	__rcu_nocb_rdp_deoffload(rdp);
> > +
> > +	return 0;
> > +}
> 
> For example, is the problem caused by invocations of this
> rcu_nocb_rdp_deoffload() function?

How so?

> But if so, do we really need to acquire rcu_state.barrier_mutex?

Indeed it was probably not needed if we parked the kthread before
anything, as we would have kept the callbacks ordering.

But now if we allow concurrent callbacks execution during the small
window, we'll need it.

> 
> > That aside, is it possible to do the switch without interrupting
> callback invocation?  Or is your idea that because we are always executing
> on the CPU being deoffloaded, that CPU has been prevented from posting
> callbacks in any case?

No in the tiny window between kthread_park() and the irqs being disabled,
the workqueue can be preempted and thus call_rcu() can be called anytime.

> If the latter, does that mean that it is not
> possible to deoffload offlined CPUs?  (Not sure whether this restriction
> is a real problem, but figured that I should ask.)

Ah in the case of offlined CPUs I simply call the function directly from the CPU
that disables the nocb remotely. So we remotely park the kthread (that we
always do anyway) and set offlined.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent
  2020-05-13 18:20   ` Paul E. McKenney
@ 2020-05-13 23:03     ` Frederic Weisbecker
  2020-05-14 15:47       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 23:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 11:20:29AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:08PM +0200, Frederic Weisbecker wrote:
> > This simplifies the usage of this API and avoids checking the kernel
> > config from the callers.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > ---
> >  include/linux/rcu_segcblist.h |  2 ++
> >  kernel/rcu/rcu_segcblist.c    |  2 ++
> >  kernel/rcu/rcu_segcblist.h    |  6 ++++++
> >  kernel/rcu/tree.c             | 21 +++++++--------------
> >  4 files changed, 17 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> > index b36afe7b22c9..0ced0a0ecbcf 100644
> > --- a/include/linux/rcu_segcblist.h
> > +++ b/include/linux/rcu_segcblist.h
> > @@ -73,7 +73,9 @@ struct rcu_segcblist {
> >  	long len;
> >  #endif
> >  	u8 enabled;
> > +#ifdef CONFIG_RCU_NOCB_CPU
> >  	u8 offloaded;
> > +#endif
> 
> Given that this is only one byte and that removing it won't actually
> save any memory on most architectures, why not just leave it and
> adjust as shown below?

Right, the point was to make it private to that config and trigger
a build error otherwise. But if we have an off case that's fine.

> 
> >  };
> >  
> >  #define RCU_SEGCBLIST_INITIALIZER(n) \
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 9a0f66133b4b..d8ea2bef5574 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -166,6 +166,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
> >  	rsclp->enabled = 0;
> >  }
> >  
> > +#ifdef CONFIG_RCU_NOCB_CPU
> >  /*
> >   * Mark the specified rcu_segcblist structure as offloaded.  This
> >   * structure must be empty.
> > @@ -174,6 +175,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> >  {
> >  	rsclp->offloaded = 1;
> >  }
> > +#endif
> 
> Leave this unconditional, as it is nowhere near a fastpath.

The point was to not raise false hopes to those who want to
offload when it's not supported.

Let's perhaps have at least a WARN_ON_ONCE(1) if it is called
when !CONFIG_RCU_NOCB_CPU ?

> 
> >  /*
> >   * Does the specified rcu_segcblist structure contain callbacks that
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 5c293afc07b8..4c1503a82492 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -62,7 +62,11 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
> >  /* Is the specified rcu_segcblist offloaded?  */
> >  static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
> >  {
> > +#ifdef CONFIG_RCU_NOCB_CPU
> >  	return rsclp->offloaded;
> > +#else
> > +	return false;
> > +#endif
> >  }
> 
> Then this can just be:
> 
> 	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;

Ok.

> > @@ -1401,8 +1401,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  {
> >  	bool ret = false;
> >  	bool need_qs;
> > -	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> > -			       rcu_segcblist_is_offloaded(&rdp->cblist);
> > +	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
> 
> The adjustment to rcu_segcblist_is_offloaded() allows this (welcome!)
> simplification to remain.

Ok thanks!

> > @@ -3243,8 +3237,7 @@ static int rcu_pending(int user)
> >  
> >  	/* Has RCU gone idle with this CPU needing another grace period? */
> >  	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> > -	    (!IS_ENABLED(CONFIG_RCU_NOCB_CPU) ||
> > -	     !rcu_segcblist_is_offloaded(&rdp->cblist)) &&
> > +	    !rcu_segcblist_is_offloaded(&rdp->cblist) &&
> 
> Ditto.
> 
> As in "Why didn't I do it that way to start with???"  ;-)

You say that to someone who's too lazy to script short commands typed
100 times a day ;-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB
  2020-05-13 18:25   ` Paul E. McKenney
@ 2020-05-13 23:08     ` Frederic Weisbecker
  2020-05-14 15:50       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 23:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 11:25:27AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:11PM +0200, Frederic Weisbecker wrote:
> > So far nohz_full CPUs had to be nocb. This requirement may change
> > temporarily as we are working on preparing RCU to be able to toggle the
> > nocb state of a CPU. Once that is done and nohz_full can be toggled as
> > well dynamically, we'll restore that initial requirement.
> 
> Would it simplify anything to make the CPU exit nohz_full first and
> then exit rcu_nocb and vice versa in the other direction?  That way the
> assumption about nohz_full CPUs always being rcu_nocb could remain while
> still allowing runtime changes to both states.

That's the future plan but for now nohz_full can't even be exited yet.
RCU is unlucky enough to be chosen as the starting point of this whole work :-)

> Of course, given that setup, it would not be possible to cause a CPU to
> exit rcu_nocb state if it was still in nohz_full state.

Right.

> My fear is that allowing a CPU to be in nohz_full state without also
> being in rcu_nocb state will cause needless confusion and bug reports.

Well, it should only be visible to those who work on it since there
won't be a proper interface before we achieve the whole.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/10] rcu: Nocb (de)activate through sysfs
  2020-05-13 18:42   ` Paul E. McKenney
@ 2020-05-13 23:23     ` Frederic Weisbecker
  2020-05-14 15:51       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-13 23:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Wed, May 13, 2020 at 11:42:29AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:14PM +0200, Frederic Weisbecker wrote:
> > Not for merge.
> > 
> > Make nocb toggleable for a given CPU using:
> > 	/sys/devices/system/cpu/cpu*/hotplug/nocb
> > 
> > This is only intended for those who want to test this patchset. The real
> > interfaces will be cpuset/isolation and rcutorture.
> 
> Makes sense!
> 
> Speaking of rcutorture, what level of testing has this series undergone?

None yet, that first set was to shape up the design. But next
iteration should see rcutorture coverage.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-13 22:45     ` Frederic Weisbecker
@ 2020-05-14 15:47       ` Paul E. McKenney
  2020-05-14 22:30         ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-14 15:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> > On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > > +{
> > > +	unsigned long flags;
> > > +	struct rcu_node *rnp = rdp->mynode;
> > > +
> > > +	printk("De-offloading %d\n", rdp->cpu);
> > > +	kthread_park(rdp->nocb_cb_kthread);
> > 
> > I am a bit concerned about this, because from this point until the
> > end of this function, no RCU callbacks can be invoked for this CPU.
> > This could be a problem if there is a callback flood in progress, and
> > such callback floods are in fact one reason that the sysadm might want
> > to be switching from offloaded to non-offloaded.  Is it possible to
> > move this kthread_park() to the end of this function?  Yes, this can
> > result in concurrent invocation of different callbacks for this CPU,
> > but because we have excluded rcu_barrier(), that should be OK.
> > 
> > Or is there some failure mode that I am failing to see?  (Wouldn't
> > be the first time...)
> 
> Heh I actually worried about that. Ok the issue is that it leaves
> a window where nocb_cb and local caller of rcu_do_batch() can
> compete but the local caller doesn't lock the nocb_lock.

Indeed, my nightmare scenario involves some sort of preemption or
similar long delay at that point.  Callbacks pile up, and then OOM!

> So there are two ways to solve that:
> 
> 1)  - set cblist->offloaded = 0 locally
>     - From the kthread while calling rcu_do_batch():
>       check the value of cblist->offloaded every time after
>       we call rcu_nocb_lock() and stop messing with the
>       cblist and return when we see cblist->offloaded == 0
>     - Allow to handle cblist locally without taking the nocb_lock
>     - Park kthread
> 
> But there I'm worried about races. Imagine we have:
> 
> 
>       Kthread                     Local
>       --------                   -------
>       rcu_do_batch() {
>           rcu_nocb_lock()
>           do stuff with cblist
>           rcu_nocb_unlock()
>                                  rcu_nocb_lock()
>                                  set cblist->offloaded = 0
>                                  rcu_nocb_unlock()
>                                  ===> INT or preemption
>                                  rcu_do_batch() {
>                                      do stuff with cblist
> 
> Are we guaranteed that the Local CPU will see the updates from
> the kthread while calling rcu_do_batch()? I would tend to say
> yes but I'm not entirely sure...
> 
> Oh wait, that solution also implies that we can't re-enqueue
> extracted callbacks if we spent too much time in threaded
> rcu_do_batch(), as the cblist may have been de-offloaded while
> we executed the extracted callbacks.
> 
> That's a lot of corner cases to handle, which is why I prefer
> the other solution:
> 
> 2) enum cblist_offloaded {
>         CBLIST_NOT_OFFLOADED,
>         CBLIST_(DE)OFFLOADING,
>         CBLIST_OFFLOADED
>    }
> 
>  - Locally set cblist->offloaded =  CBLIST_DEOFFLOADING
>  - From the kthread while calling rcu_do_batch(), do as
>    usual.
>  - Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
>    rcu_nocb_lock() will take the lock.
>  - Park kthread
>  - Locally set cblist->offloaded =  CBLIST_NOT_OFFLOADED
>  - Local calls to rcu_do_batch() won't take the lock anymore.

This last seems best to me.  The transition from CBLIST_NOT_OFFLOADED
to CBLIST_OFFLOADING of course needs to be on the CPU in question with
at least bh disabled.  Probably best to be holding rcu_nocb_lock(),
but that might just be me being overly paranoid.

> > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > +{
> > > +	struct rcu_data *rdp = arg;
> > > +
> > > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > +	__rcu_nocb_rdp_deoffload(rdp);
> > > +
> > > +	return 0;
> > > +}
> > 
> > For example, is the problem caused by invocations of this
> > rcu_nocb_rdp_deoffload() function?
> 
> How so?

It looked to me like it wasn't excluding either rcu_barrier() or CPU
hotplug.  It might also not have been pinning onto the CPU in question,
but that might just be me misremembering.  Then again, I didn't see a
call to it, so maybe its callers set things up appropriately.

OK, I will bite...  What is the purpose of rcu_nocb_rdp_deoffload()?  ;-)

> > But if so, do we really need to acquire rcu_state.barrier_mutex?
> 
> Indeed it was probably not needed if we parked the kthread before
> anything, as we would have kept the callbacks ordering.
> 
> But now if we allow concurrent callbacks execution during the small
> window, we'll need it.

Agreed!  And I do believe that concurrent callback execution will
prove better than a possibly indefinite gap in callback execution.

> > > That aside, is it possible to do the switch without interrupting
> > callback invocation?  Or is your idea that because we are always executing
> > on the CPU being deoffloaded, that CPU has been prevented from posting
> > callbacks in any case?
> 
> No in the tiny window between kthread_park() and the irqs being disabled,
> the workqueue can be preempted and thus call_rcu() can be called anytime.

Agreed!  ;-)

> > If the latter, does that mean that it is not
> > possible to deoffload offlined CPUs?  (Not sure whether this restriction
> > is a real problem, but figured that I should ask.)
> 
> Ah in the case of offlined CPUs I simply call the function directly from the CPU
> that disables the nocb remotely. So we remotely park the kthread (that we
> always do anyway) and set offlined.

And the cpus_read_lock() in rcu_nocb_cpu_deoffload() prevents that CPU
from coming back online, so looks good!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent
  2020-05-13 23:03     ` Frederic Weisbecker
@ 2020-05-14 15:47       ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-14 15:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 01:03:31AM +0200, Frederic Weisbecker wrote:
> On Wed, May 13, 2020 at 11:20:29AM -0700, Paul E. McKenney wrote:
> > On Wed, May 13, 2020 at 06:47:08PM +0200, Frederic Weisbecker wrote:
> > > This simplifies the usage of this API and avoids checking the kernel
> > > config from the callers.
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > ---
> > >  include/linux/rcu_segcblist.h |  2 ++
> > >  kernel/rcu/rcu_segcblist.c    |  2 ++
> > >  kernel/rcu/rcu_segcblist.h    |  6 ++++++
> > >  kernel/rcu/tree.c             | 21 +++++++--------------
> > >  4 files changed, 17 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> > > index b36afe7b22c9..0ced0a0ecbcf 100644
> > > --- a/include/linux/rcu_segcblist.h
> > > +++ b/include/linux/rcu_segcblist.h
> > > @@ -73,7 +73,9 @@ struct rcu_segcblist {
> > >  	long len;
> > >  #endif
> > >  	u8 enabled;
> > > +#ifdef CONFIG_RCU_NOCB_CPU
> > >  	u8 offloaded;
> > > +#endif
> > 
> > Given that this is only one byte and that removing it won't actually
> > save any memory on most architectures, why not just leave it and
> > adjust as shown below?
> 
> Right, the point was to make it private to that config and trigger
> a build error otherwise. But if we have an off case that's fine.
> 
> > 
> > >  };
> > >  
> > >  #define RCU_SEGCBLIST_INITIALIZER(n) \
> > > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > > index 9a0f66133b4b..d8ea2bef5574 100644
> > > --- a/kernel/rcu/rcu_segcblist.c
> > > +++ b/kernel/rcu/rcu_segcblist.c
> > > @@ -166,6 +166,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
> > >  	rsclp->enabled = 0;
> > >  }
> > >  
> > > +#ifdef CONFIG_RCU_NOCB_CPU
> > >  /*
> > >   * Mark the specified rcu_segcblist structure as offloaded.  This
> > >   * structure must be empty.
> > > @@ -174,6 +175,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> > >  {
> > >  	rsclp->offloaded = 1;
> > >  }
> > > +#endif
> > 
> > Leave this unconditional, as it is nowhere near a fastpath.
> 
> The point was to not raise false hopes to those who want to
> offload when it's not supported.
> 
> Let's perhaps have at least a WARN_ON_ONCE(1) if it is called
> when !CONFIG_RCU_NOCB_CPU ?

Sounds like a good choice to me!

> > >  /*
> > >   * Does the specified rcu_segcblist structure contain callbacks that
> > > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > > index 5c293afc07b8..4c1503a82492 100644
> > > --- a/kernel/rcu/rcu_segcblist.h
> > > +++ b/kernel/rcu/rcu_segcblist.h
> > > @@ -62,7 +62,11 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
> > >  /* Is the specified rcu_segcblist offloaded?  */
> > >  static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
> > >  {
> > > +#ifdef CONFIG_RCU_NOCB_CPU
> > >  	return rsclp->offloaded;
> > > +#else
> > > +	return false;
> > > +#endif
> > >  }
> > 
> > Then this can just be:
> > 
> > 	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;
> 
> Ok.
> 
> > > @@ -1401,8 +1401,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > >  {
> > >  	bool ret = false;
> > >  	bool need_qs;
> > > -	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> > > -			       rcu_segcblist_is_offloaded(&rdp->cblist);
> > > +	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
> > 
> > The adjustment to rcu_segcblist_is_offloaded() allows this (welcome!)
> > simplification to remain.
> 
> Ok thanks!
> 
> > > @@ -3243,8 +3237,7 @@ static int rcu_pending(int user)
> > >  
> > >  	/* Has RCU gone idle with this CPU needing another grace period? */
> > >  	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> > > -	    (!IS_ENABLED(CONFIG_RCU_NOCB_CPU) ||
> > > -	     !rcu_segcblist_is_offloaded(&rdp->cblist)) &&
> > > +	    !rcu_segcblist_is_offloaded(&rdp->cblist) &&
> > 
> > Ditto.
> > 
> > As in "Why didn't I do it that way to start with???"  ;-)
> 
> You say that to someone who's too lazy to script short commands typed
> 100 times a day ;-)

;-) ;-) ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB
  2020-05-13 23:08     ` Frederic Weisbecker
@ 2020-05-14 15:50       ` Paul E. McKenney
  2020-05-14 22:49         ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-14 15:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 01:08:28AM +0200, Frederic Weisbecker wrote:
> On Wed, May 13, 2020 at 11:25:27AM -0700, Paul E. McKenney wrote:
> > On Wed, May 13, 2020 at 06:47:11PM +0200, Frederic Weisbecker wrote:
> > > So far nohz_full CPUs had to be nocb. This requirement may change
> > > temporarily as we are working on preparing RCU to be able to toggle the
> > > nocb state of a CPU. Once that is done and nohz_full can be toggled as
> > > well dynamically, we'll restore that initial requirement.
> > 
> > Would it simplify anything to make the CPU exit nohz_full first and
> > then exit rcu_nocb and vice versa in the other direction?  That way the
> > assumption about nohz_full CPUs always being rcu_nocb could remain while
> > still allowing runtime changes to both states.
> 
> That's the future plan but for now nohz_full can't even be exited yet.
> RCU is unlucky enough to be chosen as the starting point of this whole work :-)

But testing could still start with CPUs marked rcu_nocb but not marked
nohz_full, right?  I must confess that I am a bit concerned about the
increase in state space.

> > Of course, given that setup, it would not be possible to cause a CPU to
> > exit rcu_nocb state if it was still in nohz_full state.
> 
> Right.
> 
> > My fear is that allowing a CPU to be in nohz_full state without also
> > being in rcu_nocb state will cause needless confusion and bug reports.
> 
> Well, it should only be visible to those who work on it since there
> won't be a proper interface before we achieve the whole.

Fair point, but I am also concerned about the welfare of the people
working on it.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/10] rcu: Nocb (de)activate through sysfs
  2020-05-13 23:23     ` Frederic Weisbecker
@ 2020-05-14 15:51       ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-14 15:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 01:23:13AM +0200, Frederic Weisbecker wrote:
> On Wed, May 13, 2020 at 11:42:29AM -0700, Paul E. McKenney wrote:
> > On Wed, May 13, 2020 at 06:47:14PM +0200, Frederic Weisbecker wrote:
> > > Not for merge.
> > > 
> > > Make nocb toggleable for a given CPU using:
> > > 	/sys/devices/system/cpu/cpu*/hotplug/nocb
> > > 
> > > This is only intended for those who want to test this patchset. The real
> > > interfaces will be cpuset/isolation and rcutorture.
> > 
> > Makes sense!
> > 
> > Speaking of rcutorture, what level of testing has this series undergone?
> 
> None yet, that first set was to shape up the design. But next
> iteration should see rcutorture coverage.

Fair enough!  I am sure that rcutorture will be ready when you are.
Hey, it always has been ready for me!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-14 15:47       ` Paul E. McKenney
@ 2020-05-14 22:30         ` Frederic Weisbecker
  2020-05-14 22:47           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-14 22:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> This last seems best to me.  The transition from CBLIST_NOT_OFFLOADED
> to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> at least bh disabled.  Probably best to be holding rcu_nocb_lock(),
> but that might just be me being overly paranoid.

So that's in the case of offloading, right? Well, I don't think we'd
need to even disable bh nor lock nocb. We just need the current CPU
to see the local update of cblist->offloaded = CBLIST_OFFLOADING
before the kthread is unparked:

    cblist->offloaded = CBLIST_OFFLOADING;
    /* Make sure subsequent softirq lock nocb */
    barrier();
    kthread_unpark(rdp->nocb_cb_kthread);

Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
upon unparking, it's not guaranteed that the nocb_gp will see it on its
next round. Ok so eventually you're right, I should indeed lock nocb...
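
Something like this, then (rough sketch only, not tested):

    raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
    cblist->offloaded = CBLIST_OFFLOADING;
    raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
    /* nocb_gp takes ->nocb_lock on each pass, so it will see the new state. */
    kthread_unpark(rdp->nocb_cb_kthread);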

> 
> > > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > > +{
> > > > +	struct rcu_data *rdp = arg;
> > > > +
> > > > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > > +	__rcu_nocb_rdp_deoffload(rdp);
> > > > +
> > > > +	return 0;
> > > > +}
> > > 
> > > For example, is the problem caused by invocations of this
> > > rcu_nocb_rdp_deoffload() function?
> > 
> > How so?
> 
> It looked to me like it wasn't excluding either rcu_barrier() or CPU
> hotplug.  It might also not have been pinning onto the CPU in question,
> but that might just be me misremembering.  Then again, I didn't see a
> call to it, so maybe its callers set things up appropriately.
> 
> OK, I will bite...  What is the purpose of rcu_nocb_rdp_deoffload()?  ;-)

Ah it's called using work_on_cpu(), which runs a work item on the
target CPU and waits for completion. And that whole thing is protected
inside the barrier mutex and hotplug.

> Agreed!  And I do believe that concurrent callback execution will
> prove better than a possibly indefinite gap in callback execution.

Mutual agreement! :-)

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-14 22:30         ` Frederic Weisbecker
@ 2020-05-14 22:47           ` Paul E. McKenney
  2020-05-14 22:55             ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-14 22:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Fri, May 15, 2020 at 12:30:23AM +0200, Frederic Weisbecker wrote:
> On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> > On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> > This last seems best to me.  The transition from CBLIST_NOT_OFFLOADED
> > to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> > at least bh disabled.  Probably best to be holding rcu_nocb_lock(),
> > but that might just be me being overly paranoid.
> 
> So that's in the case of offloading, right? Well, I don't think we'd
> need to even disable bh nor lock nocb. We just need the current CPU
> to see the local update of cblist->offloaded = CBLIST_OFFLOADING
> before the kthread is unparked:
> 
>     cblist->offloaded = CBLIST_OFFLOADING;
>     /* Make sure subsequent softirq lock nocb */
>     barrier();
>     kthread_unpark(rdp->nocb_cb_kthread);
> 
> Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
> upon unparking, it's not guaranteed that the nocb_gp will see it on its
> next round. Ok so eventually you're right, I should indeed lock nocb...

I suspect that our future selves would hate us much less if we held
that lock.  ;-)

> > > > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > > > +{
> > > > > +	struct rcu_data *rdp = arg;
> > > > > +
> > > > > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > > > +	__rcu_nocb_rdp_deoffload(rdp);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > 
> > > > For example, is the problem caused by invocations of this
> > > > rcu_nocb_rdp_deoffload() function?
> > > 
> > > How so?
> > 
> > It looked to me like it wasn't excluding either rcu_barrier() or CPU
> > hotplug.  It might also not have been pinning onto the CPU in question,
> > but that might just be me misremembering.  Then again, I didn't see a
> > call to it, so maybe its callers set things up appropriately.
> > 
> > OK, I will bite...  What is the purpose of rcu_nocb_rdp_deoffload()?  ;-)
> 
> Ah it's called using work_on_cpu(), which runs a work item on the
> target CPU and waits for completion. And that whole thing is protected
> inside the barrier mutex and hotplug.

Ah!  Yet again, color me blind.

							Thanx, Paul

> > Agreed!  And I do believe that concurrent callback execution will
> > prove better than a possibly indefinite gap in callback execution.
> 
> Mutual agreement! :-)
> 
> Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB
  2020-05-14 15:50       ` Paul E. McKenney
@ 2020-05-14 22:49         ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-14 22:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 08:50:32AM -0700, Paul E. McKenney wrote:
> On Thu, May 14, 2020 at 01:08:28AM +0200, Frederic Weisbecker wrote:
> > On Wed, May 13, 2020 at 11:25:27AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 13, 2020 at 06:47:11PM +0200, Frederic Weisbecker wrote:
> > > > So far nohz_full CPUs had to be nocb. This requirement may change
> > > > temporarily as we are working on preparing RCU to be able to toggle the
> > > > nocb state of a CPU. Once that is done and nohz_full can be toggled as
> > > > well dynamically, we'll restore that initial requirement.
> > > 
> > > Would it simplify anything to make the CPU exit nohz_full first and
> > > then exit rcu_nocb and vice versa in the other direction?  That way the
> > > assumption about nohz_full CPUs always being rcu_nocb could remain while
> > > still allowing runtime changes to both states.
> > 
> > That's the future plan but for now nohz_full can't even be exited yet.
> > RCU is unlucky enough to be chosen as the starting point of this whole work :-)
> 
> But testing could still start with CPUs marked rcu_nocb but not marked
> nohz_full, right?

Ah! That makes sense indeed. I should indeed restrict de-offloading to CPUs
that are not nohz_full.

> I must confess that I am a bit concerned about the increase in state space.

Yeah good point!

> Fair point, but I am also concerned about the welfare of the people
> working on it.  ;-)

Fair enough! :-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-14 22:47           ` Paul E. McKenney
@ 2020-05-14 22:55             ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-05-14 22:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Josh Triplett

On Thu, May 14, 2020 at 03:47:35PM -0700, Paul E. McKenney wrote:
> On Fri, May 15, 2020 at 12:30:23AM +0200, Frederic Weisbecker wrote:
> > On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> > > On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> > > This last seems best to me.  The transition from CBLIST_NOT_OFFLOADED
> > > to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> > > at least bh disabled.  Probably best to be holding rcu_nocb_lock(),
> > > but that might just be me being overly paranoid.
> > 
> > So that's in the case of offloading, right? Well, I don't think we'd
> > need to even disable bh nor lock nocb. We just need the current CPU
> > to see the local update of cblist->offloaded = CBLIST_OFFLOADING
> > before the kthread is unparked:
> > 
> >     cblist->offloaded = CBLIST_OFFLOADING;
> >     /* Make sure subsequent softirq lock nocb */
> >     barrier();
> >     kthread_unpark(rdp->nocb_cb_thread);
> > 
> > Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
> > upon unparking, it's not guaranteed that the nocb_gp will see it on its
> > next round. Ok so eventually you're right, I should indeed lock nocb...
> 
> I suspect that our future selves would hate us much less if we held
> that lock.  ;-)

Also, taking the decision to hold that lock could teach a lesson to our
past selves. Win-win! Let us become that most welcome time bridge!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-13 16:47 ` [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints Frederic Weisbecker
@ 2020-05-20 12:29   ` Joel Fernandes
  2020-05-22 17:57     ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-05-20 12:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> timers) can unconditionally lock rdp->nocb_lock as they always execute
> in the context of an offloaded rdp.
> 
> This also prepare for toggling CPUs to/from callback's offloaded mode
> where the offloaded state will possibly change when rdp->nocb_lock
> isn't taken. We'll still want the entrypoints to lock the rdp in any
> case.

Suggested rewrite for change log:
<wordsmithing>
Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
timers) unconditionally lock rdp->nocb_lock as they always execute in the
context of an offloaded rdp.

This prepares for future toggling of CPUs to/from callback's offloaded mode
where the offloaded state can change when rdp->nocb_lock is not held. We'll
still want the entrypoints to lock the rdp in any case.
</wordsmithing>

Also, can we inline rcu_nocb_lock_irqsave() into
do_nocb_deferred_wakeup_common() since that's the only user, and then delete
rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
confusion about which API to use for nocb locking (i.e. whether to directly
acquire lock or call rcu_nocb_lock_irqsave()).

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_plugin.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 097635c41135..523570469864 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
>  	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
>  
>  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> -	rcu_nocb_lock_irqsave(rdp, flags);
> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>  	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
>  	__call_rcu_nocb_wake(rdp, true, flags);
>  }
> @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  	 */
>  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> -		rcu_nocb_lock_irqsave(rdp, flags);
> +		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>  		if (bypass_ncbs &&
>  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  			(void)rcu_nocb_try_flush_bypass(rdp, j);
>  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> -			rcu_nocb_unlock_irqrestore(rdp, flags);
> +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  			continue; /* No callbacks here, try next. */
>  		}
>  		if (bypass_ncbs) {
> @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  		} else {
>  			needwake = false;
>  		}
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		if (needwake) {
>  			swake_up_one(&rdp->nocb_cb_wq);
>  			gotcbs = true;
> @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  	rcu_do_batch(rdp);
>  	local_bh_enable();
>  	lockdep_assert_irqs_enabled();
> -	rcu_nocb_lock_irqsave(rdp, flags);
> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>  	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
>  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
>  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
>  	}
>  	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		if (needwake_gp)
>  			rcu_gp_kthread_wake();
>  		return;
> @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  
>  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
>  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> -	rcu_nocb_unlock_irqrestore(rdp, flags);
> +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  	if (needwake_gp)
>  		rcu_gp_kthread_wake();
>  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-20 12:29   ` Joel Fernandes
@ 2020-05-22 17:57     ` Paul E. McKenney
  2020-05-26 15:21       ` Joel Fernandes
  2020-06-04 11:41       ` Frederic Weisbecker
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-22 17:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > in the context of an offloaded rdp.
> > 
> > This also prepare for toggling CPUs to/from callback's offloaded mode
> > where the offloaded state will possibly change when rdp->nocb_lock
> > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > case.
> 
> Suggested rewrite for change log:
> <wordsmithing>
> Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> timers) unconditionally lock rdp->nocb_lock as they always execute in the
> context of an offloaded rdp.
> 
> This prepares for future toggling of CPUs to/from callback's offloaded mode
> where the offloaded state can change when rdp->nocb_lock is not held. We'll
> still want the entrypoints to lock the rdp in any case.
> </wordsmithing>
> 
> Also, can we inline rcu_nocb_lock_irqsave() into
> do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> confusion about which API to use for nocb locking (i.e. whether to directly
> acquire lock or call rcu_nocb_lock_irqsave()).
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thank you for looking this over, Joel!

Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
right thing", even when things are changing?  If it is feasible, that
would prevent any number of "interesting" copy-pasta and "just now became
common code" bugs down the road.  And because irqs are disabled while
holding the lock, it should be possible to keep state on a per-CPU basis.

The ugliest scenario is callback adoption, where there are two ->cblist
structures in need of being locked.  In that case, changes are excluded
(because that is in CPU hotplug code), but is it possible to take
advantage of that reasonably?

Maybe these changes are the best we can do, but it would be good to
if the same primitive locked a ->cblist regardless of context.

Can that be made to work reasonably?
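
For instance (purely illustrative, and assuming the transitional
->offloaded states discussed earlier in this thread), the decision could
be keyed off the segcblist state in one place, along the lines of:

	#define rcu_nocb_lock_irqsave_sketch(rdp, flags)		\
	do {								\
		/* Should also cover the transitional states. */	\
		if (rcu_segcblist_is_offloaded(&(rdp)->cblist))		\
			raw_spin_lock_irqsave(&(rdp)->nocb_lock, flags); \
		else							\
			local_irq_save(flags);				\
	} while (0)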

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 097635c41135..523570469864 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
> >  	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> >  
> >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > -	rcu_nocb_lock_irqsave(rdp, flags);
> > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> >  	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> >  	__call_rcu_nocb_wake(rdp, true, flags);
> >  }
> > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  	 */
> >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > -		rcu_nocb_lock_irqsave(rdp, flags);
> > +		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> >  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> >  		if (bypass_ncbs &&
> >  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  			(void)rcu_nocb_try_flush_bypass(rdp, j);
> >  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> >  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > -			rcu_nocb_unlock_irqrestore(rdp, flags);
> > +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  			continue; /* No callbacks here, try next. */
> >  		}
> >  		if (bypass_ncbs) {
> > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  		} else {
> >  			needwake = false;
> >  		}
> > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  		if (needwake) {
> >  			swake_up_one(&rdp->nocb_cb_wq);
> >  			gotcbs = true;
> > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> >  	rcu_do_batch(rdp);
> >  	local_bh_enable();
> >  	lockdep_assert_irqs_enabled();
> > -	rcu_nocb_lock_irqsave(rdp, flags);
> > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> >  	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> >  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> >  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> > @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> >  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> >  	}
> >  	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  		if (needwake_gp)
> >  			rcu_gp_kthread_wake();
> >  		return;
> > @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> >  
> >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> >  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> > -	rcu_nocb_unlock_irqrestore(rdp, flags);
> > +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  	if (needwake_gp)
> >  		rcu_gp_kthread_wake();
> >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > -- 
> > 2.25.0
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-22 17:57     ` Paul E. McKenney
@ 2020-05-26 15:21       ` Joel Fernandes
  2020-05-26 16:29         ` Paul E. McKenney
  2020-06-04 11:41       ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 15:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

Hi Paul,

On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > in the context of an offloaded rdp.
> > > 
> > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > where the offloaded state will possibly change when rdp->nocb_lock
> > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > case.
> > 
> > Suggested rewrite for change log:
> > <wordsmithing>
> > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > context of an offloaded rdp.
> > 
> > This prepares for future toggling of CPUs to/from callback's offloaded mode
> > where the offloaded state can change when rdp->nocb_lock is not held. We'll
> > still want the entrypoints to lock the rdp in any case.
> > </wordsmithing>
> > 
> > Also, can we inline rcu_nocb_lock_irqsave() into
> > do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> > confusion about which API to use for nocb locking (i.e. whether to directly
> > acquire lock or call rcu_nocb_lock_irqsave()).
> > 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Thank you for looking this over, Joel!
> 
> Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> right thing", even when things are changing?

One way to prevent things from changing could be to employ Steven's
poor-man's RCU to mediate the "changing state" itself, assuming the switch
from the fast path to the slow path does not happen often. :) So whichever
path is changing the state needs to wait for that poor-man's GP.

In any case, are you concerned about the performance cost of the
unconditional locking, and is that why you suggest keeping it conditional?

Also, coming back to my point about inlining the helper into its last
caller, do_nocb_deferred_wakeup_common(): I think we don't need to check
whether the rdp is offloaded or do any conditional locking there. The timer
can only be called with an offloaded rdp, so we can directly take the
spinlock unconditionally instead of using the rcu_nocb_lock helper.

> would prevent any number of "interesting" copy-pasta and "just now became
> common code" bugs down the road.  And because irqs are disabled while
> holding the lock, it should be possible to keep state on a per-CPU basis.

Agreed, that would be nice. Though if we could keep it simple, that'd be
nice too.

> The ugliest scenario is callback adoption, where there are two ->cblist
> structures in need of being locked.  In that case, changes are excluded
> (because that is in CPU hotplug code), but is it possible to take
> advantage of that reasonably?

Could you describe this a bit more? Thanks.

> Maybe these changes are the best we can do, but it would be good to
> if the same primitive locked a ->cblist regardless of context.

Here you are comparing two primitives. Do you mean that IRQs being
disabled is one primitive, and rcu_nocb_lock() is the other?

BTW, I'm really itching to try making the scheduler more deadlock-resilient
(that is, if the scheduler wakeup path detects a deadlock, it defers the
wakeup on its own, using timers or irq_work, instead of passing that burden
to the callers). Thoughts?
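
To make that concrete, a very rough and entirely hypothetical sketch of the
deferral half (all names invented; the detection part, which is the hard
part, and the task-lifetime/overwrite handling are elided):

#include <linux/init.h>
#include <linux/irq_work.h>
#include <linux/percpu.h>
#include <linux/sched.h>

/* Hypothetical per-CPU slot for one deferred wakeup. */
struct deferred_wakeup {
        struct irq_work work;
        struct task_struct *task;
};
static DEFINE_PER_CPU(struct deferred_wakeup, deferred_wakeup);

static void deferred_wakeup_func(struct irq_work *work)
{
        struct deferred_wakeup *dw = container_of(work, struct deferred_wakeup, work);

        wake_up_process(dw->task);      /* now outside the troublesome context */
}

static int __init deferred_wakeup_init(void)
{
        int cpu;

        for_each_possible_cpu(cpu)
                init_irq_work(&per_cpu(deferred_wakeup, cpu).work,
                              deferred_wakeup_func);
        return 0;
}
early_initcall(deferred_wakeup_init);

/* Called by the wakeup path once it has detected the would-be deadlock. */
static void defer_wake_up(struct task_struct *t)
{
        struct deferred_wakeup *dw = this_cpu_ptr(&deferred_wakeup);

        dw->task = t;
        irq_work_queue(&dw->work);
}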

thanks,

 - Joel


> Can that be made to work reasonably?
> 
> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 097635c41135..523570469864 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
> > >  	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> > >  
> > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > >  	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > >  	__call_rcu_nocb_wake(rdp, true, flags);
> > >  }
> > > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >  	 */
> > >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > > -		rcu_nocb_lock_irqsave(rdp, flags);
> > > +		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > >  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > >  		if (bypass_ncbs &&
> > >  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >  			(void)rcu_nocb_try_flush_bypass(rdp, j);
> > >  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > >  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > > -			rcu_nocb_unlock_irqrestore(rdp, flags);
> > > +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  			continue; /* No callbacks here, try next. */
> > >  		}
> > >  		if (bypass_ncbs) {
> > > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >  		} else {
> > >  			needwake = false;
> > >  		}
> > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  		if (needwake) {
> > >  			swake_up_one(&rdp->nocb_cb_wq);
> > >  			gotcbs = true;
> > > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > >  	rcu_do_batch(rdp);
> > >  	local_bh_enable();
> > >  	lockdep_assert_irqs_enabled();
> > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > >  	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> > >  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> > >  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> > > @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > >  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > >  	}
> > >  	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  		if (needwake_gp)
> > >  			rcu_gp_kthread_wake();
> > >  		return;
> > > @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > >  
> > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> > >  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> > > -	rcu_nocb_unlock_irqrestore(rdp, flags);
> > > +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  	if (needwake_gp)
> > >  		rcu_gp_kthread_wake();
> > >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > > -- 
> > > 2.25.0
> > > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-26 15:21       ` Joel Fernandes
@ 2020-05-26 16:29         ` Paul E. McKenney
  2020-05-26 20:18           ` Joel Fernandes
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-26 16:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 11:21:37AM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > > in the context of an offloaded rdp.
> > > > 
> > > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > > where the offloaded state will possibly change when rdp->nocb_lock
> > > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > > case.
> > > 
> > > Suggested rewrite for change log:
> > > <wordsmithing>
> > > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > > context of an offloaded rdp.
> > > 
> > > This prepares for future toggling of CPUs to/from callback's offloaded mode
> > > where the offloaded state can change when rdp->nocb_lock is not held. We'll
> > > still want the entrypoints to lock the rdp in any case.
> > > </wordsmithing>
> > > 
> > > Also, can we inline rcu_nocb_lock_irqsave() into
> > > do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> > > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> > > confusion about which API to use for nocb locking (i.e. whether to directly
> > > acquire lock or call rcu_nocb_lock_irqsave()).
> > > 
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Thank you for looking this over, Joel!
> > 
> > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > right thing", even when things are changing?
> 
> One way to prevent things from changing could be to employ Steven's
> poor-man's RCU to mediate the "changing state" itself assuming the switch
> from the fast-path to the slow-path does not happen often. :) So whichever
> path is changing the state needs to wait for that poor-man's GP.

That should not be needed, given that acquiring ->nocb_lock on the CPU
corresponding to that lock suffices in both cases.  The trick is making
sure that the release matches the acquire.

> Any case, are you concerned with the performance issues with the
> unconditional locking and that's why you suggest still keeping it conditional?

My concerns are more about maintainability.

> Also, coming back to my point of inline the helper into the last caller -
> do_nocb_deferred_wakeup_common(). I think since we don't need to check if the
> rdp is offloaded and do any conditional locking. The timer can be called only
> with offloaded rdp. So we can directly do the unconditional spinlock instead
> of using the rcu_nocb_lock helper there.

Indeed we can.  But should we?

> > would prevent any number of "interesting" copy-pasta and "just now became
> > common code" bugs down the road.  And because irqs are disabled while
> > holding the lock, it should be possible to keep state on a per-CPU basis.
> 
> Agreed, that would be nice. Though if we could keep simple, that'd be nice
> too.

Having one set of functions/macros that are always used to protect
the ->cblist, no matter what the context, is a very attractive form of
simple.  ;-)

> > The ugliest scenario is callback adoption, where there are two ->cblist
> > structures in need of being locked.  In that case, changes are excluded
> > (because that is in CPU hotplug code), but is it possible to take
> > advantage of that reasonably?
> 
> Could you describe this a bit more? Thanks.

Right now, callbacks are merged directly from the outgoing CPU's ->cblist
to the surviving CPU's ->cblist.  This means that both ->cblist structures
must be locked at the same time, which would require additional state.
After all, if only the one ->cblist were to be locked at a given time,
a per-CPU variable could be used to track what method should be used to
unlock the ->cblist.
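
Purely to illustrate the shape of the problem (this is not the actual
rcutree_migrate_callbacks() code), the adoption window looks roughly like:

static void adopt_cbs_illustration(struct rcu_data *my_rdp, struct rcu_data *rdp)
{
        unsigned long flags;

        local_irq_save(flags);
        rcu_nocb_lock(my_rdp);                  /* surviving CPU's ->cblist */
        raw_spin_lock(&rdp->nocb_lock);         /* outgoing CPU's ->cblist */
        rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
        raw_spin_unlock(&rdp->nocb_lock);
        rcu_nocb_unlock(my_rdp);
        local_irq_restore(flags);
}

With both structures protected at once, a single per-CPU "how do I unlock?"
flag cannot describe the state, hence the need for additional state.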

This could be restructured to use an intermediate private ->cblist,
but care would be required with the counts, as it is a very bad idea
for a large group of callbacks to become invisible.  (This is not a
problem for rcu_barrier(), which excludes CPU hotplug, but it could
be a serious problem for callback-flood-mitigation mechanisms.  Yes,
they are heuristic, but...)
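
A rough sketch of that restructuring, with the rcu_segcblist helper names
given from memory (so treat them as approximate) and with the count
handling -- which is exactly where the care is needed -- only marked
rather than solved:

static void adopt_cbs_two_phase_sketch(struct rcu_data *my_rdp,
                                       struct rcu_data *rdp)
{
        unsigned long flags;
        struct rcu_cblist donecbs, pendcbs;

        rcu_cblist_init(&donecbs);
        rcu_cblist_init(&pendcbs);

        /* Phase 1: drain the outgoing CPU's ->cblist under its own lock. */
        raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
        rcu_segcblist_extract_done_cbs(&rdp->cblist, &donecbs);
        rcu_segcblist_extract_pend_cbs(&rdp->cblist, &pendcbs);
        /* TODO: move the ->cblist counts as well; a large group of     */
        /* callbacks must not become invisible to the flood heuristics. */
        raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);

        /* Phase 2: insert into the surviving CPU's ->cblist under its lock. */
        raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
        rcu_segcblist_insert_done_cbs(&my_rdp->cblist, &donecbs);
        rcu_segcblist_insert_pend_cbs(&my_rdp->cblist, &pendcbs);
        raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
}

Only one ->cblist is then protected at any given time, but the callbacks
are invisible between the two phases, which is the problem noted above.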

> > Maybe these changes are the best we can do, but it would be good to
> > if the same primitive locked a ->cblist regardless of context.
> 
> Here you are comparing 2 primitives. Do you mean that just IRQs being
> disabled is another primitive, and rcu_nocb_lock is another one?

I am not sure what this question means, but I am advocating looking
into retaining a single wrapper that makes that decision, rather than
using the underlying primitives directly.

Or are you instead asking why there are two different methods of
protecting the ->cblist structures?  (If so, because call_rcu() happens
often enough that we don't want lock-acquisition overhead unless we
absolutely need it, which we do on nohz_full CPUs but not otherwise.)
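
That is, roughly the shape of the current wrappers (quoting from memory,
so treat the details as approximate):

/* Only offloaded (nohz_full) CPUs pay for the lock; others just mask IRQs. */
#define rcu_nocb_lock_irqsave(rdp, flags)                               \
do {                                                                    \
        if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))                \
                local_irq_save(flags);                                  \
        else                                                            \
                raw_spin_lock_irqsave(&(rdp)->nocb_lock, flags);        \
} while (0)

#define rcu_nocb_unlock_irqrestore(rdp, flags)                          \
do {                                                                    \
        if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))                \
                local_irq_restore(flags);                               \
        else                                                            \
                raw_spin_unlock_irqrestore(&(rdp)->nocb_lock, flags);   \
} while (0)

Note that each side re-checks the offloaded state, which is where the
acquire/release mismatch hazard discussed earlier comes from once that
state is allowed to change at runtime.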

> BTW, I'm really itching to give it a try to make the scheduler more deadlock
> resilient (that is, if the scheduler wake up path detects a deadlock, then it
> defers the wake up using timers, or irq_work on its own instead of passing
> the burden of doing so to the callers). Thoughts?

I have used similar approaches within RCU, but on the other hand the
scheduler often has tighter latency constraints than RCU does.	So I
think that is a better question for the scheduler maintainers than it
is for me.  ;-)

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > Can that be made to work reasonably?
> > 
> > 							Thanx, Paul
> > 
> > > thanks,
> > > 
> > >  - Joel
> > > 
> > > 
> > > > 
> > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> > > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 097635c41135..523570469864 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
> > > >  	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> > > >  
> > > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > >  	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > > >  	__call_rcu_nocb_wake(rdp, true, flags);
> > > >  }
> > > > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > >  	 */
> > > >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > > >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > > > -		rcu_nocb_lock_irqsave(rdp, flags);
> > > > +		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > >  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > >  		if (bypass_ncbs &&
> > > >  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > > > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > >  			(void)rcu_nocb_try_flush_bypass(rdp, j);
> > > >  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > >  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > > > -			rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > >  			continue; /* No callbacks here, try next. */
> > > >  		}
> > > >  		if (bypass_ncbs) {
> > > > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > >  		} else {
> > > >  			needwake = false;
> > > >  		}
> > > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > >  		if (needwake) {
> > > >  			swake_up_one(&rdp->nocb_cb_wq);
> > > >  			gotcbs = true;
> > > > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > >  	rcu_do_batch(rdp);
> > > >  	local_bh_enable();
> > > >  	lockdep_assert_irqs_enabled();
> > > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > >  	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> > > >  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> > > >  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> > > > @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > >  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > >  	}
> > > >  	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > >  		if (needwake_gp)
> > > >  			rcu_gp_kthread_wake();
> > > >  		return;
> > > > @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > >  
> > > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> > > >  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> > > > -	rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > >  	if (needwake_gp)
> > > >  		rcu_gp_kthread_wake();
> > > >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > > > -- 
> > > > 2.25.0
> > > > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common()
  2020-05-13 16:47 ` [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common() Frederic Weisbecker
@ 2020-05-26 19:54   ` Joel Fernandes
  2020-05-26 19:59   ` Joel Fernandes
  1 sibling, 0 replies; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 19:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, May 13, 2020 at 06:47:07PM +0200, Frederic Weisbecker wrote:
> It can either be called inline (locally or CPU hotplug locked) when
> rdp->nocb_defer_wakeup is pending or from the nocb timer. In both cases
> the rdp is offlined and we want to take the nocb lock.
> 
> Clarify the locking rules and expectations.

And you did already inline the locking helper's code here, as I was
suggesting in 01/10. :-) So that's settled ;-)

thanks,

 - Joel

> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_plugin.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1d22b16c03e0..1dd3fdd675a1 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2142,9 +2142,9 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
>  	unsigned long flags;
>  	int ndw;
>  
> -	rcu_nocb_lock_irqsave(rdp, flags);
> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>  	if (!rcu_nocb_need_deferred_wakeup(rdp)) {
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		return;
>  	}
>  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common()
  2020-05-13 16:47 ` [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common() Frederic Weisbecker
  2020-05-26 19:54   ` Joel Fernandes
@ 2020-05-26 19:59   ` Joel Fernandes
  1 sibling, 0 replies; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 19:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, May 13, 2020 at 06:47:07PM +0200, Frederic Weisbecker wrote:
> It can either be called inline (locally or CPU hotplug locked) when
> rdp->nocb_defer_wakeup is pending or from the nocb timer. In both cases
> the rdp is offlined and we want to take the nocb lock.

s/offlined/offloaded/

thanks,

 - Joel

> 
> Clarify the locking rules and expectations.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_plugin.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1d22b16c03e0..1dd3fdd675a1 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2142,9 +2142,9 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
>  	unsigned long flags;
>  	int ndw;
>  
> -	rcu_nocb_lock_irqsave(rdp, flags);
> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>  	if (!rcu_nocb_need_deferred_wakeup(rdp)) {
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		return;
>  	}
>  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-26 16:29         ` Paul E. McKenney
@ 2020-05-26 20:18           ` Joel Fernandes
  2020-05-26 21:09             ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 20:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 09:29:46AM -0700, Paul E. McKenney wrote:
> On Tue, May 26, 2020 at 11:21:37AM -0400, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > > > in the context of an offloaded rdp.
> > > > > 
> > > > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > > > where the offloaded state will possibly change when rdp->nocb_lock
> > > > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > > > case.
> > > > 
> > > > Suggested rewrite for change log:
> > > > <wordsmithing>
> > > > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > > > context of an offloaded rdp.
> > > > 
> > > > This prepares for future toggling of CPUs to/from callback's offloaded mode
> > > > where the offloaded state can change when rdp->nocb_lock is not held. We'll
> > > > still want the entrypoints to lock the rdp in any case.
> > > > </wordsmithing>
> > > > 
> > > > Also, can we inline rcu_nocb_lock_irqsave() into
> > > > do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> > > > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> > > > confusion about which API to use for nocb locking (i.e. whether to directly
> > > > acquire lock or call rcu_nocb_lock_irqsave()).
> > > > 
> > > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > Thank you for looking this over, Joel!
> > > 
> > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > right thing", even when things are changing?
> > 
> > One way to prevent things from changing could be to employ Steven's
> > poor-man's RCU to mediate the "changing state" itself assuming the switch
> > from the fast-path to the slow-path does not happen often. :) So whichever
> > path is changing the state needs to wait for that poor-man's GP.
> 
> That should not be needed, given that acquiring ->nocb_lock on the CPU
> corresponding to that lock suffices in both cases.  The trick is making
> sure that the release matches the acquire.

Revisiting what you meant by "when things are changing", I'm assuming you
meant a CPU being dynamically switched from non-offloaded mode to
offloaded mode. Please correct me if I'm wrong.

Assuming that's true, you asked how we "do the right thing" in the
lock/unlock APIs. I was also suggesting getting rid of them and directly
acquiring/releasing the spinlock, like Frederic does. It sounds like
that is not good enough and you want an API that can do conditional locking
(with the complexity hidden behind the API). Allow me to read more
code and see if I can understand that and how to do it.

> > Any case, are you concerned with the performance issues with the
> > unconditional locking and that's why you suggest still keeping it conditional?
> 
> My concerns are more about maintainability.

Ok.

> > Also, coming back to my point of inline the helper into the last caller -
> > do_nocb_deferred_wakeup_common(). I think since we don't need to check if the
> > rdp is offloaded and do any conditional locking. The timer can be called only
> > with offloaded rdp. So we can directly do the unconditional spinlock instead
> > of using the rcu_nocb_lock helper there.
> 
> Indeed we can.  But should we?

Yeah, maybe not, if we can do conditional locking and benefit from it.

> > > would prevent any number of "interesting" copy-pasta and "just now became
> > > common code" bugs down the road.  And because irqs are disabled while
> > > holding the lock, it should be possible to keep state on a per-CPU basis.
> > 
> > Agreed, that would be nice. Though if we could keep simple, that'd be nice
> > too.
> 
> Having one set of functions/macros that are always used to protect
> the ->cblist, no matter what the context, is a very attractive form of
> simple.  ;-)

I was thinking that the API is already raw_spin_lock/unlock(), but I'll
revisit everything.

> > > The ugliest scenario is callback adoption, where there are two ->cblist
> > > structures in need of being locked.  In that case, changes are excluded
> > > (because that is in CPU hotplug code), but is it possible to take
> > > advantage of that reasonably?
> > 
> > Could you describe this a bit more? Thanks.
> 
> Right now, callbacks are merged directly from the outgoing CPU's ->cblist
> to the surviving CPU's ->cblist.  This means that both ->cblist structures
> must be locked at the same time, which would require additional state.
> After all, if only the one ->cblist were to be locked at a given time,
> a per-CPU variable could be used to track what method should be used to
> unlock the ->cblist.

So you do mean conditional locking behind an API, with everyone calling the
API whether or not they need the conditional locking. Ok.

> This could be restructured to use an intermediate private ->cblist,
> but care would be required with the counts, as it is a very bad idea
> for a large group of callbacks to become invisible.  (This is not a
> problem for rcu_barrier(), which excludes CPU hotplug, but it could
> be a serious problem for callback-flood-mitigation mechanisms.  Yes,
> they are heuristic, but...)

Ok.

> > > Maybe these changes are the best we can do, but it would be good to
> > > if the same primitive locked a ->cblist regardless of context.
> > 
> > Here you are comparing 2 primitives. Do you mean that just IRQs being
> > disabled is another primitive, and rcu_nocb_lock is another one?
> 
> I am not sure what this question means, but I am advocating looking
> into retaining a single wrapper that decides instead of direct use of
> the underlying primitives.

Yep.

> Or are you instead asking why there are two different methods of
> protecting the ->cblist structures?  (If so, because call_rcu() happens
> often enough that we don't want lock-acquisition overhead unless we
> absolutely need it, which we do on nohz_full CPUs but not otherwise.)

Yeah, that's what I was asking. About lock-acquisition overhead, I think
it's still uncontended overhead, because even if the nocb lock is taken
when it is not needed, it is still only locking the local ->cblist.
Correct me if I'm wrong though!

> > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > defers the wake up using timers, or irq_work on its own instead of passing
> > the burden of doing so to the callers). Thoughts?
> 
> I have used similar approaches within RCU, but on the other hand the
> scheduler often has tighter latency constraints than RCU does.	So I
> think that is a better question for the scheduler maintainers than it
> is for me.  ;-)

Ok, it definitely keeps coming up on my radar, first with the
rcu_read_unlock_special() stuff and now with the nocb code ;-). Perhaps it
could also be a good topic for a conference discussion!

thanks,

 - Joel

> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > > Can that be made to work reasonably?
> > > 
> > > 							Thanx, Paul
> > > 
> > > > thanks,
> > > > 
> > > >  - Joel
> > > > 
> > > > 
> > > > > 
> > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > > ---
> > > > >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> > > > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index 097635c41135..523570469864 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
> > > > >  	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> > > > >  
> > > > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > > > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > > >  	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > > > >  	__call_rcu_nocb_wake(rdp, true, flags);
> > > > >  }
> > > > > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > > >  	 */
> > > > >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > > > >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > > > > -		rcu_nocb_lock_irqsave(rdp, flags);
> > > > > +		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > > >  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > > >  		if (bypass_ncbs &&
> > > > >  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > > > > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > > >  			(void)rcu_nocb_try_flush_bypass(rdp, j);
> > > > >  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > > >  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > > > > -			rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > >  			continue; /* No callbacks here, try next. */
> > > > >  		}
> > > > >  		if (bypass_ncbs) {
> > > > > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > > >  		} else {
> > > > >  			needwake = false;
> > > > >  		}
> > > > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > >  		if (needwake) {
> > > > >  			swake_up_one(&rdp->nocb_cb_wq);
> > > > >  			gotcbs = true;
> > > > > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > > >  	rcu_do_batch(rdp);
> > > > >  	local_bh_enable();
> > > > >  	lockdep_assert_irqs_enabled();
> > > > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > > >  	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> > > > >  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> > > > >  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> > > > > @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > > >  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > > >  	}
> > > > >  	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > > > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > >  		if (needwake_gp)
> > > > >  			rcu_gp_kthread_wake();
> > > > >  		return;
> > > > > @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > > >  
> > > > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> > > > >  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> > > > > -	rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > >  	if (needwake_gp)
> > > > >  		rcu_gp_kthread_wake();
> > > > >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > > > > -- 
> > > > > 2.25.0
> > > > > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-26 20:18           ` Joel Fernandes
@ 2020-05-26 21:09             ` Paul E. McKenney
  2020-05-26 21:27               ` Joel Fernandes
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-26 21:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 04:18:40PM -0400, Joel Fernandes wrote:
> On Tue, May 26, 2020 at 09:29:46AM -0700, Paul E. McKenney wrote:
> > On Tue, May 26, 2020 at 11:21:37AM -0400, Joel Fernandes wrote:
> > > Hi Paul,
> > > 
> > > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > > > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > > > > in the context of an offloaded rdp.
> > > > > > 
> > > > > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > > > > where the offloaded state will possibly change when rdp->nocb_lock
> > > > > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > > > > case.
> > > > > 
> > > > > Suggested rewrite for change log:
> > > > > <wordsmithing>
> > > > > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > > > > context of an offloaded rdp.
> > > > > 
> > > > > This prepares for future toggling of CPUs to/from callback's offloaded mode
> > > > > where the offloaded state can change when rdp->nocb_lock is not held. We'll
> > > > > still want the entrypoints to lock the rdp in any case.
> > > > > </wordsmithing>
> > > > > 
> > > > > Also, can we inline rcu_nocb_lock_irqsave() into
> > > > > do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> > > > > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> > > > > confusion about which API to use for nocb locking (i.e. whether to directly
> > > > > acquire lock or call rcu_nocb_lock_irqsave()).
> > > > > 
> > > > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > Thank you for looking this over, Joel!
> > > > 
> > > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > > right thing", even when things are changing?
> > > 
> > > One way to prevent things from changing could be to employ Steven's
> > > poor-man's RCU to mediate the "changing state" itself assuming the switch
> > > from the fast-path to the slow-path does not happen often. :) So whichever
> > > path is changing the state needs to wait for that poor-man's GP.
> > 
> > That should not be needed, given that acquiring ->nocb_lock on the CPU
> > corresponding to that lock suffices in both cases.  The trick is making
> > sure that the release matches the acquire.
> 
> Revisiting what you meant by "when things are changing", I'm assuming you
> meant a CPU is dynamically switched from non-offloaded mode to
> offloaded-mode. Please correct me if I'm wrong.

Both.  It does no good to acquire by disabling IRQs and then release by
releasing a lock that you do not hold.  Nor does it help to acquire the
lock and then fail to release it, instead merely re-enabling IRQs.

Aside from the case where two locks are held, a per-CPU variable (as
in another field in the rcu_data structure) would work.  And that case
could probably be reworked to hold only one lock at a time.

> Assuming that's true, you asked how do we "do the right thing" in the
> lock/unlock APIs. I was also suggesting getting rid of them and directly
> acquiring/releasing the spinlock, like Frederic does. It sounds like
> that is not good enough and you want an API that can do conditional locking
> (and the said complexity is hidden behind the API). Allow me to read more
> code and see if I can understand that / how to do that.

Indeed, the non-nohz_full code should not acquire the spinlock.

> > > Any case, are you concerned with the performance issues with the
> > > unconditional locking and that's why you suggest still keeping it conditional?
> > 
> > My concerns are more about maintainability.
> 
> Ok.
> 
> > > Also, coming back to my point of inline the helper into the last caller -
> > > do_nocb_deferred_wakeup_common(). I think since we don't need to check if the
> > > rdp is offloaded and do any conditional locking. The timer can be called only
> > > with offloaded rdp. So we can directly do the unconditional spinlock instead
> > > of using the rcu_nocb_lock helper there.
> > 
> > Indeed we can.  But should we?
> 
> Yeah may be not, in the event that we could do conditional locking and
> benefit.

Underneath an API, as current mainline does.

> > > > would prevent any number of "interesting" copy-pasta and "just now became
> > > > common code" bugs down the road.  And because irqs are disabled while
> > > > holding the lock, it should be possible to keep state on a per-CPU basis.
> > > 
> > > Agreed, that would be nice. Though if we could keep simple, that'd be nice
> > > too.
> > 
> > Having one set of functions/macros that are always used to protect
> > the ->cblist, no matter what the context, is a very attractive form of
> > simple.  ;-)
> 
> I was thinking that API is already raw_spin_lock/unlock() but I'll revisit
> everything.

It is the function/macro family that includes rcu_nocb_lock() and
rcu_nocb_unlock(), as you can see from Frederic's first patch.

> > > > The ugliest scenario is callback adoption, where there are two ->cblist
> > > > structures in need of being locked.  In that case, changes are excluded
> > > > (because that is in CPU hotplug code), but is it possible to take
> > > > advantage of that reasonably?
> > > 
> > > Could you describe this a bit more? Thanks.
> > 
> > Right now, callbacks are merged directly from the outgoing CPU's ->cblist
> > to the surviving CPU's ->cblist.  This means that both ->cblist structures
> > must be locked at the same time, which would require additional state.
> > After all, if only the one ->cblist were to be locked at a given time,
> > a per-CPU variable could be used to track what method should be used to
> > unlock the ->cblist.
> 
> So you do mean conditional locking behind an API, and everyone call the API
> whether they want do the conditional locking or not. Ok.

Yes, as the code in mainline is now.

> > This could be restructured to use an intermediate private ->cblist,
> > but care would be required with the counts, as it is a very bad idea
> > for a large group of callbacks to become invisible.  (This is not a
> > problem for rcu_barrier(), which excludes CPU hotplug, but it could
> > be a serious problem for callback-flood-mitigation mechanisms.  Yes,
> > they are heuristic, but...)
> 
> Ok.
> 
> > > > Maybe these changes are the best we can do, but it would be good to
> > > > if the same primitive locked a ->cblist regardless of context.
> > > 
> > > Here you are comparing 2 primitives. Do you mean that just IRQs being
> > > disabled is another primitive, and rcu_nocb_lock is another one?
> > 
> > I am not sure what this question means, but I am advocating looking
> > into retaining a single wrapper that decides instead of direct use of
> > the underlying primitives.
> 
> Yep.
> 
> > Or are you instead asking why there are two different methods of
> > protecting the ->cblist structures?  (If so, because call_rcu() happens
> > often enough that we don't want lock-acquisition overhead unless we
> > absolutely need it, which we do on nohz_full CPUs but not otherwise.)
> 
> Yeah that's what I was asking. About lock-acquisition overhead, I think its
> still uncontended overhead though because even if the nocb lock is taken when
> it was not needed, it is still to lock the local ->cblist. Correct me if I'm
> wrong though!

It is uncontended, but it is unnecessary overhead.  And call_rcu() can
be invoked rather frequently.

> > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > defers the wake up using timers, or irq_work on its own instead of passing
> > > the burden of doing so to the callers). Thoughts?
> > 
> > I have used similar approaches within RCU, but on the other hand the
> > scheduler often has tighter latency constraints than RCU does.	So I
> > think that is a better question for the scheduler maintainers than it
> > is for me.  ;-)
> 
> Ok, it definitely keeps coming up in my radar first with the
> rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> be good for a conference discussion!

Again, please understand that RCU has way looser latency constraints
than the scheduler does.  Adding half a jiffy to wakeup latency might
not go over well, especially in the real-time application area.

But what did the scheduler maintainers say about this idea?

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> > 							Thanx, Paul
> > 
> > > thanks,
> > > 
> > >  - Joel
> > > 
> > > 
> > > > Can that be made to work reasonably?
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > thanks,
> > > > > 
> > > > >  - Joel
> > > > > 
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> > > > > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > > index 097635c41135..523570469864 100644
> > > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
> > > > > >  	struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> > > > > >  
> > > > > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > > > > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > > > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > > > >  	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > > > > >  	__call_rcu_nocb_wake(rdp, true, flags);
> > > > > >  }
> > > > > > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > > > >  	 */
> > > > > >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > > > > >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > > > > > -		rcu_nocb_lock_irqsave(rdp, flags);
> > > > > > +		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > > > >  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > > > >  		if (bypass_ncbs &&
> > > > > >  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > > > > > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > > > >  			(void)rcu_nocb_try_flush_bypass(rdp, j);
> > > > > >  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > > > >  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > > > > > -			rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > > +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > > >  			continue; /* No callbacks here, try next. */
> > > > > >  		}
> > > > > >  		if (bypass_ncbs) {
> > > > > > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > > > >  		} else {
> > > > > >  			needwake = false;
> > > > > >  		}
> > > > > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > > >  		if (needwake) {
> > > > > >  			swake_up_one(&rdp->nocb_cb_wq);
> > > > > >  			gotcbs = true;
> > > > > > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > > > >  	rcu_do_batch(rdp);
> > > > > >  	local_bh_enable();
> > > > > >  	lockdep_assert_irqs_enabled();
> > > > > > -	rcu_nocb_lock_irqsave(rdp, flags);
> > > > > > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > > > >  	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> > > > > >  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> > > > > >  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> > > > > > @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > > > >  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > > > >  	}
> > > > > >  	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > > > > > -		rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > > +		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > > >  		if (needwake_gp)
> > > > > >  			rcu_gp_kthread_wake();
> > > > > >  		return;
> > > > > > @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > > > >  
> > > > > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> > > > > >  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> > > > > > -	rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > > > +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > > > >  	if (needwake_gp)
> > > > > >  		rcu_gp_kthread_wake();
> > > > > >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > > > > > -- 
> > > > > > 2.25.0
> > > > > > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-13 16:47 ` [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU Frederic Weisbecker
  2020-05-13 18:38   ` Paul E. McKenney
@ 2020-05-26 21:20   ` Joel Fernandes
  2020-05-26 22:49     ` Joel Fernandes
  2020-06-04 13:14     ` Frederic Weisbecker
  1 sibling, 2 replies; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 21:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> Allow a CPU's rdp to quit the callback offlined mode.

nit: s/offlined/offloaded/ ?

> The switch happens on the target with IRQs disabled and rdp->nocb_lock
> held to avoid races between local callbacks handling and kthread
> offloaded callbacks handling.
> nocb_cb kthread is first parked to avoid any future race with
> concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> so that the nocb_gp kthread ignores this rdp.

nit: you mean the cblist is set to non-offloaded mode, right?

Also, could you clarify the rcu_barrier() bits in the changelog? I know
there's some issue if the cblist has both offloaded and non-offloaded
callbacks, but it would be good to spell that out here IMHO.

[...]
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index d8ea2bef5574..4bed48da7702 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>   * Mark the specified rcu_segcblist structure as offloaded.  This
>   * structure must be empty.
>   */
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
>  {
> -	rsclp->offloaded = 1;
> +	rsclp->offloaded = offload;
>  }
>  #endif
>  
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 4c1503a82492..8f7c6c34cb1b 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
>  #ifdef CONFIG_RCU_NOCB_CPU
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
>  #endif
>  bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
>  bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index f19e81e0c691..c74a4df8d5f2 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
>  		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> +		if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +			continue;
> +		}
>  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>  		if (bypass_ncbs &&
>  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
>  		do_nocb_deferred_wakeup_common(rdp);
>  }
>  
> +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> +{
> +	unsigned long flags;
> +	struct rcu_node *rnp = rdp->mynode;
> +
> +	printk("De-offloading %d\n", rdp->cpu);

nit: s/printk/pr_debug/ ?

thanks,

 - Joel

> +	kthread_park(rdp->nocb_cb_kthread);
> +
> +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> +	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
> +	raw_spin_lock_rcu_node(rnp);
> +	rcu_segcblist_offload(&rdp->cblist, false);
> +	raw_spin_unlock_rcu_node(rnp);
> +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +}
> +
> +static long rcu_nocb_rdp_deoffload(void *arg)
> +{
> +	struct rcu_data *rdp = arg;
> +
> +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> +	__rcu_nocb_rdp_deoffload(rdp);
> +
> +	return 0;
> +}
> +
> +void rcu_nocb_cpu_deoffload(int cpu)
> +{
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> +	mutex_lock(&rcu_state.barrier_mutex);
> +	cpus_read_lock();
> +	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> +		if (cpu_online(cpu)) {
> +			work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> +		} else {
> +			__rcu_nocb_rdp_deoffload(rdp);
> +		}
> +		cpumask_clear_cpu(cpu, rcu_nocb_mask);
> +	}
> +	cpus_read_unlock();
> +	mutex_unlock(&rcu_state.barrier_mutex);
> +}
> +
>  void __init rcu_init_nohz(void)
>  {
>  	int cpu;
> @@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
>  		rdp = per_cpu_ptr(&rcu_data, cpu);
>  		if (rcu_segcblist_empty(&rdp->cblist))
>  			rcu_segcblist_init(&rdp->cblist);
> -		rcu_segcblist_offload(&rdp->cblist);
> +		rcu_segcblist_offload(&rdp->cblist, true);
>  	}
>  	rcu_organize_nocb_kthreads();
>  }
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-26 21:09             ` Paul E. McKenney
@ 2020-05-26 21:27               ` Joel Fernandes
  2020-05-26 22:29                 ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 21:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
[...]
> > > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > > defers the wake up using timers, or irq_work on its own instead of passing
> > > > the burden of doing so to the callers). Thoughts?
> > > 
> > > I have used similar approaches within RCU, but on the other hand the
> > > scheduler often has tighter latency constraints than RCU does.	So I
> > > think that is a better question for the scheduler maintainers than it
> > > is for me.  ;-)
> > 
> > Ok, it definitely keeps coming up in my radar first with the
> > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > be good for a conference discussion!
> 
> Again, please understand that RCU has way looser latency constraints
> than the scheduler does.  Adding half a jiffy to wakeup latency might
> not go over well, especially in the real-time application area.

Yeah, agreed that the "deadlock detection" code should be pretty light weight
if/when it is written.

> But what did the scheduler maintainers say about this idea?

Last I remember when it came up during the rcu_read_unlock_special() deadlock
discussions, there's no way for infra like RCU to know that it was
invoked from the scheduler.

The idea I am bringing up now (about the scheduler itself detecting a
recursion) was never brought up (not yet) with the sched maintainers (at
least not by me).

thanks,

 - Joel




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-26 21:27               ` Joel Fernandes
@ 2020-05-26 22:29                 ` Paul E. McKenney
  2020-05-27  0:45                   ` Joel Fernandes
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-26 22:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> [...]
> > > > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > > > defers the wake up using timers, or irq_work on its own instead of passing
> > > > > the burden of doing so to the callers). Thoughts?
> > > > 
> > > > I have used similar approaches within RCU, but on the other hand the
> > > > scheduler often has tighter latency constraints than RCU does.	So I
> > > > think that is a better question for the scheduler maintainers than it
> > > > is for me.  ;-)
> > > 
> > > Ok, it definitely keeps coming up in my radar first with the
> > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > > be good for a conference discussion!
> > 
> > Again, please understand that RCU has way looser latency constraints
> > than the scheduler does.  Adding half a jiffy to wakeup latency might
> > not go over well, especially in the real-time application area.
> 
> Yeah, agreed that the "deadlock detection" code should be pretty light weight
> if/when it is written.

In addition, to even stand a chance, you would need to use hrtimers.
The half-jiffy (at a minimum) delay from any other deferral mechanism
that I know of would be the kiss of death, especially from the viewpoint
of the real-time guys.

> > But what did the scheduler maintainers say about this idea?
> 
> Last I remember when it came up during the rcu_read_unlock_special() deadlock
> > discussions, there's no way for infra like RCU to know that it was
> invoked from the scheduler.
> 
> The idea I am bringing up now (about the scheduler itself detecting a
> recursion) was never brought up (not yet) with the sched maintainers (at
> least not by me).

It might be good to bounce if off of them sooner rather than later.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-26 21:20   ` Joel Fernandes
@ 2020-05-26 22:49     ` Joel Fernandes
  2020-06-04 13:10       ` Frederic Weisbecker
  2020-06-04 13:14     ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-05-26 22:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
 
> > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > held to avoid races between local callbacks handling and kthread
> > offloaded callbacks handling.
> > nocb_cb kthread is first parked to avoid any future race with
> > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > so that the nocb_gp kthread ignores this rdp.
> 
> nit: you mean cblist is set to non-offloaded mode right?
> 
> Also, could you clarify better the rcu_barrier bits in the changelog. I know
> there's some issue if the cblist has both offloaded and non-offloaded
> callbacks, but it would be good to clarify this here better IMHO.

And for archival purposes: rcu_barrier needs excluding here because it is
possible that for a brief period of time, the callback kthread has been
parked to do the mode-switch, and it could be executing a bunch of callbacks
when it was asked to park.

Meanwhile, more interrupts happen and more callbacks are queued which are now
executing in softirq. This ruins the ordering of callbacks that rcu_barrier
needs.

thanks,

 - Joel


> 
> [...]
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index d8ea2bef5574..4bed48da7702 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
> >   * Mark the specified rcu_segcblist structure as offloaded.  This
> >   * structure must be empty.
> >   */
> > -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> > +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
> >  {
> > -	rsclp->offloaded = 1;
> > +	rsclp->offloaded = offload;
> >  }
> >  #endif
> >  
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 4c1503a82492..8f7c6c34cb1b 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> >  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
> >  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> >  #ifdef CONFIG_RCU_NOCB_CPU
> > -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> > +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
> >  #endif
> >  bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
> >  bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index f19e81e0c691..c74a4df8d5f2 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> >  		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > +		if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> > +			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > +			continue;
> > +		}
> >  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> >  		if (bypass_ncbs &&
> >  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > @@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> >  		do_nocb_deferred_wakeup_common(rdp);
> >  }
> >  
> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> 
> nit: s/printk/pr_debug/ ?
> 
> thanks,
> 
>  - Joel
> 
> > +	kthread_park(rdp->nocb_cb_kthread);
> > +
> > +	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > +	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
> > +	raw_spin_lock_rcu_node(rnp);
> > +	rcu_segcblist_offload(&rdp->cblist, false);
> > +	raw_spin_unlock_rcu_node(rnp);
> > +	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > +}
> > +
> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > +	__rcu_nocb_rdp_deoffload(rdp);
> > +
> > +	return 0;
> > +}
> > +
> > +void rcu_nocb_cpu_deoffload(int cpu)
> > +{
> > +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > +
> > +	mutex_lock(&rcu_state.barrier_mutex);
> > +	cpus_read_lock();
> > +	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> > +		if (cpu_online(cpu)) {
> > +			work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> > +		} else {
> > +			__rcu_nocb_rdp_deoffload(rdp);
> > +		}
> > +		cpumask_clear_cpu(cpu, rcu_nocb_mask);
> > +	}
> > +	cpus_read_unlock();
> > +	mutex_unlock(&rcu_state.barrier_mutex);
> > +}
> > +
> >  void __init rcu_init_nohz(void)
> >  {
> >  	int cpu;
> > @@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
> >  		rdp = per_cpu_ptr(&rcu_data, cpu);
> >  		if (rcu_segcblist_empty(&rdp->cblist))
> >  			rcu_segcblist_init(&rdp->cblist);
> > -		rcu_segcblist_offload(&rdp->cblist);
> > +		rcu_segcblist_offload(&rdp->cblist, true);
> >  	}
> >  	rcu_organize_nocb_kthreads();
> >  }
> > -- 
> > 2.25.0
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-26 22:29                 ` Paul E. McKenney
@ 2020-05-27  0:45                   ` Joel Fernandes
  2020-05-27  0:58                     ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-05-27  0:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

Hi Paul,

On Tue, May 26, 2020 at 6:29 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> > On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> > [...]
> > > > > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > > > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > > > > defers the wake up using timers, or irq_work on its own instead of passing
> > > > > > the burden of doing so to the callers). Thoughts?
> > > > >
> > > > > I have used similar approaches within RCU, but on the other hand the
> > > > > scheduler often has tighter latency constraints than RCU does.  So I
> > > > > think that is a better question for the scheduler maintainers than it
> > > > > is for me.  ;-)
> > > >
> > > > Ok, it definitely keeps coming up in my radar first with the
> > > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > > > be good for a conference discussion!
> > >
> > > Again, please understand that RCU has way looser latency constraints
> > > than the scheduler does.  Adding half a jiffy to wakeup latency might
> > > not go over well, especially in the real-time application area.
> >
> > Yeah, agreed that the "deadlock detection" code should be pretty light weight
> > if/when it is written.
>
> In addition, to even stand a chance, you would need to use hrtimers.
> The half-jiffy (at a minimum) delay from any other deferral mechanism
> that I know of would be the kiss of death, especially from the viewpoint
> of the real-time guys.

Just to make sure we are talking about the same kind of overhead - the
deferring is only needed if the rq lock is already held (detected by
trylocking). So there's no overhead in the common case other than the
trylock possibly being slightly more expensive than the regular
locking. Also, once the scheduler defers it, it uses the same kind of
mechanism that other deferral mechanisms use to overcome this deadlock
(timers, irq_work etc), so the overhead then would be no different
than what we have now - the RT users would already have the wake up
latency in current kernels without this idea implemented. Did I miss
something?
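
To make the shape concrete, I'm thinking of something along these lines
(pure pseudo-sketch, the helper names are made up and none of this is
meant as actual scheduler code):

    /* Illustrative only: helper names below are invented. */
    static void sched_wakeup_or_defer(struct rq *rq, struct task_struct *p)
    {
            if (raw_spin_trylock(&rq->lock)) {
                    /* Common case: lock available, wake up directly. */
                    do_wakeup_locked(rq, p);                /* made up */
                    raw_spin_unlock(&rq->lock);
            } else {
                    /*
                     * rq lock unavailable, possibly held by this very
                     * context: defer the wakeup via irq_work/timer
                     * instead of deadlocking.
                     */
                    defer_wakeup_irq_work(p);               /* made up */
            }
    }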

> > > But what did the scheduler maintainers say about this idea?
> >
> > Last I remember when it came up during the rcu_read_unlock_special() deadlock
> > > discussions, there's no way for infra like RCU to know that it was
> > invoked from the scheduler.
> >
> > The idea I am bringing up now (about the scheduler itself detecting a
> > recursion) was never brought up (not yet) with the sched maintainers (at
> > least not by me).
>
> It might be good to bounce if off of them sooner rather than later.

Ok, I did that now over IRC. Thank you!

 - Joel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-27  0:45                   ` Joel Fernandes
@ 2020-05-27  0:58                     ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-05-27  0:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 08:45:42PM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Tue, May 26, 2020 at 6:29 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> > > On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> > > [...]
> > > > > > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > > > > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > > > > > defers the wake up using timers, or irq_work on its own instead of passing
> > > > > > > the burden of doing so to the callers). Thoughts?
> > > > > >
> > > > > > I have used similar approaches within RCU, but on the other hand the
> > > > > > scheduler often has tighter latency constraints than RCU does.  So I
> > > > > > think that is a better question for the scheduler maintainers than it
> > > > > > is for me.  ;-)
> > > > >
> > > > > Ok, it definitely keeps coming up in my radar first with the
> > > > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > > > > be good for a conference discussion!
> > > >
> > > > Again, please understand that RCU has way looser latency constraints
> > > > than the scheduler does.  Adding half a jiffy to wakeup latency might
> > > > not go over well, especially in the real-time application area.
> > >
> > > Yeah, agreed that the "deadlock detection" code should be pretty light weight
> > > if/when it is written.
> >
> > In addition, to even stand a chance, you would need to use hrtimers.
> > The half-jiffy (at a minimum) delay from any other deferral mechanism
> > that I know of would be the kiss of death, especially from the viewpoint
> > of the real-time guys.
> 
> Just to make sure we are talking about the same kind of overhead - the
> deferring is only needed if the rq lock is already held (detected by
> trylocking). So there's no overhead in the common case other than the
> trylock possibly being slightly more expensive than the regular
> locking. Also, once the scheduler defers it, it uses the same kind of
> mechanism that other deferral mechanisms use to overcome this deadlock
> (timers, irq_work etc), so the overhead then would be no different
> than what we have now - the RT users would already have the wake up
> latency in current kernels without this idea implemented. Did I miss
> something?

Aggressive real-time applications care deeply about the uncommon case.

							Thanx, Paul

> > > > But what did the scheduler maintainers say about this idea?
> > >
> > > Last I remember when it came up during the rcu_read_unlock_special() deadlock
> > > discussions, there's no way for infra like RCU to know that it was
> > > invoked from the scheduler.
> > >
> > > The idea I am bringing up now (about the scheduler itself detecting a
> > > recursion) was never brought up (not yet) with the sched maintainers (at
> > > least not by me).
> >
> > It might be good to bounce if off of them sooner rather than later.
> 
> Ok, I did that now over IRC. Thank you!
> 
>  - Joel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-05-22 17:57     ` Paul E. McKenney
  2020-05-26 15:21       ` Joel Fernandes
@ 2020-06-04 11:41       ` Frederic Weisbecker
  2020-06-04 16:36         ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-06-04 11:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Thank you for looking this over, Joel!
> 
> Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> right thing", even when things are changing?  If it is feasible, that
> would prevent any number of "interesting" copy-pasta and "just now became
> common code" bugs down the road.

This won't be pretty:

    locked = rcu_nocb_lock();
    rcu_nocb_unlock(locked);

And anyway we still want to unconditionally lock on many places,
regardless of the offloaded state. I don't know how we could have
a magic helper do the unconditional lock on some places and the
conditional on others.

Also the point of turning the lock helpers into primitives is to make
it clearer as to where we really need unconditional locking and where
we allow it to be conditional. A gift to reviewers :-)
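
That is, the pure nocb entrypoints now take the lock unconditionally:

    raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
    ...
    raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);

while the call paths shared with the non-offloaded case keep the
conditional rcu_nocb_lock_irqsave()/rcu_nocb_unlock_irqrestore() helpers.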

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-26 22:49     ` Joel Fernandes
@ 2020-06-04 13:10       ` Frederic Weisbecker
  2020-06-11  1:32         ` Joel Fernandes
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-06-04 13:10 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
>  
> > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > held to avoid races between local callbacks handling and kthread
> > > offloaded callbacks handling.
> > > nocb_cb kthread is first parked to avoid any future race with
> > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > so that the nocb_gp kthread ignores this rdp.
> > 
> > nit: you mean cblist is set to non-offloaded mode right?
> > 
> > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > there's some issue if the cblist has both offloaded and non-offloaded
> > callbacks, but it would be good to clarify this here better IMHO.
> 
> And for archival purposes: rcu_barrier needs excluding here because it is
> possible that for a brief period of time, the callback kthread has been
> parked to do the mode-switch, and it could be executing a bunch of callbacks
> when it was asked to park.
> 
> Meanwhile, more interrupts happen and more callbacks are queued which are now
> executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> needs.

I think in that case the callbacks would still be executed in order. We wait
for the kthread to park before switching to softirq callback execution.

Initially it was to avoid callback ordering issues but I don't recall
exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
for the next version where, for a brief period of time, nocb kthread will
be able to compete with callback execution in softirq.

I'll clarify that in the changelog.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-05-26 21:20   ` Joel Fernandes
  2020-05-26 22:49     ` Joel Fernandes
@ 2020-06-04 13:14     ` Frederic Weisbecker
  1 sibling, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2020-06-04 13:14 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > Allow a CPU's rdp to quit the callback offlined mode.
> 
> nit: s/offlined/offloaded/ ?

Oh, looks like I did that everywhere :)

> 
> > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > held to avoid races between local callbacks handling and kthread
> > offloaded callbacks handling.
> > nocb_cb kthread is first parked to avoid any future race with
> > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > so that the nocb_gp kthread ignores this rdp.
> 
> nit: you mean cblist is set to non-offloaded mode right?

Ah right!

> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> 
> nit: s/printk/pr_debug/ ?

Ok.

Thanks!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-04 11:41       ` Frederic Weisbecker
@ 2020-06-04 16:36         ` Paul E. McKenney
  2020-06-08 12:57           ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-06-04 16:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Thank you for looking this over, Joel!
> > 
> > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > right thing", even when things are changing?  If it is feasible, that
> > would prevent any number of "interesting" copy-pasta and "just now became
> > common code" bugs down the road.
> 
> This won't be pretty:
> 
>     locked = rcu_nocb_lock();
>     rcu_nocb_unlock(locked);

I was thinking in terms of a bit in the rcu_data structure into
which rcu_nocb_lock() and friends stored the status, and from which
rcu_nocb_unlock() and friends retrieved that same status.  Sort of like
how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.
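
Something like this, perhaps (hand-waving sketch only; the ->nocb_locked
field is invented for illustration):

    /* Sketch: remember whether rcu_nocb_lock() actually took the lock. */
    static void rcu_nocb_lock(struct rcu_data *rdp)
    {
            if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
                    raw_spin_lock(&rdp->nocb_lock);
                    rdp->nocb_locked = true;        /* invented field */
            }
    }

    static void rcu_nocb_unlock(struct rcu_data *rdp)
    {
            if (rdp->nocb_locked) {
                    rdp->nocb_locked = false;
                    raw_spin_unlock(&rdp->nocb_lock);
            }
    }

Modulo, of course, the question of how to safely sample the offloaded
state in the first place.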

As noted, this does require reworking the hotplug code to avoid the
current holding of two such locks concurrently, which I am happy to do
if that helps.

Or am I missing a subtle (or not-so-subtle) twist here?

> And anyway we still want to unconditionally lock on many places,
> regardless of the offloaded state. I don't know how we could have
> a magic helper do the unconditional lock on some places and the
> conditional on others.

I was assuming (perhaps incorrectly) that an intermediate phase between
not-offloaded and offloaded would take care of all of those cases.

> Also the point of turning the lock helpers into primitives is to make
> it clearer as to where we really need unconditional locking and where
> we allow it to be conditional. A gift to reviewers :-)

Unless and until someone does a copy-pasta, thus unconditionally
doing the wrong thing.  ;-)

If we cannot avoid different spellings of ->cblist in different places,
such is life, but I do want to make sure that we have fully considered
the alternatives.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-04 16:36         ` Paul E. McKenney
@ 2020-06-08 12:57           ` Frederic Weisbecker
  2020-06-09 18:02             ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-06-08 12:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Thu, Jun 04, 2020 at 09:36:55AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > Thank you for looking this over, Joel!
> > > 
> > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > right thing", even when things are changing?  If it is feasible, that
> > > would prevent any number of "interesting" copy-pasta and "just now became
> > > common code" bugs down the road.
> > 
> > This won't be pretty:
> > 
> >     locked = rcu_nocb_lock();
> >     rcu_nocb_unlock(locked);
> 
> I was thinking in terms of a bit in the rcu_data structure into
> which rcu_nocb_lock() and friends stored the status, and from which
> rcu_nocb_unlock() and friends retrieved that same status.  Sort of like
> how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.
> 
> As noted, this does require reworking the hotplug code to avoid the
> current holding of two such locks concurrently, which I am happy to do
> if that helps.
> 
> Or am I missing a subtle (or not-so-subtle) twist here?

So, during a CPU round, the nocb-gp kthread locks the corresponding rdp->nocb_lock
and then ignores the rdp if it is not offloaded. Surely there is a smarter way to do
that though. Well, that's something we'll need to optimize at some point anyway.

Also we must then make sure that the nocb timers won't ever fire while we
switch to offloaded state or they may fiddle with internals without locking
nocb.

> 
> > And anyway we still want to unconditionally lock on many places,
> > regardless of the offloaded state. I don't know how we could have
> > a magic helper do the unconditional lock on some places and the
> > conditional on others.
> 
> I was assuming (perhaps incorrectly) that an intermediate phase between
> not-offloaded and offloaded would take care of all of those cases.

Perhaps partly but I fear that won't be enough.

> 
> > Also the point of turning the lock helpers into primitives is to make
> > it clearer as to where we really need unconditional locking and where
> > we allow it to be conditional. A gift to reviewers :-)
> 
> Unless and until someone does a copy-pasta, thus unconditionally
> doing the wrong thing.  ;-)

Yeah but the cost is more complicated code and synchronization to
maintain the use of those all-in-one locking APIs everywhere. And
current state is already not that simple :-)

Perhaps we should rename rcu_nocb_lock() into rcu_nocb_lock_cond() to
prevent from accidents?

Also I've been thinking that rcu_nocb_lock() should meet any of these
requirements:

* hotplug is locked
* rcu barrier is locked
* rnp is locked

Because checking the offloaded state (when nocb isn't locked yet) of
an rdp without any of the above locks held is racy. And that should
be easy to check and prevent from copy-pasta accidents.
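
Something like the below, just to illustrate the check (the two
*_is_held() helpers for the rnp and hotplug locks are made up, the exact
lockdep expressions remain to be worked out):

    /* Sketch: complain if none of the listed protections is held. */
    static void rcu_nocb_lock(struct rcu_data *rdp)
    {
            RCU_LOCKDEP_WARN(!lockdep_is_held(&rcu_state.barrier_mutex) &&
                             !rnp_lock_is_held(rdp->mynode) &&      /* made up */
                             !cpu_hotplug_lock_is_held(),           /* made up */
                             "Unprotected rcu_nocb_lock()");
            raw_spin_lock(&rdp->nocb_lock);
    }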

What do you think?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-08 12:57           ` Frederic Weisbecker
@ 2020-06-09 18:02             ` Paul E. McKenney
  2020-06-10 13:12               ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-06-09 18:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Mon, Jun 08, 2020 at 02:57:17PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 04, 2020 at 09:36:55AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> > > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > Thank you for looking this over, Joel!
> > > > 
> > > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > > right thing", even when things are changing?  If it is feasible, that
> > > > would prevent any number of "interesting" copy-pasta and "just now became
> > > > common code" bugs down the road.
> > > 
> > > This won't be pretty:
> > > 
> > >     locked = rcu_nocb_lock();
> > >     rcu_nocb_unlock(locked);
> > 
> > I was thinking in terms of a bit in the rcu_data structure into
> > which rcu_nocb_lock() and friends stored the status, and from which
> > rcu_nocb_unlock() and friends retrieved that same status.  Sort of like
> > how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.
> > 
> > As noted, this does require reworking the hotplug code to avoid the
> > current holding of two such locks concurrently, which I am happy to do
> > if that helps.
> > 
> > Or am I missing a subtle (or not-so-subtle) twist here?
> 
> So, during a CPU round, the nocb-gp kthread locks the corresponding rdp->nocb_lock
> and then ignores the rdp if it is not offloaded. Surely there is a smarter way to do
> that though. Well, that's something we'll need to optimize at some point anyway.
> 
> Also we must then make sure that the nocb timers won't ever fire while we
> switch to offloaded state or they may fiddle with internals without locking
> nocb.

Understood, the synchronization requirements during the switchover will
be tricky.  But that will be the case no matter how we set it up.
If possible, it would be good if that trickiness could be concentrated
rather than spread far and wide.

> > > And anyway we still want to unconditionally lock on many places,
> > > regardless of the offloaded state. I don't know how we could have
> > > a magic helper do the unconditional lock on some places and the
> > > conditional on others.
> > 
> > I was assuming (perhaps incorrectly) that an intermediate phase between
> > not-offloaded and offloaded would take care of all of those cases.
> 
> Perhaps partly but I fear that won't be enough.

One approach is to rely on RCU read-side critical sections surrounding
the lock acquisition and to stay in the intermediate phase until a grace
period completes, preferably call_rcu() instead of synchronize_rcu().

This of course means refusing to do a transition if the CPU is still
in the intermediate state from a prior transition.
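
In (very) rough strokes, with ->offload_state and friends invented purely
for illustration:

    /*
     * Users of the lock stay inside an RCU read-side critical section
     * (sketch):
     *
     *	rcu_read_lock();
     *	rcu_nocb_lock(rdp);	/* samples the possibly-intermediate state */
     *	... enqueue or invoke callbacks ...
     *	rcu_nocb_unlock(rdp);
     *	rcu_read_unlock();
     */

    /* Transition side (sketch): refuse if a prior transition is still draining. */
    static int rcu_nocb_start_transition(struct rcu_data *rdp)
    {
            if (READ_ONCE(rdp->offload_state) != OFFLOAD_STABLE)    /* invented */
                    return -EBUSY;
            WRITE_ONCE(rdp->offload_state, OFFLOAD_INTERMEDIATE);   /* invented */
            call_rcu(&rdp->offload_rh, finish_offload_transition);  /* invented */
            return 0;
    }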

> > > Also the point of turning the lock helpers into primitives is to make
> > > it clearer as to where we really need unconditional locking and where
> > > we allow it to be conditional. A gift to reviewers :-)
> > 
> > Unless and until someone does a copy-pasta, thus unconditionally
> > doing the wrong thing.  ;-)
> 
> Yeah but the cost is more complicated code and synchronization to
> maintain the use of those all-in-one locking APIs everywhere. And
> current state is already not that simple :-)
> 
> Perhaps we should rename rcu_nocb_lock() into rcu_nocb_lock_cond() to
> prevent from accidents?

We do need something to avoid accidents.  ;-)

> Also I've been thinking that rcu_nocb_lock() should meet any of these
> requirements:
> 
> * hotplug is locked
> * rcu barrier is locked
> * rnp is locked
> 
> Because checking the offloaded state (when nocb isn't locked yet) of
> an rdp without any of the above locks held is racy. And that should
> be easy to check and prevent from copy-pasta accidents.
> 
> What do you think?

An RCU read-side critical section might be simpler.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-09 18:02             ` Paul E. McKenney
@ 2020-06-10 13:12               ` Frederic Weisbecker
  2020-06-10 14:02                 ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-06-10 13:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Tue, Jun 09, 2020 at 11:02:27AM -0700, Paul E. McKenney wrote:
> > > > And anyway we still want to unconditionally lock on many places,
> > > > regardless of the offloaded state. I don't know how we could have
> > > > a magic helper do the unconditional lock on some places and the
> > > > conditional on others.
> > > 
> > > I was assuming (perhaps incorrectly) that an intermediate phase between
> > > not-offloaded and offloaded would take care of all of those cases.
> > 
> > Perhaps partly but I fear that won't be enough.
> 
> One approach is to rely on RCU read-side critical sections surrounding
> the lock acquisition and to stay in the intermediate phase until a grace
> period completes, preferably call_rcu() instead of synchronize_rcu().
> 
> This of course means refusing to do a transition if the CPU is still
> in the intermediate state from a prior transition.

That sounds good. But using synchronize_rcu() would be far easier. We
need to keep the hotplug and rcu barrier locked during the transition.

> > Also I've been thinking that rcu_nocb_lock() should meet any of these
> > requirements:
> > 
> > * hotplug is locked
> > * rcu barrier is locked
> > * rnp is locked
> > 
> > Because checking the offloaded state (when nocb isn't locked yet) of
> > an rdp without any of the above locks held is racy. And that should
> > be easy to check and prevent from copy-pasta accidents.
> > 
> > What do you think?
> 
> An RCU read-side critical section might be simpler.

Ok I think I can manage that.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-10 13:12               ` Frederic Weisbecker
@ 2020-06-10 14:02                 ` Paul E. McKenney
  2020-06-10 22:12                   ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-06-10 14:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, Jun 10, 2020 at 03:12:39PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 09, 2020 at 11:02:27AM -0700, Paul E. McKenney wrote:
> > > > > And anyway we still want to unconditionally lock on many places,
> > > > > regardless of the offloaded state. I don't know how we could have
> > > > > a magic helper do the unconditional lock on some places and the
> > > > > conditional on others.
> > > > 
> > > > I was assuming (perhaps incorrectly) that an intermediate phase between
> > > > not-offloaded and offloaded would take care of all of those cases.
> > > 
> > > Perhaps partly but I fear that won't be enough.
> > 
> > One approach is to rely on RCU read-side critical sections surrounding
> > the lock acquisition and to stay in the intermediate phase until a grace
> > period completes, preferably call_rcu() instead of synchronize_rcu().
> > 
> > This of course means refusing to do a transition if the CPU is still
> > in the intermediate state from a prior transition.
> 
> That sounds good. But using synchronize_rcu() would be far easier. We
> need to keep the hotplug and rcu barrier locked during the transition.
> 
> > > Also I've been thinking that rcu_nocb_lock() should meet any of these
> > > requirements:
> > > 
> > > * hotplug is locked
> > > * rcu barrier is locked
> > > * rnp is locked
> > > 
> > > Because checking the offloaded state (when nocb isn't locked yet) of
> > > an rdp without any of the above locks held is racy. And that should
> > > be easy to check and prevent from copy-pasta accidents.
> > > 
> > > What do you think?
> > 
> > An RCU read-side critical section might be simpler.
> 
> Ok I think I can manage that.

And just to argue against myself...

Another approach is to maintain explicit multiple states for each
->cblist, perhaps something like this:

1.	In softirq.  Transition code advances to next.
2.	To no-CB 1.  Either GP or CB kthread for the transitioning
	CPU advances to next.  Note that the fact that the
	transition code runs on the transitioning CPU means that
	the RCU softirq handler doesn't need to be involved.
3.	To no-CB 2.  Either GP or CB kthread for the transitioning
	CPU advances to next.
4.	To no-CB 3.  Transitioning code advances to next.
	At this point, the no-CBs setup is fully functional.
5.	No-CB.  Transitioning code advances to next.
	Again, the fact that the transitioning code is running
	on the transitioning CPU with interrupts disabled means
	that the RCU softirq handler need not be explicitly
	involved.
6.	To softirq 1.  The RCU softirq handler for the transitioning
	CPU advances to next.
	At this point, the RCU softirq handler is processing callbacks.
7.	To softirq 2.  Either GP or CB kthread for the transitioning
	CPU advances to next.
	At this point, the softirq handler is processing callbacks.
8.	To softirq 3.  Either GP or CB kthread for the transitioning
	CPU advances to next.
	At this point, the no-CBs setup is fully shut down.
9.	To softirq 4.  Transitioning code advances to next,
	which is the first, "In softirq".
	(This one -might- be unnecessary, but...)

All transitions are of course with the ->nocb_lock held.
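
In strawman-enum form, one value per state above (names purely
illustrative):

    enum rcu_cblist_offload_state {
            CBLIST_IN_SOFTIRQ,      /* 1. softirq handles CBs; transition code advances. */
            CBLIST_TO_NOCB_1,       /* 2. GP or CB kthread advances. */
            CBLIST_TO_NOCB_2,       /* 3. the other kthread advances. */
            CBLIST_TO_NOCB_3,       /* 4. no-CBs setup functional; transition code advances. */
            CBLIST_NOCB,            /* 5. fully offloaded; transition code advances. */
            CBLIST_TO_SOFTIRQ_1,    /* 6. RCU softirq handler advances. */
            CBLIST_TO_SOFTIRQ_2,    /* 7. GP or CB kthread advances. */
            CBLIST_TO_SOFTIRQ_3,    /* 8. the other kthread advances. */
            CBLIST_TO_SOFTIRQ_4,    /* 9. transition code advances back to 1. */
    };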

When there is only one CPU during early boot near rcu_init() time,
the transition from "In softirq" to "No-CB" can remain instantaneous.

This has the advantage of not slowing things down just because there
is an RCU callback flood in progress.  It also uses an explicit
protocol that should (give or take bugs) maintain full safety both
in protection of ->cblist and in dealing with RCU callback floods.

Thoughts?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-10 14:02                 ` Paul E. McKenney
@ 2020-06-10 22:12                   ` Frederic Weisbecker
  2020-06-10 23:21                     ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2020-06-10 22:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, Jun 10, 2020 at 07:02:10AM -0700, Paul E. McKenney wrote:
> And just to argue against myself...
> 
> Another approach is to maintain explicit multiple states for each
> ->cblist, perhaps something like this:
> 
> 1.	In softirq.  Transition code advances to next.
> 2.	To no-CB 1.  Either GP or CB kthread for the transitioning
> 	CPU advances to next.  Note that the fact that the
> 	transition code runs on the transitioning CPU means that
> 	the RCU softirq handler doesn't need to be involved.
> 3.	To no-CB 2.  Either GP or CB kthread for the transitioning
> 	CPU advances to next.

Just to clarify, if GP has set NO_CB2 in (2), we want CB to set NO_CB3
in 3), right? OTOH if CB has set NO_CB2 in (2), we want GP to set NO_CB3
in (3), right?

The point being to make sure that both threads acknowledge the transition?

> 4.	To no-CB 3.  Transitioning code advances to next.
> 	At this point, the no-CBs setup is fully functional.

And softirq can stop processing callbacks from that point on.

> 5.	No-CB.  Transitioning code advances to next.
> 	Again, the fact that the transitioning code is running
> 	on the transitioning CPU with interrupts disabled means
> 	that the RCU softirq handler need not be explicitly
> 	involved.
> 6.	To softirq 1.  The RCU softirq handler for the transitioning
> 	CPU advances to next.
> 	At this point, the RCU softirq handler is processing callbacks.
> 7.	To softirq 2.  Either GP or CB kthread for the transitioning
> 	CPU advances to next.
> 	At this point, the softirq handler is processing callbacks.

SOFTIRQ2 should be part of what happens in SOFTIRQ1. The transitioning
CPU sets SOFTIRQ1, which is immediately visible by local softirqs,
and wakes up CB/GP. CB/GP sets SOFTIRQ2, CB/GP sets SOFTIRQ3 and
we go back to transitioning code that sets IN_SOFTIRQ.

Or did I miss something?


> 8.	To softirq 3.  Either GP or CB kthread for the transitioning
> 	CPU advances to next.
> 	At this point, the no-CBs setup is fully shut down.
> 9.	To softirq 4.  Transitioning code advances to next,
> 	which is the first, "In softirq".
> 	(This one -might- be unnecessary, but...)
> 
> All transitions are of course with the ->nocb_lock held.
> 
> When there is only one CPU during early boot near rcu_init() time,
> the transition from "In softirq" to "No-CB" can remain instantaneous.
> 
> This has the advantage of not slowing things down just because there
> is an RCU callback flood in progress.  It also uses an explicit
> protocol that should (give or take bugs) maintain full safety both
> in protection of ->cblist and in dealing with RCU callback floods.
> 
> Thoughts?

Agreed. And I really like that it details the whole process in a very
explicit way.

Thanks!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-10 22:12                   ` Frederic Weisbecker
@ 2020-06-10 23:21                     ` Paul E. McKenney
  2020-06-11  1:32                       ` Joel Fernandes
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2020-06-10 23:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Thu, Jun 11, 2020 at 12:12:46AM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 10, 2020 at 07:02:10AM -0700, Paul E. McKenney wrote:
> > And just to argue against myself...
> > 
> > Another approach is to maintain explicit multiple states for each
> > ->cblist, perhaps something like this:
> > 
> > 1.	In softirq.  Transition code advances to next.
> > 2.	To no-CB 1.  Either GP or CB kthread for the transitioning
> > 	CPU advances to next.  Note that the fact that the
> > 	transition code runs on the transitioning CPU means that
> > 	the RCU softirq handler doesn't need to be involved.
> > 3.	To no-CB 2.  Either GP or CB kthread for the transitioning
> > 	CPU advances to next.
> 
> Just to clarify, if GP has set NO_CB2 in (2), we want CB to set NO_CB3
> in 3), right? OTOH if CB has set NO_CB2 in (2), we want GP to set NO_CB3
> in (3), right?
> 
> The point being to make sure that both threads acknowledge the transition?

Exactly!

> > 4.	To no-CB 3.  Transitioning code advances to next.
> > 	At this point, the no-CBs setup is fully functional.
> 
> And softirq can stop processing callbacks from that point on.

You got it!

> > 5.	No-CB.  Transitioning code advances to next.
> > 	Again, the fact that the transitioning code is running
> > 	on the transitioning CPU with interrupts disabled means
> > 	that the RCU softirq handler need not be explicitly
> > 	involved.
> > 6.	To softirq 1.  The RCU softirq handler for the transitioning
> > 	CPU advances to next.
> > 	At this point, the RCU softirq handler is processing callbacks.
> > 7.	To softirq 2.  Either GP or CB kthread for the transitioning
> > 	CPU advances to next.
> > 	At this point, the softirq handler is processing callbacks.
> 
> SOFTIRQ2 should be part of what happens in SOFTIRQ1. The transitioning
> CPU sets SOFTIRQ1, which is immediately visible by local softirqs,
> and wakes up CB/GP. CB/GP sets SOFTIRQ2, CB/GP sets SOFTIRQ3 and
> we go back to transitioning code that sets IN_SOFTIRQ.
> 
> Or did I miss something?

I was perhaps being overly paranoid.  You might well be right.

> > 8.	To softirq 3.  Either GP or CB kthread for the transitioning
> > 	CPU advances to next.
> > 	At this point, the no-CBs setup is fully shut down.
> > 9.	To softirq 4.  Transitioning code advances to next,
> > 	which is the first, "In softirq".
> > 	(This one -might- be unnecessary, but...)
> > 
> > All transitions are of course with the ->nocb_lock held.
> > 
> > When there is only one CPU during early boot near rcu_init() time,
> > the transition from "In softirq" to "No-CB" can remain instantaneous.
> > 
> > This has the advantage of not slowing things down just because there
> > is an RCU callback flood in progress.  It also uses an explicit
> > protocol that should (give or take bugs) maintain full safety both
> > in protection of ->cblist and in dealing with RCU callback floods.
> > 
> > Thoughts?
> 
> Agreed. And I really like that it details the whole process in a very
> explicit way.
> 
> Thanks!

Glad you like it!  And of course please adjust it as needed, up to and
including doing something completely different that works better.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-06-04 13:10       ` Frederic Weisbecker
@ 2020-06-11  1:32         ` Joel Fernandes
  2020-06-11 17:03           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Joel Fernandes @ 2020-06-11  1:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Thu, Jun 04, 2020 at 03:10:30PM +0200, Frederic Weisbecker wrote:
> On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> > On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> >  
> > > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > > held to avoid races between local callbacks handling and kthread
> > > > offloaded callbacks handling.
> > > > nocb_cb kthread is first parked to avoid any future race with
> > > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > > so that the nocb_gp kthread ignores this rdp.
> > > 
> > > nit: you mean cblist is set to non-offloaded mode right?
> > > 
> > > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > > there's some issue if the cblist has both offloaded and non-offloaded
> > > callbacks, but it would be good to clarify this here better IMHO.
> > 
> > And for archival purposes: rcu_barrier needs excluding here because it is
> > possible that for a brief period of time, the callback kthread has been
> > parked to do the mode-switch, and it could be executing a bunch of callbacks
> > when it was asked to park.
> > 
> > Meanwhile, more interrupts happen and more callbacks are queued which are now
> > executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> > needs.
> 
> I think in that case the callbacks would still be executed in order. We wait
> for the kthread to park before switching to softirq callback execution.

Ah ok, you are parking the CB kthread after the no-cb CB's are already
invoked (that's when parkme() is called -- i.e. after rcu_do_batch() in the
CB kthread runs).

Yeah, I don't see the purpose of acquiring rcu_barrier mutex either now. Once
you park, all CBs should have been invoked by the nocb CB thread right?
kthread_park() waits for the thread to be parked before proceeding. And you
don't de-offload before it is parked.

> Initially it was to avoid callback ordering issues but I don't recall
> exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
> for the next version where, for a brief period of time, nocb kthread will
> be able to compete with callback execution in softirq.

Which nocb kthread is competing? Do you mean GP or CB?

Either way, could you clarify how softirqs compete? Until the thread is
parked, you wouldn't de-offload. And once you de-offload, only then the
softirq would be executing callbacks. So at any point of time, it is
either the CB kthread executing CBs or the softirq executing CBs, not both.
Or did I miss something?

thanks,

 - Joel


> I'll clarify that in the changelog.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
  2020-06-10 23:21                     ` Paul E. McKenney
@ 2020-06-11  1:32                       ` Joel Fernandes
  0 siblings, 0 replies; 57+ messages in thread
From: Joel Fernandes @ 2020-06-11  1:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, Jun 10, 2020 at 04:21:42PM -0700, Paul E. McKenney wrote:
[...]
> > > 8.	To softirq 3.  Either GP or CB kthread for the transitioning
> > > 	CPU advances to next.
> > > 	At this point, the no-CBs setup is fully shut down.
> > > 9.	To softirq 4.  Transitioning code advances to next,
> > > 	which is the first, "In softirq".
> > > 	(This one -might- be unnecessary, but...)
> > > 
> > > All transitions are of course with the ->nocb_lock held.
> > > 
> > > When there is only one CPU during early boot near rcu_init() time,
> > > the transition from "In softirq" to "No-CB" can remain instantaneous.
> > > 
> > > This has the advantage of not slowing things down just because there
> > > is an RCU callback flood in progress.  It also uses an explicit
> > > protocol that should (give or take bugs) maintain full safety both
> > > in protection of ->cblist and in dealing with RCU callback floods.
> > > 
> > > Thoughts?
> > 
> > Agreed. And I really like that it details the whole process in a very
> > explicit way.
> > 
> > Thanks!
> 
> Glad you like it!  And of course please adjust it as needed, up to and
> including doing something completely different that works better.  ;-)

Makes sense to me too, thanks!

 - Joel


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/10] rcu: Make nocb_cb kthread parkable
  2020-05-13 16:47 ` [PATCH 06/10] rcu: Make nocb_cb kthread parkable Frederic Weisbecker
@ 2020-06-11  1:34   ` Joel Fernandes
  0 siblings, 0 replies; 57+ messages in thread
From: Joel Fernandes @ 2020-06-11  1:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, May 13, 2020 at 06:47:10PM +0200, Frederic Weisbecker wrote:
> This will be necessary to correctly implement rdp de-offloading. We
> don't want rcu_do_batch() in nocb_cb kthread to race with local
> rcu_do_batch().
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> ---
>  kernel/rcu/tree_plugin.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1dd3fdd675a1..43ecc047af26 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2104,7 +2104,9 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  	if (needwake_gp)
>  		rcu_gp_kthread_wake();
>  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> -				 !READ_ONCE(rdp->nocb_cb_sleep));
> +				    !READ_ONCE(rdp->nocb_cb_sleep) ||
> +				    kthread_should_park());
> +
>  	if (!smp_load_acquire(&rdp->nocb_cb_sleep)) { /* VVV */
>  		/* ^^^ Ensure CB invocation follows _sleep test. */
>  		return;
> @@ -2125,6 +2127,8 @@ static int rcu_nocb_cb_kthread(void *arg)
>  	// if there are no more ready callbacks, waits for them.
>  	for (;;) {
>  		nocb_cb_wait(rdp);
> +		if (kthread_should_park())
> +			kthread_parkme();
>  		cond_resched_tasks_rcu_qs();
>  	}
>  	return 0;
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
  2020-06-11  1:32         ` Joel Fernandes
@ 2020-06-11 17:03           ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2020-06-11 17:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Josh Triplett

On Wed, Jun 10, 2020 at 09:32:03PM -0400, Joel Fernandes wrote:
> On Thu, Jun 04, 2020 at 03:10:30PM +0200, Frederic Weisbecker wrote:
> > On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> > > On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> > >  
> > > > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > > > held to avoid races between local callbacks handling and kthread
> > > > > offloaded callbacks handling.
> > > > > nocb_cb kthread is first parked to avoid any future race with
> > > > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > > > so that the nocb_gp kthread ignores this rdp.
> > > > 
> > > > nit: you mean cblist is set to non-offloaded mode right?
> > > > 
> > > > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > > > there's some issue if the cblist has both offloaded and non-offloaded
> > > > callbacks, but it would be good to clarify this here better IMHO.
> > > 
> > > And for archival purposes: rcu_barrier needs excluding here because it is
> > > possible that for a brief period of time, the callback kthread has been
> > > parked to do the mode-switch, and it could be executing a bunch of callbacks
> > > when it was asked to park.
> > > 
> > > Meanwhile, more interrupts happen and more callbacks are queued which are now
> > > executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> > > needs.
> > 
> > I think in that case the callbacks would still be executed in order. We wait
> > for the kthread to park before switching to softirq callback execution.
> 
> Ah ok, you are parking the CB kthread after the no-cb CB's are already
> invoked (that's when parkme() is called -- i.e. after rcu_do_batch() in the
> CB kthread runs).
> 
> Yeah, I don't see the purpose of acquiring rcu_barrier mutex either now. Once
> you park, all CBs should have been invoked by the nocb CB thread right?
> kthread_park() waits for the thread to be parked before proceeding. And you
> don't de-offload before it is parked.

We absolutely must execute callbacks out of order in order to avoid
OOM due to RCU callback floods.  This is because if we don't execute
callbacks out of order, there will be a time when we are not executing
callbacks at all.  If execution gets preempted at this point, it is
quite possibly game over due to OOM.

							Thanx, Paul

> > Initially it was to avoid callback ordering issues but I don't recall
> > exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
> > for the next version where, for a brief period of time, nocb kthread will
> > be able to compete with callback execution in softirq.
> 
> Which nocb kthread is competing? Do you mean GP or CB?
> 
> Either way, could you clarify how softirqs compete? Until the thread is
> parked, you wouldn't de-offload. And once you de-offload, only then the
> softirq would be executing callbacks. So at any point of time, it is
> either the CB kthread executing CBs or the softirq executing CBs, not both.
> Or did I miss something?
> 
> thanks,
> 
>  - Joel
> 
> 
> > I'll clarify that in the changelog.
> > 
> > Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2020-06-11 17:03 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints Frederic Weisbecker
2020-05-20 12:29   ` Joel Fernandes
2020-05-22 17:57     ` Paul E. McKenney
2020-05-26 15:21       ` Joel Fernandes
2020-05-26 16:29         ` Paul E. McKenney
2020-05-26 20:18           ` Joel Fernandes
2020-05-26 21:09             ` Paul E. McKenney
2020-05-26 21:27               ` Joel Fernandes
2020-05-26 22:29                 ` Paul E. McKenney
2020-05-27  0:45                   ` Joel Fernandes
2020-05-27  0:58                     ` Paul E. McKenney
2020-06-04 11:41       ` Frederic Weisbecker
2020-06-04 16:36         ` Paul E. McKenney
2020-06-08 12:57           ` Frederic Weisbecker
2020-06-09 18:02             ` Paul E. McKenney
2020-06-10 13:12               ` Frederic Weisbecker
2020-06-10 14:02                 ` Paul E. McKenney
2020-06-10 22:12                   ` Frederic Weisbecker
2020-06-10 23:21                     ` Paul E. McKenney
2020-06-11  1:32                       ` Joel Fernandes
2020-05-13 16:47 ` [PATCH 02/10] rcu: Use direct rdp->nocb_lock operations on local calls Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common() Frederic Weisbecker
2020-05-26 19:54   ` Joel Fernandes
2020-05-26 19:59   ` Joel Fernandes
2020-05-13 16:47 ` [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
2020-05-13 18:20   ` Paul E. McKenney
2020-05-13 23:03     ` Frederic Weisbecker
2020-05-14 15:47       ` Paul E. McKenney
2020-05-13 16:47 ` [PATCH 05/10] rcu: Remove useless conditional nocb unlock Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 06/10] rcu: Make nocb_cb kthread parkable Frederic Weisbecker
2020-06-11  1:34   ` Joel Fernandes
2020-05-13 16:47 ` [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB Frederic Weisbecker
2020-05-13 18:25   ` Paul E. McKenney
2020-05-13 23:08     ` Frederic Weisbecker
2020-05-14 15:50       ` Paul E. McKenney
2020-05-14 22:49         ` Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU Frederic Weisbecker
2020-05-13 18:38   ` Paul E. McKenney
2020-05-13 22:45     ` Frederic Weisbecker
2020-05-14 15:47       ` Paul E. McKenney
2020-05-14 22:30         ` Frederic Weisbecker
2020-05-14 22:47           ` Paul E. McKenney
2020-05-14 22:55             ` Frederic Weisbecker
2020-05-26 21:20   ` Joel Fernandes
2020-05-26 22:49     ` Joel Fernandes
2020-06-04 13:10       ` Frederic Weisbecker
2020-06-11  1:32         ` Joel Fernandes
2020-06-11 17:03           ` Paul E. McKenney
2020-06-04 13:14     ` Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb Frederic Weisbecker
2020-05-13 18:41   ` Paul E. McKenney
2020-05-13 16:47 ` [PATCH 10/10] rcu: Nocb (de)activate through sysfs Frederic Weisbecker
2020-05-13 18:42   ` Paul E. McKenney
2020-05-13 23:23     ` Frederic Weisbecker
2020-05-14 15:51       ` Paul E. McKenney
2020-05-13 18:15 ` [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Paul E. McKenney
