* [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3
@ 2020-10-23 14:46 Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 01/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
                   ` (15 more replies)
  0 siblings, 16 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Hi,

This is the third attempt to make rcu-nocb mutable at runtime. See
v2 here: https://lwn.net/Articles/832031/

Many fixes are included, along with rcutorture support:

03: Whitespace fix
04: Add "rcu/nocb: Always init segcblist on CPU up" (Thanks rcutorture!)
05: Add return value for failure on (de)offloading
06: Add "rcu/nocb: Don't deoffload an offline CPU with pending work" (Thanks Paul!)
12: Add "rcu/nocb: Only cond_resched() from actual offloaded batch processing" (Thanks rcutorture!)
14: Write a real and more self-confident changelog
15/16: Add rcutorture support (thanks Paul!)

My last remaining worry, as explained in 13/16, is that __call_rcu_core()
isn't called during the transition states. I still need to work out how
to handle that.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/nocb-toggle-v2

HEAD: 0247cd09fd48a38741e5cc6634810985b405baca

Thanks,
	Frederic
---

Frederic Weisbecker (15):
      rcu: Implement rcu_segcblist_is_offloaded() config dependent
      rcu: Turn enabled/offload states into a common flag
      rcu: Provide basic callback offloading state machine bits
      rcu/nocb: Always init segcblist on CPU up
      rcu: De-offloading CB kthread
      rcu/nocb: Don't deoffload an offline CPU with pending work
      rcu: De-offloading GP kthread
      rcu: Re-offload support
      rcu: Shutdown nocb timer on de-offloading
      rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY
      rcu: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading
      rcu/nocb: Only cond_resched() from actual offloaded batch processing
      rcu: Process batch locally as long as offloading isn't complete
      rcu: Locally accelerate callbacks as long as offloading isn't complete
      tools/rcutorture: Support nocb toggle in TREE01

Paul E. McKenney (1):
      rcutorture: Test runtime toggling of CPUs' callback offloading


 Documentation/admin-guide/kernel-parameters.txt    |   8 +
 include/linux/rcu_segcblist.h                      | 119 +++++++-
 include/linux/rcupdate.h                           |   4 +
 kernel/rcu/rcu_segcblist.c                         |  13 +-
 kernel/rcu/rcu_segcblist.h                         |  45 ++-
 kernel/rcu/rcutorture.c                            |  86 +++++-
 kernel/rcu/tree.c                                  |  47 +--
 kernel/rcu/tree.h                                  |   2 +
 kernel/rcu/tree_plugin.h                           | 330 +++++++++++++++++++--
 .../selftests/rcutorture/configs/rcu/TREE01.boot   |   4 +-
 10 files changed, 603 insertions(+), 55 deletions(-)

* [PATCH 01/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 02/16] rcu: Turn enabled/offload states into a common flag Frederic Weisbecker
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

This simplifies the use of this API and avoids checking the kernel
config from the callers.
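
For illustration, the net effect at a call site looks like this (a
sketch distilled from the hunks below, not an additional change):

    /* Before: every caller had to guard the check itself. */
    const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
                           rcu_segcblist_is_offloaded(&rdp->cblist);

    /* After: the helper performs the config check internally. */
    const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);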

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/rcu_segcblist.h |  2 +-
 kernel/rcu/tree.c          | 21 +++++++--------------
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 5c293afc07b8..492262bcb591 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -62,7 +62,7 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
 /* Is the specified rcu_segcblist offloaded?  */
 static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 {
-	return rsclp->offloaded;
+	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 06895ef85d69..dc1e578644df 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1603,8 +1603,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	bool ret = false;
 	bool need_qs;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 
 	raw_lockdep_assert_held_rcu_node(rnp);
 
@@ -2048,8 +2047,7 @@ static void rcu_gp_cleanup(void)
 		needgp = true;
 	}
 	/* Advance CBs to reduce false positives below. */
-	offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-		    rcu_segcblist_is_offloaded(&rdp->cblist);
+	offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
 		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
 		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
@@ -2248,8 +2246,7 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	unsigned long flags;
 	unsigned long mask;
 	bool needwake = false;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
@@ -2417,8 +2414,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 {
 	int div;
 	unsigned long flags;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count;
@@ -2675,8 +2671,7 @@ static __latent_entropy void rcu_core(void)
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
-	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-			       rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 
 	if (cpu_is_offline(smp_processor_id()))
 		return;
@@ -2978,8 +2973,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 				   rcu_segcblist_n_cbs(&rdp->cblist));
 
 	/* Go handle any RCU core processing required. */
-	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
+	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
 		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
 	} else {
 		__call_rcu_core(rdp, head, flags);
@@ -3712,8 +3706,7 @@ static int rcu_pending(int user)
 
 	/* Has RCU gone idle with this CPU needing another grace period? */
 	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
-	    (!IS_ENABLED(CONFIG_RCU_NOCB_CPU) ||
-	     !rcu_segcblist_is_offloaded(&rdp->cblist)) &&
+	    !rcu_segcblist_is_offloaded(&rdp->cblist) &&
 	    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 		return 1;
 
-- 
2.25.1


* [PATCH 02/16] rcu: Turn enabled/offload states into a common flag
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 01/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 03/16] rcu: Provide basic callback offloading state machine bits Frederic Weisbecker
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Gather the segcblist properties into a common flags field to avoid
spreading booleans throughout the structure. This also prepares for the
offloaded state to become mutable at runtime.
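
As a quick sketch of the resulting accessors (helper names as introduced
below), direct field accesses turn into flag operations:

    /* Formerly: rsclp->enabled = 1; */
    rcu_segcblist_set_flags(rsclp, SEGCBLIST_ENABLED);

    /* Formerly: rsclp->offloaded = 1; */
    rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);

    /* Formerly: return rsclp->offloaded; */
    return rcu_segcblist_test_flags(rsclp, SEGCBLIST_OFFLOADED);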

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 include/linux/rcu_segcblist.h |  6 ++++--
 kernel/rcu/rcu_segcblist.c    |  6 +++---
 kernel/rcu/rcu_segcblist.h    | 23 +++++++++++++++++++++--
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index b36afe7b22c9..dca2f39ee67f 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -63,6 +63,9 @@ struct rcu_cblist {
 #define RCU_NEXT_TAIL		3
 #define RCU_CBLIST_NSEGS	4
 
+#define SEGCBLIST_ENABLED	BIT(0)
+#define SEGCBLIST_OFFLOADED	BIT(1)
+
 struct rcu_segcblist {
 	struct rcu_head *head;
 	struct rcu_head **tails[RCU_CBLIST_NSEGS];
@@ -72,8 +75,7 @@ struct rcu_segcblist {
 #else
 	long len;
 #endif
-	u8 enabled;
-	u8 offloaded;
+	u8 flags;
 };
 
 #define RCU_SEGCBLIST_INITIALIZER(n) \
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 2d2a6b6b9dfb..e6522fa54311 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -152,7 +152,7 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
 	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
 		rsclp->tails[i] = &rsclp->head;
 	rcu_segcblist_set_len(rsclp, 0);
-	rsclp->enabled = 1;
+	rcu_segcblist_set_flags(rsclp, SEGCBLIST_ENABLED);
 }
 
 /*
@@ -163,7 +163,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
 {
 	WARN_ON_ONCE(!rcu_segcblist_empty(rsclp));
 	WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp));
-	rsclp->enabled = 0;
+	rcu_segcblist_clear_flags(rsclp, SEGCBLIST_ENABLED);
 }
 
 /*
@@ -172,7 +172,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
  */
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
 {
-	rsclp->offloaded = 1;
+	rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
 }
 
 /*
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 492262bcb591..fc98761e3ee9 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -50,19 +50,38 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
 #endif
 }
 
+static inline void rcu_segcblist_set_flags(struct rcu_segcblist *rsclp,
+					   int flags)
+{
+	rsclp->flags |= flags;
+}
+
+static inline void rcu_segcblist_clear_flags(struct rcu_segcblist *rsclp,
+					     int flags)
+{
+	rsclp->flags &= ~flags;
+}
+
+static inline bool rcu_segcblist_test_flags(struct rcu_segcblist *rsclp,
+					    int flags)
+{
+	return READ_ONCE(rsclp->flags) & flags;
+}
+
 /*
  * Is the specified rcu_segcblist enabled, for example, not corresponding
  * to an offline CPU?
  */
 static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
 {
-	return rsclp->enabled;
+	return rcu_segcblist_test_flags(rsclp, SEGCBLIST_ENABLED);
 }
 
 /* Is the specified rcu_segcblist offloaded?  */
 static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 {
-	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;
+	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
+		rcu_segcblist_test_flags(rsclp, SEGCBLIST_OFFLOADED);
 }
 
 /*
-- 
2.25.1


* [PATCH 03/16] rcu: Provide basic callback offloading state machine bits
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 01/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 02/16] rcu: Turn enabled/offload states into a common flag Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 04/16] rcu/nocb: Always init segcblist on CPU up Frederic Weisbecker
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

We'll need to be able to offload and de-offload the callback processing
of a given CPU at runtime. In order to support a smooth transition from
unlocked local processing (softirq/rcuc) to locked offloaded processing
(rcuop/rcuog) and the reverse, provide the necessary bits and
documentation for the state machine that carries out all the steps
needed to enforce correctness while callbacks keep being served all along.
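
As a rough summary of the diagrams below, the cblist flags of an rdp are
expected to progress as follows while offloading (an illustration of the
documentation in this patch, not code from it):

    SEGCBLIST_SOFTIRQ_ONLY                        /* fully de-offloaded  */
      v
    SEGCBLIST_OFFLOADED                           /* switch engaged      */
      v
    SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_CB    /* CB and GP kthreads  */
    SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP    /* ack in either order */
      v
    SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP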

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 include/linux/rcu_segcblist.h | 115 +++++++++++++++++++++++++++++++++-
 kernel/rcu/rcu_segcblist.c    |   1 +
 kernel/rcu/rcu_segcblist.h    |  12 +++-
 kernel/rcu/tree.c             |   3 +
 4 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index dca2f39ee67f..8a0d3a211e7c 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -63,8 +63,121 @@ struct rcu_cblist {
 #define RCU_NEXT_TAIL		3
 #define RCU_CBLIST_NSEGS	4
 
+
+/*
+ *                     ==NOCB Offloading state machine==
+ *
+ *
+ *  ----------------------------------------------------------------------------
+ *  |                         SEGCBLIST_SOFTIRQ_ONLY                           |
+ *  |                                                                          |
+ *  |  Callbacks processed by rcu_core() from softirqs or local                |
+ *  |  rcuc kthread, without holding nocb_lock.                                |
+ *  ----------------------------------------------------------------------------
+ *                                         |
+ *                                         v
+ *  ----------------------------------------------------------------------------
+ *  |                        SEGCBLIST_OFFLOADED                               |
+ *  |                                                                          |
+ *  | Callbacks processed by rcu_core() from softirqs or local                 |
+ *  | rcuc kthread, while holding nocb_lock. Waking up CB and GP kthreads,     |
+ *  | allowing nocb_timer to be armed.                                         |
+ *  ----------------------------------------------------------------------------
+ *                                         |
+ *                                         v
+ *                        -----------------------------------
+ *                        |                                 |
+ *                        v                                 v
+ *  ---------------------------------------  -----------------------------------
+ *  |        SEGCBLIST_OFFLOADED |        |  |     SEGCBLIST_OFFLOADED |       |
+ *  |        SEGCBLIST_KTHREAD_CB         |  |     SEGCBLIST_KTHREAD_GP        |
+ *  |                                     |  |                                 |
+ *  |                                     |  |                                 |
+ *  | CB kthread woke up and              |  | GP kthread woke up and          |
+ *  | acknowledged SEGCBLIST_OFFLOADED.   |  | acknowledged SEGCBLIST_OFFLOADED|
+ *  | Processes callbacks concurrently    |  |                                 |
+ *  | with rcu_core(), holding            |  |                                 |
+ *  | nocb_lock.                          |  |                                 |
+ *  ---------------------------------------  -----------------------------------
+ *                        |                                 |
+ *                        -----------------------------------
+ *                                         |
+ *                                         v
+ *  |--------------------------------------------------------------------------|
+ *  |                           SEGCBLIST_OFFLOADED |                          |
+ *  |                           SEGCBLIST_KTHREAD_CB |                         |
+ *  |                           SEGCBLIST_KTHREAD_GP                           |
+ *  |                                                                          |
+ *  |   Kthreads handle callbacks holding nocb_lock, local rcu_core() stops    |
+ *  |   handling callbacks.                                                    |
+ *  ----------------------------------------------------------------------------
+ */
+
+
+
+/*
+ *                       ==NOCB De-Offloading state machine==
+ *
+ *
+ *  |--------------------------------------------------------------------------|
+ *  |                           SEGCBLIST_OFFLOADED |                          |
+ *  |                           SEGCBLIST_KTHREAD_CB |                         |
+ *  |                           SEGCBLIST_KTHREAD_GP                           |
+ *  |                                                                          |
+ *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()    |
+ *  |   ignores callbacks.                                                     |
+ *  ----------------------------------------------------------------------------
+ *                                      |
+ *                                      v
+ *  |--------------------------------------------------------------------------|
+ *  |                           SEGCBLIST_KTHREAD_CB |                         |
+ *  |                           SEGCBLIST_KTHREAD_GP                           |
+ *  |                                                                          |
+ *  |   CB/GP kthreads and local rcu_core() handle callbacks concurrently      |
+ *  |   holding nocb_lock. Wake up CB and GP kthreads if necessary.            |
+ *  ----------------------------------------------------------------------------
+ *                                      |
+ *                                      v
+ *                     -----------------------------------
+ *                     |                                 |
+ *                     v                                 v
+ *  ----------------------------------------------------------------------------
+ *  |                                                                          |
+ *  |        SEGCBLIST_KTHREAD_CB         |       SEGCBLIST_KTHREAD_GP         |
+ *  |                                     |                                    |
+ *  | GP kthread woke up and              |   CB kthread woke up and           |
+ *  | acknowledged the fact that          |   acknowledged the fact that       |
+ *  | SEGCBLIST_OFFLOADED got cleared.    |   SEGCBLIST_OFFLOADED got cleared. |
+ *  |                                     |   The CB kthread goes to sleep     |
+ *  | The callbacks from the target CPU   |   until it ever gets re-offloaded. |
+ *  | will be ignored from the GP kthread |                                    |
+ *  | loop.                               |                                    |
+ *  ----------------------------------------------------------------------------
+ *                      |                                 |
+ *                      -----------------------------------
+ *                                      |
+ *                                      v
+ *  ----------------------------------------------------------------------------
+ *  |                                   0                                      |
+ *  |                                                                          |
+ *  | Callbacks processed by rcu_core() from softirqs or local                 |
+ *  | rcuc kthread, while holding nocb_lock. Forbid nocb_timer to be armed.    |
+ *  | Flush pending nocb_timer. Flush nocb bypass callbacks.                   |
+ *  ----------------------------------------------------------------------------
+ *                                      |
+ *                                      v
+ *  ----------------------------------------------------------------------------
+ *  |                         SEGCBLIST_SOFTIRQ_ONLY                           |
+ *  |                                                                          |
+ *  |  Callbacks processed by rcu_core() from softirqs or local                |
+ *  |  rcuc kthread, without holding nocb_lock.                                |
+ *  ----------------------------------------------------------------------------
+ */
 #define SEGCBLIST_ENABLED	BIT(0)
-#define SEGCBLIST_OFFLOADED	BIT(1)
+#define SEGCBLIST_SOFTIRQ_ONLY	BIT(1)
+#define SEGCBLIST_KTHREAD_CB	BIT(2)
+#define SEGCBLIST_KTHREAD_GP	BIT(3)
+#define SEGCBLIST_OFFLOADED	BIT(4)
 
 struct rcu_segcblist {
 	struct rcu_head *head;
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index e6522fa54311..a96511b7cc98 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -172,6 +172,7 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
  */
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
 {
+	rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
 	rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
 }
 
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index fc98761e3ee9..575896a2518b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -80,8 +80,16 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
 /* Is the specified rcu_segcblist offloaded?  */
 static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 {
-	return IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-		rcu_segcblist_test_flags(rsclp, SEGCBLIST_OFFLOADED);
+	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU)) {
+		/*
+		 * Complete de-offloading happens only when SEGCBLIST_SOFTIRQ_ONLY
+		 * is set.
+		 */
+		if (!rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY))
+			return true;
+	}
+
+	return false;
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dc1e578644df..3b7adc9cc068 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -83,6 +83,9 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
 	.dynticks_nesting = 1,
 	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
 	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
+#ifdef CONFIG_RCU_NOCB_CPU
+	.cblist.flags = SEGCBLIST_SOFTIRQ_ONLY,
+#endif
 };
 static struct rcu_state rcu_state = {
 	.level = { &rcu_state.node[0] },
-- 
2.25.1


* [PATCH 04/16] rcu/nocb: Always init segcblist on CPU up
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 03/16] rcu: Provide basic callback offloading state machine bits Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 05/16] rcu: De-offloading CB kthread Frederic Weisbecker
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

An rdp's segcblist enabled state is treated differently on CPU hotplug
operations, depending on whether it is offloaded or not.

1) Not offloaded: An rdp is disabled on CPU down. All its callbacks are
   migrated and no further callbacks are supposed to be enqueued until it
   gets re-enabled on CPU up.

2) Offloaded: An rdp is not disabled on CPU down in order to let the
   CB/GP kthreads finish their jobs on remaining callbacks. Hence it is
   not re-enabled on CPU up either.

Since an rdp's offloaded state is set in stone at boot, we expect the
offloaded state to remain the same between CPU down and CPU up. So 1)
and 2) are symmetrical.

Now the offloaded state will become toggleable at runtime. Hence the new
possible asymmetrical scenarios:

3) An rdp goes into CPU down while in a not-offloaded state. It gets
   later set to offloaded and finally goes into CPU up.

4) An rdp goes into CPU down while in an offloaded state. It gets
   later set to not-offloaded and finally goes into CPU up.

Scenario 4) is currently well handled. The rdp isn't disabled on
CPU down and it gets re-initialized on CPU up. We require the segcblist
to be empty in order to toggle to the non-offloaded state while a CPU is
offline.

Scenario 3) would run into trouble though, as the rdp is disabled on
CPU down and not re-initialized/re-enabled on CPU up.

In order to fix this, always re-initialize/re-enable an rdp on CPU up
unless it still has callbacks at that time, which anyway can only happen
when the rdp went down and up in offloaded state (case 2), the only
case that doesn't need re-initialization.

NOTE: The proper longer-term fix will be to wait for all the offloaded
callbacks to be processed before completing CPU-down operations, so that
re-initialization on CPU up can become unconditional.
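
Condensed sketch of the resulting CPU-up logic (distilled from the hunk
below, with the surrounding details elided):

    rcu_nocb_lock(rdp);
    if (rcu_segcblist_empty(&rdp->cblist))    /* covers cases 1), 3), 4) */
        rcu_segcblist_init(&rdp->cblist);     /* re-enable callbacks */
    rcu_nocb_unlock(rdp);
    /* Non-empty cblist: case 2), CB/GP kthreads still own the callbacks. */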

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3b7adc9cc068..6bad7018dc18 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3940,12 +3940,18 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rcu_state.n_force_qs;
 	rdp->blimit = blimit;
-	if (rcu_segcblist_empty(&rdp->cblist) && /* No early-boot CBs? */
-	    !rcu_segcblist_is_offloaded(&rdp->cblist))
-		rcu_segcblist_init(&rdp->cblist);  /* Re-enable callbacks. */
 	rdp->dynticks_nesting = 1;	/* CPU not up, no tearing. */
 	rcu_dynticks_eqs_online();
 	raw_spin_unlock_rcu_node(rnp);		/* irqs remain disabled. */
+	/*
+	 * Lock in case the CB/GP kthreads are still around handling
+	 * old callbacks (longer term we should flush all callbacks
+	 * before completing CPU offline)
+	 */
+	rcu_nocb_lock(rdp);
+	if (rcu_segcblist_empty(&rdp->cblist)) /* No early-boot CBs? */
+		rcu_segcblist_init(&rdp->cblist);  /* Re-enable callbacks. */
+	rcu_nocb_unlock(rdp);
 
 	/*
 	 * Add CPU to leaf rcu_node pending-online bitmask.  Any needed
-- 
2.25.1


* [PATCH 05/16] rcu: De-offloading CB kthread
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 04/16] rcu/nocb: Always init segcblist on CPU up Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-11-02 13:38   ` Boqun Feng
  2020-10-23 14:46 ` [PATCH 06/16] rcu/nocb: Don't deoffload an offline CPU with pending work Frederic Weisbecker
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

In order to de-offload the callback processing of an rdp, we must clear
SEGCBLIST_OFFLOADED and notify the CB kthread so that it clears its own
bit flag and goes to sleep to stop handling callbacks. The GP kthread
will also be notified the same way. Whoever acknowledges and clears its
own bit last must notify the de-offloading worker so that it can resume
the de-offloading while being sure that callbacks won't be handled
remotely anymore.
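
As a usage illustration, a hypothetical caller of the new interface
(only rcu_nocb_cpu_deoffload() comes from this patch; the caller context
is made up):

    int err = rcu_nocb_cpu_deoffload(cpu); /* back to softirq/rcuc processing */

    if (err)
        pr_warn("rcu/nocb: de-offloading CPU %d failed: %d\n", cpu, err);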

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 include/linux/rcupdate.h   |   2 +
 kernel/rcu/rcu_segcblist.c |  10 ++-
 kernel/rcu/rcu_segcblist.h |   2 +-
 kernel/rcu/tree.h          |   1 +
 kernel/rcu/tree_plugin.h   | 134 +++++++++++++++++++++++++++++++------
 5 files changed, 126 insertions(+), 23 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7c1ceff02852..bf8eb02411c2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -104,8 +104,10 @@ static inline void rcu_user_exit(void) { }
 
 #ifdef CONFIG_RCU_NOCB_CPU
 void rcu_init_nohz(void);
+int rcu_nocb_cpu_deoffload(int cpu);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 static inline void rcu_init_nohz(void) { }
+static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
 /**
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index a96511b7cc98..3f6b5b724b39 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -170,10 +170,14 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
  * Mark the specified rcu_segcblist structure as offloaded.  This
  * structure must be empty.
  */
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
+void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
 {
-	rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
-	rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
+	if (offload) {
+		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
+		rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
+	} else {
+		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
+	}
 }
 
 /*
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 575896a2518b..00ebeb8d39b7 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -105,7 +105,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
 void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
+void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
 bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
 struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e4f66b8f7c47..8047102be878 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -200,6 +200,7 @@ struct rcu_data {
 	/* 5) Callback offloading. */
 #ifdef CONFIG_RCU_NOCB_CPU
 	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
+	struct swait_queue_head nocb_state_wq; /* For offloading state changes */
 	struct task_struct *nocb_gp_kthread;
 	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
 	atomic_t nocb_lock_contended;	/* Contention experienced. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index fd8a52e9a887..09caf319a4a9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2081,16 +2081,29 @@ static int rcu_nocb_gp_kthread(void *arg)
 	return 0;
 }
 
+static inline bool nocb_cb_can_run(struct rcu_data *rdp)
+{
+	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_CB;
+	return rcu_segcblist_test_flags(&rdp->cblist, flags);
+}
+
+static inline bool nocb_cb_wait_cond(struct rcu_data *rdp)
+{
+	return nocb_cb_can_run(rdp) && !READ_ONCE(rdp->nocb_cb_sleep);
+}
+
 /*
  * Invoke any ready callbacks from the corresponding no-CBs CPU,
  * then, if there are no more, wait for more to appear.
  */
 static void nocb_cb_wait(struct rcu_data *rdp)
 {
-	unsigned long cur_gp_seq;
-	unsigned long flags;
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	struct rcu_node *rnp = rdp->mynode;
+	bool needwake_state = false;
 	bool needwake_gp = false;
-	struct rcu_node *rnp = rdp->mynode;
+	unsigned long cur_gp_seq;
+	unsigned long flags;
 
 	local_irq_save(flags);
 	rcu_momentary_dyntick_idle();
@@ -2100,32 +2113,50 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	local_bh_enable();
 	lockdep_assert_irqs_enabled();
 	rcu_nocb_lock_irqsave(rdp, flags);
-	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
 		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
 		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
 	}
-	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
-		if (needwake_gp)
-			rcu_gp_kthread_wake();
-		return;
-	}
 
-	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
 	WRITE_ONCE(rdp->nocb_cb_sleep, true);
+
+	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
+		if (rcu_segcblist_ready_cbs(cblist))
+			WRITE_ONCE(rdp->nocb_cb_sleep, false);
+	} else {
+		/*
+		 * De-offloading. Clear our flag and notify the de-offload worker.
+		 * We won't touch the callbacks and keep sleeping until we ever
+		 * get re-offloaded.
+		 */
+		WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
+		rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_CB);
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP))
+			needwake_state = true;
+	}
+
+	if (rdp->nocb_cb_sleep)
+		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
+
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 	if (needwake_gp)
 		rcu_gp_kthread_wake();
-	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
-				 !READ_ONCE(rdp->nocb_cb_sleep));
-	if (!smp_load_acquire(&rdp->nocb_cb_sleep)) { /* VVV */
+
+	if (needwake_state)
+		swake_up_one(&rdp->nocb_state_wq);
+
+	do {
+		swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
+						    nocb_cb_wait_cond(rdp));
+
 		/* ^^^ Ensure CB invocation follows _sleep test. */
-		return;
-	}
-	WARN_ON(signal_pending(current));
-	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+		if (smp_load_acquire(&rdp->nocb_cb_sleep)) {
+			WARN_ON(signal_pending(current));
+			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+		}
+	} while (!nocb_cb_can_run(rdp));
 }
 
 /*
@@ -2187,6 +2218,69 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
 		do_nocb_deferred_wakeup_common(rdp);
 }
 
+static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	struct rcu_node *rnp = rdp->mynode;
+	bool wake_cb = false;
+	unsigned long flags;
+
+	printk("De-offloading %d\n", rdp->cpu);
+
+	rcu_nocb_lock_irqsave(rdp, flags);
+	raw_spin_lock_rcu_node(rnp);
+	rcu_segcblist_offload(cblist, false);
+	raw_spin_unlock_rcu_node(rnp);
+
+	if (rdp->nocb_cb_sleep) {
+		rdp->nocb_cb_sleep = false;
+		wake_cb = true;
+	}
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+
+	if (wake_cb)
+		swake_up_one(&rdp->nocb_cb_wq);
+
+	swait_event_exclusive(rdp->nocb_state_wq,
+			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
+
+	return 0;
+}
+
+static long rcu_nocb_rdp_deoffload(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	return __rcu_nocb_rdp_deoffload(rdp);
+}
+
+int rcu_nocb_cpu_deoffload(int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	int ret = 0;
+
+	if (rdp == rdp->nocb_gp_rdp) {
+		pr_info("Can't deoffload an rdp GP leader (yet)\n");
+		return -EINVAL;
+	}
+	mutex_lock(&rcu_state.barrier_mutex);
+	cpus_read_lock();
+	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
+		if (cpu_online(cpu)) {
+			ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
+		} else {
+			ret = __rcu_nocb_rdp_deoffload(rdp);
+		}
+		if (!ret)
+			cpumask_clear_cpu(cpu, rcu_nocb_mask);
+	}
+	cpus_read_unlock();
+	mutex_unlock(&rcu_state.barrier_mutex);
+
+	return ret;
+}
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -2229,7 +2323,8 @@ void __init rcu_init_nohz(void)
 		rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (rcu_segcblist_empty(&rdp->cblist))
 			rcu_segcblist_init(&rdp->cblist);
-		rcu_segcblist_offload(&rdp->cblist);
+		rcu_segcblist_offload(&rdp->cblist, true);
+		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
 	}
 	rcu_organize_nocb_kthreads();
 }
@@ -2239,6 +2334,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 {
 	init_swait_queue_head(&rdp->nocb_cb_wq);
 	init_swait_queue_head(&rdp->nocb_gp_wq);
+	init_swait_queue_head(&rdp->nocb_state_wq);
 	raw_spin_lock_init(&rdp->nocb_lock);
 	raw_spin_lock_init(&rdp->nocb_bypass_lock);
 	raw_spin_lock_init(&rdp->nocb_gp_lock);
-- 
2.25.1


* [PATCH 06/16] rcu/nocb: Don't deoffload an offline CPU with pending work
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 05/16] rcu: De-offloading CB kthread Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 07/16] rcu: De-offloading GP kthread Frederic Weisbecker
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Offloaded CPUs going offline don't migrate their callbacks the way
non-offloaded CPUs do. It's up to their CB/GP kthreads to handle what
remains.

Therefore we can't afford to de-offload an offline CPU that still has
pending work to do, or the callbacks would be ignored.

NOTE: The long term solution will be to wait for all pending callbacks
to be processed before completing a CPU down operation.
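
Callers should therefore be prepared for -EBUSY. A hypothetical retry
loop (not part of this series) could look like this:

    /* Wait for the CB/GP kthreads to drain the offline CPU's callbacks. */
    while (rcu_nocb_cpu_deoffload(cpu) == -EBUSY)
        schedule_timeout_uninterruptible(HZ / 10);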

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree_plugin.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 09caf319a4a9..33e9d53d2181 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2228,6 +2228,14 @@ static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 	printk("De-offloading %d\n", rdp->cpu);
 
 	rcu_nocb_lock_irqsave(rdp, flags);
+	/*
+	 * If there is still pending offloaded work, the offline
+	 * CPU won't help much in handling it.
+	 */
+	if (cpu_is_offline(rdp->cpu) && !rcu_segcblist_empty(&rdp->cblist)) {
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+		return -EBUSY;
+	}
 	raw_spin_lock_rcu_node(rnp);
 	rcu_segcblist_offload(cblist, false);
 	raw_spin_unlock_rcu_node(rnp);
-- 
2.25.1


* [PATCH 07/16] rcu: De-offloading GP kthread
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 06/16] rcu/nocb: Don't deoffload an offline CPU with pending work Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 08/16] rcu: Re-offload support Frederic Weisbecker
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

In order to de-offload the callback processing of an rdp, we must clear
SEGCBLIST_OFFLOADED and notify the GP kthread so that it clears its own
bit flag and ignores the target rdp in its loop. The CB kthread
is also notified the same way. Whoever acknowledges and clears its
own bit last must notify the de-offloading worker so that it can resume
the de-offloading while being sure that callbacks won't be handled
remotely anymore.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree_plugin.h | 54 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 33e9d53d2181..432ab20722ff 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1928,6 +1928,33 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
 	__call_rcu_nocb_wake(rdp, true, flags);
 }
 
+static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
+{
+	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
+
+	return rcu_segcblist_test_flags(&rdp->cblist, flags);
+}
+
+static inline bool nocb_gp_update_state(struct rcu_data *rdp, bool *needwake_state)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+
+	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
+		return true;
+	} else {
+		/*
+		 * De-offloading. Clear our flag and notify the de-offload worker.
+		 * We will ignore this rdp until it ever gets re-offloaded.
+		 */
+		WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
+		rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+			*needwake_state = true;
+		return false;
+	}
+}
+
+
 /*
  * No-CBs GP kthreads come here to wait for additional callbacks to show up
  * or for grace periods to end.
@@ -1956,8 +1983,17 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 */
 	WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
+		bool needwake_state = false;
+		if (!nocb_gp_enabled_cb(rdp))
+			continue;
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
 		rcu_nocb_lock_irqsave(rdp, flags);
+		if (!nocb_gp_update_state(rdp, &needwake_state)) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			if (needwake_state)
+				swake_up_one(&rdp->nocb_state_wq);
+			continue;
+		}
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -2221,8 +2257,9 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
 static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 {
 	struct rcu_segcblist *cblist = &rdp->cblist;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+	bool wake_cb = false, wake_gp = false;
 	struct rcu_node *rnp = rdp->mynode;
-	bool wake_cb = false;
 	unsigned long flags;
 
 	printk("De-offloading %d\n", rdp->cpu);
@@ -2249,9 +2286,19 @@ static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 	if (wake_cb)
 		swake_up_one(&rdp->nocb_cb_wq);
 
+	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+	if (rdp_gp->nocb_gp_sleep) {
+		rdp_gp->nocb_gp_sleep = false;
+		wake_gp = true;
+	}
+	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
+
+	if (wake_gp)
+		wake_up_process(rdp_gp->nocb_gp_kthread);
+
 	swait_event_exclusive(rdp->nocb_state_wq,
-			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
-
+			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
+							SEGCBLIST_KTHREAD_GP));
 	return 0;
 }
 
@@ -2333,6 +2380,7 @@ void __init rcu_init_nohz(void)
 			rcu_segcblist_init(&rdp->cblist);
 		rcu_segcblist_offload(&rdp->cblist, true);
 		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
+		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_KTHREAD_GP);
 	}
 	rcu_organize_nocb_kthreads();
 }
-- 
2.25.1


* [PATCH 08/16] rcu: Re-offload support
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 07/16] rcu: De-offloading GP kthread Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 09/16] rcu: Shutdown nocb timer on de-offloading Frederic Weisbecker
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

In order to re-offload the callback processing of an rdp, we must clear
SEGCBLIST_SOFTIRQ_ONLY, set SEGCBLIST_OFFLOADED and notify the CB and GP
kthreads so that they both set their own bit flag and start processing
the callbacks remotely. The re-offloading worker is then notified that
it can stop processing the callbacks locally.

Ordering must be carefully enforced so that the callbacks that used to
be processed locally without locking have their latest updates visible
by the time they get processed by the kthreads.
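
Taken together with the de-offload patches, the interface pair can now
be used like this (hypothetical caller, for illustration only):

    int ret = rcu_nocb_cpu_deoffload(cpu);  /* softirq/rcuc processing */

    if (!ret)
        ret = rcu_nocb_cpu_offload(cpu);    /* back to rcuop/rcuog kthreads */
    /* Note: -EINVAL if the rdp wasn't offloaded on boot (no kthreads). */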

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 include/linux/rcupdate.h |   2 +
 kernel/rcu/tree_plugin.h | 157 +++++++++++++++++++++++++++++++++------
 2 files changed, 138 insertions(+), 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index bf8eb02411c2..f5ad5d0051c4 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -104,9 +104,11 @@ static inline void rcu_user_exit(void) { }
 
 #ifdef CONFIG_RCU_NOCB_CPU
 void rcu_init_nohz(void);
+int rcu_nocb_cpu_offload(int cpu);
 int rcu_nocb_cpu_deoffload(int cpu);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 static inline void rcu_init_nohz(void) { }
+static inline int rcu_nocb_cpu_offload(int cpu) { return -EINVAL; }
 static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 432ab20722ff..c0474e985f44 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1928,6 +1928,20 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
 	__call_rcu_nocb_wake(rdp, true, flags);
 }
 
+/*
+ * Check if we ignore this rdp.
+ *
+ * We check that without holding the nocb lock but
+ * we make sure not to miss a freshly offloaded rdp
+ * with the current ordering:
+ *
+ *  rdp_offload_toggle()        nocb_gp_enabled_cb()
+ * -------------------------   ----------------------------
+ *    WRITE flags                 LOCK nocb_gp_lock
+ *    LOCK nocb_gp_lock           READ/WRITE nocb_gp_sleep
+ *    READ/WRITE nocb_gp_sleep    UNLOCK nocb_gp_lock
+ *    UNLOCK nocb_gp_lock         READ flags
+ */
 static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
 {
 	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
@@ -1940,6 +1954,11 @@ static inline bool nocb_gp_update_state(struct rcu_data *rdp, bool *needwake_sta
 	struct rcu_segcblist *cblist = &rdp->cblist;
 
 	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
+			rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
+			if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+				*needwake_state = true;
+		}
 		return true;
 	} else {
 		/*
@@ -2003,6 +2022,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
+			if (needwake_state)
+				swake_up_one(&rdp->nocb_state_wq);
 			continue; /* No callbacks here, try next. */
 		}
 		if (bypass_ncbs) {
@@ -2054,6 +2075,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		}
 		if (needwake_gp)
 			rcu_gp_kthread_wake();
+		if (needwake_state)
+			swake_up_one(&rdp->nocb_state_wq);
 	}
 
 	my_rdp->nocb_gp_bypass = bypass;
@@ -2159,6 +2182,11 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	WRITE_ONCE(rdp->nocb_cb_sleep, true);
 
 	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB)) {
+			rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_CB);
+			if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP))
+				needwake_state = true;
+		}
 		if (rcu_segcblist_ready_cbs(cblist))
 			WRITE_ONCE(rdp->nocb_cb_sleep, false);
 	} else {
@@ -2254,37 +2282,28 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
 		do_nocb_deferred_wakeup_common(rdp);
 }
 
-static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
+static int rdp_offload_toggle(struct rcu_data *rdp,
+			       bool offload, unsigned long flags)
+	__releases(rdp->nocb_lock)
 {
 	struct rcu_segcblist *cblist = &rdp->cblist;
 	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
-	bool wake_cb = false, wake_gp = false;
 	struct rcu_node *rnp = rdp->mynode;
-	unsigned long flags;
+	bool wake_gp = false;
 
-	printk("De-offloading %d\n", rdp->cpu);
-
-	rcu_nocb_lock_irqsave(rdp, flags);
-	/*
-	 * If there is still pending offloaded work, the offline
-	 * CPU won't help much in handling it.
-	 */
-	if (cpu_is_offline(rdp->cpu) && !rcu_segcblist_empty(&rdp->cblist)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
-		return -EBUSY;
-	}
 	raw_spin_lock_rcu_node(rnp);
-	rcu_segcblist_offload(cblist, false);
+	rcu_segcblist_offload(cblist, offload);
 	raw_spin_unlock_rcu_node(rnp);
 
-	if (rdp->nocb_cb_sleep) {
+	if (rdp->nocb_cb_sleep)
 		rdp->nocb_cb_sleep = false;
-		wake_cb = true;
-	}
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 
-	if (wake_cb)
-		swake_up_one(&rdp->nocb_cb_wq);
+	/*
+	 * Ignore former value of nocb_cb_sleep and force wake up as it could
+	 * have been spuriously set to false already.
+	 */
+	swake_up_one(&rdp->nocb_cb_wq);
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 	if (rdp_gp->nocb_gp_sleep) {
@@ -2296,10 +2315,32 @@ static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 	if (wake_gp)
 		wake_up_process(rdp_gp->nocb_gp_kthread);
 
+	return 0;
+}
+
+static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	unsigned long flags;
+	int ret;
+
+	printk("De-offloading %d\n", rdp->cpu);
+
+	rcu_nocb_lock_irqsave(rdp, flags);
+	/*
+	 * If there is still pending offloaded work, the offline
+	 * CPU won't help much in handling it.
+	 */
+	if (cpu_is_offline(rdp->cpu) && !rcu_segcblist_empty(&rdp->cblist)) {
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+		return -EBUSY;
+	}
+
+	ret = rdp_offload_toggle(rdp, false, flags);
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
 							SEGCBLIST_KTHREAD_GP));
-	return 0;
+	return ret;
 }
 
 static long rcu_nocb_rdp_deoffload(void *arg)
@@ -2336,6 +2377,80 @@ int rcu_nocb_cpu_deoffload(int cpu)
 	return ret;
 }
 
+static int __rcu_nocb_rdp_offload(struct rcu_data *rdp)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	unsigned long flags;
+	int ret;
+
+	/*
+	 * For now we only support re-offload, ie: the rdp must have been
+	 * offloaded on boot first.
+	 */
+	if (!rdp->nocb_gp_rdp)
+		return -EINVAL;
+
+	printk("Offloading %d\n", rdp->cpu);
+	/*
+	 * Can't use rcu_nocb_lock_irqsave() while we are in
+	 * SEGCBLIST_SOFTIRQ_ONLY mode.
+	 */
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	/*
+	 * We didn't take the nocb lock while working on the
+	 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
+	 * Every modifications that have been done previously on
+	 * rdp->cblist must be visible remotely by the nocb kthreads
+	 * upon wake up after reading the cblist flags.
+	 *
+	 * The layout against nocb_lock enforces that ordering:
+	 *
+	 *  __rcu_nocb_rdp_offload()   nocb_cb_wait()/nocb_gp_wait()
+	 * -------------------------   ----------------------------
+	 *      WRITE callbacks           rcu_nocb_lock()
+	 *      rcu_nocb_lock()           READ flags
+	 *      WRITE flags               READ callbacks
+	 *      rcu_nocb_unlock()         rcu_nocb_unlock()
+	 */
+	ret = rdp_offload_toggle(rdp, true, flags);
+	swait_event_exclusive(rdp->nocb_state_wq,
+			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
+			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
+
+	return ret;
+}
+
+static long rcu_nocb_rdp_offload(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	return __rcu_nocb_rdp_offload(rdp);
+}
+
+int rcu_nocb_cpu_offload(int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	int ret = 0;
+
+	mutex_lock(&rcu_state.barrier_mutex);
+	cpus_read_lock();
+	if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
+		if (cpu_online(cpu)) {
+			ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
+		} else {
+			ret = __rcu_nocb_rdp_offload(rdp);
+		}
+		if (!ret)
+			cpumask_set_cpu(cpu, rcu_nocb_mask);
+	}
+	cpus_read_unlock();
+	mutex_unlock(&rcu_state.barrier_mutex);
+
+	return ret;
+}
+
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
-- 
2.25.1


* [PATCH 09/16] rcu: Shutdown nocb timer on de-offloading
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 08/16] rcu: Re-offload support Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 10/16] rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY Frederic Weisbecker
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Make sure the nocb timer can't fire anymore before we reach the final
de-offload state. Spuriously waking up the GP kthread is no big deal but
we must prevent the timer callback from executing without nocb locking.
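
The mechanism relies on a new sentinel value for nocb_defer_wakeup,
sketched here from the hunks below:

    #define RCU_NOCB_WAKE_OFF    -1  /* deferred wakeups disabled (de-offloaded) */

    /* wake_nocb_gp_defer() now bails out early: */
    if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_OFF)
        return;

    /* ...and the deferred-wakeup check ignores the sentinel: */
    return READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT;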

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree.h        |  1 +
 kernel/rcu/tree_plugin.h | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 8047102be878..5a4e23782340 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -256,6 +256,7 @@ struct rcu_data {
 };
 
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
+#define RCU_NOCB_WAKE_OFF	-1
 #define RCU_NOCB_WAKE_NOT	0
 #define RCU_NOCB_WAKE		1
 #define RCU_NOCB_WAKE_FORCE	2
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c0474e985f44..c44b83b79196 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1665,6 +1665,8 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
 static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 			       const char *reason)
 {
+	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_OFF)
+		return;
 	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
 		mod_timer(&rdp->nocb_timer, jiffies + 1);
 	if (rdp->nocb_defer_wakeup < waketype)
@@ -2243,7 +2245,7 @@ static int rcu_nocb_cb_kthread(void *arg)
 /* Is a deferred wakeup of rcu_nocb_kthread() required? */
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
 {
-	return READ_ONCE(rdp->nocb_defer_wakeup);
+	return READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT;
 }
 
 /* Do a deferred wakeup of rcu_nocb_kthread(). */
@@ -2340,6 +2342,12 @@ static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
 							SEGCBLIST_KTHREAD_GP));
+	/* Make sure nocb timer won't stay around */
+	rcu_nocb_lock_irqsave(rdp, flags);
+	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_OFF);
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+	del_timer_sync(&rdp->nocb_timer);
+
 	return ret;
 }
 
@@ -2396,6 +2404,8 @@ static int __rcu_nocb_rdp_offload(struct rcu_data *rdp)
 	 * SEGCBLIST_SOFTIRQ_ONLY mode.
 	 */
 	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	/* Re-enable nocb timer */
+	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 	/*
 	 * We didn't take the nocb lock while working on the
 	 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 10/16] rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 09/16] rcu: Shutdown nocb timer on de-offloading Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 11/16] rcu: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading Frederic Weisbecker
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Make sure to handle the pending bypass queue before we switch to the
final de-offload state. We'll have to be careful and later set
SEGCBLIST_SOFTIRQ_ONLY before re-enabling IRQs, or new bypass callbacks
could be queued in the meantime.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree_plugin.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c44b83b79196..49bd42995ae7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2342,12 +2342,19 @@ static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
 							SEGCBLIST_KTHREAD_GP));
+	rcu_nocb_lock_irqsave(rdp, flags);
 	/* Make sure nocb timer won't stay around */
-	rcu_nocb_lock_irqsave(rdp, flags);
 	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_OFF);
-	rcu_nocb_unlock_irqrestore(rdp, flags);
 	del_timer_sync(&rdp->nocb_timer);
 
+	/*
+	 * Flush bypass. While IRQs are disabled and once we set
+	 * SEGCBLIST_SOFTIRQ_ONLY, no callback is supposed to be
+	 * enqueued on bypass.
+	 */
+	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 11/16] rcu: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 10/16] rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 12/16] rcu/nocb: Only cond_resched() from actual offloaded batch processing Frederic Weisbecker
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Set SEGCBLIST_SOFTIRQ_ONLY once everything is settled. After that, the
callbacks are handled locklessly and locally.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree_plugin.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 49bd42995ae7..2f083beab9d9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2353,7 +2353,14 @@ static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
 	 * enqueued on bypass.
 	 */
 	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
-	rcu_nocb_unlock_irqrestore(rdp, flags);
+	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
+	/*
+	 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
+	 * rcu_nocb_unlock_irqrestore() anymore. Theoretically we
+	 * could set SEGCBLIST_SOFTIRQ_ONLY with cb unlocked and IRQs
+	 * disabled now, but let's be paranoid.
+	 */
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 
 	return ret;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 12/16] rcu/nocb: Only cond_resched() from actual offloaded batch processing
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 11/16] rcu: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 13/16] rcu: Process batch locally as long as offloading isn't complete Frederic Weisbecker
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

rcu_do_batch() will be callable concurrently from softirqs and offloaded
processing, so make sure we actually call cond_resched() only from the
offloaded context.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bad7018dc18..35834ce2d042 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2491,8 +2491,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 			/* Exceeded the time limit, so leave. */
 			break;
 		}
-		if (offloaded) {
-			WARN_ON_ONCE(in_serving_softirq());
+		if (!in_serving_softirq()) {
 			local_bh_enable();
 			lockdep_assert_irqs_enabled();
 			cond_resched_tasks_rcu_qs();
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 13/16] rcu: Process batch locally as long as offloading isn't complete
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 12/16] rcu/nocb: Only cond_resched() from actual offloaded batch processing Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 14/16] rcu: Locally accelerate callbacks " Frederic Weisbecker
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

During the offloading or de-offloading process, make sure to process
the callback batch locally whenever the segcblist isn't entirely
offloaded. This ensures that callbacks keep being serviced while we are
still in an intermediate (de-)offloading state.

FIXME: Note that __call_rcu_core() isn't called during these intermediate
states. Some pieces there may still be necessary.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/rcu_segcblist.h | 12 ++++++++++++
 kernel/rcu/tree.c          |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 00ebeb8d39b7..f7da3d535888 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -92,6 +92,18 @@ static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 	return false;
 }
 
+static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp)
+{
+	int flags = SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP | SEGCBLIST_OFFLOADED;
+
+	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU)) {
+		if ((rsclp->flags & flags) == flags)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Are all segments following the specified segment of the specified
  * rcu_segcblist structure empty of callbacks?  (The specified
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 35834ce2d042..45fad6977bea 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2674,6 +2674,7 @@ static __latent_entropy void rcu_core(void)
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
 	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
 
 	if (cpu_is_offline(smp_processor_id()))
 		return;
@@ -2703,7 +2704,7 @@ static __latent_entropy void rcu_core(void)
 	rcu_check_gp_start_stall(rnp, rdp, rcu_jiffies_till_stall_check());
 
 	/* If there are callbacks ready, invoke them. */
-	if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist) &&
+	if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist) &&
 	    likely(READ_ONCE(rcu_scheduler_fully_active)))
 		rcu_do_batch(rdp);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 14/16] rcu: Locally accelerate callbacks as long as offloading isn't complete
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 13/16] rcu: Process batch locally as long as offloading isn't complete Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 15/16] rcutorture: Test runtime toggling of CPUs' callback offloading Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 16/16] tools/rcutorture: Support nocb toggle in TREE01 Frederic Weisbecker
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

The local callback processing checks whether some callbacks need
acceleration. Keep that behaviour under nocb lock protection when
rcu_core() executes concurrently with the GP/CB kthreads.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 kernel/rcu/tree.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 45fad6977bea..4af5fed11885 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2673,7 +2673,6 @@ static __latent_entropy void rcu_core(void)
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
-	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
 
 	if (cpu_is_offline(smp_processor_id()))
@@ -2694,11 +2693,11 @@ static __latent_entropy void rcu_core(void)
 
 	/* No grace period and unregistered callbacks? */
 	if (!rcu_gp_in_progress() &&
-	    rcu_segcblist_is_enabled(&rdp->cblist) && !offloaded) {
-		local_irq_save(flags);
+	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
+		rcu_nocb_lock_irqsave(rdp, flags);
 		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 			rcu_accelerate_cbs_unlocked(rnp, rdp);
-		local_irq_restore(flags);
+		rcu_nocb_unlock_irqrestore(rdp, flags);
 	}
 
 	rcu_check_gp_start_stall(rnp, rdp, rcu_jiffies_till_stall_check());
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 15/16] rcutorture: Test runtime toggling of CPUs' callback offloading
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 14/16] rcu: Locally accelerate callbacks " Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  2020-10-23 14:46 ` [PATCH 16/16] tools/rcutorture: Support nocb toggle in TREE01 Frederic Weisbecker
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Paul E. McKenney, Steven Rostedt, Frederic Weisbecker,
	Mathieu Desnoyers, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

From: "Paul E. McKenney" <paulmck@kernel.org>

Frederic Weisbecker is adding the ability to change the rcu_nocbs state
of CPUs at runtime, that is, to offload and deoffload their RCU callback
processing without the need to reboot.  As the old saying goes, "if it
ain't tested, it don't work", so this commit therefore adds prototype
rcutorture testing for this capability.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 .../admin-guide/kernel-parameters.txt         |  8 ++
 kernel/rcu/rcutorture.c                       | 86 ++++++++++++++++++-
 kernel/rcu/tree_plugin.h                      |  3 +-
 3 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 02d4adbf98d2..de31a867e0d9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4324,6 +4324,14 @@
 			stress RCU, they don't participate in the actual
 			test, hence the "fake".
 
+	rcutorture.nocbs_nthreads= [KNL]
+			Set number of RCU callback-offload togglers.
+			Zero (the default) disables toggling.
+
+	rcutorture.nocbs_toggle= [KNL]
+			Set the delay in milliseconds between successive
+			callback-offload toggling attempts.
+
 	rcutorture.nreaders= [KNL]
 			Set number of RCU readers.  The value -1 selects
 			N-1, where N is the number of CPUs.  A value
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 916ea4f66e4b..c027b753bed5 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -97,6 +97,8 @@ torture_param(int, object_debug, 0,
 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
 torture_param(int, onoff_interval, 0,
 	     "Time between CPU hotplugs (jiffies), 0=disable");
+torture_param(int, nocbs_nthreads, 0, "Number of NOCB toggle threads, 0 to disable");
+torture_param(int, nocbs_toggle, 1000, "Time between toggling nocb state (ms)");
 torture_param(int, read_exit_delay, 13,
 	      "Delay between read-then-exit episodes (s)");
 torture_param(int, read_exit_burst, 16,
@@ -127,10 +129,12 @@ static char *torture_type = "rcu";
 module_param(torture_type, charp, 0444);
 MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, srcu, ...)");
 
+static int nrealnocbers;
 static int nrealreaders;
 static struct task_struct *writer_task;
 static struct task_struct **fakewriter_tasks;
 static struct task_struct **reader_tasks;
+static struct task_struct **nocb_tasks;
 static struct task_struct *stats_task;
 static struct task_struct *fqs_task;
 static struct task_struct *boost_tasks[NR_CPUS];
@@ -174,6 +178,8 @@ static unsigned long n_read_exits;
 static struct list_head rcu_torture_removed;
 static unsigned long shutdown_jiffies;
 static unsigned long start_gp_seq;
+static atomic_long_t n_nocb_offload;
+static atomic_long_t n_nocb_deoffload;
 
 static int rcu_torture_writer_state;
 #define RTWS_FIXED_DELAY	0
@@ -1483,6 +1489,53 @@ rcu_torture_reader(void *arg)
 	return 0;
 }
 
+/*
+ * Randomly toggle CPUs' callback-offload state.  This uses hrtimers to
+ * increase race probabilities and fuzzes the interval between toggling.
+ */
+static int rcu_nocb_toggle(void *arg)
+{
+	int cpu;
+	int maxcpu = -1;
+	int oldnice = task_nice(current);
+	long r;
+	DEFINE_TORTURE_RANDOM(rand);
+	ktime_t toggle_delay;
+	unsigned long toggle_fuzz;
+	ktime_t toggle_interval = ms_to_ktime(nocbs_toggle);
+
+	VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started");
+	while (!rcu_inkernel_boot_has_ended())
+		schedule_timeout_interruptible(HZ / 10);
+	for_each_online_cpu(cpu)
+		maxcpu = cpu;
+	WARN_ON(maxcpu < 0);
+	if (toggle_interval > ULONG_MAX)
+		toggle_fuzz = ULONG_MAX >> 3;
+	else
+		toggle_fuzz = toggle_interval >> 3;
+	if (toggle_fuzz <= 0)
+		toggle_fuzz = NSEC_PER_USEC;
+	do {
+		r = torture_random(&rand);
+		cpu = (r >> 4) % (maxcpu + 1);
+		if (r & 0x1) {
+			rcu_nocb_cpu_offload(cpu);
+			atomic_long_inc(&n_nocb_offload);
+		} else {
+			rcu_nocb_cpu_deoffload(cpu);
+			atomic_long_inc(&n_nocb_deoffload);
+		}
+		toggle_delay = torture_random(&rand) % toggle_fuzz + toggle_interval;
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_hrtimeout(&toggle_delay, HRTIMER_MODE_REL);
+		if (stutter_wait("rcu_nocb_toggle"))
+			sched_set_normal(current, oldnice);
+	} while (!torture_must_stop());
+	torture_kthread_stopping("rcu_nocb_toggle");
+	return 0;
+}
+
 /*
  * Print torture statistics.  Caller must ensure that there is only
  * one call to this function at a given time!!!  This is normally
@@ -1538,7 +1591,9 @@ rcu_torture_stats_print(void)
 		data_race(n_barrier_successes),
 		data_race(n_barrier_attempts),
 		data_race(n_rcu_torture_barrier_error));
-	pr_cont("read-exits: %ld\n", data_race(n_read_exits));
+	pr_cont("read-exits: %ld ", data_race(n_read_exits));
+	pr_cont("nocb-toggles: %ld:%ld\n",
+		atomic_long_read(&n_nocb_offload), atomic_long_read(&n_nocb_deoffload));
 
 	pr_alert("%s%s ", torture_type, TORTURE_FLAG);
 	if (atomic_read(&n_rcu_torture_mberror) ||
@@ -1631,7 +1686,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
 		 "stall_cpu_block=%d "
 		 "n_barrier_cbs=%d "
 		 "onoff_interval=%d onoff_holdoff=%d "
-		 "read_exit_delay=%d read_exit_burst=%d\n",
+		 "read_exit_delay=%d read_exit_burst=%d "
+		 "nocbs_nthreads=%d nocbs_toggle=%d\n",
 		 torture_type, tag, nrealreaders, nfakewriters,
 		 stat_interval, verbose, test_no_idle_hz, shuffle_interval,
 		 stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter,
@@ -1641,7 +1697,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
 		 stall_cpu_block,
 		 n_barrier_cbs,
 		 onoff_interval, onoff_holdoff,
-		 read_exit_delay, read_exit_burst);
+		 read_exit_delay, read_exit_burst,
+		 nocbs_nthreads, nocbs_toggle);
 }
 
 static int rcutorture_booster_cleanup(unsigned int cpu)
@@ -2479,6 +2536,13 @@ rcu_torture_cleanup(void)
 	torture_stop_kthread(rcu_torture_stall, stall_task);
 	torture_stop_kthread(rcu_torture_writer, writer_task);
 
+	if (nocb_tasks) {
+		for (i = 0; i < nrealnocbers; i++)
+			torture_stop_kthread(rcu_nocb_toggle, nocb_tasks[i]);
+		kfree(nocb_tasks);
+		nocb_tasks = NULL;
+	}
+
 	if (reader_tasks) {
 		for (i = 0; i < nrealreaders; i++)
 			torture_stop_kthread(rcu_torture_reader,
@@ -2742,6 +2806,22 @@ rcu_torture_init(void)
 		if (firsterr)
 			goto unwind;
 	}
+	nrealnocbers = nocbs_nthreads;
+	if (WARN_ON(nrealnocbers < 0))
+		nrealnocbers = 1;
+	if (WARN_ON(nocbs_toggle < 0))
+		nocbs_toggle = HZ;
+	nocb_tasks = kcalloc(nrealnocbers, sizeof(nocb_tasks[0]), GFP_KERNEL);
+	if (nocb_tasks == NULL) {
+		VERBOSE_TOROUT_ERRSTRING("out of memory");
+		firsterr = -ENOMEM;
+		goto unwind;
+	}
+	for (i = 0; i < nrealnocbers; i++) {
+		firsterr = torture_create_kthread(rcu_nocb_toggle, NULL, nocb_tasks[i]);
+		if (firsterr)
+			goto unwind;
+	}
 	if (stat_interval > 0) {
 		firsterr = torture_create_kthread(rcu_torture_stats, NULL,
 						  stats_task);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 2f083beab9d9..5e0870229dd5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2398,6 +2398,7 @@ int rcu_nocb_cpu_deoffload(int cpu)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(rcu_nocb_cpu_deoffload);
 
 static int __rcu_nocb_rdp_offload(struct rcu_data *rdp)
 {
@@ -2473,7 +2474,7 @@ int rcu_nocb_cpu_offload(int cpu)
 
 	return ret;
 }
-
+EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
 
 void __init rcu_init_nohz(void)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 16/16] tools/rcutorture: Support nocb toggle in TREE01
  2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2020-10-23 14:46 ` [PATCH 15/16] rcutorture: Test runtime toggling of CPUs' callback offloading Frederic Weisbecker
@ 2020-10-23 14:46 ` Frederic Weisbecker
  15 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-23 14:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Mathieu Desnoyers,
	Paul E . McKenney, Lai Jiangshan, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett

Add periodic toggling of 7 out of the 8 CPUs every second in order to
test the NOCB toggle code. Choose TREE01 for that, as it's already
testing nocb.

Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
---
 tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
index d6da9a61d44a..40af3df0f397 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
@@ -2,5 +2,7 @@ maxcpus=8 nr_cpus=43
 rcutree.gp_preinit_delay=3
 rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
-rcu_nocbs=0
+rcu_nocbs=0-1,3-7
+rcutorture.nocbs_nthreads=8
+rcutorture.nocbs_toggle=1000
 rcutorture.fwd_progress=0
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 05/16] rcu: De-offloading CB kthread
  2020-10-23 14:46 ` [PATCH 05/16] rcu: De-offloading CB kthread Frederic Weisbecker
@ 2020-11-02 13:38   ` Boqun Feng
  2020-11-04 14:31     ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Boqun Feng @ 2020-11-02 13:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Paul E . McKenney,
	Lai Jiangshan, Neeraj Upadhyay, Joel Fernandes, Josh Triplett

Hi Frederic,

Could you copy rcu@vger.kernel.org if you post another version? It
will help RCU hobbyists like me catch up on news in RCU, thanks! ;-)

Please see below for some comments. I'm still reading the whole
patchset, so I have probably missed something..

On Fri, Oct 23, 2020 at 04:46:38PM +0200, Frederic Weisbecker wrote:
> In order to de-offload the callback processing of an rdp, we must clear
> SEGCBLIST_OFFLOADED and notify the CB kthread so that it clears its own
> bit flag and goes to sleep to stop handling callbacks. The GP kthread
> will also be notified the same way. Whoever acknowledges and clears its
> own bit last must notify the de-offloading worker so that it can resume
> the de-offloading while being sure that callbacks won't be handled
> remotely anymore.
> 
> Inspired-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> ---
>  include/linux/rcupdate.h   |   2 +
>  kernel/rcu/rcu_segcblist.c |  10 ++-
>  kernel/rcu/rcu_segcblist.h |   2 +-
>  kernel/rcu/tree.h          |   1 +
>  kernel/rcu/tree_plugin.h   | 134 +++++++++++++++++++++++++++++++------
>  5 files changed, 126 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 7c1ceff02852..bf8eb02411c2 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -104,8 +104,10 @@ static inline void rcu_user_exit(void) { }
>  
>  #ifdef CONFIG_RCU_NOCB_CPU
>  void rcu_init_nohz(void);
> +int rcu_nocb_cpu_deoffload(int cpu);
>  #else /* #ifdef CONFIG_RCU_NOCB_CPU */
>  static inline void rcu_init_nohz(void) { }
> +static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
>  #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>  
>  /**
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index a96511b7cc98..3f6b5b724b39 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -170,10 +170,14 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>   * Mark the specified rcu_segcblist structure as offloaded.  This
>   * structure must be empty.
>   */
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
>  {
> -	rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
> -	rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
> +	if (offload) {
> +		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
> +		rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
> +	} else {
> +		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
> +	}
>  }
>  
>  /*
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 575896a2518b..00ebeb8d39b7 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -105,7 +105,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
>  void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
>  bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
>  bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
>  struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index e4f66b8f7c47..8047102be878 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -200,6 +200,7 @@ struct rcu_data {
>  	/* 5) Callback offloading. */
>  #ifdef CONFIG_RCU_NOCB_CPU
>  	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
> +	struct swait_queue_head nocb_state_wq; /* For offloading state changes */
>  	struct task_struct *nocb_gp_kthread;
>  	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
>  	atomic_t nocb_lock_contended;	/* Contention experienced. */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index fd8a52e9a887..09caf319a4a9 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2081,16 +2081,29 @@ static int rcu_nocb_gp_kthread(void *arg)
>  	return 0;
>  }
>  
> +static inline bool nocb_cb_can_run(struct rcu_data *rdp)
> +{
> +	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_CB;
> +	return rcu_segcblist_test_flags(&rdp->cblist, flags);
> +}
> +
> +static inline bool nocb_cb_wait_cond(struct rcu_data *rdp)
> +{
> +	return nocb_cb_can_run(rdp) && !READ_ONCE(rdp->nocb_cb_sleep);
> +}
> +
>  /*
>   * Invoke any ready callbacks from the corresponding no-CBs CPU,
>   * then, if there are no more, wait for more to appear.
>   */
>  static void nocb_cb_wait(struct rcu_data *rdp)
>  {
> -	unsigned long cur_gp_seq;
> -	unsigned long flags;
> +	struct rcu_segcblist *cblist = &rdp->cblist;
> +	struct rcu_node *rnp = rdp->mynode;
> +	bool needwake_state = false;
>  	bool needwake_gp = false;
> -	struct rcu_node *rnp = rdp->mynode;
> +	unsigned long cur_gp_seq;
> +	unsigned long flags;
>  
>  	local_irq_save(flags);
>  	rcu_momentary_dyntick_idle();
> @@ -2100,32 +2113,50 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  	local_bh_enable();
>  	lockdep_assert_irqs_enabled();
>  	rcu_nocb_lock_irqsave(rdp, flags);
> -	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> +	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
>  	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
>  	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
>  		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
>  		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
>  	}
> -	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> -		if (needwake_gp)
> -			rcu_gp_kthread_wake();
> -		return;
> -	}
>  
> -	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
>  	WRITE_ONCE(rdp->nocb_cb_sleep, true);
> +
> +	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
> +		if (rcu_segcblist_ready_cbs(cblist))
> +			WRITE_ONCE(rdp->nocb_cb_sleep, false);
> +	} else {
> +		/*
> +		 * De-offloading. Clear our flag and notify the de-offload worker.
> +		 * We won't touch the callbacks and keep sleeping until we ever
> +		 * get re-offloaded.
> +		 */
> +		WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> +		rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_CB);
> +		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP))
> +			needwake_state = true;
> +	}
> +
> +	if (rdp->nocb_cb_sleep)
> +		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> +
>  	rcu_nocb_unlock_irqrestore(rdp, flags);
>  	if (needwake_gp)
>  		rcu_gp_kthread_wake();
> -	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> -				 !READ_ONCE(rdp->nocb_cb_sleep));
> -	if (!smp_load_acquire(&rdp->nocb_cb_sleep)) { /* VVV */
> +
> +	if (needwake_state)
> +		swake_up_one(&rdp->nocb_state_wq);
> +
> +	do {
> +		swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> +						    nocb_cb_wait_cond(rdp));
> +
>  		/* ^^^ Ensure CB invocation follows _sleep test. */
> -		return;
> -	}
> -	WARN_ON(signal_pending(current));
> -	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> +		if (smp_load_acquire(&rdp->nocb_cb_sleep)) {
> +			WARN_ON(signal_pending(current));
> +			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> +		}
> +	} while (!nocb_cb_can_run(rdp));
>  }
>  
>  /*
> @@ -2187,6 +2218,69 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
>  		do_nocb_deferred_wakeup_common(rdp);
>  }
>  
> +static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> +{
> +	struct rcu_segcblist *cblist = &rdp->cblist;
> +	struct rcu_node *rnp = rdp->mynode;
> +	bool wake_cb = false;
> +	unsigned long flags;
> +
> +	printk("De-offloading %d\n", rdp->cpu);
> +
> +	rcu_nocb_lock_irqsave(rdp, flags);
> +	raw_spin_lock_rcu_node(rnp);

Could you explain why both the nocb_lock and the node lock are needed here?
Besides, rcu_nocb_{un}lock_* can skip the lock part based on the offload
state, which gets modified right here (in a rcu_nocb_{un}lock_* critical
section), and I think this is error-prone, because you may get an unpaired
lock-unlock (not in this patch though). Maybe just use the node lock?

> +	rcu_segcblist_offload(cblist, false);
> +	raw_spin_unlock_rcu_node(rnp);
> +
> +	if (rdp->nocb_cb_sleep) {
> +		rdp->nocb_cb_sleep = false;
> +		wake_cb = true;
> +	}
> +	rcu_nocb_unlock_irqrestore(rdp, flags);
> +
> +	if (wake_cb)
> +		swake_up_one(&rdp->nocb_cb_wq);
> +
> +	swait_event_exclusive(rdp->nocb_state_wq,
> +			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> +
> +	return 0;
> +}
> +
> +static long rcu_nocb_rdp_deoffload(void *arg)
> +{
> +	struct rcu_data *rdp = arg;
> +
> +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());

I think this warning can actually happen, if I understand how workqueues
work correctly. Consider that the corresponding CPU gets offlined right
after the rcu_nocb_cpu_deoffload(): the workqueue of that CPU becomes
unbound, and IIUC, workqueues don't do migration during CPU-offlining,
which means the worker can be scheduled to other CPUs, and the work gets
executed on another CPU. Am I missing something here?

Regards,
Boqun

> +	return __rcu_nocb_rdp_deoffload(rdp);
> +}
> +
> +int rcu_nocb_cpu_deoffload(int cpu)
> +{
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +	int ret = 0;
> +
> +	if (rdp == rdp->nocb_gp_rdp) {
> +		pr_info("Can't deoffload an rdp GP leader (yet)\n");
> +		return -EINVAL;
> +	}
> +	mutex_lock(&rcu_state.barrier_mutex);
> +	cpus_read_lock();
> +	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> +		if (cpu_online(cpu)) {
> +			ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> +		} else {
> +			ret = __rcu_nocb_rdp_deoffload(rdp);
> +		}
> +		if (!ret)
> +			cpumask_clear_cpu(cpu, rcu_nocb_mask);
> +	}
> +	cpus_read_unlock();
> +	mutex_unlock(&rcu_state.barrier_mutex);
> +
> +	return ret;
> +}
> +
>  void __init rcu_init_nohz(void)
>  {
>  	int cpu;
> @@ -2229,7 +2323,8 @@ void __init rcu_init_nohz(void)
>  		rdp = per_cpu_ptr(&rcu_data, cpu);
>  		if (rcu_segcblist_empty(&rdp->cblist))
>  			rcu_segcblist_init(&rdp->cblist);
> -		rcu_segcblist_offload(&rdp->cblist);
> +		rcu_segcblist_offload(&rdp->cblist, true);
> +		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
>  	}
>  	rcu_organize_nocb_kthreads();
>  }
> @@ -2239,6 +2334,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
>  {
>  	init_swait_queue_head(&rdp->nocb_cb_wq);
>  	init_swait_queue_head(&rdp->nocb_gp_wq);
> +	init_swait_queue_head(&rdp->nocb_state_wq);
>  	raw_spin_lock_init(&rdp->nocb_lock);
>  	raw_spin_lock_init(&rdp->nocb_bypass_lock);
>  	raw_spin_lock_init(&rdp->nocb_gp_lock);
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 05/16] rcu: De-offloading CB kthread
  2020-11-02 13:38   ` Boqun Feng
@ 2020-11-04 14:31     ` Frederic Weisbecker
  2020-11-04 14:42       ` Boqun Feng
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-11-04 14:31 UTC (permalink / raw)
  To: Boqun Feng
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Paul E . McKenney,
	Lai Jiangshan, Neeraj Upadhyay, Joel Fernandes, Josh Triplett

On Mon, Nov 02, 2020 at 09:38:24PM +0800, Boqun Feng wrote:
> Hi Frederic,
> 
> Could you copy rcu@vger.kernel.org if you post another version? It
> will help RCU hobbyists like me catch up on news in RCU, thanks! ;-)

Sure! Will do!

> > +static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	struct rcu_segcblist *cblist = &rdp->cblist;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +	bool wake_cb = false;
> > +	unsigned long flags;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> > +
> > +	rcu_nocb_lock_irqsave(rdp, flags);
> > +	raw_spin_lock_rcu_node(rnp);
> 
> Could you explain why both the nocb_lock and the node lock are needed here?

So here is the common scheme with rcu nocb locking:

rcu_nocb_lock(rdp) {
   if (rcu_segcblist_is_offloaded(rdp->cblist))
       raw_spin_lock(rdp->nocb_lock)
}
....

rcu_nocb_unlock(rdp) {
   if (rcu_segcblist_is_offloaded(rdp->cblist))
       raw_spin_unlock(rdp->nocb_lock)
}

Here we first need to check the rdp->cblist offloaded state locklessly.
This is dangerous because we can now switch that offloaded state remotely.
So we may run some code under rcu_nocb_lock() without actually taking the
lock while, at the same time, we switch to offloaded mode, after which
things must run under the lock.
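
For example, with a concurrent state switch, one bad interleaving would
look like this (sketch):

   rcu_nocb_lock()/unlock() user     remote offload switch
   -----------------------------     ---------------------
   rcu_nocb_lock(rdp)
      is_offloaded? -> false
      (nocb_lock not taken)          set SEGCBLIST_OFFLOADED
   touch rdp->cblist                 nocb kthreads now lock and
                                     touch rdp->cblist too
   rcu_nocb_unlock(rdp)
      is_offloaded? -> true
      raw_spin_unlock() -> unpaired unlock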

Fortunately, rcu_nocb_lock(), and rcu_segcblist_is_offloaded() in general,
is only called in one of the following situations (for which I need to
add debug assertions, but that will be in a subsequent patchset; see the
rough sketch below):

   1) hotplug write lock held
   2) rcu_barrier mutex held
   3) nocb lock held
   4) rnp lock held
   5) rdp is local
   6) we run the nocb timer

1) and 2) are covered because we hold them while calling work_on_cpu().
3) and 4) are held when we change the flags of the rdp->cblist.
5) is covered by the fact we are modifying the rdp->cblist state only
   locally (using work_on_cpu()) with irqs disabled.
6) we cancel/rearm the timer before it can see half states.
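
Roughly, the kind of assertion I have in mind would look like this
(untested sketch only: the function name is a placeholder and helpers
such as lockdep_is_cpus_held() would have to be introduced; the rnp
lock and nocb timer cases still need dedicated checks):

static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
{
	/*
	 * Check that we are in one of the safe situations 1) to 6)
	 * above before locklessly reading the offloaded state.
	 */
	RCU_LOCKDEP_WARN(
		!(lockdep_is_held(&rcu_state.barrier_mutex) ||
		  (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_held()) ||
		  lockdep_is_held(&rdp->nocb_lock) ||
		  (rdp == this_cpu_ptr(&rcu_data) && !preemptible())),
		"Unsafe read of RCU_NOCB offloaded state");

	return rcu_segcblist_is_offloaded(&rdp->cblist);
}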

But you make me realize one thing: we must protect against these situations
especially when we switch the flags:

   a) From SEGCBLIST_SOFTIRQ_ONLY to SEGCBLIST_OFFLOADED (switch from no locking to locking)
   b) From 0 to SEGCBLIST_SOFTIRQ_ONLY (switch from locking to no locking)

Those are the critical state changes that modify the locking behaviour,
and we have to hold both rdp->nocb_lock and the rnp lock across these
state changes.

Unfortunately, in case b) I omitted the rnp lock, so I need to fix that.
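
In __rcu_nocb_rdp_deoffload(), the SEGCBLIST_SOFTIRQ_ONLY switch would
then probably look something like this (rough sketch of the direction,
not the actual fix):

	rcu_nocb_lock_irqsave(rdp, flags);
	raw_spin_lock_rcu_node(rnp);
	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
	raw_spin_unlock_rcu_node(rnp);
	/* Locking mode is now off, unlock the raw nocb lock directly. */
	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);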

> Besides, rcu_nocb_{un}lock_* can skip the lock part based on the offload
> state, which gets modified right here (in a rcu_nocb_{un}lock_* critical
> section), and I think this is error-prone, because you may get an unpaired
> lock-unlock (not in this patch though). Maybe just use the node lock?

So that's why case 1 to 6 must be covered.

> 
> > +	rcu_segcblist_offload(cblist, false);
> > +	raw_spin_unlock_rcu_node(rnp);
> > +
> > +	if (rdp->nocb_cb_sleep) {
> > +		rdp->nocb_cb_sleep = false;
> > +		wake_cb = true;
> > +	}
> > +	rcu_nocb_unlock_irqrestore(rdp, flags);
> > +
> > +	if (wake_cb)
> > +		swake_up_one(&rdp->nocb_cb_wq);
> > +
> > +	swait_event_exclusive(rdp->nocb_state_wq,
> > +			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> > +
> > +	return 0;
> > +}
> > +
> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> 
> I think this warning can actually happen, if I understand how workqueues
> work correctly. Consider that the corresponding CPU gets offlined right
> after the rcu_nocb_cpu_deoffload(): the workqueue of that CPU becomes
> unbound, and IIUC, workqueues don't do migration during CPU-offlining,
> which means the worker can be scheduled to other CPUs, and the work gets
> executed on another CPU. Am I missing something here?

We are holding cpus_read_lock() in rcu_nocb_cpu_offload(); this should
prevent that.

Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 05/16] rcu: De-offloading CB kthread
  2020-11-04 14:31     ` Frederic Weisbecker
@ 2020-11-04 14:42       ` Boqun Feng
  2020-11-04 14:45         ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Boqun Feng @ 2020-11-04 14:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Paul E . McKenney,
	Lai Jiangshan, Neeraj Upadhyay, Joel Fernandes, Josh Triplett

On Wed, Nov 04, 2020 at 03:31:35PM +0100, Frederic Weisbecker wrote:
[...]
> > 
> > > +	rcu_segcblist_offload(cblist, false);
> > > +	raw_spin_unlock_rcu_node(rnp);
> > > +
> > > +	if (rdp->nocb_cb_sleep) {
> > > +		rdp->nocb_cb_sleep = false;
> > > +		wake_cb = true;
> > > +	}
> > > +	rcu_nocb_unlock_irqrestore(rdp, flags);
> > > +
> > > +	if (wake_cb)
> > > +		swake_up_one(&rdp->nocb_cb_wq);
> > > +
> > > +	swait_event_exclusive(rdp->nocb_state_wq,
> > > +			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > +{
> > > +	struct rcu_data *rdp = arg;
> > > +
> > > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > 
> > I think this warning can actually happen, if I understand how workqueues
> > work correctly. Consider that the corresponding CPU gets offlined right
> > after the rcu_nocb_cpu_deoffload(): the workqueue of that CPU becomes
> > unbound, and IIUC, workqueues don't do migration during CPU-offlining,
> > which means the worker can be scheduled to other CPUs, and the work gets
> > executed on another CPU. Am I missing something here?
> 
> We are holding cpus_read_lock() in rcu_nocb_cpu_offload(); this should
> prevent that.
> 

But what if the work doesn't get executed until we cpus_read_unlock()
and someone offlines that CPU?

Regards,
Boqun

> Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 05/16] rcu: De-offloading CB kthread
  2020-11-04 14:42       ` Boqun Feng
@ 2020-11-04 14:45         ` Frederic Weisbecker
  0 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-11-04 14:45 UTC (permalink / raw)
  To: Boqun Feng
  Cc: LKML, Steven Rostedt, Mathieu Desnoyers, Paul E . McKenney,
	Lai Jiangshan, Neeraj Upadhyay, Joel Fernandes, Josh Triplett

On Wed, Nov 04, 2020 at 10:42:09PM +0800, Boqun Feng wrote:
> On Wed, Nov 04, 2020 at 03:31:35PM +0100, Frederic Weisbecker wrote:
> [...]
> > > 
> > > > +	rcu_segcblist_offload(cblist, false);
> > > > +	raw_spin_unlock_rcu_node(rnp);
> > > > +
> > > > +	if (rdp->nocb_cb_sleep) {
> > > > +		rdp->nocb_cb_sleep = false;
> > > > +		wake_cb = true;
> > > > +	}
> > > > +	rcu_nocb_unlock_irqrestore(rdp, flags);
> > > > +
> > > > +	if (wake_cb)
> > > > +		swake_up_one(&rdp->nocb_cb_wq);
> > > > +
> > > > +	swait_event_exclusive(rdp->nocb_state_wq,
> > > > +			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > > +{
> > > > +	struct rcu_data *rdp = arg;
> > > > +
> > > > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > 
> > > I think this warning can actually happen, if I understand how workqueues
> > > work correctly. Consider that the corresponding CPU gets offlined right
> > > after the rcu_nocb_cpu_deoffload(): the workqueue of that CPU becomes
> > > unbound, and IIUC, workqueues don't do migration during CPU-offlining,
> > > which means the worker can be scheduled to other CPUs, and the work gets
> > > executed on another CPU. Am I missing something here?
> > 
> > We are holding cpus_read_lock() in rcu_nocb_cpu_offload(); this should
> > prevent that.
> > 
> 
> But what if the work doesn't get executed until we cpus_read_unlock()
> and someone offlines that CPU?

work_on_cpu() waits for completion before returning.
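
That is, the sequence in rcu_nocb_cpu_deoffload() is (simplified from
the patch):

	mutex_lock(&rcu_state.barrier_mutex);
	cpus_read_lock();
	/* Blocks until rcu_nocb_rdp_deoffload() has run on the target. */
	ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
	cpus_read_unlock();
	mutex_unlock(&rcu_state.barrier_mutex);

so the CPU can't be offlined between the work being queued and its
execution.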

> Regards,
> Boqun
> 
> > Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-11-04 14:45 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 01/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 02/16] rcu: Turn enabled/offload states into a common flag Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 03/16] rcu: Provide basic callback offloading state machine bits Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 04/16] rcu/nocb: Always init segcblist on CPU up Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 05/16] rcu: De-offloading CB kthread Frederic Weisbecker
2020-11-02 13:38   ` Boqun Feng
2020-11-04 14:31     ` Frederic Weisbecker
2020-11-04 14:42       ` Boqun Feng
2020-11-04 14:45         ` Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 06/16] rcu/nocb: Don't deoffload an offline CPU with pending work Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 07/16] rcu: De-offloading GP kthread Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 08/16] rcu: Re-offload support Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 09/16] rcu: Shutdown nocb timer on de-offloading Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 10/16] rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 11/16] rcu: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 12/16] rcu/nocb: Only cond_resched() from actual offloaded batch processing Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 13/16] rcu: Process batch locally as long as offloading isn't complete Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 14/16] rcu: Locally accelerate callbacks " Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 15/16] rcutorture: Test runtime toggling of CPUs' callback offloading Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 16/16] tools/rcutorture: Support nocb toggle in TREE01 Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).