* [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2
@ 2019-08-01 22:50 Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 01/11] rcu/nocb: Rename rcu_data fields to prepare for forward-progress work Paul E. McKenney
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This series reduces the memory footprint of RCU callbacks posted by
no-CBs CPUs by providing separate rcuog kthreads that wait for grace
periods, thus avoiding the current situation in which callbacks are
delayed while the leader rcuo kthread invokes callbacks.  (A rough
userspace sketch of the new division of labor follows the list.)
The patches in this series are as follows:

1.	Rename rcu_data fields to prepare for forward-progress work.

2.	Update comments to prepare for forward-progress work.

3.	Provide separate no-CBs grace-period kthreads.

4.	Rename nocb_follower_wait() to nocb_cb_wait().

5.	Rename wake_nocb_leader() to wake_nocb_gp().

6.	Rename __wake_nocb_leader() to __wake_nocb_gp().

7.	Rename wake_nocb_leader_defer() to wake_nocb_gp_defer().

8.	Rename rcu_organize_nocb_kthreads() local variable.

9.	Rename and document no-CB CB kthread sleep trace event.

10.	Rename rcu_nocb_leader_stride kernel boot parameter.

11.	Print gp/cb kthread hierarchy if dump_tree.
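
As a rough userspace sketch of that division of labor (pthreads,
illustrative only; the real code is in kernel/rcu/tree_plugin.h, and
everything below other than the rcuog/rcuo roles is made up), the
grace-period thread advances pending callbacks after each grace period
and wakes a callback thread, which does nothing but invoke them:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int n_pending = 5; /* CBs posted by a no-CBs CPU, awaiting a GP. */
static int n_ready;       /* CBs whose grace period has elapsed. */

/* Model of the rcuog kthread: waits for a GP, never invokes CBs. */
static void *gp_thread(void *arg)
{
	usleep(1000);                 /* Stand-in for one grace period. */
	pthread_mutex_lock(&lock);
	n_ready += n_pending;         /* Advance CBs to the done list. */
	n_pending = 0;
	pthread_cond_signal(&cv);     /* Wake the CB thread. */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Model of the rcuo CB kthread: does nothing but invoke CBs. */
static void *cb_thread(void *arg)
{
	int n;

	pthread_mutex_lock(&lock);
	while (!n_ready)
		pthread_cond_wait(&cv, &lock);
	n = n_ready;                  /* Pull ready CBs onto a local list. */
	n_ready = 0;
	pthread_mutex_unlock(&lock);
	printf("invoked %d callbacks\n", n); /* Invoke outside the lock. */
	return NULL;
}

int main(void)
{
	pthread_t gp, cb;

	pthread_create(&gp, NULL, gp_thread, NULL);
	pthread_create(&cb, NULL, cb_thread, NULL);
	pthread_join(gp, NULL);
	pthread_join(cb, NULL);
	return 0;
}

Because the gp_thread() stand-in never invokes callbacks, a slow
callback in cb_thread() cannot delay grace-period work, which is the
point of the series.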

							Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   13 -
 include/trace/events/rcu.h                      |    3 
 kernel/rcu/tree.h                               |   28 +-
 kernel/rcu/tree_plugin.h                        |  312 ++++++++++++------------
 4 files changed, 183 insertions(+), 173 deletions(-)


* [PATCH tip/core/rcu 01/11] rcu/nocb: Rename rcu_data fields to prepare for forward-progress work
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 02/11] rcu/nocb: Update comments " Paul E. McKenney
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit simply renames rcu_data fields to prepare for leader
nocb kthreads doing only grace-period work and callback shuffling.
This will mean the addition of replacement kthreads to invoke callbacks.
The terms "leader" and "follower" thus become less meaningful, so this
commit renames the no-CB fields containing these strings to use "gp"
and "cb", respectively.
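
As an illustrative-only userspace sketch (not the kernel code; the
field names below match this patch, everything else is made up), the
renamed linkage fields tie each rcu_data structure to its group's GP
rdp, with each GP rdp chaining through its group's CB rdps:

#include <stdio.h>

struct rcu_data {
	int cpu;
	struct rcu_data *nocb_gp_rdp;      /* This group's GP rdp. */
	struct rcu_data *nocb_next_cb_rdp; /* Next rdp in wakeup chain. */
};

int main(void)
{
	struct rcu_data rdp[4] = { { 0 }, { 1 }, { 2 }, { 3 } };
	int i;

	/* CPUs 0-1 form one group, CPUs 2-3 another (stride of 2). */
	for (i = 0; i < 4; i++) {
		rdp[i].nocb_gp_rdp = &rdp[i & ~1];
		rdp[i].nocb_next_cb_rdp = (i & 1) ? NULL : &rdp[i + 1];
	}
	for (i = 0; i < 4; i++)
		printf("cpu %d: gp rdp is cpu %d\n",
		       i, rdp[i].nocb_gp_rdp->cpu);
	return 0;
}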

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree.h        | 14 ++++----
 kernel/rcu/tree_plugin.h | 78 ++++++++++++++++++++--------------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7acaf3a62d39..e4e59b627c5a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -198,10 +198,10 @@ struct rcu_data {
 	struct rcu_head **nocb_tail;
 	atomic_long_t nocb_q_count;	/* # CBs waiting for nocb */
 	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
-	struct rcu_head *nocb_follower_head; /* CBs ready to invoke. */
-	struct rcu_head **nocb_follower_tail;
+	struct rcu_head *nocb_cb_head;	/* CBs ready to invoke. */
+	struct rcu_head **nocb_cb_tail;
 	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
-	struct task_struct *nocb_kthread;
+	struct task_struct *nocb_cb_kthread;
 	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
 	struct timer_list nocb_timer;	/* Enforce finite deferral. */
@@ -210,12 +210,12 @@ struct rcu_data {
 	struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp;
 					/* CBs waiting for GP. */
 	struct rcu_head **nocb_gp_tail;
-	bool nocb_leader_sleep;		/* Is the nocb leader thread asleep? */
-	struct rcu_data *nocb_next_follower;
-					/* Next follower in wakeup chain. */
+	bool nocb_gp_sleep;		/* Is the nocb leader thread asleep? */
+	struct rcu_data *nocb_next_cb_rdp;
+					/* Next rcu_data in wakeup chain. */
 
 	/* The following fields are used by the follower, hence new cachline. */
-	struct rcu_data *nocb_leader ____cacheline_internodealigned_in_smp;
+	struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
 					/* Leader CPU takes GP-end wakeups. */
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 99e9d952827b..5ce1edd1c87f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1528,19 +1528,19 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
 			       unsigned long flags)
 	__releases(rdp->nocb_lock)
 {
-	struct rcu_data *rdp_leader = rdp->nocb_leader;
+	struct rcu_data *rdp_leader = rdp->nocb_gp_rdp;
 
 	lockdep_assert_held(&rdp->nocb_lock);
-	if (!READ_ONCE(rdp_leader->nocb_kthread)) {
+	if (!READ_ONCE(rdp_leader->nocb_cb_kthread)) {
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
 	}
-	if (rdp_leader->nocb_leader_sleep || force) {
+	if (rdp_leader->nocb_gp_sleep || force) {
 		/* Prior smp_mb__after_atomic() orders against prior enqueue. */
-		WRITE_ONCE(rdp_leader->nocb_leader_sleep, false);
+		WRITE_ONCE(rdp_leader->nocb_gp_sleep, false);
 		del_timer(&rdp->nocb_timer);
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
-		smp_mb(); /* ->nocb_leader_sleep before swake_up_one(). */
+		smp_mb(); /* ->nocb_gp_sleep before swake_up_one(). */
 		swake_up_one(&rdp_leader->nocb_wq);
 	} else {
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
@@ -1604,10 +1604,10 @@ static bool rcu_nocb_cpu_needs_barrier(int cpu)
 	if (!rhp)
 		rhp = READ_ONCE(rdp->nocb_gp_head);
 	if (!rhp)
-		rhp = READ_ONCE(rdp->nocb_follower_head);
+		rhp = READ_ONCE(rdp->nocb_cb_head);
 
 	/* Having no rcuo kthread but CBs after scheduler starts is bad! */
-	if (!READ_ONCE(rdp->nocb_kthread) && rhp &&
+	if (!READ_ONCE(rdp->nocb_cb_kthread) && rhp &&
 	    rcu_scheduler_fully_active) {
 		/* RCU callback enqueued before CPU first came online??? */
 		pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
@@ -1646,7 +1646,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
 
 	/* If we are not being polled and there is a kthread, awaken it ... */
-	t = READ_ONCE(rdp->nocb_kthread);
+	t = READ_ONCE(rdp->nocb_cb_kthread);
 	if (rcu_nocb_poll || !t) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("WakeNotPoll"));
@@ -1800,9 +1800,9 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	if (!rcu_nocb_poll) {
 		trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Sleep"));
 		swait_event_interruptible_exclusive(my_rdp->nocb_wq,
-				!READ_ONCE(my_rdp->nocb_leader_sleep));
+				!READ_ONCE(my_rdp->nocb_gp_sleep));
 		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
-		my_rdp->nocb_leader_sleep = true;
+		my_rdp->nocb_gp_sleep = true;
 		WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 		del_timer(&my_rdp->nocb_timer);
 		raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
@@ -1818,7 +1818,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	 */
 	gotcbs = false;
 	smp_mb(); /* wakeup and _sleep before ->nocb_head reads. */
-	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_follower) {
+	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
 		rdp->nocb_gp_head = READ_ONCE(rdp->nocb_head);
 		if (!rdp->nocb_gp_head)
 			continue;  /* No CBs here, try next follower. */
@@ -1845,12 +1845,12 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	rcu_nocb_wait_gp(my_rdp);
 
 	/* Each pass through the following loop wakes a follower, if needed. */
-	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_follower) {
+	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
 		if (!rcu_nocb_poll &&
 		    READ_ONCE(rdp->nocb_head) &&
-		    READ_ONCE(my_rdp->nocb_leader_sleep)) {
+		    READ_ONCE(my_rdp->nocb_gp_sleep)) {
 			raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
-			my_rdp->nocb_leader_sleep = false;/* No need to sleep.*/
+			my_rdp->nocb_gp_sleep = false;/* No need to sleep.*/
 			raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
 		}
 		if (!rdp->nocb_gp_head)
@@ -1858,18 +1858,18 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 
 		/* Append callbacks to follower's "done" list. */
 		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
-		tail = rdp->nocb_follower_tail;
-		rdp->nocb_follower_tail = rdp->nocb_gp_tail;
+		tail = rdp->nocb_cb_tail;
+		rdp->nocb_cb_tail = rdp->nocb_gp_tail;
 		*tail = rdp->nocb_gp_head;
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
-		if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
+		if (rdp != my_rdp && tail == &rdp->nocb_cb_head) {
 			/* List was empty, so wake up the follower.  */
 			swake_up_one(&rdp->nocb_wq);
 		}
 	}
 
 	/* If we (the leader) don't have CBs, go wait some more. */
-	if (!my_rdp->nocb_follower_head)
+	if (!my_rdp->nocb_cb_head)
 		goto wait_again;
 }
 
@@ -1882,8 +1882,8 @@ static void nocb_follower_wait(struct rcu_data *rdp)
 	for (;;) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
 		swait_event_interruptible_exclusive(rdp->nocb_wq,
-					 READ_ONCE(rdp->nocb_follower_head));
-		if (smp_load_acquire(&rdp->nocb_follower_head)) {
+					 READ_ONCE(rdp->nocb_cb_head));
+		if (smp_load_acquire(&rdp->nocb_cb_head)) {
 			/* ^^^ Ensure CB invocation follows _head test. */
 			return;
 		}
@@ -1910,17 +1910,17 @@ static int rcu_nocb_kthread(void *arg)
 	/* Each pass through this loop invokes one batch of callbacks */
 	for (;;) {
 		/* Wait for callbacks. */
-		if (rdp->nocb_leader == rdp)
+		if (rdp->nocb_gp_rdp == rdp)
 			nocb_leader_wait(rdp);
 		else
 			nocb_follower_wait(rdp);
 
 		/* Pull the ready-to-invoke callbacks onto local list. */
 		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
-		list = rdp->nocb_follower_head;
-		rdp->nocb_follower_head = NULL;
-		tail = rdp->nocb_follower_tail;
-		rdp->nocb_follower_tail = &rdp->nocb_follower_head;
+		list = rdp->nocb_cb_head;
+		rdp->nocb_cb_head = NULL;
+		tail = rdp->nocb_cb_tail;
+		rdp->nocb_cb_tail = &rdp->nocb_cb_head;
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		if (WARN_ON_ONCE(!list))
 			continue;
@@ -2048,7 +2048,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 {
 	rdp->nocb_tail = &rdp->nocb_head;
 	init_swait_queue_head(&rdp->nocb_wq);
-	rdp->nocb_follower_tail = &rdp->nocb_follower_head;
+	rdp->nocb_cb_tail = &rdp->nocb_cb_head;
 	raw_spin_lock_init(&rdp->nocb_lock);
 	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
 }
@@ -2070,27 +2070,27 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
 	 * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
 	 * then nothing to do.
 	 */
-	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_kthread)
+	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_cb_kthread)
 		return;
 
 	/* If we didn't spawn the leader first, reorganize! */
-	rdp_old_leader = rdp_spawn->nocb_leader;
-	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_kthread) {
+	rdp_old_leader = rdp_spawn->nocb_gp_rdp;
+	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_cb_kthread) {
 		rdp_last = NULL;
 		rdp = rdp_old_leader;
 		do {
-			rdp->nocb_leader = rdp_spawn;
+			rdp->nocb_gp_rdp = rdp_spawn;
 			if (rdp_last && rdp != rdp_spawn)
-				rdp_last->nocb_next_follower = rdp;
+				rdp_last->nocb_next_cb_rdp = rdp;
 			if (rdp == rdp_spawn) {
-				rdp = rdp->nocb_next_follower;
+				rdp = rdp->nocb_next_cb_rdp;
 			} else {
 				rdp_last = rdp;
-				rdp = rdp->nocb_next_follower;
-				rdp_last->nocb_next_follower = NULL;
+				rdp = rdp->nocb_next_cb_rdp;
+				rdp_last->nocb_next_cb_rdp = NULL;
 			}
 		} while (rdp);
-		rdp_spawn->nocb_next_follower = rdp_old_leader;
+		rdp_spawn->nocb_next_cb_rdp = rdp_old_leader;
 	}
 
 	/* Spawn the kthread for this CPU. */
@@ -2098,7 +2098,7 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
 			"rcuo%c/%d", rcu_state.abbr, cpu);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo kthread, OOM is now expected behavior\n", __func__))
 		return;
-	WRITE_ONCE(rdp_spawn->nocb_kthread, t);
+	WRITE_ONCE(rdp_spawn->nocb_cb_kthread, t);
 }
 
 /*
@@ -2158,12 +2158,12 @@ static void __init rcu_organize_nocb_kthreads(void)
 		if (rdp->cpu >= nl) {
 			/* New leader, set up for followers & next leader. */
 			nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
-			rdp->nocb_leader = rdp;
+			rdp->nocb_gp_rdp = rdp;
 			rdp_leader = rdp;
 		} else {
 			/* Another follower, link to previous leader. */
-			rdp->nocb_leader = rdp_leader;
-			rdp_prev->nocb_next_follower = rdp;
+			rdp->nocb_gp_rdp = rdp_leader;
+			rdp_prev->nocb_next_cb_rdp = rdp;
 		}
 		rdp_prev = rdp;
 	}
-- 
2.17.1



* [PATCH tip/core/rcu 02/11] rcu/nocb: Update comments to prepare for forward-progress work
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 01/11] rcu/nocb: Rename rcu_data fields to prepare for forward-progress work Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads Paul E. McKenney
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit simply rewords comments to prepare for leader nocb kthreads
doing only grace-period work and callback shuffling.  This will mean
the addition of replacement kthreads to invoke callbacks.  The "leader"
and "follower" thus become less meaningful, so the commit changes no-CB
comments with these strings to "GP" and "CB", respectively.  (Give or
take the usual grammatical transformations.)

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree.h        |  8 +++---
 kernel/rcu/tree_plugin.h | 57 ++++++++++++++++++++--------------------
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e4e59b627c5a..32b3348d3a4d 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -206,17 +206,17 @@ struct rcu_data {
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
 	struct timer_list nocb_timer;	/* Enforce finite deferral. */
 
-	/* The following fields are used by the leader, hence own cacheline. */
+	/* The following fields are used by GP kthread, hence own cacheline. */
 	struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp;
 					/* CBs waiting for GP. */
 	struct rcu_head **nocb_gp_tail;
-	bool nocb_gp_sleep;		/* Is the nocb leader thread asleep? */
+	bool nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
 	struct rcu_data *nocb_next_cb_rdp;
 					/* Next rcu_data in wakeup chain. */
 
-	/* The following fields are used by the follower, hence new cachline. */
+	/* The following fields are used by CB kthread, hence new cachline. */
 	struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
-					/* Leader CPU takes GP-end wakeups. */
+					/* GP rdp takes GP-end wakeups. */
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
 	/* 6) RCU priority boosting. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 5ce1edd1c87f..5a72700c3a32 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1447,10 +1447,10 @@ static void rcu_cleanup_after_idle(void)
  * specified by rcu_nocb_mask.  For the CPUs in the set, there are kthreads
  * created that pull the callbacks from the corresponding CPU, wait for
  * a grace period to elapse, and invoke the callbacks.  These kthreads
- * are organized into leaders, which manage incoming callbacks, wait for
- * grace periods, and awaken followers, and the followers, which only
- * invoke callbacks.  Each leader is its own follower.  The no-CBs CPUs
- * do a wake_up() on their kthread when they insert a callback into any
+ * are organized into GP kthreads, which manage incoming callbacks, wait for
+ * grace periods, and awaken CB kthreads, and the CB kthreads, which only
+ * invoke callbacks.  Each GP kthread invokes its own CBs.  The no-CBs CPUs
+ * do a wake_up() on their GP kthread when they insert a callback into any
  * empty list, unless the rcu_nocb_poll boot parameter has been specified,
  * in which case each kthread actively polls its CPU.  (Which isn't so great
  * for energy efficiency, but which does reduce RCU's overhead on that CPU.)
@@ -1521,7 +1521,7 @@ bool rcu_is_nocb_cpu(int cpu)
 }
 
 /*
- * Kick the leader kthread for this NOCB group.  Caller holds ->nocb_lock
+ * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
  * and this function releases it.
  */
 static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
@@ -1548,7 +1548,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
 }
 
 /*
- * Kick the leader kthread for this NOCB group, but caller has not
+ * Kick the GP kthread for this NOCB group, but caller has not
  * acquired locks.
  */
 static void wake_nocb_leader(struct rcu_data *rdp, bool force)
@@ -1560,8 +1560,8 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
 }
 
 /*
- * Arrange to wake the leader kthread for this NOCB group at some
- * future time when it is safe to do so.
+ * Arrange to wake the GP kthread for this NOCB group at some future
+ * time when it is safe to do so.
  */
 static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype,
 				   const char *reason)
@@ -1783,7 +1783,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 }
 
 /*
- * Leaders come here to wait for additional callbacks to show up.
+ * No-CBs GP kthreads come here to wait for additional callbacks to show up.
  * This function does not return until callbacks appear.
  */
 static void nocb_leader_wait(struct rcu_data *my_rdp)
@@ -1812,8 +1812,8 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	}
 
 	/*
-	 * Each pass through the following loop checks a follower for CBs.
-	 * We are our own first follower.  Any CBs found are moved to
+	 * Each pass through the following loop checks for CBs.
+	 * We are our own first CB kthread.  Any CBs found are moved to
 	 * nocb_gp_head, where they await a grace period.
 	 */
 	gotcbs = false;
@@ -1821,7 +1821,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
 		rdp->nocb_gp_head = READ_ONCE(rdp->nocb_head);
 		if (!rdp->nocb_gp_head)
-			continue;  /* No CBs here, try next follower. */
+			continue;  /* No CBs here, try next. */
 
 		/* Move callbacks to wait-for-GP list, which is empty. */
 		WRITE_ONCE(rdp->nocb_head, NULL);
@@ -1844,7 +1844,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	/* Wait for one grace period. */
 	rcu_nocb_wait_gp(my_rdp);
 
-	/* Each pass through the following loop wakes a follower, if needed. */
+	/* Each pass through this loop wakes a CB kthread, if needed. */
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
 		if (!rcu_nocb_poll &&
 		    READ_ONCE(rdp->nocb_head) &&
@@ -1854,27 +1854,27 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 			raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
 		}
 		if (!rdp->nocb_gp_head)
-			continue; /* No CBs, so no need to wake follower. */
+			continue; /* No CBs, so no need to wake kthread. */
 
-		/* Append callbacks to follower's "done" list. */
+		/* Append callbacks to CB kthread's "done" list. */
 		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 		tail = rdp->nocb_cb_tail;
 		rdp->nocb_cb_tail = rdp->nocb_gp_tail;
 		*tail = rdp->nocb_gp_head;
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		if (rdp != my_rdp && tail == &rdp->nocb_cb_head) {
-			/* List was empty, so wake up the follower.  */
+			/* List was empty, so wake up the kthread.  */
 			swake_up_one(&rdp->nocb_wq);
 		}
 	}
 
-	/* If we (the leader) don't have CBs, go wait some more. */
+	/* If we (the GP kthreads) don't have CBs, go wait some more. */
 	if (!my_rdp->nocb_cb_head)
 		goto wait_again;
 }
 
 /*
- * Followers come here to wait for additional callbacks to show up.
+ * No-CBs CB kthreads come here to wait for additional callbacks to show up.
  * This function does not return until callbacks appear.
  */
 static void nocb_follower_wait(struct rcu_data *rdp)
@@ -1894,9 +1894,10 @@ static void nocb_follower_wait(struct rcu_data *rdp)
 
 /*
  * Per-rcu_data kthread, but only for no-CBs CPUs.  Each kthread invokes
- * callbacks queued by the corresponding no-CBs CPU, however, there is
- * an optional leader-follower relationship so that the grace-period
- * kthreads don't have to do quite so many wakeups.
+ * callbacks queued by the corresponding no-CBs CPU, however, there is an
+ * optional GP-CB relationship so that the grace-period kthreads don't
+ * have to do quite so many wakeups (as in they only need to wake the
+ * no-CBs GP kthreads, not the CB kthreads).
  */
 static int rcu_nocb_kthread(void *arg)
 {
@@ -2056,7 +2057,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 /*
  * If the specified CPU is a no-CBs CPU that does not already have its
  * rcuo kthread, spawn it.  If the CPUs are brought online out of order,
- * this can require re-organizing the leader-follower relationships.
+ * this can require re-organizing the GP-CB relationships.
  */
 static void rcu_spawn_one_nocb_kthread(int cpu)
 {
@@ -2073,7 +2074,7 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
 	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_cb_kthread)
 		return;
 
-	/* If we didn't spawn the leader first, reorganize! */
+	/* If we didn't spawn the GP kthread first, reorganize! */
 	rdp_old_leader = rdp_spawn->nocb_gp_rdp;
 	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_cb_kthread) {
 		rdp_last = NULL;
@@ -2125,18 +2126,18 @@ static void __init rcu_spawn_nocb_kthreads(void)
 		rcu_spawn_cpu_nocb_kthread(cpu);
 }
 
-/* How many follower CPU IDs per leader?  Default of -1 for sqrt(nr_cpu_ids). */
+/* How many CB CPU IDs per GP kthread?  Default of -1 for sqrt(nr_cpu_ids). */
 static int rcu_nocb_leader_stride = -1;
 module_param(rcu_nocb_leader_stride, int, 0444);
 
 /*
- * Initialize leader-follower relationships for all no-CBs CPU.
+ * Initialize GP-CB relationships for all no-CBs CPU.
  */
 static void __init rcu_organize_nocb_kthreads(void)
 {
 	int cpu;
 	int ls = rcu_nocb_leader_stride;
-	int nl = 0;  /* Next leader. */
+	int nl = 0;  /* Next GP kthread. */
 	struct rcu_data *rdp;
 	struct rcu_data *rdp_leader = NULL;  /* Suppress misguided gcc warn. */
 	struct rcu_data *rdp_prev = NULL;
@@ -2156,12 +2157,12 @@ static void __init rcu_organize_nocb_kthreads(void)
 	for_each_cpu(cpu, rcu_nocb_mask) {
 		rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (rdp->cpu >= nl) {
-			/* New leader, set up for followers & next leader. */
+			/* New GP kthread, set up for CBs & next GP. */
 			nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
 			rdp->nocb_gp_rdp = rdp;
 			rdp_leader = rdp;
 		} else {
-			/* Another follower, link to previous leader. */
+			/* Another CB kthread, link to previous GP kthread. */
 			rdp->nocb_gp_rdp = rdp_leader;
 			rdp_prev->nocb_next_cb_rdp = rdp;
 		}
-- 
2.17.1



* [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 01/11] rcu/nocb: Rename rcu_data fields to prepare for forward-progress work Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 02/11] rcu/nocb: Update comments " Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-03 17:41   ` Joel Fernandes
  2019-08-01 22:50 ` [PATCH tip/core/rcu 04/11] rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait() Paul E. McKenney
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
are divided into groups.  The first rcuo kthread to come online in a
given group is that group's leader, and the leader both waits for grace
periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
only invoke callbacks.

This works well in the real-time/embedded environments for which it was
intended because such environments tend not to generate all that many
callbacks.  However, given huge floods of callbacks, it is possible for
the leader kthread to be stuck invoking callbacks while its followers
wait helplessly as their own callbacks pile up.  This is a good recipe
for an OOM, and rcutorture's new callback-flood capability does generate
such OOMs.

One strategy would be to wait until such OOMs start happening in
production, but similar OOMs have in fact happened starting in 2018.
It would therefore be wise to take a more proactive approach.

This commit therefore provides per-CPU rcuo kthreads that do nothing
but invoke callbacks.  Instead of having one of these kthreads act as
leader, each group has a separate rcuog kthread that handles grace
periods for its group.  Because these rcuog kthreads do not invoke
callbacks, callback floods on one CPU no longer block callbacks from
reaching the rcuo callback-invocation kthreads on other CPUs.

This change does introduce additional kthreads, however:

1.	The number of additional kthreads is about the square root of
	the number of CPUs, so that a 4096-CPU system would have only
	about 64 additional kthreads (see the arithmetic sketch after
	this list).  Note that recent changes decreased the number of
	rcuo kthreads by a factor of two (CONFIG_PREEMPT=n) or even
	three (CONFIG_PREEMPT=y), so this still represents a
	significant improvement on most systems.

2.	The leading "rcuo" of the rcuog kthreads should allow existing
	scripting to affinity these additional kthreads as needed, the
	same as for the rcuop and rcuos kthreads.  (There are no longer
	any rcuob kthreads.)

3.	A state-machine approach was considered and rejected.  Although
	this would allow the rcuo kthreads to continue their dual
	leader/follower roles, it complicates callback invocation
	and makes it more difficult to consolidate rcuo callback
	invocation with existing softirq callback invocation.

The introduction of rcuog kthreads should thus be acceptable.
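
As a back-of-the-envelope sketch of that grouping arithmetic
(illustrative only; the kernel itself uses int_sqrt() and
DIV_ROUND_UP() in rcu_organize_nocb_kthreads(), and the CPU counts
below are made up), the default stride yields roughly sqrt(nr_cpu_ids)
rcuog kthreads:

#include <math.h>
#include <stdio.h>

int main(void)
{
	int cpus[] = { 8, 64, 4096 };
	int i;

	for (i = 0; i < (int)(sizeof(cpus) / sizeof(cpus[0])); i++) {
		int nr_cpu_ids = cpus[i];
		int ls = (int)sqrt(nr_cpu_ids);          /* Default stride. */
		int groups = (nr_cpu_ids + ls - 1) / ls; /* DIV_ROUND_UP(). */

		printf("%4d CPUs: stride %2d -> %2d rcuog kthreads\n",
		       nr_cpu_ids, ls, groups);
	}
	return 0;
}

Compiled with the math library (-lm), this prints four rcuog kthreads
for 8 CPUs, eight for 64 CPUs, and 64 for 4096 CPUs, matching the
square-root estimate in item 1 above.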

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree.h        |   6 +-
 kernel/rcu/tree_plugin.h | 115 +++++++++++++++++++--------------------
 2 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 32b3348d3a4d..dc3c53cb9608 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -200,8 +200,8 @@ struct rcu_data {
 	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
 	struct rcu_head *nocb_cb_head;	/* CBs ready to invoke. */
 	struct rcu_head **nocb_cb_tail;
-	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
-	struct task_struct *nocb_cb_kthread;
+	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
+	struct task_struct *nocb_gp_kthread;
 	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
 	struct timer_list nocb_timer;	/* Enforce finite deferral. */
@@ -211,6 +211,8 @@ struct rcu_data {
 					/* CBs waiting for GP. */
 	struct rcu_head **nocb_gp_tail;
 	bool nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
+	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
+	struct task_struct *nocb_cb_kthread;
 	struct rcu_data *nocb_next_cb_rdp;
 					/* Next rcu_data in wakeup chain. */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 5a72700c3a32..c3b6493313ab 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1531,7 +1531,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
 	struct rcu_data *rdp_leader = rdp->nocb_gp_rdp;
 
 	lockdep_assert_held(&rdp->nocb_lock);
-	if (!READ_ONCE(rdp_leader->nocb_cb_kthread)) {
+	if (!READ_ONCE(rdp_leader->nocb_gp_kthread)) {
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
 	}
@@ -1541,7 +1541,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
 		del_timer(&rdp->nocb_timer);
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		smp_mb(); /* ->nocb_gp_sleep before swake_up_one(). */
-		swake_up_one(&rdp_leader->nocb_wq);
+		swake_up_one(&rdp_leader->nocb_gp_wq);
 	} else {
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	}
@@ -1646,7 +1646,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
 
 	/* If we are not being polled and there is a kthread, awaken it ... */
-	t = READ_ONCE(rdp->nocb_cb_kthread);
+	t = READ_ONCE(rdp->nocb_gp_kthread);
 	if (rcu_nocb_poll || !t) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 				    TPS("WakeNotPoll"));
@@ -1786,7 +1786,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
  * No-CBs GP kthreads come here to wait for additional callbacks to show up.
  * This function does not return until callbacks appear.
  */
-static void nocb_leader_wait(struct rcu_data *my_rdp)
+static void nocb_gp_wait(struct rcu_data *my_rdp)
 {
 	bool firsttime = true;
 	unsigned long flags;
@@ -1794,12 +1794,10 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	struct rcu_data *rdp;
 	struct rcu_head **tail;
 
-wait_again:
-
 	/* Wait for callbacks to appear. */
 	if (!rcu_nocb_poll) {
 		trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Sleep"));
-		swait_event_interruptible_exclusive(my_rdp->nocb_wq,
+		swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
 				!READ_ONCE(my_rdp->nocb_gp_sleep));
 		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
 		my_rdp->nocb_gp_sleep = true;
@@ -1838,7 +1836,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 			trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu,
 					    TPS("WokeEmpty"));
 		}
-		goto wait_again;
+		return;
 	}
 
 	/* Wait for one grace period. */
@@ -1862,34 +1860,47 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 		rdp->nocb_cb_tail = rdp->nocb_gp_tail;
 		*tail = rdp->nocb_gp_head;
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
-		if (rdp != my_rdp && tail == &rdp->nocb_cb_head) {
+		if (tail == &rdp->nocb_cb_head) {
 			/* List was empty, so wake up the kthread.  */
-			swake_up_one(&rdp->nocb_wq);
+			swake_up_one(&rdp->nocb_cb_wq);
 		}
 	}
+}
 
-	/* If we (the GP kthreads) don't have CBs, go wait some more. */
-	if (!my_rdp->nocb_cb_head)
-		goto wait_again;
+/*
+ * No-CBs grace-period-wait kthread.  There is one of these per group
+ * of CPUs, but only once at least one CPU in that group has come online
+ * at least once since boot.  This kthread checks for newly posted
+ * callbacks from any of the CPUs it is responsible for, waits for a
+ * grace period, then awakens all of the rcu_nocb_cb_kthread() instances
+ * that then have callback-invocation work to do.
+ */
+static int rcu_nocb_gp_kthread(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	for (;;)
+		nocb_gp_wait(rdp);
+	return 0;
 }
 
 /*
  * No-CBs CB kthreads come here to wait for additional callbacks to show up.
- * This function does not return until callbacks appear.
+ * This function returns true ("keep waiting") until callbacks appear and
+ * then false ("stop waiting") when callbacks finally do appear.
  */
-static void nocb_follower_wait(struct rcu_data *rdp)
+static bool nocb_follower_wait(struct rcu_data *rdp)
 {
-	for (;;) {
-		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
-		swait_event_interruptible_exclusive(rdp->nocb_wq,
-					 READ_ONCE(rdp->nocb_cb_head));
-		if (smp_load_acquire(&rdp->nocb_cb_head)) {
-			/* ^^^ Ensure CB invocation follows _head test. */
-			return;
-		}
-		WARN_ON(signal_pending(current));
-		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
+	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
+				 READ_ONCE(rdp->nocb_cb_head));
+	if (smp_load_acquire(&rdp->nocb_cb_head)) { /* VVV */
+		/* ^^^ Ensure CB invocation follows _head test. */
+		return false;
 	}
+	WARN_ON(signal_pending(current));
+	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+	return true;
 }
 
 /*
@@ -1899,7 +1910,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
  * have to do quite so many wakeups (as in they only need to wake the
  * no-CBs GP kthreads, not the CB kthreads).
  */
-static int rcu_nocb_kthread(void *arg)
+static int rcu_nocb_cb_kthread(void *arg)
 {
 	int c, cl;
 	unsigned long flags;
@@ -1911,10 +1922,8 @@ static int rcu_nocb_kthread(void *arg)
 	/* Each pass through this loop invokes one batch of callbacks */
 	for (;;) {
 		/* Wait for callbacks. */
-		if (rdp->nocb_gp_rdp == rdp)
-			nocb_leader_wait(rdp);
-		else
-			nocb_follower_wait(rdp);
+		while (nocb_follower_wait(rdp))
+			continue;
 
 		/* Pull the ready-to-invoke callbacks onto local list. */
 		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
@@ -2048,7 +2057,8 @@ void __init rcu_init_nohz(void)
 static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 {
 	rdp->nocb_tail = &rdp->nocb_head;
-	init_swait_queue_head(&rdp->nocb_wq);
+	init_swait_queue_head(&rdp->nocb_cb_wq);
+	init_swait_queue_head(&rdp->nocb_gp_wq);
 	rdp->nocb_cb_tail = &rdp->nocb_cb_head;
 	raw_spin_lock_init(&rdp->nocb_lock);
 	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
@@ -2056,50 +2066,39 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 
 /*
  * If the specified CPU is a no-CBs CPU that does not already have its
- * rcuo kthread, spawn it.  If the CPUs are brought online out of order,
- * this can require re-organizing the GP-CB relationships.
+ * rcuo CB kthread, spawn it.  Additionally, if the rcuo GP kthread
+ * for this CPU's group has not yet been created, spawn it as well.
  */
 static void rcu_spawn_one_nocb_kthread(int cpu)
 {
-	struct rcu_data *rdp;
-	struct rcu_data *rdp_last;
-	struct rcu_data *rdp_old_leader;
-	struct rcu_data *rdp_spawn = per_cpu_ptr(&rcu_data, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct rcu_data *rdp_gp;
 	struct task_struct *t;
 
 	/*
 	 * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
 	 * then nothing to do.
 	 */
-	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_cb_kthread)
+	if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
 		return;
 
 	/* If we didn't spawn the GP kthread first, reorganize! */
-	rdp_old_leader = rdp_spawn->nocb_gp_rdp;
-	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_cb_kthread) {
-		rdp_last = NULL;
-		rdp = rdp_old_leader;
-		do {
-			rdp->nocb_gp_rdp = rdp_spawn;
-			if (rdp_last && rdp != rdp_spawn)
-				rdp_last->nocb_next_cb_rdp = rdp;
-			if (rdp == rdp_spawn) {
-				rdp = rdp->nocb_next_cb_rdp;
-			} else {
-				rdp_last = rdp;
-				rdp = rdp->nocb_next_cb_rdp;
-				rdp_last->nocb_next_cb_rdp = NULL;
-			}
-		} while (rdp);
-		rdp_spawn->nocb_next_cb_rdp = rdp_old_leader;
+	rdp_gp = rdp->nocb_gp_rdp;
+	if (!rdp_gp->nocb_gp_kthread) {
+		t = kthread_run(rcu_nocb_gp_kthread, rdp_gp,
+				"rcuog/%d", rdp_gp->cpu);
+		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__))
+			return;
+		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
 	}
 
 	/* Spawn the kthread for this CPU. */
-	t = kthread_run(rcu_nocb_kthread, rdp_spawn,
+	t = kthread_run(rcu_nocb_cb_kthread, rdp,
 			"rcuo%c/%d", rcu_state.abbr, cpu);
-	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo kthread, OOM is now expected behavior\n", __func__))
+	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
 		return;
-	WRITE_ONCE(rdp_spawn->nocb_cb_kthread, t);
+	WRITE_ONCE(rdp->nocb_cb_kthread, t);
+	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
 }
 
 /*
-- 
2.17.1



* [PATCH tip/core/rcu 04/11] rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait()
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 05/11] rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp() Paul E. McKenney
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree_plugin.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c3b6493313ab..9d5448217bbc 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1889,7 +1889,7 @@ static int rcu_nocb_gp_kthread(void *arg)
  * This function returns true ("keep waiting") until callbacks appear and
  * then false ("stop waiting") when callbacks finally do appear.
  */
-static bool nocb_follower_wait(struct rcu_data *rdp)
+static bool nocb_cb_wait(struct rcu_data *rdp)
 {
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
@@ -1922,7 +1922,7 @@ static int rcu_nocb_cb_kthread(void *arg)
 	/* Each pass through this loop invokes one batch of callbacks */
 	for (;;) {
 		/* Wait for callbacks. */
-		while (nocb_follower_wait(rdp))
+		while (nocb_cb_wait(rdp))
 			continue;
 
 		/* Pull the ready-to-invoke callbacks onto local list. */
-- 
2.17.1



* [PATCH tip/core/rcu 05/11] rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp()
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 04/11] rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait() Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 06/11] rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp() Paul E. McKenney
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree_plugin.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 9d5448217bbc..632c2cfb9856 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1551,7 +1551,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
  * Kick the GP kthread for this NOCB group, but caller has not
  * acquired locks.
  */
-static void wake_nocb_leader(struct rcu_data *rdp, bool force)
+static void wake_nocb_gp(struct rcu_data *rdp, bool force)
 {
 	unsigned long flags;
 
@@ -1656,7 +1656,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 	if (old_rhpp == &rdp->nocb_head) {
 		if (!irqs_disabled_flags(flags)) {
 			/* ... if queue was empty ... */
-			wake_nocb_leader(rdp, false);
+			wake_nocb_gp(rdp, false);
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("WakeEmpty"));
 		} else {
@@ -1667,7 +1667,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
 		/* ... or if many callbacks queued. */
 		if (!irqs_disabled_flags(flags)) {
-			wake_nocb_leader(rdp, true);
+			wake_nocb_gp(rdp, true);
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("WakeOvf"));
 		} else {
-- 
2.17.1



* [PATCH tip/core/rcu 06/11] rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp()
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 05/11] rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp() Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 07/11] rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer() Paul E. McKenney
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.  While in the area, it also
renames the rdp_leader local variable to rdp_gp.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree_plugin.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 632c2cfb9856..7c7870da234a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1524,24 +1524,24 @@ bool rcu_is_nocb_cpu(int cpu)
  * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
  * and this function releases it.
  */
-static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
-			       unsigned long flags)
+static void __wake_nocb_gp(struct rcu_data *rdp, bool force,
+			   unsigned long flags)
 	__releases(rdp->nocb_lock)
 {
-	struct rcu_data *rdp_leader = rdp->nocb_gp_rdp;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
 	lockdep_assert_held(&rdp->nocb_lock);
-	if (!READ_ONCE(rdp_leader->nocb_gp_kthread)) {
+	if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) {
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
 	}
-	if (rdp_leader->nocb_gp_sleep || force) {
+	if (rdp_gp->nocb_gp_sleep || force) {
 		/* Prior smp_mb__after_atomic() orders against prior enqueue. */
-		WRITE_ONCE(rdp_leader->nocb_gp_sleep, false);
+		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
 		del_timer(&rdp->nocb_timer);
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		smp_mb(); /* ->nocb_gp_sleep before swake_up_one(). */
-		swake_up_one(&rdp_leader->nocb_gp_wq);
+		swake_up_one(&rdp_gp->nocb_gp_wq);
 	} else {
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	}
@@ -1556,7 +1556,7 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
-	__wake_nocb_leader(rdp, force, flags);
+	__wake_nocb_gp(rdp, force, flags);
 }
 
 /*
@@ -1988,7 +1988,7 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
 	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-	__wake_nocb_leader(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
+	__wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
 }
 
-- 
2.17.1



* [PATCH tip/core/rcu 07/11] rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer()
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 06/11] rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp() Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 08/11] rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable Paul E. McKenney
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree_plugin.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7c7870da234a..e6581a51ff9a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1563,8 +1563,8 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force)
  * Arrange to wake the GP kthread for this NOCB group at some future
  * time when it is safe to do so.
  */
-static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype,
-				   const char *reason)
+static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
+			       const char *reason)
 {
 	unsigned long flags;
 
@@ -1660,8 +1660,8 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("WakeEmpty"));
 		} else {
-			wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE,
-					       TPS("WakeEmptyIsDeferred"));
+			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
+					   TPS("WakeEmptyIsDeferred"));
 		}
 		rdp->qlen_last_fqs_check = 0;
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
@@ -1671,8 +1671,8 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("WakeOvf"));
 		} else {
-			wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE_FORCE,
-					       TPS("WakeOvfIsDeferred"));
+			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
+					   TPS("WakeOvfIsDeferred"));
 		}
 		rdp->qlen_last_fqs_check = LONG_MAX / 2;
 	} else {
-- 
2.17.1



* [PATCH tip/core/rcu 08/11] rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 07/11] rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer() Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 09/11] rcu/nocb: Rename and document no-CB CB kthread sleep trace event Paul E. McKenney
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit renames rdp_leader to rdp_gp in order to account for the
new distinction between callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree_plugin.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e6581a51ff9a..0af36e98e70f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2138,7 +2138,7 @@ static void __init rcu_organize_nocb_kthreads(void)
 	int ls = rcu_nocb_leader_stride;
 	int nl = 0;  /* Next GP kthread. */
 	struct rcu_data *rdp;
-	struct rcu_data *rdp_leader = NULL;  /* Suppress misguided gcc warn. */
+	struct rcu_data *rdp_gp = NULL;  /* Suppress misguided gcc warn. */
 	struct rcu_data *rdp_prev = NULL;
 
 	if (!cpumask_available(rcu_nocb_mask))
@@ -2159,10 +2159,10 @@ static void __init rcu_organize_nocb_kthreads(void)
 			/* New GP kthread, set up for CBs & next GP. */
 			nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
 			rdp->nocb_gp_rdp = rdp;
-			rdp_leader = rdp;
+			rdp_gp = rdp;
 		} else {
 			/* Another CB kthread, link to previous GP kthread. */
-			rdp->nocb_gp_rdp = rdp_leader;
+			rdp->nocb_gp_rdp = rdp_gp;
 			rdp_prev->nocb_next_cb_rdp = rdp;
 		}
 		rdp_prev = rdp;
-- 
2.17.1



* [PATCH tip/core/rcu 09/11] rcu/nocb: Rename and document no-CB CB kthread sleep trace event
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 08/11] rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 10/11] rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 11/11] rcu/nocb: Print gp/cb kthread hierarchy if dump_tree Paul E. McKenney
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

The nocb_cb_wait() function traces a "FollowerSleep" trace_rcu_nocb_wake()
event, which was never documented and is now misleading.  This commit
therefore changes "FollowerSleep" to "CBSleep", documents this, and
updates the documentation for "Sleep" as well.
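
For example (hypothetical session; assumes CONFIG_RCU_TRACE=y, a
kernel booted with rcu_nocbs=, and a mounted tracefs), the renamed
string then shows up in the rcu_nocb_wake trace event:

	echo 1 > /sys/kernel/debug/tracing/events/rcu/rcu_nocb_wake/enable
	grep CBSleep /sys/kernel/debug/tracing/trace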

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 include/trace/events/rcu.h | 3 ++-
 kernel/rcu/tree_plugin.h   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 02a3f78f7cd8..313324d1b135 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -267,7 +267,8 @@ TRACE_EVENT_RCU(rcu_exp_funnel_lock,
  *	"WakeNotPoll": Don't wake rcuo kthread because it is polling.
  *	"DeferredWake": Carried out the "IsDeferred" wakeup.
  *	"Poll": Start of new polling cycle for rcu_nocb_poll.
- *	"Sleep": Sleep waiting for CBs for !rcu_nocb_poll.
+ *	"Sleep": Sleep waiting for GP for !rcu_nocb_poll.
+ *	"CBSleep": Sleep waiting for CBs for !rcu_nocb_poll.
  *	"WokeEmpty": rcuo kthread woke to find empty list.
  *	"WokeNonEmpty": rcuo kthread woke to find non-empty list.
  *	"WaitQueue": Enqueue partially done, timed wait for it to complete.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0af36e98e70f..be065aacd63b 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1891,7 +1891,7 @@ static int rcu_nocb_gp_kthread(void *arg)
  */
 static bool nocb_cb_wait(struct rcu_data *rdp)
 {
-	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
+	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
 				 READ_ONCE(rdp->nocb_cb_head));
 	if (smp_load_acquire(&rdp->nocb_cb_head)) { /* VVV */
-- 
2.17.1



* [PATCH tip/core/rcu 10/11] rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 09/11] rcu/nocb: Rename and document no-CB CB kthread sleep trace event Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  2019-08-01 22:50 ` [PATCH tip/core/rcu 11/11] rcu/nocb: Print gp/cb kthread hierarchy if dump_tree Paul E. McKenney
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit changes the name of the rcu_nocb_leader_stride kernel
boot parameter to rcu_nocb_gp_stride in order to account for the new
distinction between callback and grace-period no-CBs kthreads.
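
For example (hypothetical values), booting a 256-CPU system with:

	rcu_nocbs=0-255 rcutree.rcu_nocb_gp_stride=16

would organize the rcuo CB kthreads into sixteen groups of sixteen
CPUs, each group served by its own rcuog grace-period kthread.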

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 13 +++++++------
 kernel/rcu/tree_plugin.h                        |  8 ++++----
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f3fcd6140ee1..79b983bedcaa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3837,12 +3837,13 @@
 			RCU_BOOST is not set, valid values are 0-99 and
 			the default is zero (non-realtime operation).
 
-	rcutree.rcu_nocb_leader_stride= [KNL]
-			Set the number of NOCB kthread groups, which
-			defaults to the square root of the number of
-			CPUs.  Larger numbers reduces the wakeup overhead
-			on the per-CPU grace-period kthreads, but increases
-			that same overhead on each group's leader.
+	rcutree.rcu_nocb_gp_stride= [KNL]
+			Set the number of NOCB callback kthreads in
+			each group, which defaults to the square root
+			of the number of CPUs.	Larger numbers reduce
+			the wakeup overhead on the global grace-period
+			kthread, but increases that same overhead on
+			each group's NOCB grace-period kthread.
 
 	rcutree.qhimark= [KNL]
 			Set threshold of queued RCU callbacks beyond which
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index be065aacd63b..80b27a9f306d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2126,8 +2126,8 @@ static void __init rcu_spawn_nocb_kthreads(void)
 }
 
 /* How many CB CPU IDs per GP kthread?  Default of -1 for sqrt(nr_cpu_ids). */
-static int rcu_nocb_leader_stride = -1;
-module_param(rcu_nocb_leader_stride, int, 0444);
+static int rcu_nocb_gp_stride = -1;
+module_param(rcu_nocb_gp_stride, int, 0444);
 
 /*
  * Initialize GP-CB relationships for all no-CBs CPU.
@@ -2135,7 +2135,7 @@ module_param(rcu_nocb_leader_stride, int, 0444);
 static void __init rcu_organize_nocb_kthreads(void)
 {
 	int cpu;
-	int ls = rcu_nocb_leader_stride;
+	int ls = rcu_nocb_gp_stride;
 	int nl = 0;  /* Next GP kthread. */
 	struct rcu_data *rdp;
 	struct rcu_data *rdp_gp = NULL;  /* Suppress misguided gcc warn. */
@@ -2145,7 +2145,7 @@ static void __init rcu_organize_nocb_kthreads(void)
 		return;
 	if (ls == -1) {
 		ls = int_sqrt(nr_cpu_ids);
-		rcu_nocb_leader_stride = ls;
+		rcu_nocb_gp_stride = ls;
 	}
 
 	/*
-- 
2.17.1



* [PATCH tip/core/rcu 11/11] rcu/nocb: Print gp/cb kthread hierarchy if dump_tree
  2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2019-08-01 22:50 ` [PATCH tip/core/rcu 10/11] rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter Paul E. McKenney
@ 2019-08-01 22:50 ` Paul E. McKenney
  10 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-01 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

This commit causes the no-CBs grace-period/callback hierarchy to be
printed to the console when the dump_tree kernel boot parameter is set.
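
Assuming an eight-CPU system with all CPUs offloaded and a GP stride
of four, the resulting output would look roughly like the following
(derived from the pr_alert()/pr_cont() format strings below; exact
console rendering may differ):

	rcu_organize_nocb_kthreads: No-CB GP kthread CPU 0: 1 2 3
	rcu_organize_nocb_kthreads: No-CB GP kthread CPU 4: 5 6 7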

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/rcu/tree_plugin.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 80b27a9f306d..0a3f8680b450 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2135,6 +2135,7 @@ module_param(rcu_nocb_gp_stride, int, 0444);
 static void __init rcu_organize_nocb_kthreads(void)
 {
 	int cpu;
+	bool firsttime = true;
 	int ls = rcu_nocb_gp_stride;
 	int nl = 0;  /* Next GP kthread. */
 	struct rcu_data *rdp;
@@ -2160,10 +2161,15 @@ static void __init rcu_organize_nocb_kthreads(void)
 			nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
 			rdp->nocb_gp_rdp = rdp;
 			rdp_gp = rdp;
+			if (!firsttime && dump_tree)
+				pr_cont("\n");
+			firsttime = false;
+			pr_alert("%s: No-CB GP kthread CPU %d:", __func__, cpu);
 		} else {
 			/* Another CB kthread, link to previous GP kthread. */
 			rdp->nocb_gp_rdp = rdp_gp;
 			rdp_prev->nocb_next_cb_rdp = rdp;
+			pr_alert(" %d", cpu);
 		}
 		rdp_prev = rdp;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads
  2019-08-01 22:50 ` [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads Paul E. McKenney
@ 2019-08-03 17:41   ` Joel Fernandes
  2019-08-03 19:46     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Joel Fernandes @ 2019-08-03 17:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg

On Thu, Aug 01, 2019 at 03:50:20PM -0700, Paul E. McKenney wrote:
> Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
> are divided into groups.  The first rcuo kthread to come online in a
> given group is that group's leader, and the leader both waits for grace
> periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
> only invoke callbacks.
> 
> This works well in the real-time/embedded environments for which it was
> intended because such environments tend not to generate all that many
> callbacks.  However, given huge floods of callbacks, it is possible for
> the leader kthread to be stuck invoking callbacks while its followers
> wait helplessly while their callbacks pile up.  This is a good recipe
> for an OOM, and rcutorture's new callback-flood capability does generate
> such OOMs.
> 
> One strategy would be to wait until such OOMs start happening in
> production, but similar OOMs have in fact happened starting in 2018.
> It would therefore be wise to take a more proactive approach.

I haven't looked much into nocbs/nohz_full stuff (yet). In particular, I did
not even know that the rcuo threads do grace-period life-cycle management and
waiting; I thought only the RCU GP threads did :-/. However, it seems this is
a completely separate grace-period management state machine outside of the
RCU GP thread, right?

I was wondering, for this patch, could we also just have the rcuo
leader handle both callback execution and waking other non-leader threads at
the same time? So like, execute a few callbacks, then do the wakeup of the
non-leaders to execute their callbacks, then get back to executing their own
callbacks, etc. That way we don't need a separate rcuog thread to wait for
the grace period; would that not work?

If you don't mind, could you share with me a kvm.sh command (with the config,
boot parameters, etc.) that can produce the OOM without this patch? I'd
like to take a closer look at it.

Is there also a short answer for why the RCU GP thread cannot do the job of
these new rcuog threads?

thanks a lot,

 - Joel


> This commit therefore features per-CPU rcuo kthreads that do nothing
> but invoke callbacks.  Instead of having one of these kthreads act as
> leader, each group has a separate rcuog kthread that handles grace periods
> for its group.  Because these rcuog kthreads do not invoke callbacks,
> callback floods on one CPU no longer block callbacks from reaching the
> rcuc callback-invocation kthreads on other CPUs.
> 
> This change does introduce additional kthreads, however:
> 
> 1.	The number of additional kthreads is about the square root of
> 	the number of CPUs, so that a 4096-CPU system would have only
> 	about 64 additional kthreads.  Note that recent changes
> 	decreased the number of rcuo kthreads by a factor of two
> 	(CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so
> 	this still represents a significant improvement on most systems.
> 
> 2.	The leading "rcuo" of the rcuog kthreads should allow existing
> 	scripting to affinity these additional kthreads as needed, the
> 	same as for the rcuop and rcuos kthreads.  (There are no longer
> 	any rcuob kthreads.)
> 
> 3.	A state-machine approach was considered and rejected.  Although
> 	this would allow the rcuo kthreads to continue their dual
> 	leader/follower roles, it complicates callback invocation
> 	and makes it more difficult to consolidate rcuo callback
> 	invocation with existing softirq callback invocation.
> 
> The introduction of rcuog kthreads should thus be acceptable.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> ---
>  kernel/rcu/tree.h        |   6 +-
>  kernel/rcu/tree_plugin.h | 115 +++++++++++++++++++--------------------
>  2 files changed, 61 insertions(+), 60 deletions(-)
> 
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 32b3348d3a4d..dc3c53cb9608 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -200,8 +200,8 @@ struct rcu_data {
>  	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
>  	struct rcu_head *nocb_cb_head;	/* CBs ready to invoke. */
>  	struct rcu_head **nocb_cb_tail;
> -	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
> -	struct task_struct *nocb_cb_kthread;
> +	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
> +	struct task_struct *nocb_gp_kthread;
>  	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
>  	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
>  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
> @@ -211,6 +211,8 @@ struct rcu_data {
>  					/* CBs waiting for GP. */
>  	struct rcu_head **nocb_gp_tail;
>  	bool nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
> +	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
> +	struct task_struct *nocb_cb_kthread;
>  	struct rcu_data *nocb_next_cb_rdp;
>  					/* Next rcu_data in wakeup chain. */
>  
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 5a72700c3a32..c3b6493313ab 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1531,7 +1531,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
>  	struct rcu_data *rdp_leader = rdp->nocb_gp_rdp;
>  
>  	lockdep_assert_held(&rdp->nocb_lock);
> -	if (!READ_ONCE(rdp_leader->nocb_cb_kthread)) {
> +	if (!READ_ONCE(rdp_leader->nocb_gp_kthread)) {
>  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		return;
>  	}
> @@ -1541,7 +1541,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
>  		del_timer(&rdp->nocb_timer);
>  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		smp_mb(); /* ->nocb_gp_sleep before swake_up_one(). */
> -		swake_up_one(&rdp_leader->nocb_wq);
> +		swake_up_one(&rdp_leader->nocb_gp_wq);
>  	} else {
>  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  	}
> @@ -1646,7 +1646,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
>  	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
>  
>  	/* If we are not being polled and there is a kthread, awaken it ... */
> -	t = READ_ONCE(rdp->nocb_cb_kthread);
> +	t = READ_ONCE(rdp->nocb_gp_kthread);
>  	if (rcu_nocb_poll || !t) {
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>  				    TPS("WakeNotPoll"));
> @@ -1786,7 +1786,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
>   * No-CBs GP kthreads come here to wait for additional callbacks to show up.
>   * This function does not return until callbacks appear.
>   */
> -static void nocb_leader_wait(struct rcu_data *my_rdp)
> +static void nocb_gp_wait(struct rcu_data *my_rdp)
>  {
>  	bool firsttime = true;
>  	unsigned long flags;
> @@ -1794,12 +1794,10 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  	struct rcu_data *rdp;
>  	struct rcu_head **tail;
>  
> -wait_again:
> -
>  	/* Wait for callbacks to appear. */
>  	if (!rcu_nocb_poll) {
>  		trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Sleep"));
> -		swait_event_interruptible_exclusive(my_rdp->nocb_wq,
> +		swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
>  				!READ_ONCE(my_rdp->nocb_gp_sleep));
>  		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
>  		my_rdp->nocb_gp_sleep = true;
> @@ -1838,7 +1836,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  			trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu,
>  					    TPS("WokeEmpty"));
>  		}
> -		goto wait_again;
> +		return;
>  	}
>  
>  	/* Wait for one grace period. */
> @@ -1862,34 +1860,47 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  		rdp->nocb_cb_tail = rdp->nocb_gp_tail;
>  		*tail = rdp->nocb_gp_head;
>  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> -		if (rdp != my_rdp && tail == &rdp->nocb_cb_head) {
> +		if (tail == &rdp->nocb_cb_head) {
>  			/* List was empty, so wake up the kthread.  */
> -			swake_up_one(&rdp->nocb_wq);
> +			swake_up_one(&rdp->nocb_cb_wq);
>  		}
>  	}
> +}
>  
> -	/* If we (the GP kthreads) don't have CBs, go wait some more. */
> -	if (!my_rdp->nocb_cb_head)
> -		goto wait_again;
> +/*
> + * No-CBs grace-period-wait kthread.  There is one of these per group
> + * of CPUs, but only once at least one CPU in that group has come online
> + * at least once since boot.  This kthread checks for newly posted
> + * callbacks from any of the CPUs it is responsible for, waits for a
> + * grace period, then awakens all of the rcu_nocb_cb_kthread() instances
> + * that then have callback-invocation work to do.
> + */
> +static int rcu_nocb_gp_kthread(void *arg)
> +{
> +	struct rcu_data *rdp = arg;
> +
> +	for (;;)
> +		nocb_gp_wait(rdp);
> +	return 0;
>  }
>  
>  /*
>   * No-CBs CB kthreads come here to wait for additional callbacks to show up.
> - * This function does not return until callbacks appear.
> + * This function returns true ("keep waiting") until callbacks appear and
> + * then false ("stop waiting") when callbacks finally do appear.
>   */
> -static void nocb_follower_wait(struct rcu_data *rdp)
> +static bool nocb_follower_wait(struct rcu_data *rdp)
>  {
> -	for (;;) {
> -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
> -		swait_event_interruptible_exclusive(rdp->nocb_wq,
> -					 READ_ONCE(rdp->nocb_cb_head));
> -		if (smp_load_acquire(&rdp->nocb_cb_head)) {
> -			/* ^^^ Ensure CB invocation follows _head test. */
> -			return;
> -		}
> -		WARN_ON(signal_pending(current));
> -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> +	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
> +	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> +				 READ_ONCE(rdp->nocb_cb_head));
> +	if (smp_load_acquire(&rdp->nocb_cb_head)) { /* VVV */
> +		/* ^^^ Ensure CB invocation follows _head test. */
> +		return false;
>  	}
> +	WARN_ON(signal_pending(current));
> +	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> +	return true;
>  }
>  
>  /*
> @@ -1899,7 +1910,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
>   * have to do quite so many wakeups (as in they only need to wake the
>   * no-CBs GP kthreads, not the CB kthreads).
>   */
> -static int rcu_nocb_kthread(void *arg)
> +static int rcu_nocb_cb_kthread(void *arg)
>  {
>  	int c, cl;
>  	unsigned long flags;
> @@ -1911,10 +1922,8 @@ static int rcu_nocb_kthread(void *arg)
>  	/* Each pass through this loop invokes one batch of callbacks */
>  	for (;;) {
>  		/* Wait for callbacks. */
> -		if (rdp->nocb_gp_rdp == rdp)
> -			nocb_leader_wait(rdp);
> -		else
> -			nocb_follower_wait(rdp);
> +		while (nocb_follower_wait(rdp))
> +			continue;
>  
>  		/* Pull the ready-to-invoke callbacks onto local list. */
>  		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> @@ -2048,7 +2057,8 @@ void __init rcu_init_nohz(void)
>  static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
>  {
>  	rdp->nocb_tail = &rdp->nocb_head;
> -	init_swait_queue_head(&rdp->nocb_wq);
> +	init_swait_queue_head(&rdp->nocb_cb_wq);
> +	init_swait_queue_head(&rdp->nocb_gp_wq);
>  	rdp->nocb_cb_tail = &rdp->nocb_cb_head;
>  	raw_spin_lock_init(&rdp->nocb_lock);
>  	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
> @@ -2056,50 +2066,39 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
>  
>  /*
>   * If the specified CPU is a no-CBs CPU that does not already have its
> - * rcuo kthread, spawn it.  If the CPUs are brought online out of order,
> - * this can require re-organizing the GP-CB relationships.
> + * rcuo CB kthread, spawn it.  Additionally, if the rcuo GP kthread
> + * for this CPU's group has not yet been created, spawn it as well.
>   */
>  static void rcu_spawn_one_nocb_kthread(int cpu)
>  {
> -	struct rcu_data *rdp;
> -	struct rcu_data *rdp_last;
> -	struct rcu_data *rdp_old_leader;
> -	struct rcu_data *rdp_spawn = per_cpu_ptr(&rcu_data, cpu);
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +	struct rcu_data *rdp_gp;
>  	struct task_struct *t;
>  
>  	/*
>  	 * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
>  	 * then nothing to do.
>  	 */
> -	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_cb_kthread)
> +	if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
>  		return;
>  
>  	/* If we didn't spawn the GP kthread first, reorganize! */
> -	rdp_old_leader = rdp_spawn->nocb_gp_rdp;
> -	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_cb_kthread) {
> -		rdp_last = NULL;
> -		rdp = rdp_old_leader;
> -		do {
> -			rdp->nocb_gp_rdp = rdp_spawn;
> -			if (rdp_last && rdp != rdp_spawn)
> -				rdp_last->nocb_next_cb_rdp = rdp;
> -			if (rdp == rdp_spawn) {
> -				rdp = rdp->nocb_next_cb_rdp;
> -			} else {
> -				rdp_last = rdp;
> -				rdp = rdp->nocb_next_cb_rdp;
> -				rdp_last->nocb_next_cb_rdp = NULL;
> -			}
> -		} while (rdp);
> -		rdp_spawn->nocb_next_cb_rdp = rdp_old_leader;
> +	rdp_gp = rdp->nocb_gp_rdp;
> +	if (!rdp_gp->nocb_gp_kthread) {
> +		t = kthread_run(rcu_nocb_gp_kthread, rdp_gp,
> +				"rcuog/%d", rdp_gp->cpu);
> +		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__))
> +			return;
> +		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
>  	}
>  
>  	/* Spawn the kthread for this CPU. */
> -	t = kthread_run(rcu_nocb_kthread, rdp_spawn,
> +	t = kthread_run(rcu_nocb_cb_kthread, rdp,
>  			"rcuo%c/%d", rcu_state.abbr, cpu);
> -	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo kthread, OOM is now expected behavior\n", __func__))
> +	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
>  		return;
> -	WRITE_ONCE(rdp_spawn->nocb_cb_kthread, t);
> +	WRITE_ONCE(rdp->nocb_cb_kthread, t);
> +	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
>  }
>  
>  /*
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads
  2019-08-03 17:41   ` Joel Fernandes
@ 2019-08-03 19:46     ` Paul E. McKenney
  2019-08-04 19:24       ` Joel Fernandes
  0 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2019-08-03 19:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg

On Sat, Aug 03, 2019 at 01:41:27PM -0400, Joel Fernandes wrote:
> On Thu, Aug 01, 2019 at 03:50:20PM -0700, Paul E. McKenney wrote:
> > Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
> > are divided into groups.  The first rcuo kthread to come online in a
> > given group is that group's leader, and the leader both waits for grace
> > periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
> > only invoke callbacks.
> > 
> > This works well in the real-time/embedded environments for which it was
> > intended because such environments tend not to generate all that many
> > callbacks.  However, given huge floods of callbacks, it is possible for
> > the leader kthread to be stuck invoking callbacks while its followers
> > wait helplessly while their callbacks pile up.  This is a good recipe
> > for an OOM, and rcutorture's new callback-flood capability does generate
> > such OOMs.
> > 
> > One strategy would be to wait until such OOMs start happening in
> > production, but similar OOMs have in fact happened starting in 2018.
> > It would therefore be wise to take a more proactive approach.
> 
> I haven't looked much into nocbs/nohz_full stuff (yet). In particular, I did
> not even know that the rcuo threads do grace-period life-cycle management and
> waiting; I thought only the RCU GP threads did :-/. However, it seems this is
> a completely separate grace-period management state machine outside of the
> RCU GP thread, right?

No, the rcuo kthreads interact with the main RCU GP kthread, initiating
new grace periods when needed and being awakened as needed by the RCU
GP kthread.
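
In rough pseudo-C, condensed from nocb_gp_wait() in patch 03 (the real
code also handles polling, tracing, and memory ordering, none of which
is shown here), each rcuog kthread loops over something like:

	for (;;) {
		/* Sleep until one of this group's CPUs posts CBs. */
		swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
				!READ_ONCE(my_rdp->nocb_gp_sleep));
		/* Pull new CBs from each CPU in the group and wait
		 * for a grace period via rcu_nocb_wait_gp(), which
		 * is where the main RCU GP kthread comes in.  Then
		 * wake each rcuo CB kthread whose ready-to-invoke
		 * list just went non-empty: */
		swake_up_one(&rdp->nocb_cb_wq);
	}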

> I was wondering, for this patch, could we also just have the rcuo
> leader handle both callback execution and waking other non-leader threads at
> the same time? So like, execute a few callbacks, then do the wakeup of the
> non-leaders to execute their callbacks, then get back to executing their own
> callbacks, etc. That way we don't need a separate rcuog thread to wait for
> the grace period; would that not work?

I did look into that, but it was more complex and also didn't foster
sharing of rcu_do_batch(), which used to only be for non-offloaded
callbacks but now does both.  Besides which, invoking callbacks would
degrade the rcuog kthread's response to new callbacks and the like.

> If you don't mind, could you share with me a kvm.sh command (with the config,
> boot parameters, etc.) that can produce the OOM without this patch? I'd
> like to take a closer look at it.

Here you go:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 120 --configs TREE04

If you add "--memory 1G" in mainline, the OOMs go away.  Or at least
decrease substantially in probability.
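
That is, the variant that avoids the OOMs on mainline (the same command
as above, combined with the flag just mentioned) would be:

	tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 \
		--duration 120 --configs TREE04 --memory 1G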

> Is there also a short answer for why the RCU GP thread cannot do the job of
> these new rcuog threads?

First, the code is more complicated when you do it that way (and yes,
I did actually write it out in pen on paper).  Second, if the CPU
corresponding to the combined grace-period/callback kthread is doing the
call_rcu() flooding, you are between a rock and a hard place.  On the
one hand, you want that kthread to do nothing but invoke callbacks so
as to have half a chance of keeping up, and on the other hand you need
it to check state frequently so as to react in a timely fashion to a
CPU corresponding to one of its callback kthreads starting a second
callback flood.

Introducing rcuog grace-period-only kthreads means that you get the best of
both worlds.  Plus last year's flavor consolidation decreased the number
of rcuo kthreads from either 2N or 3N to N, so increasing it to only
(N + sqrt(N)) should be just fine.  Though I would expect that there
will be at least some screaming and shouting.  ;-)
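
To put rough numbers on that, using the 4096-CPU example from the
patch 03 commit log: the old scheme implied 8192 (2N) or 12288 (3N)
rcuo kthreads, flavor consolidation cut that to 4096 (N), and this
series adds back only about int_sqrt(4096) = 64 rcuog kthreads, or
roughly 4160 in all.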

							Thanx, Paul

> thanks a lot,
> 
>  - Joel
> 
> 
> > This commit therefore features per-CPU rcuo kthreads that do nothing
> > but invoke callbacks.  Instead of having one of these kthreads act as
> > leader, each group has a separate rcuog kthread that handles grace periods
> > for its group.  Because these rcuog kthreads do not invoke callbacks,
> > callback floods on one CPU no longer block callbacks from reaching the
> > rcuc callback-invocation kthreads on other CPUs.
> > 
> > This change does introduce additional kthreads, however:
> > 
> > 1.	The number of additional kthreads is about the square root of
> > 	the number of CPUs, so that a 4096-CPU system would have only
> > 	about 64 additional kthreads.  Note that recent changes
> > 	decreased the number of rcuo kthreads by a factor of two
> > 	(CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so
> > 	this still represents a significant improvement on most systems.
> > 
> > 2.	The leading "rcuo" of the rcuog kthreads should allow existing
> > 	scripting to affinity these additional kthreads as needed, the
> > 	same as for the rcuop and rcuos kthreads.  (There are no longer
> > 	any rcuob kthreads.)
> > 
> > 3.	A state-machine approach was considered and rejected.  Although
> > 	this would allow the rcuo kthreads to continue their dual
> > 	leader/follower roles, it complicates callback invocation
> > 	and makes it more difficult to consolidate rcuo callback
> > 	invocation with existing softirq callback invocation.
> > 
> > The introduction of rcuog kthreads should thus be acceptable.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > ---
> >  kernel/rcu/tree.h        |   6 +-
> >  kernel/rcu/tree_plugin.h | 115 +++++++++++++++++++--------------------
> >  2 files changed, 61 insertions(+), 60 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 32b3348d3a4d..dc3c53cb9608 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -200,8 +200,8 @@ struct rcu_data {
> >  	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
> >  	struct rcu_head *nocb_cb_head;	/* CBs ready to invoke. */
> >  	struct rcu_head **nocb_cb_tail;
> > -	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
> > -	struct task_struct *nocb_cb_kthread;
> > +	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
> > +	struct task_struct *nocb_gp_kthread;
> >  	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
> >  	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
> >  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
> > @@ -211,6 +211,8 @@ struct rcu_data {
> >  					/* CBs waiting for GP. */
> >  	struct rcu_head **nocb_gp_tail;
> >  	bool nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
> > +	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
> > +	struct task_struct *nocb_cb_kthread;
> >  	struct rcu_data *nocb_next_cb_rdp;
> >  					/* Next rcu_data in wakeup chain. */
> >  
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 5a72700c3a32..c3b6493313ab 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1531,7 +1531,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
> >  	struct rcu_data *rdp_leader = rdp->nocb_gp_rdp;
> >  
> >  	lockdep_assert_held(&rdp->nocb_lock);
> > -	if (!READ_ONCE(rdp_leader->nocb_cb_kthread)) {
> > +	if (!READ_ONCE(rdp_leader->nocb_gp_kthread)) {
> >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  		return;
> >  	}
> > @@ -1541,7 +1541,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
> >  		del_timer(&rdp->nocb_timer);
> >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  		smp_mb(); /* ->nocb_gp_sleep before swake_up_one(). */
> > -		swake_up_one(&rdp_leader->nocb_wq);
> > +		swake_up_one(&rdp_leader->nocb_gp_wq);
> >  	} else {
> >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> >  	}
> > @@ -1646,7 +1646,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
> >  	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
> >  
> >  	/* If we are not being polled and there is a kthread, awaken it ... */
> > -	t = READ_ONCE(rdp->nocb_cb_kthread);
> > +	t = READ_ONCE(rdp->nocb_gp_kthread);
> >  	if (rcu_nocb_poll || !t) {
> >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> >  				    TPS("WakeNotPoll"));
> > @@ -1786,7 +1786,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
> >   * No-CBs GP kthreads come here to wait for additional callbacks to show up.
> >   * This function does not return until callbacks appear.
> >   */
> > -static void nocb_leader_wait(struct rcu_data *my_rdp)
> > +static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  {
> >  	bool firsttime = true;
> >  	unsigned long flags;
> > @@ -1794,12 +1794,10 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> >  	struct rcu_data *rdp;
> >  	struct rcu_head **tail;
> >  
> > -wait_again:
> > -
> >  	/* Wait for callbacks to appear. */
> >  	if (!rcu_nocb_poll) {
> >  		trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Sleep"));
> > -		swait_event_interruptible_exclusive(my_rdp->nocb_wq,
> > +		swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
> >  				!READ_ONCE(my_rdp->nocb_gp_sleep));
> >  		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
> >  		my_rdp->nocb_gp_sleep = true;
> > @@ -1838,7 +1836,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> >  			trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu,
> >  					    TPS("WokeEmpty"));
> >  		}
> > -		goto wait_again;
> > +		return;
> >  	}
> >  
> >  	/* Wait for one grace period. */
> > @@ -1862,34 +1860,47 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> >  		rdp->nocb_cb_tail = rdp->nocb_gp_tail;
> >  		*tail = rdp->nocb_gp_head;
> >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > -		if (rdp != my_rdp && tail == &rdp->nocb_cb_head) {
> > +		if (tail == &rdp->nocb_cb_head) {
> >  			/* List was empty, so wake up the kthread.  */
> > -			swake_up_one(&rdp->nocb_wq);
> > +			swake_up_one(&rdp->nocb_cb_wq);
> >  		}
> >  	}
> > +}
> >  
> > -	/* If we (the GP kthreads) don't have CBs, go wait some more. */
> > -	if (!my_rdp->nocb_cb_head)
> > -		goto wait_again;
> > +/*
> > + * No-CBs grace-period-wait kthread.  There is one of these per group
> > + * of CPUs, but only once at least one CPU in that group has come online
> > + * at least once since boot.  This kthread checks for newly posted
> > + * callbacks from any of the CPUs it is responsible for, waits for a
> > + * grace period, then awakens all of the rcu_nocb_cb_kthread() instances
> > + * that then have callback-invocation work to do.
> > + */
> > +static int rcu_nocb_gp_kthread(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	for (;;)
> > +		nocb_gp_wait(rdp);
> > +	return 0;
> >  }
> >  
> >  /*
> >   * No-CBs CB kthreads come here to wait for additional callbacks to show up.
> > - * This function does not return until callbacks appear.
> > + * This function returns true ("keep waiting") until callbacks appear and
> > + * then false ("stop waiting") when callbacks finally do appear.
> >   */
> > -static void nocb_follower_wait(struct rcu_data *rdp)
> > +static bool nocb_follower_wait(struct rcu_data *rdp)
> >  {
> > -	for (;;) {
> > -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
> > -		swait_event_interruptible_exclusive(rdp->nocb_wq,
> > -					 READ_ONCE(rdp->nocb_cb_head));
> > -		if (smp_load_acquire(&rdp->nocb_cb_head)) {
> > -			/* ^^^ Ensure CB invocation follows _head test. */
> > -			return;
> > -		}
> > -		WARN_ON(signal_pending(current));
> > -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> > +	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
> > +	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > +				 READ_ONCE(rdp->nocb_cb_head));
> > +	if (smp_load_acquire(&rdp->nocb_cb_head)) { /* VVV */
> > +		/* ^^^ Ensure CB invocation follows _head test. */
> > +		return false;
> >  	}
> > +	WARN_ON(signal_pending(current));
> > +	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> > +	return true;
> >  }
> >  
> >  /*
> > @@ -1899,7 +1910,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
> >   * have to do quite so many wakeups (as in they only need to wake the
> >   * no-CBs GP kthreads, not the CB kthreads).
> >   */
> > -static int rcu_nocb_kthread(void *arg)
> > +static int rcu_nocb_cb_kthread(void *arg)
> >  {
> >  	int c, cl;
> >  	unsigned long flags;
> > @@ -1911,10 +1922,8 @@ static int rcu_nocb_kthread(void *arg)
> >  	/* Each pass through this loop invokes one batch of callbacks */
> >  	for (;;) {
> >  		/* Wait for callbacks. */
> > -		if (rdp->nocb_gp_rdp == rdp)
> > -			nocb_leader_wait(rdp);
> > -		else
> > -			nocb_follower_wait(rdp);
> > +		while (nocb_follower_wait(rdp))
> > +			continue;
> >  
> >  		/* Pull the ready-to-invoke callbacks onto local list. */
> >  		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > @@ -2048,7 +2057,8 @@ void __init rcu_init_nohz(void)
> >  static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> >  {
> >  	rdp->nocb_tail = &rdp->nocb_head;
> > -	init_swait_queue_head(&rdp->nocb_wq);
> > +	init_swait_queue_head(&rdp->nocb_cb_wq);
> > +	init_swait_queue_head(&rdp->nocb_gp_wq);
> >  	rdp->nocb_cb_tail = &rdp->nocb_cb_head;
> >  	raw_spin_lock_init(&rdp->nocb_lock);
> >  	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
> > @@ -2056,50 +2066,39 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> >  
> >  /*
> >   * If the specified CPU is a no-CBs CPU that does not already have its
> > - * rcuo kthread, spawn it.  If the CPUs are brought online out of order,
> > - * this can require re-organizing the GP-CB relationships.
> > + * rcuo CB kthread, spawn it.  Additionally, if the rcuo GP kthread
> > + * for this CPU's group has not yet been created, spawn it as well.
> >   */
> >  static void rcu_spawn_one_nocb_kthread(int cpu)
> >  {
> > -	struct rcu_data *rdp;
> > -	struct rcu_data *rdp_last;
> > -	struct rcu_data *rdp_old_leader;
> > -	struct rcu_data *rdp_spawn = per_cpu_ptr(&rcu_data, cpu);
> > +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > +	struct rcu_data *rdp_gp;
> >  	struct task_struct *t;
> >  
> >  	/*
> >  	 * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
> >  	 * then nothing to do.
> >  	 */
> > -	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_cb_kthread)
> > +	if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
> >  		return;
> >  
> >  	/* If we didn't spawn the GP kthread first, reorganize! */
> > -	rdp_old_leader = rdp_spawn->nocb_gp_rdp;
> > -	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_cb_kthread) {
> > -		rdp_last = NULL;
> > -		rdp = rdp_old_leader;
> > -		do {
> > -			rdp->nocb_gp_rdp = rdp_spawn;
> > -			if (rdp_last && rdp != rdp_spawn)
> > -				rdp_last->nocb_next_cb_rdp = rdp;
> > -			if (rdp == rdp_spawn) {
> > -				rdp = rdp->nocb_next_cb_rdp;
> > -			} else {
> > -				rdp_last = rdp;
> > -				rdp = rdp->nocb_next_cb_rdp;
> > -				rdp_last->nocb_next_cb_rdp = NULL;
> > -			}
> > -		} while (rdp);
> > -		rdp_spawn->nocb_next_cb_rdp = rdp_old_leader;
> > +	rdp_gp = rdp->nocb_gp_rdp;
> > +	if (!rdp_gp->nocb_gp_kthread) {
> > +		t = kthread_run(rcu_nocb_gp_kthread, rdp_gp,
> > +				"rcuog/%d", rdp_gp->cpu);
> > +		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__))
> > +			return;
> > +		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> >  	}
> >  
> >  	/* Spawn the kthread for this CPU. */
> > -	t = kthread_run(rcu_nocb_kthread, rdp_spawn,
> > +	t = kthread_run(rcu_nocb_cb_kthread, rdp,
> >  			"rcuo%c/%d", rcu_state.abbr, cpu);
> > -	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo kthread, OOM is now expected behavior\n", __func__))
> > +	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> >  		return;
> > -	WRITE_ONCE(rdp_spawn->nocb_cb_kthread, t);
> > +	WRITE_ONCE(rdp->nocb_cb_kthread, t);
> > +	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> >  }
> >  
> >  /*
> > -- 
> > 2.17.1
> > 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads
  2019-08-03 19:46     ` Paul E. McKenney
@ 2019-08-04 19:24       ` Joel Fernandes
  0 siblings, 0 replies; 15+ messages in thread
From: Joel Fernandes @ 2019-08-04 19:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg

On Sat, Aug 03, 2019 at 12:46:11PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 03, 2019 at 01:41:27PM -0400, Joel Fernandes wrote:
> > On Thu, Aug 01, 2019 at 03:50:20PM -0700, Paul E. McKenney wrote:
> > > Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
> > > are divided into groups.  The first rcuo kthread to come online in a
> > > given group is that group's leader, and the leader both waits for grace
> > > periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
> > > only invoke callbacks.
> > > 
> > > This works well in the real-time/embedded environments for which it was
> > > intended because such environments tend not to generate all that many
> > > callbacks.  However, given huge floods of callbacks, it is possible for
> > > the leader kthread to be stuck invoking callbacks while its followers
> > > wait helplessly while their callbacks pile up.  This is a good recipe
> > > for an OOM, and rcutorture's new callback-flood capability does generate
> > > such OOMs.
> > > 
> > > One strategy would be to wait until such OOMs start happening in
> > > production, but similar OOMs have in fact happened starting in 2018.
> > > It would therefore be wise to take a more proactive approach.
> > 
> > I haven't looked much into nocbs/nohz_full stuff (yet). In particular, I did
> > not even know that the rcuo threads do grace-period life-cycle management and
> > waiting; I thought only the RCU GP threads did :-/. However, it seems this is
> > a completely separate grace-period management state machine outside of the
> > RCU GP thread, right?
> 
> No, the rcuo kthreads interact with the main RCU GP kthread, initiating
> new grace periods when needed and being awakened as needed by the RCU
> GP kthread.

Ok, I see the interactions in rcu_nocb_wait_gp(). This is what I was thinking
too, that there have to be these interactions with the main RCU GP kthread
for anything to work :) Thanks for the explanation!

> > I was wondering, for this patch, could we also just have the rcuo
> > leader handle both callback execution and waking other non-leader threads at
> > the same time? So like, execute a few callbacks, then do the wakeup of the
> > non-leaders to execute their callbacks, then get back to executing their own
> > callbacks, etc. That way we don't need a separate rcuog thread to wait for
> > the grace period; would that not work?
> 
> I did look into that, but it was more complex and also didn't foster
> sharing of rcu_do_batch(), which used to only be for non-offloaded
> callbacks but now does both.  Besides which, invoking callbacks would
> degrade the rcuog kthread's response to new callbacks and the like.

Makes sense.

> > If you don't mind, could you share with me a kvm.sh command (with the config,
> > boot parameters, etc.) that can produce the OOM without this patch? I'd
> > like to take a closer look at it.
> 
> Here you go:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 120 --configs TREE04
> 
> If you add "--memory 1G" in mainline, the OOMs go away.  Or at least
> decrease substantially in probability.

I could reproduce it; it took around 5-10 minutes to hit the OOM with 512MB
of memory. Thanks.

> > Is there also a short answer for why the RCU GP thread cannot do the job of
> > these new rcuog threads?
> 
> First, the code is more complicated when you do it that way (and yes,
> I did actually write it out in pen on paper).  Second, if the CPU
> corresponding to the combined grace-period/callback kthread is doing the
> call_rcu() flooding, you are between a rock and a hard place.  On the
> one hand, you want that kthread to do nothing but invoke callbacks so
> as to have half a chance of keeping up, and on the other hand you need
> it to check state frequently so as to react in a timely fashion to a
> CPU corresponding to one of its callback kthreads starting a second
> callback flood.

> Introducing rcuog grace-period-only kthreads means that you get the best of
> both worlds.  Plus last year's flavor consolidation decreased the number
> of rcuo kthreads from either 2N or 3N to N, so increasing it to only
> (N + sqrt(N)) should be just fine.  Though I would expect that there
> will be at least some screaming and shouting.  ;-)

Ok, got it. Yes, fewer new threads now even with the nocb improvements :)

thanks,

 - Joel


> 
> 							Thanx, Paul
> 
> > thanks a lot,
> > 
> >  - Joel
> > 
> > 
> > > This commit therefore features per-CPU rcuo kthreads that do nothing
> > > but invoke callbacks.  Instead of having one of these kthreads act as
> > > leader, each group has a separate rcuog kthread that handles grace periods
> > > for its group.  Because these rcuog kthreads do not invoke callbacks,
> > > callback floods on one CPU no longer block callbacks from reaching the
> > > rcuc callback-invocation kthreads on other CPUs.
> > > 
> > > This change does introduce additional kthreads, however:
> > > 
> > > 1.	The number of additional kthreads is about the square root of
> > > 	the number of CPUs, so that a 4096-CPU system would have only
> > > 	about 64 additional kthreads.  Note that recent changes
> > > 	decreased the number of rcuo kthreads by a factor of two
> > > 	(CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so
> > > 	this still represents a significant improvement on most systems.
> > > 
> > > 2.	The leading "rcuo" of the rcuog kthreads should allow existing
> > > 	scripting to affinity these additional kthreads as needed, the
> > > 	same as for the rcuop and rcuos kthreads.  (There are no longer
> > > 	any rcuob kthreads.)
> > > 
> > > 3.	A state-machine approach was considered and rejected.  Although
> > > 	this would allow the rcuo kthreads to continue their dual
> > > 	leader/follower roles, it complicates callback invocation
> > > 	and makes it more difficult to consolidate rcuo callback
> > > 	invocation with existing softirq callback invocation.
> > > 
> > > The introduction of rcuog kthreads should thus be acceptable.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > ---
> > >  kernel/rcu/tree.h        |   6 +-
> > >  kernel/rcu/tree_plugin.h | 115 +++++++++++++++++++--------------------
> > >  2 files changed, 61 insertions(+), 60 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > > index 32b3348d3a4d..dc3c53cb9608 100644
> > > --- a/kernel/rcu/tree.h
> > > +++ b/kernel/rcu/tree.h
> > > @@ -200,8 +200,8 @@ struct rcu_data {
> > >  	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
> > >  	struct rcu_head *nocb_cb_head;	/* CBs ready to invoke. */
> > >  	struct rcu_head **nocb_cb_tail;
> > > -	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
> > > -	struct task_struct *nocb_cb_kthread;
> > > +	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
> > > +	struct task_struct *nocb_gp_kthread;
> > >  	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
> > >  	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
> > >  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
> > > @@ -211,6 +211,8 @@ struct rcu_data {
> > >  					/* CBs waiting for GP. */
> > >  	struct rcu_head **nocb_gp_tail;
> > >  	bool nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
> > > +	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
> > > +	struct task_struct *nocb_cb_kthread;
> > >  	struct rcu_data *nocb_next_cb_rdp;
> > >  					/* Next rcu_data in wakeup chain. */
> > >  
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 5a72700c3a32..c3b6493313ab 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1531,7 +1531,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
> > >  	struct rcu_data *rdp_leader = rdp->nocb_gp_rdp;
> > >  
> > >  	lockdep_assert_held(&rdp->nocb_lock);
> > > -	if (!READ_ONCE(rdp_leader->nocb_cb_kthread)) {
> > > +	if (!READ_ONCE(rdp_leader->nocb_gp_kthread)) {
> > >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  		return;
> > >  	}
> > > @@ -1541,7 +1541,7 @@ static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
> > >  		del_timer(&rdp->nocb_timer);
> > >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  		smp_mb(); /* ->nocb_gp_sleep before swake_up_one(). */
> > > -		swake_up_one(&rdp_leader->nocb_wq);
> > > +		swake_up_one(&rdp_leader->nocb_gp_wq);
> > >  	} else {
> > >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > >  	}
> > > @@ -1646,7 +1646,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
> > >  	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
> > >  
> > >  	/* If we are not being polled and there is a kthread, awaken it ... */
> > > -	t = READ_ONCE(rdp->nocb_cb_kthread);
> > > +	t = READ_ONCE(rdp->nocb_gp_kthread);
> > >  	if (rcu_nocb_poll || !t) {
> > >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > >  				    TPS("WakeNotPoll"));
> > > @@ -1786,7 +1786,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
> > >   * No-CBs GP kthreads come here to wait for additional callbacks to show up.
> > >   * This function does not return until callbacks appear.
> > >   */
> > > -static void nocb_leader_wait(struct rcu_data *my_rdp)
> > > +static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >  {
> > >  	bool firsttime = true;
> > >  	unsigned long flags;
> > > @@ -1794,12 +1794,10 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> > >  	struct rcu_data *rdp;
> > >  	struct rcu_head **tail;
> > >  
> > > -wait_again:
> > > -
> > >  	/* Wait for callbacks to appear. */
> > >  	if (!rcu_nocb_poll) {
> > >  		trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Sleep"));
> > > -		swait_event_interruptible_exclusive(my_rdp->nocb_wq,
> > > +		swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
> > >  				!READ_ONCE(my_rdp->nocb_gp_sleep));
> > >  		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
> > >  		my_rdp->nocb_gp_sleep = true;
> > > @@ -1838,7 +1836,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> > >  			trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu,
> > >  					    TPS("WokeEmpty"));
> > >  		}
> > > -		goto wait_again;
> > > +		return;
> > >  	}
> > >  
> > >  	/* Wait for one grace period. */
> > > @@ -1862,34 +1860,47 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> > >  		rdp->nocb_cb_tail = rdp->nocb_gp_tail;
> > >  		*tail = rdp->nocb_gp_head;
> > >  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > -		if (rdp != my_rdp && tail == &rdp->nocb_cb_head) {
> > > +		if (tail == &rdp->nocb_cb_head) {
> > >  			/* List was empty, so wake up the kthread.  */
> > > -			swake_up_one(&rdp->nocb_wq);
> > > +			swake_up_one(&rdp->nocb_cb_wq);
> > >  		}
> > >  	}
> > > +}
> > >  
> > > -	/* If we (the GP kthreads) don't have CBs, go wait some more. */
> > > -	if (!my_rdp->nocb_cb_head)
> > > -		goto wait_again;
> > > +/*
> > > + * No-CBs grace-period-wait kthread.  There is one of these per group
> > > + * of CPUs, but only once at least one CPU in that group has come online
> > > + * at least once since boot.  This kthread checks for newly posted
> > > + * callbacks from any of the CPUs it is responsible for, waits for a
> > > + * grace period, then awakens all of the rcu_nocb_cb_kthread() instances
> > > + * that then have callback-invocation work to do.
> > > + */
> > > +static int rcu_nocb_gp_kthread(void *arg)
> > > +{
> > > +	struct rcu_data *rdp = arg;
> > > +
> > > +	for (;;)
> > > +		nocb_gp_wait(rdp);
> > > +	return 0;
> > >  }
> > >  
> > >  /*
> > >   * No-CBs CB kthreads come here to wait for additional callbacks to show up.
> > > - * This function does not return until callbacks appear.
> > > + * This function returns true ("keep waiting") until callbacks appear and
> > > + * then false ("stop waiting") when callbacks finally do appear.
> > >   */
> > > -static void nocb_follower_wait(struct rcu_data *rdp)
> > > +static bool nocb_follower_wait(struct rcu_data *rdp)
> > >  {
> > > -	for (;;) {
> > > -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
> > > -		swait_event_interruptible_exclusive(rdp->nocb_wq,
> > > -					 READ_ONCE(rdp->nocb_cb_head));
> > > -		if (smp_load_acquire(&rdp->nocb_cb_head)) {
> > > -			/* ^^^ Ensure CB invocation follows _head test. */
> > > -			return;
> > > -		}
> > > -		WARN_ON(signal_pending(current));
> > > -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> > > +	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
> > > +	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > > +				 READ_ONCE(rdp->nocb_cb_head));
> > > +	if (smp_load_acquire(&rdp->nocb_cb_head)) { /* VVV */
> > > +		/* ^^^ Ensure CB invocation follows _head test. */
> > > +		return false;
> > >  	}
> > > +	WARN_ON(signal_pending(current));
> > > +	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> > > +	return true;
> > >  }
> > >  
> > >  /*
> > > @@ -1899,7 +1910,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
> > >   * have to do quite so many wakeups (as in they only need to wake the
> > >   * no-CBs GP kthreads, not the CB kthreads).
> > >   */
> > > -static int rcu_nocb_kthread(void *arg)
> > > +static int rcu_nocb_cb_kthread(void *arg)
> > >  {
> > >  	int c, cl;
> > >  	unsigned long flags;
> > > @@ -1911,10 +1922,8 @@ static int rcu_nocb_kthread(void *arg)
> > >  	/* Each pass through this loop invokes one batch of callbacks */
> > >  	for (;;) {
> > >  		/* Wait for callbacks. */
> > > -		if (rdp->nocb_gp_rdp == rdp)
> > > -			nocb_leader_wait(rdp);
> > > -		else
> > > -			nocb_follower_wait(rdp);
> > > +		while (nocb_follower_wait(rdp))
> > > +			continue;
> > >  
> > >  		/* Pull the ready-to-invoke callbacks onto local list. */
> > >  		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > @@ -2048,7 +2057,8 @@ void __init rcu_init_nohz(void)
> > >  static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> > >  {
> > >  	rdp->nocb_tail = &rdp->nocb_head;
> > > -	init_swait_queue_head(&rdp->nocb_wq);
> > > +	init_swait_queue_head(&rdp->nocb_cb_wq);
> > > +	init_swait_queue_head(&rdp->nocb_gp_wq);
> > >  	rdp->nocb_cb_tail = &rdp->nocb_cb_head;
> > >  	raw_spin_lock_init(&rdp->nocb_lock);
> > >  	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
> > > @@ -2056,50 +2066,39 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> > >  
> > >  /*
> > >   * If the specified CPU is a no-CBs CPU that does not already have its
> > > - * rcuo kthread, spawn it.  If the CPUs are brought online out of order,
> > > - * this can require re-organizing the GP-CB relationships.
> > > + * rcuo CB kthread, spawn it.  Additionally, if the rcuo GP kthread
> > > + * for this CPU's group has not yet been created, spawn it as well.
> > >   */
> > >  static void rcu_spawn_one_nocb_kthread(int cpu)
> > >  {
> > > -	struct rcu_data *rdp;
> > > -	struct rcu_data *rdp_last;
> > > -	struct rcu_data *rdp_old_leader;
> > > -	struct rcu_data *rdp_spawn = per_cpu_ptr(&rcu_data, cpu);
> > > +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > +	struct rcu_data *rdp_gp;
> > >  	struct task_struct *t;
> > >  
> > >  	/*
> > >  	 * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
> > >  	 * then nothing to do.
> > >  	 */
> > > -	if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_cb_kthread)
> > > +	if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
> > >  		return;
> > >  
> > >  	/* If we didn't spawn the GP kthread first, reorganize! */
> > > -	rdp_old_leader = rdp_spawn->nocb_gp_rdp;
> > > -	if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_cb_kthread) {
> > > -		rdp_last = NULL;
> > > -		rdp = rdp_old_leader;
> > > -		do {
> > > -			rdp->nocb_gp_rdp = rdp_spawn;
> > > -			if (rdp_last && rdp != rdp_spawn)
> > > -				rdp_last->nocb_next_cb_rdp = rdp;
> > > -			if (rdp == rdp_spawn) {
> > > -				rdp = rdp->nocb_next_cb_rdp;
> > > -			} else {
> > > -				rdp_last = rdp;
> > > -				rdp = rdp->nocb_next_cb_rdp;
> > > -				rdp_last->nocb_next_cb_rdp = NULL;
> > > -			}
> > > -		} while (rdp);
> > > -		rdp_spawn->nocb_next_cb_rdp = rdp_old_leader;
> > > +	rdp_gp = rdp->nocb_gp_rdp;
> > > +	if (!rdp_gp->nocb_gp_kthread) {
> > > +		t = kthread_run(rcu_nocb_gp_kthread, rdp_gp,
> > > +				"rcuog/%d", rdp_gp->cpu);
> > > +		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__))
> > > +			return;
> > > +		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> > >  	}
> > >  
> > >  	/* Spawn the kthread for this CPU. */
> > > -	t = kthread_run(rcu_nocb_kthread, rdp_spawn,
> > > +	t = kthread_run(rcu_nocb_cb_kthread, rdp,
> > >  			"rcuo%c/%d", rcu_state.abbr, cpu);
> > > -	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo kthread, OOM is now expected behavior\n", __func__))
> > > +	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> > >  		return;
> > > -	WRITE_ONCE(rdp_spawn->nocb_cb_kthread, t);
> > > +	WRITE_ONCE(rdp->nocb_cb_kthread, t);
> > > +	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> > >  }
> > >  
> > >  /*
> > > -- 
> > > 2.17.1
> > > 
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-08-04 19:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-01 22:50 [PATCH tip/core/rcu 0/11] No-CBs grace-period kthread updates for v5.3-rc2 Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 01/11] rcu/nocb: Rename rcu_data fields to prepare for forward-progress work Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 02/11] rcu/nocb: Update comments " Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 03/11] rcu/nocb: Provide separate no-CBs grace-period kthreads Paul E. McKenney
2019-08-03 17:41   ` Joel Fernandes
2019-08-03 19:46     ` Paul E. McKenney
2019-08-04 19:24       ` Joel Fernandes
2019-08-01 22:50 ` [PATCH tip/core/rcu 04/11] rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait() Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 05/11] rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp() Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 06/11] rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp() Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 07/11] rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer() Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 08/11] rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 09/11] rcu/nocb: Rename and document no-CB CB kthread sleep trace event Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 10/11] rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter Paul E. McKenney
2019-08-01 22:50 ` [PATCH tip/core/rcu 11/11] rcu/nocb: Print gp/cb kthread hierarchy if dump_tree Paul E. McKenney
