* [PATCH rcu 0/18] RCU Tasks updates for v5.17
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This series provides RCU Tasks updates, including making stall warnings
use task_call_func() and providing better update-side scalability for
call_rcu_tasks_trace() and friends:

1.	rcu-tasks: Don't remove tasks with pending IPIs from holdout list.

2.	rcu-tasks: Create per-CPU callback lists.

3.	rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue
	selection.

4.	rcu-tasks: Convert grace-period counter to grace-period sequence
	number.

5.	rcu_tasks: Convert bespoke callback list to rcu_segcblist
	structure.

6.	rcu-tasks: Use spin_lock_rcu_node() and friends.

7.	rcu-tasks: Inspect stalled task's trc state in locked state,
	courtesy of Neeraj Upadhyay.

8.	rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure.

9.	rcu-tasks: Abstract checking of callback lists.

10.	rcu-tasks: Abstract invocations of callbacks.

11.	rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs()
	invocations.

12.	rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback
	queues.

13.	rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial
	queueing.

14.	rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention.

15.	rcu-tasks: Avoid raw-spinlocked wakeups from
	call_rcu_tasks_generic().

16.	rcu-tasks: Use more callback queues if contention encountered.

17.	rcu-tasks: Use separate ->percpu_dequeue_lim for callback
	dequeueing.

18.	rcu-tasks: Use fewer callbacks queues if callback flood ends.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt   |   16 
 b/Documentation/admin-guide/kernel-parameters.txt |    7 
 b/kernel/rcu/Kconfig                              |    2 
 b/kernel/rcu/tasks.h                              |    5 
 kernel/rcu/tasks.h                                |  668 ++++++++++++++++------
 5 files changed, 518 insertions(+), 180 deletions(-)

* [PATCH rcu 01/18] rcu-tasks: Don't remove tasks with pending IPIs from holdout list
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

Currently, the check_all_holdout_tasks_trace() function removes all tasks
marked with ->trc_reader_checked from the holdout list, including those
with IPIs pending.  This means that the IPI handler might arrive at
a task that has already been removed from the list, which is at best
an accident waiting to happen.

This commit therefore avoids removing tasks with IPIs pending from
the holdout list.  This in turn means that the "if" condition in the
for_each_online_cpu() loop in rcu_tasks_trace_postgp() should always
evaluate to false, so a WARN_ON_ONCE() is added to check that.
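
The ordering relied on here can be modeled in a few lines of userspace
C11, with atomics standing in for the kernel's smp_load_acquire() and
the IPI handler's release store (all names below are illustrative, not
the kernel's):

#include <stdatomic.h>
#include <stdbool.h>

struct task_model {
	atomic_int ipi_to_cpu;		/* -1 once the IPI handler has run. */
	atomic_bool reader_checked;	/* Set once reader state is verified. */
};

/* IPI handler: publish the verdict, then release ipi_to_cpu. */
static void ipi_handler(struct task_model *t)
{
	atomic_store_explicit(&t->reader_checked, true, memory_order_relaxed);
	atomic_store_explicit(&t->ipi_to_cpu, -1, memory_order_release);
}

/* Holdout scan: remove only if no IPI is in flight and the check passed. */
static bool may_remove_from_holdout(struct task_model *t)
{
	return atomic_load_explicit(&t->ipi_to_cpu, memory_order_acquire) == -1 &&
	       atomic_load_explicit(&t->reader_checked, memory_order_relaxed);
}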

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 7da3c81c3f59c..bd44cd4794d3d 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1121,7 +1121,8 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
 			trc_wait_for_one_reader(t, hop);
 
 		// If check succeeded, remove this task from the list.
-		if (READ_ONCE(t->trc_reader_checked))
+		if (smp_load_acquire(&t->trc_ipi_to_cpu) == -1 &&
+		    READ_ONCE(t->trc_reader_checked))
 			trc_del_holdout(t);
 		else if (needreport)
 			show_stalled_task_trace(t, firstreport);
@@ -1156,7 +1157,7 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
 	// Yes, this assumes that CPUs process IPIs in order.  If that ever
 	// changes, there will need to be a recheck and/or timed wait.
 	for_each_online_cpu(cpu)
-		if (smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu)))
+		if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
 			smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);
 
 	// Remove the safety count.
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 02/18] rcu-tasks: Create per-CPU callback lists
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay, kernel test robot

Currently, RCU Tasks Trace (as well as the other two flavors of RCU Tasks)
uses a single global callback list.  This works well and is simple, but
expected changes in workload will cause this list to become a bottleneck.
This commit therefore creates per-CPU callback lists for the various
flavors of RCU Tasks, but continues queueing on a single list, namely
that of CPU 0.  Later commits will dynamically vary the number of lists
in use to accommodate dynamic changes in workload.
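
In rough outline, and with all names illustrative, the resulting shape
is that of the following self-contained sketch, in which locked
tail-pointer queues stand in for the per-CPU rcu_tasks_percpu
structures and every enqueue still lands on queue 0:

#include <pthread.h>
#include <stddef.h>

struct cb {
	struct cb *next;
	void (*func)(struct cb *);
};

struct cb_queue {
	struct cb *head;
	struct cb **tail;		/* Points at head or at the last ->next. */
	pthread_mutex_t lock;
};

#define NQUEUES 64
static struct cb_queue queues[NQUEUES];

static void queue_init(struct cb_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
	pthread_mutex_init(&q->lock, NULL);
}

/* Enqueue onto queue 0; later changes will spread the load. */
static void enqueue_cb(struct cb *cbp)
{
	struct cb_queue *q = &queues[0];

	cbp->next = NULL;
	pthread_mutex_lock(&q->lock);
	*q->tail = cbp;
	q->tail = &cbp->next;
	pthread_mutex_unlock(&q->lock);
}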

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Tested-by: kernel test robot <beibei.si@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 106 +++++++++++++++++++++++++++++++++------------
 1 file changed, 78 insertions(+), 28 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index bd44cd4794d3d..30048db7aa49d 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -20,11 +20,21 @@ typedef void (*holdouts_func_t)(struct list_head *hop, bool ndrpt, bool *frptp);
 typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
 
 /**
- * struct rcu_tasks - Definition for a Tasks-RCU-like mechanism.
+ * struct rcu_tasks_percpu - Per-CPU component of definition for a Tasks-RCU-like mechanism.
  * @cbs_head: Head of callback list.
  * @cbs_tail: Tail pointer for callback list.
+ * @cbs_pcpu_lock: Lock protecting per-CPU callback list.
+ */
+struct rcu_tasks_percpu {
+	struct rcu_head *cbs_head;
+	struct rcu_head **cbs_tail;
+	raw_spinlock_t cbs_pcpu_lock;
+};
+
+/**
+ * struct rcu_tasks - Definition for a Tasks-RCU-like mechanism.
  * @cbs_wq: Wait queue allowing new callback to get kthread's attention.
- * @cbs_lock: Lock protecting callback list.
+ * @cbs_gbl_lock: Lock protecting callback list.
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
  * @gp_state: Grace period's most recent state transition (debugging).
@@ -41,14 +51,13 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @holdouts_func: This flavor's holdout-list scan function (optional).
  * @postgp_func: This flavor's post-grace-period function (optional).
  * @call_func: This flavor's call_rcu()-equivalent function.
+ * @rtpcpu: This flavor's rcu_tasks_percpu structure.
  * @name: This flavor's textual name.
  * @kname: This flavor's kthread name.
  */
 struct rcu_tasks {
-	struct rcu_head *cbs_head;
-	struct rcu_head **cbs_tail;
 	struct wait_queue_head cbs_wq;
-	raw_spinlock_t cbs_lock;
+	raw_spinlock_t cbs_gbl_lock;
 	int gp_state;
 	int gp_sleep;
 	int init_fract;
@@ -65,20 +74,24 @@ struct rcu_tasks {
 	holdouts_func_t holdouts_func;
 	postgp_func_t postgp_func;
 	call_rcu_func_t call_func;
+	struct rcu_tasks_percpu __percpu *rtpcpu;
 	char *name;
 	char *kname;
 };
 
-#define DEFINE_RCU_TASKS(rt_name, gp, call, n)				\
-static struct rcu_tasks rt_name =					\
-{									\
-	.cbs_tail = &rt_name.cbs_head,					\
-	.cbs_wq = __WAIT_QUEUE_HEAD_INITIALIZER(rt_name.cbs_wq),	\
-	.cbs_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name.cbs_lock),		\
-	.gp_func = gp,							\
-	.call_func = call,						\
-	.name = n,							\
-	.kname = #rt_name,						\
+#define DEFINE_RCU_TASKS(rt_name, gp, call, n)						\
+static DEFINE_PER_CPU(struct rcu_tasks_percpu, rt_name ## __percpu) = {			\
+	.cbs_pcpu_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name ## __percpu.cbs_pcpu_lock),	\
+};											\
+static struct rcu_tasks rt_name =							\
+{											\
+	.cbs_wq = __WAIT_QUEUE_HEAD_INITIALIZER(rt_name.cbs_wq),			\
+	.cbs_gbl_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name.cbs_gbl_lock),			\
+	.gp_func = gp,									\
+	.call_func = call,								\
+	.rtpcpu = &rt_name ## __percpu,							\
+	.name = n,									\
+	.kname = #rt_name,								\
 }
 
 /* Track exiting tasks in order to allow them to be waited for. */
@@ -148,20 +161,51 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
 }
 #endif /* #ifndef CONFIG_TINY_RCU */
 
+// Initialize per-CPU callback lists for the specified flavor of
+// Tasks RCU.
+static void cblist_init_generic(struct rcu_tasks *rtp)
+{
+	int cpu;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
+	for_each_possible_cpu(cpu) {
+		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
+
+		WARN_ON_ONCE(!rtpcp);
+		if (cpu)
+			raw_spin_lock_init(&rtpcp->cbs_pcpu_lock);
+		raw_spin_lock(&rtpcp->cbs_pcpu_lock); // irqs already disabled.
+		if (!WARN_ON_ONCE(rtpcp->cbs_tail))
+			rtpcp->cbs_tail = &rtpcp->cbs_head;
+		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
+	}
+	raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
+
+}
+
 // Enqueue a callback for the specified flavor of Tasks RCU.
 static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 				   struct rcu_tasks *rtp)
 {
 	unsigned long flags;
 	bool needwake;
+	struct rcu_tasks_percpu *rtpcp;
 
 	rhp->next = NULL;
 	rhp->func = func;
-	raw_spin_lock_irqsave(&rtp->cbs_lock, flags);
-	needwake = !rtp->cbs_head;
-	WRITE_ONCE(*rtp->cbs_tail, rhp);
-	rtp->cbs_tail = &rhp->next;
-	raw_spin_unlock_irqrestore(&rtp->cbs_lock, flags);
+	local_irq_save(flags);
+	rtpcp = per_cpu_ptr(rtp->rtpcpu, 0 /* smp_processor_id() */);
+	raw_spin_lock(&rtpcp->cbs_pcpu_lock);
+	if (!rtpcp->cbs_tail) {
+		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
+		cblist_init_generic(rtp);
+		raw_spin_lock(&rtpcp->cbs_pcpu_lock); // irqs already disabled.
+	}
+	needwake = !rtpcp->cbs_head;
+	WRITE_ONCE(*rtpcp->cbs_tail, rhp);
+	rtpcp->cbs_tail = &rhp->next;
+	raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
 	/* We can't create the thread unless interrupts are enabled. */
 	if (needwake && READ_ONCE(rtp->kthread_ptr))
 		wake_up(&rtp->cbs_wq);
@@ -197,21 +241,23 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	 * This loop is terminated by the system going down.  ;-)
 	 */
 	for (;;) {
+		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, 0);  // for_each...
+
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 
 		/* Pick up any new callbacks. */
-		raw_spin_lock_irqsave(&rtp->cbs_lock, flags);
+		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
 		smp_mb__after_spinlock(); // Order updates vs. GP.
-		list = rtp->cbs_head;
-		rtp->cbs_head = NULL;
-		rtp->cbs_tail = &rtp->cbs_head;
-		raw_spin_unlock_irqrestore(&rtp->cbs_lock, flags);
+		list = rtpcp->cbs_head;
+		rtpcp->cbs_head = NULL;
+		rtpcp->cbs_tail = &rtpcp->cbs_head;
+		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
 
 		/* If there were none, wait a bit and start over. */
 		if (!list) {
 			wait_event_interruptible(rtp->cbs_wq,
-						 READ_ONCE(rtp->cbs_head));
-			if (!rtp->cbs_head) {
+						 READ_ONCE(rtpcp->cbs_head));
+			if (!rtpcp->cbs_head) {
 				WARN_ON(signal_pending(current));
 				set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS);
 				schedule_timeout_idle(HZ/10);
@@ -279,6 +325,7 @@ static void __init rcu_tasks_bootup_oddness(void)
 /* Dump out rcutorture-relevant state common to all RCU-tasks flavors. */
 static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
 {
+	struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, 0); // for_each...
 	pr_info("%s: %s(%d) since %lu g:%lu i:%lu/%lu %c%c %s\n",
 		rtp->kname,
 		tasks_gp_state_getname(rtp), data_race(rtp->gp_state),
@@ -286,7 +333,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
 		data_race(rtp->n_gps),
 		data_race(rtp->n_ipis_fails), data_race(rtp->n_ipis),
 		".k"[!!data_race(rtp->kthread_ptr)],
-		".C"[!!data_race(rtp->cbs_head)],
+		".C"[!!data_race(rtpcp->cbs_head)],
 		s);
 }
 #endif // #ifndef CONFIG_TINY_RCU
@@ -593,6 +640,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
+	cblist_init_generic(&rcu_tasks);
 	rcu_tasks.gp_sleep = HZ / 10;
 	rcu_tasks.init_fract = HZ / 10;
 	rcu_tasks.pregp_func = rcu_tasks_pregp_step;
@@ -731,6 +779,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_rude);
 
 static int __init rcu_spawn_tasks_rude_kthread(void)
 {
+	cblist_init_generic(&rcu_tasks_rude);
 	rcu_tasks_rude.gp_sleep = HZ / 10;
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
 	return 0;
@@ -1264,6 +1313,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
+	cblist_init_generic(&rcu_tasks_trace);
 	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
 		rcu_tasks_trace.gp_sleep = HZ / 10;
 		rcu_tasks_trace.init_fract = HZ / 10;
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 03/18] rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit introduces a ->percpu_enqueue_shift field to the rcu_tasks
structure, and uses it to shift down the CPU number in order to
select a rcu_tasks_percpu structure.  This field is currently set to a
sufficiently large shift count to always select the CPU-0 instance of
the rcu_tasks_percpu structure, and later commits will adjust this.
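
The selection rule itself is nothing more than a shift.  The sketch
below (standalone, power-of-two CPU count assumed for simplicity) shows
why a shift of ilog2() of the CPU count pins everything to queue 0, and
why a smaller shift would spread CPUs across more queues:

#include <stdio.h>

/* Kernel-style ilog2(): floor(log2(n)) for n > 0. */
static int ilog2_u(unsigned long n)
{
	int r = -1;

	while (n) {
		n >>= 1;
		r++;
	}
	return r;
}

int main(void)
{
	unsigned long nr_cpu_ids = 64;
	unsigned long cpu;
	int shift = ilog2_u(nr_cpu_ids);	/* 6: cpu >> 6 == 0 for cpu < 64. */

	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		printf("CPU %2lu -> queue %lu\n", cpu, cpu >> shift);
	return 0;
}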

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 30048db7aa49d..2a5fe3e04b363 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -52,6 +52,7 @@ struct rcu_tasks_percpu {
  * @postgp_func: This flavor's post-grace-period function (optional).
  * @call_func: This flavor's call_rcu()-equivalent function.
  * @rtpcpu: This flavor's rcu_tasks_percpu structure.
+ * @percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
  * @name: This flavor's textual name.
  * @kname: This flavor's kthread name.
  */
@@ -75,6 +76,7 @@ struct rcu_tasks {
 	postgp_func_t postgp_func;
 	call_rcu_func_t call_func;
 	struct rcu_tasks_percpu __percpu *rtpcpu;
+	int percpu_enqueue_shift;
 	char *name;
 	char *kname;
 };
@@ -91,6 +93,7 @@ static struct rcu_tasks rt_name =							\
 	.call_func = call,								\
 	.rtpcpu = &rt_name ## __percpu,							\
 	.name = n,									\
+	.percpu_enqueue_shift = ilog2(CONFIG_NR_CPUS),					\
 	.kname = #rt_name,								\
 }
 
@@ -169,6 +172,7 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
+	rtp->percpu_enqueue_shift = ilog2(nr_cpu_ids);
 	for_each_possible_cpu(cpu) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
@@ -195,7 +199,8 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	rhp->next = NULL;
 	rhp->func = func;
 	local_irq_save(flags);
-	rtpcp = per_cpu_ptr(rtp->rtpcpu, 0 /* smp_processor_id() */);
+	rtpcp = per_cpu_ptr(rtp->rtpcpu,
+			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
 	raw_spin_lock(&rtpcp->cbs_pcpu_lock);
 	if (!rtpcp->cbs_tail) {
 		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 04/18] rcu-tasks: Convert grace-period counter to grace-period sequence number
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit moves the rcu_tasks structure's ->n_gps grace-period-counter
field to a ->tasks_gp_seq grace-period sequence number in order to enable
use of the rcu_segcblist structure for the callback lists.  This in turn
permits CPUs to lag behind the RCU Tasks grace-period sequence number
without suffering long-term slowdowns in callback invocation.
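
The sequence-number format is the one used elsewhere in RCU: the
counter advances by four per grace period, with the low two bits
recording the phase.  A simplified, compilable userspace model of the
rcu_seq_*() helpers (ignoring wraparound, which the kernel handles with
special comparison macros):

#include <stdio.h>

#define SEQ_STATE_MASK 0x3UL	/* Low bits: 0 = idle, 1 = GP in progress. */

static void seq_start(unsigned long *sp) { *sp += 1; }
static void seq_end(unsigned long *sp) { *sp = (*sp | SEQ_STATE_MASK) + 1; }

/* Smallest idle-phase value guaranteeing a full GP after "now". */
static unsigned long seq_snap(const unsigned long *sp)
{
	return (*sp + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

static int seq_done(const unsigned long *sp, unsigned long s)
{
	return *sp >= s;
}

int main(void)
{
	unsigned long gp_seq = 0;
	unsigned long s = seq_snap(&gp_seq);	/* Callbacks wait for s. */

	seq_start(&gp_seq);
	seq_end(&gp_seq);
	printf("gp_seq=%lu snap=%lu done=%d\n", gp_seq, s, seq_done(&gp_seq, s));
	return 0;
}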

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 2a5fe3e04b363..ed84d59b6dbfa 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -42,7 +42,7 @@ struct rcu_tasks_percpu {
  * @init_fract: Initial backoff sleep interval.
  * @gp_jiffies: Time of last @gp_state transition.
  * @gp_start: Most recent grace-period start in jiffies.
- * @n_gps: Number of grace periods completed since boot.
+ * @tasks_gp_seq: Number of grace periods completed since boot.
  * @n_ipis: Number of IPIs sent to encourage grace periods to end.
  * @n_ipis_fails: Number of IPI-send failures.
  * @pregp_func: This flavor's pre-grace-period function (optional).
@@ -64,7 +64,7 @@ struct rcu_tasks {
 	int init_fract;
 	unsigned long gp_jiffies;
 	unsigned long gp_start;
-	unsigned long n_gps;
+	unsigned long tasks_gp_seq;
 	unsigned long n_ipis;
 	unsigned long n_ipis_fails;
 	struct task_struct *kthread_ptr;
@@ -273,8 +273,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		// Wait for one grace period.
 		set_tasks_gp_state(rtp, RTGS_WAIT_GP);
 		rtp->gp_start = jiffies;
+		rcu_seq_start(&rtp->tasks_gp_seq);
 		rtp->gp_func(rtp);
-		rtp->n_gps++;
+		rcu_seq_end(&rtp->tasks_gp_seq);
 
 		/* Invoke the callbacks. */
 		set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
@@ -335,7 +336,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
 		rtp->kname,
 		tasks_gp_state_getname(rtp), data_race(rtp->gp_state),
 		jiffies - data_race(rtp->gp_jiffies),
-		data_race(rtp->n_gps),
+		data_race(rcu_seq_current(&rtp->tasks_gp_seq)),
 		data_race(rtp->n_ipis_fails), data_race(rtp->n_ipis),
 		".k"[!!data_race(rtp->kthread_ptr)],
 		".C"[!!data_race(rtpcp->cbs_head)],
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 05/18] rcu_tasks: Convert bespoke callback list to rcu_segcblist structure
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit moves from a bespoke head and tail pointer in the
rcu_tasks_percpu structure to an rcu_segcblist structure, thus allowing
the grace-period sequence number to be associated with groups of callbacks.
This in turn will allow callbacks to be invoked independently on
different CPUs.
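
The essential semantics can be modeled compactly: tag callbacks with
the grace-period number they wait for, and let "advance" peel off
everything whose number has been reached.  (The real rcu_segcblist
instead keeps per-segment tail pointers, so the per-callback field
below is purely illustrative:)

#include <stdio.h>
#include <stddef.h>

struct scb {
	struct scb *next;
	unsigned long wait_seq;	/* GP sequence number this callback awaits. */
};

struct seglist {
	struct scb *head;
	struct scb **tail;
};

static void seg_enqueue(struct seglist *sl, struct scb *cbp, unsigned long snap)
{
	cbp->next = NULL;
	cbp->wait_seq = snap;
	*sl->tail = cbp;
	sl->tail = &cbp->next;
}

/* "Advance": detach every callback whose grace period has completed. */
static struct scb *seg_extract_done(struct seglist *sl, unsigned long cur_seq)
{
	struct scb *done = sl->head, **dp = &sl->head;

	while (*dp && (*dp)->wait_seq <= cur_seq)
		dp = &(*dp)->next;
	if (dp == &sl->head)
		return NULL;		/* Nothing ready yet. */
	sl->head = *dp;
	if (!sl->head)
		sl->tail = &sl->head;
	*dp = NULL;
	return done;
}

int main(void)
{
	struct seglist sl = { .head = NULL, .tail = &sl.head };
	struct scb a, b;

	seg_enqueue(&sl, &a, 4);
	seg_enqueue(&sl, &b, 8);
	printf("after GP 4: %s\n", seg_extract_done(&sl, 4) ? "a ready" : "none");
	printf("after GP 8: %s\n", seg_extract_done(&sl, 8) ? "b ready" : "none");
	return 0;
}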

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/Kconfig |  2 +-
 kernel/rcu/tasks.h | 52 ++++++++++++++++++++++++++--------------------
 2 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 3128b7cf8e1fd..42bcd34312c2c 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -112,7 +112,7 @@ config RCU_STALL_COMMON
 	  making these warnings mandatory for the tree variants.
 
 config RCU_NEED_SEGCBLIST
-	def_bool ( TREE_RCU || TREE_SRCU )
+	def_bool ( TREE_RCU || TREE_SRCU || TASKS_RCU_GENERIC )
 
 config RCU_FANOUT
 	int "Tree-based hierarchical RCU fanout value"
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index ed84d59b6dbfa..2e58d7fa2da41 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -6,6 +6,7 @@
  */
 
 #ifdef CONFIG_TASKS_RCU_GENERIC
+#include "rcu_segcblist.h"
 
 ////////////////////////////////////////////////////////////////////////
 //
@@ -21,13 +22,11 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
 
 /**
  * struct rcu_tasks_percpu - Per-CPU component of definition for a Tasks-RCU-like mechanism.
- * @cbs_head: Head of callback list.
- * @cbs_tail: Tail pointer for callback list.
+ * @cblist: Callback list.
  * @cbs_pcpu_lock: Lock protecting per-CPU callback list.
  */
 struct rcu_tasks_percpu {
-	struct rcu_head *cbs_head;
-	struct rcu_head **cbs_tail;
+	struct rcu_segcblist cblist;
 	raw_spinlock_t cbs_pcpu_lock;
 };
 
@@ -180,8 +179,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 		if (cpu)
 			raw_spin_lock_init(&rtpcp->cbs_pcpu_lock);
 		raw_spin_lock(&rtpcp->cbs_pcpu_lock); // irqs already disabled.
-		if (!WARN_ON_ONCE(rtpcp->cbs_tail))
-			rtpcp->cbs_tail = &rtpcp->cbs_head;
+		if (rcu_segcblist_empty(&rtpcp->cblist))
+			rcu_segcblist_init(&rtpcp->cblist);
 		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
 	}
 	raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
@@ -202,14 +201,13 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	rtpcp = per_cpu_ptr(rtp->rtpcpu,
 			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
 	raw_spin_lock(&rtpcp->cbs_pcpu_lock);
-	if (!rtpcp->cbs_tail) {
+	if (!rcu_segcblist_is_enabled(&rtpcp->cblist)) {
 		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
 		cblist_init_generic(rtp);
 		raw_spin_lock(&rtpcp->cbs_pcpu_lock); // irqs already disabled.
 	}
-	needwake = !rtpcp->cbs_head;
-	WRITE_ONCE(*rtpcp->cbs_tail, rhp);
-	rtpcp->cbs_tail = &rhp->next;
+	needwake = rcu_segcblist_empty(&rtpcp->cblist);
+	rcu_segcblist_enqueue(&rtpcp->cblist, rhp);
 	raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
 	/* We can't create the thread unless interrupts are enabled. */
 	if (needwake && READ_ONCE(rtp->kthread_ptr))
@@ -231,8 +229,9 @@ static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 static int __noreturn rcu_tasks_kthread(void *arg)
 {
 	unsigned long flags;
-	struct rcu_head *list;
-	struct rcu_head *next;
+	int len;
+	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
+	struct rcu_head *rhp;
 	struct rcu_tasks *rtp = arg;
 
 	/* Run on housekeeping CPUs by default.  Sysadm can move if desired. */
@@ -253,16 +252,15 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		/* Pick up any new callbacks. */
 		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
 		smp_mb__after_spinlock(); // Order updates vs. GP.
-		list = rtpcp->cbs_head;
-		rtpcp->cbs_head = NULL;
-		rtpcp->cbs_tail = &rtpcp->cbs_head;
+		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
 		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
 
 		/* If there were none, wait a bit and start over. */
-		if (!list) {
+		if (!rcu_segcblist_pend_cbs(&rtpcp->cblist)) {
 			wait_event_interruptible(rtp->cbs_wq,
-						 READ_ONCE(rtpcp->cbs_head));
-			if (!rtpcp->cbs_head) {
+						 rcu_segcblist_pend_cbs(&rtpcp->cblist));
+			if (!rcu_segcblist_pend_cbs(&rtpcp->cblist)) {
 				WARN_ON(signal_pending(current));
 				set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS);
 				schedule_timeout_idle(HZ/10);
@@ -279,14 +277,22 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 
 		/* Invoke the callbacks. */
 		set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
-		while (list) {
-			next = list->next;
+		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
+		smp_mb__after_spinlock(); // Order updates vs. GP.
+		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+		rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
+		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
+		len = rcl.len;
+		for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 			local_bh_disable();
-			list->func(list);
+			rhp->func(rhp);
 			local_bh_enable();
-			list = next;
 			cond_resched();
 		}
+		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
+		rcu_segcblist_add_len(&rtpcp->cblist, -len);
+		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
+		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
 		/* Paranoid sleep to keep this from entering a tight loop */
 		schedule_timeout_idle(rtp->gp_sleep);
 	}
@@ -339,7 +345,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
 		data_race(rcu_seq_current(&rtp->tasks_gp_seq)),
 		data_race(rtp->n_ipis_fails), data_race(rtp->n_ipis),
 		".k"[!!data_race(rtp->kthread_ptr)],
-		".C"[!!data_race(rtpcp->cbs_head)],
+		".C"[!data_race(rcu_segcblist_empty(&rtpcp->cblist))],
 		s);
 }
 #endif // #ifndef CONFIG_TINY_RCU
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 06/18] rcu-tasks: Use spin_lock_rcu_node() and friends
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit renames the rcu_tasks_percpu structure's ->cbs_pcpu_lock
to ->lock and then uses spin_lock_rcu_node() and friends to acquire and
release this lock, preparing for upcoming commits that will spread the
grace-period process across multiple CPUs and kthreads.
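
For reference, the wrappers follow this pattern (lightly simplified
from kernel/rcu/rcu.h): they take the enclosing structure rather than
the raw lock, hide the lock field behind ACCESS_PRIVATE(), and fold in
the ordering that RCU's rules require:

#define raw_spin_lock_rcu_node(p)				\
do {								\
	raw_spin_lock(&ACCESS_PRIVATE(p, lock));		\
	smp_mb__after_unlock_lock();				\
} while (0)

#define raw_spin_unlock_rcu_node(p)				\
	raw_spin_unlock(&ACCESS_PRIVATE(p, lock))

#define raw_spin_lock_irqsave_rcu_node(p, flags)		\
do {								\
	raw_spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags);	\
	smp_mb__after_unlock_lock();				\
} while (0)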

[ paulmck: Apply feedback from kernel test robot. ]

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 2e58d7fa2da41..e9f59a88637cf 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -23,11 +23,11 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
 /**
  * struct rcu_tasks_percpu - Per-CPU component of definition for a Tasks-RCU-like mechanism.
  * @cblist: Callback list.
- * @cbs_pcpu_lock: Lock protecting per-CPU callback list.
+ * @lock: Lock protecting per-CPU callback list.
  */
 struct rcu_tasks_percpu {
 	struct rcu_segcblist cblist;
-	raw_spinlock_t cbs_pcpu_lock;
+	raw_spinlock_t __private lock;
 };
 
 /**
@@ -82,7 +82,7 @@ struct rcu_tasks {
 
 #define DEFINE_RCU_TASKS(rt_name, gp, call, n)						\
 static DEFINE_PER_CPU(struct rcu_tasks_percpu, rt_name ## __percpu) = {			\
-	.cbs_pcpu_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name ## __percpu.cbs_pcpu_lock),	\
+	.lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name ## __percpu.cbs_pcpu_lock),		\
 };											\
 static struct rcu_tasks rt_name =							\
 {											\
@@ -177,11 +177,11 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 
 		WARN_ON_ONCE(!rtpcp);
 		if (cpu)
-			raw_spin_lock_init(&rtpcp->cbs_pcpu_lock);
-		raw_spin_lock(&rtpcp->cbs_pcpu_lock); // irqs already disabled.
+			raw_spin_lock_init(&ACCESS_PRIVATE(rtpcp, lock));
+		raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
 		if (rcu_segcblist_empty(&rtpcp->cblist))
 			rcu_segcblist_init(&rtpcp->cblist);
-		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
+		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
 	}
 	raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
 
@@ -200,15 +200,15 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	local_irq_save(flags);
 	rtpcp = per_cpu_ptr(rtp->rtpcpu,
 			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
-	raw_spin_lock(&rtpcp->cbs_pcpu_lock);
+	raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
 	if (!rcu_segcblist_is_enabled(&rtpcp->cblist)) {
-		raw_spin_unlock(&rtpcp->cbs_pcpu_lock); // irqs remain disabled.
+		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
 		cblist_init_generic(rtp);
-		raw_spin_lock(&rtpcp->cbs_pcpu_lock); // irqs already disabled.
+		raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
 	}
 	needwake = rcu_segcblist_empty(&rtpcp->cblist);
 	rcu_segcblist_enqueue(&rtpcp->cblist, rhp);
-	raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
+	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 	/* We can't create the thread unless interrupts are enabled. */
 	if (needwake && READ_ONCE(rtp->kthread_ptr))
 		wake_up(&rtp->cbs_wq);
@@ -250,11 +250,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 
 		/* Pick up any new callbacks. */
-		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 		smp_mb__after_spinlock(); // Order updates vs. GP.
 		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
 		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
-		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 
 		/* If there were none, wait a bit and start over. */
 		if (!rcu_segcblist_pend_cbs(&rtpcp->cblist)) {
@@ -277,11 +277,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 
 		/* Invoke the callbacks. */
 		set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
-		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 		smp_mb__after_spinlock(); // Order updates vs. GP.
 		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
 		rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
-		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 		len = rcl.len;
 		for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 			local_bh_disable();
@@ -289,10 +289,10 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			local_bh_enable();
 			cond_resched();
 		}
-		raw_spin_lock_irqsave(&rtpcp->cbs_pcpu_lock, flags);
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 		rcu_segcblist_add_len(&rtpcp->cblist, -len);
 		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
-		raw_spin_unlock_irqrestore(&rtpcp->cbs_pcpu_lock, flags);
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 		/* Paranoid sleep to keep this from entering a tight loop */
 		schedule_timeout_idle(rtp->gp_sleep);
 	}
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 07/18] rcu-tasks: Inspect stalled task's trc state in locked state
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Neeraj Upadhyay,
	Paul E. McKenney

From: Neeraj Upadhyay <quic_neeraju@quicinc.com>

On RCU Tasks Trace stall, inspect the RCU-Tasks-Trace-specific state
of the stalled task while it is locked down, using
try_invoke_on_locked_down_task(), to get a reliable trc state for a
non-running stalled task.

This was tested using the following command:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --configs TRACE01 \
--bootargs "rcutorture.torture_type=tasks-tracing rcutorture.stall_cpu=10 \
rcutorture.stall_cpu_block=1 rcupdate.rcu_task_stall_timeout=100" --trust-make

As expected, this produced the following console output for running and
sleeping tasks.

[   21.520291] INFO: rcu_tasks_trace detected stalls on tasks:
[   21.521292] P85: ... nesting: 1N cpu: 2
[   21.521966] task:rcu_torture_sta state:D stack:15080 pid:   85 ppid:     2 flags:0x00004000
[   21.523384] Call Trace:
[   21.523808]  __schedule+0x273/0x6e0
[   21.524428]  schedule+0x35/0xa0
[   21.524971]  schedule_timeout+0x1ed/0x270
[   21.525690]  ? del_timer_sync+0x30/0x30
[   21.526371]  ? rcu_torture_writer+0x720/0x720
[   21.527106]  rcu_torture_stall+0x24a/0x270
[   21.527816]  kthread+0x115/0x140
[   21.528401]  ? set_kthread_struct+0x40/0x40
[   21.529136]  ret_from_fork+0x22/0x30
[   21.529766]  1 holdouts
[   21.632300] INFO: rcu_tasks_trace detected stalls on tasks:
[   21.632345] rcu_torture_stall end.
[   21.633293] P85: .
[   21.633294] task:rcu_torture_sta state:R  running task stack:15080 pid:   85 ppid:     2 flags:0x00004000
[   21.633299] Call Trace:
[   21.633301]  ? vprintk_emit+0xab/0x180
[   21.633306]  ? vprintk_emit+0x11a/0x180
[   21.633308]  ? _printk+0x4d/0x69
[   21.633311]  ? __default_send_IPI_shortcut+0x1f/0x40

[ paulmck: Update to new v5.16 task_call_func() name. ]

Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 43 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index e9f59a88637cf..b3d15600f4fe9 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1134,25 +1134,50 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
 	// Any tasks that exit after this point will set ->trc_reader_checked.
 }
 
+/* Communicate task state back to the RCU tasks trace stall warning request. */
+struct trc_stall_chk_rdr {
+	int nesting;
+	int ipi_to_cpu;
+	u8 needqs;
+};
+
+static int trc_check_slow_task(struct task_struct *t, void *arg)
+{
+	struct trc_stall_chk_rdr *trc_rdrp = arg;
+
+	if (task_curr(t))
+		return false; // It is running, so decline to inspect it.
+	trc_rdrp->nesting = READ_ONCE(t->trc_reader_nesting);
+	trc_rdrp->ipi_to_cpu = READ_ONCE(t->trc_ipi_to_cpu);
+	trc_rdrp->needqs = READ_ONCE(t->trc_reader_special.b.need_qs);
+	return true;
+}
+
 /* Show the state of a task stalling the current RCU tasks trace GP. */
 static void show_stalled_task_trace(struct task_struct *t, bool *firstreport)
 {
 	int cpu;
+	struct trc_stall_chk_rdr trc_rdr;
+	bool is_idle_tsk = is_idle_task(t);
 
 	if (*firstreport) {
 		pr_err("INFO: rcu_tasks_trace detected stalls on tasks:\n");
 		*firstreport = false;
 	}
-	// FIXME: This should attempt to use try_invoke_on_nonrunning_task().
 	cpu = task_cpu(t);
-	pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d\n",
-		 t->pid,
-		 ".I"[READ_ONCE(t->trc_ipi_to_cpu) >= 0],
-		 ".i"[is_idle_task(t)],
-		 ".N"[cpu >= 0 && tick_nohz_full_cpu(cpu)],
-		 READ_ONCE(t->trc_reader_nesting),
-		 " N"[!!READ_ONCE(t->trc_reader_special.b.need_qs)],
-		 cpu);
+	if (!task_call_func(t, trc_check_slow_task, &trc_rdr))
+		pr_alert("P%d: %c\n",
+			 t->pid,
+			 ".i"[is_idle_tsk]);
+	else
+		pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d\n",
+			 t->pid,
+			 ".I"[trc_rdr.ipi_to_cpu >= 0],
+			 ".i"[is_idle_tsk],
+			 ".N"[cpu >= 0 && tick_nohz_full_cpu(cpu)],
+			 trc_rdr.nesting,
+			 " N"[!!trc_rdr.needqs],
+			 cpu);
 	sched_show_task(t);
 }
 
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 08/18] rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit adds a ->percpu_enqueue_lim field to the rcu_tasks structure.
This field contains two to the power of the ->percpu_enqueue_shift
field, easing construction of iterators over the per-CPU queues that
might contain RCU Tasks callbacks.  Such iterators will be introduced
in later commits.
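
The intended idiom is sketched below in standalone form (names
illustrative): with the limit cached, scans cover only the queues that
can actually be non-empty instead of all nr_cpu_ids of them.

#include <stdio.h>

#define NR_CPUS 64

struct queue { int nonempty; };

static struct queue queues[NR_CPUS];
static int enqueue_lim = 1;	/* Number of queues currently in use. */

static void scan_queues(void)
{
	int cpu;

	for (cpu = 0; cpu < enqueue_lim; cpu++)
		if (queues[cpu].nonempty)
			printf("queue %d needs attention\n", cpu);
}

int main(void)
{
	queues[0].nonempty = 1;
	scan_queues();
	return 0;
}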

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index b3d15600f4fe9..1c7cf8e8d65e7 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -52,6 +52,7 @@ struct rcu_tasks_percpu {
  * @call_func: This flavor's call_rcu()-equivalent function.
  * @rtpcpu: This flavor's rcu_tasks_percpu structure.
  * @percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
+ * @percpu_enqueue_lim: Number of per-CPU callback queues in use.
  * @name: This flavor's textual name.
  * @kname: This flavor's kthread name.
  */
@@ -76,6 +77,7 @@ struct rcu_tasks {
 	call_rcu_func_t call_func;
 	struct rcu_tasks_percpu __percpu *rtpcpu;
 	int percpu_enqueue_shift;
+	int percpu_enqueue_lim;
 	char *name;
 	char *kname;
 };
@@ -93,6 +95,7 @@ static struct rcu_tasks rt_name =							\
 	.rtpcpu = &rt_name ## __percpu,							\
 	.name = n,									\
 	.percpu_enqueue_shift = ilog2(CONFIG_NR_CPUS),					\
+	.percpu_enqueue_lim = 1,							\
 	.kname = #rt_name,								\
 }
 
@@ -172,6 +175,7 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 
 	raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
 	rtp->percpu_enqueue_shift = ilog2(nr_cpu_ids);
+	rtp->percpu_enqueue_lim = 1;
 	for_each_possible_cpu(cpu) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 09/18] rcu-tasks: Abstract checking of callback lists
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit adds a rcu_tasks_need_gpcb() function that returns an
indication of whether another grace period is required, and if no grace
period is required, whether there are callbacks that need to be invoked.
The function scans all per-CPU lists currently in use.
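
The return value is a small bitmask, modeled in this standalone sketch:
bit 0x1 means some queue holds callbacks (the kthread has work to do),
and bit 0x2 means some of those callbacks still await a grace period
(one must therefore be started):

#include <stdio.h>

#define NEED_CBS 0x1	/* Some queue has callbacks. */
#define NEED_GP  0x2	/* Some callbacks still await a grace period. */

struct qstate { int n_pending; int n_done; };	/* Illustrative only. */

static int need_gpcb(const struct qstate *qs, int nqueues)
{
	int need = 0;
	int i;

	for (i = 0; i < nqueues; i++) {
		if (qs[i].n_pending)
			need |= NEED_GP | NEED_CBS;
		else if (qs[i].n_done)
			need |= NEED_CBS;
	}
	return need;
}

int main(void)
{
	struct qstate qs[2] = { { .n_pending = 1 }, { .n_done = 3 } };
	int need = need_gpcb(qs, 2);

	printf("need GP: %d, need invocation: %d\n",
	       !!(need & NEED_GP), !!(need & NEED_CBS));
	return 0;
}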

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 63 ++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 1c7cf8e8d65e7..bb37d4a4e48de 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -229,11 +229,39 @@ static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 	wait_rcu_gp(rtp->call_func);
 }
 
+// Advance callbacks and indicate whether either a grace period or
+// callback invocation is needed.
+static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
+{
+	int cpu;
+	unsigned long flags;
+	int needgpcb = 0;
+
+	for (cpu = 0; cpu < rtp->percpu_enqueue_lim; cpu++) {
+		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
+
+		/* Advance and accelerate any new callbacks. */
+		if (rcu_segcblist_empty(&rtpcp->cblist))
+			continue;
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+		smp_mb__after_spinlock(); // Order updates vs. GP.
+		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
+		if (rcu_segcblist_pend_cbs(&rtpcp->cblist))
+			needgpcb |= 0x3;
+		if (!rcu_segcblist_empty(&rtpcp->cblist))
+			needgpcb |= 0x1;
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+	}
+	return needgpcb;
+}
+
 /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
 static int __noreturn rcu_tasks_kthread(void *arg)
 {
 	unsigned long flags;
 	int len;
+	int needgpcb;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	struct rcu_head *rhp;
 	struct rcu_tasks *rtp = arg;
@@ -249,38 +277,25 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	 * This loop is terminated by the system going down.  ;-)
 	 */
 	for (;;) {
-		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, 0);  // for_each...
+		struct rcu_tasks_percpu *rtpcp;
 
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 
-		/* Pick up any new callbacks. */
-		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
-		smp_mb__after_spinlock(); // Order updates vs. GP.
-		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
-		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
-		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
-
 		/* If there were none, wait a bit and start over. */
-		if (!rcu_segcblist_pend_cbs(&rtpcp->cblist)) {
-			wait_event_interruptible(rtp->cbs_wq,
-						 rcu_segcblist_pend_cbs(&rtpcp->cblist));
-			if (!rcu_segcblist_pend_cbs(&rtpcp->cblist)) {
-				WARN_ON(signal_pending(current));
-				set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS);
-				schedule_timeout_idle(HZ/10);
-			}
-			continue;
+		wait_event_idle(rtp->cbs_wq, (needgpcb = rcu_tasks_need_gpcb(rtp)));
+
+		if (needgpcb & 0x2) {
+			// Wait for one grace period.
+			set_tasks_gp_state(rtp, RTGS_WAIT_GP);
+			rtp->gp_start = jiffies;
+			rcu_seq_start(&rtp->tasks_gp_seq);
+			rtp->gp_func(rtp);
+			rcu_seq_end(&rtp->tasks_gp_seq);
 		}
 
-		// Wait for one grace period.
-		set_tasks_gp_state(rtp, RTGS_WAIT_GP);
-		rtp->gp_start = jiffies;
-		rcu_seq_start(&rtp->tasks_gp_seq);
-		rtp->gp_func(rtp);
-		rcu_seq_end(&rtp->tasks_gp_seq);
-
 		/* Invoke the callbacks. */
 		set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
+		rtpcp = per_cpu_ptr(rtp->rtpcpu, 0);
 		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 		smp_mb__after_spinlock(); // Order updates vs. GP.
 		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 10/18] rcu-tasks: Abstract invocations of callbacks
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit adds a rcu_tasks_invoke_cbs() function that invokes all
ready callbacks on all of the per-CPU lists that are currently in use.

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 58 ++++++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index bb37d4a4e48de..3bc0edb6a7bb6 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -256,14 +256,43 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 	return needgpcb;
 }
 
-/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
-static int __noreturn rcu_tasks_kthread(void *arg)
+// Advance callbacks and invoke any that are ready.
+static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp)
 {
+	int cpu;
 	unsigned long flags;
 	int len;
-	int needgpcb;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	struct rcu_head *rhp;
+
+	for (cpu = 0; cpu < rtp->percpu_enqueue_lim; cpu++) {
+		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
+
+		if (rcu_segcblist_empty(&rtpcp->cblist))
+			continue;
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+		smp_mb__after_spinlock(); // Order updates vs. GP.
+		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+		rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+		len = rcl.len;
+		for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
+			local_bh_disable();
+			rhp->func(rhp);
+			local_bh_enable();
+			cond_resched();
+		}
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+		rcu_segcblist_add_len(&rtpcp->cblist, -len);
+		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+	}
+}
+
+/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
+static int __noreturn rcu_tasks_kthread(void *arg)
+{
+	int needgpcb;
 	struct rcu_tasks *rtp = arg;
 
 	/* Run on housekeeping CPUs by default.  Sysadm can move if desired. */
@@ -277,8 +306,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	 * This loop is terminated by the system going down.  ;-)
 	 */
 	for (;;) {
-		struct rcu_tasks_percpu *rtpcp;
-
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 
 		/* If there were none, wait a bit and start over. */
@@ -293,25 +320,10 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			rcu_seq_end(&rtp->tasks_gp_seq);
 		}
 
-		/* Invoke the callbacks. */
+		/* Invoke callbacks. */
 		set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
-		rtpcp = per_cpu_ptr(rtp->rtpcpu, 0);
-		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
-		smp_mb__after_spinlock(); // Order updates vs. GP.
-		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
-		rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
-		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
-		len = rcl.len;
-		for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
-			local_bh_disable();
-			rhp->func(rhp);
-			local_bh_enable();
-			cond_resched();
-		}
-		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
-		rcu_segcblist_add_len(&rtpcp->cblist, -len);
-		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
-		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+		rcu_tasks_invoke_cbs(rtp);
+
 		/* Paranoid sleep to keep this from entering a tight loop */
 		schedule_timeout_idle(rtp->gp_sleep);
 	}
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 11/18] rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations
From: Paul E. McKenney @ 2021-12-02  0:38 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

If there is a flood of callbacks, it is necessary to put multiple
CPUs to work invoking those callbacks.  This commit therefore uses a
workqueue-flooding approach to parallelize RCU Tasks callback execution.
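
The fan-out is a binary tree over queue indices: the handler for queue
n first kicks off handlers for queues 2n+1 and 2n+2, then processes its
own queue, so a flood reaches all queues in logarithmic depth.  A
sequential stand-in for the indexing (the kernel queues the children as
concurrent work items rather than recursing):

#include <stdio.h>

static void invoke_cbs(int cpu, int lim)
{
	int next = cpu * 2 + 1;

	/* Kick off up to two children; queue_work_on() in the kernel. */
	if (next < lim) {
		invoke_cbs(next, lim);
		if (next + 1 < lim)
			invoke_cbs(next + 1, lim);
	}
	printf("invoking callbacks for queue %d\n", cpu);
}

int main(void)
{
	invoke_cbs(0, 8);	/* Covers queues 0..7 starting from queue 0. */
	return 0;
}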

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 77 +++++++++++++++++++++++++++++++---------------
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 3bc0edb6a7bb6..a75a4ca78a621 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -24,10 +24,14 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * struct rcu_tasks_percpu - Per-CPU component of definition for a Tasks-RCU-like mechanism.
  * @cblist: Callback list.
  * @lock: Lock protecting per-CPU callback list.
+ * @rtp_work: Work queue for invoking callbacks.
  */
 struct rcu_tasks_percpu {
 	struct rcu_segcblist cblist;
 	raw_spinlock_t __private lock;
+	struct work_struct rtp_work;
+	int cpu;
+	struct rcu_tasks *rtpp;
 };
 
 /**
@@ -146,6 +150,8 @@ static const char * const rcu_tasks_gp_state_names[] = {
 //
 // Generic code.
 
+static void rcu_tasks_invoke_cbs_wq(struct work_struct *wp);
+
 /* Record grace-period phase and time. */
 static void set_tasks_gp_state(struct rcu_tasks *rtp, int newstate)
 {
@@ -185,6 +191,9 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 		raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
 		if (rcu_segcblist_empty(&rtpcp->cblist))
 			rcu_segcblist_init(&rtpcp->cblist);
+		INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
+		rtpcp->cpu = cpu;
+		rtpcp->rtpp = rtp;
 		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
 	}
 	raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
@@ -257,36 +266,56 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 }
 
 // Advance callbacks and invoke any that are ready.
-static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp)
+static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu *rtpcp)
 {
 	int cpu;
+	int cpunext;
 	unsigned long flags;
 	int len;
-	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	struct rcu_head *rhp;
-
-	for (cpu = 0; cpu < rtp->percpu_enqueue_lim; cpu++) {
-		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
-
-		if (rcu_segcblist_empty(&rtpcp->cblist))
-			continue;
-		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
-		smp_mb__after_spinlock(); // Order updates vs. GP.
-		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
-		rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
-		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
-		len = rcl.len;
-		for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
-			local_bh_disable();
-			rhp->func(rhp);
-			local_bh_enable();
-			cond_resched();
+	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
+	struct rcu_tasks_percpu *rtpcp_next;
+
+	cpu = rtpcp->cpu;
+	cpunext = cpu * 2 + 1;
+	if (cpunext < rtp->percpu_enqueue_lim) {
+		rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
+		queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
+		cpunext++;
+		if (cpunext < rtp->percpu_enqueue_lim) {
+			rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
+			queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
 		}
-		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
-		rcu_segcblist_add_len(&rtpcp->cblist, -len);
-		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
-		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 	}
+
+	if (rcu_segcblist_empty(&rtpcp->cblist))
+		return;
+	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+	smp_mb__after_spinlock(); // Order updates vs. GP.
+	rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+	rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
+	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+	len = rcl.len;
+	for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
+		local_bh_disable();
+		rhp->func(rhp);
+		local_bh_enable();
+		cond_resched();
+	}
+	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+	rcu_segcblist_add_len(&rtpcp->cblist, -len);
+	(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
+	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+}
+
+// Workqueue flood to advance callbacks and invoke any that are ready.
+static void rcu_tasks_invoke_cbs_wq(struct work_struct *wp)
+{
+	struct rcu_tasks *rtp;
+	struct rcu_tasks_percpu *rtpcp = container_of(wp, struct rcu_tasks_percpu, rtp_work);
+
+	rtp = rtpcp->rtpp;
+	rcu_tasks_invoke_cbs(rtp, rtpcp);
 }
 
 /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
@@ -322,7 +351,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 
 		/* Invoke callbacks. */
 		set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
-		rcu_tasks_invoke_cbs(rtp);
+		rcu_tasks_invoke_cbs(rtp, per_cpu_ptr(rtp->rtpcpu, 0));
 
 		/* Paranoid sleep to keep this from entering a tight loop */
 		schedule_timeout_idle(rtp->gp_sleep);
-- 
2.31.1.189.g2e36527f23
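
The hunk above fans callback invocation out along an implicit binary tree: the handler for CPU c queues work for CPUs 2c+1 and 2c+2 and then drains its own list, so invocation spreads across CPUs in O(log n) kicks. Below is a minimal userspace sketch of that tree, not kernel code; the queue count of eight is an assumed example, and the recursive calls stand in for what are really asynchronous queue_work_on() invocations running in parallel.

#include <stdio.h>

/* Walk the implicit binary tree used by rcu_tasks_invoke_cbs():
 * CPU c kicks CPUs 2c+1 and 2c+2, then drains its own callbacks.
 * The real kicks are asynchronous, so the subtrees run in parallel;
 * this sequential walk only shows who kicks whom. */
static void invoke_cbs(int cpu, int lim)
{
	int next = cpu * 2 + 1;

	if (next < lim) {
		printf("CPU %d kicks CPU %d\n", cpu, next);
		invoke_cbs(next, lim);
		next++;
		if (next < lim) {
			printf("CPU %d kicks CPU %d\n", cpu, next);
			invoke_cbs(next, lim);
		}
	}
	printf("CPU %d drains its own callbacks\n", cpu);
}

int main(void)
{
	invoke_cbs(0, 8);	/* eight in-use queues, an assumed value */
	return 0;
}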



* [PATCH rcu 12/18] rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 11/18] rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 13/18] rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing Paul E. McKenney
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

Currently, rcu_barrier_tasks(), rcu_barrier_tasks_rude(),
and rcu_barrier_tasks_trace() simply invoke the corresponding
synchronize_rcu_tasks*() function.  This works because there is only
one callback queue.

However, there will soon be multiple callback queues.  This commit
therefore scans the queues currently in use, entraining a callback on
each non-empty queue.  Sequence numbers and reference counts are used
to synchronize this process in a manner similar to the approach taken
by rcu_barrier().
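
The counting trick in the code below deserves a note: ->barrier_q_count starts at 2 rather than 0 so that the completion cannot fire while the queues are still being scanned, and the final atomic_sub_and_test(2, ...) removes that bias. A rough userspace analogue of the scheme, an illustration rather than kernel code, with invented per-queue state:

#include <stdatomic.h>
#include <stdio.h>

static atomic_int barrier_q_count;

/* Runs once per entrained callback, after that callback is invoked. */
static void barrier_cb(void)
{
	if (atomic_fetch_sub(&barrier_q_count, 1) == 1)
		printf("last callback done: wake the barrier waiter\n");
}

int main(void)
{
	int nonempty[4] = { 1, 0, 1, 1 };	/* assumed per-queue state */
	int entrained = 0, i;

	atomic_init(&barrier_q_count, 2);	/* bias against early wakeup */
	for (i = 0; i < 4; i++)
		if (nonempty[i]) {
			atomic_fetch_add(&barrier_q_count, 1);
			entrained++;
		}
	/* Drop the bias; fires only if nothing at all was entrained. */
	if (atomic_fetch_sub(&barrier_q_count, 2) == 2)
		printf("no callbacks entrained: wake immediately\n");
	for (i = 0; i < entrained; i++)
		barrier_cb();	/* normally invoked after a grace period */
	return 0;
}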

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 70 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 64 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index a75a4ca78a621..61a606569868b 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -25,11 +25,15 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @cblist: Callback list.
  * @lock: Lock protecting per-CPU callback list.
  * @rtp_work: Work queue for invoking callbacks.
+ * @barrier_q_head: RCU callback for barrier operation.
+ * @cpu: CPU number corresponding to this entry.
+ * @rtpp: Pointer to the rcu_tasks structure.
  */
 struct rcu_tasks_percpu {
 	struct rcu_segcblist cblist;
 	raw_spinlock_t __private lock;
 	struct work_struct rtp_work;
+	struct rcu_head barrier_q_head;
 	int cpu;
 	struct rcu_tasks *rtpp;
 };
@@ -57,6 +61,10 @@ struct rcu_tasks_percpu {
  * @rtpcpu: This flavor's rcu_tasks_percpu structure.
  * @percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
  * @percpu_enqueue_lim: Number of per-CPU callback queues in use.
+ * @barrier_q_mutex: Serialize barrier operations.
+ * @barrier_q_count: Number of queues being waited on.
+ * @barrier_q_completion: Barrier wait/wakeup mechanism.
+ * @barrier_q_seq: Sequence number for barrier operations.
  * @name: This flavor's textual name.
  * @kname: This flavor's kthread name.
  */
@@ -82,6 +90,10 @@ struct rcu_tasks {
 	struct rcu_tasks_percpu __percpu *rtpcpu;
 	int percpu_enqueue_shift;
 	int percpu_enqueue_lim;
+	struct mutex barrier_q_mutex;
+	atomic_t barrier_q_count;
+	struct completion barrier_q_completion;
+	unsigned long barrier_q_seq;
 	char *name;
 	char *kname;
 };
@@ -100,6 +112,8 @@ static struct rcu_tasks rt_name =							\
 	.name = n,									\
 	.percpu_enqueue_shift = ilog2(CONFIG_NR_CPUS),					\
 	.percpu_enqueue_lim = 1,							\
+	.barrier_q_mutex = __MUTEX_INITIALIZER(rt_name.barrier_q_mutex),		\
+	.barrier_q_seq = (0UL - 50UL) << RCU_SEQ_CTR_SHIFT,				\
 	.kname = #rt_name,								\
 }
 
@@ -238,6 +252,53 @@ static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 	wait_rcu_gp(rtp->call_func);
 }
 
+// RCU callback function for rcu_barrier_tasks_generic().
+static void rcu_barrier_tasks_generic_cb(struct rcu_head *rhp)
+{
+	struct rcu_tasks *rtp;
+	struct rcu_tasks_percpu *rtpcp;
+
+	rtpcp = container_of(rhp, struct rcu_tasks_percpu, barrier_q_head);
+	rtp = rtpcp->rtpp;
+	if (atomic_dec_and_test(&rtp->barrier_q_count))
+		complete(&rtp->barrier_q_completion);
+}
+
+// Wait for all in-flight callbacks for the specified RCU Tasks flavor.
+// Operates in a manner similar to rcu_barrier().
+static void rcu_barrier_tasks_generic(struct rcu_tasks *rtp)
+{
+	int cpu;
+	unsigned long flags;
+	struct rcu_tasks_percpu *rtpcp;
+	unsigned long s = rcu_seq_snap(&rtp->barrier_q_seq);
+
+	mutex_lock(&rtp->barrier_q_mutex);
+	if (rcu_seq_done(&rtp->barrier_q_seq, s)) {
+		smp_mb();
+		mutex_unlock(&rtp->barrier_q_mutex);
+		return;
+	}
+	rcu_seq_start(&rtp->barrier_q_seq);
+	init_completion(&rtp->barrier_q_completion);
+	atomic_set(&rtp->barrier_q_count, 2);
+	for_each_possible_cpu(cpu) {
+		if (cpu >= smp_load_acquire(&rtp->percpu_enqueue_lim))
+			break;
+		rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
+		rtpcp->barrier_q_head.func = rcu_barrier_tasks_generic_cb;
+		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+		if (rcu_segcblist_entrain(&rtpcp->cblist, &rtpcp->barrier_q_head))
+			atomic_inc(&rtp->barrier_q_count);
+		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+	}
+	if (atomic_sub_and_test(2, &rtp->barrier_q_count))
+		complete(&rtp->barrier_q_completion);
+	wait_for_completion(&rtp->barrier_q_completion);
+	rcu_seq_end(&rtp->barrier_q_seq);
+	mutex_unlock(&rtp->barrier_q_mutex);
+}
+
 // Advance callbacks and indicate whether either a grace period or
 // callback invocation is needed.
 static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
@@ -705,8 +766,7 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_tasks);
  */
 void rcu_barrier_tasks(void)
 {
-	/* There is only one callback queue, so this is easy.  ;-) */
-	synchronize_rcu_tasks();
+	rcu_barrier_tasks_generic(&rcu_tasks);
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
@@ -844,8 +904,7 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_tasks_rude);
  */
 void rcu_barrier_tasks_rude(void)
 {
-	/* There is only one callback queue, so this is easy.  ;-) */
-	synchronize_rcu_tasks_rude();
+	rcu_barrier_tasks_generic(&rcu_tasks_rude);
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_tasks_rude);
 
@@ -1403,8 +1462,7 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_tasks_trace);
  */
 void rcu_barrier_tasks_trace(void)
 {
-	/* There is only one callback queue, so this is easy.  ;-) */
-	synchronize_rcu_tasks_trace();
+	rcu_barrier_tasks_generic(&rcu_tasks_trace);
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 13/18] rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 12/18] rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 14/18] rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention Paul E. McKenney
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit adds an rcupdate.rcu_task_enqueue_lim module parameter that
sets the initial number of callback queues to use for the RCU Tasks
family of RCU implementations.  This parameter allows testing of various
fanout values.
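
The parameter takes effect through ->percpu_enqueue_shift, computed below as ilog2(nr_cpu_ids / lim): each CPU ID is shifted down by that amount to select a queue, so for example booting with rcupdate.rcu_task_enqueue_lim=4 on a 16-CPU system (hypothetical values) maps four consecutive CPUs onto each queue. A quick userspace sketch of that mapping; note that the simple shift assumes a power-of-two ratio:

#include <stdio.h>

static int ilog2_floor(unsigned int v)	/* stand-in for the kernel's ilog2() */
{
	int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

int main(void)
{
	int nr_cpu_ids = 16, lim = 4;	/* assumed example values */
	int shift = ilog2_floor(nr_cpu_ids / lim);
	int cpu;

	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		printf("CPU %2d enqueues on queue %d\n", cpu, cpu >> shift);
	return 0;
}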

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++++++
 kernel/rcu/tasks.h                            | 24 ++++++++++++++-----
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d46..9b09fc5dfe665 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4805,6 +4805,13 @@
 			period to instead use normal non-expedited
 			grace-period processing.
 
+	rcupdate.rcu_task_enqueue_lim= [KNL]
+			Set the number of callback queues to use for the
+			RCU Tasks family of RCU flavors.  The default
+			of -1 allows this to be automatically (and
+			dynamically) adjusted.	This parameter is intended
+			for use in testing.
+
 	rcupdate.rcu_task_ipi_delay= [KNL]
 			Set time in jiffies during which RCU tasks will
 			avoid sending IPIs, starting with the beginning
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 61a606569868b..2b148f6743150 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -130,6 +130,9 @@ module_param(rcu_task_ipi_delay, int, 0644);
 static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT;
 module_param(rcu_task_stall_timeout, int, 0644);
 
+static int rcu_task_enqueue_lim __read_mostly = -1;
+module_param(rcu_task_enqueue_lim, int, 0444);
+
 /* RCU tasks grace-period state for debugging. */
 #define RTGS_INIT		 0
 #define RTGS_WAIT_WAIT_CBS	 1
@@ -192,10 +195,19 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 {
 	int cpu;
 	unsigned long flags;
+	int lim;
 
 	raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
-	rtp->percpu_enqueue_shift = ilog2(nr_cpu_ids);
-	rtp->percpu_enqueue_lim = 1;
+	if (rcu_task_enqueue_lim < 0)
+		rcu_task_enqueue_lim = nr_cpu_ids;
+	else if (rcu_task_enqueue_lim == 0)
+		rcu_task_enqueue_lim = 1;
+	lim = rcu_task_enqueue_lim;
+
+	if (lim > nr_cpu_ids)
+		lim = nr_cpu_ids;
+	WRITE_ONCE(rtp->percpu_enqueue_shift, ilog2(nr_cpu_ids / lim));
+	smp_store_release(&rtp->percpu_enqueue_lim, lim);
 	for_each_possible_cpu(cpu) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
@@ -211,7 +223,7 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
 	}
 	raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
-
+	pr_info("%s: Setting shift to %d and lim to %d.\n", __func__, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim));
 }
 
 // Enqueue a callback for the specified flavor of Tasks RCU.
@@ -307,7 +319,7 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 	unsigned long flags;
 	int needgpcb = 0;
 
-	for (cpu = 0; cpu < rtp->percpu_enqueue_lim; cpu++) {
+	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_enqueue_lim); cpu++) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
 		/* Advance and accelerate any new callbacks. */
@@ -339,11 +351,11 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
 
 	cpu = rtpcp->cpu;
 	cpunext = cpu * 2 + 1;
-	if (cpunext < rtp->percpu_enqueue_lim) {
+	if (cpunext < smp_load_acquire(&rtp->percpu_enqueue_lim)) {
 		rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
 		queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
 		cpunext++;
-		if (cpunext < rtp->percpu_enqueue_lim) {
+		if (cpunext < smp_load_acquire(&rtp->percpu_enqueue_lim)) {
 			rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
 			queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
 		}
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 14/18] rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 13/18] rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 15/18] rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic() Paul E. McKenney
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

This commit converts the unconditional raw_spin_lock_rcu_node() lock
acquisition in call_rcu_tasks_generic() to a trylock followed by an
unconditional acquisition if the trylock fails.  If the trylock fails,
the failure is counted, but the count is reset to zero on each new jiffy.

This statistic will be used to determine when to move from a single
callback queue to per-CPU callback queues.
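
The fast path thus remains an uncontended trylock; only the slow path touches the statistic, and the per-jiffy reset turns it into a crude rate rather than a lifetime total. A userspace analogue of the pattern, an illustration only and single-threaded, so the contended path is shown but never actually taken; time(NULL) stands in for jiffies:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t cbs_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long rtp_jiffies, rtp_n_lock_retries;

static void enqueue(void)
{
	if (pthread_mutex_trylock(&cbs_lock)) {	/* nonzero: contended */
		pthread_mutex_lock(&cbs_lock);
		unsigned long j = (unsigned long)time(NULL);

		if (rtp_jiffies != j) {		/* new interval: restart count */
			rtp_jiffies = j;
			rtp_n_lock_retries = 0;
		}
		rtp_n_lock_retries++;
	}
	/* ... enqueue the callback here ... */
	pthread_mutex_unlock(&cbs_lock);
}

int main(void)
{
	enqueue();
	printf("contended acquisitions this interval: %lu\n",
	       rtp_n_lock_retries);
	return 0;
}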

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 2b148f6743150..3f25022a0db9a 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -24,6 +24,8 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * struct rcu_tasks_percpu - Per-CPU component of definition for a Tasks-RCU-like mechanism.
  * @cblist: Callback list.
  * @lock: Lock protecting per-CPU callback list.
+ * @rtp_jiffies: Jiffies counter value for statistics.
+ * @rtp_n_lock_retries: Rough lock-contention statistic.
  * @rtp_work: Work queue for invoking callbacks.
  * @barrier_q_head: RCU callback for barrier operation.
  * @cpu: CPU number corresponding to this entry.
@@ -32,6 +34,8 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
 struct rcu_tasks_percpu {
 	struct rcu_segcblist cblist;
 	raw_spinlock_t __private lock;
+	unsigned long rtp_jiffies;
+	unsigned long rtp_n_lock_retries;
 	struct work_struct rtp_work;
 	struct rcu_head barrier_q_head;
 	int cpu;
@@ -231,6 +235,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 				   struct rcu_tasks *rtp)
 {
 	unsigned long flags;
+	unsigned long j;
 	bool needwake;
 	struct rcu_tasks_percpu *rtpcp;
 
@@ -239,7 +244,15 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	local_irq_save(flags);
 	rtpcp = per_cpu_ptr(rtp->rtpcpu,
 			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
-	raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
+	if (!raw_spin_trylock_rcu_node(rtpcp)) { // irqs already disabled.
+		raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
+		j = jiffies;
+		if (rtpcp->rtp_jiffies != j) {
+			rtpcp->rtp_jiffies = j;
+			rtpcp->rtp_n_lock_retries = 0;
+		}
+		rtpcp->rtp_n_lock_retries++;
+	}
 	if (!rcu_segcblist_is_enabled(&rtpcp->cblist)) {
 		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
 		cblist_init_generic(rtp);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 15/18] rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 14/18] rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 16/18] rcu-tasks: Use more callback queues if contention encountered Paul E. McKenney
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

If the caller of call_rcu_tasks(), call_rcu_tasks_rude(),
or call_rcu_tasks_trace() holds a raw spinlock, and then if
call_rcu_tasks_generic() determines that the grace-period kthread must
be awakened, then the wakeup might acquire a normal spinlock while a
raw spinlock is held.  This results in lockdep splats when the
kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING=y.

This commit therefore defers the wakeup using irq_work_queue().

It would be nice to directly invoke wakeup when a raw spinlock is not
held, but there is currently no way to check for this in all kernels.
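
The underlying pattern is general: when the current context may not take a sleeping lock, record that a wakeup is wanted and let a later, safe context perform it. A rough userspace analogue, an assumption for illustration only, in which a helper thread plays the role that the hard-IRQ irq_work handler plays in the patch below:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int wakeup_pending;

/* Plays the role of call_rcu_tasks_iw_wakeup(): does the wakeup from
 * a context where sleeping locks are fair game. */
static void *deferred_wakeup_thread(void *arg)
{
	while (!atomic_load(&wakeup_pending))
		usleep(1000);		/* irq_work uses an IPI, not polling */
	printf("deferred context: waking the grace-period kthread\n");
	return NULL;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, deferred_wakeup_thread, NULL);
	/* A raw spinlock might be held here, so do not wake directly;
	 * just flag the request, as irq_work_queue() does. */
	atomic_store(&wakeup_pending, 1);
	pthread_join(tid, NULL);
	return 0;
}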

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 3f25022a0db9a..652f51eec5f34 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -27,6 +27,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @rtp_jiffies: Jiffies counter value for statistics.
  * @rtp_n_lock_retries: Rough lock-contention statistic.
  * @rtp_work: Work queue for invoking callbacks.
+ * @rtp_irq_work: IRQ work queue for deferred wakeups.
  * @barrier_q_head: RCU callback for barrier operation.
  * @cpu: CPU number corresponding to this entry.
  * @rtpp: Pointer to the rcu_tasks structure.
@@ -37,6 +38,7 @@ struct rcu_tasks_percpu {
 	unsigned long rtp_jiffies;
 	unsigned long rtp_n_lock_retries;
 	struct work_struct rtp_work;
+	struct irq_work rtp_irq_work;
 	struct rcu_head barrier_q_head;
 	int cpu;
 	struct rcu_tasks *rtpp;
@@ -102,9 +104,12 @@ struct rcu_tasks {
 	char *kname;
 };
 
+static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp);
+
 #define DEFINE_RCU_TASKS(rt_name, gp, call, n)						\
 static DEFINE_PER_CPU(struct rcu_tasks_percpu, rt_name ## __percpu) = {			\
 	.lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name ## __percpu.cbs_pcpu_lock),		\
+	.rtp_irq_work = IRQ_WORK_INIT(call_rcu_tasks_iw_wakeup),			\
 };											\
 static struct rcu_tasks rt_name =							\
 {											\
@@ -230,6 +235,16 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 	pr_info("%s: Setting shift to %d and lim to %d.\n", __func__, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim));
 }
 
+// IRQ-work handler that does deferred wakeup for call_rcu_tasks_generic().
+static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
+{
+	struct rcu_tasks *rtp;
+	struct rcu_tasks_percpu *rtpcp = container_of(iwp, struct rcu_tasks_percpu, rtp_irq_work);
+
+	rtp = rtpcp->rtpp;
+	wake_up(&rtp->cbs_wq);
+}
+
 // Enqueue a callback for the specified flavor of Tasks RCU.
 static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 				   struct rcu_tasks *rtp)
@@ -262,8 +277,9 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	rcu_segcblist_enqueue(&rtpcp->cblist, rhp);
 	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 	/* We can't create the thread unless interrupts are enabled. */
-	if (needwake && READ_ONCE(rtp->kthread_ptr))
-		wake_up(&rtp->cbs_wq);
+	if (needwake && READ_ONCE(rtp->kthread_ptr)) {
+		irq_work_queue(&rtpcp->rtp_irq_work);
+	}
 }
 
 // Wait for a grace period for the specified flavor of Tasks RCU.
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 16/18] rcu-tasks: Use more callback queues if contention encountered
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 15/18] rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic() Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 17/18] rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 18/18] rcu-tasks: Use fewer callbacks queues if callback flood ends Paul E. McKenney
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

The rcupdate.rcu_task_enqueue_lim module parameter allows system
administrators to tune the number of callback queues used by the RCU
Tasks flavors.  However, if callback storms are infrequent, it would
be better to operate with a single queue on a given system unless and
until that system actually needs more queues.  Systems not needing
more queues can then avoid the overhead of checking the extra queues
and especially avoid the overhead of fanning workqueue handlers out to
all CPUs to invoke callbacks.

This commit therefore switches to using all the CPUs' callback queues if
call_rcu_tasks_generic() encounters too much lock contention.  The amount
of lock contention to tolerate defaults to 100 contended lock acquisitions
per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim
module parameter.

Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim
module parameter is negative, which is its default value (-1).
This allows savvy systems administrators to set the number of queues
to some known good value and to not have to worry about the kernel doing
any second-guessing.
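
Putting the two parameters together, the switch condition in the diff below reduces to three tests: automatic adjustment is enabled, more than rcu_task_contend_lim contended acquisitions have occurred in the current jiffy, and queuing is not already per-CPU. A compact sketch of that predicate with invented example values:

#include <stdbool.h>
#include <stdio.h>

static bool need_switch(bool cb_adjust, unsigned long retries,
			int contend_lim, int enqueue_lim, int nr_cpu_ids)
{
	return cb_adjust && retries > (unsigned long)contend_lim &&
	       enqueue_lim != nr_cpu_ids;
}

int main(void)
{
	/* 150 contended acquisitions this jiffy, still single-queue. */
	printf("switch? %d\n", need_switch(true, 150, 100, 1, 8));
	/* Already at per-CPU queuing: no further adjustment. */
	printf("switch? %d\n", need_switch(true, 150, 100, 8, 8));
	return 0;
}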

[ paulmck: Apply feedback from Guillaume Tucker and kernelci. ]

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  8 ++++++
 kernel/rcu/tasks.h                            | 27 ++++++++++++++++---
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9b09fc5dfe665..089f4c5f8225b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4805,6 +4805,14 @@
 			period to instead use normal non-expedited
 			grace-period processing.
 
+	rcupdate.rcu_task_contend_lim= [KNL]
+			Set the minimum number of callback-queuing-time
+			lock-contention events per jiffy required to
+			cause the RCU Tasks flavors to switch to per-CPU
+			callback queuing.  This switching only occurs
+			when rcupdate.rcu_task_enqueue_lim is set to
+			the default value of -1.
+
 	rcupdate.rcu_task_enqueue_lim= [KNL]
 			Set the number of callback queues to use for the
 			RCU Tasks family of RCU flavors.  The default
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 652f51eec5f34..1695da0f6985e 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -142,6 +142,10 @@ module_param(rcu_task_stall_timeout, int, 0644);
 static int rcu_task_enqueue_lim __read_mostly = -1;
 module_param(rcu_task_enqueue_lim, int, 0444);
 
+static bool rcu_task_cb_adjust;
+static int rcu_task_contend_lim __read_mostly = 100;
+module_param(rcu_task_contend_lim, int, 0444);
+
 /* RCU tasks grace-period state for debugging. */
 #define RTGS_INIT		 0
 #define RTGS_WAIT_WAIT_CBS	 1
@@ -207,10 +211,13 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 	int lim;
 
 	raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
-	if (rcu_task_enqueue_lim < 0)
-		rcu_task_enqueue_lim = nr_cpu_ids;
-	else if (rcu_task_enqueue_lim == 0)
+	if (rcu_task_enqueue_lim < 0) {
+		rcu_task_enqueue_lim = 1;
+		rcu_task_cb_adjust = true;
+		pr_info("%s: Setting adjustable number of callback queues.\n", __func__);
+	} else if (rcu_task_enqueue_lim == 0) {
 		rcu_task_enqueue_lim = 1;
+	}
 	lim = rcu_task_enqueue_lim;
 
 	if (lim > nr_cpu_ids)
@@ -251,6 +258,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 {
 	unsigned long flags;
 	unsigned long j;
+	bool needadjust = false;
 	bool needwake;
 	struct rcu_tasks_percpu *rtpcp;
 
@@ -266,7 +274,9 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 			rtpcp->rtp_jiffies = j;
 			rtpcp->rtp_n_lock_retries = 0;
 		}
-		rtpcp->rtp_n_lock_retries++;
+		if (rcu_task_cb_adjust && ++rtpcp->rtp_n_lock_retries > rcu_task_contend_lim &&
+		    READ_ONCE(rtp->percpu_enqueue_lim) != nr_cpu_ids)
+			needadjust = true;  // Defer adjustment to avoid deadlock.
 	}
 	if (!rcu_segcblist_is_enabled(&rtpcp->cblist)) {
 		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
@@ -276,6 +286,15 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	needwake = rcu_segcblist_empty(&rtpcp->cblist);
 	rcu_segcblist_enqueue(&rtpcp->cblist, rhp);
 	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+	if (unlikely(needadjust)) {
+		raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
+		if (rtp->percpu_enqueue_lim != nr_cpu_ids) {
+			WRITE_ONCE(rtp->percpu_enqueue_shift, ilog2(nr_cpu_ids));
+			smp_store_release(&rtp->percpu_enqueue_lim, nr_cpu_ids);
+			pr_info("Switching %s to per-CPU callback queuing.\n", rtp->name);
+		}
+		raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
+	}
 	/* We can't create the thread unless interrupts are enabled. */
 	if (needwake && READ_ONCE(rtp->kthread_ptr)) {
 		irq_work_queue(&rtpcp->rtp_irq_work);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 17/18] rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (15 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 16/18] rcu-tasks: Use more callback queues if contention encountered Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  2021-12-02  0:38 ` [PATCH rcu 18/18] rcu-tasks: Use fewer callbacks queues if callback flood ends Paul E. McKenney
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

Decreasing the number of callback queues is a bit tricky because it
is necessary to handle callbacks that were queued before the number of
queues decreased, but which were not ready to invoke until afterwards.
This commit takes a first step in this direction by maintaining a separate
->percpu_dequeue_lim to control callback dequeueing, in addition to the
existing ->percpu_enqueue_lim which now controls only enqueueing.
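
The invariant being established is that ->percpu_dequeue_lim may shrink only after every queue above the new enqueue limit has drained; until then, dequeuers keep scanning the old, wider range. A toy sketch of that transition window, with made-up queue contents:

#include <stdio.h>

int main(void)
{
	int ncbs[4] = { 2, 0, 1, 0 };	/* callbacks still queued per CPU */
	int percpu_enqueue_lim = 1;	/* new callbacks now go to CPU 0 */
	int percpu_dequeue_lim = 4;	/* old queues must still be scanned */
	int cpu, leftover = 0;

	for (cpu = 0; cpu < percpu_dequeue_lim; cpu++) {
		printf("scanning CPU %d: %d callback(s)\n", cpu, ncbs[cpu]);
		if (cpu >= percpu_enqueue_lim)
			leftover += ncbs[cpu];
	}
	if (!leftover)
		printf("upper queues empty: dequeue limit may drop to %d\n",
		       percpu_enqueue_lim);
	else
		printf("%d callback(s) above CPU 0: keep scanning all queues\n",
		       leftover);
	return 0;
}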

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 1695da0f6985e..1fbffea6ae469 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -66,7 +66,8 @@ struct rcu_tasks_percpu {
  * @call_func: This flavor's call_rcu()-equivalent function.
  * @rtpcpu: This flavor's rcu_tasks_percpu structure.
  * @percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
- * @percpu_enqueue_lim: Number of per-CPU callback queues in use.
+ * @percpu_enqueue_lim: Number of per-CPU callback queues in use for enqueuing.
+ * @percpu_dequeue_lim: Number of per-CPU callback queues in use for dequeuing.
  * @barrier_q_mutex: Serialize barrier operations.
  * @barrier_q_count: Number of queues being waited on.
  * @barrier_q_completion: Barrier wait/wakeup mechanism.
@@ -96,6 +97,7 @@ struct rcu_tasks {
 	struct rcu_tasks_percpu __percpu *rtpcpu;
 	int percpu_enqueue_shift;
 	int percpu_enqueue_lim;
+	int percpu_dequeue_lim;
 	struct mutex barrier_q_mutex;
 	atomic_t barrier_q_count;
 	struct completion barrier_q_completion;
@@ -121,6 +123,7 @@ static struct rcu_tasks rt_name =							\
 	.name = n,									\
 	.percpu_enqueue_shift = ilog2(CONFIG_NR_CPUS),					\
 	.percpu_enqueue_lim = 1,							\
+	.percpu_dequeue_lim = 1,							\
 	.barrier_q_mutex = __MUTEX_INITIALIZER(rt_name.barrier_q_mutex),		\
 	.barrier_q_seq = (0UL - 50UL) << RCU_SEQ_CTR_SHIFT,				\
 	.kname = #rt_name,								\
@@ -223,6 +226,7 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 	if (lim > nr_cpu_ids)
 		lim = nr_cpu_ids;
 	WRITE_ONCE(rtp->percpu_enqueue_shift, ilog2(nr_cpu_ids / lim));
+	WRITE_ONCE(rtp->percpu_dequeue_lim, lim);
 	smp_store_release(&rtp->percpu_enqueue_lim, lim);
 	for_each_possible_cpu(cpu) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
@@ -290,6 +294,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 		raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
 		if (rtp->percpu_enqueue_lim != nr_cpu_ids) {
 			WRITE_ONCE(rtp->percpu_enqueue_shift, ilog2(nr_cpu_ids));
+			WRITE_ONCE(rtp->percpu_enqueue_lim, nr_cpu_ids);
 			smp_store_release(&rtp->percpu_enqueue_lim, nr_cpu_ids);
 			pr_info("Switching %s to per-CPU callback queuing.\n", rtp->name);
 		}
@@ -343,7 +348,7 @@ static void rcu_barrier_tasks_generic(struct rcu_tasks *rtp)
 	init_completion(&rtp->barrier_q_completion);
 	atomic_set(&rtp->barrier_q_count, 2);
 	for_each_possible_cpu(cpu) {
-		if (cpu >= smp_load_acquire(&rtp->percpu_enqueue_lim))
+		if (cpu >= smp_load_acquire(&rtp->percpu_dequeue_lim))
 			break;
 		rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 		rtpcp->barrier_q_head.func = rcu_barrier_tasks_generic_cb;
@@ -367,7 +372,7 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 	unsigned long flags;
 	int needgpcb = 0;
 
-	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_enqueue_lim); cpu++) {
+	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
 		/* Advance and accelerate any new callbacks. */
@@ -399,11 +404,11 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
 
 	cpu = rtpcp->cpu;
 	cpunext = cpu * 2 + 1;
-	if (cpunext < smp_load_acquire(&rtp->percpu_enqueue_lim)) {
+	if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
 		rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
 		queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
 		cpunext++;
-		if (cpunext < smp_load_acquire(&rtp->percpu_enqueue_lim)) {
+		if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
 			rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
 			queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
 		}
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 18/18] rcu-tasks: Use fewer callbacks queues if callback flood ends
  2021-12-02  0:38 [PATCH rcu 0/18] RCU Tasks updates for v5.17 Paul E. McKenney
                   ` (16 preceding siblings ...)
  2021-12-02  0:38 ` [PATCH rcu 17/18] rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing Paul E. McKenney
@ 2021-12-02  0:38 ` Paul E. McKenney
  17 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-12-02  0:38 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Martin Lau,
	Neeraj Upadhyay

By default, when lock contention is encountered, the RCU Tasks flavors
of RCU switch to using per-CPU queueing.  However, if the callback
flood ends, per-CPU queueing continues to be used, which introduces
significant additional overhead, especially for callback invocation,
which fans out a series of workqueue handlers.

This commit therefore switches back to single-queue operation if at the
beginning of a grace period there are very few callbacks.  The definition
of "very few" is set by the rcupdate.rcu_task_collapse_lim module
parameter, which defaults to 10.  This switch happens in two phases,
with the first phase causing future callbacks to be enqueued on CPU 0's
queue, but with all queues continuing to be checked for grace periods
and callback invocation.  The second phase checks to see if an RCU grace
period has elapsed and if all remaining RCU-Tasks callbacks are queued
on CPU 0.  If so, only CPU 0 is checked for future grace periods and
callback invocation.

Of course, the return of contention anywhere during this process will
result in returning to per-CPU callback queueing.
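
A condensed sketch of the two phases follows; the values are invented, and the boolean stands in for the get_state_synchronize_rcu() / poll_state_synchronize_rcu() pairing that the patch uses to detect the elapsed grace period:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	int enqueue_lim = 8, dequeue_lim = 8;
	long ncbs = 4, ncbsnz = 0;	/* few callbacks, none above CPU 0 */
	int collapse_lim = 10;		/* rcu_task_collapse_lim default */
	bool gp_elapsed;

	/* Phase 1: few enough callbacks, so funnel new ones to CPU 0. */
	if (ncbs <= collapse_lim && enqueue_lim > 1) {
		enqueue_lim = 1;
		printf("phase 1: enqueue limit -> %d\n", enqueue_lim);
	}
	gp_elapsed = true;	/* one full RCU grace period later... */
	/* Phase 2: upper queues drained and a grace period has passed. */
	if (gp_elapsed && !ncbsnz && enqueue_lim < dequeue_lim) {
		dequeue_lim = 1;
		printf("phase 2: dequeue limit -> %d\n", dequeue_lim);
	}
	return 0;
}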

Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  8 ++++
 kernel/rcu/tasks.h                            | 46 ++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 089f4c5f8225b..d1b0542b85644 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4805,6 +4805,14 @@
 			period to instead use normal non-expedited
 			grace-period processing.
 
+	rcupdate.rcu_task_collapse_lim= [KNL]
+			Set the maximum number of callbacks present
+			at the beginning of a grace period that allows
+			the RCU Tasks flavors to collapse back to using
+			a single callback queue.  This switching only
+			occurs when rcupdate.rcu_task_enqueue_lim is
+			set to the default value of -1.
+
 	rcupdate.rcu_task_contend_lim= [KNL]
 			Set the minimum number of callback-queuing-time
 			lock-contention events per jiffy required to
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 1fbffea6ae469..02c673e64456e 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -68,6 +68,7 @@ struct rcu_tasks_percpu {
  * @percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
  * @percpu_enqueue_lim: Number of per-CPU callback queues in use for enqueuing.
  * @percpu_dequeue_lim: Number of per-CPU callback queues in use for dequeuing.
+ * @percpu_dequeue_gpseq: RCU grace-period number to propagate enqueue limit to dequeuers.
  * @barrier_q_mutex: Serialize barrier operations.
  * @barrier_q_count: Number of queues being waited on.
  * @barrier_q_completion: Barrier wait/wakeup mechanism.
@@ -98,6 +99,7 @@ struct rcu_tasks {
 	int percpu_enqueue_shift;
 	int percpu_enqueue_lim;
 	int percpu_dequeue_lim;
+	unsigned long percpu_dequeue_gpseq;
 	struct mutex barrier_q_mutex;
 	atomic_t barrier_q_count;
 	struct completion barrier_q_completion;
@@ -148,6 +150,8 @@ module_param(rcu_task_enqueue_lim, int, 0444);
 static bool rcu_task_cb_adjust;
 static int rcu_task_contend_lim __read_mostly = 100;
 module_param(rcu_task_contend_lim, int, 0444);
+static int rcu_task_collapse_lim __read_mostly = 10;
+module_param(rcu_task_collapse_lim, int, 0444);
 
 /* RCU tasks grace-period state for debugging. */
 #define RTGS_INIT		 0
@@ -269,6 +273,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	rhp->next = NULL;
 	rhp->func = func;
 	local_irq_save(flags);
+	rcu_read_lock();
 	rtpcp = per_cpu_ptr(rtp->rtpcpu,
 			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
 	if (!raw_spin_trylock_rcu_node(rtpcp)) { // irqs already disabled.
@@ -294,12 +299,13 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 		raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
 		if (rtp->percpu_enqueue_lim != nr_cpu_ids) {
 			WRITE_ONCE(rtp->percpu_enqueue_shift, ilog2(nr_cpu_ids));
-			WRITE_ONCE(rtp->percpu_enqueue_lim, nr_cpu_ids);
+			WRITE_ONCE(rtp->percpu_dequeue_lim, nr_cpu_ids);
 			smp_store_release(&rtp->percpu_enqueue_lim, nr_cpu_ids);
 			pr_info("Switching %s to per-CPU callback queuing.\n", rtp->name);
 		}
 		raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
 	}
+	rcu_read_unlock();
 	/* We can't create the thread unless interrupts are enabled. */
 	if (needwake && READ_ONCE(rtp->kthread_ptr)) {
 		irq_work_queue(&rtpcp->rtp_irq_work);
@@ -370,6 +376,9 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 {
 	int cpu;
 	unsigned long flags;
+	long n;
+	long ncbs = 0;
+	long ncbsnz = 0;
 	int needgpcb = 0;
 
 	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
@@ -380,6 +389,13 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 			continue;
 		raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 		smp_mb__after_spinlock(); // Order updates vs. GP.
+		// Should we shrink down to a single callback queue?
+		if (!rcu_segcblist_empty(&rtpcp->cblist)) {
+			n = rcu_segcblist_n_cbs(&rtpcp->cblist);
+			ncbs += n;
+			if (cpu > 0)
+				ncbsnz += n;
+		}
 		rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
 		(void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
 		if (rcu_segcblist_pend_cbs(&rtpcp->cblist))
@@ -388,6 +404,34 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 			needgpcb |= 0x1;
 		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 	}
+
+	// Shrink down to a single callback queue if appropriate.
+	// This is done in two stages: (1) If there are no more than
+	// rcu_task_collapse_lim callbacks on CPU 0 and none on any other
+	// CPU, limit enqueueing to CPU 0.  (2) After an RCU grace period,
+	// if there has not been an increase in callbacks, limit dequeuing
+	// to CPU 0.  Note the matching RCU read-side critical section in
+	// call_rcu_tasks_generic().
+	if (rcu_task_cb_adjust && ncbs <= rcu_task_collapse_lim) {
+		raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
+		if (rtp->percpu_enqueue_lim > 1) {
+			WRITE_ONCE(rtp->percpu_enqueue_shift, ilog2(nr_cpu_ids));
+			smp_store_release(&rtp->percpu_enqueue_lim, 1);
+			rtp->percpu_dequeue_gpseq = get_state_synchronize_rcu();
+			pr_info("Starting switch %s to CPU-0 callback queuing.\n", rtp->name);
+		}
+		raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
+	}
+	if (rcu_task_cb_adjust && !ncbsnz &&
+	    poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq)) {
+		raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
+		if (rtp->percpu_enqueue_lim < rtp->percpu_dequeue_lim) {
+			WRITE_ONCE(rtp->percpu_dequeue_lim, 1);
+			pr_info("Completing switch %s to CPU-0 callback queuing.\n", rtp->name);
+		}
+		raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
+	}
+
 	return needgpcb;
 }
 
-- 
2.31.1.189.g2e36527f23

