linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13
@ 2021-03-04  0:29 Paul E. McKenney
  2021-03-04  0:29 ` [PATCH tip/core/rcu 1/5] rcu: Expedite deboost in case of deferred quiescent state paulmck
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Paul E. McKenney @ 2021-03-04  0:29 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

These commits are related to RCU's realtime support, including better
testing preemption, improving testing of RCU priority boosting, and
ensuring that at least one rcutorture scenario tests use_softirq=0.

1.	Expedite deboost in case of deferred quiescent state.

2.	Make TREE03 use real-time tree.use_softirq setting.

3.	Fix testing of RCU priority boosting.

4.	Make rcu_read_unlock_special() expedite strict grace periods.

5.	Make jitter.sh handle large systems.

						Thanx, Paul

------------------------------------------------------------------------

 kernel/rcu/rcutorture.c                                    |   36 +++++++------
 kernel/rcu/tree_plugin.h                                   |   27 +++++----
 tools/testing/selftests/rcutorture/bin/jitter.sh           |    8 +-
 tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot |    1 
 4 files changed, 42 insertions(+), 30 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH tip/core/rcu 1/5] rcu: Expedite deboost in case of deferred quiescent state
  2021-03-04  0:29 [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13 Paul E. McKenney
@ 2021-03-04  0:29 ` paulmck
  2021-03-04  0:29 ` [PATCH tip/core/rcu 2/5] rcutorture: Make TREE03 use real-time tree.use_softirq setting paulmck
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: paulmck @ 2021-03-04  0:29 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney,
	Sebastian Andrzej Siewior, Scott Wood

From: "Paul E. McKenney" <paulmck@kernel.org>

Historically, a task that has been subjected to RCU priority boosting is
deboosted at rcu_read_unlock() time.  However, with the advent of deferred
quiescent states, if the outermost rcu_read_unlock() was invoked with
either bottom halves, interrupts, or preemption disabled, the deboosting
will be delayed for some time.  During this time, a low-priority process
might be incorrectly running at a high real-time priority level.

Fortunately, rcu_read_unlock_special() already provides mechanisms for
forcing a minimal deferral of quiescent states, at least for kernels
built with CONFIG_IRQ_WORK=y.  These mechanisms are currently used
when expedited grace periods are pending that might be blocked by the
current task.  This commit therefore causes those mechanisms to also be
used in cases where the current task has been or might soon be subjected
to RCU priority boosting.  Note that this applies to all kernels built
with CONFIG_RCU_BOOST=y, regardless of whether or not they are also
built with CONFIG_PREEMPT_RT=y.

This approach assumes that kernels build for use with aggressive real-time
applications are built with CONFIG_IRQ_WORK=y.  It is likely to be far
simpler to enable CONFIG_IRQ_WORK=y than to implement a fast-deboosting
scheme that works correctly in its absence.

While in the area, alphabetize the rcu_preempt_deferred_qs_handler()
function's local variables.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Scott Wood <swood@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 2d60377..e17cb23 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -598,9 +598,9 @@ static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
 static void rcu_read_unlock_special(struct task_struct *t)
 {
 	unsigned long flags;
+	bool irqs_were_disabled;
 	bool preempt_bh_were_disabled =
 			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
-	bool irqs_were_disabled;
 
 	/* NMI handlers cannot block and cannot safely manipulate state. */
 	if (in_nmi())
@@ -609,30 +609,32 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	local_irq_save(flags);
 	irqs_were_disabled = irqs_disabled_flags(flags);
 	if (preempt_bh_were_disabled || irqs_were_disabled) {
-		bool exp;
+		bool expboost; // Expedited GP in flight or possible boosting.
 		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 		struct rcu_node *rnp = rdp->mynode;
 
-		exp = (t->rcu_blocked_node &&
-		       READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
-		      (rdp->grpmask & READ_ONCE(rnp->expmask));
+		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
+			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
+			   (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
+			    t->rcu_blocked_node);
 		// Need to defer quiescent state until everything is enabled.
-		if (use_softirq && (in_irq() || (exp && !irqs_were_disabled))) {
+		if (use_softirq && (in_irq() || (expboost && !irqs_were_disabled))) {
 			// Using softirq, safe to awaken, and either the
-			// wakeup is free or there is an expedited GP.
+			// wakeup is free or there is either an expedited
+			// GP in flight or a potential need to deboost.
 			raise_softirq_irqoff(RCU_SOFTIRQ);
 		} else {
 			// Enabling BH or preempt does reschedule, so...
-			// Also if no expediting, slow is OK.
-			// Plus nohz_full CPUs eventually get tick enabled.
+			// Also if no expediting and no possible deboosting,
+			// slow is OK.  Plus nohz_full CPUs eventually get
+			// tick enabled.
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-			    !rdp->defer_qs_iw_pending && exp && cpu_online(rdp->cpu)) {
+			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
-				init_irq_work(&rdp->defer_qs_iw,
-					      rcu_preempt_deferred_qs_handler);
+				init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
 				rdp->defer_qs_iw_pending = true;
 				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
 			}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH tip/core/rcu 2/5] rcutorture: Make TREE03 use real-time tree.use_softirq setting
  2021-03-04  0:29 [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13 Paul E. McKenney
  2021-03-04  0:29 ` [PATCH tip/core/rcu 1/5] rcu: Expedite deboost in case of deferred quiescent state paulmck
@ 2021-03-04  0:29 ` paulmck
  2021-03-04  0:29 ` [PATCH tip/core/rcu 3/5] rcutorture: Fix testing of RCU priority boosting paulmck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: paulmck @ 2021-03-04  0:29 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

TREE03 tests RCU priority boosting, which is a real-time feature.
It would also be good if it tested something closer to what is
actually used by the real-time folks.  This commit therefore adds
tree.use_softirq=0 to the TREE03 kernel boot parameters in TREE03.boot.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
index 1c21894..64f864f1 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
@@ -4,3 +4,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
+tree.use_softirq=0
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH tip/core/rcu 3/5] rcutorture: Fix testing of RCU priority boosting
  2021-03-04  0:29 [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13 Paul E. McKenney
  2021-03-04  0:29 ` [PATCH tip/core/rcu 1/5] rcu: Expedite deboost in case of deferred quiescent state paulmck
  2021-03-04  0:29 ` [PATCH tip/core/rcu 2/5] rcutorture: Make TREE03 use real-time tree.use_softirq setting paulmck
@ 2021-03-04  0:29 ` paulmck
  2021-03-04  0:29 ` [PATCH tip/core/rcu 4/5] rcu: Make rcu_read_unlock_special() expedite strict grace periods paulmck
  2021-03-04  0:29 ` [PATCH tip/core/rcu 5/5] torture: Make jitter.sh handle large systems paulmck
  4 siblings, 0 replies; 6+ messages in thread
From: paulmck @ 2021-03-04  0:29 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney,
	Sebastian Andrzej Siewior, Scott Wood

From: "Paul E. McKenney" <paulmck@kernel.org>

Currently, rcutorture refuses to test RCU priority boosting in
CONFIG_HOTPLUG_CPU=y kernels, which are the only kind normally built on
x86 these days.  This commit therefore updates rcutorture's tests of RCU
priority boosting to make them safe for CPU hotplug.  However, these tests
will fail unless TIMER_SOFTIRQ runs at realtime priority, which does not
happen in current mainline.  This commit therefore also refuses to test
RCU priority boosting except in kernels built with CONFIG_PREEMPT_RT=y.

While in the area, this commt adds some debug output at boost-fail time
that helps diagnose the cause of the failure, for example, failing to
run TIMER_SOFTIRQ at realtime priority.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Scott Wood <swood@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcutorture.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 99657ff..af64bd8 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -245,11 +245,11 @@ static const char *rcu_torture_writer_state_getname(void)
 	return rcu_torture_writer_state_names[i];
 }
 
-#if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU)
-#define rcu_can_boost() 1
-#else /* #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) */
-#define rcu_can_boost() 0
-#endif /* #else #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) */
+#if defined(CONFIG_RCU_BOOST) && defined(CONFIG_PREEMPT_RT)
+# define rcu_can_boost() 1
+#else
+# define rcu_can_boost() 0
+#endif
 
 #ifdef CONFIG_RCU_TRACE
 static u64 notrace rcu_trace_clock_local(void)
@@ -923,9 +923,13 @@ static void rcu_torture_enable_rt_throttle(void)
 
 static bool rcu_torture_boost_failed(unsigned long start, unsigned long end)
 {
+	static int dbg_done;
+
 	if (end - start > test_boost_duration * HZ - HZ / 2) {
 		VERBOSE_TOROUT_STRING("rcu_torture_boost boosting failed");
 		n_rcu_torture_boost_failure++;
+		if (!xchg(&dbg_done, 1) && cur_ops->gp_kthread_dbg)
+			cur_ops->gp_kthread_dbg();
 
 		return true; /* failed */
 	}
@@ -948,8 +952,8 @@ static int rcu_torture_boost(void *arg)
 	init_rcu_head_on_stack(&rbi.rcu);
 	/* Each pass through the following loop does one boost-test cycle. */
 	do {
-		/* Track if the test failed already in this test interval? */
-		bool failed = false;
+		bool failed = false; // Test failed already in this test interval
+		bool firsttime = true;
 
 		/* Increment n_rcu_torture_boosts once per boost-test */
 		while (!kthread_should_stop()) {
@@ -975,18 +979,17 @@ static int rcu_torture_boost(void *arg)
 
 		/* Do one boost-test interval. */
 		endtime = oldstarttime + test_boost_duration * HZ;
-		call_rcu_time = jiffies;
 		while (time_before(jiffies, endtime)) {
 			/* If we don't have a callback in flight, post one. */
 			if (!smp_load_acquire(&rbi.inflight)) {
 				/* RCU core before ->inflight = 1. */
 				smp_store_release(&rbi.inflight, 1);
-				call_rcu(&rbi.rcu, rcu_torture_boost_cb);
+				cur_ops->call(&rbi.rcu, rcu_torture_boost_cb);
 				/* Check if the boost test failed */
-				failed = failed ||
-					 rcu_torture_boost_failed(call_rcu_time,
-								 jiffies);
+				if (!firsttime && !failed)
+					failed = rcu_torture_boost_failed(call_rcu_time, jiffies);
 				call_rcu_time = jiffies;
+				firsttime = false;
 			}
 			if (stutter_wait("rcu_torture_boost"))
 				sched_set_fifo_low(current);
@@ -999,7 +1002,7 @@ static int rcu_torture_boost(void *arg)
 		 * this case the boost check would never happen in the above
 		 * loop so do another one here.
 		 */
-		if (!failed && smp_load_acquire(&rbi.inflight))
+		if (!firsttime && !failed && smp_load_acquire(&rbi.inflight))
 			rcu_torture_boost_failed(call_rcu_time, jiffies);
 
 		/*
@@ -1025,6 +1028,9 @@ checkwait:	if (stutter_wait("rcu_torture_boost"))
 			sched_set_fifo_low(current);
 	} while (!torture_must_stop());
 
+	while (smp_load_acquire(&rbi.inflight))
+		schedule_timeout_uninterruptible(1); // rcu_barrier() deadlocks.
+
 	/* Clean up and exit. */
 	while (!kthread_should_stop() || smp_load_acquire(&rbi.inflight)) {
 		torture_shutdown_absorb("rcu_torture_boost");
@@ -1797,7 +1803,7 @@ rcu_torture_stats_print(void)
 		WARN_ON_ONCE(n_rcu_torture_barrier_error);  // rcu_barrier()
 		WARN_ON_ONCE(n_rcu_torture_boost_ktrerror); // no boost kthread
 		WARN_ON_ONCE(n_rcu_torture_boost_rterror); // can't set RT prio
-		WARN_ON_ONCE(n_rcu_torture_boost_failure); // RCU boost failed
+		WARN_ON_ONCE(n_rcu_torture_boost_failure); // boost failed (TIMER_SOFTIRQ RT prio?)
 		WARN_ON_ONCE(i > 1); // Too-short grace period
 	}
 	pr_cont("Reader Pipe: ");
@@ -2595,6 +2601,8 @@ static bool rcu_torture_can_boost(void)
 
 	if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
 		return false;
+	if (!cur_ops->call)
+		return false;
 
 	prio = rcu_get_gp_kthreads_prio();
 	if (!prio)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH tip/core/rcu 4/5] rcu: Make rcu_read_unlock_special() expedite strict grace periods
  2021-03-04  0:29 [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2021-03-04  0:29 ` [PATCH tip/core/rcu 3/5] rcutorture: Fix testing of RCU priority boosting paulmck
@ 2021-03-04  0:29 ` paulmck
  2021-03-04  0:29 ` [PATCH tip/core/rcu 5/5] torture: Make jitter.sh handle large systems paulmck
  4 siblings, 0 replies; 6+ messages in thread
From: paulmck @ 2021-03-04  0:29 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

In kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, every grace
period is an expedited grace period.  However, rcu_read_unlock_special()
does not treat them that way, instead allowing the deferred quiescent
state to be reported whenever.  This commit therefore adds a check of
this Kconfig option that causes rcu_read_unlock_special() to treat all
grace periods as expedited for CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e17cb23..a21c41c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -615,6 +615,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 
 		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
 			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
+			   IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
 			   (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
 			    t->rcu_blocked_node);
 		// Need to defer quiescent state until everything is enabled.
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH tip/core/rcu 5/5] torture: Make jitter.sh handle large systems
  2021-03-04  0:29 [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2021-03-04  0:29 ` [PATCH tip/core/rcu 4/5] rcu: Make rcu_read_unlock_special() expedite strict grace periods paulmck
@ 2021-03-04  0:29 ` paulmck
  4 siblings, 0 replies; 6+ messages in thread
From: paulmck @ 2021-03-04  0:29 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The current jitter.sh script expects cpumask bits to fit into whatever
the awk interpreter uses for an integer, which clearly does not hold for
even medium-sized systems these days.  This means that on a large system,
only the first 32 or 64 CPUs (depending) are subjected to jitter.sh
CPU-time perturbations.  This commit therefore computes a given CPU's
cpumask using text manipulation rather than arithmetic shifts.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/testing/selftests/rcutorture/bin/jitter.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/jitter.sh b/tools/testing/selftests/rcutorture/bin/jitter.sh
index 188b864..3a856ec 100755
--- a/tools/testing/selftests/rcutorture/bin/jitter.sh
+++ b/tools/testing/selftests/rcutorture/bin/jitter.sh
@@ -67,10 +67,10 @@ do
 		srand(n + me + systime());
 		ncpus = split(cpus, ca);
 		curcpu = ca[int(rand() * ncpus + 1)];
-		mask = lshift(1, curcpu);
-		if (mask + 0 <= 0)
-			mask = 1;
-		printf("%#x\n", mask);
+		z = "";
+		for (i = 1; 4 * i <= curcpu; i++)
+			z = z "0";
+		print "0x" 2 ^ (curcpu % 4) z;
 	}' < /dev/null`
 	n=$(($n+1))
 	if ! taskset -p $cpumask $$ > /dev/null 2>&1
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-04  1:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-04  0:29 [PATCH tip/core/rcu 0/5] Realtime-related RCU updates for v5.13 Paul E. McKenney
2021-03-04  0:29 ` [PATCH tip/core/rcu 1/5] rcu: Expedite deboost in case of deferred quiescent state paulmck
2021-03-04  0:29 ` [PATCH tip/core/rcu 2/5] rcutorture: Make TREE03 use real-time tree.use_softirq setting paulmck
2021-03-04  0:29 ` [PATCH tip/core/rcu 3/5] rcutorture: Fix testing of RCU priority boosting paulmck
2021-03-04  0:29 ` [PATCH tip/core/rcu 4/5] rcu: Make rcu_read_unlock_special() expedite strict grace periods paulmck
2021-03-04  0:29 ` [PATCH tip/core/rcu 5/5] torture: Make jitter.sh handle large systems paulmck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).