rcu.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] rcu, arm64: PREEMPT_RT fixlets
@ 2021-08-11 20:13 Valentin Schneider
  2021-08-11 20:13 ` [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT Valentin Schneider
                   ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-11 20:13 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Hi folks,

this is v3 of:

  https://lore.kernel.org/lkml/20210721115118.729943-1-valentin.schneider@arm.com/

respun from Frederic and Paul's helpful feedback. Tested atop v5.14-rc5-rt8 with
the v1 patches reverted. There, commit 

  d76e0926d835 ("rcu/nocb: Use the rcuog CPU's ->nocb_timer")

prevents the NOCB offload warning from firing if there are no NOCB CPUs (which
is sensible). Adding a single NOCB CPU brings the warning back, which patch 3
fixes.

Revisions
=========

v2 -> v3
++++++++

o Rebased and tested against v5.14-rc5-rt8
o Dropped affinity check from is_pcpu_safe() (Boqun)
o Renamed is_pcpu_safe() to migratable() (Boqun, Mike)

v1 -> v2
++++++++

o Rebased and tested against v5.14-rc4-rt6
o Picked rcutorture patch patch from
  https://lore.kernel.org/lkml/20210803225437.3612591-2-valentin.schneider@arm.com/
o Added a local_lock to protect NOCB offload state under PREEMPT_RT (Frederic,
  Paul)

Valentin Schneider (4):
  rcutorture: Don't disable softirqs with preemption disabled when
    PREEMPT_RT
  sched: Introduce migratable()
  rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  arm64: mm: Make arch_faults_on_old_pte() check for migratability

 arch/arm64/include/asm/pgtable.h |  2 +-
 include/linux/sched.h            | 10 ++++
 kernel/rcu/rcutorture.c          |  2 +
 kernel/rcu/tree.c                |  4 ++
 kernel/rcu/tree.h                |  4 ++
 kernel/rcu/tree_plugin.h         | 82 ++++++++++++++++++++++++++++----
 6 files changed, 94 insertions(+), 10 deletions(-)

--
2.25.1


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT
  2021-08-11 20:13 [PATCH v3 0/4] rcu, arm64: PREEMPT_RT fixlets Valentin Schneider
@ 2021-08-11 20:13 ` Valentin Schneider
  2021-08-12 16:47   ` Paul E. McKenney
  2021-08-17 12:13   ` Sebastian Andrzej Siewior
  2021-08-11 20:13 ` [PATCH v3 2/4] sched: Introduce migratable() Valentin Schneider
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-11 20:13 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Running RCU torture with CONFIG_PREEMPT_RT under v5.14-rc4-rt6 triggers:

[    8.755472] DEBUG_LOCKS_WARN_ON(this_cpu_read(softirq_ctrl.cnt))
[    8.755482] WARNING: CPU: 1 PID: 137 at kernel/softirq.c:172 __local_bh_disable_ip (kernel/softirq.c:172 (discriminator 31))
[    8.755500] Modules linked in:
[    8.755506] CPU: 1 PID: 137 Comm: rcu_torture_rea Not tainted 5.14.0-rc4-rt6 #171
[    8.755514] Hardware name: ARM Juno development board (r0) (DT)
[    8.755518] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[    8.755622] Call trace:
[    8.755624] __local_bh_disable_ip (kernel/softirq.c:172 (discriminator 31))
[    8.755633] rcutorture_one_extend (kernel/rcu/rcutorture.c:1453)
[    8.755640] rcu_torture_one_read (kernel/rcu/rcutorture.c:1601 kernel/rcu/rcutorture.c:1645)
[    8.755645] rcu_torture_reader (kernel/rcu/rcutorture.c:1737)
[    8.755650] kthread (kernel/kthread.c:327)
[    8.755656] ret_from_fork (arch/arm64/kernel/entry.S:783)
[    8.755663] irq event stamp: 11959
[    8.755666] hardirqs last enabled at (11959): __rcu_read_unlock (kernel/rcu/tree_plugin.h:695 kernel/rcu/tree_plugin.h:451)
[    8.755675] hardirqs last disabled at (11958): __rcu_read_unlock (kernel/rcu/tree_plugin.h:661 kernel/rcu/tree_plugin.h:451)
[    8.755683] softirqs last enabled at (11950): __local_bh_enable_ip (./arch/arm64/include/asm/irqflags.h:85 kernel/softirq.c:261)
[    8.755692] softirqs last disabled at (11942): rcutorture_one_extend (./include/linux/bottom_half.h:19 kernel/rcu/rcutorture.c:1441)

Per this warning, softirqs cannot be disabled in a non-preemptible region
under CONFIG_PREEMPT_RT. Adjust RCU torture accordingly.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/rcu/rcutorture.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index eecd1caef71d..4f3db1d3170d 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1548,6 +1548,8 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
 	 * them on non-RT.
 	 */
 	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		/* Can't disable bh in atomic context under PREEMPT_RT */
+		mask &= ~(RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH);
 		/*
 		 * Can't release the outermost rcu lock in an irq disabled
 		 * section without preemption also being disabled, if irqs
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-11 20:13 [PATCH v3 0/4] rcu, arm64: PREEMPT_RT fixlets Valentin Schneider
  2021-08-11 20:13 ` [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT Valentin Schneider
@ 2021-08-11 20:13 ` Valentin Schneider
  2021-08-17 14:43   ` Sebastian Andrzej Siewior
  2021-08-17 17:09   ` Sebastian Andrzej Siewior
  2021-08-11 20:13 ` [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Valentin Schneider
  2021-08-11 20:13 ` [PATCH v3 4/4] arm64: mm: Make arch_faults_on_old_pte() check for migratability Valentin Schneider
  3 siblings, 2 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-11 20:13 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Some areas use preempt_disable() + preempt_enable() to safely access
per-CPU data. The PREEMPT_RT folks have shown this can also be done by
keeping preemption enabled and instead disabling migration (and acquiring a
sleepable lock, if relevant).

Introduce a helper which checks whether the current task can be migrated
elsewhere, IOW if it is pinned to its local CPU in the current
context. This can help determining if per-CPU properties can be safely
accessed.

Note that CPU affinity is not checked here, as a preemptible task can have
its affinity changed at any given time (including if it has
PF_NO_SETAFFINITY, when hotplug gets involved).

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index debc960f41e3..8ba7b4a7ee69 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)
 #endif
 }
 
+/* Is the current task guaranteed to stay on its current CPU? */
+static inline bool migratable(void)
+{
+#ifdef CONFIG_SMP
+	return preemptible() && !current->migration_disabled;
+#else
+	return true;
+#endif
+}
+
 /* Per-process atomic flags. */
 #define PFA_NO_NEW_PRIVS		0	/* May not gain new privileges. */
 #define PFA_SPREAD_PAGE			1	/* Spread page cache over cpuset */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  2021-08-11 20:13 [PATCH v3 0/4] rcu, arm64: PREEMPT_RT fixlets Valentin Schneider
  2021-08-11 20:13 ` [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT Valentin Schneider
  2021-08-11 20:13 ` [PATCH v3 2/4] sched: Introduce migratable() Valentin Schneider
@ 2021-08-11 20:13 ` Valentin Schneider
  2021-08-13  0:20   ` Paul E. McKenney
                     ` (2 more replies)
  2021-08-11 20:13 ` [PATCH v3 4/4] arm64: mm: Make arch_faults_on_old_pte() check for migratability Valentin Schneider
  3 siblings, 3 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-11 20:13 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Warning
=======

Running v5.13-rt1 on my arm64 Juno board triggers:

[    0.156302] =============================
[    0.160416] WARNING: suspicious RCU usage
[    0.164529] 5.13.0-rt1 #20 Not tainted
[    0.168300] -----------------------------
[    0.172409] kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
[    0.179920]
[    0.179920] other info that might help us debug this:
[    0.179920]
[    0.188037]
[    0.188037] rcu_scheduler_active = 1, debug_locks = 1
[    0.194677] 3 locks held by rcuc/0/11:
[    0.198448] #0: ffff00097ef10cf8 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:662 kernel/softirq.c:171)
[    0.208709] #1: ffff80001205e5f0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/spinlock_rt.c:43 (discriminator 4))
[    0.217134] #2: ffff80001205e5f0 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (kernel/softirq.c:169)
[    0.226428]
[    0.226428] stack backtrace:
[    0.230889] CPU: 0 PID: 11 Comm: rcuc/0 Not tainted 5.13.0-rt1 #20
[    0.237100] Hardware name: ARM Juno development board (r0) (DT)
[    0.243041] Call trace:
[    0.245497] dump_backtrace (arch/arm64/kernel/stacktrace.c:163)
[    0.249185] show_stack (arch/arm64/kernel/stacktrace.c:219)
[    0.252522] dump_stack (lib/dump_stack.c:122)
[    0.255947] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6439)
[    0.260328] rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
[    0.264537] rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
[    0.267786] rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
[    0.271644] smpboot_thread_fn (kernel/smpboot.c:165 (discriminator 3))
[    0.275767] kthread (kernel/kthread.c:321)
[    0.279013] ret_from_fork (arch/arm64/kernel/entry.S:1005)

In this case, this is the RCU core kthread accessing the local CPU's
rdp. Before that, rcu_cpu_kthread() invokes local_bh_disable().

Under !CONFIG_PREEMPT_RT (and rcutree.use_softirq=0), this ends up
incrementing the preempt_count, which satisfies the "local non-preemptible
read" of rcu_rdp_is_offloaded().

Under CONFIG_PREEMPT_RT however, this becomes

  local_lock(&softirq_ctrl.lock)

which, under the same config, is migrate_disable() + rt_spin_lock(). As
pointed out by Frederic, this is not sufficient to safely access an rdp's
offload state, as the RCU core kthread can be preempted by a kworker
executing rcu_nocb_rdp_offload() [1].

Introduce a local_lock to serialize an rdp's offload state while the rdp's
associated core kthread is executing rcu_core().

rcu_core() preemptability considerations
========================================

As pointed out by Paul [2], keeping rcu_check_quiescent_state() preemptible
(which is the case under CONFIG_PREEMPT_RT) requires some consideration.

note_gp_changes() itself runs with irqs off, and enters
__note_gp_changes() with rnp->lock held (raw_spinlock), thus is safe vs
preemption.

rdp->core_needs_qs *could* change after being read by the RCU core
kthread if it then gets preempted. Consider, with
CONFIG_RCU_STRICT_GRACE_PERIOD:

  rcuc/x                                   task_foo

    rcu_check_quiescent_state()
    `\
      rdp->core_needs_qs == true
			     <PREEMPT>
					   rcu_read_unlock()
					   `\
					     rcu_preempt_deferred_qs_irqrestore()
					     `\
					       rcu_report_qs_rdp()
					       `\
						 rdp->core_needs_qs := false;

This would let rcuc/x's rcu_check_quiescent_state() proceed further down to
rcu_report_qs_rdp(), but if task_foo's earlier rcu_report_qs_rdp()
invocation would have cleared the rdp grpmask from the rnp mask, so
rcuc/x's invocation would simply bail.

Since rcu_report_qs_rdp() can be safely invoked, even if rdp->core_needs_qs
changed, it appears safe to keep rcu_check_quiescent_state() preemptible.

[1]: http://lore.kernel.org/r/20210727230814.GC283787@lothringen
[2]: http://lore.kernel.org/r/20210729010445.GO4397@paulmck-ThinkPad-P17-Gen-1
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/rcu/tree.c        |  4 ++
 kernel/rcu/tree.h        |  4 ++
 kernel/rcu/tree_plugin.h | 82 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 51f24ecd94b2..caadfec994f3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -87,6 +87,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
 	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
 #ifdef CONFIG_RCU_NOCB_CPU
 	.cblist.flags = SEGCBLIST_SOFTIRQ_ONLY,
+	.nocb_local_lock =  INIT_LOCAL_LOCK(nocb_local_lock),
 #endif
 };
 static struct rcu_state rcu_state = {
@@ -2853,10 +2854,12 @@ static void rcu_cpu_kthread(unsigned int cpu)
 {
 	unsigned int *statusp = this_cpu_ptr(&rcu_data.rcu_cpu_kthread_status);
 	char work, *workp = this_cpu_ptr(&rcu_data.rcu_cpu_has_work);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	int spincnt;
 
 	trace_rcu_utilization(TPS("Start CPU kthread@rcu_run"));
 	for (spincnt = 0; spincnt < 10; spincnt++) {
+		rcu_nocb_local_lock(rdp);
 		local_bh_disable();
 		*statusp = RCU_KTHREAD_RUNNING;
 		local_irq_disable();
@@ -2866,6 +2869,7 @@ static void rcu_cpu_kthread(unsigned int cpu)
 		if (work)
 			rcu_core();
 		local_bh_enable();
+		rcu_nocb_local_unlock(rdp);
 		if (*workp == 0) {
 			trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
 			*statusp = RCU_KTHREAD_WAITING;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 305cf6aeb408..aa6831255fec 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -210,6 +210,8 @@ struct rcu_data {
 	struct timer_list nocb_timer;	/* Enforce finite deferral. */
 	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
 
+	local_lock_t nocb_local_lock;
+
 	/* The following fields are used by call_rcu, hence own cacheline. */
 	raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
 	struct rcu_cblist nocb_bypass;	/* Lock-contention-bypass CB list. */
@@ -445,6 +447,8 @@ static void rcu_nocb_unlock(struct rcu_data *rdp);
 static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
 				       unsigned long flags);
 static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
+static void rcu_nocb_local_lock(struct rcu_data *rdp);
+static void rcu_nocb_local_unlock(struct rcu_data *rdp);
 #ifdef CONFIG_RCU_NOCB_CPU
 static void __init rcu_organize_nocb_kthreads(void);
 #define rcu_nocb_lock_irqsave(rdp, flags)				\
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0ff5e4fb933e..11c4ff00afde 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -21,6 +21,17 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
 	return lockdep_is_held(&rdp->nocb_lock);
 }
 
+static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
+{
+	return lockdep_is_held(
+#ifdef CONFIG_PREEMPT_RT
+		&rdp->nocb_local_lock.lock
+#else
+		&rdp->nocb_local_lock
+#endif
+	);
+}
+
 static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
 {
 	/* Race on early boot between thread creation and assignment */
@@ -38,7 +49,10 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
 {
 	return 0;
 }
-
+static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
+{
+	return 0;
+}
 static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
 {
 	return false;
@@ -46,23 +60,44 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
 
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
+/*
+ * Is a local read of the rdp's offloaded state safe and stable?
+ * See rcu_nocb_local_lock() & family.
+ */
+static inline bool rcu_local_offload_access_safe(struct rcu_data *rdp)
+{
+	if (!preemptible())
+		return true;
+
+	if (!migratable()) {
+		if (!IS_ENABLED(CONFIG_RCU_NOCB))
+			return true;
+
+		return rcu_lockdep_is_held_nocb_local(rdp);
+	}
+
+	return false;
+}
+
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
 {
 	/*
-	 * In order to read the offloaded state of an rdp is a safe
-	 * and stable way and prevent from its value to be changed
-	 * under us, we must either hold the barrier mutex, the cpu
-	 * hotplug lock (read or write) or the nocb lock. Local
-	 * non-preemptible reads are also safe. NOCB kthreads and
-	 * timers have their own means of synchronization against the
-	 * offloaded state updaters.
+	 * In order to read the offloaded state of an rdp is a safe and stable
+	 * way and prevent from its value to be changed under us, we must either...
 	 */
 	RCU_LOCKDEP_WARN(
+		// ...hold the barrier mutex...
 		!(lockdep_is_held(&rcu_state.barrier_mutex) ||
+		  // ... the cpu hotplug lock (read or write)...
 		  (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_held()) ||
+		  // ... or the NOCB lock.
 		  rcu_lockdep_is_held_nocb(rdp) ||
+		  // Local reads still require the local state to remain stable
+		  // (preemption disabled / local lock held)
 		  (rdp == this_cpu_ptr(&rcu_data) &&
-		   !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) ||
+		   rcu_local_offload_access_safe(rdp)) ||
+		  // NOCB kthreads and timers have their own means of synchronization
+		  // against the offloaded state updaters.
 		  rcu_current_is_nocb_kthread(rdp)),
 		"Unsafe read of RCU_NOCB offloaded state"
 	);
@@ -1629,6 +1664,22 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
 	}
 }
 
+/*
+ * The invocation of rcu_core() within the RCU core kthreads remains preemptible
+ * under PREEMPT_RT, thus the offload state of a CPU could change while
+ * said kthreads are preempted. Prevent this from happening by protecting the
+ * offload state with a local_lock().
+ */
+static void rcu_nocb_local_lock(struct rcu_data *rdp)
+{
+	local_lock(&rcu_data.nocb_local_lock);
+}
+
+static void rcu_nocb_local_unlock(struct rcu_data *rdp)
+{
+	local_unlock(&rcu_data.nocb_local_lock);
+}
+
 /* Lockdep check that ->cblist may be safely accessed. */
 static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
 {
@@ -2396,6 +2447,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
 	if (rdp->nocb_cb_sleep)
 		rdp->nocb_cb_sleep = false;
 	rcu_nocb_unlock_irqrestore(rdp, flags);
+	rcu_nocb_local_unlock(rdp);
 
 	/*
 	 * Ignore former value of nocb_cb_sleep and force wake up as it could
@@ -2427,6 +2479,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 
 	pr_info("De-offloading %d\n", rdp->cpu);
 
+	rcu_nocb_local_lock(rdp);
 	rcu_nocb_lock_irqsave(rdp, flags);
 	/*
 	 * Flush once and for all now. This suffices because we are
@@ -2509,6 +2562,7 @@ static long rcu_nocb_rdp_offload(void *arg)
 	 * Can't use rcu_nocb_lock_irqsave() while we are in
 	 * SEGCBLIST_SOFTIRQ_ONLY mode.
 	 */
+	rcu_nocb_local_lock(rdp);
 	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 
 	/*
@@ -2868,6 +2922,16 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
 	local_irq_restore(flags);
 }
 
+/* No ->nocb_local_lock to acquire. */
+static void rcu_nocb_local_lock(struct rcu_data *rdp)
+{
+}
+
+/* No ->nocb_local_lock to release. */
+static void rcu_nocb_local_unlock(struct rcu_data *rdp)
+{
+}
+
 /* Lockdep check that ->cblist may be safely accessed. */
 static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 4/4] arm64: mm: Make arch_faults_on_old_pte() check for migratability
  2021-08-11 20:13 [PATCH v3 0/4] rcu, arm64: PREEMPT_RT fixlets Valentin Schneider
                   ` (2 preceding siblings ...)
  2021-08-11 20:13 ` [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Valentin Schneider
@ 2021-08-11 20:13 ` Valentin Schneider
  3 siblings, 0 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-11 20:13 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

arch_faults_on_old_pte() relies on the calling context being
non-preemptible. CONFIG_PREEMPT_RT turns the PTE lock into a sleepable
spinlock, which doesn't disable preemption once acquired, triggering the
warning in arch_faults_on_old_pte().

It does however disable migration, ensuring the task remains on the same
CPU during the entirety of the critical section, making the read of
cpu_has_hw_af() safe and stable.

Make arch_faults_on_old_pte() check migratable() instead of preemptible().

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f09bf5c02891..8c0de3234b20 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -995,7 +995,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
  */
 static inline bool arch_faults_on_old_pte(void)
 {
-	WARN_ON(preemptible());
+	WARN_ON(migratable());
 
 	return !cpu_has_hw_af();
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT
  2021-08-11 20:13 ` [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT Valentin Schneider
@ 2021-08-12 16:47   ` Paul E. McKenney
  2021-08-17 12:13   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 51+ messages in thread
From: Paul E. McKenney @ 2021-08-12 16:47 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Wed, Aug 11, 2021 at 09:13:51PM +0100, Valentin Schneider wrote:
> Running RCU torture with CONFIG_PREEMPT_RT under v5.14-rc4-rt6 triggers:
> 
> [    8.755472] DEBUG_LOCKS_WARN_ON(this_cpu_read(softirq_ctrl.cnt))
> [    8.755482] WARNING: CPU: 1 PID: 137 at kernel/softirq.c:172 __local_bh_disable_ip (kernel/softirq.c:172 (discriminator 31))
> [    8.755500] Modules linked in:
> [    8.755506] CPU: 1 PID: 137 Comm: rcu_torture_rea Not tainted 5.14.0-rc4-rt6 #171
> [    8.755514] Hardware name: ARM Juno development board (r0) (DT)
> [    8.755518] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [    8.755622] Call trace:
> [    8.755624] __local_bh_disable_ip (kernel/softirq.c:172 (discriminator 31))
> [    8.755633] rcutorture_one_extend (kernel/rcu/rcutorture.c:1453)
> [    8.755640] rcu_torture_one_read (kernel/rcu/rcutorture.c:1601 kernel/rcu/rcutorture.c:1645)
> [    8.755645] rcu_torture_reader (kernel/rcu/rcutorture.c:1737)
> [    8.755650] kthread (kernel/kthread.c:327)
> [    8.755656] ret_from_fork (arch/arm64/kernel/entry.S:783)
> [    8.755663] irq event stamp: 11959
> [    8.755666] hardirqs last enabled at (11959): __rcu_read_unlock (kernel/rcu/tree_plugin.h:695 kernel/rcu/tree_plugin.h:451)
> [    8.755675] hardirqs last disabled at (11958): __rcu_read_unlock (kernel/rcu/tree_plugin.h:661 kernel/rcu/tree_plugin.h:451)
> [    8.755683] softirqs last enabled at (11950): __local_bh_enable_ip (./arch/arm64/include/asm/irqflags.h:85 kernel/softirq.c:261)
> [    8.755692] softirqs last disabled at (11942): rcutorture_one_extend (./include/linux/bottom_half.h:19 kernel/rcu/rcutorture.c:1441)
> 
> Per this warning, softirqs cannot be disabled in a non-preemptible region
> under CONFIG_PREEMPT_RT. Adjust RCU torture accordingly.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>

Looks sensible for -rt, but of course does not apply on mainline or -rcu.

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/rcu/rcutorture.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index eecd1caef71d..4f3db1d3170d 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -1548,6 +1548,8 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
>  	 * them on non-RT.
>  	 */
>  	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		/* Can't disable bh in atomic context under PREEMPT_RT */
> +		mask &= ~(RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH);
>  		/*
>  		 * Can't release the outermost rcu lock in an irq disabled
>  		 * section without preemption also being disabled, if irqs
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  2021-08-11 20:13 ` [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Valentin Schneider
@ 2021-08-13  0:20   ` Paul E. McKenney
  2021-08-13 18:48     ` Valentin Schneider
  2021-08-17 15:36   ` Sebastian Andrzej Siewior
  2021-09-21 14:05   ` Thomas Gleixner
  2 siblings, 1 reply; 51+ messages in thread
From: Paul E. McKenney @ 2021-08-13  0:20 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Wed, Aug 11, 2021 at 09:13:53PM +0100, Valentin Schneider wrote:
> Warning
> =======
> 
> Running v5.13-rt1 on my arm64 Juno board triggers:
> 
> [    0.156302] =============================
> [    0.160416] WARNING: suspicious RCU usage
> [    0.164529] 5.13.0-rt1 #20 Not tainted
> [    0.168300] -----------------------------
> [    0.172409] kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> [    0.179920]
> [    0.179920] other info that might help us debug this:
> [    0.179920]
> [    0.188037]
> [    0.188037] rcu_scheduler_active = 1, debug_locks = 1
> [    0.194677] 3 locks held by rcuc/0/11:
> [    0.198448] #0: ffff00097ef10cf8 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:662 kernel/softirq.c:171)
> [    0.208709] #1: ffff80001205e5f0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/spinlock_rt.c:43 (discriminator 4))
> [    0.217134] #2: ffff80001205e5f0 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (kernel/softirq.c:169)
> [    0.226428]
> [    0.226428] stack backtrace:
> [    0.230889] CPU: 0 PID: 11 Comm: rcuc/0 Not tainted 5.13.0-rt1 #20
> [    0.237100] Hardware name: ARM Juno development board (r0) (DT)
> [    0.243041] Call trace:
> [    0.245497] dump_backtrace (arch/arm64/kernel/stacktrace.c:163)
> [    0.249185] show_stack (arch/arm64/kernel/stacktrace.c:219)
> [    0.252522] dump_stack (lib/dump_stack.c:122)
> [    0.255947] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6439)
> [    0.260328] rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
> [    0.264537] rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
> [    0.267786] rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
> [    0.271644] smpboot_thread_fn (kernel/smpboot.c:165 (discriminator 3))
> [    0.275767] kthread (kernel/kthread.c:321)
> [    0.279013] ret_from_fork (arch/arm64/kernel/entry.S:1005)
> 
> In this case, this is the RCU core kthread accessing the local CPU's
> rdp. Before that, rcu_cpu_kthread() invokes local_bh_disable().
> 
> Under !CONFIG_PREEMPT_RT (and rcutree.use_softirq=0), this ends up
> incrementing the preempt_count, which satisfies the "local non-preemptible
> read" of rcu_rdp_is_offloaded().
> 
> Under CONFIG_PREEMPT_RT however, this becomes
> 
>   local_lock(&softirq_ctrl.lock)
> 
> which, under the same config, is migrate_disable() + rt_spin_lock(). As
> pointed out by Frederic, this is not sufficient to safely access an rdp's
> offload state, as the RCU core kthread can be preempted by a kworker
> executing rcu_nocb_rdp_offload() [1].
> 
> Introduce a local_lock to serialize an rdp's offload state while the rdp's
> associated core kthread is executing rcu_core().
> 
> rcu_core() preemptability considerations
> ========================================
> 
> As pointed out by Paul [2], keeping rcu_check_quiescent_state() preemptible
> (which is the case under CONFIG_PREEMPT_RT) requires some consideration.
> 
> note_gp_changes() itself runs with irqs off, and enters
> __note_gp_changes() with rnp->lock held (raw_spinlock), thus is safe vs
> preemption.
> 
> rdp->core_needs_qs *could* change after being read by the RCU core
> kthread if it then gets preempted. Consider, with
> CONFIG_RCU_STRICT_GRACE_PERIOD:
> 
>   rcuc/x                                   task_foo
> 
>     rcu_check_quiescent_state()
>     `\
>       rdp->core_needs_qs == true
> 			     <PREEMPT>
> 					   rcu_read_unlock()
> 					   `\
> 					     rcu_preempt_deferred_qs_irqrestore()
> 					     `\
> 					       rcu_report_qs_rdp()
> 					       `\
> 						 rdp->core_needs_qs := false;
> 
> This would let rcuc/x's rcu_check_quiescent_state() proceed further down to
> rcu_report_qs_rdp(), but if task_foo's earlier rcu_report_qs_rdp()
> invocation would have cleared the rdp grpmask from the rnp mask, so
> rcuc/x's invocation would simply bail.
> 
> Since rcu_report_qs_rdp() can be safely invoked, even if rdp->core_needs_qs
> changed, it appears safe to keep rcu_check_quiescent_state() preemptible.

Another concern...

During the preemption of rcu_check_quiescent_state() someone might report
a quiescent state on behalf of CPU x (perhaps due to its having recently
been idle) and then the RCU grace-period kthread might start running on
CPU x, where it might initialize a new grace period in rcu_gp_init().
It can then invoke __note_gp_changes(), also on CPU x.

If preempted as shown above just after checking >core_needs_qs, the
->cpu_no_qs.b.norm field will be set by the grace-period kthread, which
will cause the rcu_check_quiescent_state() function's subsequent check
of ->cpu_no_qs.b.norm to take an early exit.  So OK here.

On the other hand, if preempted just after the rcu_check_quiescent_state()
function's check of ->cpu_no_qs.b.norm, the later invocation of
rcu_report_qs_rdp() should take an early exit due to ->gp_seq mismatch.
So OK here.

However, this should be added to the commit log.  Might be a big commit
log, but mass storage is cheap these days.  ;-)

This needs a review of each and every manipulation of ->core_needs_qs
and ->cpu_no_qs.b.norm.  For example, the preemptions will cause the
scheduler to invoke RCU's context-switch hooks, which also mess with
->cpu_no_qs.b.norm.  I can get to that some time next week (or tomorrow,
if things go better than expected), but it would be good for you (and
others) to check as well.

Frederic should look this over, but I am taking a quick pass in the
meantime.  Please see below.

							Thanx, Paul

> [1]: http://lore.kernel.org/r/20210727230814.GC283787@lothringen
> [2]: http://lore.kernel.org/r/20210729010445.GO4397@paulmck-ThinkPad-P17-Gen-1
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/rcu/tree.c        |  4 ++
>  kernel/rcu/tree.h        |  4 ++
>  kernel/rcu/tree_plugin.h | 82 +++++++++++++++++++++++++++++++++++-----
>  3 files changed, 81 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 51f24ecd94b2..caadfec994f3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -87,6 +87,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  #ifdef CONFIG_RCU_NOCB_CPU
>  	.cblist.flags = SEGCBLIST_SOFTIRQ_ONLY,
> +	.nocb_local_lock =  INIT_LOCAL_LOCK(nocb_local_lock),
>  #endif
>  };
>  static struct rcu_state rcu_state = {
> @@ -2853,10 +2854,12 @@ static void rcu_cpu_kthread(unsigned int cpu)
>  {
>  	unsigned int *statusp = this_cpu_ptr(&rcu_data.rcu_cpu_kthread_status);
>  	char work, *workp = this_cpu_ptr(&rcu_data.rcu_cpu_has_work);
> +	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  	int spincnt;
>  
>  	trace_rcu_utilization(TPS("Start CPU kthread@rcu_run"));
>  	for (spincnt = 0; spincnt < 10; spincnt++) {
> +		rcu_nocb_local_lock(rdp);
>  		local_bh_disable();
>  		*statusp = RCU_KTHREAD_RUNNING;
>  		local_irq_disable();
> @@ -2866,6 +2869,7 @@ static void rcu_cpu_kthread(unsigned int cpu)
>  		if (work)
>  			rcu_core();
>  		local_bh_enable();
> +		rcu_nocb_local_unlock(rdp);
>  		if (*workp == 0) {
>  			trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
>  			*statusp = RCU_KTHREAD_WAITING;
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 305cf6aeb408..aa6831255fec 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -210,6 +210,8 @@ struct rcu_data {
>  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
>  	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
>  
> +	local_lock_t nocb_local_lock;

Should this go near the beginning of the structure, given that code
paths taking this lock tend to access ->cpu_no_qs, ->core_needs_qs,
and so on?

Given that it is used to protect core processing (not just offloaded
callbacks), might ->core_local_lock be a better name?

Please keep in mind that you can build kernels that offload callbacks
but that still use softirq for RCU core processing.  And vice versa,
that is, kernels that use rcuc kthreads but do not offload callbacks.

> +
>  	/* The following fields are used by call_rcu, hence own cacheline. */
>  	raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
>  	struct rcu_cblist nocb_bypass;	/* Lock-contention-bypass CB list. */
> @@ -445,6 +447,8 @@ static void rcu_nocb_unlock(struct rcu_data *rdp);
>  static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>  				       unsigned long flags);
>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
> +static void rcu_nocb_local_lock(struct rcu_data *rdp);
> +static void rcu_nocb_local_unlock(struct rcu_data *rdp);
>  #ifdef CONFIG_RCU_NOCB_CPU
>  static void __init rcu_organize_nocb_kthreads(void);
>  #define rcu_nocb_lock_irqsave(rdp, flags)				\
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 0ff5e4fb933e..11c4ff00afde 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -21,6 +21,17 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>  	return lockdep_is_held(&rdp->nocb_lock);
>  }
>  
> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
> +{
> +	return lockdep_is_held(
> +#ifdef CONFIG_PREEMPT_RT
> +		&rdp->nocb_local_lock.lock
> +#else
> +		&rdp->nocb_local_lock
> +#endif

It would be good if this was abstracted.  Or is this the only place
in the kernel that needs this #ifdef?  Maybe a lockdep_is_held_rt()
or lockdep_is_held_local(), as the case may be?

> +	);
> +}
> +
>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>  {
>  	/* Race on early boot between thread creation and assignment */
> @@ -38,7 +49,10 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>  {
>  	return 0;
>  }
> -
> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
> +{
> +	return 0;

This is backwards of normal lockdep practice, which defaults to locks
always held.  And that will be what happens once lockdep has detected
its first deadlock, correct?  At which point, this function and its
earlier instance will be in conflict.

Or is there some subtle reason why this conflict would be OK?

> +}
>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>  {
>  	return false;
> @@ -46,23 +60,44 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>  
>  #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
>  
> +/*
> + * Is a local read of the rdp's offloaded state safe and stable?
> + * See rcu_nocb_local_lock() & family.
> + */
> +static inline bool rcu_local_offload_access_safe(struct rcu_data *rdp)
> +{
> +	if (!preemptible())
> +		return true;
> +
> +	if (!migratable()) {
> +		if (!IS_ENABLED(CONFIG_RCU_NOCB))

Do we also need to consult the use_softirq module parameter that controls
whether or not there are rcuc kthreads?

Actually, if !IS_ENABLED(CONFIG_RCU_NOCB) then rcu_rdp_is_offloaded()
can simply return false without bothering with the RCU_LOCKDEP_WARN().
Might be worth getting that out of the way of the RCU_LOCKDEP_WARN()
condition.  ;-)

> +			return true;
> +
> +		return rcu_lockdep_is_held_nocb_local(rdp);
> +	}
> +
> +	return false;
> +}
> +
>  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
>  {
>  	/*
> -	 * In order to read the offloaded state of an rdp is a safe
> -	 * and stable way and prevent from its value to be changed
> -	 * under us, we must either hold the barrier mutex, the cpu
> -	 * hotplug lock (read or write) or the nocb lock. Local
> -	 * non-preemptible reads are also safe. NOCB kthreads and
> -	 * timers have their own means of synchronization against the
> -	 * offloaded state updaters.
> +	 * In order to read the offloaded state of an rdp is a safe and stable
> +	 * way and prevent from its value to be changed under us, we must either...
>  	 */
>  	RCU_LOCKDEP_WARN(
> +		// ...hold the barrier mutex...
>  		!(lockdep_is_held(&rcu_state.barrier_mutex) ||
> +		  // ... the cpu hotplug lock (read or write)...
>  		  (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_held()) ||
> +		  // ... or the NOCB lock.
>  		  rcu_lockdep_is_held_nocb(rdp) ||
> +		  // Local reads still require the local state to remain stable
> +		  // (preemption disabled / local lock held)
>  		  (rdp == this_cpu_ptr(&rcu_data) &&
> -		   !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) ||
> +		   rcu_local_offload_access_safe(rdp)) ||
> +		  // NOCB kthreads and timers have their own means of synchronization
> +		  // against the offloaded state updaters.
>  		  rcu_current_is_nocb_kthread(rdp)),
>  		"Unsafe read of RCU_NOCB offloaded state"
>  	);
> @@ -1629,6 +1664,22 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>  	}
>  }
>  
> +/*
> + * The invocation of rcu_core() within the RCU core kthreads remains preemptible
> + * under PREEMPT_RT, thus the offload state of a CPU could change while
> + * said kthreads are preempted. Prevent this from happening by protecting the
> + * offload state with a local_lock().
> + */
> +static void rcu_nocb_local_lock(struct rcu_data *rdp)
> +{
> +	local_lock(&rcu_data.nocb_local_lock);
> +}
> +
> +static void rcu_nocb_local_unlock(struct rcu_data *rdp)
> +{
> +	local_unlock(&rcu_data.nocb_local_lock);
> +}
> +
>  /* Lockdep check that ->cblist may be safely accessed. */
>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
>  {
> @@ -2396,6 +2447,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
>  	if (rdp->nocb_cb_sleep)
>  		rdp->nocb_cb_sleep = false;
>  	rcu_nocb_unlock_irqrestore(rdp, flags);
> +	rcu_nocb_local_unlock(rdp);
>  
>  	/*
>  	 * Ignore former value of nocb_cb_sleep and force wake up as it could
> @@ -2427,6 +2479,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  
>  	pr_info("De-offloading %d\n", rdp->cpu);
>  
> +	rcu_nocb_local_lock(rdp);
>  	rcu_nocb_lock_irqsave(rdp, flags);
>  	/*
>  	 * Flush once and for all now. This suffices because we are
> @@ -2509,6 +2562,7 @@ static long rcu_nocb_rdp_offload(void *arg)
>  	 * Can't use rcu_nocb_lock_irqsave() while we are in
>  	 * SEGCBLIST_SOFTIRQ_ONLY mode.
>  	 */
> +	rcu_nocb_local_lock(rdp);
>  	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);

These look plausible at first glance, but it would be good for Frederic
to look at the exact placement of these rcu_nocb_local_lock() and
rcu_nocb_local_unlock() calls.

>  	/*
> @@ -2868,6 +2922,16 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>  	local_irq_restore(flags);
>  }
>  
> +/* No ->nocb_local_lock to acquire. */
> +static void rcu_nocb_local_lock(struct rcu_data *rdp)
> +{
> +}
> +
> +/* No ->nocb_local_lock to release. */
> +static void rcu_nocb_local_unlock(struct rcu_data *rdp)
> +{
> +}
> +
>  /* Lockdep check that ->cblist may be safely accessed. */
>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
>  {
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  2021-08-13  0:20   ` Paul E. McKenney
@ 2021-08-13 18:48     ` Valentin Schneider
  0 siblings, 0 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-13 18:48 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 12/08/21 17:20, Paul E. McKenney wrote:
> On Wed, Aug 11, 2021 at 09:13:53PM +0100, Valentin Schneider wrote:
>> rcu_core() preemptability considerations
>> ========================================
>> 
>> As pointed out by Paul [2], keeping rcu_check_quiescent_state() preemptible
>> (which is the case under CONFIG_PREEMPT_RT) requires some consideration.
>> 
>> note_gp_changes() itself runs with irqs off, and enters
>> __note_gp_changes() with rnp->lock held (raw_spinlock), thus is safe vs
>> preemption.
>> 
>> rdp->core_needs_qs *could* change after being read by the RCU core
>> kthread if it then gets preempted. Consider, with
>> CONFIG_RCU_STRICT_GRACE_PERIOD:
>> 
>>   rcuc/x                                   task_foo
>> 
>>     rcu_check_quiescent_state()
>>     `\
>>       rdp->core_needs_qs == true
>> 			     <PREEMPT>
>> 					   rcu_read_unlock()
>> 					   `\
>> 					     rcu_preempt_deferred_qs_irqrestore()
>> 					     `\
>> 					       rcu_report_qs_rdp()
>> 					       `\
>> 						 rdp->core_needs_qs := false;
>> 
>> This would let rcuc/x's rcu_check_quiescent_state() proceed further down to
>> rcu_report_qs_rdp(), but if task_foo's earlier rcu_report_qs_rdp()
>> invocation would have cleared the rdp grpmask from the rnp mask, so
>> rcuc/x's invocation would simply bail.
>> 
>> Since rcu_report_qs_rdp() can be safely invoked, even if rdp->core_needs_qs
>> changed, it appears safe to keep rcu_check_quiescent_state() preemptible.
>
> Another concern...
>
> During the preemption of rcu_check_quiescent_state() someone might report
> a quiescent state on behalf of CPU x (perhaps due to its having recently
> been idle) and then the RCU grace-period kthread might start running on
> CPU x, where it might initialize a new grace period in rcu_gp_init().
> It can then invoke __note_gp_changes(), also on CPU x.
>

(this is me "writing out loud" to make sure I can follow)

I take it the preemption of rcuc/x itself would count as a quiescent state,
if the current grace period started before rcuc/x's rcu_read_lock()
(rcuc/x would end up in ->blkd_tasks but wouldn't affect ->gp_tasks).

Then if we get something to run rcu_report_qs_rnp() with a mask spanning
CPU x - I think the GP kthread doing quiescent state forcing might fit the
bill - we'll let the GP kthread initialize a new GP.

> If preempted as shown above just after checking >core_needs_qs, the
> ->cpu_no_qs.b.norm field will be set by the grace-period kthread, which
> will cause the rcu_check_quiescent_state() function's subsequent check
> of ->cpu_no_qs.b.norm to take an early exit.  So OK here.
>

Right

> On the other hand, if preempted just after the rcu_check_quiescent_state()
> function's check of ->cpu_no_qs.b.norm, the later invocation of
> rcu_report_qs_rdp() should take an early exit due to ->gp_seq mismatch.
> So OK here.
>

If, as described in your scenario above, the GP kthread has preempted
rcuc/x, initialized a new GP and has run __note_gp_changes() (on CPU x),
then wouldn't the rdp->gp_seq and rnp->gp_seq match when rcuc/x gets to run
again?

And because we would've then had a context switch between the GP kthread
and rcuc/x, we would've noted a quiescent state for CPU x, which would let
rcuc/x's rcu_report_qs_rdp() continue - or have I erred on the way there?

> However, this should be added to the commit log.  Might be a big commit
> log, but mass storage is cheap these days.  ;-)
>

No objections here!

> This needs a review of each and every manipulation of ->core_needs_qs
> and ->cpu_no_qs.b.norm.  For example, the preemptions will cause the
> scheduler to invoke RCU's context-switch hooks, which also mess with
> ->cpu_no_qs.b.norm.  I can get to that some time next week (or tomorrow,
> if things go better than expected), but it would be good for you (and
> others) to check as well.
>

I have some notes scribbled down regarding those that I need to go through
again, but that won't be before a week's time - I'll be away next week.

> Frederic should look this over, but I am taking a quick pass in the
> meantime.  Please see below.
>

Thanks!

>> @@ -210,6 +210,8 @@ struct rcu_data {
>>  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
>>  	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
>>  
>> +	local_lock_t nocb_local_lock;
>
> Should this go near the beginning of the structure, given that code
> paths taking this lock tend to access ->cpu_no_qs, ->core_needs_qs,
> and so on?
>
> Given that it is used to protect core processing (not just offloaded
> callbacks), might ->core_local_lock be a better name?
>

It would still have to be in a #define CONFIG_RCU_NOCB_CPU region IMO, as
it only exists to protect access of the offloading state - hence the name,
but I'm not particularly attached to it.

> Please keep in mind that you can build kernels that offload callbacks
> but that still use softirq for RCU core processing.  And vice versa,
> that is, kernels that use rcuc kthreads but do not offload callbacks.
>

AFAICT the problem only stands if core processing becomes preemptible,
which is only the case under PREEMPT_RT (at least for now), and that implies
core processing done purely via kthreads.

>> +
>>  	/* The following fields are used by call_rcu, hence own cacheline. */
>>  	raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
>>  	struct rcu_cblist nocb_bypass;	/* Lock-contention-bypass CB list. */
>> @@ -445,6 +447,8 @@ static void rcu_nocb_unlock(struct rcu_data *rdp);
>>  static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>>  				       unsigned long flags);
>>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
>> +static void rcu_nocb_local_lock(struct rcu_data *rdp);
>> +static void rcu_nocb_local_unlock(struct rcu_data *rdp);
>>  #ifdef CONFIG_RCU_NOCB_CPU
>>  static void __init rcu_organize_nocb_kthreads(void);
>>  #define rcu_nocb_lock_irqsave(rdp, flags)				\
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index 0ff5e4fb933e..11c4ff00afde 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -21,6 +21,17 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>>  	return lockdep_is_held(&rdp->nocb_lock);
>>  }
>>  
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return lockdep_is_held(
>> +#ifdef CONFIG_PREEMPT_RT
>> +		&rdp->nocb_local_lock.lock
>> +#else
>> +		&rdp->nocb_local_lock
>> +#endif
>
> It would be good if this was abstracted.  Or is this the only place
> in the kernel that needs this #ifdef?  Maybe a lockdep_is_held_rt()
> or lockdep_is_held_local(), as the case may be?
>

It does look like the first/only place that tries to access a local_lock's
dep_map regardless of CONFIG_PREEMPT_RT. Looking at it some more, I'm not
sure if it's really useful: !PREEMPT_RT local_locks disable preemption, so
the preemption check in rcu_rdp_is_offloaded() will short circuit the
lockdep check when !PREEMPT_RT...

One thing I should perhaps point out - the local lock is only really needed
for PREEMPT_RT; for !PREEMPT_RT it will just disable preemption, and it's
only used in places that *already* disabled preemption (when !PREEMPT_RT).
I initially gated the lock under CONFIG_PREEMPT_RT, but changed that as I
found it reduced the ifdeffery.

>> +	);
>> +}
>> +
>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>  	/* Race on early boot between thread creation and assignment */
>> @@ -38,7 +49,10 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>>  {
>>  	return 0;
>>  }
>> -
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return 0;
>
> This is backwards of normal lockdep practice, which defaults to locks
> always held.  And that will be what happens once lockdep has detected
> its first deadlock, correct?  At which point, this function and its
> earlier instance will be in conflict.
>
> Or is there some subtle reason why this conflict would be OK?
>

This follows the !CONFIG_RCU_NOCB definition of rcu_lockdep_is_held_nocb(),
which looked fine to me.

Actually with the way I wrote rcu_local_offload_access_safe(), we don't
even access that function when !CONFIG_RCU_NOCB...

>> +}
>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>  	return false;
>> @@ -46,23 +60,44 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  
>>  #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
>>  
>> +/*
>> + * Is a local read of the rdp's offloaded state safe and stable?
>> + * See rcu_nocb_local_lock() & family.
>> + */
>> +static inline bool rcu_local_offload_access_safe(struct rcu_data *rdp)
>> +{
>> +	if (!preemptible())
>> +		return true;
>> +
>> +	if (!migratable()) {
>> +		if (!IS_ENABLED(CONFIG_RCU_NOCB))
>
> Do we also need to consult the use_softirq module parameter that controls
> whether or not there are rcuc kthreads?
>
> Actually, if !IS_ENABLED(CONFIG_RCU_NOCB) then rcu_rdp_is_offloaded()
> can simply return false without bothering with the RCU_LOCKDEP_WARN().
> Might be worth getting that out of the way of the RCU_LOCKDEP_WARN()
> condition.  ;-)
>

I _assumed_ the check was there even for !CONFIG_RCU_NOCB to provide a wider
test coverage of the calling conditions.

If that RCU_LOCKDEP_WARN() becomes conditionnal on CONFIG_RCU_NOCB then
yes, we could clean that up.

>> +			return true;
>> +
>> +		return rcu_lockdep_is_held_nocb_local(rdp);
>> +	}
>> +
>> +	return false;
>> +}
>> +

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT
  2021-08-11 20:13 ` [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT Valentin Schneider
  2021-08-12 16:47   ` Paul E. McKenney
@ 2021-08-17 12:13   ` Sebastian Andrzej Siewior
  2021-08-17 13:17     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-17 12:13 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-11 21:13:51 [+0100], Valentin Schneider wrote:
> Running RCU torture with CONFIG_PREEMPT_RT under v5.14-rc4-rt6 triggers:
> 
> [    8.755472] DEBUG_LOCKS_WARN_ON(this_cpu_read(softirq_ctrl.cnt))
> [    8.755482] WARNING: CPU: 1 PID: 137 at kernel/softirq.c:172 __local_bh_disable_ip (kernel/softirq.c:172 (discriminator 31))
> [    8.755500] Modules linked in:
> [    8.755506] CPU: 1 PID: 137 Comm: rcu_torture_rea Not tainted 5.14.0-rc4-rt6 #171
> [    8.755514] Hardware name: ARM Juno development board (r0) (DT)
> [    8.755518] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [    8.755622] Call trace:
> [    8.755624] __local_bh_disable_ip (kernel/softirq.c:172 (discriminator 31))
> [    8.755633] rcutorture_one_extend (kernel/rcu/rcutorture.c:1453)
> [    8.755640] rcu_torture_one_read (kernel/rcu/rcutorture.c:1601 kernel/rcu/rcutorture.c:1645)
> [    8.755645] rcu_torture_reader (kernel/rcu/rcutorture.c:1737)
> [    8.755650] kthread (kernel/kthread.c:327)
> [    8.755656] ret_from_fork (arch/arm64/kernel/entry.S:783)
> [    8.755663] irq event stamp: 11959
> [    8.755666] hardirqs last enabled at (11959): __rcu_read_unlock (kernel/rcu/tree_plugin.h:695 kernel/rcu/tree_plugin.h:451)
> [    8.755675] hardirqs last disabled at (11958): __rcu_read_unlock (kernel/rcu/tree_plugin.h:661 kernel/rcu/tree_plugin.h:451)
> [    8.755683] softirqs last enabled at (11950): __local_bh_enable_ip (./arch/arm64/include/asm/irqflags.h:85 kernel/softirq.c:261)
> [    8.755692] softirqs last disabled at (11942): rcutorture_one_extend (./include/linux/bottom_half.h:19 kernel/rcu/rcutorture.c:1441)
> 
> Per this warning, softirqs cannot be disabled in a non-preemptible region
> under CONFIG_PREEMPT_RT. Adjust RCU torture accordingly.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/rcu/rcutorture.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index eecd1caef71d..4f3db1d3170d 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -1548,6 +1548,8 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
>  	 * them on non-RT.
>  	 */
>  	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		/* Can't disable bh in atomic context under PREEMPT_RT */
> +		mask &= ~(RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH);

Let me stare at this…

>  		/*
>  		 * Can't release the outermost rcu lock in an irq disabled
>  		 * section without preemption also being disabled, if irqs
> -- 
> 2.25.1

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT
  2021-08-17 12:13   ` Sebastian Andrzej Siewior
@ 2021-08-17 13:17     ` Sebastian Andrzej Siewior
  2021-08-17 14:40       ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Sebastian Andrzej Siewior
  0 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-17 13:17 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-17 14:13:47 [+0200], To Valentin Schneider wrote:
> > index eecd1caef71d..4f3db1d3170d 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -1548,6 +1548,8 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
> >  	 * them on non-RT.
> >  	 */
> >  	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > +		/* Can't disable bh in atomic context under PREEMPT_RT */
> > +		mask &= ~(RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH);
> 
> Let me stare at this…

I would fold this

--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1549,6 +1549,13 @@ rcutorture_extend_mask(int oldmask, stru
 	 */
 	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
 		/*
+		 * Can't disable bh in atomic context if bh was already
+		 * disabled by another task on the same CPU. Instead of
+		 * attempting to track this, just avoid disabling bh in atomic
+		 * context.
+		 */
+		mask &= ~atomic_bhs;
+		/*
 		 * Can't release the outermost rcu lock in an irq disabled
 		 * section without preemption also being disabled, if irqs
 		 * had ever been enabled during this RCU critical section
@@ -1559,16 +1566,6 @@ rcutorture_extend_mask(int oldmask, stru
 		    !(mask & preempts))
 			mask |= RCUTORTURE_RDR_RCU;
 
-		/* Can't modify atomic bh in non-atomic context */
-		if ((oldmask & atomic_bhs) && (mask & atomic_bhs) &&
-		    !(mask & preempts_irq)) {
-			mask |= oldmask & preempts_irq;
-			if (mask & RCUTORTURE_RDR_IRQ)
-				mask |= oldmask & tmp;
-		}
-		if ((mask & atomic_bhs) && !(mask & preempts_irq))
-			mask |= RCUTORTURE_RDR_PREEMPT;
-
 		/* Can't modify non-atomic bh in atomic context */
 		tmp = nonatomic_bhs;
 		if (oldmask & preempts_irq)

into the original patch and forward it upstream…

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-17 13:17     ` Sebastian Andrzej Siewior
@ 2021-08-17 14:40       ` Sebastian Andrzej Siewior
  2021-08-18 22:46         ` Paul E. McKenney
  2021-08-20  3:23         ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Scott Wood
  0 siblings, 2 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-17 14:40 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

From: Scott Wood <swood@redhat.com>

rcutorture was generating some nesting scenarios that are not
reasonable.  Constrain the state selection to avoid them.

Example:

1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()

If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled.  Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().

For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.

Signed-off-by: Scott Wood <swood@redhat.com>
[valentin.schneider@arm.com: Don't disable BH in atomic context]
[bigeasy: remove 'preempt_disable(); local_bh_disable(); preempt_enable();
 local_bh_enable();' from the examples because this works on RT now. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
I folded Valentin's bits.
I removed the unbalanced preempt_disable()/migrate_disable() part from
the description because it is supported now by the migrate disable
implementation. I didn't find it explicit in code/ patch except as part
of local_bh_disable().


 kernel/rcu/rcutorture.c |   94 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 80 insertions(+), 14 deletions(-)
---
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -61,10 +61,13 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck
 #define RCUTORTURE_RDR_RBH	 0x08	/*  ... rcu_read_lock_bh(). */
 #define RCUTORTURE_RDR_SCHED	 0x10	/*  ... rcu_read_lock_sched(). */
 #define RCUTORTURE_RDR_RCU	 0x20	/*  ... entering another RCU reader. */
-#define RCUTORTURE_RDR_NBITS	 6	/* Number of bits defined above. */
+#define RCUTORTURE_RDR_ATOM_BH	 0x40	/*  ... disabling bh while atomic */
+#define RCUTORTURE_RDR_ATOM_RBH	 0x80	/*  ... RBH while atomic */
+#define RCUTORTURE_RDR_NBITS	 8	/* Number of bits defined above. */
 #define RCUTORTURE_MAX_EXTEND	 \
 	(RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ | RCUTORTURE_RDR_PREEMPT | \
-	 RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED)
+	 RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED | \
+	 RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH)
 #define RCUTORTURE_RDR_MAX_LOOPS 0x7	/* Maximum reader extensions. */
 					/* Must be power of two minus one. */
 #define RCUTORTURE_RDR_MAX_SEGS (RCUTORTURE_RDR_MAX_LOOPS + 3)
@@ -1429,31 +1432,53 @@ static void rcutorture_one_extend(int *r
 	WARN_ON_ONCE((idxold >> RCUTORTURE_RDR_SHIFT) > 1);
 	rtrsp->rt_readstate = newstate;
 
-	/* First, put new protection in place to avoid critical-section gap. */
+	/*
+	 * First, put new protection in place to avoid critical-section gap.
+	 * Disable preemption around the ATOM disables to ensure that
+	 * in_atomic() is true.
+	 */
 	if (statesnew & RCUTORTURE_RDR_BH)
 		local_bh_disable();
+	if (statesnew & RCUTORTURE_RDR_RBH)
+		rcu_read_lock_bh();
 	if (statesnew & RCUTORTURE_RDR_IRQ)
 		local_irq_disable();
 	if (statesnew & RCUTORTURE_RDR_PREEMPT)
 		preempt_disable();
-	if (statesnew & RCUTORTURE_RDR_RBH)
-		rcu_read_lock_bh();
 	if (statesnew & RCUTORTURE_RDR_SCHED)
 		rcu_read_lock_sched();
+	preempt_disable();
+	if (statesnew & RCUTORTURE_RDR_ATOM_BH)
+		local_bh_disable();
+	if (statesnew & RCUTORTURE_RDR_ATOM_RBH)
+		rcu_read_lock_bh();
+	preempt_enable();
 	if (statesnew & RCUTORTURE_RDR_RCU)
 		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
 
-	/* Next, remove old protection, irq first due to bh conflict. */
+	/*
+	 * Next, remove old protection, in decreasing order of strength
+	 * to avoid unlock paths that aren't safe in the stronger
+	 * context.  Disable preemption around the ATOM enables in
+	 * case the context was only atomic due to IRQ disabling.
+	 */
+	preempt_disable();
 	if (statesold & RCUTORTURE_RDR_IRQ)
 		local_irq_enable();
-	if (statesold & RCUTORTURE_RDR_BH)
+	if (statesold & RCUTORTURE_RDR_ATOM_BH)
 		local_bh_enable();
+	if (statesold & RCUTORTURE_RDR_ATOM_RBH)
+		rcu_read_unlock_bh();
+	preempt_enable();
 	if (statesold & RCUTORTURE_RDR_PREEMPT)
 		preempt_enable();
-	if (statesold & RCUTORTURE_RDR_RBH)
-		rcu_read_unlock_bh();
 	if (statesold & RCUTORTURE_RDR_SCHED)
 		rcu_read_unlock_sched();
+	if (statesold & RCUTORTURE_RDR_BH)
+		local_bh_enable();
+	if (statesold & RCUTORTURE_RDR_RBH)
+		rcu_read_unlock_bh();
+
 	if (statesold & RCUTORTURE_RDR_RCU) {
 		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);
 
@@ -1496,6 +1521,12 @@ rcutorture_extend_mask(int oldmask, stru
 	int mask = rcutorture_extend_mask_max();
 	unsigned long randmask1 = torture_random(trsp) >> 8;
 	unsigned long randmask2 = randmask1 >> 3;
+	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
+	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
+	unsigned long nonatomic_bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
+	unsigned long atomic_bhs = RCUTORTURE_RDR_ATOM_BH |
+				   RCUTORTURE_RDR_ATOM_RBH;
+	unsigned long tmp;
 
 	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
 	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
@@ -1503,11 +1534,46 @@ rcutorture_extend_mask(int oldmask, stru
 		mask = mask & randmask2;
 	else
 		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
-	/* Can't enable bh w/irq disabled. */
-	if ((mask & RCUTORTURE_RDR_IRQ) &&
-	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
-	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
-		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
+
+	/*
+	 * Can't enable bh w/irq disabled.
+	 */
+	tmp = atomic_bhs | nonatomic_bhs;
+	if (mask & RCUTORTURE_RDR_IRQ)
+		mask |= oldmask & tmp;
+
+	/*
+	 * Ideally these sequences would be detected in debug builds
+	 * (regardless of RT), but until then don't stop testing
+	 * them on non-RT.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		/*
+		 * Can't disable bh in atomic context if bh was already
+		 * disabled by another task on the same CPU. Instead of
+		 * attempting to track this, just avoid disabling bh in atomic
+		 * context.
+		 */
+		mask &= ~atomic_bhs;
+		/*
+		 * Can't release the outermost rcu lock in an irq disabled
+		 * section without preemption also being disabled, if irqs
+		 * had ever been enabled during this RCU critical section
+		 * (could leak a special flag and delay reporting the qs).
+		 */
+		if ((oldmask & RCUTORTURE_RDR_RCU) &&
+		    (mask & RCUTORTURE_RDR_IRQ) &&
+		    !(mask & preempts))
+			mask |= RCUTORTURE_RDR_RCU;
+
+		/* Can't modify non-atomic bh in atomic context */
+		tmp = nonatomic_bhs;
+		if (oldmask & preempts_irq)
+			mask &= ~tmp;
+		if ((oldmask | mask) & preempts_irq)
+			mask |= oldmask & tmp;
+	}
+
 	return mask ?: RCUTORTURE_RDR_RCU;
 }
 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-11 20:13 ` [PATCH v3 2/4] sched: Introduce migratable() Valentin Schneider
@ 2021-08-17 14:43   ` Sebastian Andrzej Siewior
  2021-08-22 17:31     ` Valentin Schneider
  2021-08-17 17:09   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-17 14:43 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-11 21:13:52 [+0100], Valentin Schneider wrote:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index debc960f41e3..8ba7b4a7ee69 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)
>  #endif
>  }
>  
> +/* Is the current task guaranteed to stay on its current CPU? */
> +static inline bool migratable(void)
> +{
> +#ifdef CONFIG_SMP
> +	return preemptible() && !current->migration_disabled;
> +#else
> +	return true;

shouldn't this be false in the UP case?

> +#endif
> +}
> +
>  /* Per-process atomic flags. */
>  #define PFA_NO_NEW_PRIVS		0	/* May not gain new privileges. */
>  #define PFA_SPREAD_PAGE			1	/* Spread page cache over cpuset */

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  2021-08-11 20:13 ` [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Valentin Schneider
  2021-08-13  0:20   ` Paul E. McKenney
@ 2021-08-17 15:36   ` Sebastian Andrzej Siewior
  2021-08-22 18:15     ` Valentin Schneider
  2021-09-21 14:05   ` Thomas Gleixner
  2 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-17 15:36 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-11 21:13:53 [+0100], Valentin Schneider wrote:
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 0ff5e4fb933e..11c4ff00afde 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -21,6 +21,17 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>  	return lockdep_is_held(&rdp->nocb_lock);
>  }
>  
> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
> +{
> +	return lockdep_is_held(
> +#ifdef CONFIG_PREEMPT_RT
> +		&rdp->nocb_local_lock.lock
> +#else
> +		&rdp->nocb_local_lock
> +#endif
> +	);
> +}

Now that I see it and Paul asked for it, please just use !RT version.
	return lockdep_is_held(&rdp->nocb_local_lock);

and RT will work, too.

>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>  {
>  	/* Race on early boot between thread creation and assignment */
> @@ -1629,6 +1664,22 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>  	}
>  }
>  
> +/*
> + * The invocation of rcu_core() within the RCU core kthreads remains preemptible
> + * under PREEMPT_RT, thus the offload state of a CPU could change while
> + * said kthreads are preempted. Prevent this from happening by protecting the
> + * offload state with a local_lock().
> + */
> +static void rcu_nocb_local_lock(struct rcu_data *rdp)
> +{
> +	local_lock(&rcu_data.nocb_local_lock);
> +}
> +
> +static void rcu_nocb_local_unlock(struct rcu_data *rdp)
> +{
> +	local_unlock(&rcu_data.nocb_local_lock);
> +}
> +
Do you need to pass rdp given that it is not used?

>  /* Lockdep check that ->cblist may be safely accessed. */
>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
>  {

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-11 20:13 ` [PATCH v3 2/4] sched: Introduce migratable() Valentin Schneider
  2021-08-17 14:43   ` Sebastian Andrzej Siewior
@ 2021-08-17 17:09   ` Sebastian Andrzej Siewior
  2021-08-17 19:30     ` Phil Auld
  2021-08-22 18:14     ` Valentin Schneider
  1 sibling, 2 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-17 17:09 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-11 21:13:52 [+0100], Valentin Schneider wrote:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index debc960f41e3..8ba7b4a7ee69 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)
>  #endif
>  }
>  
> +/* Is the current task guaranteed to stay on its current CPU? */
> +static inline bool migratable(void)

I'm going to rename this in my tree to `is_migratable' because of
|security/keys/trusted-keys/trusted_core.c:45:22: error: ‘migratable’ redeclared as different kind of symbol
|   45 | static unsigned char migratable;
|      |                      ^~~~~~~~~~
|In file included from arch/arm64/include/asm/compat.h:16,
|                 from arch/arm64/include/asm/stat.h:13,
|                 from include/linux/stat.h:6,
|                 from include/linux/sysfs.h:22,
|                 from include/linux/kobject.h:20,
|                 from include/linux/of.h:17,
|                 from include/linux/irqdomain.h:35,
|                 from include/linux/acpi.h:13,
|                 from include/linux/tpm.h:21,
|                 from include/keys/trusted-type.h:12,
|                 from security/keys/trusted-keys/trusted_core.c:10:
|include/linux/sched.h:1719:20: note: previous definition of ‘migratable’ was here
| 1719 | static inline bool migratable(void)
|      |                    ^~~~~~~~~~

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-17 17:09   ` Sebastian Andrzej Siewior
@ 2021-08-17 19:30     ` Phil Auld
  2021-08-22 18:14     ` Valentin Schneider
  1 sibling, 0 replies; 51+ messages in thread
From: Phil Auld @ 2021-08-17 19:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney,
	Frederic Weisbecker, Josh Triplett, Mathieu Desnoyers,
	Davidlohr Bueso, Lai Jiangshan, Joel Fernandes,
	Anshuman Khandual, Vincenzo Frascino, Steven Price,
	Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Tue, Aug 17, 2021 at 07:09:25PM +0200 Sebastian Andrzej Siewior wrote:
> On 2021-08-11 21:13:52 [+0100], Valentin Schneider wrote:
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index debc960f41e3..8ba7b4a7ee69 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)
> >  #endif
> >  }
> >  
> > +/* Is the current task guaranteed to stay on its current CPU? */
> > +static inline bool migratable(void)
> 
> I'm going to rename this in my tree to `is_migratable' because of

It's better anyway. See is_percpu_thread() 5 lines above :)


> |security/keys/trusted-keys/trusted_core.c:45:22: error: ‘migratable’ redeclared as different kind of symbol
> |   45 | static unsigned char migratable;
> |      |                      ^~~~~~~~~~
> |In file included from arch/arm64/include/asm/compat.h:16,
> |                 from arch/arm64/include/asm/stat.h:13,
> |                 from include/linux/stat.h:6,
> |                 from include/linux/sysfs.h:22,
> |                 from include/linux/kobject.h:20,
> |                 from include/linux/of.h:17,
> |                 from include/linux/irqdomain.h:35,
> |                 from include/linux/acpi.h:13,
> |                 from include/linux/tpm.h:21,
> |                 from include/keys/trusted-type.h:12,
> |                 from security/keys/trusted-keys/trusted_core.c:10:
> |include/linux/sched.h:1719:20: note: previous definition of ‘migratable’ was here
> | 1719 | static inline bool migratable(void)
> |      |                    ^~~~~~~~~~
> 
> Sebastian
> 

-- 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-17 14:40       ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Sebastian Andrzej Siewior
@ 2021-08-18 22:46         ` Paul E. McKenney
  2021-08-19 15:35           ` Sebastian Andrzej Siewior
  2021-08-19 15:39           ` Sebastian Andrzej Siewior
  2021-08-20  3:23         ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Scott Wood
  1 sibling, 2 replies; 51+ messages in thread
From: Paul E. McKenney @ 2021-08-18 22:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

On Tue, Aug 17, 2021 at 04:40:18PM +0200, Sebastian Andrzej Siewior wrote:
> From: Scott Wood <swood@redhat.com>
> 
> rcutorture was generating some nesting scenarios that are not
> reasonable.  Constrain the state selection to avoid them.
> 
> Example:
> 
> 1. rcu_read_lock()
> 2. local_irq_disable()
> 3. rcu_read_unlock()
> 4. local_irq_enable()
> 
> If the thread is preempted between steps 1 and 2,
> rcu_read_unlock_special.b.blocked will be set, but it won't be
> acted on in step 3 because IRQs are disabled.  Thus, reporting of the
> quiescent state will be delayed beyond the local_irq_enable().
> 
> For now, these scenarios will continue to be tested on non-PREEMPT_RT
> kernels, until debug checks are added to ensure that they are not
> happening elsewhere.
> 
> Signed-off-by: Scott Wood <swood@redhat.com>
> [valentin.schneider@arm.com: Don't disable BH in atomic context]
> [bigeasy: remove 'preempt_disable(); local_bh_disable(); preempt_enable();
>  local_bh_enable();' from the examples because this works on RT now. ]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This looks close to being ready for mainline, actually.

One comment below.

							Thanx, Paul

> ---
> I folded Valentin's bits.
> I removed the unbalanced preempt_disable()/migrate_disable() part from
> the description because it is supported now by the migrate disable
> implementation. I didn't find it explicit in code/ patch except as part
> of local_bh_disable().
> 
> 
>  kernel/rcu/rcutorture.c |   94 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 80 insertions(+), 14 deletions(-)
> ---
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -61,10 +61,13 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck
>  #define RCUTORTURE_RDR_RBH	 0x08	/*  ... rcu_read_lock_bh(). */
>  #define RCUTORTURE_RDR_SCHED	 0x10	/*  ... rcu_read_lock_sched(). */
>  #define RCUTORTURE_RDR_RCU	 0x20	/*  ... entering another RCU reader. */
> -#define RCUTORTURE_RDR_NBITS	 6	/* Number of bits defined above. */
> +#define RCUTORTURE_RDR_ATOM_BH	 0x40	/*  ... disabling bh while atomic */
> +#define RCUTORTURE_RDR_ATOM_RBH	 0x80	/*  ... RBH while atomic */
> +#define RCUTORTURE_RDR_NBITS	 8	/* Number of bits defined above. */
>  #define RCUTORTURE_MAX_EXTEND	 \
>  	(RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ | RCUTORTURE_RDR_PREEMPT | \
> -	 RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED)
> +	 RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED | \
> +	 RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH)
>  #define RCUTORTURE_RDR_MAX_LOOPS 0x7	/* Maximum reader extensions. */
>  					/* Must be power of two minus one. */
>  #define RCUTORTURE_RDR_MAX_SEGS (RCUTORTURE_RDR_MAX_LOOPS + 3)
> @@ -1429,31 +1432,53 @@ static void rcutorture_one_extend(int *r
>  	WARN_ON_ONCE((idxold >> RCUTORTURE_RDR_SHIFT) > 1);
>  	rtrsp->rt_readstate = newstate;
>  
> -	/* First, put new protection in place to avoid critical-section gap. */
> +	/*
> +	 * First, put new protection in place to avoid critical-section gap.
> +	 * Disable preemption around the ATOM disables to ensure that
> +	 * in_atomic() is true.
> +	 */
>  	if (statesnew & RCUTORTURE_RDR_BH)
>  		local_bh_disable();
> +	if (statesnew & RCUTORTURE_RDR_RBH)
> +		rcu_read_lock_bh();
>  	if (statesnew & RCUTORTURE_RDR_IRQ)
>  		local_irq_disable();
>  	if (statesnew & RCUTORTURE_RDR_PREEMPT)
>  		preempt_disable();
> -	if (statesnew & RCUTORTURE_RDR_RBH)
> -		rcu_read_lock_bh();
>  	if (statesnew & RCUTORTURE_RDR_SCHED)
>  		rcu_read_lock_sched();
> +	preempt_disable();
> +	if (statesnew & RCUTORTURE_RDR_ATOM_BH)
> +		local_bh_disable();
> +	if (statesnew & RCUTORTURE_RDR_ATOM_RBH)
> +		rcu_read_lock_bh();
> +	preempt_enable();
>  	if (statesnew & RCUTORTURE_RDR_RCU)
>  		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
>  
> -	/* Next, remove old protection, irq first due to bh conflict. */
> +	/*
> +	 * Next, remove old protection, in decreasing order of strength
> +	 * to avoid unlock paths that aren't safe in the stronger
> +	 * context.  Disable preemption around the ATOM enables in
> +	 * case the context was only atomic due to IRQ disabling.
> +	 */
> +	preempt_disable();
>  	if (statesold & RCUTORTURE_RDR_IRQ)
>  		local_irq_enable();
> -	if (statesold & RCUTORTURE_RDR_BH)
> +	if (statesold & RCUTORTURE_RDR_ATOM_BH)
>  		local_bh_enable();
> +	if (statesold & RCUTORTURE_RDR_ATOM_RBH)
> +		rcu_read_unlock_bh();
> +	preempt_enable();

The addition of preempt_enable() here prevents rcutorture from covering
an important part of the mainline RCU state space, namely when an RCU
read-side section ends with just local_irq_enable().  This situation
is a challenge for RCU because it must indirectly detect the end of the
critical section.

Would it work for RT if the preempt_enable() and preempt_disable()
were executed only if either RT on the one hand or statesold has the
RCUTORTURE_RDR_ATOM_BH or RCUTORTURE_RDR_ATOM_RBH bit set on the other?

>  	if (statesold & RCUTORTURE_RDR_PREEMPT)
>  		preempt_enable();
> -	if (statesold & RCUTORTURE_RDR_RBH)
> -		rcu_read_unlock_bh();
>  	if (statesold & RCUTORTURE_RDR_SCHED)
>  		rcu_read_unlock_sched();
> +	if (statesold & RCUTORTURE_RDR_BH)
> +		local_bh_enable();
> +	if (statesold & RCUTORTURE_RDR_RBH)
> +		rcu_read_unlock_bh();
> +
>  	if (statesold & RCUTORTURE_RDR_RCU) {
>  		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);
>  
> @@ -1496,6 +1521,12 @@ rcutorture_extend_mask(int oldmask, stru
>  	int mask = rcutorture_extend_mask_max();
>  	unsigned long randmask1 = torture_random(trsp) >> 8;
>  	unsigned long randmask2 = randmask1 >> 3;
> +	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
> +	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
> +	unsigned long nonatomic_bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
> +	unsigned long atomic_bhs = RCUTORTURE_RDR_ATOM_BH |
> +				   RCUTORTURE_RDR_ATOM_RBH;
> +	unsigned long tmp;
>  
>  	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
>  	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
> @@ -1503,11 +1534,46 @@ rcutorture_extend_mask(int oldmask, stru
>  		mask = mask & randmask2;
>  	else
>  		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
> -	/* Can't enable bh w/irq disabled. */
> -	if ((mask & RCUTORTURE_RDR_IRQ) &&
> -	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
> -	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
> -		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
> +
> +	/*
> +	 * Can't enable bh w/irq disabled.
> +	 */
> +	tmp = atomic_bhs | nonatomic_bhs;
> +	if (mask & RCUTORTURE_RDR_IRQ)
> +		mask |= oldmask & tmp;

This is more straightforward than my original, good!

> +
> +	/*
> +	 * Ideally these sequences would be detected in debug builds
> +	 * (regardless of RT), but until then don't stop testing
> +	 * them on non-RT.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		/*
> +		 * Can't disable bh in atomic context if bh was already
> +		 * disabled by another task on the same CPU. Instead of
> +		 * attempting to track this, just avoid disabling bh in atomic
> +		 * context.
> +		 */
> +		mask &= ~atomic_bhs;

At some point, we will need to test disabling bh in atomic context,
correct?  Or am I missing something here?

> +		/*
> +		 * Can't release the outermost rcu lock in an irq disabled
> +		 * section without preemption also being disabled, if irqs
> +		 * had ever been enabled during this RCU critical section
> +		 * (could leak a special flag and delay reporting the qs).
> +		 */
> +		if ((oldmask & RCUTORTURE_RDR_RCU) &&
> +		    (mask & RCUTORTURE_RDR_IRQ) &&
> +		    !(mask & preempts))
> +			mask |= RCUTORTURE_RDR_RCU;
> +
> +		/* Can't modify non-atomic bh in atomic context */
> +		tmp = nonatomic_bhs;
> +		if (oldmask & preempts_irq)
> +			mask &= ~tmp;
> +		if ((oldmask | mask) & preempts_irq)
> +			mask |= oldmask & tmp;
> +	}
> +
>  	return mask ?: RCUTORTURE_RDR_RCU;
>  }
>  

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-18 22:46         ` Paul E. McKenney
@ 2021-08-19 15:35           ` Sebastian Andrzej Siewior
  2021-08-19 15:39           ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-19 15:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

On 2021-08-18 15:46:51 [-0700], Paul E. McKenney wrote:
…
> > +	/*
> > +	 * Ideally these sequences would be detected in debug builds
> > +	 * (regardless of RT), but until then don't stop testing
> > +	 * them on non-RT.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > +		/*
> > +		 * Can't disable bh in atomic context if bh was already
> > +		 * disabled by another task on the same CPU. Instead of
> > +		 * attempting to track this, just avoid disabling bh in atomic
> > +		 * context.
> > +		 */
> > +		mask &= ~atomic_bhs;
> 
> At some point, we will need to test disabling bh in atomic context,
> correct?  Or am I missing something here?

Ideally there is no disabling bh in atomic context (on RT). Having it
breaks some fundamental rules how softirq handling and the bh related
synchronisation is implemented. Given that the softirq handler is
invoked in thread context and preemption is not disabled as part of
spin_lock(), rcu_read_lock(), and interrupts are in general not disabled
in the interrupt handler or spin_lock_irq() there is close to zero
chance of disabling bh in atomic context on RT.

In reality there is (of course) something that needs to disable bh in
atomic context and it happens only during boot up (or from idle unless
I'm mistaken).
It is required that bh disable and its enable part (later) happens in
the same context that is if bh has been disabled in preemptible context
it must not be enabled in atomic context (and vice versa).

The bottom line is that there must not be a local_bh_disable() in atomic
context if another (preempted) task already did a local_bh_disable() on
the same CPU, like in the following scenario on one CPU:

TASK A                      TASK B
 local_bh_disable();
 … preempted
                           preempt_disable();
			   local_bh_disable();

Then this breaks the synchronisation that is otherwise provided by
local_bh_disable(). Without that preempt_disable() TASK B would block
(and wait) until TASK A completes its BH section. In atomic context it
is not possible and therefore not allowed.

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-18 22:46         ` Paul E. McKenney
  2021-08-19 15:35           ` Sebastian Andrzej Siewior
@ 2021-08-19 15:39           ` Sebastian Andrzej Siewior
  2021-08-19 15:47             ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-19 15:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

On 2021-08-18 15:46:51 [-0700], Paul E. McKenney wrote:
> > ---
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -61,10 +61,13 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck
> > -	/* Next, remove old protection, irq first due to bh conflict. */
> > +	/*
> > +	 * Next, remove old protection, in decreasing order of strength
> > +	 * to avoid unlock paths that aren't safe in the stronger
> > +	 * context.  Disable preemption around the ATOM enables in
> > +	 * case the context was only atomic due to IRQ disabling.
> > +	 */
> > +	preempt_disable();
> >  	if (statesold & RCUTORTURE_RDR_IRQ)
> >  		local_irq_enable();
> > -	if (statesold & RCUTORTURE_RDR_BH)
> > +	if (statesold & RCUTORTURE_RDR_ATOM_BH)
> >  		local_bh_enable();
> > +	if (statesold & RCUTORTURE_RDR_ATOM_RBH)
> > +		rcu_read_unlock_bh();
> > +	preempt_enable();
> 
> The addition of preempt_enable() here prevents rcutorture from covering
> an important part of the mainline RCU state space, namely when an RCU
> read-side section ends with just local_irq_enable().  This situation
> is a challenge for RCU because it must indirectly detect the end of the
> critical section.
> 
> Would it work for RT if the preempt_enable() and preempt_disable()
> were executed only if either RT on the one hand or statesold has the
> RCUTORTURE_RDR_ATOM_BH or RCUTORTURE_RDR_ATOM_RBH bit set on the other?

Now that I stared at it some more (and it stared briefly back at me) I
couldn't explain why we need this and that piece of the patch so I came
up with following which I can explain:

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 40ef5417d9545..5c8b31b7eff03 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1432,28 +1432,34 @@ static void rcutorture_one_extend(int *readstate, int newstate,
 	/* First, put new protection in place to avoid critical-section gap. */
 	if (statesnew & RCUTORTURE_RDR_BH)
 		local_bh_disable();
+	if (statesnew & RCUTORTURE_RDR_RBH)
+		rcu_read_lock_bh();
 	if (statesnew & RCUTORTURE_RDR_IRQ)
 		local_irq_disable();
 	if (statesnew & RCUTORTURE_RDR_PREEMPT)
 		preempt_disable();
-	if (statesnew & RCUTORTURE_RDR_RBH)
-		rcu_read_lock_bh();
 	if (statesnew & RCUTORTURE_RDR_SCHED)
 		rcu_read_lock_sched();
 	if (statesnew & RCUTORTURE_RDR_RCU)
 		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
 
-	/* Next, remove old protection, irq first due to bh conflict. */
+	/*
+	 * Next, remove old protection, in decreasing order of strength
+	 * to avoid unlock paths that aren't safe in the stronger
+	 * context. Namely: BH can not be enabled with disabled interrupts.
+	 * Additionally PREEMPT_RT requires that BH is enabled in preemptible
+	 * context.
+	 */
 	if (statesold & RCUTORTURE_RDR_IRQ)
 		local_irq_enable();
-	if (statesold & RCUTORTURE_RDR_BH)
-		local_bh_enable();
 	if (statesold & RCUTORTURE_RDR_PREEMPT)
 		preempt_enable();
-	if (statesold & RCUTORTURE_RDR_RBH)
-		rcu_read_unlock_bh();
 	if (statesold & RCUTORTURE_RDR_SCHED)
 		rcu_read_unlock_sched();
+	if (statesold & RCUTORTURE_RDR_BH)
+		local_bh_enable();
+	if (statesold & RCUTORTURE_RDR_RBH)
+		rcu_read_unlock_bh();
 	if (statesold & RCUTORTURE_RDR_RCU) {
 		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);
 
@@ -1496,6 +1502,9 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
 	int mask = rcutorture_extend_mask_max();
 	unsigned long randmask1 = torture_random(trsp) >> 8;
 	unsigned long randmask2 = randmask1 >> 3;
+	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
+	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
+	unsigned long bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
 
 	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
 	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
@@ -1503,11 +1512,37 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
 		mask = mask & randmask2;
 	else
 		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
-	/* Can't enable bh w/irq disabled. */
-	if ((mask & RCUTORTURE_RDR_IRQ) &&
-	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
-	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
-		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
+
+	/*
+	 * Can't enable bh w/irq disabled.
+	 */
+	if (mask & RCUTORTURE_RDR_IRQ)
+		mask |= oldmask & bhs;
+
+	/*
+	 * Ideally these sequences would be detected in debug builds
+	 * (regardless of RT), but until then don't stop testing
+	 * them on non-RT.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		/*
+		 * Can't release the outermost rcu lock in an irq disabled
+		 * section without preemption also being disabled, if irqs
+		 * had ever been enabled during this RCU critical section
+		 * (could leak a special flag and delay reporting the qs).
+		 */
+		if ((oldmask & RCUTORTURE_RDR_RCU) &&
+		    (mask & RCUTORTURE_RDR_IRQ) &&
+		    !(mask & preempts))
+			mask |= RCUTORTURE_RDR_RCU;
+
+		/* Can't modify bh in atomic context */
+		if (oldmask & preempts_irq)
+			mask &= ~bhs;
+		if ((oldmask | mask) & preempts_irq)
+			mask |= oldmask & bhs;
+	}
+
 	return mask ?: RCUTORTURE_RDR_RCU;
 }
 

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-19 15:39           ` Sebastian Andrzej Siewior
@ 2021-08-19 15:47             ` Sebastian Andrzej Siewior
  2021-08-19 18:20               ` Paul E. McKenney
  0 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-19 15:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

On 2021-08-19 17:39:29 [+0200], To Paul E. McKenney wrote:
> up with following which I can explain:
> 
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 40ef5417d9545..5c8b31b7eff03 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -1432,28 +1432,34 @@ static void rcutorture_one_extend(int *readstate, int newstate,
>  	/* First, put new protection in place to avoid critical-section gap. */
>  	if (statesnew & RCUTORTURE_RDR_BH)
>  		local_bh_disable();
> +	if (statesnew & RCUTORTURE_RDR_RBH)
> +		rcu_read_lock_bh();
>  	if (statesnew & RCUTORTURE_RDR_IRQ)
>  		local_irq_disable();
>  	if (statesnew & RCUTORTURE_RDR_PREEMPT)
>  		preempt_disable();
> -	if (statesnew & RCUTORTURE_RDR_RBH)
> -		rcu_read_lock_bh();
>  	if (statesnew & RCUTORTURE_RDR_SCHED)
>  		rcu_read_lock_sched();
>  	if (statesnew & RCUTORTURE_RDR_RCU)
>  		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;

So the ordering in the enable and disable part regarding BH is
important. First BH, then preemption or IRQ.

> -	/* Next, remove old protection, irq first due to bh conflict. */
> +	/*
> +	 * Next, remove old protection, in decreasing order of strength
> +	 * to avoid unlock paths that aren't safe in the stronger
> +	 * context. Namely: BH can not be enabled with disabled interrupts.
> +	 * Additionally PREEMPT_RT requires that BH is enabled in preemptible
> +	 * context.
> +	 */
>  	if (statesold & RCUTORTURE_RDR_IRQ)
>  		local_irq_enable();
> -	if (statesold & RCUTORTURE_RDR_BH)
> -		local_bh_enable();
>  	if (statesold & RCUTORTURE_RDR_PREEMPT)
>  		preempt_enable();
> -	if (statesold & RCUTORTURE_RDR_RBH)
> -		rcu_read_unlock_bh();
>  	if (statesold & RCUTORTURE_RDR_SCHED)
>  		rcu_read_unlock_sched();
> +	if (statesold & RCUTORTURE_RDR_BH)
> +		local_bh_enable();
> +	if (statesold & RCUTORTURE_RDR_RBH)
> +		rcu_read_unlock_bh();
>  	if (statesold & RCUTORTURE_RDR_RCU) {
>  		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);

The same in the unlock part so that BH is unlocked in preemptible
context.
Now if you need bh lock/unlock in atomic context (either with disabled
IRQs or preemption) then I would dig out the atomic-bh part again and
make !RT only without the preempt_disable() section around about which
one you did complain.

> @@ -1496,6 +1502,9 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
>  	int mask = rcutorture_extend_mask_max();
>  	unsigned long randmask1 = torture_random(trsp) >> 8;
>  	unsigned long randmask2 = randmask1 >> 3;
> +	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
> +	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
> +	unsigned long bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
>  
>  	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
>  	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
> @@ -1503,11 +1512,37 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
>  		mask = mask & randmask2;
>  	else
>  		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
> -	/* Can't enable bh w/irq disabled. */
> -	if ((mask & RCUTORTURE_RDR_IRQ) &&
> -	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
> -	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
> -		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
> +
> +	/*
> +	 * Can't enable bh w/irq disabled.
> +	 */
> +	if (mask & RCUTORTURE_RDR_IRQ)
> +		mask |= oldmask & bhs;
> +
> +	/*
> +	 * Ideally these sequences would be detected in debug builds
> +	 * (regardless of RT), but until then don't stop testing
> +	 * them on non-RT.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		/*
> +		 * Can't release the outermost rcu lock in an irq disabled
> +		 * section without preemption also being disabled, if irqs
> +		 * had ever been enabled during this RCU critical section
> +		 * (could leak a special flag and delay reporting the qs).
> +		 */
> +		if ((oldmask & RCUTORTURE_RDR_RCU) &&
> +		    (mask & RCUTORTURE_RDR_IRQ) &&
> +		    !(mask & preempts))
> +			mask |= RCUTORTURE_RDR_RCU;

This piece above, I don't understand. I had it running for a while and
it didn't explode. Let me try TREE01 for 30min without that piece.

> +		/* Can't modify bh in atomic context */
> +		if (oldmask & preempts_irq)
> +			mask &= ~bhs;
> +		if ((oldmask | mask) & preempts_irq)
> +			mask |= oldmask & bhs;

And this is needed because we can't lock/unlock bh while atomic.

> +	}
> +
>  	return mask ?: RCUTORTURE_RDR_RCU;
>  }
>  

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-19 15:47             ` Sebastian Andrzej Siewior
@ 2021-08-19 18:20               ` Paul E. McKenney
  2021-08-19 18:45                 ` Sebastian Andrzej Siewior
  2021-08-20  4:11                 ` Scott Wood
  0 siblings, 2 replies; 51+ messages in thread
From: Paul E. McKenney @ 2021-08-19 18:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

On Thu, Aug 19, 2021 at 05:47:08PM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-08-19 17:39:29 [+0200], To Paul E. McKenney wrote:
> > up with following which I can explain:
> > 
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 40ef5417d9545..5c8b31b7eff03 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -1432,28 +1432,34 @@ static void rcutorture_one_extend(int *readstate, int newstate,
> >  	/* First, put new protection in place to avoid critical-section gap. */
> >  	if (statesnew & RCUTORTURE_RDR_BH)
> >  		local_bh_disable();
> > +	if (statesnew & RCUTORTURE_RDR_RBH)
> > +		rcu_read_lock_bh();
> >  	if (statesnew & RCUTORTURE_RDR_IRQ)
> >  		local_irq_disable();
> >  	if (statesnew & RCUTORTURE_RDR_PREEMPT)
> >  		preempt_disable();
> > -	if (statesnew & RCUTORTURE_RDR_RBH)
> > -		rcu_read_lock_bh();
> >  	if (statesnew & RCUTORTURE_RDR_SCHED)
> >  		rcu_read_lock_sched();
> >  	if (statesnew & RCUTORTURE_RDR_RCU)
> >  		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
> 
> So the ordering in the enable and disable part regarding BH is
> important. First BH, then preemption or IRQ.
> 
> > -	/* Next, remove old protection, irq first due to bh conflict. */
> > +	/*
> > +	 * Next, remove old protection, in decreasing order of strength
> > +	 * to avoid unlock paths that aren't safe in the stronger
> > +	 * context. Namely: BH can not be enabled with disabled interrupts.
> > +	 * Additionally PREEMPT_RT requires that BH is enabled in preemptible
> > +	 * context.
> > +	 */
> >  	if (statesold & RCUTORTURE_RDR_IRQ)
> >  		local_irq_enable();
> > -	if (statesold & RCUTORTURE_RDR_BH)
> > -		local_bh_enable();
> >  	if (statesold & RCUTORTURE_RDR_PREEMPT)
> >  		preempt_enable();
> > -	if (statesold & RCUTORTURE_RDR_RBH)
> > -		rcu_read_unlock_bh();
> >  	if (statesold & RCUTORTURE_RDR_SCHED)
> >  		rcu_read_unlock_sched();
> > +	if (statesold & RCUTORTURE_RDR_BH)
> > +		local_bh_enable();
> > +	if (statesold & RCUTORTURE_RDR_RBH)
> > +		rcu_read_unlock_bh();
> >  	if (statesold & RCUTORTURE_RDR_RCU) {
> >  		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);
> 
> The same in the unlock part so that BH is unlocked in preemptible
> context.
> Now if you need bh lock/unlock in atomic context (either with disabled
> IRQs or preemption) then I would dig out the atomic-bh part again and
> make !RT only without the preempt_disable() section around about which
> one you did complain.
> 
> > @@ -1496,6 +1502,9 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
> >  	int mask = rcutorture_extend_mask_max();
> >  	unsigned long randmask1 = torture_random(trsp) >> 8;
> >  	unsigned long randmask2 = randmask1 >> 3;
> > +	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
> > +	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
> > +	unsigned long bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
> >  
> >  	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
> >  	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
> > @@ -1503,11 +1512,37 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
> >  		mask = mask & randmask2;
> >  	else
> >  		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
> > -	/* Can't enable bh w/irq disabled. */
> > -	if ((mask & RCUTORTURE_RDR_IRQ) &&
> > -	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
> > -	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
> > -		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
> > +
> > +	/*
> > +	 * Can't enable bh w/irq disabled.
> > +	 */
> > +	if (mask & RCUTORTURE_RDR_IRQ)
> > +		mask |= oldmask & bhs;
> > +
> > +	/*
> > +	 * Ideally these sequences would be detected in debug builds
> > +	 * (regardless of RT), but until then don't stop testing
> > +	 * them on non-RT.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > +		/*
> > +		 * Can't release the outermost rcu lock in an irq disabled
> > +		 * section without preemption also being disabled, if irqs
> > +		 * had ever been enabled during this RCU critical section
> > +		 * (could leak a special flag and delay reporting the qs).
> > +		 */
> > +		if ((oldmask & RCUTORTURE_RDR_RCU) &&
> > +		    (mask & RCUTORTURE_RDR_IRQ) &&
> > +		    !(mask & preempts))
> > +			mask |= RCUTORTURE_RDR_RCU;
> 
> This piece above, I don't understand. I had it running for a while and
> it didn't explode. Let me try TREE01 for 30min without that piece.

This might be historical.  There was a time when interrupts being
disabled across rcu_read_unlock() meant that preemption had to have
been disabled across the entire RCU read-side critical section.

I am not seeing a purpose for it now, but I could easily be missing
something, especially given my tenuous grasp of RT.

Either way, looking forward to the next version!

							Thanx, Paul

> > +		/* Can't modify bh in atomic context */
> > +		if (oldmask & preempts_irq)
> > +			mask &= ~bhs;
> > +		if ((oldmask | mask) & preempts_irq)
> > +			mask |= oldmask & bhs;
> 
> And this is needed because we can't lock/unlock bh while atomic.
> 
> > +	}
> > +
> >  	return mask ?: RCUTORTURE_RDR_RCU;
> >  }
> >  
> 
> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-19 18:20               ` Paul E. McKenney
@ 2021-08-19 18:45                 ` Sebastian Andrzej Siewior
  2021-08-20  4:11                 ` Scott Wood
  1 sibling, 0 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-19 18:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith,
	Scott Wood

On 2021-08-19 11:20:35 [-0700], Paul E. McKenney wrote:
> > This piece above, I don't understand. I had it running for a while and
> > it didn't explode. Let me try TREE01 for 30min without that piece.
> 
> This might be historical.  There was a time when interrupts being
> disabled across rcu_read_unlock() meant that preemption had to have
> been disabled across the entire RCU read-side critical section.
> 
> I am not seeing a purpose for it now, but I could easily be missing
> something, especially given my tenuous grasp of RT.

Okay. So the 30min test didn't trigger any warnings…

> Either way, looking forward to the next version!

Good. So if you liked what you have seen then I'm going to resubmit the
above as a proper patch then.
Thanks!

> 							Thanx, Paul

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-17 14:40       ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Sebastian Andrzej Siewior
  2021-08-18 22:46         ` Paul E. McKenney
@ 2021-08-20  3:23         ` Scott Wood
  2021-08-20  6:54           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 51+ messages in thread
From: Scott Wood @ 2021-08-20  3:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Tue, 2021-08-17 at 16:40 +0200, Sebastian Andrzej Siewior wrote:
> [bigeasy: remove 'preempt_disable(); local_bh_disable(); preempt_enable();
>  local_bh_enable();' from the examples because this works on RT now. ]

Does it actually work?  If preemption is disabled during local_bh_disable,
softirq_ctrl.lock won't be taken.  If you then get preempted between the
preempt_enable() and the local_bh_enable(), and another task tries to do
local_bh_disable(), won't it successfully get softirq_ctrl.lock, add to
softirq_ctrl.cnt, and proceed right into the critical section?

Or am I missing something?

-Scott



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-19 18:20               ` Paul E. McKenney
  2021-08-19 18:45                 ` Sebastian Andrzej Siewior
@ 2021-08-20  4:11                 ` Scott Wood
  2021-08-20  7:11                   ` Sebastian Andrzej Siewior
  2021-08-20  7:42                   ` [PATCH v2] rcutorture: Avoid problematic critical section nesting on PREEMPT_RT Sebastian Andrzej Siewior
  1 sibling, 2 replies; 51+ messages in thread
From: Scott Wood @ 2021-08-20  4:11 UTC (permalink / raw)
  To: paulmck, Sebastian Andrzej Siewior
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Thu, 2021-08-19 at 11:20 -0700, Paul E. McKenney wrote:
> On Thu, Aug 19, 2021 at 05:47:08PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2021-08-19 17:39:29 [+0200], To Paul E. McKenney wrote:
> > > +	/*
> > > +	 * Ideally these sequences would be detected in debug builds
> > > +	 * (regardless of RT), but until then don't stop testing
> > > +	 * them on non-RT.
> > > +	 */
> > > +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > > +		/*
> > > +		 * Can't release the outermost rcu lock in an irq disabled
> > > +		 * section without preemption also being disabled, if irqs
> > > +		 * had ever been enabled during this RCU critical section
> > > +		 * (could leak a special flag and delay reporting the qs).
> > > +		 */
> > > +		if ((oldmask & RCUTORTURE_RDR_RCU) &&
> > > +		    (mask & RCUTORTURE_RDR_IRQ) &&
> > > +		    !(mask & preempts))
> > > +			mask |= RCUTORTURE_RDR_RCU;
> > 
> > This piece above, I don't understand. I had it running for a while and
> > it didn't explode. Let me try TREE01 for 30min without that piece.
> 
> This might be historical.  There was a time when interrupts being
> disabled across rcu_read_unlock() meant that preemption had to have
> been disabled across the entire RCU read-side critical section.
> 
> I am not seeing a purpose for it now, but I could easily be missing
> something, especially given my tenuous grasp of RT.

Yeah, I think this was to deal with not having the irq work stuff in RT
at the time.

-Scott


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-20  3:23         ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Scott Wood
@ 2021-08-20  6:54           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-20  6:54 UTC (permalink / raw)
  To: Scott Wood
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney,
	Frederic Weisbecker, Josh Triplett, Mathieu Desnoyers,
	Davidlohr Bueso, Lai Jiangshan, Joel Fernandes,
	Anshuman Khandual, Vincenzo Frascino, Steven Price,
	Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-19 22:23:37 [-0500], Scott Wood wrote:
> On Tue, 2021-08-17 at 16:40 +0200, Sebastian Andrzej Siewior wrote:
> > [bigeasy: remove 'preempt_disable(); local_bh_disable(); preempt_enable();
> >  local_bh_enable();' from the examples because this works on RT now. ]
> 
> Does it actually work?  If preemption is disabled during local_bh_disable,
> softirq_ctrl.lock won't be taken.  If you then get preempted between the
> preempt_enable() and the local_bh_enable(), and another task tries to do
> local_bh_disable(), won't it successfully get softirq_ctrl.lock, add to
> softirq_ctrl.cnt, and proceed right into the critical section?
> 
> Or am I missing something?

No, I mixed it up with migrate_disable/enable. I corrected it while
redoing it yesterday.

> -Scott

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] rcutorture: Avoid problematic critical section nesting on RT
  2021-08-20  4:11                 ` Scott Wood
@ 2021-08-20  7:11                   ` Sebastian Andrzej Siewior
  2021-08-20  7:42                   ` [PATCH v2] rcutorture: Avoid problematic critical section nesting on PREEMPT_RT Sebastian Andrzej Siewior
  1 sibling, 0 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-20  7:11 UTC (permalink / raw)
  To: Scott Wood
  Cc: paulmck, Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-19 23:11:12 [-0500], Scott Wood wrote:
> On Thu, 2021-08-19 at 11:20 -0700, Paul E. McKenney wrote:
> > On Thu, Aug 19, 2021 at 05:47:08PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2021-08-19 17:39:29 [+0200], To Paul E. McKenney wrote:
> > > > +	/*
> > > > +	 * Ideally these sequences would be detected in debug builds
> > > > +	 * (regardless of RT), but until then don't stop testing
> > > > +	 * them on non-RT.
> > > > +	 */
> > > > +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > > > +		/*
> > > > +		 * Can't release the outermost rcu lock in an irq disabled
> > > > +		 * section without preemption also being disabled, if irqs
> > > > +		 * had ever been enabled during this RCU critical section
> > > > +		 * (could leak a special flag and delay reporting the qs).
> > > > +		 */
> > > > +		if ((oldmask & RCUTORTURE_RDR_RCU) &&
> > > > +		    (mask & RCUTORTURE_RDR_IRQ) &&
> > > > +		    !(mask & preempts))
> > > > +			mask |= RCUTORTURE_RDR_RCU;
> > > 
> > > This piece above, I don't understand. I had it running for a while and
> > > it didn't explode. Let me try TREE01 for 30min without that piece.
> > 
> > This might be historical.  There was a time when interrupts being
> > disabled across rcu_read_unlock() meant that preemption had to have
> > been disabled across the entire RCU read-side critical section.
> > 
> > I am not seeing a purpose for it now, but I could easily be missing
> > something, especially given my tenuous grasp of RT.
> 
> Yeah, I think this was to deal with not having the irq work stuff in RT
> at the time.

Good. Thank you for the confirmation. 
I run (without the hunk above) 2x 6h of TREE01 and 4x 6h of TREE06 and
it looked good.

> -Scott

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2] rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
  2021-08-20  4:11                 ` Scott Wood
  2021-08-20  7:11                   ` Sebastian Andrzej Siewior
@ 2021-08-20  7:42                   ` Sebastian Andrzej Siewior
  2021-08-20 22:10                     ` Paul E. McKenney
  1 sibling, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-08-20  7:42 UTC (permalink / raw)
  To: Scott Wood
  Cc: paulmck, Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

From: "From: Scott Wood" <swood@redhat.com>

rcutorture is generating some nesting scenarios that are not compatible on PREEMPT_RT.
For example:
	preempt_disable();
	rcu_read_lock_bh();
	preempt_enable();
	rcu_read_unlock_bh();

The problem here is that on PREEMPT_RT the bottom halves have to be
disabled and enabled in preemptible context.

Reorder locking: start with BH locking and continue with then with
disabling preemption or interrupts. In the unlocking do it reverse by
first enabling interrupts and preemption and BH at the very end.
Ensure that on PREEMPT_RT BH locking remains unchanged if in
non-preemptible context.

Link: https://lkml.kernel.org/r/20190911165729.11178-6-swood@redhat.com
Link: https://lkml.kernel.org/r/20210819182035.GF4126399@paulmck-ThinkPad-P17-Gen-1
Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: Drop ATOM_BH, make it only about changing BH in atomic
context. Allow enabling RCU in IRQ-off section. Reword commit message.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
  - Drop the ATOM_BH* bits. There don't seem to be needed, Paul did not
    ant the preempt-disable around enabling/disabling BH as it might fix
    things that RCU should take care.

  - Allow enabling RCU with disabled interrupts on RT. Scott confirmed
    that it was needed but might no longer be needed. Paul said that it
    might have been required at some point. It survived multiple 6h long
    TREE01 and TREE06 testing.

 kernel/rcu/rcutorture.c | 48 ++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 40ef5417d9545..d2ef535530b10 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1432,28 +1432,34 @@ static void rcutorture_one_extend(int *readstate, int newstate,
 	/* First, put new protection in place to avoid critical-section gap. */
 	if (statesnew & RCUTORTURE_RDR_BH)
 		local_bh_disable();
+	if (statesnew & RCUTORTURE_RDR_RBH)
+		rcu_read_lock_bh();
 	if (statesnew & RCUTORTURE_RDR_IRQ)
 		local_irq_disable();
 	if (statesnew & RCUTORTURE_RDR_PREEMPT)
 		preempt_disable();
-	if (statesnew & RCUTORTURE_RDR_RBH)
-		rcu_read_lock_bh();
 	if (statesnew & RCUTORTURE_RDR_SCHED)
 		rcu_read_lock_sched();
 	if (statesnew & RCUTORTURE_RDR_RCU)
 		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
 
-	/* Next, remove old protection, irq first due to bh conflict. */
+	/*
+	 * Next, remove old protection, in decreasing order of strength
+	 * to avoid unlock paths that aren't safe in the stronger
+	 * context. Namely: BH can not be enabled with disabled interrupts.
+	 * Additionally PREEMPT_RT requires that BH is enabled in preemptible
+	 * context.
+	 */
 	if (statesold & RCUTORTURE_RDR_IRQ)
 		local_irq_enable();
-	if (statesold & RCUTORTURE_RDR_BH)
-		local_bh_enable();
 	if (statesold & RCUTORTURE_RDR_PREEMPT)
 		preempt_enable();
-	if (statesold & RCUTORTURE_RDR_RBH)
-		rcu_read_unlock_bh();
 	if (statesold & RCUTORTURE_RDR_SCHED)
 		rcu_read_unlock_sched();
+	if (statesold & RCUTORTURE_RDR_BH)
+		local_bh_enable();
+	if (statesold & RCUTORTURE_RDR_RBH)
+		rcu_read_unlock_bh();
 	if (statesold & RCUTORTURE_RDR_RCU) {
 		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);
 
@@ -1496,6 +1502,9 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
 	int mask = rcutorture_extend_mask_max();
 	unsigned long randmask1 = torture_random(trsp) >> 8;
 	unsigned long randmask2 = randmask1 >> 3;
+	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
+	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
+	unsigned long bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
 
 	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
 	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
@@ -1503,11 +1512,26 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
 		mask = mask & randmask2;
 	else
 		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
-	/* Can't enable bh w/irq disabled. */
-	if ((mask & RCUTORTURE_RDR_IRQ) &&
-	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
-	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
-		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
+
+	/*
+	 * Can't enable bh w/irq disabled.
+	 */
+	if (mask & RCUTORTURE_RDR_IRQ)
+		mask |= oldmask & bhs;
+
+	/*
+	 * Ideally these sequences would be detected in debug builds
+	 * (regardless of RT), but until then don't stop testing
+	 * them on non-RT.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		/* Can't modify BH in atomic context */
+		if (oldmask & preempts_irq)
+			mask &= ~bhs;
+		if ((oldmask | mask) & preempts_irq)
+			mask |= oldmask & bhs;
+	}
+
 	return mask ?: RCUTORTURE_RDR_RCU;
 }
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2] rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
  2021-08-20  7:42                   ` [PATCH v2] rcutorture: Avoid problematic critical section nesting on PREEMPT_RT Sebastian Andrzej Siewior
@ 2021-08-20 22:10                     ` Paul E. McKenney
  0 siblings, 0 replies; 51+ messages in thread
From: Paul E. McKenney @ 2021-08-20 22:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Scott Wood, Valentin Schneider, linux-kernel, linux-arm-kernel,
	rcu, linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Daniel Bristot de Oliveira, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Fri, Aug 20, 2021 at 09:42:36AM +0200, Sebastian Andrzej Siewior wrote:
> From: "From: Scott Wood" <swood@redhat.com>
> 
> rcutorture is generating some nesting scenarios that are not compatible on PREEMPT_RT.
> For example:
> 	preempt_disable();
> 	rcu_read_lock_bh();
> 	preempt_enable();
> 	rcu_read_unlock_bh();
> 
> The problem here is that on PREEMPT_RT the bottom halves have to be
> disabled and enabled in preemptible context.
> 
> Reorder locking: start with BH locking and continue with then with
> disabling preemption or interrupts. In the unlocking do it reverse by
> first enabling interrupts and preemption and BH at the very end.
> Ensure that on PREEMPT_RT BH locking remains unchanged if in
> non-preemptible context.
> 
> Link: https://lkml.kernel.org/r/20190911165729.11178-6-swood@redhat.com
> Link: https://lkml.kernel.org/r/20210819182035.GF4126399@paulmck-ThinkPad-P17-Gen-1
> Signed-off-by: Scott Wood <swood@redhat.com>
> [bigeasy: Drop ATOM_BH, make it only about changing BH in atomic
> context. Allow enabling RCU in IRQ-off section. Reword commit message.]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Looks plausible.  ;-)

I have queued this for testing and further review.  If all goes well,
perhaps the v5.16 merge window.

							Thanx, Paul

> ---
> v1…v2:
>   - Drop the ATOM_BH* bits. There don't seem to be needed, Paul did not
>     ant the preempt-disable around enabling/disabling BH as it might fix
>     things that RCU should take care.
> 
>   - Allow enabling RCU with disabled interrupts on RT. Scott confirmed
>     that it was needed but might no longer be needed. Paul said that it
>     might have been required at some point. It survived multiple 6h long
>     TREE01 and TREE06 testing.
> 
>  kernel/rcu/rcutorture.c | 48 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 40ef5417d9545..d2ef535530b10 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -1432,28 +1432,34 @@ static void rcutorture_one_extend(int *readstate, int newstate,
>  	/* First, put new protection in place to avoid critical-section gap. */
>  	if (statesnew & RCUTORTURE_RDR_BH)
>  		local_bh_disable();
> +	if (statesnew & RCUTORTURE_RDR_RBH)
> +		rcu_read_lock_bh();
>  	if (statesnew & RCUTORTURE_RDR_IRQ)
>  		local_irq_disable();
>  	if (statesnew & RCUTORTURE_RDR_PREEMPT)
>  		preempt_disable();
> -	if (statesnew & RCUTORTURE_RDR_RBH)
> -		rcu_read_lock_bh();
>  	if (statesnew & RCUTORTURE_RDR_SCHED)
>  		rcu_read_lock_sched();
>  	if (statesnew & RCUTORTURE_RDR_RCU)
>  		idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
>  
> -	/* Next, remove old protection, irq first due to bh conflict. */
> +	/*
> +	 * Next, remove old protection, in decreasing order of strength
> +	 * to avoid unlock paths that aren't safe in the stronger
> +	 * context. Namely: BH can not be enabled with disabled interrupts.
> +	 * Additionally PREEMPT_RT requires that BH is enabled in preemptible
> +	 * context.
> +	 */
>  	if (statesold & RCUTORTURE_RDR_IRQ)
>  		local_irq_enable();
> -	if (statesold & RCUTORTURE_RDR_BH)
> -		local_bh_enable();
>  	if (statesold & RCUTORTURE_RDR_PREEMPT)
>  		preempt_enable();
> -	if (statesold & RCUTORTURE_RDR_RBH)
> -		rcu_read_unlock_bh();
>  	if (statesold & RCUTORTURE_RDR_SCHED)
>  		rcu_read_unlock_sched();
> +	if (statesold & RCUTORTURE_RDR_BH)
> +		local_bh_enable();
> +	if (statesold & RCUTORTURE_RDR_RBH)
> +		rcu_read_unlock_bh();
>  	if (statesold & RCUTORTURE_RDR_RCU) {
>  		bool lockit = !statesnew && !(torture_random(trsp) & 0xffff);
>  
> @@ -1496,6 +1502,9 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
>  	int mask = rcutorture_extend_mask_max();
>  	unsigned long randmask1 = torture_random(trsp) >> 8;
>  	unsigned long randmask2 = randmask1 >> 3;
> +	unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED;
> +	unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ;
> +	unsigned long bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
>  
>  	WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
>  	/* Mostly only one bit (need preemption!), sometimes lots of bits. */
> @@ -1503,11 +1512,26 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
>  		mask = mask & randmask2;
>  	else
>  		mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
> -	/* Can't enable bh w/irq disabled. */
> -	if ((mask & RCUTORTURE_RDR_IRQ) &&
> -	    ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
> -	     (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
> -		mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
> +
> +	/*
> +	 * Can't enable bh w/irq disabled.
> +	 */
> +	if (mask & RCUTORTURE_RDR_IRQ)
> +		mask |= oldmask & bhs;
> +
> +	/*
> +	 * Ideally these sequences would be detected in debug builds
> +	 * (regardless of RT), but until then don't stop testing
> +	 * them on non-RT.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		/* Can't modify BH in atomic context */
> +		if (oldmask & preempts_irq)
> +			mask &= ~bhs;
> +		if ((oldmask | mask) & preempts_irq)
> +			mask |= oldmask & bhs;
> +	}
> +
>  	return mask ?: RCUTORTURE_RDR_RCU;
>  }
>  
> -- 
> 2.33.0
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-17 14:43   ` Sebastian Andrzej Siewior
@ 2021-08-22 17:31     ` Valentin Schneider
  0 siblings, 0 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-22 17:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 17/08/21 16:43, Sebastian Andrzej Siewior wrote:
> On 2021-08-11 21:13:52 [+0100], Valentin Schneider wrote:
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index debc960f41e3..8ba7b4a7ee69 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)
>>  #endif
>>  }
>>
>> +/* Is the current task guaranteed to stay on its current CPU? */
>> +static inline bool migratable(void)
>> +{
>> +#ifdef CONFIG_SMP
>> +	return preemptible() && !current->migration_disabled;
>> +#else
>> +	return true;
>
> shouldn't this be false in the UP case?
>

Yes indeed, forgot to flip that one when inverting the logic.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-17 17:09   ` Sebastian Andrzej Siewior
  2021-08-17 19:30     ` Phil Auld
@ 2021-08-22 18:14     ` Valentin Schneider
  2022-01-26 16:56       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 51+ messages in thread
From: Valentin Schneider @ 2021-08-22 18:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 17/08/21 19:09, Sebastian Andrzej Siewior wrote:
> On 2021-08-11 21:13:52 [+0100], Valentin Schneider wrote:
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index debc960f41e3..8ba7b4a7ee69 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)
>>  #endif
>>  }
>>
>> +/* Is the current task guaranteed to stay on its current CPU? */
>> +static inline bool migratable(void)
>
> I'm going to rename this in my tree to `is_migratable' because of

<snip>

Thanks for carrying it through, I'll keep that change for the next version.

> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  2021-08-17 15:36   ` Sebastian Andrzej Siewior
@ 2021-08-22 18:15     ` Valentin Schneider
  0 siblings, 0 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-08-22 18:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 17/08/21 17:36, Sebastian Andrzej Siewior wrote:
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return lockdep_is_held(
>> +#ifdef CONFIG_PREEMPT_RT
>> +		&rdp->nocb_local_lock.lock
>> +#else
>> +		&rdp->nocb_local_lock
>> +#endif
>> +	);
>> +}
>
> Now that I see it and Paul asked for it, please just use !RT version.
>       return lockdep_is_held(&rdp->nocb_local_lock);
>
> and RT will work, too.
>

The above was required due to the (previous) definition of local_lock_t:

  typedef struct {
          spinlock_t		lock;
  } local_lock_t;

On -rt11 I see this is now:

  typedef spinlock_t local_lock_t;

which indeed means the iffdefery can (actually it *has* to) go.

>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>      /* Race on early boot between thread creation and assignment */
>> @@ -1629,6 +1664,22 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>>      }
>>  }
>>
>> +/*
>> + * The invocation of rcu_core() within the RCU core kthreads remains preemptible
>> + * under PREEMPT_RT, thus the offload state of a CPU could change while
>> + * said kthreads are preempted. Prevent this from happening by protecting the
>> + * offload state with a local_lock().
>> + */
>> +static void rcu_nocb_local_lock(struct rcu_data *rdp)
>> +{
>> +	local_lock(&rcu_data.nocb_local_lock);
>> +}
>> +
>> +static void rcu_nocb_local_unlock(struct rcu_data *rdp)
>> +{
>> +	local_unlock(&rcu_data.nocb_local_lock);
>> +}
>> +
> Do you need to pass rdp given that it is not used?
>

Not anymore, you're right.

>>  /* Lockdep check that ->cblist may be safely accessed. */
>>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
>>  {
>
> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
  2021-08-11 20:13 ` [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Valentin Schneider
  2021-08-13  0:20   ` Paul E. McKenney
  2021-08-17 15:36   ` Sebastian Andrzej Siewior
@ 2021-09-21 14:05   ` Thomas Gleixner
  2021-09-21 21:12     ` rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT Thomas Gleixner
  2 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2021-09-21 14:05 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Valentin,

On Wed, Aug 11 2021 at 21:13, Valentin Schneider wrote:
> Running v5.13-rt1 on my arm64 Juno board triggers:
>
> [    0.156302] =============================
> [    0.160416] WARNING: suspicious RCU usage
> [    0.172409] kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> [    0.260328] rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
> [    0.264537] rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
> [    0.267786] rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
>
> In this case, this is the RCU core kthread accessing the local CPU's
> rdp. Before that, rcu_cpu_kthread() invokes local_bh_disable().
>
> Under !CONFIG_PREEMPT_RT (and rcutree.use_softirq=0), this ends up
> incrementing the preempt_count, which satisfies the "local non-preemptible
> read" of rcu_rdp_is_offloaded().
>
> Under CONFIG_PREEMPT_RT however, this becomes
>
>   local_lock(&softirq_ctrl.lock)
>
> which, under the same config, is migrate_disable() + rt_spin_lock(). As
> pointed out by Frederic, this is not sufficient to safely access an rdp's
> offload state, as the RCU core kthread can be preempted by a kworker
> executing rcu_nocb_rdp_offload() [1].
>
> Introduce a local_lock to serialize an rdp's offload state while the rdp's
> associated core kthread is executing rcu_core().

Yes, sure. But I don't think that local_lock is required at all.

The point is that the two places where this actually matters invoke
rcu_rdp_is_offloaded() just at the top of the function outside of the
anyway existing protection sections. Moving it into the sections which
already provide the required protections makes it just work for both RT
and !RT.

Thanks,

        tglx
---

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded, needwake = false;
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
 	    rdp->gpwrap) {
 
@@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data
 	int div;
 	bool __maybe_unused empty;
 	unsigned long flags;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded;
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count = 0;
@@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data
 	rcu_nocb_lock(rdp);
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
 	pending = rcu_segcblist_n_cbs(&rdp->cblist);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	div = READ_ONCE(rcu_divisor);
 	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 	bl = max(rdp->blimit, pending >> div);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-21 14:05   ` Thomas Gleixner
@ 2021-09-21 21:12     ` Thomas Gleixner
  2021-09-21 23:36       ` Frederic Weisbecker
                         ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Thomas Gleixner @ 2021-09-21 21:12 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Valentin reported warnings about suspicious RCU usage on RT kernels. Those
happen when offloading of RCU callbacks is enabled:

  WARNING: suspicious RCU usage
  5.13.0-rt1 #20 Not tainted
  -----------------------------
  kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!

  rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
  rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
  rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)

The reason is that rcu_rdp_is_offloaded() is invoked without one of the
required protections on RT enabled kernels because local_bh_disable() does
not disable preemption on RT.

Valentin proposed to add a local lock to the code in question, but that's
suboptimal in several aspects:

  1) local locks add extra code to !RT kernels for no value.

  2) All possible callsites have to audited and amended when affected
     possible at an outer function level due to lock nesting issues.

  3) As the local lock has to be taken at the outer functions it's required
     to release and reacquire them in the inner code sections which might
     voluntary schedule, e.g. rcu_do_batch().

Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
rcu_rdp_is_offloaded() in the variable declaration section right at the top
of the functions. But the actual usage of the result is either within a
section which provides the required protections or after such a section.

So the obvious solution is to move the invocation into the code sections
which provide the proper protections, which solves the problem for RT and
does not have any impact on !RT kernels.

Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded, needwake = false;
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
 	    rdp->gpwrap) {
 
@@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data
 	int div;
 	bool __maybe_unused empty;
 	unsigned long flags;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded;
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count = 0;
@@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data
 	rcu_nocb_lock(rdp);
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
 	pending = rcu_segcblist_n_cbs(&rdp->cblist);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	div = READ_ONCE(rcu_divisor);
 	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 	bl = max(rdp->blimit, pending >> div);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-21 21:12     ` rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT Thomas Gleixner
@ 2021-09-21 23:36       ` Frederic Weisbecker
  2021-09-22  2:18         ` Paul E. McKenney
  2021-09-21 23:45       ` Frederic Weisbecker
  2021-09-30  9:00       ` Valentin Schneider
  2 siblings, 1 reply; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-21 23:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Tue, Sep 21, 2021 at 11:12:50PM +0200, Thomas Gleixner wrote:
> Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> happen when offloading of RCU callbacks is enabled:
> 
>   WARNING: suspicious RCU usage
>   5.13.0-rt1 #20 Not tainted
>   -----------------------------
>   kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> 
>   rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
>   rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
>   rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
> 
> The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> required protections on RT enabled kernels because local_bh_disable() does
> not disable preemption on RT.
> 
> Valentin proposed to add a local lock to the code in question, but that's
> suboptimal in several aspects:
> 
>   1) local locks add extra code to !RT kernels for no value.
> 
>   2) All possible callsites have to audited and amended when affected
>      possible at an outer function level due to lock nesting issues.
> 
>   3) As the local lock has to be taken at the outer functions it's required
>      to release and reacquire them in the inner code sections which might
>      voluntary schedule, e.g. rcu_do_batch().
> 
> Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> rcu_rdp_is_offloaded() in the variable declaration section right at the top
> of the functions. But the actual usage of the result is either within a
> section which provides the required protections or after such a section.
> 
> So the obvious solution is to move the invocation into the code sections
> which provide the proper protections, which solves the problem for RT and
> does not have any impact on !RT kernels.

You also need to consider rcu_segcblist_completely_offloaded(). There
are two users:

1) The first chunk using it in rcu_core() checks if there is a need to
accelerate the callback and that can happen concurrently with nocb
manipulations on the cblist. Concurrent (de-)offloading could mess
up with the locking state but here is what we can do:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bce848e50512..3e56a1a4af03 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2728,9 +2728,10 @@ static __latent_entropy void rcu_core(void)
 
 	/* No grace period and unregistered callbacks? */
 	if (!rcu_gp_in_progress() &&
-	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
+	    rcu_segcblist_is_enabled(&rdp->cblist)) {
 		rcu_nocb_lock_irqsave(rdp, flags);
-		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
+		if (!rcu_segcblist_completely_offloaded(&rdp->cblist) &&
+		    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 			rcu_accelerate_cbs_unlocked(rnp, rdp);
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 	}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 305cf6aeb408..64d615be3346 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -449,10 +449,9 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
 static void __init rcu_organize_nocb_kthreads(void);
 #define rcu_nocb_lock_irqsave(rdp, flags)				\
 do {									\
+	local_irq_save(flags);						\
 	if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))		\
-		local_irq_save(flags);					\
-	else								\
-		raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags));	\
+		raw_spin_lock(&(rdp)->nocb_lock);			\
 } while (0)
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags)


Doing the local_irq_save() before checking that the segcblist is offloaded
protect that state from being changed (provided we lock the local rdp). Then we
can safely manipulate cblist, whether locked or unlocked.

2) The actual call to rcu_do_batch(). If we are preempted between
rcu_segcblist_completely_offloaded() and rcu_do_batch() with a deoffload in
the middle, we miss the callback invocation. Invoking rcu_core by the end of
deoffloading process should solve that.

> 
> Reported-by: Valentin Schneider <valentin.schneider@arm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/rcu/tree.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
> -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> +	bool offloaded, needwake = false;
>  	struct rcu_node *rnp;
>  
>  	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
>  	rnp = rdp->mynode;
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	offloaded = rcu_rdp_is_offloaded(rdp);
>  	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
>  	    rdp->gpwrap) {

BTW Paul, if we happen to switch to non-NOCB (deoffload) some time after
rcu_report_qs_rdp(), it's possible that the rcu_accelerate_cbs()
that was supposed to be handled by nocb kthreads on behalf of
rcu_core() -> rcu_report_qs_rdp() would not happen. At least not until
we invoke rcu_core() again. Not sure how much harm that could cause.

> @@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data
>  	int div;
>  	bool __maybe_unused empty;
>  	unsigned long flags;
> -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> +	bool offloaded;
>  	struct rcu_head *rhp;
>  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
>  	long bl, count = 0;
> @@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data
>  	rcu_nocb_lock(rdp);
>  	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
>  	pending = rcu_segcblist_n_cbs(&rdp->cblist);
> +	offloaded = rcu_rdp_is_offloaded(rdp);

offloaded is also checked later in rcu_do_batch(), after irqrestore. In
fact that should even become a rcu_segcblist_completely_offloaded() check
there because if there are still pending callbacks while we are de-offloading,
rcu_core() should be invoked. Hmm but that might be a remote rcu_core...

Anyway I guess we could live with some of those races with invoking rcu core on the
target after deoffloading.

I guess I should cook a series to handle all these issues one by one, then
probably we can live without a local_lock().

Thanks.



>  	div = READ_ONCE(rcu_divisor);
>  	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
>  	bl = max(rdp->blimit, pending >> div);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-21 21:12     ` rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT Thomas Gleixner
  2021-09-21 23:36       ` Frederic Weisbecker
@ 2021-09-21 23:45       ` Frederic Weisbecker
  2021-09-22  6:32         ` Sebastian Andrzej Siewior
  2021-09-30  9:00       ` Valentin Schneider
  2 siblings, 1 reply; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-21 23:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Valentin Schneider, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Tue, Sep 21, 2021 at 11:12:50PM +0200, Thomas Gleixner wrote:
> Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> happen when offloading of RCU callbacks is enabled:
> 
>   WARNING: suspicious RCU usage
>   5.13.0-rt1 #20 Not tainted
>   -----------------------------
>   kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> 
>   rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
>   rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
>   rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
> 
> The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> required protections on RT enabled kernels because local_bh_disable() does
> not disable preemption on RT.
> 
> Valentin proposed to add a local lock to the code in question, but that's
> suboptimal in several aspects:
> 
>   1) local locks add extra code to !RT kernels for no value.
> 
>   2) All possible callsites have to audited and amended when affected
>      possible at an outer function level due to lock nesting issues.
> 
>   3) As the local lock has to be taken at the outer functions it's required
>      to release and reacquire them in the inner code sections which might
>      voluntary schedule, e.g. rcu_do_batch().
> 
> Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> rcu_rdp_is_offloaded() in the variable declaration section right at the top
> of the functions. But the actual usage of the result is either within a
> section which provides the required protections or after such a section.
> 
> So the obvious solution is to move the invocation into the code sections
> which provide the proper protections, which solves the problem for RT and
> does not have any impact on !RT kernels.

Also while at it, I'm asking again: traditionally softirqs could assume that
manipulating a local state was safe against !irq_count() code fiddling with
the same state on the same CPU.

Now with preemptible softirqs, that assumption can be broken anytime. RCU was
fortunate enough to have a warning for that. But who knows how many issues like
this are lurking?

Thanks.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-21 23:36       ` Frederic Weisbecker
@ 2021-09-22  2:18         ` Paul E. McKenney
  2021-09-22 11:31           ` Frederic Weisbecker
  0 siblings, 1 reply; 51+ messages in thread
From: Paul E. McKenney @ 2021-09-22  2:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Sebastian Andrzej Siewior,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Wed, Sep 22, 2021 at 01:36:27AM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 21, 2021 at 11:12:50PM +0200, Thomas Gleixner wrote:
> > Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> > happen when offloading of RCU callbacks is enabled:
> > 
> >   WARNING: suspicious RCU usage
> >   5.13.0-rt1 #20 Not tainted
> >   -----------------------------
> >   kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> > 
> >   rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
> >   rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
> >   rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
> > 
> > The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> > required protections on RT enabled kernels because local_bh_disable() does
> > not disable preemption on RT.
> > 
> > Valentin proposed to add a local lock to the code in question, but that's
> > suboptimal in several aspects:
> > 
> >   1) local locks add extra code to !RT kernels for no value.
> > 
> >   2) All possible callsites have to audited and amended when affected
> >      possible at an outer function level due to lock nesting issues.
> > 
> >   3) As the local lock has to be taken at the outer functions it's required
> >      to release and reacquire them in the inner code sections which might
> >      voluntary schedule, e.g. rcu_do_batch().
> > 
> > Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> > rcu_rdp_is_offloaded() in the variable declaration section right at the top
> > of the functions. But the actual usage of the result is either within a
> > section which provides the required protections or after such a section.
> > 
> > So the obvious solution is to move the invocation into the code sections
> > which provide the proper protections, which solves the problem for RT and
> > does not have any impact on !RT kernels.
> 
> You also need to consider rcu_segcblist_completely_offloaded(). There
> are two users:
> 
> 1) The first chunk using it in rcu_core() checks if there is a need to
> accelerate the callback and that can happen concurrently with nocb
> manipulations on the cblist. Concurrent (de-)offloading could mess
> up with the locking state but here is what we can do:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index bce848e50512..3e56a1a4af03 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2728,9 +2728,10 @@ static __latent_entropy void rcu_core(void)
>  
>  	/* No grace period and unregistered callbacks? */
>  	if (!rcu_gp_in_progress() &&
> -	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
> +	    rcu_segcblist_is_enabled(&rdp->cblist)) {
>  		rcu_nocb_lock_irqsave(rdp, flags);
> -		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> +		if (!rcu_segcblist_completely_offloaded(&rdp->cblist) &&
> +		    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
>  			rcu_accelerate_cbs_unlocked(rnp, rdp);
>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  	}
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 305cf6aeb408..64d615be3346 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -449,10 +449,9 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
>  static void __init rcu_organize_nocb_kthreads(void);
>  #define rcu_nocb_lock_irqsave(rdp, flags)				\
>  do {									\
> +	local_irq_save(flags);						\
>  	if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))		\
> -		local_irq_save(flags);					\
> -	else								\
> -		raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags));	\
> +		raw_spin_lock(&(rdp)->nocb_lock);			\
>  } while (0)
>  #else /* #ifdef CONFIG_RCU_NOCB_CPU */
>  #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags)
> 
> 
> Doing the local_irq_save() before checking that the segcblist is offloaded
> protect that state from being changed (provided we lock the local rdp). Then we
> can safely manipulate cblist, whether locked or unlocked.
> 
> 2) The actual call to rcu_do_batch(). If we are preempted between
> rcu_segcblist_completely_offloaded() and rcu_do_batch() with a deoffload in
> the middle, we miss the callback invocation. Invoking rcu_core by the end of
> deoffloading process should solve that.

Maybe invoke rcu_core() at that point?  My concern is that there might
be an extended time between the missed rcu_do_batch() and the end of
the deoffloading process.

> > Reported-by: Valentin Schneider <valentin.schneider@arm.com>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  kernel/rcu/tree.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  {
> >  	unsigned long flags;
> >  	unsigned long mask;
> > -	bool needwake = false;
> > -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> > +	bool offloaded, needwake = false;
> >  	struct rcu_node *rnp;
> >  
> >  	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> >  	rnp = rdp->mynode;
> >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +	offloaded = rcu_rdp_is_offloaded(rdp);
> >  	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
> >  	    rdp->gpwrap) {
> 
> BTW Paul, if we happen to switch to non-NOCB (deoffload) some time after
> rcu_report_qs_rdp(), it's possible that the rcu_accelerate_cbs()
> that was supposed to be handled by nocb kthreads on behalf of
> rcu_core() -> rcu_report_qs_rdp() would not happen. At least not until
> we invoke rcu_core() again. Not sure how much harm that could cause.

Again, can we just invoke rcu_core() as soon as this is noticed?

> > @@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data
> >  	int div;
> >  	bool __maybe_unused empty;
> >  	unsigned long flags;
> > -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> > +	bool offloaded;
> >  	struct rcu_head *rhp;
> >  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
> >  	long bl, count = 0;
> > @@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data
> >  	rcu_nocb_lock(rdp);
> >  	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
> >  	pending = rcu_segcblist_n_cbs(&rdp->cblist);
> > +	offloaded = rcu_rdp_is_offloaded(rdp);
> 
> offloaded is also checked later in rcu_do_batch(), after irqrestore. In
> fact that should even become a rcu_segcblist_completely_offloaded() check
> there because if there are still pending callbacks while we are de-offloading,
> rcu_core() should be invoked. Hmm but that might be a remote rcu_core...
> 
> Anyway I guess we could live with some of those races with invoking rcu core on the
> target after deoffloading.
> 
> I guess I should cook a series to handle all these issues one by one, then
> probably we can live without a local_lock().

And thank you very much for looking this over!  Not simple stuff.  ;-)

							Thanx, Paul

> Thanks.
> 
> 
> 
> >  	div = READ_ONCE(rcu_divisor);
> >  	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
> >  	bl = max(rdp->blimit, pending >> div);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-21 23:45       ` Frederic Weisbecker
@ 2021-09-22  6:32         ` Sebastian Andrzej Siewior
  2021-09-22 11:10           ` Frederic Weisbecker
  0 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-22  6:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-09-22 01:45:18 [+0200], Frederic Weisbecker wrote:
> 
> Also while at it, I'm asking again: traditionally softirqs could assume that
> manipulating a local state was safe against !irq_count() code fiddling with
> the same state on the same CPU.
> 
> Now with preemptible softirqs, that assumption can be broken anytime. RCU was
> fortunate enough to have a warning for that. But who knows how many issues like
> this are lurking?

If "local state" is modified then it is safe as long as it is modified
within a local_bh_disable() section. And we are in this section while
invoking a forced-threaded interrupt. The special part about RCU is
that it is used in_irq() as part of core-code.

> Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-22  6:32         ` Sebastian Andrzej Siewior
@ 2021-09-22 11:10           ` Frederic Weisbecker
  2021-09-22 11:27             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-22 11:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Wed, Sep 22, 2021 at 08:32:08AM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-09-22 01:45:18 [+0200], Frederic Weisbecker wrote:
> > 
> > Also while at it, I'm asking again: traditionally softirqs could assume that
> > manipulating a local state was safe against !irq_count() code fiddling with
> > the same state on the same CPU.
> > 
> > Now with preemptible softirqs, that assumption can be broken anytime. RCU was
> > fortunate enough to have a warning for that. But who knows how many issues like
> > this are lurking?
> 
> If "local state" is modified then it is safe as long as it is modified
> within a local_bh_disable() section. And we are in this section while
> invoking a forced-threaded interrupt. The special part about RCU is
> that it is used in_irq() as part of core-code.

But local_bh_disable() was deemed for protecting from interrupting softirqs,
not the other way around (softirqs being preempted by other tasks). The latter
semantic is new and nobody had that in mind until softirqs have been made
preemptible.

For example:

                             CPU 0
          -----------------------------------------------
          SOFTIRQ                            RANDOM TASK
          ------                             -----------
          int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
          int A, B;                          WRITE_ONCE(*X, 0);
                                             WRITE_ONCE(*X, 1);
          A = READ_ONCE(*X);
          B = READ_ONCE(*X);


We used to have the guarantee that A == B. That's not true anymore. Now
some new explicit local_bh_disable() should be carefully placed on RANDOM_TASK
where it wasn't necessary before. RCU is not that special in this regard.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-22 11:10           ` Frederic Weisbecker
@ 2021-09-22 11:27             ` Sebastian Andrzej Siewior
  2021-09-22 11:38               ` Frederic Weisbecker
  0 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-22 11:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-09-22 13:10:12 [+0200], Frederic Weisbecker wrote:
> On Wed, Sep 22, 2021 at 08:32:08AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2021-09-22 01:45:18 [+0200], Frederic Weisbecker wrote:
> > > 
> > > Also while at it, I'm asking again: traditionally softirqs could assume that
> > > manipulating a local state was safe against !irq_count() code fiddling with
> > > the same state on the same CPU.
> > > 
> > > Now with preemptible softirqs, that assumption can be broken anytime. RCU was
> > > fortunate enough to have a warning for that. But who knows how many issues like
> > > this are lurking?
> > 
> > If "local state" is modified then it is safe as long as it is modified
> > within a local_bh_disable() section. And we are in this section while
> > invoking a forced-threaded interrupt. The special part about RCU is
> > that it is used in_irq() as part of core-code.
> 
> But local_bh_disable() was deemed for protecting from interrupting softirqs,
> not the other way around (softirqs being preempted by other tasks). The latter
> semantic is new and nobody had that in mind until softirqs have been made
> preemptible.
> 
> For example:
> 
>                              CPU 0
>           -----------------------------------------------
>           SOFTIRQ                            RANDOM TASK
>           ------                             -----------
>           int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
>           int A, B;                          WRITE_ONCE(*X, 0);
>                                              WRITE_ONCE(*X, 1);
>           A = READ_ONCE(*X);
>           B = READ_ONCE(*X);
> 
> 
> We used to have the guarantee that A == B. That's not true anymore. Now
> some new explicit local_bh_disable() should be carefully placed on RANDOM_TASK
> where it wasn't necessary before. RCU is not that special in this regard.

The part with rcutree.use_softirq=0 on RT does not make it any better,
right?
So you rely on some implicit behaviour which breaks with RT such as:

                              CPU 0
           -----------------------------------------------
           RANDOM TASK-A                      RANDOM TASK-B
           ------                             -----------
           int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
           int A, B;                          
					      spin_lock(&D);
           spin_lock(&C);
	   				      WRITE_ONCE(*X, 0);
           A = READ_ONCE(*X);
                                              WRITE_ONCE(*X, 1);
           B = READ_ONCE(*X);

while spinlock C and D are just random locks not related to CPUX but it
just happens that they are held at that time. So for !RT you guarantee
that A == B while it is not the case on RT.

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-22  2:18         ` Paul E. McKenney
@ 2021-09-22 11:31           ` Frederic Weisbecker
  0 siblings, 0 replies; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-22 11:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Sebastian Andrzej Siewior,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Tue, Sep 21, 2021 at 07:18:37PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 22, 2021 at 01:36:27AM +0200, Frederic Weisbecker wrote:
> > Doing the local_irq_save() before checking that the segcblist is offloaded
> > protect that state from being changed (provided we lock the local rdp). Then we
> > can safely manipulate cblist, whether locked or unlocked.
> > 
> > 2) The actual call to rcu_do_batch(). If we are preempted between
> > rcu_segcblist_completely_offloaded() and rcu_do_batch() with a deoffload in
> > the middle, we miss the callback invocation. Invoking rcu_core by the end of
> > deoffloading process should solve that.
> 
> Maybe invoke rcu_core() at that point?  My concern is that there might
> be an extended time between the missed rcu_do_batch() and the end of
> the deoffloading process.

Agreed!

> 
> > > Reported-by: Valentin Schneider <valentin.schneider@arm.com>
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > ---
> > >  kernel/rcu/tree.c |    7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >  {
> > >  	unsigned long flags;
> > >  	unsigned long mask;
> > > -	bool needwake = false;
> > > -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> > > +	bool offloaded, needwake = false;
> > >  	struct rcu_node *rnp;
> > >  
> > >  	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> > >  	rnp = rdp->mynode;
> > >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > +	offloaded = rcu_rdp_is_offloaded(rdp);
> > >  	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
> > >  	    rdp->gpwrap) {
> > 
> > BTW Paul, if we happen to switch to non-NOCB (deoffload) some time after
> > rcu_report_qs_rdp(), it's possible that the rcu_accelerate_cbs()
> > that was supposed to be handled by nocb kthreads on behalf of
> > rcu_core() -> rcu_report_qs_rdp() would not happen. At least not until
> > we invoke rcu_core() again. Not sure how much harm that could cause.
> 
> Again, can we just invoke rcu_core() as soon as this is noticed?

Right. So I'm going to do things a bit differently. I'm going to add
a new segcblist state flag so that during the deoffloading process,
the first very step is an invoke_rcu_core() on the target after setting a
flag that requires handling all this things: accelerate/do_batch, etc...

Then will remain the "do we still have pending callbacks after do_batch?"
in which case we'll need to invoke the rcu_core again as long as we are in
the middle of deoffloading.

Ok, now to write the patches.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-22 11:27             ` Sebastian Andrzej Siewior
@ 2021-09-22 11:38               ` Frederic Weisbecker
  2021-09-22 13:02                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-22 11:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Wed, Sep 22, 2021 at 01:27:31PM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-09-22 13:10:12 [+0200], Frederic Weisbecker wrote:
> > On Wed, Sep 22, 2021 at 08:32:08AM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2021-09-22 01:45:18 [+0200], Frederic Weisbecker wrote:
> > > > 
> > > > Also while at it, I'm asking again: traditionally softirqs could assume that
> > > > manipulating a local state was safe against !irq_count() code fiddling with
> > > > the same state on the same CPU.
> > > > 
> > > > Now with preemptible softirqs, that assumption can be broken anytime. RCU was
> > > > fortunate enough to have a warning for that. But who knows how many issues like
> > > > this are lurking?
> > > 
> > > If "local state" is modified then it is safe as long as it is modified
> > > within a local_bh_disable() section. And we are in this section while
> > > invoking a forced-threaded interrupt. The special part about RCU is
> > > that it is used in_irq() as part of core-code.
> > 
> > But local_bh_disable() was deemed for protecting from interrupting softirqs,
> > not the other way around (softirqs being preempted by other tasks). The latter
> > semantic is new and nobody had that in mind until softirqs have been made
> > preemptible.
> > 
> > For example:
> > 
> >                              CPU 0
> >           -----------------------------------------------
> >           SOFTIRQ                            RANDOM TASK
> >           ------                             -----------
> >           int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
> >           int A, B;                          WRITE_ONCE(*X, 0);
> >                                              WRITE_ONCE(*X, 1);
> >           A = READ_ONCE(*X);
> >           B = READ_ONCE(*X);
> > 
> > 
> > We used to have the guarantee that A == B. That's not true anymore. Now
> > some new explicit local_bh_disable() should be carefully placed on RANDOM_TASK
> > where it wasn't necessary before. RCU is not that special in this regard.
> 
> The part with rcutree.use_softirq=0 on RT does not make it any better,
> right?

The rcuc kthread disables softirqs before calling rcu_core(), so it behaves
pretty much the same as a softirq. Or am I missing something?

> So you rely on some implicit behaviour which breaks with RT such as:
> 
>                               CPU 0
>            -----------------------------------------------
>            RANDOM TASK-A                      RANDOM TASK-B
>            ------                             -----------
>            int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
>            int A, B;                          
> 					      spin_lock(&D);
>            spin_lock(&C);
> 	   				      WRITE_ONCE(*X, 0);
>            A = READ_ONCE(*X);
>                                               WRITE_ONCE(*X, 1);
>            B = READ_ONCE(*X);
> 
> while spinlock C and D are just random locks not related to CPUX but it
> just happens that they are held at that time. So for !RT you guarantee
> that A == B while it is not the case on RT.

Not sure which spinlocks you are referring to here. Also most RCU spinlocks
are raw.

> 
> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-22 11:38               ` Frederic Weisbecker
@ 2021-09-22 13:02                 ` Sebastian Andrzej Siewior
  2021-09-23 10:02                   ` Frederic Weisbecker
  0 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-22 13:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-09-22 13:38:20 [+0200], Frederic Weisbecker wrote:
> > The part with rcutree.use_softirq=0 on RT does not make it any better,
> > right?
> 
> The rcuc kthread disables softirqs before calling rcu_core(), so it behaves
> pretty much the same as a softirq. Or am I missing something?

Oh, no you don't.

> > So you rely on some implicit behaviour which breaks with RT such as:
> > 
> >                               CPU 0
> >            -----------------------------------------------
> >            RANDOM TASK-A                      RANDOM TASK-B
> >            ------                             -----------
> >            int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
> >            int A, B;                          
> > 					      spin_lock(&D);
> >            spin_lock(&C);
> > 	   				      WRITE_ONCE(*X, 0);
> >            A = READ_ONCE(*X);
> >                                               WRITE_ONCE(*X, 1);
> >            B = READ_ONCE(*X);
> > 
> > while spinlock C and D are just random locks not related to CPUX but it
> > just happens that they are held at that time. So for !RT you guarantee
> > that A == B while it is not the case on RT.
> 
> Not sure which spinlocks you are referring to here. Also most RCU spinlocks
> are raw.

I was bringing an example where you also could rely on implicit locking
provided by spin_lock() which breaks on RT.

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-22 13:02                 ` Sebastian Andrzej Siewior
@ 2021-09-23 10:02                   ` Frederic Weisbecker
  0 siblings, 0 replies; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-23 10:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Valentin Schneider, linux-kernel,
	linux-arm-kernel, rcu, linux-rt-users, Catalin Marinas,
	Will Deacon, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Daniel Bristot de Oliveira, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Wed, Sep 22, 2021 at 03:02:32PM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-09-22 13:38:20 [+0200], Frederic Weisbecker wrote:
> > > So you rely on some implicit behaviour which breaks with RT such as:
> > > 
> > >                               CPU 0
> > >            -----------------------------------------------
> > >            RANDOM TASK-A                      RANDOM TASK-B
> > >            ------                             -----------
> > >            int *X = &per_cpu(CPUX, 0)         int *X = &per_cpu(CPUX, 0)
> > >            int A, B;                          
> > > 					      spin_lock(&D);
> > >            spin_lock(&C);
> > > 	   				      WRITE_ONCE(*X, 0);
> > >            A = READ_ONCE(*X);
> > >                                               WRITE_ONCE(*X, 1);
> > >            B = READ_ONCE(*X);
> > > 
> > > while spinlock C and D are just random locks not related to CPUX but it
> > > just happens that they are held at that time. So for !RT you guarantee
> > > that A == B while it is not the case on RT.
> > 
> > Not sure which spinlocks you are referring to here. Also most RCU spinlocks
> > are raw.
> 
> I was bringing an example where you also could rely on implicit locking
> provided by spin_lock() which breaks on RT.

Good point!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-21 21:12     ` rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT Thomas Gleixner
  2021-09-21 23:36       ` Frederic Weisbecker
  2021-09-21 23:45       ` Frederic Weisbecker
@ 2021-09-30  9:00       ` Valentin Schneider
  2021-09-30 10:53         ` Frederic Weisbecker
  2 siblings, 1 reply; 51+ messages in thread
From: Valentin Schneider @ 2021-09-30  9:00 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, linux-arm-kernel, rcu, linux-rt-users
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

Hi,

On 21/09/21 23:12, Thomas Gleixner wrote:
> Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> happen when offloading of RCU callbacks is enabled:
>
>   WARNING: suspicious RCU usage
>   5.13.0-rt1 #20 Not tainted
>   -----------------------------
>   kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
>
>   rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
>   rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
>   rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
>
> The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> required protections on RT enabled kernels because local_bh_disable() does
> not disable preemption on RT.
>
> Valentin proposed to add a local lock to the code in question, but that's
> suboptimal in several aspects:
>
>   1) local locks add extra code to !RT kernels for no value.
>
>   2) All possible callsites have to audited and amended when affected
>      possible at an outer function level due to lock nesting issues.
>
>   3) As the local lock has to be taken at the outer functions it's required
>      to release and reacquire them in the inner code sections which might
>      voluntary schedule, e.g. rcu_do_batch().
>
> Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> rcu_rdp_is_offloaded() in the variable declaration section right at the top
> of the functions. But the actual usage of the result is either within a
> section which provides the required protections or after such a section.
>
> So the obvious solution is to move the invocation into the code sections
> which provide the proper protections, which solves the problem for RT and
> does not have any impact on !RT kernels.
>

Thanks for taking a look at this!

My reasoning for adding protection in the outer functions was to prevent
impaired unlocks of rcu_nocb_{un}lock_irqsave(), as that too depends on the
offload state. Cf. Frederic's writeup:

  http://lore.kernel.org/r/20210727230814.GC283787@lothringen

Anywho, I see Frederic has sent a fancy new series, let me go stare at it.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-30  9:00       ` Valentin Schneider
@ 2021-09-30 10:53         ` Frederic Weisbecker
  2021-09-30 13:22           ` Valentin Schneider
  0 siblings, 1 reply; 51+ messages in thread
From: Frederic Weisbecker @ 2021-09-30 10:53 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Thomas Gleixner, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On Thu, Sep 30, 2021 at 10:00:39AM +0100, Valentin Schneider wrote:
> Hi,
> 
> On 21/09/21 23:12, Thomas Gleixner wrote:
> > Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> > happen when offloading of RCU callbacks is enabled:
> >
> >   WARNING: suspicious RCU usage
> >   5.13.0-rt1 #20 Not tainted
> >   -----------------------------
> >   kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> >
> >   rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
> >   rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
> >   rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
> >
> > The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> > required protections on RT enabled kernels because local_bh_disable() does
> > not disable preemption on RT.
> >
> > Valentin proposed to add a local lock to the code in question, but that's
> > suboptimal in several aspects:
> >
> >   1) local locks add extra code to !RT kernels for no value.
> >
> >   2) All possible callsites have to audited and amended when affected
> >      possible at an outer function level due to lock nesting issues.
> >
> >   3) As the local lock has to be taken at the outer functions it's required
> >      to release and reacquire them in the inner code sections which might
> >      voluntary schedule, e.g. rcu_do_batch().
> >
> > Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> > rcu_rdp_is_offloaded() in the variable declaration section right at the top
> > of the functions. But the actual usage of the result is either within a
> > section which provides the required protections or after such a section.
> >
> > So the obvious solution is to move the invocation into the code sections
> > which provide the proper protections, which solves the problem for RT and
> > does not have any impact on !RT kernels.
> >
> 
> Thanks for taking a look at this!
> 
> My reasoning for adding protection in the outer functions was to prevent
> impaired unlocks of rcu_nocb_{un}lock_irqsave(), as that too depends on the
> offload state. Cf. Frederic's writeup:
> 
>   http://lore.kernel.org/r/20210727230814.GC283787@lothringen

I was wrong about that BTW!
Because rcu_nocb_lock() always require IRQs to be disabled, which of course disables
preemption, so the offloaded state can't change between
rcu_nocb_lock[_irqsave]() and rcu_nocb_unlock[_irqrestore]() but anyway there
were many other issues to fix :-)


> 
> Anywho, I see Frederic has sent a fancy new series, let me go stare at it.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
  2021-09-30 10:53         ` Frederic Weisbecker
@ 2021-09-30 13:22           ` Valentin Schneider
  0 siblings, 0 replies; 51+ messages in thread
From: Valentin Schneider @ 2021-09-30 13:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, linux-kernel, linux-arm-kernel, rcu,
	linux-rt-users, Catalin Marinas, Will Deacon, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 30/09/21 12:53, Frederic Weisbecker wrote:
> On Thu, Sep 30, 2021 at 10:00:39AM +0100, Valentin Schneider wrote:
>> My reasoning for adding protection in the outer functions was to prevent
>> impaired unlocks of rcu_nocb_{un}lock_irqsave(), as that too depends on the
>> offload state. Cf. Frederic's writeup:
>>
>>   http://lore.kernel.org/r/20210727230814.GC283787@lothringen
>
> I was wrong about that BTW!
> Because rcu_nocb_lock() always require IRQs to be disabled, which of course disables
> preemption, so the offloaded state can't change between
> rcu_nocb_lock[_irqsave]() and rcu_nocb_unlock[_irqrestore]() but anyway there
> were many other issues to fix :-)
>

Ooooh... Even with you pointing it out, it took me a while to see it that
way. It's tough to get out of holidays mode :-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2021-08-22 18:14     ` Valentin Schneider
@ 2022-01-26 16:56       ` Sebastian Andrzej Siewior
  2022-01-26 18:10         ` Valentin Schneider
  2022-01-27 19:27         ` Valentin Schneider
  0 siblings, 2 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-01-26 16:56 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2021-08-22 19:14:18 [+0100], Valentin Schneider wrote:
> Thanks for carrying it through, I'll keep that change for the next version.

Just a quick question. This series ended with 3 patches in my queue. It
got decimated further due to Frederic series which ended up in
v5.17-rc1. I still have
| 2021-08-11 21:13 +0100 Valentin Schneider    ∙ sched: Introduce migratable()
| 2021-08-11 21:13 +0100 Valentin Schneider    ∙ arm64: mm: Make arch_faults_on_old_pte() check for migratability

what do we do about these two? I could repost these two if there are no
objections…
 
Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2022-01-26 16:56       ` Sebastian Andrzej Siewior
@ 2022-01-26 18:10         ` Valentin Schneider
  2022-01-27 10:07           ` Sebastian Andrzej Siewior
  2022-01-27 19:27         ` Valentin Schneider
  1 sibling, 1 reply; 51+ messages in thread
From: Valentin Schneider @ 2022-01-26 18:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 26/01/22 17:56, Sebastian Andrzej Siewior wrote:
> On 2021-08-22 19:14:18 [+0100], Valentin Schneider wrote:
>> Thanks for carrying it through, I'll keep that change for the next version.
>
> Just a quick question. This series ended with 3 patches in my queue. It
> got decimated further due to Frederic series which ended up in
> v5.17-rc1. I still have
> | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ sched: Introduce migratable()
> | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ arm64: mm: Make arch_faults_on_old_pte() check for migratability
>
> what do we do about these two? I could repost these two if there are no
> objections…
>

Heh, had forgotten about those - I'm happy to repost with the
s/migratable/is_migratable/. I also need to go back to those splats I got
on my emag and fix the PMU/GPIO warnings...

> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2022-01-26 18:10         ` Valentin Schneider
@ 2022-01-27 10:07           ` Sebastian Andrzej Siewior
  2022-01-27 18:23             ` Valentin Schneider
  0 siblings, 1 reply; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-01-27 10:07 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2022-01-26 18:10:51 [+0000], Valentin Schneider wrote:
> > | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ sched: Introduce migratable()
> > | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ arm64: mm: Make arch_faults_on_old_pte() check for migratability
> Heh, had forgotten about those - I'm happy to repost with the
> s/migratable/is_migratable/. I also need to go back to those splats I got
> on my emag and fix the PMU/GPIO warnings...

Now that I look at it gain, you might want to drop #1 and then #2 would
switch to cant_migrate(). This might work…

Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2022-01-27 10:07           ` Sebastian Andrzej Siewior
@ 2022-01-27 18:23             ` Valentin Schneider
  0 siblings, 0 replies; 51+ messages in thread
From: Valentin Schneider @ 2022-01-27 18:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 27/01/22 11:07, Sebastian Andrzej Siewior wrote:
> On 2022-01-26 18:10:51 [+0000], Valentin Schneider wrote:
>> > | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ sched: Introduce migratable()
>> > | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ arm64: mm: Make arch_faults_on_old_pte() check for migratability
> …
>> Heh, had forgotten about those - I'm happy to repost with the
>> s/migratable/is_migratable/. I also need to go back to those splats I got
>> on my emag and fix the PMU/GPIO warnings...
>
> Now that I look at it gain, you might want to drop #1 and then #2 would
> switch to cant_migrate(). This might work…
>

Wasn't aware of cant_migrate(), that does look even better. Lemme give it a shot.

> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2022-01-26 16:56       ` Sebastian Andrzej Siewior
  2022-01-26 18:10         ` Valentin Schneider
@ 2022-01-27 19:27         ` Valentin Schneider
  2022-02-04  9:24           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 51+ messages in thread
From: Valentin Schneider @ 2022-01-27 19:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 26/01/22 17:56, Sebastian Andrzej Siewior wrote:
> On 2021-08-22 19:14:18 [+0100], Valentin Schneider wrote:
>> Thanks for carrying it through, I'll keep that change for the next version.
>
> Just a quick question. This series ended with 3 patches in my queue. It
> got decimated further due to Frederic series which ended up in
> v5.17-rc1. I still have
> | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ sched: Introduce migratable()
> | 2021-08-11 21:13 +0100 Valentin Schneider    ∙ arm64: mm: Make arch_faults_on_old_pte() check for migratability
>
> what do we do about these two? I could repost these two if there are no
> objections…
>

While I'm at it, I see you're still carrying

  6ab5bb09040d arm64/sve: Make kernel FPU protection RT friendly

If you're OK with it, I'll repost that:
https://lore.kernel.org/lkml/20210722175157.1367122-1-valentin.schneider@arm.com/

> Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/4] sched: Introduce migratable()
  2022-01-27 19:27         ` Valentin Schneider
@ 2022-02-04  9:24           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 51+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-04  9:24 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-arm-kernel, rcu, linux-rt-users,
	Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan,
	Joel Fernandes, Anshuman Khandual, Vincenzo Frascino,
	Steven Price, Ard Biesheuvel, Boqun Feng, Mike Galbraith

On 2022-01-27 19:27:56 [+0000], Valentin Schneider wrote:

Sorry, got distracted.

> While I'm at it, I see you're still carrying
> 
>   6ab5bb09040d arm64/sve: Make kernel FPU protection RT friendly
> 
> If you're OK with it, I'll repost that:
> https://lore.kernel.org/lkml/20210722175157.1367122-1-valentin.schneider@arm.com/

I'm not too keen about preempt_enable_bh(). I would prefer the current
approach maybe with a comment why BH here and preemption there is
correct.
 
Sebastian

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2022-02-04  9:25 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 20:13 [PATCH v3 0/4] rcu, arm64: PREEMPT_RT fixlets Valentin Schneider
2021-08-11 20:13 ` [PATCH v3 1/4] rcutorture: Don't disable softirqs with preemption disabled when PREEMPT_RT Valentin Schneider
2021-08-12 16:47   ` Paul E. McKenney
2021-08-17 12:13   ` Sebastian Andrzej Siewior
2021-08-17 13:17     ` Sebastian Andrzej Siewior
2021-08-17 14:40       ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Sebastian Andrzej Siewior
2021-08-18 22:46         ` Paul E. McKenney
2021-08-19 15:35           ` Sebastian Andrzej Siewior
2021-08-19 15:39           ` Sebastian Andrzej Siewior
2021-08-19 15:47             ` Sebastian Andrzej Siewior
2021-08-19 18:20               ` Paul E. McKenney
2021-08-19 18:45                 ` Sebastian Andrzej Siewior
2021-08-20  4:11                 ` Scott Wood
2021-08-20  7:11                   ` Sebastian Andrzej Siewior
2021-08-20  7:42                   ` [PATCH v2] rcutorture: Avoid problematic critical section nesting on PREEMPT_RT Sebastian Andrzej Siewior
2021-08-20 22:10                     ` Paul E. McKenney
2021-08-20  3:23         ` [PATCH] rcutorture: Avoid problematic critical section nesting on RT Scott Wood
2021-08-20  6:54           ` Sebastian Andrzej Siewior
2021-08-11 20:13 ` [PATCH v3 2/4] sched: Introduce migratable() Valentin Schneider
2021-08-17 14:43   ` Sebastian Andrzej Siewior
2021-08-22 17:31     ` Valentin Schneider
2021-08-17 17:09   ` Sebastian Andrzej Siewior
2021-08-17 19:30     ` Phil Auld
2021-08-22 18:14     ` Valentin Schneider
2022-01-26 16:56       ` Sebastian Andrzej Siewior
2022-01-26 18:10         ` Valentin Schneider
2022-01-27 10:07           ` Sebastian Andrzej Siewior
2022-01-27 18:23             ` Valentin Schneider
2022-01-27 19:27         ` Valentin Schneider
2022-02-04  9:24           ` Sebastian Andrzej Siewior
2021-08-11 20:13 ` [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Valentin Schneider
2021-08-13  0:20   ` Paul E. McKenney
2021-08-13 18:48     ` Valentin Schneider
2021-08-17 15:36   ` Sebastian Andrzej Siewior
2021-08-22 18:15     ` Valentin Schneider
2021-09-21 14:05   ` Thomas Gleixner
2021-09-21 21:12     ` rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT Thomas Gleixner
2021-09-21 23:36       ` Frederic Weisbecker
2021-09-22  2:18         ` Paul E. McKenney
2021-09-22 11:31           ` Frederic Weisbecker
2021-09-21 23:45       ` Frederic Weisbecker
2021-09-22  6:32         ` Sebastian Andrzej Siewior
2021-09-22 11:10           ` Frederic Weisbecker
2021-09-22 11:27             ` Sebastian Andrzej Siewior
2021-09-22 11:38               ` Frederic Weisbecker
2021-09-22 13:02                 ` Sebastian Andrzej Siewior
2021-09-23 10:02                   ` Frederic Weisbecker
2021-09-30  9:00       ` Valentin Schneider
2021-09-30 10:53         ` Frederic Weisbecker
2021-09-30 13:22           ` Valentin Schneider
2021-08-11 20:13 ` [PATCH v3 4/4] arm64: mm: Make arch_faults_on_old_pte() check for migratability Valentin Schneider

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).