* [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
@ 2021-10-11 14:51 Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 01/11] rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading Frederic Weisbecker
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

Hi,

No code change in this v2, only changelogs:

* Add tags from Valentin and Sebastian

* Remove last reference to SEGCBLIST_SOFTIRQ_ONLY (thanks Valentin)

* Rewrite changelog for "rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check"
  after off-list debates with Paul.

* Remove the scenario with softirq interrupting rcuc on
  "rcu/nocb: Limit number of softirq callbacks only on softirq" as it's
  probably not possible (thanks Valentin).

* Remove the scenario with task spent scheduling out accounted on tlimit
  as it's not possible (thanks Valentin)
  (see "rcu: Apply callbacks processing time limit only on softirq")

* Fixed changelog of
  "rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread"
  (thanks Sebastian).

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/rt-v2

HEAD: 2c9349986d5f70a555195139665841cd98e9aba4

Thanks,
	Frederic
---

Frederic Weisbecker (10):
      rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
      rcu/nocb: Prepare state machine for a new step
      rcu/nocb: Invoke rcu_core() at the start of deoffloading
      rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
      rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
      rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
      rcu/nocb: Limit number of softirq callbacks only on softirq
      rcu: Fix callbacks processing time limit retaining cond_resched()
      rcu: Apply callbacks processing time limit only on softirq
      rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread

Thomas Gleixner (1):
      rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe


 include/linux/rcu_segcblist.h | 51 ++++++++++++++++++-------
 kernel/rcu/rcu_segcblist.c    | 10 ++---
 kernel/rcu/rcu_segcblist.h    | 12 +++---
 kernel/rcu/tree.c             | 87 +++++++++++++++++++++++++++++--------------
 kernel/rcu/tree.h             | 16 +++++---
 kernel/rcu/tree_nocb.h        | 29 ++++++++++++---
 6 files changed, 141 insertions(+), 64 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/11] rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 02/11] rcu/nocb: Prepare state machine for a new step Frederic Weisbecker
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

rcu_nocb_lock_irqsave() can be preempted between the call to
rcu_segcblist_is_offloaded() and the actual locking. This matters now
that rcu_core() is preemptible on PREEMPT_RT and the (de-)offloading
process can interrupt the softirq or the rcuc kthread.

As a result we may locklessly call into code that requires nocb locking.
In practice this is a problem while accelerating callbacks from rcu_core().
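
For illustration, here is the old macro (removed by the hunk below)
annotated with the window; this is a rough sketch and the interleaving
is hypothetical:

    /* Old rcu_nocb_lock_irqsave() on PREEMPT_RT: */
    if (!rcu_segcblist_is_offloaded(&rdp->cblist))        // reads the offloaded state...
        // << preemption point: the (de-)offloading process can flip
        //    that state right here >>
        local_irq_save(flags);                             // ...then locks (or not) based on a
    else                                                   //    possibly stale value, missing the
        raw_spin_lock_irqsave(&rdp->nocb_lock, flags);     //    nocb_lock where it is now required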

Simply disabling interrupts before (instead of after) checking the NOCB
offload state fixes the issue.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 70188cb42473..deeaf2fee714 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,12 +439,16 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
 static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
 #ifdef CONFIG_RCU_NOCB_CPU
 static void __init rcu_organize_nocb_kthreads(void);
-#define rcu_nocb_lock_irqsave(rdp, flags)				\
-do {									\
-	if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))		\
-		local_irq_save(flags);					\
-	else								\
-		raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags));	\
+
+/*
+ * Disable IRQs before checking offloaded state so that local
+ * locking is safe against concurrent de-offloading.
+ */
+#define rcu_nocb_lock_irqsave(rdp, flags)			\
+do {								\
+	local_irq_save(flags);					\
+	if (rcu_segcblist_is_offloaded(&(rdp)->cblist))	\
+		raw_spin_lock(&(rdp)->nocb_lock);		\
 } while (0)
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/11] rcu/nocb: Prepare state machine for a new step
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 01/11] rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading Frederic Weisbecker
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
segcblist flags because it is an exclusive state that doesn't combine
with the other flags. Remove it in favour of:

_ A flag specifying that rcu_core() needs to perform callbacks execution
  and acceleration

and

_ A flag specifying that the nocb lock must be held whenever it is
  needed

This clarifies the code and is more flexible: it allows a state where
rcu_core() runs with locking while offloading hasn't started yet. This
is a necessary step to prepare for triggering rcu_core() at the very
beginning of the de-offloading process, so that rcu_core() won't
dismiss work while being preempted by the de-offloading process, at
least not without a pending subsequent rcu_core() invocation that will
quickly catch up.
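
As a rough sketch, one state the split makes expressible (flag names
are those introduced by the hunks below; the combination shown is
illustrative):

    /*
     * SEGCBLIST_LOCKING  set:    nocb locking is used wherever needed
     * SEGCBLIST_RCU_CORE set:    rcu_core() still accelerates/executes callbacks
     * SEGCBLIST_OFFLOADED clear: the rdp is not (or no longer) fully offloaded
     *
     * => rcu_core() can run with nocb locking even though offloading hasn't
     *    started (or is being undone), which the single SEGCBLIST_SOFTIRQ_ONLY
     *    bit couldn't express.
     */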

Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/rcu_segcblist.h | 37 ++++++++++++++++++++++-------------
 kernel/rcu/rcu_segcblist.c    |  6 +++---
 kernel/rcu/rcu_segcblist.h    | 12 +++++++-----
 kernel/rcu/tree.c             |  2 +-
 kernel/rcu/tree_nocb.h        | 20 +++++++++++++------
 5 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 3db96c4f45fd..812961b1d064 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -69,7 +69,7 @@ struct rcu_cblist {
  *
  *
  *  ----------------------------------------------------------------------------
- *  |                         SEGCBLIST_SOFTIRQ_ONLY                           |
+ *  |                              SEGCBLIST_RCU_CORE                          |
  *  |                                                                          |
  *  |  Callbacks processed by rcu_core() from softirqs or local                |
  *  |  rcuc kthread, without holding nocb_lock.                                |
@@ -77,7 +77,7 @@ struct rcu_cblist {
  *                                         |
  *                                         v
  *  ----------------------------------------------------------------------------
- *  |                        SEGCBLIST_OFFLOADED                               |
+ *  |       SEGCBLIST_RCU_CORE | SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED       |
  *  |                                                                          |
  *  | Callbacks processed by rcu_core() from softirqs or local                 |
  *  | rcuc kthread, while holding nocb_lock. Waking up CB and GP kthreads,     |
@@ -89,7 +89,9 @@ struct rcu_cblist {
  *                        |                                 |
  *                        v                                 v
  *  ---------------------------------------  ----------------------------------|
- *  |        SEGCBLIST_OFFLOADED |        |  |     SEGCBLIST_OFFLOADED |       |
+ *  |        SEGCBLIST_RCU_CORE   |       |  |     SEGCBLIST_RCU_CORE   |      |
+ *  |        SEGCBLIST_LOCKING    |       |  |     SEGCBLIST_LOCKING    |      |
+ *  |        SEGCBLIST_OFFLOADED  |       |  |     SEGCBLIST_OFFLOADED  |      |
  *  |        SEGCBLIST_KTHREAD_CB         |  |     SEGCBLIST_KTHREAD_GP        |
  *  |                                     |  |                                 |
  *  |                                     |  |                                 |
@@ -104,9 +106,10 @@ struct rcu_cblist {
  *                                         |
  *                                         v
  *  |--------------------------------------------------------------------------|
- *  |                           SEGCBLIST_OFFLOADED |                          |
- *  |                           SEGCBLIST_KTHREAD_CB |                         |
- *  |                           SEGCBLIST_KTHREAD_GP                           |
+ *  |                           SEGCBLIST_LOCKING    |                         |
+ *  |                           SEGCBLIST_OFFLOADED  |                         |
+ *  |                           SEGCBLIST_KTHREAD_GP |                         |
+ *  |                           SEGCBLIST_KTHREAD_CB                           |
  *  |                                                                          |
  *  |   Kthreads handle callbacks holding nocb_lock, local rcu_core() stops    |
  *  |   handling callbacks. Enable bypass queueing.                            |
@@ -120,7 +123,8 @@ struct rcu_cblist {
  *
  *
  *  |--------------------------------------------------------------------------|
- *  |                           SEGCBLIST_OFFLOADED |                          |
+ *  |                           SEGCBLIST_LOCKING    |                         |
+ *  |                           SEGCBLIST_OFFLOADED  |                         |
  *  |                           SEGCBLIST_KTHREAD_CB |                         |
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
@@ -130,6 +134,8 @@ struct rcu_cblist {
  *                                      |
  *                                      v
  *  |--------------------------------------------------------------------------|
+ *  |                           SEGCBLIST_RCU_CORE   |                         |
+ *  |                           SEGCBLIST_LOCKING    |                         |
  *  |                           SEGCBLIST_KTHREAD_CB |                         |
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
@@ -143,7 +149,9 @@ struct rcu_cblist {
  *                     |                                 |
  *                     v                                 v
  *  ---------------------------------------------------------------------------|
- *  |                                                                          |
+ *  |                                     |                                    |
+ *  |        SEGCBLIST_RCU_CORE |         |       SEGCBLIST_RCU_CORE |         |
+ *  |        SEGCBLIST_LOCKING  |         |       SEGCBLIST_LOCKING  |         |
  *  |        SEGCBLIST_KTHREAD_CB         |       SEGCBLIST_KTHREAD_GP         |
  *  |                                     |                                    |
  *  | GP kthread woke up and              |   CB kthread woke up and           |
@@ -159,7 +167,7 @@ struct rcu_cblist {
  *                                      |
  *                                      v
  *  ----------------------------------------------------------------------------
- *  |                                   0                                      |
+ *  |                SEGCBLIST_RCU_CORE | SEGCBLIST_LOCKING                    |
  *  |                                                                          |
  *  | Callbacks processed by rcu_core() from softirqs or local                 |
  *  | rcuc kthread, while holding nocb_lock. Forbid nocb_timer to be armed.    |
@@ -168,17 +176,18 @@ struct rcu_cblist {
  *                                      |
  *                                      v
  *  ----------------------------------------------------------------------------
- *  |                         SEGCBLIST_SOFTIRQ_ONLY                           |
+ *  |                         SEGCBLIST_RCU_CORE                               |
  *  |                                                                          |
  *  |  Callbacks processed by rcu_core() from softirqs or local                |
  *  |  rcuc kthread, without holding nocb_lock.                                |
  *  ----------------------------------------------------------------------------
  */
 #define SEGCBLIST_ENABLED	BIT(0)
-#define SEGCBLIST_SOFTIRQ_ONLY	BIT(1)
-#define SEGCBLIST_KTHREAD_CB	BIT(2)
-#define SEGCBLIST_KTHREAD_GP	BIT(3)
-#define SEGCBLIST_OFFLOADED	BIT(4)
+#define SEGCBLIST_RCU_CORE	BIT(1)
+#define SEGCBLIST_LOCKING	BIT(2)
+#define SEGCBLIST_KTHREAD_CB	BIT(3)
+#define SEGCBLIST_KTHREAD_GP	BIT(4)
+#define SEGCBLIST_OFFLOADED	BIT(5)
 
 struct rcu_segcblist {
 	struct rcu_head *head;
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index aaa111237b60..c07aab6e39ef 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -261,14 +261,14 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
 }
 
 /*
- * Mark the specified rcu_segcblist structure as offloaded.
+ * Mark the specified rcu_segcblist structure as offloaded (or not)
  */
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
 {
 	if (offload) {
-		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
-		rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
+		rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
 	} else {
+		rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
 		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
 	}
 }
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 9a19328ff251..e373fbe44da5 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -80,11 +80,14 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
 	return rcu_segcblist_test_flags(rsclp, SEGCBLIST_ENABLED);
 }
 
-/* Is the specified rcu_segcblist offloaded, or is SEGCBLIST_SOFTIRQ_ONLY set? */
+/*
+ * Is the specified rcu_segcblist NOCB offloaded (or in the middle of the
+ * [de]offloading process)?
+ */
 static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 {
 	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-	    !rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY))
+	    rcu_segcblist_test_flags(rsclp, SEGCBLIST_LOCKING))
 		return true;
 
 	return false;
@@ -92,9 +95,8 @@ static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 
 static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp)
 {
-	int flags = SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP | SEGCBLIST_OFFLOADED;
-
-	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) && (rsclp->flags & flags) == flags)
+	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
+	    !rcu_segcblist_test_flags(rsclp, SEGCBLIST_RCU_CORE))
 		return true;
 
 	return false;
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e8247861ca93..e38028d48648 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -79,7 +79,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
 	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
 	.dynticks = ATOMIC_INIT(1),
 #ifdef CONFIG_RCU_NOCB_CPU
-	.cblist.flags = SEGCBLIST_SOFTIRQ_ONLY,
+	.cblist.flags = SEGCBLIST_RCU_CORE,
 #endif
 };
 static struct rcu_state rcu_state = {
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 368ef7b9af4f..71a28f50b40f 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1000,12 +1000,12 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 */
 	rcu_nocb_lock_irqsave(rdp, flags);
 	/*
-	 * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb
+	 * Theoretically we could clear SEGCBLIST_LOCKING after the nocb
 	 * lock is released but how about being paranoid for once?
 	 */
-	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
+	rcu_segcblist_clear_flags(cblist, SEGCBLIST_LOCKING);
 	/*
-	 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
+	 * Without SEGCBLIST_LOCKING, we can't use
 	 * rcu_nocb_unlock_irqrestore() anymore.
 	 */
 	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
@@ -1058,14 +1058,14 @@ static long rcu_nocb_rdp_offload(void *arg)
 
 	pr_info("Offloading %d\n", rdp->cpu);
 	/*
-	 * Can't use rcu_nocb_lock_irqsave() while we are in
-	 * SEGCBLIST_SOFTIRQ_ONLY mode.
+	 * Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
+	 * is set.
 	 */
 	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
 
 	/*
 	 * We didn't take the nocb lock while working on the
-	 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
+	 * rdp->cblist with SEGCBLIST_LOCKING cleared (pure softirq/rcuc mode).
 	 * Every modifications that have been done previously on
 	 * rdp->cblist must be visible remotely by the nocb kthreads
 	 * upon wake up after reading the cblist flags.
@@ -1084,6 +1084,14 @@ static long rcu_nocb_rdp_offload(void *arg)
 			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
 			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
 
+	/*
+	 * All kthreads are ready to work, we can finally relieve rcu_core() and
+	 * enable nocb bypass.
+	 */
+	rcu_nocb_lock_irqsave(rdp, flags);
+	rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE);
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 01/11] rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 02/11] rcu/nocb: Prepare state machine for a new step Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-13 16:07   ` Boqun Feng
  2021-10-11 14:51 ` [PATCH 04/11] rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe Frederic Weisbecker
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
some work, such as callbacks acceleration and invocation, may be left
unattended due to the volatile checks on the offloaded state.

In the worst case this work is postponed until the next rcu_pending()
check, which can take up to a jiffy to be reached and can be a problem
in case of callback flooding.

Solve that by invoking rcu_core() early in the de-offloading process.
This way any work dismissed by an ongoing rcu_core() call that was
fooled by a preempting de-offloading process will be caught up by a
subsequent invocation of rcu_core(), this time fully aware of the
de-offloading state.
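
A rough timeline of the intended catch-up (the interleaving is
hypothetical):

    rcu_core() (preemptible on RT)            rcu_nocb_rdp_deoffload()
    ------------------------------            ------------------------
    reads "completely offloaded",
    dismisses acceleration/invocation
        << preempted >>
                                              set SEGCBLIST_RCU_CORE
                                              invoke_rcu_core()   // queue another pass
    resumes and returns ...
    ... the freshly queued rcu_core() then
    runs, sees the de-offloading state and
    catches up on the dismissed work.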

Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/rcu_segcblist.h | 14 ++++++++++++++
 kernel/rcu/rcu_segcblist.c    |  6 ++----
 kernel/rcu/tree.c             | 17 +++++++++++++++++
 kernel/rcu/tree_nocb.h        |  9 +++++++++
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 812961b1d064..659d13a7ddaa 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -136,6 +136,20 @@ struct rcu_cblist {
  *  |--------------------------------------------------------------------------|
  *  |                           SEGCBLIST_RCU_CORE   |                         |
  *  |                           SEGCBLIST_LOCKING    |                         |
+ *  |                           SEGCBLIST_OFFLOADED  |                         |
+ *  |                           SEGCBLIST_KTHREAD_CB |                         |
+ *  |                           SEGCBLIST_KTHREAD_GP                           |
+ *  |                                                                          |
+ *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()    |
+ *  |   handles callbacks concurrently. Bypass enqueue is enabled.             |
+ *  |   Invoke RCU core so we make sure not to preempt it in the middle with   |
+ *  |   leaving some urgent work unattended within a jiffy.                    |
+ *  ----------------------------------------------------------------------------
+ *                                      |
+ *                                      v
+ *  |--------------------------------------------------------------------------|
+ *  |                           SEGCBLIST_RCU_CORE   |                         |
+ *  |                           SEGCBLIST_LOCKING    |                         |
  *  |                           SEGCBLIST_KTHREAD_CB |                         |
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index c07aab6e39ef..81145c3ece25 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -265,12 +265,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
  */
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
 {
-	if (offload) {
+	if (offload)
 		rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
-	} else {
-		rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
+	else
 		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
-	}
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e38028d48648..b236271b9022 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
+	/*
+	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
+	 * Therefore this function can race with concurrent NOCB (de-)offloading
+	 * on this CPU and the below condition must be considered volatile.
+	 * However if we race with:
+	 *
+	 * _ Offloading:   In the worst case we accelerate or process callbacks
+	 *                 concurrently with NOCB kthreads. We are guaranteed to
+	 *                 call rcu_nocb_lock() if that happens.
+	 *
+	 * _ Deoffloading: In the worst case we miss callbacks acceleration or
+	 *                 processing. This is fine because the early stage
+	 *                 of deoffloading invokes rcu_core() after setting
+	 *                 SEGCBLIST_RCU_CORE. So we guarantee that we'll process
+	 *                 what could have been dismissed without the need to wait
+	 *                 for the next rcu_pending() check in the next jiffy.
+	 */
 	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
 
 	if (cpu_is_offline(smp_processor_id()))
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 71a28f50b40f..3b470113ae38 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 * will refuse to put anything into the bypass.
 	 */
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	/*
+	 * Start with invoking rcu_core() early. This way if the current thread
+	 * happens to preempt an ongoing call to rcu_core() in the middle,
+	 * leaving some work dismissed because rcu_core() still thinks the rdp is
+	 * completely offloaded, we are guaranteed a nearby future instance of
+	 * rcu_core() to catch up.
+	 */
+	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
+	invoke_rcu_core();
 	ret = rdp_offload_toggle(rdp, false, flags);
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/11] rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 05/11] rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe Frederic Weisbecker
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Thomas Gleixner, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Frederic Weisbecker, Valentin Schneider, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

From: Thomas Gleixner <tglx@linutronix.de>

While reporting a quiescent state for a given CPU, rcu_core() takes
advantage of the freshly loaded grace period sequence number and the
locked rnp to accelerate the callbacks whose sequence numbers were
assigned a stale value.

This action is only necessary when the rdp isn't offloaded, otherwise
the NOCB kthreads already take care of the callbacks progression.

However the check for the offloaded state is volatile because it is
performed outside the IRQs disabled section. It's possible for the
offloading process to preempt rcu_core() at that point on PREEMPT_RT.

This is dangerous because rcu_core() may end up accelerating callbacks
concurrently with NOCB kthreads without appropriate locking.
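
A rough sketch of the race (the interleaving is hypothetical):

    rcu_report_qs_rdp()                        nocb (de-)offloading process
    -------------------                        ----------------------------
    offloaded = rcu_rdp_is_offloaded()         // stale snapshot
        << preempted (PREEMPT_RT) >>
                                               flips the offloaded state,
                                               NOCB kthreads start handling the cblist
    takes the rnp lock
    if (!offloaded)
        rcu_accelerate_cbs(rnp, rdp)           // may now run concurrently with the
                                               // kthreads without the nocb_lock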

Fix this by moving the offloaded check inside the rnp locking section.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b236271b9022..4869a6856bf1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2288,7 +2288,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	unsigned long flags;
 	unsigned long mask;
 	bool needwake = false;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
@@ -2315,8 +2314,10 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 		/*
 		 * This GP can't end until cpu checks in, so all of our
 		 * callbacks can be processed during the next GP.
+		 *
+		 * NOCB kthreads have their own way to deal with that.
 		 */
-		if (!offloaded)
+		if (!rcu_rdp_is_offloaded(rdp))
 			needwake = rcu_accelerate_cbs(rnp, rdp);
 
 		rcu_disable_urgency_upon_qs(rdp);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/11] rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 04/11] rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 06/11] rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check Frederic Weisbecker
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

When callbacks are offloaded, the NOCB kthreads handle the callbacks
progression on behalf of rcu_core().

However during the (de-)offloading process, the kthreads may not be
entirely up to the task. As a result some callbacks' grace period
sequence numbers may remain stale for a while because rcu_core() won't
take care of them either.

Fix this by forcing callbacks acceleration from rcu_core() as long
as the offloading process isn't complete.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4869a6856bf1..a43924244000 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2288,6 +2288,7 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	unsigned long flags;
 	unsigned long mask;
 	bool needwake = false;
+	bool needacc = false;
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
@@ -2315,16 +2316,29 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 		 * This GP can't end until cpu checks in, so all of our
 		 * callbacks can be processed during the next GP.
 		 *
-		 * NOCB kthreads have their own way to deal with that.
+		 * NOCB kthreads have their own way to deal with that...
 		 */
-		if (!rcu_rdp_is_offloaded(rdp))
+		if (!rcu_rdp_is_offloaded(rdp)) {
 			needwake = rcu_accelerate_cbs(rnp, rdp);
+		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
+			/*
+			 * ...but NOCB kthreads may miss or delay callbacks acceleration
+			 * if in the middle of a (de-)offloading process.
+			 */
+			needacc = true;
+		}
 
 		rcu_disable_urgency_upon_qs(rdp);
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		/* ^^^ Released rnp->lock */
 		if (needwake)
 			rcu_gp_kthread_wake();
+
+		if (needacc) {
+			rcu_nocb_lock_irqsave(rdp, flags);
+			rcu_accelerate_cbs_unlocked(rnp, rdp);
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+		}
 	}
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/11] rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 05/11] rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 07/11] rcu/nocb: Use appropriate rcu_nocb_lock_irqsave() Frederic Weisbecker
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

It's not entirely clear why rdp->qlen_last_fqs_check is updated before
processing the queue only on offloaded rdps. This can have different
effects, either in favour of triggering the force quiescent state path
or not. For example:

1) If the number of callbacks has decreased since the last
   rdp->qlen_last_fqs_check update (because we recently called
   rcu_do_batch() and executed fewer than qhimark callbacks) and the
   number of callbacks processed by a subsequent rcu_do_batch() arranges
   for exceeding qhimark on a non-offloaded but not on an offloaded
   setup, then we may spare a later run through the force quiescent
   state slow path in __call_rcu_nocb_wake(), as compared to the
   non-offloaded counterpart scenario.

   Here is such an offloaded scenario instance:

    qhimark = 1000
    rdp->qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 1000 callbacks
        rcu_segcblist_n_cbs(rdp) = 1000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // not taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 2001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
    }

    In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
    would be 1000 and the fqs slowpath would have executed.

2) If the number of callbacks has increased since the last
   rdp->qlen_last_fqs_check update (because we recently queued fewer
   than qhimark callbacks) and the number of callbacks executed in
   rcu_do_batch() doesn't exceed qhimark for either the offloaded or the
   non-offloaded setup, then it's possible that the offloaded scenario
   later runs the force quiescent state slow path in
   __call_rcu_nocb_wake() while the non-offloaded one doesn't.

    qhimark = 1000
    rdp->qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 100 callbacks
        // concurrent queued 100
        rcu_segcblist_n_cbs(rdp) = 2000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // Taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 3001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
    }

    In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
    would be 3000 and the fqs slowpath wouldn't have executed.

Until we sort this out, keep the current behaviour, whatever the
original intent was, but make sure we check a stable rather than a
volatile offloaded state in order not to raise a useless alarm on -rt.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Original-patch-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a43924244000..27500981d4b1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2508,7 +2508,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	trace_rcu_batch_start(rcu_state.name,
 			      rcu_segcblist_n_cbs(&rdp->cblist), bl);
 	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
-	if (offloaded)
+	if (rcu_rdp_is_offloaded(rdp))
 		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
 
 	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/11] rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 06/11] rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 08/11] rcu/nocb: Limit number of softirq callbacks only on softirq Frederic Weisbecker
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

Instead of hardcoding IRQ save and nocb lock, use the consolidated
API (and fix a comment as per Valentin Schneider's suggestion).

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 27500981d4b1..eaa9c7ce91bb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2488,12 +2488,11 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	}
 
 	/*
-	 * Extract the list of ready callbacks, disabling to prevent
+	 * Extract the list of ready callbacks, disabling IRQs to prevent
 	 * races with call_rcu() from interrupt handlers.  Leave the
 	 * callback counts, as rcu_barrier() needs to be conservative.
 	 */
-	local_irq_save(flags);
-	rcu_nocb_lock(rdp);
+	rcu_nocb_lock_irqsave(rdp, flags);
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
 	pending = rcu_segcblist_n_cbs(&rdp->cblist);
 	div = READ_ONCE(rcu_divisor);
@@ -2556,8 +2555,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 		}
 	}
 
-	local_irq_save(flags);
-	rcu_nocb_lock(rdp);
+	rcu_nocb_lock_irqsave(rdp, flags);
 	rdp->n_cbs_invoked += count;
 	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
 			    is_idle_task(current), rcu_is_callbacks_kthread());
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/11] rcu/nocb: Limit number of softirq callbacks only on softirq
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 07/11] rcu/nocb: Use appropriate rcu_nocb_lock_irqsave() Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 09/11] rcu: Fix callbacks processing time limit retaining cond_resched() Frederic Weisbecker
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

The current condition to limit the number of callbacks executed in a
row checks the offloaded state of the rdp. Not only is that state
volatile, but the check is also misleading: rcu_core() may well be
executing callbacks concurrently with the NOCB kthreads, and the
offloaded state would then be true in both cases. As a result the limit
would spuriously stop applying to softirq while in the middle of the
(de-)offloading process.

Fix and clarify the condition with those constraints in mind:

_ If callbacks are processed by either the rcuc or a NOCB kthread, the
  call to cond_resched_tasks_rcu_qs() is enough to take care of the
  overload.

_ If instead callbacks are processed from softirq:
  * If need_resched(), exit the callback processing.
  * Otherwise, if the CPU is idle, we can continue.
  * Otherwise, exit, because a softirq shouldn't interrupt a task for too
    long nor deprive other pending softirq vectors of the CPU (see the
    annotated sketch below).
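
As a rough sketch of how these constraints map onto the new condition
(the condition itself is the one added by the hunk below; the comments
are only illustrative):

    if (count >= bl &&                 /* limit reached ...                              */
        in_serving_softirq() &&        /* ... only enforced in softirq: rcuc and NOCB    */
                                       /* kthreads rely on cond_resched_tasks_rcu_qs()   */
        (need_resched() ||             /* a task is waiting for the CPU ...              */
         !is_idle_task(current)))      /* ... or we interrupted a real task (idle can    */
            break;                     /* keep going)                                    */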

Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index eaa9c7ce91bb..716dead1509d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2535,9 +2535,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
 		/*
 		 * Stop only if limit reached and CPU has something to do.
 		 */
-		if (count >= bl && !offloaded &&
-		    (need_resched() ||
-		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
+		if (count >= bl && in_serving_softirq() &&
+		    (need_resched() || !is_idle_task(current)))
 			break;
 		if (unlikely(tlimit)) {
 			/* only call local_clock() every 32 callbacks */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/11] rcu: Fix callbacks processing time limit retaining cond_resched()
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 08/11] rcu/nocb: Limit number of softirq callbacks only on softirq Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 10/11] rcu: Apply callbacks processing time limit only on softirq Frederic Weisbecker
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

The callbacks processing time limit makes sure we are not exceeding a
given amount of time executing the queue.

However its "continue" clause bypasses the cond_resched() call on
rcuc and NOCB kthreads, delaying it until we reach the limit, which can
be very long...

Make sure the scheduler has a higher priority than the time limit.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 716dead1509d..7a655d93a28a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2535,9 +2535,21 @@ static void rcu_do_batch(struct rcu_data *rdp)
 		/*
 		 * Stop only if limit reached and CPU has something to do.
 		 */
-		if (count >= bl && in_serving_softirq() &&
-		    (need_resched() || !is_idle_task(current)))
-			break;
+		if (in_serving_softirq()) {
+			if (count >= bl && (need_resched() || !is_idle_task(current)))
+				break;
+		} else {
+			local_bh_enable();
+			lockdep_assert_irqs_enabled();
+			cond_resched_tasks_rcu_qs();
+			lockdep_assert_irqs_enabled();
+			local_bh_disable();
+		}
+
+		/*
+		 * Make sure we don't spend too much time here and deprive other
+		 * softirq vectors of CPU cycles.
+		 */
 		if (unlikely(tlimit)) {
 			/* only call local_clock() every 32 callbacks */
 			if (likely((count & 31) || local_clock() < tlimit))
@@ -2545,13 +2557,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 			/* Exceeded the time limit, so leave. */
 			break;
 		}
-		if (!in_serving_softirq()) {
-			local_bh_enable();
-			lockdep_assert_irqs_enabled();
-			cond_resched_tasks_rcu_qs();
-			lockdep_assert_irqs_enabled();
-			local_bh_disable();
-		}
 	}
 
 	rcu_nocb_lock_irqsave(rdp, flags);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/11] rcu: Apply callbacks processing time limit only on softirq
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 09/11] rcu: Fix callbacks processing time limit retaining cond_resched() Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-11 14:51 ` [PATCH 11/11] rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread Frederic Weisbecker
  2021-10-13  0:32 ` [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Paul E. McKenney
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

The time limit only makes sense when callbacks are serviced in softirq
mode because:

_ In case we need to get back to the scheduler,
  cond_resched_tasks_rcu_qs() is called after each callback.

_ In case some other softirq vectors need the CPU, the call to
  local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
  them via a call to do_softirq().

Therefore, make sure the time limit only applies to softirq mode.
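
As an illustrative sketch, here is the non-softirq branch introduced by
the previous patch, annotated with the two points above (the comments
are illustrative):

    } else {
            local_bh_enable();            /* runs pending softirq vectors via do_softirq() */
            lockdep_assert_irqs_enabled();
            cond_resched_tasks_rcu_qs();  /* gets back to the scheduler if needed */
            lockdep_assert_irqs_enabled();
            local_bh_disable();
    }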

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7a655d93a28a..0ca4f4ec724d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2498,7 +2498,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	div = READ_ONCE(rcu_divisor);
 	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 	bl = max(rdp->blimit, pending >> div);
-	if (unlikely(bl > 100)) {
+	if (in_serving_softirq() && unlikely(bl > 100)) {
 		long rrn = READ_ONCE(rcu_resched_ns);
 
 		rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
@@ -2538,6 +2538,17 @@ static void rcu_do_batch(struct rcu_data *rdp)
 		if (in_serving_softirq()) {
 			if (count >= bl && (need_resched() || !is_idle_task(current)))
 				break;
+			/*
+			 * Make sure we don't spend too much time here and deprive other
+			 * softirq vectors of CPU cycles.
+			 */
+			if (unlikely(tlimit)) {
+				/* only call local_clock() every 32 callbacks */
+				if (likely((count & 31) || local_clock() < tlimit))
+					continue;
+				/* Exceeded the time limit, so leave. */
+				break;
+			}
 		} else {
 			local_bh_enable();
 			lockdep_assert_irqs_enabled();
@@ -2545,18 +2556,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 			lockdep_assert_irqs_enabled();
 			local_bh_disable();
 		}
-
-		/*
-		 * Make sure we don't spend too much time here and deprive other
-		 * softirq vectors of CPU cycles.
-		 */
-		if (unlikely(tlimit)) {
-			/* only call local_clock() every 32 callbacks */
-			if (likely((count & 31) || local_clock() < tlimit))
-				continue;
-			/* Exceeded the time limit, so leave. */
-			break;
-		}
 	}
 
 	rcu_nocb_lock_irqsave(rdp, flags);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/11] rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 10/11] rcu: Apply callbacks processing time limit only on softirq Frederic Weisbecker
@ 2021-10-11 14:51 ` Frederic Weisbecker
  2021-10-13  0:32 ` [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Paul E. McKenney
  11 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-11 14:51 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Valentin Schneider, Boqun Feng, Neeraj Upadhyay,
	Josh Triplett, Joel Fernandes, rcu

rcu_core() tries to ensure that its self-invocation in case of callbacks
overload only happens in softirq/rcuc mode. Indeed it doesn't make sense
to trigger the local RCU core from the nocb_cb kthread since it can
execute on a CPU different from the target rdp's. Also in case of
overload, the nocb_cb kthread simply iterates another loop of callbacks
processing.

However the "offloaded" check that aims at preventing misplaced
rcu_core() invocations is wrong. First of all that state is volatile
and second: softirq/rcuc can execute while the target rdp is offloaded.
As a result rcu_core() can be invoked on the wrong CPU while in the
process of (de-)offloading.
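
A rough sketch of such a misplaced invocation (the CPU numbers are
hypothetical):

    /*
     * nocb_cb kthread serving CPU 2's rdp, currently running on CPU 5:
     *
     * rcu_do_batch(rdp)               // rdp belongs to CPU 2, callbacks remain
     *     if (!offloaded && ...)      // stale snapshot may read "not offloaded"
     *                                 // in the middle of (de-)offloading ...
     *         invoke_rcu_core();      // ... raising RCU core on CPU 5, which is
     *                                 // unrelated to CPU 2's rdp
     */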

Fix that by moving the rcu_core() self-invocation to rcu_core() itself,
irrespective of the rdp offloaded state.

Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rcu/tree.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0ca4f4ec724d..8270e58cd0f3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2470,7 +2470,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	int div;
 	bool __maybe_unused empty;
 	unsigned long flags;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count = 0;
@@ -2592,9 +2591,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 
-	/* Re-invoke RCU core processing if there are callbacks remaining. */
-	if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist))
-		invoke_rcu_core();
 	tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
 }
 
@@ -2781,8 +2777,12 @@ static __latent_entropy void rcu_core(void)
 
 	/* If there are callbacks ready, invoke them. */
 	if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist) &&
-	    likely(READ_ONCE(rcu_scheduler_fully_active)))
+	    likely(READ_ONCE(rcu_scheduler_fully_active))) {
 		rcu_do_batch(rdp);
+		/* Re-invoke RCU core processing if there are callbacks remaining. */
+		if (rcu_segcblist_ready_cbs(&rdp->cblist))
+			invoke_rcu_core();
+	}
 
 	/* Do any needed deferred wakeups of rcuo kthreads. */
 	do_nocb_deferred_wakeup(rdp);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
  2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2021-10-11 14:51 ` [PATCH 11/11] rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread Frederic Weisbecker
@ 2021-10-13  0:32 ` Paul E. McKenney
  2021-10-13  3:28   ` Paul E. McKenney
  11 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2021-10-13  0:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Sebastian Andrzej Siewior, Valentin Schneider,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> No code change in this v2, only changelogs:
> 
> * Add tags from Valentin and Sebastian
> 
> * Remove last reference to SEGCBLIST_SOFTIRQ_ONLY (thanks Valentin)
> 
> * Rewrite changelog for "rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check"
>   after off-list debates with Paul.
> 
> * Remove the scenario with softirq interrupting rcuc on
>   "rcu/nocb: Limit number of softirq callbacks only on softirq" as it's
>   probably not possible (thanks Valentin).
> 
> * Remove the scenario with task spent scheduling out accounted on tlimit
>   as it's not possible (thanks Valentin)
>   (see "rcu: Apply callbacks processing time limit only on softirq")
> 
> * Fixed changelog of
>   "rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread"
>   (thanks Sebastian).
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	rcu/rt-v2
> 
> HEAD: 2c9349986d5f70a555195139665841cd98e9aba4
> 
> Thanks,
> 	Frederic

Nice!

I queued these for further review and testing.  I reworked the commit log
of 6/11 to give my idea of the reason, though I freely admit that this
reason is not as compelling as it no doubt seemed when I wrote that code.

							Thanx, Paul

> ---
> 
> Frederic Weisbecker (10):
>       rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
>       rcu/nocb: Prepare state machine for a new step
>       rcu/nocb: Invoke rcu_core() at the start of deoffloading
>       rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
>       rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
>       rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
>       rcu/nocb: Limit number of softirq callbacks only on softirq
>       rcu: Fix callbacks processing time limit retaining cond_resched()
>       rcu: Apply callbacks processing time limit only on softirq
>       rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
> 
> Thomas Gleixner (1):
>       rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
> 
> 
>  include/linux/rcu_segcblist.h | 51 ++++++++++++++++++-------
>  kernel/rcu/rcu_segcblist.c    | 10 ++---
>  kernel/rcu/rcu_segcblist.h    | 12 +++---
>  kernel/rcu/tree.c             | 87 +++++++++++++++++++++++++++++--------------
>  kernel/rcu/tree.h             | 16 +++++---
>  kernel/rcu/tree_nocb.h        | 29 ++++++++++++---
>  6 files changed, 141 insertions(+), 64 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
  2021-10-13  0:32 ` [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Paul E. McKenney
@ 2021-10-13  3:28   ` Paul E. McKenney
  2021-10-13 10:01     ` Frederic Weisbecker
  2021-10-13 11:43     ` Frederic Weisbecker
  0 siblings, 2 replies; 25+ messages in thread
From: Paul E. McKenney @ 2021-10-13  3:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Sebastian Andrzej Siewior, Valentin Schneider,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > [...]
> 
> Nice!
> 
> I queued these for further review and testing.  I reworked the commit log
> of 6/11 to give my idea of the reason, though I freely admit that this
> reason is not as compelling as it no doubt seemed when I wrote that code.

But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
breakage.  My best (but not so good) guess is a five-hour MTBF on a
dual-socket system.

I started an automated "git bisect" with each step running 100 hours
of TREE04, but I would be surprised if anything useful comes of it.
Pleased, mind you, but surprised.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
  2021-10-13  3:28   ` Paul E. McKenney
@ 2021-10-13 10:01     ` Frederic Weisbecker
  2021-10-13 11:43     ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-13 10:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Sebastian Andrzej Siewior, Valentin Schneider,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Tue, Oct 12, 2021 at 08:28:32PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > > [...]
> > 
> > Nice!
> > 
> > I queued these for further review and testing.  I reworked the commit log
> > of 6/11 to give my idea of the reason, though I freely admit that this
> > reason is not as compelling as it no doubt seemed when I wrote that code.
> 
> But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
> WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
> breakage.  My best (but not so good) guess is a five-hour MTBF on a
> dual-socket system.
> 
> I started an automated "git bisect" with each step running 100 hours
> of TREE04, but I would be surprised if anything useful comes of it.
> Pleased, mind you, but surprised.

Oops, trying those scenarios on my side as well.

Thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
  2021-10-13  3:28   ` Paul E. McKenney
  2021-10-13 10:01     ` Frederic Weisbecker
@ 2021-10-13 11:43     ` Frederic Weisbecker
  2021-10-13 16:27       ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-13 11:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Sebastian Andrzej Siewior, Valentin Schneider,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Tue, Oct 12, 2021 at 08:28:32PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > > [...]
> > 
> > Nice!
> > 
> > I queued these for further review and testing.  I reworked the commit log
> > of 6/11 to give my idea of the reason, though I freely admit that this
> > reason is not as compelling as it no doubt seemed when I wrote that code.
> 
> But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
> WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
> breakage.  My best (but not so good) guess is a five-hour MTBF on a
> dual-socket system.
> 
> I started an automated "git bisect" with each step running 100 hours
> of TREE04, but I would be surprised if anything useful comes of it.
> Pleased, mind you, but surprised.

Ok I can reproduce.

I'm launching a bisect from my side as well.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-10-11 14:51 ` [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading Frederic Weisbecker
@ 2021-10-13 16:07   ` Boqun Feng
  2021-10-14 11:07     ` Frederic Weisbecker
  2021-10-14 11:42     ` Valentin Schneider
  0 siblings, 2 replies; 25+ messages in thread
From: Boqun Feng @ 2021-10-13 16:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Neeraj Upadhyay, Josh Triplett, Joel Fernandes,
	rcu

Hi Frederic,

On Mon, Oct 11, 2021 at 04:51:32PM +0200, Frederic Weisbecker wrote:
> On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> some work, such as callbacks acceleration and invocation, may be left
> unattended due to the volatile checks on the offloaded state.
> 
> In the worst case this work is postponed until the next rcu_pending()
> check that can take a jiffy to reach, which can be a problem in case
> of callbacks flooding.
> 
> Solve that with invoking rcu_core() early in the de-offloading process.
> This way any work dismissed by an ongoing rcu_core() call fooled by
> a preempting deoffloading process will be caught up by a nearby future
> recall to rcu_core(), this time fully aware of the de-offloading state.
> 
> Tested-by: Valentin Schneider <valentin.schneider@arm.com>
> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/rcu_segcblist.h | 14 ++++++++++++++
>  kernel/rcu/rcu_segcblist.c    |  6 ++----
>  kernel/rcu/tree.c             | 17 +++++++++++++++++
>  kernel/rcu/tree_nocb.h        |  9 +++++++++
>  4 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index 812961b1d064..659d13a7ddaa 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -136,6 +136,20 @@ struct rcu_cblist {
>   *  |--------------------------------------------------------------------------|
>   *  |                           SEGCBLIST_RCU_CORE   |                         |
>   *  |                           SEGCBLIST_LOCKING    |                         |
> + *  |                           SEGCBLIST_OFFLOADED  |                         |
> + *  |                           SEGCBLIST_KTHREAD_CB |                         |
> + *  |                           SEGCBLIST_KTHREAD_GP                           |
> + *  |                                                                          |
> + *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()    |
> + *  |   handles callbacks concurrently. Bypass enqueue is enabled.             |
> + *  |   Invoke RCU core so we make sure not to preempt it in the middle with   |
> + *  |   leaving some urgent work unattended within a jiffy.                    |
> + *  ----------------------------------------------------------------------------
> + *                                      |
> + *                                      v
> + *  |--------------------------------------------------------------------------|
> + *  |                           SEGCBLIST_RCU_CORE   |                         |
> + *  |                           SEGCBLIST_LOCKING    |                         |
>   *  |                           SEGCBLIST_KTHREAD_CB |                         |
>   *  |                           SEGCBLIST_KTHREAD_GP                           |
>   *  |                                                                          |
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index c07aab6e39ef..81145c3ece25 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -265,12 +265,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>   */
>  void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
>  {
> -	if (offload) {
> +	if (offload)
>  		rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
> -	} else {
> -		rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
> +	else
>  		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
> -	}
>  }
>  
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e38028d48648..b236271b9022 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
>  	unsigned long flags;
>  	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
>  	struct rcu_node *rnp = rdp->mynode;
> +	/*
> +	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
> +	 * Therefore this function can race with concurrent NOCB (de-)offloading
> +	 * on this CPU and the below condition must be considered volatile.
> +	 * However if we race with:
> +	 *
> +	 * _ Offloading:   In the worst case we accelerate or process callbacks
> +	 *                 concurrently with NOCB kthreads. We are guaranteed to
> +	 *                 call rcu_nocb_lock() if that happens.

If offloading races with rcu_core(), can the following happen?

	<offload work>				
	rcu_nocb_rdp_offload():
	    					rcu_core():
						  ...
						  rcu_nocb_lock_irqsave(); // not a lock
	  raw_spin_lock_irqsave(->nocb_lock);
	    rdp_offload_toggle():
	      <LOCKING | OFFLOADED set>
	      					  if (!rcu_segcblist_restempty(...))
						    rcu_accelerate_cbs_unlocked(...);
						  rcu_nocb_unlock_irqrestore();
						  // ^ a real unlock,
						  // and will preempt_enable()
	    // offload continue with ->nocb_lock not held

If this can happen, it means an unpaired preempt_enable() and an
incorrect unlock. Thoughts? Maybe I'm missing something here?

Regards,
Boqun

> +	 *
> +	 * _ Deoffloading: In the worst case we miss callbacks acceleration or
> +	 *                 processing. This is fine because the early stage
> +	 *                 of deoffloading invokes rcu_core() after setting
> +	 *                 SEGCBLIST_RCU_CORE. So we guarantee that we'll process
> +	 *                 what could have been dismissed without the need to wait
> +	 *                 for the next rcu_pending() check in the next jiffy.
> +	 */
>  	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
>  
>  	if (cpu_is_offline(smp_processor_id()))
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 71a28f50b40f..3b470113ae38 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	 * will refuse to put anything into the bypass.
>  	 */
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	/*
> +	 * Start with invoking rcu_core() early. This way if the current thread
> +	 * happens to preempt an ongoing call to rcu_core() in the middle,
> +	 * leaving some work dismissed because rcu_core() still thinks the rdp is
> +	 * completely offloaded, we are guaranteed a nearby future instance of
> +	 * rcu_core() to catch up.
> +	 */
> +	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> +	invoke_rcu_core();
>  	ret = rdp_offload_toggle(rdp, false, flags);
>  	swait_event_exclusive(rdp->nocb_state_wq,
>  			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
  2021-10-13 11:43     ` Frederic Weisbecker
@ 2021-10-13 16:27       ` Paul E. McKenney
  2021-10-14 10:43         ` Frederic Weisbecker
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2021-10-13 16:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Sebastian Andrzej Siewior, Valentin Schneider,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Wed, Oct 13, 2021 at 01:43:35PM +0200, Frederic Weisbecker wrote:
> On Tue, Oct 12, 2021 at 08:28:32PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > > > [...]
> > > 
> > > Nice!
> > > 
> > > I queued these for further review and testing.  I reworked the commit log
> > > of 6/11 to give my idea of the reason, though I freely admit that this
> > > reason is not as compelling as it no doubt seemed when I wrote that code.
> > 
> > But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
> > WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
> > breakage.  My best (but not so good) guess is a five-hour MTBF on a
> > dual-socket system.
> > 
> > I started an automated "git bisect" with each step running 100 hours
> > of TREE04, but I would be surprised if anything useful comes of it.
> > Pleased, mind you, but surprised.
> 
> Ok I can reproduce.
> 
> I'm launching a bisect from my side as well.

Mine converged on 2a4200944750 ("rcu/nocb: Prepare state machine for
a new step").  The surprise is that I was running "git bisect run"
on a script wrappering kvm-remote.sh, which means that it managed to
repeatedly request 10 systems, download to them, run the test, collect
the results, and finally return the systems.

Huh.  I should probably refactor my local script to avoid the pointless
repeated request/return work.

But which commit did your bisect find?  ;-)

Anyway, I am keeping the first commit 4b246eab4750 ("rcu/nocb: Make
local rcu_nocb_lock_irqsave() safe against concurrent deoffloading"),
but dropping the others for the time being.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2
  2021-10-13 16:27       ` Paul E. McKenney
@ 2021-10-14 10:43         ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-14 10:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Sebastian Andrzej Siewior, Valentin Schneider,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Wed, Oct 13, 2021 at 09:27:33AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 13, 2021 at 01:43:35PM +0200, Frederic Weisbecker wrote:
> > On Tue, Oct 12, 2021 at 08:28:32PM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > > > > [...]
> > > > 
> > > > Nice!
> > > > 
> > > > I queued these for further review and testing.  I reworked the commit log
> > > > of 6/11 to give my idea of the reason, though I freely admit that this
> > > > reason is not as compelling as it no doubt seemed when I wrote that code.
> > > 
> > > But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
> > > WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
> > > breakage.  My best (but not so good) guess is a five-hour MTBF on a
> > > dual-socket system.
> > > 
> > > I started an automated "git bisect" with each step running 100 hours
> > > of TREE04, but I would be surprised if anything useful comes of it.
> > > Pleased, mind you, but surprised.
> > 
> > Ok I can reproduce.
> > 
> > I'm launching a bisect from my side as well.
> 
> Mine converged on 2a4200944750 ("rcu/nocb: Prepare state machine for
> a new step").  The surprise is that I was running "git bisect run"
> on a script wrappering kvm-remote.sh, which means that it managed to
> repeatedly request 10 systems, download to them, run the test, collect
> the results, and finally return the systems.
> 
> Huh.  I should probably refactor my local script to avoid the pointless
> repeated request/return work.
> 
> But which commit did your bisect find?  ;-)

So my bisection got confused with two different issues: one with an
oom and one with rcu_barrier() being unhappy.

I'm re-running it but I'll investigate both.

> 
> Anyway, I am keeping the first commit 4b246eab4750 ("rcu/nocb: Make
> local rcu_nocb_lock_irqsave() safe against concurrent deoffloading"),
> but dropping the others for the time being.

Fair enough!

Thanks.



> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-10-13 16:07   ` Boqun Feng
@ 2021-10-14 11:07     ` Frederic Weisbecker
  2021-10-14 11:42     ` Valentin Schneider
  1 sibling, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-14 11:07 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E . McKenney, LKML, Sebastian Andrzej Siewior,
	Valentin Schneider, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Neeraj Upadhyay, Josh Triplett, Joel Fernandes,
	rcu

Hi Boqun,

On Thu, Oct 14, 2021 at 12:07:58AM +0800, Boqun Feng wrote:
> If offloading races with rcu_core(), can the following happen?
> 
> 	<offload work>				
> 	rcu_nocb_rdp_offload():
> 	    					rcu_core():
> 						  ...
> 						  rcu_nocb_lock_irqsave(); // no a lock
> 	  raw_spin_lock_irqsave(->nocb_lock);
> 	    rdp_offload_toggle():
> 	      <LOCKING | OFFLOADED set>
> 	      					  if (!rcu_segcblist_restempty(...))
> 						    rcu_accelerate_cbs_unlocked(...);
> 						  rcu_nocb_unlock_irqrestore();
> 						  // ^ a real unlock,
> 						  // and will preempt_enable()
> 	    // offload continue with ->nocb_lock not held
> 
> If this can happen, it means an unpaired preempt_enable() and an
> incorrect unlock. Thoughts? Maybe I'm missing something here?

Since we are unconditionally disabling IRQs on rcu_nocb_lock_irqsave(), we can't
be preempted by rcu_nocb_rdp_offload() until rcu_nocb_unlock_irqrestore() has
completed. And both have to run on the rdp target CPU. So this shouldn't happen.
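
To put the above into a picture, here is a rough sketch of the scenario
(illustration only: the macro bodies are paraphrased rather than copied
from tree.h, and the condition under which nocb_lock is actually taken
is left abstract):

	rcu_core()                              rcu_nocb_rdp_offload() work
	----------                              ---------------------------
	rcu_nocb_lock_irqsave(rdp, flags);
	  local_irq_save(flags);                // IRQs off, unconditionally
	  if (<nocb locking enabled for rdp>)
	          raw_spin_lock(&rdp->nocb_lock);
	...                                     // the offload work can't run
	                                        // here: it has to run on this
	                                        // very CPU and IRQs are off,
	                                        // so no preemption
	rcu_nocb_unlock_irqrestore(rdp, flags);
	  ...
	  local_irq_restore(flags);             // only from this point on can
	                                        // the offload work preempt us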

Thanks.



> 
> Regards,
> Boqun
> 
> > +	 *
> > +	 * _ Deoffloading: In the worst case we miss callbacks acceleration or
> > +	 *                 processing. This is fine because the early stage
> > +	 *                 of deoffloading invokes rcu_core() after setting
> > +	 *                 SEGCBLIST_RCU_CORE. So we guarantee that we'll process
> > +	 *                 what could have been dismissed without the need to wait
> > +	 *                 for the next rcu_pending() check in the next jiffy.
> > +	 */
> >  	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
> >  
> >  	if (cpu_is_offline(smp_processor_id()))
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 71a28f50b40f..3b470113ae38 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> >  	 * will refuse to put anything into the bypass.
> >  	 */
> >  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> > +	/*
> > +	 * Start with invoking rcu_core() early. This way if the current thread
> > +	 * happens to preempt an ongoing call to rcu_core() in the middle,
> > +	 * leaving some work dismissed because rcu_core() still thinks the rdp is
> > +	 * completely offloaded, we are guaranteed a nearby future instance of
> > +	 * rcu_core() to catch up.
> > +	 */
> > +	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> > +	invoke_rcu_core();
> >  	ret = rdp_offload_toggle(rdp, false, flags);
> >  	swait_event_exclusive(rdp->nocb_state_wq,
> >  			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-10-13 16:07   ` Boqun Feng
  2021-10-14 11:07     ` Frederic Weisbecker
@ 2021-10-14 11:42     ` Valentin Schneider
  2021-10-14 13:57       ` Boqun Feng
  1 sibling, 1 reply; 25+ messages in thread
From: Valentin Schneider @ 2021-10-14 11:42 UTC (permalink / raw)
  To: Boqun Feng, Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On 14/10/21 00:07, Boqun Feng wrote:
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index e38028d48648..b236271b9022 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
>>      unsigned long flags;
>>      struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
>>      struct rcu_node *rnp = rdp->mynode;
>> +	/*
>> +	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
>> +	 * Therefore this function can race with concurrent NOCB (de-)offloading
>> +	 * on this CPU and the below condition must be considered volatile.
>> +	 * However if we race with:
>> +	 *
>> +	 * _ Offloading:   In the worst case we accelerate or process callbacks
>> +	 *                 concurrently with NOCB kthreads. We are guaranteed to
>> +	 *                 call rcu_nocb_lock() if that happens.
>
> If offloading races with rcu_core(), can the following happen?
>
>       <offload work>
>       rcu_nocb_rdp_offload():
>                                               rcu_core():
>                                                 ...
>                                                 rcu_nocb_lock_irqsave(); // no a lock
>         raw_spin_lock_irqsave(->nocb_lock);
>           rdp_offload_toggle():
>             <LOCKING | OFFLOADED set>
>                                                 if (!rcu_segcblist_restempty(...))
>                                                   rcu_accelerate_cbs_unlocked(...);
>                                                 rcu_nocb_unlock_irqrestore();
>                                                 // ^ a real unlock,
>                                                 // and will preempt_enable()
>           // offload continue with ->nocb_lock not held
>
> If this can happen, it means an unpaired preempt_enable() and an
> incorrect unlock. Thoughts? Maybe I'm missing something here?
>

As Frederic pointed out in an earlier thread [1], this can't happen because
we still have IRQs disabled, and the offloading process has to be processed
on the CPU being offloaded. IOW, in the above scenario, rcu_core() can't be
preempted by the rcu_nocb_rdp_offload() work until it re-enables IRQs at
rcu_nocb_unlock_irqrestore().

(hopefully Paul or Frederic will correct me if I've just spewed garbage)

[1]: https://lore.kernel.org/lkml/20210930105340.GA232241@lothringen/

> Regards,
> Boqun
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-10-14 11:42     ` Valentin Schneider
@ 2021-10-14 13:57       ` Boqun Feng
  0 siblings, 0 replies; 25+ messages in thread
From: Boqun Feng @ 2021-10-14 13:57 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Frederic Weisbecker, Paul E . McKenney, LKML,
	Sebastian Andrzej Siewior, Peter Zijlstra, Uladzislau Rezki,
	Thomas Gleixner, Neeraj Upadhyay, Josh Triplett, Joel Fernandes,
	rcu

On Thu, Oct 14, 2021 at 12:42:40PM +0100, Valentin Schneider wrote:
> On 14/10/21 00:07, Boqun Feng wrote:
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index e38028d48648..b236271b9022 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
> >>      unsigned long flags;
> >>      struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
> >>      struct rcu_node *rnp = rdp->mynode;
> >> +	/*
> >> +	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
> >> +	 * Therefore this function can race with concurrent NOCB (de-)offloading
> >> +	 * on this CPU and the below condition must be considered volatile.
> >> +	 * However if we race with:
> >> +	 *
> >> +	 * _ Offloading:   In the worst case we accelerate or process callbacks
> >> +	 *                 concurrently with NOCB kthreads. We are guaranteed to
> >> +	 *                 call rcu_nocb_lock() if that happens.
> >
> > If offloading races with rcu_core(), can the following happen?
> >
> >       <offload work>
> >       rcu_nocb_rdp_offload():
> >                                               rcu_core():
> >                                                 ...
> >                                                 rcu_nocb_lock_irqsave(); // no a lock
> >         raw_spin_lock_irqsave(->nocb_lock);
> >           rdp_offload_toggle():
> >             <LOCKING | OFFLOADED set>
> >                                                 if (!rcu_segcblist_restempty(...))
> >                                                   rcu_accelerate_cbs_unlocked(...);
> >                                                 rcu_nocb_unlock_irqrestore();
> >                                                 // ^ a real unlock,
> >                                                 // and will preempt_enable()
> >           // offload continue with ->nocb_lock not held
> >
> > If this can happen, it means an unpaired preempt_enable() and an
> > incorrect unlock. Thoughts? Maybe I'm missing something here?
> >
> 
> As Frederic pointed out in an earlier thread [1], this can't happen because
> we still have IRQs disabled, and the offloading process has to be processed
> on the CPU being offloaded. IOW, in the above scenario, rcu_core() can't be
> preempted by the rcu_nocb_rdp_offload() work until it re-enables IRQs at
> rcu_nocb_unlock_irqrestore().
> 
> (hopefully Paul or Frederic will correct me if I've just spewed garbage)
> 
> [1]: https://lore.kernel.org/lkml/20210930105340.GA232241@lothringen/
> 

Thanks! I think Frederic and you are right: this cannot happen. Thank
you both for looking into this ;-)

Regards,
Boqun

> > Regards,
> > Boqun
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-10-01 17:50   ` Valentin Schneider
@ 2021-10-04 12:41     ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-10-04 12:41 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Paul E . McKenney, LKML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On Fri, Oct 01, 2021 at 06:50:04PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> > some work, such as callbacks acceleration and invocation, may be left
> > unattended due to the volatile checks on the offloaded state.
> >
> > In the worst case this work is postponed until the next rcu_pending()
> > check that can take a jiffy to reach, which can be a problem in case
> > of callbacks flooding.
> >
> > Solve that with invoking rcu_core() early in the de-offloading process.
> > This way any work dismissed by an ongoing rcu_core() call fooled by
> > a preempting deoffloading process will be caught up by a nearby future
> > recall to rcu_core(), this time fully aware of the de-offloading state.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Uladzislau Rezki <urezki@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> 
> One comment/question below.
> 
> > @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> >        * will refuse to put anything into the bypass.
> >        */
> >       WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> > +	/*
> > +	 * Start with invoking rcu_core() early. This way if the current thread
> > +	 * happens to preempt an ongoing call to rcu_core() in the middle,
> > +	 * leaving some work dismissed because rcu_core() still thinks the rdp is
> > +	 * completely offloaded, we are guaranteed a nearby future instance of
> > +	 * rcu_core() to catch up.
> > +	 */
> > +	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> > +	invoke_rcu_core();
> 
> I think your approach is a bit neater, but would there have been any issue
> with keeping the setting of SEGCBLIST_RCU_CORE within
> rcu_segcblist_offload() and bundling it with an invoke_rcu_core()?

Probably not in practice.

But in theory, it may be more comfortable to read the following in order:

1) Set SEGCBLIST_RCU_CORE so subsequent invocations of rcu_core() handle
callbacks

2) Invoke rcu_core()

3) Only once we have achieved the above can we clear SEGCBLIST_OFFLOADED,
which will stop the nocb kthreads.

If we did 3) first and only then 1) and 2), there would be a risk that callbacks
get completely ignored in the middle.

That said you have a point in that we could do:

1) Set SEGCBLIST_RCU_CORE and clear SEGCBLIST_OFFLOADED at the _very_ same time
(arrange that with a WRITE_ONCE() I guess).

2) Invoke rcu_core()

But well...arranging for rcu_core() to take over before we even consider
starting the de-offloading process provides some unexplainable relief to the
soul. Some code design sometimes relies more on faith than logic :)
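
For the record, a minimal sketch of the chosen order (illustration only,
paraphrasing the tree_nocb.h hunk quoted earlier in this message, with
error handling and the wait for the kthreads left out):

	/* rcu_core() is engaged before nocb handling is stopped. */
	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);	/* 1) */
	invoke_rcu_core();					/* 2) */
	rdp_offload_toggle(rdp, false, flags);			/* 3) clears SEGCBLIST_OFFLOADED */

	/*
	 * Had we done 3) before 1) and 2), there would be a window where
	 * neither the nocb kthreads nor rcu_core() consider themselves
	 * responsible for the callbacks.
	 */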

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-09-29 22:10 ` [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading Frederic Weisbecker
@ 2021-10-01 17:50   ` Valentin Schneider
  2021-10-04 12:41     ` Frederic Weisbecker
  0 siblings, 1 reply; 25+ messages in thread
From: Valentin Schneider @ 2021-10-01 17:50 UTC (permalink / raw)
  To: Frederic Weisbecker, Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Peter Zijlstra, Uladzislau Rezki, Thomas Gleixner, Boqun Feng,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes, rcu

On 30/09/21 00:10, Frederic Weisbecker wrote:
> On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> some work, such as callbacks acceleration and invocation, may be left
> unattended due to the volatile checks on the offloaded state.
>
> In the worst case this work is postponed until the next rcu_pending()
> check that can take a jiffy to reach, which can be a problem in case
> of callbacks flooding.
>
> Solve that with invoking rcu_core() early in the de-offloading process.
> This way any work dismissed by an ongoing rcu_core() call fooled by
> a preempting deoffloading process will be caught up by a nearby future
> recall to rcu_core(), this time fully aware of the de-offloading state.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>

One comment/question below.

> @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>        * will refuse to put anything into the bypass.
>        */
>       WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	/*
> +	 * Start with invoking rcu_core() early. This way if the current thread
> +	 * happens to preempt an ongoing call to rcu_core() in the middle,
> +	 * leaving some work dismissed because rcu_core() still thinks the rdp is
> +	 * completely offloaded, we are guaranteed a nearby future instance of
> +	 * rcu_core() to catch up.
> +	 */
> +	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> +	invoke_rcu_core();

I think your approach is a bit neater, but would there have been any issue
with keeping the setting of SEGCBLIST_RCU_CORE within
rcu_segcblist_offload() and bundling it with an invoke_rcu_core()?

>       ret = rdp_offload_toggle(rdp, false, flags);
>       swait_event_exclusive(rdp->nocb_state_wq,
>                             !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
  2021-09-29 22:10 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes Frederic Weisbecker
@ 2021-09-29 22:10 ` Frederic Weisbecker
  2021-10-01 17:50   ` Valentin Schneider
  0 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2021-09-29 22:10 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Peter Zijlstra, Uladzislau Rezki, Valentin Schneider,
	Thomas Gleixner, Boqun Feng, Neeraj Upadhyay, Josh Triplett,
	Joel Fernandes, rcu

On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
some work, such as callbacks acceleration and invocation, may be left
unattended due to the volatile checks on the offloaded state.

In the worst case this work is postponed until the next rcu_pending()
check that can take a jiffy to reach, which can be a problem in case
of callbacks flooding.

Solve that by invoking rcu_core() early in the de-offloading process.
This way any work dismissed by an ongoing rcu_core() call fooled by
a preempting deoffloading process will be caught up by a nearby future
call to rcu_core(), this time fully aware of the de-offloading state.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/rcu_segcblist.h | 14 ++++++++++++++
 kernel/rcu/rcu_segcblist.c    |  6 ++----
 kernel/rcu/tree.c             | 17 +++++++++++++++++
 kernel/rcu/tree_nocb.h        |  9 +++++++++
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 812961b1d064..659d13a7ddaa 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -136,6 +136,20 @@ struct rcu_cblist {
  *  |--------------------------------------------------------------------------|
  *  |                           SEGCBLIST_RCU_CORE   |                         |
  *  |                           SEGCBLIST_LOCKING    |                         |
+ *  |                           SEGCBLIST_OFFLOADED  |                         |
+ *  |                           SEGCBLIST_KTHREAD_CB |                         |
+ *  |                           SEGCBLIST_KTHREAD_GP                           |
+ *  |                                                                          |
+ *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()    |
+ *  |   handles callbacks concurrently. Bypass enqueue is enabled.             |
+ *  |   Invoke RCU core so we make sure not to preempt it in the middle with   |
+ *  |   leaving some urgent work unattended within a jiffy.                    |
+ *  ----------------------------------------------------------------------------
+ *                                      |
+ *                                      v
+ *  |--------------------------------------------------------------------------|
+ *  |                           SEGCBLIST_RCU_CORE   |                         |
+ *  |                           SEGCBLIST_LOCKING    |                         |
  *  |                           SEGCBLIST_KTHREAD_CB |                         |
  *  |                           SEGCBLIST_KTHREAD_GP                           |
  *  |                                                                          |
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index c07aab6e39ef..81145c3ece25 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -265,12 +265,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
  */
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
 {
-	if (offload) {
+	if (offload)
 		rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
-	} else {
-		rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
+	else
 		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
-	}
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ef417769bab2..1512efc52816 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
+	/*
+	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
+	 * Therefore this function can race with concurrent NOCB (de-)offloading
+	 * on this CPU and the below condition must be considered volatile.
+	 * However if we race with:
+	 *
+	 * _ Offloading:   In the worst case we accelerate or process callbacks
+	 *                 concurrently with NOCB kthreads. We are guaranteed to
+	 *                 call rcu_nocb_lock() if that happens.
+	 *
+	 * _ Deoffloading: In the worst case we miss callbacks acceleration or
+	 *                 processing. This is fine because the early stage
+	 *                 of deoffloading invokes rcu_core() after setting
+	 *                 SEGCBLIST_RCU_CORE. So we guarantee that we'll process
+	 *                 what could have been dismissed without the need to wait
+	 *                 for the next rcu_pending() check in the next jiffy.
+	 */
 	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
 
 	if (cpu_is_offline(smp_processor_id()))
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 71a28f50b40f..3b470113ae38 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 * will refuse to put anything into the bypass.
 	 */
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	/*
+	 * Start with invoking rcu_core() early. This way if the current thread
+	 * happens to preempt an ongoing call to rcu_core() in the middle,
+	 * leaving some work dismissed because rcu_core() still thinks the rdp is
+	 * completely offloaded, we are guaranteed a nearby future instance of
+	 * rcu_core() to catch up.
+	 */
+	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
+	invoke_rcu_core();
 	ret = rdp_offload_toggle(rdp, false, flags);
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-10-14 13:57 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-11 14:51 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 01/11] rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 02/11] rcu/nocb: Prepare state machine for a new step Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading Frederic Weisbecker
2021-10-13 16:07   ` Boqun Feng
2021-10-14 11:07     ` Frederic Weisbecker
2021-10-14 11:42     ` Valentin Schneider
2021-10-14 13:57       ` Boqun Feng
2021-10-11 14:51 ` [PATCH 04/11] rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 05/11] rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 06/11] rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 07/11] rcu/nocb: Use appropriate rcu_nocb_lock_irqsave() Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 08/11] rcu/nocb: Limit number of softirq callbacks only on softirq Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 09/11] rcu: Fix callbacks processing time limit retaining cond_resched() Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 10/11] rcu: Apply callbacks processing time limit only on softirq Frederic Weisbecker
2021-10-11 14:51 ` [PATCH 11/11] rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread Frederic Weisbecker
2021-10-13  0:32 ` [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes v2 Paul E. McKenney
2021-10-13  3:28   ` Paul E. McKenney
2021-10-13 10:01     ` Frederic Weisbecker
2021-10-13 11:43     ` Frederic Weisbecker
2021-10-13 16:27       ` Paul E. McKenney
2021-10-14 10:43         ` Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2021-09-29 22:10 [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB + a few other fixes Frederic Weisbecker
2021-09-29 22:10 ` [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading Frederic Weisbecker
2021-10-01 17:50   ` Valentin Schneider
2021-10-04 12:41     ` Frederic Weisbecker
