* [PATCH 00/10] rcu cleanups
@ 2023-09-08 20:35 Frederic Weisbecker
  2023-09-08 20:35 ` [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

Hi,

Here is a bunch of accumulated cleanups. Many of them are trivial, but
beware of some tricky ordering changes in the middle :-)

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/dev

HEAD: 43d6b973aeb7459d29cd52054142291b099bf8ad
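
For reference, one way to fetch and check out this series locally (these
commands are only a suggestion and not part of the original posting; the
remote name "frederic-rcu" is arbitrary):

	git remote add frederic-rcu git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	git fetch frederic-rcu rcu/dev
	git checkout 43d6b973aeb7459d29cd52054142291b099bf8ad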

Thanks,
	Frederic
---

Frederic Weisbecker (10):
      rcu: Use rcu_segcblist_segempty() instead of open coding it
      rcu: Rename jiffies_till_flush to jiffies_lazy_flush
      rcu/nocb: Remove needless LOAD-ACQUIRE
      rcu/nocb: Remove needless full barrier after callback advancing
      rcu: Assume IRQS disabled from rcu_report_dead()
      rcu: Assume rcu_report_dead() is always called locally
      rcu: Conditionally build CPU-hotplug teardown callbacks
      rcu: Standardize explicit CPU-hotplug calls
      rcu: Remove references to rcu_migrate_callbacks() from diagrams
      rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP


 .../Expedited-Grace-Periods.rst                    |   2 +-
 .../Memory-Ordering/TreeRCU-callback-registry.svg  |   9 --
 .../RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg  |   4 +-
 .../RCU/Design/Memory-Ordering/TreeRCU-gp.svg      |  13 +-
 .../RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg |   4 +-
 .../RCU/Design/Requirements/Requirements.rst       |   4 +-
 arch/arm64/kernel/smp.c                            |   4 +-
 arch/powerpc/kernel/smp.c                          |   2 +-
 arch/s390/kernel/smp.c                             |   2 +-
 arch/x86/kernel/smpboot.c                          |   2 +-
 include/linux/interrupt.h                          |   2 +-
 include/linux/rcupdate.h                           |   2 -
 include/linux/rcutiny.h                            |   2 +-
 include/linux/rcutree.h                            |  16 ++-
 kernel/cpu.c                                       |  13 +-
 kernel/rcu/rcu.h                                   |   8 +-
 kernel/rcu/rcu_segcblist.c                         |   4 +-
 kernel/rcu/rcuscale.c                              |   6 +-
 kernel/rcu/tree.c                                  | 138 ++++++++++-----------
 kernel/rcu/tree_nocb.h                             |  24 ++--
 20 files changed, 129 insertions(+), 132 deletions(-)


* [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
@ 2023-09-08 20:35 ` Frederic Weisbecker
  2023-10-02 15:38   ` Paul E. McKenney
  2023-09-08 20:35 ` [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush Frederic Weisbecker
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Qiuxu Zhuo, Joel Fernandes

This makes the code more readable.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcu_segcblist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index f71fac422c8f..1693ea22ef1b 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -368,7 +368,7 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
 	smp_mb(); /* Ensure counts are updated before callback is entrained. */
 	rhp->next = NULL;
 	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
-		if (rsclp->tails[i] != rsclp->tails[i - 1])
+		if (!rcu_segcblist_segempty(rsclp, i))
 			break;
 	rcu_segcblist_inc_seglen(rsclp, i);
 	WRITE_ONCE(*rsclp->tails[i], rhp);
@@ -551,7 +551,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	 * as their ->gp_seq[] grace-period completion sequence number.
 	 */
 	for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--)
-		if (rsclp->tails[i] != rsclp->tails[i - 1] &&
+		if (!rcu_segcblist_segempty(rsclp, i) &&
 		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
 			break;
 
-- 
2.41.0



* [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
  2023-09-08 20:35 ` [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
@ 2023-09-08 20:35 ` Frederic Weisbecker
  2023-09-09  1:07   ` Joel Fernandes
  2023-09-08 20:35 ` [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE Frederic Weisbecker
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

The variable name jiffies_till_flush is too generic and therefore:

* It may shadow a global variable
* It doesn't tell what it operates on

Make the name more precise, along with the related APIs.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcu.h       |  8 ++++----
 kernel/rcu/rcuscale.c  |  6 +++---
 kernel/rcu/tree_nocb.h | 20 ++++++++++----------
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 56a8466a11a2..4ac4daae9917 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -541,11 +541,11 @@ enum rcutorture_type {
 };
 
 #if defined(CONFIG_RCU_LAZY)
-unsigned long rcu_lazy_get_jiffies_till_flush(void);
-void rcu_lazy_set_jiffies_till_flush(unsigned long j);
+unsigned long rcu_get_jiffies_lazy_flush(void);
+void rcu_set_jiffies_lazy_flush(unsigned long j);
 #else
-static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
-static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
+static inline unsigned long rcu_get_jiffies_lazy_flush(void) { return 0; }
+static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { }
 #endif
 
 #if defined(CONFIG_TREE_RCU)
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index ffdb30495e3c..8db4fedaaa1e 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -764,9 +764,9 @@ kfree_scale_init(void)
 
 	if (kfree_by_call_rcu) {
 		/* do a test to check the timeout. */
-		orig_jif = rcu_lazy_get_jiffies_till_flush();
+		orig_jif = rcu_get_jiffies_lazy_flush();
 
-		rcu_lazy_set_jiffies_till_flush(2 * HZ);
+		rcu_set_jiffies_lazy_flush(2 * HZ);
 		rcu_barrier();
 
 		jif_start = jiffies;
@@ -775,7 +775,7 @@ kfree_scale_init(void)
 
 		smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
 
-		rcu_lazy_set_jiffies_till_flush(orig_jif);
+		rcu_set_jiffies_lazy_flush(orig_jif);
 
 		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
 			pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 5598212d1f27..b9eab359c597 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -264,21 +264,21 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
  * left unsubmitted to RCU after those many jiffies.
  */
 #define LAZY_FLUSH_JIFFIES (10 * HZ)
-static unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
+static unsigned long jiffies_lazy_flush = LAZY_FLUSH_JIFFIES;
 
 #ifdef CONFIG_RCU_LAZY
 // To be called only from test code.
-void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
+void rcu_set_jiffies_lazy_flush(unsigned long jif)
 {
-	jiffies_till_flush = jif;
+	jiffies_lazy_flush = jif;
 }
-EXPORT_SYMBOL(rcu_lazy_set_jiffies_till_flush);
+EXPORT_SYMBOL(rcu_set_jiffies_lazy_flush);
 
-unsigned long rcu_lazy_get_jiffies_till_flush(void)
+unsigned long rcu_get_jiffies_lazy_flush(void)
 {
-	return jiffies_till_flush;
+	return jiffies_lazy_flush;
 }
-EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush);
+EXPORT_SYMBOL(rcu_get_jiffies_lazy_flush);
 #endif
 
 /*
@@ -299,7 +299,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 */
 	if (waketype == RCU_NOCB_WAKE_LAZY &&
 	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
+		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_lazy_flush);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
 		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
@@ -482,7 +482,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// flush ->nocb_bypass to ->cblist.
 	if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
 	    (ncbs &&  bypass_is_lazy &&
-	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush))) ||
+	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) ||
 	    ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
@@ -723,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		lazy_ncbs = READ_ONCE(rdp->lazy_len);
 
 		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
-		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
+		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
 		     bypass_ncbs > 2 * qhimark)) {
 			flush_bypass = true;
 		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
-- 
2.41.0



* [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
  2023-09-08 20:35 ` [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
  2023-09-08 20:35 ` [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush Frederic Weisbecker
@ 2023-09-08 20:35 ` Frederic Weisbecker
  2023-09-09  1:48   ` Joel Fernandes
  2023-09-08 20:35 ` [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing Frederic Weisbecker
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

The LOAD-ACQUIRE access performed on rdp->nocb_cb_sleep advertises
that it orders callback execution against grace period completion.
However this is contradicted by the following:

* This LOAD-ACQUIRE doesn't pair with anything. The only counterpart
  barrier that can be found is the smp_mb() placed after callback
  advancing in nocb_gp_wait(). However that barrier is placed _after_
  the ->nocb_cb_sleep write.

* Callbacks can be concurrently advanced between the LOAD-ACQUIRE on
  ->nocb_cb_sleep and the call to rcu_segcblist_extract_done_cbs() in
  rcu_do_batch(), making any ordering based on ->nocb_cb_sleep broken.

* Both rcu_segcblist_extract_done_cbs() and rcu_advance_cbs() are called
  under the nocb_lock, whose acquisition thereby already provides the
  desired ACQUIRE semantics.

Therefore it is safe to access ->nocb_cb_sleep with a simple compiler
barrier.
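
To illustrate the argument above, here is a heavily simplified sketch of
the two paths involved. This is not the actual kernel code: the real
control flow, locking sequence and wakeup handling differ, only the
ordering-relevant steps are kept.

	/* nocb_gp_wait(), simplified: advance callbacks, wake the CB kthread */
	rcu_nocb_lock_irqsave(rdp, flags);		/* lock acquisition */
	rcu_advance_cbs(rnp, rdp);			/* observes grace period completion */
	WRITE_ONCE(rdp->nocb_cb_sleep, false);
	rcu_nocb_unlock_irqrestore(rdp, flags);		/* lock release */
	swake_up_one(&rdp->nocb_cb_wq);

	/* nocb_cb_wait() -> rcu_do_batch(), simplified: run the ready callbacks */
	swait_event_interruptible_exclusive(rdp->nocb_cb_wq, nocb_cb_wait_cond(rdp));
	if (READ_ONCE(rdp->nocb_cb_sleep))		/* plain access is enough...      */
		return;					/* (spurious wakeup, sleep again) */
	rcu_nocb_lock_irqsave(rdp, flags);		/* ...the nocb_lock acquisition   */
	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl); /* orders callback extraction */
	rcu_nocb_unlock_irqrestore(rdp, flags);
	/* invoke the extracted callbacks */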

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_nocb.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index b9eab359c597..6e63ba4788e1 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -933,8 +933,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 		swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
 						    nocb_cb_wait_cond(rdp));
 
-		// VVV Ensure CB invocation follows _sleep test.
-		if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
+		if (READ_ONCE(rdp->nocb_cb_sleep)) {
 			WARN_ON(signal_pending(current));
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
 		}
-- 
2.41.0



* [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-09-08 20:35 ` [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE Frederic Weisbecker
@ 2023-09-08 20:35 ` Frederic Weisbecker
  2023-09-09  4:31   ` Joel Fernandes
  2023-09-08 20:35 ` [PATCH 05/10] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

A full barrier is issued from nocb_gp_wait() upon callback advancing
so as to order grace period completion against callback execution.

However these two events are already ordered by the
smp_mb__after_unlock_lock() barrier within the call to
raw_spin_lock_rcu_node() that is necessary for callback advancing to
happen.

The following litmus test shows the kind of guarantee that this barrier
provides:

	C smp_mb__after_unlock_lock

	{}

	// rcu_gp_cleanup()
	P0(spinlock_t *rnp_lock, int *gpnum)
	{
		// Grace period cleanup increases the gp sequence number
		spin_lock(rnp_lock);
		WRITE_ONCE(*gpnum, 1);
		spin_unlock(rnp_lock);
	}

	// nocb_gp_wait()
	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
	{
		int r1;

		// Call rcu_advance_cbs() from nocb_gp_wait()
		spin_lock(nocb_lock);
		spin_lock(rnp_lock);
		smp_mb__after_unlock_lock();
		r1 = READ_ONCE(*gpnum);
		WRITE_ONCE(*cb_ready, 1);
		spin_unlock(rnp_lock);
		spin_unlock(nocb_lock);
	}

	// nocb_cb_wait()
	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
	{
		int r2;

		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
		spin_lock(nocb_lock);
		r2 = READ_ONCE(*cb_ready);
		spin_unlock(nocb_lock);

		// Actual callback execution
		WRITE_ONCE(*cb_executed, 1);
	}

	P3(int *cb_executed, int *gpnum)
	{
		int r3;

		WRITE_ONCE(*cb_executed, 2);
		smp_mb();
		r3 = READ_ONCE(*gpnum);
	}

	exists (1:r1=1 /\ 2:r2=1 /\ cb_executed=2 /\ 3:r3=0) (* Bad outcome. *)

Here the bad outcome only occurs if the smp_mb__after_unlock_lock() is
removed. This barrier orders the grace period completion against
callback advancing and even the later callback invocation, thanks to
the opportunistic propagation via the ->nocb_lock to nocb_cb_wait().

Therefore the smp_mb() placed after callback advancing can be safely
removed.
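
As a side note, the litmus test above can be checked against the Linux
kernel memory model with herd7. Assuming it is saved as
smp_mb__after_unlock_lock.litmus and run from tools/memory-model/ in the
kernel tree:

	$ herd7 -conf linux-kernel.cfg smp_mb__after_unlock_lock.litmus

Removing the smp_mb__after_unlock_lock() line from P1 should then make
the bad outcome in the "exists" clause reachable.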

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_nocb.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 6e63ba4788e1..2dc76f5e6e78 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -779,7 +779,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
 			needwake = rdp->nocb_cb_sleep;
 			WRITE_ONCE(rdp->nocb_cb_sleep, false);
-			smp_mb(); /* CB invocation -after- GP end. */
 		} else {
 			needwake = false;
 		}
-- 
2.41.0



* [PATCH 05/10] rcu: Assume IRQS disabled from rcu_report_dead()
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2023-09-08 20:35 ` [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing Frederic Weisbecker
@ 2023-09-08 20:35 ` Frederic Weisbecker
  2023-10-02 15:41   ` Paul E. McKenney
  2023-09-08 20:35 ` [PATCH 06/10] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

rcu_report_dead() is the last RCU word from the CPU going down through
the hotplug path. It is called in the idle loop right before the CPU
shuts down for good. Because it removes the CPU from the grace period
state machine and reports an ultimate quiescent state if necessary, no
further use of RCU is allowed. Therefore it is expected that IRQs are
disabled upon calling this function and are not re-enabled again until
the CPU shuts down.

Remove the IRQ disabling from that function and instead verify that it
is actually called with IRQs disabled, as expected at that special
point in the idle path.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a83ecab77917..8b5ebef32e17 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4553,11 +4553,16 @@ void rcu_cpu_starting(unsigned int cpu)
  */
 void rcu_report_dead(unsigned int cpu)
 {
-	unsigned long flags, seq_flags;
+	unsigned long flags;
 	unsigned long mask;
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
+	/*
+	 * IRQS must be disabled from now on and until the CPU dies, or an interrupt
+	 * may introduce a new READ-side while it is actually off the QS masks.
+	 */
+	lockdep_assert_irqs_disabled();
 	// Do any dangling deferred wakeups.
 	do_nocb_deferred_wakeup(rdp);
 
@@ -4565,7 +4570,6 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
-	local_irq_save(seq_flags);
 	arch_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4579,8 +4583,6 @@ void rcu_report_dead(unsigned int cpu)
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	arch_spin_unlock(&rcu_state.ofl_lock);
-	local_irq_restore(seq_flags);
-
 	rdp->cpu_started = false;
 }
 
-- 
2.41.0



* [PATCH 06/10] rcu: Assume rcu_report_dead() is always called locally
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2023-09-08 20:35 ` [PATCH 05/10] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
@ 2023-09-08 20:35 ` Frederic Weisbecker
  2023-10-02 15:45   ` Paul E. McKenney
  2023-09-08 20:36 ` [PATCH 07/10] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:35 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

rcu_report_dead() has to be called locally by the CPU that is going to
exit the RCU state machine. Passing a cpu argument here is error-prone
and leaves open the possibility of a racy remote call.

Use local access instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm64/kernel/smp.c  | 2 +-
 include/linux/rcupdate.h | 2 +-
 kernel/cpu.c             | 2 +-
 kernel/rcu/tree.c        | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index edd63894d61e..ce672cb69f1c 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -401,7 +401,7 @@ void __noreturn cpu_die_early(void)
 
 	/* Mark this CPU absent */
 	set_cpu_present(cpu, 0);
-	rcu_report_dead(cpu);
+	rcu_report_dead();
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
 		update_cpu_boot_status(CPU_KILL_ME);
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5e5f920ade90..aa351ddcbe8d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -122,7 +122,7 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 void rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
-void rcu_report_dead(unsigned int cpu);
+void rcu_report_dead(void);
 void rcutree_migrate_callbacks(int cpu);
 
 #ifdef CONFIG_TASKS_RCU_GENERIC
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 88a7ede322bd..86f08eafbd9f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1368,7 +1368,7 @@ void cpuhp_report_idle_dead(void)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 
 	BUG_ON(st->state != CPUHP_AP_OFFLINE);
-	rcu_report_dead(smp_processor_id());
+	rcu_report_dead();
 	st->state = CPUHP_AP_IDLE_DEAD;
 	/*
 	 * We cannot call complete after rcu_report_dead() so we delegate it
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8b5ebef32e17..289c51417cbc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4551,11 +4551,11 @@ void rcu_cpu_starting(unsigned int cpu)
  * from the outgoing CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
  */
-void rcu_report_dead(unsigned int cpu)
+void rcu_report_dead(void)
 {
 	unsigned long flags;
 	unsigned long mask;
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
 	/*
-- 
2.41.0



* [PATCH 07/10] rcu: Conditionally build CPU-hotplug teardown callbacks
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2023-09-08 20:35 ` [PATCH 06/10] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
@ 2023-09-08 20:36 ` Frederic Weisbecker
  2023-10-04 16:57   ` Paul E. McKenney
  2023-09-08 20:36 ` [PATCH 08/10] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:36 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

Among the three CPU-hotplug teardown RCU callbacks, two of them exit
early if CONFIG_HOTPLUG_CPU=n, and one is left unchanged. In any case
all of them still carry an implementation when CONFIG_HOTPLUG_CPU=n.

Align instead with the common way of dealing with CPU-hotplug teardown
callbacks and provide proper stubs when they are not supported.
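
For context, the CPU-hotplug core treats a NULL step callback as "nothing
to do" and skips it, which is what makes the NULL defines above work. The
relevant cpuhp_hp_states[] entries in kernel/cpu.c look roughly like the
following (simplified excerpt, not the exact code):

	[CPUHP_RCUTREE_PREP] = {
		.name			= "RCU/tree:prepare",
		.startup.single		= rcutree_prepare_cpu,
		.teardown.single	= rcutree_dead_cpu,	/* NULL if CONFIG_HOTPLUG_CPU=n */
	},
	[CPUHP_AP_RCUTREE_DYING] = {
		.name			= "RCU/tree:dying",
		.startup.single		= NULL,
		.teardown.single	= rcutree_dying_cpu,	/* NULL if CONFIG_HOTPLUG_CPU=n */
	},
	[CPUHP_AP_RCUTREE_ONLINE] = {
		.name			= "RCU/tree:online",
		.startup.single		= rcutree_online_cpu,
		.teardown.single	= rcutree_offline_cpu,	/* NULL if CONFIG_HOTPLUG_CPU=n */
	},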

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/rcutree.h |  11 +++-
 kernel/rcu/tree.c       | 114 +++++++++++++++++++---------------------
 2 files changed, 63 insertions(+), 62 deletions(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index af6ddbd291eb..7d75066c72aa 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -109,9 +109,16 @@ void rcu_all_qs(void);
 /* RCUtree hotplug events */
 int rcutree_prepare_cpu(unsigned int cpu);
 int rcutree_online_cpu(unsigned int cpu);
-int rcutree_offline_cpu(unsigned int cpu);
+void rcu_cpu_starting(unsigned int cpu);
+
+#ifdef CONFIG_HOTPLUG_CPU
 int rcutree_dead_cpu(unsigned int cpu);
 int rcutree_dying_cpu(unsigned int cpu);
-void rcu_cpu_starting(unsigned int cpu);
+int rcutree_offline_cpu(unsigned int cpu);
+#else
+#define rcutree_dead_cpu NULL
+#define rcutree_dying_cpu NULL
+#define rcutree_offline_cpu NULL
+#endif
 
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 289c51417cbc..875f241db508 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4228,25 +4228,6 @@ static bool rcu_init_invoked(void)
 	return !!rcu_state.n_online_cpus;
 }
 
-/*
- * Near the end of the offline process.  Trace the fact that this CPU
- * is going offline.
- */
-int rcutree_dying_cpu(unsigned int cpu)
-{
-	bool blkd;
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-	struct rcu_node *rnp = rdp->mynode;
-
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-		return 0;
-
-	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
-	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
-			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
-	return 0;
-}
-
 /*
  * All CPUs for the specified rcu_node structure have gone offline,
  * and all tasks that were preempted within an RCU read-side critical
@@ -4292,23 +4273,6 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 	}
 }
 
-/*
- * The CPU has been completely removed, and some other CPU is reporting
- * this fact from process context.  Do the remainder of the cleanup.
- * There can only be one CPU hotplug operation at a time, so no need for
- * explicit locking.
- */
-int rcutree_dead_cpu(unsigned int cpu)
-{
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-		return 0;
-
-	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-	// Stop-machine done, so allow nohz_full to disable tick.
-	tick_dep_clear(TICK_DEP_BIT_RCU);
-	return 0;
-}
-
 /*
  * Propagate ->qsinitmask bits up the rcu_node tree to account for the
  * first CPU in a given leaf rcu_node structure coming online.  The caller
@@ -4461,29 +4425,6 @@ int rcutree_online_cpu(unsigned int cpu)
 	return 0;
 }
 
-/*
- * Near the beginning of the process.  The CPU is still very much alive
- * with pretty much all services enabled.
- */
-int rcutree_offline_cpu(unsigned int cpu)
-{
-	unsigned long flags;
-	struct rcu_data *rdp;
-	struct rcu_node *rnp;
-
-	rdp = per_cpu_ptr(&rcu_data, cpu);
-	rnp = rdp->mynode;
-	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rnp->ffmask &= ~rdp->grpmask;
-	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-
-	rcutree_affinity_setting(cpu, cpu);
-
-	// nohz_full CPUs need the tick for stop-machine to work quickly
-	tick_dep_set(TICK_DEP_BIT_RCU);
-	return 0;
-}
-
 /*
  * Mark the specified CPU as being online so that subsequent grace periods
  * (both expedited and normal) will wait on it.  Note that this means that
@@ -4637,7 +4578,60 @@ void rcutree_migrate_callbacks(int cpu)
 		  cpu, rcu_segcblist_n_cbs(&rdp->cblist),
 		  rcu_segcblist_first_cb(&rdp->cblist));
 }
-#endif
+
+/*
+ * The CPU has been completely removed, and some other CPU is reporting
+ * this fact from process context.  Do the remainder of the cleanup.
+ * There can only be one CPU hotplug operation at a time, so no need for
+ * explicit locking.
+ */
+int rcutree_dead_cpu(unsigned int cpu)
+{
+	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
+	// Stop-machine done, so allow nohz_full to disable tick.
+	tick_dep_clear(TICK_DEP_BIT_RCU);
+	return 0;
+}
+
+/*
+ * Near the end of the offline process.  Trace the fact that this CPU
+ * is going offline.
+ */
+int rcutree_dying_cpu(unsigned int cpu)
+{
+	bool blkd;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct rcu_node *rnp = rdp->mynode;
+
+	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
+	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
+			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
+	return 0;
+}
+
+/*
+ * Near the beginning of the process.  The CPU is still very much alive
+ * with pretty much all services enabled.
+ */
+int rcutree_offline_cpu(unsigned int cpu)
+{
+	unsigned long flags;
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+
+	rdp = per_cpu_ptr(&rcu_data, cpu);
+	rnp = rdp->mynode;
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	rnp->ffmask &= ~rdp->grpmask;
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+	rcutree_affinity_setting(cpu, cpu);
+
+	// nohz_full CPUs need the tick for stop-machine to work quickly
+	tick_dep_set(TICK_DEP_BIT_RCU);
+	return 0;
+}
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
 /*
  * On non-huge systems, use expedited RCU grace periods to make suspend
-- 
2.41.0



* [PATCH 08/10] rcu: Standardize explicit CPU-hotplug calls
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2023-09-08 20:36 ` [PATCH 07/10] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
@ 2023-09-08 20:36 ` Frederic Weisbecker
  2023-10-02 15:47   ` Paul E. McKenney
  2023-09-08 20:36 ` [PATCH 09/10] rcu: Remove references to rcu_migrate_callbacks() from diagrams Frederic Weisbecker
  2023-09-08 20:36 ` [PATCH 10/10] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:36 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

rcu_report_dead() and rcutree_migrate_callbacks() have their
declarations in rcupdate.h even though they are pure rcutree calls,
like the other CPU-hotplug functions.

Also rcu_cpu_starting() and rcu_report_dead() follow different naming
conventions even though they mirror each other's effects.

Fix the header placement and propose a naming that relates both
functions and aligns with the prefix of the other rcutree CPU-hotplug
functions.
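
For quick reference, the renames and header moves performed by this
patch amount to (summary derived from the diff below):

	rcu_cpu_starting()  ->  rcutree_report_cpu_starting()
	rcu_report_dead()   ->  rcutree_report_cpu_dead()

	Declarations move from rcupdate.h to rcutree.h, alongside
	rcutree_migrate_callbacks().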

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 .../Expedited-Grace-Periods.rst                      |  2 +-
 .../RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg    |  4 ++--
 .../RCU/Design/Memory-Ordering/TreeRCU-gp.svg        |  4 ++--
 .../RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg   |  4 ++--
 .../RCU/Design/Requirements/Requirements.rst         |  4 ++--
 arch/arm64/kernel/smp.c                              |  4 ++--
 arch/powerpc/kernel/smp.c                            |  2 +-
 arch/s390/kernel/smp.c                               |  2 +-
 arch/x86/kernel/smpboot.c                            |  2 +-
 include/linux/interrupt.h                            |  2 +-
 include/linux/rcupdate.h                             |  2 --
 include/linux/rcutiny.h                              |  2 +-
 include/linux/rcutree.h                              |  7 ++++++-
 kernel/cpu.c                                         |  6 +++---
 kernel/rcu/tree.c                                    | 12 ++++++++----
 15 files changed, 33 insertions(+), 26 deletions(-)

diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
index 93d899d53258..414f8a2012d6 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
@@ -181,7 +181,7 @@ operations is carried out at several levels:
    of this wait (or series of waits, as the case may be) is to permit a
    concurrent CPU-hotplug operation to complete.
 #. In the case of RCU-sched, one of the last acts of an outgoing CPU is
-   to invoke ``rcu_report_dead()``, which reports a quiescent state for
+   to invoke ``rcutree_report_cpu_dead()``, which reports a quiescent state for
    that CPU. However, this is likely paranoia-induced redundancy.
 
 +-----------------------------------------------------------------------+
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
index 7ddc094d7f28..d82a77d03d8c 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
@@ -1135,7 +1135,7 @@
        font-weight="bold"
        font-size="192"
        id="text202-7-5-3-27-6-5"
-       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
     <text
        xml:space="preserve"
        x="3745.7725"
@@ -1256,7 +1256,7 @@
        font-style="normal"
        y="3679.27"
        x="-3804.9949"
-       xml:space="preserve">rcu_cpu_starting()</text>
+       xml:space="preserve">rcutree_report_cpu_starting()</text>
     <g
        style="fill:none;stroke-width:0.025in"
        id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index 069f6f8371c2..6e690a3be161 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -3274,7 +3274,7 @@
          font-weight="bold"
          font-size="192"
          id="text202-7-5-3-27-6-5"
-         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
       <text
          xml:space="preserve"
          x="3745.7725"
@@ -3395,7 +3395,7 @@
          font-style="normal"
          y="3679.27"
          x="-3804.9949"
-         xml:space="preserve">rcu_cpu_starting()</text>
+         xml:space="preserve">rcutree_report_cpu_starting()</text>
       <g
          style="fill:none;stroke-width:0.025in"
          id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
index 2c9310ba29ba..4fa7506082bf 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
@@ -607,7 +607,7 @@
        font-weight="bold"
        font-size="192"
        id="text202-7-5-3-27-6"
-       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
     <text
        xml:space="preserve"
        x="3745.7725"
@@ -728,7 +728,7 @@
        font-style="normal"
        y="3679.27"
        x="-3804.9949"
-       xml:space="preserve">rcu_cpu_starting()</text>
+       xml:space="preserve">rcutree_report_cpu_starting()</text>
     <g
        style="fill:none;stroke-width:0.025in"
        id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index f3b605285a87..cccafdaa1f84 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1955,12 +1955,12 @@ if offline CPUs block an RCU grace period for too long.
 
 An offline CPU's quiescent state will be reported either:
 
-1.  As the CPU goes offline using RCU's hotplug notifier (rcu_report_dead()).
+1.  As the CPU goes offline using RCU's hotplug notifier (rcutree_report_cpu_dead()).
 2.  When grace period initialization (rcu_gp_init()) detects a
     race either with CPU offlining or with a task unblocking on a leaf
     ``rcu_node`` structure whose CPUs are all offline.
 
-The CPU-online path (rcu_cpu_starting()) should never need to report
+The CPU-online path (rcutree_report_cpu_starting()) should never need to report
 a quiescent state for an offline CPU.  However, as a debugging measure,
 it does emit a warning if a quiescent state was not already reported
 for that CPU.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ce672cb69f1c..90c0083438ea 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -215,7 +215,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	if (system_uses_irq_prio_masking())
 		init_gic_priority_masking();
 
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	trace_hardirqs_off();
 
 	/*
@@ -401,7 +401,7 @@ void __noreturn cpu_die_early(void)
 
 	/* Mark this CPU absent */
 	set_cpu_present(cpu, 0);
-	rcu_report_dead();
+	rcutree_report_cpu_dead();
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
 		update_cpu_boot_status(CPU_KILL_ME);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index fbbb695bae3d..a132e3290e98 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1619,7 +1619,7 @@ void start_secondary(void *unused)
 
 	smp_store_cpu_info(cpu);
 	set_dec(tb_ticks_per_jiffy);
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	cpu_callin_map[cpu] = 1;
 
 	if (smp_ops->setup_cpu)
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index f9a2b755f510..3e39a5e1bf48 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -894,7 +894,7 @@ static void smp_start_secondary(void *cpuvoid)
 	S390_lowcore.restart_flags = 0;
 	restore_access_regs(S390_lowcore.access_regs_save_area);
 	cpu_init();
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	init_cpu_timer();
 	vtime_init();
 	vdso_getcpu_init();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index e1aa2cd7734b..d25b952a2b91 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -288,7 +288,7 @@ static void notrace start_secondary(void *unused)
 
 	cpu_init();
 	fpu__init_cpu();
-	rcu_cpu_starting(raw_smp_processor_id());
+	rcutree_report_cpu_starting(raw_smp_processor_id());
 	x86_cpuinit.early_percpu_clock_init();
 
 	ap_starting();
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce40b04b..d05e1e9a553c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -566,7 +566,7 @@ enum
  *
  * _ RCU:
  * 	1) rcutree_migrate_callbacks() migrates the queue.
- * 	2) rcu_report_dead() reports the final quiescent states.
+ * 	2) rcutree_report_cpu_dead() reports the final quiescent states.
  *
  * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
  */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index aa351ddcbe8d..f7206b2623c9 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -122,8 +122,6 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 void rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
-void rcu_report_dead(void);
-void rcutree_migrate_callbacks(int cpu);
 
 #ifdef CONFIG_TASKS_RCU_GENERIC
 void rcu_init_tasks_generic(void);
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 7f17acf29dda..04889a9602e7 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -169,6 +169,6 @@ static inline void rcu_all_qs(void) { barrier(); }
 #define rcutree_offline_cpu      NULL
 #define rcutree_dead_cpu         NULL
 #define rcutree_dying_cpu        NULL
-static inline void rcu_cpu_starting(unsigned int cpu) { }
+static inline void rcutree_report_cpu_starting(unsigned int cpu) { }
 
 #endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 7d75066c72aa..07d0fc1e0d31 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -109,7 +109,7 @@ void rcu_all_qs(void);
 /* RCUtree hotplug events */
 int rcutree_prepare_cpu(unsigned int cpu);
 int rcutree_online_cpu(unsigned int cpu);
-void rcu_cpu_starting(unsigned int cpu);
+void rcutree_report_cpu_starting(unsigned int cpu);
 
 #ifdef CONFIG_HOTPLUG_CPU
 int rcutree_dead_cpu(unsigned int cpu);
@@ -121,4 +121,9 @@ int rcutree_offline_cpu(unsigned int cpu);
 #define rcutree_offline_cpu NULL
 #endif
 
+void rcutree_migrate_callbacks(int cpu);
+
+/* Called from hotplug and also arm64 early secondary boot failure */
+void rcutree_report_cpu_dead(void);
+
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 86f08eafbd9f..a41a6fff3c91 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1368,10 +1368,10 @@ void cpuhp_report_idle_dead(void)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 
 	BUG_ON(st->state != CPUHP_AP_OFFLINE);
-	rcu_report_dead();
+	rcutree_report_cpu_dead();
 	st->state = CPUHP_AP_IDLE_DEAD;
 	/*
-	 * We cannot call complete after rcu_report_dead() so we delegate it
+	 * We cannot call complete after rcutree_report_cpu_dead() so we delegate it
 	 * to an online cpu.
 	 */
 	smp_call_function_single(cpumask_first(cpu_online_mask),
@@ -1575,7 +1575,7 @@ void notify_cpu_starting(unsigned int cpu)
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
 
-	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
+	rcutree_report_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
 
 	/*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 875f241db508..5698c3f30b1d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4207,7 +4207,7 @@ bool rcu_lockdep_current_cpu_online(void)
 	rdp = this_cpu_ptr(&rcu_data);
 	/*
 	 * Strictly, we care here about the case where the current CPU is
-	 * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
+	 * in rcutree_report_cpu_starting() and thus has an excuse for rdp->grpmask
 	 * not being up to date. So arch_spin_is_locked() might have a
 	 * false positive if it's held by some *other* CPU, but that's
 	 * OK because that just means a false *negative* on the warning.
@@ -4436,8 +4436,10 @@ int rcutree_online_cpu(unsigned int cpu)
  * from the incoming CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
  * This incoming CPU must not have enabled interrupts yet.
+ *
+ * This mirrors the effects of rcutree_report_cpu_dead().
  */
-void rcu_cpu_starting(unsigned int cpu)
+void rcutree_report_cpu_starting(unsigned int cpu)
 {
 	unsigned long mask;
 	struct rcu_data *rdp;
@@ -4491,8 +4493,10 @@ void rcu_cpu_starting(unsigned int cpu)
  * Note that this function is special in that it is invoked directly
  * from the outgoing CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
+ *
+ * This mirrors the effect of rcutree_report_cpu_starting().
  */
-void rcu_report_dead(void)
+void rcutree_report_cpu_dead(void)
 {
 	unsigned long flags;
 	unsigned long mask;
@@ -5063,7 +5067,7 @@ void __init rcu_init(void)
 	pm_notifier(rcu_pm_notify, 0);
 	WARN_ON(num_online_cpus() > 1); // Only one CPU this early in boot.
 	rcutree_prepare_cpu(cpu);
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	rcutree_online_cpu(cpu);
 
 	/* Create workqueue for Tree SRCU and for expedited GPs. */
-- 
2.41.0



* [PATCH 09/10] rcu: Remove references to rcu_migrate_callbacks() from diagrams
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2023-09-08 20:36 ` [PATCH 08/10] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
@ 2023-09-08 20:36 ` Frederic Weisbecker
  2023-10-02 15:52   ` Paul E. McKenney
  2023-09-08 20:36 ` [PATCH 10/10] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:36 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

The rcu_migrate_callbacks() function has been gone since:

	53b46303da84 (rcu: Remove rsp parameter from rcu_boot_init_percpu_data() and friends)

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 .../Design/Memory-Ordering/TreeRCU-callback-registry.svg | 9 ---------
 Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg  | 9 ---------
 2 files changed, 18 deletions(-)

diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
index 7ac6f9269806..63eff867175a 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
@@ -564,15 +564,6 @@
        font-size="192"
        id="text202-7-9-6"
        style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
-    <text
-       xml:space="preserve"
-       x="8335.4873"
-       y="5357.1006"
-       font-style="normal"
-       font-weight="bold"
-       font-size="192"
-       id="text202-7-9-6-0"
-       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
     <text
        xml:space="preserve"
        x="8768.4678"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index 6e690a3be161..53e0dc2a2c79 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -1446,15 +1446,6 @@
        font-size="192"
        id="text202-7-9-6"
        style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
-    <text
-       xml:space="preserve"
-       x="8335.4873"
-       y="5357.1006"
-       font-style="normal"
-       font-weight="bold"
-       font-size="192"
-       id="text202-7-9-6-0"
-       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
     <text
        xml:space="preserve"
        x="8768.4678"
-- 
2.41.0



* [PATCH 10/10] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2023-09-08 20:36 ` [PATCH 09/10] rcu: Remove references to rcu_migrate_callbacks() from diagrams Frederic Weisbecker
@ 2023-09-08 20:36 ` Frederic Weisbecker
  2023-10-02 15:48   ` Paul E. McKenney
  9 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 20:36 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki,
	Neeraj Upadhyay, Boqun Feng, Joel Fernandes

The callback migration is performed through an explicit call from
the hotplug control CPU right after the death of the target CPU and
before proceeding with the CPUHP_ teardown functions.

This is unusual but necessary, yet uncommented. Summarize the reason
as explained in the changelog of:

	a58163d8ca2c (rcu: Migrate callbacks earlier in the CPU-offline timeline)

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/cpu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a41a6fff3c91..b135bb481be1 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1352,7 +1352,14 @@ static int takedown_cpu(unsigned int cpu)
 	cpuhp_bp_sync_dead(cpu);
 
 	tick_cleanup_dead_cpu(cpu);
+
+	/*
+	 * Callbacks must be re-integrated right away to the RCU state machine.
+	 * Otherwise an RCU callback could block a further teardown function
+	 * waiting for its completion.
+	 */
 	rcutree_migrate_callbacks(cpu);
+
 	return 0;
 }
 
-- 
2.41.0



* Re: [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush
  2023-09-08 20:35 ` [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush Frederic Weisbecker
@ 2023-09-09  1:07   ` Joel Fernandes
  2023-09-10 19:48     ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Joel Fernandes @ 2023-09-09  1:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay,
	Boqun Feng

On Fri, Sep 8, 2023 at 4:36 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> The variable name jiffies_till_flush is too generic and therefore:
>
> * It may shadow a global variable
> * It doesn't tell on what it operates
>
> Make the name more precise, along with the related APIs.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/rcu.h       |  8 ++++----
>  kernel/rcu/rcuscale.c  |  6 +++---
>  kernel/rcu/tree_nocb.h | 20 ++++++++++----------
>  3 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 56a8466a11a2..4ac4daae9917 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -541,11 +541,11 @@ enum rcutorture_type {
>  };
>
>  #if defined(CONFIG_RCU_LAZY)
> -unsigned long rcu_lazy_get_jiffies_till_flush(void);
> -void rcu_lazy_set_jiffies_till_flush(unsigned long j);
> +unsigned long rcu_get_jiffies_lazy_flush(void);
> +void rcu_set_jiffies_lazy_flush(unsigned long j);
>  #else
> -static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
> -static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
> +static inline unsigned long rcu_get_jiffies_lazy_flush(void) { return 0; }
> +static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { }
>  #endif
>
>  #if defined(CONFIG_TREE_RCU)
> diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
> index ffdb30495e3c..8db4fedaaa1e 100644
> --- a/kernel/rcu/rcuscale.c
> +++ b/kernel/rcu/rcuscale.c
> @@ -764,9 +764,9 @@ kfree_scale_init(void)
>
>         if (kfree_by_call_rcu) {
>                 /* do a test to check the timeout. */
> -               orig_jif = rcu_lazy_get_jiffies_till_flush();
> +               orig_jif = rcu_get_jiffies_lazy_flush();
>
> -               rcu_lazy_set_jiffies_till_flush(2 * HZ);
> +               rcu_set_jiffies_lazy_flush(2 * HZ);
>                 rcu_barrier();
>
>                 jif_start = jiffies;
> @@ -775,7 +775,7 @@ kfree_scale_init(void)
>
>                 smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
>
> -               rcu_lazy_set_jiffies_till_flush(orig_jif);
> +               rcu_set_jiffies_lazy_flush(orig_jif);
>
>                 if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
>                         pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 5598212d1f27..b9eab359c597 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -264,21 +264,21 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
>   * left unsubmitted to RCU after those many jiffies.
>   */
>  #define LAZY_FLUSH_JIFFIES (10 * HZ)
> -static unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
> +static unsigned long jiffies_lazy_flush = LAZY_FLUSH_JIFFIES;
>
>  #ifdef CONFIG_RCU_LAZY
>  // To be called only from test code.
> -void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
> +void rcu_set_jiffies_lazy_flush(unsigned long jif)
>  {
> -       jiffies_till_flush = jif;
> +       jiffies_lazy_flush = jif;
>  }
> -EXPORT_SYMBOL(rcu_lazy_set_jiffies_till_flush);
> +EXPORT_SYMBOL(rcu_set_jiffies_lazy_flush);
>
> -unsigned long rcu_lazy_get_jiffies_till_flush(void)
> +unsigned long rcu_get_jiffies_lazy_flush(void)
>  {
> -       return jiffies_till_flush;
> +       return jiffies_lazy_flush;
>  }
> -EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush);
> +EXPORT_SYMBOL(rcu_get_jiffies_lazy_flush);
>  #endif
>
>  /*

While at it, could we replace jiffies_lazy_flush with
rcu_get_jiffies_lazy_flush() in the below? That way all users are
going through the API.
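
Concretely, the suggestion would amount to something like the following
on top of this patch (illustrative only, not part of the original mail),
and similarly for the two time_after() checks quoted below:

	-		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_lazy_flush);
	+		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());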

With that,
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> @@ -299,7 +299,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>          */
>         if (waketype == RCU_NOCB_WAKE_LAZY &&
>             rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> -               mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
> +               mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_lazy_flush);
>                 WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
>         } else if (waketype == RCU_NOCB_WAKE_BYPASS) {
>                 mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
> @@ -482,7 +482,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>         // flush ->nocb_bypass to ->cblist.
>         if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
>             (ncbs &&  bypass_is_lazy &&
> -            (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush))) ||
> +            (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) ||
>             ncbs >= qhimark) {
>                 rcu_nocb_lock(rdp);
>                 *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> @@ -723,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>                 lazy_ncbs = READ_ONCE(rdp->lazy_len);
>
>                 if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
> -                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
> +                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
>                      bypass_ncbs > 2 * qhimark)) {
>                         flush_bypass = true;
>                 } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
> --
> 2.41.0
>


* Re: [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE
  2023-09-08 20:35 ` [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE Frederic Weisbecker
@ 2023-09-09  1:48   ` Joel Fernandes
  2023-09-09  1:50     ` Joel Fernandes
  2023-09-10 21:17     ` Frederic Weisbecker
  0 siblings, 2 replies; 29+ messages in thread
From: Joel Fernandes @ 2023-09-09  1:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay,
	Boqun Feng

On Fri, Sep 8, 2023 at 4:36 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> The LOAD-ACQUIRE access performed on rdp->nocb_cb_sleep advertizes
> ordering callback execution against grace period completion. However
> this is contradicted by the following:
>
> * This LOAD-ACQUIRE doesn't pair with anything. The only counterpart
>   barrier that can be found is the smp_mb() placed after callbacks
>   advancing in nocb_gp_wait(). However the barrier is placed _after_
>   ->nocb_cb_sleep write.

Hmm, on one side you have:

WRITE_ONCE(rdp->nocb_cb_sleep, false);
smp_mb();
swake_up_one(&rdp->nocb_cb_wq);   /* wakeup -- consider this to be a STORE */

And on another side you have:
swait_event_interruptible_exclusive(rdp->nocb_cb_wq, ..cond..) /*
consider this to be a LOAD */
smp_load_acquire(&rdp->nocb_cb_sleep)
/* exec CBs (LOAD operations) */

So there seems to be pairing AFAICS.

But maybe you are referring to pairing between advancing the callbacks
and storing to nocb_cb_sleep. In this case, the RELEASE of the nocb
unlock operation just after advancing should be providing the
ordering, but we still need the acquire this patch deletes.

> * Callbacks can be concurrently advanced between the LOAD-ACQUIRE on
>   ->nocb_cb_sleep and the call to rcu_segcblist_extract_done_cbs() in
>   rcu_do_batch(), making any ordering based on ->nocb_cb_sleep broken.

If you don't mind, could you elaborate more?

> * Both rcu_segcblist_extract_done_cbs() and rcu_advance_cbs() are called
>   under the nocb_lock, the latter hereby providing already the desired
>   ACQUIRE semantics.

The acquire orders the load of nocb_cb_sleep against all later
loads/stores. I am not sure how nocb_lock gives that same behavior
since that's doing ACQUIRE on the lock access itself and not on the
nocb_cb_sleep access; I'd appreciate it if we could debate this out.

Every few months I need a memory-ordering workout so this can be that.
;-) You could be onto something.

thanks,

 - Joel



>
> Therefore it is safe to access ->nocb_cb_sleep with a simple compiler
> barrier.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree_nocb.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index b9eab359c597..6e63ba4788e1 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -933,8 +933,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>                 swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
>                                                     nocb_cb_wait_cond(rdp));
>
> -               // VVV Ensure CB invocation follows _sleep test.
> -               if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
> +               if (READ_ONCE(rdp->nocb_cb_sleep)) {
>                         WARN_ON(signal_pending(current));
>                         trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
>                 }
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE
  2023-09-09  1:48   ` Joel Fernandes
@ 2023-09-09  1:50     ` Joel Fernandes
  2023-09-10 21:17     ` Frederic Weisbecker
  1 sibling, 0 replies; 29+ messages in thread
From: Joel Fernandes @ 2023-09-09  1:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay,
	Boqun Feng

> > * Callbacks can be concurrently advanced between the LOAD-ACQUIRE on
> >   ->nocb_cb_sleep and the call to rcu_segcblist_extract_done_cbs() in
> >   rcu_do_batch(), making any ordering based on ->nocb_cb_sleep broken.
>
> If you don't mind, could you elaborate more?

Ah, I see you deleted the counterpart memory barrier in the next
patch. I was reading the patches in order, so I did not notice. I'll
go read that as well. It might make sense to combine this and the next
patch, not sure.

 - Joel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-08 20:35 ` [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing Frederic Weisbecker
@ 2023-09-09  4:31   ` Joel Fernandes
  2023-09-09 18:22     ` Boqun Feng
  0 siblings, 1 reply; 29+ messages in thread
From: Joel Fernandes @ 2023-09-09  4:31 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay,
	Boqun Feng

On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> to order grace period completion with callbacks execution.
> 
> However these two events are already ordered by the
> smp_mb__after_unlock_lock() barrier within the call to
> raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> happen.
> 
> The following litmus test shows the kind of guarantee that this barrier
> provides:
> 
> 	C smp_mb__after_unlock_lock
> 
> 	{}
> 
> 	// rcu_gp_cleanup()
> 	P0(spinlock_t *rnp_lock, int *gpnum)
> 	{
> 		// Grace period cleanup increase gp sequence number
> 		spin_lock(rnp_lock);
> 		WRITE_ONCE(*gpnum, 1);
> 		spin_unlock(rnp_lock);
> 	}
> 
> 	// nocb_gp_wait()
> 	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> 	{
> 		int r1;
> 
> 		// Call rcu_advance_cbs() from nocb_gp_wait()
> 		spin_lock(nocb_lock);
> 		spin_lock(rnp_lock);
> 		smp_mb__after_unlock_lock();
> 		r1 = READ_ONCE(*gpnum);
> 		WRITE_ONCE(*cb_ready, 1);
> 		spin_unlock(rnp_lock);
> 		spin_unlock(nocb_lock);
> 	}
> 
> 	// nocb_cb_wait()
> 	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> 	{
> 		int r2;
> 
> 		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> 		spin_lock(nocb_lock);
> 		r2 = READ_ONCE(*cb_ready);
> 		spin_unlock(nocb_lock);
> 
> 		// Actual callback execution
> 		WRITE_ONCE(*cb_executed, 1);

So, related to this, something in the docs caught my attention under "Callback
Invocation" [1]:

<quote>
However, if the callback function communicates to other CPUs, for example,
doing a wakeup, then it is that function's responsibility to maintain
ordering. For example, if the callback function wakes up a task that runs on
some other CPU, proper ordering must in place in both the callback function
and the task being awakened. To see why this is important, consider the top
half of the grace-period cleanup diagram. The callback might be running on a
CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
that is to run on a CPU corresponding to the rightmost leaf rcu_node
structure, and the grace-period kernel thread might not yet have reached the
rightmost leaf. In this case, the grace period's memory ordering might not
yet have reached that CPU, so again the callback function and the awakened
task must supply proper ordering.
</quote>

I believe this text is for non-nocb, but if we apply that to the nocb case,
let's see what happens.

In the litmus, the rcu_advance_cbs() happened on P1; however, the callback is
executing on P2. That sounds very similar to the non-nocb world described in
the text where a callback tries to wake something up on a different CPU and
needs to take care of all the ordering.

So unless I'm missing something (quite possible), P2 must see the update to
gpnum as well. However, per your litmus test, the only thing P2 does is
acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.
I am curious what happens in your litmus if you read gpnum in a register and
check for it.

So maybe the memory barriers you are deleting need to be kept in place? Idk.

thanks,

 - Joel

[1] https://docs.kernel.org/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#callback-invocation


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-09  4:31   ` Joel Fernandes
@ 2023-09-09 18:22     ` Boqun Feng
  2023-09-10  4:09       ` Joel Fernandes
  0 siblings, 1 reply; 29+ messages in thread
From: Boqun Feng @ 2023-09-09 18:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, Paul E . McKenney, LKML, rcu,
	Uladzislau Rezki, Neeraj Upadhyay

On Sat, Sep 09, 2023 at 04:31:25AM +0000, Joel Fernandes wrote:
> On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> > A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> > to order grace period completion with callbacks execution.
> > 
> > However these two events are already ordered by the
> > smp_mb__after_unlock_lock() barrier within the call to
> > raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> > happen.
> > 
> > The following litmus test shows the kind of guarantee that this barrier
> > provides:
> > 
> > 	C smp_mb__after_unlock_lock
> > 
> > 	{}
> > 
> > 	// rcu_gp_cleanup()
> > 	P0(spinlock_t *rnp_lock, int *gpnum)
> > 	{
> > 		// Grace period cleanup increase gp sequence number
> > 		spin_lock(rnp_lock);
> > 		WRITE_ONCE(*gpnum, 1);
> > 		spin_unlock(rnp_lock);
> > 	}
> > 
> > 	// nocb_gp_wait()
> > 	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> > 	{
> > 		int r1;
> > 
> > 		// Call rcu_advance_cbs() from nocb_gp_wait()
> > 		spin_lock(nocb_lock);
> > 		spin_lock(rnp_lock);
> > 		smp_mb__after_unlock_lock();
> > 		r1 = READ_ONCE(*gpnum);
> > 		WRITE_ONCE(*cb_ready, 1);
> > 		spin_unlock(rnp_lock);
> > 		spin_unlock(nocb_lock);
> > 	}
> > 
> > 	// nocb_cb_wait()
> > 	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> > 	{
> > 		int r2;
> > 
> > 		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> > 		spin_lock(nocb_lock);
> > 		r2 = READ_ONCE(*cb_ready);
> > 		spin_unlock(nocb_lock);
> > 
> > 		// Actual callback execution
> > 		WRITE_ONCE(*cb_executed, 1);
> 
> So related to this something in the docs caught my attention under "Callback
> Invocation" [1]
> 
> <quote>
> However, if the callback function communicates to other CPUs, for example,
> doing a wakeup, then it is that function's responsibility to maintain
> ordering. For example, if the callback function wakes up a task that runs on
> some other CPU, proper ordering must in place in both the callback function
> and the task being awakened. To see why this is important, consider the top
> half of the grace-period cleanup diagram. The callback might be running on a
> CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
> that is to run on a CPU corresponding to the rightmost leaf rcu_node
> structure, and the grace-period kernel thread might not yet have reached the
> rightmost leaf. In this case, the grace period's memory ordering might not
> yet have reached that CPU, so again the callback function and the awakened
> task must supply proper ordering.
> </quote>
> 
> I believe this text is for non-nocb but if we apply that to the nocb case,
> lets see what happens.
> 
> In the litmus, he rcu_advance_cbs() happened on P1, however the callback is
> executing on P2. That sounds very similar to the non-nocb world described in
> the text where a callback tries to wake something up on a different CPU and
> needs to take care of all the ordering.
> 
> So unless I'm missing something (quite possible), P2 must see the update to
> gpnum as well. However, per your limus test, the only thing P2  does is
> acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.

Because P1 writes cb_ready under nocb_lock and P2 reads cb_ready under
nocb_lock as well, if P2 reads P1's write, then we know the serialized
order of locking is P1 first (i.e. the spin_lock(nocb_lock) on P2 reads
from the spin_unlock(nocb_lock) on P1), in other words:

(fact #1)

	unlock(nocb_lock) // on P1
	->rfe
	lock(nocb_lock) // on P2

so if P1 reads P0's write on gpnum

(assumption #1)

	W(gpnum)=1 // on P0
	->rfe
	R(gpnum)=1 // on P1

and we have

(fact #2)

	R(gpnum)=1 // on P1
	->(po; [UL])
	unlock(nocb_lock) // on P1

combine them you get

	W(gpnum)=1 // on P0
	->rfe           // assumption #1
	->(po; [UL])    // fact #2
	->rfe           // fact #1
	lock(nocb_lock) // on P2
	->([LKR]; po)
	M // any access on P2 after spin_lock(nocb_lock);

so
	W(gpnum)=1 // on P0
	->rfe ->po-unlock-lock-po
	M // on P2

and po-unlock-lock-po is A-cumul, hence "->rfe ->po-unlock-lock-po" or
"rfe; po-unlock-lock-po" is cumul-fence, hence it's a ->prop, which
means the write of gpnum on P0 propagates to P2 before any memory
accesses after spin_lock(nocb_lock)?

Of course, I haven't looked into the bigger picture here (whether the
barrier is for something else, etc.) ;-)
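
[To make the conclusion concrete, here is a sketch of the check Joel asked for,
i.e. Frederic's litmus test with P2 additionally reading gpnum into a register;
the test name, the extra r3 register and the exists clause are illustrative
additions, not from the thread. Per the reasoning above, herd7 should report
the exists clause as never satisfied:]

	C nocb-gpnum-check

	{}

	// rcu_gp_cleanup()
	P0(spinlock_t *rnp_lock, int *gpnum)
	{
		spin_lock(rnp_lock);
		WRITE_ONCE(*gpnum, 1);
		spin_unlock(rnp_lock);
	}

	// nocb_gp_wait()
	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
	{
		int r1;

		spin_lock(nocb_lock);
		spin_lock(rnp_lock);
		smp_mb__after_unlock_lock();
		r1 = READ_ONCE(*gpnum);
		WRITE_ONCE(*cb_ready, 1);
		spin_unlock(rnp_lock);
		spin_unlock(nocb_lock);
	}

	// nocb_cb_wait(), with the extra check of gpnum
	P2(spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
	{
		int r2;
		int r3;

		spin_lock(nocb_lock);
		r2 = READ_ONCE(*cb_ready);
		r3 = READ_ONCE(*gpnum);
		spin_unlock(nocb_lock);
	}

	exists (1:r1=1 /\ 2:r2=1 /\ 2:r3=0)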

Regards,
Boqun

> I am curious what happens in your litmus if you read gpnum in a register and
> checked for it.
> 
> So maybe the memory barriers you are deleting need to be kept in place? Idk.
> 
> thanks,
> 
>  - Joel
> 
> [1] https://docs.kernel.org/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#callback-invocation
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-09 18:22     ` Boqun Feng
@ 2023-09-10  4:09       ` Joel Fernandes
  2023-09-10 10:22         ` Paul E. McKenney
  2023-09-10 20:17         ` Frederic Weisbecker
  0 siblings, 2 replies; 29+ messages in thread
From: Joel Fernandes @ 2023-09-10  4:09 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Frederic Weisbecker, Paul E . McKenney, LKML, rcu,
	Uladzislau Rezki, Neeraj Upadhyay

On Sat, Sep 09, 2023 at 11:22:48AM -0700, Boqun Feng wrote:
> On Sat, Sep 09, 2023 at 04:31:25AM +0000, Joel Fernandes wrote:
> > On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> > > A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> > > to order grace period completion with callbacks execution.
> > > 
> > > However these two events are already ordered by the
> > > smp_mb__after_unlock_lock() barrier within the call to
> > > raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> > > happen.
> > > 
> > > The following litmus test shows the kind of guarantee that this barrier
> > > provides:
> > > 
> > > 	C smp_mb__after_unlock_lock
> > > 
> > > 	{}
> > > 
> > > 	// rcu_gp_cleanup()
> > > 	P0(spinlock_t *rnp_lock, int *gpnum)
> > > 	{
> > > 		// Grace period cleanup increase gp sequence number
> > > 		spin_lock(rnp_lock);
> > > 		WRITE_ONCE(*gpnum, 1);
> > > 		spin_unlock(rnp_lock);
> > > 	}
> > > 
> > > 	// nocb_gp_wait()
> > > 	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> > > 	{
> > > 		int r1;
> > > 
> > > 		// Call rcu_advance_cbs() from nocb_gp_wait()
> > > 		spin_lock(nocb_lock);
> > > 		spin_lock(rnp_lock);
> > > 		smp_mb__after_unlock_lock();
> > > 		r1 = READ_ONCE(*gpnum);
> > > 		WRITE_ONCE(*cb_ready, 1);
> > > 		spin_unlock(rnp_lock);
> > > 		spin_unlock(nocb_lock);
> > > 	}
> > > 
> > > 	// nocb_cb_wait()
> > > 	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> > > 	{
> > > 		int r2;
> > > 
> > > 		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> > > 		spin_lock(nocb_lock);
> > > 		r2 = READ_ONCE(*cb_ready);
> > > 		spin_unlock(nocb_lock);
> > > 
> > > 		// Actual callback execution
> > > 		WRITE_ONCE(*cb_executed, 1);
> > 
> > So related to this something in the docs caught my attention under "Callback
> > Invocation" [1]
> > 
> > <quote>
> > However, if the callback function communicates to other CPUs, for example,
> > doing a wakeup, then it is that function's responsibility to maintain
> > ordering. For example, if the callback function wakes up a task that runs on
> > some other CPU, proper ordering must in place in both the callback function
> > and the task being awakened. To see why this is important, consider the top
> > half of the grace-period cleanup diagram. The callback might be running on a
> > CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
> > that is to run on a CPU corresponding to the rightmost leaf rcu_node
> > structure, and the grace-period kernel thread might not yet have reached the
> > rightmost leaf. In this case, the grace period's memory ordering might not
> > yet have reached that CPU, so again the callback function and the awakened
> > task must supply proper ordering.
> > </quote>
> > 
> > I believe this text is for non-nocb but if we apply that to the nocb case,
> > lets see what happens.
> > 
> > In the litmus, he rcu_advance_cbs() happened on P1, however the callback is
> > executing on P2. That sounds very similar to the non-nocb world described in
> > the text where a callback tries to wake something up on a different CPU and
> > needs to take care of all the ordering.
> > 
> > So unless I'm missing something (quite possible), P2 must see the update to
> > gpnum as well. However, per your limus test, the only thing P2  does is
> > acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.
> 
> Because P1 writes cb_ready under nocb_lock, and P2 reads cb_ready under
> nocb_lock as well and if P2 read P1's write, then we know the serialized
> order of locking is P1 first (i.e. the spin_lock(nocb_lock) on P2 reads
> from the spin_unlock(nocb_lock) on P1), in other words:
> 
> (fact #1)
> 
> 	unlock(nocb_lock) // on P1
> 	->rfe
> 	lock(nocb_lock) // on P2
> 
> so if P1 reads P0's write on gpnum
> 
> (assumption #1)
> 
> 	W(gpnum)=1 // on P0
> 	->rfe
> 	R(gpnum)=1 // on P1
> 
> and we have
> 
> (fact #2)
> 
> 	R(gpnum)=1 // on P1
> 	->(po; [UL])
> 	unlock(nocb_lock) // on P1
> 
> combine them you get
> 
> 	W(gpnum)=1 // on P0
> 	->rfe           // fact #1
> 	->(po; [UL])    // fact #2
> 	->rfe           // assumption #1
> 	lock(nocb_lock) // on P2
> 	->([LKR]; po)
> 	M // any access on P2 after spin_lock(nocb_lock);
> 
> so
> 	W(gpnum)=1 // on P0
> 	->rfe ->po-unlock-lock-po
> 	M // on P2
> 
> and po-unlock-lock-po is A-culum, hence "->rfe ->po-unlock-lock-po" or
> "rfe; po-unlock-lock-po" is culum-fence, hence it's a ->prop, which
> means the write of gpnum on P0 propagates to P2 before any memory
> accesses after spin_lock(nocb_lock)?

You and Frederic are right. I confirmed this by running herd7 as well.

Also he adds a ->co between P2 and P3, so that's why the
smp_mb__after_unlock_lock() helps to keep the propagation intact. It's pretty
much the R-pattern extended across 4 CPUs.

We should probably document these in the RCU memory ordering docs.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-10  4:09       ` Joel Fernandes
@ 2023-09-10 10:22         ` Paul E. McKenney
  2023-09-10 20:17         ` Frederic Weisbecker
  1 sibling, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-09-10 10:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Boqun Feng, Frederic Weisbecker, LKML, rcu, Uladzislau Rezki,
	Neeraj Upadhyay

On Sun, Sep 10, 2023 at 12:09:23AM -0400, Joel Fernandes wrote:
> On Sat, Sep 09, 2023 at 11:22:48AM -0700, Boqun Feng wrote:
> > On Sat, Sep 09, 2023 at 04:31:25AM +0000, Joel Fernandes wrote:
> > > On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> > > > A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> > > > to order grace period completion with callbacks execution.
> > > > 
> > > > However these two events are already ordered by the
> > > > smp_mb__after_unlock_lock() barrier within the call to
> > > > raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> > > > happen.
> > > > 
> > > > The following litmus test shows the kind of guarantee that this barrier
> > > > provides:
> > > > 
> > > > 	C smp_mb__after_unlock_lock
> > > > 
> > > > 	{}
> > > > 
> > > > 	// rcu_gp_cleanup()
> > > > 	P0(spinlock_t *rnp_lock, int *gpnum)
> > > > 	{
> > > > 		// Grace period cleanup increase gp sequence number
> > > > 		spin_lock(rnp_lock);
> > > > 		WRITE_ONCE(*gpnum, 1);
> > > > 		spin_unlock(rnp_lock);
> > > > 	}
> > > > 
> > > > 	// nocb_gp_wait()
> > > > 	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> > > > 	{
> > > > 		int r1;
> > > > 
> > > > 		// Call rcu_advance_cbs() from nocb_gp_wait()
> > > > 		spin_lock(nocb_lock);
> > > > 		spin_lock(rnp_lock);
> > > > 		smp_mb__after_unlock_lock();
> > > > 		r1 = READ_ONCE(*gpnum);
> > > > 		WRITE_ONCE(*cb_ready, 1);
> > > > 		spin_unlock(rnp_lock);
> > > > 		spin_unlock(nocb_lock);
> > > > 	}
> > > > 
> > > > 	// nocb_cb_wait()
> > > > 	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> > > > 	{
> > > > 		int r2;
> > > > 
> > > > 		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> > > > 		spin_lock(nocb_lock);
> > > > 		r2 = READ_ONCE(*cb_ready);
> > > > 		spin_unlock(nocb_lock);
> > > > 
> > > > 		// Actual callback execution
> > > > 		WRITE_ONCE(*cb_executed, 1);
> > > 
> > > So related to this something in the docs caught my attention under "Callback
> > > Invocation" [1]
> > > 
> > > <quote>
> > > However, if the callback function communicates to other CPUs, for example,
> > > doing a wakeup, then it is that function's responsibility to maintain
> > > ordering. For example, if the callback function wakes up a task that runs on
> > > some other CPU, proper ordering must in place in both the callback function
> > > and the task being awakened. To see why this is important, consider the top
> > > half of the grace-period cleanup diagram. The callback might be running on a
> > > CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
> > > that is to run on a CPU corresponding to the rightmost leaf rcu_node
> > > structure, and the grace-period kernel thread might not yet have reached the
> > > rightmost leaf. In this case, the grace period's memory ordering might not
> > > yet have reached that CPU, so again the callback function and the awakened
> > > task must supply proper ordering.
> > > </quote>
> > > 
> > > I believe this text is for non-nocb but if we apply that to the nocb case,
> > > lets see what happens.
> > > 
> > > In the litmus, he rcu_advance_cbs() happened on P1, however the callback is
> > > executing on P2. That sounds very similar to the non-nocb world described in
> > > the text where a callback tries to wake something up on a different CPU and
> > > needs to take care of all the ordering.
> > > 
> > > So unless I'm missing something (quite possible), P2 must see the update to
> > > gpnum as well. However, per your limus test, the only thing P2  does is
> > > acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.
> > 
> > Because P1 writes cb_ready under nocb_lock, and P2 reads cb_ready under
> > nocb_lock as well and if P2 read P1's write, then we know the serialized
> > order of locking is P1 first (i.e. the spin_lock(nocb_lock) on P2 reads
> > from the spin_unlock(nocb_lock) on P1), in other words:
> > 
> > (fact #1)
> > 
> > 	unlock(nocb_lock) // on P1
> > 	->rfe
> > 	lock(nocb_lock) // on P2
> > 
> > so if P1 reads P0's write on gpnum
> > 
> > (assumption #1)
> > 
> > 	W(gpnum)=1 // on P0
> > 	->rfe
> > 	R(gpnum)=1 // on P1
> > 
> > and we have
> > 
> > (fact #2)
> > 
> > 	R(gpnum)=1 // on P1
> > 	->(po; [UL])
> > 	unlock(nocb_lock) // on P1
> > 
> > combine them you get
> > 
> > 	W(gpnum)=1 // on P0
> > 	->rfe           // fact #1
> > 	->(po; [UL])    // fact #2
> > 	->rfe           // assumption #1
> > 	lock(nocb_lock) // on P2
> > 	->([LKR]; po)
> > 	M // any access on P2 after spin_lock(nocb_lock);
> > 
> > so
> > 	W(gpnum)=1 // on P0
> > 	->rfe ->po-unlock-lock-po
> > 	M // on P2
> > 
> > and po-unlock-lock-po is A-culum, hence "->rfe ->po-unlock-lock-po" or
> > "rfe; po-unlock-lock-po" is culum-fence, hence it's a ->prop, which
> > means the write of gpnum on P0 propagates to P2 before any memory
> > accesses after spin_lock(nocb_lock)?
> 
> You and Frederic are right. I confirmed this by running herd7 as well.
> 
> Also he adds a ->co between P2 and P3, so that's why the
> smp_mb__after_lock_unlock() helps to keep the propogation intact. Its pretty
> much the R-pattern extended across 4 CPUs.
> 
> We should probably document these in the RCU memory ordering docs.

I never have heard of R showing up in real code before, so that is
worth a few words in the documentation as well.  ;-)
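
[For readers who have not met the name: the "R" pattern is, roughly, the
two-CPU test below, reproduced from memory of
tools/memory-model/litmus-tests/R+fencembonceonces.litmus, so treat the exact
file name and shape as approximate. Its exists clause is forbidden, but only
because both CPUs use smp_mb():]

	C R+fencembonceonces

	{}

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		smp_mb();
		WRITE_ONCE(*y, 1);
	}

	P1(int *x, int *y)
	{
		int r0;

		WRITE_ONCE(*y, 2);
		smp_mb();
		r0 = READ_ONCE(*x);
	}

	exists (y=2 /\ 1:r0=0)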

							Thanx, Paul

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush
  2023-09-09  1:07   ` Joel Fernandes
@ 2023-09-10 19:48     ` Frederic Weisbecker
  0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-10 19:48 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E . McKenney, LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay,
	Boqun Feng

Le Fri, Sep 08, 2023 at 09:07:42PM -0400, Joel Fernandes a écrit :
> While at it, could we replace jiffies_lazy_flush with
> rcu_get_jiffies_lazy_flush() in the below? That way all users are
> going through the API.
> 
> With that,
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> thanks,

Makes sense, will do, thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-10  4:09       ` Joel Fernandes
  2023-09-10 10:22         ` Paul E. McKenney
@ 2023-09-10 20:17         ` Frederic Weisbecker
  2023-09-10 20:29           ` Frederic Weisbecker
  1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-10 20:17 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Boqun Feng, Paul E . McKenney, LKML, rcu, Uladzislau Rezki,
	Neeraj Upadhyay

Le Sun, Sep 10, 2023 at 12:09:23AM -0400, Joel Fernandes a écrit :
> On Sat, Sep 09, 2023 at 11:22:48AM -0700, Boqun Feng wrote:
> > On Sat, Sep 09, 2023 at 04:31:25AM +0000, Joel Fernandes wrote:
> > > On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> > > > A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> > > > to order grace period completion with callbacks execution.
> > > > 
> > > > However these two events are already ordered by the
> > > > smp_mb__after_unlock_lock() barrier within the call to
> > > > raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> > > > happen.
> > > > 
> > > > The following litmus test shows the kind of guarantee that this barrier
> > > > provides:
> > > > 
> > > > 	C smp_mb__after_unlock_lock
> > > > 
> > > > 	{}
> > > > 
> > > > 	// rcu_gp_cleanup()
> > > > 	P0(spinlock_t *rnp_lock, int *gpnum)
> > > > 	{
> > > > 		// Grace period cleanup increase gp sequence number
> > > > 		spin_lock(rnp_lock);
> > > > 		WRITE_ONCE(*gpnum, 1);
> > > > 		spin_unlock(rnp_lock);
> > > > 	}
> > > > 
> > > > 	// nocb_gp_wait()
> > > > 	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> > > > 	{
> > > > 		int r1;
> > > > 
> > > > 		// Call rcu_advance_cbs() from nocb_gp_wait()
> > > > 		spin_lock(nocb_lock);
> > > > 		spin_lock(rnp_lock);
> > > > 		smp_mb__after_unlock_lock();
> > > > 		r1 = READ_ONCE(*gpnum);
> > > > 		WRITE_ONCE(*cb_ready, 1);
> > > > 		spin_unlock(rnp_lock);
> > > > 		spin_unlock(nocb_lock);
> > > > 	}
> > > > 
> > > > 	// nocb_cb_wait()
> > > > 	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> > > > 	{
> > > > 		int r2;
> > > > 
> > > > 		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> > > > 		spin_lock(nocb_lock);
> > > > 		r2 = READ_ONCE(*cb_ready);
> > > > 		spin_unlock(nocb_lock);
> > > > 
> > > > 		// Actual callback execution
> > > > 		WRITE_ONCE(*cb_executed, 1);
> > > 
> > > So related to this something in the docs caught my attention under "Callback
> > > Invocation" [1]
> > > 
> > > <quote>
> > > However, if the callback function communicates to other CPUs, for example,
> > > doing a wakeup, then it is that function's responsibility to maintain
> > > ordering. For example, if the callback function wakes up a task that runs on
> > > some other CPU, proper ordering must in place in both the callback function
> > > and the task being awakened. To see why this is important, consider the top
> > > half of the grace-period cleanup diagram. The callback might be running on a
> > > CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
> > > that is to run on a CPU corresponding to the rightmost leaf rcu_node
> > > structure, and the grace-period kernel thread might not yet have reached the
> > > rightmost leaf. In this case, the grace period's memory ordering might not
> > > yet have reached that CPU, so again the callback function and the awakened
> > > task must supply proper ordering.
> > > </quote>
> > > 
> > > I believe this text is for non-nocb but if we apply that to the nocb case,
> > > lets see what happens.
> > > 
> > > In the litmus, he rcu_advance_cbs() happened on P1, however the callback is
> > > executing on P2. That sounds very similar to the non-nocb world described in
> > > the text where a callback tries to wake something up on a different CPU and
> > > needs to take care of all the ordering.
> > > 
> > > So unless I'm missing something (quite possible), P2 must see the update to
> > > gpnum as well. However, per your limus test, the only thing P2  does is
> > > acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.
> > 
> > Because P1 writes cb_ready under nocb_lock, and P2 reads cb_ready under
> > nocb_lock as well and if P2 read P1's write, then we know the serialized
> > order of locking is P1 first (i.e. the spin_lock(nocb_lock) on P2 reads
> > from the spin_unlock(nocb_lock) on P1), in other words:
> > 
> > (fact #1)
> > 
> > 	unlock(nocb_lock) // on P1
> > 	->rfe
> > 	lock(nocb_lock) // on P2
> > 
> > so if P1 reads P0's write on gpnum
> > 
> > (assumption #1)
> > 
> > 	W(gpnum)=1 // on P0
> > 	->rfe
> > 	R(gpnum)=1 // on P1
> > 
> > and we have
> > 
> > (fact #2)
> > 
> > 	R(gpnum)=1 // on P1
> > 	->(po; [UL])
> > 	unlock(nocb_lock) // on P1
> > 
> > combine them you get
> > 
> > 	W(gpnum)=1 // on P0
> > 	->rfe           // fact #1
> > 	->(po; [UL])    // fact #2
> > 	->rfe           // assumption #1
> > 	lock(nocb_lock) // on P2
> > 	->([LKR]; po)
> > 	M // any access on P2 after spin_lock(nocb_lock);
> > 
> > so
> > 	W(gpnum)=1 // on P0
> > 	->rfe ->po-unlock-lock-po
> > 	M // on P2
> > 
> > and po-unlock-lock-po is A-culum, hence "->rfe ->po-unlock-lock-po" or
> > "rfe; po-unlock-lock-po" is culum-fence, hence it's a ->prop, which
> > means the write of gpnum on P0 propagates to P2 before any memory
> > accesses after spin_lock(nocb_lock)?
> 
> You and Frederic are right. I confirmed this by running herd7 as well.
> 
> Also he adds a ->co between P2 and P3, so that's why the
> smp_mb__after_lock_unlock() helps to keep the propogation intact. Its pretty
> much the R-pattern extended across 4 CPUs.
> 
> We should probably document these in the RCU memory ordering docs.

I have to trust you guys on that; I haven't managed to spend time on
tools/memory-model/Documentation/explanation.txt yet. But I'm glad you sorted
it out.

> 
> thanks,
> 
>  - Joel
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing
  2023-09-10 20:17         ` Frederic Weisbecker
@ 2023-09-10 20:29           ` Frederic Weisbecker
  0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-10 20:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Boqun Feng, Paul E . McKenney, LKML, rcu, Uladzislau Rezki,
	Neeraj Upadhyay

Le Sun, Sep 10, 2023 at 10:17:15PM +0200, Frederic Weisbecker a écrit :
> Le Sun, Sep 10, 2023 at 12:09:23AM -0400, Joel Fernandes a écrit :
> > On Sat, Sep 09, 2023 at 11:22:48AM -0700, Boqun Feng wrote:
> > > On Sat, Sep 09, 2023 at 04:31:25AM +0000, Joel Fernandes wrote:
> > > > On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> > > > > A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> > > > > to order grace period completion with callbacks execution.
> > > > > 
> > > > > However these two events are already ordered by the
> > > > > smp_mb__after_unlock_lock() barrier within the call to
> > > > > raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> > > > > happen.
> > > > > 
> > > > > The following litmus test shows the kind of guarantee that this barrier
> > > > > provides:
> > > > > 
> > > > > 	C smp_mb__after_unlock_lock
> > > > > 
> > > > > 	{}
> > > > > 
> > > > > 	// rcu_gp_cleanup()
> > > > > 	P0(spinlock_t *rnp_lock, int *gpnum)
> > > > > 	{
> > > > > 		// Grace period cleanup increase gp sequence number
> > > > > 		spin_lock(rnp_lock);
> > > > > 		WRITE_ONCE(*gpnum, 1);
> > > > > 		spin_unlock(rnp_lock);
> > > > > 	}
> > > > > 
> > > > > 	// nocb_gp_wait()
> > > > > 	P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> > > > > 	{
> > > > > 		int r1;
> > > > > 
> > > > > 		// Call rcu_advance_cbs() from nocb_gp_wait()
> > > > > 		spin_lock(nocb_lock);
> > > > > 		spin_lock(rnp_lock);
> > > > > 		smp_mb__after_unlock_lock();
> > > > > 		r1 = READ_ONCE(*gpnum);
> > > > > 		WRITE_ONCE(*cb_ready, 1);
> > > > > 		spin_unlock(rnp_lock);
> > > > > 		spin_unlock(nocb_lock);
> > > > > 	}
> > > > > 
> > > > > 	// nocb_cb_wait()
> > > > > 	P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> > > > > 	{
> > > > > 		int r2;
> > > > > 
> > > > > 		// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> > > > > 		spin_lock(nocb_lock);
> > > > > 		r2 = READ_ONCE(*cb_ready);
> > > > > 		spin_unlock(nocb_lock);
> > > > > 
> > > > > 		// Actual callback execution
> > > > > 		WRITE_ONCE(*cb_executed, 1);
> > > > 
> > > > So related to this something in the docs caught my attention under "Callback
> > > > Invocation" [1]
> > > > 
> > > > <quote>
> > > > However, if the callback function communicates to other CPUs, for example,
> > > > doing a wakeup, then it is that function's responsibility to maintain
> > > > ordering. For example, if the callback function wakes up a task that runs on
> > > > some other CPU, proper ordering must in place in both the callback function
> > > > and the task being awakened. To see why this is important, consider the top
> > > > half of the grace-period cleanup diagram. The callback might be running on a
> > > > CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
> > > > that is to run on a CPU corresponding to the rightmost leaf rcu_node
> > > > structure, and the grace-period kernel thread might not yet have reached the
> > > > rightmost leaf. In this case, the grace period's memory ordering might not
> > > > yet have reached that CPU, so again the callback function and the awakened
> > > > task must supply proper ordering.
> > > > </quote>
> > > > 
> > > > I believe this text is for non-nocb but if we apply that to the nocb case,
> > > > lets see what happens.
> > > > 
> > > > In the litmus, he rcu_advance_cbs() happened on P1, however the callback is
> > > > executing on P2. That sounds very similar to the non-nocb world described in
> > > > the text where a callback tries to wake something up on a different CPU and
> > > > needs to take care of all the ordering.
> > > > 
> > > > So unless I'm missing something (quite possible), P2 must see the update to
> > > > gpnum as well. However, per your limus test, the only thing P2  does is
> > > > acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.
> > > 
> > > Because P1 writes cb_ready under nocb_lock, and P2 reads cb_ready under
> > > nocb_lock as well and if P2 read P1's write, then we know the serialized
> > > order of locking is P1 first (i.e. the spin_lock(nocb_lock) on P2 reads
> > > from the spin_unlock(nocb_lock) on P1), in other words:
> > > 
> > > (fact #1)
> > > 
> > > 	unlock(nocb_lock) // on P1
> > > 	->rfe
> > > 	lock(nocb_lock) // on P2
> > > 
> > > so if P1 reads P0's write on gpnum
> > > 
> > > (assumption #1)
> > > 
> > > 	W(gpnum)=1 // on P0
> > > 	->rfe
> > > 	R(gpnum)=1 // on P1
> > > 
> > > and we have
> > > 
> > > (fact #2)
> > > 
> > > 	R(gpnum)=1 // on P1
> > > 	->(po; [UL])
> > > 	unlock(nocb_lock) // on P1
> > > 
> > > combine them you get
> > > 
> > > 	W(gpnum)=1 // on P0
> > > 	->rfe           // fact #1
> > > 	->(po; [UL])    // fact #2
> > > 	->rfe           // assumption #1
> > > 	lock(nocb_lock) // on P2
> > > 	->([LKR]; po)
> > > 	M // any access on P2 after spin_lock(nocb_lock);
> > > 
> > > so
> > > 	W(gpnum)=1 // on P0
> > > 	->rfe ->po-unlock-lock-po
> > > 	M // on P2
> > > 
> > > and po-unlock-lock-po is A-culum, hence "->rfe ->po-unlock-lock-po" or
> > > "rfe; po-unlock-lock-po" is culum-fence, hence it's a ->prop, which
> > > means the write of gpnum on P0 propagates to P2 before any memory
> > > accesses after spin_lock(nocb_lock)?
> > 
> > You and Frederic are right. I confirmed this by running herd7 as well.
> > 
> > Also he adds a ->co between P2 and P3, so that's why the
> > smp_mb__after_lock_unlock() helps to keep the propogation intact. Its pretty
> > much the R-pattern extended across 4 CPUs.
> > 
> > We should probably document these in the RCU memory ordering docs.
> 
> I have to trust you on that guys, I haven't managed to spend time on
> tools/memory-model/Documentation/explanation.txt yet. But glad you sorted
> it out.

Oh, and I can indeed add a comment in rcu_do_batch() -> rcu_nocb_lock_irqsave()
to explain how the guaranteed ordering against grace period completion is
enforced. Will do on the next take.
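
[One possible shape for such a comment, purely as an illustration: the wording
below is not Frederic's, and the surrounding rcu_do_batch() code is reduced to
the single line being annotated.]

		rcu_nocb_lock_irqsave(rdp, flags);
		/*
		 * On offloaded rdp, this lock acquisition pairs with the
		 * ->nocb_lock release performed by nocb_gp_wait() after it has
		 * advanced callbacks: the resulting rfe; po-unlock-lock-po
		 * chain guarantees that the grace-period completion observed
		 * while advancing has propagated here before the done
		 * callbacks are extracted and invoked.
		 */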

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE
  2023-09-09  1:48   ` Joel Fernandes
  2023-09-09  1:50     ` Joel Fernandes
@ 2023-09-10 21:17     ` Frederic Weisbecker
  1 sibling, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2023-09-10 21:17 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E . McKenney, LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay,
	Boqun Feng

Le Fri, Sep 08, 2023 at 09:48:44PM -0400, Joel Fernandes a écrit :
> On Fri, Sep 8, 2023 at 4:36 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > The LOAD-ACQUIRE access performed on rdp->nocb_cb_sleep advertizes
> > ordering callback execution against grace period completion. However
> > this is contradicted by the following:
> >
> > * This LOAD-ACQUIRE doesn't pair with anything. The only counterpart
> >   barrier that can be found is the smp_mb() placed after callbacks
> >   advancing in nocb_gp_wait(). However the barrier is placed _after_
> >   ->nocb_cb_sleep write.
> 
> Hmm, on one side you have:
> 
> WRITE_ONCE(rdp->nocb_cb_sleep, false);
> smp_mb();
> swake_up_one(&rdp->nocb_cb_wq);   /* wakeup -- consider this to be a STORE */
> 
> And on another side you have:
> swait_event_interruptible_exclusive(rdp->nocb_cb_wq, ..cond..) /*
> consider this to be a LOAD */
> smp_load_acquire(&rdp->nocb_cb_sleep)
> /* exec CBs (LOAD operations) */
> 
> So there seems to be pairing AFAICS.

I must be confused; that would give the following pattern:

         WRITE X                LOAD Y
         smp_mb()
         WRITE Y                smp_load_acquire(X)

How does this pair?
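
[Rendering that two-column pattern as a litmus test makes the point concrete;
the variable names and the exists clause are illustrative, not from the thread.
Nothing on P1 orders the plain load of Y before the acquire load of X (the
acquire only orders that load against later accesses), so herd7 should report
the exists clause as reachable:]

	C mp-writer-mb-reader-acquire

	{}

	// WRITE X ; smp_mb() ; WRITE Y
	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		smp_mb();
		WRITE_ONCE(*y, 1);
	}

	// LOAD Y ; smp_load_acquire(X)
	P1(int *x, int *y)
	{
		int r0;
		int r1;

		r0 = READ_ONCE(*y);
		r1 = smp_load_acquire(x);
	}

	exists (1:r0=1 /\ 1:r1=0)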

> 
> But maybe you are referring to pairing between advancing the callbacks
> and storing to nocb_cb_sleep. In this case, the RELEASE of the nocb
> unlock operation just after advancing should be providing the
> ordering

Right.

> but we still need the acquire this patch deletes.

Why?

> 
> > * Callbacks can be concurrently advanced between the LOAD-ACQUIRE on
> >   ->nocb_cb_sleep and the call to rcu_segcblist_extract_done_cbs() in
> >   rcu_do_batch(), making any ordering based on ->nocb_cb_sleep broken.
> 
> If you don't mind, could you elaborate more?

So imagine:

1) Some callbacks are pending
2) A grace period completes, nocb_gp_wait() advances some callbacks to DONE and
   some others to WAIT, and another grace period starts to handle the latter.
3) Because some callbacks are ready to invoke, nocb_gp_wait() sets
   rdp->nocb_cb_sleep to false and wakes up nocb_cb_wait()
4) nocb_cb_wait() does smp_load_acquire(rdp->nocb_cb_sleep) and proceeds
   with rcu_do_batch() but it gets preempted right before.
5) The new grace period completes.
6) nocb_gp_wait() does one more round and advances the WAIT callbacks to the
   non-empty DONE segment. Also it doesn't need to wake up nocb_cb_wait(),
   since that one is already pending and ->nocb_cb_sleep is still false, but it
   forcibly writes ->nocb_cb_sleep to false again.
7) nocb_cb_wait() resumes and calls rcu_do_batch() without doing a new
   load-acquire on ->nocb_cb_sleep; this means the ordering only applies to the
   callbacks that were moved to DONE in step 2) but not to those moved to DONE
   in step 6).

> 
> > * Both rcu_segcblist_extract_done_cbs() and rcu_advance_cbs() are called
> >   under the nocb_lock, the latter hereby providing already the desired
> >   ACQUIRE semantics.
> 
> The acquire orders loads to nocb_cb_sleep with all later loads/stores.
> I am not sure how nocb_lock gives that same behavior since that's
> doing ACQUIRE on the lock access itself and not on nocb_cb_sleep
> access, I'd appreciate it if we can debate this out.

Well, the nocb_lock releases not only the write to nocb_cb_sleep but also
everything that precedes it. So it plays the same role and, most importantly,
it's acquired before calling rcu_segcblist_extract_done_cbs().

> 
> Every few months I need a memory-ordering workout so this can be that.
> ;-) You could be onto something.

No worries, I have some more headaches coming up on the plate for all of us ;-)

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it
  2023-09-08 20:35 ` [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
@ 2023-10-02 15:38   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-02 15:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng,
	Qiuxu Zhuo, Joel Fernandes

On Fri, Sep 08, 2023 at 10:35:54PM +0200, Frederic Weisbecker wrote:
> This makes the code more readable.
> 
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/rcu/rcu_segcblist.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index f71fac422c8f..1693ea22ef1b 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -368,7 +368,7 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
>  	smp_mb(); /* Ensure counts are updated before callback is entrained. */
>  	rhp->next = NULL;
>  	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
> -		if (rsclp->tails[i] != rsclp->tails[i - 1])
> +		if (!rcu_segcblist_segempty(rsclp, i))
>  			break;
>  	rcu_segcblist_inc_seglen(rsclp, i);
>  	WRITE_ONCE(*rsclp->tails[i], rhp);
> @@ -551,7 +551,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  	 * as their ->gp_seq[] grace-period completion sequence number.
>  	 */
>  	for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--)
> -		if (rsclp->tails[i] != rsclp->tails[i - 1] &&
> +		if (!rcu_segcblist_segempty(rsclp, i) &&
>  		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
>  			break;
>  
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/10] rcu: Assume IRQS disabled from rcu_report_dead()
  2023-09-08 20:35 ` [PATCH 05/10] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
@ 2023-10-02 15:41   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-02 15:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng, Joel Fernandes

On Fri, Sep 08, 2023 at 10:35:58PM +0200, Frederic Weisbecker wrote:
> rcu_report_dead() is the last RCU word from the CPU down through the
> hotplug path. It is called in the idle loop right before the CPU shuts
> down for good. Because it removes the CPU from the grace period state
> machine and reports an ultimate quiescent state if necessary, no further
> use of RCU is allowed. Therefore it is expected that IRQs are disabled
> upon calling this function and are not to be re-enabled again until the
> CPU shuts down.
> 
> Remove the IRQs disablement from that function and verify instead that
> it is actually called with IRQs disabled as it is expected at that
> special point in the idle path.
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/rcu/tree.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a83ecab77917..8b5ebef32e17 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4553,11 +4553,16 @@ void rcu_cpu_starting(unsigned int cpu)
>   */
>  void rcu_report_dead(unsigned int cpu)
>  {
> -	unsigned long flags, seq_flags;
> +	unsigned long flags;
>  	unsigned long mask;
>  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
>  
> +	/*
> +	 * IRQS must be disabled from now on and until the CPU dies, or an interrupt
> +	 * may introduce a new READ-side while it is actually off the QS masks.
> +	 */
> +	lockdep_assert_irqs_disabled();
>  	// Do any dangling deferred wakeups.
>  	do_nocb_deferred_wakeup(rdp);
>  
> @@ -4565,7 +4570,6 @@ void rcu_report_dead(unsigned int cpu)
>  
>  	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
>  	mask = rdp->grpmask;
> -	local_irq_save(seq_flags);
>  	arch_spin_lock(&rcu_state.ofl_lock);
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
>  	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> @@ -4579,8 +4583,6 @@ void rcu_report_dead(unsigned int cpu)
>  	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	arch_spin_unlock(&rcu_state.ofl_lock);
> -	local_irq_restore(seq_flags);
> -
>  	rdp->cpu_started = false;
>  }
>  
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/10] rcu: Assume rcu_report_dead() is always called locally
  2023-09-08 20:35 ` [PATCH 06/10] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
@ 2023-10-02 15:45   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-02 15:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng, Joel Fernandes

On Fri, Sep 08, 2023 at 10:35:59PM +0200, Frederic Weisbecker wrote:
> rcu_report_dead() has to be called locally by the CPU that is going to
> exit the RCU state machine. Passing a cpu argument here is error-prone
> and leaves the possibility for a racy remote call.
> 
> Use local access instead.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

I was going to ask for an assertion for "cpu" in cpu_die_early(), but
given that its value comes from smp_processor_id() just a few lines
earlier, there isn't a whole lot of point to that.  So:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  arch/arm64/kernel/smp.c  | 2 +-
>  include/linux/rcupdate.h | 2 +-
>  kernel/cpu.c             | 2 +-
>  kernel/rcu/tree.c        | 4 ++--
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index edd63894d61e..ce672cb69f1c 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -401,7 +401,7 @@ void __noreturn cpu_die_early(void)
>  
>  	/* Mark this CPU absent */
>  	set_cpu_present(cpu, 0);
> -	rcu_report_dead(cpu);
> +	rcu_report_dead();
>  
>  	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>  		update_cpu_boot_status(CPU_KILL_ME);
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 5e5f920ade90..aa351ddcbe8d 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -122,7 +122,7 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
>  void rcu_init(void);
>  extern int rcu_scheduler_active;
>  void rcu_sched_clock_irq(int user);
> -void rcu_report_dead(unsigned int cpu);
> +void rcu_report_dead(void);
>  void rcutree_migrate_callbacks(int cpu);
>  
>  #ifdef CONFIG_TASKS_RCU_GENERIC
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 88a7ede322bd..86f08eafbd9f 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1368,7 +1368,7 @@ void cpuhp_report_idle_dead(void)
>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>  
>  	BUG_ON(st->state != CPUHP_AP_OFFLINE);
> -	rcu_report_dead(smp_processor_id());
> +	rcu_report_dead();
>  	st->state = CPUHP_AP_IDLE_DEAD;
>  	/*
>  	 * We cannot call complete after rcu_report_dead() so we delegate it
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8b5ebef32e17..289c51417cbc 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4551,11 +4551,11 @@ void rcu_cpu_starting(unsigned int cpu)
>   * from the outgoing CPU rather than from the cpuhp_step mechanism.
>   * This is because this function must be invoked at a precise location.
>   */
> -void rcu_report_dead(unsigned int cpu)
> +void rcu_report_dead(void)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
>  
>  	/*
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/10] rcu: Standardize explicit CPU-hotplug calls
  2023-09-08 20:36 ` [PATCH 08/10] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
@ 2023-10-02 15:47   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-02 15:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng, Joel Fernandes

On Fri, Sep 08, 2023 at 10:36:01PM +0200, Frederic Weisbecker wrote:
> rcu_report_dead() and rcutree_migrate_callbacks() have their headers in
> rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug
> functions.
> 
> Also rcu_cpu_starting() and rcu_report_dead() have different naming
> conventions while they mirror each other's effects.
> 
> Fix the headers and propose a naming that relates both functions and
> aligns with the prefix of other rcutree CPU-hotplug functions.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  .../Expedited-Grace-Periods.rst                      |  2 +-
>  .../RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg    |  4 ++--
>  .../RCU/Design/Memory-Ordering/TreeRCU-gp.svg        |  4 ++--
>  .../RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg   |  4 ++--
>  .../RCU/Design/Requirements/Requirements.rst         |  4 ++--
>  arch/arm64/kernel/smp.c                              |  4 ++--
>  arch/powerpc/kernel/smp.c                            |  2 +-
>  arch/s390/kernel/smp.c                               |  2 +-
>  arch/x86/kernel/smpboot.c                            |  2 +-
>  include/linux/interrupt.h                            |  2 +-
>  include/linux/rcupdate.h                             |  2 --
>  include/linux/rcutiny.h                              |  2 +-
>  include/linux/rcutree.h                              |  7 ++++++-
>  kernel/cpu.c                                         |  6 +++---
>  kernel/rcu/tree.c                                    | 12 ++++++++----
>  15 files changed, 33 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
> index 93d899d53258..414f8a2012d6 100644
> --- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
> +++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
> @@ -181,7 +181,7 @@ operations is carried out at several levels:
>     of this wait (or series of waits, as the case may be) is to permit a
>     concurrent CPU-hotplug operation to complete.
>  #. In the case of RCU-sched, one of the last acts of an outgoing CPU is
> -   to invoke ``rcu_report_dead()``, which reports a quiescent state for
> +   to invoke ``rcutree_report_cpu_dead()``, which reports a quiescent state for
>     that CPU. However, this is likely paranoia-induced redundancy.
>  
>  +-----------------------------------------------------------------------+
> diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
> index 7ddc094d7f28..d82a77d03d8c 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
> +++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
> @@ -1135,7 +1135,7 @@
>         font-weight="bold"
>         font-size="192"
>         id="text202-7-5-3-27-6-5"
> -       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
> +       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
>      <text
>         xml:space="preserve"
>         x="3745.7725"
> @@ -1256,7 +1256,7 @@
>         font-style="normal"
>         y="3679.27"
>         x="-3804.9949"
> -       xml:space="preserve">rcu_cpu_starting()</text>
> +       xml:space="preserve">rcutree_report_cpu_starting()</text>
>      <g
>         style="fill:none;stroke-width:0.025in"
>         id="g3107-7-5-0"
> diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
> index 069f6f8371c2..6e690a3be161 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
> +++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
> @@ -3274,7 +3274,7 @@
>           font-weight="bold"
>           font-size="192"
>           id="text202-7-5-3-27-6-5"
> -         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
> +         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
>        <text
>           xml:space="preserve"
>           x="3745.7725"
> @@ -3395,7 +3395,7 @@
>           font-style="normal"
>           y="3679.27"
>           x="-3804.9949"
> -         xml:space="preserve">rcu_cpu_starting()</text>
> +         xml:space="preserve">rcutree_report_cpu_starting()</text>
>        <g
>           style="fill:none;stroke-width:0.025in"
>           id="g3107-7-5-0"
> diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
> index 2c9310ba29ba..4fa7506082bf 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
> +++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
> @@ -607,7 +607,7 @@
>         font-weight="bold"
>         font-size="192"
>         id="text202-7-5-3-27-6"
> -       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
> +       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
>      <text
>         xml:space="preserve"
>         x="3745.7725"
> @@ -728,7 +728,7 @@
>         font-style="normal"
>         y="3679.27"
>         x="-3804.9949"
> -       xml:space="preserve">rcu_cpu_starting()</text>
> +       xml:space="preserve">rcutree_report_cpu_starting()</text>
>      <g
>         style="fill:none;stroke-width:0.025in"
>         id="g3107-7-5-0"
> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
> index f3b605285a87..cccafdaa1f84 100644
> --- a/Documentation/RCU/Design/Requirements/Requirements.rst
> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst
> @@ -1955,12 +1955,12 @@ if offline CPUs block an RCU grace period for too long.
>  
>  An offline CPU's quiescent state will be reported either:
>  
> -1.  As the CPU goes offline using RCU's hotplug notifier (rcu_report_dead()).
> +1.  As the CPU goes offline using RCU's hotplug notifier (rcutree_report_cpu_dead()).
>  2.  When grace period initialization (rcu_gp_init()) detects a
>      race either with CPU offlining or with a task unblocking on a leaf
>      ``rcu_node`` structure whose CPUs are all offline.
>  
> -The CPU-online path (rcu_cpu_starting()) should never need to report
> +The CPU-online path (rcutree_report_cpu_starting()) should never need to report
>  a quiescent state for an offline CPU.  However, as a debugging measure,
>  it does emit a warning if a quiescent state was not already reported
>  for that CPU.
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index ce672cb69f1c..90c0083438ea 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -215,7 +215,7 @@ asmlinkage notrace void secondary_start_kernel(void)
>  	if (system_uses_irq_prio_masking())
>  		init_gic_priority_masking();
>  
> -	rcu_cpu_starting(cpu);
> +	rcutree_report_cpu_starting(cpu);
>  	trace_hardirqs_off();
>  
>  	/*
> @@ -401,7 +401,7 @@ void __noreturn cpu_die_early(void)
>  
>  	/* Mark this CPU absent */
>  	set_cpu_present(cpu, 0);
> -	rcu_report_dead();
> +	rcutree_report_cpu_dead();
>  
>  	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>  		update_cpu_boot_status(CPU_KILL_ME);
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index fbbb695bae3d..a132e3290e98 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1619,7 +1619,7 @@ void start_secondary(void *unused)
>  
>  	smp_store_cpu_info(cpu);
>  	set_dec(tb_ticks_per_jiffy);
> -	rcu_cpu_starting(cpu);
> +	rcutree_report_cpu_starting(cpu);
>  	cpu_callin_map[cpu] = 1;
>  
>  	if (smp_ops->setup_cpu)
> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
> index f9a2b755f510..3e39a5e1bf48 100644
> --- a/arch/s390/kernel/smp.c
> +++ b/arch/s390/kernel/smp.c
> @@ -894,7 +894,7 @@ static void smp_start_secondary(void *cpuvoid)
>  	S390_lowcore.restart_flags = 0;
>  	restore_access_regs(S390_lowcore.access_regs_save_area);
>  	cpu_init();
> -	rcu_cpu_starting(cpu);
> +	rcutree_report_cpu_starting(cpu);
>  	init_cpu_timer();
>  	vtime_init();
>  	vdso_getcpu_init();
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index e1aa2cd7734b..d25b952a2b91 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -288,7 +288,7 @@ static void notrace start_secondary(void *unused)
>  
>  	cpu_init();
>  	fpu__init_cpu();
> -	rcu_cpu_starting(raw_smp_processor_id());
> +	rcutree_report_cpu_starting(raw_smp_processor_id());
>  	x86_cpuinit.early_percpu_clock_init();
>  
>  	ap_starting();
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a92bce40b04b..d05e1e9a553c 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -566,7 +566,7 @@ enum
>   *
>   * _ RCU:
>   * 	1) rcutree_migrate_callbacks() migrates the queue.
> - * 	2) rcu_report_dead() reports the final quiescent states.
> + * 	2) rcutree_report_cpu_dead() reports the final quiescent states.
>   *
>   * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
>   */
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index aa351ddcbe8d..f7206b2623c9 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -122,8 +122,6 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
>  void rcu_init(void);
>  extern int rcu_scheduler_active;
>  void rcu_sched_clock_irq(int user);
> -void rcu_report_dead(void);
> -void rcutree_migrate_callbacks(int cpu);
>  
>  #ifdef CONFIG_TASKS_RCU_GENERIC
>  void rcu_init_tasks_generic(void);
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 7f17acf29dda..04889a9602e7 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -169,6 +169,6 @@ static inline void rcu_all_qs(void) { barrier(); }
>  #define rcutree_offline_cpu      NULL
>  #define rcutree_dead_cpu         NULL
>  #define rcutree_dying_cpu        NULL
> -static inline void rcu_cpu_starting(unsigned int cpu) { }
> +static inline void rcutree_report_cpu_starting(unsigned int cpu) { }
>  
>  #endif /* __LINUX_RCUTINY_H */
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 7d75066c72aa..07d0fc1e0d31 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -109,7 +109,7 @@ void rcu_all_qs(void);
>  /* RCUtree hotplug events */
>  int rcutree_prepare_cpu(unsigned int cpu);
>  int rcutree_online_cpu(unsigned int cpu);
> -void rcu_cpu_starting(unsigned int cpu);
> +void rcutree_report_cpu_starting(unsigned int cpu);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  int rcutree_dead_cpu(unsigned int cpu);
> @@ -121,4 +121,9 @@ int rcutree_offline_cpu(unsigned int cpu);
>  #define rcutree_offline_cpu NULL
>  #endif
>  
> +void rcutree_migrate_callbacks(int cpu);
> +
> +/* Called from hotplug and also arm64 early secondary boot failure */
> +void rcutree_report_cpu_dead(void);
> +
>  #endif /* __LINUX_RCUTREE_H */
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 86f08eafbd9f..a41a6fff3c91 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1368,10 +1368,10 @@ void cpuhp_report_idle_dead(void)
>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>  
>  	BUG_ON(st->state != CPUHP_AP_OFFLINE);
> -	rcu_report_dead();
> +	rcutree_report_cpu_dead();
>  	st->state = CPUHP_AP_IDLE_DEAD;
>  	/*
> -	 * We cannot call complete after rcu_report_dead() so we delegate it
> +	 * We cannot call complete after rcutree_report_cpu_dead() so we delegate it
>  	 * to an online cpu.
>  	 */
>  	smp_call_function_single(cpumask_first(cpu_online_mask),
> @@ -1575,7 +1575,7 @@ void notify_cpu_starting(unsigned int cpu)
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
>  
> -	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
> +	rcutree_report_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
>  	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
>  
>  	/*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 875f241db508..5698c3f30b1d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4207,7 +4207,7 @@ bool rcu_lockdep_current_cpu_online(void)
>  	rdp = this_cpu_ptr(&rcu_data);
>  	/*
>  	 * Strictly, we care here about the case where the current CPU is
> -	 * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
> +	 * in rcutree_report_cpu_starting() and thus has an excuse for rdp->grpmask
>  	 * not being up to date. So arch_spin_is_locked() might have a
>  	 * false positive if it's held by some *other* CPU, but that's
>  	 * OK because that just means a false *negative* on the warning.
> @@ -4436,8 +4436,10 @@ int rcutree_online_cpu(unsigned int cpu)
>   * from the incoming CPU rather than from the cpuhp_step mechanism.
>   * This is because this function must be invoked at a precise location.
>   * This incoming CPU must not have enabled interrupts yet.
> + *
> + * This mirrors the effects of rcutree_report_cpu_dead().
>   */
> -void rcu_cpu_starting(unsigned int cpu)
> +void rcutree_report_cpu_starting(unsigned int cpu)
>  {
>  	unsigned long mask;
>  	struct rcu_data *rdp;
> @@ -4491,8 +4493,10 @@ void rcu_cpu_starting(unsigned int cpu)
>   * Note that this function is special in that it is invoked directly
>   * from the outgoing CPU rather than from the cpuhp_step mechanism.
>   * This is because this function must be invoked at a precise location.
> + *
> + * This mirrors the effect of rcutree_report_cpu_starting().
>   */
> -void rcu_report_dead(void)
> +void rcutree_report_cpu_dead(void)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> @@ -5063,7 +5067,7 @@ void __init rcu_init(void)
>  	pm_notifier(rcu_pm_notify, 0);
>  	WARN_ON(num_online_cpus() > 1); // Only one CPU this early in boot.
>  	rcutree_prepare_cpu(cpu);
> -	rcu_cpu_starting(cpu);
> +	rcutree_report_cpu_starting(cpu);
>  	rcutree_online_cpu(cpu);
>  
>  	/* Create workqueue for Tree SRCU and for expedited GPs. */
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread
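
Taken together, the arch hunks above show the pairing that this rename makes explicit: an incoming CPU announces itself to RCU before it enables interrupts, and an outgoing CPU reports its final quiescent state on the way down. A minimal sketch of that pairing, assuming hypothetical arch hooks (only the two rcutree_*() calls are real kernel symbols):

/* Hypothetical secondary-CPU bring-up hook (arch details elided). */
void secondary_bringup_sketch(unsigned int cpu)
{
	/* The incoming CPU must not have enabled interrupts yet. */
	rcutree_report_cpu_starting(cpu);	/* grace periods now wait on this CPU */

	/* ... the rest of bring-up may use RCU from here on ... */
}

/* Hypothetical hook on the dying CPU (or on arm64 early secondary boot failure). */
void dying_cpu_sketch(void)
{
	/* ... final work of the outgoing CPU ... */
	rcutree_report_cpu_dead();		/* report the final quiescent state */
}

The rcutree_ prefix also lines these two calls up with the other rcutree_*_cpu() hotplug entry points, which is the point of the standardization patch.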

* Re: [PATCH 10/10] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  2023-09-08 20:36 ` [PATCH 10/10] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
@ 2023-10-02 15:48   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-02 15:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng, Joel Fernandes

On Fri, Sep 08, 2023 at 10:36:03PM +0200, Frederic Weisbecker wrote:
> The callbacks migration is performed through an explicit call from
> the hotplug control CPU right after the death of the target CPU and
> before proceeding with the CPUHP_ teardown functions.
> 
> This is unusual but necessary, and yet it is uncommented. Summarize the reason
> as explained in the changelog of:
> 
> 	a58163d8ca2c (rcu: Migrate callbacks earlier in the CPU-offline timeline)
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/cpu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a41a6fff3c91..b135bb481be1 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1352,7 +1352,14 @@ static int takedown_cpu(unsigned int cpu)
>  	cpuhp_bp_sync_dead(cpu);
>  
>  	tick_cleanup_dead_cpu(cpu);
> +
> +	/*
> +	 * Callbacks must be re-integrated right away to the RCU state machine.
> +	 * Otherwise an RCU callback could block a further teardown function
> +	 * waiting for its completion.
> +	 */
>  	rcutree_migrate_callbacks(cpu);
> +
>  	return 0;
>  }
>  
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread
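
The comment added above amounts to an ordering constraint on the hotplug control CPU: the dead CPU's callbacks must be re-homed before any further teardown work runs. A rough sketch of that ordering, assuming a hypothetical wait helper (only rcutree_migrate_callbacks() is a real kernel symbol here):

/* Illustrative only: models the takedown ordering on the control CPU. */
static int takedown_cpu_sketch(unsigned int cpu)
{
	wait_until_cpu_is_dead(cpu);	/* hypothetical: the target CPU has fully died */

	/*
	 * Re-home the dead CPU's callbacks into the RCU state machine right
	 * away: a later teardown step may wait on an RCU callback, and a
	 * callback still queued on the dead CPU would otherwise never run.
	 */
	rcutree_migrate_callbacks(cpu);

	/*
	 * Only now may the remaining CPUHP_* teardown steps (including
	 * CPUHP_RCUTREE_PREP) proceed.
	 */
	return 0;
}

This is also why the migration is an explicit call from takedown_cpu() rather than something left to the CPUHP_RCUTREE_PREP teardown callback, which would run too late to unblock such a waiter.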

* Re: [PATCH 09/10] rcu: Remove references to rcu_migrate_callbacks() from diagrams
  2023-09-08 20:36 ` [PATCH 09/10] rcu: Remove references to rcu_migrate_callbacks() from diagrams Frederic Weisbecker
@ 2023-10-02 15:52   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-02 15:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng, Joel Fernandes

On Fri, Sep 08, 2023 at 10:36:02PM +0200, Frederic Weisbecker wrote:
> This function is gone since:
> 
> 	53b46303da84 (rcu: Remove rsp parameter from rcu_boot_init_percpu_data() and friends)
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  .../Design/Memory-Ordering/TreeRCU-callback-registry.svg | 9 ---------
>  Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg  | 9 ---------
>  2 files changed, 18 deletions(-)
> 
> diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
> index 7ac6f9269806..63eff867175a 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
> +++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
> @@ -564,15 +564,6 @@
>         font-size="192"
>         id="text202-7-9-6"
>         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
> -    <text
> -       xml:space="preserve"
> -       x="8335.4873"
> -       y="5357.1006"
> -       font-style="normal"
> -       font-weight="bold"
> -       font-size="192"
> -       id="text202-7-9-6-0"
> -       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
>      <text
>         xml:space="preserve"
>         x="8768.4678"
> diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
> index 6e690a3be161..53e0dc2a2c79 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
> +++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
> @@ -1446,15 +1446,6 @@
>         font-size="192"
>         id="text202-7-9-6"
>         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
> -    <text
> -       xml:space="preserve"
> -       x="8335.4873"
> -       y="5357.1006"
> -       font-style="normal"
> -       font-weight="bold"
> -       font-size="192"
> -       id="text202-7-9-6-0"
> -       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
>      <text
>         xml:space="preserve"
>         x="8768.4678"
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 07/10] rcu: Conditionally build CPU-hotplug teardown callbacks
  2023-09-08 20:36 ` [PATCH 07/10] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
@ 2023-10-04 16:57   ` Paul E. McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul E. McKenney @ 2023-10-04 16:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, rcu, Uladzislau Rezki, Neeraj Upadhyay, Boqun Feng, Joel Fernandes

On Fri, Sep 08, 2023 at 10:36:00PM +0200, Frederic Weisbecker wrote:
> Among the three CPU-hotplug teardown RCU callbacks, two of them exit
> early if CONFIG_HOTPLUG_CPU=n, and one is left unchanged. In any case,
> all of them have an implementation even when CONFIG_HOTPLUG_CPU=n.
> 
> Align instead with the common way to deal with CPU-hotplug teardown
> callbacks and provide a proper stub when they are not supported.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Good eyes!

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  include/linux/rcutree.h |  11 +++-
>  kernel/rcu/tree.c       | 114 +++++++++++++++++++---------------------
>  2 files changed, 63 insertions(+), 62 deletions(-)
> 
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index af6ddbd291eb..7d75066c72aa 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -109,9 +109,16 @@ void rcu_all_qs(void);
>  /* RCUtree hotplug events */
>  int rcutree_prepare_cpu(unsigned int cpu);
>  int rcutree_online_cpu(unsigned int cpu);
> -int rcutree_offline_cpu(unsigned int cpu);
> +void rcu_cpu_starting(unsigned int cpu);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
>  int rcutree_dead_cpu(unsigned int cpu);
>  int rcutree_dying_cpu(unsigned int cpu);
> -void rcu_cpu_starting(unsigned int cpu);
> +int rcutree_offline_cpu(unsigned int cpu);
> +#else
> +#define rcutree_dead_cpu NULL
> +#define rcutree_dying_cpu NULL
> +#define rcutree_offline_cpu NULL
> +#endif
>  
>  #endif /* __LINUX_RCUTREE_H */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 289c51417cbc..875f241db508 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4228,25 +4228,6 @@ static bool rcu_init_invoked(void)
>  	return !!rcu_state.n_online_cpus;
>  }
>  
> -/*
> - * Near the end of the offline process.  Trace the fact that this CPU
> - * is going offline.
> - */
> -int rcutree_dying_cpu(unsigned int cpu)
> -{
> -	bool blkd;
> -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> -	struct rcu_node *rnp = rdp->mynode;
> -
> -	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> -		return 0;
> -
> -	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
> -	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> -			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
> -	return 0;
> -}
> -
>  /*
>   * All CPUs for the specified rcu_node structure have gone offline,
>   * and all tasks that were preempted within an RCU read-side critical
> @@ -4292,23 +4273,6 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
>  	}
>  }
>  
> -/*
> - * The CPU has been completely removed, and some other CPU is reporting
> - * this fact from process context.  Do the remainder of the cleanup.
> - * There can only be one CPU hotplug operation at a time, so no need for
> - * explicit locking.
> - */
> -int rcutree_dead_cpu(unsigned int cpu)
> -{
> -	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> -		return 0;
> -
> -	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	// Stop-machine done, so allow nohz_full to disable tick.
> -	tick_dep_clear(TICK_DEP_BIT_RCU);
> -	return 0;
> -}
> -
>  /*
>   * Propagate ->qsinitmask bits up the rcu_node tree to account for the
>   * first CPU in a given leaf rcu_node structure coming online.  The caller
> @@ -4461,29 +4425,6 @@ int rcutree_online_cpu(unsigned int cpu)
>  	return 0;
>  }
>  
> -/*
> - * Near the beginning of the process.  The CPU is still very much alive
> - * with pretty much all services enabled.
> - */
> -int rcutree_offline_cpu(unsigned int cpu)
> -{
> -	unsigned long flags;
> -	struct rcu_data *rdp;
> -	struct rcu_node *rnp;
> -
> -	rdp = per_cpu_ptr(&rcu_data, cpu);
> -	rnp = rdp->mynode;
> -	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> -	rnp->ffmask &= ~rdp->grpmask;
> -	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> -
> -	rcutree_affinity_setting(cpu, cpu);
> -
> -	// nohz_full CPUs need the tick for stop-machine to work quickly
> -	tick_dep_set(TICK_DEP_BIT_RCU);
> -	return 0;
> -}
> -
>  /*
>   * Mark the specified CPU as being online so that subsequent grace periods
>   * (both expedited and normal) will wait on it.  Note that this means that
> @@ -4637,7 +4578,60 @@ void rcutree_migrate_callbacks(int cpu)
>  		  cpu, rcu_segcblist_n_cbs(&rdp->cblist),
>  		  rcu_segcblist_first_cb(&rdp->cblist));
>  }
> -#endif
> +
> +/*
> + * The CPU has been completely removed, and some other CPU is reporting
> + * this fact from process context.  Do the remainder of the cleanup.
> + * There can only be one CPU hotplug operation at a time, so no need for
> + * explicit locking.
> + */
> +int rcutree_dead_cpu(unsigned int cpu)
> +{
> +	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> +	// Stop-machine done, so allow nohz_full to disable tick.
> +	tick_dep_clear(TICK_DEP_BIT_RCU);
> +	return 0;
> +}
> +
> +/*
> + * Near the end of the offline process.  Trace the fact that this CPU
> + * is going offline.
> + */
> +int rcutree_dying_cpu(unsigned int cpu)
> +{
> +	bool blkd;
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +	struct rcu_node *rnp = rdp->mynode;
> +
> +	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
> +	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> +			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
> +	return 0;
> +}
> +
> +/*
> + * Near the beginning of the process.  The CPU is still very much alive
> + * with pretty much all services enabled.
> + */
> +int rcutree_offline_cpu(unsigned int cpu)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	struct rcu_node *rnp;
> +
> +	rdp = per_cpu_ptr(&rcu_data, cpu);
> +	rnp = rdp->mynode;
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	rnp->ffmask &= ~rdp->grpmask;
> +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +
> +	rcutree_affinity_setting(cpu, cpu);
> +
> +	// nohz_full CPUs need the tick for stop-machine to work quickly
> +	tick_dep_set(TICK_DEP_BIT_RCU);
> +	return 0;
> +}
> +#endif /* #ifdef CONFIG_HOTPLUG_CPU */
>  
>  /*
>   * On non-huge systems, use expedited RCU grace periods to make suspend
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread
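
The "common way" the changelog refers to is the usual stub pattern for optional CPU-hotplug callbacks, already visible in the rcutiny.h and rcutree.h hunks elsewhere in this series: declare the hooks only under CONFIG_HOTPLUG_CPU and let them collapse to NULL otherwise, so registration sites can name them unconditionally. A generic sketch of that convention (the mysub_* names are made up for illustration):

#ifdef CONFIG_HOTPLUG_CPU
int mysub_dead_cpu(unsigned int cpu);
int mysub_dying_cpu(unsigned int cpu);
int mysub_offline_cpu(unsigned int cpu);
#else
#define mysub_dead_cpu		NULL
#define mysub_dying_cpu		NULL
#define mysub_offline_cpu	NULL
#endif

With the NULL stubs in place, the registration site needs no #ifdef of its own, since the CPU-hotplug core treats a NULL callback as "nothing to do".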

end of thread (newest message: 2023-10-04 16:57 UTC)

Thread overview: 29+ messages
2023-09-08 20:35 [PATCH 00/10] rcu cleanups Frederic Weisbecker
2023-09-08 20:35 ` [PATCH 01/10] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
2023-10-02 15:38   ` Paul E. McKenney
2023-09-08 20:35 ` [PATCH 02/10] rcu: Rename jiffies_till_flush to jiffies_lazy_flush Frederic Weisbecker
2023-09-09  1:07   ` Joel Fernandes
2023-09-10 19:48     ` Frederic Weisbecker
2023-09-08 20:35 ` [PATCH 03/10] rcu/nocb: Remove needless LOAD-ACQUIRE Frederic Weisbecker
2023-09-09  1:48   ` Joel Fernandes
2023-09-09  1:50     ` Joel Fernandes
2023-09-10 21:17     ` Frederic Weisbecker
2023-09-08 20:35 ` [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing Frederic Weisbecker
2023-09-09  4:31   ` Joel Fernandes
2023-09-09 18:22     ` Boqun Feng
2023-09-10  4:09       ` Joel Fernandes
2023-09-10 10:22         ` Paul E. McKenney
2023-09-10 20:17         ` Frederic Weisbecker
2023-09-10 20:29           ` Frederic Weisbecker
2023-09-08 20:35 ` [PATCH 05/10] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
2023-10-02 15:41   ` Paul E. McKenney
2023-09-08 20:35 ` [PATCH 06/10] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
2023-10-02 15:45   ` Paul E. McKenney
2023-09-08 20:36 ` [PATCH 07/10] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
2023-10-04 16:57   ` Paul E. McKenney
2023-09-08 20:36 ` [PATCH 08/10] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
2023-10-02 15:47   ` Paul E. McKenney
2023-09-08 20:36 ` [PATCH 09/10] rcu: Remove references to rcu_migrate_callbacks() from diagrams Frederic Weisbecker
2023-10-02 15:52   ` Paul E. McKenney
2023-09-08 20:36 ` [PATCH 10/10] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
2023-10-02 15:48   ` Paul E. McKenney
