[PATCH tip/core/rcu 0/18] Expedited grace-period improvements for 4.4

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH tip/core/rcu 0/18] Expedited grace-period improvements for 4.4
@ 2015-10-06 16:29 Paul E. McKenney
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani

Hello!

This commit continues the effort to reduce the OS jitter from RCU's
expedited grace-period primitives, while also loosening the coupling
between CPU hotplug and RCU's expedited grace-period primitives:

1.	Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq to
	enable later code consolidation.

2.	Move rcu_report_exp_rnp() to allow later code consolidation.

3.	Consolidate combining-tree bitmaks setup for the initialization
	portion of synchronize_rcu_expedited().

4.	Use single-stage IPI algorithm for preemptible-RCU expedited
	grace periods.

5.	Make synchronize_sched_expedited() use the combining tree
	to reduce memory contention when waiting for quiescent states.

6.	Rename ->qs_pending to ->core_needs_qs to better match this
	field's use.

7.	Invert ->passed_quiesce and rename to ->cpu_no_qs in order to
	enable later aggregate-OR for requests for normal and expedited
	grace periods.

8.	Make ->cpu_no_qs be a union for aggregate OR.

9.	Switch synchronize_sched_expedited() from stop-CPUs to IPI.

10.	Stop silencing lockdep false positive for expedited grace periods,
	given that synchronize_rcu_expedited() no longer invokes
	synchronize_sched_expedited(), eliminating the apparent deadlock.
	(Just for the record, there never was a real deadlock.)

11.	Stop excluding CPU hotplug in synchronize_sched_expedited().

12.	Remove try_get_online_cpus(), which is now no longer used.

13.	Bring sync_sched_exp_select_cpus() into alignment with
	sync_rcu_exp_select_cpus() as a first step towards consolidating
	them into one function.

14.	Consolidate expedited CPU selection, now that #13 enabled it.

15.	Add online/offline information to expedited stall warning message.

16.	Dump blocking tasks in expedited stall-warning messages.

17.	Enable stall warnings for synchronize_rcu_expedited().

18.	Improve synchronize_sched_expedited() CPU-hotplug handling.

							Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/RCU/trace.txt |   32 +-
 b/include/linux/cpu.h         |    2 
 b/include/linux/sched.h       |   10 
 b/kernel/cpu.c                |   13 
 b/kernel/rcu/tree.c           |  561 ++++++++++++++++++++++++++++++----------
 b/kernel/rcu/tree.h           |   50 ++-
 b/kernel/rcu/tree_plugin.h    |  579 ++++++++++++++++++++++--------------------
 b/kernel/rcu/tree_trace.c     |   10 
 8 files changed, 782 insertions(+), 475 deletions(-)


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq
  2015-10-06 16:29 [PATCH tip/core/rcu 0/18] Expedited grace-period improvements for 4.4 Paul E. McKenney
@ 2015-10-06 16:29 ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation Paul E. McKenney
                     ` (16 more replies)
  0 siblings, 17 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
there is no need for the sync_rcu_preempt_exp_wq global variable.  This
commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
It also initializes ->expedited_wq only once at boot instead of at the
start of each expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 2 +-
 kernel/rcu/tree_plugin.h | 6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 775d36cc0050..53d66ebb4811 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3556,7 +3556,6 @@ void synchronize_sched_expedited(void)
 	rcu_exp_gp_seq_start(rsp);
 
 	/* Stop each CPU that is online, non-idle, and not us. */
-	init_waitqueue_head(&rsp->expedited_wq);
 	atomic_set(&rsp->expedited_need_qs, 1); /* Extra count avoids race. */
 	for_each_online_cpu(cpu) {
 		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
@@ -4179,6 +4178,7 @@ static void __init rcu_init_one(struct rcu_state *rsp,
 	}
 
 	init_waitqueue_head(&rsp->gp_wq);
+	init_waitqueue_head(&rsp->expedited_wq);
 	rnp = rsp->level[rcu_num_lvls - 1];
 	for_each_possible_cpu(i) {
 		while (i > rnp->grphi)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b2bf3963a0ae..72df006de798 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -535,8 +535,6 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
-
 /*
  * Return non-zero if there are any tasks in RCU read-side critical
  * sections blocking the current preemptible-RCU expedited grace period.
@@ -590,7 +588,7 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			if (wake) {
 				smp_mb(); /* EGP done before wake_up(). */
-				wake_up(&sync_rcu_preempt_exp_wq);
+				wake_up(&rsp->expedited_wq);
 			}
 			break;
 		}
@@ -729,7 +727,7 @@ void synchronize_rcu_expedited(void)
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-	wait_event(sync_rcu_preempt_exp_wq,
+	wait_event(rsp->expedited_wq,
 		   sync_rcu_preempt_exp_done(rnp));
 
 	/* Clean up and exit. */
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 20:29     ` Peter Zijlstra
  2015-10-06 16:29   ` [PATCH tip/core/rcu 03/18] rcu: Consolidate tree setup for synchronize_rcu_expedited() Paul E. McKenney
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
that later commits can make synchronize_sched_expedited() use them.
The non-code-movement portion of this commit tags rcu_report_exp_rnp()
as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree_plugin.h | 66 ------------------------------------------------
 2 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 53d66ebb4811..59af27d8bc6a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3379,6 +3379,72 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
 	return rcu_seq_done(&rsp->expedited_sequence, s);
 }
 
+/*
+ * Return non-zero if there are any tasks in RCU read-side critical
+ * sections blocking the current preemptible-RCU expedited grace period.
+ * If there is no preemptible-RCU expedited grace period currently in
+ * progress, returns zero unconditionally.
+ */
+static int rcu_preempted_readers_exp(struct rcu_node *rnp)
+{
+	return rnp->exp_tasks != NULL;
+}
+
+/*
+ * return non-zero if there is no RCU expedited grace period in progress
+ * for the specified rcu_node structure, in other words, if all CPUs and
+ * tasks covered by the specified rcu_node structure have done their bit
+ * for the current expedited grace period.  Works only for preemptible
+ * RCU -- other RCU implementation use other means.
+ *
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
+ */
+static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
+{
+	return !rcu_preempted_readers_exp(rnp) &&
+	       READ_ONCE(rnp->expmask) == 0;
+}
+
+/*
+ * Report the exit from RCU read-side critical section for the last task
+ * that queued itself during or before the current expedited preemptible-RCU
+ * grace period.  This event is reported either to the rcu_node structure on
+ * which the task was queued or to one of that rcu_node structure's ancestors,
+ * recursively up the tree.  (Calm down, calm down, we do the recursion
+ * iteratively!)
+ *
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
+ */
+static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
+					      struct rcu_node *rnp, bool wake)
+{
+	unsigned long flags;
+	unsigned long mask;
+
+	raw_spin_lock_irqsave(&rnp->lock, flags);
+	smp_mb__after_unlock_lock();
+	for (;;) {
+		if (!sync_rcu_preempt_exp_done(rnp)) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
+			break;
+		}
+		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
+			if (wake) {
+				smp_mb(); /* EGP done before wake_up(). */
+				wake_up(&rsp->expedited_wq);
+			}
+			break;
+		}
+		mask = rnp->grpmask;
+		raw_spin_unlock(&rnp->lock); /* irqs remain disabled */
+		rnp = rnp->parent;
+		raw_spin_lock(&rnp->lock); /* irqs already disabled */
+		smp_mb__after_unlock_lock();
+		rnp->expmask &= ~mask;
+	}
+}
+
 /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
 static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
 			       struct rcu_data *rdp,
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 72df006de798..e73be8539978 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -536,72 +536,6 @@ void synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 /*
- * Return non-zero if there are any tasks in RCU read-side critical
- * sections blocking the current preemptible-RCU expedited grace period.
- * If there is no preemptible-RCU expedited grace period currently in
- * progress, returns zero unconditionally.
- */
-static int rcu_preempted_readers_exp(struct rcu_node *rnp)
-{
-	return rnp->exp_tasks != NULL;
-}
-
-/*
- * return non-zero if there is no RCU expedited grace period in progress
- * for the specified rcu_node structure, in other words, if all CPUs and
- * tasks covered by the specified rcu_node structure have done their bit
- * for the current expedited grace period.  Works only for preemptible
- * RCU -- other RCU implementation use other means.
- *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
- */
-static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
-{
-	return !rcu_preempted_readers_exp(rnp) &&
-	       READ_ONCE(rnp->expmask) == 0;
-}
-
-/*
- * Report the exit from RCU read-side critical section for the last task
- * that queued itself during or before the current expedited preemptible-RCU
- * grace period.  This event is reported either to the rcu_node structure on
- * which the task was queued or to one of that rcu_node structure's ancestors,
- * recursively up the tree.  (Calm down, calm down, we do the recursion
- * iteratively!)
- *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
- */
-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
-			       bool wake)
-{
-	unsigned long flags;
-	unsigned long mask;
-
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
-	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp)) {
-			raw_spin_unlock_irqrestore(&rnp->lock, flags);
-			break;
-		}
-		if (rnp->parent == NULL) {
-			raw_spin_unlock_irqrestore(&rnp->lock, flags);
-			if (wake) {
-				smp_mb(); /* EGP done before wake_up(). */
-				wake_up(&rsp->expedited_wq);
-			}
-			break;
-		}
-		mask = rnp->grpmask;
-		raw_spin_unlock(&rnp->lock); /* irqs remain disabled */
-		rnp = rnp->parent;
-		raw_spin_lock(&rnp->lock); /* irqs already disabled */
-		smp_mb__after_unlock_lock();
-		rnp->expmask &= ~mask;
-	}
-}
-
-/*
  * Snapshot the tasks blocking the newly started preemptible-RCU expedited
  * grace period for the specified rcu_node structure, phase 1.  If there
  * are such tasks, set the ->expmask bits up the rcu_node tree and also
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 03/18] rcu: Consolidate tree setup for synchronize_rcu_expedited()
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit replaces sync_rcu_preempt_exp_init1(() and
sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
and sync_exp_reset_tree(), which will also be used by
synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
contains code specific to synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        |  86 ++++++++++++++++++++++++++++++++++++++-
 kernel/rcu/tree.h        |  17 +++++---
 kernel/rcu/tree_plugin.h | 102 +++++++++--------------------------------------
 3 files changed, 115 insertions(+), 90 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 59af27d8bc6a..8526896afea7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3380,6 +3380,87 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
 }
 
 /*
+ * Reset the ->expmaskinit values in the rcu_node tree to reflect any
+ * recent CPU-online activity.  Note that these masks are not cleared
+ * when CPUs go offline, so they reflect the union of all CPUs that have
+ * ever been online.  This means that this function normally takes its
+ * no-work-to-do fastpath.
+ */
+static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp)
+{
+	bool done;
+	unsigned long flags;
+	unsigned long mask;
+	unsigned long oldmask;
+	int ncpus = READ_ONCE(rsp->ncpus);
+	struct rcu_node *rnp;
+	struct rcu_node *rnp_up;
+
+	/* If no new CPUs onlined since last time, nothing to do. */
+	if (likely(ncpus == rsp->ncpus_snap))
+		return;
+	rsp->ncpus_snap = ncpus;
+
+	/*
+	 * Each pass through the following loop propagates newly onlined
+	 * CPUs for the current rcu_node structure up the rcu_node tree.
+	 */
+	rcu_for_each_leaf_node(rsp, rnp) {
+		raw_spin_lock_irqsave(&rnp->lock, flags);
+		smp_mb__after_unlock_lock();
+		if (rnp->expmaskinit == rnp->expmaskinitnext) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
+			continue;  /* No new CPUs, nothing to do. */
+		}
+
+		/* Update this node's mask, track old value for propagation. */
+		oldmask = rnp->expmaskinit;
+		rnp->expmaskinit = rnp->expmaskinitnext;
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+
+		/* If was already nonzero, nothing to propagate. */
+		if (oldmask)
+			continue;
+
+		/* Propagate the new CPU up the tree. */
+		mask = rnp->grpmask;
+		rnp_up = rnp->parent;
+		done = false;
+		while (rnp_up) {
+			raw_spin_lock_irqsave(&rnp_up->lock, flags);
+			smp_mb__after_unlock_lock();
+			if (rnp_up->expmaskinit)
+				done = true;
+			rnp_up->expmaskinit |= mask;
+			raw_spin_unlock_irqrestore(&rnp_up->lock, flags);
+			if (done)
+				break;
+			mask = rnp_up->grpmask;
+			rnp_up = rnp_up->parent;
+		}
+	}
+}
+
+/*
+ * Reset the ->expmask values in the rcu_node tree in preparation for
+ * a new expedited grace period.
+ */
+static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
+{
+	unsigned long flags;
+	struct rcu_node *rnp;
+
+	sync_exp_reset_tree_hotplug(rsp);
+	rcu_for_each_node_breadth_first(rsp, rnp) {
+		raw_spin_lock_irqsave(&rnp->lock, flags);
+		smp_mb__after_unlock_lock();
+		WARN_ON_ONCE(rnp->expmask);
+		rnp->expmask = rnp->expmaskinit;
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+	}
+}
+
+/*
  * Return non-zero if there are any tasks in RCU read-side critical
  * sections blocking the current preemptible-RCU expedited grace period.
  * If there is no preemptible-RCU expedited grace period currently in
@@ -3971,7 +4052,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave(&rnp->lock, flags);
-	rdp->beenonline = 1;	 /* We have now been online. */
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
@@ -3993,6 +4073,10 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
 	smp_mb__after_unlock_lock();
 	rnp->qsmaskinitnext |= mask;
+	rnp->expmaskinitnext |= mask;
+	if (!rdp->beenonline)
+		WRITE_ONCE(rsp->ncpus, READ_ONCE(rsp->ncpus) + 1);
+	rdp->beenonline = true;	 /* We have now been online. */
 	rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
 	rdp->completed = rnp->completed;
 	rdp->passed_quiesce = false;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2e991f8361e4..a57f25ecca58 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -171,16 +171,21 @@ struct rcu_node {
 				/*  an rcu_data structure, otherwise, each */
 				/*  bit corresponds to a child rcu_node */
 				/*  structure. */
-	unsigned long expmask;	/* Groups that have ->blkd_tasks */
-				/*  elements that need to drain to allow the */
-				/*  current expedited grace period to */
-				/*  complete (only for PREEMPT_RCU). */
 	unsigned long qsmaskinit;
-				/* Per-GP initial value for qsmask & expmask. */
+				/* Per-GP initial value for qsmask. */
 				/*  Initialized from ->qsmaskinitnext at the */
 				/*  beginning of each grace period. */
 	unsigned long qsmaskinitnext;
 				/* Online CPUs for next grace period. */
+	unsigned long expmask;	/* CPUs or groups that need to check in */
+				/*  to allow the current expedited GP */
+				/*  to complete. */
+	unsigned long expmaskinit;
+				/* Per-GP initial values for expmask. */
+				/*  Initialized from ->expmaskinitnext at the */
+				/*  beginning of each expedited GP. */
+	unsigned long expmaskinitnext;
+				/* Online CPUs for next expedited GP. */
 	unsigned long grpmask;	/* Mask to apply to parent qsmask. */
 				/*  Only one bit will be set in this mask. */
 	int	grplo;		/* lowest-numbered CPU or group here. */
@@ -466,6 +471,7 @@ struct rcu_state {
 	struct rcu_data __percpu *rda;		/* pointer of percu rcu_data. */
 	void (*call)(struct rcu_head *head,	/* call_rcu() flavor. */
 		     void (*func)(struct rcu_head *head));
+	int ncpus;				/* # CPUs seen so far. */
 
 	/* The following fields are guarded by the root rcu_node's lock. */
 
@@ -508,6 +514,7 @@ struct rcu_state {
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
 	atomic_t expedited_need_qs;		/* # CPUs left to check in. */
 	wait_queue_head_t expedited_wq;		/* Wait for check-ins. */
+	int ncpus_snap;				/* # CPUs seen last time. */
 
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
 						/*  force_quiescent_state(). */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e73be8539978..62d05413b7ba 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -536,86 +536,28 @@ void synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 /*
- * Snapshot the tasks blocking the newly started preemptible-RCU expedited
- * grace period for the specified rcu_node structure, phase 1.  If there
- * are such tasks, set the ->expmask bits up the rcu_node tree and also
- * set the ->expmask bits on the leaf rcu_node structures to tell phase 2
- * that work is needed here.
- *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
+ * Select the nodes that the upcoming expedited grace period needs
+ * to wait for.
  */
-static void
-sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
+static void sync_rcu_exp_select_nodes(struct rcu_state *rsp)
 {
 	unsigned long flags;
-	unsigned long mask;
-	struct rcu_node *rnp_up;
+	struct rcu_node *rnp;
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
-	WARN_ON_ONCE(rnp->expmask);
-	WARN_ON_ONCE(rnp->exp_tasks);
-	if (!rcu_preempt_has_tasks(rnp)) {
-		/* No blocked tasks, nothing to do. */
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		return;
-	}
-	/* Call for Phase 2 and propagate ->expmask bits up the tree. */
-	rnp->expmask = 1;
-	rnp_up = rnp;
-	while (rnp_up->parent) {
-		mask = rnp_up->grpmask;
-		rnp_up = rnp_up->parent;
-		if (rnp_up->expmask & mask)
-			break;
-		raw_spin_lock(&rnp_up->lock); /* irqs already off */
+	sync_exp_reset_tree(rsp);
+	rcu_for_each_leaf_node(rsp, rnp) {
+		raw_spin_lock_irqsave(&rnp->lock, flags);
 		smp_mb__after_unlock_lock();
-		rnp_up->expmask |= mask;
-		raw_spin_unlock(&rnp_up->lock); /* irqs still off */
-	}
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-}
-
-/*
- * Snapshot the tasks blocking the newly started preemptible-RCU expedited
- * grace period for the specified rcu_node structure, phase 2.  If the
- * leaf rcu_node structure has its ->expmask field set, check for tasks.
- * If there are some, clear ->expmask and set ->exp_tasks accordingly,
- * then initiate RCU priority boosting.  Otherwise, clear ->expmask and
- * invoke rcu_report_exp_rnp() to clear out the upper-level ->expmask bits,
- * enabling rcu_read_unlock_special() to do the bit-clearing.
- *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
- */
-static void
-sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
-	if (!rnp->expmask) {
-		/* Phase 1 didn't do anything, so Phase 2 doesn't either. */
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		return;
-	}
-
-	/* Phase 1 is over. */
-	rnp->expmask = 0;
-
-	/*
-	 * If there are still blocked tasks, set up ->exp_tasks so that
-	 * rcu_read_unlock_special() will wake us and then boost them.
-	 */
-	if (rcu_preempt_has_tasks(rnp)) {
-		rnp->exp_tasks = rnp->blkd_tasks.next;
-		rcu_initiate_boost(rnp, flags);  /* releases rnp->lock */
-		return;
+		rnp->expmask = 0; /* No per-CPU component yet. */
+		if (!rcu_preempt_has_tasks(rnp)) {
+			/* FIXME: Want __rcu_report_exp_rnp() here. */
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		} else {
+			rnp->exp_tasks = rnp->blkd_tasks.next;
+			rcu_initiate_boost(rnp, flags);
+		}
+		rcu_report_exp_rnp(rsp, rnp, false);
 	}
-
-	/* No longer any blocked tasks, so undo bit setting. */
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-	rcu_report_exp_rnp(rsp, rnp, false);
 }
 
 /**
@@ -648,16 +590,8 @@ void synchronize_rcu_expedited(void)
 	/* force all RCU readers onto ->blkd_tasks lists. */
 	synchronize_sched_expedited();
 
-	/*
-	 * Snapshot current state of ->blkd_tasks lists into ->expmask.
-	 * Phase 1 sets bits and phase 2 permits rcu_read_unlock_special()
-	 * to start clearing them.  Doing this in one phase leads to
-	 * strange races between setting and clearing bits, so just say "no"!
-	 */
-	rcu_for_each_leaf_node(rsp, rnp)
-		sync_rcu_preempt_exp_init1(rsp, rnp);
-	rcu_for_each_leaf_node(rsp, rnp)
-		sync_rcu_preempt_exp_init2(rsp, rnp);
+	/* Initialize the rcu_node tree in preparation for the wait. */
+	sync_rcu_exp_select_nodes(rsp);
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 03/18] rcu: Consolidate tree setup for synchronize_rcu_expedited() Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-07 13:24     ` Peter Zijlstra
                       ` (2 more replies)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 05/18] rcu: Move synchronize_sched_expedited() to combining tree Paul E. McKenney
                     ` (13 subsequent siblings)
  16 siblings, 3 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The current preemptible-RCU expedited grace-period algorithm invokes
synchronize_sched_expedited() to enqueue all tasks currently running
in a preemptible-RCU read-side critical section, then waits for all the
->blkd_tasks lists to drain.  This works, but results in both an IPI and
a double context switch even on CPUs that do not happen to be running
in a preemptible RCU read-side critical section.

This commit implements a new algorithm that causes less OS jitter.
This new algorithm IPIs all online CPUs that are not idle (from an
RCU perspective), but refrains from self-IPIs.  If a CPU receiving
this IPI is not in a preemptible RCU read-side critical section (or
is just now exiting one), it pushes quiescence up the rcu_node tree,
otherwise, it sets a flag that will be handled by the upcoming outermost
rcu_read_unlock(), which will then push quiescence up the tree.

The expedited grace period must of course wait on any pre-existing blocked
readers, and newly blocked readers must be queued carefully based on
the state of both the normal and the expedited grace periods.  This
new queueing approach also avoids the need to update boost state,
courtesy of the fact that blocked tasks are no longer ever migrated to
the root rcu_node structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/sched.h    |  10 +-
 kernel/rcu/tree.c        |  75 ++++++++----
 kernel/rcu/tree_plugin.h | 296 +++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 311 insertions(+), 70 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a4ab9daa387c..7fa8c4d372e7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1330,10 +1330,12 @@ struct sched_dl_entity {
 
 union rcu_special {
 	struct {
-		bool blocked;
-		bool need_qs;
-	} b;
-	short s;
+		u8 blocked;
+		u8 need_qs;
+		u8 exp_need_qs;
+		u8 pad;	/* Otherwise the compiler can store garbage here. */
+	} b; /* Bits. */
+	u32 s; /* Set of bits. */
 };
 struct rcu_node;
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8526896afea7..6e92bf4337bd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3461,18 +3461,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
 }
 
 /*
- * Return non-zero if there are any tasks in RCU read-side critical
- * sections blocking the current preemptible-RCU expedited grace period.
- * If there is no preemptible-RCU expedited grace period currently in
- * progress, returns zero unconditionally.
- */
-static int rcu_preempted_readers_exp(struct rcu_node *rnp)
-{
-	return rnp->exp_tasks != NULL;
-}
-
-/*
- * return non-zero if there is no RCU expedited grace period in progress
+ * Return non-zero if there is no RCU expedited grace period in progress
  * for the specified rcu_node structure, in other words, if all CPUs and
  * tasks covered by the specified rcu_node structure have done their bit
  * for the current expedited grace period.  Works only for preemptible
@@ -3482,7 +3471,7 @@ static int rcu_preempted_readers_exp(struct rcu_node *rnp)
  */
 static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
-	return !rcu_preempted_readers_exp(rnp) &&
+	return rnp->exp_tasks == NULL &&
 	       READ_ONCE(rnp->expmask) == 0;
 }
 
@@ -3494,19 +3483,21 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
  * recursively up the tree.  (Calm down, calm down, we do the recursion
  * iteratively!)
  *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
+ * Caller must hold the root rcu_node's exp_funnel_mutex and the
+ * specified rcu_node structure's ->lock.
  */
-static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
-					      struct rcu_node *rnp, bool wake)
+static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
+				 bool wake, unsigned long flags)
+	__releases(rnp->lock)
 {
-	unsigned long flags;
 	unsigned long mask;
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
 	for (;;) {
 		if (!sync_rcu_preempt_exp_done(rnp)) {
-			raw_spin_unlock_irqrestore(&rnp->lock, flags);
+			if (!rnp->expmask)
+				rcu_initiate_boost(rnp, flags);
+			else
+				raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			break;
 		}
 		if (rnp->parent == NULL) {
@@ -3522,10 +3513,54 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
 		rnp = rnp->parent;
 		raw_spin_lock(&rnp->lock); /* irqs already disabled */
 		smp_mb__after_unlock_lock();
+		WARN_ON_ONCE(!(rnp->expmask & mask));
 		rnp->expmask &= ~mask;
 	}
 }
 
+/*
+ * Report expedited quiescent state for specified node.  This is a
+ * lock-acquisition wrapper function for __rcu_report_exp_rnp().
+ *
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
+ */
+static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
+					      struct rcu_node *rnp, bool wake)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rnp->lock, flags);
+	smp_mb__after_unlock_lock();
+	__rcu_report_exp_rnp(rsp, rnp, wake, flags);
+}
+
+/*
+ * Report expedited quiescent state for multiple CPUs, all covered by the
+ * specified leaf rcu_node structure.  Caller must hold the root
+ * rcu_node's exp_funnel_mutex.
+ */
+static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
+				    unsigned long mask, bool wake)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rnp->lock, flags);
+	smp_mb__after_unlock_lock();
+	WARN_ON_ONCE((rnp->expmask & mask) != mask);
+	rnp->expmask &= ~mask;
+	__rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */
+}
+
+/*
+ * Report expedited quiescent state for specified rcu_data (CPU).
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
+ */
+static void __maybe_unused rcu_report_exp_rdp(struct rcu_state *rsp,
+					      struct rcu_data *rdp, bool wake)
+{
+	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
+}
+
 /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
 static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
 			       struct rcu_data *rdp,
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 62d05413b7ba..6f7500f9387c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -101,7 +101,6 @@ RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu);
 static struct rcu_state *const rcu_state_p = &rcu_preempt_state;
 static struct rcu_data __percpu *const rcu_data_p = &rcu_preempt_data;
 
-static int rcu_preempted_readers_exp(struct rcu_node *rnp);
 static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 			       bool wake);
 
@@ -114,6 +113,147 @@ static void __init rcu_bootup_announce(void)
 	rcu_bootup_announce_oddness();
 }
 
+/* Flags for rcu_preempt_ctxt_queue() decision table. */
+#define RCU_GP_TASKS	0x8
+#define RCU_EXP_TASKS	0x4
+#define RCU_GP_BLKD	0x2
+#define RCU_EXP_BLKD	0x1
+
+/*
+ * Queues a task preempted within an RCU-preempt read-side critical
+ * section into the appropriate location within the ->blkd_tasks list,
+ * depending on the states of any ongoing normal and expedited grace
+ * periods.  The ->gp_tasks pointer indicates which element the normal
+ * grace period is waiting on (NULL if none), and the ->exp_tasks pointer
+ * indicates which element the expedited grace period is waiting on (again,
+ * NULL if none).  If a grace period is waiting on a given element in the
+ * ->blkd_tasks list, it also waits on all subsequent elements.  Thus,
+ * adding a task to the tail of the list blocks any grace period that is
+ * already waiting on one of the elements.  In contrast, adding a task
+ * to the head of the list won't block any grace period that is already
+ * waiting on one of the elements.
+ *
+ * This queuing is imprecise, and can sometimes make an ongoing grace
+ * period wait for a task that is not strictly speaking blocking it.
+ * Given the choice, we needlessly block a normal grace period rather than
+ * blocking an expedited grace period.
+ *
+ * Note that an endless sequence of expedited grace periods still cannot
+ * indefinitely postpone a normal grace period.  Eventually, all of the
+ * fixed number of preempted tasks blocking the normal grace period that are
+ * not also blocking the expedited grace period will resume and complete
+ * their RCU read-side critical sections.  At that point, the ->gp_tasks
+ * pointer will equal the ->exp_tasks pointer, at which point the end of
+ * the corresponding expedited grace period will also be the end of the
+ * normal grace period.
+ */
+static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
+				   unsigned long flags) __releases(rnp->lock)
+{
+	int blkd_state = (rnp->gp_tasks ? RCU_GP_TASKS : 0) +
+			 (rnp->exp_tasks ? RCU_EXP_TASKS : 0) +
+			 (rnp->qsmask & rdp->grpmask ? RCU_GP_BLKD : 0) +
+			 (rnp->expmask & rdp->grpmask ? RCU_EXP_BLKD : 0);
+	struct task_struct *t = current;
+
+	/*
+	 * Decide where to queue the newly blocked task.  In theory,
+	 * this could be an if-statement.  In practice, when I tried
+	 * that, it was quite messy.
+	 */
+	switch (blkd_state) {
+	case 0:
+	case                RCU_EXP_TASKS:
+	case                RCU_EXP_TASKS + RCU_GP_BLKD:
+	case RCU_GP_TASKS:
+	case RCU_GP_TASKS + RCU_EXP_TASKS:
+
+		/*
+		 * Blocking neither GP, or first task blocking the normal
+		 * GP but not blocking the already-waiting expedited GP.
+		 * Queue at the head of the list to avoid unnecessarily
+		 * blocking the already-waiting GPs.
+		 */
+		list_add(&t->rcu_node_entry, &rnp->blkd_tasks);
+		break;
+
+	case                                              RCU_EXP_BLKD:
+	case                                RCU_GP_BLKD:
+	case                                RCU_GP_BLKD + RCU_EXP_BLKD:
+	case RCU_GP_TASKS +                               RCU_EXP_BLKD:
+	case RCU_GP_TASKS +                 RCU_GP_BLKD + RCU_EXP_BLKD:
+	case RCU_GP_TASKS + RCU_EXP_TASKS + RCU_GP_BLKD + RCU_EXP_BLKD:
+
+		/*
+		 * First task arriving that blocks either GP, or first task
+		 * arriving that blocks the expedited GP (with the normal
+		 * GP already waiting), or a task arriving that blocks
+		 * both GPs with both GPs already waiting.  Queue at the
+		 * tail of the list to avoid any GP waiting on any of the
+		 * already queued tasks that are not blocking it.
+		 */
+		list_add_tail(&t->rcu_node_entry, &rnp->blkd_tasks);
+		break;
+
+	case                RCU_EXP_TASKS +               RCU_EXP_BLKD:
+	case                RCU_EXP_TASKS + RCU_GP_BLKD + RCU_EXP_BLKD:
+	case RCU_GP_TASKS + RCU_EXP_TASKS +               RCU_EXP_BLKD:
+
+		/*
+		 * Second or subsequent task blocking the expedited GP.
+		 * The task either does not block the normal GP, or is the
+		 * first task blocking the normal GP.  Queue just after
+		 * the first task blocking the expedited GP.
+		 */
+		list_add(&t->rcu_node_entry, rnp->exp_tasks);
+		break;
+
+	case RCU_GP_TASKS +                 RCU_GP_BLKD:
+	case RCU_GP_TASKS + RCU_EXP_TASKS + RCU_GP_BLKD:
+
+		/*
+		 * Second or subsequent task blocking the normal GP.
+		 * The task does not block the expedited GP. Queue just
+		 * after the first task blocking the normal GP.
+		 */
+		list_add(&t->rcu_node_entry, rnp->gp_tasks);
+		break;
+
+	default:
+
+		/* Yet another exercise in excessive paranoia. */
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	/*
+	 * We have now queued the task.  If it was the first one to
+	 * block either grace period, update the ->gp_tasks and/or
+	 * ->exp_tasks pointers, respectively, to reference the newly
+	 * blocked tasks.
+	 */
+	if (!rnp->gp_tasks && (blkd_state & RCU_GP_BLKD))
+		rnp->gp_tasks = &t->rcu_node_entry;
+	if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
+		rnp->exp_tasks = &t->rcu_node_entry;
+	raw_spin_unlock(&rnp->lock);
+
+	/*
+	 * Report the quiescent state for the expedited GP.  This expedited
+	 * GP should not be able to end until we report, so there should be
+	 * no need to check for a subsequent expedited GP.  (Though we are
+	 * still in a quiescent state in any case.)
+	 */
+	if (blkd_state & RCU_EXP_BLKD &&
+	    t->rcu_read_unlock_special.b.exp_need_qs) {
+		t->rcu_read_unlock_special.b.exp_need_qs = false;
+		rcu_report_exp_rdp(rdp->rsp, rdp, true);
+	} else {
+		WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
+	}
+	local_irq_restore(flags);
+}
+
 /*
  * Record a preemptible-RCU quiescent state for the specified CPU.  Note
  * that this just means that the task currently running on the CPU is
@@ -167,42 +307,18 @@ static void rcu_preempt_note_context_switch(void)
 		t->rcu_blocked_node = rnp;
 
 		/*
-		 * If this CPU has already checked in, then this task
-		 * will hold up the next grace period rather than the
-		 * current grace period.  Queue the task accordingly.
-		 * If the task is queued for the current grace period
-		 * (i.e., this CPU has not yet passed through a quiescent
-		 * state for the current grace period), then as long
-		 * as that task remains queued, the current grace period
-		 * cannot end.  Note that there is some uncertainty as
-		 * to exactly when the current grace period started.
-		 * We take a conservative approach, which can result
-		 * in unnecessarily waiting on tasks that started very
-		 * slightly after the current grace period began.  C'est
-		 * la vie!!!
-		 *
-		 * But first, note that the current CPU must still be
-		 * on line!
+		 * Verify the CPU's sanity, trace the preemption, and
+		 * then queue the task as required based on the states
+		 * of any ongoing and expedited grace periods.
 		 */
 		WARN_ON_ONCE((rdp->grpmask & rcu_rnp_online_cpus(rnp)) == 0);
 		WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
-		if ((rnp->qsmask & rdp->grpmask) && rnp->gp_tasks != NULL) {
-			list_add(&t->rcu_node_entry, rnp->gp_tasks->prev);
-			rnp->gp_tasks = &t->rcu_node_entry;
-			if (IS_ENABLED(CONFIG_RCU_BOOST) &&
-			    rnp->boost_tasks != NULL)
-				rnp->boost_tasks = rnp->gp_tasks;
-		} else {
-			list_add(&t->rcu_node_entry, &rnp->blkd_tasks);
-			if (rnp->qsmask & rdp->grpmask)
-				rnp->gp_tasks = &t->rcu_node_entry;
-		}
 		trace_rcu_preempt_task(rdp->rsp->name,
 				       t->pid,
 				       (rnp->qsmask & rdp->grpmask)
 				       ? rnp->gpnum
 				       : rnp->gpnum + 1);
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		rcu_preempt_ctxt_queue(rnp, rdp, flags);
 	} else if (t->rcu_read_lock_nesting < 0 &&
 		   t->rcu_read_unlock_special.s) {
 
@@ -272,6 +388,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 	unsigned long flags;
 	struct list_head *np;
 	bool drop_boost_mutex = false;
+	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	union rcu_special special;
 
@@ -282,8 +399,8 @@ void rcu_read_unlock_special(struct task_struct *t)
 	local_irq_save(flags);
 
 	/*
-	 * If RCU core is waiting for this CPU to exit critical section,
-	 * let it know that we have done so.  Because irqs are disabled,
+	 * If RCU core is waiting for this CPU to exit its critical section,
+	 * report the fact that it has exited.  Because irqs are disabled,
 	 * t->rcu_read_unlock_special cannot change.
 	 */
 	special = t->rcu_read_unlock_special;
@@ -296,13 +413,32 @@ void rcu_read_unlock_special(struct task_struct *t)
 		}
 	}
 
+	/*
+	 * Respond to a request for an expedited grace period, but only if
+	 * we were not preempted, meaning that we were running on the same
+	 * CPU throughout.  If we were preempted, the exp_need_qs flag
+	 * would have been cleared at the time of the first preemption,
+	 * and the quiescent state would be reported when we were dequeued.
+	 */
+	if (special.b.exp_need_qs) {
+		WARN_ON_ONCE(special.b.blocked);
+		t->rcu_read_unlock_special.b.exp_need_qs = false;
+		rdp = this_cpu_ptr(rcu_state_p->rda);
+		rcu_report_exp_rdp(rcu_state_p, rdp, true);
+		if (!t->rcu_read_unlock_special.s) {
+			local_irq_restore(flags);
+			return;
+		}
+	}
+
 	/* Hardware IRQ handlers cannot block, complain if they get here. */
 	if (in_irq() || in_serving_softirq()) {
 		lockdep_rcu_suspicious(__FILE__, __LINE__,
 				       "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
-		pr_alert("->rcu_read_unlock_special: %#x (b: %d, nq: %d)\n",
+		pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
 			 t->rcu_read_unlock_special.s,
 			 t->rcu_read_unlock_special.b.blocked,
+			 t->rcu_read_unlock_special.b.exp_need_qs,
 			 t->rcu_read_unlock_special.b.need_qs);
 		local_irq_restore(flags);
 		return;
@@ -329,7 +465,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 			raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
 		}
 		empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
-		empty_exp = !rcu_preempted_readers_exp(rnp);
+		empty_exp = sync_rcu_preempt_exp_done(rnp);
 		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
 		np = rcu_next_node_entry(t, rnp);
 		list_del_init(&t->rcu_node_entry);
@@ -353,7 +489,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 		 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
 		 * so we must take a snapshot of the expedited state.
 		 */
-		empty_exp_now = !rcu_preempted_readers_exp(rnp);
+		empty_exp_now = sync_rcu_preempt_exp_done(rnp);
 		if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
 			trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
 							 rnp->gpnum,
@@ -536,27 +672,98 @@ void synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 /*
+ * Remote handler for smp_call_function_single().  If there is an
+ * RCU read-side critical section in effect, request that the
+ * next rcu_read_unlock() record the quiescent state up the
+ * ->expmask fields in the rcu_node tree.  Otherwise, immediately
+ * report the quiescent state.
+ */
+static void sync_rcu_exp_handler(void *info)
+{
+	struct rcu_data *rdp;
+	struct rcu_state *rsp = info;
+	struct task_struct *t = current;
+
+	/*
+	 * Within an RCU read-side critical section, request that the next
+	 * rcu_read_unlock() report.  Unless this RCU read-side critical
+	 * section has already blocked, in which case it is already set
+	 * up for the expedited grace period to wait on it.
+	 */
+	if (t->rcu_read_lock_nesting > 0 &&
+	    !t->rcu_read_unlock_special.b.blocked) {
+		t->rcu_read_unlock_special.b.exp_need_qs = true;
+		return;
+	}
+
+	/*
+	 * We are either exiting an RCU read-side critical section (negative
+	 * values of t->rcu_read_lock_nesting) or are not in one at all
+	 * (zero value of t->rcu_read_lock_nesting).  Or we are in an RCU
+	 * read-side critical section that blocked before this expedited
+	 * grace period started.  Either way, we can immediately report
+	 * the quiescent state.
+	 */
+	rdp = this_cpu_ptr(rsp->rda);
+	rcu_report_exp_rdp(rsp, rdp, true);
+}
+
+/*
  * Select the nodes that the upcoming expedited grace period needs
  * to wait for.
  */
-static void sync_rcu_exp_select_nodes(struct rcu_state *rsp)
+static void sync_rcu_exp_select_cpus(struct rcu_state *rsp)
 {
+	int cpu;
 	unsigned long flags;
+	unsigned long mask;
+	unsigned long mask_ofl_test;
+	unsigned long mask_ofl_ipi;
+	int ret;
 	struct rcu_node *rnp;
 
 	sync_exp_reset_tree(rsp);
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		smp_mb__after_unlock_lock();
-		rnp->expmask = 0; /* No per-CPU component yet. */
-		if (!rcu_preempt_has_tasks(rnp)) {
-			/* FIXME: Want __rcu_report_exp_rnp() here. */
-			raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		} else {
+
+		/* Each pass checks a CPU for identity, offline, and idle. */
+		mask_ofl_test = 0;
+		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+			if (raw_smp_processor_id() == cpu ||
+			    cpu_is_offline(cpu) ||
+			    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
+				mask_ofl_test |= rdp->grpmask;
+		}
+		mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
+
+		/*
+		 * Need to wait for any blocked tasks as well.  Note that
+		 * additional blocking tasks will also block the expedited
+		 * GP until such time as the ->expmask bits are cleared.
+		 */
+		if (rcu_preempt_has_tasks(rnp))
 			rnp->exp_tasks = rnp->blkd_tasks.next;
-			rcu_initiate_boost(rnp, flags);
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+
+		/* IPI the remaining CPUs for expedited quiescent state. */
+		mask = 1;
+		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			if (!(mask_ofl_ipi & mask))
+				continue;
+			ret = smp_call_function_single(cpu,
+						       sync_rcu_exp_handler,
+						       rsp, 0);
+			if (!ret)
+				mask_ofl_ipi &= ~mask;
 		}
-		rcu_report_exp_rnp(rsp, rnp, false);
+		/* Report quiescent states for those that went offline. */
+		mask_ofl_test |= mask_ofl_ipi;
+		if (mask_ofl_test)
+			rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false);
 	}
 }
 
@@ -587,11 +794,8 @@ void synchronize_rcu_expedited(void)
 
 	rcu_exp_gp_seq_start(rsp);
 
-	/* force all RCU readers onto ->blkd_tasks lists. */
-	synchronize_sched_expedited();
-
 	/* Initialize the rcu_node tree in preparation for the wait. */
-	sync_rcu_exp_select_nodes(rsp);
+	sync_rcu_exp_select_cpus(rsp);
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 05/18] rcu: Move synchronize_sched_expedited() to combining tree
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (2 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 06/18] rcu: Rename qs_pending to core_needs_qs Paul E. McKenney
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, synchronize_sched_expedited() uses a single global counter
to track the number of remaining context switches that the current
expedited grace period must wait on.  This is problematic on large
systems, where the resulting memory contention can be pathological.
This commit therefore makes synchronize_sched_expedited() instead use
the combining tree in the same manner as synchronize_rcu_expedited(),
keeping memory contention down to a dull roar.

This commit creates a temporary function sync_sched_exp_select_cpus()
that is very similar to sync_rcu_exp_select_cpus().  A later commit
will consolidate these two functions, which becomes possible when
synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
smp_call_function_single().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 123 ++++++++++++++++++++++++++++++++++++------------------
 kernel/rcu/tree.h |   1 -
 2 files changed, 82 insertions(+), 42 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6e92bf4337bd..d2cdcada6fe0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3642,19 +3642,77 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
 	struct rcu_data *rdp = data;
 	struct rcu_state *rsp = rdp->rsp;
 
-	/* We are here: If we are last, do the wakeup. */
-	rdp->exp_done = true;
-	if (atomic_dec_and_test(&rsp->expedited_need_qs))
-		wake_up(&rsp->expedited_wq);
+	/* Report the quiescent state. */
+	rcu_report_exp_rdp(rsp, rdp, true);
 	return 0;
 }
 
+/*
+ * Select the nodes that the upcoming expedited grace period needs
+ * to wait for.
+ */
+static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
+{
+	int cpu;
+	unsigned long flags;
+	unsigned long mask;
+	unsigned long mask_ofl_test;
+	unsigned long mask_ofl_ipi;
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+
+	sync_exp_reset_tree(rsp);
+	rcu_for_each_leaf_node(rsp, rnp) {
+		raw_spin_lock_irqsave(&rnp->lock, flags);
+		smp_mb__after_unlock_lock();
+
+		/* Each pass checks a CPU for identity, offline, and idle. */
+		mask_ofl_test = 0;
+		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+			if (raw_smp_processor_id() == cpu ||
+			    cpu_is_offline(cpu) ||
+			    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
+				mask_ofl_test |= rdp->grpmask;
+		}
+		mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
+
+		/*
+		 * Need to wait for any blocked tasks as well.  Note that
+		 * additional blocking tasks will also block the expedited
+		 * GP until such time as the ->expmask bits are cleared.
+		 */
+		if (rcu_preempt_has_tasks(rnp))
+			rnp->exp_tasks = rnp->blkd_tasks.next;
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+
+		/* IPI the remaining CPUs for expedited quiescent state. */
+		mask = 1;
+		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			if (!(mask_ofl_ipi & mask))
+				continue;
+			rdp = per_cpu_ptr(rsp->rda, cpu);
+			stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
+					    rdp, &rdp->exp_stop_work);
+			mask_ofl_ipi &= ~mask;
+		}
+		/* Report quiescent states for those that went offline. */
+		mask_ofl_test |= mask_ofl_ipi;
+		if (mask_ofl_test)
+			rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false);
+	}
+}
+
 static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 {
 	int cpu;
 	unsigned long jiffies_stall;
 	unsigned long jiffies_start;
-	struct rcu_data *rdp;
+	unsigned long mask;
+	struct rcu_node *rnp;
+	struct rcu_node *rnp_root = rcu_get_root(rsp);
 	int ret;
 
 	jiffies_stall = rcu_jiffies_till_stall_check();
@@ -3663,33 +3721,36 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	for (;;) {
 		ret = wait_event_interruptible_timeout(
 				rsp->expedited_wq,
-				!atomic_read(&rsp->expedited_need_qs),
+				sync_rcu_preempt_exp_done(rnp_root),
 				jiffies_stall);
 		if (ret > 0)
 			return;
 		if (ret < 0) {
 			/* Hit a signal, disable CPU stall warnings. */
 			wait_event(rsp->expedited_wq,
-				   !atomic_read(&rsp->expedited_need_qs));
+				   sync_rcu_preempt_exp_done(rnp_root));
 			return;
 		}
 		pr_err("INFO: %s detected expedited stalls on CPUs: {",
 		       rsp->name);
-		for_each_online_cpu(cpu) {
-			rdp = per_cpu_ptr(rsp->rda, cpu);
-
-			if (rdp->exp_done)
-				continue;
-			pr_cont(" %d", cpu);
+		rcu_for_each_leaf_node(rsp, rnp) {
+			mask = 1;
+			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+				if (!(rnp->expmask & mask))
+					continue;
+				pr_cont(" %d", cpu);
+			}
+			mask <<= 1;
 		}
 		pr_cont(" } %lu jiffies s: %lu\n",
 			jiffies - jiffies_start, rsp->expedited_sequence);
-		for_each_online_cpu(cpu) {
-			rdp = per_cpu_ptr(rsp->rda, cpu);
-
-			if (rdp->exp_done)
-				continue;
-			dump_cpu_task(cpu);
+		rcu_for_each_leaf_node(rsp, rnp) {
+			mask = 1;
+			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+				if (!(rnp->expmask & mask))
+					continue;
+				dump_cpu_task(cpu);
+			}
 		}
 		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
 	}
@@ -3713,7 +3774,6 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
  */
 void synchronize_sched_expedited(void)
 {
-	int cpu;
 	unsigned long s;
 	struct rcu_node *rnp;
 	struct rcu_state *rsp = &rcu_sched_state;
@@ -3736,27 +3796,8 @@ void synchronize_sched_expedited(void)
 	}
 
 	rcu_exp_gp_seq_start(rsp);
-
-	/* Stop each CPU that is online, non-idle, and not us. */
-	atomic_set(&rsp->expedited_need_qs, 1); /* Extra count avoids race. */
-	for_each_online_cpu(cpu) {
-		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
-		struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-		rdp->exp_done = false;
-
-		/* Skip our CPU and any idle CPUs. */
-		if (raw_smp_processor_id() == cpu ||
-		    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
-			continue;
-		atomic_inc(&rsp->expedited_need_qs);
-		stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
-				    rdp, &rdp->exp_stop_work);
-	}
-
-	/* Remove extra count and, if necessary, wait for CPUs to stop. */
-	if (!atomic_dec_and_test(&rsp->expedited_need_qs))
-		synchronize_sched_expedited_wait(rsp);
+	sync_sched_exp_select_cpus(rsp);
+	synchronize_sched_expedited_wait(rsp);
 
 	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp->exp_funnel_mutex);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index a57f25ecca58..efe361c764ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -383,7 +383,6 @@ struct rcu_data {
 	struct rcu_head oom_head;
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 	struct mutex exp_funnel_mutex;
-	bool exp_done;			/* Expedited QS for this CPU? */
 
 	/* 7) Callback offloading. */
 #ifdef CONFIG_RCU_NOCB_CPU
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 06/18] rcu: Rename qs_pending to core_needs_qs
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (3 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 05/18] rcu: Move synchronize_sched_expedited() to combining tree Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 07/18] rcu: Invert passed_quiesce and rename to cpu_no_qs Paul E. McKenney
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

An upcoming commit needs to invert the sense of the ->passed_quiesce
rcu_data structure field, so this commit is taking this opportunity
to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

So if !rdp->core_needs_qs, then this CPU need not concern itself with
quiescent states, in particular, it need not acquire its leaf rcu_node
structure's ->lock to check.  Otherwise, it needs to report the next
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 14 +++++++-------
 kernel/rcu/tree.h        |  4 ++--
 kernel/rcu/tree_plugin.h |  2 +-
 kernel/rcu/tree_trace.c  |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d2cdcada6fe0..7c158ffc7769 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1746,7 +1746,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
 		rdp->passed_quiesce = 0;
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
-		rdp->qs_pending = !!(rnp->qsmask & rdp->grpmask);
+		rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
 		zero_cpu_stall_ticks(rdp);
 		WRITE_ONCE(rdp->gpwrap, false);
 	}
@@ -2357,7 +2357,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 	if ((rnp->qsmask & mask) == 0) {
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	} else {
-		rdp->qs_pending = 0;
+		rdp->core_needs_qs = 0;
 
 		/*
 		 * This GP can't end until cpu checks in, so all of our
@@ -2388,7 +2388,7 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 	 * Does this CPU still need to do its part for current grace period?
 	 * If no, return and let the other CPUs do their part as well.
 	 */
-	if (!rdp->qs_pending)
+	if (!rdp->core_needs_qs)
 		return;
 
 	/*
@@ -3828,10 +3828,10 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
-	    rdp->qs_pending && !rdp->passed_quiesce &&
+	    rdp->core_needs_qs && !rdp->passed_quiesce &&
 	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) {
-		rdp->n_rp_qs_pending++;
-	} else if (rdp->qs_pending &&
+		rdp->n_rp_core_needs_qs++;
+	} else if (rdp->core_needs_qs &&
 		   (rdp->passed_quiesce ||
 		    rdp->rcu_qs_ctr_snap != __this_cpu_read(rcu_qs_ctr))) {
 		rdp->n_rp_report_qs++;
@@ -4157,7 +4157,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	rdp->completed = rnp->completed;
 	rdp->passed_quiesce = false;
 	rdp->rcu_qs_ctr_snap = per_cpu(rcu_qs_ctr, cpu);
-	rdp->qs_pending = false;
+	rdp->core_needs_qs = false;
 	trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index efe361c764ab..4a0f30676ba8 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -303,7 +303,7 @@ struct rcu_data {
 	unsigned long	rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
 					/*  for rcu_all_qs() invocations. */
 	bool		passed_quiesce;	/* User-mode/idle loop etc. */
-	bool		qs_pending;	/* Core waits for quiesc state. */
+	bool		core_needs_qs;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
 	bool		gpwrap;		/* Possible gpnum/completed wrap. */
 	struct rcu_node *mynode;	/* This CPU's leaf of hierarchy */
@@ -368,7 +368,7 @@ struct rcu_data {
 
 	/* 5) __rcu_pending() statistics. */
 	unsigned long n_rcu_pending;	/* rcu_pending() calls since boot. */
-	unsigned long n_rp_qs_pending;
+	unsigned long n_rp_core_needs_qs;
 	unsigned long n_rp_report_qs;
 	unsigned long n_rp_cb_ready;
 	unsigned long n_rp_cpu_needs_gp;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6f7500f9387c..e33b4f3b8e0a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -619,7 +619,7 @@ static void rcu_preempt_check_callbacks(void)
 		return;
 	}
 	if (t->rcu_read_lock_nesting > 0 &&
-	    __this_cpu_read(rcu_data_p->qs_pending) &&
+	    __this_cpu_read(rcu_data_p->core_needs_qs) &&
 	    !__this_cpu_read(rcu_data_p->passed_quiesce))
 		t->rcu_read_unlock_special.b.need_qs = true;
 }
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 6fc4c5ff3bb5..4ac25f8520d6 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -123,7 +123,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 		   ulong2long(rdp->completed), ulong2long(rdp->gpnum),
 		   rdp->passed_quiesce,
 		   rdp->rcu_qs_ctr_snap == per_cpu(rcu_qs_ctr, rdp->cpu),
-		   rdp->qs_pending);
+		   rdp->core_needs_qs);
 	seq_printf(m, " dt=%d/%llx/%d df=%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,
@@ -361,7 +361,7 @@ static void print_one_rcu_pending(struct seq_file *m, struct rcu_data *rdp)
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   rdp->n_rcu_pending);
 	seq_printf(m, "qsp=%ld rpq=%ld cbr=%ld cng=%ld ",
-		   rdp->n_rp_qs_pending,
+		   rdp->n_rp_core_needs_qs,
 		   rdp->n_rp_report_qs,
 		   rdp->n_rp_cb_ready,
 		   rdp->n_rp_cpu_needs_gp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 07/18] rcu: Invert passed_quiesce and rename to cpu_no_qs
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (4 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 06/18] rcu: Rename qs_pending to core_needs_qs Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 08/18] rcu: Make ->cpu_no_qs be a union for aggregate OR Paul E. McKenney
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs.  This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/trace.txt | 32 ++++++++++++++++----------------
 kernel/rcu/tree.c           | 22 +++++++++++-----------
 kernel/rcu/tree.h           |  2 +-
 kernel/rcu/tree_plugin.h    |  6 +++---
 kernel/rcu/tree_trace.c     |  4 ++--
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 97f17e9decda..ec6998b1b6d0 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -56,14 +56,14 @@ rcuboost:
 
 The output of "cat rcu/rcu_preempt/rcudata" looks as follows:
 
-  0!c=30455 g=30456 pq=1/0 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
-  1!c=30719 g=30720 pq=1/0 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
-  2!c=30150 g=30151 pq=1/1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
-  3 c=31249 g=31250 pq=1/1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
-  4!c=29502 g=29503 pq=1/0 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
-  5 c=31201 g=31202 pq=1/0 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
-  6!c=30253 g=30254 pq=1/0 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
-  7 c=31178 g=31178 pq=1/0 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
+  0!c=30455 g=30456 cnq=1/0:1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
+  1!c=30719 g=30720 cnq=1/0:0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
+  2!c=30150 g=30151 cnq=1/1:1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
+  3 c=31249 g=31250 cnq=1/1:0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
+  4!c=29502 g=29503 cnq=1/0:1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
+  5 c=31201 g=31202 cnq=1/0:1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
+  6!c=30253 g=30254 cnq=1/0:1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
+  7 c=31178 g=31178 cnq=1/0:0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
 
 This file has one line per CPU, or eight for this 8-CPU system.
 The fields are as follows:
@@ -188,14 +188,14 @@ o	"ca" is the number of RCU callbacks that have been adopted by this
 Kernels compiled with CONFIG_RCU_BOOST=y display the following from
 /debug/rcu/rcu_preempt/rcudata:
 
-  0!c=12865 g=12866 pq=1/0 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
-  1 c=14407 g=14408 pq=1/0 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
-  2 c=14407 g=14408 pq=1/0 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
-  3 c=14407 g=14408 pq=1/0 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
-  4 c=14405 g=14406 pq=1/0 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
-  5!c=14168 g=14169 pq=1/0 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
-  6 c=14404 g=14405 pq=1/0 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
-  7 c=14407 g=14408 pq=1/0 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
+  0!c=12865 g=12866 cnq=1/0:1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
+  1 c=14407 g=14408 cnq=1/0:0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
+  2 c=14407 g=14408 cnq=1/0:0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
+  3 c=14407 g=14408 cnq=1/0:0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
+  4 c=14405 g=14406 cnq=1/0:1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
+  5!c=14168 g=14169 cnq=1/0:0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
+  6 c=14404 g=14405 cnq=1/0:0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
+  7 c=14407 g=14408 cnq=1/0:1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
 
 This is similar to the output discussed above, but contains the following
 additional fields:
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7c158ffc7769..31e7021ced4d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -245,21 +245,21 @@ static int rcu_gp_in_progress(struct rcu_state *rsp)
  */
 void rcu_sched_qs(void)
 {
-	if (!__this_cpu_read(rcu_sched_data.passed_quiesce)) {
+	if (__this_cpu_read(rcu_sched_data.cpu_no_qs)) {
 		trace_rcu_grace_period(TPS("rcu_sched"),
 				       __this_cpu_read(rcu_sched_data.gpnum),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_sched_data.passed_quiesce, 1);
+		__this_cpu_write(rcu_sched_data.cpu_no_qs, false);
 	}
 }
 
 void rcu_bh_qs(void)
 {
-	if (!__this_cpu_read(rcu_bh_data.passed_quiesce)) {
+	if (__this_cpu_read(rcu_bh_data.cpu_no_qs)) {
 		trace_rcu_grace_period(TPS("rcu_bh"),
 				       __this_cpu_read(rcu_bh_data.gpnum),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_bh_data.passed_quiesce, 1);
+		__this_cpu_write(rcu_bh_data.cpu_no_qs, false);
 	}
 }
 
@@ -1744,7 +1744,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		 */
 		rdp->gpnum = rnp->gpnum;
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
-		rdp->passed_quiesce = 0;
+		rdp->cpu_no_qs = true;
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
 		zero_cpu_stall_ticks(rdp);
@@ -2337,7 +2337,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();
-	if ((rdp->passed_quiesce == 0 &&
+	if ((rdp->cpu_no_qs &&
 	     rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) ||
 	    rdp->gpnum != rnp->gpnum || rnp->completed == rnp->gpnum ||
 	    rdp->gpwrap) {
@@ -2348,7 +2348,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 		 * We will instead need a new quiescent state that lies
 		 * within the current grace period.
 		 */
-		rdp->passed_quiesce = 0;	/* need qs for new gp. */
+		rdp->cpu_no_qs = true;	/* need qs for new gp. */
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
@@ -2395,7 +2395,7 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 	 * Was there a quiescent state since the beginning of the grace
 	 * period? If no, then exit and wait for the next call.
 	 */
-	if (!rdp->passed_quiesce &&
+	if (rdp->cpu_no_qs &&
 	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr))
 		return;
 
@@ -3828,11 +3828,11 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
-	    rdp->core_needs_qs && !rdp->passed_quiesce &&
+	    rdp->core_needs_qs && rdp->cpu_no_qs &&
 	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) {
 		rdp->n_rp_core_needs_qs++;
 	} else if (rdp->core_needs_qs &&
-		   (rdp->passed_quiesce ||
+		   (!rdp->cpu_no_qs ||
 		    rdp->rcu_qs_ctr_snap != __this_cpu_read(rcu_qs_ctr))) {
 		rdp->n_rp_report_qs++;
 		return 1;
@@ -4155,7 +4155,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	rdp->beenonline = true;	 /* We have now been online. */
 	rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
 	rdp->completed = rnp->completed;
-	rdp->passed_quiesce = false;
+	rdp->cpu_no_qs = true;
 	rdp->rcu_qs_ctr_snap = per_cpu(rcu_qs_ctr, cpu);
 	rdp->core_needs_qs = false;
 	trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4a0f30676ba8..ded4ceebed76 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -302,7 +302,7 @@ struct rcu_data {
 					/*  is aware of having started. */
 	unsigned long	rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
 					/*  for rcu_all_qs() invocations. */
-	bool		passed_quiesce;	/* User-mode/idle loop etc. */
+	bool		cpu_no_qs;	/* No QS yet for this CPU. */
 	bool		core_needs_qs;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
 	bool		gpwrap;		/* Possible gpnum/completed wrap. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e33b4f3b8e0a..6977ff0dccb9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -265,11 +265,11 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
  */
 static void rcu_preempt_qs(void)
 {
-	if (!__this_cpu_read(rcu_data_p->passed_quiesce)) {
+	if (__this_cpu_read(rcu_data_p->cpu_no_qs)) {
 		trace_rcu_grace_period(TPS("rcu_preempt"),
 				       __this_cpu_read(rcu_data_p->gpnum),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_data_p->passed_quiesce, 1);
+		__this_cpu_write(rcu_data_p->cpu_no_qs, false);
 		barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */
 		current->rcu_read_unlock_special.b.need_qs = false;
 	}
@@ -620,7 +620,7 @@ static void rcu_preempt_check_callbacks(void)
 	}
 	if (t->rcu_read_lock_nesting > 0 &&
 	    __this_cpu_read(rcu_data_p->core_needs_qs) &&
-	    !__this_cpu_read(rcu_data_p->passed_quiesce))
+	    __this_cpu_read(rcu_data_p->cpu_no_qs))
 		t->rcu_read_unlock_special.b.need_qs = true;
 }
 
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 4ac25f8520d6..d373e57109b8 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -117,11 +117,11 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 
 	if (!rdp->beenonline)
 		return;
-	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d/%d qp=%d",
+	seq_printf(m, "%3d%cc=%ld g=%ld cnq=%d/%d:%d",
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   ulong2long(rdp->completed), ulong2long(rdp->gpnum),
-		   rdp->passed_quiesce,
+		   rdp->cpu_no_qs,
 		   rdp->rcu_qs_ctr_snap == per_cpu(rcu_qs_ctr, rdp->cpu),
 		   rdp->core_needs_qs);
 	seq_printf(m, " dt=%d/%llx/%d df=%lu",
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 08/18] rcu: Make ->cpu_no_qs be a union for aggregate OR
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (5 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 07/18] rcu: Invert passed_quiesce and rename to cpu_no_qs Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI Paul E. McKenney
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit converts the rcu_data structure's ->cpu_no_qs field
to a union.  The bytewise side of this union allows individual access
to indications as to whether this CPU needs to find a quiescent state
for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
side of the union allows testing whether or not a quiescent state is
needed at all, for either type of grace period.

For now, only .norm is used.  A later commit will introduce the expedited
usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 22 +++++++++++-----------
 kernel/rcu/tree.h        | 14 +++++++++++++-
 kernel/rcu/tree_plugin.h |  6 +++---
 kernel/rcu/tree_trace.c  |  2 +-
 4 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 31e7021ced4d..3e2875b38eae 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -245,21 +245,21 @@ static int rcu_gp_in_progress(struct rcu_state *rsp)
  */
 void rcu_sched_qs(void)
 {
-	if (__this_cpu_read(rcu_sched_data.cpu_no_qs)) {
+	if (__this_cpu_read(rcu_sched_data.cpu_no_qs.s)) {
 		trace_rcu_grace_period(TPS("rcu_sched"),
 				       __this_cpu_read(rcu_sched_data.gpnum),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_sched_data.cpu_no_qs, false);
+		__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
 	}
 }
 
 void rcu_bh_qs(void)
 {
-	if (__this_cpu_read(rcu_bh_data.cpu_no_qs)) {
+	if (__this_cpu_read(rcu_bh_data.cpu_no_qs.s)) {
 		trace_rcu_grace_period(TPS("rcu_bh"),
 				       __this_cpu_read(rcu_bh_data.gpnum),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_bh_data.cpu_no_qs, false);
+		__this_cpu_write(rcu_bh_data.cpu_no_qs.b.norm, false);
 	}
 }
 
@@ -1744,7 +1744,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		 */
 		rdp->gpnum = rnp->gpnum;
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
-		rdp->cpu_no_qs = true;
+		rdp->cpu_no_qs.b.norm = true;
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
 		zero_cpu_stall_ticks(rdp);
@@ -2337,7 +2337,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();
-	if ((rdp->cpu_no_qs &&
+	if ((rdp->cpu_no_qs.b.norm &&
 	     rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) ||
 	    rdp->gpnum != rnp->gpnum || rnp->completed == rnp->gpnum ||
 	    rdp->gpwrap) {
@@ -2348,7 +2348,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 		 * We will instead need a new quiescent state that lies
 		 * within the current grace period.
 		 */
-		rdp->cpu_no_qs = true;	/* need qs for new gp. */
+		rdp->cpu_no_qs.b.norm = true;	/* need qs for new gp. */
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
@@ -2395,7 +2395,7 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 	 * Was there a quiescent state since the beginning of the grace
 	 * period? If no, then exit and wait for the next call.
 	 */
-	if (rdp->cpu_no_qs &&
+	if (rdp->cpu_no_qs.b.norm &&
 	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr))
 		return;
 
@@ -3828,11 +3828,11 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
-	    rdp->core_needs_qs && rdp->cpu_no_qs &&
+	    rdp->core_needs_qs && rdp->cpu_no_qs.b.norm &&
 	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) {
 		rdp->n_rp_core_needs_qs++;
 	} else if (rdp->core_needs_qs &&
-		   (!rdp->cpu_no_qs ||
+		   (!rdp->cpu_no_qs.b.norm ||
 		    rdp->rcu_qs_ctr_snap != __this_cpu_read(rcu_qs_ctr))) {
 		rdp->n_rp_report_qs++;
 		return 1;
@@ -4155,7 +4155,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	rdp->beenonline = true;	 /* We have now been online. */
 	rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
 	rdp->completed = rnp->completed;
-	rdp->cpu_no_qs = true;
+	rdp->cpu_no_qs.b.norm = true;
 	rdp->rcu_qs_ctr_snap = per_cpu(rcu_qs_ctr, cpu);
 	rdp->core_needs_qs = false;
 	trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index ded4ceebed76..3eee48bcf52b 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -286,6 +286,18 @@ struct rcu_node {
 	for ((rnp) = (rsp)->level[rcu_num_lvls - 1]; \
 	     (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
 
+/*
+ * Union to allow "aggregate OR" operation on the need for a quiescent
+ * state by the normal and expedited grace periods.
+ */
+union rcu_noqs {
+	struct {
+		u8 norm;
+		u8 exp;
+	} b; /* Bits. */
+	u16 s; /* Set of bits, aggregate OR here. */
+};
+
 /* Index values for nxttail array in struct rcu_data. */
 #define RCU_DONE_TAIL		0	/* Also RCU_WAIT head. */
 #define RCU_WAIT_TAIL		1	/* Also RCU_NEXT_READY head. */
@@ -302,7 +314,7 @@ struct rcu_data {
 					/*  is aware of having started. */
 	unsigned long	rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
 					/*  for rcu_all_qs() invocations. */
-	bool		cpu_no_qs;	/* No QS yet for this CPU. */
+	union rcu_noqs	cpu_no_qs;	/* No QSes yet for this CPU. */
 	bool		core_needs_qs;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
 	bool		gpwrap;		/* Possible gpnum/completed wrap. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6977ff0dccb9..7880202f1e38 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -265,11 +265,11 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
  */
 static void rcu_preempt_qs(void)
 {
-	if (__this_cpu_read(rcu_data_p->cpu_no_qs)) {
+	if (__this_cpu_read(rcu_data_p->cpu_no_qs.s)) {
 		trace_rcu_grace_period(TPS("rcu_preempt"),
 				       __this_cpu_read(rcu_data_p->gpnum),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_data_p->cpu_no_qs, false);
+		__this_cpu_write(rcu_data_p->cpu_no_qs.b.norm, false);
 		barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */
 		current->rcu_read_unlock_special.b.need_qs = false;
 	}
@@ -620,7 +620,7 @@ static void rcu_preempt_check_callbacks(void)
 	}
 	if (t->rcu_read_lock_nesting > 0 &&
 	    __this_cpu_read(rcu_data_p->core_needs_qs) &&
-	    __this_cpu_read(rcu_data_p->cpu_no_qs))
+	    __this_cpu_read(rcu_data_p->cpu_no_qs.b.norm))
 		t->rcu_read_unlock_special.b.need_qs = true;
 }
 
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index d373e57109b8..999c3672f990 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -121,7 +121,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   ulong2long(rdp->completed), ulong2long(rdp->gpnum),
-		   rdp->cpu_no_qs,
+		   rdp->cpu_no_qs.b.norm,
 		   rdp->rcu_qs_ctr_snap == per_cpu(rcu_qs_ctr, rdp->cpu),
 		   rdp->core_needs_qs);
 	seq_printf(m, " dt=%d/%llx/%d df=%lu",
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (6 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 08/18] rcu: Make ->cpu_no_qs be a union for aggregate OR Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-07 14:18     ` Peter Zijlstra
  2015-10-06 16:29   ` [PATCH tip/core/rcu 10/18] rcu: Stop silencing lockdep false positive for expedited grace periods Paul E. McKenney
                     ` (8 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
to smp_call_function_single(), thus moving from an IPI and a pair of
context switches to an IPI and a single pass through the scheduler.
Of course, if the scheduler actually does decide to switch to a different
task, there will still be a pair of context switches, but there would
likely have been a pair of context switches anyway, just a bit later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 32 ++++++++++++++++++++------------
 kernel/rcu/tree.h |  3 ---
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3e2875b38eae..869e58b92c53 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -161,6 +161,8 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
 static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
+static void __maybe_unused rcu_report_exp_rdp(struct rcu_state *rsp,
+					      struct rcu_data *rdp, bool wake);
 
 /* rcuc/rcub kthread realtime priority */
 #ifdef CONFIG_RCU_KTHREAD_PRIO
@@ -250,6 +252,12 @@ void rcu_sched_qs(void)
 				       __this_cpu_read(rcu_sched_data.gpnum),
 				       TPS("cpuqs"));
 		__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
+		if (__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)) {
+			__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, false);
+			rcu_report_exp_rdp(&rcu_sched_state,
+					   this_cpu_ptr(&rcu_sched_data),
+					   true);
+		}
 	}
 }
 
@@ -3555,8 +3563,8 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
  * Report expedited quiescent state for specified rcu_data (CPU).
  * Caller must hold the root rcu_node's exp_funnel_mutex.
  */
-static void __maybe_unused rcu_report_exp_rdp(struct rcu_state *rsp,
-					      struct rcu_data *rdp, bool wake)
+static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
+			       bool wake)
 {
 	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
 }
@@ -3637,14 +3645,10 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 }
 
 /* Invoked on each online non-idle CPU for expedited quiescent state. */
-static int synchronize_sched_expedited_cpu_stop(void *data)
+static void synchronize_sched_expedited_cpu_stop(void *data)
 {
-	struct rcu_data *rdp = data;
-	struct rcu_state *rsp = rdp->rsp;
-
-	/* Report the quiescent state. */
-	rcu_report_exp_rdp(rsp, rdp, true);
-	return 0;
+	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
+	resched_cpu(smp_processor_id());
 }
 
 /*
@@ -3659,6 +3663,7 @@ static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
 	unsigned long mask_ofl_test;
 	unsigned long mask_ofl_ipi;
 	struct rcu_data *rdp;
+	int ret;
 	struct rcu_node *rnp;
 
 	sync_exp_reset_tree(rsp);
@@ -3694,9 +3699,9 @@ static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
 			if (!(mask_ofl_ipi & mask))
 				continue;
 			rdp = per_cpu_ptr(rsp->rda, cpu);
-			stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
-					    rdp, &rdp->exp_stop_work);
-			mask_ofl_ipi &= ~mask;
+			ret = smp_call_function_single(cpu, synchronize_sched_expedited_cpu_stop, NULL, 0);
+			if (!ret)
+				mask_ofl_ipi &= ~mask;
 		}
 		/* Report quiescent states for those that went offline. */
 		mask_ofl_test |= mask_ofl_ipi;
@@ -4201,6 +4206,9 @@ int rcu_cpu_notify(struct notifier_block *self,
 			rcu_cleanup_dying_cpu(rsp);
 		break;
 	case CPU_DYING_IDLE:
+		/* QS for any half-done expedited RCU-sched GP. */
+		rcu_sched_qs();
+
 		for_each_rcu_flavor(rsp) {
 			rcu_cleanup_dying_idle_cpu(cpu, rsp);
 		}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 3eee48bcf52b..1b969cef8fe4 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -324,9 +324,6 @@ struct rcu_data {
 					/*  ticks this CPU has handled */
 					/*  during and after the last grace */
 					/* period it is aware of. */
-	struct cpu_stop_work exp_stop_work;
-					/* Expedited grace-period control */
-					/*  for CPU stopping. */
 
 	/* 2) batch handling */
 	/*
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 10/18] rcu: Stop silencing lockdep false positive for expedited grace periods
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (7 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 11/18] rcu: Stop excluding CPU hotplug in synchronize_sched_expedited() Paul E. McKenney
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This reverts commit af859beaaba4 (rcu: Silence lockdep false positive
for expedited grace periods).  Because synchronize_rcu_expedited()
no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
acquisition is no longer nested, so the false positive no longer happens.
This commit therefore removes the extra lockdep data structures, as they
are no longer needed.
---
 kernel/rcu/tree.c | 17 ++---------------
 kernel/rcu/tree.h |  8 --------
 2 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 869e58b92c53..57b83f6d5263 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -71,7 +71,6 @@ MODULE_ALIAS("rcutree");
 static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
 static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS];
 static struct lock_class_key rcu_exp_class[RCU_NUM_LVLS];
-static struct lock_class_key rcu_exp_sched_class[RCU_NUM_LVLS];
 
 /*
  * In order to export the rcu_state name to the tracing tools, it
@@ -4095,7 +4094,6 @@ static void rcu_init_new_rnp(struct rcu_node *rnp_leaf)
 static void __init
 rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 {
-	static struct lock_class_key rcu_exp_sched_rdp_class;
 	unsigned long flags;
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rcu_get_root(rsp);
@@ -4111,10 +4109,6 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	mutex_init(&rdp->exp_funnel_mutex);
 	rcu_boot_init_nocb_percpu_data(rdp);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-	if (rsp == &rcu_sched_state)
-		lockdep_set_class_and_name(&rdp->exp_funnel_mutex,
-					   &rcu_exp_sched_rdp_class,
-					   "rcu_data_exp_sched");
 }
 
 /*
@@ -4340,7 +4334,6 @@ static void __init rcu_init_one(struct rcu_state *rsp,
 	static const char * const buf[] = RCU_NODE_NAME_INIT;
 	static const char * const fqs[] = RCU_FQS_NAME_INIT;
 	static const char * const exp[] = RCU_EXP_NAME_INIT;
-	static const char * const exp_sched[] = RCU_EXP_SCHED_NAME_INIT;
 	static u8 fl_mask = 0x1;
 
 	int levelcnt[RCU_NUM_LVLS];		/* # nodes in each level. */
@@ -4400,14 +4393,8 @@ static void __init rcu_init_one(struct rcu_state *rsp,
 			INIT_LIST_HEAD(&rnp->blkd_tasks);
 			rcu_init_one_nocb(rnp);
 			mutex_init(&rnp->exp_funnel_mutex);
-			if (rsp == &rcu_sched_state)
-				lockdep_set_class_and_name(
-					&rnp->exp_funnel_mutex,
-					&rcu_exp_sched_class[i], exp_sched[i]);
-			else
-				lockdep_set_class_and_name(
-					&rnp->exp_funnel_mutex,
-					&rcu_exp_class[i], exp[i]);
+			lockdep_set_class_and_name(&rnp->exp_funnel_mutex,
+						   &rcu_exp_class[i], exp[i]);
 		}
 	}
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 1b969cef8fe4..6f3b63b68886 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -70,8 +70,6 @@
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0" }
 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0" }
-#  define RCU_EXP_SCHED_NAME_INIT \
-			      { "rcu_node_exp_sched_0" }
 #elif NR_CPUS <= RCU_FANOUT_2
 #  define RCU_NUM_LVLS	      2
 #  define NUM_RCU_LVL_0	      1
@@ -81,8 +79,6 @@
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1" }
 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1" }
-#  define RCU_EXP_SCHED_NAME_INIT \
-			      { "rcu_node_exp_sched_0", "rcu_node_exp_sched_1" }
 #elif NR_CPUS <= RCU_FANOUT_3
 #  define RCU_NUM_LVLS	      3
 #  define NUM_RCU_LVL_0	      1
@@ -93,8 +89,6 @@
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" }
-#  define RCU_EXP_SCHED_NAME_INIT \
-			      { "rcu_node_exp_sched_0", "rcu_node_exp_sched_1", "rcu_node_exp_sched_2" }
 #elif NR_CPUS <= RCU_FANOUT_4
 #  define RCU_NUM_LVLS	      4
 #  define NUM_RCU_LVL_0	      1
@@ -106,8 +100,6 @@
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" }
-#  define RCU_EXP_SCHED_NAME_INIT \
-			      { "rcu_node_exp_sched_0", "rcu_node_exp_sched_1", "rcu_node_exp_sched_2", "rcu_node_exp_sched_3" }
 #else
 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
 #endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 11/18] rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (8 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 10/18] rcu: Stop silencing lockdep false positive for expedited grace periods Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 12/18] cpu: Remove try_get_online_cpus() Paul E. McKenney
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Now that synchronize_sched_expedited() uses IPIs, a hook in
rcu_sched_qs(), and the ->expmask field in the rcu_node combining
tree, it is no longer necessary to exclude CPU hotplug.  Any
races with CPU hotplug will be detected when attempting to send
the IPI.  This commit therefore removes the code excluding
CPU hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 57b83f6d5263..6885bfc9d2bf 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3785,19 +3785,9 @@ void synchronize_sched_expedited(void)
 	/* Take a snapshot of the sequence number.  */
 	s = rcu_exp_gp_seq_snap(rsp);
 
-	if (!try_get_online_cpus()) {
-		/* CPU hotplug operation in flight, fall back to normal GP. */
-		wait_rcu_gp(call_rcu_sched);
-		atomic_long_inc(&rsp->expedited_normal);
-		return;
-	}
-	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
-
 	rnp = exp_funnel_lock(rsp, s);
-	if (rnp == NULL) {
-		put_online_cpus();
+	if (rnp == NULL)
 		return;  /* Someone else did our work for us. */
-	}
 
 	rcu_exp_gp_seq_start(rsp);
 	sync_sched_exp_select_cpus(rsp);
@@ -3805,8 +3795,6 @@ void synchronize_sched_expedited(void)
 
 	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp->exp_funnel_mutex);
-
-	put_online_cpus();
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 12/18] cpu: Remove try_get_online_cpus()
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (9 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 11/18] rcu: Stop excluding CPU hotplug in synchronize_sched_expedited() Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 13/18] rcu: Prepare for consolidating expedited CPU selection Paul E. McKenney
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Now that synchronize_sched_expedited() no longer uses it, there are
no users of try_get_online_cpus() in mainline.  This commit therefore
removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpu.h |  2 --
 kernel/cpu.c        | 13 -------------
 2 files changed, 15 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 23c30bdcca86..d2ca8c38f9c4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -228,7 +228,6 @@ extern struct bus_type cpu_subsys;
 extern void cpu_hotplug_begin(void);
 extern void cpu_hotplug_done(void);
 extern void get_online_cpus(void);
-extern bool try_get_online_cpus(void);
 extern void put_online_cpus(void);
 extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
@@ -246,7 +245,6 @@ int cpu_down(unsigned int cpu);
 static inline void cpu_hotplug_begin(void) {}
 static inline void cpu_hotplug_done(void) {}
 #define get_online_cpus()	do { } while (0)
-#define try_get_online_cpus()	true
 #define put_online_cpus()	do { } while (0)
 #define cpu_hotplug_disable()	do { } while (0)
 #define cpu_hotplug_enable()	do { } while (0)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 82cf9dff4295..14a9cdf8abe9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -102,19 +102,6 @@ void get_online_cpus(void)
 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
 
-bool try_get_online_cpus(void)
-{
-	if (cpu_hotplug.active_writer == current)
-		return true;
-	if (!mutex_trylock(&cpu_hotplug.lock))
-		return false;
-	cpuhp_lock_acquire_tryread();
-	atomic_inc(&cpu_hotplug.refcount);
-	mutex_unlock(&cpu_hotplug.lock);
-	return true;
-}
-EXPORT_SYMBOL_GPL(try_get_online_cpus);
-
 void put_online_cpus(void)
 {
 	int refcount;
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 13/18] rcu: Prepare for consolidating expedited CPU selection
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (10 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 12/18] cpu: Remove try_get_online_cpus() Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 14/18] rcu: Consolidate " Paul E. McKenney
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit brings sync_sched_exp_select_cpus() into alignment with
sync_rcu_exp_select_cpus(), as a first step towards consolidating them
into one function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6885bfc9d2bf..500b1347554d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3661,7 +3661,6 @@ static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
 	unsigned long mask;
 	unsigned long mask_ofl_test;
 	unsigned long mask_ofl_ipi;
-	struct rcu_data *rdp;
 	int ret;
 	struct rcu_node *rnp;
 
@@ -3697,7 +3696,6 @@ static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
 		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
 			if (!(mask_ofl_ipi & mask))
 				continue;
-			rdp = per_cpu_ptr(rsp->rda, cpu);
 			ret = smp_call_function_single(cpu, synchronize_sched_expedited_cpu_stop, NULL, 0);
 			if (!ret)
 				mask_ofl_ipi &= ~mask;
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 14/18] rcu: Consolidate expedited CPU selection
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (11 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 13/18] rcu: Prepare for consolidating expedited CPU selection Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 15/18] rcu: Add online/offline info to expedited stall warning message Paul E. McKenney
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus()
are identical aside from the the argument to smp_call_function_single(),
this commit consolidates them with a functional argument.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        |  7 +++---
 kernel/rcu/tree_plugin.h | 61 +-----------------------------------------------
 2 files changed, 5 insertions(+), 63 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 500b1347554d..ff68087d93a3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3654,7 +3654,8 @@ static void synchronize_sched_expedited_cpu_stop(void *data)
  * Select the nodes that the upcoming expedited grace period needs
  * to wait for.
  */
-static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
+static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
+				     smp_call_func_t func)
 {
 	int cpu;
 	unsigned long flags;
@@ -3696,7 +3697,7 @@ static void sync_sched_exp_select_cpus(struct rcu_state *rsp)
 		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
 			if (!(mask_ofl_ipi & mask))
 				continue;
-			ret = smp_call_function_single(cpu, synchronize_sched_expedited_cpu_stop, NULL, 0);
+			ret = smp_call_function_single(cpu, func, rsp, 0);
 			if (!ret)
 				mask_ofl_ipi &= ~mask;
 		}
@@ -3788,7 +3789,7 @@ void synchronize_sched_expedited(void)
 		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
-	sync_sched_exp_select_cpus(rsp);
+	sync_rcu_exp_select_cpus(rsp, synchronize_sched_expedited_cpu_stop);
 	synchronize_sched_expedited_wait(rsp);
 
 	rcu_exp_gp_seq_end(rsp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7880202f1e38..6cbfbfc58656 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -708,65 +708,6 @@ static void sync_rcu_exp_handler(void *info)
 	rcu_report_exp_rdp(rsp, rdp, true);
 }
 
-/*
- * Select the nodes that the upcoming expedited grace period needs
- * to wait for.
- */
-static void sync_rcu_exp_select_cpus(struct rcu_state *rsp)
-{
-	int cpu;
-	unsigned long flags;
-	unsigned long mask;
-	unsigned long mask_ofl_test;
-	unsigned long mask_ofl_ipi;
-	int ret;
-	struct rcu_node *rnp;
-
-	sync_exp_reset_tree(rsp);
-	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
-
-		/* Each pass checks a CPU for identity, offline, and idle. */
-		mask_ofl_test = 0;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
-			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
-			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-			if (raw_smp_processor_id() == cpu ||
-			    cpu_is_offline(cpu) ||
-			    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
-				mask_ofl_test |= rdp->grpmask;
-		}
-		mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
-
-		/*
-		 * Need to wait for any blocked tasks as well.  Note that
-		 * additional blocking tasks will also block the expedited
-		 * GP until such time as the ->expmask bits are cleared.
-		 */
-		if (rcu_preempt_has_tasks(rnp))
-			rnp->exp_tasks = rnp->blkd_tasks.next;
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-
-		/* IPI the remaining CPUs for expedited quiescent state. */
-		mask = 1;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
-			if (!(mask_ofl_ipi & mask))
-				continue;
-			ret = smp_call_function_single(cpu,
-						       sync_rcu_exp_handler,
-						       rsp, 0);
-			if (!ret)
-				mask_ofl_ipi &= ~mask;
-		}
-		/* Report quiescent states for those that went offline. */
-		mask_ofl_test |= mask_ofl_ipi;
-		if (mask_ofl_test)
-			rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false);
-	}
-}
-
 /**
  * synchronize_rcu_expedited - Brute-force RCU grace period
  *
@@ -795,7 +736,7 @@ void synchronize_rcu_expedited(void)
 	rcu_exp_gp_seq_start(rsp);
 
 	/* Initialize the rcu_node tree in preparation for the wait. */
-	sync_rcu_exp_select_cpus(rsp);
+	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 15/18] rcu: Add online/offline info to expedited stall warning message
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (12 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 14/18] rcu: Consolidate " Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 16/18] rcu: Add tasks to expedited stall-warning messages Paul E. McKenney
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit makes the RCU CPU stall warning message print online/offline
indications immediately after the CPU number.  A "O" indicates global
offline, a "." global online, and a "o" indicates RCU believes that the
CPU is offline for the current grace period and "." otherwise, and an
"N" indicates that RCU believes that the CPU will be offline for the
next grace period, and "." otherwise, all right after the CPU number.
So for CPU 10, you would normally see "10-...:" indicating that everything
believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        |  9 ++++++++-
 kernel/rcu/tree.h        |  1 +
 kernel/rcu/tree_plugin.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ff68087d93a3..b246aa0470dc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3737,11 +3737,18 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		pr_err("INFO: %s detected expedited stalls on CPUs: {",
 		       rsp->name);
 		rcu_for_each_leaf_node(rsp, rnp) {
+			(void)rcu_print_task_exp_stall(rnp);
 			mask = 1;
 			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+				struct rcu_data *rdp;
+
 				if (!(rnp->expmask & mask))
 					continue;
-				pr_cont(" %d", cpu);
+				rdp = per_cpu_ptr(rsp->rda, cpu);
+				pr_cont(" %d-%c%c%c", cpu,
+					"O."[cpu_online(cpu)],
+					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
+					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
 			}
 			mask <<= 1;
 		}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 6f3b63b68886..191aa3678575 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -589,6 +589,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 static void rcu_print_detail_task_stall(struct rcu_state *rsp);
 static int rcu_print_task_stall(struct rcu_node *rnp);
+static int rcu_print_task_exp_stall(struct rcu_node *rnp);
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
 static void rcu_preempt_check_callbacks(void);
 void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6cbfbfc58656..7b61cece80c0 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -586,6 +586,27 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
 }
 
 /*
+ * Scan the current list of tasks blocked within RCU read-side critical
+ * sections, printing out the tid of each that is blocking the current
+ * expedited grace period.
+ */
+static int rcu_print_task_exp_stall(struct rcu_node *rnp)
+{
+	struct task_struct *t;
+	int ndetected = 0;
+
+	if (!rnp->exp_tasks)
+		return 0;
+	t = list_entry(rnp->exp_tasks->prev,
+		       struct task_struct, rcu_node_entry);
+	list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {
+		pr_cont(" P%d", t->pid);
+		ndetected++;
+	}
+	return ndetected;
+}
+
+/*
  * Check that the list of blocked tasks for the newly completed grace
  * period is in fact empty.  It is a serious bug to complete a grace
  * period that still has RCU readers blocked!  This function must be
@@ -846,6 +867,16 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
 }
 
 /*
+ * Because preemptible RCU does not exist, we never have to check for
+ * tasks blocked within RCU read-side critical sections that are
+ * blocking the current expedited grace period.
+ */
+static int rcu_print_task_exp_stall(struct rcu_node *rnp)
+{
+	return 0;
+}
+
+/*
  * Because there is no preemptible RCU, there can be no readers blocked,
  * so there is no need to check for blocked tasks.  So check only for
  * bogus qsmask values.
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 16/18] rcu: Add tasks to expedited stall-warning messages
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (13 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 15/18] rcu: Add online/offline info to expedited stall warning message Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 17/18] rcu: Enable stall warnings for synchronize_rcu_expedited() Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited() Paul E. McKenney
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit adds task-print ability to the expedited RCU CPU stall
warning messages in preparation for adding stall warnings to
synchornize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b246aa0470dc..ed957c3b6c86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3734,7 +3734,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 				   sync_rcu_preempt_exp_done(rnp_root));
 			return;
 		}
-		pr_err("INFO: %s detected expedited stalls on CPUs: {",
+		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
 		       rsp->name);
 		rcu_for_each_leaf_node(rsp, rnp) {
 			(void)rcu_print_task_exp_stall(rnp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 17/18] rcu: Enable stall warnings for synchronize_rcu_expedited()
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (14 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 16/18] rcu: Add tasks to expedited stall-warning messages Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-06 16:29   ` [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited() Paul E. McKenney
  16 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit redirects synchronize_rcu_expedited()'s wait to
synchronize_sched_expedited_wait(), thus enabling RCU CPU
stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7b61cece80c0..ffeb99e550e8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -761,8 +761,7 @@ void synchronize_rcu_expedited(void)
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-	wait_event(rsp->expedited_wq,
-		   sync_rcu_preempt_exp_done(rnp));
+	synchronize_sched_expedited_wait(rsp);
 
 	/* Clean up and exit. */
 	rcu_exp_gp_seq_end(rsp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
                     ` (15 preceding siblings ...)
  2015-10-06 16:29   ` [PATCH tip/core/rcu 17/18] rcu: Enable stall warnings for synchronize_rcu_expedited() Paul E. McKenney
@ 2015-10-06 16:29   ` Paul E. McKenney
  2015-10-07 14:26     ` Peter Zijlstra
  16 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Earlier versions of synchronize_sched_expedited() can prematurely end
grace periods due to the fact that a CPU marked as cpu_is_offline()
can still be using RCU read-side critical sections during the time that
CPU makes its last pass through the scheduler and into the idle loop
and during the time that a given CPU is in the process of coming online.
This commit therefore eliminates this window by adding additional
interaction with the CPU-hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ed957c3b6c86..80c834c46b8d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -246,17 +246,23 @@ static int rcu_gp_in_progress(struct rcu_state *rsp)
  */
 void rcu_sched_qs(void)
 {
+	unsigned long flags;
+
 	if (__this_cpu_read(rcu_sched_data.cpu_no_qs.s)) {
 		trace_rcu_grace_period(TPS("rcu_sched"),
 				       __this_cpu_read(rcu_sched_data.gpnum),
 				       TPS("cpuqs"));
 		__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
+		if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
+			return;
+		local_irq_save(flags);
 		if (__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)) {
 			__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, false);
 			rcu_report_exp_rdp(&rcu_sched_state,
 					   this_cpu_ptr(&rcu_sched_data),
 					   true);
 		}
+		local_irq_restore(flags);
 	}
 }
 
@@ -3553,7 +3559,10 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();
-	WARN_ON_ONCE((rnp->expmask & mask) != mask);
+	if (!(rnp->expmask & mask)) {
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		return;
+	}
 	rnp->expmask &= ~mask;
 	__rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */
 }
@@ -3644,12 +3653,37 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 }
 
 /* Invoked on each online non-idle CPU for expedited quiescent state. */
-static void synchronize_sched_expedited_cpu_stop(void *data)
+static void sync_sched_exp_handler(void *data)
 {
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+	struct rcu_state *rsp = data;
+
+	rdp = this_cpu_ptr(rsp->rda);
+	rnp = rdp->mynode;
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
+	    __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
+		return;
 	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
 	resched_cpu(smp_processor_id());
 }
 
+/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
+static void sync_sched_exp_online_cleanup(int cpu)
+{
+	struct rcu_data *rdp;
+	int ret;
+	struct rcu_node *rnp;
+	struct rcu_state *rsp = &rcu_sched_state;
+
+	rdp = per_cpu_ptr(rsp->rda, cpu);
+	rnp = rdp->mynode;
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
+		return;
+	ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0);
+	WARN_ON_ONCE(ret);
+}
+
 /*
  * Select the nodes that the upcoming expedited grace period needs
  * to wait for.
@@ -3677,7 +3711,6 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
 			if (raw_smp_processor_id() == cpu ||
-			    cpu_is_offline(cpu) ||
 			    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
 				mask_ofl_test |= rdp->grpmask;
 		}
@@ -3697,9 +3730,28 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
 			if (!(mask_ofl_ipi & mask))
 				continue;
+retry_ipi:
 			ret = smp_call_function_single(cpu, func, rsp, 0);
-			if (!ret)
+			if (!ret) {
 				mask_ofl_ipi &= ~mask;
+			} else {
+				/* Failed, raced with offline. */
+				raw_spin_lock_irqsave(&rnp->lock, flags);
+				if (cpu_online(cpu) &&
+				    (rnp->expmask & mask)) {
+					raw_spin_unlock_irqrestore(&rnp->lock,
+								   flags);
+					schedule_timeout_uninterruptible(1);
+					if (cpu_online(cpu) &&
+					    (rnp->expmask & mask))
+						goto retry_ipi;
+					raw_spin_lock_irqsave(&rnp->lock,
+							      flags);
+				}
+				if (!(rnp->expmask & mask))
+					mask_ofl_ipi &= ~mask;
+				raw_spin_unlock_irqrestore(&rnp->lock, flags);
+			}
 		}
 		/* Report quiescent states for those that went offline. */
 		mask_ofl_test |= mask_ofl_ipi;
@@ -3796,7 +3848,7 @@ void synchronize_sched_expedited(void)
 		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
-	sync_rcu_exp_select_cpus(rsp, synchronize_sched_expedited_cpu_stop);
+	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
 	synchronize_sched_expedited_wait(rsp);
 
 	rcu_exp_gp_seq_end(rsp);
@@ -4183,6 +4235,7 @@ int rcu_cpu_notify(struct notifier_block *self,
 		break;
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
+		sync_sched_exp_online_cleanup(cpu);
 		rcu_boost_kthread_setaffinity(rnp, -1);
 		break;
 	case CPU_DOWN_PREPARE:
@@ -4195,7 +4248,10 @@ int rcu_cpu_notify(struct notifier_block *self,
 		break;
 	case CPU_DYING_IDLE:
 		/* QS for any half-done expedited RCU-sched GP. */
-		rcu_sched_qs();
+		preempt_disable();
+		rcu_report_exp_rdp(&rcu_sched_state,
+				   this_cpu_ptr(rcu_sched_state.rda), true);
+		preempt_enable();
 
 		for_each_rcu_flavor(rsp) {
 			rcu_cleanup_dying_idle_cpu(cpu, rsp);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-06 16:29   ` [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation Paul E. McKenney
@ 2015-10-06 20:29     ` Peter Zijlstra
  2015-10-06 20:58       ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-06 20:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
> +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> +					      struct rcu_node *rnp, bool wake)
> +{
> +	unsigned long flags;
> +	unsigned long mask;
> +
> +	raw_spin_lock_irqsave(&rnp->lock, flags);

Normally we require a comment with barriers, explaining the order and
the pairing etc.. :-)

> +	smp_mb__after_unlock_lock();
> +	for (;;) {
> +		if (!sync_rcu_preempt_exp_done(rnp)) {
> +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +			break;
> +		}
> +		if (rnp->parent == NULL) {
> +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +			if (wake) {
> +				smp_mb(); /* EGP done before wake_up(). */
> +				wake_up(&rsp->expedited_wq);
> +			}
> +			break;
> +		}
> +		mask = rnp->grpmask;
> +		raw_spin_unlock(&rnp->lock); /* irqs remain disabled */
> +		rnp = rnp->parent;
> +		raw_spin_lock(&rnp->lock); /* irqs already disabled */
> +		smp_mb__after_unlock_lock();
> +		rnp->expmask &= ~mask;
> +	}
> +}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-06 20:29     ` Peter Zijlstra
@ 2015-10-06 20:58       ` Paul E. McKenney
  2015-10-07  7:51         ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-06 20:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:29:37PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
> > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> > +					      struct rcu_node *rnp, bool wake)
> > +{
> > +	unsigned long flags;
> > +	unsigned long mask;
> > +
> > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> 
> Normally we require a comment with barriers, explaining the order and
> the pairing etc.. :-)
> 
> > +	smp_mb__after_unlock_lock();

Hmmmm...  That is not good.

Worse yet, I am missing comments on most of the pre-existing barriers
of this form.

The purpose is to enforce the heavy-weight grace-period memory-ordering
guarantees documented in the synchronize_sched() header comment and
elsewhere.  They pair with anything you might use to check for violation
of these guarantees, or, simiarly, any ordering that you might use when
relying on these guarantees.

I could add something like  "/* Enforce GP memory ordering. */"

Or perhaps "/* See synchronize_sched() header. */"

I do not propose reproducing the synchronize_sched() header on each
of these.  That would be verbose, even for me!  ;-)

Other thoughts?

							Thanx, Paul

> > +	for (;;) {
> > +		if (!sync_rcu_preempt_exp_done(rnp)) {
> > +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > +			break;
> > +		}
> > +		if (rnp->parent == NULL) {
> > +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > +			if (wake) {
> > +				smp_mb(); /* EGP done before wake_up(). */
> > +				wake_up(&rsp->expedited_wq);
> > +			}
> > +			break;
> > +		}
> > +		mask = rnp->grpmask;
> > +		raw_spin_unlock(&rnp->lock); /* irqs remain disabled */
> > +		rnp = rnp->parent;
> > +		raw_spin_lock(&rnp->lock); /* irqs already disabled */
> > +		smp_mb__after_unlock_lock();
> > +		rnp->expmask &= ~mask;
> > +	}
> > +}
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-06 20:58       ` Paul E. McKenney
@ 2015-10-07  7:51         ` Peter Zijlstra
  2015-10-07  8:42           ` Mathieu Desnoyers
  2015-10-07 14:33           ` Paul E. McKenney
  0 siblings, 2 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07  7:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 01:58:50PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:29:37PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
> > > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> > > +					      struct rcu_node *rnp, bool wake)
> > > +{
> > > +	unsigned long flags;
> > > +	unsigned long mask;
> > > +
> > > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> > 
> > Normally we require a comment with barriers, explaining the order and
> > the pairing etc.. :-)
> > 
> > > +	smp_mb__after_unlock_lock();
> 
> Hmmmm...  That is not good.
> 
> Worse yet, I am missing comments on most of the pre-existing barriers
> of this form.

Yes I noticed.. :/

> The purpose is to enforce the heavy-weight grace-period memory-ordering
> guarantees documented in the synchronize_sched() header comment and
> elsewhere.

> They pair with anything you might use to check for violation
> of these guarantees, or, simiarly, any ordering that you might use when
> relying on these guarantees.

I'm sure you know what that means, but I've no clue ;-) That is, I
wouldn't know where to start looking in the RCU implementation to verify
the barrier is either needed or sufficient. Unless you mean _everywhere_
:-)

> I could add something like  "/* Enforce GP memory ordering. */"
> 
> Or perhaps "/* See synchronize_sched() header. */"
> 
> I do not propose reproducing the synchronize_sched() header on each
> of these.  That would be verbose, even for me!  ;-)
> 
> Other thoughts?

Well, this is an UNLOCK+LOCK on non-matching lock variables upgrade to
full barrier thing, right?

To me its not clear which UNLOCK we even match here. I've just read the
sync_sched() header, but that doesn't help me either, so referring to
that isn't really helpful either.

In any case, I don't want to make too big a fuzz here, but I just
stumbled over a lot of unannotated barriers and figured I ought to say
something about it.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07  7:51         ` Peter Zijlstra
@ 2015-10-07  8:42           ` Mathieu Desnoyers
  2015-10-07 11:01             ` Peter Zijlstra
  2015-10-07 14:33           ` Paul E. McKenney
  1 sibling, 1 reply; 67+ messages in thread
From: Mathieu Desnoyers @ 2015-10-07  8:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

----- On Oct 7, 2015, at 3:51 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Oct 06, 2015 at 01:58:50PM -0700, Paul E. McKenney wrote:
>> On Tue, Oct 06, 2015 at 10:29:37PM +0200, Peter Zijlstra wrote:
>> > On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
>> > > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
>> > > +					      struct rcu_node *rnp, bool wake)
>> > > +{
>> > > +	unsigned long flags;
>> > > +	unsigned long mask;
>> > > +
>> > > +	raw_spin_lock_irqsave(&rnp->lock, flags);
>> > 
>> > Normally we require a comment with barriers, explaining the order and
>> > the pairing etc.. :-)
>> > 
>> > > +	smp_mb__after_unlock_lock();
>> 
>> Hmmmm...  That is not good.
>> 
>> Worse yet, I am missing comments on most of the pre-existing barriers
>> of this form.
> 
> Yes I noticed.. :/
> 
>> The purpose is to enforce the heavy-weight grace-period memory-ordering
>> guarantees documented in the synchronize_sched() header comment and
>> elsewhere.
> 
>> They pair with anything you might use to check for violation
>> of these guarantees, or, simiarly, any ordering that you might use when
>> relying on these guarantees.
> 
> I'm sure you know what that means, but I've no clue ;-) That is, I
> wouldn't know where to start looking in the RCU implementation to verify
> the barrier is either needed or sufficient. Unless you mean _everywhere_
> :-)

One example is the new membarrier system call. It relies on synchronize_sched()
to enforce this:

from kernel/membarrier.c:

 * All memory accesses performed in program order from each targeted thread
 * is guaranteed to be ordered with respect to sys_membarrier(). If we use
 * the semantic "barrier()" to represent a compiler barrier forcing memory
 * accesses to be performed in program order across the barrier, and
 * smp_mb() to represent explicit memory barriers forcing full memory
 * ordering across the barrier, we have the following ordering table for
 * each pair of barrier(), sys_membarrier() and smp_mb():
 *
 * The pair ordering is detailed as (O: ordered, X: not ordered):
 *
 *                        barrier()   smp_mb() sys_membarrier()
 *        barrier()          X           X            O
 *        smp_mb()           X           O            O
 *        sys_membarrier()   O           O            O

And include/uapi/linux/membarrier.h:

 * @MEMBARRIER_CMD_SHARED:  Execute a memory barrier on all running threads.
 *                          Upon return from system call, the caller thread
 *                          is ensured that all running threads have passed
 *                          through a state where all memory accesses to
 *                          user-space addresses match program order between
 *                          entry to and return from the system call
 *                          (non-running threads are de facto in such a
 *                          state). This covers threads from all processes
 *                          running on the system. This command returns 0.

I hope this sheds light on a userspace-facing interface to
synchronize_sched() and clarifies its expected semantic a bit.

Thanks,

Mathieu

> 
>> I could add something like  "/* Enforce GP memory ordering. */"
>> 
>> Or perhaps "/* See synchronize_sched() header. */"
>> 
>> I do not propose reproducing the synchronize_sched() header on each
>> of these.  That would be verbose, even for me!  ;-)
>> 
>> Other thoughts?
> 
> Well, this is an UNLOCK+LOCK on non-matching lock variables upgrade to
> full barrier thing, right?
> 
> To me its not clear which UNLOCK we even match here. I've just read the
> sync_sched() header, but that doesn't help me either, so referring to
> that isn't really helpful either.
> 
> In any case, I don't want to make too big a fuzz here, but I just
> stumbled over a lot of unannotated barriers and figured I ought to say
> something about it.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07  8:42           ` Mathieu Desnoyers
@ 2015-10-07 11:01             ` Peter Zijlstra
  2015-10-07 11:50               ` Peter Zijlstra
  2015-10-07 15:15               ` Paul E. McKenney
  0 siblings, 2 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 11:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

On Wed, Oct 07, 2015 at 08:42:05AM +0000, Mathieu Desnoyers wrote:
> ----- On Oct 7, 2015, at 3:51 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Tue, Oct 06, 2015 at 01:58:50PM -0700, Paul E. McKenney wrote:
> >> On Tue, Oct 06, 2015 at 10:29:37PM +0200, Peter Zijlstra wrote:
> >> > On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
> >> > > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> >> > > +					      struct rcu_node *rnp, bool wake)
> >> > > +{
> >> > > +	unsigned long flags;
> >> > > +	unsigned long mask;
> >> > > +
> >> > > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> >> > 
> >> > Normally we require a comment with barriers, explaining the order and
> >> > the pairing etc.. :-)
> >> > 
> >> > > +	smp_mb__after_unlock_lock();
> >> 
> >> Hmmmm...  That is not good.
> >> 
> >> Worse yet, I am missing comments on most of the pre-existing barriers
> >> of this form.
> > 
> > Yes I noticed.. :/
> > 
> >> The purpose is to enforce the heavy-weight grace-period memory-ordering
> >> guarantees documented in the synchronize_sched() header comment and
> >> elsewhere.
> > 
> >> They pair with anything you might use to check for violation
> >> of these guarantees, or, simiarly, any ordering that you might use when
> >> relying on these guarantees.
> > 
> > I'm sure you know what that means, but I've no clue ;-) That is, I
> > wouldn't know where to start looking in the RCU implementation to verify
> > the barrier is either needed or sufficient. Unless you mean _everywhere_
> > :-)
> 
> One example is the new membarrier system call. It relies on synchronize_sched()
> to enforce this:

That again doesn't explain which UNLOCKs with non-matching lock values
it pairs with and what particular ordering is important here.

I'm fully well aware of what sync_sched() guarantees and how one can use
it, that is not the issue, what I'm saying is that a generic description
of sync_sched() doesn't help in figuring out WTH that barrier is for and
which other code I should also inspect.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:01             ` Peter Zijlstra
@ 2015-10-07 11:50               ` Peter Zijlstra
  2015-10-07 12:03                 ` Peter Zijlstra
                                   ` (4 more replies)
  2015-10-07 15:15               ` Paul E. McKenney
  1 sibling, 5 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 11:50 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

On Wed, Oct 07, 2015 at 01:01:20PM +0200, Peter Zijlstra wrote:

> That again doesn't explain which UNLOCKs with non-matching lock values
> it pairs with and what particular ordering is important here.

So after staring at that stuff for a while I came up with the following.
Does this make sense, or am I completely misunderstanding things?

Not been near a compiler.

---
 kernel/rcu/tree.c | 99 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 61 insertions(+), 38 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 775d36cc0050..46e1e23ff762 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1460,6 +1460,48 @@ static void trace_rcu_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 }
 
 /*
+ * Wrappers for the rcu_node::lock acquire.
+ *
+ * Because the rcu_nodes form a tree, the tree taversal locking will observe
+ * different lock values, this in turn means that an UNLOCK of one level
+ * followed by a LOCK of another level does not imply a full memory barrier;
+ * and most importantly transitivity is lost.
+ *
+ * In order to restore full ordering between tree levels, augment the regular
+ * lock acquire functions with smp_mb__after_unlock_lock().
+ */
+static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
+{
+	raw_spin_lock(&rnp->lock);
+	smp_mb__after_unlock_lock();
+}
+
+static inline void raw_spin_lock_irq_rcu_node(struct rcu_node *rnp)
+{
+	raw_spin_lock_irq(&rnp->lock);
+	smp_mb__after_unlock_lock();
+}
+
+static inline void
+_raw_spin_lock_irqsave_rcu_node(struct rcu_node *rnp, unsigned long *flags)
+{
+	_raw_spin_lock_irqsave(&rnp->lock, flags);
+	smp_mb__after_unlock_lock();
+}
+
+#define raw_spin_lock_irqsave_rcu_node(rnp, flags)
+	_raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
+
+static inline bool raw_spin_trylock_rcu_node(struct rcu_node *rnp)
+{
+	bool locked = raw_spin_trylock(&rnp->lock);
+	if (locked)
+		smp_mb__after_unlock_lock();
+	return locked;
+}
+
+
+/*
  * Start some future grace period, as needed to handle newly arrived
  * callbacks.  The required future grace periods are recorded in each
  * rcu_node structure's ->need_future_gp field.  Returns true if there
@@ -1512,10 +1554,8 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * hold it, acquire the root rcu_node structure's lock in order to
 	 * start one (if needed).
 	 */
-	if (rnp != rnp_root) {
-		raw_spin_lock(&rnp_root->lock);
-		smp_mb__after_unlock_lock();
-	}
+	if (rnp != rnp_root)
+		raw_spin_lock_rcu_node(rnp);
 
 	/*
 	 * Get a new grace-period number.  If there really is no grace
@@ -1764,11 +1804,10 @@ static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
 	if ((rdp->gpnum == READ_ONCE(rnp->gpnum) &&
 	     rdp->completed == READ_ONCE(rnp->completed) &&
 	     !unlikely(READ_ONCE(rdp->gpwrap))) || /* w/out lock. */
-	    !raw_spin_trylock(&rnp->lock)) { /* irqs already off, so later. */
+	    !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
 		local_irq_restore(flags);
 		return;
 	}
-	smp_mb__after_unlock_lock();
 	needwake = __note_gp_changes(rsp, rnp, rdp);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	if (needwake)
@@ -1792,8 +1831,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	WRITE_ONCE(rsp->gp_activity, jiffies);
-	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irq_rcu_node(rnp);
 	if (!READ_ONCE(rsp->gp_flags)) {
 		/* Spurious wakeup, tell caller to go back to sleep.  */
 		raw_spin_unlock_irq(&rnp->lock);
@@ -1825,8 +1863,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	 */
 	rcu_for_each_leaf_node(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_preinit_delay);
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
 		    !rnp->wait_blkd_tasks) {
 			/* Nothing to do on this leaf rcu_node structure. */
@@ -1882,8 +1919,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	 */
 	rcu_for_each_node_breadth_first(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_init_delay);
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		rdp = this_cpu_ptr(rsp->rda);
 		rcu_preempt_check_blocked_tasks(rnp);
 		rnp->qsmask = rnp->qsmaskinit;
@@ -1953,8 +1989,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
 	}
 	/* Clear flag to prevent immediate re-entry. */
 	if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		WRITE_ONCE(rsp->gp_flags,
 			   READ_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS);
 		raw_spin_unlock_irq(&rnp->lock);
@@ -1974,8 +2009,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	WRITE_ONCE(rsp->gp_activity, jiffies);
-	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irq_rcu_node(rnp);
 	gp_duration = jiffies - rsp->gp_start;
 	if (gp_duration > rsp->gp_max)
 		rsp->gp_max = gp_duration;
@@ -2000,8 +2034,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	 * grace period is recorded in any of the rcu_node structures.
 	 */
 	rcu_for_each_node_breadth_first(rsp, rnp) {
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
 		WARN_ON_ONCE(rnp->qsmask);
 		WRITE_ONCE(rnp->completed, rsp->gpnum);
@@ -2016,8 +2049,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 		rcu_gp_slow(rsp, gp_cleanup_delay);
 	}
 	rnp = rcu_get_root(rsp);
-	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock(); /* Order GP before ->completed update. */
+	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
 	rcu_nocb_gp_set(rnp, nocb);
 
 	/* Declare grace period done. */
@@ -2264,8 +2296,7 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		rnp_c = rnp;
 		rnp = rnp->parent;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		oldmask = rnp_c->qsmask;
 	}
 
@@ -2312,8 +2343,7 @@ static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
 	gps = rnp->gpnum;
 	mask = rnp->grpmask;
 	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
-	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_rcu_node(rnp);	/* irqs already disabled. */
 	rcu_report_qs_rnp(mask, rsp, rnp_p, gps, flags);
 }
 
@@ -2335,8 +2365,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 	struct rcu_node *rnp;
 
 	rnp = rdp->mynode;
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if ((rdp->passed_quiesce == 0 &&
 	     rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) ||
 	    rdp->gpnum != rnp->gpnum || rnp->completed == rnp->gpnum ||
@@ -2562,8 +2591,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 		rnp = rnp->parent;
 		if (!rnp)
 			break;
-		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
-		smp_mb__after_unlock_lock(); /* GP memory ordering. */
+		raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
 		rnp->qsmaskinit &= ~mask;
 		rnp->qsmask &= ~mask;
 		if (rnp->qsmaskinit) {
@@ -2591,8 +2619,7 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();	/* Enforce GP memory-order guarantee. */
+	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rnp->qsmaskinitnext &= ~mask;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
@@ -2789,8 +2816,7 @@ static void force_qs_rnp(struct rcu_state *rsp,
 	rcu_for_each_leaf_node(rsp, rnp) {
 		cond_resched_rcu_qs();
 		mask = 0;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask == 0) {
 			if (rcu_state_p == &rcu_sched_state ||
 			    rsp != rcu_state_p ||
@@ -2861,8 +2887,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	/* rnp_old == rcu_get_root(rsp), rnp == NULL. */
 
 	/* Reached the root of the rcu_node tree, acquire lock. */
-	raw_spin_lock_irqsave(&rnp_old->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp_old, flags);
 	raw_spin_unlock(&rnp_old->fqslock);
 	if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
 		rsp->n_force_qs_lh++;
@@ -2985,8 +3010,7 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
 		if (!rcu_gp_in_progress(rsp)) {
 			struct rcu_node *rnp_root = rcu_get_root(rsp);
 
-			raw_spin_lock(&rnp_root->lock);
-			smp_mb__after_unlock_lock();
+			raw_spin_lock_rcu_node(rnp_root);
 			needwake = rcu_start_gp(rsp);
 			raw_spin_unlock(&rnp_root->lock);
 			if (needwake)
@@ -3925,8 +3949,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	 */
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
-	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_rcu_node(rnp);		/* irqs already disabled. */
 	rnp->qsmaskinitnext |= mask;
 	rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
 	rdp->completed = rnp->completed;

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:50               ` Peter Zijlstra
@ 2015-10-07 12:03                 ` Peter Zijlstra
  2015-10-07 12:05                 ` kbuild test robot
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 12:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

On Wed, Oct 07, 2015 at 01:50:46PM +0200, Peter Zijlstra wrote:
> @@ -1512,10 +1554,8 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 * hold it, acquire the root rcu_node structure's lock in order to
>  	 * start one (if needed).
>  	 */
> -	if (rnp != rnp_root) {
> -		raw_spin_lock(&rnp_root->lock);
> -		smp_mb__after_unlock_lock();
> -	}
> +	if (rnp != rnp_root)
> +		raw_spin_lock_rcu_node(rnp);

					rnp_root


> @@ -2312,8 +2343,7 @@ static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
>  	gps = rnp->gpnum;
>  	mask = rnp->grpmask;
>  	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
> -	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_rcu_node(rnp);	/* irqs already disabled. */

				rnp_p


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:50               ` Peter Zijlstra
  2015-10-07 12:03                 ` Peter Zijlstra
@ 2015-10-07 12:05                 ` kbuild test robot
  2015-10-07 12:09                 ` kbuild test robot
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 12:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kbuild-all, Mathieu Desnoyers, Paul E. McKenney, linux-kernel,
	Ingo Molnar, Lai Jiangshan, dipankar, Andrew Morton, josh,
	Thomas Gleixner, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby prani

[-- Attachment #1: Type: text/plain, Size: 2214 bytes --]

Hi Peter,

[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: i386-randconfig-x004-201540 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/rcu/tree.c: In function '_raw_spin_lock_irqsave_rcu_node':
>> kernel/rcu/tree.c:1488:2: error: too many arguments to function '_raw_spin_lock_irqsave'
     _raw_spin_lock_irqsave(&rnp->lock, flags);
     ^
   In file included from include/linux/spinlock.h:280:0,
                    from kernel/rcu/tree.c:33:
   include/linux/spinlock_api_smp.h:34:26: note: declared here
    unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
                             ^
   kernel/rcu/tree.c: At top level:
>> kernel/rcu/tree.c:1493:34: error: expected declaration specifiers or '...' before '(' token
     _raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
                                     ^
>> kernel/rcu/tree.c:1493:41: error: expected declaration specifiers or '...' before '&' token
     _raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
                                            ^
   kernel/rcu/tree.c: In function 'note_gp_changes':
>> kernel/rcu/tree.c:1807:7: error: implicit declaration of function 'raw_spin_trylock_rcu_node' [-Werror=implicit-function-declaration]
         !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
          ^
   cc1: some warnings being treated as errors

vim +/_raw_spin_lock_irqsave +1488 kernel/rcu/tree.c

  1482		smp_mb__after_unlock_lock();
  1483	}
  1484	
  1485	static inline void
  1486	_raw_spin_lock_irqsave_rcu_node(struct rcu_node *rnp, unsigned long *flags)
  1487	{
> 1488		_raw_spin_lock_irqsave(&rnp->lock, flags);
  1489		smp_mb__after_unlock_lock();
  1490	}
  1491	
  1492	#define raw_spin_lock_irqsave_rcu_node(rnp, flags)
> 1493		_raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
  1494	
  1495	static inline bool raw_spin_trylock_rcu_node(struct rcu_node *rnp)
  1496	{

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 20374 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:50               ` Peter Zijlstra
  2015-10-07 12:03                 ` Peter Zijlstra
  2015-10-07 12:05                 ` kbuild test robot
@ 2015-10-07 12:09                 ` kbuild test robot
  2015-10-07 12:11                 ` kbuild test robot
  2015-10-07 15:18                 ` Paul E. McKenney
  4 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kbuild-all, Mathieu Desnoyers, Paul E. McKenney, linux-kernel,
	Ingo Molnar, Lai Jiangshan, dipankar, Andrew Morton, josh,
	Thomas Gleixner, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby prani

[-- Attachment #1: Type: text/plain, Size: 4325 bytes --]

Hi Peter,

[auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: x86_64-randconfig-x010-201540 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:12:0,
                    from kernel/rcu/tree.c:31:
   kernel/rcu/tree.c: In function '_raw_spin_lock_irqsave_rcu_node':
   include/linux/typecheck.h:11:18: warning: comparison of distinct pointer types lacks a cast
     (void)(&__dummy == &__dummy2); \
                     ^
>> include/linux/irqflags.h:63:3: note: in expansion of macro 'typecheck'
      typecheck(unsigned long, flags); \
      ^
>> include/linux/irqflags.h:95:3: note: in expansion of macro 'raw_local_irq_save'
      raw_local_irq_save(flags);  \
      ^
>> include/linux/spinlock_api_up.h:40:8: note: in expansion of macro 'local_irq_save'
      do { local_irq_save(flags); __LOCK(lock); } while (0)
           ^
>> include/linux/spinlock_api_up.h:69:45: note: in expansion of macro '__LOCK_IRQSAVE'
    #define _raw_spin_lock_irqsave(lock, flags) __LOCK_IRQSAVE(lock, flags)
                                                ^
>> kernel/rcu/tree.c:1488:2: note: in expansion of macro '_raw_spin_lock_irqsave'
     _raw_spin_lock_irqsave(&rnp->lock, flags);
     ^
   In file included from arch/x86/include/asm/processor.h:32:0,
                    from arch/x86/include/asm/thread_info.h:52,
                    from include/linux/thread_info.h:54,
                    from arch/x86/include/asm/preempt.h:6,
                    from include/linux/preempt.h:64,
                    from include/linux/spinlock.h:50,
                    from kernel/rcu/tree.c:33:
>> include/linux/irqflags.h:64:9: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      flags = arch_local_irq_save();  \
            ^
>> include/linux/irqflags.h:95:3: note: in expansion of macro 'raw_local_irq_save'
      raw_local_irq_save(flags);  \
      ^
>> include/linux/spinlock_api_up.h:40:8: note: in expansion of macro 'local_irq_save'
      do { local_irq_save(flags); __LOCK(lock); } while (0)
           ^
>> include/linux/spinlock_api_up.h:69:45: note: in expansion of macro '__LOCK_IRQSAVE'
    #define _raw_spin_lock_irqsave(lock, flags) __LOCK_IRQSAVE(lock, flags)
                                                ^
>> kernel/rcu/tree.c:1488:2: note: in expansion of macro '_raw_spin_lock_irqsave'
     _raw_spin_lock_irqsave(&rnp->lock, flags);
     ^
   kernel/rcu/tree.c: At top level:
   kernel/rcu/tree.c:1493:34: error: expected declaration specifiers or '...' before '(' token
     _raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
                                     ^
   kernel/rcu/tree.c:1493:41: error: expected declaration specifiers or '...' before '&' token
     _raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
                                            ^
   kernel/rcu/tree.c: In function 'note_gp_changes':
   kernel/rcu/tree.c:1807:7: error: implicit declaration of function 'raw_spin_trylock_rcu_node' [-Werror=implicit-function-declaration]
         !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
          ^
   cc1: some warnings being treated as errors

vim +/_raw_spin_lock_irqsave +1488 kernel/rcu/tree.c

  1472	 */
  1473	static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
  1474	{
  1475		raw_spin_lock(&rnp->lock);
  1476		smp_mb__after_unlock_lock();
  1477	}
  1478	
  1479	static inline void raw_spin_lock_irq_rcu_node(struct rcu_node *rnp)
  1480	{
  1481		raw_spin_lock_irq(&rnp->lock);
  1482		smp_mb__after_unlock_lock();
  1483	}
  1484	
  1485	static inline void
  1486	_raw_spin_lock_irqsave_rcu_node(struct rcu_node *rnp, unsigned long *flags)
  1487	{
> 1488		_raw_spin_lock_irqsave(&rnp->lock, flags);
  1489		smp_mb__after_unlock_lock();
  1490	}
  1491	
  1492	#define raw_spin_lock_irqsave_rcu_node(rnp, flags)
  1493		_raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
  1494	
  1495	static inline bool raw_spin_trylock_rcu_node(struct rcu_node *rnp)
  1496	{

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 21407 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:50               ` Peter Zijlstra
                                   ` (2 preceding siblings ...)
  2015-10-07 12:09                 ` kbuild test robot
@ 2015-10-07 12:11                 ` kbuild test robot
  2015-10-07 12:17                   ` Peter Zijlstra
  2015-10-07 15:18                 ` Paul E. McKenney
  4 siblings, 1 reply; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 12:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kbuild-all, Mathieu Desnoyers, Paul E. McKenney, linux-kernel,
	Ingo Molnar, Lai Jiangshan, dipankar, Andrew Morton, josh,
	Thomas Gleixner, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby prani

[-- Attachment #1: Type: text/plain, Size: 4893 bytes --]

Hi Peter,

[auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: i386-randconfig-x007-201540 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   kernel/rcu/tree.c: In function '_raw_spin_lock_irqsave_rcu_node':
   kernel/rcu/tree.c:1488:2: error: too many arguments to function '_raw_spin_lock_irqsave'
     _raw_spin_lock_irqsave(&rnp->lock, flags);
     ^
   In file included from include/linux/spinlock.h:280:0,
                    from kernel/rcu/tree.c:33:
   include/linux/spinlock_api_smp.h:34:26: note: declared here
    unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
                             ^
   kernel/rcu/tree.c: At top level:
   kernel/rcu/tree.c:1493:34: error: expected declaration specifiers or '...' before '(' token
     _raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
                                     ^
   kernel/rcu/tree.c:1493:41: error: expected declaration specifiers or '...' before '&' token
     _raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
                                            ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from kernel/rcu/tree.c:30:
   kernel/rcu/tree.c: In function 'note_gp_changes':
   kernel/rcu/tree.c:1807:7: error: implicit declaration of function 'raw_spin_trylock_rcu_node' [-Werror=implicit-function-declaration]
         !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
          ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
     if (__builtin_constant_p((cond)) ? !!(cond) :   \
                               ^
>> kernel/rcu/tree.c:1804:2: note: in expansion of macro 'if'
     if ((rdp->gpnum == READ_ONCE(rnp->gpnum) &&
     ^
   cc1: some warnings being treated as errors

vim +/if +1804 kernel/rcu/tree.c

5cd37193 kernel/rcu/tree.c Paul E. McKenney 2014-12-13  1788  		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1789  		rdp->qs_pending = !!(rnp->qsmask & rdp->grpmask);
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1790  		zero_cpu_stall_ticks(rdp);
7d0ae808 kernel/rcu/tree.c Paul E. McKenney 2015-03-03  1791  		WRITE_ONCE(rdp->gpwrap, false);
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1792  	}
48a7639c kernel/rcu/tree.c Paul E. McKenney 2014-03-11  1793  	return ret;
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1794  }
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1795  
d34ea322 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1796  static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1797  {
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1798  	unsigned long flags;
48a7639c kernel/rcu/tree.c Paul E. McKenney 2014-03-11  1799  	bool needwake;
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1800  	struct rcu_node *rnp;
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1801  
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1802  	local_irq_save(flags);
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1803  	rnp = rdp->mynode;
7d0ae808 kernel/rcu/tree.c Paul E. McKenney 2015-03-03 @1804  	if ((rdp->gpnum == READ_ONCE(rnp->gpnum) &&
7d0ae808 kernel/rcu/tree.c Paul E. McKenney 2015-03-03  1805  	     rdp->completed == READ_ONCE(rnp->completed) &&
7d0ae808 kernel/rcu/tree.c Paul E. McKenney 2015-03-03  1806  	     !unlikely(READ_ONCE(rdp->gpwrap))) || /* w/out lock. */
3538015d kernel/rcu/tree.c Peter Zijlstra   2015-10-07  1807  	    !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1808  		local_irq_restore(flags);
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1809  		return;
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1810  	}
48a7639c kernel/rcu/tree.c Paul E. McKenney 2014-03-11  1811  	needwake = __note_gp_changes(rsp, rnp, rdp);
6eaef633 kernel/rcutree.c  Paul E. McKenney 2013-03-19  1812  	raw_spin_unlock_irqrestore(&rnp->lock, flags);

:::::: The code at line 1804 was first introduced by commit
:::::: 7d0ae8086b828311250c6afdf800b568ac9bd693 rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()

:::::: TO: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
:::::: CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22021 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 12:11                 ` kbuild test robot
@ 2015-10-07 12:17                   ` Peter Zijlstra
  2015-10-07 13:44                     ` [kbuild-all] " Fengguang Wu
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 12:17 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Mathieu Desnoyers, Paul E. McKenney, linux-kernel,
	Ingo Molnar, Lai Jiangshan, dipankar, Andrew Morton, josh,
	Thomas Gleixner, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby prani

On Wed, Oct 07, 2015 at 08:11:01PM +0800, kbuild test robot wrote:
> Hi Peter,
> 
> [auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please ignore]

So much punishment for not having compiled my proto patch :/

Wu, is there a tag one can include to ward off this patch sucking robot
prematurely?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
@ 2015-10-07 13:24     ` Peter Zijlstra
  2015-10-07 18:11       ` Paul E. McKenney
  2015-10-07 13:35     ` Peter Zijlstra
  2015-10-07 13:43     ` Peter Zijlstra
  2 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 13:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> @@ -3494,19 +3483,21 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
>   * recursively up the tree.  (Calm down, calm down, we do the recursion
>   * iteratively!)
>   *
> - * Caller must hold the root rcu_node's exp_funnel_mutex.
> + * Caller must hold the root rcu_node's exp_funnel_mutex and the
> + * specified rcu_node structure's ->lock.
>   */
> -static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> -					      struct rcu_node *rnp, bool wake)
> +static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
> +				 bool wake, unsigned long flags)
> +	__releases(rnp->lock)
>  {
> -	unsigned long flags;
>  	unsigned long mask;
>  
> -	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();

	lockdep_assert_held(&rnp->lock);

> +/*
> + * Report expedited quiescent state for specified node.  This is a
> + * lock-acquisition wrapper function for __rcu_report_exp_rnp().
> + *
> + * Caller must hold the root rcu_node's exp_funnel_mutex.
> + */
> +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> +					      struct rcu_node *rnp, bool wake)
> +{
> +	unsigned long flags;

	lockdep_assert_held(&rcu_get_root(rsp)->exp_funnel_mutex);

> +
> +	raw_spin_lock_irqsave(&rnp->lock, flags);
> +	smp_mb__after_unlock_lock();
> +	__rcu_report_exp_rnp(rsp, rnp, wake, flags);
> +}


Etc.. these are much harder to ignore than comments.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
  2015-10-07 13:24     ` Peter Zijlstra
@ 2015-10-07 13:35     ` Peter Zijlstra
  2015-10-07 15:44       ` Paul E. McKenney
  2015-10-07 13:43     ` Peter Zijlstra
  2 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 13:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:

> +/* Flags for rcu_preempt_ctxt_queue() decision table. */
> +#define RCU_GP_TASKS	0x8
> +#define RCU_EXP_TASKS	0x4
> +#define RCU_GP_BLKD	0x2
> +#define RCU_EXP_BLKD	0x1

Purely cosmetic, but that's backwards ;-) Most of our flags etc.. are in
increasing order.

> +static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
> +				   unsigned long flags) __releases(rnp->lock)
> +{
> +	int blkd_state = (rnp->gp_tasks ? RCU_GP_TASKS : 0) +
> +			 (rnp->exp_tasks ? RCU_EXP_TASKS : 0) +
> +			 (rnp->qsmask & rdp->grpmask ? RCU_GP_BLKD : 0) +
> +			 (rnp->expmask & rdp->grpmask ? RCU_EXP_BLKD : 0);

An alternative way is:

	int blkd_state = RCU_GP_TASKS  * !!rnp->gp_tasks +
			 RCU_EXP_TASKS * !!rnp->exp_tasks +
			 RCU_GP_BLKD   * !!(rnp->qsmask  & rdp->grpmask) +
			 RCU_EXP_BLKD  * !!(rnp->expmask & rdp->grpmask);

I suppose it depends on how your brain is wired which version reads
easier :-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
  2015-10-07 13:24     ` Peter Zijlstra
  2015-10-07 13:35     ` Peter Zijlstra
@ 2015-10-07 13:43     ` Peter Zijlstra
  2015-10-07 13:49       ` Peter Zijlstra
  2015-10-07 16:13       ` Paul E. McKenney
  2 siblings, 2 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 13:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> @@ -167,42 +307,18 @@ static void rcu_preempt_note_context_switch(void)

> -		raw_spin_unlock_irqrestore(&rnp->lock, flags);

This again reminds me that we should move rcu_note_context_switch()
under the IRQ disable section the scheduler already has or remove the
IRQ disable from rcu_note_context_switch().



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [kbuild-all] [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 12:17                   ` Peter Zijlstra
@ 2015-10-07 13:44                     ` Fengguang Wu
  2015-10-07 13:55                       ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Fengguang Wu @ 2015-10-07 13:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bobby prani, Thomas Gleixner, oleg, fweisbec, dvhart,
	Lai Jiangshan, linux-kernel, rostedt, josh, dhowells, edumazet,
	Mathieu Desnoyers, kbuild-all, dipankar, Andrew Morton,
	Paul E. McKenney, Ingo Molnar

On Wed, Oct 07, 2015 at 02:17:51PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 08:11:01PM +0800, kbuild test robot wrote:
> > Hi Peter,
> > 
> > [auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please ignore]
> 
> So much punishment for not having compiled my proto patch :/
> 
> Wu, is there a tag one can include to ward off this patch sucking robot
> prematurely?

Yes. The best way may be to push the patches to a git tree known to
0day robot:

https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/tree/repo/linux

So that it's tested first there. You'll then get private email reports
if it's a private git branch.

The robot has logic to avoid duplicate testing an emailed patch (based
on patch-id and author/title) if its git tree version has been tested.

We may also add a rule: only send private reports for patches with
"RFC", "Not-yet-signed-off-by:", etc.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-07 13:43     ` Peter Zijlstra
@ 2015-10-07 13:49       ` Peter Zijlstra
  2015-10-07 16:14         ` Paul E. McKenney
  2015-10-07 16:13       ` Paul E. McKenney
  1 sibling, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 13:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 03:43:11PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> > @@ -167,42 +307,18 @@ static void rcu_preempt_note_context_switch(void)
> 
> > -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> 
> This again reminds me that we should move rcu_note_context_switch()
> under the IRQ disable section the scheduler already has or remove the
> IRQ disable from rcu_note_context_switch().

Ah, this is the unlikely path where we actually need to do work. The
normal fast paths no longer require IRQs disabled, you already fixed
that last time, good!

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [kbuild-all] [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 13:44                     ` [kbuild-all] " Fengguang Wu
@ 2015-10-07 13:55                       ` Peter Zijlstra
  2015-10-07 14:21                         ` Fengguang Wu
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 13:55 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: bobby prani, Thomas Gleixner, oleg, fweisbec, dvhart,
	Lai Jiangshan, linux-kernel, rostedt, josh, dhowells, edumazet,
	Mathieu Desnoyers, kbuild-all, dipankar, Andrew Morton,
	Paul E. McKenney, Ingo Molnar

On Wed, Oct 07, 2015 at 09:44:32PM +0800, Fengguang Wu wrote:

> > Wu, is there a tag one can include to ward off this patch sucking robot
> > prematurely?
> 
> Yes. The best way may be to push the patches to a git tree known to
> 0day robot:
> 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/tree/repo/linux
> 
> So that it's tested first there. You'll then get private email reports
> if it's a private git branch.

Right, but if I can't be bothered to compile test a patch, I also cannot
be bothered to stuff it into git :-)

> We may also add a rule: only send private reports for patches with
> "RFC", "Not-yet-signed-off-by:", etc.

How about not building when there's no "^Signed-off-by:" at all?

Even private build fails for patches like this -- esp. 3+ -- gets
annoying real quick.

Also note that this 'patch' has: $subject ~ /^Re:/, nor did it have
"^Subject:" like headers in the body.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI
  2015-10-06 16:29   ` [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI Paul E. McKenney
@ 2015-10-07 14:18     ` Peter Zijlstra
  2015-10-07 16:24       ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 14:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:29:28AM -0700, Paul E. McKenney wrote:

> @@ -161,6 +161,8 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
>  static void invoke_rcu_core(void);
>  static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
> +static void __maybe_unused rcu_report_exp_rdp(struct rcu_state *rsp,
> +					      struct rcu_data *rdp, bool wake);
>  

Do we really need that on a declaration? I though only unused
definitions required it.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [kbuild-all] [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 13:55                       ` Peter Zijlstra
@ 2015-10-07 14:21                         ` Fengguang Wu
  2015-10-07 14:28                           ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Fengguang Wu @ 2015-10-07 14:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andrew Morton, josh, fweisbec, dvhart,
	Lai Jiangshan, oleg, rostedt, linux-kernel, dhowells, edumazet,
	Mathieu Desnoyers, kbuild-all, dipankar, bobby prani,
	Paul E. McKenney, Thomas Gleixner

On Wed, Oct 07, 2015 at 03:55:29PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 09:44:32PM +0800, Fengguang Wu wrote:
> 
> > > Wu, is there a tag one can include to ward off this patch sucking robot
> > > prematurely?
> > 
> > Yes. The best way may be to push the patches to a git tree known to
> > 0day robot:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/tree/repo/linux
> > 
> > So that it's tested first there. You'll then get private email reports
> > if it's a private git branch.
> 
> Right, but if I can't be bothered to compile test a patch, I also cannot
> be bothered to stuff it into git :-)

OK, that's understandable.

> > We may also add a rule: only send private reports for patches with
> > "RFC", "Not-yet-signed-off-by:", etc.
> 
> How about not building when there's no "^Signed-off-by:" at all?

That's a good idea: no need to test quick demo-of-idea patches.

> Even private build fails for patches like this -- esp. 3+ -- gets
> annoying real quick.
> 
> Also note that this 'patch' has: $subject ~ /^Re:/, nor did it have
> "^Subject:" like headers in the body.

That's good clues, too. So how about make the rule

        Skip test if no "^Signed-off-by:" and Subject =~ /^Re:/

For a patch posted inside a discussion thread, as long as it have
"^Signed-off-by:", I guess the author is serious and the patch could
be tested seriously.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-06 16:29   ` [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited() Paul E. McKenney
@ 2015-10-07 14:26     ` Peter Zijlstra
  2015-10-07 16:26       ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 14:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:29:37AM -0700, Paul E. McKenney wrote:
>  void rcu_sched_qs(void)
>  {
> +	unsigned long flags;
> +
>  	if (__this_cpu_read(rcu_sched_data.cpu_no_qs.s)) {
>  		trace_rcu_grace_period(TPS("rcu_sched"),
>  				       __this_cpu_read(rcu_sched_data.gpnum),
>  				       TPS("cpuqs"));
>  		__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
> +		if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
> +			return;
> +		local_irq_save(flags);
>  		if (__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)) {
>  			__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, false);
>  			rcu_report_exp_rdp(&rcu_sched_state,
>  					   this_cpu_ptr(&rcu_sched_data),
>  					   true);
>  		}
> +		local_irq_restore(flags);
>  	}
>  }

*sigh*.. still rare I suppose, but should we look at doing something
like this?

---
 kernel/sched/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe819298c220..3d830c3491c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3050,7 +3050,6 @@ static void __sched __schedule(void)
 
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
-	rcu_note_context_switch();
 	prev = rq->curr;
 
 	schedule_debug(prev);
@@ -3058,13 +3057,16 @@ static void __sched __schedule(void)
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
 
+	local_irq_disable();
+	rcu_note_context_switch();
+
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
 	 */
 	smp_mb__before_spinlock();
-	raw_spin_lock_irq(&rq->lock);
+	raw_spin_lock(&rq->lock);
 	lockdep_pin_lock(&rq->lock);
 
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [kbuild-all] [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 14:21                         ` Fengguang Wu
@ 2015-10-07 14:28                           ` Peter Zijlstra
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 14:28 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Ingo Molnar, Andrew Morton, josh, fweisbec, dvhart,
	Lai Jiangshan, oleg, rostedt, linux-kernel, dhowells, edumazet,
	Mathieu Desnoyers, kbuild-all, dipankar, bobby prani,
	Paul E. McKenney, Thomas Gleixner

On Wed, Oct 07, 2015 at 10:21:33PM +0800, Fengguang Wu wrote:
> On Wed, Oct 07, 2015 at 03:55:29PM +0200, Peter Zijlstra wrote:
> > How about not building when there's no "^Signed-off-by:" at all?
> 
> That's a good idea: no need to test quick demo-of-idea patches.
> 
> > Even private build fails for patches like this -- esp. 3+ -- gets
> > annoying real quick.
> > 
> > Also note that this 'patch' has: $subject ~ /^Re:/, nor did it have
> > "^Subject:" like headers in the body.
> 
> That's good clues, too. So how about make the rule
> 
>         Skip test if no "^Signed-off-by:" and Subject =~ /^Re:/
> 
> For a patch posted inside a discussion thread, as long as it have
> "^Signed-off-by:", I guess the author is serious and the patch could
> be tested seriously.

Works for me; now hoping I will abide by my own suggested rules ;-)

Thanks!

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07  7:51         ` Peter Zijlstra
  2015-10-07  8:42           ` Mathieu Desnoyers
@ 2015-10-07 14:33           ` Paul E. McKenney
  2015-10-07 14:40             ` Peter Zijlstra
  1 sibling, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 14:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 09:51:14AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 01:58:50PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 10:29:37PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
> > > > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> > > > +					      struct rcu_node *rnp, bool wake)
> > > > +{
> > > > +	unsigned long flags;
> > > > +	unsigned long mask;
> > > > +
> > > > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> > > 
> > > Normally we require a comment with barriers, explaining the order and
> > > the pairing etc.. :-)
> > > 
> > > > +	smp_mb__after_unlock_lock();
> > 
> > Hmmmm...  That is not good.
> > 
> > Worse yet, I am missing comments on most of the pre-existing barriers
> > of this form.
> 
> Yes I noticed.. :/

Will fix, though probably as a follow-up patch.  Once I figure out what
comment makes sense...

> > The purpose is to enforce the heavy-weight grace-period memory-ordering
> > guarantees documented in the synchronize_sched() header comment and
> > elsewhere.
> 
> > They pair with anything you might use to check for violation
> > of these guarantees, or, simiarly, any ordering that you might use when
> > relying on these guarantees.
> 
> I'm sure you know what that means, but I've no clue ;-) That is, I
> wouldn't know where to start looking in the RCU implementation to verify
> the barrier is either needed or sufficient. Unless you mean _everywhere_
> :-)

Pretty much everywhere.

Let's take the usual RCU removal pattern as an example:

	void f1(struct foo *p)
	{
		list_del_rcu(p);
		synchronize_rcu_expedited();
		kfree(p);
	}

	void f2(void)
	{
		struct foo *p;

		list_for_each_entry_rcu(p, &my_head, next)
			do_something_with(p);
	}

So the synchronize_rcu_expedited() acts as an extremely heavyweight
memory barrier that pairs with the rcu_dereference() inside of
list_for_each_entry_rcu().  Easy enough, right?

But what exactly within synchronize_rcu_expedited() provides the
ordering?  The answer is a web of lock-based critical sections and
explicit memory barriers, with the one you called out as needing
a comment being one of them.

> > I could add something like  "/* Enforce GP memory ordering. */"
> > 
> > Or perhaps "/* See synchronize_sched() header. */"
> > 
> > I do not propose reproducing the synchronize_sched() header on each
> > of these.  That would be verbose, even for me!  ;-)
> > 
> > Other thoughts?
> 
> Well, this is an UNLOCK+LOCK on non-matching lock variables upgrade to
> full barrier thing, right?

Yep!

> To me its not clear which UNLOCK we even match here. I've just read the
> sync_sched() header, but that doesn't help me either, so referring to
> that isn't really helpful either.

Usually this pairs with an rcu_dereference() somewhere in the calling
code.  Some other task in the calling code, actually.

> In any case, I don't want to make too big a fuzz here, but I just
> stumbled over a lot of unannotated barriers and figured I ought to say
> something about it.

I do need to better document how this works, no two ways about it.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 14:33           ` Paul E. McKenney
@ 2015-10-07 14:40             ` Peter Zijlstra
  2015-10-07 16:48               ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-07 14:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 07:33:25AM -0700, Paul E. McKenney wrote:
> > I'm sure you know what that means, but I've no clue ;-) That is, I
> > wouldn't know where to start looking in the RCU implementation to verify
> > the barrier is either needed or sufficient. Unless you mean _everywhere_
> > :-)
> 
> Pretty much everywhere.
> 
> Let's take the usual RCU removal pattern as an example:
> 
> 	void f1(struct foo *p)
> 	{
> 		list_del_rcu(p);
> 		synchronize_rcu_expedited();
> 		kfree(p);
> 	}
> 
> 	void f2(void)
> 	{
> 		struct foo *p;
> 
> 		list_for_each_entry_rcu(p, &my_head, next)
> 			do_something_with(p);
> 	}
> 
> So the synchronize_rcu_expedited() acts as an extremely heavyweight
> memory barrier that pairs with the rcu_dereference() inside of
> list_for_each_entry_rcu().  Easy enough, right?
> 
> But what exactly within synchronize_rcu_expedited() provides the
> ordering?  The answer is a web of lock-based critical sections and
> explicit memory barriers, with the one you called out as needing
> a comment being one of them.

Right, but seeing there's possible implementations of sync_rcu(_exp)*()
that do not have the whole rcu_node tree like thing, there's more to
this particular barrier than the semantics of sync_rcu().

Some implementation choice requires this barrier upgrade -- and in
another email I suggest its the whole tree thing, we need to firmly
establish the state of one level before propagating the state up etc.

Now I'm not entirely sure this is fully correct, but its the best I
could come up.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:01             ` Peter Zijlstra
  2015-10-07 11:50               ` Peter Zijlstra
@ 2015-10-07 15:15               ` Paul E. McKenney
  1 sibling, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 15:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

On Wed, Oct 07, 2015 at 01:01:20PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 08:42:05AM +0000, Mathieu Desnoyers wrote:
> > ----- On Oct 7, 2015, at 3:51 AM, Peter Zijlstra peterz@infradead.org wrote:
> > 
> > > On Tue, Oct 06, 2015 at 01:58:50PM -0700, Paul E. McKenney wrote:
> > >> On Tue, Oct 06, 2015 at 10:29:37PM +0200, Peter Zijlstra wrote:
> > >> > On Tue, Oct 06, 2015 at 09:29:21AM -0700, Paul E. McKenney wrote:
> > >> > > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> > >> > > +					      struct rcu_node *rnp, bool wake)
> > >> > > +{
> > >> > > +	unsigned long flags;
> > >> > > +	unsigned long mask;
> > >> > > +
> > >> > > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> > >> > 
> > >> > Normally we require a comment with barriers, explaining the order and
> > >> > the pairing etc.. :-)
> > >> > 
> > >> > > +	smp_mb__after_unlock_lock();
> > >> 
> > >> Hmmmm...  That is not good.
> > >> 
> > >> Worse yet, I am missing comments on most of the pre-existing barriers
> > >> of this form.
> > > 
> > > Yes I noticed.. :/
> > > 
> > >> The purpose is to enforce the heavy-weight grace-period memory-ordering
> > >> guarantees documented in the synchronize_sched() header comment and
> > >> elsewhere.
> > > 
> > >> They pair with anything you might use to check for violation
> > >> of these guarantees, or, simiarly, any ordering that you might use when
> > >> relying on these guarantees.
> > > 
> > > I'm sure you know what that means, but I've no clue ;-) That is, I
> > > wouldn't know where to start looking in the RCU implementation to verify
> > > the barrier is either needed or sufficient. Unless you mean _everywhere_
> > > :-)
> > 
> > One example is the new membarrier system call. It relies on synchronize_sched()
> > to enforce this:
> 
> That again doesn't explain which UNLOCKs with non-matching lock values
> it pairs with and what particular ordering is important here.
> 
> I'm fully well aware of what sync_sched() guarantees and how one can use
> it, that is not the issue, what I'm saying is that a generic description
> of sync_sched() doesn't help in figuring out WTH that barrier is for and
> which other code I should also inspect.

Unfortunately, the answer is "pretty much all of it".  :-(

The enforced ordering relies on pretty much every acquisition/release of
an rcu_node structure's ->lock and all the dyntick-idle stuff, plus some
explicit barriers and a few smp_load_acquire()s and smp_store_release()s.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 11:50               ` Peter Zijlstra
                                   ` (3 preceding siblings ...)
  2015-10-07 12:11                 ` kbuild test robot
@ 2015-10-07 15:18                 ` Paul E. McKenney
  2015-10-08 10:24                   ` Peter Zijlstra
  4 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 15:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

On Wed, Oct 07, 2015 at 01:50:46PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 01:01:20PM +0200, Peter Zijlstra wrote:
> 
> > That again doesn't explain which UNLOCKs with non-matching lock values
> > it pairs with and what particular ordering is important here.
> 
> So after staring at that stuff for a while I came up with the following.
> Does this make sense, or am I completely misunderstanding things?
> 
> Not been near a compiler.

Actually, this would be quite good.  "Premature abstraction is the
root of all evil" and all that, but this abstraction is anything but
premature.  My thought would be to have it against commit cd58087c9cee
("Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and
'torture.2015.10.06a' into HEAD") in -rcu given the merge conflicts
that would otherwise arise.

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 99 ++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 61 insertions(+), 38 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 775d36cc0050..46e1e23ff762 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1460,6 +1460,48 @@ static void trace_rcu_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  }
> 
>  /*
> + * Wrappers for the rcu_node::lock acquire.
> + *
> + * Because the rcu_nodes form a tree, the tree taversal locking will observe
> + * different lock values, this in turn means that an UNLOCK of one level
> + * followed by a LOCK of another level does not imply a full memory barrier;
> + * and most importantly transitivity is lost.
> + *
> + * In order to restore full ordering between tree levels, augment the regular
> + * lock acquire functions with smp_mb__after_unlock_lock().
> + */
> +static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
> +{
> +	raw_spin_lock(&rnp->lock);
> +	smp_mb__after_unlock_lock();
> +}
> +
> +static inline void raw_spin_lock_irq_rcu_node(struct rcu_node *rnp)
> +{
> +	raw_spin_lock_irq(&rnp->lock);
> +	smp_mb__after_unlock_lock();
> +}
> +
> +static inline void
> +_raw_spin_lock_irqsave_rcu_node(struct rcu_node *rnp, unsigned long *flags)
> +{
> +	_raw_spin_lock_irqsave(&rnp->lock, flags);
> +	smp_mb__after_unlock_lock();
> +}
> +
> +#define raw_spin_lock_irqsave_rcu_node(rnp, flags)
> +	_raw_spin_lock_irqsave_rcu_node((rnp), &(flags))
> +
> +static inline bool raw_spin_trylock_rcu_node(struct rcu_node *rnp)
> +{
> +	bool locked = raw_spin_trylock(&rnp->lock);
> +	if (locked)
> +		smp_mb__after_unlock_lock();
> +	return locked;
> +}
> +
> +
> +/*
>   * Start some future grace period, as needed to handle newly arrived
>   * callbacks.  The required future grace periods are recorded in each
>   * rcu_node structure's ->need_future_gp field.  Returns true if there
> @@ -1512,10 +1554,8 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 * hold it, acquire the root rcu_node structure's lock in order to
>  	 * start one (if needed).
>  	 */
> -	if (rnp != rnp_root) {
> -		raw_spin_lock(&rnp_root->lock);
> -		smp_mb__after_unlock_lock();
> -	}
> +	if (rnp != rnp_root)
> +		raw_spin_lock_rcu_node(rnp);
> 
>  	/*
>  	 * Get a new grace-period number.  If there really is no grace
> @@ -1764,11 +1804,10 @@ static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
>  	if ((rdp->gpnum == READ_ONCE(rnp->gpnum) &&
>  	     rdp->completed == READ_ONCE(rnp->completed) &&
>  	     !unlikely(READ_ONCE(rdp->gpwrap))) || /* w/out lock. */
> -	    !raw_spin_trylock(&rnp->lock)) { /* irqs already off, so later. */
> +	    !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
>  		local_irq_restore(flags);
>  		return;
>  	}
> -	smp_mb__after_unlock_lock();
>  	needwake = __note_gp_changes(rsp, rnp, rdp);
>  	raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	if (needwake)
> @@ -1792,8 +1831,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
>  	struct rcu_node *rnp = rcu_get_root(rsp);
> 
>  	WRITE_ONCE(rsp->gp_activity, jiffies);
> -	raw_spin_lock_irq(&rnp->lock);
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_irq_rcu_node(rnp);
>  	if (!READ_ONCE(rsp->gp_flags)) {
>  		/* Spurious wakeup, tell caller to go back to sleep.  */
>  		raw_spin_unlock_irq(&rnp->lock);
> @@ -1825,8 +1863,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
>  	 */
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		rcu_gp_slow(rsp, gp_preinit_delay);
> -		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
> +		raw_spin_lock_irq_rcu_node(rnp);
>  		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
>  		    !rnp->wait_blkd_tasks) {
>  			/* Nothing to do on this leaf rcu_node structure. */
> @@ -1882,8 +1919,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
>  	 */
>  	rcu_for_each_node_breadth_first(rsp, rnp) {
>  		rcu_gp_slow(rsp, gp_init_delay);
> -		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
> +		raw_spin_lock_irq_rcu_node(rnp);
>  		rdp = this_cpu_ptr(rsp->rda);
>  		rcu_preempt_check_blocked_tasks(rnp);
>  		rnp->qsmask = rnp->qsmaskinit;
> @@ -1953,8 +1989,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
>  	}
>  	/* Clear flag to prevent immediate re-entry. */
>  	if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
> -		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
> +		raw_spin_lock_irq_rcu_node(rnp);
>  		WRITE_ONCE(rsp->gp_flags,
>  			   READ_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS);
>  		raw_spin_unlock_irq(&rnp->lock);
> @@ -1974,8 +2009,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	struct rcu_node *rnp = rcu_get_root(rsp);
> 
>  	WRITE_ONCE(rsp->gp_activity, jiffies);
> -	raw_spin_lock_irq(&rnp->lock);
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_irq_rcu_node(rnp);
>  	gp_duration = jiffies - rsp->gp_start;
>  	if (gp_duration > rsp->gp_max)
>  		rsp->gp_max = gp_duration;
> @@ -2000,8 +2034,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	 * grace period is recorded in any of the rcu_node structures.
>  	 */
>  	rcu_for_each_node_breadth_first(rsp, rnp) {
> -		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
> +		raw_spin_lock_irq_rcu_node(rnp);
>  		WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
>  		WARN_ON_ONCE(rnp->qsmask);
>  		WRITE_ONCE(rnp->completed, rsp->gpnum);
> @@ -2016,8 +2049,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  		rcu_gp_slow(rsp, gp_cleanup_delay);
>  	}
>  	rnp = rcu_get_root(rsp);
> -	raw_spin_lock_irq(&rnp->lock);
> -	smp_mb__after_unlock_lock(); /* Order GP before ->completed update. */
> +	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
>  	rcu_nocb_gp_set(rnp, nocb);
> 
>  	/* Declare grace period done. */
> @@ -2264,8 +2296,7 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  		rnp_c = rnp;
>  		rnp = rnp->parent;
> -		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
> +		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  		oldmask = rnp_c->qsmask;
>  	}
> 
> @@ -2312,8 +2343,7 @@ static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
>  	gps = rnp->gpnum;
>  	mask = rnp->grpmask;
>  	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
> -	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_rcu_node(rnp);	/* irqs already disabled. */
>  	rcu_report_qs_rnp(mask, rsp, rnp_p, gps, flags);
>  }
> 
> @@ -2335,8 +2365,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
>  	struct rcu_node *rnp;
> 
>  	rnp = rdp->mynode;
> -	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	if ((rdp->passed_quiesce == 0 &&
>  	     rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) ||
>  	    rdp->gpnum != rnp->gpnum || rnp->completed == rnp->gpnum ||
> @@ -2562,8 +2591,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
>  		rnp = rnp->parent;
>  		if (!rnp)
>  			break;
> -		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> -		smp_mb__after_unlock_lock(); /* GP memory ordering. */
> +		raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
>  		rnp->qsmaskinit &= ~mask;
>  		rnp->qsmask &= ~mask;
>  		if (rnp->qsmaskinit) {
> @@ -2591,8 +2619,7 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
> 
>  	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
>  	mask = rdp->grpmask;
> -	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();	/* Enforce GP memory-order guarantee. */
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
>  	rnp->qsmaskinitnext &= ~mask;
>  	raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  }
> @@ -2789,8 +2816,7 @@ static void force_qs_rnp(struct rcu_state *rsp,
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		cond_resched_rcu_qs();
>  		mask = 0;
> -		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
> +		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  		if (rnp->qsmask == 0) {
>  			if (rcu_state_p == &rcu_sched_state ||
>  			    rsp != rcu_state_p ||
> @@ -2861,8 +2887,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
>  	/* rnp_old == rcu_get_root(rsp), rnp == NULL. */
> 
>  	/* Reached the root of the rcu_node tree, acquire lock. */
> -	raw_spin_lock_irqsave(&rnp_old->lock, flags);
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_irqsave_rcu_node(rnp_old, flags);
>  	raw_spin_unlock(&rnp_old->fqslock);
>  	if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
>  		rsp->n_force_qs_lh++;
> @@ -2985,8 +3010,7 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
>  		if (!rcu_gp_in_progress(rsp)) {
>  			struct rcu_node *rnp_root = rcu_get_root(rsp);
> 
> -			raw_spin_lock(&rnp_root->lock);
> -			smp_mb__after_unlock_lock();
> +			raw_spin_lock_rcu_node(rnp_root);
>  			needwake = rcu_start_gp(rsp);
>  			raw_spin_unlock(&rnp_root->lock);
>  			if (needwake)
> @@ -3925,8 +3949,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
>  	 */
>  	rnp = rdp->mynode;
>  	mask = rdp->grpmask;
> -	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
> -	smp_mb__after_unlock_lock();
> +	raw_spin_lock_rcu_node(rnp);		/* irqs already disabled. */
>  	rnp->qsmaskinitnext |= mask;
>  	rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
>  	rdp->completed = rnp->completed;
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-07 13:35     ` Peter Zijlstra
@ 2015-10-07 15:44       ` Paul E. McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 03:35:45PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> 
> > +/* Flags for rcu_preempt_ctxt_queue() decision table. */
> > +#define RCU_GP_TASKS	0x8
> > +#define RCU_EXP_TASKS	0x4
> > +#define RCU_GP_BLKD	0x2
> > +#define RCU_EXP_BLKD	0x1
> 
> Purely cosmetic, but that's backwards ;-) Most of our flags etc.. are in
> increasing order.
> 
> > +static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
> > +				   unsigned long flags) __releases(rnp->lock)
> > +{
> > +	int blkd_state = (rnp->gp_tasks ? RCU_GP_TASKS : 0) +
> > +			 (rnp->exp_tasks ? RCU_EXP_TASKS : 0) +
> > +			 (rnp->qsmask & rdp->grpmask ? RCU_GP_BLKD : 0) +
> > +			 (rnp->expmask & rdp->grpmask ? RCU_EXP_BLKD : 0);
> 
> An alternative way is:
> 
> 	int blkd_state = RCU_GP_TASKS  * !!rnp->gp_tasks +
> 			 RCU_EXP_TASKS * !!rnp->exp_tasks +
> 			 RCU_GP_BLKD   * !!(rnp->qsmask  & rdp->grpmask) +
> 			 RCU_EXP_BLKD  * !!(rnp->expmask & rdp->grpmask);
> 
> I suppose it depends on how your brain is wired which version reads
> easier :-)

Indeed!  ;-)

I will stick with the ?: for the moment, but your multiplied bang-up
approach might well grow on me.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-07 13:43     ` Peter Zijlstra
  2015-10-07 13:49       ` Peter Zijlstra
@ 2015-10-07 16:13       ` Paul E. McKenney
  1 sibling, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 03:43:11PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> > @@ -167,42 +307,18 @@ static void rcu_preempt_note_context_switch(void)
> 
> > -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> 
> This again reminds me that we should move rcu_note_context_switch()
> under the IRQ disable section the scheduler already has or remove the
> IRQ disable from rcu_note_context_switch().

Like this?  I verified that all callers of rcu_virt_note_context_switch()
currently have interrupts disabled.

							Thanx, Paul

------------------------------------------------------------------------

commit 945c702687f760872adf7ce6e030c81ba427bf34
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Oct 7 09:10:48 2015 -0700

    rcu: Stop disabling interrupts in scheduler fastpaths
    
    We need the scheduler's fastpaths to be, well, fast, and unnecessarily
    disabling and re-enabling interrupts is not necessarily consistent with
    this goal.  Especially given that there are regions of the scheduler that
    already have interrupts disabled.
    
    This commit therefore moves the call to rcu_note_context_switch()
    to one of the interrupts-disabled regions of the scheduler, and
    removes the now-redundant disabling and re-enabling of interrupts from
    rcu_note_context_switch() and the functions it calls.
    
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 60d15a080d7c..9d3eda39bcd2 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -37,7 +37,7 @@ void rcu_cpu_stall_reset(void);
 /*
  * Note a virtualization-based context switch.  This is simply a
  * wrapper around rcu_note_context_switch(), which allows TINY_RCU
- * to save a few bytes.
+ * to save a few bytes. The caller must have disabled interrupts.
  */
 static inline void rcu_virt_note_context_switch(int cpu)
 {
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3dd63ebf279a..1f81a960678a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -295,17 +295,16 @@ EXPORT_PER_CPU_SYMBOL_GPL(rcu_qs_ctr);
  * We inform the RCU core by emulating a zero-duration dyntick-idle
  * period, which we in turn do by incrementing the ->dynticks counter
  * by two.
+ *
+ * The caller must have disabled interrupts.
  */
 static void rcu_momentary_dyntick_idle(void)
 {
-	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_dynticks *rdtp;
 	int resched_mask;
 	struct rcu_state *rsp;
 
-	local_irq_save(flags);
-
 	/*
 	 * Yes, we can lose flag-setting operations.  This is OK, because
 	 * the flag will be set again after some delay.
@@ -335,13 +334,12 @@ static void rcu_momentary_dyntick_idle(void)
 		smp_mb__after_atomic(); /* Later stuff after QS. */
 		break;
 	}
-	local_irq_restore(flags);
 }
 
 /*
  * Note a context switch.  This is a quiescent state for RCU-sched,
  * and requires special handling for preemptible RCU.
- * The caller must have disabled preemption.
+ * The caller must have disabled interrupts.
  */
 void rcu_note_context_switch(void)
 {
@@ -371,9 +369,14 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  */
 void rcu_all_qs(void)
 {
+	unsigned long flags;
+
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
-	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
+	if (unlikely(raw_cpu_read(rcu_sched_qs_mask))) {
+		local_irq_save(flags);
 		rcu_momentary_dyntick_idle();
+		local_irq_restore(flags);
+	}
 	this_cpu_inc(rcu_qs_ctr);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 97dfa7d57f79..7087fb047e2d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -146,8 +146,8 @@ static void __init rcu_bootup_announce(void)
  * the corresponding expedited grace period will also be the end of the
  * normal grace period.
  */
-static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
-				   unsigned long flags) __releases(rnp->lock)
+static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
+	__releases(rnp->lock) /* But leaves rrupts disabled. */
 {
 	int blkd_state = (rnp->gp_tasks ? RCU_GP_TASKS : 0) +
 			 (rnp->exp_tasks ? RCU_EXP_TASKS : 0) +
@@ -235,7 +235,7 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
 		rnp->gp_tasks = &t->rcu_node_entry;
 	if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
 		rnp->exp_tasks = &t->rcu_node_entry;
-	raw_spin_unlock(&rnp->lock);
+	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
 
 	/*
 	 * Report the quiescent state for the expedited GP.  This expedited
@@ -250,7 +250,6 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
 	} else {
 		WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
 	}
-	local_irq_restore(flags);
 }
 
 /*
@@ -285,12 +284,11 @@ static void rcu_preempt_qs(void)
  * predating the current grace period drain, in other words, until
  * rnp->gp_tasks becomes NULL.
  *
- * Caller must disable preemption.
+ * Caller must disable interrupts.
  */
 static void rcu_preempt_note_context_switch(void)
 {
 	struct task_struct *t = current;
-	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
@@ -300,7 +298,7 @@ static void rcu_preempt_note_context_switch(void)
 		/* Possibly blocking in an RCU read-side critical section. */
 		rdp = this_cpu_ptr(rcu_state_p->rda);
 		rnp = rdp->mynode;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
+		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
 		smp_mb__after_unlock_lock();
 		t->rcu_read_unlock_special.b.blocked = true;
 		t->rcu_blocked_node = rnp;
@@ -317,7 +315,7 @@ static void rcu_preempt_note_context_switch(void)
 				       (rnp->qsmask & rdp->grpmask)
 				       ? rnp->gpnum
 				       : rnp->gpnum + 1);
-		rcu_preempt_ctxt_queue(rnp, rdp, flags);
+		rcu_preempt_ctxt_queue(rnp, rdp);
 	} else if (t->rcu_read_lock_nesting < 0 &&
 		   t->rcu_read_unlock_special.s) {
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c4e607873d6f..ef374ad506f0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3056,7 +3056,6 @@ static void __sched __schedule(void)
 
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
-	rcu_note_context_switch();
 	prev = rq->curr;
 
 	schedule_debug(prev);
@@ -3072,6 +3071,7 @@ static void __sched __schedule(void)
 	smp_mb__before_spinlock();
 	raw_spin_lock_irq(&rq->lock);
 	lockdep_pin_lock(&rq->lock);
+	rcu_note_context_switch();
 
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
 


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-07 13:49       ` Peter Zijlstra
@ 2015-10-07 16:14         ` Paul E. McKenney
  2015-10-08  9:00           ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 16:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 03:49:29PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 03:43:11PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> > > @@ -167,42 +307,18 @@ static void rcu_preempt_note_context_switch(void)
> > 
> > > -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > 
> > This again reminds me that we should move rcu_note_context_switch()
> > under the IRQ disable section the scheduler already has or remove the
> > IRQ disable from rcu_note_context_switch().
> 
> Ah, this is the unlikely path where we actually need to do work. The
> normal fast paths no longer require IRQs disabled, you already fixed
> that last time, good!

But why not make the less-likely paths a bit cheaper while we are at it?  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI
  2015-10-07 14:18     ` Peter Zijlstra
@ 2015-10-07 16:24       ` Paul E. McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 04:18:07PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:28AM -0700, Paul E. McKenney wrote:
> 
> > @@ -161,6 +161,8 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
> >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
> >  static void invoke_rcu_core(void);
> >  static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
> > +static void __maybe_unused rcu_report_exp_rdp(struct rcu_state *rsp,
> > +					      struct rcu_data *rdp, bool wake);
> >  
> 
> Do we really need that on a declaration? I though only unused
> definitions required it.

You are right.  I should have removed it in commit 4684e21b19e9 ("rcu:
Switch synchronize_sched_expedited() to IPI"), fixed.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-07 14:26     ` Peter Zijlstra
@ 2015-10-07 16:26       ` Paul E. McKenney
  2015-10-08  9:01         ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 16:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 04:26:27PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:37AM -0700, Paul E. McKenney wrote:
> >  void rcu_sched_qs(void)
> >  {
> > +	unsigned long flags;
> > +
> >  	if (__this_cpu_read(rcu_sched_data.cpu_no_qs.s)) {
> >  		trace_rcu_grace_period(TPS("rcu_sched"),
> >  				       __this_cpu_read(rcu_sched_data.gpnum),
> >  				       TPS("cpuqs"));
> >  		__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
> > +		if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
> > +			return;
> > +		local_irq_save(flags);
> >  		if (__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)) {
> >  			__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, false);
> >  			rcu_report_exp_rdp(&rcu_sched_state,
> >  					   this_cpu_ptr(&rcu_sched_data),
> >  					   true);
> >  		}
> > +		local_irq_restore(flags);
> >  	}
> >  }
> 
> *sigh*.. still rare I suppose, but should we look at doing something
> like this?

Indeed, that approach looks better than moving rcu_note_context_switch(),
which probably results in deadlocks.  I will update my patch accordingly.

							Thanx, Paul

> ---
>  kernel/sched/core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fe819298c220..3d830c3491c4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3050,7 +3050,6 @@ static void __sched __schedule(void)
> 
>  	cpu = smp_processor_id();
>  	rq = cpu_rq(cpu);
> -	rcu_note_context_switch();
>  	prev = rq->curr;
> 
>  	schedule_debug(prev);
> @@ -3058,13 +3057,16 @@ static void __sched __schedule(void)
>  	if (sched_feat(HRTICK))
>  		hrtick_clear(rq);
> 
> +	local_irq_disable();
> +	rcu_note_context_switch();
> +
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
>  	 * done by the caller to avoid the race with signal_wake_up().
>  	 */
>  	smp_mb__before_spinlock();
> -	raw_spin_lock_irq(&rq->lock);
> +	raw_spin_lock(&rq->lock);
>  	lockdep_pin_lock(&rq->lock);
> 
>  	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 14:40             ` Peter Zijlstra
@ 2015-10-07 16:48               ` Paul E. McKenney
  2015-10-08  9:49                 ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 04:40:24PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 07:33:25AM -0700, Paul E. McKenney wrote:
> > > I'm sure you know what that means, but I've no clue ;-) That is, I
> > > wouldn't know where to start looking in the RCU implementation to verify
> > > the barrier is either needed or sufficient. Unless you mean _everywhere_
> > > :-)
> > 
> > Pretty much everywhere.
> > 
> > Let's take the usual RCU removal pattern as an example:
> > 
> > 	void f1(struct foo *p)
> > 	{
> > 		list_del_rcu(p);
> > 		synchronize_rcu_expedited();
> > 		kfree(p);
> > 	}
> > 
> > 	void f2(void)
> > 	{
> > 		struct foo *p;
> > 
> > 		list_for_each_entry_rcu(p, &my_head, next)
> > 			do_something_with(p);
> > 	}
> > 
> > So the synchronize_rcu_expedited() acts as an extremely heavyweight
> > memory barrier that pairs with the rcu_dereference() inside of
> > list_for_each_entry_rcu().  Easy enough, right?
> > 
> > But what exactly within synchronize_rcu_expedited() provides the
> > ordering?  The answer is a web of lock-based critical sections and
> > explicit memory barriers, with the one you called out as needing
> > a comment being one of them.
> 
> Right, but seeing there's possible implementations of sync_rcu(_exp)*()
> that do not have the whole rcu_node tree like thing, there's more to
> this particular barrier than the semantics of sync_rcu().
> 
> Some implementation choice requires this barrier upgrade -- and in
> another email I suggest its the whole tree thing, we need to firmly
> establish the state of one level before propagating the state up etc.
> 
> Now I'm not entirely sure this is fully correct, but its the best I
> could come up.

It is pretty close.  Ignoring dyntick idle for the moment, things
go (very) roughly like this:

o	The RCU grace-period kthread notices that a new grace period
	is needed.  It initializes the tree, which includes acquiring
	every rcu_node structure's ->lock.

o	CPU A notices that there is a new grace period.  It acquires
	the ->lock of its leaf rcu_node structure, which forces full
	ordering against the grace-period kthread.

o	Some time later, that CPU A realizes that it has passed
	through a quiescent state, and again acquires its leaf rcu_node
	structure's ->lock, again enforcing full ordering, but this
	time against all CPUs corresponding to this same leaf rcu_node
	structure that previously noticed quiescent states for this
	same grace period.  Also against all prior readers on this
	same CPU.

o	Some time later, CPU B (corresponding to that same leaf
	rcu_node structure) is the last of that leaf's group of CPUs
	to notice a quiescent state.  It has also acquired that leaf's
	->lock, again forcing ordering against its prior RCU read-side
	critical sections, but also against all the prior RCU
	read-side critical sections of all other CPUs corresponding
	to this same leaf.

o	CPU B therefore moves up the tree, acquiring the parent
	rcu_node structures' ->lock.  In so doing, it forces full
	ordering against all prior RCU read-side critical sections
	of all CPUs corresponding to all leaf rcu_node structures
	subordinate to the current (non-leaf) rcu_node structure.

o	And so on, up the tree.

o	When CPU C reaches the root of the tree, and realizes that
	it is the last CPU to report a quiescent state for the
	current grace period, its acquisition of the root rcu_node
	structure's ->lock has forced full ordering against all
	RCU read-side critical sections that started before this
	grace period -- on all CPUs.

	CPU C therefore awakens the grace-period kthread.

o	When the grace-period kthread wakes up, it does cleanup,
	which (you guessed it!) requires acquiring the ->lock of
	each rcu_node structure.  This not only forces full ordering
	against each pre-existing RCU read-side critical section,
	it also sets up things so that...

o	When CPU D notices that the grace period ended, it does so
	while holding its leaf rcu_node structure's ->lock.  This
	forces full ordering against all relevant RCU read-side
	critical sections.  This ordering prevails when CPU D later
	starts invoking RCU callbacks.

o	Just for fun, suppose that one of those callbacks does an
	"smp_store_release(&leak_gp, 1)".  Suppose further that some
	CPU E that is not yet aware that the grace period is finished
	does an "r1 = smp_load_acquire(&lead_gp)" and gets 1.  Even
	if CPU E was the very first CPU to report a quiescent state
	for the grace period, and even if CPU E has not executed any
	sort of ordering operations since, CPU E's subsequent code is
	-still- guaranteed to be fully ordered after each and every
	RCU read-side critical section that started before the grace
	period.

Hey, you asked!!!  ;-)

Again, this is a cartoon-like view of the ordering that leaves out a
lot of details, but it should get across the gist of the ordering.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-07 13:24     ` Peter Zijlstra
@ 2015-10-07 18:11       ` Paul E. McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-07 18:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 03:24:54PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> > @@ -3494,19 +3483,21 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
> >   * recursively up the tree.  (Calm down, calm down, we do the recursion
> >   * iteratively!)
> >   *
> > - * Caller must hold the root rcu_node's exp_funnel_mutex.
> > + * Caller must hold the root rcu_node's exp_funnel_mutex and the
> > + * specified rcu_node structure's ->lock.
> >   */
> > -static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> > -					      struct rcu_node *rnp, bool wake)
> > +static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
> > +				 bool wake, unsigned long flags)
> > +	__releases(rnp->lock)
> >  {
> > -	unsigned long flags;
> >  	unsigned long mask;
> >  
> > -	raw_spin_lock_irqsave(&rnp->lock, flags);
> > -	smp_mb__after_unlock_lock();
> 
> 	lockdep_assert_held(&rnp->lock);
> 
> > +/*
> > + * Report expedited quiescent state for specified node.  This is a
> > + * lock-acquisition wrapper function for __rcu_report_exp_rnp().
> > + *
> > + * Caller must hold the root rcu_node's exp_funnel_mutex.
> > + */
> > +static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> > +					      struct rcu_node *rnp, bool wake)
> > +{
> > +	unsigned long flags;
> 
> 	lockdep_assert_held(&rcu_get_root(rsp)->exp_funnel_mutex);
> 
> > +
> > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> > +	smp_mb__after_unlock_lock();
> > +	__rcu_report_exp_rnp(rsp, rnp, wake, flags);
> > +}
> 
> 
> Etc.. these are much harder to ignore than comments.

Good point!  I probably should do the same for interrupt disabling.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period
  2015-10-07 16:14         ` Paul E. McKenney
@ 2015-10-08  9:00           ` Peter Zijlstra
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-08  9:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 09:14:45AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 07, 2015 at 03:49:29PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 07, 2015 at 03:43:11PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 06, 2015 at 09:29:23AM -0700, Paul E. McKenney wrote:
> > > > @@ -167,42 +307,18 @@ static void rcu_preempt_note_context_switch(void)
> > > 
> > > > -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > > 
> > > This again reminds me that we should move rcu_note_context_switch()
> > > under the IRQ disable section the scheduler already has or remove the
> > > IRQ disable from rcu_note_context_switch().
> > 
> > Ah, this is the unlikely path where we actually need to do work. The
> > normal fast paths no longer require IRQs disabled, you already fixed
> > that last time, good!
> 
> But why not make the less-likely paths a bit cheaper while we are at it?  ;-)

True, less critical, but if we're almost there we might as well fix that
up too.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-07 16:26       ` Paul E. McKenney
@ 2015-10-08  9:01         ` Peter Zijlstra
  2015-10-08 15:06           ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-08  9:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 09:26:53AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 07, 2015 at 04:26:27PM +0200, Peter Zijlstra wrote:

> Indeed, that approach looks better than moving rcu_note_context_switch(),
> which probably results in deadlocks.  I will update my patch accordingly.

Yeah, calling rcu_note_context_switch() under the rq->lock is asking for
trouble we don't need.

Thanks!

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 16:48               ` Paul E. McKenney
@ 2015-10-08  9:49                 ` Peter Zijlstra
  2015-10-08 15:33                   ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-08  9:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 09:48:58AM -0700, Paul E. McKenney wrote:

> > Some implementation choice requires this barrier upgrade -- and in
> > another email I suggest its the whole tree thing, we need to firmly
> > establish the state of one level before propagating the state up etc.
> > 
> > Now I'm not entirely sure this is fully correct, but its the best I
> > could come up.
> 
> It is pretty close.  Ignoring dyntick idle for the moment, things
> go (very) roughly like this:
> 
> o	The RCU grace-period kthread notices that a new grace period
> 	is needed.  It initializes the tree, which includes acquiring
> 	every rcu_node structure's ->lock.
> 
> o	CPU A notices that there is a new grace period.  It acquires
> 	the ->lock of its leaf rcu_node structure, which forces full
> 	ordering against the grace-period kthread.

If the kthread took _all_ rcu_node locks, then this does not require the
barrier upgrade because they will share a lock variable.

> o	Some time later, that CPU A realizes that it has passed
> 	through a quiescent state, and again acquires its leaf rcu_node
> 	structure's ->lock, again enforcing full ordering, but this
> 	time against all CPUs corresponding to this same leaf rcu_node
> 	structure that previously noticed quiescent states for this
> 	same grace period.  Also against all prior readers on this
> 	same CPU.

This again reads like the same lock variable is involved, and therefore
the barrier upgrade is not required for this.

> o	Some time later, CPU B (corresponding to that same leaf
> 	rcu_node structure) is the last of that leaf's group of CPUs
> 	to notice a quiescent state.  It has also acquired that leaf's
> 	->lock, again forcing ordering against its prior RCU read-side
> 	critical sections, but also against all the prior RCU
> 	read-side critical sections of all other CPUs corresponding
> 	to this same leaf.

same lock var again..

> o	CPU B therefore moves up the tree, acquiring the parent
> 	rcu_node structures' ->lock.  In so doing, it forces full
> 	ordering against all prior RCU read-side critical sections
> 	of all CPUs corresponding to all leaf rcu_node structures
> 	subordinate to the current (non-leaf) rcu_node structure.

And here we iterate the tree and get another lock var involved, here the
barrier upgrade will actually do something.

> o	And so on, up the tree.

idem..

> o	When CPU C reaches the root of the tree, and realizes that
> 	it is the last CPU to report a quiescent state for the
> 	current grace period, its acquisition of the root rcu_node
> 	structure's ->lock has forced full ordering against all
> 	RCU read-side critical sections that started before this
> 	grace period -- on all CPUs.

Right, which makes the full barrier transitivity thing important

> 	CPU C therefore awakens the grace-period kthread.

> o	When the grace-period kthread wakes up, it does cleanup,
> 	which (you guessed it!) requires acquiring the ->lock of
> 	each rcu_node structure.  This not only forces full ordering
> 	against each pre-existing RCU read-side critical section,
> 	it also sets up things so that...

Again, if it takes _all_ rcu_nodes, it also shares a lock variable and
hence the upgrade is not required.

> o	When CPU D notices that the grace period ended, it does so
> 	while holding its leaf rcu_node structure's ->lock.  This
> 	forces full ordering against all relevant RCU read-side
> 	critical sections.  This ordering prevails when CPU D later
> 	starts invoking RCU callbacks.

Does also not seem to require the upgrade..

> Hey, you asked!!!  ;-)

No, I asked what all the barrier upgrade was for, most of the above does
not seem to rely on that at all.

The only place this upgrade matters is the UNLOCK x + LOCK y scenario,
as also per the comment above smp_mb__after_unlock_lock().

Any other ordering is not on this but on the other primitives and
irrelevant to the barrier upgrade.

> Again, this is a cartoon-like view of the ordering that leaves out a
> lot of details, but it should get across the gist of the ordering.

So the ordering I'm interested in, is the bit that is provided by the
barrier upgrade, and that seems very limited and directly pertains to
the tree iteration, ensuring its fully separated and transitive.

So I'll stick to explanation that the barrier upgrade is purely for the
tree iteration, to separate and make transitive the tree level state.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-07 15:18                 ` Paul E. McKenney
@ 2015-10-08 10:24                   ` Peter Zijlstra
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-08 10:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, josh, Thomas Gleixner, rostedt,
	dhowells, edumazet, dvhart, fweisbec, oleg, bobby prani

On Wed, Oct 07, 2015 at 08:18:29AM -0700, Paul E. McKenney wrote:

> Actually, this would be quite good.  "Premature abstraction is the
> root of all evil" and all that, but this abstraction is anything but
> premature.  My thought would be to have it against commit cd58087c9cee
> ("Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and
> 'torture.2015.10.06a' into HEAD") in -rcu given the merge conflicts
> that would otherwise arise.

OK here goes, compile tested this time ;-)

---
Subject: rcu: Clarify the smp_mb__after_unlock_lock usage

Because undocumented barriers are bad remove all the uncommented
smp_mb__after_unlock_lock() usage and replace it with a documented set
of wrappers.

The problem is that PPC has RCpc UNLOCK+LOCK where all other archs have
RCsc, which means that on PPC UNLOCK x + LOCK y does not form a full
barrier (which also implies transitivity) and needs help.

AFAICT the only case where this really matters is the rcu_node tree
traversal, where we want to ensure the state a node is 'complete' before
propagating its state up the tree, such that once we reach the top, all
CPUs agree on the observed state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/rcu/tree.c        | 128 ++++++++++++++++++++++++++++-------------------
 kernel/rcu/tree.h        |  11 ----
 kernel/rcu/tree_plugin.h |  18 +++----
 3 files changed, 82 insertions(+), 75 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b7cd210f3b1e..6ee3a6ffcc27 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1482,6 +1482,56 @@ static void trace_rcu_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 }
 
 /*
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
+
+/*
+ * Wrappers for the rcu_node::lock acquire.
+ *
+ * Because the rcu_nodes form a tree, the tree traversal locking will observe
+ * different lock values, this in turn means that an UNLOCK of one level
+ * followed by a LOCK of another level does not imply a full memory barrier;
+ * and most importantly transitivity is lost.
+ *
+ * In order to restore full ordering between tree levels, augment the regular
+ * lock acquire functions with smp_mb__after_unlock_lock().
+ */
+static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
+{
+	raw_spin_lock(&rnp->lock);
+	smp_mb__after_unlock_lock();
+}
+
+static inline void raw_spin_lock_irq_rcu_node(struct rcu_node *rnp)
+{
+	raw_spin_lock_irq(&rnp->lock);
+	smp_mb__after_unlock_lock();
+}
+
+#define raw_spin_lock_irqsave_rcu_node(rnp, flags)	\
+do {							\
+	typecheck(unsigned long, flags);		\
+	flags = _raw_spin_lock_irqsave(&(rnp)->lock);	\
+	smp_mb__after_unlock_lock();			\
+} while (0)
+
+static inline bool raw_spin_trylock_rcu_node(struct rcu_node *rnp)
+{
+	bool locked = raw_spin_trylock(&rnp->lock);
+	if (locked)
+		smp_mb__after_unlock_lock();
+	return locked;
+}
+
+/*
  * Start some future grace period, as needed to handle newly arrived
  * callbacks.  The required future grace periods are recorded in each
  * rcu_node structure's ->need_future_gp field.  Returns true if there
@@ -1534,10 +1584,8 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * hold it, acquire the root rcu_node structure's lock in order to
 	 * start one (if needed).
 	 */
-	if (rnp != rnp_root) {
-		raw_spin_lock(&rnp_root->lock);
-		smp_mb__after_unlock_lock();
-	}
+	if (rnp != rnp_root)
+		raw_spin_lock_rcu_node(rnp_root);
 
 	/*
 	 * Get a new grace-period number.  If there really is no grace
@@ -1786,11 +1834,10 @@ static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
 	if ((rdp->gpnum == READ_ONCE(rnp->gpnum) &&
 	     rdp->completed == READ_ONCE(rnp->completed) &&
 	     !unlikely(READ_ONCE(rdp->gpwrap))) || /* w/out lock. */
-	    !raw_spin_trylock(&rnp->lock)) { /* irqs already off, so later. */
+	    !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
 		local_irq_restore(flags);
 		return;
 	}
-	smp_mb__after_unlock_lock();
 	needwake = __note_gp_changes(rsp, rnp, rdp);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	if (needwake)
@@ -1814,8 +1861,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	WRITE_ONCE(rsp->gp_activity, jiffies);
-	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irq_rcu_node(rnp);
 	if (!READ_ONCE(rsp->gp_flags)) {
 		/* Spurious wakeup, tell caller to go back to sleep.  */
 		raw_spin_unlock_irq(&rnp->lock);
@@ -1847,8 +1893,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	 */
 	rcu_for_each_leaf_node(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_preinit_delay);
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
 		    !rnp->wait_blkd_tasks) {
 			/* Nothing to do on this leaf rcu_node structure. */
@@ -1904,8 +1949,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	 */
 	rcu_for_each_node_breadth_first(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_init_delay);
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		rdp = this_cpu_ptr(rsp->rda);
 		rcu_preempt_check_blocked_tasks(rnp);
 		rnp->qsmask = rnp->qsmaskinit;
@@ -1973,8 +2017,7 @@ static void rcu_gp_fqs(struct rcu_state *rsp, bool first_time)
 	}
 	/* Clear flag to prevent immediate re-entry. */
 	if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		WRITE_ONCE(rsp->gp_flags,
 			   READ_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS);
 		raw_spin_unlock_irq(&rnp->lock);
@@ -1993,8 +2036,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	WRITE_ONCE(rsp->gp_activity, jiffies);
-	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irq_rcu_node(rnp);
 	gp_duration = jiffies - rsp->gp_start;
 	if (gp_duration > rsp->gp_max)
 		rsp->gp_max = gp_duration;
@@ -2019,8 +2061,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	 * grace period is recorded in any of the rcu_node structures.
 	 */
 	rcu_for_each_node_breadth_first(rsp, rnp) {
-		raw_spin_lock_irq(&rnp->lock);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irq_rcu_node(rnp);
 		WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
 		WARN_ON_ONCE(rnp->qsmask);
 		WRITE_ONCE(rnp->completed, rsp->gpnum);
@@ -2035,8 +2076,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 		rcu_gp_slow(rsp, gp_cleanup_delay);
 	}
 	rnp = rcu_get_root(rsp);
-	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock(); /* Order GP before ->completed update. */
+	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
 	rcu_nocb_gp_set(rnp, nocb);
 
 	/* Declare grace period done. */
@@ -2284,8 +2324,7 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		rnp_c = rnp;
 		rnp = rnp->parent;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		oldmask = rnp_c->qsmask;
 	}
 
@@ -2332,8 +2371,7 @@ static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
 	gps = rnp->gpnum;
 	mask = rnp->grpmask;
 	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
-	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_rcu_node(rnp_p);	/* irqs already disabled. */
 	rcu_report_qs_rnp(mask, rsp, rnp_p, gps, flags);
 }
 
@@ -2355,8 +2393,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 	struct rcu_node *rnp;
 
 	rnp = rdp->mynode;
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if ((rdp->cpu_no_qs.b.norm &&
 	     rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) ||
 	    rdp->gpnum != rnp->gpnum || rnp->completed == rnp->gpnum ||
@@ -2582,8 +2619,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 		rnp = rnp->parent;
 		if (!rnp)
 			break;
-		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
-		smp_mb__after_unlock_lock(); /* GP memory ordering. */
+		raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
 		rnp->qsmaskinit &= ~mask;
 		rnp->qsmask &= ~mask;
 		if (rnp->qsmaskinit) {
@@ -2611,8 +2647,7 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();	/* Enforce GP memory-order guarantee. */
+	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rnp->qsmaskinitnext &= ~mask;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
@@ -2809,8 +2844,7 @@ static void force_qs_rnp(struct rcu_state *rsp,
 	rcu_for_each_leaf_node(rsp, rnp) {
 		cond_resched_rcu_qs();
 		mask = 0;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask == 0) {
 			if (rcu_state_p == &rcu_sched_state ||
 			    rsp != rcu_state_p ||
@@ -2881,8 +2915,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	/* rnp_old == rcu_get_root(rsp), rnp == NULL. */
 
 	/* Reached the root of the rcu_node tree, acquire lock. */
-	raw_spin_lock_irqsave(&rnp_old->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp_old, flags);
 	raw_spin_unlock(&rnp_old->fqslock);
 	if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
 		rsp->n_force_qs_lh++;
@@ -3005,8 +3038,7 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
 		if (!rcu_gp_in_progress(rsp)) {
 			struct rcu_node *rnp_root = rcu_get_root(rsp);
 
-			raw_spin_lock(&rnp_root->lock);
-			smp_mb__after_unlock_lock();
+			raw_spin_lock_rcu_node(rnp_root);
 			needwake = rcu_start_gp(rsp);
 			raw_spin_unlock(&rnp_root->lock);
 			if (needwake)
@@ -3426,8 +3458,7 @@ static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp)
 	 * CPUs for the current rcu_node structure up the rcu_node tree.
 	 */
 	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->expmaskinit == rnp->expmaskinitnext) {
 			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			continue;  /* No new CPUs, nothing to do. */
@@ -3447,8 +3478,7 @@ static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp)
 		rnp_up = rnp->parent;
 		done = false;
 		while (rnp_up) {
-			raw_spin_lock_irqsave(&rnp_up->lock, flags);
-			smp_mb__after_unlock_lock();
+			raw_spin_lock_irqsave_rcu_node(rnp_up, flags);
 			if (rnp_up->expmaskinit)
 				done = true;
 			rnp_up->expmaskinit |= mask;
@@ -3472,8 +3502,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
 
 	sync_exp_reset_tree_hotplug(rsp);
 	rcu_for_each_node_breadth_first(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		WARN_ON_ONCE(rnp->expmask);
 		rnp->expmask = rnp->expmaskinit;
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
@@ -3531,8 +3560,7 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 		mask = rnp->grpmask;
 		raw_spin_unlock(&rnp->lock); /* irqs remain disabled */
 		rnp = rnp->parent;
-		raw_spin_lock(&rnp->lock); /* irqs already disabled */
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_rcu_node(rnp); /* irqs already disabled */
 		WARN_ON_ONCE(!(rnp->expmask & mask));
 		rnp->expmask &= ~mask;
 	}
@@ -3549,8 +3577,7 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
 {
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	__rcu_report_exp_rnp(rsp, rnp, wake, flags);
 }
 
@@ -3564,8 +3591,7 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 {
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if (!(rnp->expmask & mask)) {
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
@@ -3708,8 +3734,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 
 	sync_exp_reset_tree(rsp);
 	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 
 		/* Each pass checks a CPU for identity, offline, and idle. */
 		mask_ofl_test = 0;
@@ -4198,8 +4223,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	 */
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
-	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_rcu_node(rnp);		/* irqs already disabled. */
 	rnp->qsmaskinitnext |= mask;
 	rnp->expmaskinitnext |= mask;
 	if (!rdp->beenonline)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 9fb4e238d4dc..1d2eb0859f70 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -653,14 +653,3 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 }
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
-/*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 630c19772630..fa0e3b96a9ed 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -301,8 +301,7 @@ static void rcu_preempt_note_context_switch(void)
 		/* Possibly blocking in an RCU read-side critical section. */
 		rdp = this_cpu_ptr(rcu_state_p->rda);
 		rnp = rdp->mynode;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		t->rcu_read_unlock_special.b.blocked = true;
 		t->rcu_blocked_node = rnp;
 
@@ -457,8 +456,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 		 */
 		for (;;) {
 			rnp = t->rcu_blocked_node;
-			raw_spin_lock(&rnp->lock);  /* irqs already disabled. */
-			smp_mb__after_unlock_lock();
+			raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
 			if (rnp == t->rcu_blocked_node)
 				break;
 			WARN_ON_ONCE(1);
@@ -989,8 +987,7 @@ static int rcu_boost(struct rcu_node *rnp)
 	    READ_ONCE(rnp->boost_tasks) == NULL)
 		return 0;  /* Nothing left to boost. */
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 
 	/*
 	 * Recheck under the lock: all tasks in need of boosting
@@ -1176,8 +1173,7 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 			   "rcub/%d", rnp_index);
 	if (IS_ERR(t))
 		return PTR_ERR(t);
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rnp->boost_kthread_task = t;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	sp.sched_priority = kthread_prio;
@@ -1567,8 +1563,7 @@ static void rcu_prepare_for_idle(void)
 		if (!*rdp->nxttail[RCU_DONE_TAIL])
 			continue;
 		rnp = rdp->mynode;
-		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
-		smp_mb__after_unlock_lock();
+		raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
 		needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
 		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
 		if (needwake)
@@ -2068,8 +2063,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 	bool needwake;
 	struct rcu_node *rnp = rdp->mynode;
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	needwake = rcu_start_future_gp(rnp, rdp, &c);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	if (needwake)

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-08  9:01         ` Peter Zijlstra
@ 2015-10-08 15:06           ` Paul E. McKenney
  2015-10-08 15:12             ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-08 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 11:01:28AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 09:26:53AM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 07, 2015 at 04:26:27PM +0200, Peter Zijlstra wrote:
> 
> > Indeed, that approach looks better than moving rcu_note_context_switch(),
> > which probably results in deadlocks.  I will update my patch accordingly.
> 
> Yeah, calling rcu_note_context_switch() under the rq->lock is asking for
> trouble we don't need.

Please see below for the fixed version.  Thoughts?

(Queued for 4.5, want some serious testing on this.)

							Thanx, Paul

------------------------------------------------------------------------

commit 3ab3edf72a59a800e6e59ad3128f5b3b251b8962
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Oct 7 09:10:48 2015 -0700

    rcu: Stop disabling interrupts in scheduler fastpaths
    
    We need the scheduler's fastpaths to be, well, fast, and unnecessarily
    disabling and re-enabling interrupts is not necessarily consistent with
    this goal.  Especially given that there are regions of the scheduler that
    already have interrupts disabled.
    
    This commit therefore moves the call to rcu_note_context_switch()
    to one of the interrupts-disabled regions of the scheduler, and
    removes the now-redundant disabling and re-enabling of interrupts from
    rcu_note_context_switch() and the functions it calls.
    
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    [ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
      by Peter Zijlstra. ]

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 60d15a080d7c..9d3eda39bcd2 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -37,7 +37,7 @@ void rcu_cpu_stall_reset(void);
 /*
  * Note a virtualization-based context switch.  This is simply a
  * wrapper around rcu_note_context_switch(), which allows TINY_RCU
- * to save a few bytes.
+ * to save a few bytes. The caller must have disabled interrupts.
  */
 static inline void rcu_virt_note_context_switch(int cpu)
 {
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c9780fc47391..fbc9b5574e48 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -295,17 +295,16 @@ EXPORT_PER_CPU_SYMBOL_GPL(rcu_qs_ctr);
  * We inform the RCU core by emulating a zero-duration dyntick-idle
  * period, which we in turn do by incrementing the ->dynticks counter
  * by two.
+ *
+ * The caller must have disabled interrupts.
  */
 static void rcu_momentary_dyntick_idle(void)
 {
-	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_dynticks *rdtp;
 	int resched_mask;
 	struct rcu_state *rsp;
 
-	local_irq_save(flags);
-
 	/*
 	 * Yes, we can lose flag-setting operations.  This is OK, because
 	 * the flag will be set again after some delay.
@@ -335,13 +334,12 @@ static void rcu_momentary_dyntick_idle(void)
 		smp_mb__after_atomic(); /* Later stuff after QS. */
 		break;
 	}
-	local_irq_restore(flags);
 }
 
 /*
  * Note a context switch.  This is a quiescent state for RCU-sched,
  * and requires special handling for preemptible RCU.
- * The caller must have disabled preemption.
+ * The caller must have disabled interrupts.
  */
 void rcu_note_context_switch(void)
 {
@@ -371,9 +369,14 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  */
 void rcu_all_qs(void)
 {
+	unsigned long flags;
+
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
-	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
+	if (unlikely(raw_cpu_read(rcu_sched_qs_mask))) {
+		local_irq_save(flags);
 		rcu_momentary_dyntick_idle();
+		local_irq_restore(flags);
+	}
 	this_cpu_inc(rcu_qs_ctr);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 97dfa7d57f79..7087fb047e2d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -146,8 +146,8 @@ static void __init rcu_bootup_announce(void)
  * the corresponding expedited grace period will also be the end of the
  * normal grace period.
  */
-static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
-				   unsigned long flags) __releases(rnp->lock)
+static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
+	__releases(rnp->lock) /* But leaves rrupts disabled. */
 {
 	int blkd_state = (rnp->gp_tasks ? RCU_GP_TASKS : 0) +
 			 (rnp->exp_tasks ? RCU_EXP_TASKS : 0) +
@@ -235,7 +235,7 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
 		rnp->gp_tasks = &t->rcu_node_entry;
 	if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
 		rnp->exp_tasks = &t->rcu_node_entry;
-	raw_spin_unlock(&rnp->lock);
+	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
 
 	/*
 	 * Report the quiescent state for the expedited GP.  This expedited
@@ -250,7 +250,6 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp,
 	} else {
 		WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
 	}
-	local_irq_restore(flags);
 }
 
 /*
@@ -285,12 +284,11 @@ static void rcu_preempt_qs(void)
  * predating the current grace period drain, in other words, until
  * rnp->gp_tasks becomes NULL.
  *
- * Caller must disable preemption.
+ * Caller must disable interrupts.
  */
 static void rcu_preempt_note_context_switch(void)
 {
 	struct task_struct *t = current;
-	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
@@ -300,7 +298,7 @@ static void rcu_preempt_note_context_switch(void)
 		/* Possibly blocking in an RCU read-side critical section. */
 		rdp = this_cpu_ptr(rcu_state_p->rda);
 		rnp = rdp->mynode;
-		raw_spin_lock_irqsave(&rnp->lock, flags);
+		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
 		smp_mb__after_unlock_lock();
 		t->rcu_read_unlock_special.b.blocked = true;
 		t->rcu_blocked_node = rnp;
@@ -317,7 +315,7 @@ static void rcu_preempt_note_context_switch(void)
 				       (rnp->qsmask & rdp->grpmask)
 				       ? rnp->gpnum
 				       : rnp->gpnum + 1);
-		rcu_preempt_ctxt_queue(rnp, rdp, flags);
+		rcu_preempt_ctxt_queue(rnp, rdp);
 	} else if (t->rcu_read_lock_nesting < 0 &&
 		   t->rcu_read_unlock_special.s) {
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c4e607873d6f..ac246b0b987a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3056,7 +3056,6 @@ static void __sched __schedule(void)
 
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
-	rcu_note_context_switch();
 	prev = rq->curr;
 
 	schedule_debug(prev);
@@ -3064,13 +3063,16 @@ static void __sched __schedule(void)
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
 
+	local_irq_disable();
+	rcu_note_context_switch();
+
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
 	 */
 	smp_mb__before_spinlock();
-	raw_spin_lock_irq(&rq->lock);
+	raw_spin_lock(&rq->lock);
 	lockdep_pin_lock(&rq->lock);
 
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-08 15:06           ` Paul E. McKenney
@ 2015-10-08 15:12             ` Peter Zijlstra
  2015-10-08 15:19               ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-08 15:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 08:06:39AM -0700, Paul E. McKenney wrote:
> Please see below for the fixed version.  Thoughts?

> +	__releases(rnp->lock) /* But leaves rrupts disabled. */
> +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> +		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */

What them 'rrupts' about? ;-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-08 15:12             ` Peter Zijlstra
@ 2015-10-08 15:19               ` Paul E. McKenney
  2015-10-08 18:01                 ` Josh Triplett
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-08 15:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 05:12:42PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2015 at 08:06:39AM -0700, Paul E. McKenney wrote:
> > Please see below for the fixed version.  Thoughts?
> 
> > +	__releases(rnp->lock) /* But leaves rrupts disabled. */
> > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > +		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
> 
> What them 'rrupts' about? ;-)

Interrupts when it won't fit.  I suppose I could use IRQs instead.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-08  9:49                 ` Peter Zijlstra
@ 2015-10-08 15:33                   ` Paul E. McKenney
  2015-10-08 17:12                     ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-08 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 11:49:33AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 09:48:58AM -0700, Paul E. McKenney wrote:
> 
> > > Some implementation choice requires this barrier upgrade -- and in
> > > another email I suggest its the whole tree thing, we need to firmly
> > > establish the state of one level before propagating the state up etc.
> > > 
> > > Now I'm not entirely sure this is fully correct, but its the best I
> > > could come up.
> > 
> > It is pretty close.  Ignoring dyntick idle for the moment, things
> > go (very) roughly like this:
> > 
> > o	The RCU grace-period kthread notices that a new grace period
> > 	is needed.  It initializes the tree, which includes acquiring
> > 	every rcu_node structure's ->lock.
> > 
> > o	CPU A notices that there is a new grace period.  It acquires
> > 	the ->lock of its leaf rcu_node structure, which forces full
> > 	ordering against the grace-period kthread.
> 
> If the kthread took _all_ rcu_node locks, then this does not require the
> barrier upgrade because they will share a lock variable.
> 
> > o	Some time later, that CPU A realizes that it has passed
> > 	through a quiescent state, and again acquires its leaf rcu_node
> > 	structure's ->lock, again enforcing full ordering, but this
> > 	time against all CPUs corresponding to this same leaf rcu_node
> > 	structure that previously noticed quiescent states for this
> > 	same grace period.  Also against all prior readers on this
> > 	same CPU.
> 
> This again reads like the same lock variable is involved, and therefore
> the barrier upgrade is not required for this.
> 
> > o	Some time later, CPU B (corresponding to that same leaf
> > 	rcu_node structure) is the last of that leaf's group of CPUs
> > 	to notice a quiescent state.  It has also acquired that leaf's
> > 	->lock, again forcing ordering against its prior RCU read-side
> > 	critical sections, but also against all the prior RCU
> > 	read-side critical sections of all other CPUs corresponding
> > 	to this same leaf.
> 
> same lock var again..
> 
> > o	CPU B therefore moves up the tree, acquiring the parent
> > 	rcu_node structures' ->lock.  In so doing, it forces full
> > 	ordering against all prior RCU read-side critical sections
> > 	of all CPUs corresponding to all leaf rcu_node structures
> > 	subordinate to the current (non-leaf) rcu_node structure.
> 
> And here we iterate the tree and get another lock var involved, here the
> barrier upgrade will actually do something.

Yep.  And I am way too lazy to sort out exactly which acquisitions really
truly need smp_mb__after_unlock_lock() and which don't.  Besides, if I
tried to sort it out, I would occasionally get it wrong, and this would be
a real pain to debug.  Therefore, I simply do smp_mb__after_unlock_lock()
on all acquisitions of the rcu_node structures' ->lock fields.  I can
actually validate that!  ;-)

> > o	And so on, up the tree.
> 
> idem..
> 
> > o	When CPU C reaches the root of the tree, and realizes that
> > 	it is the last CPU to report a quiescent state for the
> > 	current grace period, its acquisition of the root rcu_node
> > 	structure's ->lock has forced full ordering against all
> > 	RCU read-side critical sections that started before this
> > 	grace period -- on all CPUs.
> 
> Right, which makes the full barrier transitivity thing important
> 
> > 	CPU C therefore awakens the grace-period kthread.
> 
> > o	When the grace-period kthread wakes up, it does cleanup,
> > 	which (you guessed it!) requires acquiring the ->lock of
> > 	each rcu_node structure.  This not only forces full ordering
> > 	against each pre-existing RCU read-side critical section,
> > 	it also sets up things so that...
> 
> Again, if it takes _all_ rcu_nodes, it also shares a lock variable and
> hence the upgrade is not required.
> 
> > o	When CPU D notices that the grace period ended, it does so
> > 	while holding its leaf rcu_node structure's ->lock.  This
> > 	forces full ordering against all relevant RCU read-side
> > 	critical sections.  This ordering prevails when CPU D later
> > 	starts invoking RCU callbacks.
> 
> Does also not seem to require the upgrade..
> 
> > Hey, you asked!!!  ;-)
> 
> No, I asked what all the barrier upgrade was for, most of the above does
> not seem to rely on that at all.
> 
> The only place this upgrade matters is the UNLOCK x + LOCK y scenario,
> as also per the comment above smp_mb__after_unlock_lock().
> 
> Any other ordering is not on this but on the other primitives and
> irrelevant to the barrier upgrade.

I am still keeping an smp_mb__after_unlock_lock() after every ->lock.
Trying to track which needs it and which does not is asking for
subtle bugs.

> > Again, this is a cartoon-like view of the ordering that leaves out a
> > lot of details, but it should get across the gist of the ordering.
> 
> So the ordering I'm interested in, is the bit that is provided by the
> barrier upgrade, and that seems very limited and directly pertains to
> the tree iteration, ensuring its fully separated and transitive.
> 
> So I'll stick to explanation that the barrier upgrade is purely for the
> tree iteration, to separate and make transitive the tree level state.

Fair enough, but I will be sticking to the simple coding rule that keeps
RCU out of trouble!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-08 15:33                   ` Paul E. McKenney
@ 2015-10-08 17:12                     ` Peter Zijlstra
  2015-10-08 17:46                       ` Paul E. McKenney
  2015-10-09  0:10                       ` Paul E. McKenney
  0 siblings, 2 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-08 17:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 08:33:51AM -0700, Paul E. McKenney wrote:

> > > o	CPU B therefore moves up the tree, acquiring the parent
> > > 	rcu_node structures' ->lock.  In so doing, it forces full
> > > 	ordering against all prior RCU read-side critical sections
> > > 	of all CPUs corresponding to all leaf rcu_node structures
> > > 	subordinate to the current (non-leaf) rcu_node structure.
> > 
> > And here we iterate the tree and get another lock var involved, here the
> > barrier upgrade will actually do something.
> 
> Yep.  And I am way too lazy to sort out exactly which acquisitions really
> truly need smp_mb__after_unlock_lock() and which don't.  Besides, if I
> tried to sort it out, I would occasionally get it wrong, and this would be
> a real pain to debug.  Therefore, I simply do smp_mb__after_unlock_lock()
> on all acquisitions of the rcu_node structures' ->lock fields.  I can
> actually validate that!  ;-)

This is a whole different line of reasoning once again.

The point remains, that the sole purpose of the barrier upgrade is for
the tree iteration, having some extra (pointless but harmless) instances
does not detract from that.

> Fair enough, but I will be sticking to the simple coding rule that keeps
> RCU out of trouble!

Note that there are rnp->lock acquires without the extra barrier though,
so you seem somewhat inconsistent with your own rule.

See for example:

	rcu_dump_cpu_stacks()
	print_other_cpu_stall()
	print_cpu_stall()

(did not do an exhaustive scan, there might be more)

and yes, that is 'obvious' debug code and not critical to the correct
behaviour of the code, but it is a deviation from 'the rule'.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-08 17:12                     ` Peter Zijlstra
@ 2015-10-08 17:46                       ` Paul E. McKenney
  2015-10-09  0:10                       ` Paul E. McKenney
  1 sibling, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-08 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 07:12:03PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2015 at 08:33:51AM -0700, Paul E. McKenney wrote:
> 
> > > > o	CPU B therefore moves up the tree, acquiring the parent
> > > > 	rcu_node structures' ->lock.  In so doing, it forces full
> > > > 	ordering against all prior RCU read-side critical sections
> > > > 	of all CPUs corresponding to all leaf rcu_node structures
> > > > 	subordinate to the current (non-leaf) rcu_node structure.
> > > 
> > > And here we iterate the tree and get another lock var involved, here the
> > > barrier upgrade will actually do something.
> > 
> > Yep.  And I am way too lazy to sort out exactly which acquisitions really
> > truly need smp_mb__after_unlock_lock() and which don't.  Besides, if I
> > tried to sort it out, I would occasionally get it wrong, and this would be
> > a real pain to debug.  Therefore, I simply do smp_mb__after_unlock_lock()
> > on all acquisitions of the rcu_node structures' ->lock fields.  I can
> > actually validate that!  ;-)
> 
> This is a whole different line of reasoning once again.
> 
> The point remains, that the sole purpose of the barrier upgrade is for
> the tree iteration, having some extra (pointless but harmless) instances
> does not detract from that.
> 
> > Fair enough, but I will be sticking to the simple coding rule that keeps
> > RCU out of trouble!
> 
> Note that there are rnp->lock acquires without the extra barrier though,
> so you seem somewhat inconsistent with your own rule.
> 
> See for example:
> 
> 	rcu_dump_cpu_stacks()
> 	print_other_cpu_stall()
> 	print_cpu_stall()
> 
> (did not do an exhaustive scan, there might be more)
> 
> and yes, that is 'obvious' debug code and not critical to the correct
> behaviour of the code, but it is a deviation from 'the rule'.

Which I need to fix, thank you.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-08 15:19               ` Paul E. McKenney
@ 2015-10-08 18:01                 ` Josh Triplett
  2015-10-09  0:11                   ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Josh Triplett @ 2015-10-08 18:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 08:19:03AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 08, 2015 at 05:12:42PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 08, 2015 at 08:06:39AM -0700, Paul E. McKenney wrote:
> > > Please see below for the fixed version.  Thoughts?
> > 
> > > +	__releases(rnp->lock) /* But leaves rrupts disabled. */
> > > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > > +		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
> > 
> > What them 'rrupts' about? ;-)
> 
> Interrupts when it won't fit.  I suppose I could use IRQs instead.  ;-)

In this particular case, "IRQs" works just as well; however, in general,
this seems like an excellent example of when to ignore the 80-column
guideline. :)

- Josh Triplett

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-08 17:12                     ` Peter Zijlstra
  2015-10-08 17:46                       ` Paul E. McKenney
@ 2015-10-09  0:10                       ` Paul E. McKenney
  2015-10-09  8:44                         ` Peter Zijlstra
  1 sibling, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-09  0:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 07:12:03PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2015 at 08:33:51AM -0700, Paul E. McKenney wrote:
> 
> > > > o	CPU B therefore moves up the tree, acquiring the parent
> > > > 	rcu_node structures' ->lock.  In so doing, it forces full
> > > > 	ordering against all prior RCU read-side critical sections
> > > > 	of all CPUs corresponding to all leaf rcu_node structures
> > > > 	subordinate to the current (non-leaf) rcu_node structure.
> > > 
> > > And here we iterate the tree and get another lock var involved, here the
> > > barrier upgrade will actually do something.
> > 
> > Yep.  And I am way too lazy to sort out exactly which acquisitions really
> > truly need smp_mb__after_unlock_lock() and which don't.  Besides, if I
> > tried to sort it out, I would occasionally get it wrong, and this would be
> > a real pain to debug.  Therefore, I simply do smp_mb__after_unlock_lock()
> > on all acquisitions of the rcu_node structures' ->lock fields.  I can
> > actually validate that!  ;-)
> 
> This is a whole different line of reasoning once again.
> 
> The point remains, that the sole purpose of the barrier upgrade is for
> the tree iteration, having some extra (pointless but harmless) instances
> does not detract from that.
> 
> > Fair enough, but I will be sticking to the simple coding rule that keeps
> > RCU out of trouble!
> 
> Note that there are rnp->lock acquires without the extra barrier though,
> so you seem somewhat inconsistent with your own rule.
> 
> See for example:
> 
> 	rcu_dump_cpu_stacks()
> 	print_other_cpu_stall()
> 	print_cpu_stall()
> 
> (did not do an exhaustive scan, there might be more)
> 
> and yes, that is 'obvious' debug code and not critical to the correct
> behaviour of the code, but it is a deviation from 'the rule'.

How about the following patch on top of yours?

							Thanx, Paul

------------------------------------------------------------------------

commit 65764359aaec9513bc6aa94e79069469ec74b53e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Oct 8 15:36:54 2015 -0700

    rcu: Add transitivity to remaining rcu_node ->lock acquisitions
    
    The rule is that all acquisitions of the rcu_node structure's ->lock
    must provide transitivity:  The lock is not acquired that frequently,
    and sorting out exactly which required it and which did not would be
    a maintenance nightmare.  This commit therefore supplies the needed
    transitivity to the remaining ->lock acquisitions.
    
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index daf17e248757..81aa1cdc6bc9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1214,7 +1214,7 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 	struct rcu_node *rnp;
 
 	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask != 0) {
 			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
 				if (rnp->qsmask & (1UL << cpu))
@@ -1237,7 +1237,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 
 	/* Only let one CPU complain about others per time interval. */
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	delta = jiffies - READ_ONCE(rsp->jiffies_stall);
 	if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp)) {
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
@@ -1256,7 +1256,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 	       rsp->name);
 	print_cpu_stall_info_begin();
 	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		ndetected += rcu_print_task_stall(rnp);
 		if (rnp->qsmask != 0) {
 			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
@@ -1327,7 +1327,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 
 	rcu_dump_cpu_stacks(rsp);
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
 		WRITE_ONCE(rsp->jiffies_stall,
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
@@ -2897,7 +2897,7 @@ __rcu_process_callbacks(struct rcu_state *rsp)
 	/* Does this CPU require a not-yet-started grace period? */
 	local_irq_save(flags);
 	if (cpu_needs_another_gp(rsp, rdp)) {
-		raw_spin_lock(&rcu_get_root(rsp)->lock); /* irqs disabled. */
+		raw_spin_lock_rcu_node(rcu_get_root(rsp)); /* irqs disabled. */
 		needwake = rcu_start_gp(rsp);
 		raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
 		if (needwake)
@@ -3718,7 +3718,7 @@ retry_ipi:
 				mask_ofl_ipi &= ~mask;
 			} else {
 				/* Failed, raced with offline. */
-				raw_spin_lock_irqsave(&rnp->lock, flags);
+				raw_spin_lock_irqsave_rcu_node(rnp, flags);
 				if (cpu_online(cpu) &&
 				    (rnp->expmask & mask)) {
 					raw_spin_unlock_irqrestore(&rnp->lock,
@@ -3727,8 +3727,8 @@ retry_ipi:
 					if (cpu_online(cpu) &&
 					    (rnp->expmask & mask))
 						goto retry_ipi;
-					raw_spin_lock_irqsave(&rnp->lock,
-							      flags);
+					raw_spin_lock_irqsave_rcu_node(rnp,
+								       flags);
 				}
 				if (!(rnp->expmask & mask))
 					mask_ofl_ipi &= ~mask;
@@ -4110,7 +4110,7 @@ static void rcu_init_new_rnp(struct rcu_node *rnp_leaf)
 		rnp = rnp->parent;
 		if (rnp == NULL)
 			return;
-		raw_spin_lock(&rnp->lock); /* Interrupts already disabled. */
+		raw_spin_lock_rcu_node(rnp); /* Interrupts already disabled. */
 		rnp->qsmaskinit |= mask;
 		raw_spin_unlock(&rnp->lock); /* Interrupts remain disabled. */
 	}
@@ -4127,7 +4127,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	/* Set up local state, ensuring consistent view of global state. */
-	raw_spin_lock_irqsave(&rnp->lock, flags);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rdp->grpmask = 1UL << (cpu - rdp->mynode->grplo);
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
 	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
@@ -4154,7 +4154,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	/* Set up local state, ensuring consistent view of global state. */
-	raw_spin_lock_irqsave(&rnp->lock, flags);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
@@ -4301,7 +4301,7 @@ static int __init rcu_spawn_gp_kthread(void)
 		t = kthread_create(rcu_gp_kthread, rsp, "%s", rsp->name);
 		BUG_ON(IS_ERR(t));
 		rnp = rcu_get_root(rsp);
-		raw_spin_lock_irqsave(&rnp->lock, flags);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		rsp->gp_kthread = t;
 		if (kthread_prio) {
 			sp.sched_priority = kthread_prio;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index fa0e3b96a9ed..57ba873d2f18 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -525,7 +525,7 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
 	unsigned long flags;
 	struct task_struct *t;
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if (!rcu_preempt_blocked_readers_cgp(rnp)) {
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index ef7093cc9b5c..8efaba870d96 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -319,7 +319,7 @@ static void show_one_rcugp(struct seq_file *m, struct rcu_state *rsp)
 	unsigned long gpmax;
 	struct rcu_node *rnp = &rsp->node[0];
 
-	raw_spin_lock_irqsave(&rnp->lock, flags);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	completed = READ_ONCE(rsp->completed);
 	gpnum = READ_ONCE(rsp->gpnum);
 	if (completed == gpnum)


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-08 18:01                 ` Josh Triplett
@ 2015-10-09  0:11                   ` Paul E. McKenney
  2015-10-09  0:48                     ` Josh Triplett
  0 siblings, 1 reply; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-09  0:11 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Peter Zijlstra, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 11:01:14AM -0700, Josh Triplett wrote:
> On Thu, Oct 08, 2015 at 08:19:03AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 08, 2015 at 05:12:42PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 08, 2015 at 08:06:39AM -0700, Paul E. McKenney wrote:
> > > > Please see below for the fixed version.  Thoughts?
> > > 
> > > > +	__releases(rnp->lock) /* But leaves rrupts disabled. */
> > > > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > > > +		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
> > > 
> > > What them 'rrupts' about? ;-)
> > 
> > Interrupts when it won't fit.  I suppose I could use IRQs instead.  ;-)
> 
> In this particular case, "IRQs" works just as well; however, in general,
> this seems like an excellent example of when to ignore the 80-column
> guideline. :)

But but but...   You are talking to someone who used actual PUNCHED CARDS
in real life in a paying job!!!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-09  0:11                   ` Paul E. McKenney
@ 2015-10-09  0:48                     ` Josh Triplett
  2015-10-09  3:54                       ` Paul E. McKenney
  0 siblings, 1 reply; 67+ messages in thread
From: Josh Triplett @ 2015-10-09  0:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 05:11:11PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 08, 2015 at 11:01:14AM -0700, Josh Triplett wrote:
> > On Thu, Oct 08, 2015 at 08:19:03AM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 08, 2015 at 05:12:42PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Oct 08, 2015 at 08:06:39AM -0700, Paul E. McKenney wrote:
> > > > > Please see below for the fixed version.  Thoughts?
> > > > 
> > > > > +	__releases(rnp->lock) /* But leaves rrupts disabled. */
> > > > > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > > > > +		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
> > > > 
> > > > What them 'rrupts' about? ;-)
> > > 
> > > Interrupts when it won't fit.  I suppose I could use IRQs instead.  ;-)
> > 
> > In this particular case, "IRQs" works just as well; however, in general,
> > this seems like an excellent example of when to ignore the 80-column
> > guideline. :)
> 
> But but but...   You are talking to someone who used actual PUNCHED CARDS
> in real life in a paying job!!!  ;-)

And I learned on a DOS system with 80x25 text mode.  Let us revel in
wonderment at the capabilities of modern systems. :)

- Josh Triplett

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited()
  2015-10-09  0:48                     ` Josh Triplett
@ 2015-10-09  3:54                       ` Paul E. McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul E. McKenney @ 2015-10-09  3:54 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Peter Zijlstra, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 05:48:13PM -0700, Josh Triplett wrote:
> On Thu, Oct 08, 2015 at 05:11:11PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 08, 2015 at 11:01:14AM -0700, Josh Triplett wrote:
> > > On Thu, Oct 08, 2015 at 08:19:03AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 08, 2015 at 05:12:42PM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Oct 08, 2015 at 08:06:39AM -0700, Paul E. McKenney wrote:
> > > > > > Please see below for the fixed version.  Thoughts?
> > > > > 
> > > > > > +	__releases(rnp->lock) /* But leaves rrupts disabled. */
> > > > > > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > > > > > +		raw_spin_lock(&rnp->lock); /* rrupts already disabled. */
> > > > > 
> > > > > What them 'rrupts' about? ;-)
> > > > 
> > > > Interrupts when it won't fit.  I suppose I could use IRQs instead.  ;-)
> > > 
> > > In this particular case, "IRQs" works just as well; however, in general,
> > > this seems like an excellent example of when to ignore the 80-column
> > > guideline. :)
> > 
> > But but but...   You are talking to someone who used actual PUNCHED CARDS
> > in real life in a paying job!!!  ;-)
> 
> And I learned on a DOS system with 80x25 text mode.  Let us revel in
> wonderment at the capabilities of modern systems. :)

But of course!  On my new 2880x1620 screen, I can put four 80x24 xterms
on each row!

Too bad that I cannot actually read them without an external monitor,
and my current external monitors are only 1920x1600.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
  2015-10-09  0:10                       ` Paul E. McKenney
@ 2015-10-09  8:44                         ` Peter Zijlstra
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2015-10-09  8:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Oct 08, 2015 at 05:10:21PM -0700, Paul E. McKenney wrote:
> > Note that there are rnp->lock acquires without the extra barrier though,
> > so you seem somewhat inconsistent with your own rule.
> > 
> > See for example:
> > 
> > 	rcu_dump_cpu_stacks()
> > 	print_other_cpu_stall()
> > 	print_cpu_stall()
> > 
> > (did not do an exhaustive scan, there might be more)
> > 
> > and yes, that is 'obvious' debug code and not critical to the correct
> > behaviour of the code, but it is a deviation from 'the rule'.
> 
> How about the following patch on top of yours?

Works for me, Thanks!

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2015-10-09  8:45 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-06 16:29 [PATCH tip/core/rcu 0/18] Expedited grace-period improvements for 4.4 Paul E. McKenney
2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation Paul E. McKenney
2015-10-06 20:29     ` Peter Zijlstra
2015-10-06 20:58       ` Paul E. McKenney
2015-10-07  7:51         ` Peter Zijlstra
2015-10-07  8:42           ` Mathieu Desnoyers
2015-10-07 11:01             ` Peter Zijlstra
2015-10-07 11:50               ` Peter Zijlstra
2015-10-07 12:03                 ` Peter Zijlstra
2015-10-07 12:05                 ` kbuild test robot
2015-10-07 12:09                 ` kbuild test robot
2015-10-07 12:11                 ` kbuild test robot
2015-10-07 12:17                   ` Peter Zijlstra
2015-10-07 13:44                     ` [kbuild-all] " Fengguang Wu
2015-10-07 13:55                       ` Peter Zijlstra
2015-10-07 14:21                         ` Fengguang Wu
2015-10-07 14:28                           ` Peter Zijlstra
2015-10-07 15:18                 ` Paul E. McKenney
2015-10-08 10:24                   ` Peter Zijlstra
2015-10-07 15:15               ` Paul E. McKenney
2015-10-07 14:33           ` Paul E. McKenney
2015-10-07 14:40             ` Peter Zijlstra
2015-10-07 16:48               ` Paul E. McKenney
2015-10-08  9:49                 ` Peter Zijlstra
2015-10-08 15:33                   ` Paul E. McKenney
2015-10-08 17:12                     ` Peter Zijlstra
2015-10-08 17:46                       ` Paul E. McKenney
2015-10-09  0:10                       ` Paul E. McKenney
2015-10-09  8:44                         ` Peter Zijlstra
2015-10-06 16:29   ` [PATCH tip/core/rcu 03/18] rcu: Consolidate tree setup for synchronize_rcu_expedited() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
2015-10-07 13:24     ` Peter Zijlstra
2015-10-07 18:11       ` Paul E. McKenney
2015-10-07 13:35     ` Peter Zijlstra
2015-10-07 15:44       ` Paul E. McKenney
2015-10-07 13:43     ` Peter Zijlstra
2015-10-07 13:49       ` Peter Zijlstra
2015-10-07 16:14         ` Paul E. McKenney
2015-10-08  9:00           ` Peter Zijlstra
2015-10-07 16:13       ` Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 05/18] rcu: Move synchronize_sched_expedited() to combining tree Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 06/18] rcu: Rename qs_pending to core_needs_qs Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 07/18] rcu: Invert passed_quiesce and rename to cpu_no_qs Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 08/18] rcu: Make ->cpu_no_qs be a union for aggregate OR Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI Paul E. McKenney
2015-10-07 14:18     ` Peter Zijlstra
2015-10-07 16:24       ` Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 10/18] rcu: Stop silencing lockdep false positive for expedited grace periods Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 11/18] rcu: Stop excluding CPU hotplug in synchronize_sched_expedited() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 12/18] cpu: Remove try_get_online_cpus() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 13/18] rcu: Prepare for consolidating expedited CPU selection Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 14/18] rcu: Consolidate " Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 15/18] rcu: Add online/offline info to expedited stall warning message Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 16/18] rcu: Add tasks to expedited stall-warning messages Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 17/18] rcu: Enable stall warnings for synchronize_rcu_expedited() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited() Paul E. McKenney
2015-10-07 14:26     ` Peter Zijlstra
2015-10-07 16:26       ` Paul E. McKenney
2015-10-08  9:01         ` Peter Zijlstra
2015-10-08 15:06           ` Paul E. McKenney
2015-10-08 15:12             ` Peter Zijlstra
2015-10-08 15:19               ` Paul E. McKenney
2015-10-08 18:01                 ` Josh Triplett
2015-10-09  0:11                   ` Paul E. McKenney
2015-10-09  0:48                     ` Josh Triplett
2015-10-09  3:54                       ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).