* [PATCH tip/core/rcu 0/16] Expedited grace-period changes for 4.7
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani

Hello!

This series contains updates for RCU's expedited grace-period facility:

1.	Make expedited RCU-sched grace period immediately detect idle.

2.	Make expedited RCU-preempt stall warnings count accurately.

3.	Make cond_resched_rcu_qs() supply RCU-sched expedited QS.

4.	Fix synchronize_rcu_expedited() header comment.

5.	Add event tracing definitions for expedited grace periods.

6.	Add funnel-locking tracing for expedited grace periods.

7.	Add expedited-grace-period event tracing.

8.	Remove expedited GP funnel-lock bypass.

9.	Force boolean subscript for expedited stall warnings.

10.	Shorten expedited_workdone* to exp_workdone*.

11.	Enforce expedited-GP fairness via funnel wait queue.

12.	Add exp_funnel_lock() fastpath for new funnel wait queue.

13.	Consolidate expedited GP code into rcu_exp_wait_wake().

14.	Consolidate expedited GP tracing into rcu_exp_gp_seq_snap().

15.	Consolidate expedited GP code into exp_funnel_lock().

16.	Overlap wakeups with next expedited grace period.

							Thanx, Paul

------------------------------------------------------------------------

 Documentation/RCU/trace.txt |   10 -
 include/trace/events/rcu.h  |   83 ++++++++++-
 kernel/rcu/tree.c           |  317 +++++++++++++++++++++++++-------------------
 kernel/rcu/tree.h           |   20 +-
 kernel/rcu/tree_plugin.h    |   57 +++----
 kernel/rcu/tree_trace.c     |   13 -
 6 files changed, 312 insertions(+), 188 deletions(-)


* [PATCH tip/core/rcu 01/16] rcu: Make expedited RCU-sched grace period immediately detect idle
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, sync_sched_exp_handler() will force a reschedule unless
this CPU has already checked in or unless a reschedule has already
been called for.  This is clearly wasteful if sync_sched_exp_handler()
interrupted an idle CPU, so this commit immediately reports the
quiescent state in that case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 531a328076bd..5f4336fadc28 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3649,6 +3649,11 @@ static void sync_sched_exp_handler(void *data)
 	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
 	    __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
 		return;
+	if (rcu_is_cpu_rrupt_from_idle()) {
+		rcu_report_exp_rdp(&rcu_sched_state,
+				   this_cpu_ptr(&rcu_sched_data), true);
+		return;
+	}
 	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
 	resched_cpu(smp_processor_id());
 }
-- 
2.5.2


* [PATCH tip/core/rcu 02/16] rcu: Make expedited RCU-preempt stall warnings count accurately
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, synchronize_sched_expedited_wait() simply sets the ndetected
variable to the rcu_print_task_exp_stall() return value.  This means
that if the last rcu_node structure has no stalled tasks, the record of
any stalled tasks in previous rcu_node structures is lost, which can
in turn result in failure to dump out the blocking rcu_node structures.
Or rather, it could have, had the later test been correct: that test
was also inverted, dumping the blocking structures only when none had
been detected.

This commit therefore adds the return value of rcu_print_task_exp_stall()
to ndetected and corrects the later test of ndetected.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5f4336fadc28..687d8a5f35c7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3778,7 +3778,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		       rsp->name);
 		ndetected = 0;
 		rcu_for_each_leaf_node(rsp, rnp) {
-			ndetected = rcu_print_task_exp_stall(rnp);
+			ndetected += rcu_print_task_exp_stall(rnp);
 			mask = 1;
 			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
 				struct rcu_data *rdp;
@@ -3797,7 +3797,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
 			jiffies - jiffies_start, rsp->expedited_sequence,
 			rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]);
-		if (!ndetected) {
+		if (ndetected) {
 			pr_err("blocking rcu_node structures:");
 			rcu_for_each_node_breadth_first(rsp, rnp) {
 				if (rnp == rnp_root)
-- 
2.5.2


* [PATCH tip/core/rcu 03/16] rcu: Make cond_resched_rcu_qs() supply RCU-sched expedited QS
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Although cond_resched_rcu_qs() supplies quiescent states to all flavors
of normal RCU grace periods, it does nothing for expedited RCU-sched
grace periods.  This commit therefore adds a check of whether an
expedited RCU-sched grace period needs a quiescent state from the
current CPU, and invokes rcu_sched_qs() to supply that quiescent state
if so.

Note that the check is racy in that we might be migrated to some other
CPU just after checking the per-CPU variable.  This is OK because the
act of migration will do a context switch, which will supply the needed
quiescent state.  The only downside is that we might do an unnecessary
call to rcu_sched_qs(), but the probability is low and the overhead
is small.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 687d8a5f35c7..178575c01d09 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -370,6 +370,21 @@ void rcu_all_qs(void)
 		rcu_momentary_dyntick_idle();
 		local_irq_restore(flags);
 	}
+	if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))) {
+		/*
+		 * Yes, we just checked a per-CPU variable with preemption
+		 * enabled, so we might be migrated to some other CPU at
+		 * this point.  That is OK because in that case, the
+		 * migration will supply the needed quiescent state.
+		 * We might end up needlessly disabling preemption and
+		 * invoking rcu_sched_qs() on the destination CPU, but
+		 * the probability and cost are both quite low, so this
+		 * should not be a problem in practice.
+		 */
+		preempt_disable();
+		rcu_sched_qs();
+		preempt_enable();
+	}
 	this_cpu_inc(rcu_qs_ctr);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
-- 
2.5.2
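
To make the benign-race argument above concrete, here is a minimal
user-space sketch; the flag and the report function are assumed
stand-ins for the kernel's per-CPU variable and rcu_sched_qs(), not
the real APIs:

	#include <stdio.h>

	static int cpu_no_qs_exp = 1;	/* this "CPU" owes an expedited QS */

	static void report_qs(void)
	{
		puts("quiescent state supplied");
	}

	int main(void)
	{
		/*
		 * In the kernel, this check runs with preemption enabled,
		 * so the task may migrate between the check and the report.
		 * That is benign: the migration is itself a context switch
		 * and supplies the quiescent state, so the worst case is
		 * one needless report on the destination CPU.
		 */
		if (cpu_no_qs_exp)
			report_qs();
		return 0;
	}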


* [PATCH tip/core/rcu 04/16] rcu: Fix synchronize_rcu_expedited() header comment
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit brings the synchronize_rcu_expedited() function's header
comment into line with the new implementation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index efdf7b61ce12..a2ac2628ef8e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -722,13 +722,19 @@ static void sync_rcu_exp_handler(void *info)
  * synchronize_rcu_expedited - Brute-force RCU grace period
  *
  * Wait for an RCU-preempt grace period, but expedite it.  The basic
- * idea is to invoke synchronize_sched_expedited() to push all the tasks to
- * the ->blkd_tasks lists and wait for this list to drain.  This consumes
- * significant time on all CPUs and is unfriendly to real-time workloads,
- * so is thus not recommended for any sort of common-case code.
- * In fact, if you are using synchronize_rcu_expedited() in a loop,
- * please restructure your code to batch your updates, and then Use a
- * single synchronize_rcu() instead.
+ * idea is to IPI all non-idle non-nohz online CPUs.  The IPI handler
+ * checks whether the CPU is in an RCU-preempt critical section, and
+ * if so, it sets a flag that causes the outermost rcu_read_unlock()
+ * to report the quiescent state.  On the other hand, if the CPU is
+ * not in an RCU read-side critical section, the IPI handler reports
+ * the quiescent state immediately.
+ *
+ * Although this is a great improvement over previous expedited
+ * implementations, it is still unfriendly to real-time workloads, so is
+ * thus not recommended for any sort of common-case code.  In fact, if
+ * you are using synchronize_rcu_expedited() in a loop, please restructure
+ * your code to batch your updates, and then use a single synchronize_rcu()
+ * instead.
  */
 void synchronize_rcu_expedited(void)
 {
-- 
2.5.2
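
As a minimal sketch of the batching guidance in the updated comment,
with user-space stubs assumed in place of the real update logic and
grace-period primitive:

	#include <stdio.h>

	/* Assumed stubs for the updater's work and the grace-period wait. */
	static void apply_update(int i) { printf("applied update %d\n", i); }
	static void wait_one_grace_period(void) { puts("one grace period"); }

	int main(void)
	{
		int i;

		/* Batch the updates first... */
		for (i = 0; i < 4; i++)
			apply_update(i);
		/* ...then wait once, rather than paying for one expedited
		 * grace period per update inside the loop. */
		wait_one_grace_period();
		return 0;
	}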


* [PATCH tip/core/rcu 05/16] rcu: Add event tracing definitions for expedited grace periods
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/trace/events/rcu.h | 78 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index ef72c4aada56..aacc172eba7e 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -172,6 +172,76 @@ TRACE_EVENT(rcu_grace_period_init,
 );
 
 /*
+ * Tracepoint for expedited grace-period events.  Takes a string identifying
+ * the RCU flavor, the expedited grace-period sequence number, and a string
+ * identifying the grace-period-related event as follows:
+ *
+ *	"snap": Captured snapshot of expedited grace period sequence number.
+ *	"start": Started a real expedited grace period.
+ *	"end": Ended a real expedited grace period.
+ *	"done": Someone else did the expedited grace period for us.
+ */
+TRACE_EVENT(rcu_exp_grace_period,
+
+	TP_PROTO(const char *rcuname, unsigned long gpseq, const char *gpevent),
+
+	TP_ARGS(rcuname, gpseq, gpevent),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(unsigned long, gpseq)
+		__field(const char *, gpevent)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->gpseq = gpseq;
+		__entry->gpevent = gpevent;
+	),
+
+	TP_printk("%s %lu %s",
+		  __entry->rcuname, __entry->gpseq, __entry->gpevent)
+);
+
+/*
+ * Tracepoint for expedited grace-period funnel-locking events.  Takes a
+ * string identifying the RCU flavor, an integer identifying the rcu_node
+ * combining-tree level, another pair of integers identifying the lowest-
+ * and highest-numbered CPU associated with the current rcu_node structure,
+ * and a string identifying the grace-period-related event as follows:
+ *
+ *	"acq": Acquired a level of funnel lock
+ *	"rel": Released a level of funnel lock
+ */
+TRACE_EVENT(rcu_exp_funnel_lock,
+
+	TP_PROTO(const char *rcuname, u8 level, int grplo, int grphi,
+		 const char *gpevent),
+
+	TP_ARGS(rcuname, level, grplo, grphi, gpevent),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(u8, level)
+		__field(int, grplo)
+		__field(int, grphi)
+		__field(const char *, gpevent)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->level = level;
+		__entry->grplo = grplo;
+		__entry->grphi = grphi;
+		__entry->gpevent = gpevent;
+	),
+
+	TP_printk("%s %d %d %d %s",
+		  __entry->rcuname, __entry->level, __entry->grplo,
+		  __entry->grphi, __entry->gpevent)
+);
+
+/*
  * Tracepoint for RCU no-CBs CPU callback handoffs.  This event is intended
  * to assist debugging of these handoffs.
  *
@@ -704,11 +774,15 @@ TRACE_EVENT(rcu_barrier,
 #else /* #ifdef CONFIG_RCU_TRACE */
 
 #define trace_rcu_grace_period(rcuname, gpnum, gpevent) do { } while (0)
-#define trace_rcu_grace_period_init(rcuname, gpnum, level, grplo, grphi, \
-				    qsmask) do { } while (0)
 #define trace_rcu_future_grace_period(rcuname, gpnum, completed, c, \
 				      level, grplo, grphi, event) \
 				      do { } while (0)
+#define trace_rcu_grace_period_init(rcuname, gpnum, level, grplo, grphi, \
+				    qsmask) do { } while (0)
+#define trace_rcu_exp_grace_period(rcuname, gpseq, gpevent) \
+	do { } while (0)
+#define trace_rcu_exp_funnel_lock(rcuname, level, grplo, grphi, gpevent) \
+	do { } while (0)
 #define trace_rcu_nocb_wake(rcuname, cpu, reason) do { } while (0)
 #define trace_rcu_preempt_task(rcuname, pid, gpnum) do { } while (0)
 #define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0)
-- 
2.5.2
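
The "snap", "start", "end", and "done" events trace the expedited
sequence counter, whose bottom bit indicates a grace period in
progress.  A toy user-space model of those semantics (an illustrative
assumption, not the kernel's actual helpers) behaves as follows:

	#include <stdio.h>

	static unsigned long exp_seq;	/* bottom bit set: GP in progress */

	/* Sequence value at which a request made now is satisfied. */
	static unsigned long exp_gp_seq_snap(void)
	{
		return (exp_seq + 3) & ~0x1UL;
	}

	static void exp_gp_seq_start(void) { exp_seq++; }	/* now odd */
	static void exp_gp_seq_end(void)   { exp_seq++; }	/* now even */

	/* Has a full grace period elapsed since snapshot s? */
	static int exp_gp_seq_done(unsigned long s)
	{
		return (long)(exp_seq - s) >= 0;	/* wrap-safe compare */
	}

	int main(void)
	{
		unsigned long s = exp_gp_seq_snap();		/* "snap" */

		exp_gp_seq_start();				/* "start" */
		printf("mid-GP:   done=%d\n", exp_gp_seq_done(s));	/* 0 */
		exp_gp_seq_end();				/* "end" */
		printf("after GP: done=%d\n", exp_gp_seq_done(s));	/* 1 */
		return 0;
	}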


* [PATCH tip/core/rcu 06/16] rcu: Add funnel-locking tracing for expedited grace periods
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 31 +++++++++++++++++++++++++++----
 kernel/rcu/tree_plugin.h |  3 +++
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 178575c01d09..79e9206a7b11 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3584,10 +3584,18 @@ static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
 			       atomic_long_t *stat, unsigned long s)
 {
 	if (rcu_exp_gp_seq_done(rsp, s)) {
-		if (rnp)
+		if (rnp) {
 			mutex_unlock(&rnp->exp_funnel_mutex);
-		else if (rdp)
+			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
+						  rnp->grplo, rnp->grphi,
+						  TPS("rel"));
+		} else if (rdp) {
 			mutex_unlock(&rdp->exp_funnel_mutex);
+			trace_rcu_exp_funnel_lock(rsp->name,
+						  rdp->mynode->level + 1,
+						  rdp->cpu, rdp->cpu,
+						  TPS("rel"));
+		}
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
 		atomic_long_inc(stat);
@@ -3619,6 +3627,9 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 			if (sync_exp_work_done(rsp, rnp0, NULL,
 					       &rdp->expedited_workdone0, s))
 				return NULL;
+			trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
+						  rnp0->grplo, rnp0->grphi,
+						  TPS("acq"));
 			return rnp0;
 		}
 	}
@@ -3634,16 +3645,28 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	if (sync_exp_work_done(rsp, NULL, NULL, &rdp->expedited_workdone1, s))
 		return NULL;
 	mutex_lock(&rdp->exp_funnel_mutex);
+	trace_rcu_exp_funnel_lock(rsp->name, rdp->mynode->level + 1,
+				  rdp->cpu, rdp->cpu, TPS("acq"));
 	rnp0 = rdp->mynode;
 	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
 		if (sync_exp_work_done(rsp, rnp1, rdp,
 				       &rdp->expedited_workdone2, s))
 			return NULL;
 		mutex_lock(&rnp0->exp_funnel_mutex);
-		if (rnp1)
+		trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
+					  rnp0->grplo, rnp0->grphi, TPS("acq"));
+		if (rnp1) {
 			mutex_unlock(&rnp1->exp_funnel_mutex);
-		else
+			trace_rcu_exp_funnel_lock(rsp->name, rnp1->level,
+						  rnp1->grplo, rnp1->grphi,
+						  TPS("rel"));
+		} else {
 			mutex_unlock(&rdp->exp_funnel_mutex);
+			trace_rcu_exp_funnel_lock(rsp->name,
+						  rdp->mynode->level + 1,
+						  rdp->cpu, rdp->cpu,
+						  TPS("rel"));
+		}
 		rnp1 = rnp0;
 	}
 	if (sync_exp_work_done(rsp, rnp1, rdp,
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a2ac2628ef8e..cd2dae43ff48 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -767,6 +767,9 @@ void synchronize_rcu_expedited(void)
 	/* Clean up and exit. */
 	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp_unlock->exp_funnel_mutex);
+	trace_rcu_exp_funnel_lock(rsp->name, rnp_unlock->level,
+				  rnp_unlock->grplo, rnp_unlock->grphi,
+				  TPS("rel"));
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
-- 
2.5.2


* [PATCH tip/core/rcu 07/16] rcu: Add expedited-grace-period event tracing
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 20 +++++++++++++-------
 kernel/rcu/tree_plugin.h |  3 +++
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79e9206a7b11..524026fd9dd7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3584,17 +3584,18 @@ static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
 			       atomic_long_t *stat, unsigned long s)
 {
 	if (rcu_exp_gp_seq_done(rsp, s)) {
+		trace_rcu_exp_grace_period(rsp->name, s, TPS("done"));
 		if (rnp) {
-			mutex_unlock(&rnp->exp_funnel_mutex);
 			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
 						  rnp->grplo, rnp->grphi,
 						  TPS("rel"));
+			mutex_unlock(&rnp->exp_funnel_mutex);
 		} else if (rdp) {
-			mutex_unlock(&rdp->exp_funnel_mutex);
 			trace_rcu_exp_funnel_lock(rsp->name,
 						  rdp->mynode->level + 1,
 						  rdp->cpu, rdp->cpu,
 						  TPS("rel"));
+			mutex_unlock(&rdp->exp_funnel_mutex);
 		}
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
@@ -3624,12 +3625,12 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	rnp0 = rcu_get_root(rsp);
 	if (!mutex_is_locked(&rnp0->exp_funnel_mutex)) {
 		if (mutex_trylock(&rnp0->exp_funnel_mutex)) {
-			if (sync_exp_work_done(rsp, rnp0, NULL,
-					       &rdp->expedited_workdone0, s))
-				return NULL;
 			trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
 						  rnp0->grplo, rnp0->grphi,
 						  TPS("acq"));
+			if (sync_exp_work_done(rsp, rnp0, NULL,
+					       &rdp->expedited_workdone0, s))
+				return NULL;
 			return rnp0;
 		}
 	}
@@ -3656,16 +3657,16 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 		trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
 					  rnp0->grplo, rnp0->grphi, TPS("acq"));
 		if (rnp1) {
-			mutex_unlock(&rnp1->exp_funnel_mutex);
 			trace_rcu_exp_funnel_lock(rsp->name, rnp1->level,
 						  rnp1->grplo, rnp1->grphi,
 						  TPS("rel"));
+			mutex_unlock(&rnp1->exp_funnel_mutex);
 		} else {
-			mutex_unlock(&rdp->exp_funnel_mutex);
 			trace_rcu_exp_funnel_lock(rsp->name,
 						  rdp->mynode->level + 1,
 						  rdp->cpu, rdp->cpu,
 						  TPS("rel"));
+			mutex_unlock(&rdp->exp_funnel_mutex);
 		}
 		rnp1 = rnp0;
 	}
@@ -3895,16 +3896,21 @@ void synchronize_sched_expedited(void)
 
 	/* Take a snapshot of the sequence number.  */
 	s = rcu_exp_gp_seq_snap(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
 
 	rnp = exp_funnel_lock(rsp, s);
 	if (rnp == NULL)
 		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
 	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
 	synchronize_sched_expedited_wait(rsp);
 
 	rcu_exp_gp_seq_end(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
+	trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
+				  rnp->grplo, rnp->grphi, TPS("rel"));
 	mutex_unlock(&rnp->exp_funnel_mutex);
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index cd2dae43ff48..36e94aed38a7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -750,12 +750,14 @@ void synchronize_rcu_expedited(void)
 	}
 
 	s = rcu_exp_gp_seq_snap(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
 
 	rnp_unlock = exp_funnel_lock(rsp, s);
 	if (rnp_unlock == NULL)
 		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
 
 	/* Initialize the rcu_node tree in preparation for the wait. */
 	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
@@ -766,6 +768,7 @@ void synchronize_rcu_expedited(void)
 
 	/* Clean up and exit. */
 	rcu_exp_gp_seq_end(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
 	mutex_unlock(&rnp_unlock->exp_funnel_mutex);
 	trace_rcu_exp_funnel_lock(rsp->name, rnp_unlock->level,
 				  rnp_unlock->grplo, rnp_unlock->grphi,
-- 
2.5.2


* [PATCH tip/core/rcu 08/16] rcu: Remove expedited GP funnel-lock bypass
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Commit cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking")
turns out to be a pessimization at high load because it forces a tree
full of tasks to wait for an expedited grace period that they probably
do not need.  This commit therefore removes this optimization.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/trace.txt | 10 +++++-----
 kernel/rcu/tree.c           | 19 -------------------
 kernel/rcu/tree.h           |  1 -
 kernel/rcu/tree_trace.c     |  7 +++----
 4 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index ec6998b1b6d0..00a3a38b375a 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -237,17 +237,17 @@ o	"ktl" is the low-order 16 bits (in hexadecimal) of the count of
 
 The output of "cat rcu/rcu_preempt/rcuexp" looks as follows:
 
-s=21872 wd0=0 wd1=0 wd2=0 wd3=5 n=0 enq=0 sc=21872
+s=21872 wd1=0 wd2=0 wd3=5 n=0 enq=0 sc=21872
 
 These fields are as follows:
 
 o	"s" is the sequence number, with an odd number indicating that
 	an expedited grace period is in progress.
 
-o	"wd0", "wd1", "wd2", and "wd3" are the number of times that an
-	attempt to start an expedited grace period found that someone
-	else had completed an expedited grace period that satisfies the
-	attempted request.  "Our work is done."
+o	"wd1", "wd2", and "wd3" are the number of times that an attempt
+	to start an expedited grace period found that someone else had
+	completed an expedited grace period that satisfies the attempted
+	request.  "Our work is done."
 
 o	"n" is the number of times that a concurrent CPU-hotplug operation
 	forced a fallback to a normal grace period.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 524026fd9dd7..62e73e0a929f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3617,25 +3617,6 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	struct rcu_node *rnp1 = NULL;
 
 	/*
-	 * First try directly acquiring the root lock in order to reduce
-	 * latency in the common case where expedited grace periods are
-	 * rare.  We check mutex_is_locked() to avoid pathological levels of
-	 * memory contention on ->exp_funnel_mutex in the heavy-load case.
-	 */
-	rnp0 = rcu_get_root(rsp);
-	if (!mutex_is_locked(&rnp0->exp_funnel_mutex)) {
-		if (mutex_trylock(&rnp0->exp_funnel_mutex)) {
-			trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
-						  rnp0->grplo, rnp0->grphi,
-						  TPS("acq"));
-			if (sync_exp_work_done(rsp, rnp0, NULL,
-					       &rdp->expedited_workdone0, s))
-				return NULL;
-			return rnp0;
-		}
-	}
-
-	/*
 	 * Each pass through the following loop works its way
 	 * up the rcu_node tree, returning if others have done the
 	 * work or otherwise falls through holding the root rnp's
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index df668c0f9e64..ac9a7b0c36ae 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -388,7 +388,6 @@ struct rcu_data {
 	struct rcu_head oom_head;
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 	struct mutex exp_funnel_mutex;
-	atomic_long_t expedited_workdone0;	/* # done by others #0. */
 	atomic_long_t expedited_workdone1;	/* # done by others #1. */
 	atomic_long_t expedited_workdone2;	/* # done by others #2. */
 	atomic_long_t expedited_workdone3;	/* # done by others #3. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 1088e64f01ad..d149c412a4e5 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -185,17 +185,16 @@ static int show_rcuexp(struct seq_file *m, void *v)
 	int cpu;
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
 	struct rcu_data *rdp;
-	unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
+	unsigned long s1 = 0, s2 = 0, s3 = 0;
 
 	for_each_possible_cpu(cpu) {
 		rdp = per_cpu_ptr(rsp->rda, cpu);
-		s0 += atomic_long_read(&rdp->expedited_workdone0);
 		s1 += atomic_long_read(&rdp->expedited_workdone1);
 		s2 += atomic_long_read(&rdp->expedited_workdone2);
 		s3 += atomic_long_read(&rdp->expedited_workdone3);
 	}
-	seq_printf(m, "s=%lu wd0=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
-		   rsp->expedited_sequence, s0, s1, s2, s3,
+	seq_printf(m, "s=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
+		   rsp->expedited_sequence, s1, s2, s3,
 		   atomic_long_read(&rsp->expedited_normal),
 		   atomic_read(&rsp->expedited_need_qs),
 		   rsp->expedited_sequence / 2);
-- 
2.5.2
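
As a worked reading of the rcuexp sample line above: each expedited
grace period advances the sequence number by two, with the bottom bit
flagging a grace period in flight, so s=21872 (even) means that no
expedited grace period is in progress, and wd3=5 records five attempts
that found someone else had already done their work.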


* [PATCH tip/core/rcu 09/16] rcu: Force boolean subscript for expedited stall warnings
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The cpu_online() function can return values other than 0 and 1, which
can result in subscript overflow when applied to a two-element array.
This commit allows for this behavior by using "!!" on the return value
from cpu_online() when used as a subscript.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 62e73e0a929f..64c2e3288551 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3808,7 +3808,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 				ndetected++;
 				rdp = per_cpu_ptr(rsp->rda, cpu);
 				pr_cont(" %d-%c%c%c", cpu,
-					"O."[cpu_online(cpu)],
+					"O."[!!cpu_online(cpu)],
 					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
 					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
 			}
-- 
2.5.2
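
A minimal user-space sketch of the subscript hazard and the "!!" idiom
(the 0x4 value is an assumed stand-in for a non-0/1 cpumask test
result):

	#include <stdio.h>

	int main(void)
	{
		int online = 0x4;	/* nonzero, but not 1 */

		/* "O."[online] would read well past the end of the
		 * two-character string; "!!" collapses any nonzero
		 * value to exactly 1. */
		printf("flag: %c\n", "O."[!!online]);	/* prints '.' */
		return 0;
	}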


* [PATCH tip/core/rcu 10/16] rcu: Shorten expedited_workdone* to exp_workdone*
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Just a name change to save a few lines and a bit of typing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c       | 8 +++-----
 kernel/rcu/tree.h       | 6 +++---
 kernel/rcu/tree_trace.c | 6 +++---
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 64c2e3288551..89f028767765 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3624,15 +3624,14 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	 * can be inexact, as it is just promoting locality and is not
 	 * strictly needed for correctness.
 	 */
-	if (sync_exp_work_done(rsp, NULL, NULL, &rdp->expedited_workdone1, s))
+	if (sync_exp_work_done(rsp, NULL, NULL, &rdp->exp_workdone1, s))
 		return NULL;
 	mutex_lock(&rdp->exp_funnel_mutex);
 	trace_rcu_exp_funnel_lock(rsp->name, rdp->mynode->level + 1,
 				  rdp->cpu, rdp->cpu, TPS("acq"));
 	rnp0 = rdp->mynode;
 	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
-		if (sync_exp_work_done(rsp, rnp1, rdp,
-				       &rdp->expedited_workdone2, s))
+		if (sync_exp_work_done(rsp, rnp1, rdp, &rdp->exp_workdone2, s))
 			return NULL;
 		mutex_lock(&rnp0->exp_funnel_mutex);
 		trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
@@ -3651,8 +3650,7 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 		}
 		rnp1 = rnp0;
 	}
-	if (sync_exp_work_done(rsp, rnp1, rdp,
-			       &rdp->expedited_workdone3, s))
+	if (sync_exp_work_done(rsp, rnp1, rdp, &rdp->exp_workdone3, s))
 		return NULL;
 	return rnp1;
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index ac9a7b0c36ae..6a8f09446924 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -388,9 +388,9 @@ struct rcu_data {
 	struct rcu_head oom_head;
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 	struct mutex exp_funnel_mutex;
-	atomic_long_t expedited_workdone1;	/* # done by others #1. */
-	atomic_long_t expedited_workdone2;	/* # done by others #2. */
-	atomic_long_t expedited_workdone3;	/* # done by others #3. */
+	atomic_long_t exp_workdone1;	/* # done by others #1. */
+	atomic_long_t exp_workdone2;	/* # done by others #2. */
+	atomic_long_t exp_workdone3;	/* # done by others #3. */
 
 	/* 7) Callback offloading. */
 #ifdef CONFIG_RCU_NOCB_CPU
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index d149c412a4e5..86782f9a4604 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -189,9 +189,9 @@ static int show_rcuexp(struct seq_file *m, void *v)
 
 	for_each_possible_cpu(cpu) {
 		rdp = per_cpu_ptr(rsp->rda, cpu);
-		s1 += atomic_long_read(&rdp->expedited_workdone1);
-		s2 += atomic_long_read(&rdp->expedited_workdone2);
-		s3 += atomic_long_read(&rdp->expedited_workdone3);
+		s1 += atomic_long_read(&rdp->exp_workdone1);
+		s2 += atomic_long_read(&rdp->exp_workdone2);
+		s3 += atomic_long_read(&rdp->exp_workdone3);
 	}
 	seq_printf(m, "s=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
 		   rsp->expedited_sequence, s1, s2, s3,
-- 
2.5.2


* [PATCH tip/core/rcu 11/16] rcu: Enforce expedited-GP fairness via funnel wait queue
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The current mutex-based funnel-locking approach used by expedited grace
periods is subject to severe unfairness.  The problem arises when a
few tasks, making a path from leaves to root, all wake up before other
tasks do.  A new task can then follow this path all the way to the root,
which needlessly delays tasks whose grace period is done but that did
not happen to acquire the lock quickly enough.

This commit avoids this problem by maintaining per-rcu_node wait queues,
along with a per-rcu_node counter that tracks the latest grace period
sought by an earlier task to visit this node.  If that grace period
would satisfy the current task, instead of proceeding up the tree,
it waits on the current rcu_node structure using a pair of wait queues
provided for that purpose.  This decouples awakening of old tasks from
the arrival of new tasks.

If the wakeups prove to be a bottleneck, additional kthreads can be
brought to bear for that purpose.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/trace/events/rcu.h |   5 +-
 kernel/rcu/tree.c          | 155 +++++++++++++++++++++++----------------------
 kernel/rcu/tree.h          |  10 ++-
 kernel/rcu/tree_plugin.h   |  16 ++---
 4 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aacc172eba7e..d3e756539d44 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -179,6 +179,7 @@ TRACE_EVENT(rcu_grace_period_init,
  *	"snap": Captured snapshot of expedited grace period sequence number.
  *	"start": Started a real expedited grace period.
  *	"end": Ended a real expedited grace period.
+ *	"endwake": Woke piggybackers up.
  *	"done": Someone else did the expedited grace period for us.
  */
 TRACE_EVENT(rcu_exp_grace_period,
@@ -210,8 +211,8 @@ TRACE_EVENT(rcu_exp_grace_period,
  * and highest-numbered CPU associated with the current rcu_node structure,
 * and a string identifying the grace-period-related event as follows:
  *
- *	"acq": Acquired a level of funnel lock
- *	"rel": Released a level of funnel lock
+ *	"nxtlvl": Advance to next level of rcu_node funnel
+ *	"wait": Wait for someone else to do expedited GP
  */
 TRACE_EVENT(rcu_exp_funnel_lock,
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 89f028767765..bd2658edce00 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -102,6 +102,7 @@ struct rcu_state sname##_state = { \
 	.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
 	.name = RCU_STATE_NAME(sname), \
 	.abbr = sabbr, \
+	.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
 }
 
 RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
@@ -3484,7 +3485,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
  * for the current expedited grace period.  Works only for preemptible
  * RCU -- other RCU implementation use other means.
  *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
+ * Caller must hold the rcu_state's exp_mutex.
  */
 static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
@@ -3500,8 +3501,8 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
  * recursively up the tree.  (Calm down, calm down, we do the recursion
  * iteratively!)
  *
- * Caller must hold the root rcu_node's exp_funnel_mutex and the
- * specified rcu_node structure's ->lock.
+ * Caller must hold the rcu_state's exp_mutex and the specified rcu_node
+ * structure's ->lock.
  */
 static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 				 bool wake, unsigned long flags)
@@ -3538,7 +3539,7 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
  * Report expedited quiescent state for specified node.  This is a
  * lock-acquisition wrapper function for __rcu_report_exp_rnp().
  *
- * Caller must hold the root rcu_node's exp_funnel_mutex.
+ * Caller must hold the rcu_state's exp_mutex.
  */
 static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
 					      struct rcu_node *rnp, bool wake)
@@ -3551,8 +3552,8 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
 
 /*
  * Report expedited quiescent state for multiple CPUs, all covered by the
- * specified leaf rcu_node structure.  Caller must hold the root
- * rcu_node's exp_funnel_mutex.
+ * specified leaf rcu_node structure.  Caller must hold the rcu_state's
+ * exp_mutex.
  */
 static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 				    unsigned long mask, bool wake)
@@ -3570,7 +3571,6 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 
 /*
  * Report expedited quiescent state for specified rcu_data (CPU).
- * Caller must hold the root rcu_node's exp_funnel_mutex.
  */
 static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
 			       bool wake)
@@ -3579,24 +3579,11 @@ static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
 }
 
 /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
-static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
-			       struct rcu_data *rdp,
-			       atomic_long_t *stat, unsigned long s)
+static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
+			       unsigned long s)
 {
 	if (rcu_exp_gp_seq_done(rsp, s)) {
 		trace_rcu_exp_grace_period(rsp->name, s, TPS("done"));
-		if (rnp) {
-			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
-						  rnp->grplo, rnp->grphi,
-						  TPS("rel"));
-			mutex_unlock(&rnp->exp_funnel_mutex);
-		} else if (rdp) {
-			trace_rcu_exp_funnel_lock(rsp->name,
-						  rdp->mynode->level + 1,
-						  rdp->cpu, rdp->cpu,
-						  TPS("rel"));
-			mutex_unlock(&rdp->exp_funnel_mutex);
-		}
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
 		atomic_long_inc(stat);
@@ -3606,53 +3593,53 @@ static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
 }
 
 /*
- * Funnel-lock acquisition for expedited grace periods.  Returns a
- * pointer to the root rcu_node structure, or NULL if some other
- * task did the expedited grace period for us.
+ * Funnel-lock acquisition for expedited grace periods.  Returns true
+ * if some other task completed an expedited grace period that this task
+ * can piggy-back on, and with no mutex held.  Otherwise, returns false
+ * with the mutex held, indicating that the caller must actually do the
+ * expedited grace period.
  */
-static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
+static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 {
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
-	struct rcu_node *rnp0;
-	struct rcu_node *rnp1 = NULL;
+	struct rcu_node *rnp = rdp->mynode;
 
 	/*
-	 * Each pass through the following loop works its way
-	 * up the rcu_node tree, returning if others have done the
-	 * work or otherwise falls through holding the root rnp's
-	 * ->exp_funnel_mutex.  The mapping from CPU to rcu_node structure
-	 * can be inexact, as it is just promoting locality and is not
-	 * strictly needed for correctness.
+	 * Each pass through the following loop works its way up
+	 * the rcu_node tree, returning if others have done the work or
+	 * otherwise falls through to acquire rsp->exp_mutex.  The mapping
+	 * from CPU to rcu_node structure can be inexact, as it is just
+	 * promoting locality and is not strictly needed for correctness.
 	 */
-	if (sync_exp_work_done(rsp, NULL, NULL, &rdp->exp_workdone1, s))
-		return NULL;
-	mutex_lock(&rdp->exp_funnel_mutex);
-	trace_rcu_exp_funnel_lock(rsp->name, rdp->mynode->level + 1,
-				  rdp->cpu, rdp->cpu, TPS("acq"));
-	rnp0 = rdp->mynode;
-	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
-		if (sync_exp_work_done(rsp, rnp1, rdp, &rdp->exp_workdone2, s))
-			return NULL;
-		mutex_lock(&rnp0->exp_funnel_mutex);
-		trace_rcu_exp_funnel_lock(rsp->name, rnp0->level,
-					  rnp0->grplo, rnp0->grphi, TPS("acq"));
-		if (rnp1) {
-			trace_rcu_exp_funnel_lock(rsp->name, rnp1->level,
-						  rnp1->grplo, rnp1->grphi,
-						  TPS("rel"));
-			mutex_unlock(&rnp1->exp_funnel_mutex);
-		} else {
-			trace_rcu_exp_funnel_lock(rsp->name,
-						  rdp->mynode->level + 1,
-						  rdp->cpu, rdp->cpu,
-						  TPS("rel"));
-			mutex_unlock(&rdp->exp_funnel_mutex);
+	for (; rnp != NULL; rnp = rnp->parent) {
+		if (sync_exp_work_done(rsp, &rdp->exp_workdone1, s))
+			return true;
+
+		/* Work not done, either wait here or go up. */
+		spin_lock(&rnp->exp_lock);
+		if (ULONG_CMP_GE(rnp->exp_seq_rq, s)) {
+
+			/* Someone else doing GP, so wait for them. */
+			spin_unlock(&rnp->exp_lock);
+			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
+						  rnp->grplo, rnp->grphi,
+						  TPS("wait"));
+			wait_event(rnp->exp_wq[(s >> 1) & 0x1],
+				   sync_exp_work_done(rsp,
+						      &rdp->exp_workdone2, s));
+			return true;
 		}
-		rnp1 = rnp0;
+		rnp->exp_seq_rq = s; /* Followers can wait on us. */
+		spin_unlock(&rnp->exp_lock);
+		trace_rcu_exp_funnel_lock(rsp->name, rnp->level, rnp->grplo,
+					  rnp->grphi, TPS("nxtlvl"));
 	}
-	if (sync_exp_work_done(rsp, rnp1, rdp, &rdp->exp_workdone3, s))
-		return NULL;
-	return rnp1;
+	mutex_lock(&rsp->exp_mutex);
+	if (sync_exp_work_done(rsp, &rdp->exp_workdone3, s)) {
+		mutex_unlock(&rsp->exp_mutex);
+		return true;
+	}
+	return false;
 }
 
 /* Invoked on each online non-idle CPU for expedited quiescent state. */
@@ -3841,6 +3828,27 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	}
 }
 
+/*
+ * Wake up everyone who piggybacked on the just-completed expedited
+ * grace period.  Also update all the ->exp_seq_rq counters as needed
+ * in order to avoid counter-wrap problems.
+ */
+static void rcu_exp_wake(struct rcu_state *rsp, unsigned long s)
+{
+	struct rcu_node *rnp;
+
+	rcu_for_each_node_breadth_first(rsp, rnp) {
+		if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
+			spin_lock(&rnp->exp_lock);
+			/* Recheck, avoid hang in case someone just arrived. */
+			if (ULONG_CMP_LT(rnp->exp_seq_rq, s))
+				rnp->exp_seq_rq = s;
+			spin_unlock(&rnp->exp_lock);
+		}
+		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x1]);
+	}
+}
+
 /**
  * synchronize_sched_expedited - Brute-force RCU-sched grace period
  *
@@ -3860,7 +3868,6 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 void synchronize_sched_expedited(void)
 {
 	unsigned long s;
-	struct rcu_node *rnp;
 	struct rcu_state *rsp = &rcu_sched_state;
 
 	/* If only one CPU, this is automatically a grace period. */
@@ -3877,20 +3884,23 @@ void synchronize_sched_expedited(void)
 	s = rcu_exp_gp_seq_snap(rsp);
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
 
-	rnp = exp_funnel_lock(rsp, s);
-	if (rnp == NULL)
+	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
+
+	/* Initialize the rcu_node tree in preparation for the wait. */
 	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
-	synchronize_sched_expedited_wait(rsp);
 
+	/* Wait and clean up, including waking everyone. */
+	synchronize_sched_expedited_wait(rsp);
 	rcu_exp_gp_seq_end(rsp);
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
-	trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
-				  rnp->grplo, rnp->grphi, TPS("rel"));
-	mutex_unlock(&rnp->exp_funnel_mutex);
+	rcu_exp_wake(rsp, s);
+
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
+	mutex_unlock(&rsp->exp_mutex);
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
@@ -4190,7 +4200,6 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
 	rdp->cpu = cpu;
 	rdp->rsp = rsp;
-	mutex_init(&rdp->exp_funnel_mutex);
 	rcu_boot_init_nocb_percpu_data(rdp);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
@@ -4448,10 +4457,8 @@ static void __init rcu_init_one(struct rcu_state *rsp)
 {
 	static const char * const buf[] = RCU_NODE_NAME_INIT;
 	static const char * const fqs[] = RCU_FQS_NAME_INIT;
-	static const char * const exp[] = RCU_EXP_NAME_INIT;
 	static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
 	static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS];
-	static struct lock_class_key rcu_exp_class[RCU_NUM_LVLS];
 	static u8 fl_mask = 0x1;
 
 	int levelcnt[RCU_NUM_LVLS];		/* # nodes in each level. */
@@ -4510,9 +4517,9 @@ static void __init rcu_init_one(struct rcu_state *rsp)
 			rnp->level = i;
 			INIT_LIST_HEAD(&rnp->blkd_tasks);
 			rcu_init_one_nocb(rnp);
-			mutex_init(&rnp->exp_funnel_mutex);
-			lockdep_set_class_and_name(&rnp->exp_funnel_mutex,
-						   &rcu_exp_class[i], exp[i]);
+			init_waitqueue_head(&rnp->exp_wq[0]);
+			init_waitqueue_head(&rnp->exp_wq[1]);
+			spin_lock_init(&rnp->exp_lock);
 		}
 	}
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 6a8f09446924..f9d4fbb1e014 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -70,7 +70,6 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0" }
-#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0" }
 #elif NR_CPUS <= RCU_FANOUT_2
 #  define RCU_NUM_LVLS	      2
 #  define NUM_RCU_LVL_0	      1
@@ -79,7 +78,6 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1" }
-#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1" }
 #elif NR_CPUS <= RCU_FANOUT_3
 #  define RCU_NUM_LVLS	      3
 #  define NUM_RCU_LVL_0	      1
@@ -89,7 +87,6 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
-#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" }
 #elif NR_CPUS <= RCU_FANOUT_4
 #  define RCU_NUM_LVLS	      4
 #  define NUM_RCU_LVL_0	      1
@@ -100,7 +97,6 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
-#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" }
 #else
 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
 #endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
@@ -252,7 +248,9 @@ struct rcu_node {
 				/* Counts of upcoming no-CB GP requests. */
 	raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
 
-	struct mutex exp_funnel_mutex ____cacheline_internodealigned_in_smp;
+	spinlock_t exp_lock ____cacheline_internodealigned_in_smp;
+	unsigned long exp_seq_rq;
+	wait_queue_head_t exp_wq[2];
 } ____cacheline_internodealigned_in_smp;
 
 /*
@@ -387,7 +385,6 @@ struct rcu_data {
 #ifdef CONFIG_RCU_FAST_NO_HZ
 	struct rcu_head oom_head;
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
-	struct mutex exp_funnel_mutex;
 	atomic_long_t exp_workdone1;	/* # done by others #1. */
 	atomic_long_t exp_workdone2;	/* # done by others #2. */
 	atomic_long_t exp_workdone3;	/* # done by others #3. */
@@ -504,6 +501,7 @@ struct rcu_state {
 						/*  _rcu_barrier(). */
 	/* End of fields guarded by barrier_mutex. */
 
+	struct mutex exp_mutex;			/* Serialize expedited GP. */
 	unsigned long expedited_sequence;	/* Take a ticket. */
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
 	atomic_t expedited_need_qs;		/* # CPUs left to check in. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 36e94aed38a7..c82c3640493f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -738,8 +738,6 @@ static void sync_rcu_exp_handler(void *info)
  */
 void synchronize_rcu_expedited(void)
 {
-	struct rcu_node *rnp;
-	struct rcu_node *rnp_unlock;
 	struct rcu_state *rsp = rcu_state_p;
 	unsigned long s;
 
@@ -752,8 +750,7 @@ void synchronize_rcu_expedited(void)
 	s = rcu_exp_gp_seq_snap(rsp);
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
 
-	rnp_unlock = exp_funnel_lock(rsp, s);
-	if (rnp_unlock == NULL)
+	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
@@ -763,16 +760,13 @@ void synchronize_rcu_expedited(void)
 	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
-	rnp = rcu_get_root(rsp);
 	synchronize_sched_expedited_wait(rsp);
-
-	/* Clean up and exit. */
 	rcu_exp_gp_seq_end(rsp);
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
-	mutex_unlock(&rnp_unlock->exp_funnel_mutex);
-	trace_rcu_exp_funnel_lock(rsp->name, rnp_unlock->level,
-				  rnp_unlock->grplo, rnp_unlock->grphi,
-				  TPS("rel"));
+	rcu_exp_wake(rsp, s);
+
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
+	mutex_unlock(&rsp->exp_mutex);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
-- 
2.5.2
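
A single-threaded toy model of the per-node decision described above
(an illustrative assumption that omits the locking, the wait queues,
and the wrap-safe sequence comparisons):

	#include <stdio.h>

	struct toy_node {
		unsigned long exp_seq_rq;	/* latest GP sought here */
		struct toy_node *parent;
	};

	/*
	 * One funnel step: if an earlier requester already recorded a
	 * sequence number that satisfies us, stop and wait here (on
	 * exp_wq[(s >> 1) & 0x1] in the real code); otherwise record
	 * our own request so later arrivals wait on us, and go up.
	 */
	static int funnel_step(struct toy_node *rnp, unsigned long s)
	{
		if (rnp->exp_seq_rq >= s)
			return 1;		/* wait here */
		rnp->exp_seq_rq = s;		/* followers wait on us */
		return 0;			/* advance toward the root */
	}

	int main(void)
	{
		struct toy_node root = { 0, NULL };
		struct toy_node leaf = { 0, &root };
		unsigned long s = 4;	/* from the "snap" operation */
		struct toy_node *rnp;

		for (rnp = &leaf; rnp; rnp = rnp->parent)
			if (funnel_step(rnp, s)) {
				puts("wait: someone else covers our GP");
				return 0;
			}
		puts("reached the root: do the grace period ourselves");
		return 0;
	}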


* [PATCH tip/core/rcu 12/16] rcu: Add exp_funnel_lock() fastpath
From: Paul E. McKenney @ 2016-04-12 15:29 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit speeds up the low-contention case, especially for systems
with large rcu_node trees, by attempting to directly acquire the
->exp_mutex.  This fastpath checks the leaves and root first in
order to avoid excessive memory contention on the mutex itself.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bd2658edce00..892a140ae7b6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3603,6 +3603,15 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 {
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
 	struct rcu_node *rnp = rdp->mynode;
+	struct rcu_node *rnp_root = rcu_get_root(rsp);
+
+	/* Low-contention fastpath. */
+	if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) &&
+	    (rnp == rnp_root ||
+	     ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) &&
+	    !mutex_is_locked(&rsp->exp_mutex) &&
+	    mutex_trylock(&rsp->exp_mutex))
+		goto fastpath;
 
 	/*
 	 * Each pass through the following loop works its way up
@@ -3635,6 +3644,7 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 					  rnp->grphi, TPS("nxtlvl"));
 	}
 	mutex_lock(&rsp->exp_mutex);
+fastpath:
 	if (sync_exp_work_done(rsp, &rdp->exp_workdone3, s)) {
 		mutex_unlock(&rsp->exp_mutex);
 		return true;
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH tip/core/rcu 13/16] rcu: Consolidate expedited GP code into rcu_exp_wait_wake()
  2016-04-12 15:29 [PATCH tip/core/rcu 0/16] Expedited grace-period changes for 4.7 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2016-04-12 15:29 ` [PATCH tip/core/rcu 12/16] rcu: Add exp_funnel_lock() fastpath Paul E. McKenney
@ 2016-04-12 15:29 ` Paul E. McKenney
  2016-04-12 15:29 ` [PATCH tip/core/rcu 14/16] rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap() Paul E. McKenney
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2016-04-12 15:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, synchronize_rcu_expedited() and synchronize_sched_expedited()
have significant duplicate code.  This commit therefore consolidates some
of this code into rcu_exp_wake(), which is renamed to rcu_exp_wait_wake()
in recognition of its added responsibilities: waiting for the grace
period to complete and ending the sequence, in addition to the wakeups.
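
The resulting wait-then-wake shape, as a rough userspace sketch
(hypothetical names; a pthread condition variable stands in for the
per-rcu_node waitqueues, and the quiescent-state wait is elided):

	#include <pthread.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t exp_wq = PTHREAD_COND_INITIALIZER;
	static unsigned long completed;	/* analogue of rnp->exp_seq_rq */

	/* Piggybacking waiter: sleep until a GP covering "s" completes. */
	static void exp_wait(unsigned long s)
	{
		pthread_mutex_lock(&lock);
		while (completed < s)	/* kernel compares are wrap-safe */
			pthread_cond_wait(&exp_wq, &lock);
		pthread_mutex_unlock(&lock);
	}

	/* GP driver: wait for quiescent states, then record and wake. */
	static void exp_wait_wake(unsigned long s)
	{
		/* ... wait for all CPUs to report quiescent states ... */
		pthread_mutex_lock(&lock);
		if (completed < s)
			completed = s;	/* also forestalls counter wrap */
		pthread_mutex_unlock(&lock);
		pthread_cond_broadcast(&exp_wq);	/* wake_up_all() analogue */
	}

	int main(void)
	{
		exp_wait_wake(2);
		exp_wait(2);	/* returns immediately; GP 2 completed */
		return 0;
	}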

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 18 +++++++++---------
 kernel/rcu/tree_plugin.h | 10 ++--------
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 892a140ae7b6..fd86eca9478e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3839,14 +3839,18 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 }
 
 /*
- * Wake up everyone who piggybacked on the just-completed expedited
+ * Wait for the current expedited grace period to complete, and then
+ * wake up everyone who piggybacked on the just-completed expedited
  * grace period.  Also update all the ->exp_seq_rq counters as needed
  * in order to avoid counter-wrap problems.
  */
-static void rcu_exp_wake(struct rcu_state *rsp, unsigned long s)
+static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 {
 	struct rcu_node *rnp;
 
+	synchronize_sched_expedited_wait(rsp);
+	rcu_exp_gp_seq_end(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
 	rcu_for_each_node_breadth_first(rsp, rnp) {
 		if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
 			spin_lock(&rnp->exp_lock);
@@ -3857,6 +3861,8 @@ static void rcu_exp_wake(struct rcu_state *rsp, unsigned long s)
 		}
 		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x1]);
 	}
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
+	mutex_unlock(&rsp->exp_mutex);
 }
 
 /**
@@ -3904,13 +3910,7 @@ void synchronize_sched_expedited(void)
 	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
 
 	/* Wait and clean up, including waking everyone. */
-	synchronize_sched_expedited_wait(rsp);
-	rcu_exp_gp_seq_end(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
-	rcu_exp_wake(rsp, s);
-
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
-	mutex_unlock(&rsp->exp_mutex);
+	rcu_exp_wait_wake(rsp, s);
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c82c3640493f..b6d5dde6eab9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -759,14 +759,8 @@ void synchronize_rcu_expedited(void)
 	/* Initialize the rcu_node tree in preparation for the wait. */
 	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
 
-	/* Wait for snapshotted ->blkd_tasks lists to drain. */
-	synchronize_sched_expedited_wait(rsp);
-	rcu_exp_gp_seq_end(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
-	rcu_exp_wake(rsp, s);
-
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
-	mutex_unlock(&rsp->exp_mutex);
+	/* Wait for ->blkd_tasks lists to drain, then wake everyone up. */
+	rcu_exp_wait_wake(rsp, s);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH tip/core/rcu 14/16] rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap()
  2016-04-12 15:29 [PATCH tip/core/rcu 0/16] Expedited grace-period changes for 4.7 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2016-04-12 15:29 ` [PATCH tip/core/rcu 13/16] rcu: Consolidate expedited GP code into rcu_exp_wait_wake() Paul E. McKenney
@ 2016-04-12 15:29 ` Paul E. McKenney
  2016-04-12 15:29 ` [PATCH tip/core/rcu 15/16] rcu: Consolidate expedited GP code into exp_funnel_lock() Paul E. McKenney
  2016-04-12 15:29 ` [PATCH tip/core/rcu 16/16] rcu: Overlap wakeups with next expedited grace period Paul E. McKenney
  15 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2016-04-12 15:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit moves some duplicate code from synchronize_rcu_expedited()
and synchronize_sched_expedited() into rcu_exp_gp_seq_snap().  This
doesn't save lines of code, but does eliminate a "tell me twice" issue:
the "snap" tracing no longer needs to be written, and kept consistent,
at both call sites.
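
The sequence-number arithmetic is what makes the snapshot a natural
tracing point.  Here is a standalone sketch: the "+3 then clear the
low bit" step mirrors the kernel's rcu_seq_snap(), while everything
else (names, the plain >= comparison) is simplified for illustration.

	#include <stdbool.h>
	#include <stdio.h>

	static unsigned long exp_seq;	/* low bit set while a GP runs */

	static unsigned long seq_snap(void)
	{
		/* Round past any in-progress GP, plus one full GP beyond. */
		unsigned long s = (exp_seq + 3) & ~0x1UL;

		printf("snap: %lu\n", s);	/* the "snap" trace event's analogue */
		return s;
	}

	static bool seq_done(unsigned long s)
	{
		return exp_seq >= s;	/* kernel uses a wrap-safe compare */
	}

	int main(void)
	{
		unsigned long s = seq_snap();	/* exp_seq == 0, so s == 2 */

		exp_seq++;	/* grace period starts: low bit now set */
		exp_seq++;	/* grace period ends: exp_seq == 2 */
		printf("done: %s\n", seq_done(s) ? "yes" : "no");
		return 0;
	}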

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 8 +++++---
 kernel/rcu/tree_plugin.h | 2 --
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fd86eca9478e..5b1c8fd89af0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3392,8 +3392,12 @@ static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
 }
 static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
 {
+	unsigned long s;
+
 	smp_mb(); /* Caller's modifications seen first by other CPUs. */
-	return rcu_seq_snap(&rsp->expedited_sequence);
+	s = rcu_seq_snap(&rsp->expedited_sequence);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
+	return s;
 }
 static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
 {
@@ -3898,8 +3902,6 @@ void synchronize_sched_expedited(void)
 
 	/* Take a snapshot of the sequence number.  */
 	s = rcu_exp_gp_seq_snap(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
-
 	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b6d5dde6eab9..529a44085a63 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -748,8 +748,6 @@ void synchronize_rcu_expedited(void)
 	}
 
 	s = rcu_exp_gp_seq_snap(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
-
 	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH tip/core/rcu 15/16] rcu: Consolidate expedited GP code into exp_funnel_lock()
  2016-04-12 15:29 [PATCH tip/core/rcu 0/16] Expedited grace-period changes for 4.7 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2016-04-12 15:29 ` [PATCH tip/core/rcu 14/16] rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap() Paul E. McKenney
@ 2016-04-12 15:29 ` Paul E. McKenney
  2016-04-12 15:29 ` [PATCH tip/core/rcu 16/16] rcu: Overlap wakeups with next expedited grace period Paul E. McKenney
  15 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2016-04-12 15:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit pulls the grace-period-start counter adjustment and tracing
from synchronize_rcu_expedited() and synchronize_sched_expedited()
into exp_funnel_lock(), thus eliminating some code duplication.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 5 ++---
 kernel/rcu/tree_plugin.h | 3 ---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5b1c8fd89af0..e8fff14e417b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3653,6 +3653,8 @@ fastpath:
 		mutex_unlock(&rsp->exp_mutex);
 		return true;
 	}
+	rcu_exp_gp_seq_start(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
 	return false;
 }
 
@@ -3905,9 +3907,6 @@ void synchronize_sched_expedited(void)
 	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
-	rcu_exp_gp_seq_start(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
-
 	/* Initialize the rcu_node tree in preparation for the wait. */
 	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 529a44085a63..ff1cd4e1188d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -751,9 +751,6 @@ void synchronize_rcu_expedited(void)
 	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
-	rcu_exp_gp_seq_start(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
-
 	/* Initialize the rcu_node tree in preparation for the wait. */
 	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH tip/core/rcu 16/16] rcu: Overlap wakeups with next expedited grace period
  2016-04-12 15:29 [PATCH tip/core/rcu 0/16] Expedited grace-period changes for 4.7 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2016-04-12 15:29 ` [PATCH tip/core/rcu 15/16] rcu: Consolidate expedited GP code into exp_funnel_lock() Paul E. McKenney
@ 2016-04-12 15:29 ` Paul E. McKenney
  15 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2016-04-12 15:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The current expedited grace-period implementation makes subsequent grace
periods wait on wakeups for the prior grace period.  This does not fit
the dictionary definition of "expedited", so this commit allows these two
phases to overlap.  Doing this requires four waitqueues rather than two
because tasks can now be waiting on the previous, current, and next grace
periods.  The fourth waitqueue makes the bit masking work out nicely.
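
The indexing is easy to check in isolation: a completed grace period
advances the sequence counter by 2, so the waitqueue slot is
(s >> 1) & 0x3, and the three generations that can be in flight at
once (previous, current, next) never collide in a four-entry array.
A trivial standalone demonstration:

	#include <stdio.h>

	int main(void)
	{
		unsigned long s;

		/* Completed-GP sequence values step by 2; print their slots. */
		for (s = 0; s <= 14; s += 2)
			printf("sequence %2lu -> exp_wq[%lu]\n",
			       s, (s >> 1) & 0x3);
		return 0;
	}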

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 17 ++++++++++++++---
 kernel/rcu/tree.h |  3 ++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e8fff14e417b..1df100cb7a62 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -103,6 +103,7 @@ struct rcu_state sname##_state = { \
 	.name = RCU_STATE_NAME(sname), \
 	.abbr = sabbr, \
 	.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
+	.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
 }
 
 RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
@@ -3637,7 +3638,7 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
 						  rnp->grplo, rnp->grphi,
 						  TPS("wait"));
-			wait_event(rnp->exp_wq[(s >> 1) & 0x1],
+			wait_event(rnp->exp_wq[(s >> 1) & 0x3],
 				   sync_exp_work_done(rsp,
 						      &rdp->exp_workdone2, s));
 			return true;
@@ -3857,6 +3858,14 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 	synchronize_sched_expedited_wait(rsp);
 	rcu_exp_gp_seq_end(rsp);
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
+
+	/*
+	 * Switch over to wakeup mode, allowing the next GP, but -only- the
+	 * next GP, to proceed.
+	 */
+	mutex_lock(&rsp->exp_wake_mutex);
+	mutex_unlock(&rsp->exp_mutex);
+
 	rcu_for_each_node_breadth_first(rsp, rnp) {
 		if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
 			spin_lock(&rnp->exp_lock);
@@ -3865,10 +3874,10 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 				rnp->exp_seq_rq = s;
 			spin_unlock(&rnp->exp_lock);
 		}
-		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x1]);
+		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
 	}
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
-	mutex_unlock(&rsp->exp_mutex);
+	mutex_unlock(&rsp->exp_wake_mutex);
 }
 
 /**
@@ -4530,6 +4539,8 @@ static void __init rcu_init_one(struct rcu_state *rsp)
 			rcu_init_one_nocb(rnp);
 			init_waitqueue_head(&rnp->exp_wq[0]);
 			init_waitqueue_head(&rnp->exp_wq[1]);
+			init_waitqueue_head(&rnp->exp_wq[2]);
+			init_waitqueue_head(&rnp->exp_wq[3]);
 			spin_lock_init(&rnp->exp_lock);
 		}
 	}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index f9d4fbb1e014..1194ab0da56a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -250,7 +250,7 @@ struct rcu_node {
 
 	spinlock_t exp_lock ____cacheline_internodealigned_in_smp;
 	unsigned long exp_seq_rq;
-	wait_queue_head_t exp_wq[2];
+	wait_queue_head_t exp_wq[4];
 } ____cacheline_internodealigned_in_smp;
 
 /*
@@ -502,6 +502,7 @@ struct rcu_state {
 	/* End of fields guarded by barrier_mutex. */
 
 	struct mutex exp_mutex;			/* Serialize expedited GP. */
+	struct mutex exp_wake_mutex;		/* Serialize wakeup. */
 	unsigned long expedited_sequence;	/* Take a ticket. */
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
 	atomic_t expedited_need_qs;		/* # CPUs left to check in. */
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread
